5.14. SIP-WebRTC Gateway

WebRTC is a relatively new protocol suite in VoIP technology that turns every capable web browser into a telephone. As a result, users can click-to-dial a company representative, easily access video telephony from within other web applications and receive calls in any web browser, be it on their PC, on a smartphone or in an Internet café.

All of that comes with a level of confidentiality that has never before been widely available to consumers in the history of telephony. Both analog and digital telephony were inherently insecure; mobile telephony secured at least the wireless hop, yet rather weakly. SIP’s security protocols, PGP, S/MIME and Identity (RFC 4474), largely failed to be adopted. With WebRTC, we have proven web-based cryptographic protocols that just work!

The key missing piece for connecting Web clients to SIP telephony is a SIP-WebRTC gateway, shown as the left-most element in Figure Integration of RTC, SIP and PSTN Networks using the RTC Gateway. The gateway connects the populations of web users, SIP telephony users and traditional telephony users behind PSTN gateways. The gateway also provides a practical and yet fairly secure communication model: on the “internal” SIP-based side of the gateway, traditional IT practices for securing controlled networks can be used, while on the side facing the public Internet proven cryptographic protocols are used. That is where the ABC SBC comes in: its border control instruments in combination with the built-in RTC gateway make it possible to form a viable security model.


Figure 1: Integration of RTC, SIP and PSTN Networks using the RTC Gateway

The gateway anchors signaling and media and performs translation between different standards for WebRTC and traditional VoIP, particularly security, codecs and signaling protocols as shown in Figure WebRTC Gateway Protocol Stack.


Figure 2: WebRTC Gateway Protocol Stack

Integrating a gateway in a SIP network is fortunately straightforward. When a SIP-WebRTC gateway is installed and configured to connect to an existing SIP service (PBX, public SIP service), WebRTC clients can immediately reach and be reached from the SIP service. The existing SIP service does not need to be modified at all: it treats WebRTC traffic from behind the gateway as regular SIP traffic.

The rest of this section is split into the following parts. A brief introduction to the WebRTC protocols and network architecture is given in Section WebRTC Network Architecture and Protocols. Configuration of the gateway is explained in the subsequent sections: WebRTC Network Configuration, WebRTC Credentials Configuration, and WebRTC Rules Configuration. Finally, we provide guidelines for starting an RTC gateway using the Amazon Elastic Cloud services in Section Amazon Elastic Cloud Configuration Cookbook. We offer several methods, using either predefined configurations or manual configuration, and starting either a single gateway or a whole failsafe cluster. We also provide recommendations for starting a geographically dispersed service.

If you plan to quickly start the RTC gateway service in front of an existing SIP service, it is best to proceed directly to Section Amazon Elastic Cloud Configuration Cookbook.

5.14.1. WebRTC Network Architecture and Protocols

The WebRTC protocol suite for telephony specifies use of the following protocols:

  • G.711 and Opus (RFC 6716) as audio codecs. Opus is a lossy, low-delay codec that supports constant and variable bitrates ranging from 6 kbps to 510 kbps. G.711 is the legacy PSTN audio codec at 64 kbps.
  • VP8 (RFC 6386) as video codec. VP8 is an irrevocably royalty-free codec.
  • SRTP (RFC 3711) for secure real-time media transmission.
  • DTLS (RFC 4347) for keying.
  • SIP over Websockets (RFC 7118) as one of the possible protocols for signaling. It is slightly adapted SIP using websockets as transport. It is particularly easy to translate to and from legacy SIP.
  • ICE (RFC 5245), STUN (RFC 5389) and TURN (RFC 5766) for NAT traversal. STUN is a probing protocol that allows a client to detect how it is reachable across NATs. TURN is a STUN-based protocol that allows a client behind a NAT to allocate a publicly reachable IP address on a server and tunnel traffic from and to it. ICE is a methodology for finding the best combination of IP addresses for communication between clients.

At the time of publication of this handbook, Firefox (version 23 and above), Chrome (version 28 and above), Opera (version 20 and above) and Safari (Preview, June 2017) supported this protocol stack and had demonstrated mutual interoperability. Several Javascript applications [1] have emerged that implement signaling using SIP over websockets.

In the simplest scenario, two browsers can use the protocol stack to interconnect with each other. Most of this document is however concerned with the case when one party is using a WebRTC-capable browser, and the other party is using a SIP phone or a PSTN phone behind a SIP gateway. This is the most complicated and also the most critical scenario because it connects the web telephony users to the existing population of SIP users. The key component in this scenario is the WebRTC-to-SIP gateway, which translates signaling and media between the WebRTC and non-WebRTC SIP protocol stacks.

How the WebRTC clients use the protocol stack is shown in Figure RTCWeb Protocol Flows. Initially the client registers itself to become reachable for incoming calls. It does so by sending a SIP REGISTER message over websockets. It is that simple.
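As an illustration of the registration step, the following minimal Python sketch assembles a REGISTER request as it might be sent as a single websocket message. The user, domain and header values are made up for the example, and the random ".invalid" host name follows the style of the examples in RFC 7118; a real client such as a JSSIP-based application generates the message itself and also handles authentication challenges.

```python
import uuid


def build_register(user: str, domain: str, expires: int = 600) -> str:
    """Build a minimal SIP REGISTER request for the websocket transport.

    All values are illustrative assumptions; a real client adds
    authentication headers after the server challenges the first attempt.
    """
    # A browser has no address of its own to advertise, so the examples in
    # RFC 7118 use a random host name under ".invalid" in Via and Contact.
    via_host = f"{uuid.uuid4().hex[:12]}.invalid"
    lines = [
        f"REGISTER sip:{domain} SIP/2.0",
        # "WSS" marks secure websockets as the transport (plain: "WS").
        f"Via: SIP/2.0/WSS {via_host};branch=z9hG4bK{uuid.uuid4().hex[:10]}",
        "Max-Forwards: 70",
        f"From: <sip:{user}@{domain}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <sip:{user}@{domain}>",
        f"Call-ID: {uuid.uuid4().hex}",
        "CSeq: 1 REGISTER",
        f"Contact: <sip:{user}@{via_host};transport=ws>;expires={expires}",
        "Content-Length: 0",
    ]
    # SIP header lines end with CRLF; an empty line terminates the headers.
    return "\r\n".join(lines) + "\r\n\r\n"


if __name__ == "__main__":
    print(build_register("alice", "example.com"))
```

The message is ordinary SIP apart from the transport token in the Via header, which is why translation to and from legacy SIP is easy.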


Figure 3: RTCWeb Protocol Flows

Making a call from the browser is a more complicated process. The browser starts the ICE process in which it learns the IP addresses under which it can be reached. These include the WebRTC client’s own IP address, its address as seen on the Internet and learned using the STUN protocol, and possibly a completely different IP address belonging to a TURN media relay. When the browser initiates SIP signaling, it offers all IP addresses learned in the previous phase. After the called party answers the call, the client probes the offered IP addresses against those of the called party to choose the combination with the best IP connectivity. When the “best” IP address is chosen, an encryption session key is generated using DTLS and media is exchanged using SRTP.
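The order in which ICE tries the learned addresses is driven by candidate priorities. The sketch below computes them with the formulas and the recommended type preferences from RFC 5245; the function names are chosen for this example only. The priorities only order the connectivity checks, and the path finally used still depends on which checks succeed.

```python
# Recommended type preferences from RFC 5245, section 4.1.2.2.
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type: str, local_pref: int = 65535,
                       component: int = 1) -> int:
    """Priority of a single candidate (RFC 5245, section 4.1.2.1)."""
    return ((TYPE_PREFERENCE[cand_type] << 24)
            + (local_pref << 8)
            + (256 - component))


def pair_priority(controlling: int, controlled: int) -> int:
    """Priority of a candidate pair (RFC 5245, section 5.7.2)."""
    g, d = controlling, controlled
    return (min(g, d) << 32) + 2 * max(g, d) + (1 if g > d else 0)


if __name__ == "__main__":
    for kind in ("host", "srflx", "relay"):
        print(kind, candidate_priority(kind))
```

With these preferences a client's own (host) address is tried before the STUN-learned (srflx) address, and a TURN relay address comes last.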

The actual media call-flow can vary depending on how the WebRTC application is configured and on the call-by-call result of the ICE connectivity checks. In a typical scenario with the frafos gateway, media is sent directly between the WebRTC client and the gateway. This is shown in Figure RTCWeb Protocol Flows as the green dash-dotted line. However, the WebRTC application can also be configured to communicate using a TURN server, which introduces another hop to the media path. That is the dashed green line in the Figure. This can be useful, for example, if one wishes to relay media using the TCP protocol. It can also occur that both call parties are WebRTC clients on the same subnet and media can flow along the shortest path between them, shown as the solid line in the Figure.

However, in scenarios using the gateway the most practical client configuration choice is to limit the ICE process to the client’s own IP address. That eliminates gathering of the STUN and TURN candidates and greatly reduces the “post-pickup delay”, i.e. the period of time between the moment the called party answers and the moment media can actually be heard and seen.
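Whether a client really restricts itself to its own address can be checked by inspecting the candidates in its SDP offer. The following sketch is a hypothetical test helper, not part of the ABC SBC: it counts the candidate types found in `a=candidate` lines.

```python
from collections import Counter


def candidate_types(sdp: str) -> Counter:
    """Count ICE candidates in an SDP body by type (host, srflx, relay, ...)."""
    counts: Counter = Counter()
    for line in sdp.splitlines():
        if not line.startswith("a=candidate:"):
            continue
        fields = line.split()
        # The candidate type is the token that follows the literal "typ".
        if "typ" in fields[:-1]:
            counts[fields[fields.index("typ") + 1]] += 1
    return counts


def only_host_candidates(sdp: str) -> bool:
    """True if the SDP carries candidates and all of them are host candidates."""
    counts = candidate_types(sdp)
    return bool(counts) and set(counts) == {"host"}
```

An offer that still lists `srflx` or `relay` candidates indicates that the client application is configured with STUN or TURN servers.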

5.14.2. WebRTC Network Configuration

This subsection is about what components must be placed in the network and how they must be configured to enable working WebRTC call-flows. First, the following planning questions must be answered:

  • Do you want to enable NAT/firewall traversal using media over TCP? This may increase the NAT/firewall traversal success rate. If so, an additional TURN server [2] must be introduced.
  • Which client do you want to use? The RTC-capable Web browser alone includes the RTC engine but still needs an application that uses it. There are various commercial and open-source projects implementing the VoIP functionality, such as JSSIP. The ABC SBC comes with a JSSIP-based application for demonstration purposes. Note however, that frafos does not support third-party client software. Keep in mind that the Javascript code offered to WebRTC clients must include proper configuration of TURN and STUN servers.
  • Do you want to integrate the gateway functionality in an SBC or run it on a dedicated server? We suggest using a dedicated server unless you have a good reason for tight integration. With a dedicated server, it is easy to discriminate WebRTC-to-WebRTC calls from WebRTC-to-SIP calls, apply different security logic to WebRTC clients, and avoid interference with the legacy-SIP configuration.
  • Under which IP address and port number will the websocket interface be available? To enable websocket communication, you must configure an SBC interface and create a Call Agent linked to the interface. The interface configuration dialog is shown in Figure Websocket Interface Configuration. The most important element is “Interface type” which must be set to “websocket signaling”. The Call Agent configuration is shown in Figure Websocket Call Agent Configuration. By setting its interface to the previously created websocket interface and its IP address to “All”, it captures all WebRTC clients communicating with the ABC SBC using websockets.
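The STUN and TURN settings that the Javascript application hands to the browser take the form of an `iceServers` list. As a sketch of how a web backend might generate that configuration for embedding in the page, the Python helper below builds the JSON structure; the function name, host names and credentials are placeholders invented for the example, and port 3478 is the default STUN/TURN port.

```python
import json


def ice_configuration(stun_host=None, turn_host=None, turn_user="",
                      turn_password="", turn_over_tcp=False) -> str:
    """Return a JSON document usable as a browser RTCPeerConnection config."""
    servers = []
    if stun_host:
        servers.append({"urls": f"stun:{stun_host}:3478"})
    if turn_host:
        # "?transport=tcp" asks the client to reach the relay over TCP.
        transport = "?transport=tcp" if turn_over_tcp else ""
        servers.append({
            "urls": f"turn:{turn_host}:3478{transport}",
            "username": turn_user,
            "credential": turn_password,
        })
    # An empty list leaves the client with its own (host) addresses only.
    return json.dumps({"iceServers": servers}, indent=2)


if __name__ == "__main__":
    print(ice_configuration(turn_host="turn.example.com",
                            turn_user="demo", turn_password="secret",
                            turn_over_tcp=True))
```

Calling the helper without arguments yields an empty server list, which matches the host-address-only setup that works well when media always flows through the gateway.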

Figure 4: Websocket Interface Configuration


Figure 5: Websocket Call Agent Configuration

5.14.3. WebRTC Credentials Configuration

Confidentiality of calls by encryption is one of the major WebRTC features. Fortunately, it is rather easy to configure. In the simplest case only one configuration option needs to be turned on: “Config‣Global Config‣SRTP‣Enabled DTLS-SRTP”. All other configuration options are optional. Such configuration is shown in Figure SRTP Configuration Page.


Figure 6: SRTP Configuration Page

When no further options are selected, the ABC SBC creates ad-hoc self-signed credentials. A particular advantage of these is that the resulting DTLS-SRTP packets stay below the 1500-byte packet length and are therefore almost certain to traverse networks without IP fragmentation.

If you prefer your own certificates, you must upload them using the “SSL certificate” and “SSL private Key” configuration options. If you additionally enable the “mandate client certificate” configuration choice, no TLS channel will be opened to clients without a certificate. Optionally you may also upload the “Trusted CA certificate”. If it is uploaded, a TLS channel will only be opened to clients that present a certificate issued by a listed CA. The channel will not be established to clients whose certificate comes from an unlisted CA or that present no certificate at all.

Note that some credentials may result in DTLS-SRTP packets that are too long. If they exceed the length of 1500 bytes, they will most likely be fragmented, which may result in a failure to set up the media channel. This is almost certain if there are NATs along the communication path.
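A rough estimate can tell in advance whether a certificate chain is likely to cause trouble. The sketch below adds the fixed IPv4, UDP and DTLS header sizes to the size of a DTLS Certificate message. It is a lower bound under simplifying assumptions: the whole message is sent in one record, and other handshake messages that may share the datagram are ignored. The function names are made up for this example.

```python
from typing import Sequence

IPV4_HEADER = 20            # bytes, without IP options
UDP_HEADER = 8
DTLS_RECORD_HEADER = 13     # type, version, epoch, sequence number, length
DTLS_HANDSHAKE_HEADER = 12  # type, length, message_seq, fragment offset/length


def certificate_datagram_size(cert_der_lengths: Sequence[int]) -> int:
    """Estimate the IP packet size of a DTLS Certificate message.

    cert_der_lengths holds the DER-encoded size of each certificate in
    the chain, in bytes.
    """
    # The message carries a 3-byte length for the whole list and a
    # 3-byte length in front of every certificate.
    body = 3 + sum(3 + length for length in cert_der_lengths)
    return (IPV4_HEADER + UDP_HEADER + DTLS_RECORD_HEADER
            + DTLS_HANDSHAKE_HEADER + body)


def fits_without_fragmentation(cert_der_lengths: Sequence[int],
                               mtu: int = 1500) -> bool:
    """True if the estimated packet stays within the given MTU."""
    return certificate_datagram_size(cert_der_lengths) <= mtu
```

For instance, a single 600-byte certificate gives an estimate of 659 bytes, whereas a chain of a 1200-byte and an 1100-byte certificate clearly exceeds 1500 bytes.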

5.14.4. WebRTC Rules Configuration

The configuration of the rules for the SIP-WebRTC gateway must address both the generic SIP processing aspects, which are routing and NAT traversal, and the specific aspects of WebRTC interworking.

In this configuration example we assume the topology shown in Figure RTCWeb Protocol Flows, two types of calls, WebRTC-to-SIP and SIP-to-WebRTC, and media flowing through the ABC SBC along the dash-dotted green line.

The SIP routing flow is rather simple in this scenario: every call coming from the WebRTC Call Agent (i.e. over the websocket interface) will be routed to a SIP PBX, and conversely every call coming from the PBX will be routed to RTC browsers using websockets. The routing configuration is shown in Figure SIP-WebRTC Gateway Routing Rules.


Figure 7: SIP-WebRTC Gateway Routing Rules

The task of the A-rules and C-rules is to anchor media at the ABC SBC and to determine when to convert calls from RTC to SIP and vice versa. Therefore we create two realms: one for RTC clients and one for SIP clients. For each of them, we create one Call Agent that captures all traffic from/to any IP address flowing through the websocket and SIP interface respectively. The actions are configured to accommodate the following policies:

Realm Direction Policy (Actions)
RTC A-rules
  • enforce frequent re-REGISTERs to keep persistent TCP connections for websockets alive (REGISTER throttling)
  • cache registrations to forward SIP calls for RTC clients properly (Enable REGISTER caching)
  • fix NAT bindings (Enable Dialog handling)
  • anchor media, offer ICE and RTC Feedback to RTC clients (Enable RTP Anchoring)
RTC C-rules
  • anchor media (Enable RTP Anchoring)
  • enforce SRTP using DTLS keying (Force RTP/SRTP)
SIP A-rules
  • lookup registered RTC user, decline the call if offline (Reply to request with reason and code, Retarget R-URI from cache (alias))
  • anchor media, don’t offer ICE to SIP callers (Enable RTP Anchoring)
SIP C-rules
  • anchor media (Enable RTP Anchoring)
  • enforce plain RTP on the way to the SIP Call Agent (Force RTP/SRTP)
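The routing and lookup policy can be summarized in a few lines of Python. This is only a conceptual sketch of the decision logic, not the ABC SBC rule engine: the agent labels, the PBX address, the registration cache layout and the 480 reject code are all assumptions made for the example.

```python
from typing import Dict, Tuple

PBX_URI = "sip:pbx.example.com"  # hypothetical address of the SIP PBX


def route_call(inbound_agent: str, callee: str,
               registrations: Dict[str, str]) -> Tuple[str, str]:
    """Decide where a call goes, given the Call Agent it arrived from.

    inbound_agent is "websocket" for RTC clients, anything else for SIP.
    registrations maps a user name to the contact cached from REGISTER.
    """
    if inbound_agent == "websocket":
        # Every call from an RTC client is sent to the PBX.
        return ("forward", PBX_URI)
    contact = registrations.get(callee)
    if contact is None:
        # The RTC user is offline; the reply code is an assumed example.
        return ("reject", "480 Temporarily Unavailable")
    # Retarget the request to the contact cached for the RTC user.
    return ("forward", contact)
```

The cached contact is what allows calls from the PBX to find the websocket connection of a registered browser.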

We have met most of the rules in previous sections: driving the re-registration rate high to keep transport-layer connections alive, caching registrations, fixing NAT bindings and anchoring media. Now we need to include the specifics of SIP and RTC interworking. SIP calls towards RTC clients must appear RTC-capable, i.e. they must offer SRTP encryption, ICE connectivity checks and RTCP feedback. Conversely, the RTC calls to SIP must be transformed to plain RTP.

The “Force RTP/SRTP” action determines if plain RTP or SRTP is used for a call. When this action is placed in C-rules, it converts media for the called party into the enforced protocol. When SRTP is chosen, one must set an additional option: the keying protocol. Only DTLS makes sense for RTC. In our example we convert all media traffic towards SIP devices by placing “Force RTP” in the SIP realm’s C-rules. Analogously, we convert all media traffic towards RTC clients by placing “Force SRTP” in the RTC realm’s C-rules. The “Force SRTP” action uses “DTLS” as the keying option because that is the keying protocol standardized for use with RTC.
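At the SDP level, the conversion towards a SIP device roughly amounts to changing the transport profile in the media line and removing the attributes that only make sense for secure, ICE-enabled media. The sketch below illustrates this idea on the signaling side only; it is not the ABC SBC implementation, which additionally terminates DTLS, decrypts SRTP and rewrites media addresses and ports.

```python
# Attributes that belong to DTLS-SRTP, ICE or RTCP feedback and have no
# meaning in a plain RTP/AVP offer.
SECURE_ONLY_PREFIXES = (
    "a=fingerprint:", "a=setup:", "a=ice-", "a=candidate:",
    "a=rtcp-fb:", "a=crypto:",
)


def to_plain_rtp(sdp: str) -> str:
    """Rewrite an SDP body so that it describes plain RTP (RTP/AVP)."""
    kept = []
    for line in sdp.splitlines():
        if line.startswith("m="):
            # m=<media> <port> <proto> <fmt> ...
            parts = line.split(" ")
            if len(parts) >= 3:
                parts[2] = "RTP/AVP"
            line = " ".join(parts)
        elif line.startswith(SECURE_ONLY_PREFIXES):
            continue
        kept.append(line)
    return "\r\n".join(kept) + "\r\n"
```

The opposite direction adds what this sketch removes: a secure profile, a DTLS fingerprint, ICE credentials and candidates.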

One could also use the “Force RTP/SRTP” action in A-rules: here however it only determines if the caller’s SDP offer complies with the enforced preference and rejects the call otherwise. We are not using this kind of admission policy in our example.

The other options specific to the RTC interworking use-case concern how we anchor media. We need to make sure that RTC clients relying on ICE will receive proper STUN answers for their connectivity checks towards the built-in media relay and also RTCP feedback. Therefore, we turn the options “offer ICE” and “offer RTCP feedback” on in the media anchoring action for both the RTC A-rules and C-rules. The A-rules make sure that incoming RTC call offers obtain answers that are ICE and RTCP-feedback capable; the C-rules ensure that SDP offers towards the RTC clients will also be ICE and RTCP-feedback capable.

The resulting configuration is shown in Figures Configuration of RTCWeb Rules for RTC Realm and Configuration of RTCWeb Rules for SIP Realm for the RTC and SIP realm respectively.


Figure 8: Configuration of RTCWeb Rules for RTC Realm


Figure 9: Configuration of RTCWeb Rules for SIP Realm

Note that this configuration works even if two WebRTC clients connect to each other through the gateway. However, the WebRTC-to-SIP conversion and forwarding to the SBC still takes place, resulting in a WebRTC-to-SIP-to-WebRTC loop, as shown in Figure The WebRTC-to-WebRTC Loopback.


Figure 10: The WebRTC-to-WebRTC Loopback

Optionally it may be useful to manage codec negotiation. For example, one could blacklist G.711 in favor of Opus, if there are SIP clients that can speak the codec. Or video could be stripped off, if there is no support for the royalty-free VP8 codec. Note though that if codecs are stripped too aggressively, a SIP user agent may fail to interoperate and return a 488 in the UAS role or send an immediate BYE in the UAC role.
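What blacklisting a codec means for the SDP can be sketched as follows. The helper removes the blocked payload types from the audio media line together with their attribute lines, and refuses to return an offer without any codec, which is the situation that provokes a 488. It is an illustrative sketch that assumes a single audio media section, not a description of the ABC SBC codec filter.

```python
import re
from typing import Set

# Static payload types of G.711 (RFC 3551); they may appear without rtpmap.
STATIC_AUDIO_TYPES = {"0": "PCMU", "8": "PCMA"}


def remove_audio_codecs(sdp: str, blocked: Set[str]) -> str:
    """Strip the named codecs (e.g. {"PCMU", "PCMA"}) from an audio offer."""
    blocked = {name.upper() for name in blocked}
    lines = sdp.splitlines()
    # Pass 1: find the payload type numbers of the blocked codecs.
    drop = {pt for pt, name in STATIC_AUDIO_TYPES.items() if name in blocked}
    for line in lines:
        match = re.match(r"a=rtpmap:(\d+) ([^/\s]+)/", line)
        if match and match.group(2).upper() in blocked:
            drop.add(match.group(1))
    # Pass 2: rewrite the media line and drop per-payload attributes.
    kept = []
    for line in lines:
        if line.startswith("m=audio "):
            parts = line.split()
            formats = [pt for pt in parts[3:] if pt not in drop]
            if not formats:
                raise ValueError("no audio codec left in the offer")
            line = " ".join(parts[:3] + formats)
        else:
            match = re.match(r"a=(?:rtpmap|fmtp|rtcp-fb):(\d+)", line)
            if match and match.group(1) in drop:
                continue
        kept.append(line)
    return "\r\n".join(kept) + "\r\n"
```

Blocking PCMU and PCMA in an offer that also lists Opus leaves Opus as the only choice; blocking all three raises the error instead of producing an unusable offer.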

5.14.5. WebRTC Interoperability Recommendations

The WebRTC standard and its implementations are relatively new, and as a result the degree of interworking largely depends on the network configuration and the client used. Unfortunately, interoperability is still changing with every new version of the WebRTC stack and of the clients built upon it.

Network complications typically arise when there is a “middlebox”, an Application Layer Gateway (ALG) or an HTTP proxy in the path. This sort of network equipment manipulates HTTP traffic in a way that may impair interoperability. If the middlebox cannot handle the websocket extension of the HTTP protocol, the signaling connection will fail. Therefore the default transport protocol for SIP over websockets is TLS, which hides the websocket traffic from such middleboxes.

WebRTC application complications typically arise when the application has imperfect support for the SIP protocol running on top of websockets, and/or changes its behaviour with a new software version. *We urge our customers to test the client application extensively before the initial deployment of a WebRTC service AND during an update to a newer version.*

The most “fluid” interoperability difficulty is the continuous change of the WebRTC protocol stacks hidden inside the browsers. With almost every browser release, some minor changes appear that impair interoperability. Until the environment becomes more stable, the typical reaction is to analyze the new interop behaviour and to use the ABC SBC mediation features to address it. For example, Chrome browsers Version 39.0 and higher are known not to handle “early media” correctly. The ABC SBC configuration makes it possible to mediate “183 early media” into a regular “180 ringing”, as shown in Figure WebRTC Mediation Example.


Figure 11: WebRTC Mediation Example

In summary, while the industry is converging on a solid level of interoperability, thorough effort during initial and regression tests is highly recommended.


[1] The JSSIP application is available under the MIT License and can be obtained from http://jssip.net.
[2] A TURN server is not part of the ABC SBC. A publicly available TURN server can be obtained from https://code.google.com/p/rfc5766-turn-server/