How to Build a Cross-Browser/Hybrid Video Chat App with WebRTC - XB Software

How to Build a Cross-Browser/Hybrid Video Chat App with WebRTC

Reading time: 18 minutes

Nowadays, a broadband internet connection is easy to come by. So it is only natural that video chat apps are becoming more and more common and are replacing text messaging. Whether you want to chat with your relatives or discuss business ideas with colleagues, video chats provide the most natural way of interpersonal communication.

The problem is that when you want to reduce the number of applications you use and turn your web browser into a video chat tool, you may have to install additional plugins. This is inconvenient for users and developers alike, since achieving cross-browser support can be a serious issue. WebRTC was created as an answer to this challenge.

The Main Features of WebRTC

Probably, the main feature that attracts attention to WebRTC is the possibility to use peer-to-peer communication between the browsers without using any frameworks, plugins, or additional applications. It supports popular browsers such as Firefox, Opera, and Chrome and can be used on the main mobile platforms such as Android and iOS.

You can use third-party frameworks that simplify the development process, but WebRTC doesn’t require any development tools. It’s an API that allows you to obtain audio, video, or other data streams, gather network information, report errors, initiate or close sessions, and so on.

The functionality of WebRTC is based on three pillars.

getUserMedia (MediaStream)

The MediaStream API gives you access to media streams such as the audio and video data from your webcam. The legacy navigator.getUserMedia() method takes three parameters (current browsers expose the same functionality as navigator.mediaDevices.getUserMedia(), which takes only the constraints and returns a Promise instead of using callbacks):

• constraints that define whether you want to use audio, video, or both. They also let you define details such as the video resolution
• a success callback that is passed a MediaStream
• an error callback that is called in case of errors

Here’s a canonical example that can help you understand how everything works:
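A minimal sketch of such a call follows. It assumes a page with a single <video autoplay></video> element. The requestWebcam() helper and its nav parameter are illustrative names, not part of WebRTC; the helper tries the promise-based navigator.mediaDevices.getUserMedia() first and falls back to the legacy three-argument form.

```javascript
// Ask for video only; add audio: true to request the microphone as well.
const constraints = { video: true };

// Receives the MediaStream and shows it in the first <video> element on the page.
function successCallback(stream) {
  const video = document.querySelector('video');
  video.srcObject = stream;
}

// Logs the reason the stream could not be obtained (for example, permission denied).
function errorCallback(error) {
  console.log('getUserMedia error: ', error);
}

// Hypothetical helper: prefers the promise-based API and falls back to the
// legacy call that takes constraints, a success callback, and an error callback.
function requestWebcam(nav) {
  if (nav.mediaDevices && nav.mediaDevices.getUserMedia) {
    return nav.mediaDevices
      .getUserMedia(constraints)
      .then(successCallback, errorCallback);
  }
  nav.getUserMedia(constraints, successCallback, errorCallback);
  return Promise.resolve();
}

// Only run automatically inside a browser page.
if (typeof document !== 'undefined') {
  requestWebcam(navigator);
}
```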

In this case, we’re interested only in the video stream from a webcam (check what the constraints object looks like). If you add the <video autoplay></video> element to your page, successCallback will set the video stream as its source. In case of an error, you’ll see the error message defined within errorCallback in your console.

If you want to check this example, simply add the JavaScript code from above between the <script></script> tags. Note that the page has to be served by a web server such as Apache rather than opened directly from the file system.


RTCPeerConnection

The main aim of RTCPeerConnection is to handle the communication of data between the peers. Its responsibilities also include codec handling, security, etc.


RTCDataChannel

Besides exchanging audio and video streams between the peers, WebRTC apps support the transmission of other types of data. For example, you can create a real-time WebRTC text chat with file transfer support. RTCDataChannel is used for exchanging data such as text or files. Its API closely resembles that of WebSocket, so if you’re already familiar with WebSockets, it won’t be hard to learn the new WebRTC tricks. The built-in DTLS protocol encrypts the transmitted data.
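As a rough illustration of a text chat over a data channel, the sketch below wraps the standard createDataChannel() call and the datachannel event. The function names (openChatChannel, acceptChatChannel, sendText) and the "chat" label are assumptions made for this example, and the peer connection is passed in as a parameter.

```javascript
// Caller side: create a channel labelled "chat" and forward incoming text.
function openChatChannel(peerConn, onText) {
  const channel = peerConn.createDataChannel('chat');
  channel.onmessage = (event) => onText(event.data);
  return channel;
}

// Callee side: the channel created by the caller arrives via the datachannel event.
function acceptChatChannel(peerConn, onText) {
  peerConn.ondatachannel = (event) => {
    event.channel.onmessage = (msg) => onText(msg.data);
  };
}

// Send only when the channel is open; returns whether the text was sent.
function sendText(channel, text) {
  if (channel.readyState !== 'open') return false;
  channel.send(text);
  return true;
}
```

In a browser, peerConn would be an RTCPeerConnection instance, and the channel has to be created before the offer so that it becomes part of the negotiated session.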


Signaling

Another important part of WebRTC functionality is signaling. Using WebRTC, you can create an app that transmits data between several browsers. But you also need a mechanism that coordinates the communication process. This is exactly what signaling is. It’s used to exchange session messages, information about the network configuration, and information about the media capabilities of the browsers, such as the codecs they use. To do all this, a signaling server is required. The WebRTC standard doesn’t specify which server technology you should use, so you can choose one according to your preferences.

For exchanging the network information and connecting peers with each other via the UDP protocol, WebRTC apps use the ICE Framework. A caller creates a new RTCPeerConnection object with an onicecandidate handler. After a network candidate is available, the handler is called. Then, using WebSocket or any other mechanism, the caller can send serialized candidate data to a callee. After getting the candidate message, the callee uses addIceCandidate for adding the candidate to the remote peer description.
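The candidate exchange described above can be sketched as two small functions. The names forwardLocalCandidates and applyRemoteCandidate, the signaling object with a send() method, and the { candidate: ... } message shape are assumptions for this example; onicecandidate and addIceCandidate() are the standard RTCPeerConnection members.

```javascript
// Each peer forwards its own candidates through the signaling channel.
function forwardLocalCandidates(peerConn, signaling) {
  peerConn.onicecandidate = (event) => {
    // A null candidate marks the end of gathering; there is nothing to send.
    if (!event.candidate) return;
    signaling.send(JSON.stringify({ candidate: event.candidate }));
  };
}

// Apply a candidate message received from the other peer.
// Resolves to true if a candidate was added, false if the message had none.
function applyRemoteCandidate(peerConn, rawMessage) {
  const message = JSON.parse(rawMessage);
  if (!message.candidate) return Promise.resolve(false);
  return peerConn.addIceCandidate(message.candidate).then(() => true);
}
```

Here signaling could be a WebSocket instance, since it only needs a send() method that accepts a string.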

For exchanging session descriptions, WebRTC applications use the Session Description Protocol (SDP). This protocol doesn’t deliver any media from one peer to another. Its primary purpose is to describe the streaming media initialization parameters that are used during “negotiation” between the caller and the callee. The aim of this process is to make sure that the media type, format, and other properties are compatible. Here’s an example of what a serialized SDP object can look like:

o=jdoe 2890844526 2890842807 IN IP4
s=SDP Seminar
i=A Seminar on the session description protocol
u= (Jane Doe)
c=IN IP4
t=2873397496 2873404696
m=audio 49170 RTP/AVP 0
m=video 51372 RTP/AVP 99
a=rtpmap:99 h263-1998/90000

Let’s take a look at the meanings of some parameters:

• v= protocol version number
• o= originator and session identifier: username, id, version number, network address
• s= session name
• i= title or short information about the session
• m= media name and transport address
• c= connection information
• etc.
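Since every SDP line has the form <type>=<value>, the structure is easy to inspect in code. The sketch below is only an illustration with hypothetical helper names (parseSdpLines, mediaSections) and a shortened sample; it is not a full SDP parser.

```javascript
// Split an SDP blob into { type, value } pairs, one per "<type>=<value>" line.
function parseSdpLines(sdp) {
  return sdp
    .split(/\r?\n/)
    .filter((line) => line.indexOf('=') === 1) // SDP types are a single character
    .map((line) => ({ type: line[0], value: line.slice(2) }));
}

// Describe each m= line: media kind, port, transport protocol, and payload formats.
function mediaSections(sdp) {
  return parseSdpLines(sdp)
    .filter((line) => line.type === 'm')
    .map((line) => {
      const [media, port, proto, ...formats] = line.value.split(' ');
      return { media, port: Number(port), proto, formats };
    });
}

// Shortened sample built from lines of the example above.
const sample = [
  'v=0',
  's=SDP Seminar',
  'm=audio 49170 RTP/AVP 0',
  'm=video 51372 RTP/AVP 99',
  'a=rtpmap:99 h263-1998/90000',
].join('\r\n');

console.log(mediaSections(sample));
```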

The process of exchanging local and remote media information takes place in several stages:

1. The caller calls createOffer(), which creates an SDP offer that includes information about the local session, such as media capabilities
2. The caller calls setLocalDescription() to set the local description using the offer created in the previous step
3. This description is sent to the callee through the signaling channel
4. The callee calls setRemoteDescription() to set the received offer as the remote description
5. The callee calls createAnswer() to create an answer and then sets it as its own local session description with setLocalDescription()
6. After that, the callee sends its local description back to the caller through the signaling channel
7. When the caller receives the description from the callee, it calls setRemoteDescription() to set it as the remote description
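The seven steps above can be sketched with the promise-based forms of these methods. The function names (sendOffer, answerOffer, acceptAnswer), the sendToPeer callback that stands in for the signaling channel, and the { sdp: ... } message shape are assumptions made for this example.

```javascript
// Caller: steps 1-3.
async function sendOffer(caller, sendToPeer) {
  const offer = await caller.createOffer();      // 1. create the SDP offer
  await caller.setLocalDescription(offer);       // 2. store it as the local description
  sendToPeer({ sdp: caller.localDescription });  // 3. pass it to the callee
}

// Callee: steps 4-6.
async function answerOffer(callee, remoteOffer, sendToPeer) {
  await callee.setRemoteDescription(remoteOffer); // 4. store the caller's offer
  const answer = await callee.createAnswer();     // 5. create the answer...
  await callee.setLocalDescription(answer);       //    ...and store it locally
  sendToPeer({ sdp: callee.localDescription });   // 6. send it back to the caller
}

// Caller: step 7.
async function acceptAnswer(caller, remoteAnswer) {
  await caller.setRemoteDescription(remoteAnswer); // 7. store the callee's answer
}
```

In a browser, caller and callee would be RTCPeerConnection instances on two different machines, and sendToPeer would serialize the message and push it through the signaling server.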

This may look a little bit messy, but everything will become clearer after we take a look at a practical example.

Creating a Video Chatting App

Let’s proceed with the coding process. As mentioned earlier, WebRTC itself doesn’t require any third-party frameworks. But since we need a signaling server, we’ll use Node.js for our example. If you don’t have it installed on your system, you can check this download page. Once again, the WebRTC standards do not require Node.js or any other specific server technology, so you can use your favorite one.

Create a new folder named public. Within this folder, create a new HTML file index.html:

This code adds two <video></video> elements to the page. The first one displays the media stream from your webcam. The second one is for the media stream from the person you call. Two buttons allow you to initiate and end the call. The addEventListener method calls the pageReady function after the page is loaded.

We’ve already added the webrtc.js file to the page. It will contain JavaScript code that implements previously explained WebRTC functionality. Let’s create some useful variables first:

Pay attention to the first line of the code. It is used when we create a new WebSocket connection. If you plan to test the application on a single device, you can leave it as is. But if you want to run a server and then connect to it from other devices, this value must match the IP address of the device on which the server is running. For example: var config = { wssHost: 'wss://' };. The peerConnCfg variable defines the parameters that are used to initialize a new RTCPeerConnection. The localVideoElem, remoteVideoElem, videoCallButton, and endCallButton variables are used to access the HTML elements of the page.

Now we can define the pageReady() function that is assigned to the load event:

Using getElementById(), we linked our variables to the page elements to have an opportunity to manipulate them. Then we enabled the Video Call button and attached a click event listener to it. So, after a user clicks this button the initiateCall() function will be called. The same way the closeConnection() function will be called after a user clicks the End Call button.
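A minimal sketch of such a function is shown below. Only the localVideo id appears in this article; the other element ids are assumptions, and the document object and click handlers are passed in as parameters so that the example stays self-contained.

```javascript
// Hypothetical element ids; only "localVideo" is named in the text.
const ELEMENT_IDS = {
  localVideoElem: 'localVideo',
  remoteVideoElem: 'remoteVideo',
  videoCallButton: 'videoCallButton',
  endCallButton: 'endCallButton',
};

// Look up the page elements, enable the Video Call button, and attach click handlers.
function pageReady(doc, handlers) {
  const ui = {};
  Object.keys(ELEMENT_IDS).forEach((name) => {
    ui[name] = doc.getElementById(ELEMENT_IDS[name]);
  });
  ui.videoCallButton.removeAttribute('disabled');
  ui.videoCallButton.addEventListener('click', handlers.initiateCall);
  ui.endCallButton.addEventListener('click', handlers.closeConnection);
  return ui;
}

// In the page it would be wired to the load event, for example:
// window.addEventListener('load', () => pageReady(document, { initiateCall, closeConnection }));
```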

The prepareCall method creates a new RTCPeerConnection instance and assigns the required event listeners:

The initiateCall() function uses the previously defined prepareCall() function. Then we need to get the media stream from the webcam and set it as the source for the <video id="localVideo"></video> element. Finally, we create and send the connection offer to the other peer using the createAndSendOffer() method, which is discussed later.
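A rough sketch of prepareCall() and of the caller-side function that uses it might look like this. Everything the functions need is passed in through parameters (the env object and its property names are assumptions for this example), and the sketch uses the current ontrack and addTrack() members of RTCPeerConnection.

```javascript
// Create the connection and attach the two listeners the app relies on.
function prepareCall(PeerConnection, peerConnCfg, handlers) {
  const peerConn = new PeerConnection(peerConnCfg);
  peerConn.onicecandidate = handlers.onIceCandidate; // forward local candidates to the peer
  peerConn.ontrack = handlers.onRemoteTrack;         // show the remote stream
  return peerConn;
}

// Caller side: capture the webcam, show it locally, add it to the connection,
// then hand over to the offer step.
async function initiateCall(env) {
  const peerConn = prepareCall(env.PeerConnection, env.peerConnCfg, env.handlers);
  const stream = await env.mediaDevices.getUserMedia({ audio: true, video: true });
  env.localVideoElem.srcObject = stream;
  stream.getTracks().forEach((track) => peerConn.addTrack(track, stream));
  await env.createAndSendOffer(peerConn);
  return peerConn;
}
```

In a browser, env.PeerConnection would be RTCPeerConnection and env.mediaDevices would be navigator.mediaDevices.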

After the callee receives an offer, it should get the media stream from its webcam, display it within the <video> element, and then create and send the answer. Here’s how the answerCall() function works:

Now we have to define how WebSocket message exchange between peers will work:

Here’s how we created a new connection after clicking the Video Call button: peerConn = new RTCPeerConnection(peerConnCfg);. Now, we can check the state of the peerConn variable to decide what to do next. If there’s no such object (!peerConn), we are on the callee’s side and can simply answer the call. After receiving an offer from the remote peer, we use the setRemoteDescription method to set the remote description. When we receive a new ICE candidate from the remote peer, we use the addIceCandidate method to add it to the RTCPeerConnection’s remote description.
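The dispatch logic described above can be sketched as a single function. The handleSignal name, the state and actions parameters, and the closeConnection message field are assumptions for this example; the sketch also assumes that answerCall() returns the newly created connection.

```javascript
// Decide what to do with one message received from the signaling server.
function handleSignal(rawMessage, state, actions) {
  const signal = JSON.parse(rawMessage);

  // The other side hung up: tear everything down.
  if (signal.closeConnection) {
    actions.endCall();
    return Promise.resolve();
  }

  // No connection yet means this side is the callee: create one by answering.
  if (!state.peerConn) {
    state.peerConn = actions.answerCall();
  }

  if (signal.sdp) {
    return state.peerConn.setRemoteDescription(signal.sdp);
  }
  if (signal.candidate) {
    return state.peerConn.addIceCandidate(signal.candidate);
  }
  return Promise.resolve();
}

// With a WebSocket named wsc it could be attached like this:
// wsc.onmessage = (evt) => handleSignal(evt.data, state, { answerCall, endCall });
```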

The createAndSendOffer and createAndSendAnswer methods are used for exchanging the media information between peers. This step was described in the Signaling section of this article:

The last step is to define the endCall() function:

By calling the close method, we close the RTCPeerConnection. Then we have to stop the video streams and reset the state of the <video> elements. Finally, we need to enable the Video Call button to allow a user to make a new call, and disable the End Call button.
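These cleanup steps can be sketched as follows. The state and ui parameters and their property names are assumptions for this example; close(), getTracks(), and stop() are the standard RTCPeerConnection, MediaStream, and MediaStreamTrack methods.

```javascript
// Close the connection, release the camera, and reset the page controls.
function endCall(state, ui) {
  if (state.peerConn) {
    state.peerConn.close();
    state.peerConn = null;
  }
  if (state.localStream) {
    // Stopping every track turns off the camera and the microphone.
    state.localStream.getTracks().forEach((track) => track.stop());
    state.localStream = null;
  }
  ui.localVideoElem.srcObject = null;
  ui.remoteVideoElem.srcObject = null;
  ui.videoCallButton.removeAttribute('disabled');
  ui.endCallButton.setAttribute('disabled', 'disabled');
}
```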

Creating a Signaling Server

Browsers expose getUserMedia() only in secure contexts, so the app has to be served over SSL. For testing purposes, you can create two files that will be used by our app. Create a new subfolder named ssl. Within this folder, create a new file named cert.pem:


The second file should be named key.pem:


Our application works via WebSocket, so we have to install it as well as some other dependencies. Create a new file named package.json and paste the following code in it:

To install the required files use the following command:
npm install

The application uses Secure WebSockets on port 3434. The server code accepts WebSocket connections and broadcasts each message received from one peer to all other users.
Paste the required server code into the server.js file:
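The broadcasting part of such a server can be sketched independently of any particular WebSocket library. The broadcast function below is an illustration, not the article's server code; it only assumes that each client object has a readyState property and a send() method, as WebSocket connections do.

```javascript
const OPEN = 1; // WebSocket readyState value for an open connection

// Relay a message to every connected client except the one that sent it.
// Returns the number of clients the message was delivered to.
function broadcast(clients, sender, message) {
  let delivered = 0;
  clients.forEach((client) => {
    if (client !== sender && client.readyState === OPEN) {
      client.send(message);
      delivered += 1;
    }
  });
  return delivered;
}

// Assuming the "ws" package and an HTTPS server created from ssl/key.pem and
// ssl/cert.pem, the function could be wired up roughly like this:
// wss.on('connection', (socket) => {
//   socket.on('message', (data) => broadcast(wss.clients, socket, data.toString()));
// });
```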

To run the app use the command:
nodejs server.js

If you haven’t changed the webrtc.js file, you can open https://localhost:3434/ in your browser. Then open the same address in a new browser window and click the Video Call button.


The WebRTC example that we reviewed provides only basic functionality. It supports only one-to-one video chat, doesn’t allow exchanging text messages or files, and doesn’t look particularly attractive. A team of experienced web developers can significantly extend the functionality of WebRTC-based chat apps and create a competitive solution that provides the required level of security.

XB Software offers video chat apps development and building of innovative communication tools and web real-time applications using WebRTC and other real-time technologies.


Svetlana Gordiyenko


XB Software marketing specialist proficient in digital marketing. She is passionate about web marketing and strives to create engaging, web-friendly content for IT audience across a variety of topics.