Scaling WebRTC - A serverless approach with Azure.
This article is based on a paper submitted for grading during my studies. (Steininger, 2024b)
Abstract
Scaling WebRTC applications, such as WaveNet8 //Send, is challenging. Several methods exist, but we focus primarily on cloud services. After analyzing various options, we chose Azure for its extensive serverless services, which facilitate secure and highly scalable solutions. No other provider offered a comparable serverless SignalR service. Additionally, Azure’s low fixed costs provided a further economic incentive.
We will demonstrate the developed solution, detailing the services used. A small application with front- and back-end components will be deployed to showcase Continuous Deployment (CD), focusing on deployment without tests.
The cloud infrastructure is made reproducible using Terraform, an Infrastructure as Code (IaC) tool. We will also show how to set up the cloud services and the example application using the provided files.
Prerequisites
- Familiarity with Azure’s Dashboard
- Basic knowledge of HTML, JavaScript, and .NET (C#)
- Access to the GitHub Repository
- Knowledge about Terraform Scripts
We recommend ensuring you are familiar with these topics before proceeding.
Cloud Architecture
Introduction
Developing an architecture that is scalable, secure, and cost-effective is challenging. We believe we have found a solution that enables us to scale WaveNet8 //Send to meet any demand. Due to confidentiality, we will demonstrate the deployment with a simple chat app instead of the actual code for WaveNet8 //Send.
To manage costs, our focus is on North America and Europe.
Goals
The goal is to develop a cloud architecture that can seamlessly scale to any foreseeable demand, is highly available and secure, and maintains low costs. Additionally, CD should be possible using Azure Pipelines from our Azure DevOps Repository.
We do not aim to host and maintain a TURN server network due to the high effort and fixed costs involved.
The Architecture
The developed architecture is depicted below. It includes all services involved in providing functionality to the user and securing the service.
The image above shows a clear distinction between front-end and back-end components. The static WebApps provide the frontend files, while the SignalR Service and Hub handle the business logic and TURN credentials. The KeyVault stores secrets and provides access to authorized services and processes. The database stores anonymized statistics about WaveNet8 //Send usage. We rely on Google and Metered TURN for TURN and STUN servers.
The resulting infrastructure contains the main components and facilitates the interactions shown in the sequence diagram.
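To make the role of the STUN and TURN entries more concrete, here is a minimal sketch of the kind of payload a backend could hand to the browser. It is written in Python purely for illustration; the actual backend described here is .NET, and its code is not shown in this article. The function name build_ice_servers, the TURN URL, and the credential values are hypothetical placeholders. Only the urls/username/credential field names and the public Google STUN address are standard.

```python
import json


def build_ice_servers(turn_url, turn_username, turn_credential):
    """Return an iceServers list in the shape a browser's RTCPeerConnection accepts."""
    return [
        # Public Google STUN server; STUN needs no credentials.
        {"urls": "stun:stun.l.google.com:19302"},
        # TURN relay entry; the backend would fill in short-lived credentials.
        {
            "urls": turn_url,
            "username": turn_username,
            "credential": turn_credential,
        },
    ]


ice_servers = build_ice_servers(
    turn_url="turn:turn.example.invalid:3478",  # hypothetical, not a real Metered endpoint
    turn_username="placeholder-user",           # hypothetical value
    turn_credential="placeholder-secret",       # hypothetical value
)

# JSON body the frontend could pass on as the RTCPeerConnection configuration.
payload = json.dumps({"iceServers": ice_servers}, indent=2)
print(payload)
```

Keeping the TURN credentials on the server side and returning them only on request matches the split described above, where the frontend is static and the Hub is responsible for credentials.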
Azure Services Used
- Azure Functions: Provides serverless compute for backend logic.
- Azure SignalR Service: Facilitates real-time communication.
- Azure Static Web Apps: Hosts static files for the frontend.
- Azure Key Vault: Manages and stores secrets securely.
- Azure SQL DB: Stores anonymized usage statistics.
- Metered TURN Services: Provides TURN server functionality for WebRTC.
Installation
We assume you have already set up an Azure account, a Metered account, an Azure DevOps project (with three repositories: Frontend, Backend, Terraform), and Terraform.
- Retrieve the Metered Key from the Developers section in the Metered Dashboard.
- Execute terraform plan -var "metered_key=" -out ./myplan (with your Metered key placed after the equals sign), followed by terraform apply "myplan".
- Check your Azure Dashboard to ensure all services were created without errors.
If the Terraform script was executed without errors, continue to set up the Azure Pipelines.
- Create the Pipelines using the files in the repository.
- Connect to Azure using Service connections (Project settings -> Service connections).
- Select Azure Resource Manager and then Service principal (automatic)
- Select your Subscription, Resource group and name the connection.
- Switch to Azure and search for App registrations.
- Rename the Connection and add it to the AzurePipelinesPrincipal Group. (Note: This allows the pipeline to access the database as ADMIN to set up other SQL users, etc.)
Next, set up the pipeline variables to enable deployment to our generated resources.
- Copy the StaticWebApp Token and set it as a variable in the Frontend Pipeline.
- Set up the deployment of the backend. Copy the FunctionApp URL and paste it into main.js.
- Set the correct values for the backend pipeline variables.
- When you run the pipeline for the first time, you will be asked to grant permission for deployment.
After completing these installation instructions, the deployment should work, and users should be able to access the ChatApp (or any other application you choose to deploy).
Availability
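When calculating the service’s availability, we assume that all services fail independently of each other and that every essential service must be up for the application to work. Under these assumptions, the product of the individual SLA figures gives the minimum expected availability.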
- SignalR Service rSigR > 99.9% (Microsoft, 2024, p. 79)
- Azure Functions App rFApp > 99.95% (Microsoft, 2024, p. 53)
- Storage Account rStAc > 99.95% (Microsoft, 2024, p. 85)
- Metered rMet > 99.999% (non-binding) (Metered, 2024)
- Key Vault rKeyV > 99.99% (Microsoft, 2024, p. 56)
- Application Insights rAIn > 99.9% (Microsoft, 2024, p. 25)
- Log Analytics rLogA > 99.9% (Microsoft, 2024, p. 59)
- Static WebApp rStWeb > 99.95% (Microsoft, 2024, p. 84)
Considering only the essential services:
rSigR * rFApp * rStAc * rMet * rKeyV * rStWeb < rWaveNet8Send
rWaveNet8Send > 99.7%
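The product above can be checked with a few lines of Python. This is a minimal sketch that takes the lower bounds from the list as given and assumes, as stated, that the essential services fail independently; the dictionary name sla is only illustrative.

```python
import math

# Lower bounds for the essential services, taken from the list above.
sla = {
    "SignalR Service": 0.999,
    "Azure Functions App": 0.9995,
    "Storage Account": 0.9995,
    "Metered": 0.99999,  # non-binding figure
    "Key Vault": 0.9999,
    "Static WebApp": 0.9995,
}

# Services in series: all must be up, so the availabilities multiply.
combined = math.prod(sla.values())
print(f"Combined lower bound: {combined:.2%}")
```

The result lies a little above 99.7% and clearly below 99.9%, mainly because each additional service in the chain lowers the product.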
We did not achieve our high availability goal of >99.9%.
Security
The KeyVault, combined with KeyVault references in the app settings, enhances security. Using RBAC reduces the number of keys needed and thereby the attack surface. The firewall configuration only allows trusted sources to connect.
However, some compromises were needed to achieve CD. To eliminate the need for manual SQL Server configuration, the DBAdmin Group was assigned to the Pipelines principal. It is recommended to remove the Principal from this role after the first deployment to prevent tampering with the database by developers with pipeline access.
Metered services have certain risks due to poor security compliance. Improvements in their security are necessary to prevent exploitation (Steininger, 2024b).
IaC
The Infrastructure as Code (IaC) file in the repository contains variables and comments to allow customization.
For example, it offers suffixes to identify versions and allow multiple deployments. Sku variables are available to change the SKU for production and development environments.
For updates, ensure the azure_key_vault_id variable is set, enabling the script to load keys from the vault.
Economics
The following picture estimates the costs of each service per click and the combined cost of operation. The TURN Service is the largest cost factor, dependent on the average user’s data transfer. With more statistics, cost forecasts can be more precise.
Since the TURN Service is such a significant cost factor, reducing the second largest cost position by half would only result in a 10% decrease in operational costs. Hosting our own TURN Service is challenging: high traffic and global distribution lead to high personnel and material costs, so it is advisable only if we scale operations significantly.
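The 10% figure implies a share for the second largest position, which a short calculation makes explicit. This sketch uses only the relationship stated in the text (halving that position saves 10% overall); the helper name total_saving is illustrative, and no actual cost data is assumed.

```python
def total_saving(share_of_total, reduction):
    """Fraction of total cost saved when one position shrinks by `reduction`."""
    return share_of_total * reduction


# Halving the position (reduction = 0.5) saves 10% overall,
# so the position must account for 0.10 / 0.5 of the total cost.
second_largest_share = 0.10 / 0.5

print(f"Implied share of second largest position: {second_largest_share:.0%}")
print(f"Saving when halved: {total_saving(second_largest_share, 0.5):.0%}")
```

In other words, the second largest position accounts for roughly a fifth of the total, so optimizing it cannot change the overall picture much while the TURN traffic dominates.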
A purely ad-financed tier is not sustainable without additional subscription revenue.
Result
If you deployed the application from the repository, you should see a login site. Enter any name and start chatting. Note that we do not use WebRTC and send messages unencrypted using the AzureFunctionsApp. The focus was on deployment, not the WebApp.
Discussion
We did not achieve our high availability goal. Assuming the numbers offered by Microsoft (SLAs), the combined availability (considering essential services only) is as low as 99.7%.
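To put the gap between 99.7% and the 99.9% goal into perspective, both figures can be converted into a yearly downtime budget. This is a rough sketch that assumes a 365-day year and treats the percentages as exact; the function name max_downtime_hours is illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def max_downtime_hours(availability):
    """Upper bound on yearly downtime implied by an availability figure."""
    return (1.0 - availability) * HOURS_PER_YEAR


for label, availability in [("combined lower bound", 0.997), ("goal", 0.999)]:
    hours = max_downtime_hours(availability)
    print(f"{label}: {availability:.1%} -> up to {hours:.1f} h/year")
```

Under these assumptions, 99.7% permits roughly 26 hours of downtime per year, compared with just under 9 hours for 99.9%.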
This approach provides a highly scalable solution but allows attackers to drain funds by generating traffic you will pay for. The same risk applies to the TURN server due to insufficient measures against TURN Server Credential Exploitation (TSCE) (Steininger, 2024a).
This architecture will be our choice if we need to scale WaveNet8 //Send beyond our current infrastructure’s capabilities. However, further research is needed on securing the service against TSCE and misuse intended to generate huge fees. Additionally, the assumed independence of availability figures should be verified to ensure the actual availability is higher than the product of its components’ availability.
Yours truly, Sebastian Steininger
Literature
- Microsoft. (2024). OnlineSvcsConsolidatedSLA(WW)(April2024)(CR). https://wwlpdocumentsearch.blob.core.windows.net/prodv2/OnlineSvcsConsolidatedSLA(WW)(English)(April2024)(CR).docx
- Steininger, S. (2024a). Metered TURN Server Credential Exploitation - A Case Study. May 15, 2024.
- Steininger, S. (2024b). Portfolio Cloud-Programming.
Thanks to OpenAI for providing ChatGPT 4, which created this article based on a rough draft.
© WaveNet8 2024-08-04