
Measuring the Performance and Network Utilization of Popular Video Conferencing Applications

Kyle MacMillan, University of Chicago
Tarun Mangla, University of Chicago
James Saxon, University of Chicago
Nick Feamster, University of Chicago

ABSTRACT

Video conferencing applications (VCAs) have become a critical Internet application, even more so during the COVID-19 pandemic, as users worldwide now rely on them for work, school, and telehealth. It is thus increasingly important to understand the resource requirements of different VCAs and how they perform under different network conditions, including: how much "speed" (upstream and downstream throughput) a VCA needs to support high quality of experience; how VCAs perform under temporary reductions in available capacity; how they compete with themselves, with each other, and with other applications; and how usage modality (e.g., number of participants) affects utilization. We study three modern VCAs: Zoom, Google Meet, and Microsoft Teams. Answers to these questions differ substantially depending on the VCA. First, the average utilization on an unconstrained link varies between 0.8 Mbps and 1.9 Mbps. Given a temporary reduction of capacity, some VCAs can take as long as 50 seconds to recover to steady state. Differences in proprietary congestion control algorithms also result in unfair bandwidth allocations: in constrained bandwidth settings, one Zoom video conference can consume more than 75% of the available bandwidth when competing with another VCA (e.g., Meet, Teams). For some VCAs, client utilization can decrease as the number of participants increases, due to the reduced video resolution of each participant's video stream given a larger number of participants. Finally, one participant's viewing mode (e.g., pinning a speaker) can affect the upstream utilization of other participants.

1 INTRODUCTION

Internet users around the world have become increasingly dependent on video conferencing applications over the past several years, particularly as a result of the COVID-19 pandemic, which has caused many users to rely almost exclusively on these applications for remote work, education, healthcare, social connections, and many other activities. Some ISPs have reported as much as a three-fold increase in video conferencing traffic during an eight-month period in 2020 during the pandemic [5]. Our increased reliance on these video conferencing applications (VCAs) has highlighted certain disparities, especially given that nearly 15 million public school students across the United States lacked Internet access for remote education during the pandemic [19].

As cities and countries around the world moved to close this gap, many asked a series of simple questions geared towards understanding the level of connectivity they needed to provide citizens to guarantee a reliable, high-quality Internet experience: What is the baseline level of Internet performance needed to support common video conferencing applications for the activities people commonly use them for (e.g., education, remote work, telehealth)? This was, in fact, the question that our own city officials asked us, and it motivated us to pursue this question in this paper. The question has been brought into focus even more over recent months as the United States Federal Communications Commission (FCC) continues to debate the levels of Internet speed that should classify a service offering as "broadband". The current standard is 25 Mbps downstream/3 Mbps upstream. With the rise of video conferencing applications, however, consumer Internet connections saw an increase in upstream traffic utilization. Some policy advocacy papers have claimed (without measurements) that the FCC should change its definition of "broadband" to a symmetric 20 Mbps connection on account of these trends. Moreover, because many users, especially in underserved regions, experience frequent connectivity disruptions, a comparative analysis of VCAs under dynamic (and degraded) network conditions can also shed light on the design practices of VCAs and help identify best design practices.

In light of these discussions and questions from municipal, state, and federal policymakers, we were motivated to explore how much network resources video conferencing applications require, how they respond to network degradations and connectivity interruptions, how they compete with each other and with other applications, and how these answers vary depending on the modalities of use (e.g., gallery mode vs. speaker mode). Similar questions have of course been studied in the past [2, 6, 7, 11, 31], but the vast majority of these studies are now at least a decade old, during which time both the VCAs themselves and how we use them have changed dramatically.


New VCAs such as Google Meet, Microsoft Teams, and Zoom have emerged in the last few years alone. In light of these new VCAs and recent drastic shifts in usage patterns and our dependence on these applications, a re-appraisal of these questions is timely.

We study the following questions for three popular, modern video conferencing applications:

(1) What is the network utilization of common video conferencing applications?

(2) What speeds do these VCAs require to deliver good quality of experience to users?

(3) How do VCAs respond to temporary disruptions to connectivity, and how quickly do they recover when connectivity is restored?

(4) How do VCAs respond in the presence of competing traffic? How do they compete with themselves, with other VCAs, and with other applications?

(5) How do different usage modalities (number of participants, speaker vs. gallery viewing) affect network utilization?

We explore these questions by performing controlled laboratory experiments with emulated network conditions across a collection of clients, collecting application performance data using provided application APIs. We collect application data under a variety of network conditions and call modalities, including different numbers of participants and viewing modes. Many of the answers to these questions surprised us; as we will discover, sometimes the answers depend quite a lot on the particular video conferencing application. Table 1 summarizes our main findings and where they can be found in the paper. In particular, we found significant differences in network utilization, encoding strategy, and recovery time after network disruption. We also discovered that VCAs share available network resources with other applications differently. We confirm earlier findings [33] that one participant's choice of how to interact in the video conference can affect the network utilization of other participants.

Our findings have broad implications for network management, as well as for policy. For network management, our findings concerning network utilization and performance under various network conditions have implications for network provisioning, as access ISPs seek to provision home networks to achieve good performance for consumers. From a policy perspective, our results are particularly important: questions about the throughput needed to support quality video conferencing, especially in multi-user households, were what motivated this paper in the first place. Looking ahead, as city, state, and federal governments both subsidize broadband connectivity and build out new infrastructure, understanding how these increasingly popular video conferencing applications consume and share network resources is of critical importance.

Table 1: Main results, and where they can be found in the paper.

• The average network utilization on an unconstrained link ranges from 0.8 Mbps to 1.9 Mbps (§3.1).

• Despite using the same WebRTC API, the Meet and Teams browser clients differ significantly in how they encode video (e.g., frames per second, resolution) under reduced link capacity (§3.2).

• Recovery times following transient drops in bandwidth are quite different and depend on both the VCA's congestion control mechanism and the direction of the drop, i.e., uplink or downlink. However, all VCAs take at least 20 seconds to recover from severe uplink drops to 0.25 Mbps (§4).

• Differences in proprietary congestion control also create unfair bandwidth allocations in constrained bandwidth settings. Zoom and Meet can consume over 75% of the available downlink bandwidth when competing with Teams or a TCP flow (§5).

• Each participant's video layout impacts its own and others' network utilization. Pinning a user's video to the screen (speaker mode) leads to an increase in the user's uplink utilization (§6).


2 BACKGROUND & EXPERIMENT SETUP

We provide a brief background on video conferencing transport, technology, and architecture before turning to our experiment setup.

2.1 Video Conferencing Applications

Most modern VCAs use the Real-time Transport Protocol (RTP) [25, 26] or its variants [3, 16] to transmit media content. Audio and video data are generally transmitted over separate RTP connections. RTP is also used in conjunction with two other protocols: the Session Initiation Protocol (SIP) [23] to establish connections between clients, and the RTP Control Protocol (RTCP) [25] to share performance statistics and control information during a call. Despite using well-known protocols, VCAs can differ from each other significantly across the following dimensions:

• Network mechanisms: RTP and the associated protocols are implemented in the application layer. Thus, the specific implementations of the protocols can vary across VCAs. For instance, Zoom reportedly uses a custom extension of RTP [16]. Furthermore, the key network functions, i.e., congestion control and error recovery, can also be different and proprietary.

• Application-layer parameters: These include media encoding standards and default bitrates. More recent video codecs (e.g., VP9, H.264) can encode at the same video quality with fewer bytes when compared to older codecs (e.g., VP8), albeit with a higher compute overhead [4].

• Streaming architecture: VCAs can either use direct connections or relay servers. Centralized servers are almost always used for multi-point calls to combine data from multiple users. VCAs can differ in the exact strategies of data combination as well as the geographic footprint of their servers. For instance, Zoom rapidly expanded its server infrastructure to support increased call volume because of COVID-19 [18].

Differences across one or more of these factors can lead to different VCA network utilization and performance. In this paper, we aim to dig deeper into some of these differences for a subset of VCAs.
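As background for the network-mechanism differences above, the sketch below parses the 12-byte RTP fixed header from RFC 3550 so that a capture can be split into per-SSRC media streams (e.g., separate audio and video). This is an illustrative sketch of standard RTP, not any VCA's implementation; as noted, Zoom in particular reportedly deviates from it.

    import struct

    def parse_rtp_header(payload: bytes):
        """Parse the 12-byte RTP fixed header (RFC 3550); None if too short."""
        if len(payload) < 12:
            return None
        b0, b1, seq = struct.unpack("!BBH", payload[:4])
        ts, ssrc = struct.unpack("!II", payload[4:12])
        return {
            "version": b0 >> 6,         # always 2 for RTP
            "payload_type": b1 & 0x7F,  # maps to a codec (mapping is app-specific)
            "marker": (b1 >> 7) & 0x1,
            "sequence": seq,            # for loss/reordering detection
            "timestamp": ts,            # media clock
            "ssrc": ssrc,               # one SSRC per media stream
        }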

2.2 Experiment Setup

Video Conferencing Applications: In this paper, we study three popular VCAs: Zoom, Google Meet, and Microsoft Teams. These VCAs have been used extensively worldwide over the past year, especially in enterprises and educational institutions [29]. Teams and Zoom provide desktop applications as well as browser clients, whereas Meet runs natively in the Chrome browser. Most of our tests are conducted using the desktop applications for Teams and Zoom, and using the Google Chrome browser for Meet. In-browser tests for Teams and Zoom are denoted Teams-Chrome and Zoom-Chrome, respectively. We use Chrome (Meet) version 89.0.4389, Zoom client version 5.6.1, and Teams client version 1.4.00.7556.

Laboratory Environment: We conduct experiments in a controlled environment. We begin by describing our experimental setup for a 2-party call. We use two identical laptops, referred to as C1 and C2, representing the two VCA clients. Each laptop is a Dell Latitude 3300 with a screen resolution of 1366 × 768 pixels, running Ubuntu 20.04.1. The laptops have a wired connection to a Turris Omnia 2020 router and access a dedicated 1 Gbps symmetric link to the Internet. Each experiment consists of a call between C1 and C2 under a pre-specified network bandwidth profile and VCA. The bandwidth profile is emulated by shaping the link between C1 and the router using traffic control (tc). A pre-recorded talking-head video with a resolution of 1280 × 720, injected with ffmpeg, is used as the video source for the call, both to replicate a real video call and to ensure consistency across experiments. (Using the device webcam would be inappropriate because, without movement, VCAs compress the video and ultimately send at a much lower rate than during a normal call.) All experiments are conducted with the laptop lid open and the application window maximized.

Automating Experiments: To conduct experiments at scale, we automated the entire in-call process.

Table 2: Unconstrained network utilization.

    VCA     Upstream (Mbps)   Downstream (Mbps)
    Meet    0.95              0.84
    Teams   1.40              1.86
    Zoom    0.78              0.95

We take several steps to recreate the in-call process. We use the Python PyAutoGUI package [22] to automate joining and leaving calls. The package enables us to programmatically control the keyboard and mouse by specifying coordinates or visual elements on the screen. For Zoom-Chrome, we encountered a CAPTCHA before joining a call in the default browser; using a Selenium-driven Chrome browser [27], however, enabled us to bypass the CAPTCHA. The experiments using Selenium run exactly as they would in the default Chrome browser. The workflow is controlled from C1, with TCP sockets used to coordinate between C1 and C2. We modify this setup slightly for subsequent experiments (e.g., multi-party calls); those slight modifications are described in the respective sections.
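To illustrate, a minimal PyAutoGUI join routine might look like the sketch below. The keystrokes, delays, and button screenshot ("join_now.png") are hypothetical placeholders rather than our exact scripts, and locateCenterOnScreen's confidence argument requires OpenCV to be installed.

    import time
    import pyautogui

    def join_call(meeting_url: str):
        # Type the meeting URL into an already-focused browser window.
        pyautogui.hotkey("ctrl", "l")  # focus the address bar
        pyautogui.typewrite(meeting_url, interval=0.05)
        pyautogui.press("enter")
        time.sleep(10)                 # wait for the page to load
        # Find and click the join button by matching a reference screenshot.
        try:
            button = pyautogui.locateCenterOnScreen("join_now.png",
                                                    confidence=0.8)
        except pyautogui.ImageNotFoundException:  # raised by newer versions
            button = None
        if button:
            pyautogui.click(button)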

3 STATIC NETWORK CONDITIONS

In this section, we study the effect of network capacity on VCA performance.

Method: We conduct a series of experiments, each consisting of a 2.5-minute call between two clients, C1 and C2 (as described in Section 2.2), under a specific shaping level. We conduct two sets of experiments, shaping first the uplink and then the downlink. We constrain throughput to {0.3, 0.4, ..., 1.4, 1.5, 2, 5, 10} Mbps (in each of the upstream and downstream directions). For each condition, we perform five 2.5-minute experiments; because most observations show relatively low variance, this number of experiments is sufficient to achieve statistically significant results. For the Zoom and Teams clients, we perform measurements both for the native clients and for the browser-based clients, referred to as Zoom-Chrome and Teams-Chrome, respectively.
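A minimal sketch of the shaping step, assuming an HTB root qdisc installed with tc on the client's wired interface (the interface name is a placeholder); each level in the sweep corresponds to one batch of 2.5-minute calls:

    import subprocess

    IFACE = "eth0"  # placeholder interface name

    def shape_link(mbps: float):
        """Limit egress on IFACE to the given rate using an HTB qdisc."""
        rate = f"{mbps}mbit"
        # Clear any previous shaping (ignore the error if none exists).
        subprocess.run(["tc", "qdisc", "del", "dev", IFACE, "root"],
                       stderr=subprocess.DEVNULL)
        subprocess.run(["tc", "qdisc", "add", "dev", IFACE, "root",
                        "handle", "1:", "htb", "default", "10"], check=True)
        subprocess.run(["tc", "class", "add", "dev", IFACE, "parent", "1:",
                        "classid", "1:10", "htb", "rate", rate, "ceil", rate],
                       check=True)

    for level in [0.3, 0.5, 1.0, 1.5, 2, 5, 10]:  # subset of the sweep
        shape_link(level)
        # ... run five 2.5-minute calls at this shaping level ...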

3.1 Network Utilization

Constrained upstream utilization: Figure 1a shows the median sent network bitrate for different uplink capacities. We observe differences in upstream network utilization among VCAs given the same available uplink network capacity. In the case of a 10 Mbps uplink, for example, the average upstream utilization for Teams-native is 1.44 Mbps, whereas it is only 0.95 Mbps for Meet and 0.77 Mbps for Zoom. All three VCAs utilize the uplink efficiently (above 85%) when that link is constrained (0.8 Mbps or lower), although Meet's bitrate is slightly higher than the other VCAs'.

[Figure 1: Utilization under different shaping levels; median bitrate (Mbps) vs. link capacity (Mbps), with an x=y reference line. The bands represent 90% confidence intervals. (a) Uplink capacity vs. network bitrate for Meet, Teams, and Zoom; (b) downlink capacity vs. network bitrate; (c) impact of VCA platform (Teams vs. Teams-Chrome, Zoom vs. Zoom-Chrome).]

Constrained downstream throughput: We explored the effect of constrained downstream capacity on VCAs' network utilization. Figure 1b shows this result. As with a constrained uplink, the VCAs differ in terms of their downlink utilization on an unconstrained link. The unconstrained downlink utilization also differs from the unconstrained upstream utilization, as shown in Table 2. To better understand this phenomenon, we analyzed the traffic captured from both clients.

In the case of Teams, we found that the sent traffic from C1 is almost the same as the received traffic at C2, and vice versa. We suspect the small differences in upstream and downstream utilization may largely be due to the variability in utilization across experiments in Teams, as is also evident in the larger confidence intervals as compared to Zoom and Meet.

In contrast, Zoom's utilization patterns were asymmetric: we found an asymmetry in sent and received data rates at both clients. For instance, in a single instance with 10 Mbps downlink shaping, C2 sent a median 0.85 Mbps and C1 received a median 1.10 Mbps. Investigating this asymmetry further, we discovered an interesting phenomenon: Zoom uses a relay server instead of direct communication. Additionally, related work by Nistico et al. [21] suggests that Zoom uses Forward Error Correction (FEC) for error recovery. This is further supported by a related patent from Zoom itself, which describes a methodology to generate FEC data at the server [17]. We suspect that the extra data may thus correspond to FEC added by the relay server, leading to asymmetric upstream and downstream utilization.

In addition to the asymmetric unconstrained utilization, Meet exhibits markedly different behavior with a constrained downlink. Specifically, network utilization with constrained downstream throughput (< 0.8 Mbps) is only 39–70% (Figure 1b), while it is more than 90% in the case of a constrained uplink (Figure 1a). Upon further exploration, we discovered that Meet also uses a relay server, as well as simulcast, wherein the sender (C2) transmits multiple copies of the video to the server, each at a different quality level [21]. The server then relays one of the quality streams to C1 depending on the inferred available capacity of the server-C1 link. We observe two simultaneous video streams in our experiments, one at 320x180 and the other at 640x360 resolution. When downstream throughput is constrained, the relay server cannot switch to the higher-quality video and keeps sending at the low-quality bitrate. This explains why Meet's network utilization at 0.5 Mbps is only 0.19 Mbps, almost the same as its utilization at 0.3 Mbps. The use of simulcast also explains the higher upstream utilization as compared to downstream utilization.

Browser vs. native client utilization: Figure 1c compares the upstream utilization of Zoom and Teams between their respective native and Chrome clients. Zoom's utilization is similar across the native and browser-based platforms; in contrast, we find a significant difference between the Teams native and browser clients. When uplink capacity is shaped to 1 Mbps, the Teams-native client uses 0.84 Mbps, whereas Teams-Chrome uses only 0.61 Mbps. We found a similar difference between Teams-native and Teams-Chrome when downstream throughput is constrained.

3.2 Application Performance

Next, we explore how video conferencing application performance varies depending on network capacity. To do so, we rely on the WebRTC stats API available in Google Chrome to obtain application metrics for Teams-Chrome and Meet [30]. We could not obtain the same statistics for Zoom-Chrome, as it uses DataChannels instead of RTP MediaStreams in WebRTC to transmit media; DataChannels statistics lack any video quality metrics and mostly contain data volume information. Obtaining application metrics is challenging for native clients. The Zoom API does provide limited application performance metrics (e.g., video resolution, FPS) at a per-minute granularity [35] for the native client, but this granularity is insufficient to observe short-term quality fluctuations. We thus limit our analysis in this section to Meet and Teams-Chrome.

[Figure 2: Video encoding parameters with 90% confidence intervals under downstream and upstream throughput constraints, for Meet and Teams-Chrome. (a) Downlink: quantization parameter; (b) downlink: frames per second; (c) downlink: frame width; (d) uplink: quantization parameter; (e) uplink: frames per second; (f) uplink: frame width.]

We focus on a subset of metrics available from the WebRTC stats API that relate to video quality and freezes. These metrics are available at a per-second granularity during each call.

Video quality: VCAs adapt video quality by adjusting the encoding parameters to achieve a target bitrate estimate provided by the transport. Ideally, VCAs can adjust one or more of the following three parameters:

• frames per second (FPS),
• quantization parameter used in video compression, and
• video resolution, indicated as the number of pixels in each dimension.

We can obtain all three parameters from the WebRTC stats. In our analysis, we present a single dimension of the resolution, namely its width.
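For the browser clients, these values can be polled once per second from the in-page RTCPeerConnection. Below is a sketch of such a collector; it assumes a Selenium-driven Chrome session in which the page's peer connection has been exposed as window.pc (a hypothetical hook, e.g., installed by wrapping the RTCPeerConnection constructor before the page loads). The average quantization parameter can then be estimated from the per-second change in qpSum divided by the change in framesDecoded.

    import time

    STATS_JS = """
    const done = arguments[arguments.length - 1];
    window.pc.getStats().then(report => {
      const out = [];
      report.forEach(s => {
        if (s.type === 'inbound-rtp' && s.kind === 'video') {
          out.push({fps: s.framesPerSecond, width: s.frameWidth,
                    height: s.frameHeight, qpSum: s.qpSum,
                    framesDecoded: s.framesDecoded});
        }
      });
      done(out);
    });
    """

    def poll_webrtc_stats(driver, seconds: int):
        """Collect per-second inbound video stats from a Selenium driver."""
        samples = []
        for _ in range(seconds):
            samples.append(driver.execute_async_script(STATS_JS))
            time.sleep(1)
        return samples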

We first look at these parameters for different downstream capacities (see Figures 2a to 2c). Teams-Chrome simultaneously degrades all three video quality parameters as downstream capacity decreases. In the case of Teams-Chrome, we also observe variable behavior across multiple experiments under the same network capacity, as shown by the 90% confidence interval bands in the plot. Meet, on the other hand, behaves in a more consistent fashion: it adjusts parameters differently based on capacity. Within the 0.7–1 Mbps range, Meet adapts the bitrate primarily by adapting frames per second, while keeping the quantization parameter and frame width similar to the quality levels in the absence of degradation. As capacity is further constrained, from 0.5–0.7 Mbps, both the frame width and quantization parameter degrade, whereas the frames per second simultaneously increase. Upon closer inspection of the per-second statistics in this operating range, we find that the relay server may switch to a lower-quality copy of the stream it received via simulcast. Meet does not appear to reduce the bitrate any further by reducing the FPS for the low-quality stream: specifically, we observed a consistent frame rate even at rates of less than 0.5 Mbps. It is not clear why the quantization parameter drops from 38 to 33 at 0.3 Mbps in Meet.

We next analyze the effect of uplink capacity constraints on encoding parameters (see Figures 2d to 2f). Teams adapts to constrained throughput settings mainly by increasing the quantization parameter and reducing the frame width, while keeping the FPS almost constant. To our surprise, we found that the frame width increases as uplink capacity is reduced to 0.3 Mbps. There seems to be no particularly good explanation for this effect, and (as we discuss in the next paragraph) it also results in video freezes, suggesting a poor design decision or implementation bug. Meet follows a similar trend, mostly increasing the quantization parameter until the upstream bandwidth decreases to 0.5 Mbps. At 0.4 Mbps, it also reduces the frame width and the FPS.

[Figure 3: Video freezes under downstream and upstream throughput constraints, for Meet and Teams-Chrome. The bands represent 90% confidence intervals. (a) Downstream: freeze ratio; (b) upstream: total Full Intra Request (FIR) count.]

Video freezes: We also analyze the effect of constrained settings on video freezes. When constraining downlink capacity, we directly obtain the freeze duration from the WebRTC stats. A freeze is assumed to occur if the frame inter-arrival time exceeds max(3δ, δ + 150 ms), where δ is the average frame duration. We normalize the freeze duration by the total call duration to obtain the freeze ratio. Figure 3a shows the freeze ratio under different downstream capacities. The freeze ratio increases as the downlink bandwidth degrades. Meet has a higher freeze ratio than Teams-Chrome, reaching 10% at 0.3 Mbps. Interestingly, Teams-Chrome incurs freezes (a 3.6% freeze ratio) even in the absence of throughput constraints, again suggesting implementation problems or poor design choices.
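A minimal sketch of this freeze-ratio computation, applied offline to a list of frame arrival timestamps (names are ours):

    def freeze_ratio(frame_times):
        """frame_times: sorted frame arrival timestamps in seconds."""
        gaps = [b - a for a, b in zip(frame_times, frame_times[1:])]
        if not gaps:
            return 0.0
        delta = sum(gaps) / len(gaps)              # average frame duration
        threshold = max(3 * delta, delta + 0.150)  # freeze rule from the text
        frozen = sum(g for g in gaps if g > threshold)
        duration = frame_times[-1] - frame_times[0]
        return frozen / duration if duration > 0 else 0.0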

When the uplink is constrained, we could not obtain freeze statistics from C1's logs, because the WebRTC stats provide freeze statistics only for the received video stream. Instead, we analyze the total count of Full Intra Requests (FIRs) for the upstream video data. An FIR is sent if the receiver cannot decode the video or falls behind, likely due to frame losses. A low FIR count does not rule out freezes on the receiver, but a high count does indicate that video freezes are occurring. Figure 3b shows that the FIR count is particularly high for Teams-Chrome at uplink capacities below 0.5 Mbps. The high FIR count in Teams-Chrome may be triggered by the sender transmitting high-resolution video, as shown in Figure 2f.

Takeaways: VCA performance under the same network conditions varies among the tested VCAs. Further, the VCAs' average upstream utilization on an unconstrained link has implications for broadband policy. The FCC currently recommends a 25/3 Mbps minimum connection. Such a connection may not suffice even for two simultaneous video calls: Teams, for example, averages 1.40 Mbps upstream (Table 2), so two concurrent Teams calls would consume nearly all of a 3 Mbps uplink.

4 NETWORK DISRUPTIONS

VCAs must handle network disruptions, and they may do so in different ways. Designing to cope with disruptions, such as interruptions to network connectivity, has become even more pertinent over the past year as users have increasingly relied on broadband Internet access, which can sometimes experience connectivity disruptions, to use VCAs. In addition to experiencing periodic outages, home networks are susceptible to disruptions caused by temporary congestion along the end-to-end path. In this section, we aim to understand how VCAs respond to the types of disruptions that home Internet users might sometimes experience.

Method: We analyze how VCAs respond to temporary network disruptions during a call by introducing transient capacity reductions. Using the same setup as in the previous experiment, we initiate a five-minute VCA session between two clients, C1 and C2, both of which are connected to the Internet via a 1 Gbps link. One minute after initiating the call, we reduce the capacity between C1 and the router for 30 seconds, before reverting back to the unconstrained 1 Gbps link. We repeat each experiment four times. We conduct two sets of experiments: first disrupting the uplink, and then disrupting the downlink. We reduce capacity to the following levels: 0.25, 0.5, 0.75, and 1.0 Mbps. (We do not consider disruptions to levels of more than 1 Mbps because both Zoom's and Meet's average utilization is below 1 Mbps.)
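One disruption trial can be sketched as the timeline below, reusing the tc-based shaping from Section 3; shape_link, start_call, and end_call are assumed hooks from our automation, not library APIs.

    import subprocess
    import time

    IFACE = "eth0"  # placeholder interface name

    def clear_shaping():
        # Delete the root qdisc, reverting to the unconstrained 1 Gbps link.
        subprocess.run(["tc", "qdisc", "del", "dev", IFACE, "root"],
                       stderr=subprocess.DEVNULL)

    def run_disruption_trial(start_call, end_call, shape_link, level_mbps):
        start_call()            # five-minute VCA session begins
        time.sleep(60)          # one minute of steady state
        shape_link(level_mbps)  # transient reduction (0.25 to 1.0 Mbps)...
        time.sleep(30)          # ...lasting 30 seconds
        clear_shaping()         # revert to the unconstrained link
        time.sleep(210)         # remainder of the five-minute call
        end_call()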

4.1 Uplink Disruptions

Figure 4a shows VCA upstream bitrate over the course of a call. Following a disruption, both the time to recovery and the characteristics of that recovery differ across VCAs. We quantify the time for VCA utilization to return to normal by defining a time-to-recovery (TTR) metric. We define TTR as the time between when the interruption ends and when the five-second rolling median bitrate reaches the median bitrate before the interruption, also referred to as the nominal bitrate.

Time to Recovery: Figure 4b shows how the extent of the disruption to uplink connectivity affects each VCA's time to recovery. The more severe the capacity reduction, the longer the VCAs need to recover.

[Figure 4: VCA response to a 30-second reduction in uplink capacity. (a) Average upstream bitrate over time; the grey region indicates the period where uplink capacity is constrained to 0.25 Mbps, and vertical dotted lines indicate when the upstream bitrate has returned to the pre-disruption average. (b) Time to recover from a disruption to a given uplink capacity.]

Teams takes longer to recover even at less severe disruptions, for two reasons: (1) the nominal bitrate of Teams is higher than that of Meet and Zoom, and (2) Teams increases the upstream bitrate slowly immediately after the interruption before increasing quickly back to normal (as shown in Figure 4a). Meet shows a similar recovery pattern when the disruption drops capacity to 0.25 Mbps. However, it recovers much faster for less severe disruptions, mostly because its nominal bitrate is around 0.96 Mbps.

Recovery Patterns: While Meet and Teams follow a more rapid recovery trend, Zoom's recovery is different: it takes the longest time to recover under severe disruptions. As Figure 4a shows, Zoom follows a stepwise recovery, with an almost-linear increase immediately after the interruption. It then enters a periodic-probing phase where it increases the sending rate, stays there for some time, and then increases it again. The probing phase continues well above its nominal bitrate before Zoom finally reduces the bitrate. Zoom does not return to a steady-state sending pattern until two minutes after the disruption, in the meantime sending at much higher rates. At first glance, such probing might appear to be bad design, as it could introduce additional packet loss and delay, harming both Zoom's own performance and the performance of other applications. We suspect, however, that Zoom may be using redundant FEC packets to gauge capacity, similar to the FBRA congestion control proposed by Nagy et al. [20]. Thus, even if such behavior induces packet loss, user performance may not suffer. Nevertheless, this inefficient use of the uplink could disturb other applications on the same link, leading to a poor quality of experience for competing applications.
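The TTR metric defined above is simple to compute from a per-second bitrate trace; a minimal sketch (variable names are ours):

    import statistics

    def time_to_recovery(bitrate, t_start, t_end):
        """bitrate: per-second samples (Mbps); t_start/t_end: disruption
        start and end indices in seconds. Returns TTR in seconds or None."""
        nominal = statistics.median(bitrate[:t_start])  # pre-disruption median
        for t in range(t_end, len(bitrate) - 4):
            if statistics.median(bitrate[t:t + 5]) >= nominal:
                return t - t_end  # 5-second rolling median has recovered
        return None               # never recovered within the trace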

[Figure 5: VCA response to a 30-second reduction in downlink capacity. (a) Average downstream bitrate over time; the grey region indicates the period where downlink capacity is constrained to 0.25 Mbps, and vertical dotted lines indicate when the downstream bitrate has returned to the pre-disruption average. (b) Time to recover from a disruption to a given downlink capacity.]


4.2 Downlink Disruptions

Figure 5b shows that Teams recovers more slowly than Meet and Zoom, always taking at least 20 seconds longer to return to the average rate, regardless of the magnitude of the interruption. Furthermore, Meet and Zoom recover much more quickly in this case compared to uplink interruptions. This can be explained by the way each VCA sends video. For all three VCAs, the clients communicate through an intermediary server; thus, the recovery times depend on the congestion control behavior at the server as well.

As mentioned in Section 3, Meet uses an encoding technique called simulcast, where the client encodes the same video at different quality levels and sends the encoded videos to an intermediate server. The server then sends the appropriate version based on the perceived downlink capacity at the receiver, and can quickly switch between versions as downstream capacity changes. This quick recovery is shown in Figure 5: Meet returns to its average rate in under ten seconds, regardless of the severity of the interruption.
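Conceptually, the relay's job reduces to forwarding the best simulcast encoding that fits the receiver's estimated downlink. The sketch below is purely illustrative: the two streams loosely match the 640x360 and 320x180 copies we observed from Meet, but the actual selection logic is proprietary.

    # Ordered from highest to lowest quality; bitrates are illustrative.
    SIMULCAST_STREAMS = [
        {"resolution": "640x360", "bitrate_mbps": 0.75},
        {"resolution": "320x180", "bitrate_mbps": 0.19},
    ]

    def select_stream(estimated_downlink_mbps: float):
        """Forward the highest-quality copy that fits the receiver's downlink."""
        for stream in SIMULCAST_STREAMS:
            if stream["bitrate_mbps"] <= estimated_downlink_mbps:
                return stream
        return SIMULCAST_STREAMS[-1]  # fall back to the lowest quality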

Similarly, Zoom uses scalable video coding (SVC) when transmitting video [34].

[Figure 6: Client 2 (C2) upstream bitrate for Meet and Teams. The grey region indicates when C1's downlink capacity is reduced to 0.25 Mbps.]

[Figure 7: Setup for competition experiments. VCA client C1 and competing-flow client F1 connect through a switch to the router, sharing the bottleneck link to the Internet; their counter-parties, C2 and F2, sit on the far side of the bottleneck.]

Instead of sending several versions of varying quality, Zoom sends hierarchical encoding layers that, when combined, produce a high-quality video. This allows C2 to send uninterrupted even when C1's downlink capacity decreases. The server can then recover faster by sending additional layers once network conditions improve.

While the intermediary server for Zoom and Meet performs congestion control, in the case of Teams this server acts only as a relay. During a Teams call, C2 recognizes that C1 has limited downstream capacity and sends at only the bitrate it knows C1 can receive. Once C1 has more available bandwidth, C2 must discover this: it must first probe the connection before returning to its average sending rate. Figure 6 illustrates how C2's sending rate to the intermediary server does not change during a Meet call, but drops below C2's downlink capacity during a Teams call, leading to the slow recovery.

Zoom's downstream and upstream bitrates remain at the level of available capacity during the disruption; Meet and Teams suffer more serious degradation. Zoom's efficient network utilization and response to disruption may be due to its encoding mechanisms, which allow it to effectively control the encoding parameters (i.e., SVC-encoded layers, FPS, resolution, etc.) to match the target bitrate.

Takeaways: Modern VCAs are slow to recover from reductions in uplink capacity, with all three requiring more than 25 seconds to recover from severe interruptions to 0.25 Mbps. Only Teams is consistently slow to recover from disruptions to downlink capacity, even showing degradation in response to more moderate capacity reductions (e.g., to 1 Mbps). This result can be attributed to each VCA's media sending mechanism and how congestion control is implemented at the intermediate server for each VCA.

5 COMPETING APPLICATIONS

With work and school continuing online, it is ever more common for multiple applications to use the home network simultaneously, potentially leading to competition between any combination of VCAs, video streaming applications, or other popular applications. In this section, we measure how VCAs perform in the presence of other applications sharing the same bottleneck link. We focus on link sharing with other VCAs, a single TCP flow (iPerf3), and two popular video streaming applications, Netflix and YouTube (which uses QUIC).

Method: As illustrated in Figure 7, the setup for these experiments differs slightly from earlier ones. Instead of connecting the client directly to the router, the two matched clients, C1 and F1, are connected to the router via a switch. The link capacity between the switch and the router is set on the router. For each test, C1 (the incumbent application) first establishes a VCA call with C2. Approximately 30 seconds later, F1 initiates the competing application, which lasts for two minutes. F1's counter-party, F2, depends on the type of competing application. If the competing application is a VCA call, then F2 is another laptop. If the application is an iPerf3 (TCP) flow, then F2 is a server on the same network (the iPerf3 server uses TCP CUBIC, with an average RTT of 2 ms). If F1 is Netflix or YouTube (launched in Chrome), then F2 is a server for the respective service. After the competing application terminates, the incumbent application continues for an additional minute. We repeat each experiment 3 times, with link capacity varied symmetrically at {0.5, 1, 2, 3, 4, 5} Mbps.
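The trial timeline, sketched with iPerf3 as the competing flow (start_call and end_call are hypothetical hooks into the call automation of Section 2.2):

    import subprocess
    import time

    def run_competition_trial(start_call, end_call, iperf_server: str):
        start_call()                # incumbent VCA call (C1 <-> C2)
        time.sleep(30)              # ~30 s of the call alone
        subprocess.run(["iperf3", "-c", iperf_server, "-t", "120"],
                       check=True)  # two-minute competing TCP flow
        time.sleep(60)              # call continues one more minute
        end_call()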

5.1 VCA vs. VCA

VCAs can achieve their nominal bitrate when the link capacity is 4 Mbps or greater, which is expected because the link capacity exceeds the sum of the nominal bitrates of the two VCAs. We therefore focus on the operating regime where the link capacity is lower, leading to competition between the VCAs. With only two competing clients, we use the proportion of the link each obtains as a metric to assess fairness. We call a VCA "aggressive" if it uses more than half of the link capacity under competition, and "passive" otherwise.
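The share metric itself reduces to a ratio of per-application byte counts over the competition window. A minimal sketch with hypothetical byte counts:

    def link_shares(bytes_by_app):
        """Fraction of bottleneck bytes used by each application."""
        total = sum(bytes_by_app.values())
        return {app: count / total for app, count in bytes_by_app.items()}

    # Hypothetical counts over a two-minute window on a 0.5 Mbps link:
    print(link_shares({"zoom": 48_000_000, "meet": 14_000_000}))
    # roughly {'zoom': 0.77, 'meet': 0.23}: Zoom is "aggressive" here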

Considering the upstream direction, we observe differences in how each VCA shares the link. Figure 8 shows a box plot of the upstream share of an incumbent VCA and a competing VCA when the uplink capacity is 0.5 Mbps. We find that an incumbent Meet client shares the link fairly with a new Meet or Teams client but backs off when a Zoom client joins (see Figure 8a).


[Figure 8: Share of uplink capacity used by VCAs in competition with other VCAs on a 0.5 Mbps symmetric link. The white box is the incumbent application. (a) Incumbent: Meet; (b) incumbent: Teams; (c) incumbent: Zoom.]

[Figure 9: Upstream bitrate of VCAs in competition on a 0.5 Mbps capacity link. The upper trendline is the incumbent application and the lower is the competing application. (a) Zoom vs. Zoom; (b) Meet vs. Meet.]

The results are similar for Teams, except that it receives a slightly higher share while competing with Zoom than with Meet. Interestingly, Zoom is highly aggressive, both as an incumbent and as a competing application, using at least 75% of the link capacity when it is the incumbent client (see Figure 8c). In fact, Zoom's congestion control is not even fair to itself. Figure 9a illustrates this effect more clearly, showing the link sharing between two Zoom clients for a single experiment under 0.5 Mbps uplink capacity. We contrast this with the link sharing between two Meet clients in Figure 9b, where both sessions converge to their fair share of 0.25 Mbps. The results are similar for other uplink capacities, with aggressive applications leaving more room for new clients once they achieve their nominal bitrate.

We find similar results for Zoom and Meet when downstream capacity is constrained. However, we find that Teams is passive when sharing downstream capacity, backing off to all other VCAs, including other Teams flows. Figure 10b shows the link share of an incumbent Teams client compared to the competing VCA under 0.5 Mbps downlink capacity. Teams achieves only 20% of the link when sharing with Meet and Zoom. We observe similar behavior for other downlink capacities. Figure 11b illustrates how an incumbent Teams client shares the link with a Zoom client at 1 Mbps downlink capacity; the Teams client clearly backs off to 0.2 Mbps. In contrast, in the upstream direction, both Teams and Zoom converge to a near fair share of the link, as shown in Figure 11a. Additionally, the downstream throughput for Teams in Figure 11b degrades even before the Zoom call starts, likely because the competing client opens the Zoom landing page to initiate a call. This can create competing TCP traffic on the link, and as we find in the next subsection, Teams is extremely passive when competing against TCP.

5.2 VCA vs. TCP

We now compare how the three VCAs compete with a 120-second TCP flow from iPerf3. In a home network, a long TCP flow can be created by a file download or upload. We find that Zoom is highly aggressive, especially at low uplink and downlink capacities, consuming more than 75% of the bandwidth on a symmetric 0.5 Mbps link. Meet is TCP-friendly in the uplink direction but not in the downlink, consuming 75% of the bandwidth at 0.5 Mbps downlink (figures omitted due to lack of space). Teams, on the other hand, is passive and backs off against a TCP flow in both the upstream and downstream directions, even at high link capacity. Figures 12a and 12b show how each VCA shares a 2 Mbps downlink and uplink with iPerf3, respectively. Teams uses only 37% of the uplink and only 20% of the downlink even at 2 Mbps.

[Figure 10: Share of downlink capacity used by VCAs in competition with other VCAs on a 0.5 Mbps symmetric link. The white box is the incumbent application. (a) Incumbent: Meet; (b) incumbent: Teams; (c) incumbent: Zoom.]

[Figure 11: Comparison of Teams competition behavior for uplink and downlink on a 1 Mbps symmetric link, with Teams as the incumbent application competing against Zoom. (a) Uplink; (b) downlink.]

The low upstream throughput for Teams when competing with a TCP flow is particularly surprising, as Teams was able to achieve its fair share when competing against the (more aggressive) Meet and Zoom, as shown in Figure 8b. At 2 Mbps, both Meet and Teams achieve their nominal bitrate, allowing iPerf3 to use the rest of the link. Clearly, none of the VCAs is TCP CUBIC-friendly, with Meet and Zoom being highly aggressive and Teams being highly passive.

Anomalous Zoom Bursts: In the previous section, Figures 4a and 5a showed how Zoom can send bursts of data for an extended period of time following a network disruption. Figure 13 shows that this behavior also appears in competition with iPerf3. At about 125 seconds, Zoom's increased sending rate causes iPerf3 to abruptly lower its utilization.

[Figure 12: iPerf3 link sharing with VCAs on a 2 Mbps capacity link. (a) Downlink; (b) uplink.]

This confirms that Zoom's temporary bursts, likely used to infer available bandwidth, can adversely affect other applications on a scarce link.

5.3 VCA vs. Video Streaming

We also compared each VCA's link share against two video streaming applications, Netflix and YouTube, which consume significant downstream bandwidth. YouTube uses QUIC, a UDP-based transport protocol, which can be TCP-friendly depending on certain configuration values [9]. Both Meet and Zoom are very aggressive when competing against video streaming applications, using over 75% of the link capacity. In contrast, Teams uses less than 25% while competing with YouTube and Netflix at 0.5 Mbps.

[Figure 13: Example of how Zoom probing can adversely affect competing applications (downstream bitrate of Zoom and iPerf3 over time).]

[Figure 14: Netflix and Zoom competition on a 0.5 Mbps capacity link. (a) Downstream bitrate for Zoom and Netflix; (b) TCP connections opened by Netflix.]

Figure 14a shows this effect when a Netflix client competes with an incumbent Zoom client at 0.5 Mbps downstream capacity. Zoom achieves an average throughput of around 0.4 Mbps, while Netflix struggles to reach more than 0.1 Mbps. We observe this effect despite the fact that Netflix is known to use multiple TCP connections, especially when capacity is limited. Figure 14b shows the TCP connections opened by Netflix during the 120-second experiment. In total, Netflix opens 28 TCP connections, each carrying more than 100 Kbits of data; at one point, it opens 11 parallel TCP connections. Despite this behavior, opening multiple TCP connections does not improve link sharing for Netflix while competing with Zoom.
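Counting these parallel connections from a packet capture is mechanical. The sketch below uses scapy (an assumption; any pcap library works) and a hypothetical set of Netflix server addresses resolved during the session, applying the same 100 Kbit per-connection threshold.

    from scapy.all import rdpcap, IP, TCP

    def count_tcp_connections(pcap_path, server_ips, min_bytes=12_500):
        """Count server->client TCP connections carrying at least
        min_bytes (12,500 bytes is roughly 100 Kbits)."""
        bytes_per_conn = {}
        for pkt in rdpcap(pcap_path):
            if IP in pkt and TCP in pkt and pkt[IP].src in server_ips:
                key = (pkt[IP].src, pkt[TCP].sport,
                       pkt[IP].dst, pkt[TCP].dport)
                bytes_per_conn[key] = bytes_per_conn.get(key, 0) + len(pkt)
        return sum(1 for b in bytes_per_conn.values() if b >= min_bytes)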

6 CALL MODALITIES

People now increasingly rely on video conferencing for remote work, school, and even large virtual conferences (such as this one!), particularly during the COVID-19 pandemic, which has shifted many in-person activities to virtual forums. These settings involve many participants, which raises questions concerning the performance and network utilization of VCAs in settings involving multiple users. We study network utilization under two prominent call modalities: (1) the number of participants in a call, and (2) the viewing mode.

We consider two viewing modes that are common across all three VCAs: speaker mode, wherein a specific user's video is pinned on the call, and gallery mode, in which all participants' videos are shown on the screen. Each experiment consists of a 2-minute call with n users and a specific viewing mode. We vary the number of clients in the call from two to eight, across both gallery and speaker modes. We perform five experiments for each (number of participants, viewing mode) combination and log the network utilization of client C1 for each call. Although it is difficult to evaluate these applications for hundreds of simultaneous users, we can nevertheless explore trends in utilization and performance for a smaller number of users. Larger-scale experiments are a possible area for future work.

6.1 Number of Users

To explore the effect of the number of users on utilization, we fix the viewing mode to gallery, which is the default in all of the VCAs. Figures 15a and 15b show the average downstream and upstream network utilization, respectively, for different numbers of participants. Total downstream utilization depends on both the number of video streams and the data rate of each stream. Utilization in both directions typically decreases as the number of participants increases in a Meet or Zoom call. In the case of Zoom, the uplink utilization drops from 0.8 Mbps to 0.4 Mbps as the number of participants increases from 4 to 5. For Meet, the reduction happens at n = 7, from 1 Mbps to 0.2 Mbps. Both Meet and Zoom have a tiled-screen display with each user shown in a separate tile. As the number of users increases, the tile size shrinks to accommodate all users on the fixed-size screen. For instance, Zoom uses a 2x2 grid for 4 participants; switching to 5 participants creates a third row of video feeds, making each individual video feed smaller. The sender uses this opportunity to reduce the resolution of the transmitted video, leading to a reduction in upstream utilization.
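The layout effect can be illustrated with simple arithmetic on our 1366x768 screens. The grid heuristic below is hypothetical (real layouts are VCA-specific), but it reproduces the 2x2-to-third-row transition at five participants described above.

    import math

    def gallery_tile_size(n_participants, screen_w=1366, screen_h=768):
        """Approximate per-tile pixel dimensions for an n-participant gallery."""
        rows = math.ceil(math.sqrt(n_participants))
        cols = math.ceil(n_participants / rows)
        return screen_w // cols, screen_h // rows

    print(gallery_tile_size(4))  # (683, 384): 2x2 grid
    print(gallery_tile_size(5))  # (683, 256): a third row shrinks each tile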

We observe a similar reduction in downstream utilization at 5 and 7 participants for Zoom and Meet, respectively (Figure 15a). However, there are also notable differences when compared to the uplink utilization.

[Figure 15: Network utilization in different viewing modes. The bands represent 90% confidence intervals. (a) Downlink traffic of a client whose video is viewed in gallery mode; (b) uplink traffic of a client whose video is viewed in gallery mode; (c) uplink traffic of a client whose video is pinned by all other participants (speaker mode).]

For instance, the downlink utilization for Google Meet increases from 1.25 Mbps to 2.5 Mbps when the number of participants increases from 3 to 6, while upstream utilization stays mostly constant. The trend is similar for Zoom as the number of participants increases beyond 5.

Teams does not exhibit these trends: upstream utilization remains almost constant as the number of participants changes. Downstream utilization increases until five participants and drops as more participants join the call. Teams has a fixed 4-tile layout on Linux; it thus displays only a subset of participants if a call has more than four participants, which may explain why the sending rate does not change, particularly as the number of videos and the frame size may not change significantly with more users. It is, however, not clear why the downstream utilization decreases in calls with more than five participants; this phenomenon deserves more exploration.

6.2 Viewing Mode

For all three VCAs, viewing a user's video in speaker mode leads to greater uplink consumption on that user's network compared to gallery mode. Putting C1 in speaker mode enlarges its tile size on other users' screens. The sender streams the video at a higher resolution to provide a better user experience, thus increasing uplink utilization compared to when C1 was not pinned by other users. Zoom and Meet consistently send at 1 Mbps when all clients pin C1's video, regardless of the number of participants. Note that only one client needs to pin C1 for this behavior to occur; each client can decide to pin any client independently of the others.

The behavior of Teams differs from Meet and Zoom in this regard as well. C1's uplink utilization continues to increase, from 1.25 Mbps with 3 participants up to 2.9 Mbps, when it is put in speaker mode. We checked whether the increase could be attributed to Teams communicating with multiple destinations (e.g., with each user separately); however, we observe that all of the traffic was directed to a single server. It is not clear what contributes to the increase in traffic for Teams, but it clearly leads to inefficient network utilization, especially when compared to Zoom and Meet.

Takeaways: Each participant's video layout affects each client's network utilization. Calls with more participants can decrease upstream and downstream utilization for each participant, depending on how the VCA displays participant video. The settings of one participant (e.g., pinning a video, using speaker mode vs. gallery mode) can also affect the upstream utilization of other participants.

7 RELATED WORK

VCA measurement: Some of the early VCA measurement work focused on uncovering the design of Skype, including its streaming protocols [2], architecture [11], traffic characterization [6], and application performance [13]. More recent work has included other VCAs and streaming contexts [1, 31, 32]. Xu et al. [31] use controlled experiments to study the design and performance of Google+, iChat, and Skype. That work was later extended to cover the performance of the three services on mobile video calls [32].

Closest to our work is that of Jansen et al. [14] and Nistico et al. [21]. Jansen et al. evaluate WebRTC performance using their custom VCA under controlled network conditions [14]. Emulating similar network conditions, we consider the performance of commercial and more recent VCAs that are widely used for education and work. Even between tested VCAs using WebRTC, namely Meet and Teams-Chrome, we find significant performance differences, likely due to different design parameters (e.g., codecs, default bitrates). Nistico et al. [21] consider a wider range of recent VCAs, focusing on their design differences, including protocols and architecture.

Our work provides a complementary performance analysis for a subset of the VCAs studied by them. We use the insights from their work to explain the differences among VCAs' network utilization and performance under similar streaming contexts.

VCA congestion control: Several congestion control algorithms have been proposed for VCAs. These algorithms rely on a variety of signals, such as loss [12], delay [8], and even VCA performance metrics [28], for rate control. For instance, Google Congestion Control [8], also implemented in WebRTC, uses the one-way delay gradient to adjust the sender rate, while SCReAM [15] relies on both loss and delay along with TCP-like self-clocking. While the VCAs may use one or more of these variants, the exact implementation of the algorithm and its parameter values vary and are proprietary. In this work, we study the efficacy of VCA congestion control in the case of transient interruptions and background applications. A recent study by Sander et al. evaluates Zoom's congestion control along these dimensions [24]. Our work observes similar results for Zoom and also analyzes more VCAs, including their fairness to each other and to other popular Internet applications, namely YouTube (QUIC-based) and Netflix (TCP-based).

8 FUTURE WORK
Generalizability to other VCA contexts: While this paper focuses on three popular video conferencing applications, our methods could be used to measure the utilization and performance of other VCAs across different device platforms. Our framework, which uses PyAutoGUI for automation (sketched below), can be applied to other VCAs and to other device platforms based on Linux, macOS, or Windows. Furthermore, we can include network profiles that represent other contexts, such as WiFi and cellular. We plan to release all of our experiment data and code so that others can extend this work to other contexts.

Application performance metrics: Analyzing application performance statistics can shed more light on VCA behavior and user quality of experience, as seen in the subset of our results that uses WebRTC statistics (Section 3.2). Obtaining application performance metrics is challenging for all VCAs, especially native clients. Future research could explore methods from related work, such as using annotated videos [31] or network traffic captures [10], to infer application metrics; one browser-based approach is also sketched below.

Performance under other network conditions: Other network factors, such as latency, packet loss, and jitter, could affect VCA performance and utilization. Future work could explore the effects of these parameters (see the netem sketch below). Experiments in this paper also focus on impairments in the last-mile access network, whereas problems can arise anywhere along the end-to-end path. Emulating all such impairments in the lab is challenging, however; an alternative is to collect VCA performance metrics from a large number of users in the wild.

Further exploration of VCA design: Some of our results revealed unusual VCA behavior that we found difficult to explain and that deserves further exploration. Examples include: (1) some VCAs behave differently depending on whether competition exists in the upstream or downstream direction; (2) Teams utilization is not consistent with the video layout across call modalities; and (3) Meet shows unusual encoding behavior (specifically, increased frame rates) when network capacity is very low.
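As a hint of what porting the automation entails, here is a minimal sketch of screenshot-driven control with PyAutoGUI [22]. The image filenames, the join flow, and the optional confidence matching (which requires OpenCV) are assumptions; each VCA and platform needs its own reference screenshots.

```python
# Minimal sketch of screenshot-driven VCA automation with PyAutoGUI [22].
# Image files and the join flow are hypothetical placeholders.
import time
import pyautogui

def locate(image, confidence=0.9):
    """Center of `image` on screen, or None if it is not visible."""
    try:
        return pyautogui.locateCenterOnScreen(image, confidence=confidence)
    except pyautogui.ImageNotFoundException:
        return None  # newer PyAutoGUI raises; older versions return None

def join_call(join_img, mute_img, timeout=30):
    """Wait for the join button, click it, then mute once in the call."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        pos = locate(join_img)
        if pos:
            pyautogui.click(pos)
            break
        time.sleep(1)
    else:
        raise RuntimeError("join button never appeared")
    pos = locate(mute_img)
    if pos:
        pyautogui.click(pos)

# Example (hypothetical screenshots): join_call("zoom_join.png", "zoom_mute.png")
```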
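For browser-based clients, one plausible route to application metrics is the standard WebRTC statistics API [30], driven through Selenium [27]. The sketch below hooks RTCPeerConnection before any page script runs and later polls getStats(); the meeting URL is a placeholder, the join steps are omitted, and the focus on inbound-rtp entries is our assumption.

```python
# Sketch: polling WebRTC statistics [30] from a browser-based VCA via
# Selenium [27]. Meeting URL is a placeholder; join automation is omitted.
import json
from selenium import webdriver

HOOK = """
window.__pcs = [];
const Orig = window.RTCPeerConnection;
window.RTCPeerConnection = function(...args) {
  const pc = new Orig(...args);
  window.__pcs.push(pc);  // remember every peer connection the page creates
  return pc;
};
window.RTCPeerConnection.prototype = Orig.prototype;
"""

COLLECT = """
const done = arguments[arguments.length - 1];
window.__pcs[0].getStats(null).then(report => {
  const stats = [];
  report.forEach(s => { if (s.type === 'inbound-rtp') stats.push(s); });
  done(JSON.stringify(stats));
});
"""

driver = webdriver.Chrome()
# Install the hook before any page script runs.
driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {"source": HOOK})
driver.get("https://meet.example.com/test-call")  # hypothetical meeting URL
# ... join the call here (automation omitted) ...
inbound_rtp = json.loads(driver.execute_async_script(COLLECT))
```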
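Network impairments of this kind can be layered onto the same testbed with Linux tc/netem; the sketch below is illustrative, and the interface name and the delay, jitter, and loss values are assumptions (root privileges required).

```python
# Sketch: emulating latency, jitter, and loss with Linux tc/netem.
# Interface name and parameter values are illustrative; requires root.
import subprocess

def apply_netem(iface="eth0", delay_ms=50, jitter_ms=10, loss_pct=1.0):
    """Install (or update) a netem qdisc on the egress of `iface`."""
    subprocess.run(
        ["tc", "qdisc", "replace", "dev", iface, "root", "netem",
         "delay", f"{delay_ms}ms", f"{jitter_ms}ms",
         "loss", f"{loss_pct}%"],
        check=True)

def clear_netem(iface="eth0"):
    """Remove the netem qdisc, restoring the interface default."""
    subprocess.run(["tc", "qdisc", "del", "dev", iface, "root"], check=True)

# Example: apply_netem("eth0", delay_ms=100, jitter_ms=20, loss_pct=0.5)
```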

9 CONCLUSION
We analyzed the network utilization and performance of three major VCAs under diverse network conditions and usage modalities. Our results show that VCAs vary significantly both in terms of their resource utilization and performance, especially when capacity is constrained. Differences in behavior can be attributed to different VCA design parameter values, congestion control algorithms, and media transport mechanisms. The VCA response also varies depending on whether constraints exist on the uplink or downlink. Different behavior often arises from differences in encoding strategies, as well as in how the VCAs rely on an intermediate server to deliver video streams and adapt transmission to changing conditions. Finally, the network utilization of each VCA can vary significantly depending on the call modality. Somewhat counterintuitively, calls with more participants can actually reduce any one participant's upstream utilization, because changes in the video layout on the user's screen ultimately lead to changes in the sent video resolution. Future work could extend this analysis to more VCAs, device platforms, and network impairments. In general, however, performance can begin to degrade at upstream capacities below 1.5 Mbps, and some VCAs do not compete well with other TCP streams under constrained settings, suggesting the possible need for the FCC to reconsider its 25/3 Mbps standard for defining "broadband".


REFERENCES
[1] Abdullah Azfar, Kim-Kwang Raymond Choo, and Lin Liu. 2016. Android mobile VoIP apps: a survey and examination of their security and privacy. Electronic Commerce Research 16, 1 (2016), 73–111.
[2] Salman A. Baset and Henning Schulzrinne. 2004. An analysis of the Skype peer-to-peer internet telephony protocol. arXiv preprint cs/0412017 (2004).
[3] Mark Baugher, D. McGrew, M. Naslund, E. Carrara, and Karl Norrman. 2004. The secure real-time transport protocol (SRTP).
[4] Juraj Bienik, Miroslav Uhrina, Michal Kuba, and Martin Vaculik. 2016. Performance of H.264, H.265, VP8 and VP9 Compression Standards for High Resolutions. In 2016 19th International Conference on Network-Based Information Systems (NBiS). IEEE.
[5] BITAG. 2021. 2020 Pandemic Network Performance. https://bitag.org/documents/bitag_report.pdf.
[6] Dario Bonfiglio, Marco Mellia, Michela Meo, Nicolo Ritacca, and Dario Rossi. 2008. Tracking down Skype traffic. In IEEE INFOCOM 2008 - The 27th Conference on Computer Communications. IEEE.
[7] Dario Bonfiglio, Marco Mellia, Michela Meo, and Dario Rossi. 2008. Detailed analysis of Skype traffic. IEEE Transactions on Multimedia (2008).
[8] Gaetano Carlucci, Luca De Cicco, Stefan Holmer, and Saverio Mascolo. 2016. Analysis and design of the Google congestion control for web real-time communication (WebRTC). In Proceedings of the 7th International Conference on Multimedia Systems. 1–12.
[9] Romuald Corbel, Stéphane Tuffin, Annie Gravey, Xavier Marjou, and Arnaud Braud. 2019. Assessing the impact of QUIC on network fairness. In 2nd International Conference on Communication and Network Protocol (ICCNP 2019).
[10] Mallesham Dasari, Shruti Sanadhya, Christina Vlachou, Kyu-Han Kim, and Samir R. Das. 2018. Scalable ground-truth annotation for video QoE modeling in enterprise WiFi. In IEEE IWQoS.
[11] Saikat Guha and Neil Daswani. 2005. An experimental study of the Skype peer-to-peer VoIP system. Technical Report. Cornell University.
[12] Mark Handley, Sally Floyd, Jitendra Padhye, and Jörg Widmer. 2003. TCP friendly rate control (TFRC): Protocol specification. Technical Report. RFC.
[13] Tobias Hoßfeld and Andreas Binzenhöfer. 2008. Analysis of Skype VoIP traffic in UMTS: End-to-end QoS and QoE measurements. Computer Networks (2008).
[14] Bart Jansen, Timothy Goodwin, Varun Gupta, Fernando Kuipers, and Gil Zussman. 2018. Performance evaluation of WebRTC-based video conferencing. ACM SIGMETRICS Performance Evaluation Review (2018).
[15] Ingemar Johansson and Zaheduzzaman Sarker. 2015. Self-clocked rate adaptation for multimedia. draft-johansson-rmcat-scream-cc-05 (work in progress) (2015).
[16] Citizen Lab. 2020. Zoom uses a bespoke extension of RTP. https://citizenlab.ca/2020/04/move-fast-roll-your-own-crypto-a-quick-look-at-the-confidentiality-of-zoom-meetings/.
[17] Qiyong Liu, Zhaofeng Jia, Kai Jin, Jing Wu, and Huipin Zhang. 2019. Error resilience for interactive real-time multimedia application. US Patent 10,348,454.
[18] Shinan Liu, Paul Schmitt, Francesco Bronzino, and Nick Feamster. 2020. Characterizing Service Provider Response to the COVID-19 Pandemic in the United States. arXiv preprint arXiv:2011.00419 (2020).
[19] Common Sense Media. 2021. Looking Back, Looking Forward: What It Will Take to Permanently Close the K–12 Digital Divide. https://www.commonsensemedia.org/about-us/news/press-releases/the-us-k-12-digital-divide-has-narrowed-but-must-close-to-eliminate.
[20] Marcin Nagy, Varun Singh, Jörg Ott, and Lars Eggert. 2014. Congestion control using FEC for conversational multimedia communication. In Proceedings of the 5th ACM Multimedia Systems Conference. 191–202.
[21] Antonio Nistico, Dena Markudova, Martino Trevisan, Michela Meo, and Giovanna Carofiglio. 2020. A comparative study of RTC applications. In 2020 IEEE International Symposium on Multimedia (ISM). IEEE.
[22] ReadTheDocs. 2020. PyAutoGUI: Application automation. https://pyautogui.readthedocs.io/en/latest/.
[23] Jonathan Rosenberg, Henning Schulzrinne, Gonzalo Camarillo, Alan Johnston, Jon Peterson, Robert Sparks, Mark Handley, Eve Schooler, et al. 2002. SIP: Session Initiation Protocol.
[24] Constantin Sander, Ike Kunze, Klaus Wehrle, and Jan Rüth. [n.d.]. Video Conferencing and Flow-Rate Fairness: A First Look at Zoom and the Impact of Flow-Queuing AQM.
[25] Henning Schulzrinne, Steven Casner, R. Frederick, and Van Jacobson. 2003. RFC 3550: RTP: A transport protocol for real-time applications.
[26] Henning Schulzrinne, Stephen Casner, Ron Frederick, Van Jacobson, et al. 1996. RTP: A transport protocol for real-time applications.
[27] Selenium. 2017. Selenium: Web browser automation.
[28] Varun Singh, Jörg Ott, and Igor D. D. Curcio. 2012. Rate-control for conversational video communication in heterogeneous networks. In 2012 IEEE International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM). IEEE, 1–7.
[29] Email ToolTester. 2021. Video conferencing market share. https://www.emailtooltester.com/en/blog/video-conferencing-market-share/.
[30] W3C. 2020. WebRTC Stats. https://www.w3.org/TR/webrtc-stats/.
[31] Yang Xu, Chenguang Yu, Jingjiang Li, and Yong Liu. 2012. Video telephony for end-consumers: Measurement study of Google+, iChat, and Skype. In IMC. ACM.
[32] Chenguang Yu, Yang Xu, Bo Liu, and Yong Liu. 2014. "Can you SEE me now?" A measurement study of mobile video calls. In IEEE INFOCOM 2014 - IEEE Conference on Computer Communications. IEEE, 1456–1464.
[33] Jay Zhu, Doug Jones, and Sheldon Webster. 2020. Testing Bandwidth Usage of Popular Video Conferencing Applications. https://www.cablelabs.com/testing-bandwidth-usage-of-popular-video-conferencing-applications.
[34] Zoom. 2019. Here's How Zoom Provides Industry-Leading Video Capacity. https://blog.zoom.us/zoom-can-provide-increase-industry-leading-video-capacity/.
[35] Zoom. 2021. Zoom API. https://marketplace.zoom.us/docs/api-reference/zoom-api/.
