
Live Streaming with Receiver-based Peer-division Multiplexing

Hyunseok Chang    Sugih Jamin    Wenjie Wang

IEEE/ACM Transactions on Networking, vol. 19, no. 1, Feb. 2011

Abstract—A number of commercial peer-to-peer systems for live streaming have been introduced in recent years. The behavior of these popular systems has been extensively studied in several measurement papers. Due to the proprietary nature of these commercial systems, however, these studies have to rely on a "black-box" approach, where packet traces are collected from a single or a limited number of measurement points, to infer various properties of traffic on the control and data planes. Although such studies are useful to compare different systems from the end-user's perspective, it is difficult to intuitively understand the observed properties without fully reverse-engineering the underlying systems. In this paper we describe the network architecture of Zattoo, one of the largest production live streaming providers in Europe at the time of writing, and present a large-scale measurement study of Zattoo using data collected by the provider. To highlight, we found that even when the Zattoo system was heavily loaded with as many as 20,000 concurrent users on a single overlay, the median channel join delay remained less than 2 to 5 seconds, and that, for a majority of users, the streamed signal lags the over-the-air broadcast signal by no more than 3 seconds.

Index Terms—Peer-to-peer system, live streaming, network architecture

I. INTRODUCTION

THERE is an emerging market for IPTV. Numerous commercial systems now offer services over the Internet that are similar to traditional over-the-air, cable, or satellite TV. Live television, time-shifted programming, and content-on-demand are all presently available over the Internet. Increased broadband speed, growth of the broadband subscription base, and improved video compression technologies have contributed to the emergence of these IPTV services.

We draw a distinction between three uses of peer-to-peer (P2P) networks: delay-tolerant file download of archival material, delay-sensitive progressive download (or streaming) of archival material, and real-time live streaming. In the first case, the completion of download is elastic, depending on available bandwidth in the P2P network. The application buffer receives data as it trickles in and informs the user upon the completion of download. The user can then start playing back the file for viewing, in the case of a video file. BitTorrent and its variants are examples of delay-tolerant file download systems.

H. Chang is with Alcatel-Lucent Bell Labs, Holmdel, NJ 07733 USA (e-mail: [email protected]).

S. Jamin is with the EECS Department, University of Michigan, Ann Arbor, MI 48109 USA (e-mail: [email protected]).

W. Wang is with IBM Research, CRL, Beijing 100193, China (e-mail: [email protected]).

This work was done when authors Chang and Wang were at Zattoo Inc.

In the second case, video playback starts as soon as the application assesses that it has sufficient data buffered that, given the estimated download rate and the playback rate, it will not deplete the buffer before the end of file. If this assessment is wrong, the application has to either pause playback and rebuffer, or slow down playback. While users would like playback to start as soon as possible, the application has some degree of freedom in trading off playback start time against estimated network capacity. Most video-on-demand systems are examples of delay-sensitive progressive-download applications. The third case, real-time live streaming, has the most stringent delay requirement. While progressive download may tolerate initial buffering of tens of seconds or even minutes, live streaming generally cannot tolerate more than a few seconds of buffering. Taking into account the delay introduced by signal ingest and encoding, and by network transmission and propagation, the live streaming system can introduce only a few seconds of buffering time end-to-end and still be considered "live" [1].

The Zattoo peer-to-peer live streaming system was a free-to-use network serving over 3 million registered users in eight European countries at the time of study, with a maximum of over 60,000 concurrent users on a single channel. The system delivers live streams using a receiver-based, peer-division multiplexing scheme, as described in Section II. To ensure real-time performance when peer uplink capacity is below requirement, Zattoo subsidizes the network's bandwidth requirement, as described in Section III. After delving into Zattoo's architecture in detail, we study in Sections IV and V large-scale measurements collected during the live broadcast of the UEFA European Football Championship, one of the most popular one-time events in Europe, in June 2008 [2]. During the month of June 2008, Zattoo served more than 35 million sessions to more than one million distinct users. Drawing from these measurements, we report on the operational scalability of Zattoo's live streaming system along several key issues:

1) How does the system scale in terms of overlay size and its effectiveness in utilizing peers' uplink bandwidth?

2) How responsive is the system during channel switching, for example, when compared to the 3-second channel switch time of satellite TV?

3) How effective is the packet retransmission scheme in allowing a peer to recover from transient congestion?

4) How effective is the receiver-based peer-division multiplexing scheme in delivering synchronized sub-streams?

5) How effective is the global bandwidth subsidy system in provisioning for flash crowd scenarios?


6) Would a peer further away from the stream source experience an adversely long lag compared to a peer closer to the stream source?

7) How effective is the error-correcting code in isolating packet losses on the overlay?

We also discuss in Section VI several challenges in increasing the bandwidth contribution of Zattoo peers. Finally, we describe related work in Section VII and conclude in Section VIII.

II. SYSTEM ARCHITECTURE

The Zattoo system rebroadcasts live TV, captured from satellites, onto the Internet. The system carries each TV channel on a separate peer-to-peer delivery network and is not limited in the number of TV channels it can carry. Although a peer can freely switch from one TV channel to another, thereby departing and joining different peer-to-peer networks, it can only join one peer-to-peer network at any one time. We henceforth limit our description of the Zattoo delivery network to how it carries one TV channel. Fig. 1 shows a typical setup of a single TV channel carried on the Zattoo network. The TV signal captured from satellite is encoded into H.264/AAC streams, encrypted, and sent onto the Zattoo network. The encoding server may be physically separate from the server delivering the encoded content onto the Zattoo network. For ease of exposition, we will consider the two as logically co-located on an Encoding Server. Users are required to register themselves at the Zattoo website to download a free copy of the Zattoo player application. To receive the signal of a channel, the user first authenticates itself to the Zattoo Authentication Server. Upon authentication, the user is granted a ticket with limited lifetime. The user then presents this ticket, along with the identity of the TV channel of interest, to the Zattoo Rendezvous Server. If the ticket specifies that the user is authorized to receive the signal of the said TV channel, the Rendezvous Server returns to the user a list of peers currently joined to the P2P network carrying the channel, together with a signed channel ticket. If the user is the first peer to join a channel, the list of peers it receives contains only the Encoding Server. The user joins the channel by contacting the peers returned by the Rendezvous Server, presenting its channel ticket, and obtaining the live stream of the channel from them (see Section II-A for details).

Each live stream is sent out by the Encoding Server as n logical sub-streams. The signal received from satellite is encoded into a variable-bit-rate stream. During periods of source quiescence, no data is generated. During source busy periods, generated data is packetized into a packet stream, with each packet limited to a maximum size. The Encoding Server multiplexes this packet stream onto the Zattoo network as n logical sub-streams. Thus the first packet generated is considered part of the first sub-stream, the second packet that of the second sub-stream, the n-th packet that of the n-th sub-stream. The (n+1)-th packet cycles back to the first sub-stream, etc., such that the i-th sub-stream carries the (mn+i)-th packets, where m ≥ 0, 1 ≤ i ≤ n, and n is a user-defined constant.

Fig. 1. Zattoo delivery network architecture (Encoding Servers and Demultiplexer alongside the Rendezvous Server, Authentication Server, Feedback Server, and other admin servers).

We call a set of n packets with the same index multiplier m a "segment." Thus m serves as the segment index, while i serves as the packet index within a segment. Each segment is of size n packets. Being the packet index, i also serves as the sub-stream index. The number mn + i is carried in each packet as its sequence number.
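As a concrete illustration, the short sketch below (our own code, not Zattoo's; function names are hypothetical) maps a packet sequence number back to its segment and sub-stream indices under the convention just described.

    def substream_of(seq: int, n: int) -> int:
        """Sub-stream index i (1..n) carrying the packet with sequence number seq."""
        return (seq - 1) % n + 1

    def segment_of(seq: int, n: int) -> int:
        """Segment index m (0, 1, ...) of the packet with sequence number seq."""
        return (seq - 1) // n

    # Example with n = 16 sub-streams: sequence number 17 = 1*16 + 1
    # belongs to segment m = 1 and sub-stream i = 1.
    assert substream_of(17, 16) == 1 and segment_of(17, 16) == 1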

Zattoo uses the Reed-Solomon (RS) error-correcting code (ECC) for forward error correction. The RS code is a systematic code: of the n packets sent per segment, k < n packets carry the live stream data while the remainder carry the redundant data [3, Section 7.3]. Due to the variable-bit-rate nature of the data stream, the time period covered by a segment is variable, and a packet may be smaller than the maximum packet size. A packet smaller than the maximum packet size is zero-padded to the maximum packet size for the purposes of computing the (shortened) RS code, but is transmitted in its original size. Once a peer has received any k packets of a segment, it can reconstruct the remaining n − k packets. We do not differentiate between streaming data and redundant data in our discussion in the remainder of this paper.
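The property used throughout the paper is simply that any k of the n packets in a segment suffice to recover the rest. A minimal sketch of that bookkeeping follows (the RS encode/decode itself is omitted; the maximum packet size and the names are our assumptions, with n = 16 and k = 13 as reported for the deployment in Section V-A).

    MAX_PKT = 1400  # assumed maximum packet size in bytes (not specified in the paper)

    def pad(packet: bytes, size: int = MAX_PKT) -> bytes:
        """Zero-pad a short packet to the maximum size before computing the RS code."""
        return packet + b"\x00" * (size - len(packet))

    def segment_decodable(received: dict, k: int = 13) -> bool:
        """A segment of n packets can be fully reconstructed once any k of them arrive.
        received maps packet index i (1..n) to its payload."""
        return len(received) >= k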

When a new peer requests to join an existing peer, it specifies the sub-stream(s) it would like to receive from the existing peer. These sub-streams do not have to be consecutive. Contingent upon the availability of bandwidth at existing peers, the receiving peer decides how to multiplex a stream onto its set of neighboring peers, giving rise to our description of the Zattoo live streaming protocol as a receiver-based, peer-division multiplexing protocol. The details of peer-division multiplexing are described in Section II-A, while the details of how a peer manages sub-stream forwarding and stream reconstruction are described in Section II-B. Receiver-based peer-division multiplexing has also been used by the latest version of the CoolStreaming peer-to-peer protocol, though it differs from Zattoo in its stream management (Section II-B) and adaptive behavior (Section II-C) [4].

A. Peer-Division Multiplexing

To minimize per-packet processing time of a stream, the Zattoo protocol sets up a virtual circuit with multiple fan-outs at each peer. When a peer joins a TV channel, it establishes a peer-division multiplexing (PDM) scheme amongst a set of neighboring peers by building a virtual circuit to each of the neighboring peers.


Barring departure or performance degradation of a neighbor peer, the virtual circuits are maintained until the joining peer switches to another TV channel. With the virtual circuits set up, each packet is forwarded without further per-packet handshaking between peers. We describe the PDM bootstrapping mechanism in this section and the adaptive PDM mechanism to handle peer departure and performance degradation in Section II-C.

The PDM establishment process consists of two phases: the search phase and the join phase. In the search phase, the new, joining peer determines its set of potential neighbors. In the join phase, the joining peer requests peering relationships with a subset of its potential neighbors. Upon acceptance of a peering relationship request, the peers become neighbors and a virtual circuit is formed between them.

Search phase. To obtain a list of potential neighbors, a joining peer sends out a SEARCH message to a random subset of the existing peers returned by the Rendezvous Server. The SEARCH message contains the sub-stream indices for which this joining peer is looking for peering relationships. The sub-stream indices are usually represented as a bitmask of n bits, where n is the number of sub-streams defined for the TV channel. In the beginning, the joining peer is looking for peering relationships for all sub-streams and has all the bits in the bitmask turned on. In response to a SEARCH message, an existing peer replies with the number of sub-streams it can forward. From the returning SEARCH replies, the joining peer constructs a set of potential neighbors that covers the full set of sub-streams comprising the live stream of the TV channel. The joining peer continues to wait for SEARCH replies until the set of potential neighbors contains at least a minimum number of peers, or until all SEARCH replies have been received. With each SEARCH reply, the existing peer also returns a random subset of its known peers. If a joining peer cannot form a set of potential neighbors that covers all of the sub-streams of the TV channel, it initiates another SEARCH round, sending SEARCH messages to peers newly learned from the previous round. The joining peer gives up if it cannot obtain the full stream after two SEARCH rounds. To help the joining peer synchronize the sub-streams it receives from multiple peers, each existing peer also indicates for each sub-stream the latest sequence number it has received for that sub-stream, and the existence of any quality problem. The joining peer can then choose sub-streams with good quality that are closely synchronized.
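The search phase can be thought of as covering the wanted sub-stream bitmask with SEARCH replies. The following sketch is our illustration only: we assume, for simplicity, that each reply exposes a bitmask of the sub-streams the responding peer can forward and a flag for quality problems, and the minimum-neighbor constant is ours.

    N_SUBSTREAMS = 16
    MIN_NEIGHBORS = 4                    # assumed lower bound, not from the paper
    FULL_MASK = (1 << N_SUBSTREAMS) - 1  # bit i-1 set <=> sub-stream i is wanted

    def offered_coverage(replies) -> int:
        """OR together the sub-stream bitmasks offered in SEARCH replies."""
        mask = 0
        for r in replies:
            mask |= r["offered_mask"]
        return mask

    def round_complete(replies, wanted_mask: int = FULL_MASK) -> bool:
        """A SEARCH round succeeds when good replies cover every wanted
        sub-stream and the potential-neighbor set is large enough."""
        good = [r for r in replies if not r["quality_problem"]]
        covered = (offered_coverage(good) & wanted_mask) == wanted_mask
        return covered and len(good) >= MIN_NEIGHBORS

    # If a round fails, the joining peer re-SEARCHes peers newly learned
    # from the replies, and gives up after two rounds.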

Join phase. Once the set of potential neighbors is established, the joining peer sends JOIN requests to each potential neighbor. The JOIN request lists the sub-streams for which the joining peer would like to construct a virtual circuit with the potential neighbor. If a joining peer has l potential neighbors, each willing to forward it the full stream of a TV channel, it would typically choose to have each forward only 1/l-th of the stream, to spread out the load amongst the peers and to speed up error recovery, as described in Section II-C. In selecting which of the potential neighbors to peer with, the joining peer gives highest preference to topologically close-by peers, even if these peers have less capacity or carry lower quality sub-streams.

Fig. 2. Zattoo peer with IOB (the PDM feeds the Zattoo peer and player application).

The "topological" location of a peer is defined to be its subnet number, autonomous system (AS) number, and country code, in that order of precedence. A joining peer obtains its own topological location from the Zattoo Authentication Server as part of its authentication process. The lists of peers returned by both the Rendezvous Server and potential neighbors all come attached with topological locations. A topology-aware overlay not only allows us to be "ISP-friendly," by minimizing inter-domain traffic and thus saving on transit bandwidth cost, but also helps reduce the number of physical links and metro hops traversed in the overlay network, potentially resulting in enhanced user-perceived stream quality.
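A sketch of that preference order, under our assumption that each candidate advertises (subnet, AS, country): candidates matching the requester's subnet rank first, then those in the same AS, then those in the same country.

    from dataclasses import dataclass

    @dataclass
    class Location:
        subnet: str    # e.g. "192.0.2.0/24"
        asn: int       # autonomous system number
        country: str   # e.g. "DE"

    def closeness(me: Location, peer: Location) -> int:
        """Higher value = topologically closer, in the stated order of precedence."""
        if peer.subnet == me.subnet:
            return 3
        if peer.asn == me.asn:
            return 2
        if peer.country == me.country:
            return 1
        return 0

    def rank_candidates(me: Location, candidates: list) -> list:
        """Closest candidates first; capacity and sub-stream quality only break ties."""
        return sorted(candidates, key=lambda c: closeness(me, c), reverse=True)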

B. Stream Management

We represent a peer as a packet buffer, called the IOB, fed by sub-streams incoming from the PDM constructed as described in Section II-A.¹ The IOB drains to (1) a local media player if one is running, (2) a local file if recording is supported, and (3) potentially other peers. Fig. 2 depicts a Zattoo player application with virtual circuits established to four peers. As packets from each sub-stream arrive at the peer, they are stored in the IOB for reassembly to reconstruct the full stream. Portions of the stream that have been reconstructed are then played back to the user. In addition to providing a reassembly area, the IOB also allows a peer to absorb some variability in available network bandwidth and network delay.

The IOB is referenced by an input pointer, a repair pointer, and one or more output pointers. The input pointer points to the slot in the IOB where the next incoming packet with a sequence number higher than the highest sequence number received so far will be stored.

¹ In the case of the Encoding Server, which we also consider a peer on the Zattoo network, the buffer is fed by the encoding process.


The repair pointer always points one slot beyond the last packet received in order and is used to regulate packet retransmission and adaptive PDM as described later. We assign an output pointer to each forwarding destination. The output pointer of a destination indicates the destination's current forwarding horizon on the IOB. In accordance with the three types of possible forwarding destinations listed above, we have three types of output pointers: player pointer, file pointer, and peer pointer. One would typically have at most one player pointer and one file pointer, but potentially multiple concurrent peer pointers, referencing an IOB. The Zattoo player application does not currently support recording.

Since we maintain the IOB as a circular buffer, if the incoming packet rate is higher than the forwarding rate of a particular destination, the input pointer will overrun the output pointer of that destination. We could move the output pointer to match the input pointer so that we consistently forward the oldest packet in the IOB to the destination. Doing so, however, requires checking the input pointer against all output pointers on every packet arrival. Instead, we have implemented the IOB as a double buffer. With the double buffer, the positions of the output pointers are checked against that of the input pointer only when the input pointer moves from one sub-buffer to the other. When the input pointer moves from sub-buffer a to sub-buffer b, all the output pointers still pointing to sub-buffer b are moved to the start of sub-buffer a and sub-buffer b is flushed, ready to accept new packets. When a sub-buffer is flushed while there are still output pointers referencing it, packets that have not been forwarded to the destinations associated with those pointers are lost to them, resulting in quality degradation. To minimize packet loss due to sub-buffer flushing, we would like to use large sub-buffers. However, the real-time delay requirement of live streaming limits the usefulness of late-arriving packets and effectively puts a cap on the maximum size of the sub-buffers.
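A minimal sketch of the flush rule, under our own naming: when the input pointer crosses from one sub-buffer to the other, any output pointer still lagging in the sub-buffer about to be reused is advanced to the start of the sub-buffer just filled, losing whatever it had not yet forwarded.

    class DoubleBufferIOB:
        """Our simplification of the IOB's double-buffer flush rule (not Zattoo code)."""

        def __init__(self):
            self.input_sub = 0      # sub-buffer the input pointer is currently filling
            self.output_subs = {}   # destination -> sub-buffer its output pointer is in

        def on_input_crosses_boundary(self):
            """Called when the input pointer moves from sub-buffer a to sub-buffer b."""
            a = self.input_sub
            b = 1 - a
            for dest, sub in self.output_subs.items():
                if sub == b:
                    # This destination never finished draining b: advance it to the
                    # freshly filled sub-buffer a; its un-forwarded packets in b are lost.
                    self.output_subs[dest] = a
            # b is flushed and becomes the sub-buffer the input pointer now fills.
            self.input_sub = b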

Different peers may request different numbers of, possibly non-consecutive, sub-streams. To accommodate the different forwarding rates and regimes required by the destinations, we associate a packet map and forwarding discipline with each output pointer. Fig. 3 shows the packet map associated with an output peer pointer where the peer has requested sub-streams 1, 4, 9, and 14. Every time a peer pointer is repositioned to the beginning of a sub-buffer of the IOB, all the packet slots of the requested sub-streams are marked NEEDed and all the slots of the sub-streams not requested by the peer are marked SKIP. When a NEEDed packet arrives and is stored in the IOB, its state in the packet map is changed to READY. As the peer pointer moves along its associated packet map, READY packets are forwarded to the peer and their states changed to SENT. A slot marked NEEDed but not READY, such as slot n + 4 in Fig. 3, indicates that the packet is lost or will arrive out of order, and is bypassed. When an out-of-order packet arrives, its slot is changed to READY and the peer pointer is reset to point to this slot. Once the out-of-order packet has been sent to the peer, the peer pointer will move forward, bypassing all SKIP, NEED, and SENT slots until it reaches the next READY slot, where it can resume sending.
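The per-destination forwarding logic is a small state machine over the packet map. A hedged sketch under our own naming, with the states of Fig. 3:

    SKIP, NEED, READY, SENT = "SKIP", "NEED", "READY", "SENT"

    def init_packet_map(n: int, segments: int, requested: set) -> list:
        """On repositioning to a sub-buffer: slots of requested sub-streams start
        as NEED, all other slots as SKIP (slot order follows Fig. 3)."""
        pmap = []
        for _ in range(segments):
            for i in range(1, n + 1):              # i = sub-stream index within a segment
                pmap.append(NEED if i in requested else SKIP)
        return pmap

    def forward_ready(pmap: list, send) -> None:
        """Forward READY packets in slot order, bypassing SKIP, NEED (lost or late)
        and SENT slots. An out-of-order arrival would later flip its slot back to
        READY and reset the peer pointer to that slot."""
        for slot, state in enumerate(pmap):
            if state == READY:
                send(slot)
                pmap[slot] = SENT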

Fig. 3. Packet map associated with a peer pointer (slots marked SKIP, NEED, READY, or SENT across segments 0 to m−1 of n packets each; the peer has requested sub-streams 1, 4, 9, and 14).

Fig. 4. IOB, input/output pointers, and packet maps (a double buffer of sub-buffers 0 and 1, each holding segments 0 to m−1 of segment size n, referenced by the input and repair pointers and by peer, player, and file output pointers, each with its own packet map).

The player pointer behaves the same as a peer pointer except that all packets in its packet map always start out marked NEEDed.

Fig. 4 shows an IOB consisting of a double buffer, with an input pointer, a repair pointer, an output file pointer, an output player pointer, and two output peer pointers referencing the IOB. Each output pointer has a packet map associated with it. For the scenario depicted in the figure, the player pointer tracks the input pointer and has skipped over some lost packets. Both peer pointers are lagging the input pointer, indicating that the forwarding rates to the peers are bandwidth limited. The file pointer is pointing at the first lost packet. Archiving a live stream to file does not impose a real-time delay bound on packet arrivals. To achieve the best quality recording possible, a recording peer always waits for retransmission of lost packets that cannot be recovered by error correction.

In addition to achieving lossless recording, we use retransmission to let a peer recover from transient network congestion. A peer sends out a retransmission request when the distance between the repair pointer and the input pointer has reached a threshold of R packet slots, usually spanning multiple segments. A retransmission request consists of an R-bit packet mask, with each bit representing a packet, and the sequence number of the packet corresponding to the first bit. Marked bits in the packet mask indicate that the corresponding packets need to be retransmitted. When a packet loss is detected, it could be caused by congestion on the virtual circuits forming the current PDM or by congestion on the path beyond the neighboring peers.


In either case, current neighbor peers will not be good sources of retransmitted packets. Hence we send our retransmission requests to r random peers that are not neighbor peers. A peer receiving a retransmission request will honor the request only if the requested packets are still in its IOB and it has sufficient left-over capacity, after serving its current peers, to transmit all the requested packets. Once a retransmission request is accepted, the peer will retransmit all the requested packets to completion.
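A sketch of how such a request could be encoded and decoded; the paper only specifies a base sequence number plus an R-bit mask, and the value of R below is our placeholder.

    R = 128  # assumed retransmission window in packet slots

    def build_retx_request(base_seq: int, received: set) -> tuple:
        """Return (base_seq, mask): bit j of mask is set iff packet base_seq + j
        is missing and should be retransmitted."""
        mask = 0
        for j in range(R):
            if base_seq + j not in received:
                mask |= 1 << j
        return base_seq, mask

    def packets_requested(base_seq: int, mask: int) -> list:
        """Decode a request at the peer asked to retransmit."""
        return [base_seq + j for j in range(R) if (mask >> j) & 1]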

C. Adaptive PDM

While we rely on packet retransmission to recover from transient congestion, we have two channel capacity adjustment mechanisms to handle longer-term bandwidth fluctuations. The first mechanism allows a forwarding peer to adapt the number of sub-streams it will forward given its current available bandwidth, while the second allows the receiving peer to switch providers at the sub-stream level.

Peers on the Zattoo network can redistribute a highly variable number of sub-streams, reflecting the high variability in uplink bandwidth of different access network technologies. For a full stream consisting of sixteen constant-bit-rate sub-streams, our prior study shows that, based on realistic peer characteristics measured from the Zattoo network, half of the peers can support less than half of a stream, 82% of peers can support less than a full stream, and the remainder can support up to ten full streams (peers that can redistribute more than a full stream are conventionally known as supernodes in the literature) [5]. With variable-bit-rate streams, the bandwidth carried by each sub-stream is also variable. To increase peer bandwidth usage without undue degradation of service, we instituted measurement-based admission control at each peer. In addition to controlling resource commitment, another goal of the measurement-based admission control module is to continually estimate the amount of available uplink bandwidth at a peer.

The amount of available uplink bandwidth at a peer is initially estimated by the peer sending a pair of probe packets to Zattoo's Bandwidth Estimation Server. Once a peer starts forwarding sub-streams to other peers, it receives from those peers quality-of-service feedback that informs its update of the available uplink bandwidth estimate. A peer sends quality-of-service feedback only if the quality of a sub-stream drops below a certain threshold.² Upon receiving quality feedback from multiple peers, a peer first determines whether the identified sub-streams are themselves arriving at it in low quality. If so, the reported low quality of service may not be caused by a limit on its own available uplink bandwidth, and the peer ignores the low-quality feedback. Otherwise, the peer decrements its estimate of available uplink bandwidth. If the new estimate is below the bandwidth needed to support the existing number of virtual circuits, the peer closes a virtual circuit.

² Depending on a peer's NAT and/or firewall configuration, Zattoo uses either UDP or TCP as the underlying transport protocol. The quality of a sub-stream is measured differently for UDP and TCP. A packet is considered lost under UDP if it does not arrive within a fixed threshold. The quality measure for UDP is computed as a function of both the packet loss rate and the burst error rate (number of contiguous packet losses). The quality measure for TCP is defined to be how far behind a peer is, relative to other peers, in serving its sub-streams.

To reduce the instability introduced into the network, a peer closes first the virtual circuit carrying the smallest number of sub-streams.

A peer attempts to increase its available uplink bandwidth estimate periodically, provided it has fully utilized its current estimate of available uplink bandwidth without triggering any bad-quality feedback from neighboring peers. A peer doubles the estimated available uplink bandwidth if the current estimate is below a threshold, switching to linear increase above the threshold, similar to how TCP maintains its congestion window size. A peer also increases its estimate of available uplink bandwidth if a neighbor peer departs the network without any bad-quality feedback.
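The update rule resembles TCP's slow-start/congestion-avoidance behavior. A hedged sketch of the estimator; only the shape of the rule follows the text, while all constants and names are ours.

    class UplinkEstimator:
        """Illustrative sketch of the uplink-bandwidth estimate update (not Zattoo code)."""

        def __init__(self, initial_kbps: float, threshold_kbps: float = 1000.0,
                     linear_step_kbps: float = 50.0, decrement_kbps: float = 50.0):
            self.estimate = initial_kbps   # seeded by probing the Bandwidth Estimation Server
            self.threshold = threshold_kbps
            self.step = linear_step_kbps
            self.decrement = decrement_kbps

        def periodic_increase(self, fully_utilized: bool, bad_feedback: bool) -> None:
            """Grow the estimate only when fully used without bad-quality feedback."""
            if fully_utilized and not bad_feedback:
                if self.estimate < self.threshold:
                    self.estimate *= 2          # multiplicative growth below the threshold
                else:
                    self.estimate += self.step  # linear growth above it

        def on_bad_feedback(self, inputs_also_degraded: bool) -> None:
            """Ignore feedback if the sub-streams already arrive degraded at this peer."""
            if not inputs_also_degraded:
                self.estimate = max(0.0, self.estimate - self.decrement)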

When the repair pointer lags behind the input pointer by R packet slots, in addition to initiating a retransmission request, a peer also computes a loss rate over the R packets. If the loss rate is above a threshold, the peer considers the neighbor slow and attempts to reconfigure its PDM. In reconfiguring its PDM, a peer attempts to shift half of the sub-streams currently forwarded by the slow neighbor to other existing neighbors. At the same time, it searches for new peer(s) to forward these sub-streams. If new peer(s) are found, the load will be shifted from existing neighbors to the new peer(s). If sub-streams from the slow neighbor continue to suffer after the reconfiguration of the PDM, the peer will drop the neighbor completely and initiate another reconfiguration of the PDM. When a peer loses a neighbor due to reduced available uplink bandwidth at the neighbor or due to neighbor departure, it also initiates a PDM reconfiguration. A peer may also initiate a PDM reconfiguration to switch to a topologically closer peer. Similar to the PDM establishment process, PDM reconfiguration is accomplished by peers exchanging sub-stream bitmasks in a request/response handshake, with each bit of the bitmask representing a sub-stream. During and after a PDM reconfiguration, slow-neighbor detection is disabled for a short period of time to allow the system to stabilize.

III. GLOBAL BANDWIDTH SUBSIDY SYSTEM

Each peer on the Zattoo network is assumed to serve a user through a media player, which means that each peer must receive, and can potentially forward, all n sub-streams of the TV channel the user is watching. The limited redistribution capacity of peers on the Zattoo network means that a typical client can contribute only a fraction of the sub-streams that make up a channel. This shortage of bandwidth leads to a global bandwidth deficit in the peer-to-peer network. Whereas bittorrent-like delay-tolerant file downloads or the delay-sensitive progressive download of video-on-demand applications can mitigate such a global bandwidth shortage by increasing download time, a live streaming system such as Zattoo's must subsidize the bandwidth shortfall to provide a real-time delivery guarantee.

Zattoo's Global Bandwidth Subsidy System (or simply, the Subsidy System) consists of a global bandwidth monitoring subsystem, a global bandwidth forecasting and provisioning subsystem, and a pool of Repeater nodes. The monitoring subsystem continuously monitors the global bandwidth requirement of a channel.


The forecasting and provisioning subsystem projects the global bandwidth requirement based on measured history and allocates Repeater nodes to the channel as needed. The monitoring and provisioning of global bandwidth is complicated by two parameters that vary strongly over time, client population size and peak streaming rate, and one parameter that varies over space, available uplink bandwidth, which is network-service-provider dependent. Forecasting of bandwidth requirement is a vast subject in itself. Zattoo adopted a very simple mechanism, described in Section III-B, which has performed adequately in provisioning the network for both daily demand fluctuations and flash crowd scenarios (see Section IV-C).

When a bandwidth shortage is projected for a channel, the Subsidy System assigns one or more Repeater nodes to the channel. Repeater nodes function as bandwidth multipliers, amplifying the amount of available bandwidth in the network. Each Repeater node serves at most one channel at a time; it joins and leaves a given channel at the behest of the Subsidy System. Repeater nodes receive and serve all n sub-streams of the channel they join, run the same PDM protocol, and are treated by actual peers like any other peers on the network; however, as bandwidth amplifiers, they are usually provisioned to contribute more uplink bandwidth than the download bandwidth they consume. The use of Repeater nodes makes the Zattoo network a hybrid of a P2P network and a content distribution network.

We next describe the bandwidth monitoring subsystem of the Subsidy System, followed by the design of the simple bandwidth projection and Repeater node assignment subsystem.

A. Global Bandwidth Measurement

The capacity metric of a channel is the tuple (D, C), where D is the aggregate download rate required by all users on the channel, and C is the aggregate upload capacity of those users. Usually C < D, and the difference between the two is the global bandwidth deficit of the channel. Since channel viewership changes over time as users join or leave the channel, we need a scalable means to measure and update the capacity metric. We rely on aggregating the capacity metric up the peer-division multiplexing tree. Each peer in the overlay periodically aggregates the capacity metric reported by all its downstream receiver peers, adds its own capacity measure (D, C) to the aggregate, and forwards the resulting capacity metric upstream to its forwarding peers. By the time the capacity metric percolates up to the Encoding Server, it contains the total download and upload rate aggregates of the whole streaming overlay. The Encoding Server then simply forwards the obtained (D, C) to the Subsidy Server.
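A short sketch of the in-tree aggregation (the message layout and names are our assumptions): each peer sums the (D, C) reports of its downstream receivers, adds its own measure, and passes the total upstream.

    from dataclasses import dataclass

    @dataclass
    class Capacity:
        download: float  # D: aggregate required download rate
        upload: float    # C: aggregate upload capacity

    def aggregate(own: Capacity, downstream_reports: list) -> Capacity:
        """Periodic aggregation at one peer: its own (D, C) plus all reports
        received from its downstream receiver peers."""
        d = own.download + sum(r.download for r in downstream_reports)
        c = own.upload + sum(r.upload for r in downstream_reports)
        return Capacity(d, c)

    # The result is forwarded to the peer's forwarding peers; at the Encoding
    # Server it becomes the channel-wide (D, C), passed on to the Subsidy Server.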

B. Global Bandwidth Projection and Provisioning

For each channel, the Subsidy Server keeps a history of the capacity metric (D, C) reports received from the channel's Encoding Server. The channel utilization ratio (U) is the ratio of D over C. Based on recent movements of the ratio U, we classify the capacity trend of each channel into the following four categories.

• Stable: the ratio U has remained within [S − ε, S + ε] for the past Ts reporting periods.

• Exploding: the ratio U increased by at least E within Te (e.g., Te = 2) reporting periods.

• Increasing: the ratio U has steadily increased by I (I ≪ E) over the past Ti reporting periods.

• Falling: the ratio U has decreased by F over the past Tf reporting periods.

Orthogonal to the capacity-trend-based classification above, each channel is further categorized in terms of its capacity utilization ratio as follows.

• Under-utilized: the ratio U is below the low threshold L, e.g., U ≤ 0.5.

• Near Capacity: the ratio U is almost 1.0, e.g., U ≥ 0.9.

If the capacity trend of a channel has been "Exploding," one or more Repeater nodes will be assigned to it immediately. If the channel's capacity trend has been "Increasing," it will be assigned a Repeater node with a smaller capacity. The goal of the Subsidy System is to keep a channel below "Near Capacity." If a channel's capacity trend is "Stable" and the channel is "Under-utilized," the Subsidy System attempts to free Repeater nodes (if any) assigned to the channel. If a channel's capacity utilization is "Falling," the Subsidy System waits for the utilization to stabilize before reassigning any Repeater nodes.
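A compact sketch of the classification and the resulting action; the threshold symbols follow the text, but every numeric value below is illustrative only and not from the paper.

    EPS, E, I, F = 0.05, 0.3, 0.05, 0.2    # ε, E, I, F (illustrative values)
    TS, TE, TI, TF = 6, 2, 6, 4            # Ts, Te, Ti, Tf in reporting periods
    LOW = 0.5                              # "Under-utilized" threshold L

    def trend(history: list) -> str:
        """Classify recent movements of U = D/C (most recent value last)."""
        u = history[-1]
        if len(history) >= TE and u - history[-TE] >= E:
            return "Exploding"
        if len(history) >= TI and u - history[-TI] >= I:
            return "Increasing"
        if len(history) >= TF and history[-TF] - u >= F:
            return "Falling"
        if len(history) >= TS and max(history[-TS:]) - min(history[-TS:]) <= 2 * EPS:
            return "Stable"
        return "Unclassified"

    def provisioning_action(history: list) -> str:
        u, t = history[-1], trend(history)
        if t == "Exploding":
            return "assign one or more Repeater nodes immediately"
        if t == "Increasing":
            return "assign a smaller-capacity Repeater node"
        if t == "Stable" and u <= LOW:
            return "free Repeater nodes assigned to the channel"
        if t == "Falling":
            return "wait for utilization to stabilize"
        return "no change"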

Each Repeater node periodically sends a keep-alive message to the Subsidy System. The keep-alive message tells the Subsidy System which channel the Repeater node is serving, plus its CPU and capacity utilization ratio. This allows the Subsidy System to monitor the health of Repeater nodes and to increase the stability of the overlay during Repeater reassignment. When reassigning Repeater nodes from a channel, the Subsidy System starts from the Repeater node with the lowest utilization ratio. It notifies the selected Repeater node to stop accepting new peers and then to leave the channel after a specified time.

In addition to Repeater nodes, the Subsidy System could also draw on extra capacity from idle peers whose owners are not actively watching a channel. However, our previous study shows that a large number of idle peers are required to make any discernible impact on the global bandwidth deficit of a channel [5]. Our current Subsidy System therefore does not solicit bandwidth contributions from idle peers.

IV. SERVER-SIDE MEASUREMENTS

In the Zattoo system, two separate centralized collector servers collect usage statistics and error reports; we call them the "stats" server and the "user-feedback" server, respectively. The "stats" server periodically collects aggregated player statistics from individual peers, from which full session logs are constructed and entered into a session database. The session database gives a complete picture of all past and present sessions served by the Zattoo system. A given database entry contains statistics about a particular session, including join time, leave time, uplink bytes, download bytes, and the channel name associated with the session.


TABLE I
SESSION DATABASE (6/1/2008–6/30/2008)

Channel    # sessions    # distinct users
ARD        2,102,638     298,601
Cuatro     1,845,843     268,522
SF2        1,425,285     157,639

TABLE II
FEEDBACK LOGS (6/20/2008–6/29/2008)

Channel    # feedback logs    # sessions
ARD        871                1,253
Cuatro     2,922              4,568
SF2        656                1,140

We study the sessions generated on three major TV channels from three different countries (Germany, Spain, and Switzerland), from June 1st to June 30th, 2008. Throughout the paper, we label the channels from Germany, Spain, and Switzerland as ARD, Cuatro, and SF2, respectively. Euro 2008 games were held during this period, and those three channels broadcast a majority of the Euro 2008 games, including the final match. See Table I for information about the collected session data sets.

The "user-feedback" server, on the other hand, collects error logs submitted asynchronously by users. The "user feedback" data here is different from the peer quality feedback used in PDM reconfiguration described in Section II-C. The Zattoo player maintains an encrypted log file which contains, for debugging purposes, the detailed behavior of the client-side P2P engine, as well as a history of all the streaming sessions initiated by a user since player startup. When users encounter any error while using the player, such as a log-in error, join failure, bad quality streaming, etc., they can choose to report the error by clicking a "Submit Feedback" button on the player, which causes the Zattoo player to send the generated log file to the user-feedback server. Since a given feedback log not only reports on a particular error, but also describes "normal" sessions generated prior to the occurrence of the error, we can study users' viewing experience (e.g., channel join delay) from the feedback logs. Table II describes the feedback logs collected from June 20th to June 29th. A given feedback log can contain multiple sessions (for the same or different channels), depending on the user's viewing behavior. The second column in the table represents the number of feedback logs which contain at least one session generated on the channel listed in the corresponding entry of the first column. The numbers in the third column indicate how many distinct sessions generated on said channel are present in the feedback logs.

A. Overlay Size and Sharing Ratio

We first study how many concurrent users are supported by the Zattoo system, and how much bandwidth is contributed by them. For this purpose, we use the session database presented in Table I. Using the join/leave timestamps of the collected sessions, we calculate the number of concurrent users on a given channel at time i. Then we calculate the average sharing ratio of the given channel at the same time.

TABLE III
AVERAGE SHARING RATIO

Channel    Average sharing ratio
           Off-peak    Peak
ARD        0.335       0.313
Cuatro     0.242       0.224
SF2        0.277       0.222

The average sharing ratio is defined as the total users' uplink rate divided by their download rate on the channel. A sharing ratio of one means users contribute to other peers in the network as much traffic as they download at the time. We calculate the average sharing ratio from the total download/uplink bytes of the collected sessions. We first obtain all the sessions which are active across time i; we call this set of sessions S_i. Then, assuming the uplink/download bytes of each session are spread uniformly throughout the entire session duration, we approximate the average sharing ratio at time i as

    \frac{\sum_{s \in S_i} \text{uplink bytes}(s) / \text{duration}(s)}{\sum_{s \in S_i} \text{download bytes}(s) / \text{duration}(s)}.
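Equivalently, as a small sketch over session records (field names are ours, not the database schema):

    def sharing_ratio(sessions: list, t: float) -> float:
        """Average sharing ratio at time t. Each session is a dict with 'join',
        'leave' (timestamps), 'uplink_bytes' and 'download_bytes'; bytes are
        assumed spread uniformly over the session's duration."""
        up = down = 0.0
        for s in sessions:
            if s["join"] <= t <= s["leave"]:       # session active across t
                dur = s["leave"] - s["join"]
                if dur > 0:
                    up += s["uplink_bytes"] / dur
                    down += s["download_bytes"] / dur
        return up / down if down > 0 else 0.0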

Fig. 5 shows the overlay size (i.e., number of concurrent users) and average sharing ratio superimposed across the month of June 2008. According to the figure, the overlay size grew to more than 20,000 (e.g., 20,059 on ARD on 6/18 and 22,152 on Cuatro on 6/9). As opposed to the overlay size, the average sharing ratio tends to stay flat throughout the month. Occasional spikes in the sharing ratio all occurred between 2AM and 6AM (GMT), when channel usage is very low, and may therefore be considered statistically insignificant.

By segmenting a 24-hour day into two time periods, off-peak hours (0AM–6PM) and peak hours (6PM–0AM), Table III shows the average sharing ratio in the two time periods separately. Zattoo's usage during peak hours typically accounts for about 50% to 70% of the total usage of the day. According to the table, the average sharing ratio during peak hours is slightly lower than, but not very different from, that during off-peak hours. The Cuatro channel in Spain exhibits a relatively lower sharing ratio than the other two channels. One bandwidth test site [6] reports that the average uplink bandwidth in Spain is about 205 kbps, which is much lower than in Germany (582 kbps) and Switzerland (787 kbps). The lower sharing ratio on the Spanish channel may reflect regional differences in residential access network provisioning. The balance of the required bandwidth is provided by Zattoo's Encoding Server and Repeater nodes.

B. Channel Switching Delay

When a user clicks on a new channel button, it takes some time (the channel switching delay) before the user can start watching the streamed video on the Zattoo player. The channel switching delay has two components. First, the Zattoo player needs to contact other available peers and retrieve all required sub-streams from them. We call the delay incurred during this stage the "join delay." Once all necessary sub-streams have been negotiated successfully, the player then needs to wait and buffer a minimum amount of the stream (e.g., 3 seconds' worth) before starting to show the video to the user.


Fig. 5. Overlay size and sharing ratio for (a) ARD, (b) Cuatro, and (c) SF2 (overlay size, 0–25,000, and sharing ratio, 0–2, versus day in June 2008).

Fig. 6. CDF of user arrival time (arrival hour, 0–24) for (a) ARD, (b) Cuatro, and (c) SF2, comparing feedback logs against the session database.

We call the resulting wait time the "buffering delay." The total channel switching delay experienced by users is thus the sum of the join delay and the buffering delay. PPLive reports channel switching delays of around 20 to 30 seconds, which can be as high as 2 minutes, of which join delay typically accounts for 10 to 15 seconds [7].

We measure the join delay experienced by Zattoo users from the feedback logs described in Table II. Debugging information contained in the feedback logs tells us when the user clicked on a particular channel, and when the player successfully joined the P2P overlay and started to buffer content. One concern in relying on user-submitted feedback logs to infer join delay is the potential sampling bias associated with them. Users typically submit feedback logs when they encounter some kind of error, which raises the question of whether the captured sessions are representative samples to study. We attempt to address this concern by comparing the data from feedback logs against those from the session database. The latter captures the complete picture of users' channel watching behavior, and therefore can serve as a reference. In our analysis, we compare the user arrival time distribution obtained from the two data sets. For a fair comparison, we use a subset of the session database generated during the same period when the feedback logs were collected (i.e., from June 20th to 29th).

Fig. 6 plots the CDF of user arrivals per hour obtained from the feedback logs and the session database separately. The steep slope of the distributions during hours 18–20 (6–8PM) indicates the high frequency of user arrivals during those hours. On ARD and Cuatro, the user arrival distributions inferred from feedback logs are almost identical to those from the session database. On the other hand, on SF2, the distribution obtained from feedback logs tends to grow more slowly during the early hours, which indicates that the feedback submission rate during off-peak hours on SF2 was relatively lower than normal.

TABLE IV
MEDIAN CHANNEL JOIN DELAY

Channel    Median join delay        Maximum overlay size
           Off-peak    Peak         Off-peak    Peak
ARD        2.29 sec    1.96 sec     2,313       19,223
Cuatro     3.67 sec    4.48 sec     2,357       8,073
SF2        2.49 sec    2.67 sec     1,126       11,360

Later during peak hours, however, the feedback submission rate picks up as expected, closely matching the actual user arrival rate. Based on this observation, we argue that feedback logs can serve as representative samples of daily user activities.

Fig. 7 shows the CDF distributions of channel join delay for the ARD, Cuatro, and SF2 channels. We show the distributions for off-peak hours (0AM–6PM) and peak hours (6PM–0AM) separately. Median channel join delay is also presented in a similar fashion in Table IV. According to the CDF distributions, 80% of users experience less than 4 to 8 seconds of join delay, and 50% of users less than 2 seconds. Also, Table IV shows that even a 10-fold increase in the number of concurrent users during peak hours does not unduly lengthen the channel join delay (at most a 22% increase in median join delay).

C. Repeater Node Assignment

As illustrated in Fig. 5, the live coverage of Euro Cup games brought huge flash crowds to the Zattoo system. With typical users contributing only about 25–30% of the average streaming bitrate, Zattoo must subsidize the balance. As described in Section III, Zattoo's Subsidy System assigns Repeater nodes to a channel that requires more bandwidth than its own globally available aggregate upload bandwidth.


Fig. 7. CDF of channel join delay (0–30 seconds) for (a) ARD, (b) Cuatro, and (c) SF2, shown separately for off-peak and peak hours.

Fig. 8. Overlay size and channel provisioning for (a) ARD, (b) Cuatro, and (c) SF2 (overlay size, 0–25,000, and number of Repeaters assigned, 0–200, versus hour of day).


Fig. 8 shows the Subsidy System assigning more Repeater nodes to a channel as flash crowds arrived during each Euro Cup game and then gradually reclaiming them as the flash crowd departed. For each of the channels reported, we chose the date with the biggest flash crowd on the channel. The dip in overlay sizes on the ARD and Cuatro channels occurred during the half-time break of the games. The Subsidy Server was less aggressive in assigning Repeater nodes to the ARD channel, as compared to the other two channels, because the Repeater nodes in the vicinity of the ARD server have higher capacity than those near the other two.

V. CLIENT-SIDE MEASUREMENTS

To further study the P2P overlay beyond details obtainable from aggregated session-level statistics, we ran several modified Zattoo clients which periodically retrieve the internal states of other participating peers in the network by exchanging SEARCH/JOIN messages with them. After a given probe session is over, the monitoring client archives a log file from which we can analyze the control/data traffic exchanged and detailed protocol behavior. We ran the experiment during Zattoo's live coverage of Euro 2008 (June 7th to 29th). The monitoring clients tuned to game channels from one of Zattoo's data centers located in Switzerland while the games were broadcast live. The data sets presented in this paper were collected during the coverage of the championship final on two separate channels: ARD in Germany and Cuatro in Spain. Soccer teams from Germany and Spain participated in the championship final.

As described in Section II-A, Zattoo's peer discovery is guided by peers' topology information. To minimize potential sampling bias caused by our use of a single vantage point for monitoring, we assigned "empty" AS numbers and country codes to our monitoring clients, so that their probing is not geared towards peers located in the same AS and country.

A. Sub-Stream Synchrony

To ensure good viewing quality, a peer should not only obtain all necessary sub-streams (discounting redundant sub-streams), but also have those sub-streams delivered temporally synchronized with each other for proper online decoding. Receiving out-of-sync sub-streams typically results in a pixelated screen on the player. As described in Sections II-A and II-C, Zattoo's protocol favors sub-streams that are relatively in sync when constructing the PDM, and continually monitors the sub-streams' progression over time, replacing those sub-streams that have fallen behind and reconfiguring the PDM when necessary. In this section we measure the effectiveness of Zattoo's adaptive PDM in selecting sub-streams that are largely in sync.

To quantify such inter-sub-stream synchrony, we measure the difference in the latest (i.e., maximum) packet sequence numbers belonging to different incoming sub-streams. When a remote peer responds to a SEARCH query message, it includes in its SEARCH reply the latest sequence numbers that it has received for all sub-streams. If some sub-streams happen to be lossy or stalled at that time, the peer marks such sub-streams in its SEARCH replies. Thus, we can inspect SEARCH replies from existing peers to study their inter-sub-stream synchrony.


Fig. 9. Sub-stream synchrony during the Euro 2008 final on ARD and Cuatro: (a) CDF of the number of bad sub-streams (0–16); (b) CDF of sub-stream synchrony (difference in packets, 0–1000).

In our experiment, we collected SEARCH replies from 4,420 and 6,530 distinct peers on the ARD and Cuatro channels respectively, during the 2-hour coverage of the final game. From the collected SEARCH replies, we check how many sub-streams (out of 16) are “bad” (e.g., lossy or missing) for each peer. Fig. 9(a) shows the CDF of the number of bad sub-streams. According to the figure, about 99% (ARD) and 96% (Cuatro) of peers have 3 or fewer bad sub-streams.

The current Zattoo deployment dedicates k = 3 sub-streams (out of n = 16) to loss recovery. That is, given a segment of 16 consecutive packets, if a peer has received at least 13 packets, it can reconstruct the remaining 3 packets from the RS error correcting code (see Section 1). Thus if a peer can receive any 13 sub-streams out of 16 reliably, it can decode the full stream properly. The result in Fig. 9(a) suggests that the number of bad sub-streams is low enough not to cause quality issues in the Zattoo network.
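The arithmetic behind this claim can be sketched as follows (this is not Zattoo's actual RS decoder; the function name and flag representation are ours for illustration): with n = 16 packets per segment and k = 3 of them carrying redundancy, a segment is decodable as long as any n − k = 13 of its packets arrive.

```python
# Illustrative decodability check for one segment, assuming n = 16 packets
# per segment with k = 3 redundant packets, as stated in the text.

N, K = 16, 3  # segment size and redundancy

def segment_decodable(received_flags):
    """received_flags: 16 booleans, one per packet position in the segment."""
    assert len(received_flags) == N
    return sum(received_flags) >= N - K

# A peer that reliably receives any 13 of the 16 sub-streams can therefore
# decode every segment even if the other 3 sub-streams are bad.
print(segment_decodable([True] * 13 + [False] * 3))   # True
print(segment_decodable([True] * 12 + [False] * 4))   # False
```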

After discounting “bad” sub-streams, we then look at the synchrony of the remaining “good” sub-streams in each peer. Fig. 9(b) shows the CDF of sub-stream synchrony in the two channels. The sub-stream synchrony of a given peer is defined as the difference between the maximum and minimum packet sequence numbers among all its sub-streams, as obtained from the peer's SEARCH reply. For example, if a peer has a sub-stream synchrony of 100, the peer has one sub-stream that is ahead of another sub-stream by 100 packets. If all packets are received in order, the sub-stream synchrony of a peer is at most n − 1. If we received multiple SEARCH replies from the same peer, we average the sub-stream synchrony across all the replies. Given the 500 kbps average channel data rate, 60 consecutive packets roughly correspond to 1 second worth of streaming. Thus, the figure shows that on the Cuatro channel, 20% of peers have their sub-streams completely in-sync, while more than 90% have their sub-streams lagging each other by at most 5 seconds; on the ARD channel, 30% are in-sync, and more than 90% are within 1.5 seconds. The buffer space of the Zattoo player has been dimensioned sufficiently to accommodate such a low degree of out-of-sync sub-streams.
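The per-peer metrics behind Fig. 9 can be computed directly from SEARCH replies. The Python sketch below assumes a hypothetical reply layout (a list of per-sub-stream latest sequence numbers plus a lossy/stalled flag); only the metric definitions and the 60 packets-per-second conversion come from the text.

```python
# Minimal sketch (not Zattoo's implementation) of the Fig. 9 metrics.
from statistics import mean

PACKETS_PER_SECOND = 60  # ~500 kbps stream => ~60 packets/s (from the text)

def bad_substream_count(reply):
    """Number of sub-streams marked lossy or stalled in one SEARCH reply."""
    return sum(1 for s in reply["substreams"] if s["bad"])

def substream_synchrony(reply):
    """Max - min of the latest sequence numbers over the *good* sub-streams."""
    seqnos = [s["latest_seqno"] for s in reply["substreams"] if not s["bad"]]
    return max(seqnos) - min(seqnos) if seqnos else 0

def average_synchrony(replies_from_one_peer):
    """Average sub-stream synchrony across all replies from the same peer."""
    return mean(substream_synchrony(r) for r in replies_from_one_peer)

def packets_to_seconds(packets):
    return packets / PACKETS_PER_SECOND

# Example: one bad sub-stream, and the fastest good sub-stream 120 packets
# (about 2 seconds) ahead of the slowest one.
reply = {"substreams": [{"latest_seqno": 1000 + i, "bad": i == 7} for i in range(16)]}
reply["substreams"][15]["latest_seqno"] = 1120
print(bad_substream_count(reply), substream_synchrony(reply),
      packets_to_seconds(substream_synchrony(reply)))   # 1 120 2.0
```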

Fig. 10. Peer synchrony: CDFs of relative peer synchrony (# of packets, −350 to 100) for the Euro 2008 final on ARD and on Cuatro.

B. Peer Synchrony

While sub-stream synchrony tells us the stream quality different peers may experience, “peer synchrony” tells us how varied in time peers' viewing points are. With small-scale P2P networks, all participating peers are likely to watch the live stream roughly synchronized in time. However, as the size of the P2P overlay grows, the viewing point of edge nodes may be delayed significantly compared to those closer to the Encoding Server. In the experiment, we define the viewing point of a peer as the median of the latest sequence numbers across its sub-streams. Then we choose one peer (e.g., a Repeater node directly connected to the Encoding Server) as a reference point, and compare other peers' viewing points against the reference viewing point.

Fig. 10 shows the CDFs of relative peer synchrony. The relative peer synchrony of peer X is the viewing point of X minus the reference viewing point. So a peer synchrony of −60 means that a given peer's viewing point is delayed by 60 packets (roughly 1 second for a 500 kbps stream) compared to the reference viewing point. A positive value means that a given peer's stream is ahead of the reference point, which can happen for peers that receive streams directly from the Encoding Server. The figure shows that about 1% of peers on ARD and 4% of peers on Cuatro experienced more than 3 seconds (i.e., 180 packets) of delay in streaming compared to the reference viewing point.
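A minimal sketch of this computation, assuming the same hypothetical reply layout as before: the viewing point is the median of a peer's latest sub-stream sequence numbers, and relative peer synchrony is the offset from a reference peer's viewing point.

```python
# Sketch of the peer-synchrony computation; reply format is assumed,
# the definitions (median viewing point, offset from a reference) follow the text.
from statistics import median

PACKETS_PER_SECOND = 60  # ~500 kbps => ~60 packets/s

def viewing_point(latest_seqnos):
    """Viewing point of a peer: median of its latest sub-stream sequence numbers."""
    return median(latest_seqnos)

def relative_peer_synchrony(peer_seqnos, reference_seqnos):
    """Negative values mean the peer lags the reference peer."""
    return viewing_point(peer_seqnos) - viewing_point(reference_seqnos)

# A peer whose viewing point trails the reference by 180 packets is about
# 3 seconds behind the reference viewing point.
ref  = [10_000 + i for i in range(16)]
peer = [ 9_820 + i for i in range(16)]
offset = relative_peer_synchrony(peer, ref)
print(offset, offset / PACKETS_PER_SECOND)   # -180.0 -3.0
```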


Fig. 11. Peer synchrony vs. peer hops from the Encoding Server: CDFs of peer synchrony (−350 to 100 packets) for peers 2 to 6 hops from the Encoding Server. (a) ARD. (b) Cuatro.

To better understand to what extent a peer's position in the overlay affects peer synchrony, we plot in Fig. 11 the CDFs of peer synchrony at different depths, i.e., distances from the Encoding Server. We look at how much delay is introduced for sub-streams traversing i peers from the Encoding Server (i.e., depth i). For this purpose, we associate per-sub-stream viewing point information from a SEARCH reply with per-sub-stream overlay depth information from a JOIN reply, where the SEARCH/JOIN replies were sent by the same peer close in time (e.g., less than 1 second apart). If the length of the peer-hop path from the Encoding Server has an adverse effect on playback delay and viewing point, we expect peers further away from the Encoding Server to be further offset from the reference viewing point, resulting in CDFs that are shifted towards the upper left corner for peers at greater distances from the Encoding Server. The figure shows an absence of such a leftward shift in the CDFs. The median delay at depth 6 does not grow by more than 0.5 seconds compared to the median delay at depth 2. The figure shows that having more peers further away from the Encoding Server increases the number of peers that are 0.5 seconds behind the reference viewing point, without increasing the offset in viewing point itself. On the Zattoo network, once the virtual circuits comprising a PDM have been set up, each packet is streamed with minimal delay through each peer. Hence each peer-hop from the Encoding Server introduces delay only in the tens-of-milliseconds range, attesting to the suitability of the network architecture for carrying live media streaming on large-scale P2P networks.

C. Effectiveness of ECC in Isolating Loss

Now we investigate the effects of overlay size on the performance scalability of the Zattoo system. Here we focus on client-side quality (e.g., loss rate). As described in Section 1, Zattoo-broadcast media streams are RS encoded, which allows peers to reconstruct a full stream once they obtain at least n − k of the n sub-streams. Since the ECC-encoded stream reconstruction occurs hop by hop in the overlay, it can mask sporadic packet losses, and thus prevent packet losses from being propagated throughout the overlay at large.

To see whether such packet loss containment actually occurs in the production system, we run the following experiments. We let our monitoring client join a game channel, stay tuned for 15 seconds, and then leave. We wait for 10 seconds after the 15-second session is over. We repeat this process throughout the 2-hour game coverage and collect the logs. The log file from each session contains a complete list of packet sequence numbers received from the connected peers during the session, from which we can detect upstream packet losses. We then associate individual packet losses with the peer-path from the Encoding Server. This gives us a rough sense of whether packets traversing longer hops experience a higher loss rate. In our analysis, we discount packets delivered during the first 3 seconds of a session to allow the PDM to stabilize.
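A simplified post-processing sketch of this analysis is given below. The log record layout (per-session depth of each sub-stream, the set of received sequence numbers, and round-robin striping so that seqno mod 16 identifies the sub-stream) is assumed for illustration; the method of inferring losses from gaps, bucketing them by overlay depth, and discounting the first 3 seconds follows the text.

```python
# Sketch of loss-rate-by-depth post-processing over assumed session logs.
from collections import defaultdict

PACKETS_PER_SECOND = 60
WARMUP_PACKETS = 3 * PACKETS_PER_SECOND   # skip ~3 s while the PDM stabilizes

def loss_rate_by_depth(session_logs, n_substreams=16):
    """session_logs: iterable of dicts with
         "sub_depth": {sub-stream index -> hops from the Encoding Server},
         "received":  set of packet sequence numbers seen in the session,
         "first", "last": first/last sequence number the session should cover."""
    lost, expected = defaultdict(int), defaultdict(int)
    for log in session_logs:
        start = log["first"] + WARMUP_PACKETS    # discount the first ~3 seconds
        for seqno in range(start, log["last"] + 1):
            depth = log["sub_depth"][seqno % n_substreams]
            expected[depth] += 1
            lost[depth] += seqno not in log["received"]   # count inferred losses
    return {d: 100.0 * lost[d] / expected[d] for d in expected}
```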

Fig. 12 shows how the average packet loss rate changes across different overlay depths (in all cases < 1%). On both channels, the packet loss rate does not grow with overlay depth. This result confirms our expectation that ECC helps localize packet losses on the overlay.

VI. PEER-TO-PEER SHARING RATIO

The premise of a given P2P system's success in providing scalable stream distribution is sufficient bandwidth sharing by participating users [5]. Section IV-A shows that the average sharing ratio of Zattoo users ranges from 0.2 to 0.35, which translates into uplink bandwidth ranging from 100 kbps to 175 kbps. This is far lower than the numbers reported as typical uplink bandwidth in countries where Zattoo is available [6]. Aside from the possibility that a user's bandwidth may be shared with other applications, we find factors such as user behavior, support for variable-bit-rate encoding, and heterogeneous NAT environments contributing to sub-optimal sharing performance. Designers of P2P streaming systems must pay attention to these factors to achieve good bandwidth sharing. However, one must also keep in mind that improving bandwidth sharing should not come at the expense of user viewing experience, e.g., due to more frequent uplink bandwidth saturation.


Fig. 12. Packet loss rate (%) vs. peer hops from the Encoding Server (0 to 7). (a) ARD. (b) Cuatro.

Fig. 13. Receiver-peer churn frequency: average receiver-peer churns vs. session length (min).

User churns: It is known that frequent user churns accompanied by short sessions, as well as flash crowd behavior, pose a unique challenge in live streaming systems [7], [8]. User churns can occur in both upstream (e.g., forwarding peers) and downstream (e.g., receiver peers) connectivity on the overlay. While frequent churns of forwarding peers can cause quality problems, frequent changes of receiver peers can lead to under-utilization of a forwarding peer's uplink capacity.

To estimate receiver peers' churn behavior, we visualize in Fig. 13 the relationship between session length and the frequency of receiver-peer churns experienced during the sessions. We studied receiver-peer churn behavior for the Cuatro channel by using Zattoo's feedback logs described in Table II. The y-value at session length x minutes denotes the average number of receiver-peer churns experienced for sessions with length ∈ [x, x + 1). According to the figure, the receiver-peer churn frequency tends to grow linearly with session length, and peers experience approximately one receiver-peer churn every minute.
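The bucketing behind Fig. 13 can be sketched as follows, assuming the feedback logs yield one (session length, receiver-peer churn count) pair per session; the [x, x+1) bucket definition follows the text.

```python
# Sketch: average receiver-peer churns per one-minute session-length bucket.
from collections import defaultdict

def churns_by_session_length(sessions):
    """sessions: iterable of (length_minutes: float, churn_count: int)."""
    totals = defaultdict(lambda: [0, 0])      # bucket -> [sum of churns, #sessions]
    for length, churns in sessions:
        bucket = int(length)                  # sessions with length in [x, x+1)
        totals[bucket][0] += churns
        totals[bucket][1] += 1
    return {x: s / n for x, (s, n) in totals.items()}

# With roughly one receiver-peer churn per minute, a 30-minute session would
# contribute about 30 churns to the x = 30 bucket.
```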

Variable-bit-rate streams: Variable-bit-rate (VBR) encoding is commonly used in production streaming systems, including Zattoo, due to its better quality-to-bandwidth ratio compared to constant-bit-rate encoding. However, VBR streams may put additional strain on a peer's bandwidth contribution and quality optimization. As described in Section II-C, the presence of VBR streams requires peers to perform measurement-based admission control when allocating resources to set up a virtual circuit. To avoid degradation of service due to overloading of the uplink bandwidth, the measurement-based admission control module must be conservative both in its reaction to increases in available bandwidth and in its allocation of available bandwidth to newly joining peers. This conservativeness necessarily leads to under-utilization of resources.
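The admission test itself is not specified here; a minimal illustrative rule of the conservative kind described above might look like the following sketch, where the safety margin, names, and rate figures are our own assumptions rather than Zattoo's actual controller.

```python
# Illustrative conservative admission test (not Zattoo's implementation):
# admit forwarding of one more sub-stream only if the measured spare uplink
# bandwidth exceeds the sub-stream's peak rate by a safety margin, so that
# VBR bursts are less likely to saturate the uplink.

SAFETY_MARGIN = 1.25   # assumed head-room factor, for illustration only

def admit_substream(measured_spare_bps, substream_peak_bps, margin=SAFETY_MARGIN):
    """Return True if one more sub-stream can be forwarded safely."""
    return measured_spare_bps >= margin * substream_peak_bps

# A peer with 60 kbps of measured spare uplink would decline a ~50 kbps-peak
# sub-stream under this rule (60 < 1.25 * 50), trading utilization for quality.
print(admit_substream(60_000, 50_000))   # False
```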

NAT reachability: Asymmetric reachability imposed by prevalent NAT boxes adds to the difficulty of achieving full utilization of users' uplink capacity. The Zattoo delivery system supports 6 different NAT configurations: open host, full cone, IP-restricted, port-restricted, symmetric, and UDP-disabled, listed in increasing degree of restrictiveness. Not every pairwise communication can occur among different NAT types. For example, peers behind a port-restricted NAT box cannot communicate with those of the symmetric NAT type (we documented a NAT reachability matrix in an earlier work [5]).
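As an illustration of such a pairwise check (the full matrix is documented in [5] and not reproduced here), the sketch below marks only the port-restricted/symmetric pairing stated above as unreachable and treats every other pair as reachable, purely to show the shape of the test; the remaining entries would come from the matrix in [5].

```python
# Illustrative reachability check over the six NAT types listed above.
NAT_TYPES = ["open", "full_cone", "ip_restricted", "port_restricted",
             "symmetric", "udp_disabled"]

UNREACHABLE_PAIRS = {
    frozenset({"port_restricted", "symmetric"}),   # the one case stated in the text
    # ... remaining restrictive pairs per the matrix in [5]
}

def can_exchange_traffic(nat_a, nat_b):
    assert nat_a in NAT_TYPES and nat_b in NAT_TYPES
    return frozenset({nat_a, nat_b}) not in UNREACHABLE_PAIRS

print(can_exchange_traffic("port_restricted", "symmetric"))  # False
print(can_exchange_traffic("full_cone", "symmetric"))        # True (illustrative)
```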

To examine how this varied NAT reachability affects sharing performance, we performed the following two controlled experiments. In both experiments, we run four Zattoo clients, each with a distinct NAT type, tuned to the same channel concurrently for 30 minutes. In the first experiment, we fixed the maximum allowable uplink capacity (denoted max_cp) of those clients to the same constant value.3 In the second experiment, we let the max_cp of those clients self-adjust over time (which is the default setting of the Zattoo player). In both experiments, we monitored how the uplink bandwidth utilization ratio (i.e., the ratio between actively used uplink bandwidth and the maximum allowable bandwidth max_cp) changes over time. The experiments were repeated three times for reliability.

Fig. 14 shows the capacity utilization ratio from these two experiments for the four different NAT types. Fig. 14(a) visualizes how fast the capacity utilization ratio converges to 1.0 over time after the client joins the network. Fig. 14(b) plots the average utilization ratio, i.e., [∫ current_capacity(t) dt] / [∫ max_cp(t) dt].

3 The experiments were performed from one of Zattoo's data centers in Europe, and there was sufficient uplink bandwidth available to support 4 × max_cp.
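Assuming the client logs periodic samples of its currently used uplink bandwidth and of max_cp, the average utilization ratio of Fig. 14(b) can be approximated by discretizing the two integrals, e.g.:

```python
# Sketch of the average-utilization computation, approximating the two time
# integrals with simple rectangular integration over assumed log samples.

def average_utilization(samples):
    """samples: list of (t_seconds, used_bps, max_cp_bps), sorted by time."""
    used_area, cap_area = 0.0, 0.0
    for (t0, used, cap), (t1, _, _) in zip(samples, samples[1:]):
        dt = t1 - t0
        used_area += used * dt    # approximates the integral of current_capacity(t)
        cap_area  += cap * dt     # approximates the integral of max_cp(t)
    return used_area / cap_area if cap_area else 0.0

# e.g., a client whose uplink is saturated half of the time at a fixed max_cp
# has an average utilization ratio of about 0.5.
```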


In both the constant and adjustable cases, the results clearly exemplify the varied sharing performance across different NAT types. In particular, the “symmetric” NAT type, which is the most restrictive among the four, shows an inferior sharing ratio compared to the rest. Unfortunately, however, the “symmetric” NAT type is the second most popular NAT type among the Zattoo population [5], and therefore can seriously affect Zattoo's overall sharing performance. There is a relatively small number of peers running the “IP-restricted” NAT type, hence its sharing performance has not been fine-tuned at Zattoo.

Reliability of NAT detection: To allow communications among clients behind a NAT gateway (i.e., NATed clients), each client in Zattoo performs a NAT detection procedure upon player startup to identify its NAT configuration and advertises it to the rest of the Zattoo network. Zattoo's NAT detection procedure implements the standard UDP-based STUN protocol, which involves communicating with external STUN servers to discover the presence and type of a NAT gateway [9]. The communication occurs over UDP with no reliable transport guarantee. Lost or delayed STUN UDP packets may lead to inaccurate NAT detection, preventing NATed clients from contacting each other, and thereby adversely affecting their sharing performance.

To understand the reliability and accuracy of the STUN-based NAT detection procedure, we utilize Zattoo's session database (Table I), which stores, among other things, the NAT information, public IP address, and private IP address of each session. We assume that all sessions associated with the same public/private IP address pair are generated under the same NAT configuration. If sessions with the same public/private IP address pair report inconsistent NAT detection results, we consider those sessions as having experienced failed NAT detection, and apply the majority rule to determine the correct result for that configuration.
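A sketch of this majority-rule analysis, assuming session records expose the public IP, private IP, and detected NAT type (field names are ours):

```python
# Sketch: group sessions by (public IP, private IP), take the most common NAT
# result per group as ground truth, and count every disagreeing session as a
# failed detection.
from collections import Counter, defaultdict

def nat_detection_failure_rate(sessions):
    """sessions: iterable of dicts with keys "public_ip", "private_ip", "nat_type"."""
    groups = defaultdict(list)
    for s in sessions:
        groups[(s["public_ip"], s["private_ip"])].append(s["nat_type"])
    failed = total = 0
    for results in groups.values():
        _majority_type, majority_count = Counter(results).most_common(1)[0]
        failed += len(results) - majority_count   # sessions disagreeing with the majority
        total += len(results)
    return 100.0 * failed / total if total else 0.0
```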

Fig. 15 plots the daily trend of the NAT detection failure rate derived from the ARD, Cuatro and SF2 channels during the month of June 2008. The NAT detection failure rate at hour x is the number of bogus NAT detection results divided by the total number of NAT detections occurring in that hour. The total number of NAT detections indicates how busy the Zattoo system was throughout the day. The NAT detection failure rate grows from around 1.5% at 2–3 AM to a peak of almost 6% at 8 PM. This means that at the busiest times, the NAT type of about 6% of clients is not determinable, leading to those clients not contributing any bandwidth to the P2P network.

VII. RELATED WORKS

Aside from Zattoo, several commercial peer-to-peer systems intended for live streaming have been introduced since 2005, notably PPLive, PPStream, SopCast, TVAnts, and UUSee from China, and Joost, Livestation, Octoshape, and RawFlow from the EU. A large number of measurement studies have been done on one or another of these systems [7], [10]–[16]. Many research prototypes and improvements to existing P2P systems have also been proposed and evaluated [4], [17]–[24]. Our study is unique in that we are able to collect network core data from a large production system with over 3 million registered users, with intimate knowledge of the underlying network architecture and protocol.

Fig. 15. NAT detection failure rate (%) and number of NAT detections vs. hour of day (0 to 24).

P2P systems are usually classified as either tree-based push or mesh-based swarming [25]. In tree-based push schemes, peers organize themselves into multiple distribution trees [26]–[28]. In mesh-based swarming, peers form a randomly connected mesh, and content is swarmed via dynamically constructed directed acyclic paths [21], [29], [30]. Zattoo is not only a hybrid between these two, similar to the latest version of Coolstreaming [4]; its dependence on Repeater nodes also makes it a hybrid of a P2P network and a content-distribution network (CDN), similar to PPLive [7].

VIII. CONCLUSION

We have presented a receiver-based, peer-division multiplexing engine to deliver live streaming content on a peer-to-peer network. The same engine can be used to transparently build a hybrid P2P/CDN delivery network by adding Repeater nodes to the network. By analyzing a large amount of usage data collected on the network during one of the largest viewing events in Europe, we have shown that the resulting network can scale to a large number of users and can take good advantage of available uplink bandwidth at peers. We have also shown that error-correcting codes and packet retransmission can help improve network stability by isolating packet losses and preventing transient congestion from resulting in PDM reconfigurations. We have further shown that the PDM and adaptive PDM schemes presented have small enough overhead to make our system competitive with digital satellite TV in terms of channel switch time, stream synchronization, and signal lag.

ACKNOWLEDGMENT

The authors would like to thank Adam Goodman for developing the Authentication Server, Alvaro Saurin for the error correcting code, Zhiheng Wang for an early version of the stream buffer management, Eric Wucherer for the Rendezvous Server, Zach Steindler for the Subsidy System, and the rest of the Zattoo development team for fruitful collaboration.

REFERENCES

[1] R. auf der Maur, “Die Weiterverbreitung von TV- und Radioprogrammen über IP-basierte Netze,” in Entertainment Law (f. d. Schweiz), 1st ed. Stämpfli Verlag, 2006.

[2] UEFA, “Euro2008,” http://www1.uefa.com/.


Fig. 14. Uplink capacity utilization ratio for the full cone, IP-restricted, port-restricted, and symmetric NAT types. (a) Constant maximum capacity: current capacity utilization ratio vs. time since joining (min). (b) Adjustable maximum capacity: average capacity utilization ratio.

[3] S. Lin and D. J. Costello, Jr., Error Control Coding, 2nd ed. Pearson Prentice-Hall, 2004.

[4] S. Xie, B. Li, G. Y. Keung, and X. Zhang, “CoolStreaming: Design, Theory, and Practice,” IEEE Trans. on Multimedia, vol. 9, no. 8, December 2007.

[5] K. Shami et al., “Impacts of Peer Characteristics on P2PTV Networks Scalability,” in Proc. IEEE INFOCOM Mini Conference, April 2009.

[6] Bandwidth-test.net, “Bandwidth test statistics across different countries,” http://www.bandwidth-test.net/stats/country/.

[7] X. Hei, C. Liang, J. Liang, Y. Liu, and K. W. Ross, “Insights into PPLive: A Measurement Study of a Large-Scale P2P IPTV System,” in Proc. IPTV Workshop, International World Wide Web Conference, 2006.

[8] B. Li et al., “An Empirical Study of Flash Crowd Dynamics in a P2P-Based Live Video Streaming System,” in Proc. IEEE Global Telecommunications Conference, 2008.

[9] J. Rosenberg et al., “STUN - Simple Traversal of User Datagram Protocol (UDP) Through Network Address Translators (NATs),” 2003, RFC 3489.

[10] A. Ali, A. Mathur, and H. Zhang, “Measurement of Commercial Peer-to-Peer Live Video Streaming,” in Proc. Workshop on Recent Advances in Peer-to-Peer Streaming, August 2006.

[11] L. Vu et al., “Measurement and Modeling of a Large-scale Overlay for Multimedia Streaming,” in Proc. Int'l Conf. on Heterogeneous Networking for Quality, Reliability, Security and Robustness, 2007.

[12] T. Silverston and O. Fourmaux, “Measuring P2P IPTV Systems,” in Proc. ACM NOSSDAV, November 2008.

[13] C. Wu, B. Li, and S. Zhao, “Magellan: Charting Large-Scale Peer-to-Peer Live Streaming Topologies,” in Proc. ICDCS'07, 2007, p. 62.

[14] M. Cha et al., “On Next-Generation Telco-Managed P2P TV Architectures,” in Proc. Int'l Workshop on Peer-to-Peer Systems, 2008.

[15] D. Ciullo et al., “Understanding P2P-TV Systems Through Real Measurements,” in Proc. IEEE GLOBECOM, November 2008.

[16] E. Alessandria et al., “P2P-TV Systems under Adverse Network Conditions: a Measurement Study,” in Proc. IEEE INFOCOM, April 2009.

[17] S. Banerjee et al., “Construction of an Efficient Overlay Multicast Infrastructure for Real-time Applications,” in Proc. IEEE INFOCOM, 2003.

[18] D. Tran, K. Hua, and T. Do, “ZIGZAG: An Efficient Peer-to-Peer Scheme for Media Streaming,” in Proc. IEEE INFOCOM, 2003.

[19] D. Kostic et al., “Bullet: High Bandwidth Data Dissemination Using an Overlay Mesh,” in Proc. ACM SOSP, Bolton Landing, NY, USA, October 2003.

[20] R. Rejaie and S. Stafford, “A Framework for Architecting Peer-to-Peer Receiver-driven Overlays,” in Proc. ACM NOSSDAV, 2004.

[21] X. Liao et al., “AnySee: Peer-to-Peer Live Streaming,” in Proc. IEEE INFOCOM, April 2006.

[22] F. Pianese et al., “PULSE: An Adaptive, Incentive-Based, Unstructured P2P Live Streaming System,” IEEE Trans. on Multimedia, vol. 9, no. 6, 2007.

[23] J. Douceur, J. Lorch, and T. Moscibroda, “Maximizing Total Upload in Latency-Sensitive P2P Applications,” in Proc. ACM SPAA, 2007, pp. 270–279.

[24] Y.-W. Sung, M. Bishop, and S. Rao, “Enabling Contribution Awareness in an Overlay Broadcasting System,” in Proc. ACM SIGCOMM, 2006.

[25] N. Magharei, R. Rejaie, and Y. Guo, “Mesh or Multiple-Tree: A Comparative Study of Live P2P Streaming Approaches,” in Proc. IEEE INFOCOM, May 2007.

[26] V. N. Padmanabhan, H. J. Wang, and P. A. Chou, “Resilient Peer-to-Peer Streaming,” in Proc. IEEE ICNP, November 2003.

[27] M. Castro et al., “SplitStream: High-Bandwidth Multicast in Cooperative Environments,” in Proc. ACM SOSP, October 2003.

[28] J. Liang and K. Nahrstedt, “DagStream: Locality Aware and Failure Resilient Peer-to-Peer Streaming,” in Proc. SPIE Multimedia Computing and Networking, January 2006.

[29] N. Magharei and R. Rejaie, “PRIME: Peer-to-Peer Receiver-drIven MEsh-based Streaming,” in Proc. IEEE INFOCOM, May 2007.

[30] M. Hefeeda et al., “PROMISE: Peer-to-Peer Media Streaming Using CollectCast,” in Proc. ACM Multimedia, November 2003.

Hyunseok Chang received the B.S. degree in electrical engineering from Seoul National University, Seoul, Korea, in 1998, and the M.S.E. and Ph.D. degrees in computer science and engineering from the University of Michigan, Ann Arbor, in 2001 and 2006, respectively. He is currently a Member of Technical Staff with the Network Protocols and Systems Department, Alcatel-Lucent Bell Labs, Holmdel, NJ. Prior to that, he was a Software Architect at Zattoo, Inc., Ann Arbor, MI. His research interests include content distribution networks, mobile applications, and network measurements.

Sugih Jamin received the Ph.D. degree in computer science from the University of Southern California, Los Angeles, in 1996. He is an Associate Professor with the Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor. He cofounded a peer-to-peer live streaming company, Zattoo, Inc., Ann Arbor, MI, in 2005. Dr. Jamin received the National Science Foundation (NSF) CAREER Award in 1998, the Presidential Early Career Award for Scientists and Engineers (PECASE) in 1999, and the Alfred P. Sloan Research Fellowship in 2001.

Wenjie Wang received the Ph.D. degree in computer science and engineering from the University of Michigan, Ann Arbor, in 2006. He is a Research Staff Member with IBM Research China, Beijing, China. He co-founded a peer-to-peer live streaming company, Zattoo Inc., Ann Arbor, MI, in 2005, based on his Ph.D. dissertation, and was its CTO. He also worked in Microsoft Research China and Oracle China as a visiting student and consultant intern.
