Challenges, Design and Analysis of a Large-scale P2P-VoD System
Yan Huang, Tom Z. J. Fu, Dah-Ming Chiu, John C. S. Lui and Cheng Huang
2008. 10. 6. SeungHo Lee


Outline

• P2P overview
• An architecture of a P2P-VoD system
• Performance metrics
• Measurement results and analysis
• Future works

P2P Overview

Advantages of P2P
• Users help each other, so the server load is significantly reduced.
• P2P increases robustness in case of failures by replicating data over multiple peers.

P2P services
• P2P file downloading: BitTorrent and eMule
• P2P live streaming: CoolStreaming and PPLive
• P2P video-on-demand (P2P-VoD): PPLive
  – Like P2P streaming systems, P2P-VoD systems deliver content by streaming, but peers can watch different parts of a video at the same time.
  – P2P-VoD systems require each user to contribute a small amount of storage (usually 1 GB), instead of only the in-memory playback buffer used in P2P streaming systems.

[Ref] P2P Protocols and Applications

Network or protocol, with its use and applications:
• BitTorrent
  – Use: file sharing / software distribution / media distribution
  – Applications: ABC, AllPeers, Vuze (formerly Azureus), BitComet, BitLord, BitTornado, BitTorrent, Burst!, Deluge, FlashGet, G3 Torrent, Halite, KTorrent, LimeWire, MLDonkey, Opera, Panthera, QTorrent, rTorrent, Shareaza, TorrentFlux, Transmission, Tribler, µTorrent, Thunder
• eDonkey
  – Use: file sharing
  – Applications: aMule, eDonkey2000 (discontinued), eMule, eMule Plus, FlashGet, iMesh, Jubster, lMule, MLDonkey, Morpheus, Panthera, Pruna, Shareaza, xMule
• Gnutella
  – Use: file sharing
  – Applications: Acquisition, BearShare, Cabos, FilesWire, FrostWire, Gnucleus, Grokster, gtk-gnutella, iMesh, Kiwi Alpha, LimeWire, MLDonkey, Morpheus, MP3 Rocket, Panthera, Poisoned, Shareaza, Swapper, XoloX
• Napster
  – Use: file sharing
  – Applications: Napigator, Napster
• P2PTV
  – Use: video stream / file sharing
  – Applications: TVUPlayer, Joost, CoolStreaming, Cybersky-TV, TVants, PPLive, LiveStation

P2P-VoD system

Major components
• Peers
• Servers: the source of content
• Trackers: help peers connect to other peers that share the same content
• A bootstrap server: helps peers find a suitable tracker and perform other bootstrapping functions
• Other servers
  – Log servers: log significant events for data measurement
  – Transit servers: help peers behind NAT boxes

Segment sizes

How to divide a video into multiple pieces
• A small segment size gives more flexibility in scheduling which piece should be uploaded from which neighboring peer.
• The larger the segment size, the smaller the overhead.
  – Header overhead
  – Bitmap overhead
  – Protocol overhead
• The video player expects a certain minimum size for a piece of content to be viewable.
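The trade-off between segment size and overhead can be illustrated with a rough calculation. The sketch below is only an illustration: the 400 MiB movie, the 32-byte header, and the candidate segment sizes are assumed values, not figures from the slides, and the one-bit-per-segment bitmap is a simplifying assumption.

```python
import math

def bitmap_bytes(movie_bytes, segment_bytes):
    """Size in bytes of an availability bitmap with one bit per segment."""
    segments = math.ceil(movie_bytes / segment_bytes)
    return math.ceil(segments / 8)

def header_overhead(segment_bytes, header_bytes):
    """Fraction of transmitted bytes spent on per-segment headers."""
    return header_bytes / (segment_bytes + header_bytes)

MOVIE = 400 * 1024 * 1024   # hypothetical movie size (assumption)
HEADER = 32                 # hypothetical per-segment header size (assumption)

for size in (16 * 1024, 256 * 1024, 2 * 1024 * 1024):
    print(size, bitmap_bytes(MOVIE, size), header_overhead(size, HEADER))
```

Under these assumptions, larger segments shrink both the bitmap and the header share, at the cost of coarser scheduling.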

Segmentation of a movie in PPLive’s VoD system

Replication Strategy

Goal
• To make chunks as available to the user population as possible, meeting users’ viewing demand without incurring excessive additional overhead

Considerations
• Whether to allow multiple movies to be cached
  – Multiple movie cache (MVC) / single movie cache (SVC)
• Whether to pre-fetch or not
• Which chunk/movie to remove when the disk cache is full
  – Least recently used (LRU) / least frequently used (LFU)
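The difference between the two removal policies can be sketched with a small cache model. This is a minimal illustration, not PPLive's implementation; the `ChunkCache` class and its method names are hypothetical.

```python
from collections import Counter, OrderedDict

class ChunkCache:
    """Toy disk cache that evicts by LRU or LFU when full (illustrative only)."""

    def __init__(self, capacity, policy="LRU"):
        self.capacity = capacity
        self.policy = policy
        self.chunks = OrderedDict()  # chunk id -> None, oldest access first
        self.hits = Counter()        # chunk id -> access count while cached

    def access(self, chunk_id):
        """Record an access; return the evicted chunk id, or None."""
        evicted = None
        if chunk_id in self.chunks:
            self.chunks.move_to_end(chunk_id)  # now the most recently used
        else:
            if len(self.chunks) >= self.capacity:
                evicted = self._evict()
            self.chunks[chunk_id] = None
        self.hits[chunk_id] += 1
        return evicted

    def _evict(self):
        if self.policy == "LRU":
            victim, _ = self.chunks.popitem(last=False)  # least recently used
        else:
            # LFU: fewest accesses; ties go to the least recently used chunk.
            victim = min(self.chunks, key=lambda c: self.hits[c])
            del self.chunks[victim]
        del self.hits[victim]
        return victim
```

With the access sequence a, a, a, b, c and a capacity of two chunks, the LRU policy evicts a (its last access is oldest) while the LFU policy evicts b (it was accessed only once).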

Content Discovery

Content advertising and look-up methods
• Trackers
  – Used to keep track of which peers replicate a given movie
  – As soon as a user starts watching a movie, the peer informs its tracker that it is replicating that movie.
  – When a peer wants to start watching a movie, it asks the tracker which other peers have that movie.
• Gossip method
  – Peers discover where chunks are by gossiping with each other.
  – This reduces reliance on the tracker and makes the system more robust.
• DHT
  – Used to automatically assign movies to trackers to achieve some level of load balancing
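The tracker's announce and look-up roles, and the idea of spreading movies over trackers by hashing, can be sketched as follows. This is a simplified stand-in under stated assumptions: the `Tracker` class, `tracker_for`, and the tracker names are hypothetical, and plain modulo hashing is used in place of a real DHT.

```python
import hashlib
from collections import defaultdict

def tracker_for(movie_id, trackers):
    """Pick a tracker for a movie by hashing its id (stand-in for a DHT)."""
    digest = hashlib.sha1(movie_id.encode("utf-8")).digest()
    return trackers[int.from_bytes(digest[:8], "big") % len(trackers)]

class Tracker:
    """Keeps track of which peers replicate which movie."""

    def __init__(self):
        self.replicas = defaultdict(set)  # movie id -> set of peer ids

    def announce(self, peer, movie_id):
        """Called when a peer starts watching (and thus replicating) a movie."""
        self.replicas[movie_id].add(peer)

    def lookup(self, movie_id, asking_peer=None):
        """Return the other peers known to replicate the movie."""
        return sorted(self.replicas.get(movie_id, set()) - {asking_peer})

trackers = ["tracker-0", "tracker-1", "tracker-2"]  # hypothetical names
print(tracker_for("movie-42", trackers))
```

A real deployment would likely use consistent hashing so that adding or removing a tracker moves only a small share of the movies.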

Piece Selection

Which piece to download first
• Sequential: select the piece that is closest to what is needed for video playback.
• Rarest first: selecting the rarest piece helps speed up the spread of pieces, which indirectly helps streaming quality.
• Anchor-based: when a user tries to jump to a particular location in the movie and the piece for that location is missing, the closest anchor point is used instead.
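The three selection rules can be sketched as small functions over piece indices. The function names, the integer piece indices, and the tie-breaking rules are assumptions made for illustration; how a real client mixes the strategies is not specified here.

```python
def sequential(missing, playback_pos):
    """Pick the missing piece closest to, at or after, the playback position."""
    ahead = [p for p in missing if p >= playback_pos]
    return min(ahead) if ahead else None

def rarest_first(missing, copies):
    """Pick the missing piece held by the fewest neighbors (at least one)."""
    candidates = [p for p in missing if copies.get(p, 0) > 0]
    if not candidates:
        return None
    return min(candidates, key=lambda p: (copies[p], p))

def seek(target, have, anchors):
    """Jump to the target if its piece is stored, else to the closest anchor."""
    if target in have:
        return target
    return min(anchors, key=lambda a: (abs(a - target), a))

print(sequential({3, 7, 9}, 5))                   # next piece needed for playback
print(rarest_first({3, 7, 9}, {3: 4, 7: 1, 9: 2}))
print(seek(130, have={0, 1}, anchors=[0, 100, 200]))
```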

Transmission Strategy

Goals
• Maximize the downloading rate
• Minimize the overhead

Strategies (by level of aggressiveness)
• A peer can send a request for the same content to multiple neighbors simultaneously.
• A peer can request different content from multiple neighbors simultaneously (PPLive’s choice).
  – For a playback rate of 500 Kbps, 8-20 neighbors works best. More neighbors can still improve the achieved rate, but at the expense of a heavy duplication rate.
• A peer can work with one neighbor at a time.
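The middle strategy, requesting different content from several neighbors at once, can be sketched as a simple assignment problem. The function below is an illustrative sketch, not PPLive's scheduler: `assign_requests`, the least-loaded rule, and the default of 8 neighbors (taken from the low end of the 8-20 range above) are assumptions.

```python
def assign_requests(pieces, neighbors_have, max_neighbors=8):
    """Assign each wanted piece to exactly one neighbor that holds it.

    Each piece goes to the least-loaded holder, so no piece is requested
    twice and the load is spread across neighbors.
    """
    load = {n: 0 for n in list(neighbors_have)[:max_neighbors]}
    plan = {}
    for piece in pieces:
        holders = [n for n in load if piece in neighbors_have[n]]
        if not holders:
            continue  # no neighbor has this piece yet
        chosen = min(holders, key=lambda n: (load[n], n))
        plan[piece] = chosen
        load[chosen] += 1
    return plan

neighbors = {"A": {1, 2, 3}, "B": {2, 3}, "C": {3}}
print(assign_requests([1, 2, 3, 4], neighbors))
```

Because every piece is requested from a single neighbor, this approach avoids the duplicate transfers that the most aggressive strategy would cause.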

Other Design Issues

NAT and firewalls
• Discovering different types of NAT boxes
• Pacing the upload rate and request rate

Content authentication
• Chunk-level authentication
  – Some pieces may be polluted and cause a poor viewing experience locally at a peer.
  – If a peer detects that a chunk is bad, it discards the chunk.
• Piece-level authentication
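Chunk-level authentication can be sketched as a digest check against a trusted value. The slides do not say which digest is used or how the expected value is distributed, so SHA-256 and the function names below are assumptions.

```python
import hashlib

def chunk_digest(data):
    """Digest of a chunk's bytes (SHA-256 is an assumption, not from the slides)."""
    return hashlib.sha256(data).hexdigest()

def accept_chunk(data, expected_digest):
    """Keep the chunk only if its digest matches the trusted value."""
    return chunk_digest(data) == expected_digest

good = b"x" * 1024
trusted = chunk_digest(good)            # would come from a trusted source
polluted = good[:-1] + b"y"
print(accept_chunk(good, trusted), accept_chunk(polluted, trusted))
```

A peer that gets a mismatch would discard the chunk and fetch it again from another neighbor.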

What to measure

User behavior
• Includes user arrival patterns and how long users stay watching a movie
• Used to improve the design of the replication strategy

External performance metrics
• Include user satisfaction and server load
• Used to measure the system performance perceived externally

Health of replication
• Measures how well a P2P-VoD system is replicating content
• Used to infer how well an important component of the system is doing

User Behavior

MVR (movie viewing record)

User Satisfaction

Simple fluency
• Measures the fraction of time a user spends watching a movie out of the total time spent waiting for and watching that movie

R(m, i): the set of all MVRs for a given movie m and user i
n(m, i): the number of MVRs in R(m, i)
r: one of the MVRs in R(m, i)
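The slide's formula did not survive in this transcript, so the following is only a sketch of the verbal definition: time spent watching divided by time spent waiting plus watching, summed over the records in R(m, i). The `MVR` fields used here (`start`, `end`, `watching`) are hypothetical and may not match the actual record format.

```python
from dataclasses import dataclass

@dataclass
class MVR:
    start: float    # start time of the record, in seconds (assumed field)
    end: float      # end time of the record, in seconds (assumed field)
    watching: bool  # True while viewing, False while waiting (assumed field)

def simple_fluency(records):
    """Fraction of total time (waiting + watching) that was spent watching."""
    total = sum(r.end - r.start for r in records)
    watched = sum(r.end - r.start for r in records if r.watching)
    return watched / total if total > 0 else 0.0

# 10 s of waiting followed by 90 s of viewing.
records = [MVR(0, 10, False), MVR(10, 100, True)]
print(simple_fluency(records))
```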

User Satisfaction (cont’)

User satisfaction index
• Considers the quality of the delivery of the content

r(Q): a grade for the average viewing quality for an MVR r

Health of Replication

Three levels
• Movie level
  – The number of active peers who have advertised storing chunks of that movie
  – The information that the tracker collects about movies
• Weighted movie level
  – Considers the fraction of chunks a peer has in computing the index
• Chunk bitmap level
  – The number of copies of each chunk of a movie stored by peers
  – Various other statistics can be computed: the average number of copies of a chunk in a movie, the minimum number of copies, and the variance of the number of copies.
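The three levels can be sketched from per-peer chunk bitmaps. The representation below (a dict from peer id to a list of 0/1 flags, one per chunk) and the function names are assumptions made for illustration; the sketch expects at least one peer.

```python
from statistics import mean, pvariance

def movie_level(bitmaps):
    """Number of peers storing at least one chunk of the movie."""
    return sum(1 for bits in bitmaps.values() if any(bits))

def weighted_movie_level(bitmaps):
    """Sum over peers of the fraction of the movie's chunks each peer holds."""
    return sum(sum(bits) / len(bits) for bits in bitmaps.values())

def chunk_bitmap_level(bitmaps):
    """Copies of each chunk, plus mean, minimum, and variance of the copies."""
    copies = [sum(column) for column in zip(*bitmaps.values())]
    return {
        "copies": copies,
        "mean": mean(copies),
        "min": min(copies),
        "variance": pvariance(copies),
    }

# Hypothetical bitmaps for a 4-chunk movie.
bitmaps = {"p1": [1, 1, 1, 1], "p2": [1, 1, 0, 0], "p3": [0, 0, 0, 0]}
print(movie_level(bitmaps), weighted_movie_level(bitmaps))
print(chunk_bitmap_level(bitmaps))
```

In this example the movie-level index counts two peers, while the chunk-level view shows that the last two chunks each have only one copy.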

Statistics on video objects

Overall statistics of the three typical movies

Statistics on user behavior (1)

Interarrival time distribution of viewers

Statistics on user behavior (2)

View duration distribution

Statistics on user behavior (3)

Start position distribution

Health index of Movies (1)

Number of peers that own the movie

Health index of Movies (2)

Average owning ratios for different chunks

Health index of Movies (3)

Chunk availability and chunk demand

Health index of Movies (4)

The available to demand ratios

User Satisfaction Index (1)

Generating fluency index
• The computation of F(m, i) is carried out by the client software.
• The client software reports all MVRs and the fluency F(m, i) to the log server whenever a “stop-watching” event occurs:
  – The STOP button is pressed
  – Another movie/programme is selected
  – The user turns off the P2P-VoD software

User Satisfaction Index (2)

The number of fluency records
• A good indicator of the number of viewers of the movie

User Satisfaction Index (3)

The distribution of fluency index

Future works

Further research in P2P-VoD systems
• How to design a highly scalable P2P-VoD system to support millions of simultaneous users
• How to perform dynamic movie replication, replacement, and scheduling so as to reduce the workload at the content servers
• How to quantify various replication strategies so as to guarantee a high health index
• How to select proper chunk and piece transmission strategies so as to improve the viewing quality
• How to accurately measure and quantify the user satisfaction level