Popularity-Awareness in Temporal DHT for P2P-based Media Streaming Applications

Popularity-Awareness in Temporal DHT for P2P-based Media Streaming Applications. Abhishek Bhattacharya, Zhenyu Yang & Deng Pan. IEEE International Symposium on Multimedia (ISM2011), Dana Point, California, USA, December 5-7, 2011.


Transcript of Popularity-Awareness in Temporal DHT for P2P-based Media Streaming Applications

Slide 1

Popularity-Awareness in Temporal DHT for P2P-based Media Streaming Applications
Abhishek Bhattacharya, Zhenyu Yang & Deng Pan

IEEE International Symposium on Multimedia (ISM2011), Dana Point, California, USA, December 5-7, 2011.

Outline
Introduction
Background
Popularity-Aware Search
Estimation
Results
Summary

Introduction: Distributed Hash Tables (DHT)
DHT is a generic interface

There are several implementations of this interface

Chord [MIT]
Pastry [Microsoft Research UK, Rice University]
Tapestry [UC Berkeley]
Content Addressable Network (CAN) [UC Berkeley]
SkipNet [Microsoft Research US, Univ. of Washington]
Kademlia [New York University]
Viceroy [Israel, UC Berkeley]
P-Grid [EPFL Switzerland]
Freenet [Ian Clarke]


(1) Distributed version of a hash table data structure
(2) Stores (key, value) pairs
(3) The key is like a filename
(4) The value can be file contents, or a pointer to a location
(5) Goal: efficiently insert/lookup/delete (key, value) pairs
(6) Each peer stores a subset of (key, value) pairs in the system
(7) Core operation: find the node responsible for a key (map the key to a node; efficiently route insert/lookup/delete requests to this node)
(8) Allow for frequent node arrivals/departures

Desirable properties:
Keys should be mapped evenly to all nodes in the network (load balance)
Each node should maintain information about only a few other nodes (scalability, low update cost)
Messages should be routed to a node efficiently (small number of hops)
Node arrivals/departures should only affect a few nodes

Introduction: Chord (DHT)
[Figure: identifier circle with pred(x), x, and succ(x); exponentially spaced finger pointers give O(log n) hops for routing from source to succ(x)]

Map nodes and keys to identifiers using randomizing hash functions, and arrange them on a circle.
Routing table: the i-th entry is succ(n + 2^i), giving log(n) finger pointers.
Node join: set up finger i by routing to succ(n + 2^i); with log(n) fingers this costs O(log^2 n).
Node leave: maintain a successor list for ring connectivity; update successor lists and finger pointers.

Introduction: Video on Demand (VoD)
[Figure: segments c1..c8 on the stream timeline, with peers p1..p5 placed at their current playing positions]
Content Discovery: Tracking Server / Decentralized Indexing Structures
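The finger-table routing above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the identifier-space size, the ring membership, and the centralized `build_fingers` setup are all assumptions made for the example.

```python
# Minimal sketch of Chord-style finger-table routing on a 2^M identifier circle.
M = 6                      # identifier space is [0, 2^M)
SPACE = 2 ** M

def in_interval(x, a, b):
    """True if x lies in the circular interval (a, b]."""
    if a < b:
        return a < x <= b
    return x > a or x <= b

class Node:
    def __init__(self, node_id):
        self.id = node_id
        self.fingers = []              # fingers[i] = succ(id + 2^i)

    def build_fingers(self, ring):
        # ring: sorted list of all node IDs (global knowledge, used only for setup)
        self.fingers = []
        for i in range(M):
            target = (self.id + 2 ** i) % SPACE
            succ = min((n for n in ring if n >= target), default=ring[0])
            self.fingers.append(succ)

def lookup(nodes, start_id, key, hops=0):
    """Route a key towards its successor; returns (responsible_id, hops)."""
    node = nodes[start_id]
    succ = node.fingers[0]             # finger 0 is the immediate successor
    if in_interval(key, node.id, succ):
        return succ, hops + 1
    # otherwise forward to the closest preceding finger
    for f in reversed(node.fingers):
        if in_interval(f, node.id, key):
            return lookup(nodes, f, key, hops + 1)
    return succ, hops + 1

ring = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]
nodes = {nid: Node(nid) for nid in ring}
for n in nodes.values():
    n.build_fingers(ring)

owner, hops = lookup(nodes, 8, 54)     # key 54 is owned by node 56
```

Because each finger roughly halves the remaining circular distance to the key, the hop count stays within O(log n) of the ring size.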

Content Distribution: Overlay Tree/Multi-Tree/Mesh


The video stream is generally divided into small segments for efficient transmission. As shown in the figure, suppose the video stream is denoted by c1, c2, ..., c8 and the peers are distributed according to their current playing positions. Now suppose p2 randomly jumps from c3 to c6, then to c4, then to c8. For jump 1, p2 should be able to quickly locate p5 as the supplier to stream from; likewise for jump 2, p2 should find p3 as a potential supplier. If the discovery process is not efficient (it must be fast enough to satisfy playback deadlines), then p2 will contact the server for content, which increases the server stress. We can thus observe that content discovery is one of the most important functions of a P2P-VoD system. For jump 3, p2 has no option other than to contact the server, which cannot be avoided since no peer currently holds that segment; but for jumps 1 and 2, an efficient content discovery process can avoid p2 contacting the server.

There are two important processes in a P2P-VoD system: content discovery and content distribution. Content discovery is the efficient and fast lookup of content providers, which requires an indexing mechanism. One naive technique is a tracking server, wherein all the peers update their current positions to the server; this is a centralized mechanism and is clearly not scalable. Another option is a decentralized indexing mechanism, which allows an efficient lookup service. Content distribution is the dissemination of content blocks among the peers, which determines streaming efficiency and can be achieved by various structures such as overlay trees, multiple trees, or meshes. Multiple trees are more resilient than a single tree, as shown in SplitStream, whereas mesh-based structures are more robust than multiple trees, as shown in PRIME by Magharei et al.
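The discovery decision in the example above can be sketched as a toy lookup: query the index first, and fall back to the server only when no peer caches the segment. The index contents below mirror the slide's example where possible (p5 holds c6, p3 holds c4, nobody holds c8); the remaining entries are assumptions.

```python
# Toy content-discovery decision: segment -> peers currently caching it.
# Entries for c6, c4, and the absence of c8 follow the slide's jump example;
# the rest of the mapping is illustrative.
index = {
    "c1": ["p1"],
    "c3": ["p2"],
    "c4": ["p3"],
    "c5": ["p4"],
    "c6": ["p5"],
}

def find_supplier(segment):
    """Return a supplying peer, or 'server' if no peer holds the segment."""
    peers = index.get(segment)
    return peers[0] if peers else "server"

# jump 1 (c3 -> c6) is served by a peer; jump 3 (-> c8) must hit the server
```

Every query that resolves to a peer instead of `"server"` is exactly the server-stress saving that motivates an efficient decentralized index.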

Background: DHT-based VoD System
[Figure: segments c1..c8, with peer p1 publishing index records such as c1 : p1, c2 : p1, c6 : p1 into the DHT]

Background: Temporal-DHT


The content linkage structure is built to facilitate fast in-order access, since neighboring segments can be reached with only a single hop, avoiding the costly DHT routing. The peer pa providing the indexing service for segment ci keeps a content-successor pointer to a peer providing the indexing service for segment ci+1, and a content-predecessor pointer to a peer providing the indexing service for ci-1. Apart from in-order continuous access, these pointers can also efficiently route small jumps, i.e., jumps whose distance is less than logarithmic in the total number of segments in the stream.
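The routing choice described above can be sketched as follows. The segment count and the cost model (one hop per linkage pointer, log2 of the segment count for a DHT lookup) are illustrative assumptions, not figures from the slides.

```python
import math

# Sketch of the content-linkage shortcut: small jumps walk the
# content-successor/predecessor pointers one segment at a time, while
# large jumps fall back to full DHT exact-match routing.
NUM_SEGMENTS = 1024        # assumed stream length, in segments

def route_jump(current_seg, target_seg):
    """Return the hop count for a jump between two segment indices."""
    dist = abs(target_seg - current_seg)
    threshold = math.log2(NUM_SEGMENTS)    # "small" = below log complexity
    if dist <= threshold:
        # one linkage-pointer hop per intervening segment
        return dist
    # fall back to DHT routing, O(log) hops in the number of segments
    return int(math.log2(NUM_SEGMENTS))
```

A 3-segment jump therefore costs 3 pointer hops, while any long jump is capped at the DHT's logarithmic routing cost.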

Background: Temporal-DHT
[Figure: peer pi advances from segment Ci through Ci+1, Ci+2, ..., Ci+z within one publish interval T; the exact-match query is reformulated as a range query over these segments]

The indexing records lose consistency due to lazy updates with a publish interval of T. For correct query resolution, the DHT exact-match routing needs to be augmented in t-DHT with a range query reformulation technique. As demonstrated in the figure, t-DHT estimates the query results by exploiting the predictive temporal dynamics of the application, i.e., VoD in this case. During in-order continuous playback, the playing position advances at the video data rate, an application parameter that t-DHT uses to estimate the query result by traversing the entire range of playback positions possible between two publish operations; the publish interval is kept fixed for all the peers in the system.

Our research is motivated by the observation that users in a practical VoD system usually perform frequent random jumps during the initial period after joining, in order to scan the entire video. The user then either leaves the system, if the video is found uninteresting, or stabilizes into in-order continuous playback. Based on this observation, we divide a VoD user's complete session into two phases: an initial random-seek mode, when the user continually performs random jumps, followed by a continuous-playback mode, when the user views the stream in order. These two distinct modes of user request patterns motivate our adaptive content distribution mechanism. The initial random-access mode is handled by the static buffer/index, resulting in a randomized distribution pattern: content is distributed through the mesh-based content linkage pointers, and a peer performs a parent switch after every segment playback to find new parents. The continuous-playback mode is handled by the dynamic buffer/index, resulting in a synchronous distribution pattern: content is distributed through the overlay tree via the parent-child relationship.
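The range reformulation described above can be sketched numerically. The publish interval, playback rate, and segment size below are illustrative assumptions chosen only to make the arithmetic concrete.

```python
import math

# Sketch of t-DHT range-query reformulation: an index entry published with
# segment `published_segment` may be stale by up to the publish interval T,
# during which an in-order viewer advances at the video playback rate.
PUBLISH_INTERVAL_T = 20.0   # assumed seconds between lazy index publishes
PLAYBACK_RATE = 500.0       # assumed video data rate, Kbps
SEGMENT_SIZE = 1000.0       # assumed segment size, Kb

def reformulate(published_segment, now, publish_time):
    """Expand an exact-match query into the range of segments the peer
    may currently be playing, given its last published position."""
    elapsed = min(now - publish_time, PUBLISH_INTERVAL_T)
    drift = math.ceil(elapsed * PLAYBACK_RATE / SEGMENT_SIZE)
    return published_segment, published_segment + drift

# a peer that published segment 40 ten seconds ago is now somewhere in
# [40, 40 + ceil(10 * 500 / 1000)] = [40, 45]
lo, hi = reformulate(published_segment=40, now=110.0, publish_time=100.0)
```

The query is then routed over this whole range instead of the single stale segment, which is exactly what makes the lazy publishing tolerable.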

Outline: Introduction, Background, Popularity-Aware Search, Estimation, Results, Summary

Popularity-Aware Search

Popularity-Aware Search
Cost:
(1) log N = 4 + Range = 4
(2) log N = 4 + Range = 4
(3) log N = 4 + Range = 4
(4) log N = 4 + Range = 4
(5) log N = 4 + Range = 4

Total: 20 (excluding the common log N part)

Popularity:

3 : 1 : 1

Popularity-Aware Search
Cost:
(1) log N = 4 + Range = 2
(2) log N = 4 + Range = 2
(3) log N = 4 + Range = 2
(4) log N = 4 + Range = 6
(5) log N = 4 + Range = 6

Total: 18 (excluding the common log N part)

Popularity:

3 : 1 : 1
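The cost accounting on the two slides above can be verified directly: with popularity ratio 3:1:1, shrinking the query range for the popular content (and enlarging it for the rest) lowers the total even though each unpopular query gets more expensive.

```python
# Range-cost totals from the two slides, excluding the common log N part.
def total_range_cost(queries_per_segment, range_per_segment):
    return sum(q * r for q, r in zip(queries_per_segment, range_per_segment))

queries = [3, 1, 1]                              # popularity ratio 3 : 1 : 1

uniform = total_range_cost(queries, [4, 4, 4])   # equal publish intervals
adaptive = total_range_cost(queries, [2, 6, 6])  # popularity-aware intervals
# uniform = 3*4 + 1*4 + 1*4 = 20; adaptive = 3*2 + 1*6 + 1*6 = 18
```

This is the core trade-off of popularity-aware publishing: popular segments publish more often (smaller range), unpopular ones less often, and the request mix decides the winner.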

Outline: Introduction, Background, Popularity-Aware Search, Estimation, Results, Summary

Estimation: Centralized
[Figure: segments C1..C7 with request counts aggregated at a central point]

Estimation: Decentralized
[Figure: segments C1..C7 with local values 0.1, 0.3, 0.1, 0.1, 0.2, 0.1, 0.1]

1. Node i initializes and sends its local value x_i; node j holds local value x_j
2. Update at node j: x_j = x_j + a_j (x_i - x_j)
   Update at node i: x_i = x_i - a_j (x_i - x_j)

We employ a practical approach for estimating popularity with the help of a distributed averaging algorithm, in a decentralized fashion. Each node connects to r random neighbors and exchanges messages with them. Assume node i has a local value x_i, and the requirement is to estimate the average of all local values over the network. Each node periodically communicates with its neighbors and performs the following actions: node i sends its local value x_i to node j; node j updates its local value x_j to x_j + a_j (x_i - x_j), where 0 < a_j < 1 is a local parameter; node j also sends back the value a_j (x_i - x_j) to node i; node i then updates its local value x_i to x_i - a_j (x_i - x_j). The central idea behind the algorithm is to conserve the sum of all the values in the system by performing matching increment and decrement operations between two neighboring nodes, thereby moving closer to the average value with each update. The algorithm can be extended to cope with node dynamics. The information gets disseminated through each peer's neighborhood, so every node eventually arrives at a local value that is a good approximation of the global popularity distribution for every segment i.
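The averaging scheme above can be run as a short simulation. The network size, the random-pair communication pattern, the iteration count, and the step size alpha are assumptions made for the sketch; the seven local values match the figure.

```python
import random

# Decentralized averaging sketch: at each step a random pair (i, j) applies
# the paired update x_j += a*(x_i - x_j), x_i -= a*(x_i - x_j), which
# conserves the global sum, so every local value drifts towards the mean.
random.seed(7)

values = [0.1, 0.3, 0.1, 0.1, 0.2, 0.1, 0.1]   # local values from the figure
true_mean = sum(values) / len(values)
alpha = 0.5                                     # assumed step size, 0 < alpha < 1

for _ in range(2000):
    i, j = random.sample(range(len(values)), 2)
    delta = alpha * (values[i] - values[j])
    values[j] += delta          # node j moves towards node i's value
    values[i] -= delta          # node i compensates, conserving the sum
```

After enough exchanges every local value converges to the network-wide average, which is what lets each peer hold a good approximation of the global popularity distribution without any central aggregation.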

Results: Simulation
Network setting: GT-ITM with 15 transit domains, each connected to 10 stub domains with 15 stub nodes each.
Data setting: 256 to 4096 peers with randomly distributed out/in-bound bandwidths in the range of 500~1000 Kbps.
User arrival model: Poisson distribution with a mean inter-arrival time of 1 sec.
Peer lifetime: exponential distribution with a mean of 30 mins.
User request pattern: 50% follow a Zipf distribution with different values of the skew parameter; the remaining 50% perform an initial 6~7 random jumps followed by continuous-playback mode.
Compared against: VMesh, TDHTM, TDHTM-PA (0.4), TDHTM-PA (2.0).
Performance metrics: server stress, streaming quality, messaging overhead, seek latency.

Results: Experiments

Results: Experiments

Outline: Introduction, Background, Popularity-Aware Search, Estimation, Results, Summary

Summary
We incorporated the notion of popularity-awareness within the framework of a Temporal-DHT based VoD system.

Improvement of the overall performance by optimizing the search cost across the content set within the entire system.

Dynamic adaptation of the update interval based on the popularity of the content.

Decentralized computation of the popularities of the various content segments.

Extensive simulation results demonstrate the effectiveness of the popularity awareness mechanism.

Thank You. Questions?
Please send all your questions to: [email protected]