Query Adaptation Techniques in Temporal-
DHT for P2P Media Streaming Applications
Abhishek Bhattacharya, Zhenyu Yang, and Deng Pan
School of Computing and Information Sciences
Florida International University, Miami, FL, USA.
ABSTRACT
Peer-to-Peer (P2P)-based approaches for on-demand video streaming systems (P2P-VoD),
characterized by asynchronous user interactivity, have proven practical and effective in
recent years with real-world Internet-scale deployments (Huang, Li, & Ross, 2007). Current
state-of-the-art P2P-VoD systems employ a tracker server for discovering content suppliers,
which poses scalability and bottleneck issues. Temporal-DHT is a structured P2P-based
approach that can efficiently accommodate the large number of update operations caused by
the continuous change of users' playing positions while supporting asynchronous jumps
(Bhattacharya, Yang, & Zhang, Temporal DHT and its Application in P2P-VoD Systems., 2010).
We propose different query adaptation strategies based upon content popularity distributions
and bandwidth shortage ratios, which prove effective in improving the performance of a P2P
streaming system by deriving certain optimized solutions. We formulate valuable optimization
problems in the context of a P2P-VoD system, such as minimization of query search cost,
server bandwidth consumption, and a joint cost-load framework, and provide optimized
solutions that achieve the best results for the above-mentioned objectives. We present
extensive simulation studies under various scenarios of search cost, streaming quality, and
other associated factors in a dynamic network environment where users are free to
asynchronously join/leave the system.
Keywords: Multimedia Information Systems, Video Streaming, Distributed Hash Tables,
Optimization, Peer-to-Peer Systems, Video-on-Demand.
INTRODUCTION
Gnutella, Napster, etc. are some of the first-generation unstructured systems that started the P2P
revolution, followed by the more efficient structured approaches such as Distributed Hash Tables
(DHT) represented by Chord (Stoica, Morris, Karger, Kaashoek, & Balakrishnan, 2001), CAN
(Ratnasamy, Francis, Handley, Karp, & Shenker, 2001), Pastry (Rowstron & Druschel, 2001),
and a suite of similar systems based upon similar principles. Web caching, distributed
storage, etc. are some of the earlier applications supported by P2P approach, followed recently
by the more popular ones such as file sharing e.g., BitTorrent (Qiu & Srikant, 2004),
multicasting e.g., Narada (Chu, Rao, & Zhang, 2000), and live streaming e.g., CoolStreaming
(Zhang, Liu, Li, & Yum, 2005), PPLive (Hei, Liang, Liu, & Ross, 2007), AnySee (Liao, Jin, Liu,
Ni, & Deng, 2006), etc. The potential advantage of P2P-based applications is mainly associated
with the fact that peers share their resources such as processing power, storage, and bandwidth to
help each other in searching/distributing content, thereby alleviating the server load. The
management and distribution of multimedia content is particularly critical for P2P
applications and increasingly important for Internet traffic, which is largely dominated by
ever-growing, bandwidth-hungry multimedia data.
On-demand streaming can benefit enormously from the application of P2P techniques, as
revealed in a recent study (Yann, Fu, Chiu, Lui, & Huang, 2008). We advocate a DHT-overlay
based approach to address the challenging problem of efficient content discovery in on-demand
systems with asynchronous user interactivity. DHT overlays have already proven to be a stable
substrate with nice characteristics such as scalability, decentralized control,
self-organization, and resilience to network/peer dynamics. Incorporating a DHT in on-demand
streaming systems is not a trivial issue, since it generates a flurry of update operations with the continuously changing
playback position of the user. The framework for Temporal-DHT (Bhattacharya, Yang, &
Zhang, Temporal DHT and its Application in P2P-VoD Systems., 2010) addressed this issue of
accommodating a large number of update operations by exploiting the temporal dynamics of the
content for estimating the current playing position of the peers automatically. Temporal-DHT
combines the advantages of both the cache-relay and static-cache approaches. Cache-relay
based approaches have high streaming efficiency due to the buffer-overlap relation between
parent and child peers, whereas static-cache based approaches are better suited to
supporting dynamic and asynchronous operations such as random jumps by avoiding the
dependency on playing positions between peers. Temporal-DHT employs a skillful integration of
static and dynamic buffer management schemes to handle the request dynamics and streaming
efficiency in a seamless fashion. We can describe Temporal-DHT as an augmented version of
generic DHT semantics by incorporating the query reformulation, TTL filtering, and access
workload self-profiling techniques.
The initial Temporal-DHT framework involved a static query reformulation mechanism without
considering the possible effects of content popularity distributions and other related factors,
which are common phenomena in present-day P2P systems. The concept of popularity
awareness is generally employed for optimizing certain objectives such as search cost or
server/peer load factor utilizing the content popularity ratios. One of the important intentions is
to reduce the search cost of more popular contents since they are queried more frequently which
will eventually help to improve the overall performance of the system (Rao, Chen, Fu, & Wang,
2010). It has already proved to be highly useful in web-caching and file-sharing systems where
the data objects are typically characterized with different popularity ratios (e.g., some popular
files are downloaded with a higher frequency or some popular web pages are accessed more
frequently). Several studies have reported that web requests on the Internet are highly skewed,
following a Zipf-like distribution (Yiu, Jin, & Chan, 2007), with the typical characteristics of a few objects having
a very high popularity, a medium number of objects with average popularity, followed by a long
tail with a huge number of objects with very low popularity. Zipf-distributions are universally
used for modeling popularity in various scenarios. The generic approach to deal with this kind of
skewed popularities is to cache the data objects at the various intermediate relay nodes in the
query resolution path which will eventually help to reduce the number of search hops for the
popular queries. This type of caching should be adaptive under dynamic popularity scenarios
(popularity of data objects change with time) since there is an associated trade-off relation
between the higher performance due to lower search complexity and the cost for caching the data
objects at the intermediate nodes. In the context of media streaming applications, caching is not a
reasonable choice since it does not make sense to continuously cache large media-sized objects
at the intermediate nodes, which consumes a lot of network bandwidth. Proposals such as
VMesh (Yiu, Jin, & Chan, 2007) employ a popularity-based content storage mechanism where the
cached segments are continuously replaced in accordance with the recent content popularity
distribution. Continuous replacement of the cached segments to adapt to dynamic popularity
variations is one of the downsides of this kind of mechanism: it consumes large amounts of
network bandwidth, rendering it a heavyweight technique.
As already mentioned, the current approach for dealing with popularity skew is replication,
whereby the less popular objects are dynamically replaced with more popular objects. This
technique consumes excessive bandwidth for keeping the cache updated (i.e., proportional to
content popularities) and so we pursue a different approach of query resolution adaptation.
Distinct from other approaches where the query resolution adaptation depends on the
replication/caching strategies, our method avoids the expensive method of replication by
adopting a range query adaptation technique. This is possible due to the availability of a range
query reformulation technique inherently present in a Temporal-DHT framework where the
generic exact-match DHT prefix routing is augmented with a range query and the range query
span is dependent on the object update interval. The initial Temporal-DHT framework assumed a
fixed value for the object update interval thereby rendering increased search cost with respect to
popularity skewness of the content (Bhattacharya, Yang, & Zhang, Temporal DHT and its
Application in P2P-VoD Systems., 2010). There exists a tradeoff between the
performance benefit of decreasing the search cost and the increased cost of update operations:
if we intend to minimize the search cost, then we need to decrease the update interval, which
triggers more update operations and thereby increases the messaging overhead. Due to
this situation, it is essential to find an efficient solution that optimizes certain performance
objectives and then perform the adaptations based on the optimization solutions. In this context,
we address the following problems:
P1: How to minimize the search cost with a given threshold constraint of update
interval?
P2: How to minimize the server load with a given constraint of available outbound
bandwidth and update interval?
P3: How to jointly minimize the search cost-server load with given constraints of
available bandwidth, update interval, messaging overhead?
P1 is addressed in (Bhattacharya, Yang, & Pan, Popularity Awareness in Temporal-DHT for
P2P-based Media Streaming Applications, 2011) and in this paper we undertake P2 and P3. We
present formulations for optimization objectives of P1, P2, P3, and present practical solutions for
them which will help to develop techniques to perform dynamic adaptation of the object update
intervals in the context of a Temporal-DHT with varying popularity distributions.
To summarize, our contributions are as follows: (a) We incorporate the notion of popularity-
awareness in the context of Temporal-DHT with different performance objectives for optimizing
the search cost, server load, update interval, and messaging overhead in a dynamic fashion under
varying conditions; (b) We formulate the problems P1, P2, P3, in a representative manner and
propose solutions to achieve the objectives; (c) We implement the three adaptation strategies in a
Temporal-DHT based P2P Video-on-Demand system model and provide extensive simulation
studies to show the effectiveness of the adaptive query resolution strategies in a media streaming
scenario and the performance benefits associated with the optimization of P1, P2, P3. The rest of
the paper is organized as follows: We present some basic background stuff related to DHT and
Temporal-DHT in the next section. In the following section, we present the detailed adaptation
mechanisms and the optimization problems P1, P2, P3, and the various solutions with its
interpretation in the Temporal-DHT framework. We analyze our simulation studies in the
following section. The next section summarizes related work from the literature followed by the
section for conclusion.
RELATED WORK
The general trend in dealing with popularity skews is caching and replication, where the queried
data objects are cached or replicated in the intermediate relay nodes or some strategic nodes near
the query originator. The typical problem in this domain mainly involves the placement
strategies of replicas or cached objects to reduce the search cost for the more popular objects.
Web-caching systems benefit from these techniques since web-based objects typically
follow a Zipf-like popularity distribution. CFS (Dabek, Kaashoek, Karger, Morris, & Stoica,
2001) is a cooperative file system over Chord DHT which caches the popular objects along the
lookup path towards the home node where the popular objects are originally stored. PAST
(Rowstron & Druschel, Storage management and caching in PAST, a large-scale, persistent peer-
to-peer storage utility, 2001) is a storage system over Pastry DHT where the search for some
object is redirected to the nearest replicas of the targeted object. Another notable technique is
proposed in Beehive (Ramasubramanian & Sirer, 2004), which replicates the object copies to all
the nodes that have at least l common prefixes matching the object hash ID, where l is defined
as the replication level. Replication was proposed in (Cohen & Shenker, 2002) to optimize
search efficiency where the number of replicas of an object is kept proportional to the square-
root of the object popularity. A square-root topology for unstructured P2P networks was
proposed in (Cooper, 2005) where the in/out degree of a peer is proportional to the square-root of
the node popularity. PRing/PCache (Rao, Chen, Fu, & Wang, 2010) presented a replica
placement strategy for web-caching systems with data objects having skewed popularities in both
deterministic and randomized structured P2P networks. They gave detailed analytical results
with closed-form optimal solutions for different resource optimization objectives. LAR
(Gopalakrishnan, Silaghi, Bhattacharjee, & Keleher, 2004) proposed a lightweight, adaptive, and
system-neutral replication framework that maintains low access latencies and good load balance
even under highly skewed demand. Now, let us discuss some replication/caching strategies
specifically for multimedia data objects: VMesh (Yiu, Jin, & Chan, 2007) uses a static-cache based
DHT overlay for P2P VoD streaming where the cached objects are continuously refreshed with
different video segments and this segment replacement strategy is proportional to the probability
of the derived segment popularities. (Tan & Massoulie, 2011) proposed an optimal content
placement strategy and request acceptance policy for P2P-VoD systems which jointly maximize
uplink bandwidth utilization. Statistical modeling is proposed in (Zhou, Fu, & Chiu, 2011) to
derive the relationship among storage capacity, number of videos, number of peers, and server
load, which is later used for a replication algorithm that balances load among all the peers for
both deterministic and random demand models, and for both homogeneous and heterogeneous
upload bandwidth distributions. (Wu & Lui, 2011) presented mathematical models and optimization
framework for understanding the impact of popularity on server load where they argued the
conventional wisdom of proportional replication strategy to be non-optimal and expanded the
design space by deriving passive replacement and active push policies based on optimal
replication ratios. (Tewari & Kleinrock, 2006) advocated to tune the number of replicas in
proportion to the request rate of the corresponding content, based on a simple queuing formula
from the standpoint of load on network links. Investigations for content placement in P2P-VoD
systems were conducted in (Kyoungwon, et al., 2007) in the context of both queuing and loss
models. (Wu & Li, 2009) used dynamic programming to derive the optimal replication strategy
for a P2P-VoD system where the peers have homogeneous upload capacity.
BACKGROUND MODEL DESCRIPTION
We introduce the following notation to describe our model of a Temporal-DHT based P2P-VoD
system:
P = {p_1, p_2, …, p_N} is the set of participating peers; N is the number of peers in the system.
U_i is the upload capacity of peer p_i.
D_i is the download capacity of peer p_i.
S is the media server with an outbound bandwidth of B_S.
C = {c_1, c_2, …, c_M} is the video stream, made up of M chunks or segments.
D is the size of one chunk or segment in MB.
d is the video data rate in Kbps required to maintain uninterrupted streaming.
τ is the playtime of each video segment.
B_r is a dynamic/random buffer with a size of k segments, i.e., kD MB.
B_s is a static/sequential buffer of size b segments, i.e., bD MB.
T is the publish interval, i.e., T = z·τ, where z is a system-defined parameter.
TTL is the Time-to-Live, which indicates the freshness index for each indexing record.
succ/pred are the successor/predecessor pointers in the content space.
A novel conceptual augmentation of the traditional DHT semantics for indexing content with
temporal dynamics, which provides considerable savings in messaging overhead, was proposed in
one of our earlier works as the framework for Temporal-DHT (Bhattacharya, Yang, & Zhang,
Temporal DHT and its Application in P2P-VoD Systems., 2010). The proposed framework has two
distinctive properties: (a) Application-level Characteristics: DHT takes a more active role by
exposing the internal behavior of the application which allows for a chance to better service the
dynamic needs of the application by advocating a proactive design approach; (b) Data
Transiency: Unlike traditional DHT, the stale indexing records are flushed off from the system at
a periodic interval and the predictive temporal dynamics is exploited for effective query
resolution in Temporal-DHT.
⟨c_j, p_i, t⟩ represents a 3-tuple indexing record in a typical Temporal-DHT, with segment c_j,
peer p_i, and publish time t, indicating that p_i contains c_j at time t; the record is flushed
from the system at t + TTL·τ. Temporal-DHT can accommodate both static (like traditional DHT
indexing records) and dynamic indexing records within the same framework by initializing the
value of TTL to z (for the dynamic case) or ∞ (for the static case). Temporal-DHT exploits the
technique of lazy updates by allowing a certain degree of inconsistency in the indexing
structure, which enables records to be updated at a coarser granularity, i.e., at a predefined
constant periodic interval T. To allow this kind of inconsistency relaxation, the query
resolution mechanism of the traditional DHT is augmented with query reformulation and TTL
filtering techniques that take hints from the dynamics of the content workload. Next, we
illustrate the underlying idea with an intuitive example. Referring to Fig. 1(a), suppose k = 1
and z = 4, and VoD peer p_1 performs an update operation at time t_0 for segment c_j,
represented as an indexing record ⟨c_j, p_1, t_0⟩. After each time interval τ, the buffer slides
by one segment, and the next update is performed by p_1 at time t_0 + 4τ with the record
⟨c_{j+4}, p_1, t_0 + 4τ⟩. During the time interval [t_0 + δ, t_0 + 4τ], where δ is a very small
time unit signifying that segment c_{j+1} is already loaded in the buffer of p_1 but the
corresponding Temporal-DHT update has not yet been performed, traditional exact-match query
resolution for c_{j+1} will fail to return p_1 as a result due to the allowed inconsistency.
For effectively returning p_1 as a result, we transform the exact-match query for segment q
into a range query over ⟨q, q − z⟩. Fig. 1(b) depicts an illustrative example with the playing
buffers of VoD peers p_1 and p_2 sliding over a section of the video stream during the time
interval [t_0, t_0 + 6τ] with z = 6. The accurate query resolution formalization is given in
Theorem 1, taken from (Bhattacharya, Yang, & Zhang, Temporal DHT and its Application in
P2P-VoD Systems., 2010). Further proof details and TTL filtering schemes are covered in the
same work.
Figure 1: (a) Temporal-DHT content linkage and updates and (b) Range query reformulation
and buffer sliding.
Theorem 1: Given the playback buffer of size k and the publish interval z, a peer that
searches for a dynamic segment needs to perform a range query of at most k + z segments.
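As a rough sketch of Theorem 1 (a hypothetical helper, not code from the framework itself), the reformulated query range for a dynamic segment q can be computed as:

```python
def range_query_span(q, k, z):
    """Reformulate an exact-match query for segment q into the range of
    segment IDs whose possibly-stale index records could still satisfy it.
    A dynamic record may lag the publishing peer's true position by up to
    z segments, and the peer buffers k segments, so any record in
    [q - (k + z) + 1, q] can potentially serve segment q -- at most
    k + z segments, as stated by Theorem 1."""
    lo = max(0, q - (k + z) + 1)
    return list(range(lo, q + 1))
```

For example, with k = 4 and z = 6, a query for segment 100 expands to the 10 segment IDs 91 through 100.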
There are also some other distinctive features associated with Temporal-DHT which will be
briefly discussed as follows: A content-based overlay is initiated for supporting in-order
access and range-query resolution by maintaining pointers that respect the semantic sequential
relationship, i.e., succ(c_i) = c_{i+1} and pred(c_i) = c_{i−1}. These content linkage pointers
can also support short random jumps as long as the number of routing hops (which can be easily
calculated in this case using the content distance, e.g., a jump from segment c_j to segment
c_{j+3} takes 3 hops) is less than the O(log N) of generic DHT
routing (Yiu, Jin, & Chan, 2007). An overarching framework was proposed as a Temporal-DHT
based mesh (TDHTM) which can seamlessly integrate the power of asynchronous interactivity
support with static cache based indexing and smooth in-order streaming efficiency with dynamic
cache based indexing. A combined static-dynamic buffer management scheme is employed in
TDHTM, where the static (B_s) segments are indexed with TTL = ∞ (kept constant throughout the
peer's lifetime) and the dynamic (B_r) segments are indexed with TTL = z (which keeps changing
with the viewing position as the buffer slides after each segment playback). Static indexing involves
a one-time publication of indexing record at initialization, and query processing follows the
generic DHT based exact-match resolution mechanism, whereas dynamic indexing is concerned
with the publication of indexing records in a periodic interval of T with augmented Temporal-
DHT based range query resolution technique. Moreover, TDHTM also employs access workload
self-profiling at the client end for adaptive content distribution by dynamic switching between
random seek mode (handled by static indexing) and continuous playback mode (handled by
dynamic indexing). We provide a high-level overview of the algorithm in pseudocode as follows:
The peer joins the system and initializes the Temporal-DHT state by deriving its finger table.
It randomly selects b segments to fill B_s, then searches for and downloads them.
It publishes static indexing records for the B_s content.
It fixes its succ/pred pointers by joining the content overlay.
It accepts the user's request for a starting video segment and searches the static/dynamic
indexing records.
The peer fills the dynamic buffer B_r by invoking Temporal-DHT queries and downloads video
segments from neighbors with available upload capacity.
At each gossip time interval, it exchanges (sends/receives) messages with neighboring peers
for segment query/access popularity information.
The peer computes the access/query popularity index value for each segment and sets the
update interval in proportion to the popularity indexes.
After each time interval T = z·τ, it publishes a dynamic indexing record of the representative
segment in B_r to the Temporal-DHT.
During each segment playback, it searches for and downloads the next segment from the tree
parent or neighbors and performs buffer sliding of B_r.
The peer leaves the system either by informing the Temporal-DHT or silently through failure.
FRAMEWORK DETAILS
We now present detailed optimization strategies for different objectives by incorporating content
popularity and other resource management techniques in the context of a Temporal-DHT based
P2P-VoD system model.
Search Cost
Let us analyze the cost of a search query in the Temporal-DHT framework, which will obviously
be higher than the generic DHT cost due to the range query reformulation, making it one of the
important metrics for optimization. A typical Temporal-DHT query is composed of two sections:
(1) a basic exact-match generic DHT query with prefix routing executed through the pointers in
the finger table, and (2) a range query reformulation performed by a linear traversal of the
content overlay in forward/backward directions with the help of the succ/pred linkage
pointers. The query cost in a generic DHT search between any pair of source and destination is
given by O(log N) (Stoica, Morris, Karger, Kaashoek, & Balakrishnan, 2001). Let the ith node in
the Chord DHT have node ID i in the hash identifier space; then the kth entry in its finger
table points to the successor node of ID (i + 2^{k−1}) mod 2^{log N}, where 1 ≤ k ≤ log N.
Therefore, the distance travelled by a routing hop is 2^{x−1} for 1 ≤ x ≤ log N when the query
is forwarded to the node in the xth entry of the finger table. This can be generalized to the
fact that the query can traverse at most half of the remaining distance between the source and
destination in the identifier space in each routing hop. The range query cost can be derived
from Theorem 1, where it was stated that the search span can cover at most k + z segments. The
time complexity of the range search can be equated to O(k + z), since each pair of sequential
segments is a single application hop apart, which is facilitated by the content-based overlay.
So, the total cost (in terms of messaging) to search any content in the Temporal-DHT framework,
represented as a number of hops, is given as follows:
h = O(log N) + O(k + z) = O(log N + k + z)
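As a numeric sketch (a hypothetical helper; a base-2 Chord-style identifier space is assumed), the worst-case hop count of one Temporal-DHT lookup is the generic prefix-routing cost plus the linear range scan:

```python
import math

def temporal_dht_search_cost(n_peers, k, z):
    """Worst-case hops for one Temporal-DHT lookup: about log2(N) hops of
    generic DHT prefix routing, plus at most k + z application hops of
    linear range traversal over the content overlay (Theorem 1)."""
    return math.ceil(math.log2(n_peers)) + k + z
```

With N = 1024 peers, k = 4, and z = 6, this gives 10 + 10 = 20 hops.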
Search Cost Optimization
We formulate the problem of search cost minimization, MIN-SEARCH, as follows:
MIN-SEARCH: Minimize the total query cost (H) in terms of lookup hops, given a
threshold constraint on the update interval (z).
This problem involves maximizing the performance benefits of the Temporal-DHT framework
by associating the cost with the messaging complexity required for query resolution, which is
crucial in conserving valuable network bandwidth. In our initial proposal (Bhattacharya, Yang,
& Zhang, Temporal DHT and its Application in P2P-VoD Systems., 2010), we had a fixed value
of z which essentially fails to realize the query load skew due to time-varying popularity of
individual segments in a video stream. This will essentially generate a higher total search cost of
the system mainly contributed by the large number of popular query segments.
Suppose the total query set for a P2P session is defined by Q = {q_1, q_2, …} = ⋃_{i=1}^{M} Q_i,
where Q_i denotes the set of queries targeting the same segment ID (c_i) for each value of i.
Now, we can define the popularity index (λ_i) of c_i as follows:
λ_i = |Q_i| / Σ_{j=1}^{M} |Q_j|
The total search cost for a single Temporal-DHT query is h = O(log N + k + z), as derived
before. Now, given the query popularity distribution λ_i, the total search cost H for M data
objects can be represented as follows:
H = Σ_{i=1}^{M} |Q_i| (log N + k + z_i)
  = |Q| Σ_{i=1}^{M} λ_i (log N + k + z_i)
The optimization objective is to minimize the value of H. We derive our solution by exploiting
the theorem stated in (Rao, Chen, Fu, & Wang, 2010) as follows:
Theorem 2: Let the cost of each update operation be c_u and the total number of update
operations be L (i.e., L = Σ_{i=1}^{M} 1/z_i, since a segment with update interval z_i is
republished once every z_i segment playtimes); then H is minimized when
z_i = Σ_{j=1}^{M} √λ_j / (L √λ_i).
From Eq. 10 in (Rao, Chen, Fu, & Wang, 2010), adapted to our scenario, we have:
z_i = Σ_{j=1}^{M} √λ_j / (L √λ_i)
Now, let us substitute this z_i into the expression for H:
H = |Q| Σ_{i=1}^{M} λ_i (log N + k + z_i)
  = |Q| (log N + k) + (|Q|/L) (Σ_{i=1}^{M} √λ_i)²
It is interesting to note that the term (Σ_{i=1}^{M} √λ_i)² is the exponential of the Rényi
entropy of order ½ of the query popularity index λ_i. This is in accordance with our intuition
that the expected skewness of the popularity distribution plays a crucial role in the cost
optimization objective; such an entropy of the query popularity is a sound measure for modeling
the skewness of the distribution. Thus, the total search cost H depends upon N, k, L, M, and
the entropy of (λ_i). In our framework, {N, k, M} are kept fixed, and our goal is to minimize
the value of H by adapting z_i with respect to λ_i. We define the estimated update interval
adaptation for segment c_i as follows:
ẑ_i = Σ_{j=1}^{M} √λ̂_j / (L √λ̂_i)
where λ̂_i is the estimated popularity index for segment c_i using the number of received
requests, as derived in a later section.
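A minimal sketch of the resulting square-root allocation (the function name and the update-budget parameter are ours): more popular segments receive shorter update intervals, and the reciprocals of the intervals sum to the update budget L.

```python
import math

def adapt_update_intervals(popularity, update_budget):
    """Square-root allocation: minimizing the popularity-weighted
    range-query cost subject to a total update rate L yields update
    intervals z_i proportional to 1/sqrt(lambda_i), so popular segments
    are republished more frequently (smaller z_i)."""
    roots = [math.sqrt(p) for p in popularity]
    norm = sum(roots)
    return [norm / (update_budget * r) for r in roots]
```

By construction, summing 1/z_i over all segments recovers the budget L exactly.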
Popularity of Video Segments
Popularity models are typically based on the Zipf distribution, which was originally derived
from the popularity of web objects on the Internet. It is generally true that in a video stream
some portions are more popular than others, as evident from previous studies, which essentially
generates a skewed query pattern that overwhelms the system with queries for popular segments.
If all the video segments are ranked in descending order of their popularities, then the
popularity index of the ith segment (λ_i) can be denoted as follows:
λ_i = (1/i^α) / Σ_{j=1}^{M} (1/j^α)
where α is a Zipf constant. We assume that the segment popularities are linked to the user
request distribution which is reasonable since the more popular segments are requested by a
larger number of users with a high probability. We model the VoD query distribution as follows:
A peer initializes from a randomly selected segment and starts to watch the video from that
point. The user continues in normal sequential playback mode for a random time period drawn
from an exponential distribution with mean T_on seconds. The process then repeats: the user
jumps to another random segment and remains in normal playback mode for a certain period. This
continues until the user leaves the system. The lifetime of a peer in the system is given by an
exponential distribution with mean T_life. Our main objective is to derive the segment
popularities based on the above user access model in a typical VoD system where peers
randomly join/leave. Let S_i be the state in which a peer is accessing segment i. The average
time a peer stays in the system, or its expected life period, can be denoted as ⌈T_life/τ⌉
time slots, where one slot is the playtime τ of a segment. The peer plays the media in
sequential in-order mode, traversing from S_i to S_{i+1} with probability β = e^{−τ/T_on}. The
average number of sequential segments accessed by the peer during this phase is 1/(1 − β)
(geometric summation series). The random jump probability from any segment i to another
non-sequential segment j is uniform over the remaining segments:
P_jump(i → j) = (1 − β)/(M − 1), for j ≠ i + 1
We can formulate the one-step transition probability function from S_i to S_j for any
i, j ∈ {1, …, M} as follows:
P(S_i → S_j) = β if j = i + 1, and (1 − β)/(M − 1) otherwise
Let π_i(t) denote the probability that the peer accesses segment i at time slot t, with
π_i(0) = 1/M, since the peer starts from a random point that is typically evenly distributed
among all the segments. Suppose that at the end of time slot t the peer still stays in the
system with probability γ^t, where γ = e^{−τ/T_life} follows from the exponential lifetime
distribution. Thus, the expected access probability for segment i is given as follows:
P_i ∝ Σ_{t=0}^{∞} γ^t π_i(t)
It is possible to calculate the access probabilities of each segment from the above equations
provided the one-step probability function is known. The one-step probability function is
typically random in nature due to asynchronous user access patterns, whereby a peer can jump to
any position at any time. This is unlike the static distribution patterns typically assumed in
theoretical studies. Moreover, computing it exactly would require knowledge of global
information at each peer to make optimal decisions. Hence, we take a practical approach of
popularity estimation in a distributed fashion suitable for realistic conditions as described in the
next section.
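The access model above can be cross-checked with a small Monte-Carlo sketch (all names are ours; sequential advance, uniform jumps, and the exponential lifetime are approximated by per-slot coin flips with probabilities beta and gamma):

```python
import random

def simulate_segment_popularity(m, beta, gamma, n_peers=1000, seed=1):
    """Simulate the VoD access model: each peer starts at a uniformly
    random segment, advances sequentially with probability beta,
    otherwise jumps to a random segment, and remains in the system after
    each slot with probability gamma. Returns the fraction of all
    accesses landing on each of the m segments."""
    rng = random.Random(seed)
    counts = [0] * m
    for _ in range(n_peers):
        pos = rng.randrange(m)
        while True:
            counts[pos] += 1
            if rng.random() >= gamma:       # peer leaves the system
                break
            if rng.random() < beta:         # sequential playback
                pos = (pos + 1) % m
            else:                           # asynchronous random jump
                pos = rng.randrange(m)
    total = sum(counts)
    return [c / total for c in counts]
```

With uniform jump targets the access distribution is roughly flat; a popularity-weighted jump rule would skew it accordingly.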
Query Popularity Estimation
We use a distributed averaging algorithm for estimating query popularity in a dynamic and
decentralized fashion without any static assumption of load distribution. The average number of
queries received from a set of distributed peers is utilized for estimating the popularity indices by
exploiting the algorithm proposed in (Yiu, Jin, & Chan, 2007). We provide a brief description of
the algorithm as follows: Each node exchanges messages with r randomly connected nodes. Assume
node i has a local value v_i; the objective is to estimate the average of all v values over the
network. The value v_i can be conceptualized as the locally observed number of queries for a
video segment in the P2P system. A local dynamic variable w_i, initialized to v_i, is also
maintained at each peer. Each node periodically communicates with its set of random neighbors
and performs the following actions: (1) node i sends its local value w_i to node j; (2) node j
updates its local value to w_j + δ, where δ = α(w_i − w_j) and 0 < α < 1 is a local parameter,
and sends the value δ back to node i; (3) node i updates its local value to w_i − δ. The
central idea behind the algorithm is the alternate increment and decrement of the same value at
two neighboring nodes, which conserves the sum of all values in the system while moving each
local value closer to the global average after each update. This technique can also be extended
to cope with peer dynamics: each node i maintains a variable σ_j for each neighbor j, which
accumulates all the changes made to j. On detecting the failure of node j, node i performs
w_i ← w_i + σ_j, which helps to conserve the total sum of all values.
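A minimal sketch of a conserved-sum pairwise averaging step of the kind described (the exact exchange rule, moving a fraction alpha of the pairwise difference, is our assumption): one node's increment is always the other's decrement, so the global sum, and hence the average, is preserved while local values converge.

```python
import random

def gossip_average(values, alpha=0.5, rounds=300, seed=7):
    """Pairwise averaging: node i sends its value to a random node j,
    node j absorbs a fraction alpha of the difference and returns the
    same delta, so every exchange increments one node and decrements the
    other by an identical amount; the global sum is invariant and each
    local value drifts toward the network-wide average."""
    rng = random.Random(seed)
    v = list(values)
    for _ in range(rounds):
        i, j = rng.sample(range(len(v)), 2)
        delta = alpha * (v[i] - v[j])
        v[j] += delta   # node j absorbs part of the difference
        v[i] -= delta   # node i gives up exactly the same amount
    return v
```

Starting from [10, 0, 0, 0], the values converge toward the true average 2.5 while their sum stays exactly 10.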
The above distributed algorithm can be utilized to keep track of the total number of requests arriving from different peers, which is then used to calculate the popularity indices. Each peer $p$ maintains a flag $a_p(i)$ for each segment $s_i$, indicating whether $p$ has seen accesses to $s_i$: if $p$ receives a request for $s_i$, it sets $a_p(i) = 1$; otherwise the flag remains $0$. A peer also maintains another set of local variables $f_p(i)$ which store the frequency of requests received for each $s_i$. The averaging algorithm is then executed to continuously exchange and update the value of $f_p(i)$ with neighboring nodes. The information gets propagated through each peer's neighborhood, and $f_p(i)$ thereby converges to a local value $\bar{f}(i)$ which can be assumed to be a good approximation of the global popularity of $s_i$. Now, it is trivial for a peer to compute the estimated popularity $\hat{q}_i$ of $s_i$ from its local set of average values as follows:

$\hat{q}_i = \dfrac{\bar{f}(i)}{\sum_{j} \bar{f}(j)}$
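Concretely, the final normalization step might look like the following sketch; the averaged frequency values are illustrative assumptions, not measurements:

```python
# Locally averaged request frequencies per segment after the gossip
# rounds have converged (values here are made up for illustration).
avg_freq = {1: 40.0, 2: 20.0, 3: 13.3, 4: 10.0}

total = sum(avg_freq.values())
est_popularity = {seg: f / total for seg, f in avg_freq.items()}

# The estimates form a proper probability distribution over segments,
# preserving the relative ordering of the averaged frequencies.
assert abs(sum(est_popularity.values()) - 1.0) < 1e-9
assert est_popularity[1] > est_popularity[2] > est_popularity[4]
```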
Server Load Optimization
An efficient P2P-VoD system will tend to minimize the upload bandwidth traffic of the server S in order to reduce the total operating cost. The upload bandwidth consumption of S depends on various factors; two important ones are: (1) the peer scheduling policy, and (2) the content replication strategy. We employ a practical peer scheduling strategy where a peer initially strives to locate and download data from other peers already in the system; only when other peers cannot supply the data due to a content/bandwidth bottleneck is the request redirected to S. A peer scheduling strategy involves two design issues: (1) a peer seeking to download needs to decide which peers to request data from (its suppliers); (2) a peer seeking to upload needs to decide which peer's request to fulfill (as a provider). Both decisions are made over a queue of requests, since there will be a list of candidate suppliers and providers and the choice needs to be made with priorities. We incorporate an intuitive approach of scheduling and assigning priority to requests based on node capacity, which is a function of the node's access bandwidth, processing power, disk speed, etc. This strategy ensures fair load sharing among heterogeneous nodes. Capacities can be calculated locally at each node, and the information is propagated to the decision-making peer by piggybacking on request messages.
Assume $P_i$ to be the set of peers that currently hold segment $s_i$ in buffer. Obviously, $\bigcup_i P_i = P$, the set of all peers, and $\sum_i |P_i| = b\,N$, where $b$ is the per-peer buffer size in segments and $N$ is the number of peers. The expected upload bandwidth consumption of server S can be expressed as follows:

$U = E\!\left[\sum_i u_s(i)\right] = \sum_i E\!\left[\max\!\left(\sum_{p \in P_i}\bigl(d - u_p(i)\bigr),\; 0\right)\right]$

where $u_s(i)$ is the server bandwidth consumption with respect to segment $s_i$ (in other words, it can be conceptualized as the number of segment downloads served by server S); $u_p(i)$ is the peer-assisted bandwidth throughput provided to peer $p$ by the other peers in the set $P_i$ who currently hold segment $s_i$ in buffer (in other words, the number of segments peer $p$ downloads from other peers in $P_i$); and $u_p(i)$ is bounded by the maximal upload bandwidth that all peers in $P_i$ can contribute to $s_i$. For solving the above problem, the notion of shortage bandwidth with respect to segment $s_i$ is defined as follows:

$B_i = \max\!\left(\sum_{p \in P_i}\bigl(d - u_p(i)\bigr),\; 0\right) = \max\!\left(|P_i|\,d - \sum_{p \in P_i} u_p(i),\; 0\right)$

and we denote

$W_i = E\!\left[B_i\right]$

where $W_i$ can be defined as the expected shortage bandwidth in the peer-set $P_i$, which is actually the gap between the demand bandwidth and the available bandwidth supported by the peer-set $P_i$. Thus, we can obtain the following:

$U = \sum_i W_i$
Our objective is to find an adaptation strategy such that the average upload bandwidth consumption $U$ of the server S is minimized. The shortage bandwidth can be efficiently calculated in an iterative way as follows:

$B_i^{(n)} = \begin{cases} d, & n = 1 \\ \max\!\left(n\,d - \sum_{k=1}^{n-1} u_k(i),\; 0\right), & n > 1 \end{cases}$

where the peers of $P_i$ are indexed in their order of arrival and $B_i^{(1)} = d$, since the first peer entering $P_i$ does not have any suppliers and has to download the segment from the server. Based on this framework, we can show the impact of the popularity index of each segment and of the shortage bandwidth on the server upload bandwidth consumption. To model the number of peers $|P_i|$ currently holding $s_i$ in buffer, we assume the Zipf-based popularity distribution as already defined before: with $M$ segments and skew constant $\alpha$, the probability that a request targets segment $s_i$ is

$q_i = \dfrac{1/i^{\alpha}}{\sum_{j=1}^{M} 1/j^{\alpha}}$

and we let the random variable $X_i$ denote the number of viewers of segment $s_i$ (i.e., the number of peers possessing $s_i$ in buffer) induced by $N$ peers requesting segments according to this law. Thus, the average upload bandwidth consumption of server S can be derived as follows:

$U = \sum_{i=1}^{M}\left\{\dfrac{1/i^{\alpha}}{\sum_{j=1}^{M} 1/j^{\alpha}} \sum_{n=1}^{N}\left[\Pr(X_i = n)\, B_i^{(n)}\right]\right\}$
Now, the model of server load optimization can be formulated as: minimize $U$ over the choice of update intervals. In general, it is difficult, and also not practical, to find a closed-form solution of this optimization problem, which would require global knowledge not inherently present in scalable P2P systems. Instead, we define the following practical and distributed solution, where the update interval $T_i$ estimated for segment $s_i$ is adapted based upon the shortage bandwidth proportionality ratio:

$T_i = T \cdot \dfrac{W_i}{\sum_{j} W_j}$

with $T$ a base update interval. We can estimate the denominator $\sum_j W_j$ by exploiting the distributed averaging algorithm as discussed before. This strategy of update adaptation based upon the shortage bandwidth proportionality ratio is found to produce good results, as shown later in the experimental evaluation.
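As a concrete illustration, the iterative shortage-bandwidth calculation and the resulting proportional update intervals might be sketched as follows; the peer bandwidths, the base interval, and all function names are illustrative assumptions, not the authors' implementation:

```python
def shortage_bandwidth(uploads, d):
    """B_i^(n) = max(n*d - sum(u_1..u_{n-1}), 0): peers indexed by arrival
    order, so the n-th arrival demands rate d while only the n-1 earlier
    arrivals can supply it; the first peer is served entirely by the server."""
    b, supply = 0.0, 0.0
    for n, u in enumerate(uploads, start=1):
        b = max(n * d - supply, 0.0)  # shortage after the n-th arrival
        supply += u                   # this peer can supply later arrivals
    return b

d = 500.0  # video data rate in Kbps
# Illustrative upload bandwidths (Kbps) of the peer sets holding two segments.
peer_sets = {1: [800.0, 600.0, 300.0], 2: [250.0, 300.0]}

W = {i: shortage_bandwidth(u, d) for i, u in peer_sets.items()}

# Update interval for each segment proportional to its shortage ratio,
# scaled by an assumed base interval of 60 seconds.
T = 60.0
intervals = {i: T * w / sum(W.values()) for i, w in W.items()}
```

Here the bandwidth-starved segment 2 ends up with the larger shortage (750 vs. 100 Kbps) and therefore the larger share of the interval budget.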
Search Cost-Server Load Joint Optimization
Now, we define the joint optimization problem of search cost and server load in a single function. Let $C$ denote the expected search cost derived in the previous section and $U$ the expected server upload bandwidth consumption derived above, both expressed in terms of the Zipf popularity fractions $\frac{1/i^{\alpha}}{\sum_{j=1}^{M} 1/j^{\alpha}}$, and the objective is:

$\min\ (C + U)$

We derive the joint optimization solution by a linear combination of their respective solutions as follows:

$T_i = T\left(\alpha_{SP}\,\dfrac{\hat{q}_i}{\sum_{j}\hat{q}_j} + \alpha_{SB}\,\dfrac{W_i}{\sum_{j} W_j}\right)$

where $\alpha_{SP}$ and $\alpha_{SB}$ are the respective weightage values and can be tuned; we study their implications in the experimental evaluation. Both denominators can be estimated using the distributed averaging algorithm described previously, piggybacked in the same communication message.
EXPERIMENTAL EVALUATION
We present extensive simulation results to validate our models and evaluate the performance of
our adaptation strategies with the help of different system properties. We implemented a discrete
event-driven simulator for various P2P operations in C++. All the P2P events such as media
playback, random jump, and peer join/leave/failure are simulated by events scheduled at
respective times. Chord (Stoica, Morris, Karger, Kaashoek, & Balakrishnan, 2001) is used as the
base DHT due to its simple construction and provable performance guarantees. The following
adaptation strategies are evaluated:
STATIC: No query adaptation strategy is employed;
SP: Query adaptation based upon the proportionality ratio of segment popularities,
$T_i = T \cdot \dfrac{\hat{q}_i}{\sum_j \hat{q}_j}$; i.e., $\alpha_{SP} = 1$, $\alpha_{SB} = 0$.
SB: Query adaptation based upon the shortage bandwidth proportionality ratio,
$T_i = T \cdot \dfrac{W_i}{\sum_j W_j}$; i.e., $\alpha_{SP} = 0$, $\alpha_{SB} = 1$.
SP-SB: Joint query adaptation based upon a linear combination of the popularity and shortage
bandwidth proportionality ratios,
$T_i = T\left(\alpha_{SP}\,\dfrac{\hat{q}_i}{\sum_j \hat{q}_j} + \alpha_{SB}\,\dfrac{W_i}{\sum_j W_j}\right)$; with $0 \le \alpha_{SP}, \alpha_{SB} \le 1$.
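Under the notation used above, the four interval-assignment rules could be sketched as follows; the base interval, the sample values, and all names are illustrative assumptions:

```python
def update_intervals(q, W, base, a_sp, a_sb):
    """Per-segment update interval from popularity estimates q and expected
    shortage bandwidths W (both dicts keyed by segment id)."""
    sum_q, sum_w = sum(q.values()), sum(W.values())
    return {i: base * (a_sp * q[i] / sum_q + a_sb * W[i] / sum_w) for i in q}

q = {1: 0.5, 2: 0.3, 3: 0.2}        # estimated segment popularities
W = {1: 100.0, 2: 300.0, 3: 600.0}  # expected shortage bandwidths (Kbps)
T = 60.0                            # assumed base interval in seconds

static = {i: T for i in q}                   # STATIC: one fixed interval
sp = update_intervals(q, W, T, 1.0, 0.0)     # SP: popularity ratio only
sb = update_intervals(q, W, T, 0.0, 1.0)     # SB: shortage ratio only
sp_sb = update_intervals(q, W, T, 0.4, 0.6)  # SP-SB: linear combination
```

Note how SP favors the popular segment 1 while SB favors the bandwidth-starved segment 3, and SP-SB interpolates between the two.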
Some details of the simulator are as follows. The underlying network topology, generated using GT-ITM (Zegura, Calvert, & Bhattacharjee, 1996), consists of 15 transit domains, each with 25 transit nodes; each transit node is connected to 10 stub domains, each with 15 stub nodes. We randomly place the server in a transit node and the peers in stub nodes. The latency of each link is computed in proportion to the Euclidean distance between the nodes. For each point in a plot, we repeated the placement and simulation 10 times to mitigate the effect of randomness. The number of peers (N) in the system is varied from 256 to 4096. We model the user arrival process as a Poisson process with a mean inter-arrival time of λ=1 sec. The peer lifetime is modeled as an exponential distribution with an expected mean of 30 mins. The peer upload bandwidth is randomly distributed between 250~1000 Kbps and the video data rate is d=500 Kbps. The user request pattern (i.e., the segment popularities) follows a Zipf distribution with different values of α. Each segment is 3.84 MB in size, which corresponds to one minute of video length. The total viewing length of the video stream is 128 mins and each simulation session is set to 2
hrs. Other parameters are: maximum node bandwidth 4 Mbps, server bandwidth 500 Mbps; k=5, b=4.
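A rough sketch of how this workload could be generated (a simplified Python stand-in for the C++ simulator; the seed and all names are illustrative):

```python
import random

random.seed(7)

N = 256                  # number of peers in the smallest configuration
MEAN_ARRIVAL = 1.0       # mean inter-arrival time in seconds
MEAN_LIFETIME = 30 * 60  # expected peer lifetime: 30 minutes
M = 128                  # number of one-minute segments
ALPHA = 1.0              # Zipf skew constant

# Zipf request probabilities q_i = (1/i^a) / sum_j (1/j^a).
norm = sum(1.0 / j ** ALPHA for j in range(1, M + 1))
q = [(1.0 / i ** ALPHA) / norm for i in range(1, M + 1)]

events = []
t = 0.0
for _ in range(N):
    t += random.expovariate(1.0 / MEAN_ARRIVAL)         # Poisson arrivals
    lifetime = random.expovariate(1.0 / MEAN_LIFETIME)  # exponential lifetime
    seg = random.choices(range(1, M + 1), weights=q)[0] # Zipf-drawn request
    events.append((t, t + lifetime, seg))
```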
Query Resolution Cost
The query resolution cost is measured by the number of lookup hops required for returning the result set. It is calculated as the average number of lookup hops over all queries initiated by all the peers in the system during a simulation session. It is an important P2P performance metric which helps control the query messaging overhead and also improves the chances of a request being served within its deadline. Figure 2 illustrates the performance of the different adaptation strategies with respect to the average number of lookup hops for various peer populations. The popularity skew is fixed at α=1.0 for this plot, with the remaining parameters at their default values. It can be observed that the adaptation strategies (SP/SB/SP-SB) provide considerable performance gains compared to the no-adaptation strategy (STATIC). Among the adaptation strategies, SP provides a slight improvement over SB, which is expected since SP is specifically geared to minimize the lookup cost; the joint adaptation technique (SP-SB) performs slightly better than all the other variants.
Figure 2: Plot of average number of lookup hops for different number of peers in the system.
Now, we study the effect of the skewness degree of the popularity distribution (i.e., variation of α) on the different strategies. The performance metric is kept the same (i.e., query resolution cost in terms of average lookup hops) and the skew degree α is varied from 1.0 to 5.0 for N=4096 peers, plotted in Figure 3. The no-adaptation strategy fares badly with increasing α, as expected. SP performs better than SB/SP-SB, which is expected since its adaptation is tuned in proportion to the estimated query popularities. SP also improves with increasing α, which indicates that the popularity estimation is able to capture the variation of skew.
Figure 3: Plot of lookup cost for different popularity skew degrees (Zipf constant α) with N=4096.
Request Rejection Ratio
The next performance metric for our experimental study is the request rejection ratio, defined as the ratio of the number of rejected requests to the total number of requests initiated by all the peers in the P2P system. Peers make various types of requests at different times, but we restrict ourselves to requests dealing with content/bandwidth and do not consider requests for control parameters, since they are not a major focus of this framework. Figure 4 plots the variation of the request rejection rate for different peer populations in the P2P system. The no-adaptation strategy generates the highest rejection rate, since the large number of popular queries traverse long query paths, which induces request rejections and high query overhead. The SP-SB and SB strategies give good performance since they adapt the query resolution based on popularity and available bandwidth. The increase in peer population does not seem to have a strong influence on the rejection rate, which also renders the strategies (SB/SP-SB) scalable.
Next, we study the variation of request rejection under popularity models with different skew ratios. Figure 5 plots the request rejection rates for the different adaptation strategies with α varying from 1.0 to 5.0. In accordance with the previous discussion, the STATIC strategy is not able to control the high request rejections in the system, which grow quite significantly with increasing α. It is not a scalable solution: in a system with 4096 peers, its rejection rate increases from 48.47% (α=1.0) to 60.39% (α=5.0), which is not a desirable property. The SP-SB strategy performs the best, and even with increasing α it is able to considerably reduce the request rejection rate. SB performs significantly better than SP, which suggests that available-bandwidth-based adaptation is better suited to minimizing the request rejection rate than popularity-based adaptation.
Figure 4: Plot of request rejection rate for different system size at α=1.0
Figure 5: Variation of request rejection rate with different popularity skew (α) for N=4096.
Server bandwidth consumption
We study the server load, or upload bandwidth consumption, one of the most important concerns for content providers, with respect to varying system parameters. Figure 6 illustrates the effect of the different adaptation strategies on the server bandwidth consumption for varying system sizes, with the popularity model kept constant at α=1.0. The no-adaptation strategy (STATIC) does not perform well and generates a steep increase of server load from 75.0932 server streams (N=256) to 557.6189 (N=4096). The shortage-bandwidth-based adaptation strategy (SB) performs the best, keeping the server load between 51.5297 (N=256) and 273.3985 (N=4096) and rendering it a scalable solution. Joint adaptation (SP-SB) also performs considerably better than SP, which suggests that request-popularity-based adaptation does not help reduce the server load to a significant extent; the SB and SP-SB strategies are therefore preferable for conserving server bandwidth.
Figure 6: Server stress for varying user population with α=1.0
Next, we study the effect of varying the segment popularity skew on the server load generated by the four variants of the adaptation technique, as depicted in Figure 7. Note that the range of the Y-axis values is magnified to fit the (min, max) range so that the properties of each curve can be studied at a finer granularity. We make the following observations: (1) STATIC increases consistently with higher values of α; the average rate of increase per interval (i.e., each increase of α by 0.5) is 4.08685 server streams, with the smallest increase in the α=[2.5→3.0] interval (value 1.7078); (2) SP performs better than STATIC but still generates a considerable server load, and its curve is somewhat invariant to changes of α, with an average value of 493.9939 server streams; (3) SP-SB performs better than the above two and is able to lower the server consumption to a considerable extent; its curve is initially invariant to α until 3.5, after which it consistently lowers the server load, so it can be concluded that this adaptation strategy performs well beyond α=3.5; (4) SB performs the best in terms of conserving server bandwidth, and a closer look at its curve shows that it consistently drops the server load until α=3.5, after which it stays at a roughly constant level (α>3.5). So it can be concluded that the best operating point for the SB strategy is α≤3.5.
Streaming Quality
Though all the above metrics are important from a P2P system performance point of view, streaming quality is a more relevant parameter from a user-centric perspective for improving the Quality of Experience (QoE). Streaming quality can be defined in various contexts; here we consider playback continuity, defined as the number of segments that are received within their deadlines and used for continuous playback, divided by the total number of segments that fit in the peer's lifetime. Figure 8 plots the result for different values of α from 1.0 to 5.0 with N=4096 peers. The general observations are: the no-adaptation (STATIC) strategy has the lowest playback continuity index; the performance of SP and SB are within comparable limits; SP-SB produces the best result with consistently high continuity. Note that the range of the Y-axis is made to fit the (min, max) range to get a closer look. Now, let us take a detailed look at each of the respective curves: (1) STATIC undergoes a consistent drop in continuity as α varies from 1.0 to 5.0, with an average index of 0.7551, meaning it is unable to download about 25% of segments within deadline due to content/bandwidth deficiency; (2) SP improves the continuity index from 0.8528 (α=1.0) to 0.8902 (α=2.5), but after α=2.5 it consistently drops to 0.7848 (α=5.0), so its performance is reasonable up to α=2.5; (3) SB has an initial drop of the continuity index from 0.8424 (α=1.0) to 0.7751 (α=3.0), but improves consistently in the later part up to 0.9031 (α=5.0), so its performance starts to improve past α=3.0; (4) SP-SB generates the best streaming quality among all four techniques and consistently improves the continuity from 0.9128 (α=1.0) to 0.9838 (α=3.0), after which it saturates and levels off without further improvement.
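The continuity index defined above can be computed per peer as in this small sketch; the timestamps, deadlines, and function name are illustrative assumptions:

```python
def continuity_index(receive_times, deadlines, lifetime_segments):
    """Fraction of segments received by their playback deadline out of
    all segments that fit in the peer's lifetime."""
    on_time = sum(1 for seg, t in receive_times.items()
                  if t <= deadlines.get(seg, float("inf")))
    return on_time / lifetime_segments

# A peer whose lifetime spans 4 segments; segment 3 misses its deadline.
receive = {1: 10.0, 2: 65.0, 3: 140.0, 4: 185.0}
deadline = {1: 60.0, 2: 120.0, 3: 135.0, 4: 240.0}
idx = continuity_index(receive, deadline, lifetime_segments=4)  # 3/4 = 0.75
```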
Figure 7: Server load for different popularity ratios (α) with N=4096.
Figure 8: Playback continuity index variation for different popularity ratios with N=4096.
Finally, we study the inter-relationship of the weights α_SP and α_SB in the joint adaptation strategy SP-SB and their effect on streaming quality for a fixed segment popularity distribution (α=5.0), illustrated in Figure 9. From the figure it can be observed that the optimal weight ratios are α_SP = 0.4 and α_SB = 0.6, where the strategy generates its highest continuity index of 0.9916. We have also experimented with different popularity distribution skews (i.e., different values of α), and the results follow a similar trend with similar optimal weights α_SP and α_SB. These values are obtained under our experimental assumptions with a synthetic workload pattern, and we do not claim that they are universally optimal. More informed values can be derived by experimenting with real network traces, and the operating point can be dynamically adjusted in real and dynamic environments.
Figure 9: Plot of streaming quality variation with different values of α_SP and α_SB for α=5.0
CONCLUSION
Query adaptation is important in the context of Temporal-DHT, especially when the popularity distributions are skewed. We formulated optimization problems addressing search cost and server load, two important performance parameters in the context of a P2P-VoD system, and derived practical optimized solutions that adapt the query resolution mechanism to popularity-skewed content distributions. The basic mechanism adapts the object update interval to minimize search cost and server load under dynamically changing content popularity distributions. We also showed distributed approaches for reliably estimating the adaptation parameter ratios. Simulation results demonstrated the effectiveness of the proposed techniques in improving various performance indicators such as search cost, request rejection rate, server bandwidth consumption, and streaming quality.
REFERENCES
Bhattacharya, A., Yang, Z., & Pan, D. (2011). Popularity Awareness in Temporal-DHT for P2P-
based Media Streaming Applications. IEEE International Symposium on Multimedia, (pp. 241-
248).
Bhattacharya, A., Yang, Z., & Zhang, S. (2010). Temporal DHT and its Application in P2P-VoD
Systems. IEEE International Symposium on Multimedia, (pp. 81-88).
Chu, Y., Rao, S., & Zhang, H. (2000). A case for end system multicast. Proceedings of the 2000
ACM SIGMETRICS international conference on Measurement and modeling of computer
systems, (pp. 1-12).
Cohen, E., & Shenker, S. (2002). Replication strategies in unstructured peer-to-peer networks.
Proceedings of the 2002 conference on Applications, technologies, architectures, and protocols
for computer communications, (pp. 177-190).
Cooper, B. (2005). An optimal overlay topology for routing peer-to-peer searches. Proceedings
of the ACM/IFIP/USENIX 6th international conference on Middleware, (pp. 82-101).
Dabek, F., Kaashoek, M. F., Karger, D., Morris, R., & Stoica, I. (2001). Wide-area cooperative
storage with CFS. Proceedings of the eighteenth ACM symposium on Operating systems
principles, (pp. 202-215).
Gopalakrishnan, V., Silaghi, B., Bhattacharjee, B., & Keleher, P. (2004). Adaptive replication in
peer-to-peer systems. Distributed Computing Systems, 2004. Proceedings. 24th International
Conference on, (pp. 360-369).
Hei, X. L., Liang, J., Liu, Y., & Ross, K. (2007). A Measurement Study of a Large-Scale P2P
IPTV System. IEEE Transactions on Multimedia , 1672 -1687.
Huang, C., Li, J., & Ross, K. W. (2007). Can internet video-on-demand be profitable?
Proceedings of the ACM SIGCOMM 2007 Conference on Applications, Technologies,
Architectures, and Protocols for Computer Communications, (pp. 133-144).
Suh, K., Diot, C., Kurose, J., Massoulie, L., Neumann, C., Towsley, D., et al. (2007).
Push-to-Peer Video-on-Demand System: Design and Evaluation. IEEE Journal on Selected
Areas in Communications, 1706-1716.
Liao, X., Jin, H., Liu, Y., Ni, L., & Deng, D. (2006). AnySee: Peer-to-Peer Live Streaming.
INFOCOM 2006. 25th IEEE International Conference on Computer Communications.
Proceedings, (pp. 1-10).
Qiu, D., & Srikant, R. (2004). Modeling and performance analysis of BitTorrent-like peer-to-
peer networks. Proceedings of the 2004 conference on Applications, technologies, architectures,
and protocols for computer communications, (pp. 367-378).
Ramasubramanian, V., & Sirer, E. G. (2004). Beehive: O(1) Lookup Performance for Power-Law
Query Distributions in Peer-to-Peer Overlays. USENIX NSDI.
Rao, W., Chen, L., Fu, A. W., & Wang, G. (2010). Optimal Resource Placement in Structured
Peer-to-Peer Networks. IEEE Transactions on Parallel and Distributed Systems , 1011-1026.
Ratnasamy, S., Francis, P., Handley, M., Karp, R., & Shenker, S. (2001). A scalable content-
addressable network. Proceedings of the 2001 conference on Applications, technologies,
architectures, and protocols for computer communications, (pp. 161-172).
Rowstron, A., & Druschel, P. (2001). Pastry: Scalable, Decentralized Object Location, and
Routing for Large-Scale Peer-to-Peer Systems. Proceedings of the IFIP/ACM International
Conference on Distributed Systems Platforms Heidelberg (pp. 329-350). Springer-Verlag.
Rowstron, A., & Druschel, P. (2001). Storage management and caching in PAST, a large-scale,
persistent peer-to-peer storage utility. Proceedings of the eighteenth ACM symposium on
Operating systems principles, (pp. 188-201).
Stoica, I., Morris, R., Karger, D., Kaashoek, M. F., & Balakrishnan, H. (2001). Chord: A
scalable peer-to-peer lookup service for internet applications. Proceedings of the 2001
conference on Applications, technologies, architectures, and protocols for computer
communications, (pp. 149-160).
Tan, B., & Massoulie, L. (2011). Optimal content placement for peer-to-peer video-on-demand
systems. INFOCOM, 2011 Proceedings IEEE, (pp. 694-702).
Tewari, S., & Kleinrock, L. (2006). Proportional Replication in Peer-to-Peer Networks.
INFOCOM 2006. 25th IEEE International Conference on Computer Communications.
Proceedings, (pp. 1-12).
Wu, J., & Li, B. (2009). Keep Cache Replacement Simple in Peer-Assisted VoD Systems.
INFOCOM 2009, IEEE, (pp. 2591-2595).
Wu, W., & Lui, J. C. (2011). Exploring the optimal replication strategy in P2P-VoD systems:
Characterization and evaluation. INFOCOM, 2011 Proceedings IEEE, (pp. 1206-1214).
Huang, Y., Fu, T. Z., Chiu, D. M., Lui, J., & Huang, C. (2008). Challenges, design and analysis of a
large-scale p2p-vod system. Proceedings of the ACM SIGCOMM 2008 conference on Data
communication, (pp. 375-388).
Yiu, W. P., Jin, X., & Chan, S. H. (2007). VMesh: Distributed Segment Storage for Peer-to-Peer
Interactive Video Streaming. IEEE Journal on Selected Areas in Communications, 1717-1731.
Zegura, E. W., Calvert, K. L., & Bhattacharjee, S. (1996). How to model an internetwork.
INFOCOM '96. Fifteenth Annual Joint Conference of the IEEE Computer Societies. Networking
the Next Generation. Proceedings IEEE, (pp. 594-602).
Zhang, X., Liu, J., Li, B., & Yum, T. S. (2005). CoolStreaming/DONet: a data-driven overlay
network for peer-to-peer live media streaming. INFOCOM 2005. 24th Annual Joint Conference
of the IEEE Computer and Communications Societies. Proceedings IEEE, (pp. 2102-2111).
Zhou, Y., Fu, T. Z., & Chiu, D. M. (2011). Statistical modeling and analysis of P2P replication to
support VoD service. INFOCOM, 2011 Proceedings IEEE, (pp. 945-953).