Query Adaptation Techniques in Temporal-
DHT for P2P Media Streaming Applications
Abhishek Bhattacharya, Zhenyu Yang, and Deng Pan
School of Computing and Information Sciences
Florida International University, Miami, FL, USA.
ABSTRACT
Peer-to-Peer (P2P)-based approaches for on-demand video streaming systems (P2P-VoD),
characterized by asynchronous user interactivity, have proven practical and effective in
recent years with real-world Internet-scale deployments (Huang, Li, & Ross, 2007). Current
state-of-the-art P2P-VoD systems employ a tracker server for discovering content suppliers,
which poses scalability and bottleneck issues. Temporal-DHT is a structured P2P-based
approach that can efficiently accommodate the large number of update operations caused by
the continuous change of users' playing positions while supporting asynchronous jumps
(Bhattacharya, Yang, & Zhang, Temporal DHT and its Application in P2P-VoD Systems., 2010).
We propose different query adaptation strategies based upon content popularity distributions
and bandwidth shortage ratios, which prove effective in improving the performance of a P2P
streaming system by deriving certain optimized solutions. We formulate valuable optimization
problems in the context of a P2P-VoD system, such as minimization of query search cost,
server bandwidth consumption, and a joint cost-load framework, and provide optimized
solutions that achieve the best results for the above-mentioned objectives. We present
extensive simulation studies under various scenarios of search cost, streaming quality, and
other associated factors in a dynamic network environment where users are free to
asynchronously join/leave the system.
Keywords: Multimedia Information Systems, Video Streaming, Distributed Hash Tables,
Optimization, Peer-to-Peer Systems, Video-on-Demand.
INTRODUCTION
Gnutella, Napster, etc. are some of the first-generation unstructured systems that started the P2P
revolution, followed by the more efficient structured approaches such as Distributed Hash Tables
(DHT) represented by Chord (Stoica, Morris, Karger, Kaashoek, & Balakrishnan, 2001), CAN
(Ratnasamy, Francis, Handley, Karp, & Shenker, 2001), Pastry (Rowstron & Druschel, 2001),
and a suite of similar systems based upon similar principles. Web caching, distributed
storage, etc. are some of the earlier applications supported by P2P approach, followed recently
by the more popular ones such as file sharing e.g., BitTorrent (Qiu & Srikant, 2004),
multicasting e.g., Narada (Chu, Rao, & Zhang, 2000), and live streaming e.g., CoolStreaming
(Zhang, Liu, Li, & Yum, 2005), PPLive (Hei, Liang, Liu, & Ross, 2007), AnySee (Liao, Jin, Liu,
Ni, & Deng, 2006), etc. The potential advantage of P2P-based applications is mainly associated
with the fact that peers share their resources such as processing power, storage, and bandwidth to
help each other in searching/distributing content, thereby alleviating the server load. The
management and distribution of multimedia content is particularly critical for P2P
applications and increasingly important for Internet traffic, which is largely dominated by
ever-growing, bandwidth-hungry multimedia data.
On-demand streaming can benefit enormously from the application of P2P techniques, as
revealed in a recent study (Yann, Fu, Chiu, Lui, & Huang, 2008). We advocate a DHT-overlay
based approach to address the challenging problem of efficient content discovery in on-demand
systems with asynchronous user interactivity. DHT overlays have already proven to be a stable
substrate with nice characteristics such as scalability, decentralized control,
self-organization, and resilience to network/peer dynamics. Incorporating a DHT in on-demand
streaming systems is not a trivial issue, since it generates a flurry of update operations with the continuously changing
playback position of the user. The framework for Temporal-DHT (Bhattacharya, Yang, &
Zhang, Temporal DHT and its Application in P2P-VoD Systems., 2010) addressed this issue of
accommodating a large number of update operations by exploiting the temporal dynamics of the
content for estimating the current playing position of the peers automatically. Temporal-DHT
combines the advantages of both the cache-relay and static-cache approaches. Cache-relay
based approaches have high streaming efficiency due to the buffer-overlap relation between
parent and child peers, whereas static-cache based approaches are better suited to
supporting dynamic and asynchronous operations such as random jumps by avoiding the
dependency on playing positions between peers. Temporal-DHT employs a skillful integration of
static and dynamic buffer management schemes to handle the request dynamics and streaming
efficiency in a seamless fashion. We can describe Temporal-DHT as an augmented version of
generic DHT semantics by incorporating the query reformulation, TTL filtering, and access
workload self-profiling techniques.
The initial Temporal-DHT framework involved a static query reformulation mechanism without
considering the possible effects of content popularity distributions and other related factors,
which are common phenomena in present-day P2P systems. The concept of popularity
awareness is generally employed for optimizing certain objectives such as search cost or
server/peer load factor utilizing the content popularity ratios. One of the important intentions is
to reduce the search cost of more popular contents since they are queried more frequently which
will eventually help to improve the overall performance of the system (Rao, Chen, Fu, & Wang,
2010). It has already proved to be highly useful in web-caching and file-sharing systems where
the data objects are typically characterized with different popularity ratios (e.g., some popular
files are downloaded with a higher frequency or some popular web pages are accessed more
frequently). Several studies have reported that web requests on the Internet are highly skewed,
following a Zipf-like distribution (Yiu, Jin, & Chan, 2007), with the typical characteristics of a few objects having
a very high popularity, a medium number of objects with average popularity, followed by a long
tail with a huge number of objects with very low popularity. Zipf-distributions are universally
used for modeling popularity in various scenarios. The generic approach to deal with this kind of
skewed popularities is to cache the data objects at the various intermediate relay nodes in the
query resolution path which will eventually help to reduce the number of search hops for the
popular queries. This type of caching should be adaptive under dynamic popularity scenarios
(popularity of data objects change with time) since there is an associated trade-off relation
between the higher performance due to lower search complexity and the cost for caching the data
objects at the intermediate nodes. In the context of media streaming applications, caching is not a
reasonable choice since it does not make sense to continuously cache large media-sized objects
at the intermediate nodes, which consumes a lot of network bandwidth. Proposals such as
VMesh (Yiu, Jin, & Chan, 2007) employ a popularity-based content storage mechanism where the
cached segments are continuously replaced in accordance with the recent content popularity
distribution. Continuous replacement of the cached segments to adapt to dynamic popularity
variations is one of the downsides of this kind of mechanism: it consumes large amounts of
network bandwidth, rendering it a heavyweight technique.
As already mentioned, the current approach for dealing with popularity skew is replication,
whereby the less popular objects are dynamically replaced with more popular objects. This
technique consumes excessive bandwidth for keeping the cache updated (i.e., proportional to
content popularities) and so we pursue a different approach of query resolution adaptation.
Distinct from other approaches where the query resolution adaptation depends on the
replication/caching strategies, our method avoids the expensive method of replication by
adopting a range query adaptation technique. This is possible due to the availability of a range
query reformulation technique inherently present in a Temporal-DHT framework where the
generic exact-match DHT prefix routing is augmented with a range query and the range query
span is dependent on the object update interval. The initial Temporal-DHT framework assumed a
fixed value for the object update interval thereby rendering increased search cost with respect to
popularity skewness of the content (Bhattacharya, Yang, & Zhang, Temporal DHT and its
Application in P2P-VoD Systems., 2010). There exists a tradeoff between the
performance benefit of decreasing the search cost and the increased cost of update operations:
if we intend to minimize the search cost, then we need to decrease the update interval, which
triggers more update operations and thereby increases the messaging overhead. Due to
this situation, it is essential to find an efficient solution that optimizes certain performance
objectives and then perform the adaptations based on the optimization solutions. In this context,
we address the following problems:
P1: How to minimize the search cost with a given threshold constraint of update
interval?
P2: How to minimize the server load with a given constraint of available outbound
bandwidth and update interval?
P3: How to jointly minimize the search cost-server load with given constraints of
available bandwidth, update interval, messaging overhead?
P1 is addressed in (Bhattacharya, Yang, & Pan, Popularity Awareness in Temporal-DHT for
P2P-based Media Streaming Applications, 2011) and in this paper we undertake P2 and P3. We
present formulations for optimization objectives of P1, P2, P3, and present practical solutions for
them which will help to develop techniques to perform dynamic adaptation of the object update
intervals in the context of a Temporal-DHT with varying popularity distributions.
To summarize, our contributions are as follows: (a) We incorporate the notion of popularity-
awareness in the context of Temporal-DHT with different performance objectives for optimizing
the search cost, server load, update interval, and messaging overhead in a dynamic fashion under
varying conditions; (b) We formulate the problems P1, P2, P3, in a representative manner and
propose solutions to achieve the objectives; (c) We implement the three adaptation strategies in a
Temporal-DHT based P2P Video-on-Demand system model and provide extensive simulation
studies to show the effectiveness of the adaptive query resolution strategies in a media streaming
scenario and the performance benefits associated with the optimization of P1, P2, P3. The rest of
the paper is organized as follows: We present some basic background stuff related to DHT and
Temporal-DHT in the next section. In the following section, we present the detailed adaptation
mechanisms and the optimization problems P1, P2, P3, and the various solutions with its
interpretation in the Temporal-DHT framework. We analyze our simulation studies in the
following section. The next section summarizes related work from the literature followed by the
section for conclusion.
RELATED WORK
The general trend in dealing with popularity skews is caching and replication, where the queried
data objects are cached or replicated in the intermediate relay nodes or some strategic nodes near
the query originator. The typical problem in this domain mainly involves the placement
strategies of replicas or cached objects to reduce the search cost for the more popular objects.
Web-caching systems benefit from these techniques since web-based objects typically
follow a Zipf-like popularity distribution. CFS (Dabek, Kaashoek, Karger, Morris, & Stoica,
2001) is a cooperative file system over Chord DHT which caches the popular objects along the
lookup path towards the home node where the popular objects are originally stored. PAST
(Rowstron & Druschel, Storage management and caching in PAST, a large-scale, persistent peer-
to-peer storage utility, 2001) is a storage system over Pastry DHT where the search for some
object is redirected to the nearest replicas of the targeted object. Another notable technique is
proposed in Beehive (Ramasubramanian & Sirer, 2004), which replicates the object copies to all
the nodes that have at least l common prefixes matching the object hash ID, where l is defined
as the replication level. Replication was proposed in (Cohen & Shenker, 2002) to optimize
search efficiency where the number of replicas of an object is kept proportional to the square-
root of the object popularity. A square-root topology for unstructured P2P networks was
proposed in (Cooper, 2005) where the in/out degree of a peer is proportional to the square-root of
the node popularity. PRing/PCache (Rao, Chen, Fu, & Wang, 2010) presented a replica
placement strategy for web-caching systems with data objects having skewed popularities in both
deterministic and randomized structured P2P networks. They gave detailed analytical results
with closed-form optimal solutions for different resource optimization objectives. LAR
(Gopalakrishnan, Silaghi, Bhattacharjee, & Keleher, 2004) proposed a lightweight, adaptive, and
system-neutral replication framework that maintains low access latencies and good load balance
even under highly skewed demand. Now, let us discuss some replication/caching strategies
specifically for multimedia data objects: VMesh (Yiu, Jin, & Chan, 2007) uses a static-cache based
DHT overlay for P2P VoD streaming where the cached objects are continuously refreshed with
different video segments and this segment replacement strategy is proportional to the probability
of the derived segment popularities. (Tan & Massoulie, 2011) proposed an optimal content
placement strategy and request acceptance policy for P2P-VoD systems which jointly maximize
uplink bandwidth utilization. Statistical modeling is proposed in (Zhou, Fu, & Chiu, 2011) to
derive the relationship among storage capacity, number of videos, number of peers, and server
load, which is later used for a replication algorithm that balances load among all the peers for
both deterministic and random demand models, and for both homogeneous and heterogeneous
upload bandwidth distributions. (Wu & Lui, 2011) presented mathematical models and optimization
framework for understanding the impact of popularity on server load where they argued the
conventional wisdom of proportional replication strategy to be non-optimal and expanded the
design space by deriving passive replacement and active push policies based on optimal
replication ratios. (Tewari & Kleinrock, 2006) advocated to tune the number of replicas in
proportion to the request rate of the corresponding content, based on a simple queuing formula
from the standpoint of load on network links. Investigations for content placement in P2P-VoD
systems were conducted in (Kyoungwon, et al., 2007) in the context of both queuing and loss
models. (Wu & Li, 2009) used dynamic programming to derive the optimal replication strategy
for a P2P-VoD system where the peers have homogeneous upload capacity.
BACKGROUND MODEL DESCRIPTION
We introduce the following notation to describe our model of a Temporal-DHT based P2P-VoD
system:
P = {p_1, p_2, …, p_N} is the set of participating peers; N is the number of peers in the system.
U_i is the upload capacity of peer p_i.
D_i is the download capacity of peer p_i.
S is the media server with an outbound bandwidth of B_S.
C = {c_1, c_2, …, c_M} is the video stream, made up of M chunks or segments.
D is the size of one chunk or segment in MB.
d is the video data rate in Kbps required to maintain uninterrupted streaming.
τ is the playtime of each video segment.
B_r is a dynamic/random buffer with a size of k segments, i.e., kD MB.
B_s is a static/sequential buffer of size b segments, i.e., bD MB.
T is the publish interval, i.e., T = z·τ, where z is a system-defined parameter.
TTL is the Time-to-Live, which indicates the freshness index for each indexing record.
succ/pred are the successor/predecessor pointers in the content space.
A novel conceptual augmentation of the traditional DHT semantics for indexing content with
temporal dynamics, which provides considerable savings in messaging overhead, was proposed in
one of our earlier works as the framework for Temporal-DHT (Bhattacharya, Yang, & Zhang,
Temporal DHT and its Application in P2P-VoD Systems., 2010). The proposed framework has two
distinctive properties: (a) Application-level Characteristics: DHT takes a more active role by
exposing the internal behavior of the application which allows for a chance to better service the
dynamic needs of the application by advocating a proactive design approach; (b) Data
Transiency: Unlike traditional DHT, the stale indexing records are flushed off from the system at
a periodic interval and the predictive temporal dynamics is exploited for effective query
resolution in Temporal-DHT.
⟨c_j, p_i, t⟩ represents a 3-tuple indexing record in a typical Temporal-DHT, with segment c_j,
peer p_i, and publish time t, indicating that p_i contains c_j at time t; the record is flushed
from the system at t + TTL·τ. Temporal-DHT can accommodate both static (like traditional DHT
indexing records) and dynamic indexing records within the same framework by initializing the
value of TTL to z (for the dynamic case) or ∞ (for the static case). Temporal-DHT exploits the
technique of lazy updates by allowing a certain degree of inconsistency in the indexing
structure, which enables records to be updated at a coarser granularity, i.e., at a predefined
constant periodic interval T. To allow this kind of inconsistency relaxation, the query
resolution mechanism of the traditional DHT is augmented with query reformulation and TTL
filtering techniques that take hints from the dynamics of the content workload. Next, we
illustrate the underlying idea with an intuitive example. Referring to Fig. 1(a), suppose k = 1
and z = 4, and VoD peer p_1 performs an update operation at time t_0 for segment c_j,
represented as an indexing record ⟨c_j, p_1, t_0⟩. After each time interval τ, the buffer slides
by one segment, and the next update is performed by p_1 at time t_0 + 4τ with the record
⟨c_{j+4}, p_1, t_0 + 4τ⟩. During the time interval [t_0 + δ, t_0 + 4τ], where δ is a very small
time unit signifying that segment c_{j+1} is already loaded in the buffer of p_1 but the
corresponding Temporal-DHT update has not yet been performed, traditional exact-match query
resolution for c_{j+1} will fail to return p_1 as a result due to the allowed inconsistency.
For effectively returning p_1 as a result, we transform the exact-match query for segment q
into a range query over ⟨q, q − z⟩. Fig. 1(b) depicts an illustrative example with the playing
buffers of VoD peers p_1 and p_2 sliding over a section of the video stream during the time
interval [t_0, t_0 + 6τ] with z = 6. The accurate query resolution formalization is given in
Theorem 1, taken from (Bhattacharya, Yang, & Zhang, Temporal DHT and its Application in
P2P-VoD Systems., 2010). Further proof details and TTL filtering schemes are covered in the
same work.
Figure 1: (a) Temporal-DHT content linkage and updates and (b) Range query reformulation
and buffer sliding.
Theorem 1: Given the playback buffer of size k and the publish interval z, a peer that
searches for a dynamic segment needs to perform a range query of at most k + z segments.
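As a rough sketch of Theorem 1 (a hypothetical helper, not code from the framework itself), the reformulated query range for a dynamic segment q can be computed as:

```python
def range_query_span(q, k, z):
    """Reformulate an exact-match query for segment q into the range of
    segment IDs whose possibly-stale index records could still satisfy it.
    A dynamic record may lag the publishing peer's true position by up to
    z segments, and the peer buffers k segments, so any record in
    [q - (k + z) + 1, q] can potentially serve segment q -- at most
    k + z segments, as stated by Theorem 1."""
    lo = max(0, q - (k + z) + 1)
    return list(range(lo, q + 1))
```

For example, with k = 4 and z = 6, a query for segment 100 expands to the 10 segment IDs 91 through 100.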
There are also some other distinctive features associated with Temporal-DHT which will be
briefly discussed as follows: A content-based overlay is initiated for supporting in-order
access and range-query resolution by maintaining pointers that respect the semantic sequential
relationship, i.e., succ(c_i) = c_{i+1} and pred(c_i) = c_{i−1}. These content linkage pointers
can also support short random jumps as long as the number of routing hops (which can be easily
calculated in this case using the content distance, e.g., a jump from segment c_j to segment
c_{j+3} takes 3 hops) is less than the O(log N) of generic DHT
routing (Yiu, Jin, & Chan, 2007). An overarching framework was proposed as a Temporal-DHT
based mesh (TDHTM) which can seamlessly integrate the power of asynchronous interactivity
support with static cache based indexing and smooth in-order streaming efficiency with dynamic
cache based indexing. A combined static-dynamic buffer management scheme is employed in
TDHTM, where the static (B_s) segments are indexed with TTL = ∞ (kept constant throughout the
peer's lifetime) and the dynamic (B_r) segments are indexed with TTL = z (which keeps changing
with the viewing position as the buffer slides after each segment playback). Static indexing involves
a one-time publication of indexing record at initialization, and query processing follows the
generic DHT based exact-match resolution mechanism, whereas dynamic indexing is concerned
with the publication of indexing records in a periodic interval of T with augmented Temporal-
DHT based range query resolution technique. Moreover, TDHTM also employs access workload
self-profiling at the client end for adaptive content distribution by dynamic switching between
random seek mode (handled by static indexing) and continuous playback mode (handled by
dynamic indexing). We provide a high-level overview of the algorithm in pseudocode as follows:
The peer joins the system and initializes the Temporal-DHT state by deriving its finger table.
It randomly selects b segments to fill B_s, then searches for and downloads them.
It publishes static indexing records for the B_s content.
It fixes its succ/pred pointers by joining the content overlay.
It accepts the user's request for a starting video segment and searches the static/dynamic
indexing records.
The peer fills the dynamic buffer B_r by invoking Temporal-DHT queries and downloads video
segments from neighbors with available upload capacity.
At each gossip time interval, it exchanges (sends/receives) messages with neighboring peers
for segment query/access popularity information.
The peer computes the access/query popularity index value for each segment and sets the
update interval in proportion to the popularity indexes.
After each time interval T = z·τ, it publishes a dynamic indexing record of the representative
segment in B_r to the Temporal-DHT.
During each segment playback, it searches for and downloads the next segment from the tree
parent or neighbors and performs buffer sliding of B_r.
The peer leaves the system either by informing the Temporal-DHT or silently through failure.
FRAMEWORK DETAILS
We now present detailed optimization strategies for different objectives by incorporating content
popularity and other resource management techniques in the context of a Temporal-DHT based
P2P-VoD system model.
Search Cost
Let us analyze the cost of a search query in the Temporal-DHT framework, which will obviously
be higher than the generic DHT cost due to the range query reformulation, making it one of the
important metrics for optimization. A typical Temporal-DHT query is composed of two sections:
(1) a basic exact-match generic DHT query with prefix routing executed through the pointers in
the finger table, and (2) a range query reformulation performed by a linear traversal of the
content overlay in forward/backward directions with the help of the succ/pred linkage
pointers. The query cost in a generic DHT search between any pair of source and destination is
given by O(log N) (Stoica, Morris, Karger, Kaashoek, & Balakrishnan, 2001). Let the ith node in
the Chord DHT have node ID i in the hash identifier space; then the kth entry in its finger
table points to the successor node of ID (i + 2^{k−1}) mod 2^{log N}, where 1 ≤ k ≤ log N.
Therefore, the distance travelled by a routing hop is 2^{x−1} for 1 ≤ x ≤ log N when the query
is forwarded to the node in the xth entry of the finger table. This can be generalized to the
fact that the query can traverse at most half of the remaining distance between the source and
destination in the identifier space in each routing hop. The range query cost can be derived
from Theorem 1, where it was stated that the search span can cover at most k + z segments. The
time complexity of the range search can be equated to O(k + z), since each pair of sequential
segments is a single application hop apart, which is facilitated by the content-based overlay.
So, the total cost (in terms of messaging) to search any content in the Temporal-DHT framework,
represented as a number of hops, is given as follows:
h = O(log N) + O(k + z) = O(log N + k + z)
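As a numeric sketch (a hypothetical helper; a base-2 Chord-style identifier space is assumed), the worst-case hop count of one Temporal-DHT lookup is the generic prefix-routing cost plus the linear range scan:

```python
import math

def temporal_dht_search_cost(n_peers, k, z):
    """Worst-case hops for one Temporal-DHT lookup: about log2(N) hops of
    generic DHT prefix routing, plus at most k + z application hops of
    linear range traversal over the content overlay (Theorem 1)."""
    return math.ceil(math.log2(n_peers)) + k + z
```

With N = 1024 peers, k = 4, and z = 6, this gives 10 + 10 = 20 hops.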
Search Cost Optimization
We formulate the problem of search cost minimization, MIN-SEARCH, as follows:
MIN-SEARCH: Minimize the total query cost (H) in terms of lookup hops, given a
threshold constraint on the update interval (z).
This problem involves maximizing the performance benefits of the Temporal-DHT framework
by associating the cost with the messaging complexity required for query resolution, which is
crucial in conserving valuable network bandwidth. In our initial proposal (Bhattacharya, Yang,
& Zhang, Temporal DHT and its Application in P2P-VoD Systems., 2010), we had a fixed value
of z which essentially fails to realize the query load skew due to time-varying popularity of
individual segments in a video stream. This will essentially generate a higher total search cost of
the system mainly contributed by the large number of popular query segments.
Suppose the total query set for a P2P session is defined by Q = {q_1, q_2, …} = ⋃_{i=1}^{M} Q_i,
where Q_i denotes the set of queries targeting the same segment ID (c_i) for each value of i.
Now, we can define the popularity index (λ_i) of c_i as follows:
λ_i = |Q_i| / Σ_{j=1}^{M} |Q_j|
The total search cost for a single Temporal-DHT query is h = O(log N + k + z), as derived
before. Now, given the query popularity distribution λ_i, the total search cost H for M data
objects can be represented as follows:
H = Σ_{i=1}^{M} |Q_i| (log N + k + z_i)
  = |Q| Σ_{i=1}^{M} λ_i (log N + k + z_i)
The optimization objective is to minimize the value of H. We derive our solution by exploiting
the theorem stated in (Rao, Chen, Fu, & Wang, 2010) as follows:
Theorem 2: Let the cost of each update operation be c_u and the total number of update
operations be L (i.e., L = Σ_{i=1}^{M} 1/z_i, since a segment with update interval z_i is
republished once every z_i segment playtimes); then H is minimized when
z_i = Σ_{j=1}^{M} √λ_j / (L √λ_i).
From Eq. 10 in (Rao, Chen, Fu, & Wang, 2010), adapted to our scenario, we have:
z_i = Σ_{j=1}^{M} √λ_j / (L √λ_i)
Now, let us substitute this z_i into the expression for H:
H = |Q| Σ_{i=1}^{M} λ_i (log N + k + z_i)
  = |Q| (log N + k) + (|Q|/L) (Σ_{i=1}^{M} √λ_i)²
It is interesting to note that the term (Σ_{i=1}^{M} √λ_i)² is the exponential of the Rényi
entropy of order ½ of the query popularity index λ_i. This is in accordance with our intuition
that the expected skewness of the popularity distribution plays a crucial role in the cost
optimization objective; such an entropy of the query popularity is a sound measure for modeling
the skewness of the distribution. Thus, the total search cost H depends upon N, k, L, M, and
the entropy of (λ_i). In our framework, {N, k, M} are kept fixed, and our goal is to minimize
the value of H by adapting z_i with respect to λ_i. We define the estimated update interval
adaptation for segment c_i as follows:
ẑ_i = Σ_{j=1}^{M} √λ̂_j / (L √λ̂_i)
where λ̂_i is the estimated popularity index for segment c_i using the number of received
requests, as derived in a later section.
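A minimal sketch of the resulting square-root allocation (the function name and the update-budget parameter are ours): more popular segments receive shorter update intervals, and the reciprocals of the intervals sum to the update budget L.

```python
import math

def adapt_update_intervals(popularity, update_budget):
    """Square-root allocation: minimizing the popularity-weighted
    range-query cost subject to a total update rate L yields update
    intervals z_i proportional to 1/sqrt(lambda_i), so popular segments
    are republished more frequently (smaller z_i)."""
    roots = [math.sqrt(p) for p in popularity]
    norm = sum(roots)
    return [norm / (update_budget * r) for r in roots]
```

By construction, summing 1/z_i over all segments recovers the budget L exactly.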
Popularity of Video Segments
Popularity models are typically based on the Zipf distribution, which was originally derived
from the popularity of web objects on the Internet. It is generally true that in a video stream
some portions are more popular than others, as evident from previous studies, which essentially
generates a skewed query pattern that overwhelms the system with queries for popular segments.
If all the video segments are ranked in descending order of their popularities, then the
popularity index of the ith segment (λ_i) can be denoted as follows:
λ_i = (1/i^α) / Σ_{j=1}^{M} (1/j^α)
where α is a Zipf constant. We assume that the segment popularities are linked to the user
request distribution which is reasonable since the more popular segments are requested by a
larger number of users with a high probability. We model the VoD query distribution as follows:
A peer initializes from a randomly selected segment and starts to watch the video from that
point. The user continues in normal sequential playback mode for a random time period drawn
from an exponential distribution with mean T_on seconds. The process then repeats: the user
jumps to another random segment and remains in normal playback mode for a certain period. This
continues until the user leaves the system. The lifetime of a peer in the system is given by an
exponential distribution with mean T_life. Our main objective is to derive the segment
popularities based on the above user access model in a typical VoD system where peers
randomly join/leave. Let S_i be the state in which a peer is accessing segment i. The average
time a peer stays in the system, or its expected life period, can be denoted as ⌈T_life/τ⌉
time slots, where one slot is the playtime τ of a segment. The peer plays the media in
sequential in-order mode, traversing from S_i to S_{i+1} with probability β = e^{−τ/T_on}. The
average number of sequential segments accessed by the peer during this phase is 1/(1 − β)
(geometric summation series). The random jump probability from any segment i to another
non-sequential segment j is uniform over the remaining segments:
P_jump(i → j) = (1 − β)/(M − 1), for j ≠ i + 1
We can formulate the one-step transition probability function from S_i to S_j for any
i, j ∈ {1, …, M} as follows:
P(S_i → S_j) = β if j = i + 1, and (1 − β)/(M − 1) otherwise
Let π_i(t) denote the probability that the peer accesses segment i at time slot t, with
π_i(0) = 1/M, since the peer starts from a random point that is typically evenly distributed
among all the segments. Suppose that at the end of time slot t the peer still stays in the
system with probability γ^t, where γ = e^{−τ/T_life} follows from the exponential lifetime
distribution. Thus, the expected access probability for segment i is given as follows:
P_i ∝ Σ_{t=0}^{∞} γ^t π_i(t)
It is possible to calculate the access probabilities of each segment from the above equations
provided the one-step probability function is known. The one-step probability function is
typically random in nature due to asynchronous user access patterns, whereby a peer can jump to
any position at any time. This is unlike the static distribution patterns typically assumed in
theoretical studies. Moreover, computing it exactly would require knowledge of global
information at each peer to make optimal decisions. Hence, we take a practical approach of
popularity estimation in a distributed fashion suitable for realistic conditions as described in the
next section.
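The access model above can be cross-checked with a small Monte-Carlo sketch (all names are ours; sequential advance, uniform jumps, and the exponential lifetime are approximated by per-slot coin flips with probabilities beta and gamma):

```python
import random

def simulate_segment_popularity(m, beta, gamma, n_peers=1000, seed=1):
    """Simulate the VoD access model: each peer starts at a uniformly
    random segment, advances sequentially with probability beta,
    otherwise jumps to a random segment, and remains in the system after
    each slot with probability gamma. Returns the fraction of all
    accesses landing on each of the m segments."""
    rng = random.Random(seed)
    counts = [0] * m
    for _ in range(n_peers):
        pos = rng.randrange(m)
        while True:
            counts[pos] += 1
            if rng.random() >= gamma:       # peer leaves the system
                break
            if rng.random() < beta:         # sequential playback
                pos = (pos + 1) % m
            else:                           # asynchronous random jump
                pos = rng.randrange(m)
    total = sum(counts)
    return [c / total for c in counts]
```

With uniform jump targets the access distribution is roughly flat; a popularity-weighted jump rule would skew it accordingly.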
Query Popularity Estimation
We use a distributed averaging algorithm for estimating query popularity in a dynamic and
decentralized fashion without any static assumption of load distribution. The average number of
queries received from a set of distributed peers is utilized for estimating the popularity indices by
exploiting the algorithm proposed in (Yiu, Jin, & Chan, 2007). We provide a brief description of
the algorithm as follows: Each node exchanges messages with r randomly connected nodes. Assume
node i has a local value v_i; the objective is to estimate the average of all v values over the
network. The value v_i can be conceptualized as the locally observed number of queries for a
video segment in the P2P system. A local dynamic variable w_i, initialized to v_i, is also
maintained at each peer. Each node periodically communicates with its set of random neighbors
and performs the following actions: (1) node i sends its local value w_i to node j; (2) node j
updates its local value to w_j + δ, where δ = α(w_i − w_j) and 0 < α < 1 is a local parameter,
and sends the value δ back to node i; (3) node i updates its local value to w_i − δ. The
central idea behind the algorithm is the alternate increment and decrement of the same value at
two neighboring nodes, which conserves the sum of all values in the system while moving each
local value closer to the global average after each update. This technique can also be extended
to cope with peer dynamics: each node i maintains a variable σ_j for each neighbor j, which
accumulates all the changes made to j. On detecting the failure of node j, node i performs
w_i ← w_i + σ_j, which helps to conserve the total sum of all values.
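A minimal sketch of a conserved-sum pairwise averaging step of the kind described (the exact exchange rule, moving a fraction alpha of the pairwise difference, is our assumption): one node's increment is always the other's decrement, so the global sum, and hence the average, is preserved while local values converge.

```python
import random

def gossip_average(values, alpha=0.5, rounds=300, seed=7):
    """Pairwise averaging: node i sends its value to a random node j,
    node j absorbs a fraction alpha of the difference and returns the
    same delta, so every exchange increments one node and decrements the
    other by an identical amount; the global sum is invariant and each
    local value drifts toward the network-wide average."""
    rng = random.Random(seed)
    v = list(values)
    for _ in range(rounds):
        i, j = rng.sample(range(len(v)), 2)
        delta = alpha * (v[i] - v[j])
        v[j] += delta   # node j absorbs part of the difference
        v[i] -= delta   # node i gives up exactly the same amount
    return v
```

Starting from [10, 0, 0, 0], the values converge toward the true average 2.5 while their sum stays exactly 10.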
The above distributed algorithm can be utilized to keep track of the total number of requests arriving from different peers, which is then used to calculate the popularity indices. Each peer $p$ maintains a flag $a_p(i)$ for each segment $s_i$, indicating whether $p$ has seen accesses to $s_i$: if $p$ receives a request for $s_i$, it sets $a_p(i) = 1$; otherwise the flag remains $0$. A peer also maintains another set of local variables $f_p(i)$ which store the frequency of requests received for each $s_i$. The averaging algorithm is then executed to continuously exchange and update the value of $f_p(i)$ with neighboring nodes. The information gets propagated through each peer's neighborhood, and $f_p(i)$ thereby converges to a local value $\bar{f}(i)$ which can be assumed to be a good approximation of the global popularity of $s_i$. Now, it is trivial for a peer to compute the estimated popularity $\hat{q}_i$ of $s_i$ from its local set of average values as follows:

$\hat{q}_i = \dfrac{\bar{f}(i)}{\sum_{j} \bar{f}(j)}$
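Concretely, the final normalization step might look like the following sketch; the averaged frequency values are illustrative assumptions, not measurements:

```python
# Locally averaged request frequencies per segment after the gossip
# rounds have converged (values here are made up for illustration).
avg_freq = {1: 40.0, 2: 20.0, 3: 13.3, 4: 10.0}

total = sum(avg_freq.values())
est_popularity = {seg: f / total for seg, f in avg_freq.items()}

# The estimates form a proper probability distribution over segments,
# preserving the relative ordering of the averaged frequencies.
assert abs(sum(est_popularity.values()) - 1.0) < 1e-9
assert est_popularity[1] > est_popularity[2] > est_popularity[4]
```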
Server Load Optimization
An efficient P2P-VoD system will tend to minimize the upload bandwidth traffic of the server S in order to reduce the total operating cost. The upload bandwidth consumption of S depends on various factors; two important ones are: (1) the peer scheduling policy, and (2) the content replication strategy. We employ a practical peer scheduling strategy where a peer initially strives to locate and download data from other peers already in the system; only when other peers cannot supply the data due to a content/bandwidth bottleneck is the request redirected to S. A peer scheduling strategy involves two design issues: (1) a peer seeking to download needs to decide which peers to request data from (its suppliers); (2) a peer seeking to upload needs to decide which peer's request to fulfill (as a provider). Both decisions are made over a queue of requests, since there will be a list of candidate suppliers and providers and the choice needs to be made with priorities. We incorporate an intuitive approach of scheduling and assigning priority to requests based on node capacity, which is a function of the node's access bandwidth, processing power, disk speed, etc. This strategy ensures fair load sharing among heterogeneous nodes. Capacities can be calculated locally at each node, and the information is propagated to the decision-making peer by piggybacking on request messages.
Assume $P_i$ to be the set of peers that currently hold segment $s_i$ in buffer. Obviously, $\bigcup_i P_i = P$, the set of all peers, and $\sum_i |P_i| = b\,N$, where $b$ is the per-peer buffer size in segments and $N$ is the number of peers. The expected upload bandwidth consumption of server S can be expressed as follows:

$U = E\!\left[\sum_i u_s(i)\right] = \sum_i E\!\left[\max\!\left(\sum_{p \in P_i}\bigl(d - u_p(i)\bigr),\; 0\right)\right]$

where $u_s(i)$ is the server bandwidth consumption with respect to segment $s_i$ (in other words, it can be conceptualized as the number of segment downloads served by server S); $u_p(i)$ is the peer-assisted bandwidth throughput provided to peer $p$ by the other peers in the set $P_i$ who currently hold segment $s_i$ in buffer (in other words, the number of segments peer $p$ downloads from other peers in $P_i$); and $u_p(i)$ is bounded by the maximal upload bandwidth that all peers in $P_i$ can contribute to $s_i$. For solving the above problem, the notion of shortage bandwidth with respect to segment $s_i$ is defined as follows:

$B_i = \max\!\left(\sum_{p \in P_i}\bigl(d - u_p(i)\bigr),\; 0\right) = \max\!\left(|P_i|\,d - \sum_{p \in P_i} u_p(i),\; 0\right)$

and we denote

$W_i = E\!\left[B_i\right]$

where $W_i$ can be defined as the expected shortage bandwidth in the peer-set $P_i$, which is actually the gap between the demand bandwidth and the available bandwidth supported by the peer-set $P_i$. Thus, we can obtain the following:

$U = \sum_i W_i$
Our objective is to find an adaptation strategy such that the average upload bandwidth consumption $U$ of the server S is minimized. The shortage bandwidth can be efficiently calculated in an iterative way as follows:

$B_i^{(n)} = \begin{cases} d, & n = 1 \\ \max\!\left(n\,d - \sum_{k=1}^{n-1} u_k(i),\; 0\right), & n > 1 \end{cases}$

where the peers of $P_i$ are indexed in their order of arrival and $B_i^{(1)} = d$, since the first peer entering $P_i$ does not have any suppliers and has to download the segment from the server. Based on this framework, we can show the impact of the popularity index of each segment and of the shortage bandwidth on the server upload bandwidth consumption. To model the number of peers $|P_i|$ currently holding $s_i$ in buffer, we assume the Zipf-based popularity distribution as already defined before: with $M$ segments and skew constant $\alpha$, the probability that a request targets segment $s_i$ is

$q_i = \dfrac{1/i^{\alpha}}{\sum_{j=1}^{M} 1/j^{\alpha}}$

and we let the random variable $X_i$ denote the number of viewers of segment $s_i$ (i.e., the number of peers possessing $s_i$ in buffer) induced by $N$ peers requesting segments according to this law. Thus, the average upload bandwidth consumption of server S can be derived as follows:

$U = \sum_{i=1}^{M}\left\{\dfrac{1/i^{\alpha}}{\sum_{j=1}^{M} 1/j^{\alpha}} \sum_{n=1}^{N}\left[\Pr(X_i = n)\, B_i^{(n)}\right]\right\}$
Now, the model of server load optimization can be formulated as: minimize $U$ over the choice of update intervals. In general, it is difficult, and also not practical, to find a closed-form solution of this optimization problem, which would require global knowledge not inherently present in scalable P2P systems. Instead, we define the following practical and distributed solution, where the update interval $T_i$ estimated for segment $s_i$ is adapted based upon the shortage bandwidth proportionality ratio:

$T_i = T \cdot \dfrac{W_i}{\sum_{j} W_j}$

with $T$ a base update interval. We can estimate the denominator $\sum_j W_j$ by exploiting the distributed averaging algorithm as discussed before. This strategy of update adaptation based upon the shortage bandwidth proportionality ratio is found to produce good results, as shown later in the experimental evaluation.
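As a concrete illustration, the iterative shortage-bandwidth calculation and the resulting proportional update intervals might be sketched as follows; the peer bandwidths, the base interval, and all function names are illustrative assumptions, not the authors' implementation:

```python
def shortage_bandwidth(uploads, d):
    """B_i^(n) = max(n*d - sum(u_1..u_{n-1}), 0): peers indexed by arrival
    order, so the n-th arrival demands rate d while only the n-1 earlier
    arrivals can supply it; the first peer is served entirely by the server."""
    b, supply = 0.0, 0.0
    for n, u in enumerate(uploads, start=1):
        b = max(n * d - supply, 0.0)  # shortage after the n-th arrival
        supply += u                   # this peer can supply later arrivals
    return b

d = 500.0  # video data rate in Kbps
# Illustrative upload bandwidths (Kbps) of the peer sets holding two segments.
peer_sets = {1: [800.0, 600.0, 300.0], 2: [250.0, 300.0]}

W = {i: shortage_bandwidth(u, d) for i, u in peer_sets.items()}

# Update interval for each segment proportional to its shortage ratio,
# scaled by an assumed base interval of 60 seconds.
T = 60.0
intervals = {i: T * w / sum(W.values()) for i, w in W.items()}
```

Here the bandwidth-starved segment 2 ends up with the larger shortage (750 vs. 100 Kbps) and therefore the larger share of the interval budget.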
Search Cost-Server Load Joint Optimization
Now, we define the joint optimization problem of search cost and server load in a single function. Let $C$ denote the expected search cost derived in the previous section and $U$ the expected server upload bandwidth consumption derived above, both expressed in terms of the Zipf popularity fractions $\frac{1/i^{\alpha}}{\sum_{j=1}^{M} 1/j^{\alpha}}$, and the objective is:

$\min\ (C + U)$

We derive the joint optimization solution by a linear combination of their respective solutions as follows:

$T_i = T\left(\alpha_{SP}\,\dfrac{\hat{q}_i}{\sum_{j}\hat{q}_j} + \alpha_{SB}\,\dfrac{W_i}{\sum_{j} W_j}\right)$

where $\alpha_{SP}$ and $\alpha_{SB}$ are the respective weightage values and can be tuned; we study their implications in the experimental evaluation. Both denominators can be estimated using the distributed averaging algorithm described previously, piggybacked in the same communication message.
EXPERIMENTAL EVALUATION
We present extensive simulation results to validate our models and evaluate the performance of
our adaptation strategies with the help of different system properties. We implemented a discrete
event-driven simulator for various P2P operations in C++. All the P2P events such as media
playback, random jump, and peer join/leave/failure are simulated by events scheduled at
respective times. Chord (Stoica, Morris, Karger, Kaashoek, & Balakrishnan, 2001) is used as the
base DHT due to its simple construction and provable performance guarantees. The following
adaptation strategies are evaluated:
STATIC: No query adaptation strategy is employed;
SP: Query adaptation based upon the proportionality ratio of segment popularities,
$T_i = T \cdot \dfrac{\hat{q}_i}{\sum_j \hat{q}_j}$; i.e., $\alpha_{SP} = 1$, $\alpha_{SB} = 0$.
SB: Query adaptation based upon the shortage bandwidth proportionality ratio,
$T_i = T \cdot \dfrac{W_i}{\sum_j W_j}$; i.e., $\alpha_{SP} = 0$, $\alpha_{SB} = 1$.
SP-SB: Joint query adaptation based upon a linear combination of the popularity and shortage
bandwidth proportionality ratios,
$T_i = T\left(\alpha_{SP}\,\dfrac{\hat{q}_i}{\sum_j \hat{q}_j} + \alpha_{SB}\,\dfrac{W_i}{\sum_j W_j}\right)$; with $0 \le \alpha_{SP}, \alpha_{SB} \le 1$.
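Under the notation used above, the four interval-assignment rules could be sketched as follows; the base interval, the sample values, and all names are illustrative assumptions:

```python
def update_intervals(q, W, base, a_sp, a_sb):
    """Per-segment update interval from popularity estimates q and expected
    shortage bandwidths W (both dicts keyed by segment id)."""
    sum_q, sum_w = sum(q.values()), sum(W.values())
    return {i: base * (a_sp * q[i] / sum_q + a_sb * W[i] / sum_w) for i in q}

q = {1: 0.5, 2: 0.3, 3: 0.2}        # estimated segment popularities
W = {1: 100.0, 2: 300.0, 3: 600.0}  # expected shortage bandwidths (Kbps)
T = 60.0                            # assumed base interval in seconds

static = {i: T for i in q}                   # STATIC: one fixed interval
sp = update_intervals(q, W, T, 1.0, 0.0)     # SP: popularity ratio only
sb = update_intervals(q, W, T, 0.0, 1.0)     # SB: shortage ratio only
sp_sb = update_intervals(q, W, T, 0.4, 0.6)  # SP-SB: linear combination
```

Note how SP favors the popular segment 1 while SB favors the bandwidth-starved segment 3, and SP-SB interpolates between the two.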
Some details of the simulator are as follows. The underlying network topology, generated using GT-ITM (Zegura, Calvert, & Bhattacharjee, 1996), consists of 15 transit domains, each with 25 transit nodes; each transit node is connected to 10 stub domains, each with 15 stub nodes. We randomly place the server in a transit node and the peers in stub nodes. The latency of each link is computed in proportion to the Euclidean distance between the nodes. For each point in a plot, we repeated the placement and simulation 10 times to mitigate the effect of randomness. The number of peers (N) in the system is varied from 256 to 4096. We model the user arrival process as a Poisson process with a mean inter-arrival time of λ=1 sec. The peer lifetime is modeled as an exponential distribution with an expected mean of 30 mins. The peer upload bandwidth is randomly distributed between 250~1000 Kbps and the video data rate is d=500 Kbps. The user request pattern (i.e., the segment popularities) follows a Zipf distribution with different values of α. Each segment is 3.84 MB in size, which corresponds to one minute of video length. The total viewing length of the video stream is 128 mins and each simulation session is set to 2
hrs. Other parameters are: maximum node bandwidth 4 Mbps, server bandwidth 500 Mbps; k=5, b=4.
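A rough sketch of how this workload could be generated (a simplified Python stand-in for the C++ simulator; the seed and all names are illustrative):

```python
import random

random.seed(7)

N = 256                  # number of peers in the smallest configuration
MEAN_ARRIVAL = 1.0       # mean inter-arrival time in seconds
MEAN_LIFETIME = 30 * 60  # expected peer lifetime: 30 minutes
M = 128                  # number of one-minute segments
ALPHA = 1.0              # Zipf skew constant

# Zipf request probabilities q_i = (1/i^a) / sum_j (1/j^a).
norm = sum(1.0 / j ** ALPHA for j in range(1, M + 1))
q = [(1.0 / i ** ALPHA) / norm for i in range(1, M + 1)]

events = []
t = 0.0
for _ in range(N):
    t += random.expovariate(1.0 / MEAN_ARRIVAL)         # Poisson arrivals
    lifetime = random.expovariate(1.0 / MEAN_LIFETIME)  # exponential lifetime
    seg = random.choices(range(1, M + 1), weights=q)[0] # Zipf-drawn request
    events.append((t, t + lifetime, seg))
```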
Query Resolution Cost
The query resolution cost is measured by the number of lookup hops required for returning the result set. It is calculated as the average number of lookup hops over all queries initiated by all the peers in the system during a simulation session. It is an important P2P performance metric which helps control the query messaging overhead and also improves the chances of a request being served within its deadline. Figure 2 illustrates the performance of the different adaptation strategies with respect to the average number of lookup hops for various peer populations. The popularity skew is fixed at α=1.0 for this plot, with the remaining parameters at their default values. It can be observed that the adaptation strategies (SP/SB/SP-SB) provide considerable performance gains compared to the no-adaptation strategy (STATIC). Among the adaptation strategies, SP provides a slight improvement over SB, which is expected since SP is specifically geared to minimize the lookup cost; the joint adaptation technique (SP-SB) performs slightly better than all the other variants.
Figure 2: Plot of average number of lookup hops for different number of peers in the system.
Now, we study the effect of the skewness degree of the popularity distribution (i.e., variation of α) on the different strategies. The performance metric is kept the same (i.e., query resolution cost in terms of average lookup hops) and the skew degree α is varied from 1.0 to 5.0 for N=4096 peers, plotted in Figure 3. The no-adaptation strategy fares badly with increasing α, as expected. SP performs better than SB/SP-SB, which is expected since its adaptation is tuned in proportion to the estimated query popularities. SP also improves with increasing α, which indicates that the popularity estimation is able to capture the variation of skew.
Figure 3: Plot of lookup cost for different popularity skew degrees (Zipf constant α) with N=4096.
Request Rejection Ratio
The next performance metric for our experimental study is the request rejection ratio, defined as the ratio of the number of rejected requests to the total number of requests initiated by all the peers in the P2P system. Peers make various types of requests at different times, but we restrict ourselves to requests dealing with content/bandwidth and do not consider requests for control parameters, since they are not a major focus of this framework. Figure 4 plots the variation of the request rejection rate for different peer populations in the P2P system. The no-adaptation strategy generates the highest rejection rate, since the large number of popular queries traverse long query paths, which induces request rejections and high query overhead. The SP-SB and SB strategies give good performance since they adapt the query resolution based on popularity and available bandwidth. The increase in peer population does not seem to have a strong influence on the rejection rate, which also renders the strategies (SB/SP-SB) scalable.
Next, we study the variation of request rejection under popularity models with different skew ratios. Figure 5 plots the request rejection rates for the different adaptation strategies with α varying from 1.0 to 5.0. In accordance with the previous discussion, the STATIC strategy is not able to control the high request rejections in the system, which grow quite significantly with increasing α. It is not a scalable solution: in a system with 4096 peers, its rejection rate increases from 48.47% (α=1.0) to 60.39% (α=5.0), which is not a desirable property. The SP-SB strategy performs the best, and even with increasing α it is able to considerably reduce the request rejection rate. SB performs significantly better than SP, which suggests that available-bandwidth-based adaptation is better suited to minimizing the request rejection rate than popularity-based adaptation.
Figure 4: Plot of request rejection rate for different system size at α=1.0
Figure 5: Variation of request rejection rate with different popularity skew (α) for N=4096.
Server bandwidth consumption
We study the server load, or upload bandwidth consumption, one of the most important concerns for content providers, with respect to varying system parameters. Figure 6 illustrates the effect of the different adaptation strategies on the server bandwidth consumption for varying system sizes, with the popularity model kept constant at α=1.0. The no-adaptation strategy (STATIC) does not perform well and generates a steep increase of server load from 75.0932 server streams (N=256) to 557.6189 (N=4096). The shortage-bandwidth-based adaptation strategy (SB) performs the best, keeping the server load between 51.5297 (N=256) and 273.3985 (N=4096) and rendering it a scalable solution. Joint adaptation (SP-SB) also performs considerably better than SP, which suggests that request-popularity-based adaptation does not help reduce the server load to a significant extent; the SB and SP-SB strategies are therefore preferable for conserving server bandwidth.
Figure 6: Server stress for varying user population with α=1.0
Next, we study the effect of varying the segment popularity skew on the server load generated by the four variants of the adaptation technique, as depicted in Figure 7. Note that the range of the Y-axis values is magnified to fit the (min, max) range so that the properties of each curve can be studied at a finer granularity. We make the following observations: (1) STATIC increases consistently with higher values of α; the average rate of increase per interval (i.e., each increase of α by 0.5) is 4.08685 server streams, with the smallest increase in the α=[2.5→3.0] interval (value 1.7078); (2) SP performs better than STATIC but still generates a considerable server load, and its curve is somewhat invariant to changes of α, with an average value of 493.9939 server streams; (3) SP-SB performs better than the above two and is able to lower the server consumption to a considerable extent; its curve is initially invariant to α until 3.5, after which it consistently lowers the server load, so it can be concluded that this adaptation strategy performs well beyond α=3.5; (4) SB performs the best in terms of conserving server bandwidth, and a closer look at its curve shows that it consistently drops the server load until α=3.5, after which it stays at a roughly constant level (α>3.5). So it can be concluded that the best operating point for the SB strategy is α≤3.5.
Streaming Quality
Though all the above metrics are important from a P2P system performance point of view, streaming quality is a more relevant parameter from a user-centric perspective for improving the Quality of Experience (QoE). Streaming quality can be defined in various contexts; here we consider playback continuity, defined as the number of segments that are received within their deadlines and used for continuous playback, divided by the total number of segments that fit in the peer's lifetime. Figure 8 plots the result for different values of α from 1.0 to 5.0 with N=4096 peers. The general observations are: the no-adaptation (STATIC) strategy has the lowest playback continuity index; the performance of SP and SB are within comparable limits; SP-SB produces the best result with consistently high continuity. Note that the range of the Y-axis is made to fit the (min, max) range to get a closer look. Now, let us take a detailed look at each of the respective curves: (1) STATIC undergoes a consistent drop in continuity as α varies from 1.0 to 5.0, with an average index of 0.7551, meaning it is unable to download about 25% of segments within deadline due to content/bandwidth deficiency; (2) SP improves the continuity index from 0.8528 (α=1.0) to 0.8902 (α=2.5), but after α=2.5 it consistently drops to 0.7848 (α=5.0), so its performance is reasonable up to α=2.5; (3) SB has an initial drop of the continuity index from 0.8424 (α=1.0) to 0.7751 (α=3.0), but improves consistently in the later part up to 0.9031 (α=5.0), so its performance starts to improve past α=3.0; (4) SP-SB generates the best streaming quality among all four techniques and consistently improves the continuity from 0.9128 (α=1.0) to 0.9838 (α=3.0), after which it saturates and levels off without further improvement.
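The continuity index defined above can be computed per peer as in this small sketch; the timestamps, deadlines, and function name are illustrative assumptions:

```python
def continuity_index(receive_times, deadlines, lifetime_segments):
    """Fraction of segments received by their playback deadline out of
    all segments that fit in the peer's lifetime."""
    on_time = sum(1 for seg, t in receive_times.items()
                  if t <= deadlines.get(seg, float("inf")))
    return on_time / lifetime_segments

# A peer whose lifetime spans 4 segments; segment 3 misses its deadline.
receive = {1: 10.0, 2: 65.0, 3: 140.0, 4: 185.0}
deadline = {1: 60.0, 2: 120.0, 3: 135.0, 4: 240.0}
idx = continuity_index(receive, deadline, lifetime_segments=4)  # 3/4 = 0.75
```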
Figure 7: Server load for different popularity ratios (α) with N=4096.
Figure 8: Playback continuity index variation for different popularity ratios with N=4096.
Finally, we study the inter-relationship of the weights α_SP and α_SB in the joint adaptation strategy SP-SB and their effect on streaming quality for a fixed segment popularity distribution (α=5.0), illustrated in Figure 9. From the figure it can be observed that the optimal weight ratios are α_SP = 0.4 and α_SB = 0.6, where the strategy generates its highest continuity index of 0.9916. We have also experimented with different popularity distribution skews (i.e., different values of α), and the results follow a similar trend with similar optimal weights α_SP and α_SB. These values are obtained under our experimental assumptions with a synthetic workload pattern, and we do not claim that they are universally optimal. More informed values can be derived by experimenting with real network traces, and the operating point can be dynamically adjusted in real and dynamic environments.
Figure 9: Plot of streaming quality variation with different values of α_SP and α_SB for α=5.0
CONCLUSION
Query adaptation is important in the context of Temporal-DHT, especially when the popularity distributions are skewed. We formulated optimization problems addressing search cost and server load, two important performance parameters in the context of a P2P-VoD system, and derived practical optimized solutions that adapt the query resolution mechanism to popularity-skewed content distributions. The basic mechanism adapts the object update interval to minimize search cost and server load under dynamically changing content popularity distributions. We also showed distributed approaches for reliably estimating the adaptation parameter ratios. Simulation results demonstrated the effectiveness of the proposed techniques in improving various performance indicators such as search cost, request rejection rate, server bandwidth consumption, and streaming quality.
REFERENCES
Bhattacharya, A., Yang, Z., & Pan, D. (2011). Popularity Awareness in Temporal-DHT for P2P-
based Media Streaming Applications. IEEE International Symposium on Multimedia, (pp. 241-
248).
Bhattacharya, A., Yang, Z., & Zhang, S. (2010). Temporal DHT and its Application in P2P-VoD
Systems. IEEE International Symposium on Multimedia, (pp. 81-88).
Chu, Y., Rao, S., & Zhang, H. (2000). A case for end system multicast. Proceedings of the 2000
ACM SIGMETRICS international conference on Measurement and modeling of computer
systems, (pp. 1-12).
Cohen, E., & Shenker, S. (2002). Replication strategies in unstructured peer-to-peer networks.
Proceedings of the 2002 conference on Applications, technologies, architectures, and protocols
for computer communications, (pp. 177-190).
Cooper, B. (2005). An optimal overlay topology for routing peer-to-peer searches. Proceedings
of the ACM/IFIP/USENIX 6th international conference on Middleware, (pp. 82-101).
Dabek, F., Kaashoek, M. F., Karger, D., Morris, R., & Stoica, I. (2001). Wide-area cooperative
storage with CFS. Proceedings of the eighteenth ACM symposium on Operating systems
principles, (pp. 202-215).
Gopalakrishnan, V., Silaghi, B., Bhattacharjee, B., & Keleher, P. (2004). Adaptive replication in
peer-to-peer systems. Distributed Computing Systems, 2004. Proceedings. 24th International
Conference on, (pp. 360-369).
Hei, X. L., Liang, J., Liu, Y., & Ross, K. (2007). A Measurement Study of a Large-Scale P2P
IPTV System. IEEE Transactions on Multimedia , 1672 -1687.
Huang, C., Li, J., & Ross, K. W. (2007). Can internet video-on-demand be profitable?
Proceedings of the ACM SIGCOMM 2007 Conference on Applications, Technologies,
Architectures, and Protocols for Computer Communications, (pp. 133-144).
Suh, K., Diot, C., Kurose, J., Massoulie, L., Neumann, C., Towsley, D., et al. (2007).
Push-to-Peer Video-on-Demand System: Design and Evaluation. IEEE Journal on Selected
Areas in Communications, 1706-1716.
Liao, X., Jin, H., Liu, Y., Ni, L., & Deng, D. (2006). AnySee: Peer-to-Peer Live Streaming.
INFOCOM 2006. 25th IEEE International Conference on Computer Communications.
Proceedings, (pp. 1-10).
Qiu, D., & Srikant, R. (2004). Modeling and performance analysis of BitTorrent-like peer-to-
peer networks. Proceedings of the 2004 conference on Applications, technologies, architectures,
and protocols for computer communications, (pp. 367-378).
Ramasubramanian, V., & Sirer, E. G. (2004). Beehive: O(1) Lookup Performance for Power-Law
Query Distributions in Peer-to-Peer Overlays. USENIX NSDI.
Rao, W., Chen, L., Fu, A. W., & Wang, G. (2010). Optimal Resource Placement in Structured
Peer-to-Peer Networks. IEEE Transactions on Parallel and Distributed Systems , 1011-1026.
Ratnasamy, S., Francis, P., Handley, M., Karp, R., & Shenker, S. (2001). A scalable content-
addressable network. Proceedings of the 2001 conference on Applications, technologies,
architectures, and protocols for computer communications, (pp. 161-172).
Rowstron, A., & Druschel, P. (2001). Pastry: Scalable, Decentralized Object Location, and
Routing for Large-Scale Peer-to-Peer Systems. Proceedings of the IFIP/ACM International
Conference on Distributed Systems Platforms Heidelberg (pp. 329-350). Springer-Verlag.
Rowstron, A., & Druschel, P. (2001). Storage management and caching in PAST, a large-scale,
persistent peer-to-peer storage utility. Proceedings of the eighteenth ACM symposium on
Operating systems principles, (pp. 188-201).
Stoica, I., Morris, R., Karger, D., Kaashoek, M. F., & Balakrishnan, H. (2001). Chord: A
scalable peer-to-peer lookup service for internet applications. Proceedings of the 2001
conference on Applications, technologies, architectures, and protocols for computer
communications, (pp. 149-160).
Tan, B., & Massoulie, L. (2011). Optimal content placement for peer-to-peer video-on-demand
systems. INFOCOM, 2011 Proceedings IEEE, (pp. 694-702).
Tewari, S., & Kleinrock, L. (2006). Proportional Replication in Peer-to-Peer Networks.
INFOCOM 2006. 25th IEEE International Conference on Computer Communications.
Proceedings, (pp. 1-12).
Wu, J., & Li, B. (2009). Keep Cache Replacement Simple in Peer-Assisted VoD Systems.
INFOCOM 2009, IEEE, (pp. 2591-2595).
Wu, W., & Lui, J. C. (2011). Exploring the optimal replication strategy in P2P-VoD systems:
Characterization and evaluation. INFOCOM, 2011 Proceedings IEEE, (pp. 1206-1214).
Huang, Y., Fu, T. Z., Chiu, D. M., Lui, J., & Huang, C. (2008). Challenges, design and analysis of a
large-scale p2p-vod system. Proceedings of the ACM SIGCOMM 2008 conference on Data
communication, (pp. 375-388).
Yiu, W. P., Jin, X., & Chan, S. H. (2007). VMesh: Distributed Segment Storage for Peer-to-Peer
Interactive Video Streaming. IEEE Journal on Selected Areas in Communications, 1717-1731.
Zegura, E. W., Calvert, K. L., & Bhattacharjee, S. (1996). How to model an internetwork.
INFOCOM '96. Fifteenth Annual Joint Conference of the IEEE Computer Societies. Networking
the Next Generation. Proceedings IEEE, (pp. 594-602).
Zhang, X., Liu, J., Li, B., & Yum, T. S. (2005). CoolStreaming/DONet: a data-driven overlay
network for peer-to-peer live media streaming. INFOCOM 2005. 24th Annual Joint Conference
of the IEEE Computer and Communications Societies. Proceedings IEEE, (pp. 2102-2111).
Zhou, Y., Fu, T. Z., & Chiu, D. M. (2011). Statistical modeling and analysis of P2P replication to
support VoD service. INFOCOM, 2011 Proceedings IEEE, (pp. 945-953).