Are P2P Data-Dissemination Techniques Viable in Today's Data-Intensive Scientific Collaborations?
![Page 1: Are P2P Data-Dissemination Techniques Viable in Today's Data-Intensive Scientific Collaborations?](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813196550346895d9809eb/html5/thumbnails/1.jpg)
Are P2P Data-Dissemination Techniques Viable in Today's Data-Intensive Scientific Collaborations?
Samer Al-Kiswany – University of British Columbia
joint work with
Matei Ripeanu – University of British Columbia
Adriana Iamnitchi – University of South Florida
Sudharshan Vazhkudai – Oak Ridge National Laboratory
![Page 2: Are P2P Data-Dissemination Techniques Viable in Today's Data-Intensive Scientific Collaborations?](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813196550346895d9809eb/html5/thumbnails/2.jpg)
Introduction
Data-intensive science: large-scale simulations and new scientific instruments generate huge volumes of data (petabytes).
User communities: large and geographically dispersed.
Requirement: efficient data dissemination tools.
Samer Al-Kiswany EuroPar ‘07 /26
![Page 3: Are P2P Data-Dissemination Techniques Viable in Today's Data-Intensive Scientific Collaborations?](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813196550346895d9809eb/html5/thumbnails/3.jpg)
Introduction - Example
![Page 4: Are P2P Data-Dissemination Techniques Viable in Today's Data-Intensive Scientific Collaborations?](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813196550346895d9809eb/html5/thumbnails/4.jpg)
Question
What data dissemination strategies perform best in today's Grid deployments?
Data dissemination solutions: IP-Multicast, Bullet, BitTorrent, SPIDER, OMNI, ALMI, Logistical-Multicast, Narada, Scribe, GridoGrido, FastReplica… and many others.
![Page 5: Are P2P Data-Dissemination Techniques Viable in Today's Data-Intensive Scientific Collaborations?](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813196550346895d9809eb/html5/thumbnails/5.jpg)
Roadmap
Question: What data dissemination strategies perform best in today's Grid deployments?
• Workload characteristics
• Deployment platform characteristics
• Proposed data dissemination solutions
• Evaluation
• Recommendations
![Page 6: Are P2P Data-Dissemination Techniques Viable in Today's Data-Intensive Scientific Collaborations?](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813196550346895d9809eb/html5/thumbnails/6.jpg)
Workload and Deployment Platform
Data-intensive scientific collaboration characteristics:
• Scale of data: massive data collections (terabytes).
• Data usage: uniform popularity distributions and co-usage.
Deployment platform characteristics:
• Resource availability: low churn rate, high node availability, well-provisioned networks.
• Collaborative environments: no freeriding, so less effort is needed to enforce fair resource sharing.
![Page 7: Are P2P Data-Dissemination Techniques Viable in Today's Data-Intensive Scientific Collaborations?](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813196550346895d9809eb/html5/thumbnails/7.jpg)
![Page 8: Are P2P Data-Dissemination Techniques Viable in Today's Data-Intensive Scientific Collaborations?](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813196550346895d9809eb/html5/thumbnails/8.jpg)
Classification of Approaches

| Technique | Protocols |
| --- | --- |
| Tree-based techniques | ALM and SPIDER |
| Swarming | Bullet and BitTorrent |
| Techniques employing intermediate storage capabilities | Logistical Multicasting |

Base cases:
• IP-Multicast.
• Parallel transfers: separate data channels from the source to each destination.
![Page 9: Are P2P Data-Dissemination Techniques Viable in Today's Data-Intensive Scientific Collaborations?](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813196550346895d9809eb/html5/thumbnails/9.jpg)
Separate Transfer from the Source to every Destination
Drawbacks:
• Overwhelms the source – does not scale
• Generates high duplicate traffic at the links around the source
• Does not exploit all available transport capacity.
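A back-of-the-envelope model makes the scaling problem concrete. The sketch below is illustrative only: it assumes that the source's access link is the single bottleneck and that simultaneous transfers share it equally. The function name, the file size, and the link capacity are made-up values, not figures from the talk.

```python
def separate_transfer_time(file_size_mb, source_uplink_mb_per_s, n_destinations):
    """Seconds until all destinations finish, assuming the source uplink is the
    only bottleneck and is shared equally (an assumption of this sketch)."""
    if n_destinations <= 0:
        raise ValueError("need at least one destination")
    per_flow_rate = source_uplink_mb_per_s / n_destinations  # MB/s per destination
    return file_size_mb / per_flow_rate


if __name__ == "__main__":
    # Hypothetical numbers: a 1000 MB file and a 100 MB/s source uplink.
    for n in (1, 5, 20):
        print(n, "destinations:", separate_transfer_time(1000, 100, n), "s")
```

Under these assumptions the completion time grows linearly with the number of destinations, which is the sense in which the source is overwhelmed.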
![Page 10: Are P2P Data-Dissemination Techniques Viable in Today's Data-Intensive Scientific Collaborations?](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813196550346895d9809eb/html5/thumbnails/10.jpg)
IP Multicasting
[Figure: example network whose links are labeled with capacities of 10 and 5.]
![Page 11: Are P2P Data-Dissemination Techniques Viable in Today's Data-Intensive Scientific Collaborations?](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813196550346895d9809eb/html5/thumbnails/11.jpg)
IP Multicast
Drawbacks:
• Limited deployment
• Vulnerability to node failures
• Does not exploit all available transport capacity.
• Throughput limited by bottleneck link
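The bottleneck drawback can be sketched in a few lines. The example assumes single-rate multicast (every receiver is served at the same rate) over a hypothetical tree; the node names, the tree shape, and the capacities are illustrative assumptions rather than the topology from the talk.

```python
def path_bottleneck(parent, capacity, node):
    """Smallest link capacity on the path from the tree root down to `node`."""
    rate = float("inf")
    while node in parent:
        up = parent[node]
        rate = min(rate, capacity[(up, node)])
        node = up
    return rate


def single_rate_multicast(parent, capacity, receivers):
    """Rate of a single-rate multicast session: the slowest receiver path wins."""
    return min(path_bottleneck(parent, capacity, r) for r in receivers)


if __name__ == "__main__":
    # Hypothetical tree: S -> R1 -> {A, R2}, R2 -> B, with one slow link.
    parent = {"R1": "S", "A": "R1", "R2": "R1", "B": "R2"}
    capacity = {("S", "R1"): 10, ("R1", "A"): 10, ("R1", "R2"): 5, ("R2", "B"): 10}
    print(path_bottleneck(parent, capacity, "A"))             # A alone could get 10
    print(single_rate_multicast(parent, capacity, ["A", "B"]))  # the session runs at 5
```

In this sketch receiver A sits behind links of capacity 10 but is held to the rate of the slowest path in the session.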
![Page 12: Are P2P Data-Dissemination Techniques Viable in Today's Data-Intensive Scientific Collaborations?](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813196550346895d9809eb/html5/thumbnails/12.jpg)
Tree Based Techniques: Application Level Multicast (ALM)
[Figure: a network with a source and nodes 1 to 6, shown next to the ALM tree built over it and rooted at the source.]
![Page 13: Are P2P Data-Dissemination Techniques Viable in Today's Data-Intensive Scientific Collaborations?](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813196550346895d9809eb/html5/thumbnails/13.jpg)
Tree Based Techniques: Application Level Multicast (ALM)
Drawbacks:
• Vulnerability to node failures
• Does not exploit all possible routes in the network.
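The failure drawback follows directly from the tree structure: every node receives data only through its parent. The sketch below uses a hypothetical overlay tree (the parent/child assignments are an assumption, not read off the slide) and lists the nodes that stop receiving data when one interior node fails.

```python
def reachable(children, root, failed=frozenset()):
    """Nodes that still receive data from `root` when the `failed` nodes are down."""
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        if node in failed or node in seen:
            continue
        seen.add(node)
        stack.extend(children.get(node, []))
    return seen


def cut_off(children, root, failed_node):
    """Healthy nodes that lose the stream because `failed_node` went down."""
    everyone = reachable(children, root)
    survivors = reachable(children, root, failed={failed_node})
    return everyone - survivors - {failed_node}


# Hypothetical ALM tree: the source feeds nodes 1 and 5, which feed the leaves.
ALM_TREE = {"source": [1, 5], 1: [6, 3], 5: [2, 4]}

if __name__ == "__main__":
    print(cut_off(ALM_TREE, "source", 1))  # interior node: its whole subtree is cut off
    print(cut_off(ALM_TREE, "source", 6))  # leaf node: nobody else is affected
```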
![Page 14: Are P2P Data-Dissemination Techniques Viable in Today's Data-Intensive Scientific Collaborations?](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813196550346895d9809eb/html5/thumbnails/14.jpg)
Swarming Techniques: BitTorrent and Bullet
[Figure: the complete file is split into blocks 1 to 4, which are distributed among the peers.]
![Page 15: Are P2P Data-Dissemination Techniques Viable in Today's Data-Intensive Scientific Collaborations?](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813196550346895d9809eb/html5/thumbnails/15.jpg)
![Page 16: Are P2P Data-Dissemination Techniques Viable in Today's Data-Intensive Scientific Collaborations?](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813196550346895d9809eb/html5/thumbnails/16.jpg)
Swarming Techniques: BitTorrent and Bullet
Drawbacks:
• Generates high duplicate traffic.
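The basic idea of swarming, peers re-uploading the blocks they already hold, can be illustrated with a toy round-based simulation. This is a sketch under strong assumptions and is not BitTorrent or Bullet: each node uploads at most one block per round, each peer downloads at most one block per round, and senders have perfect knowledge of what every peer holds and pick the rarest missing block. Because of that perfect knowledge, the sketch does not reproduce the duplicate traffic listed above; it only shows how load moves away from the source.

```python
def simulate(n_blocks, n_peers, peers_upload=True):
    """Return (rounds, blocks_uploaded_by_source) until every peer has every block."""
    # Node 0 is the source and starts with all blocks; peers start empty.
    holdings = [set(range(n_blocks))] + [set() for _ in range(n_peers)]
    rounds = 0
    source_uploads = 0
    while any(len(h) < n_blocks for h in holdings[1:]):
        rounds += 1
        # A node may only upload blocks it held when the round began.
        start = [set(h) for h in holdings]
        busy = set()  # peers that already received a block this round
        senders = range(len(holdings)) if peers_upload else [0]
        for s in senders:
            best = None
            for r in range(1, len(holdings)):
                if r == s or r in busy:
                    continue
                for b in start[s] - holdings[r]:
                    copies = sum(b in h for h in holdings)
                    candidate = (copies, r, b)  # rarest block first, ties by index
                    if best is None or candidate < best:
                        best = candidate
            if best is not None:
                _, r, b = best
                holdings[r].add(b)
                busy.add(r)
                if s == 0:
                    source_uploads += 1
    return rounds, source_uploads


if __name__ == "__main__":
    print("source only:", simulate(4, 3, peers_upload=False))
    print("swarming:   ", simulate(4, 3, peers_upload=True))
```

With `peers_upload=False` the source alone serves every block to every peer, which corresponds to the separate-transfers base case under the same one-upload-per-round assumption.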
![Page 17: Are P2P Data-Dissemination Techniques Viable in Today's Data-Intensive Scientific Collaborations?](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813196550346895d9809eb/html5/thumbnails/17.jpg)
Logistical Multicasting
![Page 18: Are P2P Data-Dissemination Techniques Viable in Today's Data-Intensive Scientific Collaborations?](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813196550346895d9809eb/html5/thumbnails/18.jpg)
Roadmap
Question: What data dissemination strategies perform best in today's Grid deployments?
• Workload characteristics
• Deployment platform characteristics
• Proposed data dissemination solutions
• Evaluation
• Recommendations
Evaluation approaches: analytical modeling, implementation, simulation.
![Page 19: Are P2P Data-Dissemination Techniques Viable in Today's Data-Intensive Scientific Collaborations?](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813196550346895d9809eb/html5/thumbnails/19.jpg)
Methodology
Simulator design:
• Block-level simulation.
• Simulates physical-layer link contention.
Inputs:
• Real topologies of three deployed Grid testbeds: LCG, GridPP, EGEE.
• Generated topologies: 100 (using BRITE).
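To give a feel for what simulating link contention at block level involves, here is a minimal sketch. It is not the simulator used in this work: the slides do not state the sharing model, so the sketch assumes each link divides its capacity equally among the flows crossing it and that a flow runs at the rate of its most contended link. All names and numbers are hypothetical.

```python
from collections import Counter


def flow_rates(flows, capacity):
    """Rate of each flow under equal per-link sharing (an assumed model).

    flows:    {flow name: list of physical links on its path}
    capacity: {link: capacity in MB/s}
    """
    load = Counter(link for path in flows.values() for link in path)
    return {
        name: min(capacity[link] / load[link] for link in path)
        for name, path in flows.items()
    }


def block_transfer_time(block_mb, rate_mb_per_s):
    """Seconds needed to move one block at the given rate."""
    return block_mb / rate_mb_per_s


if __name__ == "__main__":
    capacity = {"core": 10, "a": 10, "b": 2}
    flows = {"f1": ["core", "a"], "f2": ["core", "b"]}
    for name, rate in flow_rates(flows, capacity).items():
        print(name, rate, "MB/s;", block_transfer_time(1.0, rate), "s per 1 MB block")
```

Equal sharing is a simplification: flow f1 does not pick up the core bandwidth that f2 cannot use, which a max-min fair model would allow.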
![Page 20: Are P2P Data-Dissemination Techniques Viable in Today's Data-Intensive Scientific Collaborations?](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813196550346895d9809eb/html5/thumbnails/20.jpg)
Methodology

| Success criteria | Metrics |
| --- | --- |
| Dissemination time | Transfer time |
| Overhead | MB × hop |
| Load balancing | Volume of in/out data |
| Fairness | Link stress |
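Two of these metrics can be computed directly from a list of transfers. The sketch below assumes that overhead is the data volume multiplied by the number of physical hops it crosses, and that link stress is the number of flows sharing a physical link; the link names and sizes are made up for illustration.

```python
from collections import Counter


def overhead_mb_hops(transfers):
    """Total traffic in MB x hop: each transfer counts once per link it crosses."""
    return sum(size_mb * len(path) for size_mb, path in transfers)


def link_stress(transfers):
    """Number of flows crossing each physical link."""
    return Counter(link for _, path in transfers for link in path)


if __name__ == "__main__":
    # Hypothetical example: two 100 MB transfers that share the link "s-r".
    transfers = [
        (100, ["s-r", "r-a"]),
        (100, ["s-r", "r-b"]),
    ]
    print(overhead_mb_hops(transfers))   # total MB x hop
    print(dict(link_stress(transfers)))  # flows per link
```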
![Page 21: Are P2P Data-Dissemination Techniques Viable in Today's Data-Intensive Scientific Collaborations?](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813196550346895d9809eb/html5/thumbnails/21.jpg)
Transfer Time
Number of destinations that have completed the file transfer for the original EGEE topology.
[Figure: number of completed transfers (0 to 20) versus time (in units of 10 s, 0 to 19) for Bullet, separate transfers, ALM, IP-Multicast, Logistical MT, and BitTorrent.]
![Page 22: Are P2P Data-Dissemination Techniques Viable in Today's Data-Intensive Scientific Collaborations?](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813196550346895d9809eb/html5/thumbnails/22.jpg)
Transfer Time – With reduced core-link bandwidth
Number of destinations that have completed the file transfer for the EGEE topology with core bandwidth reduced to 1/8 of the original.
[Figure: number of completed transfers (0 to 20) versus time (in units of 10 s, 0 to 30) for Bullet, separate transfers, ALM, IP-Multicast, Logistical MT, and BitTorrent.]
Conclusions:
• On well-provisioned topologies, even naïve algorithms perform well.
• On constrained topologies, application-level techniques perform uniformly well: they are among the first to finish the transfer and show good intermediate progress.
![Page 23: Are P2P Data-Dissemination Techniques Viable in Today's Data-Intensive Scientific Collaborations?](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813196550346895d9809eb/html5/thumbnails/23.jpg)
Protocol Overhead – Metric Definition
[Figure: example transfer of a block (1) in which link traffic is labeled as useful or duplicate.]
![Page 24: Are P2P Data-Dissemination Techniques Viable in Today's Data-Intensive Scientific Collaborations?](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813196550346895d9809eb/html5/thumbnails/24.jpg)
Protocol Overhead
Overhead of each protocol on EGEE Topology.
[Figure: bar chart of total traffic volume (GB, 0 to 100), split into useful and duplicate traffic, for Bullet, BitTorrent, IP-Multicast, ALM, and separate transfers.]
Conclusion:
Application-level techniques generate significant overhead: up to 4 times more than IP-layer solutions.
Reasons:
• Dissemination decisions are based on application-level metrics.
• Node location in the topology is ignored.
![Page 25: Are P2P Data-Dissemination Techniques Viable in Today's Data-Intensive Scientific Collaborations?](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813196550346895d9809eb/html5/thumbnails/25.jpg)
Fairness
Link stress distribution for the EGEE topology. For BitTorrent and Bullet the plot presents maximum link stress.
[Figure: number of flows (0 to 30) versus link rank (0 to 60, links ranked by maximum number of flows) for Bullet Max, BitTorrent Max, and ALM.]
Conclusion:
Application‑level solutions have a considerable impact on competing traffic.
![Page 26: Are P2P Data-Dissemination Techniques Viable in Today's Data-Intensive Scientific Collaborations?](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813196550346895d9809eb/html5/thumbnails/26.jpg)
Summary
Motivating question: What data dissemination strategies perform best in today's Grid deployments?
In this project, we simulated representative solutions, taking into account the characteristics of the workload and of the deployed platforms.
Our results provide guidelines for selecting a data dissemination technique depending on:
• Target environment.
• Overall system workload characteristics.
• Success criteria.
![Page 27: Are P2P Data-Dissemination Techniques Viable in Today's Data-Intensive Scientific Collaborations?](https://reader035.fdocuments.net/reader035/viewer/2022062321/56813196550346895d9809eb/html5/thumbnails/27.jpg)
Thank you
www.ece.ubc.ca/~samera