1 An Adaptive File Distribution Algorithm for Wide Area Network Takashi Hoshino, Kenjiro Taura,...
-
Upload
rosamund-curtis -
Category
Documents
-
view
214 -
download
1
Transcript of 1 An Adaptive File Distribution Algorithm for Wide Area Network Takashi Hoshino, Kenjiro Taura,...
1
An Adaptive File Distribution Algorithmfor Wide Area Network
Takashi Hoshino, Kenjiro Taura, Takashi ChikayamaUniversity of Tokyo
2
Background
New environments for parallel and distributed computationClusters, cluster of clusters, GRID
Offer scalability and good cost performanceSetting up computation in such
environments is complex, howeverInstall programs/data
3
Setting up computation in DS
Often involves copying large programs/data to many nodes
Manually copying large files is troublesome because:faults occur easilyfirewalls block (some) connectionstransfers must be scheduled carefully for good
performance
4
Contribution
NetSyncA file replicator optimized for copying large data
to many nodes in parallel (application-level)Features
Automatic load-balancingscalability
Self-stabilizing construction of transfer route fault-tolerant
Adaptive optimization of transfer routeNo reliance on physical topology information
5
Outline
What are efficient/inefficient transfer routes?DemoAlgorithm
Base algorithmAdaptive optimization
ImplementationExperimentsRelated workSummary and future work
6
Inefficient Transfer Routes
Many inter-subnet/cluster transfer connections
Many branches
Node
Subnet/cluster
Data transfer line
7
What’s Wrong with Branches?
Branchesshare hardware capability of nodes themselves
CPU powerDisk performanceNIC ability
enlarge possibilities of bottleneck
CPU
NIC
DISK
CPU
NIC
DISK
No bottleneck Bottleneck
One child Three children
100Mbps x1 33Mbps x3
8
Efficient Transfer Route
Minimum inter-subnet/cluster transfer connections
No or minimum branches
Node
Subnet/cluster
Data transfer line
9
Demo
Playback of our experiment using logs
A00
A01
B00 A07
A06
A05
A04
A03
A02
B07
B06
B05
B04
B03
B02
B01
CXX NodeData flow (Parent-Child)
A00
A01
B00
A07
A06
A05
A04
A03
A02
B07
B06
B05
B04
B03
B02
B01
11
Algorithm
Simple base algorithmFault-tolerance, scalability, self-stabilization
Add-on adaptive optimization heuristicsWell-adapted today’s typical network
Very easy configurationOnly need information of (some) neighborsNeed no physical topologyNeed no performance measurement
Pseudo-code is described in our paper
12
Base Algorithm (1)
Each node seeks a node to be its parentPipeline transfer in whole nodesFault leads to seeking new parent again
100%
0%
0%
0%
0%
25%
25%
50%
25%
25%
50%
75%
50%75%
50%
75%
100%
75%
100% 10
0%
100%
14
Base Algorithm (2)
Child (has not its parent) sidesend ASK to candidate to be its parentrecv OKstart getting datarecv NGseeks candidate again
Parent (received ask message) siderecv ASK from a node if my offset > node’s offset and # of children < LIMIT_CHILDREN then send OK and start putting data else send NG end
15
Adaptive Optimization
Two heuristicsNearParentTree2List
16
NearParent Heuristics
NearParent: reduce "long" connectionsEach node changes its parent to a closer node
parent candidate
self self
candidateparent
17
Tree2List Heuristics
Tree2List: reduce branchesIf the current parent is not closer than one of its
siblings X, change its parent to XA node which has more than one children
suggests its children to change their parent to one of their siblings
selfX
parent
X
parent
self
18
How to measure closeness?
FeaturesThroughputLatencyPrefix of IP address
A
BC
19
Property of Heuristics (1)
Assuming there is no firewall…1. Minimum inter-cluster/subnet connections2. All nodes connect each other as a list
subnet/cluster
20
Property of Heuristics (2)
If firewall blocks some connections…1. Minimum inter-cluster/subnet connections2. N – 1 branches for N subnets (assume
no firewalls inside a subnet)
subnet/cluster
Firewall
23
Property of Heuristics (3)
If multiple levels of groups exist (subnets, clusters), it optimizes all levels simultaneouslyMinimum inter-subnet edgesMinimum inter-cluster edges
subnet cluster
24
Implementation
File replicator for a large data and many nodes written in Java
Ability of detecting latency: about 1ms
Usage:Install and run NetSync in all nodes
Throw a file information to several nodes
Wait for finishing the replication
Very simple usage!!!
25
Experiments
Measure performance of our heuristics
Distributed a file to many nodesCompared completion time
EnvironmentsA single clusterMultiple clusters
26
Experiment in a single cluster (1)
Distributed 500MB from one node to other16nodes in the cluster
Only NIC (100Mbps) can be bottleneck
Compared two settingsRandom Tree
Only using base algorithmLimited # of children from 1 to 5.
Tree2ListNearParent has no effect
27
Experiment in a single cluster (2) Fewer children, better performance Tree2List is very close to optimal Limit 1 is not scalable (using our base algorithm)
28
Experiment in multiple clusters (1)
Distributed 300MB to over 150 nodes in seven clusters
Heuristics on, off, and fixed manually optimized tree
1G
100M
100M
1G100M100M
1G
100M
100M100M
100M
100M
100M
29
Experiment in multiple clusters (2)
Our heuristics is close to the ideal fixed tree
30
Related Work
Application-level MulticastOvercast[Jannotti], ALMI[Pendarakis], etc.Aims to optimize bandwidth and latency
Content Distribution Network (CDN)Has roots in HTTP accelerator and HTTP proxy.Aims to optimize latency and load-balancing.
Our approachMaximize throughput, even if sacrificing latency
31
Summary and Future Work
We designed a simple algorithmfor copying large data to many nodes in parallelwith fault-tolerance, scalability, self-
organization, and adaptive optimizationEvaluations show our implementation is
effective in real environment
Future WorkIntegration with searching for contents, or
storage systems for distributed computing