1 An Adaptive File Distribution Algorithm for Wide Area Network Takashi Hoshino, Kenjiro Taura,...

1

An Adaptive File Distribution Algorithmfor Wide Area Network

Takashi Hoshino, Kenjiro Taura, Takashi ChikayamaUniversity of Tokyo

2

Background

New environments for parallel and distributed computationClusters, cluster of clusters, GRID

Offer scalability and good cost performanceSetting up computation in such

environments is complex, howeverInstall programs/data

3

Setting up computation in DS

Often involves copying large programs/data to many nodes

Manually copying large files is troublesome because:faults occur easilyfirewalls block (some) connectionstransfers must be scheduled carefully for good

performance

4

Contribution

NetSyncA file replicator optimized for copying large data

to many nodes in parallel (application-level)Features

Automatic load-balancingscalability

Self-stabilizing construction of transfer route fault-tolerant

Adaptive optimization of transfer routeNo reliance on physical topology information

5

Outline

What are efficient/inefficient transfer routes?DemoAlgorithm

Base algorithmAdaptive optimization

ImplementationExperimentsRelated workSummary and future work

6

Inefficient Transfer Routes

Many inter-subnet/cluster transfer connections

Many branches

Node

Subnet/cluster

Data transfer line

7

What’s Wrong with Branches?

Branchesshare hardware capability of nodes themselves

CPU powerDisk performanceNIC ability

enlarge possibilities of bottleneck

CPU

NIC

DISK

CPU

NIC

DISK

No bottleneck Bottleneck

One child Three children

100Mbps x1 33Mbps x3

8

Efficient Transfer Route

Minimum inter-subnet/cluster transfer connections

No or minimum branches

Node

Subnet/cluster

Data transfer line

9

Demo

Playback of our experiment using logs

A00

A01

B00 A07

A06

A05

A04

A03

A02

B07

B06

B05

B04

B03

B02

B01

CXX NodeData flow (Parent-Child)

A00

A01

B00

A07

A06

A05

A04

A03

A02

B07

B06

B05

B04

B03

B02

B01

11

Algorithm

Simple base algorithmFault-tolerance, scalability, self-stabilization

Add-on adaptive optimization heuristicsWell-adapted today’s typical network

Very easy configurationOnly need information of (some) neighborsNeed no physical topologyNeed no performance measurement

Pseudo-code is described in our paper

12

Base Algorithm (1)

Each node seeks a node to be its parentPipeline transfer in whole nodesFault leads to seeking new parent again

100%

0%

0%

0%

0%

25%

25%

50%

25%

25%

50%

75%

50%75%

50%

75%

100%

75%

100% 10

0%

100%

14

Base Algorithm (2)

Child (has not its parent) sidesend ASK to candidate to be its parentrecv OKstart getting datarecv NGseeks candidate again

Parent (received ask message) siderecv ASK from a node if my offset > node’s offset and # of children < LIMIT_CHILDREN then send OK and start putting data else send NG end

15

Adaptive Optimization

Two heuristicsNearParentTree2List

16

NearParent Heuristics

NearParent: reduce "long" connectionsEach node changes its parent to a closer node

parent candidate

self self

candidateparent

17

Tree2List Heuristics

Tree2List: reduce branchesIf the current parent is not closer than one of its

siblings X, change its parent to XA node which has more than one children

suggests its children to change their parent to one of their siblings

selfX

parent

X

parent

self

18

How to measure closeness?

FeaturesThroughputLatencyPrefix of IP address

A

BC

19

Property of Heuristics (1)

Assuming there is no firewall…1. Minimum inter-cluster/subnet connections2. All nodes connect each other as a list

subnet/cluster

20


If firewall blocks some connections…1. Minimum inter-cluster/subnet connections2. N – 1 branches for N subnets (assume

no firewalls inside a subnet)

subnet/cluster

Firewall

23


If multiple levels of groups exist (subnets, clusters), it optimizes all levels simultaneouslyMinimum inter-subnet edgesMinimum inter-cluster edges

subnet cluster

24

Implementation

File replicator for a large data and many nodes written in Java

Ability of detecting latency: about 1ms

Usage:Install and run NetSync in all nodes

Throw a file information to several nodes

Wait for finishing the replication

Very simple usage!!!

25

Experiments

Measure performance of our heuristics

Distributed a file to many nodesCompared completion time

EnvironmentsA single clusterMultiple clusters

26

Experiment in a single cluster (1)

Distributed 500MB from one node to other16nodes in the cluster

Only NIC (100Mbps) can be bottleneck

Compared two settingsRandom Tree

Only using base algorithmLimited # of children from 1 to 5.

Tree2ListNearParent has no effect

27

Experiment in a single cluster (2) Fewer children, better performance Tree2List is very close to optimal Limit 1 is not scalable (using our base algorithm)

28

Experiment in multiple clusters (1)

Distributed 300MB to over 150 nodes in seven clusters

Heuristics on, off, and fixed manually optimized tree

1G

100M

100M

1G100M100M

1G

100M

100M100M

100M

100M

100M

29

Experiment in multiple clusters (2)

Our heuristics is close to the ideal fixed tree

30

Related Work

Application-level MulticastOvercast[Jannotti], ALMI[Pendarakis], etc.Aims to optimize bandwidth and latency

Content Distribution Network (CDN)Has roots in HTTP accelerator and HTTP proxy.Aims to optimize latency and load-balancing.

Our approachMaximize throughput, even if sacrificing latency

31

Summary and Future Work

We designed a simple algorithmfor copying large data to many nodes in parallelwith fault-tolerance, scalability, self-

organization, and adaptive optimizationEvaluations show our implementation is

effective in real environment

Future WorkIntegration with searching for contents, or

storage systems for distributed computing

1 An Adaptive File Distribution Algorithm for Wide Area Network Takashi Hoshino, Kenjiro Taura,...

Documents

Transcript of 1 An Adaptive File Distribution Algorithm for Wide Area Network Takashi Hoshino, Kenjiro Taura,...