An Adaptive File Distribution Algorithm for Wide Area Network

Takashi Hoshino, Kenjiro Taura, Takashi Chikayama
University of Tokyo

Background

- New environments for parallel and distributed computation
  - Clusters, clusters of clusters, Grid
  - Offer scalability and good cost performance
- Setting up computation in such environments is complex, however
  - Install programs/data

Setting up computation in distributed systems

- Often involves copying large programs/data to many nodes
- Manually copying large files is troublesome because:
  - faults occur easily
  - firewalls block (some) connections
  - transfers must be scheduled carefully for good performance

Contribution

- NetSync
  - A file replicator optimized for copying large data to many nodes in parallel (application-level)
- Features
  - Automatic load balancing, for scalability
  - Self-stabilizing construction of the transfer route, for fault tolerance
  - Adaptive optimization of the transfer route
    - No reliance on physical topology information

Outline

- What are efficient/inefficient transfer routes?
- Demo
- Algorithm
  - Base algorithm
  - Adaptive optimization
- Implementation
- Experiments
- Related work
- Summary and future work

Inefficient Transfer Routes

- Many inter-subnet/cluster transfer connections
- Many branches

[Figure legend: node, subnet/cluster, data transfer line]

What’s Wrong with Branches?

- Branches
  - share the hardware capacity of the node itself
    - CPU power
    - Disk performance
    - NIC capacity
  - increase the chance of a bottleneck

[Figure: a node (CPU, NIC, disk) with one child sends at 100 Mbps x 1 with no bottleneck; with three children it sends at 33 Mbps x 3 and becomes a bottleneck]
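
The 100 Mbps x 1 versus 33 Mbps x 3 numbers follow from dividing one NIC among the children. A minimal sketch of that arithmetic, assuming the NIC is the only bottleneck and is shared equally; the function name `per_child_rate_mbps` is hypothetical and not part of NetSync:

```python
def per_child_rate_mbps(nic_mbps: float, children: int) -> float:
    """Rate each child receives when one NIC is shared equally (assumed model)."""
    if children < 1:
        raise ValueError("a sending node needs at least one child")
    return nic_mbps / children


# One child gets the full NIC rate; three children get about a third each.
one_child = per_child_rate_mbps(100, 1)
three_children = per_child_rate_mbps(100, 3)
```

Under this model, every extra branch lowers the rate seen by all children of that node, which is why a route with few branches is preferred.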

Efficient Transfer Route

- Minimum inter-subnet/cluster transfer connections
- No or minimum branches

[Figure legend: node, subnet/cluster, data transfer line]

Demo

Playback of our experiment using logs

[Figure: two frames of the playback showing nodes A00 to A07 and B00 to B07. Legend: CXX = node, arrow = data flow (parent to child)]

Algorithm

- Simple base algorithm
  - Fault tolerance, scalability, self-stabilization
- Add-on adaptive optimization heuristics
  - Well adapted to today's typical networks
- Very easy configuration
  - Only needs information about (some) neighbors
  - Needs no physical topology
  - Needs no performance measurement
- Pseudo-code is described in our paper

Base Algorithm (1)

- Each node seeks a node to be its parent
- Data is transferred in a pipeline through all nodes
- A fault makes the node seek a new parent again

[Figure: animation frames showing each node's progress (0%, 25%, 50%, 75%, 100%) as data is pipelined from the source node]
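
A toy simulation of pipelined transfer along a chain of nodes. It assumes the file is split into equal chunks and every link moves one chunk per tick; these assumptions and all names are illustrative, since the slides do not describe NetSync's chunking:

```python
def simulate_pipeline(num_nodes: int, num_chunks: int) -> int:
    """Return the number of ticks until every node in a chain holds all chunks.

    Node 0 is the source. Node i receives one chunk per tick from node i-1
    whenever node i-1 is ahead of it.
    """
    offsets = [num_chunks] + [0] * (num_nodes - 1)
    ticks = 0
    while any(offset < num_chunks for offset in offsets):
        previous = offsets[:]  # all links transfer simultaneously within a tick
        for i in range(1, num_nodes):
            if previous[i - 1] > previous[i]:
                offsets[i] = previous[i] + 1
        ticks += 1
    return ticks


# Five nodes and four chunks (25% each), as in the progress figure.
ticks_needed = simulate_pipeline(5, 4)
```

In this model a chain of n nodes finishes in `num_chunks + n - 2` ticks, so adding nodes to the chain costs only one extra tick each rather than a full extra transfer.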

Base Algorithm (2)

Child side (a node that has no parent)
    send ASK to a candidate parent
    recv OK: start getting data
    recv NG: seek a candidate again

Parent side (a node that received an ASK message)
    recv ASK from a node
    if my offset > node's offset and # of children < LIMIT_CHILDREN then
        send OK and start putting data
    else
        send NG
    end
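
The ASK/OK/NG exchange can be sketched in Python as in-process method calls. This is only an illustration under stated assumptions: the `Node` class, its method names, and the value of `LIMIT_CHILDREN` are hypothetical, and a real implementation would exchange these messages over the network:

```python
LIMIT_CHILDREN = 2  # assumed value; the slides do not give one


class Node:
    def __init__(self, name, offset=0):
        self.name = name
        self.offset = offset  # amount of the file this node already holds
        self.parent = None
        self.children = []

    def handle_ask(self, asker):
        """Parent side: accept only if ahead of the asker and below the child limit."""
        if self.offset > asker.offset and len(self.children) < LIMIT_CHILDREN:
            self.children.append(asker)
            return "OK"
        return "NG"

    def seek_parent(self, candidates):
        """Child side: ask candidates in turn until one answers OK."""
        for candidate in candidates:
            if candidate is not self and candidate.handle_ask(self) == "OK":
                self.parent = candidate
                return candidate
        return None  # every candidate answered NG; the caller should retry later
```

The offset comparison means a node can only attach to a node that already holds more data than it does, which also rules out a node attaching to one of its own descendants.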

Adaptive Optimization

- Two heuristics
  - NearParent
  - Tree2List

NearParent Heuristics

- NearParent: reduce "long" connections
  - Each node changes its parent to a closer node

[Figure: before and after. Self switches its parent from the current parent to a closer candidate]
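
A minimal sketch of the NearParent decision, assuming each node can evaluate some distance to its current parent and to other candidates. The function `near_parent` and the latency table are hypothetical, and the sketch leaves out the ASK handshake that the candidate would still have to accept:

```python
def near_parent(parent, candidates, distance):
    """Return the closest candidate if it is strictly closer than the
    current parent; otherwise keep the current parent."""
    best = min(candidates, key=distance, default=parent)
    return best if distance(best) < distance(parent) else parent


# Example with made-up latencies (ms) measured from the node itself.
latency_ms = {"P": 20.0, "C1": 1.0, "C2": 5.0}
new_parent = near_parent("P", ["C1", "C2"], latency_ms.get)
```

Requiring a strictly smaller distance keeps a node from switching back and forth between two equally close parents.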

Tree2List Heuristics

- Tree2List: reduce branches
  - If a node's current parent is not closer than one of the node's siblings X, the node changes its parent to X
  - A node that has more than one child suggests that its children change their parent to one of their siblings

[Figure: before and after. Self and X are both children of the same parent; afterwards self is a child of X]
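
A sketch of how repeated Tree2List moves can turn a branching tree into a list. It assumes the tree is held in a single dictionary mapping each parent to its children and that moves are applied one at a time; the real protocol is distributed, and the names `tree2list_step` and `flatten` are hypothetical:

```python
def tree2list_step(children_of, parent, distance):
    """Apply at most one Tree2List move below `parent`.

    A child moves under a sibling X when its current parent is not closer
    than X. Returns (child, X) if a move happened, otherwise None.
    """
    kids = children_of.get(parent, [])
    if len(kids) < 2:
        return None
    for child in kids:
        for x in kids:
            if x != child and distance(child, parent) >= distance(child, x):
                kids.remove(child)
                children_of.setdefault(x, []).append(child)
                return child, x
    return None


def flatten(children_of, distance):
    """Repeat Tree2List moves until no node can move any more."""
    moved = True
    while moved:
        moved = False
        for parent in list(children_of):
            if tree2list_step(children_of, parent, distance) is not None:
                moved = True
    return children_of


# Inside one cluster all nodes are assumed equally close (distance 1).
tree = flatten({"root": ["a", "b", "c"]}, lambda u, v: 1)
```

Each move pushes a node one level deeper and never places it under its own descendant, so the loop terminates; with uniform distances the result is a single chain.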

How to measure closeness?

- Features
  - Throughput
  - Latency
  - Prefix of IP address

[Figure: nodes A, B, and C]
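
Of the three features, the IP address prefix is the easiest to illustrate without measurements. The sketch below compares the length of the shared address prefix using Python's standard `ipaddress` module. How NetSync actually combines throughput, latency, and prefix is not stated on the slide, so the tie-breaking rule and the function names are assumptions:

```python
import ipaddress


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits shared by two IPv4 addresses (0 to 32)."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()


def closer_by_prefix(self_ip: str, ip1: str, ip2: str) -> str:
    """Return whichever of ip1 and ip2 shares the longer prefix with self_ip.

    Ties go to ip1 (an arbitrary choice for this sketch).
    """
    if common_prefix_len(self_ip, ip2) > common_prefix_len(self_ip, ip1):
        return ip2
    return ip1
```

A longer shared prefix usually, though not always, indicates the same subnet, which is why it can serve as a cheap closeness hint that needs no traffic at all.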

Property of Heuristics (1)

Assuming there is no firewall…
1. Minimum inter-cluster/subnet connections
2. All nodes are connected to each other as a list

[Figure legend: subnet/cluster]
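
To make the first property concrete, the sketch below counts how many links in a list-shaped route cross a subnet boundary. It assumes node names whose first letter identifies the subnet (in the style of A00 or B00); that naming convention and the function name are assumptions for illustration only:

```python
def inter_subnet_edges(chain, subnet_of):
    """Count consecutive pairs in a list-shaped route that cross subnets."""
    return sum(
        1 for a, b in zip(chain, chain[1:]) if subnet_of(a) != subnet_of(b)
    )


def first_letter(name):
    # Assumed convention: the first letter of a node name is its subnet.
    return name[0]


grouped = ["A00", "A01", "B00", "B01"]
interleaved = ["A00", "B00", "A01", "B01"]
grouped_crossings = inter_subnet_edges(grouped, first_letter)
interleaved_crossings = inter_subnet_edges(interleaved, first_letter)
```

A list that keeps each subnet's nodes together crosses between two subnets once, while an interleaved list crosses on every link.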

Property of Heuristics (2)

If a firewall blocks some connections…
1. Minimum inter-cluster/subnet connections
2. N - 1 branches for N subnets (assuming no firewalls inside a subnet)

[Figure legend: subnet/cluster, firewall]

Property of Heuristics (3)

- If multiple levels of groups exist (subnets, clusters), it optimizes all levels simultaneously
  - Minimum inter-subnet edges
  - Minimum inter-cluster edges

[Figure legend: subnet, cluster]

Implementation

- A file replicator for large data and many nodes, written in Java
- Latency detection granularity: about 1 ms
- Usage:
  - Install and run NetSync on all nodes
  - Send the file information to several nodes
  - Wait for the replication to finish

Very simple usage!!!

Experiments

- Measure performance of our heuristics
  - Distributed a file to many nodes
  - Compared completion time
- Environments
  - A single cluster
  - Multiple clusters

Experiment in a single cluster (1)

- Distributed 500 MB from one node to the other 16 nodes in the cluster
- Only the NIC (100 Mbps) can be a bottleneck
- Compared two settings
  - Random Tree
    - Only using the base algorithm
    - Limited the number of children from 1 to 5
  - Tree2List
    - NearParent has no effect

Experiment in a single cluster (2)

- Fewer children, better performance
- Tree2List is very close to optimal
- Limit 1 is not scalable (using our base algorithm)
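
As a rough point of reference for "optimal", the time needed to stream the 500 MB file through a 100 Mbps NIC can be estimated. This is a back-of-envelope model, not a measured result: it assumes 1 MB = 10^6 bytes, ignores protocol overhead and pipeline fill time, and assumes every sending node splits its NIC equally among its children. The function name is hypothetical:

```python
def ideal_transfer_seconds(size_mb: float, nic_mbps: float, children: int = 1) -> float:
    """Estimated time to stream a file to each child when one NIC is
    shared equally among `children` (assumed model)."""
    megabits = size_mb * 8
    return megabits * children / nic_mbps


# 500 MB over a 100 Mbps NIC, for child limits 1 to 5.
estimates = {k: ideal_transfer_seconds(500, 100, k) for k in range(1, 6)}
```

Under these assumptions the estimate grows linearly with the number of children per node, which matches the qualitative observation that fewer children give better performance.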

Experiment in multiple clusters (1)

- Distributed 300 MB to over 150 nodes in seven clusters
- Compared heuristics on, heuristics off, and a fixed, manually optimized tree

[Figure: network diagram of the clusters, with links labeled 1G or 100M]

Experiment in multiple clusters (2)

Our heuristics perform close to the ideal fixed tree

Related Work

- Application-level multicast
  - Overcast [Jannotti], ALMI [Pendarakis], etc.
  - Aims to optimize bandwidth and latency
- Content Distribution Network (CDN)
  - Has roots in HTTP accelerators and HTTP proxies
  - Aims to optimize latency and load balancing
- Our approach
  - Maximize throughput, even at the cost of latency

Summary and Future Work

- We designed a simple algorithm
  - for copying large data to many nodes in parallel
  - with fault tolerance, scalability, self-organization, and adaptive optimization
- Evaluations show our implementation is effective in a real environment
- Future Work
  - Integration with content search, or with storage systems for distributed computing