Decentralizing Grids


Transcript of Decentralizing Grids

Page 1: Decentralizing Grids

Decentralizing Grids

Jon Weissman, University of Minnesota

E-Science Institute, Nov. 8, 2007

Page 2: Decentralizing Grids

Roadmap

• Background
• The problem space
• Some early solutions
• Research frontier/opportunities
• Wrap-up

Page 3: Decentralizing Grids

Background

• Grids are distributed … but also centralized
  – Condor, Globus, BOINC, Grid Services, VOs
  – Why? client-server based
• Centralization pros
  – Security, policy, global resource management
• Decentralization pros
  – Reliability, dynamic, flexible, scalable
  – **Fertile CS research frontier**

Page 4: Decentralizing Grids

Challenges

• May have to live within the Grid ecosystem
  – Condor, Globus, Grid services, VOs, etc.
  – First-principles approaches are risky (Legion)
• 50K-foot view
  – How to decentralize Grids yet retain their existing features?
  – High performance, workflows, performance prediction, etc.

Page 5: Decentralizing Grids

Decentralized Grid platform

• Minimal assumptions about each “node”• Nodes have associated “assets” (A)

– basic: CPU, memory, disk, etc.– complex: application services– exposed interface to assets: OS, Condor, BOINC, Web service

• Nodes may up or down• Node trust is not a given (do X, does Y instead)• Nodes may connect to other nodes or not• Nodes may be aggregates• Grid may be large > 100K nodes, scalability is key
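A minimal sketch of this node model, with field names invented purely for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One overlay node (illustrative fields only).  Assets range
    from basic (CPU, memory, disk) to complex (application services);
    'interface' records how the assets are exposed."""
    node_id: str
    assets: dict = field(default_factory=dict)    # e.g. {"cpus": 8, "mem_gb": 16}
    interface: str = "OS"                         # "OS" | "Condor" | "BOINC" | "WS"
    up: bool = True                               # nodes may be up or down
    trusted: bool = False                         # trust is not a given
    neighbors: list = field(default_factory=list) # may be empty, or an aggregate
```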

Page 6: Decentralizing Grids

Grid Overlay

[Diagram: a Grid overlay spanning a Grid service, raw OS services, a Condor network, and a BOINC network]

Page 7: Decentralizing Grids

Grid Overlay - Join

[Diagram: the overlay as a node joins (Grid service, raw OS services, Condor network, BOINC network)]

Page 8: Decentralizing Grids

Grid Overlay - Departure

[Diagram: the overlay as a node departs (Grid service, raw OS services, Condor network, BOINC network)]

Page 9: Decentralizing Grids

Routing = Discovery

Query contains sufficient information to locate a node: RSL, ClassAd, etc. Exact match or semantic match.

[Diagram: a “discover A” query routed through the overlay]
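As a toy illustration of exact matching, with invented attribute names (real queries would be expressed in RSL or as ClassAds):

```python
def matches(query: dict, assets: dict) -> bool:
    """Exact match: every attribute the query names must be present,
    with numeric constraints treated as minimum requirements."""
    for attr, need in query.items():
        have = assets.get(attr)
        if have is None:
            return False
        if isinstance(need, (int, float)):
            if have < need:
                return False
        elif have != need:
            return False
    return True

# A node forwards the query until some node's assets match:
query = {"cpus": 4, "mem_gb": 8, "service": "BLAST"}
print(matches(query, {"cpus": 8, "mem_gb": 16, "service": "BLAST"}))  # True
```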

Page 10: Decentralizing Grids

Routing = Discovery

[Diagram: the query reaches a node offering A – bingo!]

Page 11: Decentralizing Grids

Routing = Discovery

The discovered node returns a handle sufficient for the “client” to interact with it – perform service invocation, job/data transmission, etc.

Page 12: Decentralizing Grids

Routing = Discovery

• Three parties
  – initiator of discovery events for A
  – client: invocation, health of A
  – node offering A
• Often the initiator and the client will be the same
• Other times the client will be determined dynamically
  – if W is a web service and results are returned to a calling client, we want to locate CW near W =>
  – discover W, then CW!

Page 13: Decentralizing Grids

Routing = Discovery

[Diagram: a “discover A” query encounters a failed node (X)]

Page 14: Decentralizing Grids

Routing = Discovery

Page 15: Decentralizing Grids

Routing = Discovery

[Diagram: the query reaches a node offering A – bingo!]

Page 16: Decentralizing Grids

Routing = Discovery

Page 17: Decentralizing Grids

Routing = Discovery

[Diagram: discovery initiated by an outside client]

Page 18: Decentralizing Grids

Routing = Discovery

[Diagram: a query to discover multiple A’s]

Page 19: Decentralizing Grids

Routing = Discovery

Page 20: Decentralizing Grids

Grid Overlay

• This generalizes …
  – Resource query (query contains job requirements)
  – Looks like decentralized “matchmaking”
• These are the easy cases …
  – independent simple queries
    • find a CPU with characteristics x, y, z
    • find 100 CPUs, each with x, y, z
  – suppose queries are complex or related?
    • find N CPUs with aggregate power = G Gflops
    • locate an asset near a prior discovered asset

Page 21: Decentralizing Grids

Grid Scenarios

• Grid applications are more challenging
  – Application has a more complex structure: multi-task, parallel/distributed, control/data dependencies
    • an individual job/task needs a resource near a data source
    • workflow
    • queries are not independent
  – Metrics are collective
    • not simply raw throughput
    • makespan
    • response
    • QoS

Page 22: Decentralizing Grids

Related Work

• Maryland/Purdue
  – matchmaking
• Oregon – CCOF
  – time-zone CAN

Page 23: Decentralizing Grids

Related Work (cont’d)

• None of these approaches addresses the Grid scenarios (in a decentralized manner)
  – Complex multi-task data/control dependencies
  – Collective metrics

Page 24: Decentralizing Grids

50K Ft Research Issues

• Overlay Architecture
  – structured, unstructured, hybrid
  – what is the right architecture?
• Decentralized control/data dependencies
  – how to do it?
• Reliability
  – how to achieve it?
• Collective metrics
  – how to achieve them?

Page 25: Decentralizing Grids

Context: Application Model

[Diagram: an application graph that produces an answer; legend: = component (service, request, job, task, …), = data source]

Page 26: Decentralizing Grids

Context: Application Models

• Reliability
• Collective metrics
• Data dependence
• Control dependence

Page 27: Decentralizing Grids

Context: Environment

• RIDGE project – ridge.cs.umn.edu
  – reliable infrastructure for donation grid environments
• Live deployment on PlanetLab – planet-lab.org
  – 700 nodes spanning 335 sites and 35 countries
  – emulators and simulators
• Applications
  – BLAST
  – Traffic planning
  – Image comparison

Page 28: Decentralizing Grids

Application Models

• Reliability
• Collective metrics
• Data dependence
• Control dependence

Page 29: Decentralizing Grids

Reliability Example

[Diagram: workflow graph with components B, C, D, E, G]

Page 30: Decentralizing Grids

Reliability Example

[Diagram: workflow graph; a client CG is responsible for G’s health]

Page 31: Decentralizing Grids

Reliability Example

[Diagram: workflow graph; the pair “G, loc(CG)” is propagated]

Page 32: Decentralizing Grids

Reliability Example

[Diagram: workflow graph; could also discover G first, then CG]

Page 33: Decentralizing Grids

Reliability Example

[Diagram: workflow graph; G fails (X)]

Page 34: Decentralizing Grids

Reliability Example

[Diagram: workflow graph; CG issues a new discovery for G …]

Page 35: Decentralizing Grids

Reliability Example

[Diagram: workflow graph; a replacement node for G is in place]

Page 36: Decentralizing Grids

Client Replication

[Diagram: workflow graph with components B, C, D, E, G]

Page 37: Decentralizing Grids

Client Replication

[Diagram: replicated clients CG1 and CG2; loc(G), loc(CG1), loc(CG2) propagated]

Page 38: Decentralizing Grids

Client Replication

[Diagram: CG1 fails (X); client “hand-off” depends on the nature of G and the interaction]

Page 39: Decentralizing Grids

Component Replication

[Diagram: workflow graph with components B, C, D, E, G]

Page 40: Decentralizing Grids

Component Replication

[Diagram: G replicated as G1 and G2, tracked by CG]

Page 41: Decentralizing Grids

Replication Research

• Nodes are unreliable
  – crash, hacked, churn, malicious, slow, etc.
• How many replicas?
  – too many – waste of resources
  – too few – application suffers

Page 42: Decentralizing Grids

System Model

[Diagram: nodes annotated with reputation ratings (0.9, 0.4, 0.8, 0.8, 0.7, 0.8, 0.8, 0.7, 0.4, 0.3)]

• Reputation rating ri
  – degree of node reliability
• Dynamically size the redundancy based on ri
• Nodes are not connected and check in to a central server
• Note: variable-sized groups

Page 43: Decentralizing Grids

Reputation-based Scheduling

• Reputation rating
  – Techniques for estimating reliability based on past interactions
• Reputation-based scheduling algorithms
  – Using reliabilities for allocating work
  – Relies on a success-threshold parameter
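A minimal sketch of one way to estimate a rating from past interactions; the sliding window is an assumed design choice (it also fits the window-based rating mentioned in the EXTRAS slides):

```python
from collections import deque

class ReputationRating:
    """Reliability r_i estimated as the fraction of verified-correct
    results over the last `window` interactions with node i; the
    window lets the rating track changing (non-stationary) behavior."""
    def __init__(self, window: int = 50):
        self.history = deque(maxlen=window)

    def record(self, success: bool) -> None:
        self.history.append(1 if success else 0)

    def rating(self) -> float:
        if not self.history:
            return 0.5  # neutral prior for a node with no history
        return sum(self.history) / len(self.history)
```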

Page 44: Decentralizing Grids

Algorithm Space

• How many replicas?– first-, best-fit, random, fixed, …– algorithms compute how many replicas to meet a

success threshold

• How to reach consensus?– M-first (better for timeliness)– Majority (better for byzantine threats)
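A sketch of first-fit redundancy sizing under M-first consensus with M = 1, assuming independent failures and ratings r_i read as success probabilities:

```python
def size_replica_group(rated_nodes, threshold):
    """First-fit: keep adding workers until the probability that at
    least one returns a correct result meets the success threshold.
    rated_nodes: iterable of (node_id, r_i) pairs."""
    group, p_all_fail = [], 1.0
    for node, r in rated_nodes:
        group.append(node)
        p_all_fail *= (1.0 - r)
        if 1.0 - p_all_fail >= threshold:
            break
    return group

# With ratings like those in the system-model figure:
print(size_replica_group([("a", 0.9), ("b", 0.8), ("c", 0.7)], 0.99))
# -> ['a', 'b', 'c']   (success probability 0.9 -> 0.98 -> 0.994)
```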

Page 45: Decentralizing Grids

Experimental Results: correctness

This was a simulation based on Byzantine behavior … majority voting.

Page 46: Decentralizing Grids

Experimental Results: timeliness

M-first (M=1), best BOINC (BOINC*), conservative (BOINC-) vs. RIDGE

Page 47: Decentralizing Grids

Next steps

• Nodes are decentralized, but trust management is not!
• Need a peer-based trust-exchange framework
  – Stanford’s EigenTrust project: local exchange until the network converges to a global state

Page 48: Decentralizing Grids

Application Models

• Reliability
• Collective metrics
• Data dependence
• Control dependence

Page 49: Decentralizing Grids

Collective Metrics

BLAST

• Throughput is not always the best metric
• Response, completion time, application-centric
  – makespan
  – response

Page 50: Decentralizing Grids

Communication Makespan

• Nodes download data from replicated data nodes
  – Nodes choose “data servers” independently (decentralized)
  – Minimize the maximum download time across all worker nodes (communication makespan)

[Diagram: scenario where data download dominates]
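In code, the collective metric is simply the slowest worker; a trivial sketch with an invented dict layout:

```python
def communication_makespan(download_time: dict) -> float:
    """download_time maps each worker to the seconds it took to fetch
    its input from the server it chose independently; the makespan is
    the maximum, so one bad independent choice penalizes everyone."""
    return max(download_time.values())
```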

Page 51: Decentralizing Grids

Data node selection

• Several possible factors
  – Proximity (RTT)
  – Network bandwidth
  – Server capacity

[Plot: Download Time vs. RTT – linear]
[Plot: Download Time vs. Bandwidth – exponential]
[Plot: Mean Download Time / RTT – Time (msec), 0–1000, vs. Concurrency (1, 3, 5, 7, 10), for sites flux, tamu, venus, ksu, ubc, wroc]

Page 52: Decentralizing Grids

Heuristic Ranking Function

• Query to get candidates; RTT/bandwidth probes
• Node i, data server node j
  – Cost function = rtt_{i,j} * exp(k_j / bw_{i,j}), where k_j captures load/capacity
• Least-cost data node selected independently
• Three server-selection heuristics that use k_j (a sketch follows this list)
  – BW-ONLY: k_j = 1
  – BW-LOAD: k_j = n-minute average load (past)
  – BW-CAND: k_j = # of candidate responses in the last m seconds (~ future load)
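A sketch of the ranking function, assuming probed RTT and bandwidth values are already in hand; the tuple layout and names are illustrative:

```python
import math

def cost(rtt_ij: float, bw_ij: float, k_j: float) -> float:
    """rtt_{i,j} * exp(k_j / bw_{i,j}): penalizes distant servers
    linearly and loaded / low-bandwidth servers exponentially.
    k_j = 1 (BW-ONLY), a recent average load (BW-LOAD), or a recent
    candidate-response count (BW-CAND)."""
    return rtt_ij * math.exp(k_j / bw_ij)

def select_server(candidates):
    """Each worker picks its least-cost data server independently.
    candidates: list of (server, rtt, bw, k) tuples."""
    return min(candidates, key=lambda c: cost(c[1], c[2], c[3]))[0]
```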

Page 53: Decentralizing Grids

Performance Comparison

Page 54: Decentralizing Grids

Computational Makespan

[Diagram: scenario where compute dominates: BLAST]

Page 55: Decentralizing Grids

Computational Makespan

[Plot: variable-sized vs. equal-sized* groups]

Page 56: Decentralizing Grids

Next Steps

• Other makespan scenarios
• Eliminate probes for bandwidth and RTT -> estimation
• Richer collective metrics
  – deadlines: user-in-the-loop

Page 57: Decentralizing Grids

Application Models

• Reliability
• Collective metrics
• Data dependence
• Control dependence


Page 59: Decentralizing Grids

Data Dependence

• A data-dependent component needs access to one or more data sources
  – data may be large

[Diagram: a “discover A” query with an associated data source]

Page 60: Decentralizing Grids

Data Dependence (cont’d)

[Diagram: a “discover A” query – where to run the component?]

Page 61: Decentralizing Grids

The Problem

• Where to run a data-dependent component?
  – determine a candidate set
  – select a candidate
• It is unlikely that a candidate knows its downstream bandwidth from particular data nodes
• Idea: infer bandwidth from neighbor observations with respect to the data nodes!

Page 62: Decentralizing Grids

Estimation Technique

• C1 may have had little past interaction with the data source
  – … but its neighbors may have
• For each neighbor, generate a download estimate:
  – DT: the neighbor’s prior download time from the data source
  – RTT: from the candidate and from the neighbor to the data source, respectively
  – DP: average weighted measure of prior download times for any node to any data source

[Diagram: candidates C1 and C2, their neighbors, and a data source]

Page 63: Decentralizing Grids

Estimation Technique (cont’d)

• Download Power (DP) characterizes the download capability of a node
  – DP = average(DT * RTT)
  – DT alone is not enough (far-away vs. nearby data source)
• An estimate is associated with each neighbor ni
  – ElapsedEst[ni] = α ∙ β ∙ DT
    • α : my_RTT / neighbor_RTT (to the data source)
    • β : neighbor_DP / my_DP
    • no active probes: historical data, RTT inference
• Combining neighbor estimates
  – mean, median, min, …
  – median worked the best
• Take the min over all candidate estimates
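A sketch of the estimator under the definitions above (α = my_RTT/neighbor_RTT, β = neighbor_DP/my_DP); variable names are invented:

```python
import statistics

def estimate_download(my_rtt, my_dp, neighbor_obs):
    """Estimate a candidate's download time from a data source using
    only its neighbors' history (no active probes).
    neighbor_obs: list of (dt, neighbor_rtt, neighbor_dp), where dt
    is that neighbor's prior download time from the source."""
    per_neighbor = [
        (my_rtt / n_rtt) * (n_dp / my_dp) * dt   # alpha * beta * DT
        for dt, n_rtt, n_dp in neighbor_obs
    ]
    return statistics.median(per_neighbor)        # median combined best

def place_component(candidates):
    """candidates: list of (node, estimate); take the min estimate."""
    return min(candidates, key=lambda c: c[1])[0]
```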

Page 64: Decentralizing Grids

Comparison of Candidate Selection Heuristics

[Plot: Impact of Neighbor Size (Mix, N=8, Trial=50k) – Mean Elapsed Time (sec), 0–55, vs. Neighbor Size (2, 4, 8, 16, 32); series: OMNI, RANDOM, PROXIM, SELF, NEIGHBOR]

SELF uses direct observations

Page 65: Decentralizing Grids

Take Away

• Next steps
  – routing to the best candidates
• Locality between a data source and a component
  – scalable, no probing needed
  – many uses

Page 66: Decentralizing Grids

Application Models

• Reliability
• Collective metrics
• Data dependence
• Control dependence

Page 67: Decentralizing Grids

The Problem

• How to enable decentralized control?
  – propagate downstream graph stages
  – perform distributed synchronization
• Idea (a toy sketch follows below):
  – distributed dataflow – token matching
  – graph forwarding, futures (Mentat project)
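A toy sketch of token matching for the example on the next slides; the token format mirrors the {E, B*C*D} tokens there, while the class and method names are invented:

```python
from collections import defaultdict

class TokenMatcher:
    """Fires a downstream component once tokens for all of its
    operands have arrived, e.g. E fires on B*C*D."""
    def __init__(self):
        self.arrived = defaultdict(dict)   # dest -> {operand: loc(output)}

    def deliver(self, dest, operands, name, loc):
        self.arrived[dest][name] = loc
        if set(self.arrived[dest]) == set(operands):
            inputs = self.arrived.pop(dest)
            print(f"fire {dest} with inputs {inputs}")  # invoke component

m = TokenMatcher()
for name in ("B", "C", "D"):                 # tokens {E, B*C*D}
    m.deliver("E", ("B", "C", "D"), name, f"loc(S{name})")
# -> fire E with inputs {'B': 'loc(SB)', 'C': 'loc(SC)', 'D': 'loc(SD)'}
```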

Page 68: Decentralizing Grids

Control Example

[Diagram: workflow graph with components B, C, D, E, G; a control node performs token matching]

Page 69: Decentralizing Grids

Simple Example

[Diagram: workflow graph with components B, C, D, E, G]

Page 70: Decentralizing Grids

Control Example

[Diagram: tokens {E, B*C*D} (three copies) and {C, G}, {D, G} propagated through the graph]

Page 71: Decentralizing Grids

Control Example

[Diagram: workflow graph; tokens {E, B*C*D} in flight toward the matcher]

Page 72: Decentralizing Grids

Control Example

[Diagram: tokens {E, B*C*D, loc(SB)}, {E, B*C*D, loc(SC)}, {E, B*C*D, loc(SD)}]

Output is stored at loc(…) – where the component is run, or at the client, or at a storage node

Page 73: Decentralizing Grids

Control Example

[Diagram: animation frame of token routing]


Page 76: Decentralizing Grids

Control Example

[Diagram: animation frame; B’s token arrives at E]


Page 78: Decentralizing Grids

Control Example

[Diagram: workflow graph with tokens in flight]

How do we color and route tokens so that they arrive at the same control node?

Page 79: Decentralizing Grids

Open Problems

• Support for Global Operations
  – troubleshooting – what happened?
  – monitoring – application progress?
  – cleanup – application died, clean up state
• Load balance across different applications
  – routing to guarantee dispersion

Page 80: Decentralizing Grids

Summary

• Decentralizing Grids is a challenging problem
• Re-think systems, algorithms, protocols, and middleware => fertile research
• Keep our “eye on the ball”
  – reliability, scalability, and maintaining performance
• Some preliminary progress on “point solutions”

Page 81: Decentralizing Grids

My visit

• Looking to apply some of these ideas to existing UK projects via collaboration
• Current and potential projects
  – Decentralized dataflow (Adam Barker)
  – Decentralized applications: haplotype analysis (Andrea Christoforou, Mike Baker)
  – Decentralized control: openKnowledge (Dave Robertson)
• Goal – improve the reliability and scalability of applications and/or infrastructures

Page 82: Decentralizing Grids

Questions

Page 84: Decentralizing Grids

EXTRAS

Page 85: Decentralizing Grids

Non-stationarity

• Nodes may suddenly shift gears
  – deliberately malicious, virus, detach/rejoin
  – the underlying reliability distribution changes
• Solution
  – window-based rating
  – adapt/learn target
• Experiment: blackout at round 300 (30% affected)

Page 86: Decentralizing Grids

Adapting …

Page 87: Decentralizing Grids

Adaptive Algorithm

[Plot: success rate and throughput]

Page 88: Decentralizing Grids

[Plot: success rate and throughput]

Page 89: Decentralizing Grids

Scheduling Algorithms

Page 90: Decentralizing Grids

Estimation Accuracy

• Download Elapsed Time Ratio (x-axis) is the ratio of estimated to measured time
  – ‘1’ means perfect estimation
• Accept if the estimate is within measured ± (measured * error)
  – error = 0.33: 67% of the total are accepted
  – error = 0.50: 83% of the total are accepted
• Objects: 27 (0.5 MB – 2 MB)
• Nodes: 130 on PlanetLab
• Downloads: 15,000, each from a randomly chosen node

Page 91: Decentralizing Grids

Impact of Churn

• Jinoh – mean over what?

[Plot: Comparison of Elapsed Time (Candidate=8, Neighbor=8) – Ratio to Omniscient, 2.5–6.5, vs. Query, 0–50,000; series: Without Churn, Churn 0.1%, Churn 0.5%, Churn 1.0%, Global(Prox) mean, Random mean]

Page 92: Decentralizing Grids

Estimating RTT

• We use distance = √(RTT + 1)
• Simple RTT inference technique based on the triangle inequality
• Triangle inequality: Latency(a,c) ≤ Latency(a,b) + Latency(b,c)
  – |Latency(a,b) − Latency(b,c)| ≤ Latency(a,c) ≤ Latency(a,b) + Latency(b,c)
• Pick the intersected area as the range, and take the mean

[Diagram: ranges inferred via Neighbor A, Neighbor B, and Neighbor C; lower and upper bounds intersected to give the final inferred RTT range]

Page 93: Decentralizing Grids

RTT Inference Result

• More neighbors, greater accuracy
• With 5 neighbors, 85% of the total have < 16% error

[Plot: Inferred Latency Difference CDF – F(x), 0–1, vs. Latency Difference (= |Inferred − Measured|), −150 to 200; series: N=2, N=3, N=5, N=10]

Page 94: Decentralizing Grids

Other Constraints

[Diagram: graph with components A, B, C, D, E; tokens {E, B*C*D} (three copies), {C, A, dep-CD}, {D, A, dep-CD}]

C & D interact and should be co-allocated, nearby … Tokens in bold should route to the same control point so that a collective query for C & D can be issued

Page 95: Decentralizing Grids

Support for Global Operations

• Troubleshooting – what happened?
• Monitoring – application progress?
• Cleanup – application died, clean up state
• Solution mechanism: propagate control-node IPs back to the origin (=> origin IP piggybacked)
• Control nodes and matcher nodes report progress (or lack thereof, via timeouts) to the origin
• Load balance across different applications

Page 96: Decentralizing Grids

Other Constraints

[Diagram: graph with components A, B, C, D, E; tokens {E, B*C*D} (three copies), {C, A}, {D, A}]

C & D interact and they should be co-allocated, nearby …

Page 97: Decentralizing Grids

Combining Neighbors’ Estimation

• MEDIAN shows the best results – using 3 neighbors, the error is within 50% 88% of the time (variation in download times is a factor of 10–20)
• 3 neighbors gives the greatest bang

[Plot: Acceptance 50% – Accepted Rate, 0.83–0.91, vs. Neighbor Size, 0–30; series: RANDOM, CLOSEST, MEAN, MEDIAN, RANK, WMEAN, TRMEAN]

Page 98: Decentralizing Grids

Effect of Candidate Size

[Plot: Impact of Candidate Size (Mix, N=8, Trial=25k) – Mean Elapsed Time (sec), 0–120, vs. Candidate Size (2, 4, 8, 16, 32, ALL); series: OMNI, RANDOM, PROXIM, SELF, NEIGHBOR]

Page 99: Decentralizing Grids

Performance Comparison

Parameters: data size 2 MB; replication 10; candidates 5

Page 100: Decentralizing Grids

Computation Makespan (cont’d)

• Now bring in reliability … makespan improvement scales well

[Plot: x-axis # components]

Page 101: Decentralizing Grids

Token loss

• Between B and the matcher; between the matcher and the next stage
  – the matcher must notify CB when B’s token arrives (pass loc(CB) with B’s token)
  – the destination (E) must notify CB when the token arrives (pass loc(CB) with B’s token)

[Diagram: tokens B and E at the matcher CD]

Page 102: Decentralizing Grids

RTT Inference

• ≥ 90–95% of Internet paths obey the triangle inequality
  – RTT(a, c) ≤ RTT(a, b) + RTT(b, c)
  – RTT(server, c) ≤ RTT(server, ni) + RTT(ni, c) – the upper bound
  – lower bound: |RTT(server, ni) − RTT(ni, c)|
• Iterate over all neighbors to get the max L and min U
• Return the midpoint
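A sketch of the midpoint computation, assuming measured legs through each neighbor ni:

```python
def infer_rtt(legs):
    """legs: list of (rtt_server_to_ni, rtt_ni_to_client) pairs.
    Each neighbor bounds RTT(server, client) via the triangle
    inequality; intersect the ranges (max L, min U) and return
    the midpoint."""
    lower = max(abs(a - b) for a, b in legs)
    upper = min(a + b for a, b in legs)
    return (lower + upper) / 2.0

print(infer_rtt([(40, 30), (80, 55), (20, 45)]))
# bounds: L = max(10, 25, 25) = 25, U = min(70, 135, 65) = 65 -> 45.0
```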