Douglas Thain, John Bent Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau, Miron Livny Computer Sciences...


Douglas Thain, John Bent, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau, Miron Livny

Computer Sciences Department, UW-Madison

Gathering at the Well: Creating Communities for Grid I/O

www.cs.wisc.edu/condor

New framework needed

› Remote I/O is possible anywhere

› Build notion of locality into system?

› What are possibilities?
    Move job to data
    Move data to job
    Allow job to access data remotely

› Need framework to expose these policies

Key elements

› Storage appliance, interposition agents, schedulers and match-makers

› Mechanism not policies

› Policies are exposed to an upper layer

› We will, however, demonstrate the strength of this mechanism

To infinity and beyond

› Speedups of 2.5x possible when we are able to use locality intelligently

› This will continue to be important
    Data sets are getting larger and larger
    There will always be bottlenecks

Outline

› Motivation

› Components

› Expressing locality

› Experiments

› Conclusion

I/O communities

› Mechanism which allows jobs to move to data, data to move to jobs, or data to be accessed remotely

› Framework to evaluate these policies

Grocers, butchers, cops

› Members of an I/O community
    Storage appliances
    Interposition agents
    Scheduling systems
    Discovery systems
    Match-makers
    Collection of CPUs

Storage appliances

› Should run without special privilege
    Flexible and easily deployable
    Acceptable to nervous sys admins

› Should allow multiple access modes
    Low latency local accesses
    High bandwidth remote puts and gets

NeST

[Architecture diagram. Labels: Common protocol layer (GFTP, Chirp, HTTP, FTP); Dispatcher; Storage Manager; Transfer Manager; Multiple concurrencies; Physical storage layer. Legend: control flow, data flow.]

Interposition agents

› Thin software layer interposed between application and OS

› Allow applications to transparently interact with storage appliances

› Unmodified programs can run in grid environment
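
As a rough in-process analogy for the interposition idea above, the following Python sketch swaps out `open()` so that application code written against a logical path is silently redirected elsewhere. It is not how PFS works: it only illustrates the principle of a thin layer between application and I/O. The helper `interpose`, the path `/cmsdata/events.txt`, and the use of a temporary directory as a stand-in "appliance" are all assumptions made for this example.

```python
import builtins
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def interpose(prefix, backing_dir):
    """Redirect open() calls on paths under `prefix` to files in `backing_dir`.

    Hypothetical helper: stands in for an agent that forwards I/O to a
    storage appliance. Here the "appliance" is just a local directory.
    """
    real_open = builtins.open

    def redirected_open(path, *args, **kwargs):
        if isinstance(path, str) and path.startswith(prefix):
            path = os.path.join(backing_dir, path[len(prefix):].lstrip("/"))
        return real_open(path, *args, **kwargs)

    builtins.open = redirected_open
    try:
        yield
    finally:
        builtins.open = real_open  # always restore the real open()


def application():
    """'Unmodified' application code: it only knows the logical path."""
    with open("/cmsdata/events.txt") as f:
        return f.read()


def demo():
    """Create a stand-in appliance, then run the application through the agent."""
    with tempfile.TemporaryDirectory() as appliance:
        with open(os.path.join(appliance, "events.txt"), "w") as f:
            f.write("event-1\n")
        with interpose("/cmsdata/", appliance):
            return application()
```

The application function never mentions the backing directory; only the interposed layer knows where the data really lives.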

PFS: Pluggable File System

Scheduling systems and discovery

› Top level scheduler needs ability to discover diverse resources

› CPU discovery
    Where can a job run?

› Device discovery
    Where is my local storage appliance?

› Replica discovery
    Where can I find my data?

Match-making

› Match-making is the glue which brings discovery systems together

› Allows participants to indirectly identify each other, i.e. can locate resources without explicitly naming them

Condor and ClassAds

Outline

› Motivation

› Components

› Expressing locality

› Experiments

› Conclusion

I/O Communities

[Figure: two I/O communities, labeled UW and INFN]

Two I/O communities

› INFN Condor pool
    236 machines, about 30 available at any one time
    Wide range of machines and networks spread across Italy
    Storage appliance in Bologna
        • 750 MIPS, 378 MB RAM

Two I/O communities

› UW Condor pool
    ~900 machines, 100 dedicated for us
    Each is 600 MIPS, 512 MB RAM
    Networked on 100 Mb/s switch
    One was used as a storage appliance

Who Am I This Time?

› We assumed the role of an Italian scientist

› Database stored in Bologna

› Need to run 300 instances of simulator

Hmmm…

Three way matching

[Diagram: a Job, a Machine, and a NeST, described by a JobAd, a MachineAd, and a StorageAd, joined in one match. The JobAd refers to NearestStorage. The MachineAd knows where NearestStorage is.]

Two way ClassAds

Job ClassAd
    Type = "job"
    TargetType = "machine"
    Cmd = "sim.exe"
    Owner = "thain"
    Requirements = (OpSys == "linux")

Machine ClassAd
    Type = "machine"
    TargetType = "job"
    OpSys = "linux"
    Requirements = (Owner == "thain")
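
A minimal Python sketch of the two-way match on this slide, assuming ads are plain dictionaries and each Requirements expression is modelled as a callable that receives the other party's ad. This is not the ClassAd language or the Condor matchmaker; `two_way_match` and the dictionary layout are hypothetical names used only for illustration.

```python
# Illustrative model only: real ClassAds are expressions in their own language.
# Here each ad is a dict, and Requirements is a callable that receives the
# candidate (other) ad.

job_ad = {
    "Type": "job",
    "TargetType": "machine",
    "Cmd": "sim.exe",
    "Owner": "thain",
    "Requirements": lambda other: other.get("OpSys") == "linux",
}

machine_ad = {
    "Type": "machine",
    "TargetType": "job",
    "OpSys": "linux",
    "Requirements": lambda other: other.get("Owner") == "thain",
}


def two_way_match(a, b):
    """Return True when each ad targets the other's type and both
    Requirements expressions are satisfied by the other ad."""
    types_agree = a["TargetType"] == b["Type"] and b["TargetType"] == a["Type"]
    return types_agree and a["Requirements"](b) and b["Requirements"](a)


if __name__ == "__main__":
    print(two_way_match(job_ad, machine_ad))
```

The match is symmetric: the job constrains the machine (operating system) and the machine constrains the job (owner), and both must agree.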

Three way ClassAds

Job ClassAd
    Type = "job"
    TargetType = "machine"
    Cmd = "sim.exe"
    Owner = "thain"
    Requirements = (OpSys == "linux") && NearestStorage.HasCMSData

Machine ClassAd
    Type = "machine"
    TargetType = "job"
    OpSys = "linux"
    Requirements = (Owner == "thain")
    NearestStorage = (Name == "turkey") && (Type == "Storage")

Storage ClassAd
    Type = "storage"
    Name = "turkey.cs.wisc.edu"
    HasCMSData = true
    CMSDataPath = "/cmsdata"
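
The three-way match can be sketched in Python under the same simplifying model: ads are dictionaries and expressions are callables. The sketch is not the Condor matchmaker, and `three_way_match` is a hypothetical name. Two further assumptions are made so the example is self-consistent: the machine ad names its appliance by the full host name used in the storage ad, and the storage type is compared case-insensitively.

```python
# Illustrative model only, not the Condor matchmaker. Ads are dicts and
# expressions are Python callables.

storage_ad = {
    "Type": "storage",
    "Name": "turkey.cs.wisc.edu",
    "HasCMSData": True,
    "CMSDataPath": "/cmsdata",
}

machine_ad = {
    "Type": "machine",
    "TargetType": "job",
    "OpSys": "linux",
    "Requirements": lambda job: job.get("Owner") == "thain",
    # Assumption: the appliance is named by full host name, and the type
    # is compared case-insensitively.
    "NearestStorage": lambda s: (
        s.get("Name") == "turkey.cs.wisc.edu"
        and s.get("Type", "").lower() == "storage"
    ),
}

job_ad = {
    "Type": "job",
    "TargetType": "machine",
    "Cmd": "sim.exe",
    "Owner": "thain",
    # The job never names a storage appliance. It only refers to whatever
    # the matched machine calls NearestStorage.
    "Requirements": lambda machine, nearest: (
        machine.get("OpSys") == "linux" and nearest.get("HasCMSData", False)
    ),
}


def three_way_match(job, machine, storage_ads):
    """Return the storage ad that completes the match, or None."""
    if not machine["Requirements"](job):
        return None
    for storage in storage_ads:
        if machine["NearestStorage"](storage) and job["Requirements"](machine, storage):
            return storage
    return None
```

The indirection is the point: the job states what it needs from nearby storage, the machine states which storage is nearby, and the match binds the two without the job ever naming a host.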

Outline

› Motivation

› Components

› Expressing locality

› Experiments

› Conclusion

BOOM!

CMS simulator sample run

› Purposefully chose a run with a high I/O-to-CPU ratio

› Accesses about 20 MB of data from a 300 MB database

› Writes about 1 MB of output

› ~160 seconds execution time on a 600 MIPS machine with local disk

Policy specification

› Run only with locality
    Requirements = (NearestStorage.HasCMSData)

› Run in only one particular community
    Requirements = (NearestStorage.Name == "nestore.bologna")

› Prefer home community first
    Requirements = (NearestStorage.HasCMSData)
    Rank = (NearestStorage.Name == "nestore.bologna") ? 10 : 0

› Arbitrarily complex
    Requirements = (NearestStorage.Name == "nestore.bologna") || (ClockHour < 7) || (ClockHour > 18)
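
To make the roles of Requirements and Rank concrete, here is a small Python sketch of the four policies on this slide. It assumes a simplified model in which Requirements filters candidates and Rank orders the survivors (highest first). Each policy is a pair of callables over a candidate's NearestStorage attributes and the current hour. The function `select`, the policy keys, and every host name other than "nestore.bologna" are made up for the example.

```python
# Illustrative model only: Requirements filters, Rank orders.

HOME = "nestore.bologna"

policies = {
    "locality_only": (
        lambda ns, hour: ns["HasCMSData"],
        lambda ns, hour: 0,
    ),
    "one_community": (
        lambda ns, hour: ns["Name"] == HOME,
        lambda ns, hour: 0,
    ),
    "prefer_home": (
        lambda ns, hour: ns["HasCMSData"],
        lambda ns, hour: 10 if ns["Name"] == HOME else 0,
    ),
    "home_or_off_hours": (
        lambda ns, hour: ns["Name"] == HOME or hour < 7 or hour > 18,
        lambda ns, hour: 0,
    ),
}


def select(policy_name, candidates, hour):
    """Filter candidates by Requirements, then order by Rank (highest first)."""
    requirements, rank = policies[policy_name]
    eligible = [c for c in candidates if requirements(c["NearestStorage"], hour)]
    return sorted(eligible, key=lambda c: rank(c["NearestStorage"], hour), reverse=True)


# Hypothetical candidates: two communities holding the data, one without it.
candidates = [
    {"Machine": "uw-node", "NearestStorage": {"Name": "nest.example.edu", "HasCMSData": True}},
    {"Machine": "infn-node", "NearestStorage": {"Name": HOME, "HasCMSData": True}},
    {"Machine": "bare-node", "NearestStorage": {"Name": "empty.example.org", "HasCMSData": False}},
]

if __name__ == "__main__":
    for name in policies:
        print(name, [c["Machine"] for c in select(name, candidates, hour=12)])
```

Under "prefer_home", both data-holding communities remain eligible, but the home community sorts first; under "one_community", only the home community is eligible at all.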

Policies evaluated

› INFN local

› UW remote

› UW stage first

› UW local (pre-staged)

› INFN local, UW remote

› INFN local, UW stage

› INFN local, UW local

Completion Time

CPU Efficiency

Conclusions

› I/O communities expose locality policies

› Users can increase throughput

› Owners can maximize resource utilization

Future work

› Automation
    Configuration of communities
    Dynamically adjust size as load dictates

› Automation
    Selection of movement policy

› Automation

For more info

› Condor
    http://www.cs.wisc.edu/condor

› ClassAds
    http://www.cs.wisc.edu/condor/classad

› PFS
    http://www.cs.wisc.edu/condor/pfs

› NeST
    http://www.nestproject.org

Local only

Remote only

Both local and remote

I/O communities are an old idea, right?

› File servers and administrative domains

› No, not really. We need
    more flexible boundaries
    simple mechanism by which users can express I/O community relationships
    hooks into system that allow users to use locality

Grid applications have demanding I/O needs

› Petabytes of data in tape repositories

› Scheduling systems have demonstrated that there are idle CPUs

› Some systems
    move jobs to data
    move data to jobs
    allow job remote access to data

› No one approach is always “best”

Easy come, easy go

› In a computation grid, resources are very dynamic

› Programs need rich methods for finding and claiming resources
    CPU discovery
    Device discovery
    Replica discovery

Bringing it all together

[Diagram labels: CPU Discovery System; Replica Discovery System; Device Discovery System; Job Agent; Execution site; Storage appliance; Distributed Repository. Links: short-haul I/O, long-haul I/O.]

Conclusions

› Locality is good

› Balance point between staging data and accessing it remotely is not static
    depends on specific attributes of the job
        • data size, expected degree of re-reference, etc.
    depends on performance metric
        • CPU efficiency or job completion time
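
A back-of-the-envelope Python sketch of why the balance point moves. It uses figures stated elsewhere in the deck (a 300 MB database, about 20 MB accessed per run, a 100 Mb/s switch at UW) and an idealized model with no latency, contention, caching, or re-reference. The function names and the model are assumptions for illustration and do not reproduce the measured results.

```python
def transfer_seconds(megabytes, megabits_per_second):
    """Idealized transfer time: size / bandwidth, with no latency or contention."""
    return megabytes * 8 / megabits_per_second


def megabytes_moved(jobs, policy, database_mb=300, accessed_mb=20):
    """Total MB crossing the link between data and jobs for `jobs` runs.

    'remote': every job reads its accessed data over the link.
    'stage':  the whole database is copied once, then all jobs read locally.
    """
    if policy == "remote":
        return jobs * accessed_mb
    if policy == "stage":
        return database_mb
    raise ValueError(f"unknown policy: {policy}")


def break_even_jobs(database_mb=300, accessed_mb=20):
    """Number of jobs at which staging moves no more data than remote access."""
    return database_mb / accessed_mb


if __name__ == "__main__":
    print(transfer_seconds(300, 100))      # staging the full database at 100 Mb/s
    print(transfer_seconds(20, 100))       # one job's remote reads at 100 Mb/s
    print(megabytes_moved(300, "remote"))  # 300 jobs, all reading remotely
    print(break_even_jobs())
```

In this simplified model, staging wins on data volume once more than 15 jobs share one staged copy, but a single job or a much slower link shifts the answer, which is consistent with the slide's point that the balance depends on job attributes and on the metric chosen.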

Implementation

› NeST storage appliance

› Pluggable File System (PFS) interposition agent built with Bypass

› Condor and ClassAds
    scheduling system
    discovery system
    match-maker

Jim Gast and Bart say ...

› Too many bullet slides

› Contributions
    scientist doesn’t want to name because
        • resources are dynamic and
        • name is irrelevant
    hooks into system to allow users to express and take advantage of locality

Jim Gast and Bart say ...

› Everyone knows locality is good, but there is no way to express this and run jobs on the grid

› I/O communities are a mechanism by which users can use locality and specify policies to optimize job performance

4 earth-shattering revelations

1) The grid is big.
2) Scientific data-sets are large.
3) Idle resources are available.
4) Locality is good.

Mechanisms not policies

› I/O communities are a mechanism not a policy

› A higher layer is expected to choose application appropriate policies

› We will however demonstrate the strength of the mechanism by defining appropriate policies for one particular application

Experimental results

› Implementation

› Environment

› Application

› Measurements

› Evaluation