John Bent
Computer Sciences Department
University of Wisconsin-Madison
[email protected]
http://www.cs.wisc.edu/condor
Explicit Control in a Batch-Aware Distributed File System
www.cs.wisc.edu/condor
Focus of work
› Harnessing, managing remote storage
› Batch-pipelined I/O intensive workloads
› Scientific workloads
› Wide-area grid computing
Batch-pipelined workloads
› General properties
  Large number of processes
  Process and data dependencies
  I/O intensive
› Different types of I/O
  Endpoint
  Batch
  Pipeline
Batch-pipelined workloads
[Figure: a batch-pipelined workload: multiple job pipelines linked by pipeline data, sharing common batch datasets, with endpoint inputs and outputs at the edges.]
Wide-area grid computing
[Figure: a home storage server connected to remote compute clusters across the Internet.]
Cluster-to-cluster (c2c)
› Not quite p2p
  More organized
  Less hostile
  More homogeneity
  Correlated failures
› Each cluster is autonomous
  Run and managed by different entities
› An obvious bottleneck is the wide-area link

How do we manage the flow of data into, within, and out of these clusters?
Current approaches
› Remote I/O (Condor standard universe)
  Very easy
  Consistency through serialization
› Prestaging (Condor vanilla universe)
  Manually intensive
  Good performance through knowledge
› Distributed file systems (AFS, NFS)
  Easy to use, uniform name space
  Impractical in this environment
Pros and cons

               Practical   Easy to use   Leverages workload info
Remote I/O         √            √                  X
Pre-staging        √            X                  √
Trad. DFS          X            √                  X
BAD-FS
› Solution: the Batch-Aware Distributed File System
› Leverages workload info with storage control
  Detailed information about the workload is known
  Storage layer allows external control
  External scheduler makes informed storage decisions
› Combining information and control results in
  Improved performance
  More robust failure handling
  Simplified implementation

               Practical   Easy to use   Leverages workload info
BAD-FS             √            √                  √
Practical and deployable
› User-level; requires no privilege
› Packaged as a modified Condor system
› A Condor system which includes BAD-FS
› General; glide-in works everywhere

[Figure: BAD-FS glided in atop remote SGE clusters, connected to the home store across the Internet.]
BAD-FS == Condor ++
1) NeST storage management
2) Batch-Aware Distributed File System
3) Expanded Condor submit language
4) BAD-FS scheduler

[Figure: Condor DAGMan feeds a job queue to the BAD-FS scheduler, which dispatches jobs to compute nodes each running a Condor startd, BAD-FS, and NeST, with home storage at the submit site.]
BAD-FS knowledge
› Remote cluster knowledge
  Storage availability
  Failure rates
› Workload knowledge
  Data type (batch, pipeline, or endpoint)
  Data quantity
  Job dependencies
Control through lots
› Abstraction that allows external storage control
› Guaranteed storage allocations
  Containers for job I/O, e.g. “I need 2 GB of space for at least 24 hours”
› Scheduler
  Creates lots to cache input data
  • Subsequent jobs can reuse this data
  Creates lots to buffer output data
  • Destroys pipeline, copies endpoint
  Configures workload to access lots
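The lot request quoted above (“I need 2 GB of space for at least 24 hours”) can be sketched as a tiny allocation object. The class name, fields, and methods here are illustrative assumptions, not the real BAD-FS/NeST interface; the point is only that a lot is a sized, time-limited, guaranteed allocation that refuses writes beyond its reservation.

```python
import time

class Lot:
    """A guaranteed storage allocation on a remote storage server.
    (Hypothetical sketch; names/fields are not the actual BAD-FS API.)"""
    def __init__(self, size_bytes, duration_s):
        self.size_bytes = size_bytes          # guaranteed capacity
        self.used = 0
        self.expires = time.time() + duration_s  # lease, not best-effort

    def write(self, nbytes):
        # Writes beyond the reservation are refused outright, so the
        # scheduler's capacity plan stays trustworthy.
        if self.used + nbytes > self.size_bytes:
            raise OSError("lot full: allocation exceeded")
        self.used += nbytes

# "I need 2 GB of space for at least 24 hours"
lot = Lot(size_bytes=2 * 1024**3, duration_s=24 * 3600)
lot.write(500 * 1024**2)   # e.g. cache a 500 MB batch dataset
```

A scheduler holding such handles can later reuse a cached input lot for subsequent jobs, or destroy a pipeline-output lot once its consumer finishes.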
Knowledge plus control
› Enhanced performance
  I/O scoping
  Capacity-aware scheduling
› Improved failure handling
  Cost-benefit replication
› Simplified implementation
  No cache consistency protocol
I/O scoping
› Technique to minimize wide-area traffic
› Allocate lots to cache batch data
› Allocate lots for pipeline and endpoint data
› Extract endpoint
› Cleanup

AMANDA: 200 MB pipeline, 500 MB batch, 5 MB endpoint
[Figure: BAD-FS scheduler directing compute nodes across the Internet; in steady state only 5 of the 705 MB traverse the wide-area link.]
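The steady-state number (5 of 705 MB crossing the wide-area link) follows directly from scoping each I/O type. This sketch (the function is mine, not BAD-FS code) makes the arithmetic explicit: pipeline data never leaves the cluster, batch data crosses only on a cold cache, and only endpoint data always traverses the link.

```python
def wide_area_mb(batch_mb, pipeline_mb, endpoint_mb, batch_cached=True):
    """Per-job wide-area traffic under I/O scoping (illustrative sketch).
    Pipeline data stays inside the cluster; batch data crosses only when
    the cluster cache is cold; endpoint data always crosses."""
    return endpoint_mb + (0 if batch_cached else batch_mb)

# AMANDA: 200 MB pipeline, 500 MB batch, 5 MB endpoint
print(wide_area_mb(batch_mb=500, pipeline_mb=200, endpoint_mb=5))  # 5
```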
Capacity-aware scheduling
› Technique to avoid over-allocations
› Scheduler has knowledge of Storage availability Storage usage within the workload
› Scheduler runs as many jobs as fit
› Avoids wasted storage utilization
› Improves job throughput
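The bullets above amount to a greedy admission test: run a job only if its declared storage footprint still fits the space available. The names and structure below are illustrative, not the actual BAD-FS scheduler.

```python
def admit_jobs(jobs, capacity_gb):
    """Greedy capacity-aware admission (sketch): start jobs whose storage
    needs fit the remaining space, rather than over-allocating and
    thrashing the caches."""
    running, free = [], capacity_gb
    for name, need_gb in jobs:
        if need_gb <= free:        # job fits: reserve its lot and run it
            running.append(name)
            free -= need_gb
    return running

jobs = [("amanda-1", 2), ("amanda-2", 2), ("amanda-3", 2)]
print(admit_jobs(jobs, capacity_gb=5))  # only two of the three fit
```

Jobs that do not fit simply wait for a lot to be freed, which trades a little latency for never evicting data a running job still needs.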
Improved failure handling
› Scheduler understands data semantics
  Data is not just a collection of bytes
  Losing data is not catastrophic
  • Output can be regenerated by rerunning jobs
› Cost-benefit replication
  Replicates only data whose replication cost is cheaper than the cost to rerun the job
› Can improve throughput in a lossy environment
Simplified implementation
› Data dependencies known
› Scheduler ensures proper ordering
› Build a distributed file system
  With cooperative caching
  But without a cache consistency protocol
Real workloads
› AMANDA: astrophysics study of cosmic events such as gamma-ray bursts
› BLAST: biology search for proteins within a genome
› CMS: physics simulation of large particle colliders
› HF: chemistry study of non-relativistic interactions between atomic nuclei and electrons
› IBIS: ecology global-scale simulation of earth's climate used to study effects of human activity (e.g. global warming)
Real workload experience
› Setup
  16 jobs
  16 compute nodes
  Emulated wide-area network
› Configurations
  Remote I/O
  AFS-like with /tmp
  BAD-FS
› Result is an order-of-magnitude improvement
BAD Conclusions
› Schedulers can obtain workload knowledge
› Schedulers need storage control
  Caching
  Consistency
  Replication
› Combining this control with knowledge yields
  Enhanced performance
  Improved failure handling
  Simplified implementation
For more information
› http://www.cs.wisc.edu/condor/publications.html
› Questions?
“Pipeline and Batch Sharing in Grid Workloads,” Douglas Thain, John Bent, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau, Miron Livny. HPDC 12, 2003.
“Explicit Control in a Batch-Aware Distributed File System,” John Bent, Douglas Thain, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau, Miron Livny. NSDI ’04, 2004.