Distributed Data Access and Resource Management in the D0 SAM System
I. Terekhov, Fermi National Accelerator Laboratory,
for the SAM project: L. Carpenter, L. Lueking, C. Moore, J. Trumbo,
S. Veseli, M. Vranicar, S. White, V. White
Plan of Attack
The domain: D0 overview and applications
SAM as a Data Grid: metadata, file replication, initial resource management
SAM and generic Grid technologies
Comprehensive resource management
D0: A Virtual Organization
High Energy Physics (HEP) collider experiment, multi-institutional
Collaboration of 500+ scientists, 72+ institutions, 18+ countries
Physicists generate and analyze data
Coordinated resource sharing (networks, MSS, etc.) for common problem (physics analysis) solving
Applications and Data Intensity
Real data taking from the detector
Monte-Carlo data simulation
Reconstruction
Analysis
The gist of experimental HEP:
Extremely I/O intensive
Recurrent processing of datasets: caching highly beneficial
Data Handling as the Core of D0 Meta-Computing
HEP applications are data-intensive (see below)
Computational economy is extremely data-centric because costs are driven by DH resources
SAM is primarily and historically a DH system: a working Data Grid prototype
Job control is included in the Grid context (the D0-PPDG project)
SAM as a Data Grid
Mass Storage Systems
Metadata
Resource Management
Data Replication
Replica Selection
Comprehensive Resource Management
Replication Cost Estimation
Generic Grid Services
(External to SAM)
Based on: A. Chervenak, I. Foster, C. Kesselman, C. Salisbury, S. Tuecke, "The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Datasets," to appear in Journal of Network and Computer Applications
Core Services
High Level Services
Standard Grid Metadata
Application metadata
creation info and processing history
data types (tiers, streams, etc., D0-specific)
files are self-describing
Replica metadata
each file has zero or more locations
volume IDs and location details for RM - part of the interface with the Mass Storage System
Standard Grid Metadata, cont’d
System Configuration Metadata
HW configuration: locations and capacities of disks and tapes (network and disk bandwidths)
resource ownership and allocation:
• partition of disk, MSS bandwidths, etc. by group
• fair share parameters for resource allocation and job scheduling (FSAS)
• cost criteria (weight factors) for FSAS
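The configuration metadata above can be pictured as a simple record. Every field name and number below is an assumption for illustration, not SAM's actual schema:

```python
# Hypothetical station configuration record mirroring the metadata listed
# above (HW configuration, per-group ownership, FSAS parameters); all
# names and values are invented for illustration.
station_config = {
    "station": "central-analysis",
    "disks": [
        {"node": "node01", "capacity_gb": 500, "disk_bandwidth_mb_s": 40},
        {"node": "node02", "capacity_gb": 250, "disk_bandwidth_mb_s": 40},
    ],
    "network_bandwidth_mb_s": 100,
    "mss_bandwidth_mb_s": 60,
    # resource ownership and allocation, partitioned by group
    "groups": {
        "higgs": {"disk_share": 0.5, "fair_share": 0.5},
        "top":   {"disk_share": 0.3, "fair_share": 0.3},
        "qcd":   {"disk_share": 0.2, "fair_share": 0.2},
    },
    # cost criteria (weight factors) for fair share allocation/scheduling
    "fsas_cost_weights": {"tape_mb": 1.0, "network_mb": 0.5, "cpu_s": 0.1},
}
```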
Advanced Metadata
Dataset management (to a great advantage of the user)
Job history (crash recovery mechanisms)
File replica access history (used by RM)
Resource utilization history (persistency in RM and accountability)
See our complete data model for more details
Data Replica Management
Processing Station: a (locally distributed, semi-autonomous) collection of HW resources (disk, CPU, etc.) together with the SW component that manages them
Local data replication for parallel processing in a single batch system - within a Station
Global data replication - worldwide data exchange among Stations and MSS's
Local Data Replication
Consider a cluster with a physically distributed disk cache
Logical partitioning by research group
Each group executes an independent cache replacement algorithm (FIFO, LRU, many flavors)
The replica catalog is updated in the course of cache replacement
The access history of each local replica is maintained persistently in the MD
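The per-group replacement described above can be sketched as follows. `GroupCache` and `Catalog` are toy names (with LRU chosen as the replacement flavor); the real Station interface differs:

```python
from collections import OrderedDict

class Catalog:
    """Toy stand-in for the replica catalog, updated during replacement."""
    def __init__(self):
        self.locations = set()
    def add(self, fname):
        self.locations.add(fname)
    def remove(self, fname):
        self.locations.discard(fname)

class GroupCache:
    """LRU replacement over one research group's slice of the station disk
    (names and sizes are illustrative, not SAM's actual interface)."""
    def __init__(self, quota_bytes):
        self.quota = quota_bytes
        self.used = 0
        self.files = OrderedDict()   # file name -> size, oldest access first

    def access(self, name, size, catalog):
        if name in self.files:            # cache hit: refresh LRU order
            self.files.move_to_end(name)
            return
        # cache miss: evict least recently used files until the new one fits
        while self.used + size > self.quota and self.files:
            victim, vsize = self.files.popitem(last=False)
            self.used -= vsize
            catalog.remove(victim)        # keep replica catalog in sync
        self.files[name] = size
        self.used += size
        catalog.add(name)
```

Each group would hold its own `GroupCache` over its disk partition, so one group's traffic never evicts another group's replicas.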
Local Data Replication, cont’d
While Resource Managers strive to keep jobs and their data in proximity (see below), the Batch System does not always dispatch jobs where the data lies
The Station executes intra-cluster data replication on demand, fully transparently to the user
[Diagram: replica flow over the WAN among Sites, Stations, Mass Storage Systems, and the User (producer)]
Routing + Caching = Global Replication
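The slogan above can be illustrated with a toy routing table: every Station that forwards a file also caches it, so a single WAN transfer leaves replicas along the whole route. Topology and names are invented:

```python
# "Routing + Caching = Global Replication", as a toy sketch.
# Each station knows its next hop toward the source MSS; a station that
# forwards a file caches a replica of it at the same time.
route = {"lyon": "fnal", "fnal": "enstore"}       # next hop toward the MSS
caches = {"lyon": set(), "fnal": set(), "enstore": {"file.raw"}}

def fetch(station, fname):
    """Return where the file was served from; cache it at every hop."""
    if fname in caches[station]:
        return station                        # served from a local replica
    upstream = fetch(route[station], fname)   # recurse toward the source
    caches[station].add(fname)                # cache while forwarding
    return upstream
```

After one fetch from `lyon`, both `fnal` and `lyon` hold replicas, so the next request is served locally.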
Principles of Resource Management
Implement experiment policies on prioritization and fair sharing in resource usage, by user categories (access modes, research group, etc.)
Maximize throughput in terms of real work done (i.e., user jobs, not system-internal jobs such as data transfers)
Fair Sharing
Allocation of resources and scheduling of jobs
The goal is to ensure that, in a busy environment, each abstract user gets a fixed share of “resources” or a fixed share of “work” done
FS and Computational Economy
Jobs, when executed, incur costs (through resource utilization) and realize benefits (through getting work done)
Maintain a tuple (vector) of cumulative costs/benefits for each abstract user and compare it to the allocated fair share to raise or lower priority
Incorporates all known resource types and benefit metrics; totally flexible
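One way to sketch the cost/benefit bookkeeping described above. The groups, weight factors, and the deficit-style priority formula are illustrative assumptions, not SAM's actual FSAS:

```python
# Fair-share priority from cumulative cost vectors, per abstract user
# (here: a research group); allocations and weights are invented.
ALLOCATION = {"higgs": 0.5, "top": 0.3, "qcd": 0.2}          # fair shares
WEIGHTS = {"tape_mb": 1.0, "network_mb": 0.5, "cpu_s": 0.1}  # cost criteria

usage = {g: {r: 0.0 for r in WEIGHTS} for g in ALLOCATION}

def charge(group, **costs):
    """Accumulate the cumulative cost vector as a job consumes resources."""
    for resource, amount in costs.items():
        usage[group][resource] += amount

def priority(group):
    """Deficit of the group's consumed share vs. its allocation;
    the scheduler favors the group with the largest deficit."""
    spent = {g: sum(WEIGHTS[r] * u[r] for r in WEIGHTS)
             for g, u in usage.items()}
    total = sum(spent.values()) or 1.0
    return ALLOCATION[group] - spent[group] / total
```

A group that has consumed more than its share gets a negative priority and falls behind groups that are under their allocation.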
The Hierarchy of Resource Managers
Global RM: Sites connected by WAN
Site RM: Stations and MSS's connected by LANs
Station (Local RM): batch queues and disks
Driven at every level by: experiment policies, fair share allocations, cost metrics
Job Control: Station Integration with the Abstract Batch System
[Diagram: Client → Local RM (Station Master) → Batch System → Process Manager (SAM wrapper script) → User Task, coordinated with the Job Manager (Project Master); messages: sam submit, submit, dispatch, invoke, sam condition satisfied, resubmit, setJobCount/stop, invoke, jobEnd]
SAM as a Data Grid
Mass Storage Systems
Metadata
Resource Management
Data Replication
Replica Selection
Comprehensive Resource Management
Replication Cost Estimation
(External to SAM)
Core Services
High Level Services
Preferred locations
Caching, Forwarding, Pinning
DH-Batch system integration, Fair Share Allocation, MSS access control, Network access control
Cached data, File transfer queues, Site RM weather conditions
Replica catalog, System configuration, Cost/Benefit metrics
Batch System internal RM, MSS internal RM (External to SAM)
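The Replica Selection and Replication Cost Estimation services in the diagram might be sketched like this; the cost model, penalties, and locations below are all assumptions for illustration:

```python
# Replica selection by replication cost estimation: pick the replica with
# the cheapest estimated transfer. All numbers, location names, and the
# cost model itself are invented for illustration.
replicas = [
    # (location, cached_on_disk, bandwidth_mb_s, queue_length)
    ("fnal-station", True, 100.0, 2),
    ("fnal-enstore", False, 30.0, 10),   # tape copy: add staging latency
    ("lyon-station", True, 5.0, 0),      # remote replica over the WAN
]

TAPE_MOUNT_S = 120.0    # assumed tape mount/staging latency
QUEUE_PENALTY_S = 10.0  # assumed wait per queued transfer

def transfer_cost(file_mb, loc):
    """Estimated seconds to deliver the file from this location."""
    name, cached, bw, queue = loc
    cost = file_mb / bw              # raw transfer time estimate
    cost += queue * QUEUE_PENALTY_S  # "weather conditions" at the site
    if not cached:
        cost += TAPE_MOUNT_S         # staging from the MSS
    return cost

def select_replica(file_mb):
    return min(replicas, key=lambda loc: transfer_cost(file_mb, loc))
```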
SAM Grid Work (D0-PPDG)
Enhance the system by adding Grid services (Grid authentication, replica selection, etc.)
Adapt the system to generic Grid services
Replace proprietary tools and internal protocols with those standard to the Grid
Collaborate with Computer Scientists to develop new Grid technologies, using SAM as a testbed for testing/validating them
Initial PPDG Work: Condor/SAM Job Scheduling, Preliminary Architecture
[Diagram: Job Management side - Condor-G, Condor MMS, and a Condor/SAM-Grid adapter over the Abstract Batch System (Condor); Data Management side - SAM with a SAM/Condor-Grid adapter; the two sides exchange "sam submit", data and DH resources, job schedules, and costs of job placements over standard Grid protocols]
Conclusions
D0 SAM is not only a production meta-computing system but also a functioning Data Grid prototype, with data replication and resource management at an advanced/mature stage
Work continues to fully Grid-enable the system
Some of our components/services will, we hope, be of interest to the Grid community