Distributed Data Access and Resource Management in the D0 SAM System
I. Terekhov, Fermi National Accelerator Laboratory,
for the SAM project: L. Carpenter, L. Lueking, C. Moore, J. Trumbo,
S. Veseli, M. Vranicar, S. White, V. White
Plan of Attack
The domain: D0 overview and applications
SAM as a Data Grid: metadata, file replication, initial resource management
SAM and generic Grid technologies
Comprehensive resource management
D0: A Virtual Organization
High Energy Physics (HEP) collider experiment, multi-institutional
Collaboration of 500+ scientists, 72+ institutions, 18+ countries
Physicists generate and analyze data
Coordinated resource sharing (networks, MSS, etc.) for common problem (physics analysis) solving
Applications and Data Intensity
Real data taking from the detector
Monte-Carlo data simulation
Reconstruction
Analysis
The gist of experimental HEP:
Extremely I/O intensive
Recurrent processing of datasets: caching highly beneficial
Data Handling as the Core of D0 Meta-Computing
HEP applications are data-intensive (see below)
Computational economy is extremely data-centric because costs are driven by DH resources
SAM is primarily and historically a DH system: a working Data Grid prototype
Job control is included in the Grid context (the D0-PPDG project)
SAM as a Data Grid
Mass Storage Systems
Metadata
Resource Management
Data Replication
Replica Selection
Comprehensive Resource Management
Replication Cost Estimation
Generic Grid Services
(External to SAM)
Based on: A. Chervenak, I. Foster, C. Kesselman, C. Salisbury, S. Tuecke, "The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Datasets," to appear in Journal of Network and Computer Applications
Core Services
High Level Services
Standard Grid Metadata
Application metadata
creation info and processing history
data types (tiers, streams, etc., D0-specific)
files are self-describing
Replica metadata
each file has zero or more locations
volume IDs and location details for RM - part of the interface with the Mass Storage System
Standard Grid Metadata, cont’d
System Configuration Metadata
HW configuration: locations and capacities of disks and tapes (network and disk bandwidths)
resource ownership and allocation:
• partition of disk, MSS bandwidths, etc. by group
• fair share parameters for resource allocation and job scheduling (FSAS)
• cost criteria (weight factors) for FSAS
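The configuration metadata above can be pictured as a simple record. Every field name and number below is an assumption for illustration, not SAM's actual schema:

```python
# Hypothetical station configuration record mirroring the metadata listed
# above (HW configuration, per-group ownership, FSAS parameters); all
# names and values are invented for illustration.
station_config = {
    "station": "central-analysis",
    "disks": [
        {"node": "node01", "capacity_gb": 500, "disk_bandwidth_mb_s": 40},
        {"node": "node02", "capacity_gb": 250, "disk_bandwidth_mb_s": 40},
    ],
    "network_bandwidth_mb_s": 100,
    "mss_bandwidth_mb_s": 60,
    # resource ownership and allocation, partitioned by group
    "groups": {
        "higgs": {"disk_share": 0.5, "fair_share": 0.5},
        "top":   {"disk_share": 0.3, "fair_share": 0.3},
        "qcd":   {"disk_share": 0.2, "fair_share": 0.2},
    },
    # cost criteria (weight factors) for fair share allocation/scheduling
    "fsas_cost_weights": {"tape_mb": 1.0, "network_mb": 0.5, "cpu_s": 0.1},
}
```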
Advanced Metadata
Dataset management (to a great advantage of the user)
Job history (crash recovery mechanisms)
File replica access history (used by RM)
Resource utilization history (persistency in RM and accountability)
See our complete data model for more details
Data Replica Management
Processing Station: a (locally distributed, semi-autonomous) collection of HW resources (disk, CPU, etc.) together with the SW component that manages them
Local data replication for parallel processing in a single batch system - within a Station
Global data replication - worldwide data exchange among Stations and MSS's
Local Data Replication
Consider a cluster with a physically distributed disk cache
Logical partitioning by research group
Each group executes an independent cache replacement algorithm (FIFO, LRU, many flavors)
The replica catalog is updated in the course of cache replacement
The access history of each local replica is maintained persistently in the MD
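The per-group replacement described above can be sketched as follows. `GroupCache` and `Catalog` are toy names (with LRU chosen as the replacement flavor); the real Station interface differs:

```python
from collections import OrderedDict

class Catalog:
    """Toy stand-in for the replica catalog, updated during replacement."""
    def __init__(self):
        self.locations = set()
    def add(self, fname):
        self.locations.add(fname)
    def remove(self, fname):
        self.locations.discard(fname)

class GroupCache:
    """LRU replacement over one research group's slice of the station disk
    (names and sizes are illustrative, not SAM's actual interface)."""
    def __init__(self, quota_bytes):
        self.quota = quota_bytes
        self.used = 0
        self.files = OrderedDict()   # file name -> size, oldest access first

    def access(self, name, size, catalog):
        if name in self.files:            # cache hit: refresh LRU order
            self.files.move_to_end(name)
            return
        # cache miss: evict least recently used files until the new one fits
        while self.used + size > self.quota and self.files:
            victim, vsize = self.files.popitem(last=False)
            self.used -= vsize
            catalog.remove(victim)        # keep replica catalog in sync
        self.files[name] = size
        self.used += size
        catalog.add(name)
```

Each group would hold its own `GroupCache` over its disk partition, so one group's traffic never evicts another group's replicas.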
Local Data Replication, cont’d
While Resource Managers strive to keep jobs and their data in proximity (see below), the Batch System does not always dispatch jobs where the data lies
The Station executes intra-cluster data replication on demand, fully transparently to the user
[Diagram: replica flow over the WAN among Sites, Stations, Mass Storage Systems, and the User (producer)]
Routing + Caching = Global Replication
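The slogan above can be illustrated with a toy routing table: every Station that forwards a file also caches it, so a single WAN transfer leaves replicas along the whole route. Topology and names are invented:

```python
# "Routing + Caching = Global Replication", as a toy sketch.
# Each station knows its next hop toward the source MSS; a station that
# forwards a file caches a replica of it at the same time.
route = {"lyon": "fnal", "fnal": "enstore"}       # next hop toward the MSS
caches = {"lyon": set(), "fnal": set(), "enstore": {"file.raw"}}

def fetch(station, fname):
    """Return where the file was served from; cache it at every hop."""
    if fname in caches[station]:
        return station                        # served from a local replica
    upstream = fetch(route[station], fname)   # recurse toward the source
    caches[station].add(fname)                # cache while forwarding
    return upstream
```

After one fetch from `lyon`, both `fnal` and `lyon` hold replicas, so the next request is served locally.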
Principles of Resource Management
Implement experiment policies on prioritization and fair sharing in resource usage, by user categories (access modes, research group, etc.)
Maximize throughput in terms of real work done (i.e., user jobs, not system-internal jobs such as data transfers)
Fair Sharing
Allocation of resources and scheduling of jobs
The goal is to ensure that, in a busy environment, each abstract user gets a fixed share of “resources” or a fixed share of “work” done
FS and Computational Economy
Jobs, when executed, incur costs (through resource utilization) and realize benefits (through getting work done)
Maintain a tuple (vector) of cumulative costs/benefits for each abstract user and compare it to the allocated fair share to raise or lower priority
Incorporates all known resource types and benefit metrics; totally flexible
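One way to sketch the cost/benefit bookkeeping described above. The groups, weight factors, and the deficit-style priority formula are illustrative assumptions, not SAM's actual FSAS:

```python
# Fair-share priority from cumulative cost vectors, per abstract user
# (here: a research group); allocations and weights are invented.
ALLOCATION = {"higgs": 0.5, "top": 0.3, "qcd": 0.2}          # fair shares
WEIGHTS = {"tape_mb": 1.0, "network_mb": 0.5, "cpu_s": 0.1}  # cost criteria

usage = {g: {r: 0.0 for r in WEIGHTS} for g in ALLOCATION}

def charge(group, **costs):
    """Accumulate the cumulative cost vector as a job consumes resources."""
    for resource, amount in costs.items():
        usage[group][resource] += amount

def priority(group):
    """Deficit of the group's consumed share vs. its allocation;
    the scheduler favors the group with the largest deficit."""
    spent = {g: sum(WEIGHTS[r] * u[r] for r in WEIGHTS)
             for g, u in usage.items()}
    total = sum(spent.values()) or 1.0
    return ALLOCATION[group] - spent[group] / total
```

A group that has consumed more than its share gets a negative priority and falls behind groups that are under their allocation.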
The Hierarchy of Resource Managers
Global RM: Sites connected by WAN
Site RM: Stations and MSS's connected by LANs
Station (Local RM): batch queues and disks
Driven at every level by: experiment policies, fair share allocations, cost metrics
Job Control: Station Integration with the Abstract Batch System
[Diagram: Client → Local RM (Station Master) → Batch System → Process Manager (SAM wrapper script) → User Task, coordinated with the Job Manager (Project Master); messages: sam submit, submit, dispatch, invoke, sam condition satisfied, resubmit, setJobCount/stop, invoke, jobEnd]
SAM as a Data Grid
Mass Storage Systems
Metadata
Resource Management
Data Replication
Replica Selection
Comprehensive Resource Management
Replication Cost Estimation
(External to SAM)
Core Services
High Level Services
Preferred locations
Caching, Forwarding, Pinning
DH-Batch system integration, Fair Share Allocation, MSS access control, Network access control
Cached data, File transfer queues, Site RM weather conditions
Replica catalog, System configuration, Cost/Benefit metrics
Batch System internal RM, MSS internal RM (External to SAM)
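The Replica Selection and Replication Cost Estimation services in the diagram might be sketched like this; the cost model, penalties, and locations below are all assumptions for illustration:

```python
# Replica selection by replication cost estimation: pick the replica with
# the cheapest estimated transfer. All numbers, location names, and the
# cost model itself are invented for illustration.
replicas = [
    # (location, cached_on_disk, bandwidth_mb_s, queue_length)
    ("fnal-station", True, 100.0, 2),
    ("fnal-enstore", False, 30.0, 10),   # tape copy: add staging latency
    ("lyon-station", True, 5.0, 0),      # remote replica over the WAN
]

TAPE_MOUNT_S = 120.0    # assumed tape mount/staging latency
QUEUE_PENALTY_S = 10.0  # assumed wait per queued transfer

def transfer_cost(file_mb, loc):
    """Estimated seconds to deliver the file from this location."""
    name, cached, bw, queue = loc
    cost = file_mb / bw              # raw transfer time estimate
    cost += queue * QUEUE_PENALTY_S  # "weather conditions" at the site
    if not cached:
        cost += TAPE_MOUNT_S         # staging from the MSS
    return cost

def select_replica(file_mb):
    return min(replicas, key=lambda loc: transfer_cost(file_mb, loc))
```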
SAM Grid Work (D0-PPDG)
Enhance the system by adding Grid services (Grid authentication, replica selection, etc.)
Adapt the system to generic Grid services
Replace proprietary tools and internal protocols with those standard to the Grid
Collaborate with Computer Scientists to develop new Grid technologies, using SAM as a testbed for testing/validating them
Initial PPDG Work: Condor/SAM Job Scheduling, Preliminary Architecture
[Diagram: Job Management side - Condor-G, Condor MMS, and a Condor/SAM-Grid adapter over the Abstract Batch System (Condor); Data Management side - SAM with a SAM/Condor-Grid adapter; the two sides exchange "sam submit", data and DH resources, job schedules, and costs of job placements over standard Grid protocols]
Conclusions
D0 SAM is not only a production meta-computing system but also a functioning Data Grid prototype, with data replication and resource management at an advanced/mature stage
Work continues to fully Grid-enable the system
Some of our components/services will, we hope, be of interest to the Grid community