P. Kunszt, LCG, 13.3.2002
Data Management on the GRID
Peter Z. Kunszt, CERN Database Group
EU DataGrid – Data Management
Personal Information
• PhD in theoretical physics (Lattice QCD) at U of Bern
• ‘Builder of the SDSS Project’ – design and implementation work on the SDSS science archive SX, on both Objectivity and MS SQL Server
• CERN Database Group
• Activity task leader for Grid Data Management
• Management of WP2 (Data Management) of the EDG Project
Scope of Data Management
• Data Transfer
– Transport protocols
• Data Access
– Remote I/O
– Security / Policies
• Data Storage
– Hierarchical Storage
– Mass Storage
• Replication
– Peer-to-Peer
– Centralized
– Distributed
– Automatic
• Metadata management
– Scalable
– Distributed
– Consistent
• Persistency
– Grid-enabled databases and data stores
– Independent of back-end implementation
• Optimisation
– Data Access optimisation
– Cost minimisation
Vision of Grid Data Management
• Distributed Shared Data Storage
• Ubiquitous Data Access
• Transparent Data Transfer and Migration
• Consistency and Robustness
• Optimisation
Vision of Grid Data Management
Distributed Shared Data Storage
– Different architectures
– Heterogeneous data stores
– Self-describing data and metadata
Vision of Grid Data Management
Ubiquitous Data Access
– Global Namespace
– Transparent security control and enforcement
– Access anytime, anywhere; physical data location irrelevant
– Automatic Data Replication and Validation
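The global-namespace idea can be sketched as a mapping from logical file names (LFNs) to physical replicas, so applications address data by name only and never care where it physically lives. This is a toy model, not the actual EDG catalogue; all file and site names are illustrative:

```python
# Toy replica catalogue: one logical file name (LFN) maps to many
# physical file names (PFNs) at different sites.
class ReplicaCatalog:
    def __init__(self):
        self._replicas = {}  # LFN -> list of PFNs

    def register(self, lfn, pfn):
        """Record a new physical copy of a logical file."""
        self._replicas.setdefault(lfn, []).append(pfn)

    def lookup(self, lfn):
        """Return every known physical location of a logical file."""
        return list(self._replicas.get(lfn, []))

catalog = ReplicaCatalog()
catalog.register("lfn:/cms/run42/events.dat", "gsiftp://cern.ch/store/events.dat")
catalog.register("lfn:/cms/run42/events.dat", "gsiftp://fnal.gov/store/events.dat")
```

A client asks the catalogue for `lookup("lfn:/cms/run42/events.dat")` and picks any of the returned replicas; the physical location stays an implementation detail.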
Vision of Grid Data Management
Transparent Data Transfer and Migration
– Protocol negotiation and multiple protocol support
– Management of data formats and database versions
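Protocol negotiation can be reduced to a small sketch: each endpoint advertises the transfer protocols it supports, and the first common entry in the client's preference order wins. The protocol names below are only illustrative:

```python
# Toy protocol negotiation between two transfer endpoints.
def negotiate(client_prefs, server_protocols):
    """Return the first client-preferred protocol the server also
    supports, or None if there is no common protocol."""
    for proto in client_prefs:
        if proto in server_protocols:
            return proto
    return None

chosen = negotiate(["gsiftp", "https", "ftp"], {"ftp", "gsiftp"})
```

Here `chosen` is `"gsiftp"`: it comes first in the client's list and both sides support it.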
Vision of Grid Data Management
Consistency and Robustness
– Replicated data is reasonably up-to-date
– Reliable data transfer
– Self-detecting and self-correcting mechanisms upon data corruption
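Self-detecting corruption is usually implemented with checksums: the catalogue records a digest when the master copy is published, and any replica whose recomputed digest differs is flagged for re-transfer. A minimal sketch (MD5 chosen only for brevity):

```python
import hashlib

def checksum(data: bytes) -> str:
    """Digest recorded at publication time."""
    return hashlib.md5(data).hexdigest()

def verify_replica(replica_data: bytes, recorded_checksum: str) -> bool:
    """True if the replica still matches the published master copy."""
    return checksum(replica_data) == recorded_checksum

master = b"event data"
recorded = checksum(master)  # stored alongside the catalogue entry
```

A replica manager would run `verify_replica` periodically and schedule a fresh copy whenever it returns `False`.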
Vision of Grid Data Management
Optimisation
– Customisation or self-adaptation to specific access patterns
– Distributed Querying, Data Analysis and Data Mining
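One concrete instance of access optimisation is replica selection as cost minimisation: pick the replica with the lowest estimated access cost. The sketch below uses only transfer time (size over bandwidth) as the cost; a real estimator would also weigh latency, load and storage cost, and the site names and numbers are invented:

```python
# Replica selection by minimising a simple access-cost estimate.
def access_cost(size_gb, bandwidth_gbps):
    """Seconds to move size_gb gigabytes over a bandwidth_gbps link."""
    return size_gb * 8 / bandwidth_gbps  # GB -> Gbit, then divide by Gbit/s

def select_replica(size_gb, replicas):
    """replicas: dict of site -> available bandwidth in Gbit/s.
    Returns the site with the cheapest estimated access."""
    return min(replicas, key=lambda site: access_cost(size_gb, replicas[site]))

best = select_replica(10, {"cern": 2.4, "fnal": 0.6, "ral": 1.0})
```

With these numbers `best` is `"cern"`, the site with the highest bandwidth and hence the lowest transfer time.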
Grid Data Management Dependencies
Performance, reliability, availability and usability all depend on the full stack underneath:
– Media and Hardware
– Operating System, Local File System, Network Software
– Protocols, Storage System
Existing Middleware for Grid Data Management – Overview
• Globus
– GridFTP
– Replica Catalog
– Replica Manager
• EU DataGrid
– GDMP
– Replica Catalog
– Replica Manager
– Spitfire
• Condor
– NeST
• PPDG
– Magda
– JASMine
– GDMP
• GriPhyN / iVDGL
– Virtual Data Toolkit
• Storage Resource Broker
• Storage Resource Manager
• ROOT
– AliEn
• Nimrod-G
(Not exhaustive)
Globus Data Management
• GridFTP
– Fast, parallel file transfer
– Towards a self-optimising system
– Work on reliable file transfer on top
• Replica Catalog – jointly with EDG WP2
– Configurable
– Distributed, hierarchical
– Scalable
• Replica Manager
• Security infrastructure
European DataGrid WP2
• GDMP – with PPDG
– In production with CMS for Objectivity replication
– Subscription-based replication
– Scalable architecture
• Replica Catalog – with Globus
• Replica Manager and Optimiser
– Take Globus RM as core
– Additional modules for pre- and post-processing of data
• Replica Selection in the WP2 Optimisation task
– Simulator to test replica selection
• Spitfire
– Unified front-end to databases
– Suitable for Grid and Application Metadata
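Subscription-based replication, as in GDMP, follows a publish/subscribe pattern: sites subscribe to a producer's catalogue, and each newly published file is announced to every subscriber. A toy sketch with invented site and file names (a real site would trigger an actual transfer on `receive`):

```python
# Publish/subscribe replication in miniature.
class Site:
    def __init__(self, name):
        self.name = name
        self.files = []

    def receive(self, filename):
        # stand-in for "schedule a transfer of this file"
        self.files.append(filename)

class Producer:
    def __init__(self):
        self.subscribers = []
        self.published = []

    def subscribe(self, site):
        self.subscribers.append(site)

    def publish(self, filename):
        """Record the file and push the announcement to all subscribers."""
        self.published.append(filename)
        for site in self.subscribers:
            site.receive(filename)

cern, lyon = Site("cern"), Site("lyon")
producer = Producer()
producer.subscribe(cern)
producer.subscribe(lyon)
producer.publish("run42.db")
```

The key property is that the producer never enumerates destinations per file: subscribing once is enough to keep receiving everything published afterwards.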
[Diagram: EDG service architecture. Local Computing: Local Application, Local Database. Grid Application Layer: Job Management, Data Management, Metadata Management, Object to File Mapping, Service Index. Collective Services: Grid Scheduler, Replica Manager, Information & Monitoring. Underlying Grid Services: Computing Element Services, Storage Element Services, Replica Catalog, Authorization Authentication and Accounting, SQL Database Services. Fabric services: Configuration Management, Node Installation & Management, Monitoring and Fault Tolerance, Resource Management, Fabric Storage Management.]
WP2 Replica Manager Architecture
[Diagram: the Replica Manager exposes a Core API with an Optimisation API layered on top, backed by a Replica Catalogue and a Metadata Catalogue.]
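One possible reading of this layering: a thin core API (copy and register) with an optimisation API on top that consults the replica catalogue and a cost estimate before choosing a source replica. All class names, PFNs and costs below are hypothetical illustrations, not the real WP2 interfaces:

```python
# Sketch of a two-layer replica manager.
class CoreAPI:
    def __init__(self, catalog):
        self.catalog = catalog  # maps LFN -> list of PFNs

    def copy_and_register(self, lfn, source_pfn, dest_pfn):
        # a real implementation would transfer the file here
        self.catalog.setdefault(lfn, []).append(dest_pfn)
        return dest_pfn

class OptimisationAPI:
    def __init__(self, core, cost_fn):
        self.core = core
        self.cost_fn = cost_fn  # estimated access cost per PFN

    def replicate_best(self, lfn, dest_pfn):
        """Copy from the cheapest known replica to dest_pfn."""
        source = min(self.core.catalog[lfn], key=self.cost_fn)
        return self.core.copy_and_register(lfn, source, dest_pfn)

costs = {"pfn://slow-site/f1": 9.0, "pfn://fast-site/f1": 1.0}
core = CoreAPI({"lfn:/data/f1": ["pfn://slow-site/f1", "pfn://fast-site/f1"]})
opt = OptimisationAPI(core, lambda pfn: costs.get(pfn, float("inf")))
opt.replicate_best("lfn:/data/f1", "pfn://tier2-local/f1")
```

The split mirrors the slide: the core layer only knows how to copy and register, while all placement intelligence lives in the optimisation layer and can be swapped out independently.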
Condor Data Management
• Condor Matchmaking
– Find optimal resource
• Condor Network Storage (NeST)
– Generic access to storage – abstract storage interface
– Virtual Protocol Layer
– User Management and Reservation
• Chirp
– Minimum set of file access requests
– Meta-management requests
• Condor Bypass
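Chirp's idea of a *minimum* set of file-access requests can be sketched as a tiny command dispatcher over an in-memory store. The verbs below are illustrative, not the actual Chirp wire protocol:

```python
# Minimal file-access request set, sketched as a command dispatcher.
class MiniFileServer:
    def __init__(self):
        self.files = {}

    def handle(self, request):
        verb, *args = request.split(" ", 2)
        if verb == "open":
            self.files.setdefault(args[0], "")
            return "ok"
        if verb == "read":
            return self.files.get(args[0], "")
        if verb == "write":
            name, data = args
            self.files[name] = data
            return "ok"
        if verb == "stat":  # a meta-management request
            return str(len(self.files.get(args[0], "")))
        return "error: unknown request"

srv = MiniFileServer()
srv.handle("write results.txt hello")
```

The point of keeping the verb set this small is that any storage back-end can implement it, which is exactly what makes an abstract storage interface portable.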
PPDG / Griphyn Data Management
• Globus, Condor, SRB
• GDMP – with EDG
• Magda
– To be used in ATLAS data challenges
– Metadata catalog
• JASMine – JLAB Asynchronous Storage Manager
– Storage and Resource Management
– Replica catalog based on MySQL, as a Web Service
– Replication service
– File Server
• GriPhyN Virtual Data System
[Diagram: GriPhyN virtual data architecture. An Application submits requests to a Planner, which produces a DAG for an Executor (DAGMan, Kangaroo); the Executor drives Compute Resources (GRAM) and Storage Resources (GridFTP; GRAM; SRM). Supporting services: Catalog Services (MCAT; GriPhyN catalogs), Info Services (MDS), Policy/Security (GSI, CAS), Monitoring (MDS), Replica Management (GDMP), Reliable Transfer Service (Globus). Marked components have an operational initial solution.]
SRB, SRM
• SDSC Storage Resource Broker
– Advanced resource techniques
– Replica Catalog based on Oracle; the catalog itself is replicated using Oracle’s replication mechanism
• Storage Resource Manager (LBNL)
– Interfaces to any Storage System
– Joint functional definition with EDG, PPDG, GriPhyN
Reference Technologies
• P2P technology
– Gnutella
– Napster
– Freenet
– OceanStore
– Chord
– CAN
– JXTA Search
– Mojo Nation
• Database technology
– Replication
– Distributed heterogeneous databases
– Query planning and optimization
• Storage
– UniTree
– DMF
– HPSS
– Castor, Enstore, Eurostore
– SAM
• File Systems
– AFS, Coda, InterMezzo
– NFS
– GPFS, CXFS, GFS, DFS, DAFS
– SlashGrid
Application to LCG Project
• Bridge the gap between the experiments’ immediate needs for production-quality grid middleware and the existing prototype middleware
– Evolve existing grid middleware into production-quality services
– LCG is a Deployment Grid project; nevertheless we will need to do some development
• Specialisation of existing Grid Middleware to the LHC environment – explicitly to the tiered architecture model
• Very close relations to the Application Area Physics Data Management task
[Diagram: LHC tiered computing model. The Online System (~PBytes/sec) feeds the Offline Farm and the Tier 0 CERN Computer Center (~100 MBytes/sec); Tier 1 Regional Centers (Fermilab, France, Italy, UK) connect at ~2.4 Gbits/sec; Tier 2 Centers at 0.6–2.4 Gbits/sec; Tier 3 Institutes at 100–1000 Mbits/sec, with workstations and a physics data cache.]
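As a sanity check on the link figures in the tiered model, a quick back-of-the-envelope computation of transfer time over the Tier 0 to Tier 1 link (idealised: sustained line rate, no protocol overhead):

```python
# How long does it take to move a dataset over a grid link?
def transfer_hours(terabytes, gbits_per_sec):
    """Idealised transfer time in hours for `terabytes` of data
    over a `gbits_per_sec` link (1 TB = 1e12 bytes)."""
    bits = terabytes * 1e12 * 8
    return bits / (gbits_per_sec * 1e9) / 3600

# 1 TB over the ~2.4 Gbit/s Tier0->Tier1 link: a bit under an hour.
one_tb = transfer_hours(1, 2.4)
```

The same arithmetic shows why petabyte-scale datasets must be replicated to the tiers ahead of time rather than fetched on demand: a petabyte over the same link would take over a month.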
Issues / Dangers
• Commonalities – solving the same problems again and again; potential for duplication of effort
+ Think in Virtual Organisations
+ RTAGs, like the Common Persistency Framework
• Security – I can see what you can’t see
+ EDG Security Group – see Dave Kelsey’s talk
+ SciDAC
+ Building trust relationships
• Standardisation – bringing it all together and agree, agree, agree
+ OGSA
+ GGF
• Consensus – too many cooks spoil the broth
+ Making decisions in time
+ Keeping agreements, sticking to standards
+ Avoid micromanagement