Open Science Grid.. An introduction

Transcript of Open Science Grid.. An introduction

Page 1: Open Science Grid.. An introduction

1

Open Science Grid.. An introduction

Ruth Pordes

Fermilab

Page 2: Open Science Grid.. An introduction

2

OSG Provenance

Timeline, 1999-2009:

PPDG (DOE), GriPhyN (NSF), iVDGL (NSF) -> Trillium -> Grid3 -> OSG (DOE+NSF)

Page 3: Open Science Grid.. An introduction

3

Introducing myself

At Fermilab for 25 years (plus 2 years in the “pioneer” ‘70s): started on data acquisition for High Energy Physics experiments; a “builder” of the Sloan Digital Sky Survey; led development of a common data acquisition system for 6 experiments at Fermilab (DART); coordinator of the CDF/D0 Joint Run II offline projects (with Dane); coordinator of the Particle Physics Data Grid SciDAC I collaboratory; founder of the Trillium collaboration of iVDGL, GriPhyN and PPDG, and of GLUE interoperability between the US and EU.

Now I am variously: Executive Director of the Open Science Grid, an Associate Head of the Computing Division at Fermilab, and US CMS Grid Services and Interfaces Coordinator.

Page 4: Open Science Grid.. An introduction

4

A Common Grid Infrastructure

Page 5: Open Science Grid.. An introduction

5

Overlaid by community computational environments, from single researchers to large groups, located locally to worldwide.

Page 6: Open Science Grid.. An introduction

6

Grid of Grids - from Local to Global

Community - Campus - National

Page 7: Open Science Grid.. An introduction

7

Current OSG deployment

96 Resources across production & integration infrastructures

20 Virtual Organizations + 6 operations VOs

Includes 25% non-physics.

~20,000 CPUs (per site from 30 to 4,000, shared between OSG and local use)

~6 PB Tapes

~4 PB Shared Disk

Jobs Running on OSG over 9 months

Sustaining through OSG submissions:

3,000-4,000 simultaneous jobs

~10K jobs/day

~50K CPU-hours/day (a rough consistency check follows below)

Peak of ~15K short validation jobs

Using production & research networks

Page 8: Open Science Grid.. An introduction

8

Examples of Sharing

Last week of ATLAS - maximum number of jobs by site:

Site                  Max # Jobs
ASGC_OSG              9
BU_ATLAS_Tier2        154
CIT_CMS_T2            99
FIU-PG                58
FNAL_GPFARM           17
OSG_LIGO_PSU          1
OU_OCHEP_SWT2         82
Purdue-ITaP           3
UC_ATLAS_MWT2         88
UFlorida-IHEPA        1
UFlorida-PG (CMS)     1
UMATLAS
UWMadisonCMS          594
UWMilwaukee           2
osg-gw-2.t2.ucsd.edu  2

CPU-hours: 55,000

Last week at UCSD (a CMS site) - maximum number of jobs by VO:

VO           Max # Jobs
ATLAS        2
CDF          279
CMS          559
COMPBIOGRID  10
GADU         1
LIGO         75

Average # of jobs (~300 batch slots): 253
CPU-hours: 30,000
Jobs completed: 50,000

Page 9: Open Science Grid.. An introduction

9

OSG Consortium

Contributors

Project

Page 10: Open Science Grid.. An introduction

10

OSG Project

Page 11: Open Science Grid.. An introduction

11

OSG & its goalsProject receiving ~$6/M/Year for 5 years from DOE and NSF for effort to sustain

and evolve the distributed facility, bring on board new communities and capabilities and EOT. Hardware resources contributed by OSG Consortium members.

Goals:Support data storage, distribution & computation for High Energy, Nuclear & Astro Physics

collaborations, in particular delivering to the needs of LHC and LIGO science.

Engage and benefit other Research & Science of all scales through progressively supporting their applications.

Educate & train students, administrators & educators.

Provide a petascale Distributed Facility across the US with guaranteed & opportunistic access to shared compute & storage resources.

Interface, Federate and Collaborate with Campus, Regional, other national & international Grids, in particular with EGEE & TeraGrid.

Provide an Integrated, Robust Software Stack for Facility & Applications, tested on a well-provisioned, at-scale validation facility.

Evolve the capabilities by deploying externally developed new technologies through joint projects with the development groups.

Page 12: Open Science Grid.. An introduction

12

Middleware Stack and Deployment

OSG Middleware is deployed on existing farms and storage systems.

OSG Middleware interfaces to the existing installations of OS, utilities and batch systems.

VOs have VO-scoped environments in which they deploy applications (and other files), execute code and store data (a minimal sketch follows below).

VOs are responsible for and have control over their end-to-end distributed system using the OSG infrastructure.

End-to-end s/w Stack

Deployment into Production

Integration Grid has ~15 sites
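A minimal sketch of what the VO-scoped environment can look like from inside a job, assuming the conventional OSG site variables ($OSG_APP, $OSG_DATA, $OSG_WN_TMP); the VO name, paths and application used here are hypothetical examples, not part of the original slides:

    #!/usr/bin/env python
    """Sketch of a VO job wrapper for an OSG worker node (illustrative only).

    Assumes the conventional OSG site environment variables OSG_APP, OSG_DATA
    and OSG_WN_TMP; the VO name and application layout are hypothetical.
    """
    import os
    import subprocess
    import sys

    VO = "myvo"  # hypothetical VO name

    def site_dir(var, fallback):
        # Return an OSG site directory, with a fallback if the site does not set it.
        return os.environ.get(var, fallback)

    def main():
        app_area = os.path.join(site_dir("OSG_APP", "/opt/osg/app"), VO)
        data_area = os.path.join(site_dir("OSG_DATA", "/opt/osg/data"), VO)
        scratch = site_dir("OSG_WN_TMP", "/tmp")

        executable = os.path.join(app_area, "bin", "analyze")  # pre-installed by the VO
        if not os.path.exists(executable):
            sys.exit("VO application not installed at this site: " + executable)

        # Run the VO application from site scratch space, reading and writing
        # files in the VO's shared data area.
        status = subprocess.call(
            [executable,
             "--input", os.path.join(data_area, "input.dat"),
             "--output", os.path.join(data_area, "output.dat")],
            cwd=scratch)
        sys.exit(status)

    if __name__ == "__main__":
        main()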

Page 13: Open Science Grid.. An introduction

13

OSG will support Global Data Transfer, Storage & Access at GBytes/sec, 365 days a year - e.g. CMS:

Data to / from tape at Tier-1: needs to triple in ~1 year.

Data to disk caches (data samples): 200 MB/sec, 600 MB/sec.

Tier-2 sites: data distributed to ~7 Tier-1s, CERN + Tier-2s.

OSG must enable data placement, disk usage and resource management policies for 10s of Gbit/sec data movement, 10s of petabytes of tape stores, and local shared disk caches of 100s of TBs, across 10s of sites for >10 VOs (a rate-to-volume sketch follows below).

Data distribution will depend on & integrate with advanced network infrastructures: Internet2 will provide "layer 2" connectivity between OSG university sites and peers in Europe; ESnet will provide "layer 2" connectivity between OSG DOE laboratory sites and the EU GEANT network. Both include the use of the IRNC link (NSF) from the US to Amsterdam.
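For a sense of what the sustained rates quoted above mean over a year of operation, a back-of-the-envelope conversion (editorial arithmetic, not a figure from the slides):

    # Convert sustained transfer rates into data volume per year of operation
    # (editorial back-of-the-envelope arithmetic, not data from the slides).
    SECONDS_PER_DAY = 86_400
    DAYS_PER_YEAR = 365

    def petabytes_per_year(rate_mb_per_sec):
        """PB moved in a year at a sustained rate given in MB/sec (1 PB = 1e9 MB)."""
        return rate_mb_per_sec * SECONDS_PER_DAY * DAYS_PER_YEAR / 1e9

    for rate in (200, 600, 1000):  # the MB/sec figures above, plus 1 GByte/sec
        print("%4d MB/sec sustained ~ %4.1f PB/year" % (rate, petabytes_per_year(rate)))
    # 200 MB/sec ~ 6.3 PB/year; 600 MB/sec ~ 18.9 PB/year; 1 GB/sec ~ 31.5 PB/year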

Page 14: Open Science Grid.. An introduction

14

Security Infrastructure

• Identity: X509 certificates. Authentication and authorization using VOMS extended attribute certificates.

• Security process modelled on NIST procedural controls - management, operational, technical - starting from an inventory of the OSG assets.

• User and VO management: a VO registers with the Operations Center; a user registers through VOMRS or a VO administrator; a site registers with the Operations Center; each VO centrally defines and assigns roles; each site provides role-to-access mappings based on VO/VO group, and can reject individuals (sketched below).

• Heterogeneous identity management systems – OSG vs. TeraGrid/EGEE, grid vs. local, compute vs. storage, head-node vs. …, old-version vs. new-version. Issues include: cross-domain rights management; rights/identity management of software modules and resources; error/rejection propagation; solutions/approaches that work end-to-end.
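To make the role-to-access mapping above concrete, a toy sketch (an editorial illustration, not OSG's actual authorization service) of mapping a certificate DN plus VOMS FQAN to a local account, including a per-site ban list for rejecting individuals; the VO names, FQANs, DNs and accounts are hypothetical:

    # Toy illustration of a site's role-to-account mapping (not OSG's actual
    # authorization code): map a user's certificate DN plus VOMS FQAN
    # ("/<vo>/<group>/Role=<role>") to a local account, and let the site
    # reject named individuals. All names below are hypothetical.

    FQAN_MAP = {                     # site policy: FQAN prefix -> local account
        "/cms/Role=production": "cmsprod",
        "/cms":                 "cmsuser",
        "/ligo":                "ligo",
    }

    BANNED_DNS = {                   # per-site ban list of individual DNs
        "/DC=org/DC=example/OU=People/CN=Banned User",
    }

    def map_to_account(dn, fqan):
        """Return the local account for (DN, FQAN), or None if access is denied."""
        if dn in BANNED_DNS:
            return None              # site rejects this individual regardless of role
        # Longest-prefix match so a specific role beats the generic VO mapping.
        for prefix in sorted(FQAN_MAP, key=len, reverse=True):
            if fqan.startswith(prefix):
                return FQAN_MAP[prefix]
        return None                  # VO/group/role not supported at this site

    print(map_to_account("/DC=org/DC=example/OU=People/CN=Some User",
                         "/cms/Role=production"))   # -> cmsprod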

Page 15: Open Science Grid.. An introduction

15

Education, Outreach, Training

Training workshops for administrators and application developers, e.g. the Grid Summer Workshop (in its 4th year).

Outreach - e.g. Science Grid This Week

-> International Science Grid This Week

Education through e-Labs

Page 16: Open Science Grid.. An introduction

16

OSG Initial Timeline & Milestones - Summary

Timeline chart covering 2006-2011: project start, end of Phase I, end of Phase II.

LHC: simulations; support 1,000 users; 20 PB data archive; contribute to the Worldwide LHC Computing Grid; LHC event data distribution and analysis.

LIGO: contribute to LIGO workflow and data analysis; LIGO data run SC5; Advanced LIGO - LIGO Data Grid dependent on OSG.

STAR, CDF, D0, Astrophysics: CDF simulation; CDF simulation and analysis; D0 reprocessing; D0 simulations; STAR data distribution and jobs at 10K jobs per day.

Additional science communities: +1 community added repeatedly across the timeline.

Facility security: risk assessment, audits, incident response, management, operations and technical controls; security plan V1 and a 1st audit & risk assessment, then an audit & risk assessment each year.

VDT and OSG software releases: major release every 6 months, minor updates as needed; VDT 1.4.0, VDT 1.4.1, VDT 1.4.2, …; VDT incremental updates; OSG 0.6.0, OSG 0.8.0, OSG 1.0, OSG 2.0, OSG 3.0, …

Facility operations and metrics: increase robustness and scale; operational metrics defined and validated each year.

Interoperate and federate with campus and regional grids: common s/w distribution with TeraGrid; EGEE using VDT 1.4.X; transparent data and job movement with TeraGrid; transparent data management with EGEE; federated monitoring and information services.

Extended capabilities and increased scalability & performance for jobs and data to meet stakeholder needs: dCache with role-based authorization; accounting; auditing; VDS with SRM; SRM/dCache extensions; "just in time" workload management; VO services infrastructure; improved workflow and resource selection; data analysis (batch and interactive) workflow; integrated network management; work with SciDAC-2 CEDS and Security with Open Science.