Experiment Requirements for Global Infostructure, Irwin Gaines, FNAL/DOE


Transcript of Experiment Requirements for Global Infostructure Irwin Gaines FNAL/DOE.

Experiment Requirements for Global Infostructure

Irwin Gaines, FNAL/DOE

7-Feb-03, 3rd NSF/DOE meeting on partnerships for global infostructure

Outline

• Recall partnership principles
• LHC computing model
• CMS and ATLAS grid prototyping
• Categories of work packages
• Contributors


Agreement on 5 principles:
1. The cost and complexity of 21st Century Science requires the creation of an advanced and coherent global Infostructure (information infrastructure).
2. The construction of a coherent Global Infostructure for Science requires definition and drivers from Global Applications (that will also communicate with each other).
3. Further, forefront Information Technology must be incorporated into this Global Infostructure for the Applications to reach their full potential for changing the way science is done.
4. LHC is a near-term Global Application requiring advanced and un-invented Infostructure, and it is ahead in planning compared to many others.
5. U.S. agencies must work together for effective U.S. participation in Global-scale infostructure and the successful execution of the LHC program in a four-way agency partnership, with international cooperation in view.


LHC as exemplar of global science

• Project already involves scientists (and funding agencies) from all over the world
• High-visibility science
• Experiments already making good use of prototype grids
• Sociological (as well as technical) reasons for decentralized computing systems
• Recognized challenge of accumulating sufficient resources


LHC Global Science

LHC is the most exciting, challenging, and relevant science.

• Challenges: scientific, technical, cultural, managerial
• Collaboration
  – open and fair access to, and sharing of, data, tools, and ideas
  – unique opportunities for discovery for small and remote groups
• Data and Information
  – vast data, beyond the technical capabilities of any single organization
  – revolutionary new applications of new information technology tools
• Globalization
  – building truly global (science) communities
  – acquiring data centrally, analyzing data globally, like a large corporation
• Opportunities to advance Information Technology
• Relevant to Science at Large


Centres taking part in the LCG-1

around the world, around the clock


LHC Computing Model

Distributed model from the start (distributed resources + coherent global access to data). Must support:

• Production (reconstruction, simulation)
  – Scheduled, predictable, batch
  – Run by the experiment or a physics group
  – Highly compute-intensive; accesses predictable data sets
• Data Analysis (including calibration and monitoring)
  – Random, chaotic, often interactive
  – Run by individuals and small groups
  – Mostly data-intensive; accesses random data
  – Highly collaborative
• Code development and testing
  – Highly interactive
  – Highly collaborative
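The three workload classes above call for very different resources (scheduled batch farms vs. interactive, data-intensive pools). A minimal sketch of routing jobs by workload class, purely illustrative (the `Job` type, kind names, and queue labels are invented here, not real LCG interfaces):

```python
# Toy sketch: route the three LHC workload classes to queue classes.
# All names are illustrative assumptions, not actual grid middleware.

from dataclasses import dataclass

@dataclass
class Job:
    kind: str   # "production", "analysis", or "development"
    owner: str  # experiment, group, or individual submitting the job

def route(job: Job) -> str:
    """Pick a queue class per the computing-model workload taxonomy."""
    if job.kind == "production":
        # scheduled, predictable batch work run by the experiment
        return "batch"
    if job.kind == "analysis":
        # random, chaotic, data-intensive, often interactive
        return "interactive-data"
    if job.kind == "development":
        # highly interactive code build/test cycles
        return "interactive-dev"
    raise ValueError(f"unknown job kind: {job.kind}")
```

The point of the sketch is only that the scheduler must distinguish these classes; a real system would also weigh data locality and priorities.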


LHC Computing Facilities Model


Adoption of Grids by LHC Experiments

• Already some major successes: CMS and ATLAS production running
• Good collaborations with computer scientists: iVDGL, GriPhyN, PPDG, EDG…
• LHC Computing Grid Project (LCG)

We now have a scientific community that understands the components and value of grid computing.


ATLAS Grid Testbed Sites

US-ATLAS testbed launched February 2001:

• Lawrence Berkeley National Laboratory
• Brookhaven National Laboratory
• Indiana University
• Boston University
• Argonne National Laboratory
• University of Michigan
• University of Texas at Arlington
• Oklahoma University


US-CMS Development Grid Testbed

• Fermilab: 1+5 PIII dual 0.700 GHz processor machines
• Caltech: 1+3 AMD dual 1.6 GHz processor machines
• San Diego: 1+3 PIV single 1.7 GHz processor machines
• Florida: 1+5 PIII dual 1 GHz processor machines
• Rice: 1+? machines
• Wisconsin: 5 PIII single 1 GHz processor machines

Total: ~40 1 GHz dedicated processors


US-CMS Integration Grid Testbed

• Fermilab (Tier 1): 40 dual 0.750 GHz processor machines
• Caltech (Tier 2): 20 dual 0.800 GHz + 20 dual 2.4 GHz processor machines
• San Diego (Tier 2): 20 dual 0.800 GHz + 20 dual 2.4 GHz processor machines
• Florida (Tier 2): 40 dual 1 GHz processor machines
• CERN (LCG Tier 0 site): 36 dual 2.4 GHz processor machines

Total: 240 × 0.85 GHz processors (Red Hat 6); 152 × 2.4 GHz processors (Red Hat 7)


“Work packages” for LHC computing:

• Hardware infrastructure
• Distributed computing infrastructure
• Grid services
• Experiment software
• Collaboration tools
• Support services


Hardware Infrastructure

• Tier 0 at CERN: compute elements, storage elements, mass storage
• Tier 1 national/regional centers
• Tier 2 regional centers
• Local computing resources


Distributed computing infrastructure

• Networking: intercontinental, regional wide area, local “end user” connections
• Servers for distributed computing: metadata servers, resource brokers, monitoring centers


Grid Services

• Low-level middleware (the casual user doesn’t see this layer)
• Application-specific middleware (services built on top of low-level middleware, with flexible user interfaces and higher-level functionality)
• Modeling and monitoring
• Troubleshooting and fault tolerance
• Distributed Data Analysis Environment

Grid hardware for:
• Research and development of tools
• Deployment and integration
• Production


CMS Approach to R&D, Integration, Deployment

• Prototyping
• Early rollout
• Strong QC/QA and documentation
• Tracking of external “practices”


Experiment Software

• Core software
• Detector-specific applications
• Physics analysis support
• Analysis group support

Some of this software suitable for common development


Collaboration tools

• Phone conferencing
• Video conferencing
• Remote informal interaction (the virtual coffee break)
• Document sharing
• Collaborative software development
• Collaborative data analysis
• Telepresence
• Remote control of experiments


Support Services

• Training and documentation
• Information services
• User support (help desk): 24x7


Grid Middleware I

1. User Management
• 1.1 Registration of users as members of a virtual organization (VO), including subgroup credentials within the VO
• 1.2 Authentication of users
• 1.3 Authorization of users for particular tasks
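The registration/authentication/authorization chain above can be sketched as a toy in-memory VO service. This is an illustrative assumption only (class name, policy shape, and the membership check standing in for certificate-based authentication are all invented; real grids used GSI certificates and VOMS-style services):

```python
# Toy sketch of VO-based user management (items 1.1-1.3).
# Names and data shapes are hypothetical, not a real grid API.

class VirtualOrganization:
    def __init__(self, name):
        self.name = name
        self.members = {}  # user -> set of subgroup credentials

    def register(self, user, subgroups=()):
        """1.1: register a user, recording subgroup credentials within the VO."""
        self.members[user] = set(subgroups)

    def authenticate(self, user):
        """1.2: stand-in for a certificate check; here, just membership."""
        return user in self.members

    def authorize(self, user, task, policy):
        """1.3: allow a task if the user holds a subgroup the policy accepts."""
        if not self.authenticate(user):
            return False
        return bool(self.members[user] & policy.get(task, set()))
```

For example, a user registered with only an analysis-group credential would be authorized for analysis tasks but not for production reconstruction.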


Grid Middleware II

2. Resource Management
• 2.1 Resource declaration (making resources available to the Grid)
• 2.2 Resource discovery
• 2.3 Resource assignment tools (e.g., these CPUs are available only for experiment A, only for physicists from country B, only for physics group C, etc.)
• 2.4 Prioritization tools
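Items 2.1-2.3 amount to a registry plus constrained matchmaking. A minimal sketch under invented assumptions (the in-memory list standing in for a grid information index, and the experiment/country/group constraint keys taken from the slide's own example):

```python
# Toy sketch of resource declaration (2.1), discovery (2.2), and
# assignment constraints (2.3). Illustrative only, not real middleware.

resources = []  # stand-in for a grid information index

def declare(name, cpus, allow):
    """2.1: make a resource available, with optional usage constraints.
    `allow` may restrict by "experiment", "country", or "group";
    a missing key means no restriction on that attribute."""
    resources.append({"name": name, "cpus": cpus, "allow": allow})

def discover(experiment, country, group):
    """2.2 + 2.3: return resources whose constraints admit this requester."""
    def ok(r):
        a = r["allow"]
        return (a.get("experiment") in (None, experiment)
                and a.get("country") in (None, country)
                and a.get("group") in (None, group))
    return [r["name"] for r in resources if ok(r)]
```

Prioritization tools (2.4) would then rank the discovered matches rather than merely filter them.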


Grid Middleware III

3. Job Management
• 3.1 Job submission
• 3.2 Job monitoring

4. Data Management
• 4.1 Data replication
• 4.2 Data access
• 4.3 Data set management
• 4.4 Data movement / job movement / data re-creation decisions
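Item 4.4 is at heart a cost comparison: move the data to the job, move the job to the data, or regenerate the data in place. A toy sketch, with entirely assumed cost models (the time formulas and parameters below are illustrative, not from the talk):

```python
# Hypothetical cost comparison for item 4.4. All cost models are toy
# assumptions: transfer time = size / bandwidth; re-creation time =
# CPU-hours needed / CPU-hours deliverable per hour at the remote site.

def placement_decision(data_gb, wan_gbps, job_input_gb,
                       recreate_cpu_hours, remote_cpu_hours_per_hour):
    """Return the cheapest option, measured in wall-clock hours."""
    move_data = data_gb * 8 / (wan_gbps * 3600)      # ship the dataset
    move_job = job_input_gb * 8 / (wan_gbps * 3600)  # ship only the job sandbox
    recreate = recreate_cpu_hours / remote_cpu_hours_per_hour
    costs = {"move-data": move_data,
             "move-job": move_job,
             "re-create": recreate}
    return min(costs, key=costs.get)
```

Intuitively, a huge dataset with a tiny job sandbox favors moving the job; a small dataset favors moving the data; cheap simulation over a slow link favors re-creation.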


Production Grids

• Middleware support
• Error recovery, robustness, 24x7 operation
• Monitoring and system usage optimization
• Strategy and policy for resource allocation
• Authentication and authorization
• Simulation of grid operations
• Tools for optimizing distributed systems
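The error-recovery and robustness requirement for production operation can be sketched, at its simplest, as retrying transient grid faults while keeping a record for monitoring. A minimal, hypothetical sketch (the function, the use of `RuntimeError` for a recoverable fault, and the failure log shape are all invented for illustration):

```python
# Toy sketch of error recovery for production grid operations:
# retry a flaky operation and keep a failure log for monitoring.

def run_with_retries(op, max_attempts=3):
    """Call op() until it succeeds or attempts are exhausted.
    Returns (result, failures), where failures records each retry."""
    failures = []
    for attempt in range(1, max_attempts + 1):
        try:
            return op(), failures
        except RuntimeError as err:  # treated here as a recoverable fault
            failures.append((attempt, str(err)))
    raise RuntimeError(f"gave up after {max_attempts} attempts: {failures}")
```

Real production middleware layers much more on top (checkpointing, resubmission to other sites, alarms), but automatic retry with bookkeeping is the base case.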


Coordination (synergy) Matrix (Blue Ribbon Panel on Cyberinfrastructure)

Rows (National Science Foundation activities):
• Research in technologies, systems, and applications
• Development or acquisition
• Operations in support of end users

Columns:
• Applications of information technology to science and engineering research
• Cyberinfrastructure in support of applications
• Core technologies incorporated into cyberinfrastructure


Contributors

• Funding agencies: base program
• Funding agencies: LHC Research Program (LHC Software & Computing Projects)
• US funding agencies: networks and infrastructure
• CERN: Tier 0/1 facilities at CERN, networking and infrastructure, LCG Project
• Funding agencies of other collaborating countries
• DOE/NSF Computational Science Research Program


Who contributes where?

Work packages:
• Hardware infrastructure
• Distributed computing infrastructure
• Grid services
• Experiment software
• Collaboration tools
• Support services

Contributors:
• US Base Program
• US Research Program
• DOE/NSF Networking
• CERN
• Other countries
• DOE/NSF CS

(Matrix of * marks indicating which contributors support each work package.)


Proposal for further action

• Form a small working group (representatives from the experiments, both agencies, and the physics and CS sides) to flesh out work plans and “sign up” for tasks (Road Map to Global Infostructure); report back in < 1 month
• Meeting soon in Europe