The Promise of Computational Grids in the LHC Era
The Promise of Computational Grids in the LHC Era
Paul Avery, University of Florida
Gainesville, Florida, USA
[email protected]
http://www.phys.ufl.edu/~avery/
CHEP 2000, Padova, Italy
Feb. 7-11, 2000
LHC Computing Challenges
- Complexity of the LHC environment and the resulting data
- Scale: petabytes of data per year
- Geographical distribution of people and resources
Example: CMS, with 1800 physicists from 150 institutes in 32 countries
Dimensioning / Deploying IT Resources
- The LHC computing scale is "something new"
  - The solution requires directed effort and new initiatives
  - The solution must build on existing foundations
- Robust computing at national centers is essential
- Universities must have the resources to maintain intellectual strength, foster training, and engage fresh minds
- Scarce resources are, and will remain, a fact of life: plan for it
- Goal: obtain new resources and optimize the deployment of all resources to maximize effectiveness
  - CPU: CERN / national lab / region / institution / desktop
  - Data: CERN / national lab / region / institution / desktop
  - Networks: international / national / regional / local
Deployment Considerations
- Proximity of datasets to appropriate IT resources
  - Massive datasets: CERN and national labs
  - Data caches: regional centers
  - Mini-summaries: institutions
  - Micro-summaries: desktops
- Efficient use of network bandwidth: local > regional > national > international
- Utilize all intellectual resources
  - CERN, national labs, universities, remote sites
  - Scientists, students
- Leverage training and education at universities
- Follow the lead of the commercial world: distributed data and web servers
Solution: A Data Grid
- A hierarchical grid is the best deployment option
  - Hierarchy: optimal resource layout (MONARC studies)
  - Grid: unified system
- Arrangement of resources
  - Tier 0: central laboratory computing resources (CERN)
  - Tier 1: national center (Fermilab / BNL)
  - Tier 2: regional computing center (university)
  - Tier 3: university group computing resources
  - Tier 4: individual workstation / CPU
- We call this arrangement a "Data Grid" to reflect the overwhelming role that data plays in deployment
Layout of Resources
- Want a good "impedance match" between tiers (sketched numerically below)
  - Tier N-1 serves Tier N
  - Tier N is big enough to exert influence on Tier N-1
  - Tier N-1 is small enough not to duplicate Tier N
- Resources roughly balanced across tiers
[Figure: resource ratios between adjacent tiers (Tier 1 vs. Tier 0, Tier 2 vs. Tier 1); is the balance reasonable?]
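As a rough illustration of the impedance-match criterion, the sketch below checks the capacity ratio between adjacent tiers. The per-center SpecInt95 figures are taken from the "US Model Circa 2005" slide later in this talk; the acceptable ratio window (and treating a single center per tier) are assumptions made here for illustration, not numbers from the talk.

```python
# Minimal sketch of the "impedance match" check between adjacent tiers.
# Per-center capacities are the SpecInt95 figures quoted later in the talk;
# the acceptable ratio window is an illustrative assumption.

capacity_si95 = {
    "Tier 0": 350000,   # CERN (CMS/ATLAS)
    "Tier 1": 70000,    # one national center (FNAL/BNL)
    "Tier 2": 20000,    # one regional (university) center
}

def impedance_match(upper, lower, lo=0.1, hi=0.5):
    """Tier N should be big enough to influence Tier N-1,
    yet small enough not to duplicate it."""
    ratio = capacity_si95[lower] / capacity_si95[upper]
    return lo <= ratio <= hi, ratio

for upper, lower in [("Tier 0", "Tier 1"), ("Tier 1", "Tier 2")]:
    ok, ratio = impedance_match(upper, lower)
    print(f"{lower} / {upper} = {ratio:.2f} -> {'reasonable' if ok else 'check'}")
```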
Data Grid Hierarchy (Schematic)
[Diagram: Tier 0 (CERN) at the top, feeding Tier 1 centers; each Tier 1 serves several Tier 2 centers, which in turn serve Tier 3 and Tier 4 resources.]
US Model Circa 2005
[Diagram:
- CERN (CMS/ATLAS): 350k SI95, 350 TBytes disk, tape robot
- Tier 1 (FNAL/BNL): 70k SI95, 70 TBytes disk, tape robot
- Tier 2 center: 20k SI95, 25 TBytes disk, tape robot
- Tier 3: university working groups
- Wide-area links of 2.4 Gbps, N x 622 Mbits/s, and 622 Mbits/s connect the tiers]
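The per-center numbers above imply the aggregate balance across tiers once a count of centers per tier is assumed. The sketch below uses an assumed count of five Tier 2 centers per Tier 1, consistent with the five Tier 2 centers drawn in the CMS hierarchy diagram that follows; the counts themselves are not stated on this slide.

```python
# Rough aggregate capacity per tier in the US model, using the per-center
# SpecInt95 figures from this slide. The number of centers per tier is an
# assumption (the CMS hierarchy diagram shows about five Tier 2 centers).

per_center_si95 = {"CERN (Tier 0)": 350000, "Tier 1": 70000, "Tier 2": 20000}
n_centers       = {"CERN (Tier 0)": 1,      "Tier 1": 1,     "Tier 2": 5}  # assumed

for tier, si95 in per_center_si95.items():
    total = si95 * n_centers[tier]
    print(f"{tier}: {n_centers[tier]} x {si95:,} = {total:,} SI95")
```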
Data Grid Hierarchy (CMS)
[Diagram:
- Online system: one bunch crossing every 25 ns, ~100 triggers per second, each event ~1 MByte in size; ~PBytes/sec off the detector, ~100 MBytes/sec into the offline farm
- Tier 0: CERN computer center, offline farm of ~20 TIPS
- Tier 1: Fermilab (~4 TIPS) and the France, Italy, and Germany regional centers
- Tier 2: Tier 2 centers of ~1 TIPS each
- Tier 3: institutes (~0.25 TIPS) with physics data caches; physicists work on analysis "channels", each institute has ~10 physicists working on one or more channels, and the data for these channels is cached by the institute server
- Tier 4: workstations
- Link speeds shown: ~100 MBytes/sec, ~2.4 Gbits/sec, ~622 Mbits/sec, 1-10 Gbits/sec
Scale: 1 TIPS = 25,000 SpecInt95; a PC (today) = 10-20 SpecInt95]
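The rates and capacities quoted in this diagram are simple to cross-check. The sketch below works through the arithmetic; the 10^7 seconds of effective running time per year used for the annual volume is a conventional rule of thumb and an assumption here, not a figure from the slide.

```python
# Cross-check of the rates and capacities quoted in the CMS hierarchy diagram.

event_size_mb   = 1.0      # ~1 MByte per event
trigger_rate_hz = 100      # ~100 triggers per second
rate_mb_s = event_size_mb * trigger_rate_hz
print(f"Rate into the offline farm: ~{rate_mb_s:.0f} MBytes/sec")

seconds_per_year = 1e7     # assumed effective running time per year
print(f"Annual raw data: ~{rate_mb_s * seconds_per_year / 1e9:.0f} PByte")

tips_to_si95 = 25000              # 1 TIPS = 25,000 SpecInt95
pc_si95_lo, pc_si95_hi = 10, 20   # a PC today = 10-20 SpecInt95
farm_si95 = 20 * tips_to_si95     # CERN offline farm ~20 TIPS
print(f"CERN farm: {farm_si95:,} SI95, i.e. roughly "
      f"{farm_si95 // pc_si95_hi:,}-{farm_si95 // pc_si95_lo:,} of today's PCs")
```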
Why a Data Grid: Physical
- Unified system: all computing resources are part of the grid
- Efficient resource use (manage scarcity)
  - Averages out spikes in usage
  - Resource discovery / scheduling / coordination become truly possible
  - "The whole is greater than the sum of its parts"
- Optimal data distribution and proximity
  - Labs are close to the data they need
  - Users are close to the data they need
  - No data or network bottlenecks
- Scalable growth
Why a Data Grid: Political
- A central lab cannot manage or help thousands of users
  - Easier to leverage resources, maintain control, and assert priorities regionally
- Cleanly separates functionality
  - Different resource types in different tiers
  - Funding complementarity (NSF vs. DOE)
  - Targeted initiatives
- New IT resources can be added "naturally"
  - Additional matching resources at Tier 2 universities
  - Larger institutes can join, bringing their own resources
  - Taps into new resources opened up by the IT "revolution"
- Broadens the community of scientists and students
  - Training and education
  - The vitality of the field depends on the university / lab partnership
Tier 2 Regional Centers
- Possible model: CERN : national : Tier 2 = 1/3 : 1/3 : 1/3
- Complementary role to Tier 1 lab-based centers
  - Less need for 24x7 operation, hence lower component costs
  - Less production-oriented: can respond to analysis priorities
  - Flexible organization, e.g. by physics goals or subdetectors
  - Variable fraction of resources available to outside users
- Range of activities includes
  - Reconstruction, simulation, physics analyses
  - Data caches / mirrors to support analyses
  - Production in support of the parent Tier 1
  - Grid R&D ...
[Diagram: Tier 0 through Tier 4, with more organization toward Tier 0 and more flexibility toward Tier 4.]
Distribution of Tier 2 Centers
- Tier 2 centers arranged regionally in the US model
  - Good networking connections to move data (caches)
  - Location independence of users is always maintained
- Increases collaborative possibilities
  - Emphasis on training and the involvement of students
- High-quality desktop environment for remote collaboration, e.g. the next-generation VRVS system
Strawman Tier 2 Architecture
  Linux farm of 128 nodes                    $0.30 M
  Sun data server with RAID array            $0.10 M
  Tape library                               $0.04 M
  LAN switch                                 $0.06 M
  Collaborative infrastructure               $0.05 M
  Installation and infrastructure            $0.05 M
  Network connection to Abilene network      $0.14 M
  Tape media and consumables                 $0.04 M
  Staff (ops and system support)*            $0.20 M
  Total estimated cost (first year)          $0.98 M

  Cost in succeeding years, for evolution, upgrades, and ops: $0.68 M

* 1.5 - 2 FTE support required per Tier 2 center. Physicists from the institute also aid in support.
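A quick sum of the line items reproduces the quoted first-year total; the snippet below simply adds them up.

```python
# Sum of the strawman Tier 2 first-year line items (in $M) from the table above.
items_musd = {
    "Linux farm of 128 nodes":            0.30,
    "Sun data server with RAID array":    0.10,
    "Tape library":                       0.04,
    "LAN switch":                         0.06,
    "Collaborative infrastructure":       0.05,
    "Installation and infrastructure":    0.05,
    "Network connection to Abilene":      0.14,
    "Tape media and consumables":         0.04,
    "Staff (ops and system support)":     0.20,
}
print(f"Total estimated cost (first year): ${sum(items_musd.values()):.2f} M")  # $0.98 M
```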
Strawman Tier 2 Evolution
                                 2000                2005
  Linux farm                     1,500 SI95          20,000 SI95*
  Disks on CPUs                  4 TB                20 TB
  RAID array                     1 TB                20 TB
  Tape library                   1 TB                50 - 100 TB
  LAN speed                      0.1 - 1 Gbps        10 - 100 Gbps
  WAN speed                      155 - 622 Mbps      2.5 - 10 Gbps
  Collaborative infrastructure   MPEG2 VGA           Realtime HDTV
                                 (1.5 - 3 Mbps)      (10 - 20 Mbps)

RAID disk used for "higher availability" data.
* Reflects lower Tier 2 component costs due to less demanding usage, e.g. simulation.
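The implied 2000-to-2005 growth factors are roughly an order of magnitude or more per row. The sketch below computes them, collapsing each quoted range to its midpoint; using midpoints is a simplification made here for illustration, not part of the table.

```python
# Growth factors 2000 -> 2005 implied by the strawman Tier 2 evolution table.
# Each quoted range is collapsed to its midpoint (an illustrative simplification).

rows = {                              # (2000 value, 2005 value) in the units noted
    "Linux farm (SI95)":  (1500, 20000),
    "Disks on CPUs (TB)": (4, 20),
    "RAID array (TB)":    (1, 20),
    "Tape library (TB)":  (1, (50 + 100) / 2),
    "LAN speed (Gbps)":   ((0.1 + 1) / 2, (10 + 100) / 2),
    "WAN speed (Mbps)":   ((155 + 622) / 2, (2500 + 10000) / 2),
}
for name, (y2000, y2005) in rows.items():
    print(f"{name}: roughly x{y2005 / y2000:.0f}")
```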
The GriPhyN Project
- Joint project involving
  - US-CMS, US-ATLAS
  - LIGO (gravity wave experiment)
  - SDSS (Sloan Digital Sky Survey)
  - http://www.phys.ufl.edu/~avery/mre/
- Requesting funds from NSF to build the world's first production-scale grid(s)
  - Sub-implementations for each experiment
  - NSF pays for Tier 2 centers, some R&D, some networking
- Realization of a unified grid system requires research
  - Many common problems for the different implementations
  - Requires partnership with CS professionals
R&D Foundations I
- Globus (Grid middleware)
  - Grid-wide services
  - Security
- Condor (see M. Livny paper)
  - General language for service seekers / service providers
  - Resource discovery
  - Resource scheduling, coordination, (co)allocation
- GIOD (networked object databases)
- Nile (fault-tolerant distributed computing)
  - Java-based toolkit, running on CLEO
R&D Foundations II
- MONARC
  - Construct and validate architectures
  - Identify important design parameters
  - Simulate an extremely complex, dynamic system
- PPDG (Particle Physics Data Grid)
  - DOE / NGI funded for 1 year
  - Testbed systems
  - Later program of work incorporated into GriPhyN
The NSF ITR Initiative
- Information Technology Research program
  - Aimed at funding innovative research in IT
  - $90M in funds authorized
  - Maximum of $12.5M for a single proposal (5 years)
  - Requires extensive student support
- GriPhyN submitted a preproposal Dec. 30, 1999
  - Intend that ITR fund most of our Grid research program
  - Major costs are for people, especially students / postdocs
  - Minimal equipment
  - Some networking
- Full proposal due April 17, 2000
Summary of Data Grids and the LHC
- Develop an integrated distributed system while meeting LHC goals
  - ATLAS/CMS: production, data-handling oriented
  - (LIGO/SDSS: computation, "commodity component" oriented)
- Build and test the regional center hierarchy
  - Tier 2 / Tier 1 partnership
  - Commission and test software, data handling systems, and data analysis strategies
- Build and test the enabling collaborative infrastructure
  - Focal points for student-faculty interaction in each region
  - Realtime high-resolution video as part of the collaborative environment
- Involve students at universities in building the data analysis, and in the physics discoveries at the LHC