Transcript of lundmon.ppt
- 1. MONARC: results and open issues (Laura Perini, Milano)
2. Layout of the talk
- Most material is from Irwin Gaines' talk at CHEP2000
- The basic goals and structure of the project
- The Regional Centers
- - Motivation
- - Characteristics
- - Functions
- Some results from the simulations
- The need for more realistic, implementation-oriented Models: Phase-3
- Relations with GRID
- Status of the project: Phase-3 LOI presented in January; Phase-2 Final Report to be published next week; milestones and basic goals met
3. MONARC
- A joint project (LHC experiments and CERN/IT) to understand issues associated with distributed data access and analysis for the LHC
- Examine distributed data plans of current and near-future experiments
- Determine characteristics and requirements for LHC regional centers
- Understand details of the analysis process and data access needs for LHC data
- Measure critical parameters characterizing distributed architectures, especially database and network issues
- Create modeling and simulation tools
- Simulate a variety of models to understand constraints on architectures
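The simulation approach named above can be illustrated with a minimal scheduling kernel. This is only a sketch in Python (the actual MONARC tool set was far richer); it shows the core idea of playing a stream of jobs against a pool of identical processing nodes:

```python
import heapq

def simulate(jobs, n_nodes):
    """Toy discrete-event simulation: run (name, cpu_seconds) jobs
    on n_nodes identical nodes, first-come-first-served.
    Returns the makespan (time when the last job finishes)."""
    # Each heap entry is the time at which a node becomes free.
    nodes = [0.0] * n_nodes
    heapq.heapify(nodes)
    finish = 0.0
    for name, cpu_sec in jobs:
        free_at = heapq.heappop(nodes)   # earliest available node
        done = free_at + cpu_sec         # job runs to completion
        finish = max(finish, done)
        heapq.heappush(nodes, done)
    return finish

# Example: 10 reconstruction jobs of 14000 s each on 4 nodes.
print(simulate([("reco", 14000)] * 10, 4))  # → 42000.0
```

Varying the job mix and node count in a kernel like this is what lets one compare candidate computing models before committing to hardware.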
4. MONARC
- Models Of Networked Analysis At Regional Centers
- Caltech, CERN, FNAL, Heidelberg, INFN, Helsinki, KEK, Lyon, Marseilles, Munich, Orsay, Oxford, RAL, Tufts, ...
- GOALS:
- - Specify the main parameters characterizing the Models' performance: throughputs, latencies
- - Determine classes of Computing Models feasible for LHC (matched to network capacity and data handling resources)
- - Develop Baseline Models in the feasible category
- - Verify resource requirement baselines (computing, data handling, networks)
- COROLLARIES:
- - Define the Analysis Process
- - Define Regional Center Architectures
- - Provide Guidelines for the final Models
- [Diagram: CERN (n x 10^7 MIPS, m PB robot) connected at 622 Mbit/s (N x 622 Mbit/s into CERN, optional air freight) to FNAL (4 x 10^7 MIPS, 110 TB robot) and universities (n x 10^6 MIPS, m TB robot), each serving desktops]
5. Working Groups
- Architecture WG
- - Baseline architecture for regional centres, technology tracking, survey of the computing models of current HENP experiments
- Analysis Model WG
- - Evaluation of the LHC data analysis model and use cases
- Simulation WG
- - Develop a simulation tool set for performance evaluation of the computing models
- Testbed WG
- - Evaluate the performance of ODBMS and networks in the distributed environment
6. General need for distributed data access and analysis
- Potential problems of a single centralized computing center include:
- - Scale of LHC experiments: difficulty of accumulating and managing all resources at one location
- - Geographic spread of LHC experiments: providing equivalent, location-independent access to data for physicists
- - Help desk, support and consulting in the same time zone
- - Cost of LHC experiments: optimizing use of resources located worldwide
7. Motivations for Regional Centers
- A distributed computing architecture based on regional centers offers:
- - A way of utilizing the expertise and resources residing in computing centers all over the world
- - Local consulting and support
- - A way to maximize the intellectual contribution of physicists all over the world without requiring their physical presence at CERN
- - Acknowledgement of possible limitations of network bandwidth
- - Freedom for people to choose how they analyze data based on the availability or proximity of various resources such as CPU, data, or network bandwidth
8. Future Experiment Survey
- Analysis/Results
- - In the previous survey, we saw many sites contributing to Monte Carlo generation; this is now the norm
- - New experiments are trying to use the Regional Center concept
- - BaBar has Regional Centers at IN2P3 and RAL, and a smaller one in Rome
- - STAR has a Regional Center at LBL/NERSC
- - CDF and D0 offsite institutions are paying more attention as the run gets closer
9. Future Experiment Survey
- Other observations/requirements
- In the last survey, we pointed out the following requirements for RCs:
- - 24x7 support
- - A software development team
- - A diverse body of users
- - Good, clear documentation of all s/w and s/w tools
- The following are requirements for the central site (i.e. CERN):
- - A central code repository that is easy to use and easily accessible from remote sites
- - Sensitivity to remote sites in database handling, raw data handling and machine flavors
- - Good, clear documentation of all s/w and s/w tools
- The experiments in this survey achieving the most in distributed computing are following these guidelines
10.
- Tier 0: CERN
- Tier 1: National Regional Center
- Tier 2: Regional Center
- Tier 3: Institute Workgroup Server
- Tier 4: Individual Desktop
- Total: 5 levels
11. CMS Offline Farm at CERN circa 2006 (lmr for MONARC study, April 1999)
- [Diagram: processors, farm network, storage network, disks and tapes; 1400 boxes in 160 clusters and 40 sub-farms; 5400 disks in 340 arrays; 100 tape drives; LAN-SAN and LAN-WAN routers; link speeds from 0.8 Gbps (DAQ) through 12 Gbps* on the farm up to 480 Gbps* on the storage network, 250 Gbps aggregate]
- * Assumes all disk and tape traffic is on the storage network; double these numbers if all disk and tape traffic goes through the LAN-SAN router
12.
- [Diagram only]
13. Processor cluster
- Basic box: four 100 SI95 processors, standard network connection (~2 Gbps)
- 15% of systems configured as I/O servers (disk server, disk-tape mover, Objy AMS, ...) with an additional connection to the storage network
- Cluster: 9 basic boxes with a network switch
14. 15.
- [Diagram slides; the Regional Center functions shown here are listed on slide 16]
16.
- Services: Data Import, Data Export, Mass Storage & Disk Servers, Database Servers, Tapes, Network from CERN, Network from Tier 2 and simulation centers, Physics Software Development, R&D Systems and Testbeds, Info servers, Code servers, Web servers, Telepresence servers, Training, Consulting, Help Desk
- Activities: Production Reconstruction (Raw/Sim --> ESD; scheduled, predictable, experiment/physics groups), Production Analysis (ESD --> AOD, AOD --> DPD; scheduled, physics groups), Individual Analysis (AOD --> DPD and plots; chaotic, physicists), serving Desktops, Tier 2, local institutes, CERN, Tapes
- Data input rate from CERN: Raw Data (5%): 50 TB/yr; ESD Data (50%): 50 TB/yr; AOD Data (all): 10 TB/yr; Revised ESD: 20 TB/yr
- Data input from Tier 2: Revised ESD and AOD: 10 TB/yr
- Data input from simulation centers: Raw Data: 100 TB/yr
- Data output rate to CERN: AOD Data: 8 TB/yr; Recalculated ESD: 10 TB/yr; Simulation ESD data: 10 TB/yr
- Data output to Tier 2: Revised ESD and AOD: 15 TB/yr
- Data output to local institutes: ESD, AOD, DPD data: 20 TB/yr
- Total storage: Robotic Mass Storage: 300 TB; Raw Data: 50 TB = 5 x 10^7 events (5% of 1 year); Raw (Simulated) Data: 100 TB = 10^8 events; ESD (Reconstructed Data): 100 TB = 10^9 events (50% of 2 years); AOD (Physics Object) Data: 20 TB = 2 x 10^9 events (100% of 2 years); Tag Data: 2 TB (all); Calibration/Conditions database: 10 TB (only the latest version of most data types kept here); Central Disk Cache: 100 TB (per user demand)
- CPU required for AMS database servers: ?? x 10^3 SI95 power
17.
- Farms of low-cost commodity computers; limited I/O rate; modest local disk cache
- Reconstruction jobs: reprocessing of raw data: 10^8 events/year (10%); initial processing of simulated data: 10^8/year; 1000 SI95-sec/event ==> 10^4 SI95 capacity: 100 processing nodes of 100 SI95 power
- Event selection jobs: 10 physics groups x 10^8 events (10% samples) x 3 times/yr, based on ESD and latest AOD data; 50 SI95/evt ==> 5000 SI95 power
- Physics object creation jobs: 10 physics groups x 10^7 events (1% samples) x 8 times/yr, based on selected event sample ESD data; 200 SI95/event ==> 5000 SI95 power
- Derived physics data creation jobs: 10 physics groups x 10^7 events x 20 times/yr, based on selected AOD samples, generating canonical derived physics data; 50 SI95/evt ==> 3000 SI95 power; total 110 nodes of 100 SI95 power
- Derived physics data creation jobs (desktops): 200 physicists x 10^7 events x 20 times/yr, based on selected AOD and DPD samples; 20 SI95/evt ==> 30,000 SI95 power; total 300 nodes of 100 SI95 power
18. MONARC Analysis Process Example
19.
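The Tier 1 input rates quoted above imply only a modest average network load. A quick sanity check (my arithmetic, assuming roughly 3.15 x 10^7 seconds per year; peak rates would of course be much higher than the average):

```python
# Tier 1 data input rates quoted on slide 16, in TB/yr.
inputs_tb_per_yr = {
    "raw from CERN (5%)": 50,
    "ESD from CERN (50%)": 50,
    "AOD from CERN (all)": 10,
    "revised ESD from CERN": 20,
    "revised ESD/AOD from Tier 2": 10,
    "simulated raw": 100,
}
total_tb = sum(inputs_tb_per_yr.values())  # 240 TB/yr in total
seconds_per_year = 3.15e7
avg_mbit_s = total_tb * 1e12 * 8 / seconds_per_year / 1e6
print(total_tb, round(avg_mbit_s))  # 240 TB/yr ≈ 61 Mbit/s average
```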
Model and Simulation parameters
- Have a new set of parameters common to all simulating groups.
- More realistic values, but still to be discussed/agreed on the basis of Experiments information.
Parameter values, in SI95 sec/event unless noted (previous values in parentheses):
- Proc_Time_RAW: 1000 (350)
- Proc_Time_ESD: 25 (2.5)
- Proc_Time_AOD: 5 (0.5)
- Analyze_Time_TAG: 3
- Analyze_Time_AOD: 3
- Analyze_Time_ESD: 15 (3)
- Analyze_Time_RAW: 600 (350)
- Memory of jobs: 100 MB
- Proc_Time_Create_RAW: 5000 (35)
- Proc_Time_Create_ESD: 1000 (1)
- Proc_Time_Create_AOD: 25 (1)
20. Base Model used
- Basic jobs:
- - Reconstruction of 10^7 events: RAW --> ESD --> AOD --> TAG at CERN. This is the production running while the data come in from the DAQ (100 days of running, collecting a billion events per year)
- - Analysis by 5 working groups, each of 25 analyzers, on TAG only (no requests to higher-level data samples). Every analyzer submits 4 sequential jobs on 10^6 events. Each analyzer's start time is a flat random choice in a range of 3000 seconds. Each analyzer's data sample of 10^6 events is a random choice from the complete TAG database of 10^7 events
- - Transfer (FTP) of the 10^7 events of ESD, AOD and TAG from CERN to an RC
- CERN activities: reconstruction, 5 WG analysis, FTP transfer
- RC activities: 5 (uncorrelated) WG analysis, receiving the FTP transfer
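With enough processing nodes for every analyzer, the analysis activity described above has a simple completion-time bound. A sketch (my illustration, assuming one node per analyzer and the 6000 s single-job paper estimate; with contention the real simulation gives longer times):

```python
import random

random.seed(42)  # fixed seed so the run is reproducible

N_ANALYZERS = 5 * 25  # 5 working groups of 25 analyzers
JOBS_EACH = 4         # sequential jobs per analyzer
JOB_SEC = 6000        # single analysis job (paper estimate)

# Start times are a flat random choice in [0, 3000] s.
starts = [random.uniform(0, 3000) for _ in range(N_ANALYZERS)]

# With at least one node per analyzer, each analyzer's four jobs
# run back to back with no queueing, so the makespan is just the
# latest start time plus one serial chain of four jobs.
makespan = max(starts) + JOBS_EACH * JOB_SEC
print(makespan)  # between 24000 and 27000 s (roughly 7 hours)
```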
- Jobs paper estimate:
- - Single analysis job: 1.67 CPU hours at CERN = 6000 sec at CERN (same at an RC)
- - Reconstruction at CERN of 1/500 of RAW to ESD: 3.89 CPU hours = 14000 sec
- - Reconstruction at CERN of 1/500 of ESD to AOD: 0.03 CPU hours = 100 sec
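These paper estimates follow directly from the per-event costs. A check (my arithmetic, assuming 500 SI95 processing nodes and, for the two reconstruction numbers, the older per-event costs shown in parentheses on the parameter slide):

```python
NODE_SI95 = 500  # assumed processing-node power

def job_seconds(events, si95_sec_per_event, node_si95=NODE_SI95):
    """Wall-clock seconds for one job run on a single node."""
    return events * si95_sec_per_event / node_si95

# Single analysis job: 10^6 TAG events at 3 SI95*sec/event.
print(job_seconds(1e6, 3))         # → 6000.0 s  (= 1.67 CPU hours)

# RAW->ESD on a 1/500 sample of 10^7 events at 350 SI95*sec/event.
print(job_seconds(1e7 / 500, 350))  # → 14000.0 s (= 3.89 CPU hours)

# ESD->AOD on the same sample at 2.5 SI95*sec/event.
print(job_seconds(1e7 / 500, 2.5))  # → 100.0 s   (= 0.03 CPU hours)
```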
21. Resources: LAN speeds?!
- In our Models the DB servers are uncorrelated, so one activity uses a single server. The bottlenecks are the read and write speeds to and from the server. In order to use the CPU power at a reasonable percentage, we need a read speed of at least 300 MB/s and a write speed of 100 MB/s (a milestone already met today)
- We use 100 MB/s in the current simulations (10 Gbit/s switched LANs in 2005 may be possible)
- The processing-node link speed is negligible in our simulations
- Of course the real implementation of the farms can be different, but the results of the simulation do not depend on the real implementation: they are based on usable resources
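The read-speed bottleneck can be quantified directly: when the farm needs 300 MB/s from its database server to keep the CPUs busy but the server delivers only 100 MB/s, utilization is capped near one third. A sketch with the illustrative numbers from this slide:

```python
def cpu_utilization(server_read_mb_s, required_read_mb_s):
    """Fraction of the farm's CPU power usable when the DB
    server's read bandwidth, not the CPUs, is the bottleneck."""
    return min(1.0, server_read_mb_s / required_read_mb_s)

# A 100 MB/s server against a 300 MB/s demand: ~33% utilization.
print(round(cpu_utilization(100, 300), 2))  # → 0.33
# At the 300 MB/s milestone the farm can run flat out.
print(cpu_utilization(300, 300))            # → 1.0
```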
- See following slides
22. More realistic values for CERN and RC
- Data link speeds at 100 MB/s (all values) except:
- - Node_Link_Speed at 10 MB/s
- - WAN link speeds at 40 MB/s
- CERN
- - 1000 processing nodes, each of 500 SI95
- RC
- - 200 processing nodes, each of 500 SI95
- CERN: 1000 processing nodes x 500 SI95 = 500 kSI95, about the CPU power of CERN Tier 0; disk space as for the number of DBs
- RC: 100 kSI95 processing power = 20% of CERN; disk space as for the number of DBs
23. Overall Conclusions
- MONARC simulation tools are:
- - sophisticated enough to allow modeling of complex distributed analysis scenarios
- - simple enough to be used by non-experts
- Initial modeling runs are already showing interesting results
- Future work will help identify bottlenecks and understand constraints on architectures
24. MONARC Phase 3
- More realistic Computing Model development
- Confrontation of Models with realistic prototypes
- At every stage: assess use cases based on actual simulation, reconstruction and physics analyses
- - Participate in the setup of the prototypes
- - We will further validate and develop the MONARC simulation system using the results of these use cases (positive feedback)
- Continue to review key inputs to the Model:
- - CPU times at various phases
- - Data rate to storage
- - Tape storage: speed and I/O
- Employ MONARC simulation and testbeds to study CM variations, and suggest strategy improvements
25. MONARC Phase 3
- Technology studies
- - Data Model: data structures; reclustering, restructuring and transport operations; replication; caching, migration (HMSM), etc.
- - Network: QoS mechanisms, identifying which are important
- - Distributed system resource management and query estimators (queue management and load balancing)
- Development of MONARC simulation visualization tools for interactive Computing Model analysis
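One of the resource-management strategies listed above, load balancing, can be sketched as a least-loaded dispatcher. This is a hypothetical illustration of the idea, not a MONARC component:

```python
import heapq

def dispatch(job_costs, n_nodes):
    """Assign each job to the currently least-loaded node.
    Returns the total load accumulated on each node."""
    loads = [(0.0, i) for i in range(n_nodes)]  # (load, node id)
    heapq.heapify(loads)
    totals = [0.0] * n_nodes
    for cost in job_costs:
        load, i = heapq.heappop(loads)     # least-loaded node
        totals[i] = load + cost
        heapq.heappush(loads, (load + cost, i))
    return totals

# Eight jobs of mixed size on three nodes end up roughly balanced.
print(dispatch([5, 3, 8, 2, 7, 4, 6, 1], 3))  # → [12.0, 10.0, 14.0]
```

A query estimator would refine this by predicting each job's cost before dispatch instead of taking it as given.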
26. Relation to GRID
- The GRID project is great!
- - Development of the s/w tools needed for implementing realistic LHC Computing Models (farm management, WAN resource and data management, etc.)
- - Help in getting funds for real-life testbed systems (RC prototypes)
- Complementarity of the GRID and MONARC hierarchical RC Models
- - A hierarchy of RCs is a safe option. If GRID brings big advancements, less hierarchical models should also become possible
- Timings well matched
- - MONARC Phase 3 is to last ~1 year: a bridge to the GRID project starting early in 2001
- - Afterwards, common work by the LHC experiments to develop the computing models will surely still be needed; in which project framework, and for how long, we will see then...