The High Energy Physics Community Grid Project
Inside D-Grid
ACAT 07
Torsten Harenberg - University of Wuppertal
2/27
D-Grid organisational structure
3/27
Technical infrastructure
[Architecture diagram: users and communities access the D-Grid resources through a user API. The D-Grid service layer provides core services, distributed data services, security and VO management, scheduling and workflow management, accounting and billing, data management, I/O (GAT API) and a GridSphere-based portal, built on the UNICORE, Globus Toolkit V4 and LCG/gLite middleware stacks with monitoring. Underneath sit the distributed computing resources, data/software and the network.]
4/27
HEP Grid efforts since 2001
[Timeline 2000-2010: EDG, followed by EGEE, EGEE 2 and a possible EGEE 3; LCG R&D followed by the WLCG ramp-up; GridKa / GGUS; the D-Grid Initiative with DGI, DGI 2 and HEP CG, open follow-ups marked "???"; the LHC pp run (Mar-Sep) and heavy-ion run (Oct) in 2008; a marker for "today".]
5/27
LHC Groups in Germany
ALICE: Darmstadt, Frankfurt, Heidelberg, Münster
ATLAS: Berlin, Bonn, Dortmund, Dresden, Freiburg, Gießen, Heidelberg, Mainz, Mannheim, München, Siegen, Wuppertal
CMS: Aachen, Hamburg, Karlsruhe
LHCb: Heidelberg, Dortmund
6/27
German HEP institutes participating in WLCG
WLCG: Karlsruhe (GridKa & Uni), DESY, GSI, München, Aachen, Wuppertal, Münster, Dortmund, Freiburg
7/27
HEP CG participants:
Participants: Uni Dortmund, TU Dresden, LMU München, Uni Siegen, Uni Wuppertal, DESY (Hamburg & Zeuthen), GSI
Associated partners: Uni Mainz, HU Berlin, MPI f. Physik München, LRZ München, Uni Karlsruhe, MPI Heidelberg, RZ Garching, John von Neumann Institut für Computing, FZ Karlsruhe, Uni Freiburg, Konrad-Zuse-Zentrum Berlin
8/27
HEP Community Grid
WP 1: Data management (dCache)
WP 2: Job Monitoring and user support
WP 3: distributed data analysis (ganga)
==> Joint venture between physics and computer science
9/27
WP 1: Data management, coordination: Patrick Fuhrmann
An extensible metadata catalogue for semantic data access:
Central service for lattice gauge theory
DESY, Humboldt Uni, NIC, ZIB
A scalable storage element:
Using dCache on multi-scale installations.
DESY, Uni Dortmund E5, FZK, Uni Freiburg
Optimized job scheduling in data intensive applications:
Data and CPU Co-scheduling
Uni Dortmund CEI & E5
10/27
WP 1: Highlights
Establishing a metadata catalogue for lattice gauge theory
Production service of a metadata catalogue with > 80,000 documents.
Tools to be used in conjunction with the LCG data grid
Well established in international collaboration
http://www-zeuthen.desy.de/latfor/ldg/
Advancements in data management with new functionality
dCache could become a quasi-standard in WLCG
Good documentation and an automatic installation procedure provide usability from small Tier-3 installations up to Tier-1 sites.
High throughput for large data streams, optimization based on quality and load of the disk storage systems, and high-performance access to tape systems
11/27
dCache-based scalable storage element
dCache project well established
New since HEP CG:
Professional product management, i.e. code versioning, packaging, user support and test suites.
Small installation: single host, ~10 TB, zero maintenance
Large installation: thousands of pools, >> 1 PB disk storage, >> 100 file transfers/sec, < 2 FTEs
12/27
dCache: principle
[Diagram: client access to dCache goes through protocol engines for streaming data ((gsi)FTP, http(g)) and POSIX-like I/O (dCap, xRootd), plus storage control interfaces (SRM, EIS, information protocol). The dCache controller manages the disk storage and connects to backend tape storage through an HSM adapter.]
13/27
dCache: connection to the Grid world
[Diagram: inside the site, the storage element serves the compute element via dCap/rfio/root and publishes to the information system. Across the firewall it is reached from outside via gsiFTP and SRM (Storage Resource Manager protocol), with wide-area transfers organized in FTS (File Transfer Service) channels; a transfer sketch follows below.]
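As a hedged illustration (not from the slides) of how such a storage element is reached from outside the site, the sketch below drives the SRM and GridFTP command-line clients in their basic source/destination form; host names, ports and paths are hypothetical placeholders and a valid Grid proxy is assumed.

```python
# Hedged sketch: copy one file out of a dCache storage element from outside
# the site. Hosts, ports and paths are hypothetical; a Grid proxy is assumed.
import subprocess

SRM_URL = "srm://dcache-se.example.de:8443/pnfs/example.de/data/user/events.root"
GSIFTP_URL = "gsiftp://dcache-door.example.de/pnfs/example.de/data/user/events.root"

# Via the SRM interface (srmcp is the dCache SRM client) ...
subprocess.run(["srmcp", SRM_URL, "file:////tmp/events.root"], check=True)

# ... or directly through the gsiFTP door with the Globus client.
subprocess.run(["globus-url-copy", GSIFTP_URL, "file:///tmp/events.root"], check=True)
```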
14/27
dCache: achieved goals
Development of the xRoot protocol for distributed analysis (file-access sketch below)
Small sites: automatic installation and configuration ("dCache in 10 minutes")
Large sites (> 1 Petabyte):
Partitioning of large systems.
Transfer optimization from / to tape systems
Automatic file replication (freely configurable)
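To make the protocol choices concrete, here is a minimal PyROOT sketch (an illustration, not from the slides) of opening a file served by a dCache pool through its dCap or xrootd door; host names, paths and the tree name are hypothetical placeholders.

```python
# Minimal sketch: POSIX-like read access to a file on a dCache storage
# element via the dCap or xrootd door. Names below are hypothetical.
import ROOT

# Access through the dCap door ...
f = ROOT.TFile.Open(
    "dcap://dcache-door.example.de:22125/pnfs/example.de/data/user/events.root")

# ... or through the xrootd door developed for distributed analysis:
# f = ROOT.TFile.Open(
#     "root://dcache-door.example.de:1094//pnfs/example.de/data/user/events.root")

if f and not f.IsZombie():
    tree = f.Get("events")                 # hypothetical tree name
    print("entries:", tree.GetEntries())
    f.Close()
```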
15/27
dCache: Outlook
Current usage
7 Tier-1 centres with up to 900 TB of disk per centre plus a tape system (Karlsruhe, Lyon, RAL, Amsterdam, FermiLab, Brookhaven, NorduGrid)
~30 Tier-2 centres, including all US CMS sites; planned for US ATLAS.
Planned usage
dCache is going to be included in the Virtual Data Toolkit (VDT) of the Open Science Grid: proposed storage element in the USA.
The planned US Tier-1 will break the 2 PB boundary by the end of the year.
16/27
HEP Community Grid
WP 1: Data management (dCache)
WP 2: Job Monitoring and user support
WP 3: distributed data analysis (ganga)
==> Joint venture between physics and computer science
17/27
WP 2: Job monitoring and user support, coordination: Peter Mättig (Wuppertal)
Job monitoring and resource usage visualizer
TU Dresden
Expert system classifying job failures:
Uni Wuppertal, FZK, FH Köln, FH Niederrhein
Online job steering:
Uni Siegen
18/27
Job monitoring and resource usage visualizer
[Architecture diagram: each worker node runs the user (physics) application together with job monitoring sensors and stepwise job execution monitoring. The sensors publish into a monitoring box via R-GMA. An analysis web service acts as the interface to the monitoring systems (e.g. as an R-GMA consumer) and feeds a GridSphere portal server with a monitoring portlet; the user's browser shows a visualisation applet with overviews, details, timelines, histograms and interactive drill-down.]
19/27
Integration into GridSphere
20/27
Job Execution Monitor in LCG
[LCG job state diagram: submitted, waiting, ready, scheduled, running, done (ok), done (failed), cleared, cancelled, aborted; "What is going on here?" points at the running phase.]
Motivation
1000s of jobs each day in LCG
Job status unknown while running
Manual error detection: slow and difficult
GridICE, ...: service/hardware-based monitoring
Conclusion
Monitor the job while it is running ==> JEM
Automatic error detection needed ==> expert system
21/27
JEM: Job Execution Monitor
Pre-execution test on the gLite/LCG worker node
Script monitoring (Bash, Python); a wrapper sketch follows below
Information exchange: R-GMA
Visualization: e.g. GridSphere
Expert system for classification
Integration into ATLAS
Integration into GGUS
post D-Grid I: ... ?
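The following is only an illustrative sketch of the stepwise script-monitoring idea, not the actual JEM implementation: a Python wrapper executes a Bash job script one command at a time and reports each step; in JEM such reports would be published via R-GMA, here they are simply printed. The script name is a hypothetical placeholder.

```python
# Illustrative sketch (not the actual JEM code): run a Bash job script step
# by step and report each step's exit status and duration.
import subprocess
import time


def run_monitored(script_path):
    """Treat every non-empty, non-comment line of the script as one step."""
    with open(script_path) as f:
        steps = [line.strip() for line in f
                 if line.strip() and not line.lstrip().startswith("#")]

    for number, command in enumerate(steps, start=1):
        start = time.time()
        result = subprocess.run(["bash", "-c", command],
                                capture_output=True, text=True)
        report = {                          # in JEM: published via R-GMA
            "step": number,
            "command": command,
            "exit_code": result.returncode,
            "seconds": round(time.time() - start, 2),
        }
        print(report)
        if result.returncode != 0:          # candidate input for the expert system
            print("step failed, stderr:", result.stderr.strip())
            break


if __name__ == "__main__":
    run_monitored("job_script.sh")          # hypothetical job script
```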
22/27
JEM - status
Monitoring part ready for use
Integration into GANGA (ATLAS/LHCb distributed analysis tool) ongoing
Connection to GGUS planned
http://www.grid.uni-wuppertal.de/jem/
23/27
HEP Community Grid
WP 1: Data management (dCache)
WP 2: Job Monitoring and user support
WP 3: distributed data analysis (ganga)
==> Joint venture between physics and computer science
24/27
WP 3: Distributed data analysis, coordination: Peter Malzacher (GSI Darmstadt)
GANGA: distributed analysis @ ATLAS and LHCb
Ganga is an easy-to-use frontend for job definition and management
Python, IPython or GUI interface
Analysis jobs are automatically split into subjobs that are sent to multiple sites in the Grid
Data management for input and output; distributed output is collected.
Allows simple switching between testing on a local batch system and large-scale data processing on distributed resources (Grid)
Developed in the context of ATLAS and LHCb
Implemented in Python (see the sketch below)
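A minimal Ganga sketch of this workflow, under the assumption of a working Ganga installation (inside the Ganga shell the names below are predefined; as a library they were importable from Ganga.GPI in Ganga versions of that era). The executable, arguments and input lists are hypothetical placeholders.

```python
# Minimal Ganga usage sketch; the application and input names are
# hypothetical. Inside the Ganga/IPython shell the import is not needed.
from Ganga.GPI import Job, Executable, Local, LCG, ArgSplitter

# Define the analysis job once ...
j = Job(name="my_analysis")
j.application = Executable(exe="./myAna", args=["input1.list"])

# ... test it on a local backend first,
j.backend = Local()
j.submit()

# ... then switch the same job definition to the Grid and let a splitter
# create subjobs that are sent to multiple sites.
j2 = j.copy()
j2.backend = LCG()
j2.splitter = ArgSplitter(args=[["input1.list"], ["input2.list"]])
j2.submit()
```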
25/27
GANGA schema
[Schema: the user's analysis (myAna.C) queries a file catalog; the job is split per data file and the subjobs are submitted through a manager to the queues. The outputs are written to storage, collected and merged for the final analysis.]
26/27
PROOF schema
[Schema: a PROOF query (data file list plus myAna.C) goes to the PROOF master, which uses the catalog and a scheduler to distribute the work to the workers holding the files in storage; feedback arrives while the query runs and the final outputs are returned merged. A PyROOT sketch of such a query follows below.]
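For comparison, a hedged PyROOT sketch of issuing such a PROOF query, assuming a reachable PROOF master and a list of ROOT files; the host names, file URLs, tree name and selector (myAna.C) are hypothetical placeholders.

```python
# Hedged sketch of a PROOF query from PyROOT; all names are hypothetical.
import ROOT

proof = ROOT.TProof.Open("proof-master.example.de")   # connect to the master

chain = ROOT.TChain("events")                          # hypothetical tree name
chain.Add("root://se.example.de//data/run1.root")
chain.Add("root://se.example.de//data/run2.root")

chain.SetProof()                  # route the query through the PROOF cluster
chain.Process("myAna.C+")         # the selector runs on the workers
# Merged output objects come back to the client, e.g. via proof.GetOutputList().
```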
27/27
HEPCG: summary
D-Grid is Germany's contribution to HEP computing: dCache, monitoring, distributed analysis.
Physics departments: DESY, Dortmund, Dresden, Freiburg, GSI, München, Siegen, Wuppertal
Computer science departments: Dortmund, Dresden, Siegen, Wuppertal, ZIB, FH Köln, FH Niederrhein
The effort will continue.
2008: start of LHC data taking, a challenge for the Grid concept
==> new tools and developments needed