Common Application Software for the LHC experiments

NEC’2007 International Symposium, Varna, Bulgaria

10-15 September 2007

Pere Mato, CERN

Pere Mato, CERN/PH


Foreword

“Common software” is the software that is used by at least two experiments

– In general, common software would be of a generic nature and non-specific to one experiment

– The borderline between generic and specific is somewhat arbitrary

» It depends very much on the willingness to re-use (i.e. trust) software developed by others and to adapt one's own requirements to fit it

– Sharing software has become a necessity

» HEP experiments cannot afford to develop complete experiment-specific solutions from scratch


Outline

– Main software requirements
– Software structure
– Programming languages
– Non-HEP packages
– HEP generic packages
– Experiment’s software frameworks
– The LCG Applications Area
– Summary


Main Software Requirements

The new software being developed by the LHC experiments must cope with the unprecedented conditions and challenges that characterize these experiments (trigger rate, data volumes, etc.)

The software should not become the limiting factor for the trigger, detector performance and physics reach of these experiments

In spite of its complexity it should be easy to use

– Each one of the ~4000 LHC physicists (including people from remote/isolated countries, physicists who have built the detectors, and senior physicists used to older software practices) should be able to run the software, modify part of it (reconstruction, ...), analyze the data and extract physics results


Processing Stages and Datasets

[Diagram of the processing chain. Processing stages: detector, event filter (selection & reconstruction), event simulation, event reconstruction, batch physics analysis, individual physics analysis. Datasets: raw data, processed data, Event Summary Data (ESD), Analysis Object Data (AOD, extracted by physics topic)]


Software Structure

[Diagram of the software layers, from top to bottom: Applications; Experiment Framework (Event, Det. Desc., Calib.); Simulation, Data Mngmt., Distrib. Analysis; Core Libraries; non-HEP specific software packages]

Every experiment has a framework for basic services and various specialized frameworks: event model, detector description, visualization, persistency, interactivity, simulation, calibration, etc.

General purpose non-HEP libraries

Applications are built on top of the frameworks and implement the required algorithms

Core libraries and services that are widely used and provide basic functionality

Specialized domains that are common among the experiments


Programming Languages

Object-Oriented (O-O) programming languages have become the norm for developing the software for HEP experiments

C++ is in use by (almost) all experiments
– Pioneered by BaBar and Run II (D0 and CDF)
– LHC experiments with an initial FORTRAN code base have basically completed the migration to C++

Large common software projects in C++ have been in production for many years already
– ROOT, Geant4, …

FORTRAN is still in use mainly by the MC generators
– Large development efforts are being put into the migration to C++ (Pythia8, Herwig++, Sherpa, …)


Scripting Languages

Scripting has been an essential component in the HEP analysis software for the last decades
– PAW macros (kumac) in the FORTRAN era
– C++ interpreter (CINT) in the C++ era
– Python recently introduced and gaining momentum

Most of the statistical data analysis and final presentation is done with scripts
– Interactive analysis
– Rapid prototyping to test new ideas
– Driving complex procedures

Scripts are also used to “configure” complex C++ programs developed and used by the LHC experiments
– “Simulation” and “Reconstruction” programs with hundreds or thousands of options to configure
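
As an illustration of scripts used as a configuration layer, here is a minimal Python sketch. The option names, defaults and the `configure` helper are all hypothetical and do not come from any experiment's software; the point is only that the values handed to the compiled program can be computed with ordinary script logic and checked before the job starts.

```python
import copy

# Hypothetical defaults for an imaginary reconstruction program.
DEFAULTS = {
    "max_events": -1,        # -1 means "process all events"
    "input_files": [],
    "pt_cut_gev": 0.5,
    "output_level": "INFO",
}


def configure(overrides):
    """Return the full option set, rejecting option names that do not exist."""
    unknown = sorted(set(overrides) - set(DEFAULTS))
    if unknown:
        raise KeyError(f"unknown option(s): {', '.join(unknown)}")
    options = copy.deepcopy(DEFAULTS)   # never modify the shared defaults
    options.update(overrides)
    return options


# The "script" part: plain Python computes the values.
run_number = 1234
job = configure({
    "max_events": 1000,
    "input_files": [f"run{run_number}_{i:03d}.raw" for i in range(3)],
    "pt_cut_gev": 1.0,
})
print(job)
```

Rejecting unknown names early matters when there are hundreds of options: a misspelled option would otherwise be silently ignored.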


Python Role

The Python language is interesting for two main reasons:
– High level programming language
» Simple, elegant, easy to learn language
» Ideal for rapid prototyping
» Used for scientific programming (www.scipy.org)
– Framework to “glue” different functionalities
» Any two pieces of software can be glued at runtime if they offer a Python interface

A word of caution:
– Python is interpreted: not suited to heavy computation

[Diagram: Python as a software bus. Modules plugged into it: GUI, math, shell, Database, EDG API, XML, GaudiPython (Gaudi Framework), PyROOT (ROOT classes), PVSS, JPE (Java classes), LHC modules. Annotations: very rich set of Python standard modules; several GUI toolkits; very rich set of specialized generic modules]


Non-HEP Packages widely used in HEP

Non-HEP specific functionality required by HEP programs can be implemented using existing packages
– Favoring free and open-source software
– About 30 packages are currently in use by the LHC experiments

Here are some examples
– Boost
» Portable and free C++ source libraries intended to be widely useful and usable across a broad spectrum of applications
– GSL
» GNU Scientific Library
– Coin3D
» High-level 3D graphics toolkit for developing cross-platform real-time 3D visualization
– XercesC
» XML parser written in a portable subset of C++


HEP Generic Packages (1)

Core Libraries
– Library of basic types (e.g. 3-vector, 4-vector, points, particle, etc.)
– Extensions to C++ Standard Library
– Mathematical libraries
– Statistical libraries

Utility Libraries
– Operating system isolation libraries
– Component model and plugin management
– Database interfaces
– C++ Reflection

Examples: ROOT, CLHEP, etc.
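
To make the "library of basic types" concrete, here is a minimal Python sketch of an energy-momentum 4-vector. The `FourVector` class is hypothetical and far simpler than the types shipped with ROOT or CLHEP; it assumes natural units (c = 1) and, as a convention chosen here, returns a negative mass for space-like vectors.

```python
import math


class FourVector:
    """Minimal energy-momentum four-vector (E, px, py, pz), with c = 1."""

    def __init__(self, e, px, py, pz):
        self.e, self.px, self.py, self.pz = e, px, py, pz

    def __add__(self, other):
        return FourVector(self.e + other.e, self.px + other.px,
                          self.py + other.py, self.pz + other.pz)

    def mass2(self):
        """Invariant mass squared: E^2 - |p|^2."""
        return self.e ** 2 - (self.px ** 2 + self.py ** 2 + self.pz ** 2)

    def mass(self):
        """Invariant mass; negative for space-like vectors (a convention)."""
        m2 = self.mass2()
        return math.sqrt(m2) if m2 >= 0 else -math.sqrt(-m2)

    def pt(self):
        """Momentum component transverse to the z axis."""
        return math.hypot(self.px, self.py)


# Two back-to-back massless particles of 45 GeV each.
a = FourVector(45.0, 0.0, 0.0, 45.0)
b = FourVector(45.0, 0.0, 0.0, -45.0)
pair = a + b
print(pair.mass())  # invariant mass of the pair: 90.0
```

Because every analysis needs such types, sharing one implementation removes a whole class of conversion code between packages.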


HEP Generic Packages (2)

MC Generators
– This is the best example of common code used by all the experiments
» Well defined functionality and fairly simple interfaces

Detector Simulation
– Presented in form of toolkits/frameworks (Geant4, FLUKA)
» The user needs to input the geometry description, primary particles, user actions, etc.

Data Persistency and Management
– To store and manage the data produced by experiments

Data Visualization
– GUI, 2D and 3D graphics

Distributed and Grid Analysis
– To support end-users using the distributed computing resources (PROOF, Ganga, …)


ROOT - Core Libraries and Services

ROOT provides the basic functionality needed by any application
– Used by basically all HEP experiments

Current ROOT work packages
– BASE: Foundation and system classes, documentation and releases
– DICT: Reflection system, meta classes, CINT and Python interpreters
– I/O: Basic I/O, trees, queries
– PROOF: Parallel ROOT facility, xrootd
– MATH: Mathematical libraries, histogramming, fitting
– GUI: Graphical user interfaces and object editors
– GRAPHICS: 2-D and 3-D graphics
– GEOM: Geometry system


ROOT - Core Integrating Elements

The common application software should facilitate the integration of independently developed components to build a coherent application

Dictionaries
– Dictionaries provide meta data information (reflection) to allow introspection and interaction of objects in a generic manner
– The ROOT strategy is to evolve to a single reflection system (Reflex)

Scripting languages
– Interpreted languages are ideal for rapid prototyping
– They allow integration of independently developed software modules (software bus)
– Standardizing on CINT (C++) and Python scripting languages

Component model and Plugin Management
– Modeling the application as components with well defined interfaces
– Loading the required functionality at runtime
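
The role of a dictionary can be sketched in a few lines of Python. The `Dictionary` class below is hypothetical and is not the Reflex or ROOT API; it only shows how a description of a class's data members lets generic code convert any declared object to a plain record and back without knowing the class in advance.

```python
class Dictionary:
    """Hypothetical registry of class descriptions (name -> data members)."""

    def __init__(self):
        self._classes = {}

    def declare(self, cls, members):
        """Record which data members of cls should be visible generically."""
        self._classes[cls.__name__] = (cls, list(members))

    def members(self, class_name):
        return self._classes[class_name][1]

    def to_record(self, obj):
        """Turn any declared object into a plain, class-independent record."""
        name = type(obj).__name__
        data = {m: getattr(obj, m) for m in self.members(name)}
        return {"class": name, "data": data}

    def from_record(self, record):
        """Rebuild an object from a record using only dictionary information."""
        cls, members = self._classes[record["class"]]
        obj = cls.__new__(cls)          # bypass __init__: fields are set below
        for m in members:
            setattr(obj, m, record["data"][m])
        return obj


class Track:
    def __init__(self, pt, charge):
        self.pt = pt
        self.charge = charge


dictionary = Dictionary()
dictionary.declare(Track, ["pt", "charge"])
record = dictionary.to_record(Track(12.5, -1))
restored = dictionary.from_record(record)
print(record, restored.pt, restored.charge)
```

The same description can serve I/O, scripting bindings and object browsers, which is why a single reflection system is attractive.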


ROOT I/O

ROOT provides support for object input/output from/to platform independent files
– The system is designed to be particularly efficient for objects frequently manipulated by physicists: histograms, ntuples, trees and events
– I/O is possible for any user class. Non-intrusive, only the class “dictionary” needs to be defined
– Extensive support for “schema evolution”. Class definitions are not immutable over the life-time of the experiment

The ROOT I/O area is still evolving after 10 years
– Recent additions: Full STL support, data compression, tree I/O from ASCII, tree indices, etc.

All new experiments rely on ROOT I/O to store their data
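
The idea behind schema evolution can be illustrated with a small Python sketch. This is not how ROOT implements it; the record layout, the version numbers and the `read_track` helper are assumptions made for the example. Each stored record carries the version of the class that wrote it, and the reader applies upgrade rules until the data matches the current class definition.

```python
import math

CURRENT_VERSION = 2


def _v1_to_v2(data):
    """Version 1 stored (px, py); version 2 stores (pt, phi) instead."""
    return {
        "pt": math.hypot(data["px"], data["py"]),
        "phi": math.atan2(data["py"], data["px"]),
    }


# One upgrade rule per old version; applied in sequence when reading.
UPGRADES = {1: _v1_to_v2}


def read_track(record):
    """Return track data in the current layout, whatever version wrote it."""
    version = record["version"]
    data = dict(record["data"])
    while version < CURRENT_VERSION:
        data = UPGRADES[version](data)
        version += 1
    return data


old_record = {"version": 1, "data": {"px": 3.0, "py": 4.0}}
new_record = {"version": 2, "data": {"pt": 5.0, "phi": 0.0}}
print(read_track(old_record))
print(read_track(new_record))
```

Data written years earlier stays readable without rewriting the files, which is what an experiment with a long lifetime needs.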


Persistency Framework

FILES - based on ROOT I/O
– Targeted for complex data structure: event data, analysis data
– Management of object relationships: file catalogues
– Interface to Grid file catalogs and Grid file access

Relational Databases – Oracle, MySQL, SQLite
– Suitable for conditions, calibration, alignment, detector description data - possibly produced by online systems
– Complex use cases and requirements, multiple ‘environments’ – difficult to be satisfied by a single solution
– Isolating applications from the database implementations with a standardized relational database interface
» facilitate the life of the application developers
» no change in the application to run in different environments
» encode “good practices” once for all
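
Isolating the application from the database implementation can be sketched as follows. The `ConditionsStore` interface, its two back ends and the `ecal_gain` condition are hypothetical names invented for this example (they are not the CORAL API); only Python's standard `sqlite3` module is used. The application function is written once against the interface and runs unchanged on either back end.

```python
import sqlite3
from abc import ABC, abstractmethod


class ConditionsStore(ABC):
    """Hypothetical backend-neutral interface seen by the application."""

    @abstractmethod
    def put(self, name, run, value): ...

    @abstractmethod
    def get(self, name, run): ...


class SQLiteConditionsStore(ConditionsStore):
    """Back end using an SQLite database (in memory by default)."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS conditions "
            "(name TEXT, run INTEGER, value REAL, PRIMARY KEY (name, run))")

    def put(self, name, run, value):
        self._db.execute("INSERT OR REPLACE INTO conditions VALUES (?, ?, ?)",
                         (name, run, value))
        self._db.commit()

    def get(self, name, run):
        row = self._db.execute(
            "SELECT value FROM conditions WHERE name = ? AND run = ?",
            (name, run)).fetchone()
        return None if row is None else row[0]


class InMemoryConditionsStore(ConditionsStore):
    """Back end using a plain dictionary, e.g. for tests."""

    def __init__(self):
        self._data = {}

    def put(self, name, run, value):
        self._data[(name, run)] = value

    def get(self, name, run):
        return self._data.get((name, run))


def calibrated_energy(store, raw_energy, run):
    """Application code: depends only on the interface, not on the back end."""
    return raw_energy * store.get("ecal_gain", run)


for store in (SQLiteConditionsStore(), InMemoryConditionsStore()):
    store.put("ecal_gain", 1234, 1.02)
    print(type(store).__name__, calibrated_energy(store, 50.0, 1234))
```

Good practices such as parameterized queries are then encoded once, in the back end, instead of in every application.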


POOL - Persistency framework

The POOL project is delivering a number of “products”
– POOL – Object and references persistency framework
– CORAL – Generic database access interface
– ORA – Mapping C++ objects into relational database
– COOL – Detector conditions database

[Diagram: USER CODE on top of the POOL API and the COOL API. Components: STORAGE MGR, COLLECTIONS, FILE CATALOG, COOL, CORAL. Back ends: ROOT I/O and RDBMS (Oracle, SQLite, MySQL)]

Object storage and references successfully used in large scale production in ATLAS, CMS, LHCb

Need to focus on database access and deployment in Grid
– basically starting now


MC Generators

Many MC generators and tools are available to the experiments, provided by a solid community
– Each experiment chooses the tools best suited to its physics

Example: ATLAS alone currently uses
– Generators
» AcerMC: Zbb~, tt~, single top, tt~bb~, Wbb~
» Alpgen (+ MLM matching): W+jets, Z+jets, QCD multijets
» Charybdis: black holes
» HERWIG: QCD multijets, Drell-Yan, SUSY...
» Hijing: Heavy Ions, Beam-gas..
» MC@NLO: tt~, Drell-Yan, boson pair production
» Pythia: QCD multijets, B-physics, Higgs production...
– Decay packages
» TAUOLA: Interfaced to work with Pythia, Herwig and Sherpa
» PHOTOS: Interfaced to work with Pythia, Herwig and Sherpa
» EvtGen: Used in B-physics channels


Detector Simulation - Geant4

Geant4 has become an established tool, in production for the majority of LHC experiments during the past two years, and in use in many other HEP experiments and for applications in medical, space and other fields

Ongoing work on the physics validation

Good example of common software

LHCb: ~18 million volumes

ALICE: ~3 million volumes


PROOF – Parallel ROOT Facility

PROOF aims to provide the functionality needed to run ROOT data analysis in parallel
– A major upgrade of the PROOF system was started in 2005
– The system is evolving from processing interactive short blocking queries to a system that also supports long running queries in a stateless client mode
– Currently working with ALICE to deploy it on the CERN Analysis Facility (CAF)
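
The principle of running an analysis in parallel can be sketched in Python: split the events into chunks, fill one partial histogram per chunk, then merge. This is not the PROOF API. The function names are hypothetical, and threads from the standard library stand in for the worker nodes purely to keep the example self-contained.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def fill_histogram(events, bin_width=10.0):
    """Sequential analysis step: count events per bin."""
    hist = Counter()
    for value in events:
        hist[int(value // bin_width)] += 1
    return hist


def parallel_histogram(events, n_workers=4):
    """Split the events, analyse each chunk independently, merge the results."""
    chunk = max(1, (len(events) + n_workers - 1) // n_workers)
    chunks = [events[i:i + chunk] for i in range(0, len(events), chunk)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial = list(pool.map(fill_histogram, chunks))
    total = Counter()
    for hist in partial:
        total.update(hist)      # Counter.update adds the bin contents
    return total


# Toy data: 1000 values spread uniformly over [0, 100).
events = [float((7 * i) % 100) for i in range(1000)]
merged = parallel_histogram(events)
print(sorted(merged.items()))
```

The scheme works because events are independent and histograms are additive, so the merged result is identical to the sequential one.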


Experiment Data Processing Frameworks

Experiments develop Software Frameworks
– General architecture of any event processing application (simulation, trigger, reconstruction, analysis, etc.)
– To achieve coherency and to facilitate software re-use
– Hide technical details from the end-user physicists
– Help the physicists to focus on their physics algorithms

Applications are developed by customizing the Framework
– By the “composition” of elemental Algorithms to form complete applications
– Using third-party components wherever possible and configuring them

ALICE: AliROOT; ATLAS+LHCb: Athena/Gaudi; CMS: moved to a new framework 2 years ago


Example: The GAUDI Framework

User “algorithms” consume event data from the “transient data store” with the help of “services” and “tools” with well defined interfaces and produce new data that is made available to other “algorithms”.

Data can have various representations and “converters” take care of their transformation

The GAUDI framework is used by LHCb, ATLAS, Harp, Glast, BES III
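
The algorithm and transient store pattern described above can be sketched in Python. None of the class names or store paths below belong to the real GAUDI API; they are illustrative assumptions. Each algorithm reads its input from the store and registers its output there, so algorithms never call each other directly.

```python
class TransientEventStore(dict):
    """Hypothetical stand-in: data objects registered under string paths."""


class Algorithm:
    """Base class: an algorithm only talks to the store, never to its peers."""

    def __init__(self, name):
        self.name = name

    def execute(self, store):
        raise NotImplementedError


class TrackFinder(Algorithm):
    def execute(self, store):
        hits = store["/Event/Raw/Hits"]
        # Toy "reconstruction": keep hits above an energy threshold.
        store["/Event/Rec/Tracks"] = [h for h in hits if h["energy"] > 1.0]


class TrackCounter(Algorithm):
    def execute(self, store):
        # Consumes what TrackFinder produced, found only via the store.
        store["/Event/Rec/NTracks"] = len(store["/Event/Rec/Tracks"])


def run_event(algorithms, raw_hits):
    """Process one event: fresh store, then each algorithm in sequence."""
    store = TransientEventStore()
    store["/Event/Raw/Hits"] = raw_hits
    for alg in algorithms:
        alg.execute(store)
    return store


sequence = [TrackFinder("TrackFinder"), TrackCounter("TrackCounter")]
store = run_event(sequence, [{"energy": 0.4}, {"energy": 2.5}, {"energy": 1.7}])
print(store["/Event/Rec/NTracks"])
```

Because the only coupling is through data in the store, an algorithm can be replaced or reordered without touching the others.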

[Diagram of the GAUDI architecture. Components: Application Manager, Algorithms, Event Selector, Converters. Stores: Transient Event Store, Transient Detector Store, Transient Histogram Store. Services: Event Data Service, Detector Data Service, Histogram Service, Persistency Services (one per store, each connected to Data Files), Message Service, JobOptions Service, Particle Properties Service, other services]


Software Configuration

Re-using existing software packages saves on development effort but complicates “software configuration”

We need to hide this complexity

A configuration is a combination of packages and versions that are coherent and compatible

E.g. LHC experiments build their application software based on a given “LCG/AA configuration”, which is decided by the “architects”
– Interfaces to the experiments' configuration systems (SCRAM, CMT)
– Several different configurations in concurrent use are an everyday situation
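
What "coherent and compatible" means can be sketched as a small consistency check in Python. The package names, version numbers and the dependency table are invented for the example and do not describe any real LCG/AA configuration; tools such as SCRAM and CMT handle far more than this.

```python
# Hypothetical release table: (package, version) -> exact versions it was built against.
REQUIRES = {
    ("persistency", "2.0"): {"core": "5.1"},
    ("persistency", "1.9"): {"core": "5.0"},
    ("core", "5.1"): {},
    ("core", "5.0"): {},
}


def check_configuration(config):
    """Return a list of problems; an empty list means the set is coherent."""
    problems = []
    for package, version in sorted(config.items()):
        needs = REQUIRES.get((package, version))
        if needs is None:
            problems.append(f"{package} {version}: unknown release")
            continue
        for dep, wanted in sorted(needs.items()):
            found = config.get(dep)
            if found != wanted:
                problems.append(
                    f"{package} {version} needs {dep} {wanted}, found {found}")
    return problems


coherent = {"core": "5.1", "persistency": "2.0"}
mixed = {"core": "5.0", "persistency": "2.0"}
print(check_configuration(coherent))
print(check_configuration(mixed))
```

Publishing a named configuration amounts to publishing a set that passes such a check, so that users do not have to reason about individual versions.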


LCG Applications Area

The Applications Area is one of the six activity areas of the LHC Computing Grid Project (LCG) that should deliver the common physics applications software for the LHC experiments

The area is organized to ensure focus on real experiment needs
– Experiment-driven requirements and monitoring
– Architects in management and execution
– Open information flow and decision making
– Participation of experiment developers
– Frequent releases enabling iterative feedback

Success is defined by adoption and validation of the developed products by the experiments
– Integration, evaluation, successful deployment


Applications Area Organization

[Organization chart. Bodies: AA Manager (chairs), Architects Forum (ALICE, ATLAS, CMS, LHCb; decisions), Application Area Meeting, MB, LHCC. Reporting: work plans, quarterly reports, reviews, resources. LCG AA Projects: SPI, ROOT, POOL, SIMULATION, each divided into work packages (WP1, WP2, WP3) and subprojects. External collaborations: Geant4, ROOT, EGEE]


AA Projects

SPI – Software process infrastructure (S. Roiser)
– Software and development services: external libraries, savannah, software distribution, support for build, test, QA, etc.

ROOT – Core Libraries and Services (R. Brun)
– Foundation class libraries, math libraries, framework services, dictionaries, scripting, GUI, graphics, SEAL libraries, etc.

POOL – Persistency Framework (D. Duellmann)
– Storage manager, file catalogs, event collections, relational access layer, conditions database, etc.

SIMU – Simulation project (G. Cosmo)
– Simulation framework, physics validation studies, MC event generators, Garfield, participation in Geant4, Fluka.


Summary

The next generation of software for experiments needs to cope with more stringent requirements and new challenging conditions
– The software should not be the limiting factor and should allow the physicists to extract the best physics from the experiment
– The new software is more powerful but at the same time more complex

Some techniques and tools allow us to integrate functionality developed independently into a single and coherent application
– Dictionaries, scripting languages, component models and plugin management

Substantial effort is put into software configuration to provide a stable and coherent set of versions of the packages needed by the experiments

The tendency is to push the line of what is called common software upwards
– The LCG project is helping in this direction by organizing the requirements gathering, the development and the adoption by the experiments of the common software products