Data and Computation for Physics Analysis


Page 1: Data and Computation for Physics Analysis

CERN

1999 Summer Student Lectures

Computing at CERN

Lecture 2 — Looking at Data

Tony Cass — [email protected]

Page 2: Data and Computation for Physics Analysis

Data and Computation for Physics Analysis

[Diagram: the flow of data for physics analysis. The detector feeds an event filter (selection & reconstruction), producing raw data; event reconstruction turns raw data into processed data, the event summary data; batch physics analysis extracts analysis objects (extracted by physics topic) for interactive physics analysis; event simulation feeds the same chain.]
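The chain in this diagram can be sketched as a toy pipeline. All structures and field names below are illustrative, not taken from any real experiment's software:

```python
def event_filter(raw_events):
    # Trigger/selection step: keep only events that fired the trigger.
    return [ev for ev in raw_events if ev["trigger"]]

def reconstruct(ev):
    # Reconstruction step: raw hits -> compact event summary data.
    return {"id": ev["id"], "n_tracks": len(ev["hits"])}

raw_data = [{"id": 1, "trigger": True, "hits": [1, 2, 3]},
            {"id": 2, "trigger": False, "hits": [4]}]

# raw data -> event filter -> event reconstruction -> event summary data
event_summary_data = [reconstruct(ev) for ev in event_filter(raw_data)]
```

Batch and interactive analysis would then read `event_summary_data` rather than returning to the much larger raw data.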

Page 3: Data and Computation for Physics Analysis

Central Data Recording

CDR marks the boundary between the experiment and the central computing facilities. It is a loose boundary which depends on an experiment's approach to data collection and analysis. CDR developments are also affected by
– network developments, and
– event complexity.

[Diagram: detector → event filter (selection & reconstruction) → raw data.]

Page 4: Data and Computation for Physics Analysis

Monte Carlo Simulation

From a physics standpoint, simulation is needed to study
– detector response
– signal vs. background
– sensitivity to physics parameter variations.

From a computing standpoint, simulation
– is CPU intensive, but
– has low I/O requirements.

Simulation farms are therefore good testbeds for new technology:
– CSF for Unix and now PCSF for PCs and Windows/NT.

[Diagram: event simulation.]
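The computing pattern described above, heavy CPU use with almost no I/O, is easy to see in even a toy Monte Carlo. This sketch (all numbers illustrative) generates many random particle directions and counts how often they fall inside a detector's acceptance; the only output is a single number:

```python
import math
import random

def simulate_acceptance(n_events, eta_cut=2.5, seed=1):
    """Toy Monte Carlo: isotropic particle directions, counting the
    fraction inside a pseudorapidity acceptance |eta| < eta_cut."""
    rng = random.Random(seed)
    accepted = 0
    for _ in range(n_events):
        # Isotropic direction: cos(theta) uniform in [-1, 1].
        cos_theta = rng.uniform(-1.0, 1.0)
        theta = math.acos(cos_theta)
        eta = -math.log(math.tan(theta / 2.0))  # pseudorapidity
        if abs(eta) < eta_cut:
            accepted += 1
    return accepted / n_events

frac = simulate_acceptance(100_000)
```

Running 100,000 events costs real CPU time but produces a few bytes of result, which is why simulation farms made good testbeds for new hardware.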

Page 5: Data and Computation for Physics Analysis

Data Reconstruction

The event reconstruction stage turns detector information into physics information about events. This involves
– complex processing
» i.e. lots of CPU capacity
– reading all raw data
» i.e. lots of input, possibly read from tape
– writing processed events
» i.e. lots of output which must be written to permanent storage.

[Diagram: raw data → event reconstruction → event summary data.]
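A minimal example of "turning detector information into physics information" is fitting a straight track through raw hit positions; the hit coordinates below are invented for illustration:

```python
def fit_track(hits):
    """Least-squares fit of a straight track y = a + b*x through
    (x, y) hit coordinates; returns (intercept, slope)."""
    n = len(hits)
    sx = sum(x for x, _ in hits)
    sy = sum(y for _, y in hits)
    sxx = sum(x * x for x, _ in hits)
    sxy = sum(x * y for x, y in hits)
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # slope
    a = (sy - b * sx) / n                          # intercept
    return a, b

# Raw hits from a (hypothetical) tracking chamber.
raw_hits = [(0.0, 1.0), (1.0, 3.1), (2.0, 4.9), (3.0, 7.0)]
intercept, slope = fit_track(raw_hits)
```

The fitted slope and intercept are physics quantities (track direction and origin); the raw hits can then be dropped from the event summary.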

Page 6: Data and Computation for Physics Analysis

Batch Physics Analysis

Physics analysis teams scan over all events to find those that are interesting to them.
– Potentially enormous input
» at least data from current year.
– CPU requirements are high.
– Output is "small"
» O(10²) MB
– but there are many different teams and the output must be stored for future studies
» large disk pools needed.

[Diagram: event summary data → batch physics analysis → analysis objects (extracted by physics topic).]
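The scan described above, enormous input reduced to a small per-topic output, can be sketched as follows (selection criteria and event layout are invented for illustration):

```python
def scan(events, min_energy=50.0):
    """Select events with at least two tracks above an energy
    threshold and emit a compact 'analysis object' for each."""
    selected = []
    for event_id, track_energies in events:
        energetic = [e for e in track_energies if e > min_energy]
        if len(energetic) >= 2:
            selected.append({"id": event_id, "sum_e": sum(energetic)})
    return selected

# In practice 'events' would stream from the year's event summary data.
sample = [(1, [60.0, 55.0, 2.0]),
          (2, [10.0, 5.0]),
          (3, [80.0, 70.0])]
objects = scan(sample)
```

Each team runs its own selection, so many such small output sets accumulate on the disk pools.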

Page 7: Data and Computation for Physics Analysis

Symmetric MultiProcessor Model

[Diagram: experiment feeding a single SMP machine with tape storage and terabytes of disks.]

Page 8: Data and Computation for Physics Analysis

Scalable model—SP2/CS2

[Diagram: experiment feeding an SP2/CS2-style scalable system with tape storage and terabytes of disks.]

Page 9: Data and Computation for Physics Analysis

Distributed Computing Model

[Diagram: experiment, tape storage, disk servers and CPU servers connected through a switch.]

Page 10: Data and Computation for Physics Analysis

[Diagram: CORE Physics Services on the CERN Network.]
– Home directories & registry: 32 IBM, DEC, SUN servers
– Central Data Services, Shared Disk Servers: 2 TeraByte disk, 10 SGI, DEC, IBM servers
– Shared Tape Servers: 3 tape robots, 100 tape drives (Redwood, DLT, Sony D1, IBM 3590, 3490, 3480, EXABYTE, DAT)
– SHIFT, data intensive services: 70 computers, 250 processors (DEC, H-P, IBM, SGI, SUN), 8 TeraBytes embedded disk
– CS-2 Service, Data Recording & Event Filter Farm (komei): QSW CS-2, 64 nodes (128 processors), 2 TeraBytes disk
– RSBATCH + PaRC Public Batch Service: 15-node IBM SP2, 36 PowerPC 604
– Simulation Facility, CSF - RISC servers: 46 H-P PA-RISC
– PCSF - PCs & NT: 20 PentiumPro, 50 Pentium II
– NAP - accelerator simulation service: 10-CPU DEC 8400, 10 DEC workstations, SUN & DEC servers
– Interactive Services, DECPLUS, HPPLUS, RSPLUS, WGS: 66 systems (HP, SUN, IBM, DEC)
– consoles & monitors

1998!

Page 11: Data and Computation for Physics Analysis

[Diagram: CORE Physics Services on the CERN Network.]
– Home directories & registry: 32 IBM, DEC, SUN servers
– Central Data Services, Shared Disk Servers: 2 TeraByte disk, 10 SGI, DEC, IBM servers
– Shared Tape Servers: 4 tape robots, 90 tape drives (Redwood, 9840, DLT, IBM 3590, 3490, 3480, EXABYTE, DAT, Sony D1)
– SHIFT, data intensive services: 200 computers, 550 processors (DEC, H-P, IBM, SGI, SUN, PC), 25 TeraBytes embedded disk
– Data Recording, Event Filter and CPU Farms for NA45, NA48, COMPASS
– Interactive Services, DXPLUS, HPPLUS, RSPLUS, LXPLUS, WGS: 70 systems (HP, SUN, IBM, DEC, Linux)
– RSBATCH Public Batch Service: 32 PowerPC 604
– NAP - accelerator simulation service: 10-CPU DEC 8400, 10 DEC workstations
– Simulation Facility, CSF - RISC servers: 25 H-P PA-RISC
– PCSF - PCs & NT: 10 PentiumPro, 25 Pentium II
– PC Farms: 60 dual processor PCs, 13 DEC workstations, 3 IBM workstations
– PaRC Engineering Cluster
– consoles & monitors

Page 12: Data and Computation for Physics Analysis

Interactive Physics Analysis

Interactive systems are needed to enable physicists to develop and test programs before running lengthy batch jobs.
– Physicists also
» visualise event data and histograms
» prepare papers, and
» send Email

Most physicists use workstations—either private systems or central systems accessed via an X terminal or PC.

We need an environment that provides access to specialist physics facilities as well as to general interactive services.

[Diagram: analysis objects (extracted by physics topic) → interactive physics analysis.]

Page 13: Data and Computation for Physics Analysis

Unix based Interactive Architecture

[Diagram: interactive architecture on the CERN internal network. X terminals, PCs and private workstations connect to WorkGroup Server clusters (PLUS clusters), with optimized access to CORE services. Supporting services: Backup & Archive, Reference Environments, Central Services (mail, news, ccdb, etc.), ASIS replicated AFS binary servers, AFS home directory services, a general staged data pool, and X-terminal support.]

Page 14: Data and Computation for Physics Analysis


PC based Interactive Architecture

Page 15: Data and Computation for Physics Analysis

Event Displays

Event displays, such as this ALEPH display, help physicists to understand what is happening in a detector. A Web-based event display, WIRED, was developed for DELPHI and is now used elsewhere.

Clever processing of events can also highlight certain features—such as in the V-plot views of ALEPH TPC data.

[Figures: standard X-Y view; V-plot view.]

Page 16: Data and Computation for Physics Analysis

Data Analysis Work

By selecting a dE/dx vs. p region on this scatter plot, a physicist can choose tracks created by a particular type of particle.

Most of the time, though, physicists will study event distributions rather than individual events.

RICH detectors provide better particle identification, however. This plot shows that the LHCb RICH detectors can distinguish pions from kaons efficiently over a wide momentum range.

Using RICH information greatly improves the signal/noise ratio in invariant mass plots.
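The dE/dx vs. p selection amounts to keeping tracks whose point on the scatter plot falls inside a chosen region. A minimal sketch, with an invented rectangular band rather than a real calibration:

```python
def select_tracks(tracks, p_range=(0.5, 2.0), dedx_range=(1.8, 2.6)):
    """Keep tracks whose (momentum, dE/dx) point lies inside the
    selected region of the scatter plot."""
    (p_lo, p_hi), (d_lo, d_hi) = p_range, dedx_range
    return [t for t in tracks
            if p_lo <= t["p"] <= p_hi and d_lo <= t["dedx"] <= d_hi]

tracks = [{"p": 1.0, "dedx": 2.0},   # inside the band -> kept
          {"p": 1.2, "dedx": 3.5},   # ionisation too high -> rejected
          {"p": 4.0, "dedx": 2.1}]   # momentum out of range -> rejected
chosen = select_tracks(tracks)
```

Real selections use curved bands following the expected dE/dx for each particle species, but the principle is the same cut in the (p, dE/dx) plane.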

Page 17: Data and Computation for Physics Analysis

CERN's Network Connections

[Diagram: CERN's external network links to RENATER, C-IXP, IN2P3, TEN-155, C&W (US), ATM test beds, SWITCH and WHO, at speeds from 2 Mb/s up to 155 Mb/s (e.g. 39/155 Mb/s, 100 Mb/s, 12/20 Mb/s, 6 Mb/s, 2 Mb/s). Link types: national research networks, mission oriented, public, test, commercial.]

TEN-155: Trans-European Network at 155 Mb/s

Page 18: Data and Computation for Physics Analysis

CERN's Network Traffic, May - June 1999

[Diagram: incoming and outgoing data rates on each external link. CERN totals: 4.5 Mb/s out, 3.7 Mb/s in. Links to C&W (US), RENATER, TEN-155, IN2P3 and SWITCH, with link bandwidths of 100, 40, 20, 6 and 2 Mb/s; per-link rates range from 0.1 Mb/s to 2.5 Mb/s in each direction.]

~1 TB/month in each direction

1 TB/month = 3.86 Mb/s; 1 Mb/s = 10 GB/day
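The slide's rules of thumb can be checked directly. A straightforward calculation, assuming decimal units (1 TB = 10¹² bytes) and a 30-day month, gives figures of the same order as those quoted:

```python
SECONDS_PER_DAY = 86_400

# Sustained rate needed to move 1 TB in a 30-day month, in Mb/s.
bits_per_month = 1e12 * 8
mbps_for_1tb_month = bits_per_month / (30 * SECONDS_PER_DAY) / 1e6

# Volume moved per day at a sustained 1 Mb/s, in GB.
gb_per_day_at_1mbps = 1e6 * SECONDS_PER_DAY / 8 / 1e9
```

This gives roughly 3.1 Mb/s per TB/month and 10.8 GB/day per Mb/s, consistent with the round numbers on the slide.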

Page 19: Data and Computation for Physics Analysis

Outgoing Traffic by Protocol, May 31st - June 6th 1999

[Bar chart: GigaBytes transferred (scale 0-350) by protocol: ftp, www, X, afs, int, rfio, mail, news, other, Total; broken down by destination: Europe, USA, Elsewhere.]

Page 20: Data and Computation for Physics Analysis

Incoming Traffic by Protocol, May 31st - June 6th 1999

[Bar chart: GigaBytes transferred (scale 0-350) by protocol: ftp, www, X, afs, int, rfio, mail, news, other, Total; broken down by origin: Europe, USA, Elsewhere.]

Page 21: Data and Computation for Physics Analysis

European & US Traffic Growth, Feb '97 - Jun '98

[Chart: traffic growth for USA and EU, with the start of the TEN-34 connection marked.]

1998!

Page 22: Data and Computation for Physics Analysis

European & US Traffic Growth, Feb '98 - Jun '99

[Chart: traffic growth for USA and EU.]

Page 23: Data and Computation for Physics Analysis

Traffic Growth, Jun 98 - May/Jun 99

[Bar charts: growth factors (scale 0-8) by protocol: ftp, www, X, afs, int, rfio, mail, news, other, Total; separate panels for Total, Outgoing and Incoming traffic, split by region (Total, EU, Other, US).]

Page 24: Data and Computation for Physics Analysis

Round Trip times and Packet Loss rates

Round trip times for packets to SLAC: 5 seconds!

Packet loss rates to/from the US on the CERN link.

[But traffic to, e.g., SLAC passes over other links in the US and these may also lose packets.]

[This is measured with ping; a packet must arrive and be echoed back; if it is lost, it does not give a Round Trip Time value.]

1998 figures.
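The ping-style measurement described above reduces to a simple summary: each probe either returns a round-trip time or is counted as lost. A sketch with invented values:

```python
def summarise(probes):
    """probes: list of round-trip times in ms, or None for a lost
    packet. Returns (loss rate, mean RTT over returned packets)."""
    returned = [rtt for rtt in probes if rtt is not None]
    loss_rate = 1 - len(returned) / len(probes)
    mean_rtt = sum(returned) / len(returned) if returned else None
    return loss_rate, mean_rtt

# Ten probes to a distant host; None marks a timeout (lost packet).
probes = [180.0, 220.0, None, 200.0, None, 190.0, 210.0, None, 195.0, 205.0]
loss, rtt = summarise(probes)
```

Note the caveat from the slide: lost packets contribute no RTT value, so the mean RTT is computed only over packets that made the round trip.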

Page 25: Data and Computation for Physics Analysis

Looking at Data—Summary

Physics experiments generate data!
– and physicists need to simulate real data to model physics processes and to understand their detectors.

Physics data must be processed, stored and manipulated. [Central] computing facilities for physicists must be designed to take into account the needs of the data processing stages
– from generation through reconstruction to analysis.

Physicists also need to
– communicate with outside laboratories and institutes, and to
– have access to general interactive services.