Page 1: PROOF and ALICE Analysis Facilities

Arsen Hayrapetyan, Arsen.Hayrapetyan@cern.ch
Yerevan Physics Institute, CERN

Page 2: PROOF

PROOF stands for Parallel ROOT Facility.

It allows parallel processing of large amounts of data. The results can be visualised directly (e.g. the output histogram can be drawn at the end of the PROOF session).

The data to be processed can reside on your local disk (PROOF Lite), on the disks of a PROOF cluster, or on the Grid.

Using PROOF is transparent:
◦ The same code can be run locally and on a PROOF system (provided certain rules are followed)

PROOF is part of ROOT

ALICE Offline Tutorial, 26-27 March 2012 2

Page 3: How does PROOF analysis work?

[Diagram: the client (local PC) runs ROOT and submits ana.C to a remote PROOF cluster. The PROOF master distributes the work to PROOF slaves running ROOT on node1 to node4; each slave processes its local data and returns a result, and the results are sent back to the client together with stdout.]


Page 4: Event-based (trivial) parallelism
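Events are independent of each other, so they can be split into packets and processed in any order on any worker; only the partial results have to be combined at the end. The sketch below illustrates this idea in plain C++ without ROOT: threads stand in for PROOF workers, and the Event type (a bare track count) and both function names are hypothetical. PROOF itself hands out packets dynamically, so the static split used here is a simplification.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical event type: only the number of tracks in the event.
using Event = int;

// Processes events [begin, end) independently of all other events and
// returns the partial result (here: the total number of tracks).
long ProcessRange(const std::vector<Event>& events, std::size_t begin, std::size_t end) {
    long partial = 0;
    for (std::size_t i = begin; i < end; ++i) partial += events[i];
    return partial;
}

// Static split of the events into nWorkers contiguous packets, one thread
// per packet, followed by a merge (sum) of the partial results.
long ProcessInParallel(const std::vector<Event>& events, std::size_t nWorkers) {
    assert(nWorkers > 0);
    const std::size_t n = events.size();
    std::vector<long> partials(nWorkers, 0);
    std::vector<std::thread> workers;
    for (std::size_t w = 0; w < nWorkers; ++w) {
        const std::size_t begin = n * w / nWorkers;
        const std::size_t end = n * (w + 1) / nWorkers;
        // Each thread writes only its own slot, so no locking is needed.
        workers.emplace_back([&events, &partials, w, begin, end] {
            partials[w] = ProcessRange(events, begin, end);
        });
    }
    for (std::thread& t : workers) t.join();
    long total = 0;
    for (long p : partials) total += p;  // merge step
    return total;
}
```

ProcessInParallel(events, k) returns the same value as ProcessRange(events, 0, events.size()) for any number of workers k, which is exactly the property that makes this kind of parallelism "trivial".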


Page 5: Terminology

Client
Your machine running a ROOT session that is connected to a PROOF master

Master
PROOF machine coordinating the work of the slaves

Slave/Worker
PROOF machine that processes data

Query
A job submitted from the client to the PROOF system. A query consists of a selector and a chain

Selector
A class containing the analysis code. In ALICE we use the Analysis Framework, therefore an AliAnalysisTaskSE is sufficient

Chain
A list of files (trees) to process (more details later)


Page 6: How to use PROOF

The analysis framework is used:
◦ Files to be analyzed are put into a chain (TChain)
◦ The analysis is written as a task (AliAnalysisTaskSE, already introduced in the previous tutorial)
◦ The same analysis as written previously can be used

If additional libraries are needed, these have to be distributed as a "package" (PAR: PROOF ARchive)

[Diagram: Input Files (TChain) → Analysis (AliAnalysisTaskSE) → Output]


Page 7: AliAnalysisTaskSE

Classes derived from AliAnalysisTaskSE can run locally, in PROOF and in AliEn. Their methods are called as follows:
◦ "Constructor": once on your client
◦ UserCreateOutputObjects(): once on each slave
◦ ConnectInputData(): for each tree
◦ UserExec(): for each event
◦ Terminate(): once on your client
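The calling pattern above can be mimicked with a small mock class. This is a sketch under stated assumptions: MockTask and RunMockQuery are hypothetical names that do not exist in AliRoot, and for simplicity a single object collects the counts of all slaves, whereas in PROOF every slave works on its own copy of the task.

```cpp
#include <cassert>
#include <vector>

// Hypothetical mock that only counts how often each method is called.
// It is NOT the real AliAnalysisTaskSE, just the calling pattern above.
struct MockTask {
    int nConstructed = 0, nCreateOutputs = 0, nConnect = 0, nExec = 0, nTerminate = 0;

    MockTask() { ++nConstructed; }                        // once, on the client
    void UserCreateOutputObjects() { ++nCreateOutputs; }  // once per slave
    void ConnectInputData() { ++nConnect; }               // once per tree
    void UserExec() { ++nExec; }                          // once per event
    void Terminate() { ++nTerminate; }                    // once, on the client
};

// Hypothetical driver: eventsPerTree[w][t] is the number of events in
// tree t handled by slave w.
void RunMockQuery(MockTask& task, const std::vector<std::vector<int>>& eventsPerTree) {
    for (const std::vector<int>& trees : eventsPerTree) {  // each slave
        task.UserCreateOutputObjects();
        for (int nEvents : trees) {                         // each tree
            assert(nEvents >= 0);
            task.ConnectInputData();
            for (int e = 0; e < nEvents; ++e) task.UserExec();  // each event
        }
    }
    // Merging of the slave outputs would happen at this point.
    task.Terminate();                                       // back on the client
}
```

For two slaves, the first processing trees with 10 and 5 events and the second a tree with 7 events, RunMockQuery(task, {{10, 5}, {7}}) calls UserCreateOutputObjects twice, ConnectInputData three times, UserExec 22 times and Terminate once.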

Page 8: Class TTree

A tree is a container for data storage
It consists of several branches
◦ These can be in one or several files
◦ Branches are stored contiguously (split mode)
◦ When reading a tree, certain branches can be switched off, which speeds up the analysis when not all data is needed
Helper functions are provided to visualize the content (e.g. Draw, Scan)

[Diagram: a tree "point" split into the branches x, y and z; each branch stores its values (x x x ..., y y y ..., z z z ...) contiguously and compressed in a file]
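Why switching off branches saves time can be illustrated with a toy column store in plain C++. ToySplitTree and its methods are hypothetical; in ROOT the branch switching itself is done with TTree::SetBranchStatus, and a real tree additionally compresses and buffers each branch.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy analogue of a split-mode tree of "point" objects: each
// member (x, y, z) is kept in its own contiguous column ("branch").
class ToySplitTree {
public:
    void Fill(double x, double y, double z) {
        columns_["x"].push_back(x);
        columns_["y"].push_back(y);
        columns_["z"].push_back(z);
    }

    std::size_t Entries() const {
        auto it = columns_.find("x");
        return it == columns_.end() ? 0 : it->second.size();
    }

    // Switches a branch on or off (in ROOT: TTree::SetBranchStatus).
    void SetEnabled(const std::string& branch, bool on) { disabled_[branch] = !on; }

    // Copies entry i into 'out', touching only the enabled columns.
    void ReadEntry(std::size_t i, std::map<std::string, double>& out) {
        assert(i < Entries());
        for (const auto& [name, column] : columns_) {
            if (disabled_[name]) continue;  // switched-off branch is never read
            out[name] = column[i];
            ++valuesRead;
        }
    }

    std::size_t valuesRead = 0;  // number of stored values actually accessed

private:
    std::map<std::string, std::vector<double>> columns_;
    std::map<std::string, bool> disabled_;  // absent means enabled
};
```

Filling three points, disabling y and z, and then reading all entries touches only the three stored x values instead of all nine.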


Page 9: TChain

A chain is a list of trees (in several files)
Normal TTree functions can be used
◦ Draw(...), Scan(...)
◦ These iterate over all elements of the chain

[Diagram: a chain made of Tree1 (File1), Tree2 (File2), Tree3 (File3), Tree4 (File3) and Tree5 (File4)]
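A chain presents its trees as a single sequence of entries, so a global entry number has to be translated into a tree and an entry inside that tree. The following sketch shows one way to do this with cumulative offsets; ToyChain is a hypothetical class, not the TChain implementation (which exposes the local entry number through TChain::LoadTree).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch: a chain stores, for every tree, the global number of
// its first entry, and maps a global entry to (tree index, local entry).
class ToyChain {
public:
    void Add(std::size_t entriesInTree) {
        offsets_.push_back(total_);  // first global entry of this tree
        total_ += entriesInTree;
    }

    std::size_t Entries() const { return total_; }

    // Returns {tree index, entry number inside that tree}.
    std::pair<std::size_t, std::size_t> Locate(std::size_t globalEntry) const {
        assert(globalEntry < total_);
        // The containing tree is the last one whose first entry is <= globalEntry
        // (this also skips empty trees correctly).
        auto it = std::upper_bound(offsets_.begin(), offsets_.end(), globalEntry);
        std::size_t tree = static_cast<std::size_t>(it - offsets_.begin()) - 1;
        return {tree, globalEntry - offsets_[tree]};
    }

private:
    std::vector<std::size_t> offsets_;
    std::size_t total_ = 0;
};
```

With trees of 100, 50, 0 and 25 entries, global entry 120 is entry 20 of the second tree (index 1), and global entry 150 is the first entry of the last tree.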


Page 10: Merging

The analysis runs on several slaves, therefore the partial results have to be merged.
Merging can be done in one of the following ways:
◦ On a few workers (submergers; their number and location are decided by PROOF) and, finally, on the master
◦ Directly on the master (not desirable in case of large output)

[Diagram: the results from Slave 1 and Slave 2 are combined with Merge() into the final result]
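Merging works because the partial results are additive: for a histogram, the final bin contents are the sums of the bin contents from all slaves, regardless of how the sums are grouped. The sketch below uses the hypothetical names ToyHist and MergeTwoLevel and a fixed group size chosen by the caller (PROOF decides the number and location of the submergers itself); it checks that merging in two levels gives the same result as merging everything directly on the master. ROOT histograms provide the corresponding operation through TH1::Merge.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy histogram: fixed binning, only the bin contents are kept.
struct ToyHist {
    std::vector<long> bins;
    explicit ToyHist(std::size_t nBins) : bins(nBins, 0) {}

    // Adds the contents of another histogram with identical binning.
    void Merge(const ToyHist& other) {
        assert(other.bins.size() == bins.size());
        for (std::size_t i = 0; i < bins.size(); ++i) bins[i] += other.bins[i];
    }
};

// Two-level merge: the partial results are first combined in groups of
// 'groupSize' (the "submergers"), then the group results on the "master".
ToyHist MergeTwoLevel(const std::vector<ToyHist>& partials, std::size_t groupSize) {
    assert(!partials.empty() && groupSize > 0);
    ToyHist master(partials.front().bins.size());
    for (std::size_t start = 0; start < partials.size(); start += groupSize) {
        ToyHist sub(master.bins.size());  // one submerger
        for (std::size_t i = start; i < start + groupSize && i < partials.size(); ++i)
            sub.Merge(partials[i]);
        master.Merge(sub);                // final merge on the master
    }
    return master;
}
```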


Page 11: Workflow Summary

[Diagram: the input chain (Tree1 in File1, Tree2 in File2, Tree3 and Tree4 in File3, Tree5 in File4) and the analysis (AliAnalysisTask) are distributed to several PROOF workers]


Page 12: Workflow Summary

[Diagram: each PROOF worker runs the analysis (AliAnalysisTask) and produces an output; the outputs are combined into the merged output]


Page 13: Packages

PAR files: PROOF ARchive, similar to a Java jar
◦ Gzipped tar file
◦ PROOF-INF directory
   BUILD.sh: builds the package; executed on each slave
   SETUP.C: sets the environment and loads the libraries; executed on each slave
API to manage and activate packages
◦ UploadPackage("package")
◦ EnablePackage("package")


Page 14: CERN Analysis Facility

The CERN Analysis Facility (CAF) will run PROOF for ALICE:
◦ Prompt analysis of pp data
◦ Pilot analysis of PbPb data
◦ Calibration & Alignment

It is available to the whole collaboration, but the number of users will be limited for efficiency reasons.

Design goals
◦ 500 CPUs
◦ 100 TB of selected data locally available


Page 15: ALICE Analysis Facilities (AAF)

http://aaf.cern.ch
◦ CAF - CERN
◦ SKAF - Slovakia
◦ KiAF - Korea
◦ SAF - France (Subatech)
◦ LAF - France (CCIN2P3, Lyon)
◦ JRAF - Russia (JINR)
◦ TAF - Italy (Torino)


Page 16: PROOF datasets

A dataset represents a list of files (e.g. physics run X)
◦ An AliEn collection corresponds to a PROOF dataset

Users register datasets
◦ The files contained in a dataset are automatically staged from AliEn (and kept available)
◦ Datasets are used for processing with PROOF: they contain all the information needed to start processing (location of the files, abstract description of their content)

Datasets are public for reading; common datasets are available for data of common interest.


Page 17: Datasets in Practice

Upload to the PROOF cluster
◦ gProof->RegisterDataSet("myDataSet", proofColl);

Check the status
◦ gProof->ShowDataSets();
◦ http://aaf.cern.ch -> Datasets -> CAF


Page 18: Looking at the task

Constructor
◦ Called once when the task is created
◦ Input/Output is connected

UserCreateOutputObjects
◦ Called once per slave
◦ Create histograms

UserExec
◦ Called once per event
◦ Track loop: tracks are counted, the histogram is filled, the output is "posted"

Terminate
◦ Called once on the client (your laptop/PC)
◦ The histogram is read back from the output stream, visualized and saved to disk
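The UserExec step described above (loop over the tracks, count them, fill a histogram) can be sketched in plain C++ with toy types. ToyTH1, ToyEvent, ToyUserExec and the pT threshold are hypothetical; the toy histogram only mimics the bin numbering of ROOT's TH1 (bin 0 for underflow, nBins+1 for overflow).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy 1-D histogram using ROOT's bin numbering convention:
// bin 0 = underflow, bins 1..nBins cover [xMin, xMax), bin nBins+1 = overflow.
class ToyTH1 {
public:
    ToyTH1(std::size_t nBins, double xMin, double xMax)
        : nBins_(nBins), xMin_(xMin), xMax_(xMax), counts_(nBins + 2, 0) {
        assert(nBins > 0 && xMax > xMin);
    }

    void Fill(double x) {
        std::size_t bin;
        if (x < xMin_) {
            bin = 0;
        } else if (x >= xMax_) {
            bin = nBins_ + 1;
        } else {
            bin = 1 + static_cast<std::size_t>(nBins_ * (x - xMin_) / (xMax_ - xMin_));
            if (bin > nBins_) bin = nBins_;  // guard against rounding at the upper edge
        }
        ++counts_[bin];
    }

    long GetBinContent(std::size_t bin) const { return counts_.at(bin); }

private:
    std::size_t nBins_;
    double xMin_, xMax_;
    std::vector<long> counts_;
};

// Hypothetical event: transverse momenta of its tracks.
struct ToyEvent {
    std::vector<double> trackPt;
};

// Analogue of the UserExec() step: loop over the tracks of one event, count
// those above a (hypothetical) pT threshold and fill the histogram.
void ToyUserExec(const ToyEvent& event, ToyTH1& multiplicity, double ptMin) {
    int nTracks = 0;
    for (double pt : event.trackPt) {
        if (pt > ptMin) ++nTracks;
    }
    multiplicity.Fill(nTracks);
}
```

In the real task the histogram would be created in UserCreateOutputObjects, filled in UserExec on every slave, merged, and drawn in Terminate.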


Page 19: Reading log files

When your task crashes, you can access the output of the last query by clicking the "Show Log" button in the PROOF progress window.

You can retrieve the output of any previous query:
◦ Open ROOT
◦ Get a PROOF manager object
     mgr = TProof::Mgr("alice-caf")
◦ Get the log files from the last session
     logs = mgr->GetSessionLogs(0)   // 0 = last session
◦ Display them
     logs->Display()
◦ Search for a specific word (e.g. segmentation violation)
     logs->Grep("segmentation violation")
◦ Save them to a file
     logs->Save("*", "logs.txt")


Page 20: Some Goodies...

Resetting the environment
◦ TProof::Reset("alicecaf")
◦ TProof::Reset("alicecaf", kTRUE)   // hard reset

Compiling with debug information
◦ Load("<task>+g")
