ALICE Offline Tutorial
Markus Oldenburg – CERN
Markus.Oldenburg@cern.ch
May 15, 2007 – University of Sao Paulo
Part III: PROOF
based on: ALICE Offline Tutorial, April 13, 2007, v1.3
F. Carminati, P. Christakoglou, J.F. Grosse-Oetringhaus, P. Hristov, A. Peters, P. Saiz
available online at: http://cern.ch/Oldenburg -> Seminars
PROOF
Parallel ROOT Facility
Interactive parallel analysis on a local cluster
PROOF itself is not related to the Grid
• Can be used in the Grid
• Can access Grid files
The usage of PROOF is transparent
• The same code can be run locally and in a PROOF system (certain rules have to be followed)
PROOF is part of ROOT
PROOF Schema
[Figure: a client (local PC) running ROOT sends ana.C to a remote PROOF cluster. The master distributes the work to the slaves (node1 to node4), each running proof next to its data; stdout and the result are returned to the client.]
$ root
root [0] tree->Process("ana.C")
root [1] TProof::Open("remote")
root [2] chain->Process("ana.C")
Terminology
Client: your machine running a ROOT session that is connected to a PROOF master
Master: PROOF machine coordinating work between Slaves
Slave: PROOF machine that processes data
Query: a job submitted from the client to the PROOF system. A query consists of a selector and a chain
Selector: a class containing the analysis code (more details later)
Chain: a list of files (trees) to process (more details later)
TTree
A tree is a container for data storage with disk “overspill”
It consists of several branches
• These can be in one or several files
• Branches are stored contiguously (split mode)
• When reading a tree, certain branches can be switched off, which speeds up the analysis when not all data is needed
[Figure: a Tree containing three Branches]
TTree
#include "TTree.h"
#include "TFile.h"
#include "TRandom.h"

// Simple class with three members; in split mode each member gets its own branch
class point {
public:
  void Set() { x = gRandom->Rndm(); y = gRandom->Rndm(); z = gRandom->Rndm(); }
private:
  Float_t x, y, z;
  ClassDef(point, 1)
};

Int_t t() {
  point *pp = new point();
  TTree *tree = new TTree("Test", "Test Tree", 99);   // third argument: split level
  TFile *file = new TFile("test.root", "recreate");
  tree->Branch("point", &pp);
  for (Int_t i = 0; i < 100; ++i) {                   // fill 100 entries
    pp->Set();
    tree->Fill();
  }
  tree->Write();
  file->Close();
  // file = new TFile("test.root", "read");
  tree->Print();
  return 0;
}
TTree (2)
******************************************************************************
*Tree    :Test      : Test Tree                                              *
*Entries :      100 : Total =            4090 bytes  File  Size =          0 *
*        :          : Tree compression factor =   1.00                       *
******************************************************************************
*Branch  :point                                                              *
*Entries :      100 : BranchElement (see below)                              *
*............................................................................*
*Br    0 :x         :                                                        *
*Entries :      100 : Total  Size=       1006 bytes  One basket in memory    *
*Baskets :        0 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*
*Br    1 :y         :                                                        *
*Entries :      100 : Total  Size=       1006 bytes  One basket in memory    *
*Baskets :        0 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*
*Br    2 :z         :                                                        *
*Entries :      100 : Total  Size=       1006 bytes  One basket in memory    *
*Baskets :        0 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*
[Figure: the members x, y, z of point are split into separate branches; in the file, the values of each branch are stored contiguously (x x x ..., y y y ..., z z z ...)]
How to use PROOF
Files to be analyzed are put into a chain (TChain)
Analysis written as a selector (TSelector, AliSelector, AliSelectorRL)
Input/Output is sent using dedicated lists
If additional libraries are needed, these have to be distributed as a “package”
[Figure: Input Files (TChain) and Input (TList) go into the Analysis (TSelector), which produces the Output (TList)]
TChain
A chain is a list of trees (in several files)
Normal TTree functions can be used
• Draw(...), Scan(...): these iterate over all elements of the chain
Selectors can be used with chains
• Process(const char* selectorFileName)
After using SetProof() these calls are run in PROOF
Chain
Tree1 (File1)
Tree2 (File2)
Tree3 (File3)
Tree4 (File3)
Tree5 (File4)
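One way to picture how a call can iterate over all elements of a chain is to treat the chain as a single run of entries and map each global entry number to a tree and a local entry. The sketch below does this in plain C++ with a hypothetical MockChain type. It is not the ROOT TChain class, and the entry counts in the usage note are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical model of a chain: only the number of entries per tree is kept.
struct MockChain {
    std::vector<long> entriesPerTree;

    // Total number of entries over all trees in the chain.
    long totalEntries() const {
        long total = 0;
        for (long n : entriesPerTree) total += n;
        return total;
    }

    // Maps a global entry number to (tree index, entry number inside that tree).
    std::pair<std::size_t, long> locate(long globalEntry) const {
        assert(globalEntry >= 0 && globalEntry < totalEntries());
        std::size_t tree = 0;
        while (globalEntry >= entriesPerTree[tree]) {
            globalEntry -= entriesPerTree[tree];  // skip this tree entirely
            ++tree;
        }
        return {tree, globalEntry};
    }
};
```

With three trees of 100, 50 and 25 entries, global entry 100 is the first entry of the second tree, so a loop over all global entries visits every tree in turn without the caller handling file boundaries.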
TSelector
Classes derived from TSelector can run locally and in PROOF
Begin(): once on your client
SlaveBegin(): once on each Slave
Init(TTree* tree): for each tree
Process(Long64_t entry): for each event
SlaveTerminate(): once on each Slave
Terminate(): once on your client
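The call order of the selector hooks can be replayed with a small mock. This is a minimal sketch in plain C++ and does not use ROOT: MockSelector and replayQuery are hypothetical names, the workload sizes are arbitrary, and the slaves are run one after another here, whereas a real PROOF system runs them in parallel with one selector instance per slave.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a TSelector; each hook only records that it ran.
struct MockSelector {
    std::vector<std::string> log;
    void Begin()          { log.push_back("Begin"); }
    void SlaveBegin()     { log.push_back("SlaveBegin"); }
    void Init()           { log.push_back("Init"); }
    void Process()        { log.push_back("Process"); }
    void SlaveTerminate() { log.push_back("SlaveTerminate"); }
    void Terminate()      { log.push_back("Terminate"); }
};

// Replays the hook order for a made-up workload and returns the call log.
std::vector<std::string> replayQuery(int slaves, int treesPerSlave, int eventsPerTree) {
    assert(slaves > 0 && treesPerSlave >= 0 && eventsPerTree >= 0);
    MockSelector sel;  // a single object is used here only to collect one log
    sel.Begin();                                   // once on your client
    for (int s = 0; s < slaves; ++s) {
        sel.SlaveBegin();                          // once on each slave
        for (int t = 0; t < treesPerSlave; ++t) {
            sel.Init();                            // for each tree
            for (int e = 0; e < eventsPerTree; ++e)
                sel.Process();                     // for each event
        }
        sel.SlaveTerminate();                      // once on each slave
    }
    sel.Terminate();                               // once on your client
    return sel.log;
}
```

Calling replayQuery(2, 2, 3) yields a log that starts with Begin, ends with Terminate, and contains one SlaveBegin/SlaveTerminate pair per slave around the per-tree and per-event calls.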
Input / Output
The TSelector class has two members of type TList: fInput, fOutput
These are used to get input data or put output data
Input list
• Before running a query the input list is populated: proof->AddInput(myObj)
• In the selector (Begin, SlaveBegin) the object is retrieved: fInput->FindObject("myObject")
Input / Output (2)
Output list
• After processing, the output has to be added to the output list on each Slave (in SlaveTerminate): fOutput->Add(fResult)
• PROOF merges the results from each query automatically (see next slide)
• On your client (in Terminate) you retrieve the object and save it, display it, ...: fOutput->FindObject("myResult")
Input / Output (3)
Merging
• Objects are identified by name
• Standard merging implementation for histograms available
• Other classes need to implement Merge(TCollection*)
• When no merging function is available all the individual objects are returned
[Figure: Result from Slave 1 and Result from Slave 2 are combined by Merge() into the Final result]
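The merging rule (objects matched by name, histograms added together) can be sketched in plain C++. This is an illustration under stated assumptions, not the PROOF implementation: Histogram, OutputList, mergeInto and mergeOutputs are hypothetical names, a histogram is reduced to a vector of bin contents, and all histograms with the same name are assumed to share the same binning.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins: a histogram is a vector of bin contents,
// an output list maps object names to histograms.
using Histogram  = std::vector<long>;
using OutputList = std::map<std::string, Histogram>;

// Adds `part` into `total` bin by bin (both must have the same binning).
void mergeInto(Histogram& total, const Histogram& part) {
    assert(total.size() == part.size());
    for (std::size_t i = 0; i < total.size(); ++i) total[i] += part[i];
}

// Merges the output lists of all slaves; objects are matched by name.
OutputList mergeOutputs(const std::vector<OutputList>& perSlave) {
    OutputList merged;
    for (const OutputList& out : perSlave) {
        for (const auto& item : out) {
            auto found = merged.find(item.first);
            if (found == merged.end())
                merged[item.first] = item.second;        // first object with this name
            else
                mergeInto(found->second, item.second);   // same name: add the contents
        }
    }
    return merged;
}
```

If two slaves each return a histogram named "myResult", the merged list holds a single "myResult" whose bins are the sums of the two inputs; an object returned by only one slave is passed through unchanged.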
Workflow Summary
[Figure: the Chain (Tree1 (File1), Tree2 (File2), Tree3 (File3), Tree4 (File3), Tree5 (File4)), the Analysis (TSelector) and the Input (TList) are sent to the proof nodes; each node returns an Output (TList), and these are combined into the Merged Output]
Packages
PAR files: PROOF ARchive. Like Java jar
Gzipped tar file
PROOF-INF directory
• BUILD.sh, building the package, executed per Slave
• SETUP.C, set environment, load libraries, executed per Slave
API to manage and activate packages
• UploadPackage("package.par")
• EnablePackage("package")
Accessing ESD
Use local ROOT
To access AliESDs.root, the ESD.par package has to be uploaded into the PROOF environment
Selector derives from AliSelector (in STEER)
Access to data by member: fESD
Inheritance: TSelector -> AliSelector -> <YourSelector>
Accessing the RunLoader
Use local AliRoot
Access to Kinematics, Clusters, etc. requires access to the RunLoader
Therefore (nearly) full AliRoot needs to be loaded
An AliRoot version is already deployed on the CAF test system and can be enabled by a 3 line macro (part of the tutorial files, see later)
ESD package is not allowed to be loaded
Selector derives from AliSelectorRL (in STEER)
GetStack(), GetRunLoader(), GetHeader()
Inheritance: TSelector -> AliSelector -> AliSelectorRL -> <YourSelector>
CERN Analysis Facility
The CERN Analysis Facility (CAF) will run PROOF for ALICE
• Prompt analysis of pp data
• Pilot analysis of PbPb data
• Calibration & Alignment
Available to the whole collaboration but the number of users will be limited for efficiency reasons
Design goals
• 500 CPUs
• 100 TB of selected data locally available
Evaluation of PROOF
Test setup since May 2006
• 40 machines, 2 CPUs each, 200 GB disk
Tests performed
• Usability tests
• Simple speedup plot
• Evaluation of different query types
• Evaluation of the system when running a combination of query types
Goal: Realistic simulation of users using the system
Query Type Cocktail
A realistic stress test consists of different users that submit different types of queries
4 different query types
• 20% very short queries
• 40% short queries
• 20% medium queries
• 20% long queries
User mix
• 33 nodes available for the test
• Maximum average speedup for 10 users = 6.6 (33 nodes = 66 CPUs)
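The figure of 6.6 follows from dividing the available CPUs by the number of concurrent users. A minimal sketch of that arithmetic in plain C++, assuming the CPUs are shared equally among all active users (an assumption of this sketch; maxAverageSpeedup is a hypothetical helper, not part of PROOF):

```cpp
#include <cassert>

// Upper bound on the average speedup per user when all CPUs are shared
// equally among concurrently active users (assumption of this sketch).
double maxAverageSpeedup(int nodes, int cpusPerNode, int users) {
    assert(nodes > 0 && cpusPerNode > 0 && users > 0);
    const int totalCpus = nodes * cpusPerNode;      // 33 nodes * 2 CPUs = 66 CPUs
    return static_cast<double>(totalCpus) / users;  // 66 CPUs / 10 users = 6.6
}
```

For the test setup above, maxAverageSpeedup(33, 2, 10) gives 66 / 10 = 6.6; with a single user the same bound would be 66.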
Relative Speedup