ROOT I/O @ DESY Computing Seminar
Axel Naumann, 2009-11-30
Outline
- Motivation
- Basic ingredients of I/O
- X-Ray of a TTree
- Analysis Environments
- Optimizing a TTree
- TSelector & PROOF
2009-11-30 Axel Naumann • ROOT I/O @ DESY Computing Seminar 2
Motivation
- First data @ LHC!
- Reports of mis-designed TTrees
- Reports of mis-designed data transfer
- Bored cores
- Misleading recommendations, rumors, misunderstanding
Let’s explain how I/O and TTrees work!
Reflection, C++ Objects
Storing C++ Objects
- Need to know:
  - Type
  - Members
  - Location in memory
  - How to create an object when reading
- Provided by dictionary (rootcint / genreflex)
    TNamed n("name", "title");
    file->WriteTObject(&n);
I/O's CPU Time
- Serialization and zipping take time
C++ Objects versus Disk
- Disk stores a series of bytes
- C++ objects are structured:
  - Data members
  - Base classes
  - Pointers
- ROOT I/O converts between the two: streaming or serialization
Zipping: CPU vs. Real Time
- Example for reading:
  - zipped file: 9.4 MB/s disk I/O; CPU unzips; corresponds to 34 MB/s data
  - unzipped file: 25 MB/s disk I/O
- Zipping can increase bandwidth!
- Especially for concurrent disk access!
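The zipping numbers can be read as a simple rate model: the disk delivers compressed bytes, and each of them expands into several bytes of data. The sketch below is an idealized illustration only. The function name, the cap by an unzip rate, and the compression factor of about 3.6 (implied by 34 / 9.4) are assumptions of this example, not measurements or ROOT API.

```cpp
#include <algorithm>
#include <cassert>

// Idealized model (assumption of this sketch): uncompressed data rate seen by
// the application when reading a zipped file. The disk delivers `diskMBps` of
// compressed bytes, each expanding by `compressionFactor`; the CPU can produce
// at most `unzipMBps` of uncompressed output.
double EffectiveDataRate(double diskMBps, double compressionFactor,
                         double unzipMBps) {
  assert(diskMBps >= 0 && compressionFactor > 0 && unzipMBps >= 0);
  return std::min(diskMBps * compressionFactor, unzipMBps);
}
```

With the numbers above, `EffectiveDataRate(9.4, 34.0 / 9.4, 1000.0)` gives about 34 MB/s, more than the 25 MB/s read from the unzipped file, as long as the CPU keeps up with unzipping.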
There is more than branches…
Views Of A TTree: C++ Access
- Branch / leaf structure
- Splitting: generate branches recursively according to the C++ class layout; create sub-branches for
  - Data members (MyClass fMember)
  - Members of base classes (A: public B)
  - Containers (vector<C> fC): split elements! (C's members, not vector<C>'s members)
- Split level: where to stop splitting and put the entire object into one branch instead
Splitting Across Collections
- Split vector<C>: D.fC.fM, D.fC.fN
    class C { int fM, fN; };
    class D0 { vector<C> fC; };
- Or even vector<C*>: D.fC.fM, D.fC.fN
    class D1 { vector<C*> fC; };
- Or even polymorphic, with split level > 100: D.fC.C.fM, D.fC.C.fN, D.fC.C1.fC1, D.fC.C2.fC2
    class C1: public C { int fC1; };
    class C2: public C { float fC2; };
    class D2 { vector<C*> fC; };
Obvious Data Considerations
- Don't store empty or useless data!
  - Can use //! to not store members
- Combine branches
  - Better to store the jet algo name with the jets than one jet branch per algo
  - Consider vector<T*> with split level > 100
- Branch granularity is read selectivity
  - Always reading x, y, z, E? Don't split them!
  - Split xyzE saves a bit of disk space, though
Data Layout Considerations
- Allocating objects takes time
  - TClonesArray faster than vector<T>
  - vector<T> faster than vector<T*>
- Building objects takes time
  - Flat inheritance hierarchy
  - Reduce object containment: class A has member of class B, which has member of class C, …
  - STL platform dependent; need extra layer
Data References
- References are easy to get wrong
  - NO map (dead slow) or uuid etc. (slow + big)
  - Better use indices
- Optimal: TRef / TRefArray
  - Good reason to inherit from TObject!
  - Extremely fast object dereferencing, embedded in ROOT I/O
  - Support for merged TTrees
  - Support for autoloading of branches
Non-Split Collections
- Non-split storing of C:
    class C { int fM, fN; };
    class D0 { vector<C> fC; };
  1. object-wise
     - Faster object retrieval
  2. member-wise
     - Faster member retrieval
     - Better compression
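To make the difference between the two non-split modes concrete, here is a small ROOT-free sketch of the order in which the values of a vector<C> would end up in a buffer. The function names and the use of a plain std::vector<int> as the "buffer" are assumptions of this illustration; ROOT's real streamers also write headers, byte counts and version information.

```cpp
#include <cassert>
#include <vector>

struct C {
  int fM;
  int fN;
};

// Object-wise: all members of one element, then the next element.
std::vector<int> StreamObjectWise(const std::vector<C>& v) {
  std::vector<int> out;
  out.reserve(2 * v.size());
  for (const C& c : v) {
    out.push_back(c.fM);
    out.push_back(c.fN);
  }
  assert(out.size() == 2 * v.size());
  return out;
}

// Member-wise: one member for all elements, then the next member.
std::vector<int> StreamMemberWise(const std::vector<C>& v) {
  std::vector<int> out;
  out.reserve(2 * v.size());
  for (const C& c : v) out.push_back(c.fM);
  for (const C& c : v) out.push_back(c.fN);
  assert(out.size() == 2 * v.size());
  return out;
}
```

For two elements {1, 10} and {2, 20}, the object-wise order is 1, 10, 2, 20 and the member-wise order is 1, 2, 10, 20. Grouping the values of one member tends to put similar values next to each other, which is one plausible reason for the better compression.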
TTree's Data Layout
- TTree::Fill() adds to baskets
- Baskets: most important internal concept of TTrees!
    class C { int fM; long fN; };
[Figure: Objects (fM, fN) are filled into Baskets (fM fM fM …; fN fN fN …); the TTree Header lists C.fM and C.fN with an offset for each basket.]
TTree's Data Layout: Baskets
- Baskets concatenate collection elements (e.g. std::vector<C>) across TTree entries
- When a basket is full: zip, write to TFile, store file offset in TTree header
- Important parameters:
  - basket size
  - sizeof(element)
  - sizeof(element) * collection entries per tree entry
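As a rough illustration of how these three parameters interact, the following sketch estimates how many TTree entries share one basket of a branch. The helper name and the example numbers are assumptions made for this example; per-basket headers and compression are ignored, so this is not ROOT's actual bookkeeping.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: estimate how many TTree entries fit into one
// (uncompressed) basket of a branch that stores one member of each
// collection element. Ignores basket headers and zipping.
std::size_t EntriesPerBasket(std::size_t basketSize,
                             std::size_t elementSize,
                             std::size_t elementsPerEntry) {
  assert(elementSize > 0 && elementsPerEntry > 0);
  const std::size_t bytesPerEntry = elementSize * elementsPerEntry;
  return basketSize / bytesPerEntry;  // integer division: whole entries only
}
```

With a 32000-byte basket, 4-byte values and 10 collection elements per tree entry, `EntriesPerBasket(32000, 4, 10)` gives 800 entries per basket. With 10000 elements per entry the result is 0: a single tree entry no longer fits into one basket.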
1 versus 1M Branches
- Each branch has management overhead (baskets, …): 1 branch ideal!
- Each branch can be accessed independently, without reading anything else: 1M branches ideal!
- Reasonable number of branches: tens to a few hundred
Spin, little disk, spin!
Reading TTrees
- Task: read (subset of) all branches for a TTree entry number
  - Get file offsets for requested branches
  - Read necessary baskets from file
  - Unzip baskets and fill objects
- Plus: schema evolution, endianness, …
Reading TTrees
- Reading baskets: what can happen?
  - Only part of basket is needed
  - Need to skip baskets of other branches
  - Huge basket size, tiny contained values: basket might be written at end of file
Read Access Pattern
- Traditional TTree has many small reads
I/O Performance Analysis
- Monitor TTree reads with TTreePerfStats
- New in v5.25/04!
    TFile *f = TFile::Open("xyz.root");
    TTree *T = (TTree*)f->Get("MyTree");
    // Attach the monitor before reading so that all reads are recorded
    TTreePerfStats *ps = new TTreePerfStats("ioperf", T);
    Long64_t n = T->GetEntries();
    for (Long64_t i = 0; i < n; ++i) {
      T->GetEntry(i);   // read all active branches of entry i
      DoSomething();    // placeholder for the analysis code
    }
    ps->SaveAs("perfstat.root");
Study TTreePerfStats
- Visualizes read access:
  - x: tree entry
  - y: file offset
  - y: real time
    TFile f("perfstat.root");
    ioperf->Draw();
    ioperf->Print();
Reading Baskets
- Problem: many seeks
  - Reduces throughput from O(100) MB/s to O(1) MB/s (real time)
  - Disk cannot support >1 user
- Latency for each request
  - typical network, typical file: 120 ms round trip, 1M reads: one day waiting time!
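The latency estimate is plain multiplication, sketched here so the numbers can be varied. The function name is an assumption of this example; the model charges one full round trip per request and ignores transfer time and any overlap with analysis.

```cpp
#include <cassert>

// Total time spent waiting on latency alone, in hours, when every read
// request pays one full round trip (no batching, no overlap with analysis).
double LatencyWaitHours(double roundTripSeconds, double numRequests) {
  assert(roundTripSeconds >= 0 && numRequests >= 0);
  return roundTripSeconds * numRequests / 3600.0;
}
```

`LatencyWaitHours(0.120, 1e6)` gives about 33 hours, on the order of the day quoted above. Sending the same reads as 1000 batched requests would reduce the latency cost to about two minutes.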
Legi, Vidi, Vici!
Fewer Requests, Part 1
- Less sensitive to latency
  - better ask once for 1M baskets than 1M times for 1 basket
- NOT A SOLUTION: only one branch
  - no granularity
  - no parallelization
  - charging network with irrelevant data
Fewer Requests, Part 2
- Less sensitive to latency
  - better ask once for 1M baskets than 1M times for 1 basket
- NOT A SOLUTION: only one branch
- Better: sending a collection of requests
  - Storage (kernel / disk / disk server) can sort requests
TTreeCache
- Sends a collection of read requests before analysis needs the baskets
- Must predict baskets:
  - learns from previous entries
  - takes TEntryList into account
- Enabled per TTree
- Improved in v5.25/04!
    TFile *f = new TFile("xyz.root");
    TTree *T = (TTree*)f->Get("Events");
    T->SetCacheSize(30000000);   // cache size in bytes (30 MB)
    T->AddBranchToCache("*");    // cache all branches
TTreeCache Usage
- Without: analysis after transfer + latency:
  [Figure: CPU and I/O timelines]
- With TTreeCache, transfer and analysis of prior TTree entry in parallel:
  [Figure: CPU and I/O timelines]
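The gain from overlapping transfer and analysis can be estimated with a two-stage pipeline model. This is an idealized sketch: the function names are assumptions of this example, every entry is taken to cost the same transfer and analysis time, and exactly one entry is prefetched ahead.

```cpp
#include <algorithm>
#include <cassert>

// Without prefetching: every entry first waits for its transfer,
// then gets analyzed.
double SerialTime(double transfer, double analysis, int nEntries) {
  assert(nEntries >= 0);
  return nEntries * (transfer + analysis);
}

// With prefetching: while entry i is analyzed, entry i+1 is transferred.
// Only the first transfer and the last analysis are not overlapped.
double OverlappedTime(double transfer, double analysis, int nEntries) {
  assert(nEntries >= 0);
  if (nEntries == 0) return 0.0;
  return transfer + (nEntries - 1) * std::max(transfer, analysis) + analysis;
}
```

For 4 entries with 2 time units of transfer and 3 of analysis each, the serial schedule takes 20 units and the overlapped one 14. For many entries the total approaches the cost of the slower of the two stages alone.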
TTreeCache vs. Seeks
- TTreeCache sends collected read request (readv)
- Merges only adjacent baskets, reducing number of requests by almost nothing
- Disks hate seeking, love sequential reading
- Much cheaper to read 2MB than to read 1k at the beginning and 1k at the end
Read Padding
- Merges all read requests within a given distance by also requesting bytes in between
- Typical window: 2MB
- Dramatically reduces load on storage device, even local disk
- Dramatically increases throughput
- Must-use for concurrent storage access and / or network
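A minimal sketch of the read padding idea, assuming that "distance" means the gap in bytes between the end of one request and the start of the next. The type and function names are hypothetical and this is not ROOT's implementation; it only shows how nearby requests collapse into one larger sequential read.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// A byte range [offset, offset + length) in a file.
struct ReadRequest {
  std::int64_t offset;
  std::int64_t length;
};

// Merge every request whose gap to the previous (already merged) request is
// at most `window` bytes; the bytes in the gap are read as padding.
std::vector<ReadRequest> MergeWithPadding(std::vector<ReadRequest> reqs,
                                          std::int64_t window) {
  assert(window >= 0);
  std::sort(reqs.begin(), reqs.end(),
            [](const ReadRequest& a, const ReadRequest& b) {
              return a.offset < b.offset;
            });
  std::vector<ReadRequest> merged;
  for (const ReadRequest& r : reqs) {
    if (!merged.empty()) {
      ReadRequest& last = merged.back();
      const std::int64_t lastEnd = last.offset + last.length;
      if (r.offset - lastEnd <= window) {
        // Extend the previous request to cover the gap and this request.
        last.length = std::max(lastEnd, r.offset + r.length) - last.offset;
        continue;
      }
    }
    merged.push_back(r);
  }
  return merged;
}
```

With a 2 MB window, requests at offsets 0 and 1500 are merged into one 2000-byte read, while a request at offset 5000000 stays separate because the gap is larger than the window.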
Half Way
- Much more ordered reads
- Still lots of jumps because baskets are spread across the file
Problem: Basket Size
- Ideally, reading a TTree entry is one seek
  - All TTree entries' baskets consecutive
- In reality, most baskets not full after filling a TTree entry
  - Baskets shared by several TTree entries
  - Need to seek to read all baskets
OptimizeBaskets, AutoFlush
- Solution, enabled by default:
  - Tweak basket size!
  - Flush baskets at regular intervals!
- New in v5.25/04!
TTree Optimizations: Results
- Studying Atlas and CMS AOD files
- Results for Atlas: factor 6 improvement!
  - That can be 1 hour instead of 6!
  - Concurrent data access now possible
                  TTreeCache off                   30MB TTreeCache
  Original File   658s real time, 166s CPU time    183s real time, 126s CPU time
  Optimized File  117s real time, 102s CPU time    109s real time, 99s CPU time
We know how to process your data!
TSelector
- Everybody uses TTrees
- Obvious to create a common analysis framework
- Derive from TSelector
- Separates analysis into steps
  - Init() – "this is your tree!"
  - SlaveBegin() – "create your histogram!"
  - Process() – "analyze the event!"
  - Terminate() – "we're done, fit!"
Parallel Analysis
- Analyze several TTree entries in parallel, e.g. in a batch
- Typical steps:
  1. send code
  2. send split data
  3. analyze
  4. merge results
- Use the same TSelector also for parallel analysis!
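The split, analyze and merge steps can be sketched without ROOT. In this illustration a "histogram" is just a vector of bin counts, the data is a vector of integers, and the workers are simulated one after the other. All names and the static equal-size split are assumptions of this sketch, not how a real batch system or PROOF distributes work.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a histogram: a fixed number of integer bins.
using Histogram = std::vector<long>;

// "analyze": fill a histogram from the entries in [begin, end).
Histogram AnalyzeRange(const std::vector<int>& entries,
                       std::size_t begin, std::size_t end,
                       std::size_t nBins) {
  Histogram h(nBins, 0);
  for (std::size_t i = begin; i < end; ++i) {
    const int v = entries[i];
    if (v >= 0 && static_cast<std::size_t>(v) < nBins) ++h[v];
  }
  return h;
}

// "send split data" + "analyze" + "merge results". Workers are simulated
// sequentially here; a real system would run them concurrently.
Histogram ProcessSplit(const std::vector<int>& entries,
                       std::size_t nWorkers, std::size_t nBins) {
  assert(nWorkers > 0);
  Histogram merged(nBins, 0);
  const std::size_t n = entries.size();
  for (std::size_t w = 0; w < nWorkers; ++w) {
    const std::size_t begin = n * w / nWorkers;      // this worker's share
    const std::size_t end = n * (w + 1) / nWorkers;
    const Histogram partial = AnalyzeRange(entries, begin, end, nBins);
    for (std::size_t b = 0; b < nBins; ++b) merged[b] += partial[b];
  }
  return merged;
}
```

Because filling a histogram is additive, the merged result does not depend on the number of workers: `ProcessSplit(entries, 1, nBins)` and `ProcessSplit(entries, 3, nBins)` return the same bins.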
PROOF
[Figure: PROOF farm with Client, MASTER, Workers and Storage; arrows labeled "commands, scripts" and "list of output objects (histograms, …)"]
2009-11-09 Axel Naumann • ROOT @ NTNU Tech Screening 40
Creating a PROOF Session
- In ROOT type:
    TProof *p = TProof::Open("uberpc")
- Connects ROOT to the master machine on the PROOF cluster (here: "uberpc")
- TSelectors will be run in PROOF
PROOF Lite
[Figure: Client on a Multi-core Desktop/Laptop; arrows labeled "commands, scripts" and "list of output objects (histograms, …)"]
Creating a PROOF Lite Session
- In ROOT type:
    TProof *p = TProof::Open("lite")
- TSelectors will be run on all cores in parallel
- Converts your multi-core computer into a PROOF cluster!
PROOF Analysis
- Example of local TChain analysis
    // Create a chain of trees
    root[0] TChain *c = new TChain("myTree");
    root[1] c->Add("http://www.any.where/file1.root");
    root[2] c->Add("http://www.any.where/file2.root");
    // MySelector is a TSelector
    root[3] c->Process("MySelector.C+");
PROOF Analysis
- Same example with PROOF
    // Create a chain of trees
    root[0] TChain *c = new TChain("myTree");
    root[1] c->Add("http://www.any.where/file1.root");
    root[2] c->Add("http://www.any.where/file2.root");
    // Start PROOF and tell the chain to use it
    root[3] TProof::Open("lite");
    root[4] c->SetProof();
    // Process goes via PROOF
    root[5] c->Process("MySelector.C+");
PROOF Is Interactive
- See results while they accumulate
  - Calculation wrong?
  - Forgot to fill histogram?
  - Restart now instead of in 8 hours
PROOF Is Quick
- Optimized for quick results, not batch system occupancy
  - Send TTree entries to workers while running, based on their past performance
  - Reduces "tail"
- Allowed ALICE to see first collisions after two minutes instead of hours!
PROOF Availability
- PROOF Lite is "just there" in ROOT >= 5.24
- Set up a local PROOF cluster, e.g. allow a batch cluster to also be used by PROOF!
- People who use it and the grid or traditional job-based batches love it
- But you already have it: PROOF@NAF!
We're still not done!
Upcoming Challenges
- Decrease CPU time of I/O
  - Parallel unzipping (CPU time / core)
  - Building objects in a smarter way
- Shorten merge time!
  - Merge in parallel to analysis
  - Easy for histograms etc., tricky for TTrees…
[Figure: I/O, ZIP and CPU (analysis) stages]
What To Take To Your Office
- Many optimizations enabled by default, except for those:
    TFile *f = new TFile("xyz.root");
    TTree *T = (TTree*)f->Get("Events");
    T->SetBranchStatus("*", 0);          // disable all branches
    T->SetBranchStatus("MyBranch*", 1);  // enable only what the analysis reads
    T->SetCacheSize(30000000);           // cache size in bytes (30 MB)
    T->AddBranchToCache("MyBranch*");    // cache the enabled branches
Summary
- I/O costs real time, CPU time
- Performance monitoring and optimizations part of ROOT
- Default optimizers show huge benefit for network transfer and even local files!
- Build a good tree, see how it behaves
- Analyze with PROOF for quick results!