Bob Jacobsen, Feb 6, 2000: BaBar Summary
What has BaBar learned?
Background
Experiences & Tools
Key Issues
Summary
Xnnn: related talks
Project Statistics
CP B physics requires a 30 fb⁻¹ yearly sample for > 5 years
• One year is 30M B events, 120M hadronic, 1.2B Bhabhas seen
• “Factory mode” running for greater than 80% of real time
• 100 Hz of accepted L3 triggers to be read, 30 Hz to fully process
  - Roughly 3 MB/sec of raw data, all day, every day
  - Similar size downstream processing and analysis streams
High capability detector
• 5-layer Silicon Vertex Tracker
• 40-layer low-mass drift chamber
• Novel “DIRC” particle ID
• Crystal calorimeter
• Highly segmented instrumented flux return
But life is never easy
• Severe machine backgrounds
• Significant compromises in geometry & regularity
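The rate figures on this slide can be cross-checked with simple arithmetic on the quoted numbers. Applying the 80% running fraction to a full calendar year is an assumption about how “greater than 80% of real time” turns into a yearly total.

```latex
\begin{aligned}
\text{raw event size} &\approx \frac{3\ \text{MB/s}}{100\ \text{Hz}} = 30\ \text{kB per accepted event}\\
\text{raw volume per day} &\approx 3\ \text{MB/s} \times 86\,400\ \text{s} \approx 259\ \text{GB}\\
\text{raw volume per year} &\approx 259\ \text{GB/day} \times 365 \times 0.8 \approx 76\ \text{TB}
\end{aligned}
```

Since the downstream processing and analysis streams are described as similar in size, the total volume handled is a small multiple of this raw figure.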
Worldwide Collaboration of 80 Institutes
Offline Computing Goal: “Factory Running”
Do physics with a time lag that is as small as possible
• Data to submission of paper in 6 months (asymptotically)
Strategy:
• Volume processing of data and MC to a high level
  - Including physics algorithms and selections in first-pass processing
  - Requires high-quality results from the start
• Remainder of work on specific samples
  - Need efficient access to subsets from large to small
  - Reprocessing in detail
[Figure: first-pass physics selections, grouped by family]
V0 family: Lambda0->p pi-, K0s->pi+ pi-, gamma conversions
Bremsstrahlung recovery
Pi0 family
Charmonium family: Psi(2S), J/Psi
D meson family: D+->K-pi+pi+, D0->K-pi+, D0->K-pi+pi0, D0->K-pi+pi+pi-, D0->Ks pi+pi-
D* family: D*+->D0 pi+, D*0->D0 pi0
phi->K+K-, eta->gamma gamma, eta->pi+pi-pi0, eta'->rho0 gamma, eta'->eta pi+pi-, K*0->K-pi+, K*0->Ks pi0, K*+->Ks pi+, K*+->K+pi0, omega->pi+pi-pi0, rho0->pi+pi-, rho+->pi+pi0, a1+->rho0 pi+
Inclusive charmless
Exclusive B to open charm: B- -> D*0 pi-, B- -> D*0 rho-, B- -> D*0 a1-, B- -> D0 pi-, B- -> D0 rho-, B- -> D0 a1-, B0 -> D- pi+, B0 -> D- rho+, B0 -> D- a1+, B0 -> D*- pi+, B0 -> D*- rho+, B0 -> D*- a1+
Does this strategy work?
• Too early to tell for sure
  - First you have to get it running, then see if it improves ability to do physics
  - Still bringing the system up
• By next summer, expect to know more
All is not sweetness and light
• Can the system keep up with billions of events and hundreds of physicists?
• Our event store is not yet transparent
  - Throughput problems
  - Data distribution problems
  - Still trying to get granularity right
• “Can senior people with good intuition contribute?”
Running Experience
First collision data was May 26, 1999
• First events processed that same shift
In subsequent 230 days:
• 194M colliding-beam events were recorded in 3224 runs
• 250M events were reconstructed (some more than once)
• 34TB of data were stored on 900 HPSS tapes
Several high visibility problems
• Not keeping up with data
  - At all levels
• Lots of algorithm work to do
• Calibrations slow to converge
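The totals above imply some simple averages. These are plain divisions of the quoted numbers; they say nothing about how the load was distributed over time.

```latex
\begin{aligned}
\frac{194\times10^{6}\ \text{events}}{3224\ \text{runs}} &\approx 6.0\times10^{4}\ \text{events per run}\\
\frac{250\times10^{6}\ \text{reconstructed}}{194\times10^{6}\ \text{recorded}} &\approx 1.3\ \text{reconstructions per recorded event}\\
\frac{34\ \text{TB}}{900\ \text{tapes}} &\approx 38\ \text{GB per tape}\\
\frac{194\times10^{6}\ \text{events}}{230\ \text{days}\times 86\,400\ \text{s/day}} &\approx 9.8\ \text{Hz averaged over calendar time}
\end{aligned}
```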
Fast startup
Overview
Have to refer you to previous explanations of the BaBar approach
• Obstacles as of 1995
• Changes in transition
• We stress cyclic involvement, evolvability
The Martin Uncertainty Principle:
“The problem cannot truly be understood until the solution exists.”
We pushed out new technologies fast
• Used flexibility of system to adapt to improved understanding
• You store up trouble this way
  - Not everything gets completely updated
  - Start to have trouble remembering/understanding our reasons
• Are we capturing what we know?
Was running on day one, continues to improve
• Capability comparable to or better than past experiments
• Coping with a large processing load
• Flexibility is still important
Obstacles as of 1995
Nothing to build on
• Lots to do
• Too many possibilities for the first pieces
Inverted schedule
• Analysis and design are high skill activities
• But have to come at the beginning, before skills are developed
No clear agreement on product
• Everybody knows what a track finder does, but few agree
• Real product is flexibility
  - Waiting for that “smart idea” in 2003
Not much expertise & effort available
• Much existing expertise of doubtful applicability
• C++ advocates had limited design experience
• Mismatch between enthusiasm and effectiveness
I expect these are common issues beyond BaBar
CHEP97 slide
What changes in the transition to OO/C++? (Better/worse?)
• FORTRAN77, perhaps with VAX extensions: Better
• “Extensions” to add functionality and control (ZEBRA/BOS/..., ADAMO): some native, some missing
• Code management tools (HISTORIAN, homegrown tools): easier, but needed development
• Standards, practices, policies
  - Common lore of the HEP programming community: missing
  - Design idioms and normal practices: missing
  - Locally developed, customized, documented: missing
• Programmer skill, commitment and ingenuity: Much worse
“If you expect a language to solve all your problems, you don’t have interesting problems” - A. Koenig
CHEP97 slide
Multi-prong attack
Architecture
• Solving the “too many first choices” problem
• OO design expressing “traditional” concepts
Iterative design & implementation process
• Applying “evolutionary pressure”
  - Design and implementation intertwined
• Strike multiple balances
  - Learn by doing, do while learning
  - “Getting something in place” vs. design work
• In spite of imposed waterfall-model schedule
Gain control of the process by controlling the product
• Code management
• Quality control and assurance
Team-building
• Getting the people
• Formal training balancing experience and exposure
CHEP97 slide
Our “design process”
We use an evolutionary approach
• People enter coding
• Eventually, they start to draw clouds and blobs
• Many of them become good designers
Evolution improves the system
• Relevant code is used
  - Comments are not always gentle
• Release system controls the pace
  - Biweekly timescale
• New designs, redesigns are ongoing
  - Driven by perceived needs
Policy: “Get them engaged, then work with them”
[Diagram: cycle of What next? → Design it → Code it → Release & use it]
CHEP97 slide
C++ & OO
People will write C++
Structure varies a lot
• FORTRAN with ; and #include
• C with abstract datatypes
• “The True Style” (whatever that means)
  - data hiding
  - reuse by inheritance
  - abstract interfaces
  - generic programming
Flexibility is both a strength and a weakness
We’ve had some very significant successes
• Calibration model
• Track model
• Physics analysis tools
[Figure: lines of code in releases, F77 vs. C++, Dec-95 to Apr-99]
A328, B112, C106
“C++ is harder to learn than FORTRAN”
Unfortunate, but true
• Perhaps you can justify it
• Can lead to mistakes
Need efforts to limit impact
• Training
• Mentoring
C++ and especially OO puts off a number of senior, experienced people
• Even with specific efforts to couple in, this has cost us
• PIs are less likely than postdocs to contribute to reconstruction and simulation
• Will it extend to analysis? For how long?
“Our code is slow”
Strategy was structure & function first, then worry about time
Lessons:
• Make sure you understand which are the most severe problems
• If it gets too far out of hand, you strain working relations
• You can never catch up with a bad impression
[Figure: CPU time (333 MHz Sun) vs. date, without and with background]
Natural trend is upward
• Updated algorithms always get more complex, especially when real data arrives
• New algorithms run in parallel with existing ones
• Generality costs speed, but still too early to sacrifice
But a lot of code is just inefficient
• Ongoing attention to detail recovers large performance increments
[Figure: per-subsystem timing (DCH, DRC, EMC, IFR, MISC, SVT, TRK), 11/8/99 to 2/6/00]
“Still hard to find bugs”
New places for them to hide
• Harder to even know they exist
• “C++ is a pig of a language from a memory leak point of view”
Need unfamiliar tools
• Purify, Great Circle, etc.
• Need to be routinely run
  - Which means centrally
We do this per release
• But nobody wants that job
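The memory-leak complaint is largely about ownership: with raw `new`/`delete`, any early return or exception can skip the `delete`. Below is a minimal sketch of one standard C++ mitigation, RAII via `std::unique_ptr`. It illustrates the language technique only and is not a description of what BaBar code did; `TrackFit` and `fitTracks` are hypothetical names, and the instance counter exists only to make a leak observable.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical class that counts its live instances, so a leak is observable.
struct TrackFit {
    static int live;
    TrackFit() { ++live; }
    TrackFit(const TrackFit&) = delete;
    TrackFit& operator=(const TrackFit&) = delete;
    ~TrackFit() { --live; }
};
int TrackFit::live = 0;

// Leak-prone pattern, for comparison (shown as a comment, not compiled):
//   TrackFit* fit = new TrackFit;
//   if (failEarly) return false;   // early return skips the delete below
//   delete fit;
//
// RAII version: std::unique_ptr owns each object, so every exit path
// (normal return, early return, or exception) destroys it.
bool fitTracks(int nTracks, bool failEarly) {
    assert(nTracks >= 0);
    std::vector<std::unique_ptr<TrackFit>> fits;
    for (int i = 0; i < nTracks; ++i)
        fits.push_back(std::make_unique<TrackFit>());
    if (failEarly) return false;
    if (nTracks > 1000) throw std::runtime_error("too many tracks");
    return true;
}
```

RAII does not remove the need for tools like Purify: reference cycles of `std::shared_ptr`, for example, still leak.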
“But easier to scale/modify/adapt”
Examples
• Algorithm flux at first data
  - New pattern recognition & fitting
  - Without trashing interconnections
• Layered physics tools
• Adding another persistency mechanism
Usually due to abstract data types & information hiding
• Physicists connect with these well
• Rather than more advanced techniques
“Where we have experienced problems, could it have been that we weren't OO enough?”
• Quote is not from a computing specialist!
How do people learn these skills?
Some fraction of people will only read a C++ book and generalize
• Often not interested to seek out BaBar-specific information
• This has implications for system design
  - People haven’t even seen the recommended solutions!
Many will seek out information
• Interested in learning design principles, vocabulary
  - We use commercial courses, as we did not find anything better
• HEP & BaBar specifics were taught informally
Battle-testing the architecture
Now solving new and harder problems
• Real data and real use are not quite as expected
  - Despite useful results from Mock Data Challenges
Current work made possible by design choices a long time ago
• Can architecture simultaneously flex and support weight?
Next slides discuss experience with key aspects:
• Module/Event/Environment structure
• Transient/persistent split
• Rolling calibrations
• Low-latency processing
• Objectivity persistency
Module, event and environment structure - reminder
Modules provide the algorithms
• Use existing information to create new objects
  - Styles range from procedural monoliths to OO castles
• Framework/AC++ provides control & config
  - Uses TCL scripting, command line
Production executables run 300 modules
Objects have behaviors, not just values
• “Networks of objects collaborate to provide semantics”
• Internal form of our track objects is irrelevant
Objects kept in event and environment
• Named access in a flat space
  - event -> Ifd<EmcCluster>::get("MergedClusters")
• Implemented via ProxyDict
  - Proxies provide complex access when needed
Ensures physical decoupling
[Figure: modules EmcClustering and TrackAssociator, with lots of EmcDigis, EmcClusters, RecoTracks and Associations passing between them via the event]
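The slide's access pattern, `Ifd<EmcCluster>::get("MergedClusters")`, looks objects up by type and name in a flat space, with proxies handling any complex access behind the scenes. The sketch below illustrates that idea under stated assumptions: `Event`, `put`, `putProxy` and `get` are hypothetical names, not BaBar's `Ifd`/`ProxyDict` API, and the proxy is modeled as a lazily called factory whose result is cached.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <utility>
#include <vector>

// Hypothetical example payload type.
struct EmcCluster { double energy; };

// Hypothetical flat event space: objects are found by (C++ type, string name).
class Event {
public:
    // Store an object that already exists.
    template <typename T>
    void put(const std::string& name, std::shared_ptr<T> obj) {
        assert(obj != nullptr);
        Entry& e = entries_[key<T>(name)];
        e.object = std::move(obj);
        e.proxy = nullptr;
    }

    // Register a proxy: the object is built only the first time someone asks for it.
    template <typename T>
    void putProxy(const std::string& name, std::function<std::shared_ptr<T>()> make) {
        Entry& e = entries_[key<T>(name)];
        e.object = nullptr;
        e.proxy = [make]() -> std::shared_ptr<void> { return make(); };
    }

    // Typed lookup by name; returns nullptr if nothing of that type and name exists.
    template <typename T>
    std::shared_ptr<T> get(const std::string& name) {
        auto it = entries_.find(key<T>(name));
        if (it == entries_.end()) return nullptr;
        Entry& e = it->second;
        if (!e.object && e.proxy) e.object = e.proxy();  // deferred creation, cached afterwards
        return std::static_pointer_cast<T>(e.object);
    }

private:
    struct Entry {
        std::shared_ptr<void> object;
        std::function<std::shared_ptr<void>()> proxy;
    };
    using Key = std::pair<std::type_index, std::string>;

    template <typename T>
    static Key key(const std::string& name) { return Key(std::type_index(typeid(T)), name); }

    std::map<Key, Entry> entries_;
};
```

Because modules share only type/name keys, EmcClustering and TrackAssociator in such a scheme need no compile-time knowledge of each other, which is the physical decoupling the slide describes; deferred I/O or caching can then be added inside a proxy without changing any caller.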
A success!
Linear processing model well suited to production work
• Command-line configuration vital for development
Configuration issues at this size
• Largest executables are becoming rigid due to amount of configuration
  - TCL does work at this scale, but we have to invest in cleaning up
• Ad-hoc application setup hard to maintain
• Tools to deal with this have not been a high priority
  - Configuration dump/restore
  - Configuration tool with knowledge of prerequisites and large-scale options?
Event/Environment model works well
• Average collaborator completely shielded from underlying access
• Have been able to add deferred I/O, caching, context control
Persistent - transient split
Good points:
• “Pointers like you read in a C++ book” are valuable
  - That's all many people will want to know
• Performance gain at reference time
  - BaBar objects are small, with lots of pointer interconnections
• Allows complexity in the transient model, independent of persistent model
  - Necessary for reprocessing and replication
• Fine control
  - Partial read, incremental read possible
  - Handling semantics (schema) changes
• Proven effective in Kanga project
  - Built by non-experts
Bad points:
• Performance cost at I/O time
  - Adds 1-2 msec + 15% to access
  - Significant for lightweight, high-speed processing
• Creation of smartest scribes requires some effort
Now believe structure robust enough to move to proxied access for some data
[Diagram: transient classes Digi, SubCluster, Cluster and their persistent counterparts DigiP, SubClusterP, ClusterP, with association multiplicities]
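A minimal sketch of the transient/persistent split, assuming a simplified model: transient `Digi`/`Cluster` objects hold ordinary pointers, while the persistent `DigiP`/`ClusterP` (names echoing the diagram; their fields here are hypothetical, and SubCluster is omitted) hold only values and indices. The `write` and `read` functions play the role of the scribes. The extra conversion pass is the kind of I/O-time cost the slide lists; the quoted 1-2 msec + 15% is BaBar's measurement, not something this sketch reproduces.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_map>
#include <utility>
#include <vector>

// Transient side: small objects connected by ordinary pointers.
struct Digi { int channel; double energy; };
struct Cluster { std::vector<const Digi*> digis; };
struct EventT {
    std::vector<std::unique_ptr<Digi>> digis;  // owns the digis that clusters point to
    std::vector<Cluster> clusters;
};

// Persistent side: no pointers, only values and indices into flat arrays.
struct DigiP { int channel; double energy; };
struct ClusterP { std::vector<std::size_t> digiIndex; };
struct EventP { std::vector<DigiP> digis; std::vector<ClusterP> clusters; };

// Hypothetical scribe, transient -> persistent: replace pointers by indices.
EventP write(const EventT& t) {
    EventP p;
    std::unordered_map<const Digi*, std::size_t> index;
    for (std::size_t i = 0; i < t.digis.size(); ++i) {
        const Digi& d = *t.digis[i];
        p.digis.push_back({d.channel, d.energy});
        index[&d] = i;
    }
    for (const Cluster& c : t.clusters) {
        ClusterP cp;
        for (const Digi* d : c.digis) {
            auto it = index.find(d);
            assert(it != index.end() && "cluster refers to a digi the event does not own");
            cp.digiIndex.push_back(it->second);
        }
        p.clusters.push_back(std::move(cp));
    }
    return p;
}

// Hypothetical scribe, persistent -> transient: rebuild the pointer network.
EventT read(const EventP& p) {
    EventT t;
    for (const DigiP& dp : p.digis)
        t.digis.push_back(std::make_unique<Digi>(Digi{dp.channel, dp.energy}));
    for (const ClusterP& cp : p.clusters) {
        Cluster c;
        for (std::size_t i : cp.digiIndex) {
            assert(i < t.digis.size());
            c.digis.push_back(t.digis[i].get());
        }
        t.clusters.push_back(std::move(c));
    }
    return t;
}
```

Because the persistent form never stores addresses, the transient model can change its internal pointer structure without touching stored data, which is the independence the slide credits for reprocessing and replication.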
Prompt reconstruction
Rolling calibration
• Technically difficult
  - Scatter/gather over entire farm
  - Requires automation
• Difficult organizationally
  - Crosses many lines
• Necessary for BaBar
Low-latency processing
• Causes much entropy during experiment startup
  - Reprocessing needed to understand initial data
  - But farms and people busy processing newly arrived data
• How to balance new and old?
[Figure: successive runs (Run 1, Run 2, Run 3)]
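Rolling calibration as a scatter/gather can be sketched as follows, under assumptions: each farm node accumulates per-channel partial sums for a hypothetical quantity (a per-channel mean, such as a pedestal), the gather step merges them, and the merged result becomes the constants for the next run, while channels with no new data keep their previous constants. `PartialCalib`, `gather` and `rollConstants` are illustrative names, not BaBar code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Per-channel partial sums accumulated by one farm node (hypothetical quantity:
// a per-channel mean such as a pedestal).
struct PartialCalib {
    std::vector<double> sum;
    std::vector<long> count;

    explicit PartialCalib(std::size_t nChannels) : sum(nChannels, 0.0), count(nChannels, 0) {}

    void add(std::size_t channel, double value) {
        assert(channel < sum.size());
        sum[channel] += value;
        ++count[channel];
    }
};

// Gather step: merge the partial results from all nodes.
PartialCalib gather(const std::vector<PartialCalib>& nodes, std::size_t nChannels) {
    PartialCalib total(nChannels);
    for (const PartialCalib& node : nodes) {
        assert(node.sum.size() == nChannels);
        for (std::size_t c = 0; c < nChannels; ++c) {
            total.sum[c] += node.sum[c];
            total.count[c] += node.count[c];
        }
    }
    return total;
}

// Roll forward: channels with new data get an updated constant,
// the rest keep the constant from the previous run.
std::vector<double> rollConstants(const std::vector<double>& previous, const PartialCalib& total) {
    assert(previous.size() == total.sum.size());
    std::vector<double> next = previous;
    for (std::size_t c = 0; c < next.size(); ++c)
        if (total.count[c] > 0) next[c] = total.sum[c] / static_cast<double>(total.count[c]);
    return next;
}
```

Accumulating sums and counts rather than per-node means is what makes the gather step a simple addition that can merge nodes in any order (up to floating-point rounding).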
Objectivity persistence
Objectivity, OO databases, event store are three different things
Objectivity
• Commercial product, strengths and weaknesses
• BaBar-provided licenses in use at 30 institutions in 6 countries
OO database
• BaBar has 508 persistent classes, developed by about 60 people
• Works well for online, conditions, config information
Event store
• Currently holding 33 TB of data in 28000 collections on 12 servers
  - Typically 90 simultaneous users at SLAC
  - 430 people have used it
• Provides a number of significant possibilities & questions
  - Can it be matched to the sequential access demands?
  - Is drill-down analysis worthwhile?
• Not yet a proven concept
C103
Prompt reconstruction production
Steady-state rate pretty good
But still problems
• Startup/shutdown time
  - BaBar takes short runs
• Running fraction
A288
Got here via program of optimization
C110
Analysis running is more complicated
C103
[Figure legend: unscheduled outages, scheduled outages]
Data distribution inside and outside of SLAC
• Local swapping via HPSS is a success
• Regional centers have invested in making this work
• Problem at remote universities
  - size
  - skill and effort
• Especially for MC production
  - “Here's a tape, you deal with it”
Remote MC production
• Very hard to set up remote production
  - Too much SLAC-specific context has crept into the system
  - Support for local infrastructure not generally available
• Large-scale production is resource intensive
C372
Collections and tag bits
A collection is a set of events
• Gives direct access to all the parts of the event
• Created during processing/scanning/reprocessing
  - Collection gathers together the results of partial event reprocessing
• Novel concept for physics analysis
  - Usage is rapidly ramping up, with 28k now
  - Requires organization to use these collaboratively
Collection maintains “Tag” quantities
  - 400+ tag quantities now: bits and values
  - Logical operations allow faster scans
  - Relentless pressure to increase size of Tag data
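The claim that logical operations make tag scans fast can be illustrated with a fixed-size bitset per event: a selection becomes one AND-and-compare per event plus cuts on tag values, without reading the full event. The bit names, the 64-bit width and the `TagRecord` layout below are hypothetical; the slide's 400+ tag quantities would need a wider record.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <vector>

constexpr std::size_t kNumTagBits = 64;  // illustrative width only

// Hypothetical tag-bit assignments.
enum TagBit : std::size_t { IsHadronic = 0, HasJpsi, HasKs, HasLepton };

using TagBits = std::bitset<kNumTagBits>;

struct TagRecord {
    long eventId;
    TagBits bits;        // boolean tag quantities
    double totalEnergy;  // an example tag "value"
};

// Build a mask from a list of tag bits.
TagBits mask(std::initializer_list<TagBit> bits) {
    TagBits m;
    for (TagBit b : bits) {
        assert(b < kNumTagBits);
        m.set(b);
    }
    return m;
}

// Scan only the tag data: keep events that have every required bit set and pass a value cut.
std::vector<long> scan(const std::vector<TagRecord>& tags, const TagBits& required, double minEnergy) {
    std::vector<long> selected;
    for (const TagRecord& t : tags)
        if ((t.bits & required) == required && t.totalEnergy >= minEnergy)
            selected.push_back(t.eventId);
    return selected;
}
```

Each added tag quantity widens every record, which is one way to read the slide's “relentless pressure to increase size of Tag data”: scans stay cheap only while the tag records stay small.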
Roles for ROOT
“Kanga” project: use ROOT I/O to store micro-DST
• Added late, as plug-in to existing system
• Limited-function copy of some conditions values
• Allows ROOT/Objectivity hybrid
• Middle-road solution suffers by comparison
Interactive ROOT used in several ways in analysis
• Ntuple and histogram tool
• Pico-Analysis-Framework (PAF)
  - Separate classes from rest of system
  - Access via replicated interfaces to underlying analysis data
How do we get access to the functionality of the objects?
• ROOT interactive analysis for non-ROOT event store?
• Requires deep understanding of object relationships
CORBA/Java/XML
Highest-tech lives in the online system
• Much more literate, homogeneous group
Offline use limited to event display and browsers
WIRED display
• CORBA servers
• Access to central event store
D161
B374
D290
F118
The concept of software project management
Something new in this generation of experiments
• Bigger systems are possible/necessary now, and they sop up all the technical gains
• Example: BaBar analysis tools run in production
Still learning how to do this
• Similar, yet different from hardware projects
• BaBar’s matrix organization by system and computing area
  - Which way will people sign up in the beginning?
  - Which is more stable in the long run?
"Data handling" as respected subject
• Collaborations paying attention in advance
• Bookkeeping critical to success
• Robots allow access to raw data, instead of waiting for yearly bulk reprocessing
• Big issue for off-site work - is this really getting better?
“In art, intentions are not enough. What counts is what one does, not what one intends to do.” - Pablo Picasso
Code Management & Release Management
CVS - SoftRelTools approach
• Collaboration-wide read/write access to code in CVS
  - Organized as 630 packages
  - We don’t attempt to keep the HEAD production quality
• Package coordinators
  - One per package
  - Tags and announces when new version ready for use
• Build periodic releases from these tagged versions
  - Integration and testing to production now takes two weeks
“100KLOC is easy; we know how to do 1MLOC; 10MLOC is hard”
• Examples of what we've had trouble with
  - Transition to “use & production” instead of development
  - Introduced a more reliable (rigid) one-month cycle
  - Imposing a freeze on processing code now to create summer CP violation sample
• Cannot imagine getting this “right”
New issue - runtime environment management
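The release scheme above (HEAD not kept at production quality; releases built only from coordinator-tagged versions) can be modeled in a few lines. This is a hypothetical sketch, not SoftRelTools: `Package`, `buildRelease`, the package names and the tag strings are invented for illustration.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical package record: what is on the CVS HEAD versus what its coordinator announced.
struct Package {
    std::string name;
    std::string head;                        // current HEAD: not assumed production quality
    std::vector<std::string> announcedTags;  // coordinator-announced tags, oldest first
};

// Assemble a release by pinning each package to its latest announced tag.
// Packages with no announced tag are reported rather than taken from HEAD.
std::map<std::string, std::string> buildRelease(const std::vector<Package>& packages,
                                                std::vector<std::string>& untagged) {
    std::map<std::string, std::string> release;
    std::set<std::string> seen;
    for (const Package& p : packages) {
        bool inserted = seen.insert(p.name).second;
        assert(inserted && "package listed twice");
        (void)inserted;
        if (p.announcedTags.empty()) {
            untagged.push_back(p.name);
            continue;
        }
        release[p.name] = p.announcedTags.back();
    }
    return release;
}
```

The `head` field is deliberately never read: in this model, only what a coordinator has tagged and announced can enter a release.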
Distributed multi-platform development
Use native compilers & tools on Sun/Solaris, DEC/Compaq and Linux
• Code to a common subset, empirically enforced
We're still waiting for the compiler promised land
• Recent migration to Linux was interesting
• Still need to think about issues beyond C++ semantics/syntax
  - E.g. template instantiation, inline tricks
• Ongoing problems with STL, bool
Complete builds take days
• Especially with optimization
• Poor interactions with templates
People keep saying compilers are getting better.
It is not happening fast.
E309
Collaborations
We collaborate with a number of others
• GEANT4, RD45
• CDF
• CLHEP / ZOOM
• JAS, ROOT
Generally most successful as intellectual collaborations
• Work on areas of common concern
• BaBar timescale often forces different approaches
• Limited common code development
Truly unfortunate how hard it is to share code
• “You can’t avoid choosing base classes”
• Net result is continuing reimplementation of common tasks
• Are we missing some simple technology for this?
Connection to GEANT4
GEANT4 simulation is a critical part of our strategy
• Need integration with rest of system
• GEANT3 is not a good neighbor
“BOGUS”, BaBar G4 sim, works at several levels
• “Detailed BOGUS” is replacement for “bbsim”, our G3 simulation
• “Fast Bogus” is replacement for ASLUND, our smeared simulation
• “Very Fast Bogus” fills a new niche
• All three are in use, but not yet default
We consider GEANT4 a success
• Good interactions!
• Able to build real products with it!
It has been a long, major effort, and it's not done yet
• Still not as reliable as GEANT3 (G3 had a head start)
• Concerned about continued evolution
Connection to RD45
What RD45 thinks is important:
• Single federation image across the collaboration
• Direct access to objects, removing need for event catalog, etc.
What BaBar thinks is important:
• Robustness
• Ability to evolve
• Timing
We encountered issues they didn’t appreciate
• How do people test, including making mistakes?
• What if somebody leaves a lock on a DAQ container?
• AMS limited to 1024 open files
• Only 64,000 files/databases of 2GB/10GB per federation
• Single threaded AMS, 400 collaborators
We find it useful to cooperate, but hard to use each other’s code
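The per-federation file limit translates directly into a ceiling on event-store size. Taking the limits as quoted (64,000 files or databases of 2 GB or 10 GB each):

```latex
\begin{aligned}
64\,000 \times 2\ \text{GB} &= 128\ \text{TB}\\
64\,000 \times 10\ \text{GB} &= 640\ \text{TB}
\end{aligned}
```

With 33 TB already in the event store, the 2 GB case leaves room for only about four times the current volume in a single federation.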
Is analysis different?
What people are used to: “I read the beamspot from CCC. I get the track parameters, then displace the origin to the observed beamspot with the XXX subroutine, then use the new d0 as my miss distance, giving it a sign using John’s version of YYY”
• Physicists quite comfortable with this detail, and will ask for it if they don’t get it
“I called signedDoca(evt->foundSpot())”
• Generates lots of questions with unpopular answers
Physicists think with "models" => system of equations & constants/values which doesn't do anything. You use them by plugging in numbers and calculating.
• This leads to a deep misunderstanding of “objects”, resulting in a procedures & structures approach.
• Invoking member functions with unknown implementation feels very different from passing formulae via email/paper, then implementing them
Will invoking member functions ever replace passing formulae around?
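The contrast on this slide is between a recipe the physicist carries out by hand and a member function that hides it. Below is a minimal sketch of what might sit behind a call like `signedDoca(evt->foundSpot())`, assuming a straight-line track in the transverse plane (real tracks are helices) and taking the sign from the track's angular momentum about the beamspot. `Track`, `Point2D`, `Event` and that sign convention are hypothetical choices.

```cpp
#include <cassert>
#include <cmath>

struct Point2D { double x, y; };

// Hypothetical straight-line track in the transverse plane, through `ref` at angle `phi`.
class Track {
public:
    Track(Point2D ref, double phi) : ref_(ref), ux_(std::cos(phi)), uy_(std::sin(phi)) {}

    // Signed distance of closest approach to `spot`.
    // Assumed sign convention: sign of the track's angular momentum about `spot`,
    // L_z = dx * u_y - dy * u_x, where (dx, dy) = ref - spot and u is the unit direction.
    double signedDoca(const Point2D& spot) const {
        const double dx = ref_.x - spot.x;
        const double dy = ref_.y - spot.y;
        return dx * uy_ - dy * ux_;
    }

private:
    Point2D ref_;
    double ux_, uy_;
};

// Hypothetical event that knows the fitted beamspot.
class Event {
public:
    explicit Event(Point2D spot) : spot_(spot) {}
    Point2D foundSpot() const {
        assert(std::isfinite(spot_.x) && std::isfinite(spot_.y));
        return spot_;
    }

private:
    Point2D spot_;
};
```

The caller sees neither the origin shift nor the sign convention, which is exactly the detail the slide says physicists will ask for.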
The physicist's desktop in BaBar
Production code writes histograms, ntuple using HepTuple interface
• In theory, allows replacement of downstream analysis package
• We have PAW (HBOOK), ROOT, JAS implementations
• But keeping enough functionality makes HepTuple a moving target
Java Analysis Studio
• Used for some aspects of online presenters
• Some partisans are using it offline
ROOT
• Used for some aspects of online presenters
• Some partisans are using it offline
Some people write FORTRAN to manipulate ntuples
Mostly, people use PAW
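HepTuple's role, letting production code fill ntuples without committing to PAW, ROOT or JAS, follows the classic abstract-interface pattern. The sketch below is hypothetical and does not reproduce HepTuple's real API: `NtupleSink`, `MemoryNtuple` and `fillCandidates` are illustrative names, the in-memory backend stands in for a real writer, and the mass window in `fillCandidates` is an arbitrary example cut.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical minimal ntuple interface in the spirit of HepTuple (not its real API).
class NtupleSink {
public:
    virtual ~NtupleSink() = default;
    virtual void column(const std::string& name, double value) = 0;  // set a value in the current row
    virtual void dumpRow() = 0;                                       // finish the current row
};

// One backend: keeps rows in memory, standing in for a PAW, ROOT or JAS writer.
class MemoryNtuple : public NtupleSink {
public:
    void column(const std::string& name, double value) override { current_[name] = value; }
    void dumpRow() override {
        assert(!current_.empty() && "dumpRow() called on an empty row");
        rows_.push_back(current_);
        current_.clear();
    }
    const std::vector<std::map<std::string, double>>& rows() const { return rows_; }

private:
    std::map<std::string, double> current_;
    std::vector<std::map<std::string, double>> rows_;
};

// Production-side code depends only on the interface, never on a backend.
void fillCandidates(NtupleSink& tuple, const std::vector<double>& masses) {
    for (double m : masses) {
        tuple.column("mass", m);
        tuple.column("inWindow", (m > 3.0 && m < 3.2) ? 1.0 : 0.0);  // arbitrary example cut
        tuple.dumpRow();
    }
}
```

The “moving target” problem appears as soon as analysts want more than `column` and `dumpRow`: every method added to the interface must then be implemented by every backend.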
The real issues:
How to give downstream tools the complete set of capabilities?
How can we access full power of offline for analysis?
How to put tools in the analysts’ hands?
• (Partial) reconstruction & drill-down analysis
• Visualization of calculations - code, results, processing
• Access to TeraEvents both fast and in detail
Very hard to do the entire phase space!
But the bottom line is: It works
[Figures: B -> J/ψ Ks, B -> J/ψ K±]