Belle computing upgrade
Ichiro Adachi
22 April 2005
Super B workshop in Hawaii
Belle’s computing goal
• Data processing: 3 months to reprocess the entire data sample accumulated so far, using all of KEK's computing resources (efficient resources, flexibility)
• Successful (I think, at least): 1999-2004, all data processed and used for analysis in time for the summer conferences (good or bad?)
  Example: DsJ(2317), from David Brown’s CHEP04 talk:
  BaBar discovery paper: Feb 2003
  Belle: confirm DsJ(2317): Jun 2003
  Belle: discover B → DsJ(2317)D: Oct 2003
  BaBar: confirm B → DsJ(2317)D: Aug 2004
“How can we keep computing power?” (this also validates software reliability)
Present Belle computing system
Two major components: a system under rental contract (started in 2001) and Belle’s own system. Hardware includes:
• Athlon 1.67 GHz servers with 50 TB IDE disk
• 155 TB disk + 1.29 PB S-AIT tape library
• Xeon 2.8 GHz servers; 500 TB DTF2 tape library
• Sparc 0.5 GHz servers; HSM with 4 TB disk and a 120 TB DTF2 tape library; Xeon 0.7 GHz servers with 8 TB disk
• Pentium III 1.26 GHz, Xeon 3.2 GHz, and Xeon 3.4 GHz servers
Computing resources evolving
• Purchased what we needed as we accumulated integrated luminosity
• The rental system contract expires in Jan 2006 and has to be replaced with a new one
[Charts: growth of CPU (GHz), HSM volume (TB), and disk capacity (TB) from Feb 2001 to late 2005]
Processing power in 2005: 7 fb-1/day (5 fb-1/day in 2004)
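As a rough check of the 3-month reprocessing goal against the quoted processing rates (a hypothetical calculation; the assumed total of roughly 500 fb-1 on tape by 2005 is not a figure from the slides):

```python
# Rough check (hypothetical total dataset size): days needed to reprocess
# Belle's accumulated data at the quoted 2005 rate of 7 fb-1/day, compared
# with the 2004 rate of 5 fb-1/day.

def reprocessing_days(total_fb: float, rate_fb_per_day: float) -> float:
    """Days needed to reprocess `total_fb` of data at `rate_fb_per_day`."""
    return total_fb / rate_fb_per_day

total = 500.0  # assumed integrated luminosity (fb-1) on tape, for illustration

print(f"at 7 fb-1/day: {reprocessing_days(total, 7.0):.0f} days")   # ~71 days
print(f"at 5 fb-1/day: {reprocessing_days(total, 5.0):.0f} days")   # 100 days
```

Under this assumption the 2005 rate keeps a full reprocessing comfortably within the 3-month goal.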
New rental system
• Specifications based on Oide’s luminosity scenario (×6 more data over the rental period); 6-year contract to Jan 2012; currently in the middle of the bidding process
• 40,000 SPECint2000_rate of compute servers in 2006
• 5 PB tape (1 PB disk) storage system, with extensions
• Network fast enough to read/write data at 2-10 GB/s (2 for DST, 10 for physics analysis)
• User-friendly and efficient batch system that can be used collaboration-wide
In a single 6-year lease contract we hope to double the resources in the middle, assuming Moore’s law holds in the IT commodity market.
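A small sketch of why a mid-lease refresh could double the total resources (the 18-month doubling time is an assumed Moore's-law-style parameter, not a figure from the slides):

```python
# Sketch (assumed parameters): capacity the same budget buys after `years`
# of exponential price/performance growth, assuming performance per unit
# cost doubles every ~18 months.

def capacity_after(years: float, start_capacity: float,
                   doubling_years: float = 1.5) -> float:
    """Capacity obtainable for the same cost after `years` of growth."""
    return start_capacity * 2 ** (years / doubling_years)

start = 40_000  # SPECint2000_rate planned for 2006 (from the slides)

# Three years into the lease, the same spend buys ~4x the 2006 performance,
# so refreshing part of the farm can plausibly double installed capacity.
print(f"{capacity_after(3.0, start):,.0f}")  # 160,000
```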
Lessons and remarks
• Data size and access
• Mass storage: hardware, software
• Compute server
Data size & access
• Possible considerations:
  Raw data: size scales with integrated luminosity; 1 PB per 1 ab-1 (at least); read once or twice a year; keep in archive
  Compact beam data for analysis (“mini-DST”): 60 TB per 1 ab-1; accessed frequently and (almost) randomly; easy access preferable; on disk
  MC: 180 TB per 1 ab-1 (3× the beam data, by “Belle’s law”); all data files read by most users; on disk? where to go?
[Chart: raw data per year (TB) vs. integrated luminosity per year (fb-1), Belle 2000-2004]
Detector & accelerator upgrades can change this slope
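The per-ab-1 figures above can be turned into a simple storage estimator; this is a sketch using only the numbers quoted on the slide (raw data at least 1 PB, mini-DST 60 TB, MC 180 TB per ab-1):

```python
# Storage estimate per integrated luminosity, using the per-ab^-1 figures
# from the slide: raw data >= 1 PB, mini-DST 60 TB, MC 180 TB (3x mini-DST).

def storage_tb(integrated_ab: float) -> dict:
    """Estimated storage (TB) for an integrated luminosity in ab^-1."""
    return {
        "rawdata": 1000.0 * integrated_ab,   # 1 PB = 1000 TB per ab^-1 (at least)
        "mini_dst": 60.0 * integrated_ab,    # compact beam data, kept on disk
        "mc": 180.0 * integrated_ab,         # 3x the beam data ("Belle's law")
    }

# Example: 2 ab^-1 of integrated luminosity
for name, tb in storage_tb(2.0).items():
    print(f"{name}: {tb:.0f} TB")
```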
Mass storage: hardware
• The central system in the coming computing
• Lesson from Belle: we have been using SONY DTF drive technology since 1999. SONY DTF2 has no roadmap for future development: a dead end. SONY’s next technology choice is S-AIT, and we have been testing an S-AIT tape library since 2004. We have already recorded on 5000 DTF2 tapes, so we have to move… (vendor’s trend; cost & time)
• Front-end disks: 18 dual-Xeon PC servers with two SCSI channels each, behind a 2 Gbit FC switch; 8 (10) of them each connect a 16 × 320 (400) GB IDE disk RAID system; total capacity 56 (96) TB
• Back-end S-AIT system: SONY PetaSite tape library in a 7-rack-wide space; main system (12 drives) + 5 cassette consoles, with a total capacity of 1.3 PB (2500 tapes)
Mass storage: software
• 2nd lesson: we are moving from direct tape access to a hierarchical storage system (HSM). We have learned that automatic file migration is quite convenient, but we need enough capacity that we do not need operators to mount tapes.
• Most users go through all of the (MC) data available in HSM, and each user access is random, not controlled at all. Each access requires reloading a tape to copy data onto disk, and the number of reloads per tape is hitting its limit!
• In our usage, HSM is not an archive but a big cache → we need optimization in both HSM control and user I/O; a huge disk may help?
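A toy model of the point above (hypothetical file counts and access pattern, not Belle's): with users reading uniformly at random across the dataset, the HSM disk behaves like an LRU cache, and tape reloads drop sharply only when the disk covers a large fraction of the data.

```python
# Toy model (hypothetical parameters): count tape-to-disk staging operations
# for uniformly random file accesses against an LRU disk cache.

import random
from collections import OrderedDict

def tape_reloads(n_files: int, cache_size: int, n_accesses: int,
                 seed: int = 0) -> int:
    """Tape reloads for random accesses to `n_files` with an LRU disk cache."""
    rng = random.Random(seed)
    cache: OrderedDict = OrderedDict()
    reloads = 0
    for _ in range(n_accesses):
        f = rng.randrange(n_files)
        if f in cache:
            cache.move_to_end(f)           # disk hit: no tape mount needed
        else:
            reloads += 1                   # miss: stage the file from tape
            cache[f] = None
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict least recently used file
    return reloads

small = tape_reloads(10_000, cache_size=1_000, n_accesses=50_000)
large = tape_reloads(10_000, cache_size=8_000, n_accesses=50_000)
print(small, large)  # the larger disk cuts reloads substantially
```

This supports the "huge disk may help?" remark: for uncontrolled random access, only a disk comparable in size to the working set tames tape-mount counts.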
Compute server
• 40,000 SPECint2000_rate in 2006
• Assume Moore’s law is still valid for the coming years
• A bunch of PCs is difficult for us to manage: limited human resources at Belle; Belle software distribution
• “Space” problem: one floor of Tsukuba experimental hall B3 (~10 m × 20 m); cleared and floored in 2002, full in 2005. No more space! An air-conditioning system must be installed. “Electricity” problem: ~500 W for dual 3.5 GHz CPUs. Moore’s law alone is not enough to solve this problem.
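A back-of-the-envelope illustration of the electricity problem (the node count is hypothetical; the ~500 W per dual-CPU node is the figure from the slide):

```python
# Rough power estimate (hypothetical farm size): total electrical draw at
# ~500 W per dual-CPU node, as quoted for dual 3.5 GHz CPUs, before any
# cooling overhead.

def farm_power_kw(n_nodes: int, watts_per_node: float = 500.0) -> float:
    """Total electrical power (kW) for a farm of identical nodes."""
    return n_nodes * watts_per_node / 1000.0

# Example: a 1000-node farm would draw ~500 kW of compute power alone,
# which the air conditioning then has to remove again as heat.
print(f"{farm_power_kw(1000):.0f} kW")  # 500 kW
```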
Software
• Simulation & reconstruction: a Geant4 framework for the Super Belle detector is underway; simulation with beam background is being done; for reconstruction, robustness against beam background can be a key.
Grid
• Distributed computing at Belle: MC production carried out at 20 sites outside KEK; ~45% of MC events produced at remote institutes since 2004
• Infrastructure: Super-SINET at 1 Gbps to major universities inside Japan; improvements needed for other sites
• Grid should help us; effort with the KEK Computing Research Center: SRB (Storage Resource Broker); Gfarm at the Grid Technology Research Center, National Institute of Advanced Industrial Science and Technology (AIST)
Summary
• Computing for physics output: try to keep meeting the present goal
• Rental system: renewed from Jan 2006
• Mass storage: PB scale, where not only the size but also the type of access matters; technology choice and vendor’s roadmap
• CPU: Moore’s law alone does not solve the “space” problem
• Software: Geant4 simulation underway
• Grid: infrastructure getting better in Japan (Super-SINET)