CERN Computing Fabric Status, LHCC Review, 19th November 2007


  • CERN Computing Fabric Status

    LHCC Review, 19th November 2007

    Bernd Panzer-Steindel, CERN/IT

  • Ingredients

    Coarse-grain functional differences in the CERN computing fabric:

    T0: Central Data Recording, first-pass processing, tape migration, data export to the Tier 1 sites

    CAF: selected data copies from the T0 (near real-time), calibration and alignment, analysis; T1/T2/T3 functions mean something different for each experiment

    Both contain the same hardware; the distinction is made via logical configurations in the Batch system and the Storage system

    CPU nodes for processing (~65% of the total capacity for the T0)
    Disk servers for storage (~40% of the total capacity for the T0)
    Tape libraries, tape drives and tape servers
    Service nodes


  • Growth rates based on the latest experiment requirements at CERN

    ~linear growth rates! Underestimates? Experience from the past shows exponential growth rates


  • Preparations for 2008 I

    Tendering preparations for the 2008 purchases started in May 2007

    Deliveries of equipment have started; the first ~100 nodes have arrived and are being installed

    More deliveries are spread over the next 4 months

    Heavy logistics operations are ongoing:

    preparation for the installation of ~2300 new nodes

    racks, rack preparations, power+console+network cabling, shipment and unpacking, installation (physical and logical), quality control and burn-in tests

    preparations to retire ~1000 old nodes during the next few months


  • Preparations for 2008 II

    Resource increase in 2008:

    1. more than doubling the amount of CPU resources: ~1200 CPU nodes (~4 MCHF)

    2. increasing the disk space by a factor of 4: ~700 disk servers (~6 MCHF)

    The experiment requirements for disk space this year were underestimated; we had to increase disk space by up to 50% during the various data challenges and productions.

    3. increase and consolidation of redundant and stable services: ~350 service nodes (~3 MCHF)

    Grid services (CE, RB, UI, etc.), Castor, Castor databases, condition databases, VO-boxes, experiment-specific services (bookkeeping, production steering, monitoring, etc.), build servers. Don't underestimate the service investments!
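    As a rough cross-check of the unit prices implied by the three items above, a minimal sketch; the per-unit figures are derived here, they are not stated on the slide:

      # Rough cross-check of the 2008 purchase figures quoted above.
      purchases = {
          "CPU nodes":     (1200, 4e6),   # ~1200 nodes, ~4 MCHF
          "disk servers":  ( 700, 6e6),   # ~700 servers, ~6 MCHF
          "service nodes": ( 350, 3e6),   # ~350 nodes, ~3 MCHF
      }
      for name, (units, chf) in purchases.items():
          print(f"{name:13s}: ~{chf / units:6.0f} CHF per unit")
      print(f"total: ~{sum(c for _, c in purchases.values()) / 1e6:.0f} MCHF")
      # -> ~3300 CHF per CPU node, ~8600 CHF per disk/service node,
      #    ~13 MCHF for the three items combined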


  • Power and Cooling

    The current computer center has a capacity of 2.5 MW for powering the nodes and 2.5 MW of cooling capacity. A battery-based UPS system allows for 10 min of autonomy at the full 2.5 MW.

    Power for critical nodes is limited to about 340 kW (capacity backed up by the CERN diesel generators); no free capacity is left (DB systems, network, AFS, Web, Mail, etc.)

    We will reach ~2 MW already in March 2008 and will not be able to host the full required capacity in 2010

    Activities started already more than a year ago with slow progress; now there is an active discussion between IT, PH and TS

    Identify a building in Prevessin (Meyrin does not have enough power available) and start preparations for the infrastructure upgrade

    Budget is already foreseen
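    As a quick sanity check of the UPS figure above (10 min of autonomy at 2.5 MW), a minimal sketch:

      # Back-of-the-envelope energy behind "10 min autonomy at 2.5 MW".
      power_w = 2.5e6            # 2.5 MW node power
      autonomy_s = 10 * 60       # 10 minutes
      energy_j = power_w * autonomy_s
      print(f"battery energy: {energy_j:.2e} J = {energy_j / 3.6e6:.0f} kWh")
      # -> 1.50e+09 J, about 417 kWh the battery bank must deliver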


  • Material Budget

    Based on the latest round of requirement gathering from the experiments during the late summer period

    Includes provisioning money for a new computer center

    Presented to the CCRB on the 23rd of October

    Covers CPU servers, disk storage, tape storage and infrastructure, service nodes, Oracle database infrastructure, LAN and WAN network, testbeds; new CC costs spread over 10 years; small deficit, but within the error of the cost predictions

    [MCHF]            2008   2009   2010   2011   2012
    Material budget   31.2   23.4   22.2   22.2   22.2
    Balance            0.9   -1.3   -1.9   -0.1   -1.2
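    For a quick reading of the table, the 5-year totals computed from the numbers above:

      # Totals over the Material Budget table above, in MCHF.
      budget  = [31.2, 23.4, 22.2, 22.2, 22.2]
      balance = [0.9, -1.3, -1.9, -0.1, -1.2]
      print(f"5-year material budget: {sum(budget):.1f} MCHF")
      print(f"cumulative balance: {sum(balance):+.1f} MCHF (the 'small deficit')")
      # -> 121.2 MCHF over 2008-2012, balance -3.6 MCHF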


  • Processors I

    [charts: cost of a full server node; cost of a separate single processor]


  • Processors II

    Less than 50% of a node's cost is in the processors plus memory

    2007 was a special year: a heavy price war between INTEL and AMD, with INTEL pushing their quad-cores (even competing with their own products)

    New trend: dual motherboards per 1U unit, with very good power supply efficiencies, as good as for blades

    Our purchases will consist of these nodes, with the possibility of also getting blades


  • Processors III

    Technology trends:

    aim for a two-year cycle now: architecture improvements and structure reduction (45 nm products already announced by INTEL); multi-core: 2, 3, 4, 6, 8 cores

    BUT what to do with the expected billion transistors and multi-cores?

    The market is not clear; wide-spread activities of INTEL, e.g.:

    -- initiatives to get multithreading into the software: quite some time away, complicated, especially the debugging (we have a hard time getting our simple programs to work)

    -- co-processors (audio, video, etc.)

    -- merging of CPU and GPU (graphics): AMD + ATI combined processors, NVIDIA using the GPU as a processor, INTEL moving graphics onto the cores

    -- on-the-fly re-programmable cores (FPGA-like)

    Not clear where we are going; specialized hardware in the consumer area would change the price structure for us


  • Memory I


  • Memory II

    Still monthly fluctuations in costs, up and down

    Large variety of memory modules in frequency and latency: 533 vs 667 MHz means about a 10% cost difference, a factor 2 for 1 GHz; higher frequency goes along with a higher CAS latency

    How does HEP code depend on memory speed?

    DDR3 is upcoming, more expensive in the beginning

    Is 2 GB per core really enough?
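    To see why the higher CAS latency of faster modules matters, a minimal sketch; the CL values are typical for DDR2 of that period, assumed here rather than taken from the slide:

      # First-word access latency of DDR2: CAS latency (CL) counts
      # memory-clock cycles, and the clock is half the transfer rate
      # (MT/s), so latency_ns = CL * 2000 / transfer_rate.
      modules = [("DDR2-533", 533, 4), ("DDR2-667", 667, 5), ("DDR2-1066", 1066, 7)]
      for name, mts, cl in modules:
          print(f"{name:9s}: CL{cl} -> {cl * 2000 / mts:5.1f} ns")
      # -> ~15.0, 15.0, 13.1 ns: faster modules mostly buy bandwidth,
      #    not lower access latency.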


  • Disk storage I

    [charts: cost of a full disk server node; cost of a separate single disk]


  • Disk storage II

    Trends:

    Cost evolution of single disks is still good (~factor 2 per year, model dependent)

    Lots of infrastructure is needed: upgrades of the CPU and memory footprint of the applications (RFIO, GridFTP, buffers, new functions, checksums, RAID5 consistency checks, data integrity probes)

    We need disk space AND spindles: use smaller disks or buy more, which increases overall costs

    Solid-state disks are much more expensive (factor ~50); interesting for the database area

    Hybrid disks are good for VISTA (at least in the future, they do not work yet) but at a higher price, e.g. new Seagate disks + 256 MB flash == +25% cost; a general trend for notebooks, but we can't profit in our environment (seldom cache reuse)
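    The spindle trade-off above can be made concrete with a minimal sketch; disk sizes and prices are hypothetical, for illustration only:

      # Spindle count vs disk size for a fixed usable capacity.
      capacity_tb = 500                       # hypothetical target capacity
      for disk_gb, price_chf in [(500, 250), (250, 160)]:  # hypothetical prices
          spindles = capacity_tb * 1000 // disk_gb
          cost = spindles * price_chf
          print(f"{disk_gb} GB disks: {spindles:5d} spindles, ~{cost / 1e6:.2f} MCHF")
      # Halving the disk size doubles the spindles (and the aggregate IO
      # capability) but raises the total cost, since smaller disks cost
      # more per GB.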


  • Internal Network I

    The physical network topology (connections of nodes to switches and routers) is defined by space, electricity, cooling and cabling constraints

    [diagram: network router, service nodes, disk servers, CPU servers]


  • Internal Network II

    Logical network topology: changing access patterns, high aggregate IO on the disk servers

    3000 nodes running 16000 concurrent physics applications are trying to access 1000 disk servers with 22000 disks

    [diagram: CPU servers, disk servers]
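    The scale of that concurrency, computed from the numbers above:

      # Average stream counts implied by the figures on this slide.
      apps, servers, disks = 16000, 1000, 22000
      print(f"~{apps / servers:.0f} concurrent streams per disk server")
      print(f"~{apps / disks:.1f} streams per physical disk")
      # -> 16 streams per server on average; popular data sets
      #    concentrate far more load than that on a few servers.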


  • Internal Network III

    Need to upgrade the internal network infrastructure: decrease the blocking factor on the switches, i.e. spread the existing servers over more switches (see the sketch below)

    Changes since the 2005 LCG computing TDR:

    disk space increased by 30%

    concurrently running applications increased by a factor of 4 (multi-core technology evolution)

    computing model evolution, more high-IO applications (calibration and alignment, analysis)

    Doubling the number of connections (switches) to the network core routers, which as a consequence also requires doubling the number of routers

    Additional investment of 3 MCHF in 2008 (already approved by the Finance Committee)
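    A minimal sketch of the blocking factor; the port counts and link speeds are hypothetical:

      # Blocking factor of an edge switch = aggregate downlink
      # bandwidth divided by uplink bandwidth.
      def blocking(servers, downlink_gbps, uplink_gbps):
          return servers * downlink_gbps / uplink_gbps

      print(blocking(40, 1, 10))  # 40 servers behind one 10 Gbit/s uplink -> 4.0 (4:1)
      print(blocking(20, 1, 10))  # same servers over two switches -> 2.0 (2:1)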


  • Batch System

    Some scalability and stability problems in spring and early summer; solved with the upgrade to LSF 7 and a hardware upgrade of the LSF control nodes

    Much improved response time, removed throttling bottlenecks

    Average about 75000 jobs/day, peak value 115000 jobs/day, up to 50000 jobs in the queue at any time, tested with 500000 jobs


  • Tape Storage

    Today we have:

    10 PB of data on tape, 75 million files

    5 silos with ~30000 tapes, ~5 PB free space

    120 tape drives (STK and IBM)

    During the last month 3 PB were written to tape and 2.4 PB read from tape

    Small files and the spread of data sets over too many tapes caused a very high mount load in the silos

    Space increase to 8 PB free in the next 3-4 months

    More drives are needed to cope with high recall rates and small files; Castor improvements are needed
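    The per-drive duty implied by these figures, as a minimal sketch (assuming a 30-day month and 1 PB = 1e15 bytes):

      # Average tape-drive throughput implied by last month's traffic.
      written, read = 3.0e15, 2.4e15          # bytes per month
      seconds = 30 * 86400
      drives = 120
      aggregate = (written + read) / seconds  # bytes/s
      print(f"aggregate: ~{aggregate / 1e6:.0f} MB/s")
      print(f"per drive: ~{aggregate / drives / 1e6:.0f} MB/s average")
      # -> ~17 MB/s per drive, well below native drive speed: mounts,
      #    positioning and small files eat most of the time.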


  • CASTOR

    Much improved stability and performance during the summer period (Castor Task Force): CMS CSA07, ATLAS export tests and the M5 run; regular running at nominal speed (with 100% beam efficiency assumed)

    Very high load on the disk servers; small-scale problems observed, identified and fixed (Castor + experiments)

    Complex patterns and a large number of IO streams require more disk space for the T0 (probably a factor of 2)

    Successful coupling of DAQ and T0 for LHCb, ALICE and ATLAS (not yet at 100% nominal speed); CMS is planned for the beginning of next year


  • Data Export

    ATLAS successfully demonstrated for several days their nominal data export speed (~1030 MB/s), all in parallel to the CMS CSA07 exercise

    no Castor issues, no internal network issues


  • Data Management

    Enhance Castor disk pool definitions: an activity in close collaboration with the experiments; new Castor functionalities are now available (access control) to avoid disk server overload and to get better tape recall efficiencies

    Small files create problems for the experiment bookkeeping systems and the HSM tape system; Castor improvements are needed in the tape area (some amount of small files will be unavoidable). The experiments are investing in file merging procedures (see the sketch below), which creates more IO streams and activity and needs more disk space
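    The experiments' actual merging procedures are not described here; as a minimal illustration of the idea (packing many small files into one large file that the tape system can handle efficiently), a tar-based sketch:

      # Pack many small files into one archive so the HSM sees a single
      # large file; tar is only a stand-in for the experiments' real
      # merging formats.
      import pathlib
      import tarfile

      def merge(small_files, archive_path):
          with tarfile.open(archive_path, "w") as tar:   # uncompressed tar
              for f in small_files:
                  tar.add(f, arcname=pathlib.Path(f).name)

      def extract_one(archive_path, name, dest_dir):
          with tarfile.open(archive_path, "r") as tar:
              tar.extract(name, path=dest_dir)           # per-file recall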

    Data integrity: the deployment and organization of data checksums needs more work and will create more IO and bookkeeping (a sketch follows below)
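    A minimal sketch of streaming per-file checksums; Adler-32 via zlib is chosen here as a cheap example and is an assumption, not necessarily the mechanism Castor deployed:

      # Streaming per-file checksum, read in chunks so large files
      # never sit fully in memory.
      import zlib

      def file_adler32(path, chunk_size=1 << 20):
          value = 1                                  # Adler-32 seed
          with open(path, "rb") as f:
              while chunk := f.read(chunk_size):
                  value = zlib.adler32(chunk, value)
          return value & 0xFFFFFFFF

      # print(f"{file_adler32('some_file.raw'):08x}")  # hypothetical file name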

    CPU and data flow efficiency: to increase the efficiencies one has to integrate the 4 large functional units much more closely (information exchange)


  • Summary

    Large-scale logistics operation ongoing for the 2008 resource upgrades

    Very good Castor performance and stability improvements

    Large-scale network (LAN) upgrade has started

    Successful stress tests and productions from the experiments (T0 and partly CAF)

    Power and cooling growth rate requires a new computer center, planning started
