Apprenticeship: 60 years of computing experience / Ben Segal, October 2019
Apprenticeship: 60 years of computing experience
Ben Segal / CERN, [email protected]/ben
CERN Computing Colloquia, October 3rd and 11th, 2019

Second talk (CERN from 1989 - 2018):
• No more mainframes at CERN
• Launching Grid computing
• Volunteer computing and virtualisation
• Reflections of a retiree
• Conclusions
1989-93 CN-SW-DC Section

Around 1989 LR stepped down as SW Group Leader and created the SW-DC Section to concentrate on Distributed Computing:

=> SW-DC Section Leader - Les Robertson (with an initial team of two: F. Hemmer and myself)

Our activities soon led to the SHIFT Project… and by 1993 we had become the largest Group in the Computing and Networking (CN) Division:

PDP (Physics Data Processing) Group
1989-91 SHIFT Project begins

IDEA: To provide CERN mainframe services on networked clusters of RISC and UNIX-based nodes:

“Scalable Heterogeneous Integrated FaciliTy”

• Initial prototype “HOPE”: a single Apollo DN10000
• Used Cray to stage tape data
• Connected to accounting system: 25% of CPU for CERN!
• For real system, used: Disk, Tape and CPU servers
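The split into disk, tape and CPU servers can be sketched as a toy model: tape servers hold archived data, disk servers stage and cache it, and CPU servers only ever see staged disk files. All class and file names here are hypothetical.

```python
class TapeServer:
    def __init__(self, archive):
        self.archive = archive            # tape volume -> data

    def read(self, name):
        return self.archive[name]         # a slow sequential medium in reality

class DiskServer:
    def __init__(self, tape):
        self.tape = tape
        self.staged = {}                  # disk pool acting as a stage/cache

    def fetch(self, name):
        if name not in self.staged:       # stage from tape only on first use
            self.staged[name] = self.tape.read(name)
        return self.staged[name]

class CpuServer:
    def __init__(self, disk):
        self.disk = disk

    def run_job(self, name):
        data = self.disk.fetch(name)      # a batch job reads staged data
        return len(data)                  # stand-in for real analysis

tape = TapeServer({"run42.raw": b"event data"})
cpu = CpuServer(DiskServer(tape))
print(cpu.run_job("run42.raw"))           # 10
```

Because each role is a separate box, any one of them can be scaled or replaced independently, which is the point of the architecture.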
SHIFT Architecture
Mainframe Services from Gigabit-Networked Workstations - Baud et al.

…Packard and OPAL, a large physics collaboration based at CERN. The current configuration for the centrally operated RISC-based workstation batch services is given in Table 2, and also indicated in Figure 1.
Project Goals

The goal was to develop an architecture which could be used for general purpose scientific computing, could be implemented to provide systems with excellent price/performance when compared with mainframe solutions, and could be scaled up to provide very large integrated facilities, or down to provide a system suitable for small university departments. The resulting systems should present a familiar and unified system image to their users, including access to many Gigabytes of disk data and to Terabytes of tape data: this is what we imply by the word integrated.

The goals of the SHIFT development were as follows.

• Provide an INTEGRATED system of CPU, disk and tape servers capable of supporting a large-scale general-purpose batch service
• Construct the system from heterogeneous components conforming to OPEN standards to retain flexibility towards new technology and products
• The system must be SCALABLE, both to small sizes for individual collaborations/small institutes, and upwards to at least twice the current size of the CERN computer centre
• The batch service quality should be at least as good as mainframe batch quality, operate in a distributed environment, and have a unified priority scheduling scheme
• Provide automatic control of disk file space, integrated with a tape staging service
• Provide support for IBM 3480-compatible cartridge tapes, Exabyte 8mm tapes, and other developing tape technologies, with access to CERN's automatic cartridge-mounting robots
• System operation and accounting to be integrated into the CERN central computer services
• The architecture should also be capable of supporting interactive scientific applications
SHIFT Architecture and Development

The SHIFT system has been outlined in earlier papers [1,2,3]. A prime goal of the SHIFT project was to build facilities which could scale in capacity from relatively small systems up to several times that of the combined power of the CERN central mainframes. To achieve this, an architecture was chosen which encouraged separation of functionality. This allowed modular extensibility, flexibility, and optimization of each component for its specific function. Figure 2 shows this schematic architecture.

The principal elements of SHIFT are logically divided into CPU servers, disk servers and tape servers, with distributed software which is…
Figure 2: SHIFT Architecture (CPU servers, disk servers and tape servers connected by a network backplane)

Summer '92 USENIX - June 8-June 12, 1992 - San Antonio, TX
The SHIFT Backplane (1)
(This was my responsibility)

Very high performance network backplane needed

Calculations / simulations showed requirements for a 100 CERN unit system (1/2 Computer Centre):
- 6 MBytes/s sustained / 15 MBytes/s peak
- Peak server interface speed: 3-5 MBytes/s
- Big problem was network CPU consumption!
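The slides quote only the results, not the calculation itself, but the flavour of the back-of-envelope arithmetic is easy to reproduce: given the aggregate peak rate and the quoted per-interface limits, you get the minimum number of server interfaces the backplane must drive in parallel.

```python
from math import ceil

# Figures quoted on the slide (the original calculation is not shown)
aggregate_sustained = 6.0        # MBytes/s, whole-system sustained
aggregate_peak = 15.0            # MBytes/s, whole-system peak

# With a per-server interface peak of 3-5 MBytes/s, the aggregate peak
# must be spread over at least this many interfaces working in parallel:
for per_interface_peak in (3.0, 5.0):
    n = ceil(aggregate_peak / per_interface_peak)
    print(f"{per_interface_peak} MB/s per interface -> at least {n} interfaces")
```

This kind of arithmetic is why per-transfer CPU cost mattered so much: every extra interface driven at full rate burned host CPU that was supposed to run physics jobs.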
The SHIFT Backplane (2)
=> Found and purchased UltraNet:
• Solved CPU consumption problem for streaming I/O
• Could use a reasonable number of powerful servers
• Took DL to visit the UltraNet company to approve it
The SHIFT Backplane (3)
As SHIFT grew, we developed a hybrid backplane (using multi-homing):
- UltraNet
- HiPPI: 800 Mb/s
- FDDI: 100 Mb/s - (later Fast Ethernet)
- Ethernet: 10 Mb/s
• Final iteration used simply Gigabit Ethernet
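The multi-homing idea above can be sketched in a few lines: each host sits on several of these networks at once, and a transfer between two hosts prefers the fastest network they share. The speed table uses the nominal figures from the list (UltraNet treated here as roughly 1000 Mb/s; an assumption, as the slide gives no figure for it).

```python
SPEEDS_MBPS = {"ultranet": 1000, "hippi": 800, "fddi": 100, "ethernet": 10}

def best_link(host_a_nets, host_b_nets):
    """Pick the fastest network present on both multi-homed hosts."""
    common = set(host_a_nets) & set(host_b_nets)
    if not common:
        return None
    return max(common, key=SPEEDS_MBPS.get)

# A disk server on UltraNet+HiPPI+Ethernet talking to a CPU server on
# HiPPI+FDDI+Ethernet goes over HiPPI; a host with only Ethernet falls back.
print(best_link({"ultranet", "hippi", "ethernet"}, {"hippi", "fddi", "ethernet"}))
print(best_link({"ultranet", "hippi", "ethernet"}, {"ethernet"}))
```

Plain Gigabit Ethernet eventually made this selection logic unnecessary, which is exactly why the final iteration could drop the hybrid.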
The CORE System in 1992
…outside the vault but in active use. A robot with a capacity for 18,000 3480 cartridges handles approximately 20% of the mount requests. Round-the-clock manual mounts are the responsibility of operations staff.

Into this environment, a batch project based on RISC workstations was initiated two years ago. Beginning with a single APOLLO DN10040, the project has grown substantially and now forms an operational service which exceeds the total deliverable CPU capacity of the central mainframes. The service is collectively known as the Centrally Operated RISC Environment or CORE, and has three components: SHIFT, CSF and HOPE.

SHIFT The SHIFT system forms the subject of the present paper. It is a general purpose facility for jobs with a broad range of I/O requirements and which require access to many Gigabytes of online data. SHIFT workstations are networked via both Ethernet and UltraNet. The SHIFT CPU and disk servers are currently SGI Power Series 340 workstations and the tape servers are SUN 4/330s.

CSF The Central Simulation Facility or CSF is a platform for CPU-intensive work with low I/O requirements. The service runs on 16 HP 9000/720 machines which are networked via Ethernet and which have full access to the SHIFT tape service. To the end user, CSF systems are seen as a single batch facility.

HOPE The HOPE service is an earlier system based on 3 APOLLO DN10040 machines. It is for CPU-intensive, low I/O work and it will be phased out during the course of 1992 as HOPE workload is taken over by CSF. HOPE is a joint project between Hewlett-Packard and OPAL.
Service   CPU (CU)   Disk (GB)   3480 Tapes            8mm Tapes
SHIFT     100        150         6 manual, 2 robotic   2 manual
HOPE      50         10          -                     -
CSF       150        10          -                     -

Table 2: CERN - Central RISC Services
Figure 1: CERN - Centrally Operated RISC Environment (analysis facility: SHIFT; simulation facilities: CSF, 16 HP 9000/720s)
SHIFT Developers
1992 USENIX paper authors - (only 12)
1991-99 SHIFT Production
• First OPAL production system used SGI Power Series
• Later added Unix nodes by HP, IBM, DEC, Sun
=> All four LEP collaborations adopted SHIFT
• Final iteration used commodity PCs and Linux
=> Mainframes were all replaced by 1997
Les Robertson in June 2001 accepting the Computerworld Honors Award
The SHIFT Team in 2001 with the Computerworld Honors Award
The Grid Idea, 1999
The Grid: Blueprint for a New Computing Infrastructure
Ian Foster, Carl Kesselman
Morgan Kaufmann Publishers, 1999 - 677 pages

The grid promises to fundamentally change the way we think about and use computing. This infrastructure will connect multiple regional and national computational grids, creating a universal source of pervasive and dependable computing power that supports dramatically new classes of applications…
The Grid at CERN
• “After the Web, the Grid”…
• … and why this is nonsense …
• … it should have been …
• “After SHIFT, the Grid”…

• The Hype factor – Foster et al. …
• Overall was good for CERN (and IT)
The Grid at CERN
(my recollections)
• Globus story
  • pragmatic choice, but an inverted pyramid
• EDG
  • EU structure: 22 partners, coordination …?
• “Development” vs “production” pressure
  • WP2 – Grid Data Management work package
  • I kept the development line …
Middleware: WP 1 - WP 3: wide area

Workload Management WP 1
• …

Data management WP 2
• Manage and share PetaByte-scale information volumes in high-throughput production-quality grid environments.
• Replication/caching; Metadata mgmt.; Authentication; Query optimization.
• High speed WAN data access; interface to Mass Storage Mgmt. systems.

Application monitoring WP 3
• …
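The WP2 replication/caching idea can be sketched simply: a logical file name maps to several physical replicas, and a job is steered to the cheapest one. The catalogue contents, site names and cost values below are hypothetical, purely for illustration.

```python
# Hypothetical replica catalogue: logical file name -> (site, access cost)
REPLICA_CATALOGUE = {
    "lfn:higgs-candidates.root": [
        ("cern.ch",  {"cost": 1}),    # local disk pool
        ("in2p3.fr", {"cost": 5}),    # remote Tier-1 over the WAN
        ("ral.uk",   {"cost": 7}),
    ],
}

def select_replica(lfn):
    """Return the site holding the cheapest physical replica of a logical file."""
    replicas = REPLICA_CATALOGUE.get(lfn, [])
    if not replicas:
        raise KeyError(f"no replica registered for {lfn}")
    site, _ = min(replicas, key=lambda r: r[1]["cost"])
    return site

print(select_replica("lfn:higgs-candidates.root"))  # cern.ch
```

In a real grid the cost would come from monitoring data (network distance, queue length, storage load) rather than a static number, but the lookup-then-choose shape is the same.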
The Grid Worldwide
• WLCG to the rescue
  • Les Robertson again… pragmatism!
• LHC starting in 2005? 2008? 2010 …
  • … but the Grid was ready
• CERN-IT on the map at last
  • … and experiments now get computing budgets
Retirement, 2002
• Becoming an “Honorary Member of the Personnel”:
  • Invited by DL but said no…
  • At my farewell drink, given another chance…
• … and I have been around since!

• It’s a privilege and a challenge:
  • (To do enough but not too much…)
BOINC at CERN
• Co-started (with F. Grey) the BOINC project “LHC@home”

• CERN’s 50th Anniversary 2004 - computing Challenge?
• Telephoned SETI@home (David Anderson) => BOINC

• Simple BOINC for Sixtrack (beam stability simulations)
  • FORTRAN only, using BOINC library
  • 2 Masters students for 6 months
  • Running from 2004 to today: over 200,000 volunteers…
  • Windows, Linux, Mac supported
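Volunteer machines are untrusted, so BOINC sends each work unit to several volunteers and accepts a result only when enough of them agree (a quorum). A minimal sketch of that validation step, with string results standing in for real Sixtrack output files (real validators compare numerical output with tolerances, not exact equality):

```python
from collections import Counter

def canonical_result(results, quorum=2):
    """Return the value reported by at least `quorum` volunteers, else None."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count >= quorum else None

# Three volunteers ran the same beam-stability work unit; one machine
# produced a bad answer, but the majority agrees.
print(canonical_result(["stable", "stable", "unstable"]))  # stable
# No quorum yet: the server would send the work unit to another volunteer.
print(canonical_result(["stable", "unstable"]))            # None
```

Redundancy costs CPU, but volunteered CPU is cheap; what it buys is trust in results coming from 200,000 anonymous machines.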
BOINC at CERN

• 2005: asked by DL to run “Real Physics”…
• MUCH HARDER: needed a full Linux environment …
  … but most volunteers were using Windows …
  => Virtualisation chosen
• 2006-2007 showed feasibility but image too big (n x GB’s)
  => CernVM launched by Predrag Buncic in 2008
• 2008-2010: BOINC-VM with PH-SFT and many students …
• 2011 production for Theory Dept (5 trillion events, still running)
• LHC experiments joined later (ATLAS, CMS, LHCb)
• CPU worth many millions of CHF has been volunteered
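Why a thin image solves the "n x GB" problem: any given job only ever touches a small fraction of a multi-GB software stack, so fetching files over the network on first access and caching them downloads far less than shipping the whole image. A sketch of that demand-fetch idea (file names and sizes here are invented for illustration, not taken from CernVM):

```python
# Hypothetical software repository: 1000 files of 4 MB each (4 GB total)
REPOSITORY = {f"lib/module{i}.so": 4 for i in range(1000)}

class DemandCache:
    def __init__(self, repo):
        self.repo = repo
        self.cache = {}
        self.downloaded_mb = 0

    def open(self, path):
        if path not in self.cache:        # fetch over the network on first use
            self.cache[path] = self.repo[path]
            self.downloaded_mb += self.repo[path]
        return self.cache[path]

fs = DemandCache(REPOSITORY)
for path in ("lib/module1.so", "lib/module2.so", "lib/module1.so"):
    fs.open(path)

full_image_mb = sum(REPOSITORY.values())
print(fs.downloaded_mb, "MB downloaded instead of", full_image_mb, "MB")
```

A job touching 2 of the 1000 files downloads 8 MB instead of 4 GB, which is what made running a full Linux environment on Windows volunteers' machines practical.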
Citizen Cyberscience
(Outside CERN, with F. Grey et al.)

• Began 2005 with Malaria.net
  • Collaboration with UniGE and Swiss Tropical Institute
  … then AIMS in S. Africa => Africa@home

• Continued in Taiwan, Beijing => Asia@home

• 2009: created CCC with CERN, UniGE, UNOSat
  => Citizen Cyberlab: CERN, UniGE, UNITAR
Profit from hindsight!
(recalling some IT technology choices at CERN):

• No Intel CPUs (only Motorola) - for much of 1980s
• No IBM PCs (so no Microsoft) - until late 1980s
• No C programming - until mid 1980s
• No TCP/IP outside CERN - until late 1988
• No UNIX - until mid 1980s
• No LINUX - until mid 1990s
• No NeXT machines (except 1 or 2 ...) - ever
• No SGI machines - until early 1990s
• No Cisco routers (except 2 ...) - until early 1990s
• No VMs on the LHC Grid - until early 2010s
Some Conclusions
• Find a good boss !!
• Value your mentors – and be a mentor too
• Beware of the “SHEEP EFFECT”
• Watch out for “career-based” people
• Enjoy your work – if it’s not fun, complain!