PACTPACT
PACT 98
Http://www.research.microsoft.com/barc/gbell/pact.ppt
Gordon Bell
Microsoft
What Architectures? Compilers? Run-time environments? Programming models? … Any Apps?
Parallel Architectures and Compilation Techniques, Paris, 14 October 1998
Talk plan
Where are we today?
History… predicting the future
– Ancient
– Strategic Computing Initiative and ASCI
– Bell Prize since 1987
– Apps & architecture taxonomy
Petaflops: when, … how, how much
New ideas: Grid, Globus, Legion
Bonus: Input to Thursday panel
1998: ISVs, buyers, & users?
Technical: supers dying; DSM (and SMPs) trying
– Mainline: user & ISV apps ported to PCs & workstations
– Supers (legacy code) market lives ...
– Vector apps (e.g. ISVs) ported to DSM (& SMP)
– MPI for custom and a few, leading edge ISVs
– Leading edge, one-of-a-kind apps: Clusters of 16, 256, ... 1000s built from uni, SMP, or DSM
Commercial: mainframes, SMPs (& DSMs), and clusters are interchangeable (control is the issue)
– Dbase & tp: SMPs compete with mainframes if central control is an issue, else clusters
– Data warehousing: may emerge… just a Dbase
– High growth, web and stream servers: Clusters have the advantage
c2000 Architecture Taxonomy
SMP
– Xpt connected SMPs: Xpt-SMP vector; Xpt-multithread (Tera); “multi”; Xpt-“multi” hybrid
– DSM: DSM-SCI (commodity); DSM (high bandwidth)
Multicomputers aka Clusters … MPP, 16-(64)-10K processors
– Commodity “multis” & switches
– Proprietary “multis” & switches
– Proprietary DSMs
[Diagram: two entries are marked “mainline”.]
TOP500 Technical Systems by Vendor (sans PC and mainframe clusters)
[Chart: number of systems (0-500) by vendor, Jun-93 to Jun-98. Vendors: CRI, SGI, IBM, Convex, HP, Sun, TMC, Intel, DEC, Japanese, Other.]
Parallelism of Jobs On NCSA Origin Cluster
[Two pie charts, by # of jobs and by CPU delivered, sliced by # CPUs: 1, 2, 3-4, 5-8, 9-16, 17-32, 33-64, 65-128. Slice labels: 40%, 5%, 16%, 21%, 8%, 6%, 3%, 1% and 7%, 2%, 9%, 19%, 18%, 17%, 19%, 9%.]
20 Weeks of Data, March 16 - Aug 2, 1998: 15,028 Jobs / 883,777 CPU-Hrs
How are users using the Origin Array?
[Chart: CPU Hrs Delivered (0-120,000) by # CPUs (1, 2, 3-4, 5-8, 9-16, 17-32, 33-64, 65-128) and Mem/CPU in MB (0-64, 64-128, 128-256, 256-384, 384-512, 512+).]
National Academic Community Large Project Requests September 1998
Source: National Resource Allocation Committee
Over 5 Million NUs Requested
One NU = One XMP Processor-Hour
[Chart legend: Vector, MPP, DSM.]
GB's Estimate of Parallelism in Engineering & Scientific Applications
[Chart: log(# apps) vs. granularity & degree of coupling (comp./comm.)]
– scalar: 60%
– vector: 15%
– vector & //: 5%
– one-of >> //: 5%
– embarrassingly & perfectly parallel: 15%
Platforms along the axis: PCs, WSs, Supers, Clusters aka MPPs aka multicomputers; scalable multiprocessors
Annotations: dusty decks for supers; new or scaled-up apps
Gordon’s WAG
Application Taxonomy
Technical
– General purpose, non-parallelizable codes (PCs have it!)
– Vectorizable
– Vectorizable & //able (Supers & small DSMs)
– Hand tuned, one-of
– MPP coarse grain
– MPP embarrassingly // (Clusters of PCs...)
Commercial
– Database
– Database/TP
– Web Host
– Stream Audio/Video
If central control & rich then IBM or large SMPs, else PC Clusters
One processor perf. as % of Linpack
[Chart: Linpack vs. Apps. Ave. per processor (0-1800) for T90, C90, SPP-2000, SP2-160, Origin 195, PCA. Application areas: CFD, Biomolec., Chemistry, Materials, QCD. Percentage labels: 22%, 14%, 19%, 33%, 26%, 25%.]
10 Processor Linpack (Gflops); 10 P apps x10; Apps % 1 P Linpack; Apps % 10 P Linpack
[Chart: values 0-35 for T90, C90, SPP, SP2/160, Origin 195, PCA.]
Gordon’s WAG
Ancient history
Growth in Computational Resources Used for UK Weather Forecasting
[Chart: computational resources, 10 to 10T on a log scale, 1950-2000. Machines plotted: Leo, Mercury, KDF9, 195, 205, YMP.]
10^10 / 50 yrs = 1.58^50
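The closing line of this slide is a compound-growth identity: a total gain of 10^10 over 50 years corresponds to a constant yearly multiplier of about 1.58. A minimal sketch of that arithmetic follows; the function name `annual_growth` is illustrative and not from the talk.

```python
import math


def annual_growth(total_factor, years):
    """Return the constant yearly multiplier that compounds to total_factor."""
    return total_factor ** (1.0 / years)


# Figures from the slide: a 10^10 gain over 50 years.
rate = annual_growth(1e10, 50)
doubling_years = math.log(2) / math.log(rate)

print(f"yearly multiplier: {rate:.3f}")
print(f"doubling time: {doubling_years:.2f} years")
```

Raising the returned multiplier to the 50th power recovers the 10^10 total, which is the check the slide's "1.58^50" expresses.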
Harvard Mark I aka IBM ASCC
“I think there is a world market for maybe five computers.”
Thomas Watson Senior, Chairman of IBM, 1943
The scientific market is still about that size… 3 computers
When scientific processing was 100% of the industry a good predictor
$3 Billion: 6 vendors, 7 architectures
DOE buys 3 very big ($100-$200 M) machines every 3-4 years
NCSA Cluster of 6 x 128 processors SGI Origin
Our Tax Dollars At Work: ASCI for Stockpile Stewardship
Intel/Sandia: 9000x1 node Ppro
LLNL/IBM: 512x8 PowerPC (SP2)
LNL/Cray: ?
Maui Supercomputer Center: 512x1 SP2
“LARC doesn’t need 30,000 words!” -- Von Neumann, 1955.
“During the review, someone said: ‘von Neumann was right. 30,000 words was too much IF all the users were as skilled as von Neumann ... for ordinary people, 30,000 was barely enough!’” -- Edward Teller, 1995
The memory was approved. Memory solves many problems!
“Parallel processing computer architectures will be in use by 1975.”
Navy Delphi Panel, 1969
“In Dec. 1995 computers with 1,000 processors will do most of the scientific processing.”
Danny Hillis, 1990 (1 paper or 1 company)
The Bell-Hillis Bet: Massive Parallelism in 1995
[Three charts comparing TMC with world-wide supers: Applications, Revenue, Petaflops / mo.]
Bell-Hillis Bet: wasn’t paid off!
My goal was not necessarily to just win the bet!
Hennessy and Patterson were to evaluate what was really happening…
Wanted to understand degree of MPP progress and programmability
DARPA, 1985 Strategic Computing Initiative (SCI)
“A 50 X LISP machine”
Tom Knight, Symbolics
“A 1,000 node multiprocessor”
“A Teraflops by 1995”
Gordon Bell, Encore
“All of ~20 HPCC projects failed!”
SCI (c1980s): Strategic Computing Initiative funded
ATT/Columbia (Non Von), BBN Labs, Bell Labs/Columbia (DADO), CMU Warp (GE & Honeywell), CMU (Production Systems), Encore, ESL, GE (like connection machine), Georgia Tech, Hughes (dataflow), IBM (RP3), MIT/Harris, MIT/Motorola (Dataflow), MIT Lincoln Labs, Princeton (MMMP), Schlumberger (FAIM-1), SDC/Burroughs, SRI (Eazyflow), University of Texas, Thinking Machines (Connection Machine)
Those who gave up their lives in SCI’s search for parallelism
Alliant, American Supercomputer, Ametek, AMT, Astronautics, BBN Supercomputer, Biin, CDC (independent of ETA), Cogent, Culler, Cydrome, Denelcor, Elexsi, ETA, Evans & Sutherland Supercomputers, Flexible, Floating Point Systems, Gould/SEL, IPM, Key, Multiflow, Myrias, Pixar, Prisma, SAXPY, SCS, Supertek (part of Cray), Suprenum (German National effort), Stardent (Ardent+Stellar), Supercomputer Systems Inc., Synapse, Vitec, Vitesse, Wavetracer.
Worlton: "Bandwagon Effect" explains massive parallelism
Bandwagon: A propaganda device by which the purported acceptance of an idea ... is claimed in order to win further public acceptance.
Pullers: vendors, CS community
Pushers: funding bureaucrats & deficit
Riders: innovators and early adopters
4 flat tires: training, system software, applications, and "guideposts"
Spectators: most users, 3rd party ISVs
“Parallel processing is a constant distance away.”
“Our vision ... is a system of millions of hosts… in a loose confederation. Users will have the illusion of a very powerful desktop computer through which they can manipulate objects.”
Grimshaw, Wulf, et al “Legion” CACM Jan. 1997
Progress
"Parallelism is a journey.*"
*Paul Borrill
Let us not forget:
“The purpose of computing is insight, not numbers.”
R. W. Hamming
Progress 1987-1998
Bell Prize Peak Gflops vs time
[Chart: peak Gflops, 0.1 to 1000 on a log scale, 1986-2000.]
Bell Prize: 1000x 1987-1998
1987 Ncube 1,000 computers: showed with more memory, apps scaled
1987 Cray XMP 4 proc. @200 Mflops/proc
1996 Intel 9,000 proc. @200 Mflops/proc
1998 600 RAP Gflops Bell prize
Parallelism gains
– 10x in parallelism over Ncube
– 2000x in parallelism over XMP
Spend 2-4x more
Cost effect.: 5x; ECL → CMOS; SRAM → DRAM
Moore’s Law = 100x
Clock: 2-10x; CMOS-ECL speed cross-over
No more 1000X/decade. We are now (hopefully) only limited by Moore’s Law and not limited by memory access.
1 GF to 10 GF took 2 years
10 GF to 100 GF took 3 years
100 GF to 1 TF took >5 years
2n+1 or 2^(n-1)+1?
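One way to read the closing question is as a guess at how many years the n-th tenfold step takes. The sketch below checks the second candidate, 2^(n-1)+1, against the three steps on the slide and converts each step into an implied yearly multiplier. Treating ">5 years" as exactly 5 and numbering the steps from n = 1 are both assumptions, and the function names are illustrative.

```python
# Years taken by each successive 10x step, as listed on the slide.
# The last entry is a lower bound there (">5 years"); 5 is assumed here.
YEARS_PER_STEP = [2, 3, 5]


def step_years_guess(n):
    """Candidate rule from the slide: the n-th 10x step takes 2^(n-1)+1 years."""
    return 2 ** (n - 1) + 1


def yearly_multiplier(years):
    """Constant yearly multiplier that yields a 10x gain over `years`."""
    return 10 ** (1.0 / years)


for n, years in enumerate(YEARS_PER_STEP, start=1):
    print(f"step {n}: observed {years} yr, rule gives {step_years_guess(n)} yr, "
          f"{yearly_multiplier(years):.2f}x per year")
```

Under these assumptions the rule reproduces 2, 3, and 5 years, and the implied yearly gain falls from roughly 3.2x to roughly 1.6x, which is the slowdown the slide title points at.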
Commercial Perf/$: $/tpmC vs time
[Chart: $/tpmC, $10 to $1,000 on a log scale, Mar-94 to Jun-97.]
250 %/year improvement!
Commercial Perf.: tpmC vs time
[Chart: tpmC, 100 to 100,000 on a log scale, Mar-94 to Jun-97.]
250 %/year improvement!
1998 Observations vs 1989 Predictions for technical
Got a TFlops PAP 12/1996 vs 1995. Really impressive progress! (RAP<1 TF)
More diversity… results in NO software!
– Predicted: SIMD, mC, hoped for scalable SMP
– Got: Supers, mCv, mC, SMP, SMP/DSM, SIMD disappeared
$3B (un-profitable?) industry; 10 platforms
PCs and workstations diverted users
MPP apps DID NOT materialize
Observation: CMOS supers replaced ECL in Japan
2.2 Gflops vector units have dual use
– In traditional mPv supers
– as basis for computers in mC
Software apps are present
Vector processor out-performs n micros for many scientific apps
It’s memory bandwidth, cache prediction, and inter-communication
Observation: price & performance
Breaking $30M barrier increases PAP
Eliminating “state computers” increased prices, but got fewer, more committed suppliers, less variation, and more focus
Commodity micros aka Intel are critical to improvement. DEC, IBM, and SUN are ??
Conjecture: supers and MPPs may be equally cost-effective despite PAP
– Memory bandwidth determines performance & price
– “You get what you pay for” aka “there’s no free lunch”
Observation: MPPs 1, Users <1
MPPs with relatively low speed micros with lower memory bandwidth, ran over supers, but didn’t kill ‘em.
Did the U.S. industry enter an abyss?
- Is crying “Unfair trade” hypocritical?
- Are users denied tools?
- Are users not “getting with the program”?
Challenge: we must learn to program clusters...
- Cache idiosyncrasies
- Limited memory bandwidth
- Long inter-communication delays
- Very large numbers of computers
Strong recommendation: Utilize in situ workstations!
NoW (Berkeley) set sort record, decrypting
Grid, Globus, Condor and other projects
Need “standard” interface and programming model for clusters using “commodity” platforms & fast switches
Giga- and tera-bit links and switches allow geo-distributed systems
Each PC in a computational environment should have an additional 1GB/9GB!
“Petaflops by 2010”
DOE Accelerated Strategic Computing Initiative (ASCI)
DOE’s 1997 “PathForward” Accelerated Strategic Computing Initiative (ASCI)
1997: 1-2 Tflops, $100M
1999-2001: 10-30 Tflops, $200M??
2004: 100 Tflops
2010: Petaflops
“When is a Petaflops possible? What price?”
Moore’s Law: 100x. But how fast can the clock tick?
Increase parallelism 10K → 100K: 10x
Spend more ($100M → $500M): 5x
Centralize center or fast network: 3x
Commoditization (competition): 3x
Gordon Bell, ACM 1997
Micros gains if 20, 40, & 60% / year
[Chart: 1.E+6 to 1.E+21 on a log scale, 1995-2045. 20% = Teraops; 40% = Petaops; 60% = Exaops.]
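The three end points on this chart follow from compounding over the 50 years from 1995 to 2045. The sketch below reproduces them under one loud assumption: a 1995 starting point of 10^8 ops/s, which is not stated on the slide and was chosen only because it makes the three curves land near Tera, Peta, and Exa. The names `BASE_OPS_1995` and `project` are illustrative.

```python
# ASSUMPTION: a 1995 microprocessor delivers about 1e8 ops/s.
# This baseline is not given on the slide.
BASE_OPS_1995 = 1e8


def project(rate, year, base=BASE_OPS_1995, base_year=1995):
    """Ops/s in `year` if performance grows by `rate` (e.g. 0.4) every year."""
    return base * (1.0 + rate) ** (year - base_year)


for rate in (0.20, 0.40, 0.60):
    print(f"{rate:.0%} per year -> {project(rate, 2045):.1e} ops/s in 2045")
```

With that baseline, 20% per year reaches the order of 10^12, 40% the order of 10^15, and 60% the order of 10^18, matching the chart labels. A different baseline shifts all three curves by the same factor.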
Processor Limit: DRAM Gap
[Chart: performance, 1 to 1000 on a log scale, 1980-2000. CPU (µProc): 60%/yr. (“Moore’s Law”); DRAM: 7%/yr. Processor-Memory Performance Gap grows 50% / year.]
• Alpha 21264 full cache miss / instructions executed: 180 ns/1.7 ns = 108 clks x 4 or 432 instructions
• Caches in Pentium Pro: 64% area, 88% transistors
*Taken from Patterson-Keeton Talk to SigMod
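The "grows 50% / year" figure is the ratio of the two growth rates on the chart, and the cache-miss bullet is a division followed by a multiplication. A minimal sketch of both calculations, using only the numbers on the slide; the function names are illustrative, and reading "x 4" as four instructions issued per clock is an assumption.

```python
def gap_growth(cpu_rate, dram_rate):
    """Yearly growth of the CPU/DRAM performance ratio."""
    return (1.0 + cpu_rate) / (1.0 + dram_rate) - 1.0


def miss_cost(miss_ns, cycle_ns, issue_width):
    """Clocks lost to one full cache miss, and instruction slots forgone."""
    clocks = miss_ns / cycle_ns
    return clocks, clocks * issue_width


gap = gap_growth(0.60, 0.07)
clocks, slots = miss_cost(180.0, 1.7, 4)  # issue width of 4 is assumed

print(f"gap grows {gap:.1%} per year")
print(f"one miss costs about {clocks:.0f} clocks, {slots:.0f} instruction slots")
```

The gap comes out just under 50% per year. The miss cost lands near the slide's 108 clocks and 432 instructions; the small difference presumably comes from rounding the cycle time to 1.7 ns.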
Five Scalabilities
Size scalable -- designed from a few components, with no bottlenecks
Generation scaling -- no rewrite/recompile is required across generations of computers
Reliability scaling
Geographic scaling -- compute anywhere (e.g. multiple sites or in situ workstation sites)
Problem x machine scalability -- ability of an algorithm or program to exist at a range of sizes that run efficiently on a given, scalable computer.
Problem x machine space => run time: problem scale, machine scale (#p), run time, implies speedup and efficiency
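The last item says that run time over the problem x machine space implies speedup and efficiency. A minimal sketch using the standard definitions (speedup as serial time over parallel time, efficiency as speedup per processor); the timings in the example are hypothetical and not from the talk.

```python
def speedup(t_serial, t_parallel):
    """How many times faster the parallel run is than the 1-processor run."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, processors):
    """Fraction of ideal linear speedup achieved on `processors` processors."""
    return speedup(t_serial, t_parallel) / processors


# Hypothetical timings: 100 s on 1 processor, 8 s on 16 processors.
s = speedup(100.0, 8.0)
e = efficiency(100.0, 8.0, 16)
print(f"speedup {s}x, efficiency {e:.0%}")
```

For a fixed machine size, scaling the problem up typically raises efficiency, which is why the slide treats problem scale and machine scale together.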
The Law of Massive Parallelism (mine) is based on application scaling
There exists a problem that can be made sufficiently large such that any network of computers can run efficiently given enough memory, searching, & work -- but this problem may be unrelated to any other.
A ... any parallel problem can be scaled to run efficiently on an arbitrary network of computers, given enough memory and time… but it may be completely impractical
Challenge to theoreticians and tool builders: How well will, or will, an algorithm run?
Challenge for software and programmers: Can package be scalable & portable? Are there models?
Challenge to users: Do larger scale, faster, longer run times, increase problem insight and not just total flop or flops?
Challenge to funders: Is the cost justified?
Gordon’s WAG
Manyflops for Manybucks: what are the goals of spending?
Getting the most flops, independent of how much taxpayers give to spend on computers?
Building or owning large machines?
Doing a job (stockpile stewardship)?
Understanding and publishing about parallelism?
Making parallelism accessible?
Forcing other labs to follow?
Petaflops Alternatives c2007-14 from 1994 DOE Workshop
Alternatives: SMP, Cluster, Active Mem, Grid
SMP: 400 Proc.; 1 Tflops; 400 TB SRAM, 250K chips; 1 ps/result… multi-threading; 100 10 Gflops thread is likely
Cluster: 4-40 K Proc.; 10-100 Gflops; 400 TB DRAM, 60K-100K chips; 10-100 ps/result; cache hierarchy
Active Mem: 400 K Proc.; 1 Gflops; 0.8 TB embed., 4K chips
No definition of storage, network, or programming model
Or more parallelism… and use installed machines
10,000 nodes in 1998 or 10x increase
Assume 100K nodes
10 Gflops/10GBy/100GB nodes or low end c2010 PCs
Communication is first problem… use the network
Programming is still the major barrier
Will any problems fit it?
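The node count and per-node figures on this slide multiply out to a petaflops-class aggregate. A minimal sketch of that arithmetic follows; reading "10GBy/100GB" as 10 GB of memory and 100 GB of disk per node is an assumption, as are the variable names and the use of decimal units.

```python
NODES = 100_000
GFLOPS_PER_NODE = 10
MEMORY_GB_PER_NODE = 10   # ASSUMPTION: "10GBy" read as memory per node
DISK_GB_PER_NODE = 100    # ASSUMPTION: "100GB" read as disk per node

GIGA_PER_PETA = 1e6       # decimal units: 1 peta = 10^6 giga


def aggregate(nodes, per_node):
    """Total across all nodes, converted from giga-units to peta-units."""
    return nodes * per_node / GIGA_PER_PETA


totals = (
    aggregate(NODES, GFLOPS_PER_NODE),
    aggregate(NODES, MEMORY_GB_PER_NODE),
    aggregate(NODES, DISK_GB_PER_NODE),
)
print(f"{totals[0]} Pflops peak, {totals[1]} PB memory, {totals[2]} PB disk")
```

This is peak capacity only. It says nothing about the communication and programming problems the slide lists as the real barriers.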
Next, short steps
The Alliance LES NT Supercluster
192 HP 300 MHz
64 Compaq 333 MHz
• Andrew Chien, CS UIUC-->UCSD
• Rob Pennington, NCSA
• Myrinet Network, HPVM, Fast Msgs
• Microsoft NT OS, MPI API
“Supercomputer performance at mail-order prices” -- Jim Gray, Microsoft
2D Navier-Stokes Kernel - Performance
Preconditioned Conjugate Gradient Method With Multi-level Additive Schwarz Richardson Pre-conditioner
[Chart: Gigaflops (0-7) vs. Processors (0-60) for Origin-DSM, Origin-MPI, NT-MPI, SP2-MPI, T3E-MPI, SPP2000-DSM.]
Sustaining 7 GF on 128 Proc. NT Cluster
Danesh Tafti, Rob Pennington, NCSA; Andrew Chien (UIUC, UCSD)
The Grid: Blueprint for a New Computing Infrastructure
Ian Foster, Carl Kesselman (Eds), Morgan Kaufmann, 1999
Published July 1998; ISBN 1-55860-475-8
22 chapters by expert authors including:
– Andrew Chien
– Jack Dongarra
– Tom DeFanti
– Andrew Grimshaw
– Roch Guerin
– Ken Kennedy
– Paul Messina
– Cliff Neuman
– Jon Postel
– Larry Smarr
– Rick Stevens
– Charlie Catlett
– John Toole
– and many others
http://www.mkp.com/grids
“A source book for the history of the future” -- Vint Cerf
The Grid
“Dependable, consistent, pervasive access to [high-end] resources”
Dependable: Can provide performance and functionality guarantees
Consistent: Uniform interfaces to a wide variety of resources
Pervasive: Ability to “plug in” from anywhere
Alliance Grid Technology Roadmap: It’s just not flops or records/se
User Interface: Tango, Webflow, Habanero, Workbenches, NetMeeting, H.320/323, RealNetworks
Middleware: Globus, LDAP, QoS, Java, vBNS, Abilene, ActiveX, MREN, Clusters
Compute: Condor, JavaGrande, HPVM/FM, Symera (DCOM), DSM, HPF, MPI, OpenMP, Clusters
Data: ODBC, Emerge (Z39.50), SRB, HDF-5, SANs, svPablo, DMF, XML
Visualization: Virtual Director, CAVERNsoft, Java3D, SCIRun, Cave5D, VRML
Globus Approach
Focus on architecture issues
– Propose set of core services as basic infrastructure
– Use to construct high-level, domain-specific solutions
Design principles
– Keep participation cost low
– Enable local control
– Support for adaptation
[Diagram layers: Applications; Diverse global svcs; Core Globus services; Local OS]
Globus Toolkit: Core Services
Scheduling (Globus Resource Alloc. Manager)
– Low-level scheduler API
Information (Metacomputing Directory Service)
– Uniform access to structure/state information
Communications (Nexus)
– Multimethod communication + QoS management
Security (Globus Security Infrastructure)
– Single sign-on, key management
Health and status (Heartbeat monitor)
Remote file access (Global Access to Secondary Storage)
Summary of some beliefs
1000x increase in PAP has not been accompanied with RAP, insight, infrastructure, and use. What was the PACT/$?
“The PC World Challenge” is to provide commodity, clustered parallelism to commercial and technical communities
Only comes true if ISVs believe and act
Grid etc. using world-wide resources, including in situ PCs, is the new idea
The end