9/24/12 NSF-ME-08-2012 1 1
Overview of HPC Computer Architecture:
A Long March Toward Exa-Scale Computing and Beyond
August 16, 2012
Guang R. Gao
ACM Fellow and IEEE Fellow
Distinguished Professor, Dept. of ECE, University of Delaware
Toward A Codelet Based Execution Model and Its Memory Semantics
-- For Future Extreme-Scale Computing Systems
August 16, 2012
Guang R. Gao
ACM Fellow and IEEE Fellow
Distinguished Professor, Dept. of ECE, University of Delaware
Outline
• Background and motivation
• Program execution models
• Evolution of codelet based execution models
– The EARTH project (1994 – 2004)
– IBM Cyclops-64 project (2004 – 2010+): The TNT Experience
– Intel-led UHPC/Runnemede (2010 – 2012): The codelet concept and SWARM
• Memory semantics of the codelet model
• Conclusions and Future Directions
K ("KEI") Computer
• "K" draws upon the Japanese word "Kei" for 10^16
• 3 times faster than the Chinese Tianhe-1A
• 8.162 Pflops Rmax, 8.777 Pflops Rpeak
• 80,000 8-core 2 GHz SPARC64 VIIIfx processors, delivering a total of more than 640,000 processing cores
• 1 PB memory
• 4th most energy-efficient system in the TOP500, with a performance-per-watt rating of 825 megaflops per watt
• Tofu: a 6D mesh/torus interconnect
Tianhe-1A 2.566 Petaflops Rmax
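The figures on the K Computer and Tianhe-1A slides can be cross-checked with a little arithmetic. The sketch below is illustrative only: the constant and function names are made up for this example, and the implied power draw assumes that the quoted 825 megaflops per watt was measured at Rmax, which the slides do not state.

```python
# Figures quoted on the slides (Pflops unless noted otherwise).
K_RMAX_PFLOPS = 8.162
K_RPEAK_PFLOPS = 8.777
TIANHE_RMAX_PFLOPS = 2.566
K_MFLOPS_PER_WATT = 825.0
K_CHIPS = 80_000
CORES_PER_CHIP = 8


def k_computer_summary():
    """Derive a few ratios from the quoted figures (hypothetical helper)."""
    speedup = K_RMAX_PFLOPS / TIANHE_RMAX_PFLOPS
    rmax_over_rpeak = K_RMAX_PFLOPS / K_RPEAK_PFLOPS
    cores = K_CHIPS * CORES_PER_CHIP
    # 1 Pflops = 1e9 Mflops. Assumption: Mflops/W was measured at Rmax.
    power_watts = (K_RMAX_PFLOPS * 1e9) / K_MFLOPS_PER_WATT
    return {
        "speedup_vs_tianhe": speedup,
        "rmax_over_rpeak": rmax_over_rpeak,
        "cores": cores,
        "implied_power_mw": power_watts / 1e6,
    }
```

Under these assumptions the ratio of the two Rmax values is a little above 3, consistent with the "3 times faster" bullet, and the implied power draw is just under 10 MW.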
Current Big Themes in Supercomputing
• Multi-core, many-core
– Exa-scale is on the horizon
• Heterogeneity and accelerators
• Data-intensive (big data)
• Others?
Challenges
• Big-compute (performance demand on massive parallelism)
• Big-data (massive, irregular, unstructured data needs big analytics)
• Big chips with architecture heterogeneity
• Energy efficiency and resiliency
A Fundamental Challenge - Parallel Program Execution Models
A Quiz: Have you heard the following terms?
• Actors (dataflow)?
• Strand?
• Fiber?
• Codelet?
What is a Program Execution Model?
[Figure: the PXM (API) as the interface between user code and the system.]
User Code: application code, software packages, program libraries, compilers, utility applications
(API) PXM
System: runtime code and libraries, operating system, hardware
Courtesy: J. B. Dennis, PEM-2, 4/7/2011
Coarse-Grain vs. Fine-Grain Multithreading
[Figure: two panels, each showing a thread unit with a CPU, memory, and an executor locus.]
• Coarse-grain thread, the "family home" model: a single thread per thread unit
• Fine-grain non-preemptive thread, the "hotel" model: a pool of threads per thread unit
[Gao: invited talk at Fran Allen’s Retirement Workshop, 07/2002]
Execution Model and Abstract Machines
[Figure: diagram relating Users, Programming Models, Programming Environment Platforms, the Execution Model API, and Abstract Machine Models. Side label: Execution Model.]
Abstract Machine Models May Be Heterogeneous!
Execution Model and Abstract Machines
[Figure: layered stack relating users to hardware. Labels: Users; Programming Models/Environment; High-Level Programming API (MPI, OpenMP, CnC, X10, Chapel, etc.); software packages, program libraries, utility applications; compilers, tools/SDK; Execution Model (API, runtime system, abstract machine); hardware architecture.]
EARTH Architecture
[Figure: several nodes attached to an interconnect network. Labels inside a node: PE, EU, SU, RQ, EQ ("From RQ", "To EQ"), local memory, memory bus.]
The EARTH Multithreaded Execution Model (1993 – 200x)
Two levels of fine-grain threads:
– threaded procedures
– fibers
[Figure: fibers within frames, annotated with signal counts (total # signals, arrived # signals). Legend: fiber within a frame; async. function invocation; a sync operation; invoke a threaded function; signal token.]
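The signal counts in the EARTH figure suggest how a fiber becomes ready: it waits until the number of arrived signals reaches the total it expects. The following is a minimal sketch of that idea, not the EARTH runtime itself. The names `SyncSlot` and `run_demo`, the dictionary used as a frame, and the choice to reset the counter after firing are all assumptions made for illustration.

```python
from collections import deque


class SyncSlot:
    """Hypothetical sync slot: enables a fiber once all signals arrive."""

    def __init__(self, total, fiber):
        self.total = total      # total # signals expected
        self.arrived = 0        # arrived # signals so far
        self.fiber = fiber      # fiber to enable when the count is reached

    def signal(self, ready):
        self.arrived += 1
        if self.arrived == self.total:
            self.arrived = 0            # assumption: slot resets for reuse
            ready.append(self.fiber)    # fiber is now ready to run


def run_demo():
    frame = {}          # shared state of one threaded procedure (assumed)
    ready = deque()     # queue of enabled fibers
    trace = []

    def fiber_sum():
        frame["sum"] = frame["a"] + frame["b"]
        trace.append("sum")

    slot = SyncSlot(total=2, fiber=fiber_sum)

    def fiber_a():
        frame["a"] = 1
        trace.append("a")
        slot.signal(ready)

    def fiber_b():
        frame["b"] = 2
        trace.append("b")
        slot.signal(ready)

    ready.extend([fiber_a, fiber_b])
    while ready:
        ready.popleft()()   # each fiber runs to completion, no preemption
    return frame["sum"], trace
```

Calling `run_demo()` runs the two producer fibers first; `fiber_sum` is enqueued only after the second signal arrives.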
Outline
• Background and motivation
• Program execution models
• Evolution of codelet based execution models
– The EARTH project (1994 – 2004)
– IBM Cyclops-64 project (2004 – 2010+): The TNT Experience
– Intel-led UHPC/Runnemede (2010 – 2012): The codelet concept and SWARM
– DOE X-Stack (2012 – 2015): Continue the codelet path
• Semantics of Codelet Models
• Conclusions and Future Directions
The Codelet: A Fine-Grain Piece of Computing
[Figure: data objects flow into a codelet, which produces a result object.]
• Supports massively parallel computation!
• This looks like data flow!
Courtesy: Prof. Jack Dennis, 2001
Concept of Codelet (Feb. 4th, 2011)
- Codelets are the principal scheduling quantum under our codelet based execution model. A codelet, once allocated and scheduled, will be kept usefully busy, since it is non-preemptive.
- The underlying hardware architecture and system software (e.g. compiler, etc.) are optimized to ensure such non-preemption features can be productively utilized.
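The two properties described above, data-driven firing and non-preemptive execution, can be sketched in a few lines. This is an illustrative toy, not SWARM or any real codelet runtime. The `Codelet` class, the `run_codelets` function, and the dictionary used as a store of data objects are hypothetical names introduced for this example, and each codelet is assumed to fire exactly once.

```python
class Codelet:
    """Hypothetical codelet: consumes data objects, produces one result."""

    def __init__(self, name, inputs, output, fn):
        self.name = name
        self.inputs = inputs    # names of the data objects it needs
        self.output = output    # name of the result object it produces
        self.fn = fn


def run_codelets(codelets, store):
    """Fire each codelet once, as soon as all of its inputs exist."""
    pending = list(codelets)
    fired = []
    while pending:
        enabled = [c for c in pending if all(k in store for k in c.inputs)]
        if not enabled:
            raise RuntimeError("no codelet is enabled; inputs are missing")
        for c in enabled:
            # Non-preemptive: the codelet runs to completion here.
            store[c.output] = c.fn(*(store[k] for k in c.inputs))
            fired.append(c.name)
        pending = [c for c in pending if c not in enabled]
    return fired


def demo():
    store = {"x": 3, "y": 4}
    codelets = [
        Codelet("add", ["xx", "yy"], "r", lambda a, b: a + b),
        Codelet("sq_x", ["x"], "xx", lambda v: v * v),
        Codelet("sq_y", ["y"], "yy", lambda v: v * v),
    ]
    order = run_codelets(codelets, store)
    return order, store["r"]
```

In `demo()`, the `add` codelet is listed first but fires last, because firing order follows data availability and not program text order.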
What is a Shared Memory Execution Model?
The Thread Abstract Machine
Execution Model:
• Thread Model: a set of rules for creating, destroying and managing threads
• Memory Model: dictates the ordering of memory operations
• Synchronization Model: provides a set of mechanisms to protect against data races
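Two of the three components can be seen in a small example using Python's standard `threading` module: the thread model appears as thread creation and joining, and the synchronization model appears as a lock guarding a shared counter. The function name `count_with_lock` and its parameters are made up for this sketch, and the memory model component is not illustrated here.

```python
import threading


def count_with_lock(num_threads=4, increments=10_000):
    """Increment a shared counter from several threads under a lock."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            # Synchronization model: the lock makes the read-modify-write
            # of the shared counter mutually exclusive.
            with lock:
                counter += 1

    # Thread model: create, start, and join the threads.
    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

With the lock in place the result is always `num_threads * increments`; without it, the final count would depend on how the threads interleave.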
"Memory Coherence": A Basic Assumption of SC-Derived Memory Models
"…All writes to the same location are serialized in some order and are performed in that order with respect to any processor…"
[Gharachorloo et al. 90]
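One way to read the coherence assumption quoted above: for a single location, the orders in which different processors see the writes must all fit into one common serialization. The sketch below checks that simplified reading and is not a full memory-model checker. The function name `is_coherent` and the input format (a dictionary mapping each processor to the list of write identifiers it observed, in order, with no repeats) are assumptions for illustration.

```python
def is_coherent(observations):
    """Return True if one serialization of writes fits every observer.

    observations: dict mapping a processor name to the list of distinct
    write ids, for ONE memory location, in the order that processor saw
    them performed.
    """
    # Each adjacent pair in an observed sequence is an ordering constraint.
    edges = {}
    for seq in observations.values():
        for earlier, later in zip(seq, seq[1:]):
            edges.setdefault(earlier, set()).add(later)
            edges.setdefault(later, set())

    # Kahn's algorithm: a common total order exists iff the constraint
    # graph has no cycle.
    indegree = {node: 0 for node in edges}
    for successors in edges.values():
        for s in successors:
            indegree[s] += 1
    ready = [node for node, d in indegree.items() if d == 0]
    visited = 0
    while ready:
        node = ready.pop()
        visited += 1
        for s in edges[node]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)
    return visited == len(edges)
```

For example, if processor P0 sees write `w1` before `w2` while P1 sees `w2` before `w1`, no single serialization fits both, and the check fails.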
Can We Break the Memory Coherence Barrier?
No? Yes?
Four Key Questions on Memory Models
• What happens when two (or more) concurrent load/store operations arrive at the same memory location?
• Answers?
A Conjecture
The LC (Location Consistency) memory model belongs to the group of memory models that are weakest while still not violating the causality constraint!
DOE X-Stack Project: Traleika Glacier (July 2012 – June 2015)
• Team lead: Intel
• Universities: UIUC, UD, UCSD, Rice U
• Other industries: ETI, Reservoir
• DOE labs: PNNL, Sandia, ORNL, ..
Acknowledgements
• Our Sponsors
• Members of CAPSL
• Members of ETI
• Other Collaborators (T. Sterling, V. Sarkar, etc.)
• My Mentor - Prof. Jack B. Dennis
• My Host