Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes,...
Transcript of Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes,...
![Page 1: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/1.jpg)
Fuat Keceli, AlexandrosTzannes,George C. Caragea, Rajeev Barua and Uzi Vishkin
University of Maryland, College Park, MD
Toolchain For Programming, Simulating and
Studying the XMT Many-Core Architecture
![Page 2: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/2.jpg)
5/20/2011HIPS20112
XMT: eXplicit Multi-Threading (Not Cray XMT)
Available at:http://www.umiacs.umd.edu/users/vishkin/XMT/index.shtml#sw-release
Or search XMTC on SourceForge
![Page 3: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/3.jpg)
Overview
5/20/2011HIPS2011
XMT framework and architecture
XMTSim
The cycle-accurate simulator of the architecture
XMT Compiler
Optimizing compiler for explicit parallelism
Importance of the toolchain
Summary
3
![Page 4: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/4.jpg)
Explicit Multi-Threading (XMT)
General Purpose Many-Core Computer
Framework and Architecture
5/20/2011HIPS20114
![Page 5: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/5.jpg)
Searching for parallel platforms
which provide …
5/20/2011HIPS2011
Ease-of-programming
Competitive performance for both high and low degrees of
parallelism
Manageable power
5
![Page 6: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/6.jpg)
Explicit Multi-Threading “Explicitly” (definition)
Parallelism is explicitly exposed by the programmer
HP 4th ed. “If you want your program to run significantly faster,
you're going to have to parallelize it”
5/20/2011HIPS20116
![Page 7: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/7.jpg)
XMTC Programming
5/20/2011HIPS20117
Execution Modes:
Serial and parallel
![Page 8: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/8.jpg)
XMTC Programming
5/20/2011HIPS20118
start
serial code
…
spawn(# threads) {
parallel SPMD code*
}
serial code
…
halt
* Not MPI type SPMD.
![Page 9: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/9.jpg)
XMT Architecture
5/20/2011HIPS20119
![Page 10: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/10.jpg)
XMT Architecture
5/20/2011HIPS201110
Parallel sections
Serial Sections
![Page 11: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/11.jpg)
XMT Architecture
5/20/2011HIPS201111
![Page 12: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/12.jpg)
Explicit Multi-Threading “Explicitly” (definition)
Parallelism is explicitly exposed by the programmer
HP 4th ed. “If you want your program to run significantly faster,
you're going to have to parallelize it”
Why focus on Explicitness
Automatic parallelization: limited scale & generality
Irregular parallelism
Serial coding sometimes hides parallelism (BFS).
5/20/2011HIPS201112
![Page 13: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/13.jpg)
A Holistic ApproachVertical Stack of Programming
Programmer’s workflow
Execution Abstraction [CACM11]
Algorithmic theory – PRAM
Program (software) – XMTC
Compiler/Run-Time (middle-ware)
Hardware/Simulator
5/20/2011HIPS201113
![Page 14: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/14.jpg)
Why is the toolchain important?
5/20/2011HIPS2011
“A perfection of means, and confusion of aims, seems to be our
main problem.”
Albert Einstein
14
![Page 15: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/15.jpg)
XMTSim
The Cycle-Accurate Simulator of the XMT Architecture
• Overview and modeled components
• Software structure
• Discrete event vs. discrete time simulation
• Other features
5/20/2011HIPS201115
![Page 16: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/16.jpg)
Highlights
5/20/2011HIPS201116Image from wikipedia.
![Page 17: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/17.jpg)
Highlights Highly configurable: Number of units, clock frequencies, …
5/20/2011HIPS201117
![Page 18: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/18.jpg)
Highlights
5/20/2011HIPS2011
Highly configurable: Number of units, clock frequencies, …
Relatively easy to replace large components (i.e. ICN)
18
![Page 19: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/19.jpg)
Highlights Highly configurable: Number of units, clock frequencies, …
Relatively easy to replace large components (i.e. ICN)
Cycle-accuracy validated against the FPGA prototype
5/20/2011HIPS201119
![Page 20: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/20.jpg)
Highlights Highly configurable: Number of units, clock frequencies, …
Relatively easy to replace large components (i.e. ICN)
Cycle-accuracy validated against the FPGA prototype
Custom mechanisms for statistics collection, debugging,
runtime modifications
5/20/2011HIPS201120
![Page 21: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/21.jpg)
Implementation details Written in Java (object oriented)
~28K lines of code + ~14K lines of inline documentation
Simulation speed (Intel Xeon server @ 3GHz)
10K – 2M instructions/second
Order of 1K cycles/second (1024-core conf.)
5/20/2011HIPS201121
![Page 22: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/22.jpg)
Software Structure
5/20/2011HIPS201122
![Page 23: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/23.jpg)
Software Structure
5/20/2011HIPS201123
![Page 24: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/24.jpg)
Software Structure
5/20/2011HIPS201124
![Page 25: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/25.jpg)
Software Structure
5/20/2011HIPS201125
![Page 26: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/26.jpg)
Software Structure
5/20/2011HIPS201126
![Page 27: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/27.jpg)
Software Structure
5/20/2011HIPS201127
![Page 28: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/28.jpg)
Software Structure
5/20/2011HIPS201128
![Page 29: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/29.jpg)
Simulation Strategy
Discrete Time vs. Discrete Event Simulation
5/20/2011HIPS201129
![Page 30: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/30.jpg)
Discrete Time SimulationMain loop:
while(still running) {
Set clock to next cycle
Execute simulation code for one cycle
}
5/20/2011HIPS201130
![Page 31: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/31.jpg)
Discrete Event Simulation
Event List
Actor Actor
ScheduleWake-up
Schedule
Wake-up
Iterate
5/20/2011HIPS201131
while(still running) {
Set clock to next event time
Remove event from queue
Notify the scheduling actor
}
![Page 32: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/32.jpg)
DE vs. DT Simulation
Discrete Time Simulation Discrete Event Simulation
•More compact code for smaller
simulations
•Efficient if a lot of work done for
every simulated cycle
•Naturally suitable for an object-
oriented structure
•More flexible in quantization of
simulated time
•Can simulate asynchronous logic
5/20/2011HIPS201132
![Page 33: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/33.jpg)
DE vs. DT Simulation
Discrete Time Simulation Discrete Event Simulation
•Efficient if a lot of work done for
every simulated cycle
•More compact code for smaller
simulations
•Naturally suitable for an object-
oriented structure
•More flexible in quantization of
simulated time
•Can simulate asynchronous logic
•Requires complex case analysis for
a large simulator.
•Slow if not all components do
work every clock cycle
•Event list operations are expensive
•Might require more work for
emulating one clock cycle
5/20/2011HIPS201133
![Page 34: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/34.jpg)
How do I simulate this?
5/20/2011HIPS201134
![Page 35: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/35.jpg)
Best of both worlds Discrete event for a modular structure
Event List
Cluster
ScheduleWake-up
ScheduleWake-up
Iterate
5/20/2011HIPS201135
ICN …
![Page 36: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/36.jpg)
Best of both worlds Heavy-weight action code for every actor (similar to the code
for one iteration of DT simulation)
Event List
Cluster
ScheduleWake-up
ScheduleWake-up
Iterate
5/20/2011HIPS201136
ICN …
![Page 37: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/37.jpg)
In the paper/tech. report
5/20/2011HIPS2011
Implementation details
Handling the communication between actors
Optimizing the event list
Other features
Visualizing data on floorplan
Default plug-ins
Execution check-points
37
![Page 38: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/38.jpg)
XMT Compiler
Optimizing compiler for explicit parallelism
• Serial GCC for parallel code
• Latency Hiding
• XMT Runtime
• Software structure of the compiler
5/20/2011HIPS201138
![Page 39: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/39.jpg)
Designing a compiler from
scratch is hard…
5/20/2011HIPS201139
![Page 40: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/40.jpg)
How hire 1000s of developers
for free?
5/20/2011HIPS201140
![Page 41: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/41.jpg)
Use GCC!
5/20/2011HIPS201141
![Page 42: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/42.jpg)
Serial GCC for parallel code?• Illegal dataflow
• Reasons
• Concurrent semantics
• Control transfer between serial core and parallel cores
5/20/2011HIPS201142
![Page 43: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/43.jpg)
Example: Illegal Code MotionXMTC Code
int A[N]=...;
bool found=false;
spawn (0, N-1) {
if(A[$]!=0)
found=true;
}
if (found) counter+=1;
5/20/2011HIPS201143
![Page 44: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/44.jpg)
Example: Illegal Code Motion
int A[N]=...;
bool found=false;
asm (spawn 0, N-1)
if(A[$]!=0)
found=true;
asm (join)
if (found) counter+=1;
5/20/2011HIPS201144
![Page 45: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/45.jpg)
Example: Illegal Code MotionCompiler doesn’t know the spawn construct!
int A[N]=...;
bool found=false;
asm (spawn 0, N-1)
if(A[$]!=0)
found=true;
asm (join)
if (found) counter+=1;
5/20/2011HIPS201145
![Page 46: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/46.jpg)
Example: Illegal Code MotionCompiler doesn’t know the spawn construct!
int A[N]=...;
bool found=false;
asm (spawn 0, N-1)
if(A[$]!=0)
found=true;
if (found) counter+=1;
asm (join)
5/20/2011HIPS201146
![Page 47: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/47.jpg)
A Partial Solution: Outlining
5/20/2011HIPS201147
The compiler's view
int A[N]=...;
bool found=false;
asm (spawn 0, N-1)
if(A[$]!=0)
found=true;
asm (join)
if (found) counter+=1;
After Outlining
int A[N]=...;
bool found=false;
outlined(A,&found);
if(found) counter+=1;
outlined(int (*A), bool *found) {
asm (spawn 0, N-1)
if (A[$]!=0)
(*found) = true;
asm (join)
}
![Page 48: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/48.jpg)
Other Issues
5/20/2011HIPS2011
Register Liveness Across MTCU TCU transfers of control
Broadcast live MTCU registers
Assembly code layout escaping spawn/join XMT statements
Analysis for relocating offending basic blocks
Details in the paper!
48
![Page 49: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/49.jpg)
On-chip Memory Architecture
5/20/2011HIPS201149
![Page 50: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/50.jpg)
On-chip Memory Architecture Multiple shared cache memory modules
No Coherent Private Cache
→ Reduced power and ICN bandwidth traffic (+)
→ Increased latency to d-Cache (-)
But Hide Latency (~30 cycles to shared caches) with …
Parallelism
Compiler optimizations
5/20/2011HIPS201150
![Page 51: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/51.jpg)
Latency Hiding
5/20/2011HIPS2011
Non-Blocking Stores
Broadcasting of Code and Data
Software Prefetching
Planned for next release:
Read Only Caches (at Cluster Level)
Scratch Pads
51
![Page 52: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/52.jpg)
Latency Hiding: Prefetching
5/20/2011HIPS2011
Loop Prefetching
Unique resource aware prefetch algorithm
Motivated by scarcity of TCU resources
4 Prefetch buffer locations per TCU
Existing algorithms do not take resources into account and
assume that after prefetching all reads are cache hits
52
![Page 53: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/53.jpg)
Software Structure of Compiler
• CIL• Outlining
• …
• GCC• Broadcasting
• Cactus-Stack Allocation
• Prefetching
• …
• SableCC• Basic Block Layout
• Linking
• …
5/20/2011HIPS201153
source
code
.sim .b
CIL Pre-pass
Source-to-source
GCC Core Pass
Source to Assembly
SableCC Post-pass
Assembly-to-Binary
![Page 54: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/54.jpg)
Why is the toolchain important?
5/20/2011HIPS2011
“A perfection of means, and confusion of aims, seems to be our
main problem.”
Albert Einstein
54
![Page 55: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/55.jpg)
Current Implementation & Vision
5/20/2011HIPS201155
64-core FPGA prototype
available
1024-cores on-chip feasible
with today’s technology
[HotPar2010]
Developing the specifications
via the toolchain!
Figure from CACM11
![Page 56: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/56.jpg)
How to advance towards the vision?
5/20/2011HIPS2011
Architecture/compiler research
XMT vs. GPU comparison [HotPar2010]
Scheduler for nested parallelism [PPoPP2010]
Software prefetching [IJPP2010]
56
![Page 57: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/57.jpg)
5/20/2011HIPS2011
Architecture/compiler research
Comparison with GPUs [HotPar2010]
Scheduler for nested parallelism [PPoPP2010]
Software prefetching [IJPP2010]
Algorithms research
Max-Flow [SPAA11]
Bi-Connectivity [DIMACS2011]
57
How to advance towards the vision?
![Page 58: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/58.jpg)
5/20/2011HIPS2011
Architecture/compiler research
Comparison with GPUs [HotPar2010]
Scheduler for nested parallelism [PPoPP2010]
Software prefetching [IJPP2010]
Algorithms research
Max-Flow [SPAA11]
Bi-Connectivity [DIMACS2011]
Teaching
Undergrad. and grad. classes – UMD/UIUC [EduPar2011]
High school/middle school [SIGCSE2010]
58
How to advance towards the vision?
![Page 59: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/59.jpg)
Under Release Testing
5/20/2011HIPS2011
Compiler
Parallel function calls
Scheduling for nested parallelism (lazy binary splitting)
Simulator
Power/temperature models included (HotSpot from UVA)
59
![Page 60: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/60.jpg)
Under Development
5/20/2011HIPS2011
Compiler
More latency hiding mechanisms
(read only caches and scratch pads)
Simulator
Increasing simulator speed:
Parallel simulation, phase sampling and fast forwarding
Simulation of asynchronous interconnect in [TCAD2010]
Operating system
60
![Page 61: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/61.jpg)
Summary
5/20/2011HIPS2011
Explicit Multi-Threading
From abstraction/algorithms to hardware
XMTSim
Highly configurable
Can simulate GALS, asynchronous logic, dynamic management
XMT Compiler
How to use serial GCC for a parallel language?
Optimizations: Latency Hiding, efficient scheduling
Impact
Architecture/compiler/algorithms research, teaching
61
![Page 62: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/62.jpg)
Past Contributors
5/20/2011HIPS2011
Yoni Ashar, Thomas Dubois, Bryant Lee (UMD Students)
Tali Moreshet, Katherine Bertaut (Swarthmore College)
Genette Gill (Columbia University)
62
![Page 63: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/63.jpg)
References
5/20/2011HIPS2011
[CACM2011] U. Vishkin. Using simple abstraction to reinvent computing for parallelism. CACM 54,1,p. 75-85, Jan. 2011
[SPAA2011] G.C. Caragea and U. Vishkin. Better speedups for parallel max-flow. SPAA, 2011.
[EduPar2011] D. Padua, U. Vishkin and J. Carver. Joint UIUC/UMD parallel algorithms/programming course. EduPar-11, in IPDPS, 2011
[SIGCSE2010] S. Torbert, U. Vishkin, R. Tzur and D. Ellison. Is teaching parallel algorithmic thinking to high-school student possible? One teacher's experience. SIGCSE, 2010
[IJPP2010] C.G. Caragea, A. Tzannes, F. Keceli, R. Barua and U. Vishkin. Resource-aware compiler prefetching for many-cores. Intern. J. of Parallel Programming, 2010
[HotPar2010] C.G. Caragea, F. Keceli, A. Tzannes and U. Vishkin. General-purpose vs. GPU: Comparison of many-cores on irregular workloads. HotPar, 2010
[PPoPP2010] A. Tzannes, G.C. Caragea, R. Barua and U. Vishkin. Lazy binary splitting: A run-time adaptive dynamic work-stealing scheduler. PPoPP, 2010.
[DIMACS 2011] From asymptotic PRAM speedups to easy-to-obtain concrete XMT ones, invited talk, DIMACS Workshop on Parallelism: A 20-20 Vision, 2011
[TCAD2010] M.N. Horak, S.M. Nowick, M. Carlberg and U. Vishkin. A low-overhead asynchronous interconnection network for GALS chip miltiprocessor. TCAD p. 494-507, special Issue for NOCS, Apr. 2010
63
![Page 64: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/64.jpg)
Backup Slides
5/20/2011HIPS201164
![Page 65: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/65.jpg)
Speedups
5/20/2011HIPS2011
XMT vs. GTX280 GPU 6x speedup on irregular par.
MaxFlow >100x over serial
Bi-Connectivity Up to 33x over serial
65
![Page 66: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/66.jpg)
Immediate Concurrent Execution
5/20/2011HIPS201166
Figure from CACM11
![Page 67: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/67.jpg)
How do actors communicate? Problem: Order of concurrent events is not deterministic
Ports
Concept similar to HDLs/SystemC
Event priority mechanism for ports
Two phases/priorities: Evaluate and update
How to handle halt signals?
Macro Actor contract: No combinatorial path between inputs
and outputs
5/20/2011HIPS201167
![Page 68: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/68.jpg)
DE Simulation Example
Event List
Actor Actor
Wake me up at T+1 Wake me up at T+2
5/20/2011HIPS201168
Time = T
![Page 69: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/69.jpg)
DE Simulation Example
Event List
Actor Actor
Wake-up!
5/20/2011HIPS201169
Time = T+1
![Page 70: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/70.jpg)
DE Simulation Example
Event List
Actor Actor
Wake-up!
5/20/2011HIPS201170
Time = T+2
![Page 71: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/71.jpg)
DT Example: Pipeline simulation
Initial:
5/20/2011HIPS201171
How to simulate advancing the
pipeline for a single clock cycle?
![Page 72: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/72.jpg)
DT Example: Pipeline simulation
Initial:
5/20/2011HIPS201172
![Page 73: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/73.jpg)
DT Example: Pipeline simulation
Initial:
5/20/2011HIPS201173
![Page 74: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/74.jpg)
DT Example: Pipeline simulation
Initial:
5/20/2011HIPS201174
After 1 simulated
cycle:
![Page 75: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/75.jpg)
DE Example: Pipeline simulation
Initial:
5/20/2011HIPS201175
How to simulate advancing the
pipeline for a single clock cycle?
![Page 76: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/76.jpg)
DE Example: Pipeline simulation
Initial:
Use intermediate
storage:
5/20/2011HIPS201176
![Page 77: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/77.jpg)
DE Example: Pipeline simulation
Initial:
After 1 simulated
cycle:
5/20/2011HIPS201177
![Page 78: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/78.jpg)
XMTC Runtime
5/20/2011HIPS2011
Allocation of TCUs to virtual-threads
Using XMT hardware support (ps, chkid)
Dynamic Memory Allocation
Serial only now. Parallel is left for future work
Parallel Stack Allocation & Nested Parallelism Scheduling
78
![Page 79: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/79.jpg)
XMTC memory model
Problem: Many processors read and write to the memory in parallel.
Question: In what order does processor A see memory operations from processor B? (Is the
order experienced the same by A and B?)
Example: Initially: x=0, y=0;
Thread A Thread B
x:=1 Read y
y:=1 Read x
Can B read (x,y) = (1,0) ?
5/20/2011HIPS201179
![Page 80: Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev ... · Fuat Keceli, AlexandrosTzannes, George C. Caragea, Rajeev Barua and Uzi Vishkin University of Maryland, College Park,](https://reader031.fdocuments.net/reader031/viewer/2022021809/5c65b03a09d3f2916e8d2c4e/html5/thumbnails/80.jpg)
XMTC memory model
Initially: x=0, y=0
Thread A Thread B
x=1; ... = psm(0,y);//Read y
psm(1,y); //atomic{y++} ... = x;// Read x
implicit
memory
barrier
5/20/2011HIPS201180