Parallel numerical algorithms: Course...
Transcript of Parallel numerical algorithms: Course...
![Page 1: Parallel numerical algorithms: Course overviewvuduc.org/teaching/cse8803-pna-sp08/slides/cse8803-pna-sp08-01.pdfParallelism and algorithmic innovation outpace Moore’s Law 1 10 100](https://reader036.fdocuments.net/reader036/viewer/2022070916/5fb6efd3a75c2017975b078d/html5/thumbnails/1.jpg)
Parallel numerical algorithms:Course overview
Prof. Richard Vuduc
Georgia Institute of Technology
CSE/CS 8803 PNA, Spring 2008
[L.01] Tuesday, January 8, 2008
1
![Page 2: Parallel numerical algorithms: Course overviewvuduc.org/teaching/cse8803-pna-sp08/slides/cse8803-pna-sp08-01.pdfParallelism and algorithmic innovation outpace Moore’s Law 1 10 100](https://reader036.fdocuments.net/reader036/viewer/2022070916/5fb6efd3a75c2017975b078d/html5/thumbnails/2.jpg)
Sources for today’s material
CS 267 (Yelick @UCB)
David Keyes
Williams, et al. SC’07 (UCB)
Higham, Accuracy and Stability of Numerical Algorithms
2
![Page 3: Parallel numerical algorithms: Course overviewvuduc.org/teaching/cse8803-pna-sp08/slides/cse8803-pna-sp08-01.pdfParallelism and algorithmic innovation outpace Moore’s Law 1 10 100](https://reader036.fdocuments.net/reader036/viewer/2022070916/5fb6efd3a75c2017975b078d/html5/thumbnails/3.jpg)
Lecture: Patterson’s Law of Attention SpanD. Patterson (UC Berkeley)
Attention span
5 min 10 min 50 min
“In conclusion…”
3
![Page 4: Parallel numerical algorithms: Course overviewvuduc.org/teaching/cse8803-pna-sp08/slides/cse8803-pna-sp08-01.pdfParallelism and algorithmic innovation outpace Moore’s Law 1 10 100](https://reader036.fdocuments.net/reader036/viewer/2022070916/5fb6efd3a75c2017975b078d/html5/thumbnails/4.jpg)
Why study PNA today?
4
![Page 5: Parallel numerical algorithms: Course overviewvuduc.org/teaching/cse8803-pna-sp08/slides/cse8803-pna-sp08-01.pdfParallelism and algorithmic innovation outpace Moore’s Law 1 10 100](https://reader036.fdocuments.net/reader036/viewer/2022070916/5fb6efd3a75c2017975b078d/html5/thumbnails/5.jpg)
Why study PNA today?
Current and future apps are numerical, data-intensive
Parallel hardware widely available
Computing industry betting its future on it! (next lecture)
Need to program for parallelism and locality explicitly
Algorithmic costs changed: “mops” not flops; “accuracy”
“Winners” unclear
Interaction of applications, algorithms, and systems
4
![Page 6: Parallel numerical algorithms: Course overviewvuduc.org/teaching/cse8803-pna-sp08/slides/cse8803-pna-sp08-01.pdfParallelism and algorithmic innovation outpace Moore’s Law 1 10 100](https://reader036.fdocuments.net/reader036/viewer/2022070916/5fb6efd3a75c2017975b078d/html5/thumbnails/6.jpg)
5
![Page 7: Parallel numerical algorithms: Course overviewvuduc.org/teaching/cse8803-pna-sp08/slides/cse8803-pna-sp08-01.pdfParallelism and algorithmic innovation outpace Moore’s Law 1 10 100](https://reader036.fdocuments.net/reader036/viewer/2022070916/5fb6efd3a75c2017975b078d/html5/thumbnails/7.jpg)
5
![Page 8: Parallel numerical algorithms: Course overviewvuduc.org/teaching/cse8803-pna-sp08/slides/cse8803-pna-sp08-01.pdfParallelism and algorithmic innovation outpace Moore’s Law 1 10 100](https://reader036.fdocuments.net/reader036/viewer/2022070916/5fb6efd3a75c2017975b078d/html5/thumbnails/8.jpg)
Application example:Google’s PageRank
6
![Page 9: Parallel numerical algorithms: Course overviewvuduc.org/teaching/cse8803-pna-sp08/slides/cse8803-pna-sp08-01.pdfParallelism and algorithmic innovation outpace Moore’s Law 1 10 100](https://reader036.fdocuments.net/reader036/viewer/2022070916/5fb6efd3a75c2017975b078d/html5/thumbnails/9.jpg)
Google’s web search procedure
Step 1: Rank all web pages, given the web
Update every once in a while
Step 2: Return list of matching pages in order by rank, given query
At “query-time”
7
![Page 10: Parallel numerical algorithms: Course overviewvuduc.org/teaching/cse8803-pna-sp08/slides/cse8803-pna-sp08-01.pdfParallelism and algorithmic innovation outpace Moore’s Law 1 10 100](https://reader036.fdocuments.net/reader036/viewer/2022070916/5fb6efd3a75c2017975b078d/html5/thumbnails/10.jpg)
Google’s web search procedure
Step 1: Rank all web pages, given the web
Update every once in a while
Step 2: Return list of matching pages in order by rank, given query
At “query-time”
Compute ranking using a “random surfer” model
7
![Page 11: Parallel numerical algorithms: Course overviewvuduc.org/teaching/cse8803-pna-sp08/slides/cse8803-pna-sp08-01.pdfParallelism and algorithmic innovation outpace Moore’s Law 1 10 100](https://reader036.fdocuments.net/reader036/viewer/2022070916/5fb6efd3a75c2017975b078d/html5/thumbnails/11.jpg)
Start on page i.A “random surfer” model.
i
8
![Page 12: Parallel numerical algorithms: Course overviewvuduc.org/teaching/cse8803-pna-sp08/slides/cse8803-pna-sp08-01.pdfParallelism and algorithmic innovation outpace Moore’s Law 1 10 100](https://reader036.fdocuments.net/reader036/viewer/2022070916/5fb6efd3a75c2017975b078d/html5/thumbnails/12.jpg)
Jump to j with some probability.A “random surfer” model.
i
j
9
![Page 13: Parallel numerical algorithms: Course overviewvuduc.org/teaching/cse8803-pna-sp08/slides/cse8803-pna-sp08-01.pdfParallelism and algorithmic innovation outpace Moore’s Law 1 10 100](https://reader036.fdocuments.net/reader036/viewer/2022070916/5fb6efd3a75c2017975b078d/html5/thumbnails/13.jpg)
Repeat.A “random surfer” model.
i
j
k
10
![Page 14: Parallel numerical algorithms: Course overviewvuduc.org/teaching/cse8803-pna-sp08/slides/cse8803-pna-sp08-01.pdfParallelism and algorithmic innovation outpace Moore’s Law 1 10 100](https://reader036.fdocuments.net/reader036/viewer/2022070916/5fb6efd3a75c2017975b078d/html5/thumbnails/14.jpg)
Where will the surfer end up?A “random surfer” model.
i
j
k
11
![Page 15: Parallel numerical algorithms: Course overviewvuduc.org/teaching/cse8803-pna-sp08/slides/cse8803-pna-sp08-01.pdfParallelism and algorithmic innovation outpace Moore’s Law 1 10 100](https://reader036.fdocuments.net/reader036/viewer/2022070916/5fb6efd3a75c2017975b078d/html5/thumbnails/15.jpg)
Probability of being on page i at time t.
i
xt(i)
12
![Page 16: Parallel numerical algorithms: Course overviewvuduc.org/teaching/cse8803-pna-sp08/slides/cse8803-pna-sp08-01.pdfParallelism and algorithmic innovation outpace Moore’s Law 1 10 100](https://reader036.fdocuments.net/reader036/viewer/2022070916/5fb6efd3a75c2017975b078d/html5/thumbnails/16.jpg)
Transition probability (=0 if no link).
i
j
W (i, j)
13
![Page 17: Parallel numerical algorithms: Course overviewvuduc.org/teaching/cse8803-pna-sp08/slides/cse8803-pna-sp08-01.pdfParallelism and algorithmic innovation outpace Moore’s Law 1 10 100](https://reader036.fdocuments.net/reader036/viewer/2022070916/5fb6efd3a75c2017975b078d/html5/thumbnails/17.jpg)
Probability of being on page j at time t+1, with transition probability W(i,j).
i
j
i
i
xt+1(j) =n!
i=1
xt(i) · W (i, j)
14
![Page 18: Parallel numerical algorithms: Course overviewvuduc.org/teaching/cse8803-pna-sp08/slides/cse8803-pna-sp08-01.pdfParallelism and algorithmic innovation outpace Moore’s Law 1 10 100](https://reader036.fdocuments.net/reader036/viewer/2022070916/5fb6efd3a75c2017975b078d/html5/thumbnails/18.jpg)
Repeat matrix-vector multiply until convergence.
i
j
i
i
xt+1 = WT · xt
15
![Page 19: Parallel numerical algorithms: Course overviewvuduc.org/teaching/cse8803-pna-sp08/slides/cse8803-pna-sp08-01.pdfParallelism and algorithmic innovation outpace Moore’s Law 1 10 100](https://reader036.fdocuments.net/reader036/viewer/2022070916/5fb6efd3a75c2017975b078d/html5/thumbnails/19.jpg)
“Power method” for computing the principal eigenvector of WT.
i
j
i
i
xt+1 = WT · xt = (WT )2 · xt!1 = · · ·= (WT )t+1 · x0
16
![Page 20: Parallel numerical algorithms: Course overviewvuduc.org/teaching/cse8803-pna-sp08/slides/cse8803-pna-sp08-01.pdfParallelism and algorithmic innovation outpace Moore’s Law 1 10 100](https://reader036.fdocuments.net/reader036/viewer/2022070916/5fb6efd3a75c2017975b078d/html5/thumbnails/20.jpg)
Applications and architectures
17
![Page 21: Parallel numerical algorithms: Course overviewvuduc.org/teaching/cse8803-pna-sp08/slides/cse8803-pna-sp08-01.pdfParallelism and algorithmic innovation outpace Moore’s Law 1 10 100](https://reader036.fdocuments.net/reader036/viewer/2022070916/5fb6efd3a75c2017975b078d/html5/thumbnails/21.jpg)
Units of measurement
“Flop”: Floating-point operation
“Flop/s”: Flops per second
“Bytes”: Size of data (double-precision float is 8 bytes)
Today’s peak speeds
Single processor core ~5 Gflop/s
STI-Cell (8-core) ~15 Gflop/s
NVIDIA Tesla (128-core) ~500 Gflop/s
Note: Achievable speeds often < 10% of peak
18
![Page 22: Parallel numerical algorithms: Course overviewvuduc.org/teaching/cse8803-pna-sp08/slides/cse8803-pna-sp08-01.pdfParallelism and algorithmic innovation outpace Moore’s Law 1 10 100](https://reader036.fdocuments.net/reader036/viewer/2022070916/5fb6efd3a75c2017975b078d/html5/thumbnails/22.jpg)
Algorithmic efficiency rivals architectural advances in scale.If n=64, flops reduced by ~16 M [6 mo. to 1 sec.]; Source: Keyes (2004)
Year Method Reference Storage Flops
1947
1950
1971
1984
GaussianElimination
Von Neumann & Goldstine
n5 n7
Optimal SOR Young n3 n4 log n
Conj. grad. Reid n3 n3.5 log n
Full multigrid Brandt n3 n3
∇2u=f 64
64 64
19
![Page 23: Parallel numerical algorithms: Course overviewvuduc.org/teaching/cse8803-pna-sp08/slides/cse8803-pna-sp08-01.pdfParallelism and algorithmic innovation outpace Moore’s Law 1 10 100](https://reader036.fdocuments.net/reader036/viewer/2022070916/5fb6efd3a75c2017975b078d/html5/thumbnails/23.jpg)
year
relativespeedup
Algorithmic and architectural improvements go hand-in-hand.
20
![Page 24: Parallel numerical algorithms: Course overviewvuduc.org/teaching/cse8803-pna-sp08/slides/cse8803-pna-sp08-01.pdfParallelism and algorithmic innovation outpace Moore’s Law 1 10 100](https://reader036.fdocuments.net/reader036/viewer/2022070916/5fb6efd3a75c2017975b078d/html5/thumbnails/24.jpg)
Parallelism and algorithmic innovation outpace Moore’s Law
1
10
100
1000
10000
100000
1000000
1988 1990 1993 1995 1997 1999 2001 2003 2005 2007
Gflo
p/s
Year
Gordon Bell Winners
Moore’s Law Projection
105x improvement in ~20 years
100x
21
![Page 25: Parallel numerical algorithms: Course overviewvuduc.org/teaching/cse8803-pna-sp08/slides/cse8803-pna-sp08-01.pdfParallelism and algorithmic innovation outpace Moore’s Law 1 10 100](https://reader036.fdocuments.net/reader036/viewer/2022070916/5fb6efd3a75c2017975b078d/html5/thumbnails/25.jpg)
Source: SCaLeS report 2 (2004), via D. Keyes. http://pnl.gov/scales
22
![Page 26: Parallel numerical algorithms: Course overviewvuduc.org/teaching/cse8803-pna-sp08/slides/cse8803-pna-sp08-01.pdfParallelism and algorithmic innovation outpace Moore’s Law 1 10 100](https://reader036.fdocuments.net/reader036/viewer/2022070916/5fb6efd3a75c2017975b078d/html5/thumbnails/26.jpg)
0
1
2
3
4
5
6
7
8
9
10
1980 1990 2000 2010
Calendar Year
Log
Effe
ctiv
e G
igaF
LOPS
High Order
Autocode
ARK integratorcomplex chem Higher
orderAMR
NERSCRS/6000
NERSCSP3
Cray 2
AMR
Low Mach
“Moore’s Law” for Combustion Simulations
Source: SCaLeS report 2 (2004), via D. Keyes. http://pnl.gov/scales
23
![Page 27: Parallel numerical algorithms: Course overviewvuduc.org/teaching/cse8803-pna-sp08/slides/cse8803-pna-sp08-01.pdfParallelism and algorithmic innovation outpace Moore’s Law 1 10 100](https://reader036.fdocuments.net/reader036/viewer/2022070916/5fb6efd3a75c2017975b078d/html5/thumbnails/27.jpg)
Intel, on their hardware strategy:
“We are dedicating all of our future product development to multicore designs. … This is a sea change in computing.” (2005)
24
![Page 28: Parallel numerical algorithms: Course overviewvuduc.org/teaching/cse8803-pna-sp08/slides/cse8803-pna-sp08-01.pdfParallelism and algorithmic innovation outpace Moore’s Law 1 10 100](https://reader036.fdocuments.net/reader036/viewer/2022070916/5fb6efd3a75c2017975b078d/html5/thumbnails/28.jpg)
1
10
100
1000
10000
1978 1980 1982 1984 1986 1988 1990 1992 1994 1996 1998 2000 2002 2004 2006
Per
form
ance
(vs.
VA
X-1
1/78
0)
25%/year
52%/year
??%/year
Uniprocessor performance not keeping up with Moore’s law.
3X
2x every5 years?
Source: Hennessey and Patterson, CA:AQA (4th edition), 2006
25
![Page 29: Parallel numerical algorithms: Course overviewvuduc.org/teaching/cse8803-pna-sp08/slides/cse8803-pna-sp08-01.pdfParallelism and algorithmic innovation outpace Moore’s Law 1 10 100](https://reader036.fdocuments.net/reader036/viewer/2022070916/5fb6efd3a75c2017975b078d/html5/thumbnails/29.jpg)
What’s so hard about parallel programming?
26
![Page 30: Parallel numerical algorithms: Course overviewvuduc.org/teaching/cse8803-pna-sp08/slides/cse8803-pna-sp08-01.pdfParallelism and algorithmic innovation outpace Moore’s Law 1 10 100](https://reader036.fdocuments.net/reader036/viewer/2022070916/5fb6efd3a75c2017975b078d/html5/thumbnails/30.jpg)
On p processors with fraction s serial work:
Speedup(p) =Time(1)Time(p)
! 1s + 1!s
p
! 1s
Finding enough parallelism.Amdahl’s Law: Maximum speedup limited by sequential part.
27
![Page 31: Parallel numerical algorithms: Course overviewvuduc.org/teaching/cse8803-pna-sp08/slides/cse8803-pna-sp08-01.pdfParallelism and algorithmic innovation outpace Moore’s Law 1 10 100](https://reader036.fdocuments.net/reader036/viewer/2022070916/5fb6efd3a75c2017975b078d/html5/thumbnails/31.jpg)
Parallelism incurs overheads.
Examples of overheads:
Cost of starting a thread or process
Cost of communicating shared data
Cost of synchronization
Cost of extra (redundant) computation
Costs may be milliseconds, which is millions of flops
Tradeoff: Granularity of task vs. amount of parallelism
28
![Page 32: Parallel numerical algorithms: Course overviewvuduc.org/teaching/cse8803-pna-sp08/slides/cse8803-pna-sp08-01.pdfParallelism and algorithmic innovation outpace Moore’s Law 1 10 100](https://reader036.fdocuments.net/reader036/viewer/2022070916/5fb6efd3a75c2017975b078d/html5/thumbnails/32.jpg)
on-chipcacheregisters
datapath
control
processor
Secondlevelcache(SRAM)
Mainmemory
(DRAM)
Secondarystorage(Disk)
Tertiarystorage
(Disk/Tape)
TBGBMBKBBSize
10sec10ms100ns10ns1nsCost
Memory hierarchies:Non-uniform memory access cost.Better algorithms and implementations exploit locality.
29
![Page 33: Parallel numerical algorithms: Course overviewvuduc.org/teaching/cse8803-pna-sp08/slides/cse8803-pna-sp08-01.pdfParallelism and algorithmic innovation outpace Moore’s Law 1 10 100](https://reader036.fdocuments.net/reader036/viewer/2022070916/5fb6efd3a75c2017975b078d/html5/thumbnails/33.jpg)
ProcCache
L2 Cache
L3 Cache
Memory
Conventional Storage Hierarchy
ProcCache
L2 Cache
L3 Cache
Memory
ProcCache
L2 Cache
L3 Cache
Memory
potentialinterconnects
NUMA in the parallel setting.Processors should minimize communication (remote data access).
30
![Page 34: Parallel numerical algorithms: Course overviewvuduc.org/teaching/cse8803-pna-sp08/slides/cse8803-pna-sp08-01.pdfParallelism and algorithmic innovation outpace Moore’s Law 1 10 100](https://reader036.fdocuments.net/reader036/viewer/2022070916/5fb6efd3a75c2017975b078d/html5/thumbnails/34.jpg)
+NUMA/Affinity
Naïve Pthreads
Naïve Single Thread
0.0
0.5
1.0
1.5
2.0
2.5
3.0
Dense
Protein
FEM-Sphr
FEM-Cant
Tunnel
FEM-Har
QCD
FEM-Ship
Econom
Epidem
FEM-Accel
Circuit
Webbase LP
Median
GFlop/s
Opteron
DDR2 DRAM
Opteron
DDR2 DRAM
Opteron
DDR2 DRAM
Opteron
DDR2 DRAM
Opteron
DDR2 DRAM
Opteron
DDR2 DRAM
Single Thread
Multiple Threads,One memory controller
Multiple Threads,Both memory controllers
NUMA ⇒ Explicit
data placement.
2x
Sparse matrix-vector multiplyon 2x2-core AMD Opteron.
31
![Page 35: Parallel numerical algorithms: Course overviewvuduc.org/teaching/cse8803-pna-sp08/slides/cse8803-pna-sp08-01.pdfParallelism and algorithmic innovation outpace Moore’s Law 1 10 100](https://reader036.fdocuments.net/reader036/viewer/2022070916/5fb6efd3a75c2017975b078d/html5/thumbnails/35.jpg)
y = AAT xt ! AT x
y ! At
NUMA ⇒ Seek algorithmic reuse.
For large A, “reads” A twice.
32
![Page 36: Parallel numerical algorithms: Course overviewvuduc.org/teaching/cse8803-pna-sp08/slides/cse8803-pna-sp08-01.pdfParallelism and algorithmic innovation outpace Moore’s Law 1 10 100](https://reader036.fdocuments.net/reader036/viewer/2022070916/5fb6efd3a75c2017975b078d/html5/thumbnails/36.jpg)
AAT x = (a1 · · · an)
!
"#aT1...
aTn
$
%& x =n'
i=1
ai
(aT
i x)
Reorganize to improve reuse.
33
![Page 37: Parallel numerical algorithms: Course overviewvuduc.org/teaching/cse8803-pna-sp08/slides/cse8803-pna-sp08-01.pdfParallelism and algorithmic innovation outpace Moore’s Law 1 10 100](https://reader036.fdocuments.net/reader036/viewer/2022070916/5fb6efd3a75c2017975b078d/html5/thumbnails/37.jpg)
Algorithms and implementations must guard against load imbalance.
Load imbalance: Subset of processors becomes idle
Insufficient parallelism
Unequally sized tasks
Sources of imbalance
Local adaptation (adaptive mesh refinement)
Tree-structured computations
Fundamentally unstructured problem
34
![Page 38: Parallel numerical algorithms: Course overviewvuduc.org/teaching/cse8803-pna-sp08/slides/cse8803-pna-sp08-01.pdfParallelism and algorithmic innovation outpace Moore’s Law 1 10 100](https://reader036.fdocuments.net/reader036/viewer/2022070916/5fb6efd3a75c2017975b078d/html5/thumbnails/38.jpg)
x̄ =1n
n!
i=1
xi
!(x) =1
n! 1
n!
i=1
(xi ! x̄)2
Let !̂(x) = computed !(x),and " = machine precision.then:
!̂(x)! !(x)!(x)
" (n + 3)" + O""2
#
Large-scale parallelism may raise numerical accuracy issues.
Finite-ness of precision matters more for larger problems
Ill-conditioning
Round-off
Example: Compute variance of data set using “two-pass” algorithm (right)
In single prec. (ε ~ 10-7), all digits lost when n ~ 10 million
35
![Page 39: Parallel numerical algorithms: Course overviewvuduc.org/teaching/cse8803-pna-sp08/slides/cse8803-pna-sp08-01.pdfParallelism and algorithmic innovation outpace Moore’s Law 1 10 100](https://reader036.fdocuments.net/reader036/viewer/2022070916/5fb6efd3a75c2017975b078d/html5/thumbnails/39.jpg)
Course organization
36
![Page 40: Parallel numerical algorithms: Course overviewvuduc.org/teaching/cse8803-pna-sp08/slides/cse8803-pna-sp08-01.pdfParallelism and algorithmic innovation outpace Moore’s Law 1 10 100](https://reader036.fdocuments.net/reader036/viewer/2022070916/5fb6efd3a75c2017975b078d/html5/thumbnails/40.jpg)
Student make-up of course?
Mostly CS students, so will emphasize algorithm-to-machine mapping more than new-algorithm-development
Work in teams of two and three; be interdisciplinary where possible!
37
![Page 41: Parallel numerical algorithms: Course overviewvuduc.org/teaching/cse8803-pna-sp08/slides/cse8803-pna-sp08-01.pdfParallelism and algorithmic innovation outpace Moore’s Law 1 10 100](https://reader036.fdocuments.net/reader036/viewer/2022070916/5fb6efd3a75c2017975b078d/html5/thumbnails/41.jpg)
Workload and grading
Grading
10% - Class participation and “scribe notes”
24% - Two homework/programming assignments
6% - Attend SIAM PP and write-up what you learned
60% - Course project
All homework during first 8 weeks; last 8 weeks for your project!
Collaboration encouraged, but don’t “cheat.”
No textbook, but will supplement lectures with readings
38
![Page 42: Parallel numerical algorithms: Course overviewvuduc.org/teaching/cse8803-pna-sp08/slides/cse8803-pna-sp08-01.pdfParallelism and algorithmic innovation outpace Moore’s Law 1 10 100](https://reader036.fdocuments.net/reader036/viewer/2022070916/5fb6efd3a75c2017975b078d/html5/thumbnails/42.jpg)
Schedule of topics(approximate)
Week 1: Overview and hardware trends
Week 2: Sources of parallelism & locality; performance modeling
Week 3: Basics of parallel programming [HW 1]
Week 4: Structured grids; dense linear algebra
Week 5-6: Dense and sparse linear algebra
Week 7: FFT ; floating-point issues
Week 8: Single-processor performance tuning [HW 2; project proposals]
Guest lecture by Hyesoon Kim on General-purpose GPU programming
39
![Page 43: Parallel numerical algorithms: Course overviewvuduc.org/teaching/cse8803-pna-sp08/slides/cse8803-pna-sp08-01.pdfParallelism and algorithmic innovation outpace Moore’s Law 1 10 100](https://reader036.fdocuments.net/reader036/viewer/2022070916/5fb6efd3a75c2017975b078d/html5/thumbnails/43.jpg)
Schedule of topics (2)
Week 9: Automatic performance tuning (autotuning)
Week 10: Load balancing; SIAM PP
Week 11-12: Particle methods; graph partitioning [Project checkpoint]
Week 13: PDEs
Week 14: Event-driven methods
Week 15: Volunteer computing ; parallel languages research
Week 16: Project presentations
Project write-up due on “final exam” day
40
![Page 44: Parallel numerical algorithms: Course overviewvuduc.org/teaching/cse8803-pna-sp08/slides/cse8803-pna-sp08-01.pdfParallelism and algorithmic innovation outpace Moore’s Law 1 10 100](https://reader036.fdocuments.net/reader036/viewer/2022070916/5fb6efd3a75c2017975b078d/html5/thumbnails/44.jpg)
Homework #0:Complete course survey
I will post all materials at GT T-Square site for “CSE-8803-PNA”
Go to: http://t-square.gatech.edu
Note: Cross-listed with CS-8803-PNA / CS-4803-PNA, but ignore these
Fill out survey questions under “Poll” tab
When you “submit” the assignment, briefly describe what you hope to get out of this course
Also: Use “Forum” to introduce yourselves
41
![Page 45: Parallel numerical algorithms: Course overviewvuduc.org/teaching/cse8803-pna-sp08/slides/cse8803-pna-sp08-01.pdfParallelism and algorithmic innovation outpace Moore’s Law 1 10 100](https://reader036.fdocuments.net/reader036/viewer/2022070916/5fb6efd3a75c2017975b078d/html5/thumbnails/45.jpg)
“In conclusion…”
42
![Page 46: Parallel numerical algorithms: Course overviewvuduc.org/teaching/cse8803-pna-sp08/slides/cse8803-pna-sp08-01.pdfParallelism and algorithmic innovation outpace Moore’s Law 1 10 100](https://reader036.fdocuments.net/reader036/viewer/2022070916/5fb6efd3a75c2017975b078d/html5/thumbnails/46.jpg)
y1
y2
y3
y4
y5
t1
t2
t3
t4
t5
x1
x2
x3
x4
x5
a11 a11
a12 a12
Need for locality and parallelism drives new algorithms.Example: y = A2*x
43
![Page 47: Parallel numerical algorithms: Course overviewvuduc.org/teaching/cse8803-pna-sp08/slides/cse8803-pna-sp08-01.pdfParallelism and algorithmic innovation outpace Moore’s Law 1 10 100](https://reader036.fdocuments.net/reader036/viewer/2022070916/5fb6efd3a75c2017975b078d/html5/thumbnails/47.jpg)
y1
y2
y3
y4
y5
t1
t2
t3
t4
t5
x1
x2
x3
x4
x5
a11 a11
a12 a12
Need for locality and parallelism drives new algorithms.Example: y = A2*x
44
![Page 48: Parallel numerical algorithms: Course overviewvuduc.org/teaching/cse8803-pna-sp08/slides/cse8803-pna-sp08-01.pdfParallelism and algorithmic innovation outpace Moore’s Law 1 10 100](https://reader036.fdocuments.net/reader036/viewer/2022070916/5fb6efd3a75c2017975b078d/html5/thumbnails/48.jpg)
y1
y2
y3
y4
y5
t1
t2
t3
t4
t5
x1
x2
x3
x4
x5
Need for locality and parallelism drives new algorithms.Example: y = A2*x. A new kernel implies a new algorithm… ?
45
![Page 49: Parallel numerical algorithms: Course overviewvuduc.org/teaching/cse8803-pna-sp08/slides/cse8803-pna-sp08-01.pdfParallelism and algorithmic innovation outpace Moore’s Law 1 10 100](https://reader036.fdocuments.net/reader036/viewer/2022070916/5fb6efd3a75c2017975b078d/html5/thumbnails/49.jpg)
Some questions raised by this example.
This algorithm has locality, but what about parallelism?
What is the relationship between locality and parallelism?
The “inner loop” of a solver based on this kernel is now Ak*x, not just A*x. How does this change the “outer loops?”
46
![Page 50: Parallel numerical algorithms: Course overviewvuduc.org/teaching/cse8803-pna-sp08/slides/cse8803-pna-sp08-01.pdfParallelism and algorithmic innovation outpace Moore’s Law 1 10 100](https://reader036.fdocuments.net/reader036/viewer/2022070916/5fb6efd3a75c2017975b078d/html5/thumbnails/50.jpg)
Technical challenges of PNA
Precision, memory, and processing are finite resources
Curse of dimensionality: 3D space + time eats up Moore’s Law
Hardware changes quickly
Huge amount of relevant domain-knowledge exists: Collaborate!
Parallel programming is hard
47