ECE 1747: Parallel Programming
![Page 1: ECE 1747: Parallel Programming](https://reader034.fdocuments.net/reader034/viewer/2022050805/568158c7550346895dc61122/html5/thumbnails/1.jpg)
ECE 1747: Parallel Programming
Distributed Shared Memory (DSM)
![Page 2: ECE 1747: Parallel Programming](https://reader034.fdocuments.net/reader034/viewer/2022050805/568158c7550346895dc61122/html5/thumbnails/2.jpg)
Multiprocessor (SMP)
[Figure: processors proc1, proc2 and proc3, each holding a cached copy X=0, with X=0 also in memory]
![Page 3: ECE 1747: Parallel Programming](https://reader034.fdocuments.net/reader034/viewer/2022050805/568158c7550346895dc61122/html5/thumbnails/3.jpg)
Consistency Models
• Sequential Consistency
– All processors observe the same order
– Must correspond to some serial order
– Only ordering constraint is that the reads/writes of each processor (e.g. P1) appear in that processor's program order; there are no restrictions on relative ordering between processors.
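The "some serial order" requirement can be checked by brute force on a tiny example. The sketch below uses a hypothetical two-processor litmus test that is not in the slides (P1: X = 1; r1 = Y and P2: Y = 1; r2 = X) and enumerates every interleaving that keeps each processor's operations in program order. The names `outcomes_t` and `enumerate_sc` are made up for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical litmus test (not from the slides):
 *   P1: X = 1; r1 = Y;        P2: Y = 1; r2 = X;
 * A schedule is a 4-bit mask: bit k says which processor issues the
 * k-th operation. Each processor issues its two operations in program
 * order, which is the only constraint sequential consistency imposes. */
typedef struct { bool seen[2][2]; } outcomes_t;   /* seen[r1][r2] */

outcomes_t enumerate_sc(void)
{
    outcomes_t out = { { { false, false }, { false, false } } };

    for (unsigned mask = 0; mask < 16; mask++) {
        /* keep only schedules with exactly two operations per processor */
        int ones = 0;
        for (int k = 0; k < 4; k++)
            ones += (mask >> k) & 1;
        if (ones != 2)
            continue;

        int X = 0, Y = 0, r1 = 0, r2 = 0;
        int pc1 = 0, pc2 = 0;              /* per-processor program counters */
        for (int k = 0; k < 4; k++) {
            if ((mask >> k) & 1) {         /* P2 issues its next operation */
                if (pc2++ == 0) Y = 1; else r2 = X;
            } else {                       /* P1 issues its next operation */
                if (pc1++ == 0) X = 1; else r1 = Y;
            }
        }
        assert(r1 >= 0 && r1 <= 1 && r2 >= 0 && r2 <= 1);
        out.seen[r1][r2] = true;
    }
    return out;
}
```

Under these assumptions the outcome r1 == 0 and r2 == 0 never appears in any serial order, so a sequentially consistent memory must never produce it, while the other three outcomes are all allowed.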
![Page 4: ECE 1747: Parallel Programming](https://reader034.fdocuments.net/reader034/viewer/2022050805/568158c7550346895dc61122/html5/thumbnails/4.jpg)
Common consistency protocols
• Write update
– Multicast update to all replicas
• Write invalidate
– Invalidate cached copies in p2, p3
– Cache miss if p2/p3 access X
  • Valid data from other cache
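To make the trade-off between the two protocols concrete, here is a minimal simulation sketch of one variable X cached by three processors. It is a simplification made up for this illustration (the `sim_*` names, the message accounting, and the assumption that every processor starts with a valid copy are all hypothetical), not a model of any particular machine.

```c
#include <assert.h>
#include <stdbool.h>

#define NPROC 3

typedef enum { WRITE_UPDATE, WRITE_INVALIDATE } protocol_t;

typedef struct {
    protocol_t proto;
    bool valid[NPROC];   /* does processor p hold a usable copy of X? */
    int  value[NPROC];   /* cached value of X at processor p */
    int  owner;          /* processor that wrote X most recently */
    int  messages;       /* update, invalidate and fetch messages */
    int  misses;         /* reads that found an invalid copy */
} sim_t;

void sim_init(sim_t *s, protocol_t proto)
{
    s->proto = proto;
    s->owner = 0;
    s->messages = 0;
    s->misses = 0;
    for (int p = 0; p < NPROC; p++) {
        s->valid[p] = true;      /* assumption: everyone starts with X=0 */
        s->value[p] = 0;
    }
}

void sim_write(sim_t *s, int p, int v)
{
    assert(p >= 0 && p < NPROC);
    s->value[p] = v;
    s->valid[p] = true;
    s->owner = p;
    for (int q = 0; q < NPROC; q++) {
        if (q == p || !s->valid[q])
            continue;            /* nothing to send to an invalid copy */
        if (s->proto == WRITE_UPDATE)
            s->value[q] = v;     /* multicast the new value */
        else
            s->valid[q] = false; /* invalidate the cached copy */
        s->messages++;
    }
}

int sim_read(sim_t *s, int p)
{
    assert(p >= 0 && p < NPROC);
    if (!s->valid[p]) {          /* cache miss: fetch from the owner's cache */
        s->misses++;
        s->messages++;
        s->value[p] = s->value[s->owner];
        s->valid[p] = true;
    }
    return s->value[p];
}
```

With three consecutive writes by one processor followed by one read elsewhere, this model sends an update to both replicas on every write, whereas write invalidate pays only for the first invalidation round plus one miss.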
![Page 5: ECE 1747: Parallel Programming](https://reader034.fdocuments.net/reader034/viewer/2022050805/568158c7550346895dc61122/html5/thumbnails/5.jpg)
Distributed Shared Memory (DSM)
[Figure: nodes proc0/mem0, proc1/mem1, proc2/mem2, ..., procN/memN connected by a network; the per-node memories together are presented as one shared memory]
![Page 6: ECE 1747: Parallel Programming](https://reader034.fdocuments.net/reader034/viewer/2022050805/568158c7550346895dc61122/html5/thumbnails/6.jpg)
DSM programming
• Standard – pthread-like
• Synchronizations
– Barriers
– Locks
– Semaphores
![Page 7: ECE 1747: Parallel Programming](https://reader034.fdocuments.net/reader034/viewer/2022050805/568158c7550346895dc61122/html5/thumbnails/7.jpg)
Sequential SOR
for some number of timesteps/iterations {
    /* interior points only: rows/columns 0 and n are the boundary */
    for (i = 1; i < n; i++)
        for (j = 1; j < n; j++)
            temp[i][j] = 0.25 *
                (grid[i-1][j] + grid[i+1][j] +
                 grid[i][j-1] + grid[i][j+1]);
    for (i = 1; i < n; i++)
        for (j = 1; j < n; j++)
            grid[i][j] = temp[i][j];
}
![Page 8: ECE 1747: Parallel Programming](https://reader034.fdocuments.net/reader034/viewer/2022050805/568158c7550346895dc61122/html5/thumbnails/8.jpg)
Parallel SOR with Barriers (1 of 2)
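void* sor (void* arg)
{
    int slice = (int)(intptr_t)arg;   /* thread index passed through the void* argument; needs <stdint.h> */
    int from = (slice * (n-1))/p + 1; /* first row of this thread's slice */
    int to = ((slice+1) * (n-1))/p + 1; /* one past the last row of the slice */

    for some number of iterations { … }
    return NULL;
}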
![Page 9: ECE 1747: Parallel Programming](https://reader034.fdocuments.net/reader034/viewer/2022050805/568158c7550346895dc61122/html5/thumbnails/9.jpg)
Parallel SOR with Barriers (2 of 2)
for (i=from; i<to; i++)
    for (j=1; j<n; j++)
        temp[i][j] = 0.25 * (grid[i-1][j] + grid[i+1][j] + grid[i][j-1] + grid[i][j+1]);
barrier();
for (i=from; i<to; i++)
    for (j=1; j<n; j++)
        grid[i][j]=temp[i][j];
barrier();
![Page 10: ECE 1747: Parallel Programming](https://reader034.fdocuments.net/reader034/viewer/2022050805/568158c7550346895dc61122/html5/thumbnails/10.jpg)
Sequential Consistency DSM
• As proposed by Li & Hudak, TOCS ’86.
• Use virtual memory to implement sharing.
• Shared memory divided up by virtual memory pages.
• Use an SMP-like coherence protocol.
• Keep pages in one of three states:
– invalid, read-only, read-write
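The three page states can be illustrated with a small state machine. This is a simulation sketch only: a real system of this kind would rely on virtual-memory protection and page-fault handlers, while here `page_read` and `page_write` (hypothetical names) are called explicitly and a counter stands in for faults.

```c
#include <assert.h>

#define NNODES 3

typedef enum { INVALID, READ_ONLY, READ_WRITE } page_state_t;

typedef struct {
    page_state_t state[NNODES];  /* state of this page on each node */
    int faults;                  /* accesses that needed the protocol */
} page_t;

void page_init(page_t *pg, int owner)
{
    assert(owner >= 0 && owner < NNODES);
    for (int q = 0; q < NNODES; q++)
        pg->state[q] = INVALID;
    pg->state[owner] = READ_WRITE;
    pg->faults = 0;
}

/* Read access: a fault only if the local copy is invalid. */
void page_read(page_t *pg, int node)
{
    assert(node >= 0 && node < NNODES);
    if (pg->state[node] != INVALID)
        return;                              /* hit */
    pg->faults++;
    for (int q = 0; q < NNODES; q++)
        if (pg->state[q] == READ_WRITE)
            pg->state[q] = READ_ONLY;        /* writer is downgraded */
    pg->state[node] = READ_ONLY;             /* fetch a read-only copy */
}

/* Write access: a fault unless the node already holds the page read-write. */
void page_write(page_t *pg, int node)
{
    assert(node >= 0 && node < NNODES);
    if (pg->state[node] == READ_WRITE)
        return;                              /* hit */
    pg->faults++;
    for (int q = 0; q < NNODES; q++)
        if (q != node)
            pg->state[q] = INVALID;          /* invalidate all other copies */
    pg->state[node] = READ_WRITE;
}
```

The invariant this sketch maintains is the SMP-like one: either a single node holds the page read-write and all other copies are invalid, or any number of nodes hold it read-only.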
![Page 11: ECE 1747: Parallel Programming](https://reader034.fdocuments.net/reader034/viewer/2022050805/568158c7550346895dc61122/html5/thumbnails/11.jpg)
SC implementation
• Synchronous read/write
– Writes must be propagated before moving on to the next operation
![Page 12: ECE 1747: Parallel Programming](https://reader034.fdocuments.net/reader034/viewer/2022050805/568158c7550346895dc61122/html5/thumbnails/12.jpg)
Read-Write False Sharing
[Figure: variables x and y located on the same page]
![Page 13: ECE 1747: Parallel Programming](https://reader034.fdocuments.net/reader034/viewer/2022050805/568158c7550346895dc61122/html5/thumbnails/13.jpg)
Read-Write False Sharing (Cont.)
w(x)
r(y) r(y) r(x)
w(x) w(x)
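The cost of this pattern under a sequentially consistent, page-based protocol can be estimated with a small fault counter. The sketch assumes two nodes, an invalidate protocol with invalid/read-only/read-write page states, both nodes starting with read-only copies, and one particular interleaving of the operations in the figure; `count_faults` and the trace ordering are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { ST_INVALID, ST_READ_ONLY, ST_READ_WRITE } st_t;
typedef struct { int node; char op; char var; } access_t;  /* op: 'r' or 'w' */

/* Count page faults for the trace, given which page (0 or 1) holds x
 * and which page holds y. */
int count_faults(int page_of_x, int page_of_y)
{
    /* Assumed interleaving: node 0 writes x three times while node 1
     * reads y twice and then reads x. */
    static const access_t trace[] = {
        {0, 'w', 'x'}, {1, 'r', 'y'}, {0, 'w', 'x'},
        {1, 'r', 'y'}, {0, 'w', 'x'}, {1, 'r', 'x'},
    };
    st_t st[2][2];                       /* st[node][page] */
    int faults = 0;

    assert(page_of_x == 0 || page_of_x == 1);
    assert(page_of_y == 0 || page_of_y == 1);
    for (int n = 0; n < 2; n++)
        for (int pg = 0; pg < 2; pg++)
            st[n][pg] = ST_READ_ONLY;    /* assumption: both start with copies */

    for (size_t k = 0; k < sizeof trace / sizeof trace[0]; k++) {
        int me = trace[k].node, other = 1 - me;
        int pg = (trace[k].var == 'x') ? page_of_x : page_of_y;

        if (trace[k].op == 'w') {
            if (st[me][pg] != ST_READ_WRITE) {
                faults++;                        /* write fault */
                st[other][pg] = ST_INVALID;      /* invalidate the other copy */
                st[me][pg] = ST_READ_WRITE;
            }
        } else if (st[me][pg] == ST_INVALID) {
            faults++;                            /* read fault */
            if (st[other][pg] == ST_READ_WRITE)
                st[other][pg] = ST_READ_ONLY;    /* writer is downgraded */
            st[me][pg] = ST_READ_ONLY;
        }
    }
    return faults;
}
```

In this model, placing x and y on the same page makes every access fault, even though the reader only touches x once; placing them on separate pages leaves only the faults caused by true sharing of x.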
![Page 14: ECE 1747: Parallel Programming](https://reader034.fdocuments.net/reader034/viewer/2022050805/568158c7550346895dc61122/html5/thumbnails/14.jpg)
Read-Write False Sharing (Cont.)
w(x)
r(y) r(y) r(x)
synch
w(x) w(x)
![Page 15: ECE 1747: Parallel Programming](https://reader034.fdocuments.net/reader034/viewer/2022050805/568158c7550346895dc61122/html5/thumbnails/15.jpg)
Weak Consistency (WEAKC)
• Data modifications are only propagated at the time of synchronization.
• Works fine if program is properly synchronized through system primitives.
– All programs should be …
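The first bullet can be sketched as follows: writes stay in the writer's local copy and are pushed to the other node only when a synchronization operation runs. The two-node setup, the word-level dirty flags, and the `wc_*` names are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define NWORDS 4

typedef struct {
    int  local[NWORDS];   /* this node's copy of the shared data */
    bool dirty[NWORDS];   /* words modified since the last synch */
} node_t;

void wc_init(node_t *n)
{
    for (int i = 0; i < NWORDS; i++) {
        n->local[i] = 0;
        n->dirty[i] = false;
    }
}

/* A write only touches the local copy; nothing is sent yet. */
void wc_write(node_t *n, int idx, int v)
{
    assert(idx >= 0 && idx < NWORDS);
    n->local[idx] = v;
    n->dirty[idx] = true;
}

int wc_read(const node_t *n, int idx)
{
    assert(idx >= 0 && idx < NWORDS);
    return n->local[idx];
}

/* At a synchronization point, propagate every dirty word to the peer.
 * Returns the number of words sent. */
int wc_synch(node_t *from, node_t *to)
{
    int sent = 0;
    for (int i = 0; i < NWORDS; i++) {
        if (from->dirty[i]) {
            to->local[i] = from->local[i];
            from->dirty[i] = false;
            sent++;
        }
    }
    return sent;
}
```

Three writes to the same word before a synch result in a single propagated word, and a reader that does not synchronize keeps seeing the old value, which is why the program must be properly synchronized.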
![Page 16: ECE 1747: Parallel Programming](https://reader034.fdocuments.net/reader034/viewer/2022050805/568158c7550346895dc61122/html5/thumbnails/16.jpg)
Read-Write False Sharing (Before)
w(x)
r(y) r(y) r(x)
synch
w(x) w(x)
![Page 17: ECE 1747: Parallel Programming](https://reader034.fdocuments.net/reader034/viewer/2022050805/568158c7550346895dc61122/html5/thumbnails/17.jpg)
Read-Write False Sharing (WEAKC)
w(x) w(x)
r(y) r(y) r(x)
synch
![Page 18: ECE 1747: Parallel Programming](https://reader034.fdocuments.net/reader034/viewer/2022050805/568158c7550346895dc61122/html5/thumbnails/18.jpg)
Write-Write False Sharing
[Figure: variables x and y located on the same page]
![Page 19: ECE 1747: Parallel Programming](https://reader034.fdocuments.net/reader034/viewer/2022050805/568158c7550346895dc61122/html5/thumbnails/19.jpg)
Write-Write False Sharing
w(x)
w(y) w(y) r(x)
synch
w(x) w(x)
![Page 20: ECE 1747: Parallel Programming](https://reader034.fdocuments.net/reader034/viewer/2022050805/568158c7550346895dc61122/html5/thumbnails/20.jpg)
Write-Write False Sharing (WEAKC)
w(x)
w(y) w(y) r(x)
synch
w(x) w(x)
![Page 21: ECE 1747: Parallel Programming](https://reader034.fdocuments.net/reader034/viewer/2022050805/568158c7550346895dc61122/html5/thumbnails/21.jpg)
Multiple Writer (MW) Protocols
• Allows multiple writers per page.
• Modifications merged at synchronization (according to the WEAKC definition).
• Modifications are recorded through a mechanism called twinning and diffing.
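Twinning and diffing can be sketched in a few functions. The code below assumes a tiny page of `PAGE_WORDS` integers and a word-by-word comparison; the type and function names (`diff_t`, `make_twin`, `create_diff`, `apply_diff`) are hypothetical and chosen for this example only.

```c
#include <assert.h>
#include <string.h>

#define PAGE_WORDS 8   /* assumed page size, kept tiny for illustration */

typedef struct { int index; int value; } diff_entry_t;
typedef struct { diff_entry_t e[PAGE_WORDS]; int n; } diff_t;

/* Twinning: before the first write to a page, save a pristine copy. */
void make_twin(const int *page, int *twin)
{
    memcpy(twin, page, PAGE_WORDS * sizeof *page);
}

/* Diffing: compare the modified page with its twin and record only
 * the words that changed. */
diff_t create_diff(const int *page, const int *twin)
{
    diff_t d;
    d.n = 0;
    for (int i = 0; i < PAGE_WORDS; i++) {
        if (page[i] != twin[i]) {
            d.e[d.n].index = i;
            d.e[d.n].value = page[i];
            d.n++;
        }
    }
    return d;
}

/* Merging: apply another writer's diff to a copy of the page. */
void apply_diff(int *page, const diff_t *d)
{
    assert(d->n >= 0 && d->n <= PAGE_WORDS);
    for (int k = 0; k < d->n; k++)
        page[d->e[k].index] = d->e[k].value;
}
```

Because each diff holds only the words its writer changed, two nodes that write different variables on the same page (write-write false sharing) can both have their diffs applied at synchronization without overwriting each other.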
![Page 22: ECE 1747: Parallel Programming](https://reader034.fdocuments.net/reader034/viewer/2022050805/568158c7550346895dc61122/html5/thumbnails/22.jpg)
Write-Write False Sharing and MW
w(x)
w(y) w(y) r(x)
synch
w(x) w(x)
![Page 23: ECE 1747: Parallel Programming](https://reader034.fdocuments.net/reader034/viewer/2022050805/568158c7550346895dc61122/html5/thumbnails/23.jpg)
Creating a diff (delta)
w(x) w(x)...
twin Diff (delta)
writable write-protected
write-protected
![Page 24: ECE 1747: Parallel Programming](https://reader034.fdocuments.net/reader034/viewer/2022050805/568158c7550346895dc61122/html5/thumbnails/24.jpg)
Write-Write False Sharing and MW
w(x)
w(y) w(y) r(x)
synch
w(x) w(x)
y yx
xtwin
twin
x
![Page 25: ECE 1747: Parallel Programming](https://reader034.fdocuments.net/reader034/viewer/2022050805/568158c7550346895dc61122/html5/thumbnails/25.jpg)
Release Consistency (RC)
• Distinguish acquires from releases
– Ordinary read/write wait until the previous acquire is performed
– Release waits until previous read/write are performed
– Acquire/release are sequentially consistent w.r.t. one another
![Page 26: ECE 1747: Parallel Programming](https://reader034.fdocuments.net/reader034/viewer/2022050805/568158c7550346895dc61122/html5/thumbnails/26.jpg)
Eager & Lazy Release Consistency
• Eager release consistency: transfer consistency information at release of a lock.
• Lazy release consistency: transfer consistency information at acquire of a lock.
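The difference between the two variants shows up in how many transfers a chain of lock hand-offs needs. The sketch below is a deliberately crude model: it assumes four processors that all cache x, that eager release pushes to every other processor, and that lazy acquire pulls only from the last releaser. The `rc_*` names and the transfer accounting are hypothetical.

```c
#include <assert.h>

#define NPROC 4

typedef enum { EAGER, LAZY } rc_mode_t;

typedef struct {
    rc_mode_t mode;
    int x[NPROC];        /* each processor's local copy of x */
    int last_releaser;   /* -1 until the lock has been released once */
    int transfers;       /* transfers of consistency information */
} rc_sim_t;

void rc_init(rc_sim_t *s, rc_mode_t mode)
{
    s->mode = mode;
    s->last_releaser = -1;
    s->transfers = 0;
    for (int p = 0; p < NPROC; p++)
        s->x[p] = 0;
}

void rc_write(rc_sim_t *s, int p, int v) { s->x[p] = v; }

void rc_release(rc_sim_t *s, int p)
{
    if (s->mode == EAGER) {
        for (int q = 0; q < NPROC; q++) {    /* push to everyone at release */
            if (q == p) continue;
            s->x[q] = s->x[p];
            s->transfers++;
        }
    }
    s->last_releaser = p;
}

void rc_acquire(rc_sim_t *s, int p)
{
    if (s->mode == LAZY && s->last_releaser >= 0 && s->last_releaser != p) {
        s->x[p] = s->x[s->last_releaser];    /* pull only at acquire */
        s->transfers++;
    }
}

/* Lock chain p0 -> p1 -> p2 -> p3, each earlier holder writing x. */
int run_chain(rc_sim_t *s, rc_mode_t mode)
{
    rc_init(s, mode);
    rc_write(s, 0, 1);                   rc_release(s, 0);
    rc_acquire(s, 1); rc_write(s, 1, 2); rc_release(s, 1);
    rc_acquire(s, 2); rc_write(s, 2, 3); rc_release(s, 2);
    rc_acquire(s, 3);                    /* p3 now reads x */
    return s->transfers;
}
```

In this model both variants give the final acquirer the latest value of x, but the lazy variant never sends anything to a processor that does not acquire the lock afterwards.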
![Page 27: ECE 1747: Parallel Programming](https://reader034.fdocuments.net/reader034/viewer/2022050805/568158c7550346895dc61122/html5/thumbnails/27.jpg)
Eager Release Consistency
w(x) rel
acq r(x)
acq w(x) rel
p1
p2
p3
p4
Acq w(x) rel
![Page 28: ECE 1747: Parallel Programming](https://reader034.fdocuments.net/reader034/viewer/2022050805/568158c7550346895dc61122/html5/thumbnails/28.jpg)
Lazy Release Consistency
w(x) rel
acq r(x)
acq w(x) rel
p1
p2
p3
p4
Acq w(x) rel
![Page 29: ECE 1747: Parallel Programming](https://reader034.fdocuments.net/reader034/viewer/2022050805/568158c7550346895dc61122/html5/thumbnails/29.jpg)
Lazy Release Consistency
• Acquiring processor determines which modifications it needs to see.
w(x) rel
acq w(y) rel
p1
p2
p3 acq r(x) r(y)
synch
![Page 30: ECE 1747: Parallel Programming](https://reader034.fdocuments.net/reader034/viewer/2022050805/568158c7550346895dc61122/html5/thumbnails/30.jpg)
Vector Timestamps
w(x) rel
acq w(y) rel
p1
p2
p3 acq r(x) r(y)
000
000
000
100
110
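The timestamps in the figure can be reproduced with a small vector-clock sketch. It assumes three processors, that a processor increments its own entry when it ends an interval at a release, and that an acquirer takes the element-wise maximum with the releaser's vector; `vts_t`, `vts_tick`, `vts_merge` and `vts_missing` are names invented for this example.

```c
#include <assert.h>

#define NPROC 3

typedef struct { int t[NPROC]; } vts_t;

/* Processor p closes an interval (e.g. at a release). */
void vts_tick(vts_t *v, int p)
{
    assert(p >= 0 && p < NPROC);
    v->t[p]++;
}

/* How many intervals of processor q the acquirer has not seen yet,
 * judging by the releaser's timestamp. */
int vts_missing(const vts_t *mine, const vts_t *theirs, int q)
{
    assert(q >= 0 && q < NPROC);
    int gap = theirs->t[q] - mine->t[q];
    return gap > 0 ? gap : 0;
}

/* On acquire: element-wise maximum of the two timestamps. */
void vts_merge(vts_t *mine, const vts_t *theirs)
{
    for (int q = 0; q < NPROC; q++)
        if (theirs->t[q] > mine->t[q])
            mine->t[q] = theirs->t[q];
}
```

Starting from 000 everywhere, p1's release gives 100, p2's acquire followed by its own release gives 110, and p3, still at 000 when it acquires from p2, can tell from the comparison that it is missing one interval from p1 and one from p2.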
![Page 31: ECE 1747: Parallel Programming](https://reader034.fdocuments.net/reader034/viewer/2022050805/568158c7550346895dc61122/html5/thumbnails/31.jpg)
DSM Summary
• Relaxed consistency
– application’s definition of correctness
• Achieves more than 70% of the performance of the corresponding message-passing applications