CS 179: GPU Programming
Lab 7 Recitation: The MPI/CUDA Wave Equation Solver

Transcript

Page 1: CS 179: GPU Programming

CS 179: GPU Programming
Lab 7 Recitation: The MPI/CUDA Wave Equation Solver

Page 2: CS 179: GPU Programming

MPI/CUDA – Wave Equation Big idea: Divide our data array between n processes!
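As a rough sketch of that division (the variable names and the global grid size here are illustrative, not taken from the lab code), each process can work out its share of the grid from its rank and the process count:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's index            */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes (n)   */

    const int totalPoints = 1000000;        /* global 1-D grid size (example)  */
    int base = totalPoints / size;          /* points every process gets       */
    int rem  = totalPoints % size;          /* leftover points to spread around */
    int localPoints = base + (rank < rem ? 1 : 0);   /* this process's chunk   */

    printf("Rank %d of %d owns %d grid points\n", rank, size, localPoints);

    /* ... allocate y_old / y_curr / y_new of length localPoints (plus boundary cells) ... */

    MPI_Finalize();
    return 0;
}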

Page 3: CS 179: GPU Programming

MPI/CUDA – Wave Equation Problem if we’re at the boundary of a process!

$y_{x,t+1} = 2y_{x,t} - y_{x,t-1} + \left(\frac{c\,\Delta t}{\Delta x}\right)^2 \left(y_{x+1,t} - 2y_{x,t} + y_{x-1,t}\right)$

[Figure: the finite-difference stencil on the space-time grid, with x along the horizontal axis and rows t-1, t, t+1]

Where do we get y_{x+1,t} (or y_{x-1,t})? (It's outside our process!)

Page 4: CS 179: GPU Programming

Wave Equation – Simple Solution After every time-step, each process gives its leftmost and rightmost piece of “current” data to neighbor processes!

[Figure: the data array divided among Proc0 through Proc4]

Page 5: CS 179: GPU Programming

Wave Equation – Simple Solution

Pieces of data to communicate:

[Figure: the pieces of boundary data Proc0 through Proc4 exchange with their neighbors]

Page 6: CS 179: GPU Programming

Wave Equation – Simple Solution Can do this with MPI_Irecv, MPI_Isend, MPI_Wait:

Suppose our process has rank r:

  If we're not the rightmost process:
    Send data to process r+1
    Receive data from process r+1

  If we're not the leftmost process:
    Send data to process r-1
    Receive data from process r-1

  Wait on requests
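A minimal sketch of that exchange for one array (assuming each process stores localN interior values in y_curr[1..localN] with one extra halo cell at each end; the layout and names are assumptions, and MPI_Waitall is just a convenience over calling MPI_Wait on each request):

#include <mpi.h>

/* Sketch: exchange one boundary value with each neighbor using nonblocking
 * sends/receives, then wait on all outstanding requests.
 * Layout assumption: y_curr = [left halo | localN interior values | right halo]. */
void exchange_boundaries(float *y_curr, int localN, int rank, int size) {
    MPI_Request reqs[4];
    int nreqs = 0;

    if (rank != size - 1) {   /* not the rightmost process */
        MPI_Isend(&y_curr[localN],     1, MPI_FLOAT, rank + 1, 0,
                  MPI_COMM_WORLD, &reqs[nreqs++]);   /* send our rightmost value */
        MPI_Irecv(&y_curr[localN + 1], 1, MPI_FLOAT, rank + 1, 0,
                  MPI_COMM_WORLD, &reqs[nreqs++]);   /* receive right halo cell  */
    }
    if (rank != 0) {          /* not the leftmost process */
        MPI_Isend(&y_curr[1], 1, MPI_FLOAT, rank - 1, 0,
                  MPI_COMM_WORLD, &reqs[nreqs++]);   /* send our leftmost value  */
        MPI_Irecv(&y_curr[0], 1, MPI_FLOAT, rank - 1, 0,
                  MPI_COMM_WORLD, &reqs[nreqs++]);   /* receive left halo cell   */
    }

    MPI_Waitall(nreqs, reqs, MPI_STATUSES_IGNORE);   /* wait on all requests */
}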

Page 7: CS 179: GPU Programming

Wave Equation – Simple Solution Boundary conditions:

Use MPI_Comm_rank and MPI_Comm_size:
  The rank 0 process will set the leftmost condition
  The rank (size-1) process will set the rightmost condition
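For instance (a sketch reusing the one-cell-halo layout above; the specific boundary values, like a driven left end and a fixed right end, are placeholders rather than the lab's actual conditions):

#include <mpi.h>

/* Sketch: only the end processes touch the physical boundaries of the domain. */
void apply_boundary_conditions(float *y_curr, int localN, float left_value) {
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0)
        y_curr[1] = left_value;     /* leftmost physical point (e.g. driven end) */
    if (rank == size - 1)
        y_curr[localN] = 0.0f;      /* rightmost physical point (e.g. fixed end) */
}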

Page 8: CS 179: GPU Programming

Simple Solution – Problems Communication can be expensive!

Expensive to communicate every timestep to send 1 value!

Better solution: Send some m values every m timesteps!

Page 9: CS 179: GPU Programming

Possible Implementation Initial setup: (Assume 3 processes)

[Figure: the initial data arrays for Proc0, Proc1, and Proc2]

Page 10: CS 179: GPU Programming

Possible Implementation Give each array “redundant regions” (Assume communication interval = 3)

[Figure: Proc0, Proc1, and Proc2's arrays extended with redundant regions on each side]
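One way to lay this out in memory (a sketch under the assumption that each redundant region is m cells wide, where m is the communication interval; all names are illustrative):

#include <stdlib.h>

/* Sketch: allocate one time level with m redundant cells on each side.
 * Real data lives at indices [m, m + localN); the rest is redundant region. */
float *alloc_padded(int localN, int m, int *paddedN) {
    *paddedN = localN + 2 * m;                        /* interior + both redundant regions */
    return (float *)calloc(*paddedN, sizeof(float));  /* zero-initialized                  */
}

/* Usage: with m = 3 (the interval assumed in the slides), allocate all three time levels:
 *   int paddedN;
 *   float *y_old  = alloc_padded(localN, 3, &paddedN);
 *   float *y_curr = alloc_padded(localN, 3, &paddedN);
 *   float *y_new  = alloc_padded(localN, 3, &paddedN);
 */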

Page 11: CS 179: GPU Programming

Possible Implementation Every (3) timesteps, send some of your data to neighbor processes!

Page 12: CS 179: GPU Programming

Possible Implementation Send “current” data (current at time of communication)

[Figure: Proc0, Proc1, and Proc2 sending “current” boundary data to their neighbors]

Page 13: CS 179: GPU Programming

Possible Implementation Then send “old” data

[Figure: Proc0, Proc1, and Proc2 sending “old” boundary data to their neighbors]
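Putting slides 11 through 13 together, the time loop might be structured roughly like this (exchange_region, set_boundary_conditions, and launch_wave_kernel are hypothetical helpers standing in for the real lab code):

/* Hypothetical helpers (stand-ins for the real lab code): */
void exchange_region(float *y, int localN, int m, int rank, int size);
void set_boundary_conditions(float *y_curr, float *y_old, int localN, int m, int rank, int size);
void launch_wave_kernel(float *y_new, const float *y_curr, const float *y_old,
                        int paddedN, float courant2);

/* Sketch of the outer time loop: every m steps, refresh the redundant regions by
 * exchanging "current" data and then "old" data with each neighbor, and re-apply
 * the boundary conditions; in between, just keep stepping. */
void run_simulation(float *y_old, float *y_curr, float *y_new, int localN, int m,
                    int paddedN, int numSteps, float courant2, int rank, int size) {
    for (int step = 0; step < numSteps; step++) {
        if (step % m == 0) {
            exchange_region(y_curr, localN, m, rank, size);   /* "current" data first */
            exchange_region(y_old,  localN, m, rank, size);   /* then "old" data      */
            set_boundary_conditions(y_curr, y_old, localN, m, rank, size);
        }

        /* One finite-difference update over the whole padded array (CUDA kernel). */
        launch_wave_kernel(y_new, y_curr, y_old, paddedN, courant2);

        /* Rotate time levels: current -> old, new -> current, old's buffer is reused for new. */
        float *tmp = y_old;
        y_old  = y_curr;
        y_curr = y_new;
        y_new  = tmp;
    }
}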

Page 14: CS 179: GPU Programming

Then… Do our calculation as normal, if we're not at the ends of our array (meaning our entire array, including redundancies!)

$y_{x,t+1} = 2y_{x,t} - y_{x,t-1} + \left(\frac{c\,\Delta t}{\Delta x}\right)^2 \left(y_{x+1,t} - 2y_{x,t} + y_{x-1,t}\right)$
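As a sketch of that update written as a CUDA kernel (courant2 stands for (c Δt/Δx)²; the array layout and names are assumptions, not the lab's actual code):

/* Sketch: one finite-difference step over the entire padded array, skipping only
 * the first and last cells, which are missing a neighbor on one side.
 * courant2 = (c * dt / dx)^2. */
__global__ void waveStepKernel(float *y_new, const float *y_curr,
                               const float *y_old, int paddedN, float courant2) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    if (x > 0 && x < paddedN - 1) {
        y_new[x] = 2.0f * y_curr[x] - y_old[x]
                 + courant2 * (y_curr[x + 1] - 2.0f * y_curr[x] + y_curr[x - 1]);
    }
}

/* Example launch:
 *   waveStepKernel<<<(paddedN + 255) / 256, 256>>>(d_new, d_curr, d_old, paddedN, courant2);
 */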

Page 15: CS 179: GPU Programming

What about corruption? Suppose we've just copied our data… (assume a non-boundary process)

. = valid, ? = garbage, ~ = doesn't matter

(Recall that there exist only 3 spaces – gray areas are nonexistent in our current timestep)

Page 16: CS 179: GPU Programming

What about corruption? Calculate new data…

Value unknown!

Page 17: CS 179: GPU Programming

What about corruption? Time t+1:

Current -> old, new -> current (and space for old is overwritten by new…)

Page 18: CS 179: GPU Programming

What about corruption? More garbage data!

“Garbage in, garbage out!”

Page 19: CS 179: GPU Programming

What about corruption? Time t+2…

Page 20: CS 179: GPU Programming

What about corruption? Even more garbage!

Page 21: CS 179: GPU Programming

What about corruption? Time t+3…

Core data region - corruption imminent!?

Page 22: CS 179: GPU Programming

What about corruption? Saved!

Data exchange occurs once the communication interval has passed!

Page 23: CS 179: GPU Programming

“It’s okay to play with garbage… just don’t get sick”

Page 24: CS 179: GPU Programming

Boundary Conditions Applied only at the leftmost and rightmost process!

Page 25: CS 179: GPU Programming

Boundary corruption? Examine the leftmost process:

There's no left neighbor to copy from, so its left redundant region is garbage!

(B = boundary condition set)

Page 26: CS 179: GPU Programming

Boundary corruption? Calculation brings garbage into non-redundant region!

Page 27: CS 179: GPU Programming

Boundary corruption? …but boundary condition is set at every interval!

Page 28: CS 179: GPU Programming

Other details To run programs with MPI, use the “mpirun” command, e.g. mpirun -np (number of processes) (your program and arguments)

CMS machines: Add this to your .bashrc file: alias mpirun=/cs/courses/cs179/openmpi-1.6.4/bin/mpirun

Page 29: CS 179: GPU Programming

Common bugs (and likely causes) Lock-up (it seems like nothing’s happening):

Often an MPI issue – locks up on MPI_Wait because some request wasn’t fulfilled

Check that all sends have corresponding receives

Your wave looks weird:
  Likely cause 1: Garbage data is being passed between processes
  Likely cause 2: Redundant regions aren't being refreshed and/or are contaminating non-redundant regions

Page 30: CS 179: GPU Programming

Common bugs (and likely causes)

Your wave is flat-zero:
  Left boundary condition isn't being initialized and/or isn't propagating
  Same causes as on the previous slide

Page 31: CS 179: GPU Programming

Common bugs (and likely causes) General debugging tips:

  Run MPI with the number of processes set to 1 or 2
  Set the kernel to write a constant value (sketched below)
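For that last tip, a throwaway kernel like this one (purely illustrative) helps separate host/MPI problems from GPU problems: if the constant never shows up in your output, the bug is on the MPI/host side rather than in the kernel.

/* Debugging sketch: overwrite the output with a recognizable constant so you can
 * tell whether kernel results are actually reaching the rest of the pipeline. */
__global__ void debugConstantKernel(float *y_new, int paddedN) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    if (x < paddedN)
        y_new[x] = 1.0f;   /* any easily recognizable value works */
}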