CUDA Implementation of the Lattice Boltzmann Method · CUDA Implementation of the Lattice Boltzmann...
Transcript of CUDA Implementation of the Lattice Boltzmann Method · CUDA Implementation of the Lattice Boltzmann...
![Page 1: CUDA Implementation of the Lattice Boltzmann Method · CUDA Implementation of the Lattice Boltzmann Method CSE 633 Parallel Algorithms Andrew Leach University at Bu alo 2 Dec 2010](https://reader030.fdocuments.net/reader030/viewer/2022013006/5ad3a0c07f8b9a482c8dfdde/html5/thumbnails/1.jpg)
CUDA Implementation of the Lattice Boltzmann MethodCSE 633 Parallel Algorithms
Andrew Leach
University at Buffalo
2 Dec 2010
A. Leach (University at Buffalo) CUDA LBM Nov 2010 1 / 16
![Page 2: CUDA Implementation of the Lattice Boltzmann Method · CUDA Implementation of the Lattice Boltzmann Method CSE 633 Parallel Algorithms Andrew Leach University at Bu alo 2 Dec 2010](https://reader030.fdocuments.net/reader030/viewer/2022013006/5ad3a0c07f8b9a482c8dfdde/html5/thumbnails/2.jpg)
Motivation
The Lattice Boltzmann Method(LBM) solves the Navier-Stokesequation accurately and efficiently.
Uniformity makes it easy to parallelize.
High volume of simple calculations make it ideal for GPGPUcomputing.
A. Leach (University at Buffalo) CUDA LBM Nov 2010 2 / 16
![Page 3: CUDA Implementation of the Lattice Boltzmann Method · CUDA Implementation of the Lattice Boltzmann Method CSE 633 Parallel Algorithms Andrew Leach University at Bu alo 2 Dec 2010](https://reader030.fdocuments.net/reader030/viewer/2022013006/5ad3a0c07f8b9a482c8dfdde/html5/thumbnails/3.jpg)
Motivation
The Lattice Boltzmann Method(LBM) solves the Navier-Stokesequation accurately and efficiently.
Uniformity makes it easy to parallelize.
High volume of simple calculations make it ideal for GPGPUcomputing.
A. Leach (University at Buffalo) CUDA LBM Nov 2010 2 / 16
![Page 4: CUDA Implementation of the Lattice Boltzmann Method · CUDA Implementation of the Lattice Boltzmann Method CSE 633 Parallel Algorithms Andrew Leach University at Bu alo 2 Dec 2010](https://reader030.fdocuments.net/reader030/viewer/2022013006/5ad3a0c07f8b9a482c8dfdde/html5/thumbnails/4.jpg)
Motivation
The Lattice Boltzmann Method(LBM) solves the Navier-Stokesequation accurately and efficiently.
Uniformity makes it easy to parallelize.
High volume of simple calculations make it ideal for GPGPUcomputing.
A. Leach (University at Buffalo) CUDA LBM Nov 2010 2 / 16
![Page 5: CUDA Implementation of the Lattice Boltzmann Method · CUDA Implementation of the Lattice Boltzmann Method CSE 633 Parallel Algorithms Andrew Leach University at Bu alo 2 Dec 2010](https://reader030.fdocuments.net/reader030/viewer/2022013006/5ad3a0c07f8b9a482c8dfdde/html5/thumbnails/5.jpg)
LBM Degrees of Freedom
F5
F8
F2
F3
F7
F6
F1
F4
Each lattice point has an associated mass density
This mass density is projected in 9 directions
A. Leach (University at Buffalo) CUDA LBM Nov 2010 3 / 16
![Page 6: CUDA Implementation of the Lattice Boltzmann Method · CUDA Implementation of the Lattice Boltzmann Method CSE 633 Parallel Algorithms Andrew Leach University at Bu alo 2 Dec 2010](https://reader030.fdocuments.net/reader030/viewer/2022013006/5ad3a0c07f8b9a482c8dfdde/html5/thumbnails/6.jpg)
LBM Degrees of Freedom
F5
F8
F2
F3
F7
F6
F1
F4
Each lattice point has an associated mass density
This mass density is projected in 9 directions
A. Leach (University at Buffalo) CUDA LBM Nov 2010 3 / 16
![Page 7: CUDA Implementation of the Lattice Boltzmann Method · CUDA Implementation of the Lattice Boltzmann Method CSE 633 Parallel Algorithms Andrew Leach University at Bu alo 2 Dec 2010](https://reader030.fdocuments.net/reader030/viewer/2022013006/5ad3a0c07f8b9a482c8dfdde/html5/thumbnails/7.jpg)
LBM Stream
F8
F1
F5F2
F6
F3
F7F4
At each time step, each neighbor passes mass density
A. Leach (University at Buffalo) CUDA LBM Nov 2010 4 / 16
![Page 8: CUDA Implementation of the Lattice Boltzmann Method · CUDA Implementation of the Lattice Boltzmann Method CSE 633 Parallel Algorithms Andrew Leach University at Bu alo 2 Dec 2010](https://reader030.fdocuments.net/reader030/viewer/2022013006/5ad3a0c07f8b9a482c8dfdde/html5/thumbnails/8.jpg)
LBM Collision
F5
F8
F2
F3
F7
F6
F1
F4
Collision occurs with the accepted mass densities
Equillibrium condition is solved
New projected mass densities are assigned
A. Leach (University at Buffalo) CUDA LBM Nov 2010 5 / 16
![Page 9: CUDA Implementation of the Lattice Boltzmann Method · CUDA Implementation of the Lattice Boltzmann Method CSE 633 Parallel Algorithms Andrew Leach University at Bu alo 2 Dec 2010](https://reader030.fdocuments.net/reader030/viewer/2022013006/5ad3a0c07f8b9a482c8dfdde/html5/thumbnails/9.jpg)
LBM Collision
F5
F8
F2
F3
F7
F6
F1
F4
Collision occurs with the accepted mass densities
Equillibrium condition is solved
New projected mass densities are assigned
A. Leach (University at Buffalo) CUDA LBM Nov 2010 5 / 16
![Page 10: CUDA Implementation of the Lattice Boltzmann Method · CUDA Implementation of the Lattice Boltzmann Method CSE 633 Parallel Algorithms Andrew Leach University at Bu alo 2 Dec 2010](https://reader030.fdocuments.net/reader030/viewer/2022013006/5ad3a0c07f8b9a482c8dfdde/html5/thumbnails/10.jpg)
LBM Collision
F5
F8
F2
F3
F7
F6
F1
F4
Collision occurs with the accepted mass densities
Equillibrium condition is solved
New projected mass densities are assigned
A. Leach (University at Buffalo) CUDA LBM Nov 2010 5 / 16
![Page 11: CUDA Implementation of the Lattice Boltzmann Method · CUDA Implementation of the Lattice Boltzmann Method CSE 633 Parallel Algorithms Andrew Leach University at Bu alo 2 Dec 2010](https://reader030.fdocuments.net/reader030/viewer/2022013006/5ad3a0c07f8b9a482c8dfdde/html5/thumbnails/11.jpg)
LBM Boundary Conditions
Bounceback is implemented at solid boundaries
The inlet has predetermined mass density
The outlet accepts outward flow
A. Leach (University at Buffalo) CUDA LBM Nov 2010 6 / 16
![Page 12: CUDA Implementation of the Lattice Boltzmann Method · CUDA Implementation of the Lattice Boltzmann Method CSE 633 Parallel Algorithms Andrew Leach University at Bu alo 2 Dec 2010](https://reader030.fdocuments.net/reader030/viewer/2022013006/5ad3a0c07f8b9a482c8dfdde/html5/thumbnails/12.jpg)
LBM Boundary Conditions
Bounceback is implemented at solid boundaries
The inlet has predetermined mass density
The outlet accepts outward flow
A. Leach (University at Buffalo) CUDA LBM Nov 2010 6 / 16
![Page 13: CUDA Implementation of the Lattice Boltzmann Method · CUDA Implementation of the Lattice Boltzmann Method CSE 633 Parallel Algorithms Andrew Leach University at Bu alo 2 Dec 2010](https://reader030.fdocuments.net/reader030/viewer/2022013006/5ad3a0c07f8b9a482c8dfdde/html5/thumbnails/13.jpg)
LBM Boundary Conditions
Bounceback is implemented at solid boundaries
The inlet has predetermined mass density
The outlet accepts outward flow
A. Leach (University at Buffalo) CUDA LBM Nov 2010 6 / 16
![Page 14: CUDA Implementation of the Lattice Boltzmann Method · CUDA Implementation of the Lattice Boltzmann Method CSE 633 Parallel Algorithms Andrew Leach University at Bu alo 2 Dec 2010](https://reader030.fdocuments.net/reader030/viewer/2022013006/5ad3a0c07f8b9a482c8dfdde/html5/thumbnails/14.jpg)
Code: Data Structures
F1 HOST
Device
Data initialized as an array on host
Pitch stores the width of a row in memory, determined byCudaMallocPitch()
Memory is allocated on the device linear memory withCudaMallocArray()
Array copied from host to device with CudaMemcpy2D()
A. Leach (University at Buffalo) CUDA LBM Nov 2010 7 / 16
![Page 15: CUDA Implementation of the Lattice Boltzmann Method · CUDA Implementation of the Lattice Boltzmann Method CSE 633 Parallel Algorithms Andrew Leach University at Bu alo 2 Dec 2010](https://reader030.fdocuments.net/reader030/viewer/2022013006/5ad3a0c07f8b9a482c8dfdde/html5/thumbnails/15.jpg)
Code: Data Structures
F1 HOST
Device
Data initialized as an array on host
Pitch stores the width of a row in memory, determined byCudaMallocPitch()
Memory is allocated on the device linear memory withCudaMallocArray()
Array copied from host to device with CudaMemcpy2D()
A. Leach (University at Buffalo) CUDA LBM Nov 2010 7 / 16
![Page 16: CUDA Implementation of the Lattice Boltzmann Method · CUDA Implementation of the Lattice Boltzmann Method CSE 633 Parallel Algorithms Andrew Leach University at Bu alo 2 Dec 2010](https://reader030.fdocuments.net/reader030/viewer/2022013006/5ad3a0c07f8b9a482c8dfdde/html5/thumbnails/16.jpg)
Code: Data Structures
F1 HOST
Device
Data initialized as an array on host
Pitch stores the width of a row in memory, determined byCudaMallocPitch()
Memory is allocated on the device linear memory withCudaMallocArray()
Array copied from host to device with CudaMemcpy2D()
A. Leach (University at Buffalo) CUDA LBM Nov 2010 7 / 16
![Page 17: CUDA Implementation of the Lattice Boltzmann Method · CUDA Implementation of the Lattice Boltzmann Method CSE 633 Parallel Algorithms Andrew Leach University at Bu alo 2 Dec 2010](https://reader030.fdocuments.net/reader030/viewer/2022013006/5ad3a0c07f8b9a482c8dfdde/html5/thumbnails/17.jpg)
Code: Data Structures
F1 HOST
Device
Data initialized as an array on host
Pitch stores the width of a row in memory, determined byCudaMallocPitch()
Memory is allocated on the device linear memory withCudaMallocArray()
Array copied from host to device with CudaMemcpy2D()
A. Leach (University at Buffalo) CUDA LBM Nov 2010 7 / 16
![Page 18: CUDA Implementation of the Lattice Boltzmann Method · CUDA Implementation of the Lattice Boltzmann Method CSE 633 Parallel Algorithms Andrew Leach University at Bu alo 2 Dec 2010](https://reader030.fdocuments.net/reader030/viewer/2022013006/5ad3a0c07f8b9a482c8dfdde/html5/thumbnails/18.jpg)
Code: Textures
The stream step requires a lot of data retrieval
Texture memory has fast retrieval but limited space
Use cudaBindTextureToArray() to copy data as a texture
A. Leach (University at Buffalo) CUDA LBM Nov 2010 8 / 16
![Page 19: CUDA Implementation of the Lattice Boltzmann Method · CUDA Implementation of the Lattice Boltzmann Method CSE 633 Parallel Algorithms Andrew Leach University at Bu alo 2 Dec 2010](https://reader030.fdocuments.net/reader030/viewer/2022013006/5ad3a0c07f8b9a482c8dfdde/html5/thumbnails/19.jpg)
Code: Textures
The stream step requires a lot of data retrieval
Texture memory has fast retrieval but limited space
Use cudaBindTextureToArray() to copy data as a texture
A. Leach (University at Buffalo) CUDA LBM Nov 2010 8 / 16
![Page 20: CUDA Implementation of the Lattice Boltzmann Method · CUDA Implementation of the Lattice Boltzmann Method CSE 633 Parallel Algorithms Andrew Leach University at Bu alo 2 Dec 2010](https://reader030.fdocuments.net/reader030/viewer/2022013006/5ad3a0c07f8b9a482c8dfdde/html5/thumbnails/20.jpg)
Code: Textures
The stream step requires a lot of data retrieval
Texture memory has fast retrieval but limited space
Use cudaBindTextureToArray() to copy data as a texture
A. Leach (University at Buffalo) CUDA LBM Nov 2010 8 / 16
![Page 21: CUDA Implementation of the Lattice Boltzmann Method · CUDA Implementation of the Lattice Boltzmann Method CSE 633 Parallel Algorithms Andrew Leach University at Bu alo 2 Dec 2010](https://reader030.fdocuments.net/reader030/viewer/2022013006/5ad3a0c07f8b9a482c8dfdde/html5/thumbnails/21.jpg)
Code: Kernels
A kernel is launched on a grid of blocks
Each block consists of threads which will independently run thekernel(SIMD)
What follows is the Kernel for the stream() method. This exampleutilizes a lock-step texture look up.
A. Leach (University at Buffalo) CUDA LBM Nov 2010 9 / 16
![Page 22: CUDA Implementation of the Lattice Boltzmann Method · CUDA Implementation of the Lattice Boltzmann Method CSE 633 Parallel Algorithms Andrew Leach University at Bu alo 2 Dec 2010](https://reader030.fdocuments.net/reader030/viewer/2022013006/5ad3a0c07f8b9a482c8dfdde/html5/thumbnails/22.jpg)
Code: Kernels
A kernel is launched on a grid of blocks
Each block consists of threads which will independently run thekernel(SIMD)
What follows is the Kernel for the stream() method. This exampleutilizes a lock-step texture look up.
A. Leach (University at Buffalo) CUDA LBM Nov 2010 9 / 16
![Page 23: CUDA Implementation of the Lattice Boltzmann Method · CUDA Implementation of the Lattice Boltzmann Method CSE 633 Parallel Algorithms Andrew Leach University at Bu alo 2 Dec 2010](https://reader030.fdocuments.net/reader030/viewer/2022013006/5ad3a0c07f8b9a482c8dfdde/html5/thumbnails/23.jpg)
Code: Kernels
A kernel is launched on a grid of blocks
Each block consists of threads which will independently run thekernel(SIMD)
What follows is the Kernel for the stream() method. This exampleutilizes a lock-step texture look up.
A. Leach (University at Buffalo) CUDA LBM Nov 2010 9 / 16
![Page 24: CUDA Implementation of the Lattice Boltzmann Method · CUDA Implementation of the Lattice Boltzmann Method CSE 633 Parallel Algorithms Andrew Leach University at Bu alo 2 Dec 2010](https://reader030.fdocuments.net/reader030/viewer/2022013006/5ad3a0c07f8b9a482c8dfdde/html5/thumbnails/24.jpg)
Code: Stream
A. Leach (University at Buffalo) CUDA LBM Nov 2010 10 / 16
![Page 25: CUDA Implementation of the Lattice Boltzmann Method · CUDA Implementation of the Lattice Boltzmann Method CSE 633 Parallel Algorithms Andrew Leach University at Bu alo 2 Dec 2010](https://reader030.fdocuments.net/reader030/viewer/2022013006/5ad3a0c07f8b9a482c8dfdde/html5/thumbnails/25.jpg)
Runtime Analysis
The following slides contain graphs comparing run times for the LBM on alaptop with 1.3 GHZ processor running sequential C code and a singleTesla GPU running parallel code in CUDA. The change in performancebased on block size is also explored.
A. Leach (University at Buffalo) CUDA LBM Nov 2010 11 / 16
![Page 26: CUDA Implementation of the Lattice Boltzmann Method · CUDA Implementation of the Lattice Boltzmann Method CSE 633 Parallel Algorithms Andrew Leach University at Bu alo 2 Dec 2010](https://reader030.fdocuments.net/reader030/viewer/2022013006/5ad3a0c07f8b9a482c8dfdde/html5/thumbnails/26.jpg)
Sequential vs Parallel
A. Leach (University at Buffalo) CUDA LBM Nov 2010 12 / 16
![Page 27: CUDA Implementation of the Lattice Boltzmann Method · CUDA Implementation of the Lattice Boltzmann Method CSE 633 Parallel Algorithms Andrew Leach University at Bu alo 2 Dec 2010](https://reader030.fdocuments.net/reader030/viewer/2022013006/5ad3a0c07f8b9a482c8dfdde/html5/thumbnails/27.jpg)
Sequential vs Parallel
A. Leach (University at Buffalo) CUDA LBM Nov 2010 13 / 16
![Page 28: CUDA Implementation of the Lattice Boltzmann Method · CUDA Implementation of the Lattice Boltzmann Method CSE 633 Parallel Algorithms Andrew Leach University at Bu alo 2 Dec 2010](https://reader030.fdocuments.net/reader030/viewer/2022013006/5ad3a0c07f8b9a482c8dfdde/html5/thumbnails/28.jpg)
Sequential vs Parallel
A. Leach (University at Buffalo) CUDA LBM Nov 2010 14 / 16
![Page 29: CUDA Implementation of the Lattice Boltzmann Method · CUDA Implementation of the Lattice Boltzmann Method CSE 633 Parallel Algorithms Andrew Leach University at Bu alo 2 Dec 2010](https://reader030.fdocuments.net/reader030/viewer/2022013006/5ad3a0c07f8b9a482c8dfdde/html5/thumbnails/29.jpg)
Thank You
Thanks to Dr.Graham Pullan from Cambridge University for letting me useand modify his code.
A. Leach (University at Buffalo) CUDA LBM Nov 2010 15 / 16
![Page 30: CUDA Implementation of the Lattice Boltzmann Method · CUDA Implementation of the Lattice Boltzmann Method CSE 633 Parallel Algorithms Andrew Leach University at Bu alo 2 Dec 2010](https://reader030.fdocuments.net/reader030/viewer/2022013006/5ad3a0c07f8b9a482c8dfdde/html5/thumbnails/30.jpg)
Bibliography
Alexander Wagner, A Practical Introduction to the Lattice BoltzmannMethod. North Dakota State University, March 2008.
Graham Pullan, A 2D Lattice Boltzmann Flow Solver Demo.http://www.many-core.group.cam.ac.uk/projects/LBdemo.shtml,University of Cambridge.
A. Leach (University at Buffalo) CUDA LBM Nov 2010 16 / 16