Mesh-free numerical methods for large-scale engineering problems.
Basics of GPU-based computing. D.Stevens, M.Lees, P. Orsini
Supervisors: Prof. Henry Power, Herve Morvan
January 2008
Outline:
- Meshless numerical methods using Radial Basis Functions (RBFs)
  - Basic RBF interpolation
  - Brief overview of the work done in our research group, under the direction of Prof. Henry Power
  - Example results
  - Large-scale future problems
- GPU computing
  - Introduction to the principle of using graphics processors (GPUs) as floating-point co-processors
  - Current state of GPU hardware and software
  - Basic implementation strategy for numerical simulations
GABARDINE - EU project

GABARDINE: Groundwater Artificial recharge Based on Alternative sources of wateR: aDvanced INtegrated technologies and managEment

Aims to investigate the viability of artificial recharge in groundwater aquifers, and to produce decision-support mechanisms.

Partners include: Portugal, Germany, Greece, Spain, Belgium, Israel, Palestine

The University of Nottingham is developing numerical methods to handle:
- Phreatic aquifers (with associated moving surfaces)
- Transport processes
- Unsaturated-zone problems (with nonlinear governing equations)
RBF Interpolation Methods

The interpolant is a weighted sum of radial basis functions centred at the data points, plus a polynomial term:

u(x) = Σ_{j=1..N} λ_j Φ(‖x − x_j‖) + P_m(x)

Apply the above at each of the N test locations:

u(x_i) = f_i,  i = 1, ..., N

giving N equations for the N unknown weights λ_j:

Σ_{j=1..N} λ_j Φ(‖x_i − x_j‖) = f_i

Common basis functions include the multiquadric, Φ(r) = (r² + c²)^(1/2), and the thin-plate spline, Φ(r) = r^(2m) ln(r).

[Figure: basis-function profiles plotted over −1 ≤ x ≤ 1]
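As a concrete illustration of basic RBF interpolation (a minimal 1D sketch, not code from the talk), the following builds the dense N × N collocation matrix with the multiquadric basis Φ(r) = √(r² + c²), solves for the weights λ_j, and reconstructs a continuous solution at new points; the sample data and shape parameter c = 0.5 are illustrative assumptions:

```python
import numpy as np

def multiquadric(r, c):
    # Phi(r) = sqrt(r^2 + c^2); c is the shape parameter
    return np.sqrt(r**2 + c**2)

def rbf_fit(x, f, c):
    # dense N x N collocation system: sum_j lambda_j Phi(|x_i - x_j|) = f_i
    A = multiquadric(np.abs(x[:, None] - x[None, :]), c)
    return np.linalg.solve(A, f)

def rbf_eval(xe, x, lam, c):
    # continuous reconstruction at arbitrary evaluation points xe
    return multiquadric(np.abs(xe[:, None] - x[None, :]), c) @ lam

x = np.linspace(-1.0, 1.0, 25)
f = np.exp(-4.0 * x**2)                  # illustrative sample data
lam = rbf_fit(x, f, c=0.5)
xe = np.linspace(-1.0, 1.0, 101)
err = np.max(np.abs(rbf_eval(xe, x, lam, c=0.5) - np.exp(-4.0 * xe**2)))
```

Note that the fitting step solves a densely populated system, which is exactly the O(N²) cost discussed later.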
Kansa’s Method

u(x) = Σ_{j=1..N} λ_j Φ(‖x − x_j‖)

The governing PDE and boundary conditions are

L[u(x)] = f(x) in Ω
B[u(x)] = g(x) on ∂Ω

e.g. L = ∇² (diffusion operator).

Collocating: the operators are applied to the expansion at the data points, giving rows L[Φ(‖x_i − x_j‖)] at interior points and B[Φ(‖x_i − x_j‖)] at boundary points:

[ L[Φ_ij] ] λ = [ f(x_i) ]
[ B[Φ_ij] ]     [ g(x_i) ]
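Kansa collocation can be sketched in 1D for a hypothetical Poisson problem u″ = f with Dirichlet boundaries and exact solution u = sin(πx) (an illustrative example, not one of the talk's test cases): interior rows collocate L[Φ] with L = d²/dx², boundary rows collocate Φ itself, and the resulting dense unsymmetric system is solved for the weights:

```python
import numpy as np

def phi(s, c):
    # multiquadric basis as a function of signed distance s = x - x_j
    return np.sqrt(s**2 + c**2)

def d2phi(s, c):
    # second derivative of the multiquadric: L[phi] for L = d^2/dx^2
    return c**2 / (s**2 + c**2)**1.5

N, c = 21, 0.2                         # point count and shape parameter (assumed)
x = np.linspace(0.0, 1.0, N)
S = x[:, None] - x[None, :]            # pairwise signed distances

A = np.empty((N, N))
b = np.empty(N)
A[1:-1, :] = d2phi(S[1:-1, :], c)              # interior rows: L[phi]
b[1:-1] = -np.pi**2 * np.sin(np.pi * x[1:-1])  # f(x) so that u = sin(pi x)
A[[0, -1], :] = phi(S[[0, -1], :], c)          # boundary rows: Dirichlet B[phi]
b[[0, -1]] = 0.0

lam = np.linalg.solve(A, b)            # dense, unsymmetric system
u = phi(S, c) @ lam                    # reconstruct the solution at the nodes
err = np.max(np.abs(u - np.sin(np.pi * x)))
```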
Hermitian Method

u(x) = Σ_{j=1..NB} λ_j B[Φ(‖x − x_j‖)] + Σ_{j=NB+1..N} λ_j L[Φ(‖x − x_j‖)]

L[u(x)] = f(x) in Ω
B[u(x)] = g(x) on ∂Ω

Operators are also applied to the basis functions: collocating yields the block system

[ B[B[Φ_ij]]  B[L[Φ_ij]] ] λ = [ g(x_i) ]
[ L[B[Φ_ij]]  L[L[Φ_ij]] ]     [ f(x_i) ]

SYMMETRIC SYSTEM
RBF – Features and Drawbacks

- Partial derivatives are obtained cheaply and accurately by differentiating the (known) basis functions
- This leads to a highly flexible formulation, allowing boundary operators to be implemented exactly and directly
- Once the solution weights are obtained, a continuous solution can be reconstructed over the entire solution domain
- A densely populated linear system must be solved to obtain the solution weights
- This leads to high computational cost – O(N²) – and numerical conditioning issues with large systems, setting a practical limit of ~1000 points
Formulation: LHI method

u(x) = Σ_{j=1..NB} λ_j B[Φ(‖x − x_j‖)] + Σ_{j=NB+1..N} λ_j Φ(‖x − x_j‖) + Σ_{j=1..m} λ_{N+j} P_j(x)

u(x) = f(x) in Ω
B[u(x)] = g(x) on ∂Ω

Hermitian interpolation using solution values, and boundary operator(s) if present.

Initial value problem:

∂u/∂t = Lu
LHI method formulation cont…

Form N systems, based on a local support:

H^(k) λ^(k) = d^(k)

where H is the Hermitian interpolation matrix over the local support, with Φ, B[Φ] and polynomial blocks and a zero block enforcing the polynomial constraints:

H = [ Φ      B[Φ]      P    ]
    [ B[Φ]   B[B[Φ]]   B[P] ]
    [ Pᵀ     B[P]ᵀ     0    ]

and the data vector collects solution values at the support points, boundary data, and zeros:

d = ( u_i, g(x_i), 0 )ᵀ

Hence, the solution can be reconstructed in the vicinity of local system k via:

u^(k)(x) = R^(k)(x) λ^(k)

where R(x) = ( Φ(‖x − x_i‖), B[Φ(‖x − x_i‖)], p_1(x), …, p_m(x) )

H ~ 10 × 10 matrix
CV-RBF (Modified CV Scheme)

Classic CV approach:
- Polynomial interpolation to compute the flux

CV-RBF approach:
- Apply the internal operator: L[u(x)] = f(x) in Ω
- Apply the Dirichlet operator
- Apply the boundary operator: B[u(x)] = g(x) on ∂Ω
- RBF interpolation to compute the flux
Simulation Workflow

Dataset Generation → Pre-Processing → Simulation → Post-Processing

- Dataset Generation: CAD
- Pre-Processing: GridGenRBF (our code, meshless), Triangle
- Simulation: CVRBF (our code; meshless, RBF-specific)
- Post-Processing: TecPlot
Convection-Diffusion: Validation

Both methods have been validated against a series of 1D and 3D linear and nonlinear advection-diffusion-reaction problems, e.g.:

[Figure: analytical vs. calculated solution profiles for Pe = 50 (LHIM, c = 0.12), Pe = 100 (LHIM, c = 0.08), Pe = 200 (LHIM, c = 0.03), and k = 120 (c = 0.04)]
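For reference, the classic 1D steady advection-diffusion benchmark commonly used in such validations (a hypothetical stand-in; the talk's exact test problems are not specified) has a boundary layer near x = 1 that steepens with the Peclet number. With Pe·u′ = u″ on (0, 1), u(0) = 1 and u(1) = 2, the analytical solution is u(x) = 1 + (e^(Pe·x) − 1)/(e^Pe − 1):

```python
import numpy as np

def analytic(x, Pe):
    # boundary-layer solution of Pe*u' = u'' with u(0)=1, u(1)=2;
    # expm1 avoids cancellation in exp(Pe*x) - 1 for small arguments
    return 1.0 + np.expm1(Pe * x) / np.expm1(Pe)

# profiles over the plotted range x in [0.75, 1.0]
x = np.linspace(0.75, 1.0, 6)
u50 = analytic(x, 50.0)    # mild layer
u200 = analytic(x, 200.0)  # much steeper layer confined near x = 1
```

The larger Pe is, the later the profile rises from 1 towards 2, which is why the plots above focus on x ∈ [0.75, 1.0].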
CV-RBF: Infiltration well + Pumping

Setup:
- Domain dimensions: Lx = 75 m, Ly = 31 m, Lz = 50 m
- Well diameter: d = 3 m
- Soil properties: Kx = 1.0 m/h, Kz = 0.3 m/h, Ky = 0.2 m/h
- Pumping location: 25 m from the infiltration well (height y = 15 m)
- Infiltration/pumping rate: q = 20 m³/h

3D model: mesh (60,000 cells) and boundary conditions

[Figure: model geometry (31 m depth) with a zero-flux condition (∂h/∂n = 0) on part of the boundary, and a second condition applied everywhere else]

Results – piezometric head contours and streamlines at t = 30 h (planes at z = 25 m and y = 29 m; length scale: 100 m):
- Maximum displacements: dh_well = 0.2 m, dh_pump = 0.85 m
LHI: Infiltration model - Setup

- Infiltrate over a 10 m × 10 m region at the ground surface
- Infiltration pressure = 2 m
- Zero-flux boundary at the base (‘solid wall’); fixed pressure distribution at the sides
- Initial pressure: p = (8.0 − z) m in the saturated zone; p = −1.0 m in the unsaturated zone

Solving Richards’ equation:

S ∂p/∂t = ∂/∂x_i ( K(p) ∂p/∂x_i ) + ∂K(p)/∂z,  K(p) = K_s K_r(p)
LHI: Infiltration model – Soil properties

Using the van Genuchten soil representation:

- Saturated conductivity: K_s = (1.0, 1.0, 0.2) m/h
- Storativity: S = 1.0 × 10⁻⁴

with the retention and relative-conductivity relations (van Genuchten–Mualem form):

θ(h) = θ_r + (θ_s − θ_r) [1 + (α_g h)^n]^(−m)

K(h) = K_s Se^(1/2) [1 − (1 − Se^(1/m))^m]²,  Se = (θ − θ_r)/(θ_s − θ_r)

and θ_r = 0.15, θ_s = 0.45.

[Figure: relative hydraulic conductivity (0 to 1) and water content (0.1 to 0.5) plotted against water pressure (−1.2 to 0 m)]
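The van Genuchten curves above can be sketched as follows; θ_r = 0.15 and θ_s = 0.45 are the stated values, while the shape parameters α and n are illustrative assumptions (the talk's values did not survive the transcript), and the conductivity uses the standard Mualem form:

```python
import numpy as np

theta_r, theta_s = 0.15, 0.45   # stated residual and saturated water content
alpha, n = 3.0, 2.0             # ASSUMED shape parameters (1/m and -)
m = 1.0 - 1.0 / n               # Mualem constraint

def sat_eff(h):
    # effective saturation Se(h); h <= 0 (m of water pressure) when unsaturated
    h = np.minimum(h, 0.0)
    return (1.0 + (alpha * np.abs(h))**n)**(-m)

def water_content(h):
    # theta(h) = theta_r + (theta_s - theta_r) * Se
    return theta_r + (theta_s - theta_r) * sat_eff(h)

def rel_conductivity(h):
    # Mualem relative hydraulic conductivity K_r(h)
    Se = sat_eff(h)
    return np.sqrt(Se) * (1.0 - (1.0 - Se**(1.0 / m))**m)**2

# curves over the plotted pressure range
h = np.linspace(-1.2, 0.0, 25)
theta = water_content(h)
Kr = rel_conductivity(h)
```

At h = 0 the soil is saturated (θ = θ_s, K_r = 1); both curves decay monotonically as the pressure becomes more negative, matching the shape of the plots.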
LHI: Infiltration model - Results

- 11,585 points arranged in 17 layers
- ‘Short’ runs: solution to 48 hours
- ‘Long’ run: solution to 160 hours
Richards’ equation - First example

Using the steady-state example given in Tracy (2006)* for the solution of Richards’ equation, with:

K(h) = K_s e^(αh),  θ(h) = θ_r + (θ_s − θ_r) e^(αh)

and S_p = 0, on the domain:

0 ≤ x ≤ 15.24 m,  0 ≤ y ≤ 15.24 m,  0 ≤ z ≤ 15.24 m

With:

- Top face: h(x, y) = (1/α) ln[ e^(α h_r) + (1 − e^(α h_r)) sin(πx/15.24) sin(πy/15.24) ]
- All other faces: h = h_r

* F.T. Tracy, Clean two- and three-dimensional analytical solutions of Richards’ equation for testing numerical solvers
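The top-face condition above can be evaluated directly; in this sketch h_r (the reference dry pressure head) is an assumed value, and α = 0.164 is one of the values tested later. The formula reduces to h = h_r along the edges (where the sines vanish) and to h = 0, i.e. saturation, at the centre of the face:

```python
import numpy as np

L = 15.24        # domain side length (m)
alpha = 0.164    # decay parameter (1/m), one of the tested values
h_r = -15.24     # reference dry pressure head (m) - an ASSUMED value

def top_face_head(x, y):
    # Tracy-type top-face pressure head, as reconstructed above
    a = np.exp(alpha * h_r)
    s = np.sin(np.pi * x / L) * np.sin(np.pi * y / L)
    return (1.0 / alpha) * np.log(a + (1.0 - a) * s)
```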
Richards’ equation - First example

With (11 × 11 × 11) and (21 × 21 × 21) uniformly spaced points:

[Figure: analytical vs. calculated hydraulic head (m) against z (m), for α = 0.164 (N = 11 and N = 22) and α = 0.328 (N = 11 and N = 22)]
Richards’ equation - First example (error analysis)

              L2 error norm   Max error   FV max error   Improvement factor
α = 0.164
  11×11×11      1.72E-02      3.16E-02
  21×21×21      3.52E-03      7.09E-03     6.62E-02            9.34
α = 0.328
  11×11×11      1.61E-02      3.20E-02
  21×21×21      4.24E-03      8.57E-03     1.11              129.5
α = 0.492
  11×11×11      1.77E-02      2.43E-02
  21×21×21      5.16E-03      7.26E-03     5.13              706.6

(FV max error: maximum error of the finite-volume solver; improvement factor = FV max error / our max error.)
- Good improvement over the finite-volume results from the Tracy paper, particularly with rapidly decaying K and θ functions
- Reasonable convergence rate with increasing point density
Future work

Future work will focus on large-scale problems, including:
- Regional-scale models of real-world experiments in Portugal and Greece
- Country-scale models of aquifer pumping and seawater intrusion across Israel and Gaza

The large problem size will require a parallel implementation for efficient solution – hence our interest in HPC and GPUs.

To our knowledge, we are the only group working on large-scale hydrology problems using meshless numerical techniques.

Practical implementation will require the parallelisation of large, sparsely populated, iterative matrix solvers.
GPU Computing: GPU: Graphics Processing Unit – originally designed to accelerate floating-point-heavy calculations in computer games, e.g.:
- Pixel shader effects (lighting, reflection/refraction, other effects)
- Geometry setup (character meshes etc.)
- Solid-body physics (…not yet widely adopted)

Massively parallel architecture – currently up to 128 floating-point processing units.

Recent hardware (from Nov 2006) and software (Feb 2007) advances have allowed programmable processing units (rather than units specialised for pixel or vertex processing).

This has led to "General-Purpose GPUs" – ‘GPGPUs’.
GPU Computing: GPUs are extremely efficient at handling add-multiply instructions in small ‘packets’ (usually the main computational cost in numerical simulations).

Floating-point capacity outstrips CPUs, in both theoretical capacity and efficiency (if properly implemented).

GPU Computing: Modern GPUs effectively work like a shared-memory cluster:
- GPUs have extremely fast (~1000 MHz vs ~400 MHz), dedicated onboard memory
- Onboard memory sizes currently range up to 1.5 GB (in addition to system memory)
CUDA - BLAS and FFT libraries

The CUDA toolkit is a C-language development environment for CUDA-enabled GPUs.

Two libraries are implemented on top of CUDA: Basic Linear Algebra Subprograms (BLAS) and Fast Fourier Transform (FFT) – pre-parallelised routines.

Available examples/demos:
- Parallel bitonic sort
- Matrix multiplication
- Matrix transpose
- Performance profiling using timers
- Parallel prefix sum (scan) of large arrays
- Image convolution
- 1D DWT using Haar wavelet
- Graphics interoperation examples
- CUDA BLAS and FFT examples
- CPU-GPU C and C++ code integration
- Binomial option pricing
- Black-Scholes option pricing
- Monte-Carlo option pricing
- Parallel Mersenne Twister
- Parallel histogram
- Image denoising
- Sobel edge detection filter
TESLA - GPUs for HPC

- Deskside (2 GPUs, 3 GB) and rackmount (4 GPUs, 6 GB) options
- At ~500 GFLOPS per GPU, that is ~2 TFLOPS per rack
- The deskside unit is listed at around $4200
GPU Computing – Some results

Example: multiplication of densely populated matrices – an O(N³) algorithm…

GPU specs:

GPU model   Processors   Clock speed   Memory size   Memory bus   Cost
8600GTS     32           675 MHz       512 MB        256-bit      ~£75
8800GT      112          600 MHz       512 MB        256-bit      ~£150
8800GTX     128          575 MHz       768 MB        320-bit      ~£250

Matrices are broken down into vector portions and sent to the GPU stream processors.

We use CUDA – nVidia’s C-based API for GPUs.
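The benefit of handing the O(N³) multiply to tuned, pre-parallelised routines can be illustrated on the CPU side (this is not GPU code; NumPy's `@` dispatches to a BLAS library, standing in here for the CUDA BLAS call): a naive triple loop computes the same product, orders of magnitude more slowly for large N:

```python
import time
import numpy as np

def naive_matmul(A, B):
    # textbook O(N^3) triple loop, no blocking or vectorisation
    n, k = A.shape
    _, m = B.shape
    C = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            s = 0.0
            for p in range(k):
                s += A[i, p] * B[p, j]
            C[i, j] = s
    return C

rng = np.random.default_rng(0)
A = rng.standard_normal((60, 60))
B = rng.standard_normal((60, 60))

t0 = time.perf_counter()
C_naive = naive_matmul(A, B)
t_naive = time.perf_counter() - t0

t0 = time.perf_counter()
C_blas = A @ B                      # dispatched to a tuned BLAS routine
t_blas = time.perf_counter() - t0

err = np.max(np.abs(C_naive - C_blas))
```

Both paths compute the same matrix up to round-off; only the implementation of the O(N³) work differs, which is the same argument made for moving the multiply onto the GPU's stream processors.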
Various CPUs vs GPU

Note: performance of dual-core and quad-core CPUs is approximated from an idealised parallelisation (i.e. 100% efficiency).

[Figure: improvement factor of the 8800GTX over various CPUs (up to ~200×) against matrix size (row/column size up to 6000), for: Pentium 4 3 GHz; E2160 (1.86 GHz dual) – £55; E6850 (3.0 GHz dual) – £190; QX9650 (3 GHz quad) – £690]
More GPU propaganda:

Good anecdotal evidence for the improvement in real-world simulations is available from those who have switched to GPU computing:

- Dr. John Stone, Beckman Institute for Advanced Science and Technology, NAMD virus simulations: 110 CPU-hours on an SGI Itanium supercomputer => 47 minutes with a single GPU – a 240-fold speedup
- Dr. Graham Pullan, University of Cambridge, CFD with turbine blades (LES and RANS models): 40× absolute speedup switching from a CPU cluster to ‘a few’ GPUs; 10 million cells used on the GPU, up from 500,000 on the CPU cluster
Closing words

GPU performance is advancing at a much faster rate than CPU performance. This is expected to continue for some time yet.

With CUDA and BLAS, exploiting the parallelism of GPUs is in some cases easier than traditional MPI approaches.

Later this year: 2-3 times the performance of current hardware (over 1 TFLOP per card), and native 64-bit capability.

More info: www.nvidia.com/tesla  www.nvidia.com/cuda  www.GPGPU.org