“High Performance Computing and Simulation Symposium 2008” Ottawa, Canada, April 14-16, 2008
Solution of the Implicit Formulation of High Order Diffusion for the Canadian Atmospheric GEM Model
Abdessamad Qaddouri & Vivian Lee, Atmospheric Science & Technology
Outline
• Introduction to the GEM model
• High-order diffusion equation and its solution
• Parallelization of the solution
• Numerical performance tests
• Conclusion
Numerical Weather Prediction (NWP)
• Physics
• Applied mathematics
• Real-time applications
• Computers at the Canadian Meteorological Centre (CMC)
[Figure: performance (MFLOPS, log scale from 1 to 10,000,000) of CMC computers versus year, 1974-2006: CDC 7600, CDC 176, Cray 1S, Cray XMP 28, Cray XMP 416, NEC SX-3/44, NEC SX-3/44R, NEC SX-4/16, NEC SX-4/80M3, NEC SX-5/32M2, NEC SX-6/80M10, IBM P4, IBM P5+]
[Figure: CMC forecast systems versus forecast lead time (0 to 365 days), ranging from deterministic to probabilistic, statistical and empirical forecasts: 2.5 km resolution (once per day), 15 km resolution (twice per day), 35 km resolution (once per day), 100 km resolution (once per day), 250 km resolution (twice per month), 250-400 km resolution (4 times per year), statistical (4 times per year)]
[Figure: GEM grid configurations: variable resolution, uniform, rotated, limited area]
Grid dimensions:
15 km = 574 x 641 x 58
35 km = 800 x 600 x 58
2.5 km = 672 x 494 x 58
Hydrostatic Model
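• Horizontal motion (momentum)
$$\frac{d\mathbf{V}_H}{dt} + RT\,\nabla_H \ln p + \nabla_H (gh) + f\,\mathbf{k}\times\mathbf{V}_H = \mathbf{F}_H$$
• Thermodynamics, hydrostatic and state
$$\frac{d\ln T}{dt} - \kappa\,\frac{d\ln p}{dt} = F_T;\qquad \frac{\partial (gh)}{\partial p} = -\frac{1}{\rho};\qquad p = \rho R T;\qquad \kappa = \frac{R}{c_p}$$
• Continuity and boundary conditions
$$\frac{d}{dt}\ln\left(\frac{\partial p}{\partial Z}\right) + D + \frac{\partial \dot Z}{\partial Z} = 0;\qquad \dot Z = 0 \ \text{at}\ Z = Z_{\mathrm{bottom}},\ Z_{\mathrm{top}}$$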
Schematic of the semi-Lagrangian implicit method used to integrate the GEM model
[Schematic stages: discretization of the model equations for the state vector X = (V_H, T, p, ...); trajectory computation from dr/dt = V(r, t); nonlinear iterations; diffusion on specific fields]
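The trajectory stage can be illustrated with the classic midpoint-rule iteration used by many semi-Lagrangian schemes. This is a minimal one-dimensional sketch under stated assumptions, not GEM's code: the wind is a hypothetical time-independent callable, and `departure_point` and its fixed iteration count are illustrative choices.

```python
import math


def departure_point(x_arrival, wind, dt, iterations=3):
    """Departure point of a 1D semi-Lagrangian trajectory ending at x_arrival.

    Solves x_d = x_arrival - dt * wind((x_arrival + x_d) / 2) by fixed-point
    iteration, starting from x_d = x_arrival. Assumes a time-independent wind.
    """
    x_d = x_arrival
    for _ in range(iterations):
        x_mid = 0.5 * (x_arrival + x_d)  # current estimate of the trajectory midpoint
        x_d = x_arrival - dt * wind(x_mid)
    return x_d


# Usage with a hypothetical wind u(x) = 10 + sin(x)
x_d = departure_point(1.0, lambda x: 10.0 + math.sin(x), dt=0.01)
print(x_d)
```

The fixed-point iteration converges when dt * |du/dx| / 2 < 1, so a few iterations usually suffice for small time steps.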
Horizontal High order Diffusion
• Horizontal prognostic field ψ
$$\frac{\partial\psi}{\partial t} = (-1)^{m/2+1}\,\nu_m\nabla^m\psi,\qquad m = 2, 4, 6, 8$$
• Damping rate
[Figure: damping rate versus wavelength]
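The damping-rate curve follows from applying the operator to a single Fourier mode ψ = e^{ikx}: since ∇^m ψ = (-1)^{m/2} k^m ψ, the right-hand side becomes -ν_m k^m ψ, so the e-folding rate is ν_m k^m. The sketch below assumes, hypothetically, that each ν_m is tuned to give the 2Δx wave a unit rate; `damping_rate` and the grid spacing are illustrative.

```python
import math


def damping_rate(wavelength, m, nu):
    """e-folding rate nu * k**m of one Fourier mode under del-m diffusion (m even)."""
    k = 2.0 * math.pi / wavelength
    return nu * k**m


dx = 1.0  # hypothetical grid spacing
for m in (2, 4, 6, 8):
    nu = 1.0 / (math.pi / dx) ** m  # unit rate for the 2*dx wave (k = pi/dx)
    rates = [damping_rate(n * dx, m, nu) for n in (2, 4, 8)]
    print(m, [round(r, 4) for r in rates])
```

With this normalization the rate of the 4Δx wave is 0.5**m of the 2Δx rate, so higher orders confine the damping to the shortest wavelengths.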
Horizontal High order Diffusion…
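• Horizontal prognostic field ψ
$$\frac{\partial\psi}{\partial t} = (-1)^{m/2+1}\,\nu_m\nabla^m\psi,\qquad m = 2, 4, 6, 8$$
• Implicit discretization
$$\frac{\psi^{n+1}-\psi^{n}}{\Delta t} = (-1)^{m/2+1}\,\nu_m\nabla^m\psi^{n+1} \quad\Longrightarrow\quad \left[1 + (-1)^{m/2}\,\Delta t\,\nu_m\nabla^m\right]\psi^{n+1} = \psi^{n} \equiv R$$
with
$$\nabla^2 = \frac{1}{a^2\cos^2\phi}\frac{\partial^2}{\partial\lambda^2} + \frac{1}{a^2\cos\phi}\frac{\partial}{\partial\phi}\left(\cos\phi\,\frac{\partial}{\partial\phi}\right)$$
where λ is longitude, φ latitude and a the Earth's radius.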
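For a quick check of the implicit formulation, the one-dimensional sketch below assumes a periodic grid with a second-order centred Laplacian (not the spherical operator above) and builds ∇^m as a power of that matrix. `periodic_laplacian` and `implicit_diffusion_step` are hypothetical helpers that use dense NumPy solves for clarity. A single sine mode should be damped by exactly 1/(1 + Δt ν_m κ^m), where κ² = 4 sin²(kΔx/2)/Δx² is the discrete wavenumber squared.

```python
import numpy as np


def periodic_laplacian(n, dx):
    """Second-order centred 1D Laplacian on a periodic grid (dense, for illustration)."""
    eye = np.eye(n)
    return (np.roll(eye, 1, axis=1) - 2.0 * eye + np.roll(eye, -1, axis=1)) / dx**2


def implicit_diffusion_step(psi, m, nu, dt, dx):
    """Solve [I + (-1)**(m/2) * dt * nu * L**(m/2)] psi_new = psi for even m."""
    n = psi.size
    lap_m = np.linalg.matrix_power(periodic_laplacian(n, dx), m // 2)
    A = np.eye(n) + (-1) ** (m // 2) * dt * nu * lap_m
    return np.linalg.solve(A, psi)


n, dx, dt, nu, m = 32, 1.0, 1.0, 0.5, 4  # hypothetical values
x = np.arange(n) * dx
k = 2.0 * np.pi * 3 / (n * dx)  # wavenumber of a mode that fits the periodic domain
psi = np.sin(k * x)
psi_new = implicit_diffusion_step(psi, m, nu, dt, dx)
kappa2 = 4.0 * np.sin(k * dx / 2.0) ** 2 / dx**2  # discrete wavenumber squared
print(np.allclose(psi_new, psi / (1.0 + dt * nu * kappa2 ** (m // 2))))  # True
```

Because the amplification factor is always below 1 for any Δt, the implicit form stays stable for large ν_m Δt, which is the usual motivation for treating high-order diffusion implicitly.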
Horizontal High order Diffusion …
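• Del 4 horizontal diffusion (m = 4): the implicit problem
$$\left(1 + \gamma\,\nabla^4\right)\psi = R,\qquad \gamma = \Delta t\,\nu_4,$$
is written as two coupled second-order equations
$$P_1 + \sqrt{\gamma}\,\nabla^2 P_2 = R,\qquad \sqrt{\gamma}\,\nabla^2 P_1 - P_2 = 0,\qquad P_1 = \psi$$
• Spatial discretization
$$\begin{pmatrix} I & A \\ A & -I \end{pmatrix}\begin{pmatrix} P_1 \\ P_2 \end{pmatrix} = \begin{pmatrix} r \\ 0 \end{pmatrix}$$
where A is the discrete form of √γ∇² and r the discrete right-hand side.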
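Writing the del-4 problem as two coupled second-order equations can be checked numerically: substituting P_2 = A P_1 into the first equation gives (I + A²)P_1 = r, the discrete del-4 problem. The sketch assumes a one-dimensional Dirichlet Laplacian `lap` with hypothetical values of γ and r; `solve_del4_split` is an illustrative name, and the dense block solve stands in for the fast direct solver.

```python
import numpy as np


def solve_del4_split(lap, gamma, r):
    """Solve (I + gamma * lap @ lap) p = r through the coupled second-order system

        P1 + A P2 = r,   A P1 - P2 = 0,   A = sqrt(gamma) * lap,

    assembled as the block matrix [[I, A], [A, -I]]; returns P1.
    """
    n = r.size
    A = np.sqrt(gamma) * lap
    eye = np.eye(n)
    block = np.block([[eye, A], [A, -eye]])
    rhs = np.concatenate([r, np.zeros(n)])
    return np.linalg.solve(block, rhs)[:n]


n, gamma = 20, 0.3  # hypothetical size and gamma = dt * nu_4
lap = np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
r = np.random.default_rng(0).standard_normal(n)
p1 = solve_del4_split(lap, gamma, r)
direct = np.linalg.solve(np.eye(n) + gamma * lap @ lap, r)
print(np.allclose(p1, direct))  # True
```

One plausible benefit of the split is that only second-order operators need to be discretized, so the same Laplacian machinery can be reused for del-4.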
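Spatial discretization
[Equations: finite-difference form of the horizontal Laplacian on the latitude-longitude grid, with points i = 1, ..., Ni in longitude and j = 1, ..., Nj in latitude, metric factors in cos φ and sin φ, and special rows for the longitude wrap-around (i = 1, Ni) and the latitude boundaries (j = 1, Nj)]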
Horizontal High order Diffusion …
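• Fast direct solution: the longitude (i) part A_x of the separable discrete operator is diagonalized through the generalized eigenvalue problem
$$A_x Z = B_x Z \Lambda,\qquad Z^{T} B_x Z = I,\qquad \Lambda = \operatorname{diag}(\lambda_1,\dots,\lambda_{N_i})$$
where B_x holds the longitude weights.
• Projection: expanding the unknowns on the eigenvectors,
$$P_{ij} = \sum_{I=1}^{N_i} Z_{iI}\,\tilde P_{Ij},$$
decouples the problem into N_i independent problems in j, one per eigenvalue λ_I.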
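When the longitude weights B_x are diagonal and positive (an assumption here), the generalized eigenproblem reduces to a standard symmetric one through the scaling C = B_x^{-1/2} A_x B_x^{-1/2}. The NumPy sketch below uses a hypothetical non-uniform periodic grid; `generalized_eigh_diag` is an illustrative helper, and `scipy.linalg.eigh(A, B)` would be an alternative for a general symmetric positive definite B.

```python
import numpy as np


def generalized_eigh_diag(A, b):
    """Solve A z = lam * diag(b) z for symmetric A and positive weights b.

    Returns (lam, Z) with Z.T @ diag(b) @ Z = I and Z.T @ A @ Z = diag(lam),
    via the symmetric scaling C = B^{-1/2} A B^{-1/2}.
    """
    s = 1.0 / np.sqrt(b)
    C = s[:, None] * A * s[None, :]
    lam, V = np.linalg.eigh(C)
    return lam, s[:, None] * V


# Hypothetical non-uniform periodic grid: h[i] is the spacing between points i and i+1.
n = 16
h = 2.0 * np.pi / n * (1.0 + 0.2 * np.random.default_rng(1).uniform(-1.0, 1.0, n))
A = np.zeros((n, n))
for i in range(n):
    nxt = (i + 1) % n
    w = 1.0 / h[i]  # symmetric coupling between point i and its east neighbour
    A[i, i] -= w
    A[nxt, nxt] -= w
    A[i, nxt] += w
    A[nxt, i] += w
b = 0.5 * (h + np.roll(h, 1))  # control-volume widths used as weights
lam, Z = generalized_eigh_diag(A, b)
```

Since Z^T B_x Z = I, projecting a right-hand side only needs the transpose of Z, which is what makes the analysis a plain matrix multiplication.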
Horizontal High order Diffusion …
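• Direct solution: for each eigenvalue λ_I, the projected unknowns X_j = (P̃_{2,Ij}, P̃_{1,Ij})^T satisfy
$$\begin{pmatrix} A_{j,j-1} & 0 \\ 0 & A_{j,j-1} \end{pmatrix} X_{j-1} + \begin{pmatrix} A_{j,j}(\lambda_I) & 1 \\ -1 & A_{j,j}(\lambda_I) \end{pmatrix} X_j + \begin{pmatrix} A_{j,j+1} & 0 \\ 0 & A_{j,j+1} \end{pmatrix} X_{j+1} = \begin{pmatrix} \tilde r_{Ij} \\ 0 \end{pmatrix},\qquad j = 1,\dots,N_j$$
• Matrix form
$$M X = B,\qquad X = (X_1,\dots,X_{N_j})^{T}$$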
Horizontal High order Diffusion …
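• Block tridiagonal problem solution
$$M = \begin{pmatrix} D_1 & E_1 & & \\ F_2 & D_2 & E_2 & \\ & \ddots & \ddots & \ddots \\ & & F_{N_j} & D_{N_j} \end{pmatrix} = L\,U$$
with
$$\Delta_1 = D_1,\qquad \Delta_i = D_i - F_i\,\Delta_{i-1}^{-1}E_{i-1},\quad i = 2,\dots,N_j,$$
where L is block lower bidiagonal with diagonal blocks Δ_i and sub-diagonal blocks F_i, and U is block upper bidiagonal with identity diagonal blocks and super-diagonal blocks Δ_i^{-1}E_i.
• Solution
$$L\,Y = B,\qquad U\,X = Y$$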
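The block LU factorization and its two triangular sweeps can be sketched as a block Thomas algorithm. This NumPy version is a minimal illustration, not the operational solver: it uses 0-based indices, does no pivoting (so it assumes the Δ_i stay well conditioned, as for the diagonally dominant blocks in the usage example), and the hypothetical 2 × 2 block values only mimic the structure of the direct-solution system.

```python
import numpy as np


def block_tridiagonal_solve(F, D, E, B):
    """Solve M X = B for block tridiagonal M by block LU (block Thomas algorithm).

    D[j]: diagonal blocks; F[j]: sub-diagonal block of row j (used for j >= 1);
    E[j]: super-diagonal block of row j (used for j <= n - 2); B[j]: RHS blocks.
    """
    n = len(D)
    G = [None] * n  # G[j] = Delta_j^{-1} E_j, the super-diagonal blocks of U
    Y = [None] * n
    delta = D[0]
    if n > 1:
        G[0] = np.linalg.solve(delta, E[0])
    Y[0] = np.linalg.solve(delta, B[0])
    for j in range(1, n):  # forward sweep: factorization and L Y = B
        delta = D[j] - F[j] @ G[j - 1]
        if j < n - 1:
            G[j] = np.linalg.solve(delta, E[j])
        Y[j] = np.linalg.solve(delta, B[j] - F[j] @ Y[j - 1])
    X = [None] * n
    X[-1] = Y[-1]
    for j in range(n - 2, -1, -1):  # back substitution: U X = Y
        X[j] = Y[j] - G[j] @ X[j + 1]
    return np.array(X)


# Usage with hypothetical 2 x 2 blocks; dense assembly only to check the result.
n, z = 6, -2.5
D = [np.array([[z, 1.0], [-1.0, z]]) for _ in range(n)]
F = [np.eye(2) for _ in range(n)]
E = [np.eye(2) for _ in range(n)]
B = np.random.default_rng(2).standard_normal((n, 2))
X = block_tridiagonal_solve(F, D, E, B)
M = np.zeros((2 * n, 2 * n))
for j in range(n):
    M[2 * j:2 * j + 2, 2 * j:2 * j + 2] = D[j]
    if j > 0:
        M[2 * j:2 * j + 2, 2 * j - 2:2 * j] = F[j]
    if j < n - 1:
        M[2 * j:2 * j + 2, 2 * j + 2:2 * j + 4] = E[j]
print(np.allclose(X.ravel(), np.linalg.solve(M, B.ravel())))  # True
```

The cost grows linearly with Nj, and each of the Nk × Ni problems is independent, which is what the parallel algorithm exploits.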
Summary of the algorithm
• Analysis of the right-hand side (FFT or matrix-matrix multiplication, MMM)
$$\tilde r_{Ij} = \sum_{i=1}^{N_i} Z_{iI}\, r_{ij}$$
• Solution of Nk × Ni block tridiagonal problems M X = B
• Synthesis of the solution (FFT or MMM)
$$P_{ij} = \sum_{I=1}^{N_i} Z_{iI}\, \tilde P_{Ij}$$
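The three steps can be illustrated end to end on a simpler problem: an implicit del-2 step (I - γ∇²)P = r on a uniform grid, periodic in i and with Dirichlet boundaries in j, for a single level. Under these assumptions the eigenvectors of the longitude operator are Fourier modes, so analysis and synthesis can use either an FFT or multiplication by the eigenvector matrix Z (the MMM variant). `fast_direct_solve`, `laplacian_1d` and all parameter values are hypothetical, and each mode's tridiagonal system is solved densely for brevity.

```python
import numpy as np


def laplacian_1d(n, h, periodic):
    """Second-order centred 1D Laplacian matrix (Dirichlet or periodic), spacing h."""
    L = np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
    if periodic:
        L[0, -1] = L[-1, 0] = 1.0
    return L / h**2


def fast_direct_solve(r, gamma, dx, dy, use_fft):
    """Solve P - gamma * (Lx P + P Ly) = r; Lx periodic on axis 0 (i), Ly Dirichlet on axis 1 (j)."""
    ni, nj = r.shape
    Ly = laplacian_1d(nj, dy, periodic=False)
    if use_fft:
        # Analysis by FFT: Fourier modes are the eigenvectors of the uniform periodic Lx.
        lam = -4.0 * np.sin(np.pi * np.arange(ni) / ni) ** 2 / dx**2
        r_hat = np.fft.fft(r, axis=0)
    else:
        # Analysis by matrix multiplication with the orthonormal eigenvectors Z of Lx.
        lam, Z = np.linalg.eigh(laplacian_1d(ni, dx, periodic=True))
        r_hat = Z.T @ r
    # One independent tridiagonal system in j per mode (solved densely here).
    p_hat = np.empty_like(r_hat)
    for mode in range(ni):
        M = (1.0 - gamma * lam[mode]) * np.eye(nj) - gamma * Ly
        p_hat[mode] = np.linalg.solve(M, r_hat[mode])
    # Synthesis back to grid-point space.
    return np.fft.ifft(p_hat, axis=0).real if use_fft else Z @ p_hat


ni, nj, dx, dy, gamma = 16, 10, 0.4, 0.3, 0.05
r = np.random.default_rng(3).standard_normal((ni, nj))
dense = np.eye(ni * nj) - gamma * (
    np.kron(laplacian_1d(ni, dx, True), np.eye(nj)) + np.kron(np.eye(ni), laplacian_1d(nj, dy, False))
)
ref = np.linalg.solve(dense, r.ravel()).reshape(ni, nj)
print(np.allclose(fast_direct_solve(r, gamma, dx, dy, use_fft=True), ref))   # True
print(np.allclose(fast_direct_solve(r, gamma, dx, dy, use_fft=False), ref))  # True
```

The FFT variant relies on a uniform periodic longitude grid; the matrix-multiplication variant only needs the eigenvectors Z, so it also applies when those are not Fourier modes.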
A Parallel algorithm
• Global transposition (Ni/P, Nj/Q, Nk) → (Nj/Q, Nk/P, Ni)
• Analysis of the right-hand side
• Global transposition (Nj/Q, Nk/P, Ni) → (Nk/P, Ni/Q, Nj)
• Solution of the block tridiagonal problems
• Global transposition (Nk/P, Ni/Q, Nj) → (Nj/Q, Nk/P, Ni)
• Synthesis of the solution
• Global transposition (Nj/Q, Nk/P, Ni) → (Ni/P, Nj/Q, Nk)
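The index bookkeeping behind these transpositions can be sketched without MPI. The code below assumes a contiguous block decomposition with the remainder spread over the first ranks, and assumes the 35 km grid 800 x 600 x 58 is ordered Ni x Nj x Nk; `block_range`, `count` and `local_shape` are hypothetical helpers. In a real code each transposition would typically be an all-to-all exchange within one row or column of the P x Q process grid.

```python
import math


def block_range(n, parts, rank):
    """Indices of the contiguous block of n points owned by `rank` out of `parts`."""
    base, extra = divmod(n, parts)
    start = rank * base + min(rank, extra)
    return range(start, start + base + (1 if rank < extra else 0))


def count(n, parts, rank):
    """Number of points owned by `rank`."""
    return len(block_range(n, parts, rank))


def local_shape(stage, p, q, dims, P, Q):
    """Local array shape on PE (p, q), axes ordered as written on the slide."""
    Ni, Nj, Nk = dims
    if stage == 0:  # (Ni/P, Nj/Q, Nk): model layout
        return (count(Ni, P, p), count(Nj, Q, q), Nk)
    if stage == 1:  # (Nj/Q, Nk/P, Ni): complete i lines for analysis and synthesis
        return (count(Nj, Q, q), count(Nk, P, p), Ni)
    # stage 2: (Nk/P, Ni/Q, Nj): complete j lines for the block tridiagonal solves
    return (count(Nk, P, p), count(Ni, Q, q), Nj)


dims, P, Q = (800, 600, 58), 2, 24  # assumed Ni x Nj x Nk with P x Q = 2 x 24
for stage in (0, 1, 2):
    total = sum(math.prod(local_shape(stage, p, q, dims, P, Q)) for p in range(P) for q in range(Q))
    print(stage, local_shape(stage, 0, 0, dims, P, Q), total == math.prod(dims))
```

The first transposition keeps each PE's j block (index q) and exchanges i against k among the P PEs that share q; the second keeps the k block (index p) and exchanges j against I among the Q PEs that share p.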
35 km mesoglobal run at 72 h forecast
[Figure panels: U component without diffusion; U component with Del 6 diffusion]
Table 1. Breakdown of timings in the major components of the Canadian 35 km mesoglobal operational model for a 72-hour integration on 12 nodes (2 x 24 x 4)
Component   Time (s)   Percentage (%)
Rhs            14.08       1.48
Adv           247.71      26.01
Prep           14.24       1.49
Nli            33.11       3.48
Sol            71.06       7.46
Bac            13.40       1.41
Phy           435.19      45.70
Hzd            82.86       8.70
vspng          20.37       2.14
output         10.38       1.09
Others          9.91       1.04
Total         952.31     100.00
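As a consistency check, each share can be recomputed from the listed times, and the vspng time can be recovered as the remainder of the 952.31 s total. The short script below is an illustrative calculation on the table's own numbers; the dictionary layout is an arbitrary choice.

```python
# Times in seconds from Table 1, without the vspng entry (recovered below).
timings = {
    "Rhs": 14.08, "Adv": 247.71, "Prep": 14.24, "Nli": 33.11, "Sol": 71.06,
    "Bac": 13.4, "Phy": 435.19, "Hzd": 82.86, "output": 10.38, "Others": 9.91,
}
total = 952.31
vspng = total - sum(timings.values())  # remainder of the total attributed to vspng
print(f"vspng  {vspng:7.2f} s  {100 * vspng / total:6.2f} %")
for name, seconds in timings.items():
    print(f"{name:6s} {seconds:7.2f} s  {100 * seconds / total:6.2f} %")
```

The remainder is about 20.37 s, which matches the 2.14 % listed for vspng; horizontal diffusion (Hzd) accounts for 8.70 % of the run.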
Table 2. MPI test runs for the 35 km mesoglobal configuration (OpenMP = 1); the diffusion is called 964 times.
Setup P x Q   Number of PEs   Nodes   Diffusion time (s)   Relative ideal speedup   Relative speedup
1 x 16        16              1       596.46               1                        1
2 x 16        32              2       320.46               2                        1.86
2 x 24        48              3       222.34               3                        2.68
4 x 16        64              4       170.12               4                        3.51
Table 3. MPI test runs for the 17 km mesoglobal configuration (OpenMP = 1); the diffusion is called 964 times.
Setup P x Q   Number of PEs   Nodes   Diffusion time (s)   Relative ideal speedup   Relative speedup
2 x 16        32              2       1769.48              1                        1
2 x 24        48              3       1206.01              1.5                      1.47
4 x 16        64              4       915.83               2                        1.93
4 x 20        80              5       764.13               2.5                      2.32
4 x 24        96              6       646.64               3                        2.74
7 x 16        112             7       620.98               3.5                      2.85
8 x 16        128             8       595.77               4                        2.97
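The speedup columns are ratios against the smallest run, so the parallel efficiency (relative speedup divided by relative ideal speedup) follows directly. The short script below, with values copied from Table 3, is an illustrative calculation only.

```python
# (number of PEs, diffusion time in s) from Table 3: 17 km mesoglobal, FFT, OpenMP = 1
runs = [(32, 1769.48), (48, 1206.01), (64, 915.83), (80, 764.13),
        (96, 646.64), (112, 620.98), (128, 595.77)]
ref_pes, ref_time = runs[0]
for pes, seconds in runs:
    ideal = pes / ref_pes         # relative ideal speedup
    speedup = ref_time / seconds  # relative speedup
    print(f"{pes:4d} PEs  ideal {ideal:4.2f}  speedup {speedup:4.2f}  efficiency {speedup / ideal:6.1%}")
```

On these numbers the efficiency relative to the 32-PE run falls from about 91 % at 96 PEs to about 74 % at 128 PEs.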
MPI relative speedup
[Figure panels: 35 km mesoglobal (FFT); 17 km mesoglobal (FFT)]
Table 4. OpenMP test runs for the 35 km mesoglobal configuration (1 x 16 x OpenMP) using FFT; the diffusion is called 964 times.
OpenMP threads   Nodes   Diffusion time (s)   Relative ideal speedup   Relative speedup
1                1       596.46               1                        1
4                4       186.41               4                        3.20
8                8       132.27               8                        4.51
Table 5. OpenMP test runs for the 35 km mesoglobal configuration (1 x 16 x OpenMP) using matrix multiplication (MXM); the diffusion is called 1084 times.
OpenMP threads   Nodes   Diffusion time (s)   Relative ideal speedup   Relative speedup
1                1       2129.93              1                        1
4                4       588.08               4                        3.62
8                8       348.44               8                        6.11
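Because the FFT and MXM runs call the diffusion a different number of times (964 and 1084), comparing them fairly means normalizing by the number of calls. The script below, with values copied from Tables 4 and 5, is an illustrative calculation; the dictionary layout is an arbitrary choice.

```python
# (OpenMP threads, diffusion time in s) for the 1 x 16 x OpenMP configuration
fft = {"calls": 964, "runs": [(1, 596.46), (4, 186.41), (8, 132.27)]}   # Table 4
mxm = {"calls": 1084, "runs": [(1, 2129.93), (4, 588.08), (8, 348.44)]}  # Table 5
for name, data in (("FFT", fft), ("MXM", mxm)):
    t1 = data["runs"][0][1]
    for threads, seconds in data["runs"]:
        per_call_ms = 1000.0 * seconds / data["calls"]
        print(f"{name} {threads} thread(s): {per_call_ms:7.1f} ms/call, speedup {t1 / seconds:4.2f}")
```

On these numbers the MXM variant gains more from threading, yet per call it remains roughly 3.2 times slower than FFT with one thread and about 2.3 times slower with eight.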
OpenMP relative speedup
[Figure panels: 35 km mesoglobal (FFT); 35 km mesoglobal (MXM)]
Conclusion
• An efficient parallel implementation of the fast direct solution for the implicit formulation of the horizontal diffusion problem
• Comparison with iterative methods such as preconditioned Krylov methods
Thank you!