Earthquake Simulations with AWP-ODC on Titan, Blue Waters and Keeneland

Earthquake Simulations with AWP-ODC on Titan, Blue Waters and Keeneland. Yifeng Cui, San Diego Supercomputer Center; Efecan Poyraz, Jun Zhou, Dongju Choi, Heming Xu, Kyle Withers, Scott Callaghan, Po Chen, Zheqiang Shi, Kim Olsen, Steven Day, Philip Maechling, Thomas Jordan and the SCEC CME Collaboration. NVIDIA Technology Theater @ SC13, November 21, 2013.

Transcript of Earthquake Simulations with AWP-ODC on Titan, Blue Waters and Keeneland

Page 1

Earthquake Simulations with AWP-ODC on Titan, Blue Waters and Keeneland

Yifeng Cui, San Diego Supercomputer Center

Efecan Poyraz, Jun Zhou, Dongju Choi, Heming Xu, Kyle Withers, Scott Callaghan, Po Chen, Zheqiang Shi, Kim Olsen, Steven Day, Philip Maechling, Thomas Jordan and the SCEC CME Collaboration

NVIDIA Technology Theater @ SC13

November 21, 2013

Page 2

HPGeoC

Supported by

Dr. Heming Xu, Dr. Yifeng Cui, Sheau-Yen Chen
Jun Zhou, Efecan Poyraz, Amit Chourasia, Dr. Daniel Roten
Ian Zhang

Page 3

FEMA 336 Report (2000)

About 50% of the national seismic risk is in Southern California

U.S. Seismic Risk Map

Page 4

San  Andreas  Fault  System  

Creeping Section

Pacific plate is moving NW relative to North America at 5 meters per 100 years

1906 San Francisco Earthquake, M 7.8

1812 Earthquake M 7.5

1680 Earthquake, M 7.4
1857 Fort Tejon Earthquake, M 7.9

Page 5

San  Andreas  Fault  System  

Creeping Section

Pacific plate is moving NW relative to North America at 5 meters per 100 years

Open interval 104 years
Open interval 153 years

Open interval 198 years

Open interval 330 years

The entire southern San Andreas fault is "locked and loaded."

New paleoseismic data reduce the mean recurrence interval for the Carrizo section from 260 yr to < 140 yr.

(Source: T. Jordan, SCEC)

Page 6

Area-Magnitude and Slip-Magnitude Scaling: San Andreas Fault System

log10 A ~ M_W

log10 D ~ ½ M_W

Page 7

Frequency-Magnitude Scaling: San Andreas Fault System

M8  “outer  scale”  

UCERF2  (Field  et  al.,  2008)  

Page 8

Natural Frequency of Buildings

Building Height   Typical Natural Period
2-story           0.2 seconds
5-story           0.5 seconds
10-story          1.0 second
20-story          2.0 seconds
30-story          3.0 seconds
50-story          5.0 seconds

Tall buildings tend to have a lower natural frequency than shorter buildings.

f = \frac{1}{2\pi}\sqrt{\frac{K}{M}}

f: natural frequency in hertz
K: the stiffness of the building for a specific mode
M: the mass of the building associated with that mode
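As a quick check of the formula against the period table, here is a small sketch; the modal mass and the resulting stiffness are invented for illustration, only the formula itself comes from the slide:

```python
import math

def natural_frequency(k, m):
    """f = (1 / 2*pi) * sqrt(K / M), in hertz."""
    return math.sqrt(k / m) / (2 * math.pi)

# Hypothetical single-mode idealization of a 10-story building:
# pick a modal mass, then back out the stiffness that yields the
# 1.0-second period listed in the table above.
m = 5.0e6                      # kg (assumed modal mass)
k = m * (2 * math.pi) ** 2     # N/m, chosen so that T = 1/f = 1.0 s
f = natural_frequency(k, m)
print(f, 1 / f)                # frequency ~1.0 Hz, period ~1.0 s
```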

Wenchuan Earthquake, 2008

Deterministic earthquake wave propagation simulations do not yet approach the frequencies of interest to building engineers for most common buildings.

Page 9

Computational Requirements

[Chart: computational requirements (mesh points × time steps), on a log scale from 1E+13 to 1E+21, for TeraShake (SDSC DataStar, 2004), ShakeOut (TACC Ranger, 2007), M8 2-Hz (OLCF Jaguar, 2010), and M8 10-Hz. 4D ratio: 7 × 10^16.]

Page 10

Computational Requirements

[Chart repeated from Page 9, shown without the 4D-ratio annotation.]

Page 11

M8 Earthquake Simulation

Page 12

•  443-billion elements, using 16,640 Titan GPUs

•  Small-scale fault geometry and media complexity

•  Dynamic rupture propagation along a rough fault embedded in a 3D velocity structure

[Chart: speedup and sustained Tflop/s vs. number of GPUs / XE6 cores (2 to 20,000), comparing ideal speedup, NCCS Titan speedup, NCCS Titan XK7 FLOPS, and Blue Waters XK7 FLOPS; sustained performance reaches 2.3 Pflop/s.]

Ground Motion Up to 10-Hz on BW/Titan

Page 13

Improvement of Models: SCEC Computational Pathways

1. Standard seismic hazard analysis (RWG, AWP-ODC), empirical models: Earthquake Rupture Forecast → Attenuation Relationship → Intensity Measures
2. Ground-motion simulation (AWP-ODC, Hercules), physics-based simulations: "Extended" Earthquake Rupture Forecast → KFR → AWP → NSR → Ground Motions
3. Dynamic rupture modeling (SORD, AWP-ODC; hybrid MPI/CUDA): Structural Representation → DFR → AWP
4. Ground-motion inverse problem (AWP-ODC, SPECFEM3D): invert ground motions together with other data (geology, geodesy) to improve the models

AWP = Anelastic Wave Propagation; NSR = Nonlinear Site Response; KFR = Kinematic Fault Rupture; DFR = Dynamic Fault Rupture

Page 14

Probabilistic Seismic Hazard Model

[Diagram: Sources 1–3 radiating to a receiver. M sources to N receivers requires M forward simulations, or 3N reciprocal simulations.]

•  A physics-based seismic hazard model requires more than a few earthquake simulations
   –  A standard "forward" simulation computing 3-component seismograms from M sources at N sites requires M simulations (M > 10^5, N < 10^3)
   –  A strain-Green-tensor-based "reciprocal" simulation computing 3-component seismograms for M sources at N sites requires only 3N simulations
   –  Use of reciprocity reduces CPU time by a factor of ~2,000

U_n(\mathbf{r},\mathbf{r}_s) = G_{nj,i}(\mathbf{r},\mathbf{r}_s)\,M_{ji}
U_n(\mathbf{r},\mathbf{r}_s) = H(\mathbf{r}_s,\mathbf{r})\,M_{ji}, \qquad H(\mathbf{r}_s,\mathbf{r}) = \tfrac{1}{2}\left[G_{jn,i}(\mathbf{r}_s,\mathbf{r}) + G_{in,j}(\mathbf{r}_s,\mathbf{r})\right]

P(IM_k) = \sum_n P(IM_k \mid S_n)\,P(S_n)
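The ~2,000x saving follows directly from the simulation counts; a toy check with round numbers consistent with the stated bounds (the specific M and N below are assumptions, not SCEC's exact values):

```python
# Forward vs. reciprocal simulation counts (illustrative M and N).
M = 600_000   # hypothetical number of rupture sources (M > 1e5)
N = 100       # hypothetical number of sites (N < 1e3)

forward = M          # one simulation per source
reciprocal = 3 * N   # three strain-Green-tensor simulations per site
print(forward / reciprocal)  # → 2000.0
```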

Probabilistic Seismic Hazard Analysis: Earthquake Rupture Forecast → Attenuation Relationship → Intensity Measures
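The hazard combination is just the law of total probability over sources: sum, over all sources, the probability of exceeding an intensity level given a rupture, weighted by the rupture probability. A toy two-source example (all numbers invented for illustration):

```python
# Toy PSHA combination: P(IM_k) = sum_n P(IM_k | S_n) * P(S_n).
p_source = {"S1": 0.01, "S2": 0.002}        # P(S_n): rupture probabilities
p_im_given_source = {"S1": 0.3, "S2": 0.9}  # P(IM_k | S_n): exceedance given rupture

p_im = sum(p_im_given_source[s] * p_source[s] for s in p_source)
print(p_im)  # ≈ 0.0048
```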

Page 15

CyberShake hazard map, PoE = 2% in 50 yrs

CyberShake seismogram

CyberShake Hazard Model
•  1,144 sites in LA region, f < 0.5 Hz, 2013
   –  Produced four alternative seismic hazard maps for southern California
   –  7.1 million CPU hours (28-day run using Blue Waters and Stampede)
   –  189 million jobs
   –  165 TB of total output data; 10.6 TB of stored data
•  5,000 sites, f < 1.0 Hz, 2014–2015
   –  Produce seismic hazard maps for the entire state of California
   –  723 million core-hours
   –  4.2 billion jobs
   –  56 PB of total output data; 3.0 PB of stored data

LA Region Map, v13.4 | State-wide Map

Page 16

CyberShake SGT Simulations on XK7 vs XE6

CyberShake 1.0 Hz                  XE6      XK7      XK7 (CPU-GPU co-scheduling)
Nodes                              400      400      400
SGT hours per site                 10.36    2.80     2.80
Post-processing hours per site**   0.94     1.88     2.00
Total hours per site               11.30    4.68     2.80
Total SUs (millions)*              723 M    299 M    179 M
SU savings (millions)              -        424 M    543 M

* Scaled to 5,000 sites based on two strain Green tensor runs per site; ** based on the CyberShake 13.4 map

3.7x speedup
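The 3.7x figure matches the ratio of SGT hours per site in the table (10.36 on the XE6 vs. 2.80 on the XK7); a one-line check:

```python
xe6_sgt_hours = 10.36   # SGT hours per site, XE6 (from the table above)
xk7_sgt_hours = 2.80    # SGT hours per site, XK7

print(round(xe6_sgt_hours / xk7_sgt_hours, 1))  # → 3.7
```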

Page 17

AWP-ODC
•  Started as personal research code (Olsen 1994)
•  3D velocity-stress wave equations solved by explicit staggered-grid 4th-order FD
•  Memory-variable formulation of inelastic relaxation using coarse-grained representation (Day 1998)
•  Dynamic rupture by the staggered-grid split-node (SGSN) method (Dalguer and Day 2007)
•  Absorbing boundary conditions by perfectly matched layers (PML) (Marcinkovich and Olsen 2003) and Cerjan et al. (1985)

\partial_t \nu = \frac{1}{\rho}\,\nabla\cdot\sigma

\partial_t \sigma = \lambda(\nabla\cdot\nu)\,I + \mu\left(\nabla\nu + \nabla\nu^{T}\right)

\tau_i\,\frac{d\varsigma_i(t)}{dt} + \varsigma_i(t) = \lambda_i\,\frac{\delta M}{M_u}\,\varepsilon(t)

\sigma(t) = M_u\left[\varepsilon(t) - \sum_{i=1}^{N}\varsigma_i(t)\right]

Q^{-1}(\omega) \approx \frac{\delta M}{M_u}\sum_{i=1}^{N}\frac{\lambda_i\,\omega\tau_i}{\omega^2\tau_i^2 + 1}

[Figure: staggered-grid unit cells showing the relaxation-time distribution across grid points 1–8 and the associated stress (σ11, σ22, σ33, σ12, σ23, σ13) and velocity components. Inelastic relaxation variables for memory-variable ODEs in AWP-ODC.]
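The memory-variable ODE above can be advanced with a simple explicit update. The sketch below uses an assumed forward-Euler discretization and invented coefficients; AWP-ODC's actual integrator and its coarse-grained assignment of the ς_i to grid points differ:

```python
def step_memory_variable(zeta, eps, tau, lam, dM_over_Mu, dt):
    """One Euler step of  tau_i * dzeta_i/dt + zeta_i = lam_i * (dM/Mu) * eps."""
    dzeta = (lam * dM_over_Mu * eps - zeta) / tau
    return zeta + dt * dzeta

def stress(eps, zetas, Mu):
    """sigma(t) = Mu * (eps(t) - sum_i zeta_i(t))."""
    return Mu * (eps - sum(zetas))

# One illustrative step for two relaxation mechanisms (made-up values):
taus, zetas = [0.02, 0.2], [0.0, 0.0]
zetas = [step_memory_variable(z, eps=1e-6, tau=t, lam=0.5,
                              dM_over_Mu=0.1, dt=1e-3)
         for z, t in zip(zetas, taus)]
print(stress(1e-6, zetas, Mu=3e10))  # slightly below the elastic Mu*eps = 3.0e4
```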

Page 18

Two-layer 3D domain decomposition on CPU-GPU:
•  X & Y decomposition for CPUs
•  Y & Z decomposition for GPU SMs

GPU Code: Decomposition on CPU and GPU
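A hypothetical sketch of the rank-to-tile mapping such a two-layer decomposition implies; the function names and grid sizes are illustrative, not from the AWP-ODC source:

```python
# MPI ranks tile the X-Y plane (one sub-domain per GPU node); each
# sub-domain is then further split over Y-Z for the GPU's streaming
# multiprocessors.  Helpers below cover the first (X-Y) layer.
def rank_to_xy(rank, px):
    """Map an MPI rank to its (x, y) tile in a px-wide process grid."""
    return rank % px, rank // px

def local_extent(n, parts, idx):
    """Start index and size of partition idx when n points split into parts."""
    base, rem = divmod(n, parts)
    start = idx * base + min(idx, rem)
    size = base + (1 if idx < rem else 0)
    return start, size

# Example: rank 6 in a 4-wide grid, and 1000 X-points over 4 tiles.
print(rank_to_xy(6, px=4))        # → (2, 1)
print(local_extent(1000, 4, 0))   # → (0, 250)
```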

Page 19

Single-GPU Optimizations

Global memory optimization:
•  global memory coalescing
•  texture memory for 3D constant variables
•  constant memory for scalar constants

Use L1/L2 cache rather than shared memory

Page 20

Velocity as input to compute stress
Velocity communication

Communication Reduction
•  Extend the ghost cell region with two extra layers and compute, rather than communicate, the ghost cell region updates before the stress computation.
•  The 2D XY plane represents the 3D sub-domain, as no communication in the Z direction is required due to the 2D decomposition for GPUs.

[Figure panels: velocity before computation; velocity after computation; velocity after communication; stress after computation]

Stress as input to compute the next time step's velocity: \partial_t \nu = \frac{1}{\rho}\nabla\cdot\sigma

Page 21

GPU-GPU Communication

                                 Velocity                       Stress
                                 Frequency  Message size        Frequency  Message size
Before communication reduction   4          6*(nx+ny)*NZ        4          12*(nx+ny)*NZ
After communication reduction    4          12*(nx+ny+4)*NZ     (no communication)
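A rough check, using the message-size formulas from the table, that the reduction trades a larger velocity message for eliminating the stress exchange entirely; the sub-domain size is an assumed round number:

```python
# Bytes-per-step proxies from the table's message-size expressions.
def before(nx, ny, nz):
    velocity = 6 * (nx + ny) * nz
    stress = 12 * (nx + ny) * nz
    return velocity + stress          # both fields exchanged per step

def after(nx, ny, nz):
    return 12 * (nx + ny + 4) * nz    # enlarged velocity message only

nx = ny = nz = 512                    # illustrative sub-domain size
print(before(nx, ny, nz), after(nx, ny, nz))
print(after(nx, ny, nz) / before(nx, ny, nz))  # ~0.67 of the original volume
```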

Page 22

Computing and Communication Overlapping

Page 23

Computing and Communication Overlapping

XK7 nodes   Elements (1000s)   Wall clock time   Parallel efficiency
8192        429,496,729        0.1085            100%
16384       858,993,459        0.1159            93.2%
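Weak-scaling efficiency follows from the wall-clock times in the table: with fixed work per node, efficiency is the base time divided by the scaled time. The small difference from the slide's 93.2% presumably reflects rounding of the published timings:

```python
# Weak-scaling efficiency = T(base nodes) / T(scaled nodes).
t_8192, t_16384 = 0.1085, 0.1159   # wall-clock times from the table
eff = t_8192 / t_16384
print(f"{eff:.1%}")                # ~93.6%, consistent with the reported 93.2%
```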

Page 24

•  Parallel I/O
   •  Read and redistribute multi-terabyte inputs
      •  Contiguous block reads by a reduced number of readers
      •  High-bandwidth asynchronous point-to-point communication redistribution

Two-phase I/O Model

[Diagram: cores reading from a shared file]

Page 25

•  Parallel I/O
   •  Read and redistribute multi-terabyte inputs
      •  Contiguous block reads by a reduced number of readers
      •  High-bandwidth asynchronous point-to-point communication redistribution
   •  Aggregate and write

Two-phase I/O Model

[Diagram: aggregators writing to OSTs]

Page 26

[Diagram: a temporal aggregator buffers time step 1 … time step N, then issues stripe-size-aligned MPI-IO writes to the OSTs]

•  Parallel I/O
   •  Read and redistribute multi-terabyte inputs
      •  Contiguous block reads by a reduced number of readers
      •  High-bandwidth asynchronous point-to-point communication redistribution
   •  Aggregate and write
      •  Temporal aggregation buffers
      •  Contiguous writes
      •  Throughput

Two-phase I/O Model
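The temporal-aggregation idea can be sketched in a few lines: instead of issuing one small write per time step, buffer several steps and flush them as a single contiguous write. Class and parameter names below are illustrative, not from the AWP-ODC I/O layer:

```python
import io

class TemporalAggregator:
    """Buffer per-step output and flush it in large contiguous writes."""
    def __init__(self, f, steps_per_flush):
        self.f = f
        self.steps_per_flush = steps_per_flush
        self.buffer = bytearray()
        self.pending = 0

    def append(self, step_bytes):
        self.buffer += step_bytes
        self.pending += 1
        if self.pending == self.steps_per_flush:
            self.flush()

    def flush(self):
        if self.buffer:
            self.f.write(bytes(self.buffer))  # one large contiguous write
            self.buffer.clear()
            self.pending = 0

out = io.BytesIO()
agg = TemporalAggregator(out, steps_per_flush=4)
for step in range(8):
    agg.append(b"\x00" * 1024)   # pretend 1 KiB of output per time step
agg.flush()
print(len(out.getvalue()))       # → 8192
```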

Page 27

•  ADIOS checkpointing
   •  Effective I/O by separating metadata, through an API library

Two-phase I/O Model

[Diagram: ADIOS temporal aggregator with external metadata (XML file) buffering time step 1 … time step N; spatial aggregators write File 1, File 2, File 3 to OST1, OST2, OST3]

Joint ADIOS work with S. Klasky, N. Podhorszki and Q. Liu of ORNL

•  Parallel I/O
   •  Read and redistribute multi-terabyte inputs
      •  Contiguous block reads by a reduced number of readers
      •  High-bandwidth asynchronous point-to-point communication redistribution
   •  Aggregate and write
      •  Temporal aggregation buffers
      •  Contiguous writes
      •  Throughput

Page 28

AWP-ODC Weak Scaling

[Chart: sustained TFLOPS (0.1 to 1000, log scale) vs. number of nodes (2 to 20,000) for ideal scaling, AWPg/XK7, AWPg/HP SL250, and AWPc/XE6, annotated with 94% and 100% parallel efficiency.]

Page 29

CPUs/GPUs Co-scheduling

aprun -n 50 <GPU executable> <arguments> &
get the PID of the GPU job
cybershake_coscheduling.py:
    build all the cybershake input files
    divide up the nodes and work among a customizable number of jobs
    for each job:
        fork extract_sgt.py cores --> performs pre-processing and launches
            "aprun -n <cores per job> -N 15 -r 1 <cpu executable A> &"
        get PID of the CPU job
    while executable A jobs are running:
        check PIDs to see if job has completed
        if completed: launch
            "aprun -n <cores per job> -N 15 -r 1 <cpu executable B> &"
    while executable B jobs are running:
        check for completion
    check for GPU job completion

–  CPUs run reciprocity-based seismogram and intensity computations
–  Run multiple MPI jobs on compute nodes using Node Managers (MOM)

Page 30

Post-processing on CPUs: API for Pthreads

•  AWP-API lets individual pthreads make use of CPUs for post-processing:
   –  Vmag, SGT, seismograms
   –  Statistics (real-time performance measuring)
   –  Adaptive/interactive control tools
   –  Visualization
   –  Output writing is introduced as a pthread that uses the API

[Flowchart: the main thread initializes the simulation and modules and starts computation on the GPU; at specified time steps it copies velocity data and signals modules running on other CPUs on the XK7, which calculate SGTs and write out via MPI-IO when it is time to do so; the loop continues while more time steps remain, then finalizes.]

Page 31

Accelerating CyberShake Calculations on GPUs (USC)

[Chart: hazard curve of probability rate (1/yr, 10^-5 to 10^-1) vs. 3 s SA (g).]

Page 32

CyberShake as a Platform for Operational Earthquake Forecasting

Page 33

Operational Forecast - Harvard Curves

Page 34

Operational Forecast - NSHMP

Page 35

Operational Forecast - After 2009 Bombay Beach

Page 36

Operational Forecast - After 2004 Parkfield

Page 37

10-Hz Visualization

Page 38

Acknowledgements

Collaborators
Carl Ponder, Cyril Zeller, Stanley Posey and Roy Kim (NVIDIA); Jeffrey Vetter, Mitch Horton, Graham Lopez, Richard Glassbrook (NICS/ORNL); Matthew Norman and Jack Wells (ORNL); Bruce Loftis (NICS); DK Panda, Sreeram Potluri and DK's team (OSU); Gregory Bauer, Jay Alameda, Omar Padron (NCSA); Robert Fiedler (Cray); Scott Baden and Didem Unat (UCSD); Liwen Shih (UH)

Computing Resources
NCSA Blue Waters, OLCF Titan, XSEDE Keeneland, NVIDIA GPU donation to HPGeoC/SDSC

NSF Grants
NCSA NEIS-P2/PRAC OCI-0832698, XSEDE ECCS, PRAC, SI2-SSI (OCI-1148493), Geoinformatics (EAR-1226343), NSF/USGS SCEC4 Core (EAR-0529922 and 07HQAG0008)