Accurate Numerical Simulations of Chemical Phenomena Involved in Energy Production and Storage Using MADNESS

Álvaro Vázquez-Mayagoitia (ALCF), Jeff Hammond (ALCF), Nick Romero (ALCF), Lee Killough ([ALCF]), Kevin Stock (ALCF, OSU), Robert Harrison (ORNL->SBU), Scott Thornton (UTK), George Fann (ORNL), Ed Valeev (VT), Justus Calvin (VT), and the rest of the MADNESS team.


MADNESS: Multiresolution Adaptive Numerical Environment for Scientific Simulation

✓ A general purpose numerical framework for reliable and fast scientific simulations: chemistry, nuclear physics, atomic physics, material science, climate, fusion, …

MADNESS IS NOT A QUANTUM CHEMISTRY CODE!!!

✓ A general purpose parallel programming environment designed for the petascale and beyond
– Standard C++ with concepts from Cilk, Charm++ and Chapel
– Compatible by design with existing applications
– Runs on Titan/OLCF, Mira/ALCF, …, Macbook, …, Amazon EC2


World: parallel runtime used by MADNESS but also TiledArray (VT group) – implements thread pool and queue, atomics, active messages, futures, distributed containers.
Tensor: core numerical functionality for multidimensional arrays and contractions – mTxm is the low-level kernel.
MRA: multiresolution analysis – higher-level mathematical operations.
Apps: where the PDEs live – this project focuses on moldft, which is the molecular density functional code. The moldft source is ~3000 lines…
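To make the role of the mTxm kernel concrete, here is a minimal, unoptimized sketch of a "matrix transpose times matrix" accumulation, C(i,j) += sum_k A(k,i) B(k,j). The function name `mtxm_reference`, the use of `std::vector` and the row-major layout are assumptions made for illustration; the tuned kernels in MADNESS are organized differently.

```cpp
#include <cstddef>
#include <vector>

// Reference (unoptimized) transpose-times-matrix accumulation:
//   C(i,j) += sum_k A(k,i) * B(k,j)
// A is dimk x dimi, B is dimk x dimj, C is dimi x dimj, all row-major.
// The name and signature are hypothetical, chosen for this sketch only.
void mtxm_reference(std::size_t dimi, std::size_t dimj, std::size_t dimk,
                    std::vector<double>& c,
                    const std::vector<double>& a,
                    const std::vector<double>& b) {
    for (std::size_t k = 0; k < dimk; ++k) {
        for (std::size_t i = 0; i < dimi; ++i) {
            const double aki = a[k * dimi + i];
            // Innermost loop walks contiguous rows of B and C.
            for (std::size_t j = 0; j < dimj; ++j) {
                c[i * dimj + j] += aki * b[k * dimj + j];
            }
        }
    }
}
```

With this loop order the innermost loop touches contiguous memory in both B and C, which is the property that vectorized kernels typically exploit.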


Quantum Mechanics

$$H\phi(r) = \left(-\tfrac{1}{2}\nabla^2 + V\right)\phi(r) = \varepsilon\,\phi(r)$$

We can write the solution for an electronic wavefunction in integral form.

Integral equation form:

$$\phi(r) = -2\left(-\nabla^2 - 2\varepsilon\right)^{-1} V\phi(r)$$

– Higher accuracy achievable in integral form
– Correct asymptotics by design
– Computationally efficient: LSR and local refinement
– MRA provides fast algorithms with guaranteed precision
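The step from the differential form to the integral form can be written out as follows. This is a sketch assuming atomic units and a bound state (ε < 0); the symbol k and the explicit Green's function kernel are standard results added here and do not appear on the slide.

```latex
\begin{align}
\left(-\tfrac{1}{2}\nabla^2 + V\right)\phi(r) &= \varepsilon\,\phi(r) \\
\left(-\nabla^2 - 2\varepsilon\right)\phi(r) &= -2\,V(r)\,\phi(r) \\
\phi(r) &= -2\left(-\nabla^2 - 2\varepsilon\right)^{-1} V\phi(r) \\
        &= -2\int dr'\,\frac{e^{-k|r-r'|}}{4\pi|r-r'|}\,V(r')\,\phi(r'),
        \qquad k = \sqrt{-2\varepsilon}
\end{align}
```

The exponential decay of the kernel is what builds the correct long-range behavior of the orbital into the integral form.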


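Matrix representation leads to dense eigenvalue problem

$$\Psi(r_1, r_2, \ldots) = \frac{1}{\sqrt{N!}}\,\det\left|\phi_1(r_1)\,\phi_2(r_2)\ldots\right|$$

$$H\phi(r) = \left(-\tfrac{1}{2}\nabla^2 + V\right)\phi(r) = \varepsilon\,\phi(r)$$

$$H_{ij} = \int dr\,\phi_i^*(r)\,H\,\phi_j(r)$$

$$S_{ij} = \int dr\,\phi_i^*(r)\,\phi_j(r)$$

$$(H - \varepsilon S)\,c = 0$$

Size of the problem: {ϕ} by {ϕ}, where c is the coefficient vector in the basis {ϕ}.

MADNESS solves a very small basis set problem for the initial guess, so we still need a dense eigensolver, but not to the extent that MPQC or other traditional codes do. On the other hand, getting all the orbitals is critical for QMBT methods, as in MPQC, NWChem, AQ/CTF.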
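As a small worked instance of the generalized eigenvalue problem built from H and S, the sketch below solves the 2 by 2 case in closed form from det(H − εS) = 0. The type `Sym2` and the function name are hypothetical helpers for this illustration; production codes call a dense eigensolver instead.

```cpp
#include <array>
#include <cmath>
#include <stdexcept>

// Symmetric 2x2 matrix stored as {m11, m12, m22}. Hypothetical helper type.
using Sym2 = std::array<double, 3>;

// Eigenvalues of H c = eps S c for symmetric 2x2 H and positive-definite S,
// obtained from the quadratic det(H - eps S) = 0, in ascending order.
std::array<double, 2> generalized_eigenvalues_2x2(const Sym2& h, const Sym2& s) {
    const double a = s[0] * s[2] - s[1] * s[1];  // det(S)
    if (s[0] <= 0.0 || a <= 0.0) {
        throw std::invalid_argument("S must be positive definite");
    }
    const double b = -(h[0] * s[2] + h[2] * s[0] - 2.0 * h[1] * s[1]);
    const double c = h[0] * h[2] - h[1] * h[1];  // det(H)
    const double disc = b * b - 4.0 * a * c;
    // The discriminant is non-negative in exact arithmetic; clamp rounding noise.
    const double root = std::sqrt(disc > 0.0 ? disc : 0.0);
    return {(-b - root) / (2.0 * a), (-b + root) / (2.0 * a)};
}
```

For two equivalent, overlapping basis functions the two roots correspond to the symmetric and antisymmetric combinations, (H11 + H12)/(S11 + S12) and (H11 − H12)/(S11 − S12).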


MADNESS: Summary of developments for ESP

• 2010: Initial port to Blue Gene/P.
• 2010 – present: Raf Schietekat ports Intel TBB to PowerPC, then Blue Gene/P, then Blue Gene/Q.
• 2011 – present: Alvaro develops molecular DFT code.
• 2012 – 2013: Alvaro modernizes dense linear algebra with Eigen3 and Elemental.
• Summer 2012: Kevin Stock developed highly optimized mTxm kernel for Blue Gene/Q. QPX vector intrinsic code is auto-generated just as for SSE, AVX, MIC and BGP (Summer 2010 at ALCF).
• 2012 – present: George Fann scaling work for the nuclear physics code on Mira.


Improving numerical library usage in MADNESS

NAME     Purpose
GESVD    Singular value decomposition (SVD)
GESV     Solution to Ax = B
GELSS    Minimum norm solution to |B - Ax|
SYGV     Eigenvectors and eigenvalues of Ax = λBx
SYEV     Compute all eigenvalues of a symmetric matrix A
POTRF    Cholesky factorization
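To illustrate what one of these routines computes, here is a minimal textbook sketch of the Cholesky factorization A = L Lᵀ, the operation that POTRF provides. The function name `cholesky_lower` and the nested-vector storage are assumptions for this example; it is not the LAPACK implementation and performs no blocking.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Returns the lower-triangular factor L with A = L * L^T for a symmetric
// positive-definite matrix A. Only the lower triangle of A is read.
Matrix cholesky_lower(const Matrix& a) {
    const std::size_t n = a.size();
    Matrix l(n, std::vector<double>(n, 0.0));
    for (std::size_t j = 0; j < n; ++j) {
        // Diagonal entry: subtract squares of the row already computed.
        double diag = a[j][j];
        for (std::size_t k = 0; k < j; ++k) diag -= l[j][k] * l[j][k];
        if (diag <= 0.0) throw std::runtime_error("matrix is not positive definite");
        l[j][j] = std::sqrt(diag);
        // Entries below the diagonal in column j.
        for (std::size_t i = j + 1; i < n; ++i) {
            double sum = a[i][j];
            for (std::size_t k = 0; k < j; ++k) sum -= l[i][k] * l[j][k];
            l[i][j] = sum / l[j][j];
        }
    }
    return l;
}
```

Applied to the overlap matrix, a factor of this kind is one common way to reduce Ax = λBx to a standard symmetric eigenproblem.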


Eigen3 - Serial Dense Linear Algebra

• Pure C++ template library – headers only, no binaries.
• Included in all Linux distributions.
• Open source license: LGPL3+
• Users: Google, Qt-KDE, Shogun, Avogadro, FEM codes, …
• Compilers: GCC, MSVC, Intel ICC, Clang/LLVM
• Processors: x86/x86_64, ARM, PowerPC
• Better for small operations (compile-time dimensioning in some cases).
• Works around LAPACK thread-safety problems.

#include <Eigen/Dense>
using namespace Eigen;

typedef Matrix<double, Dynamic, Dynamic> MatrixX;
MatrixX A, B, X;
// initialize A and B here
// solve A*X = B using LU decomposition
X = A.lu().solve(B);
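For readers who want to see what an LU-based solve does underneath, the following is a library-free sketch of Gaussian elimination with partial pivoting followed by back substitution. The name `lu_solve` and the nested-vector types are assumptions for this illustration; it handles a single right-hand side and is not how Eigen implements its solver.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;
using Vector = std::vector<double>;

// Solves A x = b by Gaussian elimination with partial (row) pivoting.
// A and b are taken by value because they are overwritten during elimination.
Vector lu_solve(Matrix a, Vector b) {
    const std::size_t n = a.size();
    for (std::size_t col = 0; col < n; ++col) {
        // Choose the row with the largest pivot magnitude in this column.
        std::size_t pivot = col;
        for (std::size_t r = col + 1; r < n; ++r) {
            if (std::fabs(a[r][col]) > std::fabs(a[pivot][col])) pivot = r;
        }
        if (a[pivot][col] == 0.0) throw std::runtime_error("matrix is singular");
        std::swap(a[col], a[pivot]);
        std::swap(b[col], b[pivot]);
        // Eliminate the entries below the pivot.
        for (std::size_t r = col + 1; r < n; ++r) {
            const double factor = a[r][col] / a[col][col];
            for (std::size_t c = col; c < n; ++c) a[r][c] -= factor * a[col][c];
            b[r] -= factor * b[col];
        }
    }
    // Back substitution on the resulting upper-triangular system.
    Vector x(n, 0.0);
    for (std::size_t i = n; i-- > 0;) {
        double sum = b[i];
        for (std::size_t c = i + 1; c < n; ++c) sum -= a[i][c] * x[c];
        x[i] = sum / a[i][i];
    }
    return x;
}
```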


Eigen3 - Serial Dense Linear Algebra

[Figure: Singular Value Decomposition (SVD), M = UΣV*. CPU time in milliseconds (axis 0 to 300) versus matrix M size (n by n, axis 0 to 20) for LAPACK and EIGEN. Serial execution on 1 core of BG/Q.]


Eigen3 - Serial Dense Linear Algebra

[Figure: Singular Value Decomposition (SVD), M = UΣV*. CPU time in seconds (axis 0 to 160) versus matrix M size (n by n, axis 30 to 190) for LAPACK and EIGEN. Serial execution on 1 core of BG/Q.]


Elemental - Parallel Dense Linear Algebra

• Developed by Jack Poulson (Georgia Tech) with collaborators at the University of Texas, RWTH, Argonne, …
• Modern C++ design with opaque data containers – very easy to use interface compared to ScaLAPACK.
• Designed to make heavy use of MPI collectives rather than BLACS (based upon p2p); topology-aware on Blue Gene.
• Thread-safe and multithreaded using OpenMP in matrix operations and Pthreads in RWTH's MRRR tridiagonal solver.
• Open-source license: new BSD.
• Used by MADNESS, Qbox, MPQC, PETSc, PSP, Clique, etc.
• NSF-funded collaboration between Elemental and chemists.


Elemental - Parallel Dense Linear Algebra

The data is distributed over the grid of processes. The set of processes is mapped onto a 2D grid.

e.g.
DistMatrix 18 by 18
Grid 2 by 3
6 MPI processes
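The 18 by 18 matrix on a 2 by 3 grid can be worked through with a short sketch. It assumes an element-wise cyclic distribution, where entry (i, j) is owned by grid row i mod 2 and grid column j mod 3, and a column-major numbering of the grid; both are assumptions of this illustration, and the struct and function names are hypothetical rather than Elemental's API.

```cpp
#include <cstddef>

// Process grid with r rows and c columns. Hypothetical helper type.
struct Grid {
    std::size_t rows;  // r
    std::size_t cols;  // c
};

// Grid coordinates that own global entry (i, j) under an element-wise
// cyclic distribution.
std::size_t owner_row(const Grid& g, std::size_t i) { return i % g.rows; }
std::size_t owner_col(const Grid& g, std::size_t j) { return j % g.cols; }

// Linear rank of the owner, assuming column-major ordering of the grid.
std::size_t owner_rank(const Grid& g, std::size_t i, std::size_t j) {
    return owner_row(g, i) + owner_col(g, j) * g.rows;
}

// Number of global indices in [0, n) that land on grid coordinate `coord`
// when indices are dealt out cyclically over `stride` coordinates.
std::size_t local_length(std::size_t n, std::size_t coord, std::size_t stride) {
    return coord < n ? (n - coord + stride - 1) / stride : 0;
}
```

Under these assumptions every one of the 6 processes holds a 9 by 6 local block, so the 324 entries are spread evenly.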


Elemental - Parallel Dense Linear Algebra

Example: Matrix Matrix Multiplication

AllGather operations are performed by rows, then by columns of the grid.


Elemental - Parallel Dense Linear Algebra

[Figure: Eigensolve Ax = wBx. Speed up (performance relative to 8 processes, axis 0.0 to 1.2) versus N processes (8, 16, 32, 64, 128) for series labeled 3200, 3400, 3600, 5000, 6000, 8000 and 9000. 1 MPI process = 1 node.]


Elemental - Parallel Dense Linear Algebra

                         Nodes   Elemental         Eigen3             LAPACK
Water*70 (Full SCF)      16      1272.4 (6.7%)     1387.7 (-2.2%)     1357.3
Water*213 (maxiter=2)    256     2753.5 (15.2%)    3546.9 (-10.6%)    3171.4

Percentages give the difference relative to LAPACK.

~700 s, 1.5 s: eigen solution time per iteration

• Eigen3 performs similarly to LAPACK.
• The potential of Elemental is clear when we deal with big systems (>1K size matrices).
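The percentages reported alongside the timings are reproduced by (t_LAPACK − t) / t, rounded to one decimal place. The sketch below shows that arithmetic; the function names are hypothetical and the formula is inferred from the numbers rather than stated on the slide.

```cpp
#include <cmath>

// Percentage by which the reference time exceeds `time`, relative to `time`.
// A positive value means the library was faster than the reference.
double gain_over_reference_percent(double time, double reference_time) {
    return 100.0 * (reference_time - time) / time;
}

// Round to one decimal place, matching the precision used in the table.
double round1(double x) { return std::round(x * 10.0) / 10.0; }
```

For example, Elemental on Water*70 gives 100 × (1357.3 − 1272.4) / 1272.4, which rounds to 6.7.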


Summary: Dense Linear Algebra

• Interfacing Eigen3 or Elemental to a C++ code is not difficult.
• Eigen3 is an excellent competitor to LAPACK for small matrices on BG/Q. Eigen3 could be improved with specific tuning for PowerPC. (Future versions will be multi-threaded.)
• Elemental in combination with LAPACK has shown outstanding performance on Blue Gene/Q.
• Elemental is a modern replacement for ScaLAPACK. Quantum chemistry codes, particularly those that use hybrid programming models (i.e. MPI+Threads) or use modern programming languages like C++, will benefit.


Other contributions of this ESP

• Dozens of PMRs filed for XLC; motivated fundamental changes in the IBM XL OpenMP runtime; both MPQC and MADNESS found thread safety issues in MPI.
• Port of Intel TBB to Blue Gene (BlueTBB) is motivated by MADNESS but broadly applicable (e.g. Deal.II).
• MPQC has been ported, but little further work has been done yet because Curt Janssen moved to Google; MPQC was ported almost immediately on VEAS@IBM; the VT group is working on additional developments.
• Ongoing improvements to Elemental driven by Blue Gene and MADNESS usage.
• Aquarius (AQ) and Cyclops Tensor Framework (CTF) have been developed as part of this allocation since they are like-minded efforts that provide unique features (QMBT).


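Density Functional Theory

DFT is an exact theory, and the energy of the ground state is given precisely if the correct functional is known and minimized. Electron density is a fundamental variable.

$$E = F[\rho_0] + \int dr\,\rho_0(r)\,\upsilon_{ext}(r) + \frac{1}{2}\sum_{A \neq B}\frac{Z_A Z_B}{R_{AB}}$$

✓ DFT can compute energies of chemical systems

$$E_{ele} = T_0[\rho_0] + \frac{1}{2}\iint dr\,dr'\,\frac{\rho_0(r)\,\rho_0(r')}{|r - r'|} + E_{XC}[\rho_0] + \int dr\,\rho_0(r)\,\upsilon_{ext}(r)$$

$$\upsilon_{coul}[\rho] = \frac{\delta E_{Coul}[\rho]}{\delta\rho} = \int dr'\,\frac{\rho(r')}{|r - r'|}$$

Exchange Correlation Functional

$$\upsilon_{xc}[\rho] = \frac{\delta E_{XC}[\rho]}{\delta\rho}$$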
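The Coulomb potential follows from the Coulomb energy by a functional derivative. The short derivation below is a sketch that assumes the classical Hartree form of the Coulomb energy, including its factor of 1/2, which is the standard definition rather than something legible on the slide.

```latex
\begin{align}
E_{Coul}[\rho] &= \frac{1}{2}\iint dr\,dr'\,\frac{\rho(r)\,\rho(r')}{|r-r'|} \\
\delta E_{Coul} &= \frac{1}{2}\iint dr\,dr'\,
   \frac{\delta\rho(r)\,\rho(r') + \rho(r)\,\delta\rho(r')}{|r-r'|}
   = \int dr\,\delta\rho(r)\int dr'\,\frac{\rho(r')}{|r-r'|} \\
\upsilon_{coul}(r) &= \frac{\delta E_{Coul}[\rho]}{\delta\rho(r)}
   = \int dr'\,\frac{\rho(r')}{|r-r'|}
\end{align}
```

The two terms in the variation are equal by the symmetry of the kernel under exchange of r and r', which cancels the factor of 1/2.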