Nvidia Cuda Apps Jun27 11
Uploaded by dominic-monkhouse
Accelerating High Performance Applications
Strategic Focus on Applications
Senior-level relationship and market managers
Dedicated technical resources
More than 150 people devoted to libraries, tools, application porting and market development
Worldwide focus
Reaching a Broad Range of Markets
Segments: Scientific computing, Creative pro, Education / research
Strategic Partners by market:
CAD/CAM/CAID: Autodesk; Dassault Systemes (CATIA, Solidworks); PTC; Siemens
CAE/EDA: Ansys; Dassault Systemes (Simulia); Nastran; LSTC; Synopsys
Computational chemistry: Amber; NAMD; GROMACS; LAMMPS; GAMESS
Computational finance: MATLAB; Mathematica; NAG; Murex
Defence & intelligence: Ikena; Intergraph; ESRI; Manifold
Digital content creation: Adobe; Autodesk M&E; Avid; MainConcept; Sony
Physical sciences: Quda (L-QCD); WRF; ACUSA; HOMME; HYCOM
Seismic processing and visualization: Schlumberger; Landmark; Paradigm
Leading MD Applications
Application | Features Supported | GPU Perf | Release Status | Notes
AMBER | PMEMD: Explicit & Implicit Solvent | 8X | V11 Released | Single and multi-GPUs. Expect 2x more performance in V11 patch release (shortly)
GROMACS | Implicit (5x), Explicit (2x) Solvent | 2x-5x | Single GPU released, Version 4.5.4 | Next release: 2H2011. Better Explicit, MPI
LAMMPS | Lennard-Jones, Gay-Berne | 6x | Released | Single and multi-GPU.
NAMD | Non-bond force calculation | 2x-7x | Released, v2.8 | Single and multi-GPU.
GPU Perf is compared against a multi-core x86 CPU socket.
GPU Perf is benchmarked on GPU-supported features only and may be a kernel-to-kernel perf comparison, not a whole-application comparison.
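The caveat above matters when reading the speedup columns: a kernel-to-kernel figure only translates into whole-application speedup in proportion to how much of the runtime that kernel occupies. The following is a minimal sketch of that relationship using Amdahl's law. The function name and the input values (an 8x kernel speedup, and 50%, 90%, and 99% of runtime spent in the accelerated part) are illustrative assumptions, not measurements from these slides.

```python
def overall_speedup(accelerated_fraction: float, kernel_speedup: float) -> float:
    """Amdahl's law: whole-application speedup when only part of the runtime is accelerated.

    accelerated_fraction: share of the original runtime spent in the GPU-ported part (0..1).
    kernel_speedup: speedup of that part alone (the "kernel to kernel" number).
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    unaccelerated_part = 1.0 - accelerated_fraction
    return 1.0 / (unaccelerated_part + accelerated_fraction / kernel_speedup)


if __name__ == "__main__":
    # Hypothetical inputs chosen for illustration only.
    for fraction in (0.5, 0.9, 0.99):
        total = overall_speedup(fraction, 8.0)
        print(f"{fraction:.0%} of runtime in kernel, 8x kernel -> {total:.2f}x overall")
```

Under these assumptions, an 8x kernel speedup yields less than 2x overall when the kernel accounts for half of the runtime, which is why a "Total" figure and a "Solver" or kernel figure in the tables are not directly comparable.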
Additional MD/MM Applications Ramping
Application | Features Supported | GPU Perf | Release Status | Notes
Abalone | TBD, “Simulations” | 4-29X (on 1060 GPU) | Released | Single GPU. Agile Molecule, Inc.
ACEMD | Written for use on GPUs | “µ-sec long trajectories on workstation” | Released | Production bio-molecular dynamics (MD) software specially optimized to run on single and multi-GPUs
DL_POLY | Two-body Forces, Link-cell Pairs, Ewald SPME forces, Shake VV | 4x | V 4.0 Source only. Results Published | Next release: 2H2011. Multi-GPU, multi-node supported
HOOMD-Blue | Written for use on GPUs | 2X (32 CPU cores vs. 2 10XX GPUs) | Released, Version 0.9.2 | Single and multi-GPU.
Viz and “Docking” Applications
Related Applications | Features Supported | GPU Perf | Release Status | Notes
Amira 5® | 3D visualization of volumetric data and surfaces | N/A | Released, Version 5.3.3 | Visualization from Visage Imaging. Next release, 5.4, will use GPU for general purpose processing in some functions
Core Hopping | GPU accelerated application | Up to 5000X | Released, Suite 2011 | Single and multi-GPUs. Schrodinger, Inc.
FastROCS | Real-time shape similarity searching/comparison | 800-3000X | Released | Single and multi-GPUs. OpenEye Scientific Software
VMD | High quality rendering, large structures (100 million atoms), GPU acceleration for computationally demanding analysis and visualization tasks, multiple GPU support for very fast display of molecular orbitals arising in quantum chemistry calculations | 100-125X or greater | Released, Version 1.9 | Visualization from University of Illinois at Urbana-Champaign
Quantum Chemistry
Application | Features Supported | GPU Perf | Release Status | Notes
GAMESS-US | Libqc with Rys Quadrature Algorithm, integral evaluation, closed shell Fock matrix construction | 2.5X | Released | Single GPU supported in 10/1/10 release. Multi-GPU supported in July 2011 release.
NWChem | Triples part of Reg-CCSD(T), CCSD & EOMCCSD task schedulers | 3-8X projected | Date TBA, in development | Development GPGPU benchmarks: www.nwchem-sw.org
Q-CHEM | Various features including RI-MP2 | 8-14x projected | Date TBA, in development | Significant porting already
TeraChem | “Full GPU-based solution” | 44-650X vs. GAMESS CPU ver. | Version 1.45 released | Single and Multi-GPU. Completely redesigned to exploit massive GPU parallelism
Material Science
Application | Features Supported | GPU Perf | Release Status | Notes
Abinit | BigDFT - 50% of the program (short convolutions) | 6-30X | Released June 2009 | http://inac.cea.fr/L_Sim/BigDFT/news.html
Quantum-Espresso/PWscf | PWscf package: linear algebra (matrix multiply), explicit computational kernels, 3D FFTs | TBD | Released May 5, 2011 | Created by Irish Centre for High-End Computing
Bioinformatics
CUDA-BLASTP
CUDA-EC
CUDA-MEME
CUDASW++ (Smith-Waterman)
DNADist
GPU Blast
GPU-HMMER
HEX Protein Docking
Jacket (MATLAB Plugin)
MUMmerGPU
MUMmerGPU++
SARUMAN
SeqNFind
UGENE
Additional details can be found at Tesla Bio Workbench:
http://www.nvidia.com/object/tesla_bio_workbench.html
Structural Mechanics
Application | GPU Features | GPU Perf | Release Status | Notes
ANSYS Mechanical | Linear eqn solvers | 2x Total | Today, release 13 SP2 | FE implicit, single-GPU
Abaqus/Standard | Linear eqn solver | 2x Total | Today, release 6.11 | FE implicit, single-GPU
IMPETUS Afea | Explicit solver, SPH | 10x SPH, 2x Total | Today, release 1.0 | FE explicit, multi-GPU
LS-DYNA implicit | Linear eqn solver | 3x Total | Planned for 2011 | FE implicit, multi-GPU
MD Nastran | Linear eqn solvers | 2x Solver | Planned for 2011 | FE implicit, multi-GPU
Marc | Linear eqn solver | 1.5x Total | Planned for 2011 | FE implicit, single-GPU
RADIOSS Implicit | Linear eqn solver | 1.5x Total | Demonstration | FE implicit, single-GPU
PAM-CRASH implicit | Linear eqn solver | 1.5x Total | Demonstration | FE implicit, single-GPU
NX Nastran | Linear eqn solver | 1.4x Total | Demonstration | FE implicit, single-GPU
Fluid Dynamics
Application | GPU Features | GPU Perf | Release Status | Notes
Altair AcuSolve | Linear eqn solver | 2x Total | Today, release 1.8 | FE unstructured NS, multi-GPU
Autodesk Moldflow | Linear eqn solver | 2x Total | Today, release 2011 | FE unstructured NS, single-GPU
FluiDyna LBultra | LBM, particle CFD | 20x Total | Today, release 1.0 | Structured LBM, multi-GPU
FluiDyna Culises-OpenFOAM Solver | Linear eqn solvers | 3x Solver | Today, release 1.0 | Unstructured NS, single-GPU
Vratis SpeedIT-OpenFOAM Solver | Linear eqn solvers | 3x Solver | Today, release 1.2 | Unstructured NS, multi-GPU
Prometech Particleworks | MPS, particle CFD | 4x-9x Total | Q3CY11, release 2.5 | Particle based, multi-GPU
Sandia NL S3D | Chemistry kernel | 8x SP, 5x DP kernel | Demonstration | Structured grid DNS, multi-GPU
Turbostream | Explicit solver | 19x Total | Today, release 2.0 | Structured grid NS, multi-GPU
SD++ (Jameson) | Explicit solver | 16x Total | Planned for 2011 | FE unstructured NS, multi-GPU
FEFLO (Lohner) | Explicit solver | 2x Total | Planned for 2011 | FE unstructured NS, multi-GPU
Electromagnetics
Application | Features Supported | GPU Perf | Release Status | Notes
Agilent EMPro | FDTD | 6X | 2011.07 Released | Single & multi-GPU; EMPro 2011 PR
CST Microwave Studio | Transient (FIT) solver; Combined MPI & GPU computing | 9X on 1 GPU to 20X+ on 4 GPUs | 2011 Released | Single & multi-GPU; www.cst.com/perf
Remcom XFdtd | FDTD | 30-300X | XF7 Released | Single and multi-GPU; XStream GPU acceleration
SPEAG SEMCAD X | FDTD; Acceleware | 100X | 14.4.3 Released | Single and multi-GPU; www.speag.com/perf
GPU Performance compared against a quad-core x86 CPU socket, except Remcom XFdtd, where GPU performance is compared against a single CPU core.
Climate/ Weather/ Ocean
Application | GPU Features | GPU Perf | Production Status | Notes
WRF | WSM5, WSM3, Ice Microphysics models | 4x-6x Models | Today, release 3.2 | single-GPU
ASUCA | Most routines | 12x Total | In production at JMA | multi-GPU
NIM | Most routines | 7x Dynamics | Limited production | multi-GPU
HIRLAM | Dynamical core | 3x Solver | Planned for 2011 | multi-GPU
HOMME | Models | 3x Models | Planned for 2011 | single-GPU
CAM | Linear eqn solver | 2x Solver | Planned for 2011 | single-GPU
GEOS-5 | Most routines | 10x Models, 3x Dynamics | Demonstration | multi-GPU
MITgcm | Linear eqn solver | 3x Solver | Demonstration | single-GPU
HYCOM | Linear eqn solver | 2x Solver | Demonstration | single-GPU