The Kinetic PreProcessor Accelerated - ParaTools, Inc. · Kppa: The Kinetic PreProcessor...

12
ParaTools, Inc. 1 Kppa The Kinetic PreProcessor Accelerated November 15, 2012

Transcript of The Kinetic PreProcessor Accelerated - ParaTools, Inc. · Kppa: The Kinetic PreProcessor...

Page 1: The Kinetic PreProcessor Accelerated - ParaTools, Inc. · Kppa: The Kinetic PreProcessor Accelerated Code generation Analysis Lexical parser MATLAB Kppa Fortran C Python Code optimized

ParaTools, Inc. 1

Kppa The Kinetic PreProcessor Accelerated

November 15, 2012

Page 2: The Kinetic PreProcessor Accelerated - ParaTools, Inc. · Kppa: The Kinetic PreProcessor Accelerated Code generation Analysis Lexical parser MATLAB Kppa Fortran C Python Code optimized

ParaTools, Inc. 2

Chemical Kinetic Simulation

• Pyrolysis and Combustion

• Air and Water Quality

• Climate Change

• Wildfires, Volcanic Eruptions

• Plastics Devolatilzation

• Microorganism Growth

• Cell Biology

Page 3: The Kinetic PreProcessor Accelerated - ParaTools, Inc. · Kppa: The Kinetic PreProcessor Accelerated Code generation Analysis Lexical parser MATLAB Kppa Fortran C Python Code optimized

ParaTools, Inc. 3

Solving Coupled Stiff ODE Systems is Expensive

Twelve Threads Intel® Xeon® X5650

@ 2.67GHz

Page 4: The Kinetic PreProcessor Accelerated - ParaTools, Inc. · Kppa: The Kinetic PreProcessor Accelerated Code generation Analysis Lexical parser MATLAB Kppa Fortran C Python Code optimized

ParaTools, Inc. 4

Solving Coupled Stiff ODE Systems is Expensive

Twelve Threads Intel® Xeon® X5650

@ 2.67GHz

Page 5: The Kinetic PreProcessor Accelerated - ParaTools, Inc. · Kppa: The Kinetic PreProcessor Accelerated Code generation Analysis Lexical parser MATLAB Kppa Fortran C Python Code optimized

ParaTools, Inc. 5

Solving Coupled Stiff ODE Systems is Expensive

Twelve Threads Intel® Xeon® X5650

@ 2.67GHz

Page 6: The Kinetic PreProcessor Accelerated - ParaTools, Inc. · Kppa: The Kinetic PreProcessor Accelerated Code generation Analysis Lexical parser MATLAB Kppa Fortran C Python Code optimized

ParaTools, Inc. 6

Kppa: The Kinetic PreProcessor Accelerated

Code generation Analysis Lexical

parser

MATLAB

Kppa

Fortran

C

Python

Code optimized

High-Level Description Serial

Multi-core

GPGPU

Intel MIC

Architecture

Page 7: The Kinetic PreProcessor Accelerated - ParaTools, Inc. · Kppa: The Kinetic PreProcessor Accelerated Code generation Analysis Lexical parser MATLAB Kppa Fortran C Python Code optimized

ParaTools, Inc. 7

Kppa-generated can be 40x faster than hand-tuned

• Reorder chemical species to minimize fill-in

• Reorder concentration data to maximize vectorization

• Reorder grid cells to reduce excessive iteration

Code generation Analysis Lexical

parser

0

10

20

30

40

50

1 3 5 7 9 11 13 15 17

Sp

eed

up

Threads

SMALL STRATO

Linear NVIDIA® C1060

IBM® BladeCenter® QS22 Intel® Xeon® 5400

Page 8: The Kinetic PreProcessor Accelerated - ParaTools, Inc. · Kppa: The Kinetic PreProcessor Accelerated Code generation Analysis Lexical parser MATLAB Kppa Fortran C Python Code optimized

ParaTools, Inc. 8

Meta-programming for Automatic Code Generation

Code generation

Analysis Lexical parser

<% !d_Decomp.begin()!

piv = lang.Variable('piv', REAL, ! 'Row element divided by diagonal')!piv.declare() !%>! ${size_t} idx = blockDim.x*blockIdx.x+threadIdx.x;! if(idx < ${ncells32}) {! ${A} += idx;!<%!lang.upindent()!for i in xrange(1, nvar):! for j in xrange(crow[i], diag[i]):! c = icol[j]! d = diag[c]! piv.assign(A[j*ncells32] / A[d*ncells32])! A[j*ncells32].assign(piv)!...!%>! }!<% d_Decomp.end() %>!

A[334] = t1;! A[337] = -A[272]*t1 + A[337];! A[338] = -A[273]*t1 + A[338];! A[339] = -A[274]*t1 + A[339];! A[340] = -A[275]*t1 + A[340];! t2 = A[335]/A[329];! A[335] = t2;! A[337] = -A[330]*t2 + A[337];! A[338] = -A[331]*t2 + A[338];! A[339] = -A[332]*t2 + A[339];! A[340] = -A[333]*t2 + A[340];! t3 = A[341]/A[59];!

C/C++/CUDA, Fortran

Page 9: The Kinetic PreProcessor Accelerated - ParaTools, Inc. · Kppa: The Kinetic PreProcessor Accelerated Code generation Analysis Lexical parser MATLAB Kppa Fortran C Python Code optimized

ParaTools, Inc. 9

Kppa’s Architecture Parameterization

CPU GPU Intel® MIC CBEA

Instruction Cardinality 1 32 16 4

Integrator Cardinality 1 ∞! 16 4

Scratch Size N/A 16 KB N/A 256 KB

Targeting the Intel® Xeon® Phi™ coprocessor was easy!

Just had to parameterize the new architecture in Kppa and generate an OpenMP code

Page 10: The Kinetic PreProcessor Accelerated - ParaTools, Inc. · Kppa: The Kinetic PreProcessor Accelerated Code generation Analysis Lexical parser MATLAB Kppa Fortran C Python Code optimized

ParaTools, Inc. 10

Kppa-generated SAPRC Kernel: 35.6x Speedup

0

10

20

30

40

50

60

70

80

Intel® Xeon® E3-1220 (3.1GHz)

NVIDIA® GeForce® GTS 450

Intel® Xeon Phi™ Coprocessor

73.438

16.532 2.06

Seco

nd

s

Grid: Time:

# Species: # Equations:

Method:

24x36x8 8 hours 79 211 Rodas4

Page 11: The Kinetic PreProcessor Accelerated - ParaTools, Inc. · Kppa: The Kinetic PreProcessor Accelerated Code generation Analysis Lexical parser MATLAB Kppa Fortran C Python Code optimized

ParaTools, Inc. 11

Download Kppa from www.paratools.com/kppa

Page 12: The Kinetic PreProcessor Accelerated - ParaTools, Inc. · Kppa: The Kinetic PreProcessor Accelerated Code generation Analysis Lexical parser MATLAB Kppa Fortran C Python Code optimized