Paraiso project - cps-jp.org
Transcript of Paraiso project - cps-jp.org
![Page 1: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/1.jpg)
「Paraiso」project for automated generation
of partial differential equations solvers for parallel computers
Takayuki Muranush @nushio
Astrophisicist, Assistant professor at
The Hakubi Center, Kyoto University(2010-2015)
![Page 2: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/2.jpg)
Chapter 1. An Introduction to a Problem
we (numerical astronomers) have.
![Page 3: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/3.jpg)
![Page 4: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/4.jpg)
Acceleration
is
Parallelization
![Page 5: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/5.jpg)
![Page 6: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/6.jpg)
1 node
メモリ: (4GBx6) + (8GBx3 + 2GBx3) CPU:Westmere EP x2 GPU:Tesla 2050 (515Gflops + 3GB) x3 通信:Infiniband QDR 10GB/s ローカルディスク:SSD x2 RAID0 (460MB/s read)
2.2×1015flops
2.4?×1014flops
1.3×1013 Byte
7.6×1013 Byte
7.0? ×1015 Byte
Disks
1.4×1013 Byte/s
6.1×1014 Byte/s
3.4×1013 Byte/s
6.6×1011 Byte/s
1×1014 Byte/s
GPU CPU
GPUのメモリ ホストの
メモリ
SSD
DISK
Communications
L1 L2 L3
shared L1 L2
TSUBAME 2.0 Memory Hierarchy
![Page 7: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/7.jpg)
• Thanks to the architects, today we have access to huge amount of memory and compute capability.
• That doesn’t come for free to the programmers. anymore.
90’832’896 CUDA Thread in Parallel
L1 Cache
L2 Cache
VRAM
HOST MEMORY
SSD
Hard Disk
Register
Shared Memory
4,386,816 floating operations in Parallel
![Page 8: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/8.jpg)
What kind of computations
I’d like to do with such hardwares?
![Page 9: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/9.jpg)
many categories of problems
![Page 10: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/10.jpg)
Target Problem of Paraiso: Partial Differential Equations,
Explicit Solvers, on Uniform Mesh
General Relativity
Magneto-Hydrodynamics Hydrodynamics
Radiative Transfer (Relativistic)
• Hyperbolic PDEs that appers in astrophysics • combinations of these equations • combinations with chemistry etc..
![Page 11: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/11.jpg)
Partial Differential Equations, Explicit Solvers, on Uniform Mesh
From computational point of view: • They are d-Dimensional, real-number cell
automata. • The state of each cell is a tuple of real numbers. • The state of the cell at generation (n+1) is defiend
as function of the states of its neighbor cells at generation (n).
n-th generation n+1 -th generation
![Page 12: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/12.jpg)
What kind of equations?
• For example, the General Theory of Relativity which only two man on the earth truly understood, is as follows
it’s easy, isn’t it?
![Page 13: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/13.jpg)
What kind of algorithms?
• A description of BSSN algorithm, to solve the Einstein’s equations
![Page 14: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/14.jpg)
What kind of code we use?
![Page 15: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/15.jpg)
*BSSN algorithm Fortran+OpenMP
![Page 16: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/16.jpg)
*My MHD Solver in CUDA+MPI
![Page 17: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/17.jpg)
What’s wrong?
• what makes our programs this long?
![Page 18: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/18.jpg)
Programming is to choose
algebraic concepts
physical equations
time integration methods
space interpolation methods
data structures
optimization techniques
and hardware designs
tensors, its symmetry…
HD, MHD, GR, …
1st order, 2nd order, 4th order, more…
1st, 2nd, 3rd, TVD, Shock…
SoA, AoS, distribution, communication timing…
CPU, GPU, what next, …
![Page 19: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/19.jpg)
To make things worse...
• We write more than one program.
• We make trial & error in search for better programs
• Suddenly the architecture changes and we are forced to use SSE, CUDA, AVX ...
• We need to insert communications correctly
To make a single point of change, we have to grep all over the codes
![Page 20: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/20.jpg)
Modern Parallel Programming is like this
The amount of programs we write in our life is the product of the factors mentioned
Specify each of the sufficient knowledge modules, and programs like above are automatically generated
I want it like this
![Page 21: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/21.jpg)
What a code generator aims for
• Generally you write Nf×Nmath×Neq×Nint×Nhw… lines of code
• You find a bug / improvement and want Neq = Neq + 1; then you need to re-write Nf×Nmath×1×Nint×Nhw… lines
• With code generator you only have to write
Nf + Nmath + Neq + Nint + Nhw… lines
• You want Neq = Neq + 1; then just add 1 line
• You can concentrate on physics
![Page 22: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/22.jpg)
Can’t you do that in existing language
•possibly
• we can define vectors and tensors as classes
• we can overload operators
• we have templates
• we have accumulated sophisticated techniques such as expression templates
![Page 23: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/23.jpg)
as a result ...
![Page 24: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/24.jpg)
I want a language that is modular
(easy to reuse components) transplantable
fast beautiful
• hopeless or is it...?
if you limit the problem domain
![Page 25: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/25.jpg)
Paraiso Input: Discretized Algorithms for solving Partial
Differential Equations, in mathematical notations
Output: Implementations on Distributed, Manycore Machines.
PARallel Automated Integration Scheme Organizer
The overall design of Chapter 2.
![Page 26: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/26.jpg)
8 building blocks of PDE solvers
Imm load constant value Load (graph starts here) read from named array Store (graph ends here) write to named array Reduce array to scalar value Broadcast scalar to array Shift copy each cell to neighbourhood LoadIndex & LoadSize get coordinate of each cell get array size Arith various mathematical operations
![Page 27: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/27.jpg)
Basic Equations
Discretized Form
OM Dataflow Graph
Native Code
Executable Files
a_i_j <- … …
q_i <- dt *
a_i_j * f_j
ld r2, g2[0,0,0]
ld r1, g2[0,0,1]
add r1,r2,r3
st r3,g1
*q=cudaMalloc(…);
__shared__ a,b;
a=q[idx];
b=q[idx+1];
p[idx]=a+b;
Manually
OM Builder
OM Compiler
Native Compiler
Orthotope Machine Code
result
Discrete PDE Language
![Page 28: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/28.jpg)
サーベイしたところ出てきた 先行研究
![Page 29: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/29.jpg)
かなり似ている・・・
基礎方程式
離散化形
VVM上のコード
実マシン上のコード
実マシン上の実行ファイル
さしあたり人手
自動
自動
既存コンパイラ
![Page 30: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/30.jpg)
DEQSOLの末路・・・
![Page 31: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/31.jpg)
Discretized PDE Language
• Describe your numerical algorithm using natural notations e.g. tensor notations, difference operators.
• Translates to Orthotope Machine
Algorithm notation
![Page 32: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/32.jpg)
Orthotope Machine • A Real number cellular automata
• A virtual machine with multidimensional array vector register of infinite size
• arithmetic operations work in parallel on each mesh, loads from neighbour cells,
No intention of buiding a real hardware:
a thought object to construct a dataflow graph
![Page 33: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/33.jpg)
Orthotope Machine Compiler • convert dataflow graph
to real codes
• Divide OM registers onto distributed memories
• Generate Communication codes
• The code generator has wide choice on data layout, granularity of communication, computation
Orthotope Machine Code
Native Codes
OM Compiler
![Page 34: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/34.jpg)
Basic Equations
Discretized Form
OM Dataflow Graph
Native Code
Executable Files
a_i_j <- … …
q_i <- dt *
a_i_j * f_j
ld r2, g2[0,0,0]
ld r1, g2[0,0,1]
add r1,r2,r3
st r3,g1
*q=cudaMalloc(…);
__shared__ a,b;
a=q[idx];
b=q[idx+1];
p[idx]=a+b;
Manually
OM Builder
OM Compiler
Native Compiler
Orthotope Machine Code
result
Discrete PDE Language
Chapter 3.
![Page 35: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/35.jpg)
More Details on Orthotope Machine
![Page 36: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/36.jpg)
Orthotope Machine
![Page 37: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/37.jpg)
Three Type parameters OM components have
Type Constructor for The dimension of the Orthotope Machine
Type of the Array Index
Type of the Annotation you can apply at graph node
![Page 38: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/38.jpg)
![Page 39: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/39.jpg)
bipartile graph consisting of value nodes and inst nodes
![Page 40: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/40.jpg)
8 building blocks of PDE solvers
Imm load constant value Load (graph starts here) read from named array Store (graph ends here) write to named array Reduce array to scalar value Broadcast scalar to array Shift copy each cell to neighbourhood LoadIndex & LoadSize get coordinate of each cell get array size Arith various mathematical operations
![Page 41: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/41.jpg)
Mul
Add
Shift(-1,0)
Reduce(Min)
Broadcast
1 4
7
2 5
8
3 6
9
1 4
7
2 5
8
3 6
9
4 10
16
3 9
15
5 11
17
12 30
48
9 27
45
15 33
51
3 3
3
3 3
3
3 3
3
3
NValue NInst
global value (scalar value)
local value(Array)
local value(Array)
Load(“hoge”)
Store(“hoge”)
![Page 42: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/42.jpg)
DynValue = Array (data container) with realm and element type information
Local Realm = Array Global Realm = Scalar data
![Page 43: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/43.jpg)
Type-level representation of Array
![Page 44: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/44.jpg)
Value : type-level DynValue : value-level
![Page 45: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/45.jpg)
• User interface is in Type-level
• The type-checker helps user
• and assures type-consistency for the backend
• Dataflow graph under cover is Value-level
• can handle the graph in one type.
Value TLocal Float -> Value TGlobal Int -> Value Tlocal Float
Value TLocal Float
Value TGlobal Int
DynValue -> DynValue -> DynValue
DynValue
DynValue
User space Internal Dataflow Graph
![Page 46: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/46.jpg)
Basic Equations
Discretized Form
OM Dataflow Graph
Native Code
Executable Files
a_i_j <- … …
q_i <- dt *
a_i_j * f_j
ld r2, g2[0,0,0]
ld r1, g2[0,0,1]
add r1,r2,r3
st r3,g1
*q=cudaMalloc(…);
__shared__ a,b;
a=q[idx];
b=q[idx+1];
p[idx]=a+b;
Manually
OM Builder
OM Compiler
Native Compiler
Orthotope Machine Code
result
Discrete PDE Language
Chapter 4.
![Page 47: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/47.jpg)
Builder Monads constructs dataflow graph
the interface has type-level info on Realm and Type but internal representations are value-level on Realm and Type
![Page 48: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/48.jpg)
a helper function to define binary operators for Builder Monad
![Page 49: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/49.jpg)
Builder monad being an Additive Builder monad being a Ring
![Page 50: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/50.jpg)
• programing language Paraiso lacks frontend (lexer, parser, AST analyzer ...)
• Instead, Paraiso components are first-class objects of a widely-used language (Haskell) (the (ultimate (source (of (programmability)))))
![Page 51: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/51.jpg)
Example : Implement Hydro Solver
![Page 52: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/52.jpg)
An HLLC Riemann-solver in Builder Monad
It looks like usual mathematics, but every terms appear here are Builder Monads, and the expressions are as themselves code generators for Orthotope Machine.
![Page 53: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/53.jpg)
sample hydrodynamics solver in Paraiso
![Page 54: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/54.jpg)
Basic Equations
Discretized Form
OM Dataflow Graph
Native Code
Executable Files
a_i_j <- … …
q_i <- dt *
a_i_j * f_j
ld r2, g2[0,0,0]
ld r1, g2[0,0,1]
add r1,r2,r3
st r3,g1
*q=cudaMalloc(…);
__shared__ a,b;
a=q[idx];
b=q[idx+1];
p[idx]=a+b;
Manually
OM Builder
OM Compiler
Native Compiler
Orthotope Machine Code
result
Discrete PDE Language
Chapter 5.
![Page 55: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/55.jpg)
old code generator...
• not so productive
OM Dataflow Graph
Native Code
directly
runBinder graph0 n0 binder = unlines $ header ++ [bindStr] ++ footer where (header,footer) = case context state of CtxGlobal -> ([],[]) CtxLocal loopIndex -> ([loop (symbol Cpp loopIndex) ++ " {"], ["}"]) loop i = "for (int " ++ i ++ " = 0 ; " ++ i ++ " < " ++ symbol Cpp sizeName ++ "() ; " ++ "++" ++ i ++ ")"
![Page 56: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/56.jpg)
code generator ver. 2 OM Dataflow
Graph
Native Code
OMTrans Optimization :: OM -> OM
Analysis = add annotation
Plan = decisions made upon
• how much memory to allocate
• which part of calculation to take place in same subroutine
Claris
• a C++ -like syntax tree with CUDA extension.
Plan
PlanTrans
Claris
ClarisTrans
Annotated and Optimized OM
Optimization/Analysis
![Page 57: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/57.jpg)
Annotation • Allocation, Ballon, Boundary, Comment,
Dependency
-- | a type that represents valid region of computation. newtype Valid g = Valid [Interval (NearBoundary g)] -- | the displacement around either side of the boundary. data NearBoundary a = NegaInfinity | LowerBoundary a | UpperBoundary a | PosiInfinity deriving (Eq, Ord, Show, Typeable) data Allocation = Existing -- ^ This entity is already allocated as a static variable. | Manifest -- ^ Allocate additional memory for this entity. | Delayed -- ^ Do not allocate, re-compute it whenever if needed. deriving (Eq, Show, Typeable)
![Page 58: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/58.jpg)
Analysis
• decide allocation
• boundary analysis
• dependency analysis and write grouping
optimize level = case level of O0 -> gmap identity . writeGrouping . gmap boundaryAnalysis . gmap decideAllocation _ -> optimize O0
![Page 59: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/59.jpg)
Allocation
• some of the dataflow graph nodes are marked ‘Manifest.’
• Manifest nodes are stored in memory.
• Delayed nodes are re-computed as needed.
data Allocation = Existing -- ^ This entity is already allocated as a static variable. | Manifest -- ^ Allocate additional memory for this entity. | Delayed -- ^ Do not allocate, re-compute it whenever if needed. deriving (Eq, Show, Typeable)
![Page 60: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/60.jpg)
• Less computation
for(;;){
f[i] = calc_f(a[i], a[i+1]);
}
for (;;){
b[i] += f[i] – f[i-1];
}
• Less storage consumption & access for(;;){
f0 = calc_f(a[i-1], a[i]);
f1 = calc_f(a[i], a[i+1]);
b[i] += f1 – f0;
}
a
f
b
a
f
b
![Page 61: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/61.jpg)
Valid Boundary Analysis
• Shift operations create undefined regions in value.
• Boundary analysis trace this to find out valid regions for every node in the graph.
• How many additional mesh we need to obtain valid answers for desired region.
• To generate boundary-region communications.
Add
Shift(-1,0)
1 4
7
2 5
8
3 6
9
nan nan
nan
2 5
8
3 6
9
nan nan
nan
3 9
15
5 11
17
NValue NInst
![Page 62: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/62.jpg)
write grouping Kernel
• a user-defined API of the generated class
Subkernel
• a set of calculation executed in a loop
• = Fortran subroutine
• = CUDA __global__ kernel
void Life::proceed () { Life_sub_2(static_2_cell, manifest_1_67); Life_sub_3(static_1_generation, manifest_1_67, manifest_1_69, manifest_1_74); (static_0_population) = (manifest_1_69); (static_1_generation) = (manifest_1_74); (static_2_cell) = (manifest_1_67); }
![Page 63: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/63.jpg)
a Kernel
![Page 64: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/64.jpg)
write grouping = a Kernel -> subkernels
• all node written by one subkernel must have the same valid region
• nodes written by one subkernel must not depend on each other
• greedy
![Page 65: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/65.jpg)
a Kernel Existing nodes
![Page 66: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/66.jpg)
a Kernel Existing nodes
write group 0
(・A・) dependency
(・A・) not dependent but different boundary
![Page 67: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/67.jpg)
a Kernel Existing nodes
subkernel 0
![Page 68: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/68.jpg)
a Kernel Existing nodes
subkernel 0
write group 1
![Page 69: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/69.jpg)
a Kernel Existing nodes
subkernel 0
subkernel 1
![Page 70: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/70.jpg)
a Kernel Existing nodes
subkernel 0
subkernel 1
write group 2
![Page 71: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/71.jpg)
a Kernel Existing nodes
subkernel 0
subkernel 1
subkernel 2
![Page 72: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/72.jpg)
Diffusion Equation
• an example of code manipulation
![Page 73: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/73.jpg)
![Page 74: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/74.jpg)
Diffusion Equation Example Annotation size of .cpp file number of
subkernels memory consumption
no annotation 1019 lines 4 5 x N
Manifest 301 lines 6 9 x N
![Page 75: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/75.jpg)
homework 「内職」
all Paraiso-generated code become OpenMP Compatible
in one line!
![Page 76: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/76.jpg)
x8 faster! cpu consumption
![Page 77: Paraiso project - cps-jp.org](https://reader030.fdocuments.net/reader030/viewer/2022012806/61bd40e061276e740b10e9a6/html5/thumbnails/77.jpg)