Parallel Programming with Synthesis...

1
Parallel Programming with Synthesis 2.0 Shaon Barman Rastislav Bodik Sagar Jain Yewen Pu Saurabh Srivastava Nicholas Tung ParLab spec: int foo (int x) { return x + x; } sketch: int bar (int x) implements foo { return x << ??; } result: int bar (int x) implements foo { return x << 1; } SKETCH synthesizer x + x x << ?? x << 1 int spec (int x) { return x*x*x - 19*x + 30; } #define Root {| ?? | -?? |} int sketch (int x) implements spec { return (x - Root) * (x - Root) * (x - Root); } void sketch(int x) implements spec { return (x+5)*(x-3)*(x-2); } SKETCH synthesizer int[16] trans(int[16] M) { int[16] T = 0; for (int i = 0; i < 4; i++) for (int j = 0; j < 4; j++) T[4 * i + j] = M[4 * j + i]; return T; } int[16] trans_sse(int[16] M) implements trans { int[16] S = 0, T = 0; repeat (??) S[??::4] = shufps(M[??::4], M[??::4], ??); repeat (??) T[??::4] = shufps(S[??::4], S[??::4], ??); return T; } int[16] trans_sse(int[16] M) implements trans { S[4::4] = shufps(M[6::4], M[2::4], 11001000b); S[0::4] = shufps(M[11::4], M[6::4], 10010110b); S[12::4] = shufps(M[0::4], M[2::4], 10001101b); S[8::4] = shufps(M[8::4], M[12::4], 11010111b); T[4::4] = shufps(S[11::4], S[1::4], 10111100b); T[12::4] = shufps(S[3::4], S[8::4], 11000011b); T[8::4] = shufps(S[4::4], S[9::4], 11100010b); T[0::4] = shufps(S[12::4], S[0::4], 10110100b); } SKETCH synthesizer A simple optimization Factor a polynomial SIMD transpose of a 4x4 matrix We are now working on exploiting synthesis throughout the HPC workflow, from design of parallel algorithms to kernel tuning. Ask us about details or come to our talk. Domain expert: problem spec dynamic programming parallel scan O(2 n ) O(n) O(log n) Parallel algorithm expert: example of parallel scan network SIMD algorithm GPU tuning expert: SIMD algorithm bank conflicts index expressions What are the limitations behind the magic? Sketch doesn’t produce a proof of correctness: SKETCH checks correctness of the synthesized program on all inputs of up to certain size. The program could be incorrect on larger inputs. This check is up to programmer. Scalability: Some programs are too hard to synthesize. Hence our proposal to use refinement, which provides modularity and breaks the synthesis task into smaller problems. Proposed solution: Gradually refine the program In each programming step, optimize just one function. This reduces the scope of synthesis, improving scalability and simplifying verification. How a classical code generator is developed: Step 1: Codegen developer writes a skeleton of how the generated code should look like. Step 2: He writes the code generator that produces the desired code variant, by parameterizing the skeleton. How a synthesing code generator is developed: Step 1: The developer writes only the template. No actual generator is developed. The synthesizer completes the template to make the code variants behave like the spec. Try SKETCH online at bit.ly/sketch-language

Transcript of Parallel Programming with Synthesis...

Page 1: Parallel Programming with Synthesis 2parlab.eecs.berkeley.edu/sites/all/parlab/files/evanposter.pdf · Parallel Programming with Synthesis 2.0 Shaon Barman Rastislav Bodik Sagar Jain

Parallel Programming with Synthesis 2.0 Shaon Barman Rastislav Bodik Sagar Jain Yewen Pu Saurabh Srivastava Nicholas Tung

ParLab

spec: int foo (int x) {

return x + x; }

sketch: int bar (int x) implements foo {

return x << ??; }

result: int bar (int x) implements foo {

return x << 1; }

SKETCH synthesizer

x + x

x << ?? x << 1

int spec (int x) { return x*x*x - 19*x + 30; }

#define Root {| ?? | -?? |} int sketch (int x) implements spec { return (x - Root) * (x - Root) * (x - Root); }

void sketch(int x) implements spec { return (x+5)*(x-3)*(x-2); }

SKETCH synthesizer

int[16] trans(int[16] M) { int[16] T = 0; for (int i = 0; i < 4; i++) for (int j = 0; j < 4; j++) T[4 * i + j] = M[4 * j + i]; return T; }

int[16] trans_sse(int[16] M) implements trans { int[16] S = 0, T = 0; repeat (??) S[??::4] = shufps(M[??::4], M[??::4], ??); repeat (??) T[??::4] = shufps(S[??::4], S[??::4], ??); return T; }

int[16] trans_sse(int[16] M) implements trans { S[4::4] = shufps(M[6::4], M[2::4], 11001000b); S[0::4] = shufps(M[11::4], M[6::4], 10010110b); S[12::4] = shufps(M[0::4], M[2::4], 10001101b); S[8::4] = shufps(M[8::4], M[12::4], 11010111b); T[4::4] = shufps(S[11::4], S[1::4], 10111100b); T[12::4] = shufps(S[3::4], S[8::4], 11000011b); T[8::4] = shufps(S[4::4], S[9::4], 11100010b); T[0::4] = shufps(S[12::4], S[0::4], 10110100b); }

SKETCH synthesizer

A s

imp

le o

pti

miz

atio

n

Fact

or

a p

oly

no

mia

l

SIM

D t

ran

spo

se o

f a

4x

4 m

atri

x

We are now working on exploiting synthesis throughout the HPC workflow, from design of parallel algorithms to kernel tuning. Ask us about details or come to our talk.

Domain expert:

problem spec → dynamic programming → parallel scan O(2n) O(n) O(log n)

Parallel algorithm expert:

example of parallel scan network → SIMD algorithm

GPU tuning expert:

SIMD algorithm → bank conflicts → index expressions

What are the limitations behind the magic?

Sketch doesn’t produce a proof of correctness:

SKETCH checks correctness of the synthesized program on all inputs of up to certain size. The program could be incorrect on larger inputs. This check is up to programmer.

Scalability:

Some programs are too hard to synthesize. Hence our proposal to use refinement, which provides modularity and breaks the synthesis task into smaller problems.

Proposed solution: Gradually refine the program

In each programming step, optimize just one function. This reduces the scope of synthesis, improving scalability and simplifying verification.

How a classical code generator is developed:

Step 1: Codegen developer writes a skeleton of how the generated code should look like.

Step 2: He writes the code generator that produces the desired code variant, by parameterizing the skeleton.

How a synthesing code generator is developed:

Step 1: The developer writes only the template. No actual generator is developed. The synthesizer completes the template to make the code variants behave like the spec.

Try SKETCH online at bit.ly/sketch-language