Parallel Programming with Synthesis...
Transcript of Parallel Programming with Synthesis...
Parallel Programming with Synthesis 2.0 Shaon Barman Rastislav Bodik Sagar Jain Yewen Pu Saurabh Srivastava Nicholas Tung
ParLab
spec: int foo (int x) {
return x + x; }
sketch: int bar (int x) implements foo {
return x << ??; }
result: int bar (int x) implements foo {
return x << 1; }
SKETCH synthesizer
x + x
x << ?? x << 1
int spec (int x) { return x*x*x - 19*x + 30; }
#define Root {| ?? | -?? |} int sketch (int x) implements spec { return (x - Root) * (x - Root) * (x - Root); }
void sketch(int x) implements spec { return (x+5)*(x-3)*(x-2); }
SKETCH synthesizer
int[16] trans(int[16] M) { int[16] T = 0; for (int i = 0; i < 4; i++) for (int j = 0; j < 4; j++) T[4 * i + j] = M[4 * j + i]; return T; }
int[16] trans_sse(int[16] M) implements trans { int[16] S = 0, T = 0; repeat (??) S[??::4] = shufps(M[??::4], M[??::4], ??); repeat (??) T[??::4] = shufps(S[??::4], S[??::4], ??); return T; }
int[16] trans_sse(int[16] M) implements trans { S[4::4] = shufps(M[6::4], M[2::4], 11001000b); S[0::4] = shufps(M[11::4], M[6::4], 10010110b); S[12::4] = shufps(M[0::4], M[2::4], 10001101b); S[8::4] = shufps(M[8::4], M[12::4], 11010111b); T[4::4] = shufps(S[11::4], S[1::4], 10111100b); T[12::4] = shufps(S[3::4], S[8::4], 11000011b); T[8::4] = shufps(S[4::4], S[9::4], 11100010b); T[0::4] = shufps(S[12::4], S[0::4], 10110100b); }
SKETCH synthesizer
A s
imp
le o
pti
miz
atio
n
Fact
or
a p
oly
no
mia
l
SIM
D t
ran
spo
se o
f a
4x
4 m
atri
x
We are now working on exploiting synthesis throughout the HPC workflow, from design of parallel algorithms to kernel tuning. Ask us about details or come to our talk.
Domain expert:
problem spec → dynamic programming → parallel scan O(2n) O(n) O(log n)
Parallel algorithm expert:
example of parallel scan network → SIMD algorithm
GPU tuning expert:
SIMD algorithm → bank conflicts → index expressions
What are the limitations behind the magic?
Sketch doesn’t produce a proof of correctness:
SKETCH checks correctness of the synthesized program on all inputs of up to certain size. The program could be incorrect on larger inputs. This check is up to programmer.
Scalability:
Some programs are too hard to synthesize. Hence our proposal to use refinement, which provides modularity and breaks the synthesis task into smaller problems.
Proposed solution: Gradually refine the program
In each programming step, optimize just one function. This reduces the scope of synthesis, improving scalability and simplifying verification.
How a classical code generator is developed:
Step 1: Codegen developer writes a skeleton of how the generated code should look like.
Step 2: He writes the code generator that produces the desired code variant, by parameterizing the skeleton.
How a synthesing code generator is developed:
Step 1: The developer writes only the template. No actual generator is developed. The synthesizer completes the template to make the code variants behave like the spec.
Try SKETCH online at bit.ly/sketch-language