Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.
![Page 1: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/1.jpg)
Superoptimization
Venkatesh Karthik Srinivasan
Guest Lecture in CS 701, Nov. 10, 2015
![Page 2: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/2.jpg)
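IA-32 Primer
Registers: EAX, EBX, ECX, EDX, ESP, EBP, EIP, ESI, EDI
Flags: CF, OF, ZF, SF, PF, …
Example, starting with EAX = 20 and ESP = 1000:
push eax            ESP becomes 996; the stack slot at [ESP] holds 20
mov ebx, eax        EBX = 20
mov [esp], 60       the stack slot at [ESP] holds 60
lea esp, [esp+4]    ESP = 1000
add eax, ebx        EAX = 40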
![Page 3: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/3.jpg)
Superoptimization
• Input: an instruction sequence I, plus a timeout t
  – Instructions belong to an ISA
  – No loops
• Output: an instruction sequence I’
  – Instructions belong to the same ISA
  – I’ is equivalent to I: on all inputs, I and I’ produce the same results
  – I’ is the optimal implementation of I (with a timeout t: an improved implementation of I)
  – "Optimal" or "improved" can mean shorter, faster, or lower energy consumption
![Page 4: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/4.jpg)
Superoptimization
Input:
sub eax, ecx
neg eax
sub eax, 1

→ IA-32 Superoptimizer →

Output:
not eax
add eax, ecx

Both sequences compute EAX ← ECX – EAX – 1
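The claim that both sequences compute ECX – EAX – 1 can be spot-checked by simulating them. The following is a minimal sketch, not part of the lecture: it models only the final value of EAX with 32-bit wraparound, ignores flags, and the function names are hypothetical. Random testing can only find disagreements; it does not prove equivalence.

```python
import random

MASK = 0xFFFFFFFF  # 32-bit registers wrap around modulo 2**32


def original(eax, ecx):
    eax = (eax - ecx) & MASK   # sub eax, ecx
    eax = (-eax) & MASK        # neg eax
    eax = (eax - 1) & MASK     # sub eax, 1
    return eax


def optimized(eax, ecx):
    eax = (~eax) & MASK        # not eax
    eax = (eax + ecx) & MASK   # add eax, ecx
    return eax


def agree_on_random_inputs(trials=10_000, seed=0):
    """Return True if both sequences give the same EAX on every sampled input."""
    rng = random.Random(seed)
    for _ in range(trials):
        eax, ecx = rng.getrandbits(32), rng.getrandbits(32)
        if original(eax, ecx) != optimized(eax, ecx):
            return False
    return True
```

For example, with EAX = 5 and ECX = 7 both functions return 1, which is 7 – 5 – 1.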
![Page 5: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/5.jpg)
Superoptimization
Input:
add ebp, 4
mov [ebp], eax
add ebp, 4
mov [ebp], ebx

→ IA-32 Superoptimizer →

Output:
mov [ebp+4], eax
mov [ebp+8], ebx
add ebp, 8
![Page 6: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/6.jpg)
Why do we need a superoptimizer?
![Page 7: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/7.jpg)
© Alvin Cheung, CSE 501
![Page 8: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/8.jpg)
Peephole Optimization
• Done at low-level IR
  – Closer to assembly code
• Sliding window – “peephole”
• Peephole rules
  – Rewrite rules of form “LHS → RHS”
• If current peephole matches LHS of some rule, replace with RHS
![Page 9: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/9.jpg)
Peephole Rules

| Rule category | LHS | RHS |
| --- | --- | --- |
| Eliminating redundant loads and stores | mov reg, [addr] (load contents of [addr] into reg); mov [addr], reg (store contents of reg into [addr]); both instructions must be in the same basic block | mov reg, [addr] |
| Control-flow optimizations | goto L1 … L1: goto L2 | goto L2 … L1: goto L2 |
| Algebraic rewrites | add oprnd, 0 | (none) |
| Algebraic rewrites | imul oprnd, 2 | shl oprnd, 1 |
| Machine idioms | add oprnd, 1 | inc oprnd |
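A sliding-window rewriter for rules of this shape can be sketched as follows. This is an illustration under simplifying assumptions: the rules are written for concrete operands (eax, [esp]) rather than the reg/oprnd patterns on the slide, the code makes a single pass, and the names RULES and peephole are hypothetical.

```python
# Each rule maps a tuple of LHS instructions to a list of RHS instructions.
RULES = {
    ("mov eax, [esp]", "mov [esp], eax"): ["mov eax, [esp]"],  # redundant store
    ("add eax, 0",): [],                                       # algebraic rewrite
    ("imul eax, 2",): ["shl eax, 1"],                          # algebraic rewrite
    ("add eax, 1",): ["inc eax"],                              # machine idiom
}


def peephole(code, rules=RULES, max_window=2):
    """Slide a window over code; replace any window that matches a rule's LHS."""
    out = []
    i = 0
    while i < len(code):
        # Try the widest window first so longer rules take priority.
        for width in range(max_window, 0, -1):
            window = tuple(code[i:i + width])
            if len(window) == width and window in rules:
                out.extend(rules[window])
                i += width
                break
        else:
            # No rule matched at this position: keep the instruction as is.
            out.append(code[i])
            i += 1
    return out
```

Applied to `["mov eax, [esp]", "mov [esp], eax", "add eax, 0", "imul eax, 2", "add eax, 1"]`, the sketch yields `["mov eax, [esp]", "shl eax, 1", "inc eax"]`.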
![Page 10: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/10.jpg)
Problem with Peephole Rules
• Manually written
  – Cumbersome
  – Error-prone
![Page 11: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/11.jpg)
Peephole Superoptimizer
Input:
...
mov eax, [esp]
mov [esp], eax
...
mov eax, [esp]
add eax, 1
mov [esp], eax
...

→ Superoptimizer →

Output:
...
mov eax, [esp]
...
mov eax, [esp]
add eax, 1
mov [esp], eax
...
![Page 12: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/12.jpg)
Peephole Superoptimizer
Input:
...
mov eax, [esp]
mov [esp], eax
...
mov eax, [esp]
add eax, 1
mov [esp], eax
...

→ Superoptimizer →

Output:
...
mov eax, [esp]
...
add [esp], 1
mov eax, [esp]
...
![Page 13: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/13.jpg)
Peephole Superoptimizer
Superoptimization is a sloooooooooowwwww process
![Page 14: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/14.jpg)
Optimization Database
Training programs → Instruction sequences → Superoptimizer → Optimization database
![Page 15: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/15.jpg)
Peephole Superoptimizer
Input:
...
mov eax, [esp]
mov [esp], eax
...
mov eax, [esp]
add eax, 1
mov [esp], eax
...

→ Optimization database →

Output:
...
mov eax, [esp]
...
mov eax, [esp]
add eax, 1
mov [esp], eax
...
![Page 16: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/16.jpg)
Peephole Superoptimizer
Input:
...
mov eax, [esp]
mov [esp], eax
...
mov eax, [esp]
add eax, 1
mov [esp], eax
...

→ Optimization database →

Output:
...
mov eax, [esp]
...
add [esp], 1
mov eax, [esp]
...
![Page 17: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/17.jpg)
Outline
• Problem statement
• Motivation
  – Peephole optimization
  – Speeding up critical inner loops
• Equivalence of two instruction-sequences
• A naïve superoptimizer
• Massalin/Bansal-Aiken superoptimizer
  – Canonicalization
  – Test cases
  – Counterexamples as tests
  – Pruning away sub-optimal candidates
• Stochastic superoptimization
![Page 18: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/18.jpg)
Checking equivalence of two instruction-sequences
• Key primitive in superoptimizers and related tools
• “Do two loop-free instruction sequences I and I’ produce the same outputs for all possible inputs?”
![Page 19: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/19.jpg)
Checking equivalence of two instruction-sequences
1. Encode I as a logical formula ϕ
2. Encode I’ as a logical formula ψ
3. Use an SMT solver to check if ϕ ⟺ ψ is logically valid

Two questions follow:
• How to encode an instruction sequence as a logical formula?
• How to use an SMT solver to check if two formulas are equivalent?
![Page 20: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/20.jpg)
How to encode an instruction sequence I as a logical formula ϕ?
1. Symbolically evaluate I on the identity (Id) state to obtain post-state σ
2. Convert σ into a logical formula
![Page 21: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/21.jpg)
Concrete Evaluation
• IA-32 state
• IA-32 interpreter
  – Concrete operational semantics of IA-32 instructions
![Page 22: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/22.jpg)
IA-32 State
• ⟨RegMap, FlagMap, MemMap⟩
• RegMap : Register ↦ 32-bit value
• FlagMap : Flag ↦ Boolean value
• MemMap : 32-bit address ↦ 8-bit value (simplified on these slides to 32-bit address ↦ 32-bit value)

Example: ⟨[EBX ↦ 8, ESP ↦ 1000], [ ], [1000 ↦ 2]⟩ is shorthand for
⟨[EAX ↦ 0, EBX ↦ 8, ECX ↦ 0, … , ESP ↦ 1000, EBP ↦ 0, … , EDI ↦ 0],
 [SF ↦ false, CF ↦ false, … , OF ↦ false],
 [0 ↦ 0, 4 ↦ 0, 8 ↦ 0, … , 1000 ↦ 2, 1004 ↦ 0, … , 4294967292 ↦ 0]⟩
![Page 23: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/23.jpg)
Concrete Evaluation
mov eax, [esp]
add eax, 4
sub ebx, 4
mov [esp], ebx

Initial state: ⟨[EBX ↦ 8, ESP ↦ 1000], [ ], [1000 ↦ 2]⟩
![Page 24: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/24.jpg)
Concrete Evaluation
mov eax, [esp]
add eax, 4
sub ebx, 4
mov [esp], ebx

State before mov eax, [esp]: ⟨[EBX ↦ 8, ESP ↦ 1000], [ ], [1000 ↦ 2]⟩
![Page 25: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/25.jpg)
Concrete Evaluation
mov eax, [esp]
add eax, 4
sub ebx, 4
mov [esp], ebx

State after mov eax, [esp]: ⟨[EAX ↦ 2, EBX ↦ 8, ESP ↦ 1000], [ ], [1000 ↦ 2]⟩
![Page 26: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/26.jpg)
Concrete Evaluation
mov eax, [esp]
add eax, 4
sub ebx, 4
mov [esp], ebx

State after add eax, 4: ⟨[EAX ↦ 6, EBX ↦ 8, ESP ↦ 1000], [PF ↦ true], [1000 ↦ 2]⟩
![Page 27: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/27.jpg)
Concrete Evaluation
mov eax, [esp]
add eax, 4
sub ebx, 4
mov [esp], ebx

State after sub ebx, 4: ⟨[EAX ↦ 6, EBX ↦ 4, ESP ↦ 1000], [PF ↦ false], [1000 ↦ 2]⟩
![Page 28: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/28.jpg)
Concrete Evaluation
mov eax, [esp]
add eax, 4
sub ebx, 4
mov [esp], ebx

State after mov [esp], ebx: ⟨[EAX ↦ 6, EBX ↦ 4, ESP ↦ 1000], [PF ↦ false], [1000 ↦ 4]⟩
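The step-by-step evaluation above can be reproduced with a very small interpreter. This is a sketch under stated assumptions, not the lecture's interpreter: it supports only mov, add, and sub, uses the simplified word-granular memory (32-bit address ↦ 32-bit value), leaves flags out entirely, and all function names are hypothetical.

```python
MASK = 0xFFFFFFFF  # 32-bit wraparound


def read(regs, mem, operand):
    """Value of an immediate, a register, or a memory operand like "[esp]"."""
    if isinstance(operand, int):
        return operand
    if operand.startswith("["):
        return mem.get(regs[operand[1:-1]], 0)
    return regs.get(operand, 0)


def write(regs, mem, operand, value):
    """Store value into a register or into memory at the address in a register."""
    if operand.startswith("["):
        mem[regs[operand[1:-1]]] = value & MASK
    else:
        regs[operand] = value & MASK


def run(program, regs, mem):
    """Execute (opcode, destination, source) triples on copies of regs and mem."""
    regs, mem = dict(regs), dict(mem)
    for op, dst, src in program:
        value = read(regs, mem, src)
        if op == "add":
            value = read(regs, mem, dst) + value
        elif op == "sub":
            value = read(regs, mem, dst) - value
        elif op != "mov":
            raise ValueError(f"unsupported instruction: {op}")
        write(regs, mem, dst, value)
    return regs, mem


PROGRAM = [
    ("mov", "eax", "[esp]"),
    ("add", "eax", 4),
    ("sub", "ebx", 4),
    ("mov", "[esp]", "ebx"),
]
final_regs, final_mem = run(PROGRAM, {"ebx": 8, "esp": 1000}, {1000: 2})
```

Starting from EBX = 8, ESP = 1000, and memory [1000 ↦ 2], the sketch ends with EAX = 6, EBX = 4, and memory [1000 ↦ 4], matching the register and memory components of the final state on the slide.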
![Page 29: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/29.jpg)
Symbolic Evaluation
• Symbolic IA-32 state
• Symbolic IA-32 interpreter
  – Symbolic transformers for instructions
  – Obtained from concrete operational semantics of IA-32 instructions
![Page 30: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/30.jpg)
Symbolic IA-32 State
• ⟨RegMap, FlagMap, MemMap⟩
• RegMap : Symbolic register-constant ↦ term
• FlagMap : Symbolic flag-constant ↦ formula
• MemMap : Function symbol ↦ function-update expression
• Terms and formulas are in QFBV + Arrays

Id state:
⟨[EAX’ ↦ EAX, EBX’ ↦ EBX, … , ESP’ ↦ ESP, EBP’ ↦ EBP, … , EDI’ ↦ EDI],
 [SF’ ↦ SF, CF’ ↦ CF, … , ZF’ ↦ ZF],
 [Mem’ ↦ Mem]⟩
![Page 31: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/31.jpg)
Symbolic Evaluation
mov eax, [esp]
add eax, 4
sub ebx, 4
mov [esp], ebx

Initial (Id) state:
⟨[EAX’ ↦ EAX, EBX’ ↦ EBX, … , ESP’ ↦ ESP, … ],
 [SF’ ↦ SF, … , ZF’ ↦ ZF],
 [Mem’ ↦ Mem]⟩
![Page 32: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/32.jpg)
Symbolic Evaluation
mov eax, [esp]
add eax, 4
sub ebx, 4
mov [esp], ebx

State before mov eax, [esp]:
⟨[EAX’ ↦ EAX, EBX’ ↦ EBX, … , ESP’ ↦ ESP, … ],
 [SF’ ↦ SF, … , ZF’ ↦ ZF],
 [Mem’ ↦ Mem]⟩
![Page 33: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/33.jpg)
Symbolic Evaluation
mov eax, [esp]
add eax, 4
sub ebx, 4
mov [esp], ebx

State after mov eax, [esp]:
⟨[EAX’ ↦ Mem(ESP), EBX’ ↦ EBX, … , ESP’ ↦ ESP, … ],
 [SF’ ↦ SF, … , ZF’ ↦ ZF],
 [Mem’ ↦ Mem]⟩
![Page 34: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/34.jpg)
Symbolic Evaluation
mov eax, [esp]
add eax, 4
sub ebx, 4
mov [esp], ebx

State after add eax, 4:
⟨[EAX’ ↦ Mem(ESP) + 4, EBX’ ↦ EBX, … , ESP’ ↦ ESP, … ],
 [SF’ ↦ ((Mem(ESP) + 4) < 0), … , ZF’ ↦ ((Mem(ESP) + 4) = 0)],
 [Mem’ ↦ Mem]⟩
![Page 35: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/35.jpg)
Symbolic Evaluation
mov eax, [esp]
add eax, 4
sub ebx, 4
mov [esp], ebx

State after sub ebx, 4:
⟨[EAX’ ↦ Mem(ESP) + 4, EBX’ ↦ EBX – 4, … , ESP’ ↦ ESP, … ],
 [SF’ ↦ ((EBX – 4) < 0), … , ZF’ ↦ ((EBX – 4) = 0)],
 [Mem’ ↦ Mem]⟩
![Page 36: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/36.jpg)
Symbolic Evaluation
mov eax, [esp]
add eax, 4
sub ebx, 4
mov [esp], ebx

State after mov [esp], ebx:
⟨[EAX’ ↦ Mem(ESP) + 4, EBX’ ↦ EBX – 4, … , ESP’ ↦ ESP, … ],
 [SF’ ↦ ((EBX – 4) < 0), … , ZF’ ↦ ((EBX – 4) = 0)],
 [Mem’ ↦ Mem[ESP ↦ EBX – 4]]⟩
![Page 37: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/37.jpg)
Convert a symbolic state into a logical formula
Symbolic state:
⟨[EAX’ ↦ Mem(ESP) + 4, EBX’ ↦ EBX – 4, … , ESP’ ↦ ESP, … ],
 [SF’ ↦ ((EBX – 4) < 0), … , ZF’ ↦ ((EBX – 4) = 0)],
 [Mem’ ↦ Mem[ESP ↦ EBX – 4]]⟩

Formula:
EAX’ = Mem(ESP) + 4 ∧ EBX’ = EBX – 4 ∧ … ∧ ESP’ = ESP ∧ … ∧ SF’ = ((EBX – 4) < 0) ∧ … ∧ ZF’ = ((EBX – 4) = 0) ∧ Mem’ = Mem[ESP ↦ EBX – 4]
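Symbolic evaluation and the conversion to a formula can be sketched by running the same kind of interpreter over terms instead of numbers. The sketch below is only an illustration: terms are plain strings rather than QFBV terms, flags are omitted, only mov, add, and sub are handled, and every name in it is hypothetical.

```python
def paren(term):
    # Parenthesize compound terms so that nesting stays unambiguous.
    return f"({term})" if " " in term else term


def sym_read(regs, mem, operand):
    """Term for an immediate, a register, or a memory operand like "[esp]"."""
    if isinstance(operand, int):
        return str(operand)
    if operand.startswith("["):
        return f"{mem}({regs[operand[1:-1]]})"
    return regs[operand]


def sym_run(program, reg_names):
    """Symbolically evaluate program starting from the Id state."""
    regs = {r: r.upper() for r in reg_names}  # Id state: EAX' maps to EAX, ...
    mem = "Mem"                               # Id state: Mem' maps to Mem
    for op, dst, src in program:
        value = sym_read(regs, mem, src)
        if op in ("add", "sub"):
            sign = "+" if op == "add" else "-"
            value = f"{paren(sym_read(regs, mem, dst))} {sign} {paren(value)}"
        elif op != "mov":
            raise ValueError(f"unsupported instruction: {op}")
        if dst.startswith("["):
            # A store becomes a function-update expression on the memory term.
            mem = f"{mem}[{regs[dst[1:-1]]} ↦ {value}]"
        else:
            regs[dst] = value
    return regs, mem


def to_formula(regs, mem):
    """Conjoin one equation per register, plus one for memory."""
    conjuncts = [f"{r.upper()}' = {term}" for r, term in regs.items()]
    conjuncts.append(f"Mem' = {mem}")
    return " ∧ ".join(conjuncts)


PROGRAM = [
    ("mov", "eax", "[esp]"),
    ("add", "eax", 4),
    ("sub", "ebx", 4),
    ("mov", "[esp]", "ebx"),
]
sym_regs, sym_mem = sym_run(PROGRAM, ["eax", "ebx", "esp"])
formula = to_formula(sym_regs, sym_mem)
```

For the four-instruction example, the register terms come out as Mem(ESP) + 4 for EAX, EBX - 4 for EBX, and ESP for ESP, and the memory term is an update of Mem at ESP with EBX - 4.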
![Page 38: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/38.jpg)
How to encode an instruction sequence I as a logical formula ϕ?
1. Symbolically evaluate I on the identity (Id) state to obtain post-state σ
2. Convert σ into a logical formula

Example: push eax
Full formula: EAX’ = EAX ∧ EBX’ = EBX ∧ … ∧ ESP’ = ESP – 4 ∧ … ∧ SF’ = SF ∧ … ∧ ZF’ = ZF ∧ Mem’ = Mem[ESP – 4 ↦ EAX]
Abbreviated (identity conjuncts omitted): ESP’ = ESP – 4 ∧ Mem’ = Mem[ESP – 4 ↦ EAX]
![Page 39: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/39.jpg)
Exercise
• Convert the following instruction sequence into a QFBV formula
lea esp, [esp - 4]
mov [esp], 1
push ebp
mov ebp, esp
![Page 40: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/40.jpg)
![Page 41: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/41.jpg)
How to use SMT solver to check equivalence?
• Conventionally used for “satisfiability”
  – Symbolic execution for test-case generation

EAX > 0 ∧ EBX = EAX + 4 ∧ EBX > 0 → SMT solver → SAT [EAX ↦ 1, EBX ↦ 5]
EAX > 0 ∧ EBX = EAX + 4 ∧ EBX ≤ 0 → SMT solver → UNSAT
![Page 42: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/42.jpg)
How to use SMT solver to check equivalence?
• Can also be used to check “validity”
  – Synthesis and superoptimization
• To check if ϕ ⟺ ψ is logically valid, check satisfiability of ¬(ϕ ⟺ ψ)

Example 1:
¬((EAX’ = EAX + EAX ∧ EBX’ = EBX * 2) ⟺ (EAX’ = EAX * 2 ∧ EBX’ = EBX << 1))
→ SMT solver → UNSAT (the two formulas are equivalent)

Example 2:
¬((EAX’ = EAX + EAX ∧ EBX’ = EBX * 2) ⟺ (EAX’ = EAX * 3 ∧ EBX’ = EBX << 1))
→ SMT solver → SAT [EAX ↦ 1, EBX ↦ 1] (counterexample)
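The idea behind the validity check (no input may distinguish the two sides) can be illustrated without a solver by exhaustive enumeration over a small word size. This is a stand-in for the SMT query, not how a solver works: it assumes 8-bit values so that all inputs can be tried, it takes the second conjunct to be a left shift by one (which is what makes it agree with multiplication by two), and the function names are hypothetical. Agreement at 8 bits does not by itself prove agreement at 32 bits.

```python
from itertools import product

WIDTH = 8                  # small word size so every input can be enumerated
MASK = (1 << WIDTH) - 1


def find_counterexample(f, g, num_inputs):
    """Return an input tuple on which f and g differ, or None if none exists."""
    for inputs in product(range(1 << WIDTH), repeat=num_inputs):
        if f(*inputs) != g(*inputs):
            return inputs
    return None


def phi(eax, ebx):       # EAX' = EAX + EAX, EBX' = EBX * 2
    return ((eax + eax) & MASK, (ebx * 2) & MASK)


def psi_good(eax, ebx):  # EAX' = EAX * 2, EBX' = EBX << 1
    return ((eax * 2) & MASK, (ebx << 1) & MASK)


def psi_bad(eax, ebx):   # EAX' = EAX * 3, EBX' = EBX << 1
    return ((eax * 3) & MASK, (ebx << 1) & MASK)


no_cex = find_counterexample(phi, psi_good, 2)   # plays the role of UNSAT
cex = find_counterexample(phi, psi_bad, 2)       # plays the role of SAT
```

Here `no_cex` is `None`, and `cex` is some input with a nonzero EAX, since doubling and tripling agree only when the result wraps to the same value.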
![Page 43: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/43.jpg)
Checking equivalence of two instruction sequences
I:
sub eax, ecx
neg eax
sub eax, 1
ϕ: EAX’ = -1 * (EAX + (-1 * ECX)) - 1 ∧ SF’ = (-1 * (EAX + (-1 * ECX)) - 1) < 0

I’:
not eax
add eax, ecx
ψ: EAX’ = -1 * EAX + ECX - 1 ∧ SF’ = (-1 * EAX + ECX - 1) < 0

Query:
¬((EAX’ = -1 * (EAX + (-1 * ECX)) - 1 ∧ SF’ = (-1 * (EAX + (-1 * ECX)) - 1) < 0) ⟺ (EAX’ = -1 * EAX + ECX - 1 ∧ SF’ = (-1 * EAX + ECX - 1) < 0))
→ SMT solver → UNSAT
![Page 44: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/44.jpg)
Checking equivalence of two instruction sequences
I:
mov eax, [esp]
add eax, ebx
ϕ: EAX’ = Mem(ESP) + EBX ∧ SF’ = (Mem(ESP) + EBX) < 0

I’:
mov eax, [esp]
sub eax, ebx
ψ: EAX’ = Mem(ESP) - EBX ∧ SF’ = (Mem(ESP) - EBX) < 0

Query:
¬((EAX’ = Mem(ESP) + EBX ∧ SF’ = (Mem(ESP) + EBX) < 0) ⟺ (EAX’ = Mem(ESP) - EBX ∧ SF’ = (Mem(ESP) - EBX) < 0))
→ SMT solver → SAT [EBX ↦ 1, ESP ↦ 1000, Mem[1000 ↦ 1]]
![Page 45: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/45.jpg)
![Page 46: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/46.jpg)
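A Naïve Superoptimizer
[Diagram: input I feeds an Instruction-Sequence Enumerator; candidates pass through an Equivalence Check and an Optimality check, with a "No" edge back to the enumerator; the output is I’, or the search ends when the timeout expires.]
Problem: thousands of unique IA-32 instruction schemas, combined with the exponential cost of enumeration.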
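The enumerate-and-check loop can be sketched over a toy instruction set. Everything here is an assumption made for illustration: the six-instruction ISA, the 4-bit word size, exhaustive testing in place of the SMT-based equivalence check, "shortest" as the optimality criterion, and the absence of a timeout.

```python
from itertools import product

WIDTH = 4                      # tiny word size keeps exhaustive checking cheap
MASK = (1 << WIDTH) - 1

# Hypothetical toy ISA: each instruction updates EAX and may read ECX.
ISA = {
    "neg eax":      lambda a, c: -a,
    "not eax":      lambda a, c: ~a,
    "add eax, ecx": lambda a, c: a + c,
    "sub eax, ecx": lambda a, c: a - c,
    "add eax, 1":   lambda a, c: a + 1,
    "sub eax, 1":   lambda a, c: a - 1,
}


def run(seq, eax, ecx):
    """Execute seq and return the final value of EAX."""
    for instr in seq:
        eax = ISA[instr](eax, ecx) & MASK
    return eax


def equivalent(seq1, seq2):
    """Exhaustive stand-in for the equivalence check."""
    return all(run(seq1, a, c) == run(seq2, a, c)
               for a in range(1 << WIDTH) for c in range(1 << WIDTH))


def superoptimize(target):
    """Return the first strictly shorter equivalent sequence, else the target."""
    for length in range(1, len(target)):
        for candidate in product(ISA, repeat=length):
            if equivalent(candidate, target):
                return list(candidate)
    return list(target)


best = superoptimize(["sub eax, ecx", "neg eax", "sub eax, 1"])
```

With this toy ISA and enumeration order, `best` is `["not eax", "add eax, ecx"]`. The number of candidates grows as 6^length even here, which hints at why enumeration over thousands of real instruction schemas is so costly.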
![Page 47: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/47.jpg)
![Page 48: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/48.jpg)
Canonicalization

mov ebp, esp; mov esp, ebp → IA-32 Superoptimizer → mov ebp, esp
mov ebp, eax; mov eax, ebp → IA-32 Superoptimizer → mov ebp, eax
mov esi, esp; mov esp, esi → IA-32 Superoptimizer → mov esi, esp
![Page 49: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/49.jpg)
Canonicalization

mov ebp, esp; mov esp, ebp → IA-32 Superoptimizer → mov ebp, esp
mov ebp, eax; mov eax, ebp → IA-32 Superoptimizer → mov ebp, eax
mov esi, esp; mov esp, esi → IA-32 Superoptimizer → mov esi, esp

All three are instances of one canonical form:
mov reg1, reg2; mov reg2, reg1 → IA-32 Superoptimizer → mov reg1, reg2
![Page 50: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/50.jpg)
Canonicalization
add ebp, 20; add ebp, 30 → IA-32 Superoptimizer → add ebp, 50

Canonical form:
add reg1, c1; add reg1, c2 → IA-32 Superoptimizer → add reg1, c1+c2
![Page 51: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/51.jpg)
Canonicalization
add ebp, 20; add ebp, 30
→ Canonicalizer (reg1 ↦ ebp, c1 ↦ 20, c2 ↦ 30) →
add reg1, c1; add reg1, c2
→ IA-32 Superoptimizer →
add reg1, c1+c2
→ Uncanonicalizer →
add ebp, 50
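The canonicalizer and uncanonicalizer can be sketched as a renaming pass and its inverse. This is a rough illustration: the naming scheme (registers numbered by first use, every constant occurrence given a fresh name), the string-based instruction format, and the folding of only `a+b` constant expressions are all assumptions, not the lecture's implementation.

```python
import re

REGISTERS = {"eax", "ebx", "ecx", "edx", "esp", "ebp", "esi", "edi"}


def canonicalize(seq):
    """Rename registers to reg1, reg2, ... and constants to c1, c2, ...

    Returns the canonical sequence and the inverse map (canonical -> concrete).
    """
    reg_names = {}   # concrete register -> canonical register name
    inverse = {}     # canonical name -> concrete register or constant

    def rename(match):
        token = match.group(0)
        if token in REGISTERS:
            if token not in reg_names:
                reg_names[token] = f"reg{len(reg_names) + 1}"
                inverse[reg_names[token]] = token
            return reg_names[token]
        if token.isdigit():
            # Each constant occurrence gets its own fresh name.
            name = f"c{sum(1 for k in inverse if k.startswith('c')) + 1}"
            inverse[name] = token
            return name
        return token  # opcodes are left untouched

    canonical = [re.sub(r"\w+", rename, instr) for instr in seq]
    return canonical, inverse


def uncanonicalize(seq, inverse):
    """Substitute concrete names back and fold constant sums such as 20+30."""
    def restore(match):
        return inverse.get(match.group(0), match.group(0))

    def fold(match):
        return str(int(match.group(1)) + int(match.group(2)))

    out = []
    for instr in seq:
        concrete = re.sub(r"\w+", restore, instr)
        out.append(re.sub(r"(\d+)\+(\d+)", fold, concrete))
    return out


canonical, inverse = canonicalize(["add ebp, 20", "add ebp, 30"])
restored = uncanonicalize(["add reg1, c1+c2"], inverse)
```

On the slide's example, `canonical` is `["add reg1, c1", "add reg1, c2"]` and `restored` is `["add ebp, 50"]`. Because many concrete sequences share one canonical form, the superoptimizer only needs to handle each canonical form once.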
![Page 52: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/52.jpg)
![Page 53: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/53.jpg)
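A Naïve Superoptimizer + Canonicalization
[Diagram: the naïve superoptimizer loop with canonicalization added. Components: input I, Instruction-Sequence Enumerator, two Canonicalizer stages, Equivalence Check, Optimality check (with a "No" edge back to the enumerator), canonical result I’’, Uncanonicalizer, output I’, and a "Timeout expired" exit.]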
![Page 54: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/54.jpg)
![Page 55: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/55.jpg)
![Page 56: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/56.jpg)
Test Cases
Sequence A: mov reg1, [reg2]; add reg1, reg3
Sequence B: mov reg1, [reg2]; sub reg1, reg3

Test 1 input: [REG3 ↦ 0, REG2 ↦ 1000, Mem[1000 ↦ 1]]
  A output: [REG1 ↦ 1, REG3 ↦ 0, REG2 ↦ 1000, Mem[1000 ↦ 1]]
  B output: [REG1 ↦ 1, REG3 ↦ 0, REG2 ↦ 1000, Mem[1000 ↦ 1]]

Test 2 input: [REG3 ↦ 1, REG2 ↦ 1000, Mem[1000 ↦ 1]]
  A output: [REG1 ↦ 2, REG3 ↦ 1, REG2 ↦ 1000, Mem[1000 ↦ 1]]
  B output: [REG1 ↦ 0, REG3 ↦ 1, REG2 ↦ 1000, Mem[1000 ↦ 1]]
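Running candidates on concrete test inputs before invoking the solver can be sketched as below. The two functions model the final value of REG1 for the two sequences above; the names and the dictionary-based test format are hypothetical, and flags are not modeled.

```python
MASK = 0xFFFFFFFF  # 32-bit wraparound


def add_version(reg2, reg3, mem):
    # mov reg1, [reg2] ; add reg1, reg3
    return (mem[reg2] + reg3) & MASK


def sub_version(reg2, reg3, mem):
    # mov reg1, [reg2] ; sub reg1, reg3
    return (mem[reg2] - reg3) & MASK


TESTS = [
    {"reg2": 1000, "reg3": 0, "mem": {1000: 1}},  # cannot tell add from sub
    {"reg2": 1000, "reg3": 1, "mem": {1000: 1}},  # distinguishes them
]


def passes_tests(target, candidate, tests):
    """True if candidate matches target on every test input."""
    return all(target(**t) == candidate(**t) for t in tests)


only_first = passes_tests(add_version, sub_version, TESTS[:1])
both = passes_tests(add_version, sub_version, TESTS)
```

With only the first test the candidate survives (`only_first` is `True`); with both tests it is rejected (`both` is `False`) without any equivalence query.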
![Page 57: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/57.jpg)
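Test Cases
[Diagram: the superoptimizer loop with a test-case filter added. Components: input I, Instruction-Sequence Enumerator, two Canonicalizer stages, Test cases (with a "Fail" edge that rejects a candidate), Equivalence check, Optimality check (with a "No" edge back to the enumerator), canonical result I’’, Uncanonicalizer, output I’, and a "Timeout expired" exit.]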
![Page 58: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/58.jpg)
Checking equivalence of two instruction sequences
I:
mov reg1, [reg2]
add reg1, reg3
ϕ: REG1’ = Mem(REG2) + REG3 ∧ SF’ = (Mem(REG2) + REG3) < 0

I’:
mov reg1, [reg2]
sub reg1, reg3
ψ: REG1’ = Mem(REG2) – REG3 ∧ SF’ = (Mem(REG2) – REG3) < 0

Query:
¬((REG1’ = Mem(REG2) + REG3 ∧ SF’ = (Mem(REG2) + REG3) < 0) ⟺ (REG1’ = Mem(REG2) – REG3 ∧ SF’ = (Mem(REG2) – REG3) < 0))
→ SMT solver → SAT [REG3 ↦ 1, REG2 ↦ 1000, Mem[1000 ↦ 1]]

The satisfying assignment is a counterexample.
![Page 59: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/59.jpg)
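Test Cases
[Diagram: the same loop, with a "Counter-example" edge from the Equivalence check back into the Test cases. Components: input I, Instruction-Sequence Enumerator, two Canonicalizer stages, Test cases (with a "Fail" edge), Equivalence check, Optimality check (with a "No" edge), canonical result I’’, Uncanonicalizer, output I’, and a "Timeout expired" exit.]
This scheme is counterexample-guided inductive synthesis (CEGIS).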
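The feedback loop in which counterexamples become new tests can be sketched as follows. As before this is an illustration under assumptions: a hypothetical six-instruction toy ISA over EAX and ECX, a 4-bit word size, and an exhaustive `verify` function standing in for the SMT-based equivalence check.

```python
from itertools import product

WIDTH = 4
MASK = (1 << WIDTH) - 1

# Hypothetical toy ISA: each instruction updates EAX and may read ECX.
ISA = {
    "neg eax":      lambda a, c: -a,
    "not eax":      lambda a, c: ~a,
    "add eax, ecx": lambda a, c: a + c,
    "sub eax, ecx": lambda a, c: a - c,
    "add eax, 1":   lambda a, c: a + 1,
    "sub eax, 1":   lambda a, c: a - 1,
}


def run(seq, eax, ecx):
    for instr in seq:
        eax = ISA[instr](eax, ecx) & MASK
    return eax


def verify(candidate, target):
    """Stand-in for the solver: a distinguishing input, or None if equivalent."""
    for a, c in product(range(1 << WIDTH), repeat=2):
        if run(candidate, a, c) != run(target, a, c):
            return (a, c)
    return None


def cegis(target, max_len):
    tests = [(0, 0)]          # initial test suite
    verifier_calls = 0
    for length in range(1, max_len + 1):
        for candidate in product(ISA, repeat=length):
            if any(run(candidate, a, c) != run(target, a, c) for a, c in tests):
                continue      # rejected cheaply by an existing test
            verifier_calls += 1
            cex = verify(candidate, target)
            if cex is None:
                return list(candidate), tests, verifier_calls
            tests.append(cex) # the counterexample becomes a new test
    return None, tests, verifier_calls


found, tests, calls = cegis(["sub eax, ecx", "neg eax", "sub eax, 1"], max_len=2)
```

Each counterexample returned by the verifier rules out not just one candidate but every later candidate that fails on the same input, so the expensive check runs far less often than once per candidate.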
![Page 60: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/60.jpg)
![Page 61: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/61.jpg)
Pruning sub-optimal candidates
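[Figure: three columns of enumerated candidates: one-instruction sequences, two-instruction sequences, and three-instruction sequences. The one-instruction column contains add ebp, c1+c2. The two-instruction column contains add ebp, c1; add ebp, c2, which is sub-optimal because the shorter sequence is equivalent to it.]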
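One way to read the pruning step is that the enumerator skips any longer candidate that contains a sequence already known to be sub-optimal. The sketch below assumes that reading and assumes the set of sub-optimal sequences has been collected while processing shorter lengths; the three-instruction ISA and the function names are hypothetical.

```python
from itertools import product


def contains(seq, sub):
    """True if sub occurs as a contiguous subsequence of seq."""
    n = len(sub)
    return any(tuple(seq[i:i + n]) == tuple(sub)
               for i in range(len(seq) - n + 1))


def enumerate_pruned(isa, length, suboptimal):
    """Yield candidates of the given length that contain no sub-optimal part."""
    for candidate in product(isa, repeat=length):
        if not any(contains(candidate, bad) for bad in suboptimal):
            yield candidate


ISA = ["add reg1, c1", "add reg1, c2", "mov reg1, reg2"]
# Known from shorter lengths: this pair has a one-instruction equivalent.
SUBOPTIMAL = [("add reg1, c1", "add reg1, c2")]

two = list(enumerate_pruned(ISA, 2, SUBOPTIMAL))
three = list(enumerate_pruned(ISA, 3, SUBOPTIMAL))
```

With three instructions there are 9 two-instruction and 27 three-instruction candidates; pruning on the single sub-optimal pair leaves 8 and 21.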
![Page 62: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/62.jpg)
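Superoptimizer
[Diagram: the complete loop. Components: input I, Instruction-Sequence Enumerator + Pruning, two Canonicalizer stages, Test cases (with a "Fail" edge), Equivalence check (with a "Counter-example" edge back into the test cases), Optimality check (with a "No" edge), output I’, and a "Timeout expired" exit.]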
![Page 63: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/63.jpg)
Results – Search-space pruning

| Length | Original search space | After canonicalization | After pruning | Reduction factor |
| --- | --- | --- | --- | --- |
| 1 | 5453 | 997 | 644 | 8.5 |
| 2 | 29 million | 2.49 million | 1.2 million | 24.7 |
| 3 | 162.1 billion | 8.6 billion | 3.11 billion | 52.1 |

[Chart legend: Original search space, After canonicalization, After pruning, After test-cases]
![Page 64: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/64.jpg)
But still ….
Superoptimization is still a sloooooooooowwwww process
After canonicalization, pruning, and testing, checking all candidates of length up to 3 instructions takes several hours
![Page 65: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/65.jpg)
![Page 66: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/66.jpg)
© Eric Schkufza, ASPLOS 2013
![Page 67: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/67.jpg)
© Eric Schkufza, ASPLOS 2013
![Page 68: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/68.jpg)
© Eric Schkufza, ASPLOS 2013
![Page 69: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/69.jpg)
© Eric Schkufza, ASPLOS 2013
Montgomery multiplication kernel in OpenSSL library (116 instructions): STOKE’s result is 16 instructions shorter and 1.6 times faster than gcc –O3’s result
![Page 70: Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081517/5a4d1b257f8b9ab059996edd/html5/thumbnails/70.jpg)