Optimal Polynomial Time Algorithms for Register Assignment
Presented at the Chinese University
of Hong Kong
- Fernando M. Q. Pereira -
August 28th, 2007
University of California, Los Angeles
Background

Register Allocation
Assign physical locations to the variables in a program.
Registers are fast, but few. Memory is large, but slow.
Constraint: variables that are simultaneously alive must be assigned to different physical locations.
If there are not enough registers, some variables must be mapped to memory. These are called spilled variables.

Spill Free Register Allocation (SFRA)
Instance: a program P and K registers.
Problem: can each variable of P be mapped to one of the K registers such that variables simultaneously alive are given different registers?
Liveness? Live Range?
A variable is alive if it can be used in the future.
The live range of a variable is the collection of program points where it is alive.

  a := 1
  b := 2
  c := a
  d := b
  e := c
  a := d
  ret a + e

[Figure: the live ranges of a, b, c, d and e drawn next to program points 1) to 7); a has two separate segments.]
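The live ranges of this straight-line program can be computed with one backward pass. The following is a minimal sketch, assuming a program without branches; the tuple encoding of instructions and the name `live_after` are illustrative choices, not part of the talk.

```python
# Each instruction is (defined_variable_or_None, [used_variables]).
PROGRAM = [
    ("a", []),            # a := 1
    ("b", []),            # b := 2
    ("c", ["a"]),         # c := a
    ("d", ["b"]),         # d := b
    ("e", ["c"]),         # e := c
    ("a", ["d"]),         # a := d
    (None, ["a", "e"]),   # ret a + e
]


def live_after(program):
    """Return, for each instruction, the set of variables alive right after it."""
    live = set()
    result = [None] * len(program)
    # Walk backwards: a use makes a variable alive, a definition ends its range.
    for i in range(len(program) - 1, -1, -1):
        result[i] = set(live)
        defined, used = program[i]
        live.discard(defined)
        live.update(used)
    return result


for point, alive in enumerate(live_after(PROGRAM), start=1):
    print(point, sorted(alive))
```

In this encoding, at most two variables are alive after any instruction, and `a` is alive in two separate stretches of the program.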
Quiz 1
How many registers?

  a := 1
  b := 2
  c := a
  d := b
  e := c
  a := d
  ret a + e

Is there a general algorithm? Is this problem in P or NP?

One assignment, using three registers:

  a(R1) := 1
  b(R2) := 2
  c(R1) := a(R1)
  d(R2) := b(R2)
  e(R3) := c(R1)
  a(R1) := d(R2)
  ret a(R1) + e(R3)
Register Allocation and Graphs
SFRA = Graph coloring [Chaitin81]
SFRA is NP-complete…

  a := 1
  b := 2
  c := a
  d := b
  e := c
  a := d
  ret a + e

[Figure: the live ranges of the program and its interference graph over a, b, c, d and e.]
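To make the link between the program and its graph concrete, the sketch below builds the interference graph of the running example and searches exhaustively for the smallest number of registers. The instruction encoding and the function names are assumptions made for illustration. The search is exponential and only meant for tiny graphs, in line with the NP-completeness noted above.

```python
from itertools import combinations, product

# Each instruction is (defined_variable_or_None, [used_variables]).
PROGRAM = [
    ("a", []), ("b", []), ("c", ["a"]), ("d", ["b"]),
    ("e", ["c"]), ("a", ["d"]), (None, ["a", "e"]),
]


def interference_edges(program):
    """Two variables interfere if they are alive after the same instruction."""
    live = set()
    edges = set()
    for defined, used in reversed(program):
        # Here 'live' holds the variables alive right after this instruction.
        edges.update(frozenset(pair) for pair in combinations(sorted(live), 2))
        live.discard(defined)
        live.update(used)
    return edges


def min_registers(edges):
    """Try K = 1, 2, ... and return the first K with a valid coloring."""
    nodes = sorted(set().union(*edges))
    for k in range(1, len(nodes) + 1):
        for colors in product(range(k), repeat=len(nodes)):
            assign = dict(zip(nodes, colors))
            if all(assign[u] != assign[v] for u, v in edges):
                return k, assign
    return None


edges = interference_edges(PROGRAM)
print(sorted(sorted(e) for e in edges))
print(min_registers(edges)[0])
```

Under this simplified definition of interference, the five variables form the cycle a, b, c, d, e, so two registers are not enough even though at most two variables are alive at any program point.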
Example
Three registers: R1, R2 and R3.
Assignment: a(R1), b(R2), c(R1), d(R3), e(R2).

  a := 1           R1 := 1
  b := 2           R2 := 2
  c := a           R1 := R1
  d := b           R3 := R2
  e := c           R2 := R1
  a := d           R1 := R3
  ret a + e        ret R1 + R2
Live Range Splitting
Live ranges are split via copy instructions and/or renaming of variables.
Splitting may reduce the degree of the interference graph.

  Original program:        After splitting a:
  a := 1                   a1 := 1
  b := 1                   b := 1
  := b                     := b
  c := 1                   a2 := a1
  := a                     c := 1
  := c                     := a2
                           := c

[Figure, panels (a) to (f): the two programs with their live ranges and interference graphs.]
Quiz 2
If I can split live ranges, how many registers?

  a := 1
  b := 2
  c := a
  d := b
  e := c
  a := d
  ret a + e

After splitting a into a1 and a2:

  a1 := 1
  b := 2
  c := a1
  d := b
  e := c
  a2 := d
  ret a2 + e

[Figure: the live ranges and interference graph of a1, b, c, d, e and a2.]
Quiz 3
P or NP?
Instance: a program P and K registers.
Problem: is there a way to split the live ranges of P so that all its variables fit into K registers?
This problem has a polynomial-time solution!
Three independent proofs in 2005:
  Philip Brisk, WLS’05
  Florent Bouchez, INRIA, Master’s thesis
  Sebastian Hack, CC’06
Quiz 4, and a bit of intuition…
Is coloring of circular-arc graphs in P or NP?
Is coloring of interval graphs in P or NP?
[Figure: the live ranges of a, b, c, d and e.]
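For the interval half of the quiz, a greedy sweep over the left endpoints colors an interval graph with the minimum number of colors in polynomial time, whereas coloring circular-arc graphs is NP-complete. The sketch below shows the sweep; the half-open interval encoding, the sample intervals and the function name are assumptions made for illustration.

```python
import heapq


def color_intervals(intervals):
    """Color half-open intervals [start, end) so that overlapping ones differ.

    intervals: dict mapping a name to a (start, end) pair.
    """
    colors = {}
    active = []      # heap of (end, color) for intervals that are still open
    free = []        # heap of colors released by intervals that have ended
    next_color = 0
    for name, (start, end) in sorted(intervals.items(), key=lambda kv: kv[1]):
        # Release the color of every interval that ended at or before 'start'.
        while active and active[0][0] <= start:
            _, color = heapq.heappop(active)
            heapq.heappush(free, color)
        if free:
            color = heapq.heappop(free)
        else:
            color = next_color
            next_color += 1
        colors[name] = color
        heapq.heappush(active, (end, color))
    return colors


# Hypothetical intervals: each one overlaps only its immediate neighbours.
INTERVALS = {"a": (1, 3), "b": (2, 4), "c": (3, 5), "d": (4, 6), "e": (5, 7)}
print(color_intervals(INTERVALS))
```

A new color is opened only when every color already in use belongs to an interval that overlaps the current one, which is why the number of colors equals the largest number of intervals alive at one point.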
Intuition on Live Range Splitting
[Figure: the live ranges and interference graph of a, b, c, d and e, before and after a is split into a1 and a2.]
SSA-Form: the new hope.
Static Single Assignment [CFR+91].
Intermediate program representation.
Each variable is defined only once.

  a1 := 1
  b := 2
  c := a1
  d := b
  e := c
  a2 := d
  ret a2 + e

[Figure: the live ranges and interference graph of a1, b, c, d, e and a2 at program points 1) to 7).]
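For straight-line code, the single-definition property can be obtained by renaming alone. The sketch below is a minimal illustration under that assumption: it handles no branches and therefore needs no phi-functions, it assumes every variable is defined before it is used, and it numbers every variable (not only `a`, as the slide does). The encoding and the function name are hypothetical.

```python
# Each instruction is (defined_variable_or_None, [used_variables]).
PROGRAM = [
    ("a", []), ("b", []), ("c", ["a"]), ("d", ["b"]),
    ("e", ["c"]), ("a", ["d"]), (None, ["a", "e"]),
]


def to_single_assignment(program):
    """Give every definition a fresh name and redirect uses to the latest one."""
    version = {}   # variable -> number of definitions seen so far
    current = {}   # variable -> name of its most recent definition
    renamed = []
    for defined, used in program:
        # Uses refer to the definition that reaches them.
        new_used = [current[v] for v in used]
        new_def = None
        if defined is not None:
            version[defined] = version.get(defined, 0) + 1
            new_def = f"{defined}{version[defined]}"
            current[defined] = new_def
        renamed.append((new_def, new_used))
    return renamed


for definition, uses in to_single_assignment(PROGRAM):
    print(definition, uses)
```

The second definition of `a` becomes `a2`, and the final `ret` reads `a2`, which is exactly the split that breaks the cycle in the interference graph.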
Polynomial time SFRA
[Brisk05, Bouchez05, Hack06]: the interference graph of SSA-form programs is chordal.
Chordal graphs can be colored in polynomial time.
SFRA has a polynomial-time solution for SSA-form programs.
Any program can be converted to SSA-form.
The SSA-form program never requires more registers than the original program.
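One standard way to color a chordal graph optimally is to visit the vertices in maximum cardinality search order and color greedily. The sketch below illustrates that technique; it is not taken from the cited papers, the function names are assumptions, and this simple version runs in quadratic rather than linear time. The example graph is the path formed by a1, b, c, d, e and a2 in the SSA-form program.

```python
def mcs_order(adj):
    """Maximum cardinality search: visit next the vertex with most visited neighbors."""
    weight = {v: 0 for v in adj}
    order = []
    while weight:
        # Ties are broken by name so that the order is deterministic.
        v = max(sorted(weight), key=weight.get)
        order.append(v)
        del weight[v]
        for n in adj[v]:
            if n in weight:
                weight[n] += 1
    return order


def greedy_color(adj, order):
    """Give each vertex the smallest color not used by its colored neighbors."""
    color = {}
    for v in order:
        taken = {color[n] for n in adj[v] if n in color}
        color[v] = next(c for c in range(len(adj)) if c not in taken)
    return color


# Interference graph of the SSA-form example: a path, hence chordal.
ADJ = {
    "a1": {"b"}, "b": {"a1", "c"}, "c": {"b", "d"},
    "d": {"c", "e"}, "e": {"d", "a2"}, "a2": {"e"},
}
coloring = greedy_color(ADJ, mcs_order(ADJ))
print(coloring, len(set(coloring.values())))
```

On a chordal graph, the reverse of the visiting order is a perfect elimination ordering, which is what makes the greedy choice optimal. On a non-chordal graph the same code still returns a valid coloring, but not necessarily a minimum one.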
Quiz 5: RA in basic blocks
A basic block is a sequence of instructions with no branches.
What does the interference graph of an SSA-form basic block look like?
Give a polynomial-time algorithm for register assignment in basic blocks.

Too good, but…
… real computer architectures are a little too surreal…
There are more things in x86, Horatio…
The polynomial time register assignment algorithm is too abstract.
Some computer architectures are messy:
  Pre-colored registers.
  Registers of different sizes.
Testimony: no publicly available implementation for x86 after two years.
Pre-colored registers
Some variables must be assigned to particular registers.
Ex.: calling conventions, division, etc.

Function call (PowerPC):
  a := 10;
  b := 2;
  R0 := a;
  R1 := b;
  call(R0, R1);

Division (x86):
  a := 10;
  b := 2;
  AX := a;
  (AL,AH) := DIV AX, b;
  d := AL; // quotient
  r := AH; // remainder
Quiz 6: pre-coloring extension
Is pre-coloring extension of interval graphs in P or NP?
Pre-coloring extension is NP-complete for interval graphs [Biro92], and even for unit-interval graphs [Marx06].
[Figure labels: easy :) … difficult :(]
Alias Register Allocation
Aliased registers can be used independently, or in combination.
Ex.: x86, Sun SPARC, MIPS floating point numbers, etc.
Ex.: aliased registers in the Pentium:

  32 bits   EAX      EBX      ECX      EDX
  16 bits   AX       BX       CX       DX
  8 bits    AH  AL   BH  BL   CH  CL   DH  DL
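The aliasing in the table above can be modeled by treating each register as the set of storage slots it covers, so that two registers alias exactly when their slot sets overlap. This is a hypothetical model written for illustration; the slot names and the helper `aliases` are not part of the talk.

```python
def build_units():
    """Map each register name to the set of storage slots it occupies."""
    units = {}
    for x in "ABCD":
        low, high, upper = f"{x}L", f"{x}H", f"{x}-upper16"
        units[low] = {low}                        # 8-bit low half, e.g. AL
        units[high] = {high}                      # 8-bit high half, e.g. AH
        units[f"{x}X"] = {low, high}              # 16-bit register, e.g. AX
        units[f"E{x}X"] = {low, high, upper}      # 32-bit register, e.g. EAX
    return units


UNITS = build_units()


def aliases(r1, r2):
    """Two registers alias if they share at least one storage slot."""
    return bool(UNITS[r1] & UNITS[r2])


print(aliases("AX", "AL"))   # AX contains AL
print(aliases("AL", "AH"))   # the two halves of AX are independent
print(aliases("EAX", "BX"))  # different register families
```

In this model a variable held in AX blocks both AL and AH, while two 8-bit variables can share one 16-bit register, which is the situation the puzzles later in the talk are built around.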
Quiz 7: Weighted Coloring
What is the optimal 1-2-coloring of the graph on the left?
[Figure: a graph on the vertices a, b, c, d and e, with two solutions.]
Shipbuilding: a(23), b(0), c(12), d(3), e(1)
Alias RA: a(01), b(2), c(01), d(4), e(3)
Alias Register Allocation
Alias Register Allocation is similar to the shipbuilding problem [Gol04, p. 204].
Alias Register Allocation is NP-complete for interval graphs [LPP07].
And so is the shipbuilding problem...

What can SSA do?
The SSA transformation is too weak to handle alias register allocation and programs with pre-colored variables.
Register Allocation by Puzzle Solving
Polynomial time 1-2-coloring extension with live range splitting.
Aliased Register Allocation with Pre-coloring
Instance: a program P containing variables that are either short or long, 2K available registers, plus a partial function φ that associates variables with registers. Long variables are assigned two registers {2i, 2i+1}, 0 ≤ i < K, and short variables are assigned one register.
Problem: is it possible to extend φ so that it constitutes a valid register allocation of P? The register allocator is allowed to split live ranges.
In other words…
Optimal spill free register allocation.
x86, Ultra SPARC, MIPS, PowerPC, … as far as I know, any register based architecture.
Heuristics for spilling.

Heuristics for spilling?
Optimal solution for spill free register allocation.
If it is not possible to find a register assignment for program P, some variables of P must be stored in memory.
Finding the minimum number of variables that must be spilled is NP-complete.
Finding the largest K-colorable induced subgraph of a chordal graph is NP-complete [Yannakakis87].
[PP07] - The Main Ideas
Elementary programs and elementary graphs.
Elementary programs have elementary interference graphs.
Any well structured program can be converted to an elementary program.
Each connected component of an elementary graph is a clique substitution of P3.
Elementary Programs
P is an elementary program if:
1. P is strict
2. P is in static single assignment form
3. For any variable v of P, LR(v) contains at most one program point outside the basic block that contains def(v)
4. If two variables u,v of P interfere, then either def(u) = def(v), or kill(u) = kill(v)
5. If two variables u,v of P interfere, then either LR(u) ⊆ LR(v), or LR(v) ⊆ LR(u)
[Figure: (a) Strict program; (b) Elementary program; interference graph.]
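Clique Substitution of P3
P3 is a path with three vertices.
P3[K2, K2, K3] is obtained from P3 by replacing its three vertices with the cliques K2, K2 and K3; vertices of cliques that replace adjacent vertices are all connected to each other.
[Figure: P3, K2, K3 and P3[K2, K2, K3], with the three parts labeled X clique, Y clique and Z clique.]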
Elementary Graphs
Definition: G is an elementary graph if and only if every connected component of G is a clique substitution of P3.
Theorem: An elementary program has an elementary interference graph.
Aligned 1-2-coloring extension
Instance: a graph G with nodes that are either short or long, 2K available colors, plus a partial function φ that associates nodes with colors. Long nodes are assigned two colors {2i, 2i+1}, 0 ≤ i < K, and short nodes are assigned one.
Problem: is it possible to extend φ so that it constitutes a valid coloring of G?
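The problem statement can be made concrete with a brute-force search that tries every aligned choice for the nodes left uncolored by the partial function. This is only a sketch to pin down the definition: it is exponential in the number of uncolored nodes, it assumes the given pre-coloring is itself aligned, and the function names and the small example graph are hypothetical.

```python
from itertools import product


def options(is_long, K):
    """Aligned choices: a long node takes a pair {2i, 2i+1}, a short node one color."""
    if is_long:
        return [frozenset({2 * i, 2 * i + 1}) for i in range(K)]
    return [frozenset({c}) for c in range(2 * K)]


def extend(nodes, edges, long_nodes, pre, K):
    """Return a full coloring that extends 'pre', or None if none exists.

    pre maps some nodes to frozensets of colors (the pre-coloring).
    """
    todo = [v for v in nodes if v not in pre]
    for choice in product(*(options(v in long_nodes, K) for v in todo)):
        coloring = dict(pre)
        coloring.update(zip(todo, choice))
        # Adjacent nodes must not share any color.
        if all(coloring[u].isdisjoint(coloring[v]) for u, v in edges):
            return coloring
    return None


# Hypothetical instance: a triangle with one long node and one pre-colored node.
NODES = ["x", "y", "z"]
EDGES = [("x", "y"), ("y", "z"), ("x", "z")]
PRE = {"y": frozenset({0})}
print(extend(NODES, EDGES, {"x"}, PRE, K=2))
print(extend(NODES, EDGES, {"x"}, PRE, K=1))
```

With four colors the long node `x` must take {2, 3}, because the pair {0, 1} collides with the pre-colored `y`; with two colors there is no extension at all.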
Graph Hierarchy
The Puzzles
The Board:
The Pieces:
From graphs to puzzles
Given P_{X,Y,Z} we build a puzzle:
  Vertex → piece
  Color → column
  X-clique → upper row
  Y-clique → both rows
  Z-clique → lower row
  Pre-coloring → some pieces are already on the board
Theorem: Aligned 1-2-coloring extension for clique substitutions of P3 and puzzle solving are equivalent under linear-time reductions.
Rules, Patterns and matches
match
Don’t match
Example Program
Our Solution
Counter-example 1
Lesson: use a size-2 piece before two size-1 pieces
Counter-example 2
Lesson: statements 7-10 must come before statements 11-14
Counter-example 3
Lesson: statement 15 must come before statements 11-14
Counter-example 4
Lesson: the order in statements 11-14 is crucial
Running Complexity
Theorem: a puzzle is solvable if, and only if, our program succeeds on the puzzle.
Our puzzle solving program runs in linear time.
Spilling
Visit each puzzle once.
If the puzzle is not solvable, then remove some pieces and try to solve again.
Each time we remove a piece, we also remove all other pieces that stem from the same variable in the original program.
Spill farthest use first.
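The retry loop described above can be sketched as follows. The puzzle solver itself is not reproduced here: `solve` is a placeholder for it, and `toy_solve` is a stand-in that succeeds whenever at most K variables remain. The sketch tracks whole variables rather than individual pieces, so removing a variable stands for removing all of its pieces, and it assumes the solver succeeds once nothing is left to place.

```python
def allocate_with_spilling(variables, next_use, solve):
    """Retry 'solve', spilling the variable with the farthest next use each time.

    next_use maps each variable to the position of its next use.
    solve(variables) returns an assignment, or None if it cannot place them all.
    """
    remaining = set(variables)
    spilled = []
    while True:
        solution = solve(remaining)
        if solution is not None:
            return solution, spilled
        # Farthest use first; ties are broken by name for determinism.
        victim = max(sorted(remaining), key=lambda v: next_use[v])
        remaining.remove(victim)
        spilled.append(victim)


def toy_solve(variables, K=2):
    """Stand-in for the puzzle solver: succeeds if at most K variables remain."""
    if len(variables) > K:
        return None
    return {v: f"R{i}" for i, v in enumerate(sorted(variables))}


solution, spilled = allocate_with_spilling(
    "abcd", {"a": 1, "b": 5, "c": 3, "d": 9}, toy_solve
)
print(solution, spilled)
```

Choosing the variable whose next use is farthest away keeps in registers the values that are needed soonest, which is the intuition behind the heuristic named on the slide.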
Experimental Results
The puzzle solver has been implemented in the LLVM [CV04] framework.
Compile C programs to x86 target.
Over one million lines of code compiled!
We have compared our allocator with LLVM’s default algorithm and a well-known graph coloring heuristic.
Benchmarks

Benchmark               LoC       Asm       btcode
ASCI Purple:smg2000     74,875    73,039    303,037
SPEC2000:175.vpr        70,253    52,917    173,475
SPEC2000:188.ammp       54,335    35,567    149,245
MallocBench:espresso    52,853    45,041    250,770
SPEC2000:197.parser     49,388    32,849    163,025
SPEC2000:164.gzip       39,157     8,130     46,188
(six more)              …         …         …
Total                   409,540   286,900   1,345,898
Types of Puzzles
Number of Iterations

Benchmark               Puzzles   Avg    Max   Once
ASCI Purple:smg2000     52,791    1.33   8     33,822
SPEC2000:175.vpr        47,276    1.10   10    45,575
SPEC2000:188.ammp       33,428    1.09   9     28,515
MallocBench:espresso    43,791    1.06   3     38,925
SPEC2000:197.parser     30,868    1.05   4     28,992
SPEC2000:164.gzip       7,840     1.06   3     6,718
(six more)              …         …      …     …
Total                   251,428   1.13   10    213,411
Execution Time of Generated Code
Data normalized with respect to GCC -O2.
Conclusion
If you want to do register allocation for the Pentium, your problem is to solve a collection of puzzles.
Fast compilation time, competitive code quality.
Many possible directions for future research.