LLVM Register Allocation (2nd Version)
wang-hsiangkai
Outline
• Introduction to Register Allocation Problem
• LLVM Register Allocation Template Method
• LLVM Basic Register Allocation
• LLVM Greedy Register Allocation
Introduction to Register Allocation
• Definition
• Register allocation is the problem of mapping program variables to either machine registers or memory addresses.
• Best solution: minimises the number of loads/stores from/to memory
• Finding the best solution is NP-complete
int main() {
  int i, j;
  int answer;
  for (i = 1; i < 10; i++)
    for (j = 1; j < 10; j++) {
      answer = i * j;
    }
  return 0;
}
_main:
@ BB#0:                 @ %entry
    sub   sp, #16
    movs  r0, #0
    str   r0, [sp, #12]
    movs  r0, #1
    str   r0, [sp, #8]
    b     LBB0_2
LBB0_1:                 @ %for.inc.4
                        @   in Loop: Header=BB0_2 Depth=1
    adds  r1, #1
    str   r1, [sp, #8]
LBB0_2:                 @ %for.cond
                        @ =>This Loop Header: Depth=1
                        @   Child Loop BB0_5 Depth 2
    ldr   r1, [sp, #8]
    cmp   r1, #9
    bgt   LBB0_6
@ BB#3:                 @ %for.body
                        @   in Loop: Header=BB0_2 Depth=1
    str   r0, [sp, #4]
    b     LBB0_5
LBB0_4:                 @ %for.body.3
                        @   in Loop: Header=BB0_5 Depth=2
    ldr   r2, [sp, #4]
    muls  r1, r2, r1
    str   r1, [sp]
    ldr   r1, [sp, #4]
    adds  r1, #1
    str   r1, [sp, #4]
Graph Coloring
• For an arbitrary graph G, a coloring of G assigns a color to each node in G so that no pair of adjacent nodes has the same color.
[Figure: examples of a 2-colorable graph and a 3-colorable graph]
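The definition can be checked mechanically. Below is a minimal C++ sketch, not code from LLVM. `greedyColor` gives each node the smallest color not used by an already-colored neighbour. `isValidColoring` tests the "no two adjacent nodes share a color" condition. The names and the adjacency-list representation are assumptions of this sketch. Greedy coloring always produces a valid coloring, but not necessarily one with the fewest colors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Adjacency list: Graph[n] lists the neighbours of node n (nodes are 0..size-1).
using Graph = std::vector<std::vector<int>>;

// Greedy coloring: give each node the smallest color not already used by a
// colored neighbour. Always valid, but not guaranteed to use the fewest colors.
std::vector<int> greedyColor(const Graph &g) {
  std::vector<int> color(g.size(), -1);  // -1 = not colored yet
  for (std::size_t n = 0; n < g.size(); ++n) {
    std::vector<bool> used(g.size() + 1, false);
    for (int m : g[n])
      if (color[m] >= 0)
        used[color[m]] = true;
    int c = 0;
    while (used[c])
      ++c;
    color[n] = c;
  }
  return color;
}

// A coloring is valid when no pair of adjacent nodes has the same color.
bool isValidColoring(const Graph &g, const std::vector<int> &color) {
  assert(color.size() == g.size());
  for (std::size_t n = 0; n < g.size(); ++n)
    for (int m : g[n])
      if (color[n] == color[m])
        return false;
  return true;
}
```

For example, the 4-cycle 0-1-2-3-0 is colored 0, 1, 0, 1 (2-colorable), while a triangle needs three colors.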
Graph Coloring for RA
• Node: live interval
• Edge: two live intervals interfere
• Color: physical register
• Find an optimal colouring for the graph
[Figure: a four-block CFG (B0; B1 and B2; B3). The first panel defines and uses the values a0, b0, c0, d0 and d1 (with d1 = c0, and later uses of a0 and d1). The second panel renames them to the live intervals LIa, LIb, LIc and LId (LId = LIc). The third panel shows the interference graph over LIa, LIb, LIc and LId.]
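The mapping from live intervals to a graph can be sketched in C++ under one simplifying assumption: each live interval is a single half-open range [start, end) of instruction slots. Real LLVM LiveIntervals are lists of segments. Two intervals interfere when their ranges overlap, and each interfering pair becomes an edge. `Interval`, `interferes` and `buildInterferenceGraph` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplified live interval: a single half-open slot range [start, end).
struct Interval {
  unsigned start, end;
};

// Two live intervals interfere when their ranges overlap.
bool interferes(const Interval &a, const Interval &b) {
  assert(a.start <= a.end && b.start <= b.end);
  return a.start < b.end && b.start < a.end;
}

// Node i of the interference graph is live interval i; an edge joins every
// pair of interfering intervals. The result can be colored with physical registers.
std::vector<std::vector<int>> buildInterferenceGraph(const std::vector<Interval> &lis) {
  std::vector<std::vector<int>> g(lis.size());
  for (std::size_t i = 0; i < lis.size(); ++i)
    for (std::size_t j = i + 1; j < lis.size(); ++j)
      if (interferes(lis[i], lis[j])) {
        g[i].push_back(static_cast<int>(j));
        g[j].push_back(static_cast<int>(i));
      }
  return g;
}
```

Because the ranges are half-open, [6, 9) and [9, 12) do not interfere: one interval ends exactly where the other begins.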
LLVM Register Allocation
• Basic
  • Provides a minimal implementation of the basic register allocator.
• Greedy
  • Global live range splitting.
• Fast
  • Allocates registers one basic block at a time.
• PBQP
  • Partitioned Boolean Quadratic Programming (PBQP) based register allocator for LLVM.
Template Method
• Define the skeleton of an algorithm in an operation, deferring some steps to subclasses.
LLVM Register Allocation Template Method
• seedLiveRegs: enqueue all live intervals into the queue Q.
• allocatePhysRegs: dequeue one live interval at a time and call selectOrSplit on it. selectOrSplit is the step customised by each new RA algorithm.
  • If a physical register is available, assign it.
  • If the live interval was split, enqueue the split live intervals into Q.

seedLiveRegs:
for (unsigned i = 0, e = MRI->getNumVirtRegs(); i != e; ++i) {
  unsigned Reg = TargetRegisterInfo::index2VirtReg(i);
  // Skip virtual registers that have no non-debug uses or defs.
  if (MRI->reg_nodbg_empty(Reg))
    continue;
  enqueue(&LIS->getInterval(Reg));
}
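Below is a minimal C++ sketch of the template method described above. It is a simplified model, not LLVM's RegAllocBase. Live intervals are plain integer ids, and the driver loop (`seedLiveRegs` then `allocatePhysRegs`) is fixed in the base class. `enqueue`, `dequeue` and `selectOrSplit` are the steps a concrete allocator overrides. The class names and the convention that 0 means "no register" are assumptions made for illustration.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <vector>

using VReg = unsigned;
using PhysReg = unsigned;  // assumption of this sketch: 0 means "no register found"

// Hypothetical base class: the allocation loop is fixed here (the template
// method); enqueue, dequeue and selectOrSplit are left to concrete allocators.
class RegAllocSketch {
public:
  virtual ~RegAllocSketch() = default;

  void run(const std::vector<VReg> &vregs) {
    seedLiveRegs(vregs);
    allocatePhysRegs();
  }

  std::map<VReg, PhysReg> Assignment;

protected:
  virtual void enqueue(VReg v) = 0;
  virtual bool dequeue(VReg &v) = 0;  // false when the queue is empty
  // Returns a physical register, or 0 after pushing any split products into splitVRegs.
  virtual PhysReg selectOrSplit(VReg v, std::vector<VReg> &splitVRegs) = 0;

private:
  void seedLiveRegs(const std::vector<VReg> &vregs) {
    for (VReg v : vregs)
      enqueue(v);
  }

  void allocatePhysRegs() {
    VReg v = 0;
    while (dequeue(v)) {
      std::vector<VReg> splitVRegs;
      PhysReg p = selectOrSplit(v, splitVRegs);
      if (p != 0)
        Assignment[v] = p;  // a physical register is available
      for (VReg s : splitVRegs)
        enqueue(s);         // split live intervals go back into the queue
    }
  }
};

// A trivial FIFO allocator: hands out registers 1..NumRegs in order and
// leaves later virtual registers unassigned (they would be spilled).
class FifoAllocator : public RegAllocSketch {
public:
  explicit FifoAllocator(PhysReg numRegs) : NumRegs(numRegs) {}

protected:
  void enqueue(VReg v) override { Q.push_back(v); }
  bool dequeue(VReg &v) override {
    if (Q.empty())
      return false;
    v = Q.front();
    Q.pop_front();
    return true;
  }
  PhysReg selectOrSplit(VReg, std::vector<VReg> &) override {
    return Next <= NumRegs ? Next++ : 0;
  }

private:
  std::deque<VReg> Q;
  PhysReg NumRegs;
  PhysReg Next = 1;
};
```

A new allocator only changes the queue policy and the selectOrSplit step; the loop itself never changes. That is the point of the pattern.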
Basic Register Allocation
LLVM Basic Register Allocation
• Calculate the weight (spill cost) of every LiveInterval.
• seedLiveRegs: enqueue all live intervals into a priority queue ordered by spill cost.
• allocatePhysRegs: dequeue a live interval and call RABasic::selectOrSplit, the step customised by the RABasic algorithm.
  • If a physical register is available, assign it.
  • Otherwise split the live interval, update LiveInterval.weight (spill cost) for the new live intervals, and enqueue them.
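struct CompSpillWeight {
  // Priority-queue comparator: std::priority_queue pops its largest element,
  // so the live interval with the highest spill weight is dequeued first.
  bool operator()(LiveInterval *A, LiveInterval *B) const {
    return A->weight < B->weight;
  }
};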
1. Assign physical registers to the live intervals with the highest spill cost first.
2. If there is no physical register for the current live interval, compare it with its interferences: whichever has the highest spill cost gets the physical register.
3. Spill the unassigned live intervals.
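The three steps can be sketched in C++ under several assumptions. A live interval is a single [start, end) range with a precomputed, non-negative weight. Every interval may use any of `numRegs` interchangeable registers. There is no splitting, so an interval that still finds no register is spilled. `LI` and `allocateBasic` are hypothetical names. LLVM's RABasic instead queries interference through its own data structures and can split.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical simplified live interval: one [start, end) range plus its spill weight.
struct LI {
  unsigned start, end;
  float weight;
};

static bool overlaps(const LI &a, const LI &b) {
  return a.start < b.end && b.start < a.end;
}

// Returns each interval's register (0..numRegs-1), or -1 if it was spilled.
std::vector<int> allocateBasic(const std::vector<LI> &lis, int numRegs) {
  // Max-heap on weight, like a priority queue ordered with CompSpillWeight.
  auto lighter = [&lis](std::size_t a, std::size_t b) { return lis[a].weight < lis[b].weight; };
  std::priority_queue<std::size_t, std::vector<std::size_t>, decltype(lighter)> q(lighter);
  for (std::size_t i = 0; i < lis.size(); ++i)
    q.push(i);
  std::vector<int> reg(lis.size(), -1);

  while (!q.empty()) {
    std::size_t cur = q.top();  // step 1: highest spill weight first
    q.pop();
    int freeReg = -1, evictReg = -1;
    float evictCost = lis[cur].weight;  // may only evict strictly lighter intervals
    for (int r = 0; r < numRegs && freeReg < 0; ++r) {
      bool busy = false;
      float heaviest = 0;
      for (std::size_t o = 0; o < lis.size(); ++o)
        if (reg[o] == r && overlaps(lis[cur], lis[o])) {
          busy = true;
          heaviest = std::max(heaviest, lis[o].weight);
        }
      if (!busy)
        freeReg = r;
      else if (heaviest < evictCost) {
        evictReg = r;
        evictCost = heaviest;
      }
    }
    if (freeReg >= 0) {
      reg[cur] = freeReg;
    } else if (evictReg >= 0) {
      // Step 2: cur outweighs every interference on evictReg, so it takes the register.
      for (std::size_t o = 0; o < lis.size(); ++o)
        if (reg[o] == evictReg && overlaps(lis[cur], lis[o])) {
          reg[o] = -1;
          q.push(o);  // evicted intervals are retried later
        }
      reg[cur] = evictReg;
    }
    // Step 3: otherwise cur stays unassigned (-1), i.e. it is spilled.
  }
  return reg;
}
```

In this toy setting all intervals are dequeued in decreasing weight order and nothing creates new intervals, so the eviction branch never fires. In a full allocator it matters for split products, which get fresh weights and are enqueued later.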
LiveInterval Weight
• Weight for one instruction that reads or writes the register:
  • weight = (isDef + isUse) * (Block Frequency / Entry Frequency)
  • loop induction variable: weight *= 3
• Sum over all instructions with the register: totalWeight += weight
• Register has an allocation hint: totalWeight *= 1.01
• Re-materializable: totalWeight *= 0.5
• LiveInterval.weight = totalWeight / size of LiveInterval
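The weight formula above can be written as a small C++ function. It follows the slide's simplified formula; LLVM's own normalisation by interval size differs in detail. The `RegAccess` record, its `loopInductionVar` flag and the function name are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical record of one instruction that reads and/or writes the register.
struct RegAccess {
  bool isDef, isUse;
  double blockFreq;       // frequency of the instruction's basic block
  bool loopInductionVar;  // treat this access as a loop induction variable update
};

// Follows the slide's simplified formula, not LLVM's exact normalisation.
double liveIntervalWeight(const std::vector<RegAccess> &accesses, double entryFreq,
                          bool hasHint, bool isRematerializable, double size) {
  assert(entryFreq > 0 && size > 0);
  double totalWeight = 0;
  for (const RegAccess &a : accesses) {
    double weight = (a.isDef + a.isUse) * (a.blockFreq / entryFreq);
    if (a.loopInductionVar)
      weight *= 3;
    totalWeight += weight;
  }
  if (hasHint)
    totalWeight *= 1.01;
  if (isRematerializable)
    totalWeight *= 0.5;
  return totalWeight / size;
}
```

Worked example:
- A def in a block with frequency 1 contributes 1.
- A use in a block with frequency 8 contributes 8.
- An induction-variable update that both reads and writes in that block contributes 2 * 8 * 3 = 48.

That gives totalWeight = 57. With a size of 19 the weight is 3.0, or 1.5 if the value is rematerializable.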
Greedy Register Allocation
• Example (assign physical registers by length): physical registers Q0 (D0, D1) and Q1 (D2, D3); virtual registers V1 to V5.
• No physical register is available for V1.
• Evict V2 (evict the live interval with the lower spill cost).
• Split V2 into V2a, V2b and V2c.
[Figure: at each step, the live ranges of V1 to V5 (and later V2a, V2b, V2c) are placed on Q0 (D0, D1) and Q1 (D2, D3), with a stack area for what cannot get a register.]
Greedy RA Stages
• RS_New: created
• RS_Assign: enqueued
• RS_Split: needs to be split
• RS_Split2: used for split products that may not be making progress
• RS_Spill: needs to be spilled
• RS_Done: assigned a physical register or created by spilling
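RS_Split2
• Live intervals created by a split are enqueued and processed again.
• This risks an infinite loop: a split product could be split again and again without getting smaller. RS_Split2 marks such products.
[Figure: vreg1 is used three times. Splitting creates vreg2 = COPY vreg1 (one use of vreg2) and vreg3 = COPY vreg1 (two uses of vreg3). The split products are labelled with the stages RS_New and RS_Split2.]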
Greedy Register Allocation
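selectOrSplit for one live interval:
1. Try to assign a physical register. If a register is found, done.
2. Try to evict interfering live intervals to find a better register.
3. If stage < RS_Split: enter the RS_Split stage (the live interval is enqueued again).
4. If the stage is RS_Split or RS_Split2: split.
5. If stage >= RS_Done or the live interval is unspillable: try last chance recoloring. Pick a physical register and evict all interference; the evicted live intervals are handled by selectOrSplit(d+1), called from selectOrSplit(d).
6. Otherwise: spill.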
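The order of checks can be summarised in a short C++ sketch. This is a model of the flow above, not LLVM's selectOrSplit code. The stage names follow the Greedy RA Stages list. The `Action` enum and the boolean inputs are hypothetical stand-ins for the outcome of each attempt.

```cpp
#include <cassert>

enum LiveRangeStage { RS_New, RS_Assign, RS_Split, RS_Split2, RS_Spill, RS_Done };

enum class Action { Assign, Evict, RequeueForSplit, Split, LastChanceRecolor, Spill };

// Hypothetical summary of the order in which selectOrSplit tries its options.
// The boolean inputs stand in for the outcome of each attempt.
Action selectOrSplitAction(LiveRangeStage stage, bool freeRegFound, bool evictionFound,
                           bool splitSucceeded, bool spillable) {
  if (freeRegFound)
    return Action::Assign;
  if (evictionFound)
    return Action::Evict;
  if (stage < RS_Split)
    return Action::RequeueForSplit;  // set the stage to RS_Split and enqueue again
  if (stage < RS_Spill && splitSucceeded)
    return Action::Split;            // only reached in RS_Split or RS_Split2
  if (stage >= RS_Done || !spillable)
    return Action::LastChanceRecolor;
  return Action::Spill;
}
```

Splitting is deferred on purpose. A live interval that fails its first attempt is only marked RS_Split and re-enqueued, so other intervals get a chance before the expensive split is tried.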
Last Chance Recoloring
• Try to assign a physical register to a live interval by evicting all its interferences.
• The recoloring process may recursively use last chance recoloring. Therefore, when a virtual register has been assigned a color by this mechanism, it is marked as Fixed.
• Example (vA, vB and vC all interfere): vA can use {R1, R2}, vB can use {R2, R3}, vC can use {R1}.
  • Before: vA => R1, vB => R2, vC => fails
  • After recoloring: vA => R2, vB => R3, vC => R1 (fixed)
  • The evicted registers are reallocated by selectOrSplit(d + 1), called from selectOrSplit(d).
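The example can be replayed with a small recursive C++ sketch. Assumptions: every virtual register interferes with every other one, as in the example, so a physical register holds at most one of them. Registers are the integers 1 to 3. Every vreg recolored along the recursion chain is marked fixed. The depth cut-off is arbitrary; LLVM also bounds this recursion with a depth limit. `Recolorer`, `holder` and `recolor` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of the example above: every virtual register interferes
// with every other one, so each physical register holds at most one of them.
struct Recolorer {
  std::map<std::string, std::vector<int>> allowed;  // vreg -> usable physical registers
  std::map<std::string, int> assigned;              // vreg -> physical register
  std::set<std::string> fixed;                      // vregs recolored by this mechanism

  // Returns the vreg currently holding physical register reg, or nullptr.
  const std::string *holder(int reg) const {
    for (const auto &kv : assigned)
      if (kv.second == reg)
        return &kv.first;
    return nullptr;
  }

  // Give the unassigned vreg v a register, evicting the current holder and
  // recoloring it recursively (selectOrSplit(d) -> selectOrSplit(d + 1)).
  bool recolor(const std::string &v, unsigned depth) {
    assert(assigned.count(v) == 0);
    if (depth > 8)  // arbitrary cut-off for the recursion
      return false;
    for (int r : allowed.at(v)) {
      const std::string *h = holder(r);
      if (!h) {  // register is free
        assigned[v] = r;
        return true;
      }
      if (fixed.count(*h))  // a fixed vreg must not be moved again
        continue;
      const std::string evicted = *h;  // copy before erasing the map entry
      const std::map<std::string, int> savedAssigned = assigned;
      const std::set<std::string> savedFixed = fixed;
      assigned.erase(evicted);
      assigned[v] = r;
      fixed.insert(v);
      if (recolor(evicted, depth + 1))
        return true;
      assigned = savedAssigned;  // undo and try the next register
      fixed = savedFixed;
    }
    return false;
  }
};
```

Walkthrough:
- vC takes R1 and evicts vA.
- vA cannot take R1 back, because vC is now fixed, so it takes R2 and evicts vB.
- vB finds R3 free.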
How to Split?
• Has the stage reached RS_Spill (or beyond)? Yes: spill.
• Is the live interval in one BB?
  • Yes: tryLocalSplit. If it fails, tryInstructionSplit. If that fails too, spill.
  • No: if the stage is less than RS_Split2, tryRegionSplit. If it fails (or the stage is already RS_Split2), tryBlockSplit. If that fails too, spill.
• Whenever a split succeeds: done.
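The decision tree for splitting can be expressed as a short C++ function. Each strategy is modelled as a hypothetical callback that reports whether it managed to split the interval. The `SplitHooks` struct and the returned strings are illustrative, not LLVM's API.

```cpp
#include <cassert>
#include <functional>
#include <string>

enum LiveRangeStage { RS_New, RS_Assign, RS_Split, RS_Split2, RS_Spill, RS_Done };

// Hypothetical hooks: each returns true if that splitting strategy succeeded.
struct SplitHooks {
  std::function<bool()> tryLocalSplit, tryInstructionSplit, tryRegionSplit, tryBlockSplit;
};

// Returns the strategy that split the live interval, or "spill".
std::string trySplit(LiveRangeStage stage, bool inOneBB, const SplitHooks &h) {
  if (stage >= RS_Spill)
    return "spill";
  if (inOneBB) {
    if (h.tryLocalSplit())
      return "local";
    if (h.tryInstructionSplit())
      return "instruction";
    return "spill";
  }
  if (stage < RS_Split2 && h.tryRegionSplit())  // no repeated region splitting
    return "region";
  if (h.tryBlockSplit())
    return "block";
  return "spill";
}
```

A product of a region split that is marked RS_Split2 skips tryRegionSplit and goes straight to tryBlockSplit. This is how the stage prevents endless region splitting.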
tryLocalSplit
• Try to split the virtual register's live interval into smaller intervals inside its only basic block:
  • calculate gap weights
  • adjust the split region
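Calculate Gap Weights
• A gap is the space between two consecutive instructions (the def and its uses) of the live interval. With one def and four uses, NumGaps = 4.
• Every gap weight starts at 0.
• If the live interval of a virtual register (VirtReg) occupies the physical register inside a gap, the gap weight becomes LI.weight of that interval (the maximum over all such intervals).
• If a fixed physical register interferes inside a gap, the gap weight is huge_valf.
[Figure: a def followed by four uses, with each gap labelled 0, LI.weight or huge_valf.]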
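Gap-weight computation can be sketched in C++ under simplifying assumptions. Slots are plain integers. Each interference on the candidate physical register is a single [start, end) range carrying either the LI.weight of a virtual register or a "fixed" flag. `Interference` and `calcGapWeights` are hypothetical stand-ins for the bookkeeping in LLVM's tryLocalSplit.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical interference on the candidate physical register inside the block.
struct Interference {
  unsigned start, end;  // half-open slot range [start, end)
  float weight;         // LI.weight of an interfering virtual register
  bool fixed;           // true for a fixed physical-register live range
};

// uses: sorted slots of the def and the uses; gap g lies between uses[g] and uses[g + 1].
std::vector<float> calcGapWeights(const std::vector<unsigned> &uses,
                                  const std::vector<Interference> &interf) {
  assert(uses.size() >= 2);
  std::vector<float> gap(uses.size() - 1, 0.0f);  // NumGaps = number of def/uses - 1
  for (std::size_t g = 0; g < gap.size(); ++g)
    for (const Interference &i : interf)
      if (i.start < uses[g + 1] && i.end > uses[g])  // overlaps gap g
        gap[g] = i.fixed ? HUGE_VALF : std::max(gap[g], i.weight);
  return gap;
}
```

Example: a def and four uses at slots 0, 4, 8, 12 and 16 give four gaps. Interferences of weight 2.5 in the second gap, weights 1.0 and 3.0 in the third gap, and a fixed register in the last gap produce the gap weights 0, 2.5, 3.0 and huge_valf.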
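Adjust Split Region
• Start with SplitBefore = 0 and SplitAfter = 1.
• Is the normalised spill weight of the region greater than the max gap weight inside it?
  • Yes: Diff = normalised spill weight - max gap. If Diff > BestDiff: BestDiff = Diff, BestBefore = SplitBefore, BestAfter = SplitAfter. Then SplitAfter++ (extend the region).
  • No: SplitBefore++ (shrink the region).
• normalised spill weight = spill cost / distance = (#gap * block_freq) / distance(SplitBefore, SplitAfter)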
• The new live intervals start in stage RS_New (or RS_Split2 when the split may not be making progress).
• Go through all physical registers and find the most critical range.
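The adjust-split-region loop described above can be sketched as a sliding window over the gaps. Assumptions: the estimate is the slide's (#gap * block_freq) / distance, with distance measured in slots between uses[SplitBefore] and uses[SplitAfter], and the uses are strictly increasing. When a one-gap window fails, the window simply slides forward. LLVM's tryLocalSplit uses a slightly different estimate, so treat this as an illustration of the search rather than a port. `SplitRange` and `adjustSplitRegion` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct SplitRange {
  std::size_t before, after;  // best window [BestBefore, BestAfter] in use indices
  bool found;
};

// Sliding-window search for the region whose normalised spill weight exceeds
// the heaviest gap inside it by the largest margin (BestDiff).
SplitRange adjustSplitRegion(const std::vector<unsigned> &uses,      // slot of each def/use
                             const std::vector<float> &gapWeight,    // gap i: uses[i]..uses[i+1]
                             float blockFreq) {
  assert(gapWeight.size() + 1 == uses.size());
  const std::size_t numGaps = gapWeight.size();
  SplitRange best{0, 0, false};
  float bestDiff = 0;
  std::size_t splitBefore = 0, splitAfter = 1;
  while (splitAfter <= numGaps) {
    float maxGap = *std::max_element(gapWeight.begin() + splitBefore,
                                     gapWeight.begin() + splitAfter);
    float est = (splitAfter - splitBefore) * blockFreq /
                float(uses[splitAfter] - uses[splitBefore]);
    if (est > maxGap) {  // Yes: record the best region, then extend it
      float diff = est - maxGap;
      if (diff > bestDiff) {
        bestDiff = diff;
        best = {splitBefore, splitAfter, true};
      }
      ++splitAfter;
    } else if (splitBefore + 1 < splitAfter) {  // No: shrink from the left
      ++splitBefore;
    } else {  // a one-gap window failed: slide past it
      ++splitBefore;
      ++splitAfter;
    }
  }
  return best;
}
```

With uses at slots 0, 4, 8, 12, 16 and block_freq 16, every window has an estimated weight of 4. With gap weights 5, 0.5, 1, 5, the best region is the single cheap gap between uses 1 and 2. If every gap weighs 5, no region is worth splitting.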
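tryRegionSplit
• Use a Hopfield network to find good split points, i.e. where the live interval should be in a register and where on the stack.
• The network is guaranteed to converge, but only to a local minimum.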
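Hopfield Network
$$ a(t)_{s\times 1} = \begin{cases} p_{s\times 1}, & t = 0 \\ S\left(W_{s\times s}\, a(t-1)_{s\times 1} + b_{s\times 1}\right), & t \ge 1 \end{cases} $$
$$ S(x) = \begin{cases} +1, & x \ge \theta \\ -1, & x < \theta \end{cases} $$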
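The network can be sketched in C++ with small dense matrices. The sketch departs from the formula in one deliberate way: it updates nodes one at a time (asynchronously) instead of all at once. The convergence guarantee holds for that variant when W is symmetric with a zero diagonal, whereas the synchronous update can oscillate between two states. `Vec`, `Mat`, `S` and `runHopfield` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

// S(x) = +1 if x >= theta, -1 otherwise.
double S(double x, double theta) { return x >= theta ? 1.0 : -1.0; }

// Starts from a = a(0) = p and applies a_i <- S((W a)_i + b_i) one node at a
// time until no node changes. With symmetric W and a zero diagonal this
// asynchronous update always stops, at a local minimum of the network energy.
Vec runHopfield(const Mat &W, const Vec &b, Vec a, double theta) {
  assert(W.size() == a.size() && b.size() == a.size());
  bool changed = true;
  while (changed) {
    changed = false;
    for (std::size_t i = 0; i < a.size(); ++i) {
      double x = b[i];
      for (std::size_t j = 0; j < a.size(); ++j)
        x += W[i][j] * a[j];
      double v = S(x, theta);
      if (v != a[i]) {
        a[i] = v;
        changed = true;
      }
    }
  }
  return a;
}
```

Two nodes joined by a positive weight end up agreeing (both +1 in the first test case). A negative weight pushes them to opposite values.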
tryRegionSplit
1. For every physical register, construct a Hopfield network:
  • initialize the border constraints
  • initialize the Hopfield network nodes according to the border constraints
  • add links to the Hopfield network and iterate
2. Get the best candidate.
3. Do the region split.
Initialize Border Constraints
• No interference:
  Entry = LiveIn ? PrefReg : DontCare;
  Exit = LiveOut ? PrefReg : DontCare;
enum BorderConstraint { DontCare, PrefReg, PrefSpill, PrefBoth, MustSpill };
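Initialize Border Constraints
• There are interferences. Compare the interference with FirstInstr and LastInstr, the first and last instructions that use the live interval in the block.
  • Entry: interference already present at the block entry gives MustSpill; interference starting before FirstInstr gives PrefSpill; otherwise PrefReg/DontCare is kept.
  • Exit: interference still present at the last split point gives MustSpill; interference ending after LastInstr gives PrefSpill; otherwise PrefReg/DontCare is kept.
[Figure: the MustSpill, PrefSpill and PrefReg/DontCare cases drawn against FirstInstr and LastInstr.]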
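The border-constraint rules can be collected into one C++ function. Assumptions: slots are plain integers, the interference in the block is summarised by its first and last slot, and `BlockInfo`, `Constraint` and `borderConstraints` are hypothetical names. The rules encoded are the ones stated for these slides:
- No interference: Entry is PrefReg if live in and Exit is PrefReg if live out, otherwise DontCare.
- Entry with interference: MustSpill if the interference is already there at block entry, PrefSpill if it starts before FirstInstr.
- Exit with interference: MustSpill if the interference reaches the last split point, PrefSpill if it ends after LastInstr.

```cpp
#include <cassert>

enum BorderConstraint { DontCare, PrefReg, PrefSpill, PrefBoth, MustSpill };

// Hypothetical per-block summary; slots are plain integers here.
struct BlockInfo {
  bool liveIn, liveOut;
  unsigned blockStart, lastSplitPoint;  // block entry slot, last legal split slot
  unsigned firstInstr, lastInstr;       // first/last instruction using the vreg
};

struct Constraint {
  BorderConstraint entry, exit;
};

// hasIntf == false: no interference; otherwise it spans [intfFirst, intfLast].
Constraint borderConstraints(const BlockInfo &bi, bool hasIntf,
                             unsigned intfFirst, unsigned intfLast) {
  Constraint c{bi.liveIn ? PrefReg : DontCare, bi.liveOut ? PrefReg : DontCare};
  if (!hasIntf)
    return c;
  if (bi.liveIn) {
    if (intfFirst <= bi.blockStart)
      c.entry = MustSpill;  // register is busy right at block entry
    else if (intfFirst < bi.firstInstr)
      c.entry = PrefSpill;  // busy before the first use: arrive on the stack
  }
  if (bi.liveOut) {
    if (intfLast >= bi.lastSplitPoint)
      c.exit = MustSpill;   // busy up to the point where we could still split
    else if (intfLast > bi.lastInstr)
      c.exit = PrefSpill;   // busy after the last use: leave on the stack
  }
  return c;
}
```

The asymmetry between MustSpill and PrefSpill reflects cost. With PrefSpill a register is still possible, at the price of a spill or reload inside the block. With MustSpill there is no place left to insert one.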
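Edge Bundle
[Figure: CFG with blocks BB #0 to BB #6 and edges BB#0 → BB#1, BB#1 → BB#2, BB#1 → BB#6, BB#2 → BB#3, BB#3 → BB#4, BB#3 → BB#5, BB#4 → BB#3, BB#5 → BB#1.]
// Node 2 * n is the entry (in) of BB#n and node 2 * n + 1 (OutE) is its exit (out).
// Join the outgoing bundle with the ingoing bundles of all successors.
for (MachineBasicBlock::const_succ_iterator SI = MBB.succ_begin(), SE = MBB.succ_end();
     SI != SE; ++SI)
  EC.join(OutE, 2 * (*SI)->getNumber());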
EC (columns: node, EC index, EC[index], bundle number after compress):
(BB#0, in)   Bundle #0:    0   0        0
(BB#0, out)  Bundle #1:    1   1        1
(BB#1, in)   Bundle #2:    2   1        1
(BB#1, out)  Bundle #3:    3   3        2
(BB#2, in)   Bundle #4:    4   3        2
(BB#2, out)  Bundle #5:    5   5        3
(BB#3, in)   Bundle #6:    6   5        3
(BB#3, out)  Bundle #7:    7   7        4
(BB#4, in)   Bundle #8:    8   7        4
(BB#4, out)  Bundle #9:    9   5        3
(BB#5, in)   Bundle #10:  10   7        4
(BB#5, out)  Bundle #11:  11   11 -> 1  1
(BB#6, in)   Bundle #12:  12   3        2
(BB#6, out)  Bundle #13:  13   13       5
(EC[11] changes from 11 to 1 when the out-node of BB#5 is joined with the in-node of BB#1.)
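void join(unsigned a, unsigned b) {
  unsigned eca = EC[a];
  unsigned ecb = EC[b];
  // Walk up both chains, re-pointing the entry on the side with the larger
  // value at the smaller one, until both sides reach the same leader.
  while (eca != ecb)
    if (eca < ecb)
      EC[b] = eca, b = ecb, ecb = EC[b];
    else
      EC[a] = ecb, a = eca, eca = EC[a];
}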
Edge Bundle
Blocks per bundle after compress:
Bundle #0: BB#0
Bundle #1: BB#0, BB#1, BB#5
Bundle #2: BB#1, BB#2, BB#6
Bundle #3: BB#2, BB#3, BB#4
Bundle #4: BB#3, BB#4, BB#5
Bundle #5: BB#6
Bundles #6 to #13: no blocks
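The bundle table can be reproduced with a short, self-contained C++ sketch. `IntEqClassesSketch` is a hypothetical stand-in for LLVM's IntEqClasses. Its `join` is the function shown above, and `compress` renumbers the classes 0, 1, 2, ... in order of their smallest member, which yields the last column of the table. The successor lists encode the CFG read off the bundle table.

```cpp
#include <cassert>
#include <vector>

// Union-find in the style of the join() above: EC[x] points to a smaller-or-equal
// member of x's class, and each class is led by its smallest member.
struct IntEqClassesSketch {
  std::vector<unsigned> EC;

  explicit IntEqClassesSketch(unsigned n) : EC(n) {
    for (unsigned i = 0; i < n; ++i)
      EC[i] = i;  // every node starts in its own class
  }

  void join(unsigned a, unsigned b) {
    unsigned eca = EC[a], ecb = EC[b];
    while (eca != ecb)
      if (eca < ecb) {
        EC[b] = eca; b = ecb; ecb = EC[b];
      } else {
        EC[a] = ecb; a = eca; eca = EC[a];
      }
  }

  // Renumber classes 0, 1, 2, ... in order of their leaders; returns the count.
  unsigned compress() {
    unsigned num = 0;
    for (unsigned i = 0; i < EC.size(); ++i)
      EC[i] = (EC[i] == i) ? num++ : EC[EC[i]];
    return num;
  }
};

// Node 2*n is the entry of block n and node 2*n+1 its exit; returns the bundle
// number of every node and stores the number of bundles in numBundles.
std::vector<unsigned> edgeBundles(const std::vector<std::vector<unsigned>> &succs,
                                  unsigned &numBundles) {
  IntEqClassesSketch ec(2 * succs.size());
  for (unsigned bb = 0; bb < succs.size(); ++bb)
    for (unsigned s : succs[bb])
      ec.join(2 * bb + 1, 2 * s);  // out-node of bb joins in-node of successor
  numBundles = ec.compress();
  return ec.EC;
}
```

For the CFG on these slides (BB#0 → BB#1, BB#1 → BB#2/BB#6, BB#2 → BB#3, BB#3 → BB#4/BB#5, BB#4 → BB#3, BB#5 → BB#1) this gives six bundles, matching the "Blocks per bundle" list.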
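Initialize Hopfield Network Node
• Update BiasN and BiasP according to the BorderConstraint.
• Example: BB #n (frequency freq) uses Y.
  • Its entry constraint is PrefReg, so its ingoing bundle ib gets BiasP += freq.
  • Its exit constraint is PrefSpill, so its outgoing bundle ob gets BiasN += freq.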
void addBias(BlockFrequency freq, BorderConstraint direction) {
  switch (direction) {
  default:
    break;
  case PrefReg:
    BiasP += freq;
    break;
  case PrefSpill:
    BiasN += freq;
    break;
  case MustSpill:
    BiasN = BlockFrequency::getMaxFrequency(); // (uint64_t)-1ULL
    break;
  }
}
Add Links to Hopfield Network
• Add weights to the links. When the live interval is live through BB #n (frequency freq), its bundles ib and ob are linked in both directions: ib gets the link (freq, ob) and ob gets the link (freq, ib).
void addLink(unsigned b, BlockFrequency w) {
  // Update cached sum.
  SumLinkWeights += w;

  // There can be multiple links to the same bundle, add them up.
  for (LinkVector::iterator I = Links.begin(), E = Links.end(); I != E; ++I)
    if (I->second == b) {
      I->first += w;
      return;
    }
  // This must be the first link to b.
  Links.push_back(std::make_pair(w, b));
}
Update Hopfield Network
• Example: Bundle X (BiasN, BiasP, Value = 0) has the links (freqA, A), (freqB, B), (freqC, C) and (freqD, D). The neighbour values are Bundle A: Value = -1, Bundle B: Value = 1, Bundle C: Value = 1, Bundle D: Value = 1.
• SumN = BiasN + freqA
• SumP = BiasP + freqB + freqC + freqD
if (SumN >= SumP + Threshold)
  Value = -1;
else if (SumP >= SumN + Threshold)
  Value = 1;
else
  Value = 0;
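In the matrix form a(t) = S(W a(t-1) + b), with the nodes ordered A, B, C, D, X, the row for X is:
$$ \begin{bmatrix} \cdots \\ \cdots \\ \cdots \\ \cdots \\ F_A & F_B & F_C & F_D & 0 \end{bmatrix} \begin{bmatrix} -1 \\ 1 \\ 1 \\ 1 \\ 0 \end{bmatrix} + \begin{bmatrix} \vdots \\ \mathrm{Bias}_P - \mathrm{Bias}_N \end{bmatrix} $$
X's entry is therefore (Bias_P + F_B + F_C + F_D) - (Bias_N + F_A) = SumP - SumN, which is exactly the quantity the update rule compares against Threshold.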
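Below is a self-contained C++ sketch of one Hopfield node that combines the addBias, addLink and update logic shown above. Assumptions: block frequencies are plain `uint64_t` values, and saturating addition stands in for BlockFrequency arithmetic. `Node`, `satAdd` and `MaxFreq` are illustrative names, not LLVM's SpillPlacement classes.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

enum BorderConstraint { DontCare, PrefReg, PrefSpill, PrefBoth, MustSpill };

const uint64_t MaxFreq = std::numeric_limits<uint64_t>::max();

// Saturating addition, standing in for BlockFrequency arithmetic.
uint64_t satAdd(uint64_t a, uint64_t b) { return a > MaxFreq - b ? MaxFreq : a + b; }

// Hypothetical Hopfield node for one edge bundle.
struct Node {
  uint64_t BiasN = 0, BiasP = 0;
  int Value = 0;  // +1: prefer register, -1: prefer stack, 0: undecided
  std::vector<std::pair<uint64_t, unsigned>> Links;  // (weight, neighbour bundle)

  void addBias(uint64_t freq, BorderConstraint direction) {
    switch (direction) {
    case PrefReg:
      BiasP = satAdd(BiasP, freq);
      break;
    case PrefSpill:
      BiasN = satAdd(BiasN, freq);
      break;
    case MustSpill:
      BiasN = MaxFreq;
      break;
    default:
      break;
    }
  }

  void addLink(unsigned b, uint64_t w) {
    for (auto &link : Links)
      if (link.second == b) {  // several links to the same bundle add up
        link.first = satAdd(link.first, w);
        return;
      }
    Links.push_back({w, b});
  }

  // Neighbours with Value -1 vote for the stack, +1 for a register, 0 abstain.
  void update(const std::vector<Node> &nodes, uint64_t threshold) {
    uint64_t SumN = BiasN, SumP = BiasP;
    for (const auto &link : Links) {
      if (nodes[link.second].Value == -1)
        SumN = satAdd(SumN, link.first);
      else if (nodes[link.second].Value == 1)
        SumP = satAdd(SumP, link.first);
    }
    if (SumN >= satAdd(SumP, threshold))
      Value = -1;
    else if (SumP >= satAdd(SumN, threshold))
      Value = 1;
    else
      Value = 0;
  }
};
```

Design notes:
- The threshold creates a dead zone. A node whose sums are nearly balanced stays at 0 instead of flipping back and forth.
- MustSpill forces BiasN to the maximum frequency, so no combination of links can pull the node to +1.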
Region Split
• splitLiveThroughBlock
• splitRegInBlock
• splitRegOutBlock
splitLiveThroughBlock
• Live through, LiveOut on stack (Bundle ib Value == 1, Bundle ob Value != 1): the new interval (New Int) runs from Start only to the first non-PHI instruction, where the value leaves the register.
• Live through, LiveIn on stack (Bundle ib Value != 1, Bundle ob Value == 1): the new interval starts at the last split point and runs to End.
• Live through, no interference (Bundle ib Value == 1, Bundle ob Value == 1): one new interval from Start to End.
• Live through with non-overlapping or overlapping interference (Bundle ib Value == 1, Bundle ob Value == 1): one new interval ends before Interference.first() and another starts after Interference.last().
[Figure: each case drawn as a block with Start, End and the new intervals.]
splitRegInBlock
• No LiveOut, interference after the kill (Bundle ib Value == 1): one new interval from Start to the kill.
• LiveOut on stack, interference after the last use (Bundle ib Value == 1, Bundle ob Value != 1): the new interval runs from Start to LastInstr, and the value is spilled after the last use. The two panels place Interference.first() before or after the last split point.
• LiveOut on stack, interference overlapping uses (Bundle ib Value == 1, Bundle ob Value != 1): one new interval runs from Start to just before Interference.first(). The remaining uses, up to LastInstr and the last split point, get another new interval.
[Figure: each case drawn with Start, LastInstr, the last split point, Interference.first() and the new intervals.]
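splitRegOutBlock
• No LiveIn, interference before the def (Bundle ob Value == 1): one new interval from the def to End.
• Live through, interference before the def (Bundle ib Value != 1, Bundle ob Value == 1): the new interval runs from FirstInstr to End.
• Live through, interference overlapping uses (Bundle ib Value != 1, Bundle ob Value == 1): one new interval covers the uses from FirstInstr up to Interference.last(), and a second new interval runs from after Interference.last() to End.
[Figure: each case drawn with FirstInstr, the last split point, Interference.last(), End and the new intervals.]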