Dealing With Register Hierarchies -...
![Page 1: Dealing With Register Hierarchies - LLVMllvm.org/devmtg/2016-11/Slides/Braun-DealingWithRegister...%2 = add %0, %1 return %2 r0 = const 5 r1 = const 7 r0 = add r0, r1 return r0 Register](https://reader036.fdocuments.net/reader036/viewer/2022071015/5fce0fae01e843551a605654/html5/thumbnails/1.jpg)
Dealing with Register Hierarchies
[Figure: FP Register hierarchy: Q0 overlaps D0 and D1; D0 overlaps S0 and S1; D1 overlaps S2 and S3]
Matthias Braun (MatzeB) / LLVM Developers' Meeting 2016
[Figure: 4 Tuple Class: r0,r1,r2,r3; r1,r2,r3,r4; r2,r3,r4,r5; r3,r4,r5,r6; r4,r5,r6,r7; r5,r6,r7,r8; r6,r7,r8,r9; ...]
[Figure: overlapping registers that contain r3: r0;r1;r2;r3, r1;r2;r3;r4, r1;r2;r3, r2;r3;r4, r3;r4;r5, r2;r3, r3;r4, r3]
![Page 2: Dealing With Register Hierarchies - LLVMllvm.org/devmtg/2016-11/Slides/Braun-DealingWithRegister...%2 = add %0, %1 return %2 r0 = const 5 r1 = const 7 r0 = add r0, r1 return r0 Register](https://reader036.fdocuments.net/reader036/viewer/2022071015/5fce0fae01e843551a605654/html5/thumbnails/2.jpg)
Register Allocation
• Rewrite program with unlimited number of virtual registers to use actual registers
• Techniques: Interference Checks, Assignment, Spilling, Splitting, Rematerialization
Before (virtual registers):
    %0 = const 5
    %1 = const 7
    %2 = add %0, %1
    return %2
After (actual registers):
    r0 = const 5
    r1 = const 7
    r0 = add r0, r1
    return r0
![Page 3: Dealing With Register Hierarchies - LLVMllvm.org/devmtg/2016-11/Slides/Braun-DealingWithRegister...%2 = add %0, %1 return %2 r0 = const 5 r1 = const 7 r0 = add r0, r1 return r0 Register](https://reader036.fdocuments.net/reader036/viewer/2022071015/5fce0fae01e843551a605654/html5/thumbnails/3.jpg)
Register Allocation for GPUs
• Hundreds of registers available, but using fewer increases parallelism
• Mix of Scalar (single value) and Vector (multiple values) operations
• Load/Store instructions work on multiple registers (high latency, high throughput)
    r[0:3] = load_x4     # Load r0, r1, r2, r3
    r4 = add r0, 1
    r5 = add r1, 2
    r6 = add r2, 3
    r7 = add r3, 4
    store_x4 r[4:7]      # Store r4, r5, r6, r7
![Page 6: Dealing With Register Hierarchies - LLVMllvm.org/devmtg/2016-11/Slides/Braun-DealingWithRegister...%2 = add %0, %1 return %2 r0 = const 5 r1 = const 7 r0 = add r0, r1 return r0 Register](https://reader036.fdocuments.net/reader036/viewer/2022071015/5fce0fae01e843551a605654/html5/thumbnails/6.jpg)
Liveness Tracking
• Linearize program
• Number instructions consecutively (SlotIndexes)
• Liveness as sorted list of intervals (segments)
    SlotIdx 0 1 2:        %0 = def; cmp ...; jeq b2
    SlotIdx 3 4 5:    b1: %1 = const 5; jmp b3
    SlotIdx 6 7 8:    b2: store %0; %1 = def
    SlotIdx 9 10:     b3: %2 = add %1, 1
[Figure: liveness of %0, %1, %2 drawn against SlotIdx]
    …
    %1: [4:6)[8:9)[9:10)
    …
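The "sorted list of segments" idea can be sketched in a few lines of Python. This is only an illustration, not LLVM's `LiveInterval` class: the class name, the method names and the tuple representation are assumptions made for this example. Segments are half-open `[start, end)` pairs, as in the `%1` line of the slide.

```python
from bisect import bisect_right


class LiveInterval:
    """Liveness of one virtual register: sorted half-open segments [start, end)."""

    def __init__(self, segments):
        self.segments = sorted(segments)

    def live_at(self, slot):
        # Binary search for the last segment that starts at or before `slot`.
        i = bisect_right(self.segments, (slot, float("inf"))) - 1
        return i >= 0 and self.segments[i][0] <= slot < self.segments[i][1]

    def overlaps(self, other):
        # Walk both sorted lists in lockstep, like the merge step of merge sort.
        a, b = self.segments, other.segments
        i = j = 0
        while i < len(a) and j < len(b):
            if a[i][0] < b[j][1] and b[j][0] < a[i][1]:
                return True
            if a[i][1] <= b[j][1]:
                i += 1
            else:
                j += 1
        return False


# %1 from the slide: [4:6)[8:9)[9:10)
v1 = LiveInterval([(4, 6), (8, 9), (9, 10)])
```

Because the list is sorted, a point query is a binary search and an interference check between two registers is a single linear walk over both lists.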
![Page 7: Dealing With Register Hierarchies - LLVMllvm.org/devmtg/2016-11/Slides/Braun-DealingWithRegister...%2 = add %0, %1 return %2 r0 = const 5 r1 = const 7 r0 = add r0, r1 return r0 Register](https://reader036.fdocuments.net/reader036/viewer/2022071015/5fce0fae01e843551a605654/html5/thumbnails/7.jpg)
Modeling Register Hierarchies
![Page 8: Dealing With Register Hierarchies - LLVMllvm.org/devmtg/2016-11/Slides/Braun-DealingWithRegister...%2 = add %0, %1 return %2 r0 = const 5 r1 = const 7 r0 = add r0, r1 return r0 Register](https://reader036.fdocuments.net/reader036/viewer/2022071015/5fce0fae01e843551a605654/html5/thumbnails/8.jpg)
Tuple Registers
    r[0:3] = load_x4
    r4 = add r0, 1
    r5 = add r1, 2
    r6 = add r2, 3
    r7 = add r3, 4
    store_x4 r[4:7]
![Page 10: Dealing With Register Hierarchies - LLVMllvm.org/devmtg/2016-11/Slides/Braun-DealingWithRegister...%2 = add %0, %1 return %2 r0 = const 5 r1 = const 7 r0 = add r0, r1 return r0 Register](https://reader036.fdocuments.net/reader036/viewer/2022071015/5fce0fae01e843551a605654/html5/thumbnails/10.jpg)
Tuple Registers
    %0,%1,%2,%3 = load_x4
    %4 = add %0, 1
    %5 = add %1, 2
    %6 = add %2, 3
    %7 = add %3, 4
    store_x4 %4,%5,%6,%7
❌ No relation between virtual registers, but they need to be consecutive
![Page 12: Dealing With Register Hierarchies - LLVMllvm.org/devmtg/2016-11/Slides/Braun-DealingWithRegister...%2 = add %0, %1 return %2 r0 = const 5 r1 = const 7 r0 = add r0, r1 return r0 Register](https://reader036.fdocuments.net/reader036/viewer/2022071015/5fce0fae01e843551a605654/html5/thumbnails/12.jpg)
Tuple Registers
    %0 = load_x4
    %1.sub0 = add %0.sub0, 1
    %1.sub1 = add %0.sub1, 2
    %1.sub2 = add %0.sub2, 3
    %1.sub3 = add %0.sub3, 4
    store_x4 %1
• Register class contains tuples
• Allocator picks a single (tuple) register
• Parts called subregisters or lanes
• Select parts with subregister index (.xxx Syntax)
[Figure: 4 Tuple Class: r0,r1,r2,r3; r1,r2,r3,r4; r2,r3,r4,r5; r3,r4,r5,r6; r4,r5,r6,r7; r5,r6,r7,r8; r6,r7,r8,r9; ...]
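A tuple class like the one in the figure can be pictured as a sliding window over the scalar registers. The Python sketch below is an illustration only. In LLVM such classes are declared in TableGen, and the helper names `tuple_class` and `subregister` are made up for this example.

```python
def tuple_class(registers, width):
    """All tuples of `width` consecutive registers (sliding window)."""
    return [tuple(registers[i:i + width])
            for i in range(len(registers) - width + 1)]


def subregister(tuple_reg, index):
    """Select one part (lane) of a tuple register by its subregister index."""
    return tuple_reg[index]


scalars = [f"r{i}" for i in range(10)]   # r0 .. r9
quad = tuple_class(scalars, 4)           # the "4 Tuple Class"
```

Here `quad[0]` is the tuple `r0,r1,r2,r3`, and `subregister(quad[0], 2)` plays the role of `.sub2` on a virtual register that was assigned that tuple.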
![Page 15: Dealing With Register Hierarchies - LLVMllvm.org/devmtg/2016-11/Slides/Braun-DealingWithRegister...%2 = add %0, %1 return %2 r0 = const 5 r1 = const 7 r0 = add r0, r1 return r0 Register](https://reader036.fdocuments.net/reader036/viewer/2022071015/5fce0fae01e843551a605654/html5/thumbnails/15.jpg)
Construction
• reg_sequence defines multiple subregisters (for SSA) (there is also insert_subreg, extract_subreg)
    %0 = load
    %1 = const 42
    %2 = reg_sequence %0, sub0, %1, sub1
    store_x2 %2
• TwoAddressInstruction pass translates to copy sequence
    %0 = load
    %1 = const 42
    %2.sub0<undef> = copy %0
    %2.sub1 = copy %1
    store_x2 %2
• RegisterCoalescing pass eliminates copies
    %2.sub0<undef> = load
    %2.sub1 = const 42
    store_x2 %2
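The translation from reg_sequence to a copy sequence can be sketched as a small string rewrite. This is a toy model in Python, not the TwoAddressInstruction pass. The function name and the textual instruction format are assumptions, and marking only the first write `<undef>` simply mirrors the copy sequence shown on the slide.

```python
def lower_reg_sequence(dst, operands):
    """Turn (source, subregister index) pairs into one copy per subregister.

    The first copy is flagged <undef> because nothing in `dst` is live yet,
    so the other lanes do not need to be preserved by that write.
    """
    copies = []
    for n, (src, sub) in enumerate(operands):
        flag = "<undef>" if n == 0 else ""
        copies.append(f"{dst}.{sub}{flag} = copy {src}")
    return copies


copies = lower_reg_sequence("%2", [("%0", "sub0"), ("%1", "sub1")])
```

The result is the two `copy` lines of the slide. Once the copies exist, a coalescer can merge `%0` and `%1` into the lanes of `%2`.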
![Page 16: Dealing With Register Hierarchies - LLVMllvm.org/devmtg/2016-11/Slides/Braun-DealingWithRegister...%2 = add %0, %1 return %2 r0 = const 5 r1 = const 7 r0 = add r0, r1 return r0 Register](https://reader036.fdocuments.net/reader036/viewer/2022071015/5fce0fae01e843551a605654/html5/thumbnails/16.jpg)
Improving Register Allocation
![Page 18: Dealing With Register Hierarchies - LLVMllvm.org/devmtg/2016-11/Slides/Braun-DealingWithRegister...%2 = add %0, %1 return %2 r0 = const 5 r1 = const 7 r0 = add r0, r1 return r0 Register](https://reader036.fdocuments.net/reader036/viewer/2022071015/5fce0fae01e843551a605654/html5/thumbnails/18.jpg)
Subregister Liveness
    %0 = load_x4
    %1.sub0 = add %0.sub0, 1
    %1.sub1 = add %0.sub1, 2
    %1.sub2 = add %0.sub2, 3
    %1.sub3 = add %0.sub3, 4
    store_x4 %1
[Figure: live ranges of %0 and %1]
![Page 19: Dealing With Register Hierarchies - LLVMllvm.org/devmtg/2016-11/Slides/Braun-DealingWithRegister...%2 = add %0, %1 return %2 r0 = const 5 r1 = const 7 r0 = add r0, r1 return r0 Register](https://reader036.fdocuments.net/reader036/viewer/2022071015/5fce0fae01e843551a605654/html5/thumbnails/19.jpg)
Subregister Liveness
    %0 = load_x4
    %1.sub0 = add %0.sub0, 1
    %1.sub1 = add %0.sub1, 2
    %1.sub2 = add %0.sub2, 3
    %1.sub3 = add %0.sub3, 4
    store_x4 %1
[Figure: live ranges of %0 and %1, tracked separately for sub0, sub1, sub2, sub3]
Can allocate %0 and %1 to the same register tuple
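Why per-lane tracking allows the two registers to share a tuple can be checked with a small Python sketch. It assumes the six instructions of the example are numbered 0 to 5 and that a lane is live from its definition up to (but not including) its last use. The numbering and all names are choices made for this illustration.

```python
def overlaps(a, b):
    """Half-open ranges [start, end) overlap."""
    return a[0] < b[1] and b[0] < a[1]


# Lane k of %0 is defined by the load (0) and last read by instruction k + 1.
v0 = {k: (0, k + 1) for k in range(4)}
# Lane k of %1 is defined by instruction k + 1 and read by the store (5).
v1 = {k: (k + 1, 5) for k in range(4)}


def whole(lanes):
    """Live range when the register is tracked as one unit."""
    return (min(s for s, _ in lanes.values()),
            max(e for _, e in lanes.values()))


def interferes_per_lane(x, y):
    """Only the same lane of both registers can clash."""
    return any(overlaps(x[k], y[k]) for k in x if k in y)


whole_register_clash = overlaps(whole(v0), whole(v1))
per_lane_clash = interferes_per_lane(v0, v1)
```

Tracked as whole registers the ranges `[0, 4)` and `[1, 5)` overlap, so two tuples are needed. Tracked per lane, each lane of `%0` dies exactly where the same lane of `%1` is born, so no lane clashes.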
![Page 23: Dealing With Register Hierarchies - LLVMllvm.org/devmtg/2016-11/Slides/Braun-DealingWithRegister...%2 = add %0, %1 return %2 r0 = const 5 r1 = const 7 r0 = add r0, r1 return r0 Register](https://reader036.fdocuments.net/reader036/viewer/2022071015/5fce0fae01e843551a605654/html5/thumbnails/23.jpg)
Subregister Liveness: Lane Masks
• Lane Mask: 1 bit per subregister
• Annotate subregister liveness parts with lane mask
• Start with whole virtual register; Split and refine as necessary
Lane Masks:
    sub0:      0b0001
    sub1:      0b0010
    sub2:      0b0100
    sub1_sub2: 0b0110
    sub3:      0b1000
    all:       0b1111
    %0 = load_x4
    store_x4 %0
    %1 = load_x4
    %1.sub0 = const 13
    %1.sub3 = const 42
    store_x4 %1
[Figure: live ranges annotated with lane masks. %0: 1111. %1: 0101, 0001, 1000]
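Lane masks make "which parts are live" a matter of bit operations. The Python sketch below runs a backward scan over the `%1` example and records which lanes are live after each instruction. The program encoding as `(text, defined lanes, used lanes)` and the function name are assumptions for illustration, not LLVM data structures.

```python
SUB0, SUB1, SUB2, SUB3 = 0b0001, 0b0010, 0b0100, 0b1000
SUB1_SUB2 = SUB1 | SUB2   # combined index: union of the lane bits
ALL = 0b1111

# (instruction text, lanes of %1 defined, lanes of %1 used)
program = [
    ("%1 = load_x4",       ALL,  0),
    ("%1.sub0 = const 13", SUB0, 0),
    ("%1.sub3 = const 42", SUB3, 0),
    ("store_x4 %1",        0,    ALL),
]


def live_lanes_after(program):
    """Backward scan: a lane is live after an instruction if a later
    instruction reads it before any instruction redefines it."""
    live = 0
    result = [0] * len(program)
    for i in reversed(range(len(program))):
        _, defs, uses = program[i]
        result[i] = live
        live = (live & ~defs) | uses
    return result


after = live_lanes_after(program)
```

Under these assumptions only `sub1` and `sub2` of the loaded value are live after the load, because `sub0` and `sub3` are overwritten before the store reads them.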
![Page 32: Dealing With Register Hierarchies - LLVMllvm.org/devmtg/2016-11/Slides/Braun-DealingWithRegister...%2 = add %0, %1 return %2 r0 = const 5 r1 = const 7 r0 = add r0, r1 return r0 Register](https://reader036.fdocuments.net/reader036/viewer/2022071015/5fce0fae01e843551a605654/html5/thumbnails/32.jpg)
Assignment Heuristics
• Default: Assign in program order
• Wide pieces may not fit in holes left by small ones
• Tweak: Prioritize bigger classes
[Figure: pieces "To Assign" placed into registers r0 r1 r2 r3 r4 r5]
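The effect of the ordering can be reproduced with a toy first-fit allocator in Python. This is a sketch under stated assumptions and not RegAllocGreedy: there are six registers, each piece is a made-up `(width, start, end)` triple with a half-open live range, and a piece that finds no free run of consecutive registers is simply reported as failed.

```python
def clash(pieces, name, base, other, other_base):
    """Two placed pieces clash if they share a register while both are live."""
    w, s, e = pieces[name]
    ow, os_, oe = pieces[other]
    regs_overlap = base < other_base + ow and other_base < base + w
    time_overlap = s < oe and os_ < e
    return regs_overlap and time_overlap


def assign(pieces, num_regs, order):
    """First-fit: give each piece the lowest free run of consecutive registers."""
    placed, failed = {}, []
    for name in order:
        width = pieces[name][0]
        for base in range(num_regs - width + 1):
            if not any(clash(pieces, name, base, o, b) for o, b in placed.items()):
                placed[name] = base
                break
        else:
            failed.append(name)
    return placed, failed


# Hypothetical pieces: name -> (width, live from, live until)
pieces = {"a": (1, 0, 10), "b": (2, 0, 3), "c": (1, 1, 10), "d": (4, 4, 10)}

program_order = sorted(pieces, key=lambda n: pieces[n][1])
wide_first = sorted(pieces, key=lambda n: (-pieces[n][0], pieces[n][1]))

result_program_order = assign(pieces, 6, program_order)
result_wide_first = assign(pieces, 6, wide_first)
```

In program order the small pieces `a` and `c` end up in r0 and r3, so the 4-wide piece `d` finds no hole. With wide pieces first, `d` takes r0 to r3 and the small pieces fill r4 and r5.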
![Page 33: Dealing With Register Hierarchies - LLVMllvm.org/devmtg/2016-11/Slides/Braun-DealingWithRegister...%2 = add %0, %1 return %2 r0 = const 5 r1 = const 7 r0 = add r0, r1 return r0 Register](https://reader036.fdocuments.net/reader036/viewer/2022071015/5fce0fae01e843551a605654/html5/thumbnails/33.jpg)
Interference Checks: Register Units
• Tuples multiply number of registers
• Interference check of single register in target with 1-10 tuples: 45 aliases!
[Figure: registers aliasing r3: r0,r1,r2,r3; r1,r2,r3,r4; r2,r3,r4,r5; r3,r4,r5,r6; r1,r2,r3; r2,r3,r4; r3,r4,r5; r2,r3; r3,r4; r3; ...]
![Page 34: Dealing With Register Hierarchies - LLVMllvm.org/devmtg/2016-11/Slides/Braun-DealingWithRegister...%2 = add %0, %1 return %2 r0 = const 5 r1 = const 7 r0 = add r0, r1 return r0 Register](https://reader036.fdocuments.net/reader036/viewer/2022071015/5fce0fae01e843551a605654/html5/thumbnails/34.jpg)
Interference Checks: Register Units
• Each register mapped to one or more units: Registers alias iff they share a unit
• Liveness/Interference checks of actual registers use register units
[Figure: registers r0;r1;r2;r3, r1;r2;r3;r4, r1;r2;r3, r2;r3;r4, r3;r4;r5, r2;r3, r3;r4, r3 drawn over register units u0 u1 u2 u3 u4 u5]
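A minimal Python sketch of the register unit idea follows. It assumes one unit per scalar register and a hypothetical naming scheme in which `r1_r2_r3` denotes the tuple of r1, r2 and r3. In LLVM the units are computed by TableGen, and none of these names exist there.

```python
def units(reg):
    """Register units of a register name such as 'r3' or 'r1_r2_r3'."""
    return frozenset(int(part[1:]) for part in reg.split("_"))


def alias(a, b):
    """Registers alias iff they share a unit."""
    return bool(units(a) & units(b))


class UnitLiveness:
    """Track occupied units instead of every aliasing register."""

    def __init__(self):
        self.busy = set()

    def interferes(self, reg):
        return bool(self.busy & units(reg))

    def occupy(self, reg):
        self.busy |= units(reg)


live = UnitLiveness()
live.occupy("r1_r2_r3")
```

With units, checking a candidate register means looking only at its own few units, not at the full list of tuples that alias it.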
![Page 35: Dealing With Register Hierarchies - LLVMllvm.org/devmtg/2016-11/Slides/Braun-DealingWithRegister...%2 = add %0, %1 return %2 r0 = const 5 r1 = const 7 r0 = add r0, r1 return r0 Register](https://reader036.fdocuments.net/reader036/viewer/2022071015/5fce0fae01e843551a605654/html5/thumbnails/35.jpg)
Usage, Results, Future Work
![Page 36: Dealing With Register Hierarchies - LLVMllvm.org/devmtg/2016-11/Slides/Braun-DealingWithRegister...%2 = add %0, %1 return %2 r0 = const 5 r1 = const 7 r0 = add r0, r1 return r0 Register](https://reader036.fdocuments.net/reader036/viewer/2022071015/5fce0fae01e843551a605654/html5/thumbnails/36.jpg)
Use in LLVM
• Declare Subregister Indexes + Subregisters in XXXRegisterInfo.td
• TableGen computes register units and combined subregister indexes/classes
• Enable fine grained liveness tracking by overriding TargetSubtargetInfo::enableSubRegLiveness()
• AllocationPriority part of register class specification
![Page 37: Dealing With Register Hierarchies - LLVMllvm.org/devmtg/2016-11/Slides/Braun-DealingWithRegister...%2 = add %0, %1 return %2 r0 = const 5 r1 = const 7 r0 = add r0, r1 return r0 Register](https://reader036.fdocuments.net/reader036/viewer/2022071015/5fce0fae01e843551a605654/html5/thumbnails/37.jpg)
Results: Apple GPU Compiler
• Compared various benchmarks and captured application shaders
• Average 20% reduction in register usage (-6% up to 50%)!
• Speedup 2-3% (-4% up to 70%)
![Page 38: Dealing With Register Hierarchies - LLVMllvm.org/devmtg/2016-11/Slides/Braun-DealingWithRegister...%2 = add %0, %1 return %2 r0 = const 5 r1 = const 7 r0 = add r0, r1 return r0 Register](https://reader036.fdocuments.net/reader036/viewer/2022071015/5fce0fae01e843551a605654/html5/thumbnails/38.jpg)
Results: AMDGPU Target
![Page 40: Dealing With Register Hierarchies - LLVMllvm.org/devmtg/2016-11/Slides/Braun-DealingWithRegister...%2 = add %0, %1 return %2 r0 = const 5 r1 = const 7 r0 = add r0, r1 return r0 Register](https://reader036.fdocuments.net/reader036/viewer/2022071015/5fce0fae01e843551a605654/html5/thumbnails/40.jpg)
Future Work
• Support partially dead/undef operands
• Early splitting and rematerialization (before register limit)
• Partial register spilling
• Consider partial liveness in register pressure tracking
• Missed optimizations (no obvious use/def relation for lanes)
![Page 41: Dealing With Register Hierarchies - LLVMllvm.org/devmtg/2016-11/Slides/Braun-DealingWithRegister...%2 = add %0, %1 return %2 r0 = const 5 r1 = const 7 r0 = add r0, r1 return r0 Register](https://reader036.fdocuments.net/reader036/viewer/2022071015/5fce0fae01e843551a605654/html5/thumbnails/41.jpg)
Thank You for Your Attention!
Backup Slides
![Page 43: Dealing With Register Hierarchies - LLVMllvm.org/devmtg/2016-11/Slides/Braun-DealingWithRegister...%2 = add %0, %1 return %2 r0 = const 5 r1 = const 7 r0 = add r0, r1 return r0 Register](https://reader036.fdocuments.net/reader036/viewer/2022071015/5fce0fae01e843551a605654/html5/thumbnails/43.jpg)
Register Hierarchies
• CPU registers can overlap. Partial register accessible by subregister. Also called lanes (Vector Regs)
[Figure: X86 GP Register: RAX contains EAX, which contains AX, which is split into AH and AL]
[Figure: ARM FP Register: Q0 overlaps D0 and D1; D0 overlaps S0 and S1; D1 overlaps S2 and S3]
    movw $0xABCD, %ax   # Put 16 bits into %ax
    movb %al, x         # Uses lower 8 bits: 0xCD
    movb %ah, y         # Uses upper 8 bits: 0xAB
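The x86 example can be mirrored with plain bit arithmetic. The Python helpers below are illustrative names only. They show that `%al` and `%ah` are views onto the low and high byte of the same 16-bit `%ax` value.

```python
def sub_al(ax):
    """Lower 8 bits of a 16-bit AX value."""
    return ax & 0xFF


def sub_ah(ax):
    """Upper 8 bits of a 16-bit AX value."""
    return (ax >> 8) & 0xFF


ax = 0xABCD
x = sub_al(ax)   # 0xCD
y = sub_ah(ax)   # 0xAB
```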
![Page 44: Dealing With Register Hierarchies - LLVMllvm.org/devmtg/2016-11/Slides/Braun-DealingWithRegister...%2 = add %0, %1 return %2 r0 = const 5 r1 = const 7 r0 = add r0, r1 return r0 Register](https://reader036.fdocuments.net/reader036/viewer/2022071015/5fce0fae01e843551a605654/html5/thumbnails/44.jpg)
Register Allocation Pipeline
PHIElimination
TwoAddressInstruction
RegisterCoalescer
ProcessImplicitDefs
DetectDeadLanes
RenameIndependentSubregs
MachineScheduler
RegAllocGreedy
VirtRegRewriter
StackSlotColoring
![Page 45: Dealing With Register Hierarchies - LLVMllvm.org/devmtg/2016-11/Slides/Braun-DealingWithRegister...%2 = add %0, %1 return %2 r0 = const 5 r1 = const 7 r0 = add r0, r1 return r0 Register](https://reader036.fdocuments.net/reader036/viewer/2022071015/5fce0fae01e843551a605654/html5/thumbnails/45.jpg)
Subregister Indexes
• Subregister indexes relate wide/small registers on virtual registers
• Writes may be marked undef if other parts of register do not matter
• LLVM synthesizes combined indexes (`sub0_low16bits`)
    %0 = load_x4
    %1.sub0<undef> = add %0.sub2, 13
    %1.sub1 = const 42
    store_x2 %1
Register Allocation:
    r4_r5_r6_r7 = load_x4
    r0 = add r6, 13
    r1 = const 42
    store_x2 r0_r1
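The rewrite from virtual to actual registers in this example can be sketched as a text substitution in Python. This is a toy stand-in for what the rewriter does. The regular expression, the `assignment` dictionary and the `rN_rM` spelling of tuple names are assumptions taken from the slide's notation.

```python
import re


def rewrite(instr, assignment):
    """Replace %N and %N.subK operands with the assigned actual registers."""
    def repl(match):
        tuple_reg = assignment[match.group(1)]
        if match.group(2) is None:
            return "_".join(tuple_reg)           # whole tuple, e.g. r0_r1
        return tuple_reg[int(match.group(2))]    # single lane, e.g. r6

    # %N, optionally followed by .subK and an <undef> flag (dropped on rewrite)
    return re.sub(r"%(\d+)(?:\.sub(\d+))?(?:<undef>)?", repl, instr)


# Hypothetical allocation result: virtual register number -> assigned tuple
assignment = {"0": ("r4", "r5", "r6", "r7"), "1": ("r0", "r1")}

rewritten = [rewrite(line, assignment) for line in [
    "%0 = load_x4",
    "%1.sub0<undef> = add %0.sub2, 13",
    "%1.sub1 = const 42",
    "store_x2 %1",
]]
```

The subregister index selects the position inside the assigned tuple, so `%0.sub2` becomes `r6` once `%0` is assigned `r4_r5_r6_r7`.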
![Page 46: Dealing With Register Hierarchies - LLVMllvm.org/devmtg/2016-11/Slides/Braun-DealingWithRegister...%2 = add %0, %1 return %2 r0 = const 5 r1 = const 7 r0 = add r0, r1 return r0 Register](https://reader036.fdocuments.net/reader036/viewer/2022071015/5fce0fae01e843551a605654/html5/thumbnails/46.jpg)
Slot Indexes
• Position in a program; Each instruction is assigned a number (incremented by 4 so we need to renumber less often when inserting instructions)
• Slots describe position in the instruction:
    • Block/Base (Block begin/end, PHI-defs)
    • EarlyClobber (early point to force interference with normal def/use)
    • Register (normal def/uses use this)
    • Dead (liveness of dead definitions ends here)
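The two ideas on this slide, spaced instruction numbers and ordered slots within an instruction, can be modelled in a few lines of Python. This is a simplified sketch and not LLVM's `SlotIndex` encoding: the `(number, slot)` pair, the `STEP` constant and the helper names are assumptions for illustration.

```python
from enum import IntEnum


class Slot(IntEnum):
    BLOCK = 0          # block begin/end, PHI-defs
    EARLY_CLOBBER = 1  # early point, forces interference with normal def/use
    REGISTER = 2       # normal defs and uses
    DEAD = 3           # liveness of dead definitions ends here


STEP = 4  # gap between consecutive instruction numbers


def number(instrs):
    """Assign spaced numbers so later insertions rarely force renumbering."""
    return {ins: i * STEP for i, ins in enumerate(instrs)}


def insert_between(numbers, prev, nxt, new):
    """Give `new` a number in the gap between two neighbours."""
    lo, hi = numbers[prev], numbers[nxt]
    if hi - lo < 2:
        raise ValueError("no gap left; renumbering needed")
    numbers[new] = (lo + hi) // 2


def slot_index(numbers, ins, slot):
    """A position is the instruction number plus the slot; tuples sort in order."""
    return (numbers[ins], slot)


nums = number(["a", "b"])
insert_between(nums, "a", "b", "x")
```

Ordering positions by `(number, slot)` places the EarlyClobber slot of an instruction before its Register slot, and every slot of `a` before every slot of the newly inserted `x`.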
![Page 47: Dealing With Register Hierarchies - LLVMllvm.org/devmtg/2016-11/Slides/Braun-DealingWithRegister...%2 = add %0, %1 return %2 r0 = const 5 r1 = const 7 r0 = add r0, r1 return r0 Register](https://reader036.fdocuments.net/reader036/viewer/2022071015/5fce0fae01e843551a605654/html5/thumbnails/47.jpg)
Constraints & Classes
• A register class is a set of registers; Models register constraints
• Class defined for each register operand of LLVM MI Instruction (MCInstrDesc)
• Each virtual register has a class
[Figure: GR32 class: EAX EBX ECX EDX ESI EDI ESP EBP R8 ...]