HW Speculation

31
CS/EE 5810 CS/EE 6810 F00: 1 HW Speculation

description

HW Speculation. HW Support for Speculation. Ideal View Do conditional things in advance of the branch Undo them if the branch goes the wrong way Also implies undoing things on exception Limits Speculated Values can’t overwrite any real results - PowerPoint PPT Presentation

Transcript of HW Speculation

Page 1: HW Speculation

CS/EE 5810CS/EE 6810F00: 1

HW Speculation

Page 2: HW Speculation

CS/EE 5810CS/EE 6810F00: 2

HW Support for Speculation

• Ideal View

– Do conditional things in advance of the branch

– Undo them if the branch goes the wrong way

– Also implies undoing things on exception

• Limits

– Speculated Values can’t overwrite any real results

– Exceptions can’t cause any destructive activity

• Consider storage for destructive operations

– Registers and data caches

– Any ideas on what might help here?

Page 3: HW Speculation

CS/EE 5810CS/EE 6810F00: 3

HW Speculation Aids

• Poison bits on registers

– Fault if regular instruction tries to use that value

– HW (and OS) ignores exception until instruction commits

• More general tags

– Speculative instruction and results must be tagged until condition is resolved

• Boosting

– Provide separate shadow resources for boosted instruction

– If condition resolves in a way that selects the boosted path» Then the results are committed to real registers

» Note that this won’t work for memory…

Page 4: HW Speculation

CS/EE 5810CS/EE 6810F00: 4

Aggression levels in Speculation

• Consider if-then-else block

• Traditional conservative method

– Do them in order…

• Using prediction

– Start predicted path while evaluation condition

– Either continue or nullify based on condition result

• Aggressive

– Start all three blocks

– When condition is known, nullify unselected path

– Implies LOTS of resources

Page 5: HW Speculation

CS/EE 5810CS/EE 6810F00: 5

Hardware-Based Speculation

• Combination of three key ideas

– Dynamic branch prediction

– Speculation - allow speculated blocks to start before condition resolution

– Dynamic scheduling (Tomasulo-style)

• Advantages» More instruction order flexibility - things tend to run as

soon as they can

» Dynamic memory disambiguation

» Dynamic branch prediction works better than static

» Able to maintain precise exceptions (not easy, but doable)

» Relieves compiler from difficult machine-specific stuff

Page 6: HW Speculation

CS/EE 5810CS/EE 6810F00: 6

HW Speculation Approach

• Allow out of order ISSUE

– Require in-order commit when instruction is no longer speculative

– Prevent speculative changes from changing state» e.g. memory write or register write

• Collect pre-commit instructions

– in a reorder buffer» holds completed but not committed instruction

– Effectively contains a set of virtual registers» similar to a reservation station

» and becomes a bypass (forwarding) source

Page 7: HW Speculation

CS/EE 5810CS/EE 6810F00: 7

HW support for More ILP

• Need HW buffer for results of uncommitted instructions: reorder buffer

– 3 fields: instr, destination, value– Reorder buffer can be operand

source => more registers like RS– Use reorder buffer number instead of

reservation station when execution completes

– Supplies operands between execution complete & commit

– Once operand commits, result is put into register

– Instructions commit– As a result, its easy to undo

speculated instructions on mispredicted branches or on exceptions

ReorderBuffer

FP Regs

FPOp

Queue

FP Adder FP Adder

Res Stations Res Stations

Page 8: HW Speculation

CS/EE 5810CS/EE 6810F00: 8

Four Steps of Speculative Tomasulo Algorithm

1. Issue—get instruction from FP Op Queue If reservation station and reorder buffer slot free, issue instr &

send operands & reorder buffer no. for destination (this stage sometimes called “dispatch”)

2. Execution—operate on operands (EX) When both operands ready then execute; if not ready, watch

CDB for result; when both in reservation station, execute; checks RAW (sometimes called “issue”)

3. Write result—finish execution (WB) Write on Common Data Bus to all awaiting FUs

& reorder buffer; mark reservation station available.4. Commit—update register with reorder result

When instr. at head of reorder buffer & result present, update register with result (or store to memory) and remove instr from reorder buffer. Mispredicted branch flushes reorder buffer (sometimes called “graduation”)

Page 9: HW Speculation

Time Name Busy Op Vj Vk Qj Qk Dest

0 Add1 No Reservation

0 Add2 No Stations

0 Add3 No

0 Mult1 No

0 Mult2 No

Busy Address

Entry Busy Instruction State Destination Value Load1

1 Load2

2 Load3

3

4

5

6 Reorder Buffer

7

8

9

10

F0 F2 F4 F6 F8 F10 F12 ... F30

Reorder #

Busy no no no no no no no no

Tomasulo With Reorder Buffer - Cycle 0

Page 10: HW Speculation

Time Name Busy Op Vj Vk Qj Qk Dest

0 Add1 No Reservation

0 Add2 No Stations

0 Add3 No

0 Mult1 No

0 Mult2 No

Busy Address

Entry Busy Instruction State Destination Value Load1 Yes 34+Regs[R2]

1 Yes LD F6, 34(R2) Issue F6 Load2

2 Load3

3

4

5

6 Reorder Buffer

7

8

9

10

F0 F2 F4 F6 F8 F10 F12 ... F30

Reorder # #1

Busy no no no Yes no no no no

Tomasulo With Reorder Buffer - Cycle 1

Page 11: HW Speculation

Time Name Busy Op Vj Vk Qj Qk Dest

0 Add1 No Reservation

0 Add2 No Stations

0 Add3 No

0 Mult1 No

0 Mult2 No

Busy Address

Entry Busy Instruction State Destination Value Load1 Yes 34+Regs[R2]

1 Yes LD F6, 34(R2) Ex1 F6 Load2 Yes 45+Regs[R3]

2 Yes LD F2, 45(R3) Issue F2 Load3

3

4

5

6 Reorder Buffer

7

8

9

10

F0 F2 F4 F6 F8 F10 F12 ... F30

Reorder # #2 #1

Busy no Yes no Yes no no no no

Tomasulo With Reorder Buffer - Cycle 2

Page 12: HW Speculation

Time Name Busy Op Vj Vk Qj Qk Dest

0 Add1 No Reservation

0 Add2 No Stations

0 Add3 No

0 Mult1 Yes Mult Regs[F4] #2 #3

0 Mult2 No

Busy Address

Entry Busy Instruction State Destination Value Load1 No

1 Yes LD F6, 34(R2) write F6 Mem[load1] Load2 Yes 45+Regs[R3]

2 Yes LD F2, 45(R3) Ex1 F2 Load3

3 Yes MULT F0, F2, F4 Issue F0

4

5

6 Reorder Buffer

7

8

9

10

F0 F2 F4 F6 F8 F10 F12 ... F30

Reorder # #3 #2 #1

Busy Yes Yes no Yes no no no no

Tomasulo With Reorder Buffer - Cycle 3

Page 13: HW Speculation

Time Name Busy Op Vj Vk Qj Qk Dest

0 Add1 Yes SUB Regs[F6] Mem[45+Regs[R3]] #4 Reservation

0 Add2 No Stations

0 Add3 No

0 Mult1 Yes Mult Mem[45+Regs[R3]] Regs[F4] #3

0 Mult2 No

Busy Address

Entry Busy Instruction State Destination Value Load1 No

1 No LD F6, 34(R2) commit F6 Mem[load1] Load2 No

2 Yes LD F2, 45(R3) write F2 Mem[load2] Load3

3 Yes MULT F0, F2, F4 EX1 F0

4 Yes SUBD F8, F6, F2 Issue F8

5

6 Reorder Buffer

7

8

9

10

F0 F2 F4 F6 F8 F10 F12 ... F30

Reorder # #3 #2 #4

Busy Yes Yes no no Yes no no no

Tomasulo With Reorder Buffer - Cycle 4

Page 14: HW Speculation

Time Name Busy Op Vj Vk Qj Qk Dest

0 Add1 Yes SUB Regs[F6] Mem[45+Regs[R3]] #4 Reservation

0 Add2 No Stations

0 Add3 No

0 Mult1 Yes Mult Mem[45+Regs[R3]] Regs[F4] #3

0 Mult2 Yes DIV Regs[F6] #3 #5

Busy Address

Entry Busy Instruction State Destination Value Load1 No

1 No LD F6, 34(R2) commit F6 Mem[load1] Load2 No

2 No LD F2, 45(R3) commit F2 Mem[load2] Load3

3 Yes MULT F0, F2, F4 Ex2 F0

4 Yes SUBD F8, F6, F2 Ex1 F8

5 Yes DIVD F10, F0, F6 Issue F10

6 Reorder Buffer

7

8

9

10

F0 F2 F4 F6 F8 F10 F12 ... F30

Reorder # #3 #4 #5

Busy Yes no no no Yes Yes no no

Tomasulo With Reorder Buffer - Cycle 5

Page 15: HW Speculation

Time Name Busy Op Vj Vk Qj Qk Dest

0 Add1 Yes SUB Regs[F6] Mem[45+Regs[R3]] #4 Reservation

0 Add2 Yes Add Regs[F2] #4 #6 Stations

0 Add3 No

0 Mult1 Yes Mult Mem[45+Regs[R3]] Regs[F4] #3

0 Mult2 Yes DIV Regs[F6] #3 #5

Busy Address

Entry Busy Instruction State Destination Value Load1 No

1 No LD F6, 34(R2) commit F6 Mem[load1] Load2 No

2 No LD F2, 45(R3) commit F2 Mem[load2] Load3

3 Yes MULT F0, F2, F4 Ex3 F0

4 Yes SUBD F8, F6, F2 Ex2 F8

5 Yes DIVD F10, F0, F6 Issue F10

6 Yes ADDD F6, F8, F2 Issue F6 Reorder Buffer

7

8

9

10

F0 F2 F4 F6 F8 F10 F12 ... F30

Reorder # #3 #6 #4 #5

Busy Yes no no Yes Yes Yes no no

Tomasulo With Reorder Buffer - Cycle 6

Page 16: HW Speculation

Time Name Busy Op Vj Vk Qj Qk Dest

0 Add1 No Reservation

0 Add2 Yes Add #4 Regs[F2] #6 Stations

0 Add3 No

0 Mult1 Yes Mult Mem[45+Regs[R3]] Regs[F4] #3

0 Mult2 Yes DIV Regs[F6] #3 #5

Busy Address

Entry Busy Instruction State Destination Value Load1 No

1 No LD F6, 34(R2) commit F6 Mem[load1] Load2 No

2 No LD F2, 45(R3) commit F2 Mem[load2] Load3

3 Yes MULT F0, F2, F4 Ex4 F0

4 Yes SUBD F8, F6, F2 write F8 F6 - #2

5 Yes DIVD F10, F0, F6 Issue F10

6 Yes ADDD F6, F8, F2 EX1 F6 Reorder Buffer

7

8

9

10

F0 F2 F4 F6 F8 F10 F12 ... F30

Reorder # #3 #6 #4 #5

Busy Yes no no Yes Yes Yes no no

Tomasulo With Reorder Buffer - Cycle 7

Page 17: HW Speculation

Time Name Busy Op Vj Vk Qj Qk Dest

0 Add1 No Reservation

0 Add2 Yes Add #4 Regs[F2] #6 Stations

0 Add3 No

0 Mult1 Yes Mult Mem[45+Regs[R3]] Regs[F4] #3

0 Mult2 Yes DIV Regs[F6] #3 #5

Busy Address

Entry Busy Instruction State Destination Value Load1 No

1 No LD F6, 34(R2) commit F6 Mem[load1] Load2 No

2 No LD F2, 45(R3) commit F2 Mem[load2] Load3

3 Yes MULT F0, F2, F4 Ex5 F0

4 Yes SUBD F8, F6, F2 write F8 F6 - #2

5 Yes DIVD F10, F0, F6 Issue F10

6 Yes ADDD F6, F8, F2 Ex2 F6 Reorder Buffer

7

8

9

10

F0 F2 F4 F6 F8 F10 F12 ... F30

Reorder # #3 #6 #4 #5

Busy Yes no no Yes Yes Yes no no

Tomasulo With Reorder Buffer - Cycle 8

Page 18: HW Speculation

Time Name Busy Op Vj Vk Qj Qk Dest

0 Add1 No Reservation

0 Add2 Yes Add #4 Regs[F2] #6 Stations

0 Add3 No

0 Mult1 Yes Mult Mem[45+Regs[R3]] Regs[F4] #3

0 Mult2 Yes DIV Regs[F6] #3 #5

Busy Address

Entry Busy Instruction State Destination Value Load1 No

1 No LD F6, 34(R2) commit F6 Mem[load1] Load2 No

2 No LD F2, 45(R3) commit F2 Mem[load2] Load3

3 Yes MULT F0, F2, F4 Ex6 F0

4 Yes SUBD F8, F6, F2 write F8 F6 - #2

5 Yes DIVD F10, F0, F6 Issue F10

6 Yes ADDD F6, F8, F2 write F6 #4 + F2 Reorder Buffer

7

8

9

10

F0 F2 F4 F6 F8 F10 F12 ... F30

Reorder # #3 #6 #4 #5

Busy Yes no no Yes Yes Yes no no

Tomasulo With Reorder Buffer - Cycle 9

Page 19: HW Speculation

Time Name Busy Op Vj Vk Qj Qk Dest

0 Add1 No Reservation

0 Add2 No Stations

0 Add3 No

0 Mult1 Yes Mult Mem[45+Regs[R3]] Regs[F4] #3

0 Mult2 Yes DIV Regs[F6] #3 #5

Busy Address

Entry Busy Instruction State Destination Value Load1 No

1 No LD F6, 34(R2) commit F6 Mem[load1] Load2 No

2 No LD F2, 45(R3) commit F2 Mem[load2] Load3

3 Yes MULT F0, F2, F4 Ex7 F0

4 Yes SUBD F8, F6, F2 write F8 F6 - #2

5 Yes DIVD F10, F0, F6 Issue F10

6 Yes ADDD F6, F8, F2 write F6 #4 + F2 Reorder Buffer

7

8

9

10

F0 F2 F4 F6 F8 F10 F12 ... F30

Reorder # #3 #6 #4 #5

Busy Yes no no Yes Yes Yes no no

Tomasulo With Reorder Buffer - Cycle 10

Page 20: HW Speculation

Time Name Busy Op Vj Vk Qj Qk Dest

0 Add1 No Reservation

0 Add2 No Stations

0 Add3 No

0 Mult1 Yes Mult Mem[45+Regs[R3]] Regs[F4] #3

0 Mult2 Yes DIV Regs[F6] #3 #5

Busy Address

Entry Busy Instruction State Destination Value Load1 No

1 No LD F6, 34(R2) commit F6 Mem[load1] Load2 No

2 No LD F2, 45(R3) commit F2 Mem[load2] Load3

3 Yes MULT F0, F2, F4 Ex8 F0

4 Yes SUBD F8, F6, F2 write F8 F6 - #2

5 Yes DIVD F10, F0, F6 Issue F10

6 Yes ADDD F6, F8, F2 write F6 #4 + F2 Reorder Buffer

7

8

9

10

F0 F2 F4 F6 F8 F10 F12 ... F30

Reorder # #3 #6 #4 #5

Busy Yes no no Yes Yes Yes no no

Tomasulo With Reorder Buffer - Cycle 11

Page 21: HW Speculation

Time Name Busy Op Vj Vk Qj Qk Dest

0 Add1 No Reservation

0 Add2 No Stations

0 Add3 No

0 Mult1 Yes Mult Mem[45+Regs[R3]] Regs[F4] #3

0 Mult2 Yes DIV Regs[F6] #3 #5

Busy Address

Entry Busy Instruction State Destination Value Load1 No

1 No LD F6, 34(R2) commit F6 Mem[load1] Load2 No

2 No LD F2, 45(R3) commit F2 Mem[load2] Load3

3 Yes MULT F0, F2, F4 Ex9 F0

4 Yes SUBD F8, F6, F2 write F8 F6 - #2

5 Yes DIVD F10, F0, F6 Issue F10

6 Yes ADDD F6, F8, F2 write F6 #4 + F2 Reorder Buffer

7

8

9

10

F0 F2 F4 F6 F8 F10 F12 ... F30

Reorder # #3 #6 #4 #5

Busy Yes no no Yes Yes Yes no no

Tomasulo With Reorder Buffer - Cycle 12

Page 22: HW Speculation

Time Name Busy Op Vj Vk Qj Qk Dest

0 Add1 No Reservation

0 Add2 No Stations

0 Add3 No

0 Mult1 No

0 Mult2 Yes DIV #2xRegs[F4] Regs[F6] #5

Busy Address

Entry Busy Instruction State Destination Value Load1 No

1 No LD F6, 34(R2) commit F6 Mem[load1] Load2 No

2 No LD F2, 45(R3) commit F2 Mem[load2] Load3

3 Yes MULT F0, F2, F4 write F0 #2 x Regs[F4]

4 Yes SUBD F8, F6, F2 write F8 F6 - #2

5 Yes DIVD F10, F0, F6 Ex1 F10

6 Yes ADDD F6, F8, F2 write F6 #4 + F2 Reorder Buffer

7

8

9

10

F0 F2 F4 F6 F8 F10 F12 ... F30

Reorder # #3 #6 #4 #5

Busy Yes no no Yes Yes Yes no no

Tomasulo With Reorder Buffer - Cycle 13

Figure 4.35

P 313

Page 23: HW Speculation

Time Name Busy Op Vj Vk Qj Qk Dest

0 Add1 No Reservation

0 Add2 No Stations

0 Add3 No

0 Mult1 No

0 Mult2 Yes DIV #2xRegs[F4] Regs[F6] #5

Busy Address

Entry Busy Instruction State Destination Value Load1 No

1 No LD F6, 34(R2) commit F6 Mem[load1] Load2 No

2 No LD F2, 45(R3) commit F2 Mem[load2] Load3

3 No MULT F0, F2, F4 commit F0 #2 x Regs[F4]

4 Yes SUBD F8, F6, F2 write F8 F6 - #2

5 Yes DIVD F10, F0, F6 Ex2 F10

6 Yes ADDD F6, F8, F2 write F6 #4 + F2 Reorder Buffer

7

8

9

10

F0 F2 F4 F6 F8 F10 F12 ... F30

Reorder # #6 #4 #5

Busy No no no Yes Yes Yes no no

Tomasulo With Reorder Buffer - Cycle 14

Page 24: HW Speculation

Time Name Busy Op Vj Vk Qj Qk Dest

0 Add1 No Reservation

0 Add2 No Stations

0 Add3 No

0 Mult1 No

0 Mult2 Yes DIV #2xRegs[F4] Regs[F6] #5

Busy Address

Entry Busy Instruction State Destination Value Load1 No

1 No LD F6, 34(R2) commit F6 Mem[load1] Load2 No

2 No LD F2, 45(R3) commit F2 Mem[load2] Load3

3 No MULT F0, F2, F4 commit F0 #2 x Regs[F4]

4 No SUBD F8, F6, F2 commit F8 F6 - #2

5 Yes DIVD F10, F0, F6 Ex3 F10

6 Yes ADDD F6, F8, F2 write F6 #4 + F2 Reorder Buffer

7

8

9

10

F0 F2 F4 F6 F8 F10 F12 ... F30

Reorder # #6 #5

Busy no no no Yes no Yes no no

Tomasulo With Reorder Buffer - Cycle 15

Page 25: HW Speculation

Time Name Busy Op Vj Vk Qj Qk Dest

0 Add1 No Reservation

0 Add2 No Stations

0 Add3 No

0 Mult1 No

0 Mult2 Yes DIV #2xRegs[F4] Regs[F6] #5

Busy Address

Entry Busy Instruction State Destination Value Load1 No

1 No LD F6, 34(R2) commit F6 Mem[load1] Load2 No

2 No LD F2, 45(R3) commit F2 Mem[load2] Load3

3 No MULT F0, F2, F4 commit F0 #2 x Regs[F4]

4 No SUBD F8, F6, F2 commit F8 F6 - #2

5 Yes DIVD F10, F0, F6 Ex4 F10

6 Yes ADDD F6, F8, F2 write F6 #4 + F2 Reorder Buffer

7

8

9

10

F0 F2 F4 F6 F8 F10 F12 ... F30

Reorder # #6 #5

Busy no no no Yes no Yes no no

Tomasulo With Reorder Buffer - Cycle 16

Need 36 more EX cycles for

DIV to finish…

Page 26: HW Speculation

CS/EE 5810CS/EE 6810F00: 26

Executing Across Branches

• One implication of the reorder buffer is that you can more easily maintain precise interrupts

• Another is supporting speculation across branches

– Speculated state is in the reorder buffer

– State which is written, but not yet committed is available for future speculated operations

Page 27: HW Speculation

0 Add1 No Reservation

0 Add2 No Stations

0 Add3 No

0 Mult1 No MULT Mem[0+Regs[R1]] Regs[F2] #2

0 Mult2 No MULT Mem[0+Regs[R1]] Regs[F2] #7

Busy Address

Entry Busy Instruction State Destination Value Load1 No

1 No LD F0, 0(R1) commit F0 Mem[0+R1] Load2 No

2 No MULT F4, F0, F2 commit F4 F0 x F2 Load3

3 Yes SD 0(R1), F4 write 0+Reg[R1] #2

4 Yes SUBI R1, R1, 8 write R1 R1 - 8

5 Yes BNEZ R1, Loop write

6 Yes LD F0, 0(R1) write F0 Mem[#4] Reorder Buffer

7 Yes MULT F4, F0, F2 write F4 #6 X F2

8 Yes SD 0(R1), F4 write 0+Regs[R1] #7

9 Yes SUBI R1, R1, 8 write R1 #4 - 8

10 Yes BNEZ R1, Loop write

F0 F2 F4 F6 F8 F10 F12 ... F30

Reorder # 6 7

Busy yes no yes no no no no no

Example of Speculative State of Reorder Buffer

First loop

Second loop

Multiply has just reached commit, so other instructions can start committing

Page 28: HW Speculation

CS/EE 5810CS/EE 6810F00: 28

Renaming Registers

• Common variation of speculative design

• Reorder buffer keeps instruction information but not the result

• Extend register file with extra renaming registers to hold speculative results

• Rename register allocated at issue; result into rename register on execution complete; rename register into real register on commit

• Operands read either from register file (real or speculative) or via Common Data Bus

• Advantage: operands are always from single source (extended register file)

Page 29: HW Speculation

CS/EE 5810CS/EE 6810F00: 29

Dynamic Scheduling in PowerPC 604 and Pentium Pro

• Both In-order Issue, Out-of-order execution, In-order Commit

Pentium Pro more like a scoreboard since central control vs. distributed

Page 30: HW Speculation

CS/EE 5810CS/EE 6810F00: 30

Dynamic Scheduling in PowerPC 604 and Pentium Pro

Parameter PPC PPro

Max. instructions issued/clock 4 3

Max. instr. complete exec./clock 6 5

Max. instr. commited/clock 6 3

Window (Instrs in reorder buffer) 16 40

Number of reservations stations 12 20

Number of rename registers 8int/12FP 40

No. integer functional units (FUs) 2 2No. floating point FUs 1 1 No. branch FUs 1 1 No. complex integer FUs 1 0No. memory FUs 1 1 load +1 store

Q: How pipeline 1 to 17 byte x86 instructions?

Page 31: HW Speculation

CS/EE 5810CS/EE 6810F00: 31

Dynamic Scheduling in Pentium Pro

• PPro doesn’t pipeline 80x86 instructions

• PPro decode unit translates the Intel instructions into 72-bit micro-operations ( DLX)

• Sends micro-operations to reorder buffer & reservation stations

• Takes 1 clock cycle to determine length of 80x86 instructions + 2 more to create the micro-operations

•12-14 clocks in total pipeline ( 3 state machines)

• Many instructions translate to 1 to 4 micro-operations

• Complex 80x86 instructions are executed by a conventional microprogram (8K x 72 bits) that issues long sequences of micro-operations