Lec Jan29 2009

Anshul Kumar, CSE IITD. CSL718: Superscalar Processors. Issue and Despatch, 29th Jan, 2009


Transcript of Lec Jan29 2009

Page 1: Lec Jan29 2009

Anshul Kumar, CSE IITD

CSL718 : Superscalar Processors

Issue and Despatch

29th Jan, 2009

Page 2: Lec Jan29 2009

Early proposals/prototypes

Timeline, 1982 to 1989:

• IBM: Cheetah, America project (4)
• DEC: Multititan project (2)
• Stanford U: Match (2), Torch (4)
• Kyushu U: SIMP (4), DSNS (4)
• Term "Superscalar" (marker on the timeline)

Page 3: Lec Jan29 2009

Commercial superscalars

RISCs (issue rate in parentheses)

• Intel 960KA/KB ⇒ 960CA (3), 1989
• IBM Power 1 RS/6000 (4), 1990
• HP PA7000 ⇒ PA7100 (2), 1992
• SUN SPARC ⇒ SuperSparc (3), 1992
• DEC Alpha 21064 (2), 1992
• Motorola MC88100 ⇒ MC88110 (2), 1993
• Motorola PowerPC 601/603 (3), 1993
• MIPS R4000 ⇒ R8000 (4), 1994

Page 4: Lec Jan29 2009

Commercial superscalars

CISCs (issue rate in parentheses)

• Intel 80486 ⇒ Pentium (2), 1993
• Motorola MC68040 ⇒ MC68060 (2), 1993
• Gmicro Gmicro/100p ⇒ Gmicro 500 (2), 1993
• AMD K5 (2), 4 RISC instr, 1995
• CYRIX M1 (2), 1995

Page 5: Lec Jan29 2009

Tasks of superscalar processing

• Parallel decoding and issue
• Parallel instruction execution
• Preserving the sequential consistency of instruction execution and exception processing

Page 6: Lec Jan29 2009

Superscalar decode and issue

Figure: in both cases instructions flow from the I-cache through the instruction buffer to Decode & Issue.

• Scalar issue: pipeline stages IF, D/I
• Superscalar issue: pipeline stages IF, D, I

Page 7: Lec Jan29 2009

Parallel Decoding

• Fetch multiple instructions in instruction buffer

• Decode multiple instructions in parallel – instruction window

• Possibly check dependencies among these as well as with the instructions already under execution
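A minimal sketch of the dependency check named in the last bullet, assuming a simple three-register instruction format. The `Instr` class, the register names and the `raw_dependencies` helper are hypothetical, and the loop is sequential where real hardware would compare all pairs in parallel.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    name: str
    dest: str      # destination register
    srcs: tuple    # source registers

def raw_dependencies(window, in_flight=()):
    """Map each window instruction to the instructions whose result it reads."""
    # Most recent writer of each register among instructions already executing.
    last_writer = {ins.dest: ins.name for ins in in_flight}
    deps = {}
    for ins in window:
        deps[ins.name] = sorted({last_writer[s] for s in ins.srcs if s in last_writer})
        last_writer[ins.dest] = ins.name   # later readers depend on this instruction
    return deps

window = [
    Instr("a", "r1", ("r2", "r3")),
    Instr("b", "r4", ("r1", "r5")),  # reads r1, written by a
    Instr("c", "r6", ("r7", "r8")),  # reads r7, written by in-flight x
]
deps = raw_dependencies(window, in_flight=[Instr("x", "r7", ("r9",))])
print(deps)
```

Here b depends on a (inside the window) and c depends on x (already under execution), which are the two kinds of dependency the slide mentions.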

Page 8: Lec Jan29 2009

Reducing decoding time

Pre-decoding

• Do partial decoding while instructions are being loaded into the I-cache
• Decoded information is appended to the instruction
• This includes instruction class, resources required, etc.

Figure: second level cache or main memory → pre-decode unit (N bits/cycle in) → I-cache (N + n bits/cycle in)
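Pre-decoding can be sketched as follows. The 32-bit word size, the 4 appended bits, the class codes and the rule that the top two bits select the class are all made-up assumptions for illustration, not any real processor's encoding.

```python
PREDECODE_BITS = 4  # n, an assumed value

# Hypothetical class codes stored in the appended bits.
CLASS_CODES = {"alu": 0b0001, "load": 0b0010, "store": 0b0011, "branch": 0b0100}

def classify(word):
    """Hypothetical rule: the top 2 bits of a 32-bit word select the class."""
    return ("alu", "load", "store", "branch")[(word >> 30) & 0b11]

def predecode(word):
    """Append n predecode bits to an N-bit instruction word (N = 32 here)."""
    return (word << PREDECODE_BITS) | CLASS_CODES[classify(word)]

def fill_icache_line(words):
    """Pre-decode every word on its way into the I-cache."""
    return [predecode(w) for w in words]

def instruction_class(stored):
    """At decode time the class is read directly from the appended bits."""
    code = stored & ((1 << PREDECODE_BITS) - 1)
    return next(k for k, v in CLASS_CODES.items() if v == code)

line = fill_icache_line([0x00000001, 0x80000001])
print([instruction_class(s) for s in line])
```

The decoder then reads the class from the low bits of the stored word instead of re-deriving it from the opcode.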

Page 9: Lec Jan29 2009

Pre-decoding examples

Processor             No. of predecode bits
PA 7200 (1995)        5
PA 8000 (1996)        5
PowerPC 620 (1996)    7
UltraSparc (1995)     4
HAL PM1 (1995)        4
AMD K5 (1995)         5 (per byte)
R 10000 (1996)        4

Page 10: Lec Jan29 2009

Blocking during issue

Figure: instruction buffer (issue window) → Decode, Check & Issue → EU, EU, EU

• Decode and issue instructions directly to EUs
• Instructions may be blocked due to data dependency

Page 11: Lec Jan29 2009

Non-blocking Issue

Figure: instruction buffer → Decode & Issue → reservation stations → dep. checking/dispatch → EUs (each EU has a reservation station and dependency checking/dispatch in front of it)

• Decode and issue to buffers
• From buffers dispatch to EUs

Page 12: Lec Jan29 2009

Handling of Issue Blockages

• Preserving issue order: in-order / out of order
• Alignment of instruction issue: aligned / unaligned

Page 13: Lec Jan29 2009

Issue Order

Figure: an issue window with instructions a, b, c, d, e to be issued (legend: independent instruction, dependent instruction, issued instruction).

• Issue in strict program order: instructions issued: a
• Out of order issue: instructions issued: a, c

Example: MC 88110, PowerPC 601
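The two issue orders can be compared with a small sketch. Which instructions are dependent is an assumption here (b and d are marked as blocked), chosen so that the result matches the figure; the `issue` function is hypothetical.

```python
def issue(window, blocked, in_order=True):
    """Return the instructions issued from the window in one cycle."""
    issued = []
    for ins in window:
        if ins in blocked:
            if in_order:
                break        # strict program order: stop at the first blocked instruction
            continue         # out of order: skip it and keep looking
        issued.append(ins)
    return issued

window = ["a", "b", "c", "d"]
blocked = {"b", "d"}         # assumed dependent instructions

in_order_issued = issue(window, blocked, in_order=True)
out_of_order_issued = issue(window, blocked, in_order=False)
print(in_order_issued, out_of_order_issued)
```

With in-order issue the independent instruction c is held back behind b; with out-of-order issue it leaves the window in the same cycle as a.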

Page 14: Lec Jan29 2009

Alignment

Figure: the instruction sequence a to h issued over cycles 1 to 3 under two schemes.

• Aligned issue: a fixed window is checked and issued from in each cycle (cycles 1, 2 and 3 in the figure); the next window (f, g, h) is taken up only when the current one has been issued completely.
• Unaligned issue: a gliding window is refilled with the following instructions (f, g, h) as instructions are issued from it.
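A rough sketch of why alignment matters, assuming in-order issue and a made-up input `ready_at` that gives the first cycle in which each instruction is free of dependencies. The function name and the example values are hypothetical and are not read off the figure.

```python
def cycles_to_issue(ready_at, width, aligned):
    """Count the cycles needed to issue all instructions in program order."""
    n = len(ready_at)
    nxt = 0      # index of the next instruction to issue
    base = 0     # start of the fixed window (aligned case only)
    cycle = 0
    while nxt < n:
        cycle += 1
        if aligned:
            if nxt >= base + width:
                base += width              # window fully issued: move to the next one
            end = min(base + width, n)
        else:
            end = min(nxt + width, n)      # gliding window always starts at nxt
        while nxt < end and ready_at[nxt] <= cycle:
            nxt += 1
    return cycle

ready_at = [1, 2, 2, 2, 1]   # five instructions, window width 4
aligned_cycles = cycles_to_issue(ready_at, width=4, aligned=True)
unaligned_cycles = cycles_to_issue(ready_at, width=4, aligned=False)
print(aligned_cycles, unaligned_cycles)
```

In this example the fifth instruction is ready early, but the aligned scheme cannot reach it until the first window has drained, which costs one extra cycle.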

Page 15: Lec Jan29 2009

Design space in instruction issue

• Coping with false data dependencies: no / register renaming
• Coping with unresolved control dependencies: wait / speculative
• Use of RSs: blocking / non-blocking
• Handling of issue blockages
• Issue rate (2-6)

Page 16: Lec Jan29 2009

Frequently used issue policies in scalar processors

• Traditional scalar issue: i386, MC68030, R3000, Sparc
• Traditional scalar issue with RSs: CDC 6600
• Traditional scalar issue with RSs and renaming: IBM 360/91
• Traditional scalar issue with speculative execution: i486, MC68040, R4000, MicroSparc

Page 17: Lec Jan29 2009

Frequently used issue policies in superscalar processors

(speculative execution in all)

• Straightforward superscalar issue, aligned: Pentium, PowerPC601, PA7100, SuperSparc, Alpha21164
• Straightforward superscalar issue, unaligned: MC68060, PA7200, UltraSparc
• Straightforward superscalar issue with RSs: MC88110, R8000
• Straightforward superscalar issue with renaming: PowerPC602
• Advanced superscalar issue (renaming + RSs): R10000, PentiumPro, PowerPC602, PA8000, Sparc64, Am29000, K5

Page 18: Lec Jan29 2009

Design Space of Reservation Stations

• Scope: partial / full
• Layout of reservation stations
• Operand fetch policy
• Instruction dispatch scheme

Page 19: Lec Jan29 2009

Layout of Reservation Stations

• Type: stand alone (RS), or combined with renaming and reordering
• Number of buffer entries: individual 2-4, group 6-16, central 20, total 15-40
• Number of read and write ports: depends on no. of EUs connected

Page 20: Lec Jan29 2009

Reservation Stations (RS)

Figure: three layouts.

• Individual RSs: each EU has its own RS
• Group RSs: one RS serves a group of EUs
• Central RS: a single RS serves all EUs

Page 21: Lec Jan29 2009

Operand Fetch Policies

• Issue bound fetch
• Dispatch bound fetch

Page 22: Lec Jan29 2009

Issue bound operand fetch (with single register file)

Figure: Decode/issue, one RF, and RSs in front of the EUs, with separate instruction and data paths. Operands are fetched from the RF at issue, so operand values enter the RSs together with the instruction.

Page 23: Lec Jan29 2009

Dispatch bound operand fetch (with single register file)

Figure: Decode/issue, one RF, and RSs in front of the EUs, with separate instruction and data paths. Operands are fetched from the RF at dispatch, so the RSs hold register identifiers until then.

Page 24: Lec Jan29 2009

Issue bound operand fetch (with multiple register files)

Figure: Decode/issue, two RFs, and RSs in front of the EUs, with separate instruction and data paths.

Page 25: Lec Jan29 2009

Dispatch bound operand fetch (with multiple register files)

Figure: Decode/issue, two RFs, and RSs in front of the EUs, with separate instruction and data paths.

Page 26: Lec Jan29 2009

Updating RFs and RSs

Figure: Decode/issue, two RFs, RSs and EUs, with separate instruction and data paths, showing how the RFs and RSs are updated.

Page 27: Lec Jan29 2009

Instruction dispatch scheme

• Dispatch policy
• Dispatch rate: single instr/cycle (individual RS) or multiple instr/cycle (group or central RS)
• Checking operand availability
• Treatment of empty RS

Page 28: Lec Jan29 2009

Dispatch policy

• Selection rule: rule for identifying instructions which are ready for execution (data dependency check)
• Arbitration rule: rule for choosing one out of several ready instructions (earlier instruction has priority)
• Dispatch order
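The selection and arbitration rules can be sketched for a single reservation station. The `Entry` layout, with a program-order sequence number and one validity flag per source operand, is an assumption made for this example.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    seq: int          # position in program order; smaller means earlier
    op: str
    src_valid: tuple  # one flag per source operand

def select_ready(station):
    """Selection rule: an entry is ready when all its source operands are available."""
    return [e for e in station if all(e.src_valid)]

def arbitrate(ready):
    """Arbitration rule: among ready entries the earliest instruction has priority."""
    return min(ready, key=lambda e: e.seq) if ready else None

station = [
    Entry(7, "mul", (True, False)),   # still waiting for an operand
    Entry(9, "add", (True, True)),
    Entry(8, "sub", (True, True)),
]
chosen = arbitrate(select_ready(station))
print(chosen)
```

The oldest entry (7) is not ready, so the arbitration picks entry 8 over entry 9.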

Page 29: Lec Jan29 2009

Dispatch order

• in-order
• partially out of order
• out of order

Figure: RS entries and the check applied to them.

Page 30: Lec Jan29 2009

Checking availability of operands

• Direct check of score-board bits (usual for dispatch bound operand fetch): control flow approach
• Check of explicit status bits in RS (usual for issue bound operand fetch): data flow approach

(Control flow and data flow approach: Flynn’s terminology)

Page 31: Lec Jan29 2009

Score-board

Figure: register file in which every register holds a status bit alongside its data.

Introduced with CDC6600
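A minimal sketch of a score-board reduced to one status bit per register, as in the figure. The class and method names are hypothetical, and the score-board of the CDC 6600 itself tracked more state than this.

```python
class Scoreboard:
    def __init__(self, n_regs):
        self.valid = [True] * n_regs   # one status bit per register

    def operands_available(self, srcs):
        """Direct check of the score-board bits of the source registers."""
        return all(self.valid[r] for r in srcs)

    def reserve(self, dest):
        """An instruction writing dest has been issued: its value is pending."""
        self.valid[dest] = False

    def complete(self, dest):
        """The result has been written back: the register is valid again."""
        self.valid[dest] = True

sb = Scoreboard(8)
sb.reserve(1)                             # producer of r1 is issued
before = sb.operands_available((1, 2))    # a consumer of r1 must wait
sb.complete(1)                            # producer writes its result
after = sb.operands_available((1, 2))
print(before, after)
```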

Page 32: Lec Jan29 2009

Checking in dispatch bound fetch

Figure: register file, reservation station, EU.

• RS entry: OC (opcode), Rs1, Rs2, Rd, taken from the decoded instruction
• At issue: reset V bit of Rd in the register file
• At dispatch: check V bits of sources (Rs1, Rs2); OC, Os1, Os2 (operand values) go to the EU
• On completion: result, Rd come back from the EU; update Rd, set V bit

Page 33: Lec Jan29 2009

Checking in issue bound fetch

Figure: register file, reservation station, EU.

• At issue: Rs1, Rs2, Rd of the decoded instruction go to the register file; Os1, Os2 (operand values) are read; reset V bit of Rd
• RS entry: OC, Os1/Is1, Vs1, Os2/Is2, Vs2, Rd
• At dispatch: reservation station checks Vs1, Vs2; OC, Os1, Os2, Rd go to the EU
• On completion: result, Rd come back from the EU; update Rd, set V bit in the register file; associative update of Is1, Is2 with Rd, set Vs bits
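The issue bound scheme can be sketched as below. It reads Is1/Is2 as register identifiers held in place of operand values that are not yet valid, and it assumes at most one pending writer per register (no renaming); both are assumptions of this sketch, as are all class and method names.

```python
from dataclasses import dataclass

@dataclass
class RSEntry:
    oc: str     # opcode
    s1: int     # Os1 (value) if v1 is set, otherwise Is1 (register identifier)
    v1: bool
    s2: int     # Os2 or Is2, as above
    v2: bool
    rd: int

class IssueBoundMachine:
    def __init__(self, n_regs):
        self.value = [0] * n_regs
        self.valid = [True] * n_regs   # V bits of the register file
        self.rs = []

    def _fetch(self, reg):
        # Valid register: deliver the value; otherwise deliver its identifier.
        return (self.value[reg], True) if self.valid[reg] else (reg, False)

    def issue(self, oc, rs1, rs2, rd):
        s1, v1 = self._fetch(rs1)
        s2, v2 = self._fetch(rs2)
        self.rs.append(RSEntry(oc, s1, v1, s2, v2, rd))
        self.valid[rd] = False         # reset V bit of Rd

    def ready(self):
        """Check Vs1, Vs2 of every entry."""
        return [e for e in self.rs if e.v1 and e.v2]

    def write_result(self, rd, result):
        self.value[rd] = result        # update Rd, set V bit
        self.valid[rd] = True
        for e in self.rs:              # associative update of Is1, Is2 with Rd
            if not e.v1 and e.s1 == rd:
                e.s1, e.v1 = result, True
            if not e.v2 and e.s2 == rd:
                e.s2, e.v2 = result, True

m = IssueBoundMachine(8)
m.value[2], m.value[3] = 5, 7
m.issue("add", 2, 3, 1)    # r1 = r2 + r3
m.issue("mul", 1, 2, 4)    # r4 = r1 * r2, must wait for r1
first = m.ready()[0]
m.rs.remove(first)
m.write_result(first.rd, first.s1 + first.s2)
second = m.ready()[0]
print(first.oc, second.oc, second.s1, second.s2)
```

The mul entry enters the RS holding the identifier of r1 and becomes ready only when the add result is broadcast with its Rd.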

Page 34: Lec Jan29 2009

Treatment of an empty RS

• Straightforward approach: at least one cycle stay in RS
• Bypassing RS if empty

Examples on the slide: Nx586, Sparc64, PowerPC 604

Page 35: Lec Jan29 2009

Approaches in dispatching

• Straightforward: in order, single instr/cycle, individual RSs. Examples: Power1, PPC603, Nx586, Am29000
• Enhanced: partially out of order, single instr/cycle, individual RSs. Examples: Power2, PPC604, PPC620
• Advanced: out of order, multiple instr/cycle, group/central RSs. Examples: PM1, PentiumPro, PA8000, R10000

Page 36: Lec Jan29 2009

Reference

1. D. Sima, T. Fountain, P. Kacsuk, "Advanced Computer Architectures: A Design Space Approach", Addison Wesley, 1997.