Page 1:

10 June 2015, Mill Computing, Inc. Patents pending

One of a series…

Drinking from the Firehose

Compilation for a Belt Architecture

Page 2:

Talks in this series

1. Encoding
2. The Belt
3. Memory
4. Prediction
5. Metadata
6. Execution
7. Security
8. Specification
9. Pipelining
10. Compiling (you are here)
11. …

Slides and videos of other talks are at:

MillComputing.com/docs

Page 3:

Caution!

Gross over-simplification!

This talk tries to convey an intuitive understanding to the non-specialist.

The reality is more complicated.

(we try not to over-simplify, but sometimes…)

Page 4:

Specification

The Mill is a family of member CPUs sharing an abstract operation set and micro-architecture.

Members differ in concrete operation set and micro-architecture.

A designer describes a concrete member by writing a specification.

[Diagram: the abstract Mill CPU architecture is specification driven; family members shown are Tin, Copper, Silver, and Gold.]

Page 5:

Specification

Toolchain software automatically creates system software, verification tests, documentation, and a hardware framework for the new member from the specification.

[Diagram: the abstract Mill CPU architecture is specification driven and yields the family members Tin, Copper, Silver, and Gold; the tools (compiler, asm, debugger, HWgen, sim) are data driven.]

Page 6:

Late binding to family member

Mill compiles to the abstract target – the universal superset

Mill specializes to the concrete target – the executing family member

[Diagram: toolchain flow from C++ source to the target CPU. Stages and forms shown: clang, LLVM middle, LLVM back, genForm, genAsm, genassembler, specializer, conForm, conAsm, conassembler, prelinker, postlinker.]

This talk is mostly about the specializer

Page 7:

Specializer inputs: member specification

Micro-architecture attributes:
- functional unit population
- supported data sizes
- resource constraints

Operation attributes: (1000+)
- issue→retire latency
- arg/result count, size
- bit encoding

op latency: + 1, * 3, - 1, & 1, retn 0

Large static data structure, dynamically linked
Mechanically generated from ~2 page spec

Page 8:

Specializer inputs: code

int foo(int a, int b, int c, int d) {
    return (a - (b + c)) & ((b + c) * d);
}

Static Single Assignment dataflow

define i32 @foo(i32 %a, i32 %b, i32 %c, i32 %d) {
entry:
  %t1 = add i32 %b, %c
  %t2 = sub i32 %a, %t1
  %t3 = mul i32 %t1, %d
  %t4 = and i32 %t2, %t3
  ret i32 %t4
}

[Dataflow graph: function args a, b, c, d; + takes b and c; - takes a and the + result; * takes the + result and d; & takes the - and * results; retn returns the & result.]

Page 9:

Substitution pass

Goal: replace unsupported ops with emulation code

Only a subset of operations exist in hardware
Few members have native decimal, or quad

Walk graph
For each op, check spec for support
Replace unsupported with inline function
Inline may call out-of-line code

[Diagram: the dataflow graph for foo (function args, +, -, *, &, retn), with an op replaced by a call.]
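The walk described on this slide can be sketched as follows. This is a minimal illustration, not the specializer's actual code: the flat op list (instead of a graph), the `NATIVE_OPS` set, the `EMULATION` table, the op name `addd`, and the helper name `__emulated_addd` are all hypothetical.

```python
# Hypothetical member spec: the ops this family member implements natively.
NATIVE_OPS = {"add", "sub", "mul", "and", "call", "retn"}

# Hypothetical emulation table: unsupported op -> inline replacement ops.
# The inline code may itself call out-of-line code.
EMULATION = {
    "addd": ["call __emulated_addd"],  # e.g. a decimal add with no native unit
}


def substitute(ops, native_ops=NATIVE_OPS, emulation=EMULATION):
    """Walk the ops and replace each unsupported op with its emulation code."""
    result = []
    for op in ops:
        name = op.split()[0]
        if name in native_ops:
            result.append(op)                 # supported: keep as is
        elif name in emulation:
            result.extend(emulation[name])    # unsupported: inline the emulation
        else:
            raise ValueError(f"no native support or emulation for {name!r}")
    return result
```

For example, `substitute(["add", "addd", "retn"])` keeps `add` and `retn` and replaces `addd` with the call.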

Page 10:

Wide issue

The Mill is wide-issue, like a VLIW or EPIC

Instruction slots correspond to function pipelines

Decode routes ops to matching pipes

[Diagram: the PC selects an instruction whose slots 0, 1, 2 hold ops such as mul, shift, and add; pipes 0, 1, 2 each contain a multiplier, a shifter, and an adder.]

Page 11:

Exposed pipeline

Every operation has a fixed latency

a+b – c*d

[Diagram: add and mul take a, b, c, d; a+b is ready while c*d is still in flight, so the sub cannot yet compute a+b – c*d.]

Page 12:

Exposed pipeline

Every operation has a fixed latency

[Diagram: add and mul take a, b, c, d; a+b is ready before c*d, so the a+b result must wait until the sub can compute a+b – c*d.]

Who holds this?

Page 13:

Exposed pipeline

Every operation has a fixed latency

[Diagram: the add is issued later, so a+b and c*d reach the sub together and it computes a+b – c*d.]

Code is best when producers feed directly to consumers

Page 14:

Latency pass

Goal: compute minimal dataflow latency as if hardware had infinite FU resources

Give schedule priority to longer latency
Reduces overall schedule latency; faster execution

op specs (latency): + 1, * 3, - 1, & 1, retn 0

Walk graph
Look up latency in spec of each op
Mark each op with max argument latency
Mark each result with issue + op latency

Mark ops with issue cycle
Mark results with retire cycle

op             issue  retire
function args  -1     0
+              0      1
-              1      2
*              1      4
&              4      5
retn           5      (no result)
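The marking rule above (issue = latest argument retire, retire = issue + op latency) can be sketched in a few lines. This is an illustration under assumptions, not the specializer's code: the dictionary layout and the function name `latency_pass` are invented here, and function arguments are simply treated as available at cycle 0.

```python
# Op latencies from the slide's example spec.
LATENCY = {"+": 1, "*": 3, "-": 1, "&": 1, "retn": 0}

# Dataflow graph for foo: op -> argument producers.
# "a".."d" are function arguments, assumed available at cycle 0.
GRAPH = {
    "+": ["b", "c"],
    "-": ["a", "+"],
    "*": ["+", "d"],
    "&": ["-", "*"],
    "retn": ["&"],
}


def latency_pass(graph, latency):
    """Return (issue, retire) cycle maps, as if FU resources were unlimited."""
    issue, retire = {}, {}

    def retire_cycle(node):
        if node not in graph:            # a function argument
            return 0
        if node not in retire:
            # An op can issue once its latest argument has retired.
            issue[node] = max(retire_cycle(arg) for arg in graph[node])
            retire[node] = issue[node] + latency[node]
        return retire[node]

    for op in graph:
        retire_cycle(op)
    return issue, retire
```

Run on the example graph, this gives issue cycles 0, 1, 1, 4, 5 for +, -, *, &, retn, with the + result retiring at 1, - at 2, * at 4, and & at 5.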

Page 15:

Dependency count pass

Goal: count outstanding dependencies

Need to know how many consumers must be placed before producer op can be placed

Mark each op with number of consumers

Enter no-consumer ops on worklist

Consumer counts: function args 4, + 2, - 1, * 1, & 1, retn 0

work list: retn
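A minimal sketch of the counting step, assuming the same dictionary representation of the example graph (op mapped to its argument producers). The function name `count_consumers` is hypothetical.

```python
# Dataflow graph for foo: op -> argument producers ("a".."d" are function args).
GRAPH = {
    "+": ["b", "c"],
    "-": ["a", "+"],
    "*": ["+", "d"],
    "&": ["-", "*"],
    "retn": ["&"],
}


def count_consumers(graph):
    """Mark each producer with its number of consumers; seed the worklist."""
    counts = {op: 0 for op in graph}
    for args in graph.values():
        for arg in args:
            counts[arg] = counts.get(arg, 0) + 1
    # Ops that nothing consumes can be placed first.
    worklist = [op for op in graph if counts[op] == 0]
    return counts, worklist
```

On the example, + has two consumers, the four function arguments have four consumers in total, and only retn starts on the worklist.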

Page 16:

Schedule pass

Goal: schedule producers so their results retire just before consumers want them

Take last-retiring op from worklist

Schedule it ahead of its consumers

Decrement the consumer count of the producers of its arguments

If consumer count of arg producer becomes zero, enter producer on worklist

schedule: retn

Unplaced consumers: function args 4, + 2, - 1, * 1, & 1, retn 0

[Diagram: the dataflow graph, each op annotated with its number of unplaced consumers and its retire cycle.]

Page 17:

Schedule pass

Goal: schedule producers so their results retire just before consumers want them

Take longest-latency op from worklist

Schedule it ahead of its consumers

Decrement the consumer count of the producers of its arguments

If consumer count of arg producer becomes zero, enter producer on worklist

schedule: retn &

Unplaced consumers: function args 4, + 2, - 1, * 1, & 0, retn 0

Page 18:

Schedule pass

schedule: retn & *

Unplaced consumers: function args 4, + 2, - 0, * 0, & 0, retn 0

Page 19:

Schedule pass

schedule: retn & * -

Unplaced consumers: function args 3, + 1, - 0, * 0, & 0, retn 0

Page 20:

Schedule pass

schedule: retn & * - +

Unplaced consumers: function args 2, + 0, - 0, * 0, & 0, retn 0

Page 21:

Schedule pass

schedule: retn & * - + function args

Unplaced consumers: all zero
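The worklist loop walked through on these slides can be sketched as below. It is an illustration under assumptions: the function arguments are modeled as a single node `args` (as in the slide's graph), the retire cycles are copied from the latency pass example, and the worklist is ordered by retire cycle (the "last-retiring" wording). The names `GRAPH`, `RETIRE`, and `schedule_pass` are hypothetical.

```python
# Dataflow graph for foo: op -> argument producers.
# "args" stands for all four function arguments, as on the slide.
GRAPH = {
    "args": [],
    "+": ["args", "args"],
    "-": ["args", "+"],
    "*": ["+", "args"],
    "&": ["-", "*"],
    "retn": ["&"],
}

# Retire cycles from the latency pass example.
RETIRE = {"args": 0, "+": 1, "-": 2, "*": 4, "&": 5, "retn": 5}


def schedule_pass(graph, retire):
    """Order ops from the last consumer back toward the function arguments."""
    counts = {op: 0 for op in graph}
    for args in graph.values():
        for arg in args:
            counts[arg] += 1
    worklist = [op for op in graph if counts[op] == 0]
    schedule = []
    while worklist:
        # Take the last-retiring op from the worklist.
        op = max(worklist, key=lambda o: retire[o])
        worklist.remove(op)
        schedule.append(op)
        # Its producers each have one fewer unplaced consumer.
        for arg in graph[op]:
            counts[arg] -= 1
            if counts[arg] == 0:
                worklist.append(arg)
    return schedule
```

On the example this produces the same order as the slides: retn, &, *, -, +, function args.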

Page 22:

Placement pass

Goal: place ops in instructions using limited FUs

schedule: retn & * - + function args

op latency: + 1, * 3, - 1, & 1, retn 0

tableau: one column per FU (branch, load, ALU, mult), one row per cycle (6 down to 0)

Placed so far: retn in cycle 0

Page 23:

Placement pass

Goal: place ops in instructions using limited FUs

Placed so far: retn in cycle 0, & in cycle 1

Still to place: * - + function args

Page 24:

Placement pass

Goal: place ops in instructions using limited FUs

Placed so far: retn in cycle 0, & in cycle 1, * in cycle 4

Still to place: - + function args

Page 25:

Placement pass

Goal: place ops in instructions using limited FUs

Placed so far: retn in cycle 0, & in cycle 1, - in cycle 2, * in cycle 4

Still to place: + function args

Page 26:

Placement pass

Goal: place ops in instructions using limited FUs

Placed so far: retn in cycle 0, & in cycle 1, - in cycle 2, * in cycle 4, + in cycle 5

Still to place: function args

Page 27:

Placement pass

Goal: place ops in instructions using limited FUs

Placed: retn in cycle 0, & in cycle 1, - in cycle 2, * in cycle 4, + in cycle 5, function args in cycle 6
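A rough sketch of the placement loop, under several assumptions that are not stated on the slides: tableau cycles are counted backward from the return (which matches the cycle numbers shown), the function arguments are given a latency of 1 so that they land in cycle 6, each FU kind has one unit, and a conflict is resolved by moving the op one row further from the return. All names are hypothetical.

```python
# Assumed latencies; "args" = 1 is an assumption that reproduces cycle 6.
LATENCY = {"args": 1, "+": 1, "*": 3, "-": 1, "&": 1, "retn": 0}

# Assumed FU kind for each op; the function args occupy no FU.
FU = {"args": None, "+": "ALU", "-": "ALU", "&": "ALU",
      "*": "mult", "retn": "branch"}

# Consumers of each op's result in the example graph.
CONSUMERS = {
    "retn": [],
    "&": ["retn"],
    "*": ["&"],
    "-": ["&"],
    "+": ["-", "*"],
    "args": ["+", "-", "*"],
}

SCHEDULE = ["retn", "&", "*", "-", "+", "args"]


def place(schedule, consumers, latency, fu, units_per_kind=1):
    """Assign each op a tableau cycle, counted backward from the return."""
    placed = {}   # op -> cycle
    busy = {}     # (FU kind, cycle) -> number of ops already placed there
    for op in schedule:
        # The result must have retired by the time each consumer issues.
        cycle = max((placed[c] + latency[op] for c in consumers[op]), default=0)
        kind = fu[op]
        if kind is not None:
            # If the FU is taken in that cycle, issue the op a cycle earlier
            # in time (a higher row number in the backward-counted tableau).
            while busy.get((kind, cycle), 0) >= units_per_kind:
                cycle += 1
            busy[(kind, cycle)] = busy.get((kind, cycle), 0) + 1
        placed[op] = cycle
    return placed
```

With these assumptions the result matches the final tableau: retn 0, & 1, - 2, * 4, + 5, function args 6.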

Page 28:

Symex pass

After instructions have been populated and issue and retire cycles determined, the producer results must still be passed to the consumer arguments.

On a general register machine, they would be passed in registers

The Mill doesn’t have registers

The Mill has its own way to pass data between functional units.

Page 29:

We call it the Belt

Like a conveyor belt – a fixed length FIFO

Functional units can read any position

[Diagram: an adder reads two operands from positions on a belt of values.]

Page 30:

We call it the Belt

Like a conveyor belt – a fixed length FIFO

Functional units can read any position

New results drop on the front

Pushing the last off the end

[Diagram: the adder's result drops onto the front of the belt and the oldest value falls off the end.]

Page 31:

Multiple reads

Functional units can read any mix of belt positions

[Diagram: three adders each read their operands from the same belt, including positions read by more than one adder.]

Page 32:

Multiple drops

All results retiring in a cycle drop together

[Diagram: three adders retire in the same cycle and their results (8, 8, 6) drop onto the front of the belt together.]

Page 33:

Belt addressing

Belt operands are addressed by relative position

“b3” is the fourth most recent value to drop to the belt
“b5” is the sixth most recent value to drop to the belt

This is temporal addressing

add b3, b5

No result address!

Page 34:

Temporal addressing

The temporal address of a datum changes with more drops

[Diagram: after three more values (8, 3, 3) drop, the datum that was b3 is now b6.]
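The belt behaviour described on the last few slides can be mimicked with a bounded deque. This is a toy model for intuition only, not a description of the hardware; the class name `Belt` and its methods are invented here, and the length of 8 is one of the belt sizes mentioned later in the talk.

```python
from collections import deque


class Belt:
    """Toy belt: a fixed-length FIFO addressed by relative position."""

    def __init__(self, length=8):
        # A deque with maxlen discards from the far end when it is full.
        self.slots = deque(maxlen=length)

    def drop(self, *values):
        # New results drop on the front, pushing the oldest off the end.
        for value in values:
            self.slots.appendleft(value)

    def read(self, position):
        # b0 is the most recent drop, b1 the one before it, and so on.
        return self.slots[position]
```

For instance, after `belt.drop(10, 20, 30, 40)` the value 10 is at b3; after three more drops it is at b6, and after two further drops it has fallen off an 8-position belt.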

Page 35:

Symex pass

The issue schedule and op latency give retire order

Retire order is belt order

[Diagram: the placed tableau (FU columns branch, load, ALU, mult; cycles 6 to 0) beside an infinite belt. Belt contents from newest to oldest: & (cycle 5), - and * (cycle 4), + (cycle 1), function args A B C D (cycle 0).]

Page 36:

Symex pass

[Diagram: the placed tableau and the belt, with drops at cycles 5 (&), 4 (- and *), 1 (+), and 0 (A B C D).]

Code, belt operands not yet assigned:

add
mul
nop
sub
and
retn

Page 37:

Symex pass

Code, operands resolved so far:

add b2 b1
mul
nop
sub
and
retn

Page 38:

Symex pass

Code, operands resolved so far:

add b2 b1
mul b0 b1
nop
sub
and
retn

Page 39:

Symex pass

Code, operands resolved so far:

add b2 b1
mul b0 b1
nop
sub b4 b0
and
retn

Page 40:

Symex pass

Code, operands resolved so far:

add b2 b1
mul b0 b1
nop
sub b4 b0
and b1 b0
retn

Page 41:

Symex pass

Code, all operands resolved:

add b2 b1
mul b0 b1
nop
sub b4 b0
and b1 b0
retn b0

Page 42:

Symex pass

add b2 b1
mul b0 b1
nop
sub b4 b0
and b1 b0
retn b0

[Diagram: a much longer schedule (cycles up to 18), in which a consumer would have to address its operand as b23.]

But what if there isn’t a b23?
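The operand numbering in this listing can be reproduced by simulating the drops. The sketch below is illustrative only and rests on assumptions: issue cycles are counted forward from 0, the arguments are dropped in the order A, B, C, D before cycle 0, the belt is unbounded, and results retiring in the same cycle drop shortest-latency first (a tie-break chosen only because it reproduces `and b1 b0`). The names `OPS` and `symex` are hypothetical.

```python
# (name, issue cycle, latency, operand producers), in issue order.
OPS = [
    ("add", 0, 1, ["B", "C"]),
    ("mul", 1, 3, ["add", "D"]),
    ("sub", 3, 1, ["A", "add"]),
    ("and", 4, 1, ["sub", "mul"]),
    ("retn", 5, 0, ["and"]),
]


def symex(ops, args):
    """Replace each operand with its belt position at the op's issue cycle."""
    belt = list(reversed(args))     # index 0 is the front of the belt (b0)
    listing = []
    last_cycle = max(issue for _, issue, _, _ in ops)
    for cycle in range(last_cycle + 1):
        # Results retiring this cycle drop before this cycle's ops read the belt.
        retiring = [(lat, name) for name, issue, lat, _ in ops
                    if lat > 0 and issue + lat == cycle]
        for _, name in sorted(retiring):   # assumed tie-break: shortest latency first
            belt.insert(0, name)
        issued = False
        for name, issue, _, operands in ops:
            if issue == cycle:
                refs = " ".join(f"b{belt.index(o)}" for o in operands)
                listing.append(f"{name} {refs}")
                issued = True
        if not issued:
            listing.append("nop")
    return listing
```

Calling `symex(OPS, ["A", "B", "C", "D"])` yields the six lines of the listing, including the nop in the cycle where nothing issues. On a real belt of 8, 16, or 32 positions, an index beyond the belt length is the "b23" problem raised here.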

Page 43:

Use it or lose it

Compiler schedules producers near to consumers

Nearly all one-use values consumed while on belt

Belt is Single-Assignment - no hazards – no renames

300 rename registers become 8/16/32 belt positions

But - long-lived values must be saved

Page 44:

The scratchpad

[Diagram: a value is spilled from the belt to the scratchpad and later filled back onto the belt.]

Frame local – each function has a new scratchpad
Fixed max size, must explicitly allocate
Static byte addressing, must be aligned
Three cycle spill-to-fill latency

Page 45:

Symex pass

Insert spill-fill ops - and reschedule

[Diagram: the long schedule (cycles 18 down to 13 shown) with spill and fill ops added, so that consumers find their operands near the front of the belt; retn reads b0.]

Page 46:

Symex pass

Added spill/fill ops may change the schedule so some other results need spill/fill too.

Add more spills/fills, and re-reschedule

Iteration is guaranteed to stop with a feasible schedule

Iteration limit has every producer spilled and a fill for every consumer, which is feasible.

In practice:

Most functions need no spills at all
More than one reschedule is very rare

Page 47:

The load problem

[Diagram: “You write” shows the ops load, add, shift, store; “You get” shows the same ops with stall cycles inserted.]

Every architecture must deal with this problem.

Page 48:

Every CPU’s goal – hide memory latency

General strategy:

Issue loads as early as possible
- as soon as the address is known
- or even earlier – aka prefetch

Find something else to do while waiting for data
- hardware approach – dynamic scheduling
  Tomasulo algorithm on IBM 360/91
  Ignore program order: issue operations as soon as their data is ready
- software approach – static scheduling
  exposed pipeline, delay slots

Page 49:

Mill “deferred loads”

Generic Mill load operation:

load(<address>, <width>, <delay>)

address: 64-bit base; offset; optional scaled index
width: scalar 1/2/4/8/16 byte, or vector of same
delay: number of issue cycles before retire

load(…, …, 4)
instruction
instruction
instruction
instruction
consumer

[Annotations on the code: load issues here; retire is deferred for four instructions; load retires here; data available here.]

Page 50:

Mill “deferred loads”

int foo(int a, int b, int* p) { return a*b + *p; }

[Diagram: dataflow graph (function args a, b, p; load; *; +; retn) and its tableau with FU columns branch, load, ALU, mult over cycles 5 to 0; a stall is marked.]

(assuming load latency == 1)

Page 51:

Mill “deferred loads”

int foo(int a, int b, int* p) { return a*b + *p; }

[Diagram: the same dataflow graph and tableau, showing the placement of the load.]

Page 52:

Mill “deferred loads”

int foo(int a, int b, int* p) { return a*b + *p; }

[Diagram: the load is split into two ops, “issue” and “retire”, in both the dataflow graph and the tableau.]

What is the latency of “issue”?

Page 53:

Mill “deferred loads”

int foo(int a, int b, int* p) { return a*b + *p; }

Is it maxLatency?

[Diagram: the same graph and tableau with “issue” assigned maxLatency; a stall is marked.]

Page 54:

Mill “deferred loads”

int foo(int a, int b, int* p) { return a*b + *p; }

What we want is…

needed latency: highest non-load cycle minus retire cycle

[Diagram: the same graph and tableau, with the needed latency marked between “issue” and “retire”.]

Page 55:

Mill “deferred loads”

The algorithm:

Temporarily assign all “issue” as maxLatency

Perform latency pass normally

Schedule all ops except “issue” normally

maxLatency = 8

[Diagram: dataflow graph (function args, issue, retire, *, +, retn) annotated with the cycles computed by the latency pass.]

Page 56:

Mill “deferred loads”

The algorithm:

Temporarily assign all “issue” as maxLatency

Perform latency pass normally

Schedule all ops except “issue” normally

maxLatency = 8

[Diagram: the annotated dataflow graph beside the tableau (FU columns branch, load, ALU, mult; cycles 5 to 0), with every op except “issue” placed.]

Page 57:

Mill “deferred loads”

When scheduling an “issue”, adjust latency to:

cycle of highest placed op
minus cycle of corresponding “retire”
minus predicted cycle of “issue”
- or to one, whichever is larger

maxLatency = 8

In the example: 4 - 2 - 0 = 2, a 2 cycle latency

[Diagram: the annotated dataflow graph and the tableau, now with “issue” placed.]
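The adjustment rule on this slide is a one-line calculation. The sketch below simply transcribes the slide's wording; the function and parameter names are hypothetical, and the sample figures 4, 2, and 0 are an assumed reading of the slide's labels.

```python
def deferred_load_latency(highest_placed_cycle, retire_cycle, predicted_issue_cycle):
    """Latency to assign to a load's "issue" op when it is finally scheduled.

    Transcribes the slide's rule: cycle of the highest placed op, minus the
    cycle of the corresponding "retire", minus the predicted cycle of the
    "issue", but never less than one.
    """
    return max(1, highest_placed_cycle - retire_cycle - predicted_issue_cycle)
```

With the assumed figures, `deferred_load_latency(4, 2, 0)` gives a 2 cycle latency; when the subtraction yields zero or less, the result is clamped to one.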

Page 58:

Want more?

Sign up for technical announcements, white papers, etc.:

MillComputing.com/mailing-list