instruction selection 745


Transcript of instruction selection 745

Page 1: instruction selection 745

School of Computer Science

Instruction Selection

15-745 Optimizing Compilers, Spring 2007

Compiler Backend

    IR → instruction selector → Assem → instruction scheduler → Assem → register allocator → TempMap

Intermediate Representations

Front end - produces an intermediate representation (IR)
Middle end - transforms the IR into an equivalent IR that runs more efficiently
Back end - transforms the IR into native code

IR encodes the compiler’s knowledge of the program
Middle end usually consists of several passes

    Source Code → Front End → IR → Middle End → IR → Back End → Target Code

Page 2: instruction selection 745

Intermediate Representations

Decisions in IR design affect the speed and efficiency of the compiler
Some important IR properties
– Ease of generation
– Ease of manipulation
– Procedure size
– Freedom of expression
– Level of abstraction
The importance of different properties varies between compilers
– Selecting an appropriate IR for a compiler is critical

Types of Intermediate Representations

Structural
– Graphically oriented
– Heavily used in source-to-source translators
– Tend to be large
Examples: trees, DAGs

Linear
– Pseudo-code for an abstract machine
– Level of abstraction varies
– Simple, compact data structures
– Easier to rearrange
Examples: 3 address code, stack machine code

Hybrid
– Combination of graphs and linear code
Example: control-flow graph

Level of Abstraction

The level of detail exposed in an IR influences the profitability and
feasibility of different optimizations.

Two different representations of an array reference:

High level AST (good for memory disambiguation):

    subscript(A, i, j)

Low level linear code (good for address calculation):

    loadI 1      => r1
    sub   rj, r1 => r2
    loadI 10     => r3
    mult  r2, r3 => r4
    sub   ri, r1 => r5
    add   r4, r5 => r6
    loadI @A     => r7
    add   r7, r6 => r8
    load  r8     => rAij

Level of Abstraction

Structural IRs are usually considered high-level
Linear IRs are usually considered low-level
Not necessarily true:

Low level AST (the address arithmetic made explicit):

    load(+(@A, +(*(10, -(j, 1)), -(i, 1))))

High level linear code:

    loadArray A, i, j

Page 3: instruction selection 745

Abstract Syntax Tree

An abstract syntax tree (AST) is the procedure’s parse tree with most
non-terminal nodes removed

x - 2 * y as an AST:

    -(x, *(2, y))

When is an AST a good IR?

Structural IR

Directed Acyclic Graph

A directed acyclic graph (DAG) is an AST with a unique node for each value
Makes sharing explicit
Encodes redundancy

    z ← x - 2 * y
    w ← x / 2

(In the DAG, the nodes for x and 2 are shared between the two statements.)

Same expression twice means that the compiler might arrange to evaluate it just once!

Structural IR
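The sharing a DAG makes explicit can be sketched with hash-consing: every distinct value gets exactly one node, so a repeated variable or subexpression is built only once. This is an illustrative sketch; the tuple encoding and node numbering are assumptions, not the slides' representation.

```python
# Minimal DAG construction by hash-consing: each distinct value maps to
# a single node id, so repeated subexpressions are shared.

def make_dag(exprs):
    nodes = {}                      # canonical key -> node id
    def build(e):
        # leaves are variables/constants; interior nodes are (op, l, r)
        key = e if not isinstance(e, tuple) else (e[0], build(e[1]), build(e[2]))
        if key not in nodes:
            nodes[key] = len(nodes)
        return nodes[key]
    return [build(e) for e in exprs], nodes

# z <- x - 2 * y ; w <- x / 2
roots, nodes = make_dag([("-", "x", ("*", 2, "y")), ("/", "x", 2)])
# As trees the two statements would need 8 nodes; the DAG shares the
# nodes for x and 2, leaving only 6.
```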

Pegasus IR

Predicated Explicit GAted Simple Uniform SSA
Structural IR used in CASH (Compiler for Application Specific Hardware)

Stack Machine Code

Originally used for stack-based computers, now Java
Example: x - 2 * y becomes

    push x
    push 2
    push y
    multiply
    subtract

Advantages
– Compact form
– Introduced names are implicit, not explicit
– Simple to generate and execute code

Useful where code is transmitted over slow communication links (the net).
Implicit names take up no space, where explicit ones do!

Linear IR
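Stack code is also simple to execute, which a toy evaluator shows; the (op, arg) encoding below is illustrative, not any real VM's format.

```python
# Toy evaluator for the slide's stack-machine code. Operands live on an
# implicit stack, so no temporaries are ever named.

def eval_stack(code, env):
    stack = []
    for op, arg in code:
        if op == "push":
            # push a variable's value or a literal constant
            stack.append(env[arg] if isinstance(arg, str) else arg)
        elif op == "multiply":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "subtract":
            b, a = stack.pop(), stack.pop()
            stack.append(a - b)
    return stack.pop()

# x - 2 * y
program = [("push", "x"), ("push", 2), ("push", "y"),
           ("multiply", None), ("subtract", None)]
```

With x = 10 and y = 3 the program leaves 10 - 2*3 = 4 on the stack.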

Page 4: instruction selection 745

Three Address Code

Three address code has statements of the form:

    x ← y op z

with 1 operator (op) and, at most, 3 names (x, y, and z)

Example: z ← x - 2 * y becomes

    t ← 2 * y
    z ← x - t

Advantages:
– Resembles many machines (RISC)
– Compact form

Linear IR
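Lowering an expression tree to three address code can be sketched in a few lines: each operator gets its own statement and a fresh explicit temporary. The tuple tree encoding and naming scheme are illustrative assumptions.

```python
# Sketch of generating three address code from an expression tree:
# one operator per statement, fresh temporaries for intermediate values.

import itertools

def to_tac(tree):
    code, fresh = [], itertools.count(1)
    def walk(node):
        if not isinstance(node, tuple):     # variable or constant leaf
            return str(node)
        op, left, right = node
        a, b = walk(left), walk(right)
        t = f"t{next(fresh)}"
        code.append(f"{t} <- {a} {op} {b}")
        return t
    return walk(tree), code

# z <- x - 2 * y
result, code = to_tac(("-", "x", ("*", 2, "y")))
code.append(f"z <- {result}")
```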

Two Address Code

Allows statements of the form

    x ← x op y

with 1 operator (op) and, at most, 2 names (x and y)

Example: z ← x - 2 * y becomes

    t1 ← 2
    t2 ← load y
    t2 ← t2 * t1
    z ← load x
    z ← z - t2

– Can be very compact
– Destructive operations make reuse hard
– Good model for machines with destructive ops (x86)

Linear IR

Control-flow Graph

Models the transfer of control in the procedure
Nodes in the graph are basic blocks
– Straight-line code
– Either linear or structural representation
Edges in the graph represent control flow

Example: if (x = y) branches to either { a ← 2; b ← 5 } or { a ← 3; b ← 4 },
and both paths join at c ← a * b

Basic blocks: maximal length sequences of straight-line code

Hybrid IR
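Partitioning linear code into basic blocks is usually done with the classic "leaders" rule: the first instruction, every branch target, and every instruction after a branch start a new block. A sketch, where the (op, arg) instruction format and the "br"/"cbr"/"label" opcodes are illustrative assumptions:

```python
# Find basic blocks via leaders: block boundaries fall at the first
# instruction, at branch targets, and after branches.

def basic_blocks(instrs):
    labels = {arg: i for i, (op, arg) in enumerate(instrs) if op == "label"}
    leaders = {0}
    for i, (op, arg) in enumerate(instrs):
        if op in ("br", "cbr"):
            leaders.add(labels[arg])        # branch target leads a block
            if i + 1 < len(instrs):
                leaders.add(i + 1)          # so does the fall-through
    cuts = sorted(leaders) + [len(instrs)]
    return [instrs[a:b] for a, b in zip(cuts, cuts[1:])]

# shape of the slide's example: a conditional, two assignments, a join
example = [("op", "a ← 2"), ("cbr", "L1"),
           ("op", "a ← 3"),
           ("label", "L1"), ("op", "c ← a * b")]
blocks = basic_blocks(example)
```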

Using Multiple Representations

Repeatedly lower the level of the intermediate representation
– Each intermediate representation is suited towards certain optimizations

Example: the Open64 compiler
– WHIRL intermediate format
  • Consists of 5 different IRs that are progressively more detailed
– gcc
  • but not explicit about it :-(

    Source Code → Front End → IR 1 → Middle End → IR 2 → Middle End → IR 3 → Back End → Target Code

Page 5: instruction selection 745

Instruction Selection

    IR → instruction selector → Assem

The instruction selector is driven by IR -> Assem templates.

Instruction selection example

Suppose we have

    MOVE(TEMP r, MEM(BINOP(TIMES, TEMP s, CONST c)))

We can generate the x86 code

    movl (,%s,c), %r

…if c = 1, 2, or 4; otherwise

    imull $c, %s, %r
    movl (%r), %r

Selection dependencies

    MOVE(TEMP r, MEM(BINOP(TIMES, TEMP s, CONST c)))

We can see that the selection of instructions can depend on the constants.
The context of an IR expression can also affect the choice of instruction.
Consider:

Example, cont’d

For

    MEM(BINOP(TIMES, TEMP s, CONST c))

we might sometimes want to generate

    testl $0, (,%s,c)
    je L

What context might cause us to do this?

Page 6: instruction selection 745

Instruction selection as tree matching

In order to take context into account, instruction selectors often use
pattern-matching on IR trees
– each pattern specifies what instructions to select

Sample tree-matching rules

    IR pattern                          code              cost
    BINOP(PLUS,i,j)                     leal (i,j),r      1
    BINOP(TIMES,i,j)                    movl j,r          2
                                        imull i,r
    BINOP(PLUS,i,CONST c)               leal c(i),r       1
    MEM(BINOP(PLUS,i,CONST c))          movl c(i),r       1
    MOVE(MEM(BINOP(PLUS,i,j)),k)        movl k,(i,j)      1
    BINOP(TIMES,i,CONST c)              leal (,i,c),r     1    (if c is 1, 2, or 4)
    BINOP(TIMES,i,CONST c)              movl c,r          2
                                        imull i,r
    MEM(i)                              movl (i),r        1
    MOVE(MEM(i),j)                      movl j,(i)        1
    MOVE(MEM(i),MEM(j))                 movl (j),t        2
                                        movl t,(i)

Tiling an IR tree, v.1

a[x] = *y;  (assume a is a formal parameter passed on the stack)

The IR tree:

    MOVE(MEM(BINOP(PLUS, MEM(BINOP(PLUS, EBP, CONST a)), BINOP(TIMES, x, CONST 4))), MEM(y))

One tiling, with r1-r4 naming the intermediate results:

    leal $a(%ebp),r1
    movl (r1),r2
    leal (,x,$4),r3
    leal (r2,r3),r4
    movl (y),r5
    movl r5,(r4)

Tiling an IR tree, v.2

a[x] = *y;

The same IR tree, tiled with fewer, larger tiles:

    movl $a(%ebp),r1
    leal (,x,4),r2
    movl (y),r3
    movl r3,(r1,r2)

Page 7: instruction selection 745

Tiling choices

In general, for any given tree, many tilings are possible
– each resulting in a different instruction sequence
We can ensure pattern coverage by covering, at a minimum, all atomic IR trees

The best tiling?

We want the “lowest cost” tiling
– usually, the shortest sequence
– but can also take into account cost/delay of each instruction

Optimum tiling
– lowest-cost tiling

Locally optimal tiling
– no two adjacent tiles can be combined into one tile of lower cost

Locally optimal tilings

Locally optimal tiling is easy
– A simple greedy algorithm works extremely well in practice:
– Maximal munch
  • start at root
  • use “biggest” match (in # of nodes)
  • use cost to break ties
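The greedy algorithm fits in a few lines. In this sketch the patterns, wildcard markers, and instruction strings are illustrative, not the slides' full rule set: SUB marks a subtree the tile leaves uncovered (to be tiled recursively), ANY matches a leaf value the tile absorbs.

```python
# Sketch of maximal munch: at each node, greedily take the biggest
# matching pattern, then recursively tile the uncovered subtrees.

SUB, ANY = "SUB", "ANY"

def match(pat, tree, kids):
    if pat == SUB:
        kids.append(tree)
        return True
    if pat == ANY:
        return True
    if not isinstance(pat, tuple):
        return pat == tree
    return (isinstance(tree, tuple) and len(pat) == len(tree)
            and all(match(p, t, kids) for p, t in zip(pat, tree)))

RULES = [  # (nodes covered, pattern, instruction template)
    (3, ("MEM", ("BINOP", "PLUS", SUB, ("CONST", ANY))), "movl c(i),r"),
    (2, ("BINOP", "PLUS", SUB, SUB), "leal (i,j),r"),
    (1, ("MEM", SUB), "movl (i),r"),
    (1, ("TEMP", ANY), "temp"),
    (1, ("CONST", ANY), "loadI c,r"),
]

def munch(tree, out):
    for size, pat, instr in sorted(RULES, key=lambda r: -r[0]):  # biggest first
        kids = []
        if match(pat, tree, kids):
            for k in kids:
                munch(k, out)       # tile the uncovered subtrees
            out.append(instr)
            return
    raise ValueError(f"no pattern matches {tree!r}")

out = []
munch(("MEM", ("BINOP", "PLUS", ("TEMP", "fp"), ("CONST", 8))), out)
```

Here the 3-node MEM-of-PLUS-of-CONST tile wins at the root, leaving only the TEMP leaf to cover.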

Maximal munch

Tiling the tree for a[x] = *y; again: choose the largest pattern with
lowest cost, i.e., the “maximal munch”. At the root MOVE, the candidates are:

    IR pattern                        code            cost
    MOVE(MEM(BINOP(PLUS,i,j)),k)      movl k,(i,j)    1
    MOVE(MEM(i),j)                    movl j,(i)      1
    MOVE(MEM(i),MEM(j))               movl (j),t      2
                                      movl t,(i)

Page 8: instruction selection 745

Maximal munch

Maximal munch does not necessarily produce the optimum selection of instructions
But:
– it is easy to implement
– it tends to work well for current instruction-set architectures

Maximal munch is not optimum

Consider what happens, for example, if two of our rules are changed as follows:

    IR pattern                        code          cost
    MEM(BINOP(PLUS,i,CONST c))        movl c,r      3
                                      addl i,r
                                      movl (r),r
    MOVE(MEM(BINOP(PLUS,i,j)),k)      movl j,r      3
                                      addl i,r
                                      movl k,(r)

Sample tree-matching rules

    Rule #  IR pattern                        cost
    0       TEMP t                            0
    1       CONST c                           1
    2       BINOP(PLUS,i,j)                   1
    3       BINOP(TIMES,i,j)                  2
    4       BINOP(PLUS,i,CONST c)             1
    5       MEM(BINOP(PLUS,i,CONST c))        3
    6       MOVE(MEM(BINOP(PLUS,i,j)),k)      3
    7       BINOP(TIMES,i,CONST c)            1    (if c is 1, 2, or 4)
    8       BINOP(TIMES,i,CONST c)            2
    9       MEM(i)                            1
    10      MOVE(MEM(i),j)                    1
    11      MOVE(MEM(i),MEM(j))               2

Tiling an IR tree, new rules

a[x] = *y;

With the changed rules, the same tree tiles as:

    movl $a,r1
    addl %ebp,r1
    movl (r1),r1
    leal (,x,4),r2
    movl (y),r3
    movl r2,r4
    addl r1,r4
    movl r3,(r4)

Page 9: instruction selection 745

Optimum selection

To achieve optimum instruction selection, we must use a more complex algorithm
– dynamic programming
In contrast to maximal munch, the trees are matched bottom-up

Dynamic programming

The idea is fairly simple
– Working bottom up…
– Given the optimum tilings of all subtrees, generate the optimum tiling of the current tree
  • consider all tiles for the root of the current tree
  • sum cost of best subtree tiles and each tile
  • choose tile with minimum total cost

Second pass generates the code using the results from the bottom-up pass
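The bottom-up cost computation can be sketched as follows. The rules and costs here are illustrative, and the sketch only computes the optimum cost; a real selector would also record the winning rule at each node for the code-emission pass.

```python
# Sketch of the dynamic-programming pass: each node's best cost considers
# every rule matching at its root, plus the best costs of the subtrees
# the rule leaves uncovered.

SUB, ANY = "SUB", "ANY"   # SUB = uncovered subtree, ANY = absorbed leaf

def match(pat, tree, kids):
    if pat == SUB:
        kids.append(tree)
        return True
    if pat == ANY:
        return True
    if not isinstance(pat, tuple):
        return pat == tree
    return (isinstance(tree, tuple) and len(pat) == len(tree)
            and all(match(p, t, kids) for p, t in zip(pat, tree)))

RULES = [  # (cost, pattern)
    (1, ("MEM", ("BINOP", "PLUS", SUB, ("CONST", ANY)))),
    (1, ("BINOP", "PLUS", SUB, SUB)),
    (1, ("MEM", SUB)),
    (0, ("TEMP", ANY)),
    (1, ("CONST", ANY)),
]

def best_cost(tree):
    options = []
    for cost, pat in RULES:
        kids = []
        if match(pat, tree, kids):
            options.append(cost + sum(best_cost(k) for k in kids))
    return min(options)     # optimum over all tilings of `tree`

tree = ("MEM", ("BINOP", "PLUS", ("TEMP", "fp"), ("CONST", 8)))
# the single big tile (cost 1) beats MEM over PLUS over CONST (cost 3)
```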

Bottom-up CG, pass 1

[Figure: an IR tree MOVE(MEM(...), MEM(...)) with leaves %EBP, CONST a,
CONST 4, and CONST 99. Pass 1 annotates every node bottom-up with a pair
(r,c), meaning rule r gives total cost c — e.g. (0,0) on TEMP leaves,
(1,1) on CONST leaves, up to (10,6),(11,6) at the MOVE root.]

Bottom-up CG, pass 2

[Figure: the same annotated tree. Pass 2 uses the recorded (r,c) pairs —
rule r, total cost c — to choose the tiles and generate the code.]

Page 10: instruction selection 745

Bottom-up code generation

How does the running time compare to maximal munch?
Memory usage?

Tools

A lot of tools have been developed to automatically generate instruction selectors
– TWIG [Aho, Ganapathi, Tjiang, ’86]
– BURG [Fraser, Henry, Proebsting, ’92]
– BEG [Emmelmann, Schroer, Landwehr, ’89]
– …
These generate bottom-up instruction selectors from tree-matching rule specifications

Code-generator generators

Twig, Burg, and Beg use the dynamic programming approach
– A lot of work has gone into making these cgg’s highly efficient
There are also grammar-based cgg’s that use LR(k) parsing to perform the tree-matching

Tiling a DAG

How would you tile a DAG in which &x feeds two adds (+4 and +8), each
followed by a load into t0 and t1?

Tiling the shared node once:

    mov &x -> t3
    add t3,4 -> t4
    load (t4) -> t0
    add t3,8 -> t5
    load (t5) -> t1

Or simply:

    mov x+4 -> t0
    mov x+8 -> t1

Page 11: instruction selection 745

Peephole Matching

Basic idea
Compiler can discover local improvements locally
– Look at a small set of adjacent operations
– Move a “peephole” over code & search for improvement

Classic example was store followed by load

    Original code             Improved code
    storeAI r1 ⇒ r0,8         storeAI r1 ⇒ r0,8
    loadAI r0,8 ⇒ r15         i2i r1 ⇒ r15

Peephole Matching

Simple algebraic identities

    Original code             Improved code
    addI r2,0 ⇒ r7            mult r4,r2 ⇒ r10
    mult r4,r7 ⇒ r10

Peephole Matching

Jump to a jump

    Original code             Improved code
         jumpI → L10          L10: jumpI → L11
    L10: jumpI → L11

Peephole Matching

Implementing it
Early systems used limited set of hand-coded patterns
Window size ensured quick processing

Modern peephole instruction selectors (Davidson)
Break problem into three tasks, applying symbolic interpretation &
simplification systematically:

    IR → Expander (IR→LLIR) → LLIR → Simplifier (LLIR→LLIR) → LLIR → Matcher (LLIR→ASM) → ASM

Page 12: instruction selection 745

Peephole Matching: Expander

Turns IR code into a low-level IR (LLIR) such as RTL
Operation-by-operation, template-driven rewriting
LLIR form includes all direct effects (e.g., setting cc)
Significant, albeit constant, expansion of size

Peephole Matching: Simplifier

Looks at LLIR through window and rewrites it
Uses forward substitution, algebraic simplification, local constant
propagation, and dead-effect elimination
Performs local optimization within window

This is the heart of the peephole system
– Benefit of peephole optimization shows up in this step

Peephole Matching: Matcher

Compares simplified LLIR against a library of patterns
Picks low-cost pattern that captures effects
Must preserve LLIR effects, may add new ones (e.g., set cc)
Generates the assembly code output

Example

Original IR Code:

    OP    Arg1  Arg2  Result
    mult  2     y     t1
    sub   x     t1    w

Expand ⇒ LLIR Code:

    r10 ← 2
    r11 ← @y
    r12 ← r0 + r11
    r13 ← MEM(r12)
    r14 ← r10 x r13
    r15 ← @x
    r16 ← r0 + r15
    r17 ← MEM(r16)
    r18 ← r17 - r14
    r19 ← @w
    r20 ← r0 + r19
    MEM(r20) ← r18

Page 13: instruction selection 745

Example

Simplify ⇒ LLIR Code:

    r13 ← MEM(r0 + @y)
    r14 ← 2 x r13
    r17 ← MEM(r0 + @x)
    r18 ← r17 - r14
    MEM(r0 + @w) ← r18

Example

Match ⇒ ILOC Code:

    loadAI r0,@y ⇒ r13
    multI 2 x r13 ⇒ r14
    loadAI r0,@x ⇒ r17
    sub r17 - r14 ⇒ r18
    storeAI r18 ⇒ r0,@w

Steps of the Simplifier

The simplifier slides a 3-operation window over the expanded LLIR code:

    r10 ← 2
    r11 ← @y
    r12 ← r0 + r11
    r13 ← MEM(r12)
    r14 ← r10 x r13
    r15 ← @x
    r16 ← r0 + r15
    r17 ← MEM(r16)
    r18 ← r17 - r14
    r19 ← @w
    r20 ← r0 + r19
    MEM(r20) ← r18

Steps of the Simplifier

The first window:

    r10 ← 2
    r11 ← @y
    r12 ← r0 + r11

Page 14: instruction selection 745

Steps of the Simplifier

    r10 ← 2                   r10 ← 2
    r11 ← @y              ⇒   r12 ← r0 + @y
    r12 ← r0 + r11            r13 ← MEM(r12)

Steps of the Simplifier

    r10 ← 2                   r10 ← 2
    r12 ← r0 + @y         ⇒   r13 ← MEM(r0 + @y)
    r13 ← MEM(r12)            r14 ← r10 x r13

Steps of the Simplifier

    r10 ← 2                   r13 ← MEM(r0 + @y)
    r13 ← MEM(r0 + @y)    ⇒   r14 ← 2 x r13
    r14 ← r10 x r13           r15 ← @x

Steps of the Simplifier

    r13 ← MEM(r0 + @y)        r14 ← 2 x r13
    r14 ← 2 x r13         ⇒   r15 ← @x
    r15 ← @x                  r16 ← r0 + r15

r13 ← MEM(r0 + @y) is the 1st op it has rolled out of the window

Page 15: instruction selection 745

Steps of the Simplifier

    r14 ← 2 x r13             r14 ← 2 x r13
    r15 ← @x              ⇒   r16 ← r0 + @x
    r16 ← r0 + r15            r17 ← MEM(r16)

Steps of the Simplifier

    r14 ← 2 x r13             r14 ← 2 x r13
    r16 ← r0 + @x         ⇒   r17 ← MEM(r0 + @x)
    r17 ← MEM(r16)            r18 ← r17 - r14

Steps of the Simplifier

    r14 ← 2 x r13             r17 ← MEM(r0 + @x)
    r17 ← MEM(r0 + @x)    ⇒   r18 ← r17 - r14
    r18 ← r17 - r14           r19 ← @w

Steps of the Simplifier

    r17 ← MEM(r0 + @x)        r18 ← r17 - r14
    r18 ← r17 - r14       ⇒   r19 ← @w
    r19 ← @w                  r20 ← r0 + r19

Page 16: instruction selection 745

Steps of the Simplifier

    r18 ← r17 - r14           r18 ← r17 - r14
    r19 ← @w              ⇒   r20 ← r0 + @w
    r20 ← r0 + r19            MEM(r20) ← r18

Steps of the Simplifier

    r18 ← r17 - r14           r18 ← r17 - r14
    r20 ← r0 + @w         ⇒   MEM(r0 + @w) ← r18
    MEM(r20) ← r18


Example

After the last window, the simplified LLIR code is:

    r13 ← MEM(r0 + @y)
    r14 ← 2 x r13
    r17 ← MEM(r0 + @x)
    r18 ← r17 - r14
    MEM(r0 + @w) ← r18
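The forward substitution the window performs can be sketched as below. This is a toy: the (dst, rhs) LLIR encoding is an assumption, and it skips the windowing and the "is the folded definition really dead?" check a real simplifier needs.

```python
# Toy forward substitution: simple values (constants, addresses) and
# '+'-address sums are folded into their later uses and their
# definitions dropped; real work (loads, multiplies, ...) stays.

def simplify(code, fold_ops=("+",)):
    env, out = {}, []
    def subst(x):
        return env.get(x, x)
    for dst, rhs in code:
        if isinstance(rhs, tuple):
            rhs = (rhs[0],) + tuple(subst(a) for a in rhs[1:])
        else:
            rhs = subst(rhs)
        if not isinstance(rhs, tuple) or rhs[0] in fold_ops:
            env[dst] = rhs          # propagate forward; emit nothing
        else:
            out.append((dst, rhs))  # keep the real operation
    return out

# the first five operations of the slides' expanded LLIR
llir = [("r10", 2),
        ("r11", "@y"),
        ("r12", ("+", "r0", "r11")),
        ("r13", ("MEM", "r12")),
        ("r14", ("x", "r10", "r13"))]
simplified = simplify(llir)
# r13 ← MEM(r0 + @y) ; r14 ← 2 x r13, matching the slides' result
```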

Page 17: instruction selection 745

Making It All Work

Details
LLIR is largely machine independent (RTL)
Target machine described as LLIR → ASM pattern
Actual pattern matching
– Use a hand-coded pattern matcher (gcc)
– Turn patterns into grammar & use LR parser (VPO)

Several important compilers use this technology
It seems to produce good portable instruction selectors
Key strength appears to be late low-level optimization

SIMD Instructions

We’ve assumed each tile is connected.
What about SIMD instructions? Ex. TI C62x add2, which performs two
additions within one 32-bit word (subword parallelism)

Automatic Vectorization

Loop level:

    for (i = 0; i < 1024; i++) {         for (i = 0; i < 1024; i += 4) {
      C[i] = A[i]*B[i];              ⇒     C[i:i+3] = A[i:i+3]*B[i:i+3];
    }                                    }

Basic block level:

    x = a+b;                         ⇒   (x,y) = (a,c)+(b,d);
    y = c+d;

Looking for independent, isomorphic operations
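The loop-level transformation can be illustrated in plain code. In Python the 4-wide step is only notational, but it mirrors the slide's C[i:i+3] = A[i:i+3]*B[i:i+3], where one SIMD multiply would produce all four products at once; function names and the width of 4 are illustrative.

```python
# Scalar vs. 4-wide versions of the slide's elementwise-multiply loop.

def mul_scalar(A, B):
    return [a * b for a, b in zip(A, B)]

def mul_vec4(A, B):
    C = [0] * len(A)
    for i in range(0, len(A), 4):
        # one vector lane-wise multiply per iteration of the strip-mined loop
        C[i:i+4] = [a * b for a, b in zip(A[i:i+4], B[i:i+4])]
    return C
```

Both versions compute the same result; the payoff of the second form is that a backend can map each 4-element step to a single SIMD instruction.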