NCI Report: Zephyr


Page 1: NCI Report: Zephyr

NCI Report: Zephyr

PLDI NCI Tutorial

University of Virginia

Princeton University

Page 2: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 2

Zephyr Goals

• Goal
– Deliver high-quality, language-neutral tools for rapidly constructing compilers for experimental computing systems research
• How
– Provide specification languages and processors to automatically generate key compiler components
  • Don’t write code, write specifications!

Page 3: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 3

Zephyr Compilers

[Diagram: the SUIF and Zephyr infrastructures side by side. Front ends shown: EDG C++, Java, lcc. Components: SUIF, MachSUIF, SUIF-to-VPO Bridge, VPO. Targets shown: Alpha, Sparc, MIPS, X86. Optimization labels: interprocedural analysis, parallelization and locality opts, object-oriented opts, scheduling, register allocation, instruction selection, code motion, memory access coalescing, induction variable elimination, CSE, loop unrolling, inlining.]

Page 4: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 4

Zephyr Building Blocks

• ASDL: Abstract Syntax Description Language

• VPO: Very Portable Optimizer

• CSDL: Computer System Description Language

Page 5: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 5

ASDL: Abstract Syntax Description Language

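[Diagram: a compiler pipeline. Lexer → Tokens → Parser → AST → Semantic Analysis → AST → Translate → IR → OPT1 → IR → ... → OPTn → IR → CodeGen. A Glue Generator reads a Glue Description and produces the AST and IR glue between phases.]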

Page 6: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 6

ASDL

• ASDL makes it easy to communicate complex recursive data structures

• ASDL and its tools provide

– Concise descriptions of tree-like structures, including ASTs and compiler intermediate representations (IRs)

– Automatic generation of data structure implementations and pickling functions for C, C++, Java, Standard ML, and Haskell.

– Graphical browsing and editing of data structures on disk.

Page 7: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 7

ASDL

• For more information about ASDL see:
– Give reference here
– Give URL here

Page 8: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 8

VPO: Very Portable Optimizer

• VPO is a retargetable optimizer that operates on a low-level, machine-independent representation called RTLs (register transfer lists)

• VPO is retargeted by providing a machine description (MD) of the target machine, and revising a few machine-dependent routines

• VPO is small, easily extended, and extremely effective

Page 9: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 9

History Lesson

• PO developed in 1981
– Pioneered use of RTLs
– Demonstrated ability to do optimizations on low-level representation
• Development split in 1982
– gcc development
  • Richard Stallman and Len Tower
– VPO development
  • Many people at UVa and a few industrial labs

[Diagram: PO splits into VPO and gcc]

Page 10: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 10

Register Transfer Lists

• Based on Bell and Newell's ISP notation
• Machine-independent representation of a machine-dependent operation
• Algorithms that manipulate RTLs are machine-independent

Page 11: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 11

Register Transfer Lists

• While assembly language notations may vary, RTLs are very similar across architectures

Example

Instruction           Machine
add %o1,%o2,%o2       SPARC
addu $10,$10,$9       MIPS
ar 10,9               IBM

In RTL, each operation would be represented as
r[10] = r[10] + r[9];

Page 12: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 12

RTLs

• The form of RTLs is fixed
  • dst = src ; dst = src ; dst = src …
– The individual register transfers are performed in parallel
– Example
  • r[1] = r[1] + r[2] ; NZ = r[1] + r[2] ? 0
– VPO provides machine-independent primitives for operating on and manipulating RTLs
  • Obtain the sources and destinations
  • Obtain the memory locations read and written
  • Obtain the type of instruction (arithmetic, branch, control transfer, etc.)

Page 13: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 13

RTLs

• Think of RTL as a machine-independent assembly language
– For a machine X, each RTLx describes an instruction in X’s instruction set (may be a synthetic instruction)
– RTLx should specify
  • the instruction’s inputs and outputs
  • the transformation the instruction makes on the machine state
– VPO uses this information to compute a dataflow graph

Page 14: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 14

Compilation with VPO

[Diagram: Source Code → Front and Middle Ends → RTL → VPO (Mach) → Machine Code]

You supply the front end and a simple code generator, we supply an optimizing back end

Page 15: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 15

Generating RTLx

• Translate IL ops to semantically equivalent sequences of instructions for the target machine
– Generate RTL representation of instructions, not assembly language
– Do not worry about code quality
  • Perform naïve, straightforward translation
  • Expose all computations (even effective address computations) to VPO
  • Use virtual or pseudo registers for temporaries
  • VPO handles activation record and data placement

Page 16: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 16

Generating RTLx

The C code
K = I + 1;

[Tree: = <int,32> with children ADDR K <local,32> and + <int,32>; + has children @ <int,32> (applied to ADDR I <local,32>) and CON 1 <int,32>]

IL            SPARC RTL
ADDR int K    r[33]=r[14]+K.;
ADDR int I    r[34]=r[14]+I.;
@ int         r[35]=M[r[34]];        r[34]
CON int 1     r[36]=1;
+ int         r[37]=r[35]+r[36];     r[35]:r[36]
= int         M[r[33]]=r[37];        r[33]:r[37]

Page 17: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 17

VPO design rationale

• All "traditional" optimizations performed at the machine-level on a single representation (RTL)
– most optimizations are machine-dependent
– better code is produced
– instruction selection can be performed on demand
– avoids phase ordering problems
– simplifies implementation of optimizations
– easier to accommodate emerging architectures
– "plug and play" structure

Page 18: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 18

RTLs in VPO

• VPO optimization algorithm
– repeat
    apply code-improving transformation
  until fixed-point reached or exhausted registers
• Maintaining two invariants
– Semantic invariant (S)
  • Observable behavior of program unchanged (according to RTL semantics)
– Machine invariant (M)
  • Every RTL equivalent to one machine instruction

Page 19: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 19

VPO code improvements

• Each code-improving transformation is
– machine-level, but
– machine-independent
• Any semantics-preserving transformation is OK
• Preserve machine invariant (M) using machine description;
– for each new RTL produced, ask MD if OK
– if any is not target machine instruction, roll back transformation

Page 20: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 20

Code improvement catalog

• Register assignment and allocation

• Common subexpression elimination

• Induction variable elimination

• Code motion

• Constant propagation

• Copy propagation

• Memory access coalescing

• Recurrence detection

• Instruction scheduling

• Dead code elimination

• Constant folding

• Loop unrolling

• Branch minimization

• Evaluation order determination

Page 21: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 21

VPO Optimizations

• Common subexpression elimination
  • Davidson, J. W. and Fraser, C. W., ‘Eliminating Redundant Object Code,’ in Conference Record of the Ninth Annual ACM Symposium on Principles of Programming Languages, January 1982, pp. 128–132.
• Evaluation Order Determination
  • Davidson, J. W., ‘A Retargetable Instruction Reorganizer’, in Proceedings of the SIGPLAN ‘86 Symposium on Compiler Construction, 21(7), June 1986, pp. 234–241.

Page 22: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 22

VPO Optimizations

• Link-time optimization
  • Benitez, M. E. and Davidson, J. W., ‘A Portable Global Optimizer and Linker’, in Proceedings of the SIGPLAN ‘88 Symposium on Programming Language Design and Implementation, June 1988, pp. 329–338.
• Memory access coalescing
  • Davidson, J. W. and Jinturkar, S., ‘Memory Access Coalescing: A Technique for Eliminating Redundant Memory Accesses’, in Proceedings of the SIGPLAN ‘94 Symposium on Programming Language Design and Implementation, Orlando, FL, June 1994, pp. 186–195.

Page 23: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 23

VPO Optimizations

• Code Motion
  • Benitez, M. E. and Davidson, J. W., ‘The Advantages of Machine-Dependent Global Optimization’, in Proceedings of the 1994 Conference on Programming Languages and Systems Architectures, Zurich, Switzerland, March 1994, pp. 105–124.
• Loop Unrolling
  • Jinturkar, S. and Davidson, J. W., ‘Improving Instruction-level Parallelism by Loop Unrolling and Dynamic Memory Disambiguation’, in Proceedings of the 28th Annual IEEE/ACM International Symposium on Microarchitecture, Ann Arbor, MI, November 1995, pp. 125–132.

Page 24: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 24

VPO Optimizations

• Branch minimization
  • F. Mueller and D. B. Whalley, ‘Avoiding Conditional Branches by Code Replication’ in Proceedings of the SIGPLAN '95 Conference on Programming Language Design and Implementation, June 1995, pages 56-66.
  • M. Yang, G. Uh, and D. Whalley, ‘Improving Performance by Branch Reordering’ in Proceedings of the SIGPLAN '98 Conference on Programming Language Design and Implementation, June 1998, pages 130-141.

Page 25: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 25

VPO Optimizations

• Recurrence detection and optimization

•Benitez, M. E. and Davidson, J. W., ‘Code Generation for Streaming: an Access/Execute Mechanism’, in Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, April 1991, pp. 132–141.

Page 26: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 26

Building VPO

[Diagram: building VPO. A VPO Generator combines a CSDL specification (SPARC, MIPS, ALPHA, or i486), the ZIFLow Analysis & Transformation Libraries, and optionally a New Transformation to produce a target optimizer (shown: VPO MIPS). Library transformations shown: Register Allocation, Access Coalescing, Comm. Subexpr. Elim., Eval. Order Determ., Induction Var. Elim., Instruction Scheduling, Code Motion, SSA Computation.]

Page 27: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 27

CSDL: Computing System Description Language

• Computing System Description Language
– Modular system of components
– Allows applications to customize a description
– Easily extensible for adding new details
– Reusable/application independent

Page 28: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 28

CSDL

[Diagram: the CSDL Core with its component description languages: Calling Convention (CCL), Memory System Description (MSDL), Pipeline Description (PLUNGE), Instruction Representation (SLED), Instruction Semantics (λ-RTL), and Object-file Format.]

Page 29: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 29

Zephyr Compilers

• EDG SUIF-to-VPO Compiler
– Five targets (SPARC, Pentium, Alpha, MIPS, SimpleScalar)

[Diagram: Source Code → EDG Front End → SUIF Pass 1 ... SUIF Passes → SUIF-to-LIRA → LIRA-to-SPARC → RTL (SPARC) → VPO (SPARC) → Target Machine Code]

Page 30: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 30

Zephyr Compilers

• EDG-to-VPO C++ compiler
– Funded by Edison Design Group
– Targeted to SPARC only
– Compiles all benchmark suites (SPEC, PGI, lcc)
– Code generator (translator from EDG intermediate representation to RTLs) provided as a literate program

Page 31: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 31

Zephyr Compilers

• lcc-to-VPO C compiler
– Targeted to SPARC, X86, MIPS, ALPHA, and SimpleScalar
– Code generators (translators from LIRA to target-machine RTLs) provided as literate programs
– Currently producing good code; some optimizations are not fully implemented/debugged

Page 32: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 32

SPEC results for SPARC

Benchmark   gcc -O   lcc    vpolcc
go          13.4     6.45   11.0
M88ksim     5.70     4.98   6.2
li          8.98     5.93   7.48
Compress    11.6     9.0    9.28
Ijpeg       8.79     5.54   8.6
Perl        12.3     9.2    10.2
Vortex      10.7     8.27   11.2

Page 33: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 33

Acknowledgements

• This work has been funded by:
– Defense Advanced Research Projects Agency
– National Science Foundation
– Panasonic AVC Labs
– Edison Design Group

Page 34: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 34

Afternoon Schedule

Time Talk

1:30-2:00 ASDL: Dan Wang

2:00-2:55 Using Zephyr for PL Research: Kevin Scott
– The VPO Code Generation Interfaces
– LIRA: The lcc intermediate representation
– SUIF-to-LIRA

2:55-3:15 Using Zephyr for Architecture Research: Jason Hiser and Chris Milner
– Introduction
– Handling a target machine’s calling convention

3:15-3:30 Break

Page 35: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 35

Afternoon Schedule

Time Talk

3:30-4:30 Using Zephyr for Architecture Research (continued): Jason Hiser and Chris Milner
– Writing a VPO machine description (md.y)
– Writing a VPO register specification (regs.rt)
– EASE: Environment for Architecture Study and Evaluation
– Case Study: Targeting SimpleScalar

4:30-5:20 Using Zephyr for Optimization Research: Jack Davidson
– Introduction to VPO’s optimization structure
– Adding a new optimization to VPO

Page 36: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 36

Afternoon Schedule

Time Talk

5:20-5:40 Zephyr support tools: Raja Venkateswaran
– VET: Observing and debugging VPO
– VPOISO: Isolating optimization errors

5:40-6:00 Wrap up and Open Discussion

Page 37: NCI Report: Zephyr

Using Zephyr for Programming Language Research

Kevin Scott

University of Virginia

Page 38: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 38

Overview

• Zephyr organization and philosophy
• VPO code generation interfaces
• Adding a new front-end to Zephyr:
– Using the Lira intermediate representation
– With a custom code expander using the VPO code generation interfaces
• Language related issues in retargeting Zephyr

• Q & A

Page 39: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 39

What is Zephyr?

• Set of tools for generating and optimizing RTL programs
– VPO (Very Portable Optimizer)
  • SPARC, Alpha, x86, MIPS, SimpleScalar (PISA)
– Code Expanders
  • Turn a front-end’s IR into RTLs
– Glue for hooking front-ends up to VPO
  • VPO code generation interfaces
  • Lira IR
– Debugging tools
  • VET – interface for controlling and visualizing VPO transformations
  • vpoiso – isolates optimizer bugs

Page 40: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 40

National Compiler Infrastructure

[Diagram: the National Compiler Infrastructure, made up of the SUIF Infrastructure and the Zephyr Infrastructure. Front ends shown: SML/NJ, EDG C++, Ada95, DEC FORTRAN, Java, lcc, IBM C++ VisualAge. Components: SUIF, MachSUIF, SUIF-to-VPO Bridge, VPO. Targets shown: Alpha, Sparc, MIPS, X86. Optimization labels: interprocedural analysis, parallelization and locality opts, object-oriented opts, scheduling, register allocation, instruction selection, code motion, memory access coalescing, induction variable elimination, CSE, loop unrolling, inlining. Legend: Optional Item.]

Page 41: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 41

Why use Zephyr?

• You’re a language researcher
– Easy to hook a front-end up to VPO
– Relatively little effort required to get multiple targets
– VPO is a very good optimizer
  • Wide range of existing operations
  • Leverage work of others contributing new optimizations to VPO
– Lets you concentrate on front-end issues
– Less work than writing a VPO-quality optimizer yourself

Page 42: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 42

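Zephyr Organization

[Diagram: front ends (lcc, EDG, SUIF, VPCC) feed code expanders, which feed VPO. Lira code expanders: SPARC, MIPS, Alpha, x86. EDG code expanders: SPARC. CVM code expanders: SPARC, x86, MIPS. The VPOi and VPOasm interfaces sit between the expanders and VPO.]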

Page 43: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 43

Four Front Ends

• VPCC – A K&R C compiler
– IR is code for a C virtual machine (CVM)
– Deprecated in favor of lcc front-end
• EDG – Edison Design Group C/C++
– Very flexible IR
• lcc – Retargetable C compiler
– Simple backend emits Lira, an IR based on lcc trees
• SUIF 2.1
– High level optimizations and analyses
– suif2lira pass transforms SUIF IR into Lira

Page 44: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 44

Code Expanders

• CVM Code Expanders
– SPARC, x86, MIPS
– Generate encoded RTL files directly; don’t use VPOi or VPOasm
• EDG Code Expanders
– SPARC
– First expander to use VPOi and VPOasm interfaces

Page 45: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 45

Lira Code Expanders

• Targets
– SPARC
– X86
– Alpha
– MIPS32
– MIPS64 and SimpleScalar (PISA)
• Input Lira code specialized for target
• Output encoded RTLs for VPO
• All use the VPOi and VPOasm interfaces

Page 46: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 46

VPOi

• VPOi provides a C interface for:
– Creating RTLs
– Sending RTLs to VPO for optimization
• Abstracts away specifics of:
– RTL representation
– How RTLs are sent to VPO
• RTL creation routines can be semi-automatically generated from a machine specification

Page 47: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 47

VPOasm

• VPOasm provides a C interface for sending assembly language statements to VPO.

• Allows a code expander to:
– Change segments
– Define symbols
– Initialize storage locations
– Specify alignments for code or data

Page 48: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 48

More on VPOi and VPOasm

• Why use these interfaces?
– Simpler than writing out VPO encoded RTL files manually.
– Can get some of the implementation for free if doing a new target architecture.
– Allows us to change RTL and assembly language representations w/o fouling you up. Much.
• Reference manual for VPOi and VPOasm:
– http://www.cs.virginia.edu/zephyr/vpoi

Page 49: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 49

VPOi and VPOasm caveats

• Interfaces are written in C.
– Bad if you’re writing a code expander in languages with no mechanism for calling C functions.
• Interfaces are relatively rigid.
– Suppose you want to communicate something to the optimizer that doesn’t look like an RTL or assembly language.
• Interfaces have only been tested on C/C++ front ends.
– Might have to change to accommodate new language features…

Page 50: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 50

Lira

• Simple IR based on lcc trees
• Targets a stack-oriented virtual machine
• Two types of entities in a Lira file:
– Instructions
– Directives

Page 51: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 51

Lira Instructions

• Instruction is composed of:
– Operator (33)
– Type
  • F (float), I (signed integer), U (unsigned integer), P (pointer), V (void), B (aggregate)
– Size
  • 1, 2, 4, 8, …
– Auxiliary info

Operators:
CALL GE MOD ADD CVF
ARG EQ LSH NEG BCOM
NE ASGN DIV INDIR CNST
LABEL LT SUB BXOR CVU ADDRL
JUMP LE RSH BOR CVP ADDRG
RET GT MUL BAND CVI ADDRF

Page 52: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 52

Lira Instruction Example

• C Fragment

int a;

a = a + 10;

• Lira Translation

ADDRGP4 “a”

INDIRI4

CNSTI4 10

ADDI4

ADDRGP4 “a”

ASGNI4

Page 53: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 53

Lira Directives

• Change program segments with:
– code, data, bss, lit
• Specify alignment with:
– align
• Control symbol visibility with:
– import, export
• Initialize storage locations with:
– bytes, string, address, skip

Page 54: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 54

Lira Directives (cont)

• Indicate procedure boundaries with:
– proc, endproc
• Describe procedure locals and parameters with:
– local, param
• Describe source coordinates with:
– file, line

Page 55: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 55

Lira Directive Example

• Reserving storage for a global int “a”

-bss
-export a
-align 4
+LABELI4 “a”
-skip 4

Page 56: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 56

The truth about Lira

• Lira can be emitted from lcc using a postorder walk of lcc trees. Almost.

• Typical case:

[Tree: ADDI4 with children INDIRI4 (applied to ADDRGP4 “a”) and CNSTI4 10]

Emitted Lira (postorder):

ADDRGP4 “a”
INDIRI4
CNSTI4 10
ADDI4

Page 57: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 57

The truth about Lira (cont)

• Sometimes, we don’t do a postorder traversal:

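[Tree: ASGNI4 with children ADDRGP4 “a” and ADDI4; ADDI4 has children INDIRI4 (applied to ADDRGP4 “a”) and CNSTI4 10]

Emitted Lira (the value is emitted before the destination address):

ADDRGP4 “a”
INDIRI4
CNSTI4 10
ADDI4
ADDRGP4 “a”
ASGNI4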

Page 58: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 58

The truth about Lira (cont)

• A Lira program is specialized to the compilation target.
– Types, sizes and alignments are target specific
– Front-end must generate appropriate target dependent code for accessing the components of aggregates (arrays and structs)

Page 59: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 59

Lira Code Expander

• Structured for simplicity.
• Code is generated by a big switch statement.
• Two passes made over the input.
– First gathers symbol information.
– Second generates code.
• SPARC expander is about 1800 lines of C. Close to ½ of the code is machine independent or easily reused on new targets.

Page 60: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 60

Retargeting Lira code expander

• Three big tasks:
– Modify dumptree to map Lira ops onto RTLs for the new target. Easiest of the three since there is substantial opportunity for cut & paste coding.
– Modify sp_call to emit target dependent RTLs. On the SPARC we emit the following when the caller returns a struct:

    VPOi_rtl(ST(tmp_loc, sp_plus(r[14], SP_OFS-4)),
             VPOi_locSetBuild(tmp_loc, 0));

Page 61: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 61

Retargeting Lira code expander

• Modify setup_frame to:
– Use right offsets for parameters and locals.
– Emit RTLs to do target dependent frame setup on procedure entry. For procedures returning a struct on the SPARC, we emit:

    VPOi_rtl(LD(sp_plus(r[30], SP_OFS-4),tmpreg), 0);
    locaddr = sp_plus_ra(r[30], locals.t[0].sym, 0);
    VPOi_rtl(ST(tmpreg, Rtl_fetch(locaddr, 32)),
             VPOi_locSetBuild(locaddr, tmpreg, 0));

Page 62: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 62

Why use Lira?

• Lira is a pretty good intermediate language for C-like languages. (Thanks to Chris Fraser and Dave Hanson!)
– Abstracts away specifics of a target’s calling sequence! Left to code expander to implement.

• Separating Lira from lcc means that we can reuse the Lira code expanders for front-ends other than lcc. E.g., SUIF.

• Very easy to write a Lira code expander.

Page 63: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 63

Lira References

• “A Retargetable C Compiler: Design and Implementation”

• Lcc version 4.1 code generation interfaces
– http://www.cs.princeton.edu/software/lcc/pkg/doc/4.html

• More on the way…

Page 64: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 64

Adding a front-end to Zephyr

• Is your language C-like?
– If yes then consider writing code to map your IR onto Lira. This gets you all of Lira’s targets almost for free.

– If no then you might need to write a code expander for each target you want to support.

Page 65: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 65

Adding a front-end to Zephyr

• Is my target already supported?
– If yes then you’re golden.
– If no then you may have to do one or more of the following:
  • Create VPOi and VPOasm interfaces for your target. This can be partially automated.
  • Write a Lira code expander for the new target, or
  • Write a custom code expander for the new target.
  • Port VPO to the new target.

Page 66: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 66

Adding a front-end using Lira

• Difficulty depends on your IR.
– Trivial for lcc – almost same IR!
– Pretty easy for SUIF. E.g.

    void Translator::trans(BinaryExpression exp) {
      int lira_op;

      translate(exp->get_source1());
      translate(exp->get_source2());
      switch(op_map(exp->get_opcode())) {
        case SOP_add: lira_op = LIRA_ADD; break;
        ...
      }
      emitter->emit(lira_op, lira_map_ty(exp->get_result_type()));
    }

Page 67: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 67

Where can I find out more?

• Should be releasing suif2lira as a literate program around July 1.
– Good starting point for someone familiar with SUIF wanting to hook up a front-end with Lira.

• Literate source for SPARC and x86 Lira code expanders will be available immediately after PLDI.

Page 68: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 68

Adding a front-end using a custom code expander

• Difficulty again depends on your IR.

• Refer to EDG SPARC code expander:– http://www.cs.virginia.edu/zephyr/dist/edg-sparc-1.0.pdf

Page 69: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 69

Language issues in retargeting Zephyr

• Calling convention
– In addition to emitting RTLs to properly handle language calling conventions on function calls and function entry, also need to consider fixentry in VPO.

– fixentry finalizes a procedure’s prologue after optimization is complete.

– More in next talk.

Page 70: NCI Report: Zephyr

Using Zephyr for Architecture Research

Jason Hiser and Chris Milner

University of Virginia

Page 71: NCI Report: Zephyr

A Brief Introduction to Zephyr and Architectural Research

Jason Hiser

University of Virginia

Page 72: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 72

Roadmap

• Handling a machine’s calling convention
– Jason
• Break
– Coffee!
• Writing a VPO machine description and Writing a VPO register description
– Chris Milner
• Case Study: Targeting SimpleScalar
– Jason

Page 73: NCI Report: Zephyr

Handling a Machine’s Calling Convention

fixentry fun (regs.c)

Jason Hiser

University of Virginia

Page 74: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 74

Introduction To regs.c

• Fixentry: The main routine of regs.c
– Responsibilities of fixentry

• Parameters, external and global data used in fixentry

• Other functions: regarg, initmap, map, transfer, leaf

Page 75: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 75

Responsibilities of Fixentry

• Calculate stack space needed
– outgoing parameters, spill locations, local variables, saved registers, and incoming parameters
• Emit function prologue
– Adjust stack pointer
– save return address, and saved registers
– add RTLs for local equates

Page 76: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 76

Fixentry Responsibilities (continued)

• Create and maintain a “mapping” from the registers used to the actual hardware registers

• Save/restore necessary registers and incoming parameters to stack

• Emit function epilogue (including code to restore saved registers)

Page 77: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 77

Not the responsibility of Fixentry

• Perform any optimization
• Insert spill code
• Make decisions about register usability
• Emit assembly code for any instructions
• Setup registers/stack for making a function call
• Allocate global data

Page 78: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 78

Extern Variables (Where fixentry gets its data)

• struct bblock *top List of basic blocks in current function

• struct locuse *locs local variables and parameters

• int isused[MAXREGS] which registers are used and which aren’t

• int varargs is this a variable argument function?

Page 79: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 79

Parameters to Fixentry

• struct list *ptr the RTLs in the current function

• struct blist *retb the basic blocks that need epilogue code

Page 80: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 80

Global Variables

• int gpregmap[] The “mapping” of the general purpose registers

• int fpregmap[] The “mapping” of the float registers

• int spilloff Information to the code emitter about where to place spill variables

Page 81: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 81

Calculating Stack Space

• Loop through RTLs and find out how much space is needed for outgoing params

• Loop through temps and calculate spill space needed

• Loop through locals and calculate local space needed

Page 82: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 82

Calculating Stack Space (cont.)

• Loop through registers and find out which ones need to be saved

• Determine space needed for incoming parameters (register params only)

Page 83: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 83

Emitting Prologue and Epilogue

• Prologue
– Emit code to adjust stack pointer
– Emit code to spill return address and saved regs
• Epilogue
– For each exit block
  • Restore spilled registers
  • Restore stack pointer
  • Jump to return address

Page 84: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 84

Register Map

• Register allocator determines what variables are in which register
– Fixentry needs to put these variables in the proper register.
• Fixentry attempts to map registers so no movements are necessary, overriding the allocator assignment policy
– If it can’t, register to register moves are necessary

Page 85: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 85

Other Functions of regs.c

• regarg Boolean function returns true if a local variable is an argument, and enters the function in a register

• initmap Initializes the gpregmap and fpregmap

• map Returns the mapping for a register

Page 86: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 86

Other Functions of regs.c (continued)

• transfer Creates a transfer RTL from two machine locations (memory, register, or spill)

• leaf Boolean function determines if a function is a leaf

Page 87: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 87

Summary

• Fixentry is the main portion of regs.c

• Fixentry is responsible for
– function prologue
– function epilogue
– register mapping to avoid register to register moves

• Regs.c also contains a few functions to let other areas know about the mapping.

Page 88: NCI Report: Zephyr

Using Zephyr for Architecture Research

(continued)

Jason Hiser and Chris Milner

University of Virginia

Page 89: NCI Report: Zephyr

Writing a VPO Machine Specification

Chris Milner

University of Virginia

Page 90: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 90

Outline of talk

• Structure of VPO
• Machine descriptions
• How to construct the descriptions
• Getting machine dependent information for machine independent transformations
– combiner
– loop (and other) transformations
– scheduler

• EASE

Page 91: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 91

Structure of VPO

[Diagram: structure of VPO. Machine independent source: C code for CSE, strength reduction, dead code elimination, and other transformations. Machine dependent source: C code in simp.c, rtl.c, and sched.c. Descriptions and their processors: register description (reg.rt) → Register Processor (regtool) → C code; instruction description (md.y) → Instruction Processor (yyfast) → C code; pipeline description (pipe.pg) → Pipeline Processor (real soon now) → C code. All C code goes through the C Compiler to form the VPO optimizer, in which machine independent routines such as combiner() and loop_strength() call machine dependent routines such as inst_is_legal() and is_basic().]

Page 92: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 92

VPO

• “Machine independent” transformations on low level “machine dependent” intermediate form (register transfer lists)

• Retargeted portion assists in:
– recognizing legal RTLs
– converting and inserting RTLs to assist transformations
– picking apart RTLs to get information

Page 93: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 93

Role of Machine Descriptions

• md.y - legal instructions
– maintains VPO invariant
– YACC grammars
• regs.rt - register file
– register types
– alignment
– size
– ABI

Page 94: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 94

md.y

• RTL recognizer
– Workhorse
– RTLs come from combiner (at compile time)
– ours are not usual table driven ones but directly executable (yyfast)
• How do you do it?
– Work from existing ones (derive Alpha from MIPS); or,
– construct one anew

Page 95: NCI Report: Zephyr

6/16/2000 PLDI NCI Tutorial 95

Sample machine

• Subset SIMPLESCALAR
– e.g. student project on FPGA
– load/store
– chars, half words and words
– constants must be loaded into registers
– add, and, not, sll, sra, srl
– branch on less than, branch on equal, jump, call, return

Constructing md.y (continued)

• Operands - registers
    %token REG0 REG1 REG2
    (scanner converts 'b' '[' '1' ']' to REG0)

    reg: REG0
       | REG1
       | REG2

Constructing md.y (continued)

• Operands - memory
    %token BMEM WMEM RMEM
    (scanner converts 'B' '[' to BMEM)

    mem: BMEM reg ']'
       | WMEM reg ']'
       | RMEM reg ']'

Constructing md.y (continued)

• Operands - misc
    %token PC RT ST    (used for call and return)
    %token LOCAL GLOBAL CON LBL

    expr: LOCAL
        | GLOBAL
        | CON
        | LBL

Constructing md.y (continued)

• Operations
    %left '=' '+' '&' '"' '{' '}'
    %nonassoc '~' ','

    rhs : reg '+' reg
        | reg '&' reg
        | reg '{' reg
        | reg '}' reg
        | reg '"' reg

Constructing md.y (continued)

• Binary operations
    binops: reg '=' rhs
• Unary operation
    not: reg '=' '~' rhs

Constructing md.y (continued)

• Load, load immediate and store
    l : reg '=' mem
    li: reg '=' expr
    s : mem '=' reg
    si: expr '=' reg    (FORTRAN)

Constructing md.y (continued)

• Branch
    bb: PC '=' reg ':' reg
      | PC '=' reg '<' reg
• Jump, call and return
    jmp: PC '=' reg
    jal: ST '=' expr
    ret: PC '=' RT

Constructing md.y (continued)

• All instructions
    inst: bb | jmp | jal | ret
        | binops | not
        | l | li | s
• Now, we need some glue and some checking
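To make the idea of a directly executable recognizer concrete, here is a minimal hand-written sketch in C for a subset of the grammar above (binops, l, li and s only). It is not yyfast output: the token names, the single T_OP and T_EXPR tokens and the function is_legal_rtl() are all assumptions made for illustration.

```c
#include <assert.h>

/* Hypothetical token kinds; a real scanner would distinguish REG0/REG1/REG2,
   BMEM/WMEM/RMEM, the individual operators, and LOCAL/GLOBAL/CON/LBL. */
enum tok { T_REG, T_MEM, T_RBRACKET, T_EXPR, T_ASSIGN, T_OP, T_END };

/* Each helper advances *p only when it matches. */
static int expect(const enum tok **p, enum tok t) {
    if (**p == t) { ++*p; return 1; }
    return 0;
}

static int reg(const enum tok **p) { return expect(p, T_REG); }

/* mem: xMEM reg ']' */
static int mem(const enum tok **p) {
    const enum tok *q = *p;
    if (expect(&q, T_MEM) && reg(&q) && expect(&q, T_RBRACKET)) {
        *p = q;
        return 1;
    }
    return 0;
}

/* Returns 1 if the T_END-terminated token string is one of:
     s      : mem '=' reg
     l      : reg '=' mem
     li     : reg '=' expr
     binops : reg '=' reg op reg                                  */
static int is_legal_rtl(const enum tok *t) {
    const enum tok *p = t;
    if (mem(&p))
        return expect(&p, T_ASSIGN) && reg(&p) && *p == T_END;
    if (!reg(&p) || !expect(&p, T_ASSIGN))
        return 0;
    if (mem(&p))
        return *p == T_END;
    if (expect(&p, T_EXPR))
        return *p == T_END;
    return reg(&p) && expect(&p, T_OP) && reg(&p) && *p == T_END;
}
```

With this sketch, a register add, a load, a load immediate and a store are accepted, while an add with a memory or constant operand is rejected, which matches the sample machine's load/store rules.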

Glue for parser

• Build up semantic records
• Found in isem.c
  – addr() - record for addressing mode
        reg: REG0 {$$=addr(BYTE,BREGISTER…)}
  – memref() - record for memory access
  – brecord() - record for binary op
  – rrecord() - record for relational op
  – same() - ensure records are same

Semantic routines

• inst.c
  – each instruction or instruction class has a routine
  – routine checks for legal operands
  – is responsible for emitting legal asm
  – e.g. bb()
    • on MIPS, check the semantics for compare and branch
    • if the right hand operand is an immediate, use the immediate form of the instruction
    • records instruction type

Structure of VPO (again)

[Figure: the VPO structure diagram shown again. Machine independent source (C code for CSE, strength reduction, dead code elimination, ...), machine dependent source (simp.c, rtl.c, sched.c), and the descriptions md.y (yyfast), reg.rt (regtool) and pipe.pg (pipeline processor, “real soon now”) all become C code that a C compiler builds into the VPO optimizer.]

regs.rt

• TYPES
  – basic types of registers on the machine
  – byte, half, word, float, double
  – BTREG, WTREG, RTREG, FTREG, DTREG
• CODES
  – condition codes
  – IC, FC, etc.

regs.rt (continued)

• CLASS
  – general_purpose, float, spill
  – number
  – scratch
  – reserve

regs.rt (continued)

• CLASS (continued)
  – type
    • alignment (even-odd register pairs)
    • size - how many to allocate
    • invariant - mark as invariant for loops
      – e.g. fp and sp
    • memchar, regchar - give it a different name
    • stack, fifo - tells the allocator about them

regs.rt for MIPS

    types BTREG, WTREG, RTREG, FTREG, DTREG
    codes FC
    class = general_purpose
      number = 32
      scratch = 2..15, 24, 25
      reserve = 0, 1, 26, 27, 28, 29, 31

(notes: on MIPS, reg 0 is zero, reg 1 is the asm reg, regs 26 and 27 are used by the OS, reg 28 is gp, reg 29 is sp, reg 31 is the return address)

regs.rt for MIPS (continued)

      type = RTREG
        alignment = 1
        size = 1
        invariant = 28, 29
      endtype
      type = BTREG, WTREG
        alignment = 1
        size = 1
      endtype
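As a rough illustration of what the scratch and reserve lists of the general_purpose class convey, the following C sketch encodes them as bit masks. The function names are hypothetical and this is not the output of regtool; only the register numbers come from the description above.

```c
#include <assert.h>
#include <stdint.h>

/* One bit per register number (0..31). */
static uint32_t bit(unsigned r) { return UINT32_C(1) << r; }

/* Bits lo..hi inclusive; assumes lo <= hi < 32. */
static uint32_t range_mask(unsigned lo, unsigned hi) {
    uint32_t m = 0;
    for (unsigned r = lo; r <= hi; ++r)
        m |= bit(r);
    return m;
}

/* scratch = 2..15, 24, 25 */
static uint32_t scratch_mask(void) {
    return range_mask(2, 15) | bit(24) | bit(25);
}

/* reserve = 0, 1, 26, 27, 28, 29, 31 */
static uint32_t reserve_mask(void) {
    return bit(0) | bit(1) | bit(26) | bit(27) | bit(28) | bit(29) | bit(31);
}

static int is_scratch(unsigned r)  { return (int)((scratch_mask() >> r) & 1u); }
static int is_reserved(unsigned r) { return (int)((reserve_mask() >> r) & 1u); }
```

An allocator built on such masks would never hand out a reserved register; how VPO treats the registers in neither list is not stated in the slides.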

regs.rt for MIPS (continued)

    class = floating_point
      number = 16
      scratch = 0..9
      type = FTREG, DTREG
        alignment = 1
        size = 1
      endtype
    endclass

regs.rt for MIPS (continued)

    class = SPILL
      number = 32
      type = BTREG, WTREG, RTREG, FTREG
        alignment = 1
        size = 1
      endtype
      type = DTREG
        alignment = 2
        size = 2
      endtype
    endclass

Structure of VPO (again)

[Figure: the VPO structure diagram shown again. Machine independent source (C code for CSE, strength reduction, dead code elimination, ...), machine dependent source (simp.c, rtl.c, sched.c), and the descriptions md.y (yyfast), reg.rt (regtool) and pipe.pg (pipeline processor, “real soon now”) all become C code that a C compiler builds into the VPO optimizer.]

Other files

• simp.c - helps the combiner
• sched.c - machine specific portion of scheduling
• rtl.c - routines to find machine idioms in transformations

simp.c

• Combine RTLs in machine dependent way
• e.g. SPARC
    1      r[35]=~r[35]
    2 {1}  r[33]=r[33]&r[35]
  combines to
    r[33]=r[33]&~r[35]
  semantically ok, but not an instruction
  comp() makes machine idiom substitution
    r[33]=r[33] ANDNOT r[35]

simp.c (continued)

• e.g. SPARC constants: 4095 is the biggest immediate
    1      r[40]=4095
    2 {1}  r[41]=r[40]+13
  combines and folds to
    r[41]=4108
  comp() converts to
    r[41]=HI[4108]
    r[41]=r[41]|LO[4108]
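The arithmetic behind the HI/LO split can be checked with a short C sketch. It assumes that HI keeps every bit above the low 10 and LO keeps the low 10 bits, which is the split used by SPARC's sethi/or pair; the function names are hypothetical and do not come from simp.c.

```c
#include <assert.h>
#include <stdint.h>

/* Biggest immediate, as stated on the slide. */
enum { MAX_IMMEDIATE = 4095 };

/* Assumed split: HI = all bits above the low 10, LO = the low 10 bits. */
static uint32_t hi_part(uint32_t c) { return c & ~UINT32_C(0x3ff); }
static uint32_t lo_part(uint32_t c) { return c & UINT32_C(0x3ff); }

static int fits_immediate(uint32_t c) { return c <= MAX_IMMEDIATE; }

/* Models the value left in the register after loading constant c. */
static uint32_t load_constant(uint32_t c) {
    uint32_t r;
    if (fits_immediate(c))
        return c;            /* one RTL:  r = c          */
    r = hi_part(c);          /* first:    r = HI[c]      */
    r = r | lo_part(c);      /* second:   r = r | LO[c]  */
    return r;
}
```

For 4108 the two parts are 4096 and 12, and OR-ing them gives back 4108, so the two-RTL sequence leaves the same value in the register as the folded constant.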

rtl.c

• Manipulate
  – reverse() - reverse a branch
  – don’t_bother_with() - tell cse to ignore
• Predicates
  – is_call(), is_rjmp(), ismem(), writes_mem()
  – is_pc(),
• Pick apart
  – findlabel(), usetype()

rtl.c (continued)

• Insert code to help transformations
  – store(), load()
  – multconst()
    • add series of shifts and adds
  – locsub() - substitute reg for mem
    • SPARC has sign extend on load
    • no single sign extend move
    • have to insert shifts to do sign extend
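The "series of shifts and adds" idea behind multconst() can be sketched in C as follows. This is only an illustration of the technique, with a hypothetical function name; it says nothing about how VPO's multconst() actually chooses its sequence.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply x by the constant c using only shifts and adds:
   one shifted copy of x is added for every set bit of c. */
static uint32_t mult_by_const(uint32_t x, uint32_t c) {
    uint32_t result = 0;
    unsigned bit;
    for (bit = 0; c != 0; ++bit, c >>= 1) {
        if (c & 1u)
            result += x << bit;
    }
    return result;
}
```

For example, multiplying by 10 (binary 1010) becomes (x << 1) + (x << 3). A production version would also weigh the cost of the sequence against a real multiply instruction.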

rtl.c (continued)

    r[1] = 0
    r[9] = r[14] + a
  L32:
    r[8] = r[1]*4
    R[r[8]+r[9]] = 0
    r[1] = r[1]+1
    IC = r[1]?100
    PC = IC<0,L32

[Callouts on the code: regular induction variable, induced expression, basic induction variable]

• Assist loop strength reduction
• might be one instruction or several
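The effect of loop strength reduction on a loop like the one above can be sketched in C. The sketch assumes the RTLs clear a 100-element array of 4-byte words and that r[1] is the basic induction variable; both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { N = 100 };

/* Before: the byte offset is recomputed from i with a multiply
   on every iteration (r[8] = r[1]*4; R[r[8]+r[9]] = 0). */
static void clear_before(int32_t *a) {
    int i;
    for (i = 0; i < N; ++i) {
        size_t offset = (size_t)i * 4;
        *(int32_t *)((char *)a + offset) = 0;
    }
}

/* After: the multiply is replaced by an address that is bumped
   by 4 bytes each iteration, and the loop test uses the address. */
static void clear_after(int32_t *a) {
    char *p = (char *)a;
    char *end = p + (size_t)N * 4;
    for (; p < end; p += 4)
        *(int32_t *)p = 0;
}
```

Both functions store the same zeros; the second simply avoids the per-iteration multiply, which is why the optimizer needs rtl.c's help to recognize the induction variable and the expressions derived from it.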

sched.c

• SPARC - yes, MIPS - no
• Scheduler uses mostly machine independent list scheduling algorithm
• keeps machine specific dependencies straight
• helps avoid hazards

sched.c (continued)

• md_sets_uses
  – what an instruction does
  – what an instruction is blocked by
  – reads can slide past reads, not past writes
        rtl->does |= READS
        rtl->blocks |= WRITES
  – writes cannot slide past anything
        rtl->does |= WRITES
        rtl->blocks |= WRITES | READS

sched.c (continued)

• md_sets_uses
  – condition code users can’t slide past one another
        rtl->does |= ICWRITES
        rtl->blocks |= ICWRITES | ICREADS
    and
        rtl->does |= ICREADS
        rtl->blocks |= ICWRITES | ICREADS
  – calls are treated conservatively
    • assume codes, floats and memory written
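One way to read the does/blocks masks is that an instruction may slide past another only if nothing the other instruction does appears in the first one's blocks mask. The C sketch below follows that reading; the struct, the constructor functions and can_slide_past() are assumptions for illustration, not the scheduler's real interface.

```c
#include <assert.h>

/* Flag names follow the slides; the bit values are arbitrary. */
enum {
    READS    = 1 << 0,
    WRITES   = 1 << 1,
    ICREADS  = 1 << 2,
    ICWRITES = 1 << 3
};

/* Hypothetical stand-in for the two fields set by md_sets_uses. */
struct rtl_info {
    unsigned does;    /* what the instruction does          */
    unsigned blocks;  /* what the instruction is blocked by */
};

static struct rtl_info mem_read(void)  { struct rtl_info r = { READS, WRITES }; return r; }
static struct rtl_info mem_write(void) { struct rtl_info r = { WRITES, WRITES | READS }; return r; }
static struct rtl_info cc_write(void)  { struct rtl_info r = { ICWRITES, ICWRITES | ICREADS }; return r; }
static struct rtl_info cc_read(void)   { struct rtl_info r = { ICREADS, ICWRITES | ICREADS }; return r; }

/* Assumed rule: mover may pass other only if other does nothing
   that mover is blocked by. */
static int can_slide_past(const struct rtl_info *mover, const struct rtl_info *other) {
    return (mover->blocks & other->does) == 0;
}
```

Under this rule two memory reads can be reordered, a read and a write cannot, and condition code users stay in order, which matches the bullets above.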

sched.c (continued)

• sched_adv()
  – relative advantage or disadvantage of scheduling this instruction next
  – relative to last instruction scheduled
  – e.g. SPARC
    • space out float instructions
    • avoid consecutive stores
    • make consecutive instructions independent

EASE

• EASE: Environment for Architecture Study and Experimentation
  – VPO includes a facility for obtaining
    • measurements of instruction usage
    • instruction cache traces
    • data cache traces
    • precise timing
  – VPO provides facilities for emulating architectures
    • can extend existing architectures

EASE (continued)

• Use control-flow graph to insert instrumentation code
• Low overhead (10 to 15%)
• Cache traces generated on the fly (no need to store)

[Figure: basic blocks, each with a “Bump Counter” step inserted]
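The bump-counter instrumentation can be sketched in C as one counter per basic block. The block numbering, the per-block instruction counts and the toy function below are all invented for illustration; EASE inserts the equivalent code automatically from the control-flow graph.

```c
#include <assert.h>
#include <stddef.h>

enum { NUM_BLOCKS = 3 };

/* One execution counter per basic block. */
static unsigned long block_count[NUM_BLOCKS];

/* Hypothetical static instruction counts for each block. */
static const unsigned insts_in_block[NUM_BLOCKS] = { 2, 5, 1 };

#define BUMP(b) (block_count[(b)]++)

/* Toy instrumented function: block 0 is the entry,
   block 1 the loop body, block 2 the exit. */
static int sum_to(int n) {
    int s = 0, i;
    BUMP(0);
    for (i = 1; i <= n; ++i) {
        BUMP(1);
        s += i;
    }
    BUMP(2);
    return s;
}

/* Dynamic instruction count = sum of executions * block size. */
static unsigned long dynamic_inst_count(void) {
    unsigned long total = 0;
    size_t b;
    for (b = 0; b < NUM_BLOCKS; ++b)
        total += block_count[b] * insts_in_block[b];
    return total;
}
```

Because only one increment is added per block, the overhead stays small, and instruction usage can be derived afterwards from the counters and the static contents of each block.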

EASE (continued)

• Emulation of new architecture features
  – Add new instructions to machine description
  – Generate code and optimize as if new features exist
  – In last step of VPO, emit code to emulate new features

    r[3] = r[3] + r[2]             --(VPO Mach, last step)-->   add r2, r3, r3
    r[5] = r[5] + (r[3] * r[2])    --(VPO Mach, last step)-->   mul r3, r2, r1
                                                                add r1, r5, r5
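The last-step expansion of an emulated instruction can be sketched in C as a small emitter. The function name, its parameters and the choice of a temporary register are hypothetical; only the operand order (sources first, destination last) is taken from the example above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Emits assembly that emulates the hypothetical multiply-add RTL
   r[d] = r[d] + (r[a] * r[b]) on a machine that only has mul and add.
   tmp is a register the emitter may clobber. Returns snprintf's result. */
static int emit_muladd(char *out, size_t size, int d, int a, int b, int tmp) {
    return snprintf(out, size,
                    "mul r%d, r%d, r%d\n"
                    "add r%d, r%d, r%d\n",
                    a, b, tmp,    /* tmp = r[a] * r[b] */
                    tmp, d, d);   /* r[d] = tmp + r[d] */
}
```

Calling emit_muladd(buf, sizeof buf, 5, 3, 2, 1) produces the mul/add pair shown for the second RTL, so the optimizer can treat multiply-add as one instruction while the emitted code still runs on the existing machine.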

Case Study: Targeting SimpleScalar

Jason Hiser
University of Virginia

Introduction

• What is SimpleScalar? Why use it?
• Why use VPO with SimpleScalar?
  – SimpleScalar comes with gcc, why not use that?
• Experiences in porting VPO to SimpleScalar
• Research with SimpleScalar and VPO

What is SimpleScalar?

• SimpleScalar is a functional simulator designed for use in architectural research
  – sim-safe -- a simple, fast simulator
  – sim-bpred -- measures branch predictor statistics
  – sim-cache -- measures cache statistics
  – sim-outorder -- models a multi-issue, out of order superscalar processor

Why Use SimpleScalar?

• Easy to model many common architectural features.
  – hybrid branch predictors, arbitrarily many functional units, much more
• Extendible instruction set -- PISA
  – Allows any instruction to be “annotated”
    • easy to create new instructions or add fields to old ones
• Comes with GNU tools for SimpleScalar
  – gcc, gas, gld, glibc, etc.

Why VPO and SimpleScalar? (Why not use gcc?)

• gcc does not generate instruction annotations

• difficult to write new optimizations to take advantage of new instructions

• just building gcc can be a challenge

Why VPO and SimpleScalar? (continued)

• Easily build VPO on any machine where you can build SimpleScalar
• Describe new instructions in the machine description and the optimizer will automatically use them when beneficial
• New optimizations can consult the machine description to see if architectural support is available
  – allows portability of optimizations

Experiences with Porting VPO to SimpleScalar

• PISA is basically MIPS
  – changes to some instruction formats
  – dmfc1 appears to be broken, negu not available, branch if (not) equal to zero instructions don’t exist
• Change instruction format in inst.c
• When compiling for SimpleScalar, tell the machine description that negu, beqz, bneqz and dmfc1 are not available

Research with SimpleScalar and VPO at UVa

• Idea
  – Compiler managed on-chip memory can provide performance and power benefits
• Framework
  – Add instructions to move data to/from on-chip memory from/to registers
    • to VPO (in md.y, inst.c)
    • to SimpleScalar (machine.def)
  – Add optimization to promote variables from cache to on-chip memory

Summary

• SimpleScalar is a versatile functional simulator

• Porting VPO isn’t difficult
  – SimpleScalar target soon to be included with VPO

• VPO and SimpleScalar make a great vehicle for architectural research

Using Zephyr for Optimization Research

Jack Davidson
University of Virginia

VPO Logical Structure

[Figure: VPO logical structure. Labels in the diagram: CSDL SPARC Specification, CSDL MIPS Specification, CSDL ALPHA Specification, CSDL i486 Specification; VPO Generator; VPO MIPS; ZIFLow Analysis & Transformation Libraries (Register Allocation, Access Coalescing, Comm. Subexpr. Elim., Eval. Order Determ., Induction Var. Elim., Instruction Scheduling, Code Motion, SSA Computation); New Transformation.]

Actual Structure

[Figure: tree with root VPO and children lib, SPARC, MIPS, X86, ALPHA]

VPO Program Representation

[Figure: VPO program representation. TOP leads to a chain of BASIC BLOCK nodes (..., i, ...); each basic block has a LIST of RTL structs. Fields shown for a list entry: RTL, COST, INST TYPE, USES, SETS, DEF/USE. Fields shown for a basic block: PREDS, IDOM, SDOM, NEST LVL, USES, DEFS, OUTS, PHI, REGSTATE.]


VPO Optimizations

• Review vpo.h

VPO Optimization Algorithm

    repeat
        apply code-improving transformation
    until fixed-point reached or exhausted registers

• Maintaining two invariants
  – Semantic invariant (S)
    • Observable behavior of program unchanged (according to RTL semantics)
  – Machine invariant (M)
    • Every RTL equivalent to one machine instruction

VPO code optimization

• Each code-improving transformation is
  – machine-level, but
  – machine-independent
• Any semantics-preserving transformation is OK
• Preserve machine invariant (M) using machine description:
  – for each new RTL produced, ask MD if OK
  – if any is not a target machine instruction, roll back transformation
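The check-and-roll-back discipline can be sketched in C as follows. Everything here is a simplification made for illustration: RTLs are plain strings, the whole code list is copied and rechecked (the slide only requires checking newly produced RTLs), and the toy machine description and transformations are hypothetical.

```c
#include <assert.h>
#include <string.h>

enum { MAX_RTLS = 8, MAX_LEN = 32 };

/* Toy program representation: a short list of RTL strings. */
struct code {
    char rtl[MAX_RTLS][MAX_LEN];
    int n;
};

typedef int  (*legal_fn)(const char *rtl);     /* "ask MD if OK"   */
typedef void (*transform_fn)(struct code *c);  /* a transformation */

/* Applies t; keeps the result only if every RTL is a legal
   instruction, otherwise restores the saved code. Returns 1 if kept. */
static int try_transformation(struct code *c, transform_fn t, legal_fn is_legal) {
    struct code saved = *c;
    int i;
    t(c);
    for (i = 0; i < c->n; ++i) {
        if (!is_legal(c->rtl[i])) {
            *c = saved;   /* roll back: machine invariant would break */
            return 0;
        }
    }
    return 1;
}

/* Toy machine description: an and-not written as "&~" is not an instruction. */
static int toy_is_legal(const char *rtl) { return strstr(rtl, "&~") == NULL; }

/* Toy transformations that merge the list into a single RTL. */
static void combine_to_illegal(struct code *c) {
    strcpy(c->rtl[0], "r[33]=r[33]&~r[35]");
    c->n = 1;
}
static void combine_to_legal(struct code *c) {
    strcpy(c->rtl[0], "r[3]=r[1]+r[2]");
    c->n = 1;
}
```

The point of the design is that the transformation itself knows nothing about the target: legality is decided entirely by the machine description query, so the same transformation code can be reused for every machine.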


VPO Optimization Driver

• Review vpo.c

Adding a new optimization

• Determine where in optimize to insert the function
  – What analyses does the optimization need?
    • Control-flow optimizations usually come first as they need very little data-flow information
    • Data-flow optimizations follow: code motion, induction-variable elimination, common subexpression elimination
  – Does the optimization operate on a single basic block or does it operate across basic blocks?


Adding a new optimization

• Browse controlflow.c/fix_control_flow()

• Browse cdmotion.c/code_motion()

Semantic Safe Points

• A semantic safe point is a point in the optimization process where the code satisfies the M and S invariants
  – Code can be emitted at any semantic safe point and it should run correctly
  – Can insert a new optimization at any semantic safe point

Debugging the compiler

[Figure: compilation pipeline. Source Code, Front and Middle Ends, RTL, VPO Mach, Machine Code; within VPO the transformations are applied in sequence: Trans 1, Trans 2, Trans 3, Trans 4, ..., Trans n.]

VET - VPO Examination Tool

• Allows transformations to be observed
  – Observe data structure (control-flow graph)
  – Set a break point at a transformation
  – Set a break point at a phase
  – Replay a transformation

VET and VPOISO

Raja Venkateswaran
UVA

VET

• VET -> VPO Examination Tool
• GUI for viewing optimizations
• By Phase and By transformation
• Ability to revert to previous phases
• Wide range of user options


VPOISO

• Tool for isolating optimizer bugs

• Uses binary search to find the first transformation error

• Works by comparing against the correct output
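The binary search can be sketched in C as below. It assumes an oracle that reports whether the program's output is still correct when only the first k transformations are applied, and that once the output goes wrong it stays wrong for larger k. The oracle type, the function names and the toy oracle are hypothetical; they are not VPOISO's interface.

```c
#include <assert.h>

/* Hypothetical oracle: nonzero if the output matches the correct
   output when only the first k transformations are applied. */
typedef int (*oracle_fn)(int k, void *ctx);

/* Preconditions: ok(0) holds, ok(n) fails, and failure is monotone in k.
   Returns the 1-based index of the first transformation that breaks
   the output. */
static int first_bad_transformation(int n, oracle_fn ok, void *ctx) {
    int lo = 0;   /* invariant: ok(lo) holds  */
    int hi = n;   /* invariant: ok(hi) fails  */
    while (hi - lo > 1) {
        int mid = lo + (hi - lo) / 2;
        if (ok(mid, ctx))
            lo = mid;
        else
            hi = mid;
    }
    return hi;
}

/* Toy oracle for experimentation: ctx points at the index of the
   faulty transformation, so output is correct while k stays below it. */
static int toy_ok(int k, void *ctx) {
    return k < *(int *)ctx;
}
```

With n transformations this needs only about log2(n) compile-and-compare runs, which is what makes isolating a single bad transformation among thousands practical.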