Recap Roman Manevich Mooly Sagiv. Outline Subjects Studied Questions & Answers.

40
Recap Roman Manevich Mooly Sagiv

Transcript of Recap Roman Manevich Mooly Sagiv. Outline Subjects Studied Questions & Answers.

Recap

Roman Manevich

Mooly Sagiv

Outline

• Subjects Studied

• Questions & Answers

• input

– program text (file)

• output

– sequence of tokens

• Read input file

• Identify language keywords and standard identifiers

• Handle include files and macros

• Count line numbers

• Remove whitespaces

• Report illegal symbols

• [Produce symbol table]

Lexical Analysis (Scanning)

Summary

• For most programming languages lexical analyzers can be easily constructed automatically

• Exceptions:– Fortran– PL/1

• Lex/Flex/Jlex are useful beyond compilers

• input– Sequence of tokens

• output– Abstract Syntax Tree

• Report syntax errors• unbalanced parenthesizes

• [Create “symbol-table” ]• [Create pretty-printed version of the program]• In some cases the tree need not be generated

(one-pass compilers)

Syntax Analysis (Parsing)

Pushdown Automaton

control

parser-table

input

stack

$

$u t w

V

Efficient Parsers • Pushdown automata

• Deterministic

• Report an error as soon as the input is not a prefix of a valid program

• Not usable for all context free grammars

cup

context free grammar

tokens parser

“Ambiguity errors”

parse tree

Kinds of Parsers

• Top-Down (Predictive Parsing) LL– Construct parse tree in a top-down matter– Find the leftmost derivation– For every non-terminal and token predict the next production– Preorder tree traversal

• Bottom-Up LR– Construct parse tree in a bottom-up manner– Find the rightmost derivation in a reverse order– For every potential right hand side and token decide when a production

is found – Postorder tree traversal

Top-Down Parsing

1

t1 t2

input

5

4

3

2

Bottom-Up Parsing

t1 t2 t4 t5 t6 t7 t8

input

1 2

3

Example Grammar for Predictive LL Top-Down Parsing

expression digit | ‘(‘ expression operator expression ‘)’

operator ‘+’ | ‘*’

digit ‘0’ | ‘1’ | ‘2’ | ‘3’ | ‘4’ | ‘5’ | ‘6’ | ‘7’ | ‘8’ | ‘9’

Example Grammar for Predictive LL Top-Down Parsing

expression digit | ‘(‘ expression operator expression ‘)’

operator ‘+’ | ‘*’

digit ‘0’ | ‘1’ | ‘2’ | ‘3’ | ‘4’ | ‘5’ | ‘6’ | ‘7’ | ‘8’ | ‘9’

static int Parse_Expression(Expression **expr_p) {

Expression *expr = *expr_p = new_expression() ;

/* try to parse a digit */

if (Token.class == DIGIT) {

expr->type=‘D’; expr->value=Token.repr –’0’; get_next_token();

return 1; }

/* try parse parenthesized expression */

if (Token.class == ‘(‘) {

expr->type=‘P’; get_next_token();

if (!Parse_Expression(&expr->left)) Error(“missing expression”);

if (!Parse_Operator(&expr->oper)) Error(“missing operator”);

if (Token.class != ‘)’) Error(“missing )”);

get_next_token();

return 1; }

return 0;

}

Parsing Expressions

• Try every alternative production– For P A1 A2 … An | B1 B2 … Bm

– If A1 succeeds• Call A2

• If A2 succeeds – Call A3

• If A2 fails report an error

– Otherwise try B1

• Recursive descent parsing• Can be applied for certain grammars• Generalization: LL1 parsing

int P(...) {

/* try parse the alternative P A1 A2 ... An */

if (A1(...)) {

if (!A2()) Error(“Missing A2”);

if (!A3()) Error(“Missing A3”);

..

if (!An()) Error(Missing An”);

return 1;

}/* try parse the alternative P B1 B2 ... Bm */ if (B1(...)) { if (!B2()) Error(“Missing B2”); if (!B3()) Error(“Missing B3”); .. if (!Bm()) Error(Missing Bm”); return 1; } return 0;

Predictive Parser for Arithmetic Expressions

• Grammar

• C-code?

1 E E + T2 E T3 T T * F4 T F5 F id6 F (E)

Bottom-Up Syntax Analysis

• Input– A context free grammar– A stream of tokens

• Output– A syntax tree or error

• Method– Construct parse tree in a bottom-up manner– Find the rightmost derivation in (reversed order)– For every potential right hand side and token decide when a

production is found– Report an error as soon as the input is not a prefix of valid

program

Constructing LR(0) parsing table

• Add a production S’ S$• Construct a finite automaton accepting “valid

stack symbols”• States are set of items A

– The states of the automaton becomes the states of parsing-table

– Determine shift operations– Determine goto operations– Determine reduce operations– Report an error when conflicts arise

1: S E$

4: E T6: E E + T

10: T i12: T (E)

5: E T T

11: T i i

2: S E $7: E E + TE

13: T ( E)4: E T6: E E + T10: T i12: T (E)

(

(

15: T (E) )

14: T (E )7: E E + T

E

7: E E + T10: T i12: T (E)

+

+

8: E E + T

T

2: S E $ $

i

i

1: S E$

4: E T6: E E + T

10: T i12: T (E)

5: E T T

11: T i i

2: S E $7: E E + TE

13: T ( E)4: E T6: E E + T10: T i12: T (E)

(

(

15: T (E) )

14: T (E )7: E E + T

E

7: E E + T10: T i12: T (E)

+

+

8: E E + T

T

2: S E $ $

i

i

Parsing “(i)$”

Summary (Bottom-Up)• LR is a powerful technique• Generates efficient parsers• Generation tools exit LALR(1)

– Bison, yacc, CUP

• But some grammars need to be tuned– Shift/Reduce conflicts– Reduce/Reduce conflicts– Efficiency of the generated parser

Summary (Parsing)

• Context free grammars provide a natural way to define the syntax of programming languages

• Ambiguity may be resolved• Predictive parsing is natural

– Good error messages

– Natural error recovery

– But not expressive enough

• But LR bottom-up parsing is more expressible

Abstract Syntax• Intermediate program representation

• Defines a tree - Preserves program hierarchy

• Generated by the parser

• Declared using an (ambiguous) context free grammar (relatively flat)– Not meant for parsing

• Keywords and punctuation symbols are not stored (Not relevant once the tree exists)

• Big programs can be also handled (possibly via virtual memory)

Semantic Analysis

• Requirements related to the “context” in which a construct occurs

• Examples– Name resolution– Scoping– Type checking– Escape

• Implemented via AST traversals

• Guides subsequent compiler phases

Abstract InterpretationStatic analysis

• Automatically identify program properties– No user provided loop invariants

• Sound but incomplete methods– But can be rather precise

• Non-standard interpretation of the program operational semantics

• Applications– Compiler optimization– Code quality tools

• Identify potential bugs• Prove the absence of runtime errors• Partial correctness

Basic Compiler Phases

Overall Structure

Techniques Studied

• Simple code generation

• Basic blocks

• Global register allocation

• Activation records

• Object Oriented

• Assembler/Linker/Loader

Two Phase SolutionDynamic ProgrammingSethi & Ullman

• Bottom-up (labeling)– Compute for every subtree

• The minimal number of registers needed (weight)

• Top-Down– Generate the code using labeling by preferring

“heavier” subtrees (larger labeling)

“Global” Register Allocation• Input:

– Sequence of machine code instructions(assembly)

• Unbounded number of temporary registers

• Output– Sequence of machine code instructions

(assembly)– Machine registers – Some MOVE instructions removed– Missing prologue and epilogue

Graph Coloring with Coalescing

Build: Construct the interference graph

Simplify: Recursively remove non MOVE nodes

with less than K neighbors; Push removed nodes into stack

Potential-Spill: Spill some nodes and remove nodes

Push removed nodes into stack

Select: Assign actual registers (from simplify/spill stack)

Actual-Spill: Spill some potential spills and repeat the process

Coalesce: Conservatively merge unconstrained MOV related nodes with fewer than K “heavy” neighbors

Freeze: Give-Up Coalescing on some low-degree MOV related nodes

A Typical Stack Framehigher addresses

previous frame

current frame

lexical pointer

argument 1

argument 2

dynamic link

return address

temporaries

argument 2

argument 1

outgoing parameters

lower addressesnext frame

frame size

frame pointer

stack

pointer

outgoing parameters

registers

locals

administrative

Heap Memory Management

• Part of the runtime system

• Utilities for dynamic memory allocation

• Utilities for automatic memory reclamation– Garbage Colletion

Garbage Collection

• Techniques– Mark and sweep– Copying collection– Reference counting

• Modes– Generational– Incremental vs. Stop the world

argv

argc

return address

In mainbefore foo(argv[2])

fpsp

1000

996

992

988

984

980

989

988987

983

979

975

abcdefgh0

data segment

5000

argv

argc

return address

5000

return address

buf[2]

buf[1]

buf[0]

inside

foo(argv[2])

fp

sp

1000

996

992

988

984

980

989

988987

983

979

975

abcdefgh0

data segment

5000

argv

argc

return address

5000

return address

buf[2]

buf[1]

buf[0]

str : 5000

buf : 988

before strcpy

fp

sp

1000

996

992

988

984

980

989

988987

983

979

975

abcdefgh0

data segment

5000

argv

argc

return address

h 0 X X

return address d f g

buf[2]=c

buf[1]=b

buf[0]=a

str : 5000

buf : 988

return address

previous fp

inside strcpy

sp

fp

1000

996

992

988

984

980

989

988987

983

979

975

abcdefgh0

data segment

5000

argv

argc

return address

h 0 X X

return address d f g

buf[2]=c

buf[1]=b

buf[0]=a

str : 5000

buf : 988

return from strcpy

fp

sp

1000

996

992

988

984

980

989

988987

983

979

975

abcdefgh0

data segment

5000

argv

argc

return address

h 0 X X

return address d f g

buf[2]=c

buf[1]=b

buf[0]=a

Return from foowhere to?

1000

996

992

988

984

980

989

988987

983

979

975

fp

sp

abcdefgh0

data segment

5000