Recap Roman Manevich Mooly Sagiv. Outline Subjects Studied Questions & Answers.
-
Upload
horatio-haynes -
Category
Documents
-
view
221 -
download
0
Transcript of Recap Roman Manevich Mooly Sagiv. Outline Subjects Studied Questions & Answers.
• input
– program text (file)
• output
– sequence of tokens
• Read input file
• Identify language keywords and standard identifiers
• Handle include files and macros
• Count line numbers
• Remove whitespaces
• Report illegal symbols
• [Produce symbol table]
Lexical Analysis (Scanning)
Summary
• For most programming languages lexical analyzers can be easily constructed automatically
• Exceptions:– Fortran– PL/1
• Lex/Flex/Jlex are useful beyond compilers
• input– Sequence of tokens
• output– Abstract Syntax Tree
• Report syntax errors• unbalanced parenthesizes
• [Create “symbol-table” ]• [Create pretty-printed version of the program]• In some cases the tree need not be generated
(one-pass compilers)
Syntax Analysis (Parsing)
Efficient Parsers • Pushdown automata
• Deterministic
• Report an error as soon as the input is not a prefix of a valid program
• Not usable for all context free grammars
cup
context free grammar
tokens parser
“Ambiguity errors”
parse tree
Kinds of Parsers
• Top-Down (Predictive Parsing) LL– Construct parse tree in a top-down matter– Find the leftmost derivation– For every non-terminal and token predict the next production– Preorder tree traversal
• Bottom-Up LR– Construct parse tree in a bottom-up manner– Find the rightmost derivation in a reverse order– For every potential right hand side and token decide when a production
is found – Postorder tree traversal
Example Grammar for Predictive LL Top-Down Parsing
expression digit | ‘(‘ expression operator expression ‘)’
operator ‘+’ | ‘*’
digit ‘0’ | ‘1’ | ‘2’ | ‘3’ | ‘4’ | ‘5’ | ‘6’ | ‘7’ | ‘8’ | ‘9’
Example Grammar for Predictive LL Top-Down Parsing
expression digit | ‘(‘ expression operator expression ‘)’
operator ‘+’ | ‘*’
digit ‘0’ | ‘1’ | ‘2’ | ‘3’ | ‘4’ | ‘5’ | ‘6’ | ‘7’ | ‘8’ | ‘9’
static int Parse_Expression(Expression **expr_p) {
Expression *expr = *expr_p = new_expression() ;
/* try to parse a digit */
if (Token.class == DIGIT) {
expr->type=‘D’; expr->value=Token.repr –’0’; get_next_token();
return 1; }
/* try parse parenthesized expression */
if (Token.class == ‘(‘) {
expr->type=‘P’; get_next_token();
if (!Parse_Expression(&expr->left)) Error(“missing expression”);
if (!Parse_Operator(&expr->oper)) Error(“missing operator”);
if (Token.class != ‘)’) Error(“missing )”);
get_next_token();
return 1; }
return 0;
}
Parsing Expressions
• Try every alternative production– For P A1 A2 … An | B1 B2 … Bm
– If A1 succeeds• Call A2
• If A2 succeeds – Call A3
• If A2 fails report an error
– Otherwise try B1
• Recursive descent parsing• Can be applied for certain grammars• Generalization: LL1 parsing
int P(...) {
/* try parse the alternative P A1 A2 ... An */
if (A1(...)) {
if (!A2()) Error(“Missing A2”);
if (!A3()) Error(“Missing A3”);
..
if (!An()) Error(Missing An”);
return 1;
}/* try parse the alternative P B1 B2 ... Bm */ if (B1(...)) { if (!B2()) Error(“Missing B2”); if (!B3()) Error(“Missing B3”); .. if (!Bm()) Error(Missing Bm”); return 1; } return 0;
Predictive Parser for Arithmetic Expressions
• Grammar
• C-code?
1 E E + T2 E T3 T T * F4 T F5 F id6 F (E)
Bottom-Up Syntax Analysis
• Input– A context free grammar– A stream of tokens
• Output– A syntax tree or error
• Method– Construct parse tree in a bottom-up manner– Find the rightmost derivation in (reversed order)– For every potential right hand side and token decide when a
production is found– Report an error as soon as the input is not a prefix of valid
program
Constructing LR(0) parsing table
• Add a production S’ S$• Construct a finite automaton accepting “valid
stack symbols”• States are set of items A
– The states of the automaton becomes the states of parsing-table
– Determine shift operations– Determine goto operations– Determine reduce operations– Report an error when conflicts arise
1: S E$
4: E T6: E E + T
10: T i12: T (E)
5: E T T
11: T i i
2: S E $7: E E + TE
13: T ( E)4: E T6: E E + T10: T i12: T (E)
(
(
15: T (E) )
14: T (E )7: E E + T
E
7: E E + T10: T i12: T (E)
+
+
8: E E + T
T
2: S E $ $
i
i
1: S E$
4: E T6: E E + T
10: T i12: T (E)
5: E T T
11: T i i
2: S E $7: E E + TE
13: T ( E)4: E T6: E E + T10: T i12: T (E)
(
(
15: T (E) )
14: T (E )7: E E + T
E
7: E E + T10: T i12: T (E)
+
+
8: E E + T
T
2: S E $ $
i
i
Parsing “(i)$”
Summary (Bottom-Up)• LR is a powerful technique• Generates efficient parsers• Generation tools exit LALR(1)
– Bison, yacc, CUP
• But some grammars need to be tuned– Shift/Reduce conflicts– Reduce/Reduce conflicts– Efficiency of the generated parser
Summary (Parsing)
• Context free grammars provide a natural way to define the syntax of programming languages
• Ambiguity may be resolved• Predictive parsing is natural
– Good error messages
– Natural error recovery
– But not expressive enough
• But LR bottom-up parsing is more expressible
Abstract Syntax• Intermediate program representation
• Defines a tree - Preserves program hierarchy
• Generated by the parser
• Declared using an (ambiguous) context free grammar (relatively flat)– Not meant for parsing
• Keywords and punctuation symbols are not stored (Not relevant once the tree exists)
• Big programs can be also handled (possibly via virtual memory)
Semantic Analysis
• Requirements related to the “context” in which a construct occurs
• Examples– Name resolution– Scoping– Type checking– Escape
• Implemented via AST traversals
• Guides subsequent compiler phases
Abstract InterpretationStatic analysis
• Automatically identify program properties– No user provided loop invariants
• Sound but incomplete methods– But can be rather precise
• Non-standard interpretation of the program operational semantics
• Applications– Compiler optimization– Code quality tools
• Identify potential bugs• Prove the absence of runtime errors• Partial correctness
Techniques Studied
• Simple code generation
• Basic blocks
• Global register allocation
• Activation records
• Object Oriented
• Assembler/Linker/Loader
Two Phase SolutionDynamic ProgrammingSethi & Ullman
• Bottom-up (labeling)– Compute for every subtree
• The minimal number of registers needed (weight)
• Top-Down– Generate the code using labeling by preferring
“heavier” subtrees (larger labeling)
“Global” Register Allocation• Input:
– Sequence of machine code instructions(assembly)
• Unbounded number of temporary registers
• Output– Sequence of machine code instructions
(assembly)– Machine registers – Some MOVE instructions removed– Missing prologue and epilogue
Graph Coloring with Coalescing
Build: Construct the interference graph
Simplify: Recursively remove non MOVE nodes
with less than K neighbors; Push removed nodes into stack
Potential-Spill: Spill some nodes and remove nodes
Push removed nodes into stack
Select: Assign actual registers (from simplify/spill stack)
Actual-Spill: Spill some potential spills and repeat the process
Coalesce: Conservatively merge unconstrained MOV related nodes with fewer than K “heavy” neighbors
Freeze: Give-Up Coalescing on some low-degree MOV related nodes
A Typical Stack Framehigher addresses
previous frame
current frame
lexical pointer
argument 1
argument 2
dynamic link
return address
temporaries
argument 2
argument 1
outgoing parameters
lower addressesnext frame
frame size
frame pointer
stack
pointer
outgoing parameters
registers
locals
administrative
Heap Memory Management
• Part of the runtime system
• Utilities for dynamic memory allocation
• Utilities for automatic memory reclamation– Garbage Colletion
Garbage Collection
• Techniques– Mark and sweep– Copying collection– Reference counting
• Modes– Generational– Incremental vs. Stop the world
argv
argc
return address
In mainbefore foo(argv[2])
fpsp
1000
996
992
988
984
980
989
988987
983
979
975
abcdefgh0
data segment
5000
argv
argc
return address
5000
return address
buf[2]
buf[1]
buf[0]
inside
foo(argv[2])
fp
sp
1000
996
992
988
984
980
989
988987
983
979
975
abcdefgh0
data segment
5000
argv
argc
return address
5000
return address
buf[2]
buf[1]
buf[0]
str : 5000
buf : 988
before strcpy
fp
sp
1000
996
992
988
984
980
989
988987
983
979
975
abcdefgh0
data segment
5000
argv
argc
return address
h 0 X X
return address d f g
buf[2]=c
buf[1]=b
buf[0]=a
str : 5000
buf : 988
return address
previous fp
inside strcpy
sp
fp
1000
996
992
988
984
980
989
988987
983
979
975
abcdefgh0
data segment
5000
argv
argc
return address
h 0 X X
return address d f g
buf[2]=c
buf[1]=b
buf[0]=a
str : 5000
buf : 988
return from strcpy
fp
sp
1000
996
992
988
984
980
989
988987
983
979
975
abcdefgh0
data segment
5000