Translator Architecture Code Generator ParserTokenizer string of characters (source code) string of...

22
Translator Architecture Code Generator Parser Tokenizer string of characters (source code) string of tokens abstract program string of integers (object code) ? Parser

Transcript of Translator Architecture Code Generator ParserTokenizer string of characters (source code) string of...

Translator Architecture

CodeGenerator

ParserTokenizer

string ofcharacters

(source code)

string oftokens

abstractprogram

string ofintegers

(object code)

?

Parser

Here’s the Plan

Design a context-free grammar to specify (syntactically) valid BL programs

Use the grammar to implement a recursive descent parser (i.e., an algorithm to parse BL programs and construct the corresponding Program object)

Specifying Syntax with CFGs

Context-free grammars (cfg) Some syntactically valid real

constants: 37.044 615.22E16 99241. 18.E-93

Rewrite Rules for Real Constants

<real constant> <digit seqnce> . <digit seqnce> |<digit seqnce> . <digit seqnce> <exponent> |<digit seqnce> . |<digit seqnce> . <exponent>

<exponent> E <digit seqnce> |E + <digit seqnce> |E - <digit seqnce>

<digit seqnce> <digit> <digit seqnce> |<digit>

<digit> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

can be rewritten as

or

Four Components of a CFG Nonterminal symbols

<real constant>, <exponent>,<digit seqnce>, <digit>

Terminal symbols 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, E, +, -, .

Start symbol <real constant>

Rewrite rules on previous slide

A Derivation of 5.6E10

<real constant>

We begin with the start symbol…

… and choose one of its rewrite rules:

<real constant> <digit seqnce> . <digit seqnce> |<digit seqnce> . <digit seqnce> <exponent> |<digit seqnce> . |<digit seqnce> . <exponent>

A Derivation of 5.6E10

<real constant> <digit seqnce> . <digit seqnce> <exponent>

Now we choose one of the nonterminal symbols…

… and choose one of its rewrite rules:

<digit seqnce> <digit> <digit seqnce> |<digit>

A Derivation of 5.6E10

<real constant> <digit seqnce> . <digit seqnce> <exponent> <digit> . <digit seqnce> <exponent>

Again we choose one of the nonterminal symbols…

… and choose one of its rewrite rules:

<digit> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

A Derivation of 5.6E10

<real constant> <digit seqnce> . <digit seqnce> <exponent> <digit> . <digit seqnce> <exponent> 5 . <digit seqnce> <exponent>

5 . <digit> <exponent> 5 . 6 <exponent> 5 . 6 E <digit seqnce> 5 . 6 E <digit> <digit seqnce> 5 . 6 E 1 <digit seqnce> 5 . 6 E 1 <digit> 5 . 6 E 1 0

If it’s possible to find such a derivation, we write:<real constant> 5.6E10*

A Derivation Tree for 5.6E10<real constant>

<digit seqnce> <digit seqnce>

<digit seqnce>

<digit seqnce>

<digit> <digit>

<digit>

<digit>

<exponent>.

E

5 6

1

0

Find a Derivation Tree for 5.E3 The derivation tree:

Can you find a derivation tree for .6E10?

<real constant>

<digit seqnce> <exponent>.

<digit> <digit seqnce>E

<digit>

3

5

Why not?

Language Generated by a CFG Definition: Let G = (Nonterminals,

Terminals, <start symbol>, Rewrite Rules) be a context-free grammar. The language generated (or specified) by G is denoted L(G) and is defined as:

*L(G) = {x: string of Terminals (<start symbol> x)}

Another Example: A CFG for Boolean Expressions

<bool exp> T |F |NOT ( <bool exp> ) |( <bool exp> AND <bool exp> ) |( <bool exp> OR <bool exp> )

CFG for Boolean Expressions What are the nonterminal symbols?

What are the terminal symbols?

What is the start symbol?

How many rewrite rules?

<bool exp>

T, F, AND, OR, NOT, (, )

<bool exp>

5

A Derivation Tree for NOT((T OR F))

<bool exp>

NOT ( )<bool exp>

<bool exp> <bool exp>OR( )

T F

Find a Derivation Tree for ((T OR NOT (F)) AND T)

<bool exp>

<bool exp> <bool exp>AND( )

T<bool exp> <bool exp>OR( )

T NOT ( )<bool exp>

F

A Famous Context-free Grammar

<expression> <expression> <addop> <term> |

<term>

<term> <term> <multop> <factor> |<factor>

<factor> ( <expression> ) |<digit seqnce>

<addop> + | -

<multop> * | DIV | MOD

<digit seqnce> <digit> <digit seqnce> |

<digit>

<digit> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

What’s So Special About This CFG?

Find a derivation tree for 4 + 6 * 2<expression>

<addop> <term><expression>

<term>

<factor>

<digit seqnce>

<digit>

4

+ <term><multop> <factor>

<factor>

<digit>

6

<digit seqnce>

* <digit seqnce>

<digit>

2

What’s So Special Continued…

Find the derivationtree for(4 + 6) * 2

<expression>

<term>

<factor>

<digit seqnce>

<digit>

4

<term>

<term><multop> <factor>

<factor>

* <digit seqnce>

<digit>

2

<expression>( )

<expression><addop><term>

<digit>

6

<digit seqnce>

<factor>+

How About These Rewrite Rules?

<expression> <expression> <op> <expression> |

( <expression> ) |<digit seqnce>

<op> + | - | * | DIV | MOD

<digit seqnce> <digit> <digit seqnce> |

<digit>

<digit> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

New Rules Continued…

Find a derivation tree for 4 + 6 * 2<expression>

<op> <expression><expression>

<digit seqnce>

<digit>

4

+<expression><op><expression>

<digit seqnce>

6

<digit>

*<digit seqnce>

<digit>

2

New Rules Continued…

How about this tree for 4 + 6 * 2?<expression>

<op><expression> <expression>

<digit seqnce>

<digit>

2

*<expression><op><expression>

<digit seqnce>

4

<digit>

+<digit seqnce>

<digit>

6