CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian...

61
CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Chapter 4: Syntax Analysis Part 1: Grammar Concepts Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut 371 Fairfield Way, Unit 2155 Storrs, CT 06269-3155 [email protected] http://www.engr.uconn.edu/~steve (860) 486 - 4818 Material for course thanks to: Laurent Michel Aggelos Kiayias Robert LeBarre

Transcript of CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian...

Page 1: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.1

CSE4100

Chapter 4: Syntax AnalysisChapter 4: Syntax AnalysisPart 1: Grammar ConceptsPart 1: Grammar Concepts

Prof. Steven A. DemurjianComputer Science & Engineering Department

The University of Connecticut371 Fairfield Way, Unit 2155

Storrs, CT [email protected]

http://www.engr.uconn.edu/~steve(860) 486 - 4818

Material for course thanks to:Laurent MichelAggelos KiayiasRobert LeBarre

Page 2: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.2

CSE4100

Syntax Analysis - ParsingSyntax Analysis - Parsing

An overview of parsing : Functions & An overview of parsing : Functions & ResponsibilitiesResponsibilities

Context Free Grammars: Concepts & TerminologyContext Free Grammars: Concepts & Terminology Writing and Designing GrammarsWriting and Designing Grammars Resolving Grammar Problems / DifficultiesResolving Grammar Problems / Difficulties Top-Down Parsing Top-Down Parsing

Recursive Descent & Predictive LL Bottom-Up ParsingBottom-Up Parsing

LR & LALR Key Issues in Error HandlingKey Issues in Error Handling Concluding Remarks/Looking AheadConcluding Remarks/Looking Ahead

Page 3: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.3

CSE4100

An Overview of ParsingAn Overview of Parsing

Why are Grammars to formally describe Languages Important ?

1. Precise, easy-to-understand representations

2. Compiler-writing tools can take grammar and generate a compiler

3. allow language to be evolved (new statements, changes to statements, etc.) Languages are not static, but are constantly upgraded to add new features or fix “old” ones

ADA ADA9x, C++ Adds: Templates, exceptions,

How do grammars relate to parsing process ?

Page 4: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.4

CSE4100

Parsing During CompilationParsing During Compilation

interm repres

errors

lexical analyzer

parserrest of

front end

symbol table

source program

parse treeget next

token

token

regular expressions

• also technically part or parsing

• includes augmenting info on tokens in source, type checking, semantic analysis

• uses a grammar to check structure of tokens• produces a parse tree• syntactic errors and recovery• recognize correct syntax• report errors

Page 5: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.5

CSE4100

Parsing ResponsibilitiesParsing Responsibilities

Syntax Error Identification / Handling

Recall typical error types:

Lexical : Misspellings

Syntactic : Omission, wrong order of tokens

Semantic : Incompatible types

Logical : Infinite loop / recursive call

Majority of error processing occurs during syntax analysis

NOTE: Not all errors are identifiable !! Which ones?

Page 6: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.6

CSE4100

Key issues in Error HandlingKey issues in Error Handling

DetectionDetection Actual Detection Finding position at which they occur

ReportingReporting Clear / accurate presentation

RecoveryRecovery How to skip over to continue and find later errors Cannot impact compilation of correct programs

Page 7: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.7

CSE4100

What are some Typical Errors ?What are some Typical Errors ?#include<stdio.h>

int f1(int v)

{ int i,j=0;

for (i=1;i<5;i++)

{ j=v+f2(i) }

return j; }

int f2(int u)

{ int j;

j=u+f1(u*u)

return j; }

int main()

{ int i,j=0;

for (i=1;i<10;i++)

{ j=j+i*I; printf(“%d\n”,i);

printf("%d\n",f1(j));

return 0;

}

Which are “easy” to recover from? Which are “hard” ?

As reported by MS VC++

'f2' undefined; assuming extern returning intsyntax error : missing ';' before ‘}‘syntax error : missing ';' before ‘return‘fatal error : unexpected end of file found

Page 8: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.8

CSE4100

Error Recovery StrategiesError Recovery Strategies

Panic Mode – Discard tokens until a “synchro” token is found ( end, “;”, “}”, etc. ) -- Decision of designer -- Problems: skip input miss declaration – causing more errors miss errors in skipped material -- Advantages: simple suited to 1 error per statementPhase Level – Local correction on input -- “,” ”;” – Delete “,” – insert “;” -- Also decision of designer -- Not suited to all situations -- Used in conjunction with panic mode to allow less input to be skipped

Page 9: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.9

CSE4100

Error Recovery Strategies – (2)Error Recovery Strategies – (2)

Error Productions: -- Augment grammar with rules -- Augment grammar used for parser construction / generation -- Add a rule for := in C assignment statements Report error but continue compile -- Self correction + diagnostic messages -- What are other rules ? (supported by yacc)Global Correction: -- Adding / deleting / replacing symbols is chancy – may do many changes ! -- Algorithms available to minimize changes costly - key issuesWhat do you think of each approach? What approach do compilers you’ve used take ?

Page 10: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.10

CSE4100

Motivating GrammarsMotivating Grammars

• Regular Expressions

Basis of lexical analysis

Represent regular languages

• Context Free Grammars

Basis of parsing

Represent language constructs

Characterize context free languages

Reg. Lang. CFLs

EXAMPLE: anbn , n 1 : Is it regular ?

Page 11: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.11

CSE4100

Context Free LanguagesContext Free Languages

Regular LanguagesRegular Languages

Token StructureToken Structure

Scanner GenerationScanner Generation

Context Free Context Free LanguagesLanguages

Sentence StructureSentence Structure

Parser GenerationParser Generation

Context Free Context Free LanguagesLanguages

Reg. Reg. LanguagLanguageses

Page 12: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.12

CSE4100

Context Free Grammars : Context Free Grammars : Concepts & TerminologyConcepts & Terminology

Definition: A Context Free Grammar, CFG, is described by T, NT, S, PR, where:

T: Terminals / tokens of the language

NT: Non-terminals to denote sets of strings generatable by the grammar & in the language

S: Start symbol, SNT, which defines all strings of the language

PR: Production rules to indicate how T and NT are combines to generate valid strings of the language.

PR: NT (T | NT)*

Like a Regular Expression / DFA / NFA, a Context Free Grammar is a mathematical model !

Page 13: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.13

CSE4100

Context Free Grammars : A First LookContext Free Grammars : A First Look

assign_stmt id := expr ;

expr term operator term

term id

term real

term integer

operator +

operator -

What do “BLUE” symbols represent?

Derivation: A sequence of grammar rule applications and substitutions that transform a starting non-term into a collection of terminals / tokens.

Simply stated: Grammars / production rules allow us to “rewrite” and “identify” correct syntax.

What do “BLACK” symbols represent?

Page 14: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.14

CSE4100

How is Grammar Used ?How is Grammar Used ?

Given the rules on the previous slide, suppose

id := real + int;

is input. Is it syntactically correct? How do we know?

expr is represented as: expr term operator term

Is this accurate / complete?

expr expr operator term

expr term

How does this affect the derivations which are possible?

Page 15: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.15

CSE4100

Derivation and Parse TreeDerivation and Parse Tree

Let’s derive: id := id + real – integer ;

What’s its parse tree ?

Page 16: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.16

CSE4100

Derivation ExampleDerivation Example

Let’s derive: Let’s derive: id := id + real – integer ;id := id + real – integer ;assign_stmt assign_stmt → id := expr ;

→ id := expr ; expr → expr operator term

→ id := expr operator term; expr → expr operator term

→ id := expr operator term operator term; expr → term

→ id := term operator term operator term; term → id

→ id := id operator term operator term; operator → +

→ id := id + term operator term; term → real

→ id := id + real operator term; operator → -

→ id := id + real - term; term → integer

→ id := id + real - integer; assign_stmt → id := expr ;expr → expr operator termexpr → termterm → idterm → realterm → integeroperator → +operator → -

Page 17: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.17

CSE4100

Example GrammarExample Grammar

expr expr op expr

expr ( expr )

expr - expr

expr id

op +

op -

op *

op /

op

Black : NT

Blue : T

expr : S

9 Production rules

To simplify / standardize notation, we offer a synopsis of terminology.

Page 18: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.18

CSE4100

Example Grammar - TerminologyExample Grammar - Terminology

Terminals: a,b,c,+,-,punc,0,1,…,9, blue strings

Non Terminals: A,B,C,S, black strings

T or NT: X,Y,Z

Strings of Terminals: u,v,…,z in T*

Strings of T / NT: , in ( T NT)*

Alternatives of production rules:

A 1; A 2; …; A k; A 1 | 2 | … | 1

First NT on LHS of 1st production rule is designated as start symbol !

E E A E | ( E ) | -E | id

A + | - | * | / |

Page 19: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.19

CSE4100

Grammar ConceptsGrammar Concepts

A step in a derivation is zero or one action that replaces a NT with the RHS of a production rule.

EXAMPLE: E -E (the means “derives” in one step) using the production rule: E -E

EXAMPLE: E E A E E * E E * ( E )

DEFINITION: derives in one step

derives in one step

derives in zero steps

+

*

EXAMPLES: A if A is a production rule

1 2 … n 1 n ; for all

If and then

* *

**

Page 20: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.20

CSE4100

How does this relate to Languages?How does this relate to Languages?

Let G be a CFG with start symbol S. Then S W (where W has no non-terminals) represents the language generated by G, denoted L(G). So WL(G) S W.

+

+

W : is a sentence of G

When S (and may have NTs) it is called a sentential form of G.

*

EXAMPLE: id * id is a sentence

Here’s the derivation:

E E A E E * E id * E id * id

E id * idSentential forms

Page 21: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.21

CSE4100

lm

Leftmost and Rightmost DerivationsLeftmost and Rightmost Derivations

Leftmost: Replace the leftmost non-terminal symbol

E E A E id A E id * E id * id

Rightmost: Replace the leftmost non-terminal symbol

E E A E E A id E * id id * idrm

lmlmlmlm

rmrmrm

Important Notes: A

If A , what’s true about ?

If A , what’s true about ?rm

Derivations: Actions to parse input can be represented pictorially in a parse tree.

Page 22: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.22

CSE4100

Examples of LM / RM DerivationsExamples of LM / RM Derivations

E E A E | ( E ) | -E | id

A + | - | * | / |

A leftmost derivation of : id + id * id

A rightmost derivation of : id + id * id

Page 23: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.23

CSE4100

Derivations & Parse TreeDerivations & Parse Tree

E

EE A

E

EE A

*

E

EE A

id *

E

EE A

id id*

E * E

E E A E

id * E

id * id

Page 24: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.24

CSE4100

Parse Trees and DerivationsParse Trees and Derivations

Consider the expression grammar:

E E+E | E*E | (E) | -E | id

Leftmost derivations of id + id * id

E E + E E + E id + E

E

EE +

id

E

EE *

id + E id + E * E

E

EE +

id

E

EE +

Page 25: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.25

CSE4100

Parse Tree & Derivations - continuedParse Tree & Derivations - continued

E

EE *

id + E * E id + id * E

E

EE +

id

id

id + id * E id + id * id E

EE *

E

EE +

id

idid

Page 26: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.26

CSE4100

Alternative Parse Tree & DerivationAlternative Parse Tree & Derivation

E E * E

E + E * E

id + E * E

id + id * E

id + id * id

E

E E+

E

E E*

id

id id

WHAT’S THE ISSUE HERE ?

Page 27: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.27

CSE4100

Example Pascal Grammar in YaccExample Pascal Grammar in Yacc

%term T_AND T_ARRAY T_BEGIN T_CASE T_CONST T_DIV T_DO T_RANGE T_TO T_ELSE T_END T_FILE T_FOR T_FUNCTION /* T_GOTO */ T_IDENTIFIER T_IF T_IN T_INTEGER T_LABEL T_MOD T_NOT T_REAL T_OF T_OR /* T_PACKED */ T_NIL T_PROCEDURE T_PROGRAM T_RECORD T_REPEAT T_SET T_STRING T_THEN T_DOWNTO T_TYPE T_UNTIL T_VAR T_WHILE /* T_WITH */

T_COMMA T_PERIOD T_ASSIGN T_COLON T_LPAREN T_RPAREN T_LBRACK T_RBRACK T_UPARROW T_SEMI

%binary T_LT T_EQ T_GT T_GE T_LE T_NE T_IN%left T_PLUS T_MINUS T_OR%left UNARYSIGN%left T_MULT T_RDIV T_DIV T_MOD T_AND%left T_NOT

%%

Page 28: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.28

CSE4100

Example Pascal Grammar in YaccExample Pascal Grammar in Yacc

Start : Program_Header Declarations Block T_PERIOD

Program_Header : T_PROGRAM T_IDENTIFIER T_LPAREN Id_List T_RPAREN T_SEMI

Block : T_BEGIN Stt_List T_END

Declarations : Const_Def_Part Type_Def_Part Var_Decl_Part Routine_Decl_Part

Const_Def_Part : T_CONST Const_Def_List | /* epsilon */ ;

Const_Def_List : Const_Def | Const_Def_List Const_Def ;

Const_Def : T_IDENTIFIER T_EQ Constant T_SEMI

Type_Def_Part : T_TYPE Type_Def_List | /* epsilon */

Type_Def_List : Type_Def | Type_Def_List Type_Def

Page 29: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.29

CSE4100

Example Pascal Grammar in YaccExample Pascal Grammar in Yacc

Type_Def : T_IDENTIFIER T_EQ Type T_SEMI

Var_Decl_Part : T_VAR Var_Decl_List | /* epsilon */

Var_Decl_List : Var_Decl_List Var_Decl | Var_Decl

Var_Decl : Id_List T_COLON Type T_SEMI

Routine_Decl_Part : Routine_Decl_Part Routine_Decl | /* epsilon */

Routine_Decl : Routine_Head T_IDENTIFIER T_SEMI | Routine_Head Declarations Block T_SEMI

Routine_Head : T_PROCEDURE T_IDENTIFIER Params T_SEMI | T_FUNCTION T_IDENTIFIER Params Function_Type T_SEMI

Params : T_LPAREN Params_List T_RPAREN | /* epsilon */

Param_Group : Id_List T_COLON Type | T_VAR Id_List T_COLON Type

Page 30: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.30

CSE4100

Example Pascal Grammar in YaccExample Pascal Grammar in Yacc

Function_Type : T_COLON Type | /* epsilon */

Params_List : Param_Group | Params_List T_SEMI Param_Group

Constant : T_STRING | Number | T_PLUS Number | T_MINUS Number

Number : T_IDENTIFIER | T_INTEGER | T_REAL

Const_List : Constant | Const_List T_COMMA Constant

Type : Simple_Type | T_UPARROW T_IDENTIFIER | Structured_Type

Simple_Type : T_IDENTIFIER | T_LPAREN Id_List T_RPAREN | Constant T_RANGE Constant

Structured_Type : T_ARRAY T_LBRACK Simple_Type_List T_RBRACK T_OF Type | T_SET T_OF Simple_Type | T_RECORD Field_List T_END

Simple_Type_List : Simple_Type | Simple_Type_List T_COMMA Simple_Type

Page 31: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.31

CSE4100

Example Pascal Grammar in YaccExample Pascal Grammar in Yacc

Field_List : Fixed_Part /* Variant_part */

Fixed_Part : Field | Fixed_Part T_SEMI Field

Field : /* epsilon */ | Id_List T_COLON Type

Variant_part : / * epsilon * / | T_CASE T_IDENTIFIER T_OF Variant_List | T_CASE T_IDENTIFIER T_COLON T_IDENTIFIER T_OF Variant_List

Variant_List : Variant | Variant_List T_SEMI Variant

Variant : / * epsilon * / | Const_List T_COLON T_LPAREN Field_List T_RPAREN

Stt_List : Statement | Stt_List T_SEMI Statement

Case_Stt_List : Case_Statement | Case_Stt_List T_SEMI Case_Statement

Case_Statement : Const_List T_COLON Statement | /* epsilon */

Page 32: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.32

CSE4100

Example Pascal Grammar in YaccExample Pascal Grammar in Yacc

Statement : /* epsilon */ | T_IDENTIFIER | T_IDENTIFIER T_LPAREN Expr_List T_RPAREN | Variable T_ASSIGN Expression | T_BEGIN Stt_List T_END | T_CASE Expression T_OF Case_Stt_List T_END | T_WHILE Expression T_DO Statement | T_REPEAT Stt_List T_UNTIL Expression | T_FOR Variable T_ASSIGN Expression T_TO Expression T_DO

Statement | T_FOR Variable T_ASSIGN Expression T_DOWNTO Expression T_DO

Statement | T_IF Expression T_THEN Statement | T_IF Expression T_THEN Statement T_ELSE Statement

Expression : Expression relop Expression %prec T_LT | T_PLUS Expression %prec UNARYSIGN | T_MINUS Expression %prec UNARYSIGN | Expression addop Expression %prec T_PLUS | Expression divop Expression %prec T_MULT | T_NIL | T_STRING | T_INTEGER | T_REAL | Variable | T_IDENTIFIER T_LPAREN Expr_List T_RPAREN | T_LPAREN Expression T_RPAREN | negop Expression %prec T_NOT | T_LBRACK Element_List T_RBRACK | T_LBRACK T_RBRACK

Page 33: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.33

CSE4100

Example Pascal Grammar in YaccExample Pascal Grammar in Yacc

Element_List : Element | Element_List T_COMMA Element

Element : Expression | Expression T_RANGE Expression

Variable : T_IDENTIFIER | Variable T_LBRACK Expr_List T_RBRACK | Variable T_PERIOD T_IDENTIFIER | Variable T_UPARROW

Expr_List : Expression | Expr_List T_COMMA Expression

relop : T_EQ | T_LT | T_GT | T_NE | T_LE | T_GE | T_IN

addop : T_PLUS | T_MINUS | T_OR

divop : T_MULT | T_RDIV | T_DIV | T_MOD | T_AND ;

negop : T_NOT

Page 34: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.34

CSE4100

Resolving Grammar Problems/DifficultiesResolving Grammar Problems/Difficulties

Regular Expressions : Basis of Lexical Analysis

Reg. Expr. generate/represent regular languages

Reg. Languages smallest, most well defined class

of languages

Context Free Grammars: Basis of Parsing

CFGs represent context free languages

CFLs contain more powerful languages

Reg. Lang. CFLsanbn – CFL that’s not regular!

Try to write DFA/NFA for it !

Page 35: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.35

CSE4100

Resolving Problems/Difficulties – (2)Resolving Problems/Difficulties – (2)

Since Reg. Lang. Context Free Lang., it is possible to go from reg. expr. to CFGs via NFA.

start0 3b21 ba

a

b

Recall: (a | b)*abb

Page 36: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.36

CSE4100

Resolving Problems/Difficulties – (3)Resolving Problems/Difficulties – (3)

Construct CFG as follows:

1. Each State I has non-terminal Ai : A0, A1, A2, A3

2. If then Ai a Aj

3. If then Ai Aj

4. If I is an accepting state, Ai : A3

5. If I is a starting state, Ai is the start symbol : A0

i ja

i jb

: A0 aA0, A0 aA1

: A0 bA0, A1 bA2

: A2 bA3

T={a,b}, NT={A0, A1, A2, A3}, S = A0

PR ={ A0 aA0 | aA1 | bA0 ; A1 bA2 ; A2 3A3 ; A3 }

Page 37: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.37

CSE4100

How Does This CFG Derive Strings ?How Does This CFG Derive Strings ?

start0 3b21 ba

a

b

vs. A0 aA0, A0 aA1

A0 bA0, A1 bA2

A2 bA3, A3

How is abaabb derived in each ?

Page 38: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.38

CSE4100

Regular Expressions vs. CFGsRegular Expressions vs. CFGs

Regular expressions for lexical syntax

1. CFGs are overkill, lexical rules are quite simple and straightforward

2. REs – concise / easy to understand

3. More efficient lexical analyzer can be constructed

4. RE for lexical analysis and CFGs for parsing promotes modularity, low coupling & high cohesion. Why?

CFGs : Match tokens “(“ “)”, begin / end, if-then-else, whiles, proc/func calls, …

Intended for structural associations between tokens !

Are tokens in correct order ?

Page 39: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.39

CSE4100

Resolving Grammar Difficulties : Resolving Grammar Difficulties : MotivationMotivation

1. Humans write / develop grammars

2. Different parsing approaches have different needs

Top-Down vs. Bottom-Up

• For: 1 remove “errors”

• For: 2 put / redesign grammar- ambiguity- -moves- cycles- left recursion- left factoring

Grammar Problems

LL(k)Recursive

LR(k)LR(k)LALR(k)LALR(k)

Page 40: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.40

CSE4100

Resolving Problems: Ambiguous Resolving Problems: Ambiguous GrammarsGrammars

Consider the following grammar segment:stmt if expr then stmt

| if expr then stmt else stmt

| other (any other statement)

What’s problem here ?

Consider the Program:

Else must match to previous then.

Structure indicates parse subtree for expression.

if e1

then if e2

then s1

else s2

Page 41: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.41

CSE4100

Resulting Parse TreeResulting Parse Tree

Easy caseEasy case Else must match to previous then.

if e1

then s1

else if e2

then s2

else s3

Page 42: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.42

CSE4100

Example : What Happens with this string?Example : What Happens with this string?

If E1 then if E2 then S1 else S2

How is this parsed ?

if E1 then

if E2 then

S1

else

S2

if E1 then

if E2 then

S1

else

S2

vs.

What’s the issue here ?

Page 43: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.43

CSE4100

Parse Trees for ExampleParse Trees for Example

Form 1: Form 2:

What’s the issue here ?

if e1

then if e2

then s1

else s2

Page 44: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.44

CSE4100

Removing AmbiguityRemoving Ambiguity

Take Original Grammar:

stmt if expr then stmt

| if expr then stmt else stmt

| other (any other statement)

Revise to remove ambiguity:

stmt matched_stmt | unmatched_stmt

matched_stmt if expr then matched_stmt else matched_stmt | other

unmatched_stmt if expr then stmt

| if expr then matched_stmt else unmatched_stmt

How does this grammar work ?

Page 45: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.45

CSE4100

Resolving Difficulties : Left RecursionResolving Difficulties : Left Recursion

A left recursive grammar has rules that support the derivation : A A, for some .+

Top-Down parsing can’t reconcile this type of grammar, since it could consistently make choice which wouldn’t allow termination.

A A A A … etc. A A |

Take left recursive grammar:

A A | To the following:

A’ A’

A’ A’ |

Page 46: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.46

CSE4100

Why is Left Recursion a Problem ?Why is Left Recursion a Problem ?

Consider: E E + T | T T T * F | F F ( E ) | id

Derive : id + id + id

E E + T

How can left recursion be removed ?

E E + T | T What does this generate?

E E + T T + T

E E + T E + T + T T + T + T

How does this build strings ?

What does each string have to start with ?

Page 47: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.47

CSE4100

Resolving Difficulties : Left Recursion (2)Resolving Difficulties : Left Recursion (2)

Informal Discussion:

Take all productions for A and order as:

A A1 | A2 | … | Am | 1 | 2 | … | n

Where no i begins with A.

Now apply concepts of previous slide:

A 1A’ | 2A’ | … | nA’

A’ 1A’ | 2A’ | … | m A’ | For our example: E E + T | T

T T * F | F

F ( E ) | id

E TE’ E’ + TE’ | T FT’

T’ * FT’ | F ( E ) | id

Page 48: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.48

CSE4100

Resolving Difficulties : Left Recursion (3)Resolving Difficulties : Left Recursion (3)

Problem: If left recursion is two-or-more levels deep, this isn’t enough

S Aa | b

A Ac | Sd | S Aa Sda

Algorithm:Input: Grammar G with no cycles or -productions

Output: An equivalent grammar with no left recursion

1. Arrange the non-terminals in some order A1,A2,…An

2. for i := 1 to n do begin

for j := 1 to i – 1 do begin

replace each production of the form Ai Aj

by the productions Ai 1 | 2 | … | k

where Aj 1|2|…|k are all current Aj productions; end

eliminate the immediate left recursion among Ai productions end

Page 49: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.49

CSE4100

Using the AlgorithmUsing the Algorithm

Apply the algorithm to: A1 A2a | b

A2 A2c | A1d |

i = 1

For A1 there is no left recursion

i = 2

for j=1 to 1 do

Take productions: A2 A1 and replace with

A2 1 | 2 | … | k |

where A1 1 | 2 | … | k are A1 productions

in our case A2 A1d becomes A2 A2ad | bdWhat’s left: A1 A2a | b

A2 A2 c | A2 ad | bd | Are we done ?

Page 50: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.50

CSE4100

Using the Algorithm (2)Using the Algorithm (2)

No ! We must still remove A2 left recursion !

A1 A2a | b

A2 A2 c | A2 ad | bd |

Recall:

A A1 | A2 | … | Am | 1 | 2 | … | n

A 1A’ | 2A’ | … | nA’

A’ 1A’ | 2A’ | … | m A’ |

Apply to above case. What do you get ?

Page 51: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.51

CSE4100

Removing Difficulties : Removing Difficulties : -Moves-Moves

Very Simple: A and B uAv implies add B uv to the grammar G.

Why does this work ?

E TE’ E’ + TE’ | T FT’ T’ * FT’ | F ( E ) | id

Examples:

A1 A2 a | b

A2 bd A2’ | A2’

A2’ c A2’ | bd A2’ |

Page 52: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.52

CSE4100

Removing Difficulties : CyclesRemoving Difficulties : Cycles

How would cycles be removed ?

S SS | ( S ) |

Has: S SS S

S

Page 53: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.53

CSE4100

Removing Difficulties : Left FactoringRemoving Difficulties : Left Factoring

Problem : Uncertain which of 2 rules to choose:

stmt if expr then stmt else stmt

| if expr then stmt

When do you know which one is valid ?

What’s the general form of stmt ?

A 1 | 2 : if expr then stmt

1: else stmt 2 :

Transform to:

A A’

A’ 1 | 2

EXAMPLE:

stmt if expr then stmt rest

rest else stmt |

Page 54: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.54

CSE4100

Resolving Grammar Problems : Resolving Grammar Problems : Another LookAnother Look

• Forcing Matching Within Grammar

- if-then-else else must go with most recent if

stmt if expr then stmt | if expr then stmt else stmt | other

stmt matched_stmt | unmatched_stmt

matched_stmt if expr then matched_stmt else matched_stmt | other

unmatched_stmt if expr then stmt

| if expr then matched_stmt else unmatched_stmt

Leads to:

Page 55: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.55

CSE4100

Resolving Grammar Problems : Resolving Grammar Problems : Another Look (2)Another Look (2)

• Left Factoring - Know correct choice in advance !

where_queries where are_does declarations of symbol

| where are_does “search_option” appear

Rules given in Assignment or

where_queries where rest_where

rest_where are declarations of symbol

| does “search_option” appear

Page 56: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.56

CSE4100

Resolving Grammar Problems : Resolving Grammar Problems : Another Look (3)Another Look (3)

• Left Recursion - Avoid infinite loop when choosing rules

A A | A A’

A’ A’ | EXAMPLE:

E E + T | T

T T * F | F

F ( E ) | id

E TE’ E’ + TE’ | T FT’

T’ * FT’ | F ( E ) | id

If we use the original production rules on input: id + id (with leftmost):

E E + T E + T + T E + T + T + T T + T + T + … + T

If we use the transformed production rules on input: id + id (with leftmost):

E TE’ T + TE’ id + TE’ id + id E’ id + id

*

not assured

**

Page 57: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.57

CSE4100

Resolving Grammar Problems : Resolving Grammar Problems : Another Look (4)Another Look (4)

• Removing -Moves

E TE’ E’ + TE’ |

E TE TE’E’ + TE’E’ + T

Is there still a problem ?

Page 58: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.58

CSE4100

Resolving Grammar ProblemsResolving Grammar Problems

Note: Not all aspects of a programming language can be represented by context free grammars / languages.

Examples:

1. Declaring ID before its use

2. Valid typing within expressions (Type Rules)

3. Parameters in definition vs. in call

4. Method Overloading/hiding

These features are call context-sensitive and define yet another language class, CSL.

Reg. Lang. CFLs CSLs

Page 59: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.59

CSE4100

Context-Sensitive Languages - ExamplesContext-Sensitive Languages - Examples

Examples:

L1 = { wcw | w is in (a | b)* } : Declare before use

L2 = { an bm cn dm | n 1, m 1 }

an bm : formal parameter

cn dm : actual parameter

What does it mean when L is context-sensitive ?

Page 60: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.60

CSE4100

How do you show a Language is a CFL?How do you show a Language is a CFL?

L3 = { w c wR | w is in (a | b)* }

L4 = { an bm cm dn | n 1, m 1 }

L5 = { an bn cm dm | n 1, m 1 }

L6 = { an bn | n 1 }

Page 61: CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian Computer Science & Engineering Department The University.

CH4p1.61

CSE4100

SolutionsSolutions

L3 = { w c wR | w is in (a | b)* }

L4 = { an bm cm dn | n 1, m 1 }

L5 = { an bn cm dm | n 1, m 1 }

L6 = { an bn | n 1 }

S a S a | b S b | c

S a S d | a A d

A b A c | bc

S XY

X a X b | ab

Y c Y d | cd

S a S b | ab