SB Syntax analysis How ScriptBasic performs Syntax analysis.
CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian...
-
Upload
elizabeth-simon -
Category
Documents
-
view
238 -
download
0
Transcript of CH4p1.1 CSE4100 Chapter 4: Syntax Analysis Part 1: Grammar Concepts Prof. Steven A. Demurjian...
CH4p1.1
CSE4100
Chapter 4: Syntax AnalysisChapter 4: Syntax AnalysisPart 1: Grammar ConceptsPart 1: Grammar Concepts
Prof. Steven A. DemurjianComputer Science & Engineering Department
The University of Connecticut371 Fairfield Way, Unit 2155
Storrs, CT [email protected]
http://www.engr.uconn.edu/~steve(860) 486 - 4818
Material for course thanks to:Laurent MichelAggelos KiayiasRobert LeBarre
CH4p1.2
CSE4100
Syntax Analysis - ParsingSyntax Analysis - Parsing
An overview of parsing : Functions & An overview of parsing : Functions & ResponsibilitiesResponsibilities
Context Free Grammars: Concepts & TerminologyContext Free Grammars: Concepts & Terminology Writing and Designing GrammarsWriting and Designing Grammars Resolving Grammar Problems / DifficultiesResolving Grammar Problems / Difficulties Top-Down Parsing Top-Down Parsing
Recursive Descent & Predictive LL Bottom-Up ParsingBottom-Up Parsing
LR & LALR Key Issues in Error HandlingKey Issues in Error Handling Concluding Remarks/Looking AheadConcluding Remarks/Looking Ahead
CH4p1.3
CSE4100
An Overview of ParsingAn Overview of Parsing
Why are Grammars to formally describe Languages Important ?
1. Precise, easy-to-understand representations
2. Compiler-writing tools can take grammar and generate a compiler
3. allow language to be evolved (new statements, changes to statements, etc.) Languages are not static, but are constantly upgraded to add new features or fix “old” ones
ADA ADA9x, C++ Adds: Templates, exceptions,
How do grammars relate to parsing process ?
CH4p1.4
CSE4100
Parsing During CompilationParsing During Compilation
interm repres
errors
lexical analyzer
parserrest of
front end
symbol table
source program
parse treeget next
token
token
regular expressions
• also technically part or parsing
• includes augmenting info on tokens in source, type checking, semantic analysis
• uses a grammar to check structure of tokens• produces a parse tree• syntactic errors and recovery• recognize correct syntax• report errors
CH4p1.5
CSE4100
Parsing ResponsibilitiesParsing Responsibilities
Syntax Error Identification / Handling
Recall typical error types:
Lexical : Misspellings
Syntactic : Omission, wrong order of tokens
Semantic : Incompatible types
Logical : Infinite loop / recursive call
Majority of error processing occurs during syntax analysis
NOTE: Not all errors are identifiable !! Which ones?
CH4p1.6
CSE4100
Key issues in Error HandlingKey issues in Error Handling
DetectionDetection Actual Detection Finding position at which they occur
ReportingReporting Clear / accurate presentation
RecoveryRecovery How to skip over to continue and find later errors Cannot impact compilation of correct programs
CH4p1.7
CSE4100
What are some Typical Errors ?What are some Typical Errors ?#include<stdio.h>
int f1(int v)
{ int i,j=0;
for (i=1;i<5;i++)
{ j=v+f2(i) }
return j; }
int f2(int u)
{ int j;
j=u+f1(u*u)
return j; }
int main()
{ int i,j=0;
for (i=1;i<10;i++)
{ j=j+i*I; printf(“%d\n”,i);
printf("%d\n",f1(j));
return 0;
}
Which are “easy” to recover from? Which are “hard” ?
As reported by MS VC++
'f2' undefined; assuming extern returning intsyntax error : missing ';' before ‘}‘syntax error : missing ';' before ‘return‘fatal error : unexpected end of file found
CH4p1.8
CSE4100
Error Recovery StrategiesError Recovery Strategies
Panic Mode – Discard tokens until a “synchro” token is found ( end, “;”, “}”, etc. ) -- Decision of designer -- Problems: skip input miss declaration – causing more errors miss errors in skipped material -- Advantages: simple suited to 1 error per statementPhase Level – Local correction on input -- “,” ”;” – Delete “,” – insert “;” -- Also decision of designer -- Not suited to all situations -- Used in conjunction with panic mode to allow less input to be skipped
CH4p1.9
CSE4100
Error Recovery Strategies – (2)Error Recovery Strategies – (2)
Error Productions: -- Augment grammar with rules -- Augment grammar used for parser construction / generation -- Add a rule for := in C assignment statements Report error but continue compile -- Self correction + diagnostic messages -- What are other rules ? (supported by yacc)Global Correction: -- Adding / deleting / replacing symbols is chancy – may do many changes ! -- Algorithms available to minimize changes costly - key issuesWhat do you think of each approach? What approach do compilers you’ve used take ?
CH4p1.10
CSE4100
Motivating GrammarsMotivating Grammars
• Regular Expressions
Basis of lexical analysis
Represent regular languages
• Context Free Grammars
Basis of parsing
Represent language constructs
Characterize context free languages
Reg. Lang. CFLs
EXAMPLE: anbn , n 1 : Is it regular ?
CH4p1.11
CSE4100
Context Free LanguagesContext Free Languages
Regular LanguagesRegular Languages
Token StructureToken Structure
Scanner GenerationScanner Generation
Context Free Context Free LanguagesLanguages
Sentence StructureSentence Structure
Parser GenerationParser Generation
Context Free Context Free LanguagesLanguages
Reg. Reg. LanguagLanguageses
CH4p1.12
CSE4100
Context Free Grammars : Context Free Grammars : Concepts & TerminologyConcepts & Terminology
Definition: A Context Free Grammar, CFG, is described by T, NT, S, PR, where:
T: Terminals / tokens of the language
NT: Non-terminals to denote sets of strings generatable by the grammar & in the language
S: Start symbol, SNT, which defines all strings of the language
PR: Production rules to indicate how T and NT are combines to generate valid strings of the language.
PR: NT (T | NT)*
Like a Regular Expression / DFA / NFA, a Context Free Grammar is a mathematical model !
CH4p1.13
CSE4100
Context Free Grammars : A First LookContext Free Grammars : A First Look
assign_stmt id := expr ;
expr term operator term
term id
term real
term integer
operator +
operator -
What do “BLUE” symbols represent?
Derivation: A sequence of grammar rule applications and substitutions that transform a starting non-term into a collection of terminals / tokens.
Simply stated: Grammars / production rules allow us to “rewrite” and “identify” correct syntax.
What do “BLACK” symbols represent?
CH4p1.14
CSE4100
How is Grammar Used ?How is Grammar Used ?
Given the rules on the previous slide, suppose
id := real + int;
is input. Is it syntactically correct? How do we know?
expr is represented as: expr term operator term
Is this accurate / complete?
expr expr operator term
expr term
How does this affect the derivations which are possible?
CH4p1.15
CSE4100
Derivation and Parse TreeDerivation and Parse Tree
Let’s derive: id := id + real – integer ;
What’s its parse tree ?
CH4p1.16
CSE4100
Derivation ExampleDerivation Example
Let’s derive: Let’s derive: id := id + real – integer ;id := id + real – integer ;assign_stmt assign_stmt → id := expr ;
→ id := expr ; expr → expr operator term
→ id := expr operator term; expr → expr operator term
→ id := expr operator term operator term; expr → term
→ id := term operator term operator term; term → id
→ id := id operator term operator term; operator → +
→ id := id + term operator term; term → real
→ id := id + real operator term; operator → -
→ id := id + real - term; term → integer
→ id := id + real - integer; assign_stmt → id := expr ;expr → expr operator termexpr → termterm → idterm → realterm → integeroperator → +operator → -
CH4p1.17
CSE4100
Example GrammarExample Grammar
expr expr op expr
expr ( expr )
expr - expr
expr id
op +
op -
op *
op /
op
Black : NT
Blue : T
expr : S
9 Production rules
To simplify / standardize notation, we offer a synopsis of terminology.
CH4p1.18
CSE4100
Example Grammar - TerminologyExample Grammar - Terminology
Terminals: a,b,c,+,-,punc,0,1,…,9, blue strings
Non Terminals: A,B,C,S, black strings
T or NT: X,Y,Z
Strings of Terminals: u,v,…,z in T*
Strings of T / NT: , in ( T NT)*
Alternatives of production rules:
A 1; A 2; …; A k; A 1 | 2 | … | 1
First NT on LHS of 1st production rule is designated as start symbol !
E E A E | ( E ) | -E | id
A + | - | * | / |
CH4p1.19
CSE4100
Grammar ConceptsGrammar Concepts
A step in a derivation is zero or one action that replaces a NT with the RHS of a production rule.
EXAMPLE: E -E (the means “derives” in one step) using the production rule: E -E
EXAMPLE: E E A E E * E E * ( E )
DEFINITION: derives in one step
derives in one step
derives in zero steps
+
*
EXAMPLES: A if A is a production rule
1 2 … n 1 n ; for all
If and then
* *
**
CH4p1.20
CSE4100
How does this relate to Languages?How does this relate to Languages?
Let G be a CFG with start symbol S. Then S W (where W has no non-terminals) represents the language generated by G, denoted L(G). So WL(G) S W.
+
+
W : is a sentence of G
When S (and may have NTs) it is called a sentential form of G.
*
EXAMPLE: id * id is a sentence
Here’s the derivation:
E E A E E * E id * E id * id
E id * idSentential forms
CH4p1.21
CSE4100
lm
Leftmost and Rightmost DerivationsLeftmost and Rightmost Derivations
Leftmost: Replace the leftmost non-terminal symbol
E E A E id A E id * E id * id
Rightmost: Replace the leftmost non-terminal symbol
E E A E E A id E * id id * idrm
lmlmlmlm
rmrmrm
Important Notes: A
If A , what’s true about ?
If A , what’s true about ?rm
Derivations: Actions to parse input can be represented pictorially in a parse tree.
CH4p1.22
CSE4100
Examples of LM / RM DerivationsExamples of LM / RM Derivations
E E A E | ( E ) | -E | id
A + | - | * | / |
A leftmost derivation of : id + id * id
A rightmost derivation of : id + id * id
CH4p1.23
CSE4100
Derivations & Parse TreeDerivations & Parse Tree
E
EE A
E
EE A
*
E
EE A
id *
E
EE A
id id*
E * E
E E A E
id * E
id * id
CH4p1.24
CSE4100
Parse Trees and DerivationsParse Trees and Derivations
Consider the expression grammar:
E E+E | E*E | (E) | -E | id
Leftmost derivations of id + id * id
E E + E E + E id + E
E
EE +
id
E
EE *
id + E id + E * E
E
EE +
id
E
EE +
CH4p1.25
CSE4100
Parse Tree & Derivations - continuedParse Tree & Derivations - continued
E
EE *
id + E * E id + id * E
E
EE +
id
id
id + id * E id + id * id E
EE *
E
EE +
id
idid
CH4p1.26
CSE4100
Alternative Parse Tree & DerivationAlternative Parse Tree & Derivation
E E * E
E + E * E
id + E * E
id + id * E
id + id * id
E
E E+
E
E E*
id
id id
WHAT’S THE ISSUE HERE ?
CH4p1.27
CSE4100
Example Pascal Grammar in YaccExample Pascal Grammar in Yacc
%term T_AND T_ARRAY T_BEGIN T_CASE T_CONST T_DIV T_DO T_RANGE T_TO T_ELSE T_END T_FILE T_FOR T_FUNCTION /* T_GOTO */ T_IDENTIFIER T_IF T_IN T_INTEGER T_LABEL T_MOD T_NOT T_REAL T_OF T_OR /* T_PACKED */ T_NIL T_PROCEDURE T_PROGRAM T_RECORD T_REPEAT T_SET T_STRING T_THEN T_DOWNTO T_TYPE T_UNTIL T_VAR T_WHILE /* T_WITH */
T_COMMA T_PERIOD T_ASSIGN T_COLON T_LPAREN T_RPAREN T_LBRACK T_RBRACK T_UPARROW T_SEMI
%binary T_LT T_EQ T_GT T_GE T_LE T_NE T_IN%left T_PLUS T_MINUS T_OR%left UNARYSIGN%left T_MULT T_RDIV T_DIV T_MOD T_AND%left T_NOT
%%
CH4p1.28
CSE4100
Example Pascal Grammar in YaccExample Pascal Grammar in Yacc
Start : Program_Header Declarations Block T_PERIOD
Program_Header : T_PROGRAM T_IDENTIFIER T_LPAREN Id_List T_RPAREN T_SEMI
Block : T_BEGIN Stt_List T_END
Declarations : Const_Def_Part Type_Def_Part Var_Decl_Part Routine_Decl_Part
Const_Def_Part : T_CONST Const_Def_List | /* epsilon */ ;
Const_Def_List : Const_Def | Const_Def_List Const_Def ;
Const_Def : T_IDENTIFIER T_EQ Constant T_SEMI
Type_Def_Part : T_TYPE Type_Def_List | /* epsilon */
Type_Def_List : Type_Def | Type_Def_List Type_Def
CH4p1.29
CSE4100
Example Pascal Grammar in YaccExample Pascal Grammar in Yacc
Type_Def : T_IDENTIFIER T_EQ Type T_SEMI
Var_Decl_Part : T_VAR Var_Decl_List | /* epsilon */
Var_Decl_List : Var_Decl_List Var_Decl | Var_Decl
Var_Decl : Id_List T_COLON Type T_SEMI
Routine_Decl_Part : Routine_Decl_Part Routine_Decl | /* epsilon */
Routine_Decl : Routine_Head T_IDENTIFIER T_SEMI | Routine_Head Declarations Block T_SEMI
Routine_Head : T_PROCEDURE T_IDENTIFIER Params T_SEMI | T_FUNCTION T_IDENTIFIER Params Function_Type T_SEMI
Params : T_LPAREN Params_List T_RPAREN | /* epsilon */
Param_Group : Id_List T_COLON Type | T_VAR Id_List T_COLON Type
CH4p1.30
CSE4100
Example Pascal Grammar in YaccExample Pascal Grammar in Yacc
Function_Type : T_COLON Type | /* epsilon */
Params_List : Param_Group | Params_List T_SEMI Param_Group
Constant : T_STRING | Number | T_PLUS Number | T_MINUS Number
Number : T_IDENTIFIER | T_INTEGER | T_REAL
Const_List : Constant | Const_List T_COMMA Constant
Type : Simple_Type | T_UPARROW T_IDENTIFIER | Structured_Type
Simple_Type : T_IDENTIFIER | T_LPAREN Id_List T_RPAREN | Constant T_RANGE Constant
Structured_Type : T_ARRAY T_LBRACK Simple_Type_List T_RBRACK T_OF Type | T_SET T_OF Simple_Type | T_RECORD Field_List T_END
Simple_Type_List : Simple_Type | Simple_Type_List T_COMMA Simple_Type
CH4p1.31
CSE4100
Example Pascal Grammar in YaccExample Pascal Grammar in Yacc
Field_List : Fixed_Part /* Variant_part */
Fixed_Part : Field | Fixed_Part T_SEMI Field
Field : /* epsilon */ | Id_List T_COLON Type
Variant_part : / * epsilon * / | T_CASE T_IDENTIFIER T_OF Variant_List | T_CASE T_IDENTIFIER T_COLON T_IDENTIFIER T_OF Variant_List
Variant_List : Variant | Variant_List T_SEMI Variant
Variant : / * epsilon * / | Const_List T_COLON T_LPAREN Field_List T_RPAREN
Stt_List : Statement | Stt_List T_SEMI Statement
Case_Stt_List : Case_Statement | Case_Stt_List T_SEMI Case_Statement
Case_Statement : Const_List T_COLON Statement | /* epsilon */
CH4p1.32
CSE4100
Example Pascal Grammar in YaccExample Pascal Grammar in Yacc
Statement : /* epsilon */ | T_IDENTIFIER | T_IDENTIFIER T_LPAREN Expr_List T_RPAREN | Variable T_ASSIGN Expression | T_BEGIN Stt_List T_END | T_CASE Expression T_OF Case_Stt_List T_END | T_WHILE Expression T_DO Statement | T_REPEAT Stt_List T_UNTIL Expression | T_FOR Variable T_ASSIGN Expression T_TO Expression T_DO
Statement | T_FOR Variable T_ASSIGN Expression T_DOWNTO Expression T_DO
Statement | T_IF Expression T_THEN Statement | T_IF Expression T_THEN Statement T_ELSE Statement
Expression : Expression relop Expression %prec T_LT | T_PLUS Expression %prec UNARYSIGN | T_MINUS Expression %prec UNARYSIGN | Expression addop Expression %prec T_PLUS | Expression divop Expression %prec T_MULT | T_NIL | T_STRING | T_INTEGER | T_REAL | Variable | T_IDENTIFIER T_LPAREN Expr_List T_RPAREN | T_LPAREN Expression T_RPAREN | negop Expression %prec T_NOT | T_LBRACK Element_List T_RBRACK | T_LBRACK T_RBRACK
CH4p1.33
CSE4100
Example Pascal Grammar in YaccExample Pascal Grammar in Yacc
Element_List : Element | Element_List T_COMMA Element
Element : Expression | Expression T_RANGE Expression
Variable : T_IDENTIFIER | Variable T_LBRACK Expr_List T_RBRACK | Variable T_PERIOD T_IDENTIFIER | Variable T_UPARROW
Expr_List : Expression | Expr_List T_COMMA Expression
relop : T_EQ | T_LT | T_GT | T_NE | T_LE | T_GE | T_IN
addop : T_PLUS | T_MINUS | T_OR
divop : T_MULT | T_RDIV | T_DIV | T_MOD | T_AND ;
negop : T_NOT
CH4p1.34
CSE4100
Resolving Grammar Problems/DifficultiesResolving Grammar Problems/Difficulties
Regular Expressions : Basis of Lexical Analysis
Reg. Expr. generate/represent regular languages
Reg. Languages smallest, most well defined class
of languages
Context Free Grammars: Basis of Parsing
CFGs represent context free languages
CFLs contain more powerful languages
Reg. Lang. CFLsanbn – CFL that’s not regular!
Try to write DFA/NFA for it !
CH4p1.35
CSE4100
Resolving Problems/Difficulties – (2)Resolving Problems/Difficulties – (2)
Since Reg. Lang. Context Free Lang., it is possible to go from reg. expr. to CFGs via NFA.
start0 3b21 ba
a
b
Recall: (a | b)*abb
CH4p1.36
CSE4100
Resolving Problems/Difficulties – (3)Resolving Problems/Difficulties – (3)
Construct CFG as follows:
1. Each State I has non-terminal Ai : A0, A1, A2, A3
2. If then Ai a Aj
3. If then Ai Aj
4. If I is an accepting state, Ai : A3
5. If I is a starting state, Ai is the start symbol : A0
i ja
i jb
: A0 aA0, A0 aA1
: A0 bA0, A1 bA2
: A2 bA3
T={a,b}, NT={A0, A1, A2, A3}, S = A0
PR ={ A0 aA0 | aA1 | bA0 ; A1 bA2 ; A2 3A3 ; A3 }
CH4p1.37
CSE4100
How Does This CFG Derive Strings ?How Does This CFG Derive Strings ?
start0 3b21 ba
a
b
vs. A0 aA0, A0 aA1
A0 bA0, A1 bA2
A2 bA3, A3
How is abaabb derived in each ?
CH4p1.38
CSE4100
Regular Expressions vs. CFGsRegular Expressions vs. CFGs
Regular expressions for lexical syntax
1. CFGs are overkill, lexical rules are quite simple and straightforward
2. REs – concise / easy to understand
3. More efficient lexical analyzer can be constructed
4. RE for lexical analysis and CFGs for parsing promotes modularity, low coupling & high cohesion. Why?
CFGs : Match tokens “(“ “)”, begin / end, if-then-else, whiles, proc/func calls, …
Intended for structural associations between tokens !
Are tokens in correct order ?
CH4p1.39
CSE4100
Resolving Grammar Difficulties : Resolving Grammar Difficulties : MotivationMotivation
1. Humans write / develop grammars
2. Different parsing approaches have different needs
Top-Down vs. Bottom-Up
• For: 1 remove “errors”
• For: 2 put / redesign grammar- ambiguity- -moves- cycles- left recursion- left factoring
Grammar Problems
LL(k)Recursive
LR(k)LR(k)LALR(k)LALR(k)
CH4p1.40
CSE4100
Resolving Problems: Ambiguous Resolving Problems: Ambiguous GrammarsGrammars
Consider the following grammar segment:stmt if expr then stmt
| if expr then stmt else stmt
| other (any other statement)
What’s problem here ?
Consider the Program:
Else must match to previous then.
Structure indicates parse subtree for expression.
if e1
then if e2
then s1
else s2
CH4p1.41
CSE4100
Resulting Parse TreeResulting Parse Tree
Easy caseEasy case Else must match to previous then.
if e1
then s1
else if e2
then s2
else s3
CH4p1.42
CSE4100
Example : What Happens with this string?Example : What Happens with this string?
If E1 then if E2 then S1 else S2
How is this parsed ?
if E1 then
if E2 then
S1
else
S2
if E1 then
if E2 then
S1
else
S2
vs.
What’s the issue here ?
CH4p1.43
CSE4100
Parse Trees for ExampleParse Trees for Example
Form 1: Form 2:
What’s the issue here ?
if e1
then if e2
then s1
else s2
CH4p1.44
CSE4100
Removing AmbiguityRemoving Ambiguity
Take Original Grammar:
stmt if expr then stmt
| if expr then stmt else stmt
| other (any other statement)
Revise to remove ambiguity:
stmt matched_stmt | unmatched_stmt
matched_stmt if expr then matched_stmt else matched_stmt | other
unmatched_stmt if expr then stmt
| if expr then matched_stmt else unmatched_stmt
How does this grammar work ?
CH4p1.45
CSE4100
Resolving Difficulties : Left RecursionResolving Difficulties : Left Recursion
A left recursive grammar has rules that support the derivation : A A, for some .+
Top-Down parsing can’t reconcile this type of grammar, since it could consistently make choice which wouldn’t allow termination.
A A A A … etc. A A |
Take left recursive grammar:
A A | To the following:
A’ A’
A’ A’ |
CH4p1.46
CSE4100
Why is Left Recursion a Problem ?Why is Left Recursion a Problem ?
Consider: E E + T | T T T * F | F F ( E ) | id
Derive : id + id + id
E E + T
How can left recursion be removed ?
E E + T | T What does this generate?
E E + T T + T
E E + T E + T + T T + T + T
How does this build strings ?
What does each string have to start with ?
CH4p1.47
CSE4100
Resolving Difficulties : Left Recursion (2)Resolving Difficulties : Left Recursion (2)
Informal Discussion:
Take all productions for A and order as:
A A1 | A2 | … | Am | 1 | 2 | … | n
Where no i begins with A.
Now apply concepts of previous slide:
A 1A’ | 2A’ | … | nA’
A’ 1A’ | 2A’ | … | m A’ | For our example: E E + T | T
T T * F | F
F ( E ) | id
E TE’ E’ + TE’ | T FT’
T’ * FT’ | F ( E ) | id
CH4p1.48
CSE4100
Resolving Difficulties : Left Recursion (3)Resolving Difficulties : Left Recursion (3)
Problem: If left recursion is two-or-more levels deep, this isn’t enough
S Aa | b
A Ac | Sd | S Aa Sda
Algorithm:Input: Grammar G with no cycles or -productions
Output: An equivalent grammar with no left recursion
1. Arrange the non-terminals in some order A1,A2,…An
2. for i := 1 to n do begin
for j := 1 to i – 1 do begin
replace each production of the form Ai Aj
by the productions Ai 1 | 2 | … | k
where Aj 1|2|…|k are all current Aj productions; end
eliminate the immediate left recursion among Ai productions end
CH4p1.49
CSE4100
Using the AlgorithmUsing the Algorithm
Apply the algorithm to: A1 A2a | b
A2 A2c | A1d |
i = 1
For A1 there is no left recursion
i = 2
for j=1 to 1 do
Take productions: A2 A1 and replace with
A2 1 | 2 | … | k |
where A1 1 | 2 | … | k are A1 productions
in our case A2 A1d becomes A2 A2ad | bdWhat’s left: A1 A2a | b
A2 A2 c | A2 ad | bd | Are we done ?
CH4p1.50
CSE4100
Using the Algorithm (2)Using the Algorithm (2)
No ! We must still remove A2 left recursion !
A1 A2a | b
A2 A2 c | A2 ad | bd |
Recall:
A A1 | A2 | … | Am | 1 | 2 | … | n
A 1A’ | 2A’ | … | nA’
A’ 1A’ | 2A’ | … | m A’ |
Apply to above case. What do you get ?
CH4p1.51
CSE4100
Removing Difficulties : Removing Difficulties : -Moves-Moves
Very Simple: A and B uAv implies add B uv to the grammar G.
Why does this work ?
E TE’ E’ + TE’ | T FT’ T’ * FT’ | F ( E ) | id
Examples:
A1 A2 a | b
A2 bd A2’ | A2’
A2’ c A2’ | bd A2’ |
CH4p1.52
CSE4100
Removing Difficulties : CyclesRemoving Difficulties : Cycles
How would cycles be removed ?
S SS | ( S ) |
Has: S SS S
S
CH4p1.53
CSE4100
Removing Difficulties : Left FactoringRemoving Difficulties : Left Factoring
Problem : Uncertain which of 2 rules to choose:
stmt if expr then stmt else stmt
| if expr then stmt
When do you know which one is valid ?
What’s the general form of stmt ?
A 1 | 2 : if expr then stmt
1: else stmt 2 :
Transform to:
A A’
A’ 1 | 2
EXAMPLE:
stmt if expr then stmt rest
rest else stmt |
CH4p1.54
CSE4100
Resolving Grammar Problems : Resolving Grammar Problems : Another LookAnother Look
• Forcing Matching Within Grammar
- if-then-else else must go with most recent if
stmt if expr then stmt | if expr then stmt else stmt | other
stmt matched_stmt | unmatched_stmt
matched_stmt if expr then matched_stmt else matched_stmt | other
unmatched_stmt if expr then stmt
| if expr then matched_stmt else unmatched_stmt
Leads to:
CH4p1.55
CSE4100
Resolving Grammar Problems : Resolving Grammar Problems : Another Look (2)Another Look (2)
• Left Factoring - Know correct choice in advance !
where_queries where are_does declarations of symbol
| where are_does “search_option” appear
Rules given in Assignment or
where_queries where rest_where
rest_where are declarations of symbol
| does “search_option” appear
CH4p1.56
CSE4100
Resolving Grammar Problems : Resolving Grammar Problems : Another Look (3)Another Look (3)
• Left Recursion - Avoid infinite loop when choosing rules
A A | A A’
A’ A’ | EXAMPLE:
E E + T | T
T T * F | F
F ( E ) | id
E TE’ E’ + TE’ | T FT’
T’ * FT’ | F ( E ) | id
If we use the original production rules on input: id + id (with leftmost):
E E + T E + T + T E + T + T + T T + T + T + … + T
If we use the transformed production rules on input: id + id (with leftmost):
E TE’ T + TE’ id + TE’ id + id E’ id + id
*
not assured
**
CH4p1.57
CSE4100
Resolving Grammar Problems : Resolving Grammar Problems : Another Look (4)Another Look (4)
• Removing -Moves
E TE’ E’ + TE’ |
E TE TE’E’ + TE’E’ + T
Is there still a problem ?
CH4p1.58
CSE4100
Resolving Grammar ProblemsResolving Grammar Problems
Note: Not all aspects of a programming language can be represented by context free grammars / languages.
Examples:
1. Declaring ID before its use
2. Valid typing within expressions (Type Rules)
3. Parameters in definition vs. in call
4. Method Overloading/hiding
These features are call context-sensitive and define yet another language class, CSL.
Reg. Lang. CFLs CSLs
CH4p1.59
CSE4100
Context-Sensitive Languages - ExamplesContext-Sensitive Languages - Examples
Examples:
L1 = { wcw | w is in (a | b)* } : Declare before use
L2 = { an bm cn dm | n 1, m 1 }
an bm : formal parameter
cn dm : actual parameter
What does it mean when L is context-sensitive ?
CH4p1.60
CSE4100
How do you show a Language is a CFL?How do you show a Language is a CFL?
L3 = { w c wR | w is in (a | b)* }
L4 = { an bm cm dn | n 1, m 1 }
L5 = { an bn cm dm | n 1, m 1 }
L6 = { an bn | n 1 }
CH4p1.61
CSE4100
SolutionsSolutions
L3 = { w c wR | w is in (a | b)* }
L4 = { an bm cm dn | n 1, m 1 }
L5 = { an bn cm dm | n 1, m 1 }
L6 = { an bn | n 1 }
S a S a | b S b | c
S a S d | a A d
A b A c | bc
S XY
X a X b | ab
Y c Y d | cd
S a S b | ab