Lexical Analysis
Cheng-Chia Chen
Outline
1. The goal and niche of lexical analysis in a compiler
2. Lexical tokens
3. Regular expressions (RE)
4. Use regular expressions in lexical specification
5. Finite automata (FA)
» DFA and NFA
» from RE to NFA
» from NFA to DFA
» from DFA to optimized DFA
6. Lexical-analyzer generators
1. The goal and niche of lexical analysis
[Figure: compiler pipeline: Source (char stream) → Lexical analysis → Tokens (token stream) → Parsing → Interm. Language → Optimization → Code Gen. → Machine Code]
Goal of lexical analysis: breaking the input into individual words or “tokens”
Lexical Analysis
What do we want to do? Example:
    if (i == j)
        z = 0;
    else
        z = 1;
The input is just a sequence of characters:
    \tif (i == j)\n\t\tz = 0;\n\telse\n\t\tz = 1;
Goal: Partition the input string into substrings
» And determine the categories (token types) to which the substrings belong
2. Lexical Tokens
» What’s a token?
» Token attributes
» Normal tokens and special tokens
» Examples of tokens and special tokens
What’s a token?
A sequence of characters that can be treated as a unit in the grammar of a PL.
Output of lexical analysis is a stream of tokens.
Tokens are partitioned into categories called token types. ex:
» In English:
– book, students, like, help, strong, … : token
– noun, verb, adjective, … : token type
» In a programming language:
– student, var34, 345, if, class, “abc”, … : token
– ID, Integer, IF, WHILE, Whitespace, … : token type
Parser relies on the token type instead of token distinctions to analyze:
» var32 and var1 are treated the same,
» var32 (ID), 32 (Integer) and if (IF) are treated differently.
Token attributes
token type:
» category of the token; used by syntax analysis.
» ex: identifier, integer, string, if, plus, …
token value:
» semantic value used in semantic analysis.
» ex: [integer, 26], [string, “26”]
token lexeme (member, text):
» textual content of a token
» ex: [while, “while”], [identifier, “var23”], [plus, “+”], …
positional information:
» start/end line/position of the textual content in the source program.
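The four attributes above could be carried in a small record type. The following is only a sketch: the class names `Token` and `Position` and all field names are assumptions made for illustration, not something the slides prescribe.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Position:
    """A point in the source text (hypothetical helper type)."""
    line: int
    column: int


@dataclass(frozen=True)
class Token:
    type: str                          # token type, used by syntax analysis
    lexeme: str                        # textual content of the token
    value: Any = None                  # semantic value, used by semantic analysis
    start: Optional[Position] = None   # positional information (optional)
    end: Optional[Position] = None


# The same digits give different values depending on the token type.
int_token = Token(type="integer", lexeme="26", value=26)
str_token = Token(type="string", lexeme='"26"', value="26")
```

Only `type` and `lexeme` are mandatory in this sketch; a real scanner may decide differently which attributes to fill in.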
Notes on Token attributes
Token types affect syntax analysis
Token values affect semantic analysis
Lexeme and positional information affect error handling
Only token type information must be supplied by the lexical analyzer.
Any program performing lexical analysis is called a scanner (lexer, lexical analyzer).
Aspects of Token types
Language view: A token type is the set of all lexemes of all its token instances.
» ID = {a, ab, … } – {if, do, …}
» Integer = {123, 456, …}
» IF = {if}, WHILE = {while}
» STRING = {“abc”, “if”, “WHILE”, …}
Pattern (regular expression): a rule defining the language of all instances of a token type.
» WHILE: w h i l e
» ID: letter (letter | digit)*
» ArithOp: + | - | * | /
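As a rough illustration, the three patterns above can be written with Python's `re` module. The dictionary name `PATTERNS`, the helper `token_types`, and the choice of ASCII letters for "letter" are assumptions of this sketch.

```python
import re

# Patterns from the slide, transcribed into Python regex syntax.
PATTERNS = {
    "WHILE": r"while",
    "ID": r"[A-Za-z][A-Za-z0-9]*",   # letter (letter | digit)*
    "ARITH_OP": r"[+\-*/]",          # + | - | * | /
}


def token_types(lexeme):
    """Return every token type whose pattern matches the whole lexeme."""
    return [name for name, pattern in PATTERNS.items()
            if re.fullmatch(pattern, lexeme)]


print(token_types("var23"))   # ['ID']
print(token_types("while"))   # ['WHILE', 'ID']
print(token_types("23var"))   # []
```

Note that the raw ID pattern also matches `while`; in the language view the keywords are subtracted from ID, so a lexer needs an extra rule to decide between the two.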
Lexical Analyzer: Implementation
An implementation must do two things:
1. Recognize substrings corresponding to lexemes of tokens
2. Determine token attributes
» type is necessary,
» value depends on the type/application,
» lexeme/positional information depends on applications (e.g., debug or not).
Example input lines:
\tif (i == j)\n\t\tz = 0;\n\telse\n\t\tz = 1;
Token-lexeme pairs returned by the lexer:
» [Whitespace, “\t”]
» [if, - ]
» [OpenPar, “(”]
» [Identifier, “i”]
» [Relation, “==”]
» [Identifier, “j”]
» …
Normal Tokens and special Tokens
Kinds of tokens
» normal tokens: needed for later syntax analysis and must be passed to the parser.
» special tokens
– skipped tokens (or nontokens):
– do not contribute to parsing,
– discarded by the scanner.
Examples: Whitespace, Comments
» why do we need them?
Question: What happens if we remove all whitespace and all comments prior to scanning?
Lexical Analysis in FORTRAN
FORTRAN rule: Whitespace is insignificant
E.g., VAR1 is the same as VA R1
Footnote: FORTRAN whitespace rule motivated by inaccuracy of punch card operators
A terrible design! Example
Consider
» DO 5 I = 1,25
» DO 5 I = 1.25
The first is DO 5 I = 1 , 25
The second is DO5I = 1.25
Reading left-to-right, cannot tell if DO5I is a variable or DO stmt. until after “,” is reached
Lexical Analysis in FORTRAN. Lookahead.
Two important points:
1. The goal is to partition the string. This is implemented by reading left-to-right, recognizing one token at a time
2. “Lookahead” may be required to decide where one token ends and the next token begins
» Even our simple example has lookahead issues:
– i vs. if
– = vs. ==
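The two lookahead cases can be sketched in a few lines. The function names, the keyword set, and the token type name `Assign` are assumptions for illustration; `Relation` and `Identifier` follow the earlier example.

```python
KEYWORDS = {"if", "else"}  # assumed keyword set for this sketch


def scan_word(text, pos):
    """Scan the longest run of letters/digits starting at pos.

    The scanner cannot stop after 'i': it has to look at the following
    characters to tell the identifier 'i' from the keyword 'if'.
    The caller is assumed to have checked that text[pos] is a letter.
    """
    end = pos
    while end < len(text) and text[end].isalnum():
        end += 1
    lexeme = text[pos:end]
    kind = "Keyword" if lexeme in KEYWORDS else "Identifier"
    return kind, lexeme, end


def scan_equals(text, pos):
    """Scan '=' or '==' at pos using one character of lookahead."""
    if pos + 1 < len(text) and text[pos + 1] == "=":
        return "Relation", "==", pos + 2
    return "Assign", "=", pos + 1
```

Each function returns the token type, the lexeme, and the position where the next token starts.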
Some token types of a typical PL
Type Examples
ID foo n14 last
NUM 73 0 00 515 082
REAL 66.1 .5 10. 1e67 1.5e-10
IF if
COMMA ,
NOTEQ !=
LPAREN (
RPAREN )
Some Special Tokens
1. comment: /* … */ and // …
2. preprocessor directive: #include <stdio.h>
3. preprocessor directive: #define NUMS 5,6
4. macro: NUMS
5. blanks, tabs, newlines: \t \n
Items 1 and 5 are skipped; 2 and 3 need preprocessing; 4 needs to be expanded.
3. Regular expressions and Regular Languages
The geography of lexical tokens
[Figure: token types drawn as regions inside the set of all strings]
» ID: var1, last5, …
» REAL: 12.35, 2.4e-10, …
» NUM: 23, 56, 0, 000
» IF: if
» LPAREN: (
» RPAREN: )
» special tokens: \t \n /* … */
Issues
Definition problem:
» how to define (formally specify) the set of strings (tokens) belonging to a token type?
» => regular expressions
Recognition problem:
» how to determine which set (token type) an input string belongs to?
» => DFA!
Languages
Def. Let Σ be a set of symbols (or characters). A language over Σ is a set of strings of characters drawn from Σ (Σ is called the alphabet).
Examples of Languages
Alphabet = English characters, Language = English words
Not every string of English characters is an English word
» likes, school, …
» beee, yykk, …
Alphabet = ASCII, Language = C programs
Note: ASCII character set is different from English character set
Regular Expressions
A language (metalanguage) for representing (or defining) languages (sets of words)
Definition: Let Σ be an alphabet. The set of regular expressions (RegExpr) over Σ is defined recursively as follows:
» (Atomic RegExpr):
1. any symbol c ∈ Σ is a RegExpr.
2. ε (empty string) is a RegExpr.
» (Compound RegExpr): if A and B are RegExpr, then so are
3. (A | B) (alternation)
4. (A B) (concatenation)
5. A* (repetition)
Semantics (Meaning) of regular expressions
For each regular expression A, we use L(A) to express the language defined by A.
I.e. L is the function:
L: RegExpr(Σ) → the set of languages over Σ
with
L(A) = the language denoted by RegExpr A
The meaning of RegExpr can be made clear by explicitly defining L.
Atomic Regular Expressions
1. Single symbol: c
L(c) = { c } (for any c ∈ Σ)
2. Epsilon (empty string): ε
L(ε) = { ε }
Compound Regular Expressions
3. Alternation (or union or choice): A | B
L((A | B)) = { s | s ∈ L(A) or s ∈ L(B) }
4. Concatenation: AB (where A and B are reg. exp.)
L((A B)) = L(A) · L(B) =def { αβ | α ∈ L(A) and β ∈ L(B) }
Note:
» Parentheses enclosing (A|B) and (AB) can be omitted if there is no risk of confusion.
» M · N (set concatenation) and α · β (string concatenation) will be abbreviated to MN and αβ, respectively.
» AA and L(A) L(A) are abbreviated as A² and L(A)², respectively.
Examples
» if | then | else : { if, then, else }
» 0 | 1 | … | 9 : { 0, 1, …, 9 }
» (0 | 1) (0 | 1) : { 00, 01, 10, 11 }
More Compound Regular Expressions
5. Repetition (or iteration): A*
L(A*) = { ε } ∪ L(A) ∪ L(A)L(A) ∪ L(A)³ ∪ …
Examples:
» 0* : {ε, 0, 00, 000, …}
» 10* : strings starting with 1 and followed by 0’s.
» (0|1)* 0 : binary even numbers.
» (a|b)*aa(a|b)* : strings of a’s and b’s containing consecutive a’s.
» b*(abb*)*(a|ε) : strings of a’s and b’s with no consecutive a’s.
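The five defining clauses of L can be turned into a small evaluator. Since L(A*) is usually infinite, this sketch only enumerates the strings of length at most n; the class names `Sym`, `Eps`, `Alt`, `Cat`, `Star` and the function `lang` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sym:
    c: str          # a single symbol of the alphabet


@dataclass(frozen=True)
class Eps:
    pass            # the empty string ε


@dataclass(frozen=True)
class Alt:
    a: object       # (A | B)
    b: object


@dataclass(frozen=True)
class Cat:
    a: object       # (A B)
    b: object


@dataclass(frozen=True)
class Star:
    a: object       # A*


def lang(r, n):
    """Return the strings of L(r) that have length at most n."""
    if isinstance(r, Eps):
        return {""}
    if isinstance(r, Sym):
        return {r.c} if n >= 1 else set()
    if isinstance(r, Alt):
        return lang(r.a, n) | lang(r.b, n)
    if isinstance(r, Cat):
        return {x + y for x in lang(r.a, n) for y in lang(r.b, n)
                if len(x + y) <= n}
    if isinstance(r, Star):
        base = lang(r.a, n)
        result = {""}
        frontier = {""}
        while frontier:   # keep appending until no new string appears
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= n} - result
            result |= frontier
        return result
    raise TypeError(f"not a regular expression: {r!r}")


ZERO, ONE, A, B = Sym("0"), Sym("1"), Sym("a"), Sym("b")
EVEN = Cat(Star(Alt(ZERO, ONE)), ZERO)                 # (0|1)*0
NO_AA = Cat(Cat(Star(B), Star(Cat(Cat(A, B), Star(B)))),
            Alt(A, Eps()))                             # b*(abb*)*(a|ε)

print(sorted(lang(EVEN, 2)))   # ['0', '00', '10']
```

Bounding the length keeps every set finite, so the loop for `Star` is guaranteed to stop.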
Example: Keyword
» Keyword: else or if or begin …
else | if | begin | …
Example: Integers
Integer: a non-empty string of digits
( 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 ) ( 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 )*
problem: reuse of a complicated expression
improvement: define intermediate reg. expr.
digit = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
number = digit digit*
Abbreviation: A+ = A A*
Regular Definitions
Names for regular expressions
» d1 =r1
» d2 =r2
» ...
» dn =rn
where each ri is a regular expression over the alphabet Σ ∪ {d1, d2, ..., di-1}
note: Recursion is not allowed.
Example
» Identifier: strings of letters or digits, starting with a letter
digit = 0 | 1 | ... | 9
letter = A | … | Z | a | … | z
identifier = letter (letter | digit) *
» Is (letter* | digit*) the same ?
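One way to explore the question is to build both expressions from the regular definitions by plain string substitution and try a few inputs. This is only a sketch using Python's `re` module; the constant names and the helper `matches` are assumptions, and "letter" is taken to mean ASCII letters.

```python
import re

# Regular definitions, expanded by substitution (no recursion needed).
DIGIT = "[0-9]"
LETTER = "[A-Za-z]"
IDENTIFIER = f"{LETTER}(?:{LETTER}|{DIGIT})*"   # letter (letter | digit)*
CANDIDATE = f"(?:{LETTER}*|{DIGIT}*)"           # (letter* | digit*)


def matches(pattern, s):
    """True if the whole string s belongs to the language of pattern."""
    return re.fullmatch(pattern, s) is not None


for s in ["abc", "var1", "123", ""]:
    print(repr(s), matches(IDENTIFIER, s), matches(CANDIDATE, s))
```

The two expressions agree on `abc` but differ on `var1`, `123` and the empty string, so they do not define the same language.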
Example: Whitespace
Whitespace: a non-empty sequence of blanks, newlines, CRNL and tabs
WS = (\ | \t | \n | \r\n )+
Example: Email Addresses
Consider [email protected]
Σ = letters ∪ { ., @ }
name = letter+
address = name ‘@’ name (‘.’ name)*
Notational Shorthands
One or more instances
» r+ = r r*
» r* = (r+ | ε)
Zero or one instance
» r? = (r | ε)
Character classes
» [abc] = a | b | c
» [a-z] = a | b | ... | z
» [ac-f] = a | c | d | e | f
» [^ac-f] = Σ – [ac-f]
Summary
Regular expressions describe many useful languages
Regular languages are a language specification
» We still need an implementation
Problem: Given a string s and a rexp R, is s ∈ L(R)?
4. Use Regular expressions in lexical specification
Goal
Specifying lexical structure using regular expressions
Regular Expressions in Lexical Specification
Last lecture: the specification of all lexemes in a token type using regular expression.
But we want a specification of all lexemes of all token types in a programming language.
» Which may enable us to partition the input into lexemes
We will adapt regular expressions to this goal
Regular Expressions => Lexical Spec. (1)
1. Select a set of token types
• Number, Keyword, Identifier, ...
2. Write a rexp for the lexemes of each token type
• Number = digit+
• Keyword = if | else | …
• Identifier = letter (letter | digit)*
• LParen = ‘(’
• …
Regular Expressions => Lexical Spec. (2)
3. Construct R, matching all lexemes for all tokens
R = Keyword | Identifier | Number | …
= R1 | R2 | R3 | …
Facts: If s ∈ L(R) then s is a lexeme
» Furthermore s ∈ L(Ri) for some “i”
» This “i” determines the token type that is reported
Regular Expressions => Lexical Spec. (3)
4. Let the input be x1…xn (x1 ... xn are symbols in the language alphabet)
• For 1 ≤ i ≤ n check: x1…xi ∈ L(R) ?
5. It must be that x1…xi ∈ L(Rj) for some j
6. Remove t = x1…xi from the input; if t is a normal token, then pass it to the parser // else it is whitespace or comments, just skip it!
7. Go to (4)
Ambiguities (1)
There are ambiguities in the algorithm
How much input is used? What if
– x1…xi ∈ L(R) and also
– x1…xk ∈ L(R) for some i != k.
Rule: Pick the longest possible substring
» The longest match principle !!
Ambiguities (2)
Which token is used? What if
– x1…xi ∈ L(Rj) and also
– x1…xi ∈ L(Rk)
Rule: use the rule listed first (Rj if j < k)
» Earlier rule first!
Example:
» R1 = Keyword and R2 = Identifier
» “if” matches both.
» Treats “if” as a keyword, not an identifier
Error Handling
What if
No rule matches a prefix of input?
Problem: Can’t just get stuck …
Solution:
» Write a rule matching all “bad” strings
» Put it last
Lexer tools allow the writing of:
R = R1 | ... | Rn | Error
» Token Error matches if nothing else matches
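The longest-match rule, the earlier-rule-first tie-break, and a final Error rule can be combined in a short scanner loop. This is a minimal sketch built on Python's `re` module, not the table-driven method the slides develop next; the rule list, the token type names other than those in the slides, and the single-character Error pattern are assumptions.

```python
import re

# Rules in priority order. Within one rule, Python tries alternatives
# left to right (not longest first), so the alternatives here are chosen
# so that none is a prefix of another.
RULES = [
    ("Whitespace", r"(?: |\t|\n|\r\n)+"),
    ("Keyword", r"if|else"),
    ("Identifier", r"[A-Za-z][A-Za-z0-9]*"),
    ("Number", r"[0-9]+"),
    ("Relation", r"=="),
    ("Assign", r"="),
    ("LParen", r"\("),
    ("RParen", r"\)"),
    ("Semi", r";"),
    ("Error", r"."),          # listed last: matches any one bad character
]
COMPILED = [(name, re.compile(p, re.DOTALL)) for name, p in RULES]
SKIPPED = {"Whitespace"}      # special tokens that are not passed on


def tokenize(text):
    """Split text into (token type, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_end = None, pos
        for name, regex in COMPILED:
            m = regex.match(text, pos)
            # Strict '>' keeps the earlier rule when two rules tie.
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        # The Error rule guarantees progress: best_end > pos here.
        if best_name not in SKIPPED:
            tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens


print(tokenize("if (i == j)"))
```

With these rules, `iffy` is one Identifier (longest match), `if` is a Keyword (earlier rule), and a stray `$` becomes an Error token instead of stopping the scan.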
Summary
Regular expressions provide a concise notation for string patterns
Use in lexical analysis requires small extensions
» To resolve ambiguities
» To handle errors
Efficient algorithms exist (next)
» Require only a single pass over the input
» Few operations per character (table lookup)
5. Finite Automata
Regular expressions = specification
Finite automata = implementation
A finite automaton consists of
» An input alphabet Σ
» A finite set of states S
» A start state n
» A set of accepting states F ⊆ S
» A set of transitions state --input--> state
» If the automaton is for recognizing a token type, then this type should be associated with the machine.
Finite Automata
Transition
s1 --a--> s2
Is read
In state s1 on input “a” go to state s2
If end of input (or no transition possible)
» If in accepting state => accept
» Otherwise => reject
Finite Automata State Transition Graphs
[Figure: legend of the graph symbols for a state, the start state, an accepting state (labeled with its token type T), and a transition labeled a]
A Simple Example
A finite automaton that accepts only “1”
[Figure: a start state with a transition labeled 1 to an accepting state]
Another Simple Example
A finite automaton accepting any number of 1’s followed by a single 0
Alphabet: {0,1}
[Figure: a start state with a self-loop labeled 1 and a transition labeled 0 to an accepting state]
accepted input: 1*0
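A table-driven simulation of this automaton might look as follows. The state names "A" and "B", the dictionary layout, and the function `accepts` are assumptions of the sketch.

```python
# DFA for 1*0 over the alphabet {0, 1}.
DFA = {
    "start": "A",
    "accepting": {"B"},
    "delta": {
        ("A", "1"): "A",   # any number of 1's
        ("A", "0"): "B",   # followed by a single 0
    },
}


def accepts(dfa, s):
    """Run the DFA on s: one table lookup per input character."""
    state = dfa["start"]
    for ch in s:
        state = dfa["delta"].get((state, ch))
        if state is None:          # no transition possible => reject
            return False
    return state in dfa["accepting"]


print(accepts(DFA, "110"))   # True
print(accepts(DFA, "101"))   # False
```

Missing table entries play the role of "no transition possible", so the input is rejected as soon as the automaton gets stuck.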
And Another Example
Alphabet {0,1}
What language does this recognize?
[Figure: state transition graph with transitions labeled 0 and 1]
accepted inputs: to be answered later!
And Another Example
Alphabet still { 0, 1 }
The operation of the automaton is not completely defined by the input
» On input “11” the automaton could be in either state
[Figure: state transition graph with two transitions labeled 1]
Epsilon Moves
Another kind of transition: ε-moves
• Machine can move from state A to state B without reading input
A --ε--> B
Deterministic and Nondeterministic Automata
Deterministic Finite Automata (DFA)
» One transition per input per state
» No ε-moves
Nondeterministic Finite Automata (NFA)
» Can have multiple transitions for one input in a given state
» Can have ε-moves
Finite automata can have only a finite number of states.
Execution of Finite Automata
A DFA can take only one path through the state graph
» Completely determined by input
NFAs can choose
» Whether to make ε-moves
» Which of multiple transitions for a single input to take
Acceptance of NFAs
An NFA can get into multiple states
[Figure: NFA state graph with transitions labeled 0 and 1]
• Input: 1 0 1
• Rule: NFA accepts if it can get in a final state
Acceptance by a Finite Automaton
An FA (DFA or NFA) accepts an input string s iff there is some path in the transition diagram from the start state to some final state such that the edge labels along this path spell out s
NFA vs. DFA (1)
NFAs and DFAs recognize the same set of languages (regular languages)
DFAs are easier to implement
» There are no choices to consider
NFA vs. DFA (2)
For a given language the NFA can be simpler than the DFA
[Figure: an NFA and an equivalent DFA over the alphabet {0, 1}]
• DFA can be exponentially larger than NFA
Operations on NFA states
ε-closure(s): set of NFA states reachable from NFA state s on ε-transitions alone
ε-closure(S): set of NFA states reachable from some NFA state s in S on ε-transitions alone
move(S, c): set of NFA states to which there is a transition on input symbol c from some NFA state s in S
notes:
» ε-closure(S) = ∪ s∈S ε-closure(s);
» ε-closure(s) = ε-closure({s});
» ε-closure(S) = ?
Computing ε-closure
Input. An NFA and a set of NFA states S.
Output. E = ε-closure(S).
begin
  push all states in S onto stack; T := S;
  while stack is not empty do begin
    pop t, the top element, off of stack;
    for each state u with an edge from t to u labeled ε do
      if u is not in T do begin
        add u to T; push u onto stack
      end
  end;
  return T
end.
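The stack-based procedure above translates almost line by line into Python. In this sketch the NFA is assumed to be a dictionary from (state, symbol) to a set of states, with `None` standing for ε; that representation, the function names, and the small example NFA are illustrative choices, not part of the slides.

```python
def epsilon_closure(nfa, states):
    """States reachable from `states` using ε-transitions alone."""
    closure = set(states)
    stack = list(states)
    while stack:
        t = stack.pop()
        for u in nfa.get((t, None), ()):   # edges from t labeled ε
            if u not in closure:
                closure.add(u)
                stack.append(u)
    return closure


def move(nfa, states, c):
    """States reachable from some state in `states` on input symbol c."""
    return {u for s in states for u in nfa.get((s, c), ())}


# A small example NFA (hypothetical): None marks an ε-transition.
NFA = {
    (0, None): {1, 2},
    (1, None): {3},
    (2, "a"): {4},
    (3, "a"): {4},
    (4, None): {0},
}

print(sorted(epsilon_closure(NFA, {0})))   # [0, 1, 2, 3]
print(sorted(move(NFA, {0, 1, 2, 3}, "a")))   # [4]
```

Each state is pushed at most once, so the closure is computed in time proportional to the number of ε-edges visited.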
Simulating an NFA (for recognizing a token)
Input. An input string ended with eof and an NFA with start state s0 and final states F.
Output. The answer “yes” if the NFA accepts the input, “no” otherwise.
begin
  S := ε-closure({s0});
  c := next_symbol();
  while c != eof do begin
    S := ε-closure(move(S, c));
    c := next_symbol();
  end;
  if S ∩ F != ∅ then return “yes”
  else return “no”
end.
![Page 64: 1 Lexical Analysis Cheng-Chia Chen. 2 Outline 1. The goal and niche of lexical analysis in a compiler 2. Lexical tokens 3. Regular expressions (RE) 4.](https://reader036.fdocuments.net/reader036/viewer/2022081503/56649d575503460f94a35784/html5/thumbnails/64.jpg)
64
Simulating an NFA (for recognizing a sequence of tokens)
Input. An input string ended with eof and an NFA with start state s0 and a set F of final states, each marked by a token type.
Output. A token sequence (possibly ended with an error token).
![Page 65: 1 Lexical Analysis Cheng-Chia Chen. 2 Outline 1. The goal and niche of lexical analysis in a compiler 2. Lexical tokens 3. Regular expressions (RE) 4.](https://reader036.fdocuments.net/reader036/viewer/2022081503/56649d575503460f94a35784/html5/thumbnails/65.jpg)
65
L0: length = 0; buf = new ArrayList(); type = -1;
    S := ε-closure({s0}); c := next_symbol();
    while (c != eof) {
      S := ε-closure(move(S, c));
      F1 = S ∩ F;
      if (F1 != ∅) {   // c leads to final states: remember the longest match so far
        length = buf.size() + 1; type = minTypeOf(F1);
      }
      if (S == ∅) {    // cannot make a c-transition
        if (length == 0) { output(error-token); exit() }
        else {
          output token(type, buf[0:length-1]);
          push back buf[length:-] and c into the input;
          goto L0;
        }
      } else { buf.add(c); c := next_symbol(); }
    }
    if (length == buf.size()) {
      if (length > 0) output token(type, buf);
    } else {
      if (length > 0) output token(type, buf[0:length-1]);
      output error-token;
    }
![Page 66: 1 Lexical Analysis Cheng-Chia Chen. 2 Outline 1. The goal and niche of lexical analysis in a compiler 2. Lexical tokens 3. Regular expressions (RE) 4.](https://reader036.fdocuments.net/reader036/viewer/2022081503/56649d575503460f94a35784/html5/thumbnails/66.jpg)
66
Regular Expressions to Finite Automata
High-level sketch
Lexical Specification → Regular expressions → NFA → DFA → Optimized DFA → Table-driven Implementation of DFA
![Page 67: 1 Lexical Analysis Cheng-Chia Chen. 2 Outline 1. The goal and niche of lexical analysis in a compiler 2. Lexical tokens 3. Regular expressions (RE) 4.](https://reader036.fdocuments.net/reader036/viewer/2022081503/56649d575503460f94a35784/html5/thumbnails/67.jpg)
67
Regular Expressions to NFA (1)
For each kind of rexp, define an NFA
» Notation: NFA for rexp A
[Figure: NFA for A]
• For ε
[Figure: NFA for ε]
• For input a
[Figure: NFA for a]
![Page 68: 1 Lexical Analysis Cheng-Chia Chen. 2 Outline 1. The goal and niche of lexical analysis in a compiler 2. Lexical tokens 3. Regular expressions (RE) 4.](https://reader036.fdocuments.net/reader036/viewer/2022081503/56649d575503460f94a35784/html5/thumbnails/68.jpg)
68
Regular Expressions to NFA (2)
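• For AB
[Figure: the NFA for A followed by the NFA for B]
• For A | B
[Figure: the NFAs for A and B combined in parallel; figure labels: x, y, min(x,y)]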
![Page 69: 1 Lexical Analysis Cheng-Chia Chen. 2 Outline 1. The goal and niche of lexical analysis in a compiler 2. Lexical tokens 3. Regular expressions (RE) 4.](https://reader036.fdocuments.net/reader036/viewer/2022081503/56649d575503460f94a35784/html5/thumbnails/69.jpg)
69
Regular Expressions to NFA (3)
• For A*
[Figure: NFA for A*]
![Page 70: 1 Lexical Analysis Cheng-Chia Chen. 2 Outline 1. The goal and niche of lexical analysis in a compiler 2. Lexical tokens 3. Regular expressions (RE) 4.](https://reader036.fdocuments.net/reader036/viewer/2022081503/56649d575503460f94a35784/html5/thumbnails/70.jpg)
70
Example of RegExp -> NFA conversion
Consider the regular expression (1|0)*1. The NFA is:
[Figure: NFA with states A through J; C goes to E on 1, D goes to F on 0, I goes to J on 1]
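The constructions for symbols, AB, A | B and A* can be sketched as small Python combinators. This is a Thompson-style sketch under stated assumptions: the exact ε-edges drawn in the original figures are not recoverable from the transcript, and the names `Frag`, `sym`, `concat`, `alt`, `star` and `accepts` are hypothetical. Each fragment has one start state and one final state.

```python
from itertools import count

EPS = None          # assumed marker for an epsilon label
_ids = count()      # source of fresh state numbers

class Frag:
    """An NFA fragment: one start state, one final state, and its edges."""
    def __init__(self, start, final, edges):
        self.start, self.final, self.edges = start, final, edges

def _edge(edges, src, label, dst):
    edges.setdefault((src, label), set()).add(dst)

def _merge(*frags):
    edges = {}
    for fr in frags:
        for key, dsts in fr.edges.items():
            edges.setdefault(key, set()).update(dsts)
    return edges

def sym(a):                      # NFA for a single input symbol a
    s, f = next(_ids), next(_ids)
    edges = {}
    _edge(edges, s, a, f)
    return Frag(s, f, edges)

def concat(a, b):                # NFA for AB
    edges = _merge(a, b)
    _edge(edges, a.final, EPS, b.start)
    return Frag(a.start, b.final, edges)

def alt(a, b):                   # NFA for A | B
    s, f = next(_ids), next(_ids)
    edges = _merge(a, b)
    for fr in (a, b):
        _edge(edges, s, EPS, fr.start)
        _edge(edges, fr.final, EPS, f)
    return Frag(s, f, edges)

def star(a):                     # NFA for A*
    s, f = next(_ids), next(_ids)
    edges = _merge(a)
    _edge(edges, s, EPS, a.start)   # enter A
    _edge(edges, s, EPS, f)         # or skip it (zero repetitions)
    _edge(edges, a.final, EPS, s)   # loop back for another repetition
    return Frag(s, f, edges)

def _closure(states, edges):
    seen, stack = set(states), list(states)
    while stack:
        t = stack.pop()
        for u in edges.get((t, EPS), ()):
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return seen

def accepts(frag, text):
    cur = _closure({frag.start}, frag.edges)
    for c in text:
        nxt = set()
        for s in cur:
            nxt |= frag.edges.get((s, c), set())
        cur = _closure(nxt, frag.edges)
    return frag.final in cur

nfa = concat(star(alt(sym("1"), sym("0"))), sym("1"))   # (1|0)*1
print(accepts(nfa, "0101"), accepts(nfa, "10"))          # True False
```

Built this way, the NFA for (1|0)*1 has ten states, the same count as the states A through J in the example.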
![Page 71: 1 Lexical Analysis Cheng-Chia Chen. 2 Outline 1. The goal and niche of lexical analysis in a compiler 2. Lexical tokens 3. Regular expressions (RE) 4.](https://reader036.fdocuments.net/reader036/viewer/2022081503/56649d575503460f94a35784/html5/thumbnails/71.jpg)
71
NFA to DFA
![Page 72: 1 Lexical Analysis Cheng-Chia Chen. 2 Outline 1. The goal and niche of lexical analysis in a compiler 2. Lexical tokens 3. Regular expressions (RE) 4.](https://reader036.fdocuments.net/reader036/viewer/2022081503/56649d575503460f94a35784/html5/thumbnails/72.jpg)
72
Regular Expressions to Finite Automata
High-level sketch
Lexical Specification → Regular expressions → NFA → DFA → Optimized DFA → Table-driven Implementation of DFA
![Page 73: 1 Lexical Analysis Cheng-Chia Chen. 2 Outline 1. The goal and niche of lexical analysis in a compiler 2. Lexical tokens 3. Regular expressions (RE) 4.](https://reader036.fdocuments.net/reader036/viewer/2022081503/56649d575503460f94a35784/html5/thumbnails/73.jpg)
73
RegExp -> NFA: an Example
Consider the regular expression (1+0)*1. The NFA is:
[Figure: NFA with states A through J; C goes to E on 1, D goes to F on 0, I goes to J on 1]
![Page 74: 1 Lexical Analysis Cheng-Chia Chen. 2 Outline 1. The goal and niche of lexical analysis in a compiler 2. Lexical tokens 3. Regular expressions (RE) 4.](https://reader036.fdocuments.net/reader036/viewer/2022081503/56649d575503460f94a35784/html5/thumbnails/74.jpg)
74
NFA to DFA. The Trick
Simulate the NFA
Each state of DFA = a non-empty subset of states of the NFA
Start state = the set of NFA states reachable through ε-moves from the NFA start state
Add a transition S →a S’ to the DFA iff
» S’ is the set of NFA states reachable from any state in S after seeing the input a
  – considering ε-moves as well
![Page 75: 1 Lexical Analysis Cheng-Chia Chen. 2 Outline 1. The goal and niche of lexical analysis in a compiler 2. Lexical tokens 3. Regular expressions (RE) 4.](https://reader036.fdocuments.net/reader036/viewer/2022081503/56649d575503460f94a35784/html5/thumbnails/75.jpg)
75
NFA -> DFA Example
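[Figure: the NFA with states A through J (edges labeled 1, 0 and 1), and the resulting DFA with states ABCDHI, FGABCDHI and EJGABCDHI and transitions labeled 0 and 1]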
![Page 76: 1 Lexical Analysis Cheng-Chia Chen. 2 Outline 1. The goal and niche of lexical analysis in a compiler 2. Lexical tokens 3. Regular expressions (RE) 4.](https://reader036.fdocuments.net/reader036/viewer/2022081503/56649d575503460f94a35784/html5/thumbnails/76.jpg)
76
NFA to DFA. Remark
An NFA may be in many states at any time
How many different states ?
If there are N states, the NFA must be in some subset of those N states
How many non-empty subsets are there?
» 2^N - 1 = finitely many
![Page 77: 1 Lexical Analysis Cheng-Chia Chen. 2 Outline 1. The goal and niche of lexical analysis in a compiler 2. Lexical tokens 3. Regular expressions (RE) 4.](https://reader036.fdocuments.net/reader036/viewer/2022081503/56649d575503460f94a35784/html5/thumbnails/77.jpg)
77
From an NFA to a DFA
Subset construction algorithm.
Input. An NFA N.
Output. A DFA D with states S and transition table mv.
begin
  add ε-closure(s0) as an unmarked state to S;
  while there is an unmarked state T in S do begin
    mark T;
    let TokenType(T) = min{ type(s) | s ∈ T ∩ F };
    for each input symbol a do begin
      U := ε-closure(move(T, a));
      if U is not in S then
        add U as an unmarked state to S;
      mv[T, a] := U
    end
  end
end.
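The subset construction can be sketched in Python as follows. The NFA encoding and the edge list for (1|0)*1 are assumptions made for illustration, and two simplifications are made: an empty target set is treated as a missing transition (the error state) rather than stored, and accepting states are recorded as a plain set instead of computing token types.

```python
EPS = None  # assumed marker for an epsilon label

def closure(states, delta):
    seen, stack = set(states), list(states)
    while stack:
        t = stack.pop()
        for u in delta.get((t, EPS), ()):
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return seen

def move(states, c, delta):
    result = set()
    for s in states:
        result |= delta.get((s, c), set())
    return result

def subset_construction(start, finals, delta, alphabet):
    """Return (start state, states, accepting states, mv); DFA states are frozensets."""
    d0 = frozenset(closure({start}, delta))
    states = {d0}
    unmarked = [d0]
    mv = {}
    while unmarked:
        T = unmarked.pop()                 # taking T off the worklist marks it
        for a in alphabet:
            U = frozenset(closure(move(T, a, delta), delta))
            if not U:
                continue                   # no transition on a: error state
            if U not in states:
                states.add(U)
                unmarked.append(U)
            mv[T, a] = U
    accepting = {T for T in states if T & finals}
    return d0, states, accepting, mv

# Assumed edges of an NFA for (1|0)*1 with states A..J and final state J.
DELTA = {
    ("A", EPS): {"B", "H"}, ("B", EPS): {"C", "D"},
    ("C", "1"): {"E"}, ("D", "0"): {"F"},
    ("E", EPS): {"G"}, ("F", EPS): {"G"},
    ("G", EPS): {"A", "H"}, ("H", EPS): {"I"},
    ("I", "1"): {"J"},
}

d0, states, accepting, mv = subset_construction("A", {"J"}, DELTA, "01")
print(len(states))                 # 3
print("".join(sorted(d0)))         # ABCDHI
```

Using frozensets as DFA states makes them hashable, so the check "U is not in S" is a set lookup.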
![Page 78: 1 Lexical Analysis Cheng-Chia Chen. 2 Outline 1. The goal and niche of lexical analysis in a compiler 2. Lexical tokens 3. Regular expressions (RE) 4.](https://reader036.fdocuments.net/reader036/viewer/2022081503/56649d575503460f94a35784/html5/thumbnails/78.jpg)
78
Implementation
A DFA can be implemented by a 2D table mv
» One dimension is “states”
» Other dimension is “input symbols”
» For every transition Si →a Sk define mv[i,a] = k
DFA “execution”
» If in state Si and input a, read mv[i,a] (= k) and move to state Sk
» Very efficient
![Page 79: 1 Lexical Analysis Cheng-Chia Chen. 2 Outline 1. The goal and niche of lexical analysis in a compiler 2. Lexical tokens 3. Regular expressions (RE) 4.](https://reader036.fdocuments.net/reader036/viewer/2022081503/56649d575503460f94a35784/html5/thumbnails/79.jpg)
79
Table Implementation of a DFA
[Figure: DFA with states S, T and U and transitions labeled 0 and 1]

       0   1
  S    T   U
  T    T   U
  U    T   U
![Page 80: 1 Lexical Analysis Cheng-Chia Chen. 2 Outline 1. The goal and niche of lexical analysis in a compiler 2. Lexical tokens 3. Regular expressions (RE) 4.](https://reader036.fdocuments.net/reader036/viewer/2022081503/56649d575503460f94a35784/html5/thumbnails/80.jpg)
80
Simulation of a DFA
Input. An input string ended with eof and a DFA with start state s0 and final states F.
Output. The answer “yes” if the DFA accepts the input, “no” otherwise.
begin
  s := s0;
  c := next_symbol();
  while c <> eof do begin
    s := mv(s, c);
    c := next_symbol()
  end;
  if s is in F then return “yes”
  else return “no”
end.
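As an illustration, the simulation loop can be run on the three-state table from the "Table Implementation of a DFA" slide. This is a sketch under assumptions: the transcript does not say which states are final, so taking U as the only final state (which makes the DFA accept (1|0)*1) is a guess, and the nested-dictionary layout of `MV` is just one possible encoding.

```python
# Transition table: MV[state][symbol] -> next state.
MV = {
    "S": {"0": "T", "1": "U"},
    "T": {"0": "T", "1": "U"},
    "U": {"0": "T", "1": "U"},
}
START = "S"
FINALS = {"U"}   # assumption: U is the only final state

def dfa_accepts(text):
    """One table lookup per input symbol; the end of the string acts as eof."""
    s = START
    for c in text:
        s = MV[s].get(c)
        if s is None:        # symbol outside the alphabet: reject
            return False
    return s in FINALS

print(dfa_accepts("0011"))   # True
print(dfa_accepts("10"))     # False
```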
![Page 81: 1 Lexical Analysis Cheng-Chia Chen. 2 Outline 1. The goal and niche of lexical analysis in a compiler 2. Lexical tokens 3. Regular expressions (RE) 4.](https://reader036.fdocuments.net/reader036/viewer/2022081503/56649d575503460f94a35784/html5/thumbnails/81.jpg)
81
Simulation of a DFA(for recognizing token sequence )
Input. An input string ended with eof and a DFA with start state s0 and a set F of final states, each with a type.
Output. A sequence of tokens, possibly ended with an error token.
begin
  length = 0; buf = new ArrayList(); type = -1;
  s := s0; c := next_symbol();
  while (c <> eof) {
    s := mv(s, c);
    if (s is not error state) {
      if (s is final) { length = buf.size() + 1; type = type(s); }
      buf.add(c);
    }
![Page 82: 1 Lexical Analysis Cheng-Chia Chen. 2 Outline 1. The goal and niche of lexical analysis in a compiler 2. Lexical tokens 3. Regular expressions (RE) 4.](https://reader036.fdocuments.net/reader036/viewer/2022081503/56649d575503460f94a35784/html5/thumbnails/82.jpg)
82
    if (s is error-state) {
      if (length == 0) { output error-token; exit() }
      else {
        output token(type, buf[0:length-1]);
        push back buf[length:-] and c into input;
        length = 0; type = -1; buf.clear(); s := s0;   // restart for the next token
      }
    }
    c := next_symbol()
  }
  if (length > 0) { output token(type, buf[0:length-1]) }
  if (length != buf.size()) { output error-token }
end.
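The longest-match idea behind this procedure can be sketched in Python. Instead of pushing characters back into the input, this version remembers the end position of the last final state seen and restarts scanning from there. The DFA itself (token types NUM, REAL, ID, WS, the character classes, and state numbers 0 to 5) is entirely hypothetical and only serves to exercise the backtracking.

```python
# Hypothetical DFA: (state, character class) -> next state.
MV = {
    (0, "digit"): 1, (1, "digit"): 1,      # NUM  = digit+
    (1, "dot"): 2, (2, "digit"): 3,        # REAL = digit+ . digit+
    (3, "digit"): 3,
    (0, "letter"): 4, (4, "letter"): 4,    # ID   = letter (letter | digit)*
    (4, "digit"): 4,
    (0, "space"): 5, (5, "space"): 5,      # WS   = space+
}
TOKEN_TYPE = {1: "NUM", 3: "REAL", 4: "ID", 5: "WS"}   # final states and their types
START = 0

def step(state, ch):
    """Return the next state, or None for the error state."""
    if ch.isdigit():
        cls = "digit"
    elif ch.isalpha():
        cls = "letter"
    elif ch == ".":
        cls = "dot"
    elif ch == " ":
        cls = "space"
    else:
        return None
    return MV.get((state, cls))

def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        state, i = START, pos
        last_end, last_type = None, None       # longest match found so far
        while i < len(text):
            state = step(state, text[i])
            if state is None:
                break
            i += 1
            if state in TOKEN_TYPE:
                last_end, last_type = i, TOKEN_TYPE[state]
        if last_end is None:                   # no prefix is a token: error, stop
            tokens.append(("ERROR", text[pos]))
            break
        tokens.append((last_type, text[pos:last_end]))
        pos = last_end                         # rescan anything read past the match
    return tokens

print(tokenize("x1 12.5 7.y"))
```

On the input `7.y` the scanner reads `7.` before failing on `y`, falls back to the NUM token `7`, and then reports an error at `.`.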
![Page 83: 1 Lexical Analysis Cheng-Chia Chen. 2 Outline 1. The goal and niche of lexical analysis in a compiler 2. Lexical tokens 3. Regular expressions (RE) 4.](https://reader036.fdocuments.net/reader036/viewer/2022081503/56649d575503460f94a35784/html5/thumbnails/83.jpg)
83
Implementation (Cont.)
NFA -> DFA conversion is at the heart of tools such as flex
But DFAs can be huge
» DFA => optimized DFA: try to decrease the number of states.
» not always helpful!
In practice, flex-like tools trade off speed for space in the choice of NFA and DFA representations
![Page 84: 1 Lexical Analysis Cheng-Chia Chen. 2 Outline 1. The goal and niche of lexical analysis in a compiler 2. Lexical tokens 3. Regular expressions (RE) 4.](https://reader036.fdocuments.net/reader036/viewer/2022081503/56649d575503460f94a35784/html5/thumbnails/84.jpg)
84
Time-Space Tradeoffs
Here |r| is the size of the regular expression r and |x| is the length of the input x.
RE to NFA, simulate NFA
» time: O(|r| |x|), space: O(|r|)
RE to NFA, NFA to DFA, simulate DFA
» time: O(|x|), space: O(2^|r|)
Lazy transition evaluation
» transitions are computed as needed at run time;
» computed transitions are stored in cache for later use
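Lazy transition evaluation can be sketched as an NFA simulator that memoizes each (DFA state, symbol) step. This is an illustrative sketch, not how any particular tool implements it: the class name `LazyDFA`, the cache layout, and the edge list for (1|0)*1 are assumptions.

```python
EPS = None  # assumed marker for an epsilon label

class LazyDFA:
    """Builds DFA transitions on demand and caches them for later use."""

    def __init__(self, start, finals, delta):
        self.delta = delta
        self.finals = finals
        self.start = frozenset(self._closure({start}))
        self.cache = {}              # (DFA state, symbol) -> DFA state

    def _closure(self, states):
        seen, stack = set(states), list(states)
        while stack:
            t = stack.pop()
            for u in self.delta.get((t, EPS), ()):
                if u not in seen:
                    seen.add(u)
                    stack.append(u)
        return seen

    def step(self, T, c):
        key = (T, c)
        if key not in self.cache:    # compute the transition only when first needed
            moved = set()
            for s in T:
                moved |= self.delta.get((s, c), set())
            self.cache[key] = frozenset(self._closure(moved))
        return self.cache[key]

    def accepts(self, text):
        T = self.start
        for c in text:
            T = self.step(T, c)
        return bool(T & self.finals)

# Assumed edges of an NFA for (1|0)*1 with states A..J and final state J.
DELTA = {
    ("A", EPS): {"B", "H"}, ("B", EPS): {"C", "D"},
    ("C", "1"): {"E"}, ("D", "0"): {"F"},
    ("E", EPS): {"G"}, ("F", EPS): {"G"},
    ("G", EPS): {"A", "H"}, ("H", EPS): {"I"},
    ("I", "1"): {"J"},
}

lazy = LazyDFA("A", {"J"}, DELTA)
print(lazy.accepts("0011"), len(lazy.cache))   # True 4
```

Only the transitions actually exercised by the input are ever built, so the space used sits between the NFA-simulation and full-DFA extremes.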
![Page 85: 1 Lexical Analysis Cheng-Chia Chen. 2 Outline 1. The goal and niche of lexical analysis in a compiler 2. Lexical tokens 3. Regular expressions (RE) 4.](https://reader036.fdocuments.net/reader036/viewer/2022081503/56649d575503460f94a35784/html5/thumbnails/85.jpg)
85
DFA to optimized DFA
![Page 86: 1 Lexical Analysis Cheng-Chia Chen. 2 Outline 1. The goal and niche of lexical analysis in a compiler 2. Lexical tokens 3. Regular expressions (RE) 4.](https://reader036.fdocuments.net/reader036/viewer/2022081503/56649d575503460f94a35784/html5/thumbnails/86.jpg)
86
Motivations
Problems:
1. Given a DFA M with k states, is it possible to find an equivalent DFA M’ (i.e., L(M) = L(M’)) with fewer than k states?
2. Given a regular language A, how to find a machine with the minimum number of states?
Ex: A = L((a+b)*aba(a+b)*) can be accepted by the following NFA:
[Figure: NFA with states s, t, u, v; edge labels a, b, a along the path, and two loops labeled a,b]
By applying the subset construction, we can construct a DFA M2 with 2^4 = 16 states, of which only 6 are accessible from the initial state {s}.
![Page 87: 1 Lexical Analysis Cheng-Chia Chen. 2 Outline 1. The goal and niche of lexical analysis in a compiler 2. Lexical tokens 3. Regular expressions (RE) 4.](https://reader036.fdocuments.net/reader036/viewer/2022081503/56649d575503460f94a35784/html5/thumbnails/87.jpg)
87
Inaccessible states
A state p ∈ Q is said to be inaccessible (or unreachable) [from the initial state] if there exists no path from the initial state to it. If a state is not inaccessible, it is accessible.
Inaccessible states can be removed from the DFA without affecting the behavior of the machine.
Problem: Given a DFA (or NFA), how to find all inaccessible states?
![Page 88: 1 Lexical Analysis Cheng-Chia Chen. 2 Outline 1. The goal and niche of lexical analysis in a compiler 2. Lexical tokens 3. Regular expressions (RE) 4.](https://reader036.fdocuments.net/reader036/viewer/2022081503/56649d575503460f94a35784/html5/thumbnails/88.jpg)
88
Finding all accessible states: (like ε-closure)
Input. An FA (DFA or NFA).
Output. The set of all accessible states.
begin
  push all start states onto stack;
  add all start states into A;
  while stack is not empty do begin
    pop t, the top element, off of stack;
    for each state u with an edge from t to u do
      if u is not in A do begin
        add u to A; push u onto stack
      end
  end;
  return A
end.
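The same worklist search, written in Python. This is a minimal sketch: the four-state DFA and its table layout are hypothetical, chosen so that one state (3) has no incoming path from the start state.

```python
def accessible_states(starts, mv):
    """Return all states reachable from `starts`; mv[state][symbol] is a set of successors."""
    reached = set(starts)
    stack = list(starts)
    while stack:
        t = stack.pop()
        for successors in mv.get(t, {}).values():   # edge labels do not matter here
            for u in successors:
                if u not in reached:
                    reached.add(u)
                    stack.append(u)
    return reached

# Hypothetical DFA over {a, b}; state 3 cannot be reached from state 0.
MV = {
    0: {"a": {1}, "b": {0}},
    1: {"a": {1}, "b": {2}},
    2: {"a": {0}, "b": {2}},
    3: {"a": {1}, "b": {3}},
}

reachable = accessible_states({0}, MV)
print(sorted(reachable))             # [0, 1, 2]
print(sorted(set(MV) - reachable))   # [3]
```

Storing successors as sets lets the same function handle NFAs, where one label can lead to several states.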
![Page 89: 1 Lexical Analysis Cheng-Chia Chen. 2 Outline 1. The goal and niche of lexical analysis in a compiler 2. Lexical tokens 3. Regular expressions (RE) 4.](https://reader036.fdocuments.net/reader036/viewer/2022081503/56649d575503460f94a35784/html5/thumbnails/89.jpg)
89
Minimization process
Minimization process for a DFA:
» 1. Remove all inaccessible states
» 2. Collapse all equivalent states
What does it mean that two states are equivalent?
» both states have the same observable behaviors, i.e.,
» there is no way to distinguish their difference, or
» more formally, we say p and q are not equivalent (or distinguishable) iff there is a string x ∈ Σ* s.t. exactly one of Δ(p,x) and Δ(q,x) is a final state,
» where Δ(p,x) is the ending state of the path from p with x as the input.
Equivalent states can be merged to form a simpler machine.
![Page 90: 1 Lexical Analysis Cheng-Chia Chen. 2 Outline 1. The goal and niche of lexical analysis in a compiler 2. Lexical tokens 3. Regular expressions (RE) 4.](https://reader036.fdocuments.net/reader036/viewer/2022081503/56649d575503460f94a35784/html5/thumbnails/90.jpg)
90
Example:
[Figure: a DFA with states 0 through 5 over {a,b}, and its minimized form with states 0, {1,2}, {3,4} and 5]
![Page 91: 1 Lexical Analysis Cheng-Chia Chen. 2 Outline 1. The goal and niche of lexical analysis in a compiler 2. Lexical tokens 3. Regular expressions (RE) 4.](https://reader036.fdocuments.net/reader036/viewer/2022081503/56649d575503460f94a35784/html5/thumbnails/91.jpg)
91
Quotient Construction
M = (Q, Σ, δ, s, F): a DFA.
≈: a relation on Q defined by:
  p ≈ q <=> for all x ∈ Σ*, Δ(p,x) ∈ F iff Δ(q,x) ∈ F
Property: ≈ is an equivalence relation. Hence it partitions Q into equivalence classes [p] = {q ∈ Q | p ≈ q} for p ∈ Q, and the quotient set Q/≈ = {[p] | p ∈ Q}.
Every p ∈ Q belongs to exactly one class [p], and p ≈ q iff [p] = [q].
Define the quotient machine M/≈ = <Q’, Σ, δ’, s’, F’> where
» Q’ = Q/≈; s’ = [s]; F’ = {[p] | p ∈ F}; and δ’([p],a) = [δ(p,a)] for all p ∈ Q and a ∈ Σ.
![Page 92: 1 Lexical Analysis Cheng-Chia Chen. 2 Outline 1. The goal and niche of lexical analysis in a compiler 2. Lexical tokens 3. Regular expressions (RE) 4.](https://reader036.fdocuments.net/reader036/viewer/2022081503/56649d575503460f94a35784/html5/thumbnails/92.jpg)
92
Minimization algorithm
input: a DFA
output: an optimized DFA
1. Write down a table of all pairs {p,q}, initially unmarked.
2. Mark {p,q} if p ∈ F and q ∉ F or vice versa.
3. Repeat until no more change:
   3.1 if ∃ an unmarked pair {p,q} s.t. {move(p,a), move(q,a)} is marked for some a ∈ Σ, then mark {p,q}.
4. When done, p ≈ q iff {p,q} is not marked.
5. Merge all equivalent states into one class and return the resulting machine.
Note: For recognizing multiple token types, step 2 needs to change to
2’. Mark {p,q} if type(p) ≠ type(q) [assume all non-final states have the same type].
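Steps 1 to 5 can be sketched in Python as a table-filling procedure followed by merging. The function names and data layout are assumptions; the input is the six-state DFA from the example that follows (start state 0, final states 1, 2 and 5).

```python
from itertools import combinations

def distinguishable_pairs(states, alphabet, mv, finals):
    """Steps 1-3: return the set of marked (distinguishable) pairs."""
    marked = {frozenset(p) for p in combinations(states, 2)
              if (p[0] in finals) != (p[1] in finals)}          # step 2
    changed = True
    while changed:                                              # step 3
        changed = False
        for p, q in combinations(states, 2):
            pair = frozenset((p, q))
            if pair in marked:
                continue
            for a in alphabet:
                # A successor "pair" with both states equal is never marked.
                if frozenset((mv[p][a], mv[q][a])) in marked:
                    marked.add(pair)
                    changed = True
                    break
    return marked

def equivalence_classes(states, alphabet, mv, finals):
    """Step 4: group states whose pair is unmarked."""
    marked = distinguishable_pairs(states, alphabet, mv, finals)
    classes = []
    for s in states:
        for cls in classes:
            if frozenset((s, cls[0])) not in marked:   # equivalent to the representative
                cls.append(s)
                break
        else:
            classes.append([s])
    return classes

def merge(classes, alphabet, mv, start, finals):
    """Step 5: build the merged machine; each new state is a tuple of old states."""
    rep = {s: tuple(cls) for cls in classes for s in cls}
    new_mv = {rep[cls[0]]: {a: rep[mv[cls[0]][a]] for a in alphabet}
              for cls in classes}
    return new_mv, rep[start], {rep[f] for f in finals}

STATES = [0, 1, 2, 3, 4, 5]
FINALS = {1, 2, 5}
MV = {0: {"a": 1, "b": 2}, 1: {"a": 3, "b": 4}, 2: {"a": 4, "b": 3},
      3: {"a": 5, "b": 5}, 4: {"a": 5, "b": 5}, 5: {"a": 5, "b": 5}}

classes = equivalence_classes(STATES, "ab", MV, FINALS)
print(classes)   # [[0], [1, 2], [3, 4], [5]]
new_mv, new_start, new_finals = merge(classes, "ab", MV, 0, FINALS)
```

Comparing each state only with a class representative is enough because the unmarked relation is an equivalence relation once the table has stabilized.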
![Page 93: 1 Lexical Analysis Cheng-Chia Chen. 2 Outline 1. The goal and niche of lexical analysis in a compiler 2. Lexical tokens 3. Regular expressions (RE) 4.](https://reader036.fdocuments.net/reader036/viewer/2022081503/56649d575503460f94a35784/html5/thumbnails/93.jpg)
93
An Example:
The DFA (> marks the start state, F marks a final state):

         a   b
  >0     1   2
   1F    3   4
   2F    4   3
   3     5   5
   4     5   5
   5F    5   5
![Page 94: 1 Lexical Analysis Cheng-Chia Chen. 2 Outline 1. The goal and niche of lexical analysis in a compiler 2. Lexical tokens 3. Regular expressions (RE) 4.](https://reader036.fdocuments.net/reader036/viewer/2022081503/56649d575503460f94a35784/html5/thumbnails/94.jpg)
94
Initial Table
1 -
2 - -
3 - - -
4 - - - -
5 - - - - -
0 1 2 3 4
![Page 95: 1 Lexical Analysis Cheng-Chia Chen. 2 Outline 1. The goal and niche of lexical analysis in a compiler 2. Lexical tokens 3. Regular expressions (RE) 4.](https://reader036.fdocuments.net/reader036/viewer/2022081503/56649d575503460f94a35784/html5/thumbnails/95.jpg)
95
After step 2
1 M
2 M -
3 - M M
4 - M M -
5 M - - M M
0 1 2 3 4
![Page 96: 1 Lexical Analysis Cheng-Chia Chen. 2 Outline 1. The goal and niche of lexical analysis in a compiler 2. Lexical tokens 3. Regular expressions (RE) 4.](https://reader036.fdocuments.net/reader036/viewer/2022081503/56649d575503460f94a35784/html5/thumbnails/96.jpg)
96
After first pass of step 3
1 M
2 M -
3 - M M
4 - M M -
5 M M M M M
0 1 2 3 4
![Page 97: 1 Lexical Analysis Cheng-Chia Chen. 2 Outline 1. The goal and niche of lexical analysis in a compiler 2. Lexical tokens 3. Regular expressions (RE) 4.](https://reader036.fdocuments.net/reader036/viewer/2022081503/56649d575503460f94a35784/html5/thumbnails/97.jpg)
97
2nd pass of step 3.
The result: 1 ≈ 2 and 3 ≈ 4.
1 M
2 M -
3 M2 M M
4 M2 M M -
5 M M1 M1 M M
0 1 2 3 4