Active Coevolutionary Learning of Deterministic Finite Automata
Implementation of Lexical Analysis · Specifying lexical structure using regular expressions •...
Transcript of Implementation of Lexical Analysis · Specifying lexical structure using regular expressions •...
![Page 1: Implementation of Lexical Analysis · Specifying lexical structure using regular expressions • Finite automata – Deterministic Finite Automata (DFAs) – Non-deterministic Finite](https://reader031.fdocuments.net/reader031/viewer/2022041107/5f09b68e7e708231d42828bf/html5/thumbnails/1.jpg)
Implementation of Lexical Analysis
![Page 2: Implementation of Lexical Analysis · Specifying lexical structure using regular expressions • Finite automata – Deterministic Finite Automata (DFAs) – Non-deterministic Finite](https://reader031.fdocuments.net/reader031/viewer/2022041107/5f09b68e7e708231d42828bf/html5/thumbnails/2.jpg)
2
Outline
•
Specifying lexical structure using regular expressions
•
Finite automata–
Deterministic Finite Automata (DFAs)
–
Non-deterministic Finite Automata (NFAs)
•
Implementation of regular expressionsRegExp
⇒ NFA ⇒
DFA ⇒
Tables
![Page 3: Implementation of Lexical Analysis · Specifying lexical structure using regular expressions • Finite automata – Deterministic Finite Automata (DFAs) – Non-deterministic Finite](https://reader031.fdocuments.net/reader031/viewer/2022041107/5f09b68e7e708231d42828bf/html5/thumbnails/3.jpg)
3
Notation
•
For convenience, we will use a variation (we will allow user-defined abbreviations)
in regular
expression notation
•
Union: A + B ≡
A | B•
Option: A + ε ≡ A?
•
Range: ‘a’+’b’+…+’z’
≡
[a-z]•
Excluded range:
complement of [a-z] ≡
[^a-z]
![Page 4: Implementation of Lexical Analysis · Specifying lexical structure using regular expressions • Finite automata – Deterministic Finite Automata (DFAs) – Non-deterministic Finite](https://reader031.fdocuments.net/reader031/viewer/2022041107/5f09b68e7e708231d42828bf/html5/thumbnails/4.jpg)
4
Regular Expressions in Lexical Specification
•
Last lecture: a specification for the predicate s ∈
L(R)
•
But a yes/no answer is not enough !•
Instead: partition the input into tokens
•
We will adapt regular expressions to this goal
![Page 5: Implementation of Lexical Analysis · Specifying lexical structure using regular expressions • Finite automata – Deterministic Finite Automata (DFAs) – Non-deterministic Finite](https://reader031.fdocuments.net/reader031/viewer/2022041107/5f09b68e7e708231d42828bf/html5/thumbnails/5.jpg)
5
Regular Expressions ⇒
Lexical Spec. (1)
1.
Select a set of tokens•
Integer, Keyword, Identifier, OpenPar, ...
2.
Write a regular expression (pattern) for the lexemes of each token•
Integer
= digit +
•
Keyword = ‘if’
+
‘else’
+
…•
Identifier
= letter (letter + digit)*
•
OpenPar
=
‘(‘•
…
![Page 6: Implementation of Lexical Analysis · Specifying lexical structure using regular expressions • Finite automata – Deterministic Finite Automata (DFAs) – Non-deterministic Finite](https://reader031.fdocuments.net/reader031/viewer/2022041107/5f09b68e7e708231d42828bf/html5/thumbnails/6.jpg)
6
Regular Expressions ⇒
Lexical Spec. (2)
3. Construct R, a regular expression matching all lexemes for all tokens
R = Keyword
+ Identifier
+ Integer
+ …= R1
+ R2
+ R3
+ …
Facts: If s ∈
L(R)
then s
is a lexeme–
Furthermore s ∈
L(Ri
)
for some “i”–
This “i”
determines the token that is reported
![Page 7: Implementation of Lexical Analysis · Specifying lexical structure using regular expressions • Finite automata – Deterministic Finite Automata (DFAs) – Non-deterministic Finite](https://reader031.fdocuments.net/reader031/viewer/2022041107/5f09b68e7e708231d42828bf/html5/thumbnails/7.jpg)
7
Regular Expressions ⇒
Lexical Spec. (3)
4.
Let input be x1
…xn•
(x1
... xn
are characters)•
For 1 ≤
i ≤
n check
x1
…xi
∈
L(R)
?
5.
It must be thatx1
…xi
∈
L(Rj
)
for some j(if there is a choice, pick a smallest such j)
6.
Remove x1
…xi
from input and go to previous step
![Page 8: Implementation of Lexical Analysis · Specifying lexical structure using regular expressions • Finite automata – Deterministic Finite Automata (DFAs) – Non-deterministic Finite](https://reader031.fdocuments.net/reader031/viewer/2022041107/5f09b68e7e708231d42828bf/html5/thumbnails/8.jpg)
8
How to Handle Spaces and Comments?
1.
We could create a token WhitespaceWhitespace = (‘
’
+ ‘\n’
+ ‘\t’)+
•
We could also add comments in there•
An input “
\t\n 5555 “
is transformed into
Whitespace Integer Whitespace2.
Lexer
skips spaces (preferred)
•
Modify step 5 from before as follows:It must be that xk
... xi
∈
L(Rj
)
for some j
such that x1
... xk-1
∈
L(Whitespace)•
Parser is not bothered with spaces
![Page 9: Implementation of Lexical Analysis · Specifying lexical structure using regular expressions • Finite automata – Deterministic Finite Automata (DFAs) – Non-deterministic Finite](https://reader031.fdocuments.net/reader031/viewer/2022041107/5f09b68e7e708231d42828bf/html5/thumbnails/9.jpg)
9
Ambiguities (1)
•
There are ambiguities in the algorithm
•
How much input is used? What if•
x1
…xi
∈
L(R)
and also•
x1
…xK
∈
L(R)
–
Rule: Pick the longest possible substring –
The “maximal munch”
![Page 10: Implementation of Lexical Analysis · Specifying lexical structure using regular expressions • Finite automata – Deterministic Finite Automata (DFAs) – Non-deterministic Finite](https://reader031.fdocuments.net/reader031/viewer/2022041107/5f09b68e7e708231d42828bf/html5/thumbnails/10.jpg)
10
Ambiguities (2)
•
Which token is used? What if•
x1
…xi
∈
L(Rj
)
and also•
x1
…xi
∈
L(Rk
)
–
Rule: use rule listed first (j if j < k)
•
Example:–
R1
= Keyword and R2
= Identifier–
“if”
matches both
–
Treats “if”
as a keyword not an identifier
![Page 11: Implementation of Lexical Analysis · Specifying lexical structure using regular expressions • Finite automata – Deterministic Finite Automata (DFAs) – Non-deterministic Finite](https://reader031.fdocuments.net/reader031/viewer/2022041107/5f09b68e7e708231d42828bf/html5/thumbnails/11.jpg)
11
Error Handling
•
What ifNo rule matches a prefix of input ?
•
Problem: Can’t just get stuck …•
Solution: –
Write a rule matching all “bad”
strings
–
Put it last•
Lexer
tools allow the writing of:
R = R1
+ ... + Rn
+ Error–
Token Error matches if nothing else matches
![Page 12: Implementation of Lexical Analysis · Specifying lexical structure using regular expressions • Finite automata – Deterministic Finite Automata (DFAs) – Non-deterministic Finite](https://reader031.fdocuments.net/reader031/viewer/2022041107/5f09b68e7e708231d42828bf/html5/thumbnails/12.jpg)
12
Summary
•
Regular expressions provide a concise notation for string patterns
•
Use in lexical analysis requires small extensions–
To resolve ambiguities
–
To handle errors•
Good algorithms known (next)–
Require only single pass over the input
–
Few operations per character (table lookup)
![Page 13: Implementation of Lexical Analysis · Specifying lexical structure using regular expressions • Finite automata – Deterministic Finite Automata (DFAs) – Non-deterministic Finite](https://reader031.fdocuments.net/reader031/viewer/2022041107/5f09b68e7e708231d42828bf/html5/thumbnails/13.jpg)
13
Regular Languages & Finite Automata
Basic formal language theory result:Regular expressions and finite automata both define the class of regular languages.
Thus, we are going to use:•
Regular expressions for specification
•
Finite automata for implementation (automatic generation of lexical analyzers)
![Page 14: Implementation of Lexical Analysis · Specifying lexical structure using regular expressions • Finite automata – Deterministic Finite Automata (DFAs) – Non-deterministic Finite](https://reader031.fdocuments.net/reader031/viewer/2022041107/5f09b68e7e708231d42828bf/html5/thumbnails/14.jpg)
14
Finite Automata
A finite automaton is a recognizer for the strings of a regular language
A finite automaton consists of–
A finite input alphabet Σ
–
A set of states S–
A start state n
–
A set of accepting states F ⊆
S–
A set of transitions state →input
state
![Page 15: Implementation of Lexical Analysis · Specifying lexical structure using regular expressions • Finite automata – Deterministic Finite Automata (DFAs) – Non-deterministic Finite](https://reader031.fdocuments.net/reader031/viewer/2022041107/5f09b68e7e708231d42828bf/html5/thumbnails/15.jpg)
15
Finite Automata
•
Transitions1
→a
s2•
Is read
In state s1
on input “a”
go to state s2
•
If end of input (or no transition possible)–
If in accepting state ⇒ accept
–
Otherwise ⇒ reject
![Page 16: Implementation of Lexical Analysis · Specifying lexical structure using regular expressions • Finite automata – Deterministic Finite Automata (DFAs) – Non-deterministic Finite](https://reader031.fdocuments.net/reader031/viewer/2022041107/5f09b68e7e708231d42828bf/html5/thumbnails/16.jpg)
16
Finite Automata State Graphs
•
A state
•
The start state
•
An accepting state
•
A transitiona
![Page 17: Implementation of Lexical Analysis · Specifying lexical structure using regular expressions • Finite automata – Deterministic Finite Automata (DFAs) – Non-deterministic Finite](https://reader031.fdocuments.net/reader031/viewer/2022041107/5f09b68e7e708231d42828bf/html5/thumbnails/17.jpg)
17
A Simple Example
•
A finite automaton that accepts only “1”
1
![Page 18: Implementation of Lexical Analysis · Specifying lexical structure using regular expressions • Finite automata – Deterministic Finite Automata (DFAs) – Non-deterministic Finite](https://reader031.fdocuments.net/reader031/viewer/2022041107/5f09b68e7e708231d42828bf/html5/thumbnails/18.jpg)
18
Another Simple Example
•
A finite automaton accepting any number of 1’s followed by a single 0
•
Alphabet: {0,1}
0
1
![Page 19: Implementation of Lexical Analysis · Specifying lexical structure using regular expressions • Finite automata – Deterministic Finite Automata (DFAs) – Non-deterministic Finite](https://reader031.fdocuments.net/reader031/viewer/2022041107/5f09b68e7e708231d42828bf/html5/thumbnails/19.jpg)
19
And Another Example
•
Alphabet {0,1}•
What language does this recognize?
0
1
0
1
0
1
![Page 20: Implementation of Lexical Analysis · Specifying lexical structure using regular expressions • Finite automata – Deterministic Finite Automata (DFAs) – Non-deterministic Finite](https://reader031.fdocuments.net/reader031/viewer/2022041107/5f09b68e7e708231d42828bf/html5/thumbnails/20.jpg)
20
And Another Example
•
Alphabet still { 0, 1 }
•
The operation of the automaton is not completely defined by the input–
On input “11”
the automaton could be in either state
1
1
![Page 21: Implementation of Lexical Analysis · Specifying lexical structure using regular expressions • Finite automata – Deterministic Finite Automata (DFAs) – Non-deterministic Finite](https://reader031.fdocuments.net/reader031/viewer/2022041107/5f09b68e7e708231d42828bf/html5/thumbnails/21.jpg)
21
Epsilon Moves
•
Another kind of transition: ε-moves
ε
•
Machine can move from state A to state B without reading input
A B
![Page 22: Implementation of Lexical Analysis · Specifying lexical structure using regular expressions • Finite automata – Deterministic Finite Automata (DFAs) – Non-deterministic Finite](https://reader031.fdocuments.net/reader031/viewer/2022041107/5f09b68e7e708231d42828bf/html5/thumbnails/22.jpg)
22
Deterministic and Non-Deterministic Automata
•
Deterministic Finite Automata (DFA)–
One transition per input per state
–
No ε-moves•
Non-deterministic Finite Automata (NFA)–
Can have multiple transitions for one input in a given state
–
Can have ε-moves•
Finite automata have finite memory–
Enough to only encode the current state
![Page 23: Implementation of Lexical Analysis · Specifying lexical structure using regular expressions • Finite automata – Deterministic Finite Automata (DFAs) – Non-deterministic Finite](https://reader031.fdocuments.net/reader031/viewer/2022041107/5f09b68e7e708231d42828bf/html5/thumbnails/23.jpg)
23
Execution of Finite Automata
•
A DFA can take only one path through the state graph–
Completely determined by input
•
NFAs
can choose–
Whether to make ε-moves
–
Which of multiple transitions for a single input to take
![Page 24: Implementation of Lexical Analysis · Specifying lexical structure using regular expressions • Finite automata – Deterministic Finite Automata (DFAs) – Non-deterministic Finite](https://reader031.fdocuments.net/reader031/viewer/2022041107/5f09b68e7e708231d42828bf/html5/thumbnails/24.jpg)
24
Acceptance of NFAs
•
An NFA can get into multiple states
•
Input:
0
1
1
0
1 0 1
•
Rule: NFA accepts an input if it can
get in a final state
![Page 25: Implementation of Lexical Analysis · Specifying lexical structure using regular expressions • Finite automata – Deterministic Finite Automata (DFAs) – Non-deterministic Finite](https://reader031.fdocuments.net/reader031/viewer/2022041107/5f09b68e7e708231d42828bf/html5/thumbnails/25.jpg)
25
NFA vs. DFA (1)
•
NFAs
and DFAs
recognize the same set of languages (regular languages)
•
DFAs
are easier to implement–
There are no choices to consider
![Page 26: Implementation of Lexical Analysis · Specifying lexical structure using regular expressions • Finite automata – Deterministic Finite Automata (DFAs) – Non-deterministic Finite](https://reader031.fdocuments.net/reader031/viewer/2022041107/5f09b68e7e708231d42828bf/html5/thumbnails/26.jpg)
26
NFA vs. DFA (2)
•
For a given language the NFA can be simpler than the DFA
01
0
0
01
0
1
0
1
NFA
DFA
•
DFA can be exponentially larger than NFA
![Page 27: Implementation of Lexical Analysis · Specifying lexical structure using regular expressions • Finite automata – Deterministic Finite Automata (DFAs) – Non-deterministic Finite](https://reader031.fdocuments.net/reader031/viewer/2022041107/5f09b68e7e708231d42828bf/html5/thumbnails/27.jpg)
27
Regular Expressions to Finite Automata
•
High-level sketch
Regularexpressions
NFA
DFA
LexicalSpecification
Table-driven Implementation of DFA
![Page 28: Implementation of Lexical Analysis · Specifying lexical structure using regular expressions • Finite automata – Deterministic Finite Automata (DFAs) – Non-deterministic Finite](https://reader031.fdocuments.net/reader031/viewer/2022041107/5f09b68e7e708231d42828bf/html5/thumbnails/28.jpg)
28
Regular Expressions to NFA (1)
•
For each kind of reg. expr, define an NFA–
Notation: NFA for regular expression M
i.e. our automata have one
start and one
accepting state
M
•
For εε
•
For input a a
![Page 29: Implementation of Lexical Analysis · Specifying lexical structure using regular expressions • Finite automata – Deterministic Finite Automata (DFAs) – Non-deterministic Finite](https://reader031.fdocuments.net/reader031/viewer/2022041107/5f09b68e7e708231d42828bf/html5/thumbnails/29.jpg)
29
Regular Expressions to NFA (2)
•
For AB
A Bε
•
For A + B
A
B
εε
ε
ε
![Page 30: Implementation of Lexical Analysis · Specifying lexical structure using regular expressions • Finite automata – Deterministic Finite Automata (DFAs) – Non-deterministic Finite](https://reader031.fdocuments.net/reader031/viewer/2022041107/5f09b68e7e708231d42828bf/html5/thumbnails/30.jpg)
30
Regular Expressions to NFA (3)
•
For A*
Aε
ε
ε
![Page 31: Implementation of Lexical Analysis · Specifying lexical structure using regular expressions • Finite automata – Deterministic Finite Automata (DFAs) – Non-deterministic Finite](https://reader031.fdocuments.net/reader031/viewer/2022041107/5f09b68e7e708231d42828bf/html5/thumbnails/31.jpg)
31
Example of Regular Expression →
NFA conversion
•
Consider the regular expression(1+0)*1
•
The NFA is
ε
1C E0D F
ε
εB
ε
εG
ε
ε
ε
A H 1I J
![Page 32: Implementation of Lexical Analysis · Specifying lexical structure using regular expressions • Finite automata – Deterministic Finite Automata (DFAs) – Non-deterministic Finite](https://reader031.fdocuments.net/reader031/viewer/2022041107/5f09b68e7e708231d42828bf/html5/thumbnails/32.jpg)
32
NFA to DFA. The Trick
•
Simulate the NFA•
Each state of DFA = a non-empty subset of states of the NFA
•
Start state = the set of NFA states reachable through ε-moves
from NFA start state•
Add a transition S →a S’
to DFA iff
–
S’
is the set of NFA states reachable from any state in S after seeing the input a
•
considering ε-moves as well
![Page 33: Implementation of Lexical Analysis · Specifying lexical structure using regular expressions • Finite automata – Deterministic Finite Automata (DFAs) – Non-deterministic Finite](https://reader031.fdocuments.net/reader031/viewer/2022041107/5f09b68e7e708231d42828bf/html5/thumbnails/33.jpg)
33
NFA to DFA. Remark
•
An NFA may be in many states at any time
•
How many different states ?
•
If there are N states, the NFA must be in some subset of those N states
•
How many subsets are there?–
2N
-
1 = finitely many
![Page 34: Implementation of Lexical Analysis · Specifying lexical structure using regular expressions • Finite automata – Deterministic Finite Automata (DFAs) – Non-deterministic Finite](https://reader031.fdocuments.net/reader031/viewer/2022041107/5f09b68e7e708231d42828bf/html5/thumbnails/34.jpg)
34
NFA to DFA Example
10 1ε ε
ε
ε
ε
ε
ε
ε
A BC
D
E
FG H I J
ABCDHI
FGABCDHI
EJGABCDHI
0
1
0
10 1
![Page 35: Implementation of Lexical Analysis · Specifying lexical structure using regular expressions • Finite automata – Deterministic Finite Automata (DFAs) – Non-deterministic Finite](https://reader031.fdocuments.net/reader031/viewer/2022041107/5f09b68e7e708231d42828bf/html5/thumbnails/35.jpg)
35
Implementation
•
A DFA can be implemented by a 2D table T–
One dimension is “states”
–
Other dimension is “input symbols”–
For every transition Si
→a Sk
define T[i,a] = k
•
DFA “execution”–
If in state Si
and input a, read T[i,a] = k and skip to state Sk
–
Very efficient
![Page 36: Implementation of Lexical Analysis · Specifying lexical structure using regular expressions • Finite automata – Deterministic Finite Automata (DFAs) – Non-deterministic Finite](https://reader031.fdocuments.net/reader031/viewer/2022041107/5f09b68e7e708231d42828bf/html5/thumbnails/36.jpg)
36
Table Implementation of a DFA
S
T
U
0
1
0
10 1
0 1S T UT T UU T U
![Page 37: Implementation of Lexical Analysis · Specifying lexical structure using regular expressions • Finite automata – Deterministic Finite Automata (DFAs) – Non-deterministic Finite](https://reader031.fdocuments.net/reader031/viewer/2022041107/5f09b68e7e708231d42828bf/html5/thumbnails/37.jpg)
37
Implementation (Cont.)
•
NFA → DFA conversion is at the heart of tools such as lex, ML-Lex
or flex
•
But, DFAs
can be huge
•
In practice, lex/ML-Lex/flex-like
tools trade off speed for space in the choice of NFA and DFA representations
![Page 38: Implementation of Lexical Analysis · Specifying lexical structure using regular expressions • Finite automata – Deterministic Finite Automata (DFAs) – Non-deterministic Finite](https://reader031.fdocuments.net/reader031/viewer/2022041107/5f09b68e7e708231d42828bf/html5/thumbnails/38.jpg)
38
Theory vs. Practice
Two differences:
•
DFAs recognize lexemes. A lexer
must return a type of acceptance (token type) rather than simply an accept/reject indication.
•
DFAs
consume the complete string and accept or reject it. A lexer
must find the end of the
lexeme in the input stream and then find the next one, etc.