CS 321 Programming Languages and Compilers
Lectures 16 & 17
Introduction to Formal Languages
Regular Languages
Lexical Analysis
Finite Automata & Lexing
Languages
• Have a finite vocabulary
• Have finite length sentences
• Have possibly infinitely many sentences
Grammars and Recognizers
• A Grammar is a finitary method by which all sentences of a language, L, may be generated via well-defined rules.
• A Recognizer is a procedure which, given a “string” x, answers “yes” if x ∈ L.
• (We usually also want it to answer “no” if x ∉ L, i.e. we usually demand an algorithm.)
(Context-Free) Grammars
• Def. A (context-free or Chomsky Type-2) grammar (cfg) is a 4-tuple
G = (N, Σ, P, S)
where
– N is a finite, non-empty set of symbols (non-terminal vocabulary)
– Σ is a finite set of symbols (terminal vocabulary)
– N ∩ Σ = ∅
– V = N ∪ Σ (vocabulary)
– S ∈ N (goal symbol)
– P is a finite subset of N × V* (production rules)
Set Operations
• Def. Let X and Y be sets of words
XY = {xy | x ∈ X and y ∈ Y}
X^0 = {ε} (where ε represents the empty string)
X^1 = X
X^(i+1) = X^i X
X* = ∪ (i ≥ 0) X^i
X+ = ∪ (i > 0) X^i (so X+ = X* X)
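The set operations above can be tried out on small finite sets. The following is a minimal Python sketch, not part of the original slides: the function names `concat`, `power` and `star_up_to` are illustrative, the empty Python string stands in for ε, and since X* is infinite in general only a finite prefix of the union is computed.

```python
from itertools import product

def concat(X, Y):
    """XY = {xy | x in X and y in Y}."""
    return {x + y for x, y in product(X, Y)}

def power(X, i):
    """X^0 = {""} and X^(i+1) = X^i X."""
    result = {""}              # the empty string plays the role of epsilon
    for _ in range(i):
        result = concat(result, X)
    return result

def star_up_to(X, n):
    """Finite approximation of X*: the union of X^0 .. X^n."""
    out = set()
    for i in range(n + 1):
        out |= power(X, i)
    return out

X = {"a", "bc"}
print(sorted(power(X, 2)))
print(sorted(star_up_to(X, 2), key=lambda w: (len(w), w)))
```

Dropping the empty string from `star_up_to(X, 2)` gives the same set as `concat(star_up_to(X, 1), X)`, a bounded instance of X+ = X* X.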
Example
• G = (N, Σ, P, E)
where
N = {E, T, F}
Σ = {[, ], +, *, id}
P = {(E,T), (E,E+T), (T,F), (T,T*F), (F,id), (F,[E])}
• (so V = N ∪ Σ = {E, T, F, [, ], +, *, id})
• (A, α) ∈ P is usually written
A → α
or A ::= α
or A : α
Convention
• Given G = (N, Σ, P, S) (with V = N ∪ Σ)
(or G = (V, Σ, P, S) with N = V − Σ)
– elements of N: A, B, …
– elements of V: … U, V, W, X, Y, Z
– elements of Σ: a, b, …
– elements of Σ*: … u, v, w, x, y, z
– elements of V*: lowercase Greek letters (α, β, γ, …)
• others:
– names (not underlined): elements of N
– S: element of N
– underlined or courier font: elements of Σ
– special symbols: a dedicated symbol is used to denote a whole production rule A → α
Generating L
• How to use a grammar, G, to generate a sentence in L(G):
• Begin with a string, σ, consisting of only the goal symbol.
• repeat
select from σ a non-terminal “A” and
“rewrite” A according to some production
(A, α)
thereby producing σ’ from σ; continue with σ’ in place of σ.
until σ’ ∈ Σ*
Example
G = (N, Σ, P, S) where P is (abbreviated) as follows:
E → T | E + T
T → F | T * F
F → id | < E >
and where
N = {E, T, F, Q}
Σ = {+, *, <, >, id}
S = E
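The generation procedure can be sketched in a few lines of Python using the productions of this example grammar. This is an illustrative sketch only: the helper `derive` and the convention of always rewriting the leftmost non-terminal are assumptions made here (the slides allow any non-terminal to be selected), and the caller supplies which production to apply at each step.

```python
# Productions of the example grammar; each right-hand side is a tuple of symbols.
P = {
    "E": [("T",), ("E", "+", "T")],
    "T": [("F",), ("T", "*", "F")],
    "F": [("id",), ("<", "E", ">")],
}

def derive(start, choices):
    """Rewrite the leftmost non-terminal once per entry in `choices`.

    choices[k] is the index of the production applied at step k.
    Returns every sentential form produced, starting with [start].
    """
    form = [start]
    steps = [list(form)]
    for choice in choices:
        # position of the leftmost non-terminal in the current form
        pos = next(i for i, sym in enumerate(form) if sym in P)
        form[pos:pos + 1] = P[form[pos]][choice]
        steps.append(list(form))
    return steps

# E => E + T => T + T => F + T => id + T => id + F => id + id
for form in derive("E", [1, 0, 0, 0, 0, 0]):
    print(" ".join(form))
```

The loop stops after the given choices; a form is a sentence of L(G) once it contains no key of `P`.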
Regular Sets
• Regular sets (also called regular languages) are defined as follows. Let Σ be a finite alphabet.
1) ∅ is a regular set over Σ.
2) {ε} is a regular set over Σ.
3) ∀ a ∈ Σ, {a} is a regular set over Σ.
4) If P and Q are regular sets over Σ,
a) P ∪ Q is a regular set over Σ.
b) PQ is a regular set over Σ.
c) P* is a regular set over Σ.
5) Nothing else is a regular set over Σ.
Regular Expressions
1) ∅ denotes the regular set ∅.
2) ε denotes the regular set {ε}.
3) a denotes the regular set {a}.
4) If p and q are regular expressions denoting the regular sets P and Q respectively, then
a) (p|q) denotes P ∪ Q.
b) (pq) denotes PQ.
c) (p)* denotes P*.
5) Nothing else is a regular expression.
***
Notation: (p)+ ≡ ((p)*p)
(p)? ≡ (p | ε)
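The inductive definition translates directly into a recursive function from expressions to sets. The sketch below is an illustration, not the course's code: regular expressions are encoded as nested tuples (an encoding chosen here), symbols are assumed to be single characters, and because a starred set is usually infinite, `denote` returns only the strings of length at most `n`.

```python
def denote(r, n):
    """Strings of length <= n in the regular set denoted by r."""
    kind = r[0]
    if kind == "empty":                     # the empty set
        return set()
    if kind == "eps":                       # {epsilon}
        return {""}
    if kind == "sym":                       # {a}
        return {r[1]} if n >= 1 else set()
    if kind == "alt":                       # P union Q
        return denote(r[1], n) | denote(r[2], n)
    if kind == "cat":                       # PQ
        return {x + y for x in denote(r[1], n) for y in denote(r[2], n)
                if len(x + y) <= n}
    if kind == "star":                      # P*
        base = denote(r[1], n) - {""}
        result, frontier = {""}, {""}
        while frontier:                     # grow by one factor per round
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= n} - result
            result |= frontier
        return result
    raise ValueError(kind)

A, B = ("sym", "a"), ("sym", "b")
# (a|b)*abb
ABB = ("cat", ("cat", ("cat", ("star", ("alt", A, B)), A), B), B)
print(sorted(denote(ABB, 4)))   # ['aabb', 'abb', 'babb']
```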
Right-Linear Grammars (Generators for Regular Sets)
• Def. Let G = (N, Σ, P, S) be a cfg. G is said to be right-linear if
P ⊆ N × (Σ* ∪ Σ*N)
***
• Proposition. If G is a right-linear cfg then L(G) is a regular set over Σ.
• Proposition. If R is a regular set over Σ, then ∃ a right-linear cfg, G, for which L(G) = R.
Finite Automata (Recognizers for Regular Sets)
Def. A deterministic finite automaton (deterministic finite state machine) is a 5-tuple:
M = (Q, Σ, δ, q0, F)
where
1) Q is a finite non-empty set of states.
2) Σ is a finite set of input symbols.
3) q0 ∈ Q (initial state)
4) F ⊆ Q (final states)
5) δ is a partial mapping from Q × Σ to Q (transition function or move function)
Transition Diagrams
• FSMs are often visualized as transition diagrams.
[Figure: transition diagram with start state p and states q, r, s; four of its edges are labelled 0|1.]
Finite State Machines
• The preceding transition diagram can be represented by a tabular move function:
        0    1    […]
p       q    q    s
q       q    q    r
r       r    r
s       r    r
Finite State Machines
• In the preceding table, the row labels form Q = {p, q, r, s}, the initial state is q0 = p, and the final states are F = {q, r}.
Formalizing the Moves of a FSM
• A pair (q, u) in Q × Σ* is called a configuration of M.
• (q0, u) is an initial configuration.
• M proceeds from one configuration to the next by moving according to the transition function:
(q, au) ⊢ (q’, u) if δ(q, a) = q’
(q, u) ⊢ … ⊢ (q’, v)
is written
(q, u) ⊢* (q’, v)
• The language accepted (or defined) by M is
L(M) = {u ∈ Σ* | (q0, u) ⊢* (q, ε) for some q ∈ F}
Note: Sometimes another symbol, such as λ, is used to denote the empty string.
Example
• With the machine
M = ({p,q,r,s}, {0, 1, […]}, δ, p, {q,r})
where the move function δ is shown in the preceding table.
• Question 1: Is 010 ∈ L(M)?
• Question 2: Is […] ∈ L(M)?
• Question 3: Is 010 ∈ L(M)?
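Questions of this kind can be checked mechanically by following the configurations (q, u) one move at a time. The sketch below encodes the move function from the table as a Python dictionary. It is a hedged illustration: the third input symbol is not legible in this copy of the slides, so the character `.` is used purely as a stand-in (an assumption), and the names `DELTA`, `run` and `accepts` are chosen here.

```python
POINT = "."   # stand-in for the third input symbol (an assumption)

# Partial move function delta, transcribed from the table.
DELTA = {
    ("p", "0"): "q", ("p", "1"): "q", ("p", POINT): "s",
    ("q", "0"): "q", ("q", "1"): "q", ("q", POINT): "r",
    ("r", "0"): "r", ("r", "1"): "r",
    ("s", "0"): "r", ("s", "1"): "r",
}
START, FINAL = "p", {"q", "r"}

def run(u):
    """Return the sequence of configurations (state, remaining input)."""
    q, trace = START, [(START, u)]
    while u:
        q = DELTA.get((q, u[0]))      # None when delta is undefined
        if q is None:
            break                      # the machine is stuck
        u = u[1:]
        trace.append((q, u))
    return trace

def accepts(u):
    """u is in L(M) iff the run ends in (q, "") with q a final state."""
    q, rest = run(u)[-1]
    return rest == "" and q in FINAL

print(run("010"))       # [('p', '010'), ('q', '10'), ('q', '0'), ('q', '')]
print(accepts("010"))   # True
print(accepts(""))      # False: p is not a final state
```

A run that gets stuck leaves unread input in its last configuration, so the string is rejected without any extra state.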
“Complete” Finite State Machines
• Extend δ with a new state t so that it is defined for every state and input symbol:
        0    1    […]
p       q    q    s
q       q    q    r
r       r    r    t
s       r    r    t
t       t    t    t
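Completing a partial move function is a mechanical step: every missing entry is sent to a fresh, non-final trap state that loops on all inputs. A minimal sketch follows, assuming the dictionary encoding used here and, again, `.` as a stand-in for the third input symbol; the name `complete` is illustrative.

```python
def complete(states, alphabet, delta, trap="t"):
    """Return (states', delta') with delta' defined on every (state, symbol)."""
    assert trap not in states
    total = dict(delta)
    new_states = set(states) | {trap}
    for q in new_states:
        for a in alphabet:
            total.setdefault((q, a), trap)   # only fills the missing entries
    return new_states, total

ALPHABET = ["0", "1", "."]                   # "." is a stand-in symbol
DELTA = {
    ("p", "0"): "q", ("p", "1"): "q", ("p", "."): "s",
    ("q", "0"): "q", ("q", "1"): "q", ("q", "."): "r",
    ("r", "0"): "r", ("r", "1"): "r",
    ("s", "0"): "r", ("s", "1"): "r",
}
STATES, TOTAL = complete({"p", "q", "r", "s"}, ALPHABET, DELTA)
print(TOTAL[("r", ".")], TOTAL[("t", "0")])  # t t
```

The set of final states is unchanged, so the completed machine accepts the same language.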
Complete Finite State Machine: Transition Diagram Version
[Figure: the earlier transition diagram (start state p; states q, r, s) extended with a state t that has a self-loop on every input symbol.]
Non-deterministic FSMs
• A FSM may have a choice of moves, i.e. δ is a mapping from Q × Σ to 2^Q.
• Proposition. Let M1 be a non-deterministic FSM. Then ∃ a DFSM M2 for which L(M2) = L(M1).
• Proposition. Given a NFSM, M, one can construct a right-linear cfg, G, for which L(G) = L(M), and conversely.
Extended Non-determinism
• Besides allowing multiple moves on the same input symbol, we can allow moves on the empty string, ε; i.e. for a given state q:
δ(q, ε) ⊆ Q
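Examples
[Figure: NFSM with states 0, 1, 2, 3 and start state 0; a self-loop on state 0 labelled a|b, then edges 0 → 1 on a, 1 → 2 on b, 2 → 3 on b.]
[Figure: second NFSM with states 0, 1, 2, 3, 4 and start state 0; edges labelled a, b, b, a.]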
Thompson’s Construction
• Given a regular expression, r, representing a regular set R, construct a non-deterministic finite state machine M that recognizes R, i.e. such that L(M) = R.
1) For ε construct
[Figure: start state i with a single ε-edge to final state f.]
Thompson’s Construction
2) For a in Σ construct
[Figure: start state i with a single edge labelled a to final state f.]
Thompson’s Construction
3) Suppose N(s) and N(t) are NFSMs for regular expressions s and t.
a) For the regular expression s|t, construct
[Figure: a new start state with ε-edges into N(s) and N(t), and ε-edges from the final states of N(s) and N(t) to a new final state f.]
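Thompson’s Construction
b) For the regular expression st, construct:
[Figure: N(s) followed by N(t); the start state i of N(s) is the overall start state, the final state of N(s) coincides with the start state of N(t), and the final state f of N(t) is the overall final state.]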
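Thompson’s Construction
c) For the regular expression s*, construct
[Figure: a new start state i and a new final state f around N(s), with ε-edges from i into N(s), from the final state of N(s) back to its start state, from the final state of N(s) to f, and from i directly to f.]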
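The cases of Thompson’s Construction map onto a short recursive function. The following Python sketch is one possible rendering under stated assumptions, not the course’s implementation: regular expressions are nested tuples (an encoding chosen here), states are fresh integers, an edge label of `None` stands for ε, and concatenation glues the two machines with an extra ε-edge rather than merging two states, which accepts the same language. A small `accepts` helper is included only so the result can be tried out.

```python
from itertools import count

def thompson(r, fresh=None, edges=None):
    """Return (start, final, edges); edges are (source, label, target)."""
    if fresh is None:
        fresh, edges = count(), []
    kind = r[0]
    if kind in ("eps", "sym"):                       # cases 1 and 2
        i, f = next(fresh), next(fresh)
        edges.append((i, None if kind == "eps" else r[1], f))
    elif kind == "alt":                              # case 3a: s|t
        i = next(fresh)
        s_i, s_f, _ = thompson(r[1], fresh, edges)
        t_i, t_f, _ = thompson(r[2], fresh, edges)
        f = next(fresh)
        edges += [(i, None, s_i), (i, None, t_i), (s_f, None, f), (t_f, None, f)]
    elif kind == "cat":                              # case 3b: st
        i, s_f, _ = thompson(r[1], fresh, edges)
        t_i, f, _ = thompson(r[2], fresh, edges)
        edges.append((s_f, None, t_i))               # epsilon glue (see lead-in)
    elif kind == "star":                             # case 3c: s*
        i = next(fresh)
        s_i, s_f, _ = thompson(r[1], fresh, edges)
        f = next(fresh)
        edges += [(i, None, s_i), (s_f, None, s_i), (s_f, None, f), (i, None, f)]
    else:
        raise ValueError(kind)
    return i, f, edges

def accepts(nfa, w):
    """Simulate the NFSM on w by tracking the set of reachable states."""
    start, final, edges = nfa
    def closure(S):
        stack, seen = list(S), set(S)
        while stack:
            s = stack.pop()
            for p, a, q in edges:
                if p == s and a is None and q not in seen:
                    seen.add(q)
                    stack.append(q)
        return seen
    S = closure({start})
    for ch in w:
        S = closure({q for p, a, q in edges if p in S and a == ch})
    return final in S

A, B = ("sym", "a"), ("sym", "b")
NFA = thompson(("cat", ("cat", ("cat", ("star", ("alt", A, B)), A), B), B))
print(accepts(NFA, "babb"), accepts(NFA, "ab"))      # True False
```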
Transforming a NFSM to a DFSM (The Subset Construction)
• Define:
ε-closure(s ∈ Q) = {t ∈ Q | s can reach t via only ε-moves}
ε-closure(T ⊆ Q) = ∪ (s ∈ T) ε-closure(s)
move(T ⊆ Q, a ∈ Σ) = ∪ (s ∈ T) δ(s, a)
NFSM → DFSM
• Given M = (Q, Σ, δ, q0, F) define M’ = (Q’, Σ, δ’, q’0, F’) by:
1) Compute q’0 = ε-closure(q0).
2) Initialize Q’ with q’0 (unmarked).
3) while ∃ an unmarked element q’ of Q’:
a) mark q’
b) ∀ a ∈ Σ:
-- compute p’ = ε-closure(move(q’, a))
-- if p’ ∉ Q’ then add p’ (unmarked) to Q’
-- set δ’(q’, a) = p’
4) F’ = { q’ ∈ Q’ | ∃ q ∈ q’ such that q ∈ F }
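The four steps above can be sketched in Python as follows. The encoding is an assumption made for illustration: `delta` maps (state, symbol) to a set of states, `eps` maps a state to its ε-successors, and each DFSM state is a `frozenset` of NFSM states. The input is the hand-written four-state NFSM for (a|b)*abb used elsewhere in these slides.

```python
def eps_closure(T, eps):
    """All states reachable from T using only epsilon-moves."""
    stack, seen = list(T), set(T)
    while stack:
        s = stack.pop()
        for t in eps.get(s, ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return frozenset(seen)

def move(T, a, delta):
    """Union of delta(s, a) over all s in T."""
    return {t for s in T for t in delta.get((s, a), ())}

def subset_construction(alphabet, delta, eps, q0, F):
    start = eps_closure({q0}, eps)
    states, unmarked, dtrans = {start}, [start], {}
    while unmarked:
        S = unmarked.pop()                 # taking S off the list "marks" it
        for a in alphabet:
            T = eps_closure(move(S, a, delta), eps)
            if T not in states:
                states.add(T)
                unmarked.append(T)
            dtrans[(S, a)] = T
    finals = {S for S in states if S & set(F)}
    return states, dtrans, start, finals

DELTA = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}, (2, "b"): {3}}
STATES, DTRANS, START, FINALS = subset_construction("ab", DELTA, {}, 0, {3})
for S in sorted(STATES, key=sorted):
    print(sorted(S), {a: sorted(DTRANS[(S, a)]) for a in "ab"})
```

Note that the empty set can appear as a DFSM state when some symbol has no move; it then acts as a dead state. That does not happen for this input.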
Example
• Perform Thompson’s Construction on (a|b)*abb to obtain a non-deterministic finite state machine.
• Perform the subset construction to make it deterministic.
Simulating a DFSM
s := q0
a := nextchar
while a ≠ eof {
    s := δ(s, a)
    a := nextchar
}
if s ∈ F then return “yes”
else return “no”
Simulating a NFSM
S := ε-closure({q0})
a := nextchar
while a ≠ eof {
    S := ε-closure(move(S, a))
    a := nextchar
}
if S ∩ F ≠ ∅ then return “yes”
else return “no”
Transforming from NFSM to Right-Linear CFG
• Given M = (Q, Σ, δ, q0, F), construct G = (Q, Σ, P, q0) where
1) ∀ q ∈ F include in P
q → ε
2) ∀ q1, q2 ∈ Q; a ∈ Σ with q2 ∈ δ(q1, a) include in P
q1 → a q2
3) ∀ q1, q2 ∈ Q with q2 ∈ δ(q1, ε) include in P
q1 → q2
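Example
• Let M be:
[Figure: NFSM with states 0, 1, 2, 3 and start state 0; a self-loop on state 0 labelled a|b, then edges 0 → 1 on a, 1 → 2 on b, 2 → 3 on b.]
(Note, this is not something obtained from Thompson’s Construction, but written by hand.)
• We have:
q0 → a q0 | b q0 | a q1
q1 → b q2
q2 → b q3
q3 → ε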
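The three rules for building a right-linear grammar from a NFSM can be sketched as a small function. This is an illustration under assumptions: the dictionary encoding of `delta` and `eps`, the state names `q0`..`q3`, and the representation of a production as a (left side, right side) pair are all choices made here.

```python
def nfa_to_grammar(delta, eps, finals):
    """Return the productions as (lhs, rhs) pairs; rhs is a tuple of symbols."""
    P = []
    for q in sorted(finals):
        P.append((q, ()))                    # rule 1: q -> epsilon
    for (q1, a), targets in sorted(delta.items()):
        for q2 in sorted(targets):
            P.append((q1, (a, q2)))          # rule 2: q1 -> a q2
    for q1, targets in sorted(eps.items()):
        for q2 in sorted(targets):
            P.append((q1, (q2,)))            # rule 3: q1 -> q2
    return P

DELTA = {
    ("q0", "a"): {"q0", "q1"},
    ("q0", "b"): {"q0"},
    ("q1", "b"): {"q2"},
    ("q2", "b"): {"q3"},
}
P = nfa_to_grammar(DELTA, {}, {"q3"})
for lhs, rhs in P:
    print(lhs, "->", " ".join(rhs) or "ε")
```

For this machine the function yields the same six productions as listed above, one per alternative.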
RLG → Regular Expression
• The algorithm resembles Gaussian Elimination.
• Notice that all of the “A-rules” can be “grouped” by the non-terminal on the right side of the right-part and “factored”:
A → α0 A
A → α1 A1
A → α2 A2
…
A → αn-1 An-1
A → αn
where the αi are regular expressions over Σ
RLG → Regular Expression
• Then A can be written as the following regular expression over V:
A = α0*( α1A1 | α2A2 | … | αn-1An-1 | αn )
and the above regular expression can be substituted for A everywhere A appears in the grammar.
• Following that, all rules can again be written in the foregoing “factored” form.
RLG → Regular Expression
• Given a right-linear grammar G = (N, Σ, P, S):
A) repeat
1) write all rules in “factored” form.
2) choose some non-terminal, A ≠ S, to eliminate.
3) compute the regular expression, r, which is equivalent to A, and substitute r in place of A everywhere in G.
4) delete all A-rules from G
until only S-rules remain
B) compute the regular expression, r, to which S is equivalent.
Example
• Recall
q0 → a q0 | b q0 | a q1
q1 → b q2
q2 → b q3
q3 → ε
• Rewrite
q0 → (a | b) q0 | a q1
q1 → b q2
q2 → b q3
q3 → ε
Example
• Eliminate q3
q0 → (a | b) q0 | a q1
q1 → b q2
q2 → b
• Eliminate q2
q0 → (a | b) q0 | a q1
q1 → b b
• Eliminate q1
q0 → (a | b) q0 | a b b
• Compute q0
q0 = (a | b)* a b b
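The elimination procedure can be sketched in Python with regular expressions kept as strings. Everything about the encoding is an assumption for illustration: each non-terminal maps to a dictionary from the trailing non-terminal (or `None` for rules with no trailing non-terminal) to its already grouped coefficient, coefficients use Python `re` syntax with the empty string for ε, and any coefficient containing `|` must be parenthesised. Non-terminals are eliminated in dictionary order, which differs from the order used on the slide but yields an equivalent expression.

```python
import re

def to_regex(rules, start):
    """Eliminate every non-terminal except `start`; return a regex string."""
    rules = {A: dict(rhs) for A, rhs in rules.items()}    # work on a copy

    def factor(A):
        # A -> loop A | rest   becomes   A -> (loop)* rest
        loop = rules[A].pop(A, None)
        if loop is not None:
            rules[A] = {B: f"({loop})*{c}" for B, c in rules[A].items()}

    for A in [X for X in rules if X != start]:
        factor(A)
        for X in rules:
            if X != A and A in rules[X]:
                c = rules[X].pop(A)                       # X -> c A
                for B, d in rules[A].items():
                    new = c + d                           # X -> c d B
                    rules[X][B] = f"({rules[X][B]}|{new})" if B in rules[X] else new
        del rules[A]
    factor(start)
    return rules[start].get(None)        # None if no terminal string is derivable

RULES = {
    "q0": {"q0": "(a|b)", "q1": "a"},
    "q1": {"q2": "b"},
    "q2": {"q3": "b"},
    "q3": {None: ""},
}
R = to_regex(RULES, "q0")
print(R)
print(bool(re.fullmatch(R, "babb")), bool(re.fullmatch(R, "ab")))   # True False
```

The printed expression may carry redundant parentheses; it matches exactly the strings of (a | b)* a b b.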
Limitations of FSMs
• FSMs have a fixed number of states.
• For this reason, there are languages that cannot be recognized by FSMs.
• For example, there is no FSM that can recognize palindromes of arbitrary length.
• The DO keyword in Fortran cannot be expressed as a regular expression: because blanks are not significant, DO 10 I = 1,10 (a loop) and DO 10 I = 1.10 (an assignment to DO10I) can only be told apart by looking ahead past the keyword.
Minimization of DFSM’s
• Well-known algorithm (due to Hopcroft), useful in many other circumstances.
1) Initially partition Q into two groups, F and Q − F.
2) repeat
∀ group, G, of the partition, split G into multiple sub-groups, if incompatible transitions are found among members of G.
until no further changes occur
Example
        0     1     […]
p       q1    q2    s
q1      q1    q2    r1
q2      q1    q2    r1
r1      r1    r1
r2      r2    r2
s       r2    r2
(The slide repeats this table with the final states marked.)
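The refinement loop can be sketched in Python on the table above. This is a plain partition-refinement sketch under assumptions, not necessarily the exact algorithm or its efficient formulation: the final states are taken to be q1, q2, r1 and r2 (by analogy with F = {q, r} in the earlier machine, since the marking is not legible here), `.` stands in for the third input symbol, and a missing table entry is treated as “no move”.

```python
def minimize(states, alphabet, delta, finals):
    """Refine {F, Q - F} until no group splits; return the set of groups."""
    finals = frozenset(finals)
    partition = [g for g in (finals, frozenset(states) - finals) if g]
    while True:
        def group_of(q):
            # index of the group containing q; None stands for "no move"
            return next((i for i, g in enumerate(partition) if q in g), None)
        refined = []
        for g in partition:
            buckets = {}
            for q in g:
                # two states stay together only if they agree on every symbol
                signature = tuple(group_of(delta.get((q, a))) for a in alphabet)
                buckets.setdefault(signature, set()).add(q)
            refined.extend(frozenset(b) for b in buckets.values())
        if len(refined) == len(partition):
            return set(refined)
        partition = refined

ALPHABET = ["0", "1", "."]                    # "." is a stand-in symbol
DELTA = {
    ("p", "0"): "q1", ("p", "1"): "q2", ("p", "."): "s",
    ("q1", "0"): "q1", ("q1", "1"): "q2", ("q1", "."): "r1",
    ("q2", "0"): "q1", ("q2", "1"): "q2", ("q2", "."): "r1",
    ("r1", "0"): "r1", ("r1", "1"): "r1",
    ("r2", "0"): "r2", ("r2", "1"): "r2",
    ("s", "0"): "r2", ("s", "1"): "r2",
}
GROUPS = minimize({"p", "q1", "q2", "r1", "r2", "s"}, ALPHABET, DELTA,
                  {"q1", "q2", "r1", "r2"})
print(sorted(sorted(g) for g in GROUPS))
# [['p'], ['q1', 'q2'], ['r1', 'r2'], ['s']]
```

Under these assumptions q1 and q2 collapse into one state and r1 and r2 into another, giving back a four-state machine.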
Algebraic Properties
Axiom                                     Description
r | s = s | r                             | is commutative
r | (s | t) = (r | s) | t                 | is associative
(rs)t = r(st)                             concatenation is associative
r(s | t) = rs | rt; (s | t)r = sr | tr    concatenation distributes over |
εr = r; rε = r                            ε is the identity element for concatenation
r* = (r | ε)*                             relation between * and ε
r** = r*                                  * is idempotent
Shorthand Notations
• (r)+ denotes one or more instances: r+ = rr* and r* = r+ | ε
• (r)? denotes zero or one instance: r? = r | ε
• [a-z] denotes a|b|c|…|z
Examples
• [a-zA-Z]+ denotes strings of one or more letters
• [a-zA-Z][a-zA-Z0-9]* denotes valid identifiers in Fortran
• [0-9]+(.[0-9]+)?(E(+|-)?[0-9]+)? denotes valid unsigned Pascal numbers
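These shorthand expressions carry over almost unchanged to Python's `re` module, which makes them easy to try out. In this sketch the constant names and the helper `matches` are illustrative; the literal decimal point has to be escaped as `\.` because an unescaped `.` matches any character in `re`, `(+|-)` is written as the character class `[+-]`, and the identifier pattern uses `*` for the tail so that one-letter names match.

```python
import re

LETTERS = re.compile(r"[a-zA-Z]+")
IDENTIFIER = re.compile(r"[a-zA-Z][a-zA-Z0-9]*")
UNSIGNED_NUMBER = re.compile(r"[0-9]+(\.[0-9]+)?(E[+-]?[0-9]+)?")

def matches(pattern, text):
    """True if the whole of `text` is in the language of `pattern`."""
    return pattern.fullmatch(text) is not None

print(matches(IDENTIFIER, "x1"), matches(IDENTIFIER, "1x"))            # True False
print(matches(UNSIGNED_NUMBER, "3.14E-2"), matches(UNSIGNED_NUMBER, "3."))  # True False
```

`fullmatch` is used rather than `match` or `search` so that the test corresponds to membership of the entire string in the regular set.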
Extended Transition Diagrams for Parts of Pascal