Regular Expressions CIS 361. Need finite descriptions of infinite sets of strings. Discover and...
-
Upload
alfred-summers -
Category
Documents
-
view
214 -
download
1
Transcript of Regular Expressions CIS 361. Need finite descriptions of infinite sets of strings. Discover and...
Regular Expressions
CIS 361
Need finite descriptions of infinite sets of strings.
Discover and specify “regularity”.
The set of languages over a finite alphabet is uncountable, while the set of descriptions is countable
Fundamental Problems
Regular Expressions Language L is regular if there
exists a finite acceptor for it Any language that is described by a
regular expression can be accepted by some finite automaton
Regular Expressions Regular expressions
Combination of strings of symbols from some alphabet, parentheses and operators U, ., *
U is union (some literature uses +) . (or nothing) is concatenation * is star closure or Kleene star
superscripted repetition, 0 or more times
+ is closure superscripted repetition, 1 or more times
Specifying Lexical Structure Using Regular Expressions Have some alphabet = set of symbols Regular expressions are built from:
- empty string Any letter from r1r2 – String r1 followed by r2 (concatenation) r1 U r2 (r1 + r2) – either regular expression r1 or
r2 (union) r* - iterated sequence and choice | r | r r | … Parentheses to indicate grouping/precedence
Regular Expressions Operations
Union Complement Intersection Difference Concatenation Repetition
Kleene star Plus operator
Regular Expressions Union
L M The union of two regular expressions
Q and R is Q U R In terms of automata A and B,
respectively create a new initial state q connect it to the initial states of A and B
by transitions
Regular Expressions Complement
* - L To construct the complement of a
regular expression L, inspect the automaton that accepts its strings
convert the automaton for L to a deterministic automaton
flips favorable and nonfavorable states construct a regular expression for strings
accepted by the updated automaton
Regular Expressions Complement of
bit strings with at least one “1” = bit strings containing no “1”s
= 0* Complement of
bit strings with exactly one “1”= bit strings containing no “1”s
U bit strings with at least two “1”s= 0* U (0* 1 0* 1 0*)(0 U 1)*
Regular Expressions Intersection
L M Apply DeMorgan’s law
Union of the complements of L and M
Regular Expressions Difference
L – M Can be expressed as the intersection
of languages L and * - M
Regular Expressions Concatenation
Strings u and v over alphabet is string uv
Languages L1 and L2 concatenated L1L2 ={uv|u L1, v L2}
Can be extended to any finite number of languages
Regular Expressions Concatenation
LM Algorithm connects every favorable
state of L to the initial state of M by an arrow labeled
Favorable states of L become non-favorable Favorable states of M become favorable
states of the new automaton
Regular Expressions Kleene star
L* In terms of automaton
connect every favorable state of L to the initial state of L by a transition labeled
create a new initial state s, make it the only favorable state and connect it to the old initial state by transition
Regular Expressions Plus (+)
L+ In terms of automaton
connect every favorable state of L to the initial state of L by a transition labeled
That’s it. This gets one or more times to a favorable state
Naming Languages Regular sets can be named using the
derivation in terms of the seed elements and the closure operations. Regular expressions formalize this approach.
Regular sets Regular ExpressionsNumbers Numerals
Semantics Semantics SyntaxSyntax
Regular expressions for strings over {a,b} containing at least one “a”.
Focus on the one “a”(a u b)*a(a u b)*
Focus on the leftmost “a”b*a(a u b)*
Focus on the “a”sb*ab*(ab*)*
Further optimizationb*(ab*)+
Example
Two regular expressions are equivalent if they represent the same regular set.
Equivalence of regular expressions
Concept of Language Generated by Regular Expressions
Set of all strings generated by a regular expression is the language of the regular expression
In general, a language may be (countably) infinite
A string in a language is often called a token
Examples of Languages and Regular Expressions
= { 0, 1, . } (0 U 1)*.(0 U 1)* - Binary floating point numbers (00)* - even-length all-zero strings 1*(01*01*)* - strings with even number of zeros
= {A,…,Z, a,…,z, 0,…,9,_ } (A U … U z)(A U … U z U 0 U … U 9 U _) * identifiers (1 U … U 9)(0 U … U 9)* natural numbers (no negatives)
(0|1|2)* - trinary (base 3) numbers
Finite-State Automata Alphabet Set of states with initial and accepting states Transitions between states, labeled with
symbol(s)
1
0
1
0
(0 | 1)*.(0|1)*