Regular Expressions CIS 361. Need finite descriptions of infinite sets of strings. Discover and...


Regular Expressions

CIS 361


Fundamental Problems

Need finite descriptions of infinite sets of strings.

Discover and specify “regularity”.

The set of languages over a finite alphabet is uncountable, while the set of finite descriptions is countable, so some languages have no finite description.


Regular Expressions

Language L is regular if there exists a finite acceptor for it.

Any language that is described by a regular expression can be accepted by some finite automaton.


Regular Expressions

Regular expressions: combinations of strings of symbols from some alphabet Σ, parentheses, and the operators U, ., *

U is union (some literature uses +)

. (or nothing) is concatenation

* is star closure or Kleene star: superscripted repetition, 0 or more times

+ is positive closure: superscripted repetition, 1 or more times
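The operators above can be tried out with Python's `re` module. This is only a sketch: the Python spellings (`|` for union, juxtaposition for concatenation) and the helper name `matches` are assumptions of the example, not notation from the slides.

```python
import re

# Slide notation -> Python `re` spelling (assumed mapping for this sketch):
#   union          a U b  ->  a|b
#   concatenation  a.b    ->  ab
#   Kleene star    a*     ->  a*
#   plus           a+     ->  a+

def matches(pattern: str, text: str) -> bool:
    """Return True when the whole string belongs to the pattern's language."""
    return re.fullmatch(pattern, text) is not None

print(matches(r"a|b", "b"))    # union: either alternative
print(matches(r"ab", "ab"))    # concatenation: a followed by b
print(matches(r"(ab)*", ""))   # star: zero repetitions are allowed
print(matches(r"(ab)+", ""))   # plus: at least one repetition is required
```

`re.fullmatch` is used rather than `re.search` so that the test is membership of the whole string in the language, not the presence of a matching substring.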


Specifying Lexical Structure Using Regular Expressions

Have some alphabet Σ = set of symbols

Regular expressions are built from:

ε - empty string

Any letter from Σ

r1r2 - string r1 followed by r2 (concatenation)

r1 U r2 (r1 + r2) - either regular expression r1 or r2 (union)

r* - iterated sequence and choice: ε | r | r r | …

Parentheses to indicate grouping/precedence
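The inductive definition above can be sketched as a small data structure. The tuple encoding and the names `sym`, `cat`, `alt`, `star`, and `language` are hypothetical, chosen for illustration; `language` lists only the strings up to a length bound, since the full language may be infinite.

```python
# Regular expressions as nested tuples (an assumed encoding for this sketch).
EPS = ("eps",)                       # the empty string ε

def sym(c):      return ("sym", c)   # a single letter from the alphabet
def cat(r1, r2): return ("cat", r1, r2)
def alt(r1, r2): return ("alt", r1, r2)
def star(r):     return ("star", r)

def language(r, max_len):
    """Set of strings of length <= max_len denoted by expression r."""
    kind = r[0]
    if kind == "eps":
        return {""}
    if kind == "sym":
        return {r[1]} if max_len >= 1 else set()
    if kind == "alt":
        return language(r[1], max_len) | language(r[2], max_len)
    if kind == "cat":
        left, right = language(r[1], max_len), language(r[2], max_len)
        return {u + v for u in left for v in right if len(u + v) <= max_len}
    if kind == "star":
        base = language(r[1], max_len)
        result, frontier = {""}, {""}
        while frontier:              # keep appending until nothing new fits
            frontier = {u + v for u in frontier for v in base
                        if len(u + v) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node: {kind}")

# b*a(a U b)*, cut off at length 2
at_least_one_a = cat(star(sym("b")), cat(sym("a"), star(alt(sym("a"), sym("b")))))
print(sorted(language(at_least_one_a, 2)))
```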


Regular Expressions: Operations

Union

Complement

Intersection

Difference

Concatenation

Repetition: Kleene star, Plus operator


Regular Expressions: Union

L U M

The union of two regular expressions Q and R is Q U R.

In terms of automata A and B, respectively: create a new initial state q and connect it to the initial states of A and B by ε-transitions.
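A minimal sketch of this union construction, assuming an NFA is stored as a dictionary with `start`, `accept`, and `delta` keys and that ε is written as the empty string. That representation, and the helper names, are assumptions of the example. The sketch also assumes the two automata use disjoint state names and that neither already has a state called "q".

```python
EPS = ""  # label used for ε-transitions in this sketch

def eps_closure(delta, states):
    """All states reachable from `states` using only ε-transitions."""
    stack, seen = list(states), set(states)
    while stack:
        state = stack.pop()
        for nxt in delta.get((state, EPS), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def accepts(nfa, text):
    """Simulate the NFA on `text` by tracking the set of possible states."""
    current = eps_closure(nfa["delta"], {nfa["start"]})
    for ch in text:
        moved = set()
        for state in current:
            moved |= set(nfa["delta"].get((state, ch), ()))
        current = eps_closure(nfa["delta"], moved)
    return bool(current & nfa["accept"])

def union(a, b, new_start="q"):
    """New initial state with ε-transitions to the initial states of a and b."""
    delta = {key: set(targets) for key, targets in {**a["delta"], **b["delta"]}.items()}
    delta[(new_start, EPS)] = {a["start"], b["start"]}
    return {"start": new_start, "accept": a["accept"] | b["accept"], "delta": delta}

# A accepts exactly "0"; B accepts exactly "1".
A = {"start": "a0", "accept": {"a1"}, "delta": {("a0", "0"): {"a1"}}}
B = {"start": "b0", "accept": {"b1"}, "delta": {("b0", "1"): {"b1"}}}
A_or_B = union(A, B)
print(accepts(A_or_B, "0"), accepts(A_or_B, "1"), accepts(A_or_B, "01"))
```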


Regular Expressions: Complement

Σ* - L

To construct the complement of a regular expression L, inspect the automaton that accepts its strings:

convert the automaton for L to a deterministic automaton with a transition for every symbol in every state

flip favorable and non-favorable states

construct a regular expression for the strings accepted by the updated automaton
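The flipping step can be sketched as follows, assuming the automaton is already deterministic and complete. The dictionary layout and the names `run_dfa` and `complement` are assumptions of the example; determinization and the conversion back to a regular expression are not shown.

```python
def run_dfa(dfa, text):
    """Follow exactly one transition per symbol; accept if we end in a favorable state."""
    state = dfa["start"]
    for ch in text:
        state = dfa["delta"][(state, ch)]
    return state in dfa["accept"]

def complement(dfa):
    """Same automaton with favorable and non-favorable states swapped."""
    return {**dfa, "accept": dfa["states"] - dfa["accept"]}

# Bit strings with at least one "1".
at_least_one_1 = {
    "states": {"none", "seen"},
    "start": "none",
    "accept": {"seen"},
    "delta": {
        ("none", "0"): "none", ("none", "1"): "seen",
        ("seen", "0"): "seen", ("seen", "1"): "seen",
    },
}

no_ones = complement(at_least_one_1)   # should accept exactly 0*
print(run_dfa(no_ones, "000"), run_dfa(no_ones, "010"))
```

Flipping is only correct on a deterministic, complete automaton: in a nondeterministic one, a string can have both accepting and rejecting runs, so swapping states does not complement the language.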


Regular Expressions: Complement (examples)

Complement of bit strings with at least one “1”
= bit strings containing no “1”s
= 0*

Complement of bit strings with exactly one “1”
= bit strings containing no “1”s U bit strings with at least two “1”s
= 0* U (0* 1 0* 1 0*)(0 U 1)*
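Both complement claims can be sanity-checked by brute force over all short bit strings. This is a bounded check, not a proof; the Python `re` spelling of the expressions and the helper names are assumptions of the sketch.

```python
import re
from itertools import product

def bit_strings(max_len):
    """Yield every bit string of length 0..max_len."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)

def are_complements(pattern, claimed_complement, max_len=8):
    """True if every short bit string matches exactly one of the two patterns."""
    p, c = re.compile(pattern), re.compile(claimed_complement)
    return all(
        (p.fullmatch(s) is None) == (c.fullmatch(s) is not None)
        for s in bit_strings(max_len)
    )

# at least one "1"  vs  0*
print(are_complements(r"(0|1)*1(0|1)*", r"0*"))
# exactly one "1"   vs  0* U (0*10*10*)(0 U 1)*
print(are_complements(r"0*10*", r"0*|(0*10*10*)(0|1)*"))
```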


Regular Expressions: Intersection

L ∩ M

Apply De Morgan’s law: the intersection is the complement of the union of the complements of L and M, that is, L ∩ M = Σ* - ((Σ* - L) U (Σ* - M)).


Regular Expressions: Difference

L - M

Can be expressed as the intersection of languages L and Σ* - M.
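The De Morgan identity for intersection and the identity for difference can be illustrated on a finite slice of Σ*. The sketch below assumes Σ = {0, 1} and cuts Σ* off at length 6; the example languages L and M and all helper names are chosen for illustration only.

```python
import re
from itertools import product

def universe(alphabet, max_len):
    """All strings over `alphabet` of length <= max_len (a finite stand-in for Σ*)."""
    return {"".join(p) for n in range(max_len + 1)
            for p in product(alphabet, repeat=n)}

def language(pattern, strings):
    """The members of `strings` that the pattern matches in full."""
    return {s for s in strings if re.fullmatch(pattern, s)}

SIGMA_STAR = universe("01", 6)
L = language(r"(0|1)*1(0|1)*", SIGMA_STAR)   # at least one "1"
M = language(r"((0|1)(0|1))*", SIGMA_STAR)   # even length

def complement(lang):
    return SIGMA_STAR - lang

# Intersection via De Morgan: complement of the union of the complements.
intersection = complement(complement(L) | complement(M))
# Difference as the intersection of L with the complement of M.
difference = L & complement(M)

print(intersection == (L & M), difference == (L - M))
```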


Regular Expressions: Concatenation

The concatenation of strings u and v over alphabet Σ is the string uv.

Languages L1 and L2 concatenated: L1L2 = {uv | u ∈ L1, v ∈ L2}

Can be extended to any finite number of languages.
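For finite languages, the set definition of concatenation translates directly into a set comprehension. The function names `concat` and `concat_all` are hypothetical; `concat_all` shows the extension to any finite number of languages by folding from {ε}.

```python
from functools import reduce

def concat(l1, l2):
    """L1L2 = { uv | u in L1, v in L2 }."""
    return {u + v for u in l1 for v in l2}

def concat_all(*languages):
    """Concatenate any finite number of languages, left to right."""
    return reduce(concat, languages, {""})   # {""} is the identity: {ε}L = L

L1 = {"a", "ab"}
L2 = {"", "b"}
print(sorted(concat(L1, L2)))
print(sorted(concat_all(L1, L2, {"c"})))
```

Note that L1L2 here has three strings rather than four, because "a" + "b" and "ab" + "" produce the same string.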


Regular Expressions: Concatenation

LM

The algorithm connects every favorable state of L to the initial state of M by an arrow labeled ε.

Favorable states of L become non-favorable.

Favorable states of M become the favorable states of the new automaton.
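The concatenation algorithm can be sketched on a dictionary-based NFA, with ε written as the empty string. The representation and the helper names are assumptions of the example, and the two automata are assumed to use disjoint state names.

```python
EPS = ""  # label used for ε-transitions in this sketch

def eps_closure(delta, states):
    """All states reachable from `states` using only ε-transitions."""
    stack, seen = list(states), set(states)
    while stack:
        state = stack.pop()
        for nxt in delta.get((state, EPS), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def accepts(nfa, text):
    """Simulate the NFA on `text` by tracking the set of possible states."""
    current = eps_closure(nfa["delta"], {nfa["start"]})
    for ch in text:
        moved = set()
        for state in current:
            moved |= set(nfa["delta"].get((state, ch), ()))
        current = eps_closure(nfa["delta"], moved)
    return bool(current & nfa["accept"])

def concatenate(l, m):
    """ε-arrows from every favorable state of l to the initial state of m."""
    delta = {key: set(targets) for key, targets in {**l["delta"], **m["delta"]}.items()}
    for favorable in l["accept"]:
        delta.setdefault((favorable, EPS), set()).add(m["start"])
    # Only the favorable states of m stay favorable.
    return {"start": l["start"], "accept": set(m["accept"]), "delta": delta}

# L accepts exactly "0"; M accepts exactly "1".
L = {"start": "l0", "accept": {"l1"}, "delta": {("l0", "0"): {"l1"}}}
M = {"start": "m0", "accept": {"m1"}, "delta": {("m0", "1"): {"m1"}}}
LM = concatenate(L, M)
print(accepts(LM, "01"), accepts(LM, "0"), accepts(LM, "1"))
```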


Regular Expressions: Kleene star

L*

In terms of automaton:

create a new initial state s, make it the only favorable state, and connect it to the old initial state by an ε-transition

connect every favorable state of L back to s by a transition labeled ε, so that every completed pass through L ends in the favorable state s


Regular Expressions: Plus (+)

L+

In terms of automaton:

connect every favorable state of L to the initial state of L by a transition labeled ε

That’s it. The automaton can now reach a favorable state after one or more passes through L.
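The star and plus constructions differ only in whether a new favorable initial state is added. The sketch below uses a dictionary-based NFA with ε written as the empty string; the representation and helper names are assumptions. For the star it also assumes that the ε back-edges from the old favorable states lead to the new state s, which is what makes s usable as the only favorable state.

```python
EPS = ""  # label used for ε-transitions in this sketch

def eps_closure(delta, states):
    """All states reachable from `states` using only ε-transitions."""
    stack, seen = list(states), set(states)
    while stack:
        state = stack.pop()
        for nxt in delta.get((state, EPS), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def accepts(nfa, text):
    """Simulate the NFA on `text` by tracking the set of possible states."""
    current = eps_closure(nfa["delta"], {nfa["start"]})
    for ch in text:
        moved = set()
        for state in current:
            moved |= set(nfa["delta"].get((state, ch), ()))
        current = eps_closure(nfa["delta"], moved)
    return bool(current & nfa["accept"])

def copy_delta(delta):
    return {key: set(targets) for key, targets in delta.items()}

def plus(nfa):
    """L+: ε-arrows from every favorable state back to the initial state."""
    delta = copy_delta(nfa["delta"])
    for favorable in nfa["accept"]:
        delta.setdefault((favorable, EPS), set()).add(nfa["start"])
    return {"start": nfa["start"], "accept": set(nfa["accept"]), "delta": delta}

def star(nfa, new_start="s"):
    """L*: new initial state s is the only favorable state; old favorable states loop to s."""
    delta = copy_delta(nfa["delta"])
    delta[(new_start, EPS)] = {nfa["start"]}
    for favorable in nfa["accept"]:
        delta.setdefault((favorable, EPS), set()).add(new_start)
    return {"start": new_start, "accept": {new_start}, "delta": delta}

# L accepts exactly "ab".
L = {"start": "p0", "accept": {"p2"},
     "delta": {("p0", "a"): {"p1"}, ("p1", "b"): {"p2"}}}
L_star, L_plus = star(L), plus(L)
print(accepts(L_star, ""), accepts(L_plus, ""), accepts(L_plus, "abab"))
```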


Naming Languages

Regular sets can be named using their derivation in terms of the seed elements and the closure operations. Regular expressions formalize this approach.

Regular sets are to regular expressions as numbers are to numerals: regular sets and numbers are semantics, regular expressions and numerals are syntax.


Example

Regular expressions for strings over {a,b} containing at least one “a”.

Focus on the one “a”: (a U b)*a(a U b)*

Focus on the leftmost “a”: b*a(a U b)*

Focus on the “a”s: b*ab*(ab*)*

Further optimization: b*(ab*)+


Equivalence of regular expressions

Two regular expressions are equivalent if they represent the same regular set.
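A bounded brute-force comparison gives some evidence that the four expressions for "at least one a" are equivalent. It can refute equivalence but cannot prove it, since only strings up to a fixed length are tried. The Python `re` spellings and the helper names are assumptions of the sketch.

```python
import re
from itertools import product

CANDIDATES = [
    r"(a|b)*a(a|b)*",   # focus on the one "a"
    r"b*a(a|b)*",       # focus on the leftmost "a"
    r"b*ab*(ab*)*",     # focus on the "a"s
    r"b*(ab*)+",        # further optimization
]

def strings_up_to(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)

def agree_up_to(patterns, alphabet, max_len):
    """True if all patterns accept exactly the same strings up to max_len."""
    compiled = [re.compile(p) for p in patterns]
    for s in strings_up_to(alphabet, max_len):
        verdicts = {c.fullmatch(s) is not None for c in compiled}
        if len(verdicts) > 1:       # at least two patterns disagree on s
            return False
    return True

print(agree_up_to(CANDIDATES, "ab", 8))
print(agree_up_to([r"a*", r"a+"], "a", 2))   # these differ on the empty string
```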


Concept of Language Generated by Regular Expressions

Set of all strings generated by a regular expression is the language of the regular expression

In general, a language may be (countably) infinite

A string in a language is often called a token


Examples of Languages and Regular Expressions

Σ = { 0, 1, . }

(0 U 1)*.(0 U 1)* - binary floating point numbers

(00)* - even-length all-zero strings

1*(01*01*)* - strings with an even number of zeros

Σ = {A,…,Z, a,…,z, 0,…,9, _ }

(A U … U z)(A U … U z U 0 U … U 9 U _)* - identifiers

(1 U … U 9)(0 U … U 9)* - natural numbers (no negatives, no leading zeros)

(0 U 1 U 2)* - trinary (base 3) numbers
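These example expressions can be written in Python's `re` syntax and tested on a few strings. The character-class spellings such as `[A-Za-z]` and the dictionary name `EXAMPLES` are assumptions of the sketch; the literal dot has to be escaped because an unescaped `.` matches any character in `re`.

```python
import re

EXAMPLES = {
    "binary floating point": r"(0|1)*\.(0|1)*",
    "even-length zeros":     r"(00)*",
    "even number of zeros":  r"1*(01*01*)*",
    "identifier":            r"[A-Za-z][A-Za-z0-9_]*",
    "natural number":        r"[1-9][0-9]*",
    "base 3":                r"(0|1|2)*",
}

def in_language(name, text):
    """True when `text` as a whole matches the named example expression."""
    return re.fullmatch(EXAMPLES[name], text) is not None

print(in_language("binary floating point", "101.01"))
print(in_language("even number of zeros", "1001"))
print(in_language("identifier", "x_1"))
print(in_language("natural number", "007"))   # leading zeros are rejected
```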


Finite-State Automata

Alphabet Σ

Set of states with initial and accepting states

Transitions between states, labeled with symbol(s)

[Figure: state diagram with transitions labeled 0 and 1, for (0 | 1)*.(0 | 1)*]
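One possible deterministic automaton for (0 | 1)*.(0 | 1)* is sketched below as a transition table. The state names `int_part`, `frac_part`, and `dead` are hypothetical, and the automaton in the original figure may be drawn differently (for example, without an explicit dead state).

```python
# Alphabet Σ = {0, 1, .}; one transition per (state, symbol) pair.
START = "int_part"
ACCEPTING = {"frac_part"}
DELTA = {
    ("int_part", "0"): "int_part",   ("int_part", "1"): "int_part",
    ("int_part", "."): "frac_part",  # the single dot moves to the second half
    ("frac_part", "0"): "frac_part", ("frac_part", "1"): "frac_part",
    ("frac_part", "."): "dead",      # a second dot can never be accepted
    ("dead", "0"): "dead", ("dead", "1"): "dead", ("dead", "."): "dead",
}

def accepts(text):
    """Run the automaton; accept if it ends in an accepting state."""
    state = START
    for ch in text:
        if (state, ch) not in DELTA:
            return False             # symbol outside the alphabet
        state = DELTA[(state, ch)]
    return state in ACCEPTING

print(accepts("10.01"), accepts("."), accepts("101"), accepts("1.0.1"))
```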