NL Grammar Hierarchies Regular Expressions, Finite State Automata, Markov Algorithms


CSE6390 3.0 Special Topics in AI & Interactive Systems II
Introduction to Computational Linguistics

Fall Semester, 2010

Instructor: Nick Cercone - 3050 CSEB - [email protected]



Regular Expressions

• Regular expressions consist of constants and operators that denote sets of strings and operations over these sets, respectively. The following definition is standard and can be found in most textbooks on formal language theory. Given a finite alphabet Σ, the following constants are defined:

• (empty set) ∅ denoting the set ∅.
• (empty string) ε denoting the set containing only the "empty" string, which has no characters at all.
• (literal character) a in Σ denoting the set containing only the character a.


Regular Expressions

The following operations are defined:

• (concatenation) RS denoting the set { ab | a in R and b in S }. For example {"ab", "c"}{"d", "ef"} = {"abd", "abef", "cd", "cef"}.

• (alternation) R | S denoting the set union of R and S. For example {"ab", "c"}|{"ab", "d", "ef"} = {"ab", "c", "d", "ef"}.


Regular Expressions

• (Kleene star) R* denoting the smallest superset of R that contains ε and is closed under string concatenation. This is the set of all strings that can be made by concatenating zero or more strings in R. For example, {"ab", "c"}* = {ε, "ab", "c", "abab", "abc", "cab", "cc", "ababab", "abcab", ... }.


Regular Expressions (summary)

• (empty set) ∅ denoting the set ∅.
• (empty string) ε denoting the set containing only the "empty" string, which has no characters at all.
• (literal character) a in Σ denoting the set containing only the character a.
• The following operations are defined:
• (concatenation) RS denoting the set { αβ | α in R and β in S }. For example {"ab", "c"}{"d", "ef"} = {"abd", "abef", "cd", "cef"}.
• (alternation) R | S denoting the set union of R and S. For example {"ab", "c"}|{"ab", "d", "ef"} = {"ab", "c", "d", "ef"}.
• (Kleene star) R* denoting the smallest superset of R that contains ε and is closed under string concatenation. This is the set of all strings that can be made by concatenating any finite number (including zero) of strings from R. For example, {"0","1"}* is the set of all finite binary strings (including the empty string), and {"ab", "c"}* = {ε, "ab", "c", "abab", "abc", "cab", "cc", "ababab", "abcab", ... }.
• Nothing else is a regular expression over Σ.
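The three operations can be tried out directly on finite sets of strings. The sketch below is only an illustration in Python; the function names `concat`, `alternation` and `star` are made up for this example, and because R* is infinite in general, `star` takes an assumed cut-off `max_factors` and returns only the strings built from at most that many members of R.

```python
from itertools import product


def concat(R, S):
    """Concatenation RS = { ab | a in R and b in S }."""
    return {a + b for a, b in product(R, S)}


def alternation(R, S):
    """Alternation R | S is the set union of R and S."""
    return R | S


def star(R, max_factors):
    """Finite slice of R*: strings made from at most max_factors members of R."""
    result = {""}   # zero factors gives the empty string
    layer = {""}
    for _ in range(max_factors):
        layer = concat(layer, R)   # one more factor appended
        result |= layer
    return result


print(sorted(concat({"ab", "c"}, {"d", "ef"})))
print(sorted(alternation({"ab", "c"}, {"ab", "d", "ef"})))
print(sorted(star({"ab", "c"}, 2), key=lambda s: (len(s), s)))
```

With `max_factors = 2` the last call returns the first seven strings of the {"ab", "c"}* example; raising the cut-off to 3 adds strings such as "ababab" and "abcab".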


Finite State Automata

• Automata are models of computation: they compute languages.

• A finite-state automaton is a five-tuple <Q, q0, Σ, δ, F>, where Σ is a finite set of alphabet symbols, Q is a finite set of states, q0 ∈ Q is the initial state, F ⊆ Q is a set of final (accepting) states, and δ ⊆ Q × Σ × Q is a relation from states and alphabet symbols to states.


Finite State Automata

• Example: Finite-state automaton
• Q = {q0, q1, q2, q3}
• Σ = {c, a, t, r}
• F = {q3}
• δ = {<q0, c, q1>, <q1, a, q2>, <q2, t, q3>, <q2, r, q3>}
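The example automaton can be simulated in a few lines. This is a minimal sketch, assuming q0 is the initial state; the names `DELTA`, `START`, `FINAL` and `accepts` are chosen for the illustration. Because the transition is a relation rather than a function, the simulation tracks the set of states reachable so far.

```python
# Transition relation of the example, as (state, symbol, state) triples.
DELTA = {
    ("q0", "c", "q1"),
    ("q1", "a", "q2"),
    ("q2", "t", "q3"),
    ("q2", "r", "q3"),
}
START = "q0"      # assumed initial state
FINAL = {"q3"}


def accepts(word, delta=DELTA, start=START, final=FINAL):
    """Return True if some path labelled by word leads from start to a final state."""
    current = {start}
    for symbol in word:
        # Follow every transition that leaves a current state on this symbol.
        current = {q2 for (q1, a, q2) in delta if q1 in current and a == symbol}
    return bool(current & final)


for w in ["cat", "car", "ca", "cart"]:
    print(w, accepts(w))
```

The automaton accepts exactly the two strings cat and car.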


Finite State Automata

• The reflexive transitive extension of the transition relation δ is a new relation, δ̂, defined as follows:
– for every state q ∈ Q, (q, ε, q) ∈ δ̂
– for every string w ∈ Σ* and letter a ∈ Σ, if (q, w, q′) ∈ δ̂ and (q′, a, q′′) ∈ δ then (q, w · a, q′′) ∈ δ̂.


Finite State Automata

• Example: Paths

For the finite-state automaton of the preceding example (δ = {<q0, c, q1>, <q1, a, q2>, <q2, t, q3>, <q2, r, q3>}), δ̂ is the following set of triples:

<q0, ε, q0>, <q1, ε, q1>, <q2, ε, q2>, <q3, ε, q3>,

<q0, c, q1>, <q1, a, q2>, <q2, t, q3>, <q2, r, q3>,

<q0, ca, q2>, <q1, at, q3>, <q1, ar, q3>,

<q0, cat, q3>, <q0, car, q3>
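The set of triples above can be computed mechanically from the two defining clauses. The sketch below is one way to do it; `extended_relation` and its `max_len` bound are assumptions of this illustration (the bound is needed because the extended relation is infinite whenever the automaton has a cycle, which is not the case here).

```python
def extended_relation(states, delta, max_len):
    """Triples (q, w, q') of the extended relation with len(w) <= max_len."""
    # Clause 1: every state reaches itself on the empty string.
    hat = {(q, "", q) for q in states}
    frontier = set(hat)
    for _ in range(max_len):
        # Clause 2: extend each known path by one transition of delta.
        frontier = {
            (q, w + a, q2)
            for (q, w, q1) in frontier
            for (p, a, q2) in delta
            if p == q1
        }
        hat |= frontier
    return hat


Q = {"q0", "q1", "q2", "q3"}
DELTA = {("q0", "c", "q1"), ("q1", "a", "q2"), ("q2", "t", "q3"), ("q2", "r", "q3")}

for triple in sorted(extended_relation(Q, DELTA, 3), key=lambda t: (len(t[1]), t)):
    print(triple)
```

For this automaton the result has 13 triples, and raising `max_len` beyond 3 adds nothing, since no path is longer than three symbols.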


Finite State Automata

An extension: ε-moves.

• The transition relation δ is extended to:

δ ⊆ Q × (Σ ∪ {ε}) × Q

Example: Automata with ε-moves - an automaton accepting the language {do, undo, done, undone}:

[Figure: automaton with ε-moves]
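The diagram for this example is not available in the transcript, so the automaton below is a hypothetical one that accepts the same four words; its states and transitions are assumptions of this sketch, not the automaton from the slide. An ε-move is represented by the empty string as the transition label, and acceptance is checked by taking the ε-closure after every step.

```python
EPS = ""  # label used here for an ε-move

# Hypothetical automaton for {do, undo, done, undone}.
DELTA = {
    ("q0", "u", "q1"), ("q1", "n", "q2"),
    ("q0", EPS, "q2"),                     # skip the optional prefix "un"
    ("q2", "d", "q3"), ("q3", "o", "q4"),
    ("q4", "n", "q5"), ("q5", "e", "q6"),
    ("q4", EPS, "q6"),                     # skip the optional suffix "ne"
}


def eps_closure(states, delta):
    """All states reachable from the given states using ε-moves only."""
    closure = set(states)
    stack = list(states)
    while stack:
        q = stack.pop()
        for (p, a, r) in delta:
            if p == q and a == EPS and r not in closure:
                closure.add(r)
                stack.append(r)
    return closure


def accepts(word, delta=DELTA, start="q0", final=frozenset({"q6"})):
    current = eps_closure({start}, delta)
    for symbol in word:
        step = {r for (p, a, r) in delta if p in current and a == symbol}
        current = eps_closure(step, delta)
    return bool(current & final)


for w in ["do", "undo", "done", "undone", "un", "don"]:
    print(w, accepts(w))
```

The two ε-moves make the prefix "un" and the suffix "ne" optional, which is what lets one small automaton cover all four words.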


Formal language theory – definitions

• If L is a language then the reversal of L, denoted L^R, is the language {w | w^R ∈ L}.

• If L1 and L2 are languages, then L1 · L2 = {w1 · w2 | w1 ∈ L1 and w2 ∈ L2}.

• Example: Language operations
– Let L1 = {i, you, he, she, it, we, they}, L2 = {smile, sleep}.
– Then L1^R = {i, uoy, eh, ehs, ti, ew, yeht} and L1 · L2 = {ismile, yousmile, hesmile, shesmile, itsmile, wesmile, theysmile, isleep, yousleep, hesleep, shesleep, itsleep, wesleep, theysleep}.
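Both operations are one-line set comprehensions when the languages are finite. A small sketch reproducing the example; the function names are chosen for the illustration.

```python
def reverse_language(L):
    """L^R: the reversals of the strings of L."""
    return {w[::-1] for w in L}


def concat_languages(L1, L2):
    """L1 · L2 = { w1 · w2 | w1 in L1 and w2 in L2 }."""
    return {w1 + w2 for w1 in L1 for w2 in L2}


L1 = {"i", "you", "he", "she", "it", "we", "they"}
L2 = {"smile", "sleep"}

print(sorted(reverse_language(L1)))
print(sorted(concat_languages(L1, L2)))
```

The concatenation has 7 × 2 = 14 members here because no two pairs happen to produce the same string; in general |L1 · L2| can be smaller than |L1| × |L2|.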


Formal language theory – definitions

• If L is a language then L^0 = {ε}.
• Then, for i > 0, L^i = L · L^(i−1).
• Example: Language exponentiation

Let L be the set of words {bau, haus, hof, frau}. Then L^0 = {ε}, L^1 = L and L^2 = {baubau, bauhaus, bauhof, baufrau, hausbau, haushaus, haushof, hausfrau, hofbau, hofhaus, hofhof, hoffrau, fraubau, frauhaus, frauhof, fraufrau}.
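The recursive definition translates directly into a loop that starts from {ε} and concatenates L on the left i times. A minimal sketch; `power` is a name chosen for this example.

```python
def power(L, i):
    """L^i, with L^0 = {ε} and L^i = L · L^(i-1) for i > 0."""
    result = {""}                      # L^0 contains only the empty string
    for _ in range(i):
        result = {w1 + w2 for w1 in L for w2 in result}
    return result


L = {"bau", "haus", "hof", "frau"}
print(power(L, 0))
print(sorted(power(L, 2)))
```

For this L the second power has 16 members, matching the list in the example.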


Formal language theory – definitions

• The Kleene closure of L is denoted L* and is defined as L* = ⋃_{i=0}^{∞} L^i.

• L+ = ⋃_{i=1}^{∞} L^i.

• Example: Kleene closure
Let L = {dog, cat}. Observe that L^0 = {ε}, L^1 = {dog, cat}, L^2 = {catcat, catdog, dogcat, dogdog}, etc. Thus L* contains, among its infinite set of strings, the strings ε, cat, dog, catcat, catdog, dogcat, dogdog, catcatcat, catdogcat, dogcatcat, dogdogcat, etc.

• The notation Σ* should now become clear: it is simply a special case of L*, where L = Σ.
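Since L* is infinite for any L containing a non-empty string, a program can only enumerate it lazily. The generator below is a sketch of that idea: it yields L^0, then L^1, then L^2, and so on, skipping duplicates. The name `kleene_closure` and the choice to visit each level in sorted order are assumptions of this illustration.

```python
from itertools import count, islice, product


def kleene_closure(L):
    """Yield the strings of L* level by level: L^0, then L^1, then L^2, ..."""
    words = sorted(L)
    if not any(words):        # L is empty or {ε}: the closure is just {ε}
        yield ""
        return
    seen = set()
    for i in count(0):
        for factors in product(words, repeat=i):
            w = "".join(factors)
            if w not in seen:  # the same string can arise from several levels
                seen.add(w)
                yield w


print(list(islice(kleene_closure({"dog", "cat"}), 7)))
```

Taking the first seven strings gives ε followed by the members of L^1 and L^2 from the example.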


Markov Algorithms

A Markov Algorithm is a finite sequence P1, P2, ..., Pn of Markov productions to be applied to strings in a given alphabet according to the following rules. Let S be a given string. The sequence is searched to find the first production Pi whose antecedent occurs in S. If no such production exists, the operation of the algorithm halts without change in S. If there is a production in the algorithm whose antecedent occurs in S, the first such production is applied to S. If this is a conclusive production, the operation of the algorithm halts without further change in S. If this is a simple production, a new search is conducted using the string S' into which S has been transformed. If the operation of the algorithm ultimately ceases with a string S*, we say that S* is the result of applying the algorithm to S.


Markov Algorithms

Example:

Take the alphabet to be {a, b, c, d}. The algorithm is given below; ε denotes the empty string, whose antecedent occurs in every string (at its left end).

• Algorithm M1
M11: [conclusive] ad → dc
M12: [simple] ba → ε
M13: [simple] a → bc
M14: [simple] bc → bba
M15: [simple] ε → a

Taking S = "dcb" we apply the algorithm:
by M15 dcb becomes adcb
by M11 adcb becomes dccb and halts.
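The search-and-apply loop described for Markov algorithms is short to write down. The sketch below is one possible interpreter, not a reference implementation: productions are given as (antecedent, consequent, is_conclusive) triples, the empty string stands for the empty antecedent or consequent, and a production is applied at the leftmost occurrence of its antecedent (the usual convention, assumed here). The `max_steps` guard is an addition of this example, since a Markov algorithm need not terminate.

```python
def run_markov(productions, s, max_steps=1000):
    """Apply a Markov algorithm to s and return the resulting string."""
    for _ in range(max_steps):
        for antecedent, consequent, conclusive in productions:
            i = s.find(antecedent)          # leftmost occurrence, -1 if none
            if i != -1:
                s = s[:i] + consequent + s[i + len(antecedent):]
                if conclusive:
                    return s                # conclusive production: halt
                break                       # simple production: search again
        else:
            return s                        # no antecedent occurs in s: halt
    raise RuntimeError("step limit reached without halting")


# Algorithm M1 over the alphabet {a, b, c, d}.
M1 = [
    ("ad", "dc", True),    # M11 [conclusive]
    ("ba", "", False),     # M12 [simple]
    ("a", "bc", False),    # M13 [simple]
    ("bc", "bba", False),  # M14 [simple]
    ("", "a", False),      # M15 [simple], empty antecedent
]

print(run_markov(M1, "dcb"))  # dccb
```

On "dcb" only M15 applies at first, giving adcb, after which M11 fires and halts with dccb, as in the worked example.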


Markov Algorithms

Example:

Let α be a marker not in the alphabet. If S is a string in the alphabet, the result of applying algorithm M3 to S is the string SA.

Algorithm M3
M31: [interchange] αξ → ξα, ξ a member of the alphabet
M32: [conclusive] α → A
M33: ε → α

Here ε denotes the empty string. Since S initially does not contain α, the third production is used first and places α at the left end of S. The first production is then used to move α past the symbols in S. If S contains n occurrences of symbols, then after n further steps we obtain the string Sα. At this point the first production no longer applies, and the second production produces SA. Since this production is conclusive, the string SA is then the result.

Markov Algorithms

In the preceding example, we have introduced a new notation. Namely, in the first production we have used the variable ξ, which ranges over the symbols in the alphabet. Thus the first line is not really a production, but rather a production schema, denoting all the productions which can be obtained by substituting symbols of the alphabet for ξ.

Because of the manner in which Markov algorithms are used, the order in which the productions are written is vital. If the first two lines of algorithm M3 were interchanged, the result would be to transform S into AS, rather than into SA, and the productions represented by αξ → ξα would never be used.
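Both points, the production schema and the importance of order, can be checked with a small experiment. The sketch below makes several assumptions: the marker is written "α", the schema is expanded into one concrete production per alphabet symbol, productions are (antecedent, consequent, is_conclusive) triples applied at the leftmost occurrence, and the alphabet {a, b, c} is chosen only for the demonstration.

```python
def run_markov(productions, s, max_steps=1000):
    """Apply the first applicable production repeatedly until the algorithm halts."""
    for _ in range(max_steps):
        for antecedent, consequent, conclusive in productions:
            i = s.find(antecedent)
            if i != -1:
                s = s[:i] + consequent + s[i + len(antecedent):]
                if conclusive:
                    return s
                break
        else:
            return s
    raise RuntimeError("step limit reached without halting")


def m3(alphabet, marker="α", swapped=False):
    """Algorithm M3; swapped=True interchanges its first two lines."""
    # The schema stands for one interchange production per alphabet symbol.
    schema = [(marker + x, x + marker, False) for x in alphabet]
    conclusive = [(marker, "A", True)]
    introduce = [("", marker, False)]   # empty antecedent: insert the marker
    if swapped:
        return conclusive + schema + introduce
    return schema + conclusive + introduce


print(run_markov(m3("abc"), "cab"))                # cabA
print(run_markov(m3("abc", swapped=True), "cab"))  # Acab
```

With the original order the marker travels to the right end before it is replaced; with the first two lines interchanged it is replaced immediately and the interchange productions are never reached.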


Markov Algorithms

Example:

Another procedure which is quite common is that of reversing a string of characters. We do this by moving the first character to the end as before, then moving the next character down to the position just preceding the first character, and so on. Markers: α, β

Algorithm M10 (ξ, φ members of the alphabet; ε denotes the empty string)
M101: [conclusive] αβ → ε
M102: αξβ → βξ
M103: αξφ → φαξ
M104: α → β
M105: ε → α

Markov Algorithms

Illustrating this algorithm on the string "ABCD" we have

by M105 => α A B C D
by M103 => B α A C D
by M103 => B C α A D
by M103 => B C D α A
by M104 => B C D β A
by M105 => α B C D β A
by M103 => C α B D β A
by M103 => C D α B β A
by M102 => C D β B A
by M105 => α C D β B A
by M103 => D α C β B A
by M102 => D β C B A
by M105 => α D β C B A
by M102 => β D C B A
by M105 => α β D C B A
by M101 => D C B A
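The trace can be replayed by a program. The marker symbols are not legible in the transcript, so the sketch below assumes markers written "α" and "β" and reads the five lines of M10 as αβ → ε (conclusive), αξβ → βξ, αξφ → φαξ, α → β and ε → α; this reading is a reconstruction, chosen because it reproduces the sequence of production labels in the trace above. Each production carries its label so that the run can be compared with the trace step by step.

```python
def run_with_trace(productions, s, max_steps=1000):
    """Run a Markov algorithm, recording (label, string) after every step."""
    trace = []
    for _ in range(max_steps):
        for label, antecedent, consequent, conclusive in productions:
            i = s.find(antecedent)      # leftmost occurrence, -1 if none
            if i != -1:
                s = s[:i] + consequent + s[i + len(antecedent):]
                trace.append((label, s))
                if conclusive:
                    return s, trace
                break
        else:
            return s, trace
    raise RuntimeError("step limit reached without halting")


def m10(alphabet, a="α", b="β"):
    """Reconstructed reversal algorithm; schemas are expanded over the alphabet."""
    rules = [("M101", a + b, "", True)]
    rules += [("M102", a + x + b, b + x, False) for x in alphabet]
    rules += [("M103", a + x + y, y + a + x, False)
              for x in alphabet for y in alphabet]
    rules += [("M104", a, b, False), ("M105", "", a, False)]
    return rules


result, trace = run_with_trace(m10("ABCD"), "ABCD")
for label, s in trace:
    print("by", label, "=>", s)
print(result)
```

Expanding the schemas in an arbitrary order is harmless here because at most one α is present at any time, so at most one instance of each schema can match.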


Other Concluding Remarks

A PSYCHOLOGICAL TIP

Whenever you're called on to make up your mind,
and you're hampered by not having any,
the best way to solve the dilemma, you'll find,
is simply by spinning a penny.

No -- not so that chance shall decide the affair
while you're passively standing there moping;
but the moment the penny is up in the air,
you suddenly know what you're hoping.