5.11.2002 CSA3050: NLP Algorithms: Finite State Transducers for Morphological Parsing


Resumé

• FSAs recognise exactly the regular languages.

• FSTs compute exactly the regular relations (relations between pairs of regular languages).

• FSTs are like FSAs, but each arc carries a complex label: a pair of symbols, one for each tape.

• We can use FSTs to transduce between surface and lexical levels.

Dotted Pair Notation

1) FSA recogniser for "fox": f o x

2) FST transducers for fox/fox; goose/geese: f:f o:o x:x and g:g o:e o:e s:s e:e

Dotted Pair Notation (2)

• By convention, x:y pairs lexical symbol x with surface symbol y

• Within FSTs we often encounter "default pairs" of the form x:x. By convention these are written simply as "x".

Example (goose/geese): g o:e o:e s e
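
The dotted pair convention can be made concrete with a short Python sketch. The helper names (`parse_pairs`, `lexical_side`, `surface_side`) are hypothetical and not part of the slides; the sketch only expands the default-pair shorthand and reads off the two tapes.

```python
def parse_pairs(spec):
    """Turn dotted-pair notation into a list of (lexical, surface) pairs."""
    pairs = []
    for token in spec.split():
        if ":" in token:
            lexical, surface = token.split(":")
        else:
            # A default pair "x" is shorthand for "x:x".
            lexical = surface = token
        pairs.append((lexical, surface))
    return pairs


def lexical_side(pairs):
    """Concatenate the lexical (left-hand) symbols."""
    return "".join(lexical for lexical, _ in pairs)


def surface_side(pairs):
    """Concatenate the surface (right-hand) symbols."""
    return "".join(surface for _, surface in pairs)


goose = parse_pairs("g o:e o:e s e")
print(lexical_side(goose), "->", surface_side(goose))
```

With this reading, "f o x" and "f:f o:o x:x" denote the same sequence of pairs.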

FSA for Number Inflection

How can we augment this to produce an analysis?

3 Steps

1. Create a transducer Tnum for noun number inflection. This will add number and category information given word classes as input.

2. Create a transducer Tstems mapping words to word classes.

3. Hook the two together.

Tnum example

“lexical”: reg-noun-stem +N +PL

“intermediate”: reg-noun-stem ^ s #

1. Tnum: Noun Number Inflection

• multi-character symbols

• morpheme boundary ^

• word boundary #
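
A minimal sketch of Tnum as a deterministic transducer over multi-character symbols, written in Python. The state names, the `TNUM_ARCS` table and the `tnum` function are hypothetical. The `+SG` arc is an assumption that does not appear on the slides, and letting one arc emit several output symbols is a simplification of a transducer that would emit them one pair at a time.

```python
# (state, input symbol) -> (next state, emitted output symbols)
TNUM_ARCS = {
    ("q0", "reg-noun-stem"): ("q1", ["reg-noun-stem"]),
    ("q1", "+N"): ("q2", []),                  # +N maps to nothing (epsilon)
    ("q2", "+PL"): ("qf", ["^", "s", "#"]),    # morpheme boundary, s, word boundary
    ("q2", "+SG"): ("qf", ["#"]),              # assumption: singular adds only '#'
}
FINAL_STATES = {"qf"}


def tnum(lexical_symbols):
    """Map a lexical tape (list of symbols) to an intermediate tape, or None."""
    state, output = "q0", []
    for symbol in lexical_symbols:
        if (state, symbol) not in TNUM_ARCS:
            return None  # no arc: the input is rejected
        state, emitted = TNUM_ARCS[(state, symbol)]
        output.extend(emitted)
    return output if state in FINAL_STATES else None


print(tnum(["reg-noun-stem", "+N", "+PL"]))
```

Each symbol such as `+PL` or `reg-noun-stem` is treated as a single token on the tape, which is what "multi-character symbols" amounts to in practice.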

Tstems example

“intermediate”: reg-noun-stem #

“surface”: f:f o:o x:x # (fox) or d:d o:o g:g # (dog)

Tstems example

“intermediate”: irreg-pl-noun-form #

“surface”: m o:i u:ε s:c e # (mouse/mice) or s h e e p # (sheep)

2. Tstems Lexicon

Hooking Together

• There are two ways to hook the two transducers together

• Cascading: feeding the output of one transducer into the input of the other and running them in series.

• Composition: composing the two transducers together mathematically to create a third, equivalent transducer.

Hooking Together: cascading

lexical: reg-noun-stem +N +PL

(Tnum)

intermediate: reg-noun-stem ^ s #

(Tstems)

surface: fox/dog s #
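
Cascading can be sketched in Python as running a list of stages in series, where each stage maps one tape to a list of possible output tapes. Everything named here (`TNUM_TABLE`, `STEMS`, `tnum_stage`, `tstems_stage`, `cascade`) is hypothetical, and the Tnum stage is a one-entry lookup table standing in for a real transducer. The morpheme boundary is kept on the output tape, since removing it is the job of the spelling rules.

```python
from itertools import product

# Stand-in for Tnum: one lexical tape and its intermediate tape.
TNUM_TABLE = {
    ("reg-noun-stem", "+N", "+PL"): ("reg-noun-stem", "^", "s", "#"),
}
# Stand-in for the Tstems lexicon: word class -> stems.
STEMS = {"reg-noun-stem": ("fox", "dog")}


def tnum_stage(tape):
    """Return the intermediate tapes for a lexical tape (empty list if none)."""
    return [TNUM_TABLE[tape]] if tape in TNUM_TABLE else []


def tstems_stage(tape):
    """Replace each word-class symbol by every stem it covers."""
    options = [STEMS.get(symbol, (symbol,)) for symbol in tape]
    return [tuple(choice) for choice in product(*options)]


def cascade(stages, tape):
    """Feed every output of one stage into the next stage."""
    tapes = [tape]
    for stage in stages:
        tapes = [out for current in tapes for out in stage(current)]
    return tapes


print(cascade([tnum_stage, tstems_stage], ("reg-noun-stem", "+N", "+PL")))
```

Because Tstems relates one word class to several stems, the cascade returns a list of tapes rather than a single tape.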

Composition of Relations

• Let R and S be binary relations.

• The composition of R and S, written R ∘ S, is defined as:

• (a,c) ∈ R ∘ S if and only if there is some b such that (a,b) ∈ R and (b,c) ∈ S, for all a and c.

• Transducers can also be composed
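
For finite relations the definition can be checked directly. The following Python sketch represents a relation as a set of pairs; the `compose` function and the example strings are illustrative assumptions, not taken from the slides.

```python
def compose(r, s):
    """Return R ∘ S: all (a, c) with some b such that (a, b) in R and (b, c) in S."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}


# Illustrative relations: lexical -> intermediate, intermediate -> surface.
R = {("fox +N +PL", "fox^s#"), ("dog +N +PL", "dog^s#")}
S = {("fox^s#", "foxes"), ("dog^s#", "dogs")}

print(sorted(compose(R, S)))
```

Composing transducers yields a single machine that computes this composed relation without ever writing out the intermediate tape.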

Tnum ∘ Tstems

English Spelling Rules

• consonant doubling: beg / begging

• y replacement: try/tries

• k insertion: panic/panicked

• e deletion: make/making

• e insertion: watch/watches

• Each rule can be stated in more detail ...

e Insertion Rule

• Insert an e on the surface tape just when the lexical tape has a morpheme ending in x, s, z, or ch and the next and final morpheme is -s

• Stated formally: ε → e / [x|s|z|ch] ^ __ s #
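
The effect of the rule on intermediate-level strings can be approximated with a regular expression. This Python sketch is a stand-in for the rule transducer, not the transducer itself; the names `E_INSERTION` and `intermediate_to_surface` are hypothetical, and the final step that deletes the remaining boundary symbols is an added assumption about how the surface form is read off.

```python
import re

# Left context: x, s, z or ch followed by the morpheme boundary '^'.
# Right context (lookahead): the final morpheme 's' and the word boundary '#'.
E_INSERTION = re.compile(r"(x|s|z|ch)\^(?=s#)")


def intermediate_to_surface(intermediate):
    """Apply e insertion, then drop the remaining boundary symbols."""
    with_e = E_INSERTION.sub(r"\1e", intermediate)
    return with_e.replace("^", "").replace("#", "")


for word in ["fox^s#", "watch^s#", "dog^s#", "fox#"]:
    print(word, "->", intermediate_to_surface(word))
```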

e insertion over 3 levels

The rule corresponds to the mapping between surface and intermediate levels.

e insertion as an FST

Incorporating Spelling Rules

• Spelling rules, each corresponding to an FST, can be run in parallel provided that they are "aligned".

• The set of spelling rules is positioned between the surface level and the intermediate level.

• Parallel execution of FSTs can be carried out:

– by simulation: in this case the FSTs must first be aligned.

– by first constructing a single FST corresponding to their intersection.
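
Parallel execution by simulation can be sketched in Python as stepping several rule automata in lockstep over one aligned sequence of intermediate:surface pairs, accepting only if every rule accepts. The two toy rules below are hypothetical simplifications, not the rules from the slides: the alignment treats ^:e as a single pair, ε is written as the empty string, and the multi-character context "ch" is left out.

```python
SIBILANTS = {"x", "s", "z"}


def boundary_rule(state, pair):
    """Toy rule: '^' surfaces as epsilon or e; every other symbol maps to itself."""
    intermediate, surface = pair
    if intermediate == "^":
        return 0 if surface in ("", "e") else None
    return 0 if intermediate == surface else None


def e_context_rule(state, pair):
    """Toy rule: the pair ^:e is allowed only directly after x, s or z."""
    if pair == ("^", "e"):
        return 0 if state == 1 else None
    return 1 if pair[0] in SIBILANTS else 0


RULES = [
    {"start": 0, "finals": {0}, "delta": boundary_rule},
    {"start": 0, "finals": {0, 1}, "delta": e_context_rule},
]


def accepts_in_parallel(rules, pairs):
    """Step all rules over the same aligned pairs; None means a rule is stuck."""
    states = [rule["start"] for rule in rules]
    for pair in pairs:
        states = [rule["delta"](state, pair) for rule, state in zip(rules, states)]
        if any(state is None for state in states):
            return False
    return all(state in rule["finals"] for rule, state in zip(rules, states))


foxes = [("f", "f"), ("o", "o"), ("x", "x"), ("^", "e"), ("s", "s")]
print(accepts_in_parallel(RULES, foxes))
```

Tracking one state per rule at each step is equivalent to following a single state of the product (intersection) machine, which is why the two options above give the same result.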

Putting it all together

Execution of the FSTi takes place in parallel.

Kaplan and Kay: The Xerox View

FSTi are aligned but separate

FSTi intersected together

Operations over FSTs

• We can perform operations over FSTs which yield other FSTs:

– Inversion

– Union

– Composition

• The inversion of T, written T⁻¹, simply computes the inverse mapping to T.

Inversion

T: lexical c a t ^ PL, surface c a t s

T⁻¹: surface c a t s, lexical c a t ^ PL

Inversion

• To invert a transducer

– we switch the order of the complex symbols, i.e. every i:o becomes o:i

– or we leave the transducer alone, and slightly change the parsing algorithm.

• Practical consequences:

– The transducer is reversible

– We can use exactly the same transducer to perform either analysis or generation.
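
Inversion by swapping labels is short enough to sketch in Python. Here a transducer is reduced to a list of paths, each path a list of input:output pairs with ε written as the empty string. The names `T`, `invert` and `transduce`, and the particular alignment chosen for cat^PL / cats (^:ε and PL:s), are assumptions made for illustration.

```python
# One path pairing lexical "c a t ^ PL" with surface "c a t s".
T = [
    [("c", "c"), ("a", "a"), ("t", "t"), ("^", ""), ("PL", "s")],
]


def invert(paths):
    """Swap every i:o label to o:i."""
    return [[(out, inp) for inp, out in path] for path in paths]


def transduce(paths, input_symbols):
    """Return the output tape of every path whose input side matches."""
    results = []
    for path in paths:
        input_side = [inp for inp, _ in path if inp != ""]
        if input_side == list(input_symbols):
            results.append([out for _, out in path if out != ""])
    return results


print(transduce(T, ["c", "a", "t", "^", "PL"]))       # generation
print(transduce(invert(T), ["c", "a", "t", "s"]))     # analysis
```

The same table serves both directions, which is the practical sense in which the transducer is reversible.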

Closure Properties of FSTs

Relations computed by FSTs are

– closed under

• inversion

• union

• composition

– not closed (in general) under

• intersection (however, intersection is possible provided that we restrict the class of transducers)

• complementation

• subtraction
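
A standard counterexample, not taken from the slides, shows why intersection fails in general. Both relations below are regular: the first is computed by the pair expression (a:b)* (ε:c)* and the second by (ε:b)* (a:c)*.

```latex
R_1 = \{\, (a^n,\; b^n c^m) : n, m \ge 0 \,\}, \qquad
R_2 = \{\, (a^n,\; b^m c^n) : n, m \ge 0 \,\}

R_1 \cap R_2 = \{\, (a^n,\; b^n c^n) : n \ge 0 \,\}
```

The output side of the intersection is the language of strings b^n c^n, which is not regular, so no FST computes it. The problem comes from ε labels letting the two tapes drift out of step. One restriction that restores closure, for instance, is to allow only transducers whose arcs always pair exactly one input symbol with one output symbol.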