CSA4050: Advanced Topics in NLP


Computational Morphology II

Introduction: Two-Level Morphology


4.12.2001 CSA405 Lecture 2lev 2

The Problem

• So far we have assumed that words are formed by strict concatenation of component morphemes, as in the following example: en + large + ment + s

• This assumption is convenient because it imposes a 1:1 correspondence between segmentation of the string and lookup of lexical items (which may be of different types, e.g. roots, affixes, particles, etc.)

• The problem is that this is an unrealistic assumption to make.


English Spelling Rules

• Final consonant doubling: begin + ing = beginning

• s to es: church + s = churches

• y to i: carry + ed = carried

• Final e deletion: rake + ing = raking

• n to m: in + practical = impractical
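The five rules above can be sketched as rewrites over a lexical string in which "+" marks a morpheme boundary. This is only an illustration: the regular expressions and the function name `spell` are assumptions made for this sketch, and the patterns cover little more than the slide's own examples. In particular, the consonant doubling pattern ignores stress, so it would wrongly double the n in open + ing.

```python
import re

# (pattern, replacement) pairs applied to a lexical string such as "carry+ed".
# These patterns are illustrative assumptions, not a full account of English.
RULES = [
    (r"([sxz]|ch|sh)\+s$", r"\1es"),                        # s to es
    (r"([^aeiou])y\+ed$", r"\1ied"),                        # y to i
    (r"e\+ing$", "ing"),                                    # final e deletion
    (r"^in\+([pbm])", r"im\1"),                             # n to m
    (r"([^aeiou][aeiou])([bdgmnprt])\+ing$", r"\1\2\2ing"),  # consonant doubling
]

def spell(lexical: str) -> str:
    """Map a lexical string with '+' boundaries to a surface spelling."""
    for pattern, replacement in RULES:
        lexical = re.sub(pattern, replacement, lexical)
    # Any boundary that no rule consumed is simply dropped (plain concatenation).
    return lexical.replace("+", "")

for word in ["begin+ing", "church+s", "carry+ed", "rake+ing",
             "in+practical", "en+large+ment+s"]:
    print(word, "->", spell(word))
```

Where no rule fires, as in en + large + ment + s, the result is plain concatenation, which is the assumption the previous slide started from.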


Semitic Languages

dħalt, dħalt, daħal, daħlet, dħalna, dħaltu, daħlu

• Deletion of vowel

• Changes or insertion of vowel

• Non-concatenative morphology


Handling Spelling Rules

• Such phenomena usually occur at morpheme boundaries, and prevent direct lookup of the surface string in the lexicon.

• The solution is to suppose that two strings are involved:

• The surface string: that which appears on the page

• The lexical string: that which is used to index items in the lexicon.

• What kind of mapping exists between the two strings?


Lexical Transformations

[Diagram: the SURFACE STRING and the LEXICAL STRING, related by a mapping]


Phonological Rules

• Morphological rules are a reflection of phonological changes.

• Assumption: the lexical/surface transformation is rule-governed.

• Phonological rule systems were extensively studied from the point of view of generative linguistics under Chomsky during the 1970s.


Typical Phonological Rule

• A typical rule has the following shape: Phon1 -> Phon2 // Lcontext __ Rcontext

• Meaning: phoneme Phon1 is transformed to phoneme Phon2 if it occurs between left context Lcontext and right context Rcontext

• Example: [B] -> [P] // __ #

• B is pronounced like P if it is word-final (cf. kelb)
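A rule of this shape can be sketched with regular-expression lookarounds. The function name `apply_rule` and its parameters are assumptions made for this illustration, and letters stand in for phonemes. The word is wrapped in "#" so that the boundary symbol of the example can be used as an ordinary context.

```python
import re

def apply_rule(word, phon1, phon2, lcontext="", rcontext=""):
    """Rewrite phon1 as phon2 wherever it stands between lcontext and rcontext.

    '#' in a context stands for the word boundary, as in the slide's notation.
    """
    bounded = f"#{word}#"
    pattern = re.escape(phon1)
    if lcontext:
        pattern = f"(?<={re.escape(lcontext)})" + pattern   # left context
    if rcontext:
        pattern = pattern + f"(?={re.escape(rcontext)})"    # right context
    # The lambda keeps phon2 literal even if it contains a backslash.
    return re.sub(pattern, lambda m: phon2, bounded).strip("#")

# [B] -> [P] // __ #
print(apply_rule("kelb", "b", "p", rcontext="#"))
print(apply_rule("kelba", "b", "p", rcontext="#"))
```

In the first call the b is word-final and is rewritten; in the second it is followed by a vowel, so the right context does not match and the word is left unchanged.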


Properties of Phonological Rules within the Generative Tradition

• Rules are rewrite rules

• Rules apply sequentially

• Rules are ordered

• Rules may act upon their own output (cyclic rules)

• Effects of rules are not always reversible

• Collections of rules have Turing power
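Two of these properties, ordering and irreversibility, can be seen in a very small cascade. The two rules below are toy rules invented for this sketch (they are not claimed to be rules of any particular language), and `cascade` is a hypothetical helper name.

```python
import re
from functools import reduce

def devoice(s):
    """Toy rule: b -> p in word-final position."""
    return re.sub(r"b$", "p", s)

def drop_final_a(s):
    """Toy rule: delete a word-final a."""
    return re.sub(r"a$", "", s)

def cascade(rules, s):
    """Apply the rules sequentially, each one to the output of the previous one."""
    return reduce(lambda acc, rule: rule(acc), rules, s)

# The same two rules in different orders give different results.
print(cascade([devoice, drop_final_a], "kelba"))
print(cascade([drop_final_a, devoice], "kelba"))

# Irreversibility: two different inputs collapse onto one output,
# so the input cannot be recovered from the output alone.
print(cascade([drop_final_a, devoice], "kelb"))
```

With devoicing first, the b is not yet word-final when the rule is tried, so only the deletion applies. With deletion first, the deletion feeds the devoicing rule.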


C. Douglas Johnson (1972)

• A theory of phonology with the right properties could be implemented using only finite state machinery.

• Each rule is associated with a finite state transducer (FST).

• All rules operate simultaneously, thus eliminating the delicate problems of ordering associated with sequential cascades of rules.

• The collection of FS rules operating in parallel is mathematically equivalent to a single FST representing the intersection of the component FSTs.

• Johnson’s work was mainly theoretical. He was not involved with computational issues, in particular the issue of computing the intersection of multiple FSTs.


Finite State Machinery

• FS Automaton

– For recognition and generation of regular languages.

– All operations over regular languages have corresponding operations over corresponding FSAs

• FS Transducer

– Like FSAs but with output as well as input

– For recognition and generation of regular relations.

– Some operations over regular languages do not have corresponding operations over corresponding FSTs
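One way to see the relationship between the two machines is to treat an FST as an FSA whose symbols are lexical:surface pairs. The sketch below uses a single deterministic runner for both. The function `accepts` and the two transition tables are assumptions made for this illustration only.

```python
import string

def accepts(transitions, start, finals, symbols):
    """Run a deterministic finite-state machine over a sequence of symbols."""
    state = start
    for symbol in symbols:
        state = transitions.get((state, symbol))
        if state is None:          # no transition: the input is rejected
            return False
    return state in finals

# FSA: symbols are single characters. This one recognises the language (ab)*.
AB_STAR = {(0, "a"): 1, (1, "b"): 0}

# FST: symbols are (lexical, surface) pairs. This toy relation pairs every
# lowercase letter with itself and additionally allows lexical y as surface i.
Y_TO_I = {(0, (c, c)): 0 for c in string.ascii_lowercase}
Y_TO_I[(0, ("y", "i"))] = 0

print(accepts(AB_STAR, 0, {0}, "abab"))
print(accepts(Y_TO_I, 0, {0}, zip("carry", "carri")))
```

The FSA decides membership of a string in a regular language, while the FST decides whether a pair of strings belongs to a regular relation.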


Kimmo Koskenniemi (1983)

• Worked on the morphology of Finnish and devised a system of finite state transducers.

• Also devised a computational framework for executing collections of finite state transducers in parallel.


Koskenniemi’s Model

[Diagram: FST1, FST2, FST3, …, FSTn operating in parallel between the SURFACE STRING and the LEXICAL STRING]

• Interpreter executes round-robin, keeping FSTs in lock-step before moving head
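The lock-step idea can be sketched as follows. This is a simplified illustration rather than Koskenniemi's actual implementation: the three rule machines, the use of "0" as a surface null symbol, and all names (`RuleFST`, `run_parallel`, and the step functions) are assumptions made for the example. Each machine reads the same lexical:surface pair before the head moves on, and a pairing is accepted only if every machine accepts it.

```python
from typing import Callable, NamedTuple, Optional, Tuple

Pair = Tuple[str, str]  # (lexical symbol, surface symbol)

class RuleFST(NamedTuple):
    start: str
    step: Callable[[str, Pair], Optional[str]]  # returns None when blocked
    is_final: Callable[[str], bool]

def feasible_step(state, pair):
    """Only identity pairs, +:0 and n:m are allowed at all."""
    lex, surf = pair
    return "ok" if lex == surf or pair in {("+", "0"), ("n", "m")} else None

def boundary_step(state, pair):
    """Lexical + is always surface 0, and surface 0 is always lexical +."""
    lex, surf = pair
    return "ok" if (lex == "+") == (surf == "0") else None

def n_to_m_step(state, pair):
    """Toy rule: lexical n surfaces as m exactly when followed by +:0 p:p."""
    if state == "m+":                          # saw n:m, boundary must follow
        return "m+p" if pair == ("+", "0") else None
    if state == "m+p":                         # saw n:m +:0, p:p must follow
        return "ok" if pair == ("p", "p") else None
    if state == "n+" and pair == ("p", "p"):   # n:n +:0 p:p is forbidden
        return None
    if state == "n" and pair == ("+", "0"):
        return "n+"
    if pair == ("n", "m"):
        return "m+"
    if pair == ("n", "n"):
        return "n"
    return "ok"

RULES = [
    RuleFST("ok", feasible_step, lambda s: True),
    RuleFST("ok", boundary_step, lambda s: True),
    RuleFST("ok", n_to_m_step, lambda s: s not in {"m+", "m+p"}),
]

def run_parallel(fsts, lexical, surface):
    """Accept the pairing only if every FST accepts it, moving in lock-step."""
    if len(lexical) != len(surface):
        return False
    states = [f.start for f in fsts]
    for pair in zip(lexical, surface):
        # Every FST consumes the same pair before the head moves on.
        states = [f.step(s, pair) for f, s in zip(fsts, states)]
        if any(s is None for s in states):
            return False
    return all(f.is_final(s) for f, s in zip(fsts, states))

print(run_parallel(RULES, "in+practical", "im0practical"))
print(run_parallel(RULES, "in+practical", "in0practical"))
```

Because no rule sees another rule's output, the order of the entries in `RULES` has no effect on the result.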


Martin Kay and Ron Kaplan (1981)

• Kay and Kaplan (both at Xerox PARC) were very interested in the computational issues underlying morphological processing.

• In particular, they studied the problems of

– How to combine FSTs in parallel (computing the intersection of regular relations)

– How to combine FSTs in series (computing the composition of FSTs).

• Restrictions on rules have pleasant consequences
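The difference between the two ways of combining can be illustrated at the level of the relations themselves. The sketch below represents a relation as a finite set of string pairs, which is an assumption made purely for illustration: real FSTs encode infinite relations, and the automaton constructions that Kay and Kaplan studied are not shown here. The function names and the example pairs are likewise hypothetical.

```python
def compose(r1, r2):
    """Series: x relates to z if some y has (x, y) in r1 and (y, z) in r2."""
    return {(x, z) for (x, y1) in r1 for (y2, z) in r2 if y1 == y2}

def intersect(r1, r2):
    """Parallel: a pair survives only if both relations allow it."""
    return r1 & r2

# Series: a y-to-i step feeds a boundary-deletion step.
y_to_i = {("carry+ed", "carri+ed")}
drop_boundary = {("carri+ed", "carried")}
print(compose(y_to_i, drop_boundary))

# Parallel: each relation constrains the same lexical:surface pairing.
rule_a = {("in+p", "im0p"), ("in+p", "in0p")}
rule_b = {("in+p", "im0p"), ("in+a", "in0a")}
print(intersect(rule_a, rule_b))
```

In series the intermediate string carri+ed disappears from the result, whereas in parallel both relations talk about the same two strings directly.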


Restrictions on Rules

• With the restriction that a rule shall not apply to its own output, Kaplan and Kay showed that the result of combining the corresponding relations under the operations of intersection, composition and union remains within a closed subclass of those computable by FSTs.

• They then spent many years designing and implementing a calculus for describing and combining FSTs based upon regular expressions.


Summary

• Generative Phonology

• Chomsky: Generative Tradition

• Multilevel Cascades of Rules

• Johnson: Parallel Rules

• Kaplan/Kay: Calculus

• Koskenniemi: Parallel Rules

• KIMMO: PC-Kimmo

• Xerox Tools: xfst/twolc/lexc