6/16/2015 Morphology and Finite-state Transducers Part 2, ICS 482: Natural Language Processing, Lecture 6, Husni Al-Muhtaseb


Morphology and Finite-state Transducers Part 2

ICS 482: Natural Language Processing

Lecture 6

Husni Al-Muhtaseb

In the name of God, the Most Gracious, the Most Merciful

NLP Credits and Acknowledgment

These slides were adapted from presentations of the authors of the book SPEECH and LANGUAGE PROCESSING: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, with some modifications from presentations found on the Web by several scholars, including the following:

Husni Al-Muhtaseb, James Martin, Jim Martin, Dan Jurafsky, Sandiway Fong, Song young in, Paula Matuszek, Mary-Angela Papalaskari, Dick Crouch, Tracy Kin, L. Venkata Subramaniam, Martin Volk, Bruce R. Maxim, Jan Hajič, Srinath Srinivasa, Simeon Ntafos, Paolo Pirjanian, Ricardo Vilalta, Tom Lenaerts, Heshaam Feili, Björn Gambäck, Christian Korthals, Thomas G. Dietterich, Devika Subramanian, Duminda Wijesekera, Lee McCluskey, David J. Kriegman, Kathleen McKeown, Michael J. Ciaraldi, David Finkel, Min-Yen Kan, Andreas Geyer-Schulz, Franz J. Kurfess, Tim Finin, Nadjet Bouayad, Kathy McCoy, Hans Uszkoreit, Azadeh Maghsoodi, Khurshid Ahmad, Staffan Larsson, Robert Wilensky, Feiyu Xu, Jakub Piskorski, Rohini Srihari, Mark Sanderson, Andrew Elks, Marc Davis, Ray Larson, Jimmy Lin, Marti Hearst, Andrew McCallum, Nick Kushmerick, Mark Craven, Chia-Hui Chang, Diana Maynard, James Allan, Martha Palmer, Julia Hirschberg, Elaine Rich, Christof Monz, Bonnie J. Dorr, Nizar Habash, Massimo Poesio, David Goss-Grubbs, Thomas K Harris, John Hutchins, Alexandros Potamianos, Mike Rosner, Latifa Al-Sulaiti, Giorgio Satta, Jerry R. Hobbs, Christopher Manning, Hinrich Schütze, Alexander Gelbukh, Gina-Anne Levow, Guitao Gao, Qing Ma, Zeynep Altan

If your name is missing, please contact me: muhtaseb At Kfupm.Edu.sa

Previous Lectures

• 1 Pre-start questionnaire
• 2 Introduction and Phases of an NLP system
• 2 NLP Applications
• 3 Chatting with Alice
• 3 Regular Expressions, Finite State Automata
• 3 Regular languages
• 4 Regular Expressions & Regular languages
• 4 Deterministic & Non-deterministic FSAs
• 5 Morphology: Inflectional & Derivational
• 5 Parsing

Today’s Lecture

• Review of Morphology
• Finite State Transducers
• Stemming & Porter Stemmer

Reminder: Quiz 1 Next class

• Next time: Quiz
  – Ch 1!, 2, & 3 (Lecture presentations)
  – Do you need a sample quiz?
    • What is the difference between a sample and a template?
    • Let me think – It might appear at the WebCT site late on Saturday.

Introduction

[Figure: formalisms and the language levels they are used for]
• State Machines (no probability): Finite State Automata (and Regular Expressions), Finite State Transducers → (English) Morphology
• Rule systems (and prob. version), e.g., (Prob.) Context-Free Grammars → Syntax
• Logical formalisms (First-Order Logics) → Semantics
• AI planners → Pragmatics, Discourse and Dialogue

English Morphology

• Morphology is the study of the ways that words are built up from smaller meaningful units called morphemes
• Morpheme classes
  – Stems: the core meaning-bearing units
  – Affixes: adhere to stems to change their meanings and grammatical functions
  – Example: unhappily (un- + happy + -ly)

English Morphology

• We can also divide morphology up into two broad classes
  – Inflectional
  – Derivational
• Non-English
  – Concatenative Morphology
  – Templatic Morphology

Word Classes

• By word class, we have in mind familiar notions like noun, verb, adjective and adverb
• Why be concerned with word classes?
  – The way that stems and affixes combine is based to a large degree on the word class of the stem

Inflectional Morphology

• Word-building process that serves a grammatical function without changing the part of speech or the meaning of the stem
• The resulting word
  – Has the same word class as the original
  – Serves a grammatical/semantic purpose different from the original

Inflectional Morphology in English

on Nouns
• PLURAL -s books
• POSSESSIVE -’s Mary’s
on Verbs
• 3 SINGULAR -s s/he knows
• PAST TENSE -ed talked
• PROGRESSIVE -ing talking
• PAST PARTICIPLE -en, -ed written, talked
on Adjectives
• COMPARATIVE -er longer
• SUPERLATIVE -est longest

Nouns and Verbs (English)

• Nouns are simple
  – Markers for plural and possessive
• Verbs are slightly more complex
  – Markers appropriate to the tense of the verb
• Adjectives
  – Markers for comparative and superlative

Regulars and Irregulars

• Some words misbehave (refuse to follow the rules)
  – Mouse/mice, goose/geese, ox/oxen
  – Go/went, fly/flew
• The terms regular and irregular will be used to refer to words that follow the rules and those that don’t.

Regular and Irregular Verbs

• Regulars…
  – Walk, walks, walking, walked, walked
• Irregulars
  – Eat, eats, eating, ate, eaten
  – Catch, catches, catching, caught, caught
  – Cut, cuts, cutting, cut, cut

Derivational Morphology

• Word-building process that creates new words, either by changing the meaning or changing the part of speech of the stem
  – Irregular meaning change
  – Changes of word class

Examples of derivational morphemes in English that change the part of speech

• -ful (N → Adj)
  – pain → painful
  – beauty → beautiful
  – truth → truthful
  – cat → *catful
  – rain → *rainful
• -ment (V → N)
  – establish → establishment
• -ity (Adj → N)
  – pure → purity
• -ly (Adj → Adv)
  – quick → quickly
• -en (Adj → V)
  – wide → widen

Examples of derivational morphemes in English that change the meaning

• dis-
  – appear → disappear
• un-
  – comfortable → uncomfortable
• in-
  – accurate → inaccurate
• re-
  – generate → regenerate
• inter-
  – act → interact

Examples on Derivational Morphology

V → N
  compute → computer
  nominate → nominee
  deport → deportation
  computerize → computerization
N → V
  computer → computerize
A → N
  furry → furriness
  apt → aptitude
  sincere → sincerity
N → A
  cat → catty, catlike
  hope → hopeless
  magic → magical
V → A
  love → lovable
A → V
  black → blacken
  modern → modernize

Derivational Examples

• Verb/Adj to Noun
  -ation: computerize → computerization
  -ee: appoint → appointee
  -er: kill → killer
  -ness: fuzzy → fuzziness

Derivational Examples

• Noun/Verb to Adj
  -al: computation → computational
  -able: embrace → embraceable
  -less: clue → clueless

Compute

• Many paths are possible…
• Start with compute
  – Computer -> computerize -> computerization
  – Computation -> computational
  – Computer -> computerize -> computerizable
  – Compute -> computee

Templatic Morphology: Root Pattern Examples from Arabic

Transliteration [Word]: Meaning
<naâma> [نام]: He slept
<naâ'imun> [نائم]: Sleeping
<yanaâmu> [ينام]: He sleeps
<munawwamun> [منوم]: Under hypnosis
<nam> [نم]: Sleep
<na'ûmun> [نؤوم]: Late riser
<tanwîmun> [تنويم]: Lulling to sleep
<'anwamu> [أنوم]: More given to sleep
<manaâmun> [منام]: Dream
<nawwaâmun> [نوام]: The most given to sleep
<nawmatun> [نومة]: Of one sleep
<manaâmun> [منام]: Dormitory
<nawwaâmatun> [نوامة]: Sleeper
<'an yanaâma> [أن ينام]: That he sleeps
<nawmiyyatun> [نومية]: Pertaining to sleep
<munawwamun> [منوم]: Hypnotic

Morphotactic Models

• English nominal inflection

[Figure: FSA with states q0, q1, q2. Arcs: q0 → q1 on reg-n, q1 → q2 on plural (-s), q0 → q2 on irreg-sg-n, q0 → q2 on irreg-pl-n]

• Inputs: cats, goose, geese
• reg-n: regular noun
• irreg-pl-n: irregular plural noun
• irreg-sg-n: irregular singular noun
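
A minimal sketch of how such a morphotactic FSA could be run as a recognizer. The arc layout (q0 to q1 on reg-n, q1 to q2 on plural -s, q0 to q2 on either irregular class) follows the usual textbook figure and is an assumption here, as are the tiny lexicon and the names `LEXICON`, `ARCS` and `recognize`.

```python
# Hypothetical mini-lexicon; the class names follow the slide.
LEXICON = {
    "reg-n": {"cat", "dog", "fox"},
    "irreg-sg-n": {"goose", "mouse"},
    "irreg-pl-n": {"geese", "mice"},
}

# Morphotactic FSA: state -> list of (arc label, next state).
ARCS = {
    "q0": [("reg-n", "q1"), ("irreg-sg-n", "q2"), ("irreg-pl-n", "q2")],
    "q1": [("plural-s", "q2")],
    "q2": [],
}
FINAL = {"q1", "q2"}


def matches(label, morph):
    """True if the morpheme string belongs to the class named on the arc."""
    if label == "plural-s":
        return morph == "s"
    return morph in LEXICON[label]


def recognize(word, state="q0"):
    """Try every split of `word` into morphemes that follows the arcs."""
    if word == "":
        return state in FINAL
    for label, nxt in ARCS[state]:
        for i in range(1, len(word) + 1):
            if matches(label, word[:i]) and recognize(word[i:], nxt):
                return True
    return False


print(recognize("cats"), recognize("geese"), recognize("gooses"))
```

Note that this sketch only checks morpheme order: it would also accept "foxs", because spelling changes are not part of the morphotactics.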

• Derivational morphology: adjective fragment

[Figure: FSA with states q0 to q5. An optional un- is followed by adj-root1, which may take -er, -ly, -est; adj-root2 may take -er, -est]

• Adj-root1: clear, happy, real
• Adj-root2: big, red
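
As a rough illustration, the adjective fragment can be written as a small table-driven acceptor. The state numbering and the function name `accepts` are assumptions made for this sketch; only the arc labels and the two root classes come from the slide.

```python
ADJ_ROOT1 = {"clear", "happy", "real"}
ADJ_ROOT2 = {"big", "red"}

# state -> list of (set of morphemes allowed on the arc, next state)
ARCS = {
    "q0": [({"un"}, "q1"), (ADJ_ROOT1, "q2"), (ADJ_ROOT2, "q3")],
    "q1": [(ADJ_ROOT1, "q2")],
    "q2": [({"er", "ly", "est"}, "q5")],
    "q3": [({"er", "est"}, "q4")],
    "q4": [],
    "q5": [],
}
FINAL = {"q2", "q3", "q4", "q5"}


def accepts(word, state="q0"):
    """True if `word` can be split into morphemes along some path to a final state."""
    if not word:
        return state in FINAL
    return any(
        word.startswith(m) and accepts(word[len(m):], nxt)
        for morphs, nxt in ARCS[state]
        for m in morphs
    )


print(accepts("unclearly"), accepts("unbig"), accepts("bigly"))
```

Splitting the roots into two classes is what blocks forms like "unbig" and "bigly". Spelling is still ignored, so "happier" and "bigger" would need the orthographic rules discussed later in the lecture.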

Using FSAs to Represent the Lexicon and Do Morphological Recognition

• Lexicon: We can expand each non-terminal in our NFSA into each stem in its class (e.g. adj_root2 = {big, red}) and expand each such stem to the letters it includes (e.g. red → r e d, big → b i g)

[Figure: letter-level FSA with states q0 to q7 that spells out r-e-d and b-i-g, followed by an arc for -er, -est]

Limitations

• To cover all of English will require very large FSAs with consequent search problems
  – Adding new items to the lexicon means re-computing the FSA
  – Non-determinism
• FSAs can only tell us whether a word is in the language or not – what if we want to know more?
  – What is the stem?
  – What are the affixes?
  – We used this information to build our FSA: can we get it back?

Parsing with Finite State Transducers

• cats → cat +N +PL
• Kimmo Koskenniemi’s two-level morphology
  – Words represented as correspondences between lexical level (the morphemes) and surface level (the orthographic word)
  – Morphological parsing: building mappings between the lexical and surface levels

Lexical: c a t +N +PL
Surface: c a t s

Finite State Transducers

• FSTs map between one set of symbols and another using an FSA whose alphabet is composed of pairs of symbols from input and output alphabets
• In general, FSTs can be used for
  – Translator (Hello:مرحبا)
  – Parser/generator (Hello:How may I help you?)
  – To map between the lexical and surface levels of Kimmo’s 2-level morphology

• FST is a 5-tuple consisting of
  – Q: set of states {q0,q1,q2,q3,q4}
  – Σ: an alphabet of complex symbols, each is an i/o pair such that i ∈ I (an input alphabet) and o ∈ O (an output alphabet), and Σ ⊆ I × O
  – q0: a start state
  – F: a set of final states in Q {q4}
  – δ(q,i:o): a transition function mapping Q × Σ to Q
  – Emphatic Sheep → Quizzical Cow

[Figure: q0 → q1 on b:m, q1 → q2 on a:o, q2 → q3 on a:o, q3 → q3 on a:o, q3 → q4 on !:?]
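
The sheep-to-cow machine (b:m, a:o, !:?) is small enough to write out directly. This is a sketch only: it assumes the machine is deterministic on its input side, so the transition table can be keyed by (state, input symbol), and the names `DELTA` and `transduce` are made up for the example.

```python
# delta maps (state, input symbol) -> (next state, output symbol)
DELTA = {
    ("q0", "b"): ("q1", "m"),
    ("q1", "a"): ("q2", "o"),
    ("q2", "a"): ("q3", "o"),
    ("q3", "a"): ("q3", "o"),  # loop: any number of extra a's
    ("q3", "!"): ("q4", "?"),
}
START, FINAL = "q0", {"q4"}


def transduce(text):
    """Return the output string, or None if the input is not accepted."""
    state, out = START, []
    for ch in text:
        if (state, ch) not in DELTA:
            return None
        state, sym = DELTA[(state, ch)]
        out.append(sym)
    return "".join(out) if state in FINAL else None


print(transduce("baa!"), transduce("baaaa!"), transduce("ba!"))
```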

FST for a 2-level Lexicon

• Example

Reg-n: c a t
Irreg-pl-n: g o:e o:e s e
Irreg-sg-n: g o o s e

[Figure: transducer paths spelling out c a t (q0 to q3) and g, e:o, e:o, s, e (q0 to q5)]

FST for English Nominal Inflection

[Figure: FST with states q0 to q7. From q0, arcs labeled reg-n, irreg-n-sg and irreg-n-pl lead to q1, q2 and q3; each is followed by an arc +N:ε (to q4, q5 and q6); the final arcs into q7 carry the number labels +SG:-#, +PL:-s# and +PL:^s#]

Combining (cascade or composition) this FSA with FSAs for each noun type replaces e.g. reg-n with every regular noun representation in the lexicon
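
The lexical-to-intermediate mapping for nouns can be sketched as follows. To stay short, this collapses the transducer into plain lookups rather than building real states and arcs, and it assumes that regular plurals map to ^s#, and that singulars and irregular plurals map to # alone. The lexicon entries and the function name are hypothetical.

```python
REG_N = {"cat", "fox", "dog"}
IRREG = {"goose": "geese", "mouse": "mice"}  # singular -> plural (hypothetical entries)


def lexical_to_intermediate(lexical):
    """Map e.g. 'fox +N +PL' to 'fox^s#'; return None if the input is not accepted."""
    parts = lexical.split()
    if len(parts) != 3 or parts[1] != "+N":
        return None
    stem, _, number = parts
    if stem in REG_N:
        if number == "+SG":
            return stem + "#"
        if number == "+PL":
            return stem + "^s#"  # ^ marks the morpheme boundary, # the word end
    elif stem in IRREG:
        if number == "+SG":
            return stem + "#"
        if number == "+PL":
            return IRREG[stem] + "#"
    return None


print(lexical_to_intermediate("fox +N +PL"), lexical_to_intermediate("goose +N +PL"))
```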

Orthographic Rules and FSTs

• Define additional FSTs to implement rules such as consonant doubling (beg → begging), ‘e’ deletion (make → making), ‘e’ insertion (watch → watches), etc.

Lexical: f o x +N +PL
Intermediate: f o x ^ s #
Surface: f o x e s


• Note: These FSTs can be used for generation as well as recognition by simply exchanging the input and output alphabets (e.g. ^s#:+PL)

FSAs and the Lexicon

• First we’ll capture the morphotactics
  – The rules governing the ordering of affixes in a language.
• Then we’ll add in the actual stems


Simple Rules

Adding the Words

But it does not express that:

• Regular nouns ending in -s, -z, -sh, -ch, -x take -es (kiss, waltz, bush, rich, box)
• Regular nouns ending in -y preceded by a consonant change the -y to -i

Derivational Rules

[noun_i] e.g. hospital
[adj_al] e.g. formal
[adj_ous] e.g. arduous
[verb_j] e.g. speculate
[verb_k] e.g. conserve

Parsing/Generation vs. Recognition

• Recognition is usually not quite what we need.
  – Usually if we find some string in the language we need to find the structure in it (parsing)
  – Or we have some structure and we want to produce a surface form (production/generation)

In other words

• Given a word we need to find: the stem and its class and properties (parsing)
• Or we have a stem and its class and properties and we want to produce the word (production/generation)
• Example (parsing)
  – From “cats” to “cat +N +PL”
  – From “lies” to ……

Applications

• The kind of parsing we’re talking about is normally called morphological analysis
• It can either be
  – An important stand-alone component of an application (spelling correction, information retrieval)
  – Or simply a link in a chain of processing

Finite State Transducers

• The simple story
  – Add another tape
  – Add extra symbols to the transitions
  – On one tape we read “cats”, on the other we write “cat +N +PL”, or the other way around.

FSTs

[Figure: FST with the generation and parsing directions labeled]


Transitions

• c:c means read a c on one tape and write a c on the other

• +N:ε means read a +N symbol on one tape and write nothing on the other

• +PL:s means read +PL and write an s

c:c a:a t:t +N:ε +PL:s
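
A small sketch of how the path c:c a:a t:t +N:ε +PL:s can be followed in either direction. It treats the machine as a single linear path, represents ε as the empty string, and swaps the two sides of every pair to turn generation into parsing. The names `PATH`, `invert` and `run` are illustrative only.

```python
EPS = ""  # epsilon: read nothing / write nothing

# Each pair is (symbol on the lexical tape, symbol on the surface tape).
PATH = [("c", "c"), ("a", "a"), ("t", "t"), ("+N", EPS), ("+PL", "s")]


def invert(path):
    """Swap the two tapes, so x:y becomes y:x on every transition."""
    return [(lower, upper) for upper, lower in path]


def run(path, symbols):
    """Follow a linear path, consuming `symbols`; return the output list or None."""
    out, i = [], 0
    for read, write in path:
        if read != EPS:
            if i >= len(symbols) or symbols[i] != read:
                return None
            i += 1
        if write != EPS:
            out.append(write)
    return out if i == len(symbols) else None


print(run(PATH, ["c", "a", "t", "+N", "+PL"]))  # generation
print(run(invert(PATH), list("cats")))          # parsing
```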


Typical Uses

• Typically, we’ll read from one tape using the first symbol on the machine transitions (just as in a simple FSA).

• And we’ll write to the second tape using the other symbols on the transitions.

Ambiguity

• Recall that in non-deterministic recognition multiple paths through a machine may lead to an accept state.
  – Didn’t matter which path was actually traversed
• In FSTs the path to an accept state does matter since different paths represent different parses and different outputs will result

Ambiguity

• What’s the right parse for unionizable?
  – Union-ize-able
  – Un-ion-ize-able
• Each represents a valid path through the derivational morphology machine.

Ambiguity

• There are a number of ways to deal with this problem
  – Simply take the first output found
  – Find all the possible outputs (all paths) and return them all (without choosing)
  – Bias the search so that only one or a few likely paths are explored
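
The three strategies can be illustrated with a toy segmenter for "unionizable". This is a sketch under loud assumptions: the morpheme table is invented for the example, "iz" stands for the surface form of -ize after e-deletion, and "prefer the fewest morphemes" is just one hypothetical way to bias the choice.

```python
# Surface string -> morpheme label (hypothetical entries for this example).
SURFACE = {"un": "un-", "ion": "ion", "union": "union", "iz": "-ize", "able": "-able"}


def all_parses(word):
    """Return every way to split `word` into known morphemes (all paths)."""
    if not word:
        return [[]]
    parses = []
    for i in range(1, len(word) + 1):
        head = word[:i]
        if head in SURFACE:
            for rest in all_parses(word[i:]):
                parses.append([SURFACE[head]] + rest)
    return parses


def first_parse(word):
    """Strategy 1: simply take the first output found."""
    parses = all_parses(word)
    return parses[0] if parses else None


def shortest_parse(word):
    """Strategy 3 (one possible bias): prefer the parse with the fewest morphemes."""
    parses = all_parses(word)
    return min(parses, key=len) if parses else None


print(all_parses("unionizable"))
print(shortest_parse("unionizable"))
```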

More Details

• It’s not always as easy as
  – “cat +N +PL” <-> “cats”
• There are geese, mice and oxen
• There are also spelling/pronunciation changes that go along with inflectional changes


Multi-Tape Machines

• To deal with this we can simply add more tapes and use the output of one tape machine as the input to the next

• So to handle irregular spelling changes we’ll add intermediate tapes with intermediate symbols

Spelling Rules and FSTs

Name: Description of Rule (Example)
• Consonant doubling: 1-letter consonant doubled before -ing/-ed (beg/begging)
• E deletion: silent e dropped before -ing and -ed (make/making)
• E insertion: e added after -s, -z, -x, -ch, -sh before -s (watch/watches)
• Y replacement: -y changes to -ie before -s, and to -i before -ed (try/tries)
• K insertion: verbs ending with vowel + -c add -k (panic/panicked)
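
The five rules can be approximated with regular expressions. This is a rough sketch rather than a transducer: the function name `attach` is hypothetical, "silent e" is approximated as consonant + e, and consonant doubling is limited to one-syllable consonant-vowel-consonant stems, so stress-dependent cases are not handled.

```python
import re


def attach(stem, suffix):
    """Attach 's', 'ing' or 'ed' to a stem using the five spelling rules."""
    if suffix == "s":
        if re.search(r"(s|z|x|ch|sh)$", stem):      # E insertion
            return stem + "es"
        if re.search(r"[^aeiou]y$", stem):           # Y replacement before -s
            return stem[:-1] + "ies"
        return stem + "s"
    if suffix in ("ing", "ed"):
        if re.search(r"[aeiou]c$", stem):            # K insertion
            return stem + "k" + suffix
        if suffix == "ed" and re.search(r"[^aeiou]y$", stem):  # Y replacement before -ed
            return stem[:-1] + "ied"
        if re.search(r"[^aeiou]e$", stem):           # E deletion (approximation)
            return stem[:-1] + suffix
        if re.search(r"^[^aeiou]*[aeiou][^aeiouwxy]$", stem):  # consonant doubling
            return stem + stem[-1] + suffix
    return stem + suffix


for stem, suffix in [("beg", "ing"), ("make", "ing"), ("watch", "s"),
                     ("try", "s"), ("panic", "ed")]:
    print(stem, "+", suffix, "->", attach(stem, suffix))
```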

Multi-Level Tape Machines

• We use one machine to transduce between the lexical and the intermediate level, and another to handle the spelling changes to the surface tape


Lexical to Intermediate Level

Machine

FST for the E-insertion Rule: Intermediate to Surface

• The add an “e” rule as in fox^s# <-> foxes

  ε → e / {x, s, z} ^ __ s #

[Figure: FST with states q0 to q5; arc labels include ^:ε, ε:e, “z, s, x”, “z, x”, s, #, and other]


Note

• A key feature of this machine is that it doesn’t do anything to inputs to which it doesn’t apply.

• Meaning that: they are written out unchanged to the output tape.
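
The behaviour of the e-insertion machine, including the pass-through of inputs it does not apply to, can be imitated with a regular expression. This is a stand-in for the transducer, not the state machine itself, and it assumes the rule covers only x, s and z before the boundary, with ^ as the morpheme boundary and # as the word end. The function name is hypothetical.

```python
import re


def intermediate_to_surface(form):
    """Map an intermediate form such as 'fox^s#' to its surface spelling."""
    # e-insertion: insert e between the boundary ^ and a word-final s,
    # but only when the boundary follows x, s or z.
    form = re.sub(r"(?<=[xsz]\^)(?=s#)", "e", form)
    # Erase the boundary symbols; everything else is copied unchanged.
    return form.replace("^", "").replace("#", "")


for form in ["fox^s#", "miss^s#", "cat^s#", "arizona#"]:
    print(form, "->", intermediate_to_surface(form))
```

Because only x, s and z trigger the rule in this sketch, a form like watch^s# would come out without the inserted e.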


English Spelling Changes

• We use one machine to transduce between the lexical and the intermediate level, and another to handle the spelling changes to the surface tape

Foxes

[Figure: foxes processed by Machine 1 and Machine 2]

Overall Plan

Final Scheme: Part 1

Final Scheme: Part 2

Stemming vs Morphology

• Sometimes you just need to know the stem of a word and you don’t care about the structure.

• In fact you may not even care if you get the right stem, as long as you get a consistent string.

• This is stemming… it most often shows up in IR (Information Retrieval) applications

Stemming in IR

• Run a stemmer on the documents to be indexed
• Run a stemmer on users’ queries
• Match
  – This is basically a form of hashing
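
A minimal sketch of the index/query/match loop, showing why stemming behaves like hashing: documents and queries only need to land on the same key. The stemmer here is a deliberately crude, hypothetical stand-in, and the documents are made up.

```python
from collections import defaultdict


def crude_stem(word):
    """Hypothetical stand-in stemmer: lowercases and strips a few suffixes."""
    word = word.lower()
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def build_index(docs):
    """Map each stem to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[crude_stem(token)].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every stemmed query term."""
    results = [index.get(crude_stem(t), set()) for t in query.split()]
    return set.intersection(*results) if results else set()


DOCS = {1: "foxes hunting at night", 2: "the fox hunts", 3: "cats sleeping"}
INDEX = build_index(DOCS)
print(search(INDEX, "fox hunting"))
```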

Porter Stemmer

• No lexicon needed
• Basically a set of staged sets of rewrite rules that strip suffixes
• Handles both inflectional and derivational suffixes
• Doesn’t guarantee that the resulting stem is really a stem
• Lack of guarantee doesn’t matter for IR

Porter Example

• Computerization
  – ization -> -ize: computerize
  – ize -> ε: computer
• Other Rules
  – ing -> ε (motoring -> motor)
  – ational -> ate (relational -> relate)
• Practice: See Porter’s Stemmer at Appendix B and suggest some rules for a KFUPM Arabic Stemmer
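
The idea of staged rewrite rules can be sketched with just the four rules from this slide. This is not the Porter stemmer: the grouping into two stages is an assumption, and the real algorithm has more stages plus conditions on the remaining stem that are omitted here.

```python
# Each stage is a list of (suffix, replacement) rules; the stage grouping is assumed.
STAGES = [
    [("ization", "ize"), ("ational", "ate")],
    [("ize", ""), ("ing", "")],
]


def stem(word):
    """Apply the stages in order; at most one rule fires per stage."""
    for rules in STAGES:
        for suffix, replacement in rules:
            if word.endswith(suffix):
                word = word[: len(word) - len(suffix)] + replacement
                break
    return word


for w in ["computerization", "motoring", "relational"]:
    print(w, "->", stem(w))
```

Without the stem conditions the sketch over-strips short words (for example, "sing" loses its "ing"), which is the kind of case the full rule set is designed to guard against.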

Porter Stemmer

• The original exposition of the Porter stemmer did not describe it as a transducer but…
  – Each stage is a separate transducer
  – The stages can be composed to get one big transducer

Human Morphological Processing: How do people represent words?

• Hypotheses:
  – Full listing hypothesis: words listed
  – Minimum redundancy hypothesis: morphemes listed
• Experimental evidence:
  – Priming experiments (Does seeing/hearing one word facilitate recognition of another?)
  – Regularly inflected forms prime stem but not derived forms
  – But spoken derived words can prime stems if they are semantically close (e.g. government/govern but not department/depart)


More Examples

Using FSTs for orthographic rules

  ε → e / {x, s, z} ^ __ s #

[Figure, repeated on each slide of this walk-through: e-insertion FST with states q0 to q5; arc labels include ^:ε, ε:e, Z! (where Z! = z, s, x), s, “z, x”, #, and other]

Tracing fox^s# through the machine:
• we get to q1 with ‘x’
• we get to q2 with ‘^’
• we can get to q3 with ‘NULL’
• we also get to q5 with ‘s’, but we don’t want to!
  – So why is this transition there? friend^ship, ?fox^s^s (= foxes’s)
• q4 with s
• q0 with # (accepting state)

Other transitions…
• arizona: we leave q0 but return
• m i s s ^ s

Peace be upon you, and the mercy of God

Glory be to You, O God, and praise be to You. I bear witness that there is no god but You; I seek Your forgiveness and repent to You.