Lecture 9: Part of Speech - University of Virginia School...

Lecture 9: Part of Speech
Kai-Wei Chang
CS @ University of Virginia
Course webpage: http://kwchang.net/teaching/NLP16
CS6501 Natural Language Processing

This lecture

- Parts of speech (POS)
- POS tagsets

Parts of Speech

- Traditional parts of speech
  - ~8 of them

POS examples

- N     noun         chair, bandwidth, pacing
- V     verb         study, debate, munch
- ADJ   adjective    purple, tall, ridiculous
- ADV   adverb       unfortunately, slowly
- P     preposition  of, by, to
- PRO   pronoun      I, me, mine
- DET   determiner   the, a, that, those

Parts of Speech

- A.k.a. parts-of-speech, lexical categories, word classes, morphological classes, lexical tags...
- Lots of debate within linguistics about the number, nature, and universality of these

POS Tagging

- The process of assigning a part-of-speech to each word in a collection (sentence).

WORD    tag
the     DET
koala   N
put     V
the     DET
keys    N
on      P
the     DET
table   N
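
The word/tag pairs above can be read as the output of a function from a word sequence to a tag sequence. A minimal sketch in Python, assuming a tiny hand-written lexicon that covers only this sentence (`LEXICON` and `tag_sentence` are illustrative names, not from the lecture). A real tagger has to resolve ambiguous words from context rather than look them up:

```python
# Hypothetical toy lexicon covering only the example sentence.
LEXICON = {
    "the": "DET", "koala": "N", "put": "V",
    "keys": "N", "on": "P", "table": "N",
}

def tag_sentence(words, lexicon=LEXICON, unknown="UNK"):
    """Assign one tag per word by dictionary lookup."""
    return [(w, lexicon.get(w.lower(), unknown)) for w in words]

tagged = tag_sentence("the koala put the keys on the table".split())
for word, tag in tagged:
    print(f"{word}\t{tag}")
```

Words missing from the lexicon receive the placeholder tag `UNK`, which is a choice made for this sketch only.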

Why is POS Tagging Useful?

- First step of a vast number of practical tasks
- Parsing
  - Need to know if a word is an N or V before you can parse
- Information extraction
  - Finding names, relations, etc.
- Speech synthesis/recognition
  - OBject     obJECT
  - OVERflow   overFLOW
  - DIScount   disCOUNT
  - CONtent    conTENT
- Machine translation

Open and Closed Classes

- Closed class: a small fixed membership
  - Prepositions: of, in, by, …
  - Pronouns: I, you, she, mine, his, them, …
  - Usually function words (short common words which play a role in grammar)
- Open class: new ones can be created
  - English has 4: Nouns, Verbs, Adjectives, Adverbs
  - Many languages have these 4, but not all!

Open Class Words

- Nouns
  - Proper nouns (Boulder, Granby, Eli Manning)
  - Common nouns (the rest)
  - Count nouns and mass nouns
    - Count: have plurals, get counted: goat/goats, one goat, two goats
    - Mass: don’t get counted (snow, salt, communism) (*two snows)
- Verbs
  - In English, have morphological affixes (eat/eats/eaten)

Closed Class Words

Examples:
- prepositions: on, under, over, …
- particles: up, down, on, off, …
- determiners: a, an, the, …
- pronouns: she, who, I, …
- conjunctions: and, but, or, …
- auxiliary verbs: can, may, should, …
- numerals: one, two, three, third, …

Prepositions from CELEX

CELEX: online dictionary. Frequency counts are from the COBUILD 16-million-word corpus.

English Particles

Conjunctions

Choosing a Tagset

- Could pick very coarse tagsets
  - N, V, Adj, Adv, Other
- More commonly used set is finer grained
  - E.g., “Penn TreeBank tagset”, 45 tags: PRP$, WRB, WP$, VBG
  - Brown corpus, 87 tags
- Prague Dependency Treebank (Czech)
  - 4452 tags
  - AAFP3----3N----: (nejnezajímavějším)
    Adj Regular Feminine Plural … Superlative [Hajic 2006, VMC tutorial]
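
The Czech tag above is positional: each character fills one fixed slot, and `-` marks a slot that does not apply. A minimal sketch of reading such a tag, assuming the 15 slot names listed in `POSITIONS` (they follow the Prague Dependency Treebank positional scheme as commonly documented, but they are not given in the lecture and should be checked against the PDT documentation):

```python
# Slot names are an assumption based on the PDT positional-tag scheme;
# the lecture itself only glosses a few of the slots.
POSITIONS = [
    "POS", "SubPOS", "Gender", "Number", "Case",
    "PossGender", "PossNumber", "Person", "Tense", "Grade",
    "Negation", "Voice", "Reserve1", "Reserve2", "Variant",
]

def decode_positional_tag(tag):
    """Return {slot name: value} for every slot that is not '-'."""
    if len(tag) != len(POSITIONS):
        raise ValueError(f"expected {len(POSITIONS)} characters, got {len(tag)}")
    return {name: ch for name, ch in zip(POSITIONS, tag) if ch != "-"}

decoded = decode_positional_tag("AAFP3----3N----")
for name, value in decoded.items():
    print(f"{name}: {value}")
```

Because every slot multiplies the number of possible combinations, a positional scheme like this reaches thousands of distinct tags, which is consistent with the 4452 figure on the slide.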

Penn TreeBank POS Tagset

Using the Penn Tagset

- The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./.
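
The `word/TAG` notation above is easy to split into pairs. A minimal sketch, assuming whitespace-separated tokens with the tag after the last slash (`parse_tagged` is an illustrative helper, not part of any treebank toolkit):

```python
def parse_tagged(line):
    """Split 'word/TAG word/TAG ...' into (word, tag) pairs.

    rpartition splits on the last '/', so a token such as '1/2/CD'
    keeps '1/2' as the word.
    """
    pairs = []
    for token in line.split():
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs

sentence = ("The/DT grand/JJ jury/NN commented/VBD on/IN a/DT "
            "number/NN of/IN other/JJ topics/NNS ./.")
pairs = parse_tagged(sentence)
words = [w for w, _ in pairs]
tags = [t for _, t in pairs]
print(words)
print(tags)
```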

Universal Tag set

- ~12 different tags
  - NOUN, VERB, ADJ, ADV, PRON, DET, ADP, NUM, CONJ, PRT, “.”, X
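
A fine-grained tagset can be collapsed onto these coarse tags with a lookup table. A minimal sketch, assuming a partial Penn-to-universal table that covers only the tags in the lecture's example sentence plus a few others (`PENN_TO_UNIVERSAL` is illustrative and incomplete; falling back to `X` for unmapped tags is a simplification made here):

```python
# Partial, illustrative mapping; a complete table covers all Penn tags.
PENN_TO_UNIVERSAL = {
    "DT": "DET", "JJ": "ADJ", "NN": "NOUN", "NNS": "NOUN",
    "VBD": "VERB", "VB": "VERB", "IN": "ADP", "RB": "ADV", ".": ".",
}

def to_universal(penn_tags, mapping=PENN_TO_UNIVERSAL):
    """Map each Penn tag to a coarse tag, using 'X' when unmapped."""
    return [mapping.get(t, "X") for t in penn_tags]

penn = ["DT", "JJ", "NN", "VBD", "IN", "DT", "NN", "IN", "JJ", "NNS", "."]
coarse = to_universal(penn)
for fine, broad in zip(penn, coarse):
    print(f"{fine}\t{broad}")
```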

POS Tagging vs. Word Clustering

- Words often have more than one POS: back
  - The back door = JJ
  - On my back = NN
  - Win the voters back = RB
  - Promised to back the bill = VB

These examples from Dekang Lin
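
The ambiguity of "back" can be made concrete by collecting the set of tags each word receives in tagged text. A minimal sketch: the tags for "back" come from the slide, while the tags on the surrounding words are assumed Penn Treebank tags added only so the phrases are complete:

```python
from collections import defaultdict

# Tags for "back" are from the slide; the other tags are assumptions.
TAGGED_PHRASES = [
    [("The", "DT"), ("back", "JJ"), ("door", "NN")],
    [("On", "IN"), ("my", "PRP$"), ("back", "NN")],
    [("Win", "VB"), ("the", "DT"), ("voters", "NNS"), ("back", "RB")],
    [("Promised", "VBD"), ("to", "TO"), ("back", "VB"),
     ("the", "DT"), ("bill", "NN")],
]

def tags_per_word(phrases):
    """Collect the set of tags observed for each lowercased word."""
    seen = defaultdict(set)
    for phrase in phrases:
        for word, tag in phrase:
            seen[word.lower()].add(tag)
    return seen

seen = tags_per_word(TAGGED_PHRASES)
ambiguous = {w: sorted(t) for w, t in seen.items() if len(t) > 1}
print(ambiguous)
```

A clustering that assigns each word type to a single class cannot represent this: the right tag depends on the context of each occurrence.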

How Hard is POS Tagging?

POS tag sequences

- Some tag sequences are more likely to occur than others
- POS Ngram view: https://books.google.com/ngrams/graph?content=_ADJ_+_NOUN_%2C_ADV_+_NOUN_%2C+_ADV_+_VERB_

Existing methods often model POS tagging as a sequence tagging problem
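
The claim that some tag sequences are likelier than others can be checked by counting adjacent tag pairs. A minimal sketch over a hypothetical three-sentence toy corpus (`TAG_SEQUENCES` is invented for illustration; the relative-frequency estimate is one simple choice, not necessarily what any particular tagger uses):

```python
from collections import Counter

# Hypothetical toy corpus using the coarse tags from the earlier slides.
TAG_SEQUENCES = [
    ["DET", "N", "V", "DET", "N", "P", "DET", "N"],
    ["DET", "ADJ", "N", "V", "ADV"],
    ["PRO", "V", "DET", "ADJ", "N"],
]

def tag_bigram_counts(sequences):
    """Count how often each (previous tag, next tag) pair occurs."""
    counts = Counter()
    for tags in sequences:
        counts.update(zip(tags, tags[1:]))
    return counts

def transition_prob(counts, prev, nxt):
    """Relative-frequency estimate of P(nxt | prev)."""
    total = sum(c for (p, _), c in counts.items() if p == prev)
    return counts[(prev, nxt)] / total if total else 0.0

counts = tag_bigram_counts(TAG_SEQUENCES)
print(transition_prob(counts, "DET", "N"))    # determiner followed by noun
print(transition_prob(counts, "DET", "ADJ"))  # determiner followed by adjective
print(transition_prob(counts, "DET", "V"))    # never observed in the toy data
```

In this toy data a determiner is followed by a noun three times out of five and never by a verb, which is the kind of regularity a sequence model can exploit.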

Evaluation

- How many words in the unseen test data can be tagged correctly?
- Usually evaluated on Penn Treebank
  - State of the art ~97%
  - Trivial baseline (most likely tag) ~94%
  - Human performance ~97%
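
The metric and the trivial baseline above can both be written in a few lines. A minimal sketch, assuming hypothetical toy data (`train` and `held_out` are invented for illustration; real evaluations use held-out treebank sections). Each known word gets its most frequent training tag, and unknown words get the overall most frequent tag:

```python
from collections import Counter, defaultdict

def train_most_frequent_tag(tagged_sentences):
    """Learn word -> most frequent tag, plus the overall most frequent tag."""
    per_word = defaultdict(Counter)
    overall = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            per_word[word.lower()][tag] += 1
            overall[tag] += 1
    lexicon = {w: c.most_common(1)[0][0] for w, c in per_word.items()}
    default = overall.most_common(1)[0][0]
    return lexicon, default

def predict(words, lexicon, default):
    """Tag each word independently, ignoring context."""
    return [lexicon.get(w.lower(), default) for w in words]

def accuracy(gold_sentences, lexicon, default):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    correct = total = 0
    for sentence in gold_sentences:
        words = [w for w, _ in sentence]
        gold = [t for _, t in sentence]
        pred = predict(words, lexicon, default)
        correct += sum(g == p for g, p in zip(gold, pred))
        total += len(gold)
    return correct / total

# Hypothetical toy data, far too small to reproduce the ~94% figure.
train = [
    [("on", "IN"), ("my", "PRP$"), ("back", "NN")],
    [("the", "DT"), ("back", "NN"), ("hurts", "VBZ")],
    [("they", "PRP"), ("back", "VB"), ("the", "DT"), ("bill", "NN")],
]
held_out = [
    [("the", "DT"), ("voters", "NNS"), ("back", "VB"),
     ("the", "DT"), ("bill", "NN")],
]

lexicon, default = train_most_frequent_tag(train)
print(accuracy(held_out, lexicon, default))
```

On this toy split the baseline misses the unseen word "voters" and the verb use of "back", which are the two error types (unknown words and ambiguous words) that context-aware taggers aim to reduce.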
