Unsupervised Approaches to Sequence Tagging, Morphology Induction, and Lexical Resource Acquisition
Reza Bosaghzadeh & Nathan Schneider
LS2 ~ 1 December 2008
Unsupervised Approaches to Morphology
• Morphology refers to the internal structure of words
– A morpheme is a minimal meaningful linguistic unit
– Morpheme segmentation is the process of dividing words into their component morphemes:
  unsupervised learning → un-supervise-d learn-ing
– Word segmentation is the process of finding word boundaries in a stream of speech or text:
  unsupervisedlearningofnaturallanguage → unsupervised learning of natural language
ParaMor: Monson et al. (2007, 2008)
• Learns inflectional paradigms from raw text
– Requires only the vocabulary of a corpus
– Looks at word counts of substrings, and proposes (stem, suffix) pairings based on type frequency
• 3-stage algorithm
– Stage 1: Candidate paradigms based on frequencies (a toy sketch follows the examples below)
– Stages 2-3: Refinement of the paradigm set via merging and filtering
• Paradigms can be used for morpheme segmentation or stemming
• A sampling of Spanish verb conjugations:

  speak      dance
  hablar     bailar
  hablo      bailo
  hablamos   bailamos
  hablan     bailan
  …          …
• A proposed paradigm over this list, with stems {habl, bail} and suffixes {-ar, -o, -amos, -an}
• The same paradigm, but with stems {habl, bail, compr}:

  speak      dance      buy
  hablar     bailar     comprar
  hablo      bailo      compro
  hablamos   bailamos   compramos
  hablan     bailan     compran
  …          …          …
• From just the two-verb list above, other paradigm analyses (which happen to be incorrect) are also possible
• Another possibility: stems {hab, bai}, suffixes {-lar, -lo, -lamos, -lan}
• But these are spurious segmentations: that paradigm doesn’t generalize to comprar (or most verbs)
• What if not all conjugations were in the corpus?

  speak      dance      buy
  hablar     bailar     comprar
             bailo      compro
  hablamos   bailamos   compramos
  hablan
  …          …          …
• We now have two similar partial paradigms that we want to merge
• Merging restores the full paradigm: this amounts to smoothing, or “hallucinating” out-of-vocabulary items (sketched below)
• This heuristic-based, deterministic algorithm can learn inflectional paradigms from raw text
• Paradigms can be used straightforwardly to predict segmentations (a minimal version is sketched below)
– Combining the outputs of ParaMor and Morfessor (another system) won the segmentation task at MorphoChallenge 2008 for every language: English, Arabic, Turkish, German, and Finnish
• Currently, ParaMor assumes suffix-based morphology
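Once paradigms are in hand, a bare-bones segmenter only needs to match a known stem against a licensed suffix. A minimal sketch, assuming a longest-suffix-wins rule of our own choosing:

```python
def segment(word, paradigms):
    """Split off the longest suffix licensed by a paradigm whose stem
    set contains the remainder; otherwise leave the word whole."""
    for stems, suffixes in paradigms:
        for suffix in sorted(suffixes, key=len, reverse=True):
            if word.endswith(suffix) and word[:-len(suffix)] in stems:
                return word[:-len(suffix)] + "-" + suffix
    return word

paradigms = [({"habl", "bail", "compr"}, {"ar", "o", "amos", "an"})]
print(segment("compramos", paradigms))  # compr-amos
print(segment("comer", paradigms))      # comer (no analysis applies)
```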
Goldwater et al. (2006; in submission)
• Word segmentation results
[Figure: comparison of the Goldwater Unigram DP and Bigram HDP models]
• See Narges & Andreas’s presentation for more on this model
Multilingual morpheme segmentation: Snyder & Barzilay (2008)

  speak (ES)   speak (FR)
  hablar       parler
  hablo        parle
  hablamos     parlons
  hablan       parlent
  …            …

• Abstract morphemes cross languages: (ar, er), (o, e), (amos, ons), (an, ent), (habl, parl)
• Considers parallel phrases and tries to find morpheme correspondences (a toy version is sketched below)
• Stray morphemes don’t correspond across languages
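A toy rendering of the abstract-morpheme idea; the segmentations and alignments are supplied by hand here, whereas Snyder & Barzilay's Bayesian model infers both jointly from the parallel phrases:

```python
# Parallel Spanish/French forms with hand-marked morpheme boundaries.
parallel = [("habl+ar", "parl+er"), ("habl+o", "parl+e"),
            ("habl+amos", "parl+ons"), ("habl+an", "parl+ent")]

abstract_morphemes = set()
for es, fr in parallel:
    # Pair up aligned Spanish/French morphemes position by position.
    abstract_morphemes.update(zip(es.split("+"), fr.split("+")))

print(sorted(abstract_morphemes))
# [('amos', 'ons'), ('an', 'ent'), ('ar', 'er'), ('habl', 'parl'), ('o', 'e')]
```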
Morphology papers: inputs & outputs
Narrative events: Chambers & Jurafsky (2008)
• Given a corpus, identifies related events that constitute a “narrative” and (when possible) predicts their typical temporal ordering
– E.g., a CRIMINAL PROSECUTION narrative, with verbs: arrest, accuse, plead, testify, acquit/convict
• Key insight: related events tend to share a participant within a document (see the toy sketch below)
– The common participant may fill different syntactic/semantic roles with respect to the verbs: arrest.OBJECT, accuse.OBJECT, plead.SUBJECT
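A toy sketch of the shared-participant insight, with invented inputs: for one protagonist per document, count which (verb, role) slots it fills together, and score slot pairs by pointwise mutual information. The real system computes this over coreference chains in parsed text; the scoring below only mirrors the spirit of its PMI-based relation score.

```python
from collections import Counter
from itertools import combinations
from math import log

# Each entry: the (verb, role) slots one entity fills in one document.
docs = [
    [("arrest", "OBJ"), ("accuse", "OBJ"), ("plead", "SUBJ")],
    [("arrest", "OBJ"), ("convict", "OBJ")],
    [("accuse", "OBJ"), ("plead", "SUBJ"), ("testify", "SUBJ")],
]

slot_counts, pair_counts = Counter(), Counter()
for slots in docs:
    slot_counts.update(slots)
    pair_counts.update(frozenset(p) for p in combinations(slots, 2))

n = len(docs)
for pair, c in pair_counts.most_common():
    a, b = tuple(pair)
    pmi = log((c / n) / ((slot_counts[a] / n) * (slot_counts[b] / n)))
    print(a, b, round(pmi, 2))
# Slot pairs that repeatedly share a participant (here accuse.OBJ with
# plead.SUBJ) score highest and are grouped into one narrative.
```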
• A temporal classifier can reconstruct pairwise canonical event orderings, producing a directed graph for each narrative (illustrated below)
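To illustrate the graph-building step with made-up classifier output (the topological-sort linearization is our own addition, not the paper's):

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical pairwise "X before Y" predictions from a temporal
# classifier (the classifier itself is beyond this sketch).
before = [("arrest", "accuse"), ("accuse", "plead"),
          ("plead", "testify"), ("testify", "convict")]

# Build the directed graph: each node maps to its predecessors.
graph = {}
for earlier, later in before:
    graph.setdefault(later, set()).add(earlier)

print(list(TopologicalSorter(graph).static_order()))
# ['arrest', 'accuse', 'plead', 'testify', 'convict']
```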
Semantic roles: Grenager & Manning (2006)
• From dependency parses, a generative model predicts semantic roles corresponding to each verb’s arguments, as well as their syntactic realizations
– PropBank-style: ARG0, ARG1, etc. per verb (roles do not necessarily correspond across verbs)
– Learned syntactic patterns of the form (subj=give.ARG0, verb=give, np#1=give.ARG2, np#2=give.ARG1) or (subj=give.ARG0, verb=give, np#2=give.ARG1, pp_to=give.ARG2); a hypothetical encoding follows
• Used for semantic role labeling
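One way to picture the learned linkings is as small mappings from syntactic positions to roles. Everything below (the constant name, the exact-match rule) is hypothetical; the actual model scores candidate linkings probabilistically rather than matching them exactly:

```python
# Hypothetical encoding of the two linkings for "give" from the slide.
LINKINGS_FOR_GIVE = [
    {"subj": "ARG0", "np#1": "ARG2", "np#2": "ARG1"},   # "gave him a book"
    {"subj": "ARG0", "np#2": "ARG1", "pp_to": "ARG2"},  # "gave a book to him"
]

def label_arguments(observed_positions, linkings):
    """Return role labels from the linking whose syntactic positions
    exactly match the observed argument positions, if any."""
    observed = set(observed_positions)
    for linking in linkings:
        if set(linking) == observed:
            return {pos: linking[pos] for pos in observed_positions}
    return None

print(label_arguments(["subj", "np#2", "pp_to"], LINKINGS_FOR_GIVE))
# {'subj': 'ARG0', 'np#2': 'ARG1', 'pp_to': 'ARG2'}
```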
“Semanticity”: Our proposed scale of semantic richness
• text < POS < syntax/morphology/alignments < coreference/semantic roles/temporal ordering < translations/narrative event sequences
• We score each model’s inputs and outputs on this scale, and call the input-to-output increase “semantic gain” (encoded below)
– Haghighi et al.’s bilingual lexicon induction wins in this respect, going from raw text to lexical translations
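Since the scale is ordinal, semantic gain is just a difference of levels; a trivial encoding (the numbering is our own):

```python
# Encode the proposed scale as ordinal levels.
SEMANTICITY = {
    "text": 0,
    "POS": 1,
    "syntax/morphology/alignments": 2,
    "coreference/semantic roles/temporal ordering": 3,
    "translations/narrative event sequences": 4,
}

def semantic_gain(inputs, outputs):
    """Input-to-output increase on the semanticity scale."""
    return SEMANTICITY[outputs] - SEMANTICITY[inputs]

# Haghighi et al.: raw text in, lexical translations out.
print(semantic_gain("text", "translations/narrative event sequences"))  # 4
```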
Robustness to language variation
• About half of the papers we examined had English-only evaluations
• We considered which techniques were most adaptable to other (esp. resource-poor) languages. Two main factors:
– Reliance on existing tools/resources for preprocessing (parsers, coreference resolvers, …)
– Any linguistic specificity in the model (e.g., suffix-based morphology)
Summary
We examined three areas of unsupervised NLP:
1. Sequence tagging: How can we predict POS (or topic) tags for words in sequence?
2. Morphology: How are words put together from morphemes (and how can we break them apart)?
3. Lexical resources: How can we identify lexical translations, semantic roles and argument frames, or narrative event sequences from text?
In eight recent papers we found a variety of approaches, including heuristic algorithms, Bayesian methods, and EM-style techniques.
Thanks to Noah and Kevin for their feedback on the paper; Andreas and Narges for their collaboration on the presentations; and all of you for giving us your attention!
Questions?