Introduction to Parsing
Miguel Ballesteros
Algorithms for NLP Course, 11-711
Using some materials of Joakim Nivre from Uppsala University
Outline
● Input.
● Output.
● Mapping.
● Models.
● Evaluation.
In 711
● Phrase-structure parsing:
– CKY algorithm.
– Earley's algorithm.
– Shift-Reduce
– PCFGs
– Weighted CKY
In 711
● Dependency parsing:
– Intro to dependency linguistics.
– Transition-based dependency parsing.
– Graph-based dependency parsing.
Introduction
● Syntactic parsing is normally conceived as a structured prediction problem.
– We map from an input space X of sentences to an output space Y of syntactic representations.
Input: She thinks Mary is nice to animals → [Figure: syntactic representation of the input]
Why Syntactic Parsing?
● Parsing a sentence gives you
– “who did what to whom?” in a sentence.
– its linguistic structure in terms of syntax.
This is useful for:
– Machine translation (syntax is essential for translation).
– Web search (Google, for example).
– Information extraction.
– Many other tasks: text simplification, summarization, etc.
Why is parsing hard? Ambiguity
● Ambiguity, as in POS tagging or NER.
● Parsing also inherits ambiguity from POS tagging.
Why is parsing hard?
● Chris told us in the second lecture of the course that there is an infinite number of sentences in a natural language.
● Remember the example of adding “Chomsky said that” n times to any sentence.
● Chris said that parsing is hard!
Why is parsing hard?
● Chris said that parsing is hard!
● Chris said that Chris said that parsing is hard!
● Chris said that Chris said that Chris said that parsing is hard!
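The pattern above comes from a single recursive rule. As a minimal sketch, assume a hypothetical rule S → "Chris said that" S, with the base case S → "parsing is hard!". Each extra application embeds the previous sentence, so no sentence is the longest one. The function name `embed` is illustrative only.

```python
def embed(n):
    """Apply the hypothetical rule S -> "Chris said that" S n times,
    then finish with the base case S -> "parsing is hard!"."""
    if n == 0:
        return "parsing is hard!"
    return "Chris said that " + embed(n - 1)

for n in range(1, 4):
    print(embed(n))
```

Every n yields a new, longer sentence. This is why a parser cannot simply memorize a finite list of sentences.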
Why is parsing hard?
● We have to parse sentences that we never saw when training our parser.
Or, even harder:
● We have to parse sentences that have never been written before.
Why is parsing hard? Ambiguity
● The man saw the girl with a telescope.
– So... who had the telescope?
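To make the ambiguity concrete, here is a sketch that counts parse trees with a CKY-style chart, using a small hypothetical grammar in Chomsky Normal Form. The grammar is an assumption and does not come from the slides. Each chart cell maps a category to the number of distinct trees for that span. The grammar lets "with a telescope" attach either to the verb phrase or to "the girl".

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky Normal Form: (parent, left, right).
BINARY = [
    ("S", "NP", "VP"),
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),
    ("PP", "P", "NP"),
]
LEXICON = {
    "the": ["Det"], "a": ["Det"],
    "man": ["N"], "girl": ["N"], "telescope": ["N"],
    "saw": ["V"], "with": ["P"],
}

def count_parses(words, start="S"):
    """chart[(i, j)][A] = number of distinct trees rooted in A over words[i:j]."""
    n = len(words)
    chart = defaultdict(lambda: defaultdict(int))
    for i, w in enumerate(words):
        for tag in LEXICON.get(w, []):
            chart[(i, i + 1)][tag] += 1
    for length in range(2, n + 1):              # shorter spans first
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):           # every split point
                for parent, left, right in BINARY:
                    left_count = chart[(i, k)].get(left, 0)
                    right_count = chart[(k, j)].get(right, 0)
                    if left_count and right_count:
                        chart[(i, j)][parent] += left_count * right_count
    return chart[(0, n)].get(start, 0)

sentence = "the man saw the girl with a telescope".split()
print(count_parses(sentence))  # 2: PP attached to the VP or to the object NP
```

A count of 0 means the toy grammar rejects the string. A count above 1 means it is ambiguous under that grammar.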
Input
● We generally assume that an input $x \in X$ is a sentence consisting of a sequence $x_1 \ldots x_n$ of tokens.
● How do we delimit sentences?
● How do we split sentences into tokens?
● What properties do tokens have?
Input
● In carefully edited “western” script, the delimitation of sentences is usually straightforward.
● In other writing systems, and in certain text genres such as email and Twitter, this task can be much more challenging.
● In spoken language, many researchers would question whether the sentence is a relevant unit at all.
Input
● In syntactic parsing research, it is generally assumed that the delimitation of sentences is not part of the parsing task itself.
● However, parsing presupposes adequate sentence segmentation.
● Parsing results will be much worse if sentence segmentation is not correct.
Input
Dr. Frederking sent an application last week.
– If we use a simple sentence splitter that assumes every period ends a sentence, we have a problem:
● Dr. (1st sentence)
● Frederking sent an application last week. (2nd sentence)
– But we want just one sentence:
● Dr. Frederking sent an application last week.
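Here is a minimal sketch of the two behaviours described above. The abbreviation list is a hypothetical, deliberately tiny assumption. Practical sentence splitters use much larger lists or trained models. The naive splitter breaks after every period followed by whitespace. The second splitter refuses to break after a known abbreviation.

```python
import re

def naive_split(text):
    # Treat every period followed by whitespace as a sentence boundary.
    return [s for s in re.split(r"(?<=\.)\s+", text.strip()) if s]

# Hypothetical abbreviation list, for illustration only.
ABBREVIATIONS = {"Dr.", "Mr.", "Mrs.", "Prof.", "e.g.", "i.e."}

def abbreviation_aware_split(text):
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        # Sentence-final punctuation closes a sentence unless the
        # token is a known abbreviation.
        if token.endswith((".", "!", "?")) and token not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing material without final punctuation
        sentences.append(" ".join(current))
    return sentences

text = "Dr. Frederking sent an application last week."
print(naive_split(text))               # ['Dr.', 'Frederking sent an application last week.']
print(abbreviation_aware_split(text))  # ['Dr. Frederking sent an application last week.']
```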
Input
● Alphabetic scripts often contain reliable clues to word boundaries, whereas the word segmentation problem in written Chinese is much more challenging.
● In some morphologically rich languages, word forms can be quite complex and include elements that in other languages would be realized as independent word forms.
● Muvaffakiyetsizleştiricileştiriveremeyebileceklerimizdenmişsinizcesine
Input
ev – (the) house
evler – (the) houses
evleriniz – your houses
evlerinizden – from your houses
evlerinizdendi – (he/she/it) was from your houses
● Turkish is an agglutinative language.
● Turkish 2006 treebank and Turkish 2007 treebank.
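The word forms above can be split into a root and a chain of suffixes. The sketch below strips suffixes greedily after a known root. Its suffix inventory is a hypothetical one that covers only this example, and its glosses are simplified. A real Turkish morphological analyzer must handle vowel harmony, ambiguity and many more suffixes.

```python
# Hypothetical suffix inventory covering only the slide's example.
SUFFIXES = {
    "ler": "plural",
    "iniz": "your (2nd person plural possessive)",
    "den": "from (ablative)",
    "di": "was (past tense copula)",
}

def segment(word, root):
    """Greedily strip known suffixes, left to right, after a given root."""
    if not word.startswith(root):
        raise ValueError(f"{word!r} does not start with root {root!r}")
    morphemes = [root]
    rest = word[len(root):]
    while rest:
        for suffix in SUFFIXES:
            if rest.startswith(suffix):
                morphemes.append(suffix)
                rest = rest[len(suffix):]
                break
        else:  # no known suffix matched
            raise ValueError(f"cannot segment remainder {rest!r}")
    return morphemes

for form in ["ev", "evler", "evleriniz", "evlerinizden", "evlerinizdendi"]:
    parts = segment(form, "ev")
    print(form, "=", "+".join(parts), [SUFFIXES[s] for s in parts[1:]])
```

One Turkish token, "evlerinizdendi", corresponds to five units, several of which English expresses as separate words.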
Input
● The question of whether word forms should be subjected to morphological segmentation prior to parsing is related to the more general question of what linguistic analysis of tokens (if any) should be carried out before parsing begins.
Input
● Most systems presuppose that the tokens have been annotated with parts of speech, lemmas, and morphosyntactic properties such as case, number, tense, and aspect.
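The slides do not prescribe a format for such pre-annotated input. One widely used option is CoNLL-U, the 10-column, tab-separated format of Universal Dependencies treebanks. The sketch below reads the FORM, LEMMA, UPOS and FEATS columns of one token line. The example line is hypothetical, and the `Token` class is an illustrative name.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class Token:
    form: str
    lemma: str
    upos: str
    feats: Dict[str, str] = field(default_factory=dict)

def parse_conllu_token(line: str) -> Token:
    """Read FORM, LEMMA, UPOS and FEATS from one CoNLL-U token line."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 10:
        raise ValueError(f"expected 10 tab-separated columns, got {len(cols)}")
    feats = {}
    if cols[5] != "_":  # "_" marks an empty field
        for pair in cols[5].split("|"):
            key, value = pair.split("=", 1)
            feats[key] = value
    return Token(form=cols[1], lemma=cols[2], upos=cols[3], feats=feats)

# Hypothetical token line: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
line = "1\tShe\tshe\tPRON\tPRP\tCase=Nom|Number=Sing|Person=3\t2\tnsubj\t_\t_"
print(parse_conllu_token(line))
```

A parser that takes only raw tokens would read just the FORM column and predict the remaining columns itself.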
Input
● At the other extreme, we find systems that take only the raw tokens (as sequences of characters) as input to parsing, and that may or may not provide the additional annotation as output of the parsing process.
In any case, it is important to keep these differences in mind when comparing different parsing systems.
Output
● An input sentence $x \in X$ should be mapped to a syntactic representation $y \in Y$, but what counts as a syntactic representation of a sentence?
● Different linguistic theories have provided different answers to this question:
– in terms of what properties and relations to represent (for example, dependency or constituency);
– in how to delimit syntax from morphology;
– in how to delimit syntax from semantics.
Output - Constituency
● Historically, the most common type of representation used in parsing is based on the notion of constituency.
● In a constituent structure (or phrase structure), a sentence is recursively decomposed into smaller segments that are categorized according to their internal structure into noun phrases, verb phrases, etc.
● CFGs!
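The recursive decomposition can be made concrete by listing each phrase of a tree as a labeled span. The sketch below uses a hypothetical tree for "the man saw the girl", written as nested (label, children) pairs with words as plain strings. POS-level nodes are omitted to keep it short. The span list it produces is also the form that constituent-based evaluation compares.

```python
def constituents(tree, start=0):
    """Return (spans, end): spans lists (label, start, end) for every phrase
    node in `tree`, with end exclusive; `end` is where the tree stops."""
    if isinstance(tree, str):          # a word covers exactly one position
        return [], start + 1
    label, children = tree
    spans, pos = [], start
    for child in children:             # recursive decomposition
        child_spans, pos = constituents(child, pos)
        spans.extend(child_spans)
    spans.append((label, start, pos))  # this phrase covers start..pos
    return spans, pos

# Hypothetical tree for "the man saw the girl" (POS tags omitted).
tree = ("S", [("NP", ["the", "man"]),
              ("VP", ["saw", ("NP", ["the", "girl"])])])
spans, n = constituents(tree)
print(spans)  # [('NP', 0, 2), ('NP', 3, 5), ('VP', 2, 5), ('S', 0, 5)]
```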
Output - Constituency
● Constituent structures are naturally induced by CFGs (Chomsky, 1956)
● Many theoretical frameworks:
– Lexical Functional Grammar (LFG) (Kaplan & Bresnan, 1982; Bresnan, 2000),
– Tree Adjoining Grammar (TAG) (Joshi, 1985, 1997)
– Head-Driven Phrase Structure Grammar (HPSG) (Pollard & Sag, 1987, 1994).
Output - Constituency
● They are also widely used in annotation schemes for treebanks, such as:
– The Penn Treebank scheme for English (Marcus et al., 1993, 1994),
– Adaptations of this scheme that have been developed for Chinese (Xue et al., 2004), Korean (Han et al., 2002), Arabic (Maamouri & Bies, 2004), and Spanish (Moreno et al., 2003).
– Schemes for other languages; see, for example, the SPMRL 2013 data sets.
Output - Constituency
[Figure: example constituent structure]
Output - Dependency
● Another kind of representation that has gained popularity in recent years is instead based on the notion of dependency.
● In a dependency structure, a sentence is analyzed by connecting its words by binary asymmetrical dependency relations, and words are categorized according to their functional role into subject, object, etc.
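A common way to store a dependency structure is to give each word the position of its head, with 0 standing for an artificial root. The sketch below prints the arcs of a hypothetical analysis of "She saw the girl". It also checks two tree conditions: exactly one word attaches to the root, and there are no cycles. It does not check projectivity, and the labels are simplified placeholders.

```python
# Hypothetical analysis of "She saw the girl": heads[i] is the 1-based
# position of word i+1's head, and 0 stands for an artificial ROOT.
words = ["She", "saw", "the", "girl"]
heads = [2, 0, 4, 2]
labels = ["subject", "root", "determiner", "object"]

def is_tree(heads):
    """True if exactly one word attaches to ROOT and every word reaches
    ROOT by following head links, i.e. the arcs form a rooted tree."""
    n = len(heads)
    if sum(1 for h in heads if h == 0) != 1:
        return False
    for i in range(1, n + 1):
        seen = set()
        node = i
        while node != 0:
            if node in seen or not 1 <= node <= n:
                return False  # cycle, or head index out of range
            seen.add(node)
            node = heads[node - 1]
    return True

for word, head, label in zip(words, heads, labels):
    head_word = "ROOT" if head == 0 else words[head - 1]
    print(f"{head_word} -> {word} ({label})")
print(is_tree(heads))  # True
```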
Output - Dependency
● Dependency structures are adopted in theoretical frameworks such as
– Functional Generative Description (Sgall et al., 1986)
– Meaning-Text Theory (Mel’čuk, 1988)
● They are used for treebank annotation, especially for languages with free or flexible word order.
Output - Dependency
● Prague Dependency Treebank of Czech (Hajič et al., 2001; Böhmová et al., 2003),
● Arabic (Hajič et al., 2004), Basque (Aduriz et al., 2003), Danish (Kromann, 2003), Greek (Prokopidis et al., 2005), Russian (Boguslavsky et al., 2000), Slovene (Džeroski et al., 2006), Turkish (Oflazer et al., 2003), and other languages.
● Dependency version of the Penn Treebank.
Etc.
Output - Dependency
[Figure: example dependency structure]
Output – Syntactic (+Semantic)
● A third kind of syntactic representation is found in categorial grammar, which connects syntactic (and semantic) analysis to inference in a logical calculus.
● In statistical parsing, categorial grammar is mainly represented by Combinatory Categorial Grammar (CCG) (Steedman, 2000).
● CCGbank (Hockenmaier & Steedman, 2007) is a CCG reannotation of the Wall Street Journal section of the Penn Treebank.
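In CCG, words carry categories, and derivations combine adjacent categories with a small set of combinators. The sketch below implements only forward and backward application. Full CCG also has composition and type-raising, and pairs each step with a semantic operation, which is not shown here. Categories are written as nested triples, and the three-word lexicon is hypothetical.

```python
# A category is either atomic ("NP", "S") or a triple (result, slash, argument),
# so (S\NP)/NP is written (("S", "\\", "NP"), "/", "NP").

def forward_apply(left, right):
    """Forward application: X/Y  Y  =>  X (None if it does not apply)."""
    if isinstance(left, tuple) and left[1] == "/" and left[2] == right:
        return left[0]
    return None

def backward_apply(left, right):
    """Backward application: Y  X\\Y  =>  X (None if it does not apply)."""
    if isinstance(right, tuple) and right[1] == "\\" and right[2] == left:
        return right[0]
    return None

def show(cat):
    """Render a category in the usual slash notation."""
    if isinstance(cat, str):
        return cat
    result, slash, arg = cat
    return f"({show(result)}{slash}{show(arg)})"

# Hypothetical lexicon: a transitive verb has category (S\NP)/NP.
lexicon = {"Mary": "NP", "animals": "NP", "likes": (("S", "\\", "NP"), "/", "NP")}

vp = forward_apply(lexicon["likes"], lexicon["animals"])  # likes animals : S\NP
s = backward_apply(lexicon["Mary"], vp)                   # Mary likes animals : S
print(show(vp), show(s))  # (S\NP) S
```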
Mapping
● The parsing problem: how do we characterize the mapping X → Y?
– where X is the input space and Y is the output space.
– Is it a function or a relation?
– What relation should hold between a sentence and its syntactic representation(s)?
Mapping (grammaticality - 1)
● Historically, the notion of parsing was intimately tied to the notion of grammaticality.
● The task of the parser was twofold:
– to decide whether a sequence of tokens is a grammatical sentence;
– to derive every valid syntactic representation for the sentence.
Mapping (grammaticality - 2)
● This implies that the mapping is a relation, not a function, where:
– ambiguous sentences are mapped to more than one representation;
– ungrammatical sentences (token sequences) are mapped to nothing.
● The task of disambiguation, that is, of selecting the contextually most appropriate representation in cases of ambiguity, is on this view not part of parsing itself.
Mapping (grammaticality - 3)
● Important distinction:
1- whether an input x is grammatical with respect to a specific formal grammar G (say, a CFG),
2- and whether x is grammatical in a given natural language (say, English or Spanish).
● 1 is a very well-defined problem.
● 2 is not, since there is no “exact” grammar of English or Spanish.
And this is why parsing is hard.
We hope for the best :-) that solving 1 is a good approximation of solving 2.
Mapping (optimality - 1)
● A more recent view.
● The task of a parser is then to return the contextually most appropriate representation for an input sentence.
● There could be many other “grammatical” or close-to-grammatical representations.
● A parser is supposed to return the output y* that maximizes some mathematical function $f_M : X \times Y \to \mathbb{R}$:
$$y^* = \arg\max_{y \in Y} f_M(x, y)$$
Mapping (optimality - 2)
● This is a difficult problem because we lack knowledge of the true optimization function used by native speakers of the language.
● A basic assumption in most current parsing research is that we can approximate this function using annotated data in the form of treebanks.
Mapping (optimality - 3)
● We assume that the task of a parser is to map a sentence $x \in X$ to an optimal syntactic representation $y \in Y$,
– and we expect there to be only one such representation.
Model
● A syntactic parser is based on a mathematical model M relating the input space X to the output space Y.
● M must provide a ranking or scoring of possible outputs for a given input.
Model
● It is often natural to see a parsing model as M = (GEN, EVAL):
– A generative component GEN that maps an input x to a set of candidate analyses $\{y_1, \ldots, y_k\}$, that is, $\mathrm{GEN}(x) \subseteq Y$ (for $x \in X$).
– An evaluative component EVAL that ranks candidate analyses via a numerical scoring scheme, that is, $\mathrm{EVAL}(y) \in \mathbb{R}$ (for $y \in \mathrm{GEN}(x)$).
Model
● Thus, what we want to find is
$$y^* = \arg\max_{y \in Y} f_M(x, y) = \arg\max_{y \in \mathrm{GEN}(x)} \mathrm{EVAL}(y)$$
● Both the generative and the evaluative component may include parameters that need to be estimated from empirical data using statistical inference.
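The (GEN, EVAL) decomposition can be sketched as a higher-order function: generate the candidates, score each one, and return the best. The GEN and EVAL below are hypothetical stand-ins. GEN returns the two attachments of the telescope sentence as bracketed strings, and EVAL uses hand-set weights where a real model would use parameters estimated from a treebank.

```python
def parse(x, gen, evaluate):
    """y* = argmax over y in GEN(x) of EVAL(y); None if GEN(x) is empty."""
    candidates = list(gen(x))
    if not candidates:
        return None
    return max(candidates, key=evaluate)

def toy_gen(sentence):
    """Hypothetical GEN: ignores its input and returns the two PP attachments
    for the telescope sentence as bracketed strings."""
    return [
        "(S (NP the man) (VP (VP saw (NP the girl)) (PP with (NP a telescope))))",
        "(S (NP the man) (VP saw (NP (NP the girl) (PP with (NP a telescope)))))",
    ]

# Hypothetical hand-set weights; a real model would estimate its
# parameters from a treebank.
WEIGHTS = {"(VP (VP saw": 0.7, "(NP (NP the girl)": 0.3}

def toy_eval(tree):
    """Hypothetical EVAL: sum the weights of the patterns the tree contains."""
    return sum(w for pattern, w in WEIGHTS.items() if pattern in tree)

best = parse("the man saw the girl with a telescope", toy_gen, toy_eval)
print(best)  # the VP-attachment analysis (score 0.7 vs. 0.3)
```

Keeping GEN and EVAL as separate arguments mirrors the slide: either component can be swapped out without changing the argmax search itself.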
Model
● Given that we have constructed a parsing model (with or without statistical learning), we need an efficient way of constructing and ranking the candidate analyses for a given input sentence.
● This is the inference problem for a parsing model.
Evaluation
● The most common way of evaluating the accuracy of a parser is to run the parser on a sample of sentences, the test set $T = \{(x_1, y_1), \ldots, (x_m, y_m)\}$, and compare its output with the gold annotation,
– assuming that the treebank annotation $y_i$ for each sentence $x_i$ is the preferred analysis.
Evaluation
● The test set should not be touched during the creation and optimization of the system.
Evaluation
● The simplest way of measuring the test set accuracy is to use the Exact Match score:
– It counts the number of sentences for which the parser output is identical to the gold annotation $y_i$.
– It is a crude metric.
– Users might like it, though.
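The Exact Match score is simple to compute, which also shows why it is crude. In the sketch below, the third sentence differs from the gold annotation in a single head, yet it counts as a complete failure. The analyses are hypothetical dependency head sequences, but any comparable representation would work.

```python
def exact_match(predicted, gold):
    """Fraction of test sentences whose predicted analysis equals the gold one."""
    if len(predicted) != len(gold):
        raise ValueError("need one prediction per gold analysis")
    if not gold:
        return 0.0
    hits = sum(1 for p, g in zip(predicted, gold) if p == g)
    return hits / len(gold)

# Hypothetical head sequences for a three-sentence test set.
gold = [[2, 0, 4, 2], [2, 0], [0, 1, 1]]
pred = [[2, 0, 4, 2], [2, 0], [0, 1, 2]]
print(exact_match(pred, gold))  # 2 of 3 sentences match exactly
```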
Evaluation
● For constituency parsers: the PARSEVAL metrics (Black et al., 1991; Grishman et al., 1992)
– consider the number of matching constituents between the parser output and the gold standard.
EvaluationCandidate Gold
X: a X:a
Y: b Z: b
Z: cd V: cd
-- Y: b c d
W: a b c d W: a b c d
● Labeled precision: all four candidate constituents have a correct yield, but labels also count. Only 2 (X: a and W: a b c d) have both the correct label and the correct yield.
● This gives 2/4 = 50% labeled precision.
– Unlabeled precision would be 4/4 = 100%.
Precision: the number of correct constituents (matching yield, plus matching label in the labeled case) in the parser output divided by the number of constituents in the parser output.
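The precision computation above can be sketched in Python on the slides' toy example. Encoding each constituent as a `(label, start, end)` span over word positions is our assumption (the slides list constituents by yield), and counting matches as a multiset intersection is one reasonable choice so that a repeated span is not matched twice.

```python
from collections import Counter

# Assumed encoding: (label, start, end) over word positions, end exclusive.
# Toy sentence from the slides: a=0, b=1, c=2, d=3.
candidate = [("X", 0, 1), ("Y", 1, 2), ("Z", 2, 4), ("W", 0, 4)]
gold = [("X", 0, 1), ("Z", 1, 2), ("V", 2, 4), ("Y", 1, 4), ("W", 0, 4)]

def _key(constituent, labeled):
    # Unlabeled evaluation compares the yield (span) only and ignores the label.
    return constituent if labeled else constituent[1:]

def precision(candidate, gold, labeled=True):
    """Matched constituents divided by the number of constituents in the parser output."""
    cand = Counter(_key(c, labeled) for c in candidate)
    ref = Counter(_key(c, labeled) for c in gold)
    matched = sum((cand & ref).values())  # multiset intersection
    total = sum(cand.values())
    return matched / total if total else 0.0

print(precision(candidate, gold))                 # 0.5 (labeled: X: a and W: a b c d)
print(precision(candidate, gold, labeled=False))  # 1.0 (unlabeled: all four spans match)
```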
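● Labeled recall: the candidate is missing one subtree (Y: b c d), and the gold standard contains 5 constituents. Only 2 of them (X: a and W: a b c d) are found with the correct label and yield.
● This gives 2/5 = 40% labeled recall.
– Unlabeled recall would be 4/5 = 80%.
Recall: the number of gold-standard constituents (yields) that can be found in the parser output divided by the number of constituents in the gold standard.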
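A matching Python sketch for recall on the same toy example. As before, the `(label, start, end)` span encoding is an assumption; the only change from precision is that the denominator counts gold-standard constituents instead of parser-output constituents.

```python
from collections import Counter

# Assumed encoding: (label, start, end) over word positions, end exclusive.
# Toy sentence from the slides: a=0, b=1, c=2, d=3.
candidate = [("X", 0, 1), ("Y", 1, 2), ("Z", 2, 4), ("W", 0, 4)]
gold = [("X", 0, 1), ("Z", 1, 2), ("V", 2, 4), ("Y", 1, 4), ("W", 0, 4)]

def _key(constituent, labeled):
    # Unlabeled evaluation compares the yield (span) only and ignores the label.
    return constituent if labeled else constituent[1:]

def recall(candidate, gold, labeled=True):
    """Matched constituents divided by the number of constituents in the gold standard."""
    cand = Counter(_key(c, labeled) for c in candidate)
    ref = Counter(_key(c, labeled) for c in gold)
    matched = sum((cand & ref).values())  # multiset intersection
    total = sum(ref.values())
    return matched / total if total else 0.0

print(recall(candidate, gold))                 # 0.4 (labeled: 2 of 5 gold constituents)
print(recall(candidate, gold, labeled=False))  # 0.8 (unlabeled: Y: b c d is missing)
```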
62
Evaluation
● Complaints about PARSEVAL:
– It rewards shallow/safe analyses over analyses that make more claims but also a few mistakes.
– Especially when evaluating against treebank corpora, it punishes parsers that provide more information than the gold-standard annotation contains.
– Some "single" errors can hurt the score repeatedly: for example, a single misplaced node may trigger multiple crossing brackets and incorrect yields.
– It weights all nodes equally, rather than giving more weight to crucial semantic relations.
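To make the "single error hurts repeatedly" point concrete, here is a small Python sketch on a hypothetical five-word sentence w0 ... w4; the labels S, A, B, C and the tree shapes are invented for illustration. The parser makes one attachment mistake (the last word goes low instead of high), yet every constituent on the path to the new attachment site gets a wrong yield.

```python
from collections import Counter

def labeled_precision_recall(candidate, gold):
    """Labeled PARSEVAL-style precision and recall over (label, start, end) spans."""
    matched = sum((Counter(candidate) & Counter(gold)).values())
    return matched / len(candidate), matched / len(gold)

# Hypothetical gold tree: w4 attaches high, directly under S.
gold = [("S", 0, 5), ("A", 0, 4), ("B", 1, 4), ("C", 2, 4)]
# Hypothetical candidate: identical except w4 is attached low, inside C,
# so A, B and C each grow by one word and no longer match the gold spans.
candidate = [("S", 0, 5), ("A", 0, 5), ("B", 1, 5), ("C", 2, 5)]

p, r = labeled_precision_recall(candidate, gold)
print(f"precision={p:.2f} recall={r:.2f}")  # one attachment error, three wrong brackets
```

Only the root S still matches, so both precision and recall drop to 0.25 from a single decision.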
63
Evaluation
● For dependency parsers:
– Attachment score (Buchholz & Marsi, 2006).
– Measures the proportion of words in a sentence that are attached to the correct head.
– It can be unlabeled (head only) or labeled (head and dependency label).
64
Evaluation
ID  Word     Head-Gold  Label-Gold  Head-System  Label-System
1   Parsers  2          SBJ         2            DOBJ
2   are      0          ROOT        0            ROOT
3   cool     2          PRD         1            PRD
4   .        2          Punct       1            Punct
● Unlabeled attachment score (UAS):
– 2 heads out of 4 correct (Parsers and are): 50% UAS.
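A minimal Python sketch of UAS on the slide's four-word example. The row layout `(word, gold head, gold label, system head, system label)` is our own encoding of the table; labels are ignored entirely here.

```python
# Rows of the example table: (word, gold head, gold label, system head, system label).
rows = [
    ("Parsers", 2, "SBJ", 2, "DOBJ"),
    ("are", 0, "ROOT", 0, "Root"),
    ("cool", 2, "PRD", 1, "PRD"),
    (".", 2, "Punct", 1, "Punct"),
]

def uas(rows):
    """Proportion of words whose predicted head equals the gold head (labels ignored)."""
    correct = sum(1 for _, gold_head, _, sys_head, _ in rows if gold_head == sys_head)
    return correct / len(rows)

print(uas(rows))  # 0.5
```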
● Labeled attachment score (LAS):
– Only 1 word out of 4 (are) has both the correct head and the correct label: 25% LAS.
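The same example as a Python sketch of LAS, where a word counts only if both its head and its label are right. The row encoding is our assumption, and labels are compared case-insensitively so that ROOT and Root count as the same label.

```python
# Rows of the example table: (word, gold head, gold label, system head, system label).
rows = [
    ("Parsers", 2, "SBJ", 2, "DOBJ"),
    ("are", 0, "ROOT", 0, "Root"),
    ("cool", 2, "PRD", 1, "PRD"),
    (".", 2, "Punct", 1, "Punct"),
]

def las(rows):
    """Proportion of words with both the correct head and the correct label."""
    correct = sum(
        1
        for _, gold_head, gold_label, sys_head, sys_label in rows
        if gold_head == sys_head and gold_label.upper() == sys_label.upper()
    )
    return correct / len(rows)

print(las(rows))  # 0.25: Parsers has the right head but the wrong label
```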
● Label accuracy (LA):
– 3 labels out of 4 correct (only Parsers is mislabeled, DOBJ instead of SBJ): 75% LA.
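Label accuracy ignores heads and checks only the dependency label. A minimal Python sketch on the same example; the row encoding and the case-insensitive label comparison are our assumptions.

```python
# Rows of the example table: (word, gold head, gold label, system head, system label).
rows = [
    ("Parsers", 2, "SBJ", 2, "DOBJ"),
    ("are", 0, "ROOT", 0, "Root"),
    ("cool", 2, "PRD", 1, "PRD"),
    (".", 2, "Punct", 1, "Punct"),
]

def label_accuracy(rows):
    """Proportion of words whose predicted label equals the gold label (heads ignored)."""
    correct = sum(1 for _, _, gold_label, _, sys_label in rows
                  if gold_label.upper() == sys_label.upper())
    return correct / len(rows)

print(label_accuracy(rows))  # 0.75
```

Comparing the three scores on one sentence shows how they diverge: cool and . get their labels right but their heads wrong, so they help LA while hurting UAS and LAS.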
68
Evaluation
● Time and space efficiency are also important: processing speed and memory consumption can be critical and therefore often need to be evaluated empirically.
● A faster parser with reasonable results can be (I'd say IS) more interesting than a very slow but accurate parser.
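One way to measure speed and memory empirically, as a hedged Python sketch: time a parser over a batch of sentences with `time.perf_counter`, then re-run it under `tracemalloc` to record peak allocation. Timing and tracing are kept in separate passes because tracing slows execution. `toy_parse` is a hypothetical stand-in, not a real parser, and `tracemalloc` only sees memory allocated through Python's allocator, so parsers built on native code may be under-counted.

```python
import time
import tracemalloc
from typing import Callable, Sequence

def profile_parser(parse: Callable[[str], object], sentences: Sequence[str]) -> dict:
    """Return throughput (sentences/second) and peak traced memory (bytes) for `parse`."""
    # Pass 1: wall-clock timing without tracing overhead.
    start = time.perf_counter()
    for sentence in sentences:
        parse(sentence)
    elapsed = time.perf_counter() - start

    # Pass 2: peak memory allocated through Python's allocator while parsing.
    tracemalloc.start()
    for sentence in sentences:
        parse(sentence)
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    throughput = len(sentences) / elapsed if elapsed > 0 else float("inf")
    return {"sentences_per_second": throughput, "peak_memory_bytes": peak_bytes}

def toy_parse(sentence: str) -> list:
    # Hypothetical stand-in parser: word 1 attaches to the root (0),
    # every other word attaches to word 1.
    words = sentence.split()
    return [0] + [1] * (len(words) - 1)

print(profile_parser(toy_parse, ["Parsers are cool ."] * 1000))
```

Reporting both numbers for the same test set makes the speed/accuracy trade-off above explicit when comparing parsers.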
69
Parsing algorithms
● CKY (bottom-up)
● Earley (top-down)
● Shift-Reduce
----------------------
● Transition-based dependency parsers.
● Graph-based dependency parsers.