Natural Language Processing: Parsing
Potsdam, 10 May 2012
Saeedeh Momtazi, Information Systems Group, Hasso Plattner Institute
Based on the slides of the course book
Parsing
Finding the structural relationships between words in a sentence
Applications:
- Spell checking
- Speech recognition
- Machine translation
- Language modeling
Saeedeh Momtazi | NLP | 10.05.2012
Outline
1 Phrase Structure
2 Syntactic Parsing: CKY Algorithm
3 Statistical Parsing
Constituency
Parsing here is based on constituency (phrase structure):
- Organizing words into nested constituents
- Showing that groups of words within utterances can act as single units
- Forming coherent classes from these units that behave in similar ways
  - With respect to their internal structure
  - With respect to other units in the language
- Considering a head word for each constituent
Constituency
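- the writer talked to the audiences about his new book.
- the writer talked about his new book to the audiences. ✓
- about his new book the writer talked to the audiences. ✓
- the writer talked book to the audiences about his new. ✗
A whole constituent such as "about his new book" can move as a unit; splitting it apart makes the sentence ungrammatical.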
Context Free Grammar (CFG)
A grammar G consists of:
- Terminals (T)
- Non-terminals (N)
- Start symbol (S)
- Rules (R)
CFG
- Terminals: the set of words in the text
- Non-terminals: the constituents in a language (noun phrase, verb phrase, ...)
- Start symbol: the main constituent of the language (sentence)
- Rules: productions with a single non-terminal on the left and any number of terminals and non-terminals on the right
CFG
Phrase-structure rules:
- S → NP VP
- S → VP
- NP → N
- NP → Det N
- NP → NP NP
- NP → NP PP
- VP → V
- VP → VP PP
- VP → VP NP
- PP → Prep NP
Lexical rules:
- N → book
- V → book
- Det → the
- N → flight
- Prep → through
- N → Houston
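The four components of a CFG map directly onto a small data structure. The sketch below is one possible encoding of the toy grammar above; the names `RULES`, `START`, `NONTERMINALS` and `TERMINALS` are assumptions of this example and do not come from the slides.

```python
# Toy grammar from the slide, stored as (left-hand side, right-hand side) pairs.
RULES = [
    ("S", ("NP", "VP")), ("S", ("VP",)),
    ("NP", ("N",)), ("NP", ("Det", "N")),
    ("NP", ("NP", "NP")), ("NP", ("NP", "PP")),
    ("VP", ("V",)), ("VP", ("VP", "PP")), ("VP", ("VP", "NP")),
    ("PP", ("Prep", "NP")),
    ("N", ("book",)), ("V", ("book",)), ("Det", ("the",)),
    ("N", ("flight",)), ("Prep", ("through",)), ("N", ("Houston",)),
]
START = "S"

# Non-terminals are the symbols that appear on some left-hand side;
# terminals are the remaining right-hand-side symbols (the words).
NONTERMINALS = {lhs for lhs, _ in RULES}
TERMINALS = {sym for _, rhs in RULES for sym in rhs} - NONTERMINALS
```

Deriving the two symbol sets from the rules keeps them consistent with R by construction, which is convenient for a toy grammar; a real system would more likely declare them explicitly.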
Parsing
- Taking a string and a grammar and returning the proper parse tree(s) for that string
- Covering all and only the elements of the input string
- Reaching the start symbol at the top of the tree
- When a string has several possible trees, the system cannot select the correct one among them
Main Grammar Fragments
- Sentence
- Noun Phrase
  - Agreement
- Verb Phrase
  - Sub-categorization
Grammar Fragments: Sentence
- Declaratives: "A plane left."
  - S → NP VP
- Imperatives: "Leave!"
  - S → VP
- Yes-No Questions: "Did the plane leave?"
  - S → Aux NP VP
- WH Questions: "When did the plane leave?"
  - S → NP_WH Aux NP VP
Grammar Fragments: NP
- Each NP has a central, critical noun called the head
- The head of an NP can be accompanied by
  - Pre-nominals: the words that can come before the head
  - Post-nominals: the words that can come after the head
Grammar Fragments: NP
Pre-nominals
- Simple lexical items (the, this, a, an, ...): a car
- Simple possessives: John’s car
- Complex recursive possessives: John’s sister’s friend’s car
- Quantifiers, cardinals, ordinals, ...: three cars
- Adjectives: large cars
Grammar Fragments: NP
Post-nominals
- Prepositional phrases: flight from Seattle
- Non-finite clauses: flight arriving before noon
- Relative clauses: flight that serves breakfast
Agreement
- Having constraints that hold among various constituents
- Considering these constraints in a rule or set of rules
- Example: determiners and head nouns in NPs have to agree in number
  - This flight ✓
  - Those flights ✓
  - This flights ✗
  - Those flight ✗
- Grammars that do not consider such constraints will over-generate
  - Accepting and assigning correct structures to grammatical examples (this flight)
  - But also accepting incorrect examples (these flight)
Agreement at sentence level
- Considering similar constraints at sentence level
- Example: subject and verb in a sentence have to agree in number and person
  - John flies ✓
  - We fly ✓
  - John fly ✗
  - We flies ✗
Agreement
Possible CFG solution:
- S_sg → NP_sg VP_sg
- S_pl → NP_pl VP_pl
- NP_sg → Det_sg N_sg
- NP_pl → Det_pl N_pl
- VP_sg → V_sg NP_sg
- VP_pl → V_pl NP_pl
- ...
Shortcoming: introducing many rules into the system
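To see how quickly this approach multiplies rules, the sketch below copies every base rule once per feature value, following the slide's pattern of marking every symbol in a rule with the same value. The helper `specialise` and the `_sg` / `_pl` suffix naming are assumptions of this example.

```python
from itertools import product

def specialise(rules, features):
    """Copy each rule once per feature value, suffixing every symbol with it."""
    return [
        (f"{lhs}_{f}", tuple(f"{sym}_{f}" for sym in rhs))
        for (lhs, rhs), f in product(rules, features)
    ]

base = [("S", ("NP", "VP")), ("NP", ("Det", "N")), ("VP", ("V", "NP"))]

# Number agreement only: every rule is duplicated.
number_rules = specialise(base, ["sg", "pl"])

# Number and person together: three persons times two numbers.
person_number = [f"{p}{n}" for p in ("1", "2", "3") for n in ("sg", "pl")]
person_number_rules = specialise(base, person_number)
```

Three base rules become 6 with number alone and 18 once person is added, which is the growth the shortcoming above refers to.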
Grammar Fragments: VP
- VPs consist of a head verb along with zero or more constituents called arguments
  - VP → V: disappear
  - VP → V NP: prefer a morning flight
  - VP → V PP: fly on Thursday
  - VP → V NP PP: leave Boston in the morning
  - VP → V NP NP: give me the flight number
- Arguments
  - Obligatory: complement
  - Optional: adjunct
Sub-categorization
- Even though there are many valid VP rules, not all verbs are allowed to participate in all VP rules
  - disappear a morning flight ✗
- Solution:
  - Subcategorizing the verbs according to the sets of VP rules that they can participate in
  - This is a modern take on the traditional notion of transitive/intransitive
  - Modern grammars may have hundreds of such classes
Sub-categorization
Example:
- Sneeze: John sneezed
- Find: Please find [a flight to NY]NP
- Give: Give [me]NP [a cheaper fare]NP
- Help: Can you help [me]NP [with a flight]PP
- Prefer: I prefer [to leave earlier]TO-VP
- Told: I was told [United has a flight]S
Violations:
- John sneezed the book ✗
- I prefer United has a flight ✗
- Give with a flight ✗
Sub-categorization
- The over-generation problem also exists in VP rules
  - Permitting strings that contain verbs and arguments that do not go together
  - Example: "John sneezed the book" is licensed by VP → V NP
- Solution: as with the agreement phenomena, we need a way to formally express the constraints
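One simple way to express such constraints formally is a lexicon that lists, for every verb, the argument frames it accepts. The sketch below is a hypothetical illustration built from the example verbs in these slides; `SUBCAT` and `licensed` are invented names, and each verb is given only the one frame shown in the examples.

```python
# Hypothetical frame lexicon: each verb maps to the argument sequences it allows.
SUBCAT = {
    "sneeze": {()},             # John sneezed
    "find":   {("NP",)},        # find [a flight to NY]
    "give":   {("NP", "NP")},   # give [me] [a cheaper fare]
    "help":   {("NP", "PP")},   # help [me] [with a flight]
    "prefer": {("TO-VP",)},     # prefer [to leave earlier]
    "tell":   {("S",)},         # was told [United has a flight]
}

def licensed(verb, args):
    """True if the verb accepts exactly this sequence of argument categories."""
    return tuple(args) in SUBCAT.get(verb, set())
```

With this lexicon `licensed("sneeze", ["NP"])` is rejected, so VP → V NP can no longer combine "sneezed" with "the book". Real verbs usually allow several frames, which is why each entry is a set.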
Parsing Algorithms
- Top-Down
  - Starting with the rules that give us an S, since trees should be rooted with an S
  - Working on the way down from S to the words
- Bottom-Up
  - Starting with trees that link up with the words, since trees should cover the input words
  - Working on the way up from words to larger and larger trees
Top-Down vs. Bottom-Up
- Top-Down
  - Only searches for trees that can be answers (i.e. S’s)
  - But also suggests trees that are not consistent with any of the words
- Bottom-Up
  - Only forms trees consistent with the words
  - But suggests trees that make no sense globally
Top-Down vs. Bottom-Up
- In both cases, we left out how to keep track of the search space and how to make choices
- Solutions
  - Backtracking
    - Making a choice; if it works out, then fine
    - If not, then back up and make a different choice ⇒ duplicated work
  - Dynamic programming
    - Avoiding repeated work
    - Solving exponential problems in polynomial time
    - Storing ambiguous structures efficiently
Dynamic Programming Methods
- CKY: bottom-up
- Earley: top-down
Chomsky Normal Form
- Each context-free grammar can be rewritten so that every rule is either binary or lexical:
  - A → B C
  - A → w
- A, B, C are non-terminals; w is a terminal
Chomsky Normal Form
Converting to Chomsky normal form: rules with more than two symbols on the right
- A → B C D is replaced by
  - X → B C
  - A → X D
- X is a new symbol that does not occur anywhere else in the grammar
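The binarization step can be sketched in a few lines. The function below assumes rules are stored as `(lhs, rhs_tuple)` pairs and that symbols named `X1`, `X2`, ... are not already used in the grammar; both are assumptions of this example.

```python
def binarize(rules):
    """Split every rule with more than two right-hand symbols into binary rules."""
    out, counter = [], 0
    for lhs, rhs in rules:
        rhs = tuple(rhs)
        while len(rhs) > 2:
            counter += 1
            new = f"X{counter}"           # fresh symbol, assumed unused elsewhere
            out.append((new, rhs[:2]))    # X -> B C
            rhs = (new,) + rhs[2:]        # A -> X D ...
        out.append((lhs, rhs))
    return out

binary_rules = binarize([("A", ("B", "C", "D"))])
```

For the rule A → B C D this yields X1 → B C and A → X1 D, as on the slide. Longer rules are handled by repeating the same step from the left.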
Chomsky Normal Form
Converting to Chomsky normal form: unit rules
- Given A → B and B → C D
- The unit rule A → B is replaced by A → C D (B → C D itself is kept)
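Unit rules can be removed by following chains of them and copying the non-unit rules found at the end. The sketch below assumes the same `(lhs, rhs_tuple)` encoding as before; `remove_unit_rules` is a name invented for this example.

```python
def remove_unit_rules(rules, nonterminals):
    """Replace unit rules A -> B by copies of B's non-unit rules."""
    def is_unit(rhs):
        return len(rhs) == 1 and rhs[0] in nonterminals

    result = set()
    for a in nonterminals:
        # All non-terminals reachable from `a` through chains of unit rules.
        reachable, stack = {a}, [a]
        while stack:
            cur = stack.pop()
            for lhs, rhs in rules:
                if lhs == cur and is_unit(rhs) and rhs[0] not in reachable:
                    reachable.add(rhs[0])
                    stack.append(rhs[0])
        # Copy every non-unit rule of a reachable symbol up to `a`.
        for lhs, rhs in rules:
            if lhs in reachable and not is_unit(rhs):
                result.add((a, tuple(rhs)))
    return result

cnf_rules = remove_unit_rules(
    [("A", ("B",)), ("B", ("C", "D"))], {"A", "B", "C", "D"}
)
```

Lexical rules such as N → book are not unit rules in this sense, because their right-hand side is a terminal, so they are copied rather than removed.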
CKY Parsing
- Consider a rule A → B C
- If there is an A somewhere in the input, then there must be a B followed by a C in the input
- If the A spans from i to j in the input, then there must be a k with i < k < j such that
  - B spans from i to k
  - C spans from k to j
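The split-point condition is small enough to write out directly. The helper `splits` below is a name invented for this example; it lists every way of cutting a span [i, j] into a left part for B and a right part for C.

```python
def splits(i, j):
    """All (left span, right span) pairs for a constituent over [i, j]."""
    return [((i, k), (k, j)) for k in range(i + 1, j)]

three_word_splits = splits(0, 3)
```

A span of width w has w - 1 split points, so a one-word span has none and can only be filled by a lexical rule.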
CKY Parsing
[0,1] [0,2] [0,3] [0,4] [0,5]
[1,2] [1,3] [1,4] [1,5]
[2,3] [2,4] [2,5]
[3,4] [3,5]
[4,5]
Word positions: 0 Book 1 the 2 flight 3 through 4 Houston 5
CKY Parsing
Word positions: 0 Book 1 the 2 flight 3 through 4 Houston 5
Chart after adding the lexical rules:
- [0,1]: N → book; V → book
- [1,2]: Det → the
- [2,3]: N → flight
- [3,4]: Prep → through
- [4,5]: N → Houston
- All other cells are empty at this step
CKY Parsing
Word positions: 0 Book 1 the 2 flight 3 through 4 Houston 5
Chart after applying the unary rules to the one-word cells:
- [0,1]: N → book; V → book; NP → N[0,1]; VP → V[0,1]; S → VP[0,1]
- [1,2]: Det → the
- [2,3]: N → flight; NP → N[2,3]
- [3,4]: Prep → through
- [4,5]: N → Houston; NP → N[4,5]
- All other cells are empty at this step
CKY Parsing
Word positions: 0 Book 1 the 2 flight 3 through 4 Houston 5
Chart after filling cell [1,3]:
- [0,1]: N → book; V → book; NP → N[0,1]; VP → V[0,1]; S → VP[0,1]
- [1,2]: Det → the
- [1,3]: NP → Det[1,2], N[2,3]
- [2,3]: N → flight; NP → N[2,3]
- [3,4]: Prep → through
- [4,5]: N → Houston; NP → N[4,5]
- All other cells are empty at this step
CKY Parsing
Word positions: 0 Book 1 the 2 flight 3 through 4 Houston 5
Chart after filling cell [3,5]:
- [0,1]: N → book; V → book; NP → N[0,1]; VP → V[0,1]; S → VP[0,1]
- [1,2]: Det → the
- [1,3]: NP → Det[1,2], N[2,3]
- [2,3]: N → flight; NP → N[2,3]
- [3,4]: Prep → through
- [3,5]: PP → Prep[3,4], NP[4,5]
- [4,5]: N → Houston; NP → N[4,5]
- All other cells are empty at this step
CKY Parsing
Word positions: 0 Book 1 the 2 flight 3 through 4 Houston 5
Chart after filling cell [0,3]:
- [0,1]: N → book; V → book; NP → N[0,1]; VP → V[0,1]; S → VP[0,1]
- [0,3]: NP → NP[0,1], NP[1,3]; VP → VP[0,1], NP[1,3]; S → VP[0,3]
- [1,2]: Det → the
- [1,3]: NP → Det[1,2], N[2,3]
- [2,3]: N → flight; NP → N[2,3]
- [3,4]: Prep → through
- [3,5]: PP → Prep[3,4], NP[4,5]
- [4,5]: N → Houston; NP → N[4,5]
- All other cells are empty at this step
CKY Parsing
Word positions: 0 Book 1 the 2 flight 3 through 4 Houston 5
Chart after filling cell [2,5]:
- [0,1]: N → book; V → book; NP → N[0,1]; VP → V[0,1]; S → VP[0,1]
- [0,3]: NP → NP[0,1], NP[1,3]; VP → VP[0,1], NP[1,3]; S → VP[0,3]
- [1,2]: Det → the
- [1,3]: NP → Det[1,2], N[2,3]
- [2,3]: N → flight; NP → N[2,3]
- [2,5]: NP → NP[2,3], PP[3,5]
- [3,4]: Prep → through
- [3,5]: PP → Prep[3,4], NP[4,5]
- [4,5]: N → Houston; NP → N[4,5]
- All other cells are empty at this step
CKY Parsing
Word positions: 0 Book 1 the 2 flight 3 through 4 Houston 5
Chart after filling cell [1,5]:
- [0,1]: N → book; V → book; NP → N[0,1]; VP → V[0,1]; S → VP[0,1]
- [0,3]: NP → NP[0,1], NP[1,3]; VP → VP[0,1], NP[1,3]; S → VP[0,3]
- [1,2]: Det → the
- [1,3]: NP → Det[1,2], N[2,3]
- [1,5]: NP → NP[1,3], PP[3,5]
- [2,3]: N → flight; NP → N[2,3]
- [2,5]: NP → NP[2,3], PP[3,5]
- [3,4]: Prep → through
- [3,5]: PP → Prep[3,4], NP[4,5]
- [4,5]: N → Houston; NP → N[4,5]
- All other cells are empty at this step
CKY Parsing
Word positions: 0 Book 1 the 2 flight 3 through 4 Houston 5
Final chart, after filling cell [0,5]:
- [0,1]: N → book; V → book; NP → N[0,1]; VP → V[0,1]; S → VP[0,1]
- [0,3]: NP → NP[0,1], NP[1,3]; VP → VP[0,1], NP[1,3]; S → VP[0,3]
- [0,5]: VP → VP[0,1], NP[1,5]; VP' → VP[0,3], PP[3,5]; S → VP[0,5]; S → VP'[0,5]
- [1,2]: Det → the
- [1,3]: NP → Det[1,2], N[2,3]
- [1,5]: NP → NP[1,3], PP[3,5]
- [2,3]: N → flight; NP → N[2,3]
- [2,5]: NP → NP[2,3], PP[3,5]
- [3,4]: Prep → through
- [3,5]: PP → Prep[3,4], NP[4,5]
- [4,5]: N → Houston; NP → N[4,5]
- Cells [0,2], [0,4], [1,4] and [2,4] are empty
Ambiguity: cell [0,5] holds two different VP analyses (VP and VP'), and therefore two S analyses of the whole sentence
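The chart-filling procedure walked through above can be sketched as a recognizer. This is a minimal illustration, not the course's reference implementation: the names `BINARY`, `UNARY`, `LEXICON`, `close_unary` and `cky` are assumptions of this example. Unary rules such as NP → N are handled by a closure step after each cell is filled, mirroring the chart above instead of first converting the grammar to strict Chomsky normal form.

```python
from collections import defaultdict

BINARY = [("S", "NP", "VP"), ("NP", "Det", "N"), ("NP", "NP", "NP"),
          ("NP", "NP", "PP"), ("VP", "VP", "PP"), ("VP", "VP", "NP"),
          ("PP", "Prep", "NP")]
UNARY = [("S", "VP"), ("NP", "N"), ("VP", "V")]
LEXICON = {"book": {"N", "V"}, "the": {"Det"}, "flight": {"N"},
           "through": {"Prep"}, "houston": {"N"}}

def close_unary(cell):
    """Add A to the cell whenever A -> B is a unary rule and B is already in it."""
    changed = True
    while changed:
        changed = False
        for parent, child in UNARY:
            if child in cell and parent not in cell:
                cell.add(parent)
                changed = True

def cky(words):
    """Return a chart mapping each span (i, j) to the set of non-terminals found."""
    n = len(words)
    chart = defaultdict(set)
    for i, w in enumerate(words):                 # one-word spans
        chart[i, i + 1] |= LEXICON.get(w.lower(), set())
        close_unary(chart[i, i + 1])
    for width in range(2, n + 1):                 # shorter spans before longer ones
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):             # every split point i < k < j
                for parent, left, right in BINARY:
                    if left in chart[i, k] and right in chart[k, j]:
                        chart[i, j].add(parent)
            close_unary(chart[i, j])
    return chart

chart = cky("Book the flight through Houston".split())
```

The sentence is accepted if `"S" in chart[0, 5]`. Because this sketch stores only sets of labels, it recognizes the sentence but does not record the two competing VP analyses; keeping back-pointers per entry would be needed to recover the trees.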
Probabilistic Context Free Grammar (PCFG)
A grammar G consists of:
- Terminals (T)
- Non-terminals (N)
- Start symbol (S)
- Rules (R)
- Probability function (P)

$$P : R \to [0, 1], \qquad \forall X \in N:\ \sum_{X \to \lambda \,\in\, R} P(X \to \lambda) = 1$$
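PCFG
Phrase-structure rules with probabilities:
- S → NP VP (0.9)
- S → VP (0.1)
- NP → N (0.3)
- NP → Det N (0.4)
- NP → NP NP (0.1)
- NP → NP PP (0.2)
- VP → V (0.1)
- VP → VP PP (0.3)
- VP → VP NP (0.6)
- PP → Prep NP (1.0)
Lexical rules with probabilities:
- N → book (0.5)
- V → book (1.0)
- Det → the (1.0)
- N → flight (0.4)
- Prep → through (1.0)
- N → Houston (0.1)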
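The PCFG condition that the probabilities of all rules sharing a left-hand side sum to 1 can be checked mechanically. The sketch below stores the grammar above as a dictionary; `PCFG`, `lhs_totals` and `is_normalised` are names invented for this example.

```python
from collections import defaultdict
import math

PCFG = {
    ("S", ("NP", "VP")): 0.9, ("S", ("VP",)): 0.1,
    ("NP", ("N",)): 0.3, ("NP", ("Det", "N")): 0.4,
    ("NP", ("NP", "NP")): 0.1, ("NP", ("NP", "PP")): 0.2,
    ("VP", ("V",)): 0.1, ("VP", ("VP", "PP")): 0.3, ("VP", ("VP", "NP")): 0.6,
    ("PP", ("Prep", "NP")): 1.0,
    ("N", ("book",)): 0.5, ("V", ("book",)): 1.0, ("Det", ("the",)): 1.0,
    ("N", ("flight",)): 0.4, ("Prep", ("through",)): 1.0,
    ("N", ("Houston",)): 0.1,
}

def lhs_totals(pcfg):
    """Sum the rule probabilities for each left-hand-side symbol."""
    totals = defaultdict(float)
    for (lhs, _), p in pcfg.items():
        totals[lhs] += p
    return dict(totals)

def is_normalised(pcfg):
    """True if every left-hand side's rules sum to 1, up to rounding error."""
    return all(math.isclose(t, 1.0) for t in lhs_totals(pcfg).values())
```

`math.isclose` is used instead of `==` because sums such as 0.3 + 0.4 + 0.1 + 0.2 are not guaranteed to be exactly 1.0 in floating point.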
Treebank
- A treebank is a corpus in which each sentence has been paired with a parse tree
- Treebanks are generally created by
  - Parsing the collection with an automatic parser
  - Having human annotators correct each parse where required
- Requirement: detailed annotation guidelines that provide
  - A POS tagset
  - A grammar
  - An annotation schema
  - Instructions for how to deal with particular grammatical constructions
Penn Treebank
- Penn Treebank is a widely used treebank for English
- Most well-known section: the Wall Street Journal section
  - About 1 million words from 1987-1989
Example annotation:

    (S (NP (NNP John))
       (VP (VBZ flies)
           (PP (IN to) (NNP Paris)))
       (. .))
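Bracketed trees of this kind are easy to read into a nested structure. The sketch below is a minimal reader written for this example; `parse_tree` and `leaves` are invented names, not functions from a treebank toolkit, and the reader assumes well-formed input without empty elements or traces.

```python
def parse_tree(text):
    """Turn a bracketed string into nested lists: [label, child, ...]."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        pos += 1                          # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())   # nested constituent
            else:
                children.append(tokens[pos])  # a word
                pos += 1
        pos += 1                          # skip ")"
        return [label] + children

    return node()

def leaves(tree):
    """Collect the words of a tree from left to right."""
    return [w for child in tree[1:]
            for w in ([child] if isinstance(child, str) else leaves(child))]

tree = parse_tree(
    "(S (NP (NNP John)) (VP (VBZ flies) (PP (IN to) (NNP Paris))) (. .))"
)
```

Reading the leaves back with `leaves(tree)` recovers the original word sequence, which is a quick sanity check that the brackets were balanced.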
Statistical Parsing
- Considering the corresponding probabilities while parsing a sentence
- Selecting the parse tree which has the highest probability
- Tree and string probabilities
  - P(t): the probability of a tree t
    - Product of the probabilities of the rules used to generate the tree
  - P(s): the probability of a string s
    - Sum of the probabilities of all trees that parse the string
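Both definitions translate into short functions. The sketch below assumes trees are nested tuples `(label, child, ...)` with words as plain strings, and uses only the three rule probabilities needed for the noun phrase "the flight"; `PROBS`, `tree_prob` and `string_prob` are names invented for this example.

```python
PROBS = {
    ("NP", ("Det", "N")): 0.4,
    ("Det", ("the",)): 1.0,
    ("N", ("flight",)): 0.4,
}

def tree_prob(tree, probs):
    """P(t): product of the probabilities of all rules used in the tree."""
    label, *children = tree
    # The rule applied at this node: label -> labels (or words) of the children.
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = probs[(label, rhs)]
    for c in children:
        if not isinstance(c, str):
            p *= tree_prob(c, probs)
    return p

def string_prob(trees, probs):
    """P(s): sum of P(t) over all parse trees t of the string."""
    return sum(tree_prob(t, probs) for t in trees)

np_tree = ("NP", ("Det", "the"), ("N", "flight"))
p = tree_prob(np_tree, PROBS)   # 0.4 * 1.0 * 0.4
```

For long sentences the product becomes very small, so implementations commonly add log probabilities instead of multiplying; that is a general numerical practice and not something stated on the slides.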
Statistical Parsing
$$P(t) = 0.2 \times 0.4 \times 1.0 \times 1.0 \times 0.4 \times 1.0 \times 0.1 = 0.0032$$
Probabilistic CKY Parsing
Word positions: 0 Book 1 the 2 flight 3 through 4 Houston 5
Chart entries, with the probabilities shown on the slide:
- [0,1]: N → book (0.5); V → book (1.0); NP → N[0,1] (0.3*0.5=0.15); VP → V[0,1] (0.1*1.0=0.1); S → VP[0,1] (0.1*0.1=0.01)
- [0,3]: NP → NP[0,1], NP[1,3] (0.1*0.15*0.16=0.0024); VP → VP[0,1], NP[1,3] (0.6*0.1*0.16=0.0096); S → VP[0,3] (0.1*0.0096=0.00096)
- [0,5]: VP → VP[0,1], NP[1,5]; VP' → VP[0,3], PP[3,5]; S → VP[0,5]; S → VP'[0,5]
- [1,2]: Det → the (1.0)
- [1,3]: NP → Det[1,2], N[2,3] (0.4*1.0*0.4=0.16)
- [1,5]: NP → NP[1,3], PP[3,5]
- [2,3]: N → flight (0.4); NP → N[2,3] (0.3*0.4=0.12)
- [2,5]: NP → NP[2,3], PP[3,5]
- [3,4]: Prep → through
- [3,5]: PP → Prep[3,4], NP[4,5]
- [4,5]: N → Houston; NP → N[4,5]
- Cells [0,2], [0,4], [1,4] and [2,4] are empty
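The probability bookkeeping in the chart can be sketched by keeping, for every cell, the best probability found so far for each non-terminal. This is one possible sketch under stated assumptions, not a definitive solution: the names `BINARY`, `UNARY`, `LEXICAL`, `relax_unary` and `pcky` are invented here, unary rules are applied by a relaxation step after each cell is filled, and no back-pointers are stored, so the best tree itself is not recovered.

```python
from collections import defaultdict

BINARY = {("S", "NP", "VP"): 0.9, ("NP", "Det", "N"): 0.4,
          ("NP", "NP", "NP"): 0.1, ("NP", "NP", "PP"): 0.2,
          ("VP", "VP", "PP"): 0.3, ("VP", "VP", "NP"): 0.6,
          ("PP", "Prep", "NP"): 1.0}
UNARY = {("S", "VP"): 0.1, ("NP", "N"): 0.3, ("VP", "V"): 0.1}
LEXICAL = {("N", "book"): 0.5, ("V", "book"): 1.0, ("Det", "the"): 1.0,
           ("N", "flight"): 0.4, ("Prep", "through"): 1.0,
           ("N", "houston"): 0.1}

def relax_unary(cell):
    """Apply unary rules until no entry in the cell can be improved."""
    changed = True
    while changed:
        changed = False
        for (parent, child), p in UNARY.items():
            if child in cell and p * cell[child] > cell.get(parent, 0.0):
                cell[parent] = p * cell[child]
                changed = True

def pcky(words):
    """Return a chart: (i, j) -> {non-terminal: best probability for that span}."""
    n = len(words)
    chart = defaultdict(dict)
    for i, w in enumerate(words):                 # one-word spans
        for (tag, word), p in LEXICAL.items():
            if word == w.lower():
                chart[i, i + 1][tag] = p
        relax_unary(chart[i, i + 1])
    for width in range(2, n + 1):                 # shorter spans before longer ones
        for i in range(n - width + 1):
            j = i + width
            cell = chart[i, j]
            for k in range(i + 1, j):             # every split point i < k < j
                for (parent, left, right), p in BINARY.items():
                    if left in chart[i, k] and right in chart[k, j]:
                        cand = p * chart[i, k][left] * chart[k, j][right]
                        if cand > cell.get(parent, 0.0):
                            cell[parent] = cand   # keep only the best analysis
            relax_unary(cell)
    return chart

chart = pcky("Book the flight through Houston".split())
```

Up to floating-point rounding, `chart[1, 3]["NP"]` and the three entries of `chart[0, 3]` reproduce the values 0.16, 0.0024, 0.0096 and 0.00096 shown in the chart above.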
Exercise
Implement the probabilistic CKY algorithm so that it works from the grammar rules R.
Further Reading
Speech and Language Processing, Chapters 12, 13, 14, 15