Syntactic analysis using Context-Free Grammars
Analysis of language
• Morphological analysis
  – Chairs: <chair, n, plural>, <chair, v, 3rd person, present>
• Part Of Speech (POS) tagging
  – The/DT man/NN left/VBD the/DT room/NN
  – The/DT red/ADJ block/NN on/PREP the/DT blue/ADJ cylinder/NN was/AUX moved/VBD onto/PREP the/DT brown/ADJ table/NN
• Any further analysis?
  – chunks, clauses, syntax, semantics, word senses, etc.
• Today’s lecture: analyzing syntax
What is Syntax?
• Study of the structure of language
  – how words can connect to each other
• Specifically, the goal is to relate the surface form (i.e. the sentence) to the semantics (the meaning)
• The representational device is the tree structure
Structure in Strings: Proposal 1
• Some words: the a small nice big very boy girl sees likes
• Some good sentences:
  – (the) boy (likes a girl)
  – (the small) girl (likes the big girl)
  – (a very small nice) boy (sees a very nice boy)
• Some bad sentences:
  – *(the) boy (the girl)
  – *(small) boy (likes the nice girl)
Structure in Strings: Proposal 2
• Some words: the a small nice big very boy girl sees likes
• Some good sentences:
  – (the boy) likes (a girl)
  – (the small girl) likes (the big girl)
  – (a very small nice boy) sees (a very nice boy)
• Some bad sentences:
  – *(the boy) (the girl)
  – *(small boy) likes (the nice girl)
More Structure in Strings: Proposal 2 (ctd.)
• Some words: the a small nice big very boy girl sees likes
• Some good sentences:
  – ((the) boy) likes ((a) girl)
  – ((the) (small) girl) likes ((the) (big) girl)
  – ((a) ((very) small) (nice) boy) sees ((a) ((very) nice) girl)
• Some bad sentences:
  – *((the) boy) ((the) girl)
  – *((small) boy) likes ((the) (nice) girl)
From Substrings to Trees
• (((the) boy) likes ((a) girl))
[Figure: the same bracketing drawn as a tree, with leaves the, boy, likes, a, girl]
Context-Free Grammars
• Terminals
  – The lexicon/vocabulary
• Non-terminals
  – The constituents in a language, like noun phrase, verb phrase, prepositional phrase and sentence
• Rules
  – Rules consist of a single non-terminal on the left and any number of terminals and non-terminals on the right
  – Describe the allowed structure of the constituents
  – Express the ways in which symbols of the language can be grouped or ordered together
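The three ingredients can be held in a small data structure. The following is a minimal Python (3.9+) sketch, assuming a grammar is stored as sets of terminals and non-terminals plus a list of rules; the names `Rule`, `CFG` and `validate` are hypothetical, chosen for illustration. `validate` enforces the rule format stated above: exactly one non-terminal on the left, and any number of known terminals or non-terminals on the right.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    lhs: str                # must be a single non-terminal
    rhs: tuple[str, ...]    # any number of terminals and non-terminals


@dataclass
class CFG:
    terminals: set[str]     # the lexicon / vocabulary
    nonterminals: set[str]  # the constituents (S, NP, VP, ...)
    rules: list[Rule]
    start: str = "S"

    def validate(self) -> None:
        """Raise ValueError if any rule breaks the CFG rule format."""
        for rule in self.rules:
            if rule.lhs not in self.nonterminals:
                raise ValueError(f"left-hand side {rule.lhs!r} is not a non-terminal")
            for sym in rule.rhs:
                if sym not in self.terminals and sym not in self.nonterminals:
                    raise ValueError(f"unknown symbol {sym!r} in rule {rule}")


# A tiny grammar: S -> NP VP, VP -> V NP, NP -> DetP N, plus lexical rules
g = CFG(
    terminals={"the", "a", "boy", "girl", "likes"},
    nonterminals={"S", "NP", "VP", "DetP", "N", "V"},
    rules=[
        Rule("S", ("NP", "VP")),
        Rule("VP", ("V", "NP")),
        Rule("NP", ("DetP", "N")),
        Rule("DetP", ("the",)), Rule("DetP", ("a",)),
        Rule("N", ("boy",)), Rule("N", ("girl",)),
        Rule("V", ("likes",)),
    ],
)
g.validate()  # passes: every rule has one non-terminal on the left
```

Keeping terminals and non-terminals as explicit sets makes the distinction between words and constituents checkable rather than implicit.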
Phrase Structure Tree
• (((the/Det) boy/N) likes/V ((a/Det) girl/N))
[Figure: phrase-structure tree [S [NP [DetP the] boy] likes [NP [DetP a] girl]]; non-terminal symbols = constituents, terminal symbols = words]
Context?
• The notion of context in CFGs is not the same as the ordinary meaning of the word context in language.
• All it really means is that the non-terminal on the left-hand side of a rule stands all by itself
  – A -> B C
  – means that I can rewrite an A as a B followed by a C regardless of the context in which A is found
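A Python sketch of one rewriting step, where a sentential form is a list of symbols; the function name `rewrite` is hypothetical. The point is what it does not do: it checks only the symbol being rewritten and never looks at its neighbours.

```python
def rewrite(form: list[str], index: int, lhs: str, rhs: list[str]) -> list[str]:
    """Apply the rule lhs -> rhs to the symbol at position `index` of a sentential form.

    Only form[index] is inspected. The symbols to its left and right (its
    context) are never consulted, which is what "context-free" means.
    """
    if form[index] != lhs:
        raise ValueError(f"expected {lhs!r} at position {index}, found {form[index]!r}")
    return form[:index] + rhs + form[index + 1:]


# A -> B C applies the same way whatever surrounds the A
print(rewrite(["x", "A", "y"], 1, "A", ["B", "C"]))  # ['x', 'B', 'C', 'y']
print(rewrite(["A"], 0, "A", ["B", "C"]))            # ['B', 'C']
```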
CFG: Example
• Many CFGs are possible for English; here is an example (fragment):
  – S -> NP VP
  – VP -> V NP
  – NP -> DetP N | AdjP NP
  – AdjP -> Adj | Adv AdjP
  – N -> boy | girl
  – V -> sees | likes
  – Adj -> big | small
  – Adv -> very
  – DetP -> a | the
the very small boy likes a girl
Derivations in a CFG
Grammar: S -> NP VP; VP -> V NP; NP -> DetP N | AdjP NP; AdjP -> Adj | Adv AdjP; N -> boy | girl; V -> sees | likes; Adj -> big | small; Adv -> very; DetP -> a | the
• S
• => NP VP   (S -> NP VP)
• => DetP N VP   (NP -> DetP N)
• => the boy VP   (DetP -> the, N -> boy)
• => the boy likes NP   (VP -> V NP, V -> likes)
• => the boy likes a girl   (NP -> DetP N, DetP -> a, N -> girl)
[Figure: the phrase-structure tree grows with each step; final tree: [S [NP [DetP the] [N boy]] [VP [V likes] [NP [DetP a] [N girl]]]]]
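The derivation can be replayed mechanically. A Python sketch, assuming every step rewrites the leftmost non-terminal (a leftmost derivation) and applies exactly one rule, so it also prints the intermediate forms that the slide steps merge (such as the N VP); `NONTERMINALS`, `STEPS` and `leftmost_derivation` are hypothetical names.

```python
NONTERMINALS = {"S", "NP", "VP", "DetP", "N", "V"}

# One rule per step, in the order used above; each rewrites the leftmost non-terminal
STEPS = [
    ("S", ["NP", "VP"]),
    ("NP", ["DetP", "N"]),
    ("DetP", ["the"]),
    ("N", ["boy"]),
    ("VP", ["V", "NP"]),
    ("V", ["likes"]),
    ("NP", ["DetP", "N"]),
    ("DetP", ["a"]),
    ("N", ["girl"]),
]


def leftmost_derivation(steps, start="S"):
    """Return every sentential form, as a string, from `start` to the final sentence."""
    form = [start]
    forms = [" ".join(form)]
    for lhs, rhs in steps:
        # Locate the leftmost non-terminal; the rule must rewrite exactly that symbol
        i = next((k for k, sym in enumerate(form) if sym in NONTERMINALS), None)
        if i is None or form[i] != lhs:
            raise ValueError(f"rule {lhs} -> {' '.join(rhs)} does not apply to the leftmost non-terminal")
        form = form[:i] + rhs + form[i + 1:]
        forms.append(" ".join(form))
    return forms


for f in leftmost_derivation(STEPS):
    print(f)
```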
Simple lexicon
Simple grammar
Generativity
• We can view these rules as either analysis or synthesis machines that
  – generate strings in the language
  – reject strings not in the language
  – impose structures (trees) on strings in the language
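Used as a synthesis machine, the example grammar can generate random strings. A Python sketch, assuming the rules are stored as a dict from each non-terminal to its alternatives; `GRAMMAR`, `generate` and the `max_depth` cut-off are hypothetical. Past the cut-off the generator always takes the first, non-recursive alternative (NP -> DetP N, AdjP -> Adj), so expansion terminates despite the recursive rules.

```python
import random

# The example fragment: each non-terminal maps to its list of alternatives.
# For the recursive categories (NP, AdjP) the first alternative is non-recursive.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "VP": [["V", "NP"]],
    "NP": [["DetP", "N"], ["AdjP", "NP"]],
    "AdjP": [["Adj"], ["Adv", "AdjP"]],
    "N": [["boy"], ["girl"]],
    "V": [["sees"], ["likes"]],
    "Adj": [["big"], ["small"]],
    "Adv": [["very"]],
    "DetP": [["a"], ["the"]],
}


def generate(symbol="S", rng=random, depth=0, max_depth=6):
    """Expand `symbol` top-down with randomly chosen rules; return the list of words."""
    if symbol not in GRAMMAR:  # terminal: it is a word
        return [symbol]
    alternatives = GRAMMAR[symbol]
    # Past max_depth, take the first (non-recursive) alternative so expansion stops
    rhs = alternatives[0] if depth >= max_depth else rng.choice(alternatives)
    words = []
    for sym in rhs:
        words.extend(generate(sym, rng, depth + 1, max_depth))
    return words


rng = random.Random(0)
for _ in range(3):
    print(" ".join(generate(rng=rng)))
```

Taken literally, NP -> AdjP NP places adjectives in front of a complete NP, so strings such as "small the boy likes a girl" are in this fragment's language; the generator simply follows the rules as written.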
A CFG defines a formal language
• Sentences (strings of words) that can be derived by the grammar are in the formal language defined by the grammar
• Sentences that cannot be derived from the grammar are not in the language
  – they are ungrammatical
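Membership can be tested with a small recognizer. A Python sketch, assuming the example grammar from earlier (repeated so the block runs on its own) and a naive top-down strategy that tracks every position where a symbol could end; `ends` and `in_language` are hypothetical names. This simple strategy works here only because the fragment has no left-recursive rules; a rule such as NP -> NP PP would make it recurse forever.

```python
GRAMMAR = {
    "S": [["NP", "VP"]],
    "VP": [["V", "NP"]],
    "NP": [["DetP", "N"], ["AdjP", "NP"]],
    "AdjP": [["Adj"], ["Adv", "AdjP"]],
    "N": [["boy"], ["girl"]],
    "V": [["sees"], ["likes"]],
    "Adj": [["big"], ["small"]],
    "Adv": [["very"]],
    "DetP": [["a"], ["the"]],
}


def ends(symbol, words, start):
    """Return every position j such that `symbol` can derive words[start:j]."""
    if symbol not in GRAMMAR:  # terminal: must match the next word exactly
        return {start + 1} if start < len(words) and words[start] == symbol else set()
    result = set()
    for rhs in GRAMMAR[symbol]:
        positions = {start}
        for sym in rhs:  # thread the possible end positions through the rule
            positions = {j for i in positions for j in ends(sym, words, i)}
        result |= positions
    return result


def in_language(sentence):
    """A sentence is in the language if S can derive all of it, and only it."""
    words = sentence.split()
    return len(words) in ends("S", words, 0)


print(in_language("the boy likes a girl"))  # True
print(in_language("the boy the girl"))      # False
```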
Derivations
• A derivation is a sequence of rules applied to a string that accounts for that string
  – covers all the elements in the string
  – covers only the elements in the string
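The "all and only" condition can be checked on the tree a derivation builds: its leaves, read left to right (its yield), must be exactly the words of the string. A Python sketch, assuming trees are nested tuples of the form (label, child, ...) with words as plain strings; `tree_yield` and `accounts_for` are hypothetical names.

```python
# A tree is a tuple (label, child, child, ...); a word (leaf) is a plain string.
def tree_yield(tree):
    """Return the leaves of `tree`, read left to right."""
    if isinstance(tree, str):
        return [tree]
    _label, *children = tree
    return [word for child in children for word in tree_yield(child)]


def accounts_for(tree, sentence):
    """True if the leaves are exactly the words of `sentence`, in order:
    every word is covered, and nothing else is."""
    return tree_yield(tree) == sentence.split()


T = ("S",
     ("NP", ("DetP", "the"), ("N", "boy")),
     ("VP", ("V", "likes"), ("NP", ("DetP", "a"), ("N", "girl"))))

print(accounts_for(T, "the boy likes a girl"))        # True
print(accounts_for(T, "the boy likes a girl today"))  # False: "today" is not covered
print(accounts_for(T, "the boy likes"))               # False: the tree covers extra words
```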
Recursion
• We’ll have to deal with rules such as the following, where the non-terminal on the left also appears directly on the right:
  – NP -> NP PP   [[The flight] [to Boston]]
  – VP -> VP PP   [[departed Miami] [at noon]]
Recursion
• Of course, this is what makes syntax interesting
  – Flights from Denver
  – Flights from Denver to Miami
  – Flights from Denver to Miami in February
  – Flights from Denver to Miami in February on a Friday
  – Flights from Denver to Miami in February on a Friday under $300
  – Flights from Denver to Miami in February on a Friday under $300 with lunch
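The list above can be produced by applying the recursive rule NP -> NP PP once per prepositional phrase. A Python sketch that builds the bracketing in the same style as [[The flight] [to Boston]]; the function name `expand_np` is hypothetical, and the PPs are treated as unanalysed strings.

```python
PPS = ["from Denver", "to Miami", "in February", "on a Friday",
       "under $300", "with lunch"]


def expand_np(head, pps):
    """Apply NP -> NP PP once per PP, each time wrapping the NP built so far."""
    np = f"[{head}]"
    for pp in pps:
        np = f"[{np} [{pp}]]"  # the previous NP becomes the left daughter
    return np


for k in range(1, 4):
    print(expand_np("flights", PPS[:k]))
```

This sketch always attaches each PP to the whole NP built so far, which is only one of the possible attachments; nothing in the rule itself limits how many times it can apply.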
The Point
• If you have a rule like
  – VP -> V NP
  – it only cares that the thing after the verb is an NP; it doesn’t have to know about the internal affairs of that NP
The Point
• VP -> V NP
• I hate
  – flights from Denver
  – flights from Denver to Miami
  – flights from Denver to Miami in February
  – flights from Denver to Miami in February on a Friday
  – flights from Denver to Miami in February on a Friday under $300
  – flights from Denver to Miami in February on a Friday under $300 with lunch
Potential Problems in CFG
• Agreement
• Subcategorization
• Movement
Agreement
• By agreement, we have in mind constraints that hold among various constituents that take part in a rule or set of rules
• For example, in English, determiners and the head nouns in NPs have to agree in their number.
  – This flight / Those flights
  – *This flights / *Those flight
Problem
• Our earlier NP rules are clearly deficient since they don’t capture this constraint
  – NP -> Det Nominal
• It accepts, and assigns correct structures to, grammatical examples (this flight)
• But it’s also happy with incorrect examples (*these flight)
  – Such a rule is said to overgenerate.
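The number constraint can be added as a feature check on top of the plain rule. A Python sketch, assuming a hypothetical lexicon in which each word carries a category and a number value ("sg" or "pl"); `LEXICON`, `np_plain` and `np_agree` are illustrative names. One alternative, not shown, is to split the categories themselves into singular and plural versions, at the cost of many more rules.

```python
# Hypothetical lexicon: each word has a category and a number feature
LEXICON = {
    "this": ("Det", "sg"), "that": ("Det", "sg"),
    "these": ("Det", "pl"), "those": ("Det", "pl"),
    "flight": ("Nominal", "sg"), "flights": ("Nominal", "pl"),
}


def np_plain(det, nominal):
    """NP -> Det Nominal, checking categories only (overgenerates)."""
    return LEXICON[det][0] == "Det" and LEXICON[nominal][0] == "Nominal"


def np_agree(det, nominal):
    """Same rule plus the constraint that Det and Nominal agree in number."""
    return np_plain(det, nominal) and LEXICON[det][1] == LEXICON[nominal][1]


for det, nom in [("this", "flight"), ("those", "flights"),
                 ("these", "flight"), ("this", "flights")]:
    print(det, nom, np_plain(det, nom), np_agree(det, nom))
```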
Verb Phrases
• English VPs consist of a head verb along with 0 or more following constituents, which we’ll call arguments.
Subcategorization
• Sneeze: John sneezed
• Find: Please find [a flight to NY]NP
• Give: Give [me]NP[a cheaper fare]NP
• Help: Can you help [me]NP[with a flight]PP
• Prefer: I prefer [to leave earlier]TO-VP
• Told: I was told [United has a flight]S
• …
• *John sneezed the book
• *I prefer United has a flight
• *Give with a flight
• Subcat expresses the constraints that a predicate (verb for now) places on the number and type of the arguments it takes
So?
• So the various rules for VPs overgenerate
  – They permit strings containing verbs and arguments that don’t go together
  – For example, given VP -> V NP, “sneezed the book” is a VP, since “sneeze” is a verb and “the book” is a valid NP
• Subcategorization frames can fix this problem (“slow down” overgeneration)
• This is a modern take on the traditional notion of transitive/intransitive.
• Modern grammars may have 100s of such classes.
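A minimal sketch of subcategorization frames in Python, assuming each verb lists the sequences of argument categories it accepts. The frames are read off the examples on the Subcategorization slide, one frame per verb, which is a simplification since real verbs often allow several; `SUBCAT`, `vp_plain` and `vp_subcat` are hypothetical names.

```python
# Frames read off the slide's examples; real lexicons list several frames per verb
SUBCAT = {
    "sneeze": [()],               # John sneezed
    "find":   [("NP",)],          # find [a flight to NY]NP
    "give":   [("NP", "NP")],     # give [me]NP [a cheaper fare]NP
    "help":   [("NP", "PP")],     # help [me]NP [with a flight]PP
    "prefer": [("TO-VP",)],       # prefer [to leave earlier]TO-VP
    "told":   [("S",)],           # (passive, as on the slide) told [United has a flight]S
}


def vp_plain(verb, args):
    """VP -> V NP with no lexical restriction: any known verb plus one NP."""
    return verb in SUBCAT and tuple(args) == ("NP",)


def vp_subcat(verb, args):
    """Accept the VP only if the argument sequence matches one of the verb's frames."""
    return tuple(args) in SUBCAT.get(verb, [])


print(vp_plain("sneeze", ["NP"]))   # True: *sneezed the book slips through
print(vp_subcat("sneeze", ["NP"]))  # False: sneeze takes no arguments
```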
Movement
• Core example
  – [[My travel agent]NP [booked [the flight]NP]VP]S
• I.e. “book” is a straightforward transitive verb: it expects a single NP argument within the VP (its object) and a single NP argument as the subject.
Movement
• What about:
  – Which flight do you want me to have the travel agent book?
• The direct object argument to “book” isn’t appearing in the right place. It is in fact a long way from where it’s supposed to appear.
• And note that it’s separated from its verb by 2 other verbs.
Grammar equivalence and normal form
• Equivalence
  – Two grammars are strongly equivalent if:
    • they generate the same set of strings, and
    • they assign the same phrase structure to each sentence
  – Two grammars are weakly equivalent if:
    • they generate the same set of strings, but
    • they do not assign the same phrase structure to each sentence
• Normal form
  – Restrict the form of productions
  – Chomsky Normal Form (CNF): the right-hand side of each production is either two non-terminals or a single terminal
    • e.g. A -> B C, A -> a
  – Any context-free grammar (ignoring the empty string) can be converted into a weakly equivalent grammar in CNF
    • e.g. A -> B C D becomes A -> B X and X -> C D
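Binarization, the step illustrated by A -> B C D above, can be sketched in Python; `binarize` is a hypothetical name, and the fresh symbols X1, X2, ... are assumed not to clash with existing non-terminals. A full CNF conversion also removes ε-rules and unit rules (A -> B) and replaces terminals inside long right-hand sides with new pre-terminals; none of that is done here.

```python
from itertools import count


def binarize(rules):
    """Split every rule with more than two right-hand-side symbols into binary rules.

    A -> B C D becomes A -> B X1 and X1 -> C D. Rules with one or two
    right-hand-side symbols are passed through unchanged.
    """
    fresh = count(1)  # assumption: names X1, X2, ... are not already in use
    out = []
    for lhs, rhs in rules:
        rhs = tuple(rhs)
        while len(rhs) > 2:
            new = f"X{next(fresh)}"
            out.append((lhs, (rhs[0], new)))  # keep the first symbol, defer the rest
            lhs, rhs = new, rhs[1:]
        out.append((lhs, rhs))
    return out


print(binarize([("A", ("B", "C", "D"))]))
```

Since the new rules only introduce fresh symbols, the binarized grammar generates the same strings but with extra X nodes in the trees, which is why the result is only weakly equivalent.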
Building tree structures
• Draw tree structures for the following phrases
  – Dallas
  – from Denver
  – arriving in Washington
  – I need to fly between Philadelphia and Atlanta
  – My flight from Philadelphia to Atlanta has been cancelled