L-R Feature Structure Unification Syntactic Parser
Richard Caneba
RPI Cognitive Science Department
Human-Level Intelligence Laboratory
Intuitions
• An interpretive grammar views syntax as finding the most appropriate sequence of head and dependency relationships between phrases and words.
• Language understanding occurs (roughly) left to right.
• Syntactic trees have a flat structure that gives no syntactic preference to sequences of adjunctive modifiers of the same category (adjectives, adverbs, modifying prepositional phrases).
• We can infer a number of things immediately from the perception of a word, although by no means all things.
Intuitions cont’d
• Many patterns exist in natural language; some can be treated deterministically, while others must be treated defeasibly/probabilistically.
• Reliably deterministic:
  • [Det N] => NP[Det N]
  • [Adj N] => NP[Adj N]
• Defeasible:
  • [V NP NP…] =(<1.0)=> VP[V NP NP…]
  • [V NP NP…] =(<1.0)=> VP[V NP[NP…]…]
• Attempt search ONLY if there is a genuine ambiguity as to what the next step in a L-R parse should be (a sketch of this policy follows below):
  • Second object vs. relative-clause modifier in a ditransitive context
  • Prepositional phrase attachment
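To make the deterministic/defeasible split concrete, here is a minimal Python sketch of that dispatch policy. It is illustrative only, not the parser's code: the Rule class, the rule inventory, and the confidence values are all assumptions.

  from dataclasses import dataclass

  @dataclass
  class Rule:
      pattern: tuple       # categories expected at the right edge of the parse
      result: str          # category the matched span is rewritten to
      confidence: float    # 1.0 = reliably deterministic; <1.0 = defeasible

  RULES = [
      Rule(("Det", "N"), "NP", 1.0),       # [Det N] => NP[Det N]
      Rule(("Adj", "N"), "NP", 1.0),       # [Adj N] => NP[Adj N]
      Rule(("V", "NP", "NP"), "VP", 0.7),  # ditransitive reading (made-up weight)
      Rule(("V", "NP", "NP"), "VP", 0.3),  # NP-internal reading (made-up weight)
  ]

  def next_step(frontier):
      # Apply a lone deterministic rule outright; search only on genuine ambiguity.
      matches = [r for r in RULES if tuple(frontier[-len(r.pattern):]) == r.pattern]
      if len(matches) == 1 and matches[0].confidence == 1.0:
          return ("apply", matches[0])   # no search needed
      if matches:
          return ("search", sorted(matches, key=lambda r: -r.confidence))
      return ("shift", None)             # no rule fires; consume the next word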
Feature Structure Unification
• A traditional challenge with the HPSG theory of grammar is that, in order to preserve the recursiveness of its grammar rules, it is forced into a “right-branching” structure that posits an additional feature-structure node for each dependency-head relationship the theory posits.
• This is to some extent cognitively unrealistic:
  • It posits an unnecessary amount of structure for a syntactic parse.
  • Intuitively, there is no syntactic distinction to be made between sequences of adjuncts (it is hard to tell the difference between “the angry green dog” and “the green angry dog”).
Lexical Representation of Syntax
• Each word posits a sequence of head-dependency relationships that form a “phrasal chain.”
• These chains are based on the notion that we can immediately infer some head-dependency relationships from the syntactic category of the word.
• Roughly, each node in a chain is one of three types (not explicitly defined in the lexicon, but nonetheless present; see the sketch below):
  • Word level (WordUtteranceEvent)
  • Dependency level (PhraseUtteranceEvent)
  • Head level (PhraseUtteranceEvent)
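As a rough sketch of these three node types: the class names WordUtteranceEvent and PhraseUtteranceEvent are from the slides, while the fields and the Chain wrapper are assumptions for illustration.

  from dataclasses import dataclass, field

  @dataclass
  class WordUtteranceEvent:
      phon: str                 # the word form heard, e.g. "dog"
      is_a: str                 # lexical category, e.g. "CommonNoun"

  @dataclass
  class PhraseUtteranceEvent:
      cand_types: set = field(default_factory=set)     # candidate head types
      part_of: "PhraseUtteranceEvent | None" = None    # dominating node, if any

  @dataclass
  class Chain:
      word: WordUtteranceEvent               # word level
      dependency: PhraseUtteranceEvent       # dependency level
      head: "PhraseUtteranceEvent | None"    # head level (absent for bare adjuncts)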
Lexical Representation of Syntax • Let’s do a quick example to show the lexical syntactic
representation:
• “the angry dog”
• With part-of-speech tags, that is:
• [Det the][Adj angry][N dog].
• The representation in di-graph form:
Lexical Representation of Syntax
[Figure: Syntactic Entry for a Common Noun, in di-graph form. The WordUtteranceEvent (Phon “dog”, IsA CommonNoun) is PartOf an NP-level PhraseUtteranceEvent (IsA Noun, CandType Noun) that carries a Specifier link to a posited Determiner node; that NP is in turn PartOf a head-level PhraseUtteranceEvent whose candidate types (CandType) include Verb and Preposition.]
Lexical Representation of Syntax
[Figure: Syntactic Entry for an Adjective. The WordUtteranceEvent (Phon “angry”, IsA Adjective) is PartOf a single PhraseUtteranceEvent (IsA Noun, CandType Noun); no head level is posited.]
NOTE: we will need to posit a dependency layer to account for adverbs that modify the adjective, e.g. “really big”.
Lexical Representation of Syntax
[Figure: Syntactic Entry for a Determiner. The WordUtteranceEvent (Phon “the”, IsA Determiner) is PartOf an NP-level PhraseUtteranceEvent (IsA Noun, CandType Noun), which is PartOf a head-level PhraseUtteranceEvent whose candidate types include Verb and Preposition.]
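Putting the three entries together, a constructive sketch using the hypothetical Chain classes above; the candidate-type sets are read off the preceding figures.

  def det_entry(form):
      np = PhraseUtteranceEvent(cand_types={"Noun"})
      xp = PhraseUtteranceEvent(cand_types={"Verb", "Preposition"})
      np.part_of = xp
      return Chain(WordUtteranceEvent(form, "Determiner"), np, xp)

  def adj_entry(form):
      np = PhraseUtteranceEvent(cand_types={"Noun"})   # no head level posited (yet)
      return Chain(WordUtteranceEvent(form, "Adjective"), np, None)

  def noun_entry(form):
      np = PhraseUtteranceEvent(cand_types={"Noun"})
      xp = PhraseUtteranceEvent(cand_types={"Verb", "Preposition"})
      np.part_of = xp
      return Chain(WordUtteranceEvent(form, "CommonNoun"), np, xp)

  chains = [det_entry("the"), adj_entry("angry"), noun_entry("dog")]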
Grammar Rules
• In our example, we need at least two rules:
  • One that unifies the structures posited by the determiner with the structures posited by the common noun
  • One that unifies the structures posited by the adjective with either the determiner or the noun
• Let’s consider this from L-R (a sketch follows the list):
  • First, unify the Det-NP-XP structure chain with the Adj-NP structure chain
  • Next, unify that resulting structure chain with the N-NP-XP structure chain
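A hedged sketch of those two unifications, continuing the hypothetical classes above. unify_nodes() here merely intersects candidate types and marks the two nodes as the same; real feature-structure unification would merge full feature sets.

  def unify_nodes(a, b):
      merged = a.cand_types & b.cand_types
      if not merged:
          raise ValueError("unification failure")
      a.cand_types = b.cand_types = merged   # the two nodes now count as one ("Same")
      return a

  def unify_det_adj(det_chain, adj_chain):
      # Identify the Det's NP node with the Adj's NP node; expose the result
      # as a Det chain (see the later slides for why).
      unify_nodes(det_chain.dependency, adj_chain.dependency)
      return det_chain

  def unify_det_noun(det_chain, noun_chain):
      # Identify NP with NP and XP with XP; the noun takes the Det as specifier.
      unify_nodes(det_chain.dependency, noun_chain.dependency)
      unify_nodes(det_chain.head, noun_chain.head)
      return noun_chain

  np = unify_det_noun(unify_det_adj(chains[0], chains[1]), chains[2])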
Grammar Rules
• Determiner-Adjective Rule
[Figure: The determiner’s chain (Phon “the” → NP, IsA Noun → XP, CandType Verb/Preposition) shown alongside the adjective’s chain (Phon “angry” → NP, IsA Noun, CandType Noun).]
Grammar Rules
• Determiner-Adjective Rule
[Figure: The same two chains, now with a Same link identifying the determiner’s NP node with the adjective’s NP node.]
Grammar Rules
• Determiner-Adjective Rule
[Figure: The unified result: “the” and “angry” hang off a single NP node (IsA Noun), which remains PartOf the XP with candidate types Verb/Preposition.]
Grammar Rules
• We would like to allow anywhere from zero to arbitrarily many adjectives to stand between the determiner and the noun that selects the determiner as its specifier.
• We can achieve this by explicitly stating that whenever a Det chain and an Adj chain are unified, the result is exposed as a determiner on the right wall of the growing parse, as opposed to an adjective (see the sketch below).
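That choice is what makes iteration work: since each Det-Adj unification is exposed as a Det again, the same rule keeps firing. A one-function sketch on top of the hypothetical helpers above:

  def absorb_adjectives(det_chain, adj_chains):
      for adj in adj_chains:                          # "the angry green ... dog"
          det_chain = unify_det_adj(det_chain, adj)   # result is still "a Det"
      return det_chain                                # ready to meet the noun's chain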
Grammar Rules
• Determiner-Adjective Resulting Structure
[Figure: The merged Det-Adj structure: one NP node (IsA Noun) dominating both “the” and “angry”, PartOf an XP with candidate types Verb/Preposition.]
Grammar Rules
• Determiner-Adjective Resulting Structure + NP
[Figure: The merged Det-Adj structure shown alongside the common noun’s chain for “dog” (Phon “dog”, IsA CommonNoun → NP, CandType Noun → XP, CandType Verb/Preposition), ready to be unified with it.]
Grammar Rules
• Expose the resulting structure from the Det-Adj unification as just the Det structure:
[Figure, three frames: a Det-Adj-N schematic in which the Det-Adj chain (NP over XP) sits on the Border of the parse built so far and the noun’s chain (NP over XP) sits on the Frontier; Same links then identify NP with NP and XP with XP, and the noun’s Spr (specifier) requirement is satisfied by the Det.]
Grammar Rules
<!-- Pre-head Adjective Modifier w/ Det: Shift Border -->
<constraint shouldFalsify="false">
  Border(?ba, ?t0, ?w) ^
  Border(?bb, ?t0, ?w) ^
  Frontier(?fa, ?t1, ?w) ^
  Frontier(?fb, ?t1, ?w) ^
  Meets(?t0, ?t1, E, ?w) ^
  PartOf(?ba, ?bb, E, ?w) ^
  PartOf(?fa, ?fb, E, ?w) ^
  IsA(?ba, Determiner, E, ?w) ^
  IsA(?bb, Noun, E, ?w) ^
  IsA(?fa, Adjective, E, ?w) ^
  IsA(?fb, Noun, E, ?w)
  ==>
  Same(?bb, ?fb, E, ?w) ^
  Border(?ba, ?t1, ?w)
</constraint>

<!-- Subcategorization Rules: NP Specifier -->
<constraint shouldFalsify="false">
  Border(?ba, ?t0, ?w) ^
  Border(?bb, ?t0, ?w) ^
  Frontier(?fa, ?t1, ?w) ^
  Frontier(?fb, ?t1, ?w) ^
  Meets(?t0, ?t1, E, ?w) ^
  PartOf(?ba, ?bb, E, ?w) ^
  PartOf(?fa, ?fb, E, ?w) ^
  IsA(?ba, Determiner, E, ?w) ^
  IsA(?bb, Noun, E, ?w) ^
  Specifier(?fa, ?spr, E, ?w) ^
  IsA(?spr, Determiner, E, ?w) ^
  IsA(?fb, Noun, E, ?w) ^
  Heard(?wue, E, ?w) ^
  IsA(?wue, WordUtteranceEvent, ?t1, ?w)
  ==>
  Same(?ba, ?spr, E, ?w) ^
  Same(?bb, ?fb, E, ?w) ^
  Border(?wue, ?t1, ?w) ^
  _NPSPR(?ba, ?bb, ?fa, ?fb, E, ?w)
</constraint>
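Reading these rules: Border marks nodes on the right wall of the structure built so far (interval ?t0), Frontier marks nodes on the left edge of the newly heard chain (interval ?t1), and Meets requires the two intervals to be adjacent; the Same literals in the consequent perform the unification. A loose Python rendering of the first rule, reusing the hypothetical unify_nodes() from the earlier sketch (an illustration, not the lab's rule engine):

  def shift_border_det_adj(border, frontier):
      # border, frontier: ((node, cat), (node, cat)), inner node first,
      # where the first element is PartOf the second.
      (ba, ba_cat), (bb, bb_cat) = border
      (fa, fa_cat), (fb, fb_cat) = frontier
      if (ba_cat, bb_cat, fa_cat, fb_cat) == ("Determiner", "Noun", "Adjective", "Noun"):
          unify_nodes(bb, fb)   # Same(?bb, ?fb)
          return border         # Border(?ba, ?t1): the Det chain stays exposed
      return None               # antecedent unmet; rule does not fire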
Grammar Rules
[send] [john] [a] [message] [that] [says] [“hi”].
Grammar Rules
[V send] [N john] [Det a] [N message] [RelP that] [V says] [Q “hi”].
Grammar Rules
[V send] [N john] [Det a] [N message] [RelP that] [V says] [Q “hi”].
[Figure: the phrasal chain (NP, VP, and XP nodes) posited above each tagged word.]
Grammar Rules
[V send] [N john] [Det a] [N message] [RelP that] [V says] [Q “hi”].
[Figure: a later frame of the same parse; adjacent chains are being unified left to right.]
Grammar Rules
[V send] [N john] [Det a] [N message] [RelP that] [V says] [Q “hi”].
[Figure: the final unified structure, reduced to the remaining NP and VP nodes.]
Grammar Rules
• Benefits of this feature-structure unification parse:
  • It captures the intuition that when we hear a word and posit its feature structure, we can infer the existence not only of the word’s direct feature structure (usually generated by lexical rules), but also of additional structures, their head/dependency relationships, and some definition of the values in those structures.
  • Ambiguities (e.g. the head of an NP) are resolved from L-R through lazy definitions and unification of under-defined structures with well-defined structures in terms of particular features.
  • It posits no more structure in the parse tree than is necessary to reflect a parse, whereas theories like HPSG posit a large number of structures in a branching tree in order to preserve the recursivity of their grammar rules.
  • We have shown that with feature structure unification, at least in theory, we can preserve the recursivity of many of the rules without requiring a left- or right-branching structure.
  • All of the structure necessary to build a parse is known from the beginning.
Grammar Rules!
• The future:
  • Ungrammaticality: when objects aren’t where they are supposed to be, search for a likely head-dependency relationship
    • Missing arguments: “Car is big.”
    • Extra words (it is rare for full content words to be considered extra, but it occurs in natural language: “I saw the, um, car.”)
    • Dependents out of order: “Give the car me.”
    • Dangling dependents
  • This will require a good branch-and-bound system that only performs search when what is reasonably expected/predicted is violated.
  • Give a feature-structure unification account of garden-path sentences
    • Should be fairly natural given the L-R predictive nature of the parser
  • Attach a semantic representation that generates word sense based on head-dependency relationships.
    • Syntax should be closely tied to semantics, in that each helps compute the other to varying degrees.
  • Examine discourse from a syntactic perspective, and syntax from a discourse perspective, and use the two to disambiguate simultaneously.
Notes on Theory (boring)
• By having a lexical representation that is closely tied to the syntax, a number of advantages fall out:
  • Parsimony: by allowing a lot of information to be loosely defined or undefined at the lexical level, we do not need to posit additional lexical entries to cover all possible configurations of a phrase’s arguments, nor do we need an excessive number of lexical rules to generate these representations.
  • Generativity: a word’s sense is at least in part generated by its relationship to its dependents and head, and the semantic/syntactic types of those dependents/heads can in theory compute a word’s sense on the fly (inspired by GL theory from Pustejovsky).
  • Context embedding: by tying your theory of the lexicon closely to syntactic theory, you move toward embedding your lexical representation in a cognitive system that is closely tied to the way words are ACTUALLY used.
Lexical Mosaics
• Thus, we can see that the sense of a word comes from a number of different sources:
  • Memory
  • Syntactic context
  • Pragmatic/discourse factors
• The hope for future research is to tie these together in an organized way, yielding a theory of lexical representation that is tied closely to these factors in a computable and tractable manner.
• Early goals:
  • Compute word senses from syntactic context + memory (very difficult)
  • Use syntactic context to disambiguate lexical ambiguity
  • Use generative word sense to disambiguate syntactic ambiguity
  • Simultaneously attempt to give a computational account of lexical memory, syntactic parsing, and pragmatics/discourse.