THEORIES AND TECHNIQUES OF RECOGNITION
Computational linguistics in Python: Syntactic analysis (parsing)
FROM CHUNKING TO FULL SYNTACTIC ANALYSIS
PROBLEM: AMBIGUITY
While hunting in Africa, I shot an elephant in my pajamas. How an elephant got into my pajamas I'll never know.
CHARACTERIZING THE SYNTAX OF A LANGUAGE: CONTEXT-FREE GRAMMARS
• Slides ELN?
• Capture constituency and ordering
  – Ordering: what are the rules that govern the ordering of words and bigger units in the language?
  – Constituency: how words group into units and how the various kinds of units behave
Constituency
• E.g., noun phrases (NPs):
  – Three parties from Brooklyn
  – A high-class spot such as Mindy’s
  – The Broadway coppers
  – They
  – Harry the Horse
  – The reason he comes into the Hot Box
• How do we know these form a constituent?
Constituency (II)
– They can all appear before a verb:
  • Three parties from Brooklyn arrive…
  • A high-class spot such as Mindy’s attracts…
  • The Broadway coppers love…
  • They sit
– But individual words can’t always appear before verbs:
  • *from arrive…
  • *as attracts…
  • *the is
  • *spot is…
– Must be able to state generalizations like:
  • Noun phrases occur before verbs
Constituency (III)
• Preposing and postposing:
  – On September 17th, I’d like to fly from Atlanta to Denver
  – I’d like to fly on September 17th from Atlanta to Denver
  – I’d like to fly from Atlanta to Denver on September 17th.
• But not:
  – *On September, I’d like to fly 17th from Atlanta to Denver
  – *On I’d like to fly September 17th from Atlanta to Denver
Indicating constituents: brackets, trees
• Brackets: [S [NP [PRO I]] [VP [V prefer] [NP [Det a] [Nom [N morning] [N flight]]]]]
• The same analysis as a tree:
S
  NP
    Pro: I
  VP
    Verb: prefer
    NP
      Det: a
      Nom
        Noun: morning
        Noun: flight
Beyond regular languages: Context-Free Grammars
S -> NP VP
NP -> Det Nominal
Nominal -> Noun
VP -> V
Det -> the
Det -> a
Noun -> flight
V -> left
CFGs: set of rules
• S -> NP VP
  – This says that there are units called S, NP, and VP in this language
  – That an S consists of an NP followed immediately by a VP
  – Doesn’t say that that’s the only kind of S
  – Nor does it say that this is the only place that NPs and VPs occur
Generativity
• As with FSAs, you can view these rules as either analysis or synthesis machines
  – Generate strings in the language
  – Reject strings not in the language
  – Impose structures (trees) on strings in the language
• How can we define grammatical vs. ungrammatical sentences?
Derivations
• A derivation is a sequence of rules applied to a string that accounts for that string
  – Covers all the elements in the string
  – Covers only the elements in the string
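A derivation can be made concrete with a small sketch. The code below is only an illustration in plain Python, not NLTK's implementation: it assumes the toy "flight" grammar from the earlier slide, and the names `RULES` and `leftmost_derivation` are hypothetical. It always rewrites the leftmost non-terminal and records every intermediate string.

```python
# Toy grammar from the slides: each non-terminal maps to its alternative right-hand sides.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "Nominal"]],
    "Nominal": [["Noun"]],
    "VP": [["V"]],
    "Det": [["the"], ["a"]],
    "Noun": [["flight"]],
    "V": [["left"]],
}


def leftmost_derivation(target, start="S"):
    """Return the sentential forms of a leftmost derivation of `target`, or None.

    Assumes the grammar has no empty productions and no cyclic unit rules,
    so a form never shrinks and the search always terminates.
    """
    def expand(form):
        # Locate the leftmost non-terminal, if any.
        for i, sym in enumerate(form):
            if sym in RULES:
                break
        else:
            # Only terminals remain: succeed iff we produced exactly the target.
            return [form] if form == target else None
        # Prune: the terminal prefix must agree with the target, and the form
        # must not already be longer than the target.
        if form[:i] != target[:i] or len(form) > len(target):
            return None
        for rhs in RULES[sym]:
            rest = expand(form[:i] + rhs + form[i + 1:])
            if rest is not None:
                return [form] + rest
        return None

    return expand([start])


steps = leftmost_derivation("the flight left".split())
for form in steps:
    print(" ".join(form))
```

Each printed line is one rule application away from the previous one, and the last line covers all and only the words of the input.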
Derivations as Trees
[Tree diagram: [S [NP [Pro I]] [VP [Verb prefer] [NP [Det a] [Nom [Noun morning] [Noun flight]]]]]]
CFGs more formally
• A context-free grammar has 4 parameters (“is a 4-tuple”)
1) A set of non-terminal symbols (“variables”) N
2) A set of terminal symbols $\Sigma$ (disjoint from N)
3) A set of productions P, each of the form $A \to \alpha$, where A is a non-terminal and $\alpha$ is a string of symbols from the infinite set of strings $(\Sigma \cup N)^*$
4) A designated start symbol S
Defining a CF language via derivation
• A string A derives a string B if A can be rewritten as B via some series of rule applications
• More formally:
  – If $A \to \beta$ is a production of P, and $\alpha$ and $\gamma$ are any strings in the set $(\Sigma \cup N)^*$, then we say that $\alpha A \gamma$ directly derives $\alpha \beta \gamma$, or $\alpha A \gamma \Rightarrow \alpha \beta \gamma$
  – Derivation is a generalization of direct derivation
  – Let $\alpha_1, \alpha_2, \ldots, \alpha_m$ be strings in $(\Sigma \cup N)^*$, $m \ge 1$, s.t. $\alpha_1 \Rightarrow \alpha_2,\ \alpha_2 \Rightarrow \alpha_3,\ \ldots,\ \alpha_{m-1} \Rightarrow \alpha_m$
  – We say that $\alpha_1$ derives $\alpha_m$, or $\alpha_1 \Rightarrow^* \alpha_m$
  – We then formally define the language $L_G$ generated by grammar G:
    • A set of strings composed of terminal symbols derived from S
    • $L_G = \{ w \mid w \in \Sigma^* \text{ and } S \Rightarrow^* w \}$
What “context-free” means
Derivations and languages
• The language $L_G$ GENERATED by a context-free grammar G is the set of strings of TERMINAL symbols that can be derived from the start symbol S using the production rules in G
  – $L_G = \{ w \mid w \in \Sigma^* \text{ and } S \text{ derives } w \}$
• The strings in LG are called GRAMMATICAL
• The strings not in LG are called UNGRAMMATICAL
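For a grammar small enough to enumerate, the definition above can be checked directly. The sketch below is an illustration only: it writes the toy "flight" grammar as an explicit 4-tuple, and the names `SIGMA`, `generated_language`, and `is_grammatical` are hypothetical. It assumes no empty productions and uses a length bound to keep the breadth-first search finite.

```python
from collections import deque

# The toy grammar from the slides as an explicit 4-tuple (N, SIGMA, P, START).
N = {"S", "NP", "Nominal", "VP", "Det", "Noun", "V"}
SIGMA = {"the", "a", "flight", "left"}
P = [
    ("S", ("NP", "VP")),
    ("NP", ("Det", "Nominal")),
    ("Nominal", ("Noun",)),
    ("VP", ("V",)),
    ("Det", ("the",)),
    ("Det", ("a",)),
    ("Noun", ("flight",)),
    ("V", ("left",)),
]
START = "S"


def generated_language(max_len=10):
    """Enumerate the terminal strings derivable from START (breadth-first).

    Assumption: no empty productions, so forms longer than max_len can be dropped.
    """
    found = set()
    seen = {(START,)}
    queue = deque([(START,)])
    while queue:
        form = queue.popleft()
        # Index of the leftmost non-terminal, or None if the form is all terminals.
        idx = next((i for i, sym in enumerate(form) if sym in N), None)
        if idx is None:
            found.add(" ".join(form))
            continue
        for lhs, rhs in P:
            if lhs != form[idx]:
                continue
            new_form = form[:idx] + rhs + form[idx + 1:]
            if len(new_form) <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return found


L_G = generated_language()


def is_grammatical(sentence):
    """A sentence is grammatical iff it belongs to the generated language."""
    return sentence in L_G


print(sorted(L_G))
print(is_grammatical("the flight left"), is_grammatical("flight the left"))
```

Real grammars are recursive and generate infinite languages, so membership is normally decided by a parser rather than by enumeration.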
Grammar development
• One of the most basic skills in NLE is the ability to write a CFG for some fragment of a language (e.g., the dates)
• We’ll briefly cover some of the issues to be addressed when writing small CFG grammars
CFG in PYTHON
• NLTK, 8.3
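As a minimal sketch of how a CFG can be written down in NLTK, the toy "flight" grammar from the earlier slide is encoded below with `nltk.CFG.fromstring` and handed to a chart parser. The choice of `nltk.ChartParser` here is an assumption made for illustration; any of the NLTK parsers would do for a grammar this small.

```python
import nltk

# Terminals are quoted; alternatives for one left-hand side are separated by "|".
grammar = nltk.CFG.fromstring("""
    S -> NP VP
    NP -> Det Nominal
    Nominal -> Noun
    VP -> V
    Det -> 'the' | 'a'
    Noun -> 'flight'
    V -> 'left'
""")

print(grammar.start())              # the designated start symbol
for production in grammar.productions():
    print(production)               # one line per rule

parser = nltk.ChartParser(grammar)
trees = list(parser.parse("the flight left".split()))
for tree in trees:
    print(tree)
```

The parser expects a list of tokens, so the sentence is split on whitespace before parsing.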
SYNTACTIC ANALYSIS
• TOP-DOWN search: the parse tree has to be rooted in the start symbol S
  – EXPECTATION-DRIVEN parsing
  – Example: RECURSIVE DESCENT
• BOTTOM-UP search: the parse tree must be an analysis of the input
  – DATA-DRIVEN parsing
  – Example: SHIFT-REDUCE
TOP-DOWN PARSING WITH NLTK
• Recursive descent parsing (NLTK, 8.3)
  – nltk.RecursiveDescentParser(grammar)
  – nltk.app.rdparser()
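A short usage sketch for the recursive descent parser follows. The grammar is an illustrative fragment written for this example (an assumption, not the course grammar); it covers "I prefer a morning flight" and deliberately avoids left-recursive rules, which would send a recursive descent parser into an infinite loop.

```python
import nltk

# Illustrative fragment (an assumption) with no left-recursive rules.
grammar = nltk.CFG.fromstring("""
    S -> NP VP
    NP -> Pro | Det Nom
    Nom -> Noun Noun | Noun
    VP -> Verb NP
    Pro -> 'I'
    Det -> 'a'
    Noun -> 'morning' | 'flight'
    Verb -> 'prefer'
""")

# Top-down: start from S and expand expectations until they match the words.
rd_parser = nltk.RecursiveDescentParser(grammar)
rd_trees = list(rd_parser.parse("I prefer a morning flight".split()))
for tree in rd_trees:
    print(tree)
```

The interactive demo `nltk.app.rdparser()` named on the slide shows the same expand-and-match steps graphically.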
BOTTOM-UP PARSING WITH NLTK
• Shift-reduce (NLTK, 8.3, p. 305)
  – nltk.app.srparser()
  – nltk.ShiftReduceParser(grammar)
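The shift-reduce parser can be tried on the toy "flight" grammar from the earlier slide, as in the sketch below. Note that NLTK's `ShiftReduceParser` does not backtrack, so on other grammars it may fail to find a parse even when one exists; this small example was picked because the greedy strategy succeeds on it.

```python
import nltk

grammar = nltk.CFG.fromstring("""
    S -> NP VP
    NP -> Det Nominal
    Nominal -> Noun
    VP -> V
    Det -> 'the' | 'a'
    Noun -> 'flight'
    V -> 'left'
""")

# Bottom-up: shift words onto a stack and reduce whenever a rule's
# right-hand side matches the top of the stack.
sr_parser = nltk.ShiftReduceParser(grammar)
sr_trees = list(sr_parser.parse("the flight left".split()))
for tree in sr_trees:
    print(tree)
```

Passing `trace=2` to the constructor prints each shift and reduce step, which is a text-only alternative to the `nltk.app.srparser()` demo.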
MORE ADVANCED PARSING MODELS
• Left corner (NLTK)
• Chart (NLTK)
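As a hedged sketch of these two models, the code below runs NLTK's `ChartParser` and `LeftCornerChartParser` on the same input. The grammar is an illustrative fragment (an assumption) that contains left-recursive rules such as `NP -> NP PP`, which chart-based parsers can handle because they store partial analyses instead of re-deriving them.

```python
from nltk import CFG
from nltk.parse.chart import ChartParser, LeftCornerChartParser

# Illustrative fragment (an assumption) with left-recursive NP and VP rules.
grammar = CFG.fromstring("""
    S -> NP VP
    NP -> Det N | NP PP
    VP -> V NP | VP PP
    PP -> P NP
    Det -> 'the' | 'a'
    N -> 'man' | 'dog' | 'park'
    V -> 'saw'
    P -> 'in'
""")

tokens = "the man saw a dog in the park".split()

chart_trees = list(ChartParser(grammar).parse(tokens))
lc_trees = list(LeftCornerChartParser(grammar).parse(tokens))

print(len(chart_trees), "analyses from ChartParser")
print(len(lc_trees), "analyses from LeftCornerChartParser")
for tree in chart_trees:
    print(tree)
```

The sentence is ambiguous under this grammar: the PP "in the park" can attach to the noun phrase or to the verb phrase.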
DEPENDENCIES AND DEPENDENCY GRAMMAR (NLTK, 8.5)
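A dependency grammar lists, for each head word, the words that may depend on it. The sketch below follows the style of the example in the NLTK book section cited above, using the elephant sentence from the opening slides. The specific head-dependent pairs are an illustrative assumption about how the sentence can be analysed.

```python
import nltk

# Each rule reads: head -> possible dependents.
dep_grammar = nltk.DependencyGrammar.fromstring("""
    'shot' -> 'I' | 'elephant' | 'in'
    'elephant' -> 'an' | 'in'
    'in' -> 'pajamas'
    'pajamas' -> 'my'
""")

dep_parser = nltk.ProjectiveDependencyParser(dep_grammar)
tokens = "I shot an elephant in my pajamas".split()
dep_trees = list(dep_parser.parse(tokens))
for tree in dep_trees:
    print(tree)  # each tree is rooted in the head of the sentence
```

Because 'in' may depend on either 'shot' or 'elephant', the attachment ambiguity of the sentence carries over to the dependency analysis.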
THE PROBLEM OF AMBIGUITY
• Ambiguity
  – Church and Patil (1982): the number of attachment ambiguities grows like the Catalan numbers
    • C(2) = 2, C(3) = 5, C(4) = 14, C(5) = 42, C(6) = 132, C(7) = 429, C(8) = 1430
• Avoiding reparsing
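The growth of the Catalan numbers mentioned above can be checked with the closed form C(n) = (2n choose n) / (n + 1). The sketch below is a plain-Python illustration; the function name `catalan` is hypothetical.

```python
from math import comb


def catalan(n):
    """n-th Catalan number via the closed form C(n) = (2n choose n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


for n in range(2, 9):
    print(f"C({n}) = {catalan(n)}")
```

The values roughly triple or more at each step, which is why a parser that enumerates every attachment separately, re-deriving shared sub-analyses each time, becomes impractical on long sentences.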
COMMON STRUCTURAL AMBIGUITIES
• COORDINATION ambiguity
  – OLD (MEN AND WOMEN) vs (OLD MEN) AND WOMEN
• ATTACHMENT ambiguity:
  – Gerundive VP attachment ambiguity
    • I saw the Eiffel Tower flying to Paris
  – PP attachment ambiguity
    • I shot an elephant in my pajamas
PP ATTACHMENT AMBIGUITY
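The two readings of the elephant sentence can be reproduced with a small grammar in the style of the NLTK book's example. This is a sketch under that assumption: the grammar is a toy fragment, and the `attachments` list is a hypothetical helper for labelling each analysis.

```python
import nltk

groucho_grammar = nltk.CFG.fromstring("""
    S -> NP VP
    PP -> P NP
    NP -> Det N | Det N PP | 'I'
    VP -> V NP | VP PP
    Det -> 'an' | 'my'
    N -> 'elephant' | 'pajamas'
    V -> 'shot'
    P -> 'in'
""")

tokens = "I shot an elephant in my pajamas".split()
trees = list(nltk.ChartParser(groucho_grammar).parse(tokens))

attachments = []
for tree in trees:
    vp = tree[1]  # second child of S is the VP
    # VP -> VP PP means the PP modifies the shooting;
    # VP -> V NP means the PP sits inside the object NP.
    attachments.append("verb" if vp[1].label() == "PP" else "noun")
    print(attachments[-1], "attachment:")
    print(tree)
```

A grammar alone cannot choose between the two trees, which is what motivates the solutions listed next.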
AMBIGUITY: SOLUTIONS
• Use a PROBABILISTIC GRAMMAR (not covered in this module)
• Use semantics
WRITING A GRAMMAR
• NLTK, 8.6