A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica...
![Page 1: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/1.jpg)
A Multilanguage Non-Projective Dependency Parser
Giuseppe Attardi
Dipartimento di Informatica
Università di Pisa
![Page 2: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/2.jpg)
Language and Intelligence
“Understanding cannot be measured by external behavior; it is an internal metric of how the brain remembers things and uses its memories to make predictions”.
“The difference between the intelligence of humans and other mammals is that we have language”.
Jeff Hawkins, “On Intelligence”, 2004
![Page 3: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/3.jpg)
Hawkins’ Memory-Prediction framework
The brain uses vast amounts of memory to create a model of the world. Everything you know and have learned is stored in this model. The brain uses this memory-based model to make continuous predictions of future events. It is the ability to make predictions about the future that is the crux of intelligence.
![Page 4: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/4.jpg)
More …
“Spoken and written words are just patterns in the world…
The syntax and semantics of language are not different from the hierarchical structure of everyday objects.
We associate spoken words with our memory of their physical and semantic counterparts.
Through language one human can invoke memories and create new juxtapositions of mental objects in another human.”
![Page 5: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/5.jpg)
Conclusion
Ability to process language should be essential in many computer applications
![Page 6: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/6.jpg)
Why is NLP not needed in IR?
Document retrieval as primary measure of information retrieval success
Document retrieval reduces the need for NLP techniques
– Discourse factors can be ignored
– Query words perform word-sense disambiguation
Lack of robustness:
– NLP techniques are typically not as robust as word indexing
![Page 7: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/7.jpg)
Question Answering
![Page 8: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/8.jpg)
Question Answering from Open-Domain Text
Search engines return a list of (possibly) relevant documents
Users still have to dig through the returned list to find the answer
QA: give the user a (short) answer to their question, perhaps supported by evidence
![Page 9: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/9.jpg)
The Google answer #1
Include question words (why, who, etc.) in stop-list
Do standard IR
Sometimes this (sort of) works:
– Question: Who was the prime minister of Australia during the Great Depression?
– Answer: James Scullin (Labor) 1929–31
![Page 10: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/10.jpg)
Page about Curtin (WW II Labor Prime Minister) (Can deduce answer)
Page about Curtin (WW II Labor Prime Minister) (Lacks answer)
Page about Chifley (Labor Prime Minister) (Can deduce answer)
![Page 11: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/11.jpg)
But often it doesn’t…
Question: How much money did IBM spend on advertising in 2002?
Answer: I dunno, but I’d like to …
![Page 12: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/12.jpg)
The Google answer #2
Take the question and try to find it as a string on the web
Return the next sentence on that web page as the answer
Works brilliantly if this exact question appears as a FAQ question, etc.
Works lousily most of the time
But, wait …
![Page 13: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/13.jpg)
AskJeeves
AskJeeves was the most hyped example of “Question answering”
– Have basically given up now: just web search except when there are factoid answers of the sort MSN also does
It largely did pattern matching to match your question to their own knowledge base of questions
If that works, you get the human-curated answers to that known question
If that fails, it falls back to regular web search
A potentially interesting middle ground, but a fairly weak shadow of real QA
![Page 14: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/14.jpg)
Question Answering at TREC
Consists of answering a set of 500 fact-based questions, e.g. “When was Mozart born?”
Systems were allowed to return 5 ranked answer snippets to each question.
– IR think
– Mean Reciprocal Rank (MRR) scoring:
• score 1, 0.5, 0.33, 0.25, 0.2, 0 when the first correct answer is at rank 1, 2, 3, 4, 5, 6+
– Mainly Named Entity answers (person, place, date, …)
From 2002 systems are only allowed to return a single exact answer
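The MRR scoring rule above can be sketched in a few lines. This is only an illustration of the arithmetic: the function names and the example ranks are made up here, and the cutoff of 5 reflects the five snippets mentioned on the slide.

```python
from typing import Optional, Sequence


def reciprocal_rank(rank: Optional[int], cutoff: int = 5) -> float:
    """Score one question from the 1-based rank of its first correct answer.

    `rank` is None when no returned snippet is correct.
    """
    if rank is not None and rank < 1:
        raise ValueError("ranks are 1-based")
    if rank is None or rank > cutoff:
        return 0.0  # no correct answer among the returned snippets
    return 1.0 / rank


def mean_reciprocal_rank(ranks: Sequence[Optional[int]], cutoff: int = 5) -> float:
    """Average the per-question scores over the whole question set."""
    if not ranks:
        return 0.0
    return sum(reciprocal_rank(r, cutoff) for r in ranks) / len(ranks)


# Four hypothetical questions: correct at rank 1, at rank 2, never, and at rank 6.
example_ranks = [1, 2, None, 6]
print(mean_reciprocal_rank(example_ranks))  # (1 + 0.5 + 0 + 0) / 4 = 0.375
```

A system that always puts the right answer first scores 1.0; one that never answers within the top five scores 0.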
![Page 15: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/15.jpg)
TREC 2000 Results (long)
[Bar chart: MRR (axis from 0 to 0.8) per system: SMU, Queens, Waterloo, IBM, LIMSI, NTT, IC, Pisa]
![Page 16: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/16.jpg)
Falcon
The Falcon system from SMU was by far the best-performing system at TREC 2000
It used NLP and performed deep semantic processing
![Page 17: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/17.jpg)
Question parse
Who was the first Russian astronaut to walk in space
WP VBD DT JJ NNP NP TO VB IN NN
[Figure: parse tree over the tagged question, with internal nodes NP, NP, PP, VP, S, VP, S]
![Page 18: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/18.jpg)
Question semantic form
[Figure: semantic graph linking astronaut, walk, space, Russian and first; answer type: PERSON]
Question logic form:
first(x) ∧ astronaut(x) ∧ Russian(x) ∧ space(z) ∧ walk(y, z, x) ∧ PERSON(x)
![Page 19: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/19.jpg)
TREC 2001: no NLP
Best system from Insight Software using surface patterns
AskMSR uses a Web Mining approach, by retrieving suggestions from Web searches
![Page 20: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/20.jpg)
Insight Software: Surface patterns approach
Best at TREC 2001: 0.68 MRR
Use of Characteristic Phrases
“When was <person> born”
– Typical answers
• “Mozart was born in 1756.”
• “Gandhi (1869-1948)...”
– Suggests phrases (regular expressions) like
• “<NAME> was born in <BIRTHDATE>”
• “<NAME> ( <BIRTHDATE>-”
– Use of Regular Expressions can help locate correct answer
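The two characteristic phrases above translate quite directly into regular expressions. The sketch below is not the Insight system: the template strings, the restriction of the birth date to a four-digit year, and the function name `find_birthdate` are all assumptions made for illustration.

```python
import re
from typing import Iterable, Optional

# Hypothetical templates mirroring "<NAME> was born in <BIRTHDATE>" and
# "<NAME> ( <BIRTHDATE>-". {name} is filled with the person asked about;
# the birth date is assumed to be a four-digit year.
TEMPLATES = [
    r"{name} was born in (?P<birthdate>\d{{4}})",
    r"{name} \(\s*(?P<birthdate>\d{{4}})\s*-",
]


def find_birthdate(person: str, snippets: Iterable[str]) -> Optional[str]:
    """Return the first year matched by any template in any snippet, else None."""
    texts = list(snippets)
    for template in TEMPLATES:
        pattern = re.compile(template.format(name=re.escape(person)))
        for text in texts:
            match = pattern.search(text)
            if match:
                return match.group("birthdate")
    return None


print(find_birthdate("Mozart", ["Mozart was born in 1756."]))  # 1756
print(find_birthdate("Gandhi", ["Gandhi (1869-1948)..."]))     # 1869
```

The patterns need no parsing of the answer text, which is what makes the approach cheap; the price is that a new pattern set has to be written for every question type.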
![Page 21: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/21.jpg)
AskMSR: Web Mining
[Figure: AskMSR pipeline diagram, steps numbered 1 to 5]
![Page 22: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/22.jpg)
Step 1: Rewrite queries
Intuition: The user’s question is often syntactically quite close to sentences that contain the answer
– Where is the Louvre Museum located?
– The Louvre Museum is located in Paris
– Who created the character of Scrooge?
– Charles Dickens created the character of Scrooge.
![Page 23: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/23.jpg)
Query rewriting
Classify question into seven categories
– Who is/was/are/were…?
– When is/did/will/are/were …?
– Where is/are/were …?
a. Category-specific transformation rules
e.g. “For Where questions, move ‘is’ to all possible locations”
“Where is the Louvre Museum located”
→ “is the Louvre Museum located”
→ “the is Louvre Museum located”
→ “the Louvre is Museum located”
→ “the Louvre Museum is located”
→ “the Louvre Museum located is”
(Nonsense, but who cares? It’s only a few more queries to Google.)
b. Expected answer “Datatype” (e.g. Date, Person, Location, …)
When was the French Revolution? → DATE
Hand-crafted classification/rewrite/datatype rules
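The “move ‘is’ to all possible locations” rule for Where questions is simple enough to sketch. The function name and the error handling are illustrative assumptions; AskMSR’s actual rule set is hand-crafted and is not reproduced here.

```python
from typing import List


def rewrite_where_question(question: str, verb: str = "is") -> List[str]:
    """Drop the wh-word and re-insert the verb at every possible position."""
    words = question.rstrip("?").split()
    if len(words) < 3 or words[0].lower() != "where" or words[1] != verb:
        raise ValueError("expected a question of the form 'Where <verb> ...?'")
    rest = words[2:]
    # One rewrite per insertion point, from before the first word to after the last.
    return [" ".join(rest[:i] + [verb] + rest[i:]) for i in range(len(rest) + 1)]


for rewrite in rewrite_where_question("Where is the Louvre Museum located?"):
    print(rewrite)
# is the Louvre Museum located
# the is Louvre Museum located
# the Louvre is Museum located
# the Louvre Museum is located
# the Louvre Museum located is
```

Most of the generated strings are ungrammatical, but each one costs only one extra query, and the grammatical one (“the Louvre Museum is located”) is likely to match a sentence that continues with the answer.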
![Page 24: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/24.jpg)
Step 2: Query search engine
Send all rewrites to a Web search engine
Retrieve top N answers
For speed, rely just on search engine’s “snippets”, not the full text of the actual document
![Page 25: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/25.jpg)
Nevertheless …
![Page 26: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/26.jpg)
NLP Technologies are used
Question Analysis:
– identify the semantic type of the expected answer implicit in the query
Named-Entity Detection:
– determine the semantic type of proper nouns and numeric amounts in text
![Page 27: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/27.jpg)
Parsing in QA
Top systems in TREC 2005 perform parsing of queries and answer paragraphs
Some use specially built parser
Parsers are slow: ~ 1min/sentence
![Page 28: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/28.jpg)
Parsing Technology
![Page 29: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/29.jpg)
Constituent Parsing
Requires Phrase Structure Grammar
– CFG, PCFG, Unification Grammar
Produces phrase structure parse tree
Rolls-Royce Inc. said it expects its sales to remain steady
[Figure: phrase structure tree over the sentence, with nodes S, NP, VP and ADJP]
![Page 30: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/30.jpg)
Statistical Methods in NLP
Some NLP problems:
– Information extraction
• Named entities, Relationships between entities, etc.
– Finding linguistic structure
• Part-of-speech tagging, Chunking, Parsing
Can be cast as learning mapping:
– Strings to hidden state sequences
• NE extraction, POS tagging
– Strings to strings
• Machine translation
– Strings to trees
• Parsing
– Strings to relational data structures
• Information extraction
![Page 31: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/31.jpg)
Techniques
– Log-linear (Maximum Entropy) taggers
– Probabilistic context-free grammars (PCFGs)
– Discriminative methods:
• Conditional MRFs, Perceptron, Kernel methods
![Page 32: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/32.jpg)
Learning mapping
Strings to hidden state sequences
– NE extraction, POS tagging
Strings to strings
– Machine translation
Strings to trees
– Parsing
Strings to relational data structures
– Information extraction
![Page 33: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/33.jpg)
POS as Tagging
INPUT:
Profits soared at Boeing Co., easily topping forecasts on Wall Street.
OUTPUT:
Profits/N soared/V at/P Boeing/N Co./N ,/, easily/ADV topping/V forecasts/N on/P Wall/N Street/N ./.
![Page 34: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/34.jpg)
NE as Tagging
INPUT:
Profits soared at Boeing Co., easily topping forecasts on Wall Street.
OUTPUT:
Profits/O soared/O at/O Boeing/BC Co./IC ,/O easily/O topping/O forecasts/O on/O Wall/BL Street/IL ./O
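Casting named-entity extraction as tagging means the entities still have to be read back off the tag sequence. The sketch below assumes, since the slide does not spell it out, that a tag is two letters: `B` or `I` for “begins” or “inside” an entity, followed by a type letter (`C` for company, `L` for location), and that every other tag marks a token outside any entity.

```python
from typing import List, Optional, Tuple


def decode_entities(tagged: str) -> List[Tuple[str, str]]:
    """Turn 'word/TAG word/TAG ...' into a list of (entity text, type letter)."""
    entities: List[Tuple[str, str]] = []
    words: List[str] = []
    kind: Optional[str] = None

    def flush() -> None:
        # Close the entity being built, if any.
        nonlocal words, kind
        if words:
            entities.append((" ".join(words), kind))
        words, kind = [], None

    for item in tagged.split():
        word, _, tag = item.rpartition("/")
        if len(tag) == 2 and tag[0] == "B":
            flush()
            words, kind = [word], tag[1]      # a new entity starts here
        elif len(tag) == 2 and tag[0] == "I" and tag[1] == kind:
            words.append(word)                # continue the current entity
        else:
            flush()                           # any other tag is outside an entity
    flush()
    return entities


tagged = ("Profits/O soared/O at/O Boeing/BC Co./IC ,/O easily/O topping/O "
          "forecasts/O on/O Wall/BL Street/IL ./O")
print(decode_entities(tagged))  # [('Boeing Co.', 'C'), ('Wall Street', 'L')]
```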
![Page 35: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/35.jpg)
Statistical Parsers
Probabilistic Generative Model of Language which includes parse structure (e.g. Collins 1997)
– Learning consists in estimating the parameters of the model with simple likelihood-based techniques
Conditional parsing models (Charniak 2000; McDonald 2005)
![Page 36: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/36.jpg)
Results

| Method | Accuracy |
|---|---|
| PCFGs (Charniak 97) | 73.0% |
| Conditional Models – Decision Trees (Magerman 95) | 84.2% |
| Lexical Dependencies (Collins 96) | 85.5% |
| Conditional Models – Logistic (Ratnaparkhi 97) | 86.9% |
| Generative Lexicalized Model (Charniak 97) | 86.7% |
| Generative Lexicalized Model (Collins 97) | 88.2% |
| Logistic-inspired Model (Charniak 99) | 89.6% |
| Boosting (Collins 2000) | 89.8% |
![Page 37: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/37.jpg)
Linear Models for Parsing and Tagging
Three components:
GEN is a function from a string to a set of candidates
$\Phi$ maps a candidate to a feature vector
$W$ is a parameter vector
![Page 38: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/38.jpg)
Component 1: GEN
GEN enumerates a set of candidates for a sentence
She announced a program to promote safety in trucks and vans
[Figure: GEN applied to the sentence yields a set of candidates]
![Page 39: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/39.jpg)
Examples of GEN
A context-free grammar
A finite-state machine
Top N most probable analyses from a probabilistic grammar
![Page 40: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/40.jpg)
Component 2: $\Phi$
$\Phi$ maps a candidate to a feature vector in $\mathbb{R}^d$
$\Phi$ defines the representation of a candidate
<1, 0, 2, 0, 0, 15, 5>
![Page 41: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/41.jpg)
Feature
A “feature” is a function on a structure, e.g.,
$h(x)$ = number of times the subtree A → B C is seen in $x$
Feature vector:
A set of functions $h_1 \ldots h_d$ define a feature vector
$\Phi(x) = \langle h_1(x), h_2(x), \ldots, h_d(x) \rangle$
![Page 42: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/42.jpg)
Component 3: $W$
$W$ is a parameter vector in $\mathbb{R}^d$
$\Phi \cdot W$ maps a candidate to a real-valued score
![Page 43: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/43.jpg)
Putting it all together
$X$ is the set of sentences, $Y$ is the set of possible outputs (e.g. trees)
Need to learn a function $F : X \to Y$
GEN, $\Phi$, $W$ define $F$:
$$F(x) = \operatorname*{argmax}_{y \in \mathrm{GEN}(x)} \Phi(y) \cdot W$$
Choose the highest scoring tree as the most plausible structure
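The three components combine into a single argmax over the candidates. The sketch below is a toy instance under loud assumptions: candidates are bracketed strings rather than real trees, `toy_gen` returns a hand-written candidate set, and the two features and the weights are invented purely to make the scoring visible.

```python
from typing import Callable, List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of a feature vector and the parameter vector."""
    return sum(a * b for a, b in zip(u, v))


def predict(x: str,
            gen: Callable[[str], List[str]],
            phi: Callable[[str], List[float]],
            w: Sequence[float]) -> str:
    """Return the candidate y in GEN(x) with the highest score phi(y) . w."""
    return max(gen(x), key=lambda y: dot(phi(y), w))


def toy_gen(x: str) -> List[str]:
    # Hypothetical GEN: the two binary bracketings of a three-word string.
    a, b, c = x.split()
    return [f"(({a} {b}) {c})", f"({a} ({b} {c}))"]


def toy_phi(y: str) -> List[float]:
    # Two made-up features: h1 counts "((" (left nesting), h2 counts "))" (right nesting).
    return [float(y.count("((")), float(y.count("))"))]


W = [0.5, 1.0]  # invented weights that prefer right nesting

print(predict("a b c", toy_gen, toy_phi, W))  # (a (b c))
```

Learning then amounts to choosing the weights so that the correct structure outscores the other candidates on the training data; the argmax itself does not change.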
![Page 44: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/44.jpg)
Constituent Parsing
Requires Grammar
– CFG, PCFG, Unification Grammar
Produces phrase structure parse tree
Rolls-Royce Inc. said it expects its sales to remain steady
[Figure: phrase structure tree over the sentence, with nodes S, NP, VP and ADJP]
![Page 45: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/45.jpg)
Dependency Tree
Word-word dependency relations
Far easier to understand and to annotate
Rolls-Royce Inc. said it expects its sales to remain steady
![Page 46: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/46.jpg)
Inductive Dependency Parser
Traditional statistical parsers are trained directly on the task of tagging a sentence
Instead an Inductive Parser is trained and learns the sequence of parse actions required to build the parse tree
![Page 47: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/47.jpg)
Grammar Not Required
A traditional parser requires a grammar for generating candidate trees
An inductive parser needs no grammar
![Page 48: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/48.jpg)
Parsing as Classification
Inductive dependency parsing
Parsing based on Shift/Reduce actions
Learn from annotated corpus which action to perform at each step
![Page 49: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/49.jpg)
Parser Actions
Ho/VER:aux visto/VER:pper una/DET ragazza/NOM con/PRE gli/DET occhiali/NOM ./POS
(“I saw a girl with glasses.”)
[Figure: the sentence with the “top” and “next” pointers and the actions Shift, Right and Left]
![Page 50: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/50.jpg)
Dependency Graph
Let $R = \{r_1, \ldots, r_m\}$ be the set of permissible dependency types
A dependency graph for a string of words $W = w_1 \ldots w_n$ is a labeled directed graph $D = (W, A)$, where
(a) $W$ is the set of nodes, i.e. word tokens in the input string,
(b) $A$ is a set of labeled arcs $(w_i, r, w_j)$, $w_i, w_j \in W$, $r \in R$,
(c) $\forall w_j \in W$, there is at most one arc $(w_i, r, w_j) \in A$.
![Page 51: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/51.jpg)
Parser State
The parser state is a quadruple $\langle S, I, T, A \rangle$, where
$S$ is a stack of partially processed tokens
$I$ is a list of (remaining) input tokens
$T$ is a stack of temporary tokens
$A$ is the arc relation for the dependency graph
$(w, r, h) \in A$ represents an arc $w \to h$, tagged with dependency $r$
![Page 52: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/52.jpg)
Parser Actions
Shift: $\langle S, n|I, T, A \rangle \Rightarrow \langle n|S, I, T, A \rangle$
Right: $\langle s|S, n|I, T, A \rangle \Rightarrow \langle S, n|I, T, A \cup \{(s, r, n)\} \rangle$
Left: $\langle s|S, n|I, T, A \rangle \Rightarrow \langle S, s|I, T, A \cup \{(n, r, s)\} \rangle$
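The three transitions can be written as small functions over the parser state. This is a sketch rather than the parser’s actual code: the `State` class, the choice of plain strings as tokens (which assumes no repeated words) and the relation labels `det` and `subj` are all illustrative assumptions. Arcs are stored as (w, r, h) triples.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Arc = Tuple[str, str, str]  # (w, r, h): arc w -> h tagged with relation r


@dataclass
class State:
    S: List[str] = field(default_factory=list)  # stack, top is the last element
    I: List[str] = field(default_factory=list)  # remaining input, next is I[0]
    T: List[str] = field(default_factory=list)  # temporary stack (unused by these actions)
    A: Set[Arc] = field(default_factory=set)    # arcs built so far


def shift(st: State) -> None:
    """<S, n|I, T, A>  =>  <n|S, I, T, A>"""
    st.S.append(st.I.pop(0))


def right(st: State, r: str) -> None:
    """<s|S, n|I, T, A>  =>  <S, n|I, T, A + {(s, r, n)}>"""
    s = st.S.pop()
    st.A.add((s, r, st.I[0]))


def left(st: State, r: str) -> None:
    """<s|S, n|I, T, A>  =>  <S, s|I, T, A + {(n, r, s)}>"""
    s = st.S.pop()
    st.A.add((st.I[0], r, s))
    st.I[0] = s  # s replaces n at the front of the input


state = State(I=["the", "dog", "barks"])
shift(state)          # S = [the],  I = [dog, barks]
right(state, "det")   # adds (the, det, dog)
shift(state)          # S = [dog],  I = [barks]
right(state, "subj")  # adds (dog, subj, barks)
shift(state)          # S = [barks], I = []
print(sorted(state.A))  # [('dog', 'subj', 'barks'), ('the', 'det', 'dog')]
```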
![Page 53: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/53.jpg)
Parser Algorithm
The parsing algorithm is fully deterministic and works as follows:

    Input Sentence: (w1, p1), (w2, p2), … , (wn, pn)
    S = <>
    T = <(w1, p1), (w2, p2), … , (wn, pn)>
    L = <>
    while T != <> do begin
        x = getContext(S, T, L);
        y = estimateAction(model, x);
        performAction(y, S, T, L);
    end
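A runnable sketch of this loop follows, under several assumptions that are not in the slides: the trained model is replaced by a hypothetical lookup table from a tiny context (POS of the stack top, POS of the next token) to an action, the third structure is taken to hold the arcs built so far, and an empty stack forces a Shift so the loop always makes progress.

```python
from typing import Dict, List, Optional, Tuple

Token = Tuple[str, str]                # (word, POS tag)
Arc = Tuple[Token, str, Token]         # (w, r, h)
Action = Tuple[str, Optional[str]]     # ("shift", None), ("right", r) or ("left", r)
Context = Tuple[Optional[str], str]    # (POS of stack top or None, POS of next token)


def get_context(stack: List[Token], tokens: List[Token]) -> Context:
    """A deliberately tiny context; a real parser would use many more features."""
    return (stack[-1][1] if stack else None, tokens[0][1])


def estimate_action(model: Dict[Context, Action], x: Context) -> Action:
    """Stand-in for the classifier: look the context up, default to Shift."""
    return model.get(x, ("shift", None))


def perform_action(y: Action, stack: List[Token], tokens: List[Token],
                   arcs: List[Arc]) -> None:
    name, rel = y
    if name == "shift" or not stack:
        stack.append(tokens.pop(0))
    elif name == "right":
        arcs.append((stack.pop(), rel, tokens[0]))
    elif name == "left":
        s = stack.pop()
        arcs.append((tokens[0], rel, s))
        tokens[0] = s
    else:
        raise ValueError(f"unknown action {name!r}")


def parse(sentence: List[Token], model: Dict[Context, Action]) -> List[Arc]:
    stack: List[Token] = []
    tokens = list(sentence)
    arcs: List[Arc] = []
    while tokens:  # one classifier decision per step, no backtracking
        x = get_context(stack, tokens)
        y = estimate_action(model, x)
        perform_action(y, stack, tokens, arcs)
    return arcs


toy_model = {("DET", "NOM"): ("right", "det"), ("NOM", "VER"): ("right", "subj")}
sentence = [("the", "DET"), ("dog", "NOM"), ("barks", "VER")]
for arc in parse(sentence, toy_model):
    print(arc)
```

Because every step either consumes an input token or pops the stack, the loop terminates after a number of steps linear in the sentence length.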
![Page 54: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/54.jpg)
Learning Phase
![Page 55: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/55.jpg)
Learning Features

| Feature | Value |
|---|---|
| W | word |
| L | lemma |
| P | part of speech (POS) tag |
| M | morphology: e.g. singular/plural |
| W< | word of the leftmost child node |
| L< | lemma of the leftmost child node |
| P< | POS tag of the leftmost child node, if present |
| M< | whether the leftmost child node is singular/plural |
| W> | word of the rightmost child node |
| L> | lemma of the rightmost child node |
| P> | POS tag of the rightmost child node, if present |
| M> | whether the rightmost child node is singular/plural |
![Page 56: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/56.jpg)
Learning Event
[Figure: partially built dependency tree with the tokens leggi/NOM, le/DET, anti/ADV, che/PRO, ,/PON, Serbia/NOM, erano/VER, discusse/ADJ, che/PRO, Sosteneva/VER, divided into left context, target nodes and right context]
context:
(-3, W, che), (-3, P, PRO),
(-2, W, leggi), (-2, P, NOM), (-2, M, P), (-2, W<, le), (-2, P<, DET), (-2, M<, P),
(-1, W, anti), (-1, P, ADV),
(0, W, Serbia), (0, P, NOM), (0, M, S),
(+1, W, che), (+1, P, PRO), (+1, W>, erano), (+1, P>, VER), (+1, M>, P),
(+2, W, ,), (+2, P, PON)
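The context above is a flat list of (position, feature, value) triples, which can be generated mechanically from the tokens around the target. The sketch below assumes a hypothetical token representation (a dict with optional keys `W`, `P`, `M` and optional `left` / `right` child tokens) and leaves out the lemma features, as the example context does.

```python
from typing import Dict, List, Tuple

Feature = Tuple[int, str, str]  # (relative position, feature name, value)


def token_features(offset: int, token: dict) -> List[Feature]:
    """Emit W/P/M for the token, then the same for its leftmost and rightmost child."""
    feats: List[Feature] = []
    for name in ("W", "P", "M"):
        if name in token:
            feats.append((offset, name, token[name]))
    for suffix, child_key in (("<", "left"), (">", "right")):
        child = token.get(child_key)
        if child:
            for name in ("W", "P", "M"):
                if name in child:
                    feats.append((offset, name + suffix, child[name]))
    return feats


def context_features(context: Dict[int, dict]) -> List[Feature]:
    """Concatenate the features of every position, from leftmost to rightmost."""
    feats: List[Feature] = []
    for offset in sorted(context):
        feats.extend(token_features(offset, context[offset]))
    return feats


context = {
    -3: {"W": "che", "P": "PRO"},
    -2: {"W": "leggi", "P": "NOM", "M": "P",
         "left": {"W": "le", "P": "DET", "M": "P"}},
    -1: {"W": "anti", "P": "ADV"},
    0: {"W": "Serbia", "P": "NOM", "M": "S"},
    1: {"W": "che", "P": "PRO",
        "right": {"W": "erano", "P": "VER", "M": "P"}},
    2: {"W": ",", "P": "PON"},
}

for feature in context_features(context):
    print(feature)
```

Each training event pairs such a feature list with the action taken at that step, so any classifier that accepts sparse symbolic features can be plugged in.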
![Page 57: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/57.jpg)
Parser Architecture
Modular learners architecture:
– MaxEntropy, MBL, SVM, Winnow, Perceptron
Features can be selected
![Page 58: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/58.jpg)
Features used in Experiments
LemmaFeatures -2 -1 0 1 2 3
PosFeatures -2 -1 0 1 2 3
MorphoFeatures -1 0 1 2
DepFeatures -1 0
PosLeftChildren 2
PosLeftChild -1 0
DepLeftChild -1 0
PosRightChildren 2
PosRightChild -1 0
DepRightChild -1
PastActions 1
![Page 59: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/59.jpg)
Projectivity
An arc $w_i \rightarrow w_k$ is projective iff
$\forall j$ such that $i < j < k$ or $i > j > k$: $w_i \rightarrow^{*} w_j$
A dependency tree is projective iff every arc is projective
Intuitively: arcs can be drawn on a plane without intersections
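The definition can be checked directly, as in this sketch. It assumes the tree is given as a list `heads` in which `heads[d]` is the head of token `d`, tokens are numbered from 1, and index 0 is an artificial root. This encoding is chosen here for illustration only.

```python
from typing import List, Optional


def dominates(heads: List[Optional[int]], i: int, j: int) -> bool:
    """True if w_i ->* w_j, i.e. i is reached by following head links up from j."""
    seen = set()
    node: Optional[int] = j
    while node is not None and node not in seen:   # 'seen' guards against cycles
        if node == i:
            return True
        seen.add(node)
        node = heads[node]
    return False


def is_projective_arc(heads: List[Optional[int]], head: int, dep: int) -> bool:
    """An arc head -> dep is projective iff head dominates every token in between."""
    lo, hi = sorted((head, dep))
    return all(dominates(heads, head, j) for j in range(lo + 1, hi))


def is_projective(heads: List[Optional[int]]) -> bool:
    """A tree is projective iff every arc is projective."""
    return all(is_projective_arc(heads, heads[d], d)
               for d in range(1, len(heads)))


# heads[0] is None: the artificial root has no head.
projective_tree = [None, 2, 0, 2]        # 2 -> 1, root -> 2, 2 -> 3
crossing_tree = [None, 3, 4, 0, 3]       # arcs 3 -> 1 and 4 -> 2 cross
```

In `crossing_tree` the arc 4 → 2 spans token 3, which is not dominated by token 4, so the tree is non-projective.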
![Page 60: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/60.jpg)
Non-Projective
Většinu těchto přístrojů lze take používat nejen jako fax , ale
![Page 61: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/61.jpg)
Actions for non-projective arcs
Right2: $\langle s_1|s_2|S,\ n|I,\ T,\ A \rangle \Rightarrow \langle s_1|S,\ n|I,\ T,\ A \cup \{(s_2, r, n)\} \rangle$
Left2: $\langle s_1|s_2|S,\ n|I,\ T,\ A \rangle \Rightarrow \langle s_2|S,\ s_1|I,\ T,\ A \cup \{(n, r, s_2)\} \rangle$
Right3: $\langle s_1|s_2|s_3|S,\ n|I,\ T,\ A \rangle \Rightarrow \langle s_1|s_2|S,\ n|I,\ T,\ A \cup \{(s_3, r, n)\} \rangle$
Left3: $\langle s_1|s_2|s_3|S,\ n|I,\ T,\ A \rangle \Rightarrow \langle s_2|s_3|S,\ s_1|I,\ T,\ A \cup \{(n, r, s_3)\} \rangle$
Extract: $\langle s_1|s_2|S,\ n|I,\ T,\ A \rangle \Rightarrow \langle n|s_1|S,\ I,\ s_2|T,\ A \rangle$
Insert: $\langle S,\ I,\ s_1|T,\ A \rangle \Rightarrow \langle s_1|S,\ I,\ T,\ A \rangle$
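The six transitions can be written as pure functions on the state (S, I, T, A). This sketch assumes S, I and T are Python lists whose first element is the top/front, so that `s1|s2|S` corresponds to `[s1, s2, *rest]`, and that A is a set of triples stored exactly as they appear in the rules. The function names and the relation label `"dep"` in the example are illustrative only.

```python
from typing import List, Set, Tuple

Arc = Tuple[str, str, str]
State = Tuple[List[str], List[str], List[str], Set[Arc]]


def right2(S, I, T, A, r) -> State:
    s1, s2, *rest = S
    return [s1] + rest, I, T, A | {(s2, r, I[0])}


def left2(S, I, T, A, r) -> State:
    s1, s2, *rest = S
    n, *irest = I
    return [s2] + rest, [s1] + irest, T, A | {(n, r, s2)}


def right3(S, I, T, A, r) -> State:
    s1, s2, s3, *rest = S
    return [s1, s2] + rest, I, T, A | {(s3, r, I[0])}


def left3(S, I, T, A, r) -> State:
    s1, s2, s3, *rest = S
    n, *irest = I
    return [s2, s3] + rest, [s1] + irest, T, A | {(n, r, s3)}


def extract(S, I, T, A) -> State:
    s1, s2, *rest = S
    n, *irest = I
    return [n, s1] + rest, irest, [s2] + T, A   # s2 is set aside on T


def insert(S, I, T, A) -> State:
    s1, *trest = T
    return [s1] + S, I, trest, A                # put it back on top of S


start: State = (["c", "b", "a"], ["d", "e"], [], set())
after_right2 = right2(*start, "dep")
after_extract = extract(*start)
after_insert = insert(*after_extract)
```

Each function returns a new state and leaves its arguments unmodified, which makes it easy to compare the result against the right-hand side of the corresponding rule.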
![Page 62: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/62.jpg)
Example
Right2 (nejen → ale) and Left3 (fax → Většinu)
Většinu těchto přístrojů lze take používat nejen jako fax , ale
![Page 63: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/63.jpg)
Examples
zou gemaakt moeten worden in
zou moeten worden gemaakt in
Extract followed by Insert
![Page 64: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/64.jpg)
Experiments
three classifiers: one to decide between Shift/Reduce, one to decide which Reduce action and a third one to choose the dependency in case of Left/Right action
two classifiers: one to decide which action to perform and a second one to choose the dependency in case of Left/Right action
![Page 65: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/65.jpg)
CoNLL-X Shared Task
To assign labeled dependency structures for a range of languages by means of a fully automatic dependency parser
Input: tokenized and tagged sentences
Tags: token, lemma, POS, morpho features, ref. to head, dependency label
For each token, the parser must output its head and the corresponding dependency relation
![Page 66: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/66.jpg)
CoNLL-X: Data Format
N   WORD         LEMMA        CPOS  POS       FEATS                 HEAD  DEPREL  PHEAD  PDEPREL
1   A            o            art   art       <artd>|F|S            2     >N      _      _
2   direcção     direcção     n     n         F|S                   4     SUBJ    _      _
3   já           já           adv   adv       _                     4     ADVL    _      _
4   mostrou      mostrar      v     v-fin     PS|3S|IND             0     STA     _      _
5   boa_vontade  boa_vontade  n     n         F|S                   4     ACC     _      _
6   ,            ,            punc  punc      _                     4     PUNC    _      _
7   mas          mas          conj  conj-c    <co-vfin>|<co-fmc>    4     CO      _      _
8   a            o            art   art       <artd>|F|S            9     >N      _      _
9   greve        greve        n     n         F|S                   10    SUBJ    _      _
10  prossegue    prosseguir   v     v-fin     PR|3S|IND             4     CJT     _      _
11  em           em           prp   prp       _                     10    ADVL    _      _
12  todas_as     todo_o       pron  pron-det  <quant>|F|P           13    >N      _      _
13  delegações   delegaçõo    n     n         F|P                   11    P<      _      _
14  de           de           prp   prp       <sam->                13    N<      _      _
15  o            o            art   art       <-sam>|<artd>|M|S     16    >N      _      _
16  país         país         n     n         M|S                   14    P<      _      _
17  .            .            punc  punc      _                     4     PUNC    _      _
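A short sketch of reading one token line in this format. It relies on the CoNLL-X conventions that the ten fields are tab-separated, that `_` marks an empty value and that `|` separates individual features. The `Token` class and `parse_line` function are names introduced here for illustration, with field names following the column headers of the slide.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    n: int             # token counter within the sentence, starting at 1
    word: str
    lemma: str
    cpos: str          # coarse-grained POS tag
    pos: str           # fine-grained POS tag
    feats: List[str]   # morphological features, empty if "_"
    head: int          # N of the head token, 0 for the root
    deprel: str
    phead: str         # projective head, kept as text since it is often "_"
    pdeprel: str


def parse_line(line: str) -> Token:
    """Split one tab-separated CoNLL-X token line into its ten fields."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 10:
        raise ValueError(f"expected 10 columns, got {len(cols)}")
    n, word, lemma, cpos, pos, feats, head, deprel, phead, pdeprel = cols
    feat_list = [] if feats == "_" else feats.split("|")
    return Token(int(n), word, lemma, cpos, pos, feat_list,
                 int(head), deprel, phead, pdeprel)


token = parse_line("4\tmostrou\tmostrar\tv\tv-fin\tPS|3S|IND\t0\tSTA\t_\t_")
```

For the fourth row of the sample sentence, `token.head` is 0, which marks "mostrou" as the root of the tree.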
![Page 67: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/67.jpg)
CoNLL-X: Languages
The same parser should handle all languages
13 languages:
– Arabic, Bulgarian, Chinese, Czech, Danish, Dutch, Japanese, German, Portuguese, Slovene, Spanish, Swedish, Turkish
![Page 68: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/68.jpg)
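CoNLL-X: Collections

| | Ar | Cn | Cz | Dk | Du | De | Jp | Pt | Sl | Sp | Se | Tr | Bu |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| K tokens | 54 | 337 | 1,249 | 94 | 195 | 700 | 151 | 207 | 29 | 89 | 191 | 58 | 190 |
| K sents | 1.5 | 57.0 | 72.7 | 5.2 | 13.3 | 39.2 | 17.0 | 9.1 | 1.5 | 3.3 | 11.0 | 5.0 | 12.8 |
| Tokens/sentence | 37.2 | 5.9 | 17.2 | 18.2 | 14.6 | 17.8 | 8.9 | 22.8 | 18.7 | 27.0 | 17.3 | 11.5 | 14.8 |
| CPOSTAG | 14 | 22 | 12 | 10 | 13 | 52 | 20 | 15 | 11 | 15 | 37 | 14 | 11 |
| POSTAG | 19 | 303 | 63 | 24 | 302 | 52 | 77 | 21 | 28 | 38 | 37 | 30 | 53 |
| FEATS | 19 | 0 | 61 | 47 | 81 | 0 | 4 | 146 | 51 | 33 | 0 | 82 | 50 |
| DEPREL | 27 | 82 | 78 | 52 | 26 | 46 | 7 | 55 | 25 | 21 | 56 | 25 | 18 |
| % non-projective relations | 0.4 | 0.0 | 1.9 | 1.0 | 5.4 | 2.3 | 1.1 | 1.3 | 1.9 | 0.1 | 1.0 | 1.5 | 0.4 |
| % non-projective sentences | 11.2 | 0.0 | 23.2 | 15.6 | 36.4 | 27.8 | 5.3 | 18.9 | 22.2 | 1.7 | 9.8 | 11.6 | 5.4 |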
![Page 69: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/69.jpg)
CoNLL: Evaluation Metrics
Labeled Attachment Score (LAS)
– proportion of “scoring” tokens that are assigned both the correct head and the correct dependency relation label
Unlabeled Attachment Score (UAS)
– proportion of “scoring” tokens that are assigned the correct head
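Both scores can be computed in one pass, as in this sketch. Each token is assumed to be represented as a `(head, deprel)` pair, and the optional `scoring` mask, a parameter introduced here, marks which tokens count toward the score. The choice of which tokens are "scoring" is left to the caller.

```python
from typing import List, Optional, Sequence, Tuple

Attachment = Tuple[int, str]  # (head index, dependency relation label)


def attachment_scores(gold: Sequence[Attachment],
                      pred: Sequence[Attachment],
                      scoring: Optional[List[bool]] = None) -> Tuple[float, float]:
    """Return (LAS, UAS) as fractions over the scoring tokens."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sequences differ in length")
    if scoring is None:
        scoring = [True] * len(gold)        # by default every token is scored
    total = correct_head = correct_both = 0
    for (g_head, g_rel), (p_head, p_rel), keep in zip(gold, pred, scoring):
        if not keep:
            continue
        total += 1
        if g_head == p_head:
            correct_head += 1               # counts toward UAS
            if g_rel == p_rel:
                correct_both += 1           # counts toward LAS as well
    if total == 0:
        raise ValueError("no scoring tokens")
    return correct_both / total, correct_head / total


gold = [(2, ">N"), (4, "SUBJ"), (4, "ADVL"), (0, "STA")]
pred = [(2, ">N"), (4, "ACC"), (2, "ADVL"), (0, "STA")]
las, uas = attachment_scores(gold, pred)
```

In this example three of the four heads are correct (UAS 0.75), but one of those carries the wrong label, so LAS is 0.5.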
![Page 70: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/70.jpg)
CoNLL-X Shared Task Results

| Language | Maximum Entropy LAS % | Maximum Entropy UAS % | Maximum Entropy Train (sec) | Maximum Entropy Parse (sec) | MBL LAS % | MBL UAS % | MBL Train (sec) | MBL Parse (sec) |
|---|---|---|---|---|---|---|---|---|
| Arabic | 56.43 | 70.96 | 181 | 2.6 | 59.70 | 74.69 | 24 | 950 |
| Bulgarian | 81.15 | 86.71 | 452 | 1.5 | 79.17 | 85.92 | 88 | 353 |
| Chinese | 81.19 | 86.10 | 1,156 | 1.8 | 72.17 | 83.08 | 540 | 478 |
| Czech | 62.10 | 73.44 | 13,800 | 12.8 | 69.20 | 80.22 | 496 | 13,500 |
| Danish | 75.25 | 80.96 | 386 | 3.2 | 76.13 | 83.65 | 52 | 627 |
| Dutch | 67.79 | 72.71 | 679 | 3.3 | 68.97 | 74.73 | 132 | 923 |
| Japanese | 84.17 | 87.15 | 129 | 0.8 | 83.39 | 86.73 | 44 | 97 |
| German | 75.88 | 80.25 | 9,315 | 4.3 | 79.79 | 84.31 | 1,399 | 3,756 |
| Portuguese | 79.40 | 87.58 | 1,044 | 4.9 | 80.97 | 87.74 | 160 | 670 |
| Slovene | 61.97 | 73.18 | 98 | 3.0 | 62.67 | 76.60 | 16 | 547 |
| Spanish | 72.35 | 76.06 | 204 | 2.4 | 74.37 | 79.70 | 54 | 769 |
| Swedish | 75.20 | 83.03 | 1,424 | 2.9 | 74.85 | 83.73 | 96 | 1,177 |
| Turkish | 49.27 | 65.29 | 177 | 2.3 | 47.58 | 65.25 | 43 | 727 |
![Page 71: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/71.jpg)
CoNLL-X: Overall Results

| Language | LAS Average | LAS Ours | UAS Average | UAS Ours |
|---|---|---|---|---|
| Arabic | 59.94 | 59.70 | 73.48 | 74.69 |
| Bulgarian | 79.98 | 81.15 | 85.89 | 86.71 |
| Chinese | 78.32 | 81.19 | 84.85 | 86.10 |
| Czech | 67.17 | 69.20 | 77.01 | 80.22 |
| Danish | 78.31 | 76.13 | 84.52 | 83.65 |
| Dutch | 70.73 | 68.97 | 75.07 | 74.73 |
| Japanese | 85.86 | 84.17 | 89.05 | 87.15 |
| German | 78.58 | 79.79 | 82.60 | 84.31 |
| Portuguese | 80.63 | 80.97 | 86.46 | 87.74 |
| Slovene | 65.16 | 62.67 | 76.53 | 76.60 |
| Spanish | 73.52 | 74.37 | 77.76 | 79.70 |
| Swedish | 76.44 | 74.85 | 84.21 | 83.73 |
| Turkish | 55.95 | 49.27 | 69.35 | 65.29 |

Average scores from 36 participant submissions
![Page 72: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/72.jpg)
Well-formed Parse Tree
A graph D = (W, A) is well-formed iff it is acyclic, projective and connected
![Page 73: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/73.jpg)
Multiple Heads
Examples include:
– verb coordination in which the subject or object is an argument of several verbs
– relative clauses in which words must satisfy dependencies both inside and outside the clause
![Page 74: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/74.jpg)
Examples
Il governo garantirà sussidi a coloro che cercheranno lavoro
He designs and develops programs
![Page 75: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/75.jpg)
Solution
Il governo garantirà sussidi a coloro che cercheranno lavoro
He designs and develops programs
N<PRED
SUBJ ACC
![Page 76: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/76.jpg)
Italian Treebank
Using SI-TAL collection from CNR ILC
Annotations split into separate morpho & functional files
Not all tokens have relations, some have more than one, no accents, …
Implemented some heuristics to generate a corpus in CoNLL format
Tool for visualization and annotation
![Page 77: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/77.jpg)
DgAnnotator
A GUI tool for:
– Annotating texts with dependency relations
– Visualizing and comparing trees
– Generating corpora in XML or CoNLL format
– Exporting DG trees to PNG
Demo
Available at: http://medialab.di.unipi.it/Project/QA/Parser/DgAnnotator/
![Page 78: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/78.jpg)
Future Directions
Opinion Extraction
– Finding opinions (positive/negative)
– Blog track in TREC2006
Intent Analysis
– Determine author intent, such as: problem (description, solution), agreement (assent, dissent), preference (likes, dislikes), statement (claim, denial)
![Page 79: A Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa.](https://reader036.fdocuments.net/reader036/viewer/2022070408/56649e625503460f94b5e2aa/html5/thumbnails/79.jpg)
References
G. Attardi. 2006. Experiments with a Multilanguage Non-projective Dependency Parser. In Proc. CoNLL-X.
H. Yamada, Y. Matsumoto. 2003. Statistical Dependency Analysis with Support Vector Machines. In Proc. IWPT.
M. T. Kromann. 2001. Optimality parsing and local cost functions in discontinuous grammars. In Proc. FG-MOL.