Open Information Extraction Systems and Downstream Applications (...mausam/papers/ijcai16-earlycareer.pdf)
![Page 1: Open Information Extraction Systems and Downstream ...mausam/papers/ijcai16-earlycareer.pdfOpen Information Extraction Systems and Downstream Applications Joint work with Oren Etzioni,](https://reader033.fdocuments.net/reader033/viewer/2022060320/5f0ce4cc7e708231d437a7c0/html5/thumbnails/1.jpg)
Open Information Extraction Systems and Downstream Applications
Joint work with Oren Etzioni, Stephen Soderland, Michael Schmitz, Ido Dagan, Ganesh Ramakrishnan, Sunita Sarawagi, Parag Singla, Niranjan Balasubramanian, Robert Bart, Janara Christensen, Danish Contractor, Anthony Fader, Aman Madaan, Ashish Mittal, Harinder Pal, Abhishek Yadav, Gabriel Stanovsky
Mausam
Computer Science and Engineering
Indian Institute of Technology
New Delhi, India
![Page 2: Open Information Extraction Systems and Downstream ...mausam/papers/ijcai16-earlycareer.pdfOpen Information Extraction Systems and Downstream Applications Joint work with Oren Etzioni,](https://reader033.fdocuments.net/reader033/viewer/2022060320/5f0ce4cc7e708231d437a7c0/html5/thumbnails/2.jpg)
“The Internet is the world’s largest library. It’s just that all the books are on the floor.”
- John Allen Paulos
~20 Trillion URLs (Google)
![Page 3: Open Information Extraction Systems and Downstream ...mausam/papers/ijcai16-earlycareer.pdfOpen Information Extraction Systems and Downstream Applications Joint work with Oren Etzioni,](https://reader033.fdocuments.net/reader033/viewer/2022060320/5f0ce4cc7e708231d437a7c0/html5/thumbnails/3.jpg)
Overview
[Diagram: KB, Fact Extraction]
![Page 4: Open Information Extraction Systems and Downstream ...mausam/papers/ijcai16-earlycareer.pdfOpen Information Extraction Systems and Downstream Applications Joint work with Oren Etzioni,](https://reader033.fdocuments.net/reader033/viewer/2022060320/5f0ce4cc7e708231d437a7c0/html5/thumbnails/4.jpg)
Closed KBs
“Apple’s founder Steve Jobs died of cancer following a…”
rel:founder_of(Apple, Steve Jobs)
Closed IE
Populating information with respect to a given ontology from natural language text
rel:founder_of
(Apple, Steve Jobs)
(Google, Larry Page)
(Facebook, Mark Zuckerberg)
…
rel:acquisition
(Google, DeepMind)
(IBM, uStream)
(Microsoft, Powerset)
…
![Page 5: Open Information Extraction Systems and Downstream ...mausam/papers/ijcai16-earlycareer.pdfOpen Information Extraction Systems and Downstream Applications Joint work with Oren Etzioni,](https://reader033.fdocuments.net/reader033/viewer/2022060320/5f0ce4cc7e708231d437a7c0/html5/thumbnails/5.jpg)
Lessons from DB/KR Research
• Large-scale Ontologies
– expensive to create
– very hard to maintain
– conflict with distributed authorship
• KBs are brittle: “can only be used for tasks whose knowledge needs have been anticipated in advance” (Halevy, IJCAI ’03)
![Page 6: Open Information Extraction Systems and Downstream ...mausam/papers/ijcai16-earlycareer.pdfOpen Information Extraction Systems and Downstream Applications Joint work with Oren Etzioni,](https://reader033.fdocuments.net/reader033/viewer/2022060320/5f0ce4cc7e708231d437a7c0/html5/thumbnails/6.jpg)
Open KBs
“When Saddam Hussein invaded Kuwait in 1990, the international…”
(Saddam Hussein, invaded, Kuwait)
Open IE
Broad-coverage KBs with light-weight structure (no annotation per relation)
(Google, acquired, YouTube)
(Oranges, contain, Vitamin C)
(Edison, invented, phonograph)
…
![Page 9: Open Information Extraction Systems and Downstream ...mausam/papers/ijcai16-earlycareer.pdfOpen Information Extraction Systems and Downstream Applications Joint work with Oren Etzioni,](https://reader033.fdocuments.net/reader033/viewer/2022060320/5f0ce4cc7e708231d437a7c0/html5/thumbnails/9.jpg)
Open Information Extraction
• 2007: TextRunner (~Open IE 1.0)
– CRF and self-training
• 2010: ReVerb (~Open IE 2.0)
– POS-based relation pattern
• 2012: OLLIE (~Open IE 3.0)
– Dep-parse based extraction; nouns; attribution
• 2014: Open IE 4.0
– SRL-based extraction; temporal, spatial…
• 2016 [@IITD]: Open IE 5.0
– compound noun phrases, numbers, lists
increasing precision, recall, expressiveness
![Page 10: Open Information Extraction Systems and Downstream ...mausam/papers/ijcai16-earlycareer.pdfOpen Information Extraction Systems and Downstream Applications Joint work with Oren Etzioni,](https://reader033.fdocuments.net/reader033/viewer/2022060320/5f0ce4cc7e708231d437a7c0/html5/thumbnails/10.jpg)
Fundamental Hypothesis
∃ semantically tractable subset of English
• Characterized relations & arguments via POS
• Characterization is compact, domain independent
• Covers 85% of binary relations in sample
![Page 11: Open Information Extraction Systems and Downstream ...mausam/papers/ijcai16-earlycareer.pdfOpen Information Extraction Systems and Downstream Applications Joint work with Oren Etzioni,](https://reader033.fdocuments.net/reader033/viewer/2022060320/5f0ce4cc7e708231d437a7c0/html5/thumbnails/11.jpg)
ReVerb
Identify Relations from Verbs.
1. Find longest phrase matching a simple syntactic constraint:
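The constraint itself was an image on the slide and did not survive extraction. As a rough sketch only, the Python below assumes the pattern usually quoted for ReVerb, V | V P | V W* P (V = verb, optional particle, optional adverb; W = noun, adjective, adverb, pronoun, or determiner; P = preposition, particle, or infinitive marker), with adjacent matches merged. It expects input that is already tagged with Penn Treebank POS tags. The names `coarse`, `RELATION`, and `relation_phrases` are illustrative and are not part of the ReVerb software.

```python
import re

def coarse(tag):
    """Map a Penn Treebank POS tag to a one-letter class used by the pattern."""
    if tag.startswith("VB"):
        return "V"  # verb
    if tag == "RP":
        return "R"  # particle
    if tag.startswith("RB"):
        return "A"  # adverb
    if tag.startswith("NN"):
        return "N"  # noun
    if tag.startswith("JJ"):
        return "J"  # adjective
    if tag in ("PRP", "PRP$"):
        return "O"  # pronoun
    if tag == "DT":
        return "D"  # determiner
    if tag == "IN":
        return "P"  # preposition
    if tag == "TO":
        return "T"  # infinitive marker
    return "x"      # anything else never matches

# Assumed constraint: V | V P | V W* P, repeated so adjacent matches merge.
RELATION = re.compile(r"(?:VR?A?(?:[NJAOD]*[PRT])?)+")

def relation_phrases(tagged):
    """Return relation phrases found by a greedy left-to-right match.

    `tagged` is a list of (word, POS tag) pairs for one sentence.
    """
    codes = "".join(coarse(tag) for _, tag in tagged)
    return [
        " ".join(word for word, _ in tagged[m.start():m.end()])
        for m in RELATION.finditer(codes)
    ]

example = [("Federer", "NNP"), ("is", "VBZ"), ("coached", "VBN"),
           ("by", "IN"), ("Paul", "NNP"), ("Annacone", "NNP")]
phrases = relation_phrases(example)
```

Encoding tags as single letters lets an ordinary regular expression stand in for a pattern over tokens; a real system would also need the argument-finding and lexical-constraint steps, which are not sketched here.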
![Page 12: Open Information Extraction Systems and Downstream ...mausam/papers/ijcai16-earlycareer.pdfOpen Information Extraction Systems and Downstream Applications Joint work with Oren Etzioni,](https://reader033.fdocuments.net/reader033/viewer/2022060320/5f0ce4cc7e708231d437a7c0/html5/thumbnails/12.jpg)
Sample of ReVerb Relations
• invented
• acquired by
• has a PhD in
• inhibits tumor growth in
• voted in favor of
• won an Oscar for
• has a maximum speed of
• died from complications of
• mastered the art of
• gained fame as
• granted political asylum to
• is the patron saint of
• was the first person to
• identified the cause of
• wrote the book on
![Page 13: Open Information Extraction Systems and Downstream ...mausam/papers/ijcai16-earlycareer.pdfOpen Information Extraction Systems and Downstream Applications Joint work with Oren Etzioni,](https://reader033.fdocuments.net/reader033/viewer/2022060320/5f0ce4cc7e708231d437a7c0/html5/thumbnails/13.jpg)
Number of Relations

| Resource | Number of relations |
|---|---|
| DARPA MR Domains | <50 |
| NYU, Yago | <100 |
| NELL | ~500 |
| DBpedia 3.2 | 940 |
| PropBank | 3,600 |
| VerbNet | 5,000 |
| Wikipedia InfoBoxes, f > 10 | ~5,000 |
| TextRunner (phrases) | 100,000+ |
| ReVerb (phrases) | 1,500,000+ |
![Page 14: Open Information Extraction Systems and Downstream ...mausam/papers/ijcai16-earlycareer.pdfOpen Information Extraction Systems and Downstream Applications Joint work with Oren Etzioni,](https://reader033.fdocuments.net/reader033/viewer/2022060320/5f0ce4cc7e708231d437a7c0/html5/thumbnails/14.jpg)
ReVerb: Error Analysis
• Larry Page, the CEO of Google, talks about multi-screen opportunities offered by Google.
• After winning the Superbowl, the Giants are now the top dogs of the NFL.
• Ahmadinejad was elected as the new President of Iran.
OLLIE: Open Language Learning for Information Extraction
![Page 15: Open Information Extraction Systems and Downstream ...mausam/papers/ijcai16-earlycareer.pdfOpen Information Extraction Systems and Downstream Applications Joint work with Oren Etzioni,](https://reader033.fdocuments.net/reader033/viewer/2022060320/5f0ce4cc7e708231d437a7c0/html5/thumbnails/15.jpg)
[Diagram]
Learning: ReVerb → Seed Tuples → Bootstrapper → Training Data → Open Pattern Learning → Pattern Templates
Extraction: Sentence → Pattern Matching → Tuples → Context Analysis → Ext. Tuples
![Page 21: Open Information Extraction Systems and Downstream ...mausam/papers/ijcai16-earlycareer.pdfOpen Information Extraction Systems and Downstream Applications Joint work with Oren Etzioni,](https://reader033.fdocuments.net/reader033/viewer/2022060320/5f0ce4cc7e708231d437a7c0/html5/thumbnails/21.jpg)
Bootstrapping Approach
Other Syntactic rels
Verb-based relations
ReVerb’s Verb-based relations
Semantic rels
Federer is coached by Paul Annacone.
Now coached by Paul Annacone, Federer has …
Paul Annacone, the coach of Federer,
Federer hired Annacone as his new coach.
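One way to picture the bootstrapping step is as a filter: start from a high-precision seed tuple and keep any sentence that mentions both arguments and the relation's content words, whatever its syntax. The sketch below is a toy illustration under that assumption. The stop-word list, the crude suffix stemmer, and the function names are hypothetical and are not taken from the OLLIE implementation.

```python
# Hypothetical stop words to ignore when picking a relation's content words.
STOP = {"is", "was", "are", "by", "of", "the", "a", "an", "to", "in", "as"}

def stem(word):
    """Very crude stemmer: lowercase, strip punctuation, drop a common suffix."""
    w = word.lower().strip(".,")
    for suffix in ("ed", "es", "s"):
        if w.endswith(suffix) and len(w) - len(suffix) >= 4:
            return w[: -len(suffix)]
    return w

def find_training_sentences(seed, corpus):
    """Keep sentences that contain both seed arguments and every content-word stem."""
    arg1, rel, arg2 = seed
    rel_stems = {stem(w) for w in rel.split() if w.lower() not in STOP}
    hits = []
    for sentence in corpus:
        stems = {stem(w) for w in sentence.split()}
        if arg1 in sentence and arg2 in sentence and rel_stems <= stems:
            hits.append(sentence)
    return hits

SEED = ("Federer", "is coached by", "Paul Annacone")
CORPUS = [
    "Federer is coached by Paul Annacone.",
    "Now coached by Paul Annacone, Federer has …",
    "Paul Annacone, the coach of Federer,",
    "Federer hired Annacone as his new coach.",
]
hits = find_training_sentences(SEED, CORPUS)
```

With this toy filter the first three sentences are retained as training examples; the last is skipped because it never mentions the full argument string "Paul Annacone".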
![Page 22: Open Information Extraction Systems and Downstream ...mausam/papers/ijcai16-earlycareer.pdfOpen Information Extraction Systems and Downstream Applications Joint work with Oren Etzioni,](https://reader033.fdocuments.net/reader033/viewer/2022060320/5f0ce4cc7e708231d437a7c0/html5/thumbnails/22.jpg)
Evaluation [Mausam, Schmitz, Bart, Soderland, Etzioni - EMNLP’12]
[Figure: precision (y-axis, 0.5 to 1) vs. yield (x-axis, 0 to 600) for OLLIE, ReVerb, and WOE (parse)]
![Page 23: Open Information Extraction Systems and Downstream ...mausam/papers/ijcai16-earlycareer.pdfOpen Information Extraction Systems and Downstream Applications Joint work with Oren Etzioni,](https://reader033.fdocuments.net/reader033/viewer/2022060320/5f0ce4cc7e708231d437a7c0/html5/thumbnails/23.jpg)
Context Analysis
“Early astronomers believed that the earth is the center of the universe.”
(earth, is the center of, universe)
“If she wins California, Hillary will be the nominated presidential candidate.”
(Hillary, will be nominated, presidential candidate)
“John refused to visit Vegas.”
(John, visit, Vegas)
![Page 24: Open Information Extraction Systems and Downstream ...mausam/papers/ijcai16-earlycareer.pdfOpen Information Extraction Systems and Downstream Applications Joint work with Oren Etzioni,](https://reader033.fdocuments.net/reader033/viewer/2022060320/5f0ce4cc7e708231d437a7c0/html5/thumbnails/24.jpg)
Context Analysis
“Early astronomers believed that the earth is the center of the universe.”
[(earth, is the center of, universe) Attribution: early astronomers]
“If she wins California, Hillary will be the nominated presidential candidate.”
[(Hillary, will be nominated, presidential candidate) Modifier: if she wins California]
“John refused to visit Vegas.”
(John, refused to visit, Vegas)
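The examples above suggest a tuple that carries optional context fields next to the (arg1, relation, arg2) triple. The sketch below shows one possible shape for such a record. The two regular expressions are simplistic stand-ins for whatever analysis the real system performs, and the class name, field names, and verb list are assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical surface cues; a parse-based system would use syntactic structure instead.
ATTRIBUTION = re.compile(
    r"^(?P<source>.+?)\s+(?:believed|said|claimed)\s+that\s", re.IGNORECASE
)
CONDITION = re.compile(r"^(?P<condition>if\s[^,]+),", re.IGNORECASE)

@dataclass
class Extraction:
    triple: Tuple[str, str, str]
    attribution: Optional[str] = None  # who the claim is attributed to
    modifier: Optional[str] = None     # clause the claim is conditional on

def add_context(sentence, triple):
    """Attach attribution or a conditional modifier when a surface cue is present."""
    extraction = Extraction(triple)
    match = ATTRIBUTION.match(sentence)
    if match:
        extraction.attribution = match.group("source")
    match = CONDITION.match(sentence)
    if match:
        extraction.modifier = match.group("condition")
    return extraction

believed = add_context(
    "Early astronomers believed that the earth is the center of the universe.",
    ("earth", "is the center of", "universe"),
)
conditional = add_context(
    "If she wins California, Hillary will be the nominated presidential candidate.",
    ("Hillary", "will be nominated", "presidential candidate"),
)
```

Keeping the context in separate fields means a downstream consumer can still use the bare triple but can also decide to discount attributed or conditional claims.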
![Page 26: Open Information Extraction Systems and Downstream ...mausam/papers/ijcai16-earlycareer.pdfOpen Information Extraction Systems and Downstream Applications Joint work with Oren Etzioni,](https://reader033.fdocuments.net/reader033/viewer/2022060320/5f0ce4cc7e708231d437a7c0/html5/thumbnails/26.jpg)
Compound Noun Extraction [Pal & Mausam - AKBC’16]
• NIH Director Francis Collins
(Francis Collins, is the Director of, NIH)
• Challenges
– New York Banker Association
– German Chancellor Angela Merkel
– Prime Minister Modi
– GM Vice Chairman Bob Lutz
Challenge types: ORG NAMES, DEMONYMS, COMPOUND RELATIONAL NOUNS
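As a small illustration of the target output, the sketch below turns an already segmented compound noun phrase into a tuple and maps a demonym to its location through a lookup table. It assumes the hard part, segmenting the phrase into modifier, relational noun, and person, has been done elsewhere. The table contents and the function name are hypothetical.

```python
# Hypothetical, deliberately tiny demonym table; a real one would be much larger.
DEMONYMS = {"German": "Germany"}

def compound_np_to_tuple(modifier, relational_noun, person):
    """Build (person, is the <noun> of, place or organization) from pre-segmented parts.

    The modifier is replaced by its location when it is a known demonym;
    otherwise it is used unchanged (for example an organization name).
    """
    target = DEMONYMS.get(modifier, modifier)
    return (person, f"is the {relational_noun} of", target)

nih = compound_np_to_tuple("NIH", "Director", "Francis Collins")
merkel = compound_np_to_tuple("German", "Chancellor", "Angela Merkel")
```

The challenge cases on the slide are exactly the inputs this sketch does not handle: organization names that look like this pattern but are not, and multi-word relational nouns such as "Vice Chairman" that must be segmented correctly first.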
![Page 27: Open Information Extraction Systems and Downstream ...mausam/papers/ijcai16-earlycareer.pdfOpen Information Extraction Systems and Downstream Applications Joint work with Oren Etzioni,](https://reader033.fdocuments.net/reader033/viewer/2022060320/5f0ce4cc7e708231d437a7c0/html5/thumbnails/27.jpg)
Numerical IE [Madaan, Mittal, Mausam, Ramakrishnan, Sarawagi - AAAI’16]
“Venezuela with its inflation rate 96% is suffering from a major…”
(Venezuela, inflation rate, 96 %)
Numerical Open IE
![Page 28: Open Information Extraction Systems and Downstream ...mausam/papers/ijcai16-earlycareer.pdfOpen Information Extraction Systems and Downstream Applications Joint work with Oren Etzioni,](https://reader033.fdocuments.net/reader033/viewer/2022060320/5f0ce4cc7e708231d437a7c0/html5/thumbnails/28.jpg)
Overview
[Diagram: KB, Fact Extraction]
Useful for
- Summarization
- Comparisons
- Interaction w/ data
![Page 29: Open Information Extraction Systems and Downstream ...mausam/papers/ijcai16-earlycareer.pdfOpen Information Extraction Systems and Downstream Applications Joint work with Oren Etzioni,](https://reader033.fdocuments.net/reader033/viewer/2022060320/5f0ce4cc7e708231d437a7c0/html5/thumbnails/29.jpg)
Extractions: a great way to summarize
![Page 30: Open Information Extraction Systems and Downstream ...mausam/papers/ijcai16-earlycareer.pdfOpen Information Extraction Systems and Downstream Applications Joint work with Oren Etzioni,](https://reader033.fdocuments.net/reader033/viewer/2022060320/5f0ce4cc7e708231d437a7c0/html5/thumbnails/30.jpg)
Extractions: a great way to compare [Contractor, Mausam, Singla - NAACL’16]
![Page 34: Open Information Extraction Systems and Downstream ...mausam/papers/ijcai16-earlycareer.pdfOpen Information Extraction Systems and Downstream Applications Joint work with Oren Etzioni,](https://reader033.fdocuments.net/reader033/viewer/2022060320/5f0ce4cc7e708231d437a7c0/html5/thumbnails/34.jpg)
Pipeline for Semantics Tasks
Tokenization, Lemmatization, POS Tagging, Chunking, Syntactic Parsing, NER, SRL, Info Extraction
Semantics Tasks (WSD, QA, Sentiment, Coreference, Events, …)
![Page 35: Open Information Extraction Systems and Downstream ...mausam/papers/ijcai16-earlycareer.pdfOpen Information Extraction Systems and Downstream Applications Joint work with Oren Etzioni,](https://reader033.fdocuments.net/reader033/viewer/2022060320/5f0ce4cc7e708231d437a7c0/html5/thumbnails/35.jpg)
Pipeline for Semantics Tasks
Semantics Tasks (WSD, QA, Sentiment, Coreference, Events, …)
Sentence Preprocessing
![Page 36: Open Information Extraction Systems and Downstream ...mausam/papers/ijcai16-earlycareer.pdfOpen Information Extraction Systems and Downstream Applications Joint work with Oren Etzioni,](https://reader033.fdocuments.net/reader033/viewer/2022060320/5f0ce4cc7e708231d437a7c0/html5/thumbnails/36.jpg)
Abstracted Pipeline
Sentence → Preprocessing → Intermediate Structure (words, parses, SRL frames) → Task-specific model/features → Semantics Tasks
![Page 37: Open Information Extraction Systems and Downstream ...mausam/papers/ijcai16-earlycareer.pdfOpen Information Extraction Systems and Downstream Applications Joint work with Oren Etzioni,](https://reader033.fdocuments.net/reader033/viewer/2022060320/5f0ce4cc7e708231d437a7c0/html5/thumbnails/37.jpg)
Research Question [Stanovsky, Dagan, Mausam - ACL’15]
Can Open IE be useful as an alternative intermediate structure?
![Page 38: Open Information Extraction Systems and Downstream ...mausam/papers/ijcai16-earlycareer.pdfOpen Information Extraction Systems and Downstream Applications Joint work with Oren Etzioni,](https://reader033.fdocuments.net/reader033/viewer/2022060320/5f0ce4cc7e708231d437a7c0/html5/thumbnails/38.jpg)
Lexical Similarity/Analogies
• We experiment by switching representations
– We compute Open IE-based embeddings instead of lexical or syntactic context-based embeddings
“John refused to visit Vegas”
![Page 39: Open Information Extraction Systems and Downstream ...mausam/papers/ijcai16-earlycareer.pdfOpen Information Extraction Systems and Downstream Applications Joint work with Oren Etzioni,](https://reader033.fdocuments.net/reader033/viewer/2022060320/5f0ce4cc7e708231d437a7c0/html5/thumbnails/39.jpg)
Results
• Lexical similarity
Functional similarity
Near state of the art for the amount of training data
![Page 40: Open Information Extraction Systems and Downstream ...mausam/papers/ijcai16-earlycareer.pdfOpen Information Extraction Systems and Downstream Applications Joint work with Oren Etzioni,](https://reader033.fdocuments.net/reader033/viewer/2022060320/5f0ce4cc7e708231d437a7c0/html5/thumbnails/40.jpg)
Results
• Lexical analogy a:a* :: b:b*?

| Dataset | Open IE | Lexical | Deps | SRL |
|---|---|---|---|---|
| Google (Add) | 0.714 | 0.651 | 0.34 | 0.352 |
| Google (Mult) | 0.729 | 0.656 | 0.367 | 0.362 |
| MSR (Add) | 0.529 | 0.438 | 0.4 | 0.389 |
| MSR (Mult) | 0.55 | 0.455 | 0.434 | 0.406 |

State of the art
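For readers unfamiliar with the analogy task, the sketch below shows how a question a:a* :: b:? can be answered from word vectors. It assumes that "Add" and "Mult" in the table refer to the additive and multiplicative analogy objectives commonly used for these benchmarks. The five-dimensional vectors are invented purely for illustration and have nothing to do with the embeddings evaluated above.

```python
import math

def cos(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def solve_analogy(vectors, a, a_star, b, method="add", eps=1e-3):
    """Return the word b* that best completes a:a* :: b:b*."""
    best_word, best_score = None, -math.inf
    for word, vec in vectors.items():
        if word in (a, a_star, b):
            continue  # the question words themselves are not valid answers
        sim_b = cos(vec, vectors[b])
        sim_a = cos(vec, vectors[a])
        sim_a_star = cos(vec, vectors[a_star])
        if method == "add":
            score = sim_b - sim_a + sim_a_star
        else:
            # Shift cosines from [-1, 1] to [0, 1] so the ratio is well behaved.
            sb, sa, sas = ((s + 1) / 2 for s in (sim_b, sim_a, sim_a_star))
            score = sb * sas / (sa + eps)
        if score > best_score:
            best_word, best_score = word, score
    return best_word

# Toy vectors, axes: [loud, gentle, thin, superlative, comparative].
TOY = {
    "gentlest": [0, 1, 0, 1, 0],
    "gentler":  [0, 1, 0, 0, 1],
    "loudest":  [1, 0, 0, 1, 0],
    "louder":   [1, 0, 0, 0, 1],
    "thinnest": [0, 0, 1, 1, 0],
}
answer_add = solve_analogy(TOY, "gentlest", "gentler", "loudest", method="add")
answer_mult = solve_analogy(TOY, "gentlest", "gentler", "loudest", method="mult")
```

Both objectives reward candidates close to b and a* and penalize closeness to a; they differ only in whether the three similarities are combined by addition or by a ratio.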
![Page 41: Open Information Extraction Systems and Downstream ...mausam/papers/ijcai16-earlycareer.pdfOpen Information Extraction Systems and Downstream Applications Joint work with Oren Etzioni,](https://reader033.fdocuments.net/reader033/viewer/2022060320/5f0ce4cc7e708231d437a7c0/html5/thumbnails/41.jpg)
Why does Open IE do better?
• Word Analogy
– Captures domain and functional similarity
(gentlest : gentler), (loudest : ?)
• Lexical: higher-pitched X [Domain Similar]
• Syntactic: thinnest X [Functionally Similar]
• SRL: unbelievable X [Functionally Similar?]
• Open-IE: louder
![Page 42: Open Information Extraction Systems and Downstream ...mausam/papers/ijcai16-earlycareer.pdfOpen Information Extraction Systems and Downstream Applications Joint work with Oren Etzioni,](https://reader033.fdocuments.net/reader033/viewer/2022060320/5f0ce4cc7e708231d437a7c0/html5/thumbnails/42.jpg)
Other NLP Applications
• Atomic relations → Event schemas
• Unsupervised sentence similarity
• Reading comprehension tasks
• Open IE → Closed IE
– Software (OREO):
http://homes.cs.washington.edu/~mausam/software.html
![Page 43: Open Information Extraction Systems and Downstream ...mausam/papers/ijcai16-earlycareer.pdfOpen Information Extraction Systems and Downstream Applications Joint work with Oren Etzioni,](https://reader033.fdocuments.net/reader033/viewer/2022060320/5f0ce4cc7e708231d437a7c0/html5/thumbnails/43.jpg)
Future Work
[Diagram: KB, Fact Extraction, Inference]
![Page 44: Open Information Extraction Systems and Downstream ...mausam/papers/ijcai16-earlycareer.pdfOpen Information Extraction Systems and Downstream Applications Joint work with Oren Etzioni,](https://reader033.fdocuments.net/reader033/viewer/2022060320/5f0ce4cc7e708231d437a7c0/html5/thumbnails/44.jpg)
Key Future Direction
• Large-scale inference over Open IE
(iron, is a good conductor of, electricity)
(iron nail, conducts, electricity)
(David Beckham, was born in, London)
(David Beckham, was born in, England)
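The second pair of tuples hints at the kind of rule such inference might use: a fact about a place also holds for any region that contains it. The sketch below is a toy version of that idea. The containment table, the set of relations the rule applies to, and the function name are assumptions made for illustration; the slide does not say how inference would actually be done.

```python
# Hypothetical background knowledge: place -> containing region.
CONTAINED_IN = {"London": "England"}

# Hypothetical set of relations for which "holds in X" implies "holds in region containing X".
LOCATION_RELATIONS = {"was born in"}

def infer_locations(facts, contained_in=CONTAINED_IN):
    """Derive new tuples by walking up the containment chain of the object."""
    inferred = set()
    for subj, rel, obj in facts:
        if rel not in LOCATION_RELATIONS:
            continue
        place, seen = obj, {obj}
        while place in contained_in and contained_in[place] not in seen:
            place = contained_in[place]
            seen.add(place)  # guard against cycles in the table
            inferred.add((subj, rel, place))
    return inferred

facts = [("David Beckham", "was born in", "London")]
new_facts = infer_locations(facts)
```

The first pair on the slide (iron is a good conductor of electricity, so an iron nail conducts electricity) needs a different kind of rule, relating a material to objects made of it and paraphrasing the relation, which this sketch does not attempt.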
![Page 45: Open Information Extraction Systems and Downstream ...mausam/papers/ijcai16-earlycareer.pdfOpen Information Extraction Systems and Downstream Applications Joint work with Oren Etzioni,](https://reader033.fdocuments.net/reader033/viewer/2022060320/5f0ce4cc7e708231d437a7c0/html5/thumbnails/45.jpg)
Thanks