Leveraging Linguistic Structure for Open Domain Information Extraction

Gabor Angeli, Melvin Johnson Premkumar, Chris Manning

Stanford University

July 27, 2015

Angeli, Premkumar, Manning (Stanford) Linguistics for Open IE July 27, 2015 0 / 24

Motivation: Question Answering

Information [Relation] Extraction

Input: Sentences containing (subject, object).
Output: Relation between subject and object.

I’m Australian ⇒ per:origin

What about...

Open Information Extraction

More to life than a fixed relation schema

(Chris, taught at, Carnegie Mellon)
(Chris, taught at, University of Sydney)
(his research, is on, A broad range of statistical natural language topics)
(Obama, was born in, Hawaii)
(young rabbits, drink, milk)
(Heinz Fischer, visits, United States)
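
As a rough illustration of what "no fixed relation schema" means for a consumer of these triples, here is a minimal Python sketch. The `Triple` type and the `parse_triples` helper are hypothetical names introduced for this example, not part of any Open IE system; the relation is stored as free text rather than as a member of a predefined label set.

```python
import re
from typing import NamedTuple


class Triple(NamedTuple):
    """An open-domain triple: the relation is free text, not a schema label."""
    subject: str
    relation: str
    obj: str


def parse_triples(text):
    """Parse a run of '(subject, relation, object)' groups into Triple objects."""
    triples = []
    for group in re.findall(r"\(([^()]*)\)", text):
        # Split on the first two commas only, so the object may contain commas.
        parts = [part.strip() for part in group.split(",", 2)]
        triples.append(Triple(*parts))
    return triples


triples = parse_triples("(Obama, was born in, Hawaii)(young rabbits, drink, milk)")
for t in triples:
    print(t.subject, "|", t.relation, "|", t.obj)
```

Any string can appear in the relation slot, which is the contrast with a fixed label such as per:origin.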

Prior Work

OpenIE (UW)

TextRunner, ReVerb, Ollie, OpenIE 4.

Learn surface and/or dependency patterns for triples.

NELL (CMU)

Bootstrapping an ontology from a small number of seed examples.

Useful for matrix factorization, MLNs, QA, etc.

Challenge: Long Sentences

Short sentences are easy:

Obama was born in Hawaii.

[Dependency parse of the sentence above; arc labels: nmod:in, cop, nsubj, case]

But most sentences are longer:

Born in a small town, she took the midnight train going anywhere.

[Dependency parse of the sentence above; arc labels: nmod:in, amod, det, vmod, nsubj, dobj, nn, det, vmod, dobj]

Challenge: Lost Context

Sometimes annoying:

She was born in the small town of Springfield.

[Dependency parse of the sentence above; arc labels: nmod:in, cop, nmod:in, amod, det, nmod:of]

Sometimes logically invalid:

All young rabbits drink milk.

[Dependency parse of the sentence above; arc labels: nsubj, dobj, amod, det]

Challenge: Too Much Context

Heinz Fischer of Austria visits the United States.

[Dependency parse of the sentence above; arc labels: nsubj, dobj, nn, nmod:of, nn, det]

(Heinz Fischer of Austria; visits; the United States)

Is this about Heinz Fischer or Austria?

Is the subject a PERSON or LOCATION?
(United States president Obama; visits; China)

Downstream applications don’t want to deal with this.

Downstream applications have less context to figure this out.

Approach Open IE As Entailment

Challenge: Long Sentences

Yield short, entailed clauses from sentences.

Challenge: Lost Context

Shorten these clauses only when logically valid.

Challenge: Too Much Context

Shorten these clauses as much as possible.

No Longer A Challenge

Segment these short clauses into triples.

Yield clauses

Input: Long sentence.
Born in a small town, she took the midnight train going anywhere.

Output: Short clauses.
she Born in a small town.

[Dependency parse of the input sentence; arc labels: nmod:in, amod, det, vmod, nsubj, dobj, nn, det, vmod, dobj]

Clause Classifier

Input: Dependency arc.
Output: Action to take.

Yield (you should brush your teeth)

Example: Dentists suggest that you should brush your teeth.
[Dependency parse; arc labels: ccomp, nsubj, mark, nsubj, aux, dobj, nmod:poss]

Yield (Subject Controller) (Obama Born in Hawaii)

Example: Born in Hawaii, Obama is a US citizen.
[Dependency parse; arc labels: vmod, nsubj, cop, det, amod, nmod:in]

Yield (Object Controller) (Fred leave the room)

Example: I persuaded Fred to leave the room.
[Dependency parse; arc labels: xcomp, dobj, nsubj, mark, dobj, det]

Yield (Parent Subject) (Obama is our 44th president)

Example: Obama, our 44th president.
[Dependency parse; arc labels: appos, nmod:poss, amod]

A Search Problem

Breadth First Search:

Born in a small town, she took the midnight train going anywhere.

[Dependency parse of the sentence above; arc labels: nmod:in, amod, det, vmod, nsubj, dobj, nn, det, vmod, dobj]

Decisions, in search order:

Edge: vmod Action: Yield (subject controller)
Edge: nsubj Action: Stop
Edge: dobj Action: Stop
Edge: nmod:in Action: Stop

Yielded Clauses:

Born in a small town, she took the midnight train going anywhere

she Born in a small town
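
The search on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the parse is hand-encoded, the `case` arc for "in" is added so the clause text can be rebuilt (it is not among the arc labels shown), and `classify` is a hypothetical one-rule stand-in for the learned clause classifier. Punctuation is left out of the toy parse.

```python
from collections import deque

# Token position -> word (punctuation omitted).
WORDS = {1: "Born", 2: "in", 3: "a", 4: "small", 5: "town", 7: "she",
         8: "took", 9: "the", 10: "midnight", 11: "train", 12: "going",
         13: "anywhere"}

# Hand-encoded arcs as (governor, label, dependent); the "case" arc is assumed.
EDGES = [
    (8, "vmod", 1), (8, "nsubj", 7), (8, "dobj", 11),
    (1, "nmod:in", 5),
    (5, "case", 2), (5, "det", 3), (5, "amod", 4),
    (11, "det", 9), (11, "nn", 10), (11, "vmod", 12),
    (12, "dobj", 13),
]


def children(node):
    return [(label, dep) for gov, label, dep in EDGES if gov == node]


def subtree(node):
    nodes = {node}
    for _, dep in children(node):
        nodes |= subtree(dep)
    return nodes


def text(nodes):
    return " ".join(WORDS[i] for i in sorted(nodes))


def classify(label):
    # Hypothetical stand-in for the trained classifier: a single fixed rule.
    return "yield (subject controller)" if label == "vmod" else "stop"


def search(root):
    clauses = [text(subtree(root))]  # the full sentence is itself a clause
    decisions = []
    queue = deque([root])
    while queue:
        gov = queue.popleft()
        for label, dep in children(gov):
            action = classify(label)
            decisions.append((label, action))
            if action != "stop":
                # Subject controller: borrow the governor's subject.
                subject = [WORDS[d] for l, d in children(gov) if l == "nsubj"]
                clauses.append(" ".join(subject + [text(subtree(dep))]))
                queue.append(dep)  # keep searching inside the yielded clause
    return clauses, decisions


clauses, decisions = search(8)
for label, action in decisions:
    print(f"Edge: {label}  Action: {action}")
print(clauses[1])
```

A Stop decision prunes the whole subtree below that arc, which is why the second vmod arc (train to going) is never visited in this toy run.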

Classifier Training

Training Data Generation

1 Take 66880 sentences (newswire, newsgroups, Wikipedia).
2 Apply distant supervision to label relations in sentence.
3 Run exhaustive search.
4 Positive Labels: A sequence of actions which yields a known relation.
  Negative Labels: All other sequences of actions.

Features:

Edge label; incoming edge label.

Neighbors of governor; neighbors of dependent; number of neighbors.

Existence of subject/object edges at governor; dependent.

POS tag of governor; dependent.
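
One plausible reading of the feature list above, written as a Python sketch. The toy parse, the POS tags, the feature names, and the choice to exclude the arc itself from the governor's neighbors are all assumptions made for illustration; the slides do not specify the exact feature templates.

```python
# Hypothetical toy parse of "Born in Hawaii, Obama is a US citizen."
# Arcs are (governor, label, dependent); the POS tags are assumed.
EDGES = [
    ("citizen", "vmod", "Born"),
    ("citizen", "nsubj", "Obama"),
    ("citizen", "cop", "is"),
    ("citizen", "det", "a"),
    ("citizen", "amod", "US"),
    ("Born", "nmod:in", "Hawaii"),
]
POS = {"citizen": "NN", "Born": "VBN", "Obama": "NNP", "is": "VBZ",
       "a": "DT", "US": "NNP", "Hawaii": "NNP"}


def has_subj_obj(labels):
    return any(label in ("nsubj", "dobj") for label in labels)


def arc_features(gov, label, dep):
    """Featurize one dependency arc gov -label-> dep."""
    incoming = [l for g, l, d in EDGES if d == gov]
    gov_nbrs = sorted(l for g, l, d in EDGES if g == gov and d != dep)
    dep_nbrs = sorted(l for g, l, d in EDGES if g == dep)
    return {
        "edge": label,
        "incoming_edge": incoming[0] if incoming else "root",
        "gov_neighbors": tuple(gov_nbrs),
        "dep_neighbors": tuple(dep_nbrs),
        "num_gov_neighbors": len(gov_nbrs),
        "num_dep_neighbors": len(dep_nbrs),
        "gov_has_subj_obj": has_subj_obj(gov_nbrs),
        "dep_has_subj_obj": has_subj_obj(dep_nbrs),
        "gov_pos": POS[gov],
        "dep_pos": POS[dep],
    }


features = arc_features("citizen", "vmod", "Born")
for name, value in features.items():
    print(name, "=", value)
```

A dictionary like this could be fed to any standard multiclass classifier over the actions.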

Approach Open IE As Entailment

Challenge: Long Sentences

Yield short, entailed clauses from sentences.

Challenge: Lost Context

Shorten these clauses only when logically valid.

Challenge: Too Much Context

Shorten these clauses as much as possible.

No Longer A Challenge

Segment these short clauses into triples.

Maximally Shorten Clauses

Some strange, nuanced function → An entailment function → A natural logic entailment function:

Heinz Fischer of Austria ⇒ Heinz Fischer
United States president Obama ⇒ Obama
All young rabbits drink milk ⇏ All rabbits drink milk
Some young rabbits drink milk ⇒ Some rabbits drink milk
Enemies give fake praise ⇏ Enemies give praise
Friends give true praise ⇒ Friends give praise

Natural Logic

If I mutate a sentence in this way, do I preserve its truth?

Braindead for humans, but not computers

All young rabbits drink milk ⇏ All rabbits drink milk
Some young rabbits drink milk ⇒ Some rabbits drink milk

Hard even for first order logic

Most cats eat mice ⇒ Most cats eat rodents

All students who know a foreign language learned it at university ⇒ They learned it at school.
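
The rabbit examples come down to the monotonicity of the quantifier's first argument: "all" is downward monotone there, "some" is upward monotone. A minimal Python sketch of that one check follows; the table and the function name are hypothetical, and the sketch covers only these two quantifiers. It says nothing about cases like "fake praise", where the adjective itself blocks the deletion.

```python
# Monotonicity of each quantifier's first argument (its restrictor).
RESTRICTOR_POLARITY = {
    "all": "down",   # All young rabbits ... does not entail All rabbits ...
    "some": "up",    # Some young rabbits ... entails Some rabbits ...
}


def can_drop_modifier(quantifier):
    """Dropping a modifier generalizes the noun phrase (young rabbits -> rabbits).

    Generalizing preserves truth only in an upward-polarity context.
    """
    return RESTRICTOR_POLARITY[quantifier.lower()] == "up"


for q in ("All", "Some"):
    verdict = "valid" if can_drop_modifier(q) else "invalid"
    print(f"{q} young rabbits drink milk -> {q} rabbits drink milk: {verdict}")
```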

Natural Logic and Polarity

Order phrases into a partial order.

[Figure: partial order over ⊤, animal, feline, cat, dog]

Polarity is the direction a lexical item can move in the ordering.

[Figure: the ordering thing, living thing, animal, feline, cat, house cat, from most to least general; the upward polarity marker ↑ moves from cat to feline to animal]

Angeli, Premkumar, Manning (Stanford) Linguistics for Open IE July 27, 2015 17 / 24

Page 62: Leveraging Linguistic Structure for Open Domain …angeli/talks/2015-acl-openie.pdfLeveraging Linguistic Structure for Open Domain Information Extraction Gabor Angeli, Melvin Johnson

Natural Logic and Polarity

Order phrases into a partial order.

>

animal

feline

cat

dog

Polarity is the direction a lexical item can move in the ordering.

thing

living thing

↓ animal

feline

Angeli, Premkumar, Manning (Stanford) Linguistics for Open IE July 27, 2015 17 / 24

Page 63: Leveraging Linguistic Structure for Open Domain …angeli/talks/2015-acl-openie.pdfLeveraging Linguistic Structure for Open Domain Information Extraction Gabor Angeli, Melvin Johnson

Natural Logic and Polarity

Order phrases into a partial order.

>

animal

feline

cat

dog

Polarity is the direction a lexical item can move in the ordering.

living thing

animal

↓ feline

cat

Angeli, Premkumar, Manning (Stanford) Linguistics for Open IE July 27, 2015 17 / 24

Page 64: Leveraging Linguistic Structure for Open Domain …angeli/talks/2015-acl-openie.pdfLeveraging Linguistic Structure for Open Domain Information Extraction Gabor Angeli, Melvin Johnson

Natural Logic and Polarity

Order phrases into a partial order.

>

animal

feline

cat

dog

Polarity is the direction a lexical item can move in the ordering.

animal

feline

↓ cat

house cat

Angeli, Premkumar, Manning (Stanford) Linguistics for Open IE July 27, 2015 17 / 24

Page 65: Leveraging Linguistic Structure for Open Domain …angeli/talks/2015-acl-openie.pdfLeveraging Linguistic Structure for Open Domain Information Extraction Gabor Angeli, Melvin Johnson

Natural Logic For Clause Shortening

Quantifiers determine the polarity (↑ or ↓) of words.

Mutations must respect polarity.

Polarity determines valid deletions.

[Figure: two polarity-marked sentences. "Some" marks both of its arguments ↑: Some ↑young rabbits ↑drink ↑milk. "All" marks its first argument ↓ and its second ↑: All ↓young rabbits ↑drink ↑milk. Each word sits in an ordering, most general first: animals, mammals, rabbits, young rabbits, baby rabbits; consume, drink, slurp; something, liquid, milk, Lucerne. The frames step the subject between "young rabbits" and "rabbits" under each quantifier.]
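
The deletion rule can be sketched in Java. This is a minimal illustration under stated assumptions, not the CoreNLP natlog implementation: the `Polarity` enum, the `Projection` table and `canDeleteSubjectModifier` are hypothetical names, and only the two quantifiers from the slides are covered. Deleting a modifier (young rabbits to rabbits) moves the phrase up the ordering, so the sketch allows it only where the quantifier gives its subject upward polarity.

```java
import java.util.HashMap;
import java.util.Map;

class PolarityDemo {
    enum Polarity { UP, DOWN }

    /** Polarity a quantifier projects onto its subject and its object (hypothetical table). */
    static final class Projection {
        final Polarity subject;
        final Polarity object;

        Projection(Polarity subject, Polarity object) {
            this.subject = subject;
            this.object = object;
        }
    }

    static final Map<String, Projection> QUANTIFIERS = new HashMap<>();
    static {
        // "Some" is upward in both arguments; "All" is downward in its subject.
        QUANTIFIERS.put("some", new Projection(Polarity.UP, Polarity.UP));
        QUANTIFIERS.put("all", new Projection(Polarity.DOWN, Polarity.UP));
    }

    /** Dropping a subject modifier generalizes the phrase, so it is valid only under upward polarity. */
    static boolean canDeleteSubjectModifier(String quantifier) {
        Projection p = QUANTIFIERS.get(quantifier.toLowerCase());
        if (p == null) {
            throw new IllegalArgumentException("Unknown quantifier: " + quantifier);
        }
        return p.subject == Polarity.UP;
    }

    public static void main(String[] args) {
        // Some young rabbits drink milk => Some rabbits drink milk
        System.out.println("some: " + canDeleteSubjectModifier("Some"));
        // All young rabbits drink milk does not entail All rabbits drink milk
        System.out.println("all:  " + canDeleteSubjectModifier("All"));
    }
}
```

A fuller version would also track polarity on the verb and object and handle negation, which this sketch leaves out.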

Approach Open IE As Entailment

Challenge: Long Sentences

Yield short, entailed clauses from sentences.

Challenge: Lost Context

Shorten these clauses only when logically valid.

Challenge: Too Much Context

Shorten these clauses as much as possible.

No Longer A Challenge

Segment these short clauses into triples.

No Longer A Challenge

Heinz Fischer visited US ⇒ (Heinz Fischer; visited; US)
Obama born in Hawaii ⇒ (Obama; born in; Hawaii)
Cats are cute ⇒ (Cats; are; cute)
Cats are sitting next to dogs ⇒ (Cats; are sitting next to; dogs)

. . .

6 dependency patterns (+ 8 nominal patterns)
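
To make the clause-to-triple step concrete, here is a toy Java sketch. It does not reproduce the six dependency patterns from the talk: it swaps in a much simpler surface rule over hand-supplied coarse tags (`N` nominal, `V` verb, `P` preposition or particle, `A` adjective), and the `TripleSketch`, `Triple` and `segment` names are hypothetical. The subject is everything before the first verb, the relation is the following run of verbs and prepositions, and the object is the rest.

```java
import java.util.Arrays;
import java.util.List;

class TripleSketch {
    /** A (subject; relation; object) triple. */
    static final class Triple {
        final String subject;
        final String relation;
        final String object;

        Triple(String subject, String relation, String object) {
            this.subject = subject;
            this.relation = relation;
            this.object = object;
        }

        @Override
        public String toString() {
            return "(" + subject + "; " + relation + "; " + object + ")";
        }
    }

    /** Splits a short clause into a triple using one coarse tag per token. */
    static Triple segment(List<String> tokens, List<String> tags) {
        if (tokens.size() != tags.size()) {
            throw new IllegalArgumentException("One tag per token is required");
        }
        int relStart = tags.indexOf("V");
        if (relStart <= 0) {
            throw new IllegalArgumentException("Clause needs a subject followed by a verb");
        }
        // Extend the relation over the contiguous run of verbs and prepositions.
        int relEnd = relStart;
        while (relEnd < tags.size()
                && (tags.get(relEnd).equals("V") || tags.get(relEnd).equals("P"))) {
            relEnd++;
        }
        if (relEnd == tags.size()) {
            throw new IllegalArgumentException("Clause has no object");
        }
        return new Triple(
                String.join(" ", tokens.subList(0, relStart)),
                String.join(" ", tokens.subList(relStart, relEnd)),
                String.join(" ", tokens.subList(relEnd, tokens.size())));
    }

    public static void main(String[] args) {
        System.out.println(segment(
                Arrays.asList("Cats", "are", "sitting", "next", "to", "dogs"),
                Arrays.asList("N", "V", "V", "P", "P", "N")));
        System.out.println(segment(
                Arrays.asList("Cats", "are", "cute"),
                Arrays.asList("N", "V", "A")));
    }
}
```

A surface rule like this only works because the clauses are already short; that is the point of doing the shortening first.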

Useful Without Triples

Simple, short sentences are themselves useful

. . . for relation extraction (Miwa et al., 2010).

. . . for textual entailment (Hickl and Bensley, 2007).

. . . for summarization (Siddharthan et al., 2004).

Two use-cases:

Triples for Logical Reasoning

Text for Surface Reasoning

Problem

How do you evaluate open domain triples?

Extrinsic Evaluation: Knowledge Base Population

Unstructured Text

Structured Knowledge Base

Relation Extraction Task:

Fixed schema of 41 relations.

Precision: answers marked correct by humans.

Recall: answers returned by any team (including LDC annotators).

Comparison: Open Information Extraction to KBP Relations in 3 Hours (Soderland et al.).

Prerequisite Task: Open IE → KBP Relations

1. Hand-coded mapping. (Same as UW; both over 1-2 weeks)

2. Learned relation mapping.
   For each type signature t_1, t_2;
   for an open IE relation r_o and KBP relation r_k;
   compute:

   \[ p(r_k, r_o \mid t_1, t_2) = \frac{\mathrm{count}(r_k, r_o, t_1, t_2)}{\sum_{r'_k, r'_o} \mathrm{count}(r'_k, r'_o, t_1, t_2)}. \]

   Rank by PMI²(r_o, r_k | t_1, t_2):

   \[ \mathrm{PMI}^2(r_k, r_o \mid t_1, t_2) = \log \left( \frac{p(r_k, r_o \mid t_1, t_2)^2}{p(r_k \mid t_1, t_2) \cdot p(r_o \mid t_1, t_2)} \right). \]
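
The ranking score can be computed directly from co-occurrence counts. The Java sketch below assumes the counts for one fixed type signature (t_1, t_2) are held in a nested map, `counts.get(rk).get(ro)`. The `Pmi2Sketch` class, the map layout and the toy counts are all hypothetical and not taken from the talk; the marginals p(r_k | t_1, t_2) and p(r_o | t_1, t_2) are obtained by summing the joint counts.

```java
import java.util.HashMap;
import java.util.Map;

class Pmi2Sketch {
    /**
     * PMI^2 of a KBP relation rk and an open IE relation ro for one type signature.
     * counts.get(rk).get(ro) is the number of times the pair was observed together.
     */
    static double pmi2(Map<String, Map<String, Integer>> counts, String rk, String ro) {
        double total = 0, joint = 0, rkTotal = 0, roTotal = 0;
        for (Map.Entry<String, Map<String, Integer>> row : counts.entrySet()) {
            boolean sameRk = row.getKey().equals(rk);
            for (Map.Entry<String, Integer> cell : row.getValue().entrySet()) {
                boolean sameRo = cell.getKey().equals(ro);
                int c = cell.getValue();
                total += c;
                if (sameRk) rkTotal += c;
                if (sameRo) roTotal += c;
                if (sameRk && sameRo) joint += c;
            }
        }
        if (joint == 0) {
            return Double.NEGATIVE_INFINITY; // the pair never co-occurs
        }
        double pJoint = joint / total;
        return Math.log(pJoint * pJoint / ((rkTotal / total) * (roTotal / total)));
    }

    /** Made-up counts for a (person, date) signature, for illustration only. */
    static Map<String, Map<String, Integer>> toyCounts() {
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        Map<String, Integer> birth = new HashMap<>();
        birth.put("be bear on", 8);
        Map<String, Integer> death = new HashMap<>();
        death.put("die on", 6);
        death.put("be bear on", 2);
        counts.put("per:date_of_birth", birth);
        counts.put("per:date_of_death", death);
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> counts = toyCounts();
        System.out.println(pmi2(counts, "per:date_of_birth", "be bear on"));
        System.out.println(pmi2(counts, "per:date_of_death", "be bear on"));
    }
}
```

Squaring the joint probability favours frequent pairs over plain PMI, which tends to reward rare co-occurrences.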

KBP Relation         Open IE Relation        PMI²
Per:Date Of Birth    be bear on              1.83
                     bear on                 1.28
Per:Date Of Death    die on                  0.70
                     be assassinate on       0.65
Per:LOC Of Birth     be bear in              1.21
Per:LOC Of Death     *elect president of     2.89
Per:Religion         speak about             0.67
                     popular for             0.60
Per:Parents          daughter of             0.54
                     son of                  1.52
Per:LOC Residence    of                      1.48
                     *independent from       1.18

Results

TAC-KBP 2013 Slot Filling Challenge:

End-to-end task – includes IR + consistency.

Precision: facts LDC evaluators judged as correct.
Recall: facts other teams (including LDC annotators) also found.

System            P      R      F1
UW Submission     69.8   11.4   19.6
Ollie             57.7   11.8   19.6
Our System        61.9   13.9   22.7
Median Team                     18.6
Our System + +    58.6   18.6   28.3
Top Team          45.7   35.8   40.2
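
As a quick arithmetic check, F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The Java sketch below recomputes it from the P and R columns; the `F1Check` class is a hypothetical helper. Recomputing from the rounded P and R agrees with the reported F1 to within about 0.1, and the small gaps presumably come from the reported scores being computed before rounding.

```java
class F1Check {
    /** Harmonic mean of precision and recall, on the same scale as its inputs. */
    static double f1(double precision, double recall) {
        if (precision + recall == 0) {
            return 0.0;
        }
        return 2 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        String[] systems = {"UW Submission", "Ollie", "Our System", "Our System + +", "Top Team"};
        double[][] pr = {{69.8, 11.4}, {57.7, 11.8}, {61.9, 13.9}, {58.6, 18.6}, {45.7, 35.8}};
        for (int i = 0; i < systems.length; i++) {
            System.out.printf("%-16s %.2f%n", systems[i], f1(pr[i][0], pr[i][1]));
        }
    }
}
```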

Takeaways

Open IE is a sentence simplification task

Sentence simplification is an entailment task

Put burden on Open IE, not downstream tasks

Released in Stanford CoreNLP: http://nlp.stanford.edu/software/openie.shtml

annotators = tokenize,ssplit,pos,lemma,parse,natlog,openie

Collection<RelationTriple> triples = sentence.get(NaturalLogicAnnotations.RelationTriplesAnnotation.class);
