Inductive Entity Typing Alignment

Giuseppe Rizzo, Marieke van Erp, Raphaël Troncy (@giusepperizzo, @merpeltje, @rtroncy)


Talk "Inductive Entity Typing Alignment" at LD4IE2014 (ISWC'14), Riva del Garda, Italy

Transcript of Inductive Entity Typing Alignment

Page 1: Inductive Entity Typing Alignment

Inductive Entity Typing Alignment

Giuseppe Rizzo, Marieke van Erp, Raphaël Troncy

@giusepperizzo @merpeltje @rtroncy

Page 2: Inductive Entity Typing Alignment

Oct 20, 2014 | 2/22 | 2nd International Workshop on Linked Data for Information Extraction (LD4IE'14)

Research Question

Can we inductively learn the mappings from the classification output of a NER system to the target schema of a given NER benchmark?

Page 3: Inductive Entity Typing Alignment


Inductive Alignment

entity   GS (expected_type)   extractor_type
e1       conll2003:MISC       dbo:TopicalConcept
e2       conll2003:MISC       dbo:TopicalConcept
...
en       conll2003:MISC       dbo:TopicalConcept

∀ ei : dbo:TopicalConcept → conll2003:MISC ?

Page 4: Inductive Entity Typing Alignment


Why?

to reduce the need for manual mapping

to offer an opportunity to "fix" mistakes made by the extractors during the schema mapping step

to compare results of different extractors

Page 5: Inductive Entity Typing Alignment


What is AlchemyAPI classifying as City in CoNLL2003?

Target schema: CoNLL2003, train set

T_GS    f_a
LOC     2155
ORG     1038
MISC    33
PER     78
OTHER   29

In about 35% of cases (1,178 of 3,333 by the counts above), AlchemyAPI types as City something that is not a LOC
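The counts above can be turned into relative frequencies with a few lines of Python. This is only a sketch for checking the distribution; the variable and function names are illustrative and not taken from the talk.

```python
from collections import Counter

# Gold-standard types of the entities AlchemyAPI typed as City
# (counts copied from the slide; CoNLL2003 train set).
city_counts = Counter({"LOC": 2155, "ORG": 1038, "MISC": 33, "PER": 78, "OTHER": 29})


def type_shares(counts):
    """Return the relative frequency of each gold-standard type."""
    total = sum(counts.values())
    return {gs_type: n / total for gs_type, n in counts.items()}


shares = type_shares(city_counts)
non_loc_share = 1.0 - shares["LOC"]

# Print the types from most to least frequent, then the non-LOC share.
for gs_type, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{gs_type:5s} {share:6.1%}")
print(f"not LOC: {non_loc_share:.1%}")
```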

Page 6: Inductive Entity Typing Alignment


Is Younes El Aynaoui a City?

Token      GS    AlchemyAPI
Martin     PER   Person
Damm       PER   Person
(          O     O
Czech      LOC   Country
Republic   LOC   Country
)          O     O
beat       O     O
6          O     O
-          O     O
Younes     PER   City
El         PER   City
Aynaoui    PER   City

Page 7: Inductive Entity Typing Alignment


A : T_S → T_GS

O_S : the schema used by the source extractor (S)

(E,T)_S : ordered list of <entity, type> pairs given by S

(E,T)_GS : ordered list of <entity, type> pairs listed in the Gold Standard (GS)

Page 8: Inductive Entity Typing Alignment


Experimental Setup: Datasets

CoNLL2003 – English newswire corpus

Tjong Kim Sang, E.F., De Meulder, F. (2003), Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition. In: CoNLL'03

Taxonomy: http://www.cnts.ua.ac.be/conll2003/ner/annotation.txt

MSM2013 – microposts corpus

Basave, A.E.C., Varga, A., Rowe, M., Stankovic, M., Dadzie, A.S. (2013), Making Sense of Microposts (#MSM2013) Concept Extraction Challenge. In: #MSM2013

Taxonomy: http://oak.dcs.shef.ac.uk/msm2013/challenge.html

Page 9: Inductive Entity Typing Alignment


Experimental Setup: Extractors

Service: NER; settings: default; naming: alchemyapi

Service: Spotter; settings: confidence=0, support=0, spotter=CoOccurrenceBasedSelector,Version=0.6; naming: dbspotlight

Service: Lupedia; settings: default; naming: lupedia

Service: dataTXT NER; settings: min_confidence=0.6; naming: datatxt

Service: /services/rest/0.0/; Settings: markup_limit=10; settings: default; naming: zemanta

Service: English Semantic Metadata; settings: default; naming: opencalais

Service: NER; settings: default; naming: textrazor

Page 10: Inductive Entity Typing Alignment


Types overlap in Train/Test

The overlap is computed at exact match of the type granularity level returned by the extractors. Figures are in percentage.

Overlap in types between the training and test data for each extractor
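One possible reading of this overlap measure is sketched below. It assumes the overlap is the percentage of distinct test-set types that also occur, with exactly the same label, in the training set; the slide does not spell out the formula, so this definition, the function name and the sample labels are all assumptions.

```python
def type_overlap(train_types, test_types):
    """Percentage of distinct test types that also occur verbatim in training.

    Assumed definition: the slide only says the overlap is computed
    at exact match of the type label.
    """
    train, test = set(train_types), set(test_types)
    if not test:
        return 0.0
    return 100.0 * len(train & test) / len(test)


# Hypothetical type labels, for illustration only.
train = ["City", "Person", "Country", "Company"]
test = ["City", "Person", "Organization", "City"]

overlap = type_overlap(train, test)
print(f"{overlap:.1f}%")
```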

Page 11: Inductive Entity Typing Alignment


Inductive Matcher

Statistical Induction
– Most frequent type → it always considers alchemyapi:City as LOC

Machine Learning Induction (k-NN, NB)
– Predict the best alignment based on the set of samples learned from the training → it maps alchemyapi:City to a type t according to the prior knowledge and the ML method used

Page 12: Inductive Entity Typing Alignment


Statistical Induction

for e in extractors; do
    # step_1: NER on the training set
    for each type returned by e; do
        compute the most frequent GS type among the entities given that type by the extractor
    done
    # step_2: NER on the test set
    for each entity typed by e; do
        replace its type with the GS type learned in step_1
    done
done
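The two steps can be sketched in Python for a single extractor as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the fallback label "O" for unseen types is an assumption, and the training pairs are rebuilt from the City counts and the token excerpt shown on earlier slides.

```python
from collections import Counter, defaultdict


def learn_mapping(training_pairs):
    """Step 1: for each extractor type, keep the most frequent GS type."""
    by_type = defaultdict(Counter)
    for extractor_type, gs_type in training_pairs:
        by_type[extractor_type][gs_type] += 1
    return {t: c.most_common(1)[0][0] for t, c in by_type.items()}


def apply_mapping(mapping, extractor_types, default="O"):
    """Step 2: retype test annotations with the mapping learned in step 1.

    Types never seen in training fall back to `default` (an assumption).
    """
    return [mapping.get(t, default) for t in extractor_types]


# (extractor type, GS type) pairs: City counts from the CoNLL2003 train set
# slide, Person and Country rows from the token excerpt.
city_counts = {"LOC": 2155, "ORG": 1038, "MISC": 33, "PER": 78, "OTHER": 29}
training_pairs = [("City", gs) for gs, n in city_counts.items() for _ in range(n)]
training_pairs += [("Person", "PER")] * 2 + [("Country", "LOC")] * 2

mapping = learn_mapping(training_pairs)
predicted = apply_mapping(mapping, ["City", "Person", "UnseenType"])
print(mapping)
print(predicted)
```

With this strategy every alchemyapi:City annotation becomes LOC, including the cases where the gold standard says PER or ORG.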

Page 13: Inductive Entity Typing Alignment


Statistical Induction

F1 of correct mappings over the test sets. Figures are in percentage.

Page 14: Inductive Entity Typing Alignment


Machine Learning Induction

training vector: token, extractor type, GS type (class)

Feature Vector

AlchemyAPI   Token      GS
Person       Martin     PER
Person       Damm       PER
O            (          O
Country      Czech      LOC
Country      Republic   LOC
O            )          O
O            beat       O
O            6          O
O            -          O
City         Younes     PER
City         El         PER
City         Aynaoui    PER

Excerpt of the training vectors using AlchemyAPI over the CoNLL2003 training set
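To make the idea concrete, here is a small k-NN sketch over the training vectors in the excerpt. It is written in plain Python rather than with the toolkit used in the paper; the Hamming distance, k=3, and the query token "Slovakia" are assumptions chosen for illustration.

```python
from collections import Counter

# ((extractor type, token), GS type), copied from the excerpt above.
TRAIN = [
    (("Person", "Martin"), "PER"),
    (("Person", "Damm"), "PER"),
    (("O", "("), "O"),
    (("Country", "Czech"), "LOC"),
    (("Country", "Republic"), "LOC"),
    (("O", ")"), "O"),
    (("O", "beat"), "O"),
    (("O", "6"), "O"),
    (("O", "-"), "O"),
    (("City", "Younes"), "PER"),
    (("City", "El"), "PER"),
    (("City", "Aynaoui"), "PER"),
]


def hamming(a, b):
    """Number of feature positions on which two vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(features, train=TRAIN, k=3):
    """Majority GS type among the k training vectors closest to `features`."""
    nearest = sorted(train, key=lambda row: hamming(features, row[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(knn_predict(("City", "Younes")))       # token seen in training
print(knn_predict(("Country", "Slovakia")))  # hypothetical unseen token
```

Unlike the most-frequent-type rule, the prediction here depends on the token as well as on the extractor type, so the same extractor type can be aligned to different target types.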

Page 15: Inductive Entity Typing Alignment


Traversing the classification schemas

Extractor Type (aka URIType)
– simply consider the type given by the extractor

Extractor Type → NERDType
– map the extractor type to the corresponding class of the extended NERD Ontology v0.5 http://nerd.eurecom.fr/ontology

Extractor Type → URIType First
– map the extractor type to its super class

Extractor Type → URIType Second
– map the extractor type to its super-super class

Extractor Type → URIType Third
– map the extractor type to its super-super-super class
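The URIType First/Second/Third variants amount to walking up the extractor's class hierarchy by one, two or three levels. A minimal sketch follows, assuming the hierarchy is available as a child-to-parent dictionary; the fragment below is illustrative, and stopping at the topmost class when the hierarchy is shallower than requested is an assumption.

```python
# Illustrative fragment of a type hierarchy (child -> parent).
SUPERCLASS = {
    "dbo:City": "dbo:Settlement",
    "dbo:Settlement": "dbo:PopulatedPlace",
    "dbo:PopulatedPlace": "dbo:Place",
}


def uri_type(extractor_type, level, superclass=SUPERCLASS):
    """Return the ancestor `level` steps above `extractor_type`.

    level=0 is the extractor type itself (URIType), level=1 its super class
    (URIType First), and so on. If the hierarchy ends early, the topmost
    class reached is returned (an assumption of this sketch).
    """
    current = extractor_type
    for _ in range(level):
        parent = superclass.get(current)
        if parent is None:
            break
        current = parent
    return current


for level in range(4):
    print(level, uri_type("dbo:City", level))
```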

Page 16: Inductive Entity Typing Alignment


Machine Learning Induction results

The full list of plots is reported in: Rizzo G., van Erp M., Troncy R. (2014), Inductive Entity Typing Alignment. In (ISWC'14) 2nd International Workshop on Linked Data for Information Extraction (LD4IE'14), Riva del Garda, Italy

[Plots: CoNLL2003 and MSM2013; statistical induction on CoNLL2003]

Page 17: Inductive Entity Typing Alignment


Evaluation

NERD-ML

Rizzo G., van Erp M., Troncy R. (2014), Benchmarking the Extraction and Disambiguation of Named Entities on the Semantic Web. In LREC'14

Inputs: induced mappings. They replace the extractor types

training vector: linguistic vector, CRF, extractor1 type, ..., extractorN type, GS type

token1   predicted_type
token2   predicted_type
...
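The replacement of extractor types by induced mappings can be pictured as a lookup applied to each extractor's column before the vector reaches the classifier. The sketch below is a guess at that preprocessing step, not NERD-ML code: the function name, the fallback label "O" and the sample mappings are assumptions.

```python
def replace_types(extractor_types, induced_mappings, default="O"):
    """Map each extractor's raw type to its induced target-schema type.

    `extractor_types` holds one raw type per extractor for a single token;
    `induced_mappings` holds one learned mapping per extractor.
    Unknown extractors or types fall back to `default` (an assumption).
    """
    return {
        name: induced_mappings.get(name, {}).get(raw_type, default)
        for name, raw_type in extractor_types.items()
    }


# Hypothetical induced mappings, for illustration only.
induced = {
    "alchemyapi": {"City": "LOC", "Person": "PER"},
    "dbspotlight": {"dbo:TopicalConcept": "MISC"},
}
token_types = {
    "alchemyapi": "City",
    "dbspotlight": "dbo:TopicalConcept",
    "lupedia": "SomeUnmappedType",
}

features = replace_types(token_types, induced)
print(features)
```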

Page 18: Inductive Entity Typing Alignment


Results

[Plots: CoNLL2003, MSM2013]

Page 19: Inductive Entity Typing Alignment


Is the Gold Standard Gold?

CoNLL2003

Token        GS    AlchemyAPI
Gothenburg   ORG   City
Gothenburg   LOC   City
Gothenburg   ORG   City
...
Amsterdam    ORG   City
AMSTERDAM    LOC   City
AMSTERDAM    LOC   City

Aroyo L., Truth is a Lie: 7 Myths about Human Annotation, Cognitive Computing Forum 2014
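Tokens that receive more than one gold-standard type can be listed automatically. The sketch below groups the rows above by case-folded token; the function name is hypothetical, and a token flagged this way is only a candidate for review, since the same surface form can legitimately denote a place in one sentence and an organization in another.

```python
from collections import defaultdict

# (token, GS type, AlchemyAPI type), copied from the rows above.
rows = [
    ("Gothenburg", "ORG", "City"),
    ("Gothenburg", "LOC", "City"),
    ("Gothenburg", "ORG", "City"),
    ("Amsterdam", "ORG", "City"),
    ("AMSTERDAM", "LOC", "City"),
    ("AMSTERDAM", "LOC", "City"),
]


def inconsistent_tokens(rows):
    """Return tokens (case-folded) that carry more than one GS type."""
    labels = defaultdict(set)
    for token, gs_type, _extractor_type in rows:
        labels[token.lower()].add(gs_type)
    return {tok: sorted(types) for tok, types in labels.items() if len(types) > 1}


flagged = inconsistent_tokens(rows)
print(flagged)
```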

Page 20: Inductive Entity Typing Alignment


Future Work

Refining the alignment strategy by giving priority to the extractors that perform better

Applying an ensemble learning strategy over different models (for each extractor)

Fixing the GS with the induced mappings

Page 21: Inductive Entity Typing Alignment


Acknowledgments

The research leading to this paper was partially supported by the European Union's 7th Framework Programme via the projects LinkedTV (GA 287911) and NewsReader (ICT-316404)

Page 22: Inductive Entity Typing Alignment


Thank You For Listening

http://www.slideshare.net/giusepperizzo

http://github.com/giusepperizzo/nerd-inductive