Inductive Entity Typing Alignment


Giuseppe Rizzo, Marieke van Erp, Raphaël Troncy

@merpeltje @rtroncy @giusepperizzo

Oct 20, 2014, 2nd International Workshop on Linked Data for Information Extraction (LD4IE'14)

Research Question

Can we inductively learn the mappings from the classification output of a NER system to the target schema of a given NER benchmark?


Inductive Alignment

entity   GS (expected_type)   extractor_type
e1       conll2003:MISC       dbo:TopicalConcept
e2       conll2003:MISC       dbo:TopicalConcept
...
en       conll2003:MISC       dbo:TopicalConcept

∀ ei : dbo:TopicalConcept → conll2003:MISC ?
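The ∀ condition above can be read as a simple check: an extractor type is aligned to a GS type when every entity carrying that extractor type has the same GS type. A minimal Python sketch, assuming the data are available as (entity, GS type, extractor type) triples; the function name `induce_unanimous_mappings` is hypothetical and not taken from the authors' code.

```python
from collections import defaultdict


def induce_unanimous_mappings(rows):
    """Map an extractor type to a GS type only when every entity carrying
    that extractor type has the same GS type (the ∀ condition).

    rows: iterable of (entity, gs_type, extractor_type) triples.
    """
    seen = defaultdict(set)  # extractor_type -> GS types observed with it
    for _entity, gs_type, extractor_type in rows:
        seen[extractor_type].add(gs_type)
    # keep only extractor types that co-occur with exactly one GS type
    return {t: next(iter(gs)) for t, gs in seen.items() if len(gs) == 1}


rows = [
    ("e1", "conll2003:MISC", "dbo:TopicalConcept"),
    ("e2", "conll2003:MISC", "dbo:TopicalConcept"),
    ("en", "conll2003:MISC", "dbo:TopicalConcept"),
]
print(induce_unanimous_mappings(rows))  # {'dbo:TopicalConcept': 'conll2003:MISC'}
```

This rule is strict: a single entity with a different GS type blocks the mapping for that extractor type, which is why a frequency-based relaxation is useful in practice.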


Why?

to reduce the need for manual mapping

to offer an opportunity to "fix" mistakes made by the extractors during the schema mapping step

to compare results of different extractors


What is AlchemyAPI classifying as City in CoNLL2003?

Target schema: CoNLL2003, train set

T_GS    f_a
LOC     2155
ORG     1038
MISC    33
PER     78
OTHER   29

In about 35% of the cases (1,178 of 3,333), AlchemyAPI types as City an entity that is not a LOC
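The shares behind the table can be checked with a few lines of Python; the dictionary simply copies the f_a counts of CoNLL2003 GS types for the entities AlchemyAPI types as City.

```python
# f_a counts of CoNLL2003 GS types for entities typed City by AlchemyAPI (train set)
freq = {"LOC": 2155, "ORG": 1038, "MISC": 33, "PER": 78, "OTHER": 29}

total = sum(freq.values())
for gs_type, count in sorted(freq.items(), key=lambda kv: -kv[1]):
    print(f"{gs_type:<6}{count:>6}{count / total:>8.1%}")

non_loc = total - freq["LOC"]
print(f"not LOC: {non_loc} of {total} ({non_loc / total:.1%})")
```

The last line prints `not LOC: 1178 of 3333 (35.3%)`.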


Is Younes El Aynaoui a City?

Token      GS    AlchemyAPI
Martin     PER   Person
Damm       PER   Person
(          O     O
Czech      LOC   Country
Republic   LOC   Country
)          O     O
beat       O     O
6          O     O
-          O     O
Younes     PER   City
El         PER   City
Aynaoui    PER   City


$A : T_S \rightarrow T_{GS}$

$O_S$: the schema used by the source extractor (S)

$(E,T)_S$: ordered list of <entity, type> pairs given by S

$(E,T)_{GS}$: ordered list of <entity, type> pairs listed in the Gold Standard (GS)
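To make the notation concrete, the alignment A can be treated as a dictionary from extractor types to GS types and applied position by position to $(E,T)_S$. A minimal sketch, assuming both lists are ordered over the same entities; `apply_alignment` and `agreement` are hypothetical helper names, and the handling of unmapped types is an assumption.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]      # <entity, type>
Alignment = Dict[str, str]  # A : T_S -> T_GS


def apply_alignment(A: Alignment, pairs_s: List[Pair]) -> List[Pair]:
    """Rewrite (E,T)_S into the GS schema using A.

    Types with no entry in A are kept unchanged (an assumption; the
    slide does not say how unmapped types are handled).
    """
    return [(entity, A.get(t, t)) for entity, t in pairs_s]


def agreement(pairs_a: List[Pair], pairs_gs: List[Pair]) -> float:
    """Share of positions where the aligned type equals the GS type.
    Assumes both lists are ordered over the same entities."""
    if len(pairs_a) != len(pairs_gs):
        raise ValueError("lists must cover the same entities")
    if not pairs_gs:
        return 0.0
    hits = sum(ta == tg for (_, ta), (_, tg) in zip(pairs_a, pairs_gs))
    return hits / len(pairs_gs)


pairs_s = [("Czech Republic", "Country"), ("Younes El Aynaoui", "City")]
pairs_gs = [("Czech Republic", "LOC"), ("Younes El Aynaoui", "PER")]
A = {"Country": "LOC", "City": "LOC"}

aligned = apply_alignment(A, pairs_s)
print(aligned)                      # [('Czech Republic', 'LOC'), ('Younes El Aynaoui', 'LOC')]
print(agreement(aligned, pairs_gs))  # 0.5
```

The example reproduces the Younes El Aynaoui case: a type-level mapping City → LOC cannot recover the correct PER label for that entity.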


Experimental Setup: Datasets

CoNLL2003: English newswire corpus

MSM2013: microposts corpus

Basave, A.E.C., Varga, A., Rowe, M., Stankovic, M., Dadzie, A.S. (2013), Making Sense of Microposts (#MSM2013) Concept Extraction Challenge. In: #MSM2013

Tjong Kim Sang, E.F., De Meulder, F. (2003), Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition. In: CoNLL’03

CoNLL2003 taxonomy: http://www.cnts.ua.ac.be/conll2003/ner/annotation.txt

MSM2013 taxonomy: http://oak.dcs.shef.ac.uk/msm2013/challenge.html



Experimental Setup: Extractors

Service: NER; settings: default; naming: alchemyapi

Service: Spotter; settings: confidence=0, support=0, spotter=CoOccurrenceBasedSelector,Version=0.6; naming: dbspotlight

Service: Lupedia; settings: default; naming: lupedia

Service: dataTXT NER; settings: min_confidence=0.6; naming: datatxt

Service: /services/rest/0.0/; settings: markup_limit=10; naming: zemanta

Service: English Semantic Metadata; settings: default; naming: opencalais

Service: NER; settings: default; naming: textrazor


Type Overlap in Train/Test

The overlap is computed as an exact match at the granularity of the types returned by the extractors. Figures are percentages.

Overlap in types between the training and test data for each extractor
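The slide does not spell out the formula, so the sketch below assumes the overlap is the percentage of distinct extractor types in the test data that also occur, as exact strings, in the training data. Under that reading, test types outside the overlap are the ones for which no mapping can be induced from training. `type_overlap` is a hypothetical name and the sample types are illustrative.

```python
def type_overlap(train_types, test_types):
    """Percentage of distinct test-set extractor types that also occur in the
    training set (exact string match). The denominator is an assumption."""
    train, test = set(train_types), set(test_types)
    if not test:
        return 0.0
    return 100.0 * len(train & test) / len(test)


train = ["City", "Country", "Person", "City"]
test = ["City", "Person", "Company"]
print(f"{type_overlap(train, test):.1f}")  # 66.7
```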


Inductive Matcher

Statistical Induction: most frequent type → it always considers alchemyapi:City as LOC

Machine Learning Induction (k-NN, NB): predicts the best alignment based on the set of samples learned from the training set → it maps alchemyapi:City to a type t according to the prior knowledge and the ML method used


Statistical Induction

for e in extractors; do
    # step_1: NER on the training set
    for t_S in T_S; do
        compute the most frequent T_GS type given to the entities that e types as t_S
    done
    # step_2: NER on the test set
    for t_S in T_S; do
        replace t_S with the T_GS type learned in step_1
    done
done
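The two steps of the statistical induction can be sketched in Python as follows. This is an illustration, not the authors' implementation: the function names are hypothetical, the training data is assumed to be a list of (extractor type, GS type) pairs per extractor, and extractor types never seen in training are assumed to receive no mapping (None).

```python
from collections import Counter, defaultdict


def learn_most_frequent(train_pairs):
    """step_1: map each extractor type to the GS type it co-occurs with most often.

    train_pairs: iterable of (extractor_type, gs_type) pairs from the training set.
    Ties are broken by first occurrence (Counter.most_common ordering).
    """
    counts = defaultdict(Counter)
    for extractor_type, gs_type in train_pairs:
        counts[extractor_type][gs_type] += 1
    return {t: c.most_common(1)[0][0] for t, c in counts.items()}


def relabel(test_types, mapping):
    """step_2: replace each extractor type in the test set with its learned GS type.
    Types never seen in training get None (an assumption)."""
    return [mapping.get(t) for t in test_types]


def statistical_induction(train_by_extractor, test_by_extractor):
    """Run step_1 and step_2 for every extractor (outer loop of the pseudocode)."""
    results = {}
    for e, train_pairs in train_by_extractor.items():
        mapping = learn_most_frequent(train_pairs)
        results[e] = relabel(test_by_extractor.get(e, []), mapping)
    return results


# Toy data built from the City counts of the AlchemyAPI/CoNLL2003 slide.
city_counts = {"LOC": 2155, "ORG": 1038, "MISC": 33, "PER": 78, "OTHER": 29}
train = {"alchemyapi": [("City", g) for g, n in city_counts.items() for _ in range(n)]}
test = {"alchemyapi": ["City", "Country"]}
print(statistical_induction(train, test))  # {'alchemyapi': ['LOC', None]}
```

As on the Inductive Matcher slide, City is always mapped to LOC, regardless of the 1,178 training entities that carry another GS type.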


Statistical Induction

F1 of correct mappings over the test sets. Figures are percentages.
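The slide does not define precision and recall for mappings, so the sketch below makes one plausible choice explicit: precision is measured over the entities that received a mapping, recall over all GS entities, and F1 is their harmonic mean $F_1 = 2PR/(P+R)$. `mapping_f1` is a hypothetical name.

```python
def mapping_f1(predicted, gold):
    """F1 (in percent) of induced type mappings against the GS.

    predicted: GS-schema types produced by the mapping, None = no mapping.
    gold: GS types for the same entities, in the same order.
    Precision counts only entities that received a mapping; recall counts
    all GS entities (both are assumptions about the paper's definition).
    """
    mapped = [(p, g) for p, g in zip(predicted, gold) if p is not None]
    correct = sum(p == g for p, g in mapped)
    precision = correct / len(mapped) if mapped else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 100 * 2 * precision * recall / (precision + recall)


print(round(mapping_f1(["LOC", "LOC", None, "PER"], ["LOC", "PER", "ORG", "PER"]), 1))  # 57.1
```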


Machine Learning Induction

Training vector: feature vector (extractor type, token) → class (GS type)

AlchemyAPI   Token      GS
Person       Martin     PER
Person       Damm       PER
O            (          O
Country      Czech      LOC
Country      Republic   LOC
O            )          O
O            beat       O
O            6          O
O            -          O
City         Younes     PER
City         El         PER
City         Aynaoui    PER

Excerpt of the training vectors using AlchemyAPI over the CoNLL2003 training set
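To illustrate how an ML method can map City to different GS types depending on the token, here is a from-scratch categorical Naive Bayes trained on the excerpt above. It is a sketch only: the slides name k-NN and NB but not the toolkit or features actually used, and `CategoricalNB` here is a hypothetical class with plain add-one smoothing.

```python
import math
from collections import Counter, defaultdict


class CategoricalNB:
    """Naive Bayes over categorical features with add-one (Laplace) smoothing."""

    def fit(self, X, y):
        self.class_counts = Counter(y)
        self.n = len(y)
        # feature_counts[label][i][value]: how often feature i took `value` for `label`
        self.feature_counts = defaultdict(lambda: defaultdict(Counter))
        self.values = defaultdict(set)  # distinct values seen for feature i
        for xs, label in zip(X, y):
            for i, v in enumerate(xs):
                self.feature_counts[label][i][v] += 1
                self.values[i].add(v)
        return self

    def predict(self, xs):
        best, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / self.n)  # log prior
            for i, v in enumerate(xs):
                num = self.feature_counts[label][i][v] + 1
                den = count + len(self.values[i])
                score += math.log(num / den)  # smoothed log likelihood
            if score > best_score:
                best, best_score = label, score
        return best


rows = [  # (AlchemyAPI type, token, GS type), from the excerpt above
    ("Person", "Martin", "PER"), ("Person", "Damm", "PER"), ("O", "(", "O"),
    ("Country", "Czech", "LOC"), ("Country", "Republic", "LOC"), ("O", ")", "O"),
    ("O", "beat", "O"), ("O", "6", "O"), ("O", "-", "O"),
    ("City", "Younes", "PER"), ("City", "El", "PER"), ("City", "Aynaoui", "PER"),
]
X = [(t, tok) for t, tok, _ in rows]
y = [g for _, _, g in rows]
model = CategoricalNB().fit(X, y)
print(model.predict(("City", "Aynaoui")))   # PER
print(model.predict(("Country", "Czech")))  # LOC
```

On this tiny sample City is associated with PER, the opposite of the most-frequent-type rule learned on the full training set; with real data the learned association depends on the full sample and the chosen method.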


Traversing the classification schemas

Extractor Type (aka URIType): simply consider the type given by the extractor

Extractor Type → NERDType: map the extractor type to the corresponding class of the extended NERD Ontology v0.5 (http://nerd.eurecom.fr/ontology)

Extractor Type → URIType First: map the extractor type to its super class

Extractor Type → URIType Second: map the extractor type to its super-super class

Extractor Type → URIType Third: map the extractor type to its super-super-super class
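The URIType First/Second/Third variants amount to climbing one, two or three levels up the extractor's type hierarchy. A minimal sketch over a hypothetical, DBpedia-style child → parent fragment; the behaviour when the root is reached early is an assumption.

```python
# Hypothetical, DBpedia-style fragment of a type hierarchy (child -> parent).
PARENT = {
    "City": "Settlement",
    "Settlement": "PopulatedPlace",
    "PopulatedPlace": "Place",
}


def generalize(t, levels):
    """Climb `levels` steps up the hierarchy.

    levels=0 keeps the extractor type (URIType), 1 = URIType First (super
    class), 2 = Second, 3 = Third. If the root is reached early, the root is
    returned (an assumption; the slide does not cover shallow hierarchies).
    """
    for _ in range(levels):
        if t not in PARENT:
            break
        t = PARENT[t]
    return t


print([generalize("City", k) for k in range(4)])
# ['City', 'Settlement', 'PopulatedPlace', 'Place']
```

Coarser levels reduce the number of distinct extractor types, which makes mappings easier to learn but less discriminative.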


Machine Learning Induction results

The full list of plots is reported in: Rizzo G., van Erp M., Troncy R. (2014), Inductive Entity Typing Alignment. In: 2nd International Workshop on Linked Data for Information Extraction (LD4IE'14), co-located with ISWC'14, Riva del Garda, Italy

[Plots: CoNLL2003, MSM2013, and statistical induction on CoNLL2003]


Evaluation

NERD-ML (Rizzo G., van Erp M., Troncy R. (2014), Benchmarking the Extraction and Disambiguation of Named Entities on the Semantic Web. In: LREC'14)

Inputs: the induced mappings, which replace the extractor types

Training vector (CRF input): linguistic vector, extractor1 type, ..., extractorN type → GS type

Output: token1 → predicted_type, token2 → predicted_type, ...
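A minimal sketch of how one token's CRF input could be assembled once the induced mappings replace the raw extractor types. The feature names, the second extractor's output and the fallback to the raw type are all hypothetical; the CRF training itself is not shown.

```python
def build_vector(token_features, extractor_types, mappings):
    """Assemble one CRF input vector for a token.

    token_features: linguistic features of the token (hypothetical names)
    extractor_types: {extractor name: type it assigned to the token}
    mappings: {extractor name: {extractor type: GS type}}, induced beforehand
    """
    vector = dict(token_features)
    for name, t in extractor_types.items():
        # the induced GS type replaces the raw extractor type; if no mapping
        # was induced, the raw type is kept (an assumption)
        vector[name] = mappings.get(name, {}).get(t, t)
    return vector


vector = build_vector(
    {"token": "Younes", "is_capitalized": True},
    {"alchemyapi": "City", "opencalais": "Person"},
    {"alchemyapi": {"City": "LOC"}, "opencalais": {"Person": "PER"}},
)
print(vector)
# {'token': 'Younes', 'is_capitalized': True, 'alchemyapi': 'LOC', 'opencalais': 'PER'}
```

Because the extractors can disagree after mapping, as here, the CRF can weigh them against each other and against the linguistic features.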


Results

[Results: CoNLL2003, MSM2013]


Is the Gold Standard Gold?

Token        GS    AlchemyAPI
Gothenburg   ORG   City
Gothenburg   LOC   City
Gothenburg   ORG   City
...
Amsterdam    ORG   City
AMSTERDAM    LOC   City
AMSTERDAM    LOC   City

(CoNLL2003)

Aroyo L., Truth is a Lie: 7 Myths about Human Annotation, Cognitive Computing Forum 2014
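Inconsistencies like these can be surfaced automatically by grouping GS annotations by surface form. A minimal sketch, assuming (token, GS type) pairs and case-insensitive matching; `inconsistent_labels` is a hypothetical name. Flagged forms still need manual review, since a city name can legitimately denote an organization such as a sports club.

```python
from collections import defaultdict


def inconsistent_labels(rows):
    """rows: (token, gs_type) pairs. Return surface forms (case-folded)
    that are annotated with more than one GS type."""
    labels = defaultdict(set)
    for token, gs_type in rows:
        labels[token.casefold()].add(gs_type)
    return {tok: sorted(types) for tok, types in labels.items() if len(types) > 1}


rows = [
    ("Gothenburg", "ORG"), ("Gothenburg", "LOC"), ("Gothenburg", "ORG"),
    ("Amsterdam", "ORG"), ("AMSTERDAM", "LOC"), ("AMSTERDAM", "LOC"),
]
print(inconsistent_labels(rows))
# {'gothenburg': ['LOC', 'ORG'], 'amsterdam': ['LOC', 'ORG']}
```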


Future Work

Refining the alignment strategy by giving priority to the extractors that perform better

Applying an ensemble learning strategy over different models (one for each extractor)

Fixing the GS with the induced mappings


Acknowledgments

The research leading to this paper was partially supported by the European Union’s 7th Framework Programme via the projects LinkedTV (GA 287911) and NewsReader (ICT-316404)


Thank You For Listening

http://www.slideshare.net/giusepperizzo

http://github.com/giusepperizzo/nerd-inductive
