Inductive Entity Typing Alignment
Giuseppe Rizzo, Marieke van Erp, Raphaël Troncy
@merpeltje @rtroncy @giusepperizzo
Oct 20, 2014, 2nd International Workshop on Linked Data for Information Extraction (LD4IE'14)
Research Question
Can we inductively learn the mappings from the classification output of a NER system to the target schema of a given NER benchmark?
Inductive Alignment
entity   GS (expected_type)   extractor_type
e_1      conll2003:MISC       dbo:TopicalConcept
e_2      conll2003:MISC       dbo:TopicalConcept
...
e_n      conll2003:MISC       dbo:TopicalConcept
$\forall e_i : \texttt{dbo:TopicalConcept} \rightarrow \texttt{conll2003:MISC}$ ?
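The question above asks whether every entity the extractor types as dbo:TopicalConcept carries one and the same GS type. A minimal sketch of that check, assuming each entity is given as an (extractor_type, gs_type) pair; the function name `universal_mapping` is hypothetical and not part of the authors' code.

```python
from typing import Iterable, Optional, Tuple

def universal_mapping(pairs: Iterable[Tuple[str, str]], extractor_type: str) -> Optional[str]:
    """Return the GS type shared by *all* entities that the extractor labelled
    `extractor_type`, or None if they disagree or if there are no such entities.

    `pairs` holds (extractor_type, gs_type) tuples, one per entity e_i."""
    gs_types = {gs for ext, gs in pairs if ext == extractor_type}
    # The universally quantified mapping holds only when exactly one GS type is observed.
    return next(iter(gs_types)) if len(gs_types) == 1 else None

observed = [
    ("dbo:TopicalConcept", "conll2003:MISC"),  # e_1
    ("dbo:TopicalConcept", "conll2003:MISC"),  # e_2
    ("dbo:TopicalConcept", "conll2003:MISC"),  # e_n
]
print(universal_mapping(observed, "dbo:TopicalConcept"))  # conll2003:MISC
```

A single counterexample is enough to break the universal quantifier, which is why a softer, frequency-based or learned criterion is more practical on real, noisy data.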
Why?
to reduce the need for manual mapping
to offer an opportunity to "fix" mistakes made by the extractors during the schema mapping step
to compare results of different extractors
What does AlchemyAPI classify as City in CoNLL2003?
Target schema: CoNLL2003, train set
T_GS    f_a (frequency)
LOC     2155
ORG     1038
MISC      33
PER       78
OTHER     29
Over a third of the tokens that AlchemyAPI types as City (1,178 of 3,333) are not LOC in the gold standard.
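The proportions behind the table can be checked with a few lines of Python; the counts are exactly the ones reported above and nothing else is assumed.

```python
# Counts copied from the table above: CoNLL2003 gold-standard type of the
# training tokens that AlchemyAPI types as City.
counts = {"LOC": 2155, "ORG": 1038, "MISC": 33, "PER": 78, "OTHER": 29}

total = sum(counts.values())
shares = {gs_type: n / total for gs_type, n in counts.items()}
not_loc = total - counts["LOC"]

print(total, not_loc)  # 3333 1178
# Print each GS type with its share, largest first.
for gs_type, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{gs_type:6s} {share:6.1%}")
```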
Is Younes El Aynaoui a City?
Token      GS    AlchemyAPI
Martin     PER   Person
Damm       PER   Person
(          O     O
Czech      LOC   Country
Republic   LOC   Country
)          O     O
beat       O     O
6          O     O
-          O     O
Younes     PER   City
El         PER   City
Aynaoui    PER   City
$A : T_S \rightarrow T_{GS}$, where
$O_S$: the schema used by the source extractor (S)
$(E,T)_S$: ordered list of ⟨entity, type⟩ pairs given by S
$(E,T)_{GS}$: ordered list of ⟨entity, type⟩ pairs listed in the Gold Standard (GS)
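Learning A starts from how often each extractor type co-occurs with each GS type. A minimal sketch, assuming $(E,T)_S$ and $(E,T)_{GS}$ are aligned position by position (the i-th pair of each list refers to the same entity mention); the helper name `cooccurrence` is hypothetical and the toy pairs come from the Younes El Aynaoui example.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (entity, type)

def cooccurrence(et_s: List[Pair], et_gs: List[Pair]) -> Dict[str, Counter]:
    """For each extractor type t_S, count the GS types assigned to the same entities.

    Assumes both ordered lists are aligned position by position."""
    if len(et_s) != len(et_gs):
        raise ValueError("(E,T)_S and (E,T)_GS must have the same length")
    table: Dict[str, Counter] = defaultdict(Counter)
    for (e_s, t_s), (e_gs, t_gs) in zip(et_s, et_gs):
        if e_s != e_gs:
            raise ValueError(f"misaligned entities: {e_s!r} vs {e_gs!r}")
        table[t_s][t_gs] += 1
    return table

et_s = [("Martin", "Person"), ("Czech", "Country"), ("Younes", "City")]
et_gs = [("Martin", "PER"), ("Czech", "LOC"), ("Younes", "PER")]
print(dict(cooccurrence(et_s, et_gs)["City"]))  # {'PER': 1}
```

Both the statistical and the machine-learning inductive matchers can be read as different ways of turning such counts (or the underlying samples) into the function A.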
Experimental Setup: Datasets
CoNLL2003: English newswire corpus
Tjong Kim Sang, E.F., Meulder, F.D. (2003), Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition. In: CoNLL'03
Taxonomy: http://www.cnts.ua.ac.be/conll2003/ner/annotation.txt
MSM2013: microposts corpus
Basave, A.E.C., Varga, A., Rowe, M., Stankovic, M., Dadzie, A.S. (2013), Making Sense of Microposts (#MSM2013) Concept Extraction Challenge. In: #MSM2013
Taxonomy: http://oak.dcs.shef.ac.uk/msm2013/challenge.html
Experimental Setup: Extractors
Service: NER; settings: default; naming: alchemyapi
Service: Spotter; settings: confidence=0, support=0, spotter=CoOccurrenceBasedSelector, version=0.6; naming: dbspotlight
Service: Lupedia; settings: default; naming: lupedia
Service: dataTXT NER; settings: min_confidence=0.6; naming: datatxt
Service: /services/rest/0.0/; settings: markup_limit=10, otherwise default; naming: zemanta
Service: English Semantic Metadata; settings: default; naming: opencalais
Service: NER; settings: default; naming: textrazor
Type overlap between the training and test sets
The overlap is computed as an exact match at the type granularity returned by each extractor. Figures are percentages.
[Table: overlap in types between the training and test data for each extractor]
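The slide does not spell out the denominator of the overlap. A minimal sketch, assuming the overlap is the share of distinct test-set types that also appear, with exactly the same label, in the training set; the function name `type_overlap` and the toy type lists are hypothetical.

```python
from typing import Iterable

def type_overlap(train_types: Iterable[str], test_types: Iterable[str]) -> float:
    """Percentage of distinct test-set types that also occur, with exactly the
    same label, in the training set (assumed definition)."""
    train, test = set(train_types), set(test_types)
    if not test:
        return 0.0
    return 100.0 * len(test & train) / len(test)

train = ["Person", "Country", "City", "City", "Company"]
test = ["Person", "City", "Facility", "Person"]
print(round(type_overlap(train, test), 1))  # 66.7
```

Exact string matching means that a coarser or finer type for the same concept (e.g. City vs. Settlement) counts as no overlap, which is what "exact match at the type granularity returned by each extractor" implies.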
Inductive Matcher
Statistical Induction: most frequent type → it always maps alchemyapi:City to LOC
Machine Learning Induction (k-NN, NB): predict the best alignment from the set of samples learned from the training data → it maps alchemyapi:City to a type t according to the prior knowledge and the ML method used
Statistical Induction
for e in extractors; do
  # step_1: run NER on the training set
  for t in <types returned by e>; do
    compute the most frequent GS type co-occurring with t
  done
  # step_2: run NER on the test set
  for t in <types returned by e>; do
    replace t with the GS type learned in step_1
  done
done
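The two steps above can be sketched in Python for a single extractor; the outer `for e in extractors` loop would simply call both functions once per extractor. The function names, the toy data, and the fallback to "O" for types never seen in training are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

def learn_most_frequent(train: List[Tuple[str, str]]) -> Dict[str, str]:
    """step_1: for each extractor type, keep the GS type it co-occurs with most often.
    `train` holds (extractor_type, gs_type) pairs for one extractor."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for ext_type, gs_type in train:
        counts[ext_type][gs_type] += 1
    # Counter.most_common(1) breaks ties in favour of the GS type seen first.
    return {ext: c.most_common(1)[0][0] for ext, c in counts.items()}

def apply_mapping(test_types: List[str], mapping: Dict[str, str], default: str = "O") -> List[str]:
    """step_2: replace each extractor type on the test set with the learned GS type.
    Types never seen in training fall back to `default` (an assumption)."""
    return [mapping.get(t, default) for t in test_types]

# One extractor's (extractor_type, gs_type) pairs; toy data, not the paper's.
train = [("City", "LOC"), ("City", "LOC"), ("City", "PER"), ("Person", "PER"), ("O", "O")]
mapping = learn_most_frequent(train)
print(mapping)                                                 # {'City': 'LOC', 'Person': 'PER', 'O': 'O'}
print(apply_mapping(["City", "Person", "Facility"], mapping))  # ['LOC', 'PER', 'O']
```

This is exactly why statistical induction "always maps alchemyapi:City to LOC": the minority PER occurrences of City are outvoted and can never be recovered.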
Statistical Induction
F1 of the correct mappings over the test sets. Figures are percentages.
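The slide does not state how the F1 is scored. A common choice for NER is token-level micro-averaged F1 that ignores the outside label O; the sketch below assumes that scheme, and the function name `token_f1` and the toy labels are hypothetical.

```python
from typing import List

def token_f1(gold: List[str], predicted: List[str], outside: str = "O") -> float:
    """Token-level micro-averaged F1, in percent, that ignores the `outside`
    label (an assumed scoring scheme, not necessarily the one on the slide)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must be aligned token by token")
    tp = sum(1 for g, p in zip(gold, predicted) if p == g and g != outside)
    n_pred = sum(1 for p in predicted if p != outside)
    n_gold = sum(1 for g in gold if g != outside)
    if tp == 0:
        return 0.0  # also avoids dividing by zero when nothing is predicted
    precision, recall = tp / n_pred, tp / n_gold
    return 100.0 * 2 * precision * recall / (precision + recall)

# GS types vs. types obtained by applying an induced mapping (toy data).
gold = ["PER", "PER", "O", "LOC", "LOC", "PER"]
pred = ["PER", "PER", "O", "LOC", "O", "LOC"]
print(round(token_f1(gold, pred), 1))  # 66.7  (P = 3/4, R = 3/5)
```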
Machine Learning Induction
Training vector: feature vector (AlchemyAPI type, token) → class (GS type)
AlchemyAPI   Token      GS
Person       Martin     PER
Person       Damm       PER
O            (          O
Country      Czech      LOC
Country      Republic   LOC
O            )          O
O            beat       O
O            6          O
O            -          O
City         Younes     PER
City         El         PER
City         Aynaoui    PER
Excerpt of the training vectors using AlchemyAPI over the CoNLL2003 training set
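The slide names k-NN and Naive Bayes but not the implementation used. The sketch below is a from-scratch categorical Naive Bayes with add-one smoothing, written only to illustrate how the (AlchemyAPI type, token) → GS type vectors can be learned; the class name `TinyCategoricalNB` is our own and is unrelated to scikit-learn's classes.

```python
import math
from collections import Counter, defaultdict
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[str], str]  # (feature vector, GS class)

class TinyCategoricalNB:
    """Minimal categorical Naive Bayes with add-one smoothing (illustration only)."""

    def fit(self, samples: List[Sample]) -> "TinyCategoricalNB":
        n_features = len(samples[0][0])
        self.class_counts = Counter(label for _, label in samples)
        self.total = len(samples)
        # counts[i][label][value]: how often feature i takes `value` in class `label`
        self.counts = [defaultdict(Counter) for _ in range(n_features)]
        self.vocab = [set() for _ in range(n_features)]
        for features, label in samples:
            for i, value in enumerate(features):
                self.counts[i][label][value] += 1
                self.vocab[i].add(value)
        return self

    def log_posterior(self, features: Sequence[str], label: str) -> float:
        n_label = self.class_counts[label]
        score = math.log(n_label / self.total)
        for i, value in enumerate(features):
            # The extra +1 in the denominator reserves probability mass for unseen values.
            seen = self.counts[i][label][value]
            score += math.log((seen + 1) / (n_label + len(self.vocab[i]) + 1))
        return score

    def predict(self, features: Sequence[str]) -> str:
        return max(self.class_counts, key=lambda label: self.log_posterior(features, label))

# Feature vector = (AlchemyAPI type, token); class = CoNLL2003 GS type (excerpt above).
train = [
    (("Person", "Martin"), "PER"), (("Person", "Damm"), "PER"),
    (("O", "("), "O"), (("Country", "Czech"), "LOC"),
    (("Country", "Republic"), "LOC"), (("O", ")"), "O"),
    (("O", "beat"), "O"), (("O", "6"), "O"), (("O", "-"), "O"),
    (("City", "Younes"), "PER"), (("City", "El"), "PER"),
    (("City", "Aynaoui"), "PER"),
]
model = TinyCategoricalNB().fit(train)
print(model.predict(("Country", "Slovakia")))  # LOC
print(model.predict(("City", "Gothenburg")))   # PER
```

Because every City token in this twelve-row excerpt is PER, the toy model maps an unseen City token to PER, which shows how strongly the induced alignment depends on the training sample.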
Traversing the classification schemas
Extractor Type (aka URIType): simply use the type given by the extractor
Extractor Type → NERDType: map the extractor type to the corresponding class of the extended NERD Ontology v0.5 (http://nerd.eurecom.fr/ontology)
Extractor Type → URIType First: map the extractor type to its superclass
Extractor Type → URIType Second: map the extractor type to its super-superclass
Extractor Type → URIType Third: map the extractor type to its super-super-superclass
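The URIType First/Second/Third variants amount to walking one, two, or three steps up a subclass hierarchy. A minimal sketch, assuming the hierarchy is available as a child → parent dictionary; the `PARENT` fragment below is hypothetical, loosely modeled on the DBpedia ontology, and should be checked against the real schema before use.

```python
from typing import Dict

# Hypothetical fragment of a type hierarchy (child -> parent); illustrative only.
PARENT: Dict[str, str] = {
    "dbo:City": "dbo:Settlement",
    "dbo:Settlement": "dbo:PopulatedPlace",
    "dbo:PopulatedPlace": "dbo:Place",
}

def ancestor(uri_type: str, levels: int, parent: Dict[str, str] = PARENT) -> str:
    """Return the type `levels` steps up the hierarchy (0 = the type itself,
    1 = URIType First, 2 = Second, 3 = Third). Stops at the root if the
    hierarchy is shallower than requested."""
    current = uri_type
    for _ in range(levels):
        if current not in parent:
            break
        current = parent[current]
    return current

for level in range(4):
    print(level, ancestor("dbo:City", level))
```

Climbing trades precision for coverage: more distinct extractor types collapse onto the same ancestor, so fewer test types are unseen in training, but fine distinctions are lost.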
Machine Learning Induction results
The full list of plots is reported in: Rizzo G., Erp M., Troncy R. (2014), Inductive Entity Typing Alignment. In: 2nd International Workshop on Linked Data for Information Extraction (LD4IE'14), co-located with ISWC'14, Riva del Garda, Italy
[Plot panels: CoNLL2003, MSM2013, statistical induction (CoNLL2003)]
Evaluation
NERD-ML: Rizzo G., Erp M., Troncy R. (2014), Benchmarking the Extraction and Disambiguation of Named Entities on the Semantic Web. In: LREC'14
Inputs: the induced mappings, which replace the extractor types
Training vector: linguistic vector | extractor1 type | ... | extractorN type | GS type → CRF
Output:
token1   predicted_type
token2   predicted_type
...
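How one training vector could be assembled is sketched below, assuming the induced mappings are dictionaries from extractor type to GS type. The three linguistic features (lowercased token, initial capital, length), the helper name `build_row`, and the toy mappings are hypothetical; the actual NERD-ML feature set and the CRF training step are not shown on the slide.

```python
from typing import Dict, List

def build_row(token: str, extractor_types: Dict[str, str],
              induced: Dict[str, Dict[str, str]], gs_type: str) -> List[str]:
    """One training vector: linguistic features, then one column per extractor
    holding the *induced* GS-schema type instead of the raw extractor type,
    then the GS type as the class. Extractors are ordered by name so that
    every row has the same column layout."""
    linguistic = [token.lower(), str(token[:1].isupper()), str(len(token))]
    # A missing extractor output or an unmapped type falls back to "O" (assumption).
    mapped = [induced[name].get(extractor_types.get(name, "O"), "O")
              for name in sorted(induced)]
    return linguistic + mapped + [gs_type]

induced = {"alchemyapi": {"City": "LOC", "Person": "PER"},
           "textrazor": {"Person": "PER", "Place": "LOC"}}
row = build_row("Younes", {"alchemyapi": "City", "textrazor": "Person"}, induced, "PER")
print(row)  # ['younes', 'True', '6', 'LOC', 'PER', 'PER']
```

Rows like this one would then be passed to a CRF toolkit, which learns when to trust each extractor's induced type given the linguistic context.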
Results
[Tables: CoNLL2003, MSM2013]
Is the Gold Standard Gold?
Token        GS    AlchemyAPI
Gothenburg   ORG   City
Gothenburg   LOC   City
Gothenburg   ORG   City
...
Amsterdam    ORG   City
AMSTERDAM    LOC   City
AMSTERDAM    LOC   City
Excerpt from CoNLL2003
Aroyo L., Truth is a Lie: 7 Myths about Human Annotation, Cognitive Computing Forum 2014
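Candidate inconsistencies like the ones above can be surfaced automatically by grouping gold-standard labels by surface form. A minimal sketch, assuming (token, gs_type) rows and case-insensitive matching; the helper name `inconsistent_labels` is hypothetical. Differing labels are not necessarily errors: CoNLL2003 labels depend on context (a city name standing for a sports team, for instance, is annotated ORG), so the output is a list of cases to inspect, not of mistakes.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

def inconsistent_labels(rows: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Group (token, gs_type) rows by case-folded token and return the tokens
    that received more than one GS type."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for token, gs_type in rows:
        seen[token.casefold()].add(gs_type)
    return {tok: types for tok, types in seen.items() if len(types) > 1}

rows = [("Gothenburg", "ORG"), ("Gothenburg", "LOC"), ("Gothenburg", "ORG"),
        ("Amsterdam", "ORG"), ("AMSTERDAM", "LOC"), ("AMSTERDAM", "LOC")]
print(sorted(inconsistent_labels(rows)))  # ['amsterdam', 'gothenburg']
```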
Future Work
Refining the alignment strategy by giving priority to the better-performing extractors
Applying an ensemble learning strategy over the models learned for each extractor
Fixing the GS with the induced mappings
Acknowledgments
The research leading to this paper was partially supported by the European Union's 7th Framework Programme via the projects LinkedTV (GA 287911) and NewsReader (ICT-316404)
Thank You For Listening
http://www.slideshare.net/giusepperizzo
http://github.com/giusepperizzo/nerd-inductive