
Named Entity Recognition from a Data-Driven Perspective

Jingbo Shang, Computer Science & Engineering and Halıcıoğlu Data Science Institute
University of California, San Diego

Outline

- Background
  - What's Named Entity Recognition (NER)?
  - What's "Data-Driven"?
- Data-Driven NER Methods

What’s Named Entity Recognition?

- Wikipedia: Named-entity recognition (NER) is a subtask of information extraction (IE) that seeks to locate and classify named entities in text into pre-defined categories.
- In IE, a named entity is a real-world object.
- Example
  - Input: Jim bought 300 shares of Acme Corp. in 2006.
  - Output: [Jim]Person bought 300 shares of [Acme Corp.]Organization in [2006]Time.

Supervised Methods: Training Data

- Sequence labeling framework
- Two popular schemes
  - BIO: Begin, In, Out
  - BIOES: Begin, In, Out, End, Singleton
  - BIOES is arguably better than BIO (Ratinov and Roth, CoNLL'09)
- Example (also used in the sketch below):
  - LABELS: [Jim]Person bought 300 shares of [Acme Corp.]Organization in [2006]Time.
  - TOKENS: Jim bought 300 shares of Acme Corp. in 2006 .
  - BIO:   B-PER O O O O B-ORG I-ORG O B-Time O
  - BIOES: S-PER O O O O B-ORG E-ORG O S-Time O
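
A minimal Python sketch (my own helper, not from the talk) that produces both schemes for the slide's running example:

```python
def spans_to_tags(tokens, spans, scheme="BIOES"):
    """Convert entity spans (start, end_exclusive, type) over token indices
    into a per-token tag sequence under the BIO or BIOES scheme."""
    tags = ["O"] * len(tokens)
    for start, end, etype in spans:
        if scheme == "BIO":
            tags[start] = f"B-{etype}"
            for i in range(start + 1, end):
                tags[i] = f"I-{etype}"
        else:  # BIOES
            if end - start == 1:
                tags[start] = f"S-{etype}"      # Singleton
            else:
                tags[start] = f"B-{etype}"
                for i in range(start + 1, end - 1):
                    tags[i] = f"I-{etype}"
                tags[end - 1] = f"E-{etype}"    # End
    return tags

tokens = ["Jim", "bought", "300", "shares", "of", "Acme", "Corp.", "in", "2006", "."]
spans = [(0, 1, "PER"), (5, 7, "ORG"), (8, 9, "Time")]
print(spans_to_tags(tokens, spans, "BIO"))    # B-PER O O O O B-ORG I-ORG O B-Time O
print(spans_to_tags(tokens, spans, "BIOES"))  # S-PER O O O O B-ORG E-ORG O S-Time O
```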

Supervised Methods: Neural Models

- Two pioneer models
  - LSTM-CRF (Lample et al., NAACL'16)
  - LSTM-CNN-CRF (Ma and Hovy, ACL'16)
- The first neural models that outperform models based on handcrafted features

                    LSTM-CRF             LSTM-CNN-CRF
  Word level        Bidirectional LSTMs  Bidirectional LSTMs
  Character level   Bidirectional LSTMs  Convolutional NN
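
A rough PyTorch sketch (my own, not the authors' released code) of the shared architecture in this table: a character-level encoder (a CNN here, as in LSTM-CNN-CRF; a character BiLSTM would take its place for LSTM-CRF) is concatenated with word embeddings and fed to a word-level BiLSTM, whose per-token scores would then go into a CRF layer, omitted here for brevity. All dimensions are placeholders.

```python
import torch
import torch.nn as nn

class CharCNN(nn.Module):
    """Character-level encoder: embed characters, convolve, max-pool per word."""
    def __init__(self, n_chars, char_dim=30, n_filters=30, kernel=3):
        super().__init__()
        self.emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.conv = nn.Conv1d(char_dim, n_filters, kernel, padding=1)

    def forward(self, char_ids):                          # (n_words, max_chars)
        x = self.emb(char_ids).transpose(1, 2)             # (n_words, char_dim, max_chars)
        return torch.relu(self.conv(x)).max(dim=2).values  # (n_words, n_filters)

class BiLSTMTagger(nn.Module):
    """Word-level BiLSTM over [word embedding; character representation]."""
    def __init__(self, n_words, n_chars, n_tags, word_dim=100, hidden=200):
        super().__init__()
        self.word_emb = nn.Embedding(n_words, word_dim, padding_idx=0)
        self.char_enc = CharCNN(n_chars)                   # n_filters = 30 below
        self.lstm = nn.LSTM(word_dim + 30, hidden,
                            bidirectional=True, batch_first=True)
        self.emissions = nn.Linear(2 * hidden, n_tags)

    def forward(self, word_ids, char_ids):                 # (1, seq), (seq, max_chars)
        chars = self.char_enc(char_ids).unsqueeze(0)        # (1, seq, 30)
        x = torch.cat([self.word_emb(word_ids), chars], dim=-1)
        h, _ = self.lstm(x)
        return self.emissions(h)                            # per-token scores for a CRF

model = BiLSTMTagger(n_words=5000, n_chars=100, n_tags=9)
scores = model(torch.randint(1, 5000, (1, 10)), torch.randint(1, 100, (10, 12)))
print(scores.shape)  # torch.Size([1, 10, 9])
```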

“Data-Driven” Philosophy

- Key: enhance NER performance without introducing any additional human annotations
- Questions
  - Can massive raw texts help?
  - Can dictionaries help?
  - Are human annotations always correct?
  - Is the tokenizer always good?
  - ...

Questions

- Can massive raw texts help?
- Can dictionaries help?
- Are human annotations always correct?
- Is the tokenizer always good?

Word Embedding → Language Model (LM)

- Using language models for better representations:
  - Word-level language models:
    - ELMo (Peters et al., NAACL'18, best paper)
    - LD-Net (Liu et al., EMNLP'18)
  - Character-level language models:
    - LM-LSTM-CRF (Liu et al., AAAI'18)
    - Flair (Akbik et al., COLING'18)
  - Hybrid language models:
    - Cross-View Training (Clark et al., EMNLP'18)
    - BERT (Devlin et al., NAACL'19, best paper)

What’s (Neural) Language Model?

[Figure: a neural LM reads the input words "Obama was born", maps each to a word embedding, and is trained to predict the target words "was born in".]

- Describes the generation of text: predicting the next word based on the previous context (a toy training sketch follows below)
- Pros:
  - Does not require any human annotations
  - Nearly unlimited training data!
- Resulting models can generate sentences of unexpectedly high quality
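
A toy training sketch (mine) of this objective on the slide's example sentence: shift the input by one position and minimize cross-entropy against the next word. The tiny vocabulary and model sizes are only for illustration.

```python
import torch
import torch.nn as nn

class WordLM(nn.Module):
    def __init__(self, vocab_size, dim=64, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.rnn = nn.LSTM(dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, ids):                      # (batch, seq)
        h, _ = self.rnn(self.emb(ids))
        return self.out(h)                        # (batch, seq, vocab)

vocab = {"<unk>": 0, "Obama": 1, "was": 2, "born": 3, "in": 4}
ids = torch.tensor([[1, 2, 3, 4]])                # "Obama was born in"
model = WordLM(len(vocab))

logits = model(ids[:, :-1])                       # inputs:  Obama was born
targets = ids[:, 1:]                              # targets: was  born  in
loss = nn.CrossEntropyLoss()(logits.reshape(-1, len(vocab)), targets.reshape(-1))
loss.backward()                                   # no human annotation needed
```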

Neural LM: Example Generations

- Character-by-character Markdown generations:

  '''See also''': [[List of ethical consent processing]]

  ==See also==
  *[[Iender dome of the ED]]
  *[[Anti-autism]]

  ===[[Religion|Religion]]===
  *[[French Writings]]
  *[[Maria]]
  *[[Revelation]]

- Valid syntax!

Neural LM: Example Generations

- Deep "Donald Trump": mimics President Trump
- Fooled many Twitter users

LM-LSTM-CRF: Co-Train Neural LM

- Proposes to use a character-level language model as a co-training objective (a schematic loss sketch follows below)
- Why character-level? More efficient and more robust to pre-processing
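
A schematic sketch (my own framing, not the released LM-LSTM-CRF code) of the co-training idea: the sequence-labeling loss and the character-LM loss share the character BiLSTM and are simply summed; the weighting factor is an assumed hyperparameter.

```python
import torch

def joint_loss(crf_nll: torch.Tensor, char_lm_nll: torch.Tensor,
               lm_weight: float = 0.05) -> torch.Tensor:
    """Total loss = CRF negative log-likelihood + weighted char-LM loss.
    lm_weight is an illustrative value, not one from the paper."""
    return crf_nll + lm_weight * char_lm_nll

# per-batch losses would come from the tagger head and the char-LM head:
print(joint_loss(torch.tensor(12.3), torch.tensor(45.6)))  # tensor(14.5800)
```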

ELMo: Pre-train Word-Level Neural LM

- Adds ELMo at the input of the task RNN; for some tasks (SNLI, SQuAD), including ELMo at the output brings further improvements
- Key points (see the sketch after the figure):
  - Freeze the weights of the biLM
  - Regularization is necessary

[Figure: fixed forward and backward pre-trained LMs (PTLMs) run over the characters of "Pierre Vinken ,"; their states are down-projected, concatenated with word embeddings, and fed into a CRF for sequence labeling (e.g., predicting E-PER for "Vinken").]
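
A minimal sketch (mine, following these key points rather than the AllenNLP implementation) of mixing frozen biLM layers with learned scalar weights and a scale, then concatenating the result with the task's own word embedding at the RNN input. The layer count and dimensions are placeholders.

```python
import torch
import torch.nn as nn

class ELMoMixer(nn.Module):
    """Softmax-weighted sum of frozen biLM layer outputs, scaled by gamma."""
    def __init__(self, n_layers=3):
        super().__init__()
        self.scalars = nn.Parameter(torch.zeros(n_layers))  # learned mixing weights
        self.gamma = nn.Parameter(torch.ones(1))             # learned scale

    def forward(self, frozen_lm_layers):    # (n_layers, batch, seq, lm_dim)
        w = torch.softmax(self.scalars, dim=0).view(-1, 1, 1, 1)
        return self.gamma * (w * frozen_lm_layers).sum(dim=0)

# biLM activations are computed with frozen weights, e.g. under torch.no_grad()
lm_layers = torch.randn(3, 2, 7, 1024)      # placeholder biLM activations
word_emb = torch.randn(2, 7, 100)            # the task's own word embeddings
rnn_input = torch.cat([word_emb, ELMoMixer()(lm_layers)], dim=-1)
print(rnn_input.shape)                       # torch.Size([2, 7, 1124]) -> task BiLSTM
```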

LD-Net: An efficient version of ELMo

- Makes the contextualized representations efficient without much loss of effectiveness
- How is this even possible? A pre-trained language model contains abundant information; for a specific task, only part of it is useful.

Flair: Pre-Train Neural LM at All Levels

- Even for character-level language models, pre-training is very important.
- The structure is the same as LM-LSTM-CRF; the difference is that pre-training is conducted on an additional corpus.

BERT: Introduce Transformer

- Introduces Transformers; uses masked language modeling + next-sentence prediction
- Conducts fine-tuning after pre-training on each task (necessary for sentence-level tasks; NER is a word-level task)
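
A small inference-style sketch (not from the slides) of applying a pre-trained BERT to word-level prediction with the Hugging Face `transformers` library; the label count of 9 (BIO over the four CoNLL03 types) and the keep-the-first-sub-word convention are my assumptions, and the classification head here is untrained.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForTokenClassification.from_pretrained("bert-base-cased", num_labels=9)

words = ["Jim", "bought", "300", "shares", "of", "Acme", "Corp.", "in", "2006", "."]
enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**enc).logits                  # (1, n_subwords, 9)

# BERT predicts per word piece, so map back to words by keeping each word's
# first sub-word (a common convention; fine-tuning would update the head).
pred = logits.argmax(-1)[0].tolist()
word_ids = enc.word_ids()
word_level = [p for p, wid, prev in zip(pred, word_ids, [None] + word_ids[:-1])
              if wid is not None and wid != prev]
print(len(word_level))                            # 10, one prediction per word
```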

New State-of-the-arts

- Using language models for better representations (F1 on CoNLL03):
  - Word-level language models:
    - ELMo (Peters et al., NAACL'18, best paper): 92.2
    - LD-Net (Liu et al., EMNLP'18): 92.0, ~5x faster
  - Character-level language models:
    - LM-LSTM-CRF (Liu et al., AAAI'18): 91.4
    - Flair (Akbik et al., COLING'18): 93.1
  - Hybrid language models:
    - Cross-View Training (Clark et al., EMNLP'18): 92.6
    - BERT (Devlin et al., NAACL'19, best paper): 92.4 / 92.8

Questions

- Can massive raw texts help? → Neural language models
- Can dictionaries help?
- Are human annotations always correct?
- Is the tokenizer always good?

Distantly Supervised NER

- Input
  - Unlabeled raw texts
  - An entity dictionary: entity type, canonical name, [synonym_1, synonym_2, ..., synonym_k]
- Output
  - An NER model that recognizes entities of the entity types appearing in the given dictionary (a labeling sketch follows below)
  - Note that the entities to be recognized can be unseen entities.
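
One common way (a sketch of mine, not any specific system above) to realize this setup is to generate labels by longest exact string matching against the dictionary; real pipelines add rules for overlaps and ambiguous surface forms.

```python
def distant_labels(tokens, dictionary):
    """dictionary maps a tuple of surface tokens to an entity type; longest match wins."""
    max_len = max(len(key) for key in dictionary)
    labels = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            etype = dictionary.get(tuple(tokens[i:i + n]))
            if etype:
                labels[i] = f"B-{etype}"
                for j in range(i + 1, i + n):
                    labels[j] = f"I-{etype}"
                i += n
                break
        else:                 # no dictionary entry starts at token i
            i += 1
    return labels

tokens = ["Jim", "bought", "300", "shares", "of", "Acme", "Corp.", "."]
dictionary = {("Acme", "Corp."): "ORG", ("Jim",): "PER"}
print(distant_labels(tokens, dictionary))
# ['B-PER', 'O', 'O', 'O', 'O', 'B-ORG', 'I-ORG', 'O']
```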

Distantly Supervised NER Methods

- String-match / rule-based distant supervision generation
  - AutoEntity, SwellShark, ClusType, ...
  - Leave entity span detection to experts
  - POS-tag rules (e.g., regular expressions)
- Distant-LSTM-CRF
  - Leverages AutoPhrase to extract "aspect terms"
- AutoNER
  - A novel "Tie-or-Break" labeling scheme + a tailored neural model

SwellShark: Distantly Supervised Typing

- Data programming for typing
- Entity span detection: regular expressions based on part-of-speech (POS) tags
  - Requires expert effort
  - Candidate generators

Distant-LSTM-CRF: Use Phrase Mining as Supervision + LSTM-CRF

- AutoPhrase + LSTM-CRF
  - AutoPhrase generates labels (with heuristically set thresholds)
  - LSTM-CRF builds models (both word- and character-level information is used)
- Problem
  - High thresholds are needed for clean positive labels → many false-negative labels

AutoNER: Dual Dictionaries

- A core dictionary
  - Leads to high-precision but low-recall matches
- A "full" dictionary
  - Leads to high-recall but low-precision matches
  - Introduces out-of-dictionary high-quality phrases as new entities
  - Their types are "unknown": each could be any IOBES tag + any type

AutoNER: Fuzzy-LSTM-CRF Baseline


AutoNER: “Tie or Break”

- Instead of labeling each token, we tag the connection between two adjacent tokens.
- For every two adjacent tokens, the connection between them is labeled as (see the sketch below):
  - (1) Tie, when the two tokens are matched to the same entity;
  - (2) Unknown, if at least one of the tokens belongs to an unknown-typed high-quality phrase;
  - (3) Break, otherwise.
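
A small sketch of this scheme as I read it (not the released AutoNER code); the token sequence and the spans below are made up for illustration, with `entity_spans` coming from core-dictionary matches and `unknown_spans` from unknown-typed high-quality phrases.

```python
def tie_or_break(tokens, entity_spans, unknown_spans):
    """Label the connection between each pair of adjacent tokens.
    Spans are (start, end_exclusive) token ranges."""
    def same_entity(i):
        return any(s <= i and i + 1 < e for s, e in entity_spans)

    def in_unknown(i):
        return any(s <= i < e for s, e in unknown_spans)

    labels = []
    for i in range(len(tokens) - 1):            # connection between tokens i and i+1
        if same_entity(i):
            labels.append("Tie")                 # both tokens in the same entity
        elif in_unknown(i) or in_unknown(i + 1):
            labels.append("Unknown")             # touches an unknown-typed phrase
        else:
            labels.append("Break")
    return labels

tokens = ["ceftriaxone", "-", "induced", "acute", "hemolysis"]
print(tie_or_break(tokens, entity_spans=[(3, 5)], unknown_spans=[(0, 1)]))
# ['Unknown', 'Break', 'Break', 'Tie']
```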

AutoNER: Tailored Neural Model


Comparison – Biomedical Domain


Questions

- Can massive raw texts help? → Neural language models
- Can dictionaries help? → Distantly supervised setting
- Are human annotations always correct?
- Is the tokenizer always good?

Typical Annotation Mistakes in CoNLL03

- The state-of-the-art F1 score on this test set is already around 93%
- ~5.38% of the test sentences have annotation mistakes
- A significant amount!

Evaluation on Corrected Test Set

- Higher F1 scores with smaller variance
- Better reflects the real performance
- This corrected test set should be adopted in future research

CrossWeigh: Handle Noisy Training Set


Key Problem: How to Partition k-Folds?

- Random partition may be ineffective
  - Neural NER models will overfit the annotation mistakes observed during training
- Entity-disjoint filtering (a partitioning sketch follows below)
  - In each fold, if a "training" sentence contains any entity that appears in the "testing" set, it is discarded during "training"
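
A sketch (my reading of this description, not the released CrossWeigh code) of building entity-disjoint folds; the sentences and per-sentence entity sets below are toy examples.

```python
def entity_disjoint_folds(sentences, entities_per_sentence, k=5):
    """Yield (train_indices, held_out_indices) for each of k folds, dropping
    training sentences that share any entity surface form with the held-out fold."""
    folds = [list(range(i, len(sentences), k)) for i in range(k)]
    for held_out in folds:
        held_entities = set().union(*(entities_per_sentence[i] for i in held_out))
        train = [i for i in range(len(sentences))
                 if i not in set(held_out)
                 and not (entities_per_sentence[i] & held_entities)]
        yield train, held_out

sents = ["EU rejects German call", "Peter Blackburn", "BRUSSELS 1996-08-22", "EU summit"]
ents = [{"EU"}, {"Peter Blackburn"}, {"BRUSSELS"}, {"EU"}]
for train, held in entity_disjoint_folds(sents, ents, k=2):
    print(held, train)   # any sentence sharing an entity with the held-out fold is dropped
```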

CrossWeigh: Evaluation

- CrossWeigh is effective with many NER models
- Entity-disjoint filtering is important
- Also evaluated on Twitter & low-resource datasets

CrossWeigh: Identify Annotation Mistakes

- Use CoNLL03 train, dev & test as a super training set
- Apply CrossWeigh to identify annotation mistakes on the test set
- Evaluate against 186 human corrections
- Almost 80% of the mistakes can be detected

Questions

- Can massive raw texts help? → Neural language models
- Can dictionaries help? → Distantly supervised setting
- Are human annotations always correct? → Auto-correction
- Is the tokenizer always good?

Typical NER Pipeline System

- Pre-processing tools are applied first

An Interesting Observation

- Broad Twitter Corpus (BTC): a Twitter NER dataset
- spaCy: a popular Python NLP library
- spaCy tokenization + the BTC dataset → the word boundaries of more than 45% of named entities are incorrectly identified! (a boundary-check sketch follows below)
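
A sketch (mine, illustrating the kind of boundary check behind this number rather than the authors' exact script) using spaCy's tokenizer; the tweet and its character-offset annotation are invented for illustration.

```python
import spacy

nlp = spacy.blank("en")    # tokenizer-only pipeline, no pre-trained model needed

def boundaries_ok(text, start, end):
    """True iff some contiguous run of tokens starts at `start` and ends at `end`."""
    doc = nlp(text)
    starts = {t.idx for t in doc}
    ends = {t.idx + len(t.text) for t in doc}
    return start in starts and end in ends

tweet = "so happy abt #GoT tonight!!"
# suppose the gold entity "GoT" is annotated at character offsets [14, 17)
print(boundaries_ok(tweet, 14, 17))   # False whenever the tokenizer's word
                                      # boundaries don't line up with the gold span
```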

Neural-Char-CRF: Raw-to-End

- We propose to conduct NER training in a raw-to-end manner
- Raw text as the input & predictions at the character level

Neural-Char-CRF: String Match

- Prefer to match words with higher inverse document frequency (IDF); a small sketch of this preference follows below
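
A small sketch of my reading of this preference: when candidate dictionary matches overlap, keep the one whose words carry the higher total IDF, so rarer and more informative surface forms win; the document frequencies below are invented.

```python
import math

def idf(word, doc_freq, n_docs):
    return math.log(n_docs / (1 + doc_freq.get(word, 0)))

def resolve_overlaps(candidates, doc_freq, n_docs):
    """candidates: (start, end, words, type); keep greedily, highest total IDF first."""
    scored = sorted(candidates,
                    key=lambda c: sum(idf(w, doc_freq, n_docs) for w in c[2]),
                    reverse=True)
    chosen, taken = [], set()
    for start, end, words, etype in scored:
        if not any(i in taken for i in range(start, end)):
            chosen.append((start, end, etype))
            taken.update(range(start, end))
    return sorted(chosen)

doc_freq = {"new": 90_000, "york": 20_000, "times": 30_000}
cands = [(0, 2, ["new", "york"], "LOC"), (0, 3, ["new", "york", "times"], "ORG")]
print(resolve_overlaps(cands, doc_freq, n_docs=100_000))
# [(0, 3, 'ORG')] -- the rarer, higher-IDF match wins the overlap
```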

Neural-Char-CRF: Character-Level LM

- A character-level neural language model is leveraged
- Pre-training + contextualized representations

Comparison – Twitter NER Datasets

- The tokenizer matters: NLTK is the best on both datasets
- Raw-to-end wins
- String match is even better

Summary & Q&A

- Using neural language models, massive raw texts can help!
- High-quality dictionaries can help!
- Human annotations are NOT always correct!
- The tokenizer is not that important and sometimes even hurts!