Named Entity Recognition from a Data-Driven Perspective

Jingbo Shang
Computer Science Engineering & Halıcıoğlu Data Science Institute
University of California, San Diego

Source: cseweb.ucsd.edu/classes/fa19/cse259-a/files/jingbo.pdf


Outline

- Background
- What’s Named Entity Recognition (NER)?
- What’s “Data-Driven”?
- Data-Driven NER Methods

What’s Named Entity Recognition?

- Wikipedia:
  - Named-entity recognition (NER) is a subtask of information extraction (IE) that seeks to locate and classify named entities in text into pre-defined categories.
  - In IE, a named entity is a real-world object.
- Example
  - Input
    - Jim bought 300 shares of Acme Corp. in 2006.
  - Output
    - [Jim]Person bought 300 shares of [Acme Corp.]Organization in [2006]Time.

Supervised Methods: Training Data

- Sequence labeling framework
- Two popular schemes
  - BIO: Begin, In, Out
  - BIOES: Begin, In, Out, End, Singleton
  - BIOES is arguably better than BIO (Ratinov and Roth, CoNLL’09)
- Example:
  - LABELS: [Jim]Person bought 300 shares of [Acme Corp.]Organization in [2006]Time.
  - TOKENS: Jim bought 300 shares of Acme Corp. in 2006 .
  - BIO:    B-PER O O O O B-ORG I-ORG O B-Time O
  - BIOES:  S-PER O O O O B-ORG E-ORG O S-Time O
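
The two labeling schemes in the example can be reproduced with a short sketch. The function name `spans_to_tags` and the `(start, end, type)` span format with an exclusive end index are assumptions made for illustration; they do not come from the slides.

```python
from typing import List, Tuple

# (start_token, end_token_exclusive, entity_type); a hypothetical span format
Span = Tuple[int, int, str]


def spans_to_tags(n_tokens: int, spans: List[Span], scheme: str = "BIO") -> List[str]:
    """Convert entity spans into per-token BIO or BIOES tags."""
    tags = ["O"] * n_tokens
    for start, end, etype in spans:
        if scheme == "BIOES" and end - start == 1:
            tags[start] = f"S-{etype}"  # single-token entity
            continue
        tags[start] = f"B-{etype}"
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"
        if scheme == "BIOES":
            tags[end - 1] = f"E-{etype}"  # last token of a multi-token entity
    return tags


tokens = "Jim bought 300 shares of Acme Corp. in 2006 .".split()
spans = [(0, 1, "PER"), (5, 7, "ORG"), (8, 9, "Time")]
bio = spans_to_tags(len(tokens), spans, "BIO")
bioes = spans_to_tags(len(tokens), spans, "BIOES")
print(" ".join(bio))
print(" ".join(bioes))
```

The only difference between the schemes is that BIOES marks entity ends and single-token entities explicitly, which gives the tagger more boundary information.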

Supervised Methods: Neural Models

- Two pioneering models
  - LSTM-CRF (Lample et al., NAACL’16)
  - LSTM-CNN-CRF (Ma and Hovy, ACL’16)
- The first neural models to outperform models based on handcrafted features

                   LSTM-CRF               LSTM-CNN-CRF
Word-Level         Bidirectional LSTMs    Bidirectional LSTMs
Character-Level    Bidirectional LSTMs    Convolutional NN

“Data-Driven” Philosophy

- Key
  - Enhance NER performance without introducing any additional human annotations
- Questions
  - Can massive raw texts help?
  - Can dictionaries help?
  - Are human annotations always correct?
  - Is the tokenizer always good?
  - …

Questions

- Can massive raw texts help?
- Can dictionaries help?
- Are human annotations always correct?
- Is the tokenizer always good?

Word Embedding → Language Model (LM)

- Using language models for better representations:
  - Word-level Language Model:
    - ELMo (Peters et al., NAACL’18, best paper)
    - LD-Net (Liu et al., EMNLP’18)
  - Char-level Language Model:
    - LM-LSTM-CRF (Liu et al., AAAI’18)
    - Flair (Akbik et al., COLING’18)
  - Hybrid Language Model:
    - Cross View Training (Clark et al., EMNLP’18)
    - BERT (Devlin et al., NAACL’19, best paper)

What’s (Neural) Language Model?

Input words: Obama was born
Target words: was born in

[Figure: one-hot input vectors and the word embedding layer of a neural language model.]

- Describing the generation of text:
  - Predicting the next word based on previous contexts
- Pros:
  - Does not require any human annotations
  - Nearly unlimited training data!
- Resulting models can generate sentences of an unexpectedly high quality
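
The input/target pairing above can be sketched in a few lines. To keep the example dependency-free, a count-based bigram model stands in for the neural language model; that substitution, the toy corpus, and all function names are assumptions for illustration only.

```python
from collections import Counter, defaultdict


def next_word_pairs(tokens):
    """Pair each word with the word that follows it (input -> target)."""
    return list(zip(tokens[:-1], tokens[1:]))


def train_bigram(corpus):
    """Count next-word frequencies; a stand-in for training a neural LM."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for prev, nxt in next_word_pairs(sentence.split()):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent continuation, or None for unseen words."""
    if word not in counts:
        return None
    return counts[word].most_common(1)[0][0]


# Inputs "Obama was born" line up with targets "was born in".
pairs = next_word_pairs("Obama was born in".split())

# Toy raw text: no human annotation is needed to build the training signal.
corpus = ["Obama was born in Hawaii", "Lincoln was born in Kentucky"]
model = train_bigram(corpus)
print(pairs)
print(predict_next(model, "was"))
```

The training signal comes entirely from shifting the raw text by one position, which is why the amount of usable training data is nearly unlimited.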

Neural LM: Example Generations

- Char-by-char Markdown generations:

'''See also''': [[List of ethical consent processing]]

== See also ==
* [[Iender dome of the ED]]
* [[Anti-autism]]

=== [[Religion|Religion]] ===
* [[French Writings]]
* [[Maria]]
* [[Revelation]]

Valid syntax!

Neural LM: Example Generations

- Deep “Donald Trump”: mimic President Trump
- Fooled many Twitter users

LM-LSTM-CRF: Co-Train Neural LM

- Propose to use a character-level language model as a co-training objective
- Why character-level?
  - More efficient & more robust to pre-processing

ELMo: Pre-train Word-Level Neural LM

- Add ELMo at the input of the RNN. For some tasks (SNLI, SQuAD), including ELMo at the output brings further improvements
- Key points:
  - Freeze the weights of the biLM
  - Regularization is necessary

[Figure: sequence-labeling architecture. Labels recovered from the diagram: character inputs “␣ V i n k e n ␣”, embedding, lstm, fixed lstm, down-projection, concatenate, Forward PTLM, Backward PTLM, CRF for Sequence Labeling, example tokens “Pierre Vinken ,”, output label E-PER.]

LD-Net: An efficient version of ELMo

- Make the contextualized representation efficient without much loss of effectiveness
- How is this even possible?
  - A pre-trained language model contains abundant information; however, for a specific task, only part of it may be useful.

Flair: Pre-Train Neural LM at All Levels

- Even for character-level language models, pre-training is very important.
- The structure is the same as LM-LSTM-CRF; the difference is that pre-training is conducted on an additional training corpus.

BERT: Introduce Transformer

- Introduce Transformers; use masked language model + next sentence prediction
- Conduct fine-tuning after pre-training on each task (necessary for sentence-level tasks; NER is a word-level task).

New State-of-the-arts

- Using language models for better representations (F1 on CoNLL03):
  - Word-level Language Model:
    - ELMo (Peters et al., NAACL’18, best paper): 92.2
    - LD-Net (Liu et al., EMNLP’18): 92.0, ~5X faster
  - Char-level Language Model:
    - LM-LSTM-CRF (Liu et al., AAAI’18): 91.4
    - Flair (Akbik et al., COLING’18): 93.1
  - Hybrid Language Model:
    - Cross View Training (Clark et al., EMNLP’18): 92.6
    - BERT (Devlin et al., NAACL’19, best paper): 92.4 / 92.8

Questions

- Can massive raw texts help? → Neural language model
- Can dictionaries help?
- Are human annotations always correct?
- Is the tokenizer always good?

Distantly Supervised NER

- Input
  - Unlabeled raw texts
  - An entity dictionary
    - entity type, canonical name, [synonyms_1, synonyms_2, …, synonyms_k]
- Output
  - An NER model to recognize the entities of the entity types that appear in the given dictionary.
  - Note that the entities to be recognized can be unseen entities.
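
As a rough illustration of how a dictionary in this format can produce training labels from raw text, the sketch below applies greedy longest-match lookup over tokens. The matching strategy, the function names, and the toy dictionary entries are all assumptions; the slides do not specify how matching is done.

```python
def build_lexicon(entries):
    """Map every surface form (canonical name and synonyms) to its entity type."""
    lexicon = {}
    for etype, canonical, synonyms in entries:
        for name in [canonical, *synonyms]:
            lexicon[tuple(name.lower().split())] = etype
    return lexicon


def match_dictionary(tokens, lexicon):
    """Greedy longest-match lookup; returns (start, end_exclusive, type) spans."""
    max_len = max((len(key) for key in lexicon), default=0)
    lowered = [t.lower() for t in tokens]
    spans, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            etype = lexicon.get(tuple(lowered[i:i + n]))
            if etype is not None:
                spans.append((i, i + n, etype))
                i += n
                break
        else:
            i += 1  # no dictionary entry starts at this token
    return spans


# Hypothetical entries: (entity type, canonical name, [synonyms])
entries = [
    ("Chemical", "acetylsalicylic acid", ["aspirin"]),
    ("Disease", "headache", ["cephalalgia"]),
]
lexicon = build_lexicon(entries)
tokens = "Aspirin relieves headache ; acetylsalicylic acid is common .".split()
matched = match_dictionary(tokens, lexicon)
print(matched)
```

Labels produced this way only cover dictionary entries, so a model trained on them still has to generalize to recognize unseen entities.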

Distantly Supervised NER Methods

- String-match/rule-based distant supervision generation
  - AutoEntity, SwellShark, ClusType, …
  - Leave the entity span detection to experts
  - POS tag rule-based (e.g., regular expressions)
- Distant-LSTM-CRF
  - Leverage AutoPhrase to extract “aspect terms”
- AutoNER
  - A novel “Tie-or-Break” labeling scheme + tailored neural model

SwellShark: Distantly Supervised Typing

- Data programming for typing
- Entity span detection: regular expressions based on part-of-speech (POS) tags
  - Requires expert efforts
  - Candidate generators

Distant-LSTM-CRF: Use Phrase Mining as Supervision + LSTM-CRF

- AutoPhrase + LSTM-CRF
  - AutoPhrase generates labels
    - Heuristically set thresholds
  - LSTM-CRF builds models
    - Both word & char info are used
- Problem
  - High thresholds needed for clean positive labels → many false-negative labels

AutoNER: Dual Dictionaries

- A core dictionary
  - Leads to high-precision but low-recall matches
- A “full” dictionary
  - Leads to high-recall but low-precision matches
- Introduce out-of-dictionary high-quality phrases as new entities
  - Their types are “unknown”
  - It could be any IOBES + any type

AutoNER: Fuzzy-LSTM-CRF Baseline

AutoNER: “Tie or Break”

- Instead of labeling each token, we choose to tag the connection between two adjacent tokens.
- For every two adjacent tokens, the connection between them is labeled as
  - (1) Tie, when the two tokens are matched to the same entity;
  - (2) Unknown, if at least one of the tokens belongs to an unknown-typed high-quality phrase;
  - (3) Break, otherwise.
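
The three rules above can be written down directly. This is a minimal sketch that applies them in the listed order; the function name, the `(start, end)` span format with an exclusive end index, and the example sentence are assumptions for illustration.

```python
def tie_or_break(n_tokens, entity_spans, unknown_spans):
    """Label each of the n_tokens - 1 connections between adjacent tokens.

    entity_spans:  (start, end_exclusive) spans matched to a typed entity
    unknown_spans: (start, end_exclusive) spans of unknown-typed phrases
    """
    entity_id = [None] * n_tokens
    for idx, (start, end) in enumerate(entity_spans):
        for i in range(start, end):
            entity_id[i] = idx
    in_unknown = [False] * n_tokens
    for start, end in unknown_spans:
        for i in range(start, end):
            in_unknown[i] = True

    labels = []
    for i in range(n_tokens - 1):
        if entity_id[i] is not None and entity_id[i] == entity_id[i + 1]:
            labels.append("Tie")      # both tokens inside the same matched entity
        elif in_unknown[i] or in_unknown[i + 1]:
            labels.append("Unknown")  # touches an unknown-typed phrase
        else:
            labels.append("Break")
    return labels


tokens = "Acme Corp. hired the data science team".split()
# Hypothetical matches: "Acme Corp." is a typed entity,
# "data science" is a high-quality phrase of unknown type.
labels = tie_or_break(len(tokens), entity_spans=[(0, 2)], unknown_spans=[(4, 6)])
print(labels)
```

Because the labels sit between tokens, a partially matched or uncertain phrase only affects the connections it touches rather than forcing a full tag sequence.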

AutoNER: Tailored Neural Model

Comparison – Biomedical Domain

Questions

- Can massive raw texts help? → Neural language model
- Can dictionaries help? → Distantly supervised setting
- Are human annotations always correct?
- Is the tokenizer always good?

Typical Annotation Mistakes in CoNLL03

- State-of-the-art F1 score on this test set is already around 93%
- ~5.38% of test sentences have annotation mistakes
  - Significant amount!

Evaluation on Corrected Test Set

- Higher F1 score with smaller variance
- Better reflects the real performance
- This corrected test set should be adopted in future research

CrossWeigh: Handle Noisy Training Set

Key Problem: How to Partition k-Folds?

- Random partition may be ineffective
  - Neural NER models will overfit the annotation mistakes observed during training
- Entity disjoint filtering
  - In each fold, if a “training” sentence contains any entities that appear in the “testing” set, it will be discarded during the “training”
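
Entity disjoint filtering as described above can be sketched as follows. The data layout (each sentence paired with a set of entity strings), the function name, and the round-robin fold assignment are assumptions for illustration, not the CrossWeigh implementation.

```python
import random


def entity_disjoint_folds(sentences, k, seed=0):
    """Build k (train, held_out) index splits with entity disjoint filtering.

    sentences: list of (text, entities), where entities is a set of strings.
    A training sentence is dropped from a fold if it shares any entity
    with that fold's held-out sentences.
    """
    indices = list(range(len(sentences)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]

    splits = []
    for held_out in folds:
        held = set(held_out)
        held_entities = set()
        for i in held_out:
            held_entities |= sentences[i][1]
        train = [
            i for i in indices
            if i not in held and not (sentences[i][1] & held_entities)
        ]
        splits.append((sorted(train), sorted(held_out)))
    return splits


# Toy data with hypothetical entities.
sentences = [
    ("Jim joined Acme Corp.", {"Jim", "Acme Corp."}),
    ("Acme Corp. grew fast.", {"Acme Corp."}),
    ("Bob lives in Paris.", {"Bob", "Paris"}),
    ("Paris is rainy.", {"Paris"}),
    ("Nothing to tag here.", set()),
    ("Jim met Bob.", {"Jim", "Bob"}),
]
splits = entity_disjoint_folds(sentences, k=3)
for train, held_out in splits:
    print("train:", train, "held out:", held_out)
```

With a plain random partition, a model can memorize a mislabeled entity seen in training and reproduce the same label on the held-out fold, so the mistake goes unnoticed; removing overlapping sentences avoids that.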

CrossWeigh: Evaluation

- CrossWeigh is effective with many NER models
- Entity disjoint filtering is important
- Twitter & low-resource

CrossWeigh: Identify Annotation Mistakes

- CoNLL03 train, dev & test as a super training set
- Apply CrossWeigh to identify annotation mistakes on the test set
- Evaluate against 186 human corrections
- Almost 80% of mistakes can be detected

Questions

- Can massive raw texts help? → Neural language model
- Can dictionaries help? → Distantly supervised setting
- Are human annotations always correct? → Auto-correction
- Is the tokenizer always good?

Typical NER Pipeline System

- Pre-processing tools are applied first

An Interesting Observation

- Broad Twitter Corpus (BTC)
  - A Twitter NER dataset
- spaCy
  - A popular Python NLP lib
- spaCy tokenization + BTC dataset
  - → Word boundaries of more than 45% of named entities will be incorrectly identified!
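
One way to measure this kind of boundary mismatch is to compare gold entity character spans against the token boundaries a tokenizer produces. The sketch below uses a plain whitespace tokenizer as a stand-in (it is not spaCy), and the example tweet, entity spans, and function names are assumptions for illustration.

```python
import re


def whitespace_token_offsets(text):
    """Character (start, end_exclusive) offsets of whitespace-separated tokens."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def misaligned_fraction(entity_spans, token_offsets):
    """Fraction of entities whose start or end is not a token boundary."""
    if not entity_spans:
        return 0.0
    starts = {s for s, _ in token_offsets}
    ends = {e for _, e in token_offsets}
    bad = [(s, e) for s, e in entity_spans if s not in starts or e not in ends]
    return len(bad) / len(entity_spans)


def span_of(text, mention):
    """Character span of the first occurrence of mention in text."""
    start = text.index(mention)
    return (start, start + len(mention))


text = "Jim visited #Acme with @Bob today"
gold = [span_of(text, "Jim"), span_of(text, "Acme"), span_of(text, "Bob")]
fraction = misaligned_fraction(gold, whitespace_token_offsets(text))
print(f"{fraction:.2f} of entities have boundaries the tokenizer cannot recover")
```

When an entity boundary falls inside a token, as with hashtags and mentions here, no token-level tagger built on that tokenization can output the exact gold span.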

Neural-Char-CRF: Raw-to-End

- We propose to conduct NER training in a raw-to-end manner
  - Raw text as the input & predictions at the character level
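
To make "predictions at the character level" concrete, the sketch below tags every character of the raw text and decodes entity strings back from those tags, with no tokenizer involved. The per-character BIO scheme, the function names, and the example are assumptions; the slides do not state which character-level labeling Neural-Char-CRF uses.

```python
def char_level_tags(text, entity_spans):
    """Assign a BIO tag to every character of the raw text."""
    tags = ["O"] * len(text)
    for start, end, etype in entity_spans:
        tags[start] = f"B-{etype}"
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"
    return tags


def decode_char_tags(text, tags):
    """Recover (entity_string, type) pairs from per-character BIO tags."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        continues = tag.startswith("I-") and start is not None and tag[2:] == etype
        if continues:
            continue
        if start is not None:
            spans.append((text[start:i], etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


text = "thanks @Bob!"
begin = text.index("Bob")
char_tags = char_level_tags(text, [(begin, begin + 3, "PER")])
decoded = decode_char_tags(text, char_tags)
print(decoded)
```

Since the labels are attached to characters, an entity embedded in a larger whitespace-delimited string such as "@Bob!" can still be recovered exactly.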

Neural-Char-CRF: String Match

- Prefer to match the words with higher Inverse Document Frequency (IDF)

Neural-Char-CRF: Character-Level LM

- A character-level neural language model is leveraged
  - Pre-training + contextualized representations

Comparison – Twitter NER Datasets

- Tokenizer matters
  - NLTK is the best on both datasets
- Raw-to-End wins
  - String Match is even better

Summary & Q&A

- Using neural language models, massive raw texts can help!
- High-quality dictionaries can help!
- Human annotations are NOT always correct!
- Tokenizer is not that important and sometimes even hurts!