Stacking With Auxiliary Features: Improved Ensembling for Natural Language and Vision
Nazneen Rajani, PhD Proposal
November 7, 2016
Committee members: Ray Mooney, Katrin Erk, Greg Durrett and Ken Barker
Outline
• Introduction
• Background & Related Work
• Completed Work
- Stacked Ensembles of Information Extractors for Knowledge Base Population (ACL 2015)
- Stacking With Auxiliary Features (Under review)
- Combining Supervised and Unsupervised Ensembles for Knowledge Base Population (EMNLP 2016)
• Proposed Work
- Short-term proposals
- Long-term proposals
Introduction
[Figure: the same input is given to System 1, System 2, ..., System N-1 and System N; a function f( ) combines the systems' outputs into a single output]
• Ensembling: used by the $1M winning team for the Netflix competition
Introduction
• Make auxiliary information accessible to the ensemble
[Figure: the same ensemble diagram, with auxiliary information about the task and systems as an additional input to f( )]
Background and Related Work
Cold Start Slot Filling (CSSF)
• Knowledge Base Population (KBP) is the task of discovering facts about entities and adding them to a KB
• Relation extraction, a KBP sub-task, using a fixed ontology is slot filling
• CSSF is an annual NIST evaluation of building a KB from scratch
- query entities and pre-defined slots
- text corpus
Cold Start Slot Filling (CSSF)
• Some slots are single-valued (per:age) while some are list-valued (per:children)
• Entity types: PER, ORG, GPE
• Along with fills, systems must provide
- confidence score
- provenance: docid:startoffset-endoffset
Cold Start Slot Filling (CSSF)
Query entity: org: Microsoft
Slots:
1. city_of_headquarters:
2. website:
3. subsidiaries:
4. employees:
5. shareholders:
Source text: Microsoft is a technology company, headquartered in Redmond, Washington that develops…
Slot fill:
city_of_headquarters: Redmond
provenance:
confidence score: 1.0
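A slot fill with its confidence score and provenance can be represented as a small record. The sketch below is only an illustration: the class names, the `parse_provenance` helper, and the docid and offsets in the example are hypothetical and do not come from the NIST task definition; only the `docid:startoffset-endoffset` layout is taken from the slide above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provenance:
    docid: str
    start: int  # start offset in the document
    end: int    # end offset in the document


@dataclass(frozen=True)
class SlotFill:
    query_entity: str
    slot: str
    fill: str
    confidence: float
    provenance: Provenance


def parse_provenance(text: str) -> Provenance:
    """Parse a 'docid:startoffset-endoffset' string."""
    # Split on the last ':' so that docids containing ':' still parse.
    docid, _, span = text.rpartition(":")
    start, _, end = span.partition("-")
    return Provenance(docid, int(start), int(end))


# Hypothetical docid and offsets, for illustration only.
fill = SlotFill(
    query_entity="org:Microsoft",
    slot="city_of_headquarters",
    fill="Redmond",
    confidence=1.0,
    provenance=parse_provenance("DOC_0001:52-58"),
)
```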
Cold Start Slot Filling (CSSF)
[Figure: CSSF system pipeline. Components shown: Training Data, Source Corpus, Query, Query Expansion, Document level IR, Aliasing, Answer. Approaches shown: Distant Supervision, Bootstrapping, Universal Schema, Multi-Instance Multi-Learning]
Entity Discovery and Linking (EDL)
• KBP sub-task involving two NLP problems
- Named Entity Recognition (NER)
- Disambiguation
• EDL is an annual NIST evaluation in 3 languages: English, Spanish and Chinese
• Tri-lingual Entity Discovery and Linking (TEDL)
Tri-lingual Entity Discovery and Linking (TEDL)
• Detect all entity mentions in corpus
• Link mentions to English KB (FreeBase)
• If no KB entry found, cluster into a NIL ID
• Entity types: PER, ORG, GPE, FAC, LOC
• Systems must also provide confidence score
Tri-lingual Entity Discovery and Linking (TEDL)
Source Corpus Document: Hillary Clinton Not Talking About ’92 Clinton-Gore Confederate Campaign Button..
FreeBase entry: Hillary Diane Rodham Clinton is a US Secretary of State, U.S. Senator, and First Lady of the United States. From 2009 to 2013, she was the 67th Secretary of State, serving under President Barack Obama. She previously represented New York in the U.S. Senate.
FreeBase entry: William Jefferson "Bill" Clinton is an American politician who served as the 42nd President of the United States from 1993 to 2001. Clinton was Governor of Arkansas from 1979 to 1981 and 1983 to 1992, and Arkansas Attorney General from 1977 to 1979.
Tri-lingual Entity Discovery and Linking (TEDL)
[Figure: TEDL system pipeline. Components shown: Query, Query Expansion, Candidate Generation and Ranking, FreeBase KB, Answer. Approaches shown: Unsupervised Similarity, Supervised Classification, Graph Based, Joint Approach]
ImageNet Object Detection
• Widely known annual competition in CV for large-scale object recognition
• Object detection
- detect all instances of object categories (total 200) in images
- localize using axis-aligned Bounding Boxes (BB)
• Object categories are WordNet synsets
• Systems also provide confidence scores
ImageNet Object Detection
Ensemble Algorithms
• Stacking (Wolpert, 1992)
[Figure: confidences conf 1, conf 2, ..., conf N-1, conf N from System 1 through System N feed a trained classifier that decides "Accept?"]
Ensemble Algorithms
• Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009)
- ensembling -> optimization over bipartite graph
- combining supervised and unsupervised models
• Mixtures of Experts (ME) (Jacobs et al., 1991)
- partition the problem into sub-spaces
- learn to switch experts based on input using a gating network
- Deep Mixtures of Experts (Eigen et al., 2013)
Completed Work: I. Stacked Ensembles of Information Extractors for Knowledge Base Population (ACL 2015)
Stacking (Wolpert, 1992)
For a given proposed slot-fill, e.g. spouse(Barack, Michelle), combine confidences from multiple systems:
[Figure: conf 1, conf 2, ..., conf N-1, conf N from System 1 through System N feed a trained linear SVM that decides "Accept?"]
Stacking with Features
For a given proposed slot-fill, e.g. spouse(Barack, Michelle), combine confidences from multiple systems:
[Figure: the same diagram, with Slot Type as an additional input to the trained linear SVM]
Stacking with Features
For a given proposed slot-fill, e.g. spouse(Barack, Michelle), combine confidences from multiple systems:
[Figure: the same diagram, with Slot Type and Provenance as additional inputs to the trained linear SVM]
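The input to the stacker in the diagrams above can be pictured as one feature vector per proposed slot fill. The sketch below is a hedged illustration, not the authors' implementation: the function name, the small `SLOT_TYPES` list, the one-hot encoding of the slot type, and the choice of 0.0 for a system that did not propose the fill are all assumptions. The resulting vector would then be passed to a linear SVM, which is not shown here.

```python
from typing import List, Optional

# Illustrative subset of slot types; a real run would list every slot in the ontology.
SLOT_TYPES = ["per:age", "per:children", "city_of_headquarters"]


def stacking_features(
    confs: List[Optional[float]],
    slot_type: str,
    provenance_scores: List[float],
) -> List[float]:
    """Concatenate per-system confidences, a one-hot slot type, and provenance scores."""
    # Assumption: a system that did not produce this fill contributes confidence 0.0.
    conf_part = [c if c is not None else 0.0 for c in confs]
    slot_part = [1.0 if slot_type == s else 0.0 for s in SLOT_TYPES]
    return conf_part + slot_part + list(provenance_scores)


# Three systems; the second one did not propose this fill.
vector = stacking_features([0.9, None, 0.4], "per:age", [0.5, 0.0, 0.5])
```

The vector has one confidence per system, one indicator per slot type, and one provenance score per system, so its length is fixed for a given set of systems and slots.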
Document Provenance Feature
• For a given query and slot, for each system i, there is a feature DP_i:
- N systems provide a fill for the slot.
- Of these, n give the same provenance docid as i.
- DP_i = n/N is the document provenance score.
• Measures the extent to which systems agree on the document provenance of the slot fill.
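The document provenance score can be computed directly from the docids that the systems report. This is a minimal sketch under two assumptions: the function name and input layout (a mapping from system name to docid) are hypothetical, and n is taken to include system i itself.

```python
from collections import Counter
from typing import Dict


def document_provenance(docids: Dict[str, str]) -> Dict[str, float]:
    """Map each system to DP_i = n / N for one query and slot.

    docids maps a system name to the provenance docid it reported.
    """
    total = len(docids)                # N: systems that provided a fill
    counts = Counter(docids.values())  # n for each docid (counts system i itself)
    return {system: counts[docid] / total for system, docid in docids.items()}


scores = document_provenance({"sys1": "docA", "sys2": "docA", "sys3": "docB"})
```

Here two of the three systems agree on `docA`, so they each receive 2/3, while the third receives 1/3.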
Offset Provenance Feature
• Degree of overlap between systems' provenance strings.
• Uses Jaccard similarity coefficient.
• Systems with different docid have zero OP
Offset Provenance Feature
Offsets        System 1   System 2   System 3
Start Offset   1          4          5
End Offset     9          7          12
$OP_1 = \frac{1}{2} \times \left( \frac{4}{9} + \frac{5}{12} \right)$
[Figure: number line over character positions 1 to 13 marking the systems' provenance spans]
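The worked example above (spans 1-9, 4-7 and 5-12) can be reproduced with a short sketch. It assumes inclusive character offsets, which is what makes the overlaps come out as 4/9 and 5/12, and it assumes the score for a system is the mean Jaccard overlap with every other system, with zero for a different docid. The function names and the `"doc"` identifier are hypothetical.

```python
from fractions import Fraction
from typing import Dict, Tuple

Span = Tuple[str, int, int]  # (docid, start offset, end offset), offsets inclusive


def jaccard(a: Span, b: Span) -> Fraction:
    """Jaccard similarity of two character spans; zero if the docids differ."""
    if a[0] != b[0]:
        return Fraction(0)
    inter = min(a[2], b[2]) - max(a[1], b[1]) + 1
    if inter <= 0:
        return Fraction(0)
    union = (a[2] - a[1] + 1) + (b[2] - b[1] + 1) - inter
    return Fraction(inter, union)


def offset_provenance(spans: Dict[str, Span], system: str) -> Fraction:
    """Mean Jaccard overlap between one system's span and all other systems' spans."""
    others = [span for name, span in spans.items() if name != system]
    if not others:
        return Fraction(0)
    total = sum((jaccard(spans[system], o) for o in others), Fraction(0))
    return total / len(others)


spans = {
    "System 1": ("doc", 1, 9),
    "System 2": ("doc", 4, 7),
    "System 3": ("doc", 5, 12),
}
op1 = offset_provenance(spans, "System 1")  # (4/9 + 5/12) / 2
```

Exact fractions are used so the result can be compared term by term with the slide's arithmetic.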
Results
• Using the 10 common systems between 2013 and 2014
Approach                              Precision   Recall   F1
Union                                 0.176       0.647    0.277
Voting (>=3)                          0.694       0.256    0.374
Best ESF system in 2014 (Stanford)    0.585       0.298    0.395
Stacking                              0.606       0.402    0.483
Stacking + Relation                   0.607       0.406    0.486
Stacking + Provenance + Relation      0.541       0.466    0.501
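The Union and Voting baselines in the results table are simple set operations over the systems' outputs. The sketch below is an illustration with hypothetical function names; it assumes each system's output is a set of proposed fills and reads the (>=3) annotation as a minimum of three agreeing systems.

```python
from collections import Counter
from typing import Hashable, List, Set


def union_ensemble(outputs: List[Set[Hashable]]) -> Set[Hashable]:
    """Accept every fill proposed by at least one system."""
    result: Set[Hashable] = set()
    for fills in outputs:
        result |= fills
    return result


def voting_ensemble(outputs: List[Set[Hashable]], min_votes: int = 3) -> Set[Hashable]:
    """Accept a fill only if at least min_votes systems proposed it."""
    votes = Counter(fill for fills in outputs for fill in fills)
    return {fill for fill, n in votes.items() if n >= min_votes}


# Four hypothetical systems proposing (slot, fill) pairs.
outputs = [
    {("spouse", "Michelle"), ("age", "55")},
    {("spouse", "Michelle")},
    {("spouse", "Michelle"), ("age", "54")},
    {("age", "55")},
]
accepted_union = union_ensemble(outputs)
accepted_vote = voting_ensemble(outputs, min_votes=3)
```

Union accepts everything any system proposes, which favours recall, while voting keeps only widely agreed fills, which favours precision; this matches the trade-off visible in the table.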
Takeaways
• Stacked meta-classifier beats the best performing 2014 KBP SF system by an F1 gain of 11 points.
• Features that utilize auxiliary information improve stacking performance.
• Ensembling has clear advantages but naive approaches such as voting do not perform as well.
• Although systems change every year, there are advantages in training on past data.
Completed Work: II. Stacking With Auxiliary Features (under review)
Stacking With Auxiliary Features (SWAF)
• Stacking using two types of auxiliary features:
[Figure: conf 1, conf 2, ..., conf N-1, conf N from System 1 through System N, together with auxiliary features (provenance features and instance features), feed a trained meta-classifier that decides "Accept?"]
Instance Features
• Enables the stacker to discriminate between input instance types
• Some systems are better at certain input types
• CSSF: slot type (per:age)
• TEDL: entity type (PER/ORG/GPE/FAC/LOC)
• Object detection: object category and SIFT feature descriptors
Provenance Features
• Enables the stacker to discriminate between systems
• Output is reliable if systems agree on source
• CSSF: same as slot filling
• TEDL: measures overlap of a mention
Provenance Features
• Object detection: measure BB overlap
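For object detection, one common way to measure bounding-box overlap is intersection over union. The slides do not state which overlap measure is used, so treating it as intersection over union is an assumption here, as are the function name and the `(x_min, y_min, x_max, y_max)` box layout.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), axis-aligned


def bb_overlap(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned bounding boxes."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0  # boxes do not overlap
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


# Two 2x2 boxes that share a 1x1 corner.
overlap = bb_overlap((0, 0, 2, 2), (1, 1, 3, 3))
```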
Post-processing
• CSSF
- single-valued slot fills: resolve conflicts
- list-valued slot fills: always include
• TEDL
- KB ID: include in output
- NIL ID: merge across systems if at least one overlap
• Object detection
- For each system, measure maximum sum overlap with other systems
- Union/intersection: penalized by evaluation metric
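The NIL ID rule for TEDL, merging clusters across systems when they share at least one mention, can be sketched with a union-find structure. This is an assumed reading of the bullet rather than the authors' code: clusters are modelled as sets of mention identifiers, and the function name and identifiers are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Set


def merge_nil_clusters(clusters: Iterable[Set[Hashable]]) -> List[Set[Hashable]]:
    """Merge NIL clusters (from any system) that share at least one mention."""
    parent: Dict[Hashable, Hashable] = {}

    def find(x: Hashable) -> Hashable:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for cluster in clusters:
        members = list(cluster)
        for mention in members:
            find(mention)  # register every mention
        for mention in members[1:]:
            parent[find(mention)] = find(members[0])

    merged: Dict[Hashable, Set[Hashable]] = {}
    for mention in list(parent):
        merged.setdefault(find(mention), set()).add(mention)
    return list(merged.values())


# System A clusters followed by system B clusters; "m2" links two of them.
merged = merge_nil_clusters([{"m1", "m2"}, {"m5"}, {"m2", "m3"}, {"m4"}])
```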
Results
• 2015 CSSF: 10 shared systems
Approach                                   Precision   Recall   F1
ME (Jacobs et al., 1991)                   0.479       0.184    0.266
Oracle voting (>=3)                        0.438       0.272    0.336
Top ranked system (Angeli et al., 2015)    0.399       0.306    0.346
Stacking                                   0.497       0.282    0.359
Stacking + instance features               0.498       0.284    0.360
Stacking + provenance features             0.508       0.286    0.366
SWAF                                       0.466       0.331    0.387
Results
• 2015 TEDL: 6 shared systems
Approach                                Precision   Recall   F1
Oracle voting (>=4)                     0.514       0.601    0.554
ME (Jacobs et al., 1991)                0.721       0.494    0.587
Top ranked system (Sil et al., 2015)    0.693       0.547    0.611
Stacking                                0.729       0.528    0.613
Stacking + instance features            0.783       0.511    0.619
Stacking + provenance features          0.814       0.508    0.625
SWAF                                    0.814       0.515    0.630
Results
• 2015 ImageNet object detection: 3 shared systems
Approach                                            Mean AP   Median AP
Oracle voting (>=1)                                 0.366     0.368
Best standalone system (VGG + selective search)     0.434     0.430
Stacking                                            0.451     0.441
Stacking + instance features                        0.461     0.45
Mixtures of Experts (Jacobs et al., 1991)           0.494     0.489
Stacking + provenance features                      0.502     0.494
SWAF                                                0.506     0.497
Results on object detection
Takeaways
• SWAF produced SOTA on CSSF and TEDL; significant improvements on object detection
• Our approach is more robust than ME in terms of number of component systems
• Works well for images with multiple instances of the same object
Completed Work: III. Combining Supervised and Unsupervised Ensembles for Knowledge Base Population (EMNLP 2016)
Combining supervised & unsupervised ensembles
[Figure: confidences conf 1, conf 2, ..., conf N from Sup System 1 through Sup System N, auxiliary features, and a calibrated confidence obtained by constrained optimization (Wang et al., 2013) over Unsup System 1 through Unsup System M feed a trained linear SVM that decides "Accept?"]
Unsupervised ensemble (Wang et al., 2013)
• Approach to aggregate raw confidence values
• Re-weight the confidence score of an instance based on
- number of systems that produce it
- rank of those systems
• Uniform weights for all systems
• Our work extends to entity linking
Results
• 2015 CSSF: # sup systems = 10, # unsup systems = 13
Approach                                                         Precision   Recall   F1
Constrained optimization                                         0.1712      0.3998   0.2397
Oracle voting (>=3)                                              0.4384      0.2720   0.3357
Top ranked system (Angeli et al., 2015)                          0.3989      0.3058   0.3462
SWAF                                                             0.4656      0.3312   0.3871
BGCM for combining sup + unsup                                   0.4902      0.3363   0.3989
Stacking for combining sup + unsup (BGCM)                        0.5901      0.3021   0.3996
Stacking for combining sup + unsup (constrained optimization)    0.4676      0.4314   0.4489
Results
• 2015 TEDL: # sup systems = 6, # unsup systems = 4
Approach                                                         Precision   Recall   F1
Constrained optimization                                         0.176       0.445    0.252
Oracle voting (>=4)                                              0.514       0.601    0.554
Top ranked system (Sil et al., 2015)                             0.693       0.547    0.611
SWAF                                                             0.813       0.515    0.630
BGCM for combining sup + unsup                                   0.810       0.517    0.631
Stacking for combining sup + unsup (BGCM)                        0.803       0.525    0.635
Stacking for combining sup + unsup (constrained optimization)    0.686       0.624    0.653
Takeaways
• Many high-ranking systems come without training data
• Approximately 1/3 of possible outputs are produced by the unsupervised ensemble
• Combination improves recall substantially
Proposed Work: I. Short-term proposals: Semantic Instance-level Features
Instance-level features
• Completed work included only superficial instance features
• Focus more on the instance features, which are task specific
• Specifically, more semantic features
• Based on the results, these features:
- help improve performance by themselves,
- help further when used along with provenance
EDL instance-level features (Francis et al., 2016)
• Used contextual information to disambiguate entity mentions using CNNs for EDL
• Computes similarities between a mention's source document and its potential entity targets at multiple granularities.
• CNNs: text block -> topic vector
EDL instance-level features
Source:
- s_mention (Mention): Hillary Clinton
- s_context (Context): For a host of reasons, Hillary Clinton somehow has failed to develop the rhetorical or interpersonal skills that made her husband, and Barack Obama, so appealing on the campaign trail.
- s_document (Document): I don't disagree with the gist of Dowd's article. For a host of reasons, Hillary Clinton somehow has failed to develop the rhetorical or interpersonal skills that made her husband, and Barack Obama, so appealing on the campaign trail. Clinton has also, for reasons good and bad, made a number of errors in judgement in her run-up to her current campaign...
Target:
- t_title (Title): Hillary Diane Rodham Clinton
- t_article (Article): Hillary Clinton is a former United States Secretary of State, U.S. Senator, and First Lady of the United States. From 2009 to 2013, she was the 67th Secretary of State, serving under President Barack Obama. She previously represented New York in the U.S. Senate. Before that, as the wife of President Bill Clinton, she was First Lady from 1993 to 2001. In the 2008 election, Clinton was a leading candidate for the Democratic presidential nomination. A native of Illinois, Hillary Rodham was the first student commencement speaker at Wellesley College in 1969. She then earned a J.D. from Yale Law School in 1973. After a brief stint as a Congressional legal counsel, she moved to Arkansas and married Bill Clinton in 1975. Rodham cofounded the Arkansas Advocates for Children and Families in 1977..
• Example source and target granularities for an instance in the 2016 NIST KBP dataset.
Object detection instance-level features
• ImageNet provides attributes dataset for certain categories
• Annotated with pre-defined sets of attributes:
- Color: black, blue, brown, gray, green, orange, pink, red, violet, white, yellow
- Pattern: spotted, striped
- Shape: long, round, rectangular, square
- Texture: furry, smooth, rough, shiny, metallic, vegetation, wooden, wet
Proposed Work: I. Short-term proposals: Improve Foreign Language KBP
Foreign language features
• This work will only apply to the KBP tasks
• Results on the 2016 TEDL task
Language    Precision   Recall   F1
English     0.805       0.508    0.623
Spanish     0.79        0.443    0.568
Chinese     0.792       0.495    0.609
Combined    0.789       0.481    0.597
Foreign language features
• TEDL: foreign language training data
• Auxiliary features do not translate to Chinese and Spanish
• Straightforward feature: language indicator
• Use language independent features
- non-lexical
Language Independent Entity Linking (LIEL) solution to TEDL (Sil and Florian, 2016)
• Entity category PMI
• Categorical relation frequency
• Title co-occurrence frequency
Proposed Work: II. Long-term proposals: Visual Question Answering
Visual Question Answering (VQA) (Antol et al., 2015)
• Understand how DNNs do object detection
Visual Question Answering (VQA)
• VQA involves both language and vision
• Demonstrate SWAF on VQA
• Ensemble based on the answers
- Multiple choice questions
- Open ended answers: 90% one-word answers
• Use explanations as auxiliary features
Proposed Work: II. Long-term proposals: Explanations as auxiliary features
Explanation as auxiliary features
• Completed work focused on using provenance
• Captured “where” aspect of the output
• Recent work on generating explanations to interpret DNNs:
- Towards Transparent AI systems
- Generating visual explanations
- Visual Question Answering (VQA)
• DARPA program for explainable AI (XAI)
Explanation as auxiliary features
• Use explanations as auxiliary features
• Capture “why” aspect of the output
• Two types of explanations:
- Textual
- Visual
Text as Explanation (Hendricks et al., 2016)
• Generating visual explanations
• Jointly predict visual class and generate text as explanation
• Uses descriptive properties visible in the image
Text as Explanation
[Figure: an input image with the explanations generated by System A (Berkeley) and System B]
This is a Kentucky warbler because this is a yellow bird with a short tail
This is a Kentucky warbler because this is a yellow bird with a black cheek patch and a black crown
Text as Explanation
• Trust agreement between systems with similar explanations
• MT metrics: BLEU/METEOR for similarity
• Minimum Bayes Risk (MBR) decoding
• Embeddings of words in the explanation
Images as Explanation
• DNNs attend to relevant parts of image while doing VQA (Goyal et al., 2016)
• Heat-map to visualize attention in images
• Humans trust systems with better explanations more even when they all predict the same output (Selvaraju et al., 2016)
• Enable the stacker to learn to rely on systems that “look” at the right region of the image while predicting the answer
Images as Explanation
[Figure: an input image with the heat-maps produced by System A and System B]
Q: What color is the cat?
System A: Brown
System B: Brown
Images as Explanation
• Use visual explanation to improve VQA
• Measure agreement between systems' heat-maps
- KL-divergence
- Measure correlation
• Using visual explanation
- improve performance
- model with better explanations
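Agreement between two systems' heat-maps via KL-divergence could be computed along the following lines. This is a sketch of proposed work, so every detail is an assumption: heat-maps are taken to be equally sized grids of non-negative attention values, a small `eps` is added to avoid zero probabilities, and the symmetrised form of KL-divergence is one possible choice, not something the slides specify.

```python
import math
from typing import List, Sequence

HeatMap = Sequence[Sequence[float]]


def normalize(heat_map: HeatMap, eps: float = 1e-8) -> List[float]:
    """Flatten a heat-map and rescale it into a probability distribution."""
    flat = [value + eps for row in heat_map for value in row]
    total = sum(flat)
    return [value / total for value in flat]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two distributions of equal length with no zero entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def heat_map_disagreement(a: HeatMap, b: HeatMap) -> float:
    """Symmetrised KL-divergence; 0 when the two heat-maps agree exactly."""
    p, q = normalize(a), normalize(b)
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


# Hypothetical 2x2 attention maps from two systems.
map_a = [[0.7, 0.1], [0.1, 0.1]]
map_b = [[0.1, 0.1], [0.1, 0.7]]
score_same = heat_map_disagreement(map_a, map_a)
score_diff = heat_map_disagreement(map_a, map_b)
```

A low value could then serve as an auxiliary feature indicating that two systems attended to the same image region.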
Conclusion
Conclusion
• General problem of combining outputs from diverse systems
• SWAF on three difficult tasks
• Provenance captures “where” of the output
• Combining supervised and unsupervised ensembles improves recall
• Short-term: better auxiliary features
• Long-term: focus on “why” of the output
[Figure: System 1, System 2 and System 3 feed a meta-classifier]
Thank You! Questions?
Backup slides
Results on CSSF
Results on TEDL
Number of systems in 2016
        Supervised                      Unsupervised
        English   Chinese   Spanish     English   Chinese   Spanish
TEDL    5         4         4           7         3         3
CSSF    8         2         3           8         1         0
Learning Curve
• Systems change each year.
• Still useful to train on past data.
Incremental Training on Systems
• Sort the common systems based on their performance.
• Train the classifier adding one system at each step.
• Test on 2014 data.
Unsupervised ensemble
• Mutual exclusion property
• For list-valued slot fills, replace 1 by (avg no. of correct slot fills) / (total no. of slot fills)
• For entity linking, 1 is replaced with (avg no. of correct mentions for an entity type) / (total no. of mentions for that entity type)
Ratio of sup and unsup systems
• Unsupervised ~1/3 of the combination
• Common output: 22% for CSSF and 15% for TEDL
KBP instance-level features
• Embed the words in a d-dimensional space
- d = 300 with window size = 21
• Words -> vector using a conv-net filter M_g
• Similar semantic features between query document and provenance document for the CSSF task
Language Independent Entity Linking (LIEL) solution to TEDL (Sil and Florian, 2016)
• Entity category PMI
- Calculates the PMI between pairs of entities (e1, e2) that co-occur in a document
• Categorical relation frequency
- Counts the number of KB relations that exist between a pair of entities (e1, e2)
• Title co-occurrence frequency
- For every pair of consecutive entities (e, e’), computes the number of times e’ appears as a link in the KB page for e
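The PMI between a pair of co-occurring entities can be estimated from document-level counts. The sketch below is a generic PMI computation, not the LIEL implementation: the function name, the representation of each document as a collection of entity identifiers, the use of document frequencies as probabilities, and the natural logarithm are all assumptions.

```python
import math
from typing import Hashable, Iterable, List, Set


def entity_pmi(documents: Iterable[Iterable[Hashable]], e1: Hashable, e2: Hashable) -> float:
    """PMI(e1, e2) = log( p(e1, e2) / (p(e1) * p(e2)) ), estimated over documents."""
    docs: List[Set[Hashable]] = [set(doc) for doc in documents]
    n = len(docs)
    count_1 = sum(1 for doc in docs if e1 in doc)
    count_2 = sum(1 for doc in docs if e2 in doc)
    count_both = sum(1 for doc in docs if e1 in doc and e2 in doc)
    if count_both == 0:
        return float("-inf")  # the pair never co-occurs
    p_both = count_both / n
    return math.log(p_both / ((count_1 / n) * (count_2 / n)))


# Four hypothetical documents, each listed as the entities it mentions.
corpus = [["A", "B"], ["A", "B"], ["A"], ["C"]]
pmi_ab = entity_pmi(corpus, "A", "B")  # log(0.5 / (0.75 * 0.5)) = log(4/3)
```

Because the feature depends only on entity identifiers and counts, it does not rely on the words of any particular language.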