Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based...

77
Stacking With Auxiliary Features: Improved Ensembling for Natural Language and Vision Nazneen Rajani PhD Proposal November 7, 2016 Committee members: Ray Mooney, Katrin Erk, Greg Durrett and Ken Barker

Transcript of Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based...

Page 1: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Stacking With Auxiliary Features: Improved Ensembling for Natural Language and Vision

NazneenRajaniPhDProposal

November7,2016Committeemembers:RayMooney,KatrinErk,GregDurrettandKenBarker

Page 2: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Outline• Introduction• Background&RelatedWork• CompletedWork

– StackedEnsemblesofInformationExtractorsforKnowledgeBasePopulation(ACL2015)– StackingWithAuxiliaryFeatures(Underreview)–CombiningSupervisedandUnsupervisedEnsemblesforKnowledgeBasePopulation(EMNLP2016)

• ProposedWork– Short-termproposals– Long-termproposals

2

Page 3: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Introduction

3

System 1

f( )

System 2

System N-1

System N

input

input

input

input

output

• Ensembling:Usedbythe$1MwinningteamfortheNetflixcompetition

Page 4: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Introduction• Makeauxiliaryinformationaccessibletotheensemble

4

System 1

f( )

System 2

System N-1

System N

input

input

input

input

output

Auxiliary information about task and systems

Page 5: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Background and Related Work

5

Page 6: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Cold Start Slot Filling (CSSF)• KnowledgeBasePopulation(KBP)isataskofdiscoveringentityfactsandaddingtoaKB

• Relationextraction,aKBPsub-task,usingfixedontologyisslotfilling

• CSSFisanannualNISTevaluationofbuildingKBfromscratch- queryentitiesandpre-definedslots- textcorpus 6

Page 7: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Cold Start Slot Filling (CSSF)• Someslotsaresingle-valued(per:age)whilesomearelist-valued(per:children)

• Entitytypes:PER,ORG,GPE• Alongwithfills,systemsmustprovide- confidencescore- provenance—docid:startoffset-endoffset

7

Page 8: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Cold Start Slot Filling (CSSF)

8

1. city_of_headquarters:2. website:3. subsidiaries:4. employees:5. shareholders:

Microsoftisatechnologycompany,headquarteredinRedmond,Washingtonthatdevelops…

city_of_headquarters:Redmondprovenance:

confidencescore:1.0

org:Microsoft

Page 9: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Cold Start Slot Filling (CSSF)

Distant Supervision

Bootstrapping

Universal Schema

Multi-Instance Multi-Learning

Training Data

Source Corpus

Query

Query Expansion

Document level IR

Aliasing

Answer 9

Page 10: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Entity Discovery and Linking (EDL)• KBPsub-taskinvolvingtwoNLPproblems- NamedEntityRecognition(NER)- Disambiguation

• EDLisanannualNISTevaluationin3languages:English,SpanishandChinese

• Tri-lingualEntityDiscoveryandLinking(TEDL)10

Page 11: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Tri-lingual Entity Discovery and Linking (TEDL)

• Detectallentitymentionsincorpus• LinkmentionstoEnglishKB(FreeBase)• IfnoKBentryfound,clusterintoaNILID• Entitytypes—PER,ORG,GPE,FAC,LOC• Systemsmustalsoprovideconfidencescore

11

Page 12: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Tri-lingual Entity Discovery and Linking (TEDL)

12

FreeBase entry: Hillary Diane Rodham Clinton is a US Secretary of State, U.S. Senator, and First Lady of the United States. From 2009 to 2013, she was the 67th Secretary of State, serving under President Barack Obama. She previously represented New York in the U.S. Senate. Source Corpus Document:

Hillary Clinton Not Talking About ’92 Clinton-Gore Confederate Campaign Button.. FreeBase entry:

WilliamJefferson"Bill"ClintonisanAmericanpoli5cianwhoservedasthe42ndPresidentoftheUnitedStatesfrom1993to2001.ClintonwasGovernorofArkansasfrom1979to1981and1983to1992,andArkansasAJorneyGeneralfrom1977to1979.

Page 13: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Tri-lingual Entity Discovery and Linking (TEDL)

Unsupervised Similarity

Supervised Classification

Graph Based

Joint Approach

FreeBase KB

Query

Query Expansion

Answer

Candidate Generation and Ranking

13

Page 14: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

ImageNet Object Detection• WidelyknownannualcompetitioninCVforlarge-scaleobjectrecognition

• Objectdetection- detectallinstancesofobjectcategories(total200)inimages

- localizeusingaxis-alignedBoundingBoxes(BB)• ObjectcategoriesareWordNetsynsets• Systemsalsoprovideconfidencescores 14

Page 15: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

ImageNet Object Detection

15

Page 16: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Ensemble Algorithms(Wolpert, 1992)

16

• StackingSystem 1

System 2

System N-1

System N

Trained classifier

Accept?

conf 1

conf 2

conf N-1

conf N

Page 17: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Ensemble Algorithms• BipartiteGraph-basedConsensusMaximization(BGCM)(Gaoetal.,2009)- ensembling->optimizationoverbipartitegraph- combiningsupervisedandunsupervisedmodels

• MixturesofExperts(ME)(Jacobsetal.,1991)- partitiontheproblemintosub-spaces- learntoswitchexpertsbasedoninputusingagatingnetwork- DeepMixturesofExperts(Eigenetal.,2013)

17

Page 18: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Completed Work:I. Stacked Ensembles of Information Extractors for

Knowledge Base Population (ACL2015)

18

Page 19: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Stacking(Wolpert, 1992)

Foragivenproposedslot-fill,e.g.spouse(Barack,Michelle),combineconfidencesfrommulgplesystems:

System 1

System 2

System N-1

System N

Trained linear SVM

conf 1

conf 2

conf N-1

conf N Accept? 19

Page 20: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Stacking with FeaturesForagivenproposedslot-fill,e.g.spouse(Barack,Michelle),combineconfidencesfrommulgplesystems:

System 1

System 2

System N-1

System N

conf 2

conf N-1

conf N

Trained linear SVM

Accept?

Slot Type conf 1

20

Page 21: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Stacking with FeaturesForagivenproposedslot-fill,e.g.spouse(Barack,Michelle),combineconfidencesfrommulgplesystems:

System 1

System 2

System N-1

System N

Trained linear SVM

Accept?

Slot Type Provenance conf 1

conf 2

conf N-1

conf N 21

Page 22: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Document Provenance Feature• Foragivenqueryandslot,foreachsystem,i,thereisafeatureDPi:- Nsystemsprovideafillfortheslot.- Ofthese,ngivesameprovenancedocidasi.- DPi=n/Nisthedocumentprovenancescore.

• Measuresextenttowhichsystemsagreeondocumentprovenanceoftheslotfill.

22

Page 23: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Offset Provenance Feature• Degreeofoverlapbetweensystems’provenancestrings.• UsesJaccardsimilaritycoefficient.

• SystemswithdifferentdocidhavezeroOP

23

Page 24: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Offset Provenance Feature

24

Offsets System 1 System 2 System 3

Start Offset 1 4 5

End Offset 9 7 12

OP1 =12×49+512

"

#$

%

&'

1 2 3 4 5 6 7 8 9 10 11 12 13

System 2

System 3

Page 25: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Results

Approach Precision Recall F1

Union 0.176 0.647 0.277

Voting 0.694 0.256 0.374

Best ESF system in 2014 (Stanford) 0.585 0.298 0.395

Stacking 0.606 0.402 0.483

Stacking + Relation 0.607 0.406 0.486

Stacking + Provenance + Relation 0.541 0.466 0.501

25

• Usingthe10commonsystemsbetween2013and2014

(>=3)

Page 26: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Takeaways• Stackedmeta-classifierbeatsthebestperforming2014KBPSF

systembyanF1gainof11points.• Featuresthatutilizeauxiliaryinformationimprovestacking

performance.• Ensemblinghasclearadvantagesbutnaiveapproachessuchas

votingdonotperformaswell.• Althoughsystemschangeeveryyear,thereareadvantagesin

trainingonpastdata.26

Page 27: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Completed Work:II. Stacking With Auxiliary Features (under review)

27

Page 28: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Stacking With Auxiliary Features (SWAF)

System1

System2

SystemN

TrainedMeta-classifier

ProvenanceFeatures

conf2

confN Accept?

SystemN-1 confN-1

conf1

AuxiliaryFeatures

InstanceFeatures

• Stackingusingtwotypesofauxiliaryfeatures:

28

Page 29: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Instance Features• Enablesstackertodiscriminatebetweeninputinstancetypes

• Somesystemsarebetteratcertaininputtypes• CSSF—slottype(per:age)• TEDL—entitytype(PER/ORG/GPE/FAC/LOC)• Objectdetection—objectcategoryandSIFTfeaturedescriptors

29

Page 30: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Provenance Features• Enablesthestackertodiscriminatebetweensystems

• Outputisreliableifsystemsagreeonsource• CSSFsameasslotfilling• TEDL—measuresoverlapofamention

30

Page 31: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Provenance Features• Objectdetection—measureBBoverlap

31

+

Page 32: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Post-processing• CSSF

- singlevaluedslotfills—resolveconflicts- listvaluesslotfills—alwaysinclude

• TEDL- KBID—includeinoutput- *NILID—mergeacrosssystemsifatleastoneoverlap

• Objectdetection- Foreachsystem,measuremaximumsumoverlapwithothersystems- Union/intersection—penalizedbyevaluationmetric

32

Page 33: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Results• 2015CSSF—10sharedsystems

Approach Precision Recall F1ME(Jacobsetal.,1991) 0.479 0.184 0.266Oraclevoting(>=3) 0.438 0.272 0.336

Toprankedsystem(Angelietal.,2015) 0.399 0.306 0.346Stacking 0.497 0.282 0.359

Stacking+instancefeatures 0.498 0.284 0.360

Stacking+provenancefeatures 0.508 0.286 0.366

SWAF 0.466 0.331 0.38733

Page 34: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Results• 2015TEDL—6sharedsystems

Approach Precision Recall F1Oraclevoting(>=4) 0.514 0.601 0.554

ME(Jacobsetal.,1991) 0.721 0.494 0.587Toprankedsystem(Siletal.,2015) 0.693 0.547 0.611

Stacking 0.729 0.528 0.613Stacking+instancefeatures 0.783 0.511 0.619

Stacking+provenancefeatures 0.814 0.508 0.625

SWAF 0.814 0.515 0.63034

Page 35: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Results• 2015ImageNetobjectdetection—3sharedsystems

Approach MeanAP MedianAPOraclevoting(>=1) 0.366 0.368

Beststandalonesystem(VGG+selectivesearch) 0.434 0.430Stacking 0.451 0.441

Stacking+instancefeatures 0.461 0.45MixturesofExperts(Jacobsetal.,1991) 0.494 0.489

Stacking+provenancefeatures 0.502 0.494

SWAF 0.506 0.497 35

Page 36: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Results on object detection

36

Page 37: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Takeaways• SWAFproducedSOTAonCSSFandTEDL;significant

improvementsonobjectdetection• OurapproachismorerobustthanMEintermsofnumberof

componentsystems• Workswellforimageswithmultipleinstancesofthesame

object

37

Page 38: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Completed Work:III. Combining Supervised and Unsupervised Ensembles for

Knowledge Base Population (EMNLP2016)

38

Page 39: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Combining supervised & unsupervised ensembles

SupSystem1

SupSystem2

SupSystemN

UnsupSystem1

TrainedlinearSVM

AuxiliaryFeatures

conf1

conf2

confN

UnsupSystem2 Calibratedconf

UnsupSystemM

ConstrainedOp@miza@on(Wengetal,2013)

Accept?

39

Page 40: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Unsupervised ensemble(Wang et al., 2013)

• Approachtoaggregaterawconfidencevalues• Re-weighttheconfidencescoreofaninstance- numberofsystemsthatproduceit- rankofthosesystems

• Uniformweightsforallsystems• Ourworkextendstoentitylinking

40

Page 41: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Results• 2015CSSF—#supsystems=10,#unsupsystems=13

Approach Precision Recall F1Constrainedoptimization 0.1712 0.3998 0.2397

Oraclevoting(>=3) 0.4384 0.2720 0.3357Toprankedsystem(Angelietal.,2015) 0.3989 0.3058 0.3462

SWAF 0.4656 0.3312 0.3871BGCMforcombiningsup+unsup 0.4902 0.3363 0.3989Stackingforcombiningsup+unsup

(BGCM) 0.5901 0.3021 0.3996

Stackingforcombiningsup+unsup(constrainedoptimization) 0.4676 0.4314 0.4489

41

Page 42: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Results• 2015TEDL—#supsystems=6,#unsupsystems=4

Approach Precision Recall F1Constrainedoptimization 0.176 0.445 0.252

Oraclevoting(>=4) 0.514 0.601 0.554Toprankedsystem(Siletal.,2015) 0.693 0.547 0.611

SWAF 0.813 0.515 0.630BGCMforcombiningsup+unsup 0.810 0.517 0.631Stackingforcombiningsup+unsup

(BGCM) 0.803 0.525 0.635

Stackingforcombiningsup+unsup(constrainedoptimization) 0.686 0.624 0.653

42

Page 43: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Takeaways• Manyhighrankingsystemsw/otrainingdata• Approximately1/3ofpossibleoutputsproducedbyunsupervisedensemble

• Combinationimprovesrecallsubstantially

43

Page 44: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Proposed Work:I. Short-term proposals — Semantic Instance-level Features

44

Page 45: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Instance-level features• Completedworkincludedonlysuperficialinstancefeatures

• Focusmoreontheinstancefeatures—taskspecific• Specifically,moresemanticfeatures• Basedontheresults,thesefeatures:- helpimproveperformancebythemselves,- usedalongwithprovenance

45

Page 46: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

EDL instance-level features(Francis et al., 2016)

• UsedcontextualinformationtodisambiguateentitymentionsusingCNNsforEDL

• Computessimilaritiesbetweenamention'ssourcedocumentanditspotentialentitytargetsatmultiplegranularities.

• CNNs:textblocktopicvector46

Page 47: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

EDL instance-level features

Men$onHillaryClintonContextForahostofreasons,HillaryClintonsomehowhasfailedtodeveloptherhetoricalorinterpersonalskillsthatmadeherhusband,andBarackObama,soappealingonthecampaigntrail.DocumentIdon'tdisagreewiththegistofDowd'sarBcle.Forahostofreasons,HillaryClintonsomehowhasfailedtodeveloptherhetoricalorinterpersonalskillsthatmadeherhusband,andBarackObama,soappealingonthecampaigntrail.Clintonhasalso,forreasonsgoodandbad,madeanumberoferrorsinjudgementinherrun-uptohercurrentcampaign...

TitleHillaryDianeRodhamClintonAr$cleHillaryClintonisaformerUnitedStatesSecretaryofState,U.S.Senator,andFirstLadyoftheUnitedStates.From2009to2013,shewasthe67thSecretaryofState,servingunderPresidentBarackObama.ShepreviouslyrepresentedNewYorkintheU.S.Senate.Beforethat,asthewifeofPresidentBillClinton,shewasFirstLadyfrom1993to2001.Inthe2008elecBon,ClintonwasaleadingcandidatefortheDemocraBcpresidenBalnominaBon.AnaBveofIllinois,HillaryRodhamwasthefirststudentcommencementspeakeratWellesleyCollegein1969.ShethenearnedaJ.D.fromYaleLawSchoolin1973.AXerabriefsBntasaCongressionallegalcounsel,shemovedtoArkansasandmarriedBillClintonin1975.RodhamcofoundedtheArkansasAdvocatesforChildrenandFamiliesin1977..

smenBon tBtle

tarBcle

scontext

sdocument

• Examplesourceandtargetgranularitiesforaninstanceinthe2016NISTKBPdataset.

47

Page 48: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Object detection instance-level features• ImageNetprovidesattributesdatasetforcertaincategories• Annotatedwithpre-definedsetsofattributes:- Color:black,blue,brown,gray,green,orange,pink,red,violet,white,yellow

- Pattern:spotted,striped- Shape:long,round,rectangular,square- Texture:furry,smooth,rough,shiny,metallic,vegetation,wooden,wet

48

Page 49: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Proposed Work:I. Short-term proposals — Improve Foreign Language KBP

49

Page 50: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Foreign language features• ThisworkwillonlyapplytotheKBPtasks• Resultsonthe2016TEDLtask

Language Precision Recall F1

English 0.805 0.508 0.623

Spanish 0.79 0.443 0.568

Chinese 0.792 0.495 0.609

Combined 0.789 0.481 0.59750

Page 51: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Foreign language features• TEDL-foreignlanguagetrainingdata• AuxiliaryfeaturesdonottranslatetoChineseandSpanish

• Straightforwardfeature—languageindicator• Uselanguageindependentfeatures- non-lexical

51

Page 52: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Language Independent Entity Linking (LIEL) solution to TEDL

(Sil and Florian, 2016)

• EntitycategoryPMI• Categoricalrelationfrequency• Titleco-occurrencefrequency

52

Page 53: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Proposed Work:II. Long-term proposals — Visual Question Answering

53

Page 54: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Visual Question Answering (VQA)(Antol et al., 2015)

• UnderstandhowDNNsdoobjectdetection

54

Page 55: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Visual Question Answering (VQA)• VQAinvolvesbothlanguageandvision• DemonstrateSWAFonVQA• Ensemblebasedontheanswers- Multiplechoicequestions- Openendedanswers—90%one-wordanswers

• Useexplanationsasauxiliaryfeatures 55

Page 56: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Proposed Work:II. Long-term proposals — Explanations as auxiliary features

56

Page 57: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Explanation as auxiliary features• Completedworkfocusedonusingprovenance• Captured“where”aspectoftheoutput• RecentworkongeneratingexplanationstointerpretDNNs:- TowardsTransparentAIsystems- Generatingvisualexplanations- VisualQuestionAnswering(VQA)

• DARPAprogramforexplainableAI(XAI) 57

Page 58: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Explanation as auxiliary features• Useexplanationsasauxiliaryfeatures• Capture“why”aspectoftheoutput• Twotypesofexplanations:- Textual- Visual

58

Page 59: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Text as Explanation(Hendricks et al., 2016)

• Generatingvisualexplanations• Jointlypredictvisualclassandgeneratetextasexplanation

• Usesdescriptivepropertiesvisibleintheimage

59

Page 60: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Text as Explanation

60

ThisisaKentuckywarblerbecausethisisayellowbirdwithashorttail

ThisisaKentuckywarblerbecausethisisayellowbirdwithablackcheekpatchandablackcrown

SystemA(Berkeley) SystemBInputimage

Page 61: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Text as Explanation• Trustagreementbetweensystemswithsimilarexplanations

• MTmetrics—BLEU/METEORforsimilarity• MinimumBayesRisk(MBR)decoding• Embeddingsofwordsintheexplanation

61

Page 62: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Images as Explanation• DNNsattendtorelevantpartsofimagewhiledoingVQA(Goyaletal.,2016)

• Heat-maptovisualizeattentioninimages• Humanstrustsystemswithbetterexplanationsmoreevenwhentheyallpredictthesameoutput(Selvarajuetal.,2016)

• Enablethestackertolearntorelyonsystemsthat“look”attherightregionoftheimagewhilepredictingtheanswer

62

Page 63: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Images as Explanation

63

SystemA SystemBInputimage

Q:Whatcoloristhecat? A:Brown A:Brown

Page 64: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Images as Explanation• UsevisualexplanationtoimproveVQA• Measureagreementbetweensystems’heat-maps- KL-divergence- Measurecorrelation

• Usingvisualexplanation- improveperformance- modelwithbetterexplanations 64

Page 65: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Conclusion

65

Page 66: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Conclusion• Generalproblemofcombiningoutputsfromdiversesystems

• SWAFonthreedifficulttasks• Provenancecaptures“where”oftheoutput• Combiningsupervisedandunsupervisedensemblesimprovesrecall

• Short-term:betterauxiliaryfeatures• Long-term:focuson“why”oftheoutput

66

Page 67: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

67

System1

System2

System3

Meta-classifier

You!Questions?

ThankYou!

Questions?

ThankYou!Questions?

Page 68: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Backup slides

68

Page 69: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Results on CSSF

69

Page 70: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Results on TEDL

70

Page 71: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Number of systems in 2016Supervised Unsupervised

English Chinese Spanish English Chinese Spanish

TEDL 5 4 4 7 3 3

CSSF 8 2 3 8 1 0

71

Page 72: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Learning Curve• Systemschangeeachyear.• Stillusefultotrainonpastdata.

72

Page 73: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Incremental Training on Systems• Sortthecommonsystemsbasedontheirperformance.• Traintheclassifieraddingonesystemateachstep.• Teston2014data.

73

Page 74: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Unsupervised ensemble• Mutualexclusionproperty

• Listvaluedslotfillreplace1by

• Forentity-linking,1isreplacedwith

avg no. of correct slot fills

total no. of slot fills

avg no. of correct mentions for an entity type

total no. of mentions for that entity type

74

Page 75: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Ratio of sup and unsup systems• Unsupervised~1/3ofthecombination• Commonoutput:22%forCSSFand15%forTEDL

75

Page 76: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

KBP instance-level features• Embedthewordsinad-dimensionalspace- d=300withwindowsize=21

• Wordsvectorusingaconv-netfilterMg

• SimilarsemanticfeaturesbetweenquerydocumentandprovenancedocumentfortheCSSFtask

76

Page 77: Stacking With Auxiliary Features: Improved Ensembling for ... · • Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009) - ensembling -> optimization over bipartite

Language Independent Entity Linking (LIEL) solution to TEDL (Sil and Florian, 2016)

• EntitycategoryPMI- CalculatesthePMIbetweenpairofentities(e1,e2)thatco-occurinadocument

• Categoricalrelationfrequency- CountthenumberofKBrelationsthatexistsbetweenpairofentities(e1,e2)

• Titleco-occurrencefrequency- Foreverypairofconsecutiveentities(e,e’),computesthenumberoftimese’appearsasalinkintheKBpagefore 77