CS388: Natural Language Processing Lecture 22: Question ...gdurrett/courses/fa2019/...

CS388: Natural Language Processing Greg Durrett Lecture 22: Question Answering 2


  • CS388: Natural Language Processing

    Greg Durrett

    Lecture 22: Question Answering 2

  • Recall: SQuAD

    ‣ Single-document, single-sentence question-answering task where the answer is always a substring of the passage

    Rajpurkar et al. (2016)

    ‣ Predict start and end indices of the answer in the passage
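A minimal sketch of how per-token start and end scores are commonly decoded into an answer span: pick the pair (i, j) with i ≤ j that maximizes the summed score, capped at a maximum span length. The function name `best_span`, the `max_len` cap, and the toy scores are illustrative assumptions, not taken from any particular SQuAD model.

```python
def best_span(start_scores, end_scores, max_len=15):
    """Return (i, j) maximizing start_scores[i] + end_scores[j] with i <= j < i + max_len."""
    best, best_score = (0, 0), float("-inf")
    for i, s in enumerate(start_scores):
        # Only consider ends at or after the start, within the length cap.
        for j in range(i, min(i + max_len, len(end_scores))):
            score = s + end_scores[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best


passage = "Marie Curie won the Nobel Prize in Physics".split()
# Made-up log-space scores, one per passage token.
start = [0.1, 0.0, 0.0, 0.2, 2.0, 0.5, 0.0, 0.1]
end = [0.0, 0.1, 0.0, 0.0, 0.3, 2.5, 0.0, 1.0]
i, j = best_span(start, end)
print(" ".join(passage[i : j + 1]))  # Nobel Prize
```

The i ≤ j constraint matters: taking the start and end argmaxes independently could produce an end that precedes the start.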

  • Recall: Bidirectional Attention Flow

    Seo et al. (2016)

    Each passage word now “knows about” the query
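The “knows about the query” step is BiDAF's context-to-query attention: each passage word takes a softmax-weighted average of the query word vectors. The sketch below assumes a plain dot-product similarity for simplicity (BiDAF itself uses a learned trilinear similarity) and omits the query-to-context direction; the vectors are toy values and `context_to_query` is a hypothetical name.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def context_to_query(passage_vecs, query_vecs):
    """For each passage vector, return an attention-weighted average of the query vectors."""
    dim = len(query_vecs[0])
    attended = []
    for p in passage_vecs:
        weights = softmax([dot(p, q) for q in query_vecs])
        attended.append([sum(w * q[d] for w, q in zip(weights, query_vecs)) for d in range(dim)])
    return attended


query = [[1.0, 0.0], [0.0, 1.0]]    # two toy query word vectors
passage = [[3.0, 0.0], [0.0, 0.0]]  # word 1 matches query word 1; word 2 matches neither
for row in context_to_query(passage, query):
    print([round(x, 3) for x in row])  # [0.953, 0.047] then [0.5, 0.5]
```

In BiDAF the attended vectors are then combined with the original passage vectors before the modeling layers.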

  • Recall: QA with BERT

    Devlin et al. (2019)

    What was Marie Curie the first female recipient of? [SEP] One of the most famous people born in Warsaw was Marie…

    ‣ Predict start and end positions of the answer in the passage

    ‣ No need for crazy BiDAF-style layers

  • Recall: SQuAD SOTA

    ‣ Harder QA settings are needed

    ‣ Performance is very saturated

  • This Lecture

    ‣ Retrieval-based QA / multi-hop QA

    ‣ Problems in QA, especially related to answer type overfitting

    ‣ New QA frontiers

  • Problems in QA

  • Adversarial SQuAD

    ‣ SQuAD questions are often easy: “what was she the recipient of?” passage: “…recipient of Nobel Prize…”

    Jia and Liang (2017)

  • Adversarial SQuAD

    What was Marie Curie the first female recipient of? [SEP] …first female recipient of the Nobel Prize…

    ‣ BERT easily learns surface-level correspondences like this with self-attention


    ‣ Can we make them harder by adding a distractor answer in a very similar context?

    ‣ Take the question, modify it to look like an answer (but it's not), then append it to the passage
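The distractor construction can be sketched as a toy. Jia and Liang's AddSent perturbs the question automatically (e.g., antonyms and nearby entities), creates a fake answer of the same type, and has crowdworkers fix the resulting sentence; here, by assumption, the question split, the substitution table, and the fake answer are all supplied by hand. `add_sent` and its parameters are hypothetical names.

```python
def add_sent(passage, subject, predicate, swaps, fake_answer):
    """Toy AddSent-style distractor, loosely after Jia and Liang (2017).

    Assumes the question has the form "What was <subject> <predicate>?",
    already split by hand; `swaps` is a hand-written word substitution table.
    """
    def perturb(text):
        return " ".join(swaps.get(w, w) for w in text.split())

    # Rewrite the perturbed question as a declarative sentence with a fake answer,
    # then append it to the end of the passage.
    distractor = f"{perturb(subject)} was {perturb(predicate)} {fake_answer}."
    return f"{passage} {distractor}"


passage = "One of the most famous people born in Warsaw was Marie Curie..."
swaps = {"Marie": "Ada", "Curie": "Lovelace", "first": "last"}
print(add_sent(passage, "Marie Curie", "the first female recipient of", swaps, "the Turing Award"))
```

The distractor shares most of its words with the question, which is exactly what a surface-matching model latches onto.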

  • Adversarial SQuAD

    Jia and Liang (2017)

    ‣ Distractor “looks” more like the question than the right answer does, even if entities are wrong

  • Weakness to Adversaries

    Jia and Liang (2017)

    ‣ Performance of basically every model drops to below 60% (when the model doesn't train on these)

    ‣ BERT variants are also weak to these kinds of adversaries

    ‣ Unlike other adversarial models, we don’t need to customize the adversary to the model; this single sentence breaks every SQuAD model

  • Universal Adversarial “Triggers”

    Wallace et al. (2019)

    ‣ Adding “why how because to kill american people” causes SQuAD models to return this answer 10-50% of the time when given a “why” question

    ‣ Similar attacks on other question types like “who”

    ‣ Similar to Jia and Liang, but instead add the same adversary to every passage

  • How to fix QA?

    ‣ Better models?

    ‣ Better datasets

    ‣ But a model trained on weak data will often still be weak to adversaries

    ‣ Training on Jia + Liang adversaries can help, but there are plenty of other similar attacks that it doesn't solve

    ‣ Same questions but with more distractors may challenge our models

    ‣ Harder QA tasks

    ‣ Ask questions which cannot be answered in a simple way

    ‣ Next up: retrieval-based QA models

    ‣ Afterwards: multi-hop QA and other QA settings

  • Retrieval Models

  • Open-domain QA

    ‣ SQuAD-style QA is very artificial, not really a real application

    ‣ Real QA systems should be able to handle more than just a paragraph of context; theoretically, they should work over the whole web?

    Q: What was Marie Curie the recipient of?

    Marie Curie was awarded the Nobel Prize in Chemistry and the Nobel Prize in Physics…

    Mother Teresa received the Nobel Peace Prize in…

    Curie received his doctorate in March 1895…

    Skłodowska received accolades for her early work…


    ‣ QA pipeline: given a question:

    ‣ Retrieve some documents with an IR system

    ‣ Zero in on the answer in those documents with a QA model

    ‣ This also introduces more complex distractors (bad answers) and should require stronger QA systems
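A minimal sketch of the two-stage pipeline. It assumes a toy unigram TF-IDF scorer in place of a real IR system (such as Lucene, which the DrQA slide mentions) and a pluggable `reader(question, doc) -> (answer, score)` function standing in for the QA model. All names here are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    return [w.strip(".,?!").lower() for w in text.split()]


def tfidf_retrieve(question, docs, k=2):
    """Rank documents by a simple unigram TF-IDF match against the question."""
    toks = [tokenize(d) for d in docs]
    df = Counter(w for t in toks for w in set(t))  # document frequency
    q_words = set(tokenize(question))

    def score(t):
        tf = Counter(t)
        return sum(tf[w] * math.log(len(docs) / df[w]) for w in q_words if w in tf)

    ranked = sorted(range(len(docs)), key=lambda i: score(toks[i]), reverse=True)
    return [docs[i] for i in ranked[:k]]


def answer(question, docs, reader, k=2):
    """Retrieve k documents, run the reader on each, keep the highest-scoring answer."""
    candidates = [reader(question, d) for d in tfidf_retrieve(question, docs, k)]
    return max(candidates, key=lambda c: c[1])[0]


docs = [
    "Marie Curie was awarded the Nobel Prize in Chemistry and the Nobel Prize in Physics",
    "Mother Teresa received the Nobel Peace Prize",
    "Curie received his doctorate in March 1895",
    "Skłodowska received accolades for her early work",
]
q = "What was Marie Curie the recipient of?"
print(tfidf_retrieve(q, docs, k=1))
```

Any span-extraction model (for example, one trained on SQuAD) can be plugged in as `reader`; the retriever only has to get the right document into the top k.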

  • DrQA

    Chen et al. (2017)

    ‣ How often does the retrieved context contain the answer? (uses Lucene)

    ‣ Full retrieval results using a QA model trained on SQuAD: task is much harder
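“How often does the retrieved context contain the answer?” is usually measured as answer recall at k: the fraction of questions for which at least one retrieved passage contains a gold answer string. A minimal sketch, assuming exact substring matching (real evaluations typically normalize text first); `answer_recall` is a hypothetical name.

```python
def answer_recall(retrieved, gold_answers):
    """retrieved[i]: passages returned for question i; gold_answers[i]: list of answer strings."""
    hits = sum(
        any(ans in passage for passage in passages for ans in answers)
        for passages, answers in zip(retrieved, gold_answers)
    )
    return hits / len(gold_answers)


retrieved = [
    ["Marie Curie was awarded the Nobel Prize in Chemistry ..."],
    ["Mother Teresa received the Nobel Peace Prize in ..."],
]
gold = [["Nobel Prize"], ["Chief of Protocol"]]
print(answer_recall(retrieved, gold))  # 0.5
```

For an extractive reader this number is an upper bound on end-to-end accuracy, since the reader cannot extract an answer that was never retrieved.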

  • Retrieval with BERT

    Lee et al. (2019)

    ‣ Can we do better than a simple IR system?

    ‣ Encode the query with BERT and pre-encode all paragraphs with BERT; retrieval is then basically a nearest-neighbor search
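Dense retrieval reduces to maximum inner product search: paragraph vectors are computed once offline, and at query time only the question is encoded. In this sketch the “encoder” outputs are made-up 3-dimensional vectors standing in for BERT embeddings, and the search is brute force; large indexes usually rely on approximate nearest-neighbor search instead. `DenseIndex` is a hypothetical name.

```python
class DenseIndex:
    """Brute-force maximum-inner-product search over pre-encoded paragraphs."""

    def __init__(self, paragraphs, vectors):
        self.paragraphs = paragraphs
        self.vectors = vectors  # computed offline, one per paragraph

    def search(self, query_vec, k=1):
        scores = [sum(q * p for q, p in zip(query_vec, v)) for v in self.vectors]
        top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        return [(self.paragraphs[i], scores[i]) for i in top]


index = DenseIndex(
    [
        "Marie Curie was awarded the Nobel Prize...",
        "Mother Teresa received...",
        "Curie received his doctorate...",
    ],
    [[0.9, 0.1, 0.0], [0.2, 0.8, 0.0], [0.5, 0.0, 0.5]],
)
query_vec = [1.0, 0.0, 0.2]  # stand-in for the BERT-encoded question
for paragraph, score in index.search(query_vec, k=2):
    print(f"{score:.2f}  {paragraph}")
```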

  • Problems

    Lee et al. (2019)

    ‣ Many SQuAD questions are not suited to the “open” setting because they’re underspecified

    ‣ SQuAD questions were written by people looking at the passage, which encourages a question structure that mimics the passage and doesn’t look like “real” questions

    ‣ Where did the Super Bowl take place?

    ‣ Which player on the Carolina Panthers was named MVP?

  • Natural Questions

    Kwiatkowski et al. (2019)

    ‣ Questions arose naturally, unlike SQuAD questions which were written by people looking at a passage. This makes them much harder

    ‣ Short answer F1s < 60, long answer F1s

  • Multi-Hop Question Answering

  • Multi-Hop Question Answering

    Welbl et al. (2018), Yang et al. (2018)

    ‣ Very few SQuAD questions require actually combining multiple pieces of information; this is an important capability QA systems should have

    ‣ Several datasets test multi-hop reasoning: the ability to answer questions that draw on several sentences or several documents

  • WikiHop

    Figure from Welbl et al. (2018)

    ‣ Annotators were shown Wikipedia and asked to pose a simple question linking two entities that requires a third (bridging) entity to associate them

    ‣ A model shouldn’t be able to answer these without doing some reasoning about the intermediate entity

  • HotpotQA

    Meet Corliss Archer is an American television sitcom that aired on CBS…

    Question: What government position was held by the woman who portrayed Corliss Archer in the film Kiss and Tell?

    Shirley Temple Black was an American actress, businesswoman, and singer…

    Kiss and Tell is a comedy film in which 17-year-old Shirley Temple acts as Corliss Archer.

    As an adult, she served as Chief of Protocol of the United States

    (Figure: the passages come from Doc 1, Doc 2, and Doc 3, connected by “same entity” links.)

    Example picked from HotpotQA [Yang et al., 2018]

    ‣ Much longer and more convoluted questions

  • Multi-hop Reasoning

    The Oberoi family is an Indian family that is famous for its involvement in hotels, namely through The Oberoi Group

    Question: The Oberoi family is part of a hotel company that has a head office in what city?

    The Oberoi Group is a hotel company with its head office in Delhi.

    (Figure: Doc 1 and Doc 2, connected by “same entity” links.)

    Example picked from HotpotQA [Yang et al., 2018]

    This is an idealized version of multi-hop reasoning. Do models need to do this to do well on this task?

  • Multi-hop Reasoning

    The Oberoi family is an Indian family that is famous for its involvement in hotels, namely through The Oberoi Group

    Question: The Oberoi family is part of a hotel company that has a head office in what city?

    The Oberoi Group is a hotel company with its head office in Delhi.

    (Figure: Doc 1 and Doc 2.)

    Example picked from HotpotQA [Yang et al., 2018]

    Model can ignore the bridging entity and directly predict the answer

    High lexical overlap

  • Multi-hop Reasoning

    Meet Corliss Archer is an American television sitcom that aired on CBS…

    Question: What government position was held by the woman who portrayed Corliss Archer in the film Kiss and Tell?

    Shirley Temple Black was an American actress, businesswoman, and singer…

    Kiss and Tell is a comedy film in which 17-year-old Shirley Temple acts as Corliss Archer.

    As an adult, she served as Chief of Protocol of the United States

    (Figure: Doc 1, Doc 2, and Doc 3, connected by “same entity” links.)

    No simple lexical overlap… but only one government position appears in the context!

    Example picked from HotpotQA [Yang et al., 2018]

  • Investigation

    Can a model identify the answer with only a set of candidates?

    Can a model identify where the answer is in a single hop?

    Government position: Chief of Protocol, actress, singer

    Oberoi Family: Delhi

    Chen and Durrett (2019)

  • Finding the answer directly

    Question: What government position was held by the woman who portrayed Corliss Archer in the film Kiss and Tell?

    Kaushik and Lipton (2018)

    Chief of Protocol, businesswoman, …, actress

    Meet Corliss Archer is an American television sitcom that aired on CBS…

    Shirley Temple Black was an American actress, businesswoman, and singer…

    Kiss and Tell is a comedy film in which 17-year-old Shirley Temple acts as Corliss Archer.

    As an adult, she served as Chief of Protocol of the United States

    (Figure: Doc 1, Doc 2, and Doc 3.)

  • No Context Baseline

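    (Figure: a Question Encoder and an Answer Encoder; the question is scored against each candidate answer with a dot product.)

    Question: What government position was held by the woman who portrayed Corliss Archer in the film Kiss and Tell?

    Candidates: Chief of Protocol, businesswoman, actress, ...

    Example output: Chief of Protocol 0.7, businesswoman 0.2, actress 0.1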

    Chen and Durrett (2019)
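The no-context baseline never reads the passage: it encodes the question and each candidate answer separately, scores each candidate with a dot product, and normalizes with a softmax. In this sketch the encoder outputs are hand-picked toy vectors (the real model learns both encoders), so the probabilities are illustrative only; `no_context_probs` is a hypothetical name.

```python
import math


def no_context_probs(question_vec, candidate_vecs):
    """Softmax over dot products between the question encoding and each candidate encoding."""
    logits = {c: sum(q * a for q, a in zip(question_vec, v)) for c, v in candidate_vecs.items()}
    m = max(logits.values())
    exps = {c: math.exp(l - m) for c, l in logits.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}


question_vec = [1.0, 0.5]  # toy encoding of the "government position" question
candidates = {
    "Chief of Protocol": [2.0, 1.0],
    "businesswoman": [0.5, 1.0],
    "actress": [0.2, 0.4],
}
probs = no_context_probs(question_vec, candidates)
print(max(probs, key=probs.get))  # Chief of Protocol
```

Because the passage is never consulted, any accuracy this model reaches comes purely from question-answer correspondences, which is what the baseline is meant to expose.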

  • Results on WikiHop

    (Figure: bar chart of accuracy on WikiHop. No Context: 59.3; state-of-the-art models Entity-GCN, CFC, and BAG: between 64.8 and 67.4; weak baselines Majority-candidate and BiDAF: 38.8 and 42.9.)

    ‣ SOTA models trained on this may be learning question-answer correspondences, not multi-hop reasoning as advertised

    More than half of the questions can be answered without even using the context!

    Chen and Durrett (2019)


  • Sentence Factored Model

    Find the answer by comparing each sentence with the question separately!

    Question: The Oberoi family is part of a hotel company that has a head office in what city?

    Doc 1: The Oberoi family is an Indian family that is…

    Doc 2: The Oberoi Group is a hotel company with its head office in Delhi.

    Doc 3: Future Fibre Technologies is a fiber technologies company…

    Chen and Durrett (2019)

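  • Sentence Factored Model

    (Figure: the question “The Oberoi family … what city?” is paired separately with each sentence, “The Oberoi Group … in Delhi.”, “Future Fibre Technologies is a fibre…”, and “The Oberoi family…”, and each pair is processed by BiDAF. Answer prediction: Delhi.)

    ‣ Softmax over all sentences is the only cross-sentence interaction

    Chen and Durrett (2019)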
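The sentence-factored idea can be sketched as follows: a single-sentence reader scores candidate answers in each (question, sentence) pair independently, and the only interaction across sentences is one softmax over all candidates. Here a deliberately crude word-overlap `toy_reader` (hypothetical, standing in for the per-sentence BiDAF) proposes the sentence's last word as its only candidate.

```python
import math


def words(text):
    return {w.strip(".,?!").lower() for w in text.split()}


def toy_reader(question, sentence):
    """Hypothetical stand-in reader: one candidate (the last word), scored by word overlap."""
    last = sentence.split()[-1].strip(".,?!")
    return [(last, float(len(words(question) & words(sentence))))]


def sentence_factored_predict(question, sentences, reader=toy_reader):
    # Each sentence is read independently of the others.
    candidates = [c for s in sentences for c in reader(question, s)]
    # The only cross-sentence interaction: a softmax over all candidates.
    m = max(logit for _, logit in candidates)
    z = sum(math.exp(logit - m) for _, logit in candidates)
    probs = [(span, math.exp(logit - m) / z) for span, logit in candidates]
    return max(probs, key=lambda p: p[1])


question = "The Oberoi family is part of a hotel company that has a head office in what city?"
sentences = [
    "The Oberoi Group is a hotel company with its head office in Delhi.",
    "Future Fibre Technologies is a fibre technologies company",
    "The Oberoi family is an Indian family that is famous for its involvement in hotels",
]
print(sentence_factored_predict(question, sentences)[0])  # Delhi
```

The Delhi sentence wins on lexical overlap alone, without any link to the Oberoi family sentence, which is the shortcut these results point to.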

  • Results on HotpotQA

    (Figure: bar chart of F1 on HotpotQA for BiDAF++, QFE, GRN, DFGN, and the Sentence Factored model; the bar values shown are 50.8, 69.7, 69.0, 68.1, and 58.7.)

    A simple single-sentence reasoning model can solve more than half of the questions on HotpotQA.

    Chen and Durrett (2019)

  • Other Work

    ‣ Min et al., ACL 2019: “Compositional Questions do not Necessitate Multi-hop Reasoning”

    ‣ Focuses just on HotpotQA

    ‣ Additionally tries to adversarially harden HotpotQA against these attacks. Some limited success, but it doesn't solve the problem

  • Question Answering with Chains

    (Figure: Chain Extractor → Reasoning Chain → QA model (BERT) → Final Answer Span)

    ‣ Maybe we can strengthen our models to avoid these weaknesses: force them to explicitly extract a reasoning chain to make them better

    Q: What government position was held…

    Candidate sentences (Sent 1, Sent 2, …, Sent 5):

    …She began her diplomatic career…

    A Kiss for Corliss was…

    Shirley Temple Black was a…

    Kiss and Tell is a comedy film in which 17-year-old Shirley Temple acts as Corliss Archer.

    As an adult, she served as Chief of Protocol of the United States

    Chen et al. (2019)

  • Question Answering with Chains

    Question: What government position was held by the woman who portrayed Corliss Archer in the film Kiss and Tell? Answer: Chief of Protocol

    Shirley Temple Black was an American actress, businesswoman, and diplomat…

    As an adult, she served as the Chief of Protocol of the United States…

    She began her diplomatic career in 1969, when she represented…

    Kiss and Tell is a film in which 17-year-old Shirley Temple acts as Corliss Archer.

    “A Kiss for Corliss” is a sequel to the film “Kiss and Tell”.

    It stars Shirley Temple in her final starring role…

    (Figure: sentences grouped into Doc 1, Doc 2, and Doc 3; Reasoning Chain 1 uses Shared Entity and In-Doc Coref links, with highlighted entities “Shirley Temple” and “Corliss Archer”.)

    ‣ Strong connection between the entities used here

    Chen et al. (2019)

  • Question Answering with Chains

    (Figure: Reasoning Chain 2, with Shared Entity and In-Doc Coref links; highlighted entities “Shirley Temple” and “Kiss and Tell”.)

    ‣ More speculative than the other chain, but still leads to the answer

    Question: What government position was held by the woman who portrayed Corliss Archer in the film Kiss and Tell? Answer: Chief of Protocol

    Shirley Temple Black was an American actress, businesswoman, and diplomat…

    As an adult, she served as the Chief of Protocol of the United States…

    She began her diplomatic career in 1969, when she represented…

    Kiss and Tell is a film in which 17-year-old Shirley Temple acts as Corliss Archer.

    “A Kiss for Corliss” is a sequel to the film “Kiss and Tell”.

    It stars Shirley Temple in her final starring role…

    (Figure: sentences grouped into Doc 1, Doc 2, and Doc 3.)

    Chen et al. (2019)

  • Chain Supervision

    ‣ Extract pseudo-gold chains based on:

    (Figure: example links labeled In-Doc Coref and Shared Entity, with entities “Shirley Temple” and “Corliss Archer”.)

    ‣ Within-document coreference: we don’t run a coreference system but instead link all sentences within a paragraph

    ‣ Shared entities: enable connections between different sources

    ‣ Given these chains, we learn a model to extract them. At test time, no annotations are needed

    Chen et al. (2019)
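One way to turn the two linking heuristics above into pseudo-gold chains is a breadth-first search over a sentence graph. This sketch assumes entity mentions are given by hand, and it treats the shortest path from a sentence mentioning a question entity to a sentence containing the answer as the chain; the actual extraction procedure of Chen et al. may differ in its details. `pseudo_gold_chain` is a hypothetical name.

```python
from collections import deque


def pseudo_gold_chain(sentences, question_entities, answer):
    """Shortest chain of sentence indices from a question-entity sentence to an answer sentence.

    sentences: list of (paragraph_id, text, entities). Two sentences are linked if they
    share a paragraph (standing in for within-document coreference) or an entity.
    """
    def linked(i, j):
        pi, _, ei = sentences[i]
        pj, _, ej = sentences[j]
        return pi == pj or bool(set(ei) & set(ej))

    starts = [i for i, (_, _, ents) in enumerate(sentences) if set(ents) & set(question_entities)]
    parent = {i: None for i in starts}
    queue = deque(starts)
    while queue:  # breadth-first search, so the first hit is a shortest chain
        i = queue.popleft()
        if answer in sentences[i][1]:
            chain = []
            while i is not None:
                chain.append(i)
                i = parent[i]
            return chain[::-1]
        for j in range(len(sentences)):
            if j not in parent and linked(i, j):
                parent[j] = i
                queue.append(j)
    return None


sentences = [
    (1, "Kiss and Tell is a film in which 17-year-old Shirley Temple acts as Corliss Archer.",
     ["Kiss and Tell", "Shirley Temple", "Corliss Archer"]),
    (2, "Shirley Temple Black was an American actress, businesswoman, and diplomat.", ["Shirley Temple"]),
    (2, "As an adult, she served as the Chief of Protocol of the United States.", ["United States"]),
    (3, "A Kiss for Corliss is a sequel to the film Kiss and Tell.", ["A Kiss for Corliss", "Kiss and Tell"]),
]
print(pseudo_gold_chain(sentences, ["Corliss Archer", "Kiss and Tell"], "Chief of Protocol"))  # [0, 1, 2]
```

The chain crosses documents through the shared entity “Shirley Temple” and then stays in one paragraph to reach the answer sentence, mirroring Reasoning Chain 1.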

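  • Chain Extraction and QA

    ‣ Paragraphs are encoded with BERT to compute sentence representations

    ‣ A pointer network selects a sequence of sentences

    (Figure: the pointer network emits sentence indices such as s3 and s7, ending with STOP.)

    ‣ A final BERT model then extracts an answer span from one or more chains

    (Figure: chains such as (s3, s7), (s3, s8), and (s1, s2) are given to BERT, which outputs the answer.)

    Chen et al. (2019)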
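The pointer-network decoding loop can be sketched as greedy selection: at each step, score every sentence representation (plus a special STOP option) against the current decoder state, take the best, and repeat until STOP. This is a toy under loud assumptions: the vectors are hand-made rather than BERT outputs, scoring is a plain dot product, and the decoder “state” is simply the last chosen sentence. `greedy_pointer_decode` is a hypothetical name, and producing more than one chain, as the final step above allows, would need something like beam search instead of greedy selection.

```python
def greedy_pointer_decode(sentence_vecs, query_vec, stop_vec, max_steps=5):
    """Greedily point at sentences one at a time until STOP is chosen (toy version)."""
    state = list(query_vec)
    chain = []
    for _ in range(max_steps):
        options = sentence_vecs + [stop_vec]  # the last option is STOP
        scores = [sum(s * v for s, v in zip(state, vec)) for vec in options]
        for i in chain:
            scores[i] = float("-inf")  # do not select a sentence twice
        best = max(range(len(options)), key=lambda i: scores[i])
        if best == len(sentence_vecs):
            break
        chain.append(best)
        # Toy state update: continue from the chosen sentence. A real pointer
        # network would update a learned recurrent decoder state instead.
        state = list(sentence_vecs[best])
    return chain


# Toy dimensions: [question entity, bridging entity, answer, "done" signal]
query = [1.0, 0.0, 0.0, 0.0]
sentences = [
    [1.0, 1.0, 0.0, 0.0],  # s0: mentions the question entity and the bridge
    [0.0, 1.0, 1.0, 1.0],  # s1: mentions the bridge and the answer
    [0.0, 0.0, 1.0, 0.0],  # s2: answer-like distractor without the bridge
]
stop = [0.0, 0.0, 0.0, 2.0]
print(greedy_pointer_decode(sentences, query, stop))  # [0, 1]
```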

  • QA Results

    ‣ High performance on WikiHop (*past systems didn't use BERT) and HotpotQA

    ‣ Also large gains on hard examples in HotpotQA (ones where our model from part 1 could not find answers in a single hop)

    (Figure: WikiHop (English) accuracy for GCN, BAG, CFC, JDReader, DynSAN, and Ours, with bar values 76.5, 71.4, 70.9, 70.6, 69.0, and 67.6; HotpotQA (English) F1 for DecompRC, QFE, DFGN, and Ours, with bar values 74.1, 69.7, 68.1, and 69.6.)

    ‣ Ongoing work: how can reasoning chains be taken below the sentence level and be more strongly tied to interpretable logical inference?

  • New Types of QA

  • DROP

    Dua et al. (2019)

    ‣ Question types: subtraction, comparison (which did he visit first), counting and sorting (which kicker kicked more field goals)

    ‣ Invites ad hoc solutions (structure the model around predicting differences between numbers)

    ‣ One thread of research: let’s build QA datasets to help the community focus on modeling particular things
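To make the “ad hoc solutions” point concrete, here is a sketch of the kind of structure such models build in: extract the numbers in the passage and enumerate their positive pairwise differences as extra answer candidates for a downstream scorer (not shown). The regex and the `arithmetic_candidates` name are assumptions for illustration; a fuller treatment would also need to handle dates, spelled-out numbers, and sums.

```python
import re
from itertools import permutations


def arithmetic_candidates(passage):
    """Numbers in the passage plus all positive pairwise differences between them."""
    nums = [float(x) for x in re.findall(r"\d+(?:\.\d+)?", passage)]
    candidates = set(nums)
    candidates |= {a - b for a, b in permutations(nums, 2) if a > b}
    return sorted(candidates)


passage = "Smith kicked field goals of 42 and 37 yards, while Jones kicked 3 field goals."
print(arithmetic_candidates(passage))  # [3.0, 5.0, 34.0, 37.0, 39.0, 42.0]
```

Hard-coding subtraction this way helps on DROP's difference questions but does nothing for question types the designer did not anticipate.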

  • MultiQA

    Talmor and Berant (2019)

    ‣ Maybe we should just look at lots of QA datasets instead?

    ‣ BERT trained on SQuAD gets

  • NarrativeQA

    Kočiský et al. (2017)

    ‣ Humans see a summary of a book: …Peter’s former girlfriend Dana Barrett has had a son, Oscar…

    ‣ Question: How is Oscar related to Dana?

    ‣ Answering these questions from the source text (not the summary) requires complex inferences and is extremely challenging; no progress on this dataset in 2 years

  • Takeaways

    ‣ Models can often work well for one QA task but don’t generalize

    ‣ We still don’t have (solvable) QA settings which seem to require really complex reasoning as opposed to surface-level pattern recognition

    ‣ Lots of problems with current QA settings, lots of new datasets