
Intrinsic and Extrinsic Approaches to Recognizing Textual Entailment Rui Wang DFKI GmbH & Saarland University



  • Intrinsic vs. Extrinsic

    5/23/11 Oslo, Norway

  • Recognizing Textual Entailment (RTE)

    • Textual Entailment (Chierchia and McConnell-Ginet, 2000) – The meaning of one text can be inferred from another text.

    • An example – Text (T): Google files for its long awaited IPO. – Hypothesis (H): Google goes public.
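The T/H pair above is exactly what an RTE system receives as input. A minimal bag-of-words baseline (the "BoW" system referenced later in the results) can be sketched as follows; the 0.6 threshold and whitespace tokenization are illustrative assumptions, not the actual system's settings:

```python
# Bag-of-words RTE baseline: predict entailment when enough of H's words
# are covered by T. Threshold and tokenization are illustrative choices.

def bow_entails(text, hypothesis, threshold=0.6):
    t_words = set(text.lower().split())
    h_words = set(hypothesis.lower().split())
    if not h_words:
        return True
    overlap = len(h_words & t_words) / len(h_words)
    return overlap >= threshold

t = "Google files for its long awaited IPO."
h = "Google goes public."
print(bow_entails(t, h))  # False: overlap alone cannot bridge "IPO" and "goes public"
```

The pair is a true entailment, yet only one of H's three words appears in T; that gap is what the deeper intrinsic approaches are designed to close.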

  • Scope • Logic Entailment

    – Whenever T is true, H is true (equivalently: H is true OR T is not true)

    • Linguistic Implication – Conventional Implicature – Conversational Implicature

    • Modality
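The logic-entailment condition ("H is true OR T is not true") is the material-conditional reading; the standard model-theoretic statement (a textbook definition, not verbatim from the slides) is:

```latex
T \models H
\quad\Longleftrightarrow\quad
\text{for every model } \mathcal{M}:\quad
\mathcal{M} \models T \;\Rightarrow\; \mathcal{M} \models H
```

Since the material conditional $T \Rightarrow H$ is equivalent to $\lnot T \lor H$, this matches the slide's formulation.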

  • Machine Translation (MT) Triangle

  • RTE  Rectangle  


  • Outline  


  • Outline  (cont.)  •  Intrinsic  Approaches  

    –  Architecture  –  RTE  with  Event  Triples  –  RTE  with  Inference  Rules  

    • Extrinsic Approaches – Motivation – Multi-Dimensional Classification Model

    • Summary and Perspectives

  • Intrinsic  Approaches  

  • An Example • T: Bush used his weekly radio address to try to build support for his plan to allow workers to divert part of their Social Security payroll taxes into private investment accounts.

    • H: Mr. Bush is proposing that workers be allowed to divert their payroll taxes into private accounts.

  • Performance of the Existing Systems

    [Bar chart: accuracy of the top five submitted systems and Wang's system on RTE-3, RTE-4, and RTE-5; reported values include 0.8, 0.746, 0.735, 0.706, 0.685, and 0.669 (y-axis: accuracy, 0.5 to 0.85).]

  • Architecture  


  • Specialized RTE Module • Two requirements

    – A good target – A good tackle

    • Two examples – Event tuples containing named-entities – Tree skeleton with inference rules

  • Event Tuple • < Event type, Time, Location, List >

    – List: Persons, Organizations
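The tuple structure above can be sketched as a plain data class; the field and variable names are my own, not the actual system's schema:

```python
from dataclasses import dataclass, field
from typing import Optional

# One event tuple per clause: < Event type, Time, Location, List >,
# where List holds the participating persons and organizations.
# Field names are illustrative, not taken from the TACTE implementation.
@dataclass
class EventTuple:
    event_type: str                 # e.g. a verb lemma such as "bite"
    time: Optional[str] = None      # normalized time expression
    location: Optional[str] = None
    persons: list = field(default_factory=list)
    organizations: list = field(default_factory=list)

# The Tyson example below: the biting event happened in 1997, not 1996.
t_event = EventTuple("bite", time="1997", persons=["Tyson", "Holyfield"])
h_event = EventTuple("bite", time="1996", persons=["Mike Tyson", "Holyfield"])
print(t_event.time == h_event.time)  # False: the time slots clash
```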

  • An Example • T: Released in 1995, Tyson returned to boxing, winning the World Boxing Council title in 1996. The same year, however, he lost to Evander Holyfield, and in a 1997 rematch bit Holyfield's ear, for which he was temporarily banned from boxing.

    • H: In 1996 Mike Tyson bit Holyfield's ear.

  • Event  Time  Pair  •  T:  

    –   –   –   –   

    •  H:  –   


  • Event  Tuple  •  T:  

    –   –   –   –   

    •  H:  –   


  • Entailment Recognition • Event type comparison

    – Lexical resources, e.g. WordNet, VerbOcean

    • Named-entity comparison – Time expression normalization and anchoring – Ontology of geographical terms – Person name / organization name partial matching
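The comparison steps above can be sketched as follows; a tiny hand-written synonym table stands in for WordNet/VerbOcean, and the dict schema is an illustrative assumption:

```python
# Sketch of event-tuple comparison: event types must match (via a toy
# synonym table standing in for WordNet/VerbOcean), normalized time
# expressions must agree, and person names match partially (token level).

SYNONYMS = {("file", "submit"), ("award", "receive")}  # toy stand-in resource

def types_match(t_type, h_type):
    return t_type == h_type or (t_type, h_type) in SYNONYMS

def names_match(t_name, h_name):
    # Partial matching: "Tyson" matches "Mike Tyson" via the shared token.
    return bool(set(t_name.split()) & set(h_name.split()))

def tuple_entails(t, h):
    if not types_match(t["type"], h["type"]):
        return False
    if t["time"] and h["time"] and t["time"] != h["time"]:
        return False
    # Every person mentioned in H must be matched by some person in T.
    return all(any(names_match(tn, hn) for tn in t["persons"])
               for hn in h["persons"])

t = {"type": "bite", "time": "1997", "persons": ["Tyson", "Holyfield"]}
h = {"type": "bite", "time": "1996", "persons": ["Mike Tyson", "Holyfield"]}
print(tuple_entails(t, h))  # False: normalized times 1997 vs. 1996 disagree
```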

  • Inference Rules

    • DIRT collection (Lin and Pantel, 2001)

  • Tree Skeletons

    T: Doctor Robin Warren and Barry Marshall received Nobel Prize …

    H: Robin Warren was awarded a Nobel Prize.
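A tree skeleton is, roughly, the pair of dependency paths from the matched anchor nodes up to their lowest common ancestor. A toy sketch over a hand-written child-to-parent map (not real parser output):

```python
# Toy sketch of tree-skeleton extraction: given a dependency tree as a
# child -> parent map, the skeleton is the pair of paths from the anchor
# words (matched NEs) up to their lowest common ancestor. The tree below
# is hand-written for the Nobel Prize example, not real parser output.

def path_to_root(word, parents):
    path = [word]
    while word in parents:
        word = parents[word]
        path.append(word)
    return path

def tree_skeleton(left_anchor, right_anchor, parents):
    left = path_to_root(left_anchor, parents)
    right = path_to_root(right_anchor, parents)
    # Lowest common ancestor = first node on the left path shared by both.
    lca = next(node for node in left if node in right)
    return (left[:left.index(lca) + 1], right[:right.index(lca) + 1])

parents = {"Warren": "received", "Marshall": "Warren",
           "Prize": "received", "Nobel": "Prize"}
print(tree_skeleton("Warren", "Prize", parents))
# (['Warren', 'received'], ['Prize', 'received'])
```

The extracted path pair is then the unit matched against DIRT inference rules.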

  • Results

    RTE-2:
    Systems     BoW     TACTE   with ExDIRT
    Coverage    100%    10.2%   8.1%
    Accuracy*   0.579   0.684   0.654

    RTE-3:
    Systems     BoW     TACTE   with ExDIRT
    Coverage    100%    8.1%    5.8%
    Accuracy*   0.611   0.655   0.721

    *On covered data

  • Summary and Future Extensions

    Module                              Target                         Tackle                               Preferences
    Event tuples                        NE recognition                 NE relation/resolution               "No"
    Tree skeleton with inference rules  Matched by DIRT rules          Match tree skeleton with DIRT rules  "Yes"

    Future Extensions:
    Negation and modality               Negation words or modal verbs  Scope and entailment rules           N/A
    Logic inferencer                    High confidence                Theorem prover                       N/A
    External knowledge bases            Covered cases                  Knowledge application                N/A

  • Extrinsic  Approaches  

  • Relevance and Necessity

    No "necessity"

  • An Example • T: At least five people have been killed in a head-on train collision in north-eastern France, while others are still trapped in the wreckage. All the victims are adults.

    • H: A French train crash killed children.

    • Contradictory but relevant!

  • Textual Semantic Relation (TSR)

    Entailment

    Unknown

  • Related Tasks • Contradiction recognition

    – Contradiction vs. Others (de Marneffe et al., 2008)

    • Paraphrase acquisition – Paraphrase vs. Others (a survey by Androutsopoulos and Malakasiotis (2010))

    • Directionality recognition – Entailment vs. Paraphrase (e.g., Kotlerman et al., 2009)

  • RTE    TSR  Rectangle  


  • Meaning Representation • The TACTE system: Event time pairs • The ExTACTE system: Event tuples • The ExDIRT system: Tree skeletons

    • Dependency triple sets: {DEP_T} and {DEP_H} – Syntactic dependency tree – Semantic dependency graph – Joint dependency graph

  • Meaning Representation (cont.) • H: Value is questioned.

    • Syntactic dependency – –

    • Semantic dependency –

  • Meaning Representation (cont.) • T: Devotees of the market question the value of the work.

  • Features • H_NULL? (Boolean): whether H has dependencies • T_NULL? (Boolean): whether T has dependencies • DIR? (Boolean): whether T, H have same direction • MULTI? (Boolean): add "m_" to REL_PAIR if yes • DEP_SAME? (Boolean): whether same dependency • REL_SIM? (Boolean): whether similar relation • REL_SAME? (Boolean): whether same relation • REL_PAIR (String): dependency relation names
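A few of the features above can be sketched over dependency triple sets; the (head, relation, dependent) schema and the exact feature definitions here are illustrative simplifications:

```python
# Partial sketch of feature extraction over dependency triple sets.
# A triple is (head, relation, dependent); schema and details are illustrative.

def extract_features(dep_t, dep_h):
    feats = {
        "T_NULL": len(dep_t) == 0,                  # T has no dependencies
        "H_NULL": len(dep_h) == 0,                  # H has no dependencies
        "DEP_SAME": bool(set(dep_t) & set(dep_h)),  # a full triple is shared
    }
    rels_t = {rel for _, rel, _ in dep_t}
    rels_h = {rel for _, rel, _ in dep_h}
    feats["REL_SAME"] = bool(rels_t & rels_h)       # a relation name is shared
    feats["REL_PAIR"] = "_".join(sorted(rels_t | rels_h))
    return feats

# "Devotees of the market question the value ..." vs. "Value is questioned."
dep_t = [("question", "SBJ", "Devotees"), ("question", "OBJ", "value")]
dep_h = [("question", "OBJ", "value")]
print(extract_features(dep_t, dep_h)["DEP_SAME"])  # True: shared OBJ triple
```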

  • Direct Classification

  • Two-Stage Classification

  • Performance of the Existing Systems

    • Three-way RTE: Entailment, Contradiction, Unknown

    [Bar chart: accuracy of the top five submitted systems and Wang's system on three-way RTE-3 and RTE-4; reported values include 0.685, 0.683, 0.637, and 0.614 (y-axis: accuracy, 0.5 to 0.7).]

  • The  3-‐Dimensional  Model  


  • The 3-Dimensional Model (cont.) • Two-stage classification

    • Three measurements – Relatedness – Consistency – Equivalence

                       Relatedness   Consistency   Equivalence
    Paraphrase (P)     +             +             +
    Entailment (E)     +             +             -
    Contradiction (C)  +             -             -
    Unknown (U)        -             +             -
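The table above is a direct lookup from the three boolean measurements to a relation label; as a sketch:

```python
# Lookup from (relatedness, consistency, equivalence) to the TSR label,
# transcribing the table above. The fallback for combinations outside the
# table is my own assumption, not part of the model.

RELATION = {
    (True,  True,  True):  "Paraphrase",
    (True,  True,  False): "Entailment",
    (True,  False, False): "Contradiction",
    (False, True,  False): "Unknown",
}

def classify(relatedness, consistency, equivalence):
    return RELATION.get((relatedness, consistency, equivalence), "Unknown")

print(classify(True, False, False))  # "Contradiction"
```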

  • Corpora • Overview

    – The RTE corpora – The PETE corpus – The MSR corpus

    • Two issues – The annotation scheme – Data sources

  • The TSR Corpus • Adjacent sentence pairs from the RST Discourse Treebank

    • Six categories: backward entailment, forward entailment, equality, contradiction, overlapping, and independent

  • The AMT Corpus • Crowd-sourcing with Amazon Mechanical Turk • Facts vs. Counter-Facts

  • Datasets

    Corpora        Paraphrase (P)      Entailment (E)                        Contradiction (C)     Unknown (U)
    AMT (584)      /                   Facts (406)                           Counter-facts (178)   /
    MSR (5841)     Paraphrase (3940)   Non-Paraphrase (1901, spans E/C/U)
    PETE (367)     /                   YES (194)                             NO (173, spans C/U)
    RTE (2200)     /                   ENTAILMENT (1100)                     CONTRADICTION (330)   UNKNOWN (770)
    TSR (260)      Equality (3)        Forward/Backward Entailment (10/27)   Contradiction (17)    Overlapping & Independent (203)
    Total (9252)   3943                637                                   525                   973

  • Results (RTE)

    Systems                          4-Way (C, E, P, U)   3-Way (C, E&P, U)   2-Way (E&P, C&U)
    Direct BoW                       39.3%                54.5%               63.2%
    Direct Joint                     42.3%                50.9%               66.8%
    Only Relatedness (Our Prev.)     /                    59.1%               69.2%
    3-D Model                        45.9%                58.2%               69.9%
    MacCartney and Manning (2007)*   /                    /                   59.4%
    Heilman and Smith (2010)*        /                    /                   62.8%

    *Different test data

  • Results (Paraphrase)

    P vs. C&E&U                   Accuracy   Precision   Recall
    3-D Model                     79.6%      57.2%       72.8%
    Das and Smith (2009) (QG)*    73.9%      74.9%       91.3%
    Das and Smith (2009) (PoE)*   76.1%      79.6%       86.0%
    Heilman and Smith (2010)*     73.2%      75.7%       87.8%

    *Different test data

  • Results  (cont.)  


  • Summary  and  Perspec@ves  

  • Intrinsic Approaches • Event tuple

    – Event time pair – Extended to other slots (NEs)

    • (Textual) inference rules – DIRT (with tree skeleton) – More general paraphrase resources (ongoing)

  • Extrinsic Approaches • Textual semantic relations

    – Relatedness recognition – Extended with two other dimensions

    • Specialized TSR modules – Split the data (as for RTE) – Systematic feature engineering

  • Applications of RTE • Answer validation (Peñas et al., 2007; Rodrigo et al., 2008) – The best results for both English and German (Wang and Neumann, 2008a)

    • Relation validation (Wang and Neumann, 2008b)

    • Parser evaluation (Yuret et al., 2010) – The 3rd place (Wang and Zhang, 2010)

  • Acknowledgements • Chris Callison-Burch • Georgiana Dinu • Guenter Neumann • Caroline Sporleder • Hans Uszkoreit • Yajing Zhang • Yi Zhang

  • Thank YOU!

    Questions?