
CS 7180: Behavioral Modeling and Decision-making in AI. Probability Theory Review. Prof. Amy Sliva, October 3, 2012.


Page 1:

CS 7180: Behavioral Modeling and Decision-making in AI
Probability Theory Review
Prof. Amy Sliva
October 3, 2012

Page 2:

Decision-making under uncertainty
• So far we have assumed perfect, complete, and reliable information in reasoning about behavior
• Derive previously unknown facts/states from the current, known ones

• In many (most?) domains this is not the case…
• Not always possible to have access to the entire set of facts for reasoning
• Agent behavioral data is noisy, incomplete, and inconsistent
• Complexity of domains prevents complete representation

• Don't know what we don't know…

• Actions, effects, and the state of the world are all uncertain
• Yet a decision must still be made!

Page 3:

Example: Uncertainty in modeling security

• Suppose we are trying to model the behaviors of groups involved in a civil conflict to determine a conflict management strategy

Action: ArmedAttack(g, t) = Group g will engage in an armed attack by time t

• Will group g1 attack by a particular time t?

Page 4:

Example: Uncertainty in modeling security
• Problems:
• Partial observability—group's resources, other agents' plans (parties in the conflict, external states, international organizations), etc.
• Noisy sensors—media or intelligence reports
• Uncertainty in action outcomes—casualties, responses of other groups, etc.
• Immense complexity of modeling and predicting human behavior

• A purely logical approach either…
• Risks falsehood: ArmedAttack(g1, 10), i.e., "Group g1 will attack by time 10"
• Leads to conclusions that are too weak for decision making: "ArmedAttack(g1, 10) will occur by time 10 if g1 has a consistent inflow of resources and g2 does not receive external state support and the attack is successful and etc., etc…"

Page 5:

Several sources of uncertainty in AI
• Information is partial
• Information is not fully reliable
• Representation language is inherently imprecise
• Information comes from multiple sources and it is conflicting
• Information is approximate
• Non-absolute cause-effect relationships exist (nondeterminism)

Page 6:


Sources of Uncertainty
1. Ignorance
2. Laziness (efficiency)

What we call uncertainty is a summary of all information that is not explicitly taken into account in our model or knowledge base.

Page 7:

Managing uncertainty in AI

How to represent uncertainty in knowledge?

How to perform inferences with uncertain knowledge?

Which action to choose under uncertainty?

Page 8:

Methods for uncertain reasoning
• Default or nonmonotonic reasoning
• Assume the normal case unless or until it is contradicted by evidence
  If I believe Tweety is a bird, then I think he can fly
  If I learn Tweety is a penguin, then I think he can't fly

• Worst-case reasoning
• Assume and plan for the worst (i.e., adversarial search against an optimal opponent)

Page 9:

More methods for uncertain reasoning
• Evidential reasoning—how strongly do I believe P based on evidence? (confidence levels)
• Quantitative: [0, 1], [-1, 1], 95% confidence interval
• Qualitative: {definite, very likely, likely, neutral, unlikely, very unlikely, definitely not}

• Fuzzy concepts—measure degree of "truth," not uncertainty
• Unemployment is high
• The next season of Mad Men will start "soon"
• Add degree to fuzzy assertions between 0 and 1

• We will mainly focus on probabilistic reasoning models

Page 10:

Musings on probability…
"When it is not in our power to determine what is true, we ought to follow what is most probable."
    —René Descartes

"The idea was fantastically, wildly improbable. But like most fantastically, wildly improbable ideas it was at least as worthy of consideration as a more mundane one to which the facts had been strenuously bent to fit."
    —Douglas Adams

Page 11:

Probability Theory
• World is not necessarily divided between "normal" and "abnormal," nor is it adversarial
• Possible situations have associated likelihoods
• Probability theory enables us to make rational decisions
• Will an armed attack happen at time t?
• What is the probability of an attack in a given situation?
• Use probabilities to represent the structure of our knowledge and for reasoning over that knowledge

Page 12:

Syntax for probabilistic reasoning
• Basic element: random variable
• Similar to propositional logic—possible worlds (sample space) defined by assignment of values to random variables
• Boolean random variables
• E.g., Attack (Is an attack occurring?)
• Discrete random variables
• E.g., Direction is one of <north, south, east, west>
• Elementary proposition constructed by assignment of a value to a single random variable
• E.g., Direction = west, Attack = false (abbreviated ¬attack)
• Complex propositions formed from elementary propositions and standard logical connectives
• E.g., Direction = west ∨ Attack = false

Page 13:

Syntax for probability (cont.)
• Atomic event—a complete specification of the state of the world about which the agent is uncertain
• E.g., if the world consists of two Boolean variables Attack and Propaganda, then there are four distinct atomic events:

  Attack = false ∧ Propaganda = false
  Attack = false ∧ Propaganda = true
  Attack = true ∧ Propaganda = false
  Attack = true ∧ Propaganda = true

• Atomic events are mutually exclusive and exhaustive (often called "outcomes")
• Events in general are sets of atomic events, such as Attack = true

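To make the enumeration concrete, here is a minimal Python sketch (not from the slides) that generates the atomic events for the two Boolean variables above as a Cartesian product:

```python
from itertools import product

# Boolean random variables from the example above.
variables = ["Attack", "Propaganda"]

# An atomic event assigns a value to EVERY variable, so the atomic events
# are exactly the Cartesian product of the variables' domains.
atomic_events = [dict(zip(variables, values))
                 for values in product([False, True], repeat=len(variables))]

for event in atomic_events:
    print(event)   # four events: mutually exclusive and exhaustive
```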

Page 15:

Axioms of probability theory
• Basic notation for probability: P(A) is the probability of proposition A being true in the KB,
  OR P(A) is the probability that event A occurs in the world
• For any propositions A, B:
• 0 ≤ P(A) ≤ 1
• P(true) = 1 and P(false) = 0
• P(A ∨ B) = P(A) + P(B) - P(A ∧ B) (inclusion-exclusion principle)

[Venn diagram: sets A and B, overlapping in A ∧ B]
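As a quick sanity check of these axioms in code, a sketch over an assumed toy distribution on four atomic events (the numbers are illustrative, not from the slides):

```python
# Toy model: one probability per atomic event over propositions a, b.
p = {("a", "b"): 0.2, ("a", "¬b"): 0.3, ("¬a", "b"): 0.1, ("¬a", "¬b"): 0.4}

P_a = p[("a", "b")] + p[("a", "¬b")]      # 0.5
P_b = p[("a", "b")] + p[("¬a", "b")]      # 0.3
P_a_and_b = p[("a", "b")]                 # 0.2
P_a_or_b = P_a + P_b - P_a_and_b          # inclusion-exclusion: 0.6

assert abs(sum(p.values()) - 1.0) < 1e-12   # P(true) = 1
assert 0.0 <= P_a_or_b <= 1.0               # 0 ≤ P(A) ≤ 1
print(P_a_or_b)
```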

Page 16:

Multivalued Random Variables
• Suppose A can take on more than 2 values
• A is a random variable with arity k if it can take on a value out of the domain {v1, v2, …, vk}
• Then…

  P(A = vi ∧ A = vj) = 0 if i ≠ j
  P(A = v1 ∨ A = v2 ∨ … ∨ A = vk) = 1

• Sum of probability over all possible values must equal 1:

  Σ (j = 1 to k) P(A = vj) = 1
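The normalization constraint is easy to check in code; a sketch using the Direction probabilities that appear on a later slide:

```python
# Distribution of the discrete variable Direction over its k = 4 values.
P_direction = {"north": 0.72, "south": 0.1, "east": 0.08, "west": 0.1}

# The probabilities over all possible values must sum to 1.
assert abs(sum(P_direction.values()) - 1.0) < 1e-12
# Mutual exclusivity is built in: a variable takes exactly one value,
# so P(A = vi ∧ A = vj) = 0 whenever i ≠ j.
```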

Page 17:

Set-theoretic interpretation of probability
• Remember the possible worlds interpretation from logic
• A model (world) is a setting of true or false to every proposition
• All possible worlds are all combinations of true and false
• Suppose all possible worlds can be represented by the following diagram:

W = set of all possible worlds

Page 18:

Set-theoretic interpretation of probability
• Remember the possible worlds interpretation from logic
• A model (world) is a setting of true or false to every proposition
• All possible worlds are all combinations of true and false
• The probability of A being true is the proportion of |WA| to |W|: P(A) = 10 / 32 = 0.31

W = set of all possible worlds
WA = set of worlds where A is true

Page 19:

Inclusion-exclusion axiom and possible worlds
• Inclusion-exclusion principle: P(A ∨ B) = P(A) + P(B) - P(A ∧ B)
• i.e., P(WA ∪ WB)
• The probability of WA ∪ WB is the proportion of |WA ∪ WB| to |W|: P(WA ∪ WB) = 14 / 32 = 0.44

W = set of all possible worlds
WA = set of worlds where A is true
WB = set of worlds where B is true
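The same proportions can be reproduced by counting worlds directly. A sketch with set sizes matching the slides' numbers (|W| = 32, |WA| = 10, |WB| = 6, overlap 2); the particular world labels are assumed for illustration:

```python
# Worlds are labeled 0..31 and assumed equally likely; the particular
# membership of WA and WB is illustrative, only the sizes matter.
W  = set(range(32))
WA = set(range(0, 10))    # |WA| = 10
WB = set(range(8, 14))    # |WB| = 6, overlapping WA in 2 worlds

P = lambda S: len(S) / len(W)
print(P(WA))        # 10/32 ≈ 0.31
print(P(WA | WB))   # 14/32 ≈ 0.44  (inclusion-exclusion)

# Mutually exclusive variant: disjoint world sets of the same sizes.
WB2 = set(range(10, 16))
print(P(WA | WB2))  # 16/32 = 0.5
```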

Page 20:

Inclusion-exclusion axiom and possible worlds
• Inclusion-exclusion principle: P(A ∨ B) = P(A) + P(B) - P(A ∧ B)
• i.e., P(WA ∪ WB)
• If A and B are mutually exclusive events, i.e., WA ∩ WB = ∅, then P(A ∨ B) = P(A) + P(B) = 16 / 32 = 0.5

W = set of all possible worlds
WA = set of worlds where A is true
WB = set of worlds where B is true

Page 21:

Joint probability distributions
• Prior or unconditional probability—value prior to any (new) evidence
• E.g., P(Attack = true) = 0.1 and P(Direction = north) = 0.72
• Probability distribution gives probabilities for each possible value
• E.g., P(Direction) = <0.72, 0.1, 0.08, 0.1> (sums to 1)
• Joint probability distribution for a set of random variables gives probability for each combination of values (i.e., every atomic event) for those random variables
• E.g., P(Direction, Attack) = a 4 × 2 matrix of values:

  Direction =     north   south   east    west
  Attack = true   0.144   0.02    0.016   0.02
  Attack = false  0.576   0.08    0.064   0.08

• Sum of joint probabilities for each case (table) must equal 1
• Every question can be answered by the joint distribution!
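In code, such a joint distribution is naturally a table keyed by value combinations; a minimal sketch with the numbers above:

```python
# P(Direction, Attack) as a dict keyed by (direction, attack) pairs.
joint = {("north", True): 0.144, ("south", True): 0.02,
         ("east",  True): 0.016, ("west",  True): 0.02,
         ("north", False): 0.576, ("south", False): 0.08,
         ("east",  False): 0.064, ("west",  False): 0.08}

# The entries cover every atomic event, so they must sum to 1.
assert abs(sum(joint.values()) - 1.0) < 1e-12
```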

Page 22:

Inference using joint probabilities
• Start with the joint probability distribution
• For any proposition φ, sum the atomic events where it holds: P(φ) = Σ ω:ω⊨φ P(ω) (i.e., sum over all events where φ is true)

                propaganda             ¬propaganda
                election   ¬election   election   ¬election
  attack        0.108      0.012       0.072      0.008
  ¬attack       0.016      0.064       0.144      0.576

Page 23:

Inference using joint probabilities
• Start with the joint probability distribution
• For any proposition φ: P(φ) = Σ ω:ω⊨φ P(ω)
  P(propaganda) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2

                propaganda             ¬propaganda
                election   ¬election   election   ¬election
  attack        0.108      0.012       0.072      0.008
  ¬attack       0.016      0.064       0.144      0.576
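The summation over atomic events is a direct filter-and-sum; a sketch using the joint table above:

```python
# Joint table keyed by (attack, propaganda, election) truth values.
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

def P(phi):
    """P(phi) = sum of P(omega) over atomic events where phi is true."""
    return sum(pr for omega, pr in joint.items() if phi(*omega))

print(P(lambda attack, propaganda, election: propaganda))            # 0.2
print(P(lambda attack, propaganda, election: attack or propaganda))  # 0.28
```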

Page 24:

The good and the bad with JPDs
• Good news: once you have a joint distribution, you can ask important questions about stuff that involves a lot of uncertainty!
• Bad news: impossible to create for more than about 10 variables because there are so many numbers needed when you build the thing!

Page 25:

Conditional probability
• Conditional or posterior probabilities—based on known information
• E.g., P(attack | propaganda) = 0.8
  Given that propaganda is known with certainty
• Formal definition of conditional probability:
  P(a | b) = P(a ∧ b) / P(b), if P(b) > 0
• That is, the proportion of |WA ∩ WB| to |WB|:
  P(A ∧ B) / P(B) = 2 / 6 = 0.33

W = set of all possible worlds
WA = set of worlds where A is true
WB = set of worlds where B is true

Page 26:

Computation of conditional probabilities
• Start with the joint probability distribution

                propaganda             ¬propaganda
                election   ¬election   election   ¬election
  attack        0.108      0.012       0.072      0.008
  ¬attack       0.016      0.064       0.144      0.576

• Can also compute conditional probabilities:
  P(¬attack | propaganda) = P(¬attack ∧ propaganda) / P(propaganda)
                          = (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064)
                          = 0.08 / 0.2 = 0.4
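The same computation in code, restating the joint table from the previous sketch:

```python
# Same joint table, keyed by (attack, propaganda, election).
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}
P = lambda phi: sum(pr for w, pr in joint.items() if phi(*w))

# P(¬attack | propaganda) = P(¬attack ∧ propaganda) / P(propaganda)
num = P(lambda a, p, e: (not a) and p)   # 0.016 + 0.064 = 0.08
den = P(lambda a, p, e: p)               # 0.2
print(num / den)                         # 0.4
```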

Page 27:

Independence allows simplification
• A and B are independent iff P(A | B) = P(A) or P(B | A) = P(B) or P(A, B) = P(A)P(B)
• E.g., P(Propaganda, Election, Attack, Direction) = P(Propaganda, Election, Attack)P(Direction)
• If events in a conditional or joint distribution are independent, we can decompose it into smaller distributions
• We know that A and S are independent:
  P(A) = 0.6, P(S) = 0.3, P(S | A) = P(S)
• We can derive the full JPD (assuming independence):

  A   S   Probability
  T   T   0.18
  T   F   0.42
  F   T   0.12
  F   F   0.28

• Since we have the JPD, we can make any query!
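Under the independence assumption, the whole table is a product of marginals; a minimal sketch reproducing the table above:

```python
# Marginals from the slide; independence gives P(A ∧ S) = P(A) P(S), etc.
P_A, P_S = 0.6, 0.3

joint = {(a, s): (P_A if a else 1 - P_A) * (P_S if s else 1 - P_S)
         for a in (True, False) for s in (True, False)}

for (a, s), pr in sorted(joint.items(), reverse=True):
    print(a, s, pr)   # 0.18, 0.42, 0.12, 0.28 (matches the table above)
```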

Page 28:

Absolute independence is rare
• Absolute independence is powerful, but rare
• Behavioral models (in economics, gaming, security, etc.) require hundreds of variables, none of which are independent
• In fact, interdependencies are often the interesting parts of our data!
• What to do??

Page 29:

Bayesian probability theory (1763 to now)
• The basis of Bayesian theory is conditional probabilities
• Bayesian theory sees a conditional probability as a way to describe the structure and organization of knowledge
• In this view, A | B indicates the event A in the "context" of event B
• E.g., the symptom A in the context of disease B, or the action A in the state of the world B

Page 30:

Adding new evidence to the current context
• Additional evidence may change the environment, and hence the conditional probability
• P(attack | propaganda) = 0.8
  If we know more, e.g., attack is also given, then we have P(attack | propaganda, attack) = 1
• Evidence may be irrelevant (independent), allowing simplification
• E.g., attack does not depend on direction: P(attack | propaganda, north) = P(attack | propaganda) = 0.8
• This kind of inference, sanctioned by domain knowledge, is crucial for probabilistic reasoning!

Page 31:

The chain rule
• The probability of a joint event (X1, …, Xn) can be computed using conditional probabilities
• Product rule: P(a ∧ b) = P(a | b)P(b) = P(b | a)P(a)
• Chain rule derived by successive application of the product rule:
  P(X1, …, Xn) = P(X1, …, Xn-1) P(Xn | X1, …, Xn-1)
               = P(X1, …, Xn-2) P(Xn-1 | X1, …, Xn-2) P(Xn | X1, …, Xn-1)
               = …
               = P(X1) P(X2 | X1) P(X3 | X1, X2) … P(Xn | X1, …, Xn-1)
               = Π (i = 1 to n) P(Xi | X1, …, Xi-1)
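The chain rule can be verified mechanically against any joint distribution; a sketch over three Boolean variables with randomly generated (hence assumed) probabilities:

```python
import itertools
import random

random.seed(0)
events = list(itertools.product([True, False], repeat=3))
weights = [random.random() for _ in events]
total = sum(weights)
joint = {e: w / total for e, w in zip(events, weights)}

def P(cond):
    """Probability that condition `cond` holds, under the joint."""
    return sum(pr for e, pr in joint.items() if cond(e))

# Check P(x1, x2, x3) = P(x1) * P(x2 | x1) * P(x3 | x1, x2) everywhere.
for (x1, x2, x3) in events:
    p1 = P(lambda e: e[0] == x1)
    p12 = P(lambda e: e[:2] == (x1, x2))
    product = p1 * (p12 / p1) * (joint[(x1, x2, x3)] / p12)
    assert abs(joint[(x1, x2, x3)] - product) < 1e-12
print("chain rule holds for every atomic event")
```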

Page 32:

Evidential reasoning with conditional probs
• Reasoning about "hypotheses" and "evidence" that do or do not support the hypotheses—Bayesian inference
  P(H | e): given that I know about evidence e, the probability that my hypothesis H is true
  P(H | e) = P(H ∧ e) / P(e)
• Might also have some extra or hidden context variables Y:
  P(H | e) = Σy P(H ∧ e ∧ Y = y) / Σy P(e ∧ Y = y)

Page 33:

Bayesian reasoning in medical diagnosis
• Causal model: Disease → Condition → Symptom (H → Y → E)

  P(H = cancer | E = fatigue) = α [ P(H = cancer ∧ E = fatigue ∧ anemia) +
                                    P(H = cancer ∧ E = fatigue ∧ ¬anemia) ]

  α = 1/P(E = fatigue), or 1/[ P(E = fatigue ∧ anemia) + P(E = fatigue ∧ ¬anemia) ]

[Diagram: causal chains Cancer → Anemia → Fatigue and Kidney Disease → Anemia → Fatigue]

Page 34:

…but where do we find the numbers?
• Assuming independence, doctors may be able to estimate P(symptom | disease) for each S/D pair (causal reasoning)
• Hard to estimate what we really need to know: P(disease | symptom)
• This is why Bayes rule is so important in probabilistic AI!

Page 35:

Bayes Rule
• Product rule: P(a ∧ b) = P(a | b)P(b) = P(b | a)P(a)
  ⇒ Bayes rule: P(a | b) = P(b | a)P(a) / P(b)
• Useful for assessing diagnostic probability from causal probability:
• P(Cause | Effect) = P(Effect | Cause) P(Cause) / P(Effect)
• E.g., let M be meningitis, S be stiff neck:
  P(M | S) = P(S | M) P(M) / P(S) = 0.8 × 0.0001 / 0.1 = 0.0008
• Note: the posterior probability of meningitis is still very small!
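The meningitis example as a tiny sketch with the slide's numbers:

```python
# P(M | S) = P(S | M) P(M) / P(S), with the slide's numbers.
P_S_given_M, P_M, P_S = 0.8, 0.0001, 0.1
print(P_S_given_M * P_M / P_S)   # 0.0008: still a very small posterior
```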

Page 36:

More general forms of Bayes rule
• P(A | B) = P(B | A)P(A) / [ P(B | A)P(A) + P(B | ¬A)P(¬A) ]
• P(A | B, e) = P(B | A, e)P(A | e) / P(B | e)
• P(A = vi | B) = P(B | A = vi)P(A = vi) / Σ (k = 1 to n) P(B | A = vk)P(A = vk)
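The third form is how Bayes rule is typically normalized in practice; a small sketch with assumed (illustrative) numbers for a two-valued A:

```python
def posterior(prior, likelihood):
    """P(A = v | B) for each v, normalizing by sum_k P(B | A=vk) P(A=vk)."""
    unnormalized = {v: likelihood[v] * prior[v] for v in prior}
    z = sum(unnormalized.values())   # equals P(B)
    return {v: p / z for v, p in unnormalized.items()}

# Illustrative numbers (not from the slides).
print(posterior(prior={"v1": 0.7, "v2": 0.3},
                likelihood={"v1": 0.2, "v2": 0.9}))
# {'v1': 0.341..., 'v2': 0.658...}
```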

Page 37:

Conditional independence
• If an attack occurs, the probability of casualties does not depend on whether or not there is a propaganda campaign:
• P(casualties | propaganda, attack) = P(casualties | attack)
• The same independence holds if there is no attack:
• P(casualties | propaganda, ¬attack) = P(casualties | ¬attack)

• Casualties is conditionally independent of Propaganda given Attack:
• P(Casualties | Propaganda, Attack) = P(Casualties | Attack)
  P(Propaganda | Casualties, Attack) = P(Propaganda | Attack)
  P(Propaganda, Casualties | Attack) = P(Propaganda | Attack)P(Casualties | Attack)

Page 38:

Conditional independence reduces size
• P(Propaganda, Attack, Casualties) has 2^3 - 1 = 7 independent JPD entries
• Write out the full joint distribution using the chain rule:
  P(Propaganda, Casualties, Attack)
  = P(Propaganda | Casualties, Attack) P(Casualties, Attack)
  = P(Propaganda | Casualties, Attack) P(Casualties | Attack) P(Attack)
  = P(Propaganda | Attack) P(Casualties | Attack) P(Attack)
  i.e., 2 + 2 + 1 = 5 independent numbers
• In most cases, conditional independence reduces the size of the representation from exponential in n to linear in n
• Conditional independence is the most basic and robust form of knowledge about uncertain environments

Page 39:

Bayes rule and conditional independence
• P(Attack | Propaganda, Casualties)
  ∝ P(Propaganda, Casualties | Attack) P(Attack)
  = P(Propaganda | Attack) P(Casualties | Attack) P(Attack)
• We say: "Propaganda and Casualties are independent, given Attack"
• Attack separates Propaganda and Casualties because it is a direct cause of both
• Example of a naïve Bayes model:
• P(Cause, Effect1, …, Effectn) = P(Cause) Π (i = 1 to n) P(Effecti | Cause)
• Total number of parameters is linear in n (number of effects)
• This is our first Bayesian inference net! More on Friday…

[Diagram: Attack → Propaganda and Attack → Casualties; in general, Cause → Effect1, …, Cause → Effectn]
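A sketch of naïve Bayes inference using the slide's Attack/Propaganda/Casualties structure; the conditional probabilities below are assumed for illustration:

```python
# Assumed (illustrative) parameters for the naive Bayes model.
P_attack = 0.2
P_prop_given_attack = {True: 0.7, False: 0.1}   # P(propaganda | Attack)
P_cas_given_attack  = {True: 0.6, False: 0.05}  # P(casualties | Attack)

def p_attack_given(propaganda, casualties):
    """P(Attack | evidence) via P(Cause) * prod_i P(Effect_i | Cause)."""
    score = {}
    for attack in (True, False):
        prior = P_attack if attack else 1 - P_attack
        l_p = P_prop_given_attack[attack]
        l_c = P_cas_given_attack[attack]
        score[attack] = (prior
                         * (l_p if propaganda else 1 - l_p)
                         * (l_c if casualties else 1 - l_c))
    return score[True] / sum(score.values())   # normalize over Attack

print(p_attack_given(propaganda=True, casualties=True))   # ≈ 0.95
```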


Page 41:

Example of conditional independence
• R: There is rioting in Nigeria
• H: Supply of gasoline in U.S. reduced by hurricane
• G: U.S. gas prices increase

• Assume gas prices are sometimes responsive to global events (e.g., riots in oil-producing countries like Nigeria)
• Start with knowledge we are confident about:
  P(H | R) = P(H), P(H) = 0.3, P(R) = 0.6
• Gas prices are not independent of the weather and are not independent of the political situation in Nigeria
• Hurricanes in the U.S. and rioting in Nigeria are independent

Page 42:

Reduce complexity with independence
• R: There is rioting in Nigeria
• H: Supply of gasoline in U.S. reduced by hurricane
• G: U.S. gas prices increase
• Assume gas prices are sometimes responsive to global events (e.g., riots in oil-producing countries like Nigeria)
• Start with knowledge we are confident about:
  P(H | R) = P(H), P(H) = 0.3, P(R) = 0.6
• Knowing the joint probability of H and R, we now need:
• P(G | H, R) for the 4 cases where H and R are true/false

Page 43:

Reduce complexity with independence
• R: There is rioting in Nigeria
• H: Supply of gasoline reduced by hurricane
• G: U.S. gas prices increase
• Assume gas prices are sometimes responsive to global events (e.g., riots in oil-producing countries like Nigeria)

• Can derive a full JPD with a "mere" 6 numbers instead of 7
• NOTE: savings are larger for larger numbers of variables/values
• Same expressive and inference power as the JPD

P(H | R) = P(H), P(H) = 0.3, P(R) = 0.6

P(G | R ∧ H) = 0.05
P(G | R ∧ ¬H) = 0.1
P(G | ¬R ∧ H) = 0.1
P(G | ¬R ∧ ¬H) = 0.2
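To close the loop, a sketch that rebuilds the full eight-entry JPD over (R, H, G) from exactly these six numbers, via P(R, H, G) = P(R) P(H) P(G | R, H):

```python
from itertools import product

# The six numbers from the slide.
P_R, P_H = 0.6, 0.3
P_G_given = {(True, True): 0.05, (True, False): 0.1,
             (False, True): 0.1, (False, False): 0.2}   # keys: (R, H)

joint = {}
for r, h, g in product([True, False], repeat=3):
    p_rh = (P_R if r else 1 - P_R) * (P_H if h else 1 - P_H)  # H independent of R
    p_g = P_G_given[(r, h)]
    joint[(r, h, g)] = p_rh * (p_g if g else 1 - p_g)

assert abs(sum(joint.values()) - 1.0) < 1e-12   # a valid full JPD
print(joint[(True, False, True)])   # P(R ∧ ¬H ∧ G) = 0.6 * 0.7 * 0.1 = 0.042
```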