Reasoning Under Uncertainty

Transcript of Reasoning Under Uncertainty

Page 1: Reasoning Under Uncertainty

Reasoning Under Uncertainty

Artificial Intelligence

CSPP 56553

February 18, 2004

Page 2: Reasoning Under Uncertainty

Agenda

• Motivation– Reasoning with uncertainty

• Medical Informatics

• Probability and Bayes’ Rule– Bayesian Networks– Noisy-Or

• Decision Trees and Rationality• Conclusions

Page 3: Reasoning Under Uncertainty

Uncertainty

• Search and Planning Agents– Assume fully observable, deterministic, static

• Real World: – Probabilities capture “Ignorance & Laziness”

• Lack relevant facts, conditions

• Failure to enumerate all conditions, exceptions

– Partially observable, stochastic, extremely complex

– Can't be sure of success; agent maximizes expected value

– Bayesian (subjective) probabilities relate to knowledge

Page 4: Reasoning Under Uncertainty

Motivation

• Uncertainty in medical diagnosis– Diseases produce symptoms– In diagnosis, observed symptoms => disease ID– Uncertainties

• Symptoms may not occur• Symptoms may not be reported• Diagnostic tests not perfect

– False positive, false negative

• How do we estimate confidence?

Page 5: Reasoning Under Uncertainty

Motivation II

• Uncertainty in medical decision-making– Physicians, patients must decide on treatments– Treatments may not be successful– Treatments may have unpleasant side effects

• Choosing treatments– Weigh risks of adverse outcomes

• People are BAD at reasoning intuitively about probabilities– Provide systematic analysis

Page 6: Reasoning Under Uncertainty

Probability Basics

• The sample space:– A set Ω ={ω1, ω2, ω3,… ωn}

• E.g. the 6 possible rolls of a die; • ωi is a sample point/atomic event

• Probability space/model is a sample space with an assignment P(ω) for every ω in Ω s.t. 0 ≤ P(ω) ≤ 1 and Σ_ω P(ω) = 1– E.g. P(die roll < 4) = 1/6+1/6+1/6 = 1/2

Page 7: Reasoning Under Uncertainty

Random Variables

• A random variable is a function from sample points to a range (e.g. reals, bools)

• E.g. Odd(1) = true

• P induces a probability distribution for any r.v. X:– P(X=x_i) = Σ_{ω: X(ω)=x_i} P(ω)

– E.g. P(Odd=true) = 1/6+1/6+1/6 = 1/2

• A proposition is the event (set of sample points) in which the proposition holds: e.g. event a = {ω : A(ω)=true}
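
A minimal sketch of this definition in Python, using the die example from the slide (the function and variable names are mine):

# The die example as code: a random variable is a function on sample points,
# and P induces a distribution for it.
sample_space = {1: 1/6, 2: 1/6, 3: 1/6, 4: 1/6, 5: 1/6, 6: 1/6}  # P(omega) for each roll

def odd(omega):
    # Random variable Odd: maps a sample point to a boolean.
    return omega % 2 == 1

def prob(rv, value):
    # Induced distribution: P(X = x) = sum of P(omega) over {omega : X(omega) = x}.
    return sum(p for omega, p in sample_space.items() if rv(omega) == value)

print(prob(odd, True))  # 0.5 (up to float rounding): P(Odd=true) = 1/6 + 1/6 + 1/6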

Page 8: Reasoning Under Uncertainty

Why probabilities?

• Definitions imply that logically related events have related probabilities

• In AI applications, sample points are defined by set of random variables– Random vars: boolean, discrete, continuous

Page 9: Reasoning Under Uncertainty

Prior Probabilities

• Prior probabilities: belief prior to evidence– E.g. P(cavity=t)=0.2; P(weather=sunny)=0.6

– Distribution gives values for all assignments

• Joint distribution on set of r.v.s gives probability on every atomic event of r.v.s– E.g. P(weather,cavity)=4x2 matrix of values

• Every question about a domain can be answered from the joint, because every event is a sum of sample points

Page 10: Reasoning Under Uncertainty

Conditional Probabilities

• Conditional (posterior) probabilities– E.g. P(cavity|toothache) = 0.8, given that toothache is all we know– P(Cavity|Toothache) = 2-element vector of 2-element vectors

• Can add new evidence, possibly irrelevant

• P(a|b) = P(a,b)/P(b) where P(b) ≠0

• Also, P(a,b)=P(a|b)P(b)=P(b|a)P(a)– Product rule generalizes to chaining
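
The "Inference by Enumeration" slides that follow show these rules applied to a joint table; as a minimal sketch of the idea, here it is in Python for a cavity/toothache joint whose entries are assumed values, chosen only so that they agree with the P(cavity)=0.2 and P(cavity|toothache)=0.8 figures quoted on the surrounding slides:

# Inference by enumeration over a small joint distribution.
# The joint entries below are assumptions, not values given in the slides.
joint = {
    ("cavity", "toothache"):       0.16,
    ("cavity", "no_toothache"):    0.04,
    ("no_cavity", "toothache"):    0.04,
    ("no_cavity", "no_toothache"): 0.76,
}

def p(event):
    # P(event): sum the probabilities of the sample points where the event holds.
    return sum(prob for point, prob in joint.items() if event(point))

def conditional(a, b):
    # P(a | b) = P(a, b) / P(b)
    return p(lambda pt: a(pt) and b(pt)) / p(b)

cavity = lambda pt: pt[0] == "cavity"
toothache = lambda pt: pt[1] == "toothache"
print(p(cavity))                       # ~0.2
print(conditional(cavity, toothache))  # ~0.8 = 0.16 / 0.20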

Page 11: Reasoning Under Uncertainty

Inference By Enumeration

Page 12: Reasoning Under Uncertainty

Inference by Enumeration

Page 13: Reasoning Under Uncertainty

Inference by Enumeration

Page 14: Reasoning Under Uncertainty

Independence

Page 15: Reasoning Under Uncertainty

Conditional Independence

Page 16: Reasoning Under Uncertainty

Conditional Independence II

Page 17: Reasoning Under Uncertainty

Probabilities Model Uncertainty

• The World - Features– Random variables– Feature values

• States of the world– Assignments of values to variables

– Number of possible states is exponential in # of variables

Random variables: {X1, X2, …, Xn}
Values of Xi: {x_i1, x_i2, …, x_ik_i}
Number of states: Π_{i=1..n} k_i ≥ 2^n (since each k_i ≥ 2)

Page 18: Reasoning Under Uncertainty

Probabilities of World States

• P(S_i): Joint probability of assignments– States are distinct and exhaustive: Σ_{i=1..Π_j k_j} P(S_i) = 1

• Typically care about SUBSET of assignments– aka “Circumstance”– E.g. P(X2=t, X4=f) = Σ_{u∈{t,f}} Σ_{v∈{t,f}} P(X1=u, X2=t, X3=v, X4=f)

– Exponential in # of don’t cares

Page 19: Reasoning Under Uncertainty

A Simpler World

• 2^n world states = Maximum entropy– Know nothing about the world

• Many variables independent– P(strep,ebola) = P(strep)P(ebola)

• Conditionally independent– Depend on same factors but not on each other– P(fever,cough|flu) = P(fever|flu)P(cough|flu)
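
A small sketch of how the factored form is used (the numbers below are assumptions for illustration, not values from the slide):

# Conditional independence: P(fever, cough | flu) = P(fever | flu) * P(cough | flu).
p_fever_given = {"flu": 0.9, "no_flu": 0.1}   # P(fever=true | Flu), assumed
p_cough_given = {"flu": 0.8, "no_flu": 0.2}   # P(cough=true | Flu), assumed

def p_symptoms_given(fever, cough, flu_state):
    # Joint of both symptoms given the disease, assembled from the two factors.
    pf = p_fever_given[flu_state] if fever else 1 - p_fever_given[flu_state]
    pc = p_cough_given[flu_state] if cough else 1 - p_cough_given[flu_state]
    return pf * pc

print(p_symptoms_given(True, True, "flu"))  # 0.72 with these assumed numbers

With n conditionally independent symptoms the model stores one number per symptom per disease value, instead of a table over all 2^n symptom combinations.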

Page 20: Reasoning Under Uncertainty

Probabilistic Diagnosis

• Question:– How likely is a patient to have a disease if they have

the symptoms?

• Probabilistic Model: Bayes’ Rule• P(D|S) = P(S|D)P(D)/P(S)

– Where• P(S|D) : Probability of symptom given disease• P(D): Prior probability of having disease• P(S): Prior probability of having symptom

Page 21: Reasoning Under Uncertainty

Diagnosis

• Consider Meningitis:– Disease: Meningitis: m– Symptom: Stiff neck: s– P(s|m) = 0.5– P(m) =0.0001– P(s) = 0.1– How likely is it that someone with a stiff neck

actually has meningitis?
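
Plugging the slide's numbers into Bayes' rule gives the answer directly (a small check in Python):

# Bayes' rule with the slide's numbers: P(m|s) = P(s|m) P(m) / P(s).
p_s_given_m = 0.5
p_m = 0.0001
p_s = 0.1
print(p_s_given_m * p_m / p_s)  # 0.0005: a stiff neck alone makes meningitis very unlikely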

Page 22: Reasoning Under Uncertainty

Modeling (In)dependence

• Simple, graphical notation for conditional independence; compact spec of joint

• Bayesian network– Nodes = Variables– Directed acyclic graph: link ~ directly influences– Arcs = Child depends on parent(s)

• No arcs = independent (0 incoming: only a priori)• Parents of X = π(X)• For each X need P(X | π(X))

Page 23: Reasoning Under Uncertainty

Example I

Page 24: Reasoning Under Uncertainty

Simple Bayesian Network

• MCBN1

A

B C

D E

A = only a priori
B depends on A
C depends on A
D depends on B,C
E depends on C

Need: P(A), P(B|A), P(C|A), P(D|B,C), P(E|C)

Truth table sizes: 2, 2×2, 2×2, 2×2×2, 2×2
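
A minimal sketch of how this factorization is used to compute the joint (only the structure comes from the slide; every CPT number below is a made-up assumption):

from itertools import product

# MCBN1 joint via its factorization:
# P(A,B,C,D,E) = P(A) P(B|A) P(C|A) P(D|B,C) P(E|C).
p_a = 0.3                                                    # assumed
p_b_given_a = {True: 0.8, False: 0.1}                        # P(B=true | A), assumed
p_c_given_a = {True: 0.5, False: 0.2}                        # P(C=true | A), assumed
p_d_given_bc = {(True, True): 0.9, (True, False): 0.6,
                (False, True): 0.4, (False, False): 0.05}    # P(D=true | B,C), assumed
p_e_given_c = {True: 0.7, False: 0.1}                        # P(E=true | C), assumed

def bern(p_true, value):
    # P(X = value) for a boolean X with P(X=true) = p_true.
    return p_true if value else 1.0 - p_true

def joint(a, b, c, d, e):
    return (bern(p_a, a) * bern(p_b_given_a[a], b) * bern(p_c_given_a[a], c)
            * bern(p_d_given_bc[(b, c)], d) * bern(p_e_given_c[c], e))

# Sanity check: the factorization still sums to 1 over all 2^5 assignments.
print(sum(joint(*v) for v in product([True, False], repeat=5)))  # 1.0 (up to rounding)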

Page 25: Reasoning Under Uncertainty

Simplifying with Noisy-OR

• How many computations? – p = # parents; k = # values for variable– (k-1)k^p– Very expensive! 10 binary parents=2^10=1024

• Reduce computation by simplifying model– Treat each parent as possible independent cause– Only 11 computations

• 10 causal probabilities + “leak” probability– “Some other cause”
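
A sketch of the saving (the ten causal probabilities and the leak below are made-up assumptions): 11 noisy-OR parameters determine every row of the full 2^10-entry table.

from itertools import product

# Noisy-OR: each present parent independently fails to cause the effect with
# probability (1 - c_i); the leak stands in for "some other cause".
causal = [0.3, 0.2, 0.5, 0.1, 0.4, 0.25, 0.15, 0.35, 0.2, 0.3]  # assumed c_i, one per parent
leak = 0.05                                                      # assumed leak probability

def p_effect(parents):
    # P(effect=true | parents) = 1 - (1-leak) * product over present parents of (1 - c_i)
    fail = 1.0 - leak
    for present, c in zip(parents, causal):
        if present:
            fail *= 1.0 - c
    return 1.0 - fail

# 11 parameters fill in all 2^10 = 1024 rows of the full conditional table.
table = {row: p_effect(row) for row in product([True, False], repeat=10)}
print(len(table), table[(False,) * 10])  # 1024 rows; the all-absent row is ~ the leak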

Page 26: Reasoning Under Uncertainty

Noisy-OR Example

A → B

P(b|a):       b      ¬b
   a         0.6    0.4
   ¬a        0.5    0.5

Noisy-OR model (causal strength c_a, leak probability L):

P(b|a) = 1 - (1 - c_a)(1 - L)
P(¬b|a) = (1 - c_a)(1 - L)
P(b|¬a) = 1 - (1 - L) = L = 0.5

Solving for c_a:
(1 - c_a)(1 - L) = 0.4
1 - c_a = 0.4 / (1 - L) = 0.4 / 0.5
c_a = 1 - 0.4/0.5 = 0.2

Page 27: Reasoning Under Uncertainty

Noisy-OR Example II

A → D ← B (A and B are the parents of D)

Full model: need P(d|a,b), P(d|a,¬b), P(d|¬a,b), P(d|¬a,¬b) (plus the complementary ¬d entries)

Assume:
P(a) = 0.1
P(b) = 0.05
P(d|¬a,¬b) = 0.3
P(d|a) = 0.5
P(d|b) = 0.7

Noisy-OR model (causal strengths c_a, c_b, leak L):
P(d|¬a,¬b) = 1 - (1 - L) = 0.3
P(d|a,¬b) = 1 - (1 - c_a)(1 - L)
P(d|¬a,b) = 1 - (1 - c_b)(1 - L)
P(d|a,b) = 1 - (1 - c_a)(1 - c_b)(1 - L)

Marginalizing over the other parent:
P(d|b) = P(d|a,b)P(a) + P(d|¬a,b)P(¬a)

Page 28: Reasoning Under Uncertainty

Graph Models

• Bipartite graphs– E.g. medical reasoning– Generally, diseases cause symptoms (not the reverse)

Diseases: d1, d2, d3, d4
Symptoms: s1, s2, s3, s4, s5, s6
(links point from each disease to the symptoms it can cause)

Page 29: Reasoning Under Uncertainty

Topologies

• Generally more complex– Polytree: One path between any two nodes

• General Bayes Nets– Graphs with undirected cycles

• No directed cycles - can’t be own cause

• Issue: Automatic net acquisition– Update probabilities by observing data– Learn topology: use statistical evidence of indep,

heuristic search to find most probable structure

Page 30: Reasoning Under Uncertainty

Holmes Example (Pearl)

Holmes is worried that his house will be burgled. For the time period of interest, there is a 10^-4 a priori chance of this happening, and Holmes has installed a burglar alarm to try to forestall this event. The alarm is 95% reliable in sounding when a burglary happens, but also has a false positive rate of 1%. Holmes’ neighbor, Watson, is 90% sure to call Holmes at his office if the alarm sounds, but he is also a bit of a practical joker and, knowing Holmes’ concern, might (30%) call even if the alarm is silent. Holmes’ other neighbor Mrs. Gibbons is a well-known lush and often befuddled, but Holmes believes that she is four times more likely to call him if there is an alarm than not.

Page 31: Reasoning Under Uncertainty

Holmes Example: Model

There are four binary random variables:
B: whether Holmes’ house has been burgled
A: whether his alarm sounded
W: whether Watson called
G: whether Gibbons called

Network: B → A, A → W, A → G

Page 32: Reasoning Under Uncertainty

Holmes Example: Tables

P(B):
B=#t: 0.0001    B=#f: 0.9999

P(A|B):
B=#t:  A=#t 0.95   A=#f 0.05
B=#f:  A=#t 0.01   A=#f 0.99

P(W|A):
A=#t:  W=#t 0.90   W=#f 0.10
A=#f:  W=#t 0.30   W=#f 0.70

P(G|A):
A=#t:  G=#t 0.40   G=#f 0.60
A=#f:  G=#t 0.10   G=#f 0.90
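
As a sketch of how these tables get used, enumerating out the hidden variables answers queries such as P(Burglary | Watson called); the choice of query is mine, the numbers are the slide's:

# Holmes network B -> A -> {W, G}, using the tables above.
p_b = 0.0001
p_a_given_b = {True: 0.95, False: 0.01}   # P(A=#t | B)
p_w_given_a = {True: 0.90, False: 0.30}   # P(W=#t | A)
p_g_given_a = {True: 0.40, False: 0.10}   # P(G=#t | A)

def bern(p_true, value):
    return p_true if value else 1.0 - p_true

def joint(b, a, w, g):
    # P(B,A,W,G) = P(B) P(A|B) P(W|A) P(G|A)
    return (bern(p_b, b) * bern(p_a_given_b[b], a)
            * bern(p_w_given_a[a], w) * bern(p_g_given_a[a], g))

# P(B=#t | W=#t): enumerate the unobserved variables A and G, then normalize.
tf = [True, False]
num = sum(joint(True, a, True, g) for a in tf for g in tf)
den = sum(joint(b, a, True, g) for b in tf for a in tf for g in tf)
print(num / den)  # about 0.0003: Watson's call alone barely raises the burglary belief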

Page 33: Reasoning Under Uncertainty

Decision Making

• Design model of rational decision making– Maximize expected value among alternatives

• Uncertainty from– Outcomes of actions– Choices taken

• To maximize outcome– Select maximum over choices– Weighted average value of chance outcomes

Page 34: Reasoning Under Uncertainty

Gangrene Example

Decision 1: Medicine vs. Amputate foot

Amputate foot:
  Live 0.99 → 850
  Die 0.01 → 0

Medicine:
  Full recovery 0.7 → 1000
  Die 0.05 → 0
  Worse 0.25 → go to Decision 2

Decision 2: Medicine vs. Amputate leg

Medicine:
  Live 0.6 → 995
  Die 0.4 → 0

Amputate leg:
  Live 0.98 → 700
  Die 0.02 → 0
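
A minimal expected-value evaluation of the tree as laid out above (treat the tree layout as one reading of the transcribed numbers):

# Chance nodes average over outcomes; decision nodes take the best alternative.
def chance(outcomes):
    # outcomes: list of (probability, value) pairs
    return sum(p * v for p, v in outcomes)

# Second decision, reached if the patient gets worse under medicine.
medicine_again = chance([(0.6, 995), (0.4, 0)])      # 597.0
amputate_leg = chance([(0.98, 700), (0.02, 0)])      # 686.0
worse_value = max(medicine_again, amputate_leg)      # take amputate leg: 686.0

# First decision.
amputate_foot = chance([(0.99, 850), (0.01, 0)])                    # 841.5
medicine = chance([(0.7, 1000), (0.05, 0), (0.25, worse_value)])    # 871.5
print(max(medicine, amputate_foot))  # medicine has the higher expected value here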

Page 35: Reasoning Under Uncertainty

Decision Tree Issues

• Problem 1: Tree size– k activities : 2^k orders

• Solution 1: Hill-climbing– Choose best apparent choice after one step

• Use entropy reduction

• Problem 2: Utility values– Difficult to estimate, Sensitivity, Duration

• Change value depending on phrasing of question

• Solution 2c: Model effect of outcome over lifetime

Page 36: Reasoning Under Uncertainty

Conclusion

• Reasoning with uncertainty– Many real systems uncertain - e.g. medical

diagnosis

• Bayes’ Nets– Model (in)dependence relations in reasoning– Noisy-OR simplifies model/computation

• Assumes causes independent

• Decision Trees– Model rational decision making

• Maximize outcome: Max choice, average outcomes

Page 37: Reasoning Under Uncertainty

Bayesian Spam Filtering

• Automatic Text Categorization

• Probabilistic Classifier– Conditional Framework– Naïve Bayes Formulation

• Independence assumptions galore

– Feature Selection– Classification & Evaluation

Page 38: Reasoning Under Uncertainty

Spam Classification

• Text categorization problem– Given a message, M, is it Spam or NotSpam?

• Probabilistic framework– P(Spam|M)> P(NotSpam|M)

• P(Spam|M) = P(Spam,M)/P(M)• P(NotSpam|M) = P(NotSpam,M)/P(M)

– Which is more likely?

Page 39: Reasoning Under Uncertainty

Characterizing a Message

• Represent message M as set of features– Features: a1,a2,….an

• What features?– Words! (again)

– Alternatively (skip) n-gram sequences

• Stemmed (?)• Term frequencies: N(W, Spam); N(W,NotSpam)

– Also, N(Spam),N(NotSpam): # of words in each class

Page 40: Reasoning Under Uncertainty

Characterizing a Message II

• Estimating term conditional probabilities– P(W|C) = (N(W,C) + 1) / (N(C) + K), for C ∈ {Spam, NotSpam}

• Selecting good features:– Exclude terms s.t.

• N(W,Spam) + N(W,NotSpam) < 4

• 0.45 ≤ P(W|Spam) / (P(W|Spam) + P(W|NotSpam)) ≤ 0.55
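
A hedged sketch of the estimation and filtering step as read from the formula above; the counts are invented, and taking K to be the vocabulary size is an assumption:

counts = {  # N(W, C): invented toy counts of word W in class C
    "viagra": {"Spam": 30, "NotSpam": 1},
    "meeting": {"Spam": 2, "NotSpam": 40},
    "the": {"Spam": 500, "NotSpam": 520},
    "rare": {"Spam": 1, "NotSpam": 1},
}
n_class = {"Spam": 1000, "NotSpam": 1200}  # N(C): invented total word counts per class
K = len(counts)                            # smoothing constant; using vocabulary size is an assumption

def p_w_given_c(word, cls):
    # P(W|C) = (N(W,C) + 1) / (N(C) + K)
    return (counts[word][cls] + 1) / (n_class[cls] + K)

def keep_feature(word):
    # Drop very rare terms and terms that do not discriminate between the classes.
    if counts[word]["Spam"] + counts[word]["NotSpam"] < 4:
        return False
    ps, pn = p_w_given_c(word, "Spam"), p_w_given_c(word, "NotSpam")
    return not (0.45 <= ps / (ps + pn) <= 0.55)

print([w for w in counts if keep_feature(w)])  # ['viagra', 'meeting'] with these counts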

Page 41: Reasoning Under Uncertainty

Naïve Bayes Formulation

• Naïve Bayes (aka “Idiot” Bayes)– Assumes all features independent

• Not accurate but useful simplification

• So,– P(M,Spam)=P(a1,a2,..,an,Spam)– = P(a1,a2,..,an|Spam)P(Spam)– =P(a1|Spam)..P(an|Spam)P(Spam)– Likewise for NotSpam
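
A small sketch of the resulting decision rule (all probabilities below are invented for illustration; working in logs is a standard trick to avoid underflow, not something stated on the slide):

import math

# Naive Bayes decision rule: compare P(C) * prod_i P(a_i | C) for C in {Spam, NotSpam}.
prior = {"Spam": 0.4, "NotSpam": 0.6}            # invented priors
p_word = {                                       # invented P(word | class) estimates
    "Spam":    {"free": 0.05,  "money": 0.04, "meeting": 0.001},
    "NotSpam": {"free": 0.005, "money": 0.01, "meeting": 0.03},
}

def log_score(words, cls):
    # log P(C) + sum_i log P(a_i | C); unseen words get a small assumed floor.
    return math.log(prior[cls]) + sum(math.log(p_word[cls].get(w, 1e-6)) for w in words)

def classify(words):
    return max(("Spam", "NotSpam"), key=lambda c: log_score(words, c))

print(classify(["free", "money"]))  # Spam with these numbers
print(classify(["meeting"]))        # NotSpam with these numbers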

Page 42: Reasoning Under Uncertainty

Experimentation (Pantel & Lin)

• Training: 160 spam, 466 non-spam

• Test: 277 spam, 346 non-spam

• 230,449 training words; 60434 spam– 12228 terms; filtering reduces to 3848

Page 43: Reasoning Under Uncertainty

Results (PL)

• False positives: 1.16%

• False negatives: 8.3%

• Overall error: 4.33%

• Simple approach, effective

Page 44: Reasoning Under Uncertainty

Variants

• Features?

• Model?– Explicit bias to certain error types

• Address lists

• Explicit rules