
Page 1: COMP 2208 Dr. Long Tran-Thanh ltt08r@ecs.soton.ac.uk University of Southampton Bayes’ Theorem, Bayesian Reasoning, and Bayesian Networks.

COMP 2208

Dr. Long Tran-Thanh, ltt08r@ecs.soton.ac.uk

University of Southampton

Bayes’ Theorem, Bayesian Reasoning, and Bayesian Networks

Page 2:

Classification

[Diagram: agent loop linking Environment, Perception, and Behaviour, with the stages: Categorize inputs, Update belief model, Update decision making policy, Decision making]

Page 3:

Reasoning

[Diagram: agent loop linking Environment, Perception, and Behaviour, with the stages: Categorize inputs, Update belief model, Update decision making policy, Decision making]

Page 4:

Reasoning

Logic / rule based

• Build up basic rules (axioms) using some form of logic

• Other rules (reasoning) can be derived from the above

• Functional or declarative programming (LISP, ML, Prolog, etc…)

Stochastic reasoning

• Frequentist (non-Bayesian)

• Bayesian

Some bridging efforts:

• E.g., Markov logic (see, e.g., Pedro Domingos)

Page 5:

The right way to do reasoning?

Debate 1: logic based vs. stochastic

• E.g., Noam Chomsky vs. Peter Norvig

Debate 2: frequentist vs. Bayesian

• Many vs. many

Today we talk about the Bayesian approach (because it is simple to understand and elegant)

Page 6:

The Bayesian way

• Bayes’ Theorem

• Bayesian belief update

• Inference in Bayesian networks

Page 7:

Some probability theory

• Space of all possible world models = area equal to 1

Page 8:

Some probability theory

• Probability of event A = fraction of worlds in which A happens

[Diagram: event A drawn as a region inside the space of all worlds]

• What does it mean that P(A) = 0.2?

• What does it mean that P(A) = 0? or = 1?

Page 9:

Some probability theory

• Probability of A not happening: P(not A) = 1 - P(A)

[Diagram: event A and its complement inside the space of all worlds]

Page 10:

Some probability theory

[Diagram: events A and B inside the space of all worlds]

Page 11:

Basic axioms in probability theory

Domain of the probability value: 0 ≤ P(A) ≤ 1

Constants: P(True) = 1, P(False) = 0

Connection of AND and OR: P(A or B) = P(A) + P(B) - P(A and B)

Page 12:

Conditional probability

[Diagram: events A and B, and B restricted to A (B|A)]

Only consider worlds in which A happens -> new space of worlds

Consider worlds in which B happens, but only within the new space

P(B|A) = fraction of worlds with B within the new space

Page 13:

Conditional probability

[Diagram: events A and B, and B restricted to A (B|A)]

Page 14:

Conditional probability

We have: P(B|A) = P(A and B) / P(A)

Chain rule: P(A and B) = P(B|A) P(A)

Law of total probability: P(B) = P(B|A) P(A) + P(B|not A) P(not A)
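The three identities named on this slide can be checked on a toy example. The sketch below is only an illustration: the ten equally likely worlds and the sets `A` and `B` are made-up values, and `prob` / `cond_prob` are hypothetical helper names.

```python
from fractions import Fraction

# Hypothetical toy space: 10 equally likely worlds, labelled 0..9.
worlds = range(10)
A = {0, 1, 2, 3}        # worlds in which A happens (assumed)
B = {2, 3, 4, 5, 6}     # worlds in which B happens (assumed)

def prob(event, space=worlds):
    """Fraction of the worlds in `space` that belong to `event`."""
    space = list(space)
    return Fraction(sum(1 for w in space if w in event), len(space))

def cond_prob(event, given):
    """P(event | given): restrict the space to the worlds in `given`."""
    return prob(event, space=given)

not_A = set(worlds) - A
p_a_and_b = prob(A & B)

# Definition of conditional probability.
assert cond_prob(B, A) == p_a_and_b / prob(A)
# Chain rule.
assert p_a_and_b == cond_prob(B, A) * prob(A)
# Law of total probability.
assert prob(B) == cond_prob(B, A) * prob(A) + cond_prob(B, not_A) * prob(not_A)

print(cond_prob(B, A))  # 1/2
```

Exact fractions are used so that the identities hold with equality rather than up to rounding.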

Page 15:

Bayes’ Theorem (Bayes’ rule)

Use chain rule twice for P(A and B):

P(A and B) = P(A|B) P(B)

P(A and B) = P(B|A) P(A)

The right hand sides must be the same!

Bayes’ rule: P(B|A) = P(A|B) P(B) / P(A)

Page 16:

The beauty of Bayes’ Theorem

A = evidence (observation); B = hypothesis

P(B|A) = P(A|B) P(B) / P(A), i.e. posterior = likelihood × prior / P(evidence)

Prior P(B): captures our prior knowledge/belief

Likelihood P(A|B): how likely we are to observe the evidence, if the hypothesis is true

P(evidence) = P(A) = probability of observing the evidence in general (aggregated over all possible hypotheses)

Posterior P(B|A): we update our belief after observing some evidence

Page 17:

Example: the Monty Hall problem

The game:

At the beginning, all doors are closed

The prize is behind 1 door (with equal probability)

You choose 1 door (say Door 1)

The host opens a door (say Door 2) which has a goat behind it

Let’s make a deal: would you swap your choice to Door 3?

Page 18:

Solution of the Monty Hall problem

Your choice: Door 1; Offer: choose Door 3 instead

What are the chances of each option for winning (getting the prize)?

A = hypothesis: prize behind Door 1

B = host chooses Door 2 to open (and we see a goat)

Bayes’ rule: P(A|B) = P(B|A) P(A) / P(B)

P(A) = 1/3; P(B|A) = 1/2

P(B) = 1/2 (why?)

P(A|B) = (1/2 × 1/3) / (1/2) = 1/3

Chance of winning the prize for staying with Door 1 = 1/3

Chance of winning the prize for switching to Door 3 = 2/3
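The 1/3 versus 2/3 split can also be checked empirically. The following is a minimal simulation sketch, not part of the original slides: `play` and `win_rate` are hypothetical names, the host is assumed to pick at random when two goat doors are available, and the simulation averages over whichever door the host opens (by symmetry this gives the same win rates as conditioning on Door 2).

```python
import random

def play(switch, rng):
    """Play one round; return True if the player wins the prize."""
    doors = [1, 2, 3]
    prize = rng.choice(doors)
    choice = 1  # as in the slides, the player always starts with Door 1
    # The host opens a goat door that is neither the player's choice nor the prize.
    host = rng.choice([d for d in doors if d != choice and d != prize])
    if switch:
        # Move to the one remaining closed door.
        choice = next(d for d in doors if d != choice and d != host)
    return choice == prize

def win_rate(switch, trials=100_000, seed=0):
    """Estimate the winning probability of a strategy by simulation."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials

stay = win_rate(switch=False)
swap = win_rate(switch=True)
print(f"stay: {stay:.3f}, switch: {swap:.3f}")
```

With 100,000 rounds the estimates should land close to 1/3 for staying and 2/3 for switching.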

Page 19:

Calculating the denominator

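Bayes’ rule: P(A|B) = P(B|A) P(A) / P(B)

A = prize behind Door 1; B = host chooses Door 2 (between Doors 2 and 3)

Use law of total probability: X = door with prize

P(B) = P(B|X=1) P(X=1) + P(B|X=2) P(X=2) + P(B|X=3) P(X=3)

X = Door 1: P(B|X=1) P(X=1) = 1/2 × 1/3 = 1/6 (the host picks between Doors 2 and 3)

X = Door 2: P(B|X=2) P(X=2) = 0 × 1/3 = 0 (the host never opens the prize door)

X = Door 3: P(B|X=3) P(X=3) = 1 × 1/3 = 1/3 (Door 2 is the only door the host can open)

P(B) = 1/6 + 0 + 1/3 = 1/2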
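The denominator and the resulting posteriors can be computed exactly. This is a small sketch under the slide's setup (player picks Door 1, host opens Door 2); the dictionary names are illustrative, and the 1/2 for X = Door 1 assumes the host chooses uniformly between the two goat doors.

```python
from fractions import Fraction

doors = (1, 2, 3)
prior = {x: Fraction(1, 3) for x in doors}  # P(X = x): prize placed uniformly

# P(B | X = x): probability that the host opens Door 2, given that the
# player picked Door 1 and the prize is behind door x.
likelihood = {
    1: Fraction(1, 2),  # host picks between Doors 2 and 3 at random (assumed)
    2: Fraction(0),     # host never reveals the prize
    3: Fraction(1),     # Door 2 is the only goat door left
}

# Law of total probability gives the denominator P(B).
p_b = sum(likelihood[x] * prior[x] for x in doors)
# Bayes' rule gives the posterior for every door at once.
posterior = {x: likelihood[x] * prior[x] / p_b for x in doors}

print(p_b)           # 1/2
print(posterior[1])  # 1/3
print(posterior[3])  # 2/3
```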

Page 20:

Example 2: the HIV test problem

HIV lab tests are quite accurate:

• 99% sensitivity: if a patient is HIV+, then probability that the test has positive results is 0.99

• 99% specificity: if a patient is HIV-, then probability that the test has negative results is also 0.99

HIV is rare in patients in our population: about 1 out of 1000 (even among those who get tested)

Situation: A patient takes an HIV test and gets a positive result

Question: what are the chances that the patient is indeed HIV+?

Page 21:

Solution for the HIV test problem

A = test was positive; B = patient is HIV+

We want to calculate P(B|A)

Prior: P(B) = 0.001 (1 out of 1000 is HIV+)

Likelihood: P(A|B) = 0.99 (99% sensitivity)

What about P(A) ?

Page 22:

Solution for the HIV test problem

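Calculation of P(A)

Use law of total probability: P(A) = P(A|B) P(B) + P(A|not B) P(not B)

Term 1: P(A|B) P(B) = 0.99 × 0.001 = 0.00099

Term 2: P(A|not B) P(not B) = 0.01 × 0.999 = 0.00999

P(A) = 0.00099 + 0.00999 = 0.01098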

Page 23:

Solution for the HIV test problem: calculating P(B|A)

P(B) = 0.001; P(A|B) = 0.99; P(A) = 0.00099 + 0.00999 = 0.01098

P(B|A) = P(A|B) P(B) / P(A) = 0.00099 / 0.01098 ≈ 0.09

This means that even if the test is positive, there is only about a 9% chance that the patient is HIV+
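The calculation is short enough to script, which also makes it easy to see how strongly the answer depends on the prior. A minimal sketch, with `posterior_positive` as a hypothetical helper name and the three inputs taken from the problem statement:

```python
sensitivity = 0.99   # P(test+ | HIV+)
specificity = 0.99   # P(test- | HIV-)
prevalence = 0.001   # P(HIV+), the prior

def posterior_positive(sensitivity, specificity, prevalence):
    """Return (P(HIV+ | test+), P(test+)) using Bayes' rule."""
    true_pos = sensitivity * prevalence                # P(A|B) P(B)
    false_pos = (1 - specificity) * (1 - prevalence)   # P(A|not B) P(not B)
    p_positive = true_pos + false_pos                  # law of total probability
    return true_pos / p_positive, p_positive

post, p_a = posterior_positive(sensitivity, specificity, prevalence)
print(f"P(A) = {p_a:.5f}, P(B|A) = {post:.4f}")  # P(A) = 0.01098, P(B|A) = 0.0902
```

Calling the same function with a prevalence of 0.1 instead of 0.001 gives a posterior of about 0.92, which shows that the low base rate, not the test accuracy, drives the 9% figure.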

Page 24:

Some discussions

Only 15% of doctors get this right

Most doctors think that if an HIV test is positive, there’s a high chance that the patient is HIV+

Why?

• They typically focus on the accuracy (sensitivity) of the test

• They're neglecting the background or base rate of HIV prevalence (prior)

Page 25:

Bonus question

Russian roulette with 2 bullets

• You put 2 bullets into a revolver, such that they are next to each other

• Your opponent spins and pulls the trigger

• … and survives. Now it’s your turn!

• Question: should you spin the revolver again before pulling the trigger, or not?

• Question 2: what if there’s only 1 bullet in the revolver?
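One way to reason about both questions is to enumerate the chambers. The sketch below assumes a six-chamber revolver whose cylinder advances by one chamber per trigger pull (neither detail is stated on the slide); `death_probabilities` is a hypothetical helper.

```python
from fractions import Fraction

def death_probabilities(bullets, chambers=6):
    """Return (P(death | no spin), P(death | spin)) after one survived pull.

    `bullets` is the set of loaded chamber indices. Six chambers and a
    one-step cylinder advance are assumptions of this sketch.
    """
    # The opponent survived, so the fired chamber was one of the empty ones,
    # each equally likely after their spin.
    empty = [c for c in range(chambers) if c not in bullets]
    next_loaded = sum(1 for c in empty if (c + 1) % chambers in bullets)
    no_spin = Fraction(next_loaded, len(empty))
    spin = Fraction(len(bullets), chambers)
    return no_spin, spin

two_adjacent = death_probabilities({0, 1})
one_bullet = death_probabilities({0})
print(two_adjacent)  # no spin: 1/4, spin: 1/3
print(one_bullet)    # no spin: 1/5, spin: 1/6
```

Under these assumptions, the survived pull is evidence about where the cylinder is: with two adjacent bullets it lowers the risk of the next chamber (so not spinning is safer), while with one bullet it raises it (so spinning is safer).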

Page 26:

Belief update

London – Rome flight

Page 27:

Belief update

Candidate areas: near London, near Paris, near Monaco, near Rome

You don’t know over which area you are flying …

But you know that you’ve been flying for a while

You look out the window, and you see… land …and sea …and high mountains

[Table: after each observation, each of the four areas is rated “unlikely”, “maybe”, or “probably”]

Page 28:

Bayesian belief update

Bayes’ rule for models: P(model | observation) = P(observation | model) P(model) / P(observation)

Prior: probability that the model is true (before the observation)

Likelihood: how likely the observed event is, if the model is true

Denominator: marginal likelihood (or model evidence)

Left hand side = posterior: probability that the model is true, after we have seen the observations

Belief: probability distribution over all the possible models

• Captures our knowledge + uncertainty about the true world model

How to update our belief after each observation?

Page 29:

Bayesian belief update

Prior probability Prior distribution = prior belief

Posterior probability Posterior distribution = posterior belief
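A belief update over a finite set of models is a multiply-and-normalise step, repeated for each observation with the previous posterior as the new prior. The sketch below uses the flight example; every number in it is hypothetical (the slides only give qualitative ratings), and `update_belief` is an illustrative name.

```python
def update_belief(belief, likelihood):
    """One Bayesian update step.

    belief:     dict model -> prior probability
    likelihood: dict model -> P(observation | model)
    Returns the posterior belief as a new dict.
    """
    unnormalised = {m: likelihood[m] * p for m, p in belief.items()}
    evidence = sum(unnormalised.values())  # marginal likelihood
    if evidence == 0:
        raise ValueError("observation is impossible under every model")
    return {m: v / evidence for m, v in unnormalised.items()}

# Hypothetical prior: we have been flying for a while, so London is unlikely.
belief = {"London": 0.1, "Paris": 0.3, "Monaco": 0.3, "Rome": 0.3}

# Hypothetical likelihoods P(observation | area) for two observations.
observations = {
    "sea":       {"London": 0.3, "Paris": 0.05, "Monaco": 0.6, "Rome": 0.6},
    "mountains": {"London": 0.01, "Paris": 0.01, "Monaco": 0.7, "Rome": 0.2},
}

for name, likelihood in observations.items():
    belief = update_belief(belief, likelihood)  # posterior becomes the next prior
    print(name, {m: round(p, 3) for m, p in belief.items()})
```

With these made-up numbers the belief ends up concentrated on Monaco, but the ranking depends entirely on the assumed likelihoods.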

Page 30:

Bayesian belief update example

Search for a crashed airplane using Bayesian updating

• Imagine you're designing a search-and-rescue UAV. Its job is to autonomously look for aircraft wreckage

• It is easier to detect wreckage in some terrain types than others

Page 31:

Bayesian belief update example

• Difficulty model: the probability of finding the wreckage when searching an area

Page 32:

Bayesian belief update example

• Prior belief: based on the last known location

Page 33:

Bayesian belief update example

• Search: always go to the point with highest probability

• At the beginning:

Page 34:

Bayesian belief update example

• Search: always go to the point with highest probability

• After 10 steps:

Page 35:

Bayesian belief update example

• Search: always go to the point with highest probability

• After 50 steps:

Page 36:

Bayesian belief update example

• Search: always go to the point with highest probability

• After 250 steps:

Page 37:

Bayesian belief update example

• Search: always go to the point with highest probability

• After 500 steps:

Page 38:

Bayesian belief update example

• Search: always go to the point with highest probability

• After 1000 steps:

Page 39:

Bayesian belief update example

• Search: always go to the point with highest probability

• After 2000 steps:
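The search loop in this example can be sketched as follows. This is an assumption-laden illustration rather than the code behind the slides: the map is reduced to four cells, the numbers are made up, and `search_step` is a hypothetical name. The update applies Bayes' rule to the observation "searched cell i and found nothing".

```python
def search_step(belief, detect_prob):
    """Search the most probable cell; update the belief if nothing is found.

    belief:      list of P(wreckage in cell k), summing to 1
    detect_prob: list of P(detect | wreckage in cell k) (the difficulty model)
    Returns (searched cell index, posterior belief after a failed search).
    """
    i = max(range(len(belief)), key=lambda k: belief[k])
    p_miss = 1.0 - belief[i] * detect_prob[i]   # P(no detection in this step)
    # Cells that were not searched: likelihood of a miss is 1.
    posterior = [p / p_miss for p in belief]
    # Searched cell: likelihood of a miss is 1 - detect_prob[i].
    posterior[i] = belief[i] * (1.0 - detect_prob[i]) / p_miss
    return i, posterior

# Hypothetical 1-D map with four cells.
belief = [0.1, 0.5, 0.3, 0.1]        # prior, e.g. from the last known location
detect_prob = [0.9, 0.3, 0.8, 0.9]   # e.g. forest is harder to search than open field

for step in range(5):
    cell, belief = search_step(belief, detect_prob)
    print(step, cell, [round(p, 3) for p in belief])
```

A failed search lowers the probability of the searched cell only partially when detection there is hard, so the UAV may revisit difficult terrain several times before moving on.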

Page 40:

Complex knowledge representation

• So far we have dealt with simple correlations between probabilities

[Diagram: A and B]

• We use probabilities to capture uncertainty in our knowledge

• What if we have a much more complicated network of correlations?

Inference: derive extra information/conclusions from observed data

• How to do inference in complex networks? How to use Bayes’ rule there?

• Inference in simple systems: use Bayes’ rule

Page 41:

Inference with joint distribution

• The simplest way to do inference is to look at the joint distribution of the random variables

• Joint distribution captures all the interconnections + dependencies

An example from employment survey

• M: is the person male?

• L: does the person work long hours?

• R: is the person rich?

But before Bayesian nets …

Page 42:

Inference with joint distribution

Truth table: if all the variables are binary, the joint distribution is a truth table with one probability per row (2^3 = 8 rows for M, L, R)

Page 43:

Inference with joint distribution

P(the person is rich) = ?

P(R) = 0.13 + 0.10 + 0.01 + 0.02 = 0.26

Page 44:

Inference with joint distribution

P(L | M) = ?

P(L|M) = P(L and M) / P(M) = (0.13 + 0.11) / (0.13 + 0.11 + 0.10 + 0.34) = 0.24 / 0.68 ≈ 0.35
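Both queries above amount to summing rows of the joint table. The sketch below shows this with a dictionary keyed by (M, L, R). The four male rows and the two rich-female probabilities are taken from the sums on the slides; how those two values are assigned to L / not L, and the two remaining female rows, are assumptions chosen only so that the table sums to 1. Neither query depends on those assumed cells.

```python
# Joint distribution over (M, L, R): male, long hours, rich.
joint = {
    (True,  True,  True):  0.13,
    (True,  True,  False): 0.11,
    (True,  False, True):  0.10,
    (True,  False, False): 0.34,
    (False, True,  True):  0.01,   # assumed placement
    (False, False, True):  0.02,   # assumed placement
    (False, True,  False): 0.04,   # assumed value
    (False, False, False): 0.25,   # assumed value
}

def prob(event):
    """P(event): sum the joint over all rows where event(m, l, r) is true."""
    return sum(p for (m, l, r), p in joint.items() if event(m, l, r))

def cond_prob(event, given):
    """P(event | given) = P(event and given) / P(given)."""
    return prob(lambda m, l, r: event(m, l, r) and given(m, l, r)) / prob(given)

p_rich = prob(lambda m, l, r: r)
p_long_given_male = cond_prob(lambda m, l, r: l, lambda m, l, r: m)
print(round(p_rich, 2), round(p_long_given_male, 2))  # 0.26 0.35
```

Any other query over M, L and R can be answered the same way by passing a different predicate.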

Page 45:

Inference with joint distribution

• We can do any inference from the joint distribution

• Issues: doesn’t scale well in practice (brute force solution)

• E.g., with 30 binary variables, we need 2^30 probabilities (about 1 billion)

• In theory: doesn’t show the relationships

• We might want to exploit the structure of relationships to simplify the calculations

• E.g., if R (rich) is independent of M (male) and L (working long hours), then we can drop M and L when we do inference about R

Page 46:

Independence

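Definition: Two random variables are independent if their joint probability is the product of their probabilities: P(A and B) = P(A) P(B)

Similarly: P(not A and B) = P(not A) P(B), and likewise for the other combinations

Another property: If P(A), P(B) > 0, then P(A|B) = P(A) and P(B|A) = P(B)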
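The product definition can be tested directly on a small sample space. The example below is a hypothetical one (two fair coin flips), not taken from the slides; `independent` simply checks P(A and B) = P(A) P(B) with exact fractions.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example: two fair coin flips, four equally likely outcomes.
outcomes = list(product("HT", repeat=2))
p = Fraction(1, len(outcomes))

def prob(event):
    """P(event) as the total weight of the outcomes satisfying it."""
    return sum(p for o in outcomes if event(o))

def independent(a, b):
    """True if P(A and B) == P(A) P(B)."""
    return prob(lambda o: a(o) and b(o)) == prob(a) * prob(b)

first_heads = lambda o: o[0] == "H"
second_heads = lambda o: o[1] == "H"
both_same = lambda o: o[0] == o[1]
at_least_one_head = lambda o: "H" in o

print(independent(first_heads, second_heads))       # True
print(independent(first_heads, both_same))          # True
print(independent(first_heads, at_least_one_head))  # False
```

The second case shows that independence is a property of the numbers, not of whether the events "look" related.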

Page 47:

Bayesian networks

Use graphical representation to capture the dependencies between the random variables

Nodes: S = studied for the exam; M = lecturer is in good mood; H = high exam result (cX denotes “not X”)

Edges: S → H and M → H

P(M) = 0.3, P(cM) = 0.7

P(S) = 0.8, P(cS) = 0.2

P(H|M, S) = 0.9

P(H|cM, S) = 0.4

P(H|M, cS) = 0.5

P(H|cM, cS) = 0.05

Page 48:

Bayesian networks

Inference: given high exam result, what is the probability that the lecturer was in a good mood? (P(M|H) = ?)

Studied for the exam

Lecturer is in good mood

High exam result

P(M) = 0.3, P(cM) = 0.7

P(S) = 0.8, P(cS) = 0.2

P(H|M, S) = 0.9

P(H|cM,S) = 0.4

P(H|M, cS) = 0.5

P(H|cM,cS) = 0.05

Page 49:

Bayesian networks

P(M) = 0.3, P(cM) = 0.7

P(S) = 0.8, P(cS) = 0.2

P(H|M, S) = 0.9

P(H|cM,S) = 0.4

P(H|M, cS) = 0.5

P(H|cM,cS) = 0.05

P(M| H) = ?

P(M) = 0.3

P(H|M) = P(H|M, S)P(S) + P(H|M,cS)P(cS)

= 0.9*0.8 + 0.5*0.2 = 0.72 + 0.10 = 0.82

P(H) = ?

Page 50:

Bayesian networks

P(H) = P(H|M,S)P(M and S)

+ P(H|M,cS)P(M and cS)

+ P(H|cM,S)P(cM and S)

+ P(H|cM,cS)P(cM and cS)

M and S are independent

= P(H|M,S)P(M)P(S)

+ P(H|M,cS)P(M)P(cS)

+ P(H|cM,S)P(cM)P(S)

+ P(H|cM,cS)P(cM)P(cS)

P(H) = 0.9*0.3*0.8

+ 0.5*0.3*0.2

+ 0.4*0.7*0.8

+ 0.05*0.7*0.2 = 0.216 + 0.03 + 0.224 + 0.007 = 0.477

Page 51:

Bayesian networks

P(M) = 0.3, P(cM) = 0.7

P(S) = 0.8, P(cS) = 0.2

P(H|M, S) = 0.9

P(H|cM,S) = 0.4

P(H|M, cS) = 0.5

P(H|cM,cS) = 0.05

P(M| H) = ?

P(M) = 0.3

P(H|M) = 0.9*0.8 + 0.5*0.2 = 0.82

P(H) = 0.477

P(M|H) = P(M) P(H|M) / P(H) = 0.3*0.82/0.477 = 0.246/0.477 ≈ 0.516
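The same inference can be done mechanically by enumerating the joint distribution that the network defines. The sketch below uses the conditional probability tables from the slides; the function and variable names are illustrative, and the factorisation P(M, S, H) = P(M) P(S) P(H | M, S) relies on M and S having no parents.

```python
from itertools import product

p_m = {True: 0.3, False: 0.7}   # P(M): lecturer in good mood
p_s = {True: 0.8, False: 0.2}   # P(S): studied for the exam
p_h = {                          # P(H = true | M, S), keyed by (M, S)
    (True, True): 0.9,
    (False, True): 0.4,
    (True, False): 0.5,
    (False, False): 0.05,
}

def joint(m, s, h):
    """P(M=m, S=s, H=h) = P(m) P(s) P(h | m, s)."""
    ph = p_h[(m, s)]
    return p_m[m] * p_s[s] * (ph if h else 1.0 - ph)

# Marginalise over the variables we do not observe.
p_high = sum(joint(m, s, True) for m, s in product([True, False], repeat=2))
p_mood_and_high = sum(joint(True, s, True) for s in [True, False])
p_high_given_mood = p_mood_and_high / p_m[True]
p_mood_given_high = p_mood_and_high / p_high

print(round(p_high_given_mood, 3))   # 0.82
print(round(p_high, 3))              # 0.477
print(round(p_mood_given_high, 3))   # 0.516
```

Enumeration is exponential in the number of variables, so it is only practical for small networks like this one.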

Page 52:

Building Bayesian networks

With domain expert:

• Bayesian nets are sometimes built manually, consulting domain experts for structure and probabilities.

• More often, the structure is supplied by domain experts (i.e., they specify what affects what) but the probabilities are learned from data.

Building from data:

• Sometimes both structure and probabilities are learned from data.

• Difficult problem: puts the AI program in a similar position to a scientist trying out different hypotheses.

• Need a method to reward the proposed net structure for matching the data, but to penalize excessive complexity (Occam’s razor).

Page 53:

Properties of Bayesian networks

• Bayesian networks must be directed acyclic graphs.

• The main efficiency gain of a Bayesian network is memory: it stores far fewer probabilities than the full joint distribution.

• They are also easier for human beings to interpret than the raw joint distribution.
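The memory saving can be made concrete by counting the independent probabilities each representation needs when all variables are binary. This is a rough sketch: `joint_table_size` and `network_size` are hypothetical helpers, and the 30-variable chain is a made-up structure used only to show the scaling.

```python
def joint_table_size(n_vars):
    """Independent probabilities needed for a full joint over n binary variables."""
    return 2 ** n_vars - 1

def network_size(parents):
    """Independent probabilities needed by a Bayesian network of binary variables.

    parents: dict node -> list of parent nodes. Each node needs one number
    per combination of its parents' values.
    """
    return sum(2 ** len(ps) for ps in parents.values())

# The exam network from the slides: M and S are parents of H.
exam_net = {"M": [], "S": [], "H": ["M", "S"]}
print(joint_table_size(3), network_size(exam_net))   # 7 6

# Hypothetical chain of 30 binary variables, each depending on the previous one.
chain = {i: ([i - 1] if i > 0 else []) for i in range(30)}
print(joint_table_size(30), network_size(chain))     # 1073741823 59
```

The saving is small for three variables but grows quickly: the network size depends on the number of parents per node, not on the total number of variables.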