
Probabilistic Inference

Reading: Chapter 13

Next time: How should we define artificial intelligence?
Reading for next time (see Links, Reading for Retrospective Class): the Turing paper; Mind, Brain and Behavior, John Searle
Prepare discussion points by midnight Wednesday night (see end of slides)

Transition to empirical AI

Add in:
Ability to infer new facts from old
Ability to generalize
Ability to learn based on past observation

Key: observation of the world; the best decision given what is known

Overview of Probabilistic Inference

Some terminology

Inference by enumeration

Bayesian Networks

Probability Basics

Sample space Ω: the set of all possible sample points (possible worlds)

Atomic event (sample point) ω: a complete specification of the state of the world

Probability model: an assignment P(ω) to every sample point, with 0 ≤ P(ω) ≤ 1 and Σ_ω P(ω) = 1

An event A: any set of sample points; P(A) = Σ_{ω∈A} P(ω)
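
As a concrete illustration of these terms (a fair six-sided die, not an example from the slides), a probability model can be coded as a map from sample points to probabilities, with an event as a set of sample points:

```python
# A minimal sketch of the terminology above, using a hypothetical fair
# six-sided die as the sample space (this example is not from the slides).

# Probability model: every sample point (atomic event) gets a probability,
# and the probabilities sum to 1.
sample_space = {1: 1/6, 2: 1/6, 3: 1/6, 4: 1/6, 5: 1/6, 6: 1/6}

def prob(event):
    """P(A) = sum of P(omega) over the sample points omega in event A."""
    return sum(p for omega, p in sample_space.items() if omega in event)

even = {2, 4, 6}              # an event A: a set of sample points
print(prob(even))             # ≈ 0.5
print(prob(sample_space))     # ≈ 1.0 (the whole sample space is certain)
```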

Random Variables

Random variable: a function from sample points to some range of values (e.g., Boolean, or the reals)

Probability for a random variable: P induces a distribution over its values, P(X = x) = Σ_{ω: X(ω)=x} P(ω)
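
Continuing the hypothetical die example from the sketch above, a random variable is just a function of the sample point, and its distribution is obtained by summing sample-point probabilities:

```python
# A random variable is a function from sample points to values; its
# distribution is induced by the underlying probability model.
# (Same hypothetical die sample space as in the previous sketch.)
sample_space = {1: 1/6, 2: 1/6, 3: 1/6, 4: 1/6, 5: 1/6, 6: 1/6}

def is_even(omega):
    """A Boolean random variable: Even(omega)."""
    return omega % 2 == 0

def distribution(random_variable):
    """P(X = x) = sum of P(omega) over sample points where X(omega) = x."""
    dist = {}
    for omega, p in sample_space.items():
        value = random_variable(omega)
        dist[value] = dist.get(value, 0.0) + p
    return dist

print(distribution(is_even))   # ≈ {False: 0.5, True: 0.5}
```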

Logical Propositions and Probability

Proposition = event (set of sample points)

Given Boolean random variables A and B:
Event a = set of sample points where A(ω) = true
Event ¬a = set of sample points where A(ω) = false
Event a∧b = set of sample points where A(ω) = true and B(ω) = true

Often the sample space is the Cartesian product of the ranges of the variables

Proposition = disjunction of the atomic events in which it is true:
(a∨b) = (¬a∧b) ∨ (a∧¬b) ∨ (a∧b)

P(a∨b) = P(¬a∧b) + P(a∧¬b) + P(a∧b)
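
A small sketch of this idea, with a made-up joint over the four atomic events of two Boolean variables (the numbers are hypothetical, not from the slides):

```python
# Hypothetical joint probabilities over the four atomic events of two
# Boolean random variables A and B (values chosen only to sum to 1).
atomic = {
    (True,  True):  0.30,   # a ∧ b
    (True,  False): 0.20,   # a ∧ ¬b
    (False, True):  0.40,   # ¬a ∧ b
    (False, False): 0.10,   # ¬a ∧ ¬b
}

# A proposition is the disjunction of the atomic events where it is true,
# so its probability is the sum of their probabilities.
p_a_or_b = atomic[(False, True)] + atomic[(True, False)] + atomic[(True, True)]

# The same number falls out of the axioms: P(a∨b) = P(a) + P(b) - P(a∧b).
p_a = atomic[(True, True)] + atomic[(True, False)]
p_b = atomic[(True, True)] + atomic[(False, True)]
assert abs(p_a_or_b - (p_a + p_b - atomic[(True, True)])) < 1e-9

print(p_a_or_b)   # ≈ 0.9
```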

Axioms of Probability

All probabilities are between 0 and 1

Necessarily true propositions have probability 1. Necessarily false propositions have probability 0

The probability of a disjunction is P(a∨b) = P(a) + P(b) - P(a∧b)

P(¬a) = 1 - P(a)

The definitions imply that certain logically related events must have related probabilities:
P(a∨b) = P(a) + P(b) - P(a∧b)

Prior Probability

Prior or unconditional probabilities of propositions: P(female=true) = .5 corresponds to belief prior to the arrival of any new evidence

Probability distribution gives values for all possible assignments:
P(Color) is defined over (color=green, color=blue, color=purple)
P(Color) = <.6, .3, .1> (normalized: sums to 1)

Joint probability distribution for a set of random variables gives the probability of every atomic event on those random variables (i.e., every sample point): P(Color, Gender) = a 3×2 matrix of values
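
A sketch of such a joint as a 3×2 table. The individual cell values below are hypothetical (the slides only give the priors P(Color) = <.6, .3, .1> and P(female) = .5); they are simply chosen so the marginals match those priors:

```python
# A hypothetical joint distribution P(Color, Gender) as a 3x2 table whose
# marginals match the priors quoted on the slide.
joint = {
    ("green",  "female"): 0.30, ("green",  "male"): 0.30,
    ("blue",   "female"): 0.15, ("blue",   "male"): 0.15,
    ("purple", "female"): 0.05, ("purple", "male"): 0.05,
}

def marginal(variable_index, value):
    """Sum the joint over all atomic events matching one assignment."""
    return sum(p for event, p in joint.items() if event[variable_index] == value)

print(marginal(0, "green"))    # ≈ 0.6  = P(color=green)
print(marginal(1, "female"))   # ≈ 0.5  = P(female)
print(sum(joint.values()))     # ≈ 1.0  (a valid joint sums to 1)
```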

Inference by enumeration

Start with the joint distribution

Inference by enumeration

P(HasTeeth) = .06 + .12 + .02 = .2 (sum the joint entries in which HasTeeth is true)

Inference by enumeration

P(HasTeeth ∨ Color=green) = .06 + .12 + .02 + .24 = .44
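
A sketch of these two enumeration queries in code. The four joint entries the slides use are shown below; the two remaining entries are not given on the slides and are filled with hypothetical values that make the table sum to 1 (they do not affect either query):

```python
# Inference by enumeration over a joint P(HasTeeth, Color).
joint = {
    (True,  "green"):  0.06,
    (True,  "blue"):   0.12,
    (True,  "purple"): 0.02,
    (False, "green"):  0.24,
    (False, "blue"):   0.36,   # hypothetical: not given on the slides
    (False, "purple"): 0.20,   # hypothetical: not given on the slides
}

def prob(proposition):
    """Sum the probabilities of the atomic events where the proposition holds."""
    return sum(p for event, p in joint.items() if proposition(*event))

print(prob(lambda teeth, color: teeth))                      # ≈ 0.2
print(prob(lambda teeth, color: teeth or color == "green"))  # ≈ 0.44
```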

Conditional Probability

Conditional or posterior probabilities, e.g.:
P(PlayerWins | HostOpenDoor=1 and PlayerPickDoor2 and Door1=goat) = .5

If we know more (e.g., HostOpenDoor=3 and Door3=goat): P(PlayerWins) = 1
Note: the less specific belief remains valid after more evidence arrives, but is not always useful

New evidence may be irrelevant, allowing simplification:
P(PlayerWins | CaliforniaEarthquake) = P(PlayerWins) = .3
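
A quick simulation sketch of the conditional probability above. It assumes the host opens one of the two unpicked doors uniformly at random (an assumption consistent with the .5 on the slide) and that the player stays with the originally picked door:

```python
# Estimate P(PlayerWins | host opened a door with a goat) by simulation.
# Assumption (not stated on the slide): the host opens one of the two
# unpicked doors at random; the player always keeps the original pick.
import random

def trial():
    car = random.randint(1, 3)
    pick = 2                                   # player picks door 2
    host = random.choice([d for d in (1, 2, 3) if d != pick])
    goat_shown = (host != car)                 # the opened door had a goat
    return goat_shown, (pick == car)

wins = opened_goat = 0
for _ in range(100_000):
    goat_shown, player_wins = trial()
    if goat_shown:                             # condition on the evidence
        opened_goat += 1
        wins += player_wins

print(wins / opened_goat)                      # ≈ 0.5
```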

Conditional Probability

Definition: P(a | b) = P(a ∧ b) / P(b) if P(b) > 0
Product rule: P(a ∧ b) = P(a | b) P(b) = P(b | a) P(a)

A general version holds for joint distributions:

P(PlayerWins, HostOpensDoor1) = P(PlayerWins | HostOpensDoor1) × P(HostOpensDoor1)
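
As a numerical check of the product rule, one can reuse the creature joint from the enumeration slides a bit later in the deck, where P(¬HasTeeth, color=green) = 0.24, P(¬HasTeeth | color=green) = 0.8, and P(color=green) = 0.06 + 0.24 = 0.3:

```python
# Product rule check with numbers taken from the later enumeration slides:
# P(¬HasTeeth, green) should equal P(¬HasTeeth | green) * P(green).
p_green = 0.06 + 0.24            # P(color=green), from the joint table
p_noteeth_given_green = 0.8      # P(¬HasTeeth | color=green)
p_noteeth_and_green = 0.24       # P(¬HasTeeth, color=green)

print(abs(p_noteeth_given_green * p_green - p_noteeth_and_green) < 1e-9)  # True
```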

Inference by enumeration

Compute conditional probabilities:
P(¬HasTeeth | color=green) = P(¬HasTeeth ∧ color=green) / P(color=green) = 0.24 / (0.06 + 0.24) = 0.8

Normalization

The denominator can be viewed as a normalization constant α:
P(HasTeeth | color=green) = α P(HasTeeth, color=green)
= α [P(HasTeeth, color=green, female) + P(HasTeeth, color=green, ¬female)]
= α [<0.03, 0.12> + <0.03, 0.12>]
= α <0.06, 0.24>
= <0.2, 0.8>

Compute distribution on query variable by fixing evidence variables and summing over hidden variables
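
A minimal sketch of this procedure in code, using the color=green entries of the toy joint from the slides (gender is the hidden variable that gets summed out; α is the normalization constant):

```python
# Inference by enumeration with normalization over the slides' toy joint.
joint_green = {
    # (has_teeth, female): P(HasTeeth=has_teeth, Color=green, Gender)
    (True,  True):  0.03,
    (True,  False): 0.03,
    (False, True):  0.12,
    (False, False): 0.12,
}

unnormalized = []
for has_teeth in (True, False):
    # Fix the query value, sum over the hidden variable (gender).
    unnormalized.append(sum(joint_green[(has_teeth, female)]
                            for female in (True, False)))

alpha = 1.0 / sum(unnormalized)
print([alpha * p for p in unnormalized])   # ≈ [0.2, 0.8] = P(HasTeeth | color=green)
```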

Independence

A and B are independent iff P(A|B) = P(A) or P(B|A) = P(B) or P(A,B) = P(A)P(B)

32 entries reduced to 12; for n independent biased coins, 2^n -> n

Absolute independence is powerful but rare
Any sizable domain has hundreds of variables, none of which are independent
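
A quick sketch of checking independence numerically. The joint below is hypothetical and is constructed as a product of marginals, so the check passes by design:

```python
# Independence check: A and B are independent iff P(A,B) = P(A)P(B) for all
# value pairs.  The joint is hypothetical and built as a product of marginals,
# so the test succeeds by construction.
p_a = {True: 0.2, False: 0.8}
p_b = {True: 0.3, False: 0.7}
joint = {(a, b): p_a[a] * p_b[b] for a in p_a for b in p_b}

independent = all(
    abs(joint[(a, b)] - p_a[a] * p_b[b]) < 1e-9
    for a in p_a for b in p_b
)
print(independent)   # True

# With independence, n Boolean variables need n numbers instead of 2**n.
```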

Conditional Independence

If my length is <= .2, the probability that I am female doesn't depend on whether or not I have teeth:
P(female | length<=.2, hasteeth) = P(female | length<=.2)

The same independence holds if my length is > .2:
P(male | length>.2, hasteeth) = P(male | length>.2)

Gender is conditionally independent of HasTeeth given Length

In most cases, the use of conditional independence reduces the size of the representation of the joint distribution from exponential in n to linear in n

Conditional independence is our most basic and robust form of knowledge about uncertain environments
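
A sketch of checking conditional independence numerically, with a hypothetical joint P(Gender, HasTeeth, Length) built so that Gender is independent of HasTeeth given Length (the numbers are not from the slides):

```python
# Conditional independence sketch: Gender ⊥ HasTeeth | Length.
# The joint is hypothetical and built as P(g,t,l) = P(l) * P(g|l) * P(t|l),
# so the conditional-independence check succeeds by construction.
p_len   = {"short": 0.4, "long": 0.6}                    # "short" ~ length <= .2
p_g_len = {"short": {"female": 0.7, "male": 0.3},
           "long":  {"female": 0.4, "male": 0.6}}
p_t_len = {"short": {True: 0.2, False: 0.8},
           "long":  {True: 0.9, False: 0.1}}

joint = {(g, t, l): p_len[l] * p_g_len[l][g] * p_t_len[l][t]
         for l in p_len for g in ("female", "male") for t in (True, False)}

def cond(gender, length, teeth=None):
    """P(gender | length) or P(gender | length, teeth)."""
    num = den = 0.0
    for (g, t, l), p in joint.items():
        if l != length or (teeth is not None and t != teeth):
            continue
        den += p
        if g == gender:
            num += p
    return num / den

# P(female | length, hasteeth) equals P(female | length) in every case.
print(all(abs(cond("female", l, t) - cond("female", l)) < 1e-9
          for l in p_len for t in (True, False)))          # True
```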

Next Class: Turing Paper

A discussion class

Graduate students and non-degree students (anyone beyond a bachelor's): Prepare a short statement on the paper. It can be your reaction, your position, a place where you disagree, or an explication of a point.

Undergraduates: Be prepared with questions for the graduate students

All: Submit your statement or your question by midnight Wed night.

All statements and questions will be printed and distributed in class on Wednesday.