Lecture 13: Knowledge Representation and Reasoning, Part 2

Sanjeev Arora, Elad Hazan

COS 402 – Machine Learning and Artificial Intelligence, Fall 2016

(Borrows from slides of Percy Liang, Stanford U.)


Page 2:

Midterm histogram

[Figure: histogram of midterm scores. x-axis: score ranges 30-39, 40-49, 50-59, 60-69, 70-79, 80-89, 90-99; y-axis: # of students, ticks from 0 to 20.]

Mean = 78.42

Questions? Concerns? Please come talk to us. Grading mistake? Please talk to Dr. Li and Mr. Singh first.

Page 3:

Last time: rule-based system based upon logic

KB: R ∨ S ∨ C; R → C ∧ ¬S; C ⟷ ¬S; R → U; S → ¬U

Sensors give some values to variables (e.g. R = 1). Now figure out if U = 0 is possible.
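The question on this slide can be checked by brute force over all truth assignments. The sketch below is only an illustration: it assumes the KB reads R ∨ S ∨ C, R → (C ∧ ¬S), C ⟷ ¬S, R → U, S → ¬U (the grouping of the second formula is an assumption), and the helper names `implies`, `kb_holds`, and `models` are hypothetical.

```python
from itertools import product

def implies(a, b):
    """Material implication a -> b."""
    return (not a) or b

def kb_holds(r, s, c, u):
    """True if the assignment satisfies every formula in the assumed KB."""
    return all([
        r or s or c,              # R v S v C
        implies(r, c and not s),  # R -> (C ^ ~S)
        c == (not s),             # C <-> ~S
        implies(r, u),            # R -> U
        implies(s, not u),        # S -> ~U
    ])

def models(**fixed):
    """All models of the KB that agree with the sensor readings in `fixed`."""
    names = ["R", "S", "C", "U"]
    found = []
    for values in product([False, True], repeat=len(names)):
        m = dict(zip(names, values))
        if all(m[k] == v for k, v in fixed.items()) and kb_holds(**{k.lower(): v for k, v in m.items()}):
            found.append(m)
    return found

consistent = models(R=True)                      # sensor reports R = 1
u_zero_possible = any(not m["U"] for m in consistent)
print(consistent)
print("U = 0 possible given R = 1:", u_zero_possible)
```

Under these assumptions, R = 1 forces C = 1, S = 0, and U = 1, so no model has U = 0.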

Page 4:

(contd) Venn diagram view

Knowledge base: set of formulae {f1, f2, …, fn}

M(KB) = all possible models for f1 ∧ f2 ∧ … ∧ fn

[Figure: three Venn diagrams comparing M(KB) with the models of f, labeled as follows.]

• Entailment: KB ⊨ f
• KB contradicts f
• KB is consistent with f (sometimes also phrased as “contingency”)
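The three cases can be told apart by comparing model sets: entailment when every model of KB satisfies f, contradiction when none does, and contingency otherwise. Below is a minimal sketch under that reading. The function names and the example KB (the one from the previous slide, with R → (C ∧ ¬S) as an assumed grouping) are illustrative only.

```python
from itertools import product

VARS = ["R", "S", "C", "U"]

def all_assignments():
    """Yield every truth assignment to VARS as a dict."""
    for values in product([False, True], repeat=len(VARS)):
        yield dict(zip(VARS, values))

def kb(m):
    """Example KB from the previous slide (assumed grouping of formulas)."""
    return ((m["R"] or m["S"] or m["C"])
            and ((not m["R"]) or (m["C"] and not m["S"]))
            and (m["C"] == (not m["S"]))
            and ((not m["R"]) or m["U"])
            and ((not m["S"]) or (not m["U"])))

def relation(kb_formula, f):
    """Classify f against the KB by comparing M(KB) with the models of f."""
    kb_models = [m for m in all_assignments() if kb_formula(m)]
    satisfying = [m for m in kb_models if f(m)]
    if len(satisfying) == len(kb_models):
        return "entailment"      # M(KB) is a subset of M(f); vacuous if KB has no models
    if not satisfying:
        return "contradiction"   # M(KB) and M(f) are disjoint
    return "contingency"         # some models of KB satisfy f, some do not

print(relation(kb, lambda m: not (m["R"] and m["S"])))
print(relation(kb, lambda m: m["R"] and m["S"]))
print(relation(kb, lambda m: m["U"]))
```

This enumerates all 2^n assignments, so it is only practical for a handful of variables.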

Page 5:

Last time: Resolution procedure

• Decides if a given KB is satisfiable (has a model).

• Can be used to decide KB ⊨ f by adding ¬f to KB and checking if the new KB is satisfiable: KB ⊨ f exactly when the new KB is unsatisfiable. (General-purpose procedure!)

• Simple algorithm; often works in practice; known to take exponential time in the worst case. (If we had an efficient algorithm that always works, then P = NP, which is believed to be false.)
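As a rough illustration of the bullets above, here is a small propositional resolution sketch on clauses in conjunctive normal form. It is not the lecture's code. The clause encoding (strings with a `~` prefix for negation), the function names, and the CNF conversion of the earlier example KB are all assumptions made for this example.

```python
from itertools import combinations

def negate(lit):
    """Flip a literal: 'R' <-> '~R'."""
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolvents(c1, c2):
    """All clauses obtained by resolving c1 and c2 on one complementary pair."""
    out = set()
    for lit in c1:
        if negate(lit) in c2:
            out.add(frozenset((c1 - {lit}) | (c2 - {negate(lit)})))
    return out

def satisfiable(clauses):
    """Saturate under resolution; deriving the empty clause means unsatisfiable."""
    clauses = set(frozenset(c) for c in clauses)
    while True:
        new = set()
        for c1, c2 in combinations(clauses, 2):
            for r in resolvents(c1, c2):
                if not r:
                    return False       # empty clause derived
                new.add(r)
        if new <= clauses:
            return True                # nothing new: no contradiction exists
        clauses |= new

def entails(kb, negated_query):
    """KB |= f iff KB together with the clauses of ~f is unsatisfiable."""
    return not satisfiable(list(kb) + list(negated_query))

# Assumed CNF of the example KB: R v S v C; R -> (C ^ ~S); C <-> ~S; R -> U; S -> ~U
KB = [{"R", "S", "C"}, {"~R", "C"}, {"~R", "~S"},
      {"~C", "~S"}, {"S", "C"}, {"~R", "U"}, {"~S", "~U"}]

print(entails(KB + [{"R"}], [{"~U"}]))   # does KB plus the reading R = 1 entail U?
print(entails(KB, [{"~U"}]))             # does KB alone entail U?
```

The saturation loop can generate exponentially many clauses, which matches the worst-case remark in the last bullet.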

Page 6:

Today

Bayesian nets (aka “Belief nets” and “Graphical models”)

Can be seen as adding probabilities to logic. (Reasoning involves numerical calculations instead of resolution.)

Page 7:

Review: Probabilities (example)

Random variables: Sunshine S ∈ {0, 1}; Rainy R ∈ {0, 1}.

Joint distribution P(S, R):

s  r  P(S = s, R = r)
0  0  0.20
0  1  0.08
1  0  0.70
1  1  0.02

Marginal distribution P(S) (aggregate rows):

s  P(S = s)
0  0.28
1  0.72

Conditional distribution P(S | R = 1) (select rows, normalize):

s  P(S = s | R = 1)
0  0.8
1  0.2

[CS221 / Autumn 2015 / Liang]

[Figure: diagram of the outcome space split by S = 0 / S = 1 and R = 0 / R = 1, with regions such as S = 0, R = 0 and S = 1, R = 1 marked.]
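The two operations named on the slide, "aggregate rows" and "select rows, normalize", can be sketched directly on the joint table. The dictionary layout and function names below are illustrative choices, not part of the original slides.

```python
# Joint distribution P(S, R) from the slide, keyed by (s, r).
joint = {(0, 0): 0.20, (0, 1): 0.08, (1, 0): 0.70, (1, 1): 0.02}

def marginal_s(joint):
    """P(S): aggregate (sum) the rows that share the same value of s."""
    out = {0: 0.0, 1: 0.0}
    for (s, r), p in joint.items():
        out[s] += p
    return out

def conditional_s_given_r(joint, r_obs):
    """P(S | R = r_obs): select rows with r == r_obs, then normalize."""
    selected = {s: p for (s, r), p in joint.items() if r == r_obs}
    z = sum(selected.values())          # this is P(R = r_obs)
    return {s: p / z for s, p in selected.items()}

print(marginal_s(joint))                # approximately {0: 0.28, 1: 0.72}
print(conditional_s_given_r(joint, 1))  # approximately {0: 0.8, 1: 0.2}
```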

Page 8:

Review (contd): probability (general)

Random variables: X = (X1, …, Xn), partitioned into (A, B)

Joint distribution: P(X) = P(X1, …, Xn)

Marginal distribution: P(A) = Σ_b P(A, B = b)

Conditional distribution: P(A | B = b) = P(A, B = b) / P(B = b)

[CS221 / Autumn 2015 / Liang]

Page 9:

Example: Medical diagnosis

Patient with dyspnoea (shortness of breath), concerned about lung cancer.

• Other explanations: bronchitis, tuberculosis
• Risk factors: smoking, exposure to air pollution
• X-ray can show formations in lungs (but similar imaging in tuberculosis)

What is a “calculus” for estimating the chance that he has lung cancer?

Or, assuming cancer has been diagnosed, the chance that it was caused by exposure to air pollution?

[Example taken from Bayesian Artificial Intelligence, by Korb and Nicholson.]

Page 10:

Knowledge engineering: setting up decision-making tools that leverage domain knowledge

Page 11:

Last time: Knowledge engineering via propositional logic

Today: Via Bayesian nets

Main goals: (a) Compact representation. (b) Explicit modeling of probability and “causality.”

Page 12:

Step 1: Identify variables associated with the problem

[Excerpt: Bayesian Artificial Intelligence, Second Edition, p. 30]

Throughout the remainder of this section we will use the following simple medical diagnosis problem.

Example problem: Lung cancer. A patient has been suffering from shortness of breath (called dyspnoea) and visits the doctor, worried that he has lung cancer. The doctor knows that other diseases, such as tuberculosis and bronchitis, are possible causes, as well as lung cancer. She also knows that other relevant information includes whether or not the patient is a smoker (increasing the chances of cancer and bronchitis) and what sort of air pollution he has been exposed to. A positive X-ray would indicate either TB or lung cancer. [2]

2.2.1 Nodes and values

First, the knowledge engineer must identify the variables of interest. This involves answering the question: what are the nodes to represent and what values can they take, or what state can they be in? For now we will consider only nodes that take discrete values. The values should be both mutually exclusive and exhaustive, which means that the variable must take on exactly one of these values at a time. Common types of discrete nodes include:

• Boolean nodes, which represent propositions, taking the binary values true (T) and false (F). In a medical diagnosis domain, the node Cancer would represent the proposition that a patient has cancer.

• Ordered values. For example, a node Pollution might represent a patient’s pollution exposure and take the values {low, medium, high}.

• Integral values. For example, a node called Age might represent a patient’s age and have possible values from 1 to 120.

Even at this early stage, modeling choices are being made. For example, an alternative to representing a patient’s exact age might be to clump patients into different age groups, such as {baby, child, adolescent, young, middleaged, old}. The trick is to choose values that represent the domain efficiently, but with enough detail to perform the reasoning required. More on this later!

TABLE 2.1: Preliminary choices of nodes and values for the lung cancer example.

Node name  Type     Values
Pollution  Binary   {low, high}
Smoker     Boolean  {T, F}
Cancer     Boolean  {T, F}
Dyspnoea   Boolean  {T, F}
X-ray      Binary   {pos, neg}

[2] This is a modified version of the so-called “Asia” problem (Lauritzen and Spiegelhalter, 1988), given in §2.5.3.

[Slide annotations]

Need quantitative versions of facts like “Smokers are likelier to get cancer, but not all smokers get cancer.”

“Lung cancer can lead to the symptom Dyspnoea, but not all patients may exhibit it, and some patients who exhibit it may not have Lung Cancer.”

How many numbers do you need to specify a probability distribution on 5 binary variables?

2^5 − 1 = 31

Page 13:

Step 2: Identify causality structure.

[Figure: directed graph on the five nodes, with Pollution and Smoking at the top, Cancer in the middle, and X-ray and Dyspnoea at the bottom. Following the Korb and Nicholson text, the arcs are Pollution → Cancer, Smoking → Cancer, Cancer → X-ray, and Cancer → Dyspnoea.]
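To see why the structure helps with the "compact representation" goal, one can count the numbers each representation needs. The sketch below assumes the arcs Pollution → Cancer, Smoking → Cancer, Cancer → X-ray, Cancer → Dyspnoea (the structure described in the Korb and Nicholson text) and that every node is binary. The dictionary `parents` is an illustrative encoding, not notation from the slides.

```python
# Each node maps to the list of its parents in the assumed network.
parents = {
    "Pollution": [],
    "Smoking": [],
    "Cancer": ["Pollution", "Smoking"],
    "XRay": ["Cancer"],
    "Dyspnoea": ["Cancer"],
}

# Full joint table over n binary variables: 2^n entries, minus 1 because they sum to 1.
full_joint_numbers = 2 ** len(parents) - 1

# Bayes net: a binary node with k binary parents needs one number per parent setting.
bayes_net_numbers = sum(2 ** len(ps) for ps in parents.values())

print(full_joint_numbers)   # 31
print(bayes_net_numbers)    # 1 + 1 + 4 + 2 + 2 = 10
```

The gap widens quickly as variables are added, as long as each node keeps only a few parents.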

Page 14:

Step 3: Put in conditional probabilities (estimated from patient studies).

Nodes: Pollution (P), Smoking (S), Cancer (C), X-ray (X), Dyspnoea (D)


[Excerpt: Introducing Bayesian Networks, p. 31]

For our example, we will begin with the restricted set of nodes and values shown in Table 2.1. These choices already limit what can be represented in the network. For instance, there is no representation of other diseases, such as TB or bronchitis, so the system will not be able to provide the probability of the patient having them. Another limitation is a lack of differentiation, for example between a heavy or a light smoker, and again the model assumes at least some exposure to pollution. Note that all these nodes have only two values, which keeps the model simple, but in general there is no limit to the number of discrete values.

2.2.2 Structure

The structure, or topology, of the network should capture qualitative relationships between variables. In particular, two nodes should be connected directly if one affects or causes the other, with the arc indicating the direction of the effect. So, in our medical diagnosis example, we might ask what factors affect a patient’s chance of having cancer? If the answer is “Pollution and smoking,” then we should add arcs from Pollution and Smoker to Cancer. Similarly, having cancer will affect the patient’s breathing and the chances of having a positive X-ray result. So we add arcs from Cancer to Dyspnoea and XRay. The resultant structure is shown in Figure 2.1. It is important to note that this is just one possible structure for the problem; we look at alternative network structures in §2.4.3.

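[Figure: network with arcs Pollution → Cancer, Smoker → Cancer, Cancer → XRay, Cancer → Dyspnoea, annotated with the probability tables below.]

P(P=L) = 0.90

P(S=T) = 0.30

P  S  P(C=T|P,S)
H  T  0.05
H  F  0.02
L  T  0.03
L  F  0.001

C  P(X=pos|C)
T  0.90
F  0.20

C  P(D=T|C)
T  0.65
F  0.30

FIGURE 2.1: A BN for the lung cancer problem.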
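With the tables of Figure 2.1 in hand, the diagnosis questions from earlier can be answered by summing the factored joint over all 32 assignments. The sketch below is one simple way to do that (inference by enumeration), not necessarily the method the lecture goes on to present. The assignment of the figure's numbers to table rows is a reconstruction from the garbled extraction, and the names `joint` and `prob_cancer` are hypothetical.

```python
from itertools import product

# Conditional probability tables as read from Figure 2.1 (row assignment assumed).
P_POLLUTION_LOW = 0.90
P_SMOKER = 0.30
P_CANCER = {("H", True): 0.05, ("H", False): 0.02,
            ("L", True): 0.03, ("L", False): 0.001}   # P(C=T | P, S)
P_XRAY_POS = {True: 0.90, False: 0.20}                # P(X=pos | C)
P_DYSPNOEA = {True: 0.65, False: 0.30}                # P(D=T | C)

def bern(p_true, value):
    """Probability that a binary variable with P(true) = p_true takes `value`."""
    return p_true if value else 1.0 - p_true

def joint(p, s, c, x, d):
    """Joint probability as the product of each node's table given its parents."""
    return ((P_POLLUTION_LOW if p == "L" else 1.0 - P_POLLUTION_LOW)
            * bern(P_SMOKER, s)
            * bern(P_CANCER[(p, s)], c)
            * bern(P_XRAY_POS[c], x)
            * bern(P_DYSPNOEA[c], d))

def prob_cancer(**evidence):
    """P(C = T | evidence), where evidence uses keys P, S, X, D."""
    num = den = 0.0
    bools = [True, False]
    for p, s, c, x, d in product("LH", bools, bools, bools, bools):
        world = {"P": p, "S": s, "C": c, "X": x, "D": d}
        if any(world[k] != v for k, v in evidence.items()):
            continue                    # inconsistent with the evidence
        w = joint(p, s, c, x, d)
        den += w
        if c:
            num += w
    return num / den

print(prob_cancer())                    # prior P(C=T), roughly 0.0116
print(prob_cancer(D=True))              # after observing dyspnoea, roughly 0.025
print(prob_cancer(D=True, S=True))      # dyspnoea in a smoker
```

Enumeration touches every assignment, so it scales exponentially with the number of variables; it is shown here only because the network is tiny.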

Structure terminology and layout

In talking about network structure it is useful to employ a family metaphor: a node is a parent of a child, if there is an arc from the former to the latter. Extending the


Page 15:

NB: Bayes net has no directed cycle (why?)

Nodes: Pollution (P), Smoking (S), Cancer (C), X-ray (X), Dyspnoea (D)
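One standard answer to the "why?" is that the joint distribution is built as a product of each node's table given its parents, which relies on an ordering of the nodes in which parents come before children; a directed cycle admits no such ordering. The sketch below checks for one using Kahn's topological sort. The function name and the dictionary encoding of the network are illustrative assumptions.

```python
from collections import deque

def topological_order(parents):
    """Return the nodes with parents before children, or None if there is a directed cycle."""
    indegree = {node: len(ps) for node, ps in parents.items()}
    children = {node: [] for node in parents}
    for node, ps in parents.items():
        for p in ps:
            children[p].append(node)
    queue = deque(n for n, d in indegree.items() if d == 0)   # nodes with no parents
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for ch in children[n]:
            indegree[ch] -= 1
            if indegree[ch] == 0:       # all parents of ch are already placed
                queue.append(ch)
    return order if len(order) == len(parents) else None

LUNG_NET = {
    "Pollution": [],
    "Smoking": [],
    "Cancer": ["Pollution", "Smoking"],
    "XRay": ["Cancer"],
    "Dyspnoea": ["Cancer"],
}

print(topological_order(LUNG_NET))                   # a valid order exists
print(topological_order({"A": ["B"], "B": ["A"]}))   # None: A and B form a cycle
```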

30 Bayesian Artificial Intelligence, Second Edition

Throughout the remainder of this section we will use the following simple medi-cal diagnosis problem.Example problem: Lung cancer. A patient has been suffering from shortness ofbreath (called dyspnoea) and visits the doctor, worried that he has lung cancer. Thedoctor knows that other diseases, such as tuberculosis and bronchitis, are possiblecauses, as well as lung cancer. She also knows that other relevant information in-cludes whether or not the patient is a smoker (increasing the chances of cancer andbronchitis) and what sort of air pollution he has been exposed to. A positive X-raywould indicate either TB or lung cancer.2

2.2.1 Nodes and valuesFirst, the knowledge engineer must identify the variables of interest. This involvesanswering the question: what are the nodes to represent and what values can theytake, or what state can they be in? For now we will consider only nodes that take dis-crete values. The values should be both mutually exclusive and exhaustive, whichmeans that the variable must take on exactly one of these values at a time. Commontypes of discrete nodes include:

• Boolean nodes, which represent propositions, taking the binary values true (T )and false (F). In a medical diagnosis domain, the node Cancer would representthe proposition that a patient has cancer.

• Ordered values. For example, a node Pollution might represent a patient’s pol-lution exposure and take the values {low, medium, high}.

• Integral values. For example, a node called Age might represent a patient’s ageand have possible values from 1 to 120.

Even at this early stage, modeling choices are being made. For example, an alter-native to representing a patient’s exact age might be to clump patients into differentage groups, such as {baby, child, adolescent, young, middleaged, old}. The trick is tochoose values that represent the domain efficiently, but with enough detail to performthe reasoning required. More on this later!

TABLE 2.1Preliminary choices of nodes andvalues for the lung cancer example.

Node name Type ValuesPollution Binary {low, high}Smoker Boolean {T, F}Cancer Boolean {T, F}Dyspnoea Boolean {T, F}X-ray Binary {pos, neg}

2This is a modified version of the so-called “Asia” problem Lauritzen and Spiegelhalter, 1988, givenin §2.5.3.

Introducing Bayesian Networks 31

For our example, we will begin with the restricted set of nodes and values shownin Table 2.1. These choices already limit what can be represented in the network. Forinstance, there is no representation of other diseases, such as TB or bronchitis, so thesystem will not be able to provide the probability of the patient having them. Anotherlimitation is a lack of differentiation, for example between a heavy or a light smoker,and again the model assumes at least some exposure to pollution. Note that all thesenodes have only two values, which keeps the model simple, but in general there is nolimit to the number of discrete values.

2.2.2 StructureThe structure, or topology, of the network should capture qualitative relationshipsbetween variables. In particular, two nodes should be connected directly if one af-fects or causes the other, with the arc indicating the direction of the effect. So, in ourmedical diagnosis example, we might ask what factors affect a patient’s chance ofhaving cancer? If the answer is “Pollution and smoking,” then we should add arcsfrom Pollution and Smoker to Cancer. Similarly, having cancer will affect the pa-tient’s breathing and the chances of having a positive X-ray result. So we add arcsfrom Cancer to Dyspnoea and XRay. The resultant structure is shown in Figure 2.1.It is important to note that this is just one possible structure for the problem; we lookat alternative network structures in §2.4.3.

P(X=pos|C)

S P(C=T|P,S)

0.05

0.030.02

0.90

0.20

T

FT

F 0.001

Cancer

Pollution Smoker

XRay Dyspnoea

0.90

P(P=L)

C P(D=T|C)

F 0.30

P

L

L

H

P(S=T)

H

C

TF

T 0.65

0.30

FIGURE 2.1: A BN for the lung cancer problem.

Structure terminology and layout

In talking about network structure it is useful to employ a family metaphor: a nodeis a parent of a child, if there is an arc from the former to the latter. Extending the

Introducing Bayesian Networks 31

For our example, we will begin with the restricted set of nodes and values shownin Table 2.1. These choices already limit what can be represented in the network. Forinstance, there is no representation of other diseases, such as TB or bronchitis, so thesystem will not be able to provide the probability of the patient having them. Anotherlimitation is a lack of differentiation, for example between a heavy or a light smoker,and again the model assumes at least some exposure to pollution. Note that all thesenodes have only two values, which keeps the model simple, but in general there is nolimit to the number of discrete values.

2.2.2 StructureThe structure, or topology, of the network should capture qualitative relationshipsbetween variables. In particular, two nodes should be connected directly if one af-fects or causes the other, with the arc indicating the direction of the effect. So, in ourmedical diagnosis example, we might ask what factors affect a patient’s chance ofhaving cancer? If the answer is “Pollution and smoking,” then we should add arcsfrom Pollution and Smoker to Cancer. Similarly, having cancer will affect the pa-tient’s breathing and the chances of having a positive X-ray result. So we add arcsfrom Cancer to Dyspnoea and XRay. The resultant structure is shown in Figure 2.1.It is important to note that this is just one possible structure for the problem; we lookat alternative network structures in §2.4.3.

P(X=pos|C)

S P(C=T|P,S)

0.05

0.030.02

0.90

0.20

T

FT

F 0.001

Cancer

Pollution Smoker

XRay Dyspnoea

0.90

P(P=L)

C P(D=T|C)

F 0.30

P

L

L

H

P(S=T)

H

C

TF

T 0.65

0.30

FIGURE 2.1: A BN for the lung cancer problem.

Structure terminology and layout

In talking about network structure it is useful to employ a family metaphor: a nodeis a parent of a child, if there is an arc from the former to the latter. Extending the

Introducing Bayesian Networks 31

For our example, we will begin with the restricted set of nodes and values shownin Table 2.1. These choices already limit what can be represented in the network. Forinstance, there is no representation of other diseases, such as TB or bronchitis, so thesystem will not be able to provide the probability of the patient having them. Anotherlimitation is a lack of differentiation, for example between a heavy or a light smoker,and again the model assumes at least some exposure to pollution. Note that all thesenodes have only two values, which keeps the model simple, but in general there is nolimit to the number of discrete values.

2.2.2 StructureThe structure, or topology, of the network should capture qualitative relationshipsbetween variables. In particular, two nodes should be connected directly if one af-fects or causes the other, with the arc indicating the direction of the effect. So, in ourmedical diagnosis example, we might ask what factors affect a patient’s chance ofhaving cancer? If the answer is “Pollution and smoking,” then we should add arcsfrom Pollution and Smoker to Cancer. Similarly, having cancer will affect the pa-tient’s breathing and the chances of having a positive X-ray result. So we add arcsfrom Cancer to Dyspnoea and XRay. The resultant structure is shown in Figure 2.1.It is important to note that this is just one possible structure for the problem; we lookat alternative network structures in §2.4.3.

P(X=pos|C)

S P(C=T|P,S)

0.05

0.030.02

0.90

0.20

T

FT

F 0.001

Cancer

Pollution Smoker

XRay Dyspnoea

0.90

P(P=L)

C P(D=T|C)

F 0.30

P

L

L

H

P(S=T)

H

C

TF

T 0.65

0.30

FIGURE 2.1: A BN for the lung cancer problem.

Structure terminology and layout

In talking about network structure it is useful to employ a family metaphor: a nodeis a parent of a child, if there is an arc from the former to the latter. Extending the


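“Roots”: nodes with no parents (here Pollution and Smoker).

“Leaves”: nodes with no children (here XRay and Dyspnoea).

Roots are all independent of each other.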


Bayesian Net: Formal Definition

Definition: Bayesian network

Let X = (X_1, ..., X_n) be random variables.

A Bayesian network is a directed acyclic graph (DAG) that specifies a joint distribution over X as a product of local conditional distributions, one for each node:

$$P(X_1 = x_1, \ldots, X_n = x_n) = \prod_{i=1}^{n} p(x_i \mid x_{\mathrm{Parents}(i)})$$

[whiteboard: example L,B,E,A,R]


Example revisited

Pollution (P)

Smoking (S)

Cancer (C)

X-ray (X)

Dyspnoea (D)


Pr[X, D, C, P, S]

= Pr[X | C] Pr[D | C] Pr[C | P, S] Pr[P] Pr[S]

Note: This is a distribution on 5 boolean variables, specified using only 10 numbers (instead of the trivial 2^5 − 1 = 31 numbers).
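A minimal sketch of this factorization, with one local conditional per node (X, D, C, P, S). The CPT values are the ones that appear in Figure 2.1; the encoding (booleans for T/pos, the strings "L"/"H" for pollution) and the helper names `bern` and `joint` are assumptions made for this illustration only.

```python
from itertools import product

# CPT entries as read from Figure 2.1 (10 numbers in total).
P_LOW = 0.90                      # Pr[P = L]
P_SMOKER = 0.30                   # Pr[S = T]
P_CANCER = {                      # Pr[C = T | P, S], keyed by (pollution, smoker)
    ("H", True): 0.05,
    ("H", False): 0.02,
    ("L", True): 0.03,
    ("L", False): 0.001,
}
P_XRAY = {True: 0.90, False: 0.20}   # Pr[X = pos | C]
P_DYSP = {True: 0.65, False: 0.30}   # Pr[D = T | C]


def bern(p_true, value):
    """Probability that a boolean variable with Pr[True] = p_true takes `value`."""
    return p_true if value else 1.0 - p_true


def joint(x, d, c, p, s):
    """Pr[X=x, D=d, C=c, P=p, S=s] as the product of the five local conditionals."""
    pr_p = P_LOW if p == "L" else 1.0 - P_LOW
    return (bern(P_XRAY[c], x) * bern(P_DYSP[c], d)
            * bern(P_CANCER[(p, s)], c) * pr_p * bern(P_SMOKER, s))


# Number of stored parameters: 1 + 1 + 4 + 2 + 2.
n_params = 1 + 1 + len(P_CANCER) + len(P_XRAY) + len(P_DYSP)

# Summing the joint over all 2^5 assignments should give 1 (up to rounding).
total = sum(joint(x, d, c, p, s)
            for x, d, c, s in product([True, False], repeat=4)
            for p in ("L", "H"))
print(n_params, round(total, 10))
```

The full joint table would need 31 free entries; here every one of the 32 joint probabilities is recomputed on demand from the 10 stored numbers.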


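Conditional probability table

[Diagram: nodes A1, A2, A3, ..., Ak, each with an arc into B]

• B depends upon all other variables only through A1, ..., Ak

• CPD at B gives Prob. of B = 1 or B = 0 conditioned on all 2^k combinations of A1, ..., Ak

• Note: The probabilities in the CPD do not sum to 1 over the whole table; they sum to 1 within each row, that is, for each fixed combination of A1, ..., Ak.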
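One way to picture such a table in code is a dictionary with one entry per parent assignment. This is only a sketch: the choice k = 3, the numbers in `cpd_b`, and the helper `prob_b` are invented for illustration.

```python
from itertools import product

k = 3  # number of parents A1..Ak (illustrative)

# Hypothetical CPD for B: one number Pr[B = 1 | a] per parent assignment a.
# The values are invented; here B is more likely the more parents are "on".
cpd_b = {a: 0.1 + 0.25 * sum(a) for a in product([0, 1], repeat=k)}


def prob_b(b, a):
    """Pr[B = b | A1..Ak = a], looked up from the table."""
    p1 = cpd_b[a]
    return p1 if b == 1 else 1.0 - p1


rows = len(cpd_b)                                        # 2**k rows
row_sums = [prob_b(0, a) + prob_b(1, a) for a in cpd_b]  # each row sums to 1
table_sum = sum(row_sums)                                # whole table sums to 2**k
print(rows, table_sum)
```

Storing only Pr[B = 1 | a] is enough for a boolean B, since Pr[B = 0 | a] is its complement.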


Conditional probability table

[Diagram: nodes A1, A2, A3, ..., Ak, each with an arc into B; B has an arc into C]

• B depends upon all other variables only through A1, ..., Ak

• CPD gives Prob. of B conditioned on all 2^k combinations of A1, ..., Ak

Main property: Pr[C | B, A1, ..., Ak] = Pr[C | B]

No way for A1 to “cause” C except via B (causation/influence travels via directed paths).


Special properties

Key idea: locally normalized

All factors (local conditional distributions) satisfy:

$$\sum_{x_i} p(x_i \mid x_{\mathrm{Parents}(i)}) = 1 \quad \text{for each } x_{\mathrm{Parents}(i)}$$

Implications:

• Consistency of sub-Bayesian networks

• Consistency of conditional distributions


Consistency of sub-Bayesian networks

[Diagram: B → A ← E]

A short calculation:

$$\begin{aligned}
P(B = b, E = e) &= \sum_{a} P(B = b, E = e, A = a) \\
&= \sum_{a} p(b)\, p(e)\, p(a \mid b, e) \\
&= p(b)\, p(e) \sum_{a} p(a \mid b, e) \\
&= p(b)\, p(e)
\end{aligned}$$

Suppose we remove a leaf node A.

Marginalization gives a new Bayes net where B, E are independent with prob. p(b), p(e).
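The calculation can be checked numerically. The sketch below assumes boolean variables and made-up values for p(b), p(e) and p(a | b, e); only the structure B → A ← E comes from the slide.

```python
# Hypothetical numbers for the net B -> A <- E (all variables boolean).
p_b = {1: 0.2, 0: 0.8}                 # p(b)
p_e = {1: 0.3, 0: 0.7}                 # p(e)
p_a1 = {(0, 0): 0.01, (0, 1): 0.4, (1, 0): 0.7, (1, 1): 0.95}  # p(A=1 | b, e)


def p_a(a, b, e):
    """Local conditional p(a | b, e)."""
    return p_a1[(b, e)] if a == 1 else 1.0 - p_a1[(b, e)]


def marginal_be(b, e):
    """P(B=b, E=e), obtained by summing the joint over the leaf A."""
    return sum(p_b[b] * p_e[e] * p_a(a, b, e) for a in (0, 1))


# Each line prints b, e, the marginal, and p(b) p(e); the last two agree.
for b in (0, 1):
    for e in (0, 1):
        print(b, e, marginal_be(b, e), p_b[b] * p_e[e])
```

The agreement does not depend on the particular numbers chosen, because p(a | b, e) sums to 1 over a for every (b, e).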


Marginalization: Compute subdistributions on some subset of variables.

Pollution (P)

Smoking (S)

Cancer (C)

X-ray (X)

Dyspnoea (D)


Pr[X, D, C, P, S]

= Pr[X | C] Pr[D | C] Pr[C | P, S] Pr[P] Pr[S]

Pr[C=0] = Pr[C=0, P=0, S=0] + Pr[C=0, P=0, S=1] + Pr[C=0, P=1, S=0] + Pr[C=0, P=1, S=1]   (a sum over disjoint events)

$$\Pr[C=0] = \sum_{b \in \{0,1\}} \sum_{d \in \{0,1\}} \Pr[C=0, P=b, S=d]$$
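As a worked instance of this sum, the sketch below computes Pr[C=0] for the lung cancer network using the CPT values from Figure 2.1. The leaves X and D sum out to 1, so only Pr[P], Pr[S] and Pr[C | P, S] are needed. Encoding P=1 as low pollution and S=1 as smoker is a choice made here, not something the slide fixes.

```python
from itertools import product

# CPT entries read from Figure 2.1; the 0/1 encoding is an assumption of this sketch.
pr_p = {1: 0.90, 0: 0.10}                       # Pr[P = b], 1 = L, 0 = H
pr_s = {1: 0.30, 0: 0.70}                       # Pr[S = d], 1 = smoker
pr_c1 = {(0, 1): 0.05, (0, 0): 0.02,            # Pr[C = 1 | P = b, S = d]
         (1, 1): 0.03, (1, 0): 0.001}


def pr_c0_and(b, d):
    """Pr[C=0, P=b, S=d] = Pr[C=0 | P=b, S=d] Pr[P=b] Pr[S=d]."""
    return (1.0 - pr_c1[(b, d)]) * pr_p[b] * pr_s[d]


# Sum over the four disjoint events (b, d) in {0,1} x {0,1}.
pr_c0 = sum(pr_c0_and(b, d) for b, d in product((0, 1), repeat=2))
print(round(pr_c0, 5))
```

With these numbers Pr[C=0] comes out at roughly 0.988, so the prior probability of cancer in this model is a little over 1%.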


I’ve hinted already but will not prove formally…

Bayes nets define proper distributions, in the sense that all marginal distributions are well-defined (meaning probabilities sum to 1).

Consistency of local conditionals

Key idea: local conditional distributions

Local conditional distributions (factors) are the true conditional distributions.

[Diagram: a Bayes net on nodes A, B, C, D, E, F, G, H in which D has parents A and B]

e.g.,

$$\underbrace{P(D = d \mid A = a, B = b)}_{\text{from probabilistic inference}} = \underbrace{p(d \mid a, b)}_{\text{local factor}}$$

(Left side: from the marginalization calculation. Right side: from the CPD table.)


Bayes nets as models of probabilistic processes

[Diagram: nodes A1, A2, A3, ..., Ak, each with an arc into B; B has an arc into C]

Step 1: Coins tossed at each Ai node to decide if Ai happened. (Pr[Heads] = Pr[Ai = 1])

Step 2: Coins tossed at B node to decide if B = 1. (Pr[Heads] looked up from CPD table)

Step 3: Coins tossed at C node to decide if C = 1.

Uses: models of language/text, social processes, disease propagation, etc.
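The three steps can be sketched as a small sampler. Everything numeric here is hypothetical (two parents instead of k, invented CPD entries), and the names `toss` and `sample` are illustrative; the point is only the order of the coin tosses, parents before children.

```python
import random

rng = random.Random(0)          # fixed seed so runs are repeatable

# Hypothetical parameters for a net A1, A2 -> B -> C (k = 2 for brevity).
pr_a = [0.5, 0.2]                                               # Pr[Ai = 1]
pr_b1 = {(0, 0): 0.05, (0, 1): 0.6, (1, 0): 0.7, (1, 1): 0.9}   # Pr[B = 1 | A1, A2]
pr_c1 = {0: 0.1, 1: 0.8}                                        # Pr[C = 1 | B]


def toss(p_heads):
    """Toss a coin that lands heads (1) with probability p_heads."""
    return 1 if rng.random() < p_heads else 0


def sample():
    """Draw one joint sample (a, b, c) by tossing coins in topological order."""
    a = tuple(toss(p) for p in pr_a)     # Step 1: roots
    b = toss(pr_b1[a])                   # Step 2: B given its parents
    c = toss(pr_c1[b])                   # Step 3: C given B
    return a, b, c


samples = [sample() for _ in range(20000)]
freq_c = sum(c for _, _, c in samples) / len(samples)
print(freq_c)
```

For these made-up numbers the exact value is Pr[C = 1] = 0.45 · 0.8 + 0.55 · 0.1 = 0.415, and the empirical frequency should land close to it.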


Example: Bayes net for a 5-word sentence according to bigram model.

Instead of binary variables, use N-ary variables (N = # words in dictionary)

W1 → W2 → W3 → W4 → W5

Pr[Wi | W1 W2 … Wi-1] = Pr[Wi | Wi-1]
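A toy version of this chain, assuming a three-word dictionary and invented probabilities (`start`, `bigram` and `sentence_prob` are names made up for this sketch). The probability of a sentence is the first-word probability times one bigram factor per later word.

```python
# Toy bigram model over a hypothetical 3-word dictionary; all numbers are made up.
start = {"the": 0.6, "dog": 0.3, "barks": 0.1}           # Pr[W1]
bigram = {                                               # Pr[Wi | Wi-1]
    "the":   {"the": 0.1, "dog": 0.8, "barks": 0.1},
    "dog":   {"the": 0.2, "dog": 0.1, "barks": 0.7},
    "barks": {"the": 0.5, "dog": 0.3, "barks": 0.2},
}


def sentence_prob(words):
    """Pr[W1..Wn] = Pr[W1] * prod_i Pr[Wi | Wi-1], the chain-shaped Bayes net."""
    prob = start[words[0]]
    for prev, cur in zip(words, words[1:]):
        prob *= bigram[prev][cur]
    return prob


p = sentence_prob(["the", "dog", "barks", "the", "dog"])
print(p)
```

Each row of `bigram` is a local conditional distribution over N words, so it sums to 1, just like a row of a CPD for a binary node.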


“Explaining away” phenomenon

Whiz-bang U admits students who are either Brainy (B) or Sporty (S) (or both).

[Diagram: B → A ← S]

Pr[S=1 | B=1, A=1] ≤ Pr[S=1 | A=1]

Conditional on having been admitted: being brainy implies you are less likely to be sporty (and vice versa).

(Aka Berkson’s paradox, and “selection bias.”)

Explaining away

[Diagram: B → A ← E]

Key idea: explaining away

Suppose two causes positively influence an effect. Conditioned on the effect, conditioning on one cause reduces the probability of the other cause.
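The admissions example can be checked by brute-force enumeration. The sketch assumes admission is exactly "brainy or sporty" and uses made-up priors Pr[B=1] = Pr[S=1] = 0.1; `joint` and `cond` are helper names introduced here.

```python
from itertools import product

# Hypothetical priors; admission A is modeled as the deterministic OR of B and S.
pr_b1, pr_s1 = 0.1, 0.1


def joint(b, s, a):
    """Pr[B=b, S=s, A=a] for the net B -> A <- S with A = B or S."""
    pb = pr_b1 if b else 1 - pr_b1
    ps = pr_s1 if s else 1 - pr_s1
    pa = 1.0 if a == (b or s) else 0.0
    return pb * ps * pa


def cond(query, given):
    """Pr[query | given]; both are predicates on (b, s, a), computed by enumeration."""
    worlds = list(product((0, 1), repeat=3))
    num = sum(joint(*w) for w in worlds if given(*w) and query(*w))
    den = sum(joint(*w) for w in worlds if given(*w))
    return num / den


p_s_given_a = cond(lambda b, s, a: s == 1, lambda b, s, a: a == 1)
p_s_given_ba = cond(lambda b, s, a: s == 1, lambda b, s, a: a == 1 and b == 1)
print(round(p_s_given_a, 3), round(p_s_given_ba, 3))
```

Among admitted students the chance of being sporty is 0.1/0.19, a bit over one half; once we also learn the student is brainy, it falls back to the prior 0.1, because brainy alone already explains the admission.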


Nonuniqueness of Bayes net

Suppose A, B are independent coin tosses. The following Bayes nets are all correct:

Net 1: A and B with no arc. Pr[A=1] = 1/2, Pr[B=1] = 1/2

Net 2: A → B. Pr[A=1] = 1/2, Pr[B=1 | A] = 1/2

Net 3: B → A. Pr[B=1] = 1/2, Pr[A=1 | B] = 1/2

General lesson: “Causality” is not easy to pin down from correlation.
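A quick check that the three structures describe the same distribution. The representation (a dictionary mapping each variable to its parents and a table of Pr[var = 1 | parents]) is an assumption of this sketch, not notation from the slides.

```python
from itertools import product

# Each net: for every variable, (parents, table of Pr[var = 1 | parent values]).
nets = {
    "no arc": {"A": ((), {(): 0.5}), "B": ((), {(): 0.5})},
    "A -> B": {"A": ((), {(): 0.5}), "B": (("A",), {(0,): 0.5, (1,): 0.5})},
    "B -> A": {"B": ((), {(): 0.5}), "A": (("B",), {(0,): 0.5, (1,): 0.5})},
}


def joint(net, assignment):
    """Product of local conditionals for a full assignment like {'A': 1, 'B': 0}."""
    prob = 1.0
    for var, (parents, table) in net.items():
        p1 = table[tuple(assignment[q] for q in parents)]
        prob *= p1 if assignment[var] == 1 else 1.0 - p1
    return prob


# Joint table of each net over (A, B) in the order 00, 01, 10, 11.
tables = {
    name: [joint(net, {"A": a, "B": b}) for a, b in product((0, 1), repeat=2)]
    for name, net in nets.items()
}
print(tables)
```

All three nets assign probability 1/4 to every outcome, so data drawn from this distribution cannot tell the arc directions apart.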


Next time: Doing calculations/predictions with Bayesian nets.