
4190.408 2015-Spring

Bayesian Networks – 3, 4: Inference with Probabilistic Graphical Models

Byoung-Tak Zhang

Biointelligence Lab

Seoul National University


What is Machine Learning?

• Learning system:

– A system that automatically constructs a model M from empirical data D obtained through interaction with an environment E, and thereby improves its own performance P

• Self-improving Systems (artificial intelligence perspective)

• Knowledge Discovery (data mining perspective)

• Data-Driven Software Design (software engineering perspective)

• Automatic Programming (computer engineering perspective)


Machine Learning as Automatic Programming


Traditional Programming: a computer takes Data and a Program and produces Output.

Machine Learning: a computer takes Data and Output (examples) and produces a Program.


Machine Learning (ML): Three Tasks


• Supervised Learning (see the sketch after this list)
– Estimate an unknown mapping from known input and target output pairs
– Learn fw from the training set D = {(x, y)} s.t. fw(x) ≈ y = f(x)
– Classification: y is discrete
– Regression: y is continuous

• Unsupervised Learning
– Only input values are provided
– Learn fw from D = {(x)} s.t. fw(x) ≈ x
– Density estimation and compression
– Clustering, dimension reduction

• Sequential (Reinforcement) Learning
– Not targets, but rewards (critiques) are provided “sequentially”
– Learn a heuristic function fw from Dt = {(st, at, rt) | t = 1, 2, …} s.t. fw(st, at) ≈ rt
– With respect to the future, not just the past
– Sequential decision-making
– Action selection and policy learning
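A minimal sketch of the supervised case, with a made-up linear model and toy data (an illustration, not part of the lecture): learn w so that fw(x) ≈ y on the training set D.

import numpy as np

# Toy training set D = {(x, y)}: y depends linearly on x plus noise (made-up data).
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 2.0 * x + 0.5 + 0.1 * rng.normal(size=100)

# Model f_w(x) = w1*x + w0; choose w by least squares so that f_w(x) ~ y.
X = np.column_stack([x, np.ones_like(x)])
w, *_ = np.linalg.lstsq(X, y, rcond=None)

print("learned w:", w)                   # close to [2.0, 0.5]
print("f_w(0.3) =", w[0] * 0.3 + w[1])   # prediction for a new input

For classification, y would be discrete and the squared error would be replaced by, for example, a log-loss.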


Machine Learning Models

• Supervised learning models
– Neural Nets

– Decision Trees

– K-Nearest Neighbors

– Support Vector Machines

• Unsupervised learning models
– Self-Organizing Maps

– Clustering Algorithms

– Manifold Learning

– Evolutionary Learning

• Probabilistic graphical models
– Bayesian Networks

– Markov Networks

– Hidden Markov Models

– Hypernetworks

• Dynamic system models
– Kalman Filters

– Sequential Monte Carlo

– Particle Filters

– Reinforcement Learning


Outline

• Bayesian Inference
– Monte Carlo
– Importance Sampling
– MCMC

• Probabilistic Graphical Models
– Bayesian Networks
– Markov Random Fields

• Hypernetworks
– Architecture and Algorithms
– Application Examples

• Discussion


Bayes Theorem
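The formula itself is not preserved in this transcript; in the hypothesis/data notation used on the next slide, Bayes' theorem is the standard identity

P(h | D) = P(D | h) P(h) / P(D)

where P(h) is the prior, P(D | h) the likelihood, P(D) the evidence, and P(h | D) the posterior.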


MAP vs. ML

• What is the most probable hypothesis given the data?

• From Bayes Theorem

• MAP (Maximum A Posteriori)

• ML (Maximum Likelihood)
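The accompanying equations are not preserved in the transcript; the standard definitions matching these two bullets are

h_MAP = argmax_h P(h | D) = argmax_h P(D | h) P(h)   (P(D) does not depend on h and can be dropped)

h_ML = argmax_h P(D | h)   (MAP with a uniform prior over hypotheses)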


Bayesian Inference


Prof. Schrater’s Lecture Notes (Univ. of Minnesota)


Monte Carlo (MC) Approximation
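A minimal sketch of the idea (the target and test function are chosen only for illustration): a Monte Carlo estimate replaces the expectation E_p[g(X)] by an average over N samples drawn from p.

import numpy as np

# Approximate E[g(X)] with g(x) = x^2 for X ~ N(0, 1); the exact value is 1.
rng = np.random.default_rng(0)
samples = rng.normal(0.0, 1.0, size=100_000)
estimate = np.mean(samples ** 2)
print(estimate)   # close to 1.0; the error shrinks like 1/sqrt(N)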


Markov chain Monte Carlo
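A minimal random-walk Metropolis sketch (one standard MCMC algorithm; the bimodal target density here is a made-up example, not from the slides):

import numpy as np

def metropolis(log_p, x0, n_steps, step=0.5, seed=0):
    """Propose x' ~ N(x, step^2); accept with probability min(1, p(x')/p(x))."""
    rng = np.random.default_rng(seed)
    x, chain = x0, []
    for _ in range(n_steps):
        x_new = x + step * rng.normal()
        if np.log(rng.uniform()) < log_p(x_new) - log_p(x):
            x = x_new                     # accept; otherwise keep the current state
        chain.append(x)
    return np.array(chain)

# Unnormalized mixture of two Gaussians as an illustrative target.
log_p = lambda x: np.logaddexp(-0.5 * (x - 2.0) ** 2, -0.5 * (x + 2.0) ** 2)
chain = metropolis(log_p, x0=0.0, n_steps=50_000)
print(chain.mean(), chain.std())   # mean near 0, spread covering both modes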


MC with Importance Sampling
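A minimal importance-sampling sketch (the target, proposal, and test function are made up for illustration): draw samples from a proposal q and reweight by p/q to estimate an expectation under p.

import numpy as np

# Estimate E_p[x^2] = 1 for p = N(0, 1), drawing samples from a wider proposal q = N(0, 2).
rng = np.random.default_rng(0)
x = rng.normal(0.0, 2.0, size=100_000)                          # samples from q
p = np.exp(-0.5 * x ** 2) / np.sqrt(2 * np.pi)                  # target density p(x)
q = np.exp(-0.5 * (x / 2.0) ** 2) / (2.0 * np.sqrt(2 * np.pi))  # proposal density q(x)
w = p / q                                                       # importance weights
print(np.mean(w * x ** 2))                                      # close to 1.0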


Graphical Models


[Figure: a taxonomy of Graphical Models (GM), spanning Causal Models, Chain Graphs, Other Semantics, Directed GMs, Dependency Networks, and Undirected GMs.
Under Directed GMs: Bayesian Networks, DBNs, FST, HMMs, Factorial HMMs, Mixed-Memory Markov Models, BMMs, Kalman filters, Segment Models, Mixture Models, Decision Trees, Simple Models, PCA, LDA.
Under Undirected GMs: Markov Random Fields / Markov networks, Gibbs/Boltzmann Distributions.]


BAYESIAN NETWORKS


Bayesian Networks

• Bayesian network

– DAG (Directed Acyclic Graph)

– Express dependence relations between variables

– Can use prior knowledge on the data (parameters)


[Figure: a five-node DAG over variables A, B, C, D, E with edges A→B, B→C, A→D, B→D, B→E, C→E, D→E]

P(X) = ∏_{i=1..n} P(Xi | pa(Xi))

P(A,B,C,D,E) = P(A) P(B|A) P(C|B) P(D|A,B) P(E|B,C,D)
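A minimal sketch of how this factorization is used in practice (the CPT numbers below are made up; only the structure matches the factorization above):

# Joint probability from P(A,B,C,D,E) = P(A)P(B|A)P(C|B)P(D|A,B)P(E|B,C,D).
# Variables are binary; the CPT numbers are illustrative (each local table sums to 1).
P_A = {True: 0.3, False: 0.7}
P_B_given_A = {True: {True: 0.8, False: 0.2}, False: {True: 0.1, False: 0.9}}
P_C_given_B = {True: {True: 0.5, False: 0.5}, False: {True: 0.4, False: 0.6}}
P_D_given_AB = {(a, b): {True: 0.6 if a else 0.2, False: 0.4 if a else 0.8}
                for a in (True, False) for b in (True, False)}
P_E_given_BCD = {(b, c, d): {True: 0.9 if d else 0.3, False: 0.1 if d else 0.7}
                 for b in (True, False) for c in (True, False) for d in (True, False)}

def joint(a, b, c, d, e):
    """P(A=a, B=b, C=c, D=d, E=e) as the product of the five local factors."""
    return (P_A[a] * P_B_given_A[a][b] * P_C_given_B[b][c]
            * P_D_given_AB[(a, b)][d] * P_E_given_BCD[(b, c, d)][e])

vals = (True, False)
print(joint(True, True, False, True, True))
print(sum(joint(a, b, c, d, e) for a in vals for b in vals
          for c in vals for d in vals for e in vals))   # sums to 1.0 over all 2^5 assignments

Here the five CPTs hold 17 free parameters instead of the 31 needed for an explicit joint table, and the gap widens quickly as variables are added.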


Representing Probability Distributions

• Probability distribution = a probability for each combination of values of these attributes

• Naïve representations (such as tables) run into trouble
– 20 binary attributes already require more than 2^20 ≈ 10^6 parameters

– Real applications usually involve hundreds of attributes


Hospital patients described by

• Background: age, gender, history of diseases, …

• Symptoms: fever, blood pressure, headache, …

• Diseases: pneumonia, heart attack, …


Bayesian Networks - Key Idea

• Utilize conditional independence

• Graphical representation of conditional independence and of “causal” dependencies


Exploit regularities !!!


Bayesian Networks

1. Finite, directed acyclic graph

2. Nodes: (discrete) random variables

3. Edges: direct influences

4. Associated with each node: a table representing a conditional probability distribution (CPD), quantifying the effect the parents have on the node


[Figure: example network with nodes E, B, A, J, M]


Bayesian Networks


[Figure: network X1 → X3 ← X2, with priors (0.2, 0.8) and (0.6, 0.4) on the root nodes and the following CPD for X3:]

X1     X2   P(X3 | X1, X2)
true   1    (0.2, 0.8)
true   2    (0.5, 0.5)
false  1    (0.23, 0.77)
false  2    (0.53, 0.47)


Example: Use a DAG to model the causality


[Figure: DAG with nodes Train Strike, Martin Oversleep, Norman Oversleep, Boss Failure-in-Love, Martin Late, Norman Late, Norman Untidy, Office Dirty, Project Delay, Boss Angry]


Example: Attach prior probabilities to all root nodes



Martin Oversleep:       P(T) = 0.01, P(F) = 0.99
Train Strike:           P(T) = 0.1,  P(F) = 0.9
Norman Oversleep:       P(T) = 0.2,  P(F) = 0.8
Boss Failure-in-Love:   P(T) = 0.01, P(F) = 0.99


Example: Attach conditional probabilities (CPTs) to non-root nodes



P(Norman Untidy | Norman Oversleep):

                     Norman Oversleep = T   Norman Oversleep = F
Norman Untidy = T          0.6                    0.2
Norman Untidy = F          0.4                    0.8

P(Martin Late | Train Strike, Martin Oversleep):

                   TS=T, MO=T   TS=T, MO=F   TS=F, MO=T   TS=F, MO=F
Martin Late = T       0.95         0.8          0.7          0.05
Martin Late = F       0.05         0.2          0.3          0.95

Each column sums to 1.
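With the root priors from the previous slide and the table above, a simple query can already be answered; a small sketch (not from the slides) that marginalizes out Train Strike and Martin Oversleep:

# P(Martin Late = T) = sum over TS, MO of P(ML = T | TS, MO) * P(TS) * P(MO),
# using the tables given on these slides (TS and MO are independent root nodes).
p_ts = {True: 0.1, False: 0.9}                              # Train Strike
p_mo = {True: 0.01, False: 0.99}                            # Martin Oversleep
p_ml_given = {(True, True): 0.95, (True, False): 0.8,
              (False, True): 0.7, (False, False): 0.05}     # P(Martin Late = T | TS, MO)

p_ml = sum(p_ml_given[ts, mo] * p_ts[ts] * p_mo[mo]
           for ts in (True, False) for mo in (True, False))
print(p_ml)   # about 0.131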


Example: Attach conditional probabilities (CPTs) to non-root nodes (continued)



Each column sums to 1.

P(Boss Angry | Boss Failure-in-Love, Project Delay, Office Dirty):

Boss Failure-in-Love:    T     T     T     T     F     F     F     F
Project Delay:           T     T     F     F     T     T     F     F
Office Dirty:            T     F     T     F     T     F     T     F

Boss Angry = very       0.98  0.85  0.6   0.5   0.3   0.2   0     0.01
Boss Angry = mid        0.02  0.15  0.3   0.25  0.5   0.5   0.2   0.02
Boss Angry = little     0     0     0.1   0.25  0.2   0.3   0.7   0.07
Boss Angry = no         0     0     0     0     0     0     0.1   0.9


Inference


MARKOV RANDOM FIELDS (MARKOV NETWORKS)


Graphical Models


Directed Graph (e.g. Bayesian Network)

Undirected Graph (e.g. Markov Random Field)


Bayesian Image Analysis


[Figure: an Original Image is corrupted by Noise during Transmission, giving the Degraded (observed) Image]

Pr(Original Image | Degraded Image) = Pr(Degraded Image | Original Image) Pr(Original Image) / Pr(Degraded Image)

i.e. a posteriori probability = (degradation process, the likelihood) × (a priori probability) / (marginal)


Image Analysis

• We can thus represent both the observed image (X) and the true image (Y) as Markov random fields, and invoke the Bayesian framework to find P(Y|X).


X – observed image

Y – true image


Details

• Remember

• P(Y|X) is proportional to P(X|Y)P(Y)
– P(X|Y) is the data model.

– P(Y) models the label interaction.

• Next we need to compute the prior P(Y=y) and the likelihood P(X|Y).


P(Y | X) = P(X | Y) P(Y) / P(X) ∝ P(X | Y) P(Y)


Back to Image Analysis


The likelihood can be modeled as a mixture of Gaussians.

The potential is modeled to capture domain knowledge. One common model is the Ising model, with pairwise potentials of the form β y_i y_j.
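A small sketch of such a pairwise potential on a 4-connected grid (labels in {-1, +1}; the coupling beta, the sign convention, and the test labelings are illustrative assumptions):

import numpy as np

def ising_energy(y, beta=1.0):
    """Energy -beta * sum over 4-connected neighbor pairs of y_i * y_j (lower = smoother labeling)."""
    horiz = np.sum(y[:, :-1] * y[:, 1:])    # horizontal neighbor products
    vert = np.sum(y[:-1, :] * y[1:, :])     # vertical neighbor products
    return -beta * (horiz + vert)

rng = np.random.default_rng(0)
y_random = np.where(rng.normal(size=(8, 8)) > 0, 1, -1)   # arbitrary +/-1 labeling
print(ising_energy(y_random))               # near 0 for a random labeling
print(ising_energy(np.ones((8, 8))))        # -112 for a constant labeling (8x8 grid, beta=1)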


Bayesian Image Analysis

• Let X be the observed image = {x_1, x_2, …, x_mn}

• Let Y be the true image = {y_1, y_2, …, y_mn}

• Goal: find Y = y* = {y_1*, y_2*, …} such that P(Y = y* | X) is maximum.

• Labeling problem with a search space of size |L|^(mn)

– L is the set of labels.

– m·n observations.


Unfortunately


[Figure: three panels comparing the Observed Image with SVM and MRF results]


Markov Random Fields (MRFs)

• Introduced in the 1960s, MRFs are a principled approach for incorporating context information.

• They incorporate domain knowledge.

• They work within the Bayesian framework.

• Widely worked on in the 70s, they faded during the 80s, and finally made a big comeback in the late 90s.



Markov Random Field

• Random Field: Let F = {F_1, F_2, …, F_M} be a family of random variables defined on the set S, in which each random variable F_i takes a value f_i in a label set L. The family F is called a random field.

• Markov Random Field: F is said to be a Markov random field on S with respect to a neighborhood system N if and only if the following two conditions are satisfied:

Positivity: P(f) > 0 for all configurations f

Markovianity: P(f_i | f_{S-{i}}) = P(f_i | f_{N_i})


Inference

• Finding the optimal y* such that P(Y=y*|X) is maximum.

• Search space is exponential.

• Exponential algorithm - simulated annealing (SA)

• Greedy algorithm – iterated conditional modes (ICM)

• There are other more advanced graph cut based strategies.


Sampling and Simulated Annealing

• Sampling
– A way to generate random samples from a (potentially very complicated) probability distribution.
– Gibbs/Metropolis.

• Simulated annealing
– A schedule for modifying the probability distribution so that, at “zero temperature”, you draw samples only from the MAP solution.

• If you can find the right cooling schedule, the algorithm will converge to a global MAP solution.

• Flip side: it is SLOW, and finding the correct schedule is non-trivial.
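A minimal sketch of annealed Gibbs sampling for a binary labeling with the Ising-style energy E(y) = -β Σ_{i~j} y_i y_j - h Σ_i x_i y_i (the energy, the schedule, and the data are illustrative assumptions, not the lecture's code):

import numpy as np

def simulated_annealing(x_obs, beta=1.0, h=1.0, n_sweeps=60, T0=4.0, cool=0.95, seed=0):
    """Annealed Gibbs sampling: resample each site from its local conditional while lowering T."""
    rng = np.random.default_rng(seed)
    y = np.where(x_obs > 0, 1, -1)                       # initialize the labeling from the data
    R, C = y.shape
    for sweep in range(n_sweeps):
        T = T0 * cool ** sweep                           # geometric cooling schedule
        for i in range(R):
            for j in range(C):
                nb = sum(y[a, b] for a, b in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                         if 0 <= a < R and 0 <= b < C)   # sum of 4-connected neighbors
                dE = 2 * beta * nb + 2 * h * x_obs[i, j]  # E(y_ij = -1) - E(y_ij = +1)
                p_plus = 1.0 / (1.0 + np.exp(-dE / T))    # probability of y_ij = +1 at temperature T
                y[i, j] = 1 if rng.uniform() < p_plus else -1
    return y

x = np.where(np.random.default_rng(1).normal(size=(16, 16)) > 0, 1, -1)  # made-up noisy labels
print(simulated_annealing(x))

With a slow enough schedule the samples concentrate on the MAP labeling; with the fast geometric schedule used here the result is only an approximation.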


Iterated Conditional Modes

• Greedy strategy, fast convergence

• Idea is to maximize the local conditional probabilities iteratively, given an initial solution.

• Simulated annealing with T = 0.
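A matching ICM sketch for the same illustrative energy as above (an assumption, not the lecture's code): each site is set greedily to the label with the highest local conditional probability, and sweeps repeat until no site changes.

import numpy as np

def icm(x_obs, beta=1.0, h=1.0, max_sweeps=20):
    """Iterated conditional modes for E(y) = -beta*sum_{i~j} y_i*y_j - h*sum_i x_i*y_i, labels +/-1."""
    y = np.where(x_obs > 0, 1, -1)                       # initial solution
    R, C = y.shape
    for _ in range(max_sweeps):
        changed = False
        for i in range(R):
            for j in range(C):
                nb = sum(y[a, b] for a, b in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                         if 0 <= a < R and 0 <= b < C)
                best = 1 if beta * nb + h * x_obs[i, j] >= 0 else -1   # label minimizing local energy
                if best != y[i, j]:
                    y[i, j] = best
                    changed = True
        if not changed:                                  # converged: a full sweep changed nothing
            break
    return y

x = np.where(np.random.default_rng(2).normal(size=(16, 16)) > 0, 1, -1)
print(icm(x))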


Parameter Learning

• Supervised learning (easiest case)

• Maximum likelihood: θ* = argmax_θ P(f | θ)

• For an MRF: P(f | θ) = (1/Z) exp(-U(f | θ)/T)


Pseudo Likelihood

• So we approximate

• Large lattice theorem: in the large-lattice limit (M → ∞), the PL estimate converges to the ML estimate.

• It turns out that a local learning method like pseudo-likelihood, combined with a local inference method such as ICM, does quite well, giving close-to-optimal results.

PL(f) = ∏_{i∈S} P(f_i | f_{N_i}) = ∏_{i∈S} e^{-U(f_i, f_{N_i})} / Σ_{f_i′∈L} e^{-U(f_i′, f_{N_i})}

where U(f) = Σ_i U(f_i, f_{N_i})
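A small sketch that evaluates this product for the Ising-style local energy U(f_i, f_{N_i}) = -β f_i Σ_{j∈N_i} f_j used earlier (the energy and the labelings are illustrative assumptions):

import numpy as np

def log_pseudo_likelihood(f, beta=1.0):
    """log PL(f) = sum_i [ -U(f_i, f_Ni) - log sum_{f' in L} exp(-U(f', f_Ni)) ] for labels {-1, +1}."""
    R, C = f.shape
    total = 0.0
    for i in range(R):
        for j in range(C):
            nb = sum(f[a, b] for a, b in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                     if 0 <= a < R and 0 <= b < C)
            u = {s: -beta * s * nb for s in (-1, 1)}     # local energy for each candidate label
            log_z = np.logaddexp(-u[-1], -u[1])          # local normalizer over the label set
            total += -u[f[i, j]] - log_z
    return total

f = np.ones((8, 8), dtype=int)                           # a perfectly smooth labeling
print(log_pseudo_likelihood(f, beta=0.5))
# Pseudo-likelihood learning would maximize this quantity over beta given observed labelings.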
