
4190.408 2015-Spring

Bayesian Networks – 3, 4: Inference with Probabilistic Graphical Models

Byoung-Tak Zhang

Biointelligence Lab

Seoul National University


What is Machine Learning?

• Learning system:

– A system that automatically constructs a model M from empirical data D obtained through interaction with an environment E, and thereby improves its own performance P

• Self-improving Systems (artificial intelligence perspective)

• Knowledge Discovery (data mining perspective)

• Data-Driven Software Design (software engineering perspective)

• Automatic Programming (computer engineering perspective)


Machine Learning as Automatic Programming


Traditional Programming: a computer takes Data and a Program and produces Output.

Machine Learning: a computer takes Data and Output (examples) and produces a Program.


Machine Learning (ML): Three Tasks


• Supervised Learning (see the sketch after this list)
– Estimate an unknown mapping from known input and target output pairs
– Learn fw from the training set D = {(x, y)} s.t. fw(x) ≈ y = f(x)
– Classification: y is discrete
– Regression: y is continuous

• Unsupervised Learning
– Only input values are provided
– Learn fw from D = {(x)} s.t. fw(x) ≈ x
– Density estimation and compression
– Clustering, dimension reduction

• Sequential (Reinforcement) Learning
– Not targets, but rewards (critiques) are provided “sequentially”
– Learn a heuristic function fw from Dt = {(st, at, rt) | t = 1, 2, …} s.t. fw(st, at) ≈ rt
– With respect to the future, not just the past
– Sequential decision-making
– Action selection and policy learning
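A minimal sketch of the supervised case, with a made-up linear model and toy data (an illustration, not part of the lecture): learn w so that fw(x) ≈ y on the training set D.

import numpy as np

# Toy training set D = {(x, y)}: y depends linearly on x plus noise (made-up data).
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 2.0 * x + 0.5 + 0.1 * rng.normal(size=100)

# Model f_w(x) = w1*x + w0; choose w by least squares so that f_w(x) ~ y.
X = np.column_stack([x, np.ones_like(x)])
w, *_ = np.linalg.lstsq(X, y, rcond=None)

print("learned w:", w)                   # close to [2.0, 0.5]
print("f_w(0.3) =", w[0] * 0.3 + w[1])   # prediction for a new input

For classification, y would be discrete and the squared error would be replaced by, for example, a log-loss.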


Machine Learning Models

• Supervised learning models
– Neural Nets

– Decision Trees

– K-Nearest Neighbors

– Support Vector Machines

• Unsupervised learning models
– Self-Organizing Maps

– Clustering Algorithms

– Manifold Learning

– Evolutionary Learning

• Probabilistic graphical models
– Bayesian Networks

– Markov Networks

– Hidden Markov Models

– Hypernetworks

• Dynamic system models
– Kalman Filters

– Sequential Monte Carlo

– Particle Filters

– Reinforcement Learning


Outline

• Bayesian Inference
– Monte Carlo
– Importance Sampling
– MCMC

• Probabilistic Graphical Models
– Bayesian Networks
– Markov Random Fields

• Hypernetworks
– Architecture and Algorithms
– Application Examples

• Discussion


Bayes Theorem
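The formula itself is not preserved in this transcript; in the hypothesis/data notation used on the next slide, Bayes' theorem is the standard identity

P(h | D) = P(D | h) P(h) / P(D)

where P(h) is the prior, P(D | h) the likelihood, P(D) the evidence, and P(h | D) the posterior.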


MAP vs. ML

• What is the most probable hypothesis given the data?

• From Bayes Theorem

• MAP (Maximum A Posteriori)

• ML (Maximum Likelihood)
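The accompanying equations are not preserved in the transcript; the standard definitions matching these two bullets are

h_MAP = argmax_h P(h | D) = argmax_h P(D | h) P(h)   (P(D) does not depend on h and can be dropped)

h_ML = argmax_h P(D | h)   (MAP with a uniform prior over hypotheses)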


Bayesian Inference


Prof. Schrater’s Lecture Notes (Univ. of Minnesota)


Monte Carlo (MC) Approximation
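A minimal sketch of the idea (the target and test function are chosen only for illustration): a Monte Carlo estimate replaces the expectation E_p[g(X)] by an average over N samples drawn from p.

import numpy as np

# Approximate E[g(X)] with g(x) = x^2 for X ~ N(0, 1); the exact value is 1.
rng = np.random.default_rng(0)
samples = rng.normal(0.0, 1.0, size=100_000)
estimate = np.mean(samples ** 2)
print(estimate)   # close to 1.0; the error shrinks like 1/sqrt(N)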


Markov chain Monte Carlo
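A minimal random-walk Metropolis sketch (one standard MCMC algorithm; the bimodal target density here is a made-up example, not from the slides):

import numpy as np

def metropolis(log_p, x0, n_steps, step=0.5, seed=0):
    """Propose x' ~ N(x, step^2); accept with probability min(1, p(x')/p(x))."""
    rng = np.random.default_rng(seed)
    x, chain = x0, []
    for _ in range(n_steps):
        x_new = x + step * rng.normal()
        if np.log(rng.uniform()) < log_p(x_new) - log_p(x):
            x = x_new                     # accept; otherwise keep the current state
        chain.append(x)
    return np.array(chain)

# Unnormalized mixture of two Gaussians as an illustrative target.
log_p = lambda x: np.logaddexp(-0.5 * (x - 2.0) ** 2, -0.5 * (x + 2.0) ** 2)
chain = metropolis(log_p, x0=0.0, n_steps=50_000)
print(chain.mean(), chain.std())   # mean near 0, spread covering both modes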


MC with Importance Sampling
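A minimal importance-sampling sketch (the target, proposal, and test function are made up for illustration): draw samples from a proposal q and reweight by p/q to estimate an expectation under p.

import numpy as np

# Estimate E_p[x^2] = 1 for p = N(0, 1), drawing samples from a wider proposal q = N(0, 2).
rng = np.random.default_rng(0)
x = rng.normal(0.0, 2.0, size=100_000)                          # samples from q
p = np.exp(-0.5 * x ** 2) / np.sqrt(2 * np.pi)                  # target density p(x)
q = np.exp(-0.5 * (x / 2.0) ** 2) / (2.0 * np.sqrt(2 * np.pi))  # proposal density q(x)
w = p / q                                                       # importance weights
print(np.mean(w * x ** 2))                                      # close to 1.0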


Graphical Models


[Figure: a taxonomy of Graphical Models (GM), spanning Causal Models, Chain Graphs, Other Semantics, Directed GMs, Dependency Networks, and Undirected GMs.
Under Directed GMs: Bayesian Networks, DBNs, FST, HMMs, Factorial HMMs, Mixed-Memory Markov Models, BMMs, Kalman filters, Segment Models, Mixture Models, Decision Trees, Simple Models, PCA, LDA.
Under Undirected GMs: Markov Random Fields / Markov networks, Gibbs/Boltzmann Distributions.]


BAYESIAN NETWORKS


Bayesian Networks

• Bayesian network

– DAG (Directed Acyclic Graph)

– Express dependence relations between variables

– Can use prior knowledge on the data (parameters)


[Figure: a five-node DAG over variables A, B, C, D, E with edges A→B, B→C, A→D, B→D, B→E, C→E, D→E]

P(X) = ∏_{i=1..n} P(Xi | pa(Xi))

P(A,B,C,D,E) = P(A) P(B|A) P(C|B) P(D|A,B) P(E|B,C,D)
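A minimal sketch of how this factorization is used in practice (the CPT numbers below are made up; only the structure matches the factorization above):

# Joint probability from P(A,B,C,D,E) = P(A)P(B|A)P(C|B)P(D|A,B)P(E|B,C,D).
# Variables are binary; the CPT numbers are illustrative (each local table sums to 1).
P_A = {True: 0.3, False: 0.7}
P_B_given_A = {True: {True: 0.8, False: 0.2}, False: {True: 0.1, False: 0.9}}
P_C_given_B = {True: {True: 0.5, False: 0.5}, False: {True: 0.4, False: 0.6}}
P_D_given_AB = {(a, b): {True: 0.6 if a else 0.2, False: 0.4 if a else 0.8}
                for a in (True, False) for b in (True, False)}
P_E_given_BCD = {(b, c, d): {True: 0.9 if d else 0.3, False: 0.1 if d else 0.7}
                 for b in (True, False) for c in (True, False) for d in (True, False)}

def joint(a, b, c, d, e):
    """P(A=a, B=b, C=c, D=d, E=e) as the product of the five local factors."""
    return (P_A[a] * P_B_given_A[a][b] * P_C_given_B[b][c]
            * P_D_given_AB[(a, b)][d] * P_E_given_BCD[(b, c, d)][e])

vals = (True, False)
print(joint(True, True, False, True, True))
print(sum(joint(a, b, c, d, e) for a in vals for b in vals
          for c in vals for d in vals for e in vals))   # sums to 1.0 over all 2^5 assignments

Here the five CPTs hold 17 free parameters instead of the 31 needed for an explicit joint table, and the gap widens quickly as variables are added.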


Representing Probability Distributions

• Probability distribution = a probability for each combination of values of these attributes

• Naïve representations (such as tables) run into trouble
– 20 binary attributes already require more than 2^20 ≈ 10^6 parameters

– Real applications usually involve hundreds of attributes


Hospital patients described by

• Background: age, gender, history of diseases, …

• Symptoms: fever, blood pressure, headache, …

• Diseases: pneumonia, heart attack, …


Bayesian Networks - Key Idea

• Utilize conditional independence

• Graphical representation of conditional independence and of “causal” dependencies


Exploit regularities !!!


Bayesian Networks

1. Finite, directed acyclic graph

2. Nodes: (discrete) random variables

3. Edges: direct influences

4. Associated with each node: a table representing a conditional probability distribution (CPD), quantifying the effect the parents have on the node


[Figure: example network with nodes E, B, A, J, M]


Bayesian Networks


[Figure: network X1 → X3 ← X2, with priors (0.2, 0.8) and (0.6, 0.4) on the root nodes and the following CPD for X3:]

X1     X2   P(X3 | X1, X2)
true   1    (0.2, 0.8)
true   2    (0.5, 0.5)
false  1    (0.23, 0.77)
false  2    (0.53, 0.47)


Example: Use a DAG to model the causality


[Figure: DAG with nodes Train Strike, Martin Oversleep, Norman Oversleep, Boss Failure-in-Love, Martin Late, Norman Late, Norman Untidy, Office Dirty, Project Delay, Boss Angry]


Example: Attach prior probabilities to all root nodes



Martin Oversleep:       P(T) = 0.01, P(F) = 0.99
Train Strike:           P(T) = 0.1,  P(F) = 0.9
Norman Oversleep:       P(T) = 0.2,  P(F) = 0.8
Boss Failure-in-Love:   P(T) = 0.01, P(F) = 0.99


Example: Attach conditional probabilities (CPTs) to non-root nodes



P(Norman Untidy | Norman Oversleep):

                     Norman Oversleep = T   Norman Oversleep = F
Norman Untidy = T          0.6                    0.2
Norman Untidy = F          0.4                    0.8

P(Martin Late | Train Strike, Martin Oversleep):

                   TS=T, MO=T   TS=T, MO=F   TS=F, MO=T   TS=F, MO=F
Martin Late = T       0.95         0.8          0.7          0.05
Martin Late = F       0.05         0.2          0.3          0.95

Each column sums to 1.
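With the root priors from the previous slide and the table above, a simple query can already be answered; a small sketch (not from the slides) that marginalizes out Train Strike and Martin Oversleep:

# P(Martin Late = T) = sum over TS, MO of P(ML = T | TS, MO) * P(TS) * P(MO),
# using the tables given on these slides (TS and MO are independent root nodes).
p_ts = {True: 0.1, False: 0.9}                              # Train Strike
p_mo = {True: 0.01, False: 0.99}                            # Martin Oversleep
p_ml_given = {(True, True): 0.95, (True, False): 0.8,
              (False, True): 0.7, (False, False): 0.05}     # P(Martin Late = T | TS, MO)

p_ml = sum(p_ml_given[ts, mo] * p_ts[ts] * p_mo[mo]
           for ts in (True, False) for mo in (True, False))
print(p_ml)   # about 0.131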


Example: Attach conditional probabilities (CPTs) to non-root nodes (continued)



Each column sums to 1.

P(Boss Angry | Boss Failure-in-Love, Project Delay, Office Dirty):

Boss Failure-in-Love:    T     T     T     T     F     F     F     F
Project Delay:           T     T     F     F     T     T     F     F
Office Dirty:            T     F     T     F     T     F     T     F

Boss Angry = very       0.98  0.85  0.6   0.5   0.3   0.2   0     0.01
Boss Angry = mid        0.02  0.15  0.3   0.25  0.5   0.5   0.2   0.02
Boss Angry = little     0     0     0.1   0.25  0.2   0.3   0.7   0.07
Boss Angry = no         0     0     0     0     0     0     0.1   0.9


Inference


MARKOV RANDOM FIELDS (MARKOV NETWORKS)


Graphical Models


Directed Graph (e.g. Bayesian Network)

Undirected Graph (e.g. Markov Random Field)


Bayesian Image Analysis


[Figure: an Original Image is corrupted by Noise during Transmission, giving the Degraded (observed) Image]

Pr(Original Image | Degraded Image) = Pr(Degraded Image | Original Image) Pr(Original Image) / Pr(Degraded Image)

i.e. a posteriori probability = (degradation process, the likelihood) × (a priori probability) / (marginal)


Image Analysis

• We can thus represent both the observed image (X) and the true image (Y) as Markov random fields, and invoke the Bayesian framework to find P(Y|X).


X – observed image

Y – true image


Details

• Remember

• P(Y|X) is proportional to P(X|Y)P(Y)
– P(X|Y) is the data model.

– P(Y) models the label interaction.

• Next we need to compute the prior P(Y=y) and the likelihood P(X|Y).


P(Y | X) = P(X | Y) P(Y) / P(X) ∝ P(X | Y) P(Y)


Back to Image Analysis


The likelihood can be modeled as a mixture of Gaussians.

The potential is modeled to capture domain knowledge. One common model is the Ising model, with pairwise potentials of the form β y_i y_j.
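A small sketch of such a pairwise potential on a 4-connected grid (labels in {-1, +1}; the coupling beta, the sign convention, and the test labelings are illustrative assumptions):

import numpy as np

def ising_energy(y, beta=1.0):
    """Energy -beta * sum over 4-connected neighbor pairs of y_i * y_j (lower = smoother labeling)."""
    horiz = np.sum(y[:, :-1] * y[:, 1:])    # horizontal neighbor products
    vert = np.sum(y[:-1, :] * y[1:, :])     # vertical neighbor products
    return -beta * (horiz + vert)

rng = np.random.default_rng(0)
y_random = np.where(rng.normal(size=(8, 8)) > 0, 1, -1)   # arbitrary +/-1 labeling
print(ising_energy(y_random))               # near 0 for a random labeling
print(ising_energy(np.ones((8, 8))))        # -112 for a constant labeling (8x8 grid, beta=1)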


Bayesian Image Analysis

• Let X be the observed image = {x_1, x_2, …, x_mn}

• Let Y be the true image = {y_1, y_2, …, y_mn}

• Goal: find Y = y* = {y_1*, y_2*, …} such that P(Y = y* | X) is maximum.

• Labeling problem with a search space of size |L|^(mn)

– L is the set of labels.

– m·n observations.


Unfortunately


[Figure: three panels comparing the Observed Image with SVM and MRF results]


Markov Random Fields (MRFs)

• Introduced in the 1960s, MRFs are a principled approach for incorporating context information.

• They incorporate domain knowledge.

• They work within the Bayesian framework.

• Widely worked on in the 70s, they faded during the 80s, and finally made a big comeback in the late 90s.



Markov Random Field

• Random Field: Let F = {F_1, F_2, …, F_M} be a family of random variables defined on the set S, in which each random variable F_i takes a value f_i in a label set L. The family F is called a random field.

• Markov Random Field: F is said to be a Markov random field on S with respect to a neighborhood system N if and only if the following two conditions are satisfied:

Positivity: P(f) > 0 for all configurations f

Markovianity: P(f_i | f_{S-{i}}) = P(f_i | f_{N_i})


Inference

• Finding the optimal y* such that P(Y=y*|X) is maximum.

• Search space is exponential.

• Exponential algorithm - simulated annealing (SA)

• Greedy algorithm – iterated conditional modes (ICM)

• There are other more advanced graph cut based strategies.


Sampling and Simulated Annealing

• Sampling
– A way to generate random samples from a (potentially very complicated) probability distribution.
– Gibbs/Metropolis.

• Simulated annealing
– A schedule for modifying the probability distribution so that, at “zero temperature”, you draw samples only from the MAP solution.

• If you can find the right cooling schedule, the algorithm will converge to a global MAP solution.

• Flip side: it is SLOW, and finding the correct schedule is non-trivial.
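A minimal sketch of annealed Gibbs sampling for a binary labeling with the Ising-style energy E(y) = -β Σ_{i~j} y_i y_j - h Σ_i x_i y_i (the energy, the schedule, and the data are illustrative assumptions, not the lecture's code):

import numpy as np

def simulated_annealing(x_obs, beta=1.0, h=1.0, n_sweeps=60, T0=4.0, cool=0.95, seed=0):
    """Annealed Gibbs sampling: resample each site from its local conditional while lowering T."""
    rng = np.random.default_rng(seed)
    y = np.where(x_obs > 0, 1, -1)                       # initialize the labeling from the data
    R, C = y.shape
    for sweep in range(n_sweeps):
        T = T0 * cool ** sweep                           # geometric cooling schedule
        for i in range(R):
            for j in range(C):
                nb = sum(y[a, b] for a, b in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                         if 0 <= a < R and 0 <= b < C)   # sum of 4-connected neighbors
                dE = 2 * beta * nb + 2 * h * x_obs[i, j]  # E(y_ij = -1) - E(y_ij = +1)
                p_plus = 1.0 / (1.0 + np.exp(-dE / T))    # probability of y_ij = +1 at temperature T
                y[i, j] = 1 if rng.uniform() < p_plus else -1
    return y

x = np.where(np.random.default_rng(1).normal(size=(16, 16)) > 0, 1, -1)  # made-up noisy labels
print(simulated_annealing(x))

With a slow enough schedule the samples concentrate on the MAP labeling; with the fast geometric schedule used here the result is only an approximation.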


Iterated Conditional Modes

• Greedy strategy, fast convergence

• Idea is to maximize the local conditional probabilities iteratively, given an initial solution.

• Simulated annealing with T = 0.
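A matching ICM sketch for the same illustrative energy as above (an assumption, not the lecture's code): each site is set greedily to the label with the highest local conditional probability, and sweeps repeat until no site changes.

import numpy as np

def icm(x_obs, beta=1.0, h=1.0, max_sweeps=20):
    """Iterated conditional modes for E(y) = -beta*sum_{i~j} y_i*y_j - h*sum_i x_i*y_i, labels +/-1."""
    y = np.where(x_obs > 0, 1, -1)                       # initial solution
    R, C = y.shape
    for _ in range(max_sweeps):
        changed = False
        for i in range(R):
            for j in range(C):
                nb = sum(y[a, b] for a, b in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                         if 0 <= a < R and 0 <= b < C)
                best = 1 if beta * nb + h * x_obs[i, j] >= 0 else -1   # label minimizing local energy
                if best != y[i, j]:
                    y[i, j] = best
                    changed = True
        if not changed:                                  # converged: a full sweep changed nothing
            break
    return y

x = np.where(np.random.default_rng(2).normal(size=(16, 16)) > 0, 1, -1)
print(icm(x))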


Parameter Learning

• Supervised learning (easiest case)

• Maximum likelihood: θ* = argmax_θ P(f | θ)

• For an MRF: P(f | θ) = (1/Z) exp(-U(f | θ)/T)


Pseudo Likelihood

• So we approximate

• Large lattice theorem: in the large-lattice limit (M → ∞), the PL estimate converges to the ML estimate.

• It turns out that a local learning method like pseudo-likelihood, combined with a local inference method such as ICM, does quite well, giving close-to-optimal results.

PL(f) = ∏_{i∈S} P(f_i | f_{N_i}) = ∏_{i∈S} e^{-U(f_i, f_{N_i})} / Σ_{f_i′∈L} e^{-U(f_i′, f_{N_i})}

where U(f) = Σ_i U(f_i, f_{N_i})
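A small sketch that evaluates this product for the Ising-style local energy U(f_i, f_{N_i}) = -β f_i Σ_{j∈N_i} f_j used earlier (the energy and the labelings are illustrative assumptions):

import numpy as np

def log_pseudo_likelihood(f, beta=1.0):
    """log PL(f) = sum_i [ -U(f_i, f_Ni) - log sum_{f' in L} exp(-U(f', f_Ni)) ] for labels {-1, +1}."""
    R, C = f.shape
    total = 0.0
    for i in range(R):
        for j in range(C):
            nb = sum(f[a, b] for a, b in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                     if 0 <= a < R and 0 <= b < C)
            u = {s: -beta * s * nb for s in (-1, 1)}     # local energy for each candidate label
            log_z = np.logaddexp(-u[-1], -u[1])          # local normalizer over the label set
            total += -u[f[i, j]] - log_z
    return total

f = np.ones((8, 8), dtype=int)                           # a perfectly smooth labeling
print(log_pseudo_likelihood(f, beta=0.5))
# Pseudo-likelihood learning would maximize this quantity over beta given observed labelings.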
