Restricted Boltzmann Machine (RBM) presentation of fundamental theory
Upload: seongwon-hwang
Category: Data & Analytics

Transcript of Restricted Boltzmann Machine (RBM) presentation of fundamental theory
M&S
Restricted Boltzmann Machine - Theory -
Seongwon Hwang
Energy Based Model
1. Scalar Function

Example: projectile motion with initial speed V_0 at angle θ.
V_x = V_0 cos θ
V_y = V_0 sin θ − gt
V = (V_0 cos θ) i + (V_0 sin θ − gt) j

E = mgh + (1/2)mv^2
*Total Energy = Potential + Kinetic Energy
2. Principle of Minimum Energy

Principle of Maximum Entropy: equilibrium at fixed internal energy.
Principle of Minimum Energy: equilibrium at fixed entropy.

[Figure: curve of E versus S, marking stable (equilibrium) and unstable points]
In Neural Networks

Supervised Model
E(W_ij, x_i, y_j)
x_i: input variables, y_j: output variables
Energy = − Correlation

Unsupervised Model
E(W_ij, x_i)
x_i: input variables
Energy with input variables = − Correlation
In Neural Networks

Unsupervised Model with Hidden Units
E(W_ij, v_i, h_j)
v_i: visible variables, h_j: hidden variables
Energy = − Correlation

Energy ↔ Correlation
In Neural Networks

Learning in Unsupervised Model
Before learning, the minimum of the energy E(W, x) over x need not lie at the data point x_data; learning updates the weights W → W′ so that min_x E(W′, x) is reached at x = x_data.
How do we define energy in a neural network?

Hopfield Neural Network
Two Constraints
1. Symmetric weights between neurons: W_ij = W_ji
2. Asynchronous updating is required to reach a stable state

[Figure: three-node network (x_1, x_2, x_3); in each step one randomly chosen node is activated and updated]
Energy Defined by Hopfield

E = − Σ_{i<j} x_i x_j w_ij,   x_i ∈ {0, 1}

[Figure: five-node network (x_1, ..., x_5) with weighted edges]
Example for Intuition

x_i ∈ {0, 1}

Configuration 1: x_1 = 1, x_2 = 1, x_3 = 1, x_4 = 0, x_5 = 0  →  E = −6
Configuration 2: x_1 = 0, x_2 = 1, x_3 = 1, x_4 = 0, x_5 = 1  →  E = −7
...

[Figure: the five-node network with its edge weights]
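The slide's energy bookkeeping can be sketched in a few lines of NumPy. This is a minimal illustration: the weight matrix below is a hypothetical stand-in, since the figure's actual edge weights are not recoverable from the transcript.

```python
import numpy as np

# Hypothetical symmetric weights for a 5-node network (the figure's actual
# edge weights are not recoverable from the transcript, so these are assumed).
W = np.array([
    [0, 2, 0, 0, 3],
    [2, 0, 1, 0, 0],
    [0, 1, 0, 4, 0],
    [0, 0, 4, 0, 2],
    [3, 0, 0, 2, 0],
], dtype=float)

def hopfield_energy(x, W):
    """E = -sum_{i<j} x_i x_j w_ij for binary states x_i in {0, 1}."""
    return -0.5 * x @ W @ x  # the quadratic form counts each pair twice

x1 = np.array([1., 1., 1., 0., 0.])  # configuration 1
x2 = np.array([0., 1., 1., 0., 1.])  # configuration 2
print(hopfield_energy(x1, W), hopfield_energy(x2, W))  # -3.0 -1.0
```

With the slide's real weights the two configurations would give the E values shown above; the point is only that different binary configurations land at different energies.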
Application - Data Storage

A data pattern, e.g. (x_1, ..., x_5) = (1, 1, 1, 0, 0) with x_i ∈ {0, 1}, is stored as a low-energy stable state of the network.

[Figure: the five-node network with the pattern (1, 1, 1, 0, 0) written on its nodes]
Learning in a Hopfield Network

E = − Σ_{i<j} x_i x_j w_ij,   x_i ∈ {0, 1}

Given several data patterns, perform a weight update w_ij ← w_ij + Δw_ij for each weight.

[Figure: five-node network with weights w_12, w_13, w_15, w_23, w_34, w_35, w_45]
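The weight update above can be sketched with the classic Hebbian storage rule. This is a minimal illustration, assuming the common ±1 state convention (the slide's {0, 1} states are mapped to ±1 here), not necessarily the presentation's exact procedure.

```python
import numpy as np

def hebbian_store(patterns):
    """Build Hopfield weights via the Hebbian rule w_ij += s_i s_j,
    summed over the dataset, using +/-1 states (an assumed convention)."""
    S = 2.0 * np.asarray(patterns, dtype=float) - 1.0  # {0,1} -> {-1,+1}
    W = S.T @ S                                        # sum of outer products
    np.fill_diagonal(W, 0.0)                           # no self-connections
    return W

def energy(x01, W):
    s = 2.0 * np.asarray(x01, dtype=float) - 1.0
    return -0.5 * s @ W @ s

W = hebbian_store([[1, 1, 1, 0, 0]])
print(energy([1, 1, 1, 0, 0], W))  # stored pattern: -10.0 (a stable minimum)
print(energy([1, 0, 1, 0, 1], W))  # unstored pattern: 2.0 (higher energy)
```

Storing a pattern carves an energy minimum at that pattern, which is exactly the "data store" behavior shown two slides earlier.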
Boltzmann Machine
Overview

Hopfield Neural Network: Energy ↔ Correlation
Boltzmann Machine: Probability ↔ Correlation
Overview

Boltzmann Machine: Probability ↔ Correlation

Boltzmann distribution over configurations v_i with energy E(v_i):
P(v_i) = e^(−E(v_i)) / Σ_j e^(−E(v_j))
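The Boltzmann distribution above is just a softmax of negative energies; a minimal sketch (the example energies are hypothetical):

```python
import numpy as np

def boltzmann(energies):
    """P(v_i) = exp(-E(v_i)) / sum_j exp(-E(v_j)).
    Shifting by the minimum energy keeps the exponentials numerically stable."""
    e = np.asarray(energies, dtype=float)
    w = np.exp(-(e - e.min()))
    return w / w.sum()

p = boltzmann([-7.0, -6.0, 0.0])  # hypothetical configuration energies
print(p)  # the lowest-energy configuration gets the highest probability
```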
Thermal Physics behind the Boltzmann Distribution
Macrostate vs. Microstate

Example: tossing three coins. The microstates are the individual outcomes HHH, HHT, HTH, HTT, THH, THT, TTH, TTT: 8 microstates in total. A microstate specifies every detail of the system (in physics: each particle's position, velocity, ...).

The macrostates are the numbers of heads, 0 through 3: 4 macrostates in total. For instance, the macrostate "1 head" contains 3 microstates (HTT, THT, TTH), while "3 heads" contains only 1 (HHH). A macrostate specifies only bulk quantities (in physics: temperature, pressure, ...).
Canonical Ensemble (NVT Ensemble)

An ensemble of microstates with N, V, T fixed:
(N, V, T, E_0), (N, V, T, E_1), ...
Boltzmann Distribution

P(E_i) = e^(−E_i / k_B T) / Σ_j e^(−E_j / k_B T)

With N particles in total and N_i particles in the i-th microstate, the number of cases is
W = N! / (N_0! N_1! N_2! ...),   S = k ln W

Maximum entropy! The Boltzmann distribution is the occupancy that maximizes the number of cases W.
Example occupancies (N_0, N_1, N_2, ...): (N, 0, 0, 0, ...), (N−2, 2, 0, 0, ...), (N−3, 2, 1, 0, ...), ...
Boltzmann Distribution

P(E_i) = e^(−E_i / k_B T) / Σ_j e^(−E_j / k_B T)

[Figure: plot of P(E_i) falling off with E_i, and grids of occupation numbers over energy levels 0 to 7, contrasting a uniform occupancy with a Boltzmann-like one]
Intuition for the Connection between Physics and Networks

As the energy changes: changes of molecular structure (Physics) ↔ changes of the configuration of the network (Network).

[Figure: grid of occupation numbers over energy levels 0 to 7]
Helmholtz Free Energy

F = −k_B T ln Z = −k_B T ln Σ_j e^(−βE_j)

= the free energy associated with the canonical ensemble
Overview

Probability:  P(v_i) = e^(−E(v_i)) / Σ_j e^(−E(v_j))

Configurations of N-dimensional binary data, e.g. v_1 = (0, 1, 0, 1, ...), v_2 = (1, 1, 0, 1, ...), ...
There are 2^N possible configurations.

[Figure: visible units v_1 ... v_7]
Overview

Probability:  P(v, h) = e^(−E(v, h)) / Σ_{k,l} e^(−E(v_k, h_l))

[Figure: visible units v_1 ... v_4 connected with hidden units h_1 ... h_3]
Restricted Boltzmann Machine
Restriction – NO connections within H and within V, respectively

Boltzmann Machine → Restricted Boltzmann Machine

[Figure: left, a Boltzmann machine with connections inside each layer; right, a restricted Boltzmann machine, a bipartite graph between v_1 ... v_4 and h_1 ... h_3]
Restriction – NO connections within H and within V, respectively

Restricted Boltzmann Machine → Conditionally Independent!

General form: P(A, B | C) = P(A | C) P(B | C)

P(h_1, h_2 | v) = P(h_1 | v) P(h_2 | v)
P(h | v) = Π_j P(h_j | v)
P(v | h) = Π_i P(v_i | h)

[Figure: bipartite graph between v_1 ... v_4 and h_1 ... h_3]
Energy from the Hopfield Network

E(v, h) = − Σ_i Σ_j w_ij v_i h_j − Σ_i b_i v_i − Σ_j c_j h_j

where b is the bias of v and c is the bias of h.

P(v, h) = e^(−E(v, h)) / Σ_{v,h} e^(−E(v, h))
P(v) = Σ_h e^(−E(v, h)) / Σ_{v,h} e^(−E(v, h))

[Figure: bipartite graph between v_1 ... v_4 and h_1 ... h_3]
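The energy function can be evaluated directly; a minimal sketch with small hypothetical parameters (the sizes match the 4-visible, 3-hidden figure):

```python
import numpy as np

def rbm_energy(v, h, W, b, c):
    """E(v, h) = -sum_ij w_ij v_i h_j - sum_i b_i v_i - sum_j c_j h_j."""
    return -(v @ W @ h) - b @ v - c @ h

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4, 3))  # hypothetical weights
b = np.zeros(4)                         # visible (v) bias
c = np.zeros(3)                         # hidden (h) bias

v = np.array([1., 0., 1., 1.])
h = np.array([1., 0., 1.])
print(rbm_energy(v, h, W, b, c))
```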
Two Important Conditional Probabilities! – First

P(h_j = 1 | v) = σ(Σ_i w_ij v_i + c_j),   where σ(x) = 1 / (1 + e^(−x))

[Figure: bipartite graph between v_1 ... v_4 and h_1 ... h_3]
Two Important Conditional Probabilities! – Second

P(v_i = 1 | h) = σ(Σ_j w_ij h_j + b_i)

[Figure: bipartite graph between v_1 ... v_4 and h_1 ... h_3]
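Thanks to conditional independence, both conditionals can be computed for all units at once. A minimal sketch; the zero parameters are placeholders, chosen so every unit comes out at probability 0.5:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def p_h_given_v(v, W, c):
    """P(h_j = 1 | v) = sigma(sum_i w_ij v_i + c_j), all hidden units at once."""
    return sigmoid(v @ W + c)

def p_v_given_h(h, W, b):
    """P(v_i = 1 | h) = sigma(sum_j w_ij h_j + b_i), all visible units at once."""
    return sigmoid(W @ h + b)

W = np.zeros((4, 3))            # placeholder parameters
b, c = np.zeros(4), np.zeros(3)
v = np.array([1., 0., 1., 1.])
print(p_h_given_v(v, W, c))     # [0.5 0.5 0.5]
```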
Generative vs. Discriminative Model

<Generative Model>  P(x | y) or P(x, y)
Ex) Gaussians, Sigmoid Belief Networks, Bayesian Networks; the RBM belongs here.

<Discriminative Model>  P(y | x)
Ex) Neural Network, Logistic Regression, Support Vector Machine
Maximum Likelihood Estimator

Maximize the likelihood of the observed samples in order to estimate the unobserved parameters of the population.

[Figure: population vs. sample]
Maximum Likelihood Estimator

Ex) What is the probability p that the coin lands heads, given the observations H, H, T?

L(θ) = P(x | θ) = p · p · (1 − p) = p^2 (1 − p)
dL/dp = 2p − 3p^2 = 0   →   p = 2/3
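The p = 2/3 result can be checked numerically by scanning the likelihood on a grid:

```python
import numpy as np

# Likelihood of observing H, H, T as a function of p = P(heads):
# L(p) = p^2 (1 - p); the maximum should sit at p = 2/3.
p = np.linspace(0.0, 1.0, 100001)
L = p**2 * (1.0 - p)
p_hat = p[L.argmax()]
print(p_hat)  # ~0.6667
```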
Learning in RBM

Cost = Negative Log-Likelihood (NLL)

NLL(θ | v) = −ln P(v | θ) = −ln Σ_h e^(−E(v, h)) + ln Σ_{v,h} e^(−E(v, h))

∂NLL(θ | v)/∂θ = ⟨∂E(v, h)/∂θ⟩_data − ⟨∂E(v, h)/∂θ⟩_model

Gradient descent for NLL; ⟨...⟩ denotes an expectation.
The data term is the positive phase and the model term is the negative phase; each is a free-energy term.
Learning in RBM

∂NLL(θ | v)/∂θ = ⟨∂E(v, h)/∂θ⟩_data − ⟨∂E(v, h)/∂θ⟩_model

The data term is easy to compute! With v clamped to the data, e.g. v = (1, 0, 1, 1), the hidden units h_j ∈ {0, 1} follow directly from the conditional probabilities.

[Figure: hidden units h_1 ... h_3 above the clamped data vector]
Learning in RBM

∂NLL(θ | v)/∂θ = ⟨∂E(v, h)/∂θ⟩_data − ⟨∂E(v, h)/∂θ⟩_model

The model term is hard to compute! With n visible units v_i ∈ {0, 1} and m hidden units h_j ∈ {0, 1}, the total number of possible configurations is 2^(n+m).
Markov Chain Monte Carlo (MCMC)

1. Markov Chain: in a first-order Markov chain the next state depends only on the immediately preceding one; in a second- or higher-order chain it depends on two or more preceding states.
Markov Chain Monte Carlo (MCMC)

2. Monte Carlo: compute a value statistically by using random numbers.

Ex) Estimating the circular constant π. Sample points uniformly in the unit square and evaluate whether x^2 + y^2 ≤ 1.

π/4 ≈ (number of samples in the circle) / (total number of samples)
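The π example can be run directly; a minimal sketch sampling the unit square:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1_000_000
x, y = rng.random(n), rng.random(n)            # uniform samples in the unit square
inside = np.count_nonzero(x**2 + y**2 <= 1.0)  # samples inside the quarter circle
pi_estimate = 4.0 * inside / n
print(pi_estimate)  # close to 3.14159
```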
Gibbs Sampling

- Algorithm -
1. Set up initial values randomly.
2. Sample each variable from its conditional distribution p(x_i | x_−i).
3. Repeat until the samples reach the stationary distribution.

Multi-dimensional variables x_1, x_2, x_3, ... with joint probability p(x_1, x_2, x_3, ...) or conditional probabilities.

r_0 = (1, 0, 0, 0, 1, 1, 0, ...) → r_1 = (0, 1, 0, 1, 1, 1, 0, ...) → r_2 = (1, 0, 1, 0, 1, 1, 1, ...) → ...
drawn via p(r_1 | r_0), p(r_2 | r_1), ...
k-step Contrastive Divergence (CD_k)

- Characteristics -
1. Use the real data as the initial values of the chain.
2. The k-th sample is taken as the expectation under the desired distribution.
3. k = 1 is enough for convergence, since the real data is used as the initial values.

data → r_1 → r_2 drawn via p(r_1 | r_0), p(r_2 | r_1)   (k = 2)
Learning in RBM (recap)

The model term ⟨∂E(v, h)/∂θ⟩_model is hard to compute directly: the total number of possible configurations is 2^(n+m).
Approximation in RBM

⟨∂E(v, h)/∂θ⟩_model ≈ (1/m) Σ_m f(x_m) ≈ f(x^(k))

MCMC (Gibbs sampling) with CD_k=1: a single short-chain sample x^(k) replaces the full model expectation.
Sampling Algorithm in RBM

1st Step: use real data as the initial value. Clamp the visible units to an input data vector, e.g. v = (1, 0, 1, 1), v_i ∈ {0, 1}.

2nd Step: sample each hidden unit h_j ∈ {0, 1} from its conditional probability, starting from the initial values:
P(h_j = 1 | v) = σ(Σ_i w_ij v_i + c_j)
e.g. sampling h_1, h_2, h_3 in turn yields h = (1, 0, 1).
Sampling Algorithm in RBM

3rd Step: sample each input unit v_i ∈ {0, 1} from its conditional probability, starting from the sampled hidden units:
P(v_i = 1 | h) = σ(Σ_j w_ij h_j + b_i)
e.g. sampling v_1, ..., v_4 in turn yields the reconstruction v = (0, 0, 1, 0).

Reconstruction! Generative Model! One full v → h → v pass is CD_k=1.
Sampling Algorithm in RBM

4th Step: perform the pass k times (CD_k):

(v, h) at t = 0 → (v, h) at t = 1 → ... → (v, h) at t = ∞ ≈ k

Data (t = 0) → Model (t → ∞)
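The four steps above can be sketched as one reusable Gibbs step; the parameters here are hypothetical placeholders:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(v, W, b, c, rng):
    """One v -> h -> v pass (the chain behind CD_k):
    sample h from P(h|v), then reconstruct v from P(v|h)."""
    h = (rng.random(c.size) < sigmoid(v @ W + c)).astype(float)
    v_recon = (rng.random(b.size) < sigmoid(W @ h + b)).astype(float)
    return h, v_recon

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4, 3))  # hypothetical weights
b, c = np.zeros(4), np.zeros(3)

v0 = np.array([1., 0., 1., 1.])         # 1st step: data as the initial value
h0, v1 = gibbs_step(v0, W, b, c, rng)   # 2nd + 3rd steps: one reconstruction
print(h0, v1)                           # binary samples
```

Calling `gibbs_step` k times in a row gives the t = 0 → t = 1 → ... chain of the 4th step.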
Learning in RBM

∂NLL(θ | v)/∂θ = ⟨∂E(v, h)/∂θ⟩_data − ⟨∂E(v, h)/∂θ⟩_model,   θ = {w, b, c}

E(v, h) = − Σ_i Σ_j w_ij v_i h_j − Σ_i b_i v_i − Σ_j c_j h_j

∂E(v, h)/∂w_ij = −v_i h_j
∂E(v, h)/∂b_i = −v_i
∂E(v, h)/∂c_j = −h_j
Learning in RBM

Gradient descent for NLL, θ = {w, b, c}:

Δw_ij = η_w (⟨v_i h_j⟩_data − ⟨v_i h_j⟩_model)
Δb_i = η_b (⟨v_i⟩_data − ⟨v_i⟩_model)
Δc_j = η_c (⟨h_j⟩_data − ⟨h_j⟩_model)
Learning in RBM

The expectations follow from the conditional probabilities:
⟨v_i h_j⟩ = σ(Σ_i w_ij v_i + c_j) · v_i
⟨h_j⟩ = σ(Σ_i w_ij v_i + c_j)
⟨v_i⟩ = v_i
Learning in RBM

w_ij^(t+1) = w_ij^t + Δw_ij
b_i^(t+1) = b_i^t + Δb_i
c_j^(t+1) = c_j^t + Δc_j
Learning in RBM

With the data vector v (Data) and the k-th sample v^(k) (Model):

w_ij^(t+1) = w_ij^t + η_w (σ(Σ_i w_ij v_i + c_j) v_i − σ(Σ_i w_ij v_i^(k) + c_j) v_i^(k))
b_i^(t+1) = b_i^t + η_b (v_i − v_i^(k))
c_j^(t+1) = c_j^t + η_c (σ(Σ_i w_ij v_i + c_j) − σ(Σ_i w_ij v_i^(k) + c_j))
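Put together, the update equations give a compact CD_1 training step. This is a minimal sketch fitting a single hypothetical data vector, with placeholder sizes and learning rate:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v_data, W, b, c, rng, lr=0.1):
    """One CD_1 update: positive statistics from the data,
    negative statistics from the one-step reconstruction v^(k)."""
    ph_data = sigmoid(v_data @ W + c)                   # P(h=1 | v_data)
    h = (rng.random(c.size) < ph_data).astype(float)    # sampled hidden units
    v_model = (rng.random(b.size)
               < sigmoid(W @ h + b)).astype(float)      # reconstruction v^(k)
    ph_model = sigmoid(v_model @ W + c)                 # P(h=1 | v^(k))
    W += lr * (np.outer(v_data, ph_data) - np.outer(v_model, ph_model))
    b += lr * (v_data - v_model)
    c += lr * (ph_data - ph_model)
    return W, b, c

rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(4, 3))  # hypothetical initial weights
b, c = np.zeros(4), np.zeros(3)
v = np.array([1., 0., 1., 1.])           # single hypothetical data vector
for _ in range(100):
    W, b, c = cd1_update(v, W, b, c, rng)
print(b)  # visible biases drift toward the data pattern
```

Note the design choice of using the probabilities ph_data and ph_model (rather than sampled h values) in the weight statistics, a common variance-reduction choice consistent with the σ(...) terms in the update equations above.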
Intuition for RBM

Cost = Negative Log-Likelihood (NLL)
NLL(θ | v) = −ln Σ_h e^(−E(v, h)) + ln Σ_{v,h} e^(−E(v, h))

On the energy surface over global configurations, the data term −E(v, h) pushes the energy down at the datapoint + hidden(datapoint), while the model term +E(v, h) pushes it up at the reconstruction + hidden(reconstruction), obtained by sampling.
Intuition for RBM

[Figure: the energy surface over global configurations; sampling moves from a datapoint along the sampling direction toward the global minimum of the surface]
Intuition for RBM

Sampling direction: starting from a datapoint, successive samples move toward the global minimum of the energy surface.

P(v_i) = e^(−E(v_i)) / Σ_j e^(−E(v_j)) : the probability of the i-th configuration relative to the overall configurations.
Intuition for RBM

Sampling direction: alternating Gibbs steps, starting at t = 0 and moving toward the global minimum:

P(h_j = 1 | v) = σ(Σ_i w_ij v_i + c_j)
P(v_i = 1 | h) = σ(Σ_j w_ij h_j + b_i)

Boltzmann distribution:
P(v, h) = e^(−E(v, h)) / Σ_{k,l} e^(−E(v_k, h_l)),   so P(E_i) falls as the energy E_i rises.

[Figure: chain (v, h) at t = 0 → (v, h) at t = 1 descending the energy surface]
Intuition for RBM

Continuing the chain (v, h) at t = 0 → t = 1 → ... → t = ∞, sampling approaches the global minimum of the energy surface over global configurations.
Intuition for RBM

PCD vs. CD

Contrastive Divergence (CD): every update restarts the chain from the data, so the negative samples tend to stay near the datapoint instead of reaching the global minimum.

Persistent Contrastive Divergence (PCD): each update continues the chain from the previous sample point, so the negative samples can travel all the way to the global minimum.

The winner is PCD!
Practice

[Figure: input data and its reconstruction after the 1st epoch]

[Figure: reconstructions after the 11th and 61st epochs]
In Reality – Unsupervised Pretraining

[Figure: a deep network with visible layer (v_1, v_2, v_3, ...), two hidden layers (h_1, h_2, ...), and an output layer (y_1, y_2, ...); the RBM layers are trained first. Pretraining!]
Thank you!