Simulation of rare events and optimisation with the cross-entropy method
What is cross-entropy?From Riemann to Monte-Carlo
Cross-Entropy techniquesCross-Entropy tricks
Questions
Using cross-entropy techniques for rare event
simulation and optimization
Arthur Breitman
NYC Machine learning meetup
August 18, 2011
Arthur Breitman crossentropy for rare event simulation and optimization
Outline
What is cross-entropy?
    Entropy
    Kullback-Leibler divergence
From Riemann to Monte-Carlo
    Riemann integration
    Monte-Carlo integration
    Importance sampling
Cross-Entropy techniques
    Analytical expressions
    Simulation of rare events
    Optimization
    Fitting parameters
Cross-Entropy tricks
    Multiple maxima
    Slow convergence
Questions
Information entropy
definition of information entropy
Entropy measures disorder of a physical system
Entropy measures information (Shannon)
Entropy measures ignorance (E.T. Jaynes)
Formally:
$$H = -\sum_{x \in \Omega} p(x) \ln p(x)$$
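As a small illustration of the formula, the sketch below computes the entropy in nats of a few discrete distributions. The function name and the example distributions are chosen here for illustration only.

```python
import math

def shannon_entropy(probs):
    """Entropy in nats of a discrete distribution given as a list of probabilities."""
    # Terms with p = 0 contribute nothing (p ln p -> 0 as p -> 0).
    return -sum(p * math.log(p) for p in probs if p > 0)

fair_coin = shannon_entropy([0.5, 0.5])    # ln 2 nats
biased_coin = shannon_entropy([0.9, 0.1])  # less uncertain, so lower entropy
certain = shannon_entropy([1.0, 0.0])      # no uncertainty at all
```

The fair coin is the most uncertain of the three, and the certain outcome carries no entropy.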
The continuous case
In the continuous case, for a random variable X with p.d.f. p(x), entropy is defined as
$$H(X) = -\int_\Omega p(x) \ln p(x) \, dx$$
Simple, right?
The entropy of a probability distribution is meaningless
Wrong!
Not invariant under a change of variable
Can even be negative!
Not an extension of Shannon’s entropy.
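A quick numerical check of the first two objections, as a sketch: for the uniform density on [0, w] the integral above equals ln w, which is negative for w < 1 and changes when the same quantity is expressed in another unit. The helper name and the metre/millimetre example are assumptions made for this illustration.

```python
import math

def uniform_differential_entropy(width, n=10_000):
    """Midpoint-rule estimate of -∫ p ln p dx for the uniform density on [0, width]."""
    p = 1.0 / width
    dx = width / n
    return -sum(p * math.log(p) * dx for _ in range(n))

h_half = uniform_differential_entropy(0.5)            # close to ln 0.5, negative
h_metres = uniform_differential_entropy(2.0)          # X uniform on [0, 2], in metres
h_millimetres = uniform_differential_entropy(2000.0)  # the same X, in millimetres
```

Rescaling the variable by 1000 shifts the value by ln 1000, although nothing about our knowledge of X has changed.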
E.T. Jaynes to the rescue
E.T. Jaynes adjusted the definition. Consider a sequence of discrete points in Ω that becomes dense in Ω; its limiting density must approach a distribution m. Set
$$H(X) = -\int_\Omega p(x) \ln\left(\frac{p(x)}{m(x)}\right) dx$$
N.B. m is not necessarily a probability distribution, just a density, so improper priors are O.K.
Definition of KL divergence
Kullback-Leibler divergence: entropy of a probability distribution P relative to a probability distribution Q
$$D_{KL}(P \| Q) = \int_\Omega p(x) \ln\left(\frac{p(x)}{q(x)}\right) dx$$
Similar to but distinct from entropy: with this sign it is never negative.
Expected number of extra nats (or bits) needed to encode data drawn from P with a code built for Q.
Not symmetric!
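The asymmetry is easy to see on a toy discrete case. The sketch below uses the usual non-negative convention, D_KL(P || Q) = Σ p ln(p/q); the two distributions are made up for the example.

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) in nats for discrete distributions on the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.4, 0.1]
q = [0.8, 0.1, 0.1]
d_pq = kl_divergence(p, q)  # cost of coding P-data with a code built for Q
d_qp = kl_divergence(q, p)  # cost of coding Q-data with a code built for P
d_pp = kl_divergence(p, p)  # zero: no penalty for using the right code
```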
Why code length matters
All ML problems ⇔ fitting a probability distribution
KL divergence measures how concise your description is
Relates to MDL and Solomonoff induction
PAC-learning patches against a lack of epistemology
Likelihood of parameters and Cross-Entropy
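Given a sample q_1, ..., q_N of Q and a family P_θ, θ ∈ Θ, let \hat{Q} = \frac{1}{N} \sum_i \delta_{q_i} be the empirical distribution (a Dirac comb) with density \hat{q}. Then
$$\frac{1}{N} LL(\theta \mid q_1, \dots, q_N) = \frac{1}{N} \sum_{i=1}^{N} \ln p_\theta(q_i) = \int_\Omega \hat{q}(x) \ln p_\theta(x) \, dx$$
The average log-likelihood of θ is minus the cross-entropy of the Dirac comb relative to P_θ, so maximizing the likelihood makes P_θ as close as possible, in the Kullback-Leibler sense, to the Dirac comb.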
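To make the link between likelihood and cross-entropy concrete, here is a minimal sketch on a categorical model. The sample and the candidate models are invented for the illustration; the point is only that the empirical frequencies, which minimize cross-entropy to the sample, also give the highest average log-likelihood.

```python
import math
from collections import Counter

sample = ["a", "b", "a", "c", "a", "b", "a", "a"]

def average_log_likelihood(model, data):
    """(1/N) * sum_i ln p_theta(q_i): minus the cross-entropy of the data w.r.t. the model."""
    return sum(math.log(model[x]) for x in data) / len(data)

counts = Counter(sample)
empirical = {x: c / len(sample) for x, c in counts.items()}  # maximum-likelihood fit
uniform = {x: 1 / 3 for x in counts}
skewed = {"a": 0.2, "b": 0.3, "c": 0.5}

ll_empirical = average_log_likelihood(empirical, sample)
ll_uniform = average_log_likelihood(uniform, sample)
ll_skewed = average_log_likelihood(skewed, sample)
```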
Riemann integration
How does one compute the integral of a function? Rectangle method:
$$\int_a^b f(x) \, dx \approx \frac{b - a}{N} \sum_{i=0}^{N-1} f\left(a + \frac{(b - a) i}{N}\right)$$
Linear convergence: the error is O(1/N).
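A minimal sketch of the rectangle method, using the left endpoint of each slice. The test integrand (sin on [0, 1]) is an arbitrary choice; it shows the error shrinking roughly tenfold when N grows tenfold.

```python
import math

def rectangle_rule(f, a, b, n):
    """Left-rectangle approximation of the integral of f over [a, b] with n slices."""
    width = (b - a) / n
    return width * sum(f(a + i * width) for i in range(n))

exact = 1.0 - math.cos(1.0)  # integral of sin over [0, 1]
err_100 = abs(rectangle_rule(math.sin, 0.0, 1.0, 100) - exact)
err_1000 = abs(rectangle_rule(math.sin, 0.0, 1.0, 1000) - exact)
```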
The curse of dimensionality
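Multiple dimensions?
$$\int_{a_1}^{b_1} \cdots \int_{a_m}^{b_m} f(x) \, dx \approx \frac{\prod_{j=1}^{m} (b_j - a_j)}{N^m} \sum_{i_1=0}^{N-1} \cdots \sum_{i_m=0}^{N-1} f\left(a + \frac{1}{N} \, i \odot (b - a)\right)$$
where i = (i_1, ..., i_m) and ⊙ is the componentwise product.
Computation is exponential in m: the grid has N^m points.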
Monte-Carlo integration
If P is a probability distribution over Ω with density p, draw x_i from P:
$$\int_\Omega f(x) \, dx \approx \frac{1}{N} \sum_{i=1}^{N} \frac{f(x_i)}{p(x_i)}$$
Very simple to implement; often p is simply uniform (p ∝ 1).
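The estimator can be sketched in a few lines for the uniform case, where p(x) = 1/(b − a). The integrand, the sample size and the seed are arbitrary choices for the example.

```python
import math
import random

def monte_carlo_integral(f, a, b, n, rng):
    """Estimate the integral of f over [a, b] by sampling uniformly, p(x) = 1 / (b - a)."""
    p = 1.0 / (b - a)
    return sum(f(rng.uniform(a, b)) / p for _ in range(n)) / n

rng = random.Random(0)
# The exact value of the integral of sin over [0, pi] is 2.
estimate = monte_carlo_integral(math.sin, 0.0, math.pi, 100_000, rng)
```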
Monte-Carlo convergence
Let the random variable X_p = f(x)/p(x)
If var(X_p) < ∞, the error is O(N^{-1/2}) by the central limit theorem!
A grid with the same N evaluations in m dimensions has error O(N^{-1/m}), so if m > 2, Monte-Carlo becomes attractive.
Problems with MC
If the mass of f is concentrated in a small region, convergence can be very slow.
Also a problem with Riemann integration...
Importance sampling
Sample preferably the regions of interest by picking p to minimize the variance of f/p
In Riemann world, equivalent to an irregular grid
Ideal sampling distribution (if f > 0) is f / ∫f, but we don't know ∫f!
Best convergence when the χ² divergence of f w.r.t. p is minimized
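As an illustration, the sketch below estimates the tail probability P(X > 4) for a standard normal X, about 3.17e-5, by sampling from a normal centred on the threshold and reweighting each hit by the likelihood ratio. The choice of a shifted normal as sampling density is an assumption made for this example, not something the slides prescribe.

```python
import math
import random

def tail_probability_is(threshold, n, rng):
    """Estimate P(X > threshold), X ~ N(0, 1), sampling from N(threshold, 1) instead."""
    total = 0.0
    for _ in range(n):
        x = rng.gauss(threshold, 1.0)
        if x > threshold:
            # Likelihood ratio N(0, 1) / N(threshold, 1) evaluated at x.
            total += math.exp(-threshold * x + 0.5 * threshold ** 2)
    return total / n

rng = random.Random(1)
estimate = tail_probability_is(4.0, 50_000, rng)
exact = 0.5 * math.erfc(4.0 / math.sqrt(2.0))
```

Plain Monte-Carlo with the same 50,000 draws would see one or two hits on average; here about half of the draws land in the region of interest.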
Adaptive importance sampling
What if we don’t know the shape of f ?
Learn it adaptively from the sampling.
Iteratively improve the importance sampling function.
Vegas algorithm and cross-entropy
Vegas algorithm: use histograms and treat the variables as separable
Cross-entropy algorithm: pick p from a family of distributions to minimize cross-entropy to the sample
Why cross-entropy?
In many cases, the expression is analytical and computationally cheap to derive, e.g.
the uniform distribution
the categorical distribution (finite, discrete)
the whole natural exponential family
The natural exponential family?
$$f_X(x \mid \theta) = h(x) \exp\left(\theta^{\top} x - A(\theta)\right)$$
θ is the natural parameter; x is the sufficient statistic
maximum-entropy distribution relative to the base measure dH, given the mean of x
Examples: normal, multivariate normal, gamma, binomial, multinomial, negative binomial
Beta distribution
Not analytical! To fit, start with approximate values from the method of moments:
$$\alpha = \bar{X}\left(\frac{\bar{X}(1-\bar{X})}{S^2} - 1\right), \quad \beta = (1-\bar{X})\left(\frac{\bar{X}(1-\bar{X})}{S^2} - 1\right)$$
The log-likelihood is given by
$$n\left(\ln\Gamma(\alpha+\beta) - \ln\Gamma(\alpha) - \ln\Gamma(\beta)\right) + (\alpha-1)\sum_{i=1}^{n}\ln(X_i) + (\beta-1)\sum_{i=1}^{n}\ln(1-X_i)$$
The first and second derivatives of ln Γ are the digamma and trigamma functions, available in the GSL. Newton's method using the Jacobian of the gradient converges in a couple of iterations. Very useful to model bounded variables.
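The starting point of the fit can be sketched with the standard library alone. The code below computes the method-of-moments values and evaluates the log-likelihood at them; the Newton refinement is left out because it needs digamma and trigamma, which Python's standard library does not provide. The synthetic Beta(2, 5) data and the function names are assumptions for the example.

```python
import math
import random

def beta_method_of_moments(xs):
    """Starting values for (alpha, beta) from the sample mean and variance."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    common = mean * (1.0 - mean) / var - 1.0
    return mean * common, (1.0 - mean) * common

def beta_log_likelihood(alpha, beta, xs):
    """Log-likelihood of a Beta(alpha, beta) model for observations in (0, 1)."""
    n = len(xs)
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    return (n * log_norm
            + (alpha - 1.0) * sum(math.log(x) for x in xs)
            + (beta - 1.0) * sum(math.log(1.0 - x) for x in xs))

rng = random.Random(2)
data = [rng.betavariate(2.0, 5.0) for _ in range(5_000)]  # synthetic sample
alpha0, beta0 = beta_method_of_moments(data)
ll_start = beta_log_likelihood(alpha0, beta0, data)
ll_flat = beta_log_likelihood(1.0, 1.0, data)  # uniform density, a poor guess
```

The moment estimates already land close to the generating parameters, which is why a couple of Newton steps are typically enough afterwards.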
Surviving the zombie hordes
Figure: Electric fences, the horde and you
Simulating zombie breakouts
Each fence (U_i, λ_i) delivers u ∼ max(U_i − Exp(λ_i), 0) volts.
Crossing a fence deals u damage to a zombie
Zombies come from everywhere and can take 5 damage hits each.
Zombie outbreaks are very rare!
Mere integration fails!
We can estimate this probability by sampling the random voltages and finding a shortest path.
The relative error of plain Monte-Carlo with N samples scales as sqrt((1 − p_outbreak) / (N p_outbreak)): too slow!
Cross-Entropy to the rescue
We want to approximate the multivariate power distribution conditional on an outbreak occurring!
Approximate the shape by changing the parameters U_i and λ_i for each fence
Generate samples, fit U_i and λ_i on the samples inducing an outbreak
The elite sample
What if the probability is so low that we don't observe any outbreak in our sample?
Generate n samples using the sampling distribution
If more than e samples are outbreaks, fit to those samples, break
Otherwise, fit on the e best samples, the elite sample.
Iterate
Generate a sample, weight each point by the importance sampling weight, estimate probability
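The loop above can be sketched on a much simpler rare event than the zombie model: P(X > 20) for X exponential with mean 1, which is e^{-20}, about 2.06e-9. The sampling family is the exponential distributions with mean v, for which the fit on a weighted sample is just the weighted mean. Weighting the fit by the likelihood ratio follows the standard cross-entropy method and is an assumption here, as are all names and default sizes.

```python
import math
import random

def ce_rare_event(gamma, n=1_000, elite=100, n_final=10_000, seed=3):
    """Estimate P(X > gamma) for X ~ Exp(mean 1), tilting the mean v of the sampler."""
    rng = random.Random(seed)
    u, v = 1.0, 1.0  # nominal mean and current sampling mean

    def weight(x, v):
        # Likelihood ratio: nominal density over sampling density at x.
        return (v / u) * math.exp(-x * (1.0 / u - 1.0 / v))

    for _ in range(50):
        xs = sorted((rng.expovariate(1.0 / v) for _ in range(n)), reverse=True)
        hits = [x for x in xs if x > gamma]
        done = len(hits) > elite
        fit_on = hits if done else xs[:elite]  # rare-event hits, else the elite sample
        ws = [weight(x, v) for x in fit_on]
        v = sum(w * x for w, x in zip(ws, fit_on)) / sum(ws)  # weighted-mean fit
        if done:
            break

    # Final pass: importance-sampling estimate under the fitted sampler.
    total = 0.0
    for _ in range(n_final):
        x = rng.expovariate(1.0 / v)
        if x > gamma:
            total += weight(x, v)
    return total / n_final, v

estimate, fitted_mean = ce_rare_event(20.0)
exact = math.exp(-20.0)
```

The elite sample lets the sampler climb towards the threshold in a few rounds even though the first round contains no hit at all.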
Other examples
Modeling rare events for any complex probability distribution, e.g. Bayesian networks.
Estimating tails for the sum of fat-tailed distributions
From integration to optimization
Using an elite sample to help convergence is a trick that does a form of hill climbing of a smooth function approximating the indicator function of the rare event.
Interesting even if not interested in integrating f.
Keep iterating based on an elite sample to converge towards one global maximum.
Variance of the sampling distribution follows the curvature of f.
e.g. using a multivariate normal allows the covariance to reflect the differential
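A one-dimensional sketch of the optimization loop: sample from a normal, keep the elite, refit the mean and standard deviation, repeat. The objective, the sample sizes and the small floor added to the standard deviation are all choices made for this illustration.

```python
import random
import statistics

def ce_maximize(f, mean=0.0, std=5.0, n=200, elite=20, iterations=30, seed=4):
    """Maximize f over the reals by refitting a normal sampler to the elite sample."""
    rng = random.Random(seed)
    for _ in range(iterations):
        xs = sorted((rng.gauss(mean, std) for _ in range(n)), key=f, reverse=True)
        best = xs[:elite]
        mean = statistics.fmean(best)
        std = statistics.pstdev(best) + 1e-12  # keep the sampler from degenerating
    return mean, std

objective = lambda x: -(x - 2.0) ** 2
argmax, spread = ce_maximize(objective)
```

The spread of the sampler shrinks as the iterations home in on the maximum, which is the sense in which its variance tracks the curvature of f.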
Combinatorial optimization
One classical example is combinatorial optimization. To solve a TSP with Cross-Entropy:
Assume the travel is a Markov chain on the graph nodes.
Generate travels by coercing them to be permutations.
Update transition probabilities from the elite sample.
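The three steps above can be sketched as follows, on a toy instance of six cities placed on a unit circle (optimal tour length 6). Forcing a permutation by restricting each step to unvisited cities, and blending the elite transition frequencies into the matrix with a smoothing factor, are implementation choices assumed here; all names and sizes are illustrative.

```python
import math
import random

def tour_length(tour, pts):
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def sample_tour(trans, rng):
    """Walk the Markov chain from city 0, never revisiting a city (forces a permutation)."""
    tour, current = [0], 0
    unvisited = list(range(1, len(trans)))
    while unvisited:
        weights = [trans[current][j] for j in unvisited]
        current = rng.choices(unvisited, weights=weights)[0]
        unvisited.remove(current)
        tour.append(current)
    return tour

def ce_tsp(pts, n_tours=200, elite=20, iterations=20, smoothing=0.7, seed=5):
    rng = random.Random(seed)
    n = len(pts)
    # Start from uniform transition probabilities between distinct cities.
    trans = [[0.0 if i == j else 1.0 / (n - 1) for j in range(n)] for i in range(n)]
    best = None
    for _ in range(iterations):
        tours = sorted((sample_tour(trans, rng) for _ in range(n_tours)),
                       key=lambda t: tour_length(t, pts))
        if best is None or tour_length(tours[0], pts) < tour_length(best, pts):
            best = tours[0]
        # Count transitions i -> j in the elite tours and blend them into the matrix.
        counts = [[0.0] * n for _ in range(n)]
        for t in tours[:elite]:
            for i in range(n):
                counts[t[i]][t[(i + 1) % n]] += 1.0 / elite
        trans = [[smoothing * counts[i][j] + (1 - smoothing) * trans[i][j]
                  for j in range(n)] for i in range(n)]
    return best

# Six cities on a unit circle, listed in scrambled order.
angles = [0, 3, 1, 4, 2, 5]
cities = [(math.cos(k * math.pi / 3), math.sin(k * math.pi / 3)) for k in angles]
best_tour = ce_tsp(cities)
best_length = tour_length(best_tour, cities)
```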
Clustering
CE does clustering too!
Assign probabilities of membership to classes for each point (the sampling distribution).
Sample random membership assignments.
Use average distance to centroids to find an elite sample.
Slower than K-means but much less sensitive to initial choice of centroids.
A form of global optimization
Is it global optimization?
If the sampling distribution is bounded below by a distribution that covers the global maximum, yes, with probability 1!
In practice we may never see one maximum and converge to another local maximum.
Fitting model parameters with CE
Cross-Entropy techniques generally work very well for finding ML parameters of a model. Why?
Models often have different sensitivities to different parameters, CE reflects that.
With a covariance structure, it does a form of gradient ascent.
But it can deal with discrete parameters at the same time!
It does not tend to get trapped in local maxima.
Well suited for high-dimensional parameter spaces.
Forgetting maxima
Some maxima can be "forgotten"
Smooth changes in the sampling function.
Expand the sampling function (equivalent to applying a prior or "shrinkage").
Keep the entire sample
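The first two remedies can be sketched as small helpers applied after each fit. The blending factor alpha, the inflation factor and the floor are hypothetical tuning parameters introduced for this illustration, shown here on the (mean, std) of a normal sampler.

```python
def smoothed_update(old, fitted, alpha=0.7):
    """Blend freshly fitted parameters with the previous ones; alpha = 1 means no smoothing."""
    return [alpha * f + (1.0 - alpha) * o for o, f in zip(old, fitted)]

def inflate(std, factor=1.5, floor=1e-3):
    """Widen the sampler so that regions away from the current elite keep being visited."""
    return max(std * factor, floor)

old_params = [0.0, 5.0]     # previous (mean, std) of a normal sampler
fitted_params = [2.0, 0.5]  # (mean, std) fitted on the latest elite sample
mean, std = smoothed_update(old_params, fitted_params)
std = inflate(std)
```

Both helpers slow the collapse of the sampling distribution onto the first maximum it finds, at the price of more iterations.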
Not converging to a maximum
Multiple maxima may prevent the variance of the sampling distribution from decreasing.
Mixtures of multivariate normals can deal with this.
They can be introduced dynamically.
Fit with EM.
Independent variables
If the sampling distribution is separable, convergence can be sped up by sampling over one dimension at a time.
Questions?