Drift Analysis - A Tutorial
-
Upload
per-kristian-lehre -
Category
Education
-
view
4.229 -
download
2
description
Transcript of Drift Analysis - A Tutorial
![Page 1: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/1.jpg)
Drift Analysis - A Tutorial
Per Kristian Lehre
ASAP Research GroupSchool of Computer Science
University of Nottingham, [email protected]
July 6, 2012
![Page 2: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/2.jpg)
![Page 3: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/3.jpg)
What is Drift Analysis?
I Prediction of the long term behaviour of a process XI hitting time, stability, occupancy time etc.
from properties of ∆.
![Page 4: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/4.jpg)
What is Drift Analysis?
I Prediction of the long term behaviour of a process XI hitting time, stability, occupancy time etc.
from properties of ∆.
![Page 5: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/5.jpg)
What is Drift Analysis1?
ba Yk
“Drift” ≥ ε0
1NB! Drift is a different concept than genetic drift in evolutionary biology.
![Page 6: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/6.jpg)
Runtime of (1+1) EA on Linear Functions [3]
Droste, Jansen & Wegener (2002)
![Page 7: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/7.jpg)
Runtime of (1+1) EA on Linear Functions [2]
Doerr, Johannsen, and Winzen (GECCO 2010)
![Page 8: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/8.jpg)
Some history
Origins
I Stability of equilibria in ODEs (Lyapunov, 1892)
I Stability of Markov Chains (see eg [14])I 1982 paper by Hajek [6]
I Simulated annealing [19]
Drift Analysis of Evolutionary AlgorithmsI Introduced to EC in 2001 by He and Yao [7, 8]
I (1+1) EA on linear functions: O(n lnn) [7]I (1+1) EA on maximum matching by Giel and Wegener [5]
I Simplified drift in 2008 by Oliveto and Witt [18]I Multiplicative drift by Doerr et al [2]
I (1+1) EA on linear functions: en ln(n) +O(n) [22]
I Variable drift by Johannsen [11] and Mitavskiy et al. [15]
I Population drift by L. [12]
![Page 9: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/9.jpg)
About this tutorial...
I Assumes no or little background in probability theoryI Main focus will be on drift theorems and their proofs
I Some theorems are presented in a simplified formfull details are available in the references
I A few simple applications will be shown
I Please feel free to interrupt me with questions!
![Page 10: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/10.jpg)
General Assumptions
ba Yk
I Xk a stochastic process2 in some general state space XI Yk := g(Xk) were g : X → R is a “distance function”
I Two stopping times τa and τb
τa := mink ≥ 0 | Yk ≤ a τb := mink ≥ 0 | Yk ≥ b
where we assume −∞ ≤ a < b <∞ and Y0 ∈ (a, b).
2not necessarily Markovian.
![Page 11: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/11.jpg)
Overview of Tutorial
ba Yk
E [Yk+1 − Yk | Fk]
Drift Condition3 Statement Note
E [Yk+1 | Fk] ≤ Yk − ε0 E [τa] ≤ Additive drift [7, 10]
Pr (τa > B1) ≤ [6]Pr (τb < B2) ≤ Simplified drift [6, 17]
E [Yk+1 | Fk] ≥ Yk − ε0 ≤ E [τa] Additive drift (lower b.) [7, 9]E [Yk+1 | Fk] ≤ Yk E [τa] ≤ Supermartingale [16]E [Yk+1 | Fk] ≤ (1− δ)Yk E [τa] ≤ Multiplicative drift [2, 4]
Pr (τa > B3) ≤ [1]E [Yk+1 | Fk] ≥ (1− δ)Yk ≤ E [τa] Multipl. drift (lower b.) [13]E [Yk+1 | Fk] ≤ Yk − h(Yk) E [τa] ≤ Variable drift [11]
E[eλYk+1 | Fk
]≤ eλYk
α0Pr (τb < B) ≤ Population drift [12]
3Some drift theorems need additional conditions.
![Page 12: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/12.jpg)
Overview of Tutorial
ba Yk
E [Yk+1 − Yk | Fk]
Drift Condition3 Statement Note
E [Yk+1 | Fk] ≤ Yk − ε0 E [τa] ≤ Additive drift [7, 10]
Pr (τa > B1) ≤ [6]Pr (τb < B2) ≤ Simplified drift [6, 17]
E [Yk+1 | Fk] ≥ Yk − ε0 ≤ E [τa] Additive drift (lower b.) [7, 9]
E [Yk+1 | Fk] ≤ Yk E [τa] ≤ Supermartingale [16]E [Yk+1 | Fk] ≤ (1− δ)Yk E [τa] ≤ Multiplicative drift [2, 4]
Pr (τa > B3) ≤ [1]E [Yk+1 | Fk] ≥ (1− δ)Yk ≤ E [τa] Multipl. drift (lower b.) [13]E [Yk+1 | Fk] ≤ Yk − h(Yk) E [τa] ≤ Variable drift [11]
E[eλYk+1 | Fk
]≤ eλYk
α0Pr (τb < B) ≤ Population drift [12]
3Some drift theorems need additional conditions.
![Page 13: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/13.jpg)
Overview of Tutorial
ba Yk
E [Yk+1 − Yk | Fk]
Drift Condition3 Statement Note
E [Yk+1 | Fk] ≤ Yk − ε0 E [τa] ≤ Additive drift [7, 10]
Pr (τa > B1) ≤ [6]Pr (τb < B2) ≤ Simplified drift [6, 17]
E [Yk+1 | Fk] ≥ Yk − ε0 ≤ E [τa] Additive drift (lower b.) [7, 9]E [Yk+1 | Fk] ≤ Yk E [τa] ≤ Supermartingale [16]
E [Yk+1 | Fk] ≤ (1− δ)Yk E [τa] ≤ Multiplicative drift [2, 4]Pr (τa > B3) ≤ [1]
E [Yk+1 | Fk] ≥ (1− δ)Yk ≤ E [τa] Multipl. drift (lower b.) [13]E [Yk+1 | Fk] ≤ Yk − h(Yk) E [τa] ≤ Variable drift [11]
E[eλYk+1 | Fk
]≤ eλYk
α0Pr (τb < B) ≤ Population drift [12]
3Some drift theorems need additional conditions.
![Page 14: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/14.jpg)
Overview of Tutorial
ba Yk
E [Yk+1 − Yk | Fk]
Drift Condition3 Statement Note
E [Yk+1 | Fk] ≤ Yk − ε0 E [τa] ≤ Additive drift [7, 10]
Pr (τa > B1) ≤ [6]Pr (τb < B2) ≤ Simplified drift [6, 17]
E [Yk+1 | Fk] ≥ Yk − ε0 ≤ E [τa] Additive drift (lower b.) [7, 9]E [Yk+1 | Fk] ≤ Yk E [τa] ≤ Supermartingale [16]E [Yk+1 | Fk] ≤ (1− δ)Yk E [τa] ≤ Multiplicative drift [2, 4]
Pr (τa > B3) ≤ [1]E [Yk+1 | Fk] ≥ (1− δ)Yk ≤ E [τa] Multipl. drift (lower b.) [13]
E [Yk+1 | Fk] ≤ Yk − h(Yk) E [τa] ≤ Variable drift [11]
E[eλYk+1 | Fk
]≤ eλYk
α0Pr (τb < B) ≤ Population drift [12]
3Some drift theorems need additional conditions.
![Page 15: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/15.jpg)
Overview of Tutorial
ba Yk
E [Yk+1 − Yk | Fk]
Drift Condition3 Statement Note
E [Yk+1 | Fk] ≤ Yk − ε0 E [τa] ≤ Additive drift [7, 10]
Pr (τa > B1) ≤ [6]Pr (τb < B2) ≤ Simplified drift [6, 17]
E [Yk+1 | Fk] ≥ Yk − ε0 ≤ E [τa] Additive drift (lower b.) [7, 9]E [Yk+1 | Fk] ≤ Yk E [τa] ≤ Supermartingale [16]E [Yk+1 | Fk] ≤ (1− δ)Yk E [τa] ≤ Multiplicative drift [2, 4]
Pr (τa > B3) ≤ [1]E [Yk+1 | Fk] ≥ (1− δ)Yk ≤ E [τa] Multipl. drift (lower b.) [13]E [Yk+1 | Fk] ≤ Yk − h(Yk) E [τa] ≤ Variable drift [11]
E[eλYk+1 | Fk
]≤ eλYk
α0Pr (τb < B) ≤ Population drift [12]
3Some drift theorems need additional conditions.
![Page 16: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/16.jpg)
Overview of Tutorial
ba Yk
E [Yk+1 − Yk | Fk]
Drift Condition3 Statement Note
E [Yk+1 | Fk] ≤ Yk − ε0 E [τa] ≤ Additive drift [7, 10]Pr (τa > B1) ≤ [6]Pr (τb < B2) ≤ Simplified drift [6, 17]
E [Yk+1 | Fk] ≥ Yk − ε0 ≤ E [τa] Additive drift (lower b.) [7, 9]E [Yk+1 | Fk] ≤ Yk E [τa] ≤ Supermartingale [16]E [Yk+1 | Fk] ≤ (1− δ)Yk E [τa] ≤ Multiplicative drift [2, 4]
Pr (τa > B3) ≤ [1]E [Yk+1 | Fk] ≥ (1− δ)Yk ≤ E [τa] Multipl. drift (lower b.) [13]E [Yk+1 | Fk] ≤ Yk − h(Yk) E [τa] ≤ Variable drift [11]
E[eλYk+1 | Fk
]≤ eλYk
α0Pr (τb < B) ≤ Population drift [12]
3Some drift theorems need additional conditions.
![Page 17: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/17.jpg)
Overview of Tutorial
ba Yk
E [Yk+1 − Yk | Fk]
Drift Condition3 Statement Note
E [Yk+1 | Fk] ≤ Yk − ε0 E [τa] ≤ Additive drift [7, 10]Pr (τa > B1) ≤ [6]Pr (τb < B2) ≤ Simplified drift [6, 17]
E [Yk+1 | Fk] ≥ Yk − ε0 ≤ E [τa] Additive drift (lower b.) [7, 9]E [Yk+1 | Fk] ≤ Yk E [τa] ≤ Supermartingale [16]E [Yk+1 | Fk] ≤ (1− δ)Yk E [τa] ≤ Multiplicative drift [2, 4]
Pr (τa > B3) ≤ [1]E [Yk+1 | Fk] ≥ (1− δ)Yk ≤ E [τa] Multipl. drift (lower b.) [13]E [Yk+1 | Fk] ≤ Yk − h(Yk) E [τa] ≤ Variable drift [11]
E[eλYk+1 | Fk
]≤ eλYk
α0Pr (τb < B) ≤ Population drift [12]
3Some drift theorems need additional conditions.
![Page 18: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/18.jpg)
Part 1 - Basic Probability Theory
![Page 19: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/19.jpg)
Basic Probability Theory
Ω
1
2
3
4
5
6
RX
Probability Triple (Ω,F ,Pr)
I Ω : Sample space
I F : σ-algebra (family of events)
I Pr : F → R probability function(satisfying probability axioms)
Events
I E ∈ F
Random Variable
I X : Ω→ R and X−1 : B → F
I X = y ⇐⇒ ω ∈ Ω | X(ω) = yExpectation
I E [X] :=∑
y yPr (X = y).
![Page 20: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/20.jpg)
Basic Probability Theory
Ω
1
2
3
4
5
6
RX
Probability Triple (Ω,F ,Pr)
I Ω : Sample space
I F : σ-algebra (family of events)
I Pr : F → R probability function(satisfying probability axioms)
Events
I E ∈ F
Random Variable
I X : Ω→ R and X−1 : B → F
I X = y ⇐⇒ ω ∈ Ω | X(ω) = yExpectation
I E [X] :=∑
y yPr (X = y).
![Page 21: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/21.jpg)
Basic Probability Theory
Ω
1
2
3
4
5
6
RX
Probability Triple (Ω,F ,Pr)
I Ω : Sample space
I F : σ-algebra (family of events)
I Pr : F → R probability function(satisfying probability axioms)
Events
I E ∈ F
Random Variable
I X : Ω→ R and X−1 : B → F
I X = y ⇐⇒ ω ∈ Ω | X(ω) = yExpectation
I E [X] :=∑
y yPr (X = y).
![Page 22: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/22.jpg)
Basic Probability Theory
Ω
1
2
3
4
5
6
RX
Probability Triple (Ω,F ,Pr)
I Ω : Sample space
I F : σ-algebra (family of events)
I Pr : F → R probability function(satisfying probability axioms)
Events
I E ∈ F
Random Variable
I X : Ω→ R and X−1 : B → F
I X = y ⇐⇒ ω ∈ Ω | X(ω) = y
Expectation
I E [X] :=∑
y yPr (X = y).
![Page 23: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/23.jpg)
Basic Probability Theory
Ω
1
2
3
4
5
6
RX
Probability Triple (Ω,F ,Pr)
I Ω : Sample space
I F : σ-algebra (family of events)
I Pr : F → R probability function(satisfying probability axioms)
Events
I E ∈ F
Random Variable
I X : Ω→ R and X−1 : B → F
I X = y ⇐⇒ ω ∈ Ω | X(ω) = yExpectation
I E [X] :=∑
y yPr (X = y).
![Page 24: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/24.jpg)
Conditional Expectation
E [X | E ] :=∑x
x
Pr (X = x | E) =
∑x
x
Pr (X = x ∧ E)
Pr (E)
E [X | Z = z]
E [X | Z](ω) = E [X | Z = z] , where z = Z(ω)
DefinitionY = E [X | G ] if
1. Y is G -measurable, ie, Y −1(A) ∈ G for all A ∈ B2. E [|Y |] <∞3. E [Y IF ] = E [XIF ] for all F ∈ G
![Page 25: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/25.jpg)
Conditional Expectation
E [X | E ] :=∑x
xPr (X = x | E) =∑x
xPr (X = x ∧ E)
Pr (E)
E [X | Z = z]
E [X | Z](ω) = E [X | Z = z] , where z = Z(ω)
DefinitionY = E [X | G ] if
1. Y is G -measurable, ie, Y −1(A) ∈ G for all A ∈ B2. E [|Y |] <∞3. E [Y IF ] = E [XIF ] for all F ∈ G
![Page 26: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/26.jpg)
Conditional Expectation
E [X | E ] :=∑x
xPr (X = x | E) =∑x
xPr (X = x ∧ E)
Pr (E)
E [X | Z = z]
E [X | Z](ω) = E [X | Z = z] , where z = Z(ω)
DefinitionY = E [X | G ] if
1. Y is G -measurable, ie, Y −1(A) ∈ G for all A ∈ B2. E [|Y |] <∞3. E [Y IF ] = E [XIF ] for all F ∈ G
![Page 27: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/27.jpg)
Conditional Expectation
E [X | E ] :=∑x
xPr (X = x | E) =∑x
xPr (X = x ∧ E)
Pr (E)
E [X | Z = z]
E [X | Z](ω) = E [X | Z = z] , where z = Z(ω)
DefinitionY = E [X | G ] if
1. Y is G -measurable, ie, Y −1(A) ∈ G for all A ∈ B2. E [|Y |] <∞3. E [Y IF ] = E [XIF ] for all F ∈ G
![Page 28: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/28.jpg)
Three Properties
I E [a1X1 + a2X2 | G ] = a1E [X1 | G ] + a2E [X2 | G ]
I If X is G -measurable then E [X | G ] = X
I If H ⊂ G , then E [E [X | G ] |H ] = E [X |H ]
![Page 29: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/29.jpg)
Stochastic Processes and Filtration
Definition
I A stochastic process is a sequence of rv Y1, Y2, . . .
I A filtration is an increasing family of sub-σ-algebras of F
F0 ⊆ F1 ⊆ · · · ⊆ F
I A stochastic process Yk is adapted to a filtration Fk
if Yk is Fk-measurable for all k
=⇒ Informally, Fk represents the information that has beenrevealed about the process during the first k steps.
![Page 30: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/30.jpg)
Stopping Time
DefinitionA rv. τ : Ω→ N is called a stopping time if for all k ≥ 0
τ ≤ k ∈ Fk
I The information obtained until step k is sufficientto decide whether the event τ ≤ k is true or not.
Example
I The smallest k such that Yk < a in a stochastic process.
I The runtime of an evolutionary algorithm
![Page 31: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/31.jpg)
Martingales
Definition (Supermartingale)
Any process Y st ∀k1. Y is adapted to F
2. E [|Yk|] <∞3. E [Yk+1 | Fk] ≤ Yk
0 20 40 60 80 100
k
Example
Let ∆1,∆2, . . . be rvs with −∞ < E [∆k+1 | Fk] ≤ −ε0 for k ≥ 0Then the following sequence is a super-martingale
Yk := ∆1 + · · ·+ ∆k
Zk := Yk + kε0
E [Yk+1 | Fk] = ∆1 + · · ·+ ∆k + E [∆k+1 | Fk]
+ (k + 1)ε0
≤ ∆1 + · · ·+ ∆k − ε0
+ (k + 1)ε0
< Yk
![Page 32: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/32.jpg)
Martingales
Definition (Supermartingale)
Any process Y st ∀k1. Y is adapted to F
2. E [|Yk|] <∞3. E [Yk+1 | Fk] ≤ Yk
0 20 40 60 80 100
k
Example
Let ∆1,∆2, . . . be rvs with −∞ < E [∆k+1 | Fk] ≤ −ε0 for k ≥ 0Then the following sequence is a super-martingale
Yk := ∆1 + · · ·+ ∆k
Zk := Yk + kε0
E [Yk+1 | Fk] = ∆1 + · · ·+ ∆k + E [∆k+1 | Fk]
+ (k + 1)ε0
≤ ∆1 + · · ·+ ∆k − ε0
+ (k + 1)ε0
< Yk
![Page 33: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/33.jpg)
Martingales
Definition (Supermartingale)
Any process Y st ∀k1. Y is adapted to F
2. E [|Yk|] <∞3. E [Yk+1 | Fk] ≤ Yk
0 20 40 60 80 100
k
Example
Let ∆1,∆2, . . . be rvs with −∞ < E [∆k+1 | Fk] ≤ −ε0 for k ≥ 0Then the following sequence is a super-martingale
Yk := ∆1 + · · ·+ ∆k Zk := Yk + kε0
E [Zk+1 | Fk] = ∆1 + · · ·+ ∆k + E [∆k+1 | Fk] + (k + 1)ε0
≤ ∆1 + · · ·+ ∆k − ε0 + (k + 1)ε0 = Zk
![Page 34: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/34.jpg)
Martingales
LemmaIf Y is a supermartingale, then E [Yk | F0] ≤ Y0 for all fixed k ≥ 0.
Proof.
E [Yk | F0] = E [E [Yk | Fk] | F0]
≤ E [Yk−1 | F0]≤ · · · ≤ E [Y0 | F0]︸ ︷︷ ︸by induction on k
= Y0
Example
Where is the process Y in the previous example after k steps?
Y0 = Z0 ≥ E [Zk | F0] = E [Yk + kε0 | F0]
Hence, E [Yk | F0] ≤ Y0 − ε0k, which is not surprising...
![Page 35: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/35.jpg)
Martingales
LemmaIf Y is a supermartingale, then E [Yk | F0] ≤ Y0 for all fixed k ≥ 0.
Proof.
E [Yk | F0] = E [E [Yk | Fk] | F0]
≤ E [Yk−1 | F0]≤ · · · ≤ E [Y0 | F0]︸ ︷︷ ︸by induction on k
= Y0
Example
Where is the process Y in the previous example after k steps?
Y0 = Z0 ≥ E [Zk | F0] = E [Yk + kε0 | F0]
Hence, E [Yk | F0] ≤ Y0 − ε0k, which is not surprising...
![Page 36: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/36.jpg)
Martingales
LemmaIf Y is a supermartingale, then E [Yk | F0] ≤ Y0 for all fixed k ≥ 0.
Proof.
E [Yk | F0] = E [E [Yk | Fk] | F0]
≤ E [Yk−1 | F0]≤ · · · ≤ E [Y0 | F0]︸ ︷︷ ︸by induction on k
= Y0
Example
Where is the process Y in the previous example after k steps?
Y0 = Z0 ≥ E [Zk | F0] = E [Yk + kε0 | F0]
Hence, E [Yk | F0] ≤ Y0 − ε0k, which is not surprising...
![Page 37: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/37.jpg)
Part 2 - Additive Drift
![Page 38: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/38.jpg)
Additive Drift
ba = 0 Yk
ε0
(C1+) ∀k E [Yk+1 − Yk | Yk > 0 ∧Fk] ≤ −ε0
(C1−) ∀k E [Yk+1 − Yk | Yk > 0 ∧Fk] ≥ −ε0
Theorem ([7, 9, 10])
Given a sequence (Yk,Fk) over an interval [0, b] ⊂ R.Define τ := mink ≥ 0 | Yk = 0, and assume E [τ | F0] <∞.
I If (C1+) holds for an ε0 > 0, then E [τ | F0] ≤ Y0/ε0 ≤ b/ε0.
I If (C1−) holds for an ε0 > 0, then E [τ | F0] ≥ Y0/ε0.
![Page 39: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/39.jpg)
Additive Drift
ba = 0 Yk
ε0
(C1+) ∀k E [Yk+1 − Yk | Yk > 0 ∧Fk] ≤ −ε0
(C1−) ∀k E [Yk+1 − Yk | Yk > 0 ∧Fk] ≥ −ε0
Theorem ([7, 9, 10])
Given a sequence (Yk,Fk) over an interval [0, b] ⊂ R.Define τ := mink ≥ 0 | Yk = 0, and assume E [τ | F0] <∞.
I If (C1+) holds for an ε0 > 0, then E [τ | F0] ≤ Y0/ε0 ≤ b/ε0.
I If (C1−) holds for an ε0 > 0, then E [τ | F0] ≥ Y0/ε0.
![Page 40: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/40.jpg)
Obtaining Supermartingales from Drift Conditions
Definition (Stopped Process)
Let Y be a stochastic process and τ a stopping time.
Yk∧τ :=
Yk if k < τ
Yτ if k ≥ τ
(C1) E [Yk+1 − Yk | Yk > a ∧Fk] ≤ −ε0
I Yk not necessarily a supermartingale,because (C1) assumes Yk > a
I But the “stopped process” Yk∧τa is a supermartingale, so
∀k Y0 ≥ E [Yk∧τa | F0]
∀k Y0 ≥ E [Yk∧τa + (k ∧ τa)ε0 | F0]
![Page 41: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/41.jpg)
Obtaining Supermartingales from Drift Conditions
Definition (Stopped Process)
Let Y be a stochastic process and τ a stopping time.
Yk∧τ :=
Yk if k < τ
Yτ if k ≥ τ
(C1) E [Yk+1 − Yk | Yk > a ∧Fk] ≤ −ε0
I Yk not necessarily a supermartingale,because (C1) assumes Yk > a
I But the “stopped process” Yk∧τa is a supermartingale, so
∀k Y0 ≥ E [Yk∧τa | F0]
∀k Y0 ≥ E [Yk∧τa + (k ∧ τa)ε0 | F0]
![Page 42: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/42.jpg)
Dominated Convergence Theorem
TheoremSuppose Xk is a sequence of random variables such thatfor each outcome in the sample space
limk→∞
Xk = X.
Let Y ≥ 0 be a random variable with E [Y ] <∞ such thatfor each outcome in the sample space, and for each k
|Xk| ≤ Y.
Then it holds
limk→∞
E [Xk] = E[
limk→∞
Xk
]= E [X]
![Page 43: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/43.jpg)
Proof of Additive Drift Theorem
(C1+) ∀k E [Yk+1 − Yk | Yk > 0 ∧Fk] ≤ −ε0
(C1−) ∀k E [Yk+1 − Yk | Yk > 0 ∧Fk] ≥ −ε0
TheoremGiven a sequence (Yk,Fk) over an interval [0, b] ⊂ RDefine τ := mink ≥ 0 | Yk = 0, and assume E [τ | F0] <∞.
I If (C1+) holds for an ε0 > 0, then E [τ | F0] ≤ Y0/ε0.
I If (C1−) holds for an ε0 > 0, then E [τ | F0] ≥ Y0/ε0.
Proof.By (C1+), Zk := Yk∧τ + ε0(k ∧ τ) is a super-martingale, so
Y0 = E [Z0 | F0] ≥ E [Zk | F0] ∀k.
Since Yk is bounded to [0, b], and τ has finite expectation,the dominated convergence theorem applies and
Y0 ≥ limk→∞
E [Zk | F0] = E [Yτ + ε0τ | F0] = ε0E [τ | F0] .
![Page 44: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/44.jpg)
Proof of Additive Drift Theorem
(C1+) ∀k E [Yk+1 − Yk | Yk > 0 ∧Fk] ≤ −ε0
(C1−) ∀k E [Yk+1 − Yk | Yk > 0 ∧Fk] ≥ −ε0
TheoremGiven a sequence (Yk,Fk) over an interval [0, b] ⊂ RDefine τ := mink ≥ 0 | Yk = 0, and assume E [τ | F0] <∞.
I If (C1+) holds for an ε0 > 0, then E [τ | F0] ≤ Y0/ε0.
I If (C1−) holds for an ε0 > 0, then E [τ | F0] ≥ Y0/ε0.
Proof.By (C1+), Zk := Yk∧τ + ε0(k ∧ τ) is a super-martingale, so
Y0 = E [Z0 | F0] ≥ E [Zk | F0] ∀k.
Since Yk is bounded to [0, b], and τ has finite expectation,the dominated convergence theorem applies and
Y0 ≥ limk→∞
E [Zk | F0] = E [Yτ + ε0τ | F0] = ε0E [τ | F0] .
![Page 45: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/45.jpg)
Proof of Additive Drift Theorem
(C1+) ∀k E [Yk+1 − Yk | Yk > 0 ∧Fk] ≤ −ε0
(C1−) ∀k E [Yk+1 − Yk | Yk > 0 ∧Fk] ≥ −ε0
TheoremGiven a sequence (Yk,Fk) over an interval [0, b] ⊂ RDefine τ := mink ≥ 0 | Yk = 0, and assume E [τ | F0] <∞.
I If (C1+) holds for an ε0 > 0, then E [τ | F0] ≤ Y0/ε0.
I If (C1−) holds for an ε0 > 0, then E [τ | F0] ≥ Y0/ε0.
Proof.By (C1−), Zk := Yk∧τ + ε0(k ∧ τ) is a sub-martingale, so
Y0 = E [Z0 | F0]≤E [Zk | F0] ∀k.
Since Yk is bounded to [0, b], and τ has finite expectation,the dominated convergence theorem applies and
Y0≤ limk→∞
E [Zk | F0] = E [Yτ + ε0τ | F0] = ε0E [τ | F0] .
![Page 46: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/46.jpg)
Examples: (1+1) EA
1 (1+1) EA
1: Sample x(0) uniformly at random from 0, 1n.2: for k = 0, 1, 2, . . . do3: Set y := x(k), and flip each bit of y with probability 1/n.4:
x(k+1) :=
y if f(y) ≥ f(x(k))
x(k) otherwise.
5: end for
![Page 47: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/47.jpg)
Law of Total Probability
E [X] = Pr (E)E [X | E ] + Pr(E)E[X | E
]
![Page 48: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/48.jpg)
Example 1: (1+1) EA on LeadingOnes
Lo(x) :=
n∑i=1
i∏j=1
xj
x =
Leading 1-bits︷ ︸︸ ︷1111111111111111
Remaining bits︷ ︸︸ ︷0∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗∗ .
Left-most 0-bit
I Let Yk := n−Lo(x(k)) be the “remaining” bits in step k ≥ 0.I Let E be the event that only the left-most 0-bit flipped in y.I The sequence Yk is non-increasing, so
E [Yk+1 − Yk | Yk > 0 ∧Fk]
≤ (−1) Pr (E | Yk > 0 ∧Fk)
= (−1)(1/n)(1− 1/n)n−1 ≤ −1/en.
I By the additive drift theorem, E [τ | F0] ≤ enY0 ≤ en2.
![Page 49: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/49.jpg)
Example 1: (1+1) EA on LeadingOnes
Lo(x) :=
n∑i=1
i∏j=1
xj
x =
Leading 1-bits︷ ︸︸ ︷1111111111111111
Remaining bits︷ ︸︸ ︷0∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗∗ .
Left-most 0-bit
I Let Yk := n−Lo(x(k)) be the “remaining” bits in step k ≥ 0.
I Let E be the event that only the left-most 0-bit flipped in y.I The sequence Yk is non-increasing, so
E [Yk+1 − Yk | Yk > 0 ∧Fk]
≤ (−1) Pr (E | Yk > 0 ∧Fk)
= (−1)(1/n)(1− 1/n)n−1 ≤ −1/en.
I By the additive drift theorem, E [τ | F0] ≤ enY0 ≤ en2.
![Page 50: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/50.jpg)
Example 1: (1+1) EA on LeadingOnes
Lo(x) :=
n∑i=1
i∏j=1
xj
x =
Leading 1-bits︷ ︸︸ ︷1111111111111111
Remaining bits︷ ︸︸ ︷0∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗∗ .
Left-most 0-bit
I Let Yk := n−Lo(x(k)) be the “remaining” bits in step k ≥ 0.I Let E be the event that only the left-most 0-bit flipped in y.
I The sequence Yk is non-increasing, so
E [Yk+1 − Yk | Yk > 0 ∧Fk]
≤ (−1) Pr (E | Yk > 0 ∧Fk)
= (−1)(1/n)(1− 1/n)n−1 ≤ −1/en.
I By the additive drift theorem, E [τ | F0] ≤ enY0 ≤ en2.
![Page 51: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/51.jpg)
Example 1: (1+1) EA on LeadingOnes
Lo(x) :=
n∑i=1
i∏j=1
xj
x =
Leading 1-bits︷ ︸︸ ︷1111111111111111
Remaining bits︷ ︸︸ ︷0∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗∗ .
Left-most 0-bit
I Let Yk := n−Lo(x(k)) be the “remaining” bits in step k ≥ 0.I Let E be the event that only the left-most 0-bit flipped in y.I The sequence Yk is non-increasing, so
E [Yk+1 − Yk | Yk > 0 ∧Fk]
≤ (−1) Pr (E | Yk > 0 ∧Fk)
= (−1)(1/n)(1− 1/n)n−1 ≤ −1/en.
I By the additive drift theorem, E [τ | F0] ≤ enY0 ≤ en2.
![Page 52: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/52.jpg)
Example 1: (1+1) EA on LeadingOnes
Lo(x) :=
n∑i=1
i∏j=1
xj
x =
Leading 1-bits︷ ︸︸ ︷1111111111111111
Remaining bits︷ ︸︸ ︷0∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗∗ .
Left-most 0-bit
I Let Yk := n−Lo(x(k)) be the “remaining” bits in step k ≥ 0.I Let E be the event that only the left-most 0-bit flipped in y.I The sequence Yk is non-increasing, so
E [Yk+1 − Yk | Yk > 0 ∧Fk]
≤ (−1) Pr (E | Yk > 0 ∧Fk)
= (−1)(1/n)(1− 1/n)n−1 ≤ −1/en.
I By the additive drift theorem, E [τ | F0] ≤ enY0 ≤ en2.
![Page 53: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/53.jpg)
Example 2: (1+1) EA on Linear Functions
I Given some constants w1, . . . , wn ∈ [wmin, wmax], define
f(x) := w1x1 + w2x2 + · · ·+ wnxn
I Let Yk be the function value that “remains” at time k, ie
Yk :=
(n∑i=1
wi
)−
(n∑i=1
wix(k)i
)=
n∑i=1
wi
(1− x(k)
i
).
I Let Ei be the event that only bit i flipped in y, then
E [Yk+1 − Yk | Fk] ≤n∑i=1
Pr (Ei | Fk)
E [Yk+1 − Yk | Ei ∧Fk]
≤
(1
n
)(1− 1
n
)n−1 n∑i=1
wi
(x
(k)i − 1
)
≤ −Yken≤ −wmin
en
I By the additive drift theorem, E [τ | F0] ≤ en2(wmax/wmin).
![Page 54: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/54.jpg)
Example 2: (1+1) EA on Linear Functions
I Given some constants w1, . . . , wn ∈ [wmin, wmax], define
f(x) := w1x1 + w2x2 + · · ·+ wnxn
I Let Yk be the function value that “remains” at time k, ie
Yk :=
(n∑i=1
wi
)−
(n∑i=1
wix(k)i
)=
n∑i=1
wi
(1− x(k)
i
).
I Let Ei be the event that only bit i flipped in y, then
E [Yk+1 − Yk | Fk] ≤n∑i=1
Pr (Ei | Fk)
E [Yk+1 − Yk | Ei ∧Fk]
≤
(1
n
)(1− 1
n
)n−1 n∑i=1
wi
(x
(k)i − 1
)
≤ −Yken≤ −wmin
en
I By the additive drift theorem, E [τ | F0] ≤ en2(wmax/wmin).
![Page 55: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/55.jpg)
Example 2: (1+1) EA on Linear Functions
I Given some constants w1, . . . , wn ∈ [wmin, wmax], define
f(x) := w1x1 + w2x2 + · · ·+ wnxn
I Let Yk be the function value that “remains” at time k, ie
Yk :=
(n∑i=1
wi
)−
(n∑i=1
wix(k)i
)=
n∑i=1
wi
(1− x(k)
i
).
I Let Ei be the event that only bit i flipped in y, then
E [Yk+1 − Yk | Fk] ≤n∑i=1
Pr (Ei | Fk)
E [Yk+1 − Yk | Ei ∧Fk]
≤
(1
n
)(1− 1
n
)n−1 n∑i=1
wi
(x
(k)i − 1
)
≤ −Yken≤ −wmin
en
I By the additive drift theorem, E [τ | F0] ≤ en2(wmax/wmin).
![Page 56: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/56.jpg)
Example 2: (1+1) EA on Linear Functions
I Given some constants w1, . . . , wn ∈ [wmin, wmax], define
f(x) := w1x1 + w2x2 + · · ·+ wnxn
I Let Yk be the function value that “remains” at time k, ie
Yk :=
(n∑i=1
wi
)−
(n∑i=1
wix(k)i
)=
n∑i=1
wi
(1− x(k)
i
).
I Let Ei be the event that only bit i flipped in y, then
E [Yk+1 − Yk | Fk] ≤n∑i=1
Pr (Ei | Fk)E [Yk+1 − Yk | Ei ∧Fk]
≤(
1
n
)(1− 1
n
)n−1 n∑i=1
wi
(x
(k)i − 1
)≤ −Yk
en≤ −wmin
en
I By the additive drift theorem, E [τ | F0] ≤ en2(wmax/wmin).
![Page 57: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/57.jpg)
Example 2: (1+1) EA on Linear Functions
I Given some constants w1, . . . , wn ∈ [wmin, wmax], define
f(x) := w1x1 + w2x2 + · · ·+ wnxn
I Let Yk be the function value that “remains” at time k, ie
Yk :=
(n∑i=1
wi
)−
(n∑i=1
wix(k)i
)=
n∑i=1
wi
(1− x(k)
i
).
I Let Ei be the event that only bit i flipped in y, then
E [Yk+1 − Yk | Fk] ≤n∑i=1
Pr (Ei | Fk)E [Yk+1 − Yk | Ei ∧Fk]
≤(
1
n
)(1− 1
n
)n−1 n∑i=1
wi
(x
(k)i − 1
)≤ −Yk
en≤ −wmin
en
I By the additive drift theorem, E [τ | F0] ≤ en2(wmax/wmin).
![Page 58: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/58.jpg)
Remarks on Example Applications
Example 1: (1+1) EA on LeadingOnes
I The upper bound en2 is very accurate.
I The exact expression is c(n)n2, where c(n)→ (e− 1)/2 [20].
Example 2: (1+1) EA on Linear Functions
I The upper bound en2(wmax/wmin) is correct, but very loose.
I The linear function BinVal has (wmax/wmin) = 2n−1.
I The tightest known bound is en log(n) +O(n) [22].
=⇒ A poor choice of distance function gives an imprecise bound!
![Page 59: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/59.jpg)
What is a good distance function?
Theorem ([8])
Assume Y is a homogeneous Markov chain, and τ the time toabsorption. Then the function g(x) := E [τ | Y0 = x], satisfies
g(x) = 0 if x is an absorbing state
E [g(Yk+1)− g(Yk) | Fk] = −1 otherwise.
I Distance function g gives exact expected runtime!
I But g requires complete knowledge of the expected runtime!I Still provides insight into what is a good distance function:
I a good approximation (or guess) for the remaining runtime
![Page 60: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/60.jpg)
Part 3 - Variable Drift
![Page 61: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/61.jpg)
Drift may be Position-Dependant
Constant Drift Variable Drift
![Page 62: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/62.jpg)
Drift may be Position-Dependant
Idea: Find a function g : R→ Rst. the transformed stochasticprocess g(X1), g(X2), g(X3), . . .has constant drift.
Variable Drift
![Page 63: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/63.jpg)
Drift may be Position-Dependant
Idea: Find a function g : R→ Rst. the transformed stochasticprocess g(X1), g(X2), g(X3), . . .has constant drift.
Constant Drift
![Page 64: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/64.jpg)
Jensen’s Inequality
TheoremIf g : R→ R concave, then E [g(X) | F ] ≤ g(E [X | F ]).
I If g′′(x) < 0, then g is concave.
![Page 65: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/65.jpg)
Multiplicative Drift
ba Yk
δYk
(M) ∀k E [Yk+1 − Yk | Yk > a ∧Fk] ≤ −δYk
Theorem ([2, 4])
Given a sequence (Yk,Fk) over an interval [a, b] ⊂ R, a > 0Define τa := mink ≥ 0 | Yk = a, and assume E [τa | F0] <∞.
I If (M) holds for a δ > 0, then E [τa | F0] ≤ ln(Y0/a)/δ.
Proof.g(s) := ln(s/a) is concave, so by Jensen’s inequality
E [g(Yk+1)− g(Yk) | Yk > a ∧Fk]
≤ ln(E [Yk+1 | Yk > a ∧Fk])− ln(Yk) ≤ ln(1− δ) ≤ −δ.
![Page 66: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/66.jpg)
Multiplicative Drift
ba Yk
δYk
(M) ∀k E [Yk+1 − Yk | Yk > a ∧Fk] ≤ −δYk
Theorem ([2, 4])
Given a sequence (Yk,Fk) over an interval [a, b] ⊂ R, a > 0Define τa := mink ≥ 0 | Yk = a, and assume E [τa | F0] <∞.
I If (M) holds for a δ > 0, then E [τa | F0] ≤ ln(Y0/a)/δ.
Proof.g(s) := ln(s/a) is concave, so by Jensen’s inequality
E [g(Yk+1)− g(Yk) | Yk > a ∧Fk]
≤ ln(E [Yk+1 | Yk > a ∧Fk])− ln(Yk) ≤ ln(1− δ) ≤ −δ.
![Page 67: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/67.jpg)
Multiplicative Drift
ba Yk
δYk
(M) ∀k E [Yk+1 | Yk > a ∧Fk] ≤ (1− δ)Yk
Theorem ([2, 4])
Given a sequence (Yk,Fk) over an interval [a, b] ⊂ R, a > 0Define τa := mink ≥ 0 | Yk = a, and assume E [τa | F0] <∞.
I If (M) holds for a δ > 0, then E [τa | F0] ≤ ln(Y0/a)/δ.
Proof.g(s) := ln(s/a) is concave, so by Jensen’s inequality
E [g(Yk+1)− g(Yk) | Yk > a ∧Fk]
≤ ln(E [Yk+1 | Yk > a ∧Fk])− ln(Yk) ≤ ln(1− δ) ≤ −δ.
![Page 68: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/68.jpg)
Example: Linear Functions Revisited
I For any c ∈ (0, 1), define the distance at time k as
Yk := cwmin +
n∑i=1
wi
(1− x(k)
i
)I We have already seen that
E [Yk+1 − Yk | Fk] ≤1
en
n∑i=1
wi
(x
(k)i − 1
)= −Yk − cwmin
en≤ −Yk(1− c)
en
I By the multiplicative drift theorem (a := cwmin and δ := 1−cen )
E [τa | F0] ≤(
en
1− c
)ln
(1 +
nwmax
cwmin
)
![Page 69: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/69.jpg)
Variable Drift Theorem
ba Yk
h(Yk)
(V) ∀k E [Yk+1 − Yk | Yk > 0 ∧Fk] ≤ −h(Yk)
Theorem ([15, 11])
Given a sequence (Yk,Fk) over an interval [a, b] ⊂ R, a > 0.Define τa := mink ≥ 0 | Yk = a, and assume E [τa | F0] <∞.If there exists a function h : R→ R such that
I h(x) > 0 and h′(x) > 0 for all x ∈ [a, b], and
I drift condition (V) holds, then
E [τa | F0] ≤∫ Y0
a
1
h(z)dz
=⇒ The multiplicative drift theorem is the special case h(x) = δx.
![Page 70: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/70.jpg)
Variable Drift Theorem
ba Yk
h(Yk)
(V) ∀k E [Yk+1 − Yk | Yk > 0 ∧Fk] ≤ −h(Yk)
Theorem ([15, 11])
Given a sequence (Yk,Fk) over an interval [a, b] ⊂ R, a > 0.Define τa := mink ≥ 0 | Yk = a, and assume E [τa | F0] <∞.If there exists a function h : R→ R such that
I h(x) > 0 and h′(x) > 0 for all x ∈ [a, b], and
I drift condition (V) holds, then
E [τa | F0] ≤∫ Y0
a
1
h(z)dz
=⇒ The multiplicative drift theorem is the special case h(x) = δx.
![Page 71: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/71.jpg)
Variable Drift Theorem: Proof
g(x) :=
∫ x
a
1
h(z)dz
E [Yk+1 | Fk]
1h(x)
Yk
h(Yk)
1h(Yk)
Proof.
The function g is concave (g′′ < 0), so by Jensen’s inequality
E [g(Yk)− g(Yk+1) | Fk]
≥ g(Yk)− g(E [Yk+1 | Fk])
≥∫ Yk
Yk−h(Yk)
1
h(z)dz ≥ 1
![Page 72: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/72.jpg)
Variable Drift Theorem: Proof
g(x) :=
∫ x
a
1
h(z)dz
E [Yk+1 | Fk]
1h(x)
Yk
h(Yk)
1h(Yk)
Proof.The function g is concave (g′′ < 0), so by Jensen’s inequality
E [g(Yk)− g(Yk+1) | Fk] ≥ g(Yk)− g(E [Yk+1 | Fk])
≥∫ Yk
Yk−h(Yk)
1
h(z)dz ≥ 1
![Page 73: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/73.jpg)
Variable Drift Theorem: Proof
g(x) :=
∫ x
a
1
h(z)dz
E [Yk+1 | Fk]
1h(x)
Ykh(Yk)
1h(Yk)
Proof.
The function g is concave (g′′ < 0), so by Jensen’s inequality
E [g(Yk)− g(Yk+1) | Fk] ≥ g(Yk)− g(E [Yk+1 | Fk])
≥∫ Yk
Yk−h(Yk)
1
h(z)dz
≥ 1
![Page 74: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/74.jpg)
Variable Drift Theorem: Proof
g(x) :=
∫ x
a
1
h(z)dz
E [Yk+1 | Fk]
1h(x)
Ykh(Yk)
1h(Yk)
Proof.
The function g is concave (g′′ < 0), so by Jensen’s inequality
E [g(Yk)− g(Yk+1) | Fk] ≥ g(Yk)− g(E [Yk+1 | Fk])
≥∫ Yk
Yk−h(Yk)
1
h(z)dz ≥ 1
![Page 75: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/75.jpg)
Part 4 - Supermartingale
![Page 76: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/76.jpg)
Supermartingale
ba = 0 Yk
(S1) ∀k E [Yk+1 − Yk | Yk > 0 ∧Fk] ≤ 0
(S2) ∀k Var [Yk+1 | Yk > 0 ∧Fk] ≥ σ2
Theorem (See eg. [16])
Given a sequence (Yk,Fk) over an interval [0, b] ⊂ R.Define τ := mink ≥ 0 | Yk = 0, and assume E [τ | F0] <∞.
I If (S1) and (S2) hold for σ > 0, then E [τ | F0] ≤ Y0(2b−Y0)σ2
Proof.Let Zk := b2− (b−Yk)2, and note that b−Yk ≤ E [b− Yk+1 | Fk].
E [Zk+1 − Zk | Fk] = −E[(b− Yk+1)2 | Fk
]+ (b− Yk)2
≤ −E[(b− Yk+1)2 | Fk
]+ E [b− Yk+1 | Fk]
2
= −Var [b− Yk+1] = −Var [Yk+1 | Fk] ≤ −σ2
![Page 77: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/77.jpg)
Supermartingale
ba = 0 Yk
(S1) ∀k E [Yk+1 − Yk | Yk > 0 ∧Fk] ≤ 0
(S2) ∀k Var [Yk+1 | Yk > 0 ∧Fk] ≥ σ2
Theorem (See eg. [16])
Given a sequence (Yk,Fk) over an interval [0, b] ⊂ R.Define τ := mink ≥ 0 | Yk = 0, and assume E [τ | F0] <∞.
I If (S1) and (S2) hold for σ > 0, then E [τ | F0] ≤ Y0(2b−Y0)σ2
Proof.Let Zk := b2− (b−Yk)2, and note that b−Yk ≤ E [b− Yk+1 | Fk].
E [Zk+1 − Zk | Fk] = −E[(b− Yk+1)2 | Fk
]+ (b− Yk)2
≤ −E[(b− Yk+1)2 | Fk
]+ E [b− Yk+1 | Fk]
2
= −Var [b− Yk+1] = −Var [Yk+1 | Fk] ≤ −σ2
![Page 78: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/78.jpg)
Supermartingale
ba = 0 Yk
(S1) ∀k E [Yk+1 − Yk | Yk > 0 ∧Fk] ≤ 0
(S2) ∀k Var [Yk+1 | Yk > 0 ∧Fk] ≥ σ2
Theorem (See eg. [16])
Given a sequence (Yk,Fk) over an interval [0, b] ⊂ R.Define τ := mink ≥ 0 | Yk = 0, and assume E [τ | F0] <∞.
I If (S1) and (S2) hold for σ > 0, then E [τ | F0] ≤ Y0(2b−Y0)σ2
Proof.Let Zk := b2− (b−Yk)2, and note that b−Yk ≤ E [b− Yk+1 | Fk].
E [Zk+1 − Zk | Fk] = −E[(b− Yk+1)2 | Fk
]+ (b− Yk)2
≤ −E[(b− Yk+1)2 | Fk
]+ E [b− Yk+1 | Fk]
2
= −Var [b− Yk+1] = −Var [Yk+1 | Fk] ≤ −σ2
![Page 79: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/79.jpg)
Part 5 - Hajek’s Theorem
![Page 80: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/80.jpg)
Hajek’s Theorem4
TheoremIf there exist λ, ε0 > 0 and D <∞ such that for all k ≥ 0
(C1) E [Yk+1 − Yk | Yk > a ∧Fk] ≤ −ε0
(C2) (|Yk+1 − Yk| | Fk) ≺ Z and E[eλZ]
= D
then for any δ ∈ (0, 1)
(2.9) Pr (τa > B | F0) ≤ eη(Y0−a−B(1−δ)ε0)
(*) Pr (τb < B | Y0 < a) ≤ BD(1−δ)ηε0 · e
η(a−b)
for some η ≥ minλ, δε0λ2/D > 0.
I If λ, ε0, D ∈ O(1) and b− a ∈ Ω(n),then there exists a constant c > 0 such that
Pr (τb ≤ ecn | F0) ≤ e−Ω(n)
4The theorem presented here is a corollary to Theorem 2.3 in [6].
![Page 81: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/81.jpg)
Hajek’s Theorem4
TheoremIf there exist λ, ε0 > 0 and D <∞ such that for all k ≥ 0
(C1) E [Yk+1 − Yk | Yk > a ∧Fk] ≤ −ε0
(C2) (|Yk+1 − Yk| | Fk) ≺ Z and E[eλZ]
= D
then for any δ ∈ (0, 1)
(2.9) Pr (τa > B | F0) ≤ eη(Y0−a−B(1−δ)ε0)
(*) Pr (τb < B | Y0 < a) ≤ BD(1−δ)ηε0 · e
η(a−b)
for some η ≥ minλ, δε0λ2/D > 0.
I If λ, ε0, D ∈ O(1) and b− a ∈ Ω(n),then there exists a constant c > 0 such that
Pr (τb ≤ ecn | F0) ≤ e−Ω(n)
4The theorem presented here is a corollary to Theorem 2.3 in [6].
![Page 82: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/82.jpg)
Stochastic Dominance - (|Yk+1 − Yk| | Fk) ≺ Z
DefinitionY ≺ Z if Pr (Z ≤ c) ≤ Pr (Y ≤ c) for all c ∈ R
- 3 - 2 -1 1 2 3 4
0.2
0.4
0.6
0.8
1.0
Example
1. If Y ≤ Z, then Y ≺ Z.
2. Let (Ω, d) be a metric space, and V (x) := d(x, x∗).Then |V (Xk+1)− V (Xk)| ≺ d(Xk+1, Xk)
XkV (Xk)
x∗
V (Xk+1)
Xk+1
d(Xk, Xk+1)
![Page 83: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/83.jpg)
Stochastic Dominance - (|Yk+1 − Yk| | Fk) ≺ Z
DefinitionY ≺ Z if Pr (Z ≤ c) ≤ Pr (Y ≤ c) for all c ∈ R
- 3 - 2 -1 1 2 3 4
0.2
0.4
0.6
0.8
1.0
Example
1. If Y ≤ Z, then Y ≺ Z.
2. Let (Ω, d) be a metric space, and V (x) := d(x, x∗).Then |V (Xk+1)− V (Xk)| ≺ d(Xk+1, Xk)
XkV (Xk)
x∗
V (Xk+1)
Xk+1
d(Xk, Xk+1)
![Page 84: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/84.jpg)
Stochastic Dominance - (|Yk+1 − Yk| | Fk) ≺ Z
DefinitionY ≺ Z if Pr (Z ≤ c) ≤ Pr (Y ≤ c) for all c ∈ R
- 3 - 2 -1 1 2 3 4
0.2
0.4
0.6
0.8
1.0
Example
1. If Y ≤ Z, then Y ≺ Z.
2. Let (Ω, d) be a metric space, and V (x) := d(x, x∗).Then |V (Xk+1)− V (Xk)| ≺ d(Xk+1, Xk)
XkV (Xk)
x∗
V (Xk+1)
Xk+1
d(Xk, Xk+1)
![Page 85: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/85.jpg)
Condition (C2) implies that “long jumps” must be rare
Assume that
(C2) (|Yk+1 − Yk| | Fk) ≺ Z and E[eλZ]
= D
Then for any j ≥ 0,
Pr (|Yk+1 − Yk| ≥ j) = Pr(eλ|Yk+1−Yk| ≥ eλj
)≤ E
[eλ|Yk+1−Yk|
]e−λj
≤ E[eλZ]e−λj
= De−λj .
Markov’s inequality
I If X ≥ 0, then Pr (X ≥ k) ≤ E [X] /k.
![Page 86: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/86.jpg)
Moment Generating Function (mgf) E[eλZ]
DefinitionThe mgf of a rv X is MX(λ) := E
[eλX
]for all λ ∈ R.
I The n-th derivative at t = 0 is M(n)X (0) = E [Xn],
hence MX provides all moments of X, thus the name.
I If X and Y are independent rv. and a, b ∈ R, then
MaX+bY (t) = E[et(aX+bX)
]= E
[etaX
]E[etbX
]= MX(at)MY (bt)
Example
I Let X :=∑n
i=1Xi where Xi are independent rvs withPr (Xi = 1) = p and Pr (Xi = 0) = 1− p. Then
MXi(λ) = (1− p)eλ·0 + peλ·1
MX(λ) = MX1(λ)MX2(λ) · · ·MXn(λ) = (1− p+ peλ)n.
![Page 87: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/87.jpg)
Moment Generating Function (mgf) E[eλZ]
DefinitionThe mgf of a rv X is MX(λ) := E
[eλX
]for all λ ∈ R.
I The n-th derivative at t = 0 is M(n)X (0) = E [Xn],
hence MX provides all moments of X, thus the name.
I If X and Y are independent rv. and a, b ∈ R, then
MaX+bY (t) = E[et(aX+bX)
]= E
[etaX
]E[etbX
]= MX(at)MY (bt)
Example
I Let X :=∑n
i=1Xi where Xi are independent rvs withPr (Xi = 1) = p and Pr (Xi = 0) = 1− p. Then
MXi(λ) = (1− p)eλ·0 + peλ·1
MX(λ) = MX1(λ)MX2(λ) · · ·MXn(λ) = (1− p+ peλ)n.
![Page 88: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/88.jpg)
Moment Generating Functions
Distribution mgf
Bernoulli Pr (X = 1) = p 1− p+ pet
Binomial X ∼ Bin(n, p) (1− p+ pet)n
Geometric Pr (X = k) = (1− p)k−1p pet
1−(1−p)et
Uniform X ∼ U(a, b) etb−etat(b−a)
Normal X ∼ N(µ, σ2) exp(tµ+ 12σ
2t2)
The mgf. of X ∼ Bin(n, p) at t = ln(2) is
(1− p+ pet)n = (1 + p)n ≤ epn.
![Page 89: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/89.jpg)
Moment Generating Functions
Distribution mgf
Bernoulli Pr (X = 1) = p 1− p+ pet
Binomial X ∼ Bin(n, p) (1− p+ pet)n
Geometric Pr (X = k) = (1− p)k−1p pet
1−(1−p)et
Uniform X ∼ U(a, b) etb−etat(b−a)
Normal X ∼ N(µ, σ2) exp(tµ+ 12σ
2t2)
The mgf. of X ∼ Bin(n, p) at t = ln(2) is
(1− p+ pet)n = (1 + p)n ≤ epn.
![Page 90: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/90.jpg)
Condition (C2) often holds trivially
Example ((1+1) EA)
Choose x uniformly from 0, 1nfor k = 0, 1, 2, ...
Set x′ := x(k), and flip each bit of x′ with probability p.
If f(x′) ≥ f(x(k)), then x(k+1) := x′ else x(k+1) := x(k)
Assume
I Fitness function f has unique maximum x∗ ∈ 0, 1n.
I Distance function is g(x) = H(x, x∗)
Then
I |g(x(k+1))− g(x(k))| ≺ Z where Z := H(x(k), x′)
I Z ∼ Bin(n, p) so E[eλZ]≤ enp for λ = ln(2)
![Page 91: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/91.jpg)
Condition (C2) often holds trivially
Example ((1+1) EA)
Choose x uniformly from 0, 1nfor k = 0, 1, 2, ...
Set x′ := x(k), and flip each bit of x′ with probability p.
If f(x′) ≥ f(x(k)), then x(k+1) := x′ else x(k+1) := x(k)
Assume
I Fitness function f has unique maximum x∗ ∈ 0, 1n.
I Distance function is g(x) = H(x, x∗)
Then
I |g(x(k+1))− g(x(k))| ≺ Z where Z := H(x(k), x′)
I Z ∼ Bin(n, p) so E[eλZ]≤ enp for λ = ln(2)
![Page 92: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/92.jpg)
Simple Application (1+1) EA on Needle
(1+1) EA with mutation rate p = 1/n on
Needle(x) :=
n∏i=1
xi
Yk := H(x(k), 0n)
a := (3/4)n
b := n
Condition (C2) satisfied5 with D = E[eλZ]≤ e where λ = ln(2).
Condition (C1) satisfied for ε0 := 1/2 because
E [Yk+1 − Yk | Yk > a ∧Fk] ≤ (n− a)p− ap = −ε0.
Thus, η ≥ minλ, δε0λ2/D > 1/25 when δ = 1/2 and
Pr (τa > n+ k | F0) ≤ e(1/25)(Y0−a−(n+k)(1−δ)ε0) ≤ e−k/100
Pr(τb < en/200 | F0
)= e−Ω(n)
5See previous slide.
![Page 93: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/93.jpg)
Proof overview
Theorem (2.3 in [6])
Assume that there exists 0 < ρ < 1 and D ≥ 1 such that
(D1) E[eηYk+1 | Yk > a ∧Fk
]≤ ρeηYk
(D2) E[eηYk+1 | Yk ≤ a ∧Fk
]≤ Deηa
Then
(2.6) E[eηYk+1 | F0
]≤ ρkeηY0 +Deηa(1− ρk)/(1− ρ).
(2.8) Pr (Yk ≥ b | F0) ≤ ρkeη(Y0−b) +Deη(a−b)(1− ρk)/(1− ρ).
(*) Pr (τb < B | Y0 < a) ≤ eη(a−b)BD/(1− ρ)
(2.9) Pr (τa > k | F0) ≤ eη(Y0−a)ρk
LemmaAssume that there exists a ε0 > 0 such that
(C1) E [Yk+1 − Yk | Yk > a ∧Fk] ≤ −ε0
(C2) (|Yk+1 − Yk| | Fk) ≺ Z and E[eλZ]
= D <∞ for a λ > 0.
then (D1) and (D2) hold for some η and ρ < 1
![Page 94: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/94.jpg)
Theorem
(D1) E[eη(Yk+1−Yk) | Yk > a ∧Fk
]≤ ρ
(D2) E[eη(Yk+1−a) | Yk ≤ a ∧Fk
]≤ D
Assume that (D1) and (D2) hold. Then
(2.6) E[eηYk+1 | F0
]≤ ρkeηY0 +Deηa(1− ρk)/(1− ρ).
Proof.
By the law of total probability, and the conditions (D1) and (D2)
E[eηYk+1 | Fk
]≤ ρeηYk +Deηa (1)
By the law of total expectation,
Ineq. (1), and induction on k
E[eηYk+1 | F0
]= E
[E[eηYk+1 | Fk
]| F0
]
≤ ρE[eηYk | F0
]+Deηa
≤ ρkeηY0 + (1 + ρ+ ρ2 + · · ·+ ρk−1)Deηa.
![Page 95: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/95.jpg)
Theorem
(D1) E[eηYk+1 | Yk > a ∧Fk
]≤ ρeηYk
(D2) E[eηYk+1 | Yk ≤ a ∧Fk
]≤ Deηa
Assume that (D1) and (D2) hold. Then
(2.6) E[eηYk+1 | F0
]≤ ρkeηY0 +Deηa(1− ρk)/(1− ρ).
Proof.By the law of total probability, and the conditions (D1) and (D2)
E[eηYk+1 | Fk
]≤ ρeηYk +Deηa (1)
By the law of total expectation,
Ineq. (1), and induction on k
E[eηYk+1 | F0
]= E
[E[eηYk+1 | Fk
]| F0
]
≤ ρE[eηYk | F0
]+Deηa
≤ ρkeηY0 + (1 + ρ+ ρ2 + · · ·+ ρk−1)Deηa.
![Page 96: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/96.jpg)
Theorem
(D1) E[eηYk+1 | Yk > a ∧Fk
]≤ ρeηYk
(D2) E[eηYk+1 | Yk ≤ a ∧Fk
]≤ Deηa
Assume that (D1) and (D2) hold. Then
(2.6) E[eηYk+1 | F0
]≤ ρkeηY0 +Deηa(1− ρk)/(1− ρ).
Proof.By the law of total probability, and the conditions (D1) and (D2)
E[eηYk+1 | Fk
]≤ ρeηYk +Deηa (1)
By the law of total expectation,
Ineq. (1), and induction on k
E[eηYk+1 | F0
]= E
[E[eηYk+1 | Fk
]| F0
]
≤ ρE[eηYk | F0
]+Deηa
≤ ρkeηY0 + (1 + ρ+ ρ2 + · · ·+ ρk−1)Deηa.
![Page 97: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/97.jpg)
Theorem
(D1) E[eηYk+1 | Yk > a ∧Fk
]≤ ρeηYk
(D2) E[eηYk+1 | Yk ≤ a ∧Fk
]≤ Deηa
Assume that (D1) and (D2) hold. Then
(2.6) E[eηYk+1 | F0
]≤ ρkeηY0 +Deηa(1− ρk)/(1− ρ).
Proof.By the law of total probability, and the conditions (D1) and (D2)
E[eηYk+1 | Fk
]≤ ρeηYk +Deηa (1)
By the law of total expectation, Ineq. (1),
and induction on k
E[eηYk+1 | F0
]= E
[E[eηYk+1 | Fk
]| F0
]≤ ρE
[eηYk | F0
]+Deηa
≤ ρkeηY0 + (1 + ρ+ ρ2 + · · ·+ ρk−1)Deηa.
![Page 98: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/98.jpg)
Theorem
(D1) E[eηYk+1 | Yk > a ∧Fk
]≤ ρeηYk
(D2) E[eηYk+1 | Yk ≤ a ∧Fk
]≤ Deηa
Assume that (D1) and (D2) hold. Then
(2.6) E[eηYk+1 | F0
]≤ ρkeηY0 +Deηa(1− ρk)/(1− ρ).
Proof.By the law of total probability, and the conditions (D1) and (D2)
E[eηYk+1 | Fk
]≤ ρeηYk +Deηa (1)
By the law of total expectation, Ineq. (1), and induction on k
E[eηYk+1 | F0
]= E
[E[eηYk+1 | Fk
]| F0
]≤ ρE
[eηYk | F0
]+Deηa
≤ ρkeηY0 + (1 + ρ+ ρ2 + · · ·+ ρk−1)Deηa.
![Page 99: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/99.jpg)
Proof of (2.8)
Theorem
(D1) E[eη(Yk+1−Yk) | Yk > a ∧Fk
]≤ ρ
(D2) E[eη(Yk+1−a) | Yk ≤ a ∧Fk
]≤ D
Assume that (D1) and (D2) hold. Then
(2.6) E[eηYk+1 | F0
]≤ ρkeηY0 +Deηa(1− ρk)/(1− ρ).
(2.8) Pr (Yk ≥ b | F0) ≤ ρkeη(Y0−b) +Deη(a−b)(1− ρk)/(1− ρ).
Proof.(2.8) follows from Markov’s inequality and (2.6)
Pr (Yk+1 ≥ b | F0) = Pr(eηYk+1 ≥ eηb | F0
)≤ E
[eηYk+1 | F0
]e−ηb
![Page 100: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/100.jpg)
Proof of (*)
Theorem
(D1) E[eη(Yk+1−Yk) | Yk > a ∧Fk
]≤ ρ
(D2) E[eη(Yk+1−a) | Yk ≤ a ∧Fk
]≤ D
Assume that (D1) and (D2) hold for D ≥ 1. Then
(2.8) Pr (Yk ≥ b | F0) ≤ ρkeη(Y0−b) +Deη(a−b)(1− ρk)/(1− ρ).
(*) Pr (τb < B | Y0 < a) ≤ eη(a−b)BD/(1− ρ)
Proof.By the union bound and (2.8)
Pr (τb < B | Y0 < a ∧F0) ≤B∑k=1
Pr (Yk ≥ b | Y0 < a ∧F0)
≤B∑k=1
Deη(a−b)(ρk +
1− ρk
1− ρ
)≤ BDeη(a−b)
1− ρ
![Page 101: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/101.jpg)
Proof of (*)
Theorem
(D1) E[eη(Yk+1−Yk) | Yk > a ∧Fk
]≤ ρ
(D2) E[eη(Yk+1−a) | Yk ≤ a ∧Fk
]≤ D
Assume that (D1) and (D2) hold for D ≥ 1. Then
(2.8) Pr (Yk ≥ b | F0) ≤ ρkeη(Y0−b) +Deη(a−b)(1− ρk)/(1− ρ).
(*) Pr (τb < B | Y0 < a) ≤ eη(a−b)BD/(1− ρ)
Proof.By the union bound and (2.8)
Pr (τb < B | Y0 < a ∧F0) ≤B∑k=1
Pr (Yk ≥ b | Y0 < a ∧F0)
≤B∑k=1
Deη(a−b)(ρk +
1− ρk
1− ρ
)≤ BDeη(a−b)
1− ρ
![Page 102: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/102.jpg)
Union Bound
Ω
E3
E1 E2 E4
Pr (E1 ∨ E2 ∨ · · · ∨ Ek) ≤ Pr (E1) + Pr (E2) + · · ·+ Pr (Ek)
![Page 103: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/103.jpg)
Proof of (2.9)
Theorem
(D1) E[eη(Yk+1−Yk) | Yk > a ∧Fk
]≤ ρ
Assume that (D1) hold. Then
(2.9) Pr (τa > k | F0) ≤ eη(Y0−a)ρk
Proof.By (D1) Zk := eηYk∧τρ−k∧τ is a supermartingale, so
eηY0 = Z0 ≥ E [Zk | F0] = E[eηYk∧τρ−k∧τ | F0
](2)
By (2) and the law of total probability
eηY0 ≥ Pr (τa > k | F0)E[eηYk∧τρ−k∧τ | τa > k ∧F0
]
= Pr (τa > k | F0)E[eηYkρ−k | τa > k ∧F0
]≥ Pr (τa > k | F0) eηaρ−k
![Page 104: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/104.jpg)
Proof of (2.9)
Theorem
(D1) E[eηYk+1ρ−1 | Yk > a ∧Fk
]≤ eηYk
Assume that (D1) hold. Then
(2.9) Pr (τa > k | F0) ≤ eη(Y0−a)ρk
Proof.By (D1) Zk := eηYk∧τρ−k∧τ is a supermartingale, so
eηY0 = Z0 ≥ E [Zk | F0] = E[eηYk∧τρ−k∧τ | F0
](2)
By (2) and the law of total probability
eηY0 ≥ Pr (τa > k | F0)E[eηYk∧τρ−k∧τ | τa > k ∧F0
]
= Pr (τa > k | F0)E[eηYkρ−k | τa > k ∧F0
]≥ Pr (τa > k | F0) eηaρ−k
![Page 105: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/105.jpg)
Proof of (2.9)
Theorem
(D1) E[eηYk+1ρ−1 | Yk > a ∧Fk
]≤ eηYk
Assume that (D1) hold. Then
(2.9) Pr (τa > k | F0) ≤ eη(Y0−a)ρk
Proof.By (D1) Zk := eηYk∧τρ−k∧τ is a supermartingale, so
eηY0 = Z0 ≥ E [Zk | F0] = E[eηYk∧τρ−k∧τ | F0
](2)
By (2) and the law of total probability
eηY0 ≥ Pr (τa > k | F0)E[eηYk∧τρ−k∧τ | τa > k ∧F0
]
= Pr (τa > k | F0)E[eηYkρ−k | τa > k ∧F0
]≥ Pr (τa > k | F0) eηaρ−k
![Page 106: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/106.jpg)
Proof of (2.9)
Theorem
(D1) E[eηYk+1ρ−1 | Yk > a ∧Fk
]≤ eηYk
Assume that (D1) hold. Then
(2.9) Pr (τa > k | F0) ≤ eη(Y0−a)ρk
Proof.By (D1) Zk := eηYk∧τρ−k∧τ is a supermartingale, so
eηY0 = Z0 ≥ E [Zk | F0] = E[eηYk∧τρ−k∧τ | F0
](2)
By (2) and the law of total probability
eηY0 ≥ Pr (τa > k | F0)E[eηYk∧τρ−k∧τ | τa > k ∧F0
]= Pr (τa > k | F0)E
[eηYkρ−k | τa > k ∧F0
]
≥ Pr (τa > k | F0) eηaρ−k
![Page 107: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/107.jpg)
Proof of (2.9)
Theorem
(D1) E[eηYk+1ρ−1 | Yk > a ∧Fk
]≤ eηYk
Assume that (D1) hold. Then
(2.9) Pr (τa > k | F0) ≤ eη(Y0−a)ρk
Proof.By (D1) Zk := eηYk∧τρ−k∧τ is a supermartingale, so
eηY0 = Z0 ≥ E [Zk | F0] = E[eηYk∧τρ−k∧τ | F0
](2)
By (2) and the law of total probability
eηY0 ≥ Pr (τa > k | F0)E[eηYk∧τρ−k∧τ | τa > k ∧F0
]= Pr (τa > k | F0)E
[eηYkρ−k | τa > k ∧F0
]≥ Pr (τa > k | F0) eηaρ−k
![Page 108: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/108.jpg)
(C1) and (C2) =⇒ (D1)
(C1) E [Yk+1 − Yk | Yk > a ∧Fk] ≤ −ε0
(C2) (|Yk+1 − Yk| | Fk) ≺ Z and E[eλZ]
= D <∞ for a λ > 0.
(D1) E[eη(Yk+1−Yk) | Yk > a ∧Fk
]≤ ρ
LemmaAssume (C1) and (C2). Then (D1) holds when ρ ≥ 1− ηε0 + η2c,
and 0 < η ≤ minλ, ε0/c where c :=∑∞
k=2λk−2
k! E[Zk].
Proof.Let X := (Yk+1 − Yk | Yk > a ∧Fk).By (C2) it holds, |X| ≺ Z, so E
[Xk]≤ E
[|X|k
]≤ E
[Zk].
From ex =∑∞
k=0 xk/(k!) and linearity of expectation
0 < E[eηX
]= 1 + ηE [X] +
∞∑k=2
ηk
k!E[Xk]≤ ρ.
![Page 109: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/109.jpg)
(C2) =⇒ (D2)
(C2) (|Yk+1 − Yk| | Fk) ≺ Z and E[eλZ]
= D <∞ for a λ > 0.
(D2) E[eη(Yk+1−a) | Yk ≤ a ∧Fk
]≤ D
TheoremAssume (C2) and 0 < η ≤ λ. Then (D2) holds.
Proof.If Yk ≤ a then Yk+1 − a ≤ Yk+1 − Yk ≤ |Yk+1 − Yk|, so
E[eη(Yk+1−a) | Yk ≤ a ∧Fk
]≤ E
[eλ|Yk+1−Yk| | Yk ≤ a ∧Fk
]Furthermore, by (C2)
E[eλ|Yk+1−Yk| | Yk ≤ a ∧Fk
]≤ E
[eλZ]
= D.
![Page 110: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/110.jpg)
(C1) and (C2) =⇒ (D1) and (D2)
(C1) E [Yk+1 − Yk | Yk > a ∧Fk] ≤ −ε0
(C2) (|Yk+1 − Yk| | Fk) ≺ Z and E[eλZ]
= D <∞ for a λ > 0.
(D1) E[eη(Yk+1−Yk) | Yk > a ∧Fk
]≤ ρ
(D2) E[eη(Yk+1−a) | Yk ≤ a ∧Fk
]≤ D
LemmaAssume (C1) and (C2). Then (D1) and (D2) hold when
ρ ≥ 1− ηε0 + η2c and 0 < η ≤ minλ, ε0/c
where c :=∑∞
k=2λk−2
k! E[Zk]
= (D − 1− λE [Z])λ−2 > 0.
Corollary
Assume (C1), (C2) and 0 < δ < 1. Then
(D1) and (D2) hold for
η := minλ, δε0/c
=⇒ δε0 ≥ ηc
ρ := 1− (1− δ)ηε0
= 1− ηε0 + ηδε0 ≥ 1− ηε+ η2c
![Page 111: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/111.jpg)
(C1) and (C2) =⇒ (D1) and (D2)
(C1) E [Yk+1 − Yk | Yk > a ∧Fk] ≤ −ε0
(C2) (|Yk+1 − Yk| | Fk) ≺ Z and E[eλZ]
= D <∞ for a λ > 0.
(D1) E[eη(Yk+1−Yk) | Yk > a ∧Fk
]≤ ρ
(D2) E[eη(Yk+1−a) | Yk ≤ a ∧Fk
]≤ D
LemmaAssume (C1) and (C2). Then (D1) and (D2) hold when
ρ ≥ 1− ηε0 + η2c and 0 < η ≤ minλ, ε0/c
where c :=∑∞
k=2λk−2
k! E[Zk]
= (D − 1− λE [Z])λ−2 > 0.
Corollary
Assume (C1), (C2) and 0 < δ < 1. Then (D1) and (D2) hold for
η := minλ, δε0/c =⇒ δε0 ≥ ηcρ := 1− (1− δ)ηε0 = 1− ηε0 + ηδε0 ≥ 1− ηε+ η2c
![Page 112: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/112.jpg)
Reformulation of Hajek’s Theorem
TheoremIf there exist λ, ε > 0 and 1 < D <∞ such that for all k ≥ 0
(C1) E [Yk+1 − Yk | Yk > a ∧Fk] ≤ −ε0
(C2) (|Yk+1 − Yk| | Fk) ≺ Z and E[eλZ]
= D
then for any δ ∈ (0, 1)
(2.9) Pr (τa > B | F0) ≤ eη(Y0−a)ρB
(*) Pr (τb < B | Y0 < a) ≤ BD(1−ρ) · e
η(a−b)
where η := minλ, δε0/c and ρ := 1− (1− δ)ηε0
1. Note that ln(ρ) ≤ ρ− 1 = −(1− δ)ηε0 so
Pr (τa > B | F0) ≤ eη(Y0−a)ρB = eη(Y0−a)eB ln(ρ)
≤ eη(Y0−a−B(1−δ)ε0).
2. c = (D − 1− λE [Z])λ−2 < D/λ2 so η ≥ minλ, δε0λ2/D.
![Page 113: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/113.jpg)
Reformulation of Hajek’s Theorem
TheoremIf there exist λ, ε > 0 and 1 < D <∞ such that for all k ≥ 0
(C1) E [Yk+1 − Yk | Yk > a ∧Fk] ≤ −ε0
(C2) (|Yk+1 − Yk| | Fk) ≺ Z and E[eλZ]
= D
then for any δ ∈ (0, 1)
(2.9) Pr (τa > B | F0) ≤ eη(Y0−a)ρB
(*) Pr (τb < B | Y0 < a) ≤ BD(1−ρ) · e
η(a−b)
where η := minλ, δε0/c and ρ := 1− (1− δ)ηε0
1. Note that ln(ρ) ≤ ρ− 1 = −(1− δ)ηε0 so
Pr (τa > B | F0) ≤ eη(Y0−a)ρB = eη(Y0−a)eB ln(ρ)
≤ eη(Y0−a−B(1−δ)ε0).
2. c = (D − 1− λE [Z])λ−2 < D/λ2 so η ≥ minλ, δε0λ2/D.
![Page 114: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/114.jpg)
Reformulation of Hajek’s Theorem
TheoremIf there exist λ, ε > 0 and 1 < D <∞ such that for all k ≥ 0
(C1) E [Yk+1 − Yk | Yk > a ∧Fk] ≤ −ε0
(C2) (|Yk+1 − Yk| | Fk) ≺ Z and E[eλZ]
= D
then for any δ ∈ (0, 1)
(2.9) Pr (τa > B | F0) ≤ eη(Y0−a−B(1−δ)ε0)
(*) Pr (τb < B | Y0 < a) ≤ BD(1−δ)ηε0 · e
η(a−b)
for some η ≥ minλ, δε0λ2/D.
1. Note that ln(ρ) ≤ ρ− 1 = −(1− δ)ηε0 so
Pr (τa > B | F0) ≤ eη(Y0−a)ρB = eη(Y0−a)eB ln(ρ)
≤ eη(Y0−a−B(1−δ)ε0).
2. c = (D − 1− λE [Z])λ−2 < D/λ2 so η ≥ minλ, δε0λ2/D.
![Page 115: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/115.jpg)
Simplified Drift Theorem [17]
We have already seen that
(C2) (|Yk+1 − Yk| | Fk) ≺ Z and E[eλZ]
= D
implies Pr (|Yk+1 − Yk| ≥ j) ≤ De−λj for all j ∈ N0.
The simplified drift theorem replaces (C2) with
(S) Pr (Yk+1 − Yk ≥ j | Yk < b) ≤ r(n)(1 + δ)−j for all j ∈ N0.
and with some additional assumptions, provides a bound of type6
Pr(τb < 2c(b−a)
)≤ 2−Ω(b−a). (3)
I Until 2008, conditions (D1) and (D2) were used in EC.
I (D1) and (D2) can lead to highly tedious calculations.
I Oliveto and Witt were the first in EC to point out that themuch simpler to verify (C1), along with (S) is sufficient.
6See [17] for the exact statement.
![Page 116: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/116.jpg)
Simplified Drift Theorem [17]
We have already seen that
(C2) (|Yk+1 − Yk| | Fk) ≺ Z and E[eλZ]
= D
implies Pr (|Yk+1 − Yk| ≥ j) ≤ De−λj for all j ∈ N0.
The simplified drift theorem replaces (C2) with
(S) Pr (Yk+1 − Yk ≥ j | Yk < b) ≤ r(n)(1 + δ)−j for all j ∈ N0.
and with some additional assumptions, provides a bound of type6
Pr(τb < 2c(b−a)
)≤ 2−Ω(b−a). (3)
I Until 2008, conditions (D1) and (D2) were used in EC.
I (D1) and (D2) can lead to highly tedious calculations.
I Oliveto and Witt were the first in EC to point out that themuch simpler to verify (C1), along with (S) is sufficient.
6See [17] for the exact statement.
![Page 117: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/117.jpg)
Part 6 - Population Drift
![Page 118: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/118.jpg)
Drift Analysis of Population-based Evolutionary Algorithms
I Evolutionary algorithms generally use populations.
I So far, we have analysed the drift of the (1+1) EA,ie an evolutionary algorithm with population size one.
I The state aggregation problem makes analysis ofpopulation-based EAs with classical drift theorems difficult:
How to define an appropriate distance function?I Should reflect the progress of the algorithmI Often hard to define for single-individual algorithmsI Highly non-trivial for population-based algorithms
=⇒ This part of the tutorial focuses on a drift theorem forpopulations which alleviates the state aggregation problem.
![Page 119: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/119.jpg)
Population-based Evolutionary Algorithms
Ptx
Require: ,Finite set X , and initial population P0 ∈ X λSelection mechanism psel : X λ ×X → [0, 1]Variation operator pmut : X × X → [0, 1]
for t = 0, 1, 2, . . . until termination condition dofor i = 1 to λ do
Sample i-th parent x according to psel(Pt, ·)Sample i-th offspring Pt+1(i) according to pmut(x, ·)
end forend for
![Page 120: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/120.jpg)
Selection and Variation - Example
10001110111011
10010110111001
Tournament Selection wrt
Bitwise Mutation
![Page 121: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/121.jpg)
Population Drift
Central Parameters
I Reproductive rate of selection mechanism psel
α0 = max1≤j≤λ
E [#offspring from parent j],
I Random walk process corresponding to variation operator pmut
Xk+1∼ pmut(Xk)
![Page 122: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/122.jpg)
Population Drift
Central Parameters
I Reproductive rate of selection mechanism psel
α0 = max1≤j≤λ
E [#offspring from parent j],
I Random walk process corresponding to variation operator pmut
Xk+1∼ pmut(Xk)
![Page 123: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/123.jpg)
Population Drift
Central Parameters
I Reproductive rate of selection mechanism psel
α0 = max1≤j≤λ
E [#offspring from parent j],
I Random walk process corresponding to variation operator pmut
Xk+1∼ pmut(Xk)
![Page 124: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/124.jpg)
Population Drift
Central Parameters
I Reproductive rate of selection mechanism psel
α0 = max1≤j≤λ
E [#offspring from parent j],
I Random walk process corresponding to variation operator pmut
Xk+1∼ pmut(Xk)
![Page 125: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/125.jpg)
Population Drift [12]
(C1P) ∀k E[eκ(g(Xk+1)−g(Xk)) | a < g(Xk) < b
]< 1/α0
TheoremDefine τb := mink ≥ 0 | g(Pk(i)) > b for some i ∈ [λ].
If there exists constants α0 ≥ 1 and κ > 0 such that
I psel has reproductive rate less than α0
I the random walk process corresponding to pmut satisfies (C1P)
and some other conditions hold,7 then for some constants c, c′ > 0
Pr(τb ≤ ec(b−a)
)= e−c
′(b−a)
7Some details are omitted. See Theorem 1 in [12] for all details.
![Page 126: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/126.jpg)
Population Drift: Decoupling Selection & Variation
Population driftIf there exists a κ > 0 such that
M∆mut(κ) < 1/α0
where
∆mut= g(Xk+1)− g(Xk)
Xk+1∼ pmut(Xk)
and
α0 = maxj
E [#offspring from parent j],
then the runtime is exponential.
![Page 127: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/127.jpg)
Population Drift: Decoupling Selection & Variation
Population driftIf there exists a κ > 0 such that
M∆mut(κ) < 1/α0
where
∆mut= g(Xk+1)− g(Xk)
Xk+1∼ pmut(Xk)
and
α0 = maxj
E [#offspring from parent j],
then the runtime is exponential.
Classical drift [6]If there exists a κ > 0 such that
M∆(κ) < 1
where
∆ = h(Pk+1)− h(Pk),
then the runtime is exponential.
![Page 128: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/128.jpg)
Conclusion
I Drift analysis is a powerful tool for analysis of EAsI Mainly used in EC to bound the expected runtime of EAsI Useful when the EA has non-monotonic progress,
eg. when the fitness value is a poor indicator of progress
I The “art” consists in finding a good distance functionI No simple receipe
I A large number of drift theorems are availableI Additive, multiplicative, variable, population drift...I Significant related literature from other fields than EC
I Not the only tool in the toolbox, alsoI Artificial fitness levels, Markov Chain theory, Concentration of
measure, Branching processes, Martingale theory, Probabilitygenerating functions, ...
![Page 129: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/129.jpg)
Acknowledgements
Thanks to
I David Hodge
I Carsten Witt, and
I Daniel Johannsen
for insightful discussions.
![Page 130: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/130.jpg)
References I
[1] Benjamin Doerr and Leslie Ann Goldberg.Drift analysis with tail bounds.In Proceedings of the 11th international conference on Parallel problem solvingfrom nature: Part I, PPSN’10, pages 174–183, Berlin, Heidelberg, 2010.Springer-Verlag.
[2] Benjamin Doerr, Daniel Johannsen, and Carola Winzen.Multiplicative drift analysis.In GECCO ’10: Proceedings of the 12th annual conference on Genetic andevolutionary computation, pages 1449–1456, New York, NY, USA, 2010. ACM.
[3] Stefan Droste, Thomas Jansen, and Ingo Wegener.On the analysis of the (1+1) Evolutionary Algorithm.Theoretical Computer Science, 276:51–81, 2002.
[4] Simon Fischer, Lars Olbrich, and Berthold Vocking.Approximating wardrop equilibria with finitely many agents.Distributed Computing, 21(2):129–139, 2008.
[5] Oliver Giel and Ingo Wegener.Evolutionary algorithms and the maximum matching problem.In Proceedings of the 20th Annual Symposium on Theoretical Aspects ofComputer Science (STACS 2003), pages 415–426, 2003.
![Page 131: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/131.jpg)
References II
[6] Bruce Hajek.Hitting-time and occupation-time bounds implied by drift analysis withapplications.Advances in Applied Probability, 14(3):502–525, 1982.
[7] Jun He and Xin Yao.Drift analysis and average time complexity of evolutionary algorithms.Artificial Intelligence, 127(1):57–85, March 2001.
[8] Jun He and Xin Yao.A study of drift analysis for estimating computation time of evolutionaryalgorithms.Natural Computing, 3(1):21–35, 2004.
[9] Jens Jagerskupper.Algorithmic analysis of a basic evolutionary algorithm for continuousoptimization.Theoretical Computer Science, 379(3):329–347, 2007.
[10] Jens Jagerskupper.A Blend of Markov-Chain and Drift Analysis.In Proceedings of the 10th International Conference on Parallel Problem Solvingfrom Nature (PPSN 2008), 2008.
![Page 132: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/132.jpg)
References III
[11] Daniel Johannsen.Random combinatorial structures and randomized search heuristics.PhD thesis, Universitat des Saarlandes, 2010.
[12] Per Kristian Lehre.Negative drift in populations.In Proceedings of Parallel Problem Solving from Nature - (PPSN XI), volume6238 of LNCS, pages 244–253. Springer Berlin / Heidelberg, 2011.
[13] Per Kristian Lehre and Carsten Witt.Black-box search by unbiased variation.Algorithmica, pages 1–20, 2012.
[14] Sean P. Meyn and Richard L. Tweedie.Markov Chains and Stochastic Stability.Springer-Verlag, 1993.
[15] B. Mitavskiy, J. E. Rowe, and C. Cannings.Theoretical analysis of local search strategies to optimize network communicationsubject to preserving the total number of links.International Journal of Intelligent Computing and Cybernetics, 2(2):243–284,2009.
[16] Frank Neumann, Dirk Sudholt, and Carsten Witt.Analysis of different mmas aco algorithms on unimodal functions and plateaus.Swarm Intelligence, 3(1):35–68, 2009.
![Page 133: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/133.jpg)
References IV
[17] Pietro Oliveto and Carsten Witt.Simplified drift analysis for proving lower bounds inevolutionary computation.Algorithmica, pages 1–18, 2010.10.1007/s00453-010-9387-z.
[18] Pietro S. Oliveto and Carsten Witt.Simplified drift analysis for proving lower bounds in evolutionary computation.Technical Report Reihe CI, No. CI-247/08, SFB 531, Technische UniversitatDortmund, Germany, 2008.
[19] Galen H. Sasaki and Bruce Hajek.The time complexity of maximum matching by simulated annealing.Journal of the ACM, 35(2):387–403, 1988.
[20] Dirk Sudholt.General lower bounds for the running time of evolutionary algorithms.In Proceedings of Parallel Problem Solving from Nature - (PPSN XI), volume6238 of LNCS, pages 124–133. Springer Berlin / Heidelberg, 2010.
[21] David Williams.Probability with Martingales.Cambridge University Press, 1991.
![Page 134: Drift Analysis - A Tutorial](https://reader034.fdocuments.net/reader034/viewer/2022052411/5562dbd4d8b42a6c498b55fb/html5/thumbnails/134.jpg)
References V
[22] Carsten Witt.Optimizing linear functions with randomized search heuristics - the robustness ofmutation.In Christoph Durr and Thomas Wilke, editors, 29th International Symposium onTheoretical Aspects of Computer Science (STACS 2012), volume 14 of LeibnizInternational Proceedings in Informatics (LIPIcs), pages 420–431, Dagstuhl,Germany, 2012. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik.