ujava.org workshop: Reinforcement Learning with Thompson Sampling
www.idosi.com
Reinforcement Learning with Thompson Sampling
(3rd)
ujava.org workshop
2016-08-28
www.idosi.com
CEO Shindong KANG
ujava.org
spaceapi.org
Reinforcement Learning for Brick Game
Reinforcement Learning
Forecast
Forecast with probability
Probability
Conditional Probability
Bayesian Probability
Bayes' Rule in Words
Bayesian Probability
P(fair|H) = ?
P(A) = P(fair), P(B) = P(H), P(B|A) = P(H|fair)
P(A|B) = P(B|A) P(A) / P(B)
P(fair|H) = P(H|fair) P(fair) / P(H) = 1/3
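As a worked illustration of Bayes' rule for this question, the sketch below computes P(fair|H) in Python. The setup is an assumption and is not taken from the slides: one of two coins is picked with equal probability, one fair and one two-headed. The helper name `posterior` is hypothetical.

```python
from fractions import Fraction

def posterior(prior_a, likelihood_b_given_a, evidence_b):
    """Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood_b_given_a * prior_a / evidence_b

# Assumed setup (not from the slides): a fair coin and a two-headed coin,
# each chosen with probability 1/2.
p_fair = Fraction(1, 2)
p_h_given_fair = Fraction(1, 2)
p_h_given_two_headed = Fraction(1, 1)

# Law of total probability: P(H) = P(H|fair)P(fair) + P(H|two-headed)P(two-headed)
p_h = p_h_given_fair * p_fair + p_h_given_two_headed * (1 - p_fair)

p_fair_given_h = posterior(p_fair, p_h_given_fair, p_h)
print(p_fair_given_h)  # 1/3
```

Under these assumed numbers, seeing heads lowers the probability that the coin is fair from 1/2 to 1/3.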
Brownian motion, Gaussian distribution
Markov Process
Stochastic Matrix
0.4 0.6
0.7 0.3
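In a row-stochastic matrix every row is a probability distribution over next states, so each row sums to 1. A minimal sketch, assuming the four numbers above are the rows [0.4, 0.6] and [0.7, 0.3] of a two-state Markov chain (the state ordering and the helper name `step` are illustrative):

```python
# Assumed reading of the slide: a 2x2 row-stochastic transition matrix.
P = [[0.4, 0.6],
     [0.7, 0.3]]

def step(dist, P):
    """One Markov step: new_dist[j] = sum_i dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

# Each row must sum to 1 for the matrix to be stochastic.
assert all(abs(sum(row) - 1.0) < 1e-12 for row in P)

dist = [1.0, 0.0]  # start in state 0 with certainty
for _ in range(50):
    dist = step(dist, P)

# After many steps the distribution settles at the stationary one (7/13, 6/13).
print([round(p, 4) for p in dist])  # [0.5385, 0.4615]
```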
Exploitation and Exploration
State-action exploration vs. Parameter exploration
Multi-armed bandit problem
Simulated Bandit Performance
Multi-armed bandit problem
Multi-Armed Bandit Algorithms
MAB Reward
Gaussian Distribution
GMM (Gaussian Mixture Model)
Function's Probability Distribution
y = ax^2 + b
Function's Probability Distribution with Gaussian Distribution
y = ax^2 + b
Gaussian Process Regression
Gaussian Process
From C. E. Rasmussen & C. K. I. Williams, Gaussian Processes for Machine Learning, the MIT Press, 2006
Bayesian Optimization
Acquisition function
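An acquisition function scores candidate points using the surrogate model's predicted mean and uncertainty, and Bayesian optimization evaluates the point with the highest score next. The slides do not say which acquisition function was used. The sketch below shows expected improvement (for maximization) as one common choice. The function name and the `xi` margin parameter are illustrative.

```python
import math

def expected_improvement(mu, sigma, best, xi=0.0):
    """Expected improvement over `best` for a Gaussian prediction N(mu, sigma^2).

    Assumption: we are maximizing, and `xi` is an optional exploration margin.
    """
    if sigma <= 0.0:
        # No uncertainty: improvement is deterministic.
        return max(mu - best - xi, 0.0)
    z = (mu - best - xi) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    return (mu - best - xi) * cdf + sigma * pdf

# Two candidates with the same predicted mean: the more uncertain one scores higher.
print(expected_improvement(0.0, 1.0, 1.0) < expected_improvement(0.0, 2.0, 1.0))  # True
```

The second term rewards uncertainty, which is how the acquisition function trades exploration against exploitation.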
Why Bayesian Optimization works
Bayesian reasoners
Intelligent user interfaces regression
Slot Machine
Multi Armed Bandit
MAB Regret
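Regret is the expected reward lost by not always playing the best arm. A minimal sketch of the (pseudo-)regret computation, assuming the true arm means are known as they would be in a simulation. The numbers and the function name are hypothetical.

```python
def pseudo_regret(true_means, chosen_arms):
    """Sum over rounds of (best arm's mean - mean of the arm actually played)."""
    best = max(true_means)
    return sum(best - true_means[arm] for arm in chosen_arms)

# Hypothetical example: three arms, five rounds.
true_means = [0.2, 0.5, 0.7]
chosen_arms = [0, 1, 2, 2, 2]

regret = pseudo_regret(true_means, chosen_arms)
print(round(regret, 3))  # 0.7
```

Only the two exploratory pulls contribute here. Rounds that play the best arm add zero regret.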
A/B Testing
Greedy Algorithm
Greedy Algorithm (Search Maximum)
Greedy Algorithm (Search Tree)
epsilon Greedy (epsilon = exploration)
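With probability epsilon the agent explores by picking a random arm. Otherwise it exploits the arm with the highest estimated value. A minimal sketch, where the function name and the `estimates` list are illustrative:

```python
import random

def epsilon_greedy(estimates, epsilon, rng=random):
    """Return an arm index: random with probability epsilon, else the current best."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))                        # explore
    return max(range(len(estimates)), key=lambda a: estimates[a])   # exploit

rng = random.Random(0)
estimates = [0.1, 0.9, 0.3]  # hypothetical estimated mean rewards per arm
picks = [epsilon_greedy(estimates, 0.1, rng) for _ in range(1000)]
print(picks.count(1) / len(picks))  # mostly arm 1, with occasional exploration
```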
Softmax
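Softmax (Boltzmann) exploration picks each arm with probability proportional to exp(estimate / temperature), so better arms are chosen more often without ruling the others out. A minimal sketch, where the function names and the `temperature` parameter name are illustrative:

```python
import math
import random

def softmax_probs(estimates, temperature):
    """Probabilities proportional to exp(estimate / temperature)."""
    m = max(estimates)  # subtract the max for numerical stability
    weights = [math.exp((q - m) / temperature) for q in estimates]
    total = sum(weights)
    return [w / total for w in weights]

def softmax_select(estimates, temperature, rng=random):
    """Sample one arm index according to the softmax probabilities."""
    probs = softmax_probs(estimates, temperature)
    return rng.choices(range(len(estimates)), weights=probs, k=1)[0]

probs = softmax_probs([1.0, 2.0, 3.0], temperature=1.0)
print([round(p, 3) for p in probs])  # [0.09, 0.245, 0.665]
```

A high temperature pushes the probabilities toward uniform (more exploration). A low temperature approaches the greedy choice.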
UCB
argmax
UCB
UCB1
Log graph
UCB1
Indicator function
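UCB1 plays the arm that maximizes the empirical mean plus the bonus sqrt(2 ln t / n), where n is the number of times the arm has been played. That count is a sum of indicator functions over past rounds. A minimal sketch of the standard UCB1 rule, where the function and variable names are illustrative:

```python
import math

def count_plays(history, arm):
    """n_arm = sum over past rounds of the indicator 1{chosen arm == arm}."""
    return sum(1 if chosen == arm else 0 for chosen in history)

def ucb1_select(counts, reward_sums, t):
    """Return argmax over arms of mean + sqrt(2 ln t / n); unplayed arms go first."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm  # initialization: play every arm once
    scores = [
        reward_sums[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a])
        for a in range(len(counts))
    ]
    return max(range(len(scores)), key=scores.__getitem__)

history = [0, 1, 1, 2, 1]  # hypothetical sequence of chosen arms
print(count_plays(history, 1))  # 3
```

Because the bonus grows with ln t and shrinks with n, a rarely played arm is eventually tried again even if its empirical mean is low.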
Thompson sampling
Also known as Probability Matching or Bayesian Bandit
Thompson sampling
(from SlideShare Slice Technologies)
Thompson sampling (area = 1)
Thompson sampling
19 / (19 + 9) = 19 / 28 = 0.679
59 / (59 + 39) = 59 / 98 = 0.60
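The two ratios have the form a / (a + b), which is the mean of a Beta(a, b) distribution. The sketch below assumes the numbers are Beta parameters for two arms, Beta(19, 9) and Beta(59, 39). That reading is an assumption, since the slide images are not available. It also estimates how often a posterior sample from the first arm beats one from the second.

```python
import random

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

print(round(beta_mean(19, 9), 3))    # 0.679
print(round(beta_mean(59, 39), 3))   # 0.602

# Thompson sampling compares random draws, not means, so the arm with the
# lower mean still wins some of the time.
rng = random.Random(0)
trials = 10_000
wins_a = sum(rng.betavariate(19, 9) > rng.betavariate(59, 39) for _ in range(trials))
frac_a = wins_a / trials
print(frac_a)  # fraction of rounds in which the first arm would be chosen
```

The second arm has more observations, so its posterior is narrower. The first arm has a higher mean but more uncertainty, so it is chosen often but not always.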
Thompson sampling Algorithm for Bernoulli bandits
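A minimal simulation sketch of Thompson sampling for Bernoulli bandits, assuming a uniform Beta(1, 1) prior on each arm. Each round draws one sample per arm from its Beta posterior, plays the arm with the largest sample, and updates that arm's success or failure count. The function name, the arm probabilities, and the horizon are illustrative.

```python
import random

def thompson_bernoulli(true_probs, horizon, seed=0):
    """Simulate Thompson sampling; return how many times each arm was pulled."""
    rng = random.Random(seed)
    k = len(true_probs)
    successes = [0] * k
    failures = [0] * k
    pulls = [0] * k
    for _ in range(horizon):
        # One posterior sample per arm: Beta(successes + 1, failures + 1).
        samples = [rng.betavariate(successes[a] + 1, failures[a] + 1) for a in range(k)]
        arm = max(range(k), key=samples.__getitem__)
        # Simulated Bernoulli reward from the (normally unknown) true probability.
        reward = 1 if rng.random() < true_probs[arm] else 0
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
        pulls[arm] += 1
    return pulls

pulls = thompson_bernoulli([0.2, 0.5, 0.8], horizon=2000)
print(pulls)  # the arm with probability 0.8 should receive most of the pulls
```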
Thompson sampling Algorithm for general stochastic bandits
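One known way to extend the Bernoulli algorithm to rewards in [0, 1], described by Agrawal and Goyal, is to turn each observed reward r into a Bernoulli trial with success probability r and then update the Beta posterior as before. Whether the slide showed exactly this variant is an assumption. The function name and the reward distributions are illustrative.

```python
import random

def thompson_general(reward_fns, horizon, seed=0):
    """Thompson sampling for rewards in [0, 1] via a Bernoulli trial per reward."""
    rng = random.Random(seed)
    k = len(reward_fns)
    successes = [0] * k
    failures = [0] * k
    for _ in range(horizon):
        samples = [rng.betavariate(successes[a] + 1, failures[a] + 1) for a in range(k)]
        arm = max(range(k), key=samples.__getitem__)
        r = reward_fns[arm](rng)  # observed reward, assumed to lie in [0, 1]
        # Bernoulli trial with success probability r replaces the raw reward.
        if rng.random() < r:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures

# Hypothetical arms: uniform rewards on [0, 0.4] and on [0.5, 1.0].
arms = [lambda rng: rng.uniform(0.0, 0.4), lambda rng: rng.uniform(0.5, 1.0)]
successes, failures = thompson_general(arms, horizon=1000)
pulls = [s + f for s, f in zip(successes, failures)]
print(pulls)  # the second arm should dominate
```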
Multi-play Thompson Sampling
(from MS Research)
Multi-play Thompson Sampling (MP-TS)
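In the multiple-play setting several arms are played in each round. A minimal sketch of the selection step, assuming MP-TS draws one Beta posterior sample per arm (with a Beta(1, 1) prior) and plays the arms with the largest samples. The function name and the counts are illustrative.

```python
import random

def mpts_select(successes, failures, num_plays, rng):
    """Return the indices of the `num_plays` arms with the largest posterior samples."""
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    ranked = sorted(range(len(samples)), key=samples.__getitem__, reverse=True)
    return ranked[:num_plays]

rng = random.Random(0)
# Hypothetical counts for four arms; two arms are played per round.
chosen = mpts_select([30, 5, 20, 1], [10, 25, 15, 9], num_plays=2, rng=rng)
print(sorted(chosen))
```

After the round, each played arm's success or failure count is updated with its own observed reward, exactly as in the single-play case.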
Improved Multi-play Thompson Sampling (IMP-TS)
Thank you!
Intelligent City Ltd.
Shindong KANG