Random Forest for the Contextual Bandit Problem

Raphael Feraud, Robin Allesiardo, Tanguy Urvoy, Fabrice Clerot (AISTATS 2016)

Jungtaek Kim ([email protected])
Machine Learning Group, Department of Computer Science and Engineering, POSTECH,
77-Cheongam-ro, Nam-gu, Pohang-si 37673, Gyungsangbuk-do, Republic of Korea
Mar 28, 2017


Table of Contents

Preliminary
  - Decision Tree
  - Random Decision Forests
  - Multi-Armed Bandit
  - Contextual Bandit

Random Forest for the Contextual Bandit Problem


Preliminary


Decision Tree

- A decision tree is used for classification and regression.
- Each node j holds a set $S_j$ that is the union of its children's sets. In a binary tree, $S_j = S_j^L \cup S_j^R$ and $S_j^L \cap S_j^R = \emptyset$.
- The parameters of each node's split function are estimated by optimizing an objective function computed from a metric; splitting continues until a stopping criterion is reached.
- Metrics
  - Gini impurity: $GINI(t) = 1 - \sum_j (p(j \mid t))^2$.
  - Information gain: $I = H(S) - \sum_i \frac{|S_i|}{|S|} H(S_i)$.
- Stopping criteria
  - Maximum depth limit
  - Node's set size limit
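The two metrics above can be checked on a toy node. The following is a minimal sketch that assumes class labels stored in plain Python lists and entropy measured in bits; the helper names (`class_probabilities`, `gini_impurity`, `entropy`, `information_gain`) are illustrative and do not come from the slides.

```python
import math
from collections import Counter


def class_probabilities(labels):
    """Empirical class distribution p(j | t) of the labels at a node t."""
    counts = Counter(labels)
    total = len(labels)
    return [count / total for count in counts.values()]


def gini_impurity(labels):
    """GINI(t) = 1 - sum_j p(j | t)^2."""
    return 1.0 - sum(p * p for p in class_probabilities(labels))


def entropy(labels):
    """Shannon entropy H(S) in bits (assumed base 2)."""
    return -sum(p * math.log2(p) for p in class_probabilities(labels))


def information_gain(parent, children):
    """I = H(S) - sum_i |S_i| / |S| * H(S_i) for a split of parent into children."""
    weighted = sum(len(child) / len(parent) * entropy(child) for child in children)
    return entropy(parent) - weighted


# A perfectly separating binary split of a balanced two-class node.
parent = ["a", "a", "b", "b"]
left, right = ["a", "a"], ["b", "b"]
print(gini_impurity(parent))
print(information_gain(parent, [left, right]))
```

For this balanced node the Gini impurity is 0.5, and a split that separates the two classes perfectly recovers the full 1 bit of entropy as information gain.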


Decision Tree

Figure 1: The training process (left) estimates the tree parameters $\theta$ that maximize $f(S_j, S_j^L, S_j^R, \theta)$ for node j. The testing process (right) routes an unseen data point to a leaf node and determines an output using the predictor $p(c \mid v)$.


Random Decision Forests

- A random decision forest is an ensemble of randomly trained decision trees.
- Commonly, it combines weak learners into a strong learner.
- The methods to build a randomized decision tree are
  - Random training dataset sampling
    - Bagging: $\tilde{C}_{bag}(x) = \mathrm{MajorityVote}\{C(S^{*b}, x)\}_{b=1}^{B}$.
    - Random Forests: refinement of bagging.
    - Boosting: $C(x) = \mathrm{sign}\left[\sum_{m=1}^{M} \alpha_m C_m(x)\right]$.
  - Randomized node optimization
- A leaf predictor outputs a prediction based on a distribution over the classes to which the test data reaching that leaf might belong.
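The bagging rule above (majority vote over B classifiers, each trained on a bootstrap sample $S^{*b}$) can be sketched as follows. This is a minimal illustration, not the slides' implementation: the base learner is assumed to be a one-dimensional threshold classifier, and the names `train_stump`, `bagging_predict`, and the toy dataset are hypothetical.

```python
import random
from collections import Counter


def train_stump(sample):
    """Fit a 1-D threshold classifier that predicts 1 when x > threshold.

    Candidate thresholds are the x values in the sample; the one with the
    fewest training errors is kept (assumed base learner, for illustration).
    """
    best_threshold, best_errors = None, None
    for threshold, _ in sample:
        errors = sum((x > threshold) != bool(y) for x, y in sample)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def bagging_predict(thresholds, x):
    """Majority vote over the B stumps, one vote per stump."""
    votes = Counter(int(x > t) for t in thresholds)
    return votes.most_common(1)[0][0]


rng = random.Random(0)
data = [(x / 10, int(x >= 5)) for x in range(10)]  # label is 1 iff x >= 0.5

B = 25  # odd, so a binary vote cannot tie
thresholds = []
for _ in range(B):
    bootstrap = [rng.choice(data) for _ in data]  # sample with replacement
    thresholds.append(train_stump(bootstrap))

print(bagging_predict(thresholds, 0.95), bagging_predict(thresholds, -0.1))
```

Each bootstrap sample yields a slightly different threshold; the vote averages out that variability, which is the effect bagging relies on.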


Multi-Armed Bandit


Multi-Armed Bandit

- For K arms, $r_{a,t}$ is a reward sampled from an unknown stochastic process, where $a \in \{1, \dots, K\}$ and t is the time step.
- Iteratively, an agent chooses an arm $a_t \in \{1, \dots, K\}$ and receives a reward $r_t = r_{a_t,t}$.
- A sequential decision is built as

  $a_t = f_t(a_1, r_1, \dots, a_{t-1}, r_{t-1})$.

- The goal is to minimize the cumulative regret

  $R_n = \left( \max_{i=1,\dots,K} \mathbb{E} \sum_{t=1}^{n} r_{i,t} \right) - \mathbb{E} \sum_{t=1}^{n} r_{a_t,t}$.
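To make the regret definition concrete, here is a small simulation sketch. It assumes stationary Bernoulli arms and uses an epsilon-greedy agent, which is a stand-in chosen for brevity and not an algorithm named on the slides; `epsilon_greedy` and its parameters are hypothetical names. The quantity tracked is the pseudo-regret of a single run, that is, the sum of gaps between the best arm's mean and the chosen arm's mean.

```python
import random


def epsilon_greedy(means, n, epsilon=0.1, seed=0):
    """Play n >= K rounds on Bernoulli arms; return (pseudo-regret, pull counts)."""
    rng = random.Random(seed)
    K = len(means)
    counts = [0] * K    # number of pulls per arm
    totals = [0.0] * K  # summed reward per arm
    best = max(means)
    regret = 0.0
    for t in range(n):
        if t < K:
            a = t  # initialization: pull every arm once
        elif rng.random() < epsilon:
            a = rng.randrange(K)  # explore uniformly at random
        else:
            # exploit the arm with the highest empirical mean
            a = max(range(K), key=lambda i: totals[i] / counts[i])
        r = 1.0 if rng.random() < means[a] else 0.0  # reward r_{a_t, t}
        counts[a] += 1
        totals[a] += r
        regret += best - means[a]  # expected loss of this choice
    return regret, counts


regret, counts = epsilon_greedy([0.2, 0.5, 0.8], n=1000)
print(regret, counts)
```

With a fixed epsilon the agent keeps exploring forever, so its regret grows linearly in n; algorithms with sublinear regret reduce exploration as evidence accumulates.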


Contextual Bandit

- The objective is identical to that of the conventional multi-armed bandit problem.
- A feature vector $x_t$ summarizes information about both the user and the arm $a_t$.
- A common goal is to achieve a regret bound of $O(\sqrt{T})$.
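A tiny worked example shows why the context matters. The sketch below assumes a single binary context variable, two arms, and a made-up table of expected rewards (`mu`, `p_context`, and `policy_value` are all hypothetical); it compares the best policy that maps context to arm against the best single arm chosen without looking at the context.

```python
# mu[x][a]: assumed expected reward of arm a when the context equals x
mu = {0: [0.9, 0.1], 1: [0.2, 0.8]}
# assumed probability of observing each context value
p_context = {0: 0.5, 1: 0.5}


def policy_value(policy):
    """Expected per-step reward of a policy mapping context -> arm."""
    return sum(p_context[x] * mu[x][policy(x)] for x in mu)


# Best context-aware policy: pick the best arm separately for each context.
best_contextual = policy_value(lambda x: max(range(2), key=lambda a: mu[x][a]))

# Best context-free policy: one fixed arm for every context.
best_fixed_arm = max(policy_value(lambda x, a=a: a) for a in range(2))

print(round(best_contextual, 2), round(best_fixed_arm, 2))
```

Under these assumed numbers the context-aware policy earns about 0.85 per step against about 0.55 for the best fixed arm, so an agent that ignores $x_t$ pays a constant per-step gap and cannot reach sublinear regret against the contextual optimum.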


Random Forest for the Contextual Bandit Problem


Random Forest for the Contextual Bandit Problem

- Based on the optimal decision stump, an online random forest algorithm for the contextual bandit problem is proposed.
- The decision stumps are recursively stacked in a random collection of decision trees.
- Its computational cost is $O(LMDT)$ where L is the number of decision trees, D is .
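The slides do not spell out how a decision stump is scored, so the following is only one plausible reading, stated as an assumption: with binary context variables, a stump on variable i is valued by the reward obtained when the best arm is played for each value of that variable, and the stump with the highest value is selected. The function names and the reward table are hypothetical, and the sketch uses known means where an online algorithm would use estimates.

```python
def stump_value(p_value, mean_reward):
    """Value of a stump on one binary variable (assumed criterion).

    p_value[v]        : probability that the variable equals v
    mean_reward[v][k] : mean reward of arm k when the variable equals v
    """
    return sum(p_value[v] * max(mean_reward[v]) for v in (0, 1))


def best_stump(variables):
    """Return the name of the variable whose stump has the highest value."""
    return max(variables, key=lambda name: stump_value(*variables[name]))


# Hypothetical statistics for two context variables and two arms.
variables = {
    "x1": ([0.5, 0.5], [[0.9, 0.1], [0.2, 0.8]]),      # best arm depends on x1
    "x2": ([0.5, 0.5], [[0.55, 0.45], [0.55, 0.45]]),  # x2 carries no information
}

print(best_stump(variables))
```

Stacking such stumps recursively, with each child node repeating the selection on the data routed to it, gives a tree; the cost of scoring every variable at every level is consistent with a per-tree cost that scales with the number of variables and the depth.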


Gentle Start


Random Forest for the Contextual Bandit Problem

- It is near optimal. The dependence of the sample complexity upon the number of contextual variables is logarithmic, and the computational cost of the proposed algorithm with respect to the time horizon is linear.


Procedure: Random Forest for the Contextual Bandit Problem

1. Variable selection

2. Action selection

3. Tree update


Variable Selection


Action Selection


Decision Stump


θ-Optimal Greedy Tree


Bandit Forest