Random Forest for the Contextual Bandit Problem...

Post on 29-Aug-2018


1/19

Random Forest for the Contextual Bandit Problem

Raphael Feraud, Robin Allesiardo, Tanguy Urvoy, Fabrice Clerot (AISTATS 2016)

Jungtaek Kim (jtkim@postech.ac.kr)

Machine Learning Group, Department of Computer Science and Engineering, POSTECH,

77-Cheongam-ro, Nam-gu, Pohang-si 37673, Gyungsangbuk-do, Republic of Korea

Mar 28, 2017

2/19

Table of Contents

Preliminary
- Decision Tree
- Random Decision Forests
- Multi-Armed Bandit
- Contextual Bandit

Random Forest for the Contextual Bandit Problem

3/19

Preliminary

4/19

Decision Tree

- A decision tree is used for classification and regression.

- Each node holds a set of data that is the union of its children's sets. For a node $j$ of a binary tree, $S_j = S_j^L \cup S_j^R$ and $S_j^L \cap S_j^R = \emptyset$ are satisfied.

- The parameters of each node's split function are estimated by optimizing a metric that serves as the node's objective function. Splitting continues until a stopping criterion is reached.

- Metrics
  - Gini impurity: $\mathrm{GINI}(t) = 1 - \sum_j p(j \mid t)^2$.
  - Information gain: $I = H(S) - \sum_i \frac{|S_i|}{|S|} H(S_i)$.

- Stopping criteria
  - Maximum depth limit
  - Node's set size limit
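The two metrics above can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides or the paper: the function names are hypothetical, class probabilities are assumed to be empirical label frequencies at a node, and entropy is assumed to be measured in bits.

```python
import math
from collections import Counter


def class_probabilities(labels):
    """Empirical class frequencies p(j | t) for the labels held by one node."""
    counts = Counter(labels)
    total = len(labels)
    return [count / total for count in counts.values()]


def gini_impurity(labels):
    """GINI(t) = 1 - sum_j p(j | t)^2."""
    return 1.0 - sum(p * p for p in class_probabilities(labels))


def entropy(labels):
    """Shannon entropy H(S) in bits (assumed base 2)."""
    return -sum(p * math.log2(p) for p in class_probabilities(labels))


def information_gain(parent, children):
    """I = H(S) - sum_i |S_i| / |S| * H(S_i)."""
    weighted = sum(len(c) / len(parent) * entropy(c) for c in children)
    return entropy(parent) - weighted


# A perfectly separating split of a balanced two-class node.
parent = ["a", "a", "b", "b"]
left, right = ["a", "a"], ["b", "b"]
print(gini_impurity(parent))                    # 0.5
print(information_gain(parent, [left, right]))  # 1.0
```

A split that leaves both children pure removes all of the parent's entropy, which is why the gain equals the parent's entropy of one bit in this example.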

5/19

Decision Tree

Figure 1: The training process (left) estimates the tree parameters to maximize $f(S_j, S_j^L, S_j^R, \theta)$ for node $j$, and the testing process (right) routes an unseen data point to a leaf node and determines an output using a predictor $p(c \mid v)$.

6/19

Random Decision Forests

- A random decision forest is an ensemble of randomly trained decision trees.

- Commonly, it turns weak learners into a strong learner.

- The methods to build a randomized decision tree are
  - Random training dataset sampling
    - Bagging: $\tilde{C}_{\mathrm{bag}}(x) = \mathrm{MajorityVote}\{C(S^{*b}, x)\}_{b=1}^{B}$.
    - Random Forests: a refinement of bagging.
    - Boosting: $C(x) = \mathrm{sign}\left[\sum_{m=1}^{M} \alpha_m C_m(x)\right]$.
  - Randomized node optimization

- A leaf predictor outputs a prediction based on the distribution over the classes to which test data reaching that leaf might belong.
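The bagging rule above can be illustrated with a small sketch. Everything here is an assumption made for the example: the base learner is a hypothetical one-dimensional threshold stump, the data set is a toy one, and the function names are not from the slides.

```python
import random
from collections import Counter


def fit_stump(samples):
    """Fit a one-dimensional threshold stump by picking the threshold and
    label orientation with the fewest training errors."""
    best = None
    for threshold in sorted({x for x, _ in samples}):
        for left_label, right_label in ((0, 1), (1, 0)):
            errors = sum(
                (left_label if x < threshold else right_label) != y
                for x, y in samples
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda x: left_label if x < threshold else right_label


def bagging(samples, n_models, rng):
    """Train each stump on a bootstrap replicate S*b of the training set."""
    models = []
    for _ in range(n_models):
        replicate = [rng.choice(samples) for _ in samples]
        models.append(fit_stump(replicate))
    return models


def majority_vote(models, x):
    """C_bag(x) = MajorityVote{C(S*b, x)} for b = 1..B."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


rng = random.Random(0)
data = [(x / 10, int(x >= 5)) for x in range(10)]  # label is 1 when x >= 0.5
forest = bagging(data, n_models=25, rng=rng)
print(majority_vote(forest, 0.05), majority_vote(forest, 0.95))
```

Each stump sees a different bootstrap replicate, so individual thresholds vary, while the vote over all of them is stable away from the class boundary.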

7/19

Multi-Armed Bandit

8/19

Multi-Armed Bandit

- For $K$ arms, $r_{a,t}$ is a reward sampled from an unknown stochastic process, where $a \in \{1, \ldots, K\}$ is the arm and $t$ is the time step.

- Iteratively, an agent chooses an arm $a_t \in \{1, \ldots, K\}$ and receives a reward $r_t = r_{a_t,t}$.

- A sequential decision is built as

$$a_t = f_t(a_1, r_1, \ldots, a_{t-1}, r_{t-1}).$$

- The goal is to minimize the cumulative regret

$$R_n = \left( \max_{i=1,\ldots,K} \mathbb{E} \sum_{t=1}^{n} r_{i,t} \right) - \mathbb{E} \sum_{t=1}^{n} r_{a_t,t}.$$
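To make the regret definition concrete, the sketch below simulates a bandit and measures regret. It rests on assumptions that are not in the slides: the arms are Bernoulli with fixed means, the policy is UCB1 (a standard index policy used here only as a stand-in for $f_t$), and the reported quantity is the pseudo-regret, which replaces realized rewards by their means.

```python
import math
import random


def ucb1(means, horizon, rng):
    """Run UCB1 on Bernoulli arms and return the pseudo-regret after `horizon` steps."""
    k = len(means)
    pulls = [0] * k
    totals = [0.0] * k
    expected_collected = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull every arm once to initialise the estimates
        else:
            # Empirical mean plus an exploration bonus that shrinks with pulls.
            arm = max(
                range(k),
                key=lambda a: totals[a] / pulls[a]
                + math.sqrt(2.0 * math.log(t) / pulls[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        totals[arm] += reward
        expected_collected += means[arm]
    # n * (best mean) minus the sum of the means of the arms actually played.
    return horizon * max(means) - expected_collected


rng = random.Random(0)
means = [0.2, 0.5, 0.8]  # hypothetical arm means
for n in (100, 1000, 10000):
    print(n, round(ucb1(means, n, rng), 1))
```

Because every step contributes the gap between the best mean and the played arm's mean, the returned value is never negative and grows only when a suboptimal arm is pulled.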

9/19

Contextual Bandit

- The overall setting is identical to that of the conventional multi-armed bandit problem.

- A feature vector $x_t$ summarizes information about both the user and the arm $a_t$.

- A common goal is to achieve a regret bound of $O(\sqrt{T})$.
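The interaction protocol of a contextual bandit can be sketched as follows. This is only an illustration under stated assumptions: the environment is hypothetical (one binary context variable, two arms, Bernoulli rewards whose best arm depends on the context), and the policy is plain epsilon-greedy with one estimate per (context, arm) pair. With a fixed epsilon this policy has linear regret, so it does not attain the $O(\sqrt{T})$ goal; it only shows the observe, choose, receive loop.

```python
import random


def run_contextual_bandit(horizon, epsilon, rng):
    """Epsilon-greedy with one reward estimate per (context, arm) pair."""
    reward_prob = {0: [0.8, 0.2], 1: [0.3, 0.7]}  # hypothetical reward_prob[x][a]
    pulls = {x: [0, 0] for x in reward_prob}
    totals = {x: [0.0, 0.0] for x in reward_prob}
    regret = 0.0
    for _ in range(horizon):
        x = rng.randrange(2)  # observe the context x_t
        if rng.random() < epsilon or 0 in pulls[x]:
            arm = rng.randrange(2)  # explore (also while an arm is untried)
        else:
            # Exploit the arm with the best empirical mean for this context.
            arm = max(range(2), key=lambda a: totals[x][a] / pulls[x][a])
        reward = 1.0 if rng.random() < reward_prob[x][arm] else 0.0
        pulls[x][arm] += 1
        totals[x][arm] += reward
        # Pseudo-regret against the best arm for the observed context.
        regret += max(reward_prob[x]) - reward_prob[x][arm]
    return regret, pulls


regret, pulls = run_contextual_bandit(horizon=5000, epsilon=0.1, rng=random.Random(0))
print(round(regret, 1), pulls)
```

The only difference from the context-free loop is that the best arm is defined per context, so the regret is measured against a policy that may pick a different arm for each $x_t$.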

10/19

Random Forest for the Contextual Bandit Problem

11/19

Random Forest for the Contextual Bandit Problem

- Based on the optimal decision stump, an online random forest algorithm for the contextual bandit problem is proposed.

- The decision stumps are recursively stacked in a random collection of decision trees.

- Its computational cost is $O(LMDT)$, where $L$ is the number of decision trees, $D$ is .

12/19

Gentle Start

13/19

Random Forest for the Contextual Bandit Problem

- It is near optimal. The dependence of the sample complexity upon the number of contextual variables is logarithmic, and the computational cost of the proposed algorithm with respect to the time horizon is linear.

14/19

Procedure: Random Forest for the Contextual Bandit Problem

1. Variable selection

2. Action selection

3. Tree update

15/19

Variable Selection

16/19

Action Selection

17/19

Decision Stump

18/19

θ-Optimal Greedy Tree

19/19

Bandit Forest