
Random Forest for the Contextual Bandit Problem

Raphael Feraud, Robin Allesiardo, Tanguy Urvoy, Fabrice Clerot (AISTATS 2016)

Jungtaek Kim (jtkim@postech.ac.kr)

Machine Learning Group, Department of Computer Science and Engineering, POSTECH,

77-Cheongam-ro, Nam-gu, Pohang-si 37673, Gyungsangbuk-do, Republic of Korea

Mar 28, 2017

Table of Contents

- Preliminary
  - Decision Tree
  - Random Decision Forests
  - Multi-Armed Bandit
  - Contextual Bandit

- Random Forest for the Contextual Bandit Problem

Preliminary

Decision Tree

- A decision tree is used for classification and regression.

- Each node j holds a set $S_j$ that is the union of its children's sets. In a binary tree, $S_j = S_j^L \cup S_j^R$ and $S_j^L \cap S_j^R = \emptyset$ hold.

- The parameters of each node's split function are estimated by optimizing an objective function computed from a metric, and nodes are split until a stopping criterion is reached.

- Metrics
  - Gini impurity: $\mathrm{GINI}(t) = 1 - \sum_j p(j \mid t)^2$
  - Information gain: $I = H(S) - \sum_i \frac{|S_i|}{|S|} H(S_i)$

- Stopping criteria
  - Maximum depth limit
  - Node's set size limit
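
The two split metrics can be checked on a toy node. The following is a minimal sketch, assuming class labels are given as plain Python lists; the function names `gini`, `entropy` and `information_gain` are chosen here for illustration and do not come from the slides.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity of a node: 1 - sum_j p(j|t)^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy H(S) in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """I = H(S) - sum_i |S_i| / |S| * H(S_i)."""
    n = len(parent)
    return entropy(parent) - sum(len(c) / n * entropy(c) for c in children)


# Toy node with two balanced classes, split perfectly into two pure children.
parent = ["a", "a", "b", "b"]
children = [["a", "a"], ["b", "b"]]
print(gini(parent))                        # 0.5
print(information_gain(parent, children))  # 1.0
```

A perfect split of a balanced two-class node removes one full bit of entropy, which is the largest gain possible for two classes.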

Decision Tree

Figure 1: The training process (left) estimates the tree parameters θ that maximize $f(S_j, S_j^L, S_j^R, \theta)$ for each node j. The testing process (right) routes an unseen data point v to a leaf node and determines an output using a predictor $p(c \mid v)$.

Random Decision Forests

- A random decision forest is an ensemble of randomly trained decision trees.

- Commonly, it turns weak learners into a strong learner.

- The methods to build a randomized decision tree are
  - Random training dataset sampling
    - Bagging: $\tilde{C}_{bag}(x) = \mathrm{MajorityVote}\{C(S^{*b}, x)\}_{b=1}^{B}$.
    - Random Forests: a refinement of bagging.
    - Boosting: $C(x) = \mathrm{sign}\left[\sum_{m=1}^{M} \alpha_m C_m(x)\right]$.
  - Randomized node optimization

- A leaf predictor outputs a prediction based on a distribution over the classes to which the test data reaching that leaf might belong.
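
The bagging rule (majority vote over classifiers trained on bootstrap samples) can be sketched as follows. This is only an illustration under simplifying assumptions: each base classifier is a one-dimensional threshold stump, and the names `train_stump`, `bagging` and `majority_vote` as well as the toy data are hypothetical, not from the slides.

```python
import random
from collections import Counter


def train_stump(sample):
    """Fit a 1-D threshold rule with the fewest training errors.

    The rule predicts 1 if x > threshold (or the flipped rule if flip is True).
    """
    best = None
    for thr, _ in sample:
        for flip in (False, True):
            errors = sum(((x > thr) != flip) != bool(y) for x, y in sample)
            if best is None or errors < best[0]:
                best = (errors, thr, flip)
    _, thr, flip = best
    return lambda x: int((x > thr) != flip)


def bagging(data, n_trees, rng):
    """Train each stump on a bootstrap sample drawn with replacement."""
    return [train_stump([rng.choice(data) for _ in data]) for _ in range(n_trees)]


def majority_vote(classifiers, x):
    """Return the class predicted by most classifiers."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Toy data: class 0 below 0.5, class 1 above.
data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0),
        (0.6, 1), (0.7, 1), (0.8, 1), (0.9, 1)]
forest = bagging(data, n_trees=25, rng=random.Random(0))
print(majority_vote(forest, 0.05), majority_vote(forest, 0.95))
```

Individual stumps may be poor when their bootstrap sample happens to be unrepresentative; the vote averages such cases out.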

Multi-Armed Bandit

- For K arms, $r_{a,t}$ is a reward sampled from an unknown stochastic process, where $a \in \{1, \ldots, K\}$ and t is the time step.

- Iteratively, an agent chooses an arm $a_t \in \{1, \ldots, K\}$ and receives a reward $r_t = r_{a_t,t}$.

- A sequential decision is built as

$$a_t = f_t(a_1, r_1, \ldots, a_{t-1}, r_{t-1}).$$

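- The goal is to minimize the cumulative regret

$$\max_{i=1,\ldots,K} \mathbb{E}\left[\sum_{t=1}^{n} g_{i,t}\right] - \mathbb{E}\left[\sum_{t=1}^{n} g_{a_t,t}\right],$$

where $g_{i,t}$ is the gain (reward) of arm i at time step t.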
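
The interaction loop and the regret can be simulated on a toy problem. The sketch below assumes Bernoulli arms with hypothetical means and uses epsilon-greedy as one possible decision rule $f_t$; the slides do not prescribe a policy, so that choice and all names (`epsilon_greedy`, `run`, `means`) are illustrative assumptions.

```python
import random


def epsilon_greedy(history, n_arms, rng, eps=0.1):
    """One possible f_t: map the history (a_1, r_1, ..., a_{t-1}, r_{t-1}) to an arm."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for a, r in history:
        counts[a] += 1
        sums[a] += r
    untried = [a for a in range(n_arms) if counts[a] == 0]
    if untried:                      # play every arm once first
        return untried[0]
    if rng.random() < eps:           # explore uniformly at random
        return rng.randrange(n_arms)
    return max(range(n_arms), key=lambda a: sums[a] / counts[a])  # exploit


def run(means, horizon, seed=0):
    """Play `horizon` rounds on Bernoulli arms and report the regret of this run."""
    rng = random.Random(seed)
    history = []
    for _ in range(horizon):
        a = epsilon_greedy(history, len(means), rng)
        r = 1.0 if rng.random() < means[a] else 0.0  # Bernoulli reward
        history.append((a, r))
    # n * max_i mu_i - sum_t mu_{a_t}: its expectation over runs is the regret.
    regret = horizon * max(means) - sum(means[a] for a, _ in history)
    return history, regret


history, regret = run(means=[0.2, 0.5, 0.8], horizon=1000)
print(f"regret after {len(history)} rounds: {regret:.1f}")
```

Recomputing the statistics from the full history at every step is deliberately naive: it mirrors the definition of $a_t$ as a function of past arms and rewards rather than aiming for efficiency.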

Contextual Bandit

- The setup is identical to that of the conventional multi-armed bandit problem.

- A feature vector $x_t$ summarizes information about both the user and the arm $a_t$.

- A common goal is to achieve a regret bound of $O(\sqrt{T})$.

Random Forest for the Contextual Bandit Problem

- Based on the optimal decision stump, an online random forest algorithm for the contextual bandit problem is proposed.

- The decision stumps are recursively stacked in a random collection of decision trees.

- Its computational cost is $O(LMDT)$, where L is the number of decision trees, D is .

Gentle Start

- It is near optimal. The dependence of the sample complexity upon the number of contextual variables is logarithmic, and the computational cost of the proposed algorithm with respect to the time horizon is linear.

Procedure: Random Forest for the Contextual Bandit Problem

1. Variable selection

2. Action selection

3. Tree update
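
The three steps can be illustrated on a single decision stump over binary context variables. This is a simplified sketch and not the authors' Bandit Forest algorithm: the class `BanditStump`, its scoring rule (sum over variable values of the best action's estimated value), the fixed uniform exploration phase and the toy environment are all assumptions made here for illustration.

```python
import random


class BanditStump:
    """Illustrative one-level tree for binary contexts (hypothetical, simplified)."""

    def __init__(self, n_vars, n_actions):
        self.n_vars, self.n_actions = n_vars, n_actions
        self.plays = [0] * n_actions  # how often each action was played
        # sums[i][v][k]: total reward of action k observed while x[i] == v
        self.sums = [[[0.0] * n_actions for _ in (0, 1)] for _ in range(n_vars)]

    def value(self, i, v, k):
        """Estimate of E[reward_k * 1{x_i = v}] under uniform exploration."""
        return self.sums[i][v][k] / self.plays[k] if self.plays[k] else 0.0

    def best_action(self, i, v):
        return max(range(self.n_actions), key=lambda k: self.value(i, v, k))

    def select_variable(self):
        """Step 1: pick the variable whose split has the highest estimated value."""
        def score(i):
            return sum(self.value(i, v, self.best_action(i, v)) for v in (0, 1))
        return max(range(self.n_vars), key=score)

    def select_action(self, x, rng, explore):
        """Step 2: explore uniformly, or exploit the currently selected split."""
        if explore:
            return rng.randrange(self.n_actions)
        i = self.select_variable()
        return self.best_action(i, x[i])

    def update(self, x, action, reward):
        """Step 3: update the statistics of every candidate variable."""
        self.plays[action] += 1
        for i, v in enumerate(x):
            self.sums[i][v][action] += reward


# Toy environment (assumption): the reward is 1 only when the action equals x[1].
rng = random.Random(0)
stump = BanditStump(n_vars=3, n_actions=2)
for _ in range(500):
    x = tuple(rng.randrange(2) for _ in range(3))
    a = stump.select_action(x, rng, explore=True)
    reward = 1.0 if a == x[1] else 0.0
    stump.update(x, a, reward)
print("selected variable:", stump.select_variable())
```

In this toy setting only the second context variable carries information about the reward, so a sensible stump should split on it and then play the action matching its value.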

Variable Selection

Action Selection

Decision Stump

θ-Optimal Greedy Tree

Bandit Forest
