In this paper, we show for the first time that the Thompson Sampling algorithm achieves logarithmic expected regret for the stochastic multi-armed bandit problem.
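
For readers unfamiliar with the algorithm, the following is a minimal sketch of Thompson Sampling, not the paper's own code. It assumes Bernoulli rewards and a uniform Beta(1, 1) prior on each arm, which is one common instantiation. The function name `thompson_sampling`, its parameters, and the arm means in the usage note are hypothetical and chosen only for illustration.

```python
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Run Beta-Bernoulli Thompson Sampling (illustrative sketch).

    true_means: assumed Bernoulli success probability of each arm,
                unknown to the algorithm and used only to simulate rewards.
    horizon:    number of rounds to play.
    Returns (cumulative expected regret, number of pulls per arm).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [0] * n_arms  # observed rewards of 1 per arm
    failures = [0] * n_arms   # observed rewards of 0 per arm
    best_mean = max(true_means)
    regret = 0.0

    for _ in range(horizon):
        # Draw one sample per arm from its Beta posterior
        # (uniform Beta(1, 1) prior, an assumption of this sketch).
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1)
            for i in range(n_arms)
        ]
        # Play the arm whose posterior sample is largest.
        arm = max(range(n_arms), key=samples.__getitem__)

        # Simulate a Bernoulli reward and update that arm's posterior counts.
        if rng.random() < true_means[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1

        # Accumulate the expected regret of this choice.
        regret += best_mean - true_means[arm]

    pulls = [successes[i] + failures[i] for i in range(n_arms)]
    return regret, pulls
```

Calling `thompson_sampling([0.2, 0.8], 2000)` returns the cumulative expected regret and the per-arm pull counts for a simulated two-armed problem. Plotting the regret against the horizon for increasing horizons is one informal way to observe the slow growth that the logarithmic bound describes, although a simulation of this kind illustrates the result rather than proving it.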
