Introduction to the "TrailBlazer" algorithm

BLAZING THE TRAILS BEFORE BEATING THE PATH:

SAMPLE-EFFICIENT MONTE-CARLO PLANNING

KATSUKI OHTO

@NIPS2016-YOMI

2017/1/19

INTRODUCED PAPER

• Blazing the trails before beating the path: Sample-efficient Monte-Carlo planning (J.-B. Grill, M. Valko and R. Munos)

• NIPS 2016 accepted paper (poster session)

• Abstract starts with “You are a robot…”

• http://papers.nips.cc/paper/6253-blazing-the-trails-before-beating-the-path-sample-efficient-monte-carlo-planning

TRAILBLAZER

• Nested-fashion Monte-Carlo planning algorithm

• Problem settings:
    MDP (contains MAX nodes and AVG nodes)
    Actions per state: finite
    State transition candidates: finite or infinite

• Strong theoretical guarantee

AIM

• Input: an MDP (Markov Decision Process) with discount factor $\gamma$ and maximum number of valid actions $K$, together with $\varepsilon > 0$ and $0 < \delta < 1$

• Output: an estimate $\mu$ of the value of the current state

• Aim: get a good estimate of the true value $V$ of the current state, such that $P(|\mu - V| > \varepsilon) \le \delta$ (where $P(A)$ means the probability of $A$), with the minimum number of calls to the generative model (state transition function)
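Because cost is measured in calls to the generative model, it can help to picture that model as an object that counts its own queries. The following is a minimal sketch under stated assumptions: the class name `CountingGenerativeModel`, the `transitions` table layout, and the toy states `s0`, `s1`, `s2` are hypothetical and do not come from the paper.

```python
import random


class CountingGenerativeModel:
    """Wraps a transition sampler and counts how often it is queried."""

    def __init__(self, transitions, seed=0):
        # transitions[(state, action)] -> list of (probability, next_state, reward)
        # (hypothetical layout chosen for this illustration)
        self.transitions = transitions
        self.rng = random.Random(seed)
        self.calls = 0

    def sample(self, state, action):
        """Draw one (next_state, reward) pair; each draw costs one call."""
        self.calls += 1
        outcomes = self.transitions[(state, action)]
        weights = [p for p, _, _ in outcomes]
        _, next_state, reward = self.rng.choices(outcomes, weights=weights, k=1)[0]
        return next_state, reward


# Hypothetical two-outcome MDP fragment used only for illustration.
toy = {("s0", "a"): [(0.5, "s1", 1.0), (0.5, "s2", 0.0)]}
model = CountingGenerativeModel(toy)
samples = [model.sample("s0", "a") for _ in range(10)]
print(model.calls)  # number of generative-model calls spent so far
```

A planner built on such a wrapper can report `model.calls` as its empirical sample cost.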

1-PLAYER TREE MODEL IN A STOCHASTIC ENVIRONMENT

• Each MAX node is an opportunity to choose an action

• Each AVG node is a stochastic state transition
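To make the MAX/AVG alternation concrete, here is a small sketch that computes exact values on a hand-built tree by plain recursion (no sampling). The dictionary layout, the discount `GAMMA = 0.9`, and the action names `safe` and `risky` are assumptions made for this illustration only.

```python
GAMMA = 0.9  # illustrative discount factor (assumption)


def max_value(node):
    """Value of a MAX node: the best value among its AVG children (one per action)."""
    if not node["actions"]:  # leaf: no further decisions
        return 0.0
    return max(avg_value(child) for child in node["actions"].values())


def avg_value(node):
    """Value of an AVG node: expected reward plus discounted next-state value."""
    return sum(p * (r + GAMMA * max_value(nxt)) for p, r, nxt in node["outcomes"])


# Hypothetical tree: each outcome is (probability, reward, next MAX node).
leaf = {"actions": {}}
root = {
    "actions": {
        "safe": {"outcomes": [(1.0, 0.5, leaf)]},
        "risky": {"outcomes": [(0.5, 1.0, leaf), (0.5, 0.2, leaf)]},
    }
}
root_value = max_value(root)
```

Here `safe` is worth 0.5 and `risky` is worth 0.5 * 1.0 + 0.5 * 0.2 = 0.6, so the MAX node at the root takes the value 0.6. A sampling-based planner has to approximate this recursion when the AVG expectations cannot be enumerated.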

ALGORITHM OVERVIEW

• Global initialization: set $\eta$ and $\delta$ as global values; set $m$ as an argument of the root node

• Recursive algorithm

ALGORITHM OVERVIEW 2

• In both MAX nodes and AVG nodes, the arguments are $m$ (desired branching factor) and $\varepsilon$ (admissible estimation error)

• If $m$ is large, we can search many children, but it takes much time (dilemma)

• If $\varepsilon$ is small, we can search deeply, but it takes much time (dilemma)
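The cost side of this trade-off can be illustrated with the standard Hoeffding bound for a plain Monte-Carlo average. This is not the formula TrailBlazer uses to set its branching factor; it is only a sketch of how quickly the required number of samples grows as the admissible error shrinks. The function name `hoeffding_samples` and its parameters are hypothetical.

```python
import math


def hoeffding_samples(eps, delta, value_range=1.0):
    """Samples needed so a plain average of i.i.d. draws bounded in an interval of
    width value_range is within eps of its mean with probability >= 1 - delta."""
    return math.ceil(value_range ** 2 * math.log(2.0 / delta) / (2.0 * eps ** 2))


coarse = hoeffding_samples(eps=0.1, delta=0.05)
fine = hoeffding_samples(eps=0.05, delta=0.05)
print(coarse, fine)
```

Halving the admissible error roughly quadruples the number of samples, which is why spending a small error budget at every node of a deep tree becomes expensive.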

ALGORITHM FOR AVG NODES

• Input: $m$ and $\varepsilon$

• Output: estimated value

• If the admissible error is large, ignore the successive reward

• Fill the transition samples (and store the immediate rewards)

• Search all of the sampled next states

• Return averaged immediate reward + estimated successive reward
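The AVG-node steps above can be sketched as follows. This is a simplified reading, not the paper's exact procedure: the class `AvgNode`, the callbacks `sample_transition` and `call_max`, the discount `GAMMA = 0.9`, rewards assumed to lie in [0, 1], and passing each child its sample count `k` with a relaxed error `eps / GAMMA` are all assumptions made for illustration.

```python
import itertools
from collections import Counter

GAMMA = 0.9  # illustrative discount factor (assumption)


class AvgNode:
    def __init__(self, sample_transition, call_max):
        # sample_transition() -> (next_state, reward): one generative-model call
        # call_max(next_state, k, eps) -> estimated value of that MAX node
        self.sample_transition = sample_transition
        self.call_max = call_max
        self.samples = []  # kept across calls so earlier samples are reused

    def run(self, m, eps):
        # With rewards in [0, 1], every value is at most 1 / (1 - GAMMA), so an
        # admissible error that large lets us ignore the successive reward.
        if eps >= 1.0 / (1.0 - GAMMA):
            return 0.0
        # Fill the transition samples up to m (immediate rewards are stored too).
        while len(self.samples) < m:
            self.samples.append(self.sample_transition())
        active = self.samples[:m]
        mean_reward = sum(r for _, r in active) / m
        # Search each distinct sampled next state once, weighted by its count.
        counts = Counter(s for s, _ in active)
        future = sum(k * self.call_max(s, k, eps / GAMMA) for s, k in counts.items()) / m
        return mean_reward + GAMMA * future


# Deterministic stand-ins so the sketch can be run without a real MDP.
stream = itertools.cycle([("x", 1.0), ("y", 0.0)])
values = {"x": 2.0, "y": 0.0}
node = AvgNode(lambda: next(stream), lambda s, k, eps: values[s])
estimate = node.run(m=4, eps=0.1)
```

With the stand-ins above, the four samples give an average immediate reward of 0.5 and an average successive value of 1.0, so the estimate is 0.5 + 0.9 * 1.0 = 1.4.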

ALGORITHM FOR MAX NODES

• Input: $m$ and $\varepsilon$

• Output: estimated value

• Fill the candidate action pool with all valid actions

• $U$ is a value like the standard error of the estimate

• Search the candidate actions repeatedly until “Only 1 action left” or “Error might be small”

• If “Error might be small”, return the estimated value of the best action; otherwise search the best action one more time carefully
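The MAX-node loop above can be sketched as an action-elimination routine. This is a structural illustration under loud assumptions: the width `u = width / sqrt(level)` is a simplified stand-in for the paper's standard-error-like quantity, the elimination margin `2 * u` and the stopping test `u <= eps` omit the paper's constants, and the names `max_node`, `call_child`, `width`, and `level` are hypothetical.

```python
import math


def max_node(call_child, actions, m, eps, width=1.0):
    """call_child(action, n_samples, eps) -> estimated value of that action's AVG node."""
    pool = list(actions)  # candidate action pool: all valid actions
    estimates = {}
    level = 1  # samples requested from each remaining child this round
    while True:
        # ASSUMPTION: simplified stand-in for the standard-error-like quantity.
        u = width / math.sqrt(level)
        for action in pool:
            estimates[action] = call_child(action, level, u)
        best = max(estimates[a] for a in pool)
        # Keep actions whose optimistic value reaches the leader's pessimistic value.
        pool = [a for a in pool if estimates[a] + 2 * u >= best - 2 * u]
        if len(pool) == 1 or u <= eps:
            break
        level += 1
    best_action = max(pool, key=lambda a: estimates[a])
    if len(pool) > 1:  # "error might be small": the current estimate is good enough
        return estimates[best_action]
    # Only one action left: search it one more time with the full budget.
    return call_child(best_action, m, eps)


# Deterministic stand-in children so the sketch can be run without a real MDP.
log = []
values = {"a": 1.0, "b": 0.2, "c": 0.9}


def child(action, n_samples, eps):
    log.append((action, n_samples))
    return values[action]


estimate = max_node(child, values, m=50, eps=0.01)
```

In this run the clearly worse action `b` is dropped early, the close competitor `c` survives much longer, and the final call goes to `a` with the full budget `m`. Most of the effort is thus spent only on actions that remain plausible.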

SAMPLE COMPLEXITY OF TRAILBLAZER

• Sample complexity is a measure of the performance of an algorithm

• If N (the number of next states) is finite, one bound holds under a condition given in detail in the paper; otherwise (N infinite) a different bound holds under its own condition

• The bounds depend on a measure of the difficulty of identifying near-optimal nodes
