
scaling up RL with function approximation

arjun.chandra@telenor.com

Human-level control through deep reinforcement learning, Mnih et al., Nature 518, Feb 2015. http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html

• pixel input

• 18 joystick/button positions output

• change in game score as feedback

• convolutional net representing Q

• backpropagation for training!

human level game control
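A minimal sketch of that Q-network, assuming PyTorch (the framework choice is mine; the layer shapes are the Nature paper's: three conv layers, a 512-unit fully connected layer, one output per action):

import torch.nn as nn

class QNetwork(nn.Module):
    """Pixels in, one Q-value per joystick/button position out."""
    def __init__(self, n_actions=18):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),  # 4 stacked 84x84 frames
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, n_actions),  # Q(s, a) for all actions at once
        )

    def forward(self, pixels):
        return self.net(pixels)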

neuralnetwork

convolution, weight sharing, and pooling

[figure: pixel input convolved with a shared feature detector (kernel/filter w) produces a feature map, which is then pooled by taking max(window)]

fewer parameters due to sharing and pooling!
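A toy numpy sketch of both operations (array shapes and the 2x2 pooling window are illustrative choices, not from the slides):

import numpy as np

def convolve2d(pixels, w):
    """Slide one shared kernel w over the image: every output position
    reuses the same few weights instead of having its own."""
    H, W = pixels.shape
    k = w.shape[0]
    out = np.zeros((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(pixels[i:i + k, j:j + k] * w)
    return out

def max_pool(feature_map, window=2):
    """Keep only the max of each non-overlapping window: fewer values,
    and no parameters at all."""
    H, W = feature_map.shape
    trimmed = feature_map[:H - H % window, :W - W % window]
    return trimmed.reshape(H // window, window, W // window, window).max(axis=(1, 3))

pixels = np.random.rand(84, 84)
w = np.random.rand(8, 8)  # one shared 8x8 filter: just 64 weights for the whole image
pooled = max_pool(convolve2d(pixels, w))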

reverse projections of neuron outputs in pixel space

what does a deep neural network do?

backpropagation? What is the target against which to minimise error?

practically speaking… minimise MSE by SGD
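The target here is the one from the Nature paper: a bootstrapped one-step return computed with frozen parameters θ⁻ (freezing is covered a few slides down). In LaTeX, the objective minimised by SGD is

L(\theta) = \mathbb{E}_{(s,a,r,s')}\left[\left(r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta)\right)^{2}\right]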

experience replay

save current transition (s, a, r, s') in memory every time step

randomly sample a set of (s, a, r, s') from memory for training the Q network

(instead of learning from the current state transition)

every step = i.i.d. + learn from the past
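A minimal replay memory sketch in Python (capacity and batch size are illustrative):

import random
from collections import deque

class ReplayMemory:
    """Stores transitions and hands back i.i.d.-ish minibatches."""
    def __init__(self, capacity=1_000_000):
        self.memory = deque(maxlen=capacity)  # oldest transitions fall off the end

    def save(self, s, a, r, s_next):
        self.memory.append((s, a, r, s_next))  # called every time step

    def sample(self, batch_size=32):
        # uniform random draw breaks the correlation between consecutive steps
        return random.sample(self.memory, batch_size)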

[figure: agent-environment loop; at time t the agent in state s_t takes action a_t, receiving reward r_{t+1} and next state s_{t+1}]

freezing target Q

a moving target => oscillations

stabilise learning by fixing (freezing) the target network, moving it every now and then
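A sketch of the freeze-and-move schedule, reusing the PyTorch QNetwork above (the 10,000-step period is the Nature paper's choice; helper names are mine):

import copy

q_net = QNetwork()
target_net = copy.deepcopy(q_net)   # frozen copy that targets are computed from
for p in target_net.parameters():
    p.requires_grad_(False)         # never trained directly

SYNC_EVERY = 10_000

def maybe_move_target(step):
    if step % SYNC_EVERY == 0:      # every now and then...
        target_net.load_state_dict(q_net.state_dict())  # ...move the frozen target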

double DQN

y = r + γ Q_target(s', argmax_{a'} Q(s', a'))

selection of target action: the argmax, by the online Q network

evaluation of target action: by the target network, instead of a single max_{a'} Q_target(s', a')

Deep Reinforcement Learning with Double Q-learning, van Hasselt et al., AAAI 2016. https://arxiv.org/pdf/1509.06461v3.pdf
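The decoupling in code (a numpy sketch; q_online and q_frozen are assumed functions returning a vector of Q-values for s'):

import numpy as np

def double_dqn_target(r, s_next, q_online, q_frozen, gamma=0.99):
    """Online net *selects* the action, frozen net *evaluates* it."""
    a_star = np.argmax(q_online(s_next))         # selection of target action
    return r + gamma * q_frozen(s_next)[a_star]  # evaluation of target action

def dqn_target(r, s_next, q_frozen, gamma=0.99):
    """Vanilla DQN: the same net both selects and evaluates,
    which systematically over-estimates Q-values."""
    return r + gamma * np.max(q_frozen(s_next))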

prioritised experience replay

sample (s, a, r, s') from memory based on surprise

Prioritised Experience Replay, Schaul et al., ICLR 2016. https://arxiv.org/pdf/1511.05952v4.pdf
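"Surprise" is the magnitude of the TD error. A proportional-prioritisation sketch in numpy (α = 0.6 follows the paper; its importance-sampling correction is omitted here):

import numpy as np

def sample_prioritised(td_errors, batch_size=32, alpha=0.6, eps=1e-6):
    """Pick transition indices with probability ~ |TD error|^alpha."""
    priorities = (np.abs(td_errors) + eps) ** alpha  # eps: nothing gets zero chance
    probs = priorities / priorities.sum()
    return np.random.choice(len(td_errors), size=batch_size, p=probs)

td_errors = np.random.randn(1000)          # one per stored transition
batch_idx = sample_prioritised(td_errors)  # surprising transitions replay more often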

combining decoupling (double), prioritised replay, and duelling helps!

duelling architecture

Q(s, a) = V(s; u) + A(s, a; v)

[figure: standard single-stream Q network vs. duelling network that splits into a value stream (params u) and an advantage stream (params v), recombined into Q]

Dueling Network Architectures for Deep RL, Wang et al., ICML 2016. https://arxiv.org/pdf/1511.06581v3.pdf

ac1ona’sadvantage

V~avgQQ1

Q2

Q3

Q4

V~avgQQ1

Q2

Q3

Q4

arrowsrepresentadvantageofanac1on

advantage?e.g.4ac1ons,

V=E[Q]~Qaveragedacrossac1ons
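A sketch of the duelling head in PyTorch (the mean-subtraction is the aggregation the paper actually uses, so that V and A stay identifiable and V ≈ avg Q):

import torch.nn as nn

class DuellingHead(nn.Module):
    """Splits shared features into V(s; u) and A(s, a; v), then recombines."""
    def __init__(self, n_features=512, n_actions=4):
        super().__init__()
        self.value = nn.Linear(n_features, 1)               # params u
        self.advantage = nn.Linear(n_features, n_actions)   # params v

    def forward(self, features):
        v = self.value(features)      # V(s)
        a = self.advantage(features)  # A(s, a)
        # subtract the mean advantage so the value stream really tracks avg Q
        return v + a - a.mean(dim=1, keepdim=True)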

however training is SLOW

making deep RL faster and wilder (more applicable in the real world)!

data efficient exploration?

parallelism?

transfer learning?

making use of a model?

[figure: grid of parallel learners, each with its own Q and target Qt, all drawing on one shared parameter set]

shared params for Q and target Q

parallel learners getting individual experiences

lock-free param updates

Asynchronous Methods for Deep Reinforcement Learning, Mnih et al., ICML 2016. http://jmlr.org/proceedings/papers/v48/mniha16.pdf
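A toy sketch of the lock-free (Hogwild-style) scheme with Python threads and one shared numpy parameter vector (real speedups need processes or a GIL-free runtime; this only shows the structure, and the gradient is a stand-in):

import threading
import numpy as np

shared_theta = np.zeros(100)  # one parameter set shared by every learner

def learner(rank, steps=1000):
    rng = np.random.default_rng(rank)  # each learner collects its own experience
    for _ in range(steps):
        grad = rng.standard_normal(shared_theta.shape)  # stand-in for a Q-learning gradient
        shared_theta[:] -= 1e-3 * grad  # applied with no lock: updates may collide, and that's OK

threads = [threading.Thread(target=learner, args=(rank,)) for rank in range(8)]
for t in threads: t.start()
for t in threads: t.join()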

code for you to play with...

Telenor's own implementation of asynchronous deep RL: https://github.com/traai/async-deep-rl

Let's keep the conversation going: https://openrl.slack.com

Join using your student.matnat.uio.no email address!

Intern with us in summer 2017! Send your CV and motivation to:

arjun.chandra@telenor.com | tina@telenordigital.com (2 positions)

The topic is still to be finalised, but it will be at the intersection of software engineering and machine learning.