
scaling up RL with function approximation [email protected]


Page 1:

scaling up RL with function approximation

[email protected]

Page 2:

Human-level control through deep reinforcement learning, Mnih et al., Nature 518, Feb 2015. http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html

• pixel input
• 18 joystick/button positions output
• change in game score as feedback
• convolutional net representing Q
• backpropagation for training!

human-level game control
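To make the pipeline concrete, here is a minimal sketch of such a Q network, assuming PyTorch (the slides do not name a framework): stacked 84x84 pixel frames go in, one Q value per joystick/button action comes out. The layer sizes are illustrative rather than the exact published architecture.

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Convolutional net representing Q: pixels in, one value per action out."""
    def __init__(self, n_actions=18, n_frames=4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(n_frames, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, n_actions),          # 18 joystick/button positions
        )

    def forward(self, frames):                  # frames: (batch, n_frames, 84, 84)
        return self.head(self.conv(frames))

q = QNetwork()
print(q(torch.zeros(1, 4, 84, 84)).shape)       # torch.Size([1, 18])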

Page 3:

neural network

Page 4:

convolution, weight sharing, and pooling

(diagram: pixel input → shared feature detector/kernel/filter w → feature map → max(window) → pooled feature map)

fewer parameters due to sharing and pooling!
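A toy NumPy illustration (my own, not from the slides) of why sharing and pooling leave so few parameters: the same 3x3 kernel w is reused at every position of the image, and max pooling adds no parameters at all.

import numpy as np

image = np.random.rand(8, 8)            # pixel input
w = np.random.rand(3, 3)                # shared feature detector / kernel / filter w

# convolution: slide the same 9 weights over every location (valid padding)
feature_map = np.zeros((6, 6))
for i in range(6):
    for j in range(6):
        feature_map[i, j] = np.sum(image[i:i+3, j:j+3] * w)

# pooling: max over non-overlapping 2x2 windows of the feature map
pooled = feature_map.reshape(3, 2, 3, 2).max(axis=(1, 3))

print(w.size, pooled.shape)   # 9 shared weights (a dense 8x8 -> 6x6 map would need 2304), (3, 3)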

Page 5:

reverse projections of neuron outputs in pixel space

what does a deep neural network do?

Page 6:

Page 7:

Page 8:

backpropagation? What is the target against which to minimise error?

Page 9:

practically speaking… minimise MSE by SGD
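Spelled out in standard notation (a sketch; the symbols theta, gamma, alpha are mine, not quoted from the slide): for a transition (s, a, r, s') the target is the bootstrapped one-step return, and SGD takes steps on the squared error against it.

L(\theta) = \mathbb{E}_{(s,a,r,s')}\Big[ \big( r + \gamma \max_{a'} Q(s',a';\theta) - Q(s,a;\theta) \big)^2 \Big],
\qquad
\theta \leftarrow \theta - \alpha \, \nabla_{\theta} L(\theta)

Note that the target itself depends on theta, which is exactly the moving-target problem the next slides (experience replay, freezing) address.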

Page 10:

experience replay

save the current transition (s, a, r, s') in memory

every time step, randomly sample a set of (s, a, r, s') from memory for training the Q network
(instead of learning from the current state transition)

every step = i.i.d. samples + learning from the past

(diagram: transition s_t, a_t → r_{t+1}, s_{t+1})
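A minimal sketch of such a memory in plain Python (class and method names are my own, not from the slides):

import random
from collections import deque

class ReplayMemory:
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)        # oldest transitions fall out first

    def save(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))       # called every time step

    def sample(self, batch_size=32):
        return random.sample(list(self.buffer), batch_size)   # (roughly) i.i.d. minibatch

Training the Q network on memory.sample() rather than on the latest transition breaks the correlation between consecutive updates.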

Page 11:

freezing target Q

moving target => oscillations

stabilise learning by fixing the target, moving it every now and then ("freeze")
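A toy NumPy sketch of the freeze-and-refresh schedule (the gradient here is just a stand-in, and the numbers are illustrative):

import numpy as np

theta = np.random.randn(1000)            # online Q-network parameters, updated every step
theta_target = theta.copy()              # frozen target parameters

SYNC_EVERY = 10_000                      # how often the target is allowed to move
for step in range(100_000):
    grad = np.random.randn(1000)         # stand-in for the TD-error gradient w.r.t. theta
    theta -= 1e-4 * grad                 # SGD moves the online network only
    if step % SYNC_EVERY == 0:
        theta_target = theta.copy()      # every now and then, freeze a fresh copy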

Page 12:

double DQN

decouple selection of the target action from evaluation of the target action, rather than letting a single max_a' Q do both

Deep Reinforcement Learning with Double Q-learning, van Hasselt et al., AAAI 2016. https://arxiv.org/pdf/1509.06461v3.pdf
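As a sketch in standard notation (theta for the online parameters, theta^- for the frozen target parameters; not copied verbatim from the slide):

y_{\mathrm{DQN}} = r + \gamma \max_{a'} Q(s', a'; \theta^-)
\qquad\text{vs.}\qquad
y_{\mathrm{double}} = r + \gamma \, Q\big(s', \arg\max_{a'} Q(s', a'; \theta); \, \theta^-\big)

So the online network selects the target action and the target network evaluates it, which reduces the overestimation caused by taking a max over noisy estimates.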

Page 13:

prioritised experience replay

sample (s, a, r, s') from memory based on surprise

Prioritised Experience Replay, Schaul et al., ICLR 2016. https://arxiv.org/pdf/1511.05952v4.pdf
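A NumPy sketch of surprise-based sampling, with surprise taken as the absolute TD error (alpha, eps and the numbers are illustrative; the paper additionally corrects the resulting bias with importance-sampling weights):

import numpy as np

td_errors = np.array([0.1, 2.0, 0.5, 0.05])   # one (stand-in) TD error per stored transition
alpha, eps = 0.6, 1e-6                         # alpha = 0 would recover uniform replay

priorities = (np.abs(td_errors) + eps) ** alpha
probs = priorities / priorities.sum()
batch = np.random.choice(len(td_errors), size=2, p=probs, replace=False)
print(batch)                                   # surprising transitions are replayed more often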

Page 14:

combining decoupling (double), prioritised replay, and duelling helps!

duelling architecture

Q(s,a) = V(s,u) + A(s,a,v), where A(s,a,v) is action a's advantage

(diagram: a value stream with parameters u and an advantage stream with parameters v, combined into Q)

Dueling Network Architectures for Deep RL, Wang et al., ICML 2016. https://arxiv.org/pdf/1511.06581v3.pdf

Page 15:

advantage? e.g. 4 actions, V = E[Q] ~ Q averaged across actions

(figure: bars Q1…Q4 with V ~ avg Q marked; arrows represent the advantage of an action)
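For instance, with made-up numbers for the four actions: Q = (1, 2, 3, 6) gives V = E[Q] = 3, so the advantages are A = Q - V = (-2, -1, 0, 3); only the last action is better than average.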

Page 16:

however, training is SLOW

Page 17:

making deep RL faster and wilder (more applicable in the real world)!

Page 18:

data-efficient exploration?
parallelism?
transfer learning?
making use of a model?

Page 19:

(diagram: a grid of parallel learners, each with its own copy of Q and target Q_t)

shared params for Q and target Q

parallel learners getting individual experiences

lock-free param updates

Asynchronous Methods for Deep Reinforcement Learning, Mnih et al., ICML 2016. http://jmlr.org/proceedings/papers/v48/mniha16.pdf
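A conceptual sketch of the lock-free scheme (Python threads updating one shared NumPy vector; the gradients are just noise here, and the real implementation lives in the repository on the next page):

import threading
import numpy as np

shared_theta = np.zeros(1000)                     # parameters shared by all learners

def learner(worker_id, steps=10_000):
    rng = np.random.default_rng(worker_id)        # each learner gets its own experience stream
    for _ in range(steps):
        grad = rng.standard_normal(shared_theta.shape)   # stand-in for this worker's TD gradient
        shared_theta[:] = shared_theta - 1e-5 * grad     # applied without any lock

threads = [threading.Thread(target=learner, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()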

Page 20:

code for you to play with...

Telenor's own implementation of asynchronous deep RL: https://github.com/traai/async-deep-rl

Let's keep the conversation going: https://openrl.slack.com
Join using your student.matnat.uio.no email address!

Page 21:

intern with us in summer 2017: send CV and motivation to:

[email protected] | [email protected] (2 positions)

topic still to be finalised, but it will be at the intersection of software engineering and machine learning