scaling up RL with function approximation · compositional features, compositional problem solving...


scaling up RL with function approximation

Human-level control through deep reinforcement learning, Mnih et al., Nature 518, Feb 2015. http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html

• pixel input

• 18 joystick/button positions output

• change in game score as feedback

• convolutional net representing Q

• backpropagation for training!

human-level game control

neural network

convolution, weight sharing, and pooling

(diagram: pixel input → shared feature detector / kernel / filter w → feature map → max(window) → pooled feature map)

fewer parameters due to sharing and pooling!
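A minimal NumPy sketch of the idea, with illustrative sizes and made-up function names (not from the slides): one shared filter w slides over the pixel input to produce a feature map, and max pooling over small windows shrinks it, so far fewer parameters are needed than in a fully connected layer.

```python
import numpy as np

def conv2d_valid(image, w):
    """Slide one shared filter w over the image (valid convolution).
    The same weights are reused at every position: weight sharing."""
    k = w.shape[0]
    H, W = image.shape
    out = np.zeros((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + k, j:j + k] * w)
    return out

def max_pool(feature_map, window=2):
    """Take the max over non-overlapping windows: pooling."""
    H, W = feature_map.shape
    H2, W2 = H // window, W // window
    trimmed = feature_map[:H2 * window, :W2 * window]
    return trimmed.reshape(H2, window, W2, window).max(axis=(1, 3))

# toy 8x8 "pixel" input and a single 3x3 shared filter
image = np.random.rand(8, 8)
w = np.random.randn(3, 3)              # only 9 parameters, reused everywhere
feature_map = conv2d_valid(image, w)   # 6x6 feature map
pooled = max_pool(feature_map)         # 3x3 after max(window)
print(feature_map.shape, pooled.shape)
```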

reverse projections of neuron outputs in pixel space

what does a deep neural network do?

compositional features, compositional problem solving

multiplication (circuit design)

<— composed of adding numbers

<— composed of adding bits

output: x·y ← multiply ← adding numbers ← adding bits ← input: x and y
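A toy Python sketch of that chain, with helper names made up for illustration: multiplication built from adding numbers, which is in turn built from adding bits.

```python
def add_bits(a, b, carry=0):
    """Add two bits plus a carry: the lowest-level building block."""
    total = a + b + carry
    return total % 2, total // 2          # (sum bit, carry out)

def add_numbers(x, y):
    """Add two non-negative integers bit by bit, composed of add_bits."""
    result, carry, shift = 0, 0, 0
    while x or y or carry:
        s, carry = add_bits(x & 1, y & 1, carry)
        result |= s << shift
        x, y, shift = x >> 1, y >> 1, shift + 1
    return result

def multiply(x, y):
    """Multiply by repeated addition, composed of add_numbers."""
    product = 0
    for _ in range(y):                    # assumes y is a non-negative int
        product = add_numbers(product, x)
    return product

print(multiply(6, 7))  # 42: multiply -> adding numbers -> adding bits
```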

human knowledge organisation

find roots of a linear expression

<— composed of setting the expression to zero and solving linear equations

<— composed of rearranging terms

output: x = -2 ← (find roots of x + 2) ← (set x + 2 = 0), (solve) ← (rearrange) ← input: x, +, 2, =, 0

deep layers let knowledge and processes be represented with fewer neurons!

backpropagation? What is the target against which to minimise error?

practically speaking… minimise MSE by SGD
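For reference, the loss from the Nature DQN paper cited above (reproduced from the paper, not from the slides): the target is formed with a separate, periodically frozen set of parameters θ⁻, and the squared error is minimised by SGD.

```latex
L(\theta) \;=\; \mathbb{E}_{(s,a,r,s')}\Big[\big(\underbrace{r + \gamma \max_{a'} Q(s',a';\theta^{-})}_{\text{target}} \;-\; Q(s,a;\theta)\big)^{2}\Big]
```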

experience replay

save the current transition (s, a, r, s') in memory

every time step

randomly sample a set of (s, a, r, s') from memory for training the Q network

(instead of learning from the current state transition)

every step = i.i.d. + learn from the past

(diagram: the agent in state s_t takes action a_t, receives reward r_{t+1}, and transitions to s_{t+1})
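A minimal sketch of such a replay memory (the capacity and batch size are illustrative, not from the slides):

```python
import random
from collections import deque

class ReplayMemory:
    """Minimal replay memory: save transitions, sample them i.i.d. later."""

    def __init__(self, capacity=100000):
        self.buffer = deque(maxlen=capacity)   # oldest transitions fall out

    def save(self, s, a, r, s_next):
        # called every time step with the current transition (s, a, r, s')
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size=32):
        # a random minibatch for training the Q network,
        # instead of learning from the current transition only
        return random.sample(self.buffer, batch_size)
```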

freezing target Q

a moving target => oscillations

stabilise learning by fixing the target, moving it every now and then

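A small sketch of the freezing idea, assuming parameters kept in plain dicts and an update period chosen for illustration:

```python
TARGET_UPDATE_PERIOD = 10000   # illustrative value, not from the slides

def maybe_update_target(step, q_params, target_params):
    """Keep the target network frozen; copy the online Q parameters into it
    only every TARGET_UPDATE_PERIOD steps, so the regression target stops
    chasing the network that is being trained."""
    if step % TARGET_UPDATE_PERIOD == 0:
        target_params.update(q_params)
    return target_params
```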

double DQN

decouple the selection of the target action from the evaluation of the target action (instead of letting max_a' Q do both)

Deep Reinforcement Learning with Double Q-learning, van Hasselt et al., AAAI 2016. https://arxiv.org/pdf/1509.06461v3.pdf
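The decoupling as formulated in the van Hasselt et al. paper cited above: plain DQN uses the same frozen network to both select and evaluate the target action, while double DQN selects with the online network θ and evaluates with the frozen network θ⁻.

```latex
% DQN: the same network both selects and evaluates the target action
y^{\text{DQN}} = r + \gamma \max_{a'} Q(s', a'; \theta^{-})
% Double DQN: the online network selects, the frozen target network evaluates
y^{\text{DoubleDQN}} = r + \gamma \, Q\big(s', \arg\max_{a'} Q(s', a'; \theta); \theta^{-}\big)
```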

prioritised experience replay

sample (s, a, r, s') from memory

based on surprise

Prioritised Experience Replay, Schaul et al., ICLR 2016. https://arxiv.org/pdf/1511.05952v4.pdf
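A sketch of proportional prioritised sampling, where "surprise" is taken to be the absolute TD error (the alpha and eps values are illustrative, not from the slides):

```python
import numpy as np

def sample_prioritised(td_errors, batch_size=32, alpha=0.6, eps=1e-6):
    """Sample transition indices with probability proportional to
    |TD error|^alpha, so surprising transitions are replayed more often."""
    priorities = (np.abs(td_errors) + eps) ** alpha
    probs = priorities / priorities.sum()
    return np.random.choice(len(td_errors), size=batch_size, p=probs)

# transitions with larger TD error are sampled more frequently
td_errors = np.array([0.01, 2.0, 0.5, 0.05, 1.2])
print(sample_prioritised(td_errors, batch_size=3))
```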

Combining decoupling (double), prioritised replay, and duelling helps!

duelling architecture

Q(s, a) = V(s; u) + A(s, a; v)

(diagram: a standard Q network vs a duelling network with a separate value stream V(·; u) and advantage stream A(·; v) combined into Q)

Dueling Network Architectures for Deep RL, Wang et al., ICML 2016. https://arxiv.org/pdf/1511.06581v3.pdf
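In the Wang et al. paper cited above, the two streams are combined with the mean advantage subtracted so that V and A are identifiable (keeping the slide's u and v for the parameters of the value and advantage streams):

```latex
Q(s, a; u, v) \;=\; V(s; u) \;+\; \Big( A(s, a; v) \;-\; \tfrac{1}{|\mathcal{A}|} \sum_{a'} A(s, a'; v) \Big)
```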

however, training is SLOW

making deep RL faster and wilder (more applicable in the real world)!

data-efficient exploration?

parallelism?

transfer learning?

making use of a model?

(diagram: many parallel learners, each with its own copy of the Q network and the target network Q_t)

shared params for Q and target Q

parallel learners getting individual experiences

lock-free param updates

Asynchronous Methods for Deep Reinforcement Learning, Mnih et al., ICML 2016. http://jmlr.org/proceedings/papers/v48/mniha16.pdf
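A toy sketch of the lock-free ("Hogwild!"-style) idea using Python threads, purely illustrative: the gradient is faked, and a real implementation would follow the paper and the repo linked below.

```python
import threading
import numpy as np

# parameters shared by all learners for Q and for the target Q_t
shared_q = np.zeros(10)          # illustrative parameter vector
shared_q_target = np.zeros(10)

def learner(worker_id, steps=1000, target_period=100):
    """One parallel learner: gathers its own (fake) experience and applies
    updates straight to the shared parameters without taking a lock."""
    rng = np.random.default_rng(worker_id)
    for step in range(1, steps + 1):
        # stand-in for acting in this learner's own environment and
        # computing the gradient of the Q-learning loss
        fake_gradient = 0.001 * rng.normal(size=shared_q.shape)
        shared_q[:] -= fake_gradient           # lock-free param update
        if step % target_period == 0:
            shared_q_target[:] = shared_q      # refresh the frozen target

threads = [threading.Thread(target=learner, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(shared_q[:3])
```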

code for you to play with...

Telenor's own implementation of asynchronous deep RL: https://github.com/traai/async-deep-rl

Let's keep the conversation going: https://openrl.slack.com