ELEC 576: Deep Reinforcement Learning cont. and the Future of Deep Learning Ankit B. Patel, CJ Barberan, Tan Nguyen Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE Dept.) 11-08-2017

Page 1:

ELEC 576: Deep Reinforcement Learning cont. and the Future of Deep Learning

Ankit B. Patel, CJ Barberan, Tan Nguyen

Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE Dept.)

11-08-2017

Page 2:

Model-based Deep Reinforcement Learning

Page 3:

Learning Models of the Environment

• Generative model of Atari 2600

• Issues

• Errors in transition model compound over the trajectory

• Planned trajectories differ from executed trajectories

• Rewards predicted for long, unusual trajectories are totally wrong

[David Silver ICML 2016]
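
The first issue above, compounding transition errors, can be illustrated with a minimal sketch. It assumes a hypothetical one-dimensional linear system and a learned model whose only flaw is a 2% error in the transition coefficient; the function names and numbers are illustrative and do not come from the lecture.

```python
def rollout(coef, x0, steps):
    """Roll the 1-D transition x_{t+1} = coef * x_t forward from x0."""
    xs = [x0]
    for _ in range(steps):
        xs.append(coef * xs[-1])
    return xs


def rollout_errors(true_coef=1.0, model_coef=1.02, x0=1.0, steps=50):
    """Absolute gap between the true state and the model's prediction at each step."""
    real = rollout(true_coef, x0, steps)          # what the environment does
    imagined = rollout(model_coef, x0, steps)     # what the learned model predicts
    return [abs(r - m) for r, m in zip(real, imagined)]


errors = rollout_errors()
print(f"error after 1 step:   {errors[1]:.3f}")
print(f"error after 50 steps: {errors[50]:.3f}")
```

Because each predicted state is fed back into the model, the gap grows faster than linearly with the horizon, which is one reason plans built on long imagined rollouts can diverge from what actually happens when they are executed.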

Page 4:

Learning Models of the Environment

[Junhyuk Oh Youtube]

Page 5:

Newer Implementations

Page 6:

AlphaGo

[Deepmind Nature]

Page 7:

Policy and Value Networks

[Deepmind Nature]

Page 8:

Monte Carlo Tree Search

[Deepmind Nature]

Page 9:

AlphaGo Results

[Deepmind Nature]

Page 10:

Which Move to Make

[Deepmind Nature]

Page 11:

Five Matches Against Fan Hui

[Deepmind Nature]

Page 12:

Neural Architecture Search with Reinforcement Learning

• Uses an RNN to generate model descriptions of neural networks

• Trains the RNN with RL to maximize the expected accuracy of the generated architectures on a validation set
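
The loop described in these bullets (sample an architecture, score it, reinforce the choices that scored well) can be sketched in a few lines. This is a toy under loud assumptions, not the method from the paper: the RNN controller is swapped for independent softmax distributions over two hypothetical decisions, and `fake_validation_accuracy` is a made-up stand-in for training a child network and measuring its validation accuracy.

```python
import math
import random

# Hypothetical search space: one categorical choice per decision.
SEARCH_SPACE = {
    "num_filters": [16, 32, 64],
    "kernel_size": [1, 3, 5],
}


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fake_validation_accuracy(arch):
    """Stand-in reward (assumption): prefers 64 filters and 3x3 kernels.
    A real run would train the child network and evaluate it instead."""
    score = 0.5
    score += 0.2 if arch["num_filters"] == 64 else 0.0
    score += 0.2 if arch["kernel_size"] == 3 else 0.0
    return score


def train_controller(steps=3000, lr=0.1, baseline_decay=0.9, seed=0):
    rng = random.Random(seed)
    logits = {name: [0.0] * len(opts) for name, opts in SEARCH_SPACE.items()}
    baseline = 0.0  # moving average of rewards, used to reduce variance
    for _ in range(steps):
        # Sample one architecture from the current policy.
        picks = {}
        for name, opts in SEARCH_SPACE.items():
            probs = softmax(logits[name])
            picks[name] = rng.choices(range(len(opts)), weights=probs)[0]
        arch = {name: SEARCH_SPACE[name][i] for name, i in picks.items()}
        advantage = fake_validation_accuracy(arch) - baseline
        # REINFORCE for a softmax: d log pi(a) / d logit_j = 1[j == a] - pi_j.
        for name, chosen in picks.items():
            probs = softmax(logits[name])
            for j, p in enumerate(probs):
                indicator = 1.0 if j == chosen else 0.0
                logits[name][j] += lr * advantage * (indicator - p)
        reward = advantage + baseline
        baseline = baseline_decay * baseline + (1 - baseline_decay) * reward
    return logits


def most_likely_architecture(logits):
    return {
        name: SEARCH_SPACE[name][max(range(len(vals)), key=vals.__getitem__)]
        for name, vals in logits.items()
    }


print(most_likely_architecture(train_controller()))
```

The only learning signal is the scalar reward for each sampled architecture, so the same update applies whether the reward comes from this toy function or from a full training run of the child network.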

Page 13:

Neural Architecture Search

[Zoph & Le Arxiv 2016]

Page 14:

RNN Controller

[Zoph & Le Arxiv 2016]

Page 15:

Training with REINFORCE

[Zoph & Le Arxiv 2016]

Page 16:

Parallel Training

[Zoph & Le Arxiv 2016]

Page 17:

Increase Architecture Complexity

[Zoph & Le Arxiv 2016]

Page 18:

Constructing the Network

[Zoph & Le Arxiv 2016]

Page 19:

Results

[Zoph & Le Arxiv 2016]

Page 20:

Reinforcement Learning with Unsupervised Auxiliary Tasks

• Train an agent that also maximises other pseudo-reward functions simultaneously by RL

• These tasks share a representation that continues to develop in the absence of extrinsic rewards

• Outperforms the previous state of the art on Atari

• Averaging 880% of expert human performance

Page 21:

UNREAL Agent

[Jaderberg et al. Arxiv 2016]

Page 22:

Auxiliary Control Tasks

[Jaderberg et al. Arxiv 2016]

Page 23:

Auxiliary Tasks

[Jaderberg et al. Arxiv 2016]

Page 24:

UNREAL Algorithm

A3C loss is minimised on-policy

Value function is optimised from replayed data

Auxiliary control loss is optimised off-policy from replayed data

Reward loss is optimised from rebalanced replay data

[Jaderberg et al. Arxiv 2016]
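
"Rebalanced replay" for the reward loss can be sketched as follows. This is a minimal illustration that assumes rewarding and non-rewarding samples are each drawn with probability 0.5 (our reading of the UNREAL paper) and that the buffer is a plain list of `(observation, reward)` pairs; the buffer layout and the name `sample_rebalanced` are hypothetical.

```python
import random


def sample_rebalanced(replay, batch_size, rng=random):
    """Draw a batch in which rewarding and non-rewarding transitions are
    equally likely, however rare rewards are in the buffer itself."""
    rewarding = [t for t in replay if t[1] != 0]
    zero = [t for t in replay if t[1] == 0]
    if not rewarding or not zero:
        # Only one kind of transition is present: fall back to uniform sampling.
        return [rng.choice(replay) for _ in range(batch_size)]
    batch = []
    for _ in range(batch_size):
        pool = rewarding if rng.random() < 0.5 else zero
        batch.append(rng.choice(pool))
    return batch


# Toy buffer in which only 1 transition in 100 carries a reward.
buffer = [(i, 1.0 if i % 100 == 0 else 0.0) for i in range(1000)]
batch = sample_rebalanced(buffer, 2000, random.Random(0))
fraction = sum(1 for _, r in batch if r != 0) / len(batch)
print(f"rewarding fraction in buffer: 0.01, in batch: {fraction:.2f}")
```

Without the rebalancing, a reward predictor trained on sparse-reward data could reach low loss by always predicting zero; oversampling the rare rewarding cases removes that shortcut.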

Page 25:

Atari Results

[Jaderberg et al. Arxiv 2016]

Page 26:

Reinforcement Learning with Unsupervised Auxiliary Tasks Video

[DeepMind Youtube]

Page 27:

Deep Sensorimotor Learning

https://research.googleblog.com/2016/03/deep-learning-for-robots-learning-from.html

Page 28:

Deep Learning: The Future

Page 29:

Major Areas of Focus

• Semi-Supervised Learning

• Reinforcement Learning & Deep Sensorimotor Learning

• Neural Networks with Memory

• True Language Understanding (Not Just Statistical)

• Deep Learning for Hardware and Systems (Low Power)

• Theory of Deep Learning

• ….

Page 30:

Resources

• UFLDL Course

• Chris Olah's Blog

• Google Plus Deep Learning Community

• Deep Learning First Textbook

• List of Must-Read Papers

Page 31:

Future of Work

The Guardian

Page 32:

Future of Work

Blake Irving’s Blog

Page 33:

Future of Life