Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow...

108
Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Transcript of Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow...

Page 1: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Scaling Machine Learning with TensorFlow

Jeff DeanGoogle Brain team

g.co/brainPresenting the work of many people at Google

Page 2: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Our Mission:Make Machines Intelligent.

Improve People’s Lives.

Page 3: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Google Brain Team: Research Impact

● Since 2012, published > 130 papers at top venues in machine learning

● Some highlights:

- 2012: DistBelief, unsupervised learning to discover cats

- 2013: opensource of word2vec

- 2014: sequence to sequence learning, image captioning

- 2015: Inception, DeepDream, TensorFlow

- 2016: neural translation, medical imaging, architecture search

Page 4: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Main Research Areas

● General Machine Learning Algorithms and Techniques

● Computer Systems for Machine Learning

● Natural Language Understanding

● Perception

● Healthcare

● Robotics

● Music and Art Generation

Page 5: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Main Research Areas

● General Machine Learning Algorithms and Techniques

● Computer Systems for Machine Learning

● Natural Language Understanding

● Perception

● Healthcare

● Robotics

● Music and Art Generation

Page 7: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Accuracy

Scale (data size, model size)

1980s and 1990s

neural networks

other approaches

Page 8: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

morecomputeAccuracy

Scale (data size, model size)

neural networks

other approaches

1980s and 1990s

Page 9: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

morecomputeAccuracy

Scale (data size, model size)

neural networks

other approaches

Now

Page 10: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Growing Use of Deep Learning at Google

AndroidAppsdrug discoveryGmailImage understandingMapsNatural language understandingPhotosRobotics researchSpeechTranslationYouTube… many others ...

Across many products/areas:

# of directories containing model description files

Page 11: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Experiment Turnaround Time and Research Productivity

● Minutes, Hours:○ Interactive research! Instant gratification!

● 1-4 days○ Tolerable○ Interactivity replaced by running many experiments in parallel

● 1-4 weeks○ High value experiments only○ Progress stalls

● >1 month○ Don’t even try

Page 12: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Build the right tools

Page 13: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Open, standard software for general machine learning

Great for Deep Learning in particular

First released Nov 2015

Apache 2.0 license

http://tensorflow.org/and

https://github.com/tensorflow/tensorflow

Page 14: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

MatMul

Add Relu

biases

weights

examples

labels

Xent

Graph of Nodes, also called Operations or ops.

Computation is a dataflow graph

Page 15: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

with tensors

MatMul

Add Relu

biases

weights

examples

labels

Xent

Edges are N-dimensional arrays: Tensors

Computation is a dataflow graph

Page 16: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

● Build a graph computing a neural net inference.

import tensorflow as tf

from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

x = tf.placeholder("float", shape=[None, 784])

W = tf.Variable(tf.zeros([784,10]))

b = tf.Variable(tf.zeros([10]))

y = tf.nn.softmax(tf.matmul(x, W) + b)

Example TensorFlow fragment

Page 17: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

with state

Add Mul

biases

...

learning rate

−=...

'Biases' is a variable −= updates biasesSome ops compute gradients

Computation is a dataflow graph

Page 18: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

GPU 0 CPU

Add Mul

biases

learning rate

AssignSub

......

distributedComputation is a dataflow graph

Page 19: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

GPU 0 CPU

Add Mul

biases

learning rate

AssignSub

......

Assign Devices to Ops● TensorFlow inserts Send/Recv Ops to transport tensors across devices● Recv ops pull data from Send ops

Send Recv

Page 20: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

GPU 0 CPU

Add Mul

biases

learning rate

AssignSub

......

Assign Devices to Ops● TensorFlow inserts Send/Recv Ops to transport tensors across devices● Recv ops pull data from Send ops

Send Recv

Send Recv

Send

Recv SendRecv

Page 21: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Same mechanism supports large distributed systems

...

Params

Many replicas

Params Params...Computation spread across hundreds of machines and thousands of GPU cards

Page 22: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Paper in OSDI 2016https://arxiv.org/abs/1605.08695

http://tensorflow.org/whitepaper2015.pdf

Page 23: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

http://tensorflow.org/

Page 24: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Why Did We Build TensorFlow?Wanted system that was flexible, scalable, and production-ready

DistBelief, our first system, was good on two of these, but lacked flexibility

Most existing open-source packages were also good on 2 of 3 but not all 3

Page 25: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

TensorFlow GoalsEstablish common platform for expressing machine learning ideas and systems

Make this platform the best in the world for both research and production use

Open source it so that it becomes a platform for everyone, not just Google

Page 26: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google
Page 27: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

ML is done in many places

TensorFlow GitHub stars by GitHub user profiles w/ public locations Source: http://jrvis.com/red-dwarf/?user=tensorflow&repo=tensorflow

Page 28: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Progress

https://github.com/tensorflow/tensorflow/releases

Page 29: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Progress

https://github.com/tensorflow/tensorflow/releases

v1.0 released in Feb. 2017

Page 30: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

TensorFlow: A Vibrant Open-Source Community

● Rapid development, many outside contributors○ 475+ non-Google contributors to TensorFlow 1.0○ 15,000+ commits in 15 months○ Many community created tutorials, models, translations, and projects

■ ~7,000 GitHub repositories with ‘TensorFlow’ in the title

● Direct engagement between community and TensorFlow team○ 5000+ Stack Overflow questions answered○ 80+ community-submitted GitHub issues responded to weekly

● Growing use in ML classes: Toronto, Berkeley, Stanford, ...

Page 32: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google
Page 33: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Performance mattersResearch● Iterate quickly● Train models faster● Run more experiments in parallel

Production● Server farms and embedded● CPUs, GPUs, TPUs, and more● Low-latency serving

Page 34: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

TensorFlow v1.0 Performance Inception-v3 Training - Ideal Scaling Synthetic Data

DGX-1:

Inception-v3 training on P100 GPUs Inception-v3 training on K80 GPUs

K80:

Page 35: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

TensorFlow v1.0 Performance Inception-v3 Training - Synthetic Data

DGX-1: 7.37x speedup at 8 GPUs

Inception-v3 training on P100 GPUs Inception-v3 training on K80 GPUs

K80: 7.5x speedup at 8 GPUs

Page 36: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

TensorFlow v1.0 PerformanceInception-v3 Training - Real Data

DGX-1: 7.2x speedup at 8 GPUs

Inception-v3 training on P100 GPUs Inception-v3 training on K80 GPUs

K80: 7.3x speedup at 8 GPUs

Page 37: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

TensorFlow v1.0 Performance Inception-v3 Distributed Training - Synthetic Data

58x speedup at 64 GPUs (8 Servers / 8 GPUs each)

- GPU: K80 - Network: 20 Gb/sec

Inception-v3 training on K80 GPUs

Page 38: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Just-In-Time Compilationvia XLA, "Accelerated Linear Algebra" compiler

0x00000000 movq (%rdx), %rax0x00000003 vmovaps (%rax), %xmm00x00000007 vmulps %xmm0, %xmm0, %xmm00x0000000b vmovaps %xmm0, (%rdi)

...

TF graphs go in,

Optimized & specialized assembly comes out.

Let's explain that!

Page 39: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Demo:Inspect JIT code in TensorFlowiPython shell

XLA:CPU

XLA:GPU

Page 40: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Computers can now see

Large implications for healthcare

Page 41: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

MEDICAL IMAGINGUsing similar model for detecting diabetic retinopathy in retinal images

Page 42: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google
Page 43: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Performance on par or slightly better than the median of 8 U.S. board-certified ophthalmologists (F-score of 0.95 vs. 0.91).http://research.googleblog.com/2016/11/deep-learning-for-detection-of-diabetic.html

Page 44: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Blog: https://research.googleblog.com/2017/03/assisting-pathologists-in-detecting.htmlPaper: https://arxiv.org/abs/1703.02442

Page 45: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

ML Challenges in Pathology

150k pixels (15 Gigapixel image)

40x

❏ Extremely large images (> 100k x 100k pixels)

❏ Multiscale problem - need detail as well as context

5x

10x

20x

Page 46: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

detail ←→ context

Multi scale model

Multiscale model

resembles microscope magnifications

Page 47: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Detecting breast cancer metastases in lymph nodesbiopsy image ground truth

(from pathologist)model prediction

(early results)model prediction(current results)

Page 48: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Model performance compared to pathologist

our model pathologist*

Tumor localization score (FROC) 0.89 0.73

Sensitivity at 8 FP 0.92 0.73

Slide classification (AUC) 0.97 0.96

* pathologist given infinite time per image (reaching 0 FPs)

Evaluated using Camelyon16 dataset (just 270 training examples!)

Page 49: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Scaling language understanding models

Page 50: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Sequence-to-Sequence Model

A B C

v

D __ X Y Z

X Y Z Q

Input sequence

Target sequence

[Sutskever & Vinyals & Le NIPS 2014]

Deep LSTM

Page 51: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Sequence-to-Sequence Model: Machine Translation

v

Input sentence

Target sentence

[Sutskever & Vinyals & Le NIPS 2014] How

Quelle est taille?votre <EOS>

Page 52: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Sequence-to-Sequence Model: Machine Translation

v

Input sentence

Target sentence

[Sutskever & Vinyals & Le NIPS 2014] How

Quelle est taille?votre <EOS>

tall

How

Page 53: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Sequence-to-Sequence Model: Machine Translation

v

Input sentence

Target sentence

[Sutskever & Vinyals & Le NIPS 2014] How tall are

Quelle est taille?votre <EOS> How tall

Page 54: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Sequence-to-Sequence Model: Machine Translation

v

Input sentence

Target sentence

[Sutskever & Vinyals & Le NIPS 2014] How tall you?are

Quelle est taille?votre <EOS> How aretall

Page 55: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Sequence-to-Sequence Model: Machine Translation

v

Input sentence

[Sutskever & Vinyals & Le NIPS 2014]

At inference time:Beam search to choose most probable

over possible output sequences

Quelle est taille?votre <EOS>

Page 56: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Sequence to Sequence modelapplied to Google Translate

Page 58: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Encoder LSTMs

Decoder LSTMs

<s> Y1 Y3

SoftMax

Y1 Y2 </s>

X3 X2 </s>

8 Layers

Gpu1

Gpu2

Gpu2

Gpu3

+ + +

Gpu8

Attention

+ ++

Gpu1

Gpu2

Gpu3

Gpu8

Google Neural Machine Translation Model

One model replica: one machine w/ 8 GPUs

Page 59: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Model + Data Parallelism

...

Params

Many replicas

Parameters distributed across many parameter server machines

Params Params...

Page 60: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

neural (GNMT)

phrase-based (PBMT)

English >

Spanish

English >

French

English >

Chinese

Spanish >

English

French >

English

Chinese >

English

Translation model

Tran

slat

ion

qual

ity

0

1

2

3

4

5

6

human

perfect translation

Neural Machine Translation

Closes gap between old system and human-quality translation by 58% to 87%

Enables better communication across the world

research.googleblog.com/2016/09/a-neural-network-for-machine.html

Page 61: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation,Melvin Johnson, Mike Schuster, Quoc V. Le, Maxim Krikun, Yonghui Wu, Zhifeng Chen, Nikhil Thorat, Fernanda Viégas, Martin Wattenberg, Greg Corrado, Macduff Hughes, and Jeffrey Deanhttps://arxiv.org/abs/1611.04558

https://research.googleblog.com/2016/11/zero-shot-translation-with-googles.html

Page 62: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Bigger models, but sparsely activated

Page 63: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Bigger models, but sparsely activated

Motivation:Want huge model capacity for large datasets, but

want individual example to only activate tiny fraction of large model

Page 64: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Per-Example Routing

Page 65: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Outrageously Large Neural Networks: The Sparsely-gated Mixture-of-Experts Layer,Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le & Jeff DeanTo appear in ICLR 2017, https://openreview.net/pdf?id=B1ckMDqlg

Per-Example Routing

Page 66: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Automated machine learning (“learning to learn”)

Page 67: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Current:Solution = ML expertise + data + computation

Page 68: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Current:Solution = ML expertise + data + computation

Can we turn this into:Solution = data + 100X computation

???

Page 69: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Early encouraging signs

Trying multiple different approaches:

(1) RL-based architecture search(2) Model architecture evolution

Page 70: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Idea: model-generating model trained via RL

(1) Generate ten models(2) Train them for a few hours(3) Use loss of the generated models as reinforcement

learning signal

To appear in ICLR 2017

arxiv.org/abs/1611.01578

Page 71: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

CIFAR-10 Image Recognition Task

Page 72: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

“Normal” LSTM cell

Cell discovered by architecture search

Penn Tree Bank Language Modeling Task

Page 73: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

https://arxiv.org/abs/1703.01041

Idea: evolve models via evolutionary algorithm

Page 74: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Evolvethis

DNA

0.92

FitnessModel

Softmax

ReLU

ReLU

ReLU

Linear

0 1 2 3 4 5 6 7 8 9

Trained Model

W

B

W

B

B

B

W

W W

W

W

Softmax0 1 2 3 4 5 6 7 8 9

https://arxiv.org/abs/1703.01041

Page 75: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Evolutionary Step

Worker1

Mutations:● Alter learning rate● Identity● Reset weights● Insert convolution● Remove convolution● Alter strides● Alter # of channels

● Alter horiz. filter size● Alter vert. filters size● Insert nonlinearity● Remove nonlinearity● Add-skip● Remove skip

Page 76: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Evolutionary Step

Worker 1

● Pick 2 at random

● Copy-mutate best

● Kill worst

Worker 2

Worker 4Worker N

Preempted by vincentsrobot.play_fetch

Worker 3

Page 77: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Evolve From Scratch

● Initialize with linear models● Repeat evolutionary step

Page 78: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google
Page 79: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google
Page 80: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Where are we trying to go?

Page 81: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Where are we trying to go?

Combine Several of These Ideas:

Large model, but sparsely activatedSingle model to solve many tasks (100s to 1Ms)

Dynamically learn and grow pathways through large model

Page 82: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

...Tasks

Outputs

Single large model,

sparsely activated

Page 83: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

...Tasks

Outputs

Single large model,

sparsely activated

Page 84: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

...Tasks

Outputs

Single large model,

sparsely activated

Page 85: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

...Tasks

Outputs

Single large model,

sparsely activated

Page 86: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

...Tasks

Outputs

Single large model,

sparsely activated

Page 87: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

...Tasks

Outputs

Single large model,

sparsely activated

Page 88: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

More computational power needed

Deep learning is transforming how we design computers

Page 89: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Special computation properties

reducedprecision

ok

about 1.2

× about 0.6

about 0.7

1.21042

× 0.61127

0.73989343NOT

Page 90: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

handful of specific

operations× =

reducedprecision

ok

about 1.2

× about 0.6

about 0.7

1.21042

× 0.61127

0.73989343NOT

Special computation properties

Page 91: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Tensor Processing UnitCustom Google-designed chip for neural net computations

In production use for >24 months: used on every search query, for neural machine translation,for AlphaGo match, …

Talk at Computer History Museum on April 5th: sites.google.com/view/naeregionalsymposium

Page 92: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Machine Learning forHigher Performance Machine Learning Models

Page 93: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

For large models, model parallelism is important

Page 94: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

For large models, model parallelism is important

But getting good performance given multiple computing devices is non-trivial and non-obvious

Page 95: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

A B C D __ A B C

A B C D

A B C D

LSTM 1

LSTM 2

Attention

Softmax

Page 96: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

A B C D __ A B C

A B C D

GPU1

GPU2

GPU3

GPU4 A B C D

LSTM 1

LSTM 2

Attention

Softmax

Page 97: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Reinforcement Learning forHigher Performance Machine Learning Models

Page 98: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Reinforcement Learning forHigher Performance Machine Learning Models

Placement model (trained via RL) gets graph as input + set of devices, outputs device placement for each graph node

Page 99: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Reinforcement Learning forHigher Performance Machine Learning Models

Measured time per step gives RL reward signal

Placement model (trained via RL) gets graph as input + set of devices, outputs device placement for each graph node

Page 100: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Early results, but it seems to work

Model Hardware Baseline RL Speedup

Neural MT (2 layers) + attention

4 Tesla K80 3.20s 2.47s 22.8%

Inception 4 Tesla K80 4.60s 3.85s 16.3%

Baselines:NMT: human expert placement shown on earlier slideInception: default placement on GPU/0

Per-step running times (secs)

Page 101: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Early results, but it seems to work

Model Hardware Baseline RL Speedup

Neural MT (2 layers) + attention

4 Tesla K80 3.20s 2.47s 22.8%

Inception 4 Tesla K80 4.60s 3.85s 16.3%

Baselines:NMT: human expert placement shown on earlier slideInception: default placement on GPU/0

Per-step running times (secs)

27 vs. 35 hours (+22.8%)

Page 102: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google
Page 103: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

morecomputeAccuracy

Scale (data size, model size)

neural networks

other approaches

Now

Page 104: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

morecomputeAccuracy

Scale (data size, model size)

neural networks

other approaches

Future

Page 105: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Example queries of the future

Which of these eye images shows

symptoms of diabetic retinopathy?

Please fetch me a cup of tea from the kitchen

Describe this video in Spanish

Find me documents related to reinforcement learning for

robotics and summarize them in German

Page 106: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

ConclusionsDeep neural networks are making significant strides in speech, vision, language, search, robotics, healthcare, …

If you’re not considering how to use deep neural nets to solve your problems, you almost certainly should be

Page 107: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

g.co/brainMore info about our work

Page 108: Google Brain team TensorFlow Scaling Machine - Matroid · Scaling Machine Learning with TensorFlow Jeff Dean Google Brain team g.co/brain Presenting the work of many people at Google

Thanks!

g.co/brain