TensorFlow On Embedded Devices - Cadence...

15
TensorFlow On Embedded Devices

Transcript of TensorFlow On Embedded Devices - Cadence...

Page 1: TensorFlow On Embedded Devices - Cadence IPip.cadence.com/uploads/presentations/1100AM_TensorFlow... · 2020-05-04 · TensorFlow: Expressing High-Level ML Computations Core in C++

TensorFlow On Embedded Devices

Page 2: TensorFlow On Embedded Devices - Cadence IPip.cadence.com/uploads/presentations/1100AM_TensorFlow... · 2020-05-04 · TensorFlow: Expressing High-Level ML Computations Core in C++

Who am I? Pete Warden ([email protected])

Tech Lead of the TensorFlow Mobile/Embedded team.

Page 3: TensorFlow On Embedded Devices - Cadence IPip.cadence.com/uploads/presentations/1100AM_TensorFlow... · 2020-05-04 · TensorFlow: Expressing High-Level ML Computations Core in C++

Why am I here?

Page 4: TensorFlow On Embedded Devices - Cadence IPip.cadence.com/uploads/presentations/1100AM_TensorFlow... · 2020-05-04 · TensorFlow: Expressing High-Level ML Computations Core in C++

Why is deep learning so important?

Page 5: TensorFlow On Embedded Devices - Cadence IPip.cadence.com/uploads/presentations/1100AM_TensorFlow... · 2020-05-04 · TensorFlow: Expressing High-Level ML Computations Core in C++

Where does TensorFlow fit in? DistBelief (1st system) was great for scalability, and production training of basic kinds of models.

Better understanding of problem space allowed us to make some dramatic simplifications.

Google brings years of production experience, and a large team with a long-term commitment.

Page 6: TensorFlow On Embedded Devices - Cadence IPip.cadence.com/uploads/presentations/1100AM_TensorFlow... · 2020-05-04 · TensorFlow: Expressing High-Level ML Computations Core in C++

TensorFlow: Expressing High-Level ML Computations

Core in C++

Very low overhead

Core TensorFlow Execution System

CPU GPU Android iOS ...

Page 7: TensorFlow On Embedded Devices - Cadence IPip.cadence.com/uploads/presentations/1100AM_TensorFlow... · 2020-05-04 · TensorFlow: Expressing High-Level ML Computations Core in C++

TensorFlow: Expressing High-Level ML Computations

Core in C++

Very low overhead

Different front ends for specifying/driving the computation

Python and C++ today, easy to add more

Core TensorFlow Execution System

CPU GPU Android iOS ...

Page 8: TensorFlow On Embedded Devices - Cadence IPip.cadence.com/uploads/presentations/1100AM_TensorFlow... · 2020-05-04 · TensorFlow: Expressing High-Level ML Computations Core in C++

TensorFlow: Expressing High-Level ML Computations

Core in C++

Very low overhead

Different front ends for specifying/driving the computation

Python and C++ today, easy to add more

Core TensorFlow Execution System

CPU GPU Android iOS ...

C++ front end Python front end ...

Page 9: TensorFlow On Embedded Devices - Cadence IPip.cadence.com/uploads/presentations/1100AM_TensorFlow... · 2020-05-04 · TensorFlow: Expressing High-Level ML Computations Core in C++

MatMul

Add Relu

biases

weights

examples

labels

Xent

Graph of Nodes, also called Operations or ops.

Computation is a dataflow graph

Page 10: TensorFlow On Embedded Devices - Cadence IPip.cadence.com/uploads/presentations/1100AM_TensorFlow... · 2020-05-04 · TensorFlow: Expressing High-Level ML Computations Core in C++

with tensors

MatMul

Add Relu

biases

weights

examples

labels

Xent

Edges are N-dimensional arrays: Tensors

Computation is a dataflow graph

Page 11: TensorFlow On Embedded Devices - Cadence IPip.cadence.com/uploads/presentations/1100AM_TensorFlow... · 2020-05-04 · TensorFlow: Expressing High-Level ML Computations Core in C++

Automatically runs models on range of platforms:

from phones ...

to single machines (CPU and/or GPUs) …

to distributed systems of many 100s of GPU cards

TensorFlow: Expressing High-Level ML Computations

Page 12: TensorFlow On Embedded Devices - Cadence IPip.cadence.com/uploads/presentations/1100AM_TensorFlow... · 2020-05-04 · TensorFlow: Expressing High-Level ML Computations Core in C++

What about the cloud?

Page 13: TensorFlow On Embedded Devices - Cadence IPip.cadence.com/uploads/presentations/1100AM_TensorFlow... · 2020-05-04 · TensorFlow: Expressing High-Level ML Computations Core in C++

What does this mean in practice?

Start with file format:

https://www.tensorflow.org/versions/master/how_tos/

tool_developers/index.html

Then deeper integration.

Eight-bit is enough!

Page 14: TensorFlow On Embedded Devices - Cadence IPip.cadence.com/uploads/presentations/1100AM_TensorFlow... · 2020-05-04 · TensorFlow: Expressing High-Level ML Computations Core in C++

Eight-bit resources

http://github.com/google/gemmlowp - 60 GOPs/s on Nexus 5!

GoogLeNet v1 is 7MB after just quantization.

BNNM API in Android

More example code and models to come.

Page 15: TensorFlow On Embedded Devices - Cadence IPip.cadence.com/uploads/presentations/1100AM_TensorFlow... · 2020-05-04 · TensorFlow: Expressing High-Level ML Computations Core in C++

Next steps

Ask me how - [email protected]

Thanks!