
TensorFlow

Marco Serafini

COMPSCI 590S, Lecture 22


Motivations
• DistBelief: previous iteration, based on a parameter server
• Limitations:
  • Monolithic layers, difficult to define new ones
  • Difficult to offload computation with complex dependencies to parameter servers
    • E.g., apply updates based on gradients accumulated over multiple iterations
  • Fixed execution pattern: read data, compute the loss function (forward pass), compute gradients for the parameters (backward pass), write gradients to the parameter server
  • Not optimized for single workstations and GPUs


TensorFlow
• Dataflow graph of operators, but not a DAG
  • Loops and conditionals
• Deferred (lazy) execution
  • Enables optimizations, e.g., pipelining
• Composable, simple basic operators
  • Matrix multiplication, convolution
  • Can be combined into more complex operators
• Stateful operators
  • For shared parameters
• Concept of devices
  • CPUs, GPUs, mobile devices


Example
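The original slide shows the example as a figure. A minimal sketch in the same spirit, using the TensorFlow 1.x graph API (the model and names are illustrative, not taken from the slide):

import numpy as np
import tensorflow as tf  # assumes the TF 1.x graph API

# Build a dataflow graph for a linear model y = x * W + b.
x = tf.placeholder(tf.float32, shape=[None, 784], name="x")
W = tf.Variable(tf.zeros([784, 10]), name="W")   # stateful operator: shared parameters
b = tf.Variable(tf.zeros([10]), name="b")
y = tf.matmul(x, W) + b                          # simple, composable operators

# Nothing has run yet: execution is deferred until a session executes a subgraph.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    out = sess.run(y, feed_dict={x: np.zeros((1, 784), dtype=np.float32)})
    print(out.shape)   # (1, 10)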


Tensors
• Format
  • n-dimensional arrays
  • Elements have primitive types (including byte arrays)
• Tensors are dense
  • All elements are represented
  • The user must find ways to encode sparse data efficiently (see the sketch below)
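A minimal sketch of one way to encode sparse data on top of dense tensors, an index/value representation (the encoding and values are illustrative; TensorFlow's SparseTensor wraps exactly this kind of triple):

import tensorflow as tf  # assumes the TF 1.x graph API

# A 3x4 matrix with only two non-zero entries, stored as dense index/value tensors.
indices     = tf.constant([[0, 1], [2, 3]], dtype=tf.int64)  # coordinates of non-zeros
values      = tf.constant([5.0, 7.0])                        # the non-zero values
dense_shape = tf.constant([3, 4], dtype=tf.int64)

sparse = tf.SparseTensor(indices, values, dense_shape)
dense  = tf.sparse_tensor_to_dense(sparse)   # materialize only when needed

with tf.Session() as sess:
    print(sess.run(dense))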


Operations
• Inputs and outputs are tensors
• State is kept through stateful operators
• Operations to handle variables (which are also tensors)
  • Variable op: returns a unique reference handle
  • Read op: takes a reference handle, produces the value of the variable
  • Write ops: take a reference and a value and update the variable; multiple write operations are possible
• Queues are also stateful operators (see the sketch below)
  • Get a reference handle, modify the queue through operations
  • Blocking semantics, backpressure, synchronization
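A minimal sketch of stateful variable and queue operators with the TF 1.x API (the values are illustrative):

import tensorflow as tf  # assumes the TF 1.x graph API

# Variable op: creates state and returns a handle that read/write ops use.
v = tf.Variable(0.0, name="v")
read_v   = v.read_value()          # read op: handle -> current value
write_v  = tf.assign(v, 3.0)       # write op: handle + value -> updated variable
add_to_v = tf.assign_add(v, 1.0)   # another possible write operation

# Queues are stateful too: enqueue/dequeue ops work on the queue's handle
# and have blocking semantics (dequeue blocks until an element is available).
q = tf.FIFOQueue(capacity=10, dtypes=[tf.float32])
enq = q.enqueue([42.0])
deq = q.dequeue()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(write_v)
    sess.run(add_to_v)
    print(sess.run(read_v))   # 4.0
    sess.run(enq)
    print(sess.run(deq))      # 42.0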


Execution Model
• We have a computation graph
• Step: the client executes a subgraph (see the sketch below) by indicating:
  • Edges to feed the subgraph with input tensors
  • Edges to fetch the output tensors
  • The runtime prunes the subgraph to remove unnecessary operations
• Can invoke multiple concurrent steps
  • Example: concurrent batches for data-parallel training
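A minimal sketch of a step with explicit feeds and fetches (TF 1.x API; the graph is illustrative):

import numpy as np
import tensorflow as tf  # assumes the TF 1.x graph API

a = tf.placeholder(tf.float32, shape=[2, 2], name="a")
b = tf.constant([[1.0, 0.0], [0.0, 1.0]])
c = tf.matmul(a, b)
d = tf.reduce_sum(c)   # not fetched below, so it is pruned from the step

with tf.Session() as sess:
    # One step: feed tensor 'a', fetch tensor 'c'. The runtime runs only the
    # operations needed to produce 'c'; 'd' is pruned away.
    result = sess.run(c, feed_dict={a: np.ones((2, 2), dtype=np.float32)})
    print(result)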


Example
• Data-parallel training looks like this
• (Figure: stateful queues feed the input data, stateful variables hold the shared parameters, and concurrent steps provide data parallelism)


Scheduling: Tasks and Devices
• Tasks: named processes that send messages (see the sketch below)
  • PS tasks: store parameters, but can also run computations
  • Worker tasks: the rest
  • Note: "informal" categories, not enforced by TensorFlow
• Devices: CPU, GPU, TPU, mobile, ...
  • The CPU is the host device
  • A device executes a kernel for each operation assigned to it
    • The same operation (e.g., matrix multiplication) has different kernels for different devices
• Requirements for a device
  • Must accept kernels for execution
  • Must allocate memory for inputs and outputs
  • Must transfer data to and from host memory
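A minimal sketch of how PS and worker tasks are named with the TF 1.x distributed API (the addresses, job names, and task counts are illustrative assumptions):

import tensorflow as tf  # assumes the TF 1.x distributed runtime

# Name the tasks of the cluster: one PS task and two worker tasks.
cluster = tf.train.ClusterSpec({
    "ps":     ["ps0.example.com:2222"],
    "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
})

# Each process starts a server for its own task. The job names are "informal"
# categories: TensorFlow itself does not treat ps and worker tasks differently.
server = tf.train.Server(cluster, job_name="worker", task_index=0)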


Placement
• The TensorFlow runtime places operations on devices
  • Implicit constraints: a stateful operation lives on the same device as its state
  • Explicit constraints: dictated by the user (see the sketch below)
  • Optimal placement is still an open question
• Obtain per-device subgraphs
  • All operations assigned to the device
  • Send and Receive operations replace edges that cross devices
    • Specialized per-device implementations
    • CPU – GPU: CUDA memory copy
    • Across tasks: TCP or RDMA
• Placement is preserved throughout a session
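A minimal sketch of explicit placement constraints with tf.device (TF 1.x API; the device strings are illustrative). Edges that cross devices, such as the one from W to the matmul below, get Send and Receive operations inserted by the runtime:

import tensorflow as tf  # assumes the TF 1.x graph API

# Explicit constraint: keep the parameters on the PS task's CPU ...
with tf.device("/job:ps/task:0/cpu:0"):
    W = tf.Variable(tf.zeros([784, 10]), name="W")

# ... and place the computation on the worker's GPU.
with tf.device("/job:worker/task:0/gpu:0"):
    x = tf.placeholder(tf.float32, shape=[None, 784])
    y = tf.matmul(x, W)   # cross-device edge: Send on the PS side, Receive here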


Control Flow
• How to enable dynamic control flow with a static graph?
• Example: recurrent neural network
  • Train the network on sequences of variable length without unrolling
• Conditionals: Switch and Merge (figure and sketch below)

(Figure: Switch takes a data input and a control input and forwards the data to one of two branches of operations; the branch that is not taken receives a dead signal. Merge takes several inputs and outputs the one non-dead input.)
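A minimal sketch of a conditional in the TF 1.x API; tf.cond is lowered to Switch and Merge nodes in the graph (the predicate and branches are illustrative):

import tensorflow as tf  # assumes the TF 1.x graph API

x = tf.placeholder(tf.float32, shape=[])
pred = x > 0.0

# Only the taken branch executes; the ops of the other branch receive dead signals.
y = tf.cond(pred,
            lambda: x * 2.0,   # true branch
            lambda: x - 1.0)   # false branch

with tf.Session() as sess:
    print(sess.run(y, feed_dict={x: 3.0}))    # 6.0
    print(sess.run(y, feed_dict={x: -3.0}))   # -4.0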


Loops
• Use three additional operators: Enter, NextIteration, and Exit (see the sketch below)
• (Figure: the data input flows through Enter into the loop-body ops, NextIteration feeds values back for the next iteration, and Exit emits the final value.)
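A minimal sketch of a graph-level loop in the TF 1.x API; tf.while_loop is lowered to Enter, NextIteration, and Exit nodes (plus Switch/Merge), so the loop is not unrolled (the loop body is illustrative):

import tensorflow as tf  # assumes the TF 1.x graph API

# Sum the integers 0..9 with a loop in the graph instead of unrolling it.
i0   = tf.constant(0)
acc0 = tf.constant(0)

cond = lambda i, acc: i < 10             # loop condition
body = lambda i, acc: (i + 1, acc + i)   # one iteration; values flow back via NextIteration

i_final, acc_final = tf.while_loop(cond, body, [i0, acc0])

with tf.Session() as sess:
    print(sess.run(acc_final))   # 45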


Scaling to Large Models
• Parameter-server approach to avoid moving terabytes of parameters on every step
  • Gather: reads tensor data from a shard and computes
  • Part: partitions the input across the shards of parameters
  • Stitch: aggregates all partitions
  • (See the sketch below)
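A minimal sketch of the Part/Gather/Stitch pattern using the TF 1.x ops tf.dynamic_partition, tf.gather, and tf.dynamic_stitch (the sharding scheme and data are illustrative assumptions):

import tensorflow as tf  # assumes the TF 1.x graph API

NUM_SHARDS = 2

# Two parameter shards (in a real deployment, each would live on a different PS task).
shard0 = tf.constant([[0.0], [1.0], [2.0]])   # rows 0..2
shard1 = tf.constant([[3.0], [4.0], [5.0]])   # rows 3..5
shards = [shard0, shard1]

ids = tf.constant([5, 0, 3, 2])   # sparse lookup indices

# Part: decide which shard owns each id and split the ids accordingly.
shard_of = ids // 3
local_id = ids % 3
parts      = tf.dynamic_partition(local_id, shard_of, NUM_SHARDS)
orig_index = tf.dynamic_partition(tf.range(tf.size(ids)), shard_of, NUM_SHARDS)

# Gather: read only the needed rows from each shard.
gathered = [tf.gather(shards[s], parts[s]) for s in range(NUM_SHARDS)]

# Stitch: reassemble the partial results in the original order.
result = tf.dynamic_stitch(orig_index, gathered)

with tf.Session() as sess:
    print(sess.run(result))   # rows 5, 0, 3, 2 of the full parameter matrix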


Fault Tolerance
• Long-running tasks face failures and preemption
  • Sometimes run at night on idle machines
• Small operations, no need to tolerate individual failures
  • Even RDDs are overkill
• The user invokes Save for checkpointing (see the sketch below)
  • Each variable in a task is connected to the same Save op for batching
  • Checkpoints are not consistent
• Other use cases: transfer learning
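A minimal sketch of user-invoked checkpointing with tf.train.Saver (TF 1.x API; the variables and path are illustrative). One Saver batches the underlying Save/Restore ops for all the variables it covers:

import tensorflow as tf  # assumes the TF 1.x graph API

W = tf.Variable(tf.zeros([784, 10]), name="W")
b = tf.Variable(tf.zeros([10]), name="b")

saver = tf.train.Saver()   # covers all variables by default

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # ... training steps ...
    saver.save(sess, "/tmp/model.ckpt")      # user-invoked checkpoint
    saver.restore(sess, "/tmp/model.ckpt")   # after a failure, or to seed transfer learning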


Synchronous Coordination
• Use blocking queues for synchrony
• Redundant tasks for stragglers (see the sketch below)
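One way to realize this in the TF 1.x API is tf.train.SyncReplicasOptimizer, sketched below: it aggregates gradients from a fixed number of replicas before applying them, and extra (redundant) worker tasks mask stragglers. The counts and base optimizer are illustrative:

import tensorflow as tf  # assumes the TF 1.x graph API

w = tf.Variable(1.0)
loss = tf.square(w - 3.0)
global_step = tf.train.get_or_create_global_step()

base_opt = tf.train.GradientDescentOptimizer(0.1)

# Wait for gradients from 4 replicas per step, but launch 5 worker tasks:
# the fifth acts as a backup so one straggler does not block the step.
opt = tf.train.SyncReplicasOptimizer(base_opt,
                                     replicas_to_aggregate=4,
                                     total_num_replicas=5)
train_op = opt.minimize(loss, global_step=global_step)

# Each worker runs train_op inside a session that uses the hook returned by
# opt.make_session_run_hook(is_chief), which manages the synchronization queues.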


Implementation


Single-Machine Performance
• Four convolutional models using one GPU


Synchronous Microbenchmarks
• Null training steps
• Sparse performance is close to optimal (scalar)


Scalability
• Scalability is bound by access to the PS tasks (7)