TensorFrames: Google Tensorflow on Apache Spark
-
Upload
databricks -
Category
Software
-
view
4.683 -
download
1
Transcript of TensorFrames: Google Tensorflow on Apache Spark
TensorFrames: Google Tensorflow on Apache Spark
Tim HunterMeetup 08/2016 - Salesforce
How familiar are you with Spark?
1. What is Apache Spark?
2. I have used Spark
3. I am using Spark in production or I contribute to its development
2
How familiar are you with TensorFlow?
1. What is TensorFlow?
2. I have heard about it
3. I am training my own neural networks
3
Founded by the team who created Apache Spark
Offers a hosted service: - Apache Spark in the cloud - Notebooks - Cluster management - Production environment
About Databricks
4
Software engineer at Databricks
Apache Spark contributor
Ph.D. UC Berkeley in Machine Learning
(and Spark user since Spark 0.5)
About me
5
Outline•Numerical computing with Apache Spark
•Using GPUs with Spark and TensorFlow
•Performance details
•The future
6
Numerical computing for Data Science
•Queries are data-heavy
•However algorithms are computation-heavy
•They operate on simple data types: integers, floats, doubles, vectors, matrices
7
The case for speed•Numerical bottlenecks are good targets for
optimization
• Let data scientists get faster results
• Faster turnaround for experimentations
•How can we run these numerical algorithms faster?
8
Evolution of computing power
9
Failure is not an option: it is a fact
When you can afford your dedicated chip
GPGPU
Scale out
Scal
e up
Evolution of computing power
10
NLTKTheano
Today’s talk:Spark + TensorFlow
Evolution of computing power• Processor speed cannot keep up with memory and
network improvements
• Access to the processor is the new bottleneck
• Project Tungsten in Spark: leverage the processor’s heuristics for executing code and fetching memory
• Does not account for the fact that the problem is numerical
11
Asynchronous vs. synchronous
• Asynchronous algorithms perform updates concurrently• Spark is synchronous model, deep learning frameworks
usually asynchronous• A large number of ML computations are synchronous• Even deep learning may benefit from synchronous updates
12
Outline•Numerical computing with Apache Spark
•Using GPUs with Spark and TensorFlow
•Performance details
•The future
13
GPGPUs
14
•Graphics Processing Units for General Purpose computations
Series1
6000
Theoretical peakthroughput
GPU CPU
Series1
Theoretical peakbandwidth
GPU CPU
• Library for writing “machine intelligence” algorithms
• Very popular for deep learning and neural networks
•Can also be used for general purpose numerical computations
• Interface in C++ and Python
15
Google TensorFlow
Numerical dataflow with Tensorflow
16
x = tf.placeholder(tf.int32, name=“x”)y = tf.placeholder(tf.int32, name=“y”)output = tf.add(x, 3 * y, name=“z”)
session = tf.Session()output_value = session.run(output, {x: 3, y: 5})
x:int32
y:int32
mul 3
z
Numerical dataflow with Spark
df = sqlContext.createDataFrame(…)
x = tf.placeholder(tf.int32, name=“x”)y = tf.placeholder(tf.int32, name=“y”)output = tf.add(x, 3 * y, name=“z”)
output_df = tfs.map_rows(output, df)
output_df.collect()
df: DataFrame[x: int, y: int]
output_df: DataFrame[x: int, y: int, z: int]
x:int32
y:int32
mul 3
z
Demo
18
Outline•Numerical computing with Apache Spark
•Using GPUs with Spark and TensorFlow
•Performance details
•The future
19
20
It is a communication problem
Spark worker process Worker python process
C++buffer
Python pickle
Tungsten binary format
Python pickle
Javaobject
21
TensorFrames: native embedding of TensorFlow
Spark worker process
C++buffer
Tungsten binary format
Javaobject
• Estimation of distribution from samples•Non-parametric•Unknown bandwidth
parameter•Can be evaluated with
goodness of fit
An example: kernel density scoring
22
• In practice, compute:
with:
• In a nutshell: a complex numerical function
An example: kernel density scoring
23
24
Speedup
Scala UDF Scala UDF (optimized) TensorFrames TensorFrames + GPU0
60
120
180
Run
time
(sec
)
def score(x: Double): Double = { val dis = points.map { z_k => - (x - z_k) * (x - z_k) / ( 2 * b * b) } val minDis = dis.min val exps = dis.map(d => math.exp(d - minDis)) minDis - math.log(b * N) + math.log(exps.sum)}
val scoreUDF = sqlContext.udf.register("scoreUDF", score _)sql("select sum(scoreUDF(sample)) from samples").collect()
25
Speedup
Scala UDF Scala UDF (optimized) TensorFrames TensorFrames + GPU0
60
120
180
Run
time
(sec
)def score(x: Double): Double = { val dis = new Array[Double](N) var idx = 0 while(idx < N) { val z_k = points(idx) dis(idx) = - (x - z_k) * (x - z_k) / ( 2 * b * b) idx += 1 } val minDis = dis.min var expSum = 0.0 idx = 0 while(idx < N) { expSum += math.exp(dis(idx) - minDis) idx += 1 } minDis - math.log(b * N) + math.log(expSum)}
val scoreUDF = sqlContext.udf.register("scoreUDF", score _)sql("select sum(scoreUDF(sample)) from samples").collect()
26
Speedup
Scala UDF Scala UDF (optimized) TensorFrames TensorFrames + GPU0
60
120
180
Run
time
(sec
)def cost_fun(block, bandwidth): distances = - square(constant(X) - sample) / (2 * b * b) m = reduce_max(distances, 0) x = log(reduce_sum(exp(distances - m), 0)) return identity(x + m - log(b * N), name="score”)
sample = tfs.block(df, "sample")score = cost_fun(sample, bandwidth=0.5)df.agg(sum(tfs.map_blocks(score, df))).collect()
27
Speedup
Scala UDF Scala UDF (optimized) TensorFrames TensorFrames + GPU0
60
120
180
Run
time
(sec
)def cost_fun(block, bandwidth): distances = - square(constant(X) - sample) / (2 * b * b) m = reduce_max(distances, 0) x = log(reduce_sum(exp(distances - m), 0)) return identity(x + m - log(b * N), name="score”)
with device("/gpu"): sample = tfs.block(df, "sample") score = cost_fun(sample, bandwidth=0.5)df.agg(sum(tfs.map_blocks(score, df))).collect()
Demo: Deep dreams
28
Demo: Deep dreams
29
Outline•Numerical computing with Apache Spark
•Using GPUs with Spark and TensorFlow
•Performance details
•The future
30
31
Improving communication
Spark worker process
C++buffer
Tungsten binary format
Javaobject
Direct memory copy
Columnarstorage
The future• Integration with Tungsten:• Direct memory copy• Columnar storage
•Better integration with MLlib data types
•GPU instances in Databricks: Official support coming this fall
32
Recap•Spark: an efficient framework for running
computations on thousands of computers
•TensorFlow: high-performance numerical framework
•Get the best of both with TensorFrames:• Simple API for distributed numerical computing• Can leverage the hardware of the cluster
33
Try these demos yourself•TensorFrames source code and documentation:
github.com/databricks/tensorframesspark-packages.org/package/databricks/tensorframes
•Demo notebooks available on Databricks
•The official TensorFlow website: www.tensorflow.org
34
Spark Summit EU 2016 15% Discount Code: DatabricksEU16
35
Thank you.