Running Spark in Production

Erich NachbarQuantifind

Agenda

Quantifind?

Hadoop? No, sir!

Choices: SLA, MTBF & Money

Are we there yet?

Shoot!

Quantifind?Products in several verticals

Movie, Gaming, ...

Input Sources

Product Reviews,

Comments / Tweets, ...

Predicts using (un-)/structure data

Box Office Opening $, Customer Satisfaction, ...

Computes the Intentful Audience and its demographics

Yes, we are hiring!

jobs@quantifind.com

Hadoop? No sir!Development Velocity

Quicker iteration cycles than Hadoop

Concise Scala code (10x smaller than Pig UDFs)

Excellent for embedding (unlike Pig)

Runtime

Orders of magnitude faster for cacheable data sets

Forced MapReduce disk spills kill iterative perf.

Jobs latency is short. Enables new product features.

SLA, MTBF & Money

Architecture Considerations

What is noticeable? - “get out of bed”

UI not available

What is not (immediately)? - “I have a few hrs”

UI Data stale

What is irreplaceable? - “Oh, no! Bieber tweets...”

Streamed data

Architecture

Are we there yet?Invest in

External Health Pings Like: UptimeRobot (free)

Measuring everything (counts, timings,...)Like: Twitter Ostrich with Kafka & Graphite

Log centrallyLike: Graylog2, Logstash

Automatic Service RestartsLike: Supervisord

Stats.time("cass.save") { cass.save(key, result)}

What’s next?Streaming & Batch Support in a Single System

“Because coding it twice is lame”

Spark Streaming

Storm Trident

RAM Grids

Core i7 RAM = 20GB/s, 20 ns Fast SSD drive = 0.5 GB/s, 100,000 ns

Questions?

Thank you!

erich@quantifind.com

Appendix

Running Spark in Production - UC Berkeley AMP...

Transcript of Running Spark in Production - UC Berkeley AMP...

Running Spark in Production - UC Berkeley AMP...

Documents

Transcript of Running Spark in Production - UC Berkeley AMP...

Spark introduction RDD Building and running Spark applications · Spark introduction!! RDD!! Building and running Spark applications Lightning-fast cluster computing

Lessons Learned Running Hadoop and Spark in Docker Containers

Apache CarbonData Documentation Ver 1.4...Please visit Apache Spark Documentation for more details on Spark shell. 1.2.1.1 Basics Start Spark shell by running the following command

(BDT303) Running Spark and Presto on the Netflix Big Data Platform

Learning spark ch07 - Running on a Cluster

Running Apache Spark Applications - Cloudera · Apache Spark Introduction Introduction You can run Spark interactively or from a client program: • Submit interactive statements

Running Spark on Kubernetes: Julien Dumazer t, Co-Founder ...

Hive on Spark and MapReduce: A Methodology for Parameter … · 2018-12-24 · Executor in Spark, executors are worker processes responsible for running the individual tasks in a

Beshalach - Totally spark BES… · Beshalach Hello and welcome to Spark! Spark is a new idea from Tribe, aimed at facilitating the smooth running of Toddlers’ Services, Children’s

Running Apache Zeppelin & Spark in Production

Running Apache Spark Applications · Cloudera Runtime Introduction Introduction You can run Spark applications locally or distributed across a cluster, either by using an interactive

Lessons Learned From Running Spark On Docker

Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft) | C* Summit 2016

Running Apache Spark Applications · 2020. 12. 13. · Using the Livy API to run Spark jobs.....19 Running an interactive session with the Livy API.....20 Livy objects for interactive

Running Spark Inside Containers with Haohai Ma and Khalid Ahmed

SparkStreaming, - UC Berkeley AMP Campampcamp.berkeley.edu/.../2012/...spark-streaming.pdf · • Also,tried,Apache,S4:,7000,records/s/node, • Commercial,systems,report,O(100K),

Machine Learning on Spark - UC Berkeley AMP Campampcamp.berkeley.edu/.../Machine-Learning-on-Spark... · Machine Learning on Spark Shivaram Venkataraman ... Machine learning algorithms

Behar - Bechukotai · Behar - Bechukotai. Helo anaHedlwcmtoS Hello and welcome to Spark! Spark is a new idea from Tribe, aimed at facilitating the smooth running of Toddlers’ Services,

Tribe Spark 2: Parashat Nitzavim and Vayelech · 1 Tribe Spark 2: Parashat Nitzavim and Vayelech Hello and welcome to Spark! Spark is aimed at facilitating the smooth running of Toddlers

Running Spark In Production