Cassandra + Spark (You’ve got the lighter, let’s start a fire)

67
You’ve got the lighter, let’s start a fire: Spark and Cassandra Robert Stupp Solutions Architect @ DataStax, Committer to Apache Cassandra @snazy

Transcript of Cassandra + Spark (You’ve got the lighter, let’s start a fire)

Page 1: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

You’ve got the lighter, let’s start a fire:Spark and CassandraRobert StuppSolutions Architect @ DataStax, Committer to Apache Cassandra@snazy

Page 2: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

A journey throughDistributed Computing

© 2016 DataStax, All Rights Reserved. 2

Page 3: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

KillrPowr Application

© 2016 DataStax, All Rights Reserved. 3

Show current power consumption in Germany

in real time

• Power consumption of all cities on a map

• Graph of data for one city

Page 4: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

One word…

© 2016 DataStax, All Rights Reserved. 4

Page 5: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

Apache CassandraDistributed & resilient database

Page 6: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

Cassandra Use-Cases

© 2016 DataStax, All Rights Reserved. 6

• Time Series• Personalization• (or a mix of both)

Page 7: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

Cassandra?

© 2016 DataStax, All Rights Reserved. 7

• Joining data … hmm• Analyzing the whole data set … hmm• Ad-Hoc Queries (SQL) … hmm

Page 8: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

Apache SparkDistributed & resilient analytics

“Spark lets you run programs up to 100x faster in memory,or 10x faster on disk, than Hadoop.”

Page 9: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

Spark : Simplify the…

© 2016 DataStax, All Rights Reserved. 9

challenging and compute-intensive task of processinghigh volumes of real-time or archived data,

both structured and unstructured,

seamlessly integrating relevantcomplex capabilities

such as machine learning andgraph algorithms.

Source: http://www.toptal.com/spark/introduction-to-apache-spark

Page 10: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

Spark Use Cases

© 2016 DataStax, All Rights Reserved. 10

• Event detection• Behavior / anomalies detection• Fraud & intrusion detection• Targeted advertising• E-commerce transaction processing• Recommendations

Page 11: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

Spark Use Cases

© 2016 DataStax, All Rights Reserved. 11

• Data migration• Ad hoc queries• Business Intelligence

Page 12: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

Spark

© 2016 DataStax, All Rights Reserved. 12

Page 13: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

Spark Architecture

© 2016 DataStax, All Rights Reserved. 13

Page 14: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

Spark RDDs

© 2016 DataStax, All Rights Reserved. 14

Resilient Distributed Datasets

• Fault-tolerant collection of elements• Can be operated on in parallel• Immutable• In-Memory (as much as possible)

Page 15: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

Spark RDD Partitioning

© 2016 DataStax, All Rights Reserved. 15

Page 16: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

Spark RDD Resiliency

© 2016 DataStax, All Rights Reserved. 16

Page 17: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

Spark Transformations

© 2016 DataStax, All Rights Reserved. 17

Create a new data set from an existing one

Page 18: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

Spark Actions

© 2016 DataStax, All Rights Reserved. 18

Return a value after running a computation

Page 19: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

Spark DAG

© 2016 DataStax, All Rights Reserved. 19

Page 20: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

Spark Example (word count)

© 2016 DataStax, All Rights Reserved. 20

Given is a text file with movies and their categories.We want to know the occurences of each genre.

Pirates of the Carribbean,Adventure,Action,FantasyStar Trek I,SciFi,Adventure,ActionDie Hard,Action,Thriller

Page 21: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

Why Scala

© 2016 DataStax, All Rights Reserved. 21

Scala is a functional language

Analytics is a functional “problem“

So – use a functional language

Page 22: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

Spark Example (word count)

© 2016 DataStax, All Rights Reserved. 22

sc.textFile("file:///home/videos.csv").flatMap(line => line.split(",").drop(1)).map(genre => (genre,1)).reduceByKey{case (x,y) => x + y}.collect().foreach(println)

Page 23: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

Spark Example (word count)

© 2016 DataStax, All Rights Reserved. 23

sc.textFile("file:///home/videos.csv").flatMap(line => line.split(",").drop(1)).map(genre => (genre,1)).reduceByKey{case (x,y) => x + y}.collect().foreach(println)Anonymous Scala Function

Page 24: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

Spark Example (word count)

© 2016 DataStax, All Rights Reserved. 24

sc.textFile("file:///home/videos.csv").flatMap(line => line.split(",").drop(1)).map(genre => (genre,1)).reduceByKey{case (x,y) => x + y}.collect().foreach(println)

Flat the collection

Page 25: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

Spark Example (word count)

© 2016 DataStax, All Rights Reserved. 25

sc.textFile("file:///home/videos.csv").flatMap(line => line.split(",").drop(1)).map(genre => (genre,1)).reduceByKey{case (x,y) => x + y}.collect().foreach(println)Pair RDD

Page 26: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

Spark Example (word count)

© 2016 DataStax, All Rights Reserved. 26

sc.textFile("file:///home/videos.csv").flatMap(line => line.split(",").drop(1)).map(genre => (genre,1)).reduceByKey{case (x,y) => x + y}.collect().foreach(println)

Values of TWO Pair RDDs

Page 27: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

Spark Example (word count)

© 2016 DataStax, All Rights Reserved. 27

sc.textFile("file:///home/videos.csv").flatMap(line => line.split(",").drop(1)).map(genre => (genre,1)).reduceByKey{case (x,y) => x + y}.collect().foreach(println)

Page 28: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

Spark Example (word count)

© 2016 DataStax, All Rights Reserved. 28

sc.textFile("file:///home/videos.csv").flatMap(line => line.split(",").drop(1)).map(genre => (genre,1)).reduceByKey{case (x,y) => x + y}.collect().foreach(println)

Page 29: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

Cassandra-Spark-Driver

Page 30: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

Spark-Cassandra-Connector

© 2016 DataStax, All Rights Reserved. 30

• Scala, Java• Open Source (ASF2)• Developed by DataStax and community guys

https://github.com/datastax/spark-cassandra-connector

Page 31: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

Spark Batch Processing

© 2016 DataStax, All Rights Reserved. 31

Other REALLY useful stuff like

• Re-partition data & processing• Allows efficient use of Cassandra

Page 32: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

SQL and Data Frames

Page 33: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

Spark SQL

© 2016 DataStax, All Rights Reserved. 33

csc.sql(" SELECT COUNT(*) AS total " +" FROM killr_video.movies_by_actor " +" WHERE actor = 'Johnny Depp' ")

.show

+-----+|total|+-----+| 54|+-----+

Page 34: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

Spark Data Frames

© 2016 DataStax, All Rights Reserved. 34

val movies = csc.read.format("org.apache.spark.sql.cassandra").options(Map( "keyspace" -> "killr_video",

"table" -> "movies_by_actor" )).load

movies.filter("actor = 'Johnny Depp'").agg(Map("*" -> "count")).withColumnRenamed("COUNT(1)", "total").show

Same as:

SELECT COUNT(*) AS totalFROM killr_video.movies_by_actorWHERE actor = 'Johnny Depp'

Page 35: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

Spark Data Frames

© 2016 DataStax, All Rights Reserved. 35

Data frame is RDD that has named columns

• Has schema• Has rows• Has columns• Has data types

Page 36: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

Spark Straming

Page 37: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

Spark Streaming

© 2016 DataStax, All Rights Reserved. 37

Continuous Micro Batching

• Collect data of time slides• Compute on that data

Page 38: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

Spark Streaming

© 2016 DataStax, All Rights Reserved. 38

Page 39: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

Spark Streaming Architecture

Confid© 2016 DataStax, All Rights Reserved.ential

39

Page 40: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

Spark Streaming

© 2016 DataStax, All Rights Reserved. 40

Page 41: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

Spark Streaming Windows

© 2016 DataStax, All Rights Reserved. 41

time

Slide Slide Slide Slide Slide Slide Slide Slide Slide Slide Slide

00:00:00 00:00:01 00:00:02 00:00:03 00:00:04 00:00:05 00:00:06 00:00:07 00:00:08 00:00:09 00:00:10

Window

Window

Window

Window

Window

Window

Window

Window

Window

Process data for this slide –i.e. 1 second in this example

Process data for this slide –i.e. 1 second in this example

Process data for this slide –i.e. 1 second in this example

Time window ~=Grouping data by time

Page 42: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

Apache KafkaDistributed & resilient messaging

Page 43: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

Kafka

© 2016 DataStax, All Rights Reserved. 43

Publish-subscribe messagingrethought as a distributed commit log

Page 44: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

Kafka

© 2016 DataStax, All Rights Reserved. 44

Kafka ClusterProducer

Producer

Producer

Consumer

Consumer

Kafka Node

Kafka Node

Kafka Node

Page 45: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

Kafka Use-Cases

© 2016 DataStax, All Rights Reserved. 45

• Messaging• Website activity tracking• Metrics• Log aggregation• Event sourcing• Stream processing

Page 46: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

Common Characteristics

© 2016 DataStax, All Rights Reserved. 46

Kafka, Spark & Cassandra are

• Scalable• Distributed• Partitioned• Replicated

Page 47: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

KillrPowr

© 2016 DataStax, All Rights Reserved. 47

Show current power consumption in Germanyin real time

• Power consumption of all cities on a map• Graph of data for a selected city

Page 48: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

KillrPowr – Data modeling 101

© 2016 DataStax, All Rights Reserved. 48

1. Collect UI Requirements2. Model data (flow) per web UI request3. Build Spark Streaming code to generate that data4. Connect sensors and Spark Streaming using Kafka

Page 49: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

KillrPowr – Tech Stack

© 2016 DataStax, All Rights Reserved. 49

Power Sensor

Power Sensor

Kafka Spark Streaming

Cassandra

Web Site

Queue dataAggregate data

and savefor web UI

Page 50: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

KillrPowr – Tech Stack

© 2016 DataStax, All Rights Reserved. 50

Techs used in the demo

• Spray (HTTP for Akka Actors)• AngularJS• Kafka• Spark Streaming• Cassandra

Page 51: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

Tips

© 2016 DataStax, All Rights Reserved. 51

• Push down predicates to Cassandra

• Partition your data “right”• Think distributed

Page 52: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

Cassandra + Spark in DSE

© 2016 DataStax, All Rights Reserved. 52

DSE integratesSpark + Cassandra

https://academy.datastax.com/courses/getting-started-apache-spark

Page 53: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

KillrPowr Live Demo

© 2016 DataStax, All Rights Reserved. 53

Page 54: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

You’ve got the lighter, let’s start a fire:Spark and CassandraRobert StuppSolutions Architect @ DataStax, Committer to Apache Cassandra@snazy

Page 55: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

Thank You

© 2016 DataStax, All Rights Reserved. 55

Page 56: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

BACKUP SLIDES

Confidential 56

Page 57: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

BACKUP - DDL

© 2014 DataStax, All Rights Reserved. Company Confidential 57

Page 58: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

BACKUP - DDL

© 2014 DataStax, All Rights Reserved. Company Confidential 58

Page 59: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

BACKUP - DDL

© 2014 DataStax, All Rights Reserved. Company Confidential 59

Page 60: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

BACKUP - DDL

© 2014 DataStax, All Rights Reserved. Company Confidential 60

Page 61: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

BACKUP – Spark Streaming

© 2014 DataStax, All Rights Reserved. Company Confidential 61

Page 62: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

BACKUP – Spark Streaming

© 2014 DataStax, All Rights Reserved. Company Confidential 62

Page 63: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

BACKUP – Spark Streaming

© 2014 DataStax, All Rights Reserved. Company Confidential 63

Page 64: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

BACKUP – Spark Streaming

© 2014 DataStax, All Rights Reserved. Company Confidential 64

Page 65: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

BACKUP – Spark Streaming

© 2014 DataStax, All Rights Reserved. Company Confidential 65

Page 66: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

BACKUP – Spark Streaming

© 2014 DataStax, All Rights Reserved. Company Confidential 66

Page 67: Cassandra + Spark (You’ve got the lighter, let’s start a fire)

© 2014 DataStax, All Rights Reserved. Company Confidential 67