Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadin


Transcript of Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadin

Page 1: Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadin

©2013 DataStax Confidential. Do not distribute without consent.

@PatrickMcFadin

Patrick McFadin, Chief Evangelist, DataStax

Spark and Cassandra: An amazing Apache love story


Page 2: Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadin

Store a ton of data

Analyze a ton of data

Page 3: Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadin

Community Response?

Page 4: Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadin

Cassandra Only DC

Page 5: Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadin

Cassandra Only DC

Cassandra + Spark DC

Spark Jobs

Page 6: Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadin

Cassandra Only DC

Cassandra + Spark DC

Spark Jobs

Spark Streaming

Page 7: Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadin

Worker

Worker

Worker Worker

Analytics Workload

Transactional Workload

Page 8: Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadin

DataStax Enterprise

Page 9: Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadin

DataStax Enterprise

Page 10: Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadin
Page 11: Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadin

• 10T of high frequency event data daily
• Constantly increasing volume

“The web server that powers the interface can query both datacenters, depending on which the user is closest to,”

“A small set of signals tend to double every eight months. So we needed a model that can scale linearly.”

- Arun Jayandra, Microsoft
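The doubling claim in the quote can be made concrete with a little arithmetic. The following is a minimal Python sketch of growth with a fixed doubling period. The function name, the starting value of 10 units, and the 24-month horizon are illustrative assumptions and not figures from the talk. Note also that the quote only says a small set of signals doubles, not the whole data set.

```python
def projected_volume(initial: float, months: float, doubling_months: float = 8.0) -> float:
    """Return the volume after `months`, assuming it doubles every `doubling_months`."""
    return initial * 2 ** (months / doubling_months)


if __name__ == "__main__":
    # Hypothetical starting point of 10 units per day, projected over two years.
    for months in (0, 8, 16, 24):
        print(months, projected_volume(10.0, months))
```

Three doublings in two years is an eightfold increase, which is why a system that scales linearly by adding nodes matters here.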

Page 12: Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadin

REST API

O365

Event Hub

Ingestion Worker (Azure worker role using DataStax C# driver)

C* Analytics

REST API

O365

Kafka

C*/Spark

Streaming Analytics

G4 – Local SSD

Kafka: G4 – Data Disk

ZooKeeper: A7 – Data Disk

PaaS Small

G4 – Local SSD

Cluster 1:

Cluster 2:

20k – 50k events/sec

200k+ events/sec

Page 13: Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadin

Data Protection

• Maximilian Schrems v Data Protection Commissioner
• No longer OK to ship EU data to US under “Safe Harbour”

Product_Catalog RF=3

Product_Catalog RF=3

Customer_Data RF=3

Customer_Data RF=0

Product_Catalog RF=3

Customer_Data RF=3
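The RF labels on this slide correspond to per-datacenter replication factors, which Cassandra sets per keyspace through `NetworkTopologyStrategy`. The sketch below only builds the CQL text, so it runs without a cluster. The datacenter names `eu_dc` and `us_dc` and the helper name are hypothetical, and assigning RF=0 for Customer_Data to the US datacenter is a reading of the slide, not something it states. A datacenter with RF 0 is simply left out of the replication map here.

```python
from typing import Dict


def create_keyspace_cql(keyspace: str, rf_by_dc: Dict[str, int]) -> str:
    """Build a CREATE KEYSPACE statement with per-datacenter replication factors.

    Datacenters with a replication factor of 0 are omitted, so no replicas
    of the keyspace are placed there.
    """
    parts = ["'class': 'NetworkTopologyStrategy'"]
    parts += [f"'{dc}': {rf}" for dc, rf in rf_by_dc.items() if rf > 0]
    options = ", ".join(parts)
    return f"CREATE KEYSPACE IF NOT EXISTS {keyspace} WITH replication = {{{options}}};"


if __name__ == "__main__":
    # Hypothetical datacenter names; catalog data is replicated everywhere,
    # customer data stays in the EU datacenter only.
    print(create_keyspace_cql("product_catalog", {"eu_dc": 3, "us_dc": 3}))
    print(create_keyspace_cql("customer_data", {"eu_dc": 3, "us_dc": 0}))
```

Keeping the restriction at the keyspace level means applications do not have to filter data themselves: the cluster never writes a Customer_Data replica to the excluded datacenter.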

Page 14: Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadin

• 300k customers
• Report on energy usage
• Predict boiler failure

“We’re dealing largely with time series data, and Spark is 10 to 100 times quicker as it is operating on data in-memory… Cassandra delivers what we need today and if you look at the Internet of Things space; that is what is really useful right now.”

- Jim Anning, British Gas

Hive Active Heating™

Page 15: Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadin

Cassandra Only DC

Cassandra + Spark DC

Spark Jobs

Spark Streaming

Home Data Center

Hive Active Heating™

Page 16: Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadin

Store a ton of data

Analyze a ton of data

Thank you!