Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadin

©2013 DataStax Confidential. Do not distribute without consent.

Spark and Cassandra: An amazing Apache love story

Patrick McFadin (@PatrickMcFadin), Chief Evangelist, DataStax

Store a ton of data

Analyze a ton of data

Community Response?

Cassandra Only DC

Cassandra + Spark DC

Spark Jobs

Spark Streaming

Worker

Worker

Worker

Worker

Analytics Workload

Transactional Workload

DataStax Enterprise

• 10T of high-frequency event data daily
• Constantly increasing volume

“The web server that powers the interface can query both datacenters, depending on which the user is closest to.”

“A small set of signals tend to double every eight months. So we needed a model that can scale linearly.”

- Arun Jayandra, Microsoft
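The doubling claim in the quote can be turned into rough arithmetic. The sketch below assumes steady doubling every eight months and assumes that required capacity grows in direct proportion to volume (the "scale linearly" part); the starting cluster size of 6 nodes is a hypothetical figure for illustration, not a number from the talk.

```python
import math

DOUBLING_PERIOD_MONTHS = 8  # from the quote: signals double every eight months


def growth_factor(months):
    """Multiplier on today's volume after `months` of steady doubling."""
    return 2 ** (months / DOUBLING_PERIOD_MONTHS)


def nodes_needed(current_nodes, months):
    """Node count if capacity scales linearly with volume (an assumption)."""
    return math.ceil(current_nodes * growth_factor(months))


if __name__ == "__main__":
    # 6 is a hypothetical starting cluster size, not a figure from the slides.
    for months in (8, 16, 24):
        print(months, growth_factor(months), nodes_needed(6, months))
```

Under these assumptions, volume is eight times larger after two years, so a store whose capacity grows in step with node count only needs eight times the nodes.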

REST API

O365

Event Hub

Ingestion Worker (Azure worker role using DataStax C# driver)

C* Analytics

REST API

O365

Kafka

C*/Spark

Streaming Analytics

G4 – Local SSD

Kafka: G4 – Data Disk

ZooKeeper: A7 – Data Disk

PaaS Small

G4 – Local SSD

Cluster 1:

Cluster 2:

20k – 50k events/sec

200k+ events/sec
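To put the per-second rates on the slide into daily terms, a small conversion helps. This is only arithmetic on the quoted rates, assuming each rate is sustained for a full day; the slide does not say whether these are peaks or averages.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def events_per_day(events_per_sec):
    """Daily event count if the given rate were sustained for 24 hours."""
    return events_per_sec * SECONDS_PER_DAY


if __name__ == "__main__":
    for rate in (20_000, 50_000, 200_000):
        print(f"{rate:>7,} events/sec -> {events_per_day(rate):,} events/day")
```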

Data Protection
• Maximilian Schrems v Data Protection Commissioner
• No longer OK to ship EU data to US under “Safe Harbour”

Product_Catalog RF=3    Product_Catalog RF=3
Customer_Data RF=3    Customer_Data RF=0

Product_Catalog RF=3, Customer_Data RF=3
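The per-datacenter replication factors on this slide map onto Cassandra's NetworkTopologyStrategy, where each keyspace lists how many replicas each datacenter holds. The sketch below builds the corresponding CQL as a string. The datacenter names `eu_dc` and `us_dc` are hypothetical, and which datacenter holds no customer data is an assumption drawn from the Safe Harbour context; the slide does not name them. A datacenter with RF=0 is expressed by leaving it out of the replication map.

```python
def create_keyspace_cql(keyspace, rf_by_dc):
    """Build a CREATE KEYSPACE statement using NetworkTopologyStrategy.

    Datacenters with a replication factor of 0 are omitted from the map,
    which means no replicas of the keyspace are placed there.
    """
    replicas = {dc: rf for dc, rf in rf_by_dc.items() if rf > 0}
    if not replicas:
        raise ValueError("at least one datacenter needs RF > 0")
    options = ", ".join(f"'{dc}': {rf}" for dc, rf in sorted(replicas.items()))
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {options}}};"
    )


if __name__ == "__main__":
    # Hypothetical datacenter names; the slide only shows RF values.
    print(create_keyspace_cql("product_catalog", {"eu_dc": 3, "us_dc": 3}))
    print(create_keyspace_cql("customer_data", {"eu_dc": 3, "us_dc": 0}))
```

With this layout the product catalog is replicated to both datacenters, while customer data never leaves the datacenter that holds its replicas.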

• 300k customers
• Report on energy usage
• Predict boiler failure

“We’re dealing largely with time series data, and Spark is 10 to 100 times quicker as it is operating on data in-memory … Cassandra delivers what we need today and if you look at the Internet of Things space; that is what is really useful right now.”

- Jim Anning, British Gas

Hive Active Heating™

Cassandra Only DC

Cassandra + Spark DC

Spark Jobs

Spark Streaming

Home Data Center

Hive Active Heating™

Store a ton of data

Analyze a ton of data

Thank you!