Cassandra summit 2015 - Simplifying Streaming Analytics

© 2015 Mesosphere, Inc. All Rights Reserved.

SIMPLIFYING STREAMING ANALYTICS

Cassandra Summit 2015

Brenden Matthews @brndnmtthws

AGENDA

• Introduction
• Streaming analytics:
  • What is it?
  • Why do it?
  • When do I need it?
  • How? - Demo!
  • What are the limitations?

ABOUT ME - BRENDEN MATTHEWS

• ASF member, Mesos committer
• Have contributed to several related OSS projects, including Spark, Storm, Kafka, Presto, and a number of Mesos schedulers
• SA @Mesosphere, formerly on the DI team @Airbnb

STREAMING ANALYTICS: WHAT IS IT?

Indeed.

STREAMING ANALYTICS: WHAT IS IT?

• Perform joins, aggregations, and mutations on data as it happens
• Components typically include:
  • Producer
  • Message broker
  • [Extract] Consumer
  • [Transform] Processing engine
  • [Load] Storage
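The five components can be sketched end to end in a few lines. This is a minimal in-memory illustration, not the setup used in the talk: a `deque` stands in for the message broker, a `dict` stands in for storage, and the event fields (`sensor`, `temp_c`) and all function names are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical stand-ins: a deque plays the message broker, a dict plays storage.
broker = deque()
storage = defaultdict(int)


def produce(events):
    """Producer: append raw events to the broker."""
    for event in events:
        broker.append(event)


def consume():
    """[Extract] Consumer: drain events from the broker in arrival order."""
    while broker:
        yield broker.popleft()


def transform(events):
    """[Transform] Processing engine: drop empty readings, emit (sensor, 1) pairs."""
    for event in events:
        if event.get("temp_c") is not None:
            yield event["sensor"], 1


def load(pairs):
    """[Load] Storage: fold each pair into a running count per sensor."""
    for key, value in pairs:
        storage[key] += value


produce([
    {"sensor": "a", "temp_c": 21.5},
    {"sensor": "b", "temp_c": None},
    {"sensor": "a", "temp_c": 22.0},
])
load(transform(consume()))
print(dict(storage))  # {'a': 2}
```

Because each stage is a generator, events flow through one at a time; the aggregation is updated as data arrives instead of in a later batch pass.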

STREAMING ANALYTICS: WHY DO IT?

• Data is constantly being generated
  • HTTP traffic, clickstream, IoT, metrics
• Most data is correlated (requires joins)
• Data can be pre-denormalized (i.e., flattened)
• Immutability
• Build “real time” services: what’s happening right now?
• Compute once
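As a rough illustration of pre-denormalizing at ingest time, the sketch below joins each click event with a user lookup table and emits a new flat record, leaving the inputs untouched. The `users` table, the event fields, and the `denormalize` helper are all hypothetical, chosen only for this example.

```python
# Hypothetical dimension table, keyed by user_id.
users = {
    1: {"name": "ada", "country": "UK"},
    2: {"name": "lin", "country": "CA"},
}


def denormalize(click, users):
    """Join one click event with its user record and return a new flat record."""
    user = users.get(click["user_id"], {})
    return {
        "url": click["url"],
        "user_id": click["user_id"],
        "user_name": user.get("name"),
        "user_country": user.get("country"),
    }


clicks = [
    {"user_id": 1, "url": "/home"},
    {"user_id": 3, "url": "/cart"},  # unknown user: joined fields stay None
]
flat = [denormalize(click, users) for click in clicks]
print(flat[0])
```

The join is computed once, when the event arrives, so later readers of the flat records do not need to repeat it.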

STREAMING ANALYTICS: WHEN DO I NEED IT?

• Messaging platform
• Compliance
• Fraud detection
• Firehose consumption
• Recommendation engine

STREAMING ANALYTICS: HOW?

[Pipeline diagram: Producer → Broker → Consumer/ML → Storage]

STREAMING ANALYTICS: HOW?

Pipeline

STREAMING ANALYTICS: HOW?

Demo time!

github.com/mesosphere/iot-demo

STREAMING ANALYTICS: WHAT ARE THE LIMITATIONS?

• Not a replacement for all batch workloads
• Backfilling is tricky
  • Unless you retain a log of all data mutations, backfilling may be impossible
• Maintaining a completely immutable system may explode storage costs
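To show why a retained mutation log makes backfilling possible, here is a minimal sketch that rebuilds state by replaying an append-only log. The log format (`(op, key, value)` tuples) and the helper names `apply_mutation` and `replay` are assumptions made for illustration only.

```python
def apply_mutation(state, mutation):
    """Apply one mutation and return the new state; the input dict is not modified."""
    op, key, value = mutation
    new_state = dict(state)
    if op == "set":
        new_state[key] = value
    elif op == "delete":
        new_state.pop(key, None)
    else:
        raise ValueError(f"unknown op: {op}")
    return new_state


def replay(log, upto=None):
    """Rebuild state by replaying the first `upto` mutations (all of them if None)."""
    state = {}
    for mutation in log[:upto]:
        state = apply_mutation(state, mutation)
    return state


# Append-only log of every mutation, oldest first.
log = [
    ("set", "sensor-a", 21.5),
    ("set", "sensor-b", 19.0),
    ("set", "sensor-a", 22.0),
    ("delete", "sensor-b", None),
]

print(replay(log))          # {'sensor-a': 22.0}
print(replay(log, upto=2))  # {'sensor-a': 21.5, 'sensor-b': 19.0}
```

Any past state can be recomputed from the log, but the log grows with every mutation, which is where the storage cost comes from.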

QUESTIONS?