Real-Time Analytics with MemSQL and Spark

Neil Dahlke, Engineer 2016 November 4 Real-Time Analytics with MemSQL and Spark

Transcript of Real-Time Analytics with MemSQL and Spark

Page 1: Real-Time Analytics with MemSQL and Spark

Neil Dahlke, Engineer

2016 November 4

Real-Time Analytics with MemSQL and Spark

Page 2: Real-Time Analytics with MemSQL and Spark

About Me: Neil Dahlke, Engineer

MemSQL • real-time database for transactions / analytics

Formerly Globus • high performance data transfer for research scientists

Past talks • Real-time, Geospatial, Maps

Slides: http://www.slideshare.net/MemSQL/realtime-geospatial-maps-by-neil-dahlke

Page 3: Real-Time Analytics with MemSQL and Spark

WHAT WE ARE SEEING

A WORLD OF CONNECTED MACHINES AND PEOPLE

Page 4: Real-Time Analytics with MemSQL and Spark

WHAT WE ARE SEEING: Sensors. Applications. Machines. And us. Generating more data every single day.

By 2020, over 20 billion connected things will be in use across a range of industries.

Page 5: Real-Time Analytics with MemSQL and Spark

REAL-TIME INPUTS
• Sensors
• Logs
• Events
• Streaming

Inserts • Upserts • Queries

LIVE OUTPUTS
• Dashboards
• Business Intelligence
• Applications
• Predictive Analytics

Page 6: Real-Time Analytics with MemSQL and Spark

WHAT DO REAL TIME BUSINESSES NEED?

FAST DATA INGEST

The volume of data that can be ingested into the database

Page 7: Real-Time Analytics with MemSQL and Spark

WHAT DO REAL TIME BUSINESSES NEED?

LOW LATENCY QUERIES

The time it takes to execute queries and receive results

Page 8: Real-Time Analytics with MemSQL and Spark

WHAT DO REAL TIME BUSINESSES NEED?

HIGH CONCURRENCY

The ability to scale simultaneous operations

Page 9: Real-Time Analytics with MemSQL and Spark

WHAT DO REAL TIME BUSINESSES NEED?

FAST DATA INGEST

The volume of data that can be ingested into the database

LOW LATENCY QUERIES

The time it takes to execute queries and receive results

HIGH CONCURRENCY

The ability to scale simultaneous operations

Page 10: Real-Time Analytics with MemSQL and Spark

REAL-TIME INPUTS
• Sensors
• Logs
• Events
• Streaming

Inserts • Upserts • Queries

LIVE OUTPUTS
• Dashboards
• Business Intelligence
• Applications
• Predictive Analytics

Page 11: Real-Time Analytics with MemSQL and Spark

A massively scalable database and ingest solution allowed for massive growth, real-time analytic applications and faster, targeted.

Page 12: Real-Time Analytics with MemSQL and Spark

Before

Kafka
• Component we kept

S3
• Persisted all logs to cold storage for eventual analysis

Hadoop
• Nightly MapReduce jobs

Redshift
• Took a full day to load data from the previous day
• Load times began to overlap with the next day's load, causing a data crisis

Page 13: Real-Time Analytics with MemSQL and Spark

Why was this bad for their business operations?

• No real-time access to analytics
• No SQL interface for analysts and data scientists
• Massive nightly Hadoop batch jobs (late data)
• Unfiltered and incomplete data (silos)
• Expensive

Page 14: Real-Time Analytics with MemSQL and Spark

Why was this bad for their data operations?

• Too slow
• Not scalable
• No deduplication (aka not exactly-once)
• Low concurrency

FAST DATA INGEST

LOW LATENCY QUERIES

HIGH CONCURRENCY
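
The deduplication point can be made concrete with a small sketch. It is not taken from the talk: it uses Python's built-in sqlite3 module as a stand-in for the real database, and the table name `repins`, its columns, and the `ingest` helper are hypothetical. The idea shown is that keying each event on a unique ID makes a replayed batch harmless, so each event is counted once even if the stream delivers it twice. MemSQL speaks MySQL-compatible SQL, where the analogous statement would likely be `INSERT IGNORE` or `INSERT ... ON DUPLICATE KEY UPDATE`.

```python
import sqlite3


def ingest(conn, batch):
    """Insert events, silently skipping any event_id that is already stored."""
    with conn:  # commit the whole batch as a single transaction
        conn.executemany(
            "INSERT OR IGNORE INTO repins (event_id, city) VALUES (?, ?)",
            batch,
        )


conn = sqlite3.connect(":memory:")
# Hypothetical schema: the primary key on event_id is what enforces deduplication.
conn.execute(
    "CREATE TABLE repins (event_id TEXT PRIMARY KEY, city TEXT NOT NULL)"
)

batch = [("e1", "Chicago"), ("e2", "Chicago"), ("e3", "Austin")]
ingest(conn, batch)
ingest(conn, batch)  # simulate the stream redelivering the same batch

counts = dict(conn.execute("SELECT city, COUNT(*) FROM repins GROUP BY city"))
print(counts)
```

Because the second `ingest` call inserts nothing new, the per-city counts reflect three events rather than six.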

Page 15: Real-Time Analytics with MemSQL and Spark

How It Works Now

Page 16: Real-Time Analytics with MemSQL and Spark

After

Page 17: Real-Time Analytics with MemSQL and Spark
Page 18: Real-Time Analytics with MemSQL and Spark
Page 19: Real-Time Analytics with MemSQL and Spark

THE PINTEREST REAL-TIME ARCHITECTURE

REAL-TIME ANALYTICS

TECHNICAL BENEFITS
• Instant accuracy to the latest re-pin
• 1 GB/sec totaling 72 TB/day

Page 20: Real-Time Analytics with MemSQL and Spark

RESULTS

• Accelerated ingest time by 200,000x
• 1 GB/sec totaling 72 TB/day
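
As a quick way to relate the two throughput figures, the sketch below converts between a per-second rate and a daily total. It assumes decimal units (1 TB = 1000 GB); the slide does not say whether 1 GB/sec is a peak, an average, or a rounded figure, so the helper names and the interpretation are assumptions.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def tb_per_day(gb_per_sec):
    """Daily volume in TB for a rate sustained over a full day (decimal units)."""
    return gb_per_sec * SECONDS_PER_DAY / 1000


def avg_gb_per_sec(tb_per_day_total):
    """Average rate in GB/sec implied by a daily total in TB (decimal units)."""
    return tb_per_day_total * 1000 / SECONDS_PER_DAY


sustained_total = tb_per_day(1.0)      # volume if 1 GB/sec held for 24 hours
implied_rate = avg_gb_per_sec(72.0)    # average rate behind 72 TB/day
print(f"{sustained_total:.1f} TB/day, {implied_rate:.2f} GB/sec")
```

Under these assumptions, 72 TB/day corresponds to an average of roughly 0.83 GB/sec, which is consistent with 1 GB/sec being a rounded or peak rate.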

Page 21: Real-Time Analytics with MemSQL and Spark
Page 22: Real-Time Analytics with MemSQL and Spark

Visualizing The Data

Page 23: Real-Time Analytics with MemSQL and Spark

Page 24: Real-Time Analytics with MemSQL and Spark

Visualizing the Data

Demo built using
• Mapbox
• WebSockets
• Tornado web server

When an image is re-pinned, the circles on the globe expand, showing higher-volume areas.

Reads data from MemSQL directly.
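
One way the demo's data flow might look is sketched below. This is not the demo's actual code: it leaves out the Tornado and WebSocket transport entirely and only shows how rows of re-pin counts (as they might come back from a SQL query) could be turned into a GeoJSON payload that a map client can draw as circles. The function name, the row layout `(lon, lat, count)`, and the square-root radius scaling are all assumptions made for illustration.

```python
import json
import math


def repins_to_geojson(rows, base_radius=2.0):
    """Build a GeoJSON FeatureCollection from (lon, lat, count) rows.

    The radius grows with the square root of the count, so a circle's
    area is proportional to the re-pin volume at that location.
    """
    features = []
    for lon, lat, count in rows:
        features.append({
            "type": "Feature",
            # GeoJSON positions are ordered [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {
                "count": count,
                "radius": base_radius * math.sqrt(count),
            },
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical query result: one row per location with its re-pin count.
rows = [(-87.63, 41.88, 4), (-97.74, 30.27, 1)]
payload = json.dumps(repins_to_geojson(rows))
print(payload)
```

In a setup like the one described, a server would send a payload of this kind to connected browsers each time it polls the database, and the client would resize the circles from the `radius` property.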

Page 25: Real-Time Analytics with MemSQL and Spark

DEMO

Page 26: Real-Time Analytics with MemSQL and Spark

Questions?

Page 29: Real-Time Analytics with MemSQL and Spark

Thank You