![Page 1: Bringing olap fully online analyze changing datasets in mem sql and spark with pinterest demo](https://reader031.fdocuments.net/reader031/viewer/2022032216/55a68e8c1a28abb97d8b4842/html5/thumbnails/1.jpg)
Bringing OLAP Fully Online: Analyze Changing Datasets in MemSQL and Spark with Pinterest Demo
Eric Frenkiel, MemSQL CEO
Rob Stepeck, Novus CTO
Yu Yang, Pinterest Software Engineer
Feb 19, 2015 • San Jose, CA
![Page 2: Bringing olap fully online analyze changing datasets in mem sql and spark with pinterest demo](https://reader031.fdocuments.net/reader031/viewer/2022032216/55a68e8c1a28abb97d8b4842/html5/thumbnails/2.jpg)
What’s in store for this presentation
▸MemSQL: The real-time database for transactions and analytics
▸Case Study with Novus CTO, Rob Stepeck
▸New Developments in Spark
▸Advanced Analytics with Demo from Pinterest Software Engineer, Yu Yang
![Page 3: Bringing olap fully online analyze changing datasets in mem sql and spark with pinterest demo](https://reader031.fdocuments.net/reader031/viewer/2022032216/55a68e8c1a28abb97d8b4842/html5/thumbnails/3.jpg)
THE REAL-TIME DATABASE FOR TRANSACTIONS AND ANALYTICS
MemSQL Story
![Page 4: Bringing olap fully online analyze changing datasets in mem sql and spark with pinterest demo](https://reader031.fdocuments.net/reader031/viewer/2022032216/55a68e8c1a28abb97d8b4842/html5/thumbnails/4.jpg)
MemSQL Snapshot
▸Experienced Leadership
• Microsoft, Facebook, Oracle, Fusion-io
▸Inspired by enterprise architecture gap
▸A real-time database for transactions and analytics
• In-memory, distributed, SQL
▸Broad customer adoption across verticals
▸Top tier investors
![Page 5: Bringing olap fully online analyze changing datasets in mem sql and spark with pinterest demo](https://reader031.fdocuments.net/reader031/viewer/2022032216/55a68e8c1a28abb97d8b4842/html5/thumbnails/5.jpg)
Four ways your DBMS is holding you back
▸ETL (Extract, Transform, Load)
▸Analytic Latency
▸Synchronization
▸Copies of data
Source: Gartner, "Hybrid Transaction/Analytical Processing Will Foster Opportunities for Dramatic Business Innovation"
![Page 6: Bringing olap fully online analyze changing datasets in mem sql and spark with pinterest demo](https://reader031.fdocuments.net/reader031/viewer/2022032216/55a68e8c1a28abb97d8b4842/html5/thumbnails/6.jpg)
The Real-Time Database for Transactions and Analytics
[Diagram: MemSQL cluster. Data loading and queries enter through aggregator nodes, which sit above leaf nodes organized into Availability Group 1 and Availability Group 2.]
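The aggregator and leaf roles on this slide can be pictured with a toy scatter-gather sketch. This is not MemSQL code: the hash function, the leaf count, and the `user_id` and `country` columns are all assumptions made for illustration, and availability groups (replicas) are not modeled.

```python
import zlib
from collections import defaultdict

NUM_LEAVES = 4  # hypothetical cluster size, chosen for the example


def leaf_for(shard_key: str) -> int:
    """Pick a leaf by hashing the shard key (CRC32 is a stand-in hash)."""
    return zlib.crc32(shard_key.encode("utf-8")) % NUM_LEAVES


class Aggregator:
    """Toy aggregator: routes writes to leaves and merges partial query results."""

    def __init__(self):
        # Each leaf is modeled as a plain list of rows.
        self.leaves = [[] for _ in range(NUM_LEAVES)]

    def insert(self, row):
        # Data loading: the shard key decides which leaf stores the row.
        self.leaves[leaf_for(row["user_id"])].append(row)

    def count_by(self, column):
        # Scatter: every leaf computes a partial count over its own rows.
        partials = []
        for leaf_rows in self.leaves:
            partial = defaultdict(int)
            for row in leaf_rows:
                partial[row[column]] += 1
            partials.append(partial)
        # Gather: the aggregator merges the partial counts.
        merged = defaultdict(int)
        for partial in partials:
            for key, n in partial.items():
                merged[key] += n
        return dict(merged)


agg = Aggregator()
for i in range(100):
    agg.insert({"user_id": f"user-{i}", "country": "US" if i % 4 else "CA"})
result = agg.count_by("country")
print(result)
```

Because each leaf only scans its own rows, the per-leaf work can run in parallel in a real cluster; the sketch runs it sequentially for simplicity.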
![Page 7: Bringing olap fully online analyze changing datasets in mem sql and spark with pinterest demo](https://reader031.fdocuments.net/reader031/viewer/2022032216/55a68e8c1a28abb97d8b4842/html5/thumbnails/7.jpg)
HOW NOVUS ENABLES INVESTORS TO CONSISTENTLY MAXIMIZE THEIR PERFORMANCE POTENTIAL USING MEMSQL
Novus Case Study
![Page 8: Bringing olap fully online analyze changing datasets in mem sql and spark with pinterest demo](https://reader031.fdocuments.net/reader031/viewer/2022032216/55a68e8c1a28abb97d8b4842/html5/thumbnails/8.jpg)
Quick Background on Novus
Rob Stepeck
Chief Technology Officer
▸Investment acumen, risk, insights and data management
▸$2 trillion in client assets
▸Used by 100 of the world’s top investment managers and investors
▸Founded in 2007 by a group of investors, data scientists and engineers
![Page 9: Bringing olap fully online analyze changing datasets in mem sql and spark with pinterest demo](https://reader031.fdocuments.net/reader031/viewer/2022032216/55a68e8c1a28abb97d8b4842/html5/thumbnails/9.jpg)
Before MemSQL
Problem:
▸Write operations inefficient
▸ Loading data was a 24-hour operation
▸ Failures could significantly impact subsequent processes
▸ Loading client data degraded system performance
▸ Scaling was non-trivial
▸ Prospect data integration trade-offs
![Page 10: Bringing olap fully online analyze changing datasets in mem sql and spark with pinterest demo](https://reader031.fdocuments.net/reader031/viewer/2022032216/55a68e8c1a28abb97d8b4842/html5/thumbnails/10.jpg)
MemSQL Implementation
Novus chose MemSQL based on the following data management requirements:
▸ Reduce Latency
▸ SQL Support
▸ Scale with Ease
![Page 11: Bringing olap fully online analyze changing datasets in mem sql and spark with pinterest demo](https://reader031.fdocuments.net/reader031/viewer/2022032216/55a68e8c1a28abb97d8b4842/html5/thumbnails/11.jpg)
After MemSQL
Results:
▸ 24-hour data cycle down to several hours
▸ Scale is achieved by adding/removing clusters with ease
▸ Learning curve is nonexistent
▸ Eliminated data ‘hand-holding’ so team can focus on more important initiatives
▸ Sales are more effective because they can use a customer’s actual data
![Page 12: Bringing olap fully online analyze changing datasets in mem sql and spark with pinterest demo](https://reader031.fdocuments.net/reader031/viewer/2022032216/55a68e8c1a28abb97d8b4842/html5/thumbnails/12.jpg)
Example: ‘Refresh a Client’
[Diagram labels: Raw Data, Convert to In-memory, Backing Store. Before MemSQL: 90 Min. After MemSQL: 2 Min.]
![Page 13: Bringing olap fully online analyze changing datasets in mem sql and spark with pinterest demo](https://reader031.fdocuments.net/reader031/viewer/2022032216/55a68e8c1a28abb97d8b4842/html5/thumbnails/13.jpg)
NEW DEVELOPMENTS IN SPARK
MemSQL Spark Connector
![Page 14: Bringing olap fully online analyze changing datasets in mem sql and spark with pinterest demo](https://reader031.fdocuments.net/reader031/viewer/2022032216/55a68e8c1a28abb97d8b4842/html5/thumbnails/14.jpg)
Interest in Spark
▸Recent survey of 2100 developers
– 82% of users choose Spark to replace MapReduce
– 78% of users need faster processing of larger datasets
Source: Typesafe, APACHE SPARK - Preparing for the Next Wave of Reactive Big Data
![Page 15: Bringing olap fully online analyze changing datasets in mem sql and spark with pinterest demo](https://reader031.fdocuments.net/reader031/viewer/2022032216/55a68e8c1a28abb97d8b4842/html5/thumbnails/15.jpg)
Spark Data Processing Framework
▸Intuitive, concise, and expressive operations needed for analytics
[Diagram: Apache Spark stack with Spark SQL, Spark Streaming, MLlib (machine learning), and GraphX (graph).]
![Page 16: Bringing olap fully online analyze changing datasets in mem sql and spark with pinterest demo](https://reader031.fdocuments.net/reader031/viewer/2022032216/55a68e8c1a28abb97d8b4842/html5/thumbnails/16.jpg)
Enterprises Seek Simple Ways to Use Spark
▸Spark with operational data stores delivers new use cases
▸In-memory, distributed databases such as MemSQL fit well
![Page 17: Bringing olap fully online analyze changing datasets in mem sql and spark with pinterest demo](https://reader031.fdocuments.net/reader031/viewer/2022032216/55a68e8c1a28abb97d8b4842/html5/thumbnails/17.jpg)
Understanding MemSQL and Spark
Cluster-wide Parallelization | Bi-Directional
![Page 18: Bringing olap fully online analyze changing datasets in mem sql and spark with pinterest demo](https://reader031.fdocuments.net/reader031/viewer/2022032216/55a68e8c1a28abb97d8b4842/html5/thumbnails/18.jpg)
MemSQL and Spark Use Cases
▸Operationalize models built in Spark
▸Stream and event processing
▸Live dashboards and automated reports
▸Extend MemSQL analytics
![Page 19: Bringing olap fully online analyze changing datasets in mem sql and spark with pinterest demo](https://reader031.fdocuments.net/reader031/viewer/2022032216/55a68e8c1a28abb97d8b4842/html5/thumbnails/19.jpg)
Operationalize Models Built in Spark
▸Process in Spark, persist to MemSQL
▸Go to production and iterate faster
[Diagram: data flows into the Spark cluster for model creation; the model is persisted to the MemSQL cluster for enterprise consumption.]
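The "process in Spark, persist to MemSQL" pattern can be sketched in plain Python. Everything here is a stand-in: the least-squares fit plays the role of model creation in Spark, an in-memory `sqlite3` database plays the role of the MemSQL cluster, and the `model_params` and `observations` tables are hypothetical names invented for the example.

```python
import sqlite3


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# "Model creation": fit on training data that follows y = 2x + 1.
slope, intercept = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])

# "Model persistence": store the fitted parameters in a SQL table.
db = sqlite3.connect(":memory:")  # stand-in for the database cluster
db.execute(
    "CREATE TABLE model_params (name TEXT PRIMARY KEY, slope REAL, intercept REAL)"
)
db.execute(
    "INSERT INTO model_params VALUES (?, ?, ?)", ("demo_model", slope, intercept)
)

# "Enterprise consumption": score live rows with plain SQL.
db.execute("CREATE TABLE observations (id INTEGER PRIMARY KEY, x REAL)")
db.executemany(
    "INSERT INTO observations (id, x) VALUES (?, ?)", [(1, 10.0), (2, 0.5)]
)
scores = db.execute(
    "SELECT o.id, m.slope * o.x + m.intercept "
    "FROM observations o JOIN model_params m ON m.name = ? ORDER BY o.id",
    ("demo_model",),
).fetchall()
print(scores)
```

Keeping the parameters in a table means a retrained model can replace the old one with a single row update, while applications keep issuing the same scoring query.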
![Page 20: Bringing olap fully online analyze changing datasets in mem sql and spark with pinterest demo](https://reader031.fdocuments.net/reader031/viewer/2022032216/55a68e8c1a28abb97d8b4842/html5/thumbnails/20.jpg)
Stream and Event Processing
▸Structure event data on the fly
▸Pass to MemSQL for persistent, queryable format
[Diagram: real-time streaming data enters the Spark cluster for data transformation, then moves to the MemSQL cluster in a persistent, queryable format for enterprise consumption.]
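A minimal sketch of "structure event data on the fly, then persist it in a queryable format" follows. It assumes JSON events with hypothetical `user`, `action`, and `ts` fields, models one micro-batch as a Python list, and uses an in-memory `sqlite3` database as a stand-in for MemSQL; none of these details come from the presentation.

```python
import json
import sqlite3


def structure(raw_line):
    """Turn one raw JSON event into a (user_id, action, ts) row, or None if malformed."""
    try:
        event = json.loads(raw_line)
        return (str(event["user"]), str(event["action"]), int(event["ts"]))
    except (ValueError, KeyError, TypeError):
        # Unparseable or incomplete events are dropped in this sketch.
        return None


def process_batch(db, raw_lines):
    """Structure one micro-batch and persist the valid rows in a single transaction."""
    rows = [row for row in map(structure, raw_lines) if row is not None]
    with db:  # commits on success, rolls back on error
        db.executemany(
            "INSERT INTO events (user_id, action, ts) VALUES (?, ?, ?)", rows
        )
    return len(rows)


db = sqlite3.connect(":memory:")  # stand-in for the persistent SQL store
db.execute("CREATE TABLE events (user_id TEXT, action TEXT, ts INTEGER)")

batch = [
    '{"user": "u1", "action": "repin", "ts": 1424390400}',
    '{"user": "u2", "action": "click", "ts": 1424390401}',
    "not json",
    '{"user": "u1", "action": "repin", "ts": 1424390402}',
]
stored = process_batch(db, batch)

# Once persisted, the events can be queried with ordinary SQL.
counts = db.execute(
    "SELECT action, COUNT(*) FROM events GROUP BY action ORDER BY action"
).fetchall()
print(stored, counts)
```

In a streaming deployment the same two steps would run once per batch interval, so the table stays queryable while new events keep arriving.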
![Page 21: Bringing olap fully online analyze changing datasets in mem sql and spark with pinterest demo](https://reader031.fdocuments.net/reader031/viewer/2022032216/55a68e8c1a28abb97d8b4842/html5/thumbnails/21.jpg)
Extend MemSQL Analytics
▸The freshest data for analysis in Spark
▸Load from MemSQL to Spark and write results on return
[Diagram: applications and data streams feed the MemSQL cluster; a real-time replica (MemSQL replicated cluster) gives the Spark cluster access to live production data for interactive analytics and machine learning.]
![Page 22: Bringing olap fully online analyze changing datasets in mem sql and spark with pinterest demo](https://reader031.fdocuments.net/reader031/viewer/2022032216/55a68e8c1a28abb97d8b4842/html5/thumbnails/22.jpg)
Live Dashboards and Automated Reports
▸Serve live dashboards from MemSQL
▸Run custom reports on live data with Spark
[Diagram: the MemSQL cluster handles SQL transactions and analytics and serves live dashboards; the Spark cluster accesses live production data for custom reporting.]
![Page 23: Bringing olap fully online analyze changing datasets in mem sql and spark with pinterest demo](https://reader031.fdocuments.net/reader031/viewer/2022032216/55a68e8c1a28abb97d8b4842/html5/thumbnails/23.jpg)
REAL-TIME ANALYTICS IN PRACTICE
Pinterest Demo
![Page 24: Bringing olap fully online analyze changing datasets in mem sql and spark with pinterest demo](https://reader031.fdocuments.net/reader031/viewer/2022032216/55a68e8c1a28abb97d8b4842/html5/thumbnails/24.jpg)
Pinterest Demo
▸Yu Yang, Software Engineer at Pinterest
![Page 25: Bringing olap fully online analyze changing datasets in mem sql and spark with pinterest demo](https://reader031.fdocuments.net/reader031/viewer/2022032216/55a68e8c1a28abb97d8b4842/html5/thumbnails/25.jpg)
Realtime Analytics at Pinterest
[Diagram labels: App, events, Singer, Kafka, Secor, Spark, Insights, Prototype]
![Page 26: Bringing olap fully online analyze changing datasets in mem sql and spark with pinterest demo](https://reader031.fdocuments.net/reader031/viewer/2022032216/55a68e8c1a28abb97d8b4842/html5/thumbnails/26.jpg)
Why Spark
▸Pinterest has high traffic and an active community
▸Always looking for new ways to help users
▸Processing event data presents unique challenges
▸Spark is the leading processing framework for big data deployments
▸Spark Streaming is ideal for real-time data structuring
![Page 27: Bringing olap fully online analyze changing datasets in mem sql and spark with pinterest demo](https://reader031.fdocuments.net/reader031/viewer/2022032216/55a68e8c1a28abb97d8b4842/html5/thumbnails/27.jpg)
How It Works
All at sub-second speed
![Page 28: Bringing olap fully online analyze changing datasets in mem sql and spark with pinterest demo](https://reader031.fdocuments.net/reader031/viewer/2022032216/55a68e8c1a28abb97d8b4842/html5/thumbnails/28.jpg)