Couchbase Analytics: an overview – Connect Silicon Valley 2017
Confidential and Proprietary. Do not distribute without Couchbase consent. © Couchbase 2017. All rights reserved.
COUCHBASE ANALYTICS: An Overview
AGENDA
01/ What is Couchbase Analytics?
02/ How to use it
03/ From the inside out
04/ Developer Preview 4
Why Couchbase Analytics?
• Support OLTP and OLAP processing in a single platform
• Eliminate the need for a separate OLAP system
• Eliminate ETL
• Reduce latency
• Reduce complexity
• Enable more intelligent applications
• Enable data exploration and ad hoc analytics
What is Couchbase Analytics?
• Common programming model & data model
• Unified management
• Fast data synchronization
• Extend Couchbase Platform to power real-time analytics
• Ad-hoc queries (“Ask me anything!”)
• Workload isolation
• Independent scaling
[Diagram: Couchbase platform with Query, Search, Mobile & IoT, and Analytics (Preview) services on a core database engine; labels: scale-out architecture, memory-first architecture, unified programming]
Data: Beer Sample
{
"name": "Commonwealth Brewing #1",
"city": "Boston",
"state": "Massachusetts",
"code": "",
"country": "United States",
"phone": "",
"website": "",
"type": "brewery",
"updated": "2010-07-22 20:00:20",
"description": "",
"address": [ ],
"geo": {
"accuracy": "APPROXIMATE",
"lat": 42.3584,
"lng": -71.0598
}
}
{
"name": "Piranha Pale Ale",
"abv": 5.7,
"ibu": 0,
"srm": 0,
"upc": 0,
"type": "beer",
"brewery_id": "110f04166d",
"updated": "2010-07-22 20:00:20",
"description": "",
"style": "American-Style Pale Ale",
"category": "North American Ale"
}
Simple Join
[{
"brewer": "(512) Brewing Company",
"beer": "(512) ALT"
},
{
"brewer": "(512) Brewing Company",
"beer": "(512) Bruin"
},
{
"brewer": "(512) Brewing Company",
"beer": "(512) IPA"
}]
"Get 3 beers with their breweries"
SELECT bw.name AS brewer, br.name AS beer
FROM breweries bw, beers br
WHERE br.brewery_id = meta(bw).id
ORDER BY bw.name, br.name
LIMIT 3;
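What the join above computes can be sketched in plain Python. This is only an illustration of the query's semantics, not how Couchbase Analytics executes it. The brewery document IDs are hypothetical, and a dictionary key stands in for meta(bw).id.

```python
# Breweries keyed by document ID; the key plays the role of meta(bw).id.
# The IDs below are hypothetical placeholders for illustration.
breweries = {
    "512_brewing_company": {"name": "(512) Brewing Company"},
    "21st_amendment_brewery_cafe": {"name": "21st Amendment Brewery Cafe"},
}
beers = [
    {"name": "(512) IPA", "brewery_id": "512_brewing_company"},
    {"name": "21A IPA", "brewery_id": "21st_amendment_brewery_cafe"},
    {"name": "(512) ALT", "brewery_id": "512_brewing_company"},
    {"name": "(512) Bruin", "brewery_id": "512_brewing_company"},
]


def simple_join(breweries, beers, limit=3):
    """Emulate: WHERE br.brewery_id = meta(bw).id ORDER BY bw.name, br.name LIMIT n."""
    rows = [
        {"brewer": breweries[beer["brewery_id"]]["name"], "beer": beer["name"]}
        for beer in beers
        if beer["brewery_id"] in breweries  # inner join: unmatched beers are dropped
    ]
    rows.sort(key=lambda row: (row["brewer"], row["beer"]))
    return rows[:limit]


result = simple_join(breweries, beers)
```

With this small sample, the first three rows are the three (512) Brewing Company beers, in the same order as the result shown on the slide.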
Non-key Self Join
[{
"brewer1": "aberdeen_brewing",
"brewer2": "hoffbrau_steaks_brewery_2",
"beer": "Scottish Ale"
},
{
"brewer1": "aberdeen_brewing",
"brewer2": "carlyle_brewing",
"beer": "Scottish Ale"
},
{
"brewer1": "aberdeen_brewing",
"brewer2": "belhaven_brewery",
"beer": "Scottish Ale"
}]
"Get 3 beer names used by different breweries"
SELECT b1.name AS beer,
b1.brewery_id AS brewer1,
b2.brewery_id AS brewer2
FROM beers b1, beers b2
WHERE b1.name = b2.name
AND b1.brewery_id != b2.brewery_id
ORDER BY b1.brewery_id
LIMIT 3;
Nested Outer Join
[{
"beers": [
{ "abv": 8.2, "name": "(512) Pecan Porter" },
{ "abv": 5.8, "name": "(512) Pale" }, ...
],
"brewer": "(512) Brewing Company"
},
{
"beers": [
{ "abv": 7.2, "name": "21A IPA" },
{ "abv": 5.8, "name": "North Star Red" }, ...
],
"brewer": "21st Amendment Brewery Cafe"
}]
"Get 2 breweries and the list of their beers"
SELECT bw.name AS brewer, (
SELECT br.name, br.abv
FROM beers br
WHERE br.brewery_id = meta(bw).id
) AS beers
FROM breweries bw
ORDER BY bw.name
LIMIT 2;
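The "outer" behavior of the nested query can be sketched in Python: every brewery yields a row, and a brewery with no matching beers gets an empty list rather than being dropped. This is a semantic illustration only. The document IDs and the third brewery are hypothetical, added to show the empty case.

```python
# Document IDs are hypothetical; the third brewery is invented to show the empty case.
breweries = {
    "512_brewing_company": {"name": "(512) Brewing Company"},
    "21st_amendment_brewery_cafe": {"name": "21st Amendment Brewery Cafe"},
    "no_beers_yet": {"name": "Hypothetical Brewery Without Beers"},
}
beers = [
    {"name": "(512) Pecan Porter", "abv": 8.2, "brewery_id": "512_brewing_company"},
    {"name": "(512) Pale", "abv": 5.8, "brewery_id": "512_brewing_company"},
    {"name": "21A IPA", "abv": 7.2, "brewery_id": "21st_amendment_brewery_cafe"},
    {"name": "North Star Red", "abv": 5.8, "brewery_id": "21st_amendment_brewery_cafe"},
]


def nested_outer_join(breweries, beers, limit=2):
    """Emulate the correlated subquery: one row per brewery with a nested beer list."""
    rows = []
    for brewery_id, brewery in breweries.items():
        # The subquery is evaluated once per brewery; it may return no beers at all.
        nested = [
            {"name": beer["name"], "abv": beer["abv"]}
            for beer in beers
            if beer["brewery_id"] == brewery_id
        ]
        rows.append({"brewer": brewery["name"], "beers": nested})
    rows.sort(key=lambda row: row["brewer"])
    return rows[:limit]


result = nested_outer_join(breweries, beers, limit=3)
```

The last row of result carries an empty beers list, which is the difference from the inner join on the earlier slide.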
Grouping and Aggregation
[{
"num_beers": 57,
"brewery_id": "midnight_sun_brewing_co"
},
{
"num_beers": 49,
"brewery_id": "rogue_ales"
},
{
"num_beers": 38,
"brewery_id": "anheuser_busch"
}
]
"Get all breweries that produce more than 37 beers"
SELECT br.brewery_id,
COUNT(*) AS num_beers
FROM beers br
GROUP BY br.brewery_id
HAVING num_beers > 37
ORDER BY num_beers DESC;
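The GROUP BY / HAVING / ORDER BY pipeline above can be sketched in Python as count, filter, sort. The sample documents and the threshold of 1 are hypothetical and chosen only to keep the example small.

```python
from collections import Counter

# Hypothetical sample; only brewery_id matters for this query.
beers = [
    {"name": "A", "brewery_id": "rogue_ales"},
    {"name": "B", "brewery_id": "rogue_ales"},
    {"name": "C", "brewery_id": "rogue_ales"},
    {"name": "D", "brewery_id": "anheuser_busch"},
    {"name": "E", "brewery_id": "anheuser_busch"},
    {"name": "F", "brewery_id": "aberdeen_brewing"},
]


def breweries_with_more_than(beers, threshold):
    """Emulate GROUP BY brewery_id, HAVING num_beers > threshold, ORDER BY num_beers DESC."""
    counts = Counter(beer["brewery_id"] for beer in beers)  # GROUP BY + COUNT(*)
    rows = [
        {"brewery_id": brewery_id, "num_beers": num_beers}
        for brewery_id, num_beers in counts.items()
        if num_beers > threshold  # HAVING
    ]
    rows.sort(key=lambda row: row["num_beers"], reverse=True)  # ORDER BY ... DESC
    return rows


result = breweries_with_more_than(beers, threshold=1)
```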
Putting it all together
[{
"num_beers": 5,
"beer_strength": 12.02,
"city": "Vorchdorf"
},
{
"num_beers": 8,
"beer_strength": 10.3125,
"city": "Buggenhout"
},
{
"num_beers": 11,
"beer_strength": 10.045454545454545,
"city": "Fraserburgh"
}]
"Explore beer characteristics by city"
SELECT bw.city, COUNT(*) AS num_beers,
AVG(br.abv) AS beer_strength
FROM beers br, breweries bw
WHERE br.brewery_id = meta(bw).id
GROUP BY bw.city
HAVING num_beers > 1
ORDER BY beer_strength DESC
LIMIT 3;
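The combined query (join, then group by city, then filter and rank by average strength) can be sketched in Python as follows. The breweries, cities, and abv values are hypothetical, and every sample beer has an abv so the null-handling of AVG does not come into play.

```python
from collections import defaultdict

# Hypothetical sample data: breweries keyed by document ID, beers with abv values.
breweries = {
    "bw1": {"city": "Austin"},
    "bw2": {"city": "Boston"},
    "bw3": {"city": "Portland"},
}
beers = [
    {"abv": 8.0, "brewery_id": "bw1"},
    {"abv": 6.0, "brewery_id": "bw1"},
    {"abv": 7.0, "brewery_id": "bw1"},
    {"abv": 5.0, "brewery_id": "bw2"},
    {"abv": 7.0, "brewery_id": "bw2"},
    {"abv": 9.0, "brewery_id": "bw3"},
]


def beer_strength_by_city(breweries, beers, limit=3):
    """Emulate the join, GROUP BY city, HAVING num_beers > 1, ORDER BY AVG(abv) DESC."""
    abv_by_city = defaultdict(list)
    for beer in beers:
        brewery = breweries.get(beer["brewery_id"])  # join on the brewery document ID
        if brewery is not None:
            abv_by_city[brewery["city"]].append(beer["abv"])
    rows = [
        {"city": city, "num_beers": len(abvs), "beer_strength": sum(abvs) / len(abvs)}
        for city, abvs in abv_by_city.items()
        if len(abvs) > 1  # HAVING num_beers > 1
    ]
    rows.sort(key=lambda row: row["beer_strength"], reverse=True)
    return rows[:limit]


result = beer_strength_by_city(breweries, beers)
```

Portland has the strongest single beer in this sample but is excluded, because the HAVING clause requires more than one beer per city.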
Couchbase Analytics DDL: Lifecycle
• DDL for shadow datasets
CREATE BUCKET `beer-sample`;
CREATE SHADOW DATASET beers ON `beer-sample` WHERE `type` = "beer";
CREATE SHADOW DATASET breweries ON `beer-sample` WHERE `type` = "brewery";
CONNECT BUCKET `beer-sample`;
SELECT * FROM beers ORDER BY abv DESC LIMIT 12;
DISCONNECT BUCKET `beer-sample`;
DROP DATASET breweries;
DROP DATASET beers;
DROP BUCKET `beer-sample`;
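The WHERE clauses in the CREATE SHADOW DATASET statements act as filters over the bucket's documents. A minimal Python sketch of that filtering idea follows. It models only the predicate, not the continuous synchronization that a connected bucket provides. The function name and the document IDs are hypothetical.

```python
# A bucket modeled as a dict of document ID -> document (IDs are hypothetical).
bucket = {
    "commonwealth_brewing_1": {
        "name": "Commonwealth Brewing #1",
        "city": "Boston",
        "type": "brewery",
    },
    "piranha_pale_ale": {
        "name": "Piranha Pale Ale",
        "abv": 5.7,
        "type": "beer",
        "brewery_id": "110f04166d",
    },
}


def create_shadow_dataset(bucket, doc_type):
    """Return the documents that a WHERE `type` = doc_type filter would admit."""
    return {
        doc_id: doc
        for doc_id, doc in bucket.items()
        if doc.get("type") == doc_type  # documents without a type field are skipped
    }


beers = create_shadow_dataset(bucket, "beer")
breweries = create_shadow_dataset(bucket, "brewery")
```

Each dataset sees only the documents of its type, which is why the earlier queries can refer to beers and breweries as if they were separate collections.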
Couchbase Analytics DDL: Lifecycle
• DDL for shadow datasets for external data
CREATE BUCKET `beer-sample` WITH { "nodes": "node1.mydomain.com,node2.mydomain.com" };
CREATE SHADOW DATASET beers ON `beer-sample` WHERE `type` = "beer";
CREATE SHADOW DATASET breweries ON `beer-sample` WHERE `type` = "brewery";
CONNECT BUCKET `beer-sample` WITH { "password": "!@#", "timeout": 2000 };
SELECT * FROM beers ORDER BY abv DESC LIMIT 12;
DISCONNECT BUCKET `beer-sample`;
DROP DATASET breweries;
DROP DATASET beers;
DROP BUCKET `beer-sample`;
Why another service?
• Common programming model & data model
• Unified management
• Fast data synchronization
• Extend Couchbase Platform to power real-time analytics
• Ad-hoc queries (“Ask me anything!”)
• Workload isolation
• Independent scaling
[Diagram: Couchbase platform with Query, Search, Mobile & IoT, and Analytics (Preview) services on a core database engine; labels: scale-out architecture, memory-first architecture, unified programming]
Couchbase Query and Analytics
Couchbase Query: optimized for operations (OLTP). Many queries; each touches a little data.
Couchbase Analytics: optimized for analytics (OLAP). Fewer queries; each touches a lot of data.
Example: Join, Grouping, and Aggregation
"Get the 10 chattiest users in a timeframe"
SELECT user.id, COUNT(message) AS count
FROM gbook_messages AS message, gbook_users AS user
WHERE message.author_id = user.id
AND message.send_time BETWEEN "2001-11-28T09:57:13" AND "2001-11-29T09:57:13"
GROUP BY user.id
ORDER BY count DESC
LIMIT 10;
Couchbase Query and Analytics – Performance Tradeoff
[Chart: the join and group-by query run on Couchbase Analytics (CBA) and on N1QL with GSI. X-axis: interval (# records): 1m (<10), 1h (<500), 1d (<5000), 1w (<25K), 1mo (<100K), 3mo (<300K), 6mo (<600K)]
"Secret" Sauce: Query Parallelism
• Massively Parallel Query Processor (MPP) executes complex queries on large datasets
• Comprehensive query language
[Figure labels: "Query takes 1 minute" and "Query takes 15 seconds"]
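The general idea behind parallel query processing can be sketched in Python: split the data into partitions, aggregate each partition independently, then merge the partial results. This is a generic illustration of partitioned aggregation, not the Couchbase Analytics engine. The function names, the partition count, and the sample documents are assumptions, and threads are used only to show the structure, not to demonstrate a speedup.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample documents; only brewery_id is used.
docs = [
    {"brewery_id": "rogue_ales"},
    {"brewery_id": "anheuser_busch"},
    {"brewery_id": "rogue_ales"},
    {"brewery_id": "rogue_ales"},
    {"brewery_id": "anheuser_busch"},
    {"brewery_id": "aberdeen_brewing"},
]


def partial_count(partition):
    """Local aggregation step: count beers per brewery within one partition."""
    return Counter(doc["brewery_id"] for doc in partition)


def parallel_group_count(docs, num_partitions=4):
    """Partition the input, aggregate partitions concurrently, merge the partial counts."""
    partitions = [docs[i::num_partitions] for i in range(num_partitions)]
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(partial_count, partitions))
    total = Counter()
    for partial in partials:
        total.update(partial)  # global merge step: add up the per-partition counts
    return total


counts = parallel_group_count(docs)
```

Because counting is associative, the merged result is the same regardless of how the documents are split across partitions.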
Couchbase Analytics Coupling
• Separate services, separate nodes
• Multi-Dimensional Scaling
• Workload isolation
• Parallel shadowing of data(sets) via DCP
• Low impact on data nodes
• Low latency
[Diagram: four ANALYTICS nodes and three DATA nodes]
What is in the Developer Preview?
• Common programming model & data model
• Unified management
• Fast data synchronization
• Extend Couchbase Platform to power real-time analytics
• Ad-hoc queries (“Ask me anything!”)
• Workload isolation
• Independent scaling
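[Diagram: Couchbase platform with Query, Search, Mobile & IoT, and Analytics (Preview) services on a core database engine; labels: scale-out architecture, memory-first architecture, unified programming. Six check marks (✔) appear on the slide.]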
Workbench
Get it
https://www.couchbase.com/downloads
THANK YOU
Couchbase Analytics and friends
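[Diagram: a spectrum of systems: Key Value, CB Query, CB Analytics, Spark, Hadoop. Axis labels: Operations, Analytics, Online, Batch. Latency scale: µs, ms, 30s, minutes+. Data scale: 1 record to trillions of records. Other labels: start-up overhead, job-based, parallel query, ETL.]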