Top five questions to ask when choosing a big data solution
![Page 1: Top five questions to ask when choosing a big data solution](https://reader034.fdocuments.net/reader034/viewer/2022051613/54c666464a7959f3208b45b4/html5/thumbnails/1.jpg)
Five factors to consider when choosing a big data solution
Jonathan Ellis
CTO, DataStax
Project Chair, Apache Cassandra
![Page 2: Top five questions to ask when choosing a big data solution](https://reader034.fdocuments.net/reader034/viewer/2022051613/54c666464a7959f3208b45b4/html5/thumbnails/2.jpg)
©2012 DataStax
How do I model my application?
![Page 3: Top five questions to ask when choosing a big data solution](https://reader034.fdocuments.net/reader034/viewer/2022051613/54c666464a7959f3208b45b4/html5/thumbnails/3.jpg)
Popular options
• Key/value
• Tabular
• Document
• Graph?
![Page 4: Top five questions to ask when choosing a big data solution](https://reader034.fdocuments.net/reader034/viewer/2022051613/54c666464a7959f3208b45b4/html5/thumbnails/4.jpg)
Schema is your friend
{
  "id": "e451dd42-ece3-11e1-a0a3-34159e154f4c",
  "name": "jbellis",
  "state": "TX",
  "birthdate": "1/1/1976",
  "email_addresses": ["jbellis@gmail", "[email protected]"]
}
![Page 5: Top five questions to ask when choosing a big data solution](https://reader034.fdocuments.net/reader034/viewer/2022051613/54c666464a7959f3208b45b4/html5/thumbnails/5.jpg)
SQL can be your friend too
CREATE TABLE users (
  id uuid PRIMARY KEY,
  name text,
  state text,
  birth_date date
);
CREATE INDEX ON users(state);
SELECT * FROM users
WHERE state = 'Texas' AND birth_date > '1950-01-01';
![Page 6: Top five questions to ask when choosing a big data solution](https://reader034.fdocuments.net/reader034/viewer/2022051613/54c666464a7959f3208b45b4/html5/thumbnails/6.jpg)
Collections
CREATE TABLE users (
  id uuid PRIMARY KEY,
  name text,
  state text,
  birth_date date
);
CREATE TABLE users_addresses (
  user_id uuid REFERENCES users,
  email text
);
SELECT *
FROM users JOIN users_addresses ON users.id = users_addresses.user_id;
![Page 7: Top five questions to ask when choosing a big data solution](https://reader034.fdocuments.net/reader034/viewer/2022051613/54c666464a7959f3208b45b4/html5/thumbnails/7.jpg)
Collections
X (this slide repeats the previous one with the join-based design crossed out)
![Page 8: Top five questions to ask when choosing a big data solution](https://reader034.fdocuments.net/reader034/viewer/2022051613/54c666464a7959f3208b45b4/html5/thumbnails/8.jpg)
Collections
CREATE TABLE users (
  id uuid PRIMARY KEY,
  name text,
  state text,
  birth_date date,
  email_addresses set<text>
);
UPDATE users
SET email_addresses = email_addresses + {'[email protected]', '[email protected]'};
![Page 9: Top five questions to ask when choosing a big data solution](https://reader034.fdocuments.net/reader034/viewer/2022051613/54c666464a7959f3208b45b4/html5/thumbnails/9.jpg)
Joins don’t scale
• No joins
• No subqueries
• No aggregation functions* or GROUP BY
• ORDER BY?
![Page 10: Top five questions to ask when choosing a big data solution](https://reader034.fdocuments.net/reader034/viewer/2022051613/54c666464a7959f3208b45b4/html5/thumbnails/10.jpg)
SELECT * FROM tweets
WHERE user_id IN (SELECT follower FROM followers WHERE user_id = 'driftx')
[Diagram: followers ? tweets]
![Page 11: Top five questions to ask when choosing a big data solution](https://reader034.fdocuments.net/reader034/viewer/2022051613/54c666464a7959f3208b45b4/html5/thumbnails/11.jpg)
Clustering in Cassandra
CREATE TABLE timeline (
  user_id uuid,
  tweet_id timeuuid,
  tweet_author uuid,
  tweet_body text,
  PRIMARY KEY (user_id, tweet_id)
);
user_id  tweet_id    _author   _body
jbellis  3290f9da..  rbranson  lorem
jbellis  3895411a..  tjake     ipsum
...      ...         ...
driftx   3290f9da..  rbranson  lorem
driftx   71b46a84..  yzhang    dolor
...      ...         ...
yukim    3290f9da..  rbranson  lorem
yukim    e451dd42..  tjake     amet
...      ...         ...
![Page 12: Top five questions to ask when choosing a big data solution](https://reader034.fdocuments.net/reader034/viewer/2022051613/54c666464a7959f3208b45b4/html5/thumbnails/12.jpg)
Clustering in Cassandra
SELECT * FROM timeline
WHERE user_id = 'driftx';
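The timeline table replaces the followers/tweets subquery with one denormalized partition per reader. As a rough illustration only, the Python sketch below models that idea with a hypothetical in-memory `Timeline` class (not Cassandra's API): rows are grouped by `user_id`, kept sorted by `tweet_id`, and a tweet is copied into every follower's partition at write time. Integer ids stand in for timeuuids, and the `followers` mapping and `post_tweet` helper are assumptions made for the example.

```python
import bisect
from collections import defaultdict


class Timeline:
    """Toy model of a table with PRIMARY KEY (user_id, tweet_id).

    Hypothetical illustration: rows are grouped by user_id (the partition)
    and kept sorted by tweet_id (the clustering column).
    """

    def __init__(self):
        # user_id -> sorted list of (tweet_id, author, body)
        self.partitions = defaultdict(list)

    def insert(self, user_id, tweet_id, author, body):
        # Keep each partition ordered by tweet_id, as clustering does.
        bisect.insort(self.partitions[user_id], (tweet_id, author, body))

    def select(self, user_id):
        # Counterpart of: SELECT * FROM timeline WHERE user_id = ?
        # One partition lookup; no join against a followers table at read time.
        return list(self.partitions[user_id])


def post_tweet(timeline, followers, author, tweet_id, body):
    """Fan out on write: copy the tweet into every follower's partition."""
    for follower in followers.get(author, ()):
        timeline.insert(follower, tweet_id, author, body)


# Assumed sample data: who follows each author.
followers = {"rbranson": ["jbellis", "driftx", "yukim"], "yzhang": ["driftx"]}
timeline = Timeline()
post_tweet(timeline, followers, "yzhang", 2, "dolor")
post_tweet(timeline, followers, "rbranson", 1, "lorem")
driftx_rows = timeline.select("driftx")
```

The trade-off in this sketch is extra work and storage on every write in exchange for a read that touches a single, already-sorted partition.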
![Page 13: Top five questions to ask when choosing a big data solution](https://reader034.fdocuments.net/reader034/viewer/2022051613/54c666464a7959f3208b45b4/html5/thumbnails/13.jpg)
How does it perform?
![Page 14: Top five questions to ask when choosing a big data solution](https://reader034.fdocuments.net/reader034/viewer/2022051613/54c666464a7959f3208b45b4/html5/thumbnails/14.jpg)
Larger than memory datasets
![Page 15: Top five questions to ask when choosing a big data solution](https://reader034.fdocuments.net/reader034/viewer/2022051613/54c666464a7959f3208b45b4/html5/thumbnails/15.jpg)
Locking
![Page 16: Top five questions to ask when choosing a big data solution](https://reader034.fdocuments.net/reader034/viewer/2022051613/54c666464a7959f3208b45b4/html5/thumbnails/16.jpg)
Efficiency
![Page 17: Top five questions to ask when choosing a big data solution](https://reader034.fdocuments.net/reader034/viewer/2022051613/54c666464a7959f3208b45b4/html5/thumbnails/17.jpg)
UPDATE users
SET email_addresses = email_addresses + {...}
WHERE user_id = 'jbellis';
![Page 18: Top five questions to ask when choosing a big data solution](https://reader034.fdocuments.net/reader034/viewer/2022051613/54c666464a7959f3208b45b4/html5/thumbnails/18.jpg)
Durability
![Page 19: Top five questions to ask when choosing a big data solution](https://reader034.fdocuments.net/reader034/viewer/2022051613/54c666464a7959f3208b45b4/html5/thumbnails/19.jpg)
C* storage engine very briefly
write(k1, c1:v1)
Memory: Memtable
Hard drive: Commit log
![Page 20: Top five questions to ask when choosing a big data solution](https://reader034.fdocuments.net/reader034/viewer/2022051613/54c666464a7959f3208b45b4/html5/thumbnails/20.jpg)
write(k1, c1:v1)
Memory: Memtable
  k1 c1:v1
Hard drive: Commit log
  k1 c1:v1
![Page 21: Top five questions to ask when choosing a big data solution](https://reader034.fdocuments.net/reader034/viewer/2022051613/54c666464a7959f3208b45b4/html5/thumbnails/21.jpg)
write(k1, c2:v2)
Memory: Memtable
  k1 c1:v1 c2:v2
Hard drive: Commit log
  k1 c1:v1
  k1 c2:v2
![Page 22: Top five questions to ask when choosing a big data solution](https://reader034.fdocuments.net/reader034/viewer/2022051613/54c666464a7959f3208b45b4/html5/thumbnails/22.jpg)
write(k2, c1:v1 c2:v2)
Memory: Memtable
  k1 c1:v1 c2:v2
  k2 c1:v1 c2:v2
Hard drive: Commit log
  k1 c1:v1
  k1 c2:v2
  k2 c1:v1 c2:v2
![Page 23: Top five questions to ask when choosing a big data solution](https://reader034.fdocuments.net/reader034/viewer/2022051613/54c666464a7959f3208b45b4/html5/thumbnails/23.jpg)
write(k1, c1:v4 c3:v3)
Memory: Memtable
  k1 c1:v4 c2:v2 c3:v3
  k2 c1:v1 c2:v2
Hard drive: Commit log
  k1 c1:v1
  k1 c2:v2
  k2 c1:v1 c2:v2
  k1 c1:v4 c3:v3
![Page 24: Top five questions to ask when choosing a big data solution](https://reader034.fdocuments.net/reader034/viewer/2022051613/54c666464a7959f3208b45b4/html5/thumbnails/24.jpg)
flush
Memory: Memtable (flushed)
Hard drive: SSTable, with index
  k1 c1:v4 c2:v2 c3:v3
  k2 c1:v1 c2:v2
cleanup
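The write path walked through on the preceding slides can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Cassandra's implementation: `ToyStorageEngine` and its attributes are hypothetical names, the "disk" is just Python lists, and details such as timestamps, the SSTable index, and compaction are left out.

```python
class ToyStorageEngine:
    """Hypothetical model of the commit log / memtable / SSTable write path."""

    def __init__(self):
        self.commit_log = []  # append-only; stands in for the on-disk log
        self.memtable = {}    # key -> {column: value}, held in memory
        self.sstables = []    # immutable, sorted snapshots "on disk"

    def write(self, key, columns):
        # 1. Append to the commit log: sequential I/O, gives durability.
        self.commit_log.append((key, dict(columns)))
        # 2. Merge into the memtable; newer column values replace older ones.
        self.memtable.setdefault(key, {}).update(columns)

    def flush(self):
        # Write the memtable out as one sorted, immutable SSTable...
        sstable = tuple(
            (key, tuple(sorted(cols.items())))
            for key, cols in sorted(self.memtable.items())
        )
        self.sstables.append(sstable)
        # ...then clean up: flushed data no longer needs the commit log.
        self.memtable = {}
        self.commit_log = []
        return sstable


# Replay the four writes shown on the slides, then flush.
engine = ToyStorageEngine()
engine.write("k1", {"c1": "v1"})
engine.write("k1", {"c2": "v2"})
engine.write("k2", {"c1": "v1", "c2": "v2"})
engine.write("k1", {"c1": "v4", "c3": "v3"})
log_length_before_flush = len(engine.commit_log)
sstable = engine.flush()
```

In this model every disk operation is either an append or a whole-file write, so nothing is updated in place.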
![Page 25: Top five questions to ask when choosing a big data solution](https://reader034.fdocuments.net/reader034/viewer/2022051613/54c666464a7959f3208b45b4/html5/thumbnails/25.jpg)
No random writes
![Page 26: Top five questions to ask when choosing a big data solution](https://reader034.fdocuments.net/reader034/viewer/2022051613/54c666464a7959f3208b45b4/html5/thumbnails/26.jpg)
[Chart: reads/s and writes/s for Cassandra 0.6 and Cassandra 1.0; axis from 0 to 35,000]
![Page 27: Top five questions to ask when choosing a big data solution](https://reader034.fdocuments.net/reader034/viewer/2022051613/54c666464a7959f3208b45b4/html5/thumbnails/27.jpg)
How does it handle failure?
![Page 28: Top five questions to ask when choosing a big data solution](https://reader034.fdocuments.net/reader034/viewer/2022051613/54c666464a7959f3208b45b4/html5/thumbnails/28.jpg)
Classic partitioning with SPOF
[Diagram: client, router, partition 1, partition 2, partition 3, partition 4]
![Page 29: Top five questions to ask when choosing a big data solution](https://reader034.fdocuments.net/reader034/viewer/2022051613/54c666464a7959f3208b45b4/html5/thumbnails/29.jpg)
Availability
• “High availability implies that a single fault will not bring down your system. Not ‘we’ll recover quickly.’” -- Ben Coverston: DataStax
• “The biggest problem with failover is that you're almost never using it until it really hurts. It's like backups that you never test.” -- Rick Branson: Instagram
![Page 30: Top five questions to ask when choosing a big data solution](https://reader034.fdocuments.net/reader034/viewer/2022051613/54c666464a7959f3208b45b4/html5/thumbnails/30.jpg)
Fully distributed, no SPOF
[Diagram: client and nodes; surviving partition labels: p1, p1, p1, p3, p6]
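One way to picture a design with no router in the middle is a hash ring on which any node can work out which peers hold a key. The Python sketch below is an assumption-laden illustration, not Cassandra's partitioner or replication strategy: the `Ring` class, the node names, the use of MD5, and the replication factor of 3 are all choices made for the example.

```python
import bisect
import hashlib


def token(value):
    # Hash a string to a position on the ring (MD5 is an arbitrary choice).
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Hypothetical peer-to-peer ring: any node can locate a key's replicas."""

    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = replication_factor
        self.tokens = sorted((token(node), node) for node in nodes)

    def replicas(self, key):
        # Walk clockwise from the key's token and take the next N nodes.
        start = bisect.bisect_right(self.tokens, (token(key), ""))
        count = min(self.replication_factor, len(self.tokens))
        return [
            self.tokens[(start + i) % len(self.tokens)][1] for i in range(count)
        ]

    def live_replicas(self, key, down=()):
        # A request can still be served while at least one replica is up.
        return [node for node in self.replicas(key) if node not in down]


ring = Ring(["node1", "node2", "node3", "node4", "node5", "node6"])
owners = ring.replicas("jbellis")
survivors = ring.live_replicas("jbellis", down={owners[0]})
```

With one replica marked down, the key in this sketch is still reachable through the remaining two, with no failover step in between.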
![Page 31: Top five questions to ask when choosing a big data solution](https://reader034.fdocuments.net/reader034/viewer/2022051613/54c666464a7959f3208b45b4/html5/thumbnails/31.jpg)
Multiple datacenters
![Page 32: Top five questions to ask when choosing a big data solution](https://reader034.fdocuments.net/reader034/viewer/2022051613/54c666464a7959f3208b45b4/html5/thumbnails/32.jpg)
![Page 33: Top five questions to ask when choosing a big data solution](https://reader034.fdocuments.net/reader034/viewer/2022051613/54c666464a7959f3208b45b4/html5/thumbnails/33.jpg)
How does it scale?
![Page 34: Top five questions to ask when choosing a big data solution](https://reader034.fdocuments.net/reader034/viewer/2022051613/54c666464a7959f3208b45b4/html5/thumbnails/34.jpg)
Scaling antipatterns
• Metadata servers
• Router bottlenecks
• Overloading existing nodes when adding capacity
![Page 35: Top five questions to ask when choosing a big data solution](https://reader034.fdocuments.net/reader034/viewer/2022051613/54c666464a7959f3208b45b4/html5/thumbnails/35.jpg)
![Page 36: Top five questions to ask when choosing a big data solution](https://reader034.fdocuments.net/reader034/viewer/2022051613/54c666464a7959f3208b45b4/html5/thumbnails/36.jpg)
How flexible is it?
![Page 37: Top five questions to ask when choosing a big data solution](https://reader034.fdocuments.net/reader034/viewer/2022051613/54c666464a7959f3208b45b4/html5/thumbnails/37.jpg)
![Page 38: Top five questions to ask when choosing a big data solution](https://reader034.fdocuments.net/reader034/viewer/2022051613/54c666464a7959f3208b45b4/html5/thumbnails/38.jpg)
Data model: Realtime

Portfolios
user     stock  shares
jbellis  GOOG   80
jbellis  LNKD   20
yukim    AMZN   100

LiveStocks
stock  last
GOOG   $95.52
AAPL   $186.10
AMZN   $112.98

StockHist
stock  date        price
GOOG   2011-01-01  $8.23
GOOG   2011-01-02  $6.14
GOOG   2011-01-03  $7.78
![Page 39: Top five questions to ask when choosing a big data solution](https://reader034.fdocuments.net/reader034/viewer/2022051613/54c666464a7959f3208b45b4/html5/thumbnails/39.jpg)
Data model: Analytics

HistLoss
            worst_date  loss
Portfolio1  2011-07-23  -$34.81
Portfolio2  2011-03-11  -$11432.24
Portfolio3  2011-05-21  -$1476.93
![Page 40: Top five questions to ask when choosing a big data solution](https://reader034.fdocuments.net/reader034/viewer/2022051613/54c666464a7959f3208b45b4/html5/thumbnails/40.jpg)
Data model: Analytics

10dayreturns
stock  rdate       return
GOOG   2011-07-25  $8.23
GOOG   2011-07-24  $6.14
GOOG   2011-07-23  $7.78
AAPL   2011-07-25  $15.32
AAPL   2011-07-24  $12.68

INSERT OVERWRITE TABLE 10dayreturns
SELECT a.stock, b.date as rdate, b.price - a.price
FROM StockHist a JOIN StockHist b
ON (a.stock = b.stock AND date_add(a.date, 10) = b.date);
![Page 41: Top five questions to ask when choosing a big data solution](https://reader034.fdocuments.net/reader034/viewer/2022051613/54c666464a7959f3208b45b4/html5/thumbnails/41.jpg)
Data model: Analytics

portfolio_returns
portfolio   rdate       preturn
Portfolio1  2011-07-25  $118.21
Portfolio1  2011-07-24  $60.78
Portfolio1  2011-07-23  -$34.81
Portfolio2  2011-07-25  $2143.92
Portfolio3  2011-07-24  -$10.19

INSERT OVERWRITE TABLE portfolio_returns
SELECT portfolio, rdate, SUM(b.return)
FROM portfolios a JOIN 10dayreturns b ON (a.stock = b.stock)
GROUP BY portfolio, rdate;
![Page 42: Top five questions to ask when choosing a big data solution](https://reader034.fdocuments.net/reader034/viewer/2022051613/54c666464a7959f3208b45b4/html5/thumbnails/42.jpg)
Data model: Analytics

INSERT OVERWRITE TABLE HistLoss
SELECT a.portfolio, rdate, minp
FROM (
  SELECT portfolio, min(preturn) as minp
  FROM portfolio_returns
  GROUP BY portfolio
) a JOIN portfolio_returns b
ON (a.portfolio = b.portfolio and a.minp = b.preturn);

HistLoss
            worst_date  loss
Portfolio1  2011-07-23  -$34.81
Portfolio2  2011-03-11  -$11432.24
Portfolio3  2011-05-21  -$1476.93
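The three analytics queries form a small pipeline: 10-day returns per stock, summed returns per portfolio, then the worst date per portfolio. The Python sketch below traces the same steps on tiny in-memory dictionaries so the data flow is easier to follow. It is only an approximation of the queries: the function names, the made-up integer prices, and the `portfolios` mapping from portfolio name to stock list are assumptions for the example, and ties for the worst return keep only the first date seen, whereas the join would return every tied date.

```python
from collections import defaultdict
from datetime import date, timedelta


def ten_day_returns(stock_hist):
    """{(stock, date): price} -> {(stock, rdate): price change over 10 days}."""
    returns = {}
    for (stock, day), price in stock_hist.items():
        later = day + timedelta(days=10)
        if (stock, later) in stock_hist:
            # Same pairing as the self-join on date_add(a.date, 10) = b.date.
            returns[(stock, later)] = stock_hist[(stock, later)] - price
    return returns


def portfolio_returns(portfolios, returns):
    """Sum stock returns per (portfolio, rdate), like the GROUP BY query."""
    totals = defaultdict(int)
    for portfolio, stocks in portfolios.items():
        for (stock, rdate), value in returns.items():
            if stock in stocks:
                totals[(portfolio, rdate)] += value
    return dict(totals)


def hist_loss(preturns):
    """Keep each portfolio's minimum return and the date it occurred."""
    worst = {}
    for (portfolio, rdate), value in preturns.items():
        if portfolio not in worst or value < worst[portfolio][1]:
            worst[portfolio] = (rdate, value)
    return worst


# Made-up sample data (whole-dollar prices), not taken from the slides.
stock_hist = {
    ("GOOG", date(2011, 7, 1)): 100,
    ("GOOG", date(2011, 7, 2)): 110,
    ("GOOG", date(2011, 7, 11)): 108,
    ("GOOG", date(2011, 7, 12)): 95,
    ("AAPL", date(2011, 7, 1)): 50,
    ("AAPL", date(2011, 7, 11)): 55,
}
portfolios = {"Portfolio1": ["GOOG", "AAPL"], "Portfolio2": ["GOOG"]}

returns = ten_day_returns(stock_hist)
preturns = portfolio_returns(portfolios, returns)
worst = hist_loss(preturns)
```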
![Page 43: Top five questions to ask when choosing a big data solution](https://reader034.fdocuments.net/reader034/viewer/2022051613/54c666464a7959f3208b45b4/html5/thumbnails/43.jpg)
![Page 44: Top five questions to ask when choosing a big data solution](https://reader034.fdocuments.net/reader034/viewer/2022051613/54c666464a7959f3208b45b4/html5/thumbnails/44.jpg)
Some Cassandra users
![Page 45: Top five questions to ask when choosing a big data solution](https://reader034.fdocuments.net/reader034/viewer/2022051613/54c666464a7959f3208b45b4/html5/thumbnails/45.jpg)
Questions?
Image credits
• http://www.flickr.com/photos/26817893@N05/2573006312/
• http://www.flickr.com/photos/rowanbank/7686239548
• http://www.flickr.com/photos/mervtheswerve/6081933265
• http://www.flickr.com/photos/dg_pics/2526208830
• http://www.flickr.com/photos/wainwright/351684037
• http://www.flickr.com/photos/mikeneilson/1606662529
• http://www.flickr.com/photos/sbisson/3852905534
• http://www.flickr.com/photos/breadnbadger/2674928517