C* Summit 2013: Cassandra on Flash: Performance & Efficiency Lessons Learned by Matt Stump
Transcript of C* Summit 2013: Cassandra on Flash: Performance & Efficiency Lessons Learned by Matt Stump
#CASSANDRA13
Matthew Stump | Architect @ KISSmetrics
Real-time Large Queries
#CASSANDRA13
KISSmetrics Customers Want
*Churn Prediction
*AB Tests
*Which Blog Posts and Ad Campaigns Attract High Value Customers?
*User Conversion Funnel
*Revenue Prediction
*Customer Acquisition Costs
*Customer Lifetime Value
#CASSANDRA13
Understanding Queries
#CASSANDRA13
RowKey  username  first_name  last_name  postal_code
cstar   cstar     Cassandra   Database   94110
user2   user2     Some        Guy        94112
#CASSANDRA13
RowKey
94110 cstar
94112 user2 user4 user7 ...
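The two slides above contrast the base table with an index keyed by postal code. A minimal sketch of that inversion follows, using only the two full rows shown; the function name `build_index` and the in-memory dictionaries are assumptions for illustration, not how Cassandra stores its indexes.

```python
from collections import defaultdict

# The two complete rows from the users table on the slide.
users = [
    {"username": "cstar", "first_name": "Cassandra",
     "last_name": "Database", "postal_code": 94110},
    {"username": "user2", "first_name": "Some",
     "last_name": "Guy", "postal_code": 94112},
]


def build_index(rows, column):
    """Invert the table: map each value of `column` to the row keys holding it."""
    index = defaultdict(list)
    for row in rows:
        index[row[column]].append(row["username"])
    return dict(index)


index = build_index(users, "postal_code")
print(index)
```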
#CASSANDRA13
Where Secondary Indexes Break
*High Cardinality Data
*Only one index per query
*Indexes are distributed
*Only some datatypes; no counters
*Range queries are expensive
#CASSANDRA13
What Do I Want?
*Index high cardinality data; e.g. counters
*Complex queries, with multiple clauses
*Results in < 500ms for billions of rows
*Sub-field searching with regular expressions
*Range queries
#CASSANDRA13
Bitmap and Bit-Slice Indexes
#CASSANDRA13
RowKey
94110 cstar
94112 user2 user4 user7 ...
#CASSANDRA13
RowKey
94110 00001000 01000000 00000000 000000000
94112 10000110 01000000 00000000 000000000
#CASSANDRA13
RowKey
94110 00001000 01000000 00000000 000000000
94112 10000110 01000000 00000000 000000000
hash(“cstar”) = 4
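The slide above replaces the list of usernames with a bitmap: each user is assigned a bit position, and that bit is set in the row for the user's postal code. A rough sketch of the first byte of each row follows. The `POSITIONS` table stands in for the talk's `hash()` function, and every position other than `cstar` at 4 is assumed so that the output lines up with the slide.

```python
# Stand-in for hash(): only cstar -> 4 comes from the slide,
# the other positions are assumed for illustration.
POSITIONS = {"cstar": 4, "user2": 0, "user4": 5, "user7": 6}


def build_bitmap_index(rows, width=8):
    """Map each postal code to an int whose set bits mark the matching users."""
    index = {}
    for username, postal_code in rows:
        # Position 0 is the leftmost bit, matching how the slides print bitmaps.
        mask = 1 << (width - 1 - POSITIONS[username])
        index[postal_code] = index.get(postal_code, 0) | mask
    return index


def render(bitmap, width=8):
    """Print a bitmap as a fixed-width string of 0s and 1s."""
    return format(bitmap, "0{}b".format(width))


rows = [("cstar", 94110), ("user2", 94112), ("user4", 94112), ("user7", 94112)]
index = build_bitmap_index(rows)
print(render(index[94110]))  # 00001000
print(render(index[94112]))  # 10000110
```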
#CASSANDRA13
SELECT * FROM users WHERE zipcode = 94110 OR zipcode = 94112
94112 or 94110
10001110 01000000 00000000 000000000
Field Index
94110 10001010 01000000 00000000 000000000
94112 10000110 01000000 00000000 000000000
#CASSANDRA13
SELECT * FROM users WHERE Event1 = true AND Event2 = true
Event1 and Event2
10000010 01000000 00000000 000000000
Field Index
Event1 10001010 01000000 00000000 000000000
Event2 10000110 01000000 00000000 000000000
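Both query slides above reduce a WHERE clause to a bitwise operation over two index rows: OR for the postal code query, AND for the event query. The sketch below replays the first two bytes of the slide bitmaps; the helper names `parse` and `render` are hypothetical.

```python
def parse(bits):
    """Parse a space-separated bit string as printed on the slides."""
    return int(bits.replace(" ", ""), 2)


def render(value, width=16, group=8):
    """Format an int back into space-separated groups of bits."""
    bits = format(value, "0{}b".format(width))
    return " ".join(bits[i:i + group] for i in range(0, width, group))


# First two bytes of the two index rows shown on both slides.
first = parse("10001010 01000000")
second = parse("10000110 01000000")

print(render(first | second))  # OR query:  10001110 01000000
print(render(first & second))  # AND query: 10000010 01000000
```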
#CASSANDRA13
Field Value Slice
event_counter 1 10001010 01000000 00000000 000000000
event_counter 2 10000110 01000000 00000000 000000000
SELECT * FROM users WHERE event_counter < 5
Value1 or Value2
10001110 01000000 00000000 000000000
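A range predicate over a bit-sliced field can be answered by ORing together every value slice that satisfies the bound. The sketch below uses the first byte of the two `event_counter` slices from the slide; the slice for value 7 is a made-up entry added only to show a value that falls outside the range.

```python
from functools import reduce

# First byte of each slice, keyed by counter value.
SLICES = {
    1: 0b10001010,  # from the slide
    2: 0b10000110,  # from the slide
    7: 0b01000001,  # hypothetical, outside the queried range
}


def less_than(slices, bound):
    """OR together the bitmaps of all values strictly below `bound`."""
    matching = (bitmap for value, bitmap in slices.items() if value < bound)
    return reduce(lambda acc, bitmap: acc | bitmap, matching, 0)


result = less_than(SLICES, 5)
print(format(result, "08b"))
```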
#CASSANDRA13
"this is a test string"
#CASSANDRA13
['thi', 's i', 's a', ' te', 'st ', 'str', 'ing']
#CASSANDRA13
['0x746869', '0x732069', '0x732061', '0x207465', '0x737420', '0x737472', '0x696e67']
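The three slides above split a string into three-character chunks and encode each chunk as a number that can serve as a slice value. A short sketch that reproduces both lists follows; the function names are hypothetical, and ASCII input is assumed.

```python
def chunks(text, size=3):
    """Split text into consecutive, non-overlapping chunks, as on the slide."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def to_hex(chunk):
    """Encode a chunk as the hex string of its ASCII bytes."""
    return "0x" + chunk.encode("ascii").hex()


grams = chunks("this is a test string")
encoded = [to_hex(gram) for gram in grams]
print(grams)
print(encoded)
```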
#CASSANDRA13
Field Value Slice
text_field 0x207465 ' te' 10001010 01000000 00000000 000000000
text_field 0x696e67 'ing' 10111110 10001000 00000000 000001000
text_field 0x732061 's a' 10001010 01000001 00001000 110101110
text_field 0x732069 's i' 10001010 01000000 10110011 000000000
text_field 0x737420 'st ' 10001010 01001100 10110111 000000000
text_field 0x737472 'str' 10001010 01000000 00011010 011000000
text_field 0x746869 'thi' 10001010 01000000 10110111 000000010
#CASSANDRA13
"thi.*ing"
#CASSANDRA13
"thi" AND "ing"
#CASSANDRA13
0x746869 AND 0x696e67
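The rewrite above turns the pattern into an AND over the slices of its literal parts. A minimal sketch follows, using the `'thi'` and `'ing'` rows from the index table; it handles only the `.*` wildcard, and the helper names are hypothetical. Because the AND ignores the order of the literals, the result is best read as a candidate set that would still need checking against the full pattern.

```python
from functools import reduce

# Slices copied from the index table, keyed by the encoded chunk.
SLICES = {
    0x696e67: "10111110 10001000 00000000 000001000",  # 'ing'
    0x746869: "10001010 01000000 10110111 000000010",  # 'thi'
}


def parse(bits):
    """Parse a space-separated bit string as printed on the slides."""
    return int(bits.replace(" ", ""), 2)


def literals(pattern):
    """Split a pattern on '.*', the only wildcard this sketch understands."""
    return [part for part in pattern.split(".*") if part]


def key(literal):
    """Encode a three-character literal as its big-endian ASCII value."""
    return int.from_bytes(literal.encode("ascii"), "big")


def candidates(pattern):
    """AND together the slices of every literal in the pattern."""
    bitmaps = [parse(SLICES[key(lit)]) for lit in literals(pattern)]
    return reduce(lambda a, b: a & b, bitmaps)


result = candidates("thi.*ing")
print(format(result, "033b"))  # 33 bits, the width of the slide rows
```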
#CASSANDRA13
"th.*ing"
#CASSANDRA13
"th" AND "ing"
#CASSANDRA13
range(0x746800, 0x7468FF) AND 0x696e67
range("th" + 0x00, "th" + 0xFF) AND "ing"
#CASSANDRA13
Field Value Slice
text_field 0x207465 ' te' 10001010 01000000 00000000 000000000
text_field 0x696e67 'ing' 10111110 10001000 00000000 000001000
text_field 0x732061 's a' 10001010 01000001 00001000 110101110
text_field 0x732069 's i' 10001010 01000000 10110011 000000000
text_field 0x737420 'st ' 10001010 01001100 10110111 000000000
text_field 0x737472 'str' 10001010 01000000 00011010 011000000
text_field 0x746869 'thi' 10001010 01000000 10110111 000000010
text_field 0x74687A 'thz' 10000000 00000001 00011100 000110010
range(0x746800, 0x7468FF) AND 0x696e67
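When a literal is shorter than three characters, the slides pad it to a key range and OR every slice in that range before ANDing with the remaining literal. A sketch of that step follows, using the `'thi'`, `'thz'`, `'str'` and `'ing'` rows from the table above; the helper names are hypothetical, and the linear scan over `SLICES` stands in for whatever ordered lookup the real index uses.

```python
# Slices copied from the index table, keyed by the encoded chunk.
SLICES = {
    0x696e67: "10111110 10001000 00000000 000001000",  # 'ing'
    0x737472: "10001010 01000000 00011010 011000000",  # 'str'
    0x746869: "10001010 01000000 10110111 000000010",  # 'thi'
    0x74687a: "10000000 00000001 00011100 000110010",  # 'thz'
}


def parse(bits):
    """Parse a space-separated bit string as printed on the slides."""
    return int(bits.replace(" ", ""), 2)


def prefix_range(prefix, width=3):
    """Pad a short prefix with 0x00 and 0xFF to get inclusive key bounds."""
    raw = prefix.encode("ascii")
    pad = width - len(raw)
    low = int.from_bytes(raw + b"\x00" * pad, "big")
    high = int.from_bytes(raw + b"\xff" * pad, "big")
    return low, high


def union_in_range(low, high):
    """OR together every slice whose key lies within [low, high]."""
    acc = 0
    for slice_key, bits in SLICES.items():
        if low <= slice_key <= high:
            acc |= parse(bits)
    return acc


low, high = prefix_range("th")
result = union_in_range(low, high) & parse(SLICES[0x696e67])
print(hex(low), hex(high))
print(format(result, "033b"))
```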
#CASSANDRA13
Implementation
#CASSANDRA13
Query & Indexing Engine
Queries and Events
#CASSANDRA13
RowKey Offset 0x00 Offset 0x01 Offset 0x02 Offset 0x03
event1_0x00 10011000 10011000
event1_0x01 10011000 10011000 10011000
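The layout above appears to split each bitmap into fixed-size chunks, stored as columns keyed by offset, with a numeric suffix on the row key selecting the segment. The sketch below is one possible reading of that slide, not the project's confirmed scheme: the chunk size of 8 bits and the 4 chunks per row are assumptions taken from what the slide draws, and `locate` is a hypothetical helper.

```python
CHUNK_BITS = 8      # assumed: each column value on the slide is one byte
CHUNKS_PER_ROW = 4  # assumed: the slide shows offsets 0x00 through 0x03


def locate(field, position):
    """Map a bit position to a (row key, column name, bit within chunk) triple."""
    chunk = position // CHUNK_BITS
    row_key = "{}_0x{:02x}".format(field, chunk // CHUNKS_PER_ROW)
    column = "Offset 0x{:02x}".format(chunk % CHUNKS_PER_ROW)
    return row_key, column, position % CHUNK_BITS


print(locate("event1", 3))   # ('event1_0x00', 'Offset 0x00', 3)
print(locate("event1", 44))  # ('event1_0x01', 'Offset 0x01', 4)
```

Under this reading, chunks that contain no set bits need not be written at all, which would explain why the rows on the slide have different numbers of columns.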
#CASSANDRA13
Results So Far
*Results returned for an 8-clause query over 4 billion rows in < 2 seconds
*Full regular expression support
*Full support for range queries
*Ability to index any numeric value, or value which can be hashed.
#CASSANDRA13
What Isn't Finished
*Support for atomic counters
*"Group By" query aggregation
*Still working on event processing and distribution
#CASSANDRA13
https://github.com/project-z/
#CASSANDRA13
[email protected]
@mattstump
#CASSANDRA13
THANK YOU