Cassandra 3 new features @ Geecon Krakow 2016
![Page 1: Cassandra 3 new features @ Geecon Krakow 2016](https://reader031.fdocuments.net/reader031/viewer/2022030317/586fde6e1a28ab18428b6bc3/html5/thumbnails/1.jpg)
Cassandra 3.0 new features
DuyHai DOAN Apache Cassandra Evangelist
![Page 2: Cassandra 3 new features @ Geecon Krakow 2016](https://reader031.fdocuments.net/reader031/viewer/2022030317/586fde6e1a28ab18428b6bc3/html5/thumbnails/2.jpg)
Duyhai DOAN (@doanduyhai) Kraków, 11-13 May 2016
Who am I?
Apache Cassandra Evangelist
• talks, meetups, confs
• open-source devs (Achilles, Apache Zeppelin)
• OSS Cassandra point of contact
☞ [email protected]
☞ @doanduyhai
![Page 3: Cassandra 3 new features @ Geecon Krakow 2016](https://reader031.fdocuments.net/reader031/viewer/2022030317/586fde6e1a28ab18428b6bc3/html5/thumbnails/3.jpg)
Datastax
• Founded in April 2010
• We contribute a lot to Apache Cassandra™
• 400+ customers (25 of the Fortune 100), 450+ employees
• Headquartered in the San Francisco Bay Area
• EU headquarters in London, offices in France and Germany
• Datastax Enterprise = OSS Cassandra + extra features
![Page 4: Cassandra 3 new features @ Geecon Krakow 2016](https://reader031.fdocuments.net/reader031/viewer/2022030317/586fde6e1a28ab18428b6bc3/html5/thumbnails/4.jpg)
Agenda
• Materialized Views (MV)
• User Defined Functions (UDF) & User Defined Aggregates (UDA)
• JSON syntax
• New SASI full text search
![Page 5: Cassandra 3 new features @ Geecon Krakow 2016](https://reader031.fdocuments.net/reader031/viewer/2022030317/586fde6e1a28ab18428b6bc3/html5/thumbnails/5.jpg)
Materialized Views (MV)
![Page 6: Cassandra 3 new features @ Geecon Krakow 2016](https://reader031.fdocuments.net/reader031/viewer/2022030317/586fde6e1a28ab18428b6bc3/html5/thumbnails/6.jpg)
Why Materialized Views?
• Relieve the pain of manual denormalization
CREATE TABLE user(id int PRIMARY KEY, country text, …);
CREATE TABLE user_by_country(country text, id int, …, PRIMARY KEY(country, id));
![Page 7: Cassandra 3 new features @ Geecon Krakow 2016](https://reader031.fdocuments.net/reader031/viewer/2022030317/586fde6e1a28ab18428b6bc3/html5/thumbnails/7.jpg)
Materialized Views creation
-- Manually maintained denormalized table (what the view replaces)
CREATE TABLE user_by_country (
  country text,
  id int,
  firstname text,
  lastname text,
  PRIMARY KEY(country, id));
-- The same layout, declared as a materialized view of the user table
CREATE MATERIALIZED VIEW user_by_country AS
  SELECT country, id, firstname, lastname
  FROM user
  WHERE country IS NOT NULL AND id IS NOT NULL
  PRIMARY KEY(country, id);
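A minimal sketch of how the view behaves once created. The `firstname` and `lastname` columns of `user` are assumed from the view's SELECT list (the slide elides them with "…"), and the sample values are made up for illustration.

```cql
-- Base table; firstname/lastname are assumed, the slide only shows "…"
CREATE TABLE user (
  id int PRIMARY KEY,
  country text,
  firstname text,
  lastname text
);

CREATE MATERIALIZED VIEW user_by_country AS
  SELECT country, id, firstname, lastname
  FROM user
  WHERE country IS NOT NULL AND id IS NOT NULL
  PRIMARY KEY (country, id);

-- Writes always go to the base table, never to the view
INSERT INTO user (id, country, firstname, lastname)
VALUES (1, 'FR', 'John', 'Doe');

-- Reads by country go to the view
SELECT id, firstname, lastname FROM user_by_country WHERE country = 'FR';

-- Changing the country moves the row to another view partition:
-- Cassandra deletes the old view row ('FR', 1) and inserts ('PL', 1)
UPDATE user SET country = 'PL' WHERE id = 1;
```

The view's primary key has to contain every primary key column of the base table, which is why `id` appears after `country`.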
![Page 8: Cassandra 3 new features @ Geecon Krakow 2016](https://reader031.fdocuments.net/reader031/viewer/2022030317/586fde6e1a28ab18428b6bc3/html5/thumbnails/8.jpg)
Materialized View Demo
![Page 9: Cassandra 3 new features @ Geecon Krakow 2016](https://reader031.fdocuments.net/reader031/viewer/2022030317/586fde6e1a28ab18428b6bc3/html5/thumbnails/9.jpg)
Materialized Views Performance
• Write performance
  • slower than normal write
  • local lock + read-before-write cost (but paid only once for all views)
  • for each base table update, worst case: mv_count x 2 (DELETE + INSERT) extra mutations for the views
![Page 10: Cassandra 3 new features @ Geecon Krakow 2016](https://reader031.fdocuments.net/reader031/viewer/2022030317/586fde6e1a28ab18428b6bc3/html5/thumbnails/10.jpg)
Materialized Views Performance
• Write performance vs manual denormalization
  • MV better because no client-server network traffic for read-before-write
  • MV better because less network traffic for multiple views (client-side BATCH)
• Makes developer life easier → priceless
![Page 11: Cassandra 3 new features @ Geecon Krakow 2016](https://reader031.fdocuments.net/reader031/viewer/2022030317/586fde6e1a28ab18428b6bc3/html5/thumbnails/11.jpg)
Materialized Views Performance
• Read performance vs secondary index
  • MV better because single node read (secondary index can hit many nodes)
  • MV better because single read path (secondary index = read index + read data)
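The two read paths compared above can be sketched side by side, reusing the `user` table and `user_by_country` view from the earlier slides. The index name is illustrative.

```cql
-- Secondary index: the index is local to each node, so the coordinator
-- may have to contact many nodes, and each of them reads the index first
-- and the data second
CREATE INDEX user_country_idx ON user (country);
SELECT * FROM user WHERE country = 'FR';

-- Materialized view: country is the partition key of the view, so the
-- query goes straight to the replicas that own that partition
SELECT * FROM user_by_country WHERE country = 'FR';
```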
![Page 12: Cassandra 3 new features @ Geecon Krakow 2016](https://reader031.fdocuments.net/reader031/viewer/2022030317/586fde6e1a28ab18428b6bc3/html5/thumbnails/12.jpg)
Materialized Views Consistency
• Consistency level
  • CL honoured for base table, ONE for MV + local batchlog
• Weaker consistency guarantees for MV than for base table
![Page 13: Cassandra 3 new features @ Geecon Krakow 2016](https://reader031.fdocuments.net/reader031/viewer/2022030317/586fde6e1a28ab18428b6bc3/html5/thumbnails/13.jpg)
Q & A
![Page 14: Cassandra 3 new features @ Geecon Krakow 2016](https://reader031.fdocuments.net/reader031/viewer/2022030317/586fde6e1a28ab18428b6bc3/html5/thumbnails/14.jpg)
User Defined Functions (UDF)
![Page 15: Cassandra 3 new features @ Geecon Krakow 2016](https://reader031.fdocuments.net/reader031/viewer/2022030317/586fde6e1a28ab18428b6bc3/html5/thumbnails/15.jpg)
Rationale
• Push computation server-side
  • save network bandwidth (1000 nodes!)
  • simplify client-side code
  • provide standard & useful functions (sum, avg …)
  • accelerate analytics use-case (pre-aggregation for Spark)
![Page 16: Cassandra 3 new features @ Geecon Krakow 2016](https://reader031.fdocuments.net/reader031/viewer/2022030317/586fde6e1a28ab18428b6bc3/html5/thumbnails/16.jpg)
How to create a UDF?
CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS]
  [keyspace.]functionName (param1 type1, param2 type2, …)
  CALLED ON NULL INPUT | RETURNS NULL ON NULL INPUT
  RETURNS returnType
  LANGUAGE language
  AS $$ // source code here $$;
![Page 17: Cassandra 3 new features @ Geecon Krakow 2016](https://reader031.fdocuments.net/reader031/viewer/2022030317/586fde6e1a28ab18428b6bc3/html5/thumbnails/17.jpg)
How to create a UDF? (parameters)
• (param1 type1, param2 type2, …): the param name is what the code refers to; the type is a Cassandra type
![Page 18: Cassandra 3 new features @ Geecon Krakow 2016](https://reader031.fdocuments.net/reader031/viewer/2022030317/586fde6e1a28ab18428b6bc3/html5/thumbnails/18.jpg)
How to create a UDF? (CALLED ON NULL INPUT)
• The function is always called. Null-check mandatory in code!
![Page 19: Cassandra 3 new features @ Geecon Krakow 2016](https://reader031.fdocuments.net/reader031/viewer/2022030317/586fde6e1a28ab18428b6bc3/html5/thumbnails/19.jpg)
How to create a UDF? (RETURNS NULL ON NULL INPUT)
• If any input is null, function execution is skipped and null is returned
![Page 20: Cassandra 3 new features @ Geecon Krakow 2016](https://reader031.fdocuments.net/reader031/viewer/2022030317/586fde6e1a28ab18428b6bc3/html5/thumbnails/20.jpg)
How to create a UDF? (types)
Cassandra types
• primitives (boolean, int, …)
• collections (list, set, map)
• tuples
• UDT
![Page 21: Cassandra 3 new features @ Geecon Krakow 2016](https://reader031.fdocuments.net/reader031/viewer/2022030317/586fde6e1a28ab18428b6bc3/html5/thumbnails/21.jpg)
How to create a UDF? (LANGUAGE)
JVM supported languages
• Java, Scala
• Javascript (slow)
• Groovy, Jython, JRuby
• Clojure (JSR 223 impl issue)
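A minimal sketch of two concrete functions, one per null-handling mode. The function names (`maxof`, `fullname`) are illustrative and not from the talk. It assumes UDFs are enabled on the nodes (`enable_user_defined_functions: true` in `cassandra.yaml`), that a keyspace is selected, and that the `user` table has `firstname` and `lastname` columns.

```cql
-- CALLED ON NULL INPUT: the body runs even when an argument is null,
-- so the Java code receives boxed Integers and must null-check them
CREATE OR REPLACE FUNCTION maxof(current int, candidate int)
  CALLED ON NULL INPUT
  RETURNS int
  LANGUAGE java
  AS $$
    if (current == null) return candidate;
    if (candidate == null) return current;
    return Math.max(current, candidate);
  $$;

-- RETURNS NULL ON NULL INPUT: Cassandra skips the body and returns null
-- as soon as one argument is null, so no null check is needed
CREATE OR REPLACE FUNCTION fullname(firstname text, lastname text)
  RETURNS NULL ON NULL INPUT
  RETURNS text
  LANGUAGE java
  AS $$
    return firstname + " " + lastname;
  $$;

-- A UDF is called like any built-in function
SELECT id, fullname(firstname, lastname) FROM user WHERE id = 1;
```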
![Page 22: Cassandra 3 new features @ Geecon Krakow 2016](https://reader031.fdocuments.net/reader031/viewer/2022030317/586fde6e1a28ab18428b6bc3/html5/thumbnails/22.jpg)
UDF Demo
![Page 23: Cassandra 3 new features @ Geecon Krakow 2016](https://reader031.fdocuments.net/reader031/viewer/2022030317/586fde6e1a28ab18428b6bc3/html5/thumbnails/23.jpg)
User Defined Aggregate (UDA)
• Real use-case for UDF
• Aggregation server-side → huge network bandwidth saving
• Provide similar behavior for Group By, Sum, Avg etc …
![Page 24: Cassandra 3 new features @ Geecon Krakow 2016](https://reader031.fdocuments.net/reader031/viewer/2022030317/586fde6e1a28ab18428b6bc3/html5/thumbnails/24.jpg)
How to create a UDA?
CREATE [OR REPLACE] AGGREGATE [IF NOT EXISTS]
  [keyspace.]aggregateName(type1, type2, …)
  SFUNC accumulatorFunction
  STYPE stateType
  [FINALFUNC finalFunction]
  INITCOND initCond;
• aggregateName(type1, type2, …): only types, no param names
• STYPE stateType: state type
• INITCOND initCond: initial state value
![Page 25: Cassandra 3 new features @ Geecon Krakow 2016](https://reader031.fdocuments.net/reader031/viewer/2022030317/586fde6e1a28ab18428b6bc3/html5/thumbnails/25.jpg)
How to create a UDA? (accumulator function)
Accumulator function signature:
  accumulatorFunction(stateType, type1, type2, …)
  RETURNS stateType
Accumulator function ≈ foldLeft function
![Page 26: Cassandra 3 new features @ Geecon Krakow 2016](https://reader031.fdocuments.net/reader031/viewer/2022030317/586fde6e1a28ab18428b6bc3/html5/thumbnails/26.jpg)
How to create a UDA? (final function)
Optional final function signature: finalFunction(stateType)
![Page 27: Cassandra 3 new features @ Geecon Krakow 2016](https://reader031.fdocuments.net/reader031/viewer/2022030317/586fde6e1a28ab18428b6bc3/html5/thumbnails/27.jpg)
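A minimal sketch of a complete aggregate built from those pieces, modelled on the average example in the Cassandra CQL documentation. All names (`avg_state`, `avg_final`, `average`, `sensor_data`) are illustrative and not from the talk. The state is a tuple holding a running count and a running sum.

```cql
-- Accumulator: called once per row, folds the row value into the state
CREATE OR REPLACE FUNCTION avg_state(state tuple<int, bigint>, val int)
  CALLED ON NULL INPUT
  RETURNS tuple<int, bigint>
  LANGUAGE java
  AS $$
    if (val != null) {
      state.setInt(0, state.getInt(0) + 1);               // row count
      state.setLong(1, state.getLong(1) + val.intValue()); // running sum
    }
    return state;
  $$;

-- Final function: turns the last state into the value returned to the client
CREATE OR REPLACE FUNCTION avg_final(state tuple<int, bigint>)
  CALLED ON NULL INPUT
  RETURNS double
  LANGUAGE java
  AS $$
    if (state.getInt(0) == 0) return null;
    return Double.valueOf((double) state.getLong(1) / state.getInt(0));
  $$;

-- The aggregate declares types only; INITCOND is the starting state (0 rows, sum 0)
CREATE OR REPLACE AGGREGATE average(int)
  SFUNC avg_state
  STYPE tuple<int, bigint>
  FINALFUNC avg_final
  INITCOND (0, 0);

-- Hypothetical table used only to show the call
CREATE TABLE sensor_data (
  sensor_id int,
  ts timestamp,
  value int,
  PRIMARY KEY (sensor_id, ts)
);

SELECT average(value) FROM sensor_data WHERE sensor_id = 42;
```

The accumulator takes the state as its first argument followed by the aggregate's argument types, and returns the state type, which is what makes it behave like a foldLeft over the selected rows.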
![Page 28: Cassandra 3 new features @ Geecon Krakow 2016](https://reader031.fdocuments.net/reader031/viewer/2022030317/586fde6e1a28ab18428b6bc3/html5/thumbnails/28.jpg)
UDA Demo
![Page 29: Cassandra 3 new features @ Geecon Krakow 2016](https://reader031.fdocuments.net/reader031/viewer/2022030317/586fde6e1a28ab18428b6bc3/html5/thumbnails/29.jpg)
Gotchas
• UDA in Cassandra is not distributed!
• Do not execute UDA on a large number of rows (10^6 for ex.)
  • single fat partition
  • multiple partitions
  • full table scan
• → Increase client-side timeout
  • default Java driver timeout = 12 secs
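A small sketch of the difference in scope, using the built-in `count` aggregate on the `user` table and `user_by_country` view from the earlier slides. It assumes built-in aggregates are evaluated on the coordinator in the same way as UDAs, so the same caution applies to both.

```cql
-- Bounded: one partition of the view, rows read from a single replica set
SELECT count(*) FROM user_by_country WHERE country = 'FR';

-- Unbounded: full table scan, every row in the cluster is streamed to
-- the coordinator before the aggregate can return; likely to time out
-- on a large table
SELECT count(*) FROM user;
```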
![Page 30: Cassandra 3 new features @ Geecon Krakow 2016](https://reader031.fdocuments.net/reader031/viewer/2022030317/586fde6e1a28ab18428b6bc3/html5/thumbnails/30.jpg)
Cassandra UDA or Apache Spark?

| Consistency Level | Single/Multiple Partition(s) | Recommended Approach |
|---|---|---|
| ONE | Single partition | UDA with token-aware driver because node local |
| ONE | Multiple partitions | Apache Spark because distributed reads |
| > ONE | Single partition | UDA because data-locality lost with Spark |
| > ONE | Multiple partitions | Apache Spark definitely |
![Page 31: Cassandra 3 new features @ Geecon Krakow 2016](https://reader031.fdocuments.net/reader031/viewer/2022030317/586fde6e1a28ab18428b6bc3/html5/thumbnails/31.jpg)
Q & A
![Page 32: Cassandra 3 new features @ Geecon Krakow 2016](https://reader031.fdocuments.net/reader031/viewer/2022030317/586fde6e1a28ab18428b6bc3/html5/thumbnails/32.jpg)
JSON Syntax
![Page 33: Cassandra 3 new features @ Geecon Krakow 2016](https://reader031.fdocuments.net/reader031/viewer/2022030317/586fde6e1a28ab18428b6bc3/html5/thumbnails/33.jpg)
Why JSON?
• JSON is a very good exchange format
• But a terrible schema …
• How to have best of both worlds?
  • use Cassandra schema
  • convert rows to JSON format
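A minimal sketch of the JSON syntax, reusing the `user` table from the earlier slides. The `firstname` and `lastname` columns and the sample values are assumptions for illustration. The table keeps its typed Cassandra schema; only the exchange format changes.

```cql
-- Insert a whole row from a JSON document; keys map to column names
-- and values are validated against the column types
INSERT INTO user JSON
  '{"id": 1, "country": "FR", "firstname": "John", "lastname": "Doe"}';

-- Read a row back as one JSON document (returned in a single [json] column)
SELECT JSON * FROM user WHERE id = 1;

-- Convert individual values instead of whole rows
SELECT id, toJson(country) FROM user WHERE id = 1;
INSERT INTO user (id, country) VALUES (2, fromJson('"PL"'));
```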
![Page 34: Cassandra 3 new features @ Geecon Krakow 2016](https://reader031.fdocuments.net/reader031/viewer/2022030317/586fde6e1a28ab18428b6bc3/html5/thumbnails/34.jpg)
JSON Syntax Demo
![Page 35: Cassandra 3 new features @ Geecon Krakow 2016](https://reader031.fdocuments.net/reader031/viewer/2022030317/586fde6e1a28ab18428b6bc3/html5/thumbnails/35.jpg)
SASI full text search index
![Page 36: Cassandra 3 new features @ Geecon Krakow 2016](https://reader031.fdocuments.net/reader031/viewer/2022030317/586fde6e1a28ab18428b6bc3/html5/thumbnails/36.jpg)
Why SASI?
• Searching (and full text search) was always a pain point for Cassandra
  • limited search predicates (=, <=, <, > and >= only)
  • limited scope (only on primary key columns)
• Existing secondary index performance is poor
  • reversed-index
  • use Cassandra itself as index storage …
  • limited predicate (=). Inequality predicate = full cluster scan 😱
![Page 37: Cassandra 3 new features @ Geecon Krakow 2016](https://reader031.fdocuments.net/reader031/viewer/2022030317/586fde6e1a28ab18428b6bc3/html5/thumbnails/37.jpg)
How is it implemented?
• New index structure = suffix trees
• Extended predicates (=, inequalities, LIKE %)
• Full text search (tokenizers, stop-words, stemming …)
• Query Planner to optimize AND predicates
• NO, we don’t use Apache Lucene
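A minimal sketch of two SASI indexes on the `user` table from the earlier slides: one for LIKE matching on a short column, one for analyzed full text search. The index names and the `bio` column are hypothetical, and the option values follow the SASI documentation rather than the talk's demo.

```cql
-- Case-insensitive prefix/suffix/substring matching on firstname
CREATE CUSTOM INDEX user_firstname_idx ON user (firstname)
  USING 'org.apache.cassandra.index.sasi.SASIIndex'
  WITH OPTIONS = {
    'mode': 'CONTAINS',
    'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
    'case_sensitive': 'false'
  };

SELECT * FROM user WHERE firstname LIKE 'jo%';   -- prefix
SELECT * FROM user WHERE firstname LIKE '%oh%';  -- contains

-- Full text search on a hypothetical free-text column
ALTER TABLE user ADD bio text;

CREATE CUSTOM INDEX user_bio_idx ON user (bio)
  USING 'org.apache.cassandra.index.sasi.SASIIndex'
  WITH OPTIONS = {
    'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
    'analyzed': 'true',
    'tokenization_enable_stemming': 'true',
    'tokenization_normalize_lowercase': 'true',
    'tokenization_locale': 'en'
  };

-- With stemming enabled, the search term is stemmed like the indexed text
SELECT * FROM user WHERE bio LIKE 'distributing';
```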
![Page 38: Cassandra 3 new features @ Geecon Krakow 2016](https://reader031.fdocuments.net/reader031/viewer/2022030317/586fde6e1a28ab18428b6bc3/html5/thumbnails/38.jpg)
Who made it?
• Open source contribution by a team of engineers from …
![Page 39: Cassandra 3 new features @ Geecon Krakow 2016](https://reader031.fdocuments.net/reader031/viewer/2022030317/586fde6e1a28ab18428b6bc3/html5/thumbnails/39.jpg)
Full Text Search Demo
![Page 40: Cassandra 3 new features @ Geecon Krakow 2016](https://reader031.fdocuments.net/reader031/viewer/2022030317/586fde6e1a28ab18428b6bc3/html5/thumbnails/40.jpg)
When is it available?
• Right now with Cassandra ≥ 3.5
  • available since Cassandra 3.4, but with critical bugs
• Later improvements
  • index on collections (List, Set & Map)
  • OR clause (WHERE (xxx OR yyy) AND zzz)
  • != operator
![Page 41: Cassandra 3 new features @ Geecon Krakow 2016](https://reader031.fdocuments.net/reader031/viewer/2022030317/586fde6e1a28ab18428b6bc3/html5/thumbnails/41.jpg)
SASI vs Search Engine
SASI vs Solr/ElasticSearch/Datastax Enterprise Search?
• Cassandra is not a search engine!!! (database = durability)
• always slower because 2 passes (SASI index read + original Cassandra data)
• no scoring
• no ordering (ORDER BY)
• no grouping (GROUP BY) → Apache Spark for analytics
![Page 42: Cassandra 3 new features @ Geecon Krakow 2016](https://reader031.fdocuments.net/reader031/viewer/2022030317/586fde6e1a28ab18428b6bc3/html5/thumbnails/42.jpg)
Q & A