C* path

C* Path: Denormalize your data. Eric Zoerner | Software Developer, eBuddy BV. Cassandra Summit Europe 2013, London. Tuesday 22 October 2013.

description

Library for decomposing your structured data and storing it in Cassandra. Same simple API implemented for both Thrift and CQL.

Transcript of C* path

Page 1: C* path

#CASSANDRAEU CASSANDRASUMMITEU

C* Path: Denormalize your data

Eric Zoerner | Software Developer, eBuddy BV Cassandra Summit Europe 2013 London

Tuesday 22 October 2013

Page 2: C* path

Topics

• About eBuddy

• Introducing C* Path

• How does it work?

• Design and Challenges

• Cassandra Data Model

• Futures

Page 3: C* path

About eBuddy

Page 4: C* path

XMS

Pages 5-9: C* path

Cassandra in eBuddy Messaging Platform

• User Data Service

• User Discovery Service

• Persistent Session Store

• Message History

• Location-based Discovery

Page 10: C* path

Some Statistics

• Current size of data: 1.4 TB total (3x replication); 467 GB actual data

• 16 million sessions (11 million users plus groups)

• Almost a billion rows in one column family (inverse social graph)

Page 11: C* path

C* Path

Page 12: C* path

The Problem (a “classic”)

Complex Object

[Class diagram: Person (name: String, birthdate: Date, nickname: String) has 1 to many Address (street: String, city: String, province: String, postalCode: String, countryCode: String) and 1 to many Phone (name: String, number: String)]

??

Key-Value Store (RDB table, NoSQL, etc.)

Pages 13-15: C* path

Some Strategies

Serialization!

Normalization!

[Figure: three normalized tables: Person (id, name, birthdate, nickname), Address (person_id, address_id, street, city), and Phone (person_id, name, phone)]

Decomposition!

name/ John

addresses/@0/street 123 Main St.

phones/@0/number +31123456789

... ...

Page 16: C* path

Strategies Comparison

                      Serialization   Normalization   Decomposition
Single Write          ✔               ✘               ✔
Single Read           ✔               ✘               ✔
Consistent Updates    ✔               ✔               not enforced
Structural Access     ✘               ✔               ✔
Cycles                ✔               ✔               ✘

Page 17: C* path

C* Path

Open Source Java Library for decomposing complex objects into Path-Value pairs and storing them in Cassandra

https://github.com/ebuddy/c-star-path

* Artifacts available at Maven Central.

Pages 18-20: C* path

C* Path: Decomposition

• Easy to Use

• Simple API

• Good for Cassandra because:

– Structural Access: Write parts of objects without reading first

– Good for denormalizing data: large complex objects can be read or written with one read or write operation

Page 21: C* path

How does it work?

Pages 22-24: C* path

API Example - Write to a Path

StructuredDataSupport<UUID> dao = … ;
UUID rowKey = … ;
Pojo pojo = … ;

Path path = dao.createPath("some", "path", "to", "my", "pojo");

dao.writeToPath(rowKey, path, pojo);

Pages 25-26: C* path

API Example - Read from a Path

Path path = dao.createPath("some", "path", "to", "my", "pojo");

Pojo pojo = dao.readFromPath(rowKey, path, new TypeReference<Pojo>() { });

Page 27: C* path

API Example - Delete

dao.deletePath(rowKey, path);

Page 28: C* path

API Example - Batch Operations

BatchContext batch = dao.beginBatch();

dao.writeToPath(rowKey1, path, pojo1, batch);
dao.writeToPath(rowKey2, path, pojo2, batch);
dao.deletePath(rowKey3, path, batch);

dao.applyBatch(batch);

Pages 29-30: C* path

Read or write at any level of a path

Person person = …;

Path path = dao.createPath("x");
dao.writeToPath(rowKey, path, person);

Path pathToName = path.withElements("name");
String name = dao.readFromPath(rowKey, pathToName, stringTypeReference);

Pages 31-33: C* path

Write Implementation: Decomposition

• Step 1:

– Convert the domain object into a basic structure of Maps, Lists, and simple values. This uses the Jackson (FasterXML) library and honors the Jackson annotations

• Step 2:

– Decompose this basic structure into a map of paths to simple values (i.e. String, Number, Boolean), done by Decomposer

• Step 3:

– Write this map as key-value pairs in the database

Page 34: C* path

Example Decomposition - step 1

[Class diagram: Person (name: String, birthdate: Date, nickname: String) has 1 to many Address (street: String, city: String, province: String, postalCode: String, countryCode: String) and 1 to many Phone (name: String, number: String)]

Simplify structure into regular Maps, Lists, and simple values

Page 35: C* path

Example Decomposition - step 1

Simplify structure into regular Maps, Lists, and simple values

Map
  name = "John"
  birthdate = "-39080932298"
  nickname = "Jack"
  addresses = <List>
    [0] = <Map>
      street = "123 Main"
      place = "New York"
    [1] = <Map>
      street = "Singel 45"
      place = "Amsterdam"
  phones = <List>
    [0] = <Map>
      name = "mobile"
      number = "+31651234567"

Page 36: C* path

Example Decomposition - step 2

path                    value
name/                   "John"
birthdate/              "-39080932298"
nickname/               "Jack"
addresses/@0/street     "123 Main St."
addresses/@0/place      "New York"
addresses/@1/street     "Singel 45"
addresses/@1/place      "Amsterdam"
phones/@0/name          "mobile"
phones/@0/number        "+31651234567"
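
The flattening in step 2 can be sketched in plain Java. This is an illustrative sketch only, not the library's Decomposer: the class name DecomposeSketch is hypothetical, and ending every path with a slash is an assumption based on the paths shown on other slides.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the library's Decomposer class.
final class DecomposeSketch {

    /** Flattens nested Maps and Lists into path -> simple value pairs. */
    static Map<String, Object> decompose(Object structure) {
        Map<String, Object> out = new LinkedHashMap<>();
        walk("", structure, out);
        return out;
    }

    private static void walk(String prefix, Object node, Map<String, Object> out) {
        if (node instanceof Map) {
            // Each map key becomes one path element.
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                walk(prefix + e.getKey() + "/", e.getValue(), out);
            }
        } else if (node instanceof List) {
            // List positions become "@index" path elements.
            List<?> list = (List<?>) node;
            for (int i = 0; i < list.size(); i++) {
                walk(prefix + "@" + i + "/", list.get(i), out);
            }
        } else {
            // Simple value (String, Number, Boolean): store it under the full path.
            out.put(prefix, node);
        }
    }
}
```

Decomposing a map with a name and a one-element addresses list would yield keys such as name/ and addresses/@0/street/. URL encoding of path elements is left out here for brevity.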

Pages 37-39: C* path

Read implementation: Composition

• Step 1:

– Read path-value pairs from database

• Step 2:

– “Merge” path-value maps back into basic structure (Maps, Lists, simple values), done by Composer

• Step 3:

– Use Jackson to convert basic structure back into domain object using a TypeReference
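
The merge in step 2 can be sketched as follows. This is a hypothetical stand-in for the library's Composer, assuming slash-terminated paths and "@n" list indices as on the slides; the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the library's Composer may work differently.
final class ComposeSketch {

    /** Rebuilds nested Maps and Lists from path -> value pairs such as "phones/@0/name/". */
    @SuppressWarnings("unchecked")
    static Object compose(Map<String, Object> pathValues) {
        Map<String, Object> root = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : pathValues.entrySet()) {
            String[] elements = entry.getKey().split("/");
            Map<String, Object> node = root;
            for (int i = 0; i < elements.length - 1; i++) {
                // Descend, creating intermediate maps; list indices ("@0") are plain keys for now.
                node = (Map<String, Object>) node.computeIfAbsent(
                        elements[i], k -> new LinkedHashMap<String, Object>());
            }
            node.put(elements[elements.length - 1], entry.getValue());
        }
        return toLists(root);
    }

    /** Converts every map whose keys are all "@n" indices into a List. */
    @SuppressWarnings("unchecked")
    private static Object toLists(Object node) {
        if (!(node instanceof Map)) {
            return node;
        }
        Map<String, Object> map = (Map<String, Object>) node;
        boolean isList = !map.isEmpty();
        for (String key : map.keySet()) {
            isList &= key.startsWith("@");
        }
        if (isList) {
            List<Object> list = new ArrayList<>();
            for (int i = 0; map.containsKey("@" + i); i++) {
                list.add(toLists(map.get("@" + i)));
            }
            return list;
        }
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : map.entrySet()) {
            out.put(e.getKey(), toLists(e.getValue()));
        }
        return out;
    }
}
```

The result is the basic structure that Jackson would then convert into the domain object in step 3. The sketch does not handle a path that holds both a simple value and deeper paths; in that case the cast fails with a ClassCastException.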

Page 40: C* path

Design & Challenges

Page 41: C* path

Path Encoding

• Paths stored as strings

• Forward slashes in paths (but hidden by Path API)

• Path elements are URL-encoded internally, which frees special characters for use by the implementation

• Special characters: @ for list indices (@0, @1, @2, ...)
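
A minimal sketch of this encoding scheme, using only the JDK. The class PathEncodingSketch and the convention of passing list indices as Integer values are assumptions for illustration; the library hides its own encoding behind the Path API.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical sketch of the encoding idea; not the library's Path implementation.
final class PathEncodingSketch {

    /** Builds a path string; Integer elements are treated as list indices. */
    static String encodePath(Object... elements) {
        StringBuilder sb = new StringBuilder();
        for (Object element : elements) {
            if (element instanceof Integer) {
                // List index: the '@' is written unencoded, so it stays special.
                sb.append('@').append(element);
            } else {
                // Ordinary element: '/' and '@' inside it are percent-encoded.
                sb.append(urlEncode(element.toString()));
            }
            sb.append('/');
        }
        return sb.toString();
    }

    private static String urlEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }
}
```

Because user-supplied elements are encoded, an element that happens to contain a slash or an at sign cannot be confused with a path separator or a list index.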

Pages 42-44: C* path

Challenge: “Shrinking Lists”

➀ Write a list.

dao.writeToPath(key, "x", {"1","2"});
x/@0/ "1"
x/@1/ "2"

➁ Write a shorter list.

dao.writeToPath(key, "x", {"3"});
x/@0/ "3"
x/@1/ "2"

➂ Read the list.

dao.readFromPath(key, "x", new TypeReference<List<String>>() {});
{"3","2"}

Page 45: C* path

Challenge: “Shrinking Lists”

Solution: Implementation writes a list terminator value.

dao.writeToPath(key, "x", {"1","2"});
x/@0/ "1"
x/@1/ "2"
x/@2/ 0xFFFFFFFF

dao.writeToPath(key, "x", {"3"});
x/@0/ "3"
x/@1/ 0xFFFFFFFF
x/@2/ 0xFFFFFFFF

dao.readFromPath(key, "x", new TypeReference<List<String>>() {});
{"3"}

Page 46: C* path

Challenge: “Shrinking Lists”

Solution: Implementation writes a list terminator value.

Unfortunately, this is only a partial solution, because it is still possible to read “stale” list elements using a positional index in the path.

This can be avoided by doing a delete before a write, but for performance reasons the library will not do that automatically.

Conclusion: The user must know what they are doing and understand the implementation.
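
The terminator behavior and its limits can be reproduced with an in-memory stand-in for a row. This is a sketch under stated assumptions: a TreeMap plays the role of the sorted columns, the marker string stands in for the 0xFFFFFFFF terminator, and none of the names below are the library's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical in-memory model of one row; not the library's implementation.
final class ShrinkingListSketch {
    static final String TERMINATOR = "<terminator>"; // stand-in for 0xFFFFFFFF
    final TreeMap<String, String> row = new TreeMap<>();

    /** Writes each element under path/@i/ and a terminator after the last one. */
    void writeList(String path, List<String> values) {
        for (int i = 0; i < values.size(); i++) {
            row.put(path + "/@" + i + "/", values.get(i));
        }
        row.put(path + "/@" + values.size() + "/", TERMINATOR);
    }

    /** Reads elements in index order and stops at the first terminator. */
    List<String> readList(String path) {
        List<String> out = new ArrayList<>();
        for (int i = 0; ; i++) {
            String value = row.get(path + "/@" + i + "/");
            if (value == null || value.equals(TERMINATOR)) {
                return out;
            }
            out.add(value);
        }
    }

    /** Removes everything stored under the path ('0' is the character after '/'). */
    void deletePath(String path) {
        row.subMap(path + "/", path + "0").clear();
    }
}
```

Writing three elements and then one element makes readList return only the new element, yet a positional lookup of x/@2/ still finds the old third element. Calling deletePath before the second write removes that stale entry, at the cost of an extra operation.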

Pages 47-48: C* path

Challenge: Inconsistent Updates

Because objects can be updated at any path, there is no protection against a write “corrupting” an object structure.

Path path = dao.createPath("x");
dao.writeToPath(key, path, person1);

x/address/street/ "Singel 45"
x/name/ "John"

path = dao.createPath("x", "name");
dao.writeToPath(key, path, person1);

x/address/street/ "Singel 45"
x/name/ "John"
x/name/address/street/ "Singel 45"
x/name/name/ "John" ✘

Page 49: C* path

Challenge: Inconsistent Updates

Solution: Don’t do that!

* If it does happen...

The implementation provides a way to still get the “corrupted” data as simple structures, but an attempt to convert to a now incompatible POJO will fail.

Conclusion: The user must know what they are doing and understand the implementation.
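
One way to spot such a corrupted structure is to look for a path that holds a simple value and is also a prefix of deeper paths. The helper below is hypothetical and not part of the library; it assumes slash-terminated path strings in natural string order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;

// Hypothetical diagnostic helper; not part of the C* Path API.
final class PathConflictSketch {

    /** Returns stored paths that hold a value yet are also a prefix of deeper paths. */
    static List<String> findConflicts(SortedSet<String> storedPaths) {
        List<String> conflicts = new ArrayList<>();
        String previous = null;
        for (String path : storedPaths) {
            // In sorted order, any path extending 'previous' follows it immediately.
            if (previous != null && path.startsWith(previous)) {
                conflicts.add(previous);
            }
            previous = path;
        }
        return conflicts;
    }
}
```

For the four paths in the example above, this reports x/name/ as the conflicting path.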

Pages 50-54: C* path

Issue: Sorting

Question: What about sorting path elements as something other than strings, such as numerical or time-based UUID elements?

Instead of storing paths as strings, the implementation could have used DynamicComposite.

We tried it.

It can work. CQL supports it as a user-defined type.

Unfortunately it causes cqlsh to crash, making it difficult to “browse” the data.

Using DynamicComposite for paths is still under consideration for a future version.
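
The sorting issue is easy to see with list indices. The sketch below compares plain string order with numeric order; the class and its helper methods are hypothetical, and for ASCII paths Java's natural string order is assumed to match how string-typed paths are compared.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of string ordering versus numeric ordering of paths.
final class PathSortingSketch {

    /** Orders paths as plain strings. */
    static List<String> sortedAsStrings(List<String> paths) {
        List<String> copy = new ArrayList<>(paths);
        copy.sort(Comparator.naturalOrder());
        return copy;
    }

    /** Orders paths by the numeric list index in their last element. */
    static List<String> sortedByIndex(List<String> paths) {
        List<String> copy = new ArrayList<>(paths);
        copy.sort(Comparator.comparingInt(PathSortingSketch::lastIndex));
        return copy;
    }

    private static int lastIndex(String path) {
        // "x/@10/" yields 10: take the digits between the last '@' and the trailing '/'.
        int at = path.lastIndexOf('@');
        return Integer.parseInt(path.substring(at + 1, path.length() - 1));
    }
}
```

As strings, x/@10/ sorts before x/@2/, which is why typed path elements would be needed for numeric or time-based ordering.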

Page 55: C* path

Cassandra Data Model

Page 56: C* path

Thrift

Column family:

row key    column name          column value
<UUID>     x/address/street/    “Singel 45”
           x/name               “John”
           …                    …

- OR -

Super column family (coming soon):

row key    super column name    column name        column value
<UUID>     x                    address/street/    “Singel 45”
           x                    name               “John”
           …                    …                  …

Page 57: C* path

Thrift

ColumnFamilyOperations<K,String,Object> operations =
    new ColumnFamilyTemplate<K,String,Object>(keyspace, KeySerializer, StringSerializer, StructureSerializer);

StructuredDataSupport<K> dao = new ThriftStructuredDataSupport<K>(operations);

Thrift implementation relies on the Hector client.

Page 58: C* path

CQL

CREATE TABLE person (
    key text,
    path text,
    value text,
    PRIMARY KEY (key, path)
)

• Cannot use the path itself as a column name because it is “dynamic”

• Dynamic column family

Page 59: C* path

CQL: Data Model Constraints

• Need to do a range (“slice”) query on the path ⇒ path must be a clustering key

• Also, the path must be the first clustering key, since otherwise we would have to provide an equals condition on the preceding clustering keys in a query.

• One might try putting a secondary index on the path instead of making it a clustering key, but this doesn’t work since Cassandra indexes only work with equals conditions:

Bad Request: No indexed columns present in by-columns clause with Equal operator

CREATE TABLE person ( key text, path text, value text, PRIMARY KEY (key, path) )
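
The slice requirement can be illustrated with a prefix range over sorted paths. In this sketch a TreeMap stands in for one partition's clustering order; the query string, the endBound helper, and the way the upper bound is computed are assumptions for illustration, not the library's actual query.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch of a prefix slice; not the library's query code.
final class PathSliceSketch {

    // A range restriction on the clustering column; table and column names follow the slide.
    static final String SLICE_CQL =
            "SELECT path, value FROM person WHERE key = ? AND path >= ? AND path < ?";

    /** Exclusive upper bound for all strings starting with a non-empty prefix. */
    static String endBound(String prefix) {
        int last = prefix.length() - 1;
        // Increment the final character, e.g. "x/" becomes "x0".
        return prefix.substring(0, last) + (char) (prefix.charAt(last) + 1);
    }

    /** Returns every stored path that starts with the prefix, in sorted order. */
    static SortedMap<String, String> slice(TreeMap<String, String> row, String prefix) {
        return row.subMap(prefix, endBound(prefix));
    }
}
```

With the path as the first clustering key, the two bounds would be supplied as the second and third bind values, so one query returns a whole object or any sub-tree of it.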

Page 60: C* path

CQL

StructuredDataSupport<K> dao = new CqlStructuredDataSupport<K>(
    String tableName,
    String partitionKeyColumnName,
    String pathColumnName,
    String valueColumnName,
    Session session);

CQL implementation relies on the DataStax Java driver.

Page 61: C* path

And the rest…

Page 62: C* path

Planned Features

• Sets with simple values: element values stored in path

• DynamicComposites?

• Multiple row reads and writes

• Slice queries on path ranges

Page 63: C* path

Credits and Acknowledgements

• Thanks to Joost van de Wijgerd at eBuddy for his ideas and feedback

• Jackson JSON Processor, which is core to the C* Path implementation: http://wiki.fasterxml.com/JacksonHome

• Image credits:

Slide: Some Strategies | Image name: binary | Author: noegranado | Link: http://www.flickr.com/photos/43360884@N04/6949896929/

Page 64: C* path

C* Path

Open Source Java Library for decomposing complex objects into Path-Value pairs and storing them in Cassandra

https://github.com/ebuddy/c-star-path

* Artifacts available at Maven Central.
