Introduction to Apache Cassandra and support within WSO2 Platform


Srinath Perera

WSO2 Inc.

Cassandra within the WSO2 Platform

We support Apache Cassandra within the WSO2 Platform.
This provides NoSQL data support within the platform.
Cassandra can be used for both column family and key-value pair use cases.
It is fully integrated with the platform.
  o We will discuss what this means.

What is Cassandra?

Apache Cassandra: http://cassandra.apache.org/
A NoSQL column family implementation (more about it later).
Highly scalable and available, with no single point of failure.
Very high write throughput and good read throughput. It is pretty fast.
SQL-like query language (from 0.8) and search through secondary indexes (but no JOINs, Group By, etc.).
Tunable consistency and support for replication.
Flexible schema.

Column Family Data Model

Column: a name, a value, and a timestamp (ignore the timestamp for now). "Column" is a bit of a misnomer; maybe it should have been called a named cell.
  o E.g. author=“Asimov”.
Row: a named collection of columns. Entries are sorted by column name, and you can do a slice to get only some of the columns.
  o E.g. “Second Foundation”->{author=“Asimov”, publishedDate=“..”, tag=“sci-fi”, tag2=“asimov”}
Column family: a collection of rows, usually with no sort order among rows.
  o Books->{
      “Foundation”->{author=“Asimov”, publishedDate=“..”},
      “Second Foundation”->{author=“Asimov”, publishedDate=“..”}, …
    }
There is more to the model, but these are the key concepts.
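The column, row, and column family definitions above can be pictured as nested maps. The following is a minimal in-memory sketch in plain Java, not how Cassandra stores data; the class name ColumnFamilySketch and its methods are hypothetical and chosen only for illustration. Rows sit in an unordered map, and the columns of each row sit in a map sorted by column name, which is what makes a slice possible.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative model only: hypothetical names, no Cassandra API involved.
class ColumnFamilySketch {
    /** A column value plus its write timestamp; the column name is the map key. */
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Row key -> (column name -> cell). Rows are unordered; columns are sorted by name.
    private final Map<String, TreeMap<String, Cell>> rows = new HashMap<>();

    void insert(String rowKey, String columnName, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .put(columnName, new Cell(value, System.currentTimeMillis()));
    }

    /** Returns the value of one column, or null if the row or column is absent. */
    String get(String rowKey, String columnName) {
        TreeMap<String, Cell> row = rows.get(rowKey);
        if (row == null) {
            return null;
        }
        Cell cell = row.get(columnName);
        return cell == null ? null : cell.value;
    }

    /** Slice: the column names between from and to (both inclusive), in sorted order. */
    List<String> sliceNames(String rowKey, String from, String to) {
        TreeMap<String, Cell> row = rows.get(rowKey);
        if (row == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(row.subMap(from, true, to, true).keySet());
    }

    public static void main(String[] args) {
        ColumnFamilySketch books = new ColumnFamilySketch();
        books.insert("Second Foundation", "tag", "sci-fi");
        books.insert("Second Foundation", "author", "Asimov");
        books.insert("Second Foundation", "tag2", "asimov");
        books.insert("Foundation", "author", "Asimov"); // a row with fewer columns

        System.out.println(books.get("Second Foundation", "author"));
        System.out.println(books.sliceNames("Second Foundation", "tag", "tag2"));
    }
}
```

Different rows carry different columns here, and the slice returns only the columns whose names fall within the requested range.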

Column Family Data Model (Contd.)

It is crucial to understand that Cassandra columns are very different from RDBMS columns.

Columns apply only within a given row; different rows may have different columns.

You can have thousands to millions of columns in a row (2 billion max, and a row should fit in one node).

Column names may represent data, not just metadata as in an RDBMS.

You will understand more with the example.

OK?? How can I do something useful with this?

Example: Book Rating Site

Let us take a book rating site as an example. Users add books, comment on them, and tag them.
  o Can add books (author, rank, price, link)
  o Can add comments for books (text, time, name)
  o Can add tags for books
  o Need to list books sorted by rank
  o Need to list books by tag
  o Need to list comments for a book

Relational Approach

Schema
  o Books(bookid, author, rank, price, link)
  o Comments(id, text, user, time, bookid)
  o Tags(id, bookid, tag)
Queries
  o Select * from Books order by rank;
  o Select text, time, user from Comments where bookid=? order by time;
  o Select tag from Tags where bookid=?;
  o Select bookid from Tags where tag=?;
  o Select distinct author from Tags, Books where Tags.bookid=Books.bookid and tag=?;

Cassandra Approach

Schema
  o Books[BookID->(author, rank, price, link, tag1, tag2 ..)]
  o Tags2Books[TagID->(timestamp1=bookID1, timestamp2=bookID2, ..)]
  o Tags2Authors[TagID->(timestamp1=author1, timestamp2=author2, ..)]
  o Comments[BookID->(timestamp1=text + “-” + author, …)]
  o Ranks[“RANK”->(rank=bookID)]

Example data snapshot

Table | Data Items
Books | “Foundation” -> (author=asimov, rank=9, price=14, tag1=“sci-fi”, tag2=“future”)
      | “I Robot” -> (author=asimov, rank=7, price=14, tag1=“sci-fi”, tag2=“robots”)
Tags2Books | “sci-fi” -> (1311031405918=“Foundation”, 1311031405919=“I Robot”)
           | “future” -> …
Tags2Authors | “sci-fi” -> (1311031405920=“Asimov”)
             | “future” -> …
Comments | “Foundation” -> (1311031405922=“best book-sanjiva”, 1311031405923=“well I disagree-srinath”)
         | “I Robot” -> (1311031405924=“Asimov’s best-srinath”, 1311031405928=“I like foundation better-sanjiva”)
Ranks | “Rank” -> (9=“Foundation”, 7=“I Robot”)

Cassandra Approach (Contd.)

Handling Queries

Column Family | Row ID | Column Details
Books | BookID | author, rank, price, ..
Tags2Books | TagID | ts1=BookID, ts2=BookID, …
Tags2Authors | TagID | ts1=Author1, ts2=Author2, …
Comments | BookID | ts1=text+“-”+author
Ranks | “Rank” | rank=BookID

SQL Query | Cassandra Implementation
Select * from Books order by rank, then “Select tag from Tags where bookid=?” on each result | Get the ordered list of books from Ranks, then do one read from Books for each book.
Select text, time, user from Comments where bookid=? order by time | One read from Comments.
Select distinct author from Tags, Books where Tags.bookid=Books.bookid and tag=? | One read from Tags2Authors.
Select bookid from Tags where tag=? | One read from Tags2Books.
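The query mapping above works because every write updates all the column families that a later read will need. The following is a rough sketch in plain Java, with sorted maps standing in for column families; the class BookSiteSketch, its methods, and the counter used in place of real timestamps are assumptions for illustration only. Authors are de-duplicated at read time, which is one possible way to get the "distinct" behaviour.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative model only: each field stands in for one column family.
class BookSiteSketch {
    private final Map<String, TreeMap<String, String>> books = new HashMap<>();
    private final Map<String, TreeMap<Long, String>> tags2Books = new HashMap<>();
    private final Map<String, TreeMap<Long, String>> tags2Authors = new HashMap<>();
    private final Map<String, TreeMap<Long, String>> comments = new HashMap<>();
    private final TreeMap<Integer, String> ranks = new TreeMap<>(); // the single "Rank" row
    private long clock = 0; // stand-in for timestamps used as column names

    /** One logical write fans out to every column family that serves a query. */
    void addBook(String bookId, String author, int rank, String... tags) {
        TreeMap<String, String> row = books.computeIfAbsent(bookId, k -> new TreeMap<>());
        row.put("author", author);
        row.put("rank", Integer.toString(rank));
        ranks.put(rank, bookId);
        int i = 1;
        for (String tag : tags) {
            row.put("tag" + i++, tag);
            tags2Books.computeIfAbsent(tag, k -> new TreeMap<>()).put(++clock, bookId);
            tags2Authors.computeIfAbsent(tag, k -> new TreeMap<>()).put(++clock, author);
        }
    }

    void addComment(String bookId, String text, String user) {
        comments.computeIfAbsent(bookId, k -> new TreeMap<>()).put(++clock, text + "-" + user);
    }

    Map<String, String> book(String bookId) { return books.get(bookId); }

    /** Books in ascending rank order: one read of the "Rank" row. */
    List<String> booksByRank() { return new ArrayList<>(ranks.values()); }

    List<String> booksByTag(String tag) { return values(tags2Books, tag); }

    List<String> authorsByTag(String tag) {
        return new ArrayList<>(new LinkedHashSet<>(values(tags2Authors, tag)));
    }

    /** Comments come back in time order because column names are timestamps. */
    List<String> commentsFor(String bookId) { return values(comments, bookId); }

    private static List<String> values(Map<String, TreeMap<Long, String>> cf, String rowKey) {
        TreeMap<Long, String> row = cf.get(rowKey);
        return row == null ? new ArrayList<>() : new ArrayList<>(row.values());
    }

    public static void main(String[] args) {
        BookSiteSketch site = new BookSiteSketch();
        site.addBook("Foundation", "asimov", 9, "sci-fi", "future");
        site.addBook("I Robot", "asimov", 7, "sci-fi", "robots");
        site.addComment("Foundation", "best book", "sanjiva");
        System.out.println(site.booksByRank());          // [I Robot, Foundation]
        System.out.println(site.authorsByTag("sci-fi")); // [asimov]
        System.out.println(site.commentsFor("Foundation"));
    }
}
```

Note that using the rank as a column name means two books with the same rank would overwrite each other in this sketch; a real design would need a composite column name.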

Some Queries You Cannot Do

The above setup can answer the queries it was designed for. It cannot answer queries it was not designed for.
For example, it cannot do the following:
  o Select * from Books where price > 50;
  o Select * from Books where author=“Asimov”;
  o Select * from Books, Comments where rank > 9 && Comments.bookid=Books.bookid;
Well, it can, but only by writing code that walks through the data. It is like supporting search by going through all the data.
This is a limitation, especially when queries are provided at runtime.
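To make "writing code to walk through" concrete, here is a small sketch in plain Java of answering the price query without an index. The class FullScanSketch and the map-of-maps representation of the Books column family are hypothetical; the point is only that every row has to be visited, so the cost grows with the size of the data rather than with the size of the result.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a column family modelled as row key -> (column name -> value).
class FullScanSketch {
    /** Returns the keys of all rows whose "price" column is greater than minPrice. */
    static List<String> booksPricedAbove(Map<String, Map<String, String>> books, int minPrice) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : books.entrySet()) { // visits every row
            String price = row.getValue().get("price");
            if (price != null && Integer.parseInt(price) > minPrice) {
                matches.add(row.getKey());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> books = new LinkedHashMap<>();
        Map<String, String> foundation = new LinkedHashMap<>();
        foundation.put("author", "asimov");
        foundation.put("price", "14");
        books.put("Foundation", foundation);

        Map<String, String> collected = new LinkedHashMap<>();
        collected.put("author", "asimov");
        collected.put("price", "60");
        books.put("Collected Stories", collected);

        System.out.println(booksPricedAbove(books, 50)); // [Collected Stories]
    }
}
```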

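A Sample Program

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.cassandra.service.CassandraHostConfigurator;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;
import me.prettyprint.hector.api.query.ColumnQuery;
import me.prettyprint.hector.api.query.QueryResult;

// keyspaceName and columnFamily are String variables defined elsewhere.
Cluster cluster = HFactory.createCluster("TestCluster",
        new CassandraHostConfigurator("localhost:9160"));
Keyspace keyspace = HFactory.createKeyspace(keyspaceName, cluster);

// Row keys are Strings, so a String serializer is used (assumed: Hector's StringSerializer).
Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());
mutator.insert("wso2", columnFamily,
        HFactory.createStringColumn("address", "4131 El Camino Real Suite 200, Palo Alto, CA 94306"));

// Read the same column back from the row with key "wso2".
ColumnQuery<String, String, String> columnQuery = HFactory.createStringColumnQuery(keyspace);
columnQuery.setColumnFamily(columnFamily).setKey("wso2").setName("address");
QueryResult<HColumn<String, String>> result = columnQuery.execute();

System.out.println("received " + result.get().getName() + " = " + result.get().getValue()
        + " ts = " + result.get().getClock());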

Cassandra: How does it work?

Nodes are arranged in a circle according to a key space (a P2P network that uses consistent hashing).

Each node owns the next clockwise address space.

If replicated, each node owns the next two clockwise address spaces.

Any node can accept any request and route it to the correct node.
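The ring described above can be sketched with a sorted map from tokens to nodes. This is a simplified illustration in plain Java, not Cassandra's implementation: the class TokenRingSketch, the one-token-per-node layout, and the use of String.hashCode() as a stand-in hash are all assumptions. The sketch also assumes that a key belongs to the first node found at or after its token going clockwise, and that replicas go to the nodes that follow.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Illustrative only: one token per node, simple clockwise replica placement.
class TokenRingSketch {
    private final TreeMap<Long, String> ring = new TreeMap<>(); // token -> node name

    void addNode(long token, String node) {
        ring.put(token, node);
    }

    /** Stand-in hash; a real ring would use a stronger, well-distributed hash. */
    static long tokenFor(String key) {
        return Integer.toUnsignedLong(key.hashCode());
    }

    /** The first node clockwise from the token, then the following nodes, up to replicas in total. */
    List<String> ownersForToken(long token, int replicas) {
        List<String> owners = new ArrayList<>();
        if (ring.isEmpty()) {
            return owners;
        }
        int wanted = Math.min(replicas, ring.size());
        Long current = ring.ceilingKey(token);
        while (owners.size() < wanted) {
            if (current == null) {
                current = ring.firstKey(); // wrap around the circle
            }
            owners.add(ring.get(current));
            current = ring.higherKey(current);
        }
        return owners;
    }

    List<String> ownersForKey(String key, int replicas) {
        return ownersForToken(tokenFor(key), replicas);
    }

    public static void main(String[] args) {
        TokenRingSketch ring = new TokenRingSketch();
        ring.addNode(100L, "node-A");
        ring.addNode(200L, "node-B");
        ring.addNode(300L, "node-C");
        System.out.println(ring.ownersForToken(150L, 2)); // [node-B, node-C]
        System.out.println(ring.ownersForToken(350L, 2)); // [node-A, node-B]
        System.out.println(ring.ownersForKey("wso2", 2));
    }
}
```

Because any node can compute the same lookup from the same token map, any node can accept a request and forward it to the owners.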

Cassandra: How does it work? (Contd.)

Writes are written to enough nodes, and Cassandra repairs data while reading. (As you would guess, that is how writes are fast.)

Data is updated in memory, and Cassandra keeps an append-only commit log to recover from failures. (This avoids rotational latency at the disk.) It can do about 80-360MB/sec per node.

Whenever a read happens, Cassandra will sync all the nodes having replicas (read repair).
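The "update in memory plus append-only commit log" idea can be illustrated with a toy write path. This is a sketch in plain Java under strong simplifications: the class WritePathSketch is hypothetical, the log is an in-memory list standing in for a sequentially written file, and there is no flushing to disk tables. It only shows why a crash that loses the in-memory state is recoverable by replaying the log in order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Illustrative only: the list stands in for an on-disk, sequentially written log.
class WritePathSketch {
    private final List<String[]> commitLog = new ArrayList<>();
    private TreeMap<String, String> memtable = new TreeMap<>();

    /** A write appends to the log and updates the in-memory table; nothing is rewritten in place. */
    void write(String key, String value) {
        commitLog.add(new String[] {key, value});
        memtable.put(key, value);
    }

    String read(String key) {
        return memtable.get(key);
    }

    /** Simulates a crash: the in-memory state is lost, the log survives. */
    void crash() {
        memtable = new TreeMap<>();
    }

    /** Replays the log in order, so the latest write for each key wins. */
    void recover() {
        for (String[] entry : commitLog) {
            memtable.put(entry[0], entry[1]);
        }
    }

    public static void main(String[] args) {
        WritePathSketch node = new WritePathSketch();
        node.write("address", "old value");
        node.write("address", "new value");
        node.crash();
        System.out.println(node.read("address")); // null
        node.recover();
        System.out.println(node.read("address")); // new value
    }
}
```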

All of this is great, but what is the catch?

Do not get me wrong, Cassandra is a great tool, but you have to know where it does not work.

Surprises if you are using Cassandra

No transactions and no JOINs. Hopefully there is no surprise here.
No foreign keys, and keys are immutable. (Well, there are no JOINs; use surrogate keys if you need to change keys.)
Keys have to be unique (use composite keys).
Super columns and the order-preserving partitioner are discouraged.
Searching is complicated.
  o No search comes from the core. Secondary indexes are layered on top, and they do not do range search or pattern search.
  o When secondary indexes do not work, you have to learn the data model and build your own indexes using sort orders and slices.
Sort orders are complicated.
  o Columns are always sorted by name, but row order depends on the partitioner.
Sort orders are crucial when you build your own indexes.

Surprises if you are using Cassandra (Contd.)

Failed operations may leave changes.
  o If an operation is successful, all is well.
  o If it failed, changes may actually have been applied. But operations are idempotent, so you can retry until successful.
Batch operations are not atomic, but you can retry until successful (as operations are idempotent).
If a node fails, Cassandra does not detect the failure and heal itself. Assuming you have replicas, things will continue to work, but the whole system recovers only when a manual recovery operation is done.
It remembers deletes.
  o When we delete a data item, a node may be down at the time and may come back after the delete is done. To handle this, Cassandra marks the item as deleted (a tombstone) but does not remove it until a configurable timeout or a repair. Space is actually freed up only then.
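The tombstone behaviour, and the reason retries are safe, can be shown with a last-write-wins store. The following is a minimal sketch in plain Java; the class TombstoneSketch, its method names, and the compact step with a grace period parameter are hypothetical stand-ins, not Cassandra's API. A delete is stored as a marker with a timestamp, an older write arriving later from a stale node cannot resurrect the value, and space is reclaimed only once the marker is older than the grace period.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Illustrative only: last-write-wins cells with delete markers (tombstones).
class TombstoneSketch {
    static final class Cell {
        final String value;
        final long timestamp;
        final boolean deleted;

        Cell(String value, long timestamp, boolean deleted) {
            this.value = value;
            this.timestamp = timestamp;
            this.deleted = deleted;
        }
    }

    private final Map<String, Cell> cells = new HashMap<>();

    /** Keeps the cell with the newest timestamp; re-applying the same write changes nothing. */
    private void apply(String key, Cell incoming) {
        Cell existing = cells.get(key);
        if (existing == null || incoming.timestamp >= existing.timestamp) {
            cells.put(key, incoming);
        }
    }

    void put(String key, String value, long timestamp) {
        apply(key, new Cell(value, timestamp, false));
    }

    /** A delete writes a tombstone instead of removing the entry. */
    void delete(String key, long timestamp) {
        apply(key, new Cell(null, timestamp, true));
    }

    String get(String key) {
        Cell cell = cells.get(key);
        return (cell == null || cell.deleted) ? null : cell.value;
    }

    int storedCells() {
        return cells.size();
    }

    /** Drops tombstones older than graceMillis and returns how many were removed. */
    int compact(long now, long graceMillis) {
        int removed = 0;
        Iterator<Cell> it = cells.values().iterator();
        while (it.hasNext()) {
            Cell cell = it.next();
            if (cell.deleted && now - cell.timestamp > graceMillis) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        TombstoneSketch store = new TombstoneSketch();
        store.put("k", "v", 1L);
        store.delete("k", 5L);
        store.put("k", "v", 1L);                 // stale write replayed by a node that was down
        System.out.println(store.get("k"));       // null
        System.out.println(store.storedCells());  // 1
        store.compact(20L, 10L);
        System.out.println(store.storedCells());  // 0
    }
}
```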


Cassandra within WSO2 Platform

Cassandra within the WSO2 Platform

Offered as a part of WSO2 data solutions, because one storage cannot handle all cases.
Specifically for applications that need to scale. For applications that can work with a single DB, we have “Database as a Service”.
Two offerings
  o Provide Cassandra as a Service
  o Provide Cassandra within Carbon as a standalone product (integrated with the WSO2 security model)

Apache Cassandra as a Service

Users can log in to the Web Console (both in Stratos and in WSO2 Data Server) and create Cassandra key spaces.

Apache Cassandra as a Service (Contd.)

Key spaces
  o will be allocated from a Cassandra cluster
  o are isolated from other tenants in Stratos
  o are integrated with the WSO2 security model.

Users can manage and share their key spaces through the Stratos Web Console and use those key spaces through the Hector client (a Java client for Cassandra).

In essence we provide
  o Cassandra as a part of Stratos as a Service
  o Multi-tenancy support
  o Security integration with the WSO2 security model

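A Sample Program

import java.util.HashMap;
import java.util.Map;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.cassandra.service.CassandraHostConfigurator;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;
import me.prettyprint.hector.api.query.ColumnQuery;
import me.prettyprint.hector.api.query.QueryResult;

// USERNAME_KEY, PASSWORD_KEY, keyspaceName and columnFamily are defined elsewhere.
Map<String, String> credentials = new HashMap<String, String>();
credentials.put(USERNAME_KEY, "[email protected]");
credentials.put(PASSWORD_KEY, "admin1234");

// The credentials are passed when the cluster connection is created.
Cluster cluster = HFactory.createCluster("TestCluster",
        new CassandraHostConfigurator("localhost:9160"), credentials);
Keyspace keyspace = HFactory.createKeyspace(keyspaceName, cluster);

// Row keys are Strings, so a String serializer is used (assumed: Hector's StringSerializer).
Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());
mutator.insert("wso2", columnFamily,
        HFactory.createStringColumn("address", "4131 El Camino Real Suite 200, Palo Alto, CA 94306"));

ColumnQuery<String, String, String> columnQuery = HFactory.createStringColumnQuery(keyspace);
columnQuery.setColumnFamily(columnFamily).setKey("wso2").setName("address");
QueryResult<HColumn<String, String>> result = columnQuery.execute();

System.out.println("received " + result.get().getName() + " = " + result.get().getValue()
        + " ts = " + result.get().getClock());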


Implementation

Implementation (Contd.)

Cassandra includes a plug point for adding support for different security models at the server (authentication and authorization for the server).

We do security integration and support isolation among tenants (multi-tenancy) by writing a new implementation of this plug point.

We also provide a Web console to manage Cassandra key spaces.

Cassandra is highly scalable and highly available, so no work is needed in that department.

Cassandra within Carbon Platform

Users may also choose to run Carbon-enabled Cassandra in two other settings.
  o Running the whole of Stratos within a private cloud
      - Gets full support for multi-tenancy and other cloud benefits
      - Lets users run it in their own controlled environment
  o Running a standalone Cassandra node (without multi-tenancy)
      - Gets seamless integration with the WSO2 security model
      - Uses the Configuration Console for Cassandra


Demo

Summary

We discussed what Cassandra is, its strengths and weaknesses, and the column family data model.
  o It has a data model very different from the relational style.
  o Users need to rethink their data model.
  o There is complexity at design time, which is a tradeoff for achieving higher scalability.
  o Of course, Cassandra is not the solution for everything. It should be used when it makes sense for the use case.
We discussed Cassandra integration with the WSO2 platform.
  o Carbon integration: how to run Cassandra integrated with the WSO2 Carbon platform security model.
  o Cassandra as a Service: how to use Cassandra as a Service from the WSO2 Stratos Platform as a Service offering.

References

Apache Cassandra
  o http://cassandra.apache.org
  o Understanding the column family model - http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model
Hector Client
  o http://github.com/rantav/hector
  o http://prettyprint.me/2010/08/06/hector-api-v2/
Some Theory
  o Lakshman, A. and Malik, P., Cassandra - A Decentralized Structured Storage System
  o Chang, F. and Dean, J. and Ghemawat, S. and Hsieh, W.C. and Wallach, D.A. and Burrows, M. and Chandra, T. and Fikes, A. and Gruber, R.E., Bigtable: A distributed storage system for structured data, ACM Transactions on Computer Systems (TOCS), 2008


Questions?