Apache Cassandra overview

by Taras Tymoshchuk, software developer at ElifTech


Introduction

What is Apache Cassandra?

Apache Cassandra™ is a free, distributed, high-performance, extremely scalable, fault-tolerant (i.e. no single point of failure) post-relational database solution. Cassandra can serve both as a real-time datastore (the “system of record”) for online/transactional applications and as a read-intensive database for business intelligence systems.

Top Use Cases

● Internet of things applications – Cassandra is perfect for consuming lots of fast incoming data from devices, sensors and similar mechanisms that exist in many different locations.

● Product catalogs and retail apps – Cassandra is the database of choice for many retailers that need durable shopping cart protection, fast product catalog input and lookups, and similar retail app support.

● User activity tracking and monitoring – many media and entertainment companies use Cassandra to track and monitor how users interact with their movies, music, websites and online applications.

● Messaging – Cassandra serves as the database backbone for numerous mobile phone and messaging providers’ applications.

● Social media analytics and recommendation engines – many online companies, websites, and social media providers use Cassandra to ingest and analyze data and to provide recommendations to their customers.

Key Cassandra Features and Benefits

● Gigabyte to Petabyte scalability

● Linear performance

● No SPOF

● Easy replication / data distribution

● Multi-datacenter and cloud capable

● No need for separate caching layer

● Tunable data consistency

● Flexible schema design

● Data compaction

● CQL language (like SQL)

● Support for key languages and platforms

● No need for special hardware or software
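A minimal sketch of the arithmetic behind "tunable data consistency", assuming the commonly cited rule that a read is guaranteed to see the latest acknowledged write when the number of replicas read plus the number of replicas written exceeds the replication factor. The class and method names are hypothetical and not part of any Cassandra API.

```java
// Illustrative only: ConsistencySketch, quorum and overlaps are hypothetical names.
class ConsistencySketch {
    // Number of replicas that form a majority for a given replication factor.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // True when any set of read replicas must share at least one node
    // with any set of write replicas (R + W > RF).
    static boolean overlaps(int readReplicas, int writeReplicas, int replicationFactor) {
        return readReplicas + writeReplicas > replicationFactor;
    }

    public static void main(String[] args) {
        int rf = 3;
        int q = quorum(rf);
        System.out.println("quorum for RF=3: " + q);
        System.out.println("QUORUM reads + QUORUM writes overlap: " + overlaps(q, q, rf));
        System.out.println("ONE read + ONE write overlap: " + overlaps(1, 1, rf));
    }
}
```

With a replication factor of 3, quorum reads and writes touch 2 replicas each, so they always overlap; reading and writing a single replica is faster but may return stale data.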

Architecture Overview

In Cassandra, all nodes play an identical role; there is no concept of a master node.

Cassandra’s built-for-scale architecture means that it is capable of handling large amounts of data and thousands of concurrent users.

Cassandra’s architecture also means that, unlike master-slave or sharded systems, it has no single point of failure and is therefore capable of offering true continuous availability and uptime.
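One way to picture a masterless cluster is a token ring: every node owns a position on the ring, and a key is stored on the first node at or after the key's token plus the next nodes clockwise. The sketch below is a simplified illustration under that assumption. The hash function, the 0..99 token range and all names are hypothetical stand-ins, and real clusters use a proper partitioner, much larger token ranges and configurable replication strategies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Illustrative only: RingSketch and its methods are hypothetical names.
class RingSketch {
    // token -> node name, sorted so the ring can be walked clockwise
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    void addNode(String name, int token) {
        ring.put(token, name);
    }

    // Stand-in hash mapping a key to a token in 0..99 (an assumption of this sketch).
    static int token(String key) {
        return Math.floorMod(key.hashCode(), 100);
    }

    List<String> replicasFor(String key, int replicationFactor) {
        return replicasForToken(token(key), replicationFactor);
    }

    // Walk clockwise from the first node at or after the token, wrapping around.
    List<String> replicasForToken(int token, int replicationFactor) {
        List<String> replicas = new ArrayList<>();
        if (ring.isEmpty()) {
            return replicas;
        }
        int count = Math.min(replicationFactor, ring.size());
        Integer t = ring.ceilingKey(token);
        if (t == null) {
            t = ring.firstKey();
        }
        while (replicas.size() < count) {
            replicas.add(ring.get(t));
            t = ring.higherKey(t);
            if (t == null) {
                t = ring.firstKey();
            }
        }
        return replicas;
    }

    public static void main(String[] args) {
        RingSketch cluster = new RingSketch();
        cluster.addNode("A", 0);
        cluster.addNode("B", 25);
        cluster.addNode("C", 50);
        cluster.addNode("D", 75);
        // No node is special: any of them can compute the same replica list.
        System.out.println(cluster.replicasFor("user:42", 3));
    }
}
```

Because the replica list depends only on the key and the ring layout, losing one node leaves the other replicas able to serve the same data, which is the intuition behind "no single point of failure".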

CQL

Astyanax / Hector API:

SliceQuery<String, String, String> query = ...
query.setKey("x");
query.setColumnFamily("y");

CQL:

SELECT A FROM Y WHERE ID = 'X';

Cassandra Data Objects

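Overview

Cassandra data model: each row is identified by a key and holds columns, each with a name, a value and a timestamp.

KEY → COL1: VAL1 (TS1), COL2: VAL2 (TS2)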
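The key / column / value / timestamp layout above can be sketched as a nested sorted map. This is an in-memory illustration only, with hypothetical class and method names. It assumes the usual "last write wins" rule, where the cell with the newer timestamp replaces the older one, and it ignores how ties between equal timestamps are resolved.

```java
import java.util.TreeMap;

// Illustrative only: DataModelSketch and Cell are hypothetical names.
class DataModelSketch {
    // One cell: a value plus the timestamp of the write that produced it.
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // row key -> (column name -> cell); columns are kept sorted by name
    private final TreeMap<String, TreeMap<String, Cell>> rows = new TreeMap<>();

    // Store the write only if it is newer than the stored cell (last write wins).
    void write(String key, String column, String value, long timestamp) {
        TreeMap<String, Cell> columns = rows.computeIfAbsent(key, k -> new TreeMap<>());
        Cell existing = columns.get(column);
        if (existing == null || timestamp > existing.timestamp) {
            columns.put(column, new Cell(value, timestamp));
        }
    }

    // Return the current value of a column, or null if the row or column is absent.
    String read(String key, String column) {
        TreeMap<String, Cell> columns = rows.get(key);
        if (columns == null) {
            return null;
        }
        Cell cell = columns.get(column);
        return cell == null ? null : cell.value;
    }

    public static void main(String[] args) {
        DataModelSketch table = new DataModelSketch();
        table.write("KEY", "COL1", "VAL1", 1L);
        table.write("KEY", "COL2", "VAL2", 2L);
        table.write("KEY", "COL1", "stale", 0L); // older timestamp, ignored
        System.out.println(table.read("KEY", "COL1")); // prints VAL1
    }
}
```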

Writing Data

Reading Data

Pitfalls

● Range scans are poorly implemented; Cassandra cannot currently transfer data;

● Background compaction can interfere with request handling;

● Many settings are made at the cluster level: type, storage strategy, etc.;

● Counters.

Thank you for your attention!

Find us at eliftech.com

Have a question? Contact us: [email protected]