No-SQL Databases for High Volume Data · Apache Cassandra™ •Apache Cassandra™ is a massively...

No-SQL Databases for High Volume Data Edward Wijnen 3 November 2014 Target Conference 2014

Page 1:

No-SQL Databases for High Volume Data

Edward Wijnen, 3 November 2014

Target Conference 2014

Page 2:

The New Connected World Needs a Revolutionary New DBMS

[Timeline diagram]
• 1970’s: Mainframe (Isolated)
• 1990’s: Client-Server (Semi-Connected)
• Today: Social, Mobile, Cloud, “The Internet of Things” (Radically Connected)

©2014 DataStax Confidential. Do not distribute without consent.

Page 3:

Businesses Must Close the Gap…and Fast

Your Business Your Customers

Page 4:

If You Started With This

• Connected Customers
• Connected Partners
• Connected Employees
• Connected Devices
• Connected Products

Page 5:

You Would End With This

Distributed Transactional Database

• Connected Customers
• Connected Partners
• Connected Employees
• Connected Devices
• Connected Products

Page 6:

Apache Cassandra™

• Apache Cassandra™ is a massively scalable, open source, NoSQL, distributed database built for modern, mission-critical online applications
• Written in Java and is a hybrid of Amazon Dynamo and Google BigTable
• Masterless with no single point of failure
• Distributed and data centre aware
• 100% uptime
• Predictable scaling

[Diagram: Dynamo, BigTable, Cassandra]

BigTable: http://research.google.com/archive/bigtable-osdi06.pdf
Dynamo: http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf
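To make the "masterless" bullet a little more concrete, here is a minimal sketch of Dynamo-style placement on a hash ring, where any node can work out which nodes hold a given key without asking a master. It is a simplification and not Cassandra's actual implementation: MD5 stands in for the real partitioner (Cassandra's default is Murmur3), there are no virtual nodes and no data centre awareness, and the names `Ring`, `token` and `replicas_for` are hypothetical.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 is used only for illustration)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest(), "big")


class Ring:
    """Toy hash ring: each node owns one token; replicas are the next nodes clockwise."""

    def __init__(self, nodes, replication_factor=3):
        unique_nodes = set(nodes)
        if replication_factor > len(unique_nodes):
            raise ValueError("replication factor exceeds number of nodes")
        self.replication_factor = replication_factor
        # Sort nodes by token so the list order matches their order around the ring.
        self._ring = sorted((token(name), name) for name in unique_nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas_for(self, partition_key: str):
        """Return the nodes that would store the key, first owner first."""
        # First node whose token is >= the key's token, wrapping past the end.
        start = bisect.bisect_left(self._tokens, token(partition_key)) % len(self._ring)
        return [
            self._ring[(start + i) % len(self._ring)][1]
            for i in range(self.replication_factor)
        ]


if __name__ == "__main__":
    ring = Ring(["node1", "node2", "node3", "node4", "node5"], replication_factor=3)
    print(ring.replicas_for("user:42"))
```

Because placement depends only on the key and the list of nodes, every node that knows the ring membership arrives at the same answer, which is why no coordinating master is needed in this model.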

Page 7:

Apache Cassandra™

You are already using it!

Page 8:

Distributed Transactional Database Advantages

The Hague

C*

Page 12:

Distributed Transactional Database Advantages

The Hague

London

C*

C*

Page 14:

Use Case: Recommendations / Personalization

Delivers 150+ Billion Content Recommendations Per Month

• Serves content for largest media brands in the world: Reuters, Wall St Journal, USA Today
• Needed a massively scalable data store
• High velocity of data with 58,000 links to content per second
• Always-on data architecture

Page 16:

Distributed Transactional Database Advantages

The Hague

Groningen

London

C*

C*

C*

Page 21:

Distributed Transactional Database Advantages

The Hague

Groningen

London

C*

C*

C*

C*

Page 22:

Use Case: Personalization

Netflix Delights Customers with Personal Recommendations

• World’s leading streaming media provider with digital revenue $1.5BN+
• Tailors content delivery based on viewing preference data captured in Cassandra
• Increased market cap by 600% since 2012
• Introduction of ‘Profiles’ drove throughput to over 10M transactions per second
• Replaced Oracle in six data centers, worldwide, 100% in the cloud

Page 23:

Cassandra – Always On, No Matter What

[Diagram labels: The Hague, Groningen, London; C* ×4]

• Transactional Backbone
• Industry Leading Performance
• Predictable Scalability
• Operational Simplicity
• Business Flexibility

Page 24:

Cassandra – Operational Simplicity

Page 25:

CAP Theorem – Oracle vs Cassandra

Page 26:

Cassandra – Tunable Consistency

• Consistency Level (CL)
• Client specifies per read or write
• Handles multi-data center operations

• ALL = all replicas ack
• QUORUM = a majority (more than 50%) of replicas ack
• LOCAL_QUORUM = a majority (more than 50%) of replicas in the local DC ack
• ONE = only one replica acks
• Plus more… (see docs)

• Blog: Eventual Consistency != Hopeful Consistency
http://planetcassandra.org/blog/post/a-netflix-experiment-eventual-consistency-hopeful-consistency-by-christos-kalantzis/

[Diagram: parallel write at CL=QUORUM on a ring of Nodes 1 to 5. Node 1 holds the 1st copy, Node 2 the 2nd copy and Node 3 the 3rd copy. Ack times shown: 5 μs, 12 μs, 500 μs, 12 μs.]
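The consistency levels on this slide reduce to simple arithmetic, sketched below as an illustration rather than as Cassandra's own code. It computes how many replica acknowledgements each level needs and, because the write goes to all replicas in parallel, when that level is satisfied. The function names, the replication factor of 3 per data centre and the assignment of the 12 μs, 12 μs and 500 μs acks to the three replicas are assumptions made for the example.

```python
from enum import Enum


class ConsistencyLevel(Enum):
    ONE = "ONE"
    QUORUM = "QUORUM"
    LOCAL_QUORUM = "LOCAL_QUORUM"
    ALL = "ALL"


def quorum(replicas: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    return replicas // 2 + 1


def required_acks(level: ConsistencyLevel, replicas_per_dc: dict, local_dc: str) -> int:
    """Number of replica acks the coordinator waits for at the given level."""
    total = sum(replicas_per_dc.values())
    if level is ConsistencyLevel.ONE:
        return 1
    if level is ConsistencyLevel.QUORUM:
        return quorum(total)  # majority across all data centres
    if level is ConsistencyLevel.LOCAL_QUORUM:
        return quorum(replicas_per_dc[local_dc])  # majority in the local DC only
    if level is ConsistencyLevel.ALL:
        return total
    raise ValueError(f"unsupported level: {level}")


def time_to_satisfy(ack_latencies_us: list, needed: int):
    """Replicas are written in parallel, so the level is met at the needed-th fastest ack."""
    if needed > len(ack_latencies_us):
        raise ValueError("more acks required than replicas available")
    return sorted(ack_latencies_us)[needed - 1]


if __name__ == "__main__":
    # Assumed layout: 3 replicas in each of two data centres named on earlier slides.
    layout = {"The Hague": 3, "London": 3}
    for level in ConsistencyLevel:
        print(level.name, required_acks(level, layout, local_dc="The Hague"))

    # Assumed per-replica ack times (μs) for a single data centre with 3 replicas.
    acks = [12, 500, 12]
    print(time_to_satisfy(acks, quorum(len(acks))))  # QUORUM: the slow replica is not waited for
    print(time_to_satisfy(acks, len(acks)))          # ALL: bounded by the slowest replica
```

With three replicas, QUORUM needs two acks, so a single slow or unavailable replica does not delay the write, whereas ALL is always as slow as the slowest replica.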

Page 27:

Common Use Cases

• Messaging
• Product Catalogs and Playlists
• Recommendation/Personalization
• Fraud detection
• Internet of things / Sensor data

Page 28:

Product Catalogs and Playlists

A product catalog is an organized collection of products or services. Playlists refer to user-defined queues of songs, movies, games and lessons. Examples: Shopping carts, gift registries, media playlists.

Challenges
• Rigidity of relational databases
• Increase in volume and diversity of data
• Application must have zero downtime
• Predictable scalability is hard
• Desire to operate in the cloud

Why DataStax?
• Real-time database infrastructure
• Rich analytics for flexible access to information
• Fast search and indexing of data
• Add new features while the application is online
• Multiple data centers to ensure applications and data have 100% uptime

Customers: [not captured]

Page 29:

Recommendation Engine

Recommendation and Personalization Engines understand each person’s unique habits and preferences and bring to light products and items that a user may be unaware of and not looking for. Examples: News sites, shopping carts.

Challenges
• Large volumes of user data make accuracy challenging
• Merging real-time and historical information
• Cross-product information
• Response times need to be fast
• Predictable scalability is hard

Why DataStax?
• Rich query language and enterprise search to store, search and analyze user activity data
• Integrations with data lakes allow for the merging of real time and historical data
• Multi-data center replication ensures applications and data suffer no downtime
• Linear scalability is predictable

Customers: [not captured]

Page 30:

Fraud Detection

Fraud detection solutions identify out-of-the-ordinary patterns to prevent malicious attacks on digital and physical assets from unauthorized applications and individuals. Examples: credit card monitoring, application infiltration.

Challenges
• Increasing volume of fraudulent attacks across all industries
• Technology sophistication
• Limited historical and trend information
• Information is stored across multiple channels
• The customer can be the first to spot the fraud

Why DataStax?
• Easy management of high-data volumes
• Real-time monitoring across channels, sites and data centers
• Integrations with data lakes allow for the merging of real time and historical data
• Ease of use in managing and monitoring data
• Multi-data center replication ensures applications and data suffer no downtime

Customers: [not captured]

Page 31:

Internet of Things / Sensor Data

IOT refers to the revolution of a growing number of internet-connected devices that can network and communicate with each other.

Challenges
• Vast and diverse amounts of unstructured data from internet-enabled devices
• Volume of sensors is increasing exponentially
• Fast-changing technology
• Support multiple channels with varying data types
• Predictable scalability is hard

Why DataStax?
• Easy management of high-data volumes
• Rich query language and enterprise search to store, search and analyze data
• Dynamic database schema
• Linear scalability offers predictability

Customers: [not captured]

Page 32:

Messaging

Messaging facilitates communication, interaction and collaboration between diverse user-groups and applications via social networks, cloud services and more. Examples: SMS, email and instant messaging.

Challenges
• Managing large data volumes at a reasonable cost
• Real-time updates and information, getting detailed alerts and notifications
• Predictable scalability is hard
• Information is stored across multiple platforms and systems
• Agility

Why DataStax?
• Easy management of high-data volumes
• Real-time monitoring across channels, sites and data centers
• Multi-data center replication ensures applications and data suffer no downtime
• Ease of use in managing and monitoring data
• Dynamic database schema

Customers: [not captured]

Page 33:

The Weather Channel on Learning to use Cassandra

“If you had a look in the past, you may have found Cassandra had a high learning curve and a fair amount of complexity. CQL3, the native drivers, and virtual nodes have changed the game entirely, making Cassandra a much more accessible and friendly platform.

While I have years of experience using Cassandra, my team was mostly new to it; CQL made their transition essentially painless. But where Cassandra really shines is in speed and operational simplicity, and I would say those two points were critical.”

Robbie Strickland, Software Dev Manager

Page 34:

APIs and Drivers

• Application drivers and connectors for all popular developer languages exist for Cassandra and DataStax Enterprise.
• CQL (Cassandra Query Language) is the primary API
• Drivers/connectors include:
  • Java
  • C++
  • Python
  • Ruby
  • PHP
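As an illustration of CQL used through one of the listed drivers, the following is a minimal sketch using the Python driver (the `cassandra-driver` package). It assumes the driver is installed and a Cassandra node is reachable on 127.0.0.1. The keyspace `demo`, the table `playlists` and the sample row are hypothetical, chosen to echo the playlist use case, and the replication settings are only suitable for a single-node test.

```python
# CQL statements; the keyspace, table and column names are hypothetical.
CREATE_KEYSPACE = (
    "CREATE KEYSPACE IF NOT EXISTS demo "
    "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}"
)
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS demo.playlists (
    user_id text,
    position int,
    title text,
    PRIMARY KEY (user_id, position)
)
"""
INSERT_TRACK = "INSERT INTO demo.playlists (user_id, position, title) VALUES (%s, %s, %s)"
SELECT_PLAYLIST = "SELECT position, title FROM demo.playlists WHERE user_id = %s"


def run_demo(contact_points=("127.0.0.1",)):
    """Write one row at QUORUM and read it back at ONE (needs a running cluster)."""
    # Imported here so the statements above can be inspected without the driver installed.
    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    cluster = Cluster(list(contact_points))
    session = cluster.connect()
    try:
        session.execute(CREATE_KEYSPACE)
        session.execute(CREATE_TABLE)

        # The consistency level is chosen per statement, as the earlier slide describes.
        insert = SimpleStatement(INSERT_TRACK, consistency_level=ConsistencyLevel.QUORUM)
        session.execute(insert, ("alice", 1, "First Song"))

        select = SimpleStatement(SELECT_PLAYLIST, consistency_level=ConsistencyLevel.ONE)
        rows = session.execute(select, ("alice",))
        return [(row.position, row.title) for row in rows]
    finally:
        cluster.shutdown()


if __name__ == "__main__":
    print(run_demo())
```

The statements read much like SQL, which is the point the slide makes about CQL being the primary API. The per-statement `consistency_level` is where the tunable consistency from the earlier slide surfaces in application code.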

Page 35:

DataStax – Enabling The Future

Distributed Transactional Database

• Connected Customers
• Connected Partners
• Connected Employees
• Connected Devices
• Connected Products

Page 36:

Thank you!

[email protected]

@edwardwijnen