Building Antifragile Applications with Apache Cassandra

Post on 06-May-2015


Even with the best infrastructure, failures occur without warning and are almost guaranteed. Building applications that can withstand this fact of life can be both art and science. In this talk, I'll try to eliminate the art portion and focus on the science. Starting with high-level architecture decisions, I will take you through each layer and finally down to actual application code. Using Cassandra as the back-end database, we can build layers of fault tolerance that leave end users completely unaware of the underlying chaos that could be occurring. With a little planning, we can say goodbye to the Fail Whale and the fragility of the traditional RDBMS. Topics include:

- Application strategies to utilize active-active, diverse data centers
- Replicating data with the highest integrity and maximum resilience
- Utilizing Cassandra's built-in fault tolerance
- Architecture of private, cloud, or hybrid based applications
- Application driver techniques when using Cassandra

Transcript of Building Antifragile Applications with Apache Cassandra

@PatrickMcFadin

Patrick McFadin, Solutions Architect, DataStax

Building Antifragile Applications with Apache Cassandra

Wednesday, August 21, 13

Who I am


• Patrick McFadin
• Solutions Architect at DataStax
• Cassandra MVP
• User for years
• Follow me for more:

I talk about Cassandra and building scalable, resilient apps ALL THE TIME!

@PatrickMcFadin


Background - Why are we doing this?

• We live in an always-on society
• Data-driven applications rule the day (for now)
• Failure as reality


• Mike Christian - Yahoo! Director of Engineering, Infrastructure Resilience
• Frying Squirrels and unspun gyros


Background - Antifragile as a practice

• Antifragile: Things That Gain From Disorder
• Nassim Nicholas Taleb
• Things that get better with a little chaos


• Jesse Robbins
• Master of Disaster at Amazon

• Bringer of “Game Day”


Background - Distributed in a global economy

• Closer to your users == happy users
• Latency is just physics
• Best case for light in fiber from US East to US West? ~20 ms


From            To                Latency (ms)
New York        London            75.07
New York        Rio de Janeiro    110.28
San Francisco   Tokyo             97.33
Singapore       Los Angeles       183.19
Tokyo           London            242.88
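The 20 ms figure above is easy to sanity-check: light in fiber travels at roughly two-thirds of its vacuum speed, so one-way latency is distance divided by ~200,000 km/s. A quick sketch (the ~4,100 km coast-to-coast fiber distance is an assumption):

```java
// Rough sanity check of the "latency is just physics" claim.
// Assumptions: light in fiber travels at ~2/3 the speed of light in
// vacuum (refractive index ~1.5), and US East to US West is ~4,100 km.
public class FiberLatency {
    static final double SPEED_OF_LIGHT_KM_S = 299_792.458;
    static final double FIBER_FACTOR = 0.66;

    // One-way latency in milliseconds for a given fiber distance.
    static double oneWayMs(double distanceKm) {
        return distanceKm / (SPEED_OF_LIGHT_KM_S * FIBER_FACTOR) * 1000.0;
    }

    public static void main(String[] args) {
        System.out.printf("US East -> US West: %.1f ms one way%n", oneWayMs(4100));
        System.out.printf("New York -> London: %.1f ms one way%n", oneWayMs(5600));
    }
}
```

For New York to London (~5,600 km of cable) the best case is about 28 ms one way, or ~57 ms round trip, which squares with the 75.07 ms observed in the table once routing and equipment overhead are added.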


Challenges - When it was just HTTP

• Roaring '90s: just static HTTP
• Some of it was data driven, but most was not
• One awesome web server


Challenges - More than one web server?

• A Pentium 2 web server wasn't going to do it
• More than one web server? Wow
• Distribute the HTML and then...?


Challenges - Spreading the load

• Clients have to find your now-distributed content
• Round-robin DNS to the rescue!
• Crazy router hacks
• Hardware-based load balancers
• Resilient to failure


Challenges - Content Distribution Networks

• Cat pictures suck bandwidth
• Cat videos suck even MORE bandwidth
• User experience depends on response time
• Content closer to users? Sweet

Source: http://www.paulund.co.uk/content-delivery-network-review


Challenges - What about data?

• Databases have been designed as singletons (ACID)
• Master-slave replication dominates
• Sharding - no more joins
• Distributed replication is hard, if not impossible


Panic!!


Challenges - Facebook and MySQL

• Heavily sharded, but in one DC
• Needed a second data center
• In 2007, opted for slave DBs
• Re-wrote the query parser and "hijacked the MySQL replication stream"
• Transparent to the application


Ref: https://www.facebook.com/note.php?note_id=23844338919&id=9445547199&index=0


Challenges - Finally some science!

• 2007 - Amazon and the Dynamo paper
• Attempted to answer:
  • How do we distribute our data?
  • How do we maximize uptime?
  • How can we keep our data safe?
• Consolidated distributed database science
• 24 research papers cited
• Almost 30 years of distributed computing thought


http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html


So now what?


Let’s put it all together

• More than one server - uptime and scaling
• Get closer to users - reduce latency
• Transparent - make it a natural part of the system


Techniques - Step one

• Control your traffic


Techniques - Hardware traffic control

• F5 GTM - Global Traffic Manager
• A10 Thunder
• Citrix NetScaler
• Cisco ACE
• Barracuda ADC


Source: http://www.f5.com


Techniques - Service based control

• Dyn - Active failover service
• Akamai - Global Traffic Management
• Amazon - Route 53


Techniques - DIY control

• Client-level awareness
• Client chooses which path to take


A good example: http://imaginethefutur.blogspot.com/2012/09/javascript-load-balancer-using-cookies.html


Techniques - Step 2

• Make your app OK with being part of a team
• Fail fast - short circuits
• Transactions are expensive
• Partial or eventual data is OK


Great discussion on this topic:

http://www.planetcassandra.org/blog/post/a-netflix-experiment-eventual-consistency-hopeful-consistency-by-christos-kalantzis
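The "fail fast - short circuits" idea can be sketched as a minimal circuit breaker: after a few consecutive failures the app stops waiting on a sick dependency and serves a fallback (cached, partial, or eventual data). This is an illustrative sketch, not a library API; a real app would add timeouts and a half-open state, e.g. with Netflix's Hystrix.

```java
import java.util.function.Supplier;

// Minimal circuit-breaker sketch. Once `threshold` consecutive calls
// have failed, the breaker "opens" and returns the fallback immediately
// instead of attempting the call (fail fast).
public class Breaker {
    private final int threshold;
    private int consecutiveFailures = 0;

    Breaker(int threshold) { this.threshold = threshold; }

    boolean isOpen() { return consecutiveFailures >= threshold; }

    <T> T call(Supplier<T> op, T fallback) {
        if (isOpen()) return fallback;       // fail fast: skip the call
        try {
            T result = op.get();
            consecutiveFailures = 0;         // success resets the count
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            return fallback;                 // partial data is OK
        }
    }

    public static void main(String[] args) {
        Breaker b = new Breaker(3);
        for (int i = 0; i < 3; i++)
            b.call(() -> { throw new RuntimeException("db down"); }, "cached");
        System.out.println("open after 3 failures: " + b.isOpen());
    }
}
```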


Techniques - Application Architecture

• Embrace the pod (or application unit)
• Each stands alone


East Coast - 10,000 customers
West Coast - 10,000 customers
Next deployable unit - 10,000 customers


Techniques - State management

• State in the browser?
• State in the data layer?
• State in both?

No!


Techniques - Step 3

• Make your persistence layer resilient
• If your app layer can fail, why not the DB?
• Master-master - less complexity
• Of course, I'm talking about Cassandra


Same data. Fully replicated.


Techniques - Step 4

• Test!! Test!! Test!!


Know these guys?

• Constantly breaking things
• Chaos Monkey - shut down random services
• Always be failing so it's normal
• Read this: http://queue.acm.org/detail.cfm?id=2499552


Cassandra - Intro

• Based on the Amazon Dynamo and Google BigTable papers
• Shared nothing
• Data as safe as possible
• Predictable scaling




Cassandra - More than one server

• All nodes participate in a cluster
• Shared nothing
• Add or remove as needed
• More capacity? Add a server


Cassandra - Locally Distributed

• Client writes to any node
• Node coordinates with others
• Data replicated in parallel
• Replication factor: how many copies of your data?
• RF = 3 here
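A rough sketch of how replicas for one partition might be chosen with RF = 3: the node owning the token takes the first copy, and the next nodes clockwise on the ring take the rest. This simplifies Cassandra's SimpleStrategy; real placement depends on the partitioner, and NetworkTopologyStrategy also considers racks and data centers.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified ring-walk replica placement: nodes are listed in token
// (ring) order; the primary plus the next RF - 1 nodes clockwise
// hold the copies.
public class ReplicaPlacement {
    static List<String> replicasFor(int primaryIndex, List<String> ring, int rf) {
        List<String> replicas = new ArrayList<>();
        for (int i = 0; i < rf && i < ring.size(); i++)
            replicas.add(ring.get((primaryIndex + i) % ring.size()));
        return replicas;
    }

    public static void main(String[] args) {
        List<String> ring = List.of("n1", "n2", "n3", "n4", "n5");
        // RF = 3: a token owned by n4 is also stored on n5 and n1
        System.out.println(replicasFor(3, ring, 3)); // [n4, n5, n1]
    }
}
```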


Cassandra - Geographically Distributed

• Client writes locally
• Data syncs across the WAN
• Replication factor per DC


Cassandra - Consistency

• Consistency Level (CL)
• Client specifies per read or write


• ALL = all replicas ack
• QUORUM = a majority of replicas ack (floor(RF / 2) + 1)
• LOCAL_QUORUM = a majority of replicas in the local DC ack
• ONE = only one replica acks
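The quorum levels come down to simple majority arithmetic: a quorum is floor(RF / 2) + 1 replicas. A tiny sketch:

```java
// Quorum arithmetic: a quorum is a majority of replicas,
// i.e. floor(RF / 2) + 1 acknowledgements.
public class Quorum {
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    static boolean satisfied(int acks, int replicationFactor) {
        return acks >= quorum(replicationFactor);
    }

    public static void main(String[] args) {
        System.out.println("RF=3 needs " + quorum(3) + " acks"); // 2
        System.out.println("RF=5 needs " + quorum(5) + " acks"); // 3
    }
}
```

So with RF = 3, a QUORUM write succeeds with 2 acks, and one dead replica costs you nothing.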


Cassandra - Transparent to the application

• A single node failure shouldn't bring down the application
• Replication factor + consistency level = success
• This example:
  • RF = 3
  • CL = QUORUM


2 of 3 replicas acked (a majority), so we are good!


Cassandra Applications - Drivers

• DataStax drivers for Cassandra
• Java
• C#
• Python
• More on the way


Cassandra Applications - Connecting

• Create a pool of local servers
• Client just uses a session to interact with Cassandra


// Example configuration:
//   contactPoints = {"10.0.0.1", "10.0.0.2"}
//   keyspace = "videodb"

public VideoDbBasicImpl(List<String> contactPoints, String keyspace) {
    cluster = Cluster.builder()
        .addContactPoints(contactPoints.toArray(new String[contactPoints.size()]))
        .withLoadBalancingPolicy(Policies.defaultLoadBalancingPolicy())
        .withRetryPolicy(Policies.defaultRetryPolicy())
        .build();
    session = cluster.connect(keyspace);
}


Cassandra Applications - Load balancing

• Token aware - request sent to the primary node with the data
• Calls can be asynchronous and in parallel

[Diagram: multiple client threads issuing token-aware requests through the driver to nodes 1-6 in parallel]
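Asynchronous, parallel calls can be sketched with CompletableFuture. With the real driver you would use session.executeAsync(); here a hypothetical fetch() stands in so the control flow is runnable without a cluster:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Fire several requests at once and collect the results, instead of
// issuing them one after another. fetch() is a stand-in for an async
// driver call such as session.executeAsync().
public class ParallelReads {
    static CompletableFuture<String> fetch(String key) {
        return CompletableFuture.supplyAsync(() -> "value-for-" + key);
    }

    public static void main(String[] args) {
        List<CompletableFuture<String>> futures =
            List.of(fetch("a"), fetch("b"), fetch("c"));  // all in flight at once
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        futures.forEach(f -> System.out.println(f.join()));
    }
}
```

The win: total latency is roughly that of the slowest single request, not the sum of all of them.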


Cassandra Applications - Fault tolerance

• Try first with a consistency level of QUORUM
• If that fails, retry with consistency level ONE


[Diagram: client sends a request to a coordinator node; the data lives on three replica nodes]
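The retry-with-lower-consistency pattern above looks like this in outline: attempt at QUORUM, and if not enough replicas respond, retry the same request at ONE. The DataStax drivers ship a similar DowngradingConsistencyRetryPolicy; this standalone version just shows the control flow, with a simulated read standing in for a real query.

```java
import java.util.function.Function;

// Try a read at QUORUM; on failure (e.g. not enough replicas alive),
// fall back to ONE rather than failing the user request.
public class DowngradingRead {
    enum ConsistencyLevel { QUORUM, ONE }

    static <T> T readWithFallback(Function<ConsistencyLevel, T> read) {
        try {
            return read.apply(ConsistencyLevel.QUORUM);
        } catch (RuntimeException notEnoughReplicas) {
            return read.apply(ConsistencyLevel.ONE); // hopeful consistency
        }
    }

    public static void main(String[] args) {
        // Simulate a cluster where QUORUM is unavailable but ONE succeeds.
        String result = readWithFallback(cl -> {
            if (cl == ConsistencyLevel.QUORUM)
                throw new RuntimeException("not enough replicas");
            return "row read at " + cl;
        });
        System.out.println(result); // row read at ONE
    }
}
```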


Application Example - Layout

• Active-active
• Service-based DNS routing


Cassandra Replication


Application Example - Uptime


• Normal server maintenance
• Application is unaware

Cassandra Replication


Application Example - Failure


• Data center failure
• Data is safe. Route traffic.


Another happy user!


Conclusion


• Cassandra is THE BEST persistence tier for your application
• Plan for chaos. Inject your own.
• Now go write your app.

Data Modeling!

1. The Data Model is Dead, Long Live the Data Model
   http://www.youtube.com/watch?v=px6U2n74q3g
2. Become a Super Modeler
   http://www.youtube.com/watch?v=qphhxujn5Es
3. The World's Next Top Data Model
   http://www.youtube.com/watch?v=HdJlsOZVGwM

Example code: https://github.com/pmcfadin/cql3-videodb-example


Thank you!


CALL FOR PAPERS • SPONSORSHIP OPPORTUNITY • TWO DAYS • 30+ SESSIONS • TRAINING DAY

Oh yeah. We are hiring.

Q&A?
