Building Antifragile Applications with Apache Cassandra
Patrick McFadin, Solutions Architect, DataStax
@PatrickMcFadin
Wednesday, August 21, 13
Who I am
• Patrick McFadin
• Solutions Architect at DataStax
• Cassandra MVP
• User for years
• Follow me for more:
I talk about Cassandra and building scalable, resilient apps ALL THE TIME!
@PatrickMcFadin
Background - Why are we doing this?
• We live in an always-on society
• Data-driven applications rule the day (for now)
• Failure is a reality
• Mike Christian, Yahoo! Director of Engineering, Infrastructure Resilience
• Frying squirrels and unspun gyros
Background - Antifragile as a practice
• Antifragile: Things That Gain From Disorder
• Nassim Nicholas Taleb
• Things that get better with a little chaos
• Jesse Robbins
• Master of Disaster at Amazon
• Bringer of “Game Day”
Background - Distributed in a global economy
• Closer to your users == happy users
• Latency is just physics
• Best case for light in fibre from US East to US West? 20 ms
From            To                Latency (ms)
New York        London            75.07
New York        Rio de Janeiro    110.28
San Francisco   Tokyo             97.33
Singapore       Los Angeles       183.19
Tokyo           London            242.88
Challenges - When it was just HTTP
• Roaring '90s: just static HTTP
• Some of it was data driven, but most not
• One awesome web server
Challenges - More than one web server?
• A Pentium 2 web server wasn't going to do it
• More than one web server? Wow
• Distribute the HTML and then...?
Challenges - Spreading the load
• Clients have to find your now-distributed content
• Round-robin DNS to the rescue!
• Crazy router hacks
• Hardware-based load balancers
• Resilient to failure
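A minimal sketch of the round-robin idea, assuming a plain Java client rotating through a fixed list of server addresses (the class and addresses are illustrative, not a real DNS API):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Rotate across server addresses the way round-robin DNS
// rotates A records for each lookup.
public class RoundRobinEndpoints {
    private final List<String> servers;
    private final AtomicInteger next = new AtomicInteger(0);

    public RoundRobinEndpoints(List<String> servers) {
        this.servers = servers;
    }

    // Each call returns the next server in rotation.
    public String pick() {
        int i = Math.floorMod(next.getAndIncrement(), servers.size());
        return servers.get(i);
    }

    public static void main(String[] args) {
        RoundRobinEndpoints rr =
            new RoundRobinEndpoints(List.of("10.0.0.1", "10.0.0.2", "10.0.0.3"));
        for (int n = 0; n < 4; n++) {
            System.out.println(rr.pick());  // cycles back to 10.0.0.1
        }
    }
}
```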
Challenges - Content Distribution Networks
• Cat pictures suck bandwidth
• Cat videos suck even MORE bandwidth
• User experience depends on response time
• Content closer to users? Sweet
Source: http://www.paulund.co.uk/content-delivery-network-review
Challenges - What about data?
• Databases have been designed as singletons (ACID)
• Master-slave replication dominates
• Sharding - no more joins
• Distributed replication is hard, if not impossible
Panic!!
Challenges - Facebook and MySQL
• Heavily sharded, but in one DC
• Needed a second data center
• In 2007, opted for slave DBs
• Re-wrote the query parser and “hijacked the MySQL replication stream”
• Transparent to the application
Ref: https://www.facebook.com/note.php?note_id=23844338919&id=9445547199&index=0
Challenges - Finally some science!
• 2007 - Amazon and the Dynamo paper
• Attempted to answer:
• How do we distribute our data?
• How do we maximize uptime?
• How can we keep our data safe?
• Consolidated distributed-database science
• 24 research papers cited
• Almost 30 years of distributed computing thought
http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
So now what?
Let’s put it all together
• More than one server - uptime and scaling
• Get closer to users - reduce latency
• Transparent - make it a natural part of the system
Techniques - Step one
• Control your traffic
Techniques - Hardware traffic control
• F5 GTM - Global Traffic Manager
• A10 Thunder
• Citrix NetScaler
• Cisco ACE
• Barracuda ADC
Source: http://www.f5.com
Techniques - Service based control
• Dyn - active failover service
• Akamai - Global Traffic Management
• Amazon - Route 53
Techniques - DIY control
• Client-level awareness
• Client chooses which path to take
A good example: http://imaginethefutur.blogspot.com/2012/09/javascript-load-balancer-using-cookies.html
Techniques - Step 2
• Make your app OK with being part of a team
• Fail fast - short circuits
• Transactions are expensive
• Partial or eventual data is OK
Great discussion on this topic:
http://www.planetcassandra.org/blog/post/a-netflix-experiment-eventual-consistency-hopeful-consistency-by-christos-kalantzis
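The fail-fast / short-circuit idea above can be sketched as a toy circuit breaker: after a run of consecutive failures, callers stop waiting on the dead dependency and fail immediately (the class, threshold, and names here are illustrative, not a real library):

```java
// Minimal circuit breaker: stops calling a failing dependency
// after a threshold, so requests fail fast instead of piling up
// behind timeouts.
public class CircuitBreaker {
    private final int threshold;
    private int consecutiveFailures = 0;

    public CircuitBreaker(int threshold) {
        this.threshold = threshold;
    }

    // Open means: don't even try the call, fail fast.
    public boolean isOpen() {
        return consecutiveFailures >= threshold;
    }

    // Record the outcome of a call to the protected service.
    public void record(boolean success) {
        consecutiveFailures = success ? 0 : consecutiveFailures + 1;
    }

    public static void main(String[] args) {
        CircuitBreaker cb = new CircuitBreaker(3);
        cb.record(false);
        cb.record(false);
        cb.record(false);
        System.out.println(cb.isOpen());  // trips after three failures
    }
}
```

Real implementations add a cool-down timer and a half-open probe state; this sketch only shows the fail-fast core.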
Techniques - Application Architecture
• Embrace the pod (or application unit)
• Each stands alone
East Coast: 10,000 customers
West Coast: 10,000 customers
Next deployable unit: 10,000 customers
Techniques - State management
• State in the browser?
• State in the data layer?
• State in both?
No!
Techniques - Step 3
• Make your persistence layer resilient
• If your app layer can fail, then why not the DB?
• Master-master - less complexity
• Of course I'm talking about Cassandra
Same data. Fully replicated.
Techniques - Step 4
• Test!! Test!! Test!!
Know these guys?
• Constantly breaking things
• Chaos Monkey - shut down random services
• Always be failing so it's normal
• Read this: http://queue.acm.org/detail.cfm?id=2499552
Cassandra - Intro
• Based on the Amazon Dynamo and Google BigTable papers
• Shared nothing
• Data as safe as possible
• Predictable scaling
Cassandra - More than one server
• All nodes participate in a cluster
• Shared nothing
• Add or remove as needed
• More capacity? Add a server
Cassandra - Locally Distributed
• Client writes to any node
• That node coordinates with the others
• Data is replicated in parallel
• Replication factor (RF): how many copies of your data?
• RF = 3 here
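A rough sketch of SimpleStrategy-style placement: the row's token picks a primary node on the ring, then RF - 1 more copies go to the next nodes clockwise (the node numbering and method names here are illustrative, not Cassandra internals):

```java
import java.util.ArrayList;
import java.util.List;

// Toy replica placement: token -> primary node, then walk
// clockwise around the ring for the remaining copies.
public class ReplicaPlacement {
    public static List<Integer> replicas(int token, int numNodes, int rf) {
        int primary = Math.floorMod(token, numNodes);
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < rf; i++) {
            // Wrap around the ring at the last node.
            out.add((primary + i) % numNodes);
        }
        return out;
    }

    public static void main(String[] args) {
        // 6-node ring, RF = 3: token 7 lands on node 1,
        // so copies go to nodes 1, 2, and 3.
        System.out.println(replicas(7, 6, 3));  // [1, 2, 3]
    }
}
```

Real Cassandra hashes the partition key into a large token space and honors rack/DC topology; the clockwise walk is the core idea.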
Cassandra - Geographically Distributed
• Client writes locally
• Data syncs across the WAN
• Replication factor is set per DC
Cassandra - Consistency
• Consistency level (CL)
• Client specifies it per read or write
• ALL = all replicas ack
• QUORUM = a majority (> 50%) of replicas ack
• LOCAL_QUORUM = a majority of replicas in the local DC ack
• ONE = only one replica acks
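The consistency-level math can be sketched in a few lines, assuming the usual majority rule of RF/2 + 1 for QUORUM (the class name is illustrative):

```java
// How many replica acks each consistency level requires,
// given the replication factor (RF).
public class ConsistencyLevels {
    public static int required(String cl, int rf) {
        switch (cl) {
            case "ALL":    return rf;
            case "QUORUM": return rf / 2 + 1;  // strict majority
            case "ONE":    return 1;
            default: throw new IllegalArgumentException("unknown CL: " + cl);
        }
    }

    public static void main(String[] args) {
        // RF = 3: QUORUM needs 2 acks, so one dead replica is survivable.
        System.out.println(required("QUORUM", 3));  // 2
        System.out.println(required("ALL", 3));     // 3
        System.out.println(required("ONE", 3));     // 1
    }
}
```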
Cassandra - Transparent to the application
• A single node failure shouldn't mean application failure
• Replication factor + consistency level = success
• This example: RF = 3, CL = QUORUM
> 50% acked, so we are good!
Cassandra Applications - Drivers
• DataStax drivers for Cassandra
• Java
• C#
• Python
• More on the way
Cassandra Applications - Connecting
• Create a pool of local servers
• The client just uses a session to interact with Cassandra
// Example values:
// contactPoints = { "10.0.0.1", "10.0.0.2" }
// keyspace = "videodb"
public VideoDbBasicImpl(List<String> contactPoints, String keyspace) {
    cluster = Cluster
        .builder()
        .addContactPoints(
            contactPoints.toArray(new String[contactPoints.size()]))
        .withLoadBalancingPolicy(Policies.defaultLoadBalancingPolicy())
        .withRetryPolicy(Policies.defaultRetryPolicy())
        .build();
    session = cluster.connect(keyspace);
}
Cassandra Applications - Load balancing
• Token aware - the request is sent to the primary node holding the data
• Calls can be asynchronous and in parallel
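The asynchronous, parallel calls can be sketched with plain CompletableFuture; here executeAsync is a stand-in for a real driver call, not the driver API:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Fire several queries at once, then collect results when all
// have completed, instead of waiting on each one in turn.
public class ParallelQueries {
    // Stand-in for an async driver call; name and behavior are
    // illustrative only.
    static CompletableFuture<String> executeAsync(String query) {
        return CompletableFuture.supplyAsync(() -> "result:" + query);
    }

    public static void main(String[] args) {
        List<CompletableFuture<String>> futures = List.of(
            executeAsync("q1"), executeAsync("q2"), executeAsync("q3"));

        // Wait for all requests, then read each result.
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        futures.forEach(f -> System.out.println(f.join()));
    }
}
```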
[Diagram: client threads issuing requests in parallel through the driver to a six-node cluster]
Cassandra Applications - Fault tolerance
• Try first with a consistency level of QUORUM
• If that fails, retry with a consistency level of ONE
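A sketch of this downgrade-and-retry pattern, with read standing in for a driver call (a simulation, not the DataStax retry-policy API):

```java
import java.util.function.Function;

// Try the read at QUORUM; if not enough replicas answer,
// accept weaker consistency and retry at ONE.
public class DowngradingRetry {
    public static String readWithFallback(Function<String, String> read) {
        try {
            return read.apply("QUORUM");
        } catch (RuntimeException notEnoughReplicas) {
            // Quorum unavailable; downgrade rather than fail the request.
            return read.apply("ONE");
        }
    }

    public static void main(String[] args) {
        // Simulated cluster where QUORUM times out but ONE succeeds.
        String result = readWithFallback(cl -> {
            if (cl.equals("QUORUM")) throw new RuntimeException("timeout");
            return "row@" + cl;
        });
        System.out.println(result);  // row@ONE
    }
}
```

The trade-off: the fallback read may return stale data, which is exactly the "partial or eventual data is OK" decision from earlier.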
[Diagram: client request coordinated by a node across three replicas]
Application Example - Layout
• Active-active
• Service-based DNS routing
Cassandra Replication
Application Example - Uptime
• Normal server maintenance
• Application is unaware
Cassandra Replication
Application Example - Failure
• Data center failure
• Data is safe. Route traffic.
Another happy user!
Conclusion
• Cassandra is THE BEST persistence tier for your application
• Plan for chaos. Inject your own.
• Now go write your app.
Data Modeling!
1. The Data Model is Dead, Long Live the Data Model
   http://www.youtube.com/watch?v=px6U2n74q3g
2. Become a Super Modeler
   http://www.youtube.com/watch?v=qphhxujn5Es
3. The World's Next Top Data Model
   http://www.youtube.com/watch?v=HdJlsOZVGwM
Example code: https://github.com/pmcfadin/cql3-videodb-example
Thank you!
CALL FOR PAPERS
SPONSORSHIP OPPORTUNITY
TWO DAYS, 30+ SESSIONS
TRAINING DAY
Oh yeah. We are hiring.
Q&A?