Netflix: A State of Xen - Chaos Monkey & Cassandra

Transcript of Netflix: A State of Xen - Chaos Monkey & Cassandra

Page 1: Netflix: A State of Xen - Chaos Monkey & Cassandra

A State of Xen Chaos Monkey & Cassandra

Page 2: Netflix: A State of Xen - Chaos Monkey & Cassandra

Who we are

Jean-Sebastien Jeannotte – JS

Senior Software Engineer, Platform Automation Engineering

@jsjeannotte

http://www.linkedin.com/in/jsjeannotte

Nir Alfasi

Senior Software Engineer, Platform Automation Engineering

@niralfasi

http://www.linkedin.com/in/alfasin

Christos Kalantzis

Director of Engineering, Cloud Database Engineering

Cassandra MVP

@chriskalan

http://www.linkedin.com/in/christoskalantzis

Page 3: Netflix: A State of Xen - Chaos Monkey & Cassandra

AWS Re:boot: September 2014, Every AZ

Page 4: Netflix: A State of Xen - Chaos Monkey & Cassandra
Page 5: Netflix: A State of Xen - Chaos Monkey & Cassandra
Page 6: Netflix: A State of Xen - Chaos Monkey & Cassandra
Page 7: Netflix: A State of Xen - Chaos Monkey & Cassandra
Page 8: Netflix: A State of Xen - Chaos Monkey & Cassandra

Our stack during Re:boot 2014

Diagram labels: C* + Priam (three nodes); REST + SSH

Page 9: Netflix: A State of Xen - Chaos Monkey & Cassandra

Our stack during Re:boot 2014

Page 10: Netflix: A State of Xen - Chaos Monkey & Cassandra

Our stack during Re:boot 2014

Page 11: Netflix: A State of Xen - Chaos Monkey & Cassandra

Our stack during Re:boot 2014

Diagram labels: C* + Priam (three nodes); REST + SSH; Atlas (twice); App 1; App 2

Page 12: Netflix: A State of Xen - Chaos Monkey & Cassandra

Our stack during Re:boot 2014

Page 13: Netflix: A State of Xen - Chaos Monkey & Cassandra

Our stack during Re:boot 2014

Flowchart node labels (the arrows did not survive extraction):

Disappearing instance?

Launch new instance

All good

Is the C* ring healthy?

Yes

Are all instances healthy?

Yes

All good

Can we fix automatically?

Replace bad instance

All good

Is there an offline maintenance?

First failure?

Sleep for X minutes and retry

PagerDuty

No

Is there an offline maintenance?

First failure?

All good

Every 30 min
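
The flowchart labels on this slide read like a periodic health check with automatic remediation and a paging fallback. The following is a minimal sketch of one way such a decision step could look. The branch order is an assumption, since the transcript keeps the node labels but not the arrows between them. `ClusterState`, `Action`, and `decide` are hypothetical names used only for illustration and are not part of Priam or any Netflix tool.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ALL_GOOD = "all good"
    LAUNCH_NEW_INSTANCE = "launch new instance"
    REPLACE_BAD_INSTANCE = "replace bad instance"
    SLEEP_AND_RETRY = "sleep for X minutes and retry"
    PAGE = "PagerDuty"


@dataclass
class ClusterState:
    """Hypothetical snapshot gathered by one run of the check (every 30 min)."""
    instance_disappeared: bool
    ring_healthy: bool
    all_instances_healthy: bool
    can_fix_automatically: bool
    offline_maintenance: bool
    first_failure: bool


def decide(state: ClusterState) -> Action:
    """Map one snapshot to an action. The branch order is assumed."""
    # A missing instance is handled first by launching a replacement.
    if state.instance_disappeared:
        return Action.LAUNCH_NEW_INSTANCE
    # Healthy ring and healthy instances: nothing to do.
    if state.ring_healthy and state.all_instances_healthy:
        return Action.ALL_GOOD
    # Something is unhealthy: try automatic remediation.
    if state.can_fix_automatically:
        return Action.REPLACE_BAD_INSTANCE
    # Assumption: a known offline maintenance explains the failure.
    if state.offline_maintenance:
        return Action.ALL_GOOD
    # Assumption: the first failure gets one retry before a human is paged.
    if state.first_failure:
        return Action.SLEEP_AND_RETRY
    return Action.PAGE


if __name__ == "__main__":
    unhealthy_again = ClusterState(
        instance_disappeared=False,
        ring_healthy=False,
        all_instances_healthy=False,
        can_fix_automatically=False,
        offline_maintenance=False,
        first_failure=False,
    )
    print(decide(unhealthy_again).value)
```

Keeping the decision a pure function of a snapshot makes each branch easy to test in isolation. A scheduler would call it on the "Every 30 min" cadence shown on the slide and carry out the returned action.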

Page 14: Netflix: A State of Xen - Chaos Monkey & Cassandra

Our stack during Re:boot 2014

AWS Re:boot: September 2014, Every AZ

Page 15: Netflix: A State of Xen - Chaos Monkey & Cassandra

Gaps we identified

Page 16: Netflix: A State of Xen - Chaos Monkey & Cassandra

Gaps we identified

Page 17: Netflix: A State of Xen - Chaos Monkey & Cassandra

Gaps we identified

Page 18: Netflix: A State of Xen - Chaos Monkey & Cassandra

Gaps we identified

Page 19: Netflix: A State of Xen - Chaos Monkey & Cassandra

New direction

Page 20: Netflix: A State of Xen - Chaos Monkey & Cassandra

New direction – What others are doing

Page 21: Netflix: A State of Xen - Chaos Monkey & Cassandra

New direction – What we decided to do

Page 22: Netflix: A State of Xen - Chaos Monkey & Cassandra

New direction – What we decided to do

Page 23: Netflix: A State of Xen - Chaos Monkey & Cassandra

New direction – What we decided to do

Diagram labels: C* + Priam (three nodes); Atlas (twice); App 1; App 2

Page 24: Netflix: A State of Xen - Chaos Monkey & Cassandra

New direction – What we learned (principles)

Page 25: Netflix: A State of Xen - Chaos Monkey & Cassandra

New direction – What we learned (principles)

Page 26: Netflix: A State of Xen - Chaos Monkey & Cassandra

New direction – What we learned (principles)

Synchronous: SSH

Asynchronous: HTTP / REST

Page 27: Netflix: A State of Xen - Chaos Monkey & Cassandra

New direction – What we learned (principles)

Page 28: Netflix: A State of Xen - Chaos Monkey & Cassandra

New direction – What we learned (principles)

Page 29: Netflix: A State of Xen - Chaos Monkey & Cassandra

What does the future look like?

Page 30: Netflix: A State of Xen - Chaos Monkey & Cassandra

What does the future look like?

Page 31: Netflix: A State of Xen - Chaos Monkey & Cassandra

What does the future look like?

Page 32: Netflix: A State of Xen - Chaos Monkey & Cassandra

Check out our https://jobs.netflix.com page for current openings
