Dark launching with Consul at Hootsuite - Bill Monkman


Transcript of Dark launching with Consul at Hootsuite - Bill Monkman

Page 1: Dark launching with Consul at Hootsuite - Bill Monkman

Dark Launching with Consul

Senior Specialist Engineer

@bmonkman

Bill Monkman

Page 2: Dark launching with Consul at Hootsuite - Bill Monkman

Hootsuite

• The most widely used platform for managing social media
• Integrates with Twitter, Facebook, Instagram, LinkedIn, G+, etc.
• Started 7 years ago, now over 10 million users
• Used by over 800 of the Fortune 1000

Page 3: Dark launching with Consul at Hootsuite - Bill Monkman

Hootsuite

• Everything in Amazon AWS (low thousands of servers)
• Primary languages: PHP, Scala, Python, Go
• 10+ releases to production per day
• 20+ microservices
• Started using Consul in late 2014
• Also using Vagrant, Packer, Terraform, Vault

Page 4: Dark launching with Consul at Hootsuite - Bill Monkman

Consul at Hootsuite

• Deployed in all datacenters (AWS Regions) in staging and prod
• Clusters of 3-5 servers (Multi-AZ)
• Consul agent installed on almost every server
• First use: Dark Launching

Page 5: Dark launching with Consul at Hootsuite - Bill Monkman

Dark Launching

• AKA Feature Flagging, Feature Toggle, etc.
• Allow dynamic control of your systems in real time
• Used extensively at Facebook, Etsy, Flickr, others
• Integrated with all the languages we use, both front and back-end
• Very powerful tool for continuous delivery
• Key to engineers at HS pushing code quickly and confidently
• Allowed even other departments to control the system (Support, Marketing)

Page 6: Dark launching with Consul at Hootsuite - Bill Monkman

Dark Launching

Page 7: Dark launching with Consul at Hootsuite - Bill Monkman

Dark Launching

Various restriction types:
• boolean
• percentage_static
• percentage_random
• user_list
• organization_list
• plan_code
• language
• webserver
• etc.
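A minimal Python sketch of how a few of these restriction types could be evaluated. The slides only name the types, so the semantics here are assumptions: `percentage_static` is taken to mean a stable per-user bucket, and `percentage_random` a fresh roll on every check. The function names, the `value` and `restriction` fields as used here, and the hashing scheme are all hypothetical.

```python
import hashlib
import random


def bucket(flag: str, user_id: int) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag: str, config: dict, user_id: int, rng=random.random) -> bool:
    """Evaluate one flag for one user based on its restriction type."""
    restriction = config["restriction"]
    value = config["value"]
    if restriction == "boolean":
        return bool(value)
    if restriction == "percentage_static":
        # Assumed meaning: deterministic, the same user always gets the same answer.
        return bucket(flag, user_id) < value
    if restriction == "percentage_random":
        # Assumed meaning: re-rolled on every check, so a user may flip between requests.
        return rng() * 100 < value
    if restriction == "user_list":
        return user_id in value
    # Restriction types this sketch does not know about fail closed.
    return False
```

With a stable bucket, raising the percentage only ever adds users: anyone enabled at 10% stays enabled at 50%, which suits a staged rollout.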

Page 8: Dark launching with Consul at Hootsuite - Bill Monkman

Use Cases

Typical

Push new code then:

● Dark launch to yourself or your team to test
● Launch to the whole Hootsuite organization
● 10% of all users
● Watch graphs
● 50%
● 100%
● Simple means of rollback if necessary

Page 9: Dark launching with Consul at Hootsuite - Bill Monkman

Use Cases

Migration

● Controlled migration to new services
● Phased rollouts
● Allowing beta group of users to try new features ahead of full release

Page 10: Dark launching with Consul at Hootsuite - Bill Monkman

Use Cases

Load Testing

● When creating a new feature or service, send partial traffic to it, slowly ramp up

● Shadow reads/writes
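A shadow read can be sketched as follows, in Python. This is an illustration of the general pattern rather than Hootsuite's code: the caller is always served from the existing backend, while a flag-controlled fraction of reads is mirrored to the new backend and compared. `read_with_shadow`, `old_read`, `new_read` and the `mismatches` list are hypothetical names.

```python
import random


def read_with_shadow(key, old_read, new_read, shadow_percent, mismatches, rng=random.random):
    """Serve from the old backend; mirror a fraction of reads to the new one."""
    result = old_read(key)
    if rng() * 100 < shadow_percent:
        try:
            shadow = new_read(key)
            if shadow != result:
                mismatches.append((key, result, shadow))
        except Exception as exc:
            # A failing shadow call must never affect the user-facing response.
            mismatches.append((key, result, exc))
    return result
```

Ramping `shadow_percent` from 0 to 100 puts production-shaped load on the new service before any user depends on its answers.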

Page 11: Dark launching with Consul at Hootsuite - Bill Monkman

Use Cases

Security / Protection

● “Kill Twitter streams” flag
● Attack mitigation

Page 12: Dark launching with Consul at Hootsuite - Bill Monkman

Use Cases

A/B Testing

● Test a feature to half the user base to gauge impact/adoption
● Try to limit it to simple tests. Anything more complex needs a real A/B framework

Page 13: Dark launching with Consul at Hootsuite - Bill Monkman

Dark Launching at Hootsuite

Wrap code in a dark launch block

Newly added flags will be automatically registered in the KV store the first time the code executes (with some stampede protection)
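The slide does not say how the stampede protection works, so the Python sketch below shows one plausible shape under stated assumptions. Each process attempts registration at most once per flag, and the store only accepts a write if the key does not exist yet. `FakeKV`, `DarkLaunch`, `put_if_absent` and the `darklaunch/` prefix are hypothetical; Consul's KV HTTP API does offer a comparable create-only write through its check-and-set parameter (`cas=0`), though the talk does not confirm that this is what was used.

```python
import threading


class FakeKV:
    """In-memory stand-in for a KV store that supports a create-only write."""

    def __init__(self):
        self.data = {}
        self.writes = 0
        self._lock = threading.Lock()

    def put_if_absent(self, key, value) -> bool:
        with self._lock:
            self.writes += 1
            if key in self.data:
                return False  # somebody else registered it first
            self.data[key] = value
            return True


class DarkLaunch:
    def __init__(self, kv, local_flags, prefix="darklaunch/"):
        self.kv = kv
        self.local_flags = local_flags  # snapshot of flags pushed to this host
        self.prefix = prefix
        self._attempted = set()  # flags this process already tried to register
        self._lock = threading.Lock()

    def is_enabled(self, flag: str) -> bool:
        entry = self.local_flags.get(flag)
        if entry is None:
            self._register_once(flag)
            return False  # unknown flags start switched off
        return bool(entry["value"])

    def _register_once(self, flag: str) -> None:
        with self._lock:
            if flag in self._attempted:
                return
            self._attempted.add(flag)
        self.kv.put_if_absent(self.prefix + flag, {"value": 0, "restriction": "boolean"})
```

Wrapping code then reads as `if dl.is_enabled("NEW_COMPOSER"): new_path() else: old_path()`, where the flag name and both functions are placeholders.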

Page 14: Dark launching with Consul at Hootsuite - Bill Monkman

Managed via a web interface

(screenshot)

Dark Launching at Hootsuite

Page 15: Dark launching with Consul at Hootsuite - Bill Monkman

Managed via a web interface

(screenshot)

Dark Launching at Hootsuite

Page 16: Dark launching with Consul at Hootsuite - Bill Monkman

Dark Launching at Hootsuite

• Has become core to our continuous delivery workflow
• Changed the way we use source control
• Branching in production
• Comes with some associated costs - cleanup / complexity

Page 17: Dark launching with Consul at Hootsuite - Bill Monkman

Dark Launching at Hootsuite

Initial implementation

(diagram: web servers running PHP-FPM, with Memcached and MySQL)

Page 18: Dark launching with Consul at Hootsuite - Bill Monkman

Problems with the old way

● As Dark Launching became important to our process, usage skyrocketed
● Initial implementation with MySQL and Memcached ran into various issues
○ Hot cache keys
○ Too tied in to our core dashboard
○ Not suitable for a distributed system (move to microservices)
● Outages!

Page 19: Dark launching with Consul at Hootsuite - Bill Monkman

Enter Consul

● Fans of HashiCorp products already
● Saw potential for a “push”-based solution to dark launch management
● Wanted to explore it for other uses; this was a useful testing ground
● Evaluated a few tools, and though Consul was fairly bleeding-edge, we liked the feature set and direction of it and had faith in the team behind it.
● Based on well-known algorithms/protocols (Raft and SWIM)
● Started experimenting with a small-scale deployment

Page 20: Dark launching with Consul at Hootsuite - Bill Monkman

Implementation

Base data stored in Consul KV store (with metadata in MongoDB)

Page 21: Dark launching with Consul at Hootsuite - Bill Monkman

Implementation

Watch added using Ansible, baked into image
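The watch definition itself is not shown on the slide. As a hedged sketch, the Python helper below renders an agent configuration fragment with a single `keyprefix` watch. The `watches`, `type`, `prefix` and `handler` fields follow the Consul watch configuration of that era (newer releases specify the command through `args` instead of `handler`); the prefix and handler path are made-up examples.

```python
import json


def keyprefix_watch_config(prefix: str, handler: str) -> str:
    """Render an agent config fragment containing one keyprefix watch."""
    config = {
        "watches": [
            {
                "type": "keyprefix",  # fire whenever any key under the prefix changes
                "prefix": prefix,
                "handler": handler,   # command that receives the KV data on stdin
            }
        ]
    }
    return json.dumps(config, indent=2)
```

A provisioning tool such as Ansible could template the result of `keyprefix_watch_config("darklaunch/dashboard/", "/usr/local/bin/dl-handler")` into the agent's config directory before the image is built.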

Page 22: Dark launching with Consul at Hootsuite - Bill Monkman

Implementation (PHP)

● Handler that receives all KV data for a project
● Writes out a PHP syntax config file with all data as an array
● Hits webserver on localhost to clear APC cache (in-memory cache)
● PHP code then checks cache, reloads from file if missing and does a KV lookup on the array of dark launch data
● If the checked flag does not exist in the data, communicate with the local Consul agent to add it.
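The first two handler steps can be sketched in Python as below. A `keyprefix` watch passes its handler a JSON array on stdin in which each entry carries a `Key` and a base64-encoded `Value`. Everything else is an assumption made for illustration: that each value is a JSON document, that the flag name is the last path segment of the key, and the function names. The APC cache-clearing request is left out.

```python
import base64
import json
import os
import tempfile


def decode_watch_payload(payload: str) -> dict:
    """Turn keyprefix watch JSON into {flag_name: flag_data}."""
    flags = {}
    for entry in json.loads(payload) or []:
        if entry.get("Value") is None:
            continue  # key without a value, e.g. a bare folder key
        name = entry["Key"].rsplit("/", 1)[-1]
        flags[name] = json.loads(base64.b64decode(entry["Value"]))
    return flags


def php_literal(value) -> str:
    """Render a Python value in PHP array syntax."""
    if isinstance(value, dict):
        items = ", ".join(f"{php_literal(k)} => {php_literal(v)}" for k, v in value.items())
        return f"array({items})"
    if isinstance(value, (list, tuple)):
        return "array(" + ", ".join(php_literal(v) for v in value) + ")"
    if value is None:
        return "null"
    if isinstance(value, bool):  # checked before int, since bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def write_config(flags: dict, path: str) -> None:
    """Write the PHP config via a temp file and rename, so readers never see a partial file."""
    body = f"<?php\n$dlCodes = {php_literal(flags)};\n"
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as handle:
        handle.write(body)
    os.replace(tmp, path)
```

A real handler would read the payload with `sys.stdin.read()` and call `write_config(decode_watch_payload(payload), path)`. The rename step matters because PHP workers may include the file while it is being regenerated.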

Page 23: Dark launching with Consul at Hootsuite - Bill Monkman

Implementation (PHP)

<?php
$dlCodes = array(
    'ACCOUNT_CURRENCY_TOGGLE' => array(
        'value' => 0,
        'restriction' => 'boolean',
        'isAvailableToJs' => 0,
        'createdDate' => '2015-09-28 00:12:34',
    ),
    ...
);

Page 24: Dark launching with Consul at Hootsuite - Bill Monkman

Modifying a flag

(diagram: numbered steps 1-5 involving the DL Config, three Consul servers, and the Consul agent and PHP-FPM processes on each web server)

Page 25: Dark launching with Consul at Hootsuite - Bill Monkman

Creating a flag

(diagram: numbered steps 1-4 involving the PHP-FPM processes and Consul agent on each web server, three Consul servers, and the DL Config)

Page 26: Dark launching with Consul at Hootsuite - Bill Monkman

Implementation (Scala)

● Handler that receives all KV data for a project
● Writes out a Typesafe HOCON syntax config file with all data as a list
● Uses inotify to watch for changes to the file
● Scala code asks the actor for data for a specific dark launch code
● Uses an Akka Agent (a construct which just manages state)
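The same idea, a single state holder that is refreshed when the file changes and queried by everything else, can be sketched in Python. This is not the Scala implementation and it swaps several pieces for standard-library stand-ins: a lock-guarded object instead of an Akka Agent, a change check on modification time and size that the caller runs periodically instead of inotify, and JSON instead of HOCON. `FlagState` and its methods are hypothetical names.

```python
import json
import os
import threading


class FlagState:
    """Holds the latest flag snapshot; readers never touch the file directly."""

    def __init__(self, path: str):
        self.path = path
        self._lock = threading.Lock()
        self._flags = {}
        self._stamp = None

    def refresh(self) -> bool:
        """Reload the file if it changed since the last load; return True if reloaded."""
        stat = os.stat(self.path)
        stamp = (stat.st_mtime_ns, stat.st_size)
        if stamp == self._stamp:
            return False
        with open(self.path) as handle:
            flags = json.load(handle)
        with self._lock:
            # Swap the whole snapshot at once so readers see old or new, never a mix.
            self._flags, self._stamp = flags, stamp
        return True

    def get(self, code: str, default=None):
        """Return the data for one dark launch code."""
        with self._lock:
            return self._flags.get(code, default)
```

Keeping reads on an in-memory snapshot means a flag check costs a dictionary lookup, regardless of how the file is delivered to the host.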

Page 27: Dark launching with Consul at Hootsuite - Bill Monkman

Implementation (Containers)

● We use Mesos / Marathon to schedule long-running services written in Scala and Go

● Similar to previous implementations.
● Consul runs on the Mesos slave host, writes all service dark launch data to disk
● Shared between all containers on the host

Page 28: Dark launching with Consul at Hootsuite - Bill Monkman

Problems

● Multi-DC setup was hampered until Consul 0.5.1 due to lack of distinct LAN/WAN advertise addresses

● Atomicity - Convergence is slower than an atomic Memcached change, though that is not a problem for our usage of dark launching (typical convergence is within 1 second)

Page 29: Dark launching with Consul at Hootsuite - Bill Monkman

Convergence

1 second

Page 30: Dark launching with Consul at Hootsuite - Bill Monkman

Lessons Learned

● Enable ACLs early, plan your usage of ACLs
● Put enough thought into your KV store structure
● You may need to bribe your security team to convince them that having bi-directional communication between all nodes on specific ports is okay
● It’s important to understand Consul’s outage recovery process and document what to do in the unlikely event that all servers fail.
● Key prefix type events will be delivered even to nodes that were down at the time of the event

Page 31: Dark launching with Consul at Hootsuite - Bill Monkman

Conclusions

● Consul worked well for us right from the start (~0.4.0)
● Making an existing, valuable system better was a great way to introduce it to the company, making its adoption much smoother
● Using it for many other projects now
○ Nginx LB configuration based on auto-scaling web servers
○ Service discovery for seeding Akka Cluster
○ Distributed locking for various purposes
○ Microservice discovery and routing system (Skyline)
● Seamless upgrade process

Page 32: Dark launching with Consul at Hootsuite - Bill Monkman

Conclusions

● Increased stability and decreased load on Memcached / MySQL
● Since data is now pushed rather than pulled, the system can still read dark launch data independently of the state of the data store.
● Now usable in all DCs, projects and environments
● Shared state allows us to coordinate changes between microservices

Page 33: Dark launching with Consul at Hootsuite - Bill Monkman

Thank you!

@bmonkman

Bill Monkman

http://code.hootsuite.com