Chef at Etsy


Slides from my "Chef at Etsy" talk at the London Chef Meetup on Thurs Oct 10th, 2014

Transcript of Chef at Etsy

Page 1: Chef at Etsy

Chef at Etsy

Page 2: Chef at Etsy

@jonlives

Jon Cowie

Sr Operations Engineer

Page 3: Chef at Etsy
Page 4: Chef at Etsy

30 Million Members

1 Million Active Shops

Page 5: Chef at Etsy

20 Million Items Listed

60 Million Monthly Unique Visitors

Page 6: Chef at Etsy

We Love Chef!

Page 7: Chef at Etsy

Absorb what is useful.

Discard what is useless.

Page 8: Chef at Etsy

“I am not smart enough to build an ontology … that

can encompass all the variations in infrastructure.

Nobody is, the world moves too fast.”

Page 9: Chef at Etsy

There is no magic pill.

Page 10: Chef at Etsy

You are the expert.

Page 11: Chef at Etsy

Chef at Etsy

• Chef Server 11.1.4

• ~2000 Nodes

• CentOS, some Mac OS X

Page 12: Chef at Etsy

[Images: Beginning of 2010 / Today]

Page 13: Chef at Etsy

Chef at Etsy

Page 14: Chef at Etsy

Evolution of Chef

Page 15: Chef at Etsy

2010: The Beginning

• ~250 Nodes (Ubuntu & CentOS)

• The first cookbooks

• Out of the box workflow

Page 16: Chef at Etsy

2011: Growth

• ~400 Nodes (CentOS)

• Chef still pretty specialised knowledge

• Handlers added

Page 17: Chef at Etsy

2012: A big year

• ~800 Nodes (CentOS & Mac OS X)

• More in-house Chef expertise

• Workflow tooling

• Debugging tooling

• Monitoring

Page 18: Chef at Etsy

2013: Chef at Etsy

• ~1500 Nodes

• Workflow tooling enhancements

• Feature flags in Chef

• Chef performance - Chef 11 upgrade

Page 19: Chef at Etsy

2014: Chef at Etsy

• ~2000 Nodes

• Consolidation

• CI with Chef

• Omnibus

• Work-in-Progress tooling

Page 20: Chef at Etsy

Patterns & Workflows

Page 21: Chef at Etsy

Cookbook Workflow

Page 22: Chef at Etsy

$> review -r jcowie --cc ops

Page 23: Chef at Etsy

knife-spork

• https://github.com/jonlives/knife-spork

• Workflow tool

• Helps multiple chefs avoid clashing

• Visibility into changes

• Plugins

Page 24: Chef at Etsy

knife-spork

• knife spork bump

• knife spork upload

• Test change

Page 25: Chef at Etsy

Test Change

• https://github.com/jonlives/knife-flip

• knife node flip foo.etsy.com testing

• knife role flip MyRole testing

Page 26: Chef at Etsy

Test Change

• https://github.com/mrtazz/knife-wip

• Uses node tags

<irccat> CHEF: bburry started work cent7 package bugfixing on deploy01.ny5.etsy.com

Page 27: Chef at Etsy

knife-spork

• knife spork bump

• knife spork upload

• Test change

• knife spork promote --remote

• git commit and push
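
The bump and promote steps above can be sketched in plain Ruby. This is a conceptual illustration only, not knife-spork's actual code: the helper names `bump_patch` and `promote`, the `ganglia` cookbook, and the version numbers are all hypothetical, and it assumes a patch-level bump and an exact (`= x.y.z`) version pin in the environment.

```ruby
# Conceptual sketch of "bump" then "promote". Hypothetical helpers,
# not knife-spork's real implementation.

# Increment the patch component of a "major.minor.patch" version string.
def bump_patch(version)
  major, minor, patch = version.split(".").map(&:to_i)
  [major, minor, patch + 1].join(".")
end

# Return a copy of the environment with the cookbook pinned to one version.
def promote(environment, cookbook, version)
  constraints = environment.fetch("cookbook_versions", {})
  environment.merge(
    "cookbook_versions" => constraints.merge(cookbook => "= #{version}")
  )
end

# Hypothetical environment and cookbook, for illustration only.
testing = { "name" => "testing", "cookbook_versions" => { "ganglia" => "= 1.0.4" } }

new_version = bump_patch("1.0.4")
promoted = promote(testing, "ganglia", new_version)
puts promoted["cookbook_versions"]["ganglia"] # prints "= 1.0.5"
```

Returning a new hash rather than mutating the environment keeps the old constraints available for comparison, which is roughly the visibility the workflow is after.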

Page 28: Chef at Etsy

Monitoring & Debugging

Page 29: Chef at Etsy

knife-spork & CI Job

<irccat> CHEF: Jon Cowie uploaded [email protected]

<irccat> CHEF: Jon Cowie promoted [email protected] to production

<snip>

<irccat> Git PUSH -> Sysops/chef

<snip>

<Jenkins> Starting build #5649 for job chef-server-git-sync

<Jenkins> Project chef-server-git-sync build #5649: SUCCESS in 2 min 36 sec: http://ci.etsycorp.com/job/chef-server-git-sync/5649/

Page 30: Chef at Etsy

IRC Handler

<irccat> Chef run failed on officebackup01.office.etsy.com gist failed, see /var/log/chef/client.log on the host

<irccat> Still Failing on dbnest01.ny4.etsy.com since 2 days ago https://github.etsycorp.com/gist/656d8914fbef5a6bd9aa

Page 31: Chef at Etsy

Lastrun Data

• https://github.com/jgoulah/knife-lastrun

• knife node lastrun foo.bar.com

Page 32: Chef at Etsy

Lastrun Data

% knife node lastrun dbnest01.ny4.etsy.com
Status        failed
Elapsed Time  29.055892
Start Time    2014-10-06 12:54:51 +0000
End Time      2014-10-06 12:55:20 +0000
<snip>
Exception
<snip>
Installed package backupd-1.4-1.365657d.el5.centos is newer than candidate package backupd-1.2-1.99ddb8e.el5

Page 33: Chef at Etsy

Dashboards

Page 34: Chef at Etsy

Dashboards

Page 35: Chef at Etsy

Dashboards

Page 36: Chef at Etsy

Monitoring & Debugging

• https://github.com/etsy/chef-handlers

• https://github.com/etsy/dashboard

• https://github.com/jgoulah/knife-lastrun

• https://github.com/bmarini/knife-inspect

Page 37: Chef at Etsy

Feature Flags

Page 38: Chef at Etsy

Downsides of Existing Approach

• Holding cookbook in testing is blocking

• Accidental promotions

• Testing env affects all cookbooks

• “Upgrade” envs often used

• How to make it more “Etsy”?

Page 39: Chef at Etsy

Page 40: Chef at Etsy

chef-whitelist

• https://github.com/etsy/chef-whitelist

• Databag driven

• Cookbook library

• Feature flags!

Page 41: Chef at Etsy

chef-whitelist

{
  "id": "php-5-5-17",
  "patterns": [
    "statsd*.ny5.etsy.com",
    "deploy*.ny5.etsy.com",
    <snip>
  ]
}

Page 42: Chef at Etsy

chef-whitelist

if node.is_in_whitelist? "php-5-5-17"
  package "php-pecl-opcache" do
    action :remove
  end
end
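
To make the feature-flag idea concrete, here is a minimal sketch of how a whitelist check against hostname patterns could work. It is not the chef-whitelist library's code: the `WHITELISTS` constant stands in for the data bag item, `in_whitelist?` is a hypothetical helper, and shell-style glob matching via `File.fnmatch?` is an assumption about how the patterns are interpreted.

```ruby
# Hypothetical stand-in for the data bag item shown on the previous slide.
WHITELISTS = {
  "php-5-5-17" => ["statsd*.ny5.etsy.com", "deploy*.ny5.etsy.com"]
}.freeze

# True if the node's FQDN matches any glob pattern in the named whitelist.
# Glob semantics (File.fnmatch?) are an assumption of this sketch.
def in_whitelist?(fqdn, whitelist_id, whitelists = WHITELISTS)
  patterns = whitelists.fetch(whitelist_id, [])
  patterns.any? { |pattern| File.fnmatch?(pattern, fqdn) }
end

puts in_whitelist?("statsd01.ny5.etsy.com", "php-5-5-17") # prints "true"
puts in_whitelist?("web01.ny5.etsy.com", "php-5-5-17")    # prints "false"
```

An unknown whitelist name yields `false` here, so a recipe guarded by a flag that has not been created yet simply skips the new behaviour.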

Page 43: Chef at Etsy

Configuration Data

Page 44: Chef at Etsy

Keep cookbooks:

• Simple

• Modular

• Scalable

• Maintainable

Page 45: Chef at Etsy

Environments

• Cookbook version constraints

Page 46: Chef at Etsy

Roles

• Group-level config

• Syslog-ng

• Iptables

• Sudoers

Page 47: Chef at Etsy

Roles - iptables

"firewall": {
  "ports": {
    "11211": {
      "subnet_group": "prod_subnets"
    },
    <snip>
  }
}
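
As a rough illustration of how a cookbook might consume these role attributes, the sketch below expands each port into one iptables rule per subnet in its group. Everything beyond the attribute shape is assumed: the `SUBNET_GROUPS` mapping, its CIDR values, the `firewall_rules` helper, and the choice of TCP `ACCEPT` rules on the `INPUT` chain are hypothetical.

```ruby
# Hypothetical mapping from subnet group names to CIDR blocks.
SUBNET_GROUPS = {
  "prod_subnets" => ["10.0.0.0/24", "10.0.1.0/24"]
}.freeze

# Expand the role's "firewall" attributes into iptables rule strings,
# one rule per (port, subnet) pair.
def firewall_rules(firewall, subnet_groups = SUBNET_GROUPS)
  firewall.fetch("ports", {}).flat_map do |port, opts|
    subnet_groups.fetch(opts["subnet_group"]).map do |subnet|
      "-A INPUT -p tcp -s #{subnet} --dport #{port} -j ACCEPT"
    end
  end
end

# Same shape as the role attributes on the slide.
firewall = { "ports" => { "11211" => { "subnet_group" => "prod_subnets" } } }
puts firewall_rules(firewall)
```

Keeping only the port and a group name in the role means a subnet change is made once, in the group definition, rather than in every role.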

Page 48: Chef at Etsy

Roles - Syslog-ng

"syslog": {
  "web": {
    "web_apache_access_log": {
      "source": "/var/log/httpd/access_log",
      "source_program_override": "APACHEACCESS: ",
      "destination": "/data/syslog/current/web/access.log",
      "destination_filters": [
        "host('^(web0|dlweb)')",
        "match('APACHEACCESS')"
      ]
    }
  }
}
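
One way a template helper could turn the `destination_filters` list into a syslog-ng filter statement is sketched below. The `syslog_filter` helper, the `f_` naming prefix, and joining the expressions with `and` are assumptions made for illustration, not the cookbook's actual behaviour.

```ruby
# Build a syslog-ng "filter" statement from one log entry in the role.
# Combining the expressions with "and" is an assumption of this sketch.
def syslog_filter(name, entry)
  expression = entry.fetch("destination_filters").join(" and ")
  "filter f_#{name} { #{expression}; };"
end

# Same shape as the role attributes on the slide.
entry = {
  "source" => "/var/log/httpd/access_log",
  "destination" => "/data/syslog/current/web/access.log",
  "destination_filters" => ["host('^(web0|dlweb)')", "match('APACHEACCESS')"]
}

puts syslog_filter("web_apache_access_log", entry)
```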

Page 49: Chef at Etsy

Data Bags

• Global / Datacenter specific Config

• Ganglia

• Cobbler

• VOIP

• Data Storage

Page 50: Chef at Etsy

Data Bags - Ganglia

{

"id": "config_se5",

"grid_name": "EtsySE5",

"authority": "http://gangliase5.etsycorp.com",

"trusted_hosts": <snip>,

"groups": {

"Utilities": "239.2.11.71",

<snip>

}

<snip>

}

Page 51: Chef at Etsy

Data Bags - Cobbler

{

"id": "config_corp",

"cobbler_server": "corpking02.corp.etsy.com",

"dns_servers": [ "10.x.x.x", "10.x.x.x" ],

"dhcp_ranges": {

"10.100.x.0": {

"routers": "10.x.x.1",

"mask": "255.255.255.0",

"range": "10.x.x.11 10.x.x.250"

}

}

}
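
A data bag item like this is typically rendered into service config by a template. The sketch below shows the idea for one `dhcp_ranges` entry, assuming ISC dhcpd-style output. The `dhcp_subnet_stanza` helper and the concrete addresses are hypothetical, since the slide redacts the real values.

```ruby
# Render one "dhcp_ranges" entry as an ISC dhcpd-style subnet stanza.
# The target config format is an assumption of this sketch.
def dhcp_subnet_stanza(network, opts)
  <<~CONF
    subnet #{network} netmask #{opts["mask"]} {
      option routers #{opts["routers"]};
      range #{opts["range"]};
    }
  CONF
end

# Hypothetical values standing in for the redacted addresses.
dhcp_ranges = {
  "10.100.0.0" => {
    "routers" => "10.100.0.1",
    "mask"    => "255.255.255.0",
    "range"   => "10.100.0.11 10.100.0.250"
  }
}

dhcp_ranges.each { |network, opts| puts dhcp_subnet_stanza(network, opts) }
```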

Page 52: Chef at Etsy

Write cookbooks you’ll thank yourself for.

Page 53: Chef at Etsy

http://jonliv.es/book

Discount Code: AUTHD

40% off Print, 50% off Digital

Page 54: Chef at Etsy

Thanks! Questions?

@jonlives / http://jonliv.es / [email protected]