Progressive Delivery Patterns & Benefits · 2020-02-27 · How You Roll Matters @davekarow Approach...
Transcript of Progressive Delivery Patterns & Benefits · 2020-02-27 · How You Roll Matters @davekarow Approach...
Progressive Delivery Patterns & Benefits
Dave Karow Continuous Delivery Evangelist
Split.io
@davekarow
The future is already here — it's just not very evenly distributed.
William Gibson
@davekarow
What is Progressive Delivery?
Patterns In The Wild
Summing It Up
QR Code
@davekarow
What is Progressive Delivery and what are the potential benefits?
@davekarow @davekarow
Carlos Sanchez (Sr. Cloud Software Engineer @ Adobe)
https://blog.csanchez.org/2019/01/22/progressive-delivery-in-kubernetes-blue-green-and-canary-deployments/
Progressive Delivery is the next step after Continuous Delivery, where new versions are deployed to a subset of users and are evaluated in terms of correctness and performance before rolling them to the totality of the users and rolled back if not matching some key metrics.
@davekarow
Potential Benefits of Progressive Delivery
Avoid Downtime
Limit the Blast Radius
Limit WIP / Achieve Flow
Learn During the Process
@davekarow
How You Roll Matters Approach
Benefits Blue/Green
Deployment
Canary Release Feature Flag
Rollout
Feature Delivery
Platform
Avoid
Downtime
Limit The
Blast Radius
Limit WIP /
Achieve Flow
Learn During
The Process
https://www.split.io/blog/learn-the-four-shades-of-progressive-delivery/
Harvey Balls by Sschulte at English Wikipedia [CC BY-SA (https://creativecommons.org/licenses/by-sa/3.0)] @davekarow
How You Roll Matters
@davekarow
Approach
Benefits Blue/Green
Deployment
Canary Release Feature Flag
Rollout
Feature Delivery
Platform
Avoid
Downtime
Limit The
Blast Radius
Limit WIP /
Achieve Flow
Learn During
The Process
@davekarow
https://www.split.io/blog/learn-the-four-shades-of-progressive-delivery/
Harvey Balls by Sschulte at English Wikipedia [CC BY-SA (https://creativecommons.org/licenses/by-sa/3.0)]
How You Roll Matters
@davekarow
Approach
Benefits Blue/Green
Deployment
Canary Release Feature Flag
Rollout
Feature Delivery
Platform
Avoid
Downtime
Limit The
Blast Radius
Limit WIP /
Achieve Flow
Learn During
The Process
@davekarow
https://www.split.io/blog/learn-the-four-shades-of-progressive-delivery/
Harvey Balls by Sschulte at English Wikipedia [CC BY-SA (https://creativecommons.org/licenses/by-sa/3.0)]
How You Roll Matters
@davekarow
Approach
Benefits Blue/Green
Deployment
Canary Release Feature Flag
Rollout
Feature Delivery
Platform
Avoid
Downtime
Limit The
Blast Radius
Limit WIP /
Achieve Flow
Learn During
The Process
@davekarow
https://www.split.io/blog/learn-the-four-shades-of-progressive-delivery/
Harvey Balls by Sschulte at English Wikipedia [CC BY-SA (https://creativecommons.org/licenses/by-sa/3.0)]
How You Roll Matters
@davekarow
Approach
Benefits Blue/Green
Deployment
Canary Release Feature Flag
Rollout
Feature Delivery
Platform
Avoid
Downtime
Limit The
Blast Radius
Limit WIP /
Achieve Flow
Learn During
The Process
@davekarow
https://www.split.io/blog/learn-the-four-shades-of-progressive-delivery/
Harvey Balls by Sschulte at English Wikipedia [CC BY-SA (https://creativecommons.org/licenses/by-sa/3.0)]
Feature Delivery Platform Capabilities
@davekarow
Feature Delivery Platform Capabilities
@davekarow
Feature Delivery Platform Capabilities
@davekarow
Let’s Venture Into the Wild!
Bruce Turner from AustinTX
https://www.flickr.com/people/66994844@N00
[CC BY (https://creativecommons.org/licenses/by/2.0)]
@davekarow
Booking.com
@davekarow
Booking.com: a well-documented example of the pattern
@davekarow
https://medium.com/booking-com-development/moving-fast-breaking-things-and-fixing-them-as-quickly-as-possible-a6c16c5a1185
@davekarow
@davekarow @davekarow
@davekarow
Feature Flag Managing exposure like a dimmer or light board
0%
10%
20%
50%
100%
@davekarow
Booking.com’s experience with Manage: “asynchronous feature release” ● Deploying has no impact on user experience ● Deploy more frequently with less risk to business and users ● The big win is Agility
@davekarow
@davekarow
Monitoring the needle in the haystack If you roll out a change to just 5% of your population
...and 20% (1 in 5) of the exposed users get an error,
that’s a HUGE problem!
But, what % of your total user population is getting that error?
1%
@davekarow
@davekarow
@davekarow
Booking.com’s experience with Monitor: “Experimentation as a safety net” ● Each new feature is wrapped in its own experiment ● Allows: monitoring and stopping of individual changes ● The developer or team responsible for the feature can enable
and disable it... ● ...regardless of who deployed the new code that contained it.
@davekarow
Booking.com safety net automated: “circuit breaker” ● Active for the first three minutes of feature release ● Severe degradation → automatic abort of that feature ● Acceptable divergence from core value of local ownership
and responsibility where it’s a “no brainer” that users are being negatively impacted
@davekarow
Booking.com Circuit Breaker (Automatic Stopping)
@davekarow
Guardrail metrics
@davekarow
@davekarow
Booking.com’s experience with Experimentation: A way to validate ideas ● Measure (in a controlled manner) the impact changes have
on user behaviour ● Every change has a clear objective (explicitly stated
hypothesis on how it will improve user experience) ● Measuring allows validation that desired outcome is achieved
@davekarow
Feature Flag Experimentation example
@davekarow
50% 50%
The quicker we manage to validate new ideas the less time is wasted on things that don’t work and the more time is left to work on things that make a difference.
Booking’s Big Takeaway
@davekarow
LinkedIn XLNT/LIX
@davekarow
● Built a targeting engine that could “split” traffic between existing and new code
● Impact analysis was by hand only (and took ~2 weeks), so nobody did it :-(
Essentially just feature flags without automated feedback
LinkedIn early days: a modest start for XLNT
@davekarow
LinkedIn XLNT Today A controlled release with standardized KPI calculation launched very 5 minutes
100 releases per day
6000 metrics that can be “followed” by any stakeholder: “What releases are moving the numbers I care about?”
@davekarow
Lessons learned at LinkedIn ● Build for scale: no more coordinating over email ● Make it trustworthy: targeting and analysis must be rock solid ● Design for diverse teams, not just data scientists
Ya Xu Head of Data Science, LinkedIn Decisions Conference 10/2/2018
@davekarow
Step 1 Feature flags
Step 2 Sensors Correlation
Step 3 Stats Engine Causation
“Holy Grail” Mgmt console System of record Alerting
Increasing functionality & company adoption
Co
st t
o b
uild
an
d m
ain
tain
Summing it up: The patterns are proven
@davekarow
Maturity hasn’t come easily or fast for the pioneers
Split implements these patterns as a service: split.io
@davekarow
@davekarow
@davekarow
@davekarow
Checklists to DIY or Buy ● Foundational Capabilities You’ll Need ● How-To’s: Monitor & Experiment
@davekarow
Decouple deploy from release
❏ Allow changes of exposure w/o new deploy or rollback
❏ Support targeting by UserID, attribute (population), random hash
Foundational Capability #1
@davekarow
Automate a reliable and consistent way to answer,
“Who have we exposed this to so far?”
❏ Record who hit a flag, which way they were sent, and why
❏ Confirm that targeting is working as intended
❏ Confirm that expected traffic levels are reached
Foundational Capability #2
@davekarow
Automate a reliable and consistent way to answer,
“How is it going for them (and us)?”
❏ Automate comparison of system health (errors, latency, etc…)
❏ Automate comparison of user behavior (business outcomes)
❏ Make it easy to include “Guardrail Metrics” in comparisons to
avoid the local optimization trap
Foundational Capability #3
@davekarow
Limit the blast radius of unexpected consequences so you can replace the
“big bang” release night with more frequent, less stressful rollouts.
Build on the foundational capabilities to:
❏ Ramp in stages, starting with dev team, then dogfooding, then % of public
❏ Monitor at feature rollout level, not just globally (vivid facts vs faint signals)
❏ Alert at the team level (build it/own it)
❏ Kill if severe degradation detected (stop the pain now, triage later)
❏ Continue to ramp up healthy features while “sick” are ramped down or killed
How-To: Release Faster With Less Risk
@davekarow
Focus precious engineering cycles on “what works” with experimentation,
making statistically rigorous observations about what moves KPIs (and what
doesn’t).
Build on the foundational capabilities to:
❏ Target an experiment to a specific segment of users
❏ Ensure random, deterministic, persistent allocation to A/B/n variants
❏ Ingest metrics chosen before the experiment starts (not cherry-picked after)
❏ Compute statistical significance before proclaiming winners
❏ Design for diverse audiences, not just data scientists (buy-in needed to stick)
How-To: Engineer for Impact (Not Output)
@davekarow
Whatever you are, try to be a good one.
William Makepeace Thackeray
@davekarow
Progressive Delivery Resources tinyurl.com/pd4u2020 (just posted to twitter as well)
@davekarow