Fault tolerance made easy
![Page 1: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/1.jpg)
Fault tolerance made easy
Patterns for fault tolerance implemented surprisingly easy
Uwe Friedrichsen, codecentric AG, 2013-2014
![Page 2: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/2.jpg)
@ufried Uwe Friedrichsen | http://slideshare.net/ufried | http://ufried.tumblr.com
![Page 3: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/3.jpg)
It's all about production!
![Page 4: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/4.jpg)
Production
Availability
Resilience
Fault Tolerance
![Page 5: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/5.jpg)
Your web server doesn't look good …
![Page 6: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/6.jpg)
Pattern #1
Timeouts
![Page 7: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/7.jpg)
Timeouts (1)

```java
// Basics
myObject.wait();         // Do not use this by default
myObject.wait(TIMEOUT);  // Better use this

// Some more basics
myThread.join();         // Do not use this by default
myThread.join(TIMEOUT);  // Better use this
```
![Page 8: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/8.jpg)
Timeouts (2)

```java
// Using the Java concurrent library
Callable<MyActionResult> myAction = <My Blocking Action>
ExecutorService executor = Executors.newSingleThreadExecutor();
Future<MyActionResult> future = executor.submit(myAction);
MyActionResult result = null;
try {
    result = future.get();                  // Do not use this by default
    result = future.get(TIMEOUT, TIMEUNIT); // Better use this
} catch (TimeoutException e) { // Only thrown if timeouts are used
    ...
} catch (...) {
    ...
}
```
![Page 9: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/9.jpg)
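Timeouts (3)

```java
// Using Guava SimpleTimeLimiter
// (API as of the Guava releases available in 2013-2014; later releases changed
// how the limiter is constructed and called, so check the version you use)
Callable<MyActionResult> myAction = <My Blocking Action>
SimpleTimeLimiter limiter = new SimpleTimeLimiter();
MyActionResult result = null;
try {
    result = limiter.callWithTimeout(myAction, TIMEOUT, TIMEUNIT, false);
} catch (UncheckedTimeoutException e) {
    ...
} catch (...) {
    ...
}
```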
![Page 10: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/10.jpg)
• Determining Timeout Duration
• Configurable Timeouts
• Self-Adapting Timeouts
• Timeouts in JavaEE Containers
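One way to make the timeout from "Timeouts (2)" configurable is to wrap the `Future.get(TIMEOUT, TIMEUNIT)` call in a small helper. The following is only a sketch under stated assumptions: the class name `TimeoutRunner`, its constructor parameter, and the use of a cached thread pool are hypothetical and not taken from the talk.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper (not from the talk): runs an action with a configurable timeout.
final class TimeoutRunner {
    // Daemon threads, so that a stuck action cannot keep the JVM alive.
    private final ExecutorService executor = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "timeout-runner");
        t.setDaemon(true);
        return t;
    });
    private final long timeoutMillis;

    TimeoutRunner(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    <T> T call(Callable<T> action)
            throws TimeoutException, ExecutionException, InterruptedException {
        Future<T> future = executor.submit(action);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Interrupt the worker; this only helps if the action reacts to interruption.
            future.cancel(true);
            throw e;
        }
    }

    void shutdown() {
        executor.shutdownNow();
    }
}
```

The timeout value could then come from configuration, for example `new TimeoutRunner(Long.getLong("myservice.timeoutMillis", 500L))`, where the property name is again just an illustration.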
![Page 11: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/11.jpg)
Pattern #2
Circuit Breaker
![Page 12: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/12.jpg)
Circuit Breaker (1)
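[Diagram. Components: Client, Circuit Breaker, Resource. The client's Request goes through the circuit breaker, which guards the resource. Lifecycle states: Closed, Open, Half-Open. Transition labels: Resource unavailable, Resource available.]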
![Page 13: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/13.jpg)
Circuit Breaker (2)
Closed
• on call / pass through
• call succeeds / reset count
• call fails / count failure
• threshold reached / trip breaker

Open
• on call / fail
• on timeout / attempt reset

Half-Open
• on call / pass through
• call succeeds / reset
• call fails / trip breaker

Transitions
• Closed → Open: trip breaker
• Open → Half-Open: attempt reset
• Half-Open → Open: trip breaker
• Half-Open → Closed: reset
Source: M. Nygard, "Release It!"
![Page 14: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/14.jpg)
Circuit Breaker (3)

```java
public class CircuitBreaker implements MyResource {
    public enum State { CLOSED, OPEN, HALF_OPEN }

    final MyResource resource;
    State state;
    int counter;
    long tripTime;

    public CircuitBreaker(MyResource r) {
        resource = r;
        state = State.CLOSED;
        counter = 0;
        tripTime = 0L;
    }
    ...
```
![Page 15: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/15.jpg)
Circuit Breaker (4)

```java
    ...
    public Result access(...) { // resource access
        Result r = null;
        if (state == State.OPEN) {
            checkTimeout(); // may switch to HALF_OPEN for the next call
            throw new ResourceUnavailableException();
        }
        try {
            r = resource.access(...); // should use timeout
        } catch (Exception e) {
            fail();
            throw e;
        }
        success();
        return r;
    }
    ...
```
![Page 16: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/16.jpg)
Circuit Breaker (5)

```java
    ...
    private void success() {
        reset();
    }

    private void fail() {
        counter++;
        if (counter > THRESHOLD) {
            tripBreaker();
        }
    }

    private void reset() {
        state = State.CLOSED;
        counter = 0;
    }
    ...
```
![Page 17: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/17.jpg)
Circuit Breaker (6)

```java
    ...
    private void tripBreaker() {
        state = State.OPEN;
        tripTime = System.currentTimeMillis();
    }

    private void checkTimeout() {
        if ((System.currentTimeMillis() - tripTime) > TIMEOUT) {
            state = State.HALF_OPEN;
            counter = THRESHOLD; // a single failure trips the breaker again
        }
    }

    public State getState() {
        return state;
    }
}
```
![Page 18: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/18.jpg)
• Thread-Safe Circuit Breaker
• Failure Types
• Tuning Circuit Breakers
• Available Implementations
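The circuit breaker shown on the previous slides changes `state`, `counter` and `tripTime` without any synchronization. A minimal sketch of one possible thread-safe variant follows. It is an assumption-laden illustration, not the author's implementation: the class name `SyncCircuitBreaker`, the `Supplier`-based `call` method and the injectable clock are hypothetical. Unlike the slide code, the first call after the timeout is passed through directly instead of being rejected.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical thread-safe variant; names and API are illustrative only.
final class SyncCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int threshold;
    private final long timeoutMillis;
    private final LongSupplier clock; // e.g. System::currentTimeMillis
    private State state = State.CLOSED;
    private int failures = 0;
    private long tripTime = 0L;

    SyncCircuitBreaker(int threshold, long timeoutMillis, LongSupplier clock) {
        this.threshold = threshold;
        this.timeoutMillis = timeoutMillis;
        this.clock = clock;
    }

    <T> T call(Supplier<T> action) {
        beforeCall();
        T result;
        try {
            // The action runs outside the lock, so slow calls do not block other callers.
            result = action.get();
        } catch (RuntimeException e) {
            onFailure();
            throw e;
        }
        onSuccess();
        return result;
    }

    private synchronized void beforeCall() {
        if (state == State.OPEN) {
            if (clock.getAsLong() - tripTime > timeoutMillis) {
                state = State.HALF_OPEN; // attempt reset: let this call pass through
            } else {
                throw new IllegalStateException("circuit open");
            }
        }
    }

    private synchronized void onSuccess() {
        state = State.CLOSED;
        failures = 0;
    }

    private synchronized void onFailure() {
        failures++;
        if (state == State.HALF_OPEN || failures > threshold) {
            state = State.OPEN;
            tripTime = clock.getAsLong();
        }
    }

    synchronized State getState() {
        return state;
    }
}
```

Because the guarded action runs outside the lock, several concurrent calls may pass through while the breaker is half-open. Whether that is acceptable depends on the resource being protected.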
![Page 19: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/19.jpg)
Pattern #3
Fail Fast
![Page 20: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/20.jpg)
Fail Fast (1)
[Diagram: the Client sends a Request to the Expensive Action, which uses the Resources.]
![Page 21: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/21.jpg)
Fail Fast (2)
[Diagram: the Client sends a Request to the Fail Fast Guard. The guard checks the availability of the Resources and forwards the request to the Expensive Action, which uses the Resources.]
![Page 22: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/22.jpg)
Fail Fast (3)

```java
public class FailFastGuard {
    private FailFastGuard() {}

    public static void checkResources(Set<CircuitBreaker> resources) {
        for (CircuitBreaker r : resources) {
            if (r.getState() != CircuitBreaker.State.CLOSED) {
                throw new ResourceUnavailableException(r);
            }
        }
    }
}
```
![Page 23: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/23.jpg)
Fail Fast (4)

```java
public class MyService {
    Set<CircuitBreaker> requiredResources;

    // Initialize resources
    ...

    public Result myExpensiveAction(...) {
        FailFastGuard.checkResources(requiredResources);
        // Execute core action
        ...
    }
}
```
![Page 24: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/24.jpg)
The dreaded SiteTooSuccessfulException …
![Page 25: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/25.jpg)
Pattern #4
Shed Load
![Page 26: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/26.jpg)
Shed Load (1)
[Diagram: Clients send too many Requests to the Server.]
![Page 27: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/27.jpg)
Shed Load (2)
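[Diagram: Clients send too many Requests. A Gate Keeper in front of the Server passes Requests on to the server and sheds the rest (Shedded Requests). A Monitor monitors the server load (Monitor Load), and the Gate Keeper requests load data from the Monitor (Request Load Data).]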
![Page 28: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/28.jpg)
Shed Load (3)

```java
public class ShedLoadFilter implements Filter {
    Random random;

    public void init(FilterConfig fc) throws ServletException {
        random = new Random(System.currentTimeMillis());
    }

    public void destroy() {
        random = null;
    }
    ...
```
![Page 29: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/29.jpg)
Shed Load (4)

```java
    ...
    public void doFilter(ServletRequest request, ServletResponse response,
                         FilterChain chain)
            throws java.io.IOException, ServletException {
        int load = getLoad();
        if (shouldShed(load)) {
            HttpServletResponse res = (HttpServletResponse) response;
            res.setIntHeader("Retry-After", RECOMMENDATION);
            res.sendError(HttpServletResponse.SC_SERVICE_UNAVAILABLE);
            return;
        }
        chain.doFilter(request, response);
    }
    ...
```
![Page 30: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/30.jpg)
Shed Load (5)

```java
    ...
    private boolean shouldShed(int load) { // Example implementation
        if (load < THRESHOLD) {
            return false;
        }
        double shedBoundary = ((double) (load - THRESHOLD))
                / ((double) (MAX_LOAD - THRESHOLD));
        return random.nextDouble() < shedBoundary;
    }
}
```
![Page 31: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/31.jpg)
Shed Load (6)
![Page 32: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/32.jpg)
Shed Load (7)
![Page 33: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/33.jpg)
• Shedding Strategy
• Retrieving Load
• Tuning Load Shedders
• Alternative Strategies
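The filter on the previous slides leaves `getLoad()` open. One simple possibility, sketched here purely as an assumption, is to use the number of requests currently in flight as the load figure. The class name `InFlightLoadShedder`, its methods and the injectable random source are hypothetical; the shed probability follows the `shouldShed` formula from "Shed Load (5)", clamped to 1.0.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.DoubleSupplier;

// Hypothetical sketch: load = number of requests currently being processed.
final class InFlightLoadShedder {
    private final int threshold;
    private final int maxLoad;
    private final DoubleSupplier random; // supplies values in [0.0, 1.0)
    private final AtomicInteger inFlight = new AtomicInteger();

    InFlightLoadShedder(int threshold, int maxLoad, DoubleSupplier random) {
        if (maxLoad <= threshold) {
            throw new IllegalArgumentException("maxLoad must be greater than threshold");
        }
        this.threshold = threshold;
        this.maxLoad = maxLoad;
        this.random = random;
    }

    /** Returns true if the request is admitted; the caller must then call release(). */
    boolean tryAcquire() {
        int load = inFlight.incrementAndGet();
        if (random.getAsDouble() < shedProbability(load)) {
            inFlight.decrementAndGet(); // shed: this request does not count as load
            return false;
        }
        return true;
    }

    void release() {
        inFlight.decrementAndGet();
    }

    int getLoad() {
        return inFlight.get();
    }

    /** Linear ramp from 0 at the threshold to 1 at maxLoad, as in shouldShed(). */
    double shedProbability(int load) {
        if (load < threshold) {
            return 0.0;
        }
        double p = (double) (load - threshold) / (double) (maxLoad - threshold);
        return Math.min(1.0, p);
    }
}
```

In a servlet filter, `tryAcquire()` would be called at the start of `doFilter` and `release()` in a `finally` block around `chain.doFilter(...)`. A production random source could be `() -> ThreadLocalRandom.current().nextDouble()`.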
![Page 34: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/34.jpg)
Pattern #5
Deferrable Work
![Page 35: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/35.jpg)
Deferrable Work (1)
[Diagram: the Client sends Requests to Request Processing. Request Processing and Routine Work both use the same Resources.]
![Page 36: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/36.jpg)
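Deferrable Work (2)
[Chart comparing two cases, "Without Deferrable Work" and "With Deferrable Work". Each case shows the shares of Request Processing and Routine Work against a 100% capacity line, with the region above it marked OVERLOAD.]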
![Page 37: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/37.jpg)
Deferrable Work (3)

```java
// Do or wait variant
ProcessingState state = initBatch();
while (!state.done()) {
    int load = getLoad();
    if (load > THRESHOLD) {
        waitFixedDuration();
    } else {
        state = processNext(state);
    }
}

void waitFixedDuration() {
    Thread.sleep(DELAY); // try-catch left out for better readability
}
```
![Page 38: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/38.jpg)
Deferrable Work (4)

```java
// Adaptive load variant
ProcessingState state = initBatch();
while (!state.done()) {
    waitLoadBased();
    state = processNext(state);
}

void waitLoadBased() {
    int load = getLoad();
    long delay = calcDelay(load);
    Thread.sleep(delay); // try-catch left out for better readability
}

long calcDelay(int load) { // Simple example implementation
    if (load < THRESHOLD) {
        return 0L;
    }
    return (load - THRESHOLD) * DELAY_FACTOR;
}
```
![Page 39: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/39.jpg)
• Delay Strategy
• Retrieving Load
• Tuning Deferrable Work
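The linear `calcDelay` from "Deferrable Work (4)" grows without bound as the load rises. One possible refinement of the delay strategy, shown here only as a sketch, caps the delay at a maximum so that routine work is slowed down but never stalls indefinitely. The class name `DelayStrategy` and the parameters `delayFactorMillis` and `maxDelayMillis` are assumptions for illustration.

```java
// Hypothetical sketch of a capped, load-based delay strategy.
final class DelayStrategy {
    private DelayStrategy() {}

    /**
     * Returns 0 below the threshold, otherwise a delay that grows linearly
     * with the load above the threshold and is capped at maxDelayMillis.
     */
    static long calcDelay(int load, int threshold,
                          long delayFactorMillis, long maxDelayMillis) {
        if (load < threshold) {
            return 0L;
        }
        long delay = (long) (load - threshold) * delayFactorMillis;
        return Math.min(delay, maxDelayMillis);
    }
}
```

With a threshold of 70, a factor of 100 ms and a cap of 2000 ms, a load of 80 yields a delay of 1000 ms, while a load of 100 is capped at 2000 ms.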
![Page 40: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/40.jpg)
I can hardly hear you …
![Page 41: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/41.jpg)
Pattern #6
Leaky Bucket
![Page 42: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/42.jpg)
Leaky Bucket (1)
[Diagram: when a problem occurred, Error Handling fills the Leaky Bucket (Fill) and checks whether it has overflowed (Overflowed?). The bucket leaks periodically (Leak).]
![Page 43: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/43.jpg)
Leaky Bucket (2)

```java
public class LeakyBucket { // Very simple implementation
    private final int capacity;
    private int level;
    private boolean overflow;

    public LeakyBucket(int capacity) {
        this.capacity = capacity;
        drain();
    }

    public void drain() {
        this.level = 0;
        this.overflow = false;
    }
    ...
```
![Page 44: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/44.jpg)
Leaky Bucket (3)

```java
    ...
    public void fill() {
        level++;
        if (level > capacity) {
            overflow = true;
        }
    }

    public void leak() {
        level--;
        if (level < 0) {
            level = 0;
        }
    }

    public boolean overflowed() {
        return overflow;
    }
}
```
![Page 45: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/45.jpg)
• Thread-Safe Leaky Bucket
• Leaking strategies
• Tuning Leaky Bucket
• Available Implementations
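A minimal sketch of how the simple `LeakyBucket` from the previous slides might be made thread-safe and leak periodically, assuming plain `synchronized` methods and a `ScheduledExecutorService` for the periodic leak. The class name `SyncLeakyBucket` and the methods `level()` and `leakPeriodically(...)` are hypothetical additions. As in the slide code, the overflow flag stays set until `drain()` is called.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical thread-safe variant of the slide's LeakyBucket.
final class SyncLeakyBucket {
    private final int capacity;
    private int level;
    private boolean overflow;

    SyncLeakyBucket(int capacity) {
        this.capacity = capacity;
    }

    synchronized void fill() {
        level++;
        if (level > capacity) {
            overflow = true;
        }
    }

    synchronized void leak() {
        if (level > 0) {
            level--;
        }
    }

    synchronized void drain() {
        level = 0;
        overflow = false;
    }

    synchronized boolean overflowed() {
        return overflow;
    }

    synchronized int level() {
        return level;
    }

    /** Leaks one unit per period; cancel the returned future to stop leaking. */
    ScheduledFuture<?> leakPeriodically(ScheduledExecutorService scheduler,
                                        long period, TimeUnit unit) {
        return scheduler.scheduleAtFixedRate(this::leak, period, period, unit);
    }
}
```

Error handling code would call `fill()` for each problem and escalate once `overflowed()` returns true; the leak period determines how many problems per time unit are tolerated.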
![Page 46: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/46.jpg)
Pattern #7
Limited Retries
![Page 47: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/47.jpg)
Limited Retries (1)

```java
// doAction returns true if successful, false otherwise

// General pattern
boolean success = false;
int tries = 0;
while (!success && (tries < MAX_TRIES)) {
    success = doAction(...);
    tries++;
}

// Alternative one-retry-only variant
success = doAction(...) || doAction(...);
```
![Page 48: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/48.jpg)
• Idempotent Actions
• Closures / Lambdas
• Tuning Retries
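With lambdas, the general retry loop from "Limited Retries (1)" can be packaged once and reused. The sketch below is one possible shape, not the author's code: the class name `Retry` and the method `withRetries` are hypothetical, and failure is signaled by an exception instead of a `false` return value. The action should be idempotent, since it may run more than once.

```java
import java.util.concurrent.Callable;

// Hypothetical retry helper; failure is signaled by an exception.
final class Retry {
    private Retry() {}

    /** Calls the action up to maxTries times and rethrows the last failure. */
    static <T> T withRetries(int maxTries, Callable<T> action) throws Exception {
        if (maxTries < 1) {
            throw new IllegalArgumentException("maxTries must be at least 1");
        }
        Exception last = null;
        for (int tries = 0; tries < maxTries; tries++) {
            try {
                return action.call();
            } catch (InterruptedException e) {
                throw e; // do not retry when the thread was interrupted
            } catch (Exception e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }
}
```

A call site then shrinks to something like `Result r = Retry.withRetries(3, () -> resource.access(request));`, where `resource` and `request` stand for whatever the application uses.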
![Page 49: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/49.jpg)
More Patterns
• Complete Parameter Checking
• Marked Data
• Routine Audits
![Page 50: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/50.jpg)
Further reading
1. Michael T. Nygard, Release It!, Pragmatic Bookshelf, 2007
2. Robert S. Hanmer, Patterns for Fault Tolerant Software, Wiley, 2007
3. James Hamilton, On Designing and Deploying Internet-Scale Services, 21st LISA Conference, 2007
4. Andrew Tanenbaum, Maarten van Steen, Distributed Systems – Principles and Paradigms, Prentice Hall, 2nd Edition, 2006
![Page 51: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/51.jpg)
It's all about production!
![Page 52: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/52.jpg)
@ufried Uwe Friedrichsen | http://slideshare.net/ufried | http://ufried.tumblr.com
![Page 53: Fault tolerance made easy](https://reader034.fdocuments.net/reader034/viewer/2022052505/554fb08eb4c905ad218b5205/html5/thumbnails/53.jpg)