Microreboot
![Page 1: Microreboot](https://reader033.fdocuments.net/reader033/viewer/2022052900/556263d5d8b42aab1a8b4b03/html5/thumbnails/1.jpg)
Microreboot: A Cheap Technique for Recovery
George Candea, Shinichi Kawamoto, Yuichi Fujiki, Greg Friedman, Armando Fox
Presented By Riyad
![Page 2: Microreboot](https://reader033.fdocuments.net/reader033/viewer/2022052900/556263d5d8b42aab1a8b4b03/html5/thumbnails/2.jpg)
Motivation
● Production software has many transient bugs
● Rebooting can “cure” failures caused by transient bugs
● Rebooting is expensive and causes nontrivial service disruption and downtime
● Microreboot (µRB)!!!
![Page 3: Microreboot](https://reader033.fdocuments.net/reader033/viewer/2022052900/556263d5d8b42aab1a8b4b03/html5/thumbnails/3.jpg)
Microreboot (µRB)
● Reboot an individual fine-grained component
● Similar to an application reboot, but with
  ○ Orders of magnitude faster recovery
  ○ Fewer failed requests during recovery
  ○ Less work lost due to recovery
● Rejuvenate the system without shutting it down
● The system needs to be designed to be microrebootable from the ground up.
![Page 4: Microreboot](https://reader033.fdocuments.net/reader033/viewer/2022052900/556263d5d8b42aab1a8b4b03/html5/thumbnails/4.jpg)
µRB Goals
● Reduce system recovery time
● Minimize a failure’s disruption to the system and its users
● Preserve in-memory data
![Page 5: Microreboot](https://reader033.fdocuments.net/reader033/viewer/2022052900/556263d5d8b42aab1a8b4b03/html5/thumbnails/5.jpg)
Crash-Only System Design
● Don’t attempt a complex recovery process
● Upon detecting a failure, crash gracefully
● Keep state in stable storage
● Ensure consistency of state and data before crashing
● Recover from failure by rebooting the application
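The crash-only idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `StableStore` and `BidCounter` are hypothetical names, and an in-memory dict stands in for real stable storage. The point is that the component keeps no state of its own, so stopping it is the same as crashing it, and recovery is just starting a fresh instance.

```python
class StableStore:
    """Hypothetical stand-in for stable storage (e.g. a database)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)


class BidCounter:
    """Hypothetical crash-only component: it holds no state between calls."""

    def __init__(self, store):
        self.store = store  # all durable state lives outside the component

    def place_bid(self, item):
        count = self.store.get(item, 0) + 1
        self.store.put(item, count)  # one write per update keeps state consistent
        return count


store = StableStore()
counter = BidCounter(store)
counter.place_bid("lamp")
counter.place_bid("lamp")

# "Crash": discard the component with no shutdown logic at all.
del counter

# "Recover": start a fresh instance against the same store.
counter = BidCounter(store)
result = counter.place_bid("lamp")
```

Because the bid count lives in the store rather than in the component, the restarted instance continues from where the old one stopped.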
![Page 6: Microreboot](https://reader033.fdocuments.net/reader033/viewer/2022052900/556263d5d8b42aab1a8b4b03/html5/thumbnails/6.jpg)
µRB System Design
● Fine-grained components
  ○ Component-level µRB and fast initialization
  ○ Huge components lower the benefit of µRB
● State segregation
  ○ Prevents reading inconsistent state during recovery
  ○ Separates data recovery from application recovery
● Decoupling
  ○ Lowers disruption across the system during recovery
![Page 7: Microreboot](https://reader033.fdocuments.net/reader033/viewer/2022052900/556263d5d8b42aab1a8b4b03/html5/thumbnails/7.jpg)
µRB System Design
● Retryable requests
  ○ Minimize the number of failures during recovery
● Leases
  ○ Improve the reliability of cleaning up after μRBs, which may otherwise leak resources
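A minimal sketch of the two ideas above, assuming a hypothetical `RetryLater` exception that a component raises while it is being microrebooted, and a hypothetical `LeaseTable` that tracks time-limited resource grants. Neither name comes from the original system; timestamps are passed in explicitly to keep the example deterministic.

```python
import time


class RetryLater(Exception):
    """Hypothetical signal: the component is mid-microreboot, try again soon."""


def call_with_retry(operation, attempts=3, delay=0.01):
    """Retry an idempotent operation while the component is recovering."""
    for attempt in range(attempts):
        try:
            return operation()
        except RetryLater:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            time.sleep(delay)


class LeaseTable:
    """Resources are held under time-limited leases rather than indefinitely."""

    def __init__(self):
        self._expiry = {}  # resource name -> expiry timestamp

    def acquire(self, resource, duration, now):
        self._expiry[resource] = now + duration

    def reclaim_expired(self, now):
        """Free every resource whose holder did not renew in time."""
        expired = [r for r, t in self._expiry.items() if t <= now]
        for r in expired:
            del self._expiry[r]
        return expired


# A component that is unavailable for its first two calls, then recovers.
calls = {"n": 0}


def flaky_lookup():
    calls["n"] += 1
    if calls["n"] <= 2:
        raise RetryLater()
    return "ok"


retry_result = call_with_retry(flaky_lookup)

leases = LeaseTable()
leases.acquire("db-connection-1", duration=5, now=0)
leases.acquire("db-connection-2", duration=50, now=0)
reclaimed = leases.reclaim_expired(now=10)
```

The retry hides a short outage from the caller, and the lease ensures that a resource held by a component that was microrebooted is eventually reclaimed even if nobody releases it explicitly.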
![Page 8: Microreboot](https://reader033.fdocuments.net/reader033/viewer/2022052900/556263d5d8b42aab1a8b4b03/html5/thumbnails/8.jpg)
Research Questions
● Are μRBs effective in recovering from failures?
● Are μRBs any better than JVM restarts?
● Are μRBs useful in clusters?
● Do μRB-friendly architectures incur a performance overhead?
![Page 9: Microreboot](https://reader033.fdocuments.net/reader033/viewer/2022052900/556263d5d8b42aab1a8b4b03/html5/thumbnails/9.jpg)
Experiment
● J2EE application
● JBoss server (modified to support µRB)
● eBid, a crash-only application based on RUBiS
● MySQL for persistent state
● FastS/SSM for session state
![Page 10: Microreboot](https://reader033.fdocuments.net/reader033/viewer/2022052900/556263d5d8b42aab1a8b4b03/html5/thumbnails/10.jpg)
Injected Faults
● Deadlocks
● Infinite loops
● Memory leaks
● Transient Java exceptions
● Corrupted data structures
● Out-of-memory errors
● Low-level faults underneath the JVM layer
![Page 11: Microreboot](https://reader033.fdocuments.net/reader033/viewer/2022052900/556263d5d8b42aab1a8b4b03/html5/thumbnails/11.jpg)
Failure Detection
● A network-level error, an HTTP 4xx or 5xx error, or keywords indicative of failure (e.g., “exception,” “failed,” “error”) in the response.
● Each request is submitted in parallel to the fault-injected application and to a known-good application. Any discrepancy between the two results counts as a “failure”.
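The two detection rules above could look roughly like this in Python. The function names are hypothetical, a status of `None` is assumed to represent a network-level error, and the comparison is a plain equality check; a real detector would presumably need to ignore parts of a page that legitimately differ between runs.

```python
FAILURE_KEYWORDS = ("exception", "failed", "error")


def looks_failed(status, body):
    """Rule 1: network error, HTTP 4xx/5xx, or a failure keyword in the page."""
    if status is None:  # assumption: None means the request never completed
        return True
    if 400 <= status <= 599:
        return True
    text = body.lower()
    return any(word in text for word in FAILURE_KEYWORDS)


def differs_from_reference(body, reference_body):
    """Rule 2: the same request, sent to a known-good instance, gave a different result."""
    return body != reference_body


def detect_failure(status, body, reference_body):
    """A response is a failure if either rule fires."""
    return looks_failed(status, body) or differs_from_reference(body, reference_body)


good_page = "Item: lamp, price 10"
healthy = detect_failure(200, good_page, good_page)
server_error = detect_failure(500, "", good_page)
keyword_hit = detect_failure(200, "NullPointerException in ViewItem", good_page)
wrong_data = detect_failure(200, "Item: lamp, price 0", good_page)
network_down = detect_failure(None, "", good_page)
```

The comparison rule catches the `wrong_data` case, which returns HTTP 200 and contains no failure keyword, so the simple rule alone would miss it.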
![Page 12: Microreboot](https://reader033.fdocuments.net/reader033/viewer/2022052900/556263d5d8b42aab1a8b4b03/html5/thumbnails/12.jpg)
Recovery Group
● EJBs might maintain references to other EJBs
  ○ Such EJBs cannot be microrebooted individually
● Whenever an EJB is to be microrebooted, microreboot the transitive closure of its inter-EJB dependents as a group.
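Computing a recovery group is a graph reachability problem. The sketch below assumes that a "dependent" of a component is anything holding a reference to it, directly or indirectly, since those references would go stale after the microreboot. The `references` map and the component names are made up for illustration and are not eBid's actual EJBs.

```python
from collections import deque


def recovery_group(target, references):
    """Return target plus every component that directly or indirectly references it.

    `references` maps a component name to the set of components it references.
    """
    # Invert the edges so we can walk from a component to its dependents.
    dependents = {}
    for holder, targets in references.items():
        for t in targets:
            dependents.setdefault(t, set()).add(holder)

    # Breadth-first search over the inverted edges.
    group = {target}
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for holder in dependents.get(current, ()):
            if holder not in group:
                group.add(holder)
                queue.append(holder)
    return group


# Hypothetical component graph.
references = {
    "ViewItem": {"Item"},
    "MakeBid": {"Item", "User"},
    "Item": {"Category"},
    "User": set(),
    "Category": set(),
}
group_for_item = recovery_group("Item", references)
group_for_user = recovery_group("User", references)
group_for_category = recovery_group("Category", references)
```

Here a fault in `Category` pulls in `Item` and, through it, `ViewItem` and `MakeBid`, which shows how tightly coupled components enlarge the group and reduce the benefit of a microreboot.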
![Page 13: Microreboot](https://reader033.fdocuments.net/reader033/viewer/2022052900/556263d5d8b42aab1a8b4b03/html5/thumbnails/13.jpg)
Recovery Manager
● Can reboot:
  ○ EJBs (recovery group)
  ○ The WAR
  ○ All of eBid
  ○ The JVM that runs JBoss
  ○ The operating system
● Reboots the component related to the failed URL.
● Tries the cheapest recovery first
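The "cheapest recovery first" policy can be sketched as a simple escalation loop. This is an assumption-laden illustration, not the recovery manager's actual code: `reboot` and `still_failing` are hypothetical caller-supplied hooks, and raising an error after the last level is just one way to signal that automated recovery gave up.

```python
RECOVERY_LEVELS = ["EJB group", "WAR", "eBid application", "JVM", "operating system"]


def recover(reboot, still_failing, levels=RECOVERY_LEVELS):
    """Try each recovery level in order until the failure goes away.

    `reboot(level)` performs the action and `still_failing()` re-checks the
    failed request afterwards. Returns the list of levels that were tried.
    """
    tried = []
    for level in levels:
        reboot(level)
        tried.append(level)
        if not still_failing():
            return tried
    raise RuntimeError("failure persists after all recovery levels")


# Simulated fault that only clears once the whole application is restarted.
state = {"broken": True}


def fake_reboot(level):
    if level == "eBid application":
        state["broken"] = False


tried_levels = recover(fake_reboot, lambda: state["broken"])
```

The cost of this policy is visible in `tried_levels`: two cheaper reboots are attempted and wasted before the one that actually cures the fault.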
![Page 14: Microreboot](https://reader033.fdocuments.net/reader033/viewer/2022052900/556263d5d8b42aab1a8b4b03/html5/thumbnails/14.jpg)
Research Questions
● Are μRBs effective in recovering from failures?
● Are μRBs any better than JVM restarts?
● Are μRBs useful in clusters?
● Do μRB-friendly architectures incur a performance overhead?
![Page 15: Microreboot](https://reader033.fdocuments.net/reader033/viewer/2022052900/556263d5d8b42aab1a8b4b03/html5/thumbnails/15.jpg)
μRB Failure Recovery
![Page 16: Microreboot](https://reader033.fdocuments.net/reader033/viewer/2022052900/556263d5d8b42aab1a8b4b03/html5/thumbnails/16.jpg)
Failed Requests
![Page 17: Microreboot](https://reader033.fdocuments.net/reader033/viewer/2022052900/556263d5d8b42aab1a8b4b03/html5/thumbnails/17.jpg)
Research Questions
● Are μRBs effective in recovering from failures?
● Are μRBs any better than JVM restarts?
● Are μRBs useful in clusters?
● Do μRB-friendly architectures incur a performance overhead?
![Page 18: Microreboot](https://reader033.fdocuments.net/reader033/viewer/2022052900/556263d5d8b42aab1a8b4b03/html5/thumbnails/18.jpg)
Failure + Recovery
![Page 19: Microreboot](https://reader033.fdocuments.net/reader033/viewer/2022052900/556263d5d8b42aab1a8b4b03/html5/thumbnails/19.jpg)
Client-perceived Availability
![Page 20: Microreboot](https://reader033.fdocuments.net/reader033/viewer/2022052900/556263d5d8b42aab1a8b4b03/html5/thumbnails/20.jpg)
Research Questions
● Are μRBs effective in recovering from failures?
● Are μRBs any better than JVM restarts?
● Are μRBs useful in clusters?
● Do μRB-friendly architectures incur a performance overhead?
![Page 21: Microreboot](https://reader033.fdocuments.net/reader033/viewer/2022052900/556263d5d8b42aab1a8b4b03/html5/thumbnails/21.jpg)
μRB in Cluster
![Page 22: Microreboot](https://reader033.fdocuments.net/reader033/viewer/2022052900/556263d5d8b42aab1a8b4b03/html5/thumbnails/22.jpg)
Client-perceived Availability
● When response latency exceeds 8 seconds, users get distracted
![Page 23: Microreboot](https://reader033.fdocuments.net/reader033/viewer/2022052900/556263d5d8b42aab1a8b4b03/html5/thumbnails/23.jpg)
Research Questions
● Are μRBs effective in recovering from failures?
● Are μRBs any better than JVM restarts?
● Are μRBs useful in clusters?
● Do μRB-friendly architectures incur a performance overhead?
![Page 24: Microreboot](https://reader033.fdocuments.net/reader033/viewer/2022052900/556263d5d8b42aab1a8b4b03/html5/thumbnails/24.jpg)
Performance Impact
![Page 25: Microreboot](https://reader033.fdocuments.net/reader033/viewer/2022052900/556263d5d8b42aab1a8b4b03/html5/thumbnails/25.jpg)
Limitations
● µRB can leave the system inconsistent if updates aren’t atomic
● µRB can leak resources that aren’t allocated through the application server (e.g., via the Java Native Interface)
● Can delay a full reboot when a full reboot is the only way to recover
![Page 26: Microreboot](https://reader033.fdocuments.net/reader033/viewer/2022052900/556263d5d8b42aab1a8b4b03/html5/thumbnails/26.jpg)
Limitations
● Recovers only from transient bugs
● Considerable design effort needed
● Not suitable for
  ○ Existing monolithic applications
  ○ Languages (C/C++) that don’t have a Java EE-like framework
● The experiment covered only one recovery group closure, with 5 EJBs.
![Page 27: Microreboot](https://reader033.fdocuments.net/reader033/viewer/2022052900/556263d5d8b42aab1a8b4b03/html5/thumbnails/27.jpg)
Microreboot
● Cheap alternative to full system recovery
● Restart components “with a clean state”
● Reduces recovery time, failed requests, and functional disruption
● Only suitable for applications with fine-grained components
![Page 28: Microreboot](https://reader033.fdocuments.net/reader033/viewer/2022052900/556263d5d8b42aab1a8b4b03/html5/thumbnails/28.jpg)
Discussion
● Do μRBs lead to overengineering (e.g., AbstractAbstractObjectFactory)?
● Are the modifications needed for μRB worth it for existing monolithic applications?
● Is it possible to have a similar recovery technique for CPU-bound applications?