ATIF MEHMOOD MALIK, KASHIF SIDDIQUE
Improving dependability of Cloud Computing with Fault Tolerance and High Availability
Dependability
In systems engineering, dependability is a measure of a system’s availability, reliability, and maintainability
It is the ability of a system to deliver services that can justifiably be trusted
Often considered the third axis of system quality
Dependability ontology
Dependability challenges in cloud computing
Lack of trust in shared virtualized infrastructures
Management of cloud computing service by a single provider or vendor is in fact a single point of failure
APIs are proprietary
Virtualization increases complexity
Higher resource utilization
Common mode outages
Multiple administrative domains
Legal and privacy implications
Threats to dependability
Faults, Errors and Failures
A fault in a system is a deviation from its expected behavior
Faults may arise due to hardware failure, software bugs, user error and network problems
Fault Tolerance
Ability of a system to continue providing services to its users when some of its components fail
Faults can be introduced at:
Application level
Virtual machine level
Physical resource level
Application Fault Tolerance:
Application health is continuously monitored by special software components called sensors
A sensor may trigger specific procedures to start repairing an application that is malfunctioning
Example: VMware App HA
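The sensor idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Sensor` class, its `check` and `repair` callbacks, and the `max_failures` threshold are hypothetical names, and nothing here reflects how VMware App HA is actually implemented.

```python
from typing import Callable


class Sensor:
    """Hypothetical application sensor: polls a health check and triggers repair."""

    def __init__(self, check: Callable[[], bool], repair: Callable[[], None],
                 max_failures: int = 3):
        self.check = check                # returns True when the application is healthy
        self.repair = repair              # procedure that tries to fix the application
        self.max_failures = max_failures  # consecutive misses tolerated before repair
        self.failures = 0                 # consecutive failed checks so far
        self.repairs = 0                  # number of times repair was triggered

    def poll(self) -> bool:
        """Run one health check; trigger repair after max_failures consecutive misses."""
        if self.check():
            self.failures = 0
            return True
        self.failures += 1
        if self.failures >= self.max_failures:
            self.repair()
            self.repairs += 1
            self.failures = 0
        return False
```

Requiring several consecutive failed checks before repairing is one possible design choice; it avoids restarting an application because of a single missed probe.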
Virtual Machine Fault Tolerance:
Virtual machine failures can be detected by both the customer and the service provider
Customers can detect virtual machine failure by monitoring its state with the help of sensors deployed in the cloud
The cloud service provider can provide VM fault tolerance by installing a single sensor per physical server that monitors all virtual machines hosted on that server
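One way a single per-server sensor could watch every hosted virtual machine is with heartbeats and a timeout. The sketch below assumes that approach; `HostSensor`, the VM identifiers, and the explicit `now` timestamps are hypothetical and chosen only to keep the example deterministic.

```python
from __future__ import annotations


class HostSensor:
    """Hypothetical per-server sensor tracking heartbeats of all hosted VMs."""

    def __init__(self, timeout: float):
        self.timeout = timeout                   # seconds of silence before a VM is suspect
        self.last_seen: dict[str, float] = {}    # vm id -> time of last heartbeat

    def heartbeat(self, vm_id: str, now: float) -> None:
        """Record that vm_id reported in at time `now`."""
        self.last_seen[vm_id] = now

    def failed_vms(self, now: float) -> list[str]:
        """Return the VMs whose last heartbeat is older than the timeout."""
        return sorted(vm for vm, seen in self.last_seen.items()
                      if now - seen > self.timeout)
```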
Physical Machine Fault Tolerance:
Can be implemented by the cloud service provider by monitoring the state of physical servers and, in case of hardware failure, resuming all affected virtual machines on a new server
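The recovery step, resuming the virtual machines of a failed server elsewhere, might look roughly like the following. The `fail_over` function, the placement and capacity dictionaries, and the "most free slots first" policy are all assumptions made for illustration, not a description of any particular provider's scheduler.

```python
def fail_over(placement, capacity, failed_host):
    """Reassign the VMs of failed_host to surviving hosts with free slots.

    placement: dict mapping host name -> list of VM ids running there
    capacity:  dict mapping host name -> maximum number of VMs it can run
    Returns a new placement dict; raises RuntimeError if a VM cannot be placed.
    """
    # Copy the placement of every surviving host.
    new_placement = {h: list(vms) for h, vms in placement.items() if h != failed_host}

    def free_slots(host):
        return capacity[host] - len(new_placement[host])

    for vm in placement.get(failed_host, []):
        # Pick the surviving host with the most free slots (assumed policy).
        target = max(new_placement, key=free_slots, default=None)
        if target is None or free_slots(target) <= 0:
            raise RuntimeError(f"no spare capacity to resume {vm}")
        new_placement[target].append(vm)
    return new_placement
```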
Fault Tolerance Techniques
Reactive Fault Tolerance
In case of failure, these techniques reduce the effect of the failure on application execution
Proactive Fault Tolerance
These techniques work by predicting faults and proactively replacing the suspected components with working ones
Reactive Fault Tolerance
Checkpointing
Replication
Job migration
SGuard
Retry
Task resubmission
User-defined exception handling
Rescue workflow
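Two of the reactive techniques listed above, checkpointing and retry, combine naturally. The following is a minimal sketch, assuming a job is a list of items processed in order and that the checkpoint is a small JSON file; `run_with_checkpoint`, its parameters, and the file layout are hypothetical.

```python
import json
import os


def run_with_checkpoint(items, process, path, max_retries=3):
    """Process items in order, retrying failures and checkpointing progress.

    The index of the next unprocessed item is saved to `path` after each
    success, so a later run resumes there instead of starting over.
    """
    start = 0
    if os.path.exists(path):
        with open(path) as f:
            start = json.load(f)["next"]      # resume from the last checkpoint

    for i in range(start, len(items)):
        for attempt in range(1, max_retries + 1):
            try:
                process(items[i])
                break                          # success: stop retrying this item
            except Exception:
                if attempt == max_retries:
                    raise                      # give up; the checkpoint stays valid
        with open(path, "w") as f:
            json.dump({"next": i + 1}, f)      # checkpoint after each completed item
```

If a run aborts after exhausting its retries, calling the function again with the same `path` skips the items that already completed.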
Proactive Fault Tolerance
Software rejuvenation
Self-healing
Pre-emptive migration
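Software rejuvenation can be illustrated with a worker that restarts itself before accumulated state (for example, leaked memory) reaches a hard limit. The sketch below is a simulation under assumed numbers; `RejuvenatingWorker`, the fixed leak per request, and the 80% threshold are hypothetical.

```python
class RejuvenatingWorker:
    """Hypothetical worker that restarts before leaked state hits a hard limit."""

    def __init__(self, leak_per_request, hard_limit, threshold=0.8):
        self.leak_per_request = leak_per_request  # simulated resource leak per request
        self.hard_limit = hard_limit              # usage at which the worker would crash
        self.threshold = threshold                # fraction of the limit that triggers a restart
        self.used = 0
        self.restarts = 0

    def handle(self, request):
        # Proactive step: rejuvenate if this request would cross the soft threshold.
        if self.used + self.leak_per_request > self.threshold * self.hard_limit:
            self.used = 0          # simulated clean restart
            self.restarts += 1
        self.used += self.leak_per_request
        return f"done:{request}"
```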
Tools for implementing fault tolerance
HAProxy:
Open source high availability and load balancing solution for TCP and HTTP based applications
De facto standard open source load balancer
ASSURE:
Automatic Software Self-healing Using REscue points
Uses rescue points to detect, tolerate and recover from software faults
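The rescue-point idea can be approximated in Python as a decorator: snapshot the state on entry, and if an unexpected fault occurs, roll back and return an error value that the caller already knows how to handle. This is only a loose analogy and not ASSURE's actual mechanism; `rescue_point`, `apply_order`, and the dictionary-based state are hypothetical.

```python
import copy
import functools


def rescue_point(error_value):
    """Hypothetical decorator that treats the wrapped function as a rescue point."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(state, *args, **kwargs):
            snapshot = copy.deepcopy(state)   # checkpoint taken at the rescue point
            try:
                return func(state, *args, **kwargs)
            except Exception:
                state.clear()
                state.update(snapshot)        # roll back to the checkpoint
                return error_value            # report the fault as an ordinary error
        return wrapper
    return decorate


@rescue_point(error_value=-1)
def apply_order(state, item, qty):
    state["pending"] = item
    state["stock"][item] -= qty               # raises KeyError for an unknown item
    del state["pending"]
    return 0
```

A call with an unknown item leaves `state` exactly as it was before the call and returns the error value, so the caller's existing error handling takes over.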
SHelp:
Upgraded version of ASSURE
Assigns weighted values to rescue points and uses error virtualization techniques so that applications bypass the faulty path
High Availability
Can be achieved by having redundant failover servers
Can be achieved at application level, infrastructure level, data center level
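The benefit of redundant failover servers can be quantified with the standard parallel-availability formula, 1 - (1 - a)^n. The short sketch below assumes that server failures are independent and that failover itself is instantaneous and perfect, which real deployments only approximate; the function name is hypothetical.

```python
def parallel_availability(a, n):
    """Availability of n redundant servers, each with availability a.

    Assumes independent failures: the service is down only when all n
    servers are down at the same time.
    """
    if not 0.0 <= a <= 1.0 or n < 1:
        raise ValueError("a must be in [0, 1] and n must be at least 1")
    return 1.0 - (1.0 - a) ** n
```

Under these assumptions, two servers that are each available 99% of the time give a combined availability of about 99.99%.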
Types of Virtual Machine High Availability
Load sharing
Both replicas are active
Service requests are equally distributed between both of them
Updated dedicated hot standby
Two identical virtual machines execute on two different physical servers
Both virtual machines are fully synchronized with state information
VMware Fault Tolerance is an example
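A fully synchronized standby can be modeled as a pair of replicas where every update is applied to both before it is acknowledged, so the standby can take over without losing state. The sketch below is a toy in-memory model; `ReplicatedPair` and its methods are hypothetical and do not describe how VMware Fault Tolerance works internally.

```python
class ReplicatedPair:
    """Hypothetical active/standby pair in which every update is mirrored."""

    def __init__(self):
        self.replicas = {"primary": {}, "standby": {}}
        self.active = "primary"

    def write(self, key, value):
        # Apply the update to both replicas so the standby stays fully synchronized.
        for state in self.replicas.values():
            state[key] = value

    def read(self, key):
        return self.replicas[self.active][key]

    def fail_active(self):
        """Simulate loss of the active replica and promote the other one."""
        failed = self.active
        self.active = "standby" if failed == "primary" else "primary"
        self.replicas[failed] = {}   # the failed replica's state is lost
```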
Not dedicated hot standby
Standby VM running in parallel with active VM
Standby is not fully synchronized
VMware HA and Symantec’s Veritas Cluster Server are examples
Shared hot standby
Uses checkpointing mechanism to update the standby replica
Requires fewer resources for standby replica
Cold standby
Standby replica is powered off and lies on storage media
Brought to service when active VM fails
Useful for situations where availability requirements are low
Conclusion
Dependability is one of the major challenges in cloud computing
Adoption of cloud computing can be increased by addressing the dependability challenges