Reliability on Web Services Pat Chan 31 Oct 2006.
-
date post
15-Jan-2016 -
Category
Documents
-
view
216 -
download
0
Transcript of Reliability on Web Services Pat Chan 31 Oct 2006.
Reliability on Web ServicesPat Chan
31 Oct 2006
Outline Introduction to Web Services Problem Statement Methodologies for Web Service Reliability New Reliable Web Service Paradigm Optimal Parameters Experimental Results and Discussion Conclusion Future Directions
Introduction Service-oriented computing is becoming a reality. The problems of service dependability, security and
timeliness are becoming critical. We propose experimental settings and offer a
roadmap to dependable Web services.
Problem Statement Fault-tolerant techniques
Replication Diversity
Replication is one of the efficient ways for providing reliable systems by time or space redundancy. Increasing the availability of distributed systems Key components are re-executed or replicated Protect against hardware malfunctions or transient system faults.
Another efficient technique is design diversity. By independently designing software systems or services with different programming
teams, Resort in defending against permanent software design faults.
We focus on the analysis of the replication techniques when applied to Web services.
A generic Web service system with spatial as well as temporal replication is proposed and investigated.
Road Map for Research Redundancy in time Redundancy in space
Sequentially Parallel Majority voting using N modular redundancy Diversified version of different services
Replication Manager
Web service selection algorithm
WatchDog
UDDI
Registry
WSDL
Web ServiceIIS
Application
Database
Web ServiceIIS
Application
Database
Web ServiceIIS
Application
Database
Client
Port
Application
Database
1. Create web services
2. Select primary web service (PWS)
3. Register
4. Look up
5. Get WSDL
6. Invoke web service
8.. Keep check the availability of all the web.
9. If Web service failed, update the list of availability of Web services
7. Update the WSDL
Proposed Paradigm
Round Robin
Web ServiceIIS
Application
Database
Web ServiceIIS
Application
Database
Web ServiceIIS
Application
Database
Web ServiceIIS
Application
Database
RM sends message to the Web Service
Update the Web service availability list
Do not get reply
Map the new address to the WSDL after RR
System Fail
Get reply
All Service failed
Work Flow of the Replication Manager
Experiments A series of experiments are designed and performed for
evaluating the reliability of the Web service, Spatial replication 0 0 0 0 1 1 1 1
Reboot 0 0 1 1 0 0 1 1
Retry 0 1 0 1 0 1 0 1
Spatial replication
Reboot
Retry
(1,1,1)
(0,1,1)
(1,0,0)
(0,0,0)
(1,0,1)
(1,1,0)
(0,1,0)
(0,0,1)
Parameters Computational load Communication Request frequency Load of the server Retry frequency Polling frequency Reboot time
Fault Mode Machine – reboot the server with probability Network – Network Fault Injection Tool (NFIT) Application – inject fault in the application
Failure definition: 5 retries are allowed. If there is still no correct result
from the web service after 5 retries, it is considered as a failure.
Experimental Setup Examine the computational to communication ratio Examine the request frequency to limit the load of
the server to 75% Fix the following parameters
Computational to communication ratio (e.g 10:1) Request frequency
Experimental Setup -- Overloadprime(2, 12000)
Communication time: Computational time
143:14 (10:1)
Request frequency 5 req/s (200ms)
Load 78.5%
Timeout period of retry 10s
Timeout for web service in RM 1s (web service specific)
Polling frequency 5 requests per min
Number of replicas 3
Max number of retries 5
Round Robin rate 1 s
Experiment Parameters Fault mode
Timing Calculation Temporary (fault probability: 0.01) Permanent (fault probability: 0.001)
Experiment time 30mins (9000 requests)
Measure: Number of failures Average response time (ms)
Experimental Result (failures / response time in ms)
Experiments Single server Single server with retry
Single server with reboot (continues no response for 3 requests)
Single server with retry and reboot
Spatial ReplicationRR
Hybrid approachRR+Retry
Hybrid approachRR+Reboot
All round approach RR spatial + Retry(5 times) + reboots
Normal case 0 / 157 0 / 156 0 / 156 0 / 155 0 / 155 0 / 156 0 / 156 0 / 157
Timing Temp 93 / 158 0 / 269 91 / 160 0 / 256 89 / 156 0 / 258 85 / 155 0 / 243
Calculation Temp
87 / 157 0 / 254 88 / 157 0 / 261 78 / 155 0 / 244 87 / 155 0 / 231
Timing Perm 8174 / -- 8362 / -- 2505 / -- 3 / 2938 7177 / -- 7284 / -- 76 / 157 0 / 172
Calculation Perm
8256 / -- 8581 / -- 2782 / -- 2 / 2872 7529 / -- 7425 / -- 69 / 158 0 / 176
Varying the parameters Number of tries Timeout period for retry in single server Timeout period for retry in our paradigm Polling frequency Number of replicas Load of server
Number of tries
Number of tries Number of failures in Temp
Number of failures in Perm
0 95 76
1 2 2
2 0 0
3 0 0
4 0 0
5 0 0
Timeout period for retry in single serverTimeout period for retry (s)
Number of failures in Temp
Number of failures in Perm
0 95 7265
2 2 7156
5 0 7314
6 0 6890
7 0 189
8 0 82
9 0 11
10 0 2
12 0 0
14 0 0
16 0 0
18 0 0
Timeout period for retry in single server
# of failure
Timeout period
Timeout period for retry in our paradigm
Timeout period for retry (s)
Number of failures in Temp
Number of failures in Perm
0 2 81
2 0 2
5 0 0
10 0 0
20 0 0
Polling frequency
Polling frequency (number of requests per min)
Number of failures in Temp
Number of failures in Perm
0 0 7124
1 0 811
2 0 30
5 0 12
10 0 1
15 213 254
20 1124 1023
Polling frequency
# of failure
Polling frequency
Vary the reboot time with different polling frequency
# of failure
Polling frequency
Reboot time
Number of Replicas
Number of replicas Number of failures in Temp Number of failures in Perm
No replica 91 8152
2 2 356
3 0 0
4 0 0
Load of Web ServerLoad of the web server
(%)Number of failures in Temp
Number of failures in Perm
70 0 0
75 0 0
80 2 3
85 10 14
90 512 528
95 3214 3125
100 8792 8845
110 8997 8994
Summary of Parameters Number of tries = 2 Timeout period for retry in single server = 10s Timeout period for retry in our paradigm = 5s Polling frequency = 10 request per min Number of replicas = 3 Load of server = < 75%
Cross-continental Experiments cross-continental
(Berlin < - >CUHK) Setting: UDDI: gaia Replication Manager: gaia Client: 141.20.33.17 Replica: PC89084, PC89083, PC89082
Experimental Result (failures / response time in ms)
Experiments Single server Single server with retry
Single server with reboot (continues no response for 3 requests)
Single server with retry and reboot
Spatial replicationRR
Hybrid approachRR+Retry
Hybrid approachRR+Reboot
All round approach RR spatial + Retry (5 times) + reboots
Normal case 0 / 672 0 / 673 0 / 673 0 / 671 0 / 671 0 / 672 0 / 673 0 / 671
Timing Temp
92 / 674 0 / 776 91 / 673 0 / 771 89 / 672 0 / 778 85 / 672 0 / 772
Calculation Temp
89 / 673 0 / 768 88 / 672 0 / 769 78 / 671 0 / 774 87 / 673 0 / 769
Timing Perm
8210 / -- 8258 / -- 2410 / -- 5 / 3076 7456 / -- 7310 / -- 71 / 674 0 / 712
Calculation Perm
8189 / -- 8412 / -- 2623 / -- 3 / 2985 7125 / -- 7514 / -- 68 / 671 0 / 718
Cross-continental 2 Setting: UDDI: gaia Replication Manager: gaia Client: 141.20.33.17 Replica: luna, Uranus, PC89084, PC89083
Experimental Result (failures / response time in ms)
Experiments Single server Single server with retry
Single server with reboot (continues no response for 3 requests)
Single server with retry and reboot
Spatial replicationRR
Hybrid approachRR+Retry
Hybrid approachRR+Reboot
All round approach RR spatial + Retry (5 times) + reboots
Normal case 0 / 410 0 / 412 0 / 411 0 / 411 0 / 410 0 / 412 0 / 411 0 / 410
Timing Temp
89 / 412 0 / 513 90 / 413 0 / 512 89 / 412 0 / 510 87 / 413 0 / 511
Calculation Temp
91 / 411 0 / 514 87 / 411 0 / 509 80 / 410 0 / 508 88 / 412 0 / 509
Timing Perm
8210 / -- 8412 / -- 2714 / -- 5 / 2856 7451 / -- 7247 / -- 75 / 411 0 / 431
Calculation Perm
8145 / -- 8247 / -- 2563 / -- 4 / 2745 7320 / -- 7235 / -- 68 / 410 0 / 433
Average Average response time from local: 156ms Average response tine from CUHK: 670ms Average response time of the whole system:
411ms
Conclusion Surveyed replication and design diversity
techniques for reliable services. Proposed a hybrid approach to improving the
availability of Web services. Carried out a series of experiments to evaluate
the availability and reliability of the proposed Web service system.
Optimal parameter are obtained.
Future Directions Improve the current fault-tolerant techniques
Current approach can deal with hardware and software failures.
How about software fault detectors? N-version programming
Different providers provide different solutions. There is a problem in failover or switch between
the Web Services.
Future Directions Modeling the Web Service behavior
The behavior of the We Services can be studied. Modeling the failure
The failure model of Web Service is different from traditional software.