CHAPTER 4
LINEAR REGRESSION PREDICTION OF DWELLING TIME
The Dwelling time Aware Resource Allocation (DARA) technique using fuzzy logic, presented in the previous chapter, was shown to allocate dynamic resources well. Although the proposed scheduling algorithm performs well in a dynamic environment, its maximum resource utilization is only 65.6%. One way to improve the performance of the DARA algorithm is to predict the future dwelling time of the resources instead of taking the dwelling time directly from the history. Resource dwelling time prediction aims to provide a real-time forecast of future dwelling time that can support Grid scheduling decisions. From this perspective, prediction techniques based on historical data drawn from Grid workload traces have a high chance of success, since accurate predictions of resource dwelling time lead to high-performance resource allocation. Hence, to enhance the performance of the DARA technique, a linear time series prediction technique is introduced. In this chapter, a linear prediction method is developed that effectively predicts the dwelling times of successive resources in a grid environment.
4.1 NEED FOR A PREDICTION MODEL
Grid environments are highly unpredictable in several respects, viz., the dwelling time of the resources, the load on processors and the dynamic nature of resources. Processor capacities and link rates are often unknown, and resources may connect and disconnect at any time. Variations in the dwelling time of resources (e.g., computing power, storage) may significantly increase the running time of jobs. This uncertainty in demand and dwellability raises the need for techniques that make applications robust against the continuously changing circumstances of the grid environment. Such techniques rely strongly on the effectiveness of prediction methods: significant speedups can be obtained by good scheduling schemes based on accurate predictions. A key requirement of jobs running in a grid environment is robustness against dynamically changing circumstances, which calls for a prediction method that can be used effectively to choose the resource dwelling time. Moreover, since the monitoring capabilities of a real grid environment are limited at best, prediction methods are usually based on only a small number of measured parameters. Predicting dwelling time is very complicated in a real grid environment; for example, even the load (i.e., resource utilization) often cannot be measured frequently. To circumvent this problem, the methods used here predict the future dwelling time only on the basis of the past dwelling times of resources, without requiring any additional and possibly unavailable measurement data.
From the literature (Harrell 2001, Smith & Wong 2002), relatively simple linear time series models are found to be sufficient for short-range dwelling time prediction, because the time series is strongly correlated over time. Time series forecasting methods are based on the analysis of historical data (a set of observations measured at successive times or over successive periods) and assume that past patterns in the data can be used to forecast future data points. A linear time series model analyzes the relationship between a response (dependent) variable and a set of independent (predictor) variables. This relationship is expressed as an equation that predicts the response variable as a linear function of the parameters; a model is linear when its parameters appear linearly. Much of the effort in model fitting focuses on minimizing the size of the residual and on ensuring that the residual is randomly distributed with respect to the model predictions. The goal is to select the parameters of the model so as to minimize the sum of the squared residuals. In a linear time series model, the parameters can be obtained by solving the resulting linear equations with any standard method. The appropriateness of a model is judged by comparing the predicted sequences with the actual sequences on the basis of the mean-squared error between them.
In this chapter, a linear time series prediction model is applied to predict the dynamic resource dwelling time from the historical resource database.
4.2 RELATED WORK
A predictor that forecasts the future dwelling time of the resources on which a task is likely to run can help schedulers make better decisions (Mentzer & Kahn 1995). If the future dwelling time of grid resources can be predicted and provided, the scheduler can allocate the resources effectively and hence utilize them better. The future state of a grid resource can be predicted by performing time series analysis on its past dwelling times. Time series analysis is a technique for understanding the underlying situation of a system or the subsequent values of some information (Loe et al 2007, Uysal & Guvenir 1999). Predicting resource dwelling time is a challenging problem because it depends on multiple factors such as resource stability, varying resource maintenance and a wide range of policies. Prediction algorithms have been used in many areas of computing, including the prediction of future CPU load, job run times, CPU time, wall clock time, physical and virtual memory usage, job cost and resource availability (Devarakonda & Iyer 1989).
Quantities that have been predicted in grids include job run times, load (Dinda 2002), network traffic (Qiao & Dinda 2003), end-to-end behavior (Downey 1997), queue wait times (Smith et al 1998) and data transfer times (Vazhkudai et al 2002, Faerman et al 1999). Downey (1997) classified applications according to the queue to which they were submitted and determined the cumulative distribution function of run times for each class to predict application run times.
Shneidman et al (2005) emphasized the importance of predicting resource consumption for the successful application of market-based approaches to resource allocation in distributed systems. Historical information has also been used by Smith & Wong (2002) to predict job run times. Instead of dividing jobs into categories, they used instance-based learning techniques to determine, for each query job, the most relevant historical usage records by computing the "distances" between the attributes of the query job and those of all jobs in the "experience base" (a historical database of fixed size). Li et al (2005) applied this instance-based learning approach to predict queue wait times as well, instead of using the computationally demanding scheduler simulation of their previous work (Li et al 2004).
Dinda & O'Hallaron (2000) used linear time series models to predict a host's workload from its past behavior. The resulting load predictions were then used for run time prediction (Dinda 2002). In contrast to the works described above, their method provides a confidence interval for run time predictions. Zhang et al (2008) used an equivalent approach (also relying on previously known nominal run times) but applied polynomial fitting to derive host load predictions from past behavior. The estimation of network bandwidth, although not directly related to our work on job resource usage, is another interesting example of performance prediction.
A conceptually different approach to resource usage prediction is based on application performance models. Being application-centric is both its advantage and its disadvantage: it allows application behavior to be described in detail, but it requires a dedicated performance model for each application (or at least for each application class). Schopf & Berman (2001), for example, used application performance models to predict run times on contended resources. Instead of deriving a single predicted value for each application, their method provides a stochastic prediction represented by a distribution of likely run times. The benchmark-based procedure for run time prediction used by Elmroth & Tordsson (2008) requires the user to provide a run time estimate for at least one of the available CEs, similarly to the methods based on host load predictions mentioned above. Because the procedure linearly scales this user-specified estimate according to the relevant resource benchmarks, its accuracy depends strongly on the estimate. Queue wait times can be predicted not only indirectly, through the prediction of run times as done by Smith et al (1999) and Li et al (2004), but also more directly. Nurmi et al (2007) developed a method for predicting bounds, with quantitative confidence levels, on the queuing delay of batch jobs.
The prediction methods described so far predict resource consumption directly from historical usage information, and each was chosen to satisfy the needs of its own application. Prediction of the dwelling time of Grid resources for the purpose of scheduling jobs to resources has not been addressed so far. This work aims to improve the performance of DARA directly by predicting the future dwelling time of the executing resources.
4.3 LINEAR PREDICTION MODEL FOR DARA (LPM-DARA)
Predictions of resource dwelling times can be derived from the historical resource database using appropriate statistical methods or prediction functions, such as the moving average, exponential smoothing, the mean value and the autoregressive model. This section presents the linear prediction model for DARA. The prediction model is based on the autoregressive model, modified according to the dynamic changes of the resources.
4.3.1 Auto Regressive Model
Autoregressive (AR) predictors multiply previous data points by coefficients between 0 and 1 and sum the products to compute the next prediction. An AR model has a parameter p, the number of past data points it uses. In general, an AR prediction, denoted AR(p), is calculated as follows:

$$ y_t = a_1 y_{t-1} + a_2 y_{t-2} + \cdots + a_p y_{t-p} \tag{4.1} $$

$$ \sum_{i=1}^{p} a_i = 1 \quad \text{and} \quad 0 \le a_i \le 1 \tag{4.2} $$

where $y_t$ is the predicted future value, $y_{t-1}, \ldots, y_{t-p}$ are the previous p data points and $a_i$ are the coefficients.
Dinda (2002) also analyzed AR models and showed that AR(16) performs well only when the data are periodic. The datasets considered here lack this periodicity, because grid jobs and resources do not appear periodically but at random. Hence, the AR model is modified according to the dynamism of the Grid system.
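As a rough illustration of Equation (4.1), the Python sketch below computes an AR(p) prediction from a list of past values. It also rejects coefficient vectors that violate Equation (4.2). The function name ar_predict and the sample history are hypothetical, and the caller supplies the coefficients rather than fitting them.

```python
from typing import Sequence


def ar_predict(history: Sequence[float], coeffs: Sequence[float]) -> float:
    """Predict the next value with an AR(p) model, Equation (4.1).

    history: past observations, oldest first, so history[-1] is y_{t-1}.
    coeffs:  a_1 ... a_p, where a_1 weights y_{t-1}.
    """
    p = len(coeffs)
    if len(history) < p:
        raise ValueError("need at least p past observations")
    # Equation (4.2): every a_i lies in [0, 1] and the a_i sum to 1.
    if any(a < 0.0 or a > 1.0 for a in coeffs):
        raise ValueError("each coefficient must lie in [0, 1]")
    if abs(sum(coeffs) - 1.0) > 1e-9:
        raise ValueError("coefficients must sum to 1")
    # y_t = a_1*y_{t-1} + a_2*y_{t-2} + ... + a_p*y_{t-p}
    return sum(a * history[-i] for i, a in enumerate(coeffs, start=1))


# AR(3) over a short, made-up dwelling-time history (seconds).
prediction = ar_predict([40.0, 42.0, 38.0, 45.0], [0.5, 0.3, 0.2])
```

Here the most recent value (45 s) receives the largest weight, so the prediction of about 42.3 s leans towards recent behaviour.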
4.3.2 Modified Auto Regressive Model
The AR(p) model has a fixed number of parameters. Because of this static characteristic, the AR model is not effective in all situations, but it can be modified for effective prediction of dwelling time in a dynamic environment. To make the static AR model dynamic, a data variation factor and a set of conditions are introduced. The linear prediction model for dwelling time aware resource allocation (LPM-DARA) is shown in Figure 4.1. The model takes historical data as input and generates a prediction of future variation.
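Figure 4.1 Linear prediction framework for DARA [figure: History → Linear Prediction Model (determining data variation factor, generation of linear time series model, determining the future dwelling time) → Future]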
For each given resource, a prediction model is built and used to determine the upcoming dwelling time of that resource. The prediction model consists of three main steps:
a) Determining the data variation factor
b) Generation of linear time series model
c) Determining the future dwelling time
4.3.3 Determining the Data Variation Factor
The first step in developing the linear time series prediction model is to determine the data variation factor. This factor is the number of times the dwelling time of a particular resource has varied over the past 't' time slots. To determine it, six criteria are defined that check for variation in the dwelling time of the same resource. Each criterion compares the dwelling time at slot 't' with the dwelling times at slots 't-1' and 't+1'. The dwelling time at every slot 't' is checked against all six criteria, which are given below. If any one of them is satisfied, the data are said to have varied at that slot. Figure 4.2 shows, as a flowchart, the process of determining the data variation factor of resource j (where j is the resource ID).
Criterion 1:
a) )1()( tj
tj dd
b) )1()( tj
tj dd
Criterion 2:
a) )1()( tj
tj dd
b) )1()( tj
tj dd
Criterion 3:
a) )1()( tj
tj dd
b) )1()( tj
tj dd
Criterion 4:
a) )1()( tj
tj dd
b) )1()( tj
tj dd
Criterion 5:
a) )1()( tj
tj dd
b) )1()( tj
tj dd
Criterion 6:
a) )1()( tj
tj dd
b) )1()( tj
tj dd
Figure 4.2 Flowchart for determining the data variation factor [flowchart: set the data variation factor to zero and t to 1; get the dwelling time $d_j^{(t)}$ of resource j; if $d_j^{(t)}$ satisfies any criterion, increment the data variation factor by one; increment t by one; repeat until t > 9, then stop]
This first step determines the order of the proposed prediction model dynamically. The value differs from resource to resource, depending on each resource's previous data.
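The counting loop of Figure 4.2 can be sketched in Python as follows. The criteria are passed in as a predicate. The default predicate differs_from_neighbour counts a slot as varied if its dwelling time differs from either neighbouring slot. It is only a hypothetical placeholder, to be replaced by the six criteria of Section 4.3.3. The loop bound is derived from the length of the history instead of the fixed t > 9 test.

```python
from typing import Callable, Sequence


# Hypothetical placeholder for the six criteria of Section 4.3.3:
# slot t counts as "varied" if it differs from slot t-1 or slot t+1.
def differs_from_neighbour(prev: float, cur: float, nxt: float) -> bool:
    return cur != prev or cur != nxt


def data_variation_factor(
    dwell: Sequence[float],
    varied: Callable[[float, float, float], bool] = differs_from_neighbour,
) -> int:
    """Count the slots at which a resource's dwelling time has varied.

    dwell holds one resource's past dwelling times, oldest first.
    As in Figure 4.2, t starts at 1, each slot is tested against its
    neighbours t-1 and t+1, and the counter grows by one per match.
    """
    factor = 0
    for t in range(1, len(dwell) - 1):  # interior slots only
        if varied(dwell[t - 1], dwell[t], dwell[t + 1]):
            factor += 1
    return factor


# The result would serve as the model order n in Equation (4.3).
order = data_variation_factor([30.0, 30.0, 42.0, 42.0, 35.0, 35.0])
```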
4.3.4 Generation of Linear Time Series Model
After the data variation factor is determined, the next step is to build the linear time series predictor function of order 'n'. The data variation factor of the resource is used as the order n of the equation, so the order changes dynamically from resource to resource:
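$$ d_j^{(t)} = \alpha_1 d_j^{(t-1)} + \alpha_2 d_j^{(t-2)} + \alpha_3 d_j^{(t-3)} + \cdots + \alpha_n d_j^{(t-n)} \tag{4.3} $$

where

$$ \sum_{i=1}^{n} \alpha_i = 1 \quad \text{and} \quad 0 \le \alpha_i \le 1 \tag{4.4} $$

$d_j^{(t)}$ = predicted dwelling time
$d_j^{(t-i)}$ = past dwelling times, i = 1, ..., n
n = order of the equation
j = resource identity
$\alpha_i$ = modified AR model parameters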
The parameters are estimated using the standard least-squares method, and the resulting equation is used as the prediction model to determine the future dwelling time. This process is applied to both semi-permanent and sporadic resources.
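Below is a minimal Python sketch of the least-squares step. It assumes the history is a plain list of dwelling times (oldest first) and that the order n has already been chosen. The sketch builds the regression rows of Equation (4.3) and solves the normal equations with a small Gaussian elimination. The text does not say how Equation (4.4) is enforced, so as a hypothetical extra step the sketch clips negative coefficients to zero and rescales the rest to sum to one. In practice a library solver such as numpy.linalg.lstsq would replace the hand-written one. All function names are illustrative.

```python
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the small square system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_modified_ar(dwell: Sequence[float], n: int) -> List[float]:
    """Least-squares estimate of alpha_1..alpha_n in Equation (4.3)."""
    # Each row holds d^(t-1), ..., d^(t-n); the target is d^(t).
    rows = [[dwell[k - i] for i in range(1, n + 1)] for k in range(n, len(dwell))]
    targets = [dwell[k] for k in range(n, len(dwell))]
    if len(rows) < n:
        raise ValueError("history too short for the requested order")
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(n)]
    alpha = _solve(xtx, xty)
    # Hypothetical projection onto Equation (4.4): non-negative, summing to 1.
    alpha = [max(a, 0.0) for a in alpha]
    total = sum(alpha)
    return [a / total for a in alpha] if total > 0 else [1.0 / n] * n


def predict_next(dwell: Sequence[float], alpha: Sequence[float]) -> float:
    """Apply Equation (4.3): weighted sum of the n most recent dwelling times."""
    return sum(a * dwell[-i] for i, a in enumerate(alpha, start=1))
```

Consider a series generated exactly by d^(t) = 0.6 d^(t-1) + 0.4 d^(t-2). For such a series, fit_modified_ar(series, 2) recovers coefficients close to 0.6 and 0.4, because those coefficients already satisfy Equation (4.4).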
4.3.5 Determining the Future Dwelling Time
After the model and its parameters have been generated, the next step is to determine the future dwelling time. To do so, the following condition is applied.

Let Δ(j) be the dwelling time selector, the difference between the predicted dwelling time and the minimum of the previous dwelling times:

$$ \Delta(j) = d_j^{(t)} - \min_{1 \le i \le n} d_j^{(t-i)} \tag{4.5} $$

The difference is then checked as follows:

$$ \text{future dwelling time} = \begin{cases} \min_{1 \le i \le n} d_j^{(t-i)}, & \Delta(j) \ge 0 \\ d_j^{(t)}, & \Delta(j) < 0 \end{cases} $$

If the difference is negative, the value predicted by the model is selected as the future dwelling time. If it is zero or positive, the minimum previous dwelling time is selected instead. This prevents jobs from being left incomplete because a resource is released before its registered dwelling time.
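The selection rule of Equation (4.5) reduces to a few lines of Python. In this sketch the function and argument names are illustrative. The argument predicted is the model output $d_j^{(t)}$, and past holds the previous n dwelling times of the same resource.

```python
from typing import Sequence


def select_future_dwelling_time(predicted: float, past: Sequence[float]) -> float:
    """Apply the dwelling time selector of Equation (4.5)."""
    floor = min(past)          # minimum previous dwelling time
    delta = predicted - floor  # Equation (4.5)
    # Non-negative difference: fall back to the smallest observed dwelling time.
    # Negative difference: the prediction is already below every past value.
    return floor if delta >= 0 else predicted
```

Written this way, the rule always returns the smaller of the prediction and the minimum past dwelling time. That is the conservative choice motivated above.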
4.4 EVALUATION METHODOLOGY
The evaluation methodology is as follows:
1. Choose a resource Rj to be allocated from the resource database.
2. Check the history size of that resource, i.e., how many of the past N events are recorded.
3. If the history size is zero (i.e., a new resource that has not previously been serviced), no prediction is made. The resource is assumed to have the minimum dwelling time (1 s) and is added to the sporadic resources.
4. If the history is complete (i.e., it contains N previous events), apply the prediction function (linear regression) to the history to determine the future dwelling time.
5. Update the database (history).
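The five steps above can be sketched as two Python functions. The dictionary used as the history database, the set of sporadic resource IDs and the pluggable predict function are hypothetical stand-ins. In LPM-DARA, predict would presumably be the fitted model of Equation (4.3) followed by the selector of Equation (4.5). The steps do not say what happens when a resource has some, but fewer than N, past events, so the sketch simply returns None in that case.

```python
from typing import Callable, Dict, List, Optional, Sequence, Set

MIN_DWELLING_TIME = 1.0  # seconds, assumed for resources without history (step 3)


def evaluate_resource(
    rid: str,
    history: Dict[str, List[float]],
    sporadic: Set[str],
    n_events: int,
    predict: Callable[[Sequence[float]], float],
) -> Optional[float]:
    """Steps 1-4: return the dwelling time to use for resource rid."""
    past = history.get(rid, [])              # step 2: look up the history size
    if not past:                             # step 3: new resource, no prediction
        sporadic.add(rid)
        return MIN_DWELLING_TIME
    if len(past) >= n_events:                # step 4: complete history
        return predict(past[-n_events:])
    return None                              # partial history: not covered by the steps


def record_dwelling_time(
    rid: str, history: Dict[str, List[float]], observed: float, n_events: int
) -> None:
    """Step 5: append the observed dwelling time, keeping only the last N events."""
    past = history.setdefault(rid, [])
    past.append(observed)
    del past[:-n_events]
```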
4.5 PERFORMANCE CRITERIA
To analyze the accuracy of a prediction method, four measures are calculated: the error of a single prediction, the mean square error, the absolute prediction error and the relative mean prediction error.
The error of a single prediction is evaluated by comparing the predicted dwelling time $\hat{d}_j(t)$ with the corresponding actual dwelling time $d_j(t)$:

$$ e_j(t) = \hat{d}_j(t) - d_j(t) \tag{4.6} $$

The mean square error (MSE) is the average of the squared error over the N predictions:

$$ \mathrm{MSE} = \frac{1}{N} \sum_{t=1}^{N} \left( \hat{d}_j(t) - d_j(t) \right)^2 \tag{4.7} $$
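The absolute prediction error is

$$ \left| \hat{d}_j(t) - d_j(t) \right| \tag{4.8} $$

and the relative mean prediction error is

$$ \mathrm{RMPE} = \frac{1}{N} \sum_{t=1}^{N} \frac{\left| \hat{d}_j(t) - d_j(t) \right|}{d_j(t)} \tag{4.9} $$

where N is the number of predictions made for the dataset, $d_j(t)$ is the actual dwelling time of resource j, and $\hat{d}_j(t)$ is the predicted dwelling time of resource j.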
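The error measures can be computed as shown below. The code follows the single-prediction error, the MSE and the absolute error directly. For the relative mean prediction error, it assumes the common form "mean of |predicted - actual| / actual", which should be checked against Equation (4.9). Inputs are parallel lists of predicted and actual dwelling times, and the function names are illustrative.

```python
from typing import List, Sequence


def prediction_errors(pred: Sequence[float], actual: Sequence[float]) -> List[float]:
    """Equation (4.6): error of each single prediction, predicted - actual."""
    if len(pred) != len(actual) or not pred:
        raise ValueError("need equal-length, non-empty sequences")
    return [p - a for p, a in zip(pred, actual)]


def mean_square_error(pred: Sequence[float], actual: Sequence[float]) -> float:
    """Equation (4.7): average of the squared errors over N predictions."""
    errs = prediction_errors(pred, actual)
    return sum(e * e for e in errs) / len(errs)


def absolute_errors(pred: Sequence[float], actual: Sequence[float]) -> List[float]:
    """Equation (4.8): |predicted - actual| for each prediction."""
    return [abs(e) for e in prediction_errors(pred, actual)]


def relative_mean_prediction_error(pred: Sequence[float], actual: Sequence[float]) -> float:
    """Assumed form of Equation (4.9): mean of |error| / actual (actual > 0)."""
    abs_errs = absolute_errors(pred, actual)
    return sum(e / a for e, a in zip(abs_errs, actual)) / len(abs_errs)
```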
4.6 RESOURCE ALLOCATION
Resource dwelling time prediction uses the historical data of every grid resource to predict its future dwelling time. In the previous chapter, resources were classified as permanent, semi-permanent and sporadic according to their dwelling time in the system. Because permanent resources are highly reliable, their future availability does not need to be predicted. The dwellability of semi-permanent and sporadic resources, however, varies with time. Predicting their dwelling time is therefore important for improving scheduling performance. The dwelling time predictor predicts the upcoming dwelling time of a particular resource and passes it to the Fuzzy Inference System as one of its inputs. The system has one output variable and three input variables:
i) the priority of the job;
ii) the time required by the job;
iii) the future dwelling time of the resource.
Each input variable takes three values, termed min, mid and max, and the rules take the form of if-then statements. The fuzzy inference system infers an output from the input vector based on this set of rules. The output is then compared with different threshold scores. If it is above a threshold score, the corresponding job is allotted to the resource immediately and the details are stored for subsequent scheduling.
4.6.1 Experimental Setup
The proposed resource allocation technique was implemented in MATLAB (version 7.10) on a system with an Intel Core i5 CPU at 3.20 GHz and 3 GB of RAM. The performance of the prediction technique was analyzed on five different synthetic job datasets, taken from the previous chapter. The technique is compared with existing resource allocation techniques using performance measures such as utilization rate, failure rate and makespan. The main requirement of the resource allocation technique is a historical dataset; here, the historical dataset is simulated with N = 10 time slots.
4.6.2 Prediction Model Analysis
The linear time series prediction for DARA is evaluated in terms of the Mean Square Error (MSE), the Relative Mean Prediction Error (RMPE) and the percentage prediction accuracy. The MSE of the predictions for sporadic and semi-permanent CPU, memory and disk storage resources is shown in Figure 4.3. History sizes N = 1, 2, 3, ..., 10 are used to evaluate the errors and the accuracy.
Figure 4.3 Mean square error of resources: (a) CPU, (b) Memory and (c) Disk storage [MSE versus history size N = 1 to 10, for semi-permanent and sporadic resources]
Figure 4.3 shows the overall performance of the dwelling time predictions in terms of MSE (Equation (4.7)) for different history sizes. Sporadic resources have a higher mean square error than semi-permanent resources. For sporadic resources, the plotted values are averages over all datasets. They range from 1.45 to 3.25 for CPU resources, 1.2 to 2.9 for memory resources and 1.22 to 2.5 for disk storage resources. For semi-permanent resources, the MSE ranges from 0.6 to 3.1 for CPU resources, 0.7 to 2.5 for memory resources and 0.8 to 2.4 for disk storage resources. The higher error of sporadic resources reflects their greater dynamism compared with semi-permanent resources.
The average relative mean prediction errors (RMPE) of all datasets
for different types of resources are shown in Figure 4.4.
Figure 4.4 Relative mean prediction error of (a) CPU, (b) Memory and (c) Disk storage resources [RMPE versus history size N = 1 to 10, for semi-permanent and sporadic resources]
Figure 4.4 shows the overall performance of the dwelling time predictions in terms of the relative mean prediction error (RMPE) as a function of history size for all three resource types. For N = 1 the relative mean prediction error is highest, and it decreases as N increases. For larger values of N the RMPE is almost constant. Hence, the optimum history size lies between 4 and 6.
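The average prediction accuracy of sporadic and semi-permanent resources is shown in Figure 4.5.
Figure 4.5 Average prediction accuracy of resources [prediction accuracy (%) versus history size N = 1 to 10, for sporadic and semi-permanent resources]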
To understand the quality of the dwelling time predictions, it is important to examine the prediction accuracy over all resources, in addition to the mean square error and the relative mean prediction error. Figure 4.5 depicts the prediction accuracy in percent as a function of history size. The accuracy increases as the history size grows from 1 to 10 events, and it stabilizes for history sizes between 4 and 6.
4.6.3 Prediction Based Scheduling Analysis
Resource dwelling time prediction can improve scheduling performance, and this section describes the influence of predictions on grid scheduling. Dwelling time based fuzzy scheduling is analyzed quantitatively on the same five datasets generated in the previous chapter. Scheduling quality is evaluated in terms of average job makespan, resource utilization and failure rate. Makespan is calculated for each job as the time from submission to completion and is averaged across all jobs. Resource time utilization is the ratio of the number of resources allocated to the total number of resources. In the DARA technique, fuzzy threshold filtering is performed at two points:
- when evaluating the fuzzy score for the sporadic resource type;
- when evaluating the fuzzy score for the semi-permanent resource type.
In this chapter, the threshold values (Sth-III and Sth-II) are varied from 0.3 to 0.7, keeping the job and resource scenarios of the five datasets described in Chapter 3. The scenario conditions are compiled in Table 4.1. Tables 4.2 to 4.6 list the simulated utilization, failure rate and makespan for all datasets under the prediction model, for the different threshold values.
Table 4.1 Compilation of scenarios for Datasets I-V (each entry gives Resources/Jobs)
Type Dataset-I Dataset-II Dataset-III Dataset-IV Dataset-V
CPU Equal/Equal Less/More More/Equal More/Less Equal/More
Memory Equal/Equal More/Less Less/Equal Less/More Equal/More
Disk Equal/Equal More/Less Less/Equal Less/More Equal/Less
Table 4.2 Performance metrics for Dataset-I
S.No Sth-III Sth-II Utilization (in %) Failure rate (in %) Makespan (in sec)
1 0.3 0.3 72.55 27.45 96.49
2 0.3 0.4 76.67 23.45 80.727
3 0.3 0.5 73.78 26.22 76.834
4 0.3 0.6 75.66 24.34 82.592
5 0.3 0.7 74.97 25.03 83.673
6 0.4 0.3 70.63 29.37 81.035
7 0.4 0.4 74.42 25.58 84.902
8 0.4 0.5 75.72 24.28 84.474
9 0.4 0.6 74.19 25.81 91.765
10 0.4 0.7 73.92 26.08 82.565
11 0.5 0.3 73.74 26.26 94.168
12 0.5 0.4 73.53 26.47 82.211
13 0.5 0.5 72.31 27.69 65.01
14 0.5 0.6 74.7 25.3 69.39
15 0.5 0.7 72.53 27.47 85.695
16 0.6 0.3 71.46 28.54 73.42
17 0.6 0.4 74.43 25.57 87.65
18 0.6 0.5 71.09 28.91 73.921
19 0.6 0.6 76.62 23.38 77.29
20 0.6 0.7 72.73 27.27 82.211
21 0.7 0.3 75.66 24.34 85.695
22 0.7 0.4 74.64 25.36 79.706
23 0.7 0.5 70.11 29.89 98.98
24 0.7 0.6 72.8 27.2 72.861
25 0.7 0.7 71.56 28.44 91.953
Table 4.3 Performance metrics for Dataset-II
S.No Sth-III Sth-II Utilization (in %) Failure rate (in %) Makespan (in sec)
1 0.3 0.3 57.12 42.88 110.62
2 0.3 0.4 58.16 41.84 146.81
3 0.3 0.5 59.3 40.7 146.81
4 0.3 0.6 58.02 41.98 109.76
5 0.3 0.7 49.23 50.77 109.75
6 0.4 0.3 53.98 46.02 102.8
7 0.4 0.4 60.11 39.89 136.52
8 0.4 0.5 57.3 42.7 131
9 0.4 0.6 51.7 48.3 112.52
10 0.4 0.7 59.4 40.6 119.54
11 0.5 0.3 56.83 43.17 148.03
12 0.5 0.4 56.48 43.52 122.97
13 0.5 0.5 58.5 41.5 117.12
14 0.5 0.6 55.71 44.29 130.95
15 0.5 0.7 54.8 45.2 114.91
16 0.6 0.3 50.83 49.17 105.75
17 0.6 0.4 52.79 47.21 118.32
18 0.6 0.5 51.66 48.34 141.32
19 0.6 0.6 59.3 40.7 142.54
20 0.6 0.7 50.4 49.6 122.24
21 0.7 0.3 63.5 36.5 126.18
22 0.7 0.4 55.24 44.76 123.93
23 0.7 0.5 50.13 49.87 137.21
24 0.7 0.6 57.31 42.69 105.52
25 0.7 0.7 59.8 40.2 130.62
Table 4.4 Performance metrics of Dataset-III
S.No Sth-III Sth-II Utilization (in %) Failure rate (in %) Makespan (in sec)
1 0.3 0.3 70.21 29.79 123.56
2 0.3 0.4 66.6 33.4 159.35
3 0.3 0.5 66.15 33.85 112.22
4 0.3 0.6 72.7 27.3 127.18
5 0.3 0.7 65.35 34.65 165.01
6 0.4 0.3 65.14 34.86 153.82
7 0.4 0.4 64.88 35.12 129.45
8 0.4 0.5 69.3 30.7 143.06
9 0.4 0.6 64.29 35.71 182.55
10 0.4 0.7 63.94 36.06 138.63
11 0.5 0.3 67.5 32.5 185.86
12 0.5 0.4 63.31 36.69 121.92
13 0.5 0.5 66.6 33.4 110.22
14 0.5 0.6 62.04 37.96 132.54
15 0.5 0.7 61.78 38.22 154.61
16 0.6 0.3 61.32 38.68 157.34
17 0.6 0.4 60.98 39.02 109.3
18 0.6 0.5 62.8 37.2 181.79
19 0.6 0.6 60.03 39.97 127.62
20 0.6 0.7 59.76 40.24 139.04
21 0.7 0.3 59.42 40.58 129.17
22 0.7 0.4 65.9 34.1 120.42
23 0.7 0.5 58.13 41.87 183.84
24 0.7 0.6 63.7 36.3 133.31
25 0.7 0.7 61.3 38.7 163.75
Table 4.5 Performance metrics of Dataset-IV
S.No Sth-III Sth-II Utilization (in %) Failure rate (in %) Makespan (in sec)
1 0.3 0.3 67 33 192.55
2 0.3 0.4 66.34 33.66 167.34
3 0.3 0.5 59.33 40.67 137.18
4 0.3 0.6 65.08 34.92 143.31
5 0.3 0.7 64.35 35.65 174.6
6 0.4 0.3 63.75 36.25 182.02
7 0.4 0.4 63.18 36.82 127.5
8 0.4 0.5 62.6 37.4 140.69
9 0.4 0.6 65.49 34.51 123.95
10 0.4 0.7 64 36 160.16
11 0.5 0.3 60.37 39.63 162.38
12 0.5 0.4 69.51 30.49 183.09
13 0.5 0.5 65.9 34.1 125.32
14 0.5 0.6 57.15 42.85 137.05
15 0.5 0.7 58.9 41.36 123.62
16 0.6 0.3 55.24 44.76 192.52
17 0.6 0.4 55.09 44.91 148.51
18 0.6 0.5 64.08 35.92 192.26
19 0.6 0.6 53.59 46.41 128.32
20 0.6 0.7 62.32 37.68 132.13
21 0.7 0.3 51.13 48.87 132
22 0.7 0.4 50.75 49.25 172.6
23 0.7 0.5 49.56 50.44 126.2
24 0.7 0.6 58.65 41.35 160.12
25 0.7 0.7 47 53 133.33
Table 4.6 Performance metrics of Dataset-V
S.No Sth-III Sth-II Utilization (in %) Failure rate (in %) Makespan (in sec)
1 0.3 0.3 66.67 23.33 187.21
2 0.3 0.4 71.73 28.27 177.21
3 0.3 0.5 67.8 32.2 155.6
4 0.3 0.6 70.93 29.07 157.2
5 0.3 0.7 67.9 32.1 185.13
6 0.4 0.3 68.23 31.77 137.14
7 0.4 0.4 67.09 32.91 141.02
8 0.4 0.5 70.93 29.07 143.13
9 0.4 0.6 65.19 34.81 140.22
10 0.4 0.7 67.3 32.7 155.45
11 0.5 0.3 64.26 35.74 137.12
12 0.5 0.4 62.85 37.15 179.61
13 0.5 0.5 68.6 31.4 139.6
14 0.5 0.6 61.96 38.04 140.02
15 0.5 0.7 69.59 30.41 188.73
16 0.6 0.3 65.8 34.2 180.34
17 0.6 0.4 60.26 39.74 155.45
18 0.6 0.5 70.18 29.82 151.81
19 0.6 0.6 59.81 40.19 175.14
20 0.6 0.7 64.76 35.24 141.2
21 0.7 0.3 58.55 41.45 187.02
22 0.7 0.4 63.7 36.3 165.61
23 0.7 0.5 57.52 42.48 188.92
24 0.7 0.6 57.89 42.11 167.88
25 0.7 0.7 66.8 33.2 197.7
Table 4.2 lists, for all combinations of the Sth-III and Sth-II thresholds, the utilization, failure rate and makespan for the first scenario dataset. This dataset has equal numbers of resources and jobs. The maximum resource utilization, 76.67%, is achieved with sporadic and semi-permanent thresholds of 0.3 and 0.4 respectively, with a makespan of 80.73 seconds. The next highest utilization is 76.62%, at 0.6 and 0.6, with a makespan of 77.29 seconds. The minimum makespan, 65.01 seconds, is achieved at thresholds of 0.5 and 0.5, but the utilization there is only 72.31%. This indicates that no single threshold pair achieves both maximum utilization and minimum makespan.
Table 4.3 gives the average utilization of all resources, the failure rate and the makespan for the second dataset. The maximum utilization is 63.5%, at 0.7 and 0.3, and the minimum makespan is 102.8 seconds, at 0.4 and 0.3. The makespan at 0.7 and 0.3 is 126.18 seconds. The utilization at 0.4 and 0.3 is 53.98%, almost 10 percentage points below the maximum. This suggests that with fewer CPU resources, the sporadic threshold that maximizes utilization rises to as high as 0.7.
For the third dataset (Table 4.4), the maximum utilization is 72.7%, at 0.3 and 0.6, and the minimum makespan is 109.3 seconds, at 0.6 and 0.4. The makespan at 0.3 and 0.6 is 127.18 seconds, and the utilization at 0.6 and 0.4 is 60.98%.
For the fourth dataset (Table 4.5), the maximum utilization is 69.51%, at 0.5 and 0.4, and the minimum makespan is 123.62 seconds, at 0.5 and 0.7. For the fifth dataset (Table 4.6), the maximum utilization is 71.73%, at 0.3 and 0.4, and the minimum makespan is 137.12 seconds, at 0.5 and 0.3. The makespan at 0.3 and 0.4 is 177.21 seconds, and the utilization at 0.5 and 0.3 is 64.26%.
Across all five datasets, no single threshold pair (Sth-III, Sth-II) gives both the best utilization and the best makespan for any dataset. The makespan varies unpredictably. Lowering the thresholds, by contrast, tends to increase the utilization rate and hence reduce the failure rate. Each dataset achieves different utilization and makespan values for different threshold values. As a basis for choosing a common threshold, Tables 4.7 and 4.8 summarize the thresholds that give the maximum utilization and the minimum makespan for each dataset.
Table 4.7 Thresholds for maximum resource utilization
Dataset Sth-III Sth-II Max. Utilization (%)
Dataset-I 0.3 0.4 76.55
Dataset-II 0.7 0.3 63.5
Dataset-III 0.3 0.6 72.7
Dataset-IV 0.5 0.4 69.51
Dataset-V 0.3 0.4 71.73
Table 4.8 Thresholds for minimum makespan
Dataset Sth-III Sth-II Min. Makespan (sec)
Dataset-I 0.5 0.5 65.01
Dataset-II 0.4 0.3 102.8
Dataset-III 0.6 0.4 109.3
Dataset-IV 0.5 0.7 123.62
Dataset-V 0.5 0.3 137.12
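Tables 4.7 and 4.8 amount to an arg-max over utilization and an arg-min over makespan across the 25 threshold pairs of each dataset. A small Python sketch of that selection follows. The Run record and the best_thresholds function are hypothetical helpers, and the three sample rows are copied from Table 4.3 (Dataset-II).

```python
from typing import List, NamedTuple, Tuple


class Run(NamedTuple):
    sth_iii: float       # sporadic threshold
    sth_ii: float        # semi-permanent threshold
    utilization: float   # in %
    failure_rate: float  # in %
    makespan: float      # in seconds


def best_thresholds(runs: List[Run]) -> Tuple[Run, Run]:
    """Return (run with maximum utilization, run with minimum makespan)."""
    return (
        max(runs, key=lambda r: r.utilization),
        min(runs, key=lambda r: r.makespan),
    )


# Rows 1, 6 and 21 of Table 4.3 (Dataset-II).
dataset_ii = [
    Run(0.3, 0.3, 57.12, 42.88, 110.62),
    Run(0.4, 0.3, 53.98, 46.02, 102.8),
    Run(0.7, 0.3, 63.5, 36.5, 126.18),
]
best_util, best_span = best_thresholds(dataset_ii)
```

Applied to the full tables, the two selected pairs differ for every dataset, as a comparison of Tables 4.7 and 4.8 shows.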
4.6.4 Comparative Analysis
In Chapter 3, DARA was evaluated with the fuzzy threshold fixed at 0.5 for both Sth-III (sporadic) and Sth-II (semi-permanent). To compare the efficiency of LPM-DARA with that of DARA, the performance metrics for Sth-III = Sth-II = 0.5 are taken from Tables 4.2 to 4.6 and presented in Figure 4.6.
Figure 4.6 Performance metrics of DARA and LPM-DARA for the five datasets at a threshold of 0.5: (a) Resource utilization (%), (b) Makespan (sec), (c) Failure rate (%)
Figure 4.6 shows that resource utilization increases considerably compared with the DARA technique: with prediction, at least 4% more of the resource time is utilized. For both resource allocation techniques, utilization is highest for Dataset-I. Without prediction, utilization decreases by about half across the datasets, whereas with prediction it decreases by only about 20%. This shows that scheduling with prediction achieves better utilization as well as a lower failure rate.
For further analysis, the LPM-DARA performance metrics are averaged separately for each value of Sth-III. The averaging is done only for the sporadic threshold, because the dwelling time of sporadic resources is highly uncertain and dynamic. Analyzing the sporadic resources therefore gives a better understanding of the scheduler. The averaged performance metrics are presented in Figure 4.7 (a), (b) and (c).
Figure 4.7 Compilation of performance metrics versus Sth-III (0.3 to 0.7) for Datasets I to V: (a) Resource utilization in %, (b) Makespan in seconds and (c) Failure rate in %.
The utilization of the resources lies between 40% and 75% for all datasets, and it decreases roughly linearly as the fuzzy threshold increases. A possible reason is that, with the threshold fixed at 0.3, jobs are more likely to be allocated to sporadic resources. As the threshold increases, more jobs are allotted to semi-permanent resources. Allocation to sporadic resources becomes limited, and utilization falls because sporadic resources remain unused. The failure rates support the same conclusion. In general, Datasets I, III and V produced lower utilization, which may be due to CPU shortage. The makespan is lower for Datasets I and II, while the other datasets lie in the range of 130 to 180 seconds. This may be due to a mismatch between the distributions of resources and jobs in the dynamic environment.
4.7 SUMMARY
Linear prediction algorithms predict dwelling time from historical data without requiring detailed knowledge of the underlying hardware or the application. A set of past observations is kept for each machine and used to make predictions for incoming resources. The predictions then assist the scheduler when allocating resources to jobs. Statistical algorithms make better predictions as the number of past observations increases. Linear predictions have a drawback: their accuracy depends on how well past observations reflect future incoming jobs.