
CHAPTER 4

LINEAR REGRESSION PREDICTION OF DWELLING TIME

The Dwelling time Aware Resource Allocation (DARA) technique using fuzzy logic, presented in the previous chapter, has been shown to allocate dynamic resources well. Although the proposed scheduling algorithm performs well in the dynamic environment, its maximum resource utilization is only 65.6%. One way to improve the performance of the DARA algorithm is to predict the future dwelling time of the resources instead of taking the dwelling time from the history. Resource dwelling time prediction aims to provide a real-time forecast of future dwelling time that can support Grid scheduling decisions. From this perspective, prediction techniques based on the manipulation of historical data from Grid workload traces have a high chance of success. Accurate predictions of the dwelling time of the resources lead to high-performance resource allocation. Hence, to enhance the performance of the DARA technique, a linear time series prediction technique is introduced. In this chapter, a linear prediction method is developed that effectively predicts the dwelling times of successive resources in a grid environment.

4.1 NEED OF PREDICTION MODEL

Grid environments are highly unpredictable in several respects, viz., the dwelling time of the resources, the load on processors and the dynamic nature of resources. The processor capacities and link rates are often unknown, and resources may connect and disconnect at any time. Variations in the dwelling time of the resources (e.g., computing power, storage) may lead to a significant increase in the running time of jobs. This uncertainty in demand and dwellability raises the need for techniques that make applications robust to the continuously changing circumstances of the grid environment. Those techniques rely strongly on the effectiveness of prediction methods: significant speedups can be obtained by good scheduling schemes based on accurate predictions. A key requirement of jobs running in a grid environment is robustness against the dynamically changing circumstances, so the prediction method must be usable effectively to choose the resource dwelling time. Moreover, since the monitoring capabilities in a real grid environment will be limited at best, prediction methods are usually based on only a small number of measured parameters. The prediction of dwelling time is very complicated in a real grid environment; for example, even the load (i.e., resource utilization) often cannot be measured. To circumvent this problem, the methods used here predict future dwelling time only on the basis of the past dwelling times of resources, without requiring any additional and possibly unavailable measurement data.

From the literature (Harrell 2001, Smith & Wong 2002), it is found that relatively simple linear time series models are sufficient for short-range dwelling time prediction and that the time series is strongly correlated over time. Time series forecasting methods are based on the analysis of historical data (a set of observations measured at successive times or over successive periods). They assume that past patterns in the data can be used to forecast future data points. The linear time series model analyzes the relationship between the response or dependent variable and a set of independent or predictor variables. This relationship is expressed as an equation that predicts the response variable as a linear function of the parameters. If the parameters appear linearly, the model is a linear model. Much of the effort in model fitting is focused on minimizing the size of the residual, as well as ensuring that it is randomly distributed with respect to the model predictions. The goal of the time series model is to select the parameters of the model so as to minimize the sum of the squared residuals. In a linear time series model the parameters can be obtained by solving the resulting equations with any standard method. The appropriateness of a model is judged by comparing predicted load sequences with actual load sequences. This comparison is performed on the basis of the mean-squared error between the actual and predicted sequences.

In this chapter, a linear time series prediction model is applied to predict the dynamic resource dwelling time from the database of past history.

4.2 RELATED WORK

A predictor that forecasts the future dwelling time of the resources on which a task is likely to run can help schedulers make better decisions (Mentzer & Kahn 1995). If the future dwelling time of grid resources can be predicted and provided, the scheduler will allocate the resources effectively, leading to better utilization of the resources. The future state of a grid resource can be predicted by taking the past dwelling times of the resource and performing time series analysis on them. Time series analysis is a technique for understanding the underlying behaviour of a system or the subsequent values of some information (Loe et al 2007, Uysal & Guvenir 1999). Predicting the resource dwelling time is a challenging problem because it depends on multiple factors such as resource stability, varying resource maintenance and the wide range of policies. Prediction algorithms have been used in many aspects of computing, including the prediction of future CPU load, job run times, CPU time, wall clock time, physical and virtual memory usage, job cost and resource availability (Devarakonda & Iyer 1989).

Quantities that have been predicted in a grid include the running times of jobs, the load (Dinda 2002), network traffic (Qiao & Dinda 2003), end-to-end behavior (Downey 1997), queue wait times (Smith et al 1998) and data transfer times (Vazhkudai et al 2002, Faerman et al 1999). Downey (1997) classified applications according to the queue they were submitted to and determined cumulative distribution functions of the run times for each class or category to predict application run times.

Shneidman et al (2005) have emphasized the importance of predicting resource consumption for a successful application of market-based approaches to resource allocation problems in distributed systems. Historical information has also been used by Smith & Wong (2002) for predicting job run times. Instead of dividing jobs into categories, they used instance-based learning techniques to determine, for each query job, the most relevant historical usage records by computing the "distances" between the attributes of the query job and those of all jobs in the "experience base" (a historical database of fixed size). The approach based on instance-based learning has also been applied by Li et al (2005) to predict queue wait times, instead of using a computationally demanding scheduler simulation as in their previous work (Li et al 2004).

Dinda & O'Hallaron (2000) used linear time series models to predict a host's workload from past behavior. The load predictions obtained were then used for run time prediction (Dinda 2002). Their method, in contrast to the works described above, provides a confidence interval for run time predictions. Zhang et al (2008) used an equivalent approach (also relying on previously known nominal run times), but applied polynomial fitting to derive host load predictions from past behavior. The estimation of network bandwidth, although not directly related to our work on job resource usage, is another interesting example of performance prediction.

A conceptually different approach to resource usage prediction is based on application performance models, which have both the advantage and the disadvantage of being application-centric. This allows application behavior to be described in detail but requires a dedicated performance model for each application (or at least for each application class). Schopf & Berman (2001), for example, used application performance models to predict run times on contended resources. Instead of deriving a single predicted value for each application, their method provides a stochastic prediction represented by a distribution of likely run times. The benchmark-based procedure for run time prediction used by Elmroth & Tordsson (2008) requires the user to provide a run time estimate for at least one of the available CEs, similarly to the above-mentioned methods based on host load predictions. The procedure's accuracy depends strongly on this user-specified estimate, which is scaled linearly according to the relevant resource benchmarks. Queue wait times can be predicted not only indirectly through the prediction of run times, as done by Smith et al (1999) and Li et al (2004), but also more directly. Nurmi et al (2007) developed a method for predicting bounds, with quantitative confidence levels, on the queuing delay of batch jobs.

The prediction methods described so far directly predict resource consumption from historical usage information. Different prediction methods are used in the above literature to suit the needs of their applications. Prediction of the dwelling time of Grid resources has not so far been carried out for the purpose of scheduling jobs to resources. This work aims at improving the DARA performance directly by predicting the future dwelling time of the executing resources.

4.3 LINEAR PREDICTION MODEL FOR DARA (LPM-DARA)

Predictions of resource dwelling times can be derived from the historical resource database using appropriate statistical methods or prediction functions, such as moving average, exponential smoothing, mean value and the autoregressive model. In this section the linear prediction model for DARA is presented. The prediction model is based on the autoregressive model, modified according to the dynamic changes of the resources.

4.3.1 Auto Regressive Model

Autoregressive (AR) predictors multiply each of the previous data points by a parameter between 0 and 1 and sum the products to compute the next prediction. An AR model has a parameter p, which indicates the number of data points it uses from the history. Generally, AR predictions (denoted as AR(p)) are calculated in the following way:

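$$y_t = a_1 y_{t-1} + a_2 y_{t-2} + \cdots + a_p y_{t-p} \qquad (4.1)$$

$$\sum_{i=1}^{p} a_i = 1 \quad \text{and} \quad 0 \le a_i \le 1 \qquad (4.2)$$

where $y_t$ = future predicted data

$y_{t-p}$ = previous 'p' data points

$a_i$ = coefficient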
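The AR(p) prediction of Equations (4.1) and (4.2) can be illustrated with a minimal Python sketch. The function name `ar_predict` and the convention that the history is stored oldest first are assumptions made for this illustration; they do not come from the original implementation.

```python
def ar_predict(history, coeffs):
    """Predict the next value as a weighted sum of the last p observations.

    history: past observations, oldest first (assumed ordering).
    coeffs:  [a_1, ..., a_p], where a_1 weights the most recent observation.
    """
    p = len(coeffs)
    if len(history) < p:
        raise ValueError("need at least p past observations")
    # Constraint of Equation (4.2): each a_i in [0, 1] and the a_i sum to 1.
    if any(a < 0 or a > 1 for a in coeffs) or abs(sum(coeffs) - 1.0) > 1e-9:
        raise ValueError("coefficients must lie in [0, 1] and sum to 1")
    recent = history[-p:][::-1]  # y_{t-1}, y_{t-2}, ..., y_{t-p}
    return sum(a * y for a, y in zip(coeffs, recent))
```

For example, with the history 10, 20, 30 and the coefficients 0.5, 0.3, 0.2, the prediction weights the most recent value (30) most heavily.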

Dinda (2002) also analyzed AR models and showed that AR(16) performs well only in the case of periodicity. The datasets here, however, do not have the periodicity property, since grid jobs and resources appear randomly rather than periodically. Hence the AR model is modified according to the dynamism of the Grid system.

4.3.2 Modified Auto Regressive Model

The AR(p) model has a fixed number of parameters. Because of this static characteristic, the AR model will not be effective in all situations. The AR model can be modified for effective prediction of dwelling time in a dynamic environment. To introduce dynamism into the static AR model, a data variation factor and some conditions are introduced. The linear prediction model for dwelling time aware resource allocation (LPM-DARA) is shown in Figure 4.1. The model takes historical data as input and generates a prediction of the future variation.

Figure 4.1 Linear prediction framework for DARA (input: History; output: Future)

When a resource is given, a prediction model is developed for that resource and its upcoming dwelling time is determined using the model. The prediction model consists of three main steps:

a) Determining the data variation factor

b) Generation of linear time series model

c) Determining the future dwelling time

4.3.3 Determining the Data Variation Factor


In developing a linear time series prediction model, the first step is to determine the data variation factor. The data variation factor is defined as the number of times the dwelling time of a particular resource has varied over the past 't' time slots. To determine it, six criteria are defined to check the variation in the dwelling time of the same resource. Each criterion compares the 't'th dwelling time with the 't+1'th and 't-1'th dwelling times to detect any variation. The 't'th dwelling time is checked against all six criteria given below. If any one of the six criteria is satisfied, the data is said to have varied. The process of determining the data variation factor of resource j (where j refers to the resource ID) is described as a flowchart in Figure 4.2.

Criteria 1 to 6: each criterion consists of two conditions, a) and b), that relate the dwelling time $d_j^{(t)}$ to a neighbouring dwelling time $d_j^{(t \pm 1)}$. The relational operators of the individual conditions are not legible in the source text.

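Figure 4.2 Flowchart for determining data variation factor

[Flowchart steps: set t to unity and set the data variation factor of resource j to zero; get the dwelling time $d_j^{(t)}$ of resource j; if $d_j^{(t)}$ satisfies any one criterion, increment the data variation factor by a unit value; increment t by a unit value; if t > 9, stop, otherwise repeat from the step that gets the dwelling time.]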

The first step determines the order of the proposed prediction model dynamically. This value varies from resource to resource, based on the previous data of each resource.
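The counting loop of the flowchart can be sketched in Python as follows. Because the six criteria are not reproduced here, the `varied` predicate is a hypothetical placeholder (it flags a slot whose dwelling time differs from either neighbour), and the function name and the restriction to interior time slots are likewise assumptions of this sketch rather than the original procedure.

```python
def data_variation_factor(dwell, varied=None):
    """Count the time slots whose dwelling time is flagged as varied.

    dwell:  past dwelling times of one resource, oldest first.
    varied: predicate (prev, cur, nxt) -> bool standing in for the six
            criteria; the default below is a hypothetical placeholder.
    """
    if varied is None:
        # Hypothetical stand-in: a slot varies if it differs from a neighbour.
        varied = lambda prev, cur, nxt: cur != prev or cur != nxt
    count = 0
    # Only interior slots are visited so that both neighbours exist
    # (an assumption; the flowchart itself runs t from 1 to 9).
    for t in range(1, len(dwell) - 1):
        if varied(dwell[t - 1], dwell[t], dwell[t + 1]):
            count += 1
    return count
```

The returned count would then serve as the order of the predictor for that resource, so a resource with a steadier history gets a lower-order model.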

4.3.4 Generation of Linear Time Series Model

After determining the data variation factor, the next step is to develop the linear time series predictor function of order 'n'. The value of the data variation factor of resource j is taken as the order of the equation. The order therefore changes dynamically from resource to resource.

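$$d_j^{(t)} = \alpha_1 d_j^{(t-1)} + \alpha_2 d_j^{(t-2)} + \alpha_3 d_j^{(t-3)} + \cdots + \alpha_n d_j^{(t-n)} \qquad (4.3)$$

where

$$\sum_{i=1}^{n} \alpha_i = 1 \quad \text{and} \quad 0 \le \alpha_i \le 1 \qquad (4.4)$$

$d_j^{(t)}$ = Predicted dwelling time

$d_j^{(t-n)}$ = Past dwelling time

'n' represents the order of the equation,

'j' represents the resource identity

$\alpha_i$ = modified AR model parameter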

The parameters are estimated using the standard least squares method. Once the parameters have been estimated, the prediction model is fully determined and is used to determine the future dwelling time. This process is carried out for the semi-permanent and sporadic types of resources.
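As an illustration of the least squares step, the estimate can be written as the minimizer of the sum of squared one-step prediction errors over the available history. The notation is an assumption of this sketch: $\alpha_i$ denotes the model parameters, $n$ the order, and $N$ the number of past time slots of resource $j$.

$$(\hat{\alpha}_1, \ldots, \hat{\alpha}_n) = \arg\min_{\alpha_1, \ldots, \alpha_n} \sum_{t=n+1}^{N} \Big( d_j^{(t)} - \sum_{i=1}^{n} \alpha_i \, d_j^{(t-i)} \Big)^2$$

An unconstrained least squares solution does not by itself guarantee that the parameters lie between 0 and 1 or sum to 1, so those conditions would have to be imposed as additional constraints on the minimization.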


4.3.5 Determining the Future Dwelling Time

After generating the model with its parameters, the next step is to determine the future dwelling time. To do so, the following condition is applied.

Let $\delta(j)$ be the dwelling time selector, which is the difference between the predicted value and the minimum of the previous dwelling time values. It is defined as

$$\delta(j) = d_j^{(t)} - \min\big(d_j^{(t-n)}\big) \qquad (4.5)$$

The difference value is checked against the following condition:

$$\text{future dwelling time} = \begin{cases} \min\big(d_j^{(t-n)}\big), & \delta(j) \ge 0 \\ d_j^{(t)}, & \delta(j) < 0 \end{cases}$$

If the difference is negative, the future dwelling time is taken as the value predicted by the prediction model; if it is zero or positive, the future dwelling time is taken as $\min(d_j^{(t-n)})$. This is done to avoid jobs being left incomplete because a resource leaves before its registered dwelling time.
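The selection rule of Equation (4.5) amounts to taking the more conservative of the predicted value and the smallest past dwelling time. A minimal Python sketch, with the function and argument names chosen here for illustration only:

```python
def select_future_dwelling_time(predicted, past):
    """Apply the dwelling time selector of Equation (4.5).

    predicted: dwelling time given by the prediction model.
    past:      previous dwelling times of the same resource.
    """
    if not past:
        raise ValueError("at least one past dwelling time is required")
    smallest = min(past)
    selector = predicted - smallest
    # Negative selector: keep the prediction; otherwise fall back to the minimum.
    return predicted if selector < 0 else smallest
```

With past dwelling times 4, 6 and 5, a prediction of 3 is kept as it is, while a prediction of 7 is replaced by 4.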

4.4 EVALUATION METHODOLOGY

The evaluation methodology is designed as follows:

1. Choose a resource Rj to be allocated from the resource database.
2. Check the history size of that resource for the past N events (history size).
3. If the history size is zero (i.e., a new resource that has not previously been in service), no prediction is made; the resource is assumed to have the minimum dwelling time (1 sec) and is added to the sporadic resource type.
4. If the history is complete (i.e., it contains N previous events), apply the prediction function (linear regression) to the history to determine the future dwelling time.
5. Update the database (history).
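The steps above can be sketched in Python as follows. The names `evaluate_resource` and `MIN_DWELL_SEC`, the `predict` callable, and the handling of a partially filled history (returning no prediction) are assumptions of this sketch; the steps themselves only specify the empty and the complete history.

```python
MIN_DWELL_SEC = 1.0  # minimum dwelling time assumed for a new resource


def evaluate_resource(history, n_events, predict):
    """Return (future_dwelling_time, resource_type) for one resource.

    history:  past dwelling times of the resource, oldest first.
    n_events: required history size N.
    predict:  callable mapping the last N dwelling times to a prediction.
    """
    if len(history) == 0:
        # Step 3: new resource, no prediction, treated as sporadic.
        return MIN_DWELL_SEC, "sporadic"
    if len(history) >= n_events:
        # Step 4: complete history, apply the prediction function.
        return predict(history[-n_events:]), None
    # Partial history is not covered by the steps; no prediction is made
    # here (hypothetical choice).
    return None, None


def update_history(history, observed):
    """Step 5: record the dwelling time that was actually observed."""
    history.append(observed)
    return history
```

Any prediction function can be passed in as `predict`; a simple mean over the window is enough to exercise the control flow.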

4.5 PERFORMANCE CRITERIA

To analyze the accuracy of a prediction method, the error of a single prediction, the mean square error, the absolute prediction error and the relative mean prediction error are calculated.

The error of a single prediction is evaluated by comparing the predicted dwelling time with the respective actual dwelling time.

$$\text{Error} = \text{predicted} - \text{actual} = d_p(j) - d_a(j) \qquad (4.6)$$

The mean square error (MSE) is defined as the average of the square of the error function over N predictions:

$$\text{MSE} = \frac{1}{N} \sum_{j=1}^{N} \big[ d_p(j) - d_a(j) \big]^2 \qquad (4.7)$$

The absolute prediction error is

$$\text{Absolute error} = \big| d_p(j) - d_a(j) \big| \qquad (4.8)$$

The relative mean prediction error is

$$\text{RMPE} = \frac{1}{N} \sum_{j=1}^{N} \frac{\big| d_p(j) - d_a(j) \big|}{d_a(j)} \qquad (4.9)$$

where N is the number of predictions taken for the dataset, $d_a(j)$ is the actual dwelling time of resource $j$, and $d_p(j)$ is the predicted dwelling time of resource $j$.
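The error measures can be computed as in the Python sketch below. It assumes the conventional definitions: the mean square error as the average squared difference, and the relative mean prediction error as the average absolute difference divided by the actual value. The function name and the returned dictionary keys are illustrative.

```python
def prediction_errors(predicted, actual):
    """Compute error measures for paired predicted and actual dwelling times."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(predicted)
    errors = [p - a for p, a in zip(predicted, actual)]  # predicted - actual
    return {
        "errors": errors,
        "mse": sum(e * e for e in errors) / n,
        "absolute": [abs(e) for e in errors],
        # Assumes every actual dwelling time is non-zero.
        "rmpe": sum(abs(e) / a for e, a in zip(errors, actual)) / n,
    }
```

For predictions of 4 and 6 against actual values of 5 and 5, the errors are -1 and 1, the MSE is 1 and the relative mean prediction error is 0.2.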

4.6 RESOURCE ALLOCATION

Resource dwelling time prediction uses the historical data of every grid resource to predict its future dwelling time. In the previous chapter the resources were classified as permanent, semi-permanent and sporadic resources based on their dwelling time in the system. As the reliability of the permanent resources is high, there is no need to predict their future availability. However, since the dwellability of the semi-permanent and sporadic types of resources varies with time, it is important to predict their dwelling time so that the scheduling performance can be improved. The dwelling time predictor predicts the upcoming dwelling time of a particular resource and passes it to the Fuzzy Inference System as one of its inputs. The fuzzy inference system has three input variables, i) the priority of the job, ii) the requirement time demanded by the job and iii) the future dwelling time of the resource, and one output variable; the rules take the form of if-then statements. The output of the fuzzy inference system is inferred from the input vector based on the set of rules. The output is then compared with the threshold score values; if it is above a threshold score, the corresponding job is allotted to the resource immediately and the details are stored for the next scheduling. Each input variable takes three values, termed min, mid and max.

4.6.1 Experimental Setup

The proposed resource allocation technique was implemented in MATLAB (version 7.10) on a system with an Intel(R) Core i5 CPU at 3.20 GHz and 3 GB RAM. The performance of the prediction technique was analyzed by running it on five different synthetic job datasets, taken from the previous chapter for simulation. The technique is compared with the existing resource allocation techniques using performance measures such as utilization rate, failure rate and makespan. The main requirement of the resource allocation technique is the historical dataset. Here the historical dataset is simulated with N = 10 time slots.

4.6.2 Prediction Model Analysis

The linear time series prediction for DARA is evaluated in terms of Mean Square Error (MSE), Relative Mean Prediction Error (RMPE) and % prediction accuracy. The MSE of the predicted sporadic and semi-permanent types of CPU, memory storage and disk storage resources is shown in Figure 4.3. History sizes 'N' of 1, 2, 3, …, 10 are used to evaluate the errors and accuracy.

Figure 4.3 Mean square error of resources (a) CPU, (b) Memory and (c) Disk storage (each panel plots MSE against history size "N" from 1 to 10 for semi-permanent and sporadic resources)

Figure 4.3 shows the overall performance of the dwelling time predictions in terms of MSE (Equation (4.7)) for different history sizes. From this figure, it is seen that the sporadic type has a higher mean square error than the semi-permanent type. The MSE of sporadic resources, averaged over all datasets, ranges from 1.45 to 3.25 for CPU resources, 1.2 to 2.9 for memory resources and 1.22 to 2.5 for disk storage resources. For semi-permanent resources it ranges from 0.6 to 3.1 for CPU resources, 0.7 to 2.5 for memory resources and 0.8 to 2.4 for disk storage resources. The error of sporadic resources is higher than that of semi-permanent resources, which reflects the greater dynamism of sporadic resources.

The average relative mean prediction errors (RMPE) of all datasets

for different types of resources are shown in Figure 4.4.

Figure 4.4 Relative mean prediction error of (a) CPU, (b) Memory and (c) Disk (each panel plots RMPE against history size "N" from 1 to 10)

Figure 4.4 shows the overall performance of the dwelling time predictions in terms of relative mean prediction error (RMPE) as a function of the history size for all three resource types. For N = 1 the relative mean prediction error is highest, and it decreases as N increases. For higher values of N the RMPE is almost constant. Hence the optimum history size lies between 4 and 6.

The average prediction accuracy of sporadic and semi-permanent resources is shown in Figure 4.5.

Figure 4.5 Average prediction accuracy of resources

To understand the quality of the dwelling time predictions, it is important to examine not only the mean square error and the relative mean prediction error but also the prediction accuracy of all resources. Figure 4.5 depicts the accuracy of the predictions, in percent, with respect to history size. For history sizes of 1, 2, …, 10 events the graph shows an increase in accuracy, and for history sizes between 4 and 6 the accuracy stabilizes.


4.6.3 Prediction Based Scheduling Analysis

Resource dwelling time prediction can improve the performance of scheduling, and this section describes the influence of predictions on grid scheduling. A quantitative analysis is made of dwelling-time-based fuzzy scheduling with the same five datasets generated in the previous chapter. The scheduling quality is evaluated according to average job makespan, resource utilization and failure rate. Makespan is calculated for each job as the time from submission to completion, and this value is averaged across all jobs. Resource utilization is the ratio of the number of resources allocated to the total number of resources. In the DARA technique, fuzzy threshold filtering is performed at two points: one where the fuzzy score for the sporadic resource type is evaluated, and the other where the fuzzy score for the semi-permanent resource type is evaluated. In this chapter, the threshold values (Sth-III and Sth-II) are varied from 0.3 to 0.7, keeping the job and resource scenarios for the five datasets as described in Chapter 3. The scenario conditions are compiled in Table 4.1. The performance metrics from the simulation of all the datasets under the prediction model for the different threshold values, in terms of utilization, failure rate and makespan, are tabulated in Tables 4.2 to 4.6.
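The two scheduling measures defined above can be computed as in the following Python sketch. The function names and the representation of a job as a (submission time, completion time) pair are assumptions made for illustration.

```python
def average_makespan(jobs):
    """Average time from submission to completion over all jobs.

    jobs: list of (submit_time, completion_time) pairs, in seconds.
    """
    if not jobs:
        raise ValueError("at least one job is required")
    return sum(done - submitted for submitted, done in jobs) / len(jobs)


def utilization_percent(allocated, total):
    """Ratio of allocated resources to total resources, as a percentage."""
    if total <= 0:
        raise ValueError("total number of resources must be positive")
    return 100.0 * allocated / total
```

For two jobs submitted at 0 s and 5 s and completed at 10 s and 25 s, the average makespan is 15 s; allocating 3 of 4 resources gives a utilization of 75%.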

Table 4.1 Compilation of scenarios for Datasets I-V

Type     Dataset-I         Dataset-II        Dataset-III       Dataset-IV        Dataset-V
         Resource  Jobs    Resource  Jobs    Resource  Jobs    Resource  Jobs    Resource  Jobs
CPU      Equal     Equal   Less      More    More      Equal   More      Less    Equal     More
Memory   Equal     Equal   More      Less    Less      Equal   Less      More    Equal     More
Disk     Equal     Equal   More      Less    Less      Equal   Less      More    Equal     Less

Table 4.2 Performance metrics for Dataset-I

S.No  Sth-III  Sth-II  Utilization (in %)  Failure rate (in %)  Makespan (in sec)
1     0.3      0.3     72.55               27.45                96.49
2     0.3      0.4     76.67               23.45                80.727
3     0.3      0.5     73.78               26.22                76.834
4     0.3      0.6     75.66               24.34                82.592
5     0.3      0.7     74.97               25.03                83.673
6     0.4      0.3     70.63               29.37                81.035
7     0.4      0.4     74.42               25.58                84.902
8     0.4      0.5     75.72               24.28                84.474
9     0.4      0.6     74.19               25.81                91.765
10    0.4      0.7     73.92               26.08                82.565
11    0.5      0.3     73.74               26.26                94.168
12    0.5      0.4     73.53               26.47                82.211
13    0.5      0.5     72.31               27.69                65.01
14    0.5      0.6     74.7                25.3                 69.39
15    0.5      0.7     72.53               27.47                85.695
16    0.6      0.3     71.46               28.54                73.42
17    0.6      0.4     74.43               25.57                87.65
18    0.6      0.5     71.09               28.91                73.921
19    0.6      0.6     76.62               23.38                77.29
20    0.6      0.7     72.73               27.27                82.211
21    0.7      0.3     75.66               24.34                85.695
22    0.7      0.4     74.64               25.36                79.706
23    0.7      0.5     70.11               29.89                98.98
24    0.7      0.6     72.8                27.2                 72.861
25    0.7      0.7     71.56               28.44                91.953

Table 4.3 Performance metrics for Dataset-II

S.No Sth-III Sth-II Utilization (in %) Failure rate (in %) Makespan (in sec)

1 0.3 0.3 57.12 42.88 110.62

2 0.3 0.4 58.16 41.84 146.81

3 0.3 0.5 59.3 40.7 146.81

4 0.3 0.6 58.02 41.98 109.76

5 0.3 0.7 49.23 50.77 109.75

6 0.4 0.3 53.98 46.02 102.8

7 0.4 0.4 60.11 39.89 136.52

8 0.4 0.5 57.3 42.7 131

9 0.4 0.6 51.7 48.3 112.52

10 0.4 0.7 59.4 40.6 119.54

11 0.5 0.3 56.83 43.17 148.03

12 0.5 0.4 56.48 43.52 122.97

13 0.5 0.5 58.5 41.5 117.12

14 0.5 0.6 55.71 44.29 130.95

15 0.5 0.7 54.8 45.2 114.91

16 0.6 0.3 50.83 49.17 105.75

17 0.6 0.4 52.79 47.21 118.32

18 0.6 0.5 51.66 48.34 141.32

19 0.6 0.6 59.3 40.7 142.54

20 0.6 0.7 50.4 49.6 122.24

21 0.7 0.3 63.5 36.5 126.18

22 0.7 0.4 55.24 44.76 123.93

23 0.7 0.5 50.13 49.87 137.21

24 0.7 0.6 57.31 42.69 105.52

25 0.7 0.7 59.8 40.2 130.62

Table 4.4 Performance metrics of Dataset-III

S.No Sth-III Sth-II Utilization (in %) Failure rate (in %) Makespan (in sec)

1 0.3 0.3 70.21 29.79 123.56

2 0.3 0.4 66.6 33.4 159.35

3 0.3 0.5 66.15 33.85 112.22

4 0.3 0.6 72.7 27.3 127.18

5 0.3 0.7 65.35 34.65 165.01

6 0.4 0.3 65.14 34.86 153.82

7 0.4 0.4 64.88 35.12 129.45

8 0.4 0.5 69.3 30.7 143.06

9 0.4 0.6 64.29 35.71 182.55

10 0.4 0.7 63.94 36.06 138.63

11 0.5 0.3 67.5 32.5 185.86

12 0.5 0.4 63.31 36.69 121.92

13 0.5 0.5 66.6 33.4 110.22

14 0.5 0.6 62.04 37.96 132.54

15 0.5 0.7 61.78 38.22 154.61

16 0.6 0.3 61.32 38.68 157.34

17 0.6 0.4 60.98 39.02 109.3

18 0.6 0.5 62.8 37.2 181.79

19 0.6 0.6 60.03 39.97 127.62

20 0.6 0.7 59.76 40.24 139.04

21 0.7 0.3 59.42 40.58 129.17

22 0.7 0.4 65.9 34.1 120.42

23 0.7 0.5 58.13 41.87 183.84

24 0.7 0.6 63.7 36.3 133.31

25 0.7 0.7 61.3 38.7 163.75

Table 4.5 Performance metrics of Dataset-IV

S.No Sth-III Sth-II Utilization (in %) Failure rate (in %) Makespan (in sec)

1 0.3 0.3 67 33 192.55

2 0.3 0.4 66.34 33.66 167.34

3 0.3 0.5 59.33 40.67 137.18

4 0.3 0.6 65.08 34.92 143.31

5 0.3 0.7 64.35 35.65 174.6

6 0.4 0.3 63.75 36.25 182.02

7 0.4 0.4 63.18 36.82 127.5

8 0.4 0.5 62.6 37.4 140.69

9 0.4 0.6 65.49 34.51 123.95

10 0.4 0.7 64 36 160.16

11 0.5 0.3 60.37 39.63 162.38

12 0.5 0.4 69.51 30.49 183.09

13 0.5 0.5 65.9 34.1 125.32

14 0.5 0.6 57.15 42.85 137.05

15 0.5 0.7 58.9 41.36 123.62

16 0.6 0.3 55.24 44.76 192.52

17 0.6 0.4 55.09 44.91 148.51

18 0.6 0.5 64.08 35.92 192.26

19 0.6 0.6 53.59 46.41 128.32

20 0.6 0.7 62.32 37.68 132.13

21 0.7 0.3 51.13 48.87 132

22 0.7 0.4 50.75 49.25 172.6

23 0.7 0.5 49.56 50.44 126.2

24 0.7 0.6 58.65 41.35 160.12

25 0.7 0.7 47 53 133.33

Table 4.6 Performance metrics of Dataset-V

S.No Sth-III Sth-II Utilization (in %) Failure rate (in %) Makespan (in sec)

1 0.3 0.3 66.67 23.33 187.21

2 0.3 0.4 71.73 28.27 177.21

3 0.3 0.5 67.8 32.2 155.6

4 0.3 0.6 70.93 29.07 157.2

5 0.3 0.7 67.9 32.1 185.13

6 0.4 0.3 68.23 31.77 137.14

7 0.4 0.4 67.09 32.91 141.02

8 0.4 0.5 70.93 29.07 143.13

9 0.4 0.6 65.19 34.81 140.22

10 0.4 0.7 67.3 32.7 155.45

11 0.5 0.3 64.26 35.74 137.12

12 0.5 0.4 62.85 37.15 179.61

13 0.5 0.5 68.6 31.4 139.6

14 0.5 0.6 61.96 38.04 140.02

15 0.5 0.7 69.59 30.41 188.73

16 0.6 0.3 65.8 34.2 180.34

17 0.6 0.4 60.26 39.74 155.45

18 0.6 0.5 70.18 29.82 151.81

19 0.6 0.6 59.81 40.19 175.14

20 0.6 0.7 64.76 35.24 141.2

21 0.7 0.3 58.55 41.45 187.02

22 0.7 0.4 63.7 36.3 165.61

23 0.7 0.5 57.52 42.48 188.92

24 0.7 0.6 57.89 42.11 167.88

25 0.7 0.7 66.8 33.2 197.7

Table 4.2 lists, for all combinations of the Sth-III and Sth-II threshold values, the corresponding utilization, failure rate and makespan for the first scenario dataset, in which equal numbers of resources and jobs are taken. The maximum resource utilization, 76.67%, is achieved at sporadic and semi-permanent thresholds of 0.3 and 0.4 respectively; for these thresholds Table 4.2 gives a makespan of 80.727 seconds. The next highest utilization is 76.62%, at thresholds of 0.6 and 0.6, for which the makespan is 77.29 seconds. The minimum makespan, 65.01 seconds, is achieved at thresholds of 0.5 and 0.5, where the utilization is 72.31%, which is not the maximum value. This indicates that no single threshold pair achieves both maximum utilization and minimum makespan.

For the second dataset, the average utilization of all resources, the failure rate and the makespan obtained from the resource allocation are shown in Table 4.3. The maximum utilization is 63.5% at thresholds of 0.7 and 0.3, and the minimum makespan is 102.8 seconds at thresholds of 0.4 and 0.3. The makespan at 0.7 and 0.3 is 126.18 seconds, and the utilization at 0.4 and 0.3 is 53.98%, almost 10 percentage points below the maximum. This suggests that the smaller number of CPU resources pushes the sporadic threshold as high as 0.7.

For the third dataset (Table 4.4), the maximum utilization is 72.7% at thresholds of 0.3 and 0.6, and the minimum makespan is 109.3 seconds at thresholds of 0.6 and 0.4. The makespan at 0.3 and 0.6 is 127.18 seconds, and the utilization at 0.6 and 0.4 is 60.98%.

For the fourth dataset (Table 4.5), the maximum utilization is 67% at thresholds of 0.3 and 0.3, and the minimum makespan is 123.62 seconds at thresholds of 0.5 and 0.7. For the fifth dataset (Table 4.6), the maximum utilization is 71.73% at thresholds of 0.3 and 0.4, and the minimum makespan is 137.12 seconds at thresholds of 0.5 and 0.3. The makespan at 0.3 and 0.4 is 177.21 seconds, and the utilization at 0.5 and 0.3 is 64.26%.

After analyzing all five datasets, it is seen that for no dataset does the same threshold pair (Sth-III and Sth-II) yield both the best utilization and the best makespan. The makespan varies irregularly, while lowering the threshold increases the utilization rate and hence reduces the failure rate. For every dataset, different utilization and makespan values are achieved for different threshold values. Tables 4.7 and 4.8 can be used to reach a conclusion about a common threshold value.

Table 4.7 Thresholds for maximum resource utilization

Dataset      Sth-III  Sth-II  Max. Utilization (%)
Dataset-I    0.3      0.4     76.55
Dataset-II   0.7      0.3     63.5
Dataset-III  0.3      0.6     72.7
Dataset-IV   0.3      0.3     67
Dataset-V    0.3      0.4     71.73

Table 4.8 Thresholds for minimum makespan

Dataset      Sth-III  Sth-II  Min. Makespan (sec)
Dataset-I    0.5      0.5     65.01
Dataset-II   0.4      0.3     102.8
Dataset-III  0.6      0.4     109.3
Dataset-IV   0.5      0.7     123.62
Dataset-V    0.5      0.3     137.12
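The selection behind Tables 4.7 and 4.8 is a search over the threshold pairs of each dataset for the highest utilization and the lowest makespan. A minimal Python sketch using four rows of Table 4.2 (Dataset-I); the tuple layout and function names are assumptions of this illustration.

```python
# (Sth-III, Sth-II, utilization in %, makespan in sec), rows from Table 4.2.
DATASET_I_ROWS = [
    (0.3, 0.3, 72.55, 96.49),
    (0.3, 0.4, 76.67, 80.727),
    (0.5, 0.5, 72.31, 65.01),
    (0.6, 0.6, 76.62, 77.29),
]


def best_utilization(rows):
    """Return the threshold pair with the highest utilization."""
    row = max(rows, key=lambda r: r[2])
    return row[0], row[1]


def best_makespan(rows):
    """Return the threshold pair with the lowest makespan."""
    row = min(rows, key=lambda r: r[3])
    return row[0], row[1]
```

On these rows the two searches return different pairs, (0.3, 0.4) for utilization and (0.5, 0.5) for makespan, which is the trade-off discussed for Dataset-I.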

4.6.4 Comparative Analysis

In Chapter 3 the performance of DARA was evaluated with the fuzzy threshold fixed at 0.5 for both Sth-III (sporadic) and Sth-II (semi-permanent). To compare the efficiency of LPM-DARA with that of DARA, the performance metrics at a threshold of 0.5 for both Sth-III and Sth-II have been taken from Tables 4.2 to 4.6 and are presented in Figure 4.6.

Figure 4.6 Performance metrics for five datasets at 0.5: (a) Resource utilization (in %), (b) Makespan (in sec) and (c) Failure rate (in %), each comparing DARA and LPM-DARA for Dataset-I to Dataset-V

Figure 4.6 shows that the utilization of the resources with LPM-DARA is considerably higher than with the DARA technique: at least 4% more of the resource time is utilized with prediction. For both resource allocation techniques, the utilization is highest for Dataset-I. Without prediction the utilization decreases by half, whereas with prediction it decreases by only 20%. This shows that scheduling with prediction achieves better utilization as well as a lower failure rate.

For further analysis, the performance metrics of the LPM-DARA technique are averaged individually for each value of Sth-III, for clarity. The averaging is done only for the sporadic resources because of the high uncertainty and dynamic nature of their dwelling time; it is therefore logical to analyze the sporadic resources to gain a better understanding of the scheduler. The averaged performance metrics of each category are presented in Figure 4.7 (a), (b) and (c).

Figure 4.7 Compilation of performance metrics for (a) Resource utilization in %, (b) Makespan in seconds and (c) Failure rate in % (each panel is plotted against Sth-III from 0.3 to 0.7 for Datasets I to V)

The utilization of the resources lies between 40% and 75% for all the datasets. The results show that an increase in the fuzzy threshold value affects the % utilization linearly. This may be because jobs are more likely to be allocated to sporadic resources when the threshold is fixed at 0.3. Increasing the threshold value creates a situation in which more jobs are allotted to semi-permanent resources, so job allocation to sporadic resources is limited and the % utilization falls because sporadic resources remain unutilized. The failure rates are consistent with the same conclusion. In general, datasets I, III and V produced lower utilization, which may be due to the CPU shortage. The makespan data show lower values for datasets I and II, while the others lie in the range of 130-180 seconds. This may be due to the mismatch between the distributions of resources and jobs in a dynamic environment.

4.7 SUMMARY

Linear prediction algorithms predict dwelling time from historical data without requiring detailed knowledge of the underlying hardware and the application. A set of past observations is kept for each machine and is used to make predictions for newly arriving resources. The prediction is used to assist the scheduler when allocating resources to jobs. Statistical algorithms are able to make better predictions as the number of past observations increases. Linear predictions have their drawbacks, as the accuracy of the prediction depends on how well the past observations reflect future incoming jobs.
