Hindawi Journal of Electrical and Computer Engineering, Volume 2019, Article ID 2782349, 14 pages, https://doi.org/10.1155/2019/2782349

Research Article

A Hybrid Method for Short-Term Host Utilization Prediction in Cloud Computing

Jing Chen 1,2 and Yinglong Wang 2

1 College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China
2 Shandong Provincial Key Laboratory of Computer Networks, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan 250014, China

Correspondence should be addressed to Yinglong Wang; wangylscsc@126.com

Received 24 June 2018; Accepted 17 February 2019; Published 14 March 2019

Academic Editor: Sos S. Agaian

Copyright © 2019 Jing Chen and Yinglong Wang. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Dynamic resource scheduling is a critical activity to guarantee quality of service (QoS) in cloud computing. One challenging problem is how to predict future host utilization in real time. By predicting future host utilization, a cloud data center can place virtual machines on suitable hosts or migrate virtual machines in advance from overloaded or underloaded hosts to guarantee QoS or save energy. However, it is very difficult to predict host utilization accurately and in a timely manner, because host utilization varies very quickly and exhibits strong instability with many bursts. Although machine learning methods can predict host utilization accurately, they usually take too much time to support rapid resource allocation and scheduling. In this paper, we propose a hybrid method, EEMD-RT-ARIMA, for short-term host utilization prediction based on ensemble empirical mode decomposition (EEMD), the runs test (RT), and the autoregressive integrated moving average (ARIMA) model. First, the EEMD method is used to decompose the nonstationary host utilization sequence into relatively stable intrinsic mode function (IMF) components and a residual component to improve prediction accuracy. Then, efficient IMF components are selected and reconstructed into three new components to reduce the prediction time and the error accumulation caused by too many IMF components. Finally, the overall prediction results are obtained by superposing the prediction results of the three new components, each of which is predicted by the ARIMA method. An experiment is conducted on real host utilization traces from a cloud platform. We compare our method with the ARIMA model and the EEMD-ARIMA method in terms of error, effectiveness, and time-cost analysis. The results show that our method is cost-effective and more suitable for short-term host utilization prediction in cloud computing.

1. Introduction

Cloud computing assembles a large number of computing, storage, and network resources into a data center. These resources are partitioned and allocated efficiently through virtualization technology to satisfy users' resource demands. In addition to rich resources, cloud computing also provides a pay-as-you-go model: users rent resources as they need them, which reduces their costs. These characteristics (rich resources, on-demand resource provisioning, and low costs) have led cloud computing to be widely applied in various domains. However, allocating and scheduling resources effectively to improve resource utilization and guarantee QoS remains a challenge.

The general process of resource allocation and scheduling in cloud computing is shown in Figure 1. When a new virtual machine (VM) request is initiated, the cloud data center selects a suitable physical host to allocate resources for this VM according to a specified resource allocation policy. Such a policy may maximize the resource utilization of each host to minimize the number of active hosts, or it may balance resource utilization across all active hosts. Whichever policy is used, it is important to know the future host utilization to select a suitable host. VM migration is also an effective method for resource scheduling. When the host utilization exceeds a predefined threshold, the performance of the VMs running on this host decreases, and the QoS of the applications running on these VMs can no longer be guaranteed. Therefore, some hotspot VMs must be migrated from the overloaded host to other hosts that are not overloaded. Similarly, if the host utilization is below a predefined threshold, all VMs on this host are migrated to other hosts, so that the host can be shut down to reduce energy consumption.

VM migration is a reactive method that cannot be initiated until the host is overloaded or underloaded. Therefore, it is very important to detect when a host is overloaded or underloaded. Most existing approaches monitor host utilization to determine its state: if the resource utilization exceeds a predefined threshold during an observation period, the host is declared overloaded; if it stays below a predefined threshold throughout an observation period, the host is declared underloaded. In practice, migrating VMs from an overloaded host to other hosts takes some time. If the host utilization changes faster than resources can be provisioned, users will suffer poor QoS until the resources become available. In addition, host underload detection based on a single host utilization value leads to unnecessary VM migrations and stability problems. These problems can be addressed via proactive methods that predict short-term host utilization and allocate resources in advance. For example, if the host utilization within the next 15 minutes is always above 80%, the host will be overloaded, so VMs should be migrated from it to other hosts in advance to ensure QoS. If the host utilization within the current hour and the next hour is always below 15%, the host is underloaded and should be shut down after VM migration to save energy. However, a large number of random resource demands and concurrent accesses to applications cause stochastic volatility in host utilization, which changes very fast and exhibits strong instability with many bursts. It is difficult to predict short-term host utilization accurately and in a timely manner from such data.
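To make the two thresholds in this example concrete, the following Python sketch labels a host from its recent and predicted CPU utilization. It is only an illustration of the proactive rule described above, not the paper's detection policy. The 5-minute sampling interval (so 15 minutes = 3 samples and 1 hour = 12 samples), the function name `classify_host`, and the requirement that a full window be available are assumptions.

```python
from typing import Sequence

# Assumption: one sample every 5 minutes, the interval of the traces in Section 5.1.
SAMPLES_PER_15_MIN = 3
SAMPLES_PER_HOUR = 12


def classify_host(recent: Sequence[float], predicted: Sequence[float],
                  overload_pct: float = 80.0,
                  underload_pct: float = 15.0) -> str:
    """Label a host (hypothetical helper) from CPU utilization in percent.

    recent:    observed utilization over the current hour, oldest first
    predicted: predicted utilization for the upcoming samples
    """
    next_15_min = predicted[:SAMPLES_PER_15_MIN]
    # Overloaded: every predicted value in the next 15 minutes exceeds the threshold.
    if len(next_15_min) == SAMPLES_PER_15_MIN and all(
            u > overload_pct for u in next_15_min):
        return "overloaded"   # migrate some VMs away in advance
    # Underloaded: the current hour and the next hour both stay below the threshold.
    window = list(recent[-SAMPLES_PER_HOUR:]) + list(predicted[:SAMPLES_PER_HOUR])
    if len(window) == 2 * SAMPLES_PER_HOUR and all(u < underload_pct for u in window):
        return "underloaded"  # migrate all VMs, then shut the host down
    return "normal"


print(classify_host([50.0] * 12, [85.0, 90.0, 95.0] + [50.0] * 9))  # overloaded
print(classify_host([10.0] * 12, [5.0] * 12))                        # underloaded
```

Requiring a complete window is a conservative choice: with too few predicted points the sketch falls back to "normal" rather than triggering a migration.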

Although some machine learning methods, such as neural networks (NN) [1, 2], support vector regression (SVR) [3], and backpropagation neural networks (BPNN) [4], achieve good prediction accuracy in cloud computing, they require too much time to train a model to allocate resources rapidly. Linear regression (LR) can make predictions more quickly than ARIMA, but it requires the training data to exhibit simpler behavior. ARIMA is a prediction model for nonstationary time series, but it is not suitable when the data contain a large amount of random variation.

In our previous work [5], we proposed a resource demand prediction method, EEMD-ARIMA, that combines the EEMD method and the ARIMA model to predict future resource demands. This method first uses the EEMD method to decompose the original resource demand sequence into multiple IMF components and a residual (R) component. Next, it forecasts the future values of each component with the ARIMA model. Finally, the overall forecasting results are obtained by superposing the forecasting results of all components. Although this method alleviates the random variation of resource demands and improves prediction accuracy by combining the EEMD and ARIMA methods, two problems arise. The first is prediction error accumulation caused by superposing the ARIMA predictions of all components: the ARIMA prediction of each component decomposed by the EEMD method carries some prediction error, and superposing the predictions of all components accumulates these errors. The second is the high time cost of EEMD decomposition and of ARIMA prediction for too many decomposed components: the ARIMA prediction of each component takes some time, so the total time of ARIMA prediction over multiple components is much greater than that of a single ARIMA prediction of the original sequence.

To solve these problems of the EEMD-ARIMA method, this paper proposes a hybrid method, EEMD-RT-ARIMA, for short-term host utilization prediction that not only further improves prediction accuracy by combining the EEMD method with the ARIMA model but also reduces the time cost by selecting and reconstructing efficient IMF components. We compare and evaluate our EEMD-RT-ARIMA method, the ARIMA model, and the EEMD-ARIMA method in terms of error, effectiveness, and time-cost analysis.

2. Related Works

Many studies have been conducted on various predictions in cloud computing. From the perspective of research objectives, researchers have studied server load prediction [6–10], VM load prediction [11, 12], VM utilization prediction [13, 14], host utilization prediction [15], web application workload prediction [16], cloud service workload prediction [17–19], workflow workload prediction [20], service quality prediction [21], and workload characterization [22–24]. Toumi et al. [6] described server load according to the types and submission rate of submitted tasks and applied a stream mining technique to predict server loads. Jheng et al. [11] proposed a VM workload prediction method based on the gray forecasting model, which determines the VMs to migrate according to power savings and workload balance. Dabbagh et al. [13] proposed a prediction approach that uses Wiener filters to predict the future resource utilization of VMs. Mason et al. [15] predicted host CPU utilization over a short horizon using evolutionary neural networks, which showed high prediction accuracy and a high degree of generality. In this paper, we focus on host utilization prediction using the EEMD and ARIMA methods to not only improve prediction accuracy but also reduce prediction time as much as possible.

[Figure 1: Resource allocation and scheduling process (diagram: resource allocation of a new VM5 and virtual machine migration among physical hosts 1, 2, and m (active) and physical host n (closed)).]

From the perspective of approaches, prediction methods are usually divided into two categories. One category is based on machine learning. Tseng et al. [25] proposed a prediction method for the CPU and memory utilization of VMs and physical machines based on a genetic algorithm (GA), which outperforms the gray model in prediction accuracy under both stable and unstable tendencies. Shyam and Manvi [26] proposed a short- and long-term prediction model of virtual resource requirements for CPU/memory-intensive applications based on Bayesian networks, in which the relationships and dependencies between variables are identified to facilitate resource prediction. Lu et al. [27] proposed a workload prediction model, RVLBPNN (Rand Variable Learning Rate Backpropagation Neural Network), based on the BPNN algorithm, which achieves higher prediction accuracy than the hidden Markov model and the naive Bayes classifier. This method not only predicts CPU-intensive and memory-intensive workloads but also improves prediction accuracy by using the intrinsic relations among the arriving cloud workloads. Rajaram and Malarvizhi [28] compared the prediction accuracies of several machine learning methods, such as LR, SVR, and the multilayer perceptron. Li and Zhang [29] proposed an optimal combination prediction method for resource demands, which combines the induced ordered weighted geometric averaging operator and the generalized Dice coefficient with an improved Elman neural network and the gray model to enhance prediction accuracy. Minarolli and Freisleben [30] presented a cross-correlation prediction approach based on the support vector machine (SVM), which considers the cross-correlation of VMs running the same application to improve prediction accuracy. Zhang et al. [31] proposed a deep belief network- (DBN-) based prediction approach for cloud resource requests, in which orthogonal experimental design and analysis of variance are used to enhance prediction accuracy. Compared with the ARIMA model, this method reduces the mean square error (MSE) by over 60% for CPU and RAM request predictions.

Although machine learning methods are effective in improving prediction accuracy, they are complex and usually demand a large amount of data to extract features and train a model. The resulting prediction time is too long to guarantee the QoS of running applications. Cloud computing requires a simple and rapid host utilization prediction method to support resource allocation and scheduling.

The other category is based on statistical methods, such as Brown's quadratic exponential smoothing method [32], the autoregressive integrated moving average (ARIMA) model [33–35], and kernel canonical correlation analysis [36]. Tran et al. [37] applied the ARIMA model to long-term prediction of server workload, whereas our method aims to predict short-term host utilization. Short-term prediction is more difficult because host utilization can be extremely random and nonstationary over a short time. Calheiros et al. [33] proposed a short-term prediction model of cloud workload using the ARIMA model and evaluated its prediction accuracy and its impact on the QoS of user applications. They suggested that user behavior must be considered to reflect real conditions in workload simulation. Our method combines the ARIMA model with the EEMD and RT methods to improve prediction accuracy and reduce prediction time as much as possible. It is compared with the EEMD-ARIMA and ARIMA methods in terms of error, effectiveness, and time-cost analysis.

Moreover, some studies combine the ARIMA model with other techniques to improve prediction accuracy. Xu et al. [38] constructed a model, GFSS-ANFISSARIMA, combining the seasonal ARIMA model with generalized fuzzy soft sets and an adaptive neuro-fuzzy inference system; this model improves the prediction accuracy of resource demands. Li et al. [39] proposed a workload predictor combining ARIMA with dynamic error compensation to reduce the service-level agreement (SLA) default rate. Fu and Zhou [40] proposed a predicted affinity model for VM placement, which uses the resource demands predicted by the ARIMA model to calculate a VM-host affinity value. Jiang et al. [41] presented a self-adaptive ensemble prediction method for cloud resource demands, which uses a two-level ensemble method to predict VM demands based on a historical time series. This method not only combines multiple prediction methods (moving average (MA), autoregressive (AR), artificial neural network (ANN), gene expression programming (GEP), and SVM) but also adaptively adjusts the weight of each method according to the relative errors to obtain the best average performance. In contrast, our method uses the EEMD method to deal with nonstationary host utilization and then selects and reconstructs efficient components to improve prediction accuracy and reduce the time cost. EEMD, proposed by Wu and Huang [42], is an effective noise-assisted method that can handle nonlinear and nonstationary time series. It has been widely used in wind speed forecasting [43, 44], aircraft auxiliary power unit (APU) degradation prediction [45], turbine fault trend prediction [46], and rolling bearing fault diagnosis [47], and it has proven effective in enhancing prediction accuracy. Our method also uses EEMD to decompose nonstationary host utilization to improve prediction accuracy, and it further uses correlation coefficients, RT values, and average periods to select and reconstruct efficient components, reducing prediction error accumulation and prediction time.

3. Background

3.1. Empirical Mode Decomposition (EMD). EMD is a signal processing method that decomposes a signal into multiple IMFs and a residual (R) trend item [48]. An IMF must satisfy two conditions:

(1) The numbers of extrema and zero-crossings must either be equal or differ by at most one.

(2) The mean value of the envelopes of the local maxima and the local minima must be zero at any point.

EMD includes the following steps:

Step 1. Let $f_1(t) = x(t)$, where $x(t)$ is the original data.

Step 2. Find all local maxima and minima of $f_i(t)$, where $i$ is the iteration counter with initial value 1. Interpolate between the local maxima and between the local minima to obtain an upper envelope and a lower envelope, and then compute the mean $m_i(t)$ of these envelopes.

Step 3. Compute the new component $h_i(t) = f_i(t) - m_i(t)$.

Step 4. Check whether $h_i(t)$ satisfies the two IMF conditions above. If it does not, let $f_{i+1}(t) = h_i(t)$ and repeat Steps 2 and 3. If it does, $h_i(t)$ is taken as the first IMF component, $p_1(t) = h_i(t)$. Then compute the R component as $r_1(t) = x(t) - p_1(t)$.

Step 5. Repeat Steps 1–4 with $r_1(t)$ as the new data until the R is a monotonic function. Thus, $x(t)$ is decomposed into $n$ IMFs and an R as follows:

$$x(t) = \sum_{i=1}^{n} p_i(t) + r(t) \qquad (1)$$
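The sifting procedure in Steps 1–5 can be sketched in Python with NumPy as follows. This is a simplified illustration, not the authors' implementation. The assumptions are: envelopes use piecewise-linear interpolation (`np.interp`) rather than the cubic splines commonly used in EMD; the IMF test of Step 4 is replaced by a stopping rule on the energy of the envelope mean; and a residual with fewer than two maxima or two minima is treated as monotonic. The names `emd`, `max_imfs`, `max_sift`, and `tol` are hypothetical.

```python
import numpy as np


def _extrema(f):
    """Indices of interior local maxima and minima of a 1-D array."""
    d = np.diff(f)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima


def _envelope_mean(f):
    """Step 2: mean of the upper and lower envelopes, or None if f has too few extrema."""
    maxima, minima = _extrema(f)
    if len(maxima) < 2 or len(minima) < 2:
        return None
    t = np.arange(len(f))
    # Simplification: piecewise-linear envelopes instead of cubic splines.
    upper = np.interp(t, maxima, f[maxima])
    lower = np.interp(t, minima, f[minima])
    return (upper + lower) / 2.0


def emd(x, max_imfs=10, max_sift=50, tol=0.05):
    """Split x into IMFs p_1..p_n and a residual r such that x = sum(p_i) + r (Eq. (1))."""
    residual = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        if _envelope_mean(residual) is None:  # Step 5: treat residual as monotonic, stop
            break
        f = residual.copy()                   # Step 1: start sifting the current data
        for _ in range(max_sift):
            m = _envelope_mean(f)             # Step 2: envelope mean m_i(t)
            if m is None:
                break
            f = f - m                         # Step 3: h_i = f_i - m_i, reused as f_{i+1}
            # Simplified Step 4: accept h as an IMF once m carries little energy.
            if np.sum(m ** 2) <= tol * np.sum(f ** 2):
                break
        imfs.append(f)                        # next IMF p_k(t)
        residual = residual - f               # r_k(t) = r_{k-1}(t) - p_k(t)
    return imfs, residual


t = np.linspace(0, 1, 500)
x = np.sin(2 * np.pi * 40 * t) + 0.5 * np.sin(2 * np.pi * 5 * t) + 2 * t
imfs, r = emd(x)
print(len(imfs), np.allclose(sum(imfs) + r, x))
```

Because each IMF is subtracted from the running residual, the IMFs and the final residual always sum back to the input, which is exactly the decomposition in Eq. (1); the quality of the individual IMFs depends on the simplified envelope and stopping choices.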

3.2. Ensemble Empirical Mode Decomposition (EEMD). The EMD method has a noticeable drawback, mode mixing, which is often caused by signal intermittency. Wu and Huang proposed a new method named ensemble empirical mode decomposition (EEMD) to solve this problem. Unlike the EMD method, the EEMD method executes the decomposition process $k$ times. Each time, it adds a different white noise series to the signal and then decomposes the new signal. Generally, the number of trials $k$ is set to an integer in the range [50, 100], and the standard deviation $d$ of the white noise is set to a value in the range [0.1, 0.2]. This yields $k$ groups of decomposition results. Each group includes $n$ IMFs $p_{mi}(t)$ $(i = 1, \ldots, n)$ and an R $r_m(t)$, where $m$ denotes the group number. Finally, the mean values of these groups of IMFs and Rs are calculated as the final IMFs $p_i(t)$ $(i = 1, \ldots, n)$ and the final R $r(t)$:

$$p_i(t) = \frac{1}{k} \sum_{m=1}^{k} p_{mi}(t), \qquad r(t) = \frac{1}{k} \sum_{m=1}^{k} r_m(t) \qquad (2)$$
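The ensemble averaging in Eq. (2) can be sketched in Python as a wrapper around any EMD routine. This is an illustrative sketch, not the authors' code. The following are assumptions: the `decompose` callable (returning a list of IMFs and a residual); the noise standard deviation taken as `d` times the standard deviation of the input, a common EEMD convention; and the alignment rule that folds surplus IMFs of a trial into its residual and treats missing IMFs as zero. The names `eemd` and `toy_decompose` are hypothetical, and `toy_decompose` exists only to exercise the plumbing.

```python
import numpy as np


def eemd(x, decompose, n_imfs, k=50, d=0.2, seed=None):
    """Ensemble EMD: average the IMFs and residuals of k noisy decompositions (Eq. (2)).

    decompose: any EMD routine mapping a 1-D array to (list_of_imfs, residual).
    d: noise standard deviation as a fraction of std(x) (an assumption).
    """
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    imf_sum = np.zeros((n_imfs, len(x)))
    res_sum = np.zeros(len(x))
    for _ in range(k):
        noisy = x + rng.normal(0.0, d * np.std(x), size=len(x))  # x(t) + white noise
        imfs, residual = decompose(noisy)
        imfs = [np.asarray(p, dtype=float) for p in imfs]
        residual = np.asarray(residual, dtype=float)
        # Align each trial to n_imfs IMFs: surplus IMFs go into the residual,
        # missing IMFs contribute zero (an assumption, not part of Eq. (2)).
        for extra in imfs[n_imfs:]:
            residual = residual + extra
        for i, p in enumerate(imfs[:n_imfs]):
            imf_sum[i] += p
        res_sum += residual
    return imf_sum / k, res_sum / k  # p_i(t) and r(t)


def toy_decompose(s):
    """Hypothetical one-IMF 'decomposer' (mean removal), only for illustration."""
    trend = np.full_like(s, s.mean())
    return [s - trend], trend


x = np.sin(np.linspace(0, 6 * np.pi, 120)) + 20.0
p, r = eemd(x, toy_decompose, n_imfs=1, k=50, d=0.2, seed=0)
print(p.shape, r.shape)
```

With a real EMD routine plugged in, each trial still sums to its noisy input, so the averaged IMFs and residual reproduce the original series up to the averaged noise, which shrinks as k grows.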

The IMF components have three main characteristics:

(1) Completeness: the sum of all IMFs and the R recovers the original data, as in Eq. (1).

(2) Orthogonality: each IMF has a certain physical meaning and is independent of the other IMFs; mathematically, the inner product of any two IMFs is (approximately) zero.

(3) Adaptability: IMFs with higher frequencies are extracted from the original data earlier than those with lower frequencies, and the frequencies of the IMFs reflect the features of the original data.

3.3. Runs Test (RT). The RT is a nonparametric test that checks the randomness of a sequence containing only two symbols or values, such as + and −, or 0 and 1. A run is defined as a maximal subsequence of identical successive symbols (0 or 1). For example, the data sequence "11110000011111000110010" includes 8 runs, 4 of which consist of successive "1"s and the other 4 of successive "0"s. The RT can also be applied to a time series.

Assume that $M = \langle m_1(t), \ldots, m_i(t), \ldots, m_n(t) \rangle$ denotes a time series, where $m_i(t)$ is an element of this time series and $n$ is the total number of elements. The mean value of these elements is calculated as follows:

$$\bar{M} = \frac{1}{n} \sum_{i=1}^{n} m_i(t) \qquad (3)$$

Then each element of this time series can be encoded as follows:

$$G_i = \begin{cases} 1, & m_i(t) \ge \bar{M} \\ 0, & m_i(t) < \bar{M} \end{cases} \qquad (4)$$

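Thus, the time series is transformed into a sequence of 0s and 1s, whose elements are independent and identically distributed under the null hypothesis of randomness. The total number of runs in this sequence, referred to as the RT value, reflects the fluctuation of the sequence: the more runs there are, the more often the series crosses its mean.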
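Eqs. (3) and (4) and the run count can be sketched in a few lines of Python. The function names are hypothetical, and the sketch only computes the RT value used later for reconstruction; it does not compute the test statistic or p-value of a formal runs test.

```python
from itertools import groupby
from typing import List, Sequence


def binarize(series: Sequence[float]) -> List[int]:
    """Eqs. (3)-(4): 1 if an element is >= the series mean, else 0."""
    mean = sum(series) / len(series)
    return [1 if v >= mean else 0 for v in series]


def count_runs(bits: Sequence[int]) -> int:
    """Number of runs, i.e., maximal blocks of identical successive symbols."""
    return sum(1 for _ in groupby(bits))


def rt_value(series: Sequence[float]) -> int:
    """RT value of a numeric series: runs in its 0/1 encoding around the mean."""
    return count_runs(binarize(series))


example = [int(c) for c in "11110000011111000110010"]
print(count_runs(example))        # 8, as in the example above
print(rt_value([1, 3, 2, 5, 4]))  # mean 3 -> 0,1,0,1,1 -> 4 runs
```

`itertools.groupby` groups consecutive equal items, so counting its groups counts runs directly.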

4. A Hybrid Method for Short-Term Host Utilization Prediction

To improve the prediction accuracy and reduce the prediction time of the EEMD-ARIMA method, we propose a hybrid method, EEMD-RT-ARIMA, for short-term host utilization prediction, as shown in Figure 2. First, the host utilization sequence is decomposed into multiple IMF components and the R component using the EEMD method. Next, we calculate the correlation coefficients between the IMF components and the original data sequence to select the efficient IMF components. Then, we use RT values and average periods to reconstruct these efficient IMF components and the R component into three new components: a high-frequency and strong-volatility component, a medium-frequency and weak-volatility component, and a low-frequency trend component. We then use the ARIMA method to predict each of the three new components. Finally, the overall prediction results are obtained by summing the prediction results of the three new components.

The key to our EEMD-RT-ARIMA method is selecting and reconstructing efficient components. Compared with the EEMD-ARIMA method, it reduces the number of components involved in ARIMA prediction. Thus, the EEMD-RT-ARIMA method can reduce both the prediction error accumulation and the total prediction time. As their implementation processes show, both the EEMD-RT-ARIMA method and the EEMD-ARIMA method take longer to predict than the ARIMA model. However, our EEMD-RT-ARIMA method focuses on cost-effectiveness, trading off prediction accuracy, effectiveness, and time cost.

4.1. Use of EEMD to Decompose the Host Utilization Sequence. Host utilization sequences are classified into different categories according to the resource type (CPU, memory, and disk), for example, the CPU utilization sequence $T_{cpu} = \{c_1, \ldots, c_i, \ldots, c_n\}$. The CPU utilization sequence is usually random and unstable owing to random and sudden resource demands in cloud computing, so it is necessary to transform such data into relatively stationary data to improve prediction accuracy. The EEMD method appears to be more effective in processing nonlinear and nonstationary data sequences than other decomposition algorithms. Therefore, we use the EEMD method to decompose the host utilization sequence into a series of IMF$_i$ components and the R component.

As a running example, consider the nonstationary CPU utilization trace of a physical host from our cloud platform. We divide it into a training set (673 data points) and a testing set (24 data points), as shown in Figure 3. Then we use the EEMD method to decompose the training set and obtain the components IMF1–IMF8 and the R component, which are shown from high frequency to low frequency in Figure 4.

4.2. Calculation of the Correlation Coefficients to Select Efficient IMF Components. A correlation coefficient measures the correlation between two sequences. We calculate the correlation coefficient $P_j(X, Y)$ between the IMF$_j$ component $X$ and the original training set $Y$ using the following formula, where $\mathrm{Cov}(X, Y)$ is the covariance between the sequences $X$ and $Y$, and $\mathrm{Var}(X)$ and $\mathrm{Var}(Y)$ are the variances of $X$ and $Y$:

$$P_j(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)}\,\sqrt{\mathrm{Var}(Y)}} \qquad (5)$$

Then we check whether the correlation coefficient $P_j(X, Y)$ is negative. If it is negative, the IMF$_j$ component is regarded as inefficient and dropped; otherwise, it is regarded as efficient and retained.

We calculate the correlation coefficient between each IMF component and the original training set. Only IMF6 and IMF7 have negative correlation coefficients (−0.08 and −0.15, respectively), so they are dropped. IMF1–IMF5 and IMF8 are selected as efficient IMF components.
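The selection rule based on Eq. (5) can be sketched with NumPy's `np.corrcoef`, which returns the Pearson correlation matrix; its off-diagonal entry is $\mathrm{Cov}(X, Y)/(\sqrt{\mathrm{Var}(X)}\sqrt{\mathrm{Var}(Y)})$, and the ratio does not depend on whether sample or population variances are used. The function name `select_efficient_imfs` is hypothetical. In this sketch, a constant IMF (undefined correlation, NaN) is simply dropped, which is an assumption.

```python
import numpy as np


def select_efficient_imfs(imfs, training_set):
    """Keep IMFs whose correlation (Eq. (5)) with the training set is non-negative.

    Returns the kept indices (0-based) and the correlation of every IMF.
    """
    y = np.asarray(training_set, dtype=float)
    corrs = []
    for imf in imfs:
        x = np.asarray(imf, dtype=float)
        # np.corrcoef gives the 2x2 correlation matrix; entry [0, 1] is P_j(X, Y).
        corrs.append(float(np.corrcoef(x, y)[0, 1]))
    # Negative (or NaN) correlation -> inefficient, dropped.
    kept = [j for j, c in enumerate(corrs) if c >= 0]
    return kept, corrs


y = np.array([1.0, 2.0, 4.0, 3.0, 5.0])
kept, corrs = select_efficient_imfs([2 * y, -y, y - y.mean()], y)
print(kept)  # [0, 2]
```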

4.3. Reconstruction of Efficient IMFs and R into New Components. Each IMF component reflects a certain physical feature of the original data. If some IMF components are close in frequency and amplitude fluctuation, they have similar features and can therefore be reconstructed into a new component that carries these typical features. Thus, the prediction error accumulation and the prediction time of the EEMD-ARIMA method can be reduced by reducing the number of components.

[Figure 2: EEMD-RT-ARIMA method (flowchart: use EEMD to decompose the host utilization sequence into IMF1, IMF2, ..., IMFh−1 and R; use correlation coefficients to select efficient IMFs IMF1, ..., IMFm; use RT values and average periods to reconstruct the efficient IMFs and R into three new components, namely the high-frequency and strong-volatility component, the medium-frequency and weak-volatility component, and the low-frequency trend component; use ARIMA to predict the result of each component; construct the overall prediction result by summing the prediction results of the three new components).]

The average period reflects the frequency of host utilization variation; the two are reciprocally related, so the smaller the average period, the higher the frequency. If the average periods of IMF components are close, their frequencies are close. The average period $T_j$ of the IMF$_j$ component is calculated by the following formula, where $n$ is the number of data points in the training set and $l_j$ is the number of extrema of IMF$_j$:

$$T_j = \frac{n}{l_j} \qquad (6)$$

Similarly, the RT value reflects the trend of amplitude fluctuation: the larger the RT value, the stronger the amplitude volatility. If the RT values of two IMFs are close, the two IMFs have similar overall amplitude volatility.

To enhance the prediction accuracy and reduce the prediction time of the EEMD-ARIMA method, the EEMD-RT-ARIMA method reconstructs the IMF components and the R component into three new components according to their average periods and RT values. Because the average period and the RT value have different units, we normalize the average period $T_j$ as follows:

$$T_j^{n} = \frac{T_j - T_{\min}}{T_{\max} - T_{\min}} \qquad (7)$$

where $T_j^{n}$ denotes the normalized average period of the IMF$_j$ component, and $T_{\max}$ and $T_{\min}$ are the maximum and minimum of the average periods of all IMF components. Similarly, the RT value $R_j$ is normalized as follows:

$$R_j^{n} = \frac{R_j - R_{\min}}{R_{\max} - R_{\min}} \qquad (8)$$

where $R_j^{n}$ is the normalized value of $R_j$, and $R_{\max}$ and $R_{\min}$ are the maximum and minimum of all RT values. The reconstruction factor (RF) is then defined as follows, where $\alpha$ and $\beta$ are weighting coefficients:

$$F_j = \alpha \cdot \left(1 - T_j^{n}\right) + \beta \cdot R_j^{n} \qquad (9)$$

The higher the frequency and the stronger the volatility of an IMF component, the greater its RF value. If the RF values of two IMF components are close, their overall trends are similar, so they can be reconstructed into a new component. All efficient IMF and R components are reconstructed into three new components: a high-frequency and strong-volatility component, a medium-frequency and weak-volatility component, and a low-frequency trend component. The high-frequency and strong-volatility component reflects the strong volatility and randomness of the high-frequency part of the original host utilization sequence. The medium-frequency and weak-volatility component shows the detailed features of the volatility of the original host utilization sequence. The low-frequency trend component depicts the overall trend of the volatility of the original host utilization sequence.
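Eqs. (6)–(9) can be sketched in plain Python as below. The function names are hypothetical. The default weights $\alpha = \beta = 0.5$ are placeholders, not values taken from the paper. Mapping all-equal inputs to 0 in the normalization is an assumption. The sketch also assumes every component has at least one extremum, because Eq. (6) is undefined when $l_j = 0$. It does not automate the grouping step, which the paper performs by inspecting which RF values are close (Table 1).

```python
from typing import List, Sequence


def min_max(values: Sequence[float]) -> List[float]:
    """Eqs. (7)-(8): scale values to [0, 1]; all-equal inputs map to 0 (an assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def average_period(n_points: int, n_extrema: int) -> float:
    """Eq. (6): T_j = n / l_j (requires at least one extremum)."""
    return n_points / n_extrema


def reconstruction_factors(periods: Sequence[float], rt_values: Sequence[float],
                           alpha: float = 0.5, beta: float = 0.5) -> List[float]:
    """Eq. (9): F_j = alpha * (1 - normalized T_j) + beta * normalized R_j.

    A short period (high frequency) and a large RT value (strong volatility)
    both raise F_j. The default weights are placeholders.
    """
    tn = min_max(periods)
    rn = min_max(rt_values)
    return [alpha * (1.0 - t) + beta * r for t, r in zip(tn, rn)]


# Hypothetical components: number of extrema and RT value of each.
n = 120
extrema = [60, 30, 12]
rts = [50, 20, 5]
periods = [average_period(n, l) for l in extrema]  # [2.0, 4.0, 10.0]
print(reconstruction_factors(periods, rts))         # approximately [1.0, 0.542, 0.0]
```

Because both quantities are min-max normalized, the component with the shortest period and the most runs always gets $F_j = \alpha + \beta$, and the one with the longest period and fewest runs gets 0, which matches the observation that IMF8 and R both have an RF of 0.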

Table 1 shows the RT values, average periods, and RF values of the efficient IMF and R components. The RF values of IMF1 and IMF2 are large and relatively close, the RF values of IMF3–IMF5 are close to one another, and the RF values of IMF8 and R are both equal to 0. Therefore, we reconstruct IMF1–IMF2, IMF3–IMF5, and IMF8–R into three new components, as shown in Figures 5(a)–5(c). They reflect the randomness, the fluctuation details, and the overall trend of the original host utilization sequence, respectively.

4.4. Use of the ARIMA Model to Predict the Future Host Utilization. We use the ARIMA model to predict the future values of each new component. The overall prediction results are then obtained by superposing the prediction results of all new components. The ARIMA prediction is described in Algorithm 1.

For example, assume that three new components $C_h$, $C_m$, and $C_l$ are obtained, representing the high-frequency and strong-volatility component, the medium-frequency and weak-volatility component, and the low-frequency trend component, respectively. We then use the ARIMA method to predict the next 24 values of each new component. The prediction results $P_h$, $P_m$, and $P_l$ of the three new components, each containing the 24 predicted values, can be written as follows:

$$ P_h = \left(f_{1h}, f_{2h}, \ldots, f_{24h}\right), \quad P_m = \left(f_{1m}, f_{2m}, \ldots, f_{24m}\right), \quad P_l = \left(f_{1l}, f_{2l}, \ldots, f_{24l}\right) \quad (10) $$

Finally, we calculate the overall prediction result $P$ by superposing (adding element by element) the prediction results of the new components:

$$ P = P_h + P_m + P_l \quad (11) $$

As this process shows, the EEMD-RT-ARIMA method reduces the number of components to be predicted from 9 to 3, which reduces both the total prediction time and the error accumulated over the component predictions compared with the EEMD-ARIMA method.
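The component-wise forecasting and superposition of Equations (10) and (11) can be sketched in Python as below. This is a simplified stand-in, not the authors' MATLAB implementation of Algorithm 1: the differencing order d is passed in by the caller instead of being chosen with an ADF test, and each component is fitted with a pure autoregressive AR(p) model by ordinary least squares (no moving-average terms, no maximum-likelihood estimation), with p selected by BIC. All function names are hypothetical.

```python
import numpy as np


def fit_ar_bic(y, max_p=5):
    """Fit AR(p) with intercept by least squares for p = 1..max_p; keep the lowest BIC."""
    n = len(y)
    if n <= 2 * max_p + 1:
        raise ValueError("series too short for the requested max_p")
    target = y[max_p:]  # same sample for every p, so BIC values are comparable
    best = None
    for p in range(1, max_p + 1):
        # Columns: intercept, then lags 1..p aligned with the target values
        X = np.column_stack([np.ones(len(target))] +
                            [y[max_p - k - 1:n - k - 1] for k in range(p)])
        coef, *_ = np.linalg.lstsq(X, target, rcond=None)
        resid = target - X @ coef
        sigma2 = max(float(np.mean(resid ** 2)), 1e-12)  # floor avoids log(0)
        bic = len(target) * np.log(sigma2) + (p + 1) * np.log(len(target))
        if best is None or bic < best[0]:
            best = (bic, p, coef)
    return best[1], best[2]


def forecast_component(series, steps, d=0, max_p=5):
    """Difference d times, forecast with a BIC-selected AR(p), then undo the differencing."""
    y = np.asarray(series, dtype=float)
    last_levels = []
    for _ in range(d):
        last_levels.append(y[-1])  # needed to integrate the forecast back
        y = np.diff(y)
    p, coef = fit_ar_bic(y, max_p)
    history = list(y)
    preds = []
    for _ in range(steps):
        lags = history[-1:-p - 1:-1]  # lag 1 first, then lag 2, ...
        nxt = coef[0] + float(np.dot(coef[1:], lags))
        preds.append(nxt)
        history.append(nxt)
    out = np.array(preds)
    for level in reversed(last_levels):
        out = level + np.cumsum(out)
    return out


def superpose_forecasts(components, steps, d_orders, max_p=5):
    """Eq. (11): the overall forecast is the element-wise sum of component forecasts."""
    return sum(forecast_component(c, steps, d, max_p)
               for c, d in zip(components, d_orders))


# Toy components standing in for C_h and C_l (not real host data)
t = np.arange(120)
oscillation = (-1.0) ** t   # fast alternation, forecast with d = 0
trend = 10 + 0.5 * t        # linear trend, forecast with d = 1
print(np.round(superpose_forecasts([oscillation, trend], steps=6, d_orders=[0, 1]), 3))
# approximately 71, 69.5, 72, 70.5, 73, 71.5
```

Forecasting each component separately and summing keeps the combination step trivial; the cost is that each component's forecast error adds up, which is why reducing 9 components to 3 matters. A fuller implementation could use statsmodels (`adfuller` for step 3 and `ARIMA` for steps 5–7 of Algorithm 1).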

Figure 3: CPU utilization trace of a physical host (x-axis: sample data; y-axis: CPU utilization (%); the trace is divided into a training set and a testing set).

6 Journal of Electrical and Computer Engineering

5 Experimental Setup

We conduct an experiment to evaluate our method. The experimental dataset and measurement metrics are introduced as follows.

5.1 Experimental Dataset. Host utilization mainly involves CPU utilization, memory utilization, network utilization, and disk utilization. In this paper, we focus on host CPU utilization. We randomly select the CPU utilization traces of 7 physical hosts from the dataset released by Alibaba in August 2017 [49], each of which includes 144 points (5 minutes per point). These traces are all time-dependent sequences, as shown in Figures 6(a)–6(g).

Each sequence is divided into a training set and a testing set. We first use the training set to predict future data and then compare the predicted data with the actual data in the testing set to evaluate our method. In this paper, each training set consists of the first 120 points, and the testing set consists of the subsequent 6, 12, or 24 points. In the EEMD decomposition, we set the number of iterations $k = 50$ and the standard deviation of the added white noise $d = 0.2$.
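The evaluation split described above can be sketched in Python as follows. The 144-point length, the 120-point training prefix, and the 6-, 12-, and 24-point horizons come from the text; the function name `split_trace` and the synthetic trace are illustrative assumptions, and the EEMD step ($k = 50$, $d = 0.2$) is not reproduced here.

```python
def split_trace(trace, n_train=120, horizons=(6, 12, 24)):
    """Split a trace into a training prefix and one test window per horizon."""
    if len(trace) < n_train + max(horizons):
        raise ValueError("trace too short for the requested horizons")
    train = list(trace[:n_train])
    tests = {h: list(trace[n_train:n_train + h]) for h in horizons}
    return train, tests


# A synthetic 144-point trace stands in for one host's 5-minute samples.
trace = [10 + (i % 12) * 0.5 for i in range(144)]
train, tests = split_trace(trace)
print(len(train), {h: len(w) for h, w in tests.items()})
# -> 120 {6: 6, 12: 12, 24: 24}
```

All three test windows start at the same point, so the 6-point window is a prefix of the 12-point window, which is a prefix of the 24-point window.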

5.2 Measurement Metrics. We evaluate our method in terms of error, effectiveness, and time cost, as follows.

Figure 4: Decomposition results of EEMD: (a)–(h) IMF1–IMF8 and (i) the residual component R (x-axis: 0–700 sample points; y-axis: CPU utilization (%)).

Table 1: RT values, average periods, and RF values of the efficient IMF and R components
Component | IMF1 | IMF2 | IMF3 | IMF4 | IMF5 | IMF8 | R
RT value | 431 | 181 | 96 | 47 | 21 | 5 | 2
Average period | 2.12 | 8.31 | 16.41 | 30.59 | 67.3 | 673 | 673
RF | 1 | 0.70 | 0.60 | 0.53 | 0.47 | 0 | 0

5.2.1 Error Analysis. To evaluate our method, we use the mean absolute percentage error (MAPE) to reflect prediction accuracy. MAPE is defined as follows:

$$ \mathrm{MAPE} = \left( \frac{1}{m} \sum_{i=1}^{m} \left| \frac{x_i^{f} - x_i^{t}}{x_i^{t}} \right| \right) \times 100\% \quad (12) $$

where $x_i^{f}$ denotes the value of the $i$th prediction point, $x_i^{t}$ denotes the corresponding actual value in the testing set, and $m$ denotes the number of prediction points. The lower the MAPE, the higher the prediction accuracy.
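Equation (12) can be computed with a few lines of Python. This sketch assumes the absolute value shown in the reconstructed formula (the standard MAPE definition) and adds a guard for zero actual values, where MAPE is undefined; the function name is illustrative.

```python
def mape(predicted, actual):
    """Eq. (12): mean absolute percentage error, in percent."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs(f - a) / a for f, a in zip(predicted, actual))
    return 100.0 * total / len(actual)


# Three predicted CPU utilization values against three observed values
print(round(mape([11.0, 9.0, 10.5], [10.0, 10.0, 10.0]), 2))
# -> 8.33
```

Because each error is divided by the actual value, the same absolute miss costs more on a lightly loaded host than on a busy one.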

5.2.2 Effectiveness Analysis. Underprediction or overprediction of host utilization can lead to resource underprovisioning or overprovisioning. Resource underprovisioning cannot guarantee applications' QoS, while resource overprovisioning causes resource waste and low resource utilization. Therefore, a good prediction method should avoid both underprediction and overprediction. In particular, underprediction should be avoided as much as possible because it degrades the QoS delivered to users.

We define positive and negative errors to reflect overprediction and underprediction, respectively, and use them to evaluate the effectiveness of our method. A good prediction method should have a small negative error to avoid underprediction. The positive and negative prediction errors are calculated by the following formula, where $p_i$ is the predicted value, $r_i$ is the actual value, and $m$ is the number of overpredicted points (i.e., positive deviations) for the positive error or the number of underpredicted points (i.e., negative deviations) for the negative error; the sum runs over these $m$ points:

$$ e = \frac{1}{m} \sum_{i=1}^{m} \left| p_i - r_i \right| \quad (13) $$
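A minimal Python sketch of Equation (13), evaluated once over the overpredicted points and once over the underpredicted points, is shown below. It follows the formula literally, in the units of the data; this section does not say whether the Table 4 values are further scaled. Exact hits ($p_i = r_i$) are counted in neither group, which is an assumption, and `None` plays the role of the "Null" cells in Table 4. The function name is hypothetical.

```python
def positive_negative_errors(predicted, actual):
    """Eq. (13) applied separately to over- and underpredicted points.

    Returns (positive_error, negative_error); a value is None when there are
    no points of that kind, mirroring the "Null" cells in Table 4.
    """
    over = [abs(p - r) for p, r in zip(predicted, actual) if p > r]
    under = [abs(p - r) for p, r in zip(predicted, actual) if p < r]
    pos = sum(over) / len(over) if over else None
    neg = sum(under) / len(under) if under else None
    return pos, neg


print(positive_negative_errors([12.0, 8.0, 7.0, 10.0], [10.0, 10.0, 10.0, 10.0]))
# -> (2.0, 2.5)
```

Keeping the two errors separate, rather than reporting one symmetric error, is what lets the analysis penalize underprediction more heavily than overprediction.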

5.2.3 Time-Cost Analysis. Host utilization varies very quickly in a cloud data center. If host utilization prediction takes longer than the VM migration decision, resource provisioning will be delayed, which can cause poor QoS. Thus, host utilization prediction must be completed in a timely manner. To investigate the time cost of our proposed method, we measure the running time of the EEMD-RT-ARIMA method and compare it with that of the other prediction methods using the following index $t_c$:

$$ t_c = \frac{t_{\mathrm{our}} - t_{\mathrm{other}}}{t_{\mathrm{other}}} \times 100\% \quad (14) $$

where $t_{\mathrm{our}}$ indicates the running time of our EEMD-RT-ARIMA method, $t_{\mathrm{other}}$ represents the running time of another method, such as the ARIMA model or the EEMD-ARIMA method, and $t_c$ denotes the percentage by which the time cost is reduced (negative values) or increased (positive values).
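Equation (14) is a relative change; the sketch below applies it to the 6-point running times reported in Section 6.3 (69.37 s versus 337.20 s for host 22, and 113.64 s versus 190.46 s for host 237). The function name is illustrative.

```python
def time_cost_change(t_our, t_other):
    """Eq. (14): percentage change of our running time relative to another method.

    Negative values mean our method is faster.
    """
    if t_other <= 0:
        raise ValueError("t_other must be positive")
    return (t_our - t_other) / t_other * 100.0


# Running times (seconds) for 6-point predictions, EEMD-RT-ARIMA vs EEMD-ARIMA
print(round(time_cost_change(69.37, 337.20), 1))   # host 22: about -79.4
print(round(time_cost_change(113.64, 190.46), 1))  # host 237: about -40.3
```

These values match the "up to 80%" and "approximately 40%" reductions stated in Section 6.3.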

6 Experimental Results and Analysis

To validate the prediction effectiveness of our EEMD-RT-ARIMA method, we conduct experiments with the ARIMA, EEMD-ARIMA, and EEMD-RT-ARIMA methods and compare their prediction results. All experiments were performed in MATLAB on a PC with a 2.5 GHz Intel(R) i7 CPU. To make the three methods comparable, we run each method 5 times on the same original dataset. The mean values of the prediction results are shown in the following tables and figures.

6.1 Error Analysis. Table 2 shows the MAPE values of host utilization predictions for the 7 physical hosts. The EEMD-ARIMA and EEMD-RT-ARIMA methods have lower MAPE values than the ARIMA model for 6-point and 12-point predictions. For example, the EEMD-ARIMA and EEMD-RT-ARIMA methods achieve MAPE values of 6.06% and 5.05% for the 6-point prediction of host 109, while the ARIMA model has a far higher MAPE value of 16.85%. They obtain MAPE values of 10.13% and 5.46% for the 12-point prediction of host 109, while the ARIMA model obtains 11.08%. For host 22, the EEMD-ARIMA and EEMD-RT-ARIMA methods obtain far lower MAPE values (5.31% and 5.42%) than the ARIMA model (10.66%) for 6-point prediction, and they also perform better for 12-point prediction. The same holds for the 6-point and 12-point predictions of the other hosts. This indicates that both the EEMD-ARIMA and EEMD-RT-ARIMA methods predict host utilization more accurately than the ARIMA model in 6-point and 12-point predictions: EEMD reduces the inherent volatility of the host utilization sequence, which improves the prediction accuracy of both methods. However, the situation changes in 24-point prediction. Most of the 24-point MAPE values of hosts 1162, 424, 1060, and 237 exceed 30% across the three methods. Although the EEMD-RT-ARIMA method has lower MAPE values than the ARIMA and EEMD-ARIMA methods for hosts 839 and 109, its MAPE values in 24-point prediction are far higher than those in 6-point and 12-point predictions. This shows that the EEMD-ARIMA and EEMD-RT-ARIMA methods are suitable for short-term but not long-term prediction.

Figure 5: New components: (a) high-frequency and strong-volatility component; (b) medium-frequency and weak-volatility component; (c) low-frequency trend component (x-axis: 0–700 sample points; y-axis: CPU utilization (%)).

Algorithm 1: ARIMA prediction.
(1) For each new component:
(2) Set the order of difference d = 0.
(3) Execute the augmented Dickey-Fuller (ADF) test. If the series is stationary, go to step 5; otherwise, go to step 4.
(4) Difference the time series, set d = d + 1, and go to step 3.
(5) Determine the order of the ARIMA model using the Bayesian information criterion (BIC).
(6) Estimate the parameters of the ARIMA model using maximum likelihood.
(7) Forecast the future n values of this new component using the ARIMA model.
(8) End.
(9) Obtain the overall prediction results by superposing the prediction results of each new component.

Figure 6: CPU utilization traces of 7 physical hosts (y-axis: CPU utilization (%)): (a) host 839, (b) host 109, (c) host 1162, (d) host 22, (e) host 424, (f) host 1060, (g) host 237.

On further analysis, we find that the EEMD-RT-ARIMA method achieves lower prediction errors than the EEMD-ARIMA method for the 6-point and 12-point predictions of hosts 839, 109, and 1162, even though the EEMD-RT-ARIMA method uses only the efficient IMF components. However, the opposite holds for hosts 22, 424, 1060, and 237. The original CPU utilization sequences of all physical hosts are sampled at the same frequency, so we calculate the RT value of each CPU utilization sequence, as shown in Table 3. Hosts 839, 109, and 1162 have low RT values of at most 10, which shows that their CPU utilization is more stationary than that of the other hosts; smaller RT values indicate more stationary host utilization sequences. This can also be seen in Figures 6(a)–6(c). Tables 2 and 3 show that the EEMD-RT-ARIMA method achieves a lower MAPE value than the EEMD-ARIMA method when the RT value is small and, conversely, a higher MAPE value when the RT value is large. For instance, hosts 839, 109, and 1162, with smaller RT values, obtain lower MAPE values with the EEMD-RT-ARIMA method than with the EEMD-ARIMA method, while hosts 22, 424, 1060, and 237, with larger RT values, obtain higher MAPE values with the EEMD-RT-ARIMA method than with the EEMD-ARIMA method.
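The RT values in Tables 1 and 3 come from the runs test defined earlier in the paper, which this section does not restate. The sketch below assumes a common convention: each point is coded as at-or-above or below the sequence mean, and a run is a maximal block of consecutive points with the same code. The function name `count_runs` and the two toy sequences are illustrative.

```python
def count_runs(sequence):
    """Count runs of consecutive points on the same side of the mean.

    Assumed convention: points >= mean form one symbol, points below the
    mean the other; a run is a maximal block of equal symbols.
    """
    if not sequence:
        raise ValueError("empty sequence")
    mean = sum(sequence) / len(sequence)
    above = [x >= mean for x in sequence]
    runs = 1
    for prev, cur in zip(above, above[1:]):
        if cur != prev:  # a change of side starts a new run
            runs += 1
    return runs


smooth = [1, 2, 3, 4, 5, 6, 7, 8]   # slow trend: few runs
bursty = [1, 8, 2, 7, 1, 9, 2, 8]   # rapid swings: many runs
print(count_runs(smooth), count_runs(bursty))
# -> 2 8
```

Under this convention, a slowly varying sequence crosses its mean rarely and yields a small count, consistent with the text's reading that smaller RT values indicate more stationary host utilization.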

Furthermore, the difference in MAPE values between the EEMD-RT-ARIMA and EEMD-ARIMA methods decreases as the RT value increases from host 839 to host 1162. The difference becomes negative from host 22 onward, which indicates that the EEMD-ARIMA method has higher prediction accuracy than the EEMD-RT-ARIMA method, and it then grows in magnitude as the RT values increase. The 12-point host CPU utilization prediction illustrates this situation. For example, host 839, with an RT value of 2, has a MAPE of 5.03% for 12-point prediction using the EEMD-RT-ARIMA method, which is 5.45 percentage points lower than the 10.48% of the EEMD-ARIMA method. For host 1162, with an RT value of 10, the MAPE of the EEMD-RT-ARIMA method is only 4.19 percentage points lower than that of the EEMD-ARIMA method. For host 22, with an RT value of 16, the EEMD-RT-ARIMA method has a slightly higher MAPE (5.37%) than the EEMD-ARIMA method (5.33%). As the RT value increases further, the MAPE differences between the EEMD-RT-ARIMA and EEMD-ARIMA methods grow to 0.87, 1.96, and 4.63 percentage points for hosts 424, 1060, and 237, respectively. This indicates that the EEMD-RT-ARIMA method is less effective than the EEMD-ARIMA method in CPU utilization prediction for these hosts. The reason is as follows. The ARIMA prediction of each component decomposed by the EEMD method introduces some error, and superposing the prediction results of all components accumulates these errors. The EEMD-RT-ARIMA method reduces this error accumulation by selecting the efficient IMF components and reconstructing them into fewer components, so it achieves better prediction accuracy than the EEMD-ARIMA method for hosts 839, 109, and 1162. However, selecting and reconstructing the efficient IMF components also introduces some prediction error because the nonefficient components are discarded, especially for nonstationary host utilization sequences. When this error exceeds the accumulated error of the ARIMA predictions of all components in the EEMD-ARIMA method, the EEMD-RT-ARIMA method is no more effective than the EEMD-ARIMA method for the nonstationary CPU utilization of some hosts, such as hosts 22, 424, 1060, and 237.

6.2 Effectiveness Analysis. To verify the effectiveness of our method in short-term prediction, we select the experimental results of hosts 839, 22, and 237, which have the minimum, middle, and maximum RT values, for further analysis. Figure 7 shows the prediction results of the EEMD-RT-ARIMA, ARIMA, and EEMD-ARIMA methods. The future resource utilization of host 839 decreases below 11%. According to a predefined policy, host 839 is underloaded and can be shut down to save energy. Figure 7 shows that our method is more accurate and effective than the ARIMA model. In particular, our method follows the trend of the data, while the ARIMA model cannot keep up with it, so our method is more suitable for handling nonstationary time series than the ARIMA model. Additionally, the values predicted by the EEMD-RT-ARIMA method are closer to the testing data than those of the EEMD-ARIMA method for host 839. These results show that the EEMD-RT-ARIMA method is more effective than the EEMD-ARIMA method for CPU utilization sequences with weak fluctuations. When the host utilization sequence fluctuates more strongly, the absence of the nonefficient IMF components greatly influences the prediction results, and the EEMD-RT-ARIMA method is no more effective than the EEMD-ARIMA method for the CPU utilization prediction of host 237.

Table 2: MAPE values of host utilization prediction
Host ID | Prediction length | ARIMA (%) | EEMD-ARIMA (%) | EEMD-RT-ARIMA (%)
839 | 6-point | 9.77 | 4.73 | 4.61
839 | 12-point | 13.78 | 10.48 | 5.03
839 | 24-point | 23.10 | 21.53 | 8.82
109 | 6-point | 16.85 | 6.06 | 5.05
109 | 12-point | 11.08 | 10.13 | 5.46
109 | 24-point | 41.34 | 24.68 | 8.36
1162 | 6-point | 24.45 | 7.76 | 6.45
1162 | 12-point | 17.81 | 13.14 | 8.95
1162 | 24-point | 21.93 | 41.41 | 33.45
22 | 6-point | 10.66 | 5.31 | 5.42
22 | 12-point | 8.22 | 5.33 | 5.37
22 | 24-point | 11.41 | 18.44 | 14.77
424 | 6-point | 37.54 | 16.12 | 17.65
424 | 12-point | 21.74 | 16.59 | 17.46
424 | 24-point | 88.68 | 77.39 | 11.57
1060 | 6-point | 35.71 | 10.05 | 13.94
1060 | 12-point | 37.60 | 17.44 | 19.40
1060 | 24-point | 74.72 | 54.82 | 91.93
237 | 6-point | 22.29 | 7.85 | 11.78
237 | 12-point | 25.51 | 11.30 | 15.93
237 | 24-point | 126.11 | 158.43 | 189.51

Table 3: RT values of each host utilization sequence
Host ID | 839 | 109 | 1162 | 22 | 424 | 1060 | 237
RT value | 2 | 6 | 10 | 16 | 22 | 24 | 38

To further analyze the effectiveness of our method, we calculate the positive and negative errors of the 6-point and 12-point predictions for these hosts, as shown in Table 4. The smaller the negative error, the more suitable the prediction method is for cloud resource provisioning, because it avoids underprediction. Most of the prediction results of the ARIMA model are underpredicted (the positive-error cells are all "Null" for hosts 839 and 22). Furthermore, the negative errors of the ARIMA model are far higher than those of the other methods for host 237. For instance, the ARIMA model has a negative error of up to 27.51% for the 12-point prediction of host 237, while the EEMD-ARIMA and EEMD-RT-ARIMA methods have negative errors of only 8.00% and 8.92%, respectively. If the ARIMA model is used to predict future host utilization, it can cause resource underprovisioning, which cannot ensure applications' QoS. The EEMD-RT-ARIMA method achieves smaller negative errors than the EEMD-ARIMA method for hosts 839 and 22 but larger negative errors for host 237. For instance, the EEMD-ARIMA method has a negative error of 10.71% for the 12-point prediction of host 839, while the EEMD-RT-ARIMA method has a negative error of only 4.74%. Similarly, the EEMD-ARIMA method obtains a negative error of 6.09% for the 12-point prediction of host 22, while the EEMD-RT-ARIMA method achieves a lower value of 5.05%. However, the situation is different for host 237, where the EEMD-ARIMA method has smaller negative errors than the EEMD-RT-ARIMA method. For instance, the EEMD-RT-ARIMA method obtains negative errors of 11.37% and 8.92% for 6-point and 12-point predictions, while the EEMD-ARIMA method has negative errors of only 4.62% and 8.00%.

Figure 7: Prediction results of different methods (ARIMA, EEMD-ARIMA, and EEMD-RT-ARIMA versus the testing data; y-axis: CPU utilization (%)): (a) host 839, 6-point prediction; (b) host 839, 12-point prediction; (c) host 22, 6-point prediction; (d) host 22, 12-point prediction; (e) host 237, 6-point prediction; (f) host 237, 12-point prediction.

6.3 Time-Cost Analysis. To verify the applicability of our method, we further compare the time costs of these methods in Figure 8. The EEMD-ARIMA method has the largest running time, over 180 s, while the ARIMA model takes the least time, less than 50 s. The time cost of the EEMD-RT-ARIMA method ranges from roughly 70 s to 117 s, which is 40%–80% less than that of the EEMD-ARIMA method. For example, for the 6-point prediction of host 22, the running time of the EEMD-RT-ARIMA method is 69.37 s, far less than the 337.20 s of the EEMD-ARIMA method; our method saves up to 80% of the time cost. For the CPU utilization sequence of host 237, which has strong variability, the EEMD-ARIMA method requires 190.46 s to predict the future 6-point values, while the EEMD-RT-ARIMA method takes only 113.64 s, a reduction of approximately 40%. Considering prediction accuracy, effectiveness, and time cost, our EEMD-RT-ARIMA method is more cost-effective for short-term host utilization prediction in cloud computing.

7 Conclusions

Host utilization is an indicator of host performance, and its prediction can promote effective resource scheduling in cloud computing. However, host utilization exhibits strong randomness and instability caused by users' random and varied resource demands, which makes it difficult to improve prediction accuracy. In this paper, we propose a hybrid and cost-effective method, EEMD-RT-ARIMA, for short-term host utilization prediction in cloud computing. The EEMD method is first used to decompose the nonstationary host utilization sequence into a few relatively stationary IMF components and an R component. Then, we calculate the correlation coefficient between each IMF component and the original data to select the efficient IMF components, and we use RT values and average periods to reconstruct these components into three new components to reduce error accumulation and time cost. Finally, the three new components are predicted by the ARIMA model, and their prediction results are superposed to form the overall prediction results. We use real host utilization traces from a cloud platform to conduct the experiments and compare our EEMD-RT-ARIMA method with the ARIMA model and the EEMD-ARIMA method in terms of error, effectiveness, and time cost. The results show that our method is cost-effective and more suitable for short-term host utilization prediction in cloud computing.

Data Availability

The running example and experimental data used to support the findings of this study have been deposited in the Figshare repository (https://doi.org/10.6084/m9.figshare.7679594).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the Shandong Provincial Natural Science Foundation (ZR2016FM41).

Table 4: Positive and negative error analysis
Host ID | Prediction length | ARIMA positive error (%) | ARIMA negative error (%) | EEMD-ARIMA positive error (%) | EEMD-ARIMA negative error (%) | EEMD-RT-ARIMA positive error (%) | EEMD-RT-ARIMA negative error (%)
839 | 6-point | Null | 9.77 | 3.16 | 5.85 | 4.92 | 4.15
839 | 12-point | Null | 13.78 | 3.43 | 10.71 | 4.80 | 4.74
22 | 6-point | Null | 10.66 | 3.91 | 6.70 | 4.12 | 6.34
22 | 12-point | Null | 9.72 | 4.26 | 6.09 | 5.39 | 5.05
237 | 6-point | 0.39 | 26.68 | 14.31 | 4.62 | 13.81 | 11.37
237 | 12-point | 0.39 | 27.51 | 17.44 | 8.00 | 17.90 | 8.92

Figure 8: Time cost of different prediction methods (time cost in s of the ARIMA, EEMD-ARIMA, and EEMD-RT-ARIMA methods for 6-point and 12-point predictions of hosts 839, 22, and 237).


References

[1] J. J. Prevost, K. Nagothu, B. Kelley, and M. Jamshidi, "Prediction of cloud data center networks loads using stochastic and neural models," in Proceedings of 2011 6th International Conference on System of Systems Engineering, pp. 276–281, IEEE, Albuquerque, NM, USA, June 2011.

[2] M. Borkowski, S. Schulte, and C. Hochreiner, "Predicting cloud resource utilization," in Proceedings of 9th International Conference on Utility and Cloud Computing (UCC), pp. 37–42, IEEE, Shanghai, China, December 2016.

[3] M. Barati and S. Sharifian, "A hybrid heuristic-based tuned support vector regression model for cloud load prediction," Journal of Supercomputing, vol. 71, no. 11, pp. 4235–4259, 2015.

[4] Z. Chen, Y. Zhu, Y. Di, S. Feng, and J. Geng, "A high-accuracy self-adaptive resource demands predicting method in IaaS cloud environment," Neural Network World, vol. 25, no. 5, pp. 519–540, 2015.

[5] J. Chen and Y. Wang, "A resource demand prediction method based on EEMD in cloud computing," Procedia Computer Science, vol. 131, pp. 116–123, 2018.

[6] H. Toumi, Z. Brahmi, Z. Benarfa, and M. M. Gammoudi, "Server load prediction using stream mining," in Proceedings of 2017 International Conference on Information Networking (ICOIN), pp. 653–661, IEEE, Da Nang, Vietnam, January 2017.

[7] S. Di, D. Kondo, and W. Cirne, "Host load prediction in a Google compute cloud with a Bayesian model," in Proceedings of 2012 International Conference on High Performance Computing, Networking, Storage and Analysis (SC'12), pp. 1–11, IEEE, Salt Lake City, UT, USA, November 2012.

[8] N. K. Gondhi and P. Kailu, "Prediction based energy efficient virtual machine consolidation in cloud computing," in Proceedings of 2015 Second International Conference on Advances in Computing and Communication Engineering, pp. 437–441, IEEE, Dehradun, India, May 2015.

[9] A. Verma, G. Dasgupta, T. K. Nayak, P. De, and R. Kothari, "Server workload analysis for power minimization using consolidation," in Proceedings of the 2009 Conference on USENIX Annual Technical Conference, p. 28, USENIX Association, San Diego, CA, USA, June 2009.

[10] B. Song, Y. Yu, Y. Zhou, Z. Wang, and S. Du, "Host load prediction with long short-term memory in cloud computing," Journal of Supercomputing, vol. 74, no. 12, pp. 6554–6568, 2018.

[11] J.-J. Jheng, F.-H. Tseng, H.-C. Chao, and L.-D. Chou, "A novel VM workload prediction using grey forecasting model in cloud data center," in Proceedings of International Conference on Information Networking 2014 (ICOIN2014), pp. 40–45, IEEE, Phuket, Thailand, February 2014.

[12] A. Beloglazov and R. Buyya, "Managing overloaded hosts for dynamic consolidation of virtual machines in cloud data centers under quality of service constraints," IEEE Transactions on Parallel and Distributed Systems, vol. 24, no. 7, pp. 1366–1379, 2013.

[13] M. Dabbagh, B. Hamdaoui, M. Guizani, and A. Rayes, "An energy-efficient VM prediction and migration framework for overcommitted clouds," IEEE Transactions on Cloud Computing, vol. 6, no. 4, pp. 955–966, 2018.

[14] D. Minarolli, A. Mazrekaj, and B. Freisleben, "Tackling uncertainty in long-term predictions for host overload and underload detection in cloud computing," Journal of Cloud Computing, vol. 6, no. 1, p. 4, 2017.

[15] K. Mason, M. Duggan, E. Barrett, J. Duggan, and E. Howley, "Predicting host CPU utilization in the cloud using evolutionary neural networks," Future Generation Computer Systems, vol. 86, pp. 162–173, 2018.

[16] D. Magalhães, R. N. Calheiros, R. Buyya, and D. G. Gomes, "Workload modeling for resource usage analysis and simulation in cloud computing," Computers & Electrical Engineering, vol. 47, pp. 69–81, 2015.

[17] C. Tian, Y. Wang, Y. Luo et al., "Minimizing content reorganization and tolerating imperfect workload prediction for cloud-based video-on-demand services," IEEE Transactions on Services Computing, vol. 9, no. 6, pp. 926–939, 2016.

[18] M. Verma, G. R. Gangadharan, V. Ravi, and N. Narendra, "Resource demand prediction in multi-tenant service clouds," in Proceedings of 2013 IEEE International Conference on Cloud Computing in Emerging Markets (CCEM), pp. 1–8, IEEE, Bangalore, India, October 2013.

[19] W. Zhang, Y. Shi, L. Liu, L. Cui, and Y. Zheng, "Performance and resource prediction at high utilization for n-tier service systems in cloud: an experiment driven approach," in Proceedings of 2015 IEEE International Conference on Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing, pp. 843–848, IEEE, Liverpool, UK, October 2015.

[20] G. Kecskemeti, A. Kertesz, and Z. Nemeth, "Cloud workload prediction by means of simulations," in Proceedings of the Computing Frontiers Conference on ZZZ (CF'17), pp. 279–282, ACM, Siena, Italy, May 2017.

[21] Y. Chen and Z.-A. Jiang, "Dynamically predicting the quality of service: batch, online, and hybrid algorithms," Journal of Electrical and Computer Engineering, vol. 2017, Article ID 9547869, 10 pages, 2017.

[22] A. Khan, X. Yan, S. Tao, and N. Anerousis, "Workload characterization and prediction in the cloud: a multiple time series approach," in Proceedings of 2012 IEEE Network Operations and Management Symposium, pp. 1287–1294, IEEE, Maui, HI, USA, April 2012.

[23] A. K. Mishra, J. L. Hellerstein, W. Cirne, and C. R. Das, "Towards characterizing cloud backend workloads," ACM SIGMETRICS Performance Evaluation Review, vol. 37, no. 4, pp. 34–41, 2010.

[24] D. Gmach, J. Rolia, L. Cherkasova, and A. Kemper, "Workload analysis and demand prediction of enterprise data center applications," in Proceedings of 2007 IEEE 10th International Symposium on Workload Characterization, pp. 171–180, IEEE, Boston, MA, USA, September 2007.

[25] F.-H. Tseng, X. Wang, L.-D. Chou, H.-C. Chao, and V. C. M. Leung, "Dynamic resource prediction and allocation for cloud data center using the multiobjective genetic algorithm," IEEE Systems Journal, vol. 12, no. 2, pp. 1688–1699, 2018.

[26] G. K. Shyam and S. S. Manvi, "Virtual resource prediction in cloud environment: a Bayesian approach," Journal of Network and Computer Applications, vol. 65, pp. 144–154, 2016.

[27] Y. Lu, J. Panneerselvam, L. Liu, and Y. Wu, "RVLBPNN: a workload forecasting model for smart cloud computing," Scientific Programming, vol. 2016, Article ID 5635673, 9 pages, 2016.

[28] K. Rajaram and M. P. Malarvizhi, "Utilization based prediction model for resource provisioning," in Proceedings of 2017 International Conference on Computer, Communication and Signal Processing (ICCCSP), pp. 1–6, IEEE, Chennai, India, January 2017.


[29] L Li and A Zhang ldquoResource demand optimization com-bined prediction under cloud computing environment basedon IOWGA operatorrdquo International Journal of Grid andDistributed Computing vol 8 no 3 pp 77ndash86 2015

[30] D Minarolli and B Freisleben ldquoCross-correlation predictionof resource demand for virtual machine resource allocation incloudsrdquo in Proceedings of 2014 Sixth International Conferenceon Computational Intelligence Communication Systems andNetworks pp 119ndash124 IEEE Tetova Macedonia May 2014

[31] W Zhang P Duan L T Yang et al ldquoResource requestsprediction in the cloud computing environment with a deepbelief networkrdquo Software Practice and Experience vol 47no 3 pp 473ndash488 2017

[32] H-B Mi H-M Wang G Yin D-X Shi Y-F Zhou andL Yuan ldquoResource on-demand reconfiguration method forvirtualized data centersrdquo Journal of Software vol 22 no 9pp 2193ndash2205 2011

[33] R N Calheiros E Masoumi R Ranjan and R BuyyaldquoWorkload prediction using ARIMAmodel and its impact oncloud applicationsrsquo QoSrdquo IEEE Transactions on CloudComputing vol 3 no 4 pp 449ndash458 2015

[34] Y Meng R Rao X Zhang and P Hong ldquoCRUPA a con-tainer resource utilization prediction algorithm for auto-scaling based on time series analysisrdquo in Proceedings of2016 International Conference on Progress in Informatics andComputing (PIC) pp 468ndash472 IEEE Shanghai China De-cember 2016

[35] E Dhib N Zangar N Tabbane and K Boussetta ldquoImpact ofseasonal ARIMA workload prediction model on QoE formassively multiplayers online gamingrdquo in Proceedings of 20165th International Conference on Multimedia Computing andSystems (ICMCS) pp 737ndash741 IEEE Marrakech MoroccoSeptember 2016

[36] A Ganapathi Y Chen A Fox R Katz and D PattersonldquoStatistics-driven workload modeling for the cloudrdquo inProceedings of 2010 IEEE 26th International Conference onData EngineeringWorkshops (ICDEW 2010) pp 87ndash92 IEEELong Beach CA USA March 2010

[37] V G Tran V Debusschere and S Bacha ldquoHourly serverworkload forecasting up to 168 hours ahead using seasonalARIMA modelrdquo in Proceedings of 2012 IEEE InternationalConference on Industrial Technology pp 1127ndash1131 IEEEAthens Greece March 2012

[38] D Xu S Yang and H Luo ldquoResearch on generalized fuzzysoft sets theory based combined model for demanded cloudcomputing resource predictionrdquo Chinese Journal of Man-agement Science vol 23 no 5 pp 56ndash64 2015

[39] S Li Y Wang X Qiu D Wang and L Wang ldquoA workloadprediction-basedmulti-VM provisioningmechanism in cloudcomputingrdquo in Proceedings of 2013 15th Asia-Pacific NetworkOperations andManagement Symposium (APNOMS) pp 1ndash6IEEE Hiroshima Japan September 2013

[40] X Fu and C Zhou ldquoPredicted affinity based virtual machineplacement in cloud computing environmentsrdquo IEEE Trans-actions on Cloud Computing vol 99 p 1 2017

[41] Y Jiang C Perng T Li and R Chang ldquoASAP a self-adaptiveprediction system for instant cloud resource demand pro-visioningrdquo in Proceedings of 2011 IEEE 11th InternationalConference on Data Mining pp 1104ndash1109 IEEE VancouverBC Canada December 2011

[42] Z Wu and N E Huang ldquoEnsemble empirical mode de-composition a noise-assisted data analysis methodrdquo Ad-vances in Adaptive Data Analysis vol 1 no 1 pp 1ndash41 2009

[43] H Zang L Fan M Guo Z Wei G Sun and L ZhangldquoShort-term wind power interval forecasting based on anEEMD-RT-RVMmodelrdquo Advances in Meteorology vol 2016Article ID 8760780 10 pages 2016

[44] N Safari C Y Chung and G C D Price ldquoNovel multi-stepshort-term wind power prediction framework based onchaotic time series analysis and singular spectrum analysisrdquoIEEE Transactions on Power Systems vol 33 no 1 pp 590ndash601 2018

[45] X Chen H Wang J Huang and H Ren ldquoAPU degradationprediction based on EEMD and Gaussian process regressionrdquoin Proceedings of 2017 International Conference on SensingDiagnostics Prognostics and Control (SDPC) pp 98ndash104IEEE Shanghai China August 2017

[46] C Yan C Yi XWu et al ldquoTurbine fault trend prediction thatbased on EEMD and ARIMA modelsrdquo Journal of GansuSciences vol 28 no 4 pp 100ndash106 2016

[47] M Jin P Li L Zhang et al ldquoA signal feature method and itsapplication based on EEMD fuzzy entropy and GK cluster-ingrdquo ACTA Metrologica Sinica vol 26 no 5 pp 501ndash5052015

[48] E H Norden Z Shen R L Steven et al ldquo+e empirical modedecomposition and the Hilbert spectrum for nonlinear andnon-stationary time series analysisrdquoin Proceedings of theRoyal Society of London Series A Mathematical Physical andEngineering Sciences vol 454 no 1971 pp 903ndash995 RoyalSociety London UK March 1998

[49] Alibaba, "cluster-trace-v2017," https://github.com/alibaba/clusterdata.

14 Journal of Electrical and Computer Engineering



guarantee the QoS of applications running on these VMs. Therefore, it is necessary to migrate some hotspot VMs from an overloaded host to other, nonoverloaded hosts. Similarly, if the host utilization is below a predefined threshold, all VMs on this host are migrated to other hosts, so that the host can be shut down to reduce energy consumption.

VM migration is a reactive method that cannot be initiated until the host is overloaded or underloaded. Therefore, it is very important to detect when a host becomes overloaded or underloaded. Most existing approaches monitor host utilization to determine its state: if its resource utilization exceeds a predefined threshold during an observation period, the host is declared overloaded; if its resource utilization stays below a predefined threshold throughout an observation period, it is declared underloaded. However, migrating VMs from an overloaded host to other hosts takes time. If the host utilization changes faster than resources can be provisioned, users suffer poor QoS until the resources become available. In addition, host underload detection based on a single host utilization value leads to unnecessary VM migrations and stability problems. These problems can be addressed by proactive methods that predict short-term host utilization and allocate resources in advance. For example, if the host utilization is predicted to stay above 80% throughout the next 15 minutes, the host will be overloaded, so VMs should be migrated from it to other hosts in advance to ensure QoS. If the host utilization stays below 15% both now and throughout the next hour, the host is underloaded and should be shut down after its VMs are migrated, to save energy. However, large numbers of random resource demands and concurrent accesses to applications cause stochastic volatility in host utilization, which changes very fast and exhibits strong instability with many bursts. It is therefore difficult to predict short-term host utilization accurately and in a timely manner from such data.
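The proactive rule in the example above can be sketched as a small classifier over a utilization forecast. This is an illustration only: the thresholds and windows are the example values from the text, the 5-minute sampling interval is borrowed from the traces described in Section 5.1, reading "the current and future 1 hour" as the current value plus the next 12 predicted points is our assumption, and `classify_host` is a hypothetical helper rather than part of the method.

```python
from typing import Sequence

# Illustrative values taken from the example in the text (not fixed parameters).
OVERLOAD_THRESHOLD = 80.0   # percent CPU utilization
UNDERLOAD_THRESHOLD = 15.0  # percent CPU utilization
SAMPLE_MINUTES = 5          # assumed sampling interval: one point per 5 minutes


def classify_host(current: float, predicted: Sequence[float]) -> str:
    """Classify a host from its current utilization and its predicted next points.

    - "overloaded": every predicted point in the next 15 minutes exceeds 80%.
    - "underloaded": the current value and every predicted point in the next
      hour stay below 15%.
    - "normal": otherwise, including when the forecast is too short to decide.
    """
    overload_window = 15 // SAMPLE_MINUTES    # 3 points
    underload_window = 60 // SAMPLE_MINUTES   # 12 points

    next_quarter = list(predicted[:overload_window])
    if len(next_quarter) == overload_window and all(
        u > OVERLOAD_THRESHOLD for u in next_quarter
    ):
        return "overloaded"

    next_hour = list(predicted[:underload_window])
    if len(next_hour) == underload_window and all(
        u < UNDERLOAD_THRESHOLD for u in [current, *next_hour]
    ):
        return "underloaded"
    return "normal"


print(classify_host(70.0, [85.0, 90.0, 88.0, 60.0]))  # overloaded
print(classify_host(10.0, [12.0] * 12))               # underloaded
```

Acting on the forecast rather than on a single observed value is what lets migration start before the overload actually occurs.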

Although some machine learning methods, such as neural networks (NN) [1, 2], support vector regression (SVR) [3], and backpropagation neural networks (BPNN) [4], achieve good prediction accuracy in cloud computing, they take too much time to train a model for resources to be allocated rapidly. Linear regression (LR) can make predictions more quickly than ARIMA, but it requires the training data to exhibit simpler behavior. ARIMA is a prediction model for nonstationary time series, but it is not suitable when the data contain a large amount of random variation. In our previous work [5], we proposed a resource demand prediction method, EEMD-ARIMA, that combines the EEMD method and the ARIMA model to predict future resource demands. This method first uses EEMD to decompose the original resource demand sequence into multiple IMF components and a residual (R) component. Next, it forecasts the future values of each component with the ARIMA model. Finally, the overall forecast is obtained by superposing the forecasts of all components. Although this method alleviates the random variation of resource demands and improves prediction accuracy, two problems arise. The first is prediction error accumulation: the ARIMA prediction of each component produced by EEMD carries some prediction error, and superposing the predictions of all components accumulates these errors. The second is the high time cost of EEMD decomposition and of ARIMA prediction for too many components: each component's ARIMA prediction takes time, so the total time for predicting multiple components is much greater than that of a single ARIMA prediction of the original sequence.

To solve these problems of the EEMD-ARIMA method, this paper proposes a hybrid method, EEMD-RT-ARIMA, for short-term host utilization prediction. It not only further improves prediction accuracy by combining the EEMD method with the ARIMA model but also reduces the time cost by selecting and reconstructing efficient IMF components. We compare and evaluate our EEMD-RT-ARIMA method against the ARIMA model and the EEMD-ARIMA method in terms of error, effectiveness, and time cost.

2. Related Works

Many studies have addressed various predictions in cloud computing. In terms of research objectives, researchers have studied server load prediction [6–10], VM load prediction [11, 12], VM utilization prediction [13, 14], host utilization prediction [15], web application workload prediction [16], cloud service workload prediction [17–19], workflow workload prediction [20], service quality prediction [21], and workload characterization [22–24]. Toumi et al. [6] described server load in terms of the submitted task types and the submission rate and applied a stream mining technique to predict server loads. Jheng et al. [11] proposed a VM workload prediction method based on the gray forecasting model, which determines the VMs to migrate according to power savings and workload balance. Dabbagh et al. [13] proposed a prediction approach that uses Wiener filters to predict the future resource utilization of VMs. Mason et al. [15] predicted host CPU utilization over a short horizon using evolutionary neural networks, which showed high prediction accuracy and a high degree of generality. In this paper, we focus on host utilization prediction using the EEMD and ARIMA methods, aiming both to improve prediction accuracy and to reduce prediction time as much as possible.

Figure 1: Resource allocation and scheduling process (physical hosts 1, 2, and m are active and physical host n is closed; a new VM is placed by resource allocation, and VMs are moved between hosts by virtual machine migration).

In terms of approaches, prediction methods are usually divided into two categories. The first is based on machine learning. Tseng et al. [25] proposed a prediction method for the CPU and memory utilization of VMs and physical machines based on a genetic algorithm (GA), which outperforms the gray model in prediction accuracy under both stable and unstable tendencies. Shyam and Manvi [26] proposed a short- and long-term prediction model of virtual resource requirements for CPU- and memory-intensive applications based on Bayesian networks, in which the relationships and dependencies between variables are identified to facilitate resource prediction. Lu et al. [27] proposed a workload prediction model, RVLBPNN (Rand Variable Learning Rate Backpropagation Neural Network), based on the BPNN algorithm, which achieves higher prediction accuracy than the hidden Markov model and the naive Bayes classifier. This method not only predicts CPU-intensive and memory-intensive workloads but also improves prediction accuracy by exploiting the intrinsic relations among arriving cloud workloads. Rajaram and Malarvizhi [28] compared the prediction accuracy of several machine learning methods, such as LR, SVR, and the multilayer perceptron. Li and Zhang [29] proposed an optimal combination prediction method for resource demands, which combines the induced ordered weighted geometric averaging operator and the generalized Dice coefficient with an improved Elman neural network and a gray model to enhance prediction accuracy. Minarolli and Freisleben [30] presented a cross-correlation prediction approach based on support vector machines (SVM), which considers the cross-correlation of VMs running the same application to improve prediction accuracy. Zhang et al. [31] proposed a deep belief network (DBN) based prediction approach for cloud resource requests, in which orthogonal experimental design and analysis of variance are used to enhance prediction accuracy. Compared with the ARIMA model, this method reduces the mean square error (MSE) by over 60% for CPU and RAM request predictions. Although machine learning methods are effective in improving prediction accuracy, they are complex and usually require a large amount of data to extract features and train a model, which takes too long for the predictions to guarantee the QoS of running applications. Cloud computing requires a simple and rapid host utilization prediction method to support resource allocation and scheduling.

The second category is based on statistical methods, such as Brown's quadratic exponential smoothing method [32], the autoregressive integrated moving average (ARIMA) model [33–35], and kernel canonical correlation analysis [36]. Tran et al. [37] applied the ARIMA model to the long-term prediction of server workload, whereas our method aims to predict short-term host utilization. This is more difficult because host utilization can be extremely random and nonstationary over short periods. Calheiros et al. [33] proposed a short-term prediction model of cloud workload using the ARIMA model and evaluated its prediction accuracy and its impact on the QoS of user applications. They suggested that users' behaviors must be considered to reflect real conditions in workload simulation. Our method combines the ARIMA model with the EEMD and RT methods to improve prediction accuracy and reduce prediction time as much as possible, and it is compared with the EEMD-ARIMA and ARIMA methods in terms of error, effectiveness, and time cost.

Moreover, some studies combine the ARIMA model with other techniques to improve prediction accuracy. Xu et al. [38] constructed a model, GFSS-ANFISSARIMA, that combines the seasonal ARIMA model with generalized fuzzy soft sets and an adaptive neuro-fuzzy inference system; this model improves the prediction accuracy of resource demands. Li et al. [39] proposed a workload predictor that combines ARIMA with dynamic error compensation to reduce the service-level agreement (SLA) default rate. Fu and Zhou [40] proposed a predicted affinity model for VM placement, which uses the resource demands predicted by the ARIMA model to calculate a VM-host affinity value. Jiang et al. [41] presented a self-adaptive ensemble prediction method for cloud resource demands, which uses a two-level ensemble method to predict VM demands from a historical time series. This method not only combines multiple prediction methods (moving average (MA), autoregressive (AR), artificial neural network (ANN), gene expression programming (GEP), and SVM) but also adaptively adjusts the weight of each method according to the relative errors to obtain the best average performance. In contrast, our method uses EEMD to handle nonstationary host utilization and then selects and reconstructs efficient components to improve prediction accuracy and reduce the time cost. EEMD, proposed by Wu and Huang [42], is an effective noise-assisted method for nonlinear and nonstationary time series. It has been widely used in wind power forecasting [43, 44], aircraft auxiliary power unit (APU) degradation prediction [45], turbine fault trend prediction [46], and rolling bearing fault diagnosis [47], and it has been shown to enhance prediction accuracy. Our method also uses EEMD to decompose nonstationary host utilization to improve prediction accuracy, and it further uses correlation coefficients, RT values, and average periods to select and reconstruct efficient components, reducing prediction error accumulation and prediction time.

3. Background

3.1. Empirical Mode Decomposition (EMD). EMD is a signal processing method that decomposes a signal into multiple IMFs and a residual (R) trend term [48]. An IMF must satisfy two conditions:

(1) The number of extrema and the number of zero-crossings must either be equal or differ by at most one.

(2) The mean value of the upper envelope (defined by the local maxima) and the lower envelope (defined by the local minima) must be zero.

EMD includes the following steps:

Step 1. Let $f_1(t) = x(t)$, where $x(t)$ is the original data.
Step 2. Find all the local maxima and minima of $f_i(t)$, where $i$ is the iteration index (initially 1). Interpolate between the local maxima and between the local minima to obtain an upper envelope and a lower envelope, and compute the mean $m_i(t)$ of the two envelopes.
Step 3. Compute the new component $h_i(t) = f_i(t) - m_i(t)$.
Step 4. Check whether $h_i(t)$ satisfies the two IMF conditions above. If it does not, let $f_{i+1}(t) = h_i(t)$ and repeat Steps 2 and 3. If it does, $h_i(t)$ is taken as the first IMF component, $p_1(t) = h_i(t)$, and the R component is computed as $r_1(t) = x(t) - p_1(t)$.
Step 5. Repeat Steps 1–4 with $r_1(t)$ as the new data until R is a monotonic function. Thus, $x(t)$ is decomposed into $n$ IMFs and a residual R as follows:

$$x(t) = \sum_{i=1}^{n} p_i(t) + r(t) \tag{1}$$

3.2. Ensemble Empirical Mode Decomposition (EEMD). The EMD method has a noticeable drawback, mode mixing, which is often caused by signal intermittency. Wu and Huang proposed ensemble empirical mode decomposition (EEMD) to solve this problem. Unlike EMD, EEMD executes the decomposition process k times; each time, it adds a different white noise series to the signal and then decomposes the noisy signal. Generally, k is set to an integer in the range [50, 100], and the standard deviation d of the white noise is set to a value in the range [0.1, 0.2]. This yields k groups of decomposition results; group m includes n IMFs $p_{mi}(t)$ $(i = 1, \ldots, n)$ and a residual $r_m(t)$. Finally, the means of these groups of IMFs and residuals are taken as the final IMFs $p_i(t)$ $(i = 1, \ldots, n)$ and the final residual $r(t)$:

$$p_i(t) = \frac{1}{k}\sum_{m=1}^{k} p_{mi}(t), \qquad r(t) = \frac{1}{k}\sum_{m=1}^{k} r_m(t) \tag{2}$$
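Equation (2) is plain ensemble averaging, so it can be sketched independently of the underlying EMD routine. In the NumPy sketch below, `decompose` is a hypothetical callable that must return exactly `n_imfs` IMFs plus the residual as an array of shape `(n_imfs + 1, len(signal))`. Fixing the number of IMFs per trial is our assumption, since averaging requires every trial to produce the same components. We also assume the noise standard deviation d is expressed relative to the standard deviation of the signal, as in Wu and Huang's formulation.

```python
from typing import Callable

import numpy as np


def eemd(
    x,
    decompose: Callable[[np.ndarray, int], np.ndarray],
    n_imfs: int,
    k: int = 50,
    d: float = 0.2,
    seed: int = 0,
) -> np.ndarray:
    """Average k decompositions of x + white noise (Equation (2)).

    Returns an (n_imfs + 1, len(x)) array: rows 0..n_imfs-1 are p_i(t),
    the last row is r(t).
    """
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    noise_std = d * x.std()                 # assumption: d is relative to std(x)
    total = np.zeros((n_imfs + 1, len(x)))
    for _ in range(k):                      # trial m = 1..k
        noisy = x + rng.normal(0.0, noise_std, size=len(x))
        total += decompose(noisy, n_imfs)   # adds p_mi(t) and r_m(t)
    return total / k


def trend_only(signal, n_imfs):
    """Hypothetical stand-in decomposer: zero IMFs, whole signal as residual."""
    out = np.zeros((n_imfs + 1, len(signal)))
    out[-1] = signal
    return out


x = np.linspace(0.0, 1.0, 100)
components = eemd(x, trend_only, n_imfs=3, k=200)
print(components.shape)  # (4, 100)
```

In practice, `decompose` would wrap an EMD routine configured to return a fixed number of IMFs; the added noise averages out because it is independent across the k trials.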

The IMF components have three main characteristics:

(1) Completeness: the sum of all IMFs and R reconstructs the original data.

(2) Orthogonality: each IMF, with its own physical meaning, is independent of and has no effect on the other IMFs; mathematically, the inner product of any two different IMFs is (approximately) zero.

(3) Adaptability: IMFs with higher frequencies are extracted from the original data before those with lower frequencies, and the frequencies of the IMFs reflect the features of the original data.

3.3. Runs Test (RT). The RT is a nonparametric method that checks the randomness of a sequence containing only two symbols or two values, such as + and −, or 0 and 1. A run is defined as a maximal subsequence of identical successive symbols (all 0 or all 1). For example, the sequence "11110000011111000110010" includes 8 runs, 4 of which consist of successive 1s and the other 4 of successive 0s. The RT can also be used to test a time series.
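Counting runs amounts to grouping consecutive identical symbols. A minimal Python sketch (the helper name `count_runs` is ours) reproduces the example above:

```python
from itertools import groupby
from typing import Dict, Hashable, Iterable, Tuple


def count_runs(symbols: Iterable[Hashable]) -> Tuple[int, Dict[Hashable, int]]:
    """Return the total number of runs and the number of runs of each symbol."""
    per_symbol: Dict[Hashable, int] = {}
    total = 0
    for symbol, _ in groupby(symbols):   # each group is one maximal run
        total += 1
        per_symbol[symbol] = per_symbol.get(symbol, 0) + 1
    return total, per_symbol


print(count_runs("11110000011111000110010"))  # (8, {'1': 4, '0': 4})
```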

Assume that $M = \langle m_1(t), \ldots, m_i(t), \ldots, m_n(t) \rangle$ denotes a time series, where $m_i(t)$ is an element of the series and $n$ is the total number of elements. The mean value of these elements is calculated as follows:

$$\bar{M} = \frac{1}{n}\sum_{i=1}^{n} m_i(t) \tag{3}$$

Then, each element of the time series is mapped to a binary value as follows:

$$G_i = \begin{cases} 1, & m_i(t) \ge \bar{M} \\ 0, & m_i(t) < \bar{M} \end{cases} \tag{4}$$

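Thus, the time series is transformed into a sequence of 0s and 1s whose elements, under the null hypothesis of randomness, are independent and identically distributed. The total number of runs (the RT value) reflects the fluctuation of the sequence: the more runs there are, the more often the series crosses its mean.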
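Equations (3) and (4) and the run count combine into a single RT value per series. In this sketch (function names are ours), a slowly trending series yields few runs, while a series that keeps jumping across its mean yields many:

```python
from itertools import groupby
from typing import List, Sequence


def binarize(series: Sequence[float]) -> List[int]:
    """Equation (4): map each element to 1 if it is >= the mean (Equation (3)), else 0."""
    mean = sum(series) / len(series)
    return [1 if v >= mean else 0 for v in series]


def rt_value(series: Sequence[float]) -> int:
    """Total number of runs in the binarized series."""
    return sum(1 for _ in groupby(binarize(series)))


smooth = [1, 2, 3, 4, 5, 6, 7, 8]   # slow trend: crosses the mean once
bursty = [1, 8, 2, 7, 1, 9, 2, 8]   # crosses the mean at every step
print(rt_value(smooth), rt_value(bursty))  # 2 8
```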

4. A Hybrid Method for Short-Term Host Utilization Prediction

To improve the prediction accuracy and reduce the prediction time of the EEMD-ARIMA method, we propose a hybrid method, EEMD-RT-ARIMA, for short-term host utilization prediction, as shown in Figure 2. First, the host utilization sequence is decomposed into multiple IMF components and the R component using EEMD. Next, we calculate the correlation coefficients between the IMF components and the original data sequence to select the efficient IMF components. We then use RT values and average periods to reconstruct these efficient IMF components and the R component into three new components: a high-frequency and strong-volatility component, a medium-frequency and weak-volatility component, and a low-frequency trend component. Next, we use the ARIMA method to predict each of the three new components. Finally, the overall prediction result is obtained by summing the prediction results of the three new components.

The key to our EEMD-RT-ARIMA method is selecting and reconstructing efficient components. Compared with the EEMD-ARIMA method, fewer components are involved in ARIMA prediction, so EEMD-RT-ARIMA reduces both the prediction error accumulation and the total prediction time. From their implementation processes, both EEMD-RT-ARIMA and EEMD-ARIMA clearly take longer to predict than the ARIMA model alone. However, our EEMD-RT-ARIMA method focuses on cost-effectiveness, seeking a trade-off among prediction accuracy, effectiveness, and time cost.

4.1. Use of EEMD to Decompose the Host Utilization Sequence. Host utilization sequences are classified by resource type, such as CPU, memory, and disk; for example, the CPU utilization sequence is $T_{cpu} = \{c_1, \ldots, c_i, \ldots, c_n\}$. The CPU utilization sequence is usually random and unstable owing to random and sudden resource demands in cloud computing, so it is necessary to transform such data into relatively stationary data to improve prediction accuracy. EEMD appears to be more effective than other decomposition algorithms in processing nonlinear and nonstationary data sequences. Therefore, we use EEMD to decompose the host utilization sequence into a series of IMF components and the R component.

As a running example, Figure 3 shows the nonstationary CPU utilization trace of a physical host from our cloud platform, which we divide into a training set (673 data points) and a testing set (24 data points). We then use EEMD to decompose the training set and obtain components IMF1–IMF8 and the R component, which are shown from high frequency to low frequency in Figure 4.

4.2. Calculation of the Correlation Coefficients to Select Efficient IMF Components. A correlation coefficient measures the correlation between two sequences. We calculate the correlation coefficient $P_j(X, Y)$ between the IMFj component (X) and the original training set (Y) using the following formula, where $\mathrm{Cov}(X, Y)$ is the covariance between sequences X and Y, and $\mathrm{Var}(X)$ and $\mathrm{Var}(Y)$ are their variances:

$$P_j(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)}\,\sqrt{\mathrm{Var}(Y)}} \tag{5}$$

Then, we check whether $P_j(X, Y)$ is negative. If it is, the IMFj component is deemed inefficient and dropped; otherwise, the IMFj component is deemed efficient and retained.

We calculate the correlation coefficient between each IMF component and the original training set. Only IMF6 and IMF7 have negative correlation coefficients (−0.08 and −0.15, respectively), so they are dropped, and IMF1–IMF5 and IMF8 are selected as efficient IMF components.
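The selection rule based on Equation (5) can be sketched with NumPy's `np.corrcoef`, which returns the Pearson correlation matrix. The function name and the toy signals below are illustrative, not the paper's data.

```python
import numpy as np


def select_efficient_imfs(imfs, original):
    """Return (indices of retained IMFs, correlation coefficients P_j).

    An IMF is retained when its correlation with the original series is
    non-negative (Equation (5)); negatively correlated IMFs are dropped.
    """
    original = np.asarray(original, dtype=float)
    coeffs = [float(np.corrcoef(imf, original)[0, 1]) for imf in imfs]
    kept = [j for j, rho in enumerate(coeffs) if rho >= 0]
    return kept, coeffs


t = np.linspace(0.0, 1.0, 200)
original = np.sin(2 * np.pi * 5 * t) + t
imfs = [np.sin(2 * np.pi * 5 * t), -0.3 * np.sin(2 * np.pi * 5 * t)]
kept, coeffs = select_efficient_imfs(imfs, original)
print(kept)  # [0]
```

A constant component would make `np.corrcoef` return NaN (with a runtime warning); since `NaN >= 0` is false, this sketch would drop such a component.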

4.3. Reconstruction of Efficient IMFs and R into New Components. Each IMF component reflects a certain physical feature of the original data. If some IMF components are close in frequency and amplitude fluctuation, they have similar features and can be reconstructed into a new component that carries these typical features. Reducing the number of components in this way reduces the prediction error accumulation and the prediction time of the EEMD-ARIMA method.

Figure 2: EEMD-RT-ARIMA method (flowchart: use EEMD to decompose the host utilization sequence into IMF1, IMF2, …, IMFh−1 and R; use correlation coefficients to select the efficient IMFs IMF1, …, IMFm; use RT values and average periods to reconstruct the efficient IMFs and R into three new components, namely a high-frequency and strong-volatility component, a medium-frequency and weak-volatility component, and a low-frequency trend component; use ARIMA to predict each component; construct the overall prediction result by summing the prediction results of the three new components).

The average period reflects the frequency of host utilization variation. The two are reciprocal: the smaller the average period, the higher the frequency, so IMF components with closer average periods are closer in frequency. The average period of the IMFj component is calculated by the following formula, where n is the length of the training set and $l_j$ is the number of extrema of IMFj:

$$T_j = \frac{n}{l_j} \tag{6}$$
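A sketch of Equation (6), assuming NumPy. Local extrema are counted as sign changes of the first difference, so flat plateaus are not counted (a simplification). A component with no extrema is given $l_j = 1$, which is our assumption; it is consistent with the average period of 673 (the training-set length) reported for R in Table 1.

```python
import numpy as np


def average_period(component) -> float:
    """Equation (6): T_j = n / l_j, where l_j counts local extrema of the component."""
    c = np.asarray(component, dtype=float)
    d = np.diff(c)
    # An interior extremum is where the slope changes sign (+ to - or - to +).
    l_j = int(np.count_nonzero(d[1:] * d[:-1] < 0))
    return len(c) / max(l_j, 1)   # assumption: no extrema -> l_j = 1


print(average_period([0, 1, 0, 1, 0, 1, 0, 1]))  # 1.3333333333333333 (8 points / 6 extrema)
print(average_period(range(10)))                  # 10.0 (monotonic)
```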

Similarly, the RT value reflects the trend of amplitude fluctuation: the larger the RT value, the stronger the amplitude volatility. If the RT values of two IMFs are close, their overall trends are similar in amplitude volatility.

To enhance the prediction accuracy and reduce the prediction time of the EEMD-ARIMA method, the EEMD-RT-ARIMA method reconstructs the IMF components and the R component into three new components according to their average periods and RT values. Because the average period and the RT value have different units, we normalize the average period $T_j$ as follows:

$$T^{n}_j = \frac{T_j - T_{\min}}{T_{\max} - T_{\min}} \tag{7}$$

where $T^{n}_j$ denotes the normalized average period of the IMFj component, and $T_{\max}$ and $T_{\min}$ are the maximum and minimum of the average periods of all IMF components. Similarly, the RT value $R_j$ is normalized as follows:

$$R^{n}_j = \frac{R_j - R_{\min}}{R_{\max} - R_{\min}} \tag{8}$$

where $R^{n}_j$ is the normalized value of $R_j$, and $R_{\max}$ and $R_{\min}$ are the maximum and minimum of all RT values. The reconstruction factor (RF) is then defined as follows:

$$F_j = \alpha \cdot \left(1 - T^{n}_j\right) + \beta \cdot R^{n}_j \tag{9}$$

Here, α and β weight the frequency term and the volatility term. The higher the frequency and the stronger the volatility of a component, the greater its RF value. If the RF values of two IMF components are close, their overall trends are similar, so they can be reconstructed into a new component. All efficient IMF components and the R component are reconstructed into three new components: a high-frequency and strong-volatility component, a medium-frequency and weak-volatility component, and a low-frequency trend component. The high-frequency and strong-volatility component reflects the strong volatility and randomness of the high-frequency part of the original host utilization sequence; the medium-frequency and weak-volatility component shows the detailed features of its volatility; and the low-frequency trend component depicts its overall trend.

Table 1 shows the RT values, average periods, and RF values of the efficient IMF and R components. The RF values of IMF1 and IMF2 are large and relatively close, the RF values of IMF8 and R both equal 0, and the RF values of IMF3–IMF5 are close to one another. Therefore, we reconstruct IMF1–IMF2, IMF3–IMF5, and IMF8–R into three new components, as shown in Figures 5(a)–5(c). They reflect, respectively, the randomness, the fluctuation details, and the overall trend of the original host utilization sequence.
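The normalization and reconstruction-factor steps (Equations (7)–(9)) can be checked against Table 1. The text does not state α and β. In this NumPy sketch we assume α = β = 0.5, which reproduces the RF row of Table 1 to two decimals. We also assume the min-max normalization runs over all seven efficient components, including R, and the three groups are the ones chosen in the text rather than the output of an automatic grouping rule. `reconstruct` is a hypothetical helper that sums the members of each group.

```python
import numpy as np

# Efficient components of the running example (Table 1).
names = ["IMF1", "IMF2", "IMF3", "IMF4", "IMF5", "IMF8", "R"]
rt = np.array([431, 181, 96, 47, 21, 5, 2], dtype=float)
period = np.array([2.12, 8.31, 16.41, 30.59, 67.3, 673.0, 673.0])


def minmax(v: np.ndarray) -> np.ndarray:
    """Equations (7) and (8): min-max normalization to [0, 1]."""
    return (v - v.min()) / (v.max() - v.min())


def reconstruction_factor(period, rt, alpha=0.5, beta=0.5):
    """Equation (9): F_j = alpha * (1 - T^n_j) + beta * R^n_j (alpha, beta assumed)."""
    return alpha * (1.0 - minmax(period)) + beta * minmax(rt)


def reconstruct(components, groups):
    """Sum the components named in each group into one new component."""
    return [sum(components[name] for name in group) for group in groups]


rf = reconstruction_factor(period, rt)
print([round(float(f), 2) for f in rf])  # [1.0, 0.7, 0.6, 0.53, 0.47, 0.0, 0.0]

groups = [["IMF1", "IMF2"], ["IMF3", "IMF4", "IMF5"], ["IMF8", "R"]]
```

With the components stored as arrays in a dictionary keyed by name, `reconstruct(components, groups)` returns the high-frequency, medium-frequency, and low-frequency trend components in that order.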

4.4. Use of the ARIMA Model to Predict the Future Host Utilization. We use the ARIMA model to predict the future values of each new component, and the overall prediction result is obtained by superposing the prediction results of all new components. The ARIMA prediction procedure is described in Algorithm 1.

For example, assume that three new components $C_h$, $C_m$, and $C_l$ are obtained, representing the high-frequency and strong-volatility component, the medium-frequency and weak-volatility component, and the low-frequency trend component, respectively. We use the ARIMA method to predict the next 24 values of each new component. The prediction results $P_h$, $P_m$, and $P_l$ of the three new components, each containing 24 predicted values, are:

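$$\begin{aligned} P_h &= \left(f_h^{1}, f_h^{2}, \ldots, f_h^{24}\right) \\ P_m &= \left(f_m^{1}, f_m^{2}, \ldots, f_m^{24}\right) \\ P_l &= \left(f_l^{1}, f_l^{2}, \ldots, f_l^{24}\right) \end{aligned} \tag{10}$$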

Finally, we calculate the overall prediction result P by superposing the prediction results of the new components:

$$P = P_h + P_m + P_l \tag{11}$$

This process shows that EEMD-RT-ARIMA reduces the number of predicted components from 9 to 3. Compared with the EEMD-ARIMA method, this reduces both the total prediction time and the accumulated error of the component predictions.
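A self-contained stand-in for Algorithm 1 and Equations (10)–(11) is sketched below with NumPy only. It is deliberately simpler than the method described. It fits purely autoregressive ARI(p, d) models by ordinary least squares instead of full ARIMA models by maximum likelihood. It takes the differencing order d as an input instead of choosing it with the ADF test. It keeps only the BIC-based order selection and the final superposition. In practice, one would use a statistics library's ARIMA implementation; the function names here are ours.

```python
from typing import Sequence

import numpy as np


def _fit_ar(z: np.ndarray, p: int):
    """OLS fit of z_t = c + a_1 z_{t-1} + ... + a_p z_{t-p}; returns (coef, rss, n_eff)."""
    n_eff = len(z) - p
    lags = [z[p - i - 1 : len(z) - i - 1] for i in range(p)]   # column i holds lag i+1
    X = np.column_stack([np.ones(n_eff)] + lags)
    target = z[p:]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    rss = float(np.sum((target - X @ coef) ** 2))
    return coef, rss, n_eff


def forecast_component(y: Sequence[float], steps: int, d: int = 0, max_p: int = 5) -> np.ndarray:
    """Forecast one component: difference d times, pick the AR order p by BIC,
    forecast recursively, then undo the differencing."""
    levels = [np.asarray(y, dtype=float)]
    for _ in range(d):
        levels.append(np.diff(levels[-1]))
    z = levels[-1]

    best = None
    for p in range(1, max_p + 1):
        coef, rss, n_eff = _fit_ar(z, p)
        # Gaussian BIC; the tiny constant guards against log(0) on exact fits.
        bic = n_eff * np.log(rss / n_eff + 1e-12) + (p + 1) * np.log(n_eff)
        if best is None or bic < best[0]:
            best = (bic, p, coef)
    _, p, coef = best

    history = list(z[-p:])
    preds = []
    for _ in range(steps):
        nxt = coef[0] + sum(coef[i + 1] * history[-i - 1] for i in range(p))
        preds.append(nxt)
        history.append(nxt)
    out = np.array(preds)
    for level in reversed(levels[:-1]):      # integrate back, one level at a time
        out = level[-1] + np.cumsum(out)
    return out


def superpose_forecasts(components, steps, diff_orders):
    """Equation (11): the overall forecast is the element-wise sum of component forecasts."""
    return sum(forecast_component(c, steps, d) for c, d in zip(components, diff_orders))


decaying = 100 * 0.9 ** np.arange(30)      # toy stand-in for a fluctuating component
trend = 3 + 2 * np.arange(30.0)            # toy stand-in for the trend component
print(forecast_component(trend, steps=3, d=1))   # approximately 63, 65, 67
print(superpose_forecasts([decaying, trend], steps=3, diff_orders=[0, 1]))
```

The `1e-12` guard only matters for toy series that an AR model fits exactly; on real utilization traces, the residual sum of squares dominates the BIC.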

Figure 3: CPU utilization trace of a physical host (x-axis: sample data, 0–700; y-axis: CPU utilization (%); the trace is split into a training set and a testing set).


5. Experimental Setup

We conduct an experiment to evaluate our method. The experimental dataset and the measurement metrics are introduced as follows.

5.1. Experimental Dataset. Host utilization mainly involves CPU utilization, memory utilization, network utilization, and disk utilization. In this paper, we focus on host CPU utilization. We randomly select the CPU utilization traces of 7 physical hosts from the dataset released by Alibaba in August 2017 [49], each of which includes 144 points (5 minutes per point). These traces are all time-dependent sequences, as shown in Figures 6(a)–6(g).

Each sequence is divided into a training set and a testing set. We first use the training set to predict future data, and we then compare these predictions with the actual data in the testing set to evaluate our method. In this paper, each training set consists of the first 120 points, and the testing set consists of the subsequent 6, 12, or 24 points. In the EEMD decomposition, we set the number of iterations k = 50 and the standard deviation d = 0.2.

5.2. Measurement Metrics. We evaluate our method in terms of error, effectiveness, and time cost as follows.

5.2.1. Error Analysis. To evaluate our method, we use the mean absolute percentage error (MAPE) to reflect prediction accuracy. MAPE is defined as follows:

$$\mathrm{MAPE} = \frac{1}{m}\sum_{i=1}^{m}\left|\frac{x_i^{f} - x_i^{t}}{x_i^{t}}\right| \times 100 \tag{12}$$

where $x_i^{f}$ denotes the value of the i-th prediction point, $x_i^{t}$ denotes the corresponding actual value in the testing set, and $m$ denotes the number of prediction points. The lower the MAPE, the higher the prediction accuracy.

Figure 4: Decomposition results of EEMD ((a)–(h): IMF1–IMF8; (i): R; x-axis: sample data, 0–700; y-axis: CPU utilization (%)).

Table 1: RT values, average periods, and RF values of the efficient IMF and R components.

| | IMF1 | IMF2 | IMF3 | IMF4 | IMF5 | IMF8 | R |
|---|---|---|---|---|---|---|---|
| RT | 431 | 181 | 96 | 47 | 21 | 5 | 2 |
| Average period | 2.12 | 8.31 | 16.41 | 30.59 | 67.3 | 673 | 673 |
| RF | 1 | 0.70 | 0.60 | 0.53 | 0.47 | 0 | 0 |
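The MAPE of Equation (12) is straightforward to compute; a minimal Python sketch follows (the name `mape` is ours). MAPE is undefined when an actual value is zero, and this sketch reports that case as an error rather than silently skipping the point.

```python
from typing import Sequence


def mape(forecast: Sequence[float], actual: Sequence[float]) -> float:
    """Equation (12): mean absolute percentage error, in percent."""
    if len(forecast) != len(actual) or len(actual) == 0:
        raise ValueError("forecast and actual must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((f - a) / a) for f, a in zip(forecast, actual))
    return total / len(actual) * 100


print(mape([11.0, 18.0], [10.0, 20.0]))  # both points are off by 10%
```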

5.2.2. Effectiveness Analysis. Underprediction or overprediction of host utilization can lead to resource underprovisioning or overprovisioning. Resource underprovisioning cannot guarantee applications' QoS, while resource overprovisioning causes resource waste and low resource utilization. Therefore, a good prediction method should avoid both underprediction and overprediction. In particular, underprediction should be avoided as much as possible because it degrades the QoS experienced by users.

We define positive and negative errors to reflect overprediction and underprediction, respectively, and use them to evaluate the effectiveness of our method. A good prediction method should have a small negative error to avoid underprediction. The positive and negative prediction errors are calculated by the following formula, where $p_i$ is the predicted value, $r_i$ is the actual value, and $m$ is the number of underpredicted points (i.e., negative deviations) or overpredicted points (i.e., positive deviations):

$$e = \frac{1}{m}\sum_{i=1}^{m}\left|p_i - r_i\right| \tag{13}$$

5.2.3. Time Cost Analysis. Host utilization varies very quickly in a cloud data center. If host utilization prediction finishes later than the point at which VM migration must be decided, resource provisioning will be delayed, which can cause poor QoS. Thus, host utilization prediction must be completed in a timely manner. To investigate the time cost of our proposed method, we measure the running time of the EEMD-RT-ARIMA method and compare it with that of other prediction methods using the following index $t_c$:

$$t_c = \frac{t_{our} - t_{other}}{t_{other}} \times 100 \tag{14}$$

where $t_{our}$ is the running time of our EEMD-RT-ARIMA method, $t_{other}$ is the running time of another method, such as the ARIMA model or the EEMD-ARIMA method, and $t_c$ is the percentage by which the time cost is reduced (negative values) or increased (positive values).
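Equation (14), together with averaging the running time over repeated executions as described in Section 6, can be sketched as follows. `mean_running_time` is a hypothetical helper built on Python's `time.perf_counter`. The default of five repeats mirrors the experimental setup, and the callables being timed would be the prediction methods.

```python
import time
from typing import Callable


def mean_running_time(fn: Callable[[], object], repeats: int = 5) -> float:
    """Average wall-clock time of fn() over `repeats` runs, in seconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


def time_cost_index(t_our: float, t_other: float) -> float:
    """Equation (14): percentage change of our running time relative to another
    method; negative values mean our method is faster."""
    return (t_our - t_other) / t_other * 100


print(time_cost_index(3.0, 4.0))   # -25.0: 25% less time
print(time_cost_index(6.0, 4.0))   # 50.0: 50% more time
```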

6. Experimental Results and Analysis

To validate the prediction effectiveness of our EEMD-RT-ARIMA method, we conduct experiments with the ARIMA, EEMD-ARIMA, and EEMD-RT-ARIMA methods and compare their prediction results. All experiments were performed on a PC with a 2.5 GHz Intel(R) i7 CPU running MATLAB. To make the three methods comparable, we run each method 5 times on the same original dataset. The mean values of the prediction results are shown in the following tables and figures.

6.1. Error Analysis. Table 2 shows the MAPE values of host utilization predictions for 7 physical hosts. The EEMD-ARIMA and EEMD-RT-ARIMA methods have lower MAPE values than the ARIMA model for 6-point and 12-point predictions. For example, the EEMD-ARIMA and EEMD-RT-ARIMA methods achieve MAPE values of 6.06% and 5.05% for the 6-point prediction of host 109, while the ARIMA model has a far higher MAPE value of 16.85%. They obtain MAPE values of 10.13% and 5.46% for the 12-point prediction of host 109, while the ARIMA model obtains 11.08%. For host 22, the EEMD-ARIMA and EEMD-RT-ARIMA methods obtain MAPE values of 5.31% and 5.42% for 6-point prediction, far lower than the 10.66% of the ARIMA model, and they also perform better on 12-point prediction. The same holds for the 6-point and 12-point predictions of the other hosts. This indicates that both the EEMD-ARIMA and EEMD-RT-ARIMA methods predict host utilization more accurately than the ARIMA model for 6-point and 12-point predictions, because EEMD reduces the inherent volatility of the host utilization sequence. However, the situation changes for 24-point prediction: the 24-point MAPE values of hosts 1162, 424, 1060, and 237 exceed 30% in 10 of the 12 cases across the three methods. Although the EEMD-RT-ARIMA method has lower 24-point MAPE values than the ARIMA and EEMD-ARIMA methods for hosts 839 and 109, these values are much higher than its MAPE values for 6-point and 12-point predictions. This shows that the EEMD-ARIMA and EEMD-RT-ARIMA methods are suitable for short-term rather than long-term prediction.

Figure 5: New components: (a) high-frequency and strong-volatility component; (b) medium-frequency and weak-volatility component; (c) low-frequency trend component. Each panel plots CPU utilization (%) over time (0–700).
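The comparisons above rest on MAPE. As a minimal sketch, assuming the standard definition MAPE = (100/n) Σ |(y_t − ŷ_t)/y_t|, the metric can be computed from a test window and a forecast as follows; the function name `mape` and the sample values are hypothetical and are not taken from the authors' MATLAB code or data.

```python
def mape(actual, predicted):
    """Mean absolute percentage error in percent (standard definition, assumed here)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    if any(y == 0 for y in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((y - p) / y) for y, p in zip(actual, predicted)) / len(actual)


# Hypothetical 6-point test window (CPU utilization in %) and two forecasts.
test = [10.0, 9.5, 9.0, 8.8, 8.5, 8.0]
flat = [10.0] * 6                          # ignores the downward trend
tracking = [9.8, 9.4, 9.1, 8.7, 8.6, 8.1]  # follows the trend

print(round(mape(test, flat), 2))
print(round(mape(test, tracking), 2))
```

Because every error is divided by the actual value, MAPE penalizes a given absolute error more on a lightly loaded host than on a busy one.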

For further analysis, we find that the EEMD-RT-ARIMA method achieves lower prediction error than the EEMD-ARIMA method for the 6-point and 12-point predictions of hosts 839, 109, and 1162, even though the EEMD-RT-ARIMA method selects only the efficient IMF components. However, the opposite holds for hosts 22, 424, 1060, and 237. Since the original CPU utilization sequences of all physical hosts are sampled at the same frequency, we calculate the RT value of each CPU utilization sequence, as shown in Table 3. Hosts 839, 109, and 1162 have low RT values of 10 or less, which shows that their CPU utilization is more stationary than that of the other hosts: smaller RT values indicate more stationary host utilization sequences. This phenomenon can also be seen in Figures 6(a)–6(c). From Tables 2 and 3, the EEMD-RT-ARIMA method achieves a lower MAPE value than the EEMD-ARIMA method when the RT value is smaller and, conversely, a higher MAPE value when the RT value is larger. For instance, hosts 839, 109, and 1162, with smaller RT values, obtain lower MAPE values using the EEMD-RT-ARIMA method than using the EEMD-ARIMA method, while hosts 22, 424, 1060, and 237, with larger RT values, obtain higher MAPE values using the EEMD-RT-ARIMA method than using the EEMD-ARIMA method.

Figure 6: CPU utilization traces of 7 physical hosts: (a) host 839; (b) host 109; (c) host 1162; (d) host 22; (e) host 424; (f) host 1060; (g) host 237. Each panel plots CPU utilization (%) on a 0–60 scale.

Algorithm 1: ARIMA prediction.
(1) For each new component:
(2)     Set the order of difference d = 0.
(3)     Execute the augmented Dickey-Fuller (ADF) test. If the series is stationary, go to step 5; otherwise, go to step 4.
(4)     Difference the time series, set d = d + 1, and go to step 3.
(5)     Determine the order of the ARIMA model using the Bayesian information criterion (BIC).
(6)     Estimate the parameters of the ARIMA model using maximum likelihood.
(7)     Forecast the future n values of this new component using the ARIMA model.
(8) End
(9) Obtain the overall prediction results by superposing the prediction results of each new component.
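Algorithm 1 relies on the ADF test, BIC-based order selection, and maximum-likelihood ARIMA estimation. To illustrate the same control flow without external libraries (difference until stationary, pick an order by BIC, forecast each component, superpose), the Python sketch below makes simplifying assumptions that should not be read as the authors' implementation: a lag-1 autocorrelation threshold stands in for the ADF test, only the autoregressive part is fitted (ARIMA(p, d, 0) by least squares instead of full maximum likelihood), and all function names and thresholds are hypothetical.

```python
import math


def lag1_autocorr(x):
    """Lag-1 sample autocorrelation; values near 1 hint at a trend or unit root."""
    m = sum(x) / len(x)
    den = sum((v - m) ** 2 for v in x)
    if den == 0:
        return 0.0
    return sum((x[t] - m) * (x[t - 1] - m) for t in range(1, len(x))) / den


def solve(a, b):
    """Solve a @ beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular normal equations")
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        beta[r] = (m[r][n] - sum(m[r][c] * beta[c] for c in range(r + 1, n))) / m[r][r]
    return beta


def fit_ar(x, p, start):
    """Least-squares AR(p) with intercept on targets x[start:]; returns (beta, BIC)."""
    rows = [[1.0] + [x[t - k] for k in range(1, p + 1)] for t in range(start, len(x))]
    y = x[start:]
    k = p + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    rss = sum((v - sum(b * c for b, c in zip(beta, r))) ** 2 for r, v in zip(rows, y))
    n = len(y)
    return beta, n * math.log(max(rss, 1e-12) / n) + k * math.log(n)


def forecast_component(x, n_ahead, max_p=3, max_d=2, rho_max=0.9):
    """Steps 2-7 of Algorithm 1 for one component (simplified ARIMA(p, d, 0))."""
    series, tails = list(x), []
    # Steps 2-4: difference while the crude check (stand-in for ADF) flags a trend.
    while len(tails) < max_d and lag1_autocorr(series) > rho_max:
        tails.append(series[-1])
        series = [b - a for a, b in zip(series, series[1:])]
    # Steps 5-6: pick p by BIC; every order is fitted on the same target window.
    fits = []
    for p in range(max_p + 1):
        try:
            fits.append(fit_ar(series, p, max_p))
        except ValueError:
            continue  # skip orders with (near-)singular normal equations
    beta = min(fits, key=lambda fb: fb[1])[0]
    p = len(beta) - 1
    # Step 7: recursive multi-step forecast on the differenced scale.
    hist, out = list(series), []
    for _ in range(n_ahead):
        nxt = beta[0] + sum(beta[k] * hist[-k] for k in range(1, p + 1))
        hist.append(nxt)
        out.append(nxt)
    # Undo the differencing, innermost level first.
    for last in reversed(tails):
        level, undone = last, []
        for v in out:
            level += v
            undone.append(level)
        out = undone
    return out


def ensemble_forecast(components, n_ahead):
    """Step 9: superpose the forecasts of all reconstructed components."""
    per_component = [forecast_component(c, n_ahead) for c in components]
    return [sum(vals) for vals in zip(*per_component)]


if __name__ == "__main__":
    # Hypothetical components: a linear trend plus a 12-sample oscillation.
    trend = [10 + 0.1 * t for t in range(100)]
    wave = [math.sin(2 * math.pi * t / 12) for t in range(100)]
    print(ensemble_forecast([trend, wave], 6))
```

Fitting every candidate order on the same target window keeps the BIC values comparable; a production version would use a proper ADF test and a full ARIMA estimator instead.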

Furthermore, the MAPE advantage of the EEMD-RT-ARIMA method over the EEMD-ARIMA method (the EEMD-ARIMA MAPE minus the EEMD-RT-ARIMA MAPE) decreases as the RT value increases from host 839 to host 1162. The advantage turns negative from host 22 onward, which indicates that the EEMD-ARIMA method is then more accurate than the EEMD-RT-ARIMA method, and its magnitude grows as the RT values increase further. The 12-point host CPU utilization prediction illustrates this situation. For example, host 839, with an RT value of 2, has a MAPE of 5.03% for 12-point prediction using the EEMD-RT-ARIMA method, which is 5.45 percentage points lower than the 10.48% of the EEMD-ARIMA method. For host 1162, with an RT value of 10, the MAPE of the EEMD-RT-ARIMA method is only 4.19 percentage points lower than that of the EEMD-ARIMA method. For host 22, with an RT value of 16, the EEMD-RT-ARIMA method has a slightly higher MAPE (5.37%) than the EEMD-ARIMA method (5.33%). As the RT value increases further, the MAPE of the EEMD-RT-ARIMA method exceeds that of the EEMD-ARIMA method by 0.87, 1.96, and 4.63 percentage points for hosts 424, 1060, and 237, respectively. This indicates that the EEMD-RT-ARIMA method is less effective than the EEMD-ARIMA method in CPU utilization prediction for these hosts.

Inevitably, the ARIMA prediction of each component decomposed by the EEMD method generates some error, and superposing the prediction results of all components causes error accumulation. The EEMD-RT-ARIMA method reduces this error accumulation by selecting the efficient IMF components and reconstructing them into fewer components, so it achieves better prediction accuracy than the EEMD-ARIMA method for hosts 839, 109, and 1162. However, the selection and reconstruction of efficient IMF components also cause some prediction error because the nonefficient components are discarded, especially for nonstationary host utilization sequences. When this error exceeds the error accumulated by the ARIMA predictions of all components in the EEMD-ARIMA method, the EEMD-RT-ARIMA method is no longer more effective than the EEMD-ARIMA method, as for the nonstationary CPU utilization of hosts 22, 424, 1060, and 237.
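The RT trend described above can be checked directly against the published numbers. The short script below takes the 12-point MAPE values from Table 2 and the RT values from Table 3, orders the hosts by RT value, and prints the MAPE gap (EEMD-ARIMA minus EEMD-RT-ARIMA) in percentage points. It only restates the tables; it does not rerun any prediction.

```python
# 12-point MAPE values (%) from Table 2 and RT values from Table 3.
rt = {"839": 2, "109": 6, "1162": 10, "22": 16, "424": 22, "1060": 24, "237": 38}
mape_12 = {  # host: (EEMD-ARIMA, EEMD-RT-ARIMA)
    "839": (10.48, 5.03),
    "109": (10.13, 5.46),
    "1162": (13.14, 8.95),
    "22": (5.33, 5.37),
    "424": (16.59, 17.46),
    "1060": (17.44, 19.40),
    "237": (11.30, 15.93),
}

# Positive gap: EEMD-RT-ARIMA is more accurate; negative gap: EEMD-ARIMA is.
gaps = []
for host in sorted(rt, key=rt.get):
    eemd, eemd_rt = mape_12[host]
    gap = round(eemd - eemd_rt, 2)
    gaps.append(gap)
    print(f"host {host:>4}  RT={rt[host]:>2}  gap={gap:+.2f} pp")
```

The gap shrinks monotonically from 5.45 to −4.63 percentage points as the RT value grows and changes sign between host 1162 (RT = 10) and host 22 (RT = 16).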

6.2. Effectiveness Analysis. To verify the effectiveness of our method in short-term prediction, we select the experimental results of hosts 839, 22, and 237, which have the minimum, middle, and maximum RT values, for further analysis. Figure 7 shows the prediction results of the EEMD-RT-ARIMA, ARIMA, and EEMD-ARIMA methods. The future resource utilization of host 839 decreases below 11%; according to a predefined policy, host 839 is therefore underloaded and can be shut down to save energy. Figure 7 shows that our method is more accurate and effective than the ARIMA model. In particular, our method follows the trend of data variation, while the ARIMA model cannot keep up with it, so our method is more suitable for handling nonstationary time series than the ARIMA model. Additionally, for host 839, the data predicted by the EEMD-RT-ARIMA method are closer to the testing data than those of the EEMD-ARIMA method. These results show that the EEMD-RT-ARIMA method is more effective than the EEMD-ARIMA method for CPU utilization sequences with weak fluctuations. When the host utilization sequence fluctuates more strongly, the absence of the nonefficient IMF components greatly influences the prediction results, and the EEMD-RT-ARIMA method is no longer more effective than the EEMD-ARIMA method, as for the CPU utilization prediction of host 237.

Table 2: MAPE values of host utilization prediction (%).

Host ID | Prediction length | ARIMA | EEMD-ARIMA | EEMD-RT-ARIMA
839 | 6-point | 9.77 | 4.73 | 4.61
839 | 12-point | 13.78 | 10.48 | 5.03
839 | 24-point | 23.10 | 21.53 | 8.82
109 | 6-point | 16.85 | 6.06 | 5.05
109 | 12-point | 11.08 | 10.13 | 5.46
109 | 24-point | 41.34 | 24.68 | 8.36
1162 | 6-point | 24.45 | 7.76 | 6.45
1162 | 12-point | 17.81 | 13.14 | 8.95
1162 | 24-point | 21.93 | 41.41 | 33.45
22 | 6-point | 10.66 | 5.31 | 5.42
22 | 12-point | 8.22 | 5.33 | 5.37
22 | 24-point | 11.41 | 18.44 | 14.77
424 | 6-point | 37.54 | 16.12 | 17.65
424 | 12-point | 21.74 | 16.59 | 17.46
424 | 24-point | 88.68 | 77.39 | 11.57
1060 | 6-point | 35.71 | 10.05 | 13.94
1060 | 12-point | 37.60 | 17.44 | 19.40
1060 | 24-point | 74.72 | 54.82 | 91.93
237 | 6-point | 22.29 | 7.85 | 11.78
237 | 12-point | 25.51 | 11.30 | 15.93
237 | 24-point | 126.11 | 158.43 | 189.51

Table 3: RT values of each host utilization sequence.

Host ID | 839 | 109 | 1162 | 22 | 424 | 1060 | 237
RT value | 2 | 6 | 10 | 16 | 22 | 24 | 38
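Table 3 summarizes each trace by its RT value. As an illustration of how such a value can be obtained, the sketch below counts runs in the way a runs test usually does: every sample is coded as above or below the sequence mean, and each maximal block of equal codes is one run. The coding rule, the function name `run_count`, and the example traces are assumptions for illustration; the values in Table 3 come from the authors' own procedure.

```python
def run_count(x):
    """Number of runs when each sample is coded as above (+) or below (-) the mean.

    Samples exactly equal to the mean are dropped, a common runs-test convention
    (assumed here; the paper's exact coding rule may differ).
    """
    m = sum(x) / len(x)
    signs = [v > m for v in x if v != m]
    if not signs:
        return 0
    return 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)


# Hypothetical traces (not the Alibaba data): a slow drift and a bursty series.
slow = [10 + 0.05 * t for t in range(40)]
bursty = [10 + (3 if t % 2 else -3) for t in range(40)]
print(run_count(slow), run_count(bursty))
```

A slowly varying trace crosses its mean rarely and yields few runs, whereas a bursty trace that crosses its mean at almost every sample yields many, so fewer runs correspond to smoother, lower-frequency behaviour.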

Figure 7: Prediction results of different methods: (a) host 839, 6-point prediction; (b) host 839, 12-point prediction; (c) host 22, 6-point prediction; (d) host 22, 12-point prediction; (e) host 237, 6-point prediction; (f) host 237, 12-point prediction. Each panel compares the CPU utilization (%) predicted by ARIMA, EEMD-ARIMA, and EEMD-RT-ARIMA with the testing data.

To further analyze the effectiveness of our method, we calculate the positive and negative errors for the 6-point and 12-point predictions of these hosts, as shown in Table 4. A prediction method with a smaller negative error is more suitable for cloud resource provisioning because it avoids underprediction. Most of the prediction results of the ARIMA model are underpredicted (the positive error cells are all "Null" for hosts 839 and 22). Furthermore, for host 237, the negative errors of the ARIMA model are far higher than those of the other methods. For instance, the ARIMA model has a negative error of up to 27.51% for the 12-point prediction of host 237, while the EEMD-ARIMA and EEMD-RT-ARIMA methods have negative errors of only 8.00% and 8.92%, respectively. If the ARIMA model is used to predict future host utilization, it can cause resource underprovisioning, which cannot ensure the applications' QoS. The EEMD-RT-ARIMA method achieves smaller negative errors than the EEMD-ARIMA method for hosts 839 and 22, while it has larger negative errors for host 237. For instance, the EEMD-ARIMA method has a negative error of 10.71% for the 12-point prediction of host 839, whereas the EEMD-RT-ARIMA method has a negative error of only 4.74%. Similarly, the EEMD-ARIMA method obtains a negative error of 6.09% for the 12-point prediction of host 22, while the EEMD-RT-ARIMA method achieves a lower value of 5.05%. However, the situation changes for host 237, where the EEMD-ARIMA method has smaller negative errors than the EEMD-RT-ARIMA method. For instance, the EEMD-RT-ARIMA method obtains negative errors of 11.37% and 8.92% for 6-point and 12-point predictions, while the EEMD-ARIMA method has negative errors of only 4.62% and 8.00%.
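This section does not restate how the two errors in Table 4 are computed. The sketch below assumes one plausible definition: the positive (negative) error is the mean absolute percentage error over the points that are over- (under-) predicted, and a side with no such points is reported as None, like the "Null" cells in Table 4. The function name and data are hypothetical.

```python
def signed_errors(actual, predicted):
    """Split absolute percentage errors into over- and underprediction parts.

    Assumed definitions (hypothetical, for illustration):
      positive error = mean APE over points where predicted > actual (overprediction)
      negative error = mean APE over points where predicted < actual (underprediction)
    A side with no qualifying points is reported as None, mirroring "Null" in Table 4.
    Actual values must be nonzero.
    """
    over, under = [], []
    for y, p in zip(actual, predicted):
        ape = 100.0 * abs(p - y) / y
        if p > y:
            over.append(ape)
        elif p < y:
            under.append(ape)
    pos = sum(over) / len(over) if over else None
    neg = sum(under) / len(under) if under else None
    return pos, neg


test = [10.0, 10.0, 10.0, 10.0]
print(signed_errors(test, [9.0, 9.5, 9.0, 9.5]))    # every point underpredicted
print(signed_errors(test, [12.0, 9.0, 10.5, 9.5]))  # mixed over- and underprediction
```

Keeping the two sides separate matters for provisioning: an underpredicting model can look acceptable by MAPE alone while still starving applications of resources.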

6.3. Time-Cost Analysis. To verify the applicability of our method, we further compare the time cost of these methods in Figure 8. The EEMD-ARIMA method has the largest running time, over 180 s, while the ARIMA model takes the least time, less than 50 s. The time cost of the EEMD-RT-ARIMA method is between about 70 s and 117 s, which is 40–80% lower than that of the EEMD-ARIMA method. For example, for the 6-point prediction of host 22, the running time of the EEMD-RT-ARIMA method is 69.37 s, far less than the 337.20 s of the EEMD-ARIMA method, so our method saves up to 80% of the time cost. For the strongly varying CPU utilization sequence of host 237, the EEMD-ARIMA method requires 190.46 s to predict the future 6-point values, while the EEMD-RT-ARIMA method takes only 113.64 s, a reduction of approximately 40%. Considering prediction accuracy, effectiveness, and time cost, our EEMD-RT-ARIMA method is more cost-effective for short-term host utilization prediction in cloud computing.
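The equation defining t_c is given earlier in the paper and is not reproduced in this section. Assuming it takes the relative form t_c = (t_other − t_our)/t_other × 100%, which is consistent with the percentages quoted above, the following sketch recomputes the two savings from the reported running times; the function name is hypothetical.

```python
def time_cost_change(t_our, t_other):
    """Percentage of time cost saved (positive) or added (negative) relative to t_other.

    Assumed form: t_c = (t_other - t_our) / t_other * 100.
    """
    return (t_other - t_our) / t_other * 100.0


# Running times (s) reported in Section 6.3 for 6-point predictions.
print(round(time_cost_change(69.37, 337.20), 1))   # host 22, EEMD-RT-ARIMA vs. EEMD-ARIMA
print(round(time_cost_change(113.64, 190.46), 1))  # host 237, EEMD-RT-ARIMA vs. EEMD-ARIMA
```

This gives about 79.4% for host 22 and 40.3% for host 237, in line with the quoted savings of up to 80% and approximately 40%.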

7. Conclusions

Host utilization is an indicator of host performance, and its prediction can promote effective resource scheduling in cloud computing. However, host utilization exhibits strong randomness and instability caused by users' random and varied resource demands, which makes it difficult to improve prediction accuracy. In this paper, we propose a hybrid and cost-effective method, EEMD-RT-ARIMA, for short-term host utilization prediction in cloud computing. The EEMD method is first used to decompose the nonstationary host utilization sequence into a few relatively stationary IMF components and an R component. Then, we calculate the correlation coefficient between each IMF component and the original data to select the efficient IMF components, and we use RT values and average periods to reconstruct these components into three new components to reduce error accumulation and time cost. Finally, the three new components are predicted by the ARIMA model, and their prediction results are superposed to form the overall prediction results. We use real host utilization traces from a cloud platform to conduct the experiments and compare our EEMD-RT-ARIMA method with the ARIMA model and the EEMD-ARIMA method in terms of error, effectiveness, and time cost. The results show that our method is cost-effective and more suitable for short-term host utilization prediction in cloud computing.
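The component-selection step summarized above can be sketched as follows. The code computes the Pearson correlation between each IMF and the original sequence and keeps the IMFs whose absolute correlation reaches a threshold. The threshold of 0.1, the function names, and the synthetic components are hypothetical, and the subsequent regrouping by RT values and average periods into three new components is not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant sequence carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def select_efficient_imfs(original, imfs, threshold=0.1):
    """Indices of IMFs whose |correlation| with the original series reaches threshold.

    The default threshold of 0.1 is a hypothetical placeholder, not the paper's setting.
    """
    return [i for i, imf in enumerate(imfs) if abs(pearson(imf, original)) >= threshold]


# Hypothetical example: a dominant 50-sample cycle plus a tiny 10-sample ripple.
n = 200
dominant = [5.0 * math.sin(2 * math.pi * t / 50) for t in range(n)]
ripple = [0.01 * math.sin(2 * math.pi * t / 10) for t in range(n)]
original = [a + b for a, b in zip(dominant, ripple)]
print(select_efficient_imfs(original, [dominant, ripple]))
```

The ripple contributes almost nothing to the variance of the original sequence, so its correlation is close to zero and it is discarded, while the dominant cycle is kept.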

Data Availability

The running example and experimental data used to support the findings of this study have been deposited in the Figshare repository (https://doi.org/10.6084/m9.figshare.7679594).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the Shandong Provincial Natural Science Foundation (ZR2016FM41).

Table 4: Positive and negative error analysis (%).

Host ID | Prediction length | ARIMA positive error | ARIMA negative error | EEMD-ARIMA positive error | EEMD-ARIMA negative error | EEMD-RT-ARIMA positive error | EEMD-RT-ARIMA negative error
839 | 6-point | Null | 9.77 | 3.16 | 5.85 | 4.92 | 4.15
839 | 12-point | Null | 13.78 | 3.43 | 10.71 | 4.80 | 4.74
22 | 6-point | Null | 10.66 | 3.91 | 6.70 | 4.12 | 6.34
22 | 12-point | Null | 9.72 | 4.26 | 6.09 | 5.39 | 5.05
237 | 6-point | 0.39 | 26.68 | 14.31 | 4.62 | 13.81 | 11.37
237 | 12-point | 0.39 | 27.51 | 17.44 | 8.00 | 17.90 | 8.92

Figure 8: Time cost of different prediction methods (ARIMA, EEMD-ARIMA, and EEMD-RT-ARIMA) for the 6-point and 12-point predictions of hosts 839, 22, and 237. The y-axis shows the time cost (s), from 0 to 350.


References

[1] J. J. Prevost, K. Nagothu, B. Kelley, and M. Jamshidi, "Prediction of cloud data center networks loads using stochastic and neural models," in Proceedings of 2011 6th International Conference on System of Systems Engineering, pp. 276–281, IEEE, Albuquerque, NM, USA, June 2011.

[2] M. Borkowski, S. Schulte, and C. Hochreiner, "Predicting cloud resource utilization," in Proceedings of 9th International Conference on Utility and Cloud Computing (UCC), pp. 37–42, IEEE, Shanghai, China, December 2016.

[3] M. Barati and S. Sharifian, "A hybrid heuristic-based tuned support vector regression model for cloud load prediction," Journal of Supercomputing, vol. 71, no. 11, pp. 4235–4259, 2015.

[4] Z. Chen, Y. Zhu, Y. Di, S. Feng, and J. Geng, "A high-accuracy self-adaptive resource demands predicting method in IaaS cloud environment," Neural Network World, vol. 25, no. 5, pp. 519–540, 2015.

[5] J. Chen and Y. Wang, "A resource demand prediction method based on EEMD in cloud computing," Procedia Computer Science, vol. 131, pp. 116–123, 2018.

[6] H. Toumi, Z. Brahmi, Z. Benarfa, and M. M. Gammoudi, "Server load prediction using stream mining," in Proceedings of 2017 International Conference on Information Networking (ICOIN), pp. 653–661, IEEE, Da Nang, Vietnam, January 2017.

[7] S. Di, D. Kondo, and W. Cirne, "Host load prediction in a Google compute cloud with a Bayesian model," in Proceedings of 2012 International Conference on High Performance Computing, Networking, Storage and Analysis (SC'12), pp. 1–11, IEEE, Salt Lake City, UT, USA, November 2012.

[8] N. K. Gondhi and P. Kailu, "Prediction based energy efficient virtual machine consolidation in cloud computing," in Proceedings of 2015 Second International Conference on Advances in Computing and Communication Engineering, pp. 437–441, IEEE, Dehradun, India, May 2015.

[9] A. Verma, G. Dasgupta, T. K. Nayak, P. De, and R. Kothari, "Server workload analysis for power minimization using consolidation," in Proceedings of the 2009 Conference on USENIX Annual Technical Conference, p. 28, USENIX Association, San Diego, CA, USA, June 2009.

[10] B. Song, Y. Yu, Y. Zhou, Z. Wang, and S. Du, "Host load prediction with long short-term memory in cloud computing," Journal of Supercomputing, vol. 74, no. 12, pp. 6554–6568, 2018.

[11] J.-J. Jheng, F.-H. Tseng, H.-C. Chao, and L.-D. Chou, "A novel VM workload prediction using grey forecasting model in cloud data center," in Proceedings of International Conference on Information Networking 2014 (ICOIN2014), pp. 40–45, IEEE, Phuket, Thailand, February 2014.

[12] A. Beloglazov and R. Buyya, "Managing overloaded hosts for dynamic consolidation of virtual machines in cloud data centers under quality of service constraints," IEEE Transactions on Parallel and Distributed Systems, vol. 24, no. 7, pp. 1366–1379, 2013.

[13] M. Dabbagh, B. Hamdaoui, M. Guizani, and A. Rayes, "An energy-efficient VM prediction and migration framework for overcommitted clouds," IEEE Transactions on Cloud Computing, vol. 6, no. 4, pp. 955–966, 2018.

[14] D. Minarolli, A. Mazrekaj, and B. Freisleben, "Tackling uncertainty in long-term predictions for host overload and underload detection in cloud computing," Journal of Cloud Computing, vol. 6, no. 1, p. 4, 2017.

[15] K. Mason, M. Duggan, E. Barrett, J. Duggan, and E. Howley, "Predicting host CPU utilization in the cloud using evolutionary neural networks," Future Generation Computer Systems, vol. 86, pp. 162–173, 2018.

[16] D. Magalhães, R. N. Calheiros, R. Buyya, and D. G. Gomes, "Workload modeling for resource usage analysis and simulation in cloud computing," Computers & Electrical Engineering, vol. 47, pp. 69–81, 2015.

[17] C. Tian, Y. Wang, Y. Luo et al., "Minimizing content reorganization and tolerating imperfect workload prediction for cloud-based video-on-demand services," IEEE Transactions on Services Computing, vol. 9, no. 6, pp. 926–939, 2016.

[18] M. Verma, G. R. Gangadharan, V. Ravi, and N. Narendra, "Resource demand prediction in multi-tenant service clouds," in Proceedings of 2013 IEEE International Conference on Cloud Computing in Emerging Markets (CCEM), pp. 1–8, IEEE, Bangalore, India, October 2013.

[19] W. Zhang, Y. Shi, L. Liu, L. Cui, and Y. Zheng, "Performance and resource prediction at high utilization for n-tier service systems in cloud: an experiment driven approach," in Proceedings of 2015 IEEE International Conference on Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing, pp. 843–848, IEEE, Liverpool, UK, October 2015.

[20] G. Kecskemeti, A. Kertesz, and Z. Nemeth, "Cloud workload prediction by means of simulations," in Proceedings of the Computing Frontiers Conference on ZZZ (CF'17), pp. 279–282, ACM, Siena, Italy, May 2017.

[21] Y. Chen and Z.-A. Jiang, "Dynamically predicting the quality of service: batch, online, and hybrid algorithms," Journal of Electrical and Computer Engineering, vol. 2017, Article ID 9547869, 10 pages, 2017.

[22] A. Khan, X. Yan, S. Tao, and N. Anerousis, "Workload characterization and prediction in the cloud: a multiple time series approach," in Proceedings of 2012 IEEE Network Operations and Management Symposium, pp. 1287–1294, IEEE, Maui, HI, USA, April 2012.

[23] A. K. Mishra, J. L. Hellerstein, W. Cirne, and C. R. Das, "Towards characterizing cloud backend workloads," ACM SIGMETRICS Performance Evaluation Review, vol. 37, no. 4, pp. 34–41, 2010.

[24] D. Gmach, J. Rolia, L. Cherkasova, and A. Kemper, "Workload analysis and demand prediction of enterprise data center applications," in Proceedings of 2007 IEEE 10th International Symposium on Workload Characterization, pp. 171–180, IEEE, Boston, MA, USA, September 2007.

[25] F.-H. Tseng, X. Wang, L.-D. Chou, H.-C. Chao, and V. C. M. Leung, "Dynamic resource prediction and allocation for cloud data center using the multiobjective genetic algorithm," IEEE Systems Journal, vol. 12, no. 2, pp. 1688–1699, 2018.

[26] G. K. Shyam and S. S. Manvi, "Virtual resource prediction in cloud environment: a Bayesian approach," Journal of Network and Computer Applications, vol. 65, pp. 144–154, 2016.

[27] Y. Lu, J. Panneerselvam, L. Liu, and Y. Wu, "RVLBPNN: a workload forecasting model for smart cloud computing," Scientific Programming, vol. 2016, Article ID 5635673, 9 pages, 2016.

[28] K. Rajaram and M. P. Malarvizhi, "Utilization based prediction model for resource provisioning," in Proceedings of 2017 International Conference on Computer, Communication and Signal Processing (ICCCSP), pp. 1–6, IEEE, Chennai, India, January 2017.


[29] L. Li and A. Zhang, "Resource demand optimization combined prediction under cloud computing environment based on IOWGA operator," International Journal of Grid and Distributed Computing, vol. 8, no. 3, pp. 77–86, 2015.

[30] D. Minarolli and B. Freisleben, "Cross-correlation prediction of resource demand for virtual machine resource allocation in clouds," in Proceedings of 2014 Sixth International Conference on Computational Intelligence, Communication Systems and Networks, pp. 119–124, IEEE, Tetova, Macedonia, May 2014.

[31] W. Zhang, P. Duan, L. T. Yang et al., "Resource requests prediction in the cloud computing environment with a deep belief network," Software: Practice and Experience, vol. 47, no. 3, pp. 473–488, 2017.

[32] H.-B. Mi, H.-M. Wang, G. Yin, D.-X. Shi, Y.-F. Zhou, and L. Yuan, "Resource on-demand reconfiguration method for virtualized data centers," Journal of Software, vol. 22, no. 9, pp. 2193–2205, 2011.

[33] R. N. Calheiros, E. Masoumi, R. Ranjan, and R. Buyya, "Workload prediction using ARIMA model and its impact on cloud applications' QoS," IEEE Transactions on Cloud Computing, vol. 3, no. 4, pp. 449–458, 2015.

[34] Y. Meng, R. Rao, X. Zhang, and P. Hong, "CRUPA: a container resource utilization prediction algorithm for auto-scaling based on time series analysis," in Proceedings of 2016 International Conference on Progress in Informatics and Computing (PIC), pp. 468–472, IEEE, Shanghai, China, December 2016.

[35] E. Dhib, N. Zangar, N. Tabbane, and K. Boussetta, "Impact of seasonal ARIMA workload prediction model on QoE for massively multiplayers online gaming," in Proceedings of 2016 5th International Conference on Multimedia Computing and Systems (ICMCS), pp. 737–741, IEEE, Marrakech, Morocco, September 2016.

[36] A. Ganapathi, Y. Chen, A. Fox, R. Katz, and D. Patterson, "Statistics-driven workload modeling for the cloud," in Proceedings of 2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010), pp. 87–92, IEEE, Long Beach, CA, USA, March 2010.

[37] V. G. Tran, V. Debusschere, and S. Bacha, "Hourly server workload forecasting up to 168 hours ahead using seasonal ARIMA model," in Proceedings of 2012 IEEE International Conference on Industrial Technology, pp. 1127–1131, IEEE, Athens, Greece, March 2012.

[38] D. Xu, S. Yang, and H. Luo, "Research on generalized fuzzy soft sets theory based combined model for demanded cloud computing resource prediction," Chinese Journal of Management Science, vol. 23, no. 5, pp. 56–64, 2015.

[39] S. Li, Y. Wang, X. Qiu, D. Wang, and L. Wang, "A workload prediction-based multi-VM provisioning mechanism in cloud computing," in Proceedings of 2013 15th Asia-Pacific Network Operations and Management Symposium (APNOMS), pp. 1–6, IEEE, Hiroshima, Japan, September 2013.

[40] X. Fu and C. Zhou, "Predicted affinity based virtual machine placement in cloud computing environments," IEEE Transactions on Cloud Computing, vol. 99, p. 1, 2017.

[41] Y. Jiang, C. Perng, T. Li, and R. Chang, "ASAP: a self-adaptive prediction system for instant cloud resource demand provisioning," in Proceedings of 2011 IEEE 11th International Conference on Data Mining, pp. 1104–1109, IEEE, Vancouver, BC, Canada, December 2011.

[42] Z. Wu and N. E. Huang, "Ensemble empirical mode decomposition: a noise-assisted data analysis method," Advances in Adaptive Data Analysis, vol. 1, no. 1, pp. 1–41, 2009.

[43] H. Zang, L. Fan, M. Guo, Z. Wei, G. Sun, and L. Zhang, "Short-term wind power interval forecasting based on an EEMD-RT-RVM model," Advances in Meteorology, vol. 2016, Article ID 8760780, 10 pages, 2016.

[44] N. Safari, C. Y. Chung, and G. C. D. Price, "Novel multi-step short-term wind power prediction framework based on chaotic time series analysis and singular spectrum analysis," IEEE Transactions on Power Systems, vol. 33, no. 1, pp. 590–601, 2018.

[45] X. Chen, H. Wang, J. Huang, and H. Ren, "APU degradation prediction based on EEMD and Gaussian process regression," in Proceedings of 2017 International Conference on Sensing, Diagnostics, Prognostics, and Control (SDPC), pp. 98–104, IEEE, Shanghai, China, August 2017.

[46] C. Yan, C. Yi, X. Wu et al., "Turbine fault trend prediction that based on EEMD and ARIMA models," Journal of Gansu Sciences, vol. 28, no. 4, pp. 100–106, 2016.

[47] M. Jin, P. Li, L. Zhang et al., "A signal feature method and its application based on EEMD, fuzzy entropy and GK clustering," ACTA Metrologica Sinica, vol. 26, no. 5, pp. 501–505, 2015.

[48] E. H. Norden, Z. Shen, R. L. Steven et al., "The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis," in Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, vol. 454, no. 1971, pp. 903–995, Royal Society, London, UK, March 1998.

[49] Alibaba cluster-trace-v2017, https://github.com/alibaba/clusterdata.


Page 3: A Hybrid Method for Short-Term Host Utilization Prediction ...downloads.hindawi.com/journals/jece/2019/2782349.pdf · terminesthe migratedVMs according to powersavings and workloadbalance.Dabbaghetal.[13]proposedaprediction

[6ndash10] VM load prediction [11 12] VM utilization pre-diction [13 14] host utilization prediction [15] web ap-plication workload prediction [16] cloud service workloadprediction [17ndash19] workflow workload prediction [20]service quality prediction [21] and workload characteriza-tion [22ndash24] Toumi et al [6] described a server loadaccording to the submitted task types and the submissionrate and applied a stream mining technique to predict serverloads Jheng et al [11] proposed a VM workload predictionmethod based on the gray forecasting model which de-termines the migrated VMs according to power savings andworkload balance Dabbagh et al [13] proposed a predictionapproach that uses Wiener filters to predict the future re-source utilization of VMs Mason et al [15] predicted hostCPU utilization for a short time using evolutionary neuralnetworks which showed a high prediction accuracy and ahigh degree of generality In this paper we focus on hostutilization prediction using EEMD and ARIMA methods tonot only improve prediction accuracy but also reduceprediction time as much as possible

From the perspective of approaches predictionmethods are usually divided into two categories One isbased on machine learning methods Tseng et al [25]proposed a prediction method for CPU and memoryutilization of VMs and physical machines based on a ge-netic algorithm (GA) which precedes the gray model understable tendency and unstable tendency in terms of pre-diction accuracy Shyam and Manvi [26] proposed a short-and long-term prediction model of virtual resource re-quirements for CPUmemory-intensive applications basedon Bayesian networks where the relationships and de-pendencies between variables are identified to facilitateresource prediction Lu et al [27] proposed a workloadprediction model RVLBPNN (Rand Variable Learning RateBackpropagation Neural Network) based on BPNN algo-rithm which achieves higher prediction accuracy than thehidden Markov model and the naive Bayes classifier +ismethod not only predicts CPU-intensive and memory-intensive workloads but also improves prediction accu-racy by using the intrinsic relations among the arrivingcloud workloads Rajaram and Malarvizhi [28] comparedthe prediction accuracies of a few machine learningmethods such as LR SVR and multiplayer perceptron Liand Zhang [29] proposed an optimal combination pre-diction method for resource demands which combines theinduced ordered weighted geometry averaging operatorand the generalized dice coefficient with the improvedElman neural network and gray model to enhance theprediction accuracy Minarolli and Freisleben [30] pre-sented a cross-correlation prediction approach based onsupport vector machine (SVM) which considers the crossrelation of VMs running the same application to improveprediction accuracy Zhang et al [31] proposed a deepbelief network- (DBN-) based prediction approach of cloudresource requests in which orthogonal experimental designand analysis of variance are used to enhance the predictionaccuracy Compared with the ARIMA model this methodgreatly reduces mean square 
error (MSE) by over 60 forCPU and RAM request predictions Although machine

learning methods are effective in improving predictionaccuracy they are complex and usually demand a largenumber of data to extract features and train a model Itrequires too much time for the prediction to guarantee QoSof the running applications Cloud computing requires asimple and rapid host utilization prediction method tosupport resource allocation and scheduling

Another method is based on statistical methods such asBrownrsquos quadratic exponential smoothing method [32]autoregressive integrated moving average (ARIMA) model[33ndash35] and the kernel canonical correlation analysis [36]Tran et al [37] applied the ARIMA model in the long-termprediction of server workload while our method aims topredict short-term host utilization It is more difficult be-cause host utilization can be extremely random and non-stationary in a short time Calheiros et al [33] proposed ashort-term prediction model of cloud workload using theARIMAmodel and evaluated the prediction accuracy and itsimpact on user applicationsrsquo QoS +ey suggested that usersrsquobehaviors must be considered to reflect real conditions inworkload simulation Our method combines the ARIMAmodel with EEMD and RT methods to improve predictionaccuracy and reduce prediction time as much as possible Itis compared with EEMD-ARIMA and ARIMA methods interms of error effectiveness and time-cost analysis

Moreover some studies combine the ARIMA modelwith other techniques to improve prediction accuracy Xuet al [38] constructed a model GFSS-ANFISSARIMAcombining the seasonal ARIMA model with the general-ized fuzzy soft sets and adaptive neuro-fuzzy inferencesystem +is model improves the prediction accuracy ofresource demands Li et al [39] proposed a workloadpredictor combined with ARIMA and dynamic errorcompensation to reduce the service-level agreement (SLA)default rate Fu and Zhou [40] proposed a predicted af-finity model to implement VM placement which uses theresource demands predicted by the ARIMA model tocalculate a VM-host affinity value Jiang et al [41] pre-sented a self-adaptive ensemble prediction method forcloud resource demands which uses a two-level ensemblemethod to predict VM demands based on a historic timeseries +is method not only combines multiple predictionmethods moving average (MA) autoregressive (AR)artificial neural network (ANN) gene expression pro-gramming (GEP) and SVM but also adjusts the weight ofeach method adaptively to obtain the best average per-formance according to the relative errors In contrast ourmethod uses the EEMD method to deal with the non-stationary host utilization and then selects and re-constructs efficient components to improve predictionaccuracy and reduce the time cost +e EEMD proposed byWu and Huang [42] is an effective noise-aided method thatcan handle nonlinear and nonstationary time series It hasbeen widely used in wind speed forecasting [43 44]aircraft auxiliary power unit (APU) degradation pre-diction [45] turbine fault trend prediction [46] androlling bearing fault diagnosis [47] It has shown a goodeffect on enhancing the prediction accuracy Our methodalso uses EEMD to decompose the nonstationary host

Journal of Electrical and Computer Engineering 3

utilization sequence, improving prediction accuracy, and further uses correlation coefficients, RT values, and average periods to select and reconstruct efficient components, which reduces prediction error accumulation and prediction time.

3. Background

3.1. Empirical Mode Decomposition (EMD). EMD is a signal processing method that decomposes a signal into multiple IMFs and a residual trend term R [48]. An IMF must satisfy two conditions:

(1) The number of extrema and the number of zero-crossings must either be equal or differ by at most one.

(2) The mean value of the envelope defined by the local maxima and the envelope defined by the local minima must be zero.

EMD consists of the following steps:

Step 1. Set $f_1(t) = x(t)$, where $x(t)$ is the original data.

Step 2. Find all the local maxima and minima of $f_i(t)$, where $i$ is the iteration index with initial value 1. Interpolate the local maxima and the local minima to obtain an upper envelope and a lower envelope, respectively, and compute the mean $m_i(t)$ of these envelopes.

Step 3. Compute the new component $h_i(t) = f_i(t) - m_i(t)$.

Step 4. Check whether $h_i(t)$ satisfies the two IMF conditions above. If it does not, set $f_{i+1}(t) = h_i(t)$ and repeat Steps 2 and 3. If it does, $h_i(t)$ is taken as the first IMF component, $p_1(t) = h_i(t)$. Then compute the residual $r_1(t) = x(t) - p_1(t)$.

Step 5. Repeat Steps 1–4 with $r_1(t)$ as the new data until the residual R is a monotonic function. Thus, $x(t)$ is decomposed into n IMFs and a residual R as follows:

$$x(t) = \sum_{i=1}^{n} p_i(t) + r(t) \tag{1}$$
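Steps 1–5 can be illustrated with a short NumPy/SciPy sketch. It is a minimal illustration, not the authors' implementation: the cubic-spline envelopes, pinning both envelopes to the first and last samples during sifting, the tolerance `tol` used to judge condition (2), and the caps `max_sifts` and `max_imfs` are all our own assumptions, since the text does not specify boundary handling or stopping thresholds.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def _extrema(h):
    """Indices of the interior local maxima and minima of h."""
    d = np.diff(h)
    maxima = np.flatnonzero((d[:-1] > 0) & (d[1:] <= 0)) + 1
    minima = np.flatnonzero((d[:-1] < 0) & (d[1:] >= 0)) + 1
    return maxima, minima


def _spline_mean(h, up, lo, t):
    """Mean of the cubic-spline envelopes through knots `up` and `lo`, at t."""
    return (CubicSpline(up, h[up])(t) + CubicSpline(lo, h[lo])(t)) / 2.0


def is_imf(h, tol=0.05):
    """Step 4: test the two IMF conditions (condition (2) up to a tolerance)."""
    h = np.asarray(h, dtype=float)
    maxima, minima = _extrema(h)
    n_zero = int(np.count_nonzero(np.signbit(h[1:]) != np.signbit(h[:-1])))
    if abs(len(maxima) + len(minima) - n_zero) > 1:  # condition (1)
        return False
    if len(maxima) < 2 or len(minima) < 2:
        return False
    # Condition (2), judged only where both envelopes interpolate real extrema.
    a, b = max(maxima[0], minima[0]), min(maxima[-1], minima[-1])
    if a >= b:
        return False
    m = _spline_mean(h, maxima, minima, np.arange(a, b + 1))
    return bool(np.max(np.abs(m)) <= tol * np.max(np.abs(h)))


def emd(x, max_imfs=10, max_sifts=50):
    """Steps 1-5: return (imfs, residual) with x == imfs.sum(axis=0) + residual."""
    x = np.asarray(x, dtype=float)
    imfs, r = [], x.copy()
    while len(imfs) < max_imfs:
        mx, mn = _extrema(r)
        if len(mx) + len(mn) == 0:  # Step 5: the residual is monotonic
            break
        h = r.copy()  # Step 1: start sifting from the current data
        n = len(h)
        for _ in range(max_sifts):
            mx, mn = _extrema(h)
            # Assumption: pin both envelopes to the first and last samples.
            m = _spline_mean(h, np.r_[0, mx, n - 1], np.r_[0, mn, n - 1], np.arange(n))
            h = h - m  # Steps 2-3: h_i(t) = f_i(t) - m_i(t)
            if is_imf(h):  # Step 4
                break
        imfs.append(h)
        r = r - h  # new residual = previous residual minus the extracted IMF
    return np.array(imfs).reshape(-1, len(x)), r
```

Because each residual is computed by subtraction, the sum of the returned IMFs and the residual reproduces the input exactly (up to floating-point rounding), which is Equation (1).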

3.2. Ensemble Empirical Mode Decomposition (EEMD). The EMD method has a notable drawback, mode mixing, which is closely associated with signal intermittency. Wu and Huang proposed ensemble empirical mode decomposition (EEMD) to solve this problem. Unlike EMD, EEMD executes the decomposition process k times. Each time, it adds a different white noise series to the signal and then decomposes the noisy signal. Generally, k is set to an integer in the range [50, 100], and the standard deviation d of the white noise is set to a value in the range [0.1, 0.2]. This yields k groups of decomposition results. Each group includes n IMFs $p_{mi}(t)$ $(i = 1, \ldots, n)$ and a residual $r_m(t)$, where m denotes the group number. Finally, the mean values of these groups of IMFs and residuals are taken as the final IMFs $p_i(t)$ $(i = 1, \ldots, n)$ and the final residual $r(t)$:

$$p_i(t) = \frac{1}{k}\sum_{m=1}^{k} p_{mi}(t), \qquad r(t) = \frac{1}{k}\sum_{m=1}^{k} r_m(t) \tag{2}$$

The IMF components have three main characteristics:

(1) Completeness: the sum of all IMFs and R reproduces the original data.

(2) Orthogonality: each IMF, which carries a certain physical meaning, is independent of and does not affect the other IMFs; mathematically, the inner product of any two IMFs is zero (approximately, in practice).

(3) Adaptability: IMFs with higher frequencies are extracted from the original data earlier than those with lower frequencies, and the frequencies of the IMFs reflect the features of the original data.
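Equation (2) is an average over k noise-assisted decompositions, which the following sketch makes concrete. It assumes a caller-supplied, hypothetical `decompose` function (any EMD routine that returns a fixed number of IMFs followed by the residual as rows of an array) and, as is common practice but not stated in the text, that d is expressed relative to the standard deviation of the input signal. Because averaging is linear, completeness carries over: the averaged components sum to the mean of the noisy inputs.

```python
import numpy as np


def eemd(x, decompose, k=50, d=0.2, seed=None):
    """Eq. (2): average the decompositions of k noise-added copies of x.

    `decompose(signal)` (hypothetical, supplied by the caller) must return an
    array of shape (n + 1, len(signal)): n IMFs followed by the residual R,
    with the same n on every call.
    """
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    sigma = d * np.std(x)  # assumption: d is relative to the signal's std
    total = None
    for _ in range(k):
        noisy = x + sigma * rng.standard_normal(x.size)  # new white noise each trial
        parts = np.asarray(decompose(noisy), dtype=float)
        if total is None:
            total = np.zeros_like(parts)
        elif parts.shape != total.shape:
            raise ValueError("decompose must return the same number of components")
        total += parts
    return total / k  # rows: p_1(t), ..., p_n(t), r(t)
```

Real EMD routines can return a different number of IMFs for different noise realizations; the shape check makes that mismatch explicit instead of silently misaligning components.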

3.3. Runs Test (RT). RT is a nonparametric test that checks the randomness of a sequence consisting of only two symbols or values, such as + and −, or 0 and 1. A run is defined as a maximal block of identical successive symbols (0 or 1). For example, the data sequence “11110000011111000110010” includes 8 runs, 4 of which consist of successive “1”s and the other 4 of successive “0”s. RT can also be applied to a time series.

Assume that $M = \langle m_1(t), \ldots, m_i(t), \ldots, m_n(t) \rangle$ denotes a time series, where $m_i(t)$ is an element of the time series and n is the total number of elements. The mean value of these elements is calculated by the following formula:

$$\bar{M} = \frac{1}{n}\sum_{i=1}^{n} m_i(t) \tag{3}$$

Then each element of the time series is encoded as follows:

$$G_i = G\big(m_i(t) - \bar{M}\big) = \begin{cases} 1, & m_i(t) \ge \bar{M},\\ 0, & m_i(t) < \bar{M}. \end{cases} \tag{4}$$

Thus, the time series is transformed into a sequence of 0s and 1s, whose elements are treated as independent and identically distributed under the null hypothesis of the test. The total number of runs reflects the fluctuation of the sequence.
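Equations (3) and (4), followed by counting runs, can be sketched as below. The function names are illustrative; because the method uses the raw number of runs as the RT value, the sketch stops there rather than computing a test statistic or p-value.

```python
import numpy as np


def binarize(series):
    """Eqs. (3)-(4): G_i = 1 if m_i(t) >= mean, else 0."""
    x = np.asarray(series, dtype=float)
    return (x >= x.mean()).astype(int)


def count_runs(bits):
    """Number of maximal blocks of identical consecutive symbols."""
    b = np.asarray(bits)
    if b.size == 0:
        return 0
    # Each change of symbol starts a new run.
    return 1 + int(np.count_nonzero(b[1:] != b[:-1]))


def rt_value(series):
    """RT value of a time series: the number of runs of its 0/1 encoding."""
    return count_runs(binarize(series))


example = [int(c) for c in "11110000011111000110010"]
print(count_runs(example))  # 8, matching the example in the text
```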

4. A Hybrid Method for Short-Term Host Utilization Prediction

To improve the prediction accuracy and reduce the prediction time of the EEMD-ARIMA method, we propose a hybrid method, EEMD-RT-ARIMA, for short-term host utilization prediction, as shown in Figure 2. First, the host utilization sequence is decomposed into multiple IMF components and an R component using the EEMD method. Next, we calculate the correlation coefficients between the IMF components and the original data sequence to select the efficient IMF components. Then, we use RT values and average periods to reconstruct the efficient IMF components and the R component into three new components: a high-frequency and strong-volatility component, a medium-frequency and weak-volatility component, and a low-frequency trend component. We then use the ARIMA method to predict each of the three new components. Finally, the overall prediction result is obtained by summing the prediction results of the three new components.

The key to our EEMD-RT-ARIMA method is the selection and reconstruction of efficient components. Compared with the EEMD-ARIMA method, fewer components are involved in ARIMA prediction. Thus, the EEMD-RT-ARIMA method can reduce both prediction error accumulation and total prediction time. Given their implementation processes, both the EEMD-RT-ARIMA and EEMD-ARIMA methods clearly require more prediction time than the ARIMA model. However, our EEMD-RT-ARIMA method focuses on cost-effectiveness, that is, a trade-off among prediction accuracy, effectiveness, and time cost.

4.1. Use of EEMD to Decompose the Host Utilization Sequence. Host utilization sequences are classified into categories according to the resource type, such as CPU, memory, and disk; for example, the CPU utilization sequence is $T_{cpu} = \{c_1, \ldots, c_i, \ldots, c_n\}$. The CPU utilization sequence is usually random and unstable owing to random and sudden resource demands in cloud computing. Such data must be transformed into relatively stationary data to improve prediction accuracy. The EEMD method appears to be more effective in processing nonlinear and nonstationary data sequences than other decomposition algorithms. Therefore, we use the EEMD method to decompose the host utilization sequence into a series of IMF_i components and an R component.

A running example shows the nonstationary CPU utilization trace of a physical host from our cloud platform. We divide it into a training set (673 data points) and a testing set (24 data points), as shown in Figure 3. Then, we use the EEMD method to decompose the training set into the IMF1–IMF8 components and the R component, which are shown from high frequency to low frequency in Figure 4.

4.2. Calculation of the Correlation Coefficients to Select Efficient IMF Components. A correlation coefficient measures the correlation between two sequences. We calculate the correlation coefficient $P_j(X, Y)$ between the IMF_j component and the original training set using the following formula, where $\operatorname{Cov}(X, Y)$ is the covariance between sequences X and Y, and $\operatorname{Var}(X)$ and $\operatorname{Var}(Y)$ are the variances of X and Y, respectively:

$$P_j(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}(X)}\,\sqrt{\operatorname{Var}(Y)}} \tag{5}$$

Then, the correlation coefficient $P_j(X, Y)$ is checked to determine whether it is negative. If it is negative, the IMF_j component is regarded as inefficient and dropped; otherwise, the IMF_j component is regarded as efficient and retained.

We calculate the correlation coefficient between each IMF component and the original training set. Only IMF6 and IMF7 have negative correlation coefficients, −0.08 and −0.15, respectively. Hence, they are dropped, and IMF1–IMF5 and IMF8 are selected as efficient IMF components.
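The selection rule based on Equation (5) can be sketched with NumPy, whose `np.corrcoef` computes the Pearson coefficient $\operatorname{Cov}(X, Y)/(\sqrt{\operatorname{Var}(X)}\sqrt{\operatorname{Var}(Y)})$. The function name and the 1-based indexing of the returned IMFs are our own choices; a constant component would yield NaN and, as an assumption of this sketch, is dropped.

```python
import numpy as np


def select_efficient_imfs(imfs, original):
    """Eq. (5): keep IMF_j if its correlation with the original is non-negative.

    Returns the 1-based indices of the retained IMFs and all coefficients.
    """
    original = np.asarray(original, dtype=float)
    coeffs = [float(np.corrcoef(imf, original)[0, 1]) for imf in imfs]
    # NaN (constant component) compares False, so such a component is dropped.
    kept = [j + 1 for j, p in enumerate(coeffs) if p >= 0]
    return kept, coeffs
```

Applied to the running example, this rule would retain IMF1–IMF5 and IMF8 and drop IMF6 and IMF7.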

4.3. Reconstruction of Efficient IMFs and R into New Components. Each IMF component reflects a certain physical feature of the original data. If some IMF components are close in terms of frequency and amplitude fluctuation, they have similar features and can therefore be reconstructed into a new component that carries these typical features. Thus, the prediction error accumulation and the prediction time of the EEMD-ARIMA method can be reduced by reducing the number of components.

Figure 2: EEMD-RT-ARIMA method. (Flowchart: use EEMD to decompose the host utilization sequence into IMF1, IMF2, …, IMF_{h−1}, and R; use correlation coefficients to select the efficient IMFs IMF1, …, IMF_m; use RT values and average periods to reconstruct the efficient IMFs and R into three new components, namely, the high-frequency and strong-volatility component, the medium-frequency and weak-volatility component, and the low-frequency trend component; use ARIMA to predict each new component; construct the overall prediction result by summing the prediction results of the three new components.)

The average period reflects the frequency of host utilization variation, and the two are reciprocally related: the smaller the average period, the higher the frequency. If the average periods of IMF components are close, their frequencies are close. The average period of the IMF_j component is calculated by the following formula, in which n is the length of the training set and $l_j$ is the number of extrema of IMF_j:

$$T_j = \frac{n}{l_j} \tag{6}$$

Similarly, the RT value reflects the trend of amplitude fluctuation: the larger the RT value, the stronger the amplitude volatility. If the RT values of two IMFs are close, the two IMFs have similar overall amplitude volatility.

To enhance the prediction accuracy and reduce the prediction time of the EEMD-ARIMA method, the EEMD-RT-ARIMA method reconstructs the IMF components and the R component into three new components according to their average periods and RT values. Because the average period and the RT value have different units, we normalize the average period $T_j$ as follows:

$$T^{n}_{j} = \frac{T_j - T_{\min}}{T_{\max} - T_{\min}} \tag{7}$$

where $T^{n}_{j}$ denotes the normalized average period of the IMF_j component, and $T_{\max}$ and $T_{\min}$ are the maximum and minimum of the average periods of all components. Similarly, the RT value $R_j$ is normalized as follows:

$$R^{n}_{j} = \frac{R_j - R_{\min}}{R_{\max} - R_{\min}} \tag{8}$$

where $R^{n}_{j}$ is the normalized value of $R_j$, and $R_{\max}$ and $R_{\min}$ are the maximum and minimum of all RT values. The reconstruction factor (RF) is then defined as follows:

$$F_j = \alpha \cdot \left(1 - T^{n}_{j}\right) + \beta \cdot R^{n}_{j} \tag{9}$$

where $\alpha$ and $\beta$ are weighting coefficients.

The higher the frequency and the stronger the volatility of an IMF component, the greater its RF value. If the RF values of two IMF components are close, their overall trends are similar, so they can be reconstructed into a new component. All efficient IMF and R components are reconstructed into three new components: a high-frequency and strong-volatility component, a medium-frequency and weak-volatility component, and a low-frequency trend component. The high-frequency and strong-volatility component reflects the strong volatility and randomness of the high-frequency part of the original host utilization sequence. The medium-frequency and weak-volatility component captures the detailed features of the volatility of the original sequence. The low-frequency trend component depicts the overall trend of the original sequence.

Table 1 lists the RT values, average periods, and RF values of the efficient IMF and R components. The RF values of IMF1 and IMF2 are large and relatively close, while the RF values of IMF8 and R are both 0. The RF values of IMF3–IMF5 are close to one another. Therefore, we reconstruct IMF1–IMF2, IMF3–IMF5, and IMF8–R into three new components, as shown in Figures 5(a)–5(c). They reflect the randomness, the fluctuation details, and the overall trend of the original host utilization sequence, respectively.
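Equations (6)–(9) can be sketched as follows and checked against Table 1. The weights $\alpha$ and $\beta$ are not given in this section; as an assumption, $\alpha = \beta = 0.5$ reproduces the RF row of Table 1 to two decimal places when applied to its RT values and average periods (read as 2.12, 8.31, 16.41, 30.59, 67.3, 673, and 673). Giving a component without extrema $l_j = 1$ (so $T_j = n$) is also our assumption, chosen because Table 1 lists 673 for R.

```python
import numpy as np


def average_period(n, n_extrema):
    """Eq. (6): T_j = n / l_j."""
    # Assumption: a component without extrema (e.g. a monotonic R) gets T_j = n.
    return n / max(n_extrema, 1)


def min_max(values):
    """Eqs. (7)-(8): min-max normalization to [0, 1]."""
    v = np.asarray(values, dtype=float)
    return (v - v.min()) / (v.max() - v.min())


def reconstruction_factor(periods, rt_values, alpha=0.5, beta=0.5):
    """Eq. (9): F_j = alpha * (1 - T^n_j) + beta * R^n_j."""
    return alpha * (1.0 - min_max(periods)) + beta * min_max(rt_values)


# Table 1 values for IMF1-IMF5, IMF8, and R; alpha = beta = 0.5 is assumed.
periods = [2.12, 8.31, 16.41, 30.59, 67.3, 673, 673]
rt = [431, 181, 96, 47, 21, 5, 2]
print(np.round(reconstruction_factor(periods, rt), 2))  # matches the RF row of Table 1
```

Grouping the components by their RF values (IMF1–IMF2, IMF3–IMF5, IMF8–R in the running example) is left to the reader here, because the text does not give a numeric grouping rule.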

4.4. Use of the ARIMA Model to Predict Future Host Utilization. We use the ARIMA model to predict the future values of each new component. Then, the overall prediction result is obtained by superposing the prediction results of the new components. The ARIMA prediction procedure is described in Algorithm 1.

For example, assume that three new components $C_h$, $C_m$, and $C_l$ are obtained, which represent the high-frequency and strong-volatility component, the medium-frequency and weak-volatility component, and the low-frequency trend component, respectively. We then use the ARIMA method to predict the next 24 values of each new component. The prediction results $P_h$, $P_m$, and $P_l$ of the three new components, each containing 24 predicted values, are as follows:

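$$\begin{aligned} P_h &= \left(f^{1}_{h}, f^{2}_{h}, \ldots, f^{24}_{h}\right),\\ P_m &= \left(f^{1}_{m}, f^{2}_{m}, \ldots, f^{24}_{m}\right),\\ P_l &= \left(f^{1}_{l}, f^{2}_{l}, \ldots, f^{24}_{l}\right). \end{aligned} \tag{10}$$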

Finally, we calculate the overall prediction result P by superposing the prediction results of the new components:

$$P = P_h + P_m + P_l \tag{11}$$

In this process, the EEMD-RT-ARIMA method reduces the number of components from 9 to 3, which can reduce the total prediction time and the error accumulation of component prediction compared with the EEMD-ARIMA method.

Figure 3: CPU utilization trace of a physical host. (x-axis: sample data, 0–700; y-axis: CPU utilization (%); the training set and the testing set are marked.)


5. Experimental Setup

We conduct an experiment to evaluate our method. The experimental dataset and measurement metrics are introduced as follows.

5.1. Experimental Dataset. Host utilization mainly involves CPU, memory, network, and disk utilization. In this paper, we focus on host CPU utilization. We randomly select the CPU utilization traces of 7 physical hosts from the dataset released by Alibaba in August 2017 [49], each of which includes 144 points (5 minutes per point). These traces are all time-dependent sequences, as shown in Figures 6(a)–6(g).

Each sequence is divided into a training set and a testing set. We first use the training set to predict future data and then compare the predicted data with the actual data in the testing set to evaluate our method. In this paper, each training set consists of the first 120 points, and the testing set consists of the subsequent 6, 12, or 24 points. We set the number of iterations k = 50 and the standard deviation d = 0.2 for the EEMD decomposition.

5.2. Measurement Metrics. We evaluate our method in terms of error, effectiveness, and time-cost analysis, as follows.

5.2.1. Error Analysis. To evaluate our method, we use the mean absolute percentage error (MAPE) to reflect prediction accuracy. MAPE is defined as follows:

$$\mathrm{MAPE} = \frac{1}{m}\sum_{i=1}^{m}\left|\frac{x^{f}_{i} - x^{t}_{i}}{x^{t}_{i}}\right| \times 100 \tag{12}$$

where $x^{f}_{i}$ denotes the predicted value of point i, $x^{t}_{i}$ denotes the actual value in the testing set, and m denotes the number of prediction points. A lower MAPE indicates higher prediction accuracy.

Figure 4: Decomposition results of EEMD. Panels (a)–(h) show IMF1–IMF8 and panel (i) shows R; each panel plots CPU utilization (%) against the sample index (0–700).

Table 1: RT values, average periods, and RF values of the efficient IMF and R components.

Component        IMF1   IMF2   IMF3   IMF4   IMF5   IMF8   R
RT value         431    181    96     47     21     5      2
Average period   2.12   8.31   16.41  30.59  67.3   673    673
RF               1      0.70   0.60   0.53   0.47   0      0

5.2.2. Effectiveness Analysis. Underprediction or overprediction of host utilization can lead to resource underprovisioning or overprovisioning. Resource underprovisioning cannot guarantee applications' QoS, while resource overprovisioning wastes resources and lowers resource utilization. Therefore, a good prediction method should avoid both underprediction and overprediction. In particular, underprediction should be avoided as much as possible because it degrades the QoS delivered to users.

We define positive and negative errors to reflect overprediction and underprediction, respectively, and use them to evaluate the effectiveness of our method. A good prediction method should have a small negative error to avoid underprediction. The positive and negative prediction errors are calculated by the following formula, where $p_i$ is the predicted value, $r_i$ is the actual value, and m is the number of underpredicted points (i.e., negative deviations) or overpredicted points (i.e., positive deviations):

$$e = \frac{1}{m}\sum_{i=1}^{m}\left|p_i - r_i\right| \tag{13}$$
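Equations (12) and (13) can be sketched together. The positive error applies Equation (13) to the overpredicted points only and the negative error to the underpredicted points only. Two conventions in this sketch are our own: points where the prediction equals the actual value fall into neither group, and `None` plays the role of the "Null" entries in Table 4 when a group is empty.

```python
import numpy as np


def mape(actual, predicted):
    """Eq. (12): mean absolute percentage error, in percent."""
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((p - a) / a)) * 100.0)


def positive_negative_errors(predicted, actual):
    """Eq. (13) applied separately to over- and underpredicted points.

    Returns (positive_error, negative_error); None means no such points.
    """
    p = np.asarray(predicted, dtype=float)
    r = np.asarray(actual, dtype=float)
    diff = p - r
    over, under = diff[diff > 0], diff[diff < 0]
    pos = float(np.mean(np.abs(over))) if over.size else None
    neg = float(np.mean(np.abs(under))) if under.size else None
    return pos, neg
```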

5.2.3. Time-Cost Analysis. Host utilization varies very quickly in a cloud data center. If host utilization prediction finishes later than the VM migration decision must be made, resource provisioning is delayed, which can cause poor QoS. Thus, host utilization prediction must be completed in a timely manner. To investigate the time cost of our proposed method, we measure the running time of the EEMD-RT-ARIMA method and compare it with other prediction methods using the following index $t_c$:

$$t_c = \frac{t_{\mathrm{our}} - t_{\mathrm{other}}}{t_{\mathrm{other}}} \times 100 \tag{14}$$

where $t_{\mathrm{our}}$ is the running time of our EEMD-RT-ARIMA method, $t_{\mathrm{other}}$ is the running time of another method, such as the ARIMA model or the EEMD-ARIMA method, and $t_c$ is the percentage by which the time cost is reduced (negative values) or increased (positive values).
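Equation (14) is a relative difference. The sketch below evaluates it for the host 22 running times reported in the time-cost analysis (69.37 s for EEMD-RT-ARIMA versus 337.20 s for EEMD-ARIMA); the function name is illustrative.

```python
def time_cost_change(t_ours, t_other):
    """Eq. (14): relative change in running time, in percent (negative = faster)."""
    return (t_ours - t_other) / t_other * 100.0


# Host 22, 6-point prediction: EEMD-RT-ARIMA vs. EEMD-ARIMA.
print(round(time_cost_change(69.37, 337.20), 1))  # -79.4, i.e. roughly 80% less time
```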

6. Experimental Results and Analysis

To validate the prediction effectiveness of our EEMD-RT-ARIMA method, we conduct experiments with the ARIMA, EEMD-ARIMA, and EEMD-RT-ARIMA methods and compare their prediction results. All experiments were performed in MATLAB on a PC with a 2.5 GHz Intel(R) i7 CPU. To make the three methods comparable, each method is executed 5 times on the same original dataset, and the mean values of the prediction results are shown in the following tables and figures.

6.1. Error Analysis. Table 2 shows the MAPE values of host utilization predictions for the 7 physical hosts. The EEMD-ARIMA and EEMD-RT-ARIMA methods have lower MAPE values than the ARIMA model for 6-point and 12-point predictions. For example, the EEMD-ARIMA and EEMD-RT-ARIMA methods achieve MAPE values of 6.06% and 5.05% for the 6-point prediction of host 109, whereas the ARIMA model has a far higher MAPE value of 16.85%. For the 12-point prediction of host 109, they obtain MAPE values of 10.13% and 5.46%, whereas the ARIMA model obtains 11.08%. For the 6-point prediction of host 22, the EEMD-ARIMA and EEMD-RT-ARIMA methods obtain MAPE values of 5.31% and 5.42%, far lower than the 10.66% of the ARIMA model, and they also perform better on the 12-point prediction. The same pattern holds for the 6-point and 12-point predictions of the other hosts. This indicates that both the EEMD-ARIMA and EEMD-RT-ARIMA methods predict host utilization more accurately than the ARIMA model over 6-point and 12-point horizons: EEMD reduces the inherent volatility of the host utilization sequence, which improves the prediction accuracy of both methods. However, the situation changes for 24-point prediction, where the MAPE values of hosts 1162, 424, 1060, and 237 mostly exceed 30% for all three methods. Although the EEMD-RT-ARIMA method has lower MAPE values than the ARIMA and EEMD-ARIMA methods for hosts 839 and 109, its 24-point MAPE values are far higher than its 6-point and 12-point values. This shows that the EEMD-ARIMA and EEMD-RT-ARIMA methods are suitable for short-term rather than long-term prediction.

Figure 5: New components. (a) High-frequency and strong-volatility component. (b) Medium-frequency and weak-volatility component. (c) Low-frequency trend component. (Each panel plots CPU utilization (%) against the sample index, 0–700.)

Further analysis shows that the EEMD-RT-ARIMA method achieves lower prediction errors than the EEMD-ARIMA method for the 6-point and 12-point predictions of hosts 839, 109, and 1162, even though the EEMD-RT-ARIMA method uses only the efficient IMF components. However, the opposite holds for hosts 22, 424, 1060, and 237. Because the original CPU utilization sequences of all physical hosts are sampled at the same frequency, we compare their volatility through the RT value of each CPU utilization sequence, shown in Table 3. Hosts 839, 109, and 1162 have RT values of 10 or lower, which shows that their CPU utilization is more stationary than that of the other hosts; smaller RT values indicate more stationary host utilization sequences. This can also be seen in Figures 6(a)–6(c). Tables 2 and 3 show that the EEMD-RT-ARIMA method achieves a lower MAPE than the EEMD-ARIMA method when the RT value is small and a higher MAPE when the RT value is large. For instance, hosts 839, 109, and 1162, with smaller RT values, obtain lower MAPE values with the EEMD-RT-ARIMA method than with the EEMD-ARIMA method, whereas hosts 22, 424, 1060, and 237, with larger RT values, obtain higher MAPE values with the EEMD-RT-ARIMA method than with the EEMD-ARIMA method.

(1) For each new component:
(2) Set the order of differencing d = 0.
(3) Execute the augmented Dickey-Fuller (ADF) test. If the time series is stationary, go to step (5); otherwise, go to step (4).
(4) Difference the time series, set d = d + 1, and go to step (3).
(5) Determine the order of the ARIMA model using the Bayesian information criterion (BIC).
(6) Estimate the parameters of the ARIMA model using maximum likelihood.
(7) Forecast the next n values of this new component using the ARIMA model.
(8) End for.
(9) Obtain the overall prediction result by superposing the prediction results of all new components.

Algorithm 1: ARIMA prediction.

Figure 6: CPU utilization traces of 7 physical hosts: (a) host 839, (b) host 109, (c) host 1162, (d) host 22, (e) host 424, (f) host 1060, (g) host 237. (y-axis: CPU utilization (%), 0–60.)

Furthermore, the difference in MAPE between the EEMD-RT-ARIMA and EEMD-ARIMA methods shrinks as the RT value increases from host 839 to host 1162. The difference becomes negative from host 22 onward, indicating that the EEMD-ARIMA method has higher prediction accuracy than the EEMD-RT-ARIMA method, and its magnitude then grows as the RT value increases. The 12-point host CPU utilization predictions illustrate this. For example, for host 839, with an RT value of 2, the EEMD-RT-ARIMA method achieves a MAPE of 5.03% for the 12-point prediction, which is 5.45 percentage points lower than the 10.48% of the EEMD-ARIMA method. For host 1162, with an RT value of 10, the MAPE of the EEMD-RT-ARIMA method is only 4.19 percentage points lower than that of the EEMD-ARIMA method. For host 22, with an RT value of 16, the EEMD-RT-ARIMA method has a slightly higher MAPE (5.37%) than the EEMD-ARIMA method (5.33%). As the RT value increases further, the MAPE differences between the EEMD-RT-ARIMA and EEMD-ARIMA methods grow to 0.87, 1.96, and 4.63 percentage points for hosts 424, 1060, and 237, respectively. This indicates that the EEMD-RT-ARIMA method is less effective than the EEMD-ARIMA method in CPU utilization prediction for these hosts. Inevitably, the ARIMA prediction of each component decomposed by the EEMD method introduces some error, and superposing the prediction results of the components causes error accumulation. The EEMD-RT-ARIMA method reduces this accumulation by selecting the efficient IMF components and reconstructing them into fewer components, so it achieves better prediction accuracy than the EEMD-ARIMA method for hosts 839, 109, and 1162. However, the selection and reconstruction of efficient IMF components also introduce some prediction error due to the absence of the non-efficient components, especially for nonstationary host utilization sequences. When this error exceeds the error accumulated by the ARIMA prediction of all components in the EEMD-ARIMA method, the EEMD-RT-ARIMA method is no more effective than the EEMD-ARIMA method, as for the nonstationary CPU utilization prediction of hosts 22, 424, 1060, and 237.

6.2. Effectiveness Analysis. To verify the effectiveness of our method in short-term prediction, we select the experimental results of hosts 839, 22, and 237, which have the minimum, middle, and maximum RT values, respectively, for further analysis. Figure 7 shows the prediction results of the EEMD-RT-ARIMA, ARIMA, and EEMD-ARIMA methods. The future CPU utilization of host 839 falls below 11%. According to a predefined policy, host 839 is underloaded and can be shut down to save energy. Figure 7 shows that our method is more accurate and effective than the ARIMA model. In particular, our method follows the trend of data variation, whereas the ARIMA model cannot keep up with it, so our method is better suited to nonstationary time series than the ARIMA model. Additionally, for host 839, the predictions of the EEMD-RT-ARIMA method are closer to the testing data than those of the EEMD-ARIMA method. These results show that the EEMD-RT-ARIMA method is more effective than the EEMD-ARIMA method for CPU utilization sequences with weak fluctuations. When a host utilization sequence fluctuates more strongly, the absence of the non-efficient IMF components greatly affects the prediction results, and the EEMD-RT-ARIMA method is no more effective than the EEMD-ARIMA method, as for the CPU utilization prediction of host 237.

Table 2: MAPE values (%) of host utilization prediction.

Host ID   Prediction length   ARIMA    EEMD-ARIMA   EEMD-RT-ARIMA
839       6-point             9.77     4.73         4.61
839       12-point            13.78    10.48        5.03
839       24-point            23.10    21.53        8.82
109       6-point             16.85    6.06         5.05
109       12-point            11.08    10.13        5.46
109       24-point            41.34    24.68        8.36
1162      6-point             24.45    7.76         6.45
1162      12-point            17.81    13.14        8.95
1162      24-point            21.93    41.41        33.45
22        6-point             10.66    5.31         5.42
22        12-point            8.22     5.33         5.37
22        24-point            11.41    18.44        14.77
424       6-point             37.54    16.12        17.65
424       12-point            21.74    16.59        17.46
424       24-point            88.68    77.39        11.57
1060      6-point             35.71    10.05        13.94
1060      12-point            37.60    17.44        19.40
1060      24-point            74.72    54.82        91.93
237       6-point             22.29    7.85         11.78
237       12-point            25.51    11.30        15.93
237       24-point            126.11   158.43       189.51

Table 3: RT values of each host utilization sequence.

Host ID    839   109   1162   22   424   1060   237
RT value   2     6     10     16   22    24     38

To further analyze the effectiveness of our method, we calculate the positive and negative errors for the 6-point and 12-point predictions of these hosts, as shown in Table 4. The smaller the negative error, the more suitable the prediction method is for cloud resource provisioning, because it avoids underprediction. Most of the prediction results of the ARIMA model are underpredicted (its positive error cells are all “Null” for hosts 839 and 22). Furthermore, for host 237, the negative errors of the ARIMA model are far higher than those of the other methods. For instance, the ARIMA model has a negative error as high as 27.51% for the 12-point prediction of host 237, whereas the EEMD-ARIMA and EEMD-RT-ARIMA methods have negative errors of only 8.00% and 8.92%, respectively. If the ARIMA model is used to predict future host utilization, it can cause resource underprovisioning, which cannot ensure applications' QoS. The EEMD-RT-ARIMA method achieves smaller negative errors than the EEMD-ARIMA method for hosts 839 and 22, but a larger negative error for host 237. For instance, the EEMD-ARIMA method has a negative error of 10.71% for the 12-point prediction of host 839, whereas the EEMD-RT-ARIMA method has only 4.74%. Similarly, the EEMD-ARIMA method has a negative error of 6.09% for the 12-point prediction of host 22, whereas the EEMD-RT-ARIMA method achieves a lower value of 5.05%. However, the situation changes for host 237, where the EEMD-ARIMA method has smaller negative errors than the EEMD-RT-ARIMA method. For instance, the EEMD-RT-ARIMA method obtains negative errors of 11.37% and 8.92% for the 6-point and 12-point predictions, whereas the EEMD-ARIMA method has negative errors of only 4.62% and 8.00%.

Figure 7: Prediction results of different methods: (a) host 839, 6-point prediction; (b) host 839, 12-point prediction; (c) host 22, 6-point prediction; (d) host 22, 12-point prediction; (e) host 237, 6-point prediction; (f) host 237, 12-point prediction. Each panel compares the ARIMA, EEMD-ARIMA, and EEMD-RT-ARIMA predictions with the testing data (y-axis: CPU utilization (%)).

6.3. Time-Cost Analysis. To verify the applicability of our method, we further compare the time costs of these methods in Figure 8. The EEMD-ARIMA method has the largest running time, over 180 s, while the ARIMA model takes the least time, under 50 s. The time cost of the EEMD-RT-ARIMA method ranges from roughly 70 s to 117 s, which is 40%–80% lower than that of the EEMD-ARIMA method. For example, for the 6-point prediction of host 22, the running time of the EEMD-RT-ARIMA method is 69.37 s, far less than the 337.20 s of the EEMD-ARIMA method, a saving of up to 80%. For the strongly variable CPU utilization sequence of host 237, the EEMD-ARIMA method requires 190.46 s to predict the next 6 values, whereas the EEMD-RT-ARIMA method takes only 113.64 s, a reduction of approximately 40%. Considering prediction accuracy, effectiveness, and time cost together, our EEMD-RT-ARIMA method is more cost-effective for short-term host utilization prediction in cloud computing.

7. Conclusions

Host utilization is an indicator of host performance, and its prediction can support effective resource scheduling in cloud computing. However, host utilization exhibits strong randomness and instability caused by users' random and varied resource demands, which makes accurate prediction difficult. In this paper, we propose a hybrid and cost-effective method, EEMD-RT-ARIMA, for short-term host utilization prediction in cloud computing. The EEMD method is first used to decompose the nonstationary host utilization sequence into a few relatively stationary IMF components and an R component. Then, we calculate the correlation coefficient between each IMF component and the original data to select efficient IMF components, and we use RT values and average periods to reconstruct these components into three new components, reducing error accumulation and time cost. Finally, the three new components are predicted by the ARIMA model, and their prediction results are superposed to form the overall prediction result. We use real host utilization traces from a cloud platform to conduct experiments and compare our EEMD-RT-ARIMA method with the ARIMA model and the EEMD-ARIMA method in terms of error, effectiveness, and time-cost analysis. The results show that our method is cost-effective and more suitable for short-term host utilization prediction in cloud computing.

Data Availability

The running example and experimental data used to support the findings of this study have been deposited in the Figshare repository (https://doi.org/10.6084/m9.figshare.7679594).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the Shandong Provincial Natural Science Foundation (ZR2016FM41).

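Table 4: Positive and negative error analysis.

| Host ID | Prediction length | ARIMA positive error (%) | ARIMA negative error (%) | EEMD-ARIMA positive error (%) | EEMD-ARIMA negative error (%) | EEMD-RT-ARIMA positive error (%) | EEMD-RT-ARIMA negative error (%) |
|---|---|---|---|---|---|---|---|
| 839 | 6-point | Null | 9.77 | 3.16 | 5.85 | 4.92 | 4.15 |
| 839 | 12-point | Null | 13.78 | 3.43 | 10.71 | 4.80 | 4.74 |
| 22 | 6-point | Null | 10.66 | 3.91 | 6.70 | 4.12 | 6.34 |
| 22 | 12-point | Null | 9.72 | 4.26 | 6.09 | 5.39 | 5.05 |
| 237 | 6-point | 0.39 | 26.68 | 14.31 | 4.62 | 13.81 | 11.37 |
| 237 | 12-point | 0.39 | 27.51 | 17.44 | 8.00 | 17.90 | 8.92 |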

Figure 8: Time cost of different prediction methods (time cost in seconds of ARIMA, EEMD-ARIMA, and EEMD-RT-ARIMA for the 6-point and 12-point predictions of hosts 839, 22, and 237).

12 Journal of Electrical and Computer Engineering

References

[1] J. J. Prevost, K. Nagothu, B. Kelley, and M. Jamshidi, “Prediction of cloud data center networks loads using stochastic and neural models,” in Proceedings of 2011 6th International Conference on System of Systems Engineering, pp. 276–281, IEEE, Albuquerque, NM, USA, June 2011.

[2] M. Borkowski, S. Schulte, and C. Hochreiner, “Predicting cloud resource utilization,” in Proceedings of 9th International Conference on Utility and Cloud Computing (UCC), pp. 37–42, IEEE, Shanghai, China, December 2016.

[3] M. Barati and S. Sharifian, “A hybrid heuristic-based tuned support vector regression model for cloud load prediction,” Journal of Supercomputing, vol. 71, no. 11, pp. 4235–4259, 2015.

[4] Z. Chen, Y. Zhu, Y. Di, S. Feng, and J. Geng, “A high-accuracy self-adaptive resource demands predicting method in IaaS cloud environment,” Neural Network World, vol. 25, no. 5, pp. 519–540, 2015.

[5] J. Chen and Y. Wang, “A resource demand prediction method based on EEMD in cloud computing,” Procedia Computer Science, vol. 131, pp. 116–123, 2018.

[6] H. Toumi, Z. Brahmi, Z. Benarfa, and M. M. Gammoudi, “Server load prediction using stream mining,” in Proceedings of 2017 International Conference on Information Networking (ICOIN), pp. 653–661, IEEE, Da Nang, Vietnam, January 2017.

[7] S. Di, D. Kondo, and W. Cirne, “Host load prediction in a Google compute cloud with a Bayesian model,” in Proceedings of 2012 International Conference on High Performance Computing, Networking, Storage and Analysis (SC’12), pp. 1–11, IEEE, Salt Lake City, UT, USA, November 2012.

[8] N. K. Gondhi and P. Kailu, “Prediction based energy efficient virtual machine consolidation in cloud computing,” in Proceedings of 2015 Second International Conference on Advances in Computing and Communication Engineering, pp. 437–441, IEEE, Dehradun, India, May 2015.

[9] A. Verma, G. Dasgupta, T. K. Nayak, P. De, and R. Kothari, “Server workload analysis for power minimization using consolidation,” in Proceedings of the 2009 Conference on USENIX Annual Technical Conference, p. 28, USENIX Association, San Diego, CA, USA, June 2009.

[10] B. Song, Y. Yu, Y. Zhou, Z. Wang, and S. Du, “Host load prediction with long short-term memory in cloud computing,” Journal of Supercomputing, vol. 74, no. 12, pp. 6554–6568, 2018.

[11] J.-J. Jheng, F.-H. Tseng, H.-C. Chao, and L.-D. Chou, “A novel VM workload prediction using grey forecasting model in cloud data center,” in Proceedings of International Conference on Information Networking 2014 (ICOIN2014), pp. 40–45, IEEE, Phuket, Thailand, February 2014.

[12] A. Beloglazov and R. Buyya, “Managing overloaded hosts for dynamic consolidation of virtual machines in cloud data centers under quality of service constraints,” IEEE Transactions on Parallel and Distributed Systems, vol. 24, no. 7, pp. 1366–1379, 2013.

[13] M. Dabbagh, B. Hamdaoui, M. Guizani, and A. Rayes, “An energy-efficient VM prediction and migration framework for overcommitted clouds,” IEEE Transactions on Cloud Computing, vol. 6, no. 4, pp. 955–966, 2018.

[14] D. Minarolli, A. Mazrekaj, and B. Freisleben, “Tackling uncertainty in long-term predictions for host overload and underload detection in cloud computing,” Journal of Cloud Computing, vol. 6, no. 1, p. 4, 2017.

[15] K. Mason, M. Duggan, E. Barrett, J. Duggan, and E. Howley, “Predicting host CPU utilization in the cloud using evolutionary neural networks,” Future Generation Computer Systems, vol. 86, pp. 162–173, 2018.

[16] D. Magalhães, R. N. Calheiros, R. Buyya, and D. G. Gomes, “Workload modeling for resource usage analysis and simulation in cloud computing,” Computers & Electrical Engineering, vol. 47, pp. 69–81, 2015.

[17] C. Tian, Y. Wang, Y. Luo et al., “Minimizing content reorganization and tolerating imperfect workload prediction for cloud-based video-on-demand services,” IEEE Transactions on Services Computing, vol. 9, no. 6, pp. 926–939, 2016.

[18] M. Verma, G. R. Gangadharan, V. Ravi, and N. Narendra, “Resource demand prediction in multi-tenant service clouds,” in Proceedings of 2013 IEEE International Conference on Cloud Computing in Emerging Markets (CCEM), pp. 1–8, IEEE, Bangalore, India, October 2013.

[19] W. Zhang, Y. Shi, L. Liu, L. Cui, and Y. Zheng, “Performance and resource prediction at high utilization for n-tier service systems in cloud: an experiment driven approach,” in Proceedings of 2015 IEEE International Conference on Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing, pp. 843–848, IEEE, Liverpool, UK, October 2015.

[20] G. Kecskemeti, A. Kertesz, and Z. Nemeth, “Cloud workload prediction by means of simulations,” in Proceedings of the Computing Frontiers Conference on ZZZ (CF’17), pp. 279–282, ACM, Siena, Italy, May 2017.

[21] Y. Chen and Z.-A. Jiang, “Dynamically predicting the quality of service: batch, online, and hybrid algorithms,” Journal of Electrical and Computer Engineering, vol. 2017, Article ID 9547869, 10 pages, 2017.

[22] A. Khan, X. Yan, S. Tao, and N. Anerousis, “Workload characterization and prediction in the cloud: a multiple time series approach,” in Proceedings of 2012 IEEE Network Operations and Management Symposium, pp. 1287–1294, IEEE, Maui, HI, USA, April 2012.

[23] A. K. Mishra, J. L. Hellerstein, W. Cirne, and C. R. Das, “Towards characterizing cloud backend workloads,” ACM SIGMETRICS Performance Evaluation Review, vol. 37, no. 4, pp. 34–41, 2010.

[24] D. Gmach, J. Rolia, L. Cherkasova, and A. Kemper, “Workload analysis and demand prediction of enterprise data center applications,” in Proceedings of 2007 IEEE 10th International Symposium on Workload Characterization, pp. 171–180, IEEE, Boston, MA, USA, September 2007.

[25] F.-H. Tseng, X. Wang, L.-D. Chou, H.-C. Chao, and V. C. M. Leung, “Dynamic resource prediction and allocation for cloud data center using the multiobjective genetic algorithm,” IEEE Systems Journal, vol. 12, no. 2, pp. 1688–1699, 2018.

[26] G. K. Shyam and S. S. Manvi, “Virtual resource prediction in cloud environment: a Bayesian approach,” Journal of Network and Computer Applications, vol. 65, pp. 144–154, 2016.

[27] Y. Lu, J. Panneerselvam, L. Liu, and Y. Wu, “RVLBPNN: a workload forecasting model for smart cloud computing,” Scientific Programming, vol. 2016, Article ID 5635673, 9 pages, 2016.

[28] K. Rajaram and M. P. Malarvizhi, “Utilization based prediction model for resource provisioning,” in Proceedings of 2017 International Conference on Computer, Communication and Signal Processing (ICCCSP), pp. 1–6, IEEE, Chennai, India, January 2017.


[29] L. Li and A. Zhang, “Resource demand optimization combined prediction under cloud computing environment based on IOWGA operator,” International Journal of Grid and Distributed Computing, vol. 8, no. 3, pp. 77–86, 2015.

[30] D. Minarolli and B. Freisleben, “Cross-correlation prediction of resource demand for virtual machine resource allocation in clouds,” in Proceedings of 2014 Sixth International Conference on Computational Intelligence, Communication Systems and Networks, pp. 119–124, IEEE, Tetova, Macedonia, May 2014.

[31] W. Zhang, P. Duan, L. T. Yang et al., “Resource requests prediction in the cloud computing environment with a deep belief network,” Software: Practice and Experience, vol. 47, no. 3, pp. 473–488, 2017.

[32] H.-B. Mi, H.-M. Wang, G. Yin, D.-X. Shi, Y.-F. Zhou, and L. Yuan, “Resource on-demand reconfiguration method for virtualized data centers,” Journal of Software, vol. 22, no. 9, pp. 2193–2205, 2011.

[33] R. N. Calheiros, E. Masoumi, R. Ranjan, and R. Buyya, “Workload prediction using ARIMA model and its impact on cloud applications’ QoS,” IEEE Transactions on Cloud Computing, vol. 3, no. 4, pp. 449–458, 2015.

[34] Y. Meng, R. Rao, X. Zhang, and P. Hong, “CRUPA: a container resource utilization prediction algorithm for auto-scaling based on time series analysis,” in Proceedings of 2016 International Conference on Progress in Informatics and Computing (PIC), pp. 468–472, IEEE, Shanghai, China, December 2016.

[35] E. Dhib, N. Zangar, N. Tabbane, and K. Boussetta, “Impact of seasonal ARIMA workload prediction model on QoE for massively multiplayers online gaming,” in Proceedings of 2016 5th International Conference on Multimedia Computing and Systems (ICMCS), pp. 737–741, IEEE, Marrakech, Morocco, September 2016.

[36] A. Ganapathi, Y. Chen, A. Fox, R. Katz, and D. Patterson, “Statistics-driven workload modeling for the cloud,” in Proceedings of 2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010), pp. 87–92, IEEE, Long Beach, CA, USA, March 2010.

[37] V. G. Tran, V. Debusschere, and S. Bacha, “Hourly server workload forecasting up to 168 hours ahead using seasonal ARIMA model,” in Proceedings of 2012 IEEE International Conference on Industrial Technology, pp. 1127–1131, IEEE, Athens, Greece, March 2012.

[38] D. Xu, S. Yang, and H. Luo, “Research on generalized fuzzy soft sets theory based combined model for demanded cloud computing resource prediction,” Chinese Journal of Management Science, vol. 23, no. 5, pp. 56–64, 2015.

[39] S. Li, Y. Wang, X. Qiu, D. Wang, and L. Wang, “A workload prediction-based multi-VM provisioning mechanism in cloud computing,” in Proceedings of 2013 15th Asia-Pacific Network Operations and Management Symposium (APNOMS), pp. 1–6, IEEE, Hiroshima, Japan, September 2013.

[40] X. Fu and C. Zhou, “Predicted affinity based virtual machine placement in cloud computing environments,” IEEE Transactions on Cloud Computing, vol. 99, p. 1, 2017.

[41] Y. Jiang, C. Perng, T. Li, and R. Chang, “ASAP: a self-adaptive prediction system for instant cloud resource demand provisioning,” in Proceedings of 2011 IEEE 11th International Conference on Data Mining, pp. 1104–1109, IEEE, Vancouver, BC, Canada, December 2011.

[42] Z. Wu and N. E. Huang, “Ensemble empirical mode decomposition: a noise-assisted data analysis method,” Advances in Adaptive Data Analysis, vol. 1, no. 1, pp. 1–41, 2009.

[43] H. Zang, L. Fan, M. Guo, Z. Wei, G. Sun, and L. Zhang, “Short-term wind power interval forecasting based on an EEMD-RT-RVM model,” Advances in Meteorology, vol. 2016, Article ID 8760780, 10 pages, 2016.

[44] N. Safari, C. Y. Chung, and G. C. D. Price, “Novel multi-step short-term wind power prediction framework based on chaotic time series analysis and singular spectrum analysis,” IEEE Transactions on Power Systems, vol. 33, no. 1, pp. 590–601, 2018.

[45] X. Chen, H. Wang, J. Huang, and H. Ren, “APU degradation prediction based on EEMD and Gaussian process regression,” in Proceedings of 2017 International Conference on Sensing, Diagnostics, Prognostics, and Control (SDPC), pp. 98–104, IEEE, Shanghai, China, August 2017.

[46] C. Yan, C. Yi, X. Wu et al., “Turbine fault trend prediction that based on EEMD and ARIMA models,” Journal of Gansu Sciences, vol. 28, no. 4, pp. 100–106, 2016.

[47] M. Jin, P. Li, L. Zhang et al., “A signal feature method and its application based on EEMD, fuzzy entropy and GK clustering,” ACTA Metrologica Sinica, vol. 26, no. 5, pp. 501–505, 2015.

[48] E. H. Norden, Z. Shen, R. L. Steven et al., “The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis,” in Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, vol. 454, no. 1971, pp. 903–995, Royal Society, London, UK, March 1998.

[49] Alibaba cluster-trace-v2017, https://github.com/alibaba/clusterdata.


utilization for improving the prediction accuracy, and further uses correlation coefficients, RT values, and average periods to select and reconstruct efficient components, reducing prediction error accumulation and prediction time.

3. Background

3.1. Empirical Mode Decomposition (EMD). EMD is a signal processing method that decomposes a signal into multiple IMFs and a residual trend component R [48]. An IMF must satisfy two conditions:

(1) The number of extrema and the number of zero-crossings must either be equal or differ by at most one.

(2) The mean value of the envelopes defined by the local maxima and the local minima must be zero.

EMD includes the following steps:

Step 1. Set $f_1(t) = x(t)$, where $x(t)$ is the original data.
Step 2. Find all local maxima and minima of $f_i(t)$, where i is the sifting iteration index with initial value 1. Interpolate between the local maxima and between the local minima to obtain an upper envelope and a lower envelope, and then compute the mean $m_i(t)$ of the two envelopes.
Step 3. Compute the new component $h_i(t) = f_i(t) - m_i(t)$.
Step 4. Verify whether $h_i(t)$ satisfies the two IMF conditions above. If it does not, set $f_{i+1}(t) = h_i(t)$ and repeat Steps 2 and 3. If it does, $h_i(t)$ is taken as the first IMF component, $p_1(t) = h_i(t)$. Then compute the R component as $r_1(t) = x(t) - p_1(t)$.
Step 5. Repeat Steps 1–4 with $r_1(t)$ as the new data until R is a monotonic function. Thus, $x(t)$ is decomposed into n IMFs and an R:

$$x(t)=\sum_{i=1}^{n}p_i(t)+r(t) \qquad (1)$$
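The sifting procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: the helper names (`local_extrema`, `envelope_mean`, `is_imf`, `emd`), the linear (rather than the usual cubic-spline) envelope interpolation, pinning both envelopes to the endpoints, the tolerance in the IMF check, the cap of 10 sifting iterations, and stopping once the residual has fewer than two maxima or two minima (as an approximation of "R is monotonic") are all hypothetical choices made to keep the example short and NumPy-only.

```python
import numpy as np


def local_extrema(f):
    """Indices of interior local maxima and minima of a 1-D array."""
    d = np.diff(f)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima


def envelope_mean(f):
    """Step 2: mean of upper and lower envelopes, or None if f has too few extrema."""
    maxima, minima = local_extrema(f)
    if len(maxima) < 2 or len(minima) < 2:
        return None
    t = np.arange(len(f))
    # Assumption: endpoints are pinned to both envelopes; linear interpolation
    # replaces cubic splines so the sketch needs only NumPy.
    up = np.r_[0, maxima, len(f) - 1]
    lo = np.r_[0, minima, len(f) - 1]
    upper = np.interp(t, up, f[up])
    lower = np.interp(t, lo, f[lo])
    return (upper + lower) / 2.0


def is_imf(h, tol=0.05):
    """Step 4: approximate check of the two IMF conditions."""
    maxima, minima = local_extrema(h)
    signs = np.sign(h)
    signs = signs[signs != 0]                 # ignore exact zeros when counting crossings
    n_zero = np.count_nonzero(np.diff(signs))
    m = envelope_mean(h)
    mean_ok = m is not None and np.max(np.abs(m)) <= tol * np.max(np.abs(h))
    return abs(len(maxima) + len(minima) - n_zero) <= 1 and mean_ok


def emd(x, max_imfs=10, max_sift=10):
    """Return (IMFs p_i(t), residual r(t)) with x = sum(p_i) + r, as in Eq. (1)."""
    r = np.asarray(x, dtype=float).copy()
    imfs = []
    # Step 5: stop once r can no longer be enveloped (close to monotonic).
    while len(imfs) < max_imfs and envelope_mean(r) is not None:
        f = r.copy()                          # Step 1
        h = f
        for _ in range(max_sift):             # sifting cap (assumption)
            m = envelope_mean(f)
            if m is None:
                break
            h = f - m                         # Step 3
            if is_imf(h):                     # Step 4
                break
            f = h
        imfs.append(h)
        r = r - h                             # residual r_1(t) = x(t) - p_1(t), and so on
    return np.array(imfs), r
```

Because each extracted IMF is subtracted from the running residual, the returned components always sum back to the input, which is the decomposition in (1). Production EMD codes usually add cubic splines and more careful boundary treatment.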

3.2. Ensemble Empirical Mode Decomposition (EEMD). The EMD method has a noticeable drawback, mode mixing, which is often caused by signal intermittency. Wu and Huang proposed a new method named ensemble empirical mode decomposition (EEMD) to solve this problem. Unlike the EMD method, the EEMD method executes the decomposition process k times. Each time, it adds a different white noise series to the signal and then decomposes the resulting signal. Generally, k is set to an integer in the range [50, 100], and the standard deviation d of the white noise is set to a value in the range [0.1, 0.2]. This yields k groups of decomposition results. Each group includes n IMFs $p_{mi}(t)$ (i = 1, …, n) and an R $r_m(t)$, where m denotes the group number. Finally, the mean values of these groups of IMFs and Rs are calculated as the final IMFs $p_i(t)$ (i = 1, …, n) and the final R $r(t)$:

$$p_i(t)=\frac{1}{k}\sum_{m=1}^{k}p_{mi}(t),\qquad r(t)=\frac{1}{k}\sum_{m=1}^{k}r_m(t) \qquad (2)$$
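The ensemble averaging in (2) can be sketched as below, assuming a per-trial decomposition function is supplied by the caller. The names `eemd` and `moving_average_split` are hypothetical; `moving_average_split` is only a stand-in that lets the example run on its own and is not EMD. We also assume, following common EEMD practice, that d is a fraction of the standard deviation of the signal rather than an absolute value, and that every trial returns the same number of components (real EMD may not, so a fixed IMF cap would be needed).

```python
import numpy as np


def moving_average_split(s, w=8):
    """Illustration-only stand-in for EMD: a fast part and a smooth part that sum to s."""
    smooth = np.convolve(s, np.ones(w) / w, mode="same")
    return np.vstack([s - smooth, smooth])


def eemd(x, decompose, n_components, k=50, d=0.2, seed=None):
    """Ensemble averaging of Eq. (2) over k noise-added decompositions."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    total = np.zeros((n_components, len(x)))
    for _ in range(k):
        # A fresh white-noise series per trial; d is relative to std(x) (assumption).
        noisy = x + rng.normal(0.0, d * x.std(), size=len(x))
        parts = np.asarray(decompose(noisy), dtype=float)
        if parts.shape != total.shape:
            raise ValueError("decompose must return n_components rows per trial")
        total += parts
    # Rows are the ensemble-mean components p_i(t); the last row plays the role of r(t).
    return total / k
```

Averaging over the k trials cancels most of the added noise, so the averaged components still approximately sum to the original series.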

The IMF components have three main characteristics:

(1) Completeness: the sum of all IMFs and the R reconstructs the original data, exactly for EMD as in (1) and approximately for EEMD.

(2) Orthogonality: each IMF, with a certain physical meaning, is independent of and has no effect on the other IMFs; mathematically, the inner product of any two different IMFs is (ideally) zero.

(3) Adaptability: IMFs with higher frequencies are extracted from the original data earlier than those with lower frequencies, and the frequencies of the IMFs reflect the features of the original data.

3.3. Runs Test (RT). RT is a nonparametric test that checks the randomness of a sequence containing only two symbols or two values, such as + and − or 0 and 1. A run is defined as a maximal block of successive identical symbols (0 or 1). For example, the sequence “11110000011111000110010” includes 8 runs, 4 of which consist of successive “1”s and the other 4 of successive “0”s. RT can also be used to test a time series.

Assume that $M = \langle m_1(t), \ldots, m_i(t), \ldots, m_n(t)\rangle$ denotes a time series, where $m_i(t)$ is an element of this time series and n is the total number of elements. The mean value of these elements is calculated by the following formula:

$$\bar{M}=\frac{1}{n}\sum_{i=1}^{n}m_i(t) \qquad (3)$$

Then each element of this time series is encoded as follows:

$$G_i=\begin{cases}1, & m_i(t)\ge\bar{M}\\ 0, & m_i(t)<\bar{M}\end{cases} \qquad (4)$$

Thus, the time series is transformed into a sequence of 0s and 1s whose elements, under the null hypothesis of randomness, are assumed to be independent and identically distributed. The total number of runs (the RT value) reflects the fluctuation of the sequence: the more often the series crosses its mean, the more runs it contains.
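A minimal sketch of (3), (4), and run counting, checked against the example sequence in the text. The function names are illustrative and do not come from the source.

```python
import numpy as np


def binarize(series):
    """Eqs. (3)-(4): map each element to 1 if it is >= the series mean, else 0."""
    m = np.asarray(series, dtype=float)
    return (m >= m.mean()).astype(int)


def count_runs(symbols):
    """Number of runs, i.e., maximal blocks of identical consecutive symbols."""
    s = np.asarray(symbols)
    if s.size == 0:
        return 0
    return 1 + int(np.count_nonzero(s[1:] != s[:-1]))


def rt_value(series):
    """RT value of a time series: the run count of its mean-binarized sequence."""
    return count_runs(binarize(series))
```

For instance, the series [1, 5, 2, 6, 3] has mean 3.4, binarizes to [0, 1, 0, 1, 0], and therefore has an RT value of 5.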

4. A Hybrid Method for Short-Term Host Utilization Prediction

To improve the prediction accuracy and reduce the prediction time of the EEMD-ARIMA method, we propose a hybrid method, EEMD-RT-ARIMA, for short-term host utilization prediction, as shown in Figure 2. First, the host utilization sequence is decomposed into multiple IMF components and the R component using the EEMD method. Next, we calculate the correlation coefficients between the IMF components and the original data sequence to select the efficient IMF components. Then, we use RT values and average periods to reconstruct these efficient IMF and R components into three new components: a high-frequency and strong-volatility component, a medium-frequency and weak-volatility component, and a low-frequency trend component. We then use the ARIMA method to predict each of the three new components. Finally, the overall prediction results are obtained by summing the prediction results of the three new components.

The key to our EEMD-RT-ARIMA method is to select and reconstruct efficient components. Compared with the EEMD-ARIMA method, it reduces the number of components involved in ARIMA prediction, and thus reduces both the prediction error accumulation and the total prediction time. From their implementation processes, both the EEMD-RT-ARIMA and EEMD-ARIMA methods clearly take longer to predict than the ARIMA model. However, our EEMD-RT-ARIMA method focuses on cost-effectiveness, trading off prediction accuracy, effectiveness, and time cost.

4.1. Use of EEMD to Decompose the Host Utilization Sequence. Host utilization sequences are classified into different categories according to the resource type (CPU, memory, and disk), such as the CPU utilization sequence $T_{cpu} = \{c_1, \ldots, c_i, \ldots, c_n\}$. The CPU utilization sequence is usually random and unstable owing to random and sudden resource demands in cloud computing, so it is necessary to transform such data into relatively stationary data to improve prediction accuracy. The EEMD method appears to be more effective than other decomposition algorithms in processing nonlinear and nonstationary data sequences. Therefore, we use the EEMD method to decompose the host utilization sequence and obtain a series of $\mathrm{IMF}_i$ components and the R component.

A running example shows the nonstationary CPU utilization trace of a physical host from our cloud platform. We divide it into a training set (673 data points) and a testing set (24 data points), as shown in Figure 3. Then, we use the EEMD method to decompose the training set and obtain the IMF1–IMF8 components and the R component, which are shown from high frequency to low frequency in Figure 4.

4.2. Calculation of the Correlation Coefficients to Select Efficient IMF Components. A correlation coefficient measures the correlation between two sequences. We calculate the correlation coefficient $P_j(X, Y)$ between the $\mathrm{IMF}_j$ component and the original training set using the following formula, where $\mathrm{Cov}(X, Y)$ is the covariance between the sequences X and Y, and $\mathrm{Var}(X)$ and $\mathrm{Var}(Y)$ are the variances of X and Y:

$$P_j(X,Y)=\frac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}(X)}\,\sqrt{\mathrm{Var}(Y)}} \qquad (5)$$

Then, the sign of $P_j(X, Y)$ is checked. If it is negative, the $\mathrm{IMF}_j$ component is regarded as inefficient and dropped; otherwise, the $\mathrm{IMF}_j$ component is regarded as efficient and retained.

We calculate the correlation coefficient between each IMF component and the original training set. Only IMF6 and IMF7 have negative correlation coefficients, −0.08 and −0.15, respectively, so they are dropped. IMF1–IMF5 and IMF8 are selected as efficient IMF components.
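The selection rule based on (5) can be sketched as follows. `select_efficient_imfs` is a hypothetical name, and NumPy's `corrcoef` computes the Pearson coefficient. A constant component would make the coefficient undefined (NaN); this sketch then drops it, which is an assumption beyond the source.

```python
import numpy as np


def select_efficient_imfs(x, imfs):
    """Eq. (5): keep IMF_j only if its Pearson correlation with x is non-negative."""
    x = np.asarray(x, dtype=float)
    kept, coeffs = [], []
    for j, imf in enumerate(np.asarray(imfs, dtype=float)):
        p = np.corrcoef(imf, x)[0, 1]   # Cov(X, Y) / (sqrt(Var X) * sqrt(Var Y))
        coeffs.append(p)
        if p >= 0:                      # NaN compares False, so it is dropped
            kept.append(j)
    return kept, np.array(coeffs)
```

Applied to the running example, this rule would keep IMF1–IMF5 and IMF8 and drop IMF6 and IMF7, whose coefficients are negative.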

4.3. Reconstruction of Efficient IMFs and R into New Components. Each IMF component reflects a certain physical feature of the original data. If some IMF components are close in frequency and amplitude fluctuation, they share similar features and can therefore be reconstructed into a new component that carries these typical features. Thus, the prediction error accumulation and the prediction time of the EEMD-ARIMA method can be reduced by reducing the number of components.

Figure 2: EEMD-RT-ARIMA method. (Flowchart: use EEMD to decompose the host utilization sequence into IMF1, IMF2, …, IMF(h−1), and R; use correlation coefficients to select the efficient IMFs IMF1, …, IMFm; use RT values and average periods to reconstruct the efficient IMFs and R into three new components, namely the high-frequency and strong-volatility component, the medium-frequency and weak-volatility component, and the low-frequency trend component; use ARIMA to predict each of these components; construct the overall prediction result by summing the prediction results of the three new components.)

The average period reflects how frequently host utilization varies, and it is reciprocally related to frequency: the smaller the average period, the higher the frequency. If the average periods of IMF components are close, the components are close in frequency. The average period is calculated by the following formula, in which n is the number of points in the training set and $l_j$ is the number of extrema of $\mathrm{IMF}_j$:

$$T_j=\frac{n}{l_j} \qquad (6)$$

Similarly, the RT value reflects the trend of amplitude fluctuation: the larger the RT value, the stronger the amplitude volatility. If the RT values of two IMFs are close, the two IMFs have similar overall trends in amplitude volatility.

To enhance the prediction accuracy and reduce the prediction time of the EEMD-ARIMA method, the EEMD-RT-ARIMA method reconstructs the IMF components and the R component into three new components according to their average periods and RT values. Because the average period and the RT value have different units, we normalize the average period $T_j$ as follows:

$$T_{nj}=\frac{T_j-T_{\min}}{T_{\max}-T_{\min}} \qquad (7)$$

where $T_{nj}$ denotes the normalized average period of the $\mathrm{IMF}_j$ component, and $T_{\max}$ and $T_{\min}$ represent the maximum and minimum of the average periods of all IMF components. Similarly, the RT value $R_j$ is normalized as follows:

$$R_{nj}=\frac{R_j-R_{\min}}{R_{\max}-R_{\min}} \qquad (8)$$

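where $R_{nj}$ is the normalized value of $R_j$, and $R_{\max}$ and $R_{\min}$ are the maximum and minimum of all RT values. The reconstruction factor (RF) is then defined as follows, where α and β are the weights of the normalized average period and the normalized RT value, respectively:

$$F_j=\alpha\cdot\left(1-T_{nj}\right)+\beta\cdot R_{nj} \qquad (9)$$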

The higher the frequency and the stronger the volatility of an IMF component, the greater its RF value. If the RF values of two IMF components are close, their overall trends are similar, so they can be reconstructed into a new component. All efficient IMF and R components are reconstructed into three new components: a high-frequency and strong-volatility component, a medium-frequency and weak-volatility component, and a low-frequency trend component. The high-frequency and strong-volatility component reflects the strong volatility and randomness of the high-frequency part of the original host utilization sequence. The medium-frequency and weak-volatility component shows the detailed features of the volatility of the original sequence. The low-frequency trend component depicts the overall trend of the original sequence.

Table 1 shows the RT values, average periods, and RF values of the efficient IMF and R components. The RF values of IMF1 and IMF2 are large and relatively close, the RF values of IMF3–IMF5 are close to one another, and the RF values of IMF8 and R are both 0. Therefore, we reconstruct IMF1–IMF2, IMF3–IMF5, and IMF8–R into three new components, as shown in Figures 5(a)–5(c). They reflect, respectively, the randomness, the fluctuation details, and the overall trend of the original host utilization sequence.
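The following sketch computes (6)–(9). The source does not state α and β; setting α = β = 0.5 reproduces the RF row of Table 1 to two decimals from its RT and average-period rows, so we use those values as an assumption. Counting extrema as sign changes of the first difference, and treating a component with no extrema as having $l_j = 1$ (which matches the average period of 673 reported for R), are also assumptions; all function names are hypothetical.

```python
import numpy as np


def average_period(component):
    """Eq. (6): T_j = n / l_j, with l_j the number of local extrema (at least 1)."""
    c = np.asarray(component, dtype=float)
    d = np.diff(c)
    n_ext = np.count_nonzero(d[:-1] * d[1:] < 0)   # sign changes of the first difference
    return len(c) / max(n_ext, 1)


def minmax(v):
    """Eqs. (7)-(8): min-max normalization to [0, 1]."""
    v = np.asarray(v, dtype=float)
    span = v.max() - v.min()
    return (v - v.min()) / span if span > 0 else np.zeros_like(v)


def reconstruction_factor(rt_values, periods, alpha=0.5, beta=0.5):
    """Eq. (9): F_j = alpha * (1 - T_nj) + beta * R_nj."""
    return alpha * (1 - minmax(periods)) + beta * minmax(rt_values)


# Table 1 inputs for IMF1, IMF2, IMF3, IMF4, IMF5, IMF8, and R.
rt = [431, 181, 96, 47, 21, 5, 2]
periods = [2.12, 8.31, 16.41, 30.59, 67.3, 673, 673]
print(np.round(reconstruction_factor(rt, periods), 2))
```

Note that with $l_j$ counting both maxima and minima, (6) yields half the oscillation period of a pure sinusoid; this does not affect the RF ranking because all components are measured the same way.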

4.4. Use of the ARIMA Model to Predict the Future Host Utilization. We use the ARIMA model to predict the future values of each new component. Then, the overall prediction results are obtained by superposing the prediction results of the new components. The ARIMA prediction procedure is described in Algorithm 1.

For example, assume that three new components $C_h$, $C_m$, and $C_l$ are obtained, representing the high-frequency and strong-volatility component, the medium-frequency and weak-volatility component, and the low-frequency trend component, respectively. Then we use the ARIMA method to predict the next 24 values of each new component. The prediction results $P_h$, $P_m$, and $P_l$ of the three new components are given by the following formulas, each of which contains the 24 predicted values:

$$\begin{aligned}P_h&=\left(f_h^{1}, f_h^{2}, \ldots, f_h^{24}\right)\\ P_m&=\left(f_m^{1}, f_m^{2}, \ldots, f_m^{24}\right)\\ P_l&=\left(f_l^{1}, f_l^{2}, \ldots, f_l^{24}\right)\end{aligned} \qquad (10)$$

Finally, we calculate the overall prediction result P by superposing (adding point by point) the prediction results of the new components:

$$P=P_h+P_m+P_l \qquad (11)$$

From this process, we find that the EEMD-RT-ARIMA method reduces the number of components from 9 (IMF1–IMF8 and R) to 3, which can reduce the total prediction time and the error accumulation of component prediction compared with the EEMD-ARIMA method.
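Algorithm 1 (the ARIMA procedure referenced in Section 4.4), combined with the superposition in (10) and (11), can be sketched with NumPy alone under substantial simplifications that we state explicitly. The ADF test uses a single lagged difference and the asymptotic 5% critical value of −2.86 for a regression with a constant only. The model is an AR(p) on the d-times-differenced series, that is, ARIMA(p, d, 0) with no moving-average terms, fitted by conditional least squares instead of maximum likelihood. A full implementation would typically use a statistics library's ADF test and ARIMA estimator instead; all function names here are hypothetical.

```python
import numpy as np

ADF_CRIT_5PCT = -2.86  # asymptotic 5% critical value, constant-only ADF regression


def _ols(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta


def adf_is_stationary(y, lags=1):
    """Crude ADF test: regress dy_t on [1, y_{t-1}, dy_{t-1..t-lags}]."""
    dy = np.diff(y)
    rows = len(dy) - lags
    X = np.column_stack([np.ones(rows), y[lags:-1]] +
                        [dy[lags - i:len(dy) - i] for i in range(1, lags + 1)])
    beta, resid = _ols(X, dy[lags:])
    sigma2 = resid @ resid / (rows - X.shape[1])
    se = np.sqrt(sigma2 * np.linalg.pinv(X.T @ X)[1, 1])
    if not np.isfinite(se) or se == 0:
        return False
    return beta[1] / se < ADF_CRIT_5PCT


def fit_ar(z, p, start):
    """Least-squares AR(p) with intercept on z[start:]; returns (coefficients, BIC)."""
    t = np.arange(start, len(z))
    X = np.column_stack([np.ones(len(t))] + [z[t - i] for i in range(1, p + 1)])
    beta, resid = _ols(X, z[t])
    sigma2 = max(resid @ resid / len(t), 1e-12)
    return beta, len(t) * np.log(sigma2) + (p + 1) * np.log(len(t))


def forecast_component(y, steps, max_d=2, max_p=4):
    """Algorithm 1 for one component, with an AR-only stand-in for ARIMA(p, d, 0)."""
    y = np.asarray(y, dtype=float)
    z, d = y, 0
    while d < max_d and not adf_is_stationary(z):      # steps (2)-(4)
        z, d = np.diff(z), d + 1
    # Step (5): choose p by BIC on a common estimation sample.
    fits = [(fit_ar(z, p, max_p), p) for p in range(1, max_p + 1)]
    (beta, _), p = min(fits, key=lambda item: item[0][1])
    hist = list(z)
    for _ in range(steps):                              # step (7): recursive forecasts
        hist.append(beta[0] + sum(beta[i] * hist[-i] for i in range(1, p + 1)))
    fc = np.array(hist[len(z):])
    for level in reversed(range(d)):                    # undo the d differences
        fc = np.diff(y, n=level)[-1] + np.cumsum(fc)
    return fc


def predict_host(components, steps):
    """Eqs. (10)-(11): one forecast per new component, summed point by point."""
    return sum(forecast_component(c, steps) for c in components)
```

Each component is forecast independently and the forecasts are added point by point, so reducing the components from 9 to 3 reduces the number of model fits in the same proportion.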

Figure 3: CPU utilization trace of a physical host (CPU utilization (%) versus sample index; training set followed by testing set).


5. Experimental Setup

We conduct an experiment to evaluate our method. The experimental dataset and measurement metrics are introduced as follows.

5.1. Experimental Dataset. Host utilization mainly involves CPU utilization, memory utilization, network utilization, and disk utilization. In this paper, we focus on host CPU utilization. We randomly select the CPU utilization traces of 7 physical hosts from the dataset released by Alibaba in August 2017 [49], each of which includes 144 points (5 minutes per point). These traces are all time-dependent sequences, as shown in Figures 6(a)–6(g).

Each sequence is divided into a training set and a testing set. We first use the training set to predict future data, and then the predicted data are compared with the actual data in the testing set to evaluate our method. In this paper, each training set consists of the first 120 points, and the testing set consists of the subsequent 6, 12, or 24 points. We set the number of iterations k = 50 and the standard deviation d = 0.2 in the EEMD decomposition.

5.2. Measurement Metrics. We evaluate our method in terms of error, effectiveness, and time-cost analysis, as follows.

Figure 4: Decomposition results of EEMD: (a)–(h) IMF1–IMF8; (i) R (CPU utilization (%) versus sample index).

Table 1: RT values, average periods, and RF values of efficient IMF and R components.

| Component | IMF1 | IMF2 | IMF3 | IMF4 | IMF5 | IMF8 | R |
|---|---|---|---|---|---|---|---|
| RT | 431 | 181 | 96 | 47 | 21 | 5 | 2 |
| Average period (data points) | 2.12 | 8.31 | 16.41 | 30.59 | 67.3 | 673 | 673 |
| RF | 1 | 0.70 | 0.60 | 0.53 | 0.47 | 0 | 0 |

5.2.1. Error Analysis. To evaluate our method, we use the mean absolute percentage error (MAPE) to reflect the prediction accuracy. MAPE is defined as follows:

$$\mathrm{MAPE}=\frac{1}{m}\sum_{i=1}^{m}\left|\frac{x_i^{f}-x_i^{t}}{x_i^{t}}\right|\times 100\% \qquad (12)$$

where $x_i^{f}$ denotes the predicted value at point i, $x_i^{t}$ denotes the actual value in the testing set, and m denotes the number of prediction points. The lower the MAPE, the higher the prediction accuracy.
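A direct transcription of (12), with a hypothetical function name. MAPE is undefined when an actual value is 0, which can occur for idle hosts; this sketch does not guard against that case.

```python
import numpy as np


def mape(predicted, actual):
    """Eq. (12): mean absolute percentage error over the m prediction points, in percent."""
    f = np.asarray(predicted, dtype=float)
    a = np.asarray(actual, dtype=float)
    return float(np.mean(np.abs((f - a) / a)) * 100.0)
```

For example, predictions of 110 and 90 against actual values of 100 and 100 each miss by 10%, giving a MAPE of 10%.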

5.2.2. Effectiveness Analysis. Underprediction or overprediction of host utilization can lead to resource underprovisioning or overprovisioning. Resource underprovisioning cannot guarantee applications' QoS, while resource overprovisioning causes resource waste and low resource utilization. Therefore, a good prediction method should avoid both underprediction and overprediction. In particular, underprediction should be avoided as much as possible because it results in lower QoS for users.

We define positive and negative errors to reflect overprediction and underprediction, respectively, and use them to evaluate the effectiveness of our method. A good prediction method should have a small negative error to avoid underprediction. The positive and negative prediction errors are calculated separately by the following formula, where $p_i$ is the predicted value, $r_i$ is the actual value, and m is the number of underpredicted points (i.e., negative deviations) when computing the negative error, or the number of overpredicted points (i.e., positive deviations) when computing the positive error:

$$e=\frac{1}{m}\sum_{i=1}^{m}\left|p_i-r_i\right| \qquad (13)$$
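The following sketch applies (13) separately to overpredicted (positive deviation) and underpredicted (negative deviation) points. The function name is hypothetical, and returning `None` for an empty class is our assumption about what the "Null" entries in Table 4 denote. Points predicted exactly are counted in neither class.

```python
import numpy as np


def signed_errors(predicted, actual):
    """Eq. (13) per class: returns (positive_error, negative_error), None if a class is empty."""
    p = np.asarray(predicted, dtype=float)
    r = np.asarray(actual, dtype=float)
    dev = p - r
    over, under = dev[dev > 0], dev[dev < 0]
    pos = float(np.mean(np.abs(over))) if over.size else None
    neg = float(np.mean(np.abs(under))) if under.size else None
    return pos, neg
```

Reporting the two errors separately exposes the asymmetry the text cares about: a method can have a modest overall error yet still underpredict badly.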

5.2.3. Time Cost Analysis. Host utilization varies very quickly in a cloud data center. If host utilization prediction is slower than the decision on VM migration, resource provisioning will be delayed, which can cause poor QoS. Thus, host utilization prediction must be completed in a timely manner. To investigate the time cost of our proposed method, we measure the running time of the EEMD-RT-ARIMA method and compare it with other prediction methods according to the following index $t_c$:

$$t_c=\frac{t_{our}-t_{other}}{t_{other}}\times 100\% \qquad (14)$$

where $t_{our}$ indicates the running time of our EEMD-RT-ARIMA method, $t_{other}$ represents the running time of another method, such as the ARIMA model or the EEMD-ARIMA method, and $t_c$ denotes the percentage by which the time cost is reduced (negative values) or increased (positive values).
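Equation (14) in code, applied to the running times reported in Section 6.3 for the 6-point predictions of hosts 22 and 237 (function name hypothetical). A negative value means the EEMD-RT-ARIMA method is faster.

```python
def time_cost_change(t_our, t_other):
    """Eq. (14): percentage change in running time relative to another method."""
    return (t_our - t_other) / t_other * 100


print(round(time_cost_change(69.37, 337.20), 2))   # -79.43 (host 22, vs. EEMD-ARIMA)
print(round(time_cost_change(113.64, 190.46), 2))  # -40.33 (host 237, vs. EEMD-ARIMA)
```

These values correspond to the "up to 80%" and "approximately 40%" reductions stated in Section 6.3.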

6. Experimental Results and Analysis

To validate the prediction effectiveness of our EEMD-RT-ARIMA method, we conduct experiments with the ARIMA, EEMD-ARIMA, and EEMD-RT-ARIMA methods and compare their prediction results. All experiments were performed on a PC with a 2.5 GHz Intel(R) i7 CPU running MATLAB. To make the three methods comparable, each method is executed 5 times on the same original dataset, and the mean values of the prediction results are shown in the following tables and figures.

Figure 5: New components. (a) High-frequency and strong-volatility component. (b) Medium-frequency and weak-volatility component. (c) Low-frequency trend component. (Each panel plots CPU utilization (%).)

8 Journal of Electrical and Computer Engineering

6.1. Error Analysis. Table 2 shows the MAPE values of host utilization predictions for 7 physical hosts. The EEMD-ARIMA and EEMD-RT-ARIMA methods have lower MAPE values than the ARIMA model for 6-point and 12-point predictions. For example, the EEMD-ARIMA and EEMD-RT-ARIMA methods achieve MAPE values of 6.06% and 5.05% for the 6-point prediction of host 109, while the ARIMA model has a far higher MAPE value of 16.85%. They obtain MAPE values of 10.13% and 5.46% for the 12-point prediction of host 109, while the ARIMA model obtains 11.08%. For host 22, both the EEMD-ARIMA and EEMD-RT-ARIMA methods obtain far lower MAPE values (5.31% and 5.42%) than the 10.66% of the ARIMA model for 6-point prediction, and they also perform better in 12-point prediction. The same holds for the 6-point and 12-point predictions of the other hosts. This indicates that both the EEMD-ARIMA and EEMD-RT-ARIMA methods predict host utilization more accurately than the ARIMA model in 6-point and 12-point predictions: EEMD reduces the inherent volatility of the host utilization sequence, which improves the prediction accuracy of both methods. However, the situation changes in 24-point prediction. Most of the 24-point MAPE values of hosts 1162, 424, 1060, and 237 exceed 30% across the three methods. Although the EEMD-RT-ARIMA method has lower MAPE values than the ARIMA and EEMD-ARIMA methods for hosts 839 and 109, its 24-point MAPE values are much higher than its 6-point and 12-point values. This shows that the EEMD-ARIMA and EEMD-RT-ARIMA methods are suitable for short-term rather than long-term prediction.
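Table 2 reports MAPE, which is defined earlier in the paper. Assuming the standard definition, $\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$ for an $n$-point forecast, a minimal Python sketch is shown below, together with how the percentage-point gaps quoted in this section follow from Table 2. The helper name `mape` and the 3-point values are hypothetical.

```python
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error (%) of an n-point forecast."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is 0")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical 3-point example (CPU utilization in %)
print(mape([10.0, 20.0, 40.0], [11.0, 18.0, 40.0]))

# Gap for host 839, 12-point prediction (Table 2): EEMD-ARIMA minus EEMD-RT-ARIMA
print(round(10.48 - 5.03, 2))
```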

For further analysis, we find that the EEMD-RT-ARIMA method achieves lower prediction errors than the EEMD-ARIMA method for the 6-point and 12-point predictions of hosts 839, 109, and 1162, even though it uses only the efficient IMF components. However, the opposite holds for hosts 22, 424, 1060, and 237. The original CPU utilization sequences of all physical hosts share the same sampling frequency, so we calculate the RT value of each CPU utilization sequence, as shown in Table 3. Hosts 839, 109, and 1162 have RT values of at most 10, which shows that their CPU utilization is more stationary than that of the other hosts; smaller RT values indicate more stationary host utilization sequences. This can also be seen in Figures 6(a)–6(c). Tables 2 and 3 show that the EEMD-RT-ARIMA method achieves a lower MAPE value than the EEMD-ARIMA method when the RT value is smaller and, conversely, a higher MAPE value when the RT value is larger. For instance, hosts 839, 109, and 1162, with smaller RT values, obtain lower MAPE values with the EEMD-RT-ARIMA method than with the EEMD-ARIMA method, while hosts 22, 424, 1060, and 237, with larger RT values, obtain higher MAPE values with the EEMD-RT-ARIMA method than with the EEMD-ARIMA method.

Algorithm 1: ARIMA prediction.
(1) For each new component:
(2)   Set the order of differencing d = 0.
(3)   Execute the augmented Dickey-Fuller (ADF) test. If the series is stationary, go to step 5; otherwise, go to step 4.
(4)   Difference the time series, set d = d + 1, and go to step 3.
(5)   Determine the order of the ARIMA model using the Bayesian information criterion (BIC).
(6)   Estimate the parameters of the ARIMA model by maximum likelihood.
(7)   Forecast the future n values of this component using the ARIMA model.
(8) End.
(9) Obtain the overall prediction results by superposing the prediction results of all new components.

Figure 6: CPU utilization traces of 7 physical hosts. (a) Host 839. (b) Host 109. (c) Host 1162. (d) Host 22. (e) Host 424. (f) Host 1060. (g) Host 237. (Each panel plots CPU utilization (%) on a 0–60 scale.)
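Algorithm 1 relies on an ADF test, BIC order selection, and maximum-likelihood ARIMA estimation, which in practice come from a statistics library. To keep the example dependency-free, the Python sketch below substitutes simpler techniques, and it is not the authors' implementation: a non-augmented Dickey-Fuller check against the 5% critical value (−2.86, model with a constant) replaces the ADF test, and an AR(p) model fitted by least squares with p chosen by BIC (no moving-average terms) replaces the ARIMA fit. All function names and the toy data are hypothetical, and the inputs are assumed to be noisy enough that the least-squares systems are non-singular.

```python
import math
import random


def df_is_stationary(y, crit=-2.86):
    """Simplified Dickey-Fuller check: regress dy_t on [1, y_{t-1}] and compare the
    t-statistic of y_{t-1} with the 5% critical value (constant, no lagged differences)."""
    x, dy = y[:-1], [b - a for a, b in zip(y, y[1:])]
    n = len(dy)
    mx, my = sum(x) / n, sum(dy) / n
    sxx = sum((v - mx) ** 2 for v in x)
    if sxx == 0:
        return True  # a constant series is trivially stationary
    rho = sum((v - mx) * (w - my) for v, w in zip(x, dy)) / sxx
    rss = sum((w - my - rho * (v - mx)) ** 2 for v, w in zip(x, dy))
    se = math.sqrt(rss / (n - 2) / sxx)
    return rho < 0 if se == 0 else rho / se < crit


def solve(a, b):
    """Gaussian elimination with partial pivoting (assumes a non-singular system)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ar(z, p, start):
    """Least-squares AR(p) with intercept on z[start:]; returns (coefficients, BIC)."""
    rows = [[1.0] + [z[t - k] for k in range(1, p + 1)] for t in range(start, len(z))]
    target = z[start:]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p + 1)] for i in range(p + 1)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(p + 1)]
    beta = solve(xtx, xty)
    n = len(target)
    rss = sum((t - sum(b * v for b, v in zip(beta, r))) ** 2 for r, t in zip(rows, target))
    return beta, n * math.log(max(rss / n, 1e-12)) + (p + 1) * math.log(n)


def forecast_component(series, steps, max_d=2, max_p=4):
    """Steps (2)-(7) for one component, with AR(p) standing in for ARIMA(p, d, q)."""
    z, lasts = list(series), []
    while len(lasts) < max_d and not df_is_stationary(z):  # steps (3)-(4)
        lasts.append(z[-1])
        z = [b - a for a, b in zip(z, z[1:])]
    fits = [fit_ar(z, p, max_p) for p in range(1, max_p + 1)]  # same sample for every p
    beta = min(fits, key=lambda fit: fit[1])[0]                # step (5): lowest BIC
    hist, out = list(z), []
    for _ in range(steps):                                     # step (7): recursive forecast
        nxt = beta[0] + sum(beta[k] * hist[-k] for k in range(1, len(beta)))
        hist.append(nxt)
        out.append(nxt)
    for last in reversed(lasts):                               # undo each differencing round
        level, undone = last, []
        for step in out:
            level += step
            undone.append(level)
        out = undone
    return out


def forecast_total(components, steps):
    """Step (9): superpose the forecasts of all new components."""
    preds = [forecast_component(c, steps) for c in components]
    return [sum(vals) for vals in zip(*preds)]


# Toy components (hypothetical data, not the paper's traces)
rng = random.Random(0)
trend = [20 + 0.01 * t + rng.gauss(0, 0.05) for t in range(300)]
wave = [2 * math.sin(0.2 * t) + rng.gauss(0, 0.3) for t in range(300)]
print([round(v, 2) for v in forecast_total([trend, wave], steps=6)])
```

A library ARIMA routine with a genuine ADF test and BIC-based order search would bring this closer to Algorithm 1; the differencing loop, the lowest-BIC choice, and the final superposition are the parts the sketch mirrors directly.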

Furthermore, the MAPE advantage of the EEMD-RT-ARIMA method over the EEMD-ARIMA method shrinks as the RT value increases from host 839 to host 1162. The difference turns negative from host 22 onward, which indicates that the EEMD-ARIMA method then predicts more accurately than the EEMD-RT-ARIMA method, and its magnitude grows as the RT values increase further. The 12-point host CPU utilization prediction illustrates this situation. For example, for host 839, with an RT value of 2, the EEMD-RT-ARIMA method has a 12-point MAPE of 5.03%, which is 5.45 percentage points lower than the 10.48% of the EEMD-ARIMA method. For host 1162, with an RT value of 10, the MAPE of the EEMD-RT-ARIMA method is only 4.19 percentage points lower than that of the EEMD-ARIMA method. For host 22, with an RT value of 16, the EEMD-RT-ARIMA method has a slightly higher MAPE (5.37%) than the EEMD-ARIMA method (5.33%). As the RT value increases, the MAPE differences between the EEMD-RT-ARIMA and EEMD-ARIMA methods grow to 0.87, 1.96, and 4.63 percentage points for hosts 424, 1060, and 237, respectively, which indicates that the EEMD-RT-ARIMA method is less effective than the EEMD-ARIMA method for CPU utilization prediction on these hosts. The ARIMA prediction of each component decomposed by the EEMD method inevitably introduces some error, and superposing the prediction results of all components accumulates these errors. The EEMD-RT-ARIMA method reduces this error accumulation by selecting the efficient IMF components and reconstructing them into fewer components, so it achieves better prediction accuracy than the EEMD-ARIMA method for hosts 839, 109, and 1162. However, the selection and reconstruction of efficient IMF components also introduce some prediction error because the nonefficient components are dropped, especially for nonstationary host utilization sequences. When this error exceeds the error accumulated by the ARIMA predictions of all components in the EEMD-ARIMA method, the EEMD-RT-ARIMA method is less effective than the EEMD-ARIMA method, as for the nonstationary CPU utilization sequences of hosts 22, 424, 1060, and 237.
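The trend described above can be checked directly against the published numbers. This short, stdlib-only Python sketch (variable and function names are illustrative) recomputes the 12-point MAPE gap between the EEMD-ARIMA and EEMD-RT-ARIMA methods from Table 2 and orders the hosts by their Table 3 RT values.

```python
# 12-point MAPE values (%) from Table 2 and RT values from Table 3
rt_value = {839: 2, 109: 6, 1162: 10, 22: 16, 424: 22, 1060: 24, 237: 38}
mape_eemd_arima = {839: 10.48, 109: 10.13, 1162: 13.14, 22: 5.33,
                   424: 16.59, 1060: 17.44, 237: 11.30}
mape_eemd_rt_arima = {839: 5.03, 109: 5.46, 1162: 8.95, 22: 5.37,
                      424: 17.46, 1060: 19.40, 237: 15.93}


def mape_gap_by_rt():
    """Return (host, RT, EEMD-ARIMA MAPE minus EEMD-RT-ARIMA MAPE), sorted by RT.
    A positive gap means EEMD-RT-ARIMA is the more accurate method for that host."""
    rows = []
    for host in sorted(rt_value, key=rt_value.get):
        gap = round(mape_eemd_arima[host] - mape_eemd_rt_arima[host], 2)
        rows.append((host, rt_value[host], gap))
    return rows


for host, rt, gap in mape_gap_by_rt():
    print(f"host {host:>4}  RT={rt:>2}  gap={gap:+.2f}")
```

The gap falls monotonically from +5.45 for host 839 to −4.63 for host 237, which is the pattern the paragraph describes.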

Table 2: MAPE values of host utilization prediction.

Host ID   Prediction length   ARIMA (%)   EEMD-ARIMA (%)   EEMD-RT-ARIMA (%)
839       6-point             9.77        4.73             4.61
839       12-point            13.78       10.48            5.03
839       24-point            23.10       21.53            8.82
109       6-point             16.85       6.06             5.05
109       12-point            11.08       10.13            5.46
109       24-point            41.34       24.68            8.36
1162      6-point             24.45       7.76             6.45
1162      12-point            17.81       13.14            8.95
1162      24-point            21.93       41.41            33.45
22        6-point             10.66       5.31             5.42
22        12-point            8.22        5.33             5.37
22        24-point            11.41       18.44            14.77
424       6-point             37.54       16.12            17.65
424       12-point            21.74       16.59            17.46
424       24-point            88.68       77.39            11.57
1060      6-point             35.71       10.05            13.94
1060      12-point            37.60       17.44            19.40
1060      24-point            74.72       54.82            91.93
237       6-point             22.29       7.85             11.78
237       12-point            25.51       11.30            15.93
237       24-point            126.11      158.43           189.51

Table 3: RT values of each host utilization sequence.

Host ID    839   109   1162   22   424   1060   237
RT value   2     6     10     16   22    24     38

6.2. Effectiveness Analysis. To verify the effectiveness of our method in short-term prediction, we select the experimental results of hosts 839, 22, and 237, which have the minimum, middle, and maximum RT values, for further analysis. Figure 7 shows the prediction results of the EEMD-RT-ARIMA, ARIMA, and EEMD-ARIMA methods. The future resource utilization of host 839 decreases below 11%; according to a predefined policy, host 839 is underloaded and can be shut down to save energy. Figure 7 shows that our method is more accurate and effective than the ARIMA model. In particular, our method follows the trend of the data, while the ARIMA model cannot keep up with it, so our method is more suitable for handling nonstationary time series than the ARIMA model. Additionally, for host 839, the values predicted by the EEMD-RT-ARIMA method are closer to the testing data than those of the EEMD-ARIMA method. These results show that the EEMD-RT-ARIMA method is more effective than the EEMD-ARIMA method for CPU utilization sequences with weak fluctuations. When the host utilization sequence fluctuates more strongly, dropping the nonefficient IMF components strongly affects the prediction results, and the EEMD-RT-ARIMA method is less effective than the EEMD-ARIMA method for the CPU utilization prediction of host 237.
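The "predefined policy" is not spelled out here. As a minimal sketch of how a short-term forecast could drive the underload/overload decision, the Python function below labels a host from its predicted CPU utilization; the 20% and 80% thresholds and the function name `classify_host` are hypothetical placeholders, not the authors' policy.

```python
from typing import Sequence


def classify_host(forecast: Sequence[float], under: float = 20.0, over: float = 80.0) -> str:
    """Label a host from its predicted CPU utilization (%).
    The thresholds are hypothetical stand-ins for a data center's own policy."""
    if not forecast:
        raise ValueError("forecast must not be empty")
    peak = max(forecast)
    if peak < under:
        return "underloaded"  # whole horizon below threshold: migrate VMs away, switch host off
    if peak > over:
        return "overloaded"   # some predicted point above threshold: migrate VMs elsewhere
    return "normal"


# Hypothetical 3-point forecast resembling host 839 (below 11%)
print(classify_host([10.8, 10.5, 10.2]))
```

Using the peak of the horizon is a conservative choice: a host is only marked underloaded if every predicted point stays below the threshold.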

Figure 7: Prediction results of different methods (ARIMA, EEMD-ARIMA, and EEMD-RT-ARIMA versus the testing data; y-axis: CPU utilization (%)). (a) Host 839, 6-point prediction. (b) Host 839, 12-point prediction. (c) Host 22, 6-point prediction. (d) Host 22, 12-point prediction. (e) Host 237, 6-point prediction. (f) Host 237, 12-point prediction.

To further analyze the effectiveness of our method, we calculate the positive and negative errors for the 6-point and 12-point predictions of these hosts, as shown in Table 4. The smaller the negative error, the more suitable the prediction method is for cloud resource provisioning, because it avoids underprediction. Most of the prediction results of the ARIMA model are underpredictions (the positive-error cells are all "Null" for hosts 839 and 22). Furthermore, the negative errors of the ARIMA model are far higher than those of the other methods for host 237. For instance, the ARIMA model has a negative error as high as 27.51% for the 12-point prediction of host 237, while the EEMD-ARIMA and EEMD-RT-ARIMA methods have negative errors of only 8.00% and 8.92%, respectively. If the ARIMA model were used to predict future host utilization, it could cause resource underprovisioning and fail to ensure applications' QoS. The EEMD-RT-ARIMA method achieves smaller negative errors than the EEMD-ARIMA method for hosts 839 and 22, but larger negative errors for host 237. For instance, the EEMD-ARIMA method has a negative error of 10.71% for the 12-point prediction of host 839, while the EEMD-RT-ARIMA method has only 4.74%. Similarly, the EEMD-ARIMA method obtains a negative error of 6.09% for the 12-point prediction of host 22, while the EEMD-RT-ARIMA method achieves a lower value of 5.05%. However, the situation changes for host 237, where the EEMD-ARIMA method has smaller negative errors than the EEMD-RT-ARIMA method: the EEMD-RT-ARIMA method obtains negative errors of 11.37% and 8.92% for the 6-point and 12-point predictions, while the EEMD-ARIMA method has negative errors of only 4.62% and 8.00%.
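The positive and negative errors of Table 4 are defined earlier in the paper. One plausible reading, assumed here rather than taken from the authors, averages the signed percentage errors of over-predicted points and of under-predicted points separately and reports "Null" when a method never over-predicts (or never under-predicts). A Python sketch under that assumption, with a hypothetical function name:

```python
from typing import List, Optional, Sequence, Tuple


def _mean(values: List[float]) -> Optional[float]:
    """Average of a list, or None (the table's "Null") when the list is empty."""
    return sum(values) / len(values) if values else None


def positive_negative_errors(actual: Sequence[float], predicted: Sequence[float]
                             ) -> Tuple[Optional[float], Optional[float]]:
    """Split percentage errors into over-prediction (positive error) and
    under-prediction (negative error, reported as a magnitude)."""
    pos: List[float] = []
    neg: List[float] = []
    for a, p in zip(actual, predicted):
        err = 100.0 * (p - a) / a  # signed percentage error of one point
        if err > 0:
            pos.append(err)
        elif err < 0:
            neg.append(-err)
    return _mean(pos), _mean(neg)


print(positive_negative_errors([10, 10, 10], [11, 9, 8]))  # (10.0, 15.0)
```

Under this reading, a method that only under-predicts, like ARIMA on hosts 839 and 22, returns `None` for the positive error.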

6.3. Time-Cost Analysis. To verify the applicability of our method, we further compare the time costs of these methods in Figure 8. The EEMD-ARIMA method has the largest running time, over 180 s, while the ARIMA model takes the least time, under 50 s. The time cost of the EEMD-RT-ARIMA method lies between approximately 70 s and 117 s, which is 40%–80% lower than that of the EEMD-ARIMA method. For example, for the 6-point prediction of host 22, the running time of the EEMD-RT-ARIMA method is 69.37 s, far less than the 337.20 s of the EEMD-ARIMA method; our method saves up to 80% of the time cost. For the CPU utilization sequence of host 237, which has strong variability, the EEMD-ARIMA method requires 190.46 s to predict the future 6-point values, while the EEMD-RT-ARIMA method takes only 113.64 s, a reduction of approximately 40%. Considering prediction accuracy, effectiveness, and time cost together, our EEMD-RT-ARIMA method is more cost-effective for short-term host utilization prediction in cloud computing.

7. Conclusions

Host utilization is an indicator of host performance, and predicting it can promote effective resource scheduling in cloud computing. However, host utilization exhibits strong randomness and instability caused by users' random and varied resource demands, which makes accurate prediction difficult. In this paper, we propose a hybrid and cost-effective method, EEMD-RT-ARIMA, for short-term host utilization prediction in cloud computing. The EEMD method is first used to decompose the nonstationary host utilization sequence into a few relatively stationary IMF components and an R component. Then, we calculate the correlation coefficient between each IMF component and the original data to select efficient IMF components, and we use RT values and average periods to reconstruct these components into three new components, which reduces error accumulation and time cost. Finally, the three new components are predicted by the ARIMA model, and their prediction results are superposed to form the overall prediction results. We use real host utilization traces from a cloud platform to conduct experiments and compare our EEMD-RT-ARIMA method with the ARIMA model and the EEMD-ARIMA method in terms of error, effectiveness, and time cost. The results show that our method is cost-effective and more suitable for short-term host utilization prediction in cloud computing.

Data Availability

The running example and experimental data used to support the findings of this study have been deposited in the Figshare repository (https://doi.org/10.6084/m9.figshare.7679594).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the Shandong Provincial Natural Science Foundation (ZR2016FM41).

Table 4: Positive and negative error analysis.

Host ID   Prediction   ARIMA               EEMD-ARIMA          EEMD-RT-ARIMA
          length       Pos. (%)  Neg. (%)  Pos. (%)  Neg. (%)  Pos. (%)  Neg. (%)
839       6-point      Null      9.77      3.16      5.85      4.92      4.15
839       12-point     Null      13.78     3.43      10.71     4.80      4.74
22        6-point      Null      10.66     3.91      6.70      4.12      6.34
22        12-point     Null      9.72      4.26      6.09      5.39      5.05
237       6-point      0.39      26.68     14.31     4.62      13.81     11.37
237       12-point     0.39      27.51     17.44     8.00      17.90     8.92

Figure 8: Time cost of different prediction methods (time cost in s, 0–350, of ARIMA, EEMD-ARIMA, and EEMD-RT-ARIMA for the 6-point and 12-point predictions of hosts 839, 22, and 237).


References

[1] J. J. Prevost, K. Nagothu, B. Kelley, and M. Jamshidi, "Prediction of cloud data center networks loads using stochastic and neural models," in Proceedings of 2011 6th International Conference on System of Systems Engineering, pp. 276–281, IEEE, Albuquerque, NM, USA, June 2011.

[2] M. Borkowski, S. Schulte, and C. Hochreiner, "Predicting cloud resource utilization," in Proceedings of 9th International Conference on Utility and Cloud Computing (UCC), pp. 37–42, IEEE, Shanghai, China, December 2016.

[3] M. Barati and S. Sharifian, "A hybrid heuristic-based tuned support vector regression model for cloud load prediction," Journal of Supercomputing, vol. 71, no. 11, pp. 4235–4259, 2015.

[4] Z. Chen, Y. Zhu, Y. Di, S. Feng, and J. Geng, "A high-accuracy self-adaptive resource demands predicting method in IaaS cloud environment," Neural Network World, vol. 25, no. 5, pp. 519–540, 2015.

[5] J. Chen and Y. Wang, "A resource demand prediction method based on EEMD in cloud computing," Procedia Computer Science, vol. 131, pp. 116–123, 2018.

[6] H. Toumi, Z. Brahmi, Z. Benarfa, and M. M. Gammoudi, "Server load prediction using stream mining," in Proceedings of 2017 International Conference on Information Networking (ICOIN), pp. 653–661, IEEE, Da Nang, Vietnam, January 2017.

[7] S. Di, D. Kondo, and W. Cirne, "Host load prediction in a Google compute cloud with a Bayesian model," in Proceedings of 2012 International Conference on High Performance Computing, Networking, Storage and Analysis (SC'12), pp. 1–11, IEEE, Salt Lake City, UT, USA, November 2012.

[8] N. K. Gondhi and P. Kailu, "Prediction based energy efficient virtual machine consolidation in cloud computing," in Proceedings of 2015 Second International Conference on Advances in Computing and Communication Engineering, pp. 437–441, IEEE, Dehradun, India, May 2015.

[9] A. Verma, G. Dasgupta, T. K. Nayak, P. De, and R. Kothari, "Server workload analysis for power minimization using consolidation," in Proceedings of the 2009 Conference on USENIX Annual Technical Conference, p. 28, USENIX Association, San Diego, CA, USA, June 2009.

[10] B. Song, Y. Yu, Y. Zhou, Z. Wang, and S. Du, "Host load prediction with long short-term memory in cloud computing," Journal of Supercomputing, vol. 74, no. 12, pp. 6554–6568, 2018.

[11] J.-J. Jheng, F.-H. Tseng, H.-C. Chao, and L.-D. Chou, "A novel VM workload prediction using grey forecasting model in cloud data center," in Proceedings of International Conference on Information Networking 2014 (ICOIN2014), pp. 40–45, IEEE, Phuket, Thailand, February 2014.

[12] A. Beloglazov and R. Buyya, "Managing overloaded hosts for dynamic consolidation of virtual machines in cloud data centers under quality of service constraints," IEEE Transactions on Parallel and Distributed Systems, vol. 24, no. 7, pp. 1366–1379, 2013.

[13] M. Dabbagh, B. Hamdaoui, M. Guizani, and A. Rayes, "An energy-efficient VM prediction and migration framework for overcommitted clouds," IEEE Transactions on Cloud Computing, vol. 6, no. 4, pp. 955–966, 2018.

[14] D. Minarolli, A. Mazrekaj, and B. Freisleben, "Tackling uncertainty in long-term predictions for host overload and underload detection in cloud computing," Journal of Cloud Computing, vol. 6, no. 1, p. 4, 2017.

[15] K. Mason, M. Duggan, E. Barrett, J. Duggan, and E. Howley, "Predicting host CPU utilization in the cloud using evolutionary neural networks," Future Generation Computer Systems, vol. 86, pp. 162–173, 2018.

[16] D. Magalhães, R. N. Calheiros, R. Buyya, and D. G. Gomes, "Workload modeling for resource usage analysis and simulation in cloud computing," Computers & Electrical Engineering, vol. 47, pp. 69–81, 2015.

[17] C. Tian, Y. Wang, Y. Luo et al., "Minimizing content reorganization and tolerating imperfect workload prediction for cloud-based video-on-demand services," IEEE Transactions on Services Computing, vol. 9, no. 6, pp. 926–939, 2016.

[18] M. Verma, G. R. Gangadharan, V. Ravi, and N. Narendra, "Resource demand prediction in multi-tenant service clouds," in Proceedings of 2013 IEEE International Conference on Cloud Computing in Emerging Markets (CCEM), pp. 1–8, IEEE, Bangalore, India, October 2013.

[19] W. Zhang, Y. Shi, L. Liu, L. Cui, and Y. Zheng, "Performance and resource prediction at high utilization for n-tier service systems in cloud: an experiment driven approach," in Proceedings of 2015 IEEE International Conference on Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing, pp. 843–848, IEEE, Liverpool, UK, October 2015.

[20] G. Kecskemeti, A. Kertesz, and Z. Nemeth, "Cloud workload prediction by means of simulations," in Proceedings of the Computing Frontiers Conference on ZZZ (CF'17), pp. 279–282, ACM, Siena, Italy, May 2017.

[21] Y. Chen and Z.-A. Jiang, "Dynamically predicting the quality of service: batch, online, and hybrid algorithms," Journal of Electrical and Computer Engineering, vol. 2017, Article ID 9547869, 10 pages, 2017.

[22] A. Khan, X. Yan, S. Tao, and N. Anerousis, "Workload characterization and prediction in the cloud: a multiple time series approach," in Proceedings of 2012 IEEE Network Operations and Management Symposium, pp. 1287–1294, IEEE, Maui, HI, USA, April 2012.

[23] A. K. Mishra, J. L. Hellerstein, W. Cirne, and C. R. Das, "Towards characterizing cloud backend workloads," ACM SIGMETRICS Performance Evaluation Review, vol. 37, no. 4, pp. 34–41, 2010.

[24] D. Gmach, J. Rolia, L. Cherkasova, and A. Kemper, "Workload analysis and demand prediction of enterprise data center applications," in Proceedings of 2007 IEEE 10th International Symposium on Workload Characterization, pp. 171–180, IEEE, Boston, MA, USA, September 2007.

[25] F.-H. Tseng, X. Wang, L.-D. Chou, H.-C. Chao, and V. C. M. Leung, "Dynamic resource prediction and allocation for cloud data center using the multiobjective genetic algorithm," IEEE Systems Journal, vol. 12, no. 2, pp. 1688–1699, 2018.

[26] G. K. Shyam and S. S. Manvi, "Virtual resource prediction in cloud environment: a Bayesian approach," Journal of Network and Computer Applications, vol. 65, pp. 144–154, 2016.

[27] Y. Lu, J. Panneerselvam, L. Liu, and Y. Wu, "RVLBPNN: a workload forecasting model for smart cloud computing," Scientific Programming, vol. 2016, Article ID 5635673, 9 pages, 2016.

[28] K. Rajaram and M. P. Malarvizhi, "Utilization based prediction model for resource provisioning," in Proceedings of 2017 International Conference on Computer, Communication and Signal Processing (ICCCSP), pp. 1–6, IEEE, Chennai, India, January 2017.


[29] L. Li and A. Zhang, "Resource demand optimization combined prediction under cloud computing environment based on IOWGA operator," International Journal of Grid and Distributed Computing, vol. 8, no. 3, pp. 77–86, 2015.

[30] D. Minarolli and B. Freisleben, "Cross-correlation prediction of resource demand for virtual machine resource allocation in clouds," in Proceedings of 2014 Sixth International Conference on Computational Intelligence, Communication Systems and Networks, pp. 119–124, IEEE, Tetova, Macedonia, May 2014.

[31] W. Zhang, P. Duan, L. T. Yang et al., "Resource requests prediction in the cloud computing environment with a deep belief network," Software: Practice and Experience, vol. 47, no. 3, pp. 473–488, 2017.

[32] H.-B. Mi, H.-M. Wang, G. Yin, D.-X. Shi, Y.-F. Zhou, and L. Yuan, "Resource on-demand reconfiguration method for virtualized data centers," Journal of Software, vol. 22, no. 9, pp. 2193–2205, 2011.

[33] R. N. Calheiros, E. Masoumi, R. Ranjan, and R. Buyya, "Workload prediction using ARIMA model and its impact on cloud applications' QoS," IEEE Transactions on Cloud Computing, vol. 3, no. 4, pp. 449–458, 2015.

[34] Y. Meng, R. Rao, X. Zhang, and P. Hong, "CRUPA: a container resource utilization prediction algorithm for auto-scaling based on time series analysis," in Proceedings of 2016 International Conference on Progress in Informatics and Computing (PIC), pp. 468–472, IEEE, Shanghai, China, December 2016.

[35] E. Dhib, N. Zangar, N. Tabbane, and K. Boussetta, "Impact of seasonal ARIMA workload prediction model on QoE for massively multiplayers online gaming," in Proceedings of 2016 5th International Conference on Multimedia Computing and Systems (ICMCS), pp. 737–741, IEEE, Marrakech, Morocco, September 2016.

[36] A. Ganapathi, Y. Chen, A. Fox, R. Katz, and D. Patterson, "Statistics-driven workload modeling for the cloud," in Proceedings of 2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010), pp. 87–92, IEEE, Long Beach, CA, USA, March 2010.

[37] V. G. Tran, V. Debusschere, and S. Bacha, "Hourly server workload forecasting up to 168 hours ahead using seasonal ARIMA model," in Proceedings of 2012 IEEE International Conference on Industrial Technology, pp. 1127–1131, IEEE, Athens, Greece, March 2012.

[38] D. Xu, S. Yang, and H. Luo, "Research on generalized fuzzy soft sets theory based combined model for demanded cloud computing resource prediction," Chinese Journal of Management Science, vol. 23, no. 5, pp. 56–64, 2015.

[39] S. Li, Y. Wang, X. Qiu, D. Wang, and L. Wang, "A workload prediction-based multi-VM provisioning mechanism in cloud computing," in Proceedings of 2013 15th Asia-Pacific Network Operations and Management Symposium (APNOMS), pp. 1–6, IEEE, Hiroshima, Japan, September 2013.

[40] X. Fu and C. Zhou, "Predicted affinity based virtual machine placement in cloud computing environments," IEEE Transactions on Cloud Computing, vol. 99, p. 1, 2017.

[41] Y. Jiang, C. Perng, T. Li, and R. Chang, "ASAP: a self-adaptive prediction system for instant cloud resource demand provisioning," in Proceedings of 2011 IEEE 11th International Conference on Data Mining, pp. 1104–1109, IEEE, Vancouver, BC, Canada, December 2011.

[42] Z. Wu and N. E. Huang, "Ensemble empirical mode decomposition: a noise-assisted data analysis method," Advances in Adaptive Data Analysis, vol. 1, no. 1, pp. 1–41, 2009.

[43] H. Zang, L. Fan, M. Guo, Z. Wei, G. Sun, and L. Zhang, "Short-term wind power interval forecasting based on an EEMD-RT-RVM model," Advances in Meteorology, vol. 2016, Article ID 8760780, 10 pages, 2016.

[44] N. Safari, C. Y. Chung, and G. C. D. Price, "Novel multi-step short-term wind power prediction framework based on chaotic time series analysis and singular spectrum analysis," IEEE Transactions on Power Systems, vol. 33, no. 1, pp. 590–601, 2018.

[45] X. Chen, H. Wang, J. Huang, and H. Ren, "APU degradation prediction based on EEMD and Gaussian process regression," in Proceedings of 2017 International Conference on Sensing, Diagnostics, Prognostics, and Control (SDPC), pp. 98–104, IEEE, Shanghai, China, August 2017.

[46] C. Yan, C. Yi, X. Wu et al., "Turbine fault trend prediction that based on EEMD and ARIMA models," Journal of Gansu Sciences, vol. 28, no. 4, pp. 100–106, 2016.

[47] M. Jin, P. Li, L. Zhang et al., "A signal feature method and its application based on EEMD, fuzzy entropy and GK clustering," ACTA Metrologica Sinica, vol. 26, no. 5, pp. 501–505, 2015.

[48] E. H. Norden, Z. Shen, R. L. Steven et al., "The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis," in Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, vol. 454, no. 1971, pp. 903–995, Royal Society, London, UK, March 1998.

[49] Alibaba, "cluster-trace-v2017," https://github.com/alibaba/clusterdata.


volatility component, a medium-frequency and weak-volatility component, and a low-frequency trend component. Then, we use the ARIMA method to predict the results of the three new components. Finally, the overall prediction results are obtained by summing the prediction results of the three new components.

The key to our EEMD-RT-ARIMA method is the selection and reconstruction of efficient components. Compared with the EEMD-ARIMA method, it feeds fewer components into the ARIMA prediction, so it can reduce both the accumulation of prediction error and the total prediction time. Naturally, as their implementation processes show, both the EEMD-RT-ARIMA and EEMD-ARIMA methods take longer to predict than the ARIMA model. However, our EEMD-RT-ARIMA method focuses on cost-effectiveness, trading off prediction accuracy, effectiveness, and time cost.

4.1. Use of EEMD to Decompose the Host Utilization Sequence. Host utilization sequences fall into different categories according to the resource type (CPU, memory, or disk), such as the CPU utilization sequence $T_{cpu} = \{c_1, \ldots, c_i, \ldots, c_n\}$. The CPU utilization sequence is usually random and unstable owing to random and sudden resource demands in cloud computing, so such data must be transformed into relatively stationary data to improve prediction accuracy. The EEMD method appears to be more effective than other decomposition algorithms in processing nonlinear and nonstationary data sequences. Therefore, we use the EEMD method to decompose the host utilization sequence into a series of IMF$_i$ components and the R component.

A running example uses the nonstationary CPU utilization trace of a physical host from our cloud platform. We divide it into a training set (673 data points) and a testing set (24 data points), as shown in Figure 3. We then use the EEMD method to decompose the training set into the IMF1–IMF8 components and the R component, which are shown from high frequency to low frequency in Figure 4.

4.2. Calculation of the Correlation Coefficients to Select Efficient IMF Components. A correlation coefficient measures the correlation between two sequences. We calculate the correlation coefficient $P_j(X, Y)$ between the IMF$_j$ component and the original training set using the following formula, where $\mathrm{Cov}(X, Y)$ is the covariance between the sequences $X$ and $Y$, and $\mathrm{Var}(X)$ and $\mathrm{Var}(Y)$ are the variances of $X$ and $Y$:

$$P_j(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)}\,\sqrt{\mathrm{Var}(Y)}} \tag{5}$$

The correlation coefficient $P_j(X, Y)$ is then checked for sign. If it is negative, the IMF$_j$ component is considered inefficient and dropped; otherwise, it is considered efficient and retained.

We calculate the correlation coefficient between each IMF component and the original training set. Only IMF6 and IMF7 have negative correlation coefficients (−0.08 and −0.15), so they are dropped, and IMF1–IMF5 and IMF8 are selected as efficient IMF components.
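Equation (5) and the sign rule above fit in a few lines of Python. This is an illustrative, stdlib-only sketch; the function names are hypothetical, and treating a zero-variance sequence as uncorrelated is our assumption for a corner case the paper does not discuss.

```python
import math
from typing import List, Sequence


def correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Eq. (5): P(X, Y) = Cov(X, Y) / (sqrt(Var(X)) * sqrt(Var(Y)))."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    var_x = sum((a - mx) ** 2 for a in x) / n
    var_y = sum((b - my) ** 2 for b in y) / n
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a flat sequence is treated as uncorrelated
    return cov / (math.sqrt(var_x) * math.sqrt(var_y))


def select_efficient_imfs(imfs: Sequence[Sequence[float]],
                          original: Sequence[float]) -> List[int]:
    """Return 1-based indices (IMF1, IMF2, ...) of the IMFs whose correlation with
    the original training set is not negative; the others are dropped."""
    return [j for j, imf in enumerate(imfs, start=1) if correlation(imf, original) >= 0]


# Hypothetical toy data: IMF2 moves against the original sequence and is dropped
original = [1, 2, 3, 4, 5]
imfs = [[1, 2, 3, 4, 5], [5, 4, 3, 2, 1], [2, 1, 2, 5, 6]]
print(select_efficient_imfs(imfs, original))  # [1, 3]
```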

4.3. Reconstruction of Efficient IMFs and R into New Components. Each IMF component reflects a certain physical feature of the original data. If some IMF components are close in terms of frequency and amplitude fluctuation, they have similar features and can therefore be reconstructed into a new component with these typical features. Thus, the prediction error accumulation and the prediction time of the EEMD-ARIMA method can be reduced by reducing the number of components.

Figure 2: EEMD-RT-ARIMA method. (Flowchart: use EEMD to decompose the host utilization sequence into IMF1, IMF2, …, IMF(h−1), and R; use correlation coefficients to select the efficient IMFs IMF1, …, IMFm; use RT values and average periods to reconstruct the efficient IMFs and R into three new components, namely the high-frequency and strong-volatility component, the medium-frequency and weak-volatility component, and the low-frequency trend component; use ARIMA to predict the result of each component; construct the overall prediction result by summing the prediction results of the three new components.)

The average period reflects the frequency of host utilization variation, and the two are reciprocally related: the smaller the average period, the higher the frequency. If the average periods of two IMF components are close, the components are close in frequency. The average period $T_j$ of the IMF$_j$ component is calculated by the following formula, in which $n$ is the length of the training set and $l_j$ is the number of extrema of the IMF$_j$ component:

$$T_j = \frac{n}{l_j} \tag{6}$$
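Equation (6) does not say how extrema are counted. The Python sketch below assumes strict local maxima and minima of the sampled IMF (plateaus and the two endpoints are not counted); the function names are hypothetical.

```python
from typing import Sequence


def count_extrema(x: Sequence[float]) -> int:
    """Number of strict interior local maxima and minima."""
    return sum(
        1
        for i in range(1, len(x) - 1)
        if (x[i] > x[i - 1] and x[i] > x[i + 1]) or (x[i] < x[i - 1] and x[i] < x[i + 1])
    )


def average_period(imf: Sequence[float]) -> float:
    """Eq. (6): T_j = n / l_j, with n the sequence length and l_j its extrema count."""
    l_j = count_extrema(imf)
    if l_j == 0:
        return float("inf")  # a monotone sequence has no finite average period
    return len(imf) / l_j


fast = [0, 1] * 50          # oscillates every sample: 98 extrema
slow = [0, 1, 0, -1] * 25   # oscillates half as fast: 49 extrema
print(average_period(fast) < average_period(slow))  # True: higher frequency, shorter period
```

The comparison at the end shows the reciprocal relation described in the text: the faster-oscillating sequence gets the smaller average period.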

Similarly, the RT value reflects the amplitude fluctuation: the larger the RT value, the stronger the amplitude volatility. If the RT values of two IMFs are close, the two IMFs have similar overall amplitude volatility.

To improve the prediction accuracy and reduce the prediction time relative to the EEMD-ARIMA method, the EEMD-RT-ARIMA method reconstructs the IMF components and the R component into three new components according to their average periods and RT values. Because the average period and the RT value have different units, we normalize the average period $T_j$ as follows:

$$T_{nj} = \frac{T_j - T_{\min}}{T_{\max} - T_{\min}} \tag{7}$$

where $T_{nj}$ is the normalized average period of the IMF$_j$ component, and $T_{\max}$ and $T_{\min}$ are the maximum and minimum of the average periods of all IMF components. Similarly, the RT value $R_j$ is normalized as follows:

$$R_{nj} = \frac{R_j - R_{\min}}{R_{\max} - R_{\min}} \tag{8}$$

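where $R_{nj}$ is the normalized value of $R_j$, and $R_{\max}$ and $R_{\min}$ are the maximum and minimum of all RT values. The reconstruction factor (RF) $F_j$ is then defined as follows, where $\alpha$ and $\beta$ weight the frequency term and the volatility term: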

$$F_j = \alpha \cdot \left(1 - T_{nj}\right) + \beta \cdot R_{nj} \tag{9}$$
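Equations (7)–(9) can be checked against the values reported in Table 1. The sketch below min-max normalizes the average periods and RT values and combines them with weights $\alpha$ and $\beta$. This section does not state $\alpha$ and $\beta$; with the hypothetical choice $\alpha = \beta = 0.5$, the computed RF values round to those in Table 1 (1, 0.70, 0.60, 0.53, 0.47, 0, 0). The function names are ours.

```python
def min_max(values):
    """Min-max normalization used in equations (7) and (8)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def reconstruction_factors(periods, rt_values, alpha=0.5, beta=0.5):
    """F_j = alpha * (1 - T_nj) + beta * R_nj, equation (9).

    alpha = beta = 0.5 is an assumed weighting, not a value stated in this section.
    """
    t_norm = min_max(periods)
    r_norm = min_max(rt_values)
    return [alpha * (1 - t) + beta * r for t, r in zip(t_norm, r_norm)]


# Values from Table 1, in the order IMF1, IMF2, IMF3, IMF4, IMF5, IMF8, R.
names = ["IMF1", "IMF2", "IMF3", "IMF4", "IMF5", "IMF8", "R"]
periods = [2.12, 8.31, 16.41, 30.59, 67.3, 673, 673]
rt_values = [431, 181, 96, 47, 21, 5, 2]

for name, rf in zip(names, reconstruction_factors(periods, rt_values)):
    print(f"{name}: {rf:.2f}")
```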

The higher the frequency and the stronger the volatility of an IMF component, the greater its RF value. If two IMF components have close RF values, their overall trends are similar, so they can be reconstructed into one new component. All efficient IMF components and the R component are reconstructed into three new components: a high-frequency and strong-volatility component, a medium-frequency and weak-volatility component, and a low-frequency trend component. The high-frequency and strong-volatility component reflects the strong volatility and randomness of the high-frequency part of the original host utilization sequence. The medium-frequency and weak-volatility component shows the detailed features of the volatility of the original host utilization sequence. The low-frequency trend component depicts the overall trend of the volatility of the original host utilization sequence.

Table 1 shows the RT values, average periods, and RF values of the efficient IMF and R components. The RF values of IMF1 and IMF2 are large and relatively close, the RF values of IMF3–IMF5 are close to one another, and the RF values of IMF8 and R are equal to 0. Therefore, we reconstruct IMF1–IMF2, IMF3–IMF5, and IMF8–R into three new components, as shown in Figures 5(a)–5(c). They reflect the randomness, the fluctuation details, and the overall trend of the original host utilization sequence, respectively.
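The reconstruction itself is a point-by-point sum of the components in each group. The sketch below uses the grouping chosen for Table 1 (IMF1–IMF2, IMF3–IMF5, IMF8–R); the function name `reconstruct` and the 3-point component values are hypothetical placeholders for illustration, not data from the paper.

```python
def reconstruct(components, groups):
    """Sum the components listed in each group, point by point.

    components: dict mapping a component name (e.g. "IMF1", "R") to a list of values
    groups: list of lists of component names, one inner list per new component
    """
    new_components = []
    for group in groups:
        summed = [0.0] * len(components[group[0]])
        for name in group:
            for i, value in enumerate(components[name]):
                summed[i] += value
        new_components.append(summed)
    return new_components


groups = [["IMF1", "IMF2"], ["IMF3", "IMF4", "IMF5"], ["IMF8", "R"]]
# Hypothetical 3-point components, for illustration only.
components = {
    "IMF1": [1.0, -1.0, 1.0], "IMF2": [0.5, 0.0, -0.5],
    "IMF3": [0.2, 0.2, 0.2], "IMF4": [0.1, -0.1, 0.1], "IMF5": [0.0, 0.1, 0.0],
    "IMF8": [0.3, 0.3, 0.3], "R": [10.0, 10.1, 10.2],
}
c_h, c_m, c_l = reconstruct(components, groups)
print(c_h, c_m, c_l)
```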

4.4. Use of the ARIMA Model to Predict the Future Host Utilization. We use the ARIMA model to predict the future values of each new component. Then, the overall prediction result is obtained by superposing the prediction results of all new components. The ARIMA prediction procedure is described in Algorithm 1.

For example, assume that three new components, $C_h$, $C_m$, and $C_l$, are obtained, representing the high-frequency and strong-volatility component, the medium-frequency and weak-volatility component, and the low-frequency trend component, respectively. We then use the ARIMA method to predict the next 24 values of each new component. The prediction results $P_h$, $P_m$, and $P_l$ of the three new components, each of which contains the 24 predicted values, can be written as follows:

$$P_h = \left(f_h^{1}, f_h^{2}, \ldots, f_h^{24}\right),\qquad P_m = \left(f_m^{1}, f_m^{2}, \ldots, f_m^{24}\right),\qquad P_l = \left(f_l^{1}, f_l^{2}, \ldots, f_l^{24}\right) \tag{10}$$

Finally, we calculate the overall prediction result $P$ by superposing the prediction results of the new components:

$$P = P_h + P_m + P_l \tag{11}$$

In this process, the EEMD-RT-ARIMA method reduces the number of components from 9 to 3, which reduces the total prediction time and the error accumulation of component prediction compared with the EEMD-ARIMA method.
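Algorithm 1 relies on a full ARIMA implementation (ADF test, BIC order selection, and maximum-likelihood estimation), as provided by statistical packages. To keep the example self-contained, the sketch below substitutes simpler pieces and is therefore not the paper's implementation: the differencing order `d` is passed in rather than chosen by an ADF test, only pure autoregressive ARIMA(p, d, 0) models are fitted, and the coefficients come from the Yule-Walker equations (solved by the Levinson-Durbin recursion) instead of maximum likelihood. BIC still selects p. The last function superposes component forecasts as in equation (11). All function names are ours.

```python
import math


def difference(x, d):
    """Apply first differencing d times, remembering the last value at each level."""
    tails = []
    for _ in range(d):
        tails.append(x[-1])
        x = [b - a for a, b in zip(x, x[1:])]
    return x, tails


def integrate(forecast, tails):
    """Undo the differencing by cumulative summation from the remembered values."""
    for last in reversed(tails):
        restored = []
        for step in forecast:
            last += step
            restored.append(last)
        forecast = restored
    return forecast


def autocovariances(x, max_lag):
    n = len(x)
    mean = sum(x) / n
    return [sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / n
            for k in range(max_lag + 1)]


def levinson_durbin(r, p):
    """Yule-Walker AR(p) coefficients and innovation variance from autocovariances."""
    phi, err = [], r[0]
    for k in range(1, p + 1):
        kappa = (r[k] - sum(phi[j] * r[k - 1 - j] for j in range(k - 1))) / err
        phi = [phi[j] - kappa * phi[k - 2 - j] for j in range(k - 1)] + [kappa]
        err *= 1.0 - kappa * kappa
    return phi, err


def ar_forecast(series, d, horizon, max_p=5):
    """Forecast `horizon` values with an ARIMA(p, d, 0) model; p is chosen by BIC."""
    y, tails = difference(list(series), d)
    n, mean = len(y), sum(y) / len(y)
    r = autocovariances(y, max_p)
    phi = []
    if r[0] > 0:
        best_bic = None
        for p in range(max_p + 1):
            coeffs, var = levinson_durbin(r, p)
            if var <= 0:
                break  # perfect fit: stop before dividing by a zero variance
            bic = n * math.log(var) + p * math.log(n)
            if best_bic is None or bic < best_bic:
                best_bic, phi = bic, coeffs
    history = [v - mean for v in y]
    preds = []
    for _ in range(horizon):
        nxt = sum(c * history[-1 - i] for i, c in enumerate(phi))
        history.append(nxt)
        preds.append(nxt + mean)
    return integrate(preds, tails)


def superpose(*component_forecasts):
    """Equation (11): add the component forecasts point by point."""
    return [sum(values) for values in zip(*component_forecasts)]
```

Applied to three new components, a call such as `superpose(ar_forecast(c_h, d_h, 24), ar_forecast(c_m, d_m, 24), ar_forecast(c_l, d_l, 24))` mirrors equations (10) and (11); the names `c_h`, `d_h`, and so on are hypothetical.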

Figure 3: CPU utilization trace of a physical host (CPU utilization (%) versus sample index; the sample data are split into a training set and a testing set).


5. Experimental Setup

We conduct an experiment to evaluate our method. The experimental dataset and the measurement metrics are introduced as follows.

5.1. Experimental Dataset. Host utilization mainly involves CPU, memory, network, and disk utilization. In this paper, we focus on host CPU utilization. We randomly select the CPU utilization traces of 7 physical hosts from the dataset released by Alibaba in August 2017 [49], each of which includes 144 points (5 minutes per point). These traces are all time-dependent sequences, as shown in Figures 6(a)–6(g).

Each sequence is divided into a training set and a testing set. We first use the training set to predict future data and then compare the predicted data with the actual data in the testing set to evaluate our method. In this paper, each training set consists of the first 120 points, and the testing set consists of the subsequent 6, 12, or 24 points. In the EEMD decomposition, we set the number of iterations k = 50 and the standard deviation d = 0.2.

5.2. Measurement Metrics. We evaluate our method in terms of error, effectiveness, and time-cost analysis as follows.

5.2.1. Error Analysis. To evaluate our method, we use the mean absolute percentage error (MAPE) to reflect the prediction accuracy. MAPE is defined as follows:

$$\mathrm{MAPE} = \left(\frac{1}{m}\sum_{i=1}^{m}\frac{\left|x_i^{f} - x_i^{t}\right|}{x_i^{t}}\right) \times 100 \tag{12}$$

where $x_i^{f}$ denotes the predicted value at point $i$, $x_i^{t}$ denotes the actual value in the testing set, and $m$ denotes the number of prediction points. The lower the MAPE, the higher the prediction accuracy.

Figure 4: Decomposition results of EEMD: (a)–(h) IMF1–IMF8; (i) residual component R (CPU utilization (%) versus sample index).

Table 1: RT values, average periods, and RF values of the efficient IMF and R components.

| Component | IMF1 | IMF2 | IMF3 | IMF4 | IMF5 | IMF8 | R |
|---|---|---|---|---|---|---|---|
| RT | 431 | 181 | 96 | 47 | 21 | 5 | 2 |
| Average period | 2.12 | 8.31 | 16.41 | 30.59 | 67.3 | 673 | 673 |
| RF | 1 | 0.70 | 0.60 | 0.53 | 0.47 | 0 | 0 |
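Equation (12) translates directly into code. A minimal sketch (the function name `mape` is ours), assuming all actual values are nonzero, since equation (12) divides by them:

```python
def mape(predicted, actual):
    """Mean absolute percentage error, equation (12), returned in percent."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    m = len(actual)
    return sum(abs(f - t) / t for f, t in zip(predicted, actual)) / m * 100


print(mape([11, 9], [10, 10]))  # each point is off by 10% of the actual value
```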

5.2.2. Effectiveness Analysis. Underprediction or overprediction of host utilization can lead to resource underprovisioning or overprovisioning. Resource underprovisioning cannot guarantee applications' QoS, while resource overprovisioning causes resource waste and low resource utilization. Therefore, a good prediction method should avoid both underprediction and overprediction. In particular, underprediction should be avoided as much as possible because it results in lower QoS for users.

We define positive and negative errors to reflect overprediction and underprediction, respectively, and use them to evaluate the effectiveness of our method. A good prediction method should have a small negative error to avoid underprediction. The positive and negative prediction errors are calculated by the following formula, where $p_i$ is the predicted value, $r_i$ is the actual value, and $m$ is the number of underpredicted points (i.e., negative deviations) or overpredicted points (i.e., positive deviations):

$$e = \frac{1}{m}\sum_{i=1}^{m}\left|p_i - r_i\right| \tag{13}$$
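A sketch of equation (13) as written, averaging the absolute deviations separately over the overpredicted and underpredicted points. Returning `None` to mirror the "Null" cells of Table 4, and excluding points with $p_i = r_i$ from both averages, are our assumptions; the function name is ours.

```python
def signed_errors(predicted, actual):
    """Positive (overprediction) and negative (underprediction) errors, equation (13).

    Each error is the mean of |p_i - r_i| over the points with that sign of
    deviation; None stands for "Null" when no point has that sign.
    """
    over = [p - r for p, r in zip(predicted, actual) if p > r]
    under = [r - p for p, r in zip(predicted, actual) if p < r]
    positive = sum(over) / len(over) if over else None
    negative = sum(under) / len(under) if under else None
    return positive, negative


print(signed_errors([12, 8, 10, 7], [10, 10, 10, 10]))  # one over, two under, one exact
print(signed_errors([9, 9], [10, 10]))                  # underprediction only
```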

5.2.3. Time-Cost Analysis. Host utilization varies very quickly in a cloud data center. If host utilization prediction takes longer than the VM migration decision, resource provisioning will be delayed, which can cause poor QoS. Thus, host utilization prediction must be completed in a timely manner. To investigate the time cost of our proposed method, we measure the running time of the EEMD-RT-ARIMA method and compare it with those of other prediction methods using the following index $t_c$:

$$t_c = \frac{t_{\mathrm{our}} - t_{\mathrm{other}}}{t_{\mathrm{other}}} \times 100 \tag{14}$$

where $t_{\mathrm{our}}$ is the running time of our EEMD-RT-ARIMA method, $t_{\mathrm{other}}$ is the running time of another method, such as the ARIMA model or the EEMD-ARIMA method, and $t_c$ is the percentage by which the time cost is reduced (negative values) or increased (positive values).
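Equation (14) is a relative change in percent, so negative values mean that EEMD-RT-ARIMA runs faster. Applying it to the 6-point running times reported in Section 6.3 reproduces the quoted savings of roughly 80% (host 22) and 40% (host 237). The function name is ours.

```python
def time_cost_index(t_our, t_other):
    """Equation (14): relative change of running time in percent (negative = time saved)."""
    return (t_our - t_other) / t_other * 100


# 6-point running times in seconds, EEMD-RT-ARIMA versus EEMD-ARIMA (Section 6.3).
print(round(time_cost_index(69.37, 337.20), 1))   # host 22: -79.4
print(round(time_cost_index(113.64, 190.46), 1))  # host 237: -40.3
```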

6. Experimental Results and Analysis

To validate the prediction effectiveness of our EEMD-RT-ARIMA method, we run the ARIMA, EEMD-ARIMA, and EEMD-RT-ARIMA methods and compare their prediction results. All experiments were performed in MATLAB on a PC with a 2.5 GHz Intel(R) i7 CPU. To make the three methods comparable, we execute each method 5 times on the same original dataset. The mean values of the prediction results are shown in the following tables and figures.

6.1. Error Analysis. Table 2 shows the MAPE values of host utilization prediction for 7 physical hosts. The EEMD-ARIMA and EEMD-RT-ARIMA methods have lower MAPE values than the ARIMA model for 6-point and 12-point predictions. For example, the EEMD-ARIMA and EEMD-RT-ARIMA methods achieve MAPE values of 6.06% and 5.05% for the 6-point prediction of host 109, while the ARIMA model has a far higher MAPE value of 16.85%. They obtain MAPE values of 10.13% and 5.46% for the 12-point prediction of host 109, while the ARIMA model obtains 11.08%. For host 22, the EEMD-ARIMA and EEMD-RT-ARIMA methods obtain MAPE values of 5.31% and 5.42% for 6-point prediction, far lower than the 10.66% of the ARIMA model. Similarly, they also perform better in 12-point prediction, and the same holds for the 6-point and 12-point predictions of the other hosts. This indicates that both the EEMD-ARIMA and EEMD-RT-ARIMA methods predict host utilization more accurately than the ARIMA model for 6-point and 12-point predictions: EEMD reduces the inherent volatility of the host utilization sequence, which improves the prediction accuracy of these two methods. However, the situation changes in 24-point prediction. For hosts 1162, 424, 1060, and 237, the 24-point MAPE values exceed 30% for all three methods, except for the ARIMA model on host 1162 (21.93%). Although the EEMD-RT-ARIMA method has lower 24-point MAPE values than the ARIMA and EEMD-ARIMA methods for hosts 839 and 109, these values are considerably higher than its MAPE values for 6-point and 12-point predictions. This shows that the EEMD-ARIMA and EEMD-RT-ARIMA methods are suitable for short-term but not for long-term prediction.

Figure 5: New components: (a) high-frequency and strong-volatility component; (b) medium-frequency and weak-volatility component; (c) low-frequency trend component (CPU utilization (%) versus sample index).

For further analysis, we find that the EEMD-RT-ARIMA method achieves lower prediction errors than the EEMD-ARIMA method for the 6-point and 12-point predictions of hosts 839, 109, and 1162, even though the EEMD-RT-ARIMA method uses only the efficient IMF components. However, the opposite holds for hosts 22, 424, 1060, and 237. The original CPU utilization sequences of all physical hosts are identical in frequency, so we calculate the RT value of each CPU utilization sequence, as shown in Table 3. Hosts 839, 109, and 1162 have low RT values (at most 10), which shows that their CPU utilization is more stationary than that of the other hosts, since smaller RT values indicate more stationary host utilization sequences. This can also be seen in Figures 6(a)–6(c). Tables 2 and 3 show that the EEMD-RT-ARIMA method achieves a lower MAPE than the EEMD-ARIMA method when the RT value is small; conversely, it has a higher MAPE than the EEMD-ARIMA method when the RT value is large. For instance, hosts 839, 109, and 1162, with smaller RT values, obtain lower MAPE values with the EEMD-RT-ARIMA method than with the EEMD-ARIMA method, while hosts 22, 424, 1060, and 237, with larger RT values, obtain higher MAPE values with the EEMD-RT-ARIMA method than with the EEMD-ARIMA method.

Algorithm 1: ARIMA prediction.
(1) For each new component:
(2)   Set the order of difference d = 0.
(3)   Execute the augmented Dickey-Fuller (ADF) test. If the series is stationary, go to step (5); otherwise, go to step (4).
(4)   Difference the time series, set d = d + 1, and go to step (3).
(5)   Determine the order of the ARIMA model using the Bayesian information criterion (BIC).
(6)   Estimate the parameters of the ARIMA model using maximum likelihood.
(7)   Forecast the future n values of this new component using the ARIMA model.
(8) End
(9) Obtain the overall prediction results by superposing the prediction results of each new component.

Figure 6: CPU utilization traces of 7 physical hosts (CPU utilization, %): (a) host 839; (b) host 109; (c) host 1162; (d) host 22; (e) host 424; (f) host 1060; (g) host 237.

Furthermore, as the RT value increases from host 839 to host 1162, the MAPE advantage of the EEMD-RT-ARIMA method over the EEMD-ARIMA method decreases. From host 22 onward, the difference becomes negative, indicating that the EEMD-ARIMA method is more accurate than the EEMD-RT-ARIMA method, and the difference then grows in magnitude as the RT value increases. The 12-point host CPU utilization prediction illustrates this. For example, for host 839, with an RT value of 2, the EEMD-RT-ARIMA method has a 12-point MAPE of 5.03%, which is 5.45 percentage points lower than the 10.48% of the EEMD-ARIMA method. For host 1162, with an RT value of 10, the MAPE of the EEMD-RT-ARIMA method is only 4.19 percentage points lower than that of the EEMD-ARIMA method. For host 22, with an RT value of 16, the EEMD-RT-ARIMA method has a slightly higher MAPE of 5.37% than the 5.33% of the EEMD-ARIMA method. As the RT value increases further, the MAPE of the EEMD-RT-ARIMA method exceeds that of the EEMD-ARIMA method by 0.87, 1.96, and 4.63 percentage points for hosts 424, 1060, and 237, respectively. This indicates that the EEMD-RT-ARIMA method is less effective than the EEMD-ARIMA method in CPU utilization prediction for these hosts.

Inevitably, the ARIMA prediction of each component decomposed by the EEMD method contains some error, and superposing the prediction results of the components accumulates these errors. The EEMD-RT-ARIMA method reduces this error accumulation by selecting the efficient IMF components and reconstructing them into fewer components, so it achieves better prediction accuracy than the EEMD-ARIMA method for hosts 839, 109, and 1162. However, selecting and reconstructing the efficient IMF components also introduces some prediction error because the nonefficient components are discarded, especially for nonstationary host utilization sequences. When this error exceeds the error accumulated by the ARIMA predictions of all components in the EEMD-ARIMA method, the EEMD-RT-ARIMA method is no more effective than the EEMD-ARIMA method, as with the nonstationary CPU utilization of hosts 22, 424, 1060, and 237.
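The trend described above can be recomputed from Tables 2 and 3. The short script below lists, in order of increasing RT value, how many percentage points lower the 12-point MAPE of the EEMD-RT-ARIMA method is than that of the EEMD-ARIMA method; the difference shrinks from 5.45 for host 839 to 4.19 for host 1162, turns negative at host 22, and reaches -4.63 for host 237. The variable names are ours.

```python
# host -> (RT value from Table 3, 12-point MAPE (%) of EEMD-ARIMA, of EEMD-RT-ARIMA from Table 2)
results = {
    839: (2, 10.48, 5.03),
    109: (6, 10.13, 5.46),
    1162: (10, 13.14, 8.95),
    22: (16, 5.33, 5.37),
    424: (22, 16.59, 17.46),
    1060: (24, 17.44, 19.40),
    237: (38, 11.30, 15.93),
}

# Positive difference: EEMD-RT-ARIMA is more accurate than EEMD-ARIMA.
diffs = {host: round(eemd - eemd_rt, 2) for host, (_, eemd, eemd_rt) in results.items()}

for host, (rt, _, _) in sorted(results.items(), key=lambda item: item[1][0]):
    print(f"host {host} (RT = {rt}): {diffs[host]:+.2f} percentage points")
```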

6.2. Effectiveness Analysis. To verify the effectiveness of our method in short-term prediction, we select the experimental results of hosts 839, 22, and 237, which have the minimum, middle, and maximum RT values, for further analysis. Figure 7 shows the prediction results of the EEMD-RT-ARIMA, ARIMA, and EEMD-ARIMA methods. The future CPU utilization of host 839 decreases below 11%; according to a predefined policy, host 839 is then underloaded and can be shut down to save energy. Figure 7 shows that our method is more accurate and effective than the ARIMA model. In particular, our method follows the trend of the data, whereas the ARIMA model cannot keep up with it, so our method is better suited to nonstationary time series than the ARIMA model. Additionally, for host 839, the values predicted by the EEMD-RT-ARIMA method are closer to the testing data than those of the EEMD-ARIMA method. These results show that the EEMD-RT-ARIMA method is more effective than the EEMD-ARIMA method for CPU utilization sequences with weak fluctuations. When the host utilization sequence fluctuates more strongly, the absence of the nonefficient IMF components greatly influences the prediction results, and the EEMD-RT-ARIMA method is no more effective than the EEMD-ARIMA method for the CPU utilization prediction of host 237.

Table 2: MAPE values of host utilization prediction.

| Host ID | Prediction length | ARIMA (%) | EEMD-ARIMA (%) | EEMD-RT-ARIMA (%) |
|---|---|---|---|---|
| 839 | 6-point | 9.77 | 4.73 | 4.61 |
| 839 | 12-point | 13.78 | 10.48 | 5.03 |
| 839 | 24-point | 23.10 | 21.53 | 8.82 |
| 109 | 6-point | 16.85 | 6.06 | 5.05 |
| 109 | 12-point | 11.08 | 10.13 | 5.46 |
| 109 | 24-point | 41.34 | 24.68 | 8.36 |
| 1162 | 6-point | 24.45 | 7.76 | 6.45 |
| 1162 | 12-point | 17.81 | 13.14 | 8.95 |
| 1162 | 24-point | 21.93 | 41.41 | 33.45 |
| 22 | 6-point | 10.66 | 5.31 | 5.42 |
| 22 | 12-point | 8.22 | 5.33 | 5.37 |
| 22 | 24-point | 11.41 | 18.44 | 14.77 |
| 424 | 6-point | 37.54 | 16.12 | 17.65 |
| 424 | 12-point | 21.74 | 16.59 | 17.46 |
| 424 | 24-point | 88.68 | 77.39 | 115.7 |
| 1060 | 6-point | 35.71 | 10.05 | 13.94 |
| 1060 | 12-point | 37.60 | 17.44 | 19.40 |
| 1060 | 24-point | 74.72 | 54.82 | 91.93 |
| 237 | 6-point | 22.29 | 7.85 | 11.78 |
| 237 | 12-point | 25.51 | 11.30 | 15.93 |
| 237 | 24-point | 126.11 | 158.43 | 189.51 |

Table 3: RT values of each host utilization sequence.

| Host ID | 839 | 109 | 1162 | 22 | 424 | 1060 | 237 |
|---|---|---|---|---|---|---|---|
| RT value | 2 | 6 | 10 | 16 | 22 | 24 | 38 |

To further analyze the effectiveness of our method, we calculate the positive and negative errors of the 6-point and 12-point predictions for these hosts, as shown in Table 4. The smaller the negative error, the more suitable a prediction method is for cloud resource provisioning, because it avoids underprediction. Most of the prediction results of the ARIMA model are underpredicted (the positive-error cells are all "Null" for hosts 839 and 22). Furthermore, the negative errors of the ARIMA model for host 237 are far higher than those of the other methods. For instance, the ARIMA model has a negative error of up to 27.51% for the 12-point prediction of host 237, while the EEMD-ARIMA and EEMD-RT-ARIMA methods have negative errors of only 8.00% and 8.92%, respectively. If the ARIMA model is used to predict future host utilization, it can cause resource underprovisioning, which cannot ensure applications' QoS. The EEMD-RT-ARIMA method achieves smaller negative errors than the EEMD-ARIMA method for hosts 839 and 22, but larger negative errors for host 237. For instance, the EEMD-ARIMA method has a negative error of 10.71% for the 12-point prediction of host 839, while the EEMD-RT-ARIMA method has only 4.74%. Similarly, the EEMD-ARIMA method obtains a negative error of 6.09% for the 12-point prediction of host 22, while the EEMD-RT-ARIMA method achieves a lower value of 5.05%. However, the situation changes for host 237, where the EEMD-ARIMA method has smaller negative errors than the EEMD-RT-ARIMA method. For instance, the EEMD-RT-ARIMA method obtains negative errors of 11.37% and 8.92% for the 6-point and 12-point predictions, while the EEMD-ARIMA method has negative errors of only 4.62% and 8.00%.

Figure 7: Prediction results of different methods: (a) host 839, 6-point prediction; (b) host 839, 12-point prediction; (c) host 22, 6-point prediction; (d) host 22, 12-point prediction; (e) host 237, 6-point prediction; (f) host 237, 12-point prediction. Each panel plots the CPU utilization (%) predicted by ARIMA, EEMD-ARIMA, and EEMD-RT-ARIMA against the testing data.

6.3. Time-Cost Analysis. To verify the applicability of our method, we further compare the time costs of these methods in Figure 8. The EEMD-ARIMA method has the largest running time, over 180 s, while the ARIMA model takes the least time, less than 50 s. The time cost of the EEMD-RT-ARIMA method ranges from roughly 70 s to 117 s, which is 40%–80% lower than that of the EEMD-ARIMA method. For example, for the 6-point prediction of host 22, the running time of the EEMD-RT-ARIMA method is 69.37 s, far less than the 337.20 s of the EEMD-ARIMA method; our method saves up to 80% of the time cost. For the CPU utilization sequence of host 237, which has strong variability, the EEMD-ARIMA method requires 190.46 s to predict the next 6 values, while the EEMD-RT-ARIMA method takes only 113.64 s, a reduction of approximately 40%. Considering prediction accuracy, effectiveness, and time cost, our EEMD-RT-ARIMA method is more cost-effective for short-term host utilization prediction in cloud computing.

7. Conclusions

Host utilization is an indicator of host performance, and its prediction can promote effective resource scheduling in cloud computing. However, host utilization shows strong randomness and instability caused by users' random and varied resource demands, which makes it difficult to improve prediction accuracy. In this paper, we propose a hybrid and cost-effective method, EEMD-RT-ARIMA, for short-term host utilization prediction in cloud computing. The EEMD method is first used to decompose the nonstationary host utilization sequence into a few relatively stationary IMF components and an R component. Then, we calculate the correlation coefficient between each IMF component and the original data to select efficient IMF components, and we use RT values and average periods to reconstruct these components into three new components to reduce error accumulation and time cost. Finally, the three new components are predicted by the ARIMA model, and their prediction results are superposed to form the overall prediction result. We use real host utilization traces from a cloud platform to conduct the experiments and compare our EEMD-RT-ARIMA method with the ARIMA model and the EEMD-ARIMA method in terms of error, effectiveness, and time-cost analysis. The results show that our method is cost-effective and more suitable for short-term host utilization prediction in cloud computing.

Data Availability

The running example and experimental data used to support the findings of this study have been deposited in the Figshare repository (https://doi.org/10.6084/m9.figshare.7679594).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the Shandong Provincial Natural Science Foundation (ZR2016FM41).

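Table 4: Positive and negative error analysis.

| Host ID | Prediction length | ARIMA positive error (%) | ARIMA negative error (%) | EEMD-ARIMA positive error (%) | EEMD-ARIMA negative error (%) | EEMD-RT-ARIMA positive error (%) | EEMD-RT-ARIMA negative error (%) |
|---|---|---|---|---|---|---|---|
| 839 | 6-point | Null | 9.77 | 3.16 | 5.85 | 4.92 | 4.15 |
| 839 | 12-point | Null | 13.78 | 3.43 | 10.71 | 4.80 | 4.74 |
| 22 | 6-point | Null | 10.66 | 3.91 | 6.70 | 4.12 | 6.34 |
| 22 | 12-point | Null | 9.72 | 4.26 | 6.09 | 5.39 | 5.05 |
| 237 | 6-point | 0.39 | 26.68 | 14.31 | 4.62 | 13.81 | 11.37 |
| 237 | 12-point | 0.39 | 27.51 | 17.44 | 8.00 | 17.90 | 8.92 |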

Figure 8: Time cost of different prediction methods (time cost in seconds of ARIMA, EEMD-ARIMA, and EEMD-RT-ARIMA for the 6-point and 12-point predictions of hosts 839, 22, and 237).


References

[1] J. J. Prevost, K. Nagothu, B. Kelley, and M. Jamshidi, "Prediction of cloud data center networks loads using stochastic and neural models," in Proceedings of 2011 6th International Conference on System of Systems Engineering, pp. 276–281, IEEE, Albuquerque, NM, USA, June 2011.

[2] M. Borkowski, S. Schulte, and C. Hochreiner, "Predicting cloud resource utilization," in Proceedings of 9th International Conference on Utility and Cloud Computing (UCC), pp. 37–42, IEEE, Shanghai, China, December 2016.

[3] M. Barati and S. Sharifian, "A hybrid heuristic-based tuned support vector regression model for cloud load prediction," Journal of Supercomputing, vol. 71, no. 11, pp. 4235–4259, 2015.

[4] Z. Chen, Y. Zhu, Y. Di, S. Feng, and J. Geng, "A high-accuracy self-adaptive resource demands predicting method in IaaS cloud environment," Neural Network World, vol. 25, no. 5, pp. 519–540, 2015.

[5] J. Chen and Y. Wang, "A resource demand prediction method based on EEMD in cloud computing," Procedia Computer Science, vol. 131, pp. 116–123, 2018.

[6] H. Toumi, Z. Brahmi, Z. Benarfa, and M. M. Gammoudi, "Server load prediction using stream mining," in Proceedings of 2017 International Conference on Information Networking (ICOIN), pp. 653–661, IEEE, Da Nang, Vietnam, January 2017.

[7] S. Di, D. Kondo, and W. Cirne, "Host load prediction in a Google compute cloud with a Bayesian model," in Proceedings of 2012 International Conference on High Performance Computing, Networking, Storage and Analysis (SC'12), pp. 1–11, IEEE, Salt Lake City, UT, USA, November 2012.

[8] N. K. Gondhi and P. Kailu, "Prediction based energy efficient virtual machine consolidation in cloud computing," in Proceedings of 2015 Second International Conference on Advances in Computing and Communication Engineering, pp. 437–441, IEEE, Dehradun, India, May 2015.

[9] A. Verma, G. Dasgupta, T. K. Nayak, P. De, and R. Kothari, "Server workload analysis for power minimization using consolidation," in Proceedings of the 2009 Conference on USENIX Annual Technical Conference, p. 28, USENIX Association, San Diego, CA, USA, June 2009.

[10] B. Song, Y. Yu, Y. Zhou, Z. Wang, and S. Du, "Host load prediction with long short-term memory in cloud computing," Journal of Supercomputing, vol. 74, no. 12, pp. 6554–6568, 2018.

[11] J.-J. Jheng, F.-H. Tseng, H.-C. Chao, and L.-D. Chou, "A novel VM workload prediction using grey forecasting model in cloud data center," in Proceedings of International Conference on Information Networking 2014 (ICOIN2014), pp. 40–45, IEEE, Phuket, Thailand, February 2014.

[12] A. Beloglazov and R. Buyya, "Managing overloaded hosts for dynamic consolidation of virtual machines in cloud data centers under quality of service constraints," IEEE Transactions on Parallel and Distributed Systems, vol. 24, no. 7, pp. 1366–1379, 2013.

[13] M. Dabbagh, B. Hamdaoui, M. Guizani, and A. Rayes, "An energy-efficient VM prediction and migration framework for overcommitted clouds," IEEE Transactions on Cloud Computing, vol. 6, no. 4, pp. 955–966, 2018.

[14] D. Minarolli, A. Mazrekaj, and B. Freisleben, "Tackling uncertainty in long-term predictions for host overload and underload detection in cloud computing," Journal of Cloud Computing, vol. 6, no. 1, p. 4, 2017.

[15] K. Mason, M. Duggan, E. Barrett, J. Duggan, and E. Howley, "Predicting host CPU utilization in the cloud using evolutionary neural networks," Future Generation Computer Systems, vol. 86, pp. 162–173, 2018.

[16] D. Magalhães, R. N. Calheiros, R. Buyya, and D. G. Gomes, "Workload modeling for resource usage analysis and simulation in cloud computing," Computers & Electrical Engineering, vol. 47, pp. 69–81, 2015.

[17] C. Tian, Y. Wang, Y. Luo et al., "Minimizing content reorganization and tolerating imperfect workload prediction for cloud-based video-on-demand services," IEEE Transactions on Services Computing, vol. 9, no. 6, pp. 926–939, 2016.

[18] M. Verma, G. R. Gangadharan, V. Ravi, and N. Narendra, "Resource demand prediction in multi-tenant service clouds," in Proceedings of 2013 IEEE International Conference on Cloud Computing in Emerging Markets (CCEM), pp. 1–8, IEEE, Bangalore, India, October 2013.

[19] W. Zhang, Y. Shi, L. Liu, L. Cui, and Y. Zheng, "Performance and resource prediction at high utilization for n-tier service systems in cloud: an experiment driven approach," in Proceedings of 2015 IEEE International Conference on Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing, pp. 843–848, IEEE, Liverpool, UK, October 2015.

[20] G. Kecskemeti, A. Kertesz, and Z. Nemeth, "Cloud workload prediction by means of simulations," in Proceedings of the Computing Frontiers Conference on ZZZ (CF'17), pp. 279–282, ACM, Siena, Italy, May 2017.

[21] Y. Chen and Z.-A. Jiang, "Dynamically predicting the quality of service: batch, online, and hybrid algorithms," Journal of Electrical and Computer Engineering, vol. 2017, Article ID 9547869, 10 pages, 2017.

[22] A. Khan, X. Yan, S. Tao, and N. Anerousis, "Workload characterization and prediction in the cloud: a multiple time series approach," in Proceedings of 2012 IEEE Network Operations and Management Symposium, pp. 1287–1294, IEEE, Maui, HI, USA, April 2012.

[23] A. K. Mishra, J. L. Hellerstein, W. Cirne, and C. R. Das, "Towards characterizing cloud backend workloads," ACM SIGMETRICS Performance Evaluation Review, vol. 37, no. 4, pp. 34–41, 2010.

[24] D. Gmach, J. Rolia, L. Cherkasova, and A. Kemper, "Workload analysis and demand prediction of enterprise data center applications," in Proceedings of 2007 IEEE 10th International Symposium on Workload Characterization, pp. 171–180, IEEE, Boston, MA, USA, September 2007.

[25] F.-H. Tseng, X. Wang, L.-D. Chou, H.-C. Chao, and V. C. M. Leung, "Dynamic resource prediction and allocation for cloud data center using the multiobjective genetic algorithm," IEEE Systems Journal, vol. 12, no. 2, pp. 1688–1699, 2018.

[26] G. K. Shyam and S. S. Manvi, "Virtual resource prediction in cloud environment: a Bayesian approach," Journal of Network and Computer Applications, vol. 65, pp. 144–154, 2016.

[27] Y. Lu, J. Panneerselvam, L. Liu, and Y. Wu, "RVLBPNN: a workload forecasting model for smart cloud computing," Scientific Programming, vol. 2016, Article ID 5635673, 9 pages, 2016.

[28] K. Rajaram and M. P. Malarvizhi, "Utilization based prediction model for resource provisioning," in Proceedings of 2017 International Conference on Computer, Communication and Signal Processing (ICCCSP), pp. 1–6, IEEE, Chennai, India, January 2017.


[29] L. Li and A. Zhang, "Resource demand optimization combined prediction under cloud computing environment based on IOWGA operator," International Journal of Grid and Distributed Computing, vol. 8, no. 3, pp. 77–86, 2015.

[30] D. Minarolli and B. Freisleben, "Cross-correlation prediction of resource demand for virtual machine resource allocation in clouds," in Proceedings of 2014 Sixth International Conference on Computational Intelligence, Communication Systems and Networks, pp. 119–124, IEEE, Tetova, Macedonia, May 2014.

[31] W. Zhang, P. Duan, L. T. Yang et al., "Resource requests prediction in the cloud computing environment with a deep belief network," Software: Practice and Experience, vol. 47, no. 3, pp. 473–488, 2017.

[32] H.-B. Mi, H.-M. Wang, G. Yin, D.-X. Shi, Y.-F. Zhou, and L. Yuan, "Resource on-demand reconfiguration method for virtualized data centers," Journal of Software, vol. 22, no. 9, pp. 2193–2205, 2011.

[33] R. N. Calheiros, E. Masoumi, R. Ranjan, and R. Buyya, "Workload prediction using ARIMA model and its impact on cloud applications' QoS," IEEE Transactions on Cloud Computing, vol. 3, no. 4, pp. 449–458, 2015.

[34] Y. Meng, R. Rao, X. Zhang, and P. Hong, "CRUPA: a container resource utilization prediction algorithm for auto-scaling based on time series analysis," in Proceedings of 2016 International Conference on Progress in Informatics and Computing (PIC), pp. 468–472, IEEE, Shanghai, China, December 2016.

[35] E. Dhib, N. Zangar, N. Tabbane, and K. Boussetta, "Impact of seasonal ARIMA workload prediction model on QoE for massively multiplayers online gaming," in Proceedings of 2016 5th International Conference on Multimedia Computing and Systems (ICMCS), pp. 737–741, IEEE, Marrakech, Morocco, September 2016.

[36] A. Ganapathi, Y. Chen, A. Fox, R. Katz, and D. Patterson, "Statistics-driven workload modeling for the cloud," in Proceedings of 2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010), pp. 87–92, IEEE, Long Beach, CA, USA, March 2010.

[37] V. G. Tran, V. Debusschere, and S. Bacha, "Hourly server workload forecasting up to 168 hours ahead using seasonal ARIMA model," in Proceedings of 2012 IEEE International Conference on Industrial Technology, pp. 1127–1131, IEEE, Athens, Greece, March 2012.

[38] D. Xu, S. Yang, and H. Luo, "Research on generalized fuzzy soft sets theory based combined model for demanded cloud computing resource prediction," Chinese Journal of Management Science, vol. 23, no. 5, pp. 56–64, 2015.

[39] S. Li, Y. Wang, X. Qiu, D. Wang, and L. Wang, "A workload prediction-based multi-VM provisioning mechanism in cloud computing," in Proceedings of 2013 15th Asia-Pacific Network Operations and Management Symposium (APNOMS), pp. 1–6, IEEE, Hiroshima, Japan, September 2013.

[40] X. Fu and C. Zhou, "Predicted affinity based virtual machine placement in cloud computing environments," IEEE Transactions on Cloud Computing, vol. 99, p. 1, 2017.

[41] Y. Jiang, C. Perng, T. Li, and R. Chang, "ASAP: a self-adaptive prediction system for instant cloud resource demand provisioning," in Proceedings of 2011 IEEE 11th International Conference on Data Mining, pp. 1104–1109, IEEE, Vancouver, BC, Canada, December 2011.

[42] Z. Wu and N. E. Huang, "Ensemble empirical mode decomposition: a noise-assisted data analysis method," Advances in Adaptive Data Analysis, vol. 1, no. 1, pp. 1–41, 2009.

[43] H. Zang, L. Fan, M. Guo, Z. Wei, G. Sun, and L. Zhang, "Short-term wind power interval forecasting based on an EEMD-RT-RVM model," Advances in Meteorology, vol. 2016, Article ID 8760780, 10 pages, 2016.

[44] N. Safari, C. Y. Chung, and G. C. D. Price, "Novel multi-step short-term wind power prediction framework based on chaotic time series analysis and singular spectrum analysis," IEEE Transactions on Power Systems, vol. 33, no. 1, pp. 590–601, 2018.

[45] X. Chen, H. Wang, J. Huang, and H. Ren, "APU degradation prediction based on EEMD and Gaussian process regression," in Proceedings of 2017 International Conference on Sensing, Diagnostics, Prognostics, and Control (SDPC), pp. 98–104, IEEE, Shanghai, China, August 2017.

[46] C. Yan, C. Yi, X. Wu et al., "Turbine fault trend prediction that based on EEMD and ARIMA models," Journal of Gansu Sciences, vol. 28, no. 4, pp. 100–106, 2016.

[47] M. Jin, P. Li, L. Zhang et al., "A signal feature method and its application based on EEMD, fuzzy entropy and GK clustering," ACTA Metrologica Sinica, vol. 26, no. 5, pp. 501–505, 2015.

[48] E. H. Norden, Z. Shen, R. L. Steven et al., "The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis," in Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, vol. 454, no. 1971, pp. 903–995, Royal Society, London, UK, March 1998.

[49] Alibaba, "cluster-trace-v2017," https://github.com/alibaba/clusterdata.

14 Journal of Electrical and Computer Engineering


features. Thus, the prediction error accumulation and the prediction time of the EEMD-ARIMA method can be reduced by reducing the number of components.

The average period reflects the frequency of host utilization variation, and the two are reciprocally related: the smaller the average period, the higher the frequency. IMF components whose average periods are close are also close in frequency. The average period of IMF$_j$ is calculated by the following formula, in which $n$ is the number of points in the training set and $l_j$ is the number of extrema of IMF$_j$:

$$T_j = \frac{n}{l_j} \tag{6}$$
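Equation (6) can be sketched in a few lines. The excerpt does not say how extrema are detected, so this sketch assumes a strict local maximum or minimum relative to the two neighbouring samples (plateaus and endpoints are not counted). It also treats a component with no interior extrema, such as a monotone residual, as having one extremum to avoid dividing by zero. Both choices are assumptions for illustration, not the authors' code.

```python
def count_extrema(x):
    """Number of strict interior local maxima and minima of x (assumed detection rule)."""
    count = 0
    for i in range(1, len(x) - 1):
        is_max = x[i] > x[i - 1] and x[i] > x[i + 1]
        is_min = x[i] < x[i - 1] and x[i] < x[i + 1]
        if is_max or is_min:
            count += 1
    return count


def average_period(x):
    """Eq. (6): T_j = n / l_j, with n = len(x); l_j is clamped to at least 1 (assumption)."""
    return len(x) / max(count_extrema(x), 1)


imf = [0, 1, 0, -1, 0, 1, 0, -1, 0]      # toy oscillating component
print(count_extrema(imf), average_period(imf))  # 4 extrema, T = 9 / 4 = 2.25
```

A faster-oscillating component has more extrema and therefore a smaller average period, which is the reciprocal relation described above.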

Similarly, the RT value reflects the trend of amplitude fluctuation: the larger the RT value, the stronger the amplitude volatility. If the RT values of two IMFs are close, the two IMFs have a similar overall trend in amplitude volatility.
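The excerpt does not spell out how the RT value is computed. A common convention for the runs test, assumed here, is to label each sample as above or not above the sequence mean and count the maximal runs of identical labels. A rapidly oscillating IMF then yields many runs, while a monotone trend yields only two, which is consistent with the RT value of 2 reported for the R component in Table 1.

```python
def runs_count(x):
    """RT value under the assumed convention: runs of samples above / not above the mean."""
    if not x:
        return 0
    mean = sum(x) / len(x)
    labels = [v > mean for v in x]
    # A new run starts wherever the label changes between consecutive samples.
    return 1 + sum(1 for a, b in zip(labels, labels[1:]) if a != b)


print(runs_count([1, 2, 3, 10, 11, 12, 1, 2]))  # 3 runs: low, high, low
print(runs_count(list(range(10))))              # 2 runs: a monotone trend
```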

To enhance the prediction accuracy and reduce the prediction time of the EEMD-ARIMA method, the EEMD-RT-ARIMA method reconstructs the IMF components and the R component into three new components according to their average periods and RT values. Because the average period and the RT value have different units, we normalize the average period $T_j$ as follows:

$$T_{nj} = \frac{T_j - T_{\min}}{T_{\max} - T_{\min}} \tag{7}$$

where $T_{nj}$ denotes the normalized average period of the IMF$_j$ component, and $T_{\max}$ and $T_{\min}$ represent the maximum and minimum of the average periods of all IMF components. Similarly, the RT value $R_j$ can be normalized as follows:

$$R_{nj} = \frac{R_j - R_{\min}}{R_{\max} - R_{\min}} \tag{8}$$

where $R_{nj}$ is the normalized value of $R_j$, and $R_{\max}$ and $R_{\min}$ are the maximum and minimum of all RT values. Thus, the reconstruction factor (RF) of IMF$_j$, with weighting coefficients $\alpha$ and $\beta$, is defined as follows:

$$F_j = \alpha \cdot \left(1 - T_{nj}\right) + \beta \cdot R_{nj} \tag{9}$$
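Equations (7)–(9) can be checked against the RT values and average periods listed in Table 1. The weights $\alpha$ and $\beta$ are not given in this excerpt; $\alpha = \beta = 0.5$ is an assumption in the sketch below, chosen because it reproduces the RF row of Table 1 to two decimals.

```python
def min_max(values):
    """Min-max normalisation of Eqs. (7) and (8); a constant list maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def reconstruction_factors(periods, rt_values, alpha=0.5, beta=0.5):
    """Eq. (9): F_j = alpha * (1 - T_nj) + beta * R_nj (alpha = beta = 0.5 is assumed)."""
    t_norm = min_max(periods)
    r_norm = min_max(rt_values)
    return [alpha * (1 - t) + beta * r for t, r in zip(t_norm, r_norm)]


names = ["IMF1", "IMF2", "IMF3", "IMF4", "IMF5", "IMF8", "R"]
periods = [2.12, 8.31, 16.41, 30.59, 67.3, 673, 673]  # average periods (Table 1)
rt_values = [431, 181, 96, 47, 21, 5, 2]              # RT values (Table 1)
rf = reconstruction_factors(periods, rt_values)
print(dict(zip(names, (round(f, 2) for f in rf))))
# {'IMF1': 1.0, 'IMF2': 0.7, 'IMF3': 0.6, 'IMF4': 0.53, 'IMF5': 0.47, 'IMF8': 0.0, 'R': 0.0}
```

The term $1 - T_{nj}$ rewards short periods (high frequency) and $R_{nj}$ rewards many runs (strong volatility), so IMF1 scores 1 and the slow IMF8 and R components score 0.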

The higher the frequency and the stronger the volatility of an IMF component, the greater its RF value. If the RF values of two IMF components are close, their overall trends are similar, so they can be reconstructed into one new component. All efficient IMF and R components are reconstructed into three new components: a high-frequency and strong-volatility component, a medium-frequency and weak-volatility component, and a low-frequency trend component. The high-frequency and strong-volatility component reflects the strong volatility and randomness of the high-frequency part of the original host utilization sequence. The medium-frequency and weak-volatility component shows the detailed features of the volatility of the original host utilization sequence. The low-frequency trend component depicts the overall trend of the original host utilization sequence.

Table 1 shows the RT values, average periods, and RF values of the efficient IMF and R components. The RF values of IMF1 and IMF2 are large and relatively close, the RF values of IMF3–IMF5 are close to one another, and the RF values of IMF8 and R are both 0. Therefore, we reconstruct IMF1–IMF2, IMF3–IMF5, and IMF8–R into three new components, as shown in Figures 5(a)–5(c). They respectively reflect the randomness, the fluctuation details, and the overall trend of the original host utilization sequence.
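The grouping above is made by judging which RF values are close. One way to automate it, offered here only as a hypothetical rule, is to cut the RF scale at fixed thresholds; with the illustrative cut points 0.65 and 0.2 (these values are not from the paper), the RF row of Table 1 falls into the same three groups. Each new component is then the point-wise sum of its member series. The toy component values in the usage lines are invented.

```python
def group_by_rf(names, rf, high=0.65, low=0.2):
    """Hypothetical threshold rule: RF >= high, low <= RF < high, RF < low."""
    groups = {"high": [], "medium": [], "low": []}
    for name, f in zip(names, rf):
        if f >= high:
            groups["high"].append(name)
        elif f >= low:
            groups["medium"].append(name)
        else:
            groups["low"].append(name)
    return groups


def reconstruct(components, members):
    """Point-wise sum of the member component series, giving one new component."""
    series = [components[m] for m in members]
    return [sum(values) for values in zip(*series)]


names = ["IMF1", "IMF2", "IMF3", "IMF4", "IMF5", "IMF8", "R"]
rf = [1.0, 0.70, 0.60, 0.53, 0.47, 0.0, 0.0]  # RF row of Table 1
print(group_by_rf(names, rf))
# {'high': ['IMF1', 'IMF2'], 'medium': ['IMF3', 'IMF4', 'IMF5'], 'low': ['IMF8', 'R']}

toy = {"IMF8": [0.5, -0.5, 0.25], "R": [10.0, 10.0, 10.0]}  # invented series
print(reconstruct(toy, ["IMF8", "R"]))  # [10.5, 9.5, 10.25]
```

Because EEMD components sum back to the original signal, summing within groups keeps the signal intact while cutting the number of series to forecast.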

4.4. Use of the ARIMA Model to Predict the Future Host Utilization. We use the ARIMA model to predict the future values of each new component. Then, the overall prediction results are obtained by superposing the prediction results of all new components. The ARIMA prediction procedure is described in Algorithm 1.

For example, assume that three new components $C_h$, $C_m$, and $C_l$ are obtained, representing the high-frequency and strong-volatility component, the medium-frequency and weak-volatility component, and the low-frequency trend component, respectively. We then use the ARIMA method to predict the next 24 values of each new component. The prediction results $P_h$, $P_m$, and $P_l$ of the three new components, each containing the 24 predicted values, can be written as follows:

$$P_h = \left(f_h^{1}, f_h^{2}, \ldots, f_h^{24}\right), \quad P_m = \left(f_m^{1}, f_m^{2}, \ldots, f_m^{24}\right), \quad P_l = \left(f_l^{1}, f_l^{2}, \ldots, f_l^{24}\right) \tag{10}$$

Finally, we calculate the overall prediction result $P$ by superposing the prediction results of the new components as follows:

$$P = P_h + P_m + P_l \tag{11}$$

In this process, the EEMD-RT-ARIMA method reduces the number of components to be predicted from 9 to 3, which reduces both the total prediction time and the error accumulated across the component predictions compared with the EEMD-ARIMA method.
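Algorithm 1 fits a full ARIMA model to each new component, and Eqs. (10) and (11) superpose the component forecasts. The sketch below is a deliberately simplified stand-in, not the authors' MATLAB code. The differencing order d is passed in rather than chosen by the ADF test, and a pure autoregressive AR(p) model fitted by ordinary least squares (no moving-average terms) replaces the maximum-likelihood ARIMA fit, with p chosen by BIC as in step (5). A production version would more likely use statsmodels (`adfuller` and the `ARIMA` model class). The toy series in the usage lines are invented.

```python
import numpy as np


def difference(x, d):
    """Apply d rounds of first differencing, remembering the last value of each level."""
    tails = []
    for _ in range(d):
        tails.append(x[-1])
        x = np.diff(x)
    return x, tails


def undifference(forecast, tails):
    """Invert the differencing on a forecast, innermost level first."""
    for last in reversed(tails):
        forecast = last + np.cumsum(forecast)
    return forecast


def fit_ar(y, p, start):
    """Least-squares AR(p) with intercept on targets y[start:]; returns (coef, BIC)."""
    n = len(y)
    cols = [np.ones(n - start)] + [y[start - k - 1:n - k - 1] for k in range(p)]
    X = np.column_stack(cols)
    target = y[start:]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    m = len(target)
    sigma2 = max(float(resid @ resid) / m, 1e-12)  # guard against log(0) on perfect fits
    return coef, m * np.log(sigma2) + (p + 1) * np.log(m)


def forecast_ar(y, coef, steps):
    """Recursive multi-step forecast: each prediction is fed back as the newest lag."""
    history = list(y)
    p = len(coef) - 1
    out = []
    for _ in range(steps):
        lags = history[::-1][:p]  # most recent first, matching lag columns 1..p
        value = coef[0] + sum(c * v for c, v in zip(coef[1:], lags))
        out.append(value)
        history.append(value)
    return np.array(out)


def forecast_component(series, d, steps, max_p=4):
    """Simplified steps (4)-(7) of Algorithm 1; every candidate p shares the same targets."""
    y, tails = difference(np.asarray(series, dtype=float), d)
    if len(y) < 2 * (max_p + 1):
        raise ValueError("series too short for the chosen max_p")
    fits = [(fit_ar(y, p, max_p), p) for p in range(max_p + 1)]
    (coef, _), _ = min(fits, key=lambda item: item[0][1])  # lowest BIC wins
    return undifference(forecast_ar(y, coef, steps), tails)


def superpose(component_series, d_orders, steps):
    """Eq. (11): the overall forecast is the sum of the component forecasts."""
    return sum(forecast_component(s, d, steps) for s, d in zip(component_series, d_orders))


t = np.arange(30)
trend = 2.0 * t + 1.0     # toy low-frequency component
level = np.full(30, 5.0)  # toy constant component
print(superpose([trend, level], d_orders=[1, 0], steps=3))  # approximately [66. 68. 70.]
```

Each component gets its own model, so the cost grows with the number of components; reconstructing 9 components into 3 means three fits instead of nine.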

Figure 3: CPU utilization trace of a physical host (CPU utilization (%) versus sample index; the sample data are split into a training set and a testing set).


5. Experimental Setup

We conduct an experiment to evaluate our method. The experimental dataset and measurement metrics are introduced as follows.

5.1. Experimental Dataset. Host utilization mainly involves CPU utilization, memory utilization, network utilization, and disk utilization. In this paper, we focus on host CPU utilization. We randomly select the CPU utilization traces of 7 physical hosts from the dataset released by Alibaba in August 2017 [49], each of which includes 144 points (5 minutes per point). These traces are all time-dependent sequences, as shown in Figures 6(a)–6(g).

Each sequence is divided into a training set and a testing set. We first use the training set to predict future data, and these predicted data are then compared with the actual data in the testing set to evaluate our method. In this paper, each training set consists of the first 120 points, and the testing set consists of the subsequent 6, 12, or 24 points. In the EEMD decomposition, we set the number of ensemble iterations to k = 50 and the standard deviation of the added white noise to d = 0.2.

5.2. Measurement Metrics. We evaluate our method in terms of error, effectiveness, and time-cost analysis as follows.

5.2.1. Error Analysis. To evaluate our method, we use the mean absolute percentage error (MAPE) to measure the prediction accuracy. MAPE is defined as follows:

$$\text{MAPE} = \frac{1}{m}\left(\sum_{i=1}^{m}\left|\frac{x_i^{f} - x_i^{t}}{x_i^{t}}\right|\right) \times 100 \tag{12}$$

where $x_i^{f}$ denotes the $i$-th predicted value, $x_i^{t}$ denotes the corresponding actual value in the testing set, and $m$ denotes the number of prediction points. The lower the MAPE, the higher the prediction accuracy.

Figure 4: Decomposition results of EEMD: (a)–(h) IMF1–IMF8 and (i) the residual component R (CPU utilization (%) versus sample index).

Table 1: RT values, average periods, and RF values of the efficient IMF and R components.

| Component | IMF1 | IMF2 | IMF3 | IMF4 | IMF5 | IMF8 | R |
|---|---|---|---|---|---|---|---|
| RT value | 431 | 181 | 96 | 47 | 21 | 5 | 2 |
| Average period | 2.12 | 8.31 | 16.41 | 30.59 | 67.3 | 673 | 673 |
| RF | 1 | 0.70 | 0.60 | 0.53 | 0.47 | 0 | 0 |
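Equation (12) translates directly into code. This is a minimal sketch; note that MAPE is undefined when an actual value is zero, which can happen for an idle host, so the sketch simply lets Python raise a ZeroDivisionError in that case. The input values in the usage line are invented.

```python
def mape(forecast, actual):
    """Eq. (12): mean absolute percentage error, in percent; actual values must be non-zero."""
    if len(forecast) != len(actual) or not actual:
        raise ValueError("forecast and actual must be non-empty and of equal length")
    total = sum(abs((f - a) / a) for f, a in zip(forecast, actual))
    return total / len(actual) * 100


print(round(mape([11.0, 9.0], [10.0, 10.0]), 2))  # 10.0: each point is 10% off
```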

5.2.2. Effectiveness Analysis. Host utilization underprediction or overprediction can lead to resource underprovisioning or overprovisioning. Resource underprovisioning cannot guarantee applications’ QoS, while resource overprovisioning wastes resources and lowers resource utilization. Therefore, a good prediction method should avoid both underprediction and overprediction. In particular, underprediction should be avoided as much as possible because it results in lower QoS for users.

We define positive and negative errors to reflect overprediction and underprediction, respectively, and use them to evaluate the effectiveness of our method. A good prediction method should have a small negative error to avoid underprediction. The positive and negative prediction errors are calculated by the following formula, where $p_i$ is the predicted value, $r_i$ is the actual value, and $m$ is the number of underpredicted points (i.e., negative deviations) or overpredicted points (i.e., positive deviations):

$$e = \frac{1}{m}\sum_{i=1}^{m}\left|p_i - r_i\right| \tag{13}$$
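The positive and negative errors of Eq. (13) can be computed by splitting the points by the sign of $p_i - r_i$ and averaging each group separately. Eq. (13) averages absolute deviations, which is the default below. Table 4 reports these errors in percent, so the authors may normalize by the actual value as in MAPE; the `relative` flag is a hypothetical switch for that variant, not something the paper specifies. `None` plays the role of the "Null" cells in Table 4 (no point on that side).

```python
def signed_errors(forecast, actual, relative=False):
    """Eq. (13) for each side: (positive error, negative error); None if a side is empty."""
    over, under = [], []
    for p, r in zip(forecast, actual):
        dev = abs(p - r) / r * 100 if relative else abs(p - r)
        if p > r:
            over.append(dev)   # overprediction (positive deviation)
        elif p < r:
            under.append(dev)  # underprediction (negative deviation)
    positive = sum(over) / len(over) if over else None
    negative = sum(under) / len(under) if under else None
    return positive, negative


print(signed_errors([12, 8, 9], [10, 10, 10]))  # (2.0, 1.5)
print(signed_errors([8, 9], [10, 10]))          # (None, 1.5): every point underpredicted
```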

5.2.3. Time-Cost Analysis. Host utilization varies very quickly in a cloud data center. If host utilization prediction takes longer than the VM migration decision, resource provisioning will be delayed, which can cause poor QoS. Thus, host utilization prediction must be completed in a timely manner. To investigate the time cost of our proposed method, we measure the running time of the EEMD-RT-ARIMA method and compare it with other prediction methods using the following index $t_c$:

$$t_c = \frac{t_{\text{our}} - t_{\text{other}}}{t_{\text{other}}} \times 100 \tag{14}$$

where $t_{\text{our}}$ is the running time of our EEMD-RT-ARIMA method, $t_{\text{other}}$ is the running time of another method, such as the ARIMA model or the EEMD-ARIMA method, and $t_c$ is the percentage by which the time cost is reduced (negative values) or increased (positive values).
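Equation (14) is a relative change in percent. Plugging in the 6-point running times reported for hosts 22 and 237 in the time-cost analysis (Section 6.3) reproduces the quoted savings of roughly 80% and 40%:

```python
def time_cost_change(t_our, t_other):
    """Eq. (14): percentage change in running time; negative means our method is faster."""
    return (t_our - t_other) / t_other * 100


# 6-point running times in seconds (EEMD-RT-ARIMA vs EEMD-ARIMA), from Section 6.3
print(round(time_cost_change(69.37, 337.20), 2))   # host 22: -79.43
print(round(time_cost_change(113.64, 190.46), 2))  # host 237: -40.33
```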

6. Experimental Results and Analysis

To validate the prediction effectiveness of our EEMD-RT-ARIMA method, we conduct experiments with the ARIMA, EEMD-ARIMA, and EEMD-RT-ARIMA methods and compare their prediction results. All experiments were performed on a PC with a 2.5 GHz Intel(R) i7 CPU running MATLAB. To make the three methods comparable, we run each method 5 times on the same original dataset. The mean values of the prediction results are shown in the following tables and figures.

6.1. Error Analysis. Table 2 shows the MAPE values of host utilization predictions for 7 physical hosts. The EEMD-ARIMA and EEMD-RT-ARIMA methods have lower MAPE values than the ARIMA model for 6-point and 12-point predictions. For example, the EEMD-ARIMA and EEMD-RT-ARIMA methods achieve MAPE values of 6.06% and 5.05% for the 6-point prediction of host 109, while the ARIMA model has a far higher MAPE value (16.85%). They obtain MAPE values of 10.13% and 5.46% for the 12-point prediction of host 109, while the ARIMA model obtains 11.08%. For host 22, the EEMD-ARIMA and EEMD-RT-ARIMA methods obtain far lower MAPE values (5.31% and 5.42%) than the ARIMA model (10.66%) for 6-point prediction, and they also perform better on 12-point prediction. The same holds for the 6-point and 12-point predictions of the other hosts. This indicates that both the EEMD-ARIMA and EEMD-RT-ARIMA methods predict host utilization more accurately than the ARIMA model over 6 and 12 points: EEMD reduces the inherent volatility of the host utilization sequence, which improves the prediction accuracy of both methods. However, the situation changes in 24-point prediction. The MAPE values of hosts 1162, 424, 1060, and 237 are all over 30% using these three methods. Although the EEMD-RT-ARIMA method has lower MAPE values than the ARIMA and EEMD-ARIMA methods for hosts 839 and 109, its MAPE values in the 24-point prediction are far higher than those of its 6-point and 12-point predictions. This shows that the EEMD-ARIMA and EEMD-RT-ARIMA methods are suitable for short-term, but not long-term, prediction.

Figure 5: New components: (a) high-frequency and strong-volatility component; (b) medium-frequency and weak-volatility component; (c) low-frequency trend component (CPU utilization (%) versus sample index).

Further analysis shows that the EEMD-RT-ARIMA method achieves lower prediction error than the EEMD-ARIMA method for the 6-point and 12-point predictions of hosts 839, 109, and 1162, even though the EEMD-RT-ARIMA method only uses the efficient IMF components. However, the opposite holds for hosts 22, 424, 1060, and 237. Because the original CPU utilization sequences of all physical hosts are identical in frequency, we calculate the RT value of each CPU utilization sequence, as shown in Table 3. Hosts 839, 109, and 1162 have RT values of at most 10, which shows that their CPU utilization is more stationary than that of the other hosts, since smaller RT values indicate more stationary host utilization sequences. This can also be seen in Figures 6(a)–6(c). Tables 2 and 3 show that the EEMD-RT-ARIMA method achieves a lower MAPE value than the EEMD-ARIMA method when the RT value is small and, conversely, a higher MAPE value when the RT value is large. For instance, hosts 839, 109, and 1162, with smaller RT values, obtain lower MAPE values with the EEMD-RT-ARIMA method than with the EEMD-ARIMA method, while hosts 22, 424, 1060, and 237, with larger RT values, obtain higher MAPE values with the EEMD-RT-ARIMA method than with the EEMD-ARIMA method.

Figure 6: CPU utilization traces of 7 physical hosts: (a) host 839; (b) host 109; (c) host 1162; (d) host 22; (e) host 424; (f) host 1060; (g) host 237 (CPU utilization (%)).

Algorithm 1: ARIMA prediction.
(1) For each new component:
(2)   Set the order of differencing d = 0.
(3)   Execute the augmented Dickey–Fuller (ADF) test. If the series is stationary, go to step (5); otherwise, go to step (4).
(4)   Difference the time series, set d = d + 1, and go to step (3).
(5)   Determine the order of the ARIMA model using the Bayesian information criterion (BIC).
(6)   Estimate the parameters of the ARIMA model using maximum likelihood.
(7)   Forecast the next n values of this component using the ARIMA model.
(8) End for.
(9) Obtain the overall prediction results by superposing the prediction results of all new components.

Furthermore, the advantage of the EEMD-RT-ARIMA method over the EEMD-ARIMA method (the EEMD-ARIMA MAPE minus the EEMD-RT-ARIMA MAPE) shrinks as the RT value increases from host 839 to host 1162. It turns negative from host 22 onward, which indicates that the EEMD-ARIMA method then has higher prediction accuracy than the EEMD-RT-ARIMA method, and its magnitude grows as the RT values increase further. The 12-point host CPU utilization prediction illustrates this. For example, for host 839, with an RT value of 2, the EEMD-RT-ARIMA method achieves a 12-point MAPE of 5.03%, which is 5.45 percentage points lower than the 10.48% of the EEMD-ARIMA method. For host 1162, with an RT value of 10, the MAPE of the EEMD-RT-ARIMA method is only 4.19 percentage points lower than that of the EEMD-ARIMA method. For host 22, with an RT value of 16, the EEMD-RT-ARIMA method has a slightly higher MAPE (5.37%) than the EEMD-ARIMA method (5.33%). As the RT value increases further, the MAPE differences between the two methods grow to 0.87, 1.96, and 4.63 percentage points for hosts 424, 1060, and 237, respectively. This indicates that the EEMD-RT-ARIMA method is less effective than the EEMD-ARIMA method for CPU utilization prediction on these hosts. The ARIMA prediction of each component decomposed by the EEMD method inevitably introduces some error, and superposing the prediction results of all components accumulates these errors. The EEMD-RT-ARIMA method reduces this error accumulation by selecting the efficient IMF components and reconstructing them into fewer components, so it achieves better prediction accuracy than the EEMD-ARIMA method for hosts 839, 109, and 1162. However, selecting and reconstructing the efficient IMF components also introduces some prediction error because the nonefficient components are discarded, especially for nonstationary host utilization sequences. When this error exceeds the error accumulated by the ARIMA predictions of all components in the EEMD-ARIMA method, the EEMD-RT-ARIMA method becomes less effective than the EEMD-ARIMA method for the nonstationary CPU utilization of some hosts, such as hosts 22, 424, 1060, and 237.
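The trend described above can be checked directly against Tables 2 and 3. The snippet below only re-tabulates the published 12-point MAPE values and RT values; it does not re-run any prediction. A negative gap means EEMD-RT-ARIMA is more accurate than EEMD-ARIMA, and sorting by RT shows the gap rising steadily and changing sign between hosts 1162 and 22.

```python
RT = {"839": 2, "109": 6, "1162": 10, "22": 16, "424": 22, "1060": 24, "237": 38}  # Table 3
MAPE_12 = {  # host: (EEMD-ARIMA, EEMD-RT-ARIMA), 12-point MAPE in % from Table 2
    "839": (10.48, 5.03), "109": (10.13, 5.46), "1162": (13.14, 8.95),
    "22": (5.33, 5.37), "424": (16.59, 17.46), "1060": (17.44, 19.40),
    "237": (11.30, 15.93),
}


def mape_gap_by_rt(rt, mape):
    """Rows of (host, RT, EEMD-RT-ARIMA MAPE minus EEMD-ARIMA MAPE), by increasing RT."""
    rows = []
    for host in sorted(rt, key=rt.get):
        eemd, eemd_rt = mape[host]
        rows.append((host, rt[host], round(eemd_rt - eemd, 2)))
    return rows


for row in mape_gap_by_rt(RT, MAPE_12):
    print(row)
# ('839', 2, -5.45) ... ('22', 16, 0.04) ... ('237', 38, 4.63)
```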

6.2. Effectiveness Analysis. To verify the effectiveness of our method in short-term prediction, we select the experimental results of hosts 839, 22, and 237, which have the minimum, middle, and maximum RT values, for further analysis. Figure 7 shows the prediction results of the EEMD-RT-ARIMA, ARIMA, and EEMD-ARIMA methods. The future resource utilization of host 839 decreases below 11%, so, according to a predefined policy, host 839 is underloaded and can be shut down to save energy. Figure 7 shows that our method is more accurate and effective than the ARIMA model. In particular, our method follows the trend of the data, while the ARIMA model cannot keep up with it, so our method is better suited to nonstationary time series than the ARIMA model. Additionally, for host 839, the values predicted by the EEMD-RT-ARIMA method are closer to the testing data than those of the EEMD-ARIMA method. These results show that the EEMD-RT-ARIMA method is more effective than the EEMD-ARIMA method for CPU utilization sequences with weak fluctuations. When the host utilization sequence fluctuates more strongly, the absence of the nonefficient IMF components greatly influences the prediction results, and the EEMD-RT-ARIMA method is less effective than the EEMD-ARIMA method for the CPU utilization prediction of host 237.

Table 2: MAPE values of host utilization prediction.

| Host ID | Prediction length | ARIMA (%) | EEMD-ARIMA (%) | EEMD-RT-ARIMA (%) |
|---|---|---|---|---|
| 839 | 6-point | 9.77 | 4.73 | 4.61 |
| 839 | 12-point | 13.78 | 10.48 | 5.03 |
| 839 | 24-point | 23.10 | 21.53 | 8.82 |
| 109 | 6-point | 16.85 | 6.06 | 5.05 |
| 109 | 12-point | 11.08 | 10.13 | 5.46 |
| 109 | 24-point | 41.34 | 24.68 | 8.36 |
| 1162 | 6-point | 24.45 | 7.76 | 6.45 |
| 1162 | 12-point | 17.81 | 13.14 | 8.95 |
| 1162 | 24-point | 21.93 | 41.41 | 33.45 |
| 22 | 6-point | 10.66 | 5.31 | 5.42 |
| 22 | 12-point | 8.22 | 5.33 | 5.37 |
| 22 | 24-point | 11.41 | 18.44 | 14.77 |
| 424 | 6-point | 37.54 | 16.12 | 17.65 |
| 424 | 12-point | 21.74 | 16.59 | 17.46 |
| 424 | 24-point | 88.68 | 77.39 | 11.57 |
| 1060 | 6-point | 35.71 | 10.05 | 13.94 |
| 1060 | 12-point | 37.60 | 17.44 | 19.40 |
| 1060 | 24-point | 74.72 | 54.82 | 91.93 |
| 237 | 6-point | 22.29 | 7.85 | 11.78 |
| 237 | 12-point | 25.51 | 11.30 | 15.93 |
| 237 | 24-point | 126.11 | 158.43 | 189.51 |

Table 3: RT values of each host utilization sequence.

| Host ID | 839 | 109 | 1162 | 22 | 424 | 1060 | 237 |
|---|---|---|---|---|---|---|---|
| RT value | 2 | 6 | 10 | 16 | 22 | 24 | 38 |

To further analyze the effectiveness of our method, we calculate the positive and negative errors of the 6-point and 12-point predictions for these hosts, as shown in Table 4. The smaller the negative error, the better suited the prediction method is to cloud resource provisioning, because it avoids underprediction. Most of the prediction results of the ARIMA model are underpredicted (the positive error cells are all “Null” for hosts 839 and 22). Furthermore, for host 237 the negative errors of the ARIMA model are far higher than those of the other methods. For instance, the ARIMA model has a negative error of up to 27.51% for the 12-point prediction of host 237, while the EEMD-ARIMA and EEMD-RT-ARIMA methods have negative errors of only 8.00% and 8.92%, respectively. If the ARIMA model is used to predict future host utilization, it can cause resource underprovisioning, which cannot ensure applications’ QoS. The EEMD-RT-ARIMA method achieves smaller negative errors than the EEMD-ARIMA method for hosts 839 and 22, but larger negative errors for host 237. For instance, the EEMD-ARIMA method has a negative error of 10.71% for the 12-point prediction of host 839, while the EEMD-RT-ARIMA method has a negative error of only 4.74%. Similarly, the EEMD-ARIMA method has a negative error of 6.09% for the 12-point prediction of host 22, while the EEMD-RT-ARIMA method achieves a lower value of 5.05%. However, the situation changes for host 237, where the EEMD-ARIMA method has a smaller negative error than the EEMD-RT-ARIMA method. For instance, the EEMD-RT-ARIMA method obtains negative errors of 11.37% and 8.92% for the 6-point and 12-point predictions, while the EEMD-ARIMA method has negative errors of only 4.62% and 8.00%.

Figure 7: Prediction results of different methods (ARIMA, EEMD-ARIMA, and EEMD-RT-ARIMA versus testing data; CPU utilization (%) versus prediction point): (a) host 839, 6-point prediction; (b) host 839, 12-point prediction; (c) host 22, 6-point prediction; (d) host 22, 12-point prediction; (e) host 237, 6-point prediction; (f) host 237, 12-point prediction.

6.3. Time-Cost Analysis. To verify the applicability of our method, we further compare the time cost of these methods in Figure 8. The EEMD-ARIMA method has the largest running time, over 180 s, while the ARIMA model takes the least time, less than 50 s. The time cost of the EEMD-RT-ARIMA method lies between about 70 s and 117 s, which is 40–80% less than that of the EEMD-ARIMA method. For example, for the 6-point prediction of host 22, the running time of the EEMD-RT-ARIMA method is 69.37 s, far less than the 337.20 s of the EEMD-ARIMA method, so our method saves up to 80% of the time cost. For the CPU utilization sequence of host 237, which has strong variability, the EEMD-ARIMA method requires 190.46 s to predict the next 6 values, while the EEMD-RT-ARIMA method takes only 113.64 s, a reduction of approximately 40%. Considering prediction accuracy, effectiveness, and time cost, our EEMD-RT-ARIMA method is more cost-effective for short-term host utilization prediction in cloud computing.

7. Conclusions

Host utilization is an indicator of host performance, and its prediction can promote effective resource scheduling in cloud computing. However, host utilization demonstrates strong randomness and instability caused by users’ random and varied resource demands, which makes it difficult to improve prediction accuracy. In this paper, we propose a hybrid and cost-effective method, EEMD-RT-ARIMA, for short-term host utilization prediction in cloud computing. The EEMD method is first used to decompose the nonstationary host utilization sequence into a few relatively stationary IMF components and an R component. Then, we calculate the correlation coefficient between each IMF component and the original data to select efficient IMF components, and we use RT values and average periods to reconstruct these components into three new components, reducing error accumulation and time cost. Finally, the three new components are predicted by the ARIMA model, and their prediction results are superposed to form the overall prediction results. We use real host utilization traces from a cloud platform to conduct the experiments and compare our EEMD-RT-ARIMA method with the ARIMA model and the EEMD-ARIMA method in terms of error, effectiveness, and time-cost analysis. The results show that our method is cost-effective and more suitable for short-term host utilization prediction in cloud computing.

Data Availability

The running example and experimental data used to support the findings of this study have been deposited in the Figshare repository (https://doi.org/10.6084/m9.figshare.7679594).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the Shandong Provincial Natural Science Foundation (ZR2016FM41).

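Table 4: Positive and negative error analysis.

| Host ID | Prediction length | ARIMA positive error (%) | ARIMA negative error (%) | EEMD-ARIMA positive error (%) | EEMD-ARIMA negative error (%) | EEMD-RT-ARIMA positive error (%) | EEMD-RT-ARIMA negative error (%) |
|---|---|---|---|---|---|---|---|
| 839 | 6-point | Null | 9.77 | 3.16 | 5.85 | 4.92 | 4.15 |
| 839 | 12-point | Null | 13.78 | 3.43 | 10.71 | 4.80 | 4.74 |
| 22 | 6-point | Null | 10.66 | 3.91 | 6.70 | 4.12 | 6.34 |
| 22 | 12-point | Null | 9.72 | 4.26 | 6.09 | 5.39 | 5.05 |
| 237 | 6-point | 0.39 | 26.68 | 14.31 | 4.62 | 13.81 | 11.37 |
| 237 | 12-point | 0.39 | 27.51 | 17.44 | 8.00 | 17.90 | 8.92 |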

Figure 8: Time cost of different prediction methods (ARIMA, EEMD-ARIMA, and EEMD-RT-ARIMA; time cost (s) of 6-point and 12-point predictions for hosts 839, 22, and 237).


References

[1] J. J. Prevost, K. Nagothu, B. Kelley, and M. Jamshidi, “Prediction of cloud data center networks loads using stochastic and neural models,” in Proceedings of 2011 6th International Conference on System of Systems Engineering, pp. 276–281, IEEE, Albuquerque, NM, USA, June 2011.

[2] M. Borkowski, S. Schulte, and C. Hochreiner, “Predicting cloud resource utilization,” in Proceedings of 9th International Conference on Utility and Cloud Computing (UCC), pp. 37–42, IEEE, Shanghai, China, December 2016.

[3] M. Barati and S. Sharifian, “A hybrid heuristic-based tuned support vector regression model for cloud load prediction,” Journal of Supercomputing, vol. 71, no. 11, pp. 4235–4259, 2015.

[4] Z. Chen, Y. Zhu, Y. Di, S. Feng, and J. Geng, “A high-accuracy self-adaptive resource demands predicting method in IaaS cloud environment,” Neural Network World, vol. 25, no. 5, pp. 519–540, 2015.

[5] J. Chen and Y. Wang, “A resource demand prediction method based on EEMD in cloud computing,” Procedia Computer Science, vol. 131, pp. 116–123, 2018.

[6] H. Toumi, Z. Brahmi, Z. Benarfa, and M. M. Gammoudi, “Server load prediction using stream mining,” in Proceedings of 2017 International Conference on Information Networking (ICOIN), pp. 653–661, IEEE, Da Nang, Vietnam, January 2017.

[7] S. Di, D. Kondo, and W. Cirne, “Host load prediction in a Google compute cloud with a Bayesian model,” in Proceedings of 2012 International Conference on High Performance Computing, Networking, Storage and Analysis (SC’12), pp. 1–11, IEEE, Salt Lake City, UT, USA, November 2012.

[8] N. K. Gondhi and P. Kailu, “Prediction based energy efficient virtual machine consolidation in cloud computing,” in Proceedings of 2015 Second International Conference on Advances in Computing and Communication Engineering, pp. 437–441, IEEE, Dehradun, India, May 2015.

[9] A. Verma, G. Dasgupta, T. K. Nayak, P. De, and R. Kothari, “Server workload analysis for power minimization using consolidation,” in Proceedings of the 2009 Conference on USENIX Annual Technical Conference, p. 28, USENIX Association, San Diego, CA, USA, June 2009.

[10] B. Song, Y. Yu, Y. Zhou, Z. Wang, and S. Du, “Host load prediction with long short-term memory in cloud computing,” Journal of Supercomputing, vol. 74, no. 12, pp. 6554–6568, 2018.

[11] J.-J. Jheng, F.-H. Tseng, H.-C. Chao, and L.-D. Chou, “A novel VM workload prediction using grey forecasting model in cloud data center,” in Proceedings of International Conference on Information Networking 2014 (ICOIN2014), pp. 40–45, IEEE, Phuket, Thailand, February 2014.

[12] A. Beloglazov and R. Buyya, “Managing overloaded hosts for dynamic consolidation of virtual machines in cloud data centers under quality of service constraints,” IEEE Transactions on Parallel and Distributed Systems, vol. 24, no. 7, pp. 1366–1379, 2013.

[13] M. Dabbagh, B. Hamdaoui, M. Guizani, and A. Rayes, “An energy-efficient VM prediction and migration framework for overcommitted clouds,” IEEE Transactions on Cloud Computing, vol. 6, no. 4, pp. 955–966, 2018.

[14] D. Minarolli, A. Mazrekaj, and B. Freisleben, “Tackling uncertainty in long-term predictions for host overload and underload detection in cloud computing,” Journal of Cloud Computing, vol. 6, no. 1, p. 4, 2017.

[15] K. Mason, M. Duggan, E. Barrett, J. Duggan, and E. Howley, “Predicting host CPU utilization in the cloud using evolutionary neural networks,” Future Generation Computer Systems, vol. 86, pp. 162–173, 2018.

[16] D. Magalhães, R. N. Calheiros, R. Buyya, and D. G. Gomes, “Workload modeling for resource usage analysis and simulation in cloud computing,” Computers & Electrical Engineering, vol. 47, pp. 69–81, 2015.

[17] C. Tian, Y. Wang, Y. Luo et al., “Minimizing content reorganization and tolerating imperfect workload prediction for cloud-based video-on-demand services,” IEEE Transactions on Services Computing, vol. 9, no. 6, pp. 926–939, 2016.

[18] M. Verma, G. R. Gangadharan, V. Ravi, and N. Narendra, “Resource demand prediction in multi-tenant service clouds,” in Proceedings of 2013 IEEE International Conference on Cloud Computing in Emerging Markets (CCEM), pp. 1–8, IEEE, Bangalore, India, October 2013.

[19] W. Zhang, Y. Shi, L. Liu, L. Cui, and Y. Zheng, “Performance and resource prediction at high utilization for n-tier service systems in cloud: an experiment driven approach,” in Proceedings of 2015 IEEE International Conference on Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing, pp. 843–848, IEEE, Liverpool, UK, October 2015.

[20] G. Kecskemeti, A. Kertesz, and Z. Nemeth, “Cloud workload prediction by means of simulations,” in Proceedings of the Computing Frontiers Conference on ZZZ (CF’17), pp. 279–282, ACM, Siena, Italy, May 2017.

[21] Y. Chen and Z.-A. Jiang, “Dynamically predicting the quality of service: batch, online, and hybrid algorithms,” Journal of Electrical and Computer Engineering, vol. 2017, Article ID 9547869, 10 pages, 2017.

[22] A. Khan, X. Yan, S. Tao, and N. Anerousis, “Workload characterization and prediction in the cloud: a multiple time series approach,” in Proceedings of 2012 IEEE Network Operations and Management Symposium, pp. 1287–1294, IEEE, Maui, HI, USA, April 2012.

[23] A. K. Mishra, J. L. Hellerstein, W. Cirne, and C. R. Das, “Towards characterizing cloud backend workloads,” ACM SIGMETRICS Performance Evaluation Review, vol. 37, no. 4, pp. 34–41, 2010.

[24] D. Gmach, J. Rolia, L. Cherkasova, and A. Kemper, “Workload analysis and demand prediction of enterprise data center applications,” in Proceedings of 2007 IEEE 10th International Symposium on Workload Characterization, pp. 171–180, IEEE, Boston, MA, USA, September 2007.

[25] F.-H. Tseng, X. Wang, L.-D. Chou, H.-C. Chao, and V. C. M. Leung, “Dynamic resource prediction and allocation for cloud data center using the multiobjective genetic algorithm,” IEEE Systems Journal, vol. 12, no. 2, pp. 1688–1699, 2018.

[26] G. K. Shyam and S. S. Manvi, “Virtual resource prediction in cloud environment: a Bayesian approach,” Journal of Network and Computer Applications, vol. 65, pp. 144–154, 2016.

[27] Y. Lu, J. Panneerselvam, L. Liu, and Y. Wu, “RVLBPNN: a workload forecasting model for smart cloud computing,” Scientific Programming, vol. 2016, Article ID 5635673, 9 pages, 2016.

[28] K. Rajaram and M. P. Malarvizhi, “Utilization based prediction model for resource provisioning,” in Proceedings of 2017 International Conference on Computer, Communication and Signal Processing (ICCCSP), pp. 1–6, IEEE, Chennai, India, January 2017.

Journal of Electrical and Computer Engineering 13

[29] L. Li and A. Zhang, “Resource demand optimization combined prediction under cloud computing environment based on IOWGA operator,” International Journal of Grid and Distributed Computing, vol. 8, no. 3, pp. 77–86, 2015.

[30] D. Minarolli and B. Freisleben, “Cross-correlation prediction of resource demand for virtual machine resource allocation in clouds,” in Proceedings of 2014 Sixth International Conference on Computational Intelligence, Communication Systems and Networks, pp. 119–124, IEEE, Tetova, Macedonia, May 2014.

[31] W. Zhang, P. Duan, L. T. Yang et al., “Resource requests prediction in the cloud computing environment with a deep belief network,” Software: Practice and Experience, vol. 47, no. 3, pp. 473–488, 2017.

[32] H.-B. Mi, H.-M. Wang, G. Yin, D.-X. Shi, Y.-F. Zhou, and L. Yuan, “Resource on-demand reconfiguration method for virtualized data centers,” Journal of Software, vol. 22, no. 9, pp. 2193–2205, 2011.

[33] R. N. Calheiros, E. Masoumi, R. Ranjan, and R. Buyya, “Workload prediction using ARIMA model and its impact on cloud applications’ QoS,” IEEE Transactions on Cloud Computing, vol. 3, no. 4, pp. 449–458, 2015.

[34] Y. Meng, R. Rao, X. Zhang, and P. Hong, “CRUPA: a container resource utilization prediction algorithm for auto-scaling based on time series analysis,” in Proceedings of 2016 International Conference on Progress in Informatics and Computing (PIC), pp. 468–472, IEEE, Shanghai, China, December 2016.

[35] E. Dhib, N. Zangar, N. Tabbane, and K. Boussetta, “Impact of seasonal ARIMA workload prediction model on QoE for massively multiplayers online gaming,” in Proceedings of 2016 5th International Conference on Multimedia Computing and Systems (ICMCS), pp. 737–741, IEEE, Marrakech, Morocco, September 2016.

[36] A. Ganapathi, Y. Chen, A. Fox, R. Katz, and D. Patterson, “Statistics-driven workload modeling for the cloud,” in Proceedings of 2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010), pp. 87–92, IEEE, Long Beach, CA, USA, March 2010.

[37] V. G. Tran, V. Debusschere, and S. Bacha, “Hourly server workload forecasting up to 168 hours ahead using seasonal ARIMA model,” in Proceedings of 2012 IEEE International Conference on Industrial Technology, pp. 1127–1131, IEEE, Athens, Greece, March 2012.

[38] D. Xu, S. Yang, and H. Luo, “Research on generalized fuzzy soft sets theory based combined model for demanded cloud computing resource prediction,” Chinese Journal of Management Science, vol. 23, no. 5, pp. 56–64, 2015.

[39] S. Li, Y. Wang, X. Qiu, D. Wang, and L. Wang, “A workload prediction-based multi-VM provisioning mechanism in cloud computing,” in Proceedings of 2013 15th Asia-Pacific Network Operations and Management Symposium (APNOMS), pp. 1–6, IEEE, Hiroshima, Japan, September 2013.

[40] X. Fu and C. Zhou, “Predicted affinity based virtual machine placement in cloud computing environments,” IEEE Transactions on Cloud Computing, vol. 99, p. 1, 2017.

[41] Y. Jiang, C. Perng, T. Li, and R. Chang, “ASAP: a self-adaptive prediction system for instant cloud resource demand provisioning,” in Proceedings of 2011 IEEE 11th International Conference on Data Mining, pp. 1104–1109, IEEE, Vancouver, BC, Canada, December 2011.

[42] Z. Wu and N. E. Huang, “Ensemble empirical mode decomposition: a noise-assisted data analysis method,” Advances in Adaptive Data Analysis, vol. 1, no. 1, pp. 1–41, 2009.

[43] H. Zang, L. Fan, M. Guo, Z. Wei, G. Sun, and L. Zhang, “Short-term wind power interval forecasting based on an EEMD-RT-RVM model,” Advances in Meteorology, vol. 2016, Article ID 8760780, 10 pages, 2016.

[44] N. Safari, C. Y. Chung, and G. C. D. Price, “Novel multi-step short-term wind power prediction framework based on chaotic time series analysis and singular spectrum analysis,” IEEE Transactions on Power Systems, vol. 33, no. 1, pp. 590–601, 2018.

[45] X. Chen, H. Wang, J. Huang, and H. Ren, “APU degradation prediction based on EEMD and Gaussian process regression,” in Proceedings of 2017 International Conference on Sensing, Diagnostics, Prognostics, and Control (SDPC), pp. 98–104, IEEE, Shanghai, China, August 2017.

[46] C. Yan, C. Yi, X. Wu et al., “Turbine fault trend prediction that based on EEMD and ARIMA models,” Journal of Gansu Sciences, vol. 28, no. 4, pp. 100–106, 2016.

[47] M. Jin, P. Li, L. Zhang et al., “A signal feature method and its application based on EEMD, fuzzy entropy and GK clustering,” ACTA Metrologica Sinica, vol. 26, no. 5, pp. 501–505, 2015.

[48] E. H. Norden, Z. Shen, R. L. Steven et al., “The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis,” in Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, vol. 454, no. 1971, pp. 903–995, Royal Society, London, UK, March 1998.

[49] Alibaba cluster-trace-v2017, https://github.com/alibaba/clusterdata.



5. Experimental Setup

We conduct an experiment to evaluate our method. The experimental dataset and measurement metrics are introduced as follows.

5.1. Experimental Dataset. Host utilization mainly covers CPU utilization, memory utilization, network utilization, and disk utilization. In this paper, we focus on host CPU utilization. We randomly select the CPU utilization traces of 7 physical hosts from the dataset released by Alibaba in August 2017 [49], each of which includes 144 points (5 minutes per point). These traces are all time-dependent sequences, as shown in Figures 6(a)–6(g).

Each sequence is divided into a training set and a testing set. We first use the training set to predict the future data, and then compare these predicted data with the actual data in the testing set to evaluate our method. In this paper, each training set consists of the first 120 points, and the testing set consists of the subsequent points, for example, 6, 12, or 24 points. In the EEMD decomposition, we set the number of iterations to k = 50 and the standard deviation of the added white noise to d = 0.2.
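The split described above can be sketched as follows. This is an illustrative helper, not the authors' MATLAB code: it assumes each host trace is a plain list of 144 samples, and the names `split_series` and `EEMD_SETTINGS` (including its keys) are hypothetical.

```python
# Evaluation layout from Section 5.1 (assumed: one list of samples per host).
SAMPLES_PER_HOST = 144   # 144 points at 5 minutes per point = 12 hours
TRAINING_POINTS = 120    # the first 120 points form the training set
HORIZONS = (6, 12, 24)   # prediction lengths evaluated in Section 6

# EEMD settings stated in the text; the key names are hypothetical.
EEMD_SETTINGS = {"iterations_k": 50, "noise_std_d": 0.2}


def split_series(series, horizon, n_train=TRAINING_POINTS):
    """Return (training, testing) slices for one prediction horizon."""
    if n_train + horizon > len(series):
        raise ValueError("series too short for the requested horizon")
    return series[:n_train], series[n_train:n_train + horizon]


toy_trace = [float(i % 30) for i in range(SAMPLES_PER_HOST)]  # not Alibaba data
for h in HORIZONS:
    train, test = split_series(toy_trace, h)
    print(f"{h:>2}-point: {len(train)} training / {len(test)} testing points")
```

Because 120 + 24 = 144, the longest horizon uses every remaining point of a trace; asking for a longer horizon raises an error instead of silently returning a shorter testing set.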

5.2. Measurement Metrics. We evaluate our method in terms of error, effectiveness, and time-cost analysis, as follows.

5.2.1. Error Analysis. To evaluate our method, we use the mean absolute percentage error (MAPE) to reflect the prediction accuracy. MAPE is defined as follows:

$$\mathrm{MAPE} = \frac{1}{m}\sum_{i=1}^{m}\left|\frac{x_i^f - x_i^t}{x_i^t}\right| \times 100\% \qquad (12)$$

where $x_i^f$ denotes the value of the $i$th prediction point, $x_i^t$ denotes the corresponding actual value in the testing set, and $m$ denotes the number of prediction points. A lower MAPE indicates higher prediction accuracy.

Figure 4: Decomposition results of EEMD. Panels (a)–(h) show IMF1–IMF8 and panel (i) shows the residual component R; each panel plots CPU utilization (%) against the sample index (0–700).

Table 1: RT, average periods, and RF of efficient IMF and R components.

| | IMF1 | IMF2 | IMF3 | IMF4 | IMF5 | IMF8 | R |
|---|---|---|---|---|---|---|---|
| RT | 431 | 181 | 96 | 47 | 21 | 5 | 2 |
| Average period | 2.12 | 8.31 | 16.41 | 30.59 | 67.3 | 67.3 | 67.3 |
| RF | 1 | 0.70 | 0.60 | 0.53 | 0.47 | 0 | 0 |
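Equation (12) translates directly into code. The sketch below is a minimal version assuming plain Python lists of utilization values in percent; the explicit error for a zero actual value is our addition, since MAPE is undefined when a host reports 0% utilization.

```python
def mape(predicted, actual):
    """Mean absolute percentage error of Eq. (12), returned in percent."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    total = 0.0
    for x_f, x_t in zip(predicted, actual):
        if x_t == 0:
            raise ValueError("MAPE is undefined when an actual value is 0")
        total += abs((x_f - x_t) / x_t)  # relative error of one prediction point
    return total / len(actual) * 100.0


# Toy example: each forecast misses a true value of 10% utilization by 1 point,
# so the average relative error is 10%.
print(mape([11.0, 9.0], [10.0, 10.0]))
```

Because the error is relative, the same absolute miss weighs more on a lightly loaded host than on a busy one, which is worth keeping in mind when comparing MAPE values across hosts.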

5.2.2. Effectiveness Analysis. Host utilization underprediction or overprediction can lead to resource underprovisioning or overprovisioning. Resource underprovisioning cannot guarantee applications’ QoS, while resource overprovisioning causes resource waste and low resource utilization. Therefore, a good prediction method should avoid both underprediction and overprediction. In particular, underprediction should be avoided as much as possible because it lowers the QoS delivered to users.

We define positive and negative errors to reflect overprediction and underprediction, respectively, and use them to evaluate the effectiveness of our method. A good prediction method should have a small negative error to avoid underprediction. The positive and negative prediction errors are calculated by the following formula, where $p_i$ is the predicted data, $r_i$ is the actual data, and $m$ is the number of underpredicted points (i.e., negative deviations) or overpredicted points (i.e., positive deviations):

$$e = \frac{1}{m}\sum_{i=1}^{m}\left|p_i - r_i\right| \qquad (13)$$
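A minimal sketch of Eq. (13), assuming that the positive error averages only the overpredicted points and the negative error only the underpredicted ones, as the text describes. Returning `None` for an empty side mirrors the "Null" cells in Table 4; ignoring exact hits (p_i equal to r_i) is our own assumption.

```python
def signed_errors(predicted, actual):
    """Positive (over-) and negative (under-) prediction errors of Eq. (13).

    Each error is the mean |p_i - r_i| over the points on that side only.
    None mirrors a "Null" cell when no point falls on a side.
    Exact hits (p_i == r_i) are counted on neither side (an assumption).
    """
    over = [p - r for p, r in zip(predicted, actual) if p > r]
    under = [r - p for p, r in zip(predicted, actual) if p < r]
    positive = sum(over) / len(over) if over else None
    negative = sum(under) / len(under) if under else None
    return positive, negative


# Two overpredicted points (by 2 and 3), one underpredicted (by 2), one exact hit.
print(signed_errors([12.0, 8.0, 10.0, 13.0], [10.0, 10.0, 10.0, 10.0]))
```

Keeping the two sides separate is what lets Table 4 show that a method can look accurate overall while still underpredicting, which is the case that harms QoS.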

5.2.3. Time-Cost Analysis. Host utilization varies very quickly in a cloud data center. If host utilization prediction is slower than the VM migration decision, resource provisioning will be delayed, which can cause poor QoS. Thus, host utilization prediction must be completed in a timely manner. To investigate the time cost of our proposed method, we measure the running time of the EEMD-RT-ARIMA method and compare it with that of other prediction methods using the following index $t_c$:

$$t_c = \frac{t_{\mathrm{our}} - t_{\mathrm{other}}}{t_{\mathrm{other}}} \times 100\% \qquad (14)$$

where $t_{\mathrm{our}}$ indicates the running time of our method, EEMD-RT-ARIMA, and $t_{\mathrm{other}}$ represents the running time of another method, such as the ARIMA model or the EEMD-ARIMA method. $t_c$ denotes the percentage by which the time cost is reduced (negative values) or increased (positive values).
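Equation (14) is a relative change, so a negative $t_c$ means our method is faster. The hypothetical helper below applies it to the 6-point running times quoted in Section 6.3 (69.37 s vs. 337.20 s for host 22, and 113.64 s vs. 190.46 s for host 237), which gives reductions of roughly 79% and 40%.

```python
def time_cost_index(t_our, t_other):
    """Relative running-time change t_c of Eq. (14), in percent.

    Negative values mean EEMD-RT-ARIMA is faster than the compared method.
    """
    if t_other <= 0:
        raise ValueError("reference running time must be positive")
    return (t_our - t_other) / t_other * 100.0


# Running times in seconds, as reported for the 6-point predictions.
print(round(time_cost_index(69.37, 337.20), 2))   # host 22, vs. EEMD-ARIMA
print(round(time_cost_index(113.64, 190.46), 2))  # host 237, vs. EEMD-ARIMA
```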

6. Experimental Results and Analysis

To validate the prediction effectiveness of our EEMD-RT-ARIMA method, we conduct experiments with the ARIMA, EEMD-ARIMA, and EEMD-RT-ARIMA methods and compare their prediction results. All experiments were performed on a PC with a 2.5 GHz Intel(R) i7 CPU running MATLAB. To make the three methods comparable, we execute each method 5 times on the same original dataset. The mean values of the prediction results are shown in the following tables and figures.
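Because EEMD adds random white noise in every ensemble trial, repeated runs can give slightly different forecasts, which is presumably why each method is executed 5 times. The harness below sketches that protocol under our own assumptions: `predict(train, horizon)` is a hypothetical callable returning a list of forecasts, and wall-clock time is measured with `time.perf_counter`.

```python
import time
from statistics import mean


def run_repeated(predict, train, horizon, runs=5):
    """Run a forecaster several times; average its forecasts and running time."""
    forecasts, seconds = [], []
    for _ in range(runs):
        start = time.perf_counter()
        forecasts.append(list(predict(train, horizon)))
        seconds.append(time.perf_counter() - start)
    # zip(*forecasts) groups the k-th forecast step of every run together.
    averaged = [mean(step) for step in zip(*forecasts)]
    return averaged, mean(seconds)


def naive_last_value(train, horizon):
    """Hypothetical stand-in forecaster: repeat the last observed value."""
    return [train[-1]] * horizon


avg, secs = run_repeated(naive_last_value, [1.0, 2.0, 3.0], horizon=2)
print(avg, secs >= 0.0)
```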

6.1. Error Analysis. Table 2 shows the MAPE values of host utilization predictions for the 7 physical hosts. The EEMD-ARIMA and EEMD-RT-ARIMA methods have lower MAPE values than the ARIMA model for 6-point and 12-point predictions. For example, the EEMD-ARIMA and EEMD-RT-ARIMA methods achieve MAPE values of 6.06% and 5.05% for the 6-point prediction of host 109, while the ARIMA model has a far higher MAPE value of 16.85%. For the 12-point prediction of host 109, they obtain MAPE values of 10.13% and 5.46%, while the ARIMA model obtains 11.08%. For host 22, the EEMD-ARIMA and EEMD-RT-ARIMA methods obtain far lower MAPE values (5.31% and 5.42%) than the ARIMA model (10.66%) for 6-point prediction, and they also obtain lower MAPE values for 12-point prediction. The same holds for the 6-point and 12-point predictions of the other hosts. This indicates that both the EEMD-ARIMA and EEMD-RT-ARIMA methods predict host utilization more accurately than the ARIMA model at these horizons: EEMD reduces the inherent volatility of the host utilization sequence, which improves the prediction accuracy of both methods. However, the situation changes for 24-point prediction. Most of the 24-point MAPE values of hosts 1162, 424, 1060, and 237 exceed 30% across the three methods. Although the EEMD-RT-ARIMA method has lower MAPE values than the ARIMA and EEMD-ARIMA methods for hosts 839 and 109, its 24-point MAPE values are considerably higher than its 6-point and 12-point values. This shows that the EEMD-ARIMA and EEMD-RT-ARIMA methods are suitable for short-term rather than long-term prediction.

Figure 5: New components: (a) high-frequency and strong-volatility component; (b) medium-frequency and weak-volatility component; (c) low-frequency trend component. Each panel plots CPU utilization (%) against the sample index (0–700).

For further analysis, we find that the EEMD-RT-ARIMA method achieves lower prediction errors than the EEMD-ARIMA method for the 6-point and 12-point predictions of hosts 839, 109, and 1162, even though it uses only the efficient IMF components. However, the opposite holds for hosts 22, 424, 1060, and 237. The original CPU utilization sequences of all physical hosts have the same sampling frequency, so we calculate the RT value of each CPU utilization sequence, as shown in Table 3. Hosts 839, 109, and 1162 have RT values of 10 or less, which shows that their CPU utilization is more stationary than that of the other hosts; smaller RT values indicate more stationary host utilization sequences. This can also be seen in Figures 6(a)–6(c). From Tables 2 and 3, the EEMD-RT-ARIMA method achieves a lower MAPE value than the EEMD-ARIMA method when the RT value is smaller and, conversely, a higher MAPE value when the RT value is larger. For instance, hosts 839, 109, and 1162, with smaller RT values, obtain lower MAPE values using the EEMD-RT-ARIMA method than using the EEMD-ARIMA method, while hosts 22, 424, 1060, and 237, with larger RT values, obtain higher MAPE values using the EEMD-RT-ARIMA method than using the EEMD-ARIMA method.

(1) For each new component:
(2) Set the order of difference d = 0.
(3) Execute the augmented Dickey-Fuller (ADF) test. If the series is stationary, go to step 5; otherwise, go to step 4.
(4) Difference the time series, set d = d + 1, and go to step 3.
(5) Determine the order of the ARIMA model using the Bayesian information criterion (BIC).
(6) Estimate the parameters of the ARIMA model using maximum likelihood.
(7) Forecast the future n values of this new component using the ARIMA model.
(8) End.
(9) Obtain the overall prediction results by superposing the prediction results of each new component.

Algorithm 1: ARIMA prediction.

Figure 6: CPU utilization traces of 7 physical hosts: (a) host 839, (b) host 109, (c) host 1162, (d) host 22, (e) host 424, (f) host 1060, and (g) host 237. Each panel plots CPU utilization (%) on a 0–60 scale.
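Algorithm 1 relies on the ADF test and maximum-likelihood ARIMA estimation, which normally come from a statistics library. The dependency-free sketch below keeps the algorithm's control flow (difference until stationary, pick the order by BIC, forecast, then superpose the component forecasts) but swaps in two simplifications that are our assumptions, not the authors' method: a lag-1 autocorrelation threshold instead of the ADF test, and a least-squares AR(p) model (no moving-average terms) instead of maximum-likelihood ARIMA. The values `threshold=0.9`, `max_p=3`, and `max_d=2` are arbitrary illustration choices.

```python
import math


def difference(x):
    """First difference of a series (one pass of step 4)."""
    return [b - a for a, b in zip(x, x[1:])]


def looks_stationary(x, threshold=0.9):
    """Heuristic stand-in for the ADF test (NOT the ADF test itself):
    a lag-1 autocorrelation close to 1 is treated as a sign of a unit root."""
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x)
    if var == 0:
        return True
    cov = sum((x[i] - mu) * (x[i + 1] - mu) for i in range(len(x) - 1))
    return cov / var < threshold


def solve(a, b):
    """Gaussian elimination with partial pivoting; raises on singular systems."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular system")
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        sol[r] = (m[r][n] - sum(m[r][c] * sol[c] for c in range(r + 1, n))) / m[r][r]
    return sol


def fit_ar(y, p, start):
    """Least-squares AR(p) with intercept on y[start:]; returns (coef, rss, n)."""
    rows = [[1.0] + [y[t - k] for k in range(1, p + 1)] for t in range(start, len(y))]
    obs = y[start:]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p + 1)] for i in range(p + 1)]
    aty = [sum(r[i] * v for r, v in zip(rows, obs)) for i in range(p + 1)]
    coef = solve(ata, aty)
    rss = sum((v - sum(c * z for c, z in zip(coef, r))) ** 2 for r, v in zip(rows, obs))
    return coef, rss, len(obs)


def predict_component(x, steps, max_p=3, max_d=2):
    """Steps (2)-(7) of Algorithm 1 for one component, using the stand-ins above."""
    levels = [list(x)]
    # Steps (3)-(4): difference until the heuristic reports "stationary".
    while len(levels) - 1 < max_d and not looks_stationary(levels[-1]):
        levels.append(difference(levels[-1]))
    y = levels[-1]
    best = None
    for p in range(max_p + 1):  # step (5): keep the AR order with the lowest BIC
        try:
            coef, rss, n = fit_ar(y, p, start=max_p)  # same sample for every p
        except ValueError:
            continue  # collinear lags (e.g. a constant series): skip this order
        bic = n * math.log(max(rss / n, 1e-12)) + (p + 1) * math.log(n)
        if best is None or bic < best[0]:
            best = (bic, coef)
    if best is None:
        raise ValueError("series too short to fit any AR order")
    coef = best[1]
    hist = list(y)
    for _ in range(steps):  # step (7): recursive multi-step forecast
        hist.append(coef[0] + sum(coef[k] * hist[-k] for k in range(1, len(coef))))
    forecast = hist[len(y):]
    for level in reversed(levels[:-1]):  # integrate back through each difference
        acc, undone = level[-1], []
        for delta in forecast:
            acc += delta
            undone.append(acc)
        forecast = undone
    return forecast


def superpose(components, steps):
    """Step (9): the overall forecast is the sum of the per-component forecasts."""
    per_component = [predict_component(c, steps) for c in components]
    return [sum(vals) for vals in zip(*per_component)]


trend = [10.0 + 0.5 * t for t in range(120)]  # toy component with a linear trend
flat = [2.0] * 120                            # toy constant component
print(superpose([trend, flat], 3))
```

With statsmodels installed, `statsmodels.tsa.stattools.adfuller` and `statsmodels.tsa.arima.model.ARIMA` (whose fitted results expose `bic` and `forecast`) could replace the two stand-ins while keeping the same loop structure.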

Furthermore, the MAPE advantage of the EEMD-RT-ARIMA method over the EEMD-ARIMA method shrinks as the RT value increases from host 839 to host 1162. The difference turns negative from host 22 onward, indicating that the EEMD-ARIMA method then has higher prediction accuracy than the EEMD-RT-ARIMA method, and it grows in magnitude as the RT value increases further. The 12-point host CPU utilization prediction illustrates this situation. For example, host 839, with an RT value of 2, has a MAPE of 5.03% for 12-point prediction using the EEMD-RT-ARIMA method, which is 5.45 percentage points lower than the 10.48% of the EEMD-ARIMA method. For host 1162, with an RT value of 10, the MAPE of the EEMD-RT-ARIMA method is only 4.19 percentage points lower than that of the EEMD-ARIMA method. For host 22, with an RT value of 16, the EEMD-RT-ARIMA method has a slightly higher MAPE (5.37%) than the EEMD-ARIMA method (5.33%). As the RT value increases further, the MAPE of the EEMD-RT-ARIMA method exceeds that of the EEMD-ARIMA method by 0.87, 1.96, and 4.63 percentage points for hosts 424, 1060, and 237, respectively. This indicates that the EEMD-RT-ARIMA method is less effective than the EEMD-ARIMA method in CPU utilization prediction for these hosts. The ARIMA prediction of each component decomposed by the EEMD method inevitably generates some error, and superposing the prediction results of all components causes error accumulation. The EEMD-RT-ARIMA method reduces this error accumulation by selecting the efficient IMF components and reconstructing them into fewer components, so it achieves better prediction accuracy than the EEMD-ARIMA method for hosts 839, 109, and 1162. However, the selection and reconstruction of efficient IMF components also introduce a prediction error of their own, because the nonefficient components are discarded; this matters especially for nonstationary host utilization sequences. When this error exceeds the error accumulated by the ARIMA predictions of all components in the EEMD-ARIMA method, the EEMD-RT-ARIMA method is no more effective than the EEMD-ARIMA method for the nonstationary CPU utilization of some hosts, such as hosts 22, 424, 1060, and 237.
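The trend described above can be checked directly against Tables 2 and 3. The sketch below uses values transcribed from those tables (the helper name is hypothetical) to compute, for the 12-point horizon, the MAPE gain of EEMD-RT-ARIMA over EEMD-ARIMA and to order the hosts by RT value; the gain shrinks steadily and turns negative from host 22 onward.

```python
# Values transcribed from Table 3 (RT) and the 12-point rows of Table 2 (MAPE, %).
RT_VALUE = {839: 2, 109: 6, 1162: 10, 22: 16, 424: 22, 1060: 24, 237: 38}
MAPE_12 = {  # host: (EEMD-ARIMA, EEMD-RT-ARIMA)
    839: (10.48, 5.03), 109: (10.13, 5.46), 1162: (13.14, 8.95),
    22: (5.33, 5.37), 424: (16.59, 17.46), 1060: (17.44, 19.40),
    237: (11.30, 15.93),
}


def mape_gain_by_rt(rt_value, mape_12):
    """(host, RT, gain) rows sorted by RT; gain > 0 favours EEMD-RT-ARIMA."""
    rows = [(host, rt_value[host], round(eemd - eemd_rt, 2))
            for host, (eemd, eemd_rt) in mape_12.items()]
    return sorted(rows, key=lambda row: row[1])


for host, rt, gain in mape_gain_by_rt(RT_VALUE, MAPE_12):
    print(f"host {host:>4}  RT {rt:>2}  gain {gain:+.2f}")
```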

6.2. Effectiveness Analysis. To verify the effectiveness of our method in short-term prediction, we select the experimental results of hosts 839, 22, and 237, which have the minimum, middle, and maximum RT values, respectively, for further analysis. Figure 7 shows the prediction results of the EEMD-RT-ARIMA, ARIMA, and EEMD-ARIMA methods. The future resource utilization of host 839 decreases below 11%; according to a predefined policy, host 839 is therefore underloaded and can be switched off to save energy. Figure 7 shows that our method is more accurate and effective than the ARIMA model. In particular, our method follows the trend of the data, while the ARIMA model cannot keep up with it, so our method is better suited to nonstationary time series than the ARIMA model. Additionally, for host 839, the data predicted by the EEMD-RT-ARIMA method are closer to the testing data than those of the EEMD-ARIMA method. These results show that the EEMD-RT-ARIMA method is more effective than the EEMD-ARIMA method for CPU utilization sequences with weak fluctuations. When the host utilization sequence fluctuates more strongly, the absence of the nonefficient IMF components greatly influences the prediction results, and the EEMD-RT-ARIMA method is no more effective than the EEMD-ARIMA method for the CPU utilization prediction of host 237.

Table 2: MAPE values of host utilization prediction.

| Host ID | Prediction length | ARIMA (%) | EEMD-ARIMA (%) | EEMD-RT-ARIMA (%) |
|---|---|---|---|---|
| 839 | 6-point | 9.77 | 4.73 | 4.61 |
| 839 | 12-point | 13.78 | 10.48 | 5.03 |
| 839 | 24-point | 23.10 | 21.53 | 8.82 |
| 109 | 6-point | 16.85 | 6.06 | 5.05 |
| 109 | 12-point | 11.08 | 10.13 | 5.46 |
| 109 | 24-point | 41.34 | 24.68 | 8.36 |
| 1162 | 6-point | 24.45 | 7.76 | 6.45 |
| 1162 | 12-point | 17.81 | 13.14 | 8.95 |
| 1162 | 24-point | 21.93 | 41.41 | 33.45 |
| 22 | 6-point | 10.66 | 5.31 | 5.42 |
| 22 | 12-point | 8.22 | 5.33 | 5.37 |
| 22 | 24-point | 11.41 | 18.44 | 14.77 |
| 424 | 6-point | 37.54 | 16.12 | 17.65 |
| 424 | 12-point | 21.74 | 16.59 | 17.46 |
| 424 | 24-point | 88.68 | 77.39 | 11.57 |
| 1060 | 6-point | 35.71 | 10.05 | 13.94 |
| 1060 | 12-point | 37.60 | 17.44 | 19.40 |
| 1060 | 24-point | 74.72 | 54.82 | 91.93 |
| 237 | 6-point | 22.29 | 7.85 | 11.78 |
| 237 | 12-point | 25.51 | 11.30 | 15.93 |
| 237 | 24-point | 126.11 | 158.43 | 189.51 |

Table 3: RT values of each host utilization sequence.

| Host ID | 839 | 109 | 1162 | 22 | 424 | 1060 | 237 |
|---|---|---|---|---|---|---|---|
| RT value | 2 | 6 | 10 | 16 | 22 | 24 | 38 |
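Table 3 ranks the hosts by RT value. The exact runs-test convention behind those numbers is not restated in this section, so the sketch below assumes a common one: label each point as above or below the sequence mean and count the runs of identical labels. Fewer runs means slower, smoother variation, which is how the text reads small RT values. The helper name is hypothetical.

```python
def runs_count(series):
    """Number of runs of consecutive points on the same side of the series mean.

    Assumption: points equal to the mean are grouped with those above it.
    """
    if not series:
        raise ValueError("empty series")
    mu = sum(series) / len(series)
    above = [v >= mu for v in series]
    # A new run starts wherever the above/below label flips.
    return 1 + sum(1 for a, b in zip(above, above[1:]) if a != b)


smooth = [float(t) for t in range(20)]  # slow upward drift: few runs
bursty = [10.0, 30.0] * 10              # flips every sample: many runs
print(runs_count(smooth), runs_count(bursty))
```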

To further analyze the effectiveness of our method, we calculate the positive and negative errors for the 6-point and 12-point predictions of these hosts, as shown in Table 4. The smaller the negative error, the more suitable the prediction method is for cloud resource provisioning, because it avoids underprediction. Most of the prediction results of the ARIMA model are underpredicted (the positive error cells are all “Null” for hosts 839 and 22). Furthermore, the negative errors of the ARIMA model are far higher than those of the other methods for host 237. For instance, the ARIMA model has a negative error of up to 27.51% for the 12-point prediction of host 237, while the EEMD-ARIMA and EEMD-RT-ARIMA methods have negative errors of only 8.00% and 8.92%, respectively. If the ARIMA model is used to predict future host utilization, it can cause resource underprovisioning, which cannot ensure applications’ QoS. The EEMD-RT-ARIMA method achieves smaller negative errors than the EEMD-ARIMA method for hosts 839 and 22, while it has larger negative errors for host 237. For instance, the EEMD-ARIMA method has a negative error of 10.71% for the 12-point prediction of host 839, while the EEMD-RT-ARIMA method has a negative error of only 4.74%. Similarly, the EEMD-ARIMA method obtains a negative error of 6.09% for the 12-point prediction of host 22, while the EEMD-RT-ARIMA method achieves a lower value of 5.05%. However, the situation changes for host 237, where the EEMD-ARIMA method has smaller negative errors than the EEMD-RT-ARIMA method. For instance, the EEMD-RT-ARIMA method obtains negative errors of 11.37% and 8.92% for the 6-point and 12-point predictions, while the EEMD-ARIMA method has negative errors of only 4.62% and 8.00%.

Figure 7: Prediction results of different methods: (a) host 839, 6-point prediction; (b) host 839, 12-point prediction; (c) host 22, 6-point prediction; (d) host 22, 12-point prediction; (e) host 237, 6-point prediction; (f) host 237, 12-point prediction. Each panel plots CPU utilization (%) for the ARIMA, EEMD-ARIMA, and EEMD-RT-ARIMA predictions against the testing data.
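Because underprediction is the costlier error for QoS, one simple way to read Table 4 is to ask which method has the smallest negative error for each host and horizon. The snippet below is a hypothetical helper over values transcribed from Table 4; it reproduces the comparison made in the text (EEMD-RT-ARIMA for hosts 839 and 22, EEMD-ARIMA for host 237).

```python
# Negative errors (%) transcribed from Table 4.
NEGATIVE_ERROR = {
    (839, 6): {"ARIMA": 9.77, "EEMD-ARIMA": 5.85, "EEMD-RT-ARIMA": 4.15},
    (839, 12): {"ARIMA": 13.78, "EEMD-ARIMA": 10.71, "EEMD-RT-ARIMA": 4.74},
    (22, 6): {"ARIMA": 10.66, "EEMD-ARIMA": 6.70, "EEMD-RT-ARIMA": 6.34},
    (22, 12): {"ARIMA": 9.72, "EEMD-ARIMA": 6.09, "EEMD-RT-ARIMA": 5.05},
    (237, 6): {"ARIMA": 26.68, "EEMD-ARIMA": 4.62, "EEMD-RT-ARIMA": 11.37},
    (237, 12): {"ARIMA": 27.51, "EEMD-ARIMA": 8.00, "EEMD-RT-ARIMA": 8.92},
}


def least_underprediction(table):
    """Map each (host, horizon) to the method with the smallest negative error."""
    return {key: min(errors, key=errors.get) for key, errors in table.items()}


for (host, horizon), method in least_underprediction(NEGATIVE_ERROR).items():
    print(f"host {host:>3}, {horizon:>2}-point: {method}")
```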

6.3. Time-Cost Analysis. To verify the applicability of our method, we further compare the time cost of these methods in Figure 8. The EEMD-ARIMA method has the longest running time, over 180 s, while the ARIMA model takes the least time, less than 50 s. The time cost of the EEMD-RT-ARIMA method lies between roughly 70 s and 117 s, which is 40–80% lower than that of the EEMD-ARIMA method. For example, the running time of the EEMD-RT-ARIMA method is 69.37 s, far less than the 337.20 s of the EEMD-ARIMA method for the 6-point prediction of host 22, so our method saves up to 80% of the time cost. For the CPU utilization sequence of host 237, which has strong variability, the EEMD-ARIMA method requires 190.46 s to predict the future 6-point values, while the EEMD-RT-ARIMA method takes only 113.64 s, a reduction of approximately 40%. Considering prediction accuracy, effectiveness, and time cost, our EEMD-RT-ARIMA method is more cost-effective for short-term host utilization prediction in cloud computing.

7. Conclusions

Host utilization is an indicator of host performance, and its prediction can promote effective resource scheduling in cloud computing. However, host utilization demonstrates strong randomness and instability caused by users’ random and varied resource demands, which makes it difficult to improve prediction accuracy. In this paper, we propose a hybrid and cost-effective method, EEMD-RT-ARIMA, for short-term host utilization prediction in cloud computing. The EEMD method is first used to decompose the nonstationary host utilization sequence into a few relatively stationary IMF components and an R component. Then, we calculate the correlation coefficient between each IMF component and the original data to select the efficient IMF components, and we use RT values and average periods to reconstruct these components into three new components, which reduces error accumulation and time cost. Finally, the three new components are predicted by the ARIMA model, and their prediction results are superposed to form the overall prediction results. We use real host utilization traces from a cloud platform to conduct the experiments and compare our EEMD-RT-ARIMA method with the ARIMA model and the EEMD-ARIMA method in terms of error, effectiveness, and time-cost analysis. The results show that our method is cost-effective and more suitable for short-term host utilization prediction in cloud computing.
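The selection step summarized above (a correlation coefficient between each IMF component and the original data) can be sketched as follows. We assume the Pearson correlation coefficient and an absolute-value threshold; the default threshold of 0.1 is a hypothetical placeholder, because the cut-off used by the authors is not stated here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant sequence has no linear association to measure
    return sxy / math.sqrt(sxx * syy)


def select_efficient(imfs, original, threshold=0.1):
    """Indices of IMF components whose |r| with the original data >= threshold.

    threshold=0.1 is a hypothetical placeholder, not the paper's cut-off.
    """
    return [i for i, imf in enumerate(imfs)
            if abs(pearson(imf, original)) >= threshold]


original = [1.0, 2.0, 3.0, 4.0]
imfs = [[1.0, 2.0, 3.0, 4.0], [5.0, 5.0, 5.0, 5.0], [4.0, 3.0, 2.0, 1.0]]
print(select_efficient(imfs, original))
```

Taking the absolute value keeps strongly anti-correlated components; whether the authors do the same is an assumption.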

Data Availability

The running example and experimental data used to support the findings of this study have been deposited in the Figshare repository (https://doi.org/10.6084/m9.figshare.7679594).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the Shandong Provincial Natural Science Foundation (ZR2016FM41).

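Table 4: Positive and negative error analysis.

| Host ID | Prediction length | ARIMA positive error (%) | ARIMA negative error (%) | EEMD-ARIMA positive error (%) | EEMD-ARIMA negative error (%) | EEMD-RT-ARIMA positive error (%) | EEMD-RT-ARIMA negative error (%) |
|---|---|---|---|---|---|---|---|
| 839 | 6-point | Null | 9.77 | 3.16 | 5.85 | 4.92 | 4.15 |
| 839 | 12-point | Null | 13.78 | 3.43 | 10.71 | 4.80 | 4.74 |
| 22 | 6-point | Null | 10.66 | 3.91 | 6.70 | 4.12 | 6.34 |
| 22 | 12-point | Null | 9.72 | 4.26 | 6.09 | 5.39 | 5.05 |
| 237 | 6-point | 0.39 | 26.68 | 14.31 | 4.62 | 13.81 | 11.37 |
| 237 | 12-point | 0.39 | 27.51 | 17.44 | 8.00 | 17.90 | 8.92 |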

Figure 8: Time cost of different prediction methods. The chart compares ARIMA, EEMD-ARIMA, and EEMD-RT-ARIMA for the 6-point and 12-point predictions of hosts 839, 22, and 237; the y-axis shows the time cost (s) on a 0–350 scale.


References

[1] J. J. Prevost, K. Nagothu, B. Kelley, and M. Jamshidi, “Prediction of cloud data center networks loads using stochastic and neural models,” in Proceedings of 2011 6th International Conference on System of Systems Engineering, pp. 276–281, IEEE, Albuquerque, NM, USA, June 2011.

[2] M. Borkowski, S. Schulte, and C. Hochreiner, “Predicting cloud resource utilization,” in Proceedings of 9th International Conference on Utility and Cloud Computing (UCC), pp. 37–42, IEEE, Shanghai, China, December 2016.

[3] M. Barati and S. Sharifian, “A hybrid heuristic-based tuned support vector regression model for cloud load prediction,” Journal of Supercomputing, vol. 71, no. 11, pp. 4235–4259, 2015.

[4] Z. Chen, Y. Zhu, Y. Di, S. Feng, and J. Geng, “A high-accuracy self-adaptive resource demands predicting method in IaaS cloud environment,” Neural Network World, vol. 25, no. 5, pp. 519–540, 2015.

[5] J. Chen and Y. Wang, “A resource demand prediction method based on EEMD in cloud computing,” Procedia Computer Science, vol. 131, pp. 116–123, 2018.

[6] H. Toumi, Z. Brahmi, Z. Benarfa, and M. M. Gammoudi, “Server load prediction using stream mining,” in Proceedings of 2017 International Conference on Information Networking (ICOIN), pp. 653–661, IEEE, Da Nang, Vietnam, January 2017.

[7] S. Di, D. Kondo, and W. Cirne, “Host load prediction in a Google compute cloud with a Bayesian model,” in Proceedings of 2012 International Conference on High Performance Computing, Networking, Storage and Analysis (SC’12), pp. 1–11, IEEE, Salt Lake City, UT, USA, November 2012.

[8] N. K. Gondhi and P. Kailu, “Prediction based energy efficient virtual machine consolidation in cloud computing,” in Proceedings of 2015 Second International Conference on Advances in Computing and Communication Engineering, pp. 437–441, IEEE, Dehradun, India, May 2015.

[9] A. Verma, G. Dasgupta, T. K. Nayak, P. De, and R. Kothari, “Server workload analysis for power minimization using consolidation,” in Proceedings of the 2009 Conference on USENIX Annual Technical Conference, p. 28, USENIX Association, San Diego, CA, USA, June 2009.

[10] B. Song, Y. Yu, Y. Zhou, Z. Wang, and S. Du, “Host load prediction with long short-term memory in cloud computing,” Journal of Supercomputing, vol. 74, no. 12, pp. 6554–6568, 2018.

[11] J.-J. Jheng, F.-H. Tseng, H.-C. Chao, and L.-D. Chou, “A novel VM workload prediction using grey forecasting model in cloud data center,” in Proceedings of International Conference on Information Networking 2014 (ICOIN2014), pp. 40–45, IEEE, Phuket, Thailand, February 2014.

[12] A Beloglazov and R Buyya ldquoManaging overloaded hosts fordynamic consolidation of virtual machines in cloud datacenters under quality of service constraintsrdquo IEEE Trans-actions on Parallel and Distributed Systems vol 24 no 7pp 1366ndash1379 2013

[13] M Dabbagh B Hamdaoui M Guizani and A Rayes ldquoAnenergy-efficient VM prediction and migration framework forovercommitted cloudsrdquo IEEE Transactions on Cloud Com-puting vol 6 no 4 pp 955ndash966 2018

[14] D Minarolli A Mazrekaj and B Freisleben ldquoTackling un-certainty in long-term predictions for host overload andunderload detection in cloud computingrdquo Journal of CloudComputing vol 6 no 1 p 4 2017

[15] K Mason M Duggan E Barrett J Duggan and E HowleyldquoPredicting host CPU utilization in the cloud using evolu-tionary neural networksrdquo Future Generation Computer Sys-tems vol 86 pp 162ndash173 2018

[16] D Magalhatildees R N Calheiros R Buyya and D G GomesldquoWorkload modeling for resource usage analysis and simu-lation in cloud computingrdquo Computers amp Electrical Engi-neering vol 47 pp 69ndash81 2015

[17] C Tian Y Wang Y Luo et al ldquoMinimizing content re-organization and tolerating imperfect workload prediction forcloud-based video-on-demand servicesrdquo IEEE Transactionson Services Computing vol 9 no 6 pp 926ndash939 2016

[18] M Verma G R Gangadharan V Ravi and N NarendraldquoResource demand prediction in multi-tenant service cloudsrdquoin Proceedings of 2013 IEEE International Conference on CloudComputing in Emerging Markets (CCEM) pp 1ndash8 IEEEBangalore India October 2013

[19] W Zhang Y Shi L Liu L Cui and Y Zheng ldquoPerformanceand resource prediction at high utilization for n-tier servicesystems in cloud an experiment driven approachrdquo in Pro-ceedings of 2015 IEEE International Conference on Computerand Information Technology Ubiquitous Computing andCommunications Dependable Autonomic and Secure Com-puting Pervasive Intelligence and Computing pp 843ndash848IEEE Liverpool UK October 2015

[20] G Kecskemeti A Kertesz and Z Nemeth ldquoCloud workloadprediction by means of simulationsrdquo in Proceedings of theComputing Frontiers Conference on ZZZ (CFrsquo17) pp 279ndash282ACM Siena Italy May 2017

[21] Y Chen and Z-A Jiang ldquoDynamically predicting the qualityof service batch online and hybrid algorithmsrdquo Journal ofElectrical and Computer Engineering vol 2017 Article ID9547869 10 pages 2017

[22] A Khan X Yan S Tao and N Anerousis ldquoWorkloadcharacterization and prediction in the cloud a multiple timeseries approachrdquo in Proceedings of 2012 IEEE Network Op-erations and Management Symposium pp 1287ndash1294 IEEEMaui HI USA April 2012

[23] A K Mishra J L Hellerstein W Cirne and C R DasldquoTowards characterizing cloud backend workloadsrdquo ACMSIGMETRICS Performance Evaluation Review vol 37 no 4pp 34ndash41 2010

[24] D Gmach J Rolia L Cherkasova and A KemperldquoWorkload analysis and demand prediction of enterprise datacenter applicationsrdquo in Proceedings of 2007 IEEE 10th In-ternational Symposium on Workload Characterizationpp 171ndash180 IEEE Boston MA USA September 2007

[25] F-H Tseng X Wang L-D Chou H-C Chao andV C M Leung ldquoDynamic resource prediction and allocationfor cloud data center using the multiobjective genetic algo-rithmrdquo IEEE Systems Journal vol 12 no 2 pp 1688ndash16992018

[26] G K Shyam and S S Manvi ldquoVirtual resource prediction incloud environment a Bayesian approachrdquo Journal of Networkand Computer Applications vol 65 pp 144ndash154 2016

[27] Y Lu J Panneerselvam L Liu and Y Wu ldquoRVLBPNN aworkload forecasting model for smart cloud computingrdquoScientific Programming vol 2016 Article ID 5635673 9 pages2016

[28] K Rajaram and M P Malarvizhi ldquoUtilization based pre-diction model for resource provisioningrdquo in Proceedings of2017 International Conference on Computer Communicationand Signal Processing (ICCCSP) pp 1ndash6 IEEE ChennaiIndia January 2017

Journal of Electrical and Computer Engineering 13

[29] L. Li and A. Zhang, "Resource demand optimization combined prediction under cloud computing environment based on IOWGA operator," International Journal of Grid and Distributed Computing, vol. 8, no. 3, pp. 77–86, 2015.

[30] D. Minarolli and B. Freisleben, "Cross-correlation prediction of resource demand for virtual machine resource allocation in clouds," in Proceedings of 2014 Sixth International Conference on Computational Intelligence, Communication Systems and Networks, pp. 119–124, IEEE, Tetova, Macedonia, May 2014.

[31] W. Zhang, P. Duan, L. T. Yang et al., "Resource requests prediction in the cloud computing environment with a deep belief network," Software: Practice and Experience, vol. 47, no. 3, pp. 473–488, 2017.

[32] H.-B. Mi, H.-M. Wang, G. Yin, D.-X. Shi, Y.-F. Zhou, and L. Yuan, "Resource on-demand reconfiguration method for virtualized data centers," Journal of Software, vol. 22, no. 9, pp. 2193–2205, 2011.

[33] R. N. Calheiros, E. Masoumi, R. Ranjan, and R. Buyya, "Workload prediction using ARIMA model and its impact on cloud applications' QoS," IEEE Transactions on Cloud Computing, vol. 3, no. 4, pp. 449–458, 2015.

[34] Y. Meng, R. Rao, X. Zhang, and P. Hong, "CRUPA: a container resource utilization prediction algorithm for auto-scaling based on time series analysis," in Proceedings of 2016 International Conference on Progress in Informatics and Computing (PIC), pp. 468–472, IEEE, Shanghai, China, December 2016.

[35] E. Dhib, N. Zangar, N. Tabbane, and K. Boussetta, "Impact of seasonal ARIMA workload prediction model on QoE for massively multiplayers online gaming," in Proceedings of 2016 5th International Conference on Multimedia Computing and Systems (ICMCS), pp. 737–741, IEEE, Marrakech, Morocco, September 2016.

[36] A. Ganapathi, Y. Chen, A. Fox, R. Katz, and D. Patterson, "Statistics-driven workload modeling for the cloud," in Proceedings of 2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010), pp. 87–92, IEEE, Long Beach, CA, USA, March 2010.

[37] V. G. Tran, V. Debusschere, and S. Bacha, "Hourly server workload forecasting up to 168 hours ahead using seasonal ARIMA model," in Proceedings of 2012 IEEE International Conference on Industrial Technology, pp. 1127–1131, IEEE, Athens, Greece, March 2012.

[38] D. Xu, S. Yang, and H. Luo, "Research on generalized fuzzy soft sets theory based combined model for demanded cloud computing resource prediction," Chinese Journal of Management Science, vol. 23, no. 5, pp. 56–64, 2015.

[39] S. Li, Y. Wang, X. Qiu, D. Wang, and L. Wang, "A workload prediction-based multi-VM provisioning mechanism in cloud computing," in Proceedings of 2013 15th Asia-Pacific Network Operations and Management Symposium (APNOMS), pp. 1–6, IEEE, Hiroshima, Japan, September 2013.

[40] X. Fu and C. Zhou, "Predicted affinity based virtual machine placement in cloud computing environments," IEEE Transactions on Cloud Computing, vol. 99, p. 1, 2017.

[41] Y. Jiang, C. Perng, T. Li, and R. Chang, "ASAP: a self-adaptive prediction system for instant cloud resource demand provisioning," in Proceedings of 2011 IEEE 11th International Conference on Data Mining, pp. 1104–1109, IEEE, Vancouver, BC, Canada, December 2011.

[42] Z. Wu and N. E. Huang, "Ensemble empirical mode decomposition: a noise-assisted data analysis method," Advances in Adaptive Data Analysis, vol. 1, no. 1, pp. 1–41, 2009.

[43] H. Zang, L. Fan, M. Guo, Z. Wei, G. Sun, and L. Zhang, "Short-term wind power interval forecasting based on an EEMD-RT-RVM model," Advances in Meteorology, vol. 2016, Article ID 8760780, 10 pages, 2016.

[44] N. Safari, C. Y. Chung, and G. C. D. Price, "Novel multi-step short-term wind power prediction framework based on chaotic time series analysis and singular spectrum analysis," IEEE Transactions on Power Systems, vol. 33, no. 1, pp. 590–601, 2018.

[45] X. Chen, H. Wang, J. Huang, and H. Ren, "APU degradation prediction based on EEMD and Gaussian process regression," in Proceedings of 2017 International Conference on Sensing, Diagnostics, Prognostics, and Control (SDPC), pp. 98–104, IEEE, Shanghai, China, August 2017.

[46] C. Yan, C. Yi, X. Wu et al., "Turbine fault trend prediction that based on EEMD and ARIMA models," Journal of Gansu Sciences, vol. 28, no. 4, pp. 100–106, 2016.

[47] M. Jin, P. Li, L. Zhang et al., "A signal feature method and its application based on EEMD, fuzzy entropy and GK clustering," ACTA Metrologica Sinica, vol. 26, no. 5, pp. 501–505, 2015.

[48] E. H. Norden, Z. Shen, R. L. Steven et al., "The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis," in Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, vol. 454, no. 1971, pp. 903–995, Royal Society, London, UK, March 1998.

[49] Alibaba cluster-trace-v2017, https://github.com/alibaba/clusterdata.


number of prediction points. The lower the MAPE, the higher the prediction accuracy.
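For readers who want to recompute the error figures, a minimal Python sketch of MAPE follows. It assumes the standard definition (the mean of |p_i − r_i| / r_i, expressed in percent); the function name `mape` is illustrative, and the authors' own experiments were run in MATLAB.

```python
def mape(predicted, actual):
    """Mean absolute percentage error in percent (standard definition, assumed here).

    predicted, actual: equal-length sequences; actual values must be nonzero.
    """
    if not actual or len(predicted) != len(actual):
        raise ValueError("need two non-empty sequences of equal length")
    return 100.0 * sum(abs((p - r) / r) for p, r in zip(predicted, actual)) / len(actual)


# Two predictions, each 10% away from an actual utilization of 10%.
print(mape([11.0, 9.0], [10.0, 10.0]))
```

Because each term is divided by the actual value, MAPE is undefined for samples where utilization is exactly zero; such samples would need to be filtered out beforehand.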

5.2.2. Effectiveness Analysis. Underpredicting or overpredicting host utilization leads to resource underprovisioning or overprovisioning, respectively. Resource underprovisioning cannot guarantee applications' QoS, while resource overprovisioning wastes resources and lowers resource utilization. Therefore, a good prediction method should avoid both underprediction and overprediction. In particular, underprediction should be avoided as much as possible because it degrades the QoS delivered to users.

We define positive and negative errors to reflect overprediction and underprediction, respectively, and use them to evaluate the effectiveness of our method. A good prediction method should have a small negative error to avoid underprediction. The positive and negative prediction errors are calculated by the following formula, where p_i is the predicted value, r_i is the actual value, and m is the number of underpredicted points (i.e., negative deviations) or overpredicted points (i.e., positive deviations):

$$e = \frac{1}{m}\sum_{i=1}^{m}\left|p_i - r_i\right| \tag{13}$$
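Equation (13) can be sketched in Python as follows. This is an illustration rather than the authors' MATLAB code: the helper name `signed_errors` is hypothetical, points where the prediction exactly equals the actual value are assumed to count toward neither error, and `None` stands in for the "Null" entries of Table 4 (no points on that side).

```python
def signed_errors(predicted, actual):
    """Positive and negative prediction errors following Eq. (13).

    Positive error: mean |p_i - r_i| over overpredicted points (p_i > r_i).
    Negative error: mean |p_i - r_i| over underpredicted points (p_i < r_i).
    Returns None for a side with no points; exact hits (p_i == r_i) are ignored.
    """
    over = [p - r for p, r in zip(predicted, actual) if p > r]
    under = [r - p for p, r in zip(predicted, actual) if p < r]
    positive = sum(over) / len(over) if over else None
    negative = sum(under) / len(under) if under else None
    return positive, negative


# One overprediction (by 2) and two underpredictions (by 2 and 1).
print(signed_errors([12, 8, 9], [10, 10, 10]))
```

Splitting the deviations by sign, instead of averaging all of them, is what lets the negative error expose a method that systematically underpredicts even when its overall MAPE looks acceptable.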

5.2.3. Time Cost Analysis. Host utilization varies very quickly in a cloud data center. If host utilization prediction takes longer than the VM migration decision, resource provisioning will be delayed, which can degrade QoS. Thus, host utilization prediction must be completed in a timely manner. To investigate the time cost of our proposed method, we measure the running time of the EEMD-RT-ARIMA method and compare it with that of other prediction methods using the following index t_c:

$$t_c = \frac{t_{\mathrm{our}} - t_{\mathrm{other}}}{t_{\mathrm{other}}} \times 100\% \tag{14}$$

where t_our is the running time of our EEMD-RT-ARIMA method and t_other is the running time of another method, such as the ARIMA model or the EEMD-ARIMA method. t_c is the percentage by which the time cost is reduced (negative t_c) or increased (positive t_c).
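A small Python sketch of Eq. (14), applied to the running times reported in Section 6.3, reproduces the savings of about 80% and 40% quoted there. The function name is illustrative.

```python
def time_cost_change(t_our, t_other):
    """Eq. (14): percent change of our running time relative to another method.

    Negative values mean our method is faster.
    """
    return (t_our - t_other) / t_other * 100.0


# Running times in seconds for 6-point predictions (Section 6.3), EEMD-RT-ARIMA vs EEMD-ARIMA.
print(round(time_cost_change(69.37, 337.20), 2))   # host 22: -79.43
print(round(time_cost_change(113.64, 190.46), 2))  # host 237: -40.33
```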

6. Experimental Results and Analysis

To validate the prediction effectiveness of our EEMD-RT-ARIMA method, we run experiments with the ARIMA, EEMD-ARIMA, and EEMD-RT-ARIMA methods and compare their prediction results. All experiments were performed in MATLAB on a PC with a 2.5 GHz Intel(R) i7 CPU. To make the three methods comparable, we run each method 5 times on the same original dataset. The mean values of the prediction results are shown in the following tables and figures.

6.1. Error Analysis. Table 2 shows the MAPE values of host utilization predictions for 7 physical hosts. The EEMD-ARIMA and EEMD-RT-ARIMA methods have lower MAPE values than the ARIMA model for 6-point and 12-point predictions. For example, the EEMD-ARIMA and EEMD-RT-ARIMA methods achieve MAPE values of 6.06% and 5.05% for the 6-point prediction of host 109, while the ARIMA model has a far higher MAPE value of 16.85%. For the 12-point prediction of host 109, they obtain MAPE values of 10.13% and 5.46%, while the ARIMA model obtains 11.08%. For the 6-point prediction of host 22, the EEMD-ARIMA and EEMD-RT-ARIMA methods obtain MAPE values of 5.31% and 5.42%, far lower than the 10.66% of the ARIMA model; they also perform better for the 12-point prediction (5.33% and 5.37% versus 8.22%). The same holds for the 6-point and 12-point predictions of the other hosts. This indicates that both the EEMD-ARIMA and EEMD-RT-ARIMA methods predict host utilization more accurately than the ARIMA model for 6-point and 12-point predictions: EEMD reduces the inherent volatility of the host utilization sequence, which improves the prediction accuracy of both methods. However, the situation changes for 24-point prediction. The MAPE values of hosts 1162, 424, 1060, and 237 are mostly above 30% for all three methods. Although the EEMD-RT-ARIMA method has lower MAPE values than the ARIMA and EEMD-ARIMA methods for hosts 839 and 109, its MAPE values for 24-point prediction are far higher than those for 6-point and 12-point predictions. This shows that the EEMD-ARIMA and EEMD-RT-ARIMA methods are suitable for short-term but not for long-term prediction.

Figure 5: New components. (a) High-frequency and strong-volatility component. (b) Medium-frequency and weak-volatility component. (c) Low-frequency trend component. (Each panel plots CPU utilization (%) over about 700 samples.)

On further analysis, we find that the EEMD-RT-ARIMA method achieves lower prediction errors than the EEMD-ARIMA method for the 6-point and 12-point predictions of hosts 839, 109, and 1162, even though the EEMD-RT-ARIMA method uses only the efficient IMF components. However, the opposite holds for hosts 22, 424, 1060, and 237. Since the original CPU utilization sequences of all physical hosts have the same frequency, we calculate the RT value of each CPU utilization sequence, as shown in Table 3. Hosts 839, 109, and 1162 have low RT values of at most 10, which shows that their CPU utilization is more stationary than that of the other hosts; smaller RT values indicate more stationary host utilization sequences. This can also be seen in Figures 6(a)–6(c). Tables 2 and 3 show that the EEMD-RT-ARIMA method achieves a lower MAPE value than the EEMD-ARIMA method when the RT value is small and, conversely, a higher MAPE value when the RT value is large. For instance, hosts 839, 109, and 1162, with smaller RT values, obtain lower MAPE values with the EEMD-RT-ARIMA method than with the EEMD-ARIMA method, while hosts 22, 424, 1060, and 237, with larger RT values, obtain higher MAPE values with the EEMD-RT-ARIMA method than with the EEMD-ARIMA method.

Figure 6: CPU utilization traces of 7 physical hosts. (a) Host 839. (b) Host 109. (c) Host 1162. (d) Host 22. (e) Host 424. (f) Host 1060. (g) Host 237. (y-axis: CPU utilization (%), 0–60.)

(1) For each new component:
(2)   Set the order of differencing d = 0.
(3)   Execute the augmented Dickey-Fuller (ADF) test. If the series is stationary, go to step (5); otherwise, go to step (4).
(4)   Difference the time series, set d = d + 1, and go to step (3).
(5)   Determine the order of the ARIMA model using the Bayesian information criterion (BIC).
(6)   Estimate the parameters of the ARIMA model using maximum likelihood.
(7)   Forecast the next n values of this new component using the ARIMA model.
(8) End for.
(9) Obtain the overall prediction results by superposing the prediction results of all new components.

Algorithm 1: ARIMA prediction.
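The steps of Algorithm 1 can be sketched in Python with NumPy as below. This is a simplified stand-in, not the authors' MATLAB implementation: the ADF test uses a single lagged difference and the asymptotic 5% critical value (−2.86, constant-only model) instead of exact p-values; the model is restricted to ARI(p, d), that is, no moving-average terms, fitted by least squares rather than maximum likelihood; and the caps `max_d = 2` and `max_p = 5` are hypothetical choices.

```python
import numpy as np

# Asymptotic 5% critical value of the Dickey-Fuller t-statistic (constant, no trend).
ADF_CRIT_5PCT = -2.86


def is_stationary(x, lags=1):
    """Simplified ADF test (step 3): regress dx_t on [1, x_{t-1}, dx_{t-1..t-lags}]
    and compare the t-statistic of the x_{t-1} coefficient with the 5% critical value."""
    dx = np.diff(x)
    y = dx[lags:]
    cols = [np.ones_like(y), x[lags:-1]] + [dx[lags - k:-k] for k in range(1, lags + 1)]
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(sigma2 * np.linalg.pinv(X.T @ X)[1, 1])
    return coef[1] / se < ADF_CRIT_5PCT


def difference_until_stationary(x, max_d=2):
    """Steps 2-4: levels[d] is the series after d differences."""
    levels = [np.asarray(x, dtype=float)]
    while len(levels) <= max_d and not is_stationary(levels[-1]):
        levels.append(np.diff(levels[-1]))
    return levels


def fit_ar_by_bic(z, max_p=5):
    """Steps 5-6, simplified: pick AR order p by BIC, fit by least squares (no MA terms)."""
    n, y = len(z), z[max_p:]  # same sample for every p so BIC values are comparable
    best = None
    for p in range(1, max_p + 1):
        X = np.column_stack([np.ones(n - max_p)] + [z[max_p - k:n - k] for k in range(1, p + 1)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = np.sum((y - X @ coef) ** 2)
        bic = len(y) * np.log(rss / len(y)) + (p + 1) * np.log(len(y))
        if best is None or bic < best[0]:
            best = (bic, p, coef)
    return best[1], best[2]


def forecast_component(x, steps, max_d=2, max_p=5):
    """Steps 2-7 for one new component: difference, fit, forecast, undo differencing."""
    levels = difference_until_stationary(x, max_d)
    p, coef = fit_ar_by_bic(levels[-1], max_p)
    hist, preds = list(levels[-1]), []
    for _ in range(steps):
        nxt = coef[0] + sum(coef[k] * hist[-k] for k in range(1, p + 1))
        hist.append(nxt)
        preds.append(nxt)
    preds = np.array(preds)
    for level in reversed(levels[:-1]):  # integrate back one differencing level at a time
        preds = level[-1] + np.cumsum(preds)
    return preds


def predict_total(components, steps):
    """Step 9: superpose the forecasts of all new components."""
    return sum(forecast_component(c, steps) for c in components)
```

Each component is forecast independently and the forecasts are simply added, so every extra component contributes its own forecast error to the total; this is the error accumulation that reconstructing the IMFs into three components is meant to limit.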

Furthermore, the difference in MAPE values between the EEMD-RT-ARIMA and EEMD-ARIMA methods shrinks as the RT value increases from host 839 to host 1162. From host 22 onward, the difference becomes negative, indicating that the EEMD-ARIMA method is now more accurate than the EEMD-RT-ARIMA method, and its magnitude grows as the RT values increase further. The 12-point host CPU utilization prediction illustrates this. For example, for host 839, with an RT value of 2, the EEMD-RT-ARIMA method achieves a 12-point MAPE of 5.03%, which is 5.45 percentage points lower than the 10.48% of the EEMD-ARIMA method. For host 1162, with an RT value of 10, the MAPE of the EEMD-RT-ARIMA method is only 4.19 percentage points lower than that of the EEMD-ARIMA method. For host 22, with an RT value of 16, the EEMD-RT-ARIMA method has a slightly higher MAPE (5.37%) than the EEMD-ARIMA method (5.33%). As the RT value increases further, the MAPE of the EEMD-RT-ARIMA method exceeds that of the EEMD-ARIMA method by 0.87, 1.96, and 4.63 percentage points for hosts 424, 1060, and 237, respectively. This indicates that the EEMD-RT-ARIMA method is less effective than the EEMD-ARIMA method for CPU utilization prediction on these hosts. Inevitably, the ARIMA prediction of each component decomposed by EEMD introduces some error, and superposing the predictions of all components accumulates these errors. The EEMD-RT-ARIMA method reduces this error accumulation by selecting the efficient IMF components and reconstructing them into fewer components, so it achieves better prediction accuracy than the EEMD-ARIMA method for hosts 839, 109, and 1162. However, selecting and reconstructing only the efficient IMF components also introduces some prediction error because the nonefficient components are discarded, especially for nonstationary host utilization sequences. When this error exceeds the error accumulated by the ARIMA predictions of all components in the EEMD-ARIMA method, the EEMD-RT-ARIMA method is no more effective than the EEMD-ARIMA method, as for the nonstationary CPU utilization sequences of hosts 22, 424, 1060, and 237.
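The trend described above can be checked directly from Tables 2 and 3. The short Python snippet below (variable names are illustrative) sorts the hosts by RT value and prints, for 12-point prediction, the MAPE of EEMD-ARIMA minus that of EEMD-RT-ARIMA; positive values favor EEMD-RT-ARIMA.

```python
# 12-point MAPE values (%) from Table 2 and RT values from Table 3.
rt_value = {839: 2, 109: 6, 1162: 10, 22: 16, 424: 22, 1060: 24, 237: 38}
mape_12 = {  # host: (EEMD-ARIMA, EEMD-RT-ARIMA)
    839: (10.48, 5.03), 109: (10.13, 5.46), 1162: (13.14, 8.95), 22: (5.33, 5.37),
    424: (16.59, 17.46), 1060: (17.44, 19.40), 237: (11.30, 15.93),
}

# Difference EEMD-ARIMA minus EEMD-RT-ARIMA, in percentage points, ordered by RT value.
diffs = [
    (host, rt_value[host], round(mape_12[host][0] - mape_12[host][1], 2))
    for host in sorted(rt_value, key=rt_value.get)
]
for host, rt, d in diffs:
    print(f"host {host:>4}  RT={rt:>2}  difference={d:+.2f}")
```

The printed differences fall monotonically with the RT value and change sign between host 1162 (RT 10) and host 22 (RT 16).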

6.2. Effectiveness Analysis. To verify the effectiveness of our method in short-term prediction, we select the experimental results of hosts 839, 22, and 237, which have the minimum, median, and maximum RT values, for further analysis. Figure 7 shows the prediction results of the EEMD-RT-ARIMA, ARIMA, and EEMD-ARIMA methods. The future resource utilization of host 839 decreases below 11%; according to a predefined policy, host 839 is then underloaded and can be switched off to save energy. Figure 7 shows that our method is more accurate and effective than the ARIMA model. In particular, our method follows the trend of the data, while the ARIMA model cannot keep up with it, so our method handles nonstationary time series better than the ARIMA model. Additionally, for host 839, the values predicted by the EEMD-RT-ARIMA method are closer to the testing data than those of the EEMD-ARIMA method. These results show that the EEMD-RT-ARIMA method is more effective than the EEMD-ARIMA method for CPU utilization sequences with weak fluctuations. When the host utilization sequence fluctuates more strongly, discarding the nonefficient IMF components strongly affects the prediction results, and the EEMD-RT-ARIMA method is no more effective than the EEMD-ARIMA method, as in the CPU utilization prediction of host 237.

Table 2: MAPE values of host utilization prediction.

Host ID | Prediction length | ARIMA (%) | EEMD-ARIMA (%) | EEMD-RT-ARIMA (%)
839 | 6-point | 9.77 | 4.73 | 4.61
839 | 12-point | 13.78 | 10.48 | 5.03
839 | 24-point | 23.10 | 21.53 | 8.82
109 | 6-point | 16.85 | 6.06 | 5.05
109 | 12-point | 11.08 | 10.13 | 5.46
109 | 24-point | 41.34 | 24.68 | 8.36
1162 | 6-point | 24.45 | 7.76 | 6.45
1162 | 12-point | 17.81 | 13.14 | 8.95
1162 | 24-point | 21.93 | 41.41 | 33.45
22 | 6-point | 10.66 | 5.31 | 5.42
22 | 12-point | 8.22 | 5.33 | 5.37
22 | 24-point | 11.41 | 18.44 | 14.77
424 | 6-point | 37.54 | 16.12 | 17.65
424 | 12-point | 21.74 | 16.59 | 17.46
424 | 24-point | 88.68 | 77.39 | 11.57
1060 | 6-point | 35.71 | 10.05 | 13.94
1060 | 12-point | 37.60 | 17.44 | 19.40
1060 | 24-point | 74.72 | 54.82 | 91.93
237 | 6-point | 22.29 | 7.85 | 11.78
237 | 12-point | 25.51 | 11.30 | 15.93
237 | 24-point | 126.11 | 158.43 | 189.51

Table 3: RT values of each host utilization sequence.

Host ID | 839 | 109 | 1162 | 22 | 424 | 1060 | 237
RT value | 2 | 6 | 10 | 16 | 22 | 24 | 38
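Table 3 reports one RT value per host. The runs test itself is not restated in this section, so the sketch below assumes the classical convention: label each sample as above or not above the sequence mean and count the runs (maximal blocks of identical labels). Under this assumption, a slowly drifting sequence yields few runs and a rapidly oscillating one yields many. The function name `runs_count` is hypothetical, and the paper's exact convention (mean versus median, handling of ties) may differ.

```python
def runs_count(seq):
    """Number of runs above / not above the mean (assumed runs-test convention)."""
    if not seq:
        return 0
    mean = sum(seq) / len(seq)
    labels = [value > mean for value in seq]
    # A new run starts every time the label flips between consecutive samples.
    return 1 + sum(1 for a, b in zip(labels, labels[1:]) if a != b)


print(runs_count([1, 1, 1, 5, 5, 5]))  # slowly varying: 2 runs
print(runs_count([1, 5, 1, 5, 1, 5]))  # oscillating: 6 runs
```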

To further analyze the effectiveness of our method, we calculate the positive and negative errors of the 6-point and 12-point predictions for these hosts, as shown in Table 4. The smaller the negative error, the more suitable the prediction method is for cloud resource provisioning, because it avoids underprediction. Most of the predictions of the ARIMA model are underpredictions (its positive-error cells are all "Null" for hosts 839 and 22). Furthermore, for host 237, the negative errors of the ARIMA model are far higher than those of the other methods. For instance, the ARIMA model has a negative error of 27.51% for the 12-point prediction of host 237, while the EEMD-ARIMA and EEMD-RT-ARIMA methods have negative errors of only 8.00% and 8.92%, respectively. Using the ARIMA model to predict future host utilization can therefore cause resource underprovisioning, which cannot ensure applications' QoS. The EEMD-RT-ARIMA method achieves smaller negative errors than the EEMD-ARIMA method for hosts 839 and 22, but a larger negative error for host 237. For instance, the EEMD-ARIMA method has a negative error of 10.71% for the 12-point prediction of host 839, whereas the EEMD-RT-ARIMA method has only 4.74%. Similarly, the EEMD-ARIMA method has a negative error of 6.09% for the 12-point prediction of host 22, whereas the EEMD-RT-ARIMA method achieves a lower value of 5.05%. However, the situation is reversed for host 237, where the EEMD-ARIMA method has smaller negative errors than the EEMD-RT-ARIMA method: the EEMD-RT-ARIMA method obtains negative errors of 11.37% and 8.92% for the 6-point and 12-point predictions, while the EEMD-ARIMA method has only 4.62% and 8.00%.

Figure 7: Prediction results of different methods (ARIMA, EEMD-ARIMA, and EEMD-RT-ARIMA) compared with the testing data. (a) Host 839, 6-point prediction. (b) Host 839, 12-point prediction. (c) Host 22, 6-point prediction. (d) Host 22, 12-point prediction. (e) Host 237, 6-point prediction. (f) Host 237, 12-point prediction. (y-axis: CPU utilization (%).)

6.3. Time-Cost Analysis. To verify the applicability of our method, we further compare the time costs of these methods in Figure 8. The EEMD-ARIMA method has the largest running time, over 180 s, while the ARIMA model takes the least time, less than 50 s. The time cost of the EEMD-RT-ARIMA method lies between roughly 70 s and 117 s, which is 40%–80% lower than that of the EEMD-ARIMA method. For example, for the 6-point prediction of host 22, the running time of the EEMD-RT-ARIMA method is 69.37 s, far less than the 337.20 s of the EEMD-ARIMA method, a saving of about 80%. For the CPU utilization sequence of host 237, which has strong variability, the EEMD-ARIMA method requires 190.46 s to predict the next 6 values, while the EEMD-RT-ARIMA method takes only 113.64 s, a reduction of approximately 40%. Considering prediction accuracy, effectiveness, and time cost, our EEMD-RT-ARIMA method is more cost-effective for short-term host utilization prediction in cloud computing.

7. Conclusions

Host utilization is an indicator of host performance, and predicting it can promote effective resource scheduling in cloud computing. However, host utilization exhibits strong randomness and instability caused by users' random and varied resource demands, which makes accurate prediction difficult. In this paper, we propose a hybrid and cost-effective method, EEMD-RT-ARIMA, for short-term host utilization prediction in cloud computing. The EEMD method is first used to decompose the nonstationary host utilization sequence into a few relatively stationary IMF components and a residual (R) component. Then, we calculate the correlation coefficient between each IMF component and the original data to select the efficient IMF components, and we use RT values and average periods to reconstruct these components into three new components, reducing error accumulation and time cost. Finally, the three new components are predicted by the ARIMA model, and their prediction results are superposed to form the overall prediction. We conduct experiments on real host utilization traces from a cloud platform and compare our EEMD-RT-ARIMA method with the ARIMA model and the EEMD-ARIMA method in terms of error, effectiveness, and time cost. The results show that our method is cost-effective and more suitable for short-term host utilization prediction in cloud computing.
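The correlation-based selection step summarized above can be sketched in Python as follows. This is a hypothetical illustration: it assumes the IMF components are already available as equal-length lists (EEMD itself is not implemented), uses the Pearson correlation coefficient with an absolute-value threshold of 0.1 chosen only for the example, and leaves out the grouping into three new components by RT values and average periods.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def select_efficient_imfs(imfs, original, threshold=0.1):
    """Indices of IMFs whose |correlation| with the original exceeds threshold (hypothetical rule)."""
    return [i for i, imf in enumerate(imfs) if abs(pearson(imf, original)) > threshold]


original = [1, 2, 3, 4, 5, 6]
imfs = [[1, -1, -1, 1, 1, -1], [1, 2, 3, 4, 5, 6]]
print(select_efficient_imfs(imfs, original))  # [1]
```

In this toy example the oscillating first component is almost uncorrelated with the original series (|r| is about 0.098) and is dropped, while the second is kept.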

Data Availability

The running example and experimental data used to support the findings of this study have been deposited in the Figshare repository (https://doi.org/10.6084/m9.figshare.7679594).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the Shandong Provincial Natural Science Foundation (ZR2016FM41).

Table 4: Positive and negative error analysis.

Host ID | Prediction length | ARIMA positive error (%) | ARIMA negative error (%) | EEMD-ARIMA positive error (%) | EEMD-ARIMA negative error (%) | EEMD-RT-ARIMA positive error (%) | EEMD-RT-ARIMA negative error (%)
839 | 6-point | Null | 9.77 | 3.16 | 5.85 | 4.92 | 4.15
839 | 12-point | Null | 13.78 | 3.43 | 10.71 | 4.80 | 4.74
22 | 6-point | Null | 10.66 | 3.91 | 6.70 | 4.12 | 6.34
22 | 12-point | Null | 9.72 | 4.26 | 6.09 | 5.39 | 5.05
237 | 6-point | 0.39 | 26.68 | 14.31 | 4.62 | 13.81 | 11.37
237 | 12-point | 0.39 | 27.51 | 17.44 | 8.00 | 17.90 | 8.92

Figure 8: Time cost of different prediction methods (ARIMA, EEMD-ARIMA, and EEMD-RT-ARIMA) for the 6-point and 12-point predictions of hosts 839, 22, and 237. (y-axis: time cost (s), 0–350.)

References

[1] J. J. Prevost, K. Nagothu, B. Kelley, and M. Jamshidi, "Prediction of cloud data center networks loads using stochastic and neural models," in Proceedings of 2011 6th International Conference on System of Systems Engineering, pp. 276–281, IEEE, Albuquerque, NM, USA, June 2011.

[2] M. Borkowski, S. Schulte, and C. Hochreiner, "Predicting cloud resource utilization," in Proceedings of 9th International Conference on Utility and Cloud Computing (UCC), pp. 37–42, IEEE, Shanghai, China, December 2016.

[3] M. Barati and S. Sharifian, "A hybrid heuristic-based tuned support vector regression model for cloud load prediction," Journal of Supercomputing, vol. 71, no. 11, pp. 4235–4259, 2015.

[4] Z. Chen, Y. Zhu, Y. Di, S. Feng, and J. Geng, "A high-accuracy self-adaptive resource demands predicting method in IaaS cloud environment," Neural Network World, vol. 25, no. 5, pp. 519–540, 2015.

[5] J. Chen and Y. Wang, "A resource demand prediction method based on EEMD in cloud computing," Procedia Computer Science, vol. 131, pp. 116–123, 2018.

[6] H. Toumi, Z. Brahmi, Z. Benarfa, and M. M. Gammoudi, "Server load prediction using stream mining," in Proceedings of 2017 International Conference on Information Networking (ICOIN), pp. 653–661, IEEE, Da Nang, Vietnam, January 2017.

[7] S. Di, D. Kondo, and W. Cirne, "Host load prediction in a Google compute cloud with a Bayesian model," in Proceedings of 2012 International Conference on High Performance Computing, Networking, Storage and Analysis (SC'12), pp. 1–11, IEEE, Salt Lake City, UT, USA, November 2012.

[8] N. K. Gondhi and P. Kailu, "Prediction based energy efficient virtual machine consolidation in cloud computing," in Proceedings of 2015 Second International Conference on Advances in Computing and Communication Engineering, pp. 437–441, IEEE, Dehradun, India, May 2015.

[9] A. Verma, G. Dasgupta, T. K. Nayak, P. De, and R. Kothari, "Server workload analysis for power minimization using consolidation," in Proceedings of the 2009 Conference on USENIX Annual Technical Conference, p. 28, USENIX Association, San Diego, CA, USA, June 2009.

[10] B Song Y Yu Y Zhou Z Wang and S Du ldquoHost loadprediction with long short-term memory in cloud comput-ingrdquo Journal of Supercomputing vol 74 no 12 pp 6554ndash6568 2018

[11] J-J Jheng F-H Tseng H-C Chao and L-D Chou ldquoA novelVM workload prediction using grey forecasting model incloud data centerrdquo in Proceedings of International Conferenceon Information Networking 2014 (ICOIN2014) pp 40ndash45IEEE Phuket +ailand February 2014

[12] A Beloglazov and R Buyya ldquoManaging overloaded hosts fordynamic consolidation of virtual machines in cloud datacenters under quality of service constraintsrdquo IEEE Trans-actions on Parallel and Distributed Systems vol 24 no 7pp 1366ndash1379 2013

[13] M Dabbagh B Hamdaoui M Guizani and A Rayes ldquoAnenergy-efficient VM prediction and migration framework forovercommitted cloudsrdquo IEEE Transactions on Cloud Com-puting vol 6 no 4 pp 955ndash966 2018

[14] D Minarolli A Mazrekaj and B Freisleben ldquoTackling un-certainty in long-term predictions for host overload andunderload detection in cloud computingrdquo Journal of CloudComputing vol 6 no 1 p 4 2017

[15] K Mason M Duggan E Barrett J Duggan and E HowleyldquoPredicting host CPU utilization in the cloud using evolu-tionary neural networksrdquo Future Generation Computer Sys-tems vol 86 pp 162ndash173 2018

[16] D Magalhatildees R N Calheiros R Buyya and D G GomesldquoWorkload modeling for resource usage analysis and simu-lation in cloud computingrdquo Computers amp Electrical Engi-neering vol 47 pp 69ndash81 2015

[17] C Tian Y Wang Y Luo et al ldquoMinimizing content re-organization and tolerating imperfect workload prediction forcloud-based video-on-demand servicesrdquo IEEE Transactionson Services Computing vol 9 no 6 pp 926ndash939 2016

[18] M Verma G R Gangadharan V Ravi and N NarendraldquoResource demand prediction in multi-tenant service cloudsrdquoin Proceedings of 2013 IEEE International Conference on CloudComputing in Emerging Markets (CCEM) pp 1ndash8 IEEEBangalore India October 2013

[19] W Zhang Y Shi L Liu L Cui and Y Zheng ldquoPerformanceand resource prediction at high utilization for n-tier servicesystems in cloud an experiment driven approachrdquo in Pro-ceedings of 2015 IEEE International Conference on Computerand Information Technology Ubiquitous Computing andCommunications Dependable Autonomic and Secure Com-puting Pervasive Intelligence and Computing pp 843ndash848IEEE Liverpool UK October 2015

[20] G Kecskemeti A Kertesz and Z Nemeth ldquoCloud workloadprediction by means of simulationsrdquo in Proceedings of theComputing Frontiers Conference on ZZZ (CFrsquo17) pp 279ndash282ACM Siena Italy May 2017

[21] Y Chen and Z-A Jiang ldquoDynamically predicting the qualityof service batch online and hybrid algorithmsrdquo Journal ofElectrical and Computer Engineering vol 2017 Article ID9547869 10 pages 2017

[22] A Khan X Yan S Tao and N Anerousis ldquoWorkloadcharacterization and prediction in the cloud a multiple timeseries approachrdquo in Proceedings of 2012 IEEE Network Op-erations and Management Symposium pp 1287ndash1294 IEEEMaui HI USA April 2012

[23] A K Mishra J L Hellerstein W Cirne and C R DasldquoTowards characterizing cloud backend workloadsrdquo ACMSIGMETRICS Performance Evaluation Review vol 37 no 4pp 34ndash41 2010

[24] D Gmach J Rolia L Cherkasova and A KemperldquoWorkload analysis and demand prediction of enterprise datacenter applicationsrdquo in Proceedings of 2007 IEEE 10th In-ternational Symposium on Workload Characterizationpp 171ndash180 IEEE Boston MA USA September 2007

[25] F-H Tseng X Wang L-D Chou H-C Chao andV C M Leung ldquoDynamic resource prediction and allocationfor cloud data center using the multiobjective genetic algo-rithmrdquo IEEE Systems Journal vol 12 no 2 pp 1688ndash16992018

[26] G K Shyam and S S Manvi ldquoVirtual resource prediction incloud environment a Bayesian approachrdquo Journal of Networkand Computer Applications vol 65 pp 144ndash154 2016

[27] Y Lu J Panneerselvam L Liu and Y Wu ldquoRVLBPNN aworkload forecasting model for smart cloud computingrdquoScientific Programming vol 2016 Article ID 5635673 9 pages2016

[28] K Rajaram and M P Malarvizhi ldquoUtilization based pre-diction model for resource provisioningrdquo in Proceedings of2017 International Conference on Computer Communicationand Signal Processing (ICCCSP) pp 1ndash6 IEEE ChennaiIndia January 2017

Journal of Electrical and Computer Engineering 13

[29] L Li and A Zhang ldquoResource demand optimization com-bined prediction under cloud computing environment basedon IOWGA operatorrdquo International Journal of Grid andDistributed Computing vol 8 no 3 pp 77ndash86 2015

[30] D Minarolli and B Freisleben ldquoCross-correlation predictionof resource demand for virtual machine resource allocation incloudsrdquo in Proceedings of 2014 Sixth International Conferenceon Computational Intelligence Communication Systems andNetworks pp 119ndash124 IEEE Tetova Macedonia May 2014

[31] W Zhang P Duan L T Yang et al ldquoResource requestsprediction in the cloud computing environment with a deepbelief networkrdquo Software Practice and Experience vol 47no 3 pp 473ndash488 2017

[32] H-B Mi H-M Wang G Yin D-X Shi Y-F Zhou andL Yuan ldquoResource on-demand reconfiguration method forvirtualized data centersrdquo Journal of Software vol 22 no 9pp 2193ndash2205 2011

[33] R N Calheiros E Masoumi R Ranjan and R BuyyaldquoWorkload prediction using ARIMAmodel and its impact oncloud applicationsrsquo QoSrdquo IEEE Transactions on CloudComputing vol 3 no 4 pp 449ndash458 2015

[34] Y Meng R Rao X Zhang and P Hong ldquoCRUPA a con-tainer resource utilization prediction algorithm for auto-scaling based on time series analysisrdquo in Proceedings of2016 International Conference on Progress in Informatics andComputing (PIC) pp 468ndash472 IEEE Shanghai China De-cember 2016

[35] E Dhib N Zangar N Tabbane and K Boussetta ldquoImpact ofseasonal ARIMA workload prediction model on QoE formassively multiplayers online gamingrdquo in Proceedings of 20165th International Conference on Multimedia Computing andSystems (ICMCS) pp 737ndash741 IEEE Marrakech MoroccoSeptember 2016

[36] A Ganapathi Y Chen A Fox R Katz and D PattersonldquoStatistics-driven workload modeling for the cloudrdquo inProceedings of 2010 IEEE 26th International Conference onData EngineeringWorkshops (ICDEW 2010) pp 87ndash92 IEEELong Beach CA USA March 2010

[37] V G Tran V Debusschere and S Bacha ldquoHourly serverworkload forecasting up to 168 hours ahead using seasonalARIMA modelrdquo in Proceedings of 2012 IEEE InternationalConference on Industrial Technology pp 1127ndash1131 IEEEAthens Greece March 2012

[38] D Xu S Yang and H Luo ldquoResearch on generalized fuzzysoft sets theory based combined model for demanded cloudcomputing resource predictionrdquo Chinese Journal of Man-agement Science vol 23 no 5 pp 56ndash64 2015

[39] S Li Y Wang X Qiu D Wang and L Wang ldquoA workloadprediction-basedmulti-VM provisioningmechanism in cloudcomputingrdquo in Proceedings of 2013 15th Asia-Pacific NetworkOperations andManagement Symposium (APNOMS) pp 1ndash6IEEE Hiroshima Japan September 2013

[40] X Fu and C Zhou ldquoPredicted affinity based virtual machineplacement in cloud computing environmentsrdquo IEEE Trans-actions on Cloud Computing vol 99 p 1 2017

[41] Y Jiang C Perng T Li and R Chang ldquoASAP a self-adaptiveprediction system for instant cloud resource demand pro-visioningrdquo in Proceedings of 2011 IEEE 11th InternationalConference on Data Mining pp 1104ndash1109 IEEE VancouverBC Canada December 2011

[42] Z Wu and N E Huang ldquoEnsemble empirical mode de-composition a noise-assisted data analysis methodrdquo Ad-vances in Adaptive Data Analysis vol 1 no 1 pp 1ndash41 2009

[43] H Zang L Fan M Guo Z Wei G Sun and L ZhangldquoShort-term wind power interval forecasting based on anEEMD-RT-RVMmodelrdquo Advances in Meteorology vol 2016Article ID 8760780 10 pages 2016

[44] N Safari C Y Chung and G C D Price ldquoNovel multi-stepshort-term wind power prediction framework based onchaotic time series analysis and singular spectrum analysisrdquoIEEE Transactions on Power Systems vol 33 no 1 pp 590ndash601 2018

[45] X Chen H Wang J Huang and H Ren ldquoAPU degradationprediction based on EEMD and Gaussian process regressionrdquoin Proceedings of 2017 International Conference on SensingDiagnostics Prognostics and Control (SDPC) pp 98ndash104IEEE Shanghai China August 2017

[46] C Yan C Yi XWu et al ldquoTurbine fault trend prediction thatbased on EEMD and ARIMA modelsrdquo Journal of GansuSciences vol 28 no 4 pp 100ndash106 2016

[47] M Jin P Li L Zhang et al ldquoA signal feature method and itsapplication based on EEMD fuzzy entropy and GK cluster-ingrdquo ACTA Metrologica Sinica vol 26 no 5 pp 501ndash5052015

[48] E H Norden Z Shen R L Steven et al ldquo+e empirical modedecomposition and the Hilbert spectrum for nonlinear andnon-stationary time series analysisrdquoin Proceedings of theRoyal Society of London Series A Mathematical Physical andEngineering Sciences vol 454 no 1971 pp 903ndash995 RoyalSociety London UK March 1998

[49] Alibaba cluster-trace-v2017 httpsgithubcomalibabaclusterdata

14 Journal of Electrical and Computer Engineering

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 9: A Hybrid Method for Short-Term Host Utilization Prediction ...downloads.hindawi.com/journals/jece/2019/2782349.pdf · terminesthe migratedVMs according to powersavings and workloadbalance.Dabbaghetal.[13]proposedaprediction

the ARIMA and EEMD-ARIMA methods for hosts 839 and 109, it has far higher MAPE values in the 24-point prediction than in the 6-point and 12-point predictions. This shows that the EEMD-ARIMA and EEMD-RT-ARIMA methods are suitable for short-term rather than long-term prediction.
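The horizon effect can be quantified directly from Table 2. The Python sketch below assumes the standard MAPE definition (mean of |actual − predicted| / |actual|, in percent), which this section does not restate. It copies the EEMD-RT-ARIMA values of Table 2 for hosts 839 and 109, and the helper names are illustrative.

```python
def mape(actual, predicted):
    """Mean absolute percentage error in percent (undefined if any actual value is 0)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("series must be non-empty and of equal length")
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


# EEMD-RT-ARIMA MAPE values (%) copied from Table 2, keyed by prediction length.
EEMD_RT_ARIMA_MAPE = {
    839: {6: 4.61, 12: 5.03, 24: 8.82},
    109: {6: 5.05, 12: 5.46, 24: 8.36},
}


def horizon_growth(table, host):
    """Ratio of the 24-point MAPE to the mean of the 6- and 12-point MAPEs."""
    row = table[host]
    return row[24] / ((row[6] + row[12]) / 2)


for host in EEMD_RT_ARIMA_MAPE:
    print(host, round(horizon_growth(EEMD_RT_ARIMA_MAPE, host), 2))
# 839 1.83
# 109 1.59
```

For both hosts, the 24-point error is roughly 1.6 to 1.8 times the average of the 6-point and 12-point errors.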

For further analysis, we find that the EEMD-RT-ARIMA method achieves lower prediction error than the EEMD-ARIMA method for the 6-point and 12-point predictions of hosts 839, 109, and 1162, even though it uses only the efficient IMF components. However, the opposite holds for hosts 22, 424, 1060, and 237. The original CPU utilization sequences of all physical hosts have the same sampling frequency, so we calculate the RT value of each CPU utilization sequence, as shown in Table 3. Hosts 839, 109, and 1162 have low RT values (10 or less), which shows that their CPU utilization is more stationary than that of the other hosts; smaller RT values indicate more stationary host utilization sequences. This phenomenon can also be seen in Figures 6(a)–6(c). Tables 2 and 3 show that the EEMD-RT-ARIMA method achieves a lower MAPE value than the EEMD-ARIMA method when the RT value is smaller. Conversely, the EEMD-RT-ARIMA method has a higher MAPE value than the EEMD-ARIMA method when the

(1) For each new component:
(2)   Set the order of difference d = 0.
(3)   Execute the augmented Dickey–Fuller (ADF) test. If the series is stationary, go to step (5); otherwise, go to step (4).
(4)   Difference the time series, set d = d + 1, and go to step (3).
(5)   Determine the order of the ARIMA model using the Bayesian information criterion (BIC).
(6)   Estimate the parameters of the ARIMA model using maximum likelihood.
(7)   Forecast the future n values of this new component using the ARIMA model.
(8) End.
(9) Obtain the overall prediction results by superposing the prediction results of all new components.

Algorithm 1: ARIMA prediction.

Figure 6: CPU utilization traces of 7 physical hosts (y-axis: CPU utilization (%), 0–60): (a) Host 839; (b) Host 109; (c) Host 1162; (d) Host 22; (e) Host 424; (f) Host 1060; (g) Host 237.


RT value is larger. For instance, hosts 839, 109, and 1162, which have smaller RT values, obtain lower MAPE values with the EEMD-RT-ARIMA method than with the EEMD-ARIMA method. Hosts 22, 424, 1060, and 237, which have larger RT values, obtain higher MAPE values with the EEMD-RT-ARIMA method than with the EEMD-ARIMA method.
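The relationship between RT value and relative accuracy can be tabulated from the 12-point values of Tables 2 and 3. This Python sketch only recomputes differences of the published numbers. The data layout and the helper names are illustrative.

```python
# 12-point MAPE values (%) from Table 2 and RT values from Table 3.
RT = {839: 2, 109: 6, 1162: 10, 22: 16, 424: 22, 1060: 24, 237: 38}
MAPE_12 = {  # host: (EEMD-ARIMA, EEMD-RT-ARIMA)
    839: (10.48, 5.03),
    109: (10.13, 5.46),
    1162: (13.14, 8.95),
    22: (5.33, 5.37),
    424: (16.59, 17.46),
    1060: (17.44, 19.40),
    237: (11.30, 15.93),
}


def advantage(host):
    """EEMD-ARIMA MAPE minus EEMD-RT-ARIMA MAPE, in percentage points.

    Positive values mean EEMD-RT-ARIMA is the more accurate method.
    """
    eemd, eemd_rt = MAPE_12[host]
    return round(eemd - eemd_rt, 2)


def crossover_rt():
    """Smallest RT value at which EEMD-RT-ARIMA stops being more accurate."""
    return min(RT[h] for h in RT if advantage(h) < 0)


for host in sorted(RT, key=RT.get):
    print(f"host {host:>4}  RT={RT[host]:>2}  advantage={advantage(host):+.2f}")
print("crossover at RT =", crossover_rt())
```

The sign of the advantage flips between host 1162 (RT = 10) and host 22 (RT = 16), which matches the pattern described in the text.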

Furthermore, the MAPE advantage of the EEMD-RT-ARIMA method over the EEMD-ARIMA method shrinks as the RT value increases from host 839 to host 1162. The difference turns negative from host 22 onward, which indicates that the EEMD-ARIMA method has higher prediction accuracy than the EEMD-RT-ARIMA method for those hosts. The gap then grows as the RT values increase.

The 12-point host CPU utilization prediction illustrates this situation:
- Host 839 (RT value 2) has a MAPE of 5.03% with the EEMD-RT-ARIMA method, which is 5.45 percentage points lower than the 10.48% of the EEMD-ARIMA method.
- For host 1162 (RT value 10), the MAPE of the EEMD-RT-ARIMA method is only 4.19 percentage points lower than that of the EEMD-ARIMA method.
- For host 22 (RT value 16), the EEMD-RT-ARIMA method has a slightly higher MAPE (5.37%) than the EEMD-ARIMA method (5.33%).
- As the RT value increases further, the MAPE of the EEMD-RT-ARIMA method exceeds that of the EEMD-ARIMA method by 0.87, 1.96, and 4.63 percentage points for hosts 424, 1060, and 237, respectively.

This indicates that the EEMD-RT-ARIMA method is less effective than the EEMD-ARIMA method in CPU utilization prediction for these hosts.

Inevitably, the ARIMA prediction of each component decomposed by the EEMD method introduces some error, and superposing the prediction results of all components accumulates these errors. The EEMD-RT-ARIMA method reduces this error accumulation by selecting the efficient IMF components and reconstructing them into fewer components. As a result, it achieves better prediction accuracy than the EEMD-ARIMA method for hosts 839, 109, and 1162. However, selecting and reconstructing the efficient IMF components also introduces some prediction error, because the nonefficient components are discarded. This matters most for nonstationary host utilization sequences. When this error exceeds the accumulated error of the ARIMA predictions of all components in the EEMD-ARIMA method, the EEMD-RT-ARIMA method is no more effective than the EEMD-ARIMA method. This is the case for the nonstationary CPU utilization of hosts 22, 424, 1060, and 237.
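Algorithm 1 and the superposition step can be sketched in dependency-free Python to show where per-component errors enter the overall forecast. This is a simplified stand-in under stated assumptions, not the authors' implementation:
- The ADF test is replaced with the plain Dickey–Fuller regression (constant, no lagged differences), compared against the asymptotic 5% critical value of −2.86.
- Maximum-likelihood ARIMA estimation is replaced with a least-squares AR(p) fit on the differenced series, that is, an ARIMA(p, d, 0) model with p chosen by BIC.
- The caps max_d = 2 and max_p = 3, the synthetic example components, and all function names are assumptions.

```python
import math
import random


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * t for r, t in zip(X, y)) for i in range(k)]
    for c in range(k):  # Gaussian elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        b[c], b[p] = b[p], b[c]
        for r in range(c + 1, k):
            f = A[r][c] / A[c][c]
            A[r] = [a - f * ac for a, ac in zip(A[r], A[c])]
            b[r] -= f * b[c]
    beta = [0.0] * k
    for c in reversed(range(k)):  # back substitution
        beta[c] = (b[c] - sum(A[c][j] * beta[j] for j in range(c + 1, k))) / A[c][c]
    return beta


def df_stat(y):
    """Dickey-Fuller t-statistic: regress dy_t on [1, y_{t-1}] (no lagged differences)."""
    x, dy = y[:-1], [b - a for a, b in zip(y[:-1], y[1:])]
    n = len(dy)
    mx, md = sum(x) / n, sum(dy) / n
    sxx = sum((v - mx) ** 2 for v in x)
    gamma = sum((v - mx) * (w - md) for v, w in zip(x, dy)) / sxx
    resid = [w - md - gamma * (v - mx) for v, w in zip(x, dy)]
    s2 = sum(e * e for e in resid) / (n - 2)
    return gamma / math.sqrt(s2 / sxx)


def fit_ar_bic(x, max_p):
    """Fit AR(p) with intercept for p = 1..max_p on a common sample; keep the lowest BIC."""
    target = x[max_p:]
    m = len(target)
    best = None
    for p in range(1, max_p + 1):
        rows = [[1.0] + [x[t - i] for i in range(1, p + 1)] for t in range(max_p, len(x))]
        beta = ols(rows, target)
        rss = sum((y - sum(c * v for c, v in zip(beta, row))) ** 2 for row, y in zip(rows, target))
        bic = m * math.log(rss / m) + (p + 1) * math.log(m)
        if best is None or bic < best[0]:
            best = (bic, p, beta)
    return best[1], best[2]


def arima_forecast(series, n, max_d=2, max_p=3, crit=-2.86):
    """Steps (2)-(7) of Algorithm 1 for one component; returns (d, forecasts)."""
    levels = [list(series)]  # levels[d] is the series differenced d times
    while len(levels) - 1 < max_d and df_stat(levels[-1]) > crit:
        cur = levels[-1]
        levels.append([b - a for a, b in zip(cur[:-1], cur[1:])])  # d = d + 1
    p, beta = fit_ar_bic(levels[-1], max_p)
    hist = list(levels[-1])
    for _ in range(n):  # recursive multi-step forecast on the differenced scale
        hist.append(beta[0] + sum(beta[i] * hist[-i] for i in range(1, p + 1)))
    fc = hist[len(levels[-1]):]
    for lvl in reversed(levels[:-1]):  # undo each difference with a cumulative sum
        last, out = lvl[-1], []
        for v in fc:
            last += v
            out.append(last)
        fc = out
    return len(levels) - 1, fc


def superpose(components, n):
    """Step (9): the overall forecast is the sum of the component forecasts."""
    forecasts = [arima_forecast(c, n)[1] for c in components]
    return [sum(step) for step in zip(*forecasts)]


rng = random.Random(0)
walk = [0.0]
for _ in range(99):
    walk.append(walk[-1] + rng.gauss(0, 1))  # random walk: usually needs d >= 1
noise = [rng.gauss(0, 1) for _ in range(100)]  # white noise: usually d = 0
print(superpose([walk, noise], n=6))
```

Because `superpose` simply adds the component forecasts, every component's forecast error enters the total additively. Fewer components therefore means fewer error terms. In practice, a library such as statsmodels (`adfuller`, `ARIMA`) would be the more natural choice than this hand-rolled version.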

6.2. Effectiveness Analysis. To verify the effectiveness of our method in short-term prediction, we select for further analysis the experimental results of hosts 839, 22, and 237, which have the minimum, middle, and maximum RT values, respectively. Figure 7 shows the prediction results of the EEMD-RT-ARIMA, ARIMA, and EEMD-ARIMA methods. The future resource utilization of host 839 drops below 11%. According to a predefined policy, host 839 is therefore underloaded and can be shut down to save energy. Figure 7 shows that our method is more accurate and effective than the ARIMA model. In particular, our method follows the trend of the data, while the ARIMA model cannot keep up with it, which makes our method more suitable for handling nonstationary time series. Additionally, for host 839, the values predicted by the EEMD-RT-ARIMA method are closer to the testing data than those of the EEMD-ARIMA method. These

Table 2: MAPE values (%) of host utilization prediction.

Host ID   Prediction length   ARIMA    EEMD-ARIMA   EEMD-RT-ARIMA
839       6-point             9.77     4.73         4.61
          12-point            13.78    10.48        5.03
          24-point            23.10    21.53        8.82
109       6-point             16.85    6.06         5.05
          12-point            11.08    10.13        5.46
          24-point            41.34    24.68        8.36
1162      6-point             24.45    7.76         6.45
          12-point            17.81    13.14        8.95
          24-point            21.93    41.41        33.45
22        6-point             10.66    5.31         5.42
          12-point            8.22     5.33         5.37
          24-point            11.41    18.44        14.77
424       6-point             37.54    16.12        17.65
          12-point            21.74    16.59        17.46
          24-point            88.68    77.39        11.57
1060      6-point             35.71    10.05        13.94
          12-point            37.60    17.44        19.40
          24-point            74.72    54.82        91.93
237       6-point             22.29    7.85         11.78
          12-point            25.51    11.30        15.93
          24-point            126.11   158.43       189.51

Table 3: RT values of each host utilization sequence.

Host ID    839   109   1162   22   424   1060   237
RT value   2     6     10     16   22    24     38


results show that the EEMD-RT-ARIMA method is more effective than the EEMD-ARIMA method for CPU utilization sequences with weak fluctuations. When the host utilization sequence fluctuates more strongly, the absence of the nonefficient IMF components greatly affects the prediction results. In that case, as for host 237, the EEMD-RT-ARIMA method is no more effective than the EEMD-ARIMA method.
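The underload decision for host 839 amounts to a threshold rule applied to the predicted values. The policy is not specified in this section. The 20% and 80% thresholds below, and the choice to classify on the mean of the forecast window, are therefore hypothetical.

```python
from statistics import mean

# Hypothetical policy thresholds (percent CPU utilization); not given in this section.
UNDERLOAD_THRESHOLD = 20.0
OVERLOAD_THRESHOLD = 80.0


def classify_host(predicted):
    """Label a host from the mean of its predicted CPU utilization (%) over the forecast window."""
    level = mean(predicted)
    if level < UNDERLOAD_THRESHOLD:
        return "underloaded"  # candidate for migrating its VMs away and powering off
    if level > OVERLOAD_THRESHOLD:
        return "overloaded"  # candidate for migrating some VMs away to protect QoS
    return "normal"


# Illustrative forecast values below 11%, not taken from Figure 7.
print(classify_host([10.8, 10.5, 10.1, 9.9, 9.7, 9.6]))  # underloaded
```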

To further analyze the effectiveness of our method, we calculate the positive and negative errors for the 6-point and 12-point predictions of these hosts, as shown in Table 4. A smaller negative error makes a prediction method more suitable for cloud resource provisioning because it avoids underprediction. Most of the prediction results of the ARIMA model are underpredictions: the positive-error cells are all "Null" for hosts 839 and 22. Furthermore, for host 237 the negative errors of the ARIMA model are all far higher than those of the other methods. For instance, the ARIMA model has a negative error of up to 27.51% for the 12-point prediction of host 237, while the EEMD-ARIMA and EEMD-RT-ARIMA methods have negative errors of only 8.00% and 8.92%, respectively. If the ARIMA model is used to predict future host utilization, it can cause resource underprovisioning, which cannot ensure applications' QoS.

The EEMD-RT-ARIMA method achieves smaller negative errors than the EEMD-ARIMA method for hosts 839 and 22, but a larger negative error for host 237. For instance, for the 12-point prediction of host 839, the negative error is 10.71% for the EEMD-ARIMA method but only 4.74% for EEMD-RT-ARIMA. Similarly, for the 12-point prediction of host 22, the negative error is 6.09% for the EEMD-ARIMA method but only 5.05% for the EEMD-RT-ARIMA method. However, the situation changes

Figure 7: Prediction results of different methods. Each panel compares the ARIMA, EEMD-ARIMA, and EEMD-RT-ARIMA predictions with the testing data (y-axis: CPU utilization (%)): (a) host 839, 6-point prediction; (b) host 839, 12-point prediction; (c) host 22, 6-point prediction; (d) host 22, 12-point prediction; (e) host 237, 6-point prediction; (f) host 237, 12-point prediction.


for host 237, for which the EEMD-ARIMA method has a smaller negative error than the EEMD-RT-ARIMA method. For instance, the EEMD-RT-ARIMA method obtains negative errors of 11.37% and 8.92% for the 6-point and 12-point predictions, while the EEMD-ARIMA method has negative errors of only 4.62% and 8.00%.
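The positive and negative errors in Table 4 split the prediction error by direction. This section does not restate their formulas, so the sketch below assumes one plausible reading:
- The positive error is the mean relative error, in percent, over over-predicted points (prediction above the actual value).
- The negative error is the same quantity over under-predicted points.
- A side with no such points is reported as None, mirroring the "Null" cells of Table 4.

The function name is illustrative.

```python
def directional_errors(actual, predicted):
    """Return (positive_error, negative_error) in percent, or None for an empty side.

    Assumed definitions, not confirmed by the paper:
      positive error: mean of (p - a) / a over over-predicted points (p > a)
      negative error: mean of (a - p) / a over under-predicted points (p < a)
    Points with p == a count toward neither side.
    """
    over = [(p - a) / a for a, p in zip(actual, predicted) if p > a]
    under = [(a - p) / a for a, p in zip(actual, predicted) if p < a]
    positive = 100.0 * sum(over) / len(over) if over else None
    negative = 100.0 * sum(under) / len(under) if under else None
    return positive, negative


# Illustrative values only: every prediction is below the actual value, so,
# as for ARIMA on hosts 839 and 22 in Table 4, the positive error is None.
print(directional_errors([10.0, 10.0, 20.0], [9.0, 8.0, 18.0]))
```

Under this reading, a method that never over-predicts has a negative error equal to its MAPE. This is consistent with the ARIMA rows for host 839 in Tables 2 and 4 (9.77% and 13.78%).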

6.3. Time-Cost Analysis. To verify the applicability of our method, we further compare the time costs of these methods in Figure 8. The EEMD-ARIMA method has the longest running time, at over 180 s, while the ARIMA model takes the least time, at under 50 s. The time cost of the EEMD-RT-ARIMA method lies between roughly 70 s and 117 s, which is 40–80% less than that of the EEMD-ARIMA method. For example, for the 6-point prediction of host 22, the EEMD-RT-ARIMA method runs in 69.37 s, far less than the 337.20 s of the EEMD-ARIMA method, so our method saves up to 80% of the time cost. The CPU utilization sequence of host 237 has strong variability. For this host, the EEMD-ARIMA method requires 190.46 s to predict the future 6-point values, while the EEMD-RT-ARIMA method takes only 113.64 s, a reduction of approximately 40%. Considering prediction accuracy, effectiveness, and time cost, our EEMD-RT-ARIMA method is more cost-effective for short-term host utilization prediction in cloud computing.
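The 40–80% range can be checked from the running times quoted in this subsection: 69.37 s versus 337.20 s for host 22, and 113.64 s versus 190.46 s for host 237. Below is a quick Python check; the helper name is illustrative.

```python
def time_saving(baseline_s, method_s):
    """Relative reduction of running time versus a baseline, in percent."""
    return 100.0 * (baseline_s - method_s) / baseline_s


# 6-point running times (s) reported in Section 6.3: (EEMD-ARIMA, EEMD-RT-ARIMA).
cases = {"host 22": (337.20, 69.37), "host 237": (190.46, 113.64)}
for host, (eemd_arima, eemd_rt_arima) in cases.items():
    print(f"{host}: {time_saving(eemd_arima, eemd_rt_arima):.1f}% less time")
# host 22: 79.4% less time
# host 237: 40.3% less time
```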

7. Conclusions

Host utilization is an indicator of host performance, and predicting it can promote effective resource scheduling in cloud computing. However, host utilization exhibits strong randomness and instability caused by users' random and varied resource demands, which makes it difficult to improve prediction accuracy. In this paper, we propose a hybrid and cost-effective method, EEMD-RT-ARIMA, for short-term host utilization prediction in cloud computing.

The EEMD method is first used to decompose the nonstationary host utilization sequence into a few relatively stationary IMF components and a residual (R) component. Then, we calculate the correlation coefficient between each IMF component and the original data to select the efficient IMF components. We use RT values and average periods to reconstruct these components into three new components, which reduces error accumulation and time cost. Finally, the three new components are predicted by the ARIMA model, and their prediction results are superposed to form the overall prediction results.

We use real host utilization traces from a cloud platform to conduct the experiments. We compare our EEMD-RT-ARIMA method with the ARIMA model and the EEMD-ARIMA method in terms of error, effectiveness, and time cost. The results show that our method is cost-effective and more suitable for short-term host utilization prediction in cloud computing.
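The selection and reconstruction step summarized above can be sketched as follows. This is an illustration under stated assumptions, not the paper's exact procedure:
- An IMF counts as efficient when the absolute value of its Pearson correlation with the original series is at least a hypothetical threshold of 0.1.
- The RT value of a component is taken to be its number of runs above and below its mean.
- Components are grouped into high-, medium-, and low-frequency new components by hypothetical run-count cut-offs.
- The residual is folded into the low-frequency component, and the paper's use of average periods is not modeled.

EEMD itself is not implemented. Synthetic sinusoids stand in for its IMFs, and all names are illustrative.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def runs_count(x):
    """Number of runs of consecutive values on the same side of the mean."""
    m = mean(x)
    above = [v >= m for v in x]
    return 1 + sum(1 for a, b in zip(above, above[1:]) if a != b)


def reconstruct(original, imfs, residual, corr_min=0.1, high_frac=0.3, low_frac=0.1):
    """Keep IMFs with |corr| >= corr_min and sum them into (high, medium, low) components.

    All thresholds are hypothetical; the residual is folded into the low-frequency component.
    """
    n = len(original)
    high, medium, low = [0.0] * n, [0.0] * n, list(residual)
    for imf in imfs:
        if abs(pearson(imf, original)) < corr_min:
            continue  # non-efficient IMF: left out of the reconstruction
        runs = runs_count(imf)
        if runs >= high_frac * n:
            target = high
        elif runs <= low_frac * n:
            target = low
        else:
            target = medium
        for i, v in enumerate(imf):
            target[i] += v
    return high, medium, low


def wave(period, n=120, amp=1.0):
    """Synthetic sinusoid used as a stand-in for an IMF (not real trace data)."""
    return [amp * math.sin(2 * math.pi * (t + 0.5) / period) for t in range(n)]


fast, mid, slow, tiny = wave(4), wave(10), wave(60), wave(3, amp=0.05)
residual = [5.0] * 120
original = [sum(vals) for vals in zip(fast, mid, slow, tiny, residual)]
high, medium, low = reconstruct(original, [fast, mid, slow, tiny], residual)
print(runs_count(fast), runs_count(mid), runs_count(slow))  # 60 24 4
```

In this example the weakly correlated `tiny` component is dropped, and each of the three remaining groups would then be forecast as in Algorithm 1, with the forecasts summed.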

Data Availability

The running example and experimental data used to support the findings of this study have been deposited in the Figshare repository (https://doi.org/10.6084/m9.figshare.7679594).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the Shandong Provincial Natural Science Foundation (ZR2016FM41).

Table 4: Positive and negative error analysis (errors in %).

                              ARIMA                EEMD-ARIMA           EEMD-RT-ARIMA
Host ID   Prediction length   Positive  Negative   Positive  Negative   Positive  Negative
839       6-point             Null      9.77       3.16      5.85       4.92      4.15
          12-point            Null      13.78      3.43      10.71      4.80      4.74
22        6-point             Null      10.66      3.91      6.70       4.12      6.34
          12-point            Null      9.72       4.26      6.09       5.39      5.05
237       6-point             0.39      26.68      14.31     4.62       13.81     11.37
          12-point            0.39      27.51      17.44     8.00       17.90     8.92

Figure 8: Time cost of different prediction methods (ARIMA, EEMD-ARIMA, and EEMD-RT-ARIMA) for the 6-point and 12-point predictions of hosts 839, 22, and 237 (y-axis: time cost (s), 0–350).


References

[1] J. J. Prevost, K. Nagothu, B. Kelley, and M. Jamshidi, “Prediction of cloud data center networks loads using stochastic and neural models,” in Proceedings of 2011 6th International Conference on System of Systems Engineering, pp. 276–281, IEEE, Albuquerque, NM, USA, June 2011.

[2] M. Borkowski, S. Schulte, and C. Hochreiner, “Predicting cloud resource utilization,” in Proceedings of 9th International Conference on Utility and Cloud Computing (UCC), pp. 37–42, IEEE, Shanghai, China, December 2016.

[3] M. Barati and S. Sharifian, “A hybrid heuristic-based tuned support vector regression model for cloud load prediction,” Journal of Supercomputing, vol. 71, no. 11, pp. 4235–4259, 2015.

[4] Z. Chen, Y. Zhu, Y. Di, S. Feng, and J. Geng, “A high-accuracy self-adaptive resource demands predicting method in IaaS cloud environment,” Neural Network World, vol. 25, no. 5, pp. 519–540, 2015.

[5] J. Chen and Y. Wang, “A resource demand prediction method based on EEMD in cloud computing,” Procedia Computer Science, vol. 131, pp. 116–123, 2018.

[6] H. Toumi, Z. Brahmi, Z. Benarfa, and M. M. Gammoudi, “Server load prediction using stream mining,” in Proceedings of 2017 International Conference on Information Networking (ICOIN), pp. 653–661, IEEE, Da Nang, Vietnam, January 2017.

[7] S. Di, D. Kondo, and W. Cirne, “Host load prediction in a Google compute cloud with a Bayesian model,” in Proceedings of 2012 International Conference on High Performance Computing, Networking, Storage and Analysis (SC’12), pp. 1–11, IEEE, Salt Lake City, UT, USA, November 2012.

[8] N. K. Gondhi and P. Kailu, “Prediction based energy efficient virtual machine consolidation in cloud computing,” in Proceedings of 2015 Second International Conference on Advances in Computing and Communication Engineering, pp. 437–441, IEEE, Dehradun, India, May 2015.

[9] A. Verma, G. Dasgupta, T. K. Nayak, P. De, and R. Kothari, “Server workload analysis for power minimization using consolidation,” in Proceedings of the 2009 Conference on USENIX Annual Technical Conference, p. 28, USENIX Association, San Diego, CA, USA, June 2009.

[10] B. Song, Y. Yu, Y. Zhou, Z. Wang, and S. Du, “Host load prediction with long short-term memory in cloud computing,” Journal of Supercomputing, vol. 74, no. 12, pp. 6554–6568, 2018.

[11] J.-J. Jheng, F.-H. Tseng, H.-C. Chao, and L.-D. Chou, “A novel VM workload prediction using grey forecasting model in cloud data center,” in Proceedings of International Conference on Information Networking 2014 (ICOIN2014), pp. 40–45, IEEE, Phuket, Thailand, February 2014.

[12] A. Beloglazov and R. Buyya, “Managing overloaded hosts for dynamic consolidation of virtual machines in cloud data centers under quality of service constraints,” IEEE Transactions on Parallel and Distributed Systems, vol. 24, no. 7, pp. 1366–1379, 2013.

[13] M. Dabbagh, B. Hamdaoui, M. Guizani, and A. Rayes, “An energy-efficient VM prediction and migration framework for overcommitted clouds,” IEEE Transactions on Cloud Computing, vol. 6, no. 4, pp. 955–966, 2018.

[14] D. Minarolli, A. Mazrekaj, and B. Freisleben, “Tackling uncertainty in long-term predictions for host overload and underload detection in cloud computing,” Journal of Cloud Computing, vol. 6, no. 1, p. 4, 2017.

[15] K. Mason, M. Duggan, E. Barrett, J. Duggan, and E. Howley, “Predicting host CPU utilization in the cloud using evolutionary neural networks,” Future Generation Computer Systems, vol. 86, pp. 162–173, 2018.

[16] D. Magalhães, R. N. Calheiros, R. Buyya, and D. G. Gomes, “Workload modeling for resource usage analysis and simulation in cloud computing,” Computers & Electrical Engineering, vol. 47, pp. 69–81, 2015.

[17] C. Tian, Y. Wang, Y. Luo et al., “Minimizing content reorganization and tolerating imperfect workload prediction for cloud-based video-on-demand services,” IEEE Transactions on Services Computing, vol. 9, no. 6, pp. 926–939, 2016.

[18] M. Verma, G. R. Gangadharan, V. Ravi, and N. Narendra, “Resource demand prediction in multi-tenant service clouds,” in Proceedings of 2013 IEEE International Conference on Cloud Computing in Emerging Markets (CCEM), pp. 1–8, IEEE, Bangalore, India, October 2013.

[19] W. Zhang, Y. Shi, L. Liu, L. Cui, and Y. Zheng, “Performance and resource prediction at high utilization for n-tier service systems in cloud: an experiment driven approach,” in Proceedings of 2015 IEEE International Conference on Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing, pp. 843–848, IEEE, Liverpool, UK, October 2015.

[20] G. Kecskemeti, A. Kertesz, and Z. Nemeth, “Cloud workload prediction by means of simulations,” in Proceedings of the Computing Frontiers Conference on ZZZ (CF’17), pp. 279–282, ACM, Siena, Italy, May 2017.

[21] Y. Chen and Z.-A. Jiang, “Dynamically predicting the quality of service: batch, online, and hybrid algorithms,” Journal of Electrical and Computer Engineering, vol. 2017, Article ID 9547869, 10 pages, 2017.

[22] A. Khan, X. Yan, S. Tao, and N. Anerousis, “Workload characterization and prediction in the cloud: a multiple time series approach,” in Proceedings of 2012 IEEE Network Operations and Management Symposium, pp. 1287–1294, IEEE, Maui, HI, USA, April 2012.

[23] A. K. Mishra, J. L. Hellerstein, W. Cirne, and C. R. Das, “Towards characterizing cloud backend workloads,” ACM SIGMETRICS Performance Evaluation Review, vol. 37, no. 4, pp. 34–41, 2010.

[24] D. Gmach, J. Rolia, L. Cherkasova, and A. Kemper, “Workload analysis and demand prediction of enterprise data center applications,” in Proceedings of 2007 IEEE 10th International Symposium on Workload Characterization, pp. 171–180, IEEE, Boston, MA, USA, September 2007.

[25] F.-H. Tseng, X. Wang, L.-D. Chou, H.-C. Chao, and V. C. M. Leung, “Dynamic resource prediction and allocation for cloud data center using the multiobjective genetic algorithm,” IEEE Systems Journal, vol. 12, no. 2, pp. 1688–1699, 2018.

[26] G. K. Shyam and S. S. Manvi, “Virtual resource prediction in cloud environment: a Bayesian approach,” Journal of Network and Computer Applications, vol. 65, pp. 144–154, 2016.

[27] Y. Lu, J. Panneerselvam, L. Liu, and Y. Wu, “RVLBPNN: a workload forecasting model for smart cloud computing,” Scientific Programming, vol. 2016, Article ID 5635673, 9 pages, 2016.

[28] K. Rajaram and M. P. Malarvizhi, “Utilization based prediction model for resource provisioning,” in Proceedings of 2017 International Conference on Computer, Communication and Signal Processing (ICCCSP), pp. 1–6, IEEE, Chennai, India, January 2017.


[29] L. Li and A. Zhang, “Resource demand optimization combined prediction under cloud computing environment based on IOWGA operator,” International Journal of Grid and Distributed Computing, vol. 8, no. 3, pp. 77–86, 2015.

[30] D. Minarolli and B. Freisleben, “Cross-correlation prediction of resource demand for virtual machine resource allocation in clouds,” in Proceedings of 2014 Sixth International Conference on Computational Intelligence, Communication Systems and Networks, pp. 119–124, IEEE, Tetova, Macedonia, May 2014.

[31] W. Zhang, P. Duan, L. T. Yang et al., “Resource requests prediction in the cloud computing environment with a deep belief network,” Software: Practice and Experience, vol. 47, no. 3, pp. 473–488, 2017.

[32] H.-B. Mi, H.-M. Wang, G. Yin, D.-X. Shi, Y.-F. Zhou, and L. Yuan, “Resource on-demand reconfiguration method for virtualized data centers,” Journal of Software, vol. 22, no. 9, pp. 2193–2205, 2011.

[33] R. N. Calheiros, E. Masoumi, R. Ranjan, and R. Buyya, “Workload prediction using ARIMA model and its impact on cloud applications’ QoS,” IEEE Transactions on Cloud Computing, vol. 3, no. 4, pp. 449–458, 2015.

[34] Y. Meng, R. Rao, X. Zhang, and P. Hong, “CRUPA: a container resource utilization prediction algorithm for auto-scaling based on time series analysis,” in Proceedings of 2016 International Conference on Progress in Informatics and Computing (PIC), pp. 468–472, IEEE, Shanghai, China, December 2016.

[35] E. Dhib, N. Zangar, N. Tabbane, and K. Boussetta, “Impact of seasonal ARIMA workload prediction model on QoE for massively multiplayers online gaming,” in Proceedings of 2016 5th International Conference on Multimedia Computing and Systems (ICMCS), pp. 737–741, IEEE, Marrakech, Morocco, September 2016.

[36] A. Ganapathi, Y. Chen, A. Fox, R. Katz, and D. Patterson, “Statistics-driven workload modeling for the cloud,” in Proceedings of 2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010), pp. 87–92, IEEE, Long Beach, CA, USA, March 2010.

[37] V. G. Tran, V. Debusschere, and S. Bacha, “Hourly server workload forecasting up to 168 hours ahead using seasonal ARIMA model,” in Proceedings of 2012 IEEE International Conference on Industrial Technology, pp. 1127–1131, IEEE, Athens, Greece, March 2012.

[38] D. Xu, S. Yang, and H. Luo, “Research on generalized fuzzy soft sets theory based combined model for demanded cloud computing resource prediction,” Chinese Journal of Management Science, vol. 23, no. 5, pp. 56–64, 2015.

[39] S. Li, Y. Wang, X. Qiu, D. Wang, and L. Wang, “A workload prediction-based multi-VM provisioning mechanism in cloud computing,” in Proceedings of 2013 15th Asia-Pacific Network Operations and Management Symposium (APNOMS), pp. 1–6, IEEE, Hiroshima, Japan, September 2013.

[40] X. Fu and C. Zhou, “Predicted affinity based virtual machine placement in cloud computing environments,” IEEE Transactions on Cloud Computing, vol. 99, p. 1, 2017.

[41] Y. Jiang, C. Perng, T. Li, and R. Chang, “ASAP: a self-adaptive prediction system for instant cloud resource demand provisioning,” in Proceedings of 2011 IEEE 11th International Conference on Data Mining, pp. 1104–1109, IEEE, Vancouver, BC, Canada, December 2011.

[42] Z. Wu and N. E. Huang, “Ensemble empirical mode decomposition: a noise-assisted data analysis method,” Advances in Adaptive Data Analysis, vol. 1, no. 1, pp. 1–41, 2009.

[43] H. Zang, L. Fan, M. Guo, Z. Wei, G. Sun, and L. Zhang, “Short-term wind power interval forecasting based on an EEMD-RT-RVM model,” Advances in Meteorology, vol. 2016, Article ID 8760780, 10 pages, 2016.

[44] N. Safari, C. Y. Chung, and G. C. D. Price, “Novel multi-step short-term wind power prediction framework based on chaotic time series analysis and singular spectrum analysis,” IEEE Transactions on Power Systems, vol. 33, no. 1, pp. 590–601, 2018.

[45] X. Chen, H. Wang, J. Huang, and H. Ren, “APU degradation prediction based on EEMD and Gaussian process regression,” in Proceedings of 2017 International Conference on Sensing, Diagnostics, Prognostics, and Control (SDPC), pp. 98–104, IEEE, Shanghai, China, August 2017.

[46] C. Yan, C. Yi, X. Wu et al., “Turbine fault trend prediction that based on EEMD and ARIMA models,” Journal of Gansu Sciences, vol. 28, no. 4, pp. 100–106, 2016.

[47] M. Jin, P. Li, L. Zhang et al., “A signal feature method and its application based on EEMD, fuzzy entropy and GK clustering,” ACTA Metrologica Sinica, vol. 26, no. 5, pp. 501–505, 2015.

[48] E. H. Norden, Z. Shen, R. L. Steven et al., “The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis,” in Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, vol. 454, no. 1971, pp. 903–995, Royal Society, London, UK, March 1998.

[49] Alibaba, cluster-trace-v2017, https://github.com/alibaba/clusterdata.

14 Journal of Electrical and Computer Engineering


RT value is larger. For instance, in the 6-point and 12-point predictions, hosts 839, 109, and 1162, which have smaller RT values, obtain lower MAPE values with the EEMD-RT-ARIMA method than with the EEMD-ARIMA method, while hosts 22, 424, 1060, and 237, which have larger RT values, obtain higher MAPE values with the EEMD-RT-ARIMA method than with the EEMD-ARIMA method.

Furthermore, the 12-point host CPU utilization prediction shows that the MAPE advantage of EEMD-RT-ARIMA over EEMD-ARIMA shrinks as the RT value increases from host 839 to host 1162. The difference turns negative at host 22, which indicates that the EEMD-ARIMA method is more accurate than the EEMD-RT-ARIMA method from that host on, and it then widens as the RT values increase further. For example, host 839, with an RT value of 2, has a 12-point MAPE of 5.03% with the EEMD-RT-ARIMA method, 5.45 percentage points lower than the 10.48% of the EEMD-ARIMA method. For host 1162, with an RT value of 10, the MAPE of the EEMD-RT-ARIMA method is only 4.19 percentage points lower than that of the EEMD-ARIMA method. For host 22, with an RT value of 16, the EEMD-RT-ARIMA method has a slightly higher MAPE (5.37%) than the EEMD-ARIMA method (5.33%). As the RT value increases further, the difference in favor of EEMD-ARIMA grows to 0.87, 1.96, and 4.63 percentage points for hosts 424, 1060, and 237, respectively. This indicates that the EEMD-RT-ARIMA method is less effective than the EEMD-ARIMA method in CPU utilization prediction for these hosts.

Inevitably, the ARIMA prediction of each component decomposed by the EEMD method introduces some error, and superposing the prediction results of all components accumulates these errors. The EEMD-RT-ARIMA method reduces this error accumulation by selecting the efficient IMF components and reconstructing them into fewer components, so it achieves better prediction accuracy than the EEMD-ARIMA method for hosts 839, 109, and 1162. However, selecting and reconstructing only the efficient IMF components also introduces a prediction error of its own, because the nonefficient components are discarded; this effect is strongest for nonstationary host utilization sequences. When this error exceeds the accumulated ARIMA error over all components in the EEMD-ARIMA method, the EEMD-RT-ARIMA method is no more effective than the EEMD-ARIMA method, as observed for the nonstationary CPU utilization sequences of hosts 22, 424, 1060, and 237.
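To make the 12-point comparison above easy to check, the Python sketch below recomputes the MAPE differences from the values in Table 2 and orders the hosts by their RT values from Table 3. It introduces no new data; the names `TWELVE_POINT` and `mape_gap_by_rt` are illustrative only.

```python
# 12-point MAPE values (%) from Table 2 and RT values from Table 3.
# host_id -> (rt_value, eemd_arima_mape, eemd_rt_arima_mape)
TWELVE_POINT = {
    839: (2, 10.48, 5.03),
    109: (6, 10.13, 5.46),
    1162: (10, 13.14, 8.95),
    22: (16, 5.33, 5.37),
    424: (22, 16.59, 17.46),
    1060: (24, 17.44, 19.40),
    237: (38, 11.30, 15.93),
}


def mape_gap_by_rt(table):
    """Return (host, rt, gap) tuples sorted by RT value.

    gap = EEMD-ARIMA MAPE - EEMD-RT-ARIMA MAPE, in percentage points,
    rounded to two decimals; a positive gap means EEMD-RT-ARIMA is
    more accurate for that host.
    """
    rows = [(host, rt, round(eemd - eemd_rt, 2))
            for host, (rt, eemd, eemd_rt) in table.items()]
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    for host, rt, gap in mape_gap_by_rt(TWELVE_POINT):
        better = "EEMD-RT-ARIMA" if gap > 0 else "EEMD-ARIMA"
        print(f"host {host:>4}  RT={rt:>2}  gap={gap:+.2f} pp  better: {better}")
```

Sorting by RT value makes the pattern visible at a glance: the gap is positive and shrinking for hosts 839, 109, and 1162, and negative and growing in magnitude from host 22 onward.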

6.2. Effectiveness Analysis. To verify the effectiveness of our method in short-term prediction, we select the experimental results of hosts 839, 22, and 237, which have the minimum, middle, and maximum RT values, respectively, for further analysis. Figure 7 shows the prediction results of the EEMD-RT-ARIMA, ARIMA, and EEMD-ARIMA methods. The future resource utilization of host 839 decreases below 11%; according to a predefined policy, host 839 is therefore underloaded and can be shut down to save energy. Figure 7 shows that our method is more accurate and effective than the ARIMA model. In particular, our method follows the trend of the data, while the ARIMA model cannot keep up with it, so our method is better suited to nonstationary time series than the ARIMA model. Additionally, for host 839, the data predicted by the EEMD-RT-ARIMA method are closer to the testing data than those predicted by the EEMD-ARIMA method. These results show that the EEMD-RT-ARIMA method is more effective than the EEMD-ARIMA method for CPU utilization sequences with weak fluctuations. When the host utilization sequence fluctuates more strongly, the absence of the nonefficient IMF components greatly influences the prediction results, and the EEMD-RT-ARIMA method is no more effective than the EEMD-ARIMA method for the CPU utilization prediction of host 237.

Table 2: MAPE values of host utilization prediction.

Host ID  Prediction length  ARIMA (%)  EEMD-ARIMA (%)  EEMD-RT-ARIMA (%)
839      6-point                 9.77            4.73               4.61
839      12-point               13.78           10.48               5.03
839      24-point               23.10           21.53               8.82
109      6-point                16.85            6.06               5.05
109      12-point               11.08           10.13               5.46
109      24-point               41.34           24.68               8.36
1162     6-point                24.45            7.76               6.45
1162     12-point               17.81           13.14               8.95
1162     24-point               21.93           41.41              33.45
22       6-point                10.66            5.31               5.42
22       12-point                8.22            5.33               5.37
22       24-point               11.41           18.44              14.77
424      6-point                37.54           16.12              17.65
424      12-point               21.74           16.59              17.46
424      24-point               88.68           77.39              11.57
1060     6-point                35.71           10.05              13.94
1060     12-point               37.60           17.44              19.40
1060     24-point               74.72           54.82              91.93
237      6-point                22.29            7.85              11.78
237      12-point               25.51           11.30              15.93
237      24-point              126.11          158.43             189.51

Table 3: RT values of each host utilization sequence.

Host ID   839   109   1162   22   424   1060   237
RT value    2     6     10   16    22     24    38
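Table 3 characterizes each host sequence by an RT (runs test) value. This section does not restate how that value is computed, so the Python sketch below assumes one common convention: label each point as at-or-above or below the sequence mean and count the maximal runs of equal labels. It illustrates the idea only and is not claimed to reproduce the numbers in Table 3; `runs_count` is a hypothetical helper.

```python
from statistics import mean
from typing import Sequence


def runs_count(series: Sequence[float]) -> int:
    """Count runs in a sequence dichotomized about its mean (assumed convention).

    Each value is labeled True if it is >= the mean, False otherwise;
    a run is a maximal block of consecutive equal labels.
    """
    if not series:
        return 0
    center = mean(series)
    above = [value >= center for value in series]
    # Every label change between neighbouring points starts a new run.
    return 1 + sum(1 for prev, cur in zip(above, above[1:]) if prev != cur)


if __name__ == "__main__":
    print(runs_count([1, 2, 3, 10, 11, 12]))  # slow drift: 2 runs
    print(runs_count([1, 9, 1, 9, 1, 9]))     # fast oscillation: 6 runs
```

Under this convention a slowly varying sequence yields few runs and a rapidly fluctuating one yields many, which matches the sense in which hosts with larger RT values are described here as fluctuating more strongly.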

Figure 7: Prediction results of different methods. Panels: (a) host 839, 6-point prediction; (b) host 839, 12-point prediction; (c) host 22, 6-point prediction; (d) host 22, 12-point prediction; (e) host 237, 6-point prediction; (f) host 237, 12-point prediction. Each panel shows the CPU utilization (%) predicted by ARIMA, EEMD-ARIMA, and EEMD-RT-ARIMA together with the testing data.

To further analyze the effectiveness of our method, we calculate the positive and negative errors of the 6-point and 12-point predictions for these hosts, as shown in Table 4. The smaller the negative error, the more suitable the prediction method is for cloud resource provisioning, because underprediction is avoided. Most of the prediction results of the ARIMA model are underpredicted (the positive-error cells are all "Null" for hosts 839 and 22). Furthermore, for host 237, the negative errors of the ARIMA model are far higher than those of the other methods. For instance, the ARIMA model has a negative error of up to 27.51% for the 12-point prediction of host 237, while the EEMD-ARIMA and EEMD-RT-ARIMA methods have negative errors of only 8.00% and 8.92%, respectively. If the ARIMA model is used to predict future host utilization, it can cause resource underprovisioning, which cannot ensure applications' QoS. The EEMD-RT-ARIMA method achieves smaller negative errors than the EEMD-ARIMA method for hosts 839 and 22, but a larger negative error for host 237. For instance, the EEMD-ARIMA method has a negative error of 10.71% for the 12-point prediction of host 839, while the EEMD-RT-ARIMA method has only 4.74%. Similarly, the EEMD-ARIMA method obtains a negative error of 6.09% for the 12-point prediction of host 22, while the EEMD-RT-ARIMA method achieves a lower value of 5.05%. However, the situation changes for host 237, where the EEMD-ARIMA method has smaller negative errors than the EEMD-RT-ARIMA method: the EEMD-RT-ARIMA method obtains negative errors of 11.37% and 8.92% for the 6-point and 12-point predictions, while the EEMD-ARIMA method has only 4.62% and 8.00%.
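Table 4 splits the prediction error into a positive part (overprediction) and a negative part (underprediction). The exact formulas are not given in this section, so the Python sketch below assumes one plausible convention: a per-point percentage error relative to the actual value, averaged separately over overpredicted and underpredicted points, with `None` playing the role of "Null" when a side has no points. It may not reproduce the values in Table 4; `mape` follows the usual MAPE definition, and both function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error (%); assumes no actual value is zero."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("expected two non-empty sequences of equal length")
    total = sum(abs(p - a) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def signed_errors(actual: Sequence[float],
                  predicted: Sequence[float]) -> Tuple[Optional[float], Optional[float]]:
    """Return (positive_error, negative_error) in percent.

    ASSUMED convention, not the paper's stated formula: positive_error is the
    mean percentage error of overpredicted points, negative_error is the mean
    absolute percentage error of underpredicted points, and None marks a side
    with no points. Exact hits count toward neither side.
    """
    over, under = [], []
    for a, p in zip(actual, predicted):
        pct = 100.0 * (p - a) / abs(a)
        if pct > 0:
            over.append(pct)
        elif pct < 0:
            under.append(-pct)
    positive = sum(over) / len(over) if over else None
    negative = sum(under) / len(under) if under else None
    return positive, negative


if __name__ == "__main__":
    actual = [10.0, 10.0, 10.0, 10.0]
    predicted = [9.0, 8.0, 11.0, 10.0]
    print(signed_errors(actual, predicted))          # (10.0, 15.0)
    print(f"{mape(actual, predicted):.2f}")          # 10.00
    print(signed_errors([10.0, 10.0], [9.0, 9.0]))   # (None, 10.0): all underpredicted
```

The last call mirrors the ARIMA rows for hosts 839 and 22: when every point is underpredicted, only the negative side carries a value.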

6.3. Time-Cost Analysis. To verify the applicability of our method, we further compare the time costs of these methods in Figure 8. The EEMD-ARIMA method has the largest running time, at over 180 s, while the ARIMA model takes the least time, at less than 50 s. The time cost of the EEMD-RT-ARIMA method lies between 70 s and 117 s, which is 40% to 80% lower than that of the EEMD-ARIMA method. For example, for the 6-point prediction of host 22, the running time of the EEMD-RT-ARIMA method is 69.37 s, far less than the 337.20 s of the EEMD-ARIMA method, a saving of about 80%. For the CPU utilization sequence of host 237, which has strong variability, the EEMD-ARIMA method requires 190.46 s to predict the next 6 points, while the EEMD-RT-ARIMA method takes only 113.64 s, a reduction of approximately 40%. Considering prediction accuracy, effectiveness, and time cost, our EEMD-RT-ARIMA method is more cost-effective for short-term host utilization prediction in cloud computing.
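The percentage savings quoted in the time-cost analysis follow directly from the reported running times. The short Python check below, built around a hypothetical helper `time_saving`, recomputes them for the two 6-point examples in the text.

```python
def time_saving(baseline_s: float, method_s: float) -> float:
    """Percentage of the baseline running time saved by the faster method."""
    if baseline_s <= 0:
        raise ValueError("baseline running time must be positive")
    return 100.0 * (baseline_s - method_s) / baseline_s


if __name__ == "__main__":
    # 6-point running times (s): EEMD-ARIMA (baseline) vs EEMD-RT-ARIMA.
    print(f"host 22:  {time_saving(337.20, 69.37):.1f}% saved")   # 79.4% saved
    print(f"host 237: {time_saving(190.46, 113.64):.1f}% saved")  # 40.3% saved
```

The results, about 79.4% and 40.3%, are consistent with the "up to 80%" and "approximately 40%" figures given above.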

7. Conclusions

Host utilization is an indicator of host performance, and predicting it can promote effective resource scheduling in cloud computing. However, host utilization exhibits strong randomness and instability caused by users' random and varied resource demands, which makes it difficult to improve prediction accuracy. In this paper, we propose a hybrid and cost-effective method, EEMD-RT-ARIMA, for short-term host utilization prediction in cloud computing. The EEMD method is first used to decompose the nonstationary host utilization sequence into a few relatively stationary IMF components and a residual (R) component. Then, we calculate the correlation coefficient between each IMF component and the original data to select the efficient IMF components, and we use RT values and average periods to reconstruct these components into three new components, which reduces error accumulation and time cost. Finally, the three new components are predicted by the ARIMA model, and their prediction results are superposed to form the overall prediction. We use real host utilization traces from a cloud platform to conduct the experiments and compare our EEMD-RT-ARIMA method with the ARIMA model and the EEMD-ARIMA method in terms of error, effectiveness, and time cost. The results show that our method is cost-effective and more suitable for short-term host utilization prediction in cloud computing.
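The pipeline summarized in the conclusions (select, reconstruct, forecast, superpose) can be sketched in dependency-free Python, assuming the EEMD step has already produced the IMFs and the residual. Everything specific here is an assumption rather than the authors' implementation: the correlation threshold `min_corr`, the runs-based grouping fractions `high_frac` and `low_frac` (average periods are not modeled), adding the residual to the low-frequency component, and `naive_forecast`, a last-value forecaster standing in for ARIMA.

```python
from typing import Callable, List, Sequence

Series = List[float]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def runs_count(series: Sequence[float]) -> int:
    """Runs above/below the mean (assumed RT convention)."""
    if not series:
        return 0
    m = sum(series) / len(series)
    labels = [v >= m for v in series]
    return 1 + sum(1 for p, c in zip(labels, labels[1:]) if p != c)


def reconstruct(original: Sequence[float], imfs: Sequence[Sequence[float]],
                residual: Sequence[float], min_corr: float = 0.1,
                high_frac: float = 0.5, low_frac: float = 0.1) -> List[Series]:
    """Select efficient IMFs and merge them into [high, medium, low] components.

    HYPOTHETICAL rules: an IMF is efficient if |corr(IMF, original)| >= min_corr;
    it is high-frequency if its runs count >= high_frac * n, low-frequency if
    <= low_frac * n, otherwise medium. The residual joins the low component.
    """
    n = len(original)
    high, medium, low = [0.0] * n, [0.0] * n, [float(v) for v in residual]
    for imf in imfs:
        if abs(pearson(imf, original)) < min_corr:
            continue  # non-efficient IMF: discarded
        runs = runs_count(imf)
        if runs >= high_frac * n:
            target = high
        elif runs <= low_frac * n:
            target = low
        else:
            target = medium
        for i, v in enumerate(imf):
            target[i] += v
    return [high, medium, low]


def naive_forecast(series: Sequence[float], horizon: int) -> Series:
    """Stand-in for ARIMA: repeat the last observed value."""
    return [float(series[-1])] * horizon


def hybrid_forecast(original, imfs, residual, horizon: int,
                    forecaster: Callable[[Sequence[float], int], Series] = naive_forecast,
                    **rules) -> Series:
    """Forecast each reconstructed component and superpose the forecasts."""
    components = reconstruct(original, imfs, residual, **rules)
    forecasts = [forecaster(c, horizon) for c in components]
    return [sum(step) for step in zip(*forecasts)]


if __name__ == "__main__":
    imf1 = [1, -1, 1, -1, 1, -1, 1, -1]   # fast oscillation -> high component
    imf2 = [2, 2, 2, 2, -2, -2, -2, -2]   # slow swing -> medium component
    residual = [10] * 8
    original = [a + b + r for a, b, r in zip(imf1, imf2, residual)]
    print(hybrid_forecast(original, [imf1, imf2], residual, horizon=3))  # [7.0, 7.0, 7.0]
```

In practice, `forecaster` would wrap an ARIMA fit per component (for example with the `ARIMA` class in statsmodels), and the thresholds would be tuned on the traces; the toy example only checks that selection, grouping, and superposition fit together.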

Data Availability

The running example and experimental data used to support the findings of this study have been deposited in the Figshare repository (https://doi.org/10.6084/m9.figshare.7679594).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the Shandong Provincial Natural Science Foundation (ZR2016FM41).

Table 4: Positive and negative error analysis. Each cell gives positive error / negative error (%).

Host ID  Prediction length  ARIMA (pos./neg.)  EEMD-ARIMA (pos./neg.)  EEMD-RT-ARIMA (pos./neg.)
839      6-point            Null / 9.77        3.16 / 5.85             4.92 / 4.15
839      12-point           Null / 13.78       3.43 / 10.71            4.80 / 4.74
22       6-point            Null / 10.66       3.91 / 6.70             4.12 / 6.34
22       12-point           Null / 9.72        4.26 / 6.09             5.39 / 5.05
237      6-point            0.39 / 26.68       14.31 / 4.62            13.81 / 11.37
237      12-point           0.39 / 27.51       17.44 / 8.00            17.90 / 8.92

Figure 8: Time cost of different prediction methods. Time cost (s) of ARIMA, EEMD-ARIMA, and EEMD-RT-ARIMA for the 6-point and 12-point predictions of hosts 839, 22, and 237.

12 Journal of Electrical and Computer Engineering

References

[1] J J Prevost K Nagothu B Kelley and M Jamshidi ldquoPre-diction of cloud data center networks loads using stochasticand neural modelsrdquo in Proceedings of 2011 6th InternationalConference on System of Systems Engineering pp 276ndash281IEEE Albuquerque NM USA June 2011

[2] M Borkowski S Schulte and C Hochreiner ldquoPredictingcloud resource utilizationrdquo in Proceedings of 9th InternationalConference on Utility and Cloud Computing (UCC) pp 37ndash42IEEE Shanghai China December 2016

[3] M Barati and S Sharifian ldquoA hybrid heuristic-based tunedsupport vector regression model for cloud load predictionrdquoJournal of Supercomputing vol 71 no 11 pp 4235ndash42592015

[4] Z Chen Y Zhu Y Di S Feng and J Geng ldquoA high-accuracyself-adaptive resource demands predicting method in IaaScloud environmentrdquo Neural Network World vol 25 no 5pp 519ndash540 2015

[5] J Chen and Y Wang ldquoA resource demand prediction methodbased on EEMD in cloud computingrdquo Procedia ComputerScience vol 131 pp 116ndash123 2018

[6] H Toumi Z Brahmi Z Benarfa and M M GammoudildquoServer load prediction using stream miningrdquo in Proceedingsof 2017 International Conference on Information Networking(ICOIN) pp 653ndash661 IEEE Da Nang Vietnam January2017

[7] S Di D Kondo and W Cirne ldquoHost load prediction in aGoogle compute cloud with a Bayesian modelrdquo in Proceedingsof 2012 International Conference on High PerformanceComputing Networking Storage and Analysis (SCrsquo12)pp 1ndash11 IEEE Salt Lake City UT USA November 2012

[8] N K Gondhi and P Kailu ldquoPrediction based energy efficientvirtual machine consolidation in cloud computingrdquo in Pro-ceedings of 2015 Second International Conference on Advancesin Computing and Communication Engineering pp 437ndash441IEEE Dehradun India May 2015

[9] A Verma G Dasgupta T K Nayak P De and R KotharildquoServer workload analysis for power minimization usingconsolidationrdquo in Proceedings of the 2009 Conference onUSENIX Annual Technical Conference p 28 USENIX As-sociation San Diego CA USA June 2009

[10] B Song Y Yu Y Zhou Z Wang and S Du ldquoHost loadprediction with long short-term memory in cloud comput-ingrdquo Journal of Supercomputing vol 74 no 12 pp 6554ndash6568 2018

[11] J-J Jheng F-H Tseng H-C Chao and L-D Chou ldquoA novelVM workload prediction using grey forecasting model incloud data centerrdquo in Proceedings of International Conferenceon Information Networking 2014 (ICOIN2014) pp 40ndash45IEEE Phuket +ailand February 2014

[12] A Beloglazov and R Buyya ldquoManaging overloaded hosts fordynamic consolidation of virtual machines in cloud datacenters under quality of service constraintsrdquo IEEE Trans-actions on Parallel and Distributed Systems vol 24 no 7pp 1366ndash1379 2013

[13] M Dabbagh B Hamdaoui M Guizani and A Rayes ldquoAnenergy-efficient VM prediction and migration framework forovercommitted cloudsrdquo IEEE Transactions on Cloud Com-puting vol 6 no 4 pp 955ndash966 2018

[14] D Minarolli A Mazrekaj and B Freisleben ldquoTackling un-certainty in long-term predictions for host overload andunderload detection in cloud computingrdquo Journal of CloudComputing vol 6 no 1 p 4 2017

[15] K Mason M Duggan E Barrett J Duggan and E HowleyldquoPredicting host CPU utilization in the cloud using evolu-tionary neural networksrdquo Future Generation Computer Sys-tems vol 86 pp 162ndash173 2018

[16] D Magalhatildees R N Calheiros R Buyya and D G GomesldquoWorkload modeling for resource usage analysis and simu-lation in cloud computingrdquo Computers amp Electrical Engi-neering vol 47 pp 69ndash81 2015

[17] C Tian Y Wang Y Luo et al ldquoMinimizing content re-organization and tolerating imperfect workload prediction forcloud-based video-on-demand servicesrdquo IEEE Transactionson Services Computing vol 9 no 6 pp 926ndash939 2016

[18] M Verma G R Gangadharan V Ravi and N NarendraldquoResource demand prediction in multi-tenant service cloudsrdquoin Proceedings of 2013 IEEE International Conference on CloudComputing in Emerging Markets (CCEM) pp 1ndash8 IEEEBangalore India October 2013

[19] W Zhang Y Shi L Liu L Cui and Y Zheng ldquoPerformanceand resource prediction at high utilization for n-tier servicesystems in cloud an experiment driven approachrdquo in Pro-ceedings of 2015 IEEE International Conference on Computerand Information Technology Ubiquitous Computing andCommunications Dependable Autonomic and Secure Com-puting Pervasive Intelligence and Computing pp 843ndash848IEEE Liverpool UK October 2015

[20] G Kecskemeti A Kertesz and Z Nemeth ldquoCloud workloadprediction by means of simulationsrdquo in Proceedings of theComputing Frontiers Conference on ZZZ (CFrsquo17) pp 279ndash282ACM Siena Italy May 2017

[21] Y Chen and Z-A Jiang ldquoDynamically predicting the qualityof service batch online and hybrid algorithmsrdquo Journal ofElectrical and Computer Engineering vol 2017 Article ID9547869 10 pages 2017

[22] A Khan X Yan S Tao and N Anerousis ldquoWorkloadcharacterization and prediction in the cloud a multiple timeseries approachrdquo in Proceedings of 2012 IEEE Network Op-erations and Management Symposium pp 1287ndash1294 IEEEMaui HI USA April 2012

[23] A K Mishra J L Hellerstein W Cirne and C R DasldquoTowards characterizing cloud backend workloadsrdquo ACMSIGMETRICS Performance Evaluation Review vol 37 no 4pp 34ndash41 2010

[24] D Gmach J Rolia L Cherkasova and A KemperldquoWorkload analysis and demand prediction of enterprise datacenter applicationsrdquo in Proceedings of 2007 IEEE 10th In-ternational Symposium on Workload Characterizationpp 171ndash180 IEEE Boston MA USA September 2007

[25] F-H Tseng X Wang L-D Chou H-C Chao andV C M Leung ldquoDynamic resource prediction and allocationfor cloud data center using the multiobjective genetic algo-rithmrdquo IEEE Systems Journal vol 12 no 2 pp 1688ndash16992018

[26] G K Shyam and S S Manvi ldquoVirtual resource prediction incloud environment a Bayesian approachrdquo Journal of Networkand Computer Applications vol 65 pp 144ndash154 2016

[27] Y Lu J Panneerselvam L Liu and Y Wu ldquoRVLBPNN aworkload forecasting model for smart cloud computingrdquoScientific Programming vol 2016 Article ID 5635673 9 pages2016

[28] K Rajaram and M P Malarvizhi ldquoUtilization based pre-diction model for resource provisioningrdquo in Proceedings of2017 International Conference on Computer Communicationand Signal Processing (ICCCSP) pp 1ndash6 IEEE ChennaiIndia January 2017

Journal of Electrical and Computer Engineering 13

[29] L Li and A Zhang ldquoResource demand optimization com-bined prediction under cloud computing environment basedon IOWGA operatorrdquo International Journal of Grid andDistributed Computing vol 8 no 3 pp 77ndash86 2015

[30] D Minarolli and B Freisleben ldquoCross-correlation predictionof resource demand for virtual machine resource allocation incloudsrdquo in Proceedings of 2014 Sixth International Conferenceon Computational Intelligence Communication Systems andNetworks pp 119ndash124 IEEE Tetova Macedonia May 2014

[31] W Zhang P Duan L T Yang et al ldquoResource requestsprediction in the cloud computing environment with a deepbelief networkrdquo Software Practice and Experience vol 47no 3 pp 473ndash488 2017

[32] H-B Mi H-M Wang G Yin D-X Shi Y-F Zhou andL Yuan ldquoResource on-demand reconfiguration method forvirtualized data centersrdquo Journal of Software vol 22 no 9pp 2193ndash2205 2011

[33] R N Calheiros E Masoumi R Ranjan and R BuyyaldquoWorkload prediction using ARIMAmodel and its impact oncloud applicationsrsquo QoSrdquo IEEE Transactions on CloudComputing vol 3 no 4 pp 449ndash458 2015

[34] Y Meng R Rao X Zhang and P Hong ldquoCRUPA a con-tainer resource utilization prediction algorithm for auto-scaling based on time series analysisrdquo in Proceedings of2016 International Conference on Progress in Informatics andComputing (PIC) pp 468ndash472 IEEE Shanghai China De-cember 2016

[35] E Dhib N Zangar N Tabbane and K Boussetta ldquoImpact ofseasonal ARIMA workload prediction model on QoE formassively multiplayers online gamingrdquo in Proceedings of 20165th International Conference on Multimedia Computing andSystems (ICMCS) pp 737ndash741 IEEE Marrakech MoroccoSeptember 2016

[36] A Ganapathi Y Chen A Fox R Katz and D PattersonldquoStatistics-driven workload modeling for the cloudrdquo inProceedings of 2010 IEEE 26th International Conference onData EngineeringWorkshops (ICDEW 2010) pp 87ndash92 IEEELong Beach CA USA March 2010

[37] V G Tran V Debusschere and S Bacha ldquoHourly serverworkload forecasting up to 168 hours ahead using seasonalARIMA modelrdquo in Proceedings of 2012 IEEE InternationalConference on Industrial Technology pp 1127ndash1131 IEEEAthens Greece March 2012

[38] D Xu S Yang and H Luo ldquoResearch on generalized fuzzysoft sets theory based combined model for demanded cloudcomputing resource predictionrdquo Chinese Journal of Man-agement Science vol 23 no 5 pp 56ndash64 2015

[39] S Li Y Wang X Qiu D Wang and L Wang ldquoA workloadprediction-basedmulti-VM provisioningmechanism in cloudcomputingrdquo in Proceedings of 2013 15th Asia-Pacific NetworkOperations andManagement Symposium (APNOMS) pp 1ndash6IEEE Hiroshima Japan September 2013

[40] X Fu and C Zhou ldquoPredicted affinity based virtual machineplacement in cloud computing environmentsrdquo IEEE Trans-actions on Cloud Computing vol 99 p 1 2017

[41] Y Jiang C Perng T Li and R Chang ldquoASAP a self-adaptiveprediction system for instant cloud resource demand pro-visioningrdquo in Proceedings of 2011 IEEE 11th InternationalConference on Data Mining pp 1104ndash1109 IEEE VancouverBC Canada December 2011

[42] Z Wu and N E Huang ldquoEnsemble empirical mode de-composition a noise-assisted data analysis methodrdquo Ad-vances in Adaptive Data Analysis vol 1 no 1 pp 1ndash41 2009

[43] H Zang L Fan M Guo Z Wei G Sun and L ZhangldquoShort-term wind power interval forecasting based on anEEMD-RT-RVMmodelrdquo Advances in Meteorology vol 2016Article ID 8760780 10 pages 2016

[44] N Safari C Y Chung and G C D Price ldquoNovel multi-stepshort-term wind power prediction framework based onchaotic time series analysis and singular spectrum analysisrdquoIEEE Transactions on Power Systems vol 33 no 1 pp 590ndash601 2018

[45] X Chen H Wang J Huang and H Ren ldquoAPU degradationprediction based on EEMD and Gaussian process regressionrdquoin Proceedings of 2017 International Conference on SensingDiagnostics Prognostics and Control (SDPC) pp 98ndash104IEEE Shanghai China August 2017

[46] C Yan C Yi XWu et al ldquoTurbine fault trend prediction thatbased on EEMD and ARIMA modelsrdquo Journal of GansuSciences vol 28 no 4 pp 100ndash106 2016

[47] M Jin P Li L Zhang et al ldquoA signal feature method and itsapplication based on EEMD fuzzy entropy and GK cluster-ingrdquo ACTA Metrologica Sinica vol 26 no 5 pp 501ndash5052015

[48] E H Norden Z Shen R L Steven et al ldquo+e empirical modedecomposition and the Hilbert spectrum for nonlinear andnon-stationary time series analysisrdquoin Proceedings of theRoyal Society of London Series A Mathematical Physical andEngineering Sciences vol 454 no 1971 pp 903ndash995 RoyalSociety London UK March 1998

[49] Alibaba cluster-trace-v2017 httpsgithubcomalibabaclusterdata



[15] K Mason M Duggan E Barrett J Duggan and E HowleyldquoPredicting host CPU utilization in the cloud using evolu-tionary neural networksrdquo Future Generation Computer Sys-tems vol 86 pp 162ndash173 2018

[16] D Magalhatildees R N Calheiros R Buyya and D G GomesldquoWorkload modeling for resource usage analysis and simu-lation in cloud computingrdquo Computers amp Electrical Engi-neering vol 47 pp 69ndash81 2015

[17] C Tian Y Wang Y Luo et al ldquoMinimizing content re-organization and tolerating imperfect workload prediction forcloud-based video-on-demand servicesrdquo IEEE Transactionson Services Computing vol 9 no 6 pp 926ndash939 2016

[18] M Verma G R Gangadharan V Ravi and N NarendraldquoResource demand prediction in multi-tenant service cloudsrdquoin Proceedings of 2013 IEEE International Conference on CloudComputing in Emerging Markets (CCEM) pp 1ndash8 IEEEBangalore India October 2013

[19] W Zhang Y Shi L Liu L Cui and Y Zheng ldquoPerformanceand resource prediction at high utilization for n-tier servicesystems in cloud an experiment driven approachrdquo in Pro-ceedings of 2015 IEEE International Conference on Computerand Information Technology Ubiquitous Computing andCommunications Dependable Autonomic and Secure Com-puting Pervasive Intelligence and Computing pp 843ndash848IEEE Liverpool UK October 2015

[20] G Kecskemeti A Kertesz and Z Nemeth ldquoCloud workloadprediction by means of simulationsrdquo in Proceedings of theComputing Frontiers Conference on ZZZ (CFrsquo17) pp 279ndash282ACM Siena Italy May 2017

[21] Y Chen and Z-A Jiang ldquoDynamically predicting the qualityof service batch online and hybrid algorithmsrdquo Journal ofElectrical and Computer Engineering vol 2017 Article ID9547869 10 pages 2017

[22] A Khan X Yan S Tao and N Anerousis ldquoWorkloadcharacterization and prediction in the cloud a multiple timeseries approachrdquo in Proceedings of 2012 IEEE Network Op-erations and Management Symposium pp 1287ndash1294 IEEEMaui HI USA April 2012

[23] A K Mishra J L Hellerstein W Cirne and C R DasldquoTowards characterizing cloud backend workloadsrdquo ACMSIGMETRICS Performance Evaluation Review vol 37 no 4pp 34ndash41 2010

[24] D Gmach J Rolia L Cherkasova and A KemperldquoWorkload analysis and demand prediction of enterprise datacenter applicationsrdquo in Proceedings of 2007 IEEE 10th In-ternational Symposium on Workload Characterizationpp 171ndash180 IEEE Boston MA USA September 2007

[25] F-H Tseng X Wang L-D Chou H-C Chao andV C M Leung ldquoDynamic resource prediction and allocationfor cloud data center using the multiobjective genetic algo-rithmrdquo IEEE Systems Journal vol 12 no 2 pp 1688ndash16992018

[26] G K Shyam and S S Manvi ldquoVirtual resource prediction incloud environment a Bayesian approachrdquo Journal of Networkand Computer Applications vol 65 pp 144ndash154 2016

[27] Y Lu J Panneerselvam L Liu and Y Wu ldquoRVLBPNN aworkload forecasting model for smart cloud computingrdquoScientific Programming vol 2016 Article ID 5635673 9 pages2016

[28] K Rajaram and M P Malarvizhi ldquoUtilization based pre-diction model for resource provisioningrdquo in Proceedings of2017 International Conference on Computer Communicationand Signal Processing (ICCCSP) pp 1ndash6 IEEE ChennaiIndia January 2017

Journal of Electrical and Computer Engineering 13

[29] L Li and A Zhang ldquoResource demand optimization com-bined prediction under cloud computing environment basedon IOWGA operatorrdquo International Journal of Grid andDistributed Computing vol 8 no 3 pp 77ndash86 2015

[30] D Minarolli and B Freisleben ldquoCross-correlation predictionof resource demand for virtual machine resource allocation incloudsrdquo in Proceedings of 2014 Sixth International Conferenceon Computational Intelligence Communication Systems andNetworks pp 119ndash124 IEEE Tetova Macedonia May 2014

[31] W Zhang P Duan L T Yang et al ldquoResource requestsprediction in the cloud computing environment with a deepbelief networkrdquo Software Practice and Experience vol 47no 3 pp 473ndash488 2017

[32] H-B Mi H-M Wang G Yin D-X Shi Y-F Zhou andL Yuan ldquoResource on-demand reconfiguration method forvirtualized data centersrdquo Journal of Software vol 22 no 9pp 2193ndash2205 2011

[33] R N Calheiros E Masoumi R Ranjan and R BuyyaldquoWorkload prediction using ARIMAmodel and its impact oncloud applicationsrsquo QoSrdquo IEEE Transactions on CloudComputing vol 3 no 4 pp 449ndash458 2015

[34] Y Meng R Rao X Zhang and P Hong ldquoCRUPA a con-tainer resource utilization prediction algorithm for auto-scaling based on time series analysisrdquo in Proceedings of2016 International Conference on Progress in Informatics andComputing (PIC) pp 468ndash472 IEEE Shanghai China De-cember 2016

[35] E Dhib N Zangar N Tabbane and K Boussetta ldquoImpact ofseasonal ARIMA workload prediction model on QoE formassively multiplayers online gamingrdquo in Proceedings of 20165th International Conference on Multimedia Computing andSystems (ICMCS) pp 737ndash741 IEEE Marrakech MoroccoSeptember 2016

[36] A Ganapathi Y Chen A Fox R Katz and D PattersonldquoStatistics-driven workload modeling for the cloudrdquo inProceedings of 2010 IEEE 26th International Conference onData EngineeringWorkshops (ICDEW 2010) pp 87ndash92 IEEELong Beach CA USA March 2010

[37] V G Tran V Debusschere and S Bacha ldquoHourly serverworkload forecasting up to 168 hours ahead using seasonalARIMA modelrdquo in Proceedings of 2012 IEEE InternationalConference on Industrial Technology pp 1127ndash1131 IEEEAthens Greece March 2012

[38] D Xu S Yang and H Luo ldquoResearch on generalized fuzzysoft sets theory based combined model for demanded cloudcomputing resource predictionrdquo Chinese Journal of Man-agement Science vol 23 no 5 pp 56ndash64 2015

[39] S Li Y Wang X Qiu D Wang and L Wang ldquoA workloadprediction-basedmulti-VM provisioningmechanism in cloudcomputingrdquo in Proceedings of 2013 15th Asia-Pacific NetworkOperations andManagement Symposium (APNOMS) pp 1ndash6IEEE Hiroshima Japan September 2013

[40] X Fu and C Zhou ldquoPredicted affinity based virtual machineplacement in cloud computing environmentsrdquo IEEE Trans-actions on Cloud Computing vol 99 p 1 2017

[41] Y Jiang C Perng T Li and R Chang ldquoASAP a self-adaptiveprediction system for instant cloud resource demand pro-visioningrdquo in Proceedings of 2011 IEEE 11th InternationalConference on Data Mining pp 1104ndash1109 IEEE VancouverBC Canada December 2011

[42] Z Wu and N E Huang ldquoEnsemble empirical mode de-composition a noise-assisted data analysis methodrdquo Ad-vances in Adaptive Data Analysis vol 1 no 1 pp 1ndash41 2009

[43] H Zang L Fan M Guo Z Wei G Sun and L ZhangldquoShort-term wind power interval forecasting based on anEEMD-RT-RVMmodelrdquo Advances in Meteorology vol 2016Article ID 8760780 10 pages 2016

[44] N Safari C Y Chung and G C D Price ldquoNovel multi-stepshort-term wind power prediction framework based onchaotic time series analysis and singular spectrum analysisrdquoIEEE Transactions on Power Systems vol 33 no 1 pp 590ndash601 2018

[45] X Chen H Wang J Huang and H Ren ldquoAPU degradationprediction based on EEMD and Gaussian process regressionrdquoin Proceedings of 2017 International Conference on SensingDiagnostics Prognostics and Control (SDPC) pp 98ndash104IEEE Shanghai China August 2017

[46] C Yan C Yi XWu et al ldquoTurbine fault trend prediction thatbased on EEMD and ARIMA modelsrdquo Journal of GansuSciences vol 28 no 4 pp 100ndash106 2016

[47] M Jin P Li L Zhang et al ldquoA signal feature method and itsapplication based on EEMD fuzzy entropy and GK cluster-ingrdquo ACTA Metrologica Sinica vol 26 no 5 pp 501ndash5052015

[48] E H Norden Z Shen R L Steven et al ldquo+e empirical modedecomposition and the Hilbert spectrum for nonlinear andnon-stationary time series analysisrdquoin Proceedings of theRoyal Society of London Series A Mathematical Physical andEngineering Sciences vol 454 no 1971 pp 903ndash995 RoyalSociety London UK March 1998

[49] Alibaba cluster-trace-v2017 httpsgithubcomalibabaclusterdata

14 Journal of Electrical and Computer Engineering


for host 237. On this host, the EEMD-ARIMA method has a smaller negative error than the EEMD-RT-ARIMA method. For instance, the EEMD-RT-ARIMA method obtains negative errors of 11.37% and 8.92% for the 6-point and 12-point predictions, respectively, while the EEMD-ARIMA method has negative errors of only 4.62% and 8.00%.
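Table 4 splits each method's error into a positive part and a negative part. The paper's exact definition is given in an earlier section and is not reproduced here, so the sketch below assumes one plausible reading: a point counts toward the positive error when the prediction exceeds the actual utilization and toward the negative error when it falls short, and each part is the mean relative error in percent over those points. The helper `signed_error_rates` is hypothetical, and returning `None` when no point has a given sign is only one possible interpretation of the table's "Null" entries.

```python
from typing import List, Optional, Sequence, Tuple


def signed_error_rates(actual: Sequence[float],
                       predicted: Sequence[float]) -> Tuple[Optional[float], Optional[float]]:
    """Return (positive_error_pct, negative_error_pct) for one prediction run.

    ASSUMED definition (not taken from the paper): over-predictions feed the
    positive error, under-predictions the negative error, and each is the mean
    relative error in percent. None means no point had that sign.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    over: List[float] = []
    under: List[float] = []
    for a, p in zip(actual, predicted):
        if a == 0:
            continue  # relative error is undefined at zero utilization
        rel = (p - a) * 100.0 / a
        if rel > 0:
            over.append(rel)
        elif rel < 0:
            under.append(-rel)

    def mean_or_none(xs: List[float]) -> Optional[float]:
        return sum(xs) / len(xs) if xs else None

    return mean_or_none(over), mean_or_none(under)


# A run that only ever under-predicts has no positive error at all.
print(signed_error_rates([50.0, 40.0, 60.0], [45.0, 38.0, 57.0]))  # (None, 6.666666666666667)
```

Under this reading, a model that systematically under-predicts, as ARIMA appears to do on hosts 839 and 22, would report no positive error, while its negative error would absorb the entire deviation.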

6.3. Time-Cost Analysis. To verify the applicability of our method, we further compare the time costs of these methods in Figure 8. The EEMD-ARIMA method has the largest running time, at over 180 s, while the ARIMA model takes the least time, at less than 50 s. The time cost of the EEMD-RT-ARIMA method lies between roughly 70 s and 117 s, a reduction of 40–80% compared with the EEMD-ARIMA method. For example, for the 6-point prediction of host 22, the running time of the EEMD-RT-ARIMA method is 69.37 s, far less than the 337.20 s of the EEMD-ARIMA method, so our method saves up to 80% of the time cost. For the CPU utilization sequence of host 237, which has strong variability, the EEMD-ARIMA method requires 190.46 s to predict the next 6 values, while the EEMD-RT-ARIMA method takes only 113.64 s, a reduction of approximately 40%. Considering prediction accuracy, effectiveness, and time cost together, our EEMD-RT-ARIMA method is more cost-effective for short-term host utilization prediction in cloud computing.
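The percentage savings quoted in this paragraph follow directly from the reported running times. The short check below recomputes them; `time_saving_pct` is an illustrative helper, not part of the authors' code.

```python
def time_saving_pct(baseline_s: float, method_s: float) -> float:
    """Share of the baseline running time (in %) saved by the faster method."""
    if baseline_s <= 0:
        raise ValueError("baseline running time must be positive")
    return (baseline_s - method_s) / baseline_s * 100.0


# 6-point predictions, EEMD-ARIMA (baseline) vs. EEMD-RT-ARIMA, times in seconds.
host22 = time_saving_pct(337.20, 69.37)
host237 = time_saving_pct(190.46, 113.64)
print(f"host 22:  {host22:.1f}% saved")   # host 22:  79.4% saved
print(f"host 237: {host237:.1f}% saved")  # host 237: 40.3% saved
```

Both values fall within the stated 40–80% range. The host 22 case corresponds to the "up to 80%" figure, and the host 237 case corresponds to the "approximately 40%" figure.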

7. Conclusions

Host utilization is an indicator of host performance, and predicting it can support effective resource scheduling in cloud computing. However, host utilization exhibits strong randomness and instability caused by users' random and diverse resource demands, which makes it difficult to improve prediction accuracy. In this paper, we propose a hybrid and cost-effective method, EEMD-RT-ARIMA, for short-term host utilization prediction in cloud computing. The EEMD method is first used to decompose the nonstationary host utilization sequence into a few relatively stationary IMF components and a residual (R) component. Then, we calculate the correlation coefficient between each IMF component and the original data to select efficient IMF components, and we use the RT values and average periods to reconstruct these components into three new components, which reduces error accumulation and time cost. Finally, the three new components are predicted by the ARIMA model, and their prediction results are superposed to form the overall prediction. We use real host utilization traces from a cloud platform to conduct the experiments and compare our EEMD-RT-ARIMA method with the ARIMA model and the EEMD-ARIMA method in terms of error, effectiveness, and time cost. The results show that our method is cost-effective and more suitable for short-term host utilization prediction in cloud computing.
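The reconstruction and superposition steps summarized above can be sketched in a few lines of Python. This is a minimal sketch that rests on several assumptions. The EEMD step itself is not shown, so the IMFs and residual are passed in ready-made. The correlation threshold `min_corr` and the runs-ratio thresholds `high_ratio` and `low_ratio` are hypothetical values chosen for illustration. The average-period criterion that the paper also uses is omitted. Finally, a persistence forecaster, `last_value`, stands in for the per-component ARIMA models.

```python
from math import sqrt
from typing import Callable, Dict, List, Sequence

Series = List[float]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def runs_count(series: Sequence[float]) -> int:
    """Runs-test statistic: number of runs of consecutive values on the same
    side of the mean. Values exactly equal to the mean are skipped."""
    m = sum(series) / len(series)
    above = [v > m for v in series if v != m]
    if not above:
        return 0
    return 1 + sum(1 for prev, cur in zip(above, above[1:]) if prev != cur)


def reconstruct(original: Sequence[float], imfs: Sequence[Sequence[float]],
                residual: Sequence[float], min_corr: float = 0.1,
                high_ratio: float = 0.4, low_ratio: float = 0.1) -> Dict[str, Series]:
    """Merge the selected IMFs into high-, medium- and low-frequency components.

    HYPOTHETICAL rules: an IMF with |corr(original, imf)| < min_corr is dropped;
    runs/length >= high_ratio means high frequency, <= low_ratio low frequency,
    anything in between medium. The residual joins the low-frequency component.
    """
    n = len(original)
    groups: Dict[str, Series] = {key: [0.0] * n for key in ("high", "medium", "low")}
    for imf in imfs:
        if abs(pearson(original, imf)) < min_corr:
            continue  # weakly correlated IMF: treated as inefficient
        ratio = runs_count(imf) / n
        if ratio >= high_ratio:
            key = "high"
        elif ratio <= low_ratio:
            key = "low"
        else:
            key = "medium"
        groups[key] = [g + v for g, v in zip(groups[key], imf)]
    groups["low"] = [g + v for g, v in zip(groups["low"], residual)]
    return groups


def superpose_forecasts(components: Dict[str, Series],
                        forecaster: Callable[[Series, int], Series],
                        horizon: int) -> Series:
    """Forecast every component separately and add the forecasts point by point."""
    total = [0.0] * horizon
    for series in components.values():
        total = [t + f for t, f in zip(total, forecaster(series, horizon))]
    return total


def last_value(series: Series, horizon: int) -> Series:
    """Placeholder persistence forecaster standing in for a per-component ARIMA fit."""
    return [series[-1]] * horizon


# Synthetic IMFs with clearly different oscillation speeds, plus a linear trend.
n = 40
hf = [1.0 if t % 2 == 0 else -1.0 for t in range(n)]   # sign flips every step
mf = [1.0 if t % 10 < 5 else -1.0 for t in range(n)]   # period of 10 steps
lf = [1.0 if t < 20 else -1.0 for t in range(n)]       # one slow swing
res = [0.05 * t for t in range(n)]                     # residual trend
original = [a + b + c + d for a, b, c, d in zip(hf, mf, lf, res)]

parts = reconstruct(original, [hf, mf, lf], res)
forecast = superpose_forecasts(parts, last_value, horizon=6)
```

With the persistence placeholder and no IMF dropped, the superposed forecast simply equals the last observed value of the original series, because forecasting each part by its last value and adding the parts back together is a linear operation. Any accuracy gain from the decomposition therefore depends on fitting a separate model, such as ARIMA, to each smoother component, and this sketch deliberately leaves that step out.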

Data Availability

The running example and experimental data used to support the findings of this study have been deposited in the Figshare repository (https://doi.org/10.6084/m9.figshare.7679594).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the Shandong Provincial Natural Science Foundation (ZR2016FM41).

Table 4: Positive and negative error analysis (all errors in %).

Host ID | Prediction length | ARIMA positive error | ARIMA negative error | EEMD-ARIMA positive error | EEMD-ARIMA negative error | EEMD-RT-ARIMA positive error | EEMD-RT-ARIMA negative error
839 | 6-point | Null | 9.77 | 3.16 | 5.85 | 4.92 | 4.15
839 | 12-point | Null | 13.78 | 3.43 | 10.71 | 4.80 | 4.74
22 | 6-point | Null | 10.66 | 3.91 | 6.70 | 4.12 | 6.34
22 | 12-point | Null | 9.72 | 4.26 | 6.09 | 5.39 | 5.05
237 | 6-point | 0.39 | 26.68 | 14.31 | 4.62 | 13.81 | 11.37
237 | 12-point | 0.39 | 27.51 | 17.44 | 8.00 | 17.90 | 8.92

Figure 8: Time cost of different prediction methods (ARIMA, EEMD-ARIMA, and EEMD-RT-ARIMA; time cost in s for the 6-point and 12-point predictions of hosts 839, 22, and 237).

12 Journal of Electrical and Computer Engineering

References

[1] J. J. Prevost, K. Nagothu, B. Kelley, and M. Jamshidi, "Prediction of cloud data center networks loads using stochastic and neural models," in Proceedings of 2011 6th International Conference on System of Systems Engineering, pp. 276–281, IEEE, Albuquerque, NM, USA, June 2011.

[2] M. Borkowski, S. Schulte, and C. Hochreiner, "Predicting cloud resource utilization," in Proceedings of 9th International Conference on Utility and Cloud Computing (UCC), pp. 37–42, IEEE, Shanghai, China, December 2016.

[3] M. Barati and S. Sharifian, "A hybrid heuristic-based tuned support vector regression model for cloud load prediction," Journal of Supercomputing, vol. 71, no. 11, pp. 4235–4259, 2015.

[4] Z. Chen, Y. Zhu, Y. Di, S. Feng, and J. Geng, "A high-accuracy self-adaptive resource demands predicting method in IaaS cloud environment," Neural Network World, vol. 25, no. 5, pp. 519–540, 2015.

[5] J. Chen and Y. Wang, "A resource demand prediction method based on EEMD in cloud computing," Procedia Computer Science, vol. 131, pp. 116–123, 2018.

[6] H. Toumi, Z. Brahmi, Z. Benarfa, and M. M. Gammoudi, "Server load prediction using stream mining," in Proceedings of 2017 International Conference on Information Networking (ICOIN), pp. 653–661, IEEE, Da Nang, Vietnam, January 2017.

[7] S. Di, D. Kondo, and W. Cirne, "Host load prediction in a Google compute cloud with a Bayesian model," in Proceedings of 2012 International Conference on High Performance Computing, Networking, Storage and Analysis (SC'12), pp. 1–11, IEEE, Salt Lake City, UT, USA, November 2012.

[8] N. K. Gondhi and P. Kailu, "Prediction based energy efficient virtual machine consolidation in cloud computing," in Proceedings of 2015 Second International Conference on Advances in Computing and Communication Engineering, pp. 437–441, IEEE, Dehradun, India, May 2015.

[9] A. Verma, G. Dasgupta, T. K. Nayak, P. De, and R. Kothari, "Server workload analysis for power minimization using consolidation," in Proceedings of the 2009 Conference on USENIX Annual Technical Conference, p. 28, USENIX Association, San Diego, CA, USA, June 2009.

[10] B. Song, Y. Yu, Y. Zhou, Z. Wang, and S. Du, "Host load prediction with long short-term memory in cloud computing," Journal of Supercomputing, vol. 74, no. 12, pp. 6554–6568, 2018.

[11] J.-J. Jheng, F.-H. Tseng, H.-C. Chao, and L.-D. Chou, "A novel VM workload prediction using grey forecasting model in cloud data center," in Proceedings of International Conference on Information Networking 2014 (ICOIN2014), pp. 40–45, IEEE, Phuket, Thailand, February 2014.

[12] A. Beloglazov and R. Buyya, "Managing overloaded hosts for dynamic consolidation of virtual machines in cloud data centers under quality of service constraints," IEEE Transactions on Parallel and Distributed Systems, vol. 24, no. 7, pp. 1366–1379, 2013.

[13] M. Dabbagh, B. Hamdaoui, M. Guizani, and A. Rayes, "An energy-efficient VM prediction and migration framework for overcommitted clouds," IEEE Transactions on Cloud Computing, vol. 6, no. 4, pp. 955–966, 2018.

[14] D. Minarolli, A. Mazrekaj, and B. Freisleben, "Tackling uncertainty in long-term predictions for host overload and underload detection in cloud computing," Journal of Cloud Computing, vol. 6, no. 1, p. 4, 2017.

[15] K. Mason, M. Duggan, E. Barrett, J. Duggan, and E. Howley, "Predicting host CPU utilization in the cloud using evolutionary neural networks," Future Generation Computer Systems, vol. 86, pp. 162–173, 2018.

[16] D. Magalhães, R. N. Calheiros, R. Buyya, and D. G. Gomes, "Workload modeling for resource usage analysis and simulation in cloud computing," Computers & Electrical Engineering, vol. 47, pp. 69–81, 2015.

[17] C. Tian, Y. Wang, Y. Luo et al., "Minimizing content reorganization and tolerating imperfect workload prediction for cloud-based video-on-demand services," IEEE Transactions on Services Computing, vol. 9, no. 6, pp. 926–939, 2016.

[18] M. Verma, G. R. Gangadharan, V. Ravi, and N. Narendra, "Resource demand prediction in multi-tenant service clouds," in Proceedings of 2013 IEEE International Conference on Cloud Computing in Emerging Markets (CCEM), pp. 1–8, IEEE, Bangalore, India, October 2013.

[19] W. Zhang, Y. Shi, L. Liu, L. Cui, and Y. Zheng, "Performance and resource prediction at high utilization for n-tier service systems in cloud: an experiment driven approach," in Proceedings of 2015 IEEE International Conference on Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing, pp. 843–848, IEEE, Liverpool, UK, October 2015.

[20] G. Kecskemeti, A. Kertesz, and Z. Nemeth, "Cloud workload prediction by means of simulations," in Proceedings of the Computing Frontiers Conference on ZZZ (CF'17), pp. 279–282, ACM, Siena, Italy, May 2017.

[21] Y. Chen and Z.-A. Jiang, "Dynamically predicting the quality of service: batch, online, and hybrid algorithms," Journal of Electrical and Computer Engineering, vol. 2017, Article ID 9547869, 10 pages, 2017.

[22] A. Khan, X. Yan, S. Tao, and N. Anerousis, "Workload characterization and prediction in the cloud: a multiple time series approach," in Proceedings of 2012 IEEE Network Operations and Management Symposium, pp. 1287–1294, IEEE, Maui, HI, USA, April 2012.

[23] A. K. Mishra, J. L. Hellerstein, W. Cirne, and C. R. Das, "Towards characterizing cloud backend workloads," ACM SIGMETRICS Performance Evaluation Review, vol. 37, no. 4, pp. 34–41, 2010.

[24] D. Gmach, J. Rolia, L. Cherkasova, and A. Kemper, "Workload analysis and demand prediction of enterprise data center applications," in Proceedings of 2007 IEEE 10th International Symposium on Workload Characterization, pp. 171–180, IEEE, Boston, MA, USA, September 2007.

[25] F.-H. Tseng, X. Wang, L.-D. Chou, H.-C. Chao, and V. C. M. Leung, "Dynamic resource prediction and allocation for cloud data center using the multiobjective genetic algorithm," IEEE Systems Journal, vol. 12, no. 2, pp. 1688–1699, 2018.

[26] G. K. Shyam and S. S. Manvi, "Virtual resource prediction in cloud environment: a Bayesian approach," Journal of Network and Computer Applications, vol. 65, pp. 144–154, 2016.

[27] Y. Lu, J. Panneerselvam, L. Liu, and Y. Wu, "RVLBPNN: a workload forecasting model for smart cloud computing," Scientific Programming, vol. 2016, Article ID 5635673, 9 pages, 2016.

[28] K. Rajaram and M. P. Malarvizhi, "Utilization based prediction model for resource provisioning," in Proceedings of 2017 International Conference on Computer, Communication and Signal Processing (ICCCSP), pp. 1–6, IEEE, Chennai, India, January 2017.


[29] L. Li and A. Zhang, "Resource demand optimization combined prediction under cloud computing environment based on IOWGA operator," International Journal of Grid and Distributed Computing, vol. 8, no. 3, pp. 77–86, 2015.

[30] D. Minarolli and B. Freisleben, "Cross-correlation prediction of resource demand for virtual machine resource allocation in clouds," in Proceedings of 2014 Sixth International Conference on Computational Intelligence, Communication Systems and Networks, pp. 119–124, IEEE, Tetova, Macedonia, May 2014.

[31] W. Zhang, P. Duan, L. T. Yang et al., "Resource requests prediction in the cloud computing environment with a deep belief network," Software: Practice and Experience, vol. 47, no. 3, pp. 473–488, 2017.

[32] H.-B. Mi, H.-M. Wang, G. Yin, D.-X. Shi, Y.-F. Zhou, and L. Yuan, "Resource on-demand reconfiguration method for virtualized data centers," Journal of Software, vol. 22, no. 9, pp. 2193–2205, 2011.

[33] R. N. Calheiros, E. Masoumi, R. Ranjan, and R. Buyya, "Workload prediction using ARIMA model and its impact on cloud applications' QoS," IEEE Transactions on Cloud Computing, vol. 3, no. 4, pp. 449–458, 2015.

[34] Y. Meng, R. Rao, X. Zhang, and P. Hong, "CRUPA: a container resource utilization prediction algorithm for auto-scaling based on time series analysis," in Proceedings of 2016 International Conference on Progress in Informatics and Computing (PIC), pp. 468–472, IEEE, Shanghai, China, December 2016.

[35] E. Dhib, N. Zangar, N. Tabbane, and K. Boussetta, "Impact of seasonal ARIMA workload prediction model on QoE for massively multiplayers online gaming," in Proceedings of 2016 5th International Conference on Multimedia Computing and Systems (ICMCS), pp. 737–741, IEEE, Marrakech, Morocco, September 2016.

[36] A. Ganapathi, Y. Chen, A. Fox, R. Katz, and D. Patterson, "Statistics-driven workload modeling for the cloud," in Proceedings of 2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010), pp. 87–92, IEEE, Long Beach, CA, USA, March 2010.

[37] V. G. Tran, V. Debusschere, and S. Bacha, "Hourly server workload forecasting up to 168 hours ahead using seasonal ARIMA model," in Proceedings of 2012 IEEE International Conference on Industrial Technology, pp. 1127–1131, IEEE, Athens, Greece, March 2012.

[38] D. Xu, S. Yang, and H. Luo, "Research on generalized fuzzy soft sets theory based combined model for demanded cloud computing resource prediction," Chinese Journal of Management Science, vol. 23, no. 5, pp. 56–64, 2015.

[39] S. Li, Y. Wang, X. Qiu, D. Wang, and L. Wang, "A workload prediction-based multi-VM provisioning mechanism in cloud computing," in Proceedings of 2013 15th Asia-Pacific Network Operations and Management Symposium (APNOMS), pp. 1–6, IEEE, Hiroshima, Japan, September 2013.

[40] X. Fu and C. Zhou, "Predicted affinity based virtual machine placement in cloud computing environments," IEEE Transactions on Cloud Computing, vol. 99, p. 1, 2017.

[41] Y. Jiang, C. Perng, T. Li, and R. Chang, "ASAP: a self-adaptive prediction system for instant cloud resource demand provisioning," in Proceedings of 2011 IEEE 11th International Conference on Data Mining, pp. 1104–1109, IEEE, Vancouver, BC, Canada, December 2011.

[42] Z. Wu and N. E. Huang, "Ensemble empirical mode decomposition: a noise-assisted data analysis method," Advances in Adaptive Data Analysis, vol. 1, no. 1, pp. 1–41, 2009.

[43] H. Zang, L. Fan, M. Guo, Z. Wei, G. Sun, and L. Zhang, "Short-term wind power interval forecasting based on an EEMD-RT-RVM model," Advances in Meteorology, vol. 2016, Article ID 8760780, 10 pages, 2016.

[44] N. Safari, C. Y. Chung, and G. C. D. Price, "Novel multi-step short-term wind power prediction framework based on chaotic time series analysis and singular spectrum analysis," IEEE Transactions on Power Systems, vol. 33, no. 1, pp. 590–601, 2018.

[45] X. Chen, H. Wang, J. Huang, and H. Ren, "APU degradation prediction based on EEMD and Gaussian process regression," in Proceedings of 2017 International Conference on Sensing, Diagnostics, Prognostics, and Control (SDPC), pp. 98–104, IEEE, Shanghai, China, August 2017.

[46] C. Yan, C. Yi, X. Wu et al., "Turbine fault trend prediction that based on EEMD and ARIMA models," Journal of Gansu Sciences, vol. 28, no. 4, pp. 100–106, 2016.

[47] M. Jin, P. Li, L. Zhang et al., "A signal feature method and its application based on EEMD fuzzy entropy and GK clustering," ACTA Metrologica Sinica, vol. 26, no. 5, pp. 501–505, 2015.

[48] N. E. Huang, Z. Shen, S. R. Long et al., "The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis," in Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, vol. 454, no. 1971, pp. 903–995, Royal Society, London, UK, March 1998.

[49] Alibaba, "cluster-trace-v2017," https://github.com/alibaba/clusterdata.


References

[1] J J Prevost K Nagothu B Kelley and M Jamshidi ldquoPre-diction of cloud data center networks loads using stochasticand neural modelsrdquo in Proceedings of 2011 6th InternationalConference on System of Systems Engineering pp 276ndash281IEEE Albuquerque NM USA June 2011

[2] M Borkowski S Schulte and C Hochreiner ldquoPredictingcloud resource utilizationrdquo in Proceedings of 9th InternationalConference on Utility and Cloud Computing (UCC) pp 37ndash42IEEE Shanghai China December 2016

[3] M Barati and S Sharifian ldquoA hybrid heuristic-based tunedsupport vector regression model for cloud load predictionrdquoJournal of Supercomputing vol 71 no 11 pp 4235ndash42592015

[4] Z Chen Y Zhu Y Di S Feng and J Geng ldquoA high-accuracyself-adaptive resource demands predicting method in IaaScloud environmentrdquo Neural Network World vol 25 no 5pp 519ndash540 2015

[5] J Chen and Y Wang ldquoA resource demand prediction methodbased on EEMD in cloud computingrdquo Procedia ComputerScience vol 131 pp 116ndash123 2018

[6] H Toumi Z Brahmi Z Benarfa and M M GammoudildquoServer load prediction using stream miningrdquo in Proceedingsof 2017 International Conference on Information Networking(ICOIN) pp 653ndash661 IEEE Da Nang Vietnam January2017

[7] S Di D Kondo and W Cirne ldquoHost load prediction in aGoogle compute cloud with a Bayesian modelrdquo in Proceedingsof 2012 International Conference on High PerformanceComputing Networking Storage and Analysis (SCrsquo12)pp 1ndash11 IEEE Salt Lake City UT USA November 2012

[8] N K Gondhi and P Kailu ldquoPrediction based energy efficientvirtual machine consolidation in cloud computingrdquo in Pro-ceedings of 2015 Second International Conference on Advancesin Computing and Communication Engineering pp 437ndash441IEEE Dehradun India May 2015

[9] A Verma G Dasgupta T K Nayak P De and R KotharildquoServer workload analysis for power minimization usingconsolidationrdquo in Proceedings of the 2009 Conference onUSENIX Annual Technical Conference p 28 USENIX As-sociation San Diego CA USA June 2009

[10] B Song Y Yu Y Zhou Z Wang and S Du ldquoHost loadprediction with long short-term memory in cloud comput-ingrdquo Journal of Supercomputing vol 74 no 12 pp 6554ndash6568 2018

[11] J-J Jheng F-H Tseng H-C Chao and L-D Chou ldquoA novelVM workload prediction using grey forecasting model incloud data centerrdquo in Proceedings of International Conferenceon Information Networking 2014 (ICOIN2014) pp 40ndash45IEEE Phuket +ailand February 2014

[12] A Beloglazov and R Buyya ldquoManaging overloaded hosts fordynamic consolidation of virtual machines in cloud datacenters under quality of service constraintsrdquo IEEE Trans-actions on Parallel and Distributed Systems vol 24 no 7pp 1366ndash1379 2013

[13] M Dabbagh B Hamdaoui M Guizani and A Rayes ldquoAnenergy-efficient VM prediction and migration framework forovercommitted cloudsrdquo IEEE Transactions on Cloud Com-puting vol 6 no 4 pp 955ndash966 2018

[14] D Minarolli A Mazrekaj and B Freisleben ldquoTackling un-certainty in long-term predictions for host overload andunderload detection in cloud computingrdquo Journal of CloudComputing vol 6 no 1 p 4 2017

[15] K Mason M Duggan E Barrett J Duggan and E HowleyldquoPredicting host CPU utilization in the cloud using evolu-tionary neural networksrdquo Future Generation Computer Sys-tems vol 86 pp 162ndash173 2018

[16] D. Magalhães, R. N. Calheiros, R. Buyya, and D. G. Gomes, “Workload modeling for resource usage analysis and simulation in cloud computing,” Computers & Electrical Engineering, vol. 47, pp. 69–81, 2015.

[17] C. Tian, Y. Wang, Y. Luo et al., “Minimizing content reorganization and tolerating imperfect workload prediction for cloud-based video-on-demand services,” IEEE Transactions on Services Computing, vol. 9, no. 6, pp. 926–939, 2016.

[18] M. Verma, G. R. Gangadharan, V. Ravi, and N. Narendra, “Resource demand prediction in multi-tenant service clouds,” in Proceedings of 2013 IEEE International Conference on Cloud Computing in Emerging Markets (CCEM), pp. 1–8, IEEE, Bangalore, India, October 2013.

[19] W. Zhang, Y. Shi, L. Liu, L. Cui, and Y. Zheng, “Performance and resource prediction at high utilization for n-tier service systems in cloud: an experiment driven approach,” in Proceedings of 2015 IEEE International Conference on Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing, pp. 843–848, IEEE, Liverpool, UK, October 2015.

[20] G. Kecskemeti, A. Kertesz, and Z. Nemeth, “Cloud workload prediction by means of simulations,” in Proceedings of the Computing Frontiers Conference (CF’17), pp. 279–282, ACM, Siena, Italy, May 2017.

[21] Y. Chen and Z.-A. Jiang, “Dynamically predicting the quality of service: batch, online, and hybrid algorithms,” Journal of Electrical and Computer Engineering, vol. 2017, Article ID 9547869, 10 pages, 2017.

[22] A. Khan, X. Yan, S. Tao, and N. Anerousis, “Workload characterization and prediction in the cloud: a multiple time series approach,” in Proceedings of 2012 IEEE Network Operations and Management Symposium, pp. 1287–1294, IEEE, Maui, HI, USA, April 2012.

[23] A. K. Mishra, J. L. Hellerstein, W. Cirne, and C. R. Das, “Towards characterizing cloud backend workloads,” ACM SIGMETRICS Performance Evaluation Review, vol. 37, no. 4, pp. 34–41, 2010.

[24] D. Gmach, J. Rolia, L. Cherkasova, and A. Kemper, “Workload analysis and demand prediction of enterprise data center applications,” in Proceedings of 2007 IEEE 10th International Symposium on Workload Characterization, pp. 171–180, IEEE, Boston, MA, USA, September 2007.

[25] F.-H. Tseng, X. Wang, L.-D. Chou, H.-C. Chao, and V. C. M. Leung, “Dynamic resource prediction and allocation for cloud data center using the multiobjective genetic algorithm,” IEEE Systems Journal, vol. 12, no. 2, pp. 1688–1699, 2018.

[26] G. K. Shyam and S. S. Manvi, “Virtual resource prediction in cloud environment: a Bayesian approach,” Journal of Network and Computer Applications, vol. 65, pp. 144–154, 2016.

[27] Y. Lu, J. Panneerselvam, L. Liu, and Y. Wu, “RVLBPNN: a workload forecasting model for smart cloud computing,” Scientific Programming, vol. 2016, Article ID 5635673, 9 pages, 2016.

[28] K. Rajaram and M. P. Malarvizhi, “Utilization based prediction model for resource provisioning,” in Proceedings of 2017 International Conference on Computer, Communication and Signal Processing (ICCCSP), pp. 1–6, IEEE, Chennai, India, January 2017.

Journal of Electrical and Computer Engineering 13

[29] L. Li and A. Zhang, “Resource demand optimization combined prediction under cloud computing environment based on IOWGA operator,” International Journal of Grid and Distributed Computing, vol. 8, no. 3, pp. 77–86, 2015.

[30] D. Minarolli and B. Freisleben, “Cross-correlation prediction of resource demand for virtual machine resource allocation in clouds,” in Proceedings of 2014 Sixth International Conference on Computational Intelligence, Communication Systems and Networks, pp. 119–124, IEEE, Tetova, Macedonia, May 2014.

[31] W. Zhang, P. Duan, L. T. Yang et al., “Resource requests prediction in the cloud computing environment with a deep belief network,” Software: Practice and Experience, vol. 47, no. 3, pp. 473–488, 2017.

[32] H.-B. Mi, H.-M. Wang, G. Yin, D.-X. Shi, Y.-F. Zhou, and L. Yuan, “Resource on-demand reconfiguration method for virtualized data centers,” Journal of Software, vol. 22, no. 9, pp. 2193–2205, 2011.

[33] R. N. Calheiros, E. Masoumi, R. Ranjan, and R. Buyya, “Workload prediction using ARIMA model and its impact on cloud applications’ QoS,” IEEE Transactions on Cloud Computing, vol. 3, no. 4, pp. 449–458, 2015.

[34] Y. Meng, R. Rao, X. Zhang, and P. Hong, “CRUPA: a container resource utilization prediction algorithm for auto-scaling based on time series analysis,” in Proceedings of 2016 International Conference on Progress in Informatics and Computing (PIC), pp. 468–472, IEEE, Shanghai, China, December 2016.

[35] E. Dhib, N. Zangar, N. Tabbane, and K. Boussetta, “Impact of seasonal ARIMA workload prediction model on QoE for massively multiplayers online gaming,” in Proceedings of 2016 5th International Conference on Multimedia Computing and Systems (ICMCS), pp. 737–741, IEEE, Marrakech, Morocco, September 2016.

[36] A. Ganapathi, Y. Chen, A. Fox, R. Katz, and D. Patterson, “Statistics-driven workload modeling for the cloud,” in Proceedings of 2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010), pp. 87–92, IEEE, Long Beach, CA, USA, March 2010.

[37] V. G. Tran, V. Debusschere, and S. Bacha, “Hourly server workload forecasting up to 168 hours ahead using seasonal ARIMA model,” in Proceedings of 2012 IEEE International Conference on Industrial Technology, pp. 1127–1131, IEEE, Athens, Greece, March 2012.

[38] D. Xu, S. Yang, and H. Luo, “Research on generalized fuzzy soft sets theory based combined model for demanded cloud computing resource prediction,” Chinese Journal of Management Science, vol. 23, no. 5, pp. 56–64, 2015.

[39] S. Li, Y. Wang, X. Qiu, D. Wang, and L. Wang, “A workload prediction-based multi-VM provisioning mechanism in cloud computing,” in Proceedings of 2013 15th Asia-Pacific Network Operations and Management Symposium (APNOMS), pp. 1–6, IEEE, Hiroshima, Japan, September 2013.

[40] X. Fu and C. Zhou, “Predicted affinity based virtual machine placement in cloud computing environments,” IEEE Transactions on Cloud Computing, vol. 99, p. 1, 2017.

[41] Y. Jiang, C. Perng, T. Li, and R. Chang, “ASAP: a self-adaptive prediction system for instant cloud resource demand provisioning,” in Proceedings of 2011 IEEE 11th International Conference on Data Mining, pp. 1104–1109, IEEE, Vancouver, BC, Canada, December 2011.

[42] Z. Wu and N. E. Huang, “Ensemble empirical mode decomposition: a noise-assisted data analysis method,” Advances in Adaptive Data Analysis, vol. 1, no. 1, pp. 1–41, 2009.

[43] H. Zang, L. Fan, M. Guo, Z. Wei, G. Sun, and L. Zhang, “Short-term wind power interval forecasting based on an EEMD-RT-RVM model,” Advances in Meteorology, vol. 2016, Article ID 8760780, 10 pages, 2016.

[44] N. Safari, C. Y. Chung, and G. C. D. Price, “Novel multi-step short-term wind power prediction framework based on chaotic time series analysis and singular spectrum analysis,” IEEE Transactions on Power Systems, vol. 33, no. 1, pp. 590–601, 2018.

[45] X. Chen, H. Wang, J. Huang, and H. Ren, “APU degradation prediction based on EEMD and Gaussian process regression,” in Proceedings of 2017 International Conference on Sensing, Diagnostics, Prognostics, and Control (SDPC), pp. 98–104, IEEE, Shanghai, China, August 2017.

[46] C. Yan, C. Yi, X. Wu et al., “Turbine fault trend prediction that based on EEMD and ARIMA models,” Journal of Gansu Sciences, vol. 28, no. 4, pp. 100–106, 2016.

[47] M. Jin, P. Li, L. Zhang et al., “A signal feature method and its application based on EEMD fuzzy entropy and GK clustering,” ACTA Metrologica Sinica, vol. 26, no. 5, pp. 501–505, 2015.

[48] N. E. Huang, Z. Shen, S. R. Long et al., “The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis,” in Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, vol. 454, no. 1971, pp. 903–995, Royal Society, London, UK, March 1998.

[49] Alibaba, cluster-trace-v2017, https://github.com/alibaba/clusterdata.
