Research Article

A Hybrid Least Square Support Vector Machine Model with Parameters Optimization for Stock Forecasting

    Jian Chai,1 Jiangze Du,2 Kin Keung Lai,1,2 and Yan Pui Lee3

1 International Business School, Shaanxi Normal University, Xian 710062, China
2 Department of Management Sciences, City University of Hong Kong, Hong Kong
3 School of Business, Tung Wah College, Hong Kong

    Correspondence should be addressed to Kin Keung Lai; [email protected]

    Received 30 May 2014; Accepted 20 August 2014

    Academic Editor: Shifei Ding

Copyright Β© 2015 Jian Chai et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This paper proposes an EMD-LSSVM (empirical mode decomposition least squares support vector machine) model to analyze the CSI 300 index. A WD-LSSVM (wavelet denoising least squares support vector machine) model is also proposed as a benchmark against which to compare the performance of EMD-LSSVM. Since parameter selection is vital to model performance, different optimization methods are used, including simplex, GS (grid search), PSO (particle swarm optimization), and GA (genetic algorithm). Experimental results show that the EMD-LSSVM model with the GS algorithm outperforms the other methods in predicting stock market movement direction.

    1. Introduction

The stock market is one of the most sophisticated and challenging financial markets, since many factors affect its movement, including government policy, the global economic situation, investors' expectations, and even correlations with other markets [1]. References [2, 3] described financial time series as essentially noisy, dynamic, and deterministically chaotic data sequences. Hence, a precise prediction of stock index movement can help investors decide when to take or shed positions in the stock market and make profits. Many works have been published by researchers aiming to maximize investment profits and minimize risk; predicting the stock market is therefore important and significant.

Neural networks have been successfully applied to forecasting of financial time series during the past two decades [4–6]. Neural networks are general function approximators which can approximate many nonlinear functions regardless of the properties of the time series data [7]. Besides, neural networks are able to learn dynamic systems, which makes them a more powerful tool for studying financial time series than traditional models [8–10]. However, neural networks have a couple of weaknesses when used for forecasting financial time series. For instance, when the typical back-propagation neural network is applied, a huge number of parameters must be controlled. This makes the solution unstable and causes overfitting. The overfitting problem results in poor performance and has become a critical issue for researchers.

Accordingly, [11] proposed the support vector machine (SVM) model. According to [12–14], there are two advantages of using SVM rather than neural networks. One is that SVM performs better in terms of generalization. Unlike the empirical risk minimization principle of traditional neural networks, SVM reduces generalization error bounds based on the structural risk minimization principle: it seeks an optimal structure by finding a balance between the generalization error and the Vapnik-Chervonenkis (VC) confidence interval. The other advantage is that SVM prevents the model from getting stuck in local minima.

Since the introduction of SVM, it has been developed rapidly in the real world. There are mainly two ways of applying SVM: one is classification and the other is regression. For classification, [15] constructed an SVM-based model to accurately evaluate consumers' credit scores and solve classification problems.



SVM is also widely used in the area of forecasting. Reference [16] used SVM to predict the direction of daily stock prices in the Korea composite stock price index (KOSPI). More recently, [17] applied support vector regression to forecast the Nikkei 225 opening index and the TAIEX closing index after detecting and removing noise by independent component analysis (ICA).

However, the performance of SVM depends mainly on the input data and is sensitive to parameters. Recent empirical studies have demonstrated that model performance is influenced by two aspects: a low signal-to-noise ratio (SNR) and instability of the model specification during the estimation process. For example, [18] investigated hyperparameter selection for support vector machines under different noise distributions to compare model performance. Moreover, [19] applied wavelets to denoise bearing vibration signals by improving the SNR and then identified the best model according to the performances of ANN and SVM.

To improve classification and forecasting accuracy, several researchers, including [20, 21], have shown that combined classifying and forecasting models perform better than any individual model. Also, [22] showed that ensemble empirical mode decomposition (EEMD) can be integrated with an extreme learning machine (ELM) into an effective forecasting model for computer product sales. In this paper, we propose a hybrid EMD-LSSVM (empirical mode decomposition least squares support vector machine) with different parameter optimization algorithms. The experimental results show that the EMD-LSSVM model performs better than the WD-LSSVM (wavelet denoising least squares support vector machine) model. Firstly, we use the empirical mode decomposition and wavelet denoising algorithms to preprocess the original input data. Secondly, the parameters of the SVM are optimized by different methods, including simplex, grid search (GS), particle swarm optimization (PSO), and genetic algorithm (GA). Results from the empirical studies show that the hybrid EMD-LSSVM model with GS parameter optimization outperforms the other models.

    2. EMD-LSSVM Model and WD-LSSVM

2.1. Empirical Mode Decomposition (EMD). References [23, 24] proposed empirical mode decomposition (EMD), which decomposes a data series into a number of intrinsic mode functions (IMFs). It was designed for nonstationary and nonlinear data sets. In order to apply EMD, the time series data set must satisfy the following two conditions.

(1) The total number of local maxima and local minima must equal the total number of zero crossings, or differ from it by at most 1. In other words, every local maximum or minimum must be followed by one zero crossing.

(2) The local average is zero; that is, the mean value of the upper envelope (defined by the local maxima) and the lower envelope (defined by the local minima) must be zero.

Thus, if a function is an IMF, it represents a signal symmetric about a local mean of zero. An IMF is a simple oscillatory mode which is more general than a simple harmonic function, as the frequency and amplitude of an IMF can be variable. A data series x(t) (t = 1, 2, ..., n) can then be decomposed by the following sifting procedure.

(1) Find all local maxima and minima in x(t). Then use cubic spline interpolation to connect all local maxima, generating the upper envelope x_up(t), and to connect all local minima, generating the lower envelope x_low(t).

(2) According to the upper and lower envelopes obtained in Step (1), calculate the envelope mean m_1(t):

\[ m_1(t) = \frac{x_{\mathrm{up}}(t) + x_{\mathrm{low}}(t)}{2}. \tag{1} \]

(3) Subtracting the envelope mean m_1(t) from the data series x(t) gives the first component d_1(t):

\[ d_1(t) = x(t) - m_1(t). \tag{2} \]

(4) Check whether d_1(t) satisfies the IMF requirements; if it does not, go back to Step (1) and replace x(t) with d_1(t) to conduct the second sifting procedure, that is, d_2(t) = d_1(t) - m_2(t). Repeat the sifting procedure k times, d_k(t) = d_{k-1}(t) - m_k(t), until the following stop criterion is satisfied:

\[ \sum_{t=1}^{T} \frac{\left[ d_j(t) - d_{j+1}(t) \right]^2}{d_j^2(t)} < \mathrm{SC}, \tag{3} \]

where SC is the stopping condition, normally set between 0.2 and 0.3. We then obtain the first IMF component, c_1(t) = d_k(t).

(5) Subtract the first IMF component c_1(t) from the data set x(t) to get the residual r_1(t) = x(t) - c_1(t).

(6) Treat r_1(t) as the new data series and repeat Steps (1) to (5) to obtain the new residual r_2(t). Repeating in this way n times, we get

\[ r_2(t) = r_1(t) - c_2(t), \quad r_3(t) = r_2(t) - c_3(t), \quad \ldots, \quad r_n(t) = r_{n-1}(t) - c_n(t). \tag{4} \]

When the residual r_n(t) becomes a monotonic function, the data set cannot be decomposed any further and the EMD is complete. The original data series can then be described as the combination of n IMF components and a mean trend r_n(t); that is,

\[ x(t) = \sum_{j=1}^{n} c_j(t) + r_n(t). \tag{5} \]


In this way, the original data series x(t) can be decomposed into n IMFs and a mean trend function. The IMFs can then be used for instantaneous frequency analysis.

The traditional Fourier transform decomposes a data series into a number of sine or cosine waves for analysis, whereas the EMD technique decomposes the data series into several sinusoid-like signals with variable frequencies and a mean trend function. EMD has several advantages. First, the method is relatively easy to understand and is widely applied since it avoids complex mathematical algorithms. Secondly, EMD is suitable for dealing with nonlinear and nonstationary data series. Thirdly, EMD is well suited to analysing data series with trends, such as weather and economic data. Finally, EMD is able to find the residual, which reveals the trend of the data series [25–27].
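As a concrete illustration of the sifting procedure above, the following is a minimal Python sketch. It assumes evenly sampled data, uses SciPy's cubic splines for the envelopes, and simplifies boundary handling and the IMF checks, so it is an outline of the algorithm rather than a production EMD implementation; the function names are ours, not from the paper.

```python
# Minimal EMD sketch following Steps (1)-(6) above. Assumptions: evenly
# sampled data, simplified boundary handling, and the stop criterion of
# Eq. (3) as the only IMF test.
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def sift_once(x):
    """One sifting pass: subtract the mean of the spline envelopes, Eqs. (1)-(2)."""
    t = np.arange(len(x))
    maxima = argrelextrema(x, np.greater)[0]
    minima = argrelextrema(x, np.less)[0]
    if len(maxima) < 4 or len(minima) < 4:
        return None                               # too few extrema to build envelopes
    upper = CubicSpline(maxima, x[maxima])(t)     # upper envelope x_up(t)
    lower = CubicSpline(minima, x[minima])(t)     # lower envelope x_low(t)
    return x - (upper + lower) / 2.0              # d(t) = x(t) - m(t)

def extract_imf(x, sc=0.25, max_iter=100):
    """Repeat sifting until the criterion of Eq. (3) drops below sc."""
    d = sift_once(x)
    if d is None:
        return None
    for _ in range(max_iter):
        d_next = sift_once(d)
        if d_next is None:
            break
        sd = np.sum((d - d_next) ** 2 / (d ** 2 + 1e-12))
        d = d_next
        if sd < sc:
            break
    return d

def emd(x, max_imfs=10):
    """Decompose x into IMFs plus a residual trend, so that Eq. (5) holds."""
    imfs, r = [], np.asarray(x, dtype=float).copy()
    for _ in range(max_imfs):
        c = extract_imf(r)
        if c is None:                 # residual is (near-)monotonic: stop
            break
        imfs.append(c)
        r = r - c                     # Eq. (4): r_j(t) = r_{j-1}(t) - c_j(t)
    return imfs, r
```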

2.2. Wavelet Denoising Algorithm. While traditional Fourier analysis can only remove noise of certain patterns over the entire time horizon, wavelet analysis can deal with multiple scales and more detailed data and is more suitable for financial time series. Wavelets are continuous functions which satisfy the unit energy and admissibility conditions in

    πΆπœ‘= ∫

    ∞

    0

    πœ‘ (𝑓)

    𝑓

    𝑑𝑓 < ∞, ∫

    ∞

    βˆ’βˆž

    πœ“ (𝑑)

    2

    𝑑𝑑 = 1, (6)

    where πœ‘ is the Fourier transform of frequency𝑓. πœ“ is thewavelet transform.

The continuous wavelet function orthogonally transforms the original data into sub-series in the wavelet domain. Consider

    π‘Š(𝑒, 𝑠) = ∫

    ∞

    βˆ’βˆž

    π‘₯ (𝑑)

    1

    βˆšπ‘ 

    πœ“(

    𝑑 βˆ’ 𝑒

    𝑠

    ) 𝑑𝑑, (7)

    where u is the dilation parameter and s is the translationparameter.

Wavelet synthesis rebuilds the original data series, as guaranteed by the properties of the orthogonal transformation in

    π‘₯ (𝑑) =

    1

    πΆπœ“

    ∫

    ∞

    0

    ∫

    ∞

    βˆ’βˆž

    π‘Š(𝑒, 𝑠) πœ“π‘’,𝑠(𝑑) 𝑑𝑒

    𝑑𝑠

    𝑠2. (8)

In wavelet analysis, the denoising technique separates the data and the noise in the original data sets by selecting a threshold. The raw data series is first decomposed into a number of subsets. Then, based on a chosen strategy for selecting the threshold, the boundary between noise and data is set. Coefficients below the boundary are eliminated and the remaining coefficients are handled according to the chosen thresholding rule. Finally, the denoised data set is rebuilt from the decomposed data points [28].
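To make this pipeline concrete, the following Python sketch performs wavelet threshold denoising with the PyWavelets library. The wavelet ('db4'), the decomposition level, the universal threshold, and the soft-thresholding rule are illustrative assumptions; the paper does not report its exact denoising settings.

```python
# Wavelet threshold denoising sketch (assumed settings: db4 wavelet,
# 4 levels, universal threshold with soft thresholding).
import numpy as np
import pywt

def wavelet_denoise(x, wavelet="db4", level=4):
    coeffs = pywt.wavedec(x, wavelet, level=level)       # decompose the series
    # Noise scale estimated from the finest detail coefficients
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thr = sigma * np.sqrt(2.0 * np.log(len(x)))          # universal threshold
    # Shrink detail coefficients; keep the coarse approximation untouched
    coeffs[1:] = [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(x)]       # rebuild the series
```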

2.3. LSSVM in Function Estimation. This section reviews the basic theory of the least squares support vector machine. The support vector methodology has been used mainly in two areas, namely, classification and function estimation. Consider regression on the set of functions f(x) = Ο‰^T Ο†(x) + b with given training inputs x_k ∈ R^n and outputs y_k ∈ R, where Ο†(x) maps x_k from R^n to R^{n_h}. Notice that Ο†(x) can be infinite dimensional and is defined only implicitly; the vector Ο‰ can likewise be infinite dimensional. Thus, the optimization problem becomes

\[
\begin{aligned}
\min\; & J_P(\omega, \xi, \xi^*) = \frac{1}{2}\,\omega^T \omega + c \sum_{k=1}^{N} (\xi_k + \xi_k^*), \\
\text{s.t.}\; & y_k - \omega^T \varphi(x_k) - b \le \varepsilon + \xi_k, \quad k = 1, \ldots, N, \\
& \omega^T \varphi(x_k) + b - y_k \le \varepsilon + \xi_k^*, \quad k = 1, \ldots, N, \\
& \xi_k, \xi_k^* \ge 0, \quad k = 1, \ldots, N.
\end{aligned} \tag{9}
\]

The constant c > 0 defines the tolerance for deviations from the desired Ξ΅ accuracy; it sets the weight of the empirical risk relative to the regularization term. The larger c is, the more weight the empirical risk carries compared with the regularization term. Ξ΅ is called the tube size and represents the accuracy required at the training data points.

By introducing Lagrange multipliers Ξ±, Ξ±*, Ξ·, Ξ·* β‰₯ 0, we obtain the Lagrangian for this problem. Consider

\[
\begin{aligned}
L(\omega, b, \xi_k, \xi_k^*; \alpha, \alpha^*, \eta, \eta^*)
= {} & \frac{1}{2}\,\omega^T \omega + c \sum_{k=1}^{N} (\xi_k + \xi_k^*) \\
& - \sum_{k=1}^{N} \alpha_k \left( \varepsilon + \xi_k - y_k + \omega^T \varphi(x_k) + b \right) \\
& - \sum_{k=1}^{N} \alpha_k^* \left( \varepsilon + \xi_k^* + y_k - \omega^T \varphi(x_k) - b \right) \\
& - \sum_{k=1}^{N} \left( \eta_k \xi_k + \eta_k^* \xi_k^* \right).
\end{aligned} \tag{10}
\]

The reason for introducing the second Lagrange multiplier Ξ±_k^* is that there is a second set of slack variables ΞΎ_k^*. By maximizing the Lagrangian,

\[ \max_{\alpha, \alpha^*, \eta, \eta^*} \; \min_{\omega, b, \xi_k, \xi_k^*} L(\omega, b, \xi_k, \xi_k^*; \alpha, \alpha^*, \eta, \eta^*), \tag{11} \]

we obtain

    πœ•πΏ

    πœ•πœ”

    = 0 β†’ πœ” =

    𝑁

    βˆ‘

    π‘˜ = 1

    (π›Όπ‘˜βˆ’ π›Όβˆ—

    π‘˜) πœ‘ (π‘₯π‘˜) ,

    πœ•πΏ

    πœ•π‘

    = 0 β†’

    𝑁

    βˆ‘

    π‘˜ = 1

    (π›Όπ‘˜βˆ’ π›Όβˆ—

    π‘˜) = 0,

    πœ•πΏ

    πœ•πœ‰π‘˜

    = 0 β†’ 𝑐 βˆ’ π›Όπ‘˜βˆ’ πœ‚π‘˜= 0,

    πœ•πΏ

    πœ•πœ‰βˆ—

    π‘˜

    = 0 β†’ 𝑐 βˆ’ π›Όβˆ—

    π‘˜βˆ’ πœ‚βˆ—

    π‘˜= 0.

    (12)


Table 1: Input and output variables.

Category                | Input variables                                                                              | Output variable | Sample                | Data number
Training and validation | CSI 300, USDX, SHIBOR, REPO, CDS, PE, M2/mkt cap, short-mid note/mkt cap, New Loan/mkt cap   | CSI 300         | 05/01/2009–23/08/2011 | 643
Testing                 | (same as above)                                                                              | CSI 300         | 24/08/2011–20/01/2012 | 100

Then we obtain the following dual problem:

\[
\begin{aligned}
\max_{\alpha, \alpha^*}\; J_D(\alpha, \alpha^*) = {} & -\frac{1}{2} \sum_{k,l=1}^{N} (\alpha_k - \alpha_k^*)(\alpha_l - \alpha_l^*)\, K(x_k, x_l) \\
& - \varepsilon \sum_{k=1}^{N} (\alpha_k + \alpha_k^*) + \sum_{k=1}^{N} y_k (\alpha_k - \alpha_k^*), \\
\text{s.t.}\; & \sum_{k=1}^{N} (\alpha_k - \alpha_k^*) = 0, \quad \alpha_k, \alpha_k^* \in [0, c].
\end{aligned} \tag{13}
\]

Here we use the kernel function K(x_k, x_l) = Ο†(x_k)^T Ο†(x_l) for k, l = 1, ..., N. The function estimate then becomes

\[ f(x) = \sum_{k=1}^{N} (\alpha_k - \alpha_k^*)\, K(x, x_k) + b, \tag{14} \]

    where π›Όπ‘˜, π›Όβˆ—

    π‘˜are solutions of the above quadratic program-

    ming problem and 𝑏 is obtained from the complementarityof KKT conditions. It is obvious that the decision functionis determined by the support vectors in which coefficients(π›Όπ‘˜βˆ’ π›Όβˆ—

    π‘˜) are not zero. In practice, a larger πœ€ results in a

    smaller number of support vectors and thus the sparser of thesolution. Also, the larger the πœ€ is, the worse the accuracy oftraining points will be. Hence, πœ€ can be applied to control thebalance between closeness to training data and sparseness ofthe solution.

A kernel function can be obtained by seeking a function which satisfies Mercer's condition. Some popular kernel functions are [14, 29, 30]:

linear: K(x, x_k) = x^T x_k;
polynomial: K(x, x_k) = (x^T x_k + 1)^d, where d is the degree of the polynomial kernel;
RBF: K(x, x_k) = exp(-||x - x_k||^2 / σ²), where σ² is the bandwidth of the Gaussian kernel.

The parameters of the kernel function define the structure of the high-dimensional feature space Ο†(x) and also control the accuracy of the final solution, so they should be selected carefully.
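As a sketch of the training step: the least squares variant used in the experiments replaces the inequality constraints above with equality constraints and a squared error term, so that training reduces to solving one linear system (the standard Suykens LSSVM formulation). The code below is our illustrative implementation with an RBF kernel; the paper does not publish its code, and Ξ³ (regularization) and σ² (bandwidth) are exactly the two hyperparameters tuned in Section 3.

```python
# Minimal LSSVM regression sketch (RBF kernel). Training solves
#   [ 0   1^T         ] [b]       [0]
#   [ 1   K + I/gamma ] [alpha] = [y],   f(x) = sum_k alpha_k K(x, x_k) + b.
import numpy as np

def rbf_kernel(A, B, sigma2):
    # K(x, x') = exp(-||x - x'||^2 / sigma2), the RBF kernel listed above
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / sigma2)

def lssvm_fit(X, y, gamma=10.0, sigma2=1.0):
    n = len(y)
    K = rbf_kernel(X, X, sigma2)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                       # top row: sum of alphas equals zero
    A[1:, 0] = 1.0                       # bias column
    A[1:, 1:] = K + np.eye(n) / gamma    # regularized kernel matrix
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]               # bias b, support values alpha

def lssvm_predict(X_train, b, alpha, X_new, sigma2=1.0):
    return rbf_kernel(X_new, X_train, sigma2) @ alpha + b
```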

    3. Empirical Study

3.1. Data Description. The CSI 300 is chosen for the empirical analysis to examine the performance of the proposed model. This index comprises 179 stocks from the Shanghai stock exchange and 121 stocks from the Shenzhen stock exchange and is managed by the China Securities Index Company Ltd.

Most researchers have chosen international indices in the past, including the S&P 500, NIKKEI 225, NASDAQ, DAX, and the gold price, as input variables, and have examined the cross relationship between stock market indices and macroeconomic variables. The potential input variables for a forecasting model mainly consist of the gross domestic product (GDP), gross national product (GNP), short-term interest rate (ST), long-term interest rate (LT), and term structure of interest rates (TS) [1, 31, 32].

Although China has overtaken Japan to become the world's second largest economy and the Chinese stock market has developed into one of the most important markets in the global economy, Chinese consumption capacity is limited in the domestic market. The movement of the stock market is closely related to the money available to investors, which is determined by the money supply and the interest rate. Considering that the Chinese stock market is affected by the global economic situation as well as by domestic economic development, we choose the US Dollar Index (USDX), Shanghai Interbank Offered Rate (SHIBOR), P/E ratio (PE), money supply (M2), repurchase agreement (REPO), China CNY Monthly New Loan, market capitalization of the 300 publicly traded companies (mkt cap), People's Bank 5-year CDS, and short-mid note as input variables.

The lag of the input variables is 3 days. We use daily data to predict the CSI 300 index by nonlinear SVM regression. Since M2, the short-mid note, and New Loan are published once a month, we transform these variables into daily variables by dividing them by a daily variable (market capitalization, as in Table 1). We divided all data into two sections: the first section is used for training, to find the optimal parameters for the LSSVM and to avoid overfitting through training and validation; the other section is used for testing. As shown in Table 1, we choose nine input variables and one output variable, with 643 daily observations from May 1, 2009, to August 23, 2011, used to train the parameters of the model. Once these parameters are obtained, we use the same input and output variables from August 24, 2011, to January 20, 2012, comprising 100 daily observations, to examine the performance of the different models in the testing part.
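A hedged sketch of this data setup is shown below: 3-day-lagged inputs, [0, 1] scaling using training-range statistics, and the 643/100 split. The pandas DataFrame `df` and its column names are hypothetical placeholders, since the paper's data files are not provided.

```python
# Illustrative data preparation: lag inputs by 3 days, scale to [0, 1]
# on training-range statistics, then split 643 training / 100 testing rows.
import numpy as np
import pandas as pd

def build_dataset(df, feature_cols, target_col="CSI300", lag=3, n_train=643):
    X = df[feature_cols].shift(lag).dropna()     # inputs lagged 3 days
    y = df[target_col].loc[X.index]
    lo, hi = X.iloc[:n_train].min(), X.iloc[:n_train].max()
    X = (X - lo) / (hi - lo)                     # min-max scale to [0, 1]
    return X.to_numpy(float), y.to_numpy(float)

# Hypothetical usage with placeholder column names:
# X, y = build_dataset(df, ["CSI300", "USDX", "SHIBOR", "REPO", "CDS",
#                           "PE", "M2_mktcap", "note_mktcap", "loan_mktcap"])
# X_train, y_train = X[:643], y[:643]
# X_test, y_test = X[643:743], y[643:743]
```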

In the hybrid wavelet denoising least squares support vector machine model (WD-LSSVM), we first denoise the CSI 300 index with the wavelet denoising technique. As shown in Figure 1, the original data, depicted in the upper panel, is packed with irrelevant noise. The wavelet denoising algorithm is then applied to reduce this noise, and the denoised data is depicted in the lower panel of Figure 1.


[Figure 1: The original and denoised daily CSI 300 index. Two panels, each plotting the CSI 300 index (1500–4000) against the date (dd/mm/yyyy) over the training period: original data (top) and denoised data (bottom).]

Table 2: Parameter settings for simplex, GS, GA, and PSO.

Method  | Parameter setting
Simplex | Chi: 2, Gamma: 0.5, Rho: 1, Sigma: 0.5
GS      | TolX: 0.001, maxFunEvals: 70, grain: 7, zoomfactor: 5
GA      | Sizepop: 20, maxgen: 200, c_min: 0, c_max: 100, g_min: 0, g_max: 1000
PSO     | Sizepop: 20, maxgen: 200, c_min: 0, c_max: 100, g_min: 0, g_max: 1000, k: 0.6

It is clear that the denoised data can better reveal the trend of the index. Also, in both the EMD-LSSVM and WD-LSSVM models, we preprocess the input data by scaling it to the range [0, 1], to prevent small numbers in the data sets from being overshadowed by large numbers, which would result in a loss of information.

3.2. Optimization Methods and Parameter Settings. In both the EMD-LSSVM and WD-LSSVM models, we try four kinds of search methods: simplex, GS, GA, and PSO. In the simplex method, we define the parameters of expansion (Chi), contraction (Gamma), reflection (Rho), and shrinkage (Sigma) and obtain the optimal parameters for the SVM through iteration until the stopping criterion is satisfied. For grid search, by evaluating the objective function we obtain all points on the grids, which are related to the search range and the unit grid size.

The optimal parameters are obtained from the point with the lowest cost. Another effective method for solving the optimization problem is the genetic algorithm. The first step of this method is to randomly select parents from the population; the parents then produce children continuously. Step by step, the population develops, and the optimal solution is obtained when the stopping criteria are met. The PSO algorithm works by moving candidate solutions (particles) within the given search range. These particles are moved according to the best known positions of the individual particles and of the entire swarm in the search space.

Table 3: Performance metrics and their calculations.

\[ \mathrm{NMSE} = \frac{\sum_{i=1}^{n} (a_i - p_i)^2}{\delta^2 n}, \qquad \delta^2 = \frac{\sum_{i=1}^{n} (a_i - \bar{a})^2}{n - 1} \]

\[ \mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{a_i - p_i}{p_i} \right| \times 100\% \]

\[ \mathrm{HR} = \frac{\sum_{i=1}^{n} d_i}{n}, \qquad d_i = \begin{cases} 1 & \text{if } (a_i - a_{i-1})(p_i - p_{i-1}) \ge 0 \\ 0 & \text{otherwise} \end{cases} \]

Table 4: Results of eight different forecasting models.

Model               | NMSE   | MAPE     | HR
EMD-LSSVM (simplex) | 0.0253 | 0.79222% | 77.7778%
EMD-LSSVM (GS)      | 0.0245 | 0.78834% | 79.798%
EMD-LSSVM (GA)      | 2.5749 | 9.0641%  | 42.4242%
EMD-LSSVM (PSO)     | 9.4471 | 18.2733% | 40.404%
WD-LSSVM (simplex)  | 0.0521 | 1.1772%  | 65.6566%
WD-LSSVM (GS)       | 0.0609 | 1.1357%  | 61.6162%
WD-LSSVM (GA)       | 0.0657 | 1.2997%  | 62.6263%
WD-LSSVM (PSO)      | 0.0910 | 1.5072%  | 63.6364%

When the particles arrive at a better position, they guide the movement of the swarm. The procedure is repeated until the stopping criteria are satisfied. Table 2 shows the settings used for each optimization method in our experiments.
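As an illustration of the grid search that performed best in the experiments, the sketch below scores each (c, σ²) grid point on a validation split, reusing the lssvm_fit/lssvm_predict sketch from Section 2.3. The grid ranges are assumptions; the grain and zoom factor of Table 2 describe an iterative zooming grid search, which is simplified to a single pass here.

```python
# Single-pass grid search over (c, sigma2), scored by validation MSE.
# Ranges are illustrative; the zooming refinement implied by Table 2
# is not reproduced.
import numpy as np

def grid_search(X_tr, y_tr, X_val, y_val, fit, predict):
    best_pair, best_err = None, np.inf
    for c in np.logspace(-1, 3, 7):              # regularization candidates
        for sigma2 in np.logspace(-2, 2, 7):     # RBF bandwidth candidates
            b, alpha = fit(X_tr, y_tr, gamma=c, sigma2=sigma2)
            pred = predict(X_tr, b, alpha, X_val, sigma2=sigma2)
            err = np.mean((pred - y_val) ** 2)
            if err < best_err:
                best_pair, best_err = (c, sigma2), err
    return best_pair

# Usage (with the LSSVM sketch from Section 2.3):
# c_best, sigma2_best = grid_search(X_tr, y_tr, X_val, y_val,
#                                   lssvm_fit, lssvm_predict)
```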

3.3. Performance Criteria. We evaluate the performance of these models using three measures: the normalized mean squared error (NMSE), the mean absolute percentage error (MAPE), and the hit rate (HR) (Table 3). NMSE and MAPE measure the deviation of the predicted values from the actual values; smaller values of NMSE and MAPE indicate better model performance and, in the stock market, help control investment risk. We also introduce the hit rate to evaluate the models, since HR reveals the accuracy of the predicted direction of the CSI 300, which is valuable for individual and institutional traders.
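The three metrics of Table 3 translate directly into code; the sketch below is a straightforward NumPy rendering, where a and p are the arrays of actual and predicted values.

```python
# Direct implementations of the Table 3 metrics.
import numpy as np

def nmse(a, p):
    delta2 = np.sum((a - a.mean()) ** 2) / (len(a) - 1)   # sample variance
    return np.sum((a - p) ** 2) / (delta2 * len(a))

def mape(a, p):
    return np.mean(np.abs((a - p) / p)) * 100.0           # percent

def hit_rate(a, p):
    # Fraction of days where predicted and actual changes share a sign
    return np.mean(np.diff(a) * np.diff(p) >= 0)
```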

3.4. Experiment Results. The experiments explore four parameter selection methods in both EMD-LSSVM and WD-LSSVM. The results are given in Table 4. From the results, we can see that the hybrid EMD-LSSVM model with the GS parameter optimization method not only has the smallest NMSE and MAPE but also achieves the best hit rate, which means that it outperforms the other models with the different parameter search methods.

From the experiment results, we can draw three conclusions.


(1) For overall accuracy, EMD-LSSVM (GS) is the best approach, followed by EMD-LSSVM (simplex), WD-LSSVM (simplex), WD-LSSVM (PSO), WD-LSSVM (GA), and WD-LSSVM (GS). The hit rates of the other approaches are below 60%. The prediction accuracy of all methods is also related to the chosen sample, so it is difficult to identify which model is the best overall; however, tests based on the same sample can help identify the best model among them.

(2) According to the experiments, PSO and GA need more computational time to obtain the best parameters for the model than the simplex and GS optimization methods. Although the PSO and GA algorithms are more complex than the other two methods, they do not perform better than GS and simplex.

(3) Another interesting finding is that the thresholds of the denoising algorithm also influence the performance of the model. When the threshold is too large, useful information in the data is damaged, while a small threshold makes the denoising process insignificant in handling the noise. Therefore, we argue that the performance of the wavelet denoising algorithm is sensitive to the method used to estimate the threshold level.

    4. Conclusion

We have examined the use of the hybrid EMD-LSSVM and WD-LSSVM models to predict financial time series with four different parameter selection methods. The study shows that the hybrid EMD-LSSVM model provides a better way to forecast financial time series than WD-LSSVM. The key findings cover two aspects. First, empirical mode decomposition can serve as a potential tool for removing noise from the original data during the modeling process and improving prediction accuracy. Second, we compared four kinds of parameter search methods in the experiments; the results show that EMD-LSSVM with the GS parameter optimization method provides the best performance. Use of the GS algorithm reduces the computation time and improves the prediction accuracy of the model for forecasting financial time series.

Future research in this direction mainly includes gaining a better understanding of the relationship between the optimal loss function, the noise distribution, and the number of training samples. In this paper, we only considered applying different algorithms to denoise the original data, without considering the distribution of the noise; studying the density of the noise to be removed for the SVM model will be a focus of our future effort. Moreover, another interesting research direction is to determine the minimum number of samples for which a theoretically optimal loss function will indeed have superior generalization performance.

    Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

    References

[1] W. Huang, Y. Nakamori, and S.-Y. Wang, "Forecasting stock market movement direction with support vector machine," Computers and Operations Research, vol. 32, no. 10, pp. 2513–2522, 2005.
[2] J. W. Hall, "Adaptive selection of U.S. stocks with neural nets," in Trading on the Edge: Neural, Genetic, and Fuzzy Systems for Chaotic Financial Markets, G. J. Deboeck, Ed., John Wiley & Sons, New York, NY, USA, 1994.
[3] Y. S. Abu-Mostafa and A. F. Atiya, "Introduction to financial forecasting," Applied Intelligence, vol. 6, no. 3, pp. 205–213, 1996.
[4] W. Cheng, L. Wagner, and C.-H. Lin, "Forecasting the 30-year US treasury bond with a system of neural networks," Journal of Computational Intelligence in Finance, vol. 4, pp. 10–16, 1996.
[5] R. Sharda and R. B. Patil, "A connectionist approach to time series prediction: an empirical test," in Neural Networks in Finance and Investing: Using Artificial Intelligence to Improve Real World Performance, R. R. Trippi and E. Turban, Eds., Irwin Professional Publishing, Chicago, Ill, USA, 1996.
[6] J. R. van Eyden, The Application of Neural Networks in the Forecasting of Share Prices, Finance and Technology Publishing, Haymarket, Va, USA, 1996.
[7] I. Kaastra and M. S. Boyd, "Forecasting futures trading volume using neural networks," Journal of Futures Markets, vol. 15, pp. 853–970, 1995.
[8] G. Zhang and M. Y. Hu, "Neural network forecasting of the British pound/US dollar exchange rate," Omega, vol. 26, no. 4, pp. 495–506, 1998.
[9] W.-C. Chiang, T. L. Urban, and G. W. Baldridge, "A neural network approach to mutual fund net asset value forecasting," Omega, vol. 24, no. 2, pp. 205–215, 1996.
[10] F. E. H. Tay and L. Cao, "Application of support vector machines in financial time series forecasting," Omega, vol. 29, no. 4, pp. 309–317, 2001.
[11] V. N. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, NY, USA, 2nd edition, 2000.
[12] K.-R. Muller, A. J. Smola, G. Ratsch, B. Scholkopf, J. Kohlmorgen, and V. N. Vapnik, "Predicting time series with support vector machines," in Proceedings of the International Conference on Artificial Neural Networks, pp. 999–1004, Lausanne, Switzerland, 1997.
[13] S. Mukherjee, E. Osuna, and F. Girosi, "Nonlinear prediction of chaotic time series using support vector machines," in Proceedings of the IEEE Workshop on Neural Networks for Signal Processing (NNSP '97), pp. 511–520, Amelia Island, Fla, USA, September 1997.
[14] V. N. Vapnik, S. E. Golowich, and A. Smola, "Support vector method for function approximation, regression estimation, and signal processing," Advances in Neural Information Processing Systems, vol. 9, pp. 281–287, 1996.
[15] C.-L. Huang, M.-C. Chen, and C.-J. Wang, "Credit scoring with a data mining approach based on support vector machines," Expert Systems with Applications, vol. 33, no. 4, pp. 847–856, 2007.
[16] K.-J. Kim, "Financial time series forecasting using support vector machines," Neurocomputing, vol. 55, no. 1-2, pp. 307–319, 2003.
[17] C.-J. Lu, T.-S. Lee, and C.-C. Chiu, "Financial time series forecasting using independent component analysis and support vector regression," Decision Support Systems, vol. 47, no. 2, pp. 115–125, 2009.
[18] V. Cherkassky and Y. Ma, "Practical selection of SVM parameters and noise estimation for SVM regression," Neural Networks, vol. 17, no. 1, pp. 113–126, 2004.
[19] G. S. Vijay, H. S. Kumar, P. P. Srinivasa, N. S. Sriram, and R. B. K. N. Rao, "Evaluation of effectiveness of wavelet based denoising schemes using ANN and SVM for bearing condition classification," Computational Intelligence and Neuroscience, vol. 2012, Article ID 582453, 12 pages, 2012.
[20] L. Zhou, K. K. Lai, and L. Yu, "Least squares support vector machines ensemble models for credit scoring," Expert Systems with Applications, vol. 37, no. 1, pp. 127–133, 2010.
[21] Y. Bao, X. Zhang, L. Yu, K. K. Lai, and S. Wang, "An integrated model using wavelet decomposition and least squares support vector machines for monthly crude oil prices forecasting," New Mathematics and Natural Computation, vol. 7, no. 2, pp. 299–311, 2011.
[22] C.-J. Lu and Y. E. Shao, "Forecasting computer products sales by integrating ensemble empirical mode decomposition and extreme learning machine," Mathematical Problems in Engineering, vol. 2012, Article ID 831201, 15 pages, 2012.
[23] N. E. Huang, Z. Shen, S. R. Long et al., "The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis," Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, vol. 454, no. 1971, pp. 903–995, 1998.
[24] N. E. Huang, Z. Shen, and S. R. Long, "A new view of nonlinear water waves: the Hilbert spectrum," Annual Review of Fluid Mechanics, vol. 31, pp. 417–457, 1999.
[25] L. Yu, S. Wang, and K. K. Lai, "An EMD-based neural network ensemble learning model for world crude oil spot price forecasting," in Soft Computing Applications in Business, B. Prasad, Ed., vol. 230 of Studies in Fuzziness and Soft Computing, pp. 261–271, Springer, 2008.
[26] L. Yu, K. K. Lai, S. Wang, and K. He, "Oil price forecasting with an EMD-based multiscale neural network learning paradigm," in International Conference on Computational Science, pp. 925–932, 2007.
[27] S. Zhou and K. K. Lai, "An improved EMD online learning-based model for gold market forecasting," in Proceedings of the 3rd International Conference on Intelligent Decision Technologies, pp. 75–84, 2011.
[28] K. He, C. Xie, and K. K. Lai, "Estimating real estate value-at-risk using wavelet denoising and time series model," in Computational Scienceβ€”ICCS 2008, vol. 5102 of Lecture Notes in Computer Science, pp. 494–503, Springer, Berlin, Germany, 2008.
[29] S. Zhou, K. K. Lai, and J. Yen, "A dynamic meta-learning rate-based model for gold market forecasting," Expert Systems with Applications, vol. 39, no. 6, pp. 6168–6173, 2012.
[30] C.-W. Hsu, C.-C. Chang, and C.-J. Lin, A Practical Guide to Support Vector Classification, Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan, 2003.
[31] J. Lakonishok, A. Shleifer, and R. W. Vishny, "Contrarian investment, extrapolation, and risk," Journal of Finance, vol. 49, pp. 1541–1578, 1994.
[32] M. T. Leung, H. Daouk, and A.-S. Chen, "Forecasting stock indices: a comparison of classification and level estimation models," International Journal of Forecasting, vol. 16, no. 2, pp. 173–190, 2000.
