Modeling and Forecasting Stock Index Returns using Intermarket Factor Models

Predicting Returns and Return Spreads using Multiple Regression and Classification

Emil Tingstrom

SA104X Degree Project in Mathematical Statistics

Department of Mathematical Statistics, Royal Institute of Technology

Supervisor: Henrik Hult

Stockholm 2015

Abstract

The purpose of this thesis is to examine the predictability of stock indices with regression models based on intermarket factors. The underlying idea is that there is some correlation between past price changes and future price changes, and that models attempting to capture this could be improved by including information derived from correlated assets to make predictions of future price changes. The models are tested using the daily returns from Swedish stock indices and evaluated both from a portfolio perspective and for statistical significance. Prediction of the direction of the price is also tested by Support Vector Machine classification on the OMXS30 index. The results indicate that there is some predictability in the market, in disagreement with the random walk hypothesis.

Summary (Sammanfattning)

The purpose of this thesis is to examine predictable tendencies in stock indices using regression models based on intermarket factors. The underlying idea is that there is a certain correlation between past and future price movements, and that models attempting to capture it can be improved by including information from correlated assets to predict future price changes. The models are tested with daily data on Swedish stock indices and evaluated from a portfolio perspective and for statistical significance. Predictions of the direction of the price are also tested through classification with a Support Vector Machine on the OMXS30 index. The results indicate that there are some predictable tendencies, contrary to the hypothesis of random stock prices.

Acknowledgments. I would like to thank my supervisor at the Department of Mathematics, Henrik Hult, for good feedback and supervision. I am also thankful to Margareta Olofsson and CAS/CAW for helpful suggestions during the writing process.

Contents

1 Introduction
1.1 Quantitative trading
1.2 Stock indices
1.3 Purpose
1.4 Outline

2 Initial Data Analysis

3 Model and Methodology
3.1 Ordinary Least Squares multiple regression
3.2 Coefficient of determination R²
3.3 Data snooping
3.4 Akaike Information Criterion
3.5 Normalization
3.6 Portfolio evaluation and Sharpe ratio
3.7 Testing for statistical significance

4 Results and Analysis
4.1 Predicting the next day's return for OMXS30
4.2 Using returns over several days as input
4.3 Prediction of deviations from OMXS30
4.4 Practical trading considerations

5 Support Vector Machine
5.1 Linear SVM
5.1.1 Dual form
5.2 Soft margin and kernels
5.2.1 Nonlinear kernels
5.3 Results for SVM on OMXS30
5.3.1 Results for linear SVM
5.3.2 Results with radial kernel

6 Discussion and Conclusion

References

1 Introduction

This section will introduce the subject of quantitative trading and trading strategies and note some prior research on the subject. The purpose of this paper will then be presented as well as the outline of the report.

1.1 Quantitative trading

With the advent of information technology and computers, approaching the market with quantitative models has become common. Quantitative trading models make use of mathematical and statistical analysis to exploit predictable patterns in financial data on which to base trading decisions.

Strategies based on quantitative trading models can usually be classified as either contrarian or trend following in their approach. Contrarian strategies attempt to trade against price changes, seeking to capitalize when the price returns to its previous equilibrium level. In contrast, trend following strategies attempt to trade in the direction of previous price changes to capitalize on shifts in the balance of supply and demand. The success of these strategies depends on how well past price changes correlate with future price changes. If there was no correlation to exploit, then the logarithm of the price at a point in time, $X_t$, could be represented by

$$X_t = X_{t-1} + \varepsilon_t \tag{1}$$

with $E[\varepsilon_t] = 0$ and zero autocorrelation, $E[\varepsilon_t \varepsilon_\tau] = 0$ for $t \neq \tau$. This is referred to as the random walk model and is consistent with the hypothesis that markets are efficient. Evidence from academic research has suggested that this model might be flawed and that stock index prices exhibit some level of correlation, $E[\varepsilon_t \varepsilon_\tau] \neq 0$, that would lead to predictability. However, it might not be large enough to produce a risk-adjusted return above the risk-free rate after accounting for transaction costs [6].

1.2 Stock indices

The OMXS30 PI stock index is a price index that represents the 30 most heavily traded stocks listed in Stockholm on Nasdaq OMX. It is generally used to track broad market movements in the Swedish stock market since it accounts for a significant share of the total market capitalization listed in Sweden. Other indices seek to track the performance of specific business sectors, which can be used to select stocks, or the performance of a specific segment based on market capitalization, such as stocks with a small capitalization.

1.3 Purpose

The purpose of this thesis is to investigate the predictability of Swedish stock indices. A previous study investigating contrarian strategies on the OMXS30 using daily data found that the index exhibits some tendency to regain short term losses and pull back after short term gains, indicating a negative autocorrelation in short term returns [2]. Another study examining autoregressive models to predict European stock indices found that while the performance using the past 1 to 10 days' returns was generally poor, it improved when the returns of other, correlated indices were included as input to the model. The intuition behind this was that combining correlated variables would eliminate some of the white noise, since the uncorrelated noise terms partly offset each other in a linear combination [7].

The main idea underlying this paper is that prediction of stock indices could be improved by including correlated sector indices in the prediction model. This serves two purposes. The first is to eliminate some of the white noise present in a prediction from a simple autoregressive model based only on the return of the predicted time series; including the returns of other correlated time series as input is expected to improve model performance. The second is that there might exist lead or lag effects between the indices that are not driven by autocorrelation but by intermarket relationships. For example, capital flow into a riskier sector could be an indication of positive market sentiment and positive future price changes.

Based on this, an attempt is made to predict the future return of an asset from the returns of a group of correlated stock indices. Using different methods of regression, the future returns of different assets are predicted, and the forecasts are evaluated from a portfolio perspective and for statistical significance. The dataset analyzed here consists of the closing prices for the main index of the Swedish stock market, OMXS30 PI, along with different sector indices for the period 2002-12-27 to 2015-04-13. Numerical analysis is carried out using the programming language R with additional packages for the statistical analysis.

1.4 Outline

The report is divided into three parts. The first part (chapter 3) introduces the multivariate autoregressive model and the methods used in the analysis. The second part (chapter 4) makes use of the same regression model to predict the broad market as well as deviations in the return of sector indices from the broad market index and discusses some practical implications of real trading. The third part (chapter 5) introduces machine learning with the Support Vector Machine for classification as an alternative to multiple regression and tests the model on OMXS30.

2 Initial Data Analysis

The closing prices for all indices are collected from Nasdaq OMX and cleaned by removing dates where the price for any index was missing. The indices used are:

• OMXS30 PI

• OMXS Oil & Gas PI

• OMXS Financials PI

• OMXS Automobiles & Parts PI

• OMXS Health Care PI

• OMXS Industrials PI

• OMXS Consumer Services PI

• OMXS Consumer Goods PI

• OMXS Utilities PI

• OMXS Food Producers PI

• OMXS Basic Materials PI

• OMXS Travel & Leisure PI

• OMXS Technology PI

• OMXS Telecommunications PI

• OMXS Small Cap PI

• OMXS Mid Cap PI

From this the daily log return can be computed.

$$X_t = \ln(P_t) - \ln(P_{t-1}) = \ln\left(\frac{P_t}{P_{t-1}}\right) \tag{2}$$
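As a small illustration of equation (2), the following sketch computes daily log returns from a list of closing prices. It is written in Python for illustration rather than the R used in the thesis, and the function name `log_returns` and the price values are hypothetical.

```python
import math

def log_returns(prices):
    """Daily log returns X_t = ln(P_t) - ln(P_{t-1}) for a list of closing prices."""
    return [math.log(p1) - math.log(p0) for p0, p1 in zip(prices, prices[1:])]

prices = [100.0, 102.0, 101.0, 103.0]  # hypothetical closing prices
returns = log_returns(prices)

# Log returns are additive over time: their sum equals ln(P_last / P_first).
total = sum(returns)
```

The additivity shown in the last line is the property that later allows portfolio values to be written as the exponential of a sum of log returns.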

A total of 3084 data points per index are used. The correlations between the returns are displayed in Table 1.

The trend and volatility of the sample are shown in Table 2.

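| | OMXS30 | Oil & Gas | Financials | Automobiles & Parts | Health Care | Industrials | Consumer Service | Consumer Goods | Utilities | Food Producers | Basic Materials | Travel & Leisure | Technology | Telecommunications | Small Cap | Mid Cap |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| OMXS30 | 1 | .55 | .93 | .68 | .62 | .93 | .78 | .83 | .27 | .43 | .77 | .56 | .69 | .71 | .72 | .82 |
| Oil & Gas | | 1 | .53 | .45 | .37 | .54 | .42 | .50 | .19 | .32 | .55 | .34 | .31 | .40 | .53 | .59 |
| Financials | | | 1 | .63 | .55 | .84 | .68 | .77 | .26 | .43 | .74 | .56 | .55 | .63 | .70 | .81 |
| Automobiles & Parts | | | | 1 | .49 | .70 | .55 | .67 | .25 | .41 | .63 | .50 | .45 | .47 | .67 | .74 |
| Health Care | | | | | 1 | .55 | .50 | .57 | .22 | .37 | .51 | .42 | .41 | .45 | .57 | .61 |
| Industrials | | | | | | 1 | .69 | .79 | .27 | .43 | .79 | .56 | .56 | .61 | .71 | .83 |
| Consumer Service | | | | | | | 1 | .66 | .20 | .39 | .58 | .52 | .46 | .52 | .59 | .69 |
| Consumer Goods | | | | | | | | 1 | .26 | .45 | .71 | .52 | .51 | .57 | .67 | .78 |
| Utilities | | | | | | | | | 1 | .21 | .27 | .21 | .19 | .18 | .33 | .32 |
| Food Producers | | | | | | | | | | 1 | .39 | .35 | .26 | .33 | .52 | .56 |
| Basic Materials | | | | | | | | | | | 1 | .51 | .45 | .51 | .67 | .78 |
| Travel & Leisure | | | | | | | | | | | | 1 | .35 | .40 | .59 | .69 |
| Technology | | | | | | | | | | | | | 1 | .43 | .50 | .52 |
| Telecommunications | | | | | | | | | | | | | | 1 | .52 | .57 |
| Small Cap | | | | | | | | | | | | | | | 1 | .86 |
| Mid Cap | | | | | | | | | | | | | | | | 1 |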

Table 1: Return correlation matrix.

| Index | Mean | Standard deviation |
|---|---|---|
| OMXS30 | 0.086 | 0.223 |
| Oil & Gas | 0.062 | 0.415 |
| Financials | 0.097 | 0.25 |
| Automobiles & Parts | 0.115 | 0.217 |
| Health Care | 0.069 | 0.179 |
| Industrials | 0.117 | 0.257 |
| Consumer Service | 0.101 | 0.211 |
| Consumer Goods | 0.070 | 0.199 |
| Utilities | -0.103 | 0.357 |
| Food Producers | 0.103 | 0.204 |
| Basic Materials | 0.070 | 0.283 |
| Travel & Leisure | 0.038 | 0.26 |
| Technology | 0.050 | 0.313 |
| Telecommunications | 0.026 | 0.229 |
| Small Cap | 0.102 | 0.139 |
| Mid Cap | 0.127 | 0.173 |

Table 2: Annualized mean and standard deviation of daily log returns.

3 Model and Methodology

The main model tested in this paper is a multivariate autoregressive (AR) model written as

$$Y_t = \sum_k \beta_k X^k_{t-1} + \varepsilon_t \tag{3}$$

where $Y$ is the return of the asset to be predicted and $X^k$ represents the returns of each sector index from the previous day. The model is based on the hypothesis that the next day's return of an asset (for example OMXS30) can be partially described by the previous day's return of a group of sector indices. The coefficients need to be estimated before the model can be used, and to improve the accuracy of the model's predictions the variables $Y$ and $X^k$ can be transformed and the model refined by excluding irrelevant variables.

3.1 Ordinary Least Squares multiple regression

The coefficients in the model are estimated by Ordinary Least Squares (OLS). In (3) the coefficients $\beta_k$ are obtained by projecting the vector of the dependent variable $Y$ onto the space spanned by the vectors of the covariates $X^k$ and taking the resulting coefficients. In matrix form the solution is

$$\beta = (X^\top X)^{-1} X^\top Y \tag{4}$$

which minimizes the sum of squared residuals $\sum_i \varepsilon_i^2$.
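The normal equations in (4) can be sketched in plain Python as follows. This is an illustrative sketch only, not the thesis's R code: the helper names `solve` and `ols` and the toy data are assumptions, and the model has no intercept, as in equation (3).

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def ols(X, y):
    """Coefficients beta = (X^T X)^{-1} X^T y for rows X[i] and targets y[i]."""
    k = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    return solve(XtX, Xty)

# Hypothetical noiseless data generated with beta = (0.5, -0.25).
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0]]
y = [0.5 * a - 0.25 * b for a, b in X]
beta = ols(X, y)
```

Because the toy data contain no noise, the recovered coefficients match the generating values up to rounding; with real returns the residuals would dominate.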

3.2 Coefficient of determination R²

The coefficient of determination, or $R^2$, is a number that indicates how well the data fit a regression model. It is defined as

$$R^2 = 1 - \frac{\sum_i \varepsilon_i^2}{\sum_i (Y_i - \bar{Y})^2} \tag{5}$$

where $\bar{Y}$ is the mean value of $Y_i$. The coefficient ranges from 0 to 1 and can be interpreted as the proportion of the variation explained by the model.
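Equation (5) can be evaluated directly from observed and fitted values. The sketch below is illustrative Python with a hypothetical function name `r_squared` and made-up numbers.

```python
def r_squared(y, y_hat):
    """R^2 = 1 - sum(residual^2) / sum((y - mean(y))^2), as in equation (5)."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

y = [1.0, 2.0, 3.0, 4.0]
perfect = r_squared(y, y)                     # fitted values equal the data
mean_only = r_squared(y, [2.5] * 4)           # predicting the mean everywhere
worse = r_squared(y, [4.0, 3.0, 2.0, 1.0])    # predictions worse than the mean
```

The first two cases give the end points 1 and 0. The third case shows that the same expression can fall below zero when the predictions do not come from an in-sample least squares fit with an intercept, which is worth keeping in mind for out-of-sample use.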

3.3 Data snooping

When estimating the coefficients for the regression model, the resulting predictions cannot be used to validate the model on the same set of data. Doing so is commonly known as data snooping, where the hypothesis tested on a sample is also the one suggested by the same sample. To ensure out-of-sample performance going forward in time, each prediction must be made using estimates based only on the data available at the time. Also, due to the non-stationary properties of the underlying process, the estimates will likely change over time. Taking this into account, the model uses a rolling window for estimating the coefficients and only includes the past N trading days in the regression. A longer look-back period reduces the noise in the estimates, but also makes them less responsive when the underlying relationship changes.
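The rolling-window idea can be sketched as below. To keep the example short it uses a single covariate and no intercept, which is a simplification of the multi-index model in (3); the function names and data are hypothetical, and the code is Python rather than the R used in the thesis.

```python
def fit_slope(x, y):
    """No-intercept least squares with one covariate: beta = sum(x*y) / sum(x*x)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def rolling_predictions(x, y, window):
    """Predict y[t] from x[t] using a slope fitted only on the `window` pairs before t."""
    preds = []
    for t in range(window, len(y)):
        beta = fit_slope(x[t - window:t], y[t - window:t])  # data strictly before t
        preds.append(beta * x[t])
    return preds

# Hypothetical toy data: x[t] is the previous day's return, y[t] the return on day t.
x = [0.01, -0.02, 0.015, -0.005, 0.02, -0.01, 0.005]
y = [-0.5 * v for v in x]  # exact negative dependence, no noise
preds = rolling_predictions(x, y, window=3)
```

The slice `[t - window:t]` excludes observation `t` itself, so the prediction for day `t` never uses information that would not have been available at the time.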

3.4 Akaike Information Criterion

One problem when specifying the regression model is choosing which covariates should be included. A common way of determining the specification is by the Akaike Information Criterion (AIC). The AIC value of a model is given by

$$n \ln\left(\sum_t \varepsilon_t^2\right) + 2k \tag{6}$$

where $k$ is the number of covariates in the model and $n$ is the number of observations. Given a set of candidate models, the preferred model is the one that minimizes the AIC value. AIC rewards models with small error, but at the same time penalizes models with many parameters to discourage over-fitting the data [5].

Finding the model with the lowest AIC value is done by a stepwise algorithm, which starts with the full model and then removes covariates in steps to find improvements [8].
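A simplified, backward-only version of such a stepwise search is sketched below, using the AIC expression in (6). It is not the algorithm of [8] or the R routine used in the thesis; the function names, the random seed and the simulated data are all assumptions made for illustration.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def rss(X, y, cols):
    """Residual sum of squares of a no-intercept OLS fit using only columns `cols`."""
    if not cols:
        return sum(v * v for v in y)
    XtX = [[sum(r[a] * r[b] for r in X) for b in cols] for a in cols]
    Xty = [sum(r[a] * v for r, v in zip(X, y)) for a in cols]
    beta = solve(XtX, Xty)
    return sum((v - sum(bc * r[c] for bc, c in zip(beta, cols))) ** 2
               for r, v in zip(X, y))

def aic(X, y, cols):
    """AIC as in equation (6): n * ln(RSS) + 2k."""
    return len(y) * math.log(rss(X, y, cols)) + 2 * len(cols)

def backward_stepwise(X, y):
    """Start from the full model; repeatedly drop the covariate whose removal lowers AIC most."""
    cols = list(range(len(X[0])))
    best = aic(X, y, cols)
    while cols:
        trials = [(aic(X, y, [c for c in cols if c != d]), d) for d in cols]
        trial_aic, drop = min(trials)
        if trial_aic >= best:
            break  # no single removal improves the criterion
        best, cols = trial_aic, [c for c in cols if c != drop]
    return cols, best

# Simulated data: only column 0 carries signal, columns 1 and 2 are pure noise.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [0.8 * r[0] + random.gauss(0, 0.5) for r in X]
selected, best_aic = backward_stepwise(X, y)
```

In this setup the informative column is always retained, since removing it raises the residual sum of squares far more than the penalty term saves. Whether the noise columns are dropped depends on the sample.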

3.5 Normalization

The daily returns for the stock market are very dependent on the specific regime they fall into. The model specified in (3) does not include an intercept and therefore assumes the mean of $Y_t$ is close to zero. This is a necessary restriction since any estimate of the long term mean return will be highly dependent on the period in the sample and likely not a robust estimate of future returns. Also, since the volatility will vary in the sample, the variance of the residuals of (3) will depend on time, $E(\varepsilon_t^2) = \sigma_t^2$, i.e. there will be heteroskedasticity in the sample.

To account for this and improve the ability of the model to generalize out-of-sample, the daily log returns are normalized by percentile ranking the return $X^k_t$ against the returns for the past 252 days (roughly one trading year), $\{X^k_t, X^k_{t-1}, \ldots, X^k_{t-251}\}$. The percentile ranking is done by giving the raw log return $X^k_t$ a rank according to its value in an ordered list, e.g. the lowest return for the past year is given rank 1 and the highest rank 252, with ties being given the lowest rank, and then calculating the percentile for the return as

$$R_t = \frac{\mathrm{Rank}(X_t) - 1}{252 - 1}$$

This is then bounded between -1 and 1 by the transformation

$$200\% \times R_t - 100\% \tag{7}$$

The scaling centers the percentile for the median (0.5) at zero, so that each variable is distributed around zero. The same normalization procedure is applied to the dependent variable $Y_t$ and the independent variables $X^k_t$. This will hopefully reduce some of the heteroskedasticity and center the mean close to zero. It also relaxes the linearity restriction of the model to some extent, since a monotone dependency between the predicted variable and a covariate can be captured through the ranks.
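The percentile-rank normalization described above can be sketched as follows. This is illustrative Python, not the thesis's implementation; the function name `normalize` is hypothetical, and the demonstration uses a window of 3 instead of 252 purely to keep the numbers easy to follow.

```python
def normalize(returns, window=252):
    """Map each return to [-1, 1] by its percentile rank within the trailing window.

    The window for time t is returns[t-window+1 .. t], including the current value.
    Ties receive the lowest rank, so rank = 1 + (number of strictly smaller values).
    """
    out = []
    for t in range(window - 1, len(returns)):
        win = returns[t - window + 1:t + 1]
        rank = 1 + sum(1 for v in win if v < returns[t])
        pct = (rank - 1) / (window - 1)   # R_t in [0, 1]
        out.append(2.0 * pct - 1.0)       # equation (7): 200% * R_t - 100%
    return out

# Small window for illustration only; the text uses 252 trading days.
demo = normalize([0.01, -0.02, 0.03, 0.00, -0.05], window=3)
```

In the demonstration the third return is the highest in its window and maps to 1, the fourth is the middle value and maps to 0, and the fifth is the lowest and maps to -1.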

3.6 Portfolio evaluation and Sharpe ratio

The models are evaluated by simulating a trading portfolio holding a 100% or -100% exposure to the return of the asset when the predicted next day return is positive or negative, respectively. The value of the portfolio is given by

$$V(t) = \exp\left(\sum_{i=1}^{t} Y_i \times \mathrm{sign}(\hat{Y}_i)\right) \tag{8}$$

where $Y_i$ is the actual raw log return of the asset and $\hat{Y}_i$ is the return predicted by the model. This expression gives the value of a portfolio with compounded returns since the sum of logarithms is equal to the logarithm of the product, $\ln(a) + \ln(b) = \ln(ab)$. When the variables are normalized, this means that the next day's return is expected to be positive when the predicted normalized return $\hat{Y}_i$ is positive. Since the normalized returns are centered with zero at the median of returns for the past 252 days, the implementation implicitly assumes that the median raw return is close to zero. This is a necessary assumption since it is difficult to predict what the median will be for future returns, and a reasonable one given that the median is likely small compared to other errors in the model's prediction.
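A minimal sketch of the portfolio value in (8) is given below, in Python for illustration. The function names and the return values are hypothetical, and the starting value V(0) = 1 is an assumption that follows from an empty sum in (8).

```python
import math

def sign(v):
    """Return +1, -1 or 0 according to the sign of v."""
    return (v > 0) - (v < 0)

def portfolio_values(actual, predicted):
    """Portfolio value V(t) = exp(sum_{i<=t} Y_i * sign(Yhat_i)), starting from V(0) = 1."""
    values, cum = [1.0], 0.0
    for y, y_hat in zip(actual, predicted):
        cum += y * sign(y_hat)   # long if predicted up, short if predicted down
        values.append(math.exp(cum))
    return values

actual = [0.01, -0.02, 0.015]    # hypothetical raw log returns
predicted = [0.3, -0.1, -0.4]    # hypothetical normalized predictions
values = portfolio_values(actual, predicted)
```

Here the first two predictions have the correct sign and add 0.01 and 0.02 to the cumulative log return, while the third is wrong and subtracts 0.015.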

The performance of the portfolio is evaluated using the Sharpe ratio, which is a standard measure of the risk-adjusted return of a portfolio. The Sharpe ratio is defined here as

$$S = \frac{\mathrm{CAGR}}{\sigma} \tag{9}$$

where CAGR stands for compound annual growth rate,

$$\mathrm{CAGR} = \left(\frac{V(n)}{V(0)}\right)^{252/n} - 1 \tag{10}$$

which is the annual geometric percentage return of the portfolio, and $\sigma$ is the annualized standard deviation of the portfolio's return. The Sharpe ratio gives the average return earned per unit of risk, where risk is defined by the volatility of the portfolio. A high Sharpe ratio indicates that years with negative return will be rare and that the value of the portfolio will tend upwards along a smooth path.
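Equations (9) and (10) can be sketched as below. The text does not state how the standard deviation is annualized, so scaling the daily standard deviation by the square root of 252 is an assumption here, as are the function names and the sample returns.

```python
import math
import statistics

def cagr(values, periods_per_year=252):
    """Equation (10): (V(n) / V(0)) ** (252 / n) - 1, with n daily steps."""
    n = len(values) - 1
    return (values[-1] / values[0]) ** (periods_per_year / n) - 1.0

def sharpe_ratio(daily_log_returns, periods_per_year=252):
    """Equation (9): CAGR divided by the annualized standard deviation of daily returns."""
    values = [1.0]
    for r in daily_log_returns:
        values.append(values[-1] * math.exp(r))
    # Assumption: annualize by sqrt(252), the usual convention for daily data.
    sigma = statistics.stdev(daily_log_returns) * math.sqrt(periods_per_year)
    return cagr(values, periods_per_year) / sigma

daily = [0.001, -0.002, 0.003, 0.0005] * 63   # hypothetical: 252 daily log returns
s = sharpe_ratio(daily)
```

With exactly 252 daily steps the exponent in (10) equals one, so the CAGR reduces to the simple growth over the period.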

3.7 Testing for statistical significance

While accurate predictions from the model will give a high expected Sharpe ratio, positive returns could be obtained by chance alone due to the randomness in the data sample. This means that the calculated Sharpe ratio is only an estimate of the true value and will contain errors that need to be accounted for in the analysis. A common method to infer statistical significance of the estimated values is hypothesis testing. In statistical hypothesis testing a default position, called the null hypothesis, is defined as the assumption that nothing but random chance affects the results. The alternative hypothesis is that there is a phenomenon to be observed in the sample. From this a p-value is calculated, corresponding to the probability of observing results at least as extreme as those observed if the null hypothesis were true. Commonly a p-value lower than 0.05 is considered significant, favoring the alternative hypothesis over the null hypothesis, since it is unlikely (less than 1 in 20) to observe such results if the null hypothesis were true.

To determine the significance of the Sharpe ratio, the p-value is calculated using a Monte Carlo method. If the model could not predict the next day's return with any accuracy, the expected Sharpe ratio would be equivalent to what would be obtained from random chance. The null hypothesis $H_0$ is therefore that the return and Sharpe ratio of the portfolio are equivalent to those obtained by chance by trading the asset at random with the same net exposure to the returns of the asset traded. In order to calculate the p-value, i.e. the probability of observing a Sharpe ratio as high as or higher than the observed one if the model's predictions are random, the distribution of Sharpe ratios for random predictions must be created. This is done by reordering the trading exposure for each day, as given by $\mathrm{sign}(\hat{Y}_i)$, at random and calculating the resulting Sharpe ratio $S^*$ for this random portfolio using equation (9). Repeating this process a sufficiently large number of times gives a distribution of Sharpe ratios that would be observed if the null hypothesis were true. The p-value can now be calculated from the resampled distribution of Sharpe ratios according to $P(S^* \geq S \mid H_0)$, which is the proportion of random Sharpe ratios that are higher than or equal to the observed Sharpe ratio [1].
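The permutation procedure can be sketched as follows, in Python for illustration. The function names, the number of permutations, the seeds and the simulated returns are assumptions; the annualization of volatility by the square root of 252 is also an assumption, since the text does not specify it.

```python
import math
import random
import statistics

def sharpe(returns, periods_per_year=252):
    """CAGR over annualized volatility for a list of daily log returns."""
    growth = math.exp(sum(returns) * periods_per_year / len(returns)) - 1.0
    sigma = statistics.stdev(returns) * math.sqrt(periods_per_year)
    return growth / sigma

def permutation_p_value(actual, positions, n_perm=2000, seed=0):
    """Share of shuffled-position portfolios whose Sharpe ratio is >= the observed one."""
    rng = random.Random(seed)
    observed = sharpe([y * p for y, p in zip(actual, positions)])
    shuffled = list(positions)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # same net exposure, random timing
        if sharpe([y * p for y, p in zip(actual, shuffled)]) >= observed:
            hits += 1
    return hits / n_perm

# Simulated returns and a hypothetical perfect-foresight position series.
rng = random.Random(1)
actual = [rng.gauss(0.0, 0.01) for _ in range(250)]
perfect = [1 if y > 0 else -1 for y in actual]
p_perfect = permutation_p_value(actual, perfect, n_perm=500)
```

Shuffling the positions rather than drawing new random signs keeps the number of long and short days fixed, which matches the requirement that the random portfolios have the same net exposure as the tested one.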

4 Results and Analysis

This section will give the results of the model explained in the previous section when applied to returns of stock indices.

4.1 Predicting the next day’s return for OMXS30

The model is tested on OMXS30 using the previous day's returns for all 16 indices, i.e. OMXS30 and the 15 sector indices. Each day the coefficients are estimated by a regression on the past N = 500, 1000 and 2000 days, and the returns for the indices are used to make a prediction of the next day's return for OMXS30. The resulting predictions are then used to simulate a portfolio that trades the OMXS30 based on the predicted direction for each day, taking a positive or negative portfolio exposure to the OMXS30 index depending on the predicted direction. The performance of the portfolio is then evaluated by calculating its Sharpe ratio and the associated p-value of this ratio for the period tested.

The model is tested with three kinds of input variables: raw returns, normalized returns, and normalized returns combined with AIC to select the most appropriate covariates each day.

| | N = 500 | N = 1000 | N = 2000 |
|---|---|---|---|
| Raw Returns | -0.0939 (0.6071) | 0.1137 (0.3936) | -0.2957 (0.7068) |
| Normalized Returns | 0.1428 (0.3271) | 0.5962 (0.0662) | 0.7492 (0.1089) |
| Normalized Returns & AIC | 0.2593 (0.2322) | 0.7195 (0.0372) | 1.1327 (0.0304) |

Table 3: Performance on OMXS30, Sharpe ratio and the associated p-value.

Table 3 summarizes the Sharpe ratio for different combinations for the model, with the p-value for the Sharpe ratio in parentheses. The Sharpe ratios increase with each added specification to the model. A notable improvement is the use of normalized returns instead of raw returns, which allows for better generalization out-of-sample. With normalized variables the Sharpe ratios were positive for each value of N, but at best borderline significant (p-value 0.0662 for N = 1000). The results were significant (p-value below 0.05) for N = 1000 and N = 2000 using predictions from the model with both normalized variables and stepwise AIC selection. This is in agreement with the hypothesis that AIC is useful to remove irrelevant covariates in the model, since including some sector indices might be redundant or introduce more noise in the predictions.

The R² for each regression is fairly low, around 0.005 to 0.03. This is expected since any predictable component will likely be small and noise will dominate. The return for OMXS30 is negatively correlated with the predicted return for the next day, indicating that negative autocorrelation is a component in the predictions of the model.

The plots of the value of the portfolios with normalization and AIC during the period are shown in Figures 1 to 3. These show the growth of the three portfolios trading based on the predictions made by the models with normalized variables and AIC selection. Figure 1 shows that the portfolio based on a 500 day regression has a fairly volatile growth, while the portfolio based on a 2000 day regression is smoother. This is reflected in Table 3, where the Sharpe ratio is higher for N = 2000 than for N = 500. Longer periods used for the estimation of the coefficients give more robust estimates, but the estimates will be less responsive when there is a change in the underlying process.

Figure 1: Portfolio with normalization and AIC.


4.2 Using returns over several days as input

The previous model used only the one day returns as inputs and better performance could possibly be obtained by using the return over more than one day. By redefining the input variables as

$$X_t = \ln\left(\frac{P_t}{P_{t-n}}\right) \tag{11}$$

and normalizing the result in the same way as before the model with AIC is tested on the same sample data, with the results shown in Table 4.
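For illustration, the n-day return in (11) can be computed as in the short Python sketch below. The function name and the prices are hypothetical.

```python
import math

def n_day_log_returns(prices, n):
    """X_t = ln(P_t / P_{t-n}) for every t with at least n earlier prices."""
    return [math.log(prices[t] / prices[t - n]) for t in range(n, len(prices))]

prices = [100.0, 101.0, 99.0, 102.0, 104.0]   # hypothetical closing prices
two_day = n_day_log_returns(prices, 2)
one_day = n_day_log_returns(prices, 1)
```

Consecutive n-day returns overlap in n - 1 of their days, so the input series changes sign less often than the one-day series.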

| | N = 500 | N = 1000 | N = 2000 |
|---|---|---|---|
| n = 2 | 0.6655 (0.0293) | 0.9756 (0.0096) | 0.7128 (0.1188) |
| n = 3 | 0.6428 (0.0362) | 0.7965 (0.0267) | 0.4240 (0.2187) |
| n = 4 | 0.1678 (0.3085) | 0.1869 (0.3222) | 0.7677 (0.1099) |
| n = 5 | 0.2803 (0.2094) | 0.8389 (0.0217) | 0.9719 (0.0528) |

Table 4: Sharpe ratio and the associated p-value on OMXS30.

The performance is similar to the model with one day's return as input, with no significant improvement. However, with longer term returns as inputs, the portfolio adjustments will be less frequent. This is an advantage from a trading perspective as less turnover will mean lower transaction costs. Portfolio plots are included in the appendix.

4.3 Prediction of deviations from OMXS30

The previous section examined the performance of the model when predicting daily returns for the broad market index. However, the return series will contain a lot of white noise that is not captured by the model. One way to reduce the noise and get better predictions from the model could be to use the return of a sector index and subtract the return of the broad market, predicting the relative return of a sector. The dependent variable is calculated as

$$Y_t = \ln\left(\frac{P_t}{P_{t-1}}\right) - \ln\left(\frac{P^{\mathrm{OMXS30}}_t}{P^{\mathrm{OMXS30}}_{t-1}}\right) \tag{12}$$

where $P_t$ is the price of a sector index. The returns are then ranked as a percentile of the 252 previous days' returns as described in Section 3.5.

The performance of the model is tested in the same way as in the previous section, by simulating trading in a synthetic asset with the daily return given by $Y_t$ in equation (12), with normalized variables and AIC selection for each regression.

Table 5 displays the results for each sector index in the data set. The results are generally significant and stable for different values of N, but vary depending on the specific sector. This leads to the question of the practical implications of the results.

| | N = 500 | N = 1000 | N = 2000 |
|---|---|---|---|
| Oil & Gas | 0.5846 (0.0512) | 0.9355 (0.0132) | 0.4582 (0.2118) |
| Financials | 1.1349 (0.0006) | 1.1145 (0.002) | 0.8566 (0.0811) |
| Automobiles & Parts | 2.0158 (<0.0001) | 2.2351 (<0.0001) | 1.4071 (0.0123) |
| Health Care | 0.0499 (0.4407) | 0.3605 (0.1684) | -0.3146 (0.7186) |
| Industrials | 0.8743 (0.0052) | 0.9446 (0.0084) | 1.8143 (0.0011) |
| Consumer Service | -0.1072 (0.6292) | -0.1988 (0.7078) | 0.5747 (0.1558) |
| Consumer Goods | 0.9478 (0.0031) | 1.2454 (0.0009) | 1.4162 (0.0081) |
| Utilities | 0.8361 (0.0159) | 1.6338 (0.0002) | 1.5401 (0.0083) |
| Food Producers | 1.6500 (<0.0001) | 1.5951 (0.0002) | 1.9845 (0.0009) |
| Basic Materials | 1.3218 (0.0002) | 1.4859 (0.0002) | 0.5265 (0.1779) |
| Travel & Leisure | 1.0400 (0.0022) | 1.4865 (0.0003) | 1.1418 (0.0280) |
| Technology | 0.3136 (0.1743) | 0.5102 (0.0970) | -0.4474 (0.7951) |
| Telecommunications | -0.1204 (0.6404) | 0.0535 (0.4472) | -0.2625 (0.6891) |
| Small Cap | 2.7341 (<0.0001) | 2.2689 (<0.0001) | 2.2845 (0.0002) |
| Mid Cap | 3.4078 (<0.0001) | 3.7197 (<0.0001) | 3.3986 (<0.0001) |

Table 5: Sharpe ratio and the associated p-value for portfolios trading the relative return of the sector versus OMXS30.

4.4 Practical trading considerations

For accurate evaluation of the results, some of the assumptions made in the simulation need to be considered. The first is that stock indices are not tradable assets. To get exposure to the price changes of a stock index, a trader needs to invest either in the stocks that make up the index or in a derivative with the index as its underlying. The other consideration is that transaction costs will degrade the returns. Unless the transaction costs per trade are low, the returns will be significantly affected at a daily trading frequency once these costs are considered.

Another, perhaps more important, consideration is that trading in less liquid assets will introduce significant slippage. For example, the price of a stock with low turnover will move if a large buy order arrives at the market. The difference between the expected price and the price received is what is referred to as slippage. From Table 5 it can be seen that the performance is generally better for less liquid indices, such as the mid and small cap indices. The liquidity issue will restrict the size and profit of a portfolio trading the sector. This is less of a problem for simulations on OMXS30 due to the high liquidity of its component stocks and related futures contracts.

5 Support Vector Machine

The previous models all used linear regression to determine the next day return and then used the predicted sign for testing. This section examines a more advanced method for predicting just the sign by classification using the Support Vector Machine (SVM).

5.1 Linear SVM

Consider training data with N points of the form $\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$, where $\mathbf{x}_i \in \mathbb{R}^p$ is a p-dimensional vector and $y_i \in \{-1, 1\}$ is the associated label. The training data is said to be linearly separable if there exists a vector $\mathbf{w}$ and a scalar $b$ such that

$$\mathbf{w} \cdot \mathbf{x}_i + b \geq 1 \quad \text{if } y_i = 1 \tag{13}$$

$$\mathbf{w} \cdot \mathbf{x}_i + b \leq -1 \quad \text{if } y_i = -1 \tag{14}$$

hold for all $i$. The inequalities can be combined into the single inequality

$$y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \geq 1 \tag{15}$$

so that the training data is separated by the hyperplane $\mathbf{w} \cdot \mathbf{x} + b = 0$, with the margin of the separation given by the distance between the two hyperplanes

$$\mathbf{w} \cdot \mathbf{x}_i + b = 1 \tag{16}$$

$$\mathbf{w} \cdot \mathbf{x}_i + b = -1 \tag{17}$$

The distance between the hyperplanes is $2/|\mathbf{w}|$, meaning that the best separation of the training data, i.e. the widest margin, is obtained by minimizing $\frac{1}{2}|\mathbf{w}|^2$ (using the square and the factor $\frac{1}{2}$ for mathematical convenience). The optimization problem is therefore

$$\arg\min_{\mathbf{w}, b} \; \frac{1}{2}|\mathbf{w}|^2 \tag{18}$$

subject to

$$y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \geq 1 \tag{19}$$

for $i = 1, \ldots, N$. This can be solved using solution methods for quadratic programming.
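The constraint (19) and the margin width can be checked numerically for a candidate hyperplane, as in the sketch below. The points, labels and the two candidate hyperplanes are hand-picked, hypothetical values; the code does not solve the quadratic program, it only compares two feasible solutions.

```python
import math

def is_separating(w, b, points, labels):
    """Check constraint (19): y_i * (w . x_i + b) >= 1 for every training point."""
    return all(y * (sum(wj * xj for wj, xj in zip(w, x)) + b) >= 1
               for x, y in zip(points, labels))

def margin_width(w):
    """Distance 2 / |w| between the hyperplanes w . x + b = 1 and w . x + b = -1."""
    return 2.0 / math.sqrt(sum(wj * wj for wj in w))

points = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
labels = [1, 1, -1, -1]
wide = ((0.5, 0.5), -1.0)     # candidate (w, b) with a small |w|
narrow = ((1.0, 1.0), -2.0)   # candidate (w, b) with a larger |w|
```

Both candidates satisfy the constraints, but the one with the smaller norm gives the wider margin, which is why the objective in (18) minimizes $|\mathbf{w}|^2$.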

5.1.1 Dual form

Since $\mathbf{w}$ is determined by the hyperplane that allows for perfect separation of the data points according to $y_i$, it depends only on the points $\mathbf{x}_i$ that lie precisely on the margin. These vectors $\mathbf{x}_i$ are called support vectors and satisfy $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) = 1$. The vector $\mathbf{w}$ can be written as a linear combination of the training vectors,

$$\mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i$$

for some constants $\alpha_i \geq 0$ that are non-zero only for the support vectors. Using $|\mathbf{w}|^2 = \mathbf{w}^\top \mathbf{w}$ and deriving the Lagrangian, it is possible to show that the optimization problem has the dual form

$$\arg\max_{\alpha_i \geq 0} \; \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{j,k}^{N} \alpha_j \alpha_k y_j y_k \, k(\mathbf{x}_j, \mathbf{x}_k) \tag{20}$$

subject to

$$\sum_{i=1}^{N} \alpha_i y_i = 0 \tag{21}$$

where $k(\mathbf{x}_j, \mathbf{x}_k) = \mathbf{x}_j^\top \mathbf{x}_k$ is the inner product in Euclidean space, here called the kernel [4].

The explicit use of the inner product allows the Euclidean space to be transformed into other inner product spaces by applying a kernel method. This is explained in the following sections.
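The step from the primal problem (18)-(19) to the dual form (20)-(21) is only stated above. A brief sketch of the standard Lagrangian argument, using the same symbols, is:

```latex
% Lagrangian of (18)-(19) with multipliers \alpha_i \geq 0
L(\mathbf{w}, b, \alpha)
  = \frac{1}{2}|\mathbf{w}|^2
  - \sum_{i=1}^{N} \alpha_i \left[ y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \right]

% Stationarity in \mathbf{w} and b
\frac{\partial L}{\partial \mathbf{w}} = 0
  \;\Rightarrow\; \mathbf{w} = \sum_{i=1}^{N} \alpha_i y_i \mathbf{x}_i ,
\qquad
\frac{\partial L}{\partial b} = 0
  \;\Rightarrow\; \sum_{i=1}^{N} \alpha_i y_i = 0

% Substituting both conditions back into L
L = \sum_{i=1}^{N} \alpha_i
  - \frac{1}{2} \sum_{j,k}^{N} \alpha_j \alpha_k y_j y_k \, \mathbf{x}_j^\top \mathbf{x}_k
```

The first stationarity condition gives the expansion of $\mathbf{w}$ in terms of the training vectors, the second gives constraint (21), and the substituted expression is the objective in (20) with the linear kernel.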

Figure 4: The hyperplane with the support vector.

5.2 Soft margin and kernels

If the training data is not linearly separable no hyperplane exists that can split the sample. Thismeans that in order to find a solution some misclassification must be allowed. Cortes and Vapniksuggested a modification of the margin called the Soft Margin method by introducing a non-negativeslack variable ξi [4]. The slack variable ξi gives a measure of the degree of error when classifyingthe point i with a hyperplane. The inequalities that define the separating hyperplanes can berewritten as

yi(w · xi + b) ≥ 1− ξi (22)

for i = 1, . . . ,N. The objective function for the optimization problem will now need to includea way to penalize large values for ξi. Using a linear penalty function for the slack variables theoptimization problem is

w,b,ξ

{1

2|w|2 + C

N∑i=1

ξi

}(23)

subject toyi(w · xi + b) ≥ 1− ξi, ξi ≥ 0 (24)

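for $i = 1, \ldots, N$. Here $C \ge 0$ is a constant which gives the cost of misclassification during the training. The soft margin gives the dual optimization problem in the form

$$\arg\max_{\alpha_i \ge 0} \; \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{j,k=1}^{N} \alpha_j \alpha_k y_j y_k \, k(x_j, x_k) \tag{25}$$

subject to

$$\sum_{i=1}^{N} \alpha_i y_i = 0, \quad 0 \le \alpha_i \le C \;\; \forall i \tag{26}$$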

The dual form can then be solved using the Sequential minimal optimization (SMO) algorithm, which iteratively breaks the problem into a series of small sub-problems that are then solved analytically. The software package used in this paper, LIBSVM, implements the SMO algorithm to train the Support vector machine [3].
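To make the role of the slack variables and the cost $C$ concrete, the following sketch evaluates the soft margin objective (23) for a fixed hyperplane. It does not implement SMO or call LIBSVM; the sample, the hyperplane and the function names are hypothetical, and each $\xi_i$ is taken as the smallest value satisfying (24).

```python
def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))


def slacks(w, b, x, y):
    """Smallest xi_i >= 0 that satisfies y_i (w . x_i + b) >= 1 - xi_i."""
    return [max(0.0, 1.0 - yi * (dot(w, xi) + b)) for xi, yi in zip(x, y)]


def soft_margin_objective(w, b, x, y, C):
    """Objective (23): 1/2 |w|^2 + C * sum_i xi_i."""
    return 0.5 * dot(w, w) + C * sum(slacks(w, b, x, y))


# Hypothetical one-dimensional sample that is not linearly separable:
# the negative point at 1.5 lies between the two positive points.
x = [(2.0,), (1.0,), (-1.0,), (1.5,)]
y = [1, 1, -1, -1]
w, b = (1.0,), 0.0  # an arbitrary candidate hyperplane, not an optimum

xi = slacks(w, b, x, y)
cost_low = soft_margin_objective(w, b, x, y, C=1.0)
cost_high = soft_margin_objective(w, b, x, y, C=10.0)
```

Only the misclassified point receives a non-zero slack, and increasing $C$ raises the penalty that this single point contributes to the objective.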


5.2.1 Nonlinear kernels

While linear in its original formulation, the SVM can be used as a nonlinear classifier by replacing the kernel in the optimization problem with a nonlinear kernel function. This allows the algorithm to fit a hyperplane that separates the data points in the transformed feature space; the resulting decision boundary may be nonlinear in the original input space. An example of a commonly used kernel is the Gaussian radial basis function

$$k(x_j, x_k) = \exp(-\gamma |x_j - x_k|^2) \tag{27}$$

with some constant $\gamma > 0$. The corresponding feature space is a Hilbert space of infinite dimension, with each data point mapped by $\phi(x_j)$, where $\phi$ is defined by $k(x_j, x_k) = \phi(x_j) \cdot \phi(x_k)$. The SVM does not need to calculate $\phi(x_j)$ to classify the data points, only the dot product $w \cdot \phi(x) = \sum_i \alpha_i y_i k(x_i, x)$.

Other examples of kernels are the polynomial function $k(x_j, x_k) = (x_j \cdot x_k + \gamma)^d$ and the hyperbolic tangent $k(x_j, x_k) = \tanh(x_j \cdot x_k + \gamma)$.
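The kernels above, and the way a trained SVM classifies a new point using only kernel evaluations, can be sketched as follows. The function names are illustrative, and the support vectors, $\alpha_i$ and $b$ in the usage part are hypothetical values, not the solution of an actual training run.

```python
import math


def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))


def rbf_kernel(u, v, gamma):
    """Gaussian radial basis function (27): exp(-gamma |u - v|^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


def polynomial_kernel(u, v, gamma, d):
    """Polynomial kernel (u . v + gamma)^d."""
    return (dot(u, v) + gamma) ** d


def tanh_kernel(u, v, gamma):
    """Hyperbolic tangent kernel tanh(u . v + gamma)."""
    return math.tanh(dot(u, v) + gamma)


def decision_value(x_new, support_vectors, alpha, y, b, kernel):
    """sum_i alpha_i y_i k(x_i, x_new) + b; its sign is the predicted class."""
    return sum(
        a * yi * kernel(sv, x_new)
        for a, yi, sv in zip(alpha, y, support_vectors)
    ) + b


# Hypothetical support vectors and dual variables, for illustration only.
support_vectors = [(1.0, 0.0), (-1.0, 0.0)]
alpha = [0.5, 0.5]
y = [1, -1]


def kernel(u, v):
    return rbf_kernel(u, v, gamma=1.0)


value = decision_value((2.0, 0.0), support_vectors, alpha, y, 0.0, kernel)
predicted_sign = 1 if value > 0 else -1
```

Note that the mapping $\phi$ never appears in the code: the classifier needs only kernel values between the new point and the support vectors.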

5.3 Results for SVM on OMXS30

5.3.1 Results for linear SVM

The SVM classification method is tested in the same way as in previous sections. The sign of the normalized returns of OMXS30 is taken as $y_i$ and used to train the SVM, with the normalized returns from the 16 sector indices as the data points $x$. The SVM is trained using data from the past N days, and the predicted sign of the following day is used as the position when simulating a portfolio trading the OMXS30. The Sharpe ratio and the p-values of the Sharpe ratio calculated by the Monte Carlo method are used to evaluate the portfolios.
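The evaluation step can be sketched as below. This is a minimal sketch under stated assumptions, not the procedure of the thesis: the annualization over 252 trading days, the zero risk-free rate, and the use of randomly shuffled positions as the Monte Carlo null are all assumptions, and the return series is simulated.

```python
import math
import random
import statistics


def sharpe_ratio(daily_returns, periods_per_year=252):
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    mean = statistics.mean(daily_returns)
    std = statistics.stdev(daily_returns)
    return math.sqrt(periods_per_year) * mean / std


def portfolio_returns(positions, index_returns):
    """Daily portfolio return: position (+1 or -1) times the index return."""
    return [p * r for p, r in zip(positions, index_returns)]


def monte_carlo_p_value(positions, index_returns, n_sim=1000, seed=0):
    """Share of shuffled-position portfolios whose Sharpe ratio is at least
    as high as the observed one (assumed null: positions carry no timing
    information)."""
    rng = random.Random(seed)
    observed = sharpe_ratio(portfolio_returns(positions, index_returns))
    shuffled = list(positions)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(shuffled)
        if sharpe_ratio(portfolio_returns(shuffled, index_returns)) >= observed:
            hits += 1
    return (hits + 1) / (n_sim + 1)


# Simulated index returns; the positions use perfect foresight purely to
# illustrate a strategy that should come out as significant.
rng = random.Random(1)
index_returns = [rng.gauss(0.0, 0.01) for _ in range(250)]
positions = [1 if r > 0 else -1 for r in index_returns]

observed_sharpe = sharpe_ratio(portfolio_returns(positions, index_returns))
p_value = monte_carlo_p_value(positions, index_returns, n_sim=500)
```

In a real test the positions would be the signs predicted by the SVM for each following day, and a small p-value would indicate that the observed Sharpe ratio is unlikely under the assumed null.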

Since the soft margin method includes the cost of misclassification C in its penalty function, this parameter has to be decided in advance. The effectiveness of the SVM depends on the choice of C, but interpreting which setting is best is difficult. For large values of C the optimization will choose a hyperplane with a smaller margin but few misclassifications, and conversely small values of C will cause the optimization to choose a larger-margin separating hyperplane even if it misclassifies more points. Selecting the parameter using the in-sample data could be done by a parameter sweep, testing an exponentially growing sequence of C, for example $C \in \{2^{-5}, 2^{-3}, \ldots, 2^{11}, 2^{13}\}$. The result of each parameter value could then be evaluated using cross-validation, by excluding some segments of the in-sample data during the training and selecting the value with the highest accuracy on the excluded data points. However, this will be very computationally expensive when testing a model that continually updates on past data.
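The parameter sweep with cross-validation described above can be sketched as follows. The helper names are hypothetical, and `accuracy` is a stand-in callable: in practice it would train the SVM with cost C on the training indices and return the accuracy on the held-out segment.

```python
import math


def c_grid(low_exp=-5, high_exp=13, step=2):
    """Exponentially growing sequence 2^-5, 2^-3, ..., 2^11, 2^13."""
    return [2.0 ** e for e in range(low_exp, high_exp + 1, step)]


def contiguous_folds(n_samples, n_folds):
    """Split the indices 0..n_samples-1 into contiguous segments."""
    bounds = [round(i * n_samples / n_folds) for i in range(n_folds + 1)]
    return [list(range(bounds[i], bounds[i + 1])) for i in range(n_folds)]


def select_c(grid, n_samples, n_folds, accuracy):
    """Return the C with the highest mean accuracy on the held-out segments."""
    best_c, best_score = None, float("-inf")
    for c in grid:
        scores = []
        for held_out in contiguous_folds(n_samples, n_folds):
            excluded = set(held_out)
            train = [i for i in range(n_samples) if i not in excluded]
            scores.append(accuracy(c, train, held_out))
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_c, best_score = c, mean_score
    return best_c, best_score


def toy_accuracy(c, train_idx, test_idx):
    """Hypothetical stand-in for training and scoring; it peaks at C = 2."""
    return 1.0 / (1.0 + abs(math.log2(c) - 1.0))


grid = c_grid()
folds = contiguous_folds(10, 3)
best_c, best_score = select_c(grid, n_samples=500, n_folds=5, accuracy=toy_accuracy)
```

Each candidate value requires one training run per fold, which is why repeating the sweep for every daily update of a rolling model becomes expensive.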

To evaluate the impact of the cost parameter, the model is tested using different values $C \in \{10^{-2}, 10^{-1}, 10^{0}, 10^{1}, 10^{2}\}$. The results of the SVM model using a linear kernel are shown in Table 6.

            N = 500            N = 1000           N = 2000
C = 0.01    0.0368 (0.4665)    0.7176 (0.0389)    1.0293 (0.0405)
C = 0.1     0.3001 (0.1914)    0.4251 (0.1428)    1.0158 (0.0439)
C = 1       0.7927 (0.0133)    0.5562 (0.0803)    1.2260 (0.0211)
C = 10      0.7793 (0.0146)    0.5420 (0.0853)    1.0354 (0.0442)
C = 100     0.7858 (0.0148)    0.5765 (0.0737)    1.0626 (0.0413)

Table 6: Sharpe ratio and the associated p-value on OMXS30 using linear SVM.

For each value of C the best performance was obtained with a lookback period of N = 2000 days. The Sharpe ratio of the portfolios trading based on an SVM trained on the past 2000 days was significant at the 5% level for each value of C tested. The results are in line with those achieved by the portfolios based on linear regression, although a different method is used for predicting the next day's price direction.


5.3.2 Results with radial kernel

The SVM with a radial kernel is also tested. This requires a second parameter, $\gamma$, and therefore increases the number of possible variations. The results with $\gamma \in \{10^{-2}, 10^{-1}, 10^{0}, 10^{1}, 10^{2}\}$ and the different values for C are summarized in Table 7.

                     N = 500             N = 1000            N = 2000
C = 0.01, γ = 0.01   -0.0066 (0.5051)    0.2312 (0.2416)     -0.4875 (0.6969)
C = 0.01, γ = 0.1    0.0321 (0.4579)     0.2104 (0.2618)     -0.4634 (0.6877)
C = 0.01, γ = 1      0.0173 (0.4702)     0.1364 (0.3204)     -0.4707 (0.6199)
C = 0.01, γ = 10     0.0311 (0.4508)     0.1462 (0.3054)     -0.4888 (0.6358)
C = 0.01, γ = 100    0.0354 (0.4390)     0.3091 (0.1863)     -0.4802 (0.7591)
C = 0.1, γ = 0.01    0.0896 (0.3801)     0.3558 (0.1762)     0.6052 (0.1464)
C = 0.1, γ = 0.1     0.0421 (0.4401)     0.1359 (0.3551)     0.7505 (0.1277)
C = 0.1, γ = 1       0.0173 (0.4688)     0.1364 (0.3216)     -0.4707 (0.6230)
C = 0.1, γ = 10      0.0311 (0.4472)     0.1462 (0.3016)     -0.4888 (0.6341)
C = 0.1, γ = 100     0.0354 (0.4391)     0.3091 (0.1865)     -0.4802 (0.7537)
C = 1, γ = 0.01      -0.2360 (0.7741)    0.7173 (0.0400)     0.6957 (0.1438)
C = 1, γ = 0.1       0.2044 (0.2798)     0.4489 (0.1338)     0.4389 (0.2839)
C = 1, γ = 1         0.2312 (0.2393)     0.5361 (0.0699)     -0.3090 (0.5367)
C = 1, γ = 10        -0.0202 (0.5142)    0.1118 (0.3493)     -0.8368 (0.7834)
C = 1, γ = 100       0.0886 (0.3836)     0.2879 (0.2021)     -0.4802 (0.7573)
C = 10, γ = 0.01     0.1259 (0.3536)     0.6407 (0.0581)     0.4578 (0.2612)
C = 10, γ = 0.1      0.1865 (0.3005)     0.5607 (0.0793)     0.1870 (0.34016)
C = 10, γ = 1        -0.2055 (0.7299)    -0.2523 (0.7274)    -0.1666 (0.4506)
C = 10, γ = 10       0.0785 (0.3935)     0.1522 (0.3002)     -0.9093 (0.8338)
C = 10, γ = 100      0.0479 (0.4297)     0.2735 (0.2135)     -0.4771 (0.7589)
C = 100, γ = 0.01    0.2033 (0.2855)     0.4149 (0.1504)     -0.1155 (0.6888)
C = 100, γ = 0.1     -0.0627 (0.5814)    0.6739 (0.0468)     0.5415 (0.1820)
C = 100, γ = 1       -0.2283 (0.7556)    -0.0156 (0.4782)    -0.2906 (0.5159)
C = 100, γ = 10      0.0785 (0.3920)     0.1662 (0.2916)     -0.9153 (0.8340)
C = 100, γ = 100     0.0479 (0.4313)     0.2735 (0.2115)     -0.4771 (0.7538)

Table 7: Sharpe ratio and the associated p-value on OMXS30 using SVM with a radial kernel.

The results are generally poor, although mostly positive for N = 1000. The results for N = 2000 are generally negative. This could be because the SVM is optimized only to predict the sign of the return, without regard to its magnitude. This could introduce a bias in the net exposure, here negative, which given the generally positive sentiment of the period leads to negative returns. The mean of all predicted signs, which is generally negative for N = 2000, supports this. In contrast to the SVM with a linear kernel, the nonlinear SVM produces poor classification out-of-sample. This is likely due to overfitting by the more complicated method of classification.


6 Discussion and Conclusion

The purpose of this thesis was to test the predictability of stock indices with regression models using intermarket factors. The first models used multiple linear regression to predict the daily return of the OMXS30 index with the returns from 16 different sector indices as covariates. To improve performance, the model was also tested with normalized covariates and then refined by model selection with AIC. The results were generally positive and to some degree support the hypothesis that past returns could give an indication of future returns. Some combinations of the model were able to generate statistically significant risk-adjusted returns when tested on historical data.

The multiple regression model was also used to test the predictability of the relative return between a sector index and the main index OMXS30. The results were generally positive and in some cases highly significant. However, practical use of the models would need to consider important issues in the implementation.

A classification method using a support vector machine was also tested to predict the direction of the OMXS30 index. The results for linear classification were just as good as those for the linear regression model with AIC, whereas nonlinear classification with a radial kernel failed to generate consistent results.

In conclusion, there is evidence for some predictability in the stock indices. Future research topics could include the investigation of the models with data of higher frequency, such as intraday data, as well as different transformations of the inputs to improve performance. The use of intraday data could take into account liquidity considerations for a more realistic trading simulation. Other improvements could be the use of a selection method for the input variables in the SVM, and the use of regression with support vectors to make better predictions for trading.


Appendix

Figure 5: Portfolio with n=2.

Figure 6: Portfolio with n=2.

Figure 7: Portfolio with n=2.





References

[1] David Aronson. Evidence-Based Technical Analysis: Applying the Scientific Method and Statistical Inference to Trading Signals. Wiley, 1st edition.

[2] Anna Bergfast. Automated trading using a dip searching strategy. Master's thesis, Royal Institute of Technology, Stockholm, 2009.

[3] Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A library for support vector machines. 2001, updated March 4, 2013. URL: http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf. [Online; accessed 2015-04-30].

[4] Corinna Cortes and Vladimir Vapnik. Support-vector networks. Technical report, AT&T Labs-Research, USA.

[5] Harald Lang. Elements of Regression Analysis. Royal Institute of Technology, Stockholm.

[6] Andrew W. Lo and A. Craig MacKinlay. A Non-Random Walk Down Wall Street (5th ed.). Princeton University Press, Princeton, 2002.

[7] Fredrik Hallgren. On Prediction and Filtering of Stock Index Returns. Master's thesis, Royal Institute of Technology, Stockholm, 2011.

[8] W. N. Venables and B. D. Ripley. Choose a model by AIC in a stepwise algorithm. URL: http://stat.ethz.ch/R-manual/R-devel/library/MASS/html/stepAIC.html. [Online; accessed 2015-04-15].
