Factors Affecting the Number of Trades in ETPs on Nordic ...

DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS

STOCKHOLM, SWEDEN 2020

Factors Affecting the Number of Trades in ETPs on Nordic Derivatives Exchange

SIMON CARLSSON

ERIK ALLGÅRDH

KTH SCHOOL OF ENGINEERING SCIENCES


Degree Projects in Applied Mathematics and Industrial Economics (15 hp)

Degree Programme in Industrial Engineering and Management (300 hp)

KTH Royal Institute of Technology, year 2020

Supervisor at KTH: Mykola Shykula

Examiner at KTH: Sigrid Källblad Nordin

TRITA-SCI-GRU 2020:120 MAT-K 2020:021

Royal Institute of Technology School of Engineering Sciences KTH SCI SE-100 44 Stockholm, Sweden URL: www.kth.se/sci

Abstract

This thesis examines which factors affect the number of trades in exchange-traded products (ETPs) on Nordic Derivatives Exchange. Multiple linear regression is used to model the relationship between the number of trades and 65 initially chosen predictor variables. The predictor variables include various indices, commodities, stocks, and volatility measures.

Two models are presented, one of which includes a lagged dependent variable. These models explain 89% and 92% of the variance within the data. Foremost, the results confirm previous research finding that volatility plays a significant role in the number of trades, and show that this also holds for ETPs. Currency exchange rates, equity indices, and palladium are also shown to be statistically significant. In addition, interpretations of the results and suggestions for further research are given.

Keywords: regression analysis, trading volume, number of trades, applied mathematics, exchange-traded products, bachelor thesis, NDX


Factors Affecting the Number of Trades in ETPs on Nordic Derivatives Exchange

Sammanfattning (Summary)

This thesis investigates which factors affect the number of trades in exchange-traded products (ETPs) on Nordic Derivatives Exchange. Multiple linear regression is used to examine the relationship between the number of trades and 65 predictor variables, selected in advance, that we consider interesting to examine. These predictor variables consist of, among other things, various indices, commodities, stocks, and volatility measures.

Two models are presented, one of which includes a lagged dependent variable. These two models explain 89% and 92% of the variation in the data, respectively. The results show that volatility has a significant effect on the number of trades, which confirms previous research but is now also shown to hold for ETPs. Currency exchange rates, equity indices, and palladium are shown to be significant. In addition, an interpretation of the results and suggestions for future research are given.

Keywords: regression analysis, trading volume, number of trades, applied mathematics, exchange-traded products, bachelor thesis, NDX


Acknowledgements

We would like to thank our supervisor in applied mathematics, Mykola Shykula, for giving us feedback and guidance. In addition, we would like to express our gratitude to Oscar Britse and Markus Ramstrom at NGM for providing us with data.


Contents

1 Introduction
    1.1 Background
    1.2 Previous Research
    1.3 Purpose and Aim
    1.4 Research Question

2 Financial Background
    2.1 Exchange-Traded Products
        2.1.1 Speculation
        2.1.2 Hedging
    2.2 Mechanics of Derivatives Trading
        2.2.1 Derivatives Exchanges
        2.2.2 Brokerage Firms
        2.2.3 Issuer of Derivatives
        2.2.4 Market Maker
    2.3 Volatility
    2.4 Number of Trades

3 Mathematical Theory
    3.1 The Multiple Linear Regression Model
    3.2 Ordinary Least Squares
        3.2.1 Assumptions
        3.2.2 Derivation
        3.2.3 Properties of the OLS coefficients
    3.3 Residual Analysis
        3.3.1 Graphical Residual Analysis
    3.4 Autocorrelation of Errors
        3.4.1 Lag Plot
        3.4.2 Durbin-Watson Test
        3.4.3 Lagged Dependent Variable Models
    3.5 Influential Observations
        3.5.1 Deletion Diagnostics
    3.6 Transformations
        3.6.1 Box-Cox Method
    3.7 Multicollinearity
        3.7.1 Variance Inflation Factor
    3.8 Variable Selection
        3.8.1 Backward Elimination
        3.8.2 Hypothesis Testing
    3.9 Model adequacy

4 Methodology
    4.1 Data
    4.2 Software
    4.3 Timeline
    4.4 Delimitations
    4.5 Variables
        4.5.1 Initial Variable Treatments
        4.5.2 Number of Trades
        4.5.3 Equity Indices
        4.5.4 Stocks
        4.5.5 Commodities
        4.5.6 Volatility
        4.5.7 Currencies

5 Results
    5.1 Initial Model
        5.1.1 Altered Model
    5.2 Residual Analysis
    5.3 Autocorrelation
        5.3.1 An Additional Model
    5.4 Leverage and Influential Points
    5.5 Transformations
    5.6 Variable Selection
    5.7 Multicollinearity
    5.8 Final Models
        5.8.1 Model A
        5.8.2 Model B

6 Discussion
    6.1 Model Adequacy
    6.2 Financial Interpretations
        6.2.1 Volatility
        6.2.2 Palladium
        6.2.3 Currencies
        6.2.4 Equity Indices
        6.2.5 Nokia
    6.3 Further Research

7 Conclusion

A Appendix
    A.1 An Introduction to Financial Derivatives
        A.1.1 Options
        A.1.2 Futures Contracts
    A.2 Omitted Trading Days

References


1 Introduction

1.1 Background

The oldest recorded evidence of financial derivatives is found in the Code of Hammurabi, the code of law of Ancient Babylon [1, ch. 1]. It dates back to about 1754 BC and consists of 282 laws, one of which deals with farmers and their mortgages:

“48. If any one owe a debt for a loan, and a storm prostrates the grain, or the harvest fail, or the grain does not grow for lack of water; in that year he need not give his creditor any grain, he washes his debt-tablet in water and pays no rent for the year” [2].

It states that in the event of crop failure, an indebted farmer does not need to pay interest to the creditor. This decree functioned as an asset-or-nothing put option for the farmers [1, ch. 1].

A mere millennium and a half later, Aristotle tells the story of the philosopher Thales, who predicted an unusually plentiful olive harvest the coming fall. He purchased the right, but not the obligation, to hire all the olive presses in the region when fall came. Indeed, fall came and the olive harvest was abundant, resulting in a soaring demand for olive presses. Thales then leased the presses at a substantial premium and made a fortune [1, ch. 1].

Fast forward another two millennia to 1848, when the world’s first futures and options exchange, the Chicago Board of Trade, was founded [1, ch. 1]. This was the start of organized and centralized derivatives trading.

The move beyond the agricultural origins of derivatives markets was slow. The first trading in non-agricultural commodities on exchanges began in 1933 with futures contracts on silver. Years elapsed and the pace of innovation in derivatives gained traction. The internet, electronic trading, and fiber optics have laid a proper foundation for today’s high-speed, high-volume, and complex derivatives markets.

Financial derivatives have been exchange-traded in Sweden since 1985, when Optionsmarknaden began trading options. In the beginning, only call options with a mere six stocks as underlying assets were available for trading [3]. In 2003, Nordic Derivatives Exchange (NDX) was founded by Nordic Growth Market AB (NGM) [4]. NDX is a Swedish regulated market for exchange-traded products (ETPs), structured products, bonds, and exchange-traded funds. As of 2020, around 16,000 ETPs are listed for trading on NDX [5], and the sheer number of derivative contract types is overwhelming.

In this era of proliferation of financial derivatives, it is interesting to understand which factors drive derivatives trading. Therefore, this thesis aims to investigate the relationship between the number of trades in ETPs and a set of 65 chosen variables using multiple linear regression.

1.2 Previous Research

Previous research has shown that factors such as earnings reporting, bid-ask spread, and calendar effects affect the trading volume in single stocks [6]. One study showed that the bid-ask spread has a negative relationship with trading volumes in futures markets [7]. Another study found that futures trading volumes are affected by the underlying market characteristics, such as the price volatility [8].

In 1994, Jones et al. showed that there exists a strong relation between the number of trades and volatility in stock markets [9], a finding further strengthened by other studies [10]. Studies have shown that there exist positive interrelations between exchange rate volatility and currency option trading volumes [11, 12]. Another study suggests that the relation between trading volume and volatility is steeper for positive returns than for non-positive returns [13]. McInish and Wood [14] found that:

“return activity in a period is associated with the level of trading frequency in a subsequent period and also with the number of shares in a subsequent period. This is consistent with small traders reacting to returns while professional traders largely ignore previous returns in their trading.”

The findings of McInish and Wood are of great interest for this thesis, since the majority of the trades on NDX are executed by small traders¹ and they, according to McInish and Wood, trade as a reaction to changes in returns.

An important contributor on the topic of trading volumes is behavioral economist Hersh Shefrin, who has included behavioral and psychological aspects as key factors affecting trading volumes. Investor overconfidence and heterogeneity in beliefs and expectations are some of the factors that affect trading volumes [15, p. 136-137, 499-500]. However, as such factors are not easily quantifiable, they were omitted from this analysis.

1.3 Purpose and Aim

This thesis aims to examine and determine which factors affect the number of trades in exchange-traded products on Nordic Derivatives Exchange. This is of interest for three main reasons.

Firstly, it lies in every corporation’s best interest to understand its revenue streams. Many corporations operate on derivatives markets, and a solid understanding of which factors affect the number of trades is vital for understanding their own businesses and for developing suitable strategies to increase profitability. For such companies, insight into which factors drive the trading is valuable.

Secondly, investors might try to capitalize on this insight and use it as an investment tool. There exist studies which suggest that increased trading volumes (relative to trend) are associated with negative skewness in stock returns over the subsequent six months [16]. Thus, understanding which factors drive the fluctuation of trading volumes could yield a better basis for analysis of price movements and result in better-informed decision making.

Thirdly, there is a clear lack of research on trading volumes and number of trades in ETPs. Previous research has mostly revolved around trading volumes in stocks and futures. It is not certain that the same findings apply to ETPs. This thesis aims to contribute to filling this gap in knowledge.

¹ Established in section 2.4.


1.4 Research Question

The research question is formulated as:

Which factors affect the number of trades in exchange-traded products on Nordic Derivatives Exchange?


2 Financial Background

This section covers the essential financial theory for this thesis. Firstly, the reader is acquainted with exchange-traded products. Secondly, the mechanics of derivatives trading are discussed, as well as the essential parties which enable derivatives trading. Thirdly, volatility is discussed, which previous studies have suggested greatly affects the number of trades. Lastly, this section ends with a brief discussion of the term number of trades. Before proceeding, we recommend that readers unacquainted with financial instruments, options, and futures contracts read section A.1 in the appendix.

2.1 Exchange-Traded Products

Exchange-traded products, ETPs, are a group of financial derivatives which are traded on exchanges [17]. Different types of ETPs can differ greatly in their underlying mechanics. This section aims to give an introduction to ETPs, as well as to explain the characteristics and properties of the various types of ETPs which are listed on NDX. Common characteristics of ETPs include that they are cash-settled, carry a management fee, and are associated with a limited investment risk (that is, an investor cannot lose more than the initial investment). The last-named property is extremely attractive from an investor perspective. ETPs can be benchmarked to stocks, commodities, indices, currencies, or interest rates. Most ETPs have some leverage to amplify price movements in the underlying asset. The various types of ETPs on NDX are tracker certificates, constant leverage certificates, mini futures, plain vanilla warrants, turbo warrants, and unlimited turbos.

Tracker Certificates. A tracker certificate follows the underlying asset’s price movements 1:1 and is thus not leveraged. All tracker certificates on NDX carry a long position. Tracker certificates offer a simple and cost-effective means to invest in exotic and otherwise inaccessible underlying assets.²

Constant Leverage Certificates. Constant leverage certificates (commonly referred to as bull and bear certificates) are financial derivatives with a fixed daily leverage. Bull certificates carry a long position, and bear certificates carry a short position. The leverage and the certificate’s reference price are settled and recalculated on a daily basis to keep the leverage constant. The leverage is created by the issuer of the certificate, who borrows capital and buys (in the case of a bull certificate) the underlying asset in the market.

Mini Futures. Mini futures are essentially leveraged, cash-settled futures contracts with no predetermined expiration date. However, mini futures have one particularly attractive property which regular futures contracts lack. Leveraged futures contracts are extremely risky since it is possible to lose more than one’s initial investment. Mini futures mitigate this risk through the use of a barrier, also called a stop loss level, similar to barrier options. In the case of a mini future long, the barrier is set equal to or greater than the leveraged component of the mini future.³ However, contrary to a barrier option, the barrier of a mini future is not fixed over the life of the mini future. Instead, the barrier is decided on a regular basis by the issuer. Further, similarly to how a leveraged futures contract works, the investor has to pay interest on the leveraged component. For a mini future long, the leveraged component therefore increases with time to replicate interest paid by the investor. For a mini future short, the non-leveraged component increases with time to replicate interest paid to the investor [18, 19, 20]. If the stop loss level and the leveraged component differ, a salvage value is paid out to the investor after knock-out.

² For instance, on NDX one can invest in exotic indices such as the Vontobel Belt and Road Index, which tracks the price movements of companies that stand to profit from the realization of the Belt and Road Initiative, and the Solactive 5G Technology Performance Index, which tracks the performance of companies with significant business engagement in the areas of 5G technology.

Plain Vanilla Warrants. Plain vanilla warrants greatly resemble European options [21], and thus carry leverage naturally. To illustrate this, consider a long plain vanilla warrant (i.e. essentially a European call option), which is at-the-money⁴ with a strike price of 100 EUR. Suppose that given the maturity, volatility, and risk-free interest rate,⁵ the price of the warrant is 1 EUR. Further, suppose that the price of the underlying asset increases to 102 EUR at maturity, and the option is thus worth 2 EUR. A 2% increase in the underlying asset has thus doubled the value of the warrant.
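The arithmetic of this example can be sketched in a few lines of Python. The function and variable names are illustrative assumptions, and the payoff used is the standard European call payoff, max(S_T - K, 0), evaluated at maturity.

```python
def call_payoff(spot_at_maturity: float, strike: float) -> float:
    """Payoff of a European call (long plain vanilla warrant) at maturity."""
    return max(spot_at_maturity - strike, 0.0)


strike = 100.0          # EUR, at-the-money at inception
warrant_price = 1.0     # EUR, price paid for the warrant (taken as given in the text)
spot_start = 100.0      # EUR, underlying price at inception
spot_end = 102.0        # EUR, underlying price at maturity

underlying_return = spot_end / spot_start - 1.0
warrant_return = call_payoff(spot_end, strike) / warrant_price - 1.0

# Ratio of the two returns: how strongly the warrant amplified the move.
effective_leverage = warrant_return / underlying_return

print(f"underlying return: {underlying_return:.2%}")
print(f"warrant return:    {warrant_return:.2%}")
print(f"effective leverage over the period: {effective_leverage:.1f}")
```

The same function also shows the downside: if the underlying ends at or below the strike, the payoff is zero and the whole price paid for the warrant is lost.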

Turbo Warrants. A turbo warrant is a type of barrier option, and thus differs from a plain vanilla warrant. A call turbo warrant is similar to a down-and-out call option, and a put turbo warrant is similar to an up-and-out put option. However, when the barrier (stop loss level) is hit, the payout is not necessarily zero. This depends on whether the strike price K and the barrier L are equal or not. If they differ, a reference price is determined and a salvage value is calculated and paid out to the holder of the turbo warrant [22]. This differs from traditional down-and-out call and up-and-out put options in financial mathematics [23, p. 267-275].

Unlimited Turbos. Unlimited turbos are essentially mini futures, with one special property. The barrier is always set equal to the leveraged component. Thus, once the unlimited turbo is knocked out, the salvage value is always zero.

2.1.1 Speculation

In finance, speculation aims to make a profit from the price dynamics of some underlying security. Financial instruments allow investors to take a large speculative position with a relatively small initial deposit [24, p. 15]. In addition, investors can take a market position without investing capital directly in the components by adopting a leveraged position. Leverage allows investors a greater market exposure than their capital would otherwise grant. Speculators play an important role in the market. Firstly, they are prepared to carry more risk than the average investor, meaning they are willing to invest in as yet unproven markets or at times when the risk-averse tremble [25]. Thus, without speculators, only the large and well-established companies would be the ones procuring loans. Secondly, they tend to be more active traders and hence provide market liquidity, yielding a smaller bid-ask spread. There are two forms of speculation. A bullish speculator believes the price of the security will increase, while a bearish speculator seeks to profit from a decrease in the price of the security.

³ In the case of a mini future short, the barrier is set equal to or less than the leveraged component.

⁴ At-the-money refers to a situation where the strike price of an option equals the spot price of the underlying asset.

⁵ Maturity, volatility, and the risk-free interest rate are key components when pricing a derivative in financial mathematics.

2.1.2 Hedging

While speculating often increases the risk, hedging intends to decrease the variance associated with the portfolio. The ultimate objective is to reduce the risk of adverse price movements with respect to some underlying asset. There are various types of hedges. Short hedges are used by investors for protection against potential price declines in specific assets in the future. They are primarily recommended for investors who already own an asset and plan to sell it in the future [26, p. 66]. Long hedges are used for protection against potential price increases. Thus, these hedges are mainly used by manufacturers that know they will have to buy a specific asset in the future, e.g. oil, but seek to determine the price now [26, p. 66]. Another form of hedge is through diversification. Contrary to arguments in line with Modigliani and Miller (who claim that hedging by corporations should be irrelevant to shareholders because they can do it themselves by adopting a more well-diversified portfolio), studies have shown a positive correlation between the use of foreign currency derivatives and stock prices [27].

2.2 Mechanics of Derivatives Trading

This part discusses the crucial parties which together enable derivatives trading. These are derivatives exchanges, brokerage firms, issuers, and market makers.

2.2.1 Derivatives Exchanges

Derivatives exchanges exist to facilitate the transfer of financial risk and to provide investors with the opportunity for price discovery [28]. They function as the connection between cash markets, hedgers, and speculators [28, p. 3]. Most of the work of an exchange involves monitoring various flows, such as orders and trades. Exchanges primarily make their money by charging small commissions for each settled trade.

2.2.2 Brokerage Firms

Private investors cannot trade directly on derivatives exchanges, such as NDX or Chicago Mercantile Exchange. Instead, private investors must trade through a brokerage firm which is a market member of the exchange. NGM has some 20 market members [29]. These market members, or brokerage firms, act as middlemen and trade on behalf of their customers [30]. Avanza and Nordnet are two large Swedish online brokerage firms with a strong presence on NDX. Market members generally charge their customers a commission fee for each trade executed. These commission fees can be either fixed or value-based.


2.2.3 Issuer of Derivatives

The issuer is responsible for creating the derivatives and listing them on the exchanges. Thus, issuers create the leverage associated with the investment and are also responsible for following through with the financial transactions [31].

2.2.4 Market Maker

Market makers ensure liquidity and efficiency within the market by buying and selling substantial amounts of the asset [32]. Both individual investors and financial institutions can act as market makers [33]; on NDX, however, it is the issuers of ETPs that act as market makers. They profit from the bid-ask spread.

2.3 Volatility

The volatility of a financial security is defined as the standard deviation of the return in a year when the return is expressed using continuous compounding [26, p. 319]. Volatility carries great importance in financial mathematics when pricing financial derivatives, as it is one of five parameters in the celebrated Black-Scholes model [23, p. 108]. More relevant to this thesis is the fact that, as stated above, there exists a multitude of research which suggests that volatility has a great impact on trading volumes and number of trades in futures markets, stock markets, and currency option markets [14, 34, 35, 36]. One theory which explains the relation between volatility and trading volumes is the mixture of distributions hypothesis. By this hypothesis, both trading volumes and volatility are derived from the (unobservable) rate of information flow to the market [36]. Another hypothesis which explains the relationship between trading volume and volatility is the sequential arrival of information hypothesis. It assumes that information spreads sequentially among investors and traders [8]. Further, it assumes that “trading takes place after each trader receives information, but an uninformed trader will be unable to perfectly learn by observing the trading activities of informed traders” [8]. Thus, the hypothesis conjectures that there exists both a lagged and a contemporaneous relationship between trading volume and return volatility [8].

One problematic aspect of volatility is that it is not directly observable in the market. Two common approaches to deducing the volatility of an asset are to compute the historical volatility or the implied volatility. The historical volatility is computed using elementary statistical theory and historical security prices. However, as the name suggests, this only approximates the volatility over a past period. Volatility changes over time, so historical data need not reflect the present or future volatility. An alternative approach is to estimate the market’s expectation of the volatility, which is priced into option prices. Using the market price of an option written on the very same security whose volatility one wants to estimate, one can invert the Black-Scholes formula to compute the implied volatility [23, p. 108-110]. This is approximately how volatility indices such as the VIX, VVIX, and VSTOXX are computed, although the exact formulas differ somewhat [37].
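The two approaches can be illustrated with a minimal Python sketch. It is not the method used for the volatility indices mentioned above, and it rests on several assumptions that are not taken from the thesis: 252 trading days per year for annualization, a European call on a non-dividend-paying asset, and a simple bisection search to invert the Black-Scholes formula. All function names are hypothetical.

```python
import math


def historical_volatility(prices, periods_per_year=252):
    """Annualized sample standard deviation of log returns.

    Assumes equally spaced observations; 252 periods per year is an
    assumed convention for daily data.
    """
    log_returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    n = len(log_returns)
    mean = sum(log_returns) / n
    variance = sum((r - mean) ** 2 for r in log_returns) / (n - 1)
    return math.sqrt(variance * periods_per_year)


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call_price(spot, strike, maturity, rate, sigma):
    """Black-Scholes price of a European call without dividends."""
    vol_sqrt_t = sigma * math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * sigma ** 2) * maturity) / vol_sqrt_t
    d2 = d1 - vol_sqrt_t
    return spot * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)


def implied_volatility(price, spot, strike, maturity, rate,
                       lo=1e-6, hi=5.0, tol=1e-10):
    """Invert the Black-Scholes call price by bisection.

    Relies on the call price being increasing in sigma; the search
    bracket [lo, hi] is an assumption.
    """
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call_price(spot, strike, maturity, rate, mid) < price:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Round trip: price an option at a known volatility, then recover it.
quoted = bs_call_price(spot=100.0, strike=100.0, maturity=0.5, rate=0.01, sigma=0.25)
recovered = implied_volatility(quoted, spot=100.0, strike=100.0, maturity=0.5, rate=0.01)
print(f"recovered implied volatility: {recovered:.6f}")
```

The round trip at the end only checks internal consistency. With real data, the quoted price would be an observed market price, and the implied volatility would reflect whatever the market has priced in.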


2.4 Number of Trades

The term number of trades, also called transactions, simply refers to the number of deals or transactions in a day. A closely related term is trading volume, which refers to the number of contracts traded in a day [26, p. 52]. The trading volume is equal to the number of trades multiplied by the average trade size.

During 2019, the average turnover per trade in ETPs on NDX was 3,435 EUR [38], suggesting that the majority of trades were executed by small traders. For such small transactions, NGM’s commission mainly comes from fixed transaction fees [39]. Therefore, the number of trades has a greater impact on NGM’s revenue streams than trading volume, and is thus of greater interest to investigate. For other actors, such as market makers, which profit from the bid-ask spread, the trading volume is of greater interest since it is what drives their revenues.


3 Mathematical Theory

3.1 The Multiple Linear Regression Model

Regression analysis is a statistical technique used to analyze the relationship between a dependent variable and a set of predictor variables. The relationship is modeled and analyzed by fitting the dependent variable as a function of the predictor variables. The model then allows the practitioner to estimate conditional expectations of the response variable. The models can later be used for forecasting, but also for understanding relationships between the response and its predictors.

This thesis aims to model the relationship between a response variable and predictor variables using multiple linear regression. More specifically, the relationship is modeled according to the following equation

$$y_i = \beta_0 + \beta_1 x_{i,1} + \beta_2 x_{i,2} + \beta_3 x_{i,3} + \dots + \beta_k x_{i,k} + \varepsilon_i \tag{1}$$

where $y_i$ denotes the $i$th observed value of the dependent variable, $x_{i,j}$ denotes the $i$th observation of the $j$th predictor variable, $\beta_j$ denotes the linear coefficients for $j = 0, 1, 2, \dots, k$, and $\varepsilon_i$ the $i$th error term. $k$ denotes the number of predictor variables. Let $n$ denote the number of observations and $p = k + 1$. Then, equation 1 can be written in matrix form according to

$$y = X\beta + \varepsilon,$$

where $y$ is the $n \times 1$ vector of the observed values, $X$ is the $n \times p$ matrix of the observations of the independent variables, $\beta$ constitutes the $p \times 1$ vector of regression coefficients, and the $n \times 1$ vector $\varepsilon$ constitutes the random errors. In particular, $y$, $X$, $\beta$, and $\varepsilon$ are given by

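$$
y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad
X = \begin{bmatrix}
1 & x_{1,1} & x_{1,2} & \dots & x_{1,k} \\
1 & x_{2,1} & x_{2,2} & \dots & x_{2,k} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_{n,1} & x_{n,2} & \dots & x_{n,k}
\end{bmatrix}, \quad
\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}, \quad
\varepsilon = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}.
$$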

3.2 Ordinary Least Squares

Ordinary least squares (OLS) is the most common method for estimating the unknown regression parameters in linear regression models. The estimator $\hat{\beta}$ is chosen to minimize the sum of squared residuals, which yields the fitted model $\hat{y} = X\hat{\beta}$. Thus, $\hat{y}$ is the value predicted by the model.

3.2.1 Assumptions

The random errors, $\varepsilon_i$, are assumed to have mean zero conditional on the contemporaneous, past, and future values of the predictor variables. This assumption is referred to as strict exogeneity. The errors are also assumed to have constant variance, $\sigma^2$, known as homoscedasticity. Further, another vital assumption is that the errors are uncorrelated with one another. In addition, the random error terms are assumed to be normally distributed. While strict exogeneity, homoscedasticity, and uncorrelated errors are vital for the derivation and properties of the linear coefficients $\beta_j$, normality is not. However, the normality assumption allows for hypothesis testing, confidence intervals, and t-tests, which will be used throughout this thesis.

3.2.2 Derivation

The OLS estimator $\hat{\beta}$ is derived by minimizing the sum of squares

$$S(\beta) = \sum_{i=1}^{n} \varepsilon_i^2 = \varepsilon'\varepsilon = (y - X\beta)'(y - X\beta).$$

$S(\beta)$ can be expressed as

$$S(\beta) = y'y - \beta'X'y - y'X\beta + \beta'X'X\beta = y'y - 2\beta'X'y + \beta'X'X\beta.$$

Since $S(\beta)$ is convex in $\beta$, the first- and second-order optimality conditions imply that the minimum is obtained by differentiating $S(\beta)$ with respect to $\beta$ and setting the derivative to zero:

$$\left.\frac{\partial S(\beta)}{\partial \beta}\right|_{\hat{\beta}} = -2X'y + 2X'X\hat{\beta} = 0. \tag{2}$$

Equation 2 can be rewritten as the least-squares normal equations

$$X'X\hat{\beta} = X'y.$$

Thus, the OLS estimator $\hat{\beta}$ is given by

$$\hat{\beta} = (X'X)^{-1}X'y,$$

assuming the predictor variables are linearly independent. Hence, the fitted model is given by

$$\hat{y} = X\hat{\beta}. \tag{3}$$

Equation 3 can be written as

$$\hat{y} = X\hat{\beta} = X(X'X)^{-1}X'y = Hy,$$

where $H = X(X'X)^{-1}X'$ is known as the hat matrix.
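As a numerical illustration of the derivation, the following sketch solves the normal equations with NumPy on simulated data. The data, the coefficient values, and the variable names are assumptions made for this example and are unrelated to the thesis data set.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: n observations of k predictors (illustrative values only).
n, k = 200, 3
X_raw = rng.normal(size=(n, k))
X = np.column_stack([np.ones(n), X_raw])      # n x p design matrix, p = k + 1
beta_true = np.array([1.0, 2.0, -0.5, 0.3])   # assumed "true" coefficients
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Normal equations X'X beta_hat = X'y, solved without forming the inverse.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Hat matrix H = X (X'X)^{-1} X' and fitted values y_hat = H y.
H = X @ np.linalg.solve(X.T @ X, X.T)
y_hat = H @ y

print("beta_hat:", np.round(beta_hat, 3))
print("H idempotent:", np.allclose(H @ H, H))
print("trace(H) equals p:", np.isclose(np.trace(H), k + 1))
```

Solving the linear system with `np.linalg.solve` rather than computing $(X'X)^{-1}$ explicitly is a common numerical choice. The result is the same estimator as long as the columns of $X$ are linearly independent.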

3.2.3 Properties of the OLS coefficients

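Provided that the model is correct, $\hat{\beta}$ is an unbiased estimator of $\beta$, as shown below:

$$
\begin{aligned}
E[\hat{\beta}] &= E\left[(X'X)^{-1}X'y\right] \\
&= E\left[(X'X)^{-1}X'(X\beta + \varepsilon)\right] \\
&= E\left[(X'X)^{-1}X'X\beta + (X'X)^{-1}X'\varepsilon\right] = \beta,
\end{aligned}
$$

since $E(\varepsilon) = 0$ by assumption. The variance of $\hat{\beta}$ is derived through

$$
\begin{aligned}
\operatorname{Var}(\hat{\beta}) &= \operatorname{Var}\left[(X'X)^{-1}X'y\right] \\
&= (X'X)^{-1}X'\operatorname{Var}(y)\left[(X'X)^{-1}X'\right]' \\
&= \sigma^2 (X'X)^{-1}X'X(X'X)^{-1} = \sigma^2 (X'X)^{-1}.
\end{aligned}
$$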

The Gauss-Markov theorem establishes that the least squares estimator of $\beta$ is the best linear unbiased estimator given that the errors have mean zero, constant variance, and are uncorrelated [40, p. 80].
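The unbiasedness and variance results can be checked empirically with a small Monte Carlo sketch. The setup below is an assumption made for illustration: a fixed design matrix, independent normal errors with known $\sigma$, and 5,000 simulated samples.

```python
import numpy as np

rng = np.random.default_rng(1)

# Fixed design and assumed true parameters (illustrative values only).
n, sigma, reps = 50, 1.0, 5000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta = np.array([1.0, -2.0, 0.5])
XtX_inv = np.linalg.inv(X.T @ X)

# Each row of E is one independent draw of the n x 1 error vector.
E = rng.normal(scale=sigma, size=(reps, n))
Y = X @ beta + E                      # reps x n, one simulated sample per row

# Row-wise OLS: beta_hat' = y' X (X'X)^{-1}, using symmetry of (X'X)^{-1}.
B = Y @ X @ XtX_inv                   # reps x p matrix of estimates

mean_beta_hat = B.mean(axis=0)        # should be close to beta
emp_cov = np.cov(B, rowvar=False)     # should be close to sigma^2 (X'X)^{-1}
theo_cov = sigma ** 2 * XtX_inv

print("mean of estimates:", np.round(mean_beta_hat, 3))
print("max abs covariance gap:", np.abs(emp_cov - theo_cov).max())
```

The agreement is only approximate because of simulation noise, and it shrinks as the number of replications grows.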


3.3 Residual Analysis

Residual analysis is useful and efficient for detecting inadequacies in the model or the data. As discussed in section 3.2.1 above, there are a few crucial underlying assumptions about the error $\varepsilon$, summarized as $\varepsilon \sim N(0, \sigma^2 I)$. These assumptions can be verified using residuals.

The observed residuals $e$ for $n$ observations are defined as

$$e = y - \hat{y} = (I - H)y,$$

where $I$ is the $n \times n$ identity matrix and $H$ is the hat matrix defined previously. The variance of the residuals is

$$\mathrm{Var}(e) = \mathrm{Var}((I - H)y) = \sigma^2(I - H),$$

due to the idempotency of $I - H$ and $H$ [40, p. 131]. Further, since the diagonal elements of $H$ are generally not identical, the residuals $e$ do not have constant variance. Therefore, in order to compare the residuals of different observations in any meaningful manner, one needs to scale the observed residuals. One popular scaling method is the studentized residual, which has unit variance when the model is correct. The studentized residual is defined as

$$r_i = \frac{e_i}{\sqrt{MS_{Res}(1 - h_{ii})}},$$

where $e_i$ is the ith residual, $MS_{Res}$ is the residual mean square, and $h_{ii}$ is the ith diagonal element of the hat matrix. Observations with studentized residuals greater than 3 in absolute value are generally considered potential outliers [40, p. 131].
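As an illustration of the scaling above, the following is a minimal sketch, assuming Python with NumPy rather than R; the function name `studentized_residuals` and the synthetic data with one injected outlier are hypothetical.

```python
import numpy as np

def studentized_residuals(X, y):
    """Compute r_i = e_i / sqrt(MS_Res * (1 - h_ii)) for an OLS fit of y on X."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix
    h = np.diag(H)                          # leverages h_ii
    e = y - H @ y                           # residuals e = (I - H) y
    ms_res = e @ e / (n - p)                # residual mean square
    return e / np.sqrt(ms_res * (1.0 - h))

# Synthetic data with one gross outlier, assumed for illustration only.
rng = np.random.default_rng(1)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = 3.0 + 1.5 * X[:, 1] + rng.normal(size=n)
y[5] += 15.0

r = studentized_residuals(X, y)
flagged = np.flatnonzero(np.abs(r) > 3)     # indices of potential outliers
```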

3.3.1 Graphical Residual Analysis

Graphical techniques are excellent for identifying abnormal values of residuals. Given that the assumptions in the previous part are correct, as well as that the model in general is correct, certain characteristics of the residuals are expected. To test these expected behaviors, one can plot the residuals in different ways. Any deviations from the majority of the residuals suggest that certain model inadequacies are present. There are different types of residual plots.

One such residual plot is the Tukey-Anscombe plot, which is a plot of the residuals versus the fitted values. It provides an efficient way of verifying or rejecting several of the assumptions. The plot in figure 1a) is the expected pattern if the underlying assumptions are satisfied. Figure 1b) illustrates the problem of dispersion, a case of heteroscedasticity. Lastly, figure 1c) shows a pattern which is often attributed to the lack of some important independent variable [41, p. 346-348].

Another common residual plot is the Q-Q plot (short for quantile-quantile plot), also known as the normal probability plot, which is used to detect non-normality. It is a plot of the ordered residuals against the normal order statistics. If the residuals are normally distributed, they should form a straight line as in figure 1d). Any deviation from this suggests issues with normality.


Figure 1: Residual plots. Panels a)-c) show Tukey-Anscombe plots: (a) expected pattern, (b) dispersion, (c) asymmetry. Panel d) shows a Q-Q plot.

3.4 Autocorrelation of Errors

One of the fundamental assumptions about the true error $\varepsilon$ is that the errors are uncorrelated. Any violation of this assumption may seriously harm the model. Autocorrelation refers to the situation where errors are correlated with each other. The presence of autocorrelation implies that the OLS estimates are no longer the minimum variance estimates (although they remain unbiased) and may cause the error variance $\sigma^2$ to be seriously underestimated. As a consequence, confidence intervals, prediction intervals and hypothesis tests become less reliable [40, p. 475].

Time series data often exhibit some autocorrelation. One approach to handling autocorrelation is to add a lagged dependent variable [40, p. 494-495].

3.4.1 Lag Plot

A lag plot can be used to detect autocorrelation. It plots the residuals $e_t$ and $e_{t-1}$ against each other. If a linear shape appears, autocorrelation is present. If the linear shape has a positive slope, the autocorrelation is positive, and if the linear shape has a negative slope, the autocorrelation is negative. If no pattern can be identified, it is plausible that there is no autocorrelation. See figure 2 [42].


Figure 2: Lag plots. (a) Positive autocorrelation, (b) no autocorrelation.

3.4.2 Durbin-Watson Test

The Durbin-Watson test is commonly used to detect autocorrelation. The Durbin-Watson test statistic $d$ is defined as:

$$d = \frac{\sum_{t=2}^{T}(e_t - e_{t-1})^2}{\sum_{t=1}^{T} e_t^2} \approx 2(1 - \rho),$$

where $\rho$ is the simple correlation between $e_t$ and $e_{t-1}$ [41, p. 355]. The value of $d$ lies between 0 and 4. A value less than 2 indicates positive autocorrelation, and a value greater than 2 indicates negative autocorrelation. A value of $d$ close to 2 indicates no first-order autocorrelation. In the Durbin-Watson test one tests the null hypothesis

$$H_0: \rho = 0,$$

against the alternative hypothesis

$$H_1: \rho \neq 0.$$

The test can be one-sided or two-sided. For both cases, there exist lengthy tables of critical values for deciding whether or not to reject the null hypothesis, see [43] for such tables. However, a rule of thumb is that values between 1.5 and 2.5 are normal and in general not cause for alarm [44].
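The statistic and its approximation can be sketched as follows, assuming Python with NumPy rather than R. The function names and the simulated AR(1) residuals are hypothetical, and the lag-1 correlation is estimated here by the ratio of the lagged cross-product to the sum of squares, which is one common choice. Critical values are not computed; they would still have to be taken from tables such as [43].

```python
import numpy as np

def durbin_watson(e):
    """d = sum_{t=2}^T (e_t - e_{t-1})^2 / sum_{t=1}^T e_t^2."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

def lag1_autocorr(e):
    """Estimate of rho: sum_{t=2}^T e_t e_{t-1} / sum_{t=1}^T e_t^2."""
    e = np.asarray(e, dtype=float)
    return np.sum(e[1:] * e[:-1]) / np.sum(e ** 2)

# Simulated residuals with positive serial correlation (illustration only).
rng = np.random.default_rng(2)
T = 2000
e = np.zeros(T)
for t in range(1, T):
    e[t] = 0.6 * e[t - 1] + rng.normal()

d = durbin_watson(e)
rho = lag1_autocorr(e)
approx = 2.0 * (1.0 - rho)   # d differs from this only by end-point terms
```

With this estimate of the correlation, the gap between d and 2(1 − ρ) equals the sum of the squared first and last residuals divided by the sum of squares, which vanishes as T grows.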

3.4.3 Lagged Dependent Variable Models

Introducing a lagged dependent variable (LDV) in an OLS regression is a popular method to mitigate autocorrelation issues:

$$y_t = \phi y_{t-1} + x_t\beta + \varepsilon_t.$$

Including an LDV does introduce some bias in the model. This bias can range from tiny to severe. However, in many cases an LDV is called for (e.g. due to autocorrelated errors), and excluding an LDV from the model can incur dramatic bias [45].

Some researchers have argued that the Durbin-Watson test is biased toward 2 when LDVs are included in the OLS estimates. However, other studies have shown that this is not the case, and that the Durbin-Watson test is completely legitimate even in this case [46].
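Constructing the LDV regressor amounts to shifting the response by one period and dropping the first observation. The following is a minimal sketch, assuming Python with NumPy rather than R; the helper name `add_lagged_dependent` and the simulated series are hypothetical.

```python
import numpy as np

def add_lagged_dependent(y, X):
    """Return (y_t, X_t) where X_t has y_{t-1} as its first column.

    The first observation is dropped because it has no predecessor.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    y_t = y[1:]
    X_t = np.column_stack([y[:-1], X[1:]])
    return y_t, X_t

# Simulated series y_t = 0.5 y_{t-1} + 1 + 2 x_t + eps_t (illustration only).
rng = np.random.default_rng(3)
T = 2000
x = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.5 * y[t - 1] + 1.0 + 2.0 * x[t] + rng.normal(scale=0.5)

X = np.column_stack([np.ones(T), x])
y_t, X_t = add_lagged_dependent(y, X)
coef, *_ = np.linalg.lstsq(X_t, y_t, rcond=None)
phi_hat = coef[0]   # estimated coefficient of the lagged dependent variable
```

In this simulation the errors are independent, so the estimate of φ is well behaved; the bias discussed above arises when the errors themselves are autocorrelated.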


3.5 Influential Observations

Influential observations are points with an undesirably large effect on the model fit. These points are usually characterized by a large residual and/or high leverage. High leverage observations are distant from the centroid of the data in X-space, leading to an inordinate impact on the estimated regression coefficients [40, p. 212]. Observations with large residuals tend to differ substantially from the rest of the data and pull the fitted regression towards them. Indeed, the combination of large residuals and high leverage increases the importance of proper diagnostic treatment.

3.5.1 Deletion Diagnostics

Below, three diagnostics are discussed, all of which measure the effect of deleting the ith observation. They are therefore referred to as deletion diagnostics.

Cook’s D. The influence measure Cook’s D is defined as

$$D_i = \frac{(\hat{\beta}_{(i)} - \hat{\beta})'X'X(\hat{\beta}_{(i)} - \hat{\beta})}{p\,MS_{Res}},$$

where $MS_{Res} = SS_{Res}/(n - p)$ and $\hat{\beta}_{(i)}$ is the OLS estimate of $\beta$ when the ith observation is deleted. Cook’s D measures the shift in $\hat{\beta}$ when a certain observation $i$ is deleted. An observation $i$ may be influential if $D_i > F_{0.5,p,n-p}$ [41, p. 367].

DFFITS. The influence measure DFFITS is defined as

$$DFFITS_i = \frac{\hat{y}_i - \hat{y}_{(i)}}{\sqrt{S^2_{(i)} h_{ii}}},$$

where $\hat{y}_{(i)} = x_i'\hat{\beta}_{(i)}$ is the estimated mean of the ith observation and $S^2_{(i)}$ is the estimated error variance, both computed without the ith observation. It measures the shift in $\hat{y}_i$ when the ith observation is deleted. An observation $i$ may be influential if $|DFFITS_i| > 2\sqrt{p/n}$ [41, p. 367].

COVRATIO. The influence measure COVRATIO is defined as

$$COVRATIO_i = \frac{(S^2_{(i)})^p}{(MS_{Res})^p}\left(\frac{1}{1 - h_{ii}}\right).$$

It measures the impact of the ith observation on the precision of the estimates of the regression coefficients [41, p. 365]. An observation $i$ may be influential if $|COVRATIO_i - 1| > 3p/n$ [41, p. 367].
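The three diagnostics can be computed literally from their definitions by refitting the model with each observation deleted in turn. The following is a minimal sketch, assuming Python with NumPy rather than R; the function name `deletion_diagnostics` and the synthetic data are hypothetical. Only the DFFITS and COVRATIO cutoffs are applied, since the Cook's D cutoff requires an F quantile that NumPy does not provide.

```python
import numpy as np

def deletion_diagnostics(X, y):
    """Cook's D, DFFITS and COVRATIO via explicit leave-one-out refits."""
    n, p = X.shape
    XtX = X.T @ X
    beta = np.linalg.solve(XtX, X.T @ y)
    h = np.diag(X @ np.linalg.solve(XtX, X.T))   # leverages h_ii
    e = y - X @ beta
    ms_res = e @ e / (n - p)
    cooks, dffits, covratio = np.empty(n), np.empty(n), np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        Xi, yi = X[keep], y[keep]
        beta_i = np.linalg.solve(Xi.T @ Xi, Xi.T @ yi)   # estimate without obs. i
        res_i = yi - Xi @ beta_i
        s2_i = res_i @ res_i / (n - 1 - p)               # S^2_(i)
        shift = beta_i - beta
        cooks[i] = shift @ XtX @ shift / (p * ms_res)
        dffits[i] = (X[i] @ beta - X[i] @ beta_i) / np.sqrt(s2_i * h[i])
        covratio[i] = (s2_i / ms_res) ** p / (1.0 - h[i])
    return cooks, dffits, covratio, h, e, ms_res

# Synthetic data, assumed for illustration only.
rng = np.random.default_rng(4)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)
y[0] += 8.0   # make the first observation unusual

cooks, dffits, covratio, h, e, ms_res = deletion_diagnostics(X, y)
dffits_flag = np.abs(dffits) > 2.0 * np.sqrt(p / n)
covratio_flag = np.abs(covratio - 1.0) > 3.0 * p / n
```

Refitting n times is chosen here for transparency; closed-form expressions in terms of the residuals and leverages give the same values without refitting.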

3.6 Transformations

Transformations are useful when any of the fundamental assumptions about normality, homoscedasticity and/or linearity are not satisfied. There are two approaches to transforming a linear model fit: transforming the dependent variable $y$ or transforming the regressor variables $x_i$ [40, p. 182].


3.6.1 Box-Cox Method

The Box-Cox method is an objective technique to help specify the most appropriate transformation of the dependent variable. The method combines the objectives of inducing homogeneous variance, obtaining a simple relationship and improving normality in a linear model.

The method uses a family of power transformations $y^{(\lambda)}$, which are defined as:

$$y^{(\lambda)} = \begin{cases} \dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\ \ln(y), & \lambda = 0. \end{cases}$$

The appropriate $\lambda$ is selected by some objective criterion, e.g. maximum likelihood, the Shapiro-Wilk test, or the probability plot correlation coefficient. The inventors of the method, Box and Cox, proposed that $\lambda$ is chosen as the maximum likelihood estimator [47]. A popular alternative criterion is to maximize the probability plot correlation coefficient (PPCC), the default selection criterion for the function boxcox in the R package EnvStats. Some research has shown that the PPCC is superior to other criteria, such as maximum likelihood [48]. The technical details of PPCC are omitted from this thesis.
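The transformation family itself is simple to state in code. The following is a minimal sketch in plain Python of the power transformation only; it does not reproduce the boxcox function of EnvStats or any criterion for selecting λ, and the function name `box_cox` is hypothetical.

```python
import math

def box_cox(y, lam):
    """Apply the Box-Cox power transformation to a sequence of positive values."""
    if any(v <= 0 for v in y):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]

values = [1.0, 4.0, 9.0]
log_transformed = box_cox(values, 0)     # natural logarithm
sqrt_like = box_cox(values, 0.5)         # equals 2 * (sqrt(y) - 1)
```

Note that λ = 0.5 yields an affine function of the square root, so the square root and logarithm transformations correspond to λ = 0.5 and λ = 0 within this family.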

3.7 Multicollinearity

The columns $X_1, X_2, \dots, X_p$ of $X$ are linearly dependent if and only if there exists a set of constants $t_1, t_2, \dots, t_p$, not all zero, such that

$$\sum_{j=1}^{p} t_j X_j = 0 \tag{4}$$

[40, p. 286]. Multicollinearity refers to near-linear dependency among the regressor variables [40, p. 117], that is, equation 4 holds approximately for some subset of the columns. Almost all data sets suffer from multicollinearity; the question is rather to what degree [40, p. 286]. Severe multicollinearity increases the variance of the estimated linear coefficients, $\hat{\beta}$, and can sometimes lead to estimates of $\beta$ that are too large in magnitude. Therefore, the severity requires scrutiny.

3.7.1 Variance Inflation Factor

The variance inflation factor, VIF, is defined as:

$$VIF_j = \frac{1}{1 - R_j^2}$$

for the jth regressor coefficient [41, p. 372]. Here, $R_j^2$ is the coefficient of determination obtained from regressing $x_j$ on the other $k - 1$ regressor variables. $R_j^2$ values approaching 1 indicate near singularity, implying that $VIF_j$ will be large. VIF values above 10 indicate severe multicollinearity [41, p. 377].
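The definition can be followed step by step by regressing each regressor on the others. The following is a minimal sketch, assuming Python with NumPy rather than R; the function name `vif` and the synthetic, deliberately collinear data are hypothetical.

```python
import numpy as np

def vif(X):
    """Variance inflation factors for an n x k regressor matrix (no intercept column)."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        xj = X[:, j]
        # Regress x_j on an intercept and the remaining k - 1 regressors.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, xj, rcond=None)
        resid = xj - others @ coef
        r2_j = 1.0 - resid @ resid / np.sum((xj - xj.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2_j)
    return out

# Synthetic regressors: the third is nearly a copy of the first (illustration only).
rng = np.random.default_rng(5)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + 0.1 * rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

vifs = vif(X)
severe = vifs > 10   # rule of thumb from the text
```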

3.8 Variable Selection

There are two predominant methods for variable selection in linear regression, namely best subsets regression and stepwise regression methods.


Best subsets regression compares all models which can be constructed and then chooses the best model based on some selection criterion. If there are $k$ regressor variables, a total of $2^k$ models can be constructed. As $k$ increases, this number quickly becomes huge. For instance, performing best subsets regression with 65 regressor variables requires constructing and comparing $3.7 \cdot 10^{19}$ models. Best subsets regression becomes infeasible in practice for $k > 40$ [49, p. 58]. Due to the very large number of variables included in this analysis, best subsets regression is omitted.
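The model count quoted above follows directly from the fact that each regressor is either included or excluded. A short check in plain Python (variable names are illustrative):

```python
# Each of the k regressors is either in or out of the model, giving 2**k subsets.
k = 65
n_models = 2 ** k
n_models_sci = f"{n_models:.1e}"   # scientific notation with one decimal

# For comparison, the count at the practical limit k = 40 mentioned in the text.
n_models_at_limit = 2 ** 40
```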

3.8.1 Backward Elimination

Stepwise regression methods require significantly less computing than best subsets regression. They work by sequentially adding or deleting regressor variables one at a time. Stepwise regression methods can be divided into three broad categories: forward selection, backward elimination, and stepwise regression.

Backward elimination starts with a full model with $k$ regressors and then eliminates one regressor variable at each step. At every step, some test statistic is computed for each variable, e.g. the p-value, and the regressor with the worst statistic is omitted from the model. This procedure is repeated until either all regressor variables satisfy some preselected cutoff value or a desired number of regressor variables remain in the model [41, p. 213]. Backward elimination works very well as a variable selection method and is heavily favored in the research community [40, p. 347].
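The procedure can be sketched as follows, assuming Python with NumPy rather than R. To avoid a dependency on distribution functions, this sketch swaps the p-value criterion for the absolute t-statistic with a cutoff of 1.96, which for large samples roughly corresponds to a p-value of 0.05; this substitution, the function names and the synthetic data are assumptions made for illustration and are not the exact procedure used in this thesis.

```python
import numpy as np

def t_statistics(X, y):
    """t-statistics beta_j / se(beta_j) for an OLS fit of y on X."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta
    ms_res = e @ e / (n - p)
    return beta / np.sqrt(ms_res * np.diag(XtX_inv))

def backward_elimination(X, y, names, t_cut=1.96):
    """Drop the regressor with the smallest |t| until all remaining |t| >= t_cut.

    X holds the regressors without an intercept; the intercept is always kept.
    """
    n = X.shape[0]
    active = list(range(X.shape[1]))
    while active:
        design = np.column_stack([np.ones(n), X[:, active]])
        t_abs = np.abs(t_statistics(design, y))[1:]   # skip the intercept
        worst = int(np.argmin(t_abs))
        if t_abs[worst] >= t_cut:
            break
        active.pop(worst)
    return [names[j] for j in active]

# Synthetic data: only x0 and x3 affect the response (illustration only).
rng = np.random.default_rng(6)
n = 300
X = rng.normal(size=(n, 5))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=n)
names = ["x0", "x1", "x2", "x3", "x4"]

selected = backward_elimination(X, y, names)
```

Irrelevant regressors may occasionally survive by chance, which is inherent in any cutoff-based elimination.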

3.8.2 Hypothesis Testing

In order to determine the significance of the linear relationship in the model, a global F-test can be used [50]. The corresponding null hypothesis

$$H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$$

is formulated, which is equivalent to testing whether any of the variation in the response variable is explained by the regressor variables. The alternative hypothesis is

$$H_1: \beta_j \neq 0 \text{ for at least one } j,$$

or, in words, at least one of the regressor variables is significant to the model. The test statistic is derived through

$$SS_T = SS_R + SS_{Res} \tag{5}$$

where $SS_T$ denotes the total sum of squares, $SS_{Res} = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$, and $SS_R = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2$. Hence, equation 5 decomposes the total sum of squares into the part explained by the regression, $SS_R$, and the residual sum of squares, $SS_{Res}$ [40, p. 84]. Assuming $H_0$ is true, it can be shown that

$$\frac{SS_R}{\sigma^2} \sim \chi^2_k, \qquad \frac{SS_{Res}}{\sigma^2} \sim \chi^2_{n-k-1},$$


and that $SS_{Res}$ and $SS_R$ are independent. The test statistic $F_0$ is then constructed as

$$F_0 = \frac{SS_R / k}{SS_{Res} / (n - k - 1)} = \frac{MS_R}{MS_{Res}}.$$

Therefore, under $H_0$, $F_0$ follows an $F_{k,n-k-1}$ distribution, and the null hypothesis $H_0$ can be rejected at significance level $\alpha$ if $F_0 > F_{\alpha,k,n-k-1}$.
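The decomposition in equation 5 and the resulting statistic can be sketched as follows, assuming Python with NumPy rather than R; the function name `global_f_statistic` and the synthetic data are hypothetical. The critical value is not computed here and would have to be taken from an F table or a statistics library.

```python
import numpy as np

def global_f_statistic(X, y):
    """Return (F0, SS_R, SS_Res, SS_T) for a design matrix X that includes an intercept."""
    n, p = X.shape
    k = p - 1                                   # number of regressors
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ beta
    ss_res = np.sum((y - y_hat) ** 2)
    ss_r = np.sum((y_hat - y.mean()) ** 2)
    ss_t = np.sum((y - y.mean()) ** 2)
    f0 = (ss_r / k) / (ss_res / (n - k - 1))
    return f0, ss_r, ss_res, ss_t

# Synthetic data, assumed for illustration only.
rng = np.random.default_rng(7)
n, k = 100, 3
Z = rng.normal(size=(n, k))
X = np.column_stack([np.ones(n), Z])
y = 0.5 + Z @ np.array([1.0, 0.0, -2.0]) + rng.normal(size=n)

f0, ss_r, ss_res, ss_t = global_f_statistic(X, y)
```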

3.9 Model adequacy

The coefficient of determination, $R^2$, is a measure used for examining model adequacy. $R^2$ is defined as

$$R^2 = \frac{\text{Variance explained by the model}}{\text{Total variance}} = 1 - \frac{SS_{Res}}{SS_T}.$$

The measure reflects the distance between the observed values and the fitted values. More specifically, it measures the proportion of the variance in the response variable that is explained by the set of independent variables. However, $R^2$ never decreases when more independent variables are added to the model, regardless of whether they are relevant. An alternative measure is given by the adjusted $R^2$, defined as

$$R^2_{adj} = 1 - \frac{SS_{Res}/(n - p)}{SS_T/(n - 1)},$$

which is similar to $R^2$ but increases only if the added independent variable reduces the residual mean square [40, p. 88].
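The difference between the two measures can be illustrated by adding an irrelevant regressor to a model. The following is a minimal sketch, assuming Python with NumPy rather than R; the function name `r_squared` and the synthetic data are hypothetical.

```python
import numpy as np

def r_squared(X, y):
    """Return (R^2, adjusted R^2) for a design matrix X that includes an intercept."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    ss_res = np.sum((y - X @ beta) ** 2)
    ss_t = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_t
    r2_adj = 1.0 - (ss_res / (n - p)) / (ss_t / (n - 1))
    return r2, r2_adj

# Synthetic data, assumed for illustration only.
rng = np.random.default_rng(8)
n = 80
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

X_base = np.column_stack([np.ones(n), x])
X_full = np.column_stack([X_base, rng.normal(size=n)])   # adds a pure-noise regressor

r2_base, adj_base = r_squared(X_base, y)
r2_full, adj_full = r_squared(X_full, y)
```

Here `r2_full` cannot be smaller than `r2_base`, whereas `adj_full` may be smaller than `adj_base` if the added regressor does not reduce the residual mean square.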

The p-value of each linear coefficient examines the null hypothesis that the coefficient $\beta_j$ is equal to zero. A low p-value (typically < 0.05, depending on the level of significance chosen by the analyst) indicates that the null hypothesis can be rejected. A failure to reject the null hypothesis due to a high p-value suggests that the predictor variable might be insignificant.


4 Methodology

4.1 Data

The data was obtained in part from NGM and in part from Yahoo Finance. Yahoo Finance’s data provider is ICE Data Services [51]. The data consisted of 1,269 observations with 66 variables (one dependent variable and 65 regressor variables), and was modified (according to section 4.5.1) in Microsoft Excel to fit the purposes of this analysis. A complete list of all regressor variables is found in table 3.

4.2 Software

The regression analysis was performed in the integrated development environment RStudio using the programming language R.

4.3 Timeline

The analysis was performed with data over a five-year period, from 2015-01-01 to 2019-12-31.

4.4 Delimitations

Delimitations were made to only examine trading days for which exchanges in all markets with indices included in our analysis were open. Thus, trading days on NDX with public holidays in any of the markets, e.g. Thanksgiving in the United States and Midsummer in Sweden, were omitted from the analysis. In total, 107 trading days were omitted. A complete list of these can be found in table 8 in section A.1. The reason for this omission of certain trading days was to construct a model as accurate as possible.

4.5 Variables

On NDX, ETPs are benchmarked to either stocks, indices, commodities, currencies or interest rates. Due to the limited number of ETPs benchmarked to interest rates (see footnote 6), no regressor variables were related to interest rates. Instead, focus was devoted to equity indices, volatility indices, stocks, commodities, and currencies as regressor variables. The following section describes which regressor variables were included and why. However, the reader will first be acquainted with the methods used to alter the data into appropriate regressor variables.

4.5.1 Initial Variable Treatments

To fully exploit the information about e.g. equity indices and their relation to the number of trades in ETPs on NDX, one needs to transform the insignificant asset prices/values. It is very improbable that an equity index’s value will have any effect on the number of trades. However, daily returns of indices could possibly help explain the number of trades in ETPs. As previously noted, studies have suggested that price changes and daily returns of securities affect the trading volume and number of trades. Therefore, the daily return of most assets was chosen rather than the assets’ prices as regressor variables. Note that for some assets the actual price (e.g. VIX) is used.

Footnote 6: Only approximately 70 ETPs are benchmarked to interest rates out of a total of 16,000 ETPs on NDX (as of April 2020).

The daily return of an asset on trading day $t$, expressed in percent and denoted by $r_t$, is computed accordingly:

$$r_t = 100 \times \left(\frac{p_t}{p_{t-1}} - 1\right)$$

where $p_t$ denotes the security’s closing price on trading day $t$.

We identified a severe weakness in a previous bachelor thesis on the topic of trading volume, ”Trading volume at Avanza” [52]. The authors of the thesis attempted, as the title suggests, to determine the factors which affect the trading volume at Avanza. They did so using multiple linear regression, and obtained an adjusted $R^2$ of 1.926% in their final model. We believe that this poor model accuracy was due to the fact that they used the daily returns of securities (e.g. indices) as regressor variables, without any sort of variable treatment or modification. The problematic aspect is that, in a linear fit, this implies that only a positive (or only a negative) daily return in a security can increase the trading volume, whilst a return of the opposite sign will decrease it. This sort of variable treatment (or rather, mistreatment) neglects and contradicts previous research which advocates that it is the volatility and price movements that affect trading volumes [34].

Therefore, this issue has been bypassed by using the absolute value of the daily return, and in some cases the regressor variables are separated into two variables, one of which contains the positive returns and one of which contains the negative returns. This is clarified mathematically below.

Consider the daily return of a security, $r_t$. Let $r_t^+$ and $r_t^-$ denote the positive return and negative return, respectively. These are defined as

$$r_t^+ = r_t \cdot 1_{\{r_t > 0\}}$$

$$r_t^- = r_t \cdot 1_{\{r_t < 0\}}$$

where $1$ denotes the indicator function.

The idea behind this separation (compared to simply using the absolute value) is that an increase in a security’s price might not trigger the same ETP trading pattern as a decrease in the security’s price, thus resulting in differing regression coefficients. Performing this separation of variables could therefore possibly increase the accuracy of the model. For instance, in a bearish market an investor might choose to hedge against further falls, or to go long with leverage to capitalize on a potential price rebound (using ETPs). Conversely, in a bullish market there might not exist the same incentives to use ETPs, and the investor might choose a more long-term strategy and invest in single stocks, an investment option which also often incurs lower trading costs. We motivate this method of separation based on previous research which has suggested that positive and negative returns have different effects on the trading volume, e.g. [13].
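The variable treatments described above can be sketched in a few lines. The thesis performed this step in Microsoft Excel; the following plain Python version is only an illustration, and the function names and the toy price series are hypothetical.

```python
def daily_returns(prices):
    """Percentage returns r_t = 100 * (p_t / p_{t-1} - 1) for consecutive closing prices."""
    return [100.0 * (p1 / p0 - 1.0) for p0, p1 in zip(prices, prices[1:])]

def split_returns(returns):
    """Split r_t into r_t^+ (positive part) and r_t^- (negative part)."""
    pos = [r if r > 0 else 0.0 for r in returns]
    neg = [r if r < 0 else 0.0 for r in returns]
    return pos, neg

def absolute_changes(returns):
    """Absolute value of the daily return."""
    return [abs(r) for r in returns]

# Toy closing prices (illustration only): up 2 %, down 2 %, unchanged.
prices = [100.0, 102.0, 99.96, 99.96]
r = daily_returns(prices)
r_pos, r_neg = split_returns(r)
r_abs = absolute_changes(r)
```

By construction the two parts sum to the original return on every day, and the negative part keeps its sign, so the fitted coefficients for the two variables can differ in both size and direction.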

Regressor variables which comprise the positive return of some asset have been named on the form asset pos. Equivalently, the negative returns have been named asset neg. Regressor variables which comprise the absolute value of the daily return have been named asset change. For the cases where the regressor variables equal some asset’s price or value, the variables have been named asset price.

4.5.2 Number of Trades

The number of trades in ETPs is the aggregated number of trades in all ETPs on NDX during one trading day. The number of trades is the dependent variable in our analysis. Henceforth, y and trades will be used interchangeably to denote the number of trades.

4.5.3 Equity Indices

A substantial portion of the trading at NDX is in instruments with equity indices as underlying assets [5]. These include (but are not limited to) CAC 40, DAX 30, DJIA, EURO STOXX 50, FTSE 100, NASDAQ-100, OMXS30, and S&P 500. All these indices are included in the analysis, except for OMXS30 and FTSE 100, which were replaced by OMXSPI and Cboe UK 100 due to erroneous data. Evidently, some of the world’s largest stock exchanges are absent and not represented by these indices, including the exchanges in Shanghai, Shenzhen, Hong Kong, London, Toronto, Mumbai, Sydney and Seoul. To somewhat cover these too, the equity index MSCI World Index was included in the analysis.

Further, the equity indices OMXH25, OBX, and OMXC20 were included despite the lack of ETPs with these as the underlying asset. This was due to the substantial trading in Finnish, Norwegian and Danish single stocks on NDX.

For all the aforementioned indices, a separation between the positive and negative daily return has been done according to section 4.5.1. That is, each equity index is represented by two variables in the regression analysis, for instance DAX 30 is represented by dax30 pos and dax30 neg.

Equity indices are most often price-weighted or capitalization-weighted (also cap-weighted). In a price-weighted index, each constituent is weighted in proportion to its share price. In a cap-weighted index, each constituent is weighted in proportion to its market capitalization.

See table 1 for a full list of included equity indices and descriptions, and table 3 for the corresponding regressor variables.

4.5.4 Stocks

Substantial trading takes place in ETPs with single stocks as the underlying asset. However, considerable price movements in single stocks, e.g. Tesla, are not fully captured in equity indices such as the S&P 500. To account for some of the most frequently traded stocks on NDX in our analysis, the following stocks are included: Tesla, Inc., Apple, Inc., Amazon, Inc., H & M Hennes & Mauritz AB, Aktiebolaget Volvo, Telefonaktiebolaget LM Ericsson, Lundin Petroleum AB (see footnote 7), Danske Bank A/S, Nokia Abp, and DNO ASA.

For the American single stocks (i.e. Tesla, Apple and Amazon), the positive and negative daily returns are separated according to section 4.5.1 as different variables. For the Nordic single stocks (i.e. the remaining ones), the absolute value of the relative price change is used as the variable. This was purely to simplify the analysis. See table 2 for a complete list of the included stocks and table 3 for the corresponding regressor variables.

Footnote 7: On April 6 2020, Lundin Petroleum AB changed its name to Lundin Energy AB with the new ticker symbol LUNE.

Table 1: Equity indices

| Equity index | Description |
|---|---|
| CAC 40 | Cap-weighted index of Euronext Paris |
| Cboe UK 100 | Cap-weighted index of London Stock Exchange |
| DAX 30 | Cap-weighted index of Frankfurt Stock Exchange |
| DJIA | Price-weighted index of NYSE and Nasdaq |
| Euro STOXX 50 | Cap-weighted index of 50 Eurozone stocks |
| MSCI World Index | Cap-weighted index of 1,643 global stocks |
| NASDAQ-100 | Cap-weighted index of Nasdaq |
| OBX | Cap-weighted index of Oslo Børs |
| OMXC20 | Cap-weighted index of Nasdaq Copenhagen |
| OMXH25 | Cap-weighted index of Nasdaq Helsinki |
| OMXSPI | Cap-weighted index of Nasdaq Stockholm |
| S&P 500 | Cap-weighted index of NYSE and Nasdaq |

Table 2: Stocks

| Corporation | Stock ticker |
|---|---|
| Tesla, Inc. | TSLA |
| Apple, Inc. | AAPL |
| Amazon, Inc. | AMZN |
| H & M Hennes & Mauritz AB | HM B |
| Aktiebolaget Volvo | VOLV B |
| Telefonaktiebolaget LM Ericsson | ERIC B |
| Lundin Petroleum AB | LUPE |
| Danske Bank A/S | DANSKE |
| Nokia Abp | NOKIA |
| DNO ASA | DNO |

4.5.5 Commodities

Commodities are frequently traded on NDX. The most actively traded commodities on NDX are energy commodities and precious metals. The commodities included in the analysis are crude oil futures prices (WTI and Brent), gold futures prices, silver futures prices and palladium futures prices. There are many more tradable commodities on NDX. However, as the aforementioned commodities account for most of the trading volume, the delimitation to these commodities is deemed appropriate. Ideally, natural gas prices would have been included in the analysis, but due to erroneous and unreliable data they were omitted. As for the equity indices, a separation between the positive and negative returns was performed. See table 3 for the regressor variables.


4.5.6 Volatility

As mentioned previously, several studies suggest that volatility has a great impact on the number of trades and trading volumes. Therefore, the volatility indices VIX, VVIX and VSTOXX are included. The Cboe Volatility Index, commonly known by its ticker symbol VIX, is a measure of the implied volatility which is being priced into S&P 500 options [53, ch. 2]. More specifically, it is the 30-day implied volatility that is being priced into S&P 500 index options [54]. It is computed using the prices of S&P 500 index options with a maturity of between 23 and 37 days [54]. Five regressor variables were included which relate to the VIX. Firstly, the price of the VIX. Secondly, the positive and negative (relative) returns of the VIX (according to section 4.5.1). Lastly, the positive and negative (absolute) returns of the VIX. Let $r$ denote the absolute return of an asset. Then, using the notation in section 4.5.1, the positive and negative absolute returns are given by:

$$r_t^+ = (p_t - p_{t-1}) \cdot 1_{\{p_t - p_{t-1} > 0\}}$$

$$r_t^- = (p_t - p_{t-1}) \cdot 1_{\{p_t - p_{t-1} < 0\}}$$

The reason for this is that the absolute return of the VIX could potentially yield additional useful information and help explain the number of trades.

A corresponding European volatility index is the Euro STOXX 50 Volatility (VSTOXX). The VSTOXX is computed in a very similar fashion to the VIX, with the implied volatility derived from option prices on the Euro STOXX 50 index [55, ch. 8]. Two regressor variables were included related to the VSTOXX: the price of the VSTOXX and the change of the VSTOXX.

Further, the volatility of volatility measure VVIX was included in the analysis. It measures the 30-day implied volatility which is being priced into options on the VIX. Three variables were included in the analysis related to the VVIX: the positive and negative returns of the VVIX, as well as the price of the VVIX.

See table 3 for the corresponding regressor variables.

4.5.7 Currencies

On NDX, there are around 750 listed ETPs with various currencies as underlying assets, including foreign exchange rates and cryptocurrencies. Substantial trading takes place in ETPs with these underlying assets, which is why the exchange rates EUR/SEK and USD/SEK, as well as BTC/USD, were included in the analysis (see footnote 8). For the currency exchange rates EUR/SEK and USD/SEK, the price of the asset as well as the positive and negative returns were included as regressor variables. For BTC/USD, only positive and negative returns were included. See table 3 for a list of the corresponding regressor variables.

Footnote 8: BTC is the most frequently used currency code for Bitcoin. However, XBT is the ISO 4217 currency code for Bitcoin.


Table 3: A complete collection of all 65 regressor variables

| Variable | Regression variable |
|---|---|
| Positive return CAC 40 | cac40 pos |
| Negative return CAC 40 | cac40 neg |
| Positive return Cboe UK 100 | uk100 pos |
| Negative return Cboe UK 100 | uk100 neg |
| Positive return DAX 30 | dax30 pos |
| Negative return DAX 30 | dax30 neg |
| Positive return DJIA | djia pos |
| Negative return DJIA | djia neg |
| Positive return Euro STOXX 50 | eurostoxx50 pos |
| Negative return Euro STOXX 50 | eurostoxx50 neg |
| Positive return MSCI World Index | msci pos |
| Negative return MSCI World Index | msci neg |
| Positive return NASDAQ-100 | nasdaq100 pos |
| Negative return NASDAQ-100 | nasdaq100 neg |
| Positive return OBX | obx pos |
| Negative return OBX | obx neg |
| Positive return OMXC20 | omxc20 pos |
| Negative return OMXC20 | omxc20 neg |
| Positive return OMXH25 | omxh25 pos |
| Negative return OMXH25 | omxh25 neg |
| Positive return OMXSPI | omxspi pos |
| Negative return OMXSPI | omxspi neg |
| Positive return S&P 500 | spx pos |
| Negative return S&P 500 | spx neg |
| Positive return TSLA | tsla pos |
| Negative return TSLA | tsla neg |
| Positive return AAPL | aapl pos |
| Negative return AAPL | aapl neg |
| Positive return AMZN | amzn pos |
| Negative return AMZN | amzn neg |
| Change HM | hm change |
| Change VOLV | volv change |
| Change ERIC | eric change |
| Change LUPE | lupe change |
| Change DANSKE | danske change |
| Change NOKIA | nokia change |
| Change DNO | dno change |
| Positive return WTI | wti pos |
| Negative return WTI | wti neg |
| Positive return Brent | brent pos |
| Negative return Brent | brent neg |
| Positive return Gold | gold pos |
| Negative return Gold | gold neg |
| Positive return Silver | silver pos |
| Negative return Silver | silver neg |
| Positive return Palladium | palladium pos |
| Negative return Palladium | palladium neg |
| Price VIX | vix price |
| Positive return VIX (%) | vix pos pct |
| Negative return VIX (%) | vix neg pct |
| Positive return VIX (absolute) | vix pos abs |
| Negative return VIX (absolute) | vix neg abs |
| Price VVIX | vvix price |
| Positive return VVIX | vvix pos |
| Negative return VVIX | vvix neg |
| Price VSTOXX | vstoxx price |
| Change VSTOXX | vstoxx change |
| Positive return USD/SEK | usdsek pos |
| Negative return USD/SEK | usdsek neg |
| Price EUR/SEK | eursek price |
| Positive return EUR/SEK | eursek pos |
| Negative return EUR/SEK | eursek neg |
| Positive return BTC/USD | btc pos |
| Negative return BTC/USD | btc neg |


5 Results

5.1 Initial Model

The initially fitted model with all 65 regressor variables and 1162 observations produced an adjusted $R^2$ of 0.8103. In order to verify the fundamental OLS assumptions, residual analysis was performed. The Tukey-Anscombe plot in figure 3 shows an increasing variance with larger fitted values $\hat{y}$. Thus heteroscedasticity is clearly present. Further, the asymmetry of residuals around zero suggests problems with the model. It could potentially be due to the lack of some important regressor variables or the lack of a quadratic term of a currently included regressor variable [41, p. 346-348]. The Q-Q plot in figure 3 suggests that the residuals derive from a long-tailed distribution rather than a normal distribution [41, p. 358]. These are serious model assumption violations that need to be dealt with before proceeding with the analysis and model building.

5.1.1 Altered Model

James et al. propose using concave functions to transform the response variable to solve the problem of heteroscedasticity [56, p. 95]. Such concave functions include $\sqrt{y}$ and $\log(y)$. Both transformations were assessed, and the latter transformation proved superior to the former with regard to heteroscedasticity and normality, see figures 4 and 5. Therefore, we chose to proceed with the model

$$\log(y) = X\beta + \varepsilon.$$

Henceforth, this model will be referred to as model A.

5.2 Residual Analysis

The Tukey-Anscombe plot in figure 5 suggests no violation of the assumption $E[\varepsilon] = 0$, as the residuals are scattered rather symmetrically around zero. Indeed, by looking at the red trend line, one can tell it is not exactly zero, but rather close. Further, the plot demonstrates no particular pattern or shape of the residuals. Hence, the plot justifies the assumption of homoscedasticity. According to the Q-Q plot of model A in figure 5, the residuals are fairly normally distributed. The distribution of the tails is not perfect and systematic deviations are evident. However, if deviations in the tails are systematic, such deviations are less worrying [50, p. 97]. Thus, it remains plausible that the errors are indeed normally distributed.

5.3 Autocorrelation

As an initial attempt to visualize any potential autocorrelation in model A, the residuals were plotted in a lag plot, see figure 6a). The lag plot indicates quite severe autocorrelation, as the scatter of points forms a straight line (more or less). The positive slope of the line indicates a strong positive serial correlation. To investigate this further, a Durbin-Watson test was performed. With a test statistic $d = 0.8642218$ and a p-value of less than 2.2e-16, the null hypothesis of no autocorrelation between the errors was rejected (see table 4). Thus it is clear that there exists a positive serial correlation.

Figure 3: Tukey-Anscombe and Q-Q plots for initial fit.

Figure 4: Residual plots for $\sqrt{y} = X\beta + \varepsilon$.

Figure 5: Residual plots for $\log(y) = X\beta + \varepsilon$.

Figure 6: Plots of $e_i$ versus $e_{i-1}$. (a) Model A, (b) Model B.

5.3.1 An Additional Model

To mitigate the problem of autocorrelation, an additional model with a lagged dependent variable was created (henceforth referred to as model B):

y_t = φ y_{t−1} + x_t β + ε_t.

Note that the lagged dependent variable y_{t−1} will be referred to as trade lag throughout this thesis. A Durbin-Watson test was performed on model B. The test statistic d = 1.524416 lies inside the interval of what is deemed normal. However, the p-value of 2.2e−16 suggests that some positive autocorrelation is still present, although it is not particularly severe. Thus, the lagged dependent variable seems to have largely resolved the issues with autocorrelation. Further, the lag plot in figure 6b) shows a scatter of points that form a cluster with no clear shape or pattern.

Model   Lag   Autocorrelation   D-W Statistic   p-value
A       1     0.5667505         0.8642218       2.2e−16
B       1     0.235428          1.524416        2.2e−16

Table 4: D-W test statistics
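As a plausibility check of table 4, the Durbin-Watson statistic can be related to the lag-1 autocorrelation r of the residuals through the large-sample approximation d ≈ 2(1 − r). The sketch below is illustrative only: the function names and the toy residual series are assumptions, and only the two autocorrelation values are taken from table 4.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic d = sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


def dw_from_lag1(r):
    """Large-sample approximation d ~ 2 * (1 - r) for lag-1 autocorrelation r."""
    return 2.0 * (1.0 - r)


# Lag-1 autocorrelations reported in table 4.
print(round(dw_from_lag1(0.5667505), 3))  # model A
print(round(dw_from_lag1(0.235428), 3))   # model B

# Toy residuals (hypothetical): a slowly drifting series gives d well below 2.
toy = [1.0, 0.8, 0.9, 0.5, 0.2, -0.1, -0.4, -0.3, -0.7, -0.9]
print(durbin_watson(toy) < 2.0)
```

The approximation gives roughly 0.87 for model A and 1.53 for model B, close to the reported statistics 0.864 and 1.524, so the columns of table 4 are mutually consistent.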

When analyzing the residual plots for model B, similarities to the initial model were detected. A decision was therefore taken to transform the dependent variable accordingly:

log(y_t) = φ y_{t−1} + x_t β + ε_t.

Henceforth, this analysis will proceed with two models.
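The construction of the lagged dependent variable can be sketched as follows. This is an illustration under stated assumptions, not the code used in the thesis: the function name and the data are hypothetical, and the lag is taken to be the untransformed trade count of the previous day, as the equation for model B suggests.

```python
import math


def build_lagged_rows(trades, regressors):
    """Pair each day t >= 1 with the previous day's trade count.

    trades      -- list of daily trade counts y_0, ..., y_{n-1}
    regressors  -- list of per-day regressor lists x_0, ..., x_{n-1}
    Returns (response, design) where response[t-1] = log(y_t) and
    design[t-1] = x_t + [y_{t-1}]; the first day is dropped because it
    has no predecessor.
    """
    if len(trades) != len(regressors):
        raise ValueError("trades and regressors must have equal length")
    response, design = [], []
    for t in range(1, len(trades)):
        response.append(math.log(trades[t]))
        design.append(list(regressors[t]) + [trades[t - 1]])
    return response, design


# Hypothetical data: four days, two regressors per day.
y = [5000, 5200, 4800, 6100]
x = [[9.1, 18.0], [9.2, 19.5], [9.0, 17.2], [9.4, 25.3]]
resp, rows = build_lagged_rows(y, x)
print(rows)
```

Note that one observation is lost at the start of the series, since the first day has no lagged value.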

5.4 Leverage and Influential Points

Cook’s D, DFFITS and COVRATIO were used to detect outliers and influential points. No influential points were detected using Cook’s D. However, Cook’s D has been criticized for not always capturing influential points [57]. In addition, the cutoff values of DFFITS provide guidelines rather than strict rules [40, p. 218]. Hence, the interpretation of the results of the different deletion diagnostics is left to the analyst. For model A, Cook’s D singled out six observations that differed substantially from the others, see figure 7. The same figure also suggests a better distribution of the residuals in model B. No observations violated the cutoff value of Cook’s D. However, violations of the cutoff values were detected for DFFITS and COVRATIO, see figures 8 and 9. Because of the already great model fit, only the five most influential observations for each measure were removed. At most, this would result in deleting 15 observations for each model, which corresponds to an average of 3 observations per year. We concluded that it is better to keep some extreme values, thus yielding more reliable models, than to delete too many observations and manipulate the data. The same five observations were detected by Cook’s D and DFFITS. We also found half of the observations captured by our deletion diagnostics to be evident in both models. This further strengthens our method and our conclusion to consider them influential points and thus remove them. The observations removed from our models are presented in table 5.

Model   Removed observations
A       2, 145, 340, 342, 436, 724, 725, 933, 1121
B       74, 181, 230, 236, 246, 256, 308, 337, 432, 719

Table 5: Influential points deleted
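For orientation, commonly cited rule-of-thumb cutoffs for the three deletion diagnostics are D_i > 1 for Cook’s D, |DFFITS_i| > 2√(p/n), and COVRATIO_i outside 1 ± 3p/n, where n is the number of observations and p the number of estimated parameters. Whether exactly these cutoffs were applied in this thesis is an assumption; the sketch below uses hypothetical values of n and p and made-up diagnostic values.

```python
import math


def deletion_cutoffs(n, p):
    """Rule-of-thumb cutoffs for n observations and p estimated parameters."""
    return {
        "cooks_d": 1.0,                          # flag D_i > 1
        "dffits": 2.0 * math.sqrt(p / n),        # flag |DFFITS_i| above this
        "covratio_low": 1.0 - 3.0 * p / n,       # flag COVRATIO_i below this
        "covratio_high": 1.0 + 3.0 * p / n,      # ... or above this
    }


def flag(values, low=None, high=None):
    """Return 1-based indices of values below `low` or above `high`."""
    return [i + 1 for i, v in enumerate(values)
            if (low is not None and v < low) or (high is not None and v > high)]


cut = deletion_cutoffs(n=1000, p=12)             # hypothetical sizes
toy_dffits = [0.05, -0.31, 0.10, 0.25, -0.02]    # hypothetical values
print(flag([abs(v) for v in toy_dffits], high=cut["dffits"]))
```

Since such cutoffs are guidelines rather than strict rules, flagged observations are candidates for closer inspection rather than automatic removal.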

5.5 Transformations

When the Box-Cox method (with PPCC as objective) was performed, the optimal λ was found to be approximately 0.5, see figure 10a). A transformation was performed accordingly, i.e. taking the square root of the response variable. A further Box-Cox test was then performed on the transformed model and found the current model to be optimal with respect to normality of the response variable. Figure 11 shows the normality behavior of the residuals before and after the corresponding transformation. Hence, the updated model A is:

√(log(y)) = Xβ + ε

The Box-Cox method was also used on model B. It suggested a λ of approximately one, indicating that no transformation should be performed, see figure 10b).
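To make the role of λ concrete, the Box-Cox family can be written as (y^λ − 1)/λ for λ ≠ 0 and log(y) for λ = 0. The sketch below is a minimal illustration; the function name and the input value are assumptions. In model A the transformation is applied to a response that has already been log-transformed.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform of a positive value y for parameter lam."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


y = 16.0
print(box_cox(y, 0.5))   # 2 * (sqrt(y) - 1): a shifted, scaled square root
print(box_cox(y, 1.0))   # y - 1: only a shift, i.e. no real transformation
print(box_cox(y, 0.0))   # log(y), the limit as lam -> 0
```

Because λ = 0.5 differs from the square root only by a shift and a scale factor, and λ = 1 only by a shift, these cases are usually implemented simply as √y and y, respectively.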

Transformations of the independent variables are considered redundant due to the already excellent precision of both models (i.e. an adjusted R² value above 0.89 for each model).

5.6 Variable Selection

After removing outlier observations and transforming the models to meet the OLS assumptions, variable selection was performed using backward elimination based on p-values. In order to reduce the number of variables included in each model, the cutoff value was set to p_out = 0.001. The models produced by the backward elimination were accepted as the final models. See table 7 for the regressor variables and their coefficients.
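The backward elimination procedure can be sketched as a simple loop. This is an illustrative outline, not the implementation used in the thesis: the function `fit_pvalues` is a hypothetical stand-in for refitting the regression and returning the p-value of each regressor, and the variable names and p-values in the toy example are made up.

```python
def backward_elimination(variables, fit_pvalues, p_out=0.001):
    """Drop the least significant variable until all p-values are <= p_out.

    variables   -- iterable of candidate regressor names
    fit_pvalues -- callable taking a list of names and returning a dict
                   {name: p-value} from refitting the model on those names
    """
    kept = list(variables)
    while kept:
        pvalues = fit_pvalues(kept)
        worst = max(kept, key=lambda name: pvalues[name])
        if pvalues[worst] <= p_out:
            break                      # every remaining variable passes
        kept.remove(worst)             # remove one variable, then refit
    return kept


# Stand-in for a real OLS refit: fixed, hypothetical p-values.
toy_p = {"vix_price": 1e-6, "gold_change": 0.40, "dax_pos": 2e-4, "oil_neg": 0.03}


def toy_fit(names):
    return {name: toy_p[name] for name in names}


print(backward_elimination(toy_p, toy_fit))
```

In a real application the p-values change after every refit, which is why only one variable is removed per iteration.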

Figure 7: Cook’s D

Figure 8: DFFITS

Figure 9: COVRATIO

Figure 10: Box-Cox

Figure 11: Q-Q plots (a) before and (b) after Box-Cox transformation of model A

Model A                        Model B
Variables        VIF           Variables        VIF
usdsek price     3.023354      usdsek price     3.081091
eursek price     5.329980      eursek price     6.264010
dax pos          1.692688      dax pos          1.983957
dax neg          2.710350      dax neg          3.288260
vstoxx price     5.266708      vstoxx price     6.348448
vstoxx change    1.664500      vvix price       2.197498
vvix price       2.231558      vix price        4.457842
vix price        4.356770      nokia change     1.085445
nokia change     1.096386      omxc20 pos       1.8749775
omxc20 neg       1.787848      palladium neg    1.035170
palladium neg    1.037996      omxspi neg       3.233242
                               trade lag        4.415913

Table 6: Variance inflation factors

5.7 Multicollinearity

The variance inflation factor corresponding to each regressor variable is presented in table 6. Evidently, no values exceed 10, indicating no serious problem with multicollinearity. Comparing the regressor variables shared by model A and model B, one can see that the VIF is smaller in model A for most of them. This behavior is, however, expected due to the lagged dependent variable in model B.
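The threshold of 10 can be interpreted through the definition VIF_j = 1/(1 − R_j²), where R_j² is the coefficient of determination obtained when regressor j is regressed on all other regressors. The sketch below inverts this relation for the largest VIF of each model in table 6; the function names are assumptions made for illustration.

```python
def vif_from_r2(r2):
    """VIF_j = 1 / (1 - R_j^2), R_j^2 from regressing x_j on the other regressors."""
    if not 0.0 <= r2 < 1.0:
        raise ValueError("R^2 must lie in [0, 1)")
    return 1.0 / (1.0 - r2)


def r2_from_vif(vif):
    """Invert the definition: R_j^2 = 1 - 1 / VIF_j."""
    if vif < 1.0:
        raise ValueError("a VIF cannot be smaller than 1")
    return 1.0 - 1.0 / vif


# Largest VIF in each model according to table 6.
for name, vif in [("eursek price, model A", 5.329980),
                  ("vstoxx price, model B", 6.348448)]:
    print(f"{name}: implied R_j^2 = {r2_from_vif(vif):.3f}")

print(f"{vif_from_r2(0.9):.1f}")  # the threshold of 10 corresponds to R_j^2 = 0.9
```

The largest values thus correspond to roughly 81% and 84% of a regressor's variance being explained by the other regressors, below the 90% implied by a VIF of 10.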

5.8 Final Models

Two final models were obtained from the analysis in this thesis, model A andmodel B.

5.8.1 Model A

Model A, our model without any lagged dependent variable, is given by:

√(log(trades)) = β_0 + β_1 · usdsek price
                 + β_2 · eursek price
                 + β_3 · dax pos
                 + β_4 · dax neg
                 + β_5 · vstoxx price
                 + β_6 · vstoxx change
                 + β_7 · vvix price
                 + β_8 · vix price
                 + β_9 · nokia change
                 + β_10 · omxc20 neg
                 + β_11 · palladium neg

The numerical values of β_0, ..., β_11 are found in table 7. Further, the residual analysis of the final model A in figure 12 suggests that the assumptions of zero mean, normality and homoscedasticity hold. However, a final Durbin-Watson test shows that the strong positive autocorrelation is still present (d = 0.8669289), which could be problematic since measures of precision are less accurate due to underestimated variances [40, p. 475].

5.8.2 Model B

Model B, our model with a lagged dependent variable, is given by:

log(trades) = β_0 + β_1 · usdsek price
              + β_2 · eursek price
              + β_3 · dax pos
              + β_4 · dax neg
              + β_5 · vstoxx price
              + β_6 · vvix price
              + β_7 · vix price
              + β_8 · nokia change
              + β_9 · omxc20 pos
              + β_10 · palladium neg
              + β_11 · omxspi neg
              + φ · trade lag

The numerical values of β_0, ..., β_11 and φ are found in table 7. Further, the residual analysis of the final model B in figure 13 suggests that the assumptions of zero mean, normality and homoscedasticity hold. A final Durbin-Watson test shows that the autocorrelation problem seems to have been fixed.

Table 7: Regressor coefficients included in the final models

Dependent variable: √(log(trades)) for model A, log(trades) for model B. Standard errors in parentheses.

                       Model A                        Model B
usdsek price            0.047*** (0.004)               0.222*** (0.020)
eursek price           −0.149*** (0.005)              −0.647*** (0.027)
dax pos                 0.017*** (0.002)               0.088*** (0.012)
dax neg                −0.018*** (0.003)              −0.076*** (0.015)
vstoxx price            0.008*** (0.0004)              0.028*** (0.002)
vstoxx change          −0.001*** (0.0003)
vvix price             −0.001*** (0.0001)             −0.005*** (0.001)
vix price               0.004*** (0.001)               0.012*** (0.003)
nokia change            0.004*** (0.001)               0.018*** (0.004)
omxc20 neg             −0.008*** (0.002)
omxc20 pos                                            −0.058*** (0.013)
palladium neg          −0.004*** (0.001)              −0.024*** (0.006)
omxspi neg                                            −0.061*** (0.018)
trade lag                                              0.00005*** (0.00000)
Constant                3.873*** (0.036)              12.398*** (0.199)

Observations            1,153                          1,153
R²                      0.894                          0.923
Adjusted R²             0.893                          0.922
Residual Std. Error     0.037 (df = 1141)              0.185 (df = 1140)
F Statistic             874.533*** (df = 11; 1141)     1,141.884*** (df = 12; 1140)

Note: *p<0.1; **p<0.05; ***p<0.01
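To show how the coefficients in table 7 translate into a predicted number of trades, the sketch below evaluates both models and inverts the respective transformations: trades = exp(η²) for model A and trades = exp(η) for model B, where η denotes the linear predictor. The input values and the units of the variables are assumptions made for illustration only, the coefficients are the rounded values from table 7, and the naive back-transformation ignores retransformation bias.

```python
import math

# Coefficients copied from table 7 (rounded as printed there).
MODEL_A = {"const": 3.873, "usdsek price": 0.047, "eursek price": -0.149,
           "dax pos": 0.017, "dax neg": -0.018, "vstoxx price": 0.008,
           "vstoxx change": -0.001, "vvix price": -0.001, "vix price": 0.004,
           "nokia change": 0.004, "omxc20 neg": -0.008, "palladium neg": -0.004}
MODEL_B = {"const": 12.398, "usdsek price": 0.222, "eursek price": -0.647,
           "dax pos": 0.088, "dax neg": -0.076, "vstoxx price": 0.028,
           "vvix price": -0.005, "vix price": 0.012, "nokia change": 0.018,
           "omxc20 pos": -0.058, "palladium neg": -0.024, "omxspi neg": -0.061,
           "trade lag": 0.00005}


def linear_predictor(coefs, x):
    """beta_0 plus the sum of coefficient * value; missing inputs count as 0."""
    return coefs["const"] + sum(b * x.get(name, 0.0)
                                for name, b in coefs.items() if name != "const")


def predict_trades_a(x):
    """Model A: sqrt(log(trades)) = eta, so trades = exp(eta ** 2)."""
    return math.exp(linear_predictor(MODEL_A, x) ** 2)


def predict_trades_b(x):
    """Model B: log(trades) = eta, so trades = exp(eta)."""
    return math.exp(linear_predictor(MODEL_B, x))


# Hypothetical inputs; the units of each variable are assumptions.
day = {"usdsek price": 9.0, "eursek price": 10.0, "vstoxx price": 20.0,
       "vvix price": 90.0, "vix price": 15.0, "trade lag": 5000.0}
print(round(predict_trades_a(day)), round(predict_trades_b(day)))
```

Note that the back-transformation of model A is only meaningful when the linear predictor is non-negative, since a square root cannot be negative.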


Figure 12: Residual plots for the final model A

Figure 13: Residual plots for the final model B


6 Discussion

6.1 Model Adequacy

The adjusted R² values for model A and model B were 0.893 and 0.922, respectively. What is considered a good value of adjusted R² depends on numerous factors. For example, in social science studies an adjusted R² > 0.25 is considered moderate and > 0.64 strong [58, p. 533]. Another known rule of thumb among clinicians and analysts is that adjusted R² values above 0.5 are considered good, but even values greater than 0.1 are decent in some cases. Hence, it is essential to be careful with conclusions and to consider the context. Similar theses can be examined to provide guidelines and indications of decent model adequacy. One previously mentioned thesis, “Trading volume at Avanza” [52], presented a model with a similar response variable (i.e. trading volume) and an adjusted R² ≈ 0.019, indicating that our model adequacy is excellent by comparison. One possible explanation for this may stem from our choice of distinguishing between negative and positive changes within the predictor variables. Another, although still speculative, explanation for the difference in model accuracy may be that fluctuations and price changes affect the number of trades in ETPs more than trading in the market as a whole. Hence, the effects of price changes among the underlying assets are more evident, and therefore a higher variability of the dependent variable is observed, yielding a better model fit.
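The reported adjusted R² values can be reproduced from the figures in table 7 using the standard formula adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), where n is the number of observations and k the number of regressors. The function name below is an assumption, and the inputs are the rounded R² values from table 7.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k regressors (intercept excluded)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# R^2, number of observations and number of regressors as reported in table 7.
print(round(adjusted_r2(0.894, 1153, 11), 3))  # model A
print(round(adjusted_r2(0.923, 1153, 12), 3))  # model B
```

With 1,153 observations and at most 12 regressors, the penalty for model size is small, which is why the adjusted values (0.893 and 0.922) differ from R² only in the third decimal.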

One shortcoming of model A is the evident violation of the assumption of uncorrelated errors. Such a violation implies that the OLS estimator of β is still unbiased, but that it no longer has minimum variance. Another problem with correlated errors is that the t-statistics are inflated, implying that regressor variables may seem statistically significant when they are not. This leads to uncertainty regarding model accuracy. Typically, problems with autocorrelation indicate a time dependency, which calls for time series analysis. One way to mitigate such time dependency is to add a lagged variable, as was done in model B. However, adding a lagged variable complicates forecasting, which is still a desirable capability. Therefore, both models were kept.

6.2 Financial Interpretations

6.2.1 Volatility

This thesis affirms the long-established relationship between number of trades and volatility. Both model A and model B contain the VIX and VSTOXX volatility indices, which directly measure the implied volatility in the market.

Interestingly, both models suggest that the change in volatility affects the number of trades (in ETPs), as the VVIX is included in both models and the change of the VSTOXX is included in model A. The sign of the coefficients for both the VVIX and the change of the VSTOXX is negative. Thus, an increasing volatility of volatility decreases the number of trades. Why this is so remains a conundrum, although it could potentially be explained by the sequential information arrival hypothesis, which states that, in addition to the contemporaneous relationship between trading volume and volatility, there exists a lagged relationship between the two. It is possible that the VVIX and the change of the VSTOXX somehow capture this lagged relationship.

6.2.2 Palladium

The inclusion of the negative daily return of palladium in the models is somewhat surprising. The negative sign of the regressor coefficient tells us that negative price returns increase the number of trades. A potential explanation for this is that investors try to exploit potential price rebounds as the price falls. The price of palladium has risen sharply in recent years, and many expect it to keep climbing due to rising demand. The price growth has mainly been driven by increased demand in the automotive industry, where palladium is used in catalytic converters for automotive exhausts [59].

6.2.3 Currencies

One fascinating (and unexpected) result from our analysis is the seemingly important role of the USD/SEK and EUR/SEK exchange rates for the number of trades in ETPs on NDX. Foreign currency hedging can be favorable for investors, and studies have found a positive correlation between stock price and currency hedges, indicating that investors reward such hedges [27]. This suggests that currency hedging is important and of interest to investors, and could therefore provide a basis for arguments in favor of its statistical relevance. However, the interpretation of the regression coefficients remains unclear. The coefficient of usdsek price is positive in both models, whilst that of eursek price is negative. Surprisingly, the positive and negative returns of the exchange rates were excluded at the significance level chosen for this thesis, which further complicates possible interpretations. Any argument that the number of trades in ETPs should increase with USD/SEK but decrease with EUR/SEK would be vague and speculative.

6.2.4 Equity Indices

The positive and negative returns of DAX were included in both models. This was not particularly surprising, as DAX has invariably been among the three most traded underlying assets on NDX since at least 2011 [5]. In model A, the coefficients for the positive and negative returns are very similar (0.017 and −0.018, respectively). This suggests that there is (almost) no difference in the number of trades attributed to positive or negative returns. However, there is a slight difference between the coefficients in model B (0.088 and −0.076, respectively). The minimal differences in regression coefficients imply that, for simplicity’s sake, the positive and negative return variables could have been replaced by the absolute value of the relative return without any serious loss of model accuracy.
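The argument that the two DAX variables could be replaced by a single absolute return can be illustrated numerically. The sketch below assumes that the positive variable equals max(r, 0) and the negative variable equals min(r, 0) for a daily return r; this coding is an assumption consistent with the signs in table 7, and the returns and the averaged slope 0.0175 are hypothetical.

```python
def split_return(r):
    """Split a return into its positive and negative part (assumed coding)."""
    return max(r, 0.0), min(r, 0.0)


def contribution(r, beta_pos, beta_neg):
    """Contribution of one asset's return to the linear predictor."""
    pos, neg = split_return(r)
    return beta_pos * pos + beta_neg * neg


# DAX coefficients of model A from table 7.
BETA_POS, BETA_NEG = 0.017, -0.018

for r in (-2.0, -0.5, 0.5, 2.0):           # hypothetical daily returns
    split = contribution(r, BETA_POS, BETA_NEG)
    absolute = 0.0175 * abs(r)             # single |r| term, averaged slope
    print(f"r={r:+.1f}  split={split:.4f}  abs={absolute:.4f}")
```

When the two coefficients have nearly equal magnitude and opposite sign, the split specification and the single absolute-value term give almost the same contribution, which is the simplification suggested above.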

In model A, the negative return of OMXC20 was included, and in model B, the positive return of OMXC20 was included. This was somewhat surprising, as there are no ETPs benchmarked to OMXC20 (or the similar index OMXC25). However, one possible explanation is that substantial trading on NDX takes place in Danish stocks, e.g. Novo Nordisk, A.P. Moeller-Maersk, and Danske Bank, which are constituents of the OMXC20 index. Thus, changes in the OMXC20 index seemingly help to explain the ETP trading in the underlying stocks.

Further, the negative return of the OMXSPI was included in model B. It was expected that the returns of Nasdaq Stockholm would somehow constitute an important factor in this analysis, as substantial trading takes place in ETPs benchmarked to OMXS30 and Swedish stocks [5]. The negative sign of the regression coefficient can be interpreted similarly to the case of palladium in section 6.2.2. Apparently, the number of trades in ETPs on NDX increases as the OMXSPI falls. This could potentially be explained by the fact that some investors trade speculatively in an attempt to exploit potential price rebounds. Another possible explanation is that traders use ETPs to hedge their portfolios against further falls.

6.2.5 Nokia

The inclusion of Nokia in both models was surprising to us. It is the only single stock included in any of the models, and other stocks, e.g. Ericsson or Tesla, were beforehand considered more likely to be significant. However, Nokia is one of the largest constituents of the OMXH25 index, and one of the most traded assets on NDX Finland [5]. Thus, the inclusion of Nokia seems justifiable.

6.3 Further Research

In this thesis we have identified several areas of potential improvement for future studies.

One such potential improvement is the inclusion of some kind of volatility measure for different types of commodities, in addition to mere daily returns, since studies have suggested that volatility greatly affects the trading volume and number of trades. Thus, it is plausible that such measures capture ETP trading patterns better than returns. Further, volatilities for commodities and equities might differ greatly, e.g. in times of geopolitical turmoil in OPEC countries, which is why volatility measures based on equity indices (e.g. the S&P 500) may not suffice. Such volatility measures include the Cboe Energy Sector ETF Volatility Index (VXXLE), the Cboe Crude Oil ETF Volatility Index (OVX) and the Cboe Silver ETF Volatility Index (VXSLV). However, due to difficulties retrieving such data, these measures were completely omitted from the analysis, and they remain a potential improvement for further research.

Second, we added a lagged dependent variable in order to mitigate the autocorrelation issues in model A. Previous research has shown that trading volumes display strong serial correlation [60], which is not especially surprising considering that subsequent trading data constitute a time series, and that the errors in time series often exhibit autocorrelation [40, p. 474]. Ideally, for a more robust model, we would have liked to combine multiple linear regression with an autoregressive moving-average model (e.g. ARMA or ARIMA) to combat the autocorrelation, rather than just adding the lagged dependent variable. Additionally, studies have shown that ARMA models outperform LDV models [45]. However, as this was deemed outside the scope of this thesis, we leave it as a potential improvement for future research.

Still on the topic of time series models, another potentially useful extension of this thesis would be to identify the seasonal effects on ETP trading using e.g. SARIMA (seasonal ARIMA).

All factors included as regressor variables in our models (apart from the lagged dependent variable) are exogenous in the sense that NGM, brokers, or issuers of ETPs have no ability to affect them. Previous research has shown that the bid-ask spread, for one, affects the trading volume in futures [7]. Examining how endogenous factors (e.g. bid-ask spread, commission fees, and the leverage of ETPs) affect the number of trades in single ETPs could be an interesting topic for future research. Undoubtedly, issuers of ETPs would find this of great interest, since they have the ability to affect these factors and in turn their revenue streams.

Lastly, applying mathematical tools other than ordinary least squares to the very same data could improve the models and should thus be considered for future research. This includes generalized least squares to handle the autocorrelation, or generalized linear models to bypass the assumptions of no autocorrelation or normally distributed errors.

7 Conclusion

The research question in this thesis is:

Which factors affect the number of trades in exchange-traded products on Nordic Derivatives Exchange?

The specific factors were those included in the models obtained using multiple linear regression. We refer to table 7 for a complete list of these factors. All regressor variables obtained from the mathematical analysis were statistically significant at p < 0.001.

This thesis affirms the long-established relationship between the number of trades and volatility. Further, the results suggest that the change of volatility also affects the number of trades. Potentially, this discovery could be explained by the sequential information arrival hypothesis. Additionally, currency exchange rates and equity indices were found to be significant for the number of trades.

A Appendix

A.1 An Introduction to Financial Derivatives

In this section, the fundamentals of financial instruments, options, and futures contracts are described.

Assets viable for trading are considered financial instruments. There are two types of financial instruments on the market today, namely derivatives and cash instruments [61]. While the value of a cash instrument is determined directly by the market, the value of a derivative depends on the value of an underlying asset, and is hence determined indirectly by the market [24, p. 1]. Derivatives can also be interpreted as a means of transferring risk from one entity to another.

The market for financial instruments has become increasingly popular and important over the last 40 years [24, p. 1]. Forward contracts, swaps, options, and other derivatives are used by fund managers, financial institutions, investors and corporations on a daily basis. When measured in terms of underlying assets, the derivatives market is much bigger than the stock market [24, p. 1].

Even derivatives on weather conditions are traded on a daily basis, which at first glance can seem ridiculous. But suppose you are a farmer growing seeds. The profit of your farm depends on the weather conditions, and derivatives can therefore be used for e.g. hedging yourself against bad weather conditions. Hence, the role of the derivatives market in finance is undeniable, whether you are a fund manager or a farmer.

A.1.1 Options

Options are financial derivatives that give the holder of the option the right, but not the obligation, to buy or sell an underlying asset at a certain time for a certain price. An option that gives the holder the right to buy the underlying asset is called a call option. An option that gives the holder the right to sell the underlying asset is called a put option. The price at which the holder of the option can buy or sell the underlying asset is called the strike price. The date on which the holder of the option can buy or sell the underlying asset is called the maturity date [26, p. 23].
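The right without obligation translates into a payoff at maturity that can never be negative: the holder exercises only when doing so is profitable. The following sketch illustrates this for a call and a put; the strike and the prices at maturity are hypothetical, and the option premium paid up front is ignored.

```python
def call_payoff(spot, strike):
    """Payoff at maturity of a call: exercise only if spot exceeds strike."""
    return max(spot - strike, 0.0)


def put_payoff(spot, strike):
    """Payoff at maturity of a put: exercise only if spot is below strike."""
    return max(strike - spot, 0.0)


strike = 100.0                      # hypothetical strike price
for spot in (80.0, 100.0, 120.0):   # hypothetical prices at maturity
    print(spot, call_payoff(spot, strike), put_payoff(spot, strike))
```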

There are various types of options, e.g. European options, which can only be exercised on the maturity date, and American options, which can be exercised at any time before maturity. Both European and American options exist as call and put options [26, p. 23].

Further, there are barrier options (e.g. down-and-out, up-and-out, down-and-in, up-and-in, ladder, and lookback options), whose payments depend on the underlying asset’s price path until maturity. For down-and-out and up-and-out contracts, if the underlying asset’s price hits some barrier specified by the contract during the contract period, then the contract ceases to exist and the payout is zero. Down-and-out and up-and-out options exist as both call and put options [23, p. 265-280].
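The path dependence of a knock-out contract can be sketched as follows. The example assumes that the barrier is monitored only at the observed prices in the list (real contracts may monitor continuously), and the function name, price paths, strike and barrier are hypothetical.

```python
def down_and_out_call(path, strike, barrier):
    """Payoff of a down-and-out call for a list of observed prices.

    The contract ceases to exist (payoff 0) if any observed price is at
    or below the barrier; otherwise it pays like an ordinary call.
    """
    if min(path) <= barrier:
        return 0.0
    return max(path[-1] - strike, 0.0)


survived = [100.0, 104.0, 97.0, 110.0]    # never touches the barrier
knocked_out = [100.0, 94.0, 103.0, 110.0]  # falls to 94, below the barrier

print(down_and_out_call(survived, strike=100.0, barrier=95.0))
print(down_and_out_call(knocked_out, strike=100.0, barrier=95.0))
```

Both paths end at the same price, but only the first one pays out, which shows that the payoff depends on the whole price path and not just on the final price.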

Options can be both exchange-traded and traded over-the-counter [26, p. 23]. Options can be benchmarked to roughly any asset, but most commonly options are benchmarked to stocks, commodities and currencies, as well as other financial derivatives, including futures contracts [23, p. 106].

A.1.2 Futures Contracts

A futures contract is an obligation to buy or sell an underlying security at a specific price at a specific time in the future [62]. Futures are exchange-traded financial derivatives and can be benchmarked to assets such as commodities, indices or currencies. Futures contracts enable investors to trade in otherwise inaccessible and/or inconvenient assets. A good example of this is the commodity market, where one actually has to deliver the traded asset (e.g. timber, ripening grapes or barrels of crude oil) [23, p. 455]. Futures contracts can be either cash-settled or physically delivered.

Futures contracts are traded on exchanges. The terms of the contract are standardized and determined by the exchange [26, p. 40]. The contract must specify the underlying asset, the size of the asset, the time to maturity and how the delivery will be made. The exchange uses margin accounts to avoid contract defaults. These are adjusted on a daily basis, called daily settlement, to reflect the investor’s gains or losses [26, p. 45].
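Daily settlement can be illustrated with a simplified margin account for a long position. The sketch below is a minimal illustration under stated assumptions: the prices, contract size and initial margin are hypothetical, and maintenance margins and margin calls are ignored.

```python
def daily_settlement(futures_prices, contracts, multiplier, initial_margin):
    """Margin balance after each day's settlement.

    futures_prices -- settlement prices, starting with the entry price
    contracts      -- number of contracts (negative for a short position)
    multiplier     -- units of the underlying asset per contract
    """
    balance = initial_margin
    balances = []
    for prev, curr in zip(futures_prices, futures_prices[1:]):
        balance += (curr - prev) * multiplier * contracts  # day's gain or loss
        balances.append(balance)
    return balances


prices = [100.0, 102.0, 99.0, 101.0]   # hypothetical settlement prices
print(daily_settlement(prices, contracts=2, multiplier=10, initial_margin=500.0))
```

The gains and losses are credited or debited every day rather than at maturity, so the final balance equals the initial margin plus the total price change times the position size.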

Most futures contracts do not lead to delivery [26, p. 40]. The reason is that investors often close out their position before maturity. A position is closed out by entering into a new futures contract that takes the opposite position with respect to the first contract.

A.2 Omitted Trading Days

The table below gives a comprehensive overview of trading days omitted from the analysis due to closed markets. There is a total of 107 omitted trading days.

2015-01-06  2015-01-15  2015-01-19  2015-02-16  2015-02-23
2015-02-25  2015-02-27  2015-04-02  2015-05-15  2015-05-25
2015-06-05  2015-06-16  2015-06-18  2015-06-19  2015-06-29
2015-07-03  2015-08-12  2015-08-13  2015-09-03  2015-09-07
2015-11-26  2015-12-29  2016-01-06  2016-01-18  2016-02-15
2016-03-04  2016-03-24  2016-04-22  2016-05-06  2016-05-16
2016-05-17  2016-05-30  2016-06-06  2016-06-24  2016-07-04
2016-09-05  2016-10-03  2016-11-24  2016-12-06  2017-01-02
2017-01-03  2017-01-06  2017-01-16  2017-01-31  2017-02-20
2017-04-13  2017-05-01  2017-05-12  2017-05-17  2017-05-26
2017-05-29  2017-06-05  2017-06-06  2017-06-07  2017-06-23
2017-06-26  2017-07-04  2017-09-04  2017-09-05  2017-10-03
2017-10-31  2017-11-23  2017-12-06  2018-01-15  2018-01-26
2018-02-19  2018-03-29  2018-04-27  2018-05-01  2018-05-11
2018-05-17  2018-05-21  2018-05-28  2018-06-05  2018-06-06
2018-06-07  2018-06-22  2018-06-25  2018-07-04  2018-09-03
2018-10-03  2018-11-22  2018-12-04  2018-12-05  2018-12-06
2018-12-10  2019-01-21  2019-02-18  2019-03-04  2019-03-05
2019-04-18  2019-05-17  2019-05-27  2019-05-31  2019-06-05
2019-06-06  2019-06-10  2019-06-17  2019-06-21  2019-07-04
2019-08-05  2019-08-30  2019-09-02  2019-10-03  2019-11-04
2019-11-28  2019-12-06

Table 8: Omitted trading days

References

[1] Whaley, R. E. Derivatives markets, valuation, and risk management ([S.l.], 2007). URL http://portal.igpublish.com/iglibrary/search/WILEYB0009975.html.

[2] Translated by L. W. King. The Avalon Project: Code of Hammurabi. URL https://avalon.law.yale.edu/ancient/hamframe.asp.

[3] Stockholms Optionsmarknad. Optionshandeln på OM Stockholm 1990. URL https://www.youtube.com/watch?v=2mOirS1NPs4. YouTube.

[4] NGM. Vår historia. URL http://www.ngm.se/om-ngm/var-historia/. Accessed: 2020-04-03.

[5] NGM. Statistics. URL http://www.ngm.se/statistik/.

[6] Ye, C. & Yu, L.-H. The effect of restatements on trading volume reactions to earnings announcements. Review of Quantitative Finance and Accounting 50, 129–180 (2018).

[7] Wang, G. H. K. & Yau, J. Trading volume, bid–ask spread, and price volatility in futures markets. Journal of Futures Markets 20, 943–970 (2000).

[8] Frino, A., Jarnecic, E. & Zheng, H. Activity in futures: does underlying market size relate to futures trading volume? Review of Quantitative Finance and Accounting 34, 313–325 (2010).

[9] Jones, C. M. Transactions, volume, and volatility. The Review of Financial Studies 7 (1994).

[10] Chan, K. & Fong, W.-M. Trade size, order imbalance, and the volatility–volume relation. Journal of Financial Economics 57, 247–273 (2000).

[11] Sarwar, G. The interrelation of price volatility and trading volume of currency options. Journal of Futures Markets 23, 681–700 (2003).

[12] Machnes, Y. The trading volume of currency options and the spot exchange rate. Emerging Markets Finance and Trade 42, 91–97 (2006). URL http://www.tandfonline.com/doi/abs/10.2753/REE1540-496X420305.

[13] Jain, P. C. & Joh, G.-H. The dependence between hourly prices and trading volume. Journal of Financial and Quantitative Analysis 23, 269–283 (1988).

[14] McInish, T. H. & Wood, R. A. Hourly returns, volume, trade size, and number of trades. Journal of Financial Research 14, 303–315 (1991).

[15] Shefrin, H. A behavioral approach to asset pricing. Academic Press advanced finance series (Academic Press/Elsevier, Amsterdam; Boston, 2008), 2nd edn.


[16] Chen, J., Hong, H. & Stein, J. C. Forecasting crashes: trading volume, past returns, and conditional skewness in stock prices. Journal of Financial Economics 61, 345–381 (2001).

[17] WisdomTree. ETPedia - The educational guide to Exchange Traded Products (ETPs) (2018). URL https://www.wisdomtree.eu/en-gb/-/media/eu-media-files/uncategorized/etpedia/etpedia.pdf.

[18] SEB. Mini Futures - En del av SEB:s utbud inom Börshandlade produkter. URL https://seb.se/pow/BorsFinans/ETF/MiniFutures/MiniFutures.pdf.

[19] NGM. Mini Future – hävstång i både upp och nedgång, del 1. URL http://www.ngm.se/utbildning/mini-future-del-1/. Accessed on 2020-04-06.

[20] Société Générale. Mini Futures - En Fartfylld Investering (2019). URL https://www.warrants.societegenerale.se/SiteContent/17/17/2/918/10/Mini_Futures.pdf.

[21] Société Générale. Warranter - kraftfulla handelsverktyg (2019). URL https://www.warrants.societegenerale.se/SiteContent/17/17/2/918/10/Warranter.pdf.

[22] NGM. Turbowarranter – ett värdepapper med hävstång, del 2. URL http://www.ngm.se/utbildning/turbowarranter-del-2/. Accessed on 2020-04-06.

[23] Björk, T. Arbitrage theory in continuous time (Oxford University Press, 2009).

[24] Hull, J. C. Options, futures and other derivatives (Prentice Hall, Upper Saddle River, NJ, 2009).

[25] CFI. Speculation. URL https://corporatefinanceinstitute.com/resources/knowledge/trading-investing/speculation/. Accessed on 2020-04-30.

[26] Hull, J. C. Fundamentals of futures and options markets (Pearson Higher Education AU, 2013).

[27] Allayannis, G. & Weston, J. P. The use of foreign currency derivatives and firm market value. The Review of Financial Studies 14, 243–276 (2001).

[28] Tsetsekos, G. & Varangis, P. N. The structure of derivatives exchanges: Lessons from developed and emerging markets. 1887 (World Bank Publications, 1998).

[29] NGM. Våra medlemmar. URL http://www.ngm.se/medlemmar/. Accessed on 2020-04-06.

[30] Investopedia. Brokerage company. URL https://www.investopedia.com/terms/b/brokerage-company.asp. Accessed on 2020-04-06.


[31] Vontobel. Final terms for constant leverage certificates. URLhttp://idoc.ngm.se/20200204120410142000.pdf. Accessed on2020-04-06.

[32] IG Trading. Market maker definition. URL https:

//www.ig.com/se/trading-ordlista/market-maker-definition.Accessed on 2020-04-06.

[33] Investopedia. Market maker definition. URLhttps://www.investopedia.com/terms/m/marketmaker.asp. Accessedon 2020-04-06.

[34] Karpoff, J. M. The Relation between Price Changes and Trading Volume:A Survey. Journal of Financial and Quantitative Analysis 22, 109–126(1987).

[35] Karpoff, J. M. A theory of trading volume. Journal of Finance 41,1069–1087 (1986).

[36] Carroll, R. & Kearney, C. Testing the mixture of distributions hypothesison target stocks. Journal of International Financial Markets, InstitutionsMoney 39, 1–14 (2015).

[37] Shiller, R. Econ 252-11: Financial markets [lecture 17—options markets](2011). URL https://web.archive.org/web/20160922071546/http:

//oyc.yale.edu/transcript/1086/econ-252-11. Lecture at YaleUniversity.

[38] Trading data obtained from NGM.

[39] NGM. Handelsavgifter. URL http://www.ngm.se/wp-content/uploads/

2019/09/Handelsavgifter-20181101.pdf. Accessed: 2020-04-03.

[40] Montgomery, D. C., Peck, E. A. & Vining, G. G. Introduction to linearregression analysis (John Wiley & Sons, 2012), 5 edn.

[41] Rawlings, J. O., Pantula, S. G. & Dickey, D. A. Applied RegressionAnalysis: A Research Tool. Springer Texts in Statistics, (Springer NewYork, New York, NY, 1998), second edition. edn.

[42] Statistics How To. Lag plot: Definition, examples (2016). URLhttps://www.statisticshowto.com/lag-plot/. Accessed on2020-04-07.

[43] Savin, N. & White, K. The Durbin-Watson test for serial correlation with extreme sample sizes or many regressors. Econometrica 45, 1989 (1977). URL http://search.proquest.com/docview/1296448043/.

[44] Statistics How To. Durbin Watson test & test statistic (2016). URL https://www.statisticshowto.com/durbin-watson-test-coefficient/.

[45] Keele, L. & Kelly, N. J. Dynamic models for dynamic theories: The ins and outs of lagged dependent variables. Political Analysis 14, 186–205 (2006).


[46] Sapir, A. Use of the Durbin-Watson statistic with lagged dependent variables. Metroeconomica 29, 169–172 (1977).

[47] Box, G. E. & Cox, D. R. An analysis of transformations. Journal of the Royal Statistical Society: Series B (Methodological) 26, 211–243 (1964).

[48] Vogel, R. M. The probability plot correlation coefficient test for the normal, lognormal, and Gumbel distributional hypotheses. Water Resources Research 22, 587–590 (1986).

[49] Hastie, T., Friedman, J. & Tibshirani, R. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Series in Statistics (Springer New York, New York, NY, 2001).

[50] Dettling, M. Applied Statistical Regression (Eidgenössische Technische Hochschule Zürich, 2019).

[51] Yahoo. Exchanges and data providers on Yahoo Finance. URL https://help.yahoo.com/kb/finance-for-web/SLN2310.html?impressions=true.

[52] Knutsson, G. & Espahbodi, K. Trading volume at Avanza (2019). URL http://www.diva-portal.org/smash/get/diva2:1334365/FULLTEXT01.pdf.

[53] Rhoads, R. Trading VIX Derivatives: Trading and Hedging Strategies Using VIX Futures, Options, and Exchange-Traded Notes. Wiley Trading 503 (Wiley, Hoboken, N.J., 2011).

[54] Cboe (Chicago Board Options Exchange). White Paper Cboe Volatility Index. Tech. Rep. (2019). URL https://www.cboe.com/micro/vix/vixwhite.pdf.

[55] STOXX. STOXX Strategy Index Guide. Tech. Rep. (2020). URL https://www.stoxx.com/document/Indices/Common/Indexguide/stoxx_strategy_guide.pdf.

[56] James, G. An Introduction to Statistical Learning with Applications in R. Springer Texts in Statistics 103 (2013), 1st edn.

[57] Kim, M. G. A cautionary note on the use of Cook's distance. Communications for Statistical Applications and Methods 24, 317–324 (2017).

[58] Ferguson, C. J. An effect size primer: a guide for clinicians and researchers (2016).

[59] Sanderson, H. & Hume, N. Palladium surges to new record high over $2,500 an ounce. Financial Times (2020). URL https://www.ft.com/content/4a20f6f0-3951-11ea-a6d3-9a26f8c3cba4.

[60] Ferguson, N. J. Investor information processing and trading volume. Asia-Pacific Journal of Financial Studies 44, 322–351 (2015).


[61] Investopedia. Financial instruments. URL https://www.investopedia.com/terms/f/financialinstrument.asp. Accessed on 2020-04-06.

[62] Lind-Waldock. The Complete Guide to Futures Trading: What You Need to Know about the Risks and Rewards (Wiley, 2006), 1st edn.
