
Department of Information Technology

Parameter Estimation and Approximate Posterior Exploration in Financial Models

Aleksander Senek, Robin Eriksson, Viktor Mattsson

Project in Computational Science: Report

Jan 2017

PROJECT REPORT


Abstract

We apply methods used for parameter estimation in statistical ecology to financial models. Stochastic differential equations (with financial applications) with and without explicit solutions and probability density functions are used. Both Approximate Bayesian Computation and the introduced Synthetic Rejection Markov chain Monte Carlo method perform well on a geometric Brownian motion model compared to the optimal Markov chain Monte Carlo method, which uses a log-likelihood. We also test models with different properties: mean-reverting (Vasicek), stochastic-drift mean-reverting (2-factor Hull-White), and multi-dimensional (Heston). As some of these models have an intractable log-likelihood, there is no optimal inference to compare against. The poor results for the last three models raise questions regarding the method heuristics. The problems can be traced to the fact that the summary statistics used for geometric Brownian motion did not summarize the other models well. We believe that the methods cannot be assumed to be proper without further error analysis or method improvements.


Contents

1 Introduction
2 Financial Models
  2.1 Geometric Brownian Motion
    2.1.1 The sign of the exponent
  2.2 Vasicek Model
  2.3 Heston Model
  2.4 Two-factor Hull-White Model
3 Methods
  3.1 Parameter Estimation
  3.2 Metropolis-Hastings MCMC
  3.3 Summary Statistics
  3.4 Approximate Bayesian Computation
  3.5 Data Cloning
  3.6 Synthetic Rejection MCMC
4 Results
  4.1 Geometric Brownian Motion generated data
    4.1.1 Error and Standard deviation dependence on parameters
  4.2 Data generated using other models
  4.3 Real market data
  4.4 Convergence tests
    4.4.1 ABC convergence
    4.4.2 SR MCMC convergence
5 Discussion
6 Conclusion
  6.1 Further work
7 Acknowledgments


1 Introduction

Given an observed set of data, how much can be inferred regarding the underlying process? This is a recurring problem in many fields of science, such as genetics, ecology, and finance [1]. When the underlying function is well behaved and deterministic, one would use e.g. least-squares fitting of a polynomial. However, when the underlying system is stochastic, one has to resort to other methods. One such computational method is Approximate Bayesian Computation (ABC). Results from Blum et al. [2] show that, in the ideal case and under an assumed model error, ABC simulations give an exact posterior, provided that sufficient statistics are used. Another method is Synthetic Rejection Markov Chain Monte Carlo (SR MCMC), a new method defined using the idea of the synthetic likelihood proposed by Wood [3].

Markov Chain Monte Carlo (MCMC) methods are widely used as a well-defined tool for finding the posterior. The drawback of MCMC is the need for a likelihood function, which for complex probability models is either impossible or computationally prohibitive to obtain. Marjoram et al. [4] explore the situation where said likelihood function is not used with MCMC-like methods, with positive results for an example problem of ancestral inference in population genetics.

The reason for us to look into parameter estimation in finance is its stochastic models. In finance, stock prices can be viewed as stochastic processes and are modeled daily. A well-known model for this is Geometric Brownian Motion (GBM), an exponential growth process with a stochastic term driven by a random variable. It is said that the Black-Scholes-Merton equation is the most solved partial differential equation in the world, as it is used to price options with GBM as the underlying model. Today more complex models are used, as GBM does not capture all the dynamics of the price, but GBM is good enough for proofs of concept since many of its properties are known.

The stochastic models used in finance, including GBM, are all general models and can be applied to problems outside finance. Therefore the ambition of an application in finance is not just to fit financial models but also to be transferable to other areas.

The application of ABC in finance would be either parameter estimation or filtering. For the latter, Jasra et al. [5] perform biased filtering for a Hidden Markov model using ABC when managing a portfolio of financial securities. We aim to evaluate the possibilities of using ABC for the former.

2 Financial Models

When choosing a model for a problem, one needs to consider which features are needed, e.g. exponential growth, mean reversion, jumps, or multi-dimensional dynamics. More features give a more complex model, and more parameters are needed. In finance there are some well-developed models. We look at four different models:

• Geometric Brownian Motion

• Vasicek Model

• Heston Model

• Two-factor Hull-White model

What these models have in common is that they are all stochastic differential equations (SDEs) with one or more random variables. Consider the following equation,

dWt = √dt · N(0, 1). (1)

The term in equation (1) is the increment of a Wiener process, and it is the driving stochastic measure used in our stochastic models.
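Equation (1) can be sampled directly by drawing normal increments scaled by √dt; summing them gives a Wiener path. A minimal Python sketch (illustrative only; the report does not specify an implementation):

```python
import math
import random

def wiener_increments(n_steps, dt, seed=1):
    """Sample dW_t = sqrt(dt) * N(0, 1) for each time step, as in (1)."""
    rng = random.Random(seed)
    return [math.sqrt(dt) * rng.gauss(0.0, 1.0) for _ in range(n_steps)]

# A Wiener path is the cumulative sum of its increments, starting at 0.
dW = wiener_increments(1000, dt=0.01)
W = [0.0]
for inc in dW:
    W.append(W[-1] + inc)
```

Each increment has mean 0 and variance dt, which is what makes W a discretized Wiener process.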

2.1 Geometric Brownian Motion

Geometric Brownian Motion (GBM) is one of the most used models in financial mathematics for modeling stock prices. So why is that? It is fairly inexpensive computationally, at least compared to many other models, and it is also Markovian, meaning that at each discrete moment in time


the model is independent of the previous values. This means that we can split the data into smaller segments and still have complete GBM data. GBM is the stochastic differential equation (SDE) defined as

dXt = µXt dt + σXt dWt, (2)

where dWt is a Wiener process and µ and σ are the drift and variance constants. Equation (3) gives the analytic solution to (2), obtained using Itô calculus, with X0 as the initial value,

Xt = X0 exp((µ − σ²/2) t + σWt). (3)

For the GBM we know the probability density function as

fXt(x; µ, σ, t) = 1/(xσ√(2πt)) · exp(−(ln x − ln X0 − (µ − σ²/2)t)² / (2σ²t)). (4)
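Because the exact solution (3) is known, GBM can be simulated without discretization error by iterating it over time steps. A minimal Python sketch (the parameter values are illustrative):

```python
import math
import random

def simulate_gbm(x0, mu, sigma, T, n_steps, seed=1):
    """Simulate GBM using the exact solution (3): no discretization error."""
    rng = random.Random(seed)
    dt = T / n_steps
    path = [x0]
    for _ in range(n_steps):
        z = rng.gauss(0.0, 1.0)
        # X_{t+dt} = X_t * exp((mu - sigma^2/2) dt + sigma sqrt(dt) Z)
        path.append(path[-1] * math.exp((mu - 0.5 * sigma ** 2) * dt
                                        + sigma * math.sqrt(dt) * z))
    return path

path = simulate_gbm(x0=1.0, mu=0.1, sigma=0.2, T=5.0, n_steps=1000)
```

Since the update multiplies by an exponential, the simulated price stays strictly positive, matching the lognormal density (4).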

Figure 1: Example of a price process generated with Geometric Brownian Motion. The x-axis shows time T and the y-axis the value of the asset.

2.1.1 The sign of the exponent

Depending on the sign of (µ − σ²/2) in equation (4), the resulting generated trajectory will either converge, diverge, or oscillate. This is important to know when generating the data and setting the stop time T, because for small T this behavior will not be visible. For large T, inference on a diverging time series will not be computationally feasible with any of the summary statistics (defined in Section 3.3).

2.2 Vasicek Model

The Vasicek model is an Ornstein-Uhlenbeck process and is sometimes used in finance to describethe evolution of interest rates [6],

drt = κ(µ− rt) dt+ σ dWt, (5)

where µ is the long-term mean level, meaning that in the long run r will be around µ, κ is the speed of reversion, and σ is the instantaneous volatility. The constant κ determines how long it takes before the trajectory evolves around µ; we set it equal to 1 because it is hard to estimate. The Vasicek model has an explicit probability density function (PDF), but we did not perform the ML estimation for this model, since the GBM model is behaviorally basic and therefore well suited for a comparison of our methods.
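A simple Euler-Maruyama discretization of (5) suffices to generate Vasicek data; the step size and parameter values below are illustrative assumptions:

```python
import math
import random

def simulate_vasicek(r0, kappa, mu, sigma, T, n_steps, seed=1):
    """Euler-Maruyama scheme for dr = kappa*(mu - r) dt + sigma dW, eq. (5)."""
    rng = random.Random(seed)
    dt = T / n_steps
    path = [r0]
    for _ in range(n_steps):
        r = path[-1]
        dr = kappa * (mu - r) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(r + dr)
    return path

# With kappa = 1 the path relaxes toward mu = 0.2 and then fluctuates around it.
path = simulate_vasicek(r0=1.0, kappa=1.0, mu=0.2, sigma=0.2, T=50.0, n_steps=5000)
```

The relaxation time is roughly 1/κ, which is why a small κ makes the mean-reversion (and hence κ itself) hard to see in short series.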


Figure 2: An example of a simulated Vasicek process using µ = 0.2, σ = 0.2, and κ = 1. The red line is the level µ to which the process reverts. The x-axis is the time step and the y-axis the value of the process.

2.3 Heston Model

The Heston model is similar to GBM, but instead of a constant volatility, the variance vt follows a CIR (Cox-Ingersoll-Ross) process with its own stochastic behavior, namely,

dSt = µSt dt + √vt St dWtS,
dvt = κ(θ − vt) dt + ξ√vt dWtv. (6)

The two Wiener processes in (6), dWtS and dWtv, are correlated with a factor ρ, such that dWtS · dWtv = ρ dt. The Heston model assumes that vt is not inferable. In order to perform parameter estimation we need the two Wiener processes to be uncorrelated. Therefore, by introducing the relation

dWtv = ρ dWtS + √(1 − ρ²) dWtγ, (7)

with a new Wiener process dWtγ that is uncorrelated with dWtS, the problem of correlation is addressed.

To make sure that vt is strictly positive, so that no complex values appear, we have to obey the so-called Feller condition:

2κθ > ξ².
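An Euler scheme for (6) can draw the correlated noise via the decomposition (7) and check the Feller condition up front. The sketch below is illustrative; the full-truncation guard against a slightly negative variance is our assumption, not from the report:

```python
import math
import random

def simulate_heston(s0, v0, mu, kappa, theta, xi, rho, T, n_steps, seed=1):
    """Euler scheme for (6), with correlated noise built via (7)."""
    assert 2 * kappa * theta > xi ** 2, "Feller condition violated"
    rng = random.Random(seed)
    dt = T / n_steps
    s, v = [s0], [v0]
    for _ in range(n_steps):
        z_s = rng.gauss(0.0, 1.0)
        z_g = rng.gauss(0.0, 1.0)                        # independent of z_s
        z_v = rho * z_s + math.sqrt(1 - rho ** 2) * z_g  # correlated as in (7)
        vt = max(v[-1], 0.0)          # full truncation: assumed numerical guard
        s.append(s[-1] + mu * s[-1] * dt
                 + math.sqrt(vt * dt) * s[-1] * z_s)
        v.append(v[-1] + kappa * (theta - vt) * dt
                 + xi * math.sqrt(vt * dt) * z_v)
    return s, v

s, v = simulate_heston(1.0, 0.04, 0.1, 2.0, 0.04, 0.3, -0.5, T=1.0, n_steps=1000)
```

With these illustrative parameters 2κθ = 0.16 > ξ² = 0.09, so the condition holds.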

Figure 3: Example of a price process generated by the Heston model. The left y-axis gives the value of the price and the right y-axis the value of the volatility.


2.4 Two-factor Hull-White Model

Like the Vasicek model, the Hull-White model is used to model interest rates. We chose the two-dimensional variant called the Two-factor Hull-White model, given in (8),

dr(t) = [θ(t) + u(t) − α(t) r(t)] dt + σ1(t) dW1(t),
du(t) = −b u(t) dt + σ2(t) dW2(t). (8)

The du term in equation (8) is an additional stochastic process that mean-reverts around zero and acts as an extra disturbance term in the model. This term is also what distinguishes the Two-factor model from the regular One-factor Hull-White model. Figure 4a shows an example run of the Two-factor Hull-White model, and Figure 4b shows a short section of the u term from the same simulation; we can clearly see that its value stays around zero.
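The dynamics (8) can be discretized with an Euler scheme; the sketch below holds θ, α, σ1, and σ2 constant for simplicity (an assumption; in general they are time-dependent):

```python
import math
import random

def simulate_hw2f(r0, theta, alpha, b, sigma1, sigma2, T, n_steps, seed=1):
    """Euler scheme for (8), with theta, alpha, sigma1, sigma2 held constant."""
    rng = random.Random(seed)
    dt = T / n_steps
    r, u = [r0], [0.0]
    for _ in range(n_steps):
        dw1 = math.sqrt(dt) * rng.gauss(0.0, 1.0)
        dw2 = math.sqrt(dt) * rng.gauss(0.0, 1.0)    # independent of dw1
        dr = (theta + u[-1] - alpha * r[-1]) * dt + sigma1 * dw1
        du = -b * u[-1] * dt + sigma2 * dw2          # mean-reverts around zero
        r.append(r[-1] + dr)
        u.append(u[-1] + du)
    return r, u

r, u = simulate_hw2f(1.0, theta=0.1, alpha=0.1, b=0.2,
                     sigma1=0.02, sigma2=0.01, T=10.0, n_steps=1000)
```

Starting u at zero and letting it mean-revert reproduces the small zero-centered disturbance seen in Figure 4b.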

Figure 4: Example run of the Two-factor Hull-White model: (a) an example of generated Two-factor Hull-White data; (b) the zero mean-reverting volatility term in the same simulation. The x-axis is the time step and the y-axis the value of the generated data and of the zero mean-reverting process, respectively.

3 Methods

We investigate the following methods,

• Metropolis-Hastings Markov chain Monte Carlo,

• Approximate Bayesian Computation,

• Synthetic Rejection Markov chain Monte Carlo,

with regard to accuracy when estimating parameters of the models mentioned in Section 2. We also test convergence rates for ABC and SR MCMC, when increasing the number of simulated trajectories for the former and decreasing ε for the latter.

The MCMC method is used as an optimal reference method, as it requires the least amount of time and the most model information (the likelihood). The main method of interest is ABC, which draws θ from a set prior and essentially uses brute force together with (9) to obtain an approximate posterior. Using some of the references [3, 4] we devise and test our own method, named SR MCMC. The three methods share the same common ground: Bayesian inference.

3.1 Parameter Estimation

In Bayesian inference, outcomes are described by their probabilities. Assume an example with two outcomes, A and B, with equal probability. The prior is set. An experiment is run and statistics are extracted from the outcomes of thousands of tries. The statistics show a set of A's and B's. If the statistics show that A is more frequent in the data than B, then the probability of outcome A, P(A), should be larger than the probability of outcome B, P(B). The critical point about Bayesian inference is that it provides a principled way of combining new evidence with prior beliefs,


through the application of Bayes’ rule given in equation (9). In a more general setting, we imaginedata D generated from a model M with parameters θ, and the prior density π(θ). The posteriordistribution is then given by

P(θ|D) = P(D|θ) P(θ) / P(D). (9)

The simplest approach to compute P(θ|D) is a simple rejection method:

1 Generate θ from π(·).

2 Accept θ with probability h = P (D|θ); return to 1.

In the simple rejection method we assume that the likelihood P(D|θ) is known for the model M, which is not always true. Therefore, the tested methods work even when the likelihood is intractable.
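The two steps above can be written down directly. A toy Python sketch for a single coin flip, where the exact posterior is known (the example and names are ours, purely illustrative):

```python
import random

def simple_rejection(prior_sample, likelihood, n_accept, seed=1):
    """Steps 1-2 above: draw theta from the prior, then accept it
    with probability h = P(D | theta)."""
    rng = random.Random(seed)
    accepted = []
    while len(accepted) < n_accept:
        theta = prior_sample(rng)
        if rng.random() < likelihood(theta):
            accepted.append(theta)
    return accepted

# Toy example: D = one observed head, uniform prior on the coin bias p.
# The exact posterior is P(p | D) = 2p on [0, 1], with mean 2/3.
post = simple_rejection(
    prior_sample=lambda r: r.random(),
    likelihood=lambda p: p,        # P(heads | p) = p
    n_accept=20000,
)
```

The accepted samples approximate the posterior; their mean converges to 2/3 as more draws are accepted.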

3.2 Metropolis-Hastings MCMC

The Metropolis-Hastings Markov Chain Monte Carlo (MCMC) algorithm is one of the oldest and most used in stochastic modeling [7]. It is a very inexpensive algorithm computationally, since we only need to do two things: propose new parameter values θ and compute the likelihood at those parameter values. The rest are trivial tasks. Algorithm 1 shows the full method.

However, the need for a likelihood function is a major drawback of this method, as the likelihood can be intractable for some models; this introduces a potential bottleneck for further modeling development and is the reason we explore other alternatives.

Algorithm 1 MCMC Metropolis

1: Load original data g∗
2: Guess initial θ [for GBM, θ = (µ, σ)]
3: Compute likelihood L of θ given g∗
4: for i = 1, 2, ... do
5:   Suggest new θf based on some proposal density f
6:   Compute likelihood Lf of θf given g∗
7:   Compute a = min[1, L(θf) f(θ|θf) / (L(θ) f(θf|θ))]
8:   Draw a uniformly distributed number r ∼ U[0, 1]
9:   if a ≥ r then
10:    Set θ = θf
11:    Set L = Lf
12:  end if
13: end for
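Algorithm 1 can be sketched in Python for GBM data, whose likelihood follows from the i.i.d. normal log-returns implied by (3). A symmetric Gaussian proposal makes the f-terms cancel in the acceptance ratio. This is illustrative code, not the implementation behind the report's results:

```python
import math
import random

def gbm_loglik(log_returns, mu, sigma, dt):
    """GBM log-likelihood: log-returns are i.i.d. N((mu - sigma^2/2) dt, sigma^2 dt)."""
    if sigma <= 0:
        return -math.inf
    m, v = (mu - 0.5 * sigma ** 2) * dt, sigma ** 2 * dt
    return sum(-0.5 * math.log(2 * math.pi * v) - (r - m) ** 2 / (2 * v)
               for r in log_returns)

def metropolis(log_returns, dt, theta0, step, n_iter, seed=1):
    """Algorithm 1 with a symmetric Gaussian proposal, so f cancels in a."""
    rng = random.Random(seed)
    theta = list(theta0)
    ll = gbm_loglik(log_returns, theta[0], theta[1], dt)
    chain = []
    for _ in range(n_iter):
        prop = [t + rng.gauss(0.0, s) for t, s in zip(theta, step)]
        ll_prop = gbm_loglik(log_returns, prop[0], prop[1], dt)
        # Accept with probability a = min(1, L_f / L), done in log space.
        if ll_prop - ll > math.log(rng.random()):
            theta, ll = prop, ll_prop
        chain.append(tuple(theta))
    return chain

# Synthetic daily log-returns from GBM with mu = 0.1, sigma = 0.2 (illustrative).
rng0 = random.Random(0)
dt = 1.0 / 252
rets = [(0.1 - 0.5 * 0.2 ** 2) * dt + 0.2 * math.sqrt(dt) * rng0.gauss(0.0, 1.0)
        for _ in range(2000)]
chain = metropolis(rets, dt, theta0=(0.0, 0.3), step=(0.2, 0.02), n_iter=5000)
```

After a burn-in, the chain for σ concentrates tightly around the true value, while µ keeps a wide posterior; this mirrors the ellipses seen later in the results.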

3.3 Summary Statistics

A data set of length n can be said to have n dimensions. Comparing two data sets of length n then means comparing two sets of n dimensions, which scales poorly as n increases. A more efficient approach is therefore to summarize the data into a lower dimension m, where m ≪ n. The comparison is then between two data sets of dimension m instead of n. The new dimension m is given by m summary statistics (SS) of the data sets.

An SS can often be categorized into one of four types: location (e.g. mean or median), spread (e.g. standard deviation or range), shape (e.g. skewness or kurtosis), and dependence (e.g. the Pearson product-moment correlation coefficient). Summarizing the data with one SS from each of these categories can give a good set of summarizing data points for the entire data set.

If the SS summarize the data perfectly, they are called sufficient statistics and constitute a perfect mapping from n to m dimensions without loss of any crucial information. The goal when choosing SS is to mimic sufficient statistics, but this is often unreachable.

Methods for discerning which SS to use have been proposed by Nunes et al. and Fearnhead & Prangle [8, 9] and further explored by Blum et al. [10]. However, no conclusive results were achieved


when testing their method of using the Kth nearest neighbor as a classifier of SS for our models. The SS presented in the results were thus chosen heuristically.

When comparing SS vectors in our methods, relative weights are assigned to the elements of the vectors, which allows for a fair comparison between SS of different magnitudes. The Euclidean norm with these weights then gives an absolute value of the distance between two SS vectors.
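As an illustration, one SS per category and the weighted Euclidean distance might look as follows (the specific statistics and weights here are our choices, not necessarily those used in the report):

```python
import math
import statistics

def summary_stats(xs):
    """One SS per category: location (mean), spread (std dev),
    shape (skewness), dependence (lag-1 autocorrelation)."""
    n = len(xs)
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    skew = sum((x - m) ** 3 for x in xs) / (n * s ** 3) if s > 0 else 0.0
    num = sum((xs[i] - m) * (xs[i + 1] - m) for i in range(n - 1))
    acf1 = num / sum((x - m) ** 2 for x in xs) if s > 0 else 0.0
    return [m, s, skew, acf1]

def weighted_distance(s_a, s_b, weights):
    """Weighted Euclidean norm between two SS vectors."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, s_a, s_b)))
```

A common choice (an assumption here) is to set each weight to the inverse variance of that SS under the prior, so no single statistic dominates the norm.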

3.4 Approximate Bayesian Computation

Approximate Bayesian Computation is a method that has increased in popularity in recent years, largely due to the availability of relatively powerful computers. The advantage of ABC over MCMC is the possibility of using SS instead of a likelihood [11]. What makes the method computationally heavy is the need to simulate a lot of data and compute SS on it.

Algorithm 2 ABC algorithm

1: Load original data g∗ and compute summary statistic S∗
2: Draw θi, i = 1, 2, ..., from the prior distribution
3: Simulate one set of data gi for each parameter θi
4: Compute summary statistic Si of gi
5: Calculate norm ρi = ||Si − S∗||
6: Option 1: {θi | ρi < ε} should follow the true posterior f(θ) for small ε
7: Option 2: Sort ρi in ascending order and keep the top proportion

Algorithm 2 shows the methodology behind ABC: we first compute the SS of the original data set, then repeatedly draw random parameters, simulate a data set using those parameters, and compute the SS of those data sets. There are two ways to accept or reject parameter values. Option one is to set a value ε, where ε ≪ 1, accept every ρi smaller than ε, and continue until enough parameter values have been accepted. The second option is to sort all ρi in ascending order after all simulations have run, keep the smallest values as accepted, and discard the rest.

Our opinion is that option 2 is preferable, because when choosing the SS it can be hard to set a good value for ε. With option 2 we also always know roughly how long a run will take, whereas with option 1 it can be difficult to estimate how many simulations are needed to get a sufficient number of accepted parameter proposals. Figure 5 shows a plot of the accepted proposals for different ε, allowing one to easily switch between using the top x% and ε.
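A sketch of Algorithm 2 with option 2; the toy model and summary statistic below are illustrative, not the report's setup:

```python
import random

def abc_rejection(s_obs, prior_sample, simulate, summarize, distance,
                  n_total, keep_fraction, seed=1):
    """Algorithm 2, option 2: keep the fraction of draws closest to S*."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_total):
        theta = prior_sample(rng)
        d = distance(summarize(simulate(theta, rng)), s_obs)
        draws.append((d, theta))
    draws.sort(key=lambda pair: pair[0])
    n_keep = max(1, int(keep_fraction * n_total))
    return [theta for _, theta in draws[:n_keep]]

# Toy problem: infer the mean theta of N(theta, 1) from 100 observations.
rng0 = random.Random(0)
obs = [2.0 + rng0.gauss(0.0, 1.0) for _ in range(100)]
mean = lambda xs: sum(xs) / len(xs)
post = abc_rejection(
    s_obs=mean(obs),
    prior_sample=lambda r: r.uniform(-5.0, 5.0),
    simulate=lambda th, r: [th + r.gauss(0.0, 1.0) for _ in range(100)],
    summarize=mean,
    distance=lambda a, b: abs(a - b),
    n_total=5000,
    keep_fraction=0.01,
)
```

Keeping a fixed fraction makes the effective ε data-driven, which is exactly the convenience argued for above.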

Figure 5: Percentage of accepted proposals for different ε. This depends on the SS used. We used an acceptance rate of approximately 0.1%.


3.5 Data Cloning

Data cloning is another method to use when the likelihood function is intractable or computationally heavy. It is a relatively new method introduced by Lele et al. [12]. We take the original data series and divide it into K intervals, ending up with many short segments. We can still assume that our model, with the same parameter values, is valid on each of these intervals. This is because the stochastic Wiener process is a martingale, meaning that it does not depend on any future values. We then compute parameter values on the clones.
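The splitting step can be sketched as follows (renormalizing each segment, e.g. by its starting value for GBM, is left out for brevity):

```python
def data_clone(series, clone_length):
    """Split a series into K = len(series) // clone_length non-overlapping
    segments; under the model assumptions each segment is itself valid data."""
    k = len(series) // clone_length
    return [series[i * clone_length:(i + 1) * clone_length] for i in range(k)]

clones = data_clone(list(range(100)), clone_length=10)
```

Any trailing points that do not fill a whole clone are dropped, so every clone has exactly the same length.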

3.6 Synthetic Rejection MCMC

Wood [3] proposed the idea of approximating the summary statistics of the data as

s ∼ N(µθ, Σθ), (10)

where µθ is the multivariate mean and Σθ the covariance. The validity of the approximation in (10) can be checked per SS using e.g. the Jarque-Bera test. A synthetic log-likelihood can be constructed and, when negligible terms are dropped, written as

ls(θ) = −(1/2)(s − µθ)ᵀ Σθ⁻¹ (s − µθ) − (1/2) log |Σθ|, (11)

where

µθ = Σi s∗i / Nr, (12)
S = (s∗1 − µθ, s∗2 − µθ, . . .), (13)
Σθ = S Sᵀ / (Nr − 1), (14)

and Nr is the number of data clones. As Nr → ∞, ls behaves like a conventional log-likelihood.
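Equations (11)-(14) can be computed directly from the replicated SS vectors; the sketch below uses a small Gaussian elimination to get Σθ⁻¹(s − µθ) and log |Σθ| without external libraries (illustrative code):

```python
import math

def synthetic_loglik(s_obs, replicates):
    """Equations (11)-(14): fit N(mu_theta, Sigma_theta) to replicated SS
    vectors and evaluate the synthetic log-likelihood at the observed SS."""
    nr, d = len(replicates), len(s_obs)
    mu = [sum(r[j] for r in replicates) / nr for j in range(d)]            # (12)
    dev = [[r[j] - mu[j] for j in range(d)] for r in replicates]           # (13)
    cov = [[sum(dev[k][i] * dev[k][j] for k in range(nr)) / (nr - 1)       # (14)
            for j in range(d)] for i in range(d)]
    diff = [s_obs[j] - mu[j] for j in range(d)]
    x, logdet = _solve_logdet(cov, diff)
    quad = sum(diff[j] * x[j] for j in range(d))
    return -0.5 * quad - 0.5 * logdet                                      # (11)

def _solve_logdet(a, b):
    """Gauss-Jordan elimination with partial pivoting: solve a x = b and
    return (x, log|det a|); a covariance matrix has a positive determinant."""
    d = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    logdet = 0.0
    for col in range(d):
        piv = max(range(col, d), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        logdet += math.log(abs(m[col][col]))
        for r in range(d):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x_r - f * x_c for x_r, x_c in zip(m[r], m[col])]
    return [m[r][d] / m[r][r] for r in range(d)], logdet
```

When the observed SS equal µθ the quadratic term vanishes and ls reduces to −(1/2) log |Σθ|, a handy sanity check.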

Marjoram et al. [4] give a framework for combining ABC with MCMC, calling it approximate MCMC (AMCMC). AMCMC retains a general way of constructing the probability with which proposed parameters θ are accepted. Combining AMCMC with the synthetic likelihood, we name the method Synthetic Rejection MCMC.

In SR MCMC one proposes new parameters θ′ using a transition kernel, as in MCMC. Using the proposed θ′, one simulates data and computes its summary statistics. If the simulated data is not close enough to the original data, as measured by the summary statistics, a new θ is proposed. These steps resemble the ABC algorithm. The remaining steps are as in MCMC, with the small difference that the likelihood function is replaced by the synthetic likelihood. The full algorithm follows:


Algorithm 3 SRMCMC

1: Load original data g∗
2: Propose initial θ∗
3: Compute summary statistics S∗ for g∗
4: while i < N do
5:   Propose new θi
6:   Generate data gi using θi
7:   Data-clone gi → Gi
8:   Compute summary statistics Si for the set of clones Gi
9:   if ||S∗ − Si|| < ε then
10:    Compute synthetic log-likelihood ls(θi)
11:    Draw a uniformly distributed number h ∼ U[0, 1]
12:    if h < min(1, exp{ls(θi) − ls(θ∗)}) then
13:      Set θ∗ = θi and S∗ = S(gi)
14:    end if
15:  end if
16: end while
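A condensed sketch of Algorithm 3, with the data-cloning and synthetic-likelihood steps abstracted into a callable; the toy usage replaces the synthetic log-likelihood with a simple Gaussian, purely to illustrate the control flow:

```python
import math
import random

def sr_mcmc(g_obs, theta0, propose, simulate, summarize, synth_loglik,
            eps, n_iter, seed=1):
    """Algorithm 3 sketch: an ABC-style gate on the SS distance, then a
    Metropolis accept/reject step on the synthetic log-likelihood."""
    rng = random.Random(seed)
    s_obs = summarize(g_obs)
    theta, ls_cur = theta0, -math.inf
    chain = []
    for _ in range(n_iter):
        theta_i = propose(theta, rng)
        s_i = summarize(simulate(theta_i, rng))
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(s_i, s_obs)))
        if dist < eps:                       # reject early if SS are too far off
            ls_i = synth_loglik(theta_i, rng)
            if ls_i >= ls_cur or rng.random() < math.exp(ls_i - ls_cur):
                theta, ls_cur = theta_i, ls_i
        chain.append(theta)
    return chain

# Toy check: infer the mean of N(theta, 1); the Gaussian stand-in for the
# synthetic log-likelihood below is an assumption made for illustration.
rng0 = random.Random(0)
g_obs = [2.0 + rng0.gauss(0.0, 1.0) for _ in range(200)]
mbar = sum(g_obs) / len(g_obs)
chain = sr_mcmc(
    g_obs, theta0=0.0,
    propose=lambda t, r: t + r.gauss(0.0, 0.3),
    simulate=lambda th, r: [th + r.gauss(0.0, 1.0) for _ in range(200)],
    summarize=lambda xs: [sum(xs) / len(xs)],
    synth_loglik=lambda th, r: -100.0 * (th - mbar) ** 2,
    eps=10.0, n_iter=3000,
)
```

The ε gate is set very loose here so the Metropolis step drives the chain; in the actual method the gate does real work by discarding poor proposals before the costly cloning step.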

4 Results

To evaluate the methods, a number of numerical tests were performed.

• Estimating the parameters on synthetic data from the previously defined models, where the parameters are controlled.

• Testing the convergence rate of the methods. We should achieve an accuracy of 1/√N, where N is the number of simulated input data points.

• Testing the methods on real market data, where the parameters are not known.

4.1 Geometric Brownian Motion generated data

Data is generated as geometric Brownian motion with θ = [µ, σ]. We then try to infer the parameter set θ with MCMC, SR MCMC, and ABC. The resulting posterior is given in Figure 6. The axes in the figure are the parameters µ and σ. In the figure we see three scatter clouds, blue, red, and black, corresponding to the posteriors of the three methods: ABC, SR MCMC, and MCMC. Each cloud is encircled by the best-fitted ellipse from a principal component analysis of the posterior. We see that the accuracy of MCMC is better than that of ABC, with SR MCMC in between, as their ellipses have smaller minor axes.

The distribution of the posteriors is also visualized using the histograms in Figure 7. The red lines in the subplots mark the exact values of µ and σ. The figure shows what the ellipses in Figure 6 show: the standard deviation of σ is smallest for MCMC and increases through SR MCMC to ABC, while for µ the standard deviation is almost the same for all methods, represented in the ellipses by widths in µ that are almost equal, only centered around different values.

In Table 1, the resulting mean and standard deviation of all methods for the parameters µ and σ are presented. For reference, the exact values of both parameters are included.


Figure 6: Resulting posterior approximation from a simulation of geometric Brownian motion. All methods are used: MCMC; ABC with Ntot = 10⁶ and Nchosen = 10³; SR MCMC with ε = 0.1 and clone length 10.

Figure 7: Posterior approximation from a simulation of geometric Brownian motion, using MCMC, ABC with Ntot = 10⁶ and Nchosen = 10³, and SR MCMC with ε = 0.1 and clone length 10. The red lines represent the exact value of each parameter.

Table 1: Mean and standard deviation for the posterior approximation in Figure 7.

           µ          std µ       σ          std σ
Exact      0.1        -           0.2        -
MCMC       0.081066   0.092986    0.20322    0.0039409
SR MCMC    0.091154   0.085623    0.20217    0.0097663
ABC        0.06864    0.090721    0.20083    0.018272

4.1.1 Error and Standard deviation dependence on parameters

Using ABC it is possible to perform several estimations over an array of (µ, σ) pairs. With this we can evaluate how the consistency and the error of the method depend on the parameter values. We generated surfaces as functions of the original parameters, which appear on the µ- and σ-axes.


Figure 8: Percentage error in the estimated parameter, using N = 2·10⁵ in ABC and taking the average of seven runs. The error is calculated by subtracting the original parameter values from the estimation and dividing by the original parameters.

Figure 9: Percentage error in the estimated parameter, using N = 2·10⁵ in ABC and taking the average of seven runs. The error is calculated by subtracting the original parameter values from the estimation and dividing by the original parameters.


Figure 10: Relative standard deviation in the estimated parameter, using N = 2·10⁵ in ABC and taking the average of seven runs. The standard deviation is calculated over the seven ABC estimations and then divided by the original parameter to normalize it.

Figure 11: Relative standard deviation in the estimated parameter, using N = 2·10⁵ in ABC and taking the average of seven runs. The standard deviation is calculated over the seven ABC estimations and then divided by the original parameter to normalize it.

4.2 Data generated using other models

After completing the tests on GBM, we continue to see how SR MCMC and ABC perform on the other models presented in Section 2. We did not devise new summary statistics for these models, so we used the same ones as for GBM, and the results may suffer because of this.

Figures 12 to 14 present histograms, and tables of parameter values, for the other models discussed in Section 2. The red line in each histogram is the true value used, and the tables give the mean value and standard deviation of every parameter.


Vasicek model

Figure 12 shows histograms of the estimated parameters in the Vasicek model. The histograms show that κ is hard to estimate with both ABC and SR MCMC. The methods give more precise posteriors for µ and σ: the means are close to the actual values and the standard deviations are fairly small, as seen in the histograms. Table 2 presents the mean and standard deviation of the estimated parameters.

Figure 12: Posterior approximation from a simulation of the Vasicek model, using ABC with Ntot = 10⁶ and Nchosen = 10³, and SR MCMC with ε = 0.1 and clone length 10. The red lines represent the exact value of each parameter.

Table 2: Mean and standard deviation for the posterior approximation in Figure 12.

           µ         std µ      σ          std σ      κ        std κ
Exact      0.1       -          0.06       -          1        -
SR MCMC    0.19093   0.054533   0.074065   0.031122   87.111   69.97
ABC        0.11942   0.037155   0.095862   0.014993   32.546   29.643

Heston model

Figure 13 shows histograms of the estimated parameters in the Heston model. Both SR MCMC and ABC estimate µ and κ with means close to the exact values and small standard deviations. ABC fails to estimate both σ and ξ, with the resulting posterior being the prior. SR MCMC estimates values for σ and ξ, but the mean values are offset from the exact values. Table 3 presents the mean and standard deviation of the estimated parameters.


Figure 13: Posterior approximation from a simulation of the Heston model, using ABC with Ntot = 10⁶ and Nchosen = 10³, and SR MCMC with ε = 0.1 and clone length 10. The red lines represent the exact value of each parameter.

Table 3: Mean and standard deviation for the posterior approximation in Figure 13.

           µ         std µ      κ          std κ      σ        std σ    ξ         std ξ
Exact      0.1       -          0.2        -          0.4      -        0.3       -
SR MCMC    0.10049   0.089609   0.061222   0.018851   8.1873   3.0781   0.29975   0.22192
ABC        0.15241   0.084341   0.072749   0.085686   9.6215   5.8978   0.48014   0.28821

Hull-White model

The histograms in Figure 14 show the accepted values for each parameter. SR MCMC performs fairly well for most of the parameters, while ABC struggles more, especially with κ and σ2. Table 4 gives the exact values together with the mean and standard deviation from SR MCMC and ABC for each parameter.


Figure 14: Posterior approximation from a simulation with the Hull-White model, using ABC with Ntot = 10^6 and Nchosen = 10^3, and SR MCMC with ε = 0.1 and clone length 10. The red lines represent the exact value of each parameter.



Table 4: Mean and standard deviation for the posterior approximation in Figure 14.

          µ     std µ  σ     std σ  κ      std κ  b     std b  σ2     std σ2
Exact     0.1   -      0.2   -      1      -      0.2   -      0.05   -
SR MCMC   0.68  0.77   0.25  0.06   3.22   2.97   0.55  0.26   0.51   0.83
ABC       3.13  1.04   0.22  0.07   14.01  4.45   0.50  0.29   10.39  5.79

4.3 Real market data

After testing the methods on synthetic data, where the parameters can be fully controlled, we continue with tests on real market data. Stock price data is chosen for Apple Inc (Figure 15a) and Amazon.com Inc (Figure 16a) on the interval Dec 2015 - Dec 2016. By assuming that the stock prices follow a GBM model we can apply MCMC, ABC, and SR MCMC. With synthetic data we control the model parameters, but for real market data that no longer holds. We can, however, obtain the market-accepted σ, since it is used to price financial derivatives traded on the market. One either calculates σ from the traded prices of these derivatives, or obtains it from vendors who do this computation. We chose the latter and took data from Ivolatility.com.
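Under the GBM assumption the log-returns are i.i.d. Gaussian, so µ and σ have closed-form maximum-likelihood estimates that give a quick sanity check alongside MCMC, ABC, and SR MCMC. The sketch below uses illustrative names; with dt = 1/252 for daily data the estimates are annualized and directly comparable with the volatility figures in Tables 5 and 6:

```python
import numpy as np

def gbm_mle(prices, dt):
    """Closed-form MLE for GBM parameters from a price series.
    Log-returns satisfy x_i = (mu - sigma^2/2)*dt + sigma*sqrt(dt)*Z_i."""
    x = np.diff(np.log(np.asarray(prices, float)))
    sigma2 = x.var(ddof=1) / dt          # annualized variance estimate
    mu = x.mean() / dt + 0.5 * sigma2    # drift, correcting for the Ito term
    return mu, np.sqrt(sigma2)
```

As the convergence results later in the report also suggest, σ is recovered far more precisely than µ from one year of daily data, since the standard error of µ̂ scales with σ/√T rather than with the number of observations.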

In Figure 15b the approximate posteriors from the different methods are presented for the Apple price data, separated by color and symbol. ABC and SR MCMC are close to each other, but the MCMC posterior is more concentrated than the other two. We see similar results in Figure 16b for the Amazon price data. Tables 5 and 6 report the mean and standard deviation of the estimated parameters. For both datasets, Apple and Amazon, the means are close to each other across methods, although the SR MCMC estimate of µ deviates somewhat from the others. For both parameters and both datasets, the standard deviation for MCMC is smaller than for ABC and SR MCMC.


(a) Apple stock price, Dec 2015 - Dec 2016


(b) Results on Apple stock data assuming Geometric Brownian Motion; ABC with Ntot = 10^6 and Nchosen = 10^3, SR MCMC with ε = 0.1 and clone length 10.

Figure 15: Parameter estimation on the Apple stock.

Table 5: Mean and standard deviation of the parameters estimated by MCMC, ABC, and SR MCMC using the Apple stock price data from Dec 2015 - Dec 2016.

          µ       std µ   σ       std σ
IVol1     -       -       0.2355  -
MCMC      0.0630  0.0181  0.2384  0.0006
SR MCMC   0.0198  0.2131  0.2403  0.0143
ABC       0.0819  0.2448  0.2349  0.0267

1 IVolatility - a vendor that provides market-traded volatility data.




(a) Amazon stock price, Dec 2015 - Dec 2016


(b) Results on Amazon stock data assuming Geometric Brownian Motion; ABC with Ntot = 10^6 and Nchosen = 10^3, SR MCMC with ε = 0.1 and clone length 10.

Figure 16: Parameter estimation on the Amazon stock.

Table 6: Mean and standard deviation of the parameters estimated by MCMC, ABC, and SR MCMC using the Amazon stock price data from Dec 2015 - Dec 2016.

          µ       std µ   σ       std σ
IVol      -       -       0.2779  -
MCMC      0.1925  0.0213  0.2972  0.0008
SR MCMC   0.0685  0.2653  0.3109  0.0197
ABC       0.1758  0.2954  0.2985  0.0185

4.4 Convergence tests

The convergence of the two methods is examined in this subsection. For ABC we check the absolute error and standard deviation as functions of the number of simulated parameters. Convergence of SR MCMC is evaluated for a decreasing ε and a decreasing length of the clones used.

4.4.1 ABC convergence

Convergence for ABC is presented in Figure 17, where we see the behavior of the error in the mean and the standard deviation of µ and σ as a function of the number of simulated datasets. All subfigures appear to converge at a 1/√N rate. The error in µ decreases at first but then levels off at about 0.05, whereas the error in σ goes down towards zero. This result is expected: as the comparison with Figure 6 shows, estimating µ is harder than estimating σ. The standard deviation decreases, as expected, as the number of simulated datasets increases.
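The rejection-ABC loop whose convergence is measured here can be sketched as follows. The function names are ours, and the test exercises a toy Gaussian model rather than GBM; in the report's setting, `simulate` would generate a GBM path and `summaries` would compute the chosen summary statistics:

```python
import numpy as np

def abc_rejection(y_obs, simulate, summaries, prior_draw, n_total, n_keep, rng):
    """Plain rejection ABC: keep the n_keep parameter draws whose simulated
    summary statistics lie closest (Euclidean distance) to those of the data."""
    s_obs = summaries(y_obs)
    thetas = np.array([prior_draw(rng) for _ in range(n_total)])
    dists = np.array([np.linalg.norm(summaries(simulate(t, rng)) - s_obs)
                      for t in thetas])
    keep = np.argsort(dists)[:n_keep]   # equivalent to an adaptive tolerance
    return thetas[keep]
```

Keeping a fixed fraction of the closest draws, rather than a fixed tolerance, makes the effective tolerance shrink as n_total grows, which is what drives the error decay over N seen in Figure 17.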



[Plots omitted: log-log error curves against N with a sqrt(N) reference line. Panels: (a) Convergence in µ; (b) Convergence in σ; (c) Standard deviation in µ; (d) Standard deviation in σ.]

Figure 17: Convergence for an increasing number of simulated datasets N.

4.4.2 SR MCMC convergence

The resulting posterior computed with SR MCMC depends on the threshold ε. We test the resulting error and standard deviation when estimating θ for data generated from a GBM model.

As the resulting error is stochastic, the estimation is performed multiple times, here 15 times, and the mean error (18a) and standard deviation (18b) are presented in Figure 18. We see that the error gradient is steeper when ε > 1. Beyond ε = 1 the standard deviation continues to improve for σ but not for µ. At the far left of Figure 18b, the mean of the standard deviation for σ appears to settle at its lowest value once ε drops below 0.4.
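SR MCMC itself is introduced earlier in the report and is not restated here. As a reference point for the role of the threshold ε, a minimal likelihood-free MCMC in the spirit of Marjoram et al. [4] is sketched below; the names are ours and the test uses a toy Gaussian model, so this is an illustration of the ε-thresholded accept step, not the report's exact SR MCMC:

```python
import numpy as np

def abc_mcmc(s_obs, simulate_summary, log_prior, theta0, eps, n_steps, prop_sd, rng):
    """Likelihood-free MCMC: a proposal is considered only if its simulated
    summary lies within eps of the observed summary; it is then accepted with
    the usual Metropolis-Hastings probability, which reduces to the prior
    ratio for a symmetric random-walk proposal."""
    theta = np.asarray(theta0, float)
    chain = [theta.copy()]
    for _ in range(n_steps):
        prop = theta + rng.normal(0.0, prop_sd, size=theta.shape)
        s_sim = simulate_summary(prop, rng)
        if np.linalg.norm(s_sim - s_obs) < eps:
            if np.log(rng.uniform()) < log_prior(prop) - log_prior(theta):
                theta = prop
        chain.append(theta.copy())
    return np.array(chain)
```

Shrinking ε tightens the implicit likelihood approximation, improving the posterior at the cost of a lower acceptance rate, which is the trade-off probed in Figure 18.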

[Plots omitted: log-log curves against ε. Panels: (a) Mean convergence of the error in µ and σ over 15 independent runs; (b) Mean convergence of the standard deviation of µ and σ over 15 independent runs.]

Figure 18: Error convergence of SR MCMC as a function of ε.



5 Discussion

For the GBM model, the ABC and SR MCMC methods can be tuned to an accuracy comparable to that of the likelihood-based model (see Figure 6). For the three subsequent models, the chosen summary statistics (SS) do a poor job of capturing some aspects of the data. As a result, some parameters are readily recovered while others are estimated poorly. Whether this is acceptable behavior for an algorithm is up to the reader to decide, but consistency cannot be expected.

The choice of SS still depends on heuristics, and no method we tested performed close to acceptably without them [8].

Some tests, not included here, replaced random draws of parameters from the prior with pseudo-random numbers. This did not yield anything useful and was therefore excluded from the report.

A comment on our methods, mainly concerning GBM: in our simulated data we purposely chose θ close to what one expects to find in real market data. This meant choosing the deterministic drift, µ in GBM, close to 0. When the methods then estimate µ, they struggle to fix the sign of the parameter, giving an almost null posterior, since the parameter can be either positive or negative. To get the posterior to settle on one sign, we found that we needed to increase the end time T. Restricting µ to lie above a specified lower bound also worked, but that limits the scenarios in which the methods are applicable.

Figures 8 to 11 give an idea of the accuracy and consistency of ABC inference of the GBM parameters. The errors in µ and σ increase linearly with σ, with the error in µ approximately a factor of ten larger. The standard deviation of the σ estimates between ABC runs also increases linearly. An interesting trend is that the standard deviation in µ is highly non-linear, with its minimum tracing a non-linear curve.

When continuing with ABC and SR MCMC, one should keep in mind the possible stacking of errors and uncertainties. We started with GBM, which can be seen as a simple model with linear drift and linear noise. Regular MCMC can then be taken as the minimum level of error achievable by the other methods, and the gap between them should be regarded as a measure of the posterior inaccuracy of the likelihood-free methods relative to optimal MCMC. When we apply our methods to more complex models with higher parameter dimensionality, we should expect that inaccuracy to grow further.

On the real market data, Tables 5 and 6 show that the SR MCMC mean estimate deviates somewhat from both MCMC and ABC. A possible reason is that for real market data the GBM approximation falls short in that the data is not cloneable: if the yearly data is split into weeks, the weeks may be correlated with each other rather than forming a martingale, leading to a false estimate.

Figures 15b and 16b illustrate the potential accuracy of MCMC. Both ABC and SR MCMC performed as in Figure 6, but MCMC did not: here it is able to find a mean estimate not only for σ but also for µ, with a small standard deviation. The reason for this is unclear to us.

Regarding convergence, ABC follows the expected 1/√N rate. The test of SR MCMC involves additional difficulties: the threshold ε, the number of clones, and the clone length all affect the error. The latter two depend on the data, i.e. how long a clone can be before it stops describing the data sufficiently well. The error is therefore tested for decreasing ε only, as that should improve the resulting posterior the most. The result is similar to ABC: the error decreases and stops improving beyond a limiting value of ε. Further testing of SR MCMC is needed before claiming that it converges as fast as either ABC or MCMC, but the presented results are promising and encourage further work.

6 Conclusion

Using these methods we achieved systematic parameter inference and convergence for the GBM model. Moving beyond GBM gave comparatively poor results for the new parameters, which leads us to believe that the initial success rests on heuristics; this encourages the development of quantitative methods for defining SS. Without an MCMC method to compare against, it is hard to determine the error of the likelihood-free methods.

One way to strengthen the foundation of these methods would be to develop a statistical theory for estimating their error. Another would be to replace the heuristics with robust procedures that do not introduce uncertainty along the way.

6.1 Further work

In order to improve on our work we propose a number of possible steps.

Fitness Explore methods to rank the fitness of SS, similar to the one proposed by Nunes et al. [8], to remove part of the heuristics and reliably reach some form of optimal summary statistics.

Automate choice of SS Use machine learning algorithms to find suitable SS for each model, to ensure the statistics are sufficient.

Model validation Develop a method to assess and select models given a computed posterior. This would be an investigation into the inverse model selection problem.

Parallelism Rewrite the code in C to allow for parallelism, since only small parts need to run serially.

7 Acknowledgments

This project was supervised by Stefan Engblom and Josef Hook in the course Project in Computational Science at Uppsala University (Department of Information Technology).



References

[1] Beaumont, Mark A. "Approximate Bayesian computation in evolution and ecology." Annual Review of Ecology, Evolution, and Systematics 41 (2010): 379-406.

[2] Blum, Michael G.B., and Olivier Francois. "Non-linear regression models for Approximate Bayesian Computation." Statistics and Computing 20.1 (2010): 63-73.

[3] Wood, Simon N. "Statistical inference for noisy nonlinear ecological dynamic systems." Nature 466 (August 2010).

[4] Marjoram, P., J. Molitor, et al. "Markov chain Monte Carlo without likelihoods." PNAS 100 (December 2003).

[5] Jasra, Ajay, Sumeetpal S. Singh, James S. Martin, and Emma McCoy. "Filtering via approximate Bayesian computation." Statistics and Computing 22.6 (2012): 1223-1237.

[6] Vasicek, Oldrich. "An equilibrium characterization of the term structure." Journal of Financial Economics 5.2 (1977): 177-188. doi:10.1016/0304-405X(77)90016-2.

[7] Hastings, W. K. "Monte Carlo sampling methods using Markov chains and their applications." Biometrika 57.1 (1970): 97-109. doi:10.1093/biomet/57.1.97.

[8] Nunes, Matthew A., and David J. Balding. "On optimal selection of summary statistics for approximate Bayesian computation." Statistical Applications in Genetics and Molecular Biology 9.1 (2010).

[9] Fearnhead, Paul, and Dennis Prangle. "Constructing summary statistics for approximate Bayesian computation: semi-automatic approximate Bayesian computation." Journal of the Royal Statistical Society: Series B (Statistical Methodology) 74.3 (2012): 419-474.

[10] Blum, Michael G.B., Matthew A. Nunes, Dennis Prangle, and Scott A. Sisson. "A comparative review of dimension reduction methods in approximate Bayesian computation." Statistical Science 28.2 (2013): 189-208.

[11] Sunnaker, M., A. G. Busetto, E. Numminen, J. Corander, M. Foll, et al. "Approximate Bayesian Computation." PLOS Computational Biology 9.1 (2013): e1002803. doi:10.1371/journal.pcbi.1002803.

[12] Lele, S. R., B. Dennis, and F. Lutscher. "Data cloning: easy maximum likelihood estimation for complex ecological models using Bayesian Markov chain Monte Carlo methods." Ecology Letters 10 (2007): 551-563.
