Machine Learning on EPEX Order Books: Insights and Forecasts · Fraunhofer-Platz 1, 67663...

Machine Learning on EPEX Order Books:

Insights and Forecasts

S. Schnurch∗ and A. Wagner

Fraunhofer ITWM, Fraunhofer Institute for Industrial Mathematics ITWM,Fraunhofer-Platz 1, 67663 Kaiserslautern, Germany

September 6, 2019

Abstract

This paper employs machine learning algorithms to forecast German elec-tricity spot market prices. The forecasts utilize in particular bid andask order book data from the spot market but also fundamental marketdata like renewable infeed and expected demand. Appropriate feature ex-traction for the order book data is developed. Using cross-validation tooptimise hyperparameters, neural networks and random forests are pro-posed and compared to statistical reference models. The machine learningmodels outperform traditional approaches.

Keywords: Machine Learning, Neural Networks, Random Forests, Elec-tricity Market, Renewables, Spot Price, Forecasting.

1 Introduction

Forecasting electricity prices is an important task in an energy utility and needednot only for proprietary trading but also for the optimisation of power plantproduction schedules and other technical issues. A promising approach in powerprice forecasting is based on a recalculation of the order book using forecasts onmarket fundamentals like demand or renewable infeed. However, this approachrequires extensive statistical analysis of market data. In this paper, we examineif and how this statistical work can be reduced using machine learning. Ourpaper focuses on two research questions:

• How can order books from electricity markets be included in machinelearning algorithms?

• How can order-book-based spot price forecasts be improved using machinelearning?

We consider the German/Austrian EPEX spot market for electricity. Thereis a daily auction for electricity with delivery the next day. All 24 hours of theday are traded as separate products. Figure 1 shows auction results on different

∗Corresponding author: [email protected]

1

arX

iv:1

906.

0624

8v3

[st

at.A

P] 5

Sep

201

9

[email protected]

Figure 1: Example electricity price time series on different time scales.

Figure 2: Order book data for a particular hour on different scales.

time scales. The pronounced seasonality of prices is visible as well as their highvolatility.1

In the following, we shortly explain the idea of order-book-based price fore-casts. Each price is the result of an auction, which can be represented as a bidand an ask curve. For a particular hour, those curves are shown in Figure 2. Theintersection of the bid (purchase, demand) and ask (sell, supply) curve is themarket clearing price (MCP). In the magnified figure, it is clearly visible thatthe bid and ask curves are step functions. Each step width is the cumulated vol-ume which market participants have put in the auction at a certain price. Pricelevels correspond in fact to the marginal production costs of different powerplants. Due to the regulatory environment, in particular renewables bid at neg-ative prices in the auction. Moreover, in contrast to a classical power plant, theproduced amount of renewable energy is stochastic and total expected produc-tion is sold on the exchange. Relying on those economical circumstances, theorder-book-based forecasting modifies the volumes at different price levels in thebid and ask curves. The modifications correspond to the forecasted wind andsolar power infeed. An important issue is which price levels are influenced bythe renewable infeed. Usually, energy utilities use exhaustive statistical analysison historical data to identify the price levels and the impact of the renewableforecasts. In fact, there are also other fundamental factors which influence themarket price, first of all the expected electricity demand. This paper focuses onmachine learning methods to reduce the effort for building a forecast model.

In the following section we give an overview on existing literature on theeconomics of electricity markets, order-book-based models and the use of ma-chine learning in price forecasting. In Section 3 we detail our methodology.Section 4 is devoted to numerical results and a comparison to other modelsfrom the literature. Section 5 concludes.

1Another interesting property is that in contrast to price series of other commodities orstocks, electricity prices may become negative.

2

Figure 3: Influence of wind and solar infeed on price.

2 Existing literature on price forecasting andmachine learning in electricity markets

Solar and wind energy is playing a more and more prominent role in today’selectricity markets. Empirical studies show that renewable electricity genera-tion is both highly volatile and has a substantial impact on the day-ahead elec-tricity price (Wagner (2014)). Using multivariate regression methods, variousauthors have quantified the influence renewable infeed has on the price (Cludius,Hermann, Matthes, and Graichen (2014); Wurzburg, Labandeira, and Linares(2013)). Higher renewable infeed generally leads to lower market prices. Theimpact on the hourly prices is demonstrated separately by typical profiles forwind and solar energy in Figure 3. Therefore, we also use expected solar andwind infeed as features for the price forecasts.

There is a vast body of literature on electricity price forecasting, over whichAggarwal, Saini, and Kumar (2009) give an early overview. Their survey covers47 papers published between 1997 and 2006 with topics ranging from gametheoretic to time series and machine learning models. A more recent extensiveliterature overview is given by Weron (2014), in which the author distinguishesand describes five model classes for electricity price forecasting, namely game-theoretic, fundamental, reduced-form, statistical and machine learning models.The article closes with a discussion of future challenges in the field, including theissues of feature selection, probabilistic forecasts, combined estimators, modelcomparability and multivariate factor models. Regarding this last aspect, Zieland Weron (2018) conduct an empirical comparison of different univariate andmultivariate model structures for price forecasting. Comparing a total of 58models on several datasets, they find that there is no single modelling frameworkthat consistently achieves the best results.

Statistical methods which have been applied to price forecasting include,for example, dynamic regression and transfer functions (Nogales, Contreras,J. Conejo, and Espinola (2002)), wavelet transformation followed by an ARIMAmodel (Conejo, Plazas, Espinola, and Molina (2005)) and weighted nearestneighbor techniques (Troncoso, Santos, Gomez-Exposito, Martinez-Ramos, andRiquelme (2007)). There are many applications of machine learning methodsin electricity price forecasting. Amjady (2006) compare the performance of afuzzy neural network with one hidden layer to ARIMA, wavelet-ARIMA, mul-tilayer perceptron and radial basis function network models for the Spanishmarket. Chen et al. (2012) also use a neural network with one hidden layerand a special training technique called extreme learning machine on Australian

3

data. On the same market, Mosbah and El-Hawary (2016) train a multilayerneural network on temperature, total demand, gas price and electricity pricedata of the year 2005 to predict hourly electricity prices for January 2006. Inorder to show the superior performance of neural networks compared to timeseries approaches, Keles, Scelle, Paraschiv, and Fichtner (2016) conduct an ex-tensive study focussing on the important topics of variable selection and hyper-parameter optimisation. They select the most predictive features via a k-nearestneighbor backward elimination approach and employ 6-fold cross-validation tooptimise forecasting performance over several hyperparameters of the neuralnetwork. The resulting network is found to outperform the benchmark mod-els substantially. Recently, more sophisticated types of neural networks havebeen used: In a benchmark study, Lago, Ridder, and Schutter (2018) comparefeed-forward neural networks with up to 2 hidden layers, radial basis functionnetworks, deep belief networks, convolutional neural networks, simple recur-rent neural networks, LSTM and GRU networks to several statistical and alsoto other machine learning methods like random forests and gradient boosting.Using the Diebold-Mariano test, they show the deep feed-forward, GRU andLSTM network approaches to perform significantly better than most of theother methods on Belgium market data. Marcjasz, Uniejewski, and Weron(2018) consider a non-linear autoregressive (NARX) neural network-type modelwhich especially accounts for the long-term price seasonality. Also using theDiebold-Mariano test, they show that this approach can improve the accuracyof day-ahead forecasts relative to the corresponding ARX benchmark.

Among the features considered in the aforementioned studies historical elec-tricity prices, total demand series, total demand prognoses, renewable infeedforecasts, weather data and calendar information appear on a regular basis. Onthe other hand, to the best of our knowledge, the first to use supply and demandcurves for price prediction are Ziel and Steinert (2016). Their goal is to fill thegap between time series analysis and structural analysis by setting up a timeseries model for these curves and then forecasting the future market clearingprice as the intersection of the corresponding forecasted curves. They comparemultiple time series prediction methods based on this approach. However, theydo not investigate whether the performance of their model can be enhanced bymachine learning techniques.

3 Methodology

Data preparation and feature extraction from order book Our datasetranges from 1/2/2015 to 18/9/2018 (31823 single auctions) and includes orderbook data from the EPEX German/Austrian electricity spot market, trans-parency data from EEX on expected wind and solar power infeed, and ex-pected total demand data from ENTSO-E. To avoid data dredging, 20% (about9 months) of the available data at the end of the time period are held back foran out-of-sample model evaluation (see Section 4).

For feature extraction, i.e., translating the order book into a vector of num-bers, we use ideas from Coulon, Jacobsson, and Strojby (2014) and Ziel andSteinert (2016). Let P := {−500,−499.9, . . . , 2999.9, 3000} be the set of pos-sible prices and T := {t1, t2, . . . , tT−1, tT } the set of time points for whichthere are data available. Each t = (d, h) ∈ T is a tuple consisting of a date

4

d ∈ {1/2/2015, 2/2/2015, . . . , 18/9/2018} and an hour h ∈ {0, 1, . . . , 23}. Werepresent the supply and demand data at time t ∈ T as vectors

(V St (P )

)P∈P

and(V Dt (P )

)P∈P, where V S

t (P ) and V Dt (P ) denote the supply and demand vol-

ume, respectively, bid at price level P ∈ P. The market clearing price at timet is determined by EPEX via the EUPHEMIA algorithm, which also considerscomplex orders. There is no information about such orders in our dataset, soit would be unreasonable to expect any learning algorithm to incorporate theminto its price prediction. Therefore, we calculate the market clearing price thatwould result from considering only the available supply and demand data anduse this as the target value for price prediction. To this end, we define theso-called supply and demand curves

St(P ) :=∑

p∈P, p≤PV St (p) and (1)

Dt(P ) :=∑

p∈P, p≥PV Dt (p). (2)

The MCP P ∗t lies at the intersection of the supply and demand curves. AsSt and Dt are step functions, explicit formulae for P ∗t are quite technical andtherefore omitted. We refer to Figure 2 for a graphical illustration. To reducethe dimensionality, we partition P into price classes and use the volumes perprice class as features. To determine these classes we use a heuristic which aimsto achieve that they all contain approximatively the same amount of volume onaverage. This ensures that there are more price classes at the interesting partsof the curve, i.e., in the price regions with many bids. We begin by averagingthe supply curves over all time points, obtaining

S(P ) :=1

T

∑t∈T

St(P ) for P ∈ P. (3)

Then, we fix a volume V∗ > 0 that each price class is supposed to contain onaverage and partition P accordingly. We compute the number of classes,

MS := min{i ∈ N : iV∗ > S(3000)

}, (4)

and set

cSi :=

{min

{P ∈ P : S(P ) ≥ iV∗

}for i = 0, . . . , MS − 1

3000 for i = MS. (5)

Now, we define the classes

CSi :=

{[cSi−1, c

Si ] for i = 1

(cSi−1, cSi ] for i = 2, . . . , MS

, (6)

where we take (c, c] or [c, c] to denote the singleton {c}. By removing intervalsthat occur multiple times, we get pairwise different price classes CS

i = (cSi−1, cSi ]

as well as corresponding volume features

V St (CS

i ) :=∑

P∈CSi

V St (P ) for i ∈ {1, . . . ,MS}, (7)

5

Figure 4: Averaged supply curve with V∗ = 1000, figure taken from Ziel andSteinert (2016).

where MS ≤ MS denotes the numbers of unique price classes. While thedetails are rather technical (see also Ziel and Steinert (2016)), the graphicalillustration in Figure 4 should make the idea intuitively clear. Following thesame logic and using analogous notation, we obtain boundaries cD0 , c

D1 , . . . , c

DMD

and corresponding price classes for the demand curves. Similarly as with theoriginal supply and demand curves, one can calculate the price that results fromthe price classes and of course, in general, does not exactly coincide with theactual market clearing price P ∗t .

Finally, in order to simplify both implementation and interpretation withoutlosing any essential information, we transform the supply and demand featuresinto a so-called price curve. For this, let {cX0 , cX1 , . . . , cXMX} be the ascendinglyordered union of the supply and demand price class boundaries. Proceeding fromthis, similarly to (6) and (7), we define new price classes CX

1 ,CX2 , . . . ,CX

MX andvolume features

V Xt

(CX

k

):=

∑P∈CX

k

(V St (P ) + V D

t (P )). (8)

We use these price curve features and additionally the total demand Dt(−500)as inputs for the price prediction. Figure 5 shows an example of such a pricecurve calculated from given supply and demand curves.

There is also an economic interpretation for this transformation: In fact,electricity demand is highly price-inelastic, so the constant inelastic demand isthe expected total demand for electricity at that hour. The price curve is theso-called merit order, which represents the electricity production units sorted bytheir variable production costs. For more details, we refer to standard literatureon electricity markets like Burger, Graeber, and Schindlmayr (2014). Note thatthe price curve still contains the information that is necessary to calculate theresulting price: The MCP lies at the intersection of the cumulative price curve

6

Figure 5: Transformed supply and demand curves to price curve (dashed black)and inelastic demand (dashed green vertical line).

and the constant inelastic demand. In addition to the price curve and inelasticdemand, we use as features the renewable infeed and total demand forecasts aswell as some calendar information, namely

• year as a numerical variable,

• a binary variable on daylight saving time,

• type of day as a one-hot-encoded categorical variable with three differentvalues (workday, Saturday/bridge day, Sunday/holiday),

• month, and

• hour.

To account for the periodicity of months and hours, we project these values ona circle and use the two-dimensional projections as features. More precisely, ifdate t ∈ T lies in month m ∈ {1, . . . , 12} and refers to hour h ∈ {0, . . . , 23}, thisis encoded as

monthx(t) := sin

(2πm

12

), monthy(t) := cos

(2πm

12

)and (9)

hourx(t) := sin

(2πh

24

), houry(t) := cos

(2πh

24

). (10)

For prediction, we use the price curve features of a preceding day, the so-called reference date r(d). For notational convenience, we write r(t) := (r(d), h).As a reference date for d we use the nearest day before d which is of the same typeof day as d. This is a simple but efficient technique in energy economics. Moresophisticated methods to define a reference date may incorporate similarities inrenewable infeed and demand profile.

7

Training of learning algorithms We employ ordinary linear regression, ran-dom forests and feed-forward neural networks to predict hourly electricity prices.Note that we use the prices which are implied by the volume features V X

t (CXk )

as target values, which means that the prices we aim to forecast attain thevalues cX0 , c

X1 , . . . , c

XMX . On the whole dataset, the absolute difference between

these price approximations and the real prices is 2.07 EUR/MWh on average(corresponding to a median absolute percentage deviation of 4.28%). While weassume ordinary linear regression to be well-known, we give a brief descriptionof the machine learning algorithms we consider. In each case, our goal is toapproximate the function f : RN → R which maps the features described aboveto the corresponding electricity price. To this end, we assume to be given a setof training data {(x1, y1), . . . , (xn, yn)} where

yi = f(xi) + εi, xi ∈ RN , i = 1, . . . , n, (11)

and (ε1, . . . , εn) is a vector of realizations of independent, homoskedastic randomvariables with zero expectation.

Random forests Random forests are based on a simpler machine learningmethod called decision trees ((Hastie, Tibshirani, & Friedman, 2001, chapter9.2)). While decision trees are easy to understand, they often perform ratherpoorly because of their high dependence on the training data. Random forestsaim to overcome this drawback by averaging the predictions of several deci-sion trees that are trained in a randomized way proceeding from the same data(Breiman (2001)). As part of their training process, random forests offer a con-venient way to assess the influence of each feature on the output. Therefore,they can deliver a ranking of the features according to their relevance for elec-tricity price prediction. While it is quite interesting in its own right, we also usethis ranking for feature selection, i.e., for training a feed-forward neural networkonly on the NF ∈ N most important features (e.g. NF = 10).

Feed-forward neural networks Feed-forward neural networks can be viewedas a far-reaching non-linear extension to ordinary linear regression. They consistof several layers, through which the input is fed via the composition of non-linearactivation functions and weighted sums in order to generate the output. Thesmallest unit (one vector component) of such a layer is called a neuron. A cen-tral result in the theory of neural networks states that, using a non-constant,bounded and continuous activation function, a neural network with just one hid-den layer can in principle approximate any continuous function arbitrarily wellwhen there are sufficiently many neurons and appropriate weights are chosen(Hornik (1991)). In practice, a higher number of layers has been found to im-prove performance for many applications (deep learning). Besides the numberof hidden layers and the number of neurons per layer, there are other so-calledhyperparameters on which forecasting performance can critically depend. Forinstance, the optimisation algorithm that is used to train the network has tobe chosen. Typically, some variant of stochastic gradient descent (SGD) likermsprop (Thieleman and Hinton (2012)) or Adam (Kingma and Ba (2015)) isused. Furthermore, SGD-type algorithms work with batches of training data.The batch size can be varied in order to improve performance. Other hyperpa-rameters which we consider include the number of epochs, i.e., the number of

8

times the training data are fed into the optimisation algorithm, the activationfunction (tangens hyperbolicus, rectified linear unit, identity) and whether toemploy dropout to avoid overfitting (Srivastava, Hinton, Krizhevsky, Sutskever,and Salakhutdinov (2014)) and batch normalization to avoid internal covariateshift (Ioffe and Szegedy (2015)).

Hyperparameter optimisation via cross-validation We choose the hy-perparameter values for the neural networks and random forests using five-foldcross-validation. First, we define a grid of hyperparameter combinations to beevaluated. Then, for every combination of hyperparameters in the grid, we splitour training dataset into five parts or folds of equal size, train a model withthese values on four of the folds and evaluate its performance on the remainingone. After repeating this five times, each time with a different validation fold,we average performances. Finally, once the whole grid has been evaluated, wechoose the hyperparameter combination that performs best on average.

Summarising, the features we use to forecast the spot price of a time pointt with reference time point r(t) are

• the total demand Dr(t)(−500) and the price curve features of the samehour on the reference day, i.e., V X

r(t)(CXk ), k = 1, . . . ,MX ,

• the solar and wind infeed forecasts as well as the total demand forecastfor the time points t and r(t),

• the calendar features year, daylight saving time, type of day, month andhour for the time points t and r(t).

We considered about 100 different parameter combinations for the randomforests with the number of trees equal to 10, 100, 1000, 5000, 10000 or 50000. Forthe neural networks, we tested over 1000 parameter combinations with about20 different network sizes ranging from one hidden layer with 5 neurons to 100hidden layers with 25 neurons each (see Table 1).

4 Results

To evaluate model performance, we primarily use the root-mean-square error

RMSE :=

(1

n

n∑i=1

(yi − yi)2)1/2

,

where yi are the predictions, yi are the true target values and n is the numberof observations for which a prediction is made. Furthermore, we consider themean absolute error

MAE :=1

n

n∑i=1

|yi − yi|

as a more interpretable measure of how far off the prediction is on average.The RMSE is the error measure which the machine learning algorithms aim tominimize during training. Accordingly, we select the model architecture thatperforms best in the 5-fold cross-validation with respect to the RMSE. In theelectricity forecasting literature, sometimes the mean absolute percentage error

9

(MAPE) is used. This is unsuitable for the German market, as often the MCPis at or close to zero. Therefore, we report the median absolute percentage error

MdAPE := med

{|yi − yi||yi|

, i = 1, . . . , n

}for comparison.

Aside from the methods which were described in Section 3, we consider twobenchmarks. The first one is called the naive benchmark (Nogales et al. (2002)).Its forecast for hour h of date d is the price at hour h of the previous day if d isa workday other than Monday and the price at hour h of the same type of dayin the previous week otherwise.

The second benchmark is based on a different market, the Energy ExchangeAustria (EXAA), where the electricity price is fixed two hours before the EPEXauction takes place. Therefore, the EXAA price at a time point t can directlybe used as a predictor for the EPEX price at the same time point. In fact, Ziel,Steinert, and Husmann (2015) show this benchmark to be highly competitive.However, note that it is not really appropriate to compare the remaining fore-casting methods to the EXAA benchmark because they are based on differentinformation (see also Ziel and Steinert (2016)). Nonetheless, the EXAA bench-mark can provide some orientation on how well other models perform and howmuch improvement could be expected.

The best-performing random forest consists of 1000 decision trees where ateach step in the training of the underlying decision trees a randomly chosensubset of size 23 (corresponding to 25%) of all available features is used andwhere a tree node is only split further if it contains at least 1% of all trainingdata. We also use the random forest to support feature selection for the followingneural network approach.

For the neural networks under consideration we use different feature vectorrealizations:

• all available features,

• all but the price curve features of the reference date,

• the 10 most influential features according to the best-performing randomforest,

• the 20 most influential features according to the best-performing randomforest.

For each case we use different network architectures determined by hyperpa-rameter optimisation as described above. These are reported in Table 1 whereeach column corresponds to a choice of features and each row corresponds to ahyperparameter. The notation [5, 5, 5] for the network architecture means thata 3-layer network consisting of 5 nodes per layer is used. For the networks thatare trained on the selected features we find a deeper architecture to performbest: [25] * 25 denotes a 25-layer network with 25 nodes per layer. Analogously,in the dropout row, [0, 0.25, 0] means that dropout is employed with a probabil-ity of 25% after the second layer and [0.1] * 25 means that dropout is employedafter each of the 25 layers with a probability of 10%. It is noteworthy that thebest-performing network when using all features is rather small. Thus, as an ad-ditional plausibility check, we also consider the network architectures proposed

10

by Keles et al. (2016) (network size [48, 48], sigmoid activation, no dropout) andLago et al. (2018) (network size [239, 162], relu activation, no dropout). Notethat their models do not consider price curve features, i.e., order book data.

Table 1: The hyperparameters which were used when training feed-forwardneural networks with different features (all, without curve features, only withthe 10 or 20 most influential features as chosen by the best-performing randomforest).

Hyperparameter All featuresWithout curvefeatures

Selected features(NF = 10)

Selected features(NF = 20)

Networkarchitecture

[5, 5, 5] [5, 5] [25] * 25 [25] * 25

Optimiser rmsprop Adam Adam AdamNumber ofepochs

100 100 100 100

Batch size 128 64 128 128Activationfunction

tanh relu relu relu

Dropout [0, 0.25, 0] [0, 0.25, 0] [0.1] * 25 [0.1] * 25Batchnormalization

no yes yes yes

The results of the chosen model configurations are shown in Table 2. Theerrors we report are measured both on the training set (in-sample error) toevaluate how well the model describes the given data and on the test set (out-of-sample error) to assess model performance on previously unseen data (20%of our whole dataset).

Alternative: More sophisticated network architectures Apart fromfeed-forward neural networks we also analysed recurrent neural networks. Aselectricity spot prices can be expected to depend on previous days’ features andprices, it seems reasonable to model them as a multivariate time series. Whileclassical approaches like ARIMA or GARCH models are possible, this also isa typical application for recurrent neural networks because they explicitly in-corporate the sequential structure of the inputs. In this case, the goal was topredict the 24-dimensional vector of spot prices at some date d based on infor-mation available up to date d − 1. For each date e ≤ d − 1 this informationconsists of the curve features for date e as well as the calendar features, ex-pected renewable infeed and total demand for date e+ 1. We implemented thisapproach using the long short-term memory (LSTM) architecture that allowsfor efficient training of recurrent neural networks (Hochreiter and Schmidhuber(1997)), but the results were not as convincing as with the other methods. Thismight be due to the high dimensionality of the multivariate time series underconsideration. Therefore, we focused on the random forest and feed-forwardneural network approaches where the temporal dependence structure is moreexplicitly incorporated as a feature by means of the reference day.

11

Table 2: Comparison of in-sample and out-of-sample errors in EUR per MWhor % for various price forecasting techniques.

Forecasting techniquein-sample error out-of-sample errorRMSE MAE MdAPE RMSE MAE MdAPE

Naive model 13.55 7.87 15.31% 12.68 7.71 11.61%Ordinary linear regression 6.85 4.25 10.93% 9.60 7.52 16.95%Random forest 6.77 4.17 9.73% 11.92 9.32 19.9%Feed-forward neural networkwith architecture from Keles et al. (2016)

6.72 4.51 11.49% 14.87 12.81 30.63%

Feed-forward neural networkwith architecture from Lago et al. (2018)

2.27 1.65 4.45% 21.05 8.94 15.22%

Feed-forward neural network 5.45 3.57 8.89% 9.59 7.08 14.18%Feed-forward neural networkwithout curve features

6.63 4.41 11.22% 10.11 7.85 16.12%

Feed-forward neural networkwith feature selection (NF = 10)

7.69 5.06 11.68% 9.41 7.34 15.57%

Feed-forward neural networkwith feature selection (NF = 20)

7.71 4.95 11.27% 13.65 10.18 21.48%

EXAA 6.47 3.53 7.56% 5.58 3.92 7.22%

5 Conclusion

Our results show that neural networks can indeed provide order-book-basedprice forecasts with competitive results. However, they do not perform signifi-cantly better than simpler methods like ordinary linear regression. Whereas theclassical order-book-based forecasting technique requires a lot of statistical anal-ysis, the network architecture optimisation also demands significant resources.We also found that reducing the number of features generally improves results.In regard to the RMSE, we find that the feed-forward neural network with only10 features as selected by the random forest performs best. Considering theMAE (a measure directly linked to revenues from financial trading), the feed-forward neural network without feature selection is in the lead. However, thenaive model shows good results as well, supporting this traditional and oftenapplied heuristic in energy economics. The neural network architectures fromliterature show competitive in-sample results, but their performance drops sig-nificantly in an out-of-sample analysis. This indicates overfitting.

The posed research questions have been answered. We have shown how toincorporate order book features using volume-based partitioning, a transforma-tion to price curves and feature selection based on random forests. We have alsoshown that machine learning cannot significantly reduce the work effort neededin the model set-up, but gives competitive results.

The models do have a lot of potential for improvement. For instance, thereare much more accurate wind and solar infeed forecasts available in the marketcompared to the data from EEX transparency (unfortunately they are not freeof charge). We see the largest potential in a daily recalibration of the mod-els including an updated feature selection which allows the model to react tofundamental changes in the market (coal and gas prices, power plant outages,...).

12

In addition, we also analysed different applications of machine learning onEPEX order books, which are not outlined in detail: We employed neural net-works to reconstruct renewable infeed from the order book and used the net-works to generate price forward curves.

References

Aggarwal, S. K., Saini, L. M., & Kumar, A. (2009). Electricity price forecastingin deregulated markets: A review and evaluation. International Journalof Electrical Power & Energy Systems, 31 (1), 13–22. doi: 10.1016/j.ijepes.2008.09.003

Amjady, N. (2006, 06). Day-Ahead Price Forecasting of Electricity Markets bya New Fuzzy Neural Network. IEEE Transactions on Power Systems, 21 ,887–896. doi: 10.1109/TPWRS.2006.873409

Breiman, L. (2001). Random Forests. Machine Learning , 45 , 5–32. doi:10.1023/A:1010933404324

Burger, M., Graeber, B., & Schindlmayr, G. (2014). Managing Energy Risk:An Integrated View on Power and Other Energy Markets. Wiley FinanceSeries. doi: 10.1002/9781118618509

Chen, X., Dong, Z., Meng, K., Xu, Y., Wong, K., & Ngan, H. (2012, 11).Electricity Price Forecasting With Extreme Learning Machine and Boot-strapping. IEEE Transactions on Power Systems, 27 , 2055–2062. doi:10.1109/TPWRS.2012.2190627

Cludius, J., Hermann, H., Matthes, F. C., & Graichen, V. (2014). Themerit order effect of wind and photovoltaic electricity generation in Ger-many 2008–2016: Estimation and distributional implications. Energy Eco-nomics, 44 , 302–313. doi: 10.1016/j.eneco.2014.04.020

Conejo, A. J., Plazas, M. A., Espinola, R., & Molina, A. B. (2005, May). Day-ahead electricity price forecasting using the wavelet transform and ARIMAmodels. IEEE Transactions on Power Systems, 20 (2), 1035–1042. doi:10.1109/TPWRS.2005.846054

Coulon, M., Jacobsson, C., & Strojby, J. (2014, 01). Hourly Resolution ForwardCurves for Power: Statistical Modeling Meets Market Fundamentals. InM. Prokopczuk (Ed.), Energy Pricing Models. Recent Advances, Methods,and Tools (pp. 147–193). doi: 10.1007/978-1-137-37027-3\ 6

Hastie, T., Tibshirani, R., & Friedman, J. (2001). The Elements of Statisticallearning. Springer. doi: 10.1007/978-0-387-84858-7

Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term Memory. NeuralComputation, 9 (8), 1735–1780. doi: 10.1162\%2Fneco.1997.9.8.1735

Hornik, K. (1991). Approximation capabilities of multilayer feedforward net-works. Neural Networks, 4 (2), 251–257. doi: 10.1016/0893-6080(91)90009-T

Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep networktraining by reducing internal covariate shift. CoRR, abs/1502.03167 .

Keles, D., Scelle, J., Paraschiv, F., & Fichtner, W. (2016, 01). Extendedforecast methods for day-ahead electricity prices applying artificial neuralnetworks. Applied Energy , 162 , 218–230. doi: 10.1016/j.apenergy.2015.09.087

13

Kingma, D. P., & Ba, J. (2015). Adam: A method for stochastic optimization.In Y. Bengio & Y. LeCun (Eds.), 3rd international conference on learningrepresentations, ICLR 2015, san diego, ca, usa, may 7-9, 2015, conferencetrack proceedings. Retrieved from http://arxiv.org/abs/1412.6980

Lago, J., Ridder, F. D., & Schutter, B. D. (2018). Forecasting spot electricityprices: Deep learning approaches and empirical comparison of traditionalalgorithms. Applied Energy , 221 , 386–405. doi: 10.1016/j.apenergy.2018.02.069

Marcjasz, G., Uniejewski, B., & Weron, R. (2018). On the importance of thelong-term seasonal component in day-ahead electricity price forecastingwith NARX neural networks. International Journal of Forecasting . doi:10.1016/j.ijforecast.2017.11.009

Mosbah, H., & El-Hawary, M. E. (2016). Hourly Electricity Price Forecastingfor the Next Month Using Multilayer Neural Network. Canadian Journalof Electrical and Computer Engineering , 39 , 283-291.

Nogales, F., Contreras, J., J. Conejo, A., & Espinola, R. (2002, 04). ForecastingNext-Day Electricity Prices by Time Series Models. Power EngineeringReview, IEEE , 22 , 58–58. doi: 10.1109/MPER.2002.4312063

Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R.(2014, January). Dropout: A simple way to prevent neural networksfrom overfitting. J. Mach. Learn. Res., 15 (1), 1929–1958. Retrieved fromhttp://dl.acm.org/citation.cfm?id=2627435.2670313

Thieleman, T., & Hinton, G. (2012). Lecture 6.5 – rmsprop: Divide the gradientby a running average of its recent magnitude.

Troncoso, A., Santos, J., Gomez-Exposito, A., Martinez-Ramos, J., & Riquelme,J. (2007, 09). Electricity Market Price Forecasting Based on WeightedNearest Neighbors Techniques. IEEE Transactions on Power Systems,22 , 1294–1301. doi: 10.1109/TPWRS.2007.901670

Wagner, A. (2014). Residual demand modeling and application to electricitypricing. The Energy Journal , 35 (2), 45–73. doi: 10.2139/ssrn.2018908

Weron, R. (2014). Electricity price forecasting: A review of the state-of-the-artwith a look into the future. International Journal of Forecasting , 30 (4),1030–1081. doi: 10.1016/j.ijforecast.2014.08.008

Wurzburg, K., Labandeira, X., & Linares, P. (2013). Renewable generationand electricity prices: Taking stock and new evidence for Germany andAustria. Energy Economics, 40 (S1), 159–171. doi: 10.1016/j.eneco.2013.09.011

Ziel, F., & Steinert, R. (2016). Electricity Price Forecasting using Sale andPurchase Curves: The X-Model. Energy Economics, 59 , 435–454. doi:10.1016/j.eneco.2016.08.008

Ziel, F., Steinert, R., & Husmann, S. (2015). Forecasting day ahead electricityspot prices: The impact of the EXAA to other European electricity mar-kets. Energy Economics, 51 , 430–444. doi: 10.1016/j.eneco.2015.08.005

Ziel, F., & Weron, R. (2018). Day-ahead electricity price forecasting with high-dimensional structures: Univariate vs. multivariate modeling frameworks.Energy Economics, 70 , 396–420. doi: 10.1016/j.eneco.2017.12.016

14

http://arxiv.org/abs/1412.6980

http://dl.acm.org/citation.cfm?id=2627435.2670313

Machine Learning on EPEX Order Books: Insights and Forecasts · Fraunhofer-Platz 1, 67663...

Documents

Transcript of Machine Learning on EPEX Order Books: Insights and Forecasts · Fraunhofer-Platz 1, 67663...