
DEGREE PROJECT IN COMPUTER SCIENCE, SECOND LEVEL

STOCKHOLM, SWEDEN 2015

Short-term wind power forecasting using artificial neural networks

MORGAN SVENSSON

KTH ROYAL INSTITUTE OF TECHNOLOGY

SCHOOL OF COMPUTER SCIENCE AND COMMUNICATION

Short-term wind power forecasting using artificial neural networks

Närtidsprognos av vindkraftsproduktion genom användandet av artificiella neurala nätverk

MORGAN SVENSSON
Master's Thesis

at the School of Computer Science and Communication

Royal Institute of Technology KTH
Machine Learning Programme

KTH Supervisor: Pawel Herman
Examiner: Anders Lansner

Expektra Supervisor: Niclas Ehn

June 2015

Abstract

Wind power has seen a tremendous growth in recent years and is expected to grow even more in years to come. In order to better schedule and utilize this energy source, good forecasting techniques are necessary. This thesis investigates the use of artificial neural networks for short-term wind power prediction. It compares two different networks: the so-called Multilayer Perceptron and the Hierarchical Temporal Memory / Cortical Learning Algorithm. These two networks are validated and compared on a benchmark dataset published in the Global Energy Forecasting Competition, a competition used for short-term wind power prediction. The results of this study show that the Multilayer Perceptron is able to compete with previously published models and that the Hierarchical Temporal Memory / Cortical Learning Algorithm is able to beat the reference model.

Keywords: neural networks, wind power generation, forecasting, machine learning

Referat

Närtidsprognos av vindkraftsproduktion genom användandet av artificiella neurala nätverk

Wind power is currently the fastest growing renewable energy source in the world, and with this growth it is important that we develop good forecasting tools. This degree project investigates the use of artificial neural networks applied to short-term forecasts of wind power production. The algorithms investigated are built on the so-called multilayer perceptron (MLP) and the hierarchical temporal memory (HTM/CLA). These methods are validated and compared using data published in GEFCom, a competition for energy forecasting. The results of the study show that the MLP method can compete with other published methods, and that HTM/CLA can beat the reference model.

Keywords: multilayer perceptron, machine learning, hierarchical temporal memory

Acknowledgements

I wish to thank first and foremost my great parents, Yvonne Svensson and Benkt Svensson, for always being there to support my interests.

Secondly, I want to thank my supervisor Pawel Herman, not just for the help I have received during this thesis, but for the things I have learned from him studying at KTH.

I also want to thank the people at Expektra: Niclas Ehn, Gustav Bergman, Mattias Jonsson, Andreas Johansson, Per Åslund and Joel Ekelöf, for introducing me to their area of expertise and energy forecasting in general.

A special thanks to my good friends and classmates Andrea de Giorgio and Vanya Avramova for all the ideas and discussions we have shared.

MORGAN SVENSSON - SUMMER 2015

Nomenclature

Indices, Constants and Variables

X_{1:n}            A sequence of values X = [x_1, x_2, ..., x_n]
k = 1, ..., k_max  Lead time or look-ahead time
k_max              Maximum prediction horizon
N                  Total number of data points
e                  Prediction error
ε                  Normalized prediction error
p_t                Measured power generation at time t
\hat{p}_{t+k|t}    Forecast power generation made at time t for look-ahead time t + k
w_ij               Weight of a synapse in the neural network, row i, layer j
b                  Binary value
s                  Weighted sum, including bias, of a perceptron
x · y              Dot product between x and y
x ∘ y              Row-by-row element-wise multiplication

Units of measurement

MW   Megawatts
GW   Gigawatts

Notes

Keys and dates are given in the form: Year, Month, Day, Hour.

Contents


1 Introduction
  1.1 Problem Formulation
  1.2 The scope of the problem

2 Background
  2.1 Neural Networks and Time Series Prediction

3 Method and Materials
  3.1 Preliminaries
    3.1.1 Remarks
    3.1.2 Definitions
    3.1.3 Reference models
    3.1.4 Error metrics
    3.1.5 Model selection
    3.1.6 Evaluation
  3.2 Experiments
  3.3 Holdback Input Randomization
  3.4 Optimization methods
  3.5 Neural Networks
    3.5.1 Multilayer Perceptron
    3.5.2 Numenta Platform for Intelligent Computing

4 Result
  4.1 Experimental results
  4.2 Input Importance
    4.2.1 Adaptation and Optimization
  4.3 Summary

5 Discussion
  5.1 Method development issues
  5.2 Future improvements and directions
  5.3 Conclusions

Bibliography

Appendices
A Hyper-parameters
B Wind characteristics
C Error Distribution

List of Figures
List of Tables

Chapter 1

Introduction

It has been estimated by the World Wind Energy Association (WWEA) that by the year 2020 around 12% of the world's electricity will be available through wind power, making wind energy one of the fastest growing energy resources [WWEA 2014; Fan et al. 2009]. Integrating wind energy into existing electricity supply systems has, however, been a challenge, and numerous objections have been put forward by traditional energy suppliers and grid operators, especially against large-scale use of this energy source. The biggest concern is that availability mainly depends on meteorological conditions, and production cannot be adjusted as conveniently as with other, more conventional energy sources; this is because of our inability to control the wind. A single Wind Turbine (WT) is highly variable, and its dependency on wind conditions can result in zero output for thousands of hours during the course of a year; aggregating wind power generation over bigger areas, however, decreases this chance.

This is where wind power forecasting systems come into play: a technology that can greatly improve the integration of wind energy into electricity supply systems, as forecasting systems provide information on how much wind power can be expected at any given point within the next few days. This removes some of the randomness attributed to wind energy and allows a more accurate way to utilize this clean energy source, while offsetting some of our dependency on other, less environmentally friendly sources, which in the long run will cause a smaller degenerative impact on the environment.

There are many commercial forecasting models available. Prediktor¹ [Landberg and Watson 1994] is a physical model developed by the Risø National Laboratory, Denmark. It is constructed to refine Numerical Weather Prediction (NWP) in order to transform data through a power curve to produce the forecast, while improving the error rate with Model Output Statistics (MOS). Previento [Focken et al. 2001], developed at the University of Oldenburg, Germany, uses a similar approach to Prediktor, but with regional forecasting and uncertainty estimation. The Wind Power Prediction Tool (WPPT)² [Nielsen et al. 2002] is a statistical model developed by the Technical University of Denmark; it consists of a semi-parametric power curve model for wind farms, taking into account both wind speed and direction, and it uses dynamical prediction models describing the dynamics of the wind power and any diurnal variations. Zephyr [Giebel et al. 2000] is a hybrid model that combines WPPT and Prediktor; in this model each wind farm is assigned a forecast model according to the available data. Sipreólico [Gonzalez et al. 2004], developed by Red Eléctrica de España, is a statistical model designed to be highly flexible depending on available data; it achieves this by switching between 9 different models. Aiolos Wind³ is a hybrid model developed by Vitec that creates forecasts by combining a statistical model with physical factors, such as wind speed at different altitudes, wind direction and air density.

1. http://www.prediktor.no

Expektra⁴ is a service-based company, founded in 2010, that provides and develops a new method for short-term demand and supply forecasting based on an Artificial Neural Network (ANN) and traditional time series analysis. They are currently expanding into the area of wind power forecasting and looking for suitable methods. ANN have been used successfully for both wind speed forecasting [Lawan et al. 2014] and wind power forecasting [Kariniotakis et al. 1996]. It was demonstrated in Liu et al. [2012] that a complex-valued recurrent neural network was able to predict output with high accuracy, and Huang et al. [2015] showed that optimizing the initial weights of a back-propagated neural network with a genetic algorithm gave impressive results, so there are good reasons to think that part of the method Expektra is developing can be used for wind power forecasting.

Neural networks in general are highly flexible, and recent advancements in deep learning have been shown to outperform previous models in different domains [Schmidhuber 2015]. These networks are very good at automatically finding features that would take a lot of time and effort to craft by hand. One network that shares similarities with deep learning but has received less attention is the Hierarchical Temporal Memory (HTM) with its Cortical Learning Algorithm (CLA), developed by Numenta [Numenta 2011]. This network is also built around the idea of hierarchical structures creating a deep neural network. HTM/CLA is currently tailored very specifically to time-series problems and has, at the moment, little research published around it, making it an ideal candidate for time series prediction, with unknown potential on wind power forecasting problems.

2. http://www.enfor.eu
3. http://www.vitecsoftware.com
4. http://www.expektra.se

Even though there are a lot of prominent methods already developed for wind power forecasting, there is still room for improvement. Energy forecasting is such an important topic that competitions have been developed around it. The Global Energy Forecasting Competition (GEFCom) [Hong et al. 2014] is a competition that can be used to help evaluate the performance of new forecasting models. It has attracted hundreds of contestants from all over the world, which has resulted in the contribution of many new and novel ideas. The GEFCom was created in order to:

1. Bring together state-of-the-art techniques for energy forecasting.

2. Bridge the gap between academic research and industry practice.

3. Promote analytical approaches in power energy education.

4. Prepare the industry to overcome forecasting challenges posed by the smart grid world.

5. Improve energy forecasting practices.

Benchmark datasets and competitions are a valuable resource when evaluating new models, as they allow for a common ground to stand on. With the publication of Hong et al. [2014], the dataset of GEFCom2012 was also published. This dataset consists of data from 7 different wind farms spanning a time period of three years: observational data of the energy production together with weather forecasts.

The GEFCom is divided into four tracks: load forecasting, price forecasting, wind power forecasting and solar power forecasting. The specified problem in the wind power forecasting track is built around the real-time operation of wind farms, and the dataset is structured accordingly. Participants try to predict hourly power generation up to 1–48 hours ahead, given meteorological forecasts and historically produced power.

1.1 Problem Formulation

The objective of this thesis is to adapt and analyse the robustness and suitability of Expektra's method when it is applied to the domain of forecasting short-term wind power generation, i.e. what is needed to adapt this model to wind power forecasting problems?

The second objective of this thesis is to investigate HTM/CLA [Numenta 2011], a modern state-of-the-art computational theory of the neocortex, i.e. is it possible to use the Numenta Platform for Intelligent Computing (NuPIC) for wind power forecasting problems?

These questions will be addressed by evaluating the models against other models published in the area of short-term wind power forecasting, more specifically those that have been published in GEFCom.

1.2 The scope of the problem

1. This study will focus on short-term forecasting, i.e. forecasts done for 1–48 hours ahead. How to perform well on longer forecasts is left to further investigation.

2. Wind power forecasting is closely related to wind speed forecasting, i.e. trying to use local information at the wind turbine instead of NWP data to forecast wind power. This thesis will not go into any specific details on how to forecast wind speeds; readers can take a look at Li and Shi [2010], Cadenas and Rivera [2009], and Akinci [2015].

3. There are a lot of neural networks that are worth studying, but this thesis will focus on Expektra's method and HTM/CLA inside NuPIC. NuPIC was picked over other deep learning methods because of its direct relation to time series prediction.


Chapter 2

Background

Wind Power Forecasting (WPF) has applications in generation and transmission maintenance planning and energy optimization, as well as in energy trading. WPF models exist at different scales, and they can be used to predict the production of anything from a single WT to a whole Wind Farm (WF). WPF models are generally divided into two main groups, physical and statistical, but hybrid state-of-the-art methods are also common [Giebel et al. 2001; Lang et al. 2006]. The physical approach [Landberg and Watson 1994; Gaertner et al. 2003] focuses on integrating well-known physical aspects into the model, such as information about the surrounding terrain and properties of the WT. These models try to get as good an estimate of local wind speed as possible before finally reducing the remaining error with some form of MOS. Statistical approaches [Rodrigues et al. 2007; Gonzalez et al. 2004] rely more on historical observations and their statistical relation to meteorological predictions, as well as on measurements from Supervisory Control And Data Acquisition (SCADA) systems.

SCADA is a control system that is used in most industrial processes and is the main tool to evaluate the state of individual power plants. As the name indicates, it is not a full control system but rather a system for supervision. In the case of wind turbines, it allows remote access to online data that has been gathered by sensors inside and around the plant. Variables accessible through these systems are those directly related to the wind flow as well as the produced power generated by the plant, i.e. things like rotor wind speed, nacelle position, pitch angle, active power, etc.


Figure 2.1: The general outline when forecasting using the statistical approach.

Forecast models that use SCADA data as their primary input source usually have good forecast accuracy, at least for the first few hours, but they tend to be less useful for longer prediction horizons [Giebel et al. 2011]. SCADA data can also be used to detect problems in WTs, something that can be helpful for improving the reliability of WTs [Yang et al. 2013; Wang et al. 2014].

Statistical models, seen in figure 2.1, are usually built around Numerical Weather Prediction (NWP) and SCADA data. NWP models are often used to build forecasting models, as they introduce weather forecasts for the region where the wind turbines are located. NWP data usually contains information about things like wind speed, wind direction, temperature and humidity. These models are operated twice or four times a day by a number of large weather services. The main forecasts usually start at 00 and 12 UTC, corresponding to the world radiosonde launching, which is the only direct observation of the atmospheric state and has been the backbone of atmospheric monitoring for many decades;¹ extra forecasts usually start at 06 and 18 UTC. Physical models, such as the one seen in figure 2.2, include additional information about the physical characteristics of the wind turbine and its surroundings, i.e. terrain data, information about obstacles, and the capacity and layout of the site where the turbine is located, and so on. Other useful information for physical models includes the theoretical power curve: how much power is expected to be produced given a specific wind speed.

1. One problem with this approach is that it results in less information over large oceans and poorer countries.

The time scale of WPF methods is generally divided into 3 main groups: very-short-term (up to 9 hours), short-term (up to 72 hours) and medium-term (up to 7 days) [Costa et al. 2008], while the time step for these models is in the range of seconds to days depending on the application. Very-short-term models used for wind power forecasting consist of statistical methods like Kalman filters, Auto-Regressive Moving Average (ARMA), Auto-Regressive with Exogenous Input (ARX), Box-Jenkins, etc. Input to these models are historical observations of wind speed, wind direction, temperature, etc., and common applications for this forecast horizon include things like intraday market trading. Since these methods are merely based on past production, they are generally not useful for longer horizons.

Figure 2.2: The general steps when forecasting using a physical model. SCADA data, NWP data and WFC data feed the physical model, which applies downscaling, transformation to hub height and spatial refinements, converts wind to power via the WT power curve, and refines the forecast wind power generation with Model Output Statistics (MOS).

Short-term forecasting models include additional data about historical observations, usually obtained from SCADA systems, but more importantly weather forecasts from NWP models. Machine learning methods used in this area include Neural Networks [Kusiak et al. 2009], Support Vector Machines [Fugon et al. 2008], Nearest Neighbour Search [Jursa et al. 2007], Random Forests, etc. Medium-term forecasting models usually incorporate all the methods used for shorter forecasts, as well as physical characteristics.


2.1 Neural Networks and Time Series Prediction

One of the most common ANNs is the so-called Multilayer Perceptron (MLP), a network built around some very simple properties of a cortical neuron. This network has been around since the '50s and has matured with a solid mathematical foundation, and it has been applied successfully in many different applications. It is built around the idea of having multiple layers of neurons, each layer feeding information forward to the next layer, i.e. a feed-forward neural network.

The main advantage of using a multilayer network instead of a single-layer one is to overcome the limitations pointed out by Minsky and Selfridge [1960] and Minsky and Seymour [1969], where a single-layer perceptron is only capable of learning linearly separable patterns and is thus unable to learn all functions. The MLP is not limited in this way and has been shown to be able to represent a wide variety of functions given appropriate parameters [Hornik et al. 1989].
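The XOR function is the classic example of a pattern that is not linearly separable, so a single-layer perceptron cannot learn it, while a two-layer network can represent it. As an illustration (not one of the thesis's models), a minimal MLP with hand-picked step-unit weights computes XOR:

```python
import numpy as np

def step(x):
    """Heaviside step activation used by the classic perceptron."""
    return (x >= 0).astype(int)

def mlp_xor(x):
    """Two-layer perceptron computing XOR with hand-picked weights.
    Hidden unit 1 fires for OR(x1, x2), hidden unit 2 for AND(x1, x2);
    the output fires for OR AND NOT AND, i.e. XOR."""
    W1 = np.array([[1.0, 1.0],    # OR unit
                   [1.0, 1.0]])   # AND unit
    b1 = np.array([-0.5, -1.5])
    h = step(W1 @ x + b1)
    w2 = np.array([1.0, -2.0])    # OR minus a strong veto from AND
    b2 = -0.5
    return step(np.array([w2 @ h + b2]))[0]

for x1 in (0, 1):
    for x2 in (0, 1):
        print((x1, x2), mlp_xor(np.array([x1, x2])))
```

No single-layer step unit can reproduce this truth table, since no line separates {(0,1), (1,0)} from {(0,0), (1,1)}.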

NuPIC is a platform actively developed and maintained by Numenta. It introduces a collection of ideas and algorithms that are inspired by the structural organization of the neocortex. Two of the core concepts found inside NuPIC are the so-called Hierarchical Temporal Memory and the Cortical Learning Algorithm. HTM was introduced in Hawkins and Blakeslee [2007] and refers to a hierarchy of cortical regions in the brain.

The neocortex is classically divided into 6 different layers;² the current implementation in NuPIC is focused on emulating layers 3-4 (specifically layer 3), with extensions and research code being done to include more layers.

A CLA region consists of a collection of columns, and each column consists of a handful of cells. This structure is based on the minicolumn hypothesis [Buxhoeveden and Casanova 2002]. Each column in the CLA region has its own semantic meaning, and the sparse activity of a handful of active columns will tell us something about the input. The CLA is modelled so that specific cells within each column reflect the temporal context of a pattern, and a single cortical region (a CLA region) is trained with the CLA algorithm.

A typical CLA region consists of around 2K columns containing around 60K neurons in total, while a typical MLP may have fewer than 100 neurons. Each neuron in an HTM network grows new synapses over time, and it is not uncommon to have around 5K synapses per neuron, meaning we would have around 300M synapses in total. A single region of this size uses around 100 MB of memory, and it takes around 10 ms to do one inference and learning step.³

2. These layers should not be confused with the hierarchy of regions.
3. These values are given in a talk, "Sensor-Motor Integration in the Neocortex", at the 2013 Hackathon.


The HTM/CLA also differs from the MLP in that it does not directly use scalar weights to represent synaptic connectivity. Synapses inside an HTM have binary weights with a scalar permanence: each synapse will either be connected or disconnected, while the MLP uses scalar weights. The connectedness in HTM/CLA is based on a permanence, which is a value between 0.0 and 1.0. If the permanence is over a certain threshold, we have a connected synapse, i.e. we have weights of either 1 or 0. This binary mechanism is there to simulate synapses that are able to form and unform during learning. So essentially we have a weight-change network vs. a wiring-change network [Chklovskii et al. 2004].
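The permanence mechanism can be sketched in a few lines of numpy. The threshold and increment values below are illustrative assumptions, not NuPIC's actual parameters or data structures:

```python
import numpy as np

rng = np.random.default_rng(0)

# Scalar permanences for one segment's potential synapses, in [0, 1].
permanence = rng.random(10)
CONNECTED_THRESHOLD = 0.5     # illustrative value

# Effective weights are binary: 1 if the permanence crosses the threshold.
weights = (permanence >= CONNECTED_THRESHOLD).astype(int)

# Learning moves the permanence, not the binary weight itself:
# reinforce synapses whose presynaptic cell was active, decay the rest.
active_inputs = rng.random(10) < 0.5
permanence = np.clip(permanence + np.where(active_inputs, 0.05, -0.05), 0.0, 1.0)
new_weights = (permanence >= CONNECTED_THRESHOLD).astype(int)
```

A synapse whose permanence drifts across the threshold effectively "forms" or "unforms", which is the wiring-change behaviour described above.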

NuPIC today is naturally geared towards time-series prediction, which makes it an interesting candidate for wind forecasting. NuPIC has been shown to outperform the MLP on financial time series [Gabrielsson et al. 2013], and it has been used successfully to balance traffic [Sinkevicius et al. 2011]. It has also been used on the NASA Aviation Dataset [Lee and Rajabi 2014], but with less promising results. The MLP has been used successfully for many kinds of time series problems [Azoff 1994; Niska et al. 2004; Jain and Kumar 2007].


Chapter 3

Method and Materials

The purpose of this chapter is to provide an explanation of the methods used in this thesis. It presents terms and concepts needed and used in the field of Machine Learning, with a focus on how to create forecasts based on time series, and it motivates why these methods are used as well as how they are used. This chapter also contains a description of the implemented artificial neural networks, how they are structured and optimized. It provides details about the experimental setup and how these experiments relate to GEFCom, and finally a description of how the dataset is structured and processed.

3.1 Preliminaries

3.1.1 Remarks

The methodology used to evaluate the prediction models presented in this thesis is based on the protocol presented in Madsen et al. [2005],¹ which is a complete protocol that can be used to evaluate the performance of short-term WPF and Wind-to-Power (W2P) models. The reason this protocol was chosen for this thesis is that it has successfully been used before as a guideline to evaluate a wide variety of forecast models, such as AWPPS, Prediktor, Previento, Sipreólico and WPPT. The protocol has been used for both on-shore and off-shore wind farms and was developed in the frame of the ANEMOS research project [Kariniotakis et al. 2006], which brought together many relevant groups involved in the field. The aim of ANEMOS was to develop accurate and robust models that substantially outperform current state-of-the-art methods, and one of its goals was to establish a common set of performance measures that can be used to compare forecasts across systems and locations.

1. It should be pointed out that no widely agreed standardization exists.

3.1.2 Definitions

Time series

A time series is a sequence X of observations x_t with a particular time-stamp t, i.e. X = [x_1, x_2, ..., x_t], in short X_{1:t}; it is a time-dependent collection of variables.

Forecast horizon

A multi-step-ahead prediction is the assignment of predicting k forecasts X_{t+1:t+k} given a collection of historical observations X_{t-p+1:t}. In the case of GEFCom we want a forecast for 1–48 steps ahead.

Point or spot forecast

In this paper we model the forecast \hat{p}_{t+k|t} as a so-called point forecast (or spot forecast), i.e. a single value for each forecast (as opposed to a probability distribution).

Prediction error

The prediction error e for lead time t + k is defined in equation 3.1 as the difference between the forecast and the actual value, where t denotes the time index, k is the look-ahead time, p is the actual (measured, true) wind power and \hat{p} is the predicted wind power:

    e_{t+k|t} = p_{t+k} - \hat{p}_{t+k|t}    (3.1)

The normalized prediction error ε is defined in equation 3.2:

    ε_{t+k|t} = (1 / p_inst) e_{t+k|t} = (1 / p_inst) (p_{t+k} - \hat{p}_{t+k|t})    (3.2)

where p_inst is the installed capacity of the wind farm (in kW or MW), which is unknown in the competition.


3.1.3 Reference models

Persistence (also called a naïve or plain predictor), as seen in equation 3.3, is the model most commonly used for benchmarking WPF models. This simple model states that the future wind generation value will be the same as the last measured value:

    \hat{p}^{persistence}_{t+k|t} = p_t    (3.3)

An alternative would be to use an even simpler model, a climatology prediction (equation 3.4), i.e. predicting the mean, a value that would be approximated from the training set (see section 3.1.5):

    \hat{p}^{mean}_{t+k|t} = \bar{p} = (1 / N) \sum_{t=1}^{N} p_t    (3.4)

There are other reference models, like the one suggested by Nielsen et al. [1998], which has advantages over persistence, but it was never widely adopted and is not used by GEFCom.
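Both reference models are trivial to implement; a minimal sketch (the function names are our own, and power values are assumed to be normalized):

```python
import numpy as np

def persistence_forecast(p_t, k_max=48):
    """Persistence (naive) predictor, equation 3.3: every horizon k
    simply gets the last measured power value p_t."""
    return np.full(k_max, p_t)

def climatology_forecast(train_power, k_max=48):
    """Climatology predictor, equation 3.4: every horizon k gets the
    mean power estimated from the training set."""
    return np.full(k_max, np.mean(train_power))

train = np.array([0.2, 0.4, 0.6, 0.8])
print(persistence_forecast(0.7)[:3])    # [0.7 0.7 0.7]
print(climatology_forecast(train)[:3])  # [0.5 0.5 0.5]
```

Persistence is hard to beat for the first few hours, while climatology becomes the stronger baseline at long horizons, which is why both are worth keeping around.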

3.1.4 Error metrics

In order to understand why specific models perform well, it is usually a good idea to evaluate them against a wide variety of different criteria, as is emphasised by Kariniotakis [1997]. This section describes the error measures used; throughout, N denotes the size of the test set.

Forecast Bias

The Normalized Forecast Bias (NBIAS) describes the systematic error and is defined in equation 3.5; it is estimated by calculating the average error for each step ahead, and it gives an indication of the direction of the error:

    NBIAS_k = (1 / N) \sum_{t=1}^{N} ε_{t+k|t}    (3.5)

This bias is sometimes referred to by some authors as the Mean Forecast Error (MFE). A perfect MFE does not mean the forecast is perfect and contains no error, as positive and negative errors cancel each other out, but it gives an indication that the forecasts are on target.


Mean Absolute Error

The Normalized Mean Absolute Error (NMAE) is an error quantity that looks at the average of the absolute error of the prediction and is defined in equation 3.6:

    NMAE_k = (1 / N) \sum_{t=1}^{N} |ε_{t+k|t}|    (3.6)

Another common name used instead of Mean Absolute Error (MAE) is Mean Absolute Deviation (MAD). This value shows the magnitude of the overall error that has occurred due to forecasting and should thus be as small as possible. This error is also scale dependent and will be affected by data transformations and the scale of measurements.

Mean Squared Error

The Normalized Mean Squared Error (NMSE) is an error quantity that looks at the average of the squared errors ε²_{t+k|t} and is built using the Normalized Sum of Squared Errors (NSSE) defined in equation 3.7:

    NSSE_k = \sum_{t=1}^{N} ε²_{t+k|t}    (3.7)

NMSE is defined in equation 3.8:

    NMSE_k = (1 / N) NSSE_k = (1 / N) \sum_{t=1}^{N} ε²_{t+k|t}    (3.8)

In this error, positive and negative errors do not cancel each other out, and large individual errors are penalized more harshly.

Root Mean Squared Error

The Normalized Root Mean Squared Error (NRMSE) is the square root of the NMSE; this error is defined in equation 3.9:

    NRMSE_k = NMSE_k^{1/2} = ( (1 / N) \sum_{t=1}^{N} ε²_{t+k|t} )^{1/2}    (3.9)

NRMSE is the main metric used in GEFCom, and it shares the same properties as NMSE.
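All four measures reduce to a few lines of numpy once the normalized errors of equation 3.2 are in hand; a sketch with our own function names and toy values:

```python
import numpy as np

def normalized_errors(p_true, p_pred, p_inst):
    """Normalized prediction errors, equation 3.2."""
    return (p_true - p_pred) / p_inst

def nbias(eps):   # equation 3.5
    return np.mean(eps)

def nmae(eps):    # equation 3.6
    return np.mean(np.abs(eps))

def nmse(eps):    # equation 3.8
    return np.mean(eps ** 2)

def nrmse(eps):   # equation 3.9, the main GEFCom metric
    return np.sqrt(nmse(eps))

eps = normalized_errors(np.array([1.0, 2.0, 3.0]),
                        np.array([1.5, 1.5, 3.5]),
                        p_inst=5.0)
# eps is approximately [-0.1, 0.1, -0.1]: the NMAE is 0.1 while the
# NBIAS is only -0.033, since errors of opposite sign partly cancel.
```

In practice each metric is computed per horizon k, i.e. over the subset of test errors with the same look-ahead time.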


3.1.5 Model selection

In regression and classification, one of the main issues we are faced with is: "How do we create a good model?" One way to define this "goodness" is to look at the model's generalization ability, i.e. its performance on unseen data.

To make sure the model we are building will generalize to new unseen data, it is important how we select and build the model in the first place; what we have control over is the data we have at hand and how we use that data to build our model.

Training, testing and validating

In the case of supervised learning we have access to a target series and its associated features, i.e. the production series (our target) with wind speed and wind direction as features.

Common practice within Machine Learning, which is adhered to in this thesis, is to split the whole dataset into 3 smaller subsets. Two of these subsets are used to find a good model (the training set and the validation set), and the remaining one (the test set) is used for evaluation, i.e. to estimate how accurate the model we have created would be on "unseen data". With highly flexible models like an artificial neural network, we need to be careful not to overfit the data, which is one of the reasons why we have the validation set: we don't want to create a model that doesn't generalize well because it is fitted to every minor variation, i.e. has captured a lot of noise.
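For time series, the three-way split must preserve chronology: shuffling would leak future information into training. A minimal sketch (the fractions are illustrative; the thesis's actual split follows the GEFCom period layout of section 3.2):

```python
def chronological_split(series, train_frac=0.7, val_frac=0.15):
    """Split a time series into train/validation/test sets in time order,
    so that validation and test data always lie in the 'future' of the
    data the model was fitted on."""
    n = len(series)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = series[:n_train]
    val = series[n_train:n_train + n_val]
    test = series[n_train + n_val:]
    return train, val, test

train, val, test = chronological_split(list(range(100)))
print(len(train), len(val), len(test))  # 70 15 15
```

The validation set is then used for early stopping and hyper-parameter choices, while the test set is touched only once, for the final evaluation.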

3.1.6 Evaluation

The improvement with respect to a considered reference model ref is defined in equation 3.10:

    I^{ref}_{EC,k} = 100 · (EC^{ref}_k - EC_k) / EC^{ref}_k  (%)    (3.10)

where the Evaluation Criterion (EC) is any error measurement, such as NMSE or NMAE.
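As a small worked example of equation 3.10, a model NRMSE of 0.15 against a reference NRMSE of 0.20 is a 25% improvement:

```python
def improvement(ec_ref, ec_model):
    """Improvement over a reference model, equation 3.10, in percent."""
    return 100.0 * (ec_ref - ec_model) / ec_ref

print(round(improvement(0.20, 0.15), 6))  # 25.0
```

A negative value means the model performs worse than the reference.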


Testing periods

Date        Time   Forecast
2011-01-01  01:00  1-48 hours
2011-01-04  13:00  1-48 hours
2011-01-08  01:00  1-48 hours
2011-01-11  13:00  1-48 hours
...
2012-06-23  01:00  1-48 hours
2012-06-26  13:00  1-48 hours

Table 3.1: The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods with missing data, power observations are available for updating the models.

3.2 Experiments

Training and testing are structured based on the structure of the GEFCom. The dataset spans from midnight on the 1st of July 2009 to noon on the 26th of June 2012. The period from the 1st of July 2009 to the 1st of January 2011 at 01:00 is used for training and validating, while the rest is used for testing. In the testing range a number of 48-hour periods are defined (see table 3.1); in between these testing periods exists additional training data, which enables the models to be updated between forecasts.

The testing periods repeat every 7 days until the end of the dataset, and only meteorological forecasts that were relevant for the periods with missing power data are given; this was done in order to be consistent.
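The period layout can be reproduced as two weekly series offset by 3.5 days, consistent with table 3.1 and the 7-day repetition above; the exact endpoint handling below is an assumption:

```python
from datetime import datetime, timedelta

def weekly_starts(first, last, spacing=timedelta(days=7)):
    """Start times of the 48-hour testing periods, repeating weekly."""
    starts, t = [], first
    while t <= last:
        starts.append(t)
        t += spacing
    return starts

# Table 3.1 interleaves a 01:00 series and a 13:00 series, 3.5 days apart.
last = datetime(2012, 6, 26, 13)
starts = sorted(weekly_starts(datetime(2011, 1, 1, 1), last)
                + weekly_starts(datetime(2011, 1, 4, 13), last))
```

The first four generated starts match the first four rows of table 3.1, and the final one is the 26 June 2012, 13:00 period that closes the test range.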

Meteorological forecasts in the GEFCom dataset consist of zonal u and meridional v components collected for surface winds at 10 m above ground level. They were extracted from the archive of the European Centre for Medium-range Weather Forecasts (ECMWF), a service that issues high-resolution deterministic forecasts twice a day, at 00 Coordinated Universal Time (UTC) and 12 UTC; each of these forecast periods consists of data for 1–48 hours ahead. In order to match the hourly resolution of the power measurements, the forecasts were interpolated using cubic splines to an hourly resolution. A summary of the features found in the dataset is given in table 3.2.
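The spline step can be sketched with SciPy. The 3-hourly grid and the synthetic wind-speed values below are illustrative assumptions; the actual GEFCom fields are the u and v components:

```python
import numpy as np
from scipy.interpolate import CubicSpline

# NWP forecasts arrive on a coarser grid than the hourly power
# measurements; a cubic spline fills in the hours between.
issue_hours = np.arange(0, 49, 3)              # forecast lead times (h)
wind_speed = 8 + 2 * np.sin(issue_hours / 6)   # synthetic forecast values

spline = CubicSpline(issue_hours, wind_speed)
hourly_leads = np.arange(0, 49)                # target hourly resolution
hourly_ws = spline(hourly_leads)
```

The spline passes exactly through the issued values at the original lead times, so nothing the NWP actually produced is altered; only the hours in between are filled in smoothly.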


No  Category  Parameter             Alias  Type
1   Date      Date                  date   String
2   Date      Year                  year   Integer
3   Date      Month                 month  Integer
4   Date      Day                   day    Integer
5   Date      Hour                  hours  Integer
6   Date      Week                  week   Integer
7   Forecast  Wind Speed            ws     Real
8   Forecast  Wind Direction (deg)  wd     Real
9   Forecast  Wind U                u      Real
10  Forecast  Wind V                v      Real
11  Forecast  Issued                hp     Integer
12  SCADA     Production            wp     Real

Table 3.2: Preprocessed features from the dataset. The Forecast category represents the forecasts given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the feature we will use in training and testing.


CHAPTER 3 METHOD AND MATERIALS

The database of meteorological forecasts given by the GEFCom dataset contains sections of missing data, each corresponding to a date for which the power information is also missing; these sections were filled in with the previous best available forecast in a pre-processing step. For example, if we do not have data for the forecast issued at 2011-01-01 12:00, we use data from 2011-01-01 00:00, since a 48-hours-ahead forecast is available for that date. If the previous section also contains missing data, we go back an additional section and use those forecasts. If no forecast is available 48 hours back, we use the best known forecast and extend it in the same fashion as the persistence model.

Any prediction made by these models should fall within a certain range, so any forecast outside this range is clamped. This is the main post-processing step, i.e. there is an upper limit on what the models can produce.
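The clamping step is a one-liner. This sketch assumes normalized production in [0, 1]; the actual limits per wind farm are not stated in the text.

```python
import numpy as np

# Assumed limits: normalized power production lies in [0, 1].
P_MIN, P_MAX = 0.0, 1.0

def postprocess(forecast):
    """Clamp raw model output to the physically possible production range."""
    return np.clip(forecast, P_MIN, P_MAX)

raw = np.array([-0.07, 0.42, 1.13])
clamped = postprocess(raw)  # -> [0.0, 0.42, 1.0]
```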

3.3 Holdback Input Randomization

The Holdback Input Randomization (HIPR) method, described in Kemp et al. [2007], can be used to investigate the importance of the input parameters. It works by sequentially feeding each data point in the test set to the neural network while replacing the values of one input parameter at a time. The replacement values are uniformly distributed random values in the range the neural network was originally trained on, i.e. (-1, 1). An NRMSE score is calculated for each replacement. From this we obtain information about the relevance of each input.
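A minimal sketch of HIPR, with a hypothetical `model` callable and toy data (the thesis' actual implementation is not shown): replace one column at a time with uniform noise in (-1, 1) and score with NRMSE.

```python
import numpy as np

def nrmse(y_true, y_pred):
    """Root-mean-square error normalized by the observed range."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2)) / (y_true.max() - y_true.min())

def hipr(model, X, y, rng=np.random.default_rng(0)):
    """Holdback Input Randomization: randomize one input column at a time
    and record the resulting NRMSE; important inputs give large scores."""
    scores = {}
    for j in range(X.shape[1]):
        Xr = X.copy()
        Xr[:, j] = rng.uniform(-1.0, 1.0, size=len(X))
        scores[j] = nrmse(y, model(Xr))
    return scores

# toy model that only uses column 0, so randomizing it should hurt the most
X = np.linspace(-1, 1, 200).reshape(-1, 1).repeat(2, axis=1)
y = X[:, 0].copy()
model = lambda X_: X_[:, 0]
scores = hipr(model, X, y)
```

Here randomizing the unused column 1 leaves the error at zero, while randomizing column 0 degrades it, which is exactly the signal HIPR looks for.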

3.4 Optimization methods

The two main optimization algorithms used in this thesis are the Particle Swarm Optimization (PSO) algorithm, presented by Eberhart and Kennedy [1995], and the Levenberg-Marquardt (LM)² algorithm, independently developed by Kenneth Levenberg and Donald Marquardt [Marquardt, 1963; Levenberg, 1944]. The PSO algorithm is a population-based stochastic algorithm similar to a Genetic Algorithm (GA), but based on social-psychological principles instead of evolution. It

²The LM algorithm was used in the beginning of the project, but the reported results ended up using the PSO algorithm. The LM algorithm is included in this section because speed optimization was measured using it.


can be summarized by the following steps. Each network was trained multiple times in order to avoid getting stuck in local minima.

• Step 1: Initialize particles with random velocities and accelerations.

• Step 2: Determine which particle is closest to the goal.

• Step 3: Adjust accelerations toward that particle.

• Step 4: Update particle positions based on their velocities, and update velocities based on accelerations.

• Step 5: Go to step 2.
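The steps above can be sketched as a minimal global-best PSO. The inertia and acceleration coefficients below are illustrative defaults, not values taken from the thesis, and the objective is a toy function.

```python
import numpy as np

def pso(f, dim, n_particles=30, iters=200, seed=0):
    """Minimal global-best particle swarm minimizing f over [-1, 1]^dim."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1, 1, (n_particles, dim))      # Step 1: random positions ...
    v = rng.uniform(-0.1, 0.1, (n_particles, dim))  # ... and velocities
    pbest = x.copy()
    pbest_f = np.array([f(p) for p in x])
    g = pbest[pbest_f.argmin()].copy()              # Step 2: particle closest to goal
    for _ in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Step 3: accelerate toward personal and global bests
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (g - x)
        x = x + v                                   # Step 4: move the particles
        fx = np.array([f(p) for p in x])
        better = fx < pbest_f
        pbest[better], pbest_f[better] = x[better], fx[better]
        g = pbest[pbest_f.argmin()].copy()          # Step 5: repeat from step 2
    return g

# toy objective: squared distance to the point (0.5, 0.5, 0.5)
best = pso(lambda w: float(np.sum((w - 0.5) ** 2)), dim=3)
```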

The LM algorithm is a combination of the Error Back-Propagation (EBP) method [Rumelhart et al., 1988] and the Gauss-Newton method. It has the speed advantage of Gauss-Newton and the stability of EBP [Yu and Wilamowski, 2011]. A detailed treatment of LM can be found in Moré [1978] and of PSO in Poli et al. [2007].

3.5 Neural Networks

3.5.1 Multilayer Perceptron

The perceptron is built around a nonlinear model of a neuron, the McCulloch-Pitts model [McCulloch and Pitts, 1943]. It basically consists of a 2-step process: the cell body contains a summation function computing the weighted sum of all inputs, including a bias. The perceptron is described by equation 3.11. The sum s is passed through an activation function (see Activation Functions below), which mimics the activation, or firing, of the neuron.

s = \sum_{i=1}^{M} w_i x_i + x_0 \quad (3.11)

w_i is the weight of the "synapse" of input channel i and is the parameter we want to adjust, x_i is the input value, and x_0 is the bias. M denotes the number of inputs we have.
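Equation 3.11 followed by an activation is only a few lines of code. This sketch uses tanh as the activation f; the weights and inputs are arbitrary illustration values.

```python
import math

def perceptron(x, w, bias):
    """Weighted sum of inputs plus bias (eq. 3.11), passed through a tanh activation."""
    s = sum(wi * xi for wi, xi in zip(w, x)) + bias
    return math.tanh(s)

out = perceptron([0.5, -0.2], [0.8, 0.4], bias=0.1)
```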



Figure 3.1: The perceptron. Weighted input signals and a bias are summed and passed through the activation function f(s) to produce the output signal.

The structure of the MLP consists of many perceptrons, and it is shown in figure 3.2. The input signal flows from the input layer at the bottom to the output layer at the top. There is a bias signal, seen on the left side of the diagram, which is set to a fixed number.

MLPs are fully connected networks, meaning that every neuron in any layer of the network is connected to all the neurons in the previous layer. Each connection in the network has a weight wij associated with it. The initialization of these weights is done before any training and is achieved by randomly assigning very small values to the respective weights in the graph. The architecture used in this study has only one output variable, i.e. the function we approximate produces forecasts of the power generation given a certain input.


Figure 3.2: Architectural graph of the neural network that produces a single output value. An input layer (hours, u, v, week, ws, ws-1, ws-2, ws+1, ws+2) together with a bias signal feeds H hidden layers of tanh neurons, followed by a linear output neuron. Each edge in this graph has a weight wij associated with it.

The performance of neural networks generally improves if the data is normalised, because using the original data directly can cause convergence problems. Normalization is done using the mapminmax function seen in equation 3.12.

y = (y_{max} - y_{min}) \cdot \frac{x - x_{min}}{x_{max} - x_{min}} + y_{min} \quad (3.12)

y_max is the maximum of the specified target range, which in this case is 1, and y_min is -1; x is the value to be scaled, x_max is the maximum of the values to be scaled, and x_min is their minimum.
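Equation 3.12 translates directly into code; `mapminmax` here is a plain function written for illustration, not MATLAB's built-in of the same name.

```python
def mapminmax(x, xmin, xmax, ymin=-1.0, ymax=1.0):
    """Rescale x linearly from [xmin, xmax] to [ymin, ymax] (eq. 3.12)."""
    return (ymax - ymin) * (x - xmin) / (xmax - xmin) + ymin

# wind speeds in [0, 20] m/s mapped to the network's (-1, 1) training range
scaled = [mapminmax(v, 0.0, 20.0) for v in (0.0, 10.0, 20.0)]  # -> [-1.0, 0.0, 1.0]
```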

The forecasting model consists of 7 different networks, one for each wind farm. These networks are trained on the first section of the dataset, before the test period. A random 60/20/20 split is created, where 60% of the available data is used for training, 20% is used to validate the network, and 20% is set aside as a hold-out set for the hyperparameters. The input features³ fed into the models are

³See table 3.2.


ws, u, v, hours, ws, ws+1, ws+2, ws-1, ws-2, where ws+x and ws-x denote a time shift of x. We can use ws+x as input into the model because ws is a forecast in itself.
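The time-shifted inputs can be sketched with a small hypothetical helper; the leads ws+x are legitimate inputs since ws is itself a forecast series known ahead of time.

```python
def shifted_features(ws, t, shifts=(-2, -1, 0, 1, 2)):
    """Build the lagged/leading wind-speed inputs ws-x ... ws+x for time t."""
    return [ws[t + s] for s in shifts]

ws = [5.0, 6.0, 7.0, 8.0, 9.0]
feats = shifted_features(ws, t=2)  # -> [5.0, 6.0, 7.0, 8.0, 9.0]
```

Note that for the first and last hours of a forecast block the shifts run off the ends of the series, so a real implementation would need padding or truncation there.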

Activation Functions

The computation done for each neuron in the multilayer perceptron requires knowledge of the derivative of the activation function. In other words, the activation function we pick needs to be differentiable. In this thesis we use the following two activation functions: the hyperbolic tangent function, seen in equation 3.13, and

f(s) = \tanh(s) = \frac{e^s - e^{-s}}{e^s + e^{-s}} \quad (3.13)

the linear transfer function, seen in equation 3.14.

f(s) = \begin{cases} +1 & \text{if } s \geq 1 \\ s & \text{if } -1 < s < 1 \\ -1 & \text{if } s \leq -1 \end{cases} \quad (3.14)

Hyperparameter optimization

In order to obtain the hyperparameters necessary for each wind farm's model, a random hyperparameter search was performed for all models. A hold-out validation set was used to pick the best hyperparameters. Random search has been shown to work better than grid search when not all parameters are equally important [Bergstra and Bengio, 2012].
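A random search over a small space can be sketched as follows. The scoring function is a toy stand-in; the real hold-out evaluation would train a network per sampled configuration, and the parameter names are illustrative.

```python
import random

random.seed(0)

def random_search(evaluate, space, n_trials=200):
    """Random hyperparameter search: sample each parameter independently and
    keep the configuration with the best hold-out score (lower is better)."""
    best_cfg, best_score = None, float("inf")
    for _ in range(n_trials):
        cfg = {name: random.choice(values) for name, values in space.items()}
        score = evaluate(cfg)
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

space = {"hidden": [5, 10, 15, 20, 25], "lr": [0.001, 0.01, 0.1]}
# toy validation score: pretend 15 hidden units and lr = 0.01 is optimal
evaluate = lambda c: abs(c["hidden"] - 15) + abs(c["lr"] - 0.01) * 100
cfg, score = random_search(evaluate, space)
```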

3.5.2 Numenta Platform for Intelligent Computing

This section describes the key principles introduced with NuPIC. It explains in general terms the theory behind the Online Prediction Framework (OPF)⁴, which uses the CLA and HTM algorithms; the OPF works as an API for creating predictive models. The OPF consists of 5 major types of components: Encoders, Spatial Pooler (SP), Temporal Memory (TM), Temporal Pooler (TP), and Classifiers. Together these components construct a single CLA region⁵, and figure 3.3 demonstrates the information flow through a single region.

⁴The OPF is used with Numenta's commercial product GROK.
⁵Currently, models created with the OPF do not use a TP, nor does this client allow creation of a hierarchy of regions, i.e. what we would call an HTM; it is only possible to create one region.


Another central concept in NuPIC is the Sparse Distributed Representation (SDR), which refers to the activation of a small percentage of the neurons at any given time. This neuronal activity is represented as an n-dimensional sparse binary vector x = [b_0, ..., b_n] with around 2% active cells. The outputs from the spatial pooler and the temporal memory are both SDRs, while the output from the encoder does not enforce this and is just a normal binary vector. A general overview of the properties of SDRs is given by Ahmad and Hawkins [2015].

Figure 3.3: Information flow of a single-region predictive model created with the OPF: Encoder, Spatial Pooler, Temporal Memory, Classifier.

The overlap between two different SDRs is defined by

o(x, y) = x \cdot y \quad (3.15)

A match between two SDRs is defined by

m(x, y) := o(x, y) \geq \theta \quad (3.16)

where θ is set such that θ ≤ ‖x‖₁ and θ ≤ ‖y‖₁. An interesting property of SDRs, used multiple times inside the temporal memory and especially for predictions, is that a fixed-size set of SDRs can be reliably stored by taking their union as a single pattern. The boolean OR operator is used to create a


Value  Scalar Encoding
1      11111000000000
2      01111100000000
10     00000000011111

Table 3.3: Example of various scalar values encoded with a ScalarEncoder, where n = 14, r = 5, ψ = 1.

new vector from a set of SDRs. The downside of storing patterns this way is that the more patterns we store, the bigger the probability of false positives.
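Equations 3.15-3.16 and the union trick are easy to express in code; this is a minimal sketch with toy vectors, not NuPIC's own implementation.

```python
import numpy as np

def overlap(x, y):
    """Eq. 3.15: number of shared active bits between two SDRs."""
    return int(np.dot(x, y))

def match(x, y, theta):
    """Eq. 3.16: the SDRs match if their overlap reaches the threshold theta."""
    return overlap(x, y) >= theta

def union(sdrs):
    """Boolean OR stores a set of SDRs as one pattern (with some false-positive risk)."""
    return np.logical_or.reduce(sdrs).astype(int)

a = np.array([1, 0, 1, 0, 0, 1])
b = np.array([1, 0, 0, 0, 1, 1])
u = union([a, b])
```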

Encoders

NuPIC contains many different encoders⁶; the job of an encoder is to convert raw input into a more suitable representation (i.e. a binary vector). The raw inputs are fed into the model using a dictionary data structure.

Useful encoders include scalar and categorical encoders, while more exotic encoders include one for Global Positioning System (GPS) coordinates, which can be used to extract information about anomalous movements. The entries in the dictionary of raw inputs are each encoded separately and concatenated using a multi-encoder.

One property of the spatial pooler is that overlapping input patterns are mapped to the same SDR. This means that we want the encoder to encode inputs so that similar inputs share bits. The ScalarEncoder fulfils this property via the process illustrated in table 3.3.

We calculate the range of values to encode using equation 3.17; vmin represents the minimum value of the input signal, while vmax denotes its upper bound.

v_{range} = v_{max} - v_{min} \quad (3.17)

There are three mutually exclusive parameters that determine the overall size of the output from a scalar encoder: (n, r, ψ). n directly represents the total number

⁶A full list of all encoders can be found in the API documentation for NuPIC.


of bits in the output, and it must be greater than or equal to w, which represents the number of bits set to encode a single value, i.e. the "width" of the output signal⁷. r and ψ are specified with respect to the input, while w is specified with respect to the output. Two inputs separated by more than the radius have non-overlapping representations. Two inputs separated by less than the radius will in general overlap in at least some of their bits. Two inputs separated by greater than or equal to the resolution are guaranteed to have different representations. ψ can be calculated using equation 3.18.

\psi = \frac{r}{w} \quad (3.18)

Depending on whether we want periodic behaviour or not, n is calculated a bit differently; see the scalar encoder implementation for details⁸.
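A simplified non-periodic scalar encoder reproducing the rows of table 3.3 can be sketched as follows; NuPIC's actual ScalarEncoder handles periodic ranges, clipping, and more, which this sketch omits.

```python
def scalar_encode(value, vmin, vmax, n, w):
    """Simplified non-periodic scalar encoder: w contiguous active bits out of n.
    Nearby values share bits, so similar inputs receive similar encodings."""
    buckets = n - w + 1                                    # possible window positions
    i = int((value - vmin) / (vmax - vmin) * (buckets - 1) + 0.5)
    return [1 if i <= j < i + w else 0 for j in range(n)]

# reproduces table 3.3 with n = 14, w = 5 over the value range 1..10
enc = lambda v: "".join(map(str, scalar_encode(v, 1, 10, 14, 5)))
```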

Spatial Pooler

The spatial pooler receives a binary vector as its input from an encoder and outputs an SDR. The structure of the spatial pooler consists of an input space and a set of columns; the output SDR represents which of the columns in a region are active. Each column has synapses connected to the input space: the SP consists of around 50% randomly and potentially connected synapses, the so-called "potential pool". Each synapse connects and disconnects with the input space during learning.

The general information flow of the spatial pooler consists of the following steps: input stimuli from lower regions lead to activation of the input space, to which each mini-column is synaptically connected; this activation, the so-called "overlap score", is computed for each column. The score is calculated as the total sum over the neurons that try to influence that column, weighted with a "boosting factor" that increases the chance for certain columns to win more easily. A final top percentage (usually around 2% activity) of the columns with the highest influence (biggest overlap score) is chosen; columns also inhibit nearby columns, and the result of this process is a binary vector with few active bits, an SDR. This is illustrated in equation 3.19, where b can either be 0 or 1 and s is a value representing the score.

⁷w must be odd to avoid centering problems.
⁸https://github.com/numenta/nupic


\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \dots & b_n \end{bmatrix}}_{\text{input vector}}
\cdot
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \dots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \dots & b_{2n} \\
\vdots & & & & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \dots & b_{dn}
\end{bmatrix}}_{\text{connected synapses for each column}}
=
\underbrace{\begin{bmatrix} s_1 & s_2 & \dots & s_n \end{bmatrix}}_{\text{overlap score}}
\xrightarrow{\text{inhibition}}
\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \dots & b_n \end{bmatrix}}_{\text{output SDR}}
\quad (3.19)

Learning in this structure is done by adjusting the permanences of the proximal dendrites to better match the input: for each winning column, increase the permanence of the synapses that correctly matched the input and decrease the permanence of the rest. We also increase the boosting factor of losing columns to give them a bigger chance of winning next time.
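One spatial-pooler step (overlap scoring, boosting, inhibition) can be sketched as follows. The synapse matrix is a random toy, and the permanence learning described above is omitted; this only illustrates equation 3.19.

```python
import numpy as np

rng = np.random.default_rng(42)

def spatial_pool(inp, synapses, boost, active_frac=0.02):
    """One spatial-pooler step: boosted overlap score per column, then
    inhibition keeps only the top active_frac columns (the output SDR)."""
    score = (synapses @ inp) * boost            # overlap score per column
    k = max(1, int(len(boost) * active_frac))   # ~2% of columns survive inhibition
    winners = np.argsort(score)[-k:]
    sdr = np.zeros(len(boost), dtype=int)
    sdr[winners] = 1
    return sdr

n_cols, n_inputs = 200, 50
synapses = (rng.random((n_cols, n_inputs)) < 0.5).astype(int)  # ~50% potential pool
boost = np.ones(n_cols)
inp = (rng.random(n_inputs) < 0.2).astype(int)
sdr = spatial_pool(inp, synapses, boost)
```

Whatever the input looks like, the output always has the same small number of active bits, which is the defining property of the SDR.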

Temporal Memory

The temporal memory receives an SDR from the spatial pooler, representing the active columns; the output of the temporal memory is the activity of the whole CLA region. The general idea of the TM⁹ is that the cells in each mini-column provide temporal context to the pattern identified by the spatial pooler. The TM's job is to form transitions between different SDRs, and it achieves this by allowing active cells to form connections to previously active cells, so that in a future setting each cell is able to predict its own activity.

Each cell in a region has several distal segments, ideally one for each pattern the cell has transitioned from; in practice these segments are connected to a couple of patterns each. A cell enters a predictive state if there is enough activity on a segment, and every segment consists of a collection of connections, synapses, that have been formed to a subset of previously active cells (typically around 10-15 cells).

The temporal memory receives a sparse binary vector from the spatial pooler, and the general information flow for this part of the CLA goes through two phases. The first phase has the following steps: 1) for each active column, check whether there is any cell in a predictive state; if there is a cell in a predictive

⁹There are a lot of details in how the temporal memory is implemented; pseudo-code and more details can be found in [Numenta, 2011]. The nupic git repository is the best source for the finer details: https://github.com/numenta/nupic


state, then 2) activate that particular cell. If no cell was found in a predictive state, then 3) activate all cells in that particular column, a process called bursting.

Bursting is analogous to saying: we do not know which cell to activate, because we have not seen this instance of the sequence of patterns before and are unable to put the column into the correct temporal context, so let us activate all cells, reflecting this uncertainty. Bursting does not occur when the temporal context has been predicted by the CLA. Equation 3.20 shows the first phase.

\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \dots & b_n \end{bmatrix}}_{\text{SP SDR}}
\cdot
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \dots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \dots & b_{2n} \\
\vdots & & & & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \dots & b_{dn}
\end{bmatrix}}_{\text{predictive state}}
=
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \dots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \dots & b_{2n} \\
\vdots & & & & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \dots & b_{dn}
\end{bmatrix}}_{\text{active state}}
\quad (3.20)

The second phase of the algorithm determines which cells should be put into a predictive state for the next time step. This is achieved by checking every distal segment of every cell for activity above a certain threshold, i.e. checking for active segments. One active segment is enough to put a cell into a predictive state. The output from the temporal memory is a vector representing the state of all cells in that region. Equation 3.21 shows phase 2.

\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \dots & b_n \end{bmatrix}}_{\text{active state}}
\cdot
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \dots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \dots & b_{2n} \\
\vdots & & & & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \dots & b_{dn}
\end{bmatrix}_X}_{\text{segment } X}
=
\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \dots & b_n \end{bmatrix}_X}_{\text{segment activation } X}
> \tau
\;\rightarrow\;
\underbrace{\begin{bmatrix} s_1 & s_2 & s_3 & \dots & s_n \end{bmatrix}}_{\text{predictive state}}
\quad (3.21)

If learning is turned on, the permanences of the synapses connected to active distal segments are updated (in the same way as in the spatial pooler). These changes are marked as temporary until we are sure that the cell is in fact able to correctly predict something with the new changes; after that, the changes either become permanent or are removed. Temporarily marked cells are updated whenever a cell goes from inactive to active through feed-forward input (the permanence update is kept, as we correctly predicted the feed-forward activation). If a cell instead went from active to inactive, the change is undone.


NuPIC Classifiers

There are many different classifiers included with NuPIC, such as a k-Nearest Neighbour (kNN) classifier and a Support Vector Machine (SVM); the OPF in particular uses a custom-built classifier called the "CLAClassifier". The purpose of this classifier is to decode predictions made by the CLA. Classification with the CLAClassifier is performed in different ways depending on how the input has been encoded.

Scalar values are decoded using a process where each cell in a CLA region is paired with two histograms. One histogram keeps track of the frequency of encountered patterns associated with each cell; the other keeps track of a moving average for each bucket. This process is illustrated in figure 3.4.

Figure 3.4: The CLAClassifier. Each cell in the input SDR is paired with two histograms per column, spanning values from min to max: one tracks the likelihood (frequency of encountered patterns) and the other a moving average for each bucket.

Training in NuPIC

Training the NuPIC model is done using online learning algorithms. The schema seen in figure 3.5 is used to train and test these models. We have 7 different wind farms, so this schema is repeated for each respective wind farm.

This schema is slightly adjusted from the traditional way the OPF handles multi-step predictions. The default way is to use 48 different models, one for every step ahead in the CLAClassifier, which would give us 48 · 7 = 336 models in total (for all the wind farms). Not only does this change match Expektra's approach, it also reduces the memory requirements of the CLAClassifier.

Figure 3.5: Training an OPF model. A pre-training data chunk drives the hyperparameter setup (PSO swarming or manual configuration); during the training phase the OPF model consumes the training data stream with online learning activated, and during the testing phase online learning is deactivated while the model produces multistep predictions from the testing data.

Input and hyperparameter selection

Finding hyperparameters for an OPF model was partly done using the custom built-in PSO algorithm and partly configured manually, mainly by ensuring that the PSO algorithm did not remove encoders. A single CLA region contains a lot of hyperparameters; these are listed with corresponding descriptions in appendix A. The inputs to the model are date, ws, wp, u, and v.


Chapter 4

Result

This chapter presents the primary results obtained in this study. Figures 4.4-4.10 contain different error measurements on the test set for the respective wind farms. We see that Expektra's ANN model performs well over all wind farms, while the NuPIC model performs worse than Expektra but in general still better than our reference model. NuPIC is off target, with a bias error on most wind farms. The graphs also indicate that NuPIC is unable to pick up some trends in the cumulated ε² graph. Expektra's model, on the other hand, shows no clear problems in cumulated ε² and is on target on all wind farms; appendix C has been included to reflect this for different lead times.

Figure 4.1: Left diagram: cumulative probability of wind speed. Right diagram: scatter diagram of the power curve, normalized power output vs. wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


Figure 4.2: Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).

Figure 4.3: Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


Figure 4.4: Different error measurements for Wind Farm 1: NBIAS, NRMSE, and NMAE as functions of look-ahead time, and cumulated ε² over time, for Expektra, NuPIC, and Persistence.


Figure 4.5: Different error measurements for Wind Farm 2: NBIAS, NRMSE, and NMAE as functions of look-ahead time, and cumulated ε² over time, for Expektra, NuPIC, and Persistence.


Figure 4.6: Different error measurements for Wind Farm 3: NBIAS, NRMSE, and NMAE as functions of look-ahead time, and cumulated ε² over time, for Expektra, NuPIC, and Persistence.


Figure 4.7: Different error measurements for Wind Farm 4: NBIAS, NRMSE, and NMAE as functions of look-ahead time, and cumulated ε² over time, for Expektra, NuPIC, and Persistence.


Figure 4.8: Different error measurements for Wind Farm 5: NBIAS, NRMSE, and NMAE as functions of look-ahead time, and cumulated ε² over time, for Expektra, NuPIC, and Persistence.


Figure 4.9: Different error measurements for Wind Farm 6: NBIAS, NRMSE, and NMAE as functions of look-ahead time, and cumulated ε² over time, for Expektra, NuPIC, and Persistence.


Figure 4.10: Different error measurements for Wind Farm 7: NBIAS, NRMSE, and NMAE as functions of look-ahead time, and cumulated ε² over time, for Expektra, NuPIC, and Persistence.


                          Wind Farm
User          1      2      3      4      5      6      7      All
Leustagos     0.145  0.138  0.168  0.144  0.158  0.133  0.140  0.146
DuckTile      0.143  0.145  0.172  0.145  0.165  0.137  0.146  0.148
MZ            0.141  0.151  0.174  0.145  0.167  0.141  0.145  0.149
Propeller     0.144  0.153  0.177  0.147  0.175  0.141  0.147  0.152
Duehee Lee    0.157  0.144  0.176  0.160  0.169  0.154  0.148  0.155
Expektra      0.165  0.158  0.184  0.164  0.179  0.153  0.153  0.165
MTU EE5260    0.161  0.172  0.193  0.162  0.192  0.156  0.160  0.168
SunWind       0.174  0.177  0.193  0.176  0.179  0.157  0.162  0.172
ymzsmsd       0.163  0.186  0.200  0.164  0.192  0.162  0.167  0.174
4138 Kalchas  0.180  0.179  0.197  0.175  0.200  0.160  0.165  0.177
NuPIC         0.243  0.254  0.264  0.310  0.290  0.224  0.240  0.264
Persistence   0.302  0.338  0.373  0.364  0.388  0.341  0.361  0.355

Table 4.1: NRMSE scores of the entries published in [Hong et al., 2014]. The NuPIC and Expektra models are added so we can easily compare the results.

4.1 Experimental results

Looking at the primary results presented in this study, we see in table 4.1 that Expektra's ANN model is able to achieve performance comparable to the other methods published with Hong et al. [2014]. This can be used as a starting point for further investigations. A similar approach to Expektra's is Duehee Lee's [Lee and Baldick, 2014], as both methods are based on the same neural network architecture but with different optimization techniques. Duehee Lee's network is structured slightly differently from Expektra's model, which consists of only 1 output node, whereas Duehee Lee uses five (representing five predictions ahead). Besides these differences, Duehee Lee also includes additional sub-models to create a prediction ensemble, blending the output of the ANN with a Gaussian Process (GP) to improve very short-term predictions. MTU EE5260 is also a neural network approach that converts meteorological forecasts to power, and here the score is very close. The HTM CLA beats the persistence model but needs more work to outperform the other models in GEFCom.


4.2 Input Importance

For interpretation purposes, in order to understand the model better, an analysis of the relative importance of the input parameters was performed. This analysis is illustrated in figure 4.11. Each box represents added noise on that channel, which results in a higher NRMSE score if that feature was important. A reference point, "all-channels", represents the error distribution of the model with no input replacement. We clearly see in this figure that the wind speed channel ws is the most important attribute; adding noise to this channel greatly affects the NRMSE score. We also see that the wind components u, v show little to no influence, and that the timestamp-related inputs hours and week both indicate that seasonal and daily trends are present in the dataset.

Figure 4.11: Relative input parameter importance using HIPR. Each box shows the NRMSE distribution when one channel (hours, u, v, week, ws, ws±1, ws±2, ws±3) is replaced with noise; "all-channels" reflects the reference point, i.e. the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v = directional vector of the wind.


4.2.1 Adaptation and Optimization

All training of Expektra's model was performed on a MacBook Pro with an Intel Core 2 Duo processor (P8600, 2.40 GHz) running a 64-bit operating system with 4 GB of installed RAM. After adapting the code to run experiments on the GEFCom dataset, the performance of the core source code given by Expektra was evaluated. This investigation identified some slow parts of the code; after fixing these issues, mainly by using a MathNET native library instead of the managed provider, a speed test was performed comparing the unoptimized version with the optimized one. Figure 4.12 shows the speed-up achieved during training with the LM algorithm.

Figure 4.12: Training time of the unoptimized version vs. the optimized version when training a network with 10 input neurons, 10/15/20/25 hidden neurons, and 1 output neuron. Training was done for 100 epochs using LM.

4.3 Summary

A plot showing the average NRMSE improvement over the persistence model can be seen in figure 4.13. Both models are able to perform better than persistence: Expektra's model performs better than the NuPIC model over the whole forecast horizon, and NuPIC performs better than persistence towards the end of the forecast.

Figure 4.13: Summarized average NRMSE improvement over persistence across all wind farms, with 95% confidence intervals.


Chapter 5

Discussion

With the exponential growth of computing power it becomes easier to study deeper and more complex networks, and with recent advances in the area of deep learning a new interest in ANNs has resurged. In practice, ANNs have been used by most groups in the field of WPF, but these networks never caught on, as it was argued that the improvements made by ANNs were usually not enough to justify the extra effort of training them [Giebel et al., 2011]. This is steadily becoming less of an issue as computation becomes cheaper and bigger networks perform better.

By using a simple MLP network, we observe that we are able to obtain results similar in performance to other models published in Hong et al. [2014]. This can be used as a starting point for further investigations. The GEFCom dataset has a limited number of input features; if we had access to a wider range of them, the power of neural networks could be studied more in depth, but given the number of features in the dataset this is hard to do.

Using persistence as a reference model was done in order to have the same baseline as the one used in GEFCom, but a better reference model should be considered in the future, such as the one presented in Nielsen et al. [1998], especially if longer forecasts are to be considered.

5.1 Method development issues

Working with Expektra's model is very straightforward, and no major issues were encountered during development, but some issues were encountered working with NuPIC.¹

¹It should also be pointed out that it helps to have the people who developed the code on the spot, and this is probably the main reason for the lack of trouble.


1. Installing NuPIC is difficult. This seems to be a general problem, judging from posts on the NuPIC mailing list². A very important issue to fix if they want more people to work with this model.

2. The built-in swarming (PSO) did not work well, especially for slightly larger datasets and for 48 prediction steps. The main problems were fitting everything into memory and code that ran too slowly when swarming over multiple step-ahead predictions, making it practically impossible to use. Scaling down the problem to just a few prediction steps and multiple models helped, but the swarm model kept dismissing wind speed as an important feature, which was fixed by explicitly telling it not to touch that encoder.

3. Inconsistencies between the white-paper documentation of HTM CLA and the actual implementation make it hard to understand the underlying principles of NuPIC. NuPIC presents a very complex network, and any inconsistencies in the documentation make it hard to understand. The code is quite well documented, but the additional material is a bit sparse.
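The swarming mentioned in issue 2 is built on particle swarm optimization [Eberhart and Kennedy, 1995], where each particle encodes a candidate model configuration and every evaluation requires training and scoring a model, which is what makes it slow. A generic PSO sketch in plain Python (the constants and the toy objective are illustrative, not NuPIC's implementation):

```python
import random

def pso(objective, dim, n_particles=10, iters=50,
        w=0.7, c1=1.5, c2=1.5, lo=-5.0, hi=5.0):
    """Minimize `objective` over [lo, hi]^dim with a basic particle swarm."""
    random.seed(0)
    pos = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                   # each particle's best position
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm-wide best

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

In swarming, `objective` wraps a full train-and-test cycle of a model, which is why swarming over multiple step-ahead horizons became prohibitively slow.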

These issues are all understandable and are continuously being improved upon. NuPIC is still a young platform, so issues are to be expected given the complexity and research nature of the project. Training the CLAClassifier to use many step predictions instead of just one will most likely reduce the bias error seen in the results. To investigate this, a more powerful computer is needed.

There are some differences in the input between the models. This setup was created because an OPF model needs to encode the value it is going to predict. SCADA data is available, so this is not an issue, but it may introduce some unfairness between NuPIC and Expektra. In the current setup NuPIC does not seem to gain anything extra from the additional information, and other models in the competition will most likely use different inputs as well.

In general, NuPIC is different in the way you feed data: you do so by sending in the front of the signal, and temporal context is achieved through the temporal memory.

5.2 Future improvements and directions

One issue with the GEFCom dataset is that it does not contain any specific information about the location of these 7 wind farms. It is worth considering how to handle different models that are specialized for different types of terrain, as this has been shown to increase performance [Kariniotakis et al., 2006]. Another thing to investigate could be a more advanced scheme for hyperparameter selection; Jursa and Rohrig [2008] show that the WPF error can be reduced by smarter use of optimization algorithms for feature selection and hyperparameter selection, which should improve performance.

2 http://numenta.org/lists

In general, having more types of inputs from sources like SCADA and NWP systems could give a lot of valuable information that can be used for better predictions. It is clear that wind speed is a key feature for producing good forecasts, which is also supported by the input analysis in section 4.2. It has been argued that the error in WPF models stems largely from wrong weather forecasts [Giebel et al., 2011], so identifying good sources of weather forecasts would be desirable for more effective model learning. Combining several different weather forecasts could also be investigated, as Nielsen et al. [2007] showed that power forecasts based on a number of different meteorological forecasts were better than those from a single source.
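The combination idea from Nielsen et al. [2007] can be illustrated with a simple least-squares weighting of competing forecast sources (a sketch of the principle only; their method uses adaptive, time-varying weights, and the two biased sources below are hypothetical):

```python
import numpy as np

def combination_weights(forecasts, observed):
    """Least-squares weights for combining forecast providers.

    forecasts -- (T, m) array: T time steps, m forecast providers
    observed  -- (T,) array of observed production
    Returns w minimizing ||forecasts @ w - observed||^2.
    """
    w, *_ = np.linalg.lstsq(forecasts, observed, rcond=None)
    return w

# Two hypothetical NWP-based sources with opposite additive biases:
# the combination cancels a bias that neither source removes alone.
obs = np.array([0.2, 0.5, 0.8, 0.4])
sources = np.column_stack([obs + 0.1, obs - 0.1])
w = combination_weights(sources, obs)
combined = sources @ w
```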

Regarding HTM CLA, it is worth considering implementing a custom encoder for it, an encoder specifically targeted at data concerning wind farms. SCADA data could also be streamed directly into a NuPIC model, which would allow for some interesting online predictions and anomaly detection.

Training times are always an issue, especially for very large models. Both NuPIC and Expektra's model would benefit from being implemented in a more parallel fashion, as these algorithms are very well suited for parallel computing. The speed-up would be of great value, so looking into and implementing a GPU version of the algorithms could be worthwhile.

5.3 Conclusions

The field of energy forecasting is still young, and the necessity for good forecasting techniques is constantly growing. This initial study shows that part of Expektra's forecasting method can be used to forecast wind power generation. We observe that this ANN is able to achieve performance comparable to other methods published in Hong et al. [2014]. Additional work is still needed to obtain state-of-the-art results. Numenta's model is able to beat the reference model but still needs additional work before we can draw definite conclusions about its performance. Constructing a custom encoder could be a first step in that direction.


Bibliography

Subutai Ahmad and Jeff Hawkins. Properties of sparse distributed representations and their application to hierarchical temporal memory. arXiv preprint arXiv:1503.07469, 2015.

T. C. Akinci. Short term wind speed forecasting with ANN in Batman, Turkey. Elektronika ir Elektrotechnika, 107(1):41–45, 2015.

E. Michael Azoff. Neural Network Time Series Forecasting of Financial Markets. John Wiley & Sons, Inc., 1994.

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13(1):281–305, 2012.

Daniel P. Buxhoeveden and Manuel F. Casanova. The minicolumn hypothesis in neuroscience. Brain, 125(5):935–951, 2002.

Erasmo Cadenas and Wilfrido Rivera. Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks. Renewable Energy, 34(1):274–278, 2009.

Dmitri B. Chklovskii, B. W. Mel, and K. Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782–788, 2004.

A. Costa, A. Crespo, J. Navarro, G. Lizcano, H. Madsen, and E. Feitosa. A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews, 12(6):1725–1744, August 2008. ISSN 1364-0321. doi: 10.1016/j.rser.2007.01.015. URL http://dx.doi.org/10.1016/j.rser.2007.01.015.

Russ C. Eberhart and James Kennedy. A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science, volume 1, pages 39–43, New York, NY, 1995.


Shu Fan, James R. Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. Energy Conversion, IEEE Transactions on, 24(2):474–482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento - a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276. Citeseer, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition EWEC 2008, 6 pages. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving Hierarchical Temporal Memory-Based Trading Models. Springer, 2013.

M. A. Gaertner, C. Gallardo, C. Tejeda, N. Martínez, S. Calabria, N. Martínez, and B. Fernández. The Casandra project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: The next generation prediction system. In Proc. of the 2001 European Wind Energy Conference, EWEC'01, Copenhagen, Denmark, pages 777–780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: A literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G. Lobo. Sipreólico - wind power prediction tool for the Spanish peninsular power system. In Proceedings of the CIGRÉ 40th General Session and Exhibition, París, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On Intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357–363, 2014.


Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41–46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585–592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694–709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference, EWEC'07, 2007.

G. Kariniotakis. Position paper on Joule project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J. Halliday, R. Brownsword, Ignacio Marti, Ana Maria Palomares, I. Cruz, H. Madsen, T. S. Nielsen, Henrik Aa. Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power - overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, 10 pages, 2006.

G. N. Kariniotakis, G. S. Stavrakakis, and E. F. Nogaret. Wind power forecasting using advanced neural networks models. Energy Conversion, IEEE Transactions on, 11(4):762–767, 1996.

Stanley J. Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326–334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275–293, 2009.

Lars Landberg and Simon J. Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171–195, 1994.

S. Lang, C. Mohrlen, J. Jorgensen, B. O. Gallachóir, and E. McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International Solar Energy Society UK Section Conference, volume 85, page 89, 2006.


S. M. Lawan, W. A. W. Z. Abidin, W. Y. Chai, A. Baharun, and T. Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760–1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. Smart Grid, IEEE Transactions on, 5(1):501–510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets, 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164–168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313–2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154–3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa. Nielsen, and Torben S. Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475–489, 2005.

Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431–441, 1963.

Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons. 1969.

Marvin Lee Minsky and Oliver G. Selfridge. Learning in Random Nets. MIT Lincoln Laboratory, 1960.

Jorge J. Moré. The Levenberg-Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105–116. Springer, 1978.

Henrik Aa. Nielsen, Torben S. Nielsen, Henrik Madsen, Maria J. Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471–482, 2007.


Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29–34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power, 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159–167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms, v0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33–57, 2007.

A. Rodrigues, J. A. Peças Lopes, P. Miranda, L. Palma, C. Monteiro, R. Bessa, J. Sousa, C. Rodrigues, and J. Matos. EPREV - a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference, EWEC, volume 7, 2007.

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5:3, 1988.

Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85–117, 2015.

S. Sinkevicius, R. Simutis, and V. Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91–96, 2011.

Ke-Sheng Wang, Vishal S. Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61–69, 2014.

WWEA. 2014 half-year report. Technical report, WWEA, pages 1–8, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365–376, 2013.

Hao Yu and Bogdan M. Wilamowski. Levenberg-Marquardt training. Industrial Electronics Handbook, 5:12-1, 2011.


Appendix A

Hyper-parameters

Parameters for the spatial pooler

Alias                 Default  Description
columnCount           -        The number of cell columns in a cortical region.
globalInhibition      false    If true, the winning columns in the inhibition phase are selected as the most active columns from the region as a whole.
numActivePerInhArea   10       The maximum number of active columns per inhibition area.
synPermActiveInc      0.1      The amount by which an active synapse is incremented in each round, specified as a percent of a fully grown synapse.
synPermConnected      0.10     Controls the threshold at which synapses become connected.
synPermInactiveDec    0.01     The amount by which an inactive synapse is decremented in each round.
potentialRadius       16       Determines the extent of the input that each column can potentially be connected to.

Table A.1: Configuration parameters for the spatial pooler.


Parameters for the scalar encoder

Alias       Symbol  Description
w           w       Number of bits to set in the output.
minval      vmin    The lower bound of the input value.
maxval      vmax    The upper bound of the input value.
n           n       Number of bits in the representation (n must be > w).
radius      r       Inputs separated by more than or equal to this distance will have non-overlapping representations.
resolution  ψ       Inputs separated by more than or equal to this distance will have different representations.

Table A.2: Configuration parameters for the scalar encoder.
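The interplay between n, w and the bucketing can be illustrated with a simplified scalar encoder in plain Python (a sketch of the idea only, not NuPIC's ScalarEncoder; the rounding scheme is an assumption):

```python
def encode_scalar(value, minval, maxval, n, w):
    """Encode a scalar as n bits with a run of w contiguous active bits.

    Nearby values share active bits, so their representations overlap;
    values far apart map to disjoint sets of bits.
    """
    if not (minval <= value <= maxval):
        raise ValueError("value outside [minval, maxval]")
    n_buckets = n - w + 1              # distinct start positions for the run
    bucket = int(round((value - minval) / (maxval - minval) * (n_buckets - 1)))
    return [1 if bucket <= i < bucket + w else 0 for i in range(n)]

# With n = 14 and w = 5, the endpoints of the range share no bits,
# while neighbouring values overlap in most of their active bits.
low = encode_scalar(0.0, 0.0, 100.0, n=14, w=5)
high = encode_scalar(100.0, 0.0, 100.0, n=14, w=5)
```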


Parameters for the temporal memory

Alias                  Default  Description
activationThreshold    12       Activation threshold for segments.
cellsPerColumn         32       Number of cells per column.
columnCount            2048     The number of cell columns in a cortical region.
globalDecay            0.10     Decrements all synapses a little bit all the time.
initialPerm            0.11     Initial permanence value for a synapse.
inputWidth             -        Size of the input.
maxAge                 100000   Controls global decay: only segments that have not been activated for maxAge iterations are decayed, and the global decay loop runs only every maxAge iterations.
maxSegmentsPerCell     -        The maximum number of segments a cell can have.
maxSynapsesPerSegment  -        The maximum number of synapses a segment can have.
minThreshold           8        The minimum required activity for a segment to learn.
newSynapseCount        15       The maximum number of synapses added to a segment during learning.
permanenceDec          0.10     How much permanence is removed from synapses when learning occurs.
permanenceInc          0.10     How much permanence is added to synapses when learning occurs.
temporalImp            cpp/py   Controls which temporal memory implementation to use.

Table A.3: Configuration parameters for the temporal memory.


Appendix B

Wind characteristics

[Figure B.1: Wind characteristics for WF 1 and WF 2. The GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.]


[Figure B.2: Wind characteristics for WF 3-7. The GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.]


Appendix C

Error Distribution


APPENDIX C ERROR DISTRIBUTION

[Figure C.1: Error distribution for different lead times, WF 1 (NuPIC model). Histograms of forecast error for lead times 1, 10, 20, 30, 40 and 48; x-axis: error (−1.0 to 1.0), y-axis: frequency (0–70).]

[Figure C.2: Error distribution for different lead times, WF 2 (NuPIC model). Histograms of forecast error for lead times 1, 10, 20, 30, 40 and 48; x-axis: error (−1.0 to 1.0), y-axis: frequency (0–70).]

[Figure C.3: Error distribution for different lead times, WF 3 (NuPIC model). Histograms of forecast error for lead times 1, 10, 20, 30, 40 and 48; x-axis: error (−1.0 to 1.0), y-axis: frequency (0–70).]

[Figure C.4: Error distribution for different lead times, WF 4 (NuPIC model). Histograms of forecast error for lead times 1, 10, 20, 30, 40 and 48; x-axis: error (−1.0 to 1.0), y-axis: frequency (0–70).]

[Figure C.5: Error distribution for different lead times, WF 5 (NuPIC model). Histograms of forecast error for lead times 1, 10, 20, 30, 40 and 48; x-axis: error (−1.0 to 1.0), y-axis: frequency (0–70).]

[Figure C.6: Error distribution for different lead times, WF 6 (NuPIC model). Histograms of forecast error for lead times 1, 10, 20, 30, 40 and 48; x-axis: error (−1.0 to 1.0), y-axis: frequency (0–70).]

[Figure C.7: Error distribution for different lead times, WF 7 (NuPIC model). Histograms of forecast error for lead times 1, 10, 20, 30, 40 and 48; x-axis: error (−1.0 to 1.0), y-axis: frequency (0–70).]

[Figure C.8: Error distribution for different lead times, WF 1 (Expektra model). Histograms of forecast error for lead times 1, 10, 20, 30, 40 and 48; x-axis: error (−1.0 to 1.0), y-axis: frequency (0–70).]

[Figure C.9: Error distribution for different lead times, WF 2 (Expektra model). Histograms of forecast error for lead times 1, 10, 20, 30, 40 and 48; x-axis: error (−1.0 to 1.0), y-axis: frequency (0–70).]

[Figure C.10: Error distribution for different lead times, WF 3 (Expektra model). Histograms of forecast error for lead times 1, 10, 20, 30, 40 and 48; x-axis: error (−1.0 to 1.0), y-axis: frequency (0–70).]

[Figure C.11: Error distribution for different lead times, WF 4 (Expektra model). Histograms of forecast error for lead times 1, 10, 20, 30, 40 and 48; x-axis: error (−1.0 to 1.0), y-axis: frequency (0–70).]

[Figure C.12: Error distribution for different lead times, WF 5 (Expektra model). Histograms of forecast error for lead times 1, 10, 20, 30, 40 and 48; x-axis: error (−1.0 to 1.0), y-axis: frequency (0–70).]

[Figure C.13: Error distribution for different lead times, WF 6 (Expektra model). Histograms of forecast error for lead times 1, 10, 20, 30, 40 and 48; x-axis: error (−1.0 to 1.0), y-axis: frequency (0–70).]

[Figure C.14: Error distribution for different lead times, WF 7 (Expektra model). Histograms of forecast error for lead times 1, 10, 20, 30, 40 and 48; x-axis: error (−1.0 to 1.0), y-axis: frequency (0–70).]


List of Figures

2.1 A figure that presents the general outline when forecasting using the statistical approach 6

2.2 A figure that presents the general steps when forecasting using a physical model 7

3.1 The perceptron 20

3.2 Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of the H hidden layers, as well as M input connections. Each edge seen in this graph has a weight wij associated with it 21

3.3 Information flow of a single region predictive model created with the OPF 23

3.4 The CLAClassifier 28

3.5 Training an OPF model 29

4.1 Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) 31

4.2 Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) 32

4.3 Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) 32

4.4 Different error measurements for WF 1 33
4.5 Different error measurements for WF 2 34
4.6 Different error measurements for WF 3 35
4.7 Different error measurements for WF 4 36
4.8 Different error measurements for WF 5 37
4.9 Different error measurements for WF 6 38
4.10 Different error measurements for WF 7 39

4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point: the network when no channel has been exposed to noise. ws = wind speed, hours/week = timestamps, u, v is a directional vector of the wind 41

4.12 Plot showing the unoptimized version vs the optimized one when training a network with 10 input neurons, 10, 15, 20, 25 hidden neurons, and 1 output neuron. Training was done for 100 epochs using LM 42

4.13 Summarized average improvement over all wind farms, with 95% confidence intervals 43

B.1 Wind characteristics for WF 1 and WF 2. The GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power 59

B.2 Wind characteristics for WF 3-7. The GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power 60

C.1 Error distribution for different lead times, WF 1 62
C.2 Error distribution for different lead times, WF 2 63
C.3 Error distribution for different lead times, WF 3 64
C.4 Error distribution for different lead times, WF 4 65
C.5 Error distribution for different lead times, WF 5 66
C.6 Error distribution for different lead times, WF 6 67
C.7 Error distribution for different lead times, WF 7 68
C.8 Error distribution for different lead times, WF 1 69
C.9 Error distribution for different lead times, WF 2 70
C.10 Error distribution for different lead times, WF 3 71
C.11 Error distribution for different lead times, WF 4 72
C.12 Error distribution for different lead times, WF 5 73
C.13 Error distribution for different lead times, WF 6 74
C.14 Error distribution for different lead times, WF 7 75

List of Tables

3.1 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, data with power observations are available for updating the models 16

3.2 Preprocessed features from the dataset. The Forecast category represents the forecasts given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available contains the features we will use in training and testing 17

3.3 Example where n = 14, r = 5, ψ = 1 of various encoded scalar values using a ScalarEncoder 24

4.1 NRMSE score of the entries published in [Hong et al., 2014]. The NuPIC model and the Expektra model are added so we can easily compare the results 40

A.1 Table containing configuration parameters for the spatial pooler 55
A.2 Table containing configuration parameters for the scalar encoder 56
A.3 Table containing configuration parameters for the temporal memory 57


www.kth.se


Short-term wind power forecasting using artificial neural networks

Närtidsprognos av vindkraftsproduktion genom användandet av artificiella neurala nätverk

MORGAN SVENSSON

Master's Thesis

at the School of Computer Science and Communication

Royal Institute of Technology, KTH
Machine Learning Programme

KTH Supervisor: Pawel Herman
Examiner: Anders Lansner

Expektra Supervisor: Niclas Ehn

June 2015

Abstract

Wind power has seen a tremendous growth in recent years and is expected to grow even more in years to come. In order to better schedule and utilize this energy source, good forecasting techniques are necessary. This thesis investigates the use of artificial neural networks for short-term wind power prediction. It compares two different networks: the so-called Multilayer Perceptron and the Hierarchical Temporal Memory / Cortical Learning Algorithm. These two networks are validated and compared on a benchmark dataset published in the Global Energy Forecasting Competition, a competition used for short-term wind power prediction. The results of this study show that the Multilayer Perceptron is able to compete with previously published models and that the Hierarchical Temporal Memory / Cortical Learning Algorithm is able to beat the reference model.

Keywords: neural networks, wind power generation, forecasting, machine learning

Referat

Närtidsprognos av vindkraftsproduktion genom användandet av artificiella neurala nätverk

Wind power is currently the fastest-growing renewable energy source in the world, and with this growth it is important that we develop good forecasting tools. This degree project investigates the use of artificial neural networks applied to short-term forecasts of wind power production. The algorithms investigated are built on the so-called multilayer perceptron (MLP) and the hierarchical temporal memory (HTM/CLA). These methods are validated and compared using data published in GEFCom, a competition for energy forecasting. The results of the study show that the MLP method can compete with other published methods and that HTM/CLA can beat the reference model.

Keywords: multilayer perceptron, machine learning, hierarchical temporal memory

Acknowledgements

I wish to thank first and foremost my great parents, Yvonne Svensson and Benkt Svensson, for always being there to support my interests.

Secondly, I want to thank my supervisor Pawel Herman, not just for the help I have received during this thesis but for the things I have learned from him studying at KTH.

I also want to thank the people at Expektra: Niclas Ehn, Gustav Bergman, Mattias Jonsson, Andreas Johansson, Per Åslund and Joel Ekelöf, for introducing me to their area of expertise and energy forecasting in general.

A special thanks to my good friends and classmates Andrea de Giorgio and Vanya Avramova for all the ideas and discussions we have shared.

MORGAN SVENSSON - SUMMER 2015

Nomenclature

Indices, Constants and Variables

X_{1:n}            A sequence of values X = [x_1, x_2, ..., x_n]
k = 1, ..., k_max  Lead time or look-ahead time
k_max              Maximum prediction horizon
N                  Total number of data points
e                  Prediction error
ε                  Normalized prediction error
p_t                Measure of power generation at time t
p̂_{t+k|t}          Forecast power generation made at time t for look-ahead time t+k
w_ij               Weight of a synapse in the neural network, row i, layer j
b                  Binary value
s                  Weighted sum including bias of a perceptron
x · y              Dot product between x and y
x ∘ y              Row-by-row element-wise multiplication

Unit of measurements

MW  Megawatts
GW  Gigawatts

Notes

Keys and dates are given in this form: Year Month Day Hour

Contents

1 Introduction 1
1.1 Problem Formulation 3
1.2 The scope of the problem 4

2 Background 5
2.1 Neural Networks and Time Series Prediction 8

3 Method and Materials 11
3.1 Preliminaries 11
3.1.1 Remarks 11
3.1.2 Definitions 12
3.1.3 Reference models 13
3.1.4 Error metrics 13
3.1.5 Model selection 15
3.1.6 Evaluation 15
3.2 Experiments 16
3.3 Holdback Input Randomization 18
3.4 Optimization methods 18
3.5 Neural Networks 19
3.5.1 Multilayer Perceptron 19
3.5.2 Numenta Platform for Intelligent Computing 22

4 Result 31
4.1 Experimental results 40
4.2 Input Importance 41
4.2.1 Adaptation and Optimization 42
4.3 Summary 42

5 Discussion 45
5.1 Method development issues 45
5.2 Future improvements and directions 46
5.3 Conclusions 47

Bibliography 49

Appendices 53

A Hyper-parameters 55

B Wind characteristics 59

C Error Distribution 61

List of Figures 76

List of Tables 78

Chapter 1

Introduction

It has been estimated by the World Wind Energy Association (WWEA) that by the year 2020 around 12% of the world's electricity will be available through wind power, making wind energy one of the fastest growing energy resources [WWEA 2014, Fan et al 2009], but integrating wind energy into existing electricity supply systems has been a challenge, and numerous objections have been put forward by traditional energy suppliers and grid operators, especially for large-scale use of this energy source. The biggest concern is that availability mainly depends on meteorological conditions, and production cannot be adjusted as conveniently as with other, more conventional energy sources, because of our inability to control the wind. A single Wind Turbine (WT) is highly variable and its dependency on wind conditions can result in zero output for more than a thousand hours over the course of a year; however, aggregating wind power generation over bigger areas decreases this chance.

This is where wind power forecasting systems come into play, a technology that can greatly improve the integration of wind energy into electricity supply systems, as forecasting systems provide information on how much wind power can be expected at any given point within the next few days. This removes some of the randomness attributed to wind energy and allows a more accurate way to utilize this clean energy source, while offsetting some of our dependency on other, less environmentally friendly sources, which in the long run will cause a smaller degenerative impact on the environment.

There are many commercial forecasting models available. Prediktor1 [Landberg and Watson 1994] is a physical model developed by the Risø National Laboratory, Denmark. It is constructed to refine Numerical Weather Prediction (NWP) in order

1 http://www.prediktor.no

1

CHAPTER 1 INTRODUCTION

to transform the data through a power curve to produce the forecast, while improving the error rate with Model Output Statistics (MOS). Previento [Focken et al 2001], developed at the University of Oldenburg, Germany, uses a similar approach to Prediktor but with regional forecasting and uncertainty estimation. The Wind Power Prediction Tool (WPPT)2 [Nielsen et al 2002] is a statistical model developed by the Technical University of Denmark; it consists of a semi-parametric power curve model for wind farms, taking into account both wind speed and direction. It uses dynamical prediction models describing the dynamics of the wind power and any diurnal variations. Zephyr [Giebel et al 2000] is a hybrid model that is a combination of both WPPT and Prediktor; in this model each wind farm is assigned a forecast model according to the available data. Sipreólico [Gonzalez et al 2004], developed by Red Eléctrica de España, is a statistical model that was designed to be highly flexible depending on available data. It achieves this by switching between 9 different models. Aiolos Wind3 is a hybrid model developed by Vitec that creates forecasts by combining a statistical model with physical factors, such as wind speed at different altitudes, wind direction and air density.

Expektra4 is a service-based company, founded in 2010, that provides and develops a new method for short-term demand and supply forecasting based on an Artificial Neural Network (ANN) and traditional time series analysis. They are currently expanding into the area of wind power forecasting and looking for suitable methods. ANNs have been used successfully for both wind speed forecasting [Lawan et al 2014] and wind power forecasting [Kariniotakis et al 1996]. It was demonstrated in Liu et al [2012] that a complex-valued recurrent neural network was able to predict output with high accuracy, and Huang et al [2015] showed that optimizing the initial weights of a back-propagated neural network with a genetic algorithm gave impressive results, so there are good reasons to think that part of the method Expektra is developing can be used for wind power forecasting.

Neural networks in general are highly flexible, and recent advancements in deep learning have been shown to outperform previous models in different domains [Schmidhuber 2015]. These networks are very good at automatically finding features that by hand would take a lot of time and effort to achieve. One network that shares similarities with deep learning and has received less attention is the Hierarchical Temporal Memory (HTM) Cortical Learning Algorithm (CLA) developed by Numenta [Numenta 2011]. This network is also built around the idea of having hierarchical structures, creating a deep neural network. HTM/CLA is

2 http://www.enfor.eu
3 http://www.vitecsoftware.com
4 http://www.expektra.se

2

1.1 PROBLEM FORMULATION

currently tailored very specifically for time-series problems and has, at the moment, little research published around it, making it an ideal candidate for time series prediction, with unknown potential on wind power forecasting problems.

Even though there are a lot of prominent methods already developed for wind power forecasting, there is still room for improvement. Energy forecasting is such an important topic that competitions have been developed. The Global Energy Forecasting Competition (GEFCom) [Hong et al 2014] is a competition that can be used to help evaluate the performance of new forecasting models. It is a competition that has attracted hundreds of contestants from all over the world, which has resulted in the contribution of many new and novel ideas. The GEFCom was created in order to:

1. Bring together state-of-the-art techniques for energy forecasting

2. Bridge the gap between academic research and industry practice

3. Promote analytical approaches in power energy education

4. Prepare the industry to overcome forecasting challenges posed by the smart grid world

5. Improve energy forecasting practices

Benchmark datasets and competitions are a valuable resource when evaluating new models, as they allow for a common ground to stand on. With the publication of Hong et al [2014], the dataset in GEFCom2012 was also published. This dataset consists of data from 7 different wind farms that span a time period of three years. It consists of observational data from the energy production and weather forecasts.

The GEFCom is divided into four tracks: load forecasting, price forecasting, wind power forecasting and solar power forecasting. The specified problem in the wind power forecasting track is built around the real-time operation of wind farms, and the dataset is structured accordingly. Participants try to predict hourly power generation up to 1-48 hours ahead, given meteorological forecasts and historically produced power.

1.1 Problem Formulation

The objective of this thesis is to adapt and analyse the robustness and suitability of Expektra's method when it is applied to the domain of forecasting short-term wind

3


power generation, i.e. what is needed to adapt this model to wind power forecasting problems?

The second objective of this thesis will be to investigate HTM/CLA [Numenta 2011], a modern state-of-the-art computational theory of the neocortex, i.e. is it possible to use the Numenta Platform for Intelligent Computing (NuPIC) for wind power forecasting problems?

These questions will be addressed by evaluating the models against other models published in the area of short-term wind power forecasting, more specifically those that have been published in GEFCom.

1.2 The scope of the problem

1. This study will focus on short-term forecasting, i.e. forecasts done for 1-48 hours ahead. How to perform well on longer forecasts is left to further investigation.

2. Wind power forecasting is closely related to wind speed forecasting, i.e. trying to use local information at the wind turbine instead of using NWP data to forecast wind power. This thesis will not go into any specific details on how to forecast wind speeds; readers can take a look at Li and Shi [2010], Cadenas and Rivera [2009] and Akinci [2015].

3. There are a lot of neural networks that are worth studying, but this thesis will focus on Expektra's method and HTM/CLA inside NuPIC. NuPIC was picked over other deep learning methods because of its direct relation to time series prediction.

4

Chapter 2

Background

Wind Power Forecasting (WPF) has applications in generation and transmission maintenance planning and energy optimization, as well as in energy trading. WPF models exist at different scales, and they can be used to predict the production of a single WT or a whole Wind Farm (WF). WPF models are generally divided into two main groups, physical and statistical, but hybrid state-of-the-art methods are also common [Giebel et al 2001, Lang et al 2006]. The physical approach [Landberg and Watson 1994, Gaertner et al 2003] focuses on integrating well-known physical aspects into the model, such as information about the surrounding terrain and properties of the WT. These models try to get as good an estimate of local wind speed as possible before finally reducing the remaining error with some form of MOS. Statistical approaches [Rodrigues et al 2007, Gonzalez et al 2004] rely more on historical observations and their statistical relation to meteorological predictions, as well as measurements from Supervisory Control And Data Acquisition (SCADA) systems.

SCADA is a control system that is used in most industrial processes and is the main tool to evaluate the state of individual power plants. As the name indicates, it is not a full control system but rather a system for supervision. In the case of wind turbines, it allows remote access to online data that has been gathered by sensors inside and around the plant. Variables accessible through these systems are those directly related to the wind flow, as well as the produced power generated by the plant, i.e. things like rotor wind speed, nacelle position, pitch angle, active power, etc.

5

CHAPTER 2 BACKGROUND

Figure 2.1: A figure that presents the general outline when forecasting using the statistical approach.

Forecast models that use SCADA data as their primary input source usually have good forecast accuracy, at least for the first few hours, but they tend to be less useful for longer prediction horizons [Giebel et al 2011]. SCADA data can also be used to detect problems in WTs, something that can be helpful for improving the reliability of WTs [Yang et al 2013, Wang et al 2014].

Statistical models, seen in figure 2.1, are usually built around Numerical Weather Prediction (NWP) and SCADA data. NWP models are often used to build forecasting models, as they introduce weather forecasts for the region where the wind turbines are located. NWP data usually contains information about things like wind speed, wind direction, temperature and humidity. These models are operated two or four times a day by a number of large weather services. The main forecasts usually start at 00 and 12 UTC, corresponding to the world radiosonde launching, which is the only direct observation of the atmospheric state and has been the backbone of atmospheric monitoring for many decades1; extra forecasts usually start at 06 and 18 UTC. Physical models, such as the one seen in figure 2.2, include additional information about physical characteristics of the wind turbine and its surroundings, i.e. terrain data, information about obstacles, and the capacity and layout of the site where the turbine is located, and so on. Other useful information for physical models includes the theoretical

1 One problem with this approach is that it results in less information over large oceans and poorer countries.

6

power curve: how much power is expected to be produced given a specific wind speed.

The time scale of WPF methods is generally divided into 3 main groups: very short-term (up to 9 hours), short-term (up to 72 hours) and medium-term (up to 7 days) [Costa et al 2008], while the time step for these models is in the range of seconds to days, depending on the application. Very short-term models used for wind power forecasting consist of statistical methods like Kalman filters, Auto-Regressive Moving Average (ARMA), Auto-Regressive with Exogenous Input (ARX), Box-Jenkins, etc. Inputs to these models are historical observations of wind speed, wind direction, temperature, etc., and common applications for this forecast horizon include things like intraday market trading. Since these methods are based merely on past production, they are generally not useful for longer horizons.

[Figure 2.2 flow chart: SCADA data, NWP data, WFC data and the WT power curve feed a physical model through downscaling, transformation to hub height, spatial refinements, conversion to power and Model Output Statistics (MOS), producing the wind power generation forecast.]

Figure 2.2: A figure that presents the general steps when forecasting using a physical model.

Short-term forecasting models include additional data about historical observations, usually obtained from SCADA systems, but more importantly weather forecasts from NWP models. Machine learning methods used in this area include Neural Networks [Kusiak et al 2009], Support Vector Machines [Fugon et al 2008], Nearest Neighbour Search [Jursa et al 2007], Random Forests, etc. Medium-term forecasting models usually incorporate all the methods used for shorter forecasts, as well as physical characteristics.

7


2.1 Neural Networks and Time Series Prediction

One of the most common ANNs is the so-called Multilayer Perceptron (MLP), a network built around some very simple properties of a cortical neuron. This network has been around since the 50s and has matured with a solid mathematical foundation. It has been applied successfully to many different applications. It is built around the idea of having multiple layers of neurons, each layer feeding information forward to the next layer, i.e. a feed-forward neural network.

The main advantage of using multiple layers instead of just a single one is to overcome limitations pointed out by Minsky and Selfridge [1960] and Minsky and Seymour [1969], where a single-layer perceptron is only capable of learning linearly separable patterns and is thus unable to learn all functions. The MLP is not limited in this way and has been shown to be able to represent a wide variety of functions given appropriate parameters [Hornik et al 1989].

NuPIC is a platform actively developed and maintained by Numenta. It is a platform that introduces a collection of ideas and algorithms inspired by the structural organization of the neocortex. Two of the core concepts found inside NuPIC are the so-called Hierarchical Temporal Memory and the Cortical Learning Algorithm. HTM was introduced in Hawkins and Blakeslee [2007] and refers to a hierarchy of cortical regions in the brain.

The neocortex is classically divided into 6 different layers2; the current implementation in NuPIC is focused on emulating layers 3-4 (specifically layer 3), with extensions and research code being done to include more layers.

A CLA region consists of a collection of columns, and each column consists of a handful of cells. This structure is based on the minicolumn hypothesis [Buxhoeveden and Casanova 2002]. Each column in the CLA region has its own semantic meaning, and the sparse activity of a handful of active columns will tell us something about the input. The CLA is modelled so that specific cells within each column reflect a temporal context of a pattern, and a single cortical region (a CLA region) is trained with the CLA algorithm.

A typical CLA region consists of around 2K columns containing around 60K neurons in total, while a typical MLP may have less than 100 neurons. Each neuron in an HTM network grows new synapses over time, and it is not uncommon to have around 5K synapses per neuron, meaning we would have around 300M synapses in total. A single region of this size uses around 100 MB of memory, and it takes around 10 ms to do one inference and learning step3.

2 These layers should not be confused with the hierarchy of regions.
3 These values are given in the talk "Sensor-Motor Integration in the Neocortex" at the 2013 Hackathon.

8

2.1 NEURAL NETWORKS AND TIME SERIES PREDICTION

The HTM/CLA also differs from the MLP in that it does not directly use scalar weights to represent synaptic connectivity. Synapses inside an HTM have binary weights with a scalar permanence: each synapse will either be connected or disconnected, whereas the MLP uses scalar weights. The connectedness in HTM/CLA is based on the permanence, which is a value between 0.0 and 1.0. If the permanence is over a certain threshold, we have a connected synapse, i.e. we have weights of either 1 or 0. This binary mechanism is there to simulate synapses that are able to form and unform during learning. So essentially we have a weight-change network vs a wiring-change network [Chklovskii et al 2004].
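The permanence mechanism can be illustrated with a small sketch; the threshold and increment/decrement values below are illustrative choices, not Numenta's defaults:

```python
import numpy as np

def connected_weights(permanence, threshold=0.2):
    """Binary connectivity: a synapse is connected (weight 1) only if its
    scalar permanence exceeds the threshold, otherwise disconnected (0)."""
    return (permanence >= threshold).astype(np.int8)

def learn(permanence, active_inputs, increment=0.05, decrement=0.01):
    """Hebbian-style permanence update: potential synapses aligned with the
    active inputs are reinforced, the rest decay; values stay in [0, 1]."""
    delta = np.where(active_inputs, increment, -decrement)
    return np.clip(permanence + delta, 0.0, 1.0)

# A column with 5 potential synapses onto a binary input vector.
perm = np.array([0.18, 0.25, 0.10, 0.30, 0.19])
active = np.array([1, 1, 0, 0, 1], dtype=bool)

print(connected_weights(perm))   # connectivity before learning
perm = learn(perm, active)
print(connected_weights(perm))   # synapses can form and unform over time
```

Note how a single learning step can switch a synapse between connected and disconnected, which is the "wiring change" described above.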

NuPIC today is naturally geared towards time-series prediction, which makes it an interesting candidate for wind forecasting. NuPIC has been shown to outperform the MLP on financial time series [Gabrielsson et al 2013], and it has been used successfully to balance traffic [Sinkevicius et al 2011]. It has also been used on the NASA Aviation Dataset [Lee and Rajabi 2014], but with less promising results. MLPs have been used successfully for many kinds of time series problems [Azoff 1994, Niska et al 2004, Jain and Kumar 2007].

9

Chapter 3

Method and Materials

The purpose of this chapter is to provide an explanation of the method used in this thesis. It will present terms and concepts needed and used in the field of Machine Learning, with a focus on how to create forecasts based on time series. It presents the motivation for why these methods are used, as well as how they are used. This chapter also contains a description of the implemented artificial neural network, how it is structured and optimized. It provides details about the experimental setup and how these experiments relate to GEFCom, and finally a description of how the datasets are structured and processed.

3.1 Preliminaries

3.1.1 Remarks

The methodology used to evaluate the prediction models presented in this thesis is based on the protocol presented in Madsen et al [2005]1, which is a complete protocol that can be used to evaluate the performance of short-term WPF and Wind-to-Power (W2P) models. The reason this protocol was chosen for this thesis is that it has successfully been used before as a guideline to evaluate a wide variety of forecast models, such as AWPPS, Prediktor, Previento, Sipreólico and WPPT. The protocol has been used for both on-shore and off-shore wind farms and was developed in the frame of the ANEMOS research project [Kariniotakis et al 2006], which brought together many relevant groups involved in the field. The aim of ANEMOS was to develop accurate and robust models that substantially outperform current state-of-the-art methods, and one of its goals was to establish a common set

1 It should be pointed out that no widely agreed standardization exists.

11

CHAPTER 3 METHOD AND MATERIALS

of performance measures that can be used to compare forecasts across systems and locations.

3.1.2 Definitions

Time series

A time series is a sequence X of observations x_t, each with a particular time-stamp t, i.e. X = [x_1, x_2, ..., x_t], or in short X_{1:t}; it is a time-dependent collection of variables.

Forecast horizon

A multi-step-ahead prediction is the assignment of predicting k forecasts X_{t+1:t+k} given a collection of historical observations X_{t-p+1:t}. In the case of GEFCom we want a forecast for 1-48 steps ahead.

Point or spot forecast

In this thesis we model the forecast p̂_{t+k|t} as a so-called point forecast (or spot forecast), i.e. a single value for each forecast (as opposed to a probability distribution).

Prediction error

The prediction error e for lead time t+k is defined in equation 3.1 as the difference between the forecast and the actual value, where t denotes the time index, k is the look-ahead time, p is the actual (measured, true) wind power and p̂ is the predicted wind power:

e_{t+k|t} = p_{t+k} - p̂_{t+k|t}    (3.1)

and the normalized prediction error ε is given in equation 3.2:

ε_{t+k|t} = (1/p_inst) e_{t+k|t} = (1/p_inst) (p_{t+k} - p̂_{t+k|t})    (3.2)

where p_inst is the installed capacity of the wind farm (in kW or MW), which is unknown in the competition.

12

3.1 PRELIMINARIES

3.1.3 Reference models

Persistence (also called a naïve or plain predictor), as seen in equation 3.3, is the model most commonly used for benchmarking WPF models. This simple model states that the future wind generation value will be the same as the last measured value:

p̂_persistence_{t+k|t} = p_t    (3.3)

An alternative would be to use an even simpler model, a climatology prediction (equation 3.4), i.e. predicting the mean, a value that would be approximated from the training set (see section 3.1.5):

p̂_mean_{t+k|t} = p̄ = (1/N) Σ_{t=1}^{N} p_t    (3.4)

There are other reference models, like the one suggested by Nielsen et al [1998], which have advantages over persistence, but that model was never widely adopted and is not used by GEFCom.
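Both reference models are trivial to implement; a minimal sketch of the two (function names are our own, not from the thesis):

```python
def persistence_forecast(p_t, k_max):
    """Persistence: every look-ahead step k gets the last measured value."""
    return [p_t] * k_max

def climatology_forecast(training_power, k_max):
    """Climatology: every look-ahead step k gets the training-set mean."""
    mean = sum(training_power) / len(training_power)
    return [mean] * k_max

history = [10.0, 15.0, 11.0]                 # measured power up to time t
print(persistence_forecast(history[-1], 4))  # repeats the last value, 11.0
print(climatology_forecast(history, 4))      # repeats the mean, 12.0
```

The persistence model is hard to beat for the first few lead times, while climatology becomes the stronger baseline as the horizon grows, which is why both are worth keeping around.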

3.1.4 Error metrics

In order to understand the reasons why specific models perform well, it is usually a good idea to evaluate them against a wide variety of different criteria, as is emphasised by Kariniotakis [1997]. The following sections describe the error measures used; in these sections, N denotes the size of the test set.

Forecast Bias

The Normalized Forecast Bias (NBIAS) describes the systematic error and is defined in equation 3.5; it is estimated by calculating the average error for each step ahead. It gives an indication of the direction of the error:

NBIAS_k = (1/N) Σ_{t=1}^{N} ε_{t+k|t}    (3.5)

This bias is sometimes referred to by some authors as the Mean Forecast Error (MFE). A perfect MFE does not mean the forecast is perfect and contains no error, as positive and negative errors cancel each other out, but it gives an indication that the forecasts are on target.

13

CHAPTER 3 METHOD AND MATERIALS

Mean Absolute Error

The Normalized Mean Absolute Error (NMAE) is an error quantity that looks at the average of the absolute error of the prediction and is defined in equation 3.6:

NMAE_k = (1/N) Σ_{t=1}^{N} |ε_{t+k|t}|    (3.6)

Another common name used instead of Mean Absolute Error (MAE) is Mean Absolute Deviation (MAD). This value shows the magnitude of the overall error that has occurred due to forecasting and thus should be as small as possible. This error is also scale dependent and will be affected by data transformations and the scale of measurements.

Mean Squared Error

The Normalized Mean Squared Error (NMSE) is an error quantity that looks at the average of the squared errors ε²_{t+k|t} and is built using the Normalized Sum of Squared Errors (NSSE) defined in equation 3.7:

NSSE_k = Σ_{t=1}^{N} ε²_{t+k|t}    (3.7)

NMSE is defined in equation 3.8:

NMSE_k = (1/N) NSSE_k = (1/N) Σ_{t=1}^{N} ε²_{t+k|t}    (3.8)

In this error, positive and negative errors do not cancel each other out, and large individual errors are penalized more harshly.

Root Mean Squared Error

The Normalized Root Mean Squared Error (NRMSE) is an error quantity that takes the square root of the NMSE; this error is defined in equation 3.9:

NRMSE_k = NMSE_k^{1/2} = ( (1/N) Σ_{t=1}^{N} ε²_{t+k|t} )^{1/2}    (3.9)

NRMSE is the main metric that is used in GEFCom, and it shares the same properties as NMSE.
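The metrics above are straightforward to compute for one look-ahead time k; a small numpy sketch (function names are our own):

```python
import numpy as np

def normalized_errors(p_true, p_pred, p_inst):
    """Normalized prediction errors eps for one look-ahead time k (eq. 3.2)."""
    return (np.asarray(p_true) - np.asarray(p_pred)) / p_inst

def nbias(eps):
    """Eq. 3.5: systematic error; the sign shows the direction of the error."""
    return float(np.mean(eps))

def nmae(eps):
    """Eq. 3.6: average magnitude of the error."""
    return float(np.mean(np.abs(eps)))

def nrmse(eps):
    """Eq. 3.9: the main GEFCom metric; penalizes large errors harder."""
    return float(np.sqrt(np.mean(eps ** 2)))

eps = normalized_errors([5.0, 7.0, 6.0], [4.0, 8.0, 6.0], p_inst=10.0)
print(nbias(eps), nmae(eps), nrmse(eps))
```

In this toy example the bias is zero (the +0.1 and -0.1 errors cancel), while NMAE and NRMSE are both non-zero, illustrating why the bias alone cannot certify a good forecast.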

14


3.1.5 Model selection

In regression and classification, one of the main issues we are faced with is: "How do we create a good model?" One way to define this "goodness" would be to look at the model's generalization ability, i.e. its performance on unseen data.

To make sure the model we are building will generalize to new unseen data, it is important how we select and build the model in the first place; what we have control over is the data we have at hand and how we use that data to build our model.

Training, testing and validating

In the case of supervised learning, we have access to a target series and its associated features, i.e. the production series (our target) with wind speed and wind direction as features.

Common practice within Machine Learning, which is adhered to in this thesis, is to split the whole dataset into 3 smaller subsets, then use two of these subsets to find a good model (i.e. the training set and validation set) while the remaining subset (the test set) is used for evaluation, i.e. to estimate how accurate the model we have created would be on "unseen data". With highly flexible models like an artificial neural network, we need to be careful not to overfit the data, which is one of the reasons why we have the validation set: we do not want to create a model that does not generalize well because it is fitted to every minor variation, i.e. it has captured a lot of noise.
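For time series this split must be chronological rather than shuffled, so the test set really is "future" data. A minimal sketch (the 60/20/20 fractions are illustrative; the actual split in this thesis follows the GEFCom dates):

```python
def chronological_split(series, train_frac=0.6, val_frac=0.2):
    """Split a time series into train/validation/test without shuffling,
    so evaluation is always on data that lies after the fitting period."""
    n = len(series)
    i = int(n * train_frac)
    j = int(n * (train_frac + val_frac))
    return series[:i], series[i:j], series[j:]

data = list(range(10))
train, val, test = chronological_split(data)
print(train, val, test)
```

Shuffling before splitting would leak future information into training, making the test score an overly optimistic estimate of forecast accuracy.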

3.1.6 Evaluation

The improvement with respect to a considered reference model ref is defined in equation 3.10:

I^ref_{EC,k} = 100 · (EC_{ref,k} - EC_k) / EC_{ref,k}  (%)    (3.10)

where the Evaluation Criterion (EC) is any error measurement, such as NMSE or NMAE, etc.

15


Testing period

Date        Time   Forecast
2011-01-01  01:00  1-48 hours
2011-01-04  13:00  1-48 hours
2011-01-08  01:00  1-48 hours
2011-01-11  13:00  1-48 hours
...
2012-06-23  01:00  1-48 hours
2012-06-26  13:00  1-48 hours

Table 3.1: The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, power observations for the missing data are available for updating the models.

3.2 Experiments

Training and testing are structured based on the structure of the GEFCom. The dataset spans from midnight on the 1st of July 2009 to noon on the 26th of June 2012. The period from the 1st of July 2009 to the 1st of January 2011 at 01:00 is used for training and validating, while the rest is used for testing. In the testing range a number of 48-hour periods are defined (see table 3.1); in between these testing periods exists additional training data, which enables the models to be updated between forecasts.

The testing periods repeat every 7 days until the end of the dataset, and only meteorological forecasts that were relevant for the periods with missing power data are given; this was done in order to be consistent.

Meteorological forecasts in the GEFCom dataset consist of zonal u and meridional v components collected for surface winds at 10 m above ground level. They were extracted from the archive of the European Centre for Medium-range Weather Forecasts (ECMWF), a service that issues high-resolution deterministic forecasts twice a day, at 00 Coordinated Universal Time (UTC) and 12 UTC; each of these forecast periods consists of data for 1-48 hours ahead. In order to match the hourly resolution of the power measurements, forecasts were interpolated using cubic splines to have an hourly resolution. A summary of the features found in the dataset is given in table 3.2.
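As an illustration of how such features can be derived, the sketch below converts u and v components to wind speed and meteorological direction, and interpolates a coarse forecast series to hourly resolution with scipy's cubic splines (the 3-hourly example values are made up):

```python
import numpy as np
from scipy.interpolate import CubicSpline

def wind_speed_direction(u, v):
    """Wind speed and meteorological direction (degrees, the direction the
    wind blows FROM) derived from zonal u and meridional v components."""
    ws = np.hypot(u, v)
    wd = (180.0 + np.degrees(np.arctan2(u, v))) % 360.0
    return ws, wd

# Interpolate a coarse (here 3-hourly) forecast series to hourly resolution.
t_coarse = np.array([0.0, 3.0, 6.0, 9.0])
u_coarse = np.array([2.0, 3.5, 1.0, -0.5])
spline = CubicSpline(t_coarse, u_coarse)
u_hourly = spline(np.arange(0.0, 10.0, 1.0))
print(u_hourly.round(2))
```

The spline passes exactly through the issued values and fills in smooth hourly estimates in between, which matches the pre-processing described above.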

16

3.2 EXPERIMENTS

No  Category  Parameter             Alias  Type

1   Date      Date                  date   String
2   Date      Year                  year   Integer
3   Date      Month                 month  Integer
4   Date      Day                   day    Integer
5   Date      Hour                  hours  Integer
6   Date      Week                  week   Integer
7   Forecast  Wind Speed            ws     Real
8   Forecast  Wind Direction (deg)  wd     Real
9   Forecast  Wind U                u      Real
10  Forecast  Wind V                v      Real
11  Forecast  Issued                hp     Integer
12  SCADA     Production            wp     Real

Table 3.2: Preprocessed features from the dataset. The Forecast category represents the forecast given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the feature set we will use in training and testing.


CHAPTER 3 METHOD AND MATERIALS

The database of meteorological forecasts given by the GEFCom dataset contains sections of missing data, each corresponding to the dates for which the missing power information exists. These sections were filled out, in a pre-processing step, with the previous best forecast that was available. For example, if we do not have data for the forecast issued at 2011-01-01 12:00, we use data from 2011-01-01 00:00, as a 48-hours-ahead forecast is available for this date. If the previous section also contained missing data, we would go back an additional section and use those forecasts. If no forecast is available 48 hours back, we use the best known forecast and extend it in the same fashion as the persistence model.
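The fallback rule can be sketched as follows; the issue times and values are hypothetical, and the real data uses full 48-hour forecast trajectories:

```python
# Hypothetical forecast issues, 12 hours apart; None marks a missing issue.
issues = {
    "2011-01-01 00:00": [5.0, 5.2, 5.1],  # available forecast (48 h in reality)
    "2011-01-01 12:00": None,             # missing issue
    "2011-01-02 00:00": None,             # missing issue
}
order = list(issues)

def best_available(issue_time):
    """Return the forecast for issue_time, stepping back to earlier issues."""
    i = order.index(issue_time)
    while i >= 0:
        if issues[order[i]] is not None:
            return issues[order[i]]
        i -= 1                            # go back one issue (12 h earlier)
    return None                           # nothing to fall back on
```

Calling `best_available("2011-01-02 00:00")` walks back through the missing issues and reuses the forecast from the 00:00 issue.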

Any predictions made by these models should fall within a certain range, so any forecast outside this range will be clamped. This is the main post-processing step being done, i.e. there is an upper limit on what can be produced.
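A minimal sketch of the clamp, assuming power is normalized to the range [0, 1]:

```python
import numpy as np

# Hypothetical raw model output in normalized power units.
raw_forecast = np.array([-0.07, 0.42, 0.95, 1.13])

# Clamp to the physically possible range; in-range values pass unchanged.
clamped = np.clip(raw_forecast, 0.0, 1.0)
```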

3.3 Holdback Input Randomization

The Holdback Input Randomization (HIPR) method, described in Kemp et al. [2007], can be used to investigate the importance of the input parameters. It works by sequentially feeding each data point in the test set to the neural network while replacing the values of one input parameter at a time. The replacement values are uniformly distributed random values in the range the neural network was originally trained on, i.e. (-1, 1). An NRMSE score is calculated for each replacement. The result of this is that we get information about the relevance of each input.
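The procedure can be sketched as below; `predict` is a stand-in for a trained network (it only uses its first input channel), not the thesis's model, and the NRMSE here is normalized by the target range:

```python
import numpy as np

rng = np.random.default_rng(0)

def nrmse(y_true, y_pred):
    """Root-mean-square error normalized by the range of the target."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2)) / (y_true.max() - y_true.min())

def predict(X):
    """Stand-in network: channel 0 matters, channel 1 is irrelevant."""
    return np.tanh(X[:, 0])

X = rng.uniform(-1, 1, size=(500, 2))
y = predict(X)

scores = {}
for j in range(X.shape[1]):
    Xr = X.copy()
    Xr[:, j] = rng.uniform(-1, 1, size=len(X))  # randomize one channel at a time
    scores[j] = float(nrmse(y, predict(Xr)))
```

Randomizing the relevant channel raises the NRMSE, while randomizing the irrelevant channel leaves it unchanged, which is exactly the signal HIPR exploits.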

3.4 Optimization methods

The two main optimization algorithms that have been used in this thesis are the Particle Swarm Optimization (PSO) algorithm, presented by Eberhart and Kennedy [1995], and the Levenberg-Marquardt (LM)2 algorithm, independently developed by Kenneth Levenberg and Donald Marquardt [Marquardt, 1963; Levenberg, 1944]. The PSO algorithm is a population-based stochastic algorithm similar to a Genetic Algorithm (GA), but based on social-psychological principles instead of evolution. It

2 The LM algorithm was used in the beginning of the project, but the results reported ended up using the PSO algorithm. I have included the LM algorithm in this section because speed optimization was measured using this algorithm.


can be summarized by the following steps. Each network was trained multiple times in order to avoid local minima.

• Step 1: Initialize particles with random velocities and accelerations.

• Step 2: Determine which particle is closest to the goal.

• Step 3: Adjust accelerations toward that particle.

• Step 4: Update particle positions based on their velocities, and update velocities based on accelerations.

• Step 5: Go to step 2.
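The steps above can be sketched as follows, minimizing a toy quadratic; the inertia and acceleration coefficients are common textbook defaults, not the settings used in the thesis:

```python
import numpy as np

# Toy objective: minimize f(x) = ||x||^2 over 2 dimensions.
rng = np.random.default_rng(1)
f = lambda x: np.sum(x ** 2, axis=1)

n_particles, dim = 20, 2
pos = rng.uniform(-5, 5, (n_particles, dim))   # Step 1: random positions
vel = rng.uniform(-1, 1, (n_particles, dim))   # ... and velocities
pbest, pbest_val = pos.copy(), f(pos)

for _ in range(200):
    gbest = pbest[np.argmin(pbest_val)]        # Step 2: particle closest to goal
    r1 = rng.random((n_particles, dim))
    r2 = rng.random((n_particles, dim))
    # Step 3: accelerate toward the personal and global bests.
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = pos + vel                            # Step 4: move the particles
    val = f(pos)                               # Step 5: repeat from step 2
    improved = val < pbest_val
    pbest[improved] = pos[improved]
    pbest_val[improved] = val[improved]
```

After a few hundred iterations the best particle sits very close to the optimum at the origin.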

The LM algorithm is a combination of the Error Back-Propagation (EBP) method [Rumelhart et al., 1988] and the Gauss-Newton method. This algorithm has the speed advantage of Gauss-Newton and the stability of EBP [Yu and Wilamowski, 2011]. A detailed treatment of LM can be found in Moré [1978], and of PSO in Poli et al. [2007].

3.5 Neural Networks

3.5.1 Multilayer Perceptron

The perceptron is built around a nonlinear model of a neuron, the McCulloch-Pitts model [McCulloch and Pitts, 1943]. It basically consists of a 2-step process, where the cell body contains a summation function of the weighted sum of all inputs, including a bias. The perceptron is described by equation 3.11. The sum s is passed through an activation function (see section 3.5.1) which mimics the activation, or firing, of the neuron.

s = \sum_{i=1}^{M} w_i x_i + x_0 \quad (3.11)

w_i is the weight of the "synapse" of the input channel and is the parameter we want to adjust, x_i is the input value, and x_0 is the bias. M denotes the number of inputs we have.
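The two-step process reads directly as code; the weights, input values and bias below are illustrative only:

```python
import math

def perceptron(x, w, bias, f=math.tanh):
    """Weighted sum plus bias, then an activation f (here: tanh)."""
    s = sum(wi * xi for wi, xi in zip(w, x)) + bias  # s = sum_i w_i x_i + x_0
    return f(s)

out = perceptron(x=[0.5, -1.0], w=[0.8, 0.3], bias=0.1)
```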


[Diagram: input signals, each weighted by a w, are summed together with a bias and passed through an activation function f(s) to produce the output signal.]

Figure 3.1 The perceptron

The structure of the MLP consists of many perceptrons, and it is shown in figure 3.2. The input signal flows from the input layer at the bottom to the output layer at the top. There is a bias signal, seen on the left side of the diagram, which is set to a fixed number.

MLPs are fully connected networks, meaning that the neurons in any layer of the network are connected to all the neurons in the previous layer. Each connection in the network has a weight w_ij associated with it. The initialization of these weights is done before any training and is achieved by randomly assigning very small values to the respective weights in the graph. The architecture used in this study has only one output variable, i.e. the function we approximate has an output that produces forecasts of the power generation given a certain input.


[Diagram: a fully connected network with an input layer (hours, u, v, week, ws, ws−1, ws−2, ws+1, ws+2), a hidden layer with tanh activations, and a linear output layer; bias signals feed each layer and a single output signal leaves the top.]

Figure 3.2 Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of H hidden layers, as well as M input connections. Each edge seen in this graph has a w_ij associated with it.

The performance of neural networks generally improves if data is normalised. This is because using the original data directly can cause convergence problems. Normalization is done using the mapminmax function, seen in equation 3.12.

y = (y_{max} - y_{min}) \cdot \frac{x - x_{min}}{x_{max} - x_{min}} + y_{min} \quad (3.12)

y_max is the max value of the specified range, which in this case is 1, and y_min is -1. x is the value to be scaled, x_max is the max value of the numbers to be scaled, and x_min is the min value of the numbers to be scaled.
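The mapminmax function reads directly as code, with illustrative sample values scaled into [-1, 1]:

```python
def mapminmax(x, xmin, xmax, ymin=-1.0, ymax=1.0):
    """Scale x from [xmin, xmax] into [ymin, ymax], per equation 3.12."""
    return (ymax - ymin) * (x - xmin) / (xmax - xmin) + ymin

data = [0.0, 5.0, 10.0]
scaled = [mapminmax(v, xmin=0.0, xmax=10.0) for v in data]
```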

The forecasting model consists of 7 different networks, one for each wind farm. These networks are trained on the first section of the dataset, before the test period. A random 60/20/20 split is created, where 60% of the available data is used for training, 20% of the data is used to validate the network, and 20% is set aside as a hold-out set for the hyperparameters. The input features3 fed into the models are

3 See table 3.2.


ws, u, v, hours, ws+1, ws+2, ws−1, ws−2, where ws+x and ws−x denote a time shift of x. We can use ws+x as input into the model because ws is a forecast in itself.

Activation Functions

The computation done for each neuron in the multilayer perceptron requires knowledge of the derivative of the activation function. In other words, the activation function we pick needs to be differentiable. In this thesis we use the following two activation functions: the hyperbolic tangent function, seen in equation 3.13, and

f(s) = \tanh(s) = \frac{e^{s} - e^{-s}}{e^{s} + e^{-s}} \quad (3.13)

the linear transfer function, seen in equation 3.14:

f(s) = \begin{cases} +1 & \text{if } s \ge 1 \\ s & \text{if } -1 < s < 1 \\ -1 & \text{if } s \le -1 \end{cases} \quad (3.14)
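Both activation functions can be written directly from the equations:

```python
import math

def tanh_act(s):
    """Equation 3.13: the hyperbolic tangent."""
    return (math.exp(s) - math.exp(-s)) / (math.exp(s) + math.exp(-s))

def satlin(s):
    """Equation 3.14: the saturating linear transfer function."""
    if s >= 1:
        return 1.0
    if s <= -1:
        return -1.0
    return s
```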

Hyperparameter optimization

In order to obtain the hyperparameters necessary for the respective model for each wind farm, a random hyperparameter search was performed for all models. A hold-out validation set was used to pick the best hyperparameters. Random search has been shown to work better than Grid Search when not all parameters are equally important [Bergstra and Bengio, 2012].

3.5.2 Numenta Platform for Intelligent Computing

This section describes the key principles introduced with NuPIC. It explains in general terms the theory behind the Online Prediction Framework (OPF)4, which uses the CLA and HTM algorithms; the OPF works as an API to create predictive models. The OPF consists of 5 major types of components: Encoders, Spatial Pooler (SP), Temporal Memory (TM), Temporal Pooler (TP) and Classifiers. Together these components construct a single CLA region5, and Figure 3.3 demonstrates the information flow through a single region.

4 The OPF is used with Numenta's commercial product GROK.
5 Currently, models created with the OPF do not use a TP, nor does this client allow creation of a hierarchy of regions, i.e. what we would call an HTM; it is only possible to create one region.


Another central concept in NuPIC is the Sparse Distributed Representation (SDR), which refers to the activation of a small percentage of the neurons at any given time. This neuronal activity is represented as an n-dimensional sparse binary vector x = [b_0, ..., b_n] with around 2% active cells. The outputs from the spatial pooler and the temporal memory are both SDRs, while the output from the encoder does not enforce this and is just a normal binary vector. A general overview of the properties of SDRs has been given by Ahmad and Hawkins [2015].

[Diagram: Encoder, Spatial Pooler, Temporal Memory and Classifier in sequence.]

Figure 3.3 Information flow of a single-region predictive model created with the OPF

The overlap between two different SDRs is defined by

o(x, y) = x \cdot y \quad (3.15)

A match between two SDRs is defined by

m(x, y) := o(x, y) \ge \theta \quad (3.16)

where \theta is set such that \theta \le \|x\|_1 and \theta \le \|y\|_1. An interesting property of SDRs, one that is used multiple times inside the temporal memory and especially for predictions, is that a set of fixed-size SDRs can be reliably stored by taking their union as a single pattern. The boolean OR operator is used to create a
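The overlap, match and union operations can be sketched on toy binary vectors (much denser than a real SDR, purely for illustration):

```python
import numpy as np

x = np.array([0, 1, 0, 1, 1, 0, 0, 0])
y = np.array([0, 1, 0, 0, 1, 0, 1, 0])

overlap = int(x @ y)                    # o(x, y) = x . y
theta = 2                               # theta <= |x|_1 and theta <= |y|_1
match = overlap >= theta                # m(x, y)

# Union property: store both patterns in a single vector with boolean OR.
union = np.logical_or(x, y).astype(int)
```

Each stored pattern still fully overlaps the union, which is why stored patterns can be recognized, at the cost of a growing false-positive probability as more patterns are added.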


Value  Scalar Encoding
1      11111000000000
2      01111100000000
10     00000000011111

Table 3.3 Example, where n = 14, r = 5, ψ = 1, of various encoded scalar values using a ScalarEncoder

new vector from a set of SDRs. The downside of storing patterns this way is that the more patterns we store, the bigger the probability of false positives.

Encoders

NuPIC contains many different encoders6; the job of an encoder is to convert raw input into a more suitable representation (i.e. a binary vector). The raw input is fed into the model using a dictionary data structure.

Useful encoders include scalar and categorical encoders, while more exotic encoders include one for Global Positioning System (GPS) coordinates. This encoder can be used to extract information about anomalous movements. The entries of the dictionary representation of the raw inputs are each encoded separately and concatenated using a multi-encoder.

One property of the spatial pooler is that overlapping input patterns are mapped to the same SDR. This means that we want the encoder to encode input so that similar inputs share bits. The ScalarEncoder fulfils this property by the process illustrated in table 3.3.

We calculate the range of values to encode using equation 3.17. v_min represents the minimum value of the input signal, while v_max denotes the upper bound of the input signal.

v_{range} = v_{max} - v_{min} \quad (3.17)

There are three mutually exclusive parameters that determine the overall size of the output from a scalar encoder: (n, r, ψ). n directly represents the total number

6 A full list of all encoders can be found in the API documentation for NuPIC.


of bits in the output, and it must be bigger than or equal to w, which represents the number of bits that are set to encode a single value, i.e. the "width" of the output signal7. r and ψ are specified with respect to the input, while w is specified with respect to the output. Two inputs separated by more than the radius have non-overlapping representations. Two inputs separated by less than the radius will in general overlap in at least some of their bits. Two inputs separated by greater than or equal to the resolution are guaranteed to have different representations. ψ can be calculated using equation 3.18.

\psi = \frac{r}{w} \quad (3.18)

Depending on whether we want periodic behaviour or not, n is calculated a bit differently; see the scalar encoder for implementation details8.
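A simplified sketch of the non-periodic encoding that reproduces the values in table 3.3; the real NuPIC ScalarEncoder has more parameters and edge-case handling:

```python
def scalar_encode(value, vmin=1.0, vmax=10.0, n=14, w=5):
    """Encode a scalar as n bits with a run of w consecutive set bits."""
    resolution = (vmax - vmin) / (n - w)      # one start position per bucket
    start = int(round((value - vmin) / resolution))
    bits = ["0"] * n
    for i in range(start, start + w):         # set w consecutive bits
        bits[i] = "1"
    return "".join(bits)
```

Nearby values share most of their set bits (1 and 2 overlap in four positions), which is exactly the similarity-preserving property the spatial pooler needs.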

Spatial Pooler

The spatial pooler receives a binary vector as its input from an encoder and outputs an SDR. The structure of the spatial pooler consists of an input space and a set of columns; the output SDR represents which of the columns in a region are active. Each column has synapses connected to the input space. Each column connects to around 50% of the input space through randomly chosen potential synapses; this is called the "potential pool". Each synapse will connect and disconnect with the input space during learning.

The general information flow of the spatial pooler consists of the following steps: input stimuli from lower regions lead to activation of the input space, to which each mini-column is synaptically connected. This activation, the so-called "overlap score", is set for each column. The score is calculated based on the total sum of the neurons that try to influence that column, weighted with a "boosting factor" that increases the chance for certain columns to win more easily. A final top percentage (usually around 2% activity) of the columns with the highest influence (biggest overlap score) is chosen; columns also inhibit nearby columns, and the result of this process is a binary vector with few active bits, an SDR. This is illustrated in equation 3.19, where b can either be 0 or 1 and s is a value representing the score.

7 w must be odd to avoid centering problems.
8 https://github.com/numenta/nupic


\underbrace{\begin{bmatrix} b_1 & b_2 & \cdots & b_n \end{bmatrix}}_{\text{Input vector}} \cdot \underbrace{\begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1n} \\ b_{21} & b_{22} & \cdots & b_{2n} \\ \vdots & & & \vdots \\ b_{d1} & b_{d2} & \cdots & b_{dn} \end{bmatrix}}_{\text{Connected synapses for each column}} = \underbrace{\begin{bmatrix} s_1 & s_2 & \cdots & s_n \end{bmatrix}}_{\text{Overlap score}} \xrightarrow{\text{Inhibition}} \underbrace{\begin{bmatrix} b_1 & b_2 & \cdots & b_n \end{bmatrix}}_{\text{Output SDR}} \quad (3.19)

Learning in this structure is done by adjusting the permanences of the proximal dendrite to better match the input: for each winning column, increase the permanence of the synapses that correctly matched the input and decrease the permanence of the rest. We also increase the boosting factor of losing columns to give them a bigger chance of winning next time.
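A toy sketch of this feed-forward step with global inhibition; the region size, the random 50% connectivity and the flat boost factors are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
n_cols, n_inputs = 200, 50

# Binary connectivity: each column samples roughly half of the input space.
synapses = (rng.random((n_cols, n_inputs)) < 0.5).astype(int)
boost = np.ones(n_cols)                         # flat boost for simplicity

x = (rng.random(n_inputs) < 0.2).astype(int)    # encoder output (binary)
overlap = boost * (synapses @ x)                # overlap score per column

k = max(1, int(0.02 * n_cols))                  # keep ~2% of the columns
winners = np.argsort(overlap)[-k:]              # columns surviving inhibition
sdr = np.zeros(n_cols, dtype=int)
sdr[winners] = 1
```

The resulting vector is sparse by construction: only the top-scoring 2% of columns are active, regardless of how dense the input was.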

Temporal Memory

The temporal memory receives an SDR from the spatial pooler, which represents the active columns; the output of the temporal memory is the activity of the whole CLA region. The general idea of the TM9 is that the cells in each mini-column provide temporal context to the pattern identified by the spatial pooler. The TM's job is to form transitions between different SDRs, and it achieves this by allowing active cells to form connections to previously active cells, so that in a future setting each cell is able to predict its own activity.

Each cell in a region has several distal segments, ideally one for each pattern the cell has transitioned from; in practice, these segments are each connected to a couple of patterns. A cell enters a predictive state if there is enough activity on a segment, and every segment consists of a collection of connections, synapses, that have been formed to a subset of previously active cells (typically around 10-15 cells).

The Temporal Memory receives a sparse binary vector from the spatial pooler, and the general information flow for this part of the CLA goes through two phases. We have the following steps in the first phase: 1) For each active column, check whether any cell is in a predictive state; if there is a cell in a predictive

9 There are a lot of details in how the Temporal Memory is implemented; pseudo-code and more details can be found in [Numenta, 2011]. The nupic git repository is the best source for finer details: https://github.com/numenta/nupic


state, then 2) activate that particular cell. If no cell was found in a predictive state, then 3) activate all cells in that particular column, a process called bursting.

Bursting is analogous to saying: we do not know which cell to activate, because we have not seen this instance of the sequence of patterns before and we are unable to put the column into the correct temporal context, so let us activate all cells, reflecting this uncertainty. Bursting does not occur when the temporal context has been predicted by the CLA. Equation 3.20 shows the first phase.

\underbrace{\begin{bmatrix} b_1 & b_2 & \cdots & b_n \end{bmatrix}}_{\text{SP SDR}} \cdot \underbrace{\begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1n} \\ b_{21} & b_{22} & \cdots & b_{2n} \\ \vdots & & & \vdots \\ b_{d1} & b_{d2} & \cdots & b_{dn} \end{bmatrix}}_{\text{Predictive state}} = \underbrace{\begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1n} \\ b_{21} & b_{22} & \cdots & b_{2n} \\ \vdots & & & \vdots \\ b_{d1} & b_{d2} & \cdots & b_{dn} \end{bmatrix}}_{\text{Active state}} \quad (3.20)
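Phase 1 can be sketched with a small made-up region of 4 cells per column and 6 columns:

```python
import numpy as np

def tm_phase1(active_columns, predictive):
    """Phase 1: per active column, activate predicted cells or burst.

    predictive is a (cells_per_column, n_columns) boolean matrix."""
    active = np.zeros_like(predictive)
    for c in active_columns:
        if predictive[:, c].any():
            active[:, c] = predictive[:, c]  # correctly predicted: pick cells
        else:
            active[:, c] = True              # bursting: activate every cell
    return active

pred = np.zeros((4, 6), dtype=bool)
pred[2, 1] = True                            # one predicted cell in column 1
act = tm_phase1(active_columns=[1, 3], predictive=pred)
```

Column 1 activates only its predicted cell, while column 3 bursts, activating all four of its cells.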

The second phase of the algorithm figures out which cells should be put into a predictive state for the next time step. This is achieved by checking every distal segment on every cell for activity above a certain threshold, i.e. checking for active segments. One active segment is enough to put a cell into a predictive state. The output from the temporal memory is a vector representing the state of all cells in that region. Equation 3.21 shows phase 2.

\underbrace{\begin{bmatrix} b_1 & b_2 & \cdots & b_n \end{bmatrix}}_{\text{Active state}} \cdot \underbrace{\begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1n} \\ b_{21} & b_{22} & \cdots & b_{2n} \\ \vdots & & & \vdots \\ b_{d1} & b_{d2} & \cdots & b_{dn} \end{bmatrix}_X}_{\text{Segment } X} = \underbrace{\begin{bmatrix} b_1 & b_2 & \cdots & b_n \end{bmatrix}_X}_{\text{Segment activation } X} > \tau \;\rightarrow\; \underbrace{\begin{bmatrix} s_1 & s_2 & \cdots & s_n \end{bmatrix}}_{\text{Predictive state}} \quad (3.21)

If learning is turned on, update the permanences of the synapses connected to active distal segments (in the same way as in the spatial pooler). These changes are marked as temporary until we are sure that a cell is in fact able to correctly predict something with them; after that, the changes either become permanent or are removed. Temporarily marked cells are updated whenever a cell goes from inactive to active from a feed-forward input (update the permanences, as we correctly predicted the feed-forward activation). If a cell instead went from active to inactive, undo the change.


NuPIC Classifiers

There are many different classifiers included with NuPIC, such as a k-Nearest Neighbour (kNN) classifier and a Support Vector Machine (SVM); the OPF in particular uses a custom-built classifier called the "CLAClassifier". The purpose of this classifier is to decode predictions made by the CLA. Classification with the CLAClassifier is performed in different ways depending on how the input has been encoded.

Scalar values are decoded using a process where each cell in a CLA region is paired with two histograms. One of the histograms keeps track of the frequency of encountered patterns associated with each cell; the other keeps track of a moving average for each bucket. This process is illustrated in figure 3.4.

[Diagram: the columns of an SDR index into two histograms per cell, one tracking pattern likelihood and one a moving average, spanning the range between the min and max values.]

Figure 3.4 The CLAClassifier

Training in NuPIC

Training the NuPIC model is done using online learning algorithms. The schema seen in figure 3.5 is used to train and test these models. We have 7 different wind farms, so this schema is repeated for each respective wind farm.

This schema is slightly adjusted from the traditional way the OPF handles multi-step predictions. The default way of doing this is to use 48 different models, one for


every step ahead in the CLAClassifier, which would give us 48 · 7 = 336 models in total (for all the wind farms). Not only does this change match Expektra's approach, but it also reduces the memory requirements of the CLAClassifier.

[Diagram, training phase: a pre-training data chunk from the dataset drives the hyperparameter setup of the OPF model (PSO swarming or a manual setup), after which the training data stream is fed with online learning activated, producing predictions. Testing phase: testing data is fed to the OPF model with online learning deactivated, producing multi-step predictions.]

Figure 3.5 Training an OPF model

Input and hyperparameter selection

Finding hyperparameters for an OPF model was partly done using the custom built-in PSO algorithm and partly configured manually, mainly by ensuring that the PSO algorithm did not remove encoders. A single CLA region contains a lot of hyperparameters; these parameters have been included, with corresponding descriptions, in appendix A. The inputs to the model are date, ws, wp, u, v.


Chapter 4

Result

This chapter contains the primary results obtained in this study. Figures 4.4-4.10 contain different error measurements on the test set for the respective wind farms. We see that the Expektra ANN model performs well over all wind farms, while the NuPIC model performs worse than Expektra but in general still better than our reference model. NuPIC is off target, with a bias error on most wind farms. The graphs also indicate that NuPIC is unable to pick up some trends in the cumulated ε² graph. The Expektra model, on the other hand, shows no clear problems in the cumulated ε² and is on target on all wind farms; appendix C has been included to reflect this at different lead times.

Figure 4.1 Left diagram: cumulative probability of wind speed. Right diagram: scatter diagram of the power curve, production vs. wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)


Figure 4.2 Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)

Figure 4.3 Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)


[Four panels for Wind Farm 1: NBIAS, NRMSE and NMAE vs. look-ahead time k (in hours), and cumulated ε² vs. time (in hours), each comparing Expektra, NuPIC and Persistence.]

Figure 4.4 Different error measurements for WF 1


[Same four panels as figure 4.4, for Wind Farm 2.]

Figure 4.5 Different error measurements for WF 2


[Same four panels as figure 4.4, for Wind Farm 3.]

Figure 4.6 Different error measurements for WF 3


[Same four panels as figure 4.4, for Wind Farm 4.]

Figure 4.7 Different error measurements for WF 4


[Same four panels as figure 4.4, for Wind Farm 5.]

Figure 4.8 Different error measurements for WF 5


[Same four panels as figure 4.4, for Wind Farm 6.]

Figure 4.9 Different error measurements for WF 6


[Same four panels as figure 4.4, for Wind Farm 7.]

Figure 4.10 Different error measurements for WF 7


                     Wind Farm
User           1      2      3      4      5      6      7      All
Leustagos      0.145  0.138  0.168  0.144  0.158  0.133  0.140  0.146
DuckTile       0.143  0.145  0.172  0.145  0.165  0.137  0.146  0.148
MZ             0.141  0.151  0.174  0.145  0.167  0.141  0.145  0.149
Propeller      0.144  0.153  0.177  0.147  0.175  0.141  0.147  0.152
Duehee Lee     0.157  0.144  0.176  0.160  0.169  0.154  0.148  0.155
Expektra       0.165  0.158  0.184  0.164  0.179  0.153  0.153  0.165
MTU EE5260     0.161  0.172  0.193  0.162  0.192  0.156  0.160  0.168
SunWind        0.174  0.177  0.193  0.176  0.179  0.157  0.162  0.172
ymzsmsd        0.163  0.186  0.200  0.164  0.192  0.162  0.167  0.174
4138 Kalchas   0.180  0.179  0.197  0.175  0.200  0.160  0.165  0.177
NuPIC          0.243  0.254  0.264  0.310  0.290  0.224  0.240  0.264
Persistence    0.302  0.338  0.373  0.364  0.388  0.341  0.361  0.355

Table 4.1 NRMSE scores of the entries published in [Hong et al., 2014]. The NuPIC model and the Expektra model are added so we can easily compare the results.

4.1 Experimental results

Looking at the primary results presented in this study, we see in table 4.1 that Expektra's ANN model is able to achieve performance comparable to the other methods published with Hong et al. [2014]. This can be used as a starting point for further investigations. A similar approach to Expektra's is Duehee Lee's [Lee and Baldick, 2014], as both methods are based on the same neural network architecture but with different optimization techniques. Duehee Lee's network is structured slightly differently from Expektra's model, which consists of only 1 output node, whereas Duehee Lee uses five (representing five predictions ahead). Besides these differences, Duehee Lee has also included additional sub-models to create a prediction ensemble, blending the output of the ANN with a Gaussian Process (GP) to improve very short-term predictions. MTU EE5260 is also a neural network approach that converts meteorological forecasts to power, and here the score is very close. The HTM CLA beats the persistence model but needs more work to outperform the other models in GEFCom.


4.2 Input Importance

For interpretation purposes, in order to understand the model better, an analysis of the relative input parameter importance was performed. This analysis is illustrated in figure 4.11. Each box represents added noise to that channel and will result in a higher NRMSE score if that feature was important. A reference point, "all-channels", represents the error distribution of the model with no input replacement. We clearly see in this figure that the wind speed channel ws is the most important attribute; adding noise to this channel greatly affects the NRMSE score. We also see that the wind components u, v show little to no influence, and that the timestamp-related inputs hours and week both indicate that there are seasonal and daily trends present in the dataset.

[Box plot: NRMSE distributions after randomizing each input channel (hours, u, v, week, ws, ws±1, ws±2, ws±3), compared with the "all-channels" reference.]

Figure 4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point, i.e. the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is the directional vector of the wind.


4.2.1 Adaptation and Optimization

All training for Expektra's model was performed on a MacBook Pro with an Intel Core 2 Duo processor (P8600, 2.40 GHz) and a 64-bit operating system with 4 GB of installed RAM. After adapting the code to run experiments on the GEFCom dataset, an investigation was performed to evaluate the performance of the core source code given by Expektra. This investigation identified some slow parts of the code; after addressing these issues, mainly by using a Math.NET native library instead of the managed provider, a speed test was performed comparing the unoptimized version with the optimized one. Figure 4.12 shows the speed-up achieved during training using the LM algorithm.

[Bar chart: training time of the normal vs. the optimized implementation for 10, 15, 20 and 25 hidden neurons.]

Figure 4.12 Plot showing the unoptimized version vs. the optimized one when training a network with 10 input neurons, 10/15/20/25 hidden neurons and 1 output neuron. Training was done for 100 epochs using LM.

4.3 Summary

A plot showing the average NRMSE improvement over the persistence model can be seen in figure 4.13. We see that both models are able to perform better than persistence. Expektra's model performs better than the NuPIC model over the whole forecast


horizon, and NuPIC performs better than persistence towards the end of the forecast.

[Line plot: average improvement (%) in NRMSE over persistence per look-ahead time (in hours), for Expektra and NuPIC.]

Figure 4.13 Summarized average improvement over all wind farms, with 95% confidence intervals


Chapter 5

Discussion

With the exponential growth of computer power, it becomes easier to study deeper and more complex networks, and with recent advances in the area of deep learning, a new interest in ANNs has resurged. In practice, ANNs have been used by most groups in the field of WPF, but these networks never caught on, as it was argued that the improvements made by ANNs were usually not enough for the extra effort of training these networks [Giebel et al., 2011]. This is steadily becoming less of an issue as computation becomes cheaper and bigger networks perform better.

By using a simple MLP network, we observe that we are able to obtain results similar in performance to other models published in Hong et al. [2014]. This can be used as a starting point for further investigations. The GEFCom dataset has a limited number of input features; if we had access to a wider range of them, the power of neural networks could be studied more in depth. Given the number of features in the dataset, this is harder to do.

Using persistence as a reference model was done in order to have the same baseline as the one used in GEFCom, but a better reference model should be considered in the future, such as the one presented in Nielsen et al. [1998], especially if longer forecasts are to be considered.

5.1 Method development issues

Working with Expektra's model is very straightforward, and no major issues were encountered during development, but some issues were encountered working with NuPIC1:

1 It should also be pointed out that it helps to have the people who developed the code on the spot, and this is probably the main reason for the lack of trouble.


1. Installing NuPIC is difficult. This seems to be a general problem, judging by posts on the nupic mailing list2. A very important issue to fix if they want more people to work with this model.

2. Using the built-in swarming (PSO) did not work well, especially for slightly larger datasets and for 48 prediction steps. The main problems were fitting everything into memory and that the code ran too slowly when swarming over multiple steps ahead, making it practically impossible to use. Scaling the problem down to just a few prediction steps and multiple models helped, but the swarm model kept dismissing wind speed as an important feature, which was fixed by specifically telling it not to touch that encoder.

3. Inconsistencies between the white-paper documentation of HTM CLA and the actual implementation make it hard to understand the underlying principles of NuPIC. NuPIC presents a very complex network, and any inconsistency in documentation makes it harder to understand. The code is quite well documented, but the additional material is a bit sparse.

These issues are all understandable and are continuously being improved upon. NuPIC is still a young platform, so issues are to be expected given the complexity and research nature of the project. Training the CLAClassifier to use many step predictions instead of just one will most likely reduce the bias error seen in the results. To investigate this, a more powerful computer is needed.

There are some differences in the input between the models. This setup was created because an OPF model needs to encode the value it is going to predict. SCADA data is available, so this is not an issue, but it may introduce some unfairness between NuPIC and Expektra. In the current setup NuPIC does not seem to gain anything extra from the additional information, and other models in the competition will most likely use different inputs as well.

In general, NuPIC is different in the way you feed it data: you send in only the front of the signal, and temporal context is achieved through the temporal memory.
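This difference can be illustrated with a generic online loop (a sketch with a hypothetical model interface, not the actual NuPIC API): each record is fed exactly once, and the model's internal state carries the temporal context that an MLP would instead receive through a window of lagged inputs.

```python
def run_online(model, records, horizon=48):
    """Feed records one at a time; the model keeps temporal state internally."""
    forecasts = []
    for rec in records:                        # rec: dict of current measurements
        pred = model.step(rec, steps=horizon)  # hypothetical: update state + predict
        forecasts.append(pred)
    return forecasts
```

In contrast, a batch-trained MLP would receive a fixed-size vector of past values at every call and carry no state between calls.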

5.2 Future improvements and directions

One issue with the GEFCom dataset is that it does not contain any specific information about the location of these 7 wind farms. It is worth considering taking a look at how to handle different models that are specialized for different types

2 http://numenta.org/lists


of terrain, as this has been shown to increase performance [Kariniotakis et al., 2006]. Another thing to investigate could be a more advanced scheme for hyperparameter selection. Jursa and Rohrig [2008] show that the WPF error can be reduced by smarter use of optimization algorithms for feature selection and hyperparameter selection, which should improve performance.

In general, having more types of inputs from sources like SCADA and NWP systems could give a lot of valuable information that can be used for better predictions. It is clear that wind speed is a key feature for producing good forecasts, which is also supported by the input analysis in section 4.2. It has been argued that the error in WPF models stems largely from wrong weather forecasts [Giebel et al., 2011], so identifying good sources of weather forecasts would be desirable for more effective model learning. Combining several different weather forecasts could be investigated, as Nielsen et al. [2007] showed that power forecasts based on a number of different meteorological forecasts were better than those based on a single source.
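As an illustration of such a combination, a least-squares sketch for blending two candidate forecasts (hypothetical data; the actual method in Nielsen et al. [2007] is more elaborate):

```python
def combine(forecasts, weights):
    # Weighted average of several candidate forecasts for the same target time
    return sum(w * f for w, f in zip(weights, forecasts))

def fit_weight(f1, f2, obs):
    # Closed-form least-squares weight for two forecasts: minimize
    # sum((w*f1 + (1-w)*f2 - obs)^2) over w on historical data.
    num = sum((o - b) * (a - b) for a, b, o in zip(f1, f2, obs))
    den = sum((a - b) ** 2 for a, b in zip(f1, f2))
    return num / den if den else 0.5
```

The fitted weight would then be applied to fresh forecasts via `combine([f1_new, f2_new], [w, 1 - w])`.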

Regarding HTM CLA, it is worth considering implementing a custom encoder for it, one specifically targeted at data concerning wind farms. SCADA data could also possibly be streamed directly into a NuPIC model, which would allow for some interesting online predictions and anomaly detection.

Training times are always an issue, especially for very large models. Both NuPIC and Expektra's model would benefit from being implemented in a more parallel fashion, as these algorithms are very well suited for parallel computing. The speed-up would be of great value, so looking into and implementing a GPU version of the algorithms could be worthwhile.
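At the coarsest level the seven wind-farm models are already independent, so training them concurrently is straightforward even without GPU support; a sketch (`train_farm` is a hypothetical stand-in for either model's training routine):

```python
from concurrent.futures import ThreadPoolExecutor

def train_farm(farm_id):
    # Stand-in for training one wind-farm model; in practice this would
    # load that farm's data and fit the MLP or NuPIC model.
    return farm_id, "model-%d" % farm_id

def train_all(farm_ids, workers=4):
    # Each farm's model is independent, so training is embarrassingly parallel
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(train_farm, farm_ids))
```

Finer-grained parallelism (inside the matrix operations of a single model) is where a GPU implementation would pay off.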

5.3 Conclusions

The field of energy forecasting is still young and the necessity for good forecasting techniques is constantly growing. This initial study shows that part of Expektra's forecasting method can be used to forecast wind power generation. We observe that this ANN is able to achieve performance comparable to other methods published in Hong et al. [2014]. Additional work is still needed to obtain state-of-the-art results. Numenta's model is able to beat the reference model, but still needs additional work before we can draw definite conclusions about its performance. Constructing a custom encoder could be a first step in that direction.


Bibliography

Subutai Ahmad and Jeff Hawkins. Properties of sparse distributed representations and their application to hierarchical temporal memory. arXiv preprint arXiv:1503.07469, 2015.

T. C. Akinci. Short term wind speed forecasting with ANN in Batman, Turkey. Elektronika ir Elektrotechnika, 107(1):41–45, 2015.

E. Michael Azoff. Neural Network Time Series Forecasting of Financial Markets. John Wiley & Sons, Inc., 1994.

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13(1):281–305, 2012.

Daniel P. Buxhoeveden and Manuel F. Casanova. The minicolumn hypothesis in neuroscience. Brain, 125(5):935–951, 2002.

Erasmo Cadenas and Wilfrido Rivera. Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks. Renewable Energy, 34(1):274–278, 2009.

Dmitri B. Chklovskii, B. W. Mel, and K. Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782–788, 2004.

A. Costa, A. Crespo, J. Navarro, G. Lizcano, H. Madsen, and E. Feitosa. A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews, 12(6):1725–1744, August 2008. ISSN 1364-0321. doi: 10.1016/j.rser.2007.01.015.

Russ C. Eberhart and James Kennedy. A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science, volume 1, pages 39–43, New York, NY, 1995.

Shu Fan, James R. Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. Energy Conversion, IEEE Transactions on, 24(2):474–482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento: a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276. Citeseer, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition, EWEC 2008. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving Hierarchical Temporal Memory-Based Trading Models. Springer, 2013.

M. A. Gaertner, C. Gallardo, C. Tejeda, N. Martínez, S. Calabria, N. Martínez, and B. Fernández. The Casandra project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. In Proc. of the 2001 European Wind Energy Conference, EWEC'01, Copenhagen, Denmark, pages 777–780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: A literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G. Lobo. Sipreólico: wind power prediction tool for the Spanish peninsular power system. In Proceedings of the CIGRÉ 40th General Session and Exhibition, Paris, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On Intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357–363, 2014.

Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41–46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585–592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694–709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference, EWEC'07, 2007.

G. Kariniotakis. Position paper on Joule project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J. Halliday, R. Brownsword, Ignacio Marti, Ana Maria Palomares, I. Cruz, H. Madsen, T. S. Nielsen, Henrik Aa. Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power: overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, 2006.

G. N. Kariniotakis, G. S. Stavrakakis, and E. F. Nogaret. Wind power forecasting using advanced neural networks models. Energy Conversion, IEEE Transactions on, 11(4):762–767, 1996.

Stanley J. Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326–334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275–293, 2009.

Lars Landberg and Simon J. Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171–195, 1994.

S. Lang, C. Mohrlen, J. Jorgensen, B. O. Gallachóir, and E. McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International Solar Energy Society UK Section Conference, volume 85, page 89, 2006.

S. M. Lawan, W. A. W. Z. Abidin, W. Y. Chai, A. Baharun, and T. Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760–1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. Smart Grid, IEEE Transactions on, 5(1):501–510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets, 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164–168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313–2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154–3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa. Nielsen, and Torben S. Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475–489, 2005.

Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431–441, 1963.

Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons. 1969.

Marvin Lee Minsky and Oliver G. Selfridge. Learning in Random Nets. MIT Lincoln Laboratory, 1960.

Jorge J. Moré. The Levenberg-Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105–116. Springer, 1978.

Henrik Aa. Nielsen, Torben S. Nielsen, Henrik Madsen, Maria J. Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471–482, 2007.

Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29–34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power, 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159–167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms, v0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33–57, 2007.

A. Rodrigues, J. A. Peças Lopes, P. Miranda, L. Palma, C. Monteiro, R. Bessa, J. Sousa, C. Rodrigues, and J. Matos. EPREV: a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference, EWEC, volume 7, 2007.

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5(3), 1988.

Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85–117, 2015.

S. Sinkevicius, R. Simutis, and V. Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91–96, 2011.

Ke-Sheng Wang, Vishal S. Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61–69, 2014.

WWEA. 2014 half-year report. Technical report, WWEA, pages 1–8, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365–376, 2013.

Hao Yu and Bogdan M. Wilamowski. Levenberg-Marquardt training. Industrial Electronics Handbook, 5:12–1, 2011.


Appendix A

Hyper-parameters

Parameters for the spatial pooler

Alias                Default  Description
columnCount          -        The number of cell columns in a cortical region.
globalInhibition     false    If true, during the inhibition phase the winning columns are selected as the most active columns from the region as a whole.
numActivePerInhArea  10       The maximum number of active columns per inhibition area.
synPermActiveInc     0.1      The amount by which an active synapse is incremented in each round, specified as a percent of a fully grown synapse.
synPermConnected     0.10     Controls the threshold at which synapses are considered connected.
synPermInactiveDec   0.01     The amount by which an inactive synapse is decremented in each round.
potentialRadius      16       Determines the extent of the input that each column can potentially be connected to.

Table A1: Configuration parameters for the spatial pooler.


Parameters for the scalar encoder

Alias       Symbol  Description
w           w       Number of bits to set in the output.
minval      vmin    The lower bound of the input value.
maxval      vmax    The upper bound of the input value.
n           n       Number of bits in the representation (n must be > w).
radius      r       Inputs separated by more than or equal to this distance will have non-overlapping representations.
resolution  ψ       Inputs separated by more than or equal to this distance will have different representations.

Table A2: Configuration parameters for the scalar encoder.
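The interplay of these parameters can be illustrated with a minimal non-periodic scalar encoder, a simplified sketch rather than NuPIC's exact implementation: of n output bits, a contiguous run of w is active, and there are n − w + 1 possible positions for that run between minval and maxval.

```python
def encode_scalar(value, w=3, n=14, minval=0.0, maxval=10.0):
    # Clip the input into [minval, maxval]; n must be > w
    value = max(minval, min(maxval, value))
    # n - w + 1 possible start positions for the block of active bits
    bucket = int(round((n - w) * (value - minval) / (maxval - minval)))
    bits = [0] * n
    for i in range(bucket, bucket + w):
        bits[i] = 1  # contiguous run of w active bits
    return bits
```

Nearby values share active bits, so their representations overlap, which is what lets the spatial pooler generalize across similar inputs.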


Parameters for the temporal memory

Alias                  Default  Description
activationThreshold    12       Activation threshold for segments.
cellsPerColumn         32       Number of cells per column.
columnCount            2048     The number of cell columns in a cortical region.
globalDecay            0.10     Decrements all synapses a little bit all the time.
initialPerm            0.11     Initial permanence value for a synapse.
inputWidth             -        Size of the input.
maxAge                 100000   Controls global decay. Global decay will only decay segments that have not been activated for maxAge iterations, and will only run the global decay loop every maxAge iterations.
maxSegmentsPerCell     -        The maximum number of segments a cell can have.
maxSynapsesPerSegment  -        The maximum number of synapses a segment can have.
minThreshold           8        The minimum required activity for a segment to learn.
newSynapseCount        15       The maximum number of synapses added to a segment during learning.
permanenceDec          0.10     How much permanence is removed from synapses when learning occurs.
permanenceInc          0.10     How much permanence is added to synapses when learning occurs.
temporalImp            cpp/py   Controls which temporal memory implementation to use.

Table A3: Configuration parameters for the temporal memory.


Appendix B

Wind characteristics

Figure B1: Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Figure B2: Wind characteristics for WF 3-7, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Appendix C

Error Distribution

[Figure C1 contained six histogram panels: forecast-error frequency for WF 1 using NuPIC at lead times 48, 40, 30, 20, 10, and 1; error axis from -1.0 to 1.0, frequency axis from 0 to 70.]

Figure C1: Error distribution for different lead times, WF 1.

[Figure C2 contained six histogram panels: forecast-error frequency for WF 2 using NuPIC at lead times 48, 40, 30, 20, 10, and 1; error axis from -1.0 to 1.0, frequency axis from 0 to 70.]

Figure C2: Error distribution for different lead times, WF 2.

[Figure C3 contained six histogram panels: forecast-error frequency for WF 3 using NuPIC at lead times 48, 40, 30, 20, 10, and 1; error axis from -1.0 to 1.0, frequency axis from 0 to 70.]

Figure C3: Error distribution for different lead times, WF 3.

[Figure C4 contained six histogram panels: forecast-error frequency for WF 4 using NuPIC at lead times 48, 40, 30, 20, 10, and 1; error axis from -1.0 to 1.0, frequency axis from 0 to 70.]

Figure C4: Error distribution for different lead times, WF 4.

[Figure C5 contained six histogram panels: forecast-error frequency for WF 5 using NuPIC at lead times 48, 40, 30, 20, 10, and 1; error axis from -1.0 to 1.0, frequency axis from 0 to 70.]

Figure C5: Error distribution for different lead times, WF 5.

[Figure C6 contained six histogram panels: forecast-error frequency for WF 6 using NuPIC at lead times 48, 40, 30, 20, 10, and 1; error axis from -1.0 to 1.0, frequency axis from 0 to 70.]

Figure C6: Error distribution for different lead times, WF 6.

[Figure C7 contained six histogram panels: forecast-error frequency for WF 7 using NuPIC at lead times 48, 40, 30, 20, 10, and 1; error axis from -1.0 to 1.0, frequency axis from 0 to 70.]

Figure C7: Error distribution for different lead times, WF 7.

[Figure C8 contained six histogram panels: forecast-error frequency for WF 1 using Expektra's model at lead times 48, 40, 30, 20, 10, and 1; error axis from -1.0 to 1.0, frequency axis from 0 to 70.]

Figure C8: Error distribution for different lead times, WF 1.

[Figure C9 contained six histogram panels: forecast-error frequency for WF 2 using Expektra's model at lead times 48, 40, 30, 20, 10, and 1; error axis from -1.0 to 1.0, frequency axis from 0 to 70.]

Figure C9: Error distribution for different lead times, WF 2.

[Figure C10 contained six histogram panels: forecast-error frequency for WF 3 using Expektra's model at lead times 48, 40, 30, 20, 10, and 1; error axis from -1.0 to 1.0, frequency axis from 0 to 70.]

Figure C10: Error distribution for different lead times, WF 3.

[Figure C11 contained six histogram panels: forecast-error frequency for WF 4 using Expektra's model at lead times 48, 40, 30, 20, 10, and 1; error axis from -1.0 to 1.0, frequency axis from 0 to 70.]

Figure C11: Error distribution for different lead times, WF 4.

[Figure C12 contained six histogram panels: forecast-error frequency for WF 5 using Expektra's model at lead times 48, 40, 30, 20, 10, and 1; error axis from -1.0 to 1.0, frequency axis from 0 to 70.]

Figure C12: Error distribution for different lead times, WF 5.

[Figure C13 contained six histogram panels: forecast-error frequency for WF 6 using Expektra's model at lead times 48, 40, 30, 20, 10, and 1; error axis from -1.0 to 1.0, frequency axis from 0 to 70.]

Figure C13: Error distribution for different lead times, WF 6.

[Figure C14 contained six histogram panels: forecast-error frequency for WF 7 using Expektra's model at lead times 48, 40, 30, 20, 10, and 1; error axis from -1.0 to 1.0, frequency axis from 0 to 70.]

Figure C14: Error distribution for different lead times, WF 7.

List of Figures

2.1 A figure that presents the general outline when forecasting using the statistical approach  6
2.2 A figure that presents the general steps when forecasting using a physical model  7
3.1 The perceptron  20
3.2 Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of H hidden layers, as well as M input connections. Each edge seen in this graph has a weight wij associated with it  21
3.3 Information flow of a single-region predictive model created with the OPF  23
3.4 The CLAClassifier  28
3.5 Training an OPF model  29
4.1 Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)  31
4.2 Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)  32
4.3 Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)  32
4.4 Different error measurements for WF 1  33
4.5 Different error measurements for WF 2  34
4.6 Different error measurements for WF 3  35
4.7 Different error measurements for WF 4  36
4.8 Different error measurements for WF 5  37
4.9 Different error measurements for WF 6  38
4.10 Different error measurements for WF 7  39
4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point: the network when no channel has been exposed to noise. ws = wind speed, hours/week = timestamps, u, v is a directional vector of the wind  41
4.12 Plot showing the unoptimized version vs the optimized when training a network with 10 input neurons, 10/15/20/25 hidden neurons, and 1 output neuron. Training was done for 100 epochs using LM  42
4.13 Summarized average improvement over all wind farms with 95% confidence intervals  43
B1 Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power  59
B2 Wind characteristics for WF 3-7, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power  60
C1 Error distribution for different lead times, WF 1  62
C2 Error distribution for different lead times, WF 2  63
C3 Error distribution for different lead times, WF 3  64
C4 Error distribution for different lead times, WF 4  65
C5 Error distribution for different lead times, WF 5  66
C6 Error distribution for different lead times, WF 6  67
C7 Error distribution for different lead times, WF 7  68
C8 Error distribution for different lead times, WF 1  69
C9 Error distribution for different lead times, WF 2  70
C10 Error distribution for different lead times, WF 3  71
C11 Error distribution for different lead times, WF 4  72
C12 Error distribution for different lead times, WF 5  73
C13 Error distribution for different lead times, WF 6  74
C14 Error distribution for different lead times, WF 7  75

List of Tables

3.1 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, missing data with power observations are available for updating the models  16
3.2 Preprocessed features from the dataset. The Forecast category represents the forecast given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the feature we use in training and testing  17
3.3 Example where n = 14, r = 5, ψ = 1 of various encoded scalar values using a ScalarEncoder  24
4.1 NRMSE score of the entries published in Hong et al. [2014]. The NuPIC model and the Expektra model are added so we can easily compare the results  40
A1 Configuration parameters for the spatial pooler  55
A2 Configuration parameters for the scalar encoder  56
A3 Configuration parameters for the temporal memory  57


Abstract

Wind power has seen a tremendous growth in recent years and is expected to grow even more in years to come. In order to better schedule and utilize this energy source, good forecasting techniques are necessary. This thesis investigates the use of artificial neural networks for short-term wind power prediction. It compares two different networks: the so called Multilayer Perceptron and the Hierarchical Temporal Memory / Cortical Learning Algorithm. These two networks are validated and compared on a benchmark dataset published in the Global Energy Forecasting Competition, a competition used for short-term wind power prediction. The results of this study show that the Multilayer Perceptron is able to compete with previously published models and that the Hierarchical Temporal Memory / Cortical Learning Algorithm is able to beat the reference model.

Keywords: neural networks, wind power generation, forecasting, machine learning

Referat

Närtidsprognos av vindkraftsproduktion genom användandet av artificiella neurala nätverk (Short-term forecasting of wind power production using artificial neural networks)

Wind power is currently the fastest growing renewable energy source in the world, and with this growth it is important that we develop good forecasting tools. This degree project investigates the use of artificial neural networks applied to short-term forecasts of wind power production. The algorithms investigated are built on the so-called multilayer perceptron (MLP) and the hierarchical temporal memory (HTM/CLA). These methods are validated and compared using data published in GEFCom, a competition for energy forecasts. The results from the study show that the MLP method can compete with other published methods and that HTM/CLA can beat the reference model.

Keywords: multilayer perceptron, machine learning, hierarchical temporal memory

Acknowledgements

I wish to thank first and foremost my great parents, Yvonne Svensson and Benkt Svensson, for always being there to support my interest.

Secondly I want to thank my supervisor Pawel Herman, not just for the help I have received during this thesis but for the things I have learned from him studying at KTH.

I also want to thank the people at Expektra: Niclas Ehn, Gustav Bergman, Mattias Jonsson, Andreas Johansson, Per Åslund and Joel Ekelöf, for introducing me to their area of expertise and energy forecasting in general.

A special thanks to my good friends and classmates Andrea de Giorgio and Vanya Avramova for all the ideas and discussions we have shared.

MORGAN SVENSSON - SUMMER 2015

Nomenclature

Indices, Constants and Variables

X_{1:n}        A sequence of values X = [x_1, x_2, ..., x_n]
k = 1, ..., k_max   Lead time or look-ahead time
k_max          Maximum prediction horizon
N              Total number of data points
e              Prediction error
ε              Normalized prediction error
p_t            Measured power generation at time t
p̂_{t+k|t}      Forecast of power generation made at time t for look-ahead time t + k
w_ij           Weight of a synapse in the neural network, row i, layer j
b              Binary value
s              Weighted sum, including bias, of a perceptron
x · y          Dot product between x and y
x ∘ y          Row-by-row element-wise multiplication

Units of measurement

MW   Megawatts
GW   Gigawatts

Notes

Keys and dates are given in this form: Year Month Day Hour

Contents

1 Introduction 1
1.1 Problem Formulation 3
1.2 The scope of the problem 4

2 Background 5
2.1 Neural Networks and Time Series Prediction 8

3 Method and Materials 11
3.1 Preliminaries 11
3.1.1 Remarks 11
3.1.2 Definitions 12
3.1.3 Reference models 13
3.1.4 Error metrics 13
3.1.5 Model selection 15
3.1.6 Evaluation 15
3.2 Experiments 16
3.3 Holdback Input Randomization 18
3.4 Optimization methods 18
3.5 Neural Networks 19
3.5.1 Multilayer Perceptron 19
3.5.2 Numenta Platform for Intelligent Computing 22

4 Result 31
4.1 Experimental results 40
4.2 Input Importance 41
4.2.1 Adaptation and Optimization 42
4.3 Summary 42

5 Discussion 45
5.1 Method development issues 45
5.2 Future improvements and directions 46
5.3 Conclusions 47

Bibliography 49

Appendices 53

A Hyper-parameters 55

B Wind characteristics 59

C Error Distribution 61

List of Figures 76

List of Tables 78

Chapter 1

Introduction

It has been estimated by the World Wind Energy Association (WWEA) that by the year of 2020 around 12% of the world's electricity will be available through wind power, making wind energy one of the fastest growing energy resources [WWEA 2014, Fan et al 2009]. However, integrating wind energy into existing electricity supply systems has been a challenge, and numerous objections have been put forward by traditional energy suppliers and grid operators, especially for large-scale use of this energy source. The biggest concern is that availability mainly depends on meteorological conditions, and production cannot be adjusted as conveniently as with other, more conventional energy sources, because of our inability to control the wind. A single Wind Turbine (WT) is highly variable, and its dependency on wind conditions can result in zero output for more than a thousand hours during the course of a year; however, aggregating wind power generation over bigger areas decreases this chance.

This is where wind power forecasting systems come into play, a technology that can greatly improve the integration of wind energy into electricity supply systems, as forecasting systems provide information on how much wind power can be expected at any given point within the next few days. This results in the removal of some of the randomness attributed to wind energy and allows a more accurate way to utilize this clean energy source, while offsetting some of our dependency on other, less environmentally friendly sources, which in the long run will cause a smaller degenerative impact on the environment.

There are many commercial forecasting models available. Prediktor1 [Landberg and Watson 1994] is a physical model developed by the Risø National Laboratory, Denmark. It is constructed to refine Numerical Weather Prediction (NWP) in order

1 http://www.prediktor.no


to transform data through a power curve to produce the forecast, while improving the error rate with Model Output Statistics (MOS). Previento [Focken et al 2001], developed at the University of Oldenburg, Germany, uses a similar approach to Prediktor but with regional forecasting and uncertainty estimation. The Wind Power Prediction Tool (WPPT)2 [Nielsen et al 2002] is a statistical model developed by the Technical University of Denmark; it consists of a semi-parametric power curve model for wind farms, taking into account both wind speed and direction. It uses dynamical prediction models describing the dynamics of the wind power and any diurnal variations. Zephyr [Giebel et al 2000] is a hybrid model that is a combination of both the WPPT and Prediktor; in this model each wind farm is assigned a forecast model according to the available data. Sipreólico [Gonzalez et al 2004], developed by Red Eléctrica de España, is a statistical model that was designed to be highly flexible depending on available data. It achieves this by switching between 9 different models. Aiolos Wind3 is a hybrid model developed by Vitec that creates forecasts by combining a statistical model with physical factors such as wind speed at different altitudes, wind direction and air density.

Expektra4 is a service-based company founded in 2010 that provides and develops a new method for short-term demand and supply forecasting based on an Artificial Neural Network (ANN) and traditional time series analysis. They are currently expanding into the area of wind power forecasting and looking for suitable methods. ANNs have been used successfully on both wind speed forecasting [Lawan et al 2014] and wind power forecasting [Kariniotakis et al 1996]. It was demonstrated in Liu et al [2012] that a complex valued recurrent neural network was able to predict output with high accuracy, and Huang et al [2015] showed that optimizing the initial weights of a back-propagated neural network with a genetic algorithm gave impressive results, so there are good reasons to think that part of the method Expektra is developing can be used for wind power forecasting.

Neural networks in general are highly flexible, and recent advancements in deep learning have been shown to outperform previous models in different domains [Schmidhuber 2015]. These networks are very good at automatically finding features that by hand would take a lot of time and effort to achieve. One network that shares similarities with deep learning and has received less attention is the Cortical Learning Algorithm (CLA) Hierarchical Temporal Memory (HTM) developed by Numenta [Numenta 2011]. This network is also built around the idea of having hierarchical structures creating a deep neural network. HTM/CLA is

2 http://www.enfor.eu
3 http://www.vitecsoftware.com
4 http://www.expektra.se


currently tailored very specifically for time-series problems and has at the moment little research published around it, making it an ideal candidate for time series prediction with unknown potential on wind power forecasting problems.

Even though there are a lot of prominent methods already developed for wind power forecasting, there is still room for improvements. Energy forecasting is such an important topic that competitions have been developed. The Global Energy Forecasting Competition (GEFCom) [Hong et al 2014] is a competition that can be used to help evaluate the performance of new forecasting models. It is a competition that has attracted hundreds of contestants from all over the world, which has resulted in the contribution of many new and novel ideas. The GEFCom was created in order to:

1 Bring together state-of-the-art techniques for energy forecasting

2 Bridge the gap between academic research and industry practice

3 Promote analytical approaches in power energy education

4 Prepare the industry to overcome forecasting challenges posed by the smart grid world

5 Improve energy forecasting practices

Benchmark datasets and competitions are a valuable resource when evaluating new models as they allow for a common ground to stand on. With the publication of Hong et al [2014], the dataset in GEFCom2012 was also published. This dataset consists of data from 7 different wind farms that span a time period of three years. It consists of observational data from the energy production together with weather forecasts.

The GEFCom is divided into four tracks: load forecasting, price forecasting, wind power forecasting and solar power forecasting. The specified problem in the wind power forecasting track is built around the real-time operation of wind farms and the dataset is structured accordingly. Participants try to predict hourly power generation up to 1–48 hours ahead given meteorological forecasts and historically produced power.

1.1 Problem Formulation

The objective of this thesis is to adapt and analyse the robustness and suitability of Expektra's method when it is applied to the domain of forecasting short-term wind


power generation, i.e., what is needed to adapt this model to wind power forecasting problems?

The second objective of this thesis will be to investigate HTM/CLA [Numenta 2011], a modern state-of-the-art computational theory of the neocortex, i.e., is it possible to use the Numenta Platform for Intelligent Computing (NuPIC) for wind power forecasting problems?

These questions will be addressed by evaluating the models against other models published in the area of short-term wind power forecasting, more specifically those that have been published in GEFCom.

1.2 The scope of the problem

1. This study will focus on short-term forecasting, i.e. forecasts done for 1–48 hours ahead. How to perform well on longer forecasts is left to further investigation.

2. Wind power forecasting is closely related to wind speed forecasting, i.e. trying to use local information at the wind turbine instead of using NWP data to forecast wind power. This thesis will not go into any specific details on how to forecast wind speeds. Readers can take a look at Li and Shi [2010], Cadenas and Rivera [2009], Akinci [2015].

3. There are a lot of neural networks that are worth studying, but this thesis will focus on Expektra's method and HTM/CLA inside NuPIC. NuPIC was picked over other deep learning methods because of its direct relation to time series prediction.


Chapter 2

Background

Wind Power Forecasting (WPF) has applications in generation and transmission maintenance planning, energy optimization, as well as in energy trading. WPF models exist at different scales, and they can be used to predict the production of a single WT or a whole Wind Farm (WF). WPF models are generally divided into two main groups, physical and statistical, but hybrid state-of-the-art methods are also common [Giebel et al 2001, Lang et al 2006]. The physical approach [Landberg and Watson 1994, Gaertner et al 2003] focuses on integrating well-known physical aspects into the model, such as information about the surrounding terrain and properties of the WT. These models try to get as good an estimate of the local wind speed as possible before finally reducing the remaining error with some form of MOS. Statistical approaches [Rodrigues et al 2007, Gonzalez et al 2004] rely more on historical observations and their statistical relation to meteorological predictions, as well as measurements from Supervisory Control And Data Acquisition (SCADA) systems.

SCADA is a control system that is used in most industrial processes and is the main tool to evaluate the state of individual power plants. As the name indicates, it is not a full control system but rather a system for supervision. In the case of wind turbines it allows remote access to online data that has been gathered by sensors inside and around the plant. Variables accessible through these systems are those directly related to the wind flow as well as the produced power generated by the plant, i.e. things like rotor wind speed, nacelle position, pitch angle, active power, etc.


Figure 2.1: A figure that presents the general outline when forecasting using the statistical approach.

Forecast models that use SCADA data as their primary input source usually have a good forecast accuracy, at least for the first few hours, but they tend to be less useful for longer prediction horizons [Giebel et al 2011]. SCADA data can also be used to detect problems in WTs, something that can be helpful to improve the reliability of WTs [Yang et al 2013, Wang et al 2014].

Statistical models, seen in figure 2.1, are usually built around Numerical Weather Prediction (NWP) and SCADA data. NWP models are often used to build forecasting models as they introduce weather forecasts for the region where the wind turbines are located. NWP data usually contains information about things like wind speed, wind direction, temperature and humidity. These models are operated two or four times a day by a number of large weather services. The main forecasts usually start at 00 and 12 UTC, corresponding to the world radiosonde launching, which is the only direct observation of the atmospheric state and has been the backbone of atmospheric monitoring for many decades1; extra forecasts usually start at 06 and 18 UTC. Physical models, as those seen in figure 2.2, include additional information about physical characteristics of the wind turbine and its surroundings, i.e. terrain data, information about obstacles, capacity and layout where the turbine is located, and so on. Other useful information for physical models includes the theoretical

1 Some problems with this approach are that it results in less information over large oceans and poorer countries.


power curve: how much power is expected to be produced given a specific wind speed.
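The idea of a theoretical power curve can be sketched as a small function. The cut-in, rated and cut-out speeds and the rated power below are illustrative assumptions, not values taken from any turbine discussed in this thesis:

```python
# Illustrative theoretical power curve (assumed example parameters).
CUT_IN, RATED, CUT_OUT = 3.0, 12.0, 25.0   # wind speeds in m/s
RATED_POWER = 2.0                           # MW

def power_curve(ws: float) -> float:
    """Expected power output (MW) for a given wind speed (m/s)."""
    if ws < CUT_IN or ws >= CUT_OUT:
        return 0.0                # below cut-in or above cut-out: no production
    if ws >= RATED:
        return RATED_POWER        # rated region: constant output
    # between cut-in and rated: power grows roughly with the cube of wind speed
    return RATED_POWER * (ws**3 - CUT_IN**3) / (RATED**3 - CUT_IN**3)
```

A physical model would apply such a curve after refining the NWP wind speed to hub height, and then correct the result with MOS.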

The time scale of WPF methods is generally divided into 3 main groups: very-short-term (up to 9 hours), short-term (up to 72 hours) and medium-term (up to 7 days) [Costa et al 2008], while the time step for these models is in the range of seconds to days depending on the application. Very-short-term models used for wind power forecasting consist of statistical methods like Kalman Filters, Auto-Regressive Moving Average (ARMA), Auto-Regressive with Exogenous Input (ARX), Box-Jenkins, etc. Inputs to these models are historical observations of wind speed, wind direction, temperature, etc., and common applications for this forecast horizon include things like intraday market trading. Since these methods are merely based on past production, they are generally not useful for longer horizons.

[Figure 2.2 (diagram): SCADA data, NWP data and WFC data feed a physical model; the NWP forecast is downscaled, transformed to hub height and spatially refined, then converted to power through the WT power curve, and finally corrected with Model Output Statistics (MOS) to produce the wind power generation forecast.]

Figure 2.2: A figure that presents the general steps when forecasting using a physical model.

Short-term forecasting models include additional data about historical observations, usually obtained from SCADA systems, but more importantly weather forecasts from NWP models. Machine learning methods used in this area include Neural Networks [Kusiak et al 2009], Support Vector Machines [Fugon et al 2008], Nearest Neighbour Search [Jursa et al 2007], Random Forests, etc. Medium-term forecasting models usually incorporate all the methods used for shorter forecasts as well as physical characteristics.


2.1 Neural Networks and Time Series Prediction

One of the most common ANNs is the so called Multilayer Perceptron (MLP), a network built around some very simple properties of a cortical neuron. This network has been around since the 50's and has matured with a solid mathematical foundation. It has been applied successfully to many different applications. It is built around the idea of having multiple layers of neurons, each layer feeding information forward to the next layer, i.e. a feed-forward neural network.

The main advantage of using multiple layers instead of just a single one is to overcome limitations pointed out by Minsky and Selfridge [1960], Minsky and Seymour [1969], where a single-layer perceptron is only capable of learning linearly separable patterns and thus unable to learn all functions. The MLP is not limited in this way and has been shown to be able to represent a wide variety of functions given appropriate parameters [Hornik et al 1989].
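The classic illustration of this limitation is XOR, which is not linearly separable. A two-layer perceptron with hand-picked (not learned) weights can represent it, which no single-layer perceptron can:

```python
# Minimal two-layer perceptron computing XOR with hand-picked weights
# (illustrative, not learned; a single-layer perceptron cannot do this).
def step(s: float) -> int:
    """Threshold activation of a perceptron on its weighted sum s."""
    return 1 if s > 0 else 0

def mlp_xor(x1: int, x2: int) -> int:
    h_or  = step(1.0 * x1 + 1.0 * x2 - 0.5)      # hidden unit computing OR
    h_and = step(1.0 * x1 + 1.0 * x2 - 1.5)      # hidden unit computing AND
    return step(1.0 * h_or - 1.0 * h_and - 0.5)  # output: OR AND (NOT AND)

for a in (0, 1):
    for b in (0, 1):
        assert mlp_xor(a, b) == (a ^ b)          # matches XOR on all inputs
```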

NuPIC is a platform actively developed and maintained by Numenta. It is a platform that introduces a collection of ideas and algorithms that are inspired by the structural organization of the neocortex. Two of the core concepts found inside NuPIC are the so called Hierarchical Temporal Memory and the Cortical Learning Algorithm. HTM was introduced in Hawkins and Blakeslee [2007] and refers to a hierarchy of cortical regions in the brain.

The neocortex is classically divided into 6 different layers2; the current implementation in NuPIC is focused on emulating layers 3-4 (specifically layer 3), with extensions and research code being done to include more layers.

A CLA region consists of a collection of columns and each column consists of a handful of cells. This structure is based on the minicolumn hypothesis [Buxhoeveden and Casanova 2002]. Each column in the CLA region has its own semantic meaning, and the sparse activity of a handful of active columns will tell us something about the input. The CLA is modelled so that specific cells within each column reflect a temporal context of a pattern, and a single cortical region (a CLA region) is trained with the CLA algorithm.

A typical CLA region consists of around 2K columns containing around 60K neurons in total, while a typical MLP may have fewer than 100 neurons. Each neuron in a HTM network grows new synapses over time, and it is not uncommon to have around 5K synapses per neuron, meaning we would have around 300M synapses in total. A single region of this size uses around 100 MB of memory, and it will take around 10 msec to do 1 inference and learning step3.

2 These layers should not be confused with the hierarchy of regions.
3 These values are given in a talk, "Sensor-Motor Integration in the Neocortex", at the 2013 Hackathon.


The HTM/CLA also differs from the MLP in that it does not directly use scalar weights to represent synaptic connectivity. Synapses inside a HTM have binary weights with a scalar permanence: each synapse will either be connected or disconnected, while the MLP uses scalar weights. The connectedness in HTM/CLA is based on a permanence, which is a value between 0.0 and 1.0. If the permanence is over a certain threshold, we have a connected synapse, i.e. we have weights of either 1 or 0. This binary mechanism is there to simulate synapses that are able to form and unform during learning. So essentially we have a weight change network vs a wiring change network [Chklovskii et al 2004].
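A minimal sketch of this permanence mechanism follows; the threshold and the learning increments are assumed illustrative values, not NuPIC's actual defaults:

```python
# Sketch of HTM-style synapses: each synapse has a scalar permanence in
# [0.0, 1.0] but contributes only a binary weight (connected or not)
# depending on a threshold. All numeric values are illustrative.
CONNECTED_THRESHOLD = 0.2
PERM_INC, PERM_DEC = 0.05, 0.03   # assumed learning nudges

def effective_weights(permanences):
    """Binary weights: 1 if the synapse is connected, else 0."""
    return [1 if p >= CONNECTED_THRESHOLD else 0 for p in permanences]

def learn(permanences, active_inputs):
    """Reinforce synapses on active inputs, weaken the rest (wiring change)."""
    return [min(1.0, p + PERM_INC) if active else max(0.0, p - PERM_DEC)
            for p, active in zip(permanences, active_inputs)]

perms = [0.18, 0.25, 0.70]
print(effective_weights(perms))             # [0, 1, 1]
perms = learn(perms, [True, False, False])  # first synapse strengthens past threshold
print(effective_weights(perms))             # [1, 1, 1]
```

Repeated learning steps thus rewire which synapses exist, rather than fine-tuning scalar weights as in the MLP.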

NuPIC today is naturally geared towards time-series prediction, which makes it an interesting candidate for wind forecasting. NuPIC has been shown to outperform the MLP on financial time series [Gabrielsson et al 2013], and it has been used successfully to balance traffic [Sinkevicius et al 2011]. It has also been used on the NASA Aviation Dataset [Lee and Rajabi 2014], but with less promising results. The MLP has been used successfully for many kinds of time series problems [Azoff 1994, Niska et al 2004, Jain and Kumar 2007].


Chapter 3

Method and Materials

The purpose of this chapter is to provide an explanation of the method used in this thesis. It will present terms and concepts needed and used in the field of Machine Learning, with the focus on how to create forecasts based on time series. It presents a motivational reason for why these methods are used as well as how they are used. This chapter also contains a description of the implemented artificial neural network, how it is structured and optimized. It provides details about the experimental setup and how these experiments relate to GEFCom, and finally a description of how the dataset is structured and processed.

3.1 Preliminaries

3.1.1 Remarks

The methodology used to evaluate the prediction models presented in this thesis is based on the protocol presented in Madsen et al [2005]1, which is a complete protocol that can be used to evaluate the performance of short-term WPF and Wind-to-Power (W2P) models. The reason this protocol was chosen for this thesis was because it has been successfully used before as a guideline to evaluate a wide variety of forecast models, such as AWPPS, Prediktor, Previento, Sipreólico and WPPT. The protocol has been used for both on-shore and off-shore wind farms and was developed in the frame of the ANEMOS research project [Kariniotakis et al 2006], which brought together many relevant groups involved in the field. The aim of ANEMOS was to develop accurate and robust models that substantially outperform current state-of-the-art methods, and one of its goals was to establish a common set

1 It should be pointed out that no widely agreed standardization exists.


of performance measures that can be used to compare forecasts across systems and locations.

3.1.2 Definitions

Time series

A time series sequence X of observations x_t with a particular time-stamp t, i.e. X = [x_1, x_2, ..., x_t], or X_{1:t} for short, is a time-dependent collection of variables.

Forecast horizon

A multi-step-ahead prediction is the assignment of predicting k forecasts X_{t+1:t+k} given a collection of historical observations X_{t-p+1:t}. In the case of GEFCom we want a forecast for 1–48 steps ahead.

Point or spot forecast

In this paper we model the forecast p̂_{t+k|t} as a so called point forecast (or spot forecast), i.e. a single value for each forecast (compared to having a probability distribution).

Prediction error

The Prediction Error e for lead time t + k is defined in equation 3.1 as the difference between the forecast and the actual value, where t denotes the time index, k is the look-ahead time, p is the actual (measured, true) wind power and p̂ is the predicted wind power:

$$e_{t+k|t} = p_{t+k} - \hat{p}_{t+k|t} \tag{3.1}$$

and the normalized prediction error ε is defined as seen in equation 3.2:

$$\varepsilon_{t+k|t} = \frac{1}{p_{\text{inst}}} e_{t+k|t} = \frac{1}{p_{\text{inst}}} \left( p_{t+k} - \hat{p}_{t+k|t} \right) \tag{3.2}$$

where p_inst is the installed capacity of the wind farm (in kW or MW), which is unknown in the competition.


3.1.3 Reference models

Persistence (also called a naïve or plain predictor), as seen in equation 3.3, is the model most commonly used for benchmarking WPF models. This simple model states that the future wind generation value will be the same as the last measured value:

$$\hat{p}^{\text{persistence}}_{t+k|t} = p_t \tag{3.3}$$

An alternative would be to use an even simpler model, a climatology prediction (equation 3.4), i.e. predicting the mean, a value that would be approximated from the training set (see section 3.1.5):

$$\hat{p}^{\text{mean}}_{t+k|t} = \bar{p} = \frac{1}{N} \sum_{t=1}^{N} p_t \tag{3.4}$$

There are other reference models, like the one suggested by Nielsen et al [1998], which have advantages over persistence, but it was never widely adopted and is not used by GEFCom.
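The two reference models of equations 3.3 and 3.4 can be sketched directly; both produce the same value for every lead time k:

```python
# Reference models from equations 3.3 and 3.4 (a minimal sketch).
def persistence(p_t: float, k_max: int) -> list:
    """Persistence: repeat the last measured value for every horizon."""
    return [p_t] * k_max                      # \hat{p}_{t+k|t} = p_t

def climatology(training_power: list, k_max: int) -> list:
    """Climatology: predict the training-set mean for every horizon."""
    mean = sum(training_power) / len(training_power)
    return [mean] * k_max                     # \hat{p}_{t+k|t} = mean

print(persistence(0.42, 3))                   # [0.42, 0.42, 0.42]
print(climatology([1.0, 2.0, 3.0], 2))        # [2.0, 2.0]
```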

3.1.4 Error metrics

In order to understand the reason why specific models perform well, it is usually a good idea to evaluate them against a wide variety of different criteria, as is emphasised by Kariniotakis [1997]. This section describes the error measures used; here N denotes the size of the test set.

Forecast Bias

The Normalized Forecast Bias (NBIAS) describes the systematic error and is defined in equation 3.5; it is estimated by calculating the average error for each step ahead. It gives an indication of the direction of the error:

$$\text{NBIAS}_k = \frac{1}{N} \sum_{t=1}^{N} \varepsilon_{t+k|t} \tag{3.5}$$

This bias is sometimes referred to by some authors as the Mean Forecast Error (MFE). A perfect MFE does not mean the forecast is perfect and contains no error, as positive and negative errors cancel each other out, but it gives an indication that the forecasts are on target on average.


Mean Absolute Error

The Normalized Mean Absolute Error (NMAE) is an error quantity that looks at the average of the absolute error of the prediction and is defined in equation 3.6:

$$\text{NMAE}_k = \frac{1}{N} \sum_{t=1}^{N} \left| \varepsilon_{t+k|t} \right| \tag{3.6}$$

Another common name used instead of Mean Absolute Error (MAE) is Mean Absolute Deviation (MAD). This value shows the magnitude of the overall error that has occurred due to forecasting and thus should be as small as possible. This error is also scale dependent and will be affected by data transformations and the scale of measurements.

Mean Squared Error

The Normalized Mean Squared Error (NMSE) is an error quantity that looks at the average of the squared errors ε²_{t+k|t} and is built using the Normalized Sum of Squared Errors (NSSE) defined in equation 3.7:

$$\text{NSSE}_k = \sum_{t=1}^{N} \varepsilon^2_{t+k|t} \tag{3.7}$$

NMSE is defined in equation 3.8:

$$\text{NMSE}_k = \frac{1}{N} \text{NSSE}_k = \frac{1}{N} \sum_{t=1}^{N} \varepsilon^2_{t+k|t} \tag{3.8}$$

In this error, positive and negative errors do not cancel each other out, and large individual errors will be penalized more harshly.

Root Mean Squared Error

The Normalized Root Mean Squared Error (NRMSE) is the square root of the NMSE; this error is defined in equation 3.9:

$$\text{NRMSE}_k = \text{NMSE}_k^{1/2} = \left( \frac{1}{N} \sum_{t=1}^{N} \varepsilon^2_{t+k|t} \right)^{1/2} \tag{3.9}$$

NRMSE is the main metric that is used in GEFCom, and it shares the same properties as NMSE.
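Equations 3.5-3.9 translate directly into code; a minimal sketch over a list of normalized errors ε_{t+k|t} for one lead time k:

```python
import math

# Error metrics from equations 3.5-3.9, for one lead time k.
# `errors` is the list of normalized prediction errors eps_{t+k|t}.
def nbias(errors):   # equation 3.5: systematic error, signed
    return sum(errors) / len(errors)

def nmae(errors):    # equation 3.6: mean absolute error
    return sum(abs(e) for e in errors) / len(errors)

def nmse(errors):    # equation 3.8: mean squared error
    return sum(e * e for e in errors) / len(errors)

def nrmse(errors):   # equation 3.9: root of the NMSE (GEFCom's main metric)
    return math.sqrt(nmse(errors))

eps = [0.1, -0.1, 0.2]
print(nbias(eps))    # about 0.067: positive and negative errors partly cancel
print(nmae(eps))     # about 0.133
print(nrmse(eps))    # about 0.141: the 0.2 error is penalized more harshly
```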


3.1.5 Model selection

In regression and classification, one of the main issues we are faced with is "How do we create a good model?" One way to define this "goodness" would be to look at the model's generalization ability, i.e. its performance on unseen data.

To make sure the model we are building will generalize to new unseen data, it is important how we select and build the model in the first place; what we have control over is the data we have at hand and how we use that data to build our model.

Training, testing and validating

In the case of supervised learning, we have access to a target series and its associated features, i.e. the production series (our target) and wind speed and wind direction as features.

Common practice within Machine Learning, which is adhered to in this thesis, is to split the whole dataset into 3 smaller subsets, then use two of these subsets to find a good model (i.e. the training set and validation set) while the remaining one (the test set) is used for evaluation, i.e. the estimate of how accurate the model we have created would be on "unseen data". With highly flexible models like an artificial neural network, we need to be careful not to overfit the data, which is one of the reasons why we have the validation set: we don't want to create a model that doesn't generalize well because the model is fitted to every minor variation, i.e. it has captured a lot of noise.
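Since the data is a time series, the three subsets are taken chronologically rather than by shuffling. A sketch of such a split (the 60/20/20 ratios are an assumption for illustration, not the thesis's actual split, which follows the GEFCom periods):

```python
# Chronological three-way split for time series data (illustrative ratios).
# Shuffling before splitting would leak future information into training.
def chronological_split(series, train_frac=0.6, val_frac=0.2):
    n = len(series)
    i_train = int(n * train_frac)           # end of the training set
    i_val = i_train + int(n * val_frac)     # end of the validation set
    return series[:i_train], series[i_train:i_val], series[i_val:]

train, val, test = chronological_split(list(range(10)))
print(train, val, test)   # [0, 1, 2, 3, 4, 5] [6, 7] [8, 9]
```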

3.1.6 Evaluation

The improvement with respect to a considered reference model ref is defined in equation 3.10:

$$I^{\text{ref}}_{EC,k} = 100 \cdot \frac{EC^{\text{ref}}_k - EC_k}{EC^{\text{ref}}_k} \ (\%) \tag{3.10}$$

where the Evaluation Criterion (EC) is any error measurement, such as NMSE or NMAE, etc.
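Equation 3.10 as a small function, a sketch where the two arguments are the error criterion scores of the reference model and of the evaluated model:

```python
# Improvement over a reference model, equation 3.10. `ec_ref` and `ec`
# are the same error criterion (e.g. NRMSE) for reference and model.
def improvement(ec_ref: float, ec: float) -> float:
    return 100.0 * (ec_ref - ec) / ec_ref   # in percent

print(improvement(0.2, 0.1))   # 50.0: the model halves the reference error
```

A positive value means the model beats the reference; a negative value means it is worse.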


Testing period

Date         Time   Forecast
2011-01-01   0100   1-48 hours
2011-01-04   1300   1-48 hours
2011-01-08   0100   1-48 hours
2011-01-11   1300   1-48 hours
...          ...    ...
2012-06-23   0100   1-48 hours
2012-06-26   1300   1-48 hours

Table 3.1: The first repetition of the first period is 8 January 2011 at 0100 to 10 January 2011 at 0000. The second repetition of the first period is 15 January 2011 at 0100 to 17 January 2011 at 0000. In between these periods with missing data, power observations are available for updating the models.

3.2 Experiments

Training and testing is structured based on the structure of the GEFCom. The dataset spans from midnight of the 1st of July 2009 to noon of the 26th of June 2012. The period from the 1st of July 2009 to the 1st of January 2011 at 0100 is used for training and validating, while the rest is used for testing. In the testing range, a number of 48-hour periods are defined (see table 3.1); in between these testing periods exists additional training data, which enables the models to be updated in between forecasts.

The testing periods repeat every 7 days until the end of the dataset, and only meteorological forecasts that were relevant for the periods with missing power data are given; this was done in order to be consistent.

Meteorological forecasts in the GEFCom dataset consist of zonal u and meridional v components collected for surface winds at 10 m above ground level. They were extracted from the archive of the European Centre for Medium-range Weather Forecasts (ECMWF), a service which issues high-resolution deterministic forecasts twice a day, at 00 Coordinated Universal Time (UTC) and 12 UTC; each of these forecast periods consists of data for 1–48 hours ahead. In order to match the hourly resolution of the power measurements, forecasts were interpolated using cubic splines to have an hourly resolution. A summary of the features found in the dataset is found in table 3.2.
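The resampling step can be illustrated as follows. The thesis uses cubic splines; to stay self-contained, this sketch substitutes a Catmull-Rom cubic Hermite interpolant (a simple C¹ cubic passing through all samples), with an assumed 12-hourly sample spacing:

```python
# Resampling coarse NWP values onto an hourly grid with a cubic interpolant.
# Catmull-Rom stands in for the cubic splines mentioned in the text.
def catmull_rom(y0, y1, y2, y3, t):
    """Cubic Hermite value between y1 (t=0) and y2 (t=1)."""
    m1, m2 = (y2 - y0) / 2.0, (y3 - y1) / 2.0   # central-difference tangents
    t2, t3 = t * t, t * t * t
    return ((2*t3 - 3*t2 + 1) * y1 + (t3 - 2*t2 + t) * m1
            + (-2*t3 + 3*t2) * y2 + (t3 - t2) * m2)

def to_hourly(samples, step=12):
    """Resample values given every `step` hours onto an hourly grid."""
    padded = [samples[0]] + list(samples) + [samples[-1]]  # clamp the ends
    hourly = []
    for i in range(len(samples) - 1):
        for h in range(step):
            hourly.append(catmull_rom(padded[i], padded[i + 1],
                                      padded[i + 2], padded[i + 3], h / step))
    hourly.append(samples[-1])
    return hourly

hourly = to_hourly([5.0, 8.0, 6.0], step=12)   # e.g. wind speeds in m/s
print(len(hourly))   # 25 hourly values covering the 24-hour span
```

The interpolant reproduces the original samples exactly at the 12-hour marks and fills in smooth hourly values between them.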


No   Category   Parameter             Alias   Type

1    Date       Date                  date    String
2    Date       Year                  year    Integer
3    Date       Month                 month   Integer
4    Date       Day                   day     Integer
5    Date       Hour                  hours   Integer
6    Date       Week                  week    Integer
7    Forecast   Wind Speed            ws      Real
8    Forecast   Wind Direction (deg)  wd      Real
9    Forecast   Wind U                u       Real
10   Forecast   Wind V                v       Real
11   Forecast   Issued                hp      Integer
12   SCADA      Production            wp      Real

Table 3.2: Preprocessed features from the dataset. The Forecast category represents the forecasts given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the feature we will use in training and testing.


CHAPTER 3 METHOD AND MATERIALS

The database for the meteorological forecasts given by the GEFCom dataset contains sections of missing data, each corresponding to the dates for which the missing power information exists; in a pre-processing step, these sections were filled out with the previous best forecast that was available. For example, if we do not have data for the forecast issued at 2011-01-01 12:00, we use data from 2011-01-01 00:00, as a 48-hours-ahead forecast is available for that date. If the previous section also contained missing data, we would go back an additional section and use those forecasts. If no forecast is available 48 hours back, we can use the best known forecast and extend it in the same fashion as the persistence model.

Any predictions made by these models should fall within a certain range, so any forecast outside this range is clamped. This is the main post-processing step being done, i.e. there is an upper limit on what we can produce.

3.3 Holdback Input Randomization

The Holdback Input Randomization (HIPR) method, described in Kemp et al. [2007], can be used to investigate the importance of the input parameters. It works by sequentially feeding each data point in the test set to the neural network while replacing the values of one input parameter at a time. The replacement values are uniformly distributed random values in the range the neural network was originally trained on, i.e. (−1, 1). An NRMSE score is calculated for each replacement. The result is that we obtain information about the relevance of each input.
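The procedure above can be sketched in a few lines of Python. This is a minimal illustration, assuming a trained model exposed as a `predict` callable and test data already scaled to (−1, 1); the function and variable names are illustrative, not from Kemp et al.:

```python
import numpy as np

def hipr_scores(predict, X, y, rng=None):
    """Holdback Input Randomization sketch: replace one input column at a
    time with uniform random noise in (-1, 1) and record the NRMSE that
    results. Columns whose randomization degrades the score most are the
    most important inputs."""
    rng = np.random.default_rng(rng)
    y_range = y.max() - y.min()

    def nrmse(y_true, y_pred):
        return np.sqrt(np.mean((y_true - y_pred) ** 2)) / y_range

    baseline = nrmse(y, predict(X))      # score with no channel replaced
    scores = {}
    for j in range(X.shape[1]):
        Xr = X.copy()
        Xr[:, j] = rng.uniform(-1.0, 1.0, size=len(X))  # randomize channel j
        scores[j] = nrmse(y, predict(Xr))
    return baseline, scores
```

Comparing each per-channel score against the baseline gives a relative importance ranking of the inputs.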

3.4 Optimization methods

The two main optimization algorithms that have been used in this thesis are the Particle Swarm Optimization (PSO) algorithm, presented by Eberhart and Kennedy [1995], and the Levenberg-Marquardt (LM)2 algorithm, independently developed by Kenneth Levenberg and Donald Marquardt [Marquardt, 1963; Levenberg, 1944]. The PSO algorithm is a population-based stochastic algorithm similar to a Genetic Algorithm (GA) but based on social-psychological principles instead of evolution. It

2 The LM algorithm was used in the beginning of the project, but the reported results ended up using the PSO algorithm. I have included the LM algorithm in this section because speed optimization was measured using this algorithm.


can be summarized by the following steps. Each network was trained multiple times in order to avoid local minima.

• Step 1: Initialize particles with random velocities and accelerations.

• Step 2: Determine which particle is closest to the goal.

• Step 3: Adjust accelerations toward that particle.

• Step 4: Update particle positions based on their velocity, and update velocities based on acceleration.

• Step 5: Go to step 2.
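The steps above can be sketched as a minimal PSO loop. The inertia and acceleration constants `w`, `c1`, `c2` below are common textbook defaults, not values from the thesis:

```python
import numpy as np

def pso_minimize(f, dim, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimization sketch: each particle tracks its
    personal best position and is pulled toward both that and the
    swarm-wide best, per the steps in the text."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-1, 1, (n_particles, dim))     # Step 1: random init
    vel = rng.uniform(-0.1, 0.1, (n_particles, dim))
    pbest = pos.copy()
    pbest_val = np.array([f(p) for p in pos])
    gbest = pbest[pbest_val.argmin()].copy()          # Step 2: best particle

    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # Steps 3-4: accelerate toward personal and global bests, then move
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        vals = np.array([f(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()      # Step 5: repeat from 2
    return gbest, pbest_val.min()
```

In the thesis setting, `f` would be the validation error of a network as a function of its weights or hyperparameters.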

The LM algorithm is a combination of the Error Back-Propagation (EBP) method [Rumelhart et al., 1988] and the Gauss-Newton method. The algorithm has the speed advantage of Gauss-Newton and the stability of EBP [Yu and Wilamowski, 2011]. A detailed treatment of LM can be found in Moré [1978] and of PSO in Poli et al. [2007].

3.5 Neural Networks

3.5.1 Multilayer Perceptron

The perceptron is built around a nonlinear model of a neuron, the McCulloch-Pitts model [McCulloch and Pitts, 1943]. It basically consists of a 2-step process: the cell body contains a summation function of the weighted sum of all inputs, including a bias. The perceptron is described by equation 3.11. The sum s is passed through an activation function (see section 3.5.1), which mimics the activation, or firing, of the neuron.

s = \sum_{i=1}^{M} w_i x_i + x_0 \qquad (3.11)

w_i is the weight of the "synapse" of the input channel and is the parameter we want to adjust, x_i is the input value and x_0 is the bias. M denotes the number of inputs we have.


Figure 3.1: The perceptron. Weighted input signals and a bias are summed and passed through the activation function f(s) to produce the output signal.

The structure of the MLP consists of many perceptrons, and it is shown in figure 3.2. The input signal flows from the input layer at the bottom to the output layer at the top. We have a bias signal, seen on the left side of the diagram, which is set to a fixed number.

MLPs are fully connected networks, meaning that the neurons in any layer of the network are connected to all the neurons in the previous layer. Each connection in the network has a weight w_ij associated with it. The initialization of these weights is done before any training, by randomly assigning very small values to each weight in the graph. The architecture used in this study has only one output variable, i.e. the function we approximate produces forecasts of the power generation given a certain input.
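A forward pass through such a network, with one tanh hidden layer and a single linear output as in figure 3.2, can be sketched as follows. This is an illustrative sketch, not Expektra's implementation:

```python
import numpy as np

def mlp_forward(x, W_hidden, b_hidden, W_out, b_out):
    """Single forward pass of a one-hidden-layer MLP: tanh hidden
    activations followed by a linear output neuron producing one
    power forecast."""
    h = np.tanh(W_hidden @ x + b_hidden)  # hidden layer
    return float(W_out @ h + b_out)       # linear output layer
```

Here `W_hidden` collects the weights w_ij between the input and hidden layers, and `b_hidden`, `b_out` play the role of the fixed bias signals.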


Figure 3.2: Architectural graph of the neural network that will produce a single output value: an input layer (hours, u, v, week, ws, ws−1, ws−2, ws+1, ws+2), tanh hidden layers, a linear output layer, and bias signals. It consists of a collection of hidden neurons in each of H hidden layers as well as M input connections. Each edge seen in this graph has a weight w_ij associated with it.

The performance of neural networks is generally improved if data is normalised, because using the original data directly can cause convergence problems. Normalization is done using the mapminmax function seen in equation 3.12.

y = \frac{(y_{max} - y_{min}) \cdot (x - x_{min})}{x_{max} - x_{min}} + y_{min} \qquad (3.12)

y_max is the max value of the specified range, which in this case is 1, and y_min is −1. x is the value to be scaled, x_max is the max value of the numbers to be scaled and x_min is the min value of the numbers to be scaled.
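Equation 3.12 translates directly into code. A sketch, using the same default target range (−1, 1) as in the text:

```python
import numpy as np

def mapminmax(x, ymin=-1.0, ymax=1.0):
    """Equation 3.12: linearly rescale the values of x into [ymin, ymax]."""
    x = np.asarray(x, dtype=float)
    return (ymax - ymin) * (x - x.min()) / (x.max() - x.min()) + ymin
```

The same x_min and x_max computed on the training data must be reused when scaling validation and test data, so that all splits share one mapping.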

The forecasting model consists of 7 different networks, one for each wind farm; these networks are trained on the first section of the dataset, before the test period. A random 60/20/20 split is created, where 60% of the available data is used for training, 20% of the data is used to validate the network, and 20% is set aside as a hold-out set for the hyperparameters. The input features3 fed into the models are

3 See table 3.2.


ws, u, v, hours, ws+1, ws+2, ws−1, ws−2, where ws+x and ws−x denote a time shift of x hours. We can use ws+x as input into the model because ws is a forecast in itself.

Activation Functions

The computation done for each neuron in the multilayer perceptron requires knowledge of the derivative of the activation function; in other words, the activation function we pick needs to be differentiable. In this thesis we use the following two activation functions: the hyperbolic tangent function, seen in equation 3.13, and

f(s) = \tanh(s) = \frac{e^s - e^{-s}}{e^s + e^{-s}} \qquad (3.13)

the linear transfer function, seen in equation 3.14.

f(s) = \begin{cases} +1 & \text{if } s \ge 1 \\ s & \text{if } -1 < s < 1 \\ -1 & \text{if } s \le -1 \end{cases} \qquad (3.14)
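Both activation functions are one-liners in code; a sketch:

```python
import numpy as np

def tanh_act(s):
    """Hyperbolic tangent activation, equation 3.13."""
    return np.tanh(s)

def satlin(s):
    """Saturating linear transfer function, equation 3.14: identity
    inside (-1, 1), clipped to +/-1 outside."""
    return np.clip(s, -1.0, 1.0)
```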

Hyperparameter optimization

In order to obtain the hyperparameters necessary for each wind farm's model, a random hyperparameter search was performed for all models. A hold-out validation set was used to pick the best hyperparameters. Random search has been shown to work better than grid search when not all parameters are equally important [Bergstra and Bengio, 2012].
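A sketch of such a random search, assuming a `train_and_score` callable that trains a network with a candidate configuration and returns its hold-out validation error (both names are illustrative, not from the thesis code):

```python
import numpy as np

def random_search(train_and_score, space, n_trials=50, seed=0):
    """Random hyperparameter search: sample each parameter independently
    from its candidate list and keep the configuration with the lowest
    validation error."""
    rng = np.random.default_rng(seed)
    best_cfg, best_err = None, float("inf")
    for _ in range(n_trials):
        cfg = {name: rng.choice(candidates) for name, candidates in space.items()}
        err = train_and_score(cfg)
        if err < best_err:
            best_cfg, best_err = cfg, err
    return best_cfg, best_err
```

Unlike grid search, the number of trials is decoupled from the size of the search space, which is what makes random search effective when only a few parameters matter.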

3.5.2 Numenta Platform for Intelligent Computing

This section describes the key principles introduced with NuPIC. It explains in general terms the theory behind the Online Prediction Framework (OPF)4, which uses the CLA and HTM algorithms; the OPF works as an API to create predictive models. The OPF consists of 5 major types of components: Encoders, Spatial Pooler (SP), Temporal Memory (TM), Temporal Pooler (TP) and Classifiers. Together these components construct a single CLA region5, and figure 3.3 demonstrates the information flow through a single region.

4 The OPF is used with Numenta's commercial product Grok.
5 Currently, models created with the OPF do not use a TP, nor does this client allow creation of a hierarchy of regions, i.e. what we would call an HTM; it is only possible to create one region.


Another central concept in NuPIC is the Sparse Distributed Representation (SDR), which refers to the activation of a small percentage of the neurons at any given time. This neuronal activity is represented as an n-dimensional sparse binary vector x = [b_0, ..., b_n] with around 2% active cells. The outputs from the spatial pooler and the temporal memory are both SDRs, while the output from the encoder does not enforce this and is just a normal binary vector. A general overview of the properties of SDRs has been given by Ahmad and Hawkins [2015].

Figure 3.3: Information flow of a single-region predictive model created with the OPF: Encoder → Spatial Pooler → Temporal Memory → Classifier.

The overlap between two different SDRs is defined by

o(x, y) = x \cdot y \qquad (3.15)

A match between two SDRs is defined by

m(x, y) = o(x, y) \ge \theta \qquad (3.16)

where θ is set so that θ ≤ ‖x‖₁ and θ ≤ ‖y‖₁. An interesting property of SDRs, used multiple times inside the temporal memory and especially for predictions, is the fact that a fixed-size set of SDRs can be reliably stored by taking their union as a single pattern. The boolean OR operator is used to create a


Value  Scalar encoding
1      11111000000000
2      01111100000000
10     00000000011111

Table 3.3: Example, with n = 14, r = 5, ψ = 1, of various encoded scalar values using a ScalarEncoder.

new vector from a set of SDRs. The downside of storing patterns this way is that the more patterns we store, the bigger the probability of false positives.
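The overlap, match and union operations of equations 3.15 and 3.16 act directly on binary vectors; a sketch:

```python
import numpy as np

def overlap(x, y):
    """Equation 3.15: number of active bits the two SDRs share."""
    return int(np.dot(x, y))

def match(x, y, theta):
    """Equation 3.16: the SDRs match if their overlap reaches theta."""
    return overlap(x, y) >= theta

def union(sdrs):
    """Store a set of SDRs as one pattern with a boolean OR, as described
    in the text; membership can then be tested via overlap against the
    union, at the cost of a growing false-positive probability."""
    return np.bitwise_or.reduce(np.array(sdrs))
```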

Encoders

NuPIC contains many different encoders6; the job of an encoder is to convert raw input into a more suitable representation (i.e. a binary vector). The raw input is fed into the model using a dictionary data structure.

Useful encoders include scalar and categorical encoders, while more exotic encoders include one for Global Positioning System (GPS) coordinates, which can be used to extract information about anomalous movements. The entries in the dictionary representation of the raw input are each encoded separately and concatenated using a multi-encoder.

One property of the spatial pooler is that overlapping input patterns are mapped to the same SDR. This means that we want the encoder to encode input so that similar inputs share bits. The ScalarEncoder fulfils this property by the process illustrated in table 3.3.

We calculate the range of values to encode using equation 3.17; v_min represents the minimum value of the input signal, while v_max denotes its upper bound.

v_{range} = v_{max} - v_{min} \qquad (3.17)

There are three mutually exclusive parameters that determine the overall size of the output from a scalar encoder: (n, r, ψ). n directly represents the total number

6 A full list of all encoders can be found in the API documentation for NuPIC.


of bits in the output, and it must be greater than or equal to w, which represents the number of bits that are set to encode a single value, i.e. the "width" of the output signal7. r and ψ are specified with respect to the input, while w is specified with respect to the output. Two inputs separated by more than the radius have non-overlapping representations. Two inputs separated by less than the radius will in general overlap in at least some of their bits. Two inputs separated by greater than or equal to the resolution are guaranteed to have different representations. ψ can be calculated using equation 3.18.

\psi = \frac{r}{w} \qquad (3.18)

Depending on whether we want periodic behaviour or not, n is calculated a bit differently; see the scalar encoder for implementation details8.
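A sketch of a non-periodic scalar encoder reproducing the behaviour of table 3.3: a block of w active bits slides across the n output bits as the value moves through [v_min, v_max]. The parameter names follow the text; the details of NuPIC's own implementation (periodic encoders, bucket rounding) are simplified here:

```python
import numpy as np

def scalar_encode(value, vmin, vmax, n=14, w=5):
    """Non-periodic ScalarEncoder sketch: clamp the value to the input
    range, map it to one of n - w + 1 buckets, and set w contiguous bits
    starting at that bucket, so that nearby values share bits."""
    value = min(max(value, vmin), vmax)
    n_buckets = n - w + 1
    resolution = (vmax - vmin) / (n_buckets - 1)  # plays the role of psi
    bucket = int(round((value - vmin) / resolution))
    out = np.zeros(n, dtype=int)
    out[bucket:bucket + w] = 1
    return out
```

With n = 14, w = 5 and inputs in [1, 10] this reproduces the three encodings shown in table 3.3.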

Spatial Pooler

The spatial pooler receives a binary vector as its input from an encoder and outputs an SDR. The structure of the spatial pooler consists of an input space and a set of columns; the output SDR represents which of the columns in a region are active. Each column has synapses connected to around 50% of the input space, randomly chosen; these potential connections are called the "potential pool". Each synapse will connect and disconnect with the input space during learning.

The general information flow of the spatial pooler consists of the following steps. Input stimuli from lower regions lead to activation of the input space, to which each mini-column is synaptically connected; this activation, the so-called "overlap score", is computed for each column. The score is calculated as the total sum of the active inputs that influence that column, weighted with a "boosting factor" that increases the chance for certain columns to win more easily. A final top percentage (usually around 2% activity) of the columns with the highest influence (biggest overlap score) is chosen; columns also inhibit nearby columns, and the result of this process is a binary vector with few active bits, an SDR. This is illustrated in equation 3.19, where b can either be 0 or 1 and s is a value representing the score.

7 w must be odd to avoid centering problems.
8 https://github.com/numenta/nupic


\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \dots & b_n \end{bmatrix}}_{\text{Input vector}}
\cdot
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \dots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \dots & b_{2n} \\
\vdots & & & & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \dots & b_{dn}
\end{bmatrix}}_{\text{Connected synapses for each column}}
=
\underbrace{\begin{bmatrix} s_1 & s_2 & \dots & s_n \end{bmatrix}}_{\text{Overlap score}}
\xrightarrow{\text{Inhibition}}
\underbrace{\begin{bmatrix} b_1 & b_2 & \dots & b_n \end{bmatrix}}_{\text{Output SDR}}
\qquad (3.19)

Learning in this structure is done by adjusting the permanences of the proximal dendrites to better match the input: for each winning column, increase the permanence of the synapses that correctly matched the input and decrease the permanence of the rest. We also increase the boosting factor of losing columns to give them a bigger chance of winning next time.
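One feed-forward step of the spatial pooler, with boosted overlap scores and global inhibition as in equation 3.19, can be sketched as follows. This deliberately simplifies NuPIC's implementation (no local inhibition, no permanence thresholding):

```python
import numpy as np

def spatial_pool(input_vec, connected, boost, sparsity=0.02):
    """Compute each column's boosted overlap with the input, then keep
    only the top `sparsity` fraction of columns active; the result is
    the output SDR."""
    scores = boost * (connected @ input_vec)   # overlap score per column
    k = max(1, int(round(sparsity * len(scores))))
    winners = np.argsort(scores)[-k:]          # columns surviving inhibition
    sdr = np.zeros(len(scores), dtype=int)
    sdr[winners] = 1
    return sdr
```

Here `connected` is a binary matrix with one row per column, marking which input bits that column's connected synapses reach, and `boost` is the per-column boosting factor.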

Temporal Memory

The temporal memory receives an SDR from the spatial pooler, which represents the active columns; the output of the temporal memory is the activity of the whole CLA region. The general idea of the TM9 is that the cells in each mini-column provide temporal context to the pattern identified by the spatial pooler. The TM's job is to form transitions between different SDRs, and it achieves this by allowing active cells to form connections to previously active cells, so that in a future setting each cell is able to predict its own activity.

Each cell in a region has several distal segments, ideally one for each pattern the cell has transitioned from; in practice these segments are connected to a couple of patterns each. A cell enters a predictive state if there is enough activity on a segment, and every segment consists of a collection of connections, synapses that have been formed to a subset of previously active cells (typically around 10-15 cells).

The temporal memory receives a sparse binary vector from the spatial pooler, and the general information flow for this part of the CLA goes through two phases. The first phase has the following steps: 1) for each active column, check whether any cell is in a predictive state; if there is a cell in a predictive

9 There are a lot of details in how the temporal memory is implemented; pseudo-code and more details can be found in [Numenta, 2011], and the nupic git repository is the best source for the finer details: https://github.com/numenta/nupic


state, then 2) activate that particular cell. If no cell was found in a predictive state, then 3) activate all cells in that particular column, a process called bursting.

Bursting is analogous to saying: we do not know which cell to activate, because we have not seen this instance of the sequence of patterns before and are unable to put the column into the correct temporal context, so let us activate all cells to reflect this uncertainty. Bursting does not occur when the temporal context has been predicted by the CLA. Equation 3.20 shows the first phase.

\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \dots & b_n \end{bmatrix}}_{\text{SP SDR}}
\cdot
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \dots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \dots & b_{2n} \\
\vdots & & & & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \dots & b_{dn}
\end{bmatrix}}_{\text{Predictive state}}
=
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \dots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \dots & b_{2n} \\
\vdots & & & & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \dots & b_{dn}
\end{bmatrix}}_{\text{Active state}}
\qquad (3.20)

The second phase of the algorithm figures out which cells should be put into a predictive state for the next time step. This is achieved by checking every distal segment on every cell for activity above a certain threshold, i.e. checking for active segments; one active segment is enough to put a cell into a predictive state. The output from the temporal memory is a vector representing the state of all cells in that region. Equation 3.21 shows phase 2.

\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \dots & b_n \end{bmatrix}}_{\text{Active state}}
\cdot
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \dots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \dots & b_{2n} \\
\vdots & & & & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \dots & b_{dn}
\end{bmatrix}_X}_{\text{Segment } X}
=
\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \dots & b_n \end{bmatrix}_X}_{\text{Segment activation } X}
> \tau
\rightarrow
\underbrace{\begin{bmatrix} s_1 & s_2 & s_3 & \dots & s_n \end{bmatrix}}_{\text{Predictive state}}
\qquad (3.21)

If learning is turned on, the permanences of the synapses connected to active distal segments are updated (in the same way as in the spatial pooler). These changes are marked as temporary until we are sure that the cell is in fact able to correctly predict something with them; after that, the changes either become permanent or are removed. Temporarily marked cells are updated whenever a cell goes from inactive to active through feed-forward input (reinforce the permanences, as we correctly predicted the feed-forward activation). If a cell instead went from active to inactive, the change is undone.
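The phase-2 check can be sketched as follows, with `segments[c]` a list of binary distal-segment connection vectors for cell c. This is a simplification: real segments store permanence values, and the full algorithm tracks much more state:

```python
import numpy as np

def predictive_cells(active_state, segments, tau):
    """A cell enters the predictive state if any of its distal segments
    overlaps the currently active cells by more than the threshold tau
    (one active segment is enough)."""
    pred = np.zeros(len(segments), dtype=int)
    for c, segs in enumerate(segments):
        for seg in segs:
            if int(np.dot(active_state, seg)) > tau:  # segment activation
                pred[c] = 1
                break
    return pred
```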


NuPIC Classifiers

There are many different classifiers included with NuPIC, such as k-Nearest Neighbour (kNN) and a Support Vector Machine (SVM); the OPF in particular uses a custom-built classifier called the "CLAClassifier". The purpose of this classifier is to decode predictions made by the CLA. Classification with the CLAClassifier is performed in different ways depending on how the input has been encoded.

Scalar values are decoded using a process where each cell in a CLA region is paired with two histograms. One histogram keeps track of the frequency of encountered patterns associated with each cell; the other keeps track of a moving average for each bucket. This process is illustrated in figure 3.4.

Figure 3.4: The CLAClassifier. For each cell in the SDR columns, 2 histograms (likelihood and moving average) are maintained over buckets ranging from minvalue to maxvalue.

Training in NuPIC

Training the NuPIC model is done using online learning algorithms. The schema seen in figure 3.5 is used to train and test these models; we have 7 different wind farms, so this schema is applied for each respective wind farm.

This schema is slightly adjusted from the traditional way the OPF handles multi-step predictions. The default way is to use 48 different models, one for


every step ahead in the CLAClassifier, which would give us 48 · 7 = 336 models in total (for all the wind farms). Not only does this change match Expektra's approach, but it also reduces the memory requirements of the CLAClassifier.

Figure 3.5: Training an OPF model. Hyperparameter setup is done with PSO swarming or manually on a pre-training data chunk. In the training phase, online learning is activated and the model produces predictions on the training data stream; in the testing phase, online learning is deactivated and the model produces multistep predictions on the testing data.

Input and hyperparameter selection

Finding hyperparameters for an OPF model was partly done using the custom built-in PSO algorithm and partly configured manually, mainly by ensuring that the PSO algorithm did not remove encoders. A single CLA region contains a lot of hyperparameters; these parameters have been included, with corresponding descriptions, in appendix A. The inputs to the model are date, ws, wp, u, v.


Chapter 4

Result

This chapter contains the primary results obtained in this study. Figures 4.4-4.10 contain different error measurements on the test set for each wind farm. We see that Expektra's ANN model is able to perform well over all wind farms, while the NuPIC model performs worse than Expektra but in general still better than our reference model. NuPIC is off target, with a bias error on most wind farms. The graphs also indicate that NuPIC is unable to pick up some trends, visible in the cumulated ε² graph. Expektra's model, on the other hand, shows no clear problems in cumulated ε² and is on target on all wind farms; appendix C has been included to reflect this at different lead times.

Figure 4.1: Left diagram: cumulative probability of wind speed. Right diagram: scatter diagram of the power curve, normalized power output vs. wind speed. Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


Figure 4.2: Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).

Figure 4.3: Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


Figure 4.4: Different error measurements for Wind Farm 1: NBIAS, NRMSE and NMAE vs. look-ahead time k (in hours), and cumulated ε² over time, for Expektra, NuPIC and Persistence.


Figure 4.5: Different error measurements for Wind Farm 2: NBIAS, NRMSE and NMAE vs. look-ahead time k (in hours), and cumulated ε² over time, for Expektra, NuPIC and Persistence.


Figure 4.6: Different error measurements for Wind Farm 3: NBIAS, NRMSE and NMAE vs. look-ahead time k (in hours), and cumulated ε² over time, for Expektra, NuPIC and Persistence.


Figure 4.7: Different error measurements for Wind Farm 4: NBIAS, NRMSE and NMAE vs. look-ahead time k (in hours), and cumulated ε² over time, for Expektra, NuPIC and Persistence.


Figure 4.8: Different error measurements for Wind Farm 5: NBIAS, NRMSE and NMAE vs. look-ahead time k (in hours), and cumulated ε² over time, for Expektra, NuPIC and Persistence.


Figure 4.9: Different error measurements for Wind Farm 6: NBIAS, NRMSE and NMAE vs. look-ahead time k (in hours), and cumulated ε² over time, for Expektra, NuPIC and Persistence.


Figure 4.10: Different error measurements for Wind Farm 7: NBIAS, NRMSE and NMAE vs. look-ahead time k (in hours), and cumulated ε² over time, for Expektra, NuPIC and Persistence.


                     Wind Farm
User          1      2      3      4      5      6      7      All
Leustagos     0.145  0.138  0.168  0.144  0.158  0.133  0.140  0.146
DuckTile      0.143  0.145  0.172  0.145  0.165  0.137  0.146  0.148
MZ            0.141  0.151  0.174  0.145  0.167  0.141  0.145  0.149
Propeller     0.144  0.153  0.177  0.147  0.175  0.141  0.147  0.152
Duehee Lee    0.157  0.144  0.176  0.160  0.169  0.154  0.148  0.155
Expektra      0.165  0.158  0.184  0.164  0.179  0.153  0.153  0.165
MTU EE5260    0.161  0.172  0.193  0.162  0.192  0.156  0.160  0.168
SunWind       0.174  0.177  0.193  0.176  0.179  0.157  0.162  0.172
ymzsmsd       0.163  0.186  0.200  0.164  0.192  0.162  0.167  0.174
4138 Kalchas  0.180  0.179  0.197  0.175  0.200  0.160  0.165  0.177
NuPIC         0.243  0.254  0.264  0.310  0.290  0.224  0.240  0.264
Persistence   0.302  0.338  0.373  0.364  0.388  0.341  0.361  0.355

Table 4.1: NRMSE scores of the entries published in [Hong et al., 2014]. The NuPIC and Expektra models are added so we can easily compare the results.

4.1 Experimental results

Looking at the primary results presented in this study, we see in table 4.1 that Expektra's ANN model achieves performance comparable to the other methods published with Hong et al. [2014]. This can be used as a starting point for further investigations. A similar approach to Expektra's is Duehee Lee [Lee and Baldick, 2014], as both methods are based on the same neural network architecture but with different optimization techniques. Duehee Lee's network is structured slightly differently from Expektra's model, which consists of only 1 output node, whereas Duehee Lee uses five (representing five predictions ahead). Besides these differences, Duehee Lee also includes additional sub-models to create a prediction ensemble, blending the output of the ANN with a Gaussian Process (GP) to improve very short-term predictions. MTU EE5260 is also a neural network approach that converts meteorological forecasts to power, and here the score is very close. The HTM CLA beats the persistence model but needs more work to outperform the other models in GEFCom.


4.2 Input Importance

For interpretation purposes, in order to understand the model better, an analysis of the relative input parameter importance was performed. This analysis is illustrated in figure 4.11. Each box represents added noise to that channel, which results in a higher NRMSE score if that feature was important. A reference point, "all-channels", represents the error distribution of the model with no input replacement. We clearly see in this figure that the wind speed channel ws is the most important attribute; adding noise to this channel greatly affects the NRMSE score. We also see that the wind components u, v show little to no influence, and that the timestamp-related inputs hours and week both indicate that seasonal and daily trends are present in the dataset.

Figure 4.11: Relative input parameter importance using HIPR: NRMSE distributions for all-channels, hours, u, v, week, ws, ws−1, ws−2, ws−3, ws+1, ws+2, ws+3. "all-channels" reflects the reference point: the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind.


4.2.1 Adaptation and Optimization

All training for Expektra's model was performed on a MacBook Pro with an Intel Core 2 Duo processor (P8600, 2.40 GHz) and 4 GB of installed RAM, running a 64-bit operating system. After adapting the code to run experiments on the GEFCom dataset, an investigation was performed to evaluate the performance of the core source code given by Expektra. This investigation identified some slow parts of the code; after fixing these issues, mainly by using a Math.NET native library instead of the managed provider, a speed test was performed comparing the unoptimized version with the optimized one. Figure 4.12 shows the speed-up achieved during training using the LM algorithm.

Figure 4.12: Training time for the unoptimized (Normal) vs. the optimized version when training a network with 10 input neurons, 10/15/20/25 hidden neurons and 1 output neuron. Training was done for 100 epochs using LM.

4.3 Summary

A plot showing the average NRMSE improvement over the persistence model can be seen in figure 4.13. We see that both models are able to perform better than persistence; Expektra's model performs better than the NuPIC model over the whole forecast


horizon, and NuPIC performs better than persistence towards the end of the forecast.

Figure 4.13: Summarized average improvement (%) in NRMSE over persistence across all wind farms, with 95% confidence intervals, for Expektra and NuPIC as a function of look-ahead time (in hours).


Chapter 5

Discussion

With the exponential growth of computer power it becomes easier to study deeper and more complex networks, and with recent advances in the area of deep learning, interest in ANNs has resurged. In practice, ANNs have been used by most groups in the field of WPF, but these networks never caught on, as it was argued that the improvements made by ANNs were usually not enough to justify the extra effort of training them [Giebel et al., 2011]. This is steadily becoming less of an issue as computation becomes cheaper and bigger networks perform better.

By using a simple MLP network, we observe that we are able to obtain results similar in performance to other models published in Hong et al. [2014]. This can be used as a starting point for further investigations. The GEFCom dataset has a limited number of input features; if we had access to a wider range of them, the power of neural networks could be studied more in depth. Given the number of features in the dataset, this is harder to do.

Using persistence as a reference model was done in order to have the same baseline as the one used in GEFCom, but a better reference model should be considered in the future, such as the one presented in Nielsen et al. [1998], especially if longer forecasts are to be considered.

5.1 Method development issues

Working with Expektra's model is very straightforward and no major issues were encountered during development, but some issues were encountered working with NuPIC1:

1 It should also be pointed out that it helps to have the people who developed the code on the spot, and this is probably the main reason for the little trouble.


1. Installing NuPIC is difficult. This seems to be a general problem, judging by posts on the nupic mailing list2. A very important thing to fix if they want more people to work with this model.

2. The built-in swarming (PSO) did not work well, especially for slightly larger datasets and for 48 prediction steps. The main problems were fitting everything into memory and that the code ran too slowly when swarming over multiple steps ahead, making it practically impossible to use. Scaling down the problem to just a few prediction steps and multiple models helped, but the swarm model kept dismissing wind speed as an important feature, which was fixed by explicitly telling it not to touch that encoder.

3. Inconsistencies between the white-paper documentation of HTM CLA and the actual implementation make it hard to understand the underlying principles of NuPIC. NuPIC implements a very complex network, and any inconsistencies in the documentation make it hard to understand. The code is quite well documented, but the additional material is a bit sparse.
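The swarming mentioned in point 2 is built on particle swarm optimization [Eberhart and Kennedy, 1995]. A minimal, generic PSO loop, with illustrative inertia and attraction constants, and not NuPIC's actual implementation, looks roughly like this:

```python
import numpy as np

def pso_minimize(f, bounds, n_particles=20, iters=100, seed=0):
    """Minimal particle swarm optimizer: each particle tracks its own best
    position and is pulled toward it and toward the swarm-wide best."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    x = rng.uniform(lo, hi, size=(n_particles, len(lo)))
    v = np.zeros_like(x)
    pbest = x.copy()
    pbest_val = np.array([f(p) for p in x])
    gbest = pbest[pbest_val.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        # Inertia 0.7 and attraction 1.5 are common illustrative constants.
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)
        vals = np.array([f(p) for p in x])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, float(pbest_val.min())

# Toy objective with its minimum at (3, -2).
best, val = pso_minimize(lambda p: (p[0] - 3) ** 2 + (p[1] + 2) ** 2,
                         bounds=[(-10, 10), (-10, 10)])
```

The memory and runtime problems described above stem from the fact that each particle evaluation trains and scores a full model, not from the PSO bookkeeping itself, which is cheap.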

These issues are all understandable and are continuously being improved upon. NuPIC is still a young platform, so issues are to be expected given the complexity and research nature of the project. Training the CLAClassifier to use many step predictions instead of just one will most likely reduce the BIAS error seen in the results; to investigate this, a more powerful computer is needed.

There are some differences in the inputs between the models. This setup was created because an OPF model needs to encode the value it is going to predict. SCADA data is available, so this is not an issue, but it may introduce some unfairness between NuPIC and Expektra. In the current setup NuPIC does not seem to gain anything extra from the additional information, and other models in the competition most likely used different inputs as well.

In general, NuPIC differs in the way you feed it data: you send in only the front of the signal, and temporal context is achieved through the temporal memory.

5.2 Future improvements and directions

One issue with the GEFCom dataset is that it does not contain any specific information about the location of the 7 wind farms. It is worth taking a look at how to handle different models that are specialized for different types

2 http://numenta.org/lists

of terrain, as this has been shown to increase performance [Kariniotakis et al., 2006]. Another thing to investigate could be a more advanced scheme for hyperparameter selection: Jursa and Rohrig [2008] show that the WPF error can be reduced by smarter use of optimization algorithms for feature selection and hyperparameter selection, which should improve performance.
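A cheap first step toward such a scheme is plain random search over the hyperparameter space, which Bergstra and Bengio [2012] found competitive with grid search. In the sketch below, the parameter ranges and the scoring function are placeholders, not values used in this study:

```python
import random

def random_search(score, space, n_trials=50, seed=0):
    """Sample hyperparameter configurations at random and keep the best.
    `space` maps each parameter name to a list of candidate values;
    `score` returns a validation error to be minimized."""
    rng = random.Random(seed)
    best_cfg, best_err = None, float("inf")
    for _ in range(n_trials):
        cfg = {name: rng.choice(values) for name, values in space.items()}
        err = score(cfg)
        if err < best_err:
            best_cfg, best_err = cfg, err
    return best_cfg, best_err

# Placeholder objective: pretend 25 hidden neurons and lag 3 are optimal.
space = {"hidden_neurons": [5, 10, 15, 20, 25], "input_lags": [1, 2, 3, 4]}
cfg, err = random_search(
    lambda c: (c["hidden_neurons"] - 25) ** 2 + (c["input_lags"] - 3) ** 2,
    space)
```

In practice the score function would train and validate a model for each configuration, which dominates the cost.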

In general, having more types of inputs from sources like SCADA and NWP systems could give a lot of valuable information that can be used for better predictions. It is clear that wind speed is a key feature for producing good forecasts, which is also supported by the input analysis in section 4.2. It has been argued that the error in WPF models stems largely from wrong weather forecasts [Giebel et al., 2011], so identifying good sources of weather forecasts would be desirable for more effective model learning. Combining several different weather forecasts could also be investigated, as Nielsen et al. [2007] showed that power forecasts based on a number of different meteorological forecasts were better than those based on a single source.
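A simple way to combine competing forecasts, in the spirit of (though much simpler than) Nielsen et al. [2007], is to fit combination weights by least squares on a validation period. A sketch with made-up data:

```python
import numpy as np

def combination_weights(forecasts, observed):
    """Least-squares weights for combining competing forecasts.
    `forecasts` has shape (n_samples, n_models); weights are clipped to be
    non-negative and renormalized to sum to one (a pragmatic choice)."""
    F = np.asarray(forecasts, dtype=float)
    y = np.asarray(observed, dtype=float)
    w, *_ = np.linalg.lstsq(F, y, rcond=None)
    w = np.clip(w, 0.0, None)
    return w / w.sum()

# Made-up example: model A is much less noisy than model B.
rng = np.random.default_rng(1)
truth = rng.uniform(0, 1, 200)
model_a = truth + 0.02 * rng.standard_normal(200)
model_b = truth + 0.2 * rng.standard_normal(200)
w = combination_weights(np.column_stack([model_a, model_b]), truth)
# The combination leans on the more accurate model A.
```

Nielsen et al. [2007] additionally let the weights vary with lead time and adapt over time, which is what makes the combination robust in operation.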

Regarding HTM CLA, it is worth considering implementing a custom encoder, one specifically targeted at data concerning wind farms. SCADA data could also be streamed directly into a NuPIC model, which would allow for some interesting online predictions and anomaly detection.

Training times are always an issue, especially for very large models. Both NuPIC and Expektra's model would benefit from being implemented in a more parallel fashion, as these algorithms are very well suited for parallel computing. The speed-up would be of great value, so looking into and implementing a GPU version of the algorithms could be worthwhile.

5.3 Conclusions

The field of energy forecasting is still young and the necessity for good forecasting techniques is constantly growing. This initial study shows that part of Expektra's forecasting method can be used to forecast wind power generation. We observe that this ANN is able to achieve performance comparable to other methods published in Hong et al. [2014]. Additional work is still needed to obtain state-of-the-art results. Numenta's model is able to beat the reference model but still needs additional work before we can draw definite conclusions about its performance. Constructing a custom encoder could be a first step in that direction.


Bibliography

Subutai Ahmad and Jeff Hawkins. Properties of sparse distributed representations and their application to hierarchical temporal memory. arXiv preprint arXiv:1503.07469, 2015.

T. C. Akinci. Short term wind speed forecasting with ANN in Batman, Turkey. Elektronika ir Elektrotechnika, 107(1):41-45, 2015.

E. Michael Azoff. Neural Network Time Series Forecasting of Financial Markets. John Wiley & Sons, Inc., 1994.

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13(1):281-305, 2012.

Daniel P. Buxhoeveden and Manuel F. Casanova. The minicolumn hypothesis in neuroscience. Brain, 125(5):935-951, 2002.

Erasmo Cadenas and Wilfrido Rivera. Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks. Renewable Energy, 34(1):274-278, 2009.

Dmitri B. Chklovskii, B. W. Mel, and K. Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782-788, 2004.

A. Costa, A. Crespo, J. Navarro, G. Lizcano, H. Madsen, and E. Feitosa. A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews, 12(6):1725-1744, August 2008. ISSN 1364-0321. doi: 10.1016/j.rser.2007.01.015. URL http://dx.doi.org/10.1016/j.rser.2007.01.015.

Russ C. Eberhart and James Kennedy. A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science, volume 1, pages 39-43, New York, NY, 1995.

Shu Fan, James R. Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. Energy Conversion, IEEE Transactions on, 24(2):474-482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento: a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276. Citeseer, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition, EWEC 2008, 6 pp. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving Hierarchical Temporal Memory-Based Trading Models. Springer, 2013.

M. A. Gaertner, C. Gallardo, C. Tejeda, N. Martínez, S. Calabria, N. Martínez, and B. Fernández. The Casandra project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. In Proc. of the 2001 European Wind Energy Conference, EWEC'01, Copenhagen, Denmark, pages 777-780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: A literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G. Lobo. Sipreólico: wind power prediction tool for the Spanish peninsular power system. In Proceedings of the CIGRÉ 40th General Session and Exhibition, Paris, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On Intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357-363, 2014.

Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359-366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41-46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585-592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694-709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference, EWEC'07, 2007.

G. Kariniotakis. Position paper on Joule project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J. Halliday, R. Brownsword, Ignacio Marti, Ana Maria Palomares, I. Cruz, H. Madsen, T. S. Nielsen, Henrik Aa. Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power: overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, 10 pp., 2006.

G. N. Kariniotakis, G. S. Stavrakakis, and E. F. Nogaret. Wind power forecasting using advanced neural networks models. Energy Conversion, IEEE Transactions on, 11(4):762-767, 1996.

Stanley J. Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326-334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275-293, 2009.

Lars Landberg and Simon J. Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171-195, 1994.

S. Lang, C. Mohrlen, J. Jorgensen, B. O. Gallachóir, and E. McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International Solar Energy Society, UK Section, Conference C, volume 85, page 89, 2006.

S. M. Lawan, W. A. W. Z. Abidin, W. Y. Chai, A. Baharun, and T. Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760-1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. Smart Grid, IEEE Transactions on, 5(1):501-510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets, 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164-168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313-2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154-3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa. Nielsen, and Torben S. Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475-489, 2005.

Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431-441, 1963.

Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115-133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons. 1969.

Marvin Lee Minsky and Oliver G. Selfridge. Learning in Random Nets. MIT Lincoln Laboratory, 1960.

Jorge J. Moré. The Levenberg-Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105-116. Springer, 1978.

Henrik Aa. Nielsen, Torben S. Nielsen, Henrik Madsen, Maria J. Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471-482, 2007.

Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29-34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power. 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159-167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms, v0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33-57, 2007.

A. Rodrigues, J. A. Peças Lopes, P. Miranda, L. Palma, C. Monteiro, R. Bessa, J. Sousa, C. Rodrigues, and J. Matos. EPREV: a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference, EWEC, volume 7, 2007.

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5(3):1, 1988.

Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85-117, 2015.

S. Sinkevicius, R. Simutis, and V. Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91-96, 2011.

Ke-Sheng Wang, Vishal S. Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61-69, 2014.

WWEA. 2014 half-year report, pp. 1-8. Technical report, WWEA, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365-376, 2013.

Hao Yu and Bogdan M. Wilamowski. Levenberg-Marquardt training. Industrial Electronics Handbook, 5:12-1, 2011.

Appendix A

Hyper-parameters

Parameters for the spatial pooler

Alias                Default  Description
columnCount          -        The number of cell columns in a cortical region.
globalInhibition     false    If true, the winning columns in the inhibition phase are selected as the most active columns from the region as a whole.
numActivePerInhArea  10       The maximum number of active columns per inhibition area.
synPermActiveInc     0.1      The amount by which an active synapse is incremented in each round, specified as a percent of a fully grown synapse.
synPermConnected     0.10     Controls the threshold at which synapses are connected.
synPermInactiveDec   0.01     The amount by which an inactive synapse is decremented in each round.
potentialRadius      16       Determines the extent of the input that each column can potentially be connected to.

Table A1: Configuration parameters for the spatial pooler.


Parameters for the scalar encoder

Alias       Symbol  Description
w           w       Number of bits to set in the output.
minval      vmin    The lower bound of the input value.
maxval      vmax    The upper bound of the input value.
n           n       Number of bits in the representation (n must be > w).
radius      r       Inputs separated by more than or equal to this distance will have non-overlapping representations.
resolution  ψ       Inputs separated by more than or equal to this distance will have different representations.

Table A2: Configuration parameters for the scalar encoder.
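To illustrate how these parameters interact, the following is a simplified sketch of a scalar encoder, not NuPIC's actual ScalarEncoder: it sets w contiguous bits in an n-bit array so that nearby values share bits and distant values do not (the parameter values are illustrative):

```python
def encode_scalar(value, minval=0.0, maxval=100.0, n=14, w=3):
    """Encode a scalar as n bits with w contiguous active bits.
    Values near each other share active bits; values far apart do not.
    (A simplified sketch, not NuPIC's actual ScalarEncoder.)"""
    assert n > w, "n must be greater than w"
    value = min(max(value, minval), maxval)  # clip to the valid range
    n_buckets = n - w + 1                    # possible start positions
    bucket = int(round((value - minval) / (maxval - minval) * (n_buckets - 1)))
    return [1 if bucket <= i < bucket + w else 0 for i in range(n)]

a = encode_scalar(10.0)
b = encode_scalar(15.0)   # close value: overlapping representation
c = encode_scalar(90.0)   # distant value: no shared bits
```

The radius and resolution parameters in Table A2 control the same trade-off in the real encoder: how far apart two inputs must be before their bit patterns stop overlapping.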

Parameters for the temporal memory

Alias                  Default  Description
activationThreshold    12       Activation threshold for segments.
cellsPerColumn         32       Number of cells per column.
columnCount            2048     The number of cell columns in a cortical region.
globalDecay            0.10     Decrements all synapses a little bit all the time.
initialPerm            0.11     Initial permanence value for a synapse.
inputWidth             -        Size of the input.
maxAge                 100000   Controls global decay. Global decay will only decay segments that have not been activated for maxAge iterations, and will only run the global decay loop every maxAge iterations.
maxSegmentsPerCell     -        The maximum number of segments a cell can have.
maxSynapsesPerSegment  -        The maximum number of synapses a segment can have.
minThreshold           8        The minimum required activity for a segment to learn.
newSynapseCount        15       The maximum number of synapses added to a segment during learning.
permanenceDec          0.10     How much permanence is removed from synapses when learning occurs.
permanenceInc          0.10     How much permanence is added to synapses when learning occurs.
temporalImp            cpp/py   Controls which temporal memory implementation to use.

Table A3: Configuration parameters for the temporal memory.


Appendix B

Wind characteristics

Figure B1: Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components. The intensity (alpha) of each dot reflects how much normalized power was generated: the stronger the dot, the more power.


Figure B2: Wind characteristics for WF 3-7, GEFCom dataset, showing zonal and meridional wind components. The intensity (alpha) of each dot reflects how much normalized power was generated: the stronger the dot, the more power.


Appendix C

Error Distribution

[Figures C1-C14 each consist of six histogram panels, one per lead time (48, 40, 30, 20, 10, and 1 hours ahead), with the forecast error on the x-axis and the frequency on the y-axis.]

Figure C1: Error distribution for different lead times, WF 1 (NuPIC).

Figure C2: Error distribution for different lead times, WF 2 (NuPIC).

Figure C3: Error distribution for different lead times, WF 3 (NuPIC).

Figure C4: Error distribution for different lead times, WF 4 (NuPIC).

Figure C5: Error distribution for different lead times, WF 5 (NuPIC).

Figure C6: Error distribution for different lead times, WF 6 (NuPIC).

Figure C7: Error distribution for different lead times, WF 7 (NuPIC).

Figure C8: Error distribution for different lead times, WF 1 (Expektra).

Figure C9: Error distribution for different lead times, WF 2 (Expektra).

Figure C10: Error distribution for different lead times, WF 3 (Expektra).

Figure C11: Error distribution for different lead times, WF 4 (Expektra).

Figure C12: Error distribution for different lead times, WF 5 (Expektra).

Figure C13: Error distribution for different lead times, WF 6 (Expektra).

Figure C14: Error distribution for different lead times, WF 7 (Expektra).

List of Figures

2.1 A figure that presents the general outline when forecasting using the statistical approach
2.2 A figure that presents the general steps when forecasting using a physical model
3.1 The perceptron
3.2 Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of the H hidden layers, as well as M input connections. Each edge seen in this graph has a weight wij associated with it
3.3 Information flow of a single-region predictive model created with the OPF
3.4 The CLAClassifier
3.5 Training an OPF model
4.1 Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs. wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.2 Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.3 Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.4 Different error measurements for WF 1
4.5 Different error measurements for WF 2
4.6 Different error measurements for WF 3
4.7 Different error measurements for WF 4
4.8 Different error measurements for WF 5
4.9 Different error measurements for WF 6
4.10 Different error measurements for WF 7
4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point, i.e. the network when no channel has been exposed to noise. ws = wind speed, hours/week = timestamps, u, v is a directional vector of the wind
4.12 Plot showing the unoptimized version vs. the optimized one when training a network with 10 input neurons, 10, 15, 20, 25 hidden neurons, and 1 output neuron. Training was done for 100 epochs using LM
4.13 Summarized average improvement over all wind farms, with 95% confidence intervals
B1 Wind characteristics for WF 1 and WF 2, GEFCom dataset: zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated
B2 Wind characteristics for WF 3-7, GEFCom dataset: zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated
C1 Error distribution for different lead times, WF 1
C2 Error distribution for different lead times, WF 2
C3 Error distribution for different lead times, WF 3
C4 Error distribution for different lead times, WF 4
C5 Error distribution for different lead times, WF 5
C6 Error distribution for different lead times, WF 6
C7 Error distribution for different lead times, WF 7
C8 Error distribution for different lead times, WF 1
C9 Error distribution for different lead times, WF 2
C10 Error distribution for different lead times, WF 3
C11 Error distribution for different lead times, WF 4
C12 Error distribution for different lead times, WF 5
C13 Error distribution for different lead times, WF 6
C14 Error distribution for different lead times, WF 7

List of Tables

3.1 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, power observations are available for updating the models
3.2 Preprocessed features from the dataset. The Forecast category represents the forecasts given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the feature we will use in training and testing
3.3 Example where n = 14, r = 5, ψ = 1 of various encoded scalar values using a ScalarEncoder
4.1 NRMSE scores of the entries published in Hong et al. [2014]. The NuPIC model and the Expektra model are added so we can easily compare the results
A1 Configuration parameters for the spatial pooler
A2 Configuration parameters for the scalar encoder
A3 Configuration parameters for the temporal memory



Referat

Närtidsprognos av vindkraftsproduktion genom användandet av artificiella neurala nätverk

(Swedish abstract, in translation:) Wind power is currently the fastest-growing renewable energy source in the world, and with this growth it is important that we develop good forecasting tools. This degree project investigates the use of artificial neural networks applied to short-term forecasts of wind power production. The algorithms investigated are built on the so-called multilayer perceptron (MLP) and the Hierarchical Temporal Memory / Cortical Learning Algorithm (HTM/CLA). These methods are validated and compared on data published in GEFCom, a competition for energy forecasting. The results of the study show that the MLP method can compete with other published methods, and that the HTM/CLA can beat the reference model.

Keywords: multilayer perceptron, machine learning, hierarchical temporal memory

Acknowledgements

I wish to thank first and foremost my great parents, Yvonne Svensson and Benkt Svensson, for always being there to support my interests.

Secondly, I want to thank my supervisor Pawel Herman, not just for the help I have received during this thesis but for the things I have learned from him studying at KTH.

I also want to thank the people at Expektra: Niclas Ehn, Gustav Bergman, Mattias Jonsson, Andreas Johansson, Per Åslund and Joel Ekelöf, for introducing me to their area of expertise and energy forecasting in general.

A special thanks to my good friends and classmates Andrea de Giorgio and Vanya Avramova for all the ideas and discussions we have shared.

MORGAN SVENSSON - SUMMER 2015

Nomenclature

Indices, Constants and Variables

X_{1:n}          A sequence of values X = [x_1, x_2, ..., x_n]
k = 1, ..., k_max  Lead time or look-ahead time
k_max            Maximum prediction horizon
N                Total number of data points
e                Prediction error
ε                Normalized prediction error
p_t              Measure of power generation at time t
p̂_{t+k|t}        Forecast power generation made at time t for look-ahead time t + k
w_ij             Weight of a synapse in the neural network, row i, layer j
b                Binary value
s                Weighted sum, including bias, of a perceptron
x · y            Dot product between x and y
x ∘ y            Row-by-row element-wise multiplication

Unit of measurements

MW  Megawatts
GW  Gigawatts

Notes

Keys and dates are given in this form: Year Month Day Hour

Contents


1 Introduction 1
  11 Problem Formulation 3
  12 The scope of the problem 4

2 Background 5
  21 Neural Networks and Time Series Prediction 8

3 Method and Materials 11
  31 Preliminaries 11
    311 Remarks 11
    312 Definitions 12
    313 Reference models 13
    314 Error metrics 13
    315 Model selection 15
    316 Evaluation 15
  32 Experiments 16
  33 Holdback Input Randomization 18
  34 Optimization methods 18
  35 Neural Networks 19
    351 Multilayer Perceptron 19
    352 Numenta Platform for Intelligent Computing 22

4 Result 31
  41 Experimental results 40
  42 Input Importance 41
    421 Adaptation and Optimization 42
  43 Summary 42

5 Discussion 45
  51 Method development issues 45
  52 Future improvements and directions 46
  53 Conclusions 47

Bibliography 49

Appendices 53

A Hyper-parameters 55

B Wind characteristics 59

C Error Distribution 61

List of Figures 76

List of Tables 78

Chapter 1

Introduction

It has been estimated by the World Wind Energy Association (WWEA) that by the year 2020 around 12% of the world's electricity will be available through wind power, making wind energy one of the fastest growing energy resources [WWEA 2014, Fan et al 2009]. But integrating wind energy into existing electricity supply systems has been a challenge, and numerous objections have been put forward by traditional energy suppliers and grid operators, especially for large-scale use of this energy source. The biggest concern is that availability mainly depends on meteorological conditions, and production cannot be adjusted as conveniently as with other, more conventional energy sources, because of our inability to control the wind. A single Wind Turbine (WT) is highly variable, and its dependency on wind conditions can result in zero output for thousands of hours over the course of a year; however, aggregating wind power generation over bigger areas decreases this chance.

This is where wind power forecasting systems come into play, a technology that can greatly improve the integration of wind energy into electricity supply systems, as forecasting systems provide information on how much wind power can be expected at any given point within the next few days. This removes some of the randomness attributed to wind energy and allows a more accurate way to utilize this clean energy source, while offsetting some of our dependency on other, less environmentally friendly sources, which in the long run will cause a smaller degenerative impact on the environment.

There are many commercial forecasting models available. Prediktor1 [Landberg and Watson 1994] is a physical model developed by the Risø National Laboratory, Denmark. It is constructed to refine Numerical Weather Prediction (NWP) data in order

1 http://www.prediktor.no


to transform data through a power curve to produce the forecast, while improving the error rate with Model Output Statistics (MOS). Previento [Focken et al 2001], developed at the University of Oldenburg, Germany, uses a similar approach to Prediktor but with regional forecasting and uncertainty estimation. The Wind Power Prediction Tool (WPPT)2 [Nielsen et al 2002] is a statistical model developed by the Technical University of Denmark; it consists of a semi-parametric power curve model for wind farms, taking into account both wind speed and direction. It uses dynamical prediction models describing the dynamics of the wind power and any diurnal variations. Zephyr [Giebel et al 2000] is a hybrid model that combines WPPT and Prediktor; in this model each wind farm is assigned a forecast model according to the available data. Sipreólico [Gonzalez et al 2004], developed by Red Eléctrica de España, is a statistical model that was designed to be highly flexible depending on available data. It achieves this by switching between 9 different models. Aiolos Wind3 is a hybrid model developed by Vitec that creates forecasts by combining a statistical model with physical factors such as wind speed at different altitudes, wind direction and air density.

Expektra4 is a service-based company founded in 2010 that provides and develops a new method for short-term demand and supply forecasting based on an Artificial Neural Network (ANN) and traditional time series analysis. They are currently expanding into the area of wind power forecasting and looking for suitable methods. ANNs have been used successfully for both wind speed forecasting [Lawan et al 2014] and wind power forecasting [Kariniotakis et al 1996]. It was demonstrated in Liu et al [2012] that a complex-valued recurrent neural network was able to predict output with high accuracy, and Huang et al [2015] showed that optimizing the initial weights of a back-propagated neural network with a genetic algorithm gave impressive results, so there are good reasons to think that part of the method Expektra is developing can be used for wind power forecasting.

Neural networks in general are highly flexible, and recent advances in deep learning have been shown to outperform previous models in different domains [Schmidhuber 2015]. These networks are very good at automatically finding features that by hand would take a lot of time and effort to engineer. One network that shares similarities with deep learning, and has received less attention, is the Hierarchical Temporal Memory (HTM) with its Cortical Learning Algorithm (CLA), developed by Numenta [Numenta 2011]. This network is also built around the idea of having hierarchical structures creating a deep neural network. The HTM/CLA is

2 http://www.enfor.eu
3 http://www.vitecsoftware.com
4 http://www.expektra.se


currently tailored very specifically for time-series problems and has at the moment little research published around it, making it an ideal candidate for time series prediction with unknown potential on wind power forecasting problems.

Even though there are a lot of prominent methods already developed for wind power forecasting, there is still room for improvement. Energy forecasting is such an important topic that competitions have been developed. The Global Energy Forecasting Competition (GEFCom) [Hong et al 2014] is a competition that can be used to help evaluate the performance of new forecasting models. It has attracted hundreds of contestants from all over the world, which has resulted in the contribution of many new and novel ideas. The GEFCom was created in order to:

1 Bring together state-of-the-art techniques for energy forecasting

2 Bridge the gap between academic research and industry practice

3 Promote analytical approaches in power energy education

4 Prepare the industry to overcome forecasting challenges posed by the smartgrid world

5 Improve energy forecasting practices

Benchmark datasets and competitions are a valuable resource when evaluating new models, as they allow for a common ground to stand on. With the publication of Hong et al [2014], the dataset from GEFCom2012 was also published. This dataset consists of data from 7 different wind farms, spanning a time period of three years. It consists of observational data from the energy production and weather forecasts.

The GEFCom is divided into four tracks: load forecasting, price forecasting, wind power forecasting and solar power forecasting. The specified problem in the wind power forecasting track is built around the real-time operation of wind farms, and the dataset is structured accordingly. Participants try to predict hourly power generation up to 1-48 hours ahead, given meteorological forecasts and historically produced power.

11 Problem Formulation

The objective of this thesis is to adapt and analyse the robustness and suitability of Expektra's method when it is applied to the domain of forecasting short-term wind


power generation, i.e. what is needed to adapt this model to wind power forecasting problems?

The second objective of this thesis is to investigate the HTM/CLA [Numenta 2011], a modern state-of-the-art computational theory of the neocortex, i.e. is it possible to use the Numenta Platform for Intelligent Computing (NuPIC) for wind power forecasting problems?

These questions will be addressed by evaluating the models against other models published in the area of short-term wind power forecasting, more specifically those that have been published in GEFCom.

12 The scope of the problem

1 This study will focus on short-term forecasting, i.e. forecasts done for 1-48 hours ahead. How to perform well on longer forecasts is left to further investigation.

2 Wind power forecasting is closely related to wind speed forecasting, i.e. trying to use local information at the wind turbine instead of using NWP data to forecast wind power. This thesis will not go into any specific details on how to forecast wind speeds; readers can take a look at Li and Shi [2010], Cadenas and Rivera [2009], Akinci [2015].

3 There are a lot of neural networks that are worth studying, but this thesis will focus on Expektra's method and the HTM/CLA inside NuPIC. NuPIC was picked over other deep learning methods because of its direct relation to time series prediction.


Chapter 2

Background

Wind Power Forecasting (WPF) has applications in generation and transmission maintenance planning and energy optimization, as well as in energy trading. WPF models exist at different scales, and they can be used to predict the production of a single WT or a whole Wind Farm (WF). WPF models are generally divided into two main groups, physical and statistical, but hybrid state-of-the-art methods are also common [Giebel et al 2001, Lang et al 2006]. The physical approach [Landberg and Watson 1994, Gaertner et al 2003] focuses on integrating well-known physical aspects into the model, such as information about the surrounding terrain and properties of the WT. These models try to get as good an estimate of the local wind speed as possible before finally reducing the remaining error with some form of MOS. Statistical approaches [Rodrigues et al 2007, Gonzalez et al 2004] rely more on historical observations and their statistical relation to meteorological predictions, as well as on measurements from Supervisory Control And Data Acquisition (SCADA) systems.

SCADA is a control system that is used in most industrial processes and is the main tool to evaluate the state of individual power plants. As the name indicates, it is not a full control system, but rather a system for supervision. In the case of wind turbines, it allows remote access to online data that has been gathered by sensors inside and around the plant. Variables accessible through these systems are those directly related to the wind flow, as well as the produced power generated by the plant, i.e. things like rotor wind speed, nacelle position, pitch angle, active power, etc.


Figure 21: The general outline when forecasting using the statistical approach.

Forecast models that use SCADA data as their primary input source usually have good forecast accuracy, at least for the first few hours, but they tend to be less useful for longer prediction horizons [Giebel et al 2011]. SCADA data can also be used to detect problems in WTs, something that can be helpful to improve the reliability of WTs [Yang et al 2013, Wang et al 2014].

Statistical models, seen in figure 21, are usually built around Numerical Weather Prediction (NWP) and SCADA data. NWP models are often used to build forecasting models, as they introduce weather forecasts for the region where the wind turbines are located. NWP data usually contains information about things like wind speed, wind direction, temperature and humidity. These models are operated two or four times a day by a number of large weather services. The main forecasts usually start at 00 and 12 UTC, corresponding to the world radiosonde launches, which are the only direct observation of the atmospheric state and have been the backbone of atmospheric monitoring for many decades1; extra forecasts usually start at 06 and 18 UTC. Physical models, like the one seen in figure 22, include additional information about the physical characteristics of the wind turbine and its surroundings, i.e. terrain data, information about obstacles, and the capacity and layout of the site where the turbine is located. Other useful information for physical models includes the theoretical

1 One problem with this approach is that it results in less information over large oceans and poorer countries.


power curve: how much power is expected to be produced given a specific wind speed.

The time scales of WPF methods are generally divided into 3 main groups: very-short-term (up to 9 hours), short-term (up to 72 hours) and medium-term (up to 7 days) [Costa et al 2008], while the time step for these models is in the range of seconds to days, depending on the application. Very-short-term models used for wind power forecasting consist of statistical methods like Kalman Filters, Auto-Regressive Moving Average (ARMA), Auto-Regressive with Exogenous Input (ARX), Box-Jenkins, etc. Inputs to these models are historical observations of wind speed, wind direction, temperature, etc., and common applications for this forecast horizon include things like intraday market trading. Since these methods are merely based on past production, they are generally not useful for longer horizons.

[Figure 22 shows the data flow of a physical model: SCADA data, NWP data and WFC data pass through downscaling, transformation to hub height and spatial refinements, are converted to power via the WT power curve, and are corrected with Model Output Statistics (MOS) to produce the forecast wind power generation.]

Figure 22: The general steps when forecasting using a physical model.

Short-term forecasting models include additional data about historical observations, usually obtained from SCADA systems, but more importantly weather forecasts from NWP models. Machine learning methods used in this area include Neural Networks [Kusiak et al 2009], Support Vector Machines [Fugon et al 2008], Nearest Neighbour Search [Jursa et al 2007], Random Forests, etc. Medium-term forecasting models usually incorporate all the methods used for shorter forecasts, as well as physical characteristics.


21 Neural Networks and Time Series Prediction

One of the most common ANNs is the so-called Multilayer Perceptron (MLP), a network built around some very simple properties of a cortical neuron. This network has been around since the 50's and has matured with a solid mathematical foundation. It has been applied successfully to many different applications. It is built around the idea of having multiple layers of neurons, each layer feeding information forward to the next layer, i.e. a feed-forward neural network.

The main advantage of using multiple layers instead of a single one is to overcome the limitations pointed out by Minsky and Selfridge [1960] and Minsky and Seymour [1969], where a single-layer perceptron is only capable of learning linearly separable patterns and thus unable to learn all functions. The MLP is not limited in this way and has been shown to be able to represent a wide variety of functions given appropriate parameters [Hornik et al 1989].
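As a concrete illustration, the forward pass of such a feed-forward network can be sketched in a few lines of NumPy. The layer sizes, tanh activation and random weight values below are illustrative assumptions, not the configuration used in this thesis:

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """Feed-forward pass: inputs -> tanh hidden layer -> linear output."""
    s = W1 @ x + b1        # weighted sums, including bias, of the hidden units
    h = np.tanh(s)         # non-linear activation
    return W2 @ h + b2     # linear output unit (e.g. the power forecast)

# Illustrative sizes: 3 inputs (e.g. ws, u, v), 4 hidden units, 1 output
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)
y = mlp_forward(np.array([0.5, -0.1, 0.3]), W1, b1, W2, b2)
```

In practice the weights would of course be trained, e.g. with the PSO or LM algorithms described in section 34.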

NuPIC is a platform actively developed and maintained by Numenta. It introduces a collection of ideas and algorithms that are inspired by the structural organization of the neocortex. Two of the core concepts found inside NuPIC are the so-called Hierarchical Temporal Memory and the Cortical Learning Algorithm. HTM was introduced in Hawkins and Blakeslee [2007] and refers to a hierarchy of cortical regions in the brain.

The neocortex is classically divided into 6 different layers2; the current implementation in NuPIC is focused on emulating layers 3-4 (specifically layer 3), with extensions and research code being developed to include more layers.

A CLA region consists of a collection of columns, and each column consists of a handful of cells. This structure is based on the minicolumn hypothesis [Buxhoeveden and Casanova 2002]. Each column in the CLA region has its own semantic meaning, and the sparse activity of a handful of active columns will tell us something about the input. The CLA is modelled so that specific cells within each column reflect the temporal context of a pattern, and a single cortical region (a CLA region) is trained with the CLA algorithm.

A typical CLA region consists of around 2K columns containing around 60K neurons in total, while a typical MLP may have fewer than 100 neurons. Each neuron in an HTM network grows new synapses over time, and it is not uncommon to have around 5K synapses per neuron, meaning we would have around 300M synapses in total. A single region of this size uses around 100 MB of memory, and it takes around 10 msec to do one inference and learning step3.

2 These layers should not be confused with the hierarchy of regions.
3 These values are given in the talk "Sensor-Motor Integration in the Neocortex", 2013 Hackathon.


The HTM/CLA also differs from the MLP in that it does not directly use scalar weights to represent synaptic connectivity. Synapses inside an HTM have binary weights with a scalar permanence: each synapse will either be connected or disconnected, while the MLP uses scalar weights. The connectedness in the HTM/CLA is based on a permanence value between 0.0 and 1.0. If the permanence is over a certain threshold we have a connected synapse, i.e. we have weights of either 1 or 0. This binary mechanism is there to simulate synapses that are able to form and unform during learning. So essentially we have a weight-change network vs. a wiring-change network [Chklovskii et al 2004].
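A minimal sketch of this binary-weight mechanism, using hypothetical permanence values and an assumed connection threshold of 0.5 (NuPIC's actual default may differ):

```python
import numpy as np

# Hypothetical permanence values for one cell's potential synapses
permanence = np.array([0.05, 0.2, 0.55, 0.8, 0.31])
threshold = 0.5                              # assumed connection threshold

connected = permanence >= threshold          # effective binary weights (1 or 0)
active_inputs = np.array([1, 1, 1, 0, 1])    # binary input activity

# Overlap: dot product of binary weights with the binary input
overlap = int(connected.astype(int) @ active_inputs)
```

Learning then nudges permanences up or down, so synapses cross the threshold and effectively wire or unwire rather than having their weight fine-tuned.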

NuPIC today is naturally geared towards time-series prediction, which makes it an interesting candidate for wind forecasting. NuPIC has been shown to outperform the MLP on financial time series [Gabrielsson et al 2013], and it has been used successfully to balance traffic [Sinkevicius et al 2011]. It has also been used on the NASA Aviation Dataset [Lee and Rajabi 2014], but with less promising results. MLPs have been used successfully for many kinds of time series problems [Azoff 1994, Niska et al 2004, Jain and Kumar 2007].


Chapter 3

Method and Materials

The purpose of this chapter is to provide an explanation of the method used in this thesis. It presents terms and concepts needed and used in the field of Machine Learning, with a focus on how to create forecasts based on time series. It presents motivational reasons for why these methods are used, as well as how they are used. This chapter also contains a description of the implemented artificial neural network, how it is structured and optimized. It provides details about the experimental setup and how these experiments relate to GEFCom, and finally a description of how the dataset is structured and processed.

31 Preliminaries

311 Remarks

The methodology used to evaluate the prediction models presented in this thesis is based on the protocol presented in Madsen et al [2005]1, which is a complete protocol that can be used to evaluate the performance of short-term WPF and Wind-to-Power (W2P) models. The reason this protocol was chosen for this thesis is that it has been successfully used before as a guideline to evaluate a wide variety of forecast models, such as AWPPS, Prediktor, Previento, Sipreólico and WPPT. The protocol has been used for both on-shore and off-shore wind farms and was developed in the frame of the ANEMOS research project [Kariniotakis et al 2006], which brought together many relevant groups involved in the field. The aim of ANEMOS was to develop accurate and robust models that substantially outperform current state-of-the-art methods, and one of its goals was to establish a common set

1 It should be pointed out that no widely agreed standardization exists.


of performance measures that can be used to compare forecasts across systems and locations.

312 Definitions

Time series

A time series is a sequence X of observations x_t, each with a particular time stamp t, i.e. X = [x_1, x_2, ..., x_t], or in short X_{1:t}; it is a time-dependent collection of variables.

Forecast horizon

A multi-step-ahead prediction is the assignment of predicting k forecasts X_{t+1:t+k} given a collection of historical observations X_{t-p+1:t}. In the case of GEFCom we want a forecast for 1-48 steps ahead.

Point or spot forecast

In this thesis we model the forecast p̂_{t+k|t} as a so-called point forecast (or spot forecast), i.e. a single value for each forecast (as opposed to a probability distribution).

Prediction error

The prediction error e for lead time t + k is defined in equation 31 as the difference between the forecast and the actual value, where t denotes the time index, k is the look-ahead time, p is the actual (measured, true) wind power and p̂ is the predicted wind power:

e_{t+k|t} = p_{t+k} − p̂_{t+k|t}    (31)

and the normalized prediction error ε is defined in equation 32:

ε_{t+k|t} = (1 / p_inst) e_{t+k|t} = (1 / p_inst) (p_{t+k} − p̂_{t+k|t})    (32)

where p_inst is the installed capacity of the wind farm (in kW or MW), which is unknown in the competition.


313 Reference models

Persistence (also called a naïve or plain predictor), as seen in equation 33, is the model most commonly used for benchmarking WPF models. This simple model states that the future wind generation value will be the same as the last measured value:

p̂^persistence_{t+k|t} = p_t    (33)

An alternative would be to use an even simpler model, a climatology prediction (equation 34), i.e. predicting the mean, a value approximated from the training set (see section 315):

p̂^mean_{t+k|t} = p̄ = (1/N) Σ_{t=1}^{N} p_t    (34)

There are other reference models, like the one suggested by Nielsen et al [1998], which have advantages over persistence, but that model was never widely adopted and is not used by GEFCom.
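The two reference models of equations 33 and 34 can be sketched as follows; the production values are made up for illustration:

```python
import numpy as np

def persistence_forecast(p_t, k_max=48):
    """Persistence (equation 33): every look-ahead k gets the last measured value."""
    return np.full(k_max, p_t)

def climatology_forecast(train_power, k_max=48):
    """Climatology (equation 34): every look-ahead k gets the training-set mean."""
    return np.full(k_max, np.mean(train_power))

train = np.array([0.2, 0.4, 0.6, 0.4])   # made-up normalized production
pers = persistence_forecast(train[-1])
clim = climatology_forecast(train)
```

Persistence is hard to beat for the first few hours, while climatology tends to be the stronger baseline at long horizons; a new model should improve on both.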

314 Error metrics

In order to understand why specific models perform well, it is usually a good idea to evaluate them against a wide variety of different criteria, as is emphasised by Kariniotakis [1997]. This section describes the error measures used; here N denotes the size of the test set.

Forecast Bias

The Normalized Forecast Bias (NBIAS) describes the systematic error and is defined in equation 35; it is estimated by calculating the average error for each step ahead, and it gives an indication of the direction of the error:

NBIAS_k = (1/N) Σ_{t=1}^{N} ε_{t+k|t}    (35)

This bias is sometimes referred to by some authors as the Mean Forecast Error (MFE). A perfect MFE does not mean the forecast is perfect and contains no error, as positive and negative errors cancel each other out, but it gives an indication that the forecasts are on target on average.


Mean Absolute Error

The Normalized Mean Absolute Error (NMAE) is an error quantity that looks at the average of the absolute errors of the prediction and is defined in equation 36:

NMAE_k = (1/N) Σ_{t=1}^{N} |ε_{t+k|t}|    (36)

Another common name used instead of Mean Absolute Error (MAE) is Mean Absolute Deviation (MAD). This value shows the magnitude of the overall error that has occurred due to forecasting and thus should be as small as possible. This error is also scale dependent and will be affected by data transformations and the scale of the measurements.

Mean Squared Error

The Normalized Mean Squared Error (NMSE) is an error quantity that looks at the average of the squared errors ε²_{t+k|t} and is built using the Normalized Sum of Squared Errors (NSSE) defined in equation 37:

NSSE_k = Σ_{t=1}^{N} ε²_{t+k|t}    (37)

NMSE is defined in equation 38:

NMSE_k = (1/N) NSSE_k = (1/N) Σ_{t=1}^{N} ε²_{t+k|t}    (38)

In this error, positive and negative errors do not cancel each other out, and large individual errors are penalized more harshly.

Root Mean Squared Error

The Normalized Root Mean Squared Error (NRMSE) is the square root of the NMSE; this error is defined in equation 39:

NRMSE_k = NMSE_k^{1/2} = ( (1/N) Σ_{t=1}^{N} ε²_{t+k|t} )^{1/2}    (39)

NRMSE is the main metric used in GEFCom, and it shares the same properties as NMSE.
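The error measures of equations 35-39 can be computed for a single look-ahead time k with a few lines of NumPy. The power values below are hypothetical, and p_inst is set to 1, i.e. the power is assumed to be already normalized:

```python
import numpy as np

def error_scores(p_true, p_pred, p_inst=1.0):
    """NBIAS, NMAE, NMSE and NRMSE (equations 35-39) for one look-ahead time."""
    eps = (p_true - p_pred) / p_inst     # normalized errors, equation 32
    nbias = eps.mean()                   # equation 35
    nmae = np.abs(eps).mean()            # equation 36
    nmse = (eps ** 2).mean()             # equation 38
    nrmse = np.sqrt(nmse)                # equation 39
    return nbias, nmae, nmse, nrmse

p_true = np.array([0.5, 0.7, 0.2])       # hypothetical observed power
p_pred = np.array([0.4, 0.8, 0.2])       # hypothetical forecasts
nbias, nmae, nmse, nrmse = error_scores(p_true, p_pred)
```

Note how the two equal-magnitude errors of opposite sign cancel in NBIAS but not in NMAE or NRMSE.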


315 Model selection

In regression and classification, one of the main issues we are faced with is "How do we create a good model?" One way to define this "goodness" would be to look at the model's generalization ability, i.e. its performance on unseen data.

To make sure the model we are building will generalize to new, unseen data, it is important how we select and build the model in the first place; what we have control over is the data we have at hand and how we use that data to build our model.

Training testing and validating

In the case of supervised learning we have access to a target series and its associated features, i.e. the production series (our target) with wind speed and wind direction as features.

Common practice within Machine Learning, which is adhered to in this thesis, is to split the whole dataset into 3 smaller subsets, then use two of these subsets to find a good model (the training set and the validation set), while the remaining one (the test set) is used for evaluation, i.e. to estimate how accurate the model we have created would be on "unseen data". With highly flexible models like an artificial neural network we need to be careful not to overfit the data, which is one of the reasons why we have the validation set: we don't want to create a model that doesn't generalize well because it is fitted to every minor variation, i.e. it has captured a lot of noise.
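For time series, the three subsets are normally taken chronologically rather than by random shuffling, so that the test set lies strictly after the training data. A sketch of such a split, with illustrative split fractions:

```python
import numpy as np

def chronological_split(X, train_frac=0.6, val_frac=0.2):
    """Split a series into train/validation/test without shuffling, so the
    test set stays strictly in the 'future' of the training data."""
    n = len(X)
    i = int(n * train_frac)
    j = int(n * (train_frac + val_frac))
    return X[:i], X[i:j], X[j:]

X = np.arange(10)                         # stand-in for a time series
train, val, test = chronological_split(X)
```

Shuffled splits would leak future information into training, giving an optimistic estimate of forecast accuracy.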

316 Evaluation

The improvement with respect to a considered reference model ref is defined in equation 310:

I^ref_{EC,k} = 100 · (EC^ref_k − EC_k) / EC^ref_k  (%)    (310)

where the Evaluation Criterion (EC) is any error measurement, such as NMSE or NMAE, etc.
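Equation 310 is straightforward to compute; the NRMSE values below are made up for illustration:

```python
def improvement(ec_ref, ec):
    """Improvement over a reference model (equation 310), in percent."""
    return 100.0 * (ec_ref - ec) / ec_ref

# e.g. reference NRMSE 0.30 vs. model NRMSE 0.24 (made-up values)
imp = improvement(0.30, 0.24)
```

A positive value means the candidate model beats the reference; a negative value means it is worse.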


Testing period

Date         Time   Forecast
2011-01-01   01:00  1-48 hours
2011-01-04   13:00  1-48 hours
2011-01-08   01:00  1-48 hours
2011-01-11   13:00  1-48 hours
...
2012-06-23   01:00  1-48 hours
2012-06-26   13:00  1-48 hours

Table 31: The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods with missing data, power observations are available for updating the models.

32 Experiments

Training and testing are structured based on the structure of the GEFCom. The dataset spans from midnight of the 1st of July 2009 to noon of the 26th of June 2012. The period from the 1st of July 2009 to the 1st of January 2011 at 01:00 is used for training and validation, while the rest is used for testing. In the testing range, a number of 48-hour periods are defined (see table 31); in between these testing periods exists additional training data, which enables the models to be updated between forecasts.

The testing periods repeat every 7 days until the end of the dataset, and only meteorological forecasts that were relevant for the periods with missing power data are given; this was done in order to be consistent.

Meteorological forecasts in the GEFCom dataset consist of zonal u and meridional v components collected for surface winds at 10 m above ground level. They were extracted from the archive of the European Centre for Medium-range Weather Forecasts (ECMWF), a service that issues high-resolution deterministic forecasts twice a day, at 00 Coordinated Universal Time (UTC) and 12 UTC; each of these forecast periods consists of data for 1-48 hours ahead. In order to match the hourly resolution of the power measurements, the forecasts were interpolated using cubic splines to an hourly resolution. A summary of the features found in the dataset is given in table 32.
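A sketch of such a cubic-spline resampling, here using SciPy's CubicSpline on made-up values of the zonal component u, assuming a 3-hourly raw resolution purely for illustration:

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Hypothetical zonal wind component u at an assumed 3-hourly raw resolution,
# covering one 48-hour forecast period
knots = np.arange(0.0, 49.0, 3.0)          # forecast lead times in hours
u = np.sin(knots / 8.0) + 2.0              # made-up wind component values

# Resample to the hourly resolution of the power measurements
hourly = np.arange(0.0, 49.0)
u_hourly = CubicSpline(knots, u)(hourly)
```

The spline passes exactly through the original forecast values while giving a smooth hourly curve, unlike linear interpolation, which would introduce kinks at each knot.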


No Category Parameter Alias Type

1   Date      Date                  date   String
2   Date      Year                  year   Integer
3   Date      Month                 month  Integer
4   Date      Day                   day    Integer
5   Date      Hour                  hours  Integer
6   Date      Week                  week   Integer
7   Forecast  Wind Speed            ws     Real
8   Forecast  Wind Direction (deg)  wd     Real
9   Forecast  Wind U                u      Real
10  Forecast  Wind V                v      Real
11  Forecast  Issued                hp     Integer
12  SCADA     Production            wp     Real

Table 32: Preprocessed features from the dataset. The Forecast category represents the forecast given by the NWP; multiple forecasts will exist for a given date. The latest issued forecast available is the one used in training and testing.


The database of meteorological forecasts given by the GEFCom dataset contains sections of missing data, each corresponding to a date for which the power information is also missing; these sections were filled out with the previous best forecast available, in a pre-processing step. For example, if we do not have data for the forecast issued at 2011-01-01 12:00, we use data from 2011-01-01 00:00, as a 48-hours-ahead forecast is available for this date. If the previous section also contained missing data, we would go back an additional section and use those forecasts. If no forecast is available 48 hours back, we use the best known forecast and extend it in the same fashion as the persistence model.

Any predictions made by these models should fall within a certain range, so any forecast outside this range will be clamped. This is the main post-processing step being done, i.e. there is an upper limit on what can be produced.
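The fallback-and-clamp handling described above might be sketched as follows. The dictionary layout, the 12-hour step size and the normalized production range (0, 1) are assumptions for illustration, not the thesis's actual data structures:

```python
import numpy as np

def latest_available_forecast(forecasts, issue_hour, step=12):
    """Walk back in `step`-hour increments until an issued forecast is found.
    `forecasts` maps issue hour -> array of 48 hourly predictions (assumed layout)."""
    t = issue_hour
    while t >= 0 and t not in forecasts:
        t -= step
    return forecasts.get(t)

def clamp(forecast, p_min=0.0, p_max=1.0):
    """Post-processing: clamp predictions to the feasible production range."""
    return np.clip(forecast, p_min, p_max)

# Only the 00:00 issue exists, so the 12:00 request falls back to it
forecasts = {0: np.linspace(-0.1, 1.2, 48)}
f = clamp(latest_available_forecast(forecasts, 12))
```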

3.3 Holdback Input Randomization

The Holdback Input Randomization (HIPR) method, described in Kemp et al. [2007], can be used to investigate the importance of the input parameters. It works by sequentially feeding each data point in the test set to the neural network while replacing the values of one input parameter at a time. The replacement values are uniformly distributed random values in the range the neural network was originally trained on, i.e. (-1, 1). An NRMSE score is calculated for each replacement; the result is information about the relevance of each input.
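A minimal sketch of HIPR, with `predict` standing in for the trained network (the helper names are illustrative):

```python
# One input column at a time is replaced with uniform noise in the
# training range (-1, 1) and the NRMSE is recomputed; a large increase
# means the randomized channel was important.
import numpy as np

def nrmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2)) / (y_true.max() - y_true.min())

def hipr_scores(predict, X, y, rng=None):
    rng = rng or np.random.default_rng(0)
    scores = {}
    for j in range(X.shape[1]):
        Xr = X.copy()
        Xr[:, j] = rng.uniform(-1.0, 1.0, size=len(Xr))  # randomize channel j
        scores[j] = nrmse(y, predict(Xr))
    return scores  # higher score => more important input

# Toy check: the target depends only on column 0, so randomizing it
# hurts most while randomizing column 1 changes nothing.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 2))
y = X[:, 0]
scores = hipr_scores(lambda X: X[:, 0], X, y)
assert scores[0] > scores[1]
```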

3.4 Optimization methods

The two main optimization algorithms used in this thesis are the Particle Swarm Optimization (PSO) algorithm presented by Eberhart and Kennedy [1995] and the Levenberg-Marquardt (LM)² algorithm, independently developed by Kenneth Levenberg and Donald Marquardt [Marquardt, 1963; Levenberg, 1944]. The PSO algorithm is a population-based stochastic algorithm, similar to a Genetic Algorithm (GA) but based on social-psychological principles instead of evolution. It

²The LM algorithm was used in the beginning of the project, but the reported results ended up using the PSO algorithm. The LM algorithm is included in this section because speed optimization was measured using it.


can be summarized by the following steps. Each network was trained multiple times in order to avoid getting stuck in a local minimum.

• Step 1: Initialize particles with random velocities and accelerations.

• Step 2: Determine which particle is closest to the goal.

• Step 3: Adjust accelerations toward that particle.

• Step 4: Update particle positions based on their velocities, and update velocities based on accelerations.

• Step 5: Go to Step 2.
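The steps above can be sketched as a compact global-best PSO; the inertia and acceleration constants below are common textbook choices, not the exact settings used in the thesis.

```python
# Global-best PSO in the spirit of Eberhart and Kennedy [1995].
import numpy as np

def pso(f, dim, n_particles=30, iters=300, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5.0, 5.0, (n_particles, dim))   # Step 1: random positions
    v = np.zeros_like(x)                             # ... and velocities
    pbest, pbest_f = x.copy(), np.apply_along_axis(f, 1, x)
    gbest = pbest[pbest_f.argmin()].copy()           # Step 2: best particle
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)  # Step 3
        x = x + v                                    # Step 4: move particles
        fx = np.apply_along_axis(f, 1, x)
        better = fx < pbest_f
        pbest[better], pbest_f[better] = x[better], fx[better]
        gbest = pbest[pbest_f.argmin()].copy()       # Step 5: repeat
    return gbest

best = pso(lambda p: ((p - 1.0) ** 2).sum(), dim=3)  # minimum at (1, 1, 1)
```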

The LM algorithm is a combination of the Error Back-Propagation (EBP) method [Rumelhart et al., 1988] and the Gauss-Newton method. It has the speed advantage of Gauss-Newton and the stability of EBP [Yu and Wilamowski, 2011]. A detailed treatment of LM can be found in Moré [1978] and of PSO in Poli et al. [2007].

3.5 Neural Networks

3.5.1 Multilayer Perceptron

The perceptron is built around a nonlinear model of a neuron, the McCulloch-Pitts model [McCulloch and Pitts, 1943]. It basically consists of a 2-step process: the cell body contains a summation function over the weighted sum of all inputs, including a bias, as described by equation (3.11). The sum s is then passed through an activation function (see section 3.5.1), which mimics the activation, or firing, of the neuron.

s = \sum_{i=1}^{M} w_i x_i + x_0 \qquad (3.11)

w_i is the weight of the "synapse" of input channel i and is the parameter we want to adjust, x_i is the input value, and x_0 is the bias. M denotes the number of inputs.
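Equation (3.11) followed by the activation, written directly as code; the weights and inputs here are arbitrary example values.

```python
# Forward pass of a single perceptron: weighted sum plus bias, then a
# nonlinear activation (tanh, as used in this thesis's hidden layers).
import numpy as np

def perceptron(x, w, bias, activation=np.tanh):
    s = np.dot(w, x) + bias          # s = sum_i w_i x_i + x_0
    return activation(s)

out = perceptron(x=np.array([0.5, -0.2]), w=np.array([1.0, 2.0]), bias=0.1)
assert -1.0 < out < 1.0              # tanh keeps the output in (-1, 1)
```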



Figure 3.1: The perceptron.

The structure of the MLP consists of many perceptrons, and it is shown in figure 3.2. The input signals flow from the input layer at the bottom to the output layer at the top. A bias signal, seen on the left side of the diagram, is set to a fixed value.

MLPs are fully connected networks, meaning that every neuron in any layer of the network is connected to all the neurons in the previous layer. Each connection in the network has a weight w_ij associated with it. These weights are initialized before any training is done, by randomly assigning very small values to the respective weights in the graph. The architecture used in this study has only one output variable, i.e. the function we approximate produces a forecast of the power generation given a certain input.



Figure 3.2: Architectural graph of the neural network, which produces a single output value. It consists of a collection of hidden neurons in each of the H hidden layers, as well as M input connections (hours, u, v, week, ws and time-shifted ws values); the hidden neurons use tanh activations and the output neuron is linear. Each edge in this graph has a weight w_ij associated with it.

The performance of neural networks generally improves if the data is normalised, because using the original data directly can cause convergence problems. Normalization is done using the mapminmax function seen in equation (3.12).

y = \frac{(y_{max} - y_{min}) \cdot (x - x_{min})}{x_{max} - x_{min}} + y_{min} \qquad (3.12)

y_max is the maximum value of the specified range, which in this case is 1, and y_min is -1; x is the value to be scaled, x_max is the maximum of the values to be scaled and x_min is their minimum.
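Equation (3.12) as code, mapping data into (-1, 1) as used for training; the function name mirrors the MATLAB-style `mapminmax` named in the text.

```python
# Min-max scaling into a target range, default (-1, 1).
import numpy as np

def mapminmax(x, ymin=-1.0, ymax=1.0):
    x = np.asarray(x, dtype=float)
    return (ymax - ymin) * (x - x.min()) / (x.max() - x.min()) + ymin

scaled = mapminmax([0.0, 5.0, 10.0])
assert scaled.min() == -1.0 and scaled.max() == 1.0
assert scaled[1] == 0.0  # the midpoint maps to the middle of the range
```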

The forecasting model consists of 7 different networks, one for each wind farm. These networks are trained on the first section of the dataset, before the test period. A random 60/20/20 split is created, where 60% of the available data is used for training, 20% is used to validate the network, and 20% is set aside as a hold-out set for the hyperparameters. The input features³ fed into the models are

³See table 3.2.


ws, u, v, hours, ws, ws+1, ws+2, ws-1, ws-2, where ws+x and ws-x denote a time shift of x. We can use ws+x as input to the model because ws is a forecast in itself.
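Building the lagged and led wind-speed inputs can be sketched as follows; the edge-padding choice here is an assumption for illustration, not taken from the thesis.

```python
# Columns are ws shifted by each value in `shifts` (ws-2 .. ws+2);
# edges are padded by repeating the boundary value.
import numpy as np

def shifted_features(ws, shifts=(-2, -1, 0, 1, 2)):
    ws = np.asarray(ws, dtype=float)
    cols = []
    for k in shifts:
        idx = np.clip(np.arange(len(ws)) + k, 0, len(ws) - 1)
        cols.append(ws[idx])
    return np.column_stack(cols)

X = shifted_features([1.0, 2.0, 3.0, 4.0])
assert X.shape == (4, 5)
assert X[1, 3] == 3.0  # the ws+1 column at t=1 holds the forecast for t=2
```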

Activation Functions

The computation done for each neuron in the multilayer perceptron requires knowledge of the derivative of the activation function; in other words, the activation function we pick needs to be differentiable. In this thesis we use the following two activation functions: the hyperbolic tangent function, seen in equation (3.13), and

f(s) = \tanh(s) = \frac{e^{s} - e^{-s}}{e^{s} + e^{-s}} \qquad (3.13)

the saturating linear transfer function, seen in equation (3.14).

f(s) = \begin{cases} +1 & \text{if } s \ge 1 \\ s & \text{if } -1 < s < 1 \\ -1 & \text{if } s \le -1 \end{cases} \qquad (3.14)
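The two activation functions of equations (3.13) and (3.14) written out; `satlin` is a name chosen here for the piecewise function, which is simply a clip to [-1, 1].

```python
# tanh from its exponential definition, and the saturating linear
# transfer function as a clip.
import numpy as np

def tanh_act(s):
    return (np.exp(s) - np.exp(-s)) / (np.exp(s) + np.exp(-s))

def satlin(s):
    return np.clip(s, -1.0, 1.0)  # +1 if s>=1, s if -1<s<1, -1 if s<=-1

assert abs(tanh_act(0.5) - np.tanh(0.5)) < 1e-12
assert satlin(2.0) == 1.0 and satlin(-3.0) == -1.0 and satlin(0.25) == 0.25
```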

Hyperparameter optimization

In order to obtain the hyperparameters necessary for each wind farm's model, a random hyperparameter search was performed for all models, and a hold-out validation set was used to pick the best hyperparameters. Random search has been shown to work better than grid search when not all parameters are equally important [Bergstra and Bengio, 2012].

3.5.2 Numenta Platform for Intelligent Computing

This section describes the key principles introduced with NuPIC. It explains in general terms the theory behind the Online Prediction Framework (OPF)⁴, which uses the CLA and HTM algorithms; the OPF works as an API for creating predictive models. The OPF consists of 5 major types of components: Encoders, Spatial Pooler (SP), Temporal Memory (TM), Temporal Pooler (TP) and Classifiers. Together these components construct a single CLA region⁵, and figure 3.3 demonstrates the information flow through a single region.

⁴The OPF is used with Numenta's commercial product Grok.
⁵Currently, models created with the OPF do not use a TP, nor does this client allow creation of a hierarchy of regions, i.e. what we would call an HTM; it is only possible to create one region.


Another central concept in NuPIC is the Sparse Distributed Representation (SDR), which refers to the activation of a small percentage of the neurons at any given time. This neuronal activity is represented as an n-dimensional sparse binary vector x = [b_0, ..., b_n] with around 2% active cells. The outputs from the spatial pooler and the temporal memory are both SDRs, while the output from the encoder does not enforce this and is just a normal binary vector. A general overview of the properties of SDRs is given by Ahmad and Hawkins [2015].

[Figure: Encoder → Spatial Pooler → Temporal Memory → Classifier]

Figure 3.3: Information flow of a single-region predictive model created with the OPF.

The overlap between two different SDRs is defined by

o(x, y) = x \cdot y \qquad (3.15)

A match between two SDRs is defined by

m(x, y) = o(x, y) \ge \theta \qquad (3.16)

where θ is set such that θ ≤ ‖x‖₁ and θ ≤ ‖y‖₁. An interesting property of SDRs, used multiple times inside the temporal memory and especially for predictions, is that a fixed-size set of SDRs can be reliably stored as a single pattern by taking their union. The boolean OR-operator is used to create a



Value  Scalar Encoding

1      11111000000000
2      01111100000000
10     00000000011111

Table 3.3: Examples of various scalar values encoded using a ScalarEncoder, where n = 14, r = 5, ψ = 1.

new vector from a set of SDRs. The downside of storing patterns this way is that the more patterns we store, the bigger the probability of false positives.
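Equations (3.15) and (3.16) plus the union property, with SDRs as binary NumPy vectors; the threshold handling is a simplified reading of the text.

```python
# Overlap, match and union for binary SDR vectors.
import numpy as np

def overlap(x, y):
    return int(np.dot(x, y))              # o(x, y) = x . y

def match(x, y, theta):
    return overlap(x, y) >= theta         # m(x, y) = o(x, y) >= theta

def union(sdrs):
    return np.bitwise_or.reduce(sdrs)     # store a set as one pattern

a = np.array([1, 0, 1, 0, 1, 0])
b = np.array([1, 0, 0, 0, 1, 0])
assert overlap(a, b) == 2
assert match(a, b, theta=2)
u = union([a, b])
assert match(u, a, theta=3)  # the union still "contains" each member
```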

Encoders

NuPIC contains many different encoders⁶; the job of an encoder is to convert raw input into a more suitable representation (i.e. a binary vector). The raw input is fed into the model using a dictionary data structure.

Useful encoders include scalar and categorical encoders, while more exotic encoders include one for Global Positioning System (GPS) coordinates, which can be used to extract information about anomalous movements. The entries of the dictionary of raw inputs are each encoded separately and concatenated using a multi-encoder.

One property of the spatial pooler is that overlapping input patterns are mapped to the same SDR. This means that we want the encoder to encode input so that similar inputs share bits. The ScalarEncoder fulfils this property by the process illustrated in table 3.3.

We calculate the range of values to encode using equation (3.17), where v_min represents the minimum value of the input signal and v_max denotes its upper bound.

v_{range} = v_{max} - v_{min} \qquad (3.17)

There are three mutually exclusive parameters that determine the overall size of the output from a scalar encoder: (n, r, ψ). n directly represents the total number

⁶A full list of all encoders can be found in the API documentation for NuPIC.


of bits in the output, and it must be greater than or equal to w, which represents the number of bits that are set to encode a single value, i.e. the "width" of the output signal⁷. r and ψ are specified with respect to the input, while w is specified with respect to the output. Two inputs separated by more than the radius have non-overlapping representations; two inputs separated by less than the radius will in general overlap in at least some of their bits; and two inputs separated by at least the resolution are guaranteed to have different representations. ψ can be calculated using equation (3.18).

\psi = \frac{r}{w} \qquad (3.18)

Depending on whether we want periodic behaviour or not, n is calculated a bit differently; see the scalar encoder implementation for details⁸.
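A minimal non-periodic scalar encoder that reproduces the behaviour in table 3.3 (n = 14, w = 5 active bits, resolution 1); this is a simplified sketch, not NuPIC's actual ScalarEncoder.

```python
# Encode a scalar as w contiguous active bits within an n-bit output;
# nearby values share bits, which is what the spatial pooler needs.
import numpy as np

def encode_scalar(v, vmin=1.0, vmax=10.0, n=14, w=5):
    resolution = (vmax - vmin) / (n - w)          # 1.0 for these defaults
    i = int(round((min(max(v, vmin), vmax) - vmin) / resolution))
    out = np.zeros(n, dtype=int)
    out[i:i + w] = 1                              # w contiguous active bits
    return out

assert list(encode_scalar(1)) == [1,1,1,1,1,0,0,0,0,0,0,0,0,0]
assert list(encode_scalar(10)) == [0,0,0,0,0,0,0,0,0,1,1,1,1,1]
assert np.dot(encode_scalar(2), encode_scalar(3)) > 0  # neighbours overlap
```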

Spatial Pooler

The spatial pooler receives a binary vector as input from an encoder and outputs an SDR. The structure of the spatial pooler consists of an input space and a set of columns; the output SDR represents which of the columns in a region are active. Each column has synapses connected to the input space: each column is randomly and potentially connected to around 50% of the input space, the so-called "potential pool". Each synapse will connect to and disconnect from the input space during learning.

The general information flow of the spatial pooler consists of the following steps. Input stimuli from lower regions lead to activation of the input space, to which each mini-column is synaptically connected. This activation, the so-called "overlap score", is computed for each column: the score is the total sum of the active inputs that influence the column, weighted by a "boosting factor" that increases the chance for certain columns to win more easily. A final top percentage (usually around 2% activity) of the columns with the highest overlap scores is chosen; columns also inhibit nearby columns, and the result of this process is a binary vector with few active bits, an SDR. This is illustrated in equation (3.19), where b can be either 0 or 1 and s is a value representing the score.

⁷w must be odd to avoid centering problems.
⁸https://github.com/numenta/nupic


\underbrace{[b_1\ b_2\ b_3\ \dots\ b_n]}_{\text{Input vector}}
\cdot
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \dots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \dots & b_{2n} \\
\vdots & & & & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \dots & b_{dn}
\end{bmatrix}}_{\text{Connected synapses for each column}}
=
\underbrace{[s_1\ s_2\ \dots\ s_n]}_{\text{Overlap score}}
\xrightarrow{\text{Inhibition}}
\underbrace{[b_1\ b_2\ \dots\ b_n]}_{\text{Output SDR}}
\qquad (3.19)

Learning in this structure is done by adjusting the permanences of the proximal dendrites to better match the input: for each winning column, the permanences of the synapses that correctly matched the input are increased, and the permanences of the rest are decreased. We also increase the boosting factor of losing columns, to give them a bigger chance of winning next time.
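The overlap-and-inhibition step of equation (3.19) can be sketched in matrix form; boosting is omitted, global top-k inhibition stands in for the local inhibition described above, and all names are illustrative.

```python
# Connected synapses as a binary columns-by-inputs matrix; the columns
# with the highest overlap scores survive inhibition and form the SDR.
import numpy as np

def spatial_pool(input_vec, synapses, active_columns=2):
    scores = synapses @ input_vec                    # overlap score per column
    winners = np.argsort(scores)[-active_columns:]   # simplified inhibition
    sdr = np.zeros(len(synapses), dtype=int)
    sdr[winners] = 1
    return sdr, scores

rng = np.random.default_rng(0)
synapses = (rng.random((8, 12)) > 0.5).astype(int)   # 8 columns, 12 inputs
x = rng.integers(0, 2, 12)
sdr, scores = spatial_pool(x, synapses)
assert sdr.sum() == 2                        # sparse output: 2 active columns
assert scores[sdr == 1].min() >= scores[sdr == 0].max()  # winners score highest
```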

Temporal Memory

The temporal memory receives an SDR from the spatial pooler, representing the active columns; the output of the temporal memory is the activity of the whole CLA region. The general idea of the TM⁹ is that the cells in each mini-column provide temporal context to the pattern identified by the spatial pooler. The TM's job is to form transitions between different SDRs, and it achieves this by allowing active cells to form connections to previously active cells, so that in a future setting each cell is able to predict its own activity.

Each cell in a region has several distal segments, ideally one for each pattern the cell has transitioned from; in practice these segments each connect to a couple of patterns. A cell enters a predictive state if there is enough activity on a segment, and every segment consists of a collection of connections, synapses, that have been formed to a subset of previously active cells (typically around 10-15 cells).

The temporal memory receives a sparse binary vector from the spatial pooler, and the general information flow for this part of the CLA goes through two phases. The first phase has the following steps: 1) for each active column, check whether any cell is in a predictive state; if there is a cell in a predictive

⁹There are a lot of details in how the temporal memory is implemented; pseudo-code and more details can be found in [Numenta, 2011]. The nupic git repository is the best source for finer details: https://github.com/numenta/nupic


state, then 2) activate that particular cell. If no cell was found in a predictive state, then 3) activate all cells in that particular column, a process called bursting.

Bursting is analogous to saying: we do not know which cell to activate, because we have not seen this instance of the sequence of patterns before and are unable to put the column into the correct temporal context, so let us activate all cells to reflect this uncertainty. Bursting does not occur when the temporal context has been predicted by the CLA. Equation (3.20) shows the first phase.

\underbrace{[b_1\ b_2\ b_3\ \dots\ b_n]}_{\text{SP SDR}}
\circ
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \dots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \dots & b_{2n} \\
\vdots & & & & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \dots & b_{dn}
\end{bmatrix}}_{\text{Predictive state}}
=
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \dots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \dots & b_{2n} \\
\vdots & & & & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \dots & b_{dn}
\end{bmatrix}}_{\text{Active state}}
\qquad (3.20)

The second phase of the algorithm figures out which cells should be put into a predictive state for the next time step. This is achieved by checking every distal segment of every cell for activity above a certain threshold, i.e. checking for active segments; one active segment is enough to put a cell into a predictive state. The output from the temporal memory is a vector representing the state of all cells in the region. Equation (3.21) shows phase 2.

\underbrace{[b_1\ b_2\ b_3\ \dots\ b_n]}_{\text{Active state}}
\cdot
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \dots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \dots & b_{2n} \\
\vdots & & & & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \dots & b_{dn}
\end{bmatrix}_X}_{\text{Segment } X}
=
\underbrace{[b_1\ b_2\ b_3\ \dots\ b_n]_X}_{\text{Segment activation } X}
> \tau
\rightarrow
\underbrace{[s_1\ s_2\ s_3\ \dots\ s_n]}_{\text{Predictive state}}
\qquad (3.21)

If learning is turned on, the permanences of the synapses connected to active distal segments are updated (in the same way as in the spatial pooler). These changes are marked as temporary until we are sure that a cell is in fact able to correctly predict something with them; after that, each change either becomes permanent or is removed. Temporarily marked changes are made permanent whenever a cell goes from inactive to active from feed-forward input (reinforcing them, as we correctly predicted the feed-forward activation); if a cell instead went from active to inactive, the change is undone.
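Phase 1 above, including bursting, can be sketched for a single time step; the array shapes (d cells per column, n columns) follow the text, and the function name is illustrative.

```python
# Columns with a predicted cell activate only those cells; columns with
# no prediction burst, activating every cell to signal unknown context.
import numpy as np

def phase1_activate(active_columns, predictive_state):
    """active_columns: binary (n,); predictive_state: binary (d, n)."""
    d, n = predictive_state.shape
    active = np.zeros((d, n), dtype=int)
    for c in np.flatnonzero(active_columns):
        predicted = np.flatnonzero(predictive_state[:, c])
        if predicted.size:              # context known: activate those cells
            active[predicted, c] = 1
        else:                           # unknown context: burst the column
            active[:, c] = 1
    return active

pred = np.array([[0, 0], [1, 0], [0, 0]])  # cell 1 of column 0 is predictive
act = phase1_activate(np.array([1, 1]), pred)
assert act[:, 0].tolist() == [0, 1, 0]     # only the predicted cell fires
assert act[:, 1].tolist() == [1, 1, 1]     # column 1 bursts
```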


NuPIC Classifiers

There are many different classifiers included with NuPIC, such as a k-Nearest Neighbour (kNN) classifier and a Support Vector Machine (SVM); the OPF in particular uses a custom-built classifier called the "CLAClassifier". The purpose of this classifier is to decode predictions made by the CLA. Classification with the CLAClassifier is performed in different ways depending on how the input has been encoded.

Scalar values are decoded using a process where each cell in a CLA region is paired with two histograms: one keeps track of the frequency of encountered patterns associated with each cell, and the other keeps track of a moving average for each bucket. This process is illustrated in figure 3.4.

[Figure: for each column of the SDR, every cell keeps 2 histograms over buckets between a min and a max value: a pattern likelihood and a moving average]

Figure 3.4: The CLAClassifier.

Training in NuPIC

Training a NuPIC model is done using online learning. The schema seen in figure 3.5 is used to train and test these models; since there are 7 different wind farms, this schema is repeated for each wind farm.

This schema is slightly adjusted from the traditional way the OPF handles multi-step predictions. The default way is to use 48 different models, one for


every step ahead in the CLAClassifier, which would give us 48 · 7 = 336 models in total (for all the wind farms). Not only does this change match Expektra's approach, but it also reduces the memory requirements of the CLAClassifier.

[Figure: training phase: a pre-training data chunk from the dataset drives the hyperparameter setup (PSO swarming or manual), after which the training data stream is fed to the OPF model with online learning activated; testing phase: test data is fed to the OPF model with online learning deactivated, producing multi-step predictions]

Figure 3.5: Training an OPF model.

Input and hyperparameter selection

Finding hyperparameters for an OPF model was partly done using the custom built-in PSO algorithm and partly configured manually, mainly by ensuring that the PSO algorithm did not remove encoders. A single CLA region contains a lot of hyperparameters; these parameters are included with corresponding descriptions in appendix A. The inputs to the model are date, ws, wp, u and v.


Chapter 4

Result

This chapter presents the primary results obtained in this study. Figures 4.4-4.10 contain different error measurements on the test set for each wind farm. We see that Expektra's ANN model performs well across all wind farms, while the NuPIC model performs worse than Expektra's but in general still better than our reference model. NuPIC is off target, with a bias error on most wind farms. The graphs also indicate that NuPIC is unable to pick up some trends, visible in the cumulated ε² graphs. Expektra's model, on the other hand, shows no clear problems in cumulated ε² and is on target for all wind farms; appendix C has been included to reflect this for different lead times.
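The per-horizon error measures referred to in this chapter (NBIAS, NMAE, NRMSE) can be written compactly as follows; normalizing by rated capacity is the usual WPF convention and is assumed here rather than quoted from the thesis.

```python
# Normalized bias, mean absolute error and root mean square error.
import numpy as np

def errors(y_true, y_pred, capacity=1.0):
    e = (np.asarray(y_pred) - np.asarray(y_true)) / capacity
    return {
        "NBIAS": e.mean(),                    # systematic offset
        "NMAE": np.abs(e).mean(),             # normalized mean absolute error
        "NRMSE": np.sqrt((e ** 2).mean()),    # normalized RMS error
    }

m = errors([0.2, 0.4, 0.6], [0.3, 0.4, 0.5])
assert abs(m["NBIAS"]) < 1e-9                 # errors +0.1 and -0.1 cancel
assert abs(m["NMAE"] - 2.0 / 30.0) < 1e-9
```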

Figure 4.1: Left diagram: cumulative probability of wind speed. Right diagram: scatter diagram of the power curve, normalized power output vs. wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


Figure 4.2: Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).

Figure 4.3: Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


Figure 4.4: Different error measurements for WF 1: NBIAS, NRMSE and NMAE as a function of look-ahead time, and cumulated ε² over time, for Expektra, NuPIC and persistence.


Figure 4.5: Different error measurements for WF 2: NBIAS, NRMSE and NMAE as a function of look-ahead time, and cumulated ε² over time, for Expektra, NuPIC and persistence.


Figure 4.6: Different error measurements for WF 3: NBIAS, NRMSE and NMAE as a function of look-ahead time, and cumulated ε² over time, for Expektra, NuPIC and persistence.


Figure 4.7: Different error measurements for WF 4: NBIAS, NRMSE and NMAE as a function of look-ahead time, and cumulated ε² over time, for Expektra, NuPIC and persistence.


Figure 4.8: Different error measurements for WF 5: NBIAS, NRMSE and NMAE as a function of look-ahead time, and cumulated ε² over time, for Expektra, NuPIC and persistence.


Figure 4.9: Different error measurements for WF 6: NBIAS, NRMSE and NMAE as a function of look-ahead time, and cumulated ε² over time, for Expektra, NuPIC and persistence.


Figure 4.10: Different error measurements for WF 7: NBIAS, NRMSE and NMAE as a function of look-ahead time, and cumulated ε² over time, for Expektra, NuPIC and persistence.


User           WF 1   WF 2   WF 3   WF 4   WF 5   WF 6   WF 7   All

Leustagos      0.145  0.138  0.168  0.144  0.158  0.133  0.140  0.146
DuckTile       0.143  0.145  0.172  0.145  0.165  0.137  0.146  0.148
MZ             0.141  0.151  0.174  0.145  0.167  0.141  0.145  0.149
Propeller      0.144  0.153  0.177  0.147  0.175  0.141  0.147  0.152
Duehee Lee     0.157  0.144  0.176  0.160  0.169  0.154  0.148  0.155
Expektra       0.165  0.158  0.184  0.164  0.179  0.153  0.153  0.165
MTU EE5260     0.161  0.172  0.193  0.162  0.192  0.156  0.160  0.168
SunWind        0.174  0.177  0.193  0.176  0.179  0.157  0.162  0.172
ymzsmsd        0.163  0.186  0.200  0.164  0.192  0.162  0.167  0.174
4138 Kalchas   0.180  0.179  0.197  0.175  0.200  0.160  0.165  0.177
NuPIC          0.243  0.254  0.264  0.310  0.290  0.224  0.240  0.264

Persistence    0.302  0.338  0.373  0.364  0.388  0.341  0.361  0.355

Table 4.1: NRMSE scores of the entries published in [Hong et al., 2014]. The NuPIC and Expektra models are added so that the results can easily be compared.

4.1 Experimental results

Looking at the primary results presented in this study, we see in table 4.1 that Expektra's ANN model achieves performance comparable to the other methods published with Hong et al. [2014]. This can be used as a starting point for further investigations. A similar approach to Expektra's is Duehee Lee's [Lee and Baldick, 2014], as both methods are based on the same neural network architecture but with different optimization techniques. Duehee Lee's network is structured slightly differently: Expektra's model has only 1 output node, whereas Duehee Lee uses five (representing five predictions ahead). Besides these differences, Duehee Lee also includes additional sub-models to create a prediction ensemble, blending the output of the ANN with a Gaussian Process (GP) to improve very short-term predictions. MTU EE5260 is also a neural network approach that converts meteorological forecasts to power, and its score is very close. The HTM CLA beats the persistence model but needs more work to outperform the other models in GEFCom.


4.2 Input Importance

For interpretation purposes, in order to understand the model better, an analysis of the relative importance of the input parameters was performed. This analysis is illustrated in figure 4.11. Each box represents added noise to that channel, which results in a higher NRMSE score if that feature was important; the reference point "all-channels" represents the error distribution of the model with no input replacement. We clearly see in this figure that the wind speed channel ws is the most important attribute: adding noise to this channel greatly affects the NRMSE score. We also see that the wind components u, v show little to no influence, and that the timestamp-related inputs hours and week both indicate that there are seasonal and daily trends present in the dataset.

Figure 4.11: Relative input parameter importance using HIPR. "all-channels" reflects the reference point, i.e. the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is the directional vector of the wind.


4.2.1 Adaptation and Optimization

All training for Expektra's model was performed on a MacBook Pro with an Intel Core 2 Duo processor (P8600, 2.40 GHz) and 4 GB of RAM, running a 64-bit operating system. After adapting the code to run experiments on the GEFCom dataset, an investigation was performed to evaluate the performance of the core source code given by Expektra. This investigation identified some slow parts of the code. After addressing these issues, mainly by using a MathNET native library instead of the managed provider, a speed test was performed comparing the unoptimized version with the optimized one. Figure 4.12 shows the speed-up achieved during training using the LM algorithm.

Figure 4.12: Training time of the unoptimized version vs. the optimized one when training a network with 10 input neurons, 10/15/20/25 hidden neurons and 1 output neuron. Training was done for 100 epochs using LM.

4.3 Summary

A plot showing the average NRMSE improvement over the persistence model can be seen in figure 4.13. Both models are able to perform better than persistence; Expektra's model performs better than the NuPIC model over the whole forecast


horizon, and NuPIC performs better than persistence towards the end of the forecast horizon.

Figure 4.13: Summarized average improvement in NRMSE over persistence, across all wind farms, with 95% confidence intervals.


Chapter 5

Discussion

With the exponential growth of computing power it becomes easier to study deeper and more complex networks, and with recent advances in the area of deep learning, a new interest in ANNs has resurged. In practice, ANNs have been used by most groups in the field of WPF, but these networks never caught on, as it was argued that the improvements made by ANNs were not usually enough for the extra effort of training them [Giebel et al., 2011]. This is steadily becoming less of an issue as computation becomes cheaper and bigger networks perform better.

By using a simple MLP network, we observe that we are able to obtain results similar in performance to the other models published in Hong et al. [2014]. This can be used as a starting point for further investigations. The GEFCom dataset has a limited number of input features; if we had access to a wider range of them, the power of neural networks could be studied more in depth. Given the number of features in the dataset, this is harder to do.

Using persistence as the reference model was done in order to have the same baseline as the one used in GEFCom, but a better reference model should be considered in the future, such as the one presented in Nielsen et al. [1998], especially if longer forecasts are to be considered.

5.1 Method development issues

Working with Expektra's model is very straightforward, and no major issues were encountered during development, but some issues were encountered working with NuPIC¹:

¹It should also be pointed out that it helps to have the people who developed the code on the spot, and this is probably the main reason for the lack of trouble.


1. Installing NuPIC is difficult. This seems to be a general problem, judging by posts on the nupic mailing list², and a very important topic to fix if they want more people to work with this model.

2. The built-in swarming (PSO) did not work well, especially for slightly larger datasets and for 48 prediction steps. The main problems were fitting everything into memory and the code running too slowly when swarming over multiple steps ahead, making it practically impossible to use. Scaling the problem down to just a few prediction steps and multiple models helped, but the swarm model kept dismissing wind speed as an important feature, which was fixed by specifically telling it not to touch that encoder.

3. Inconsistencies between the white-paper documentation of the HTM/CLA and the actual implementation make it hard to understand the underlying principles of NuPIC. NuPIC is a very complex network, and any inconsistency in the documentation makes it hard to understand. The code is quite well documented, but the additional material is a bit sparse.

These issues are all understandable and are continuously being improved upon. NuPIC is still a young platform, so issues are to be expected given the complexity and research nature of the project. Training the CLAClassifier to use many-step predictions instead of just one would most likely reduce the bias error seen in the results; to investigate this, a more powerful computer is needed.

There are some differences in the inputs between the models. This setup was created because an OPF model needs to encode the value it is going to predict. SCADA data is available, so this is not an issue, but it may introduce some unfairness between NuPIC and Expektra. In the current setup, NuPIC does not seem to gain anything extra from the additional information, and the other models in the competition most likely used different inputs as well.

In general, NuPIC differs in the way data is fed to the model: you send in the front of the signal, and temporal context is achieved through the temporal memory.

² http://numenta.org/lists

5.2 Future improvements and directions

One issue with the GEFCom dataset is that it does not contain any specific information about the location of the 7 wind farms. It is worth considering how to handle different models that are specialized for different types of terrain, as this has been shown to increase performance [Kariniotakis et al., 2006]. Another thing to investigate could be a more advanced scheme for hyperparameter selection. Jursa and Rohrig [2008] show that the WPF error can be reduced by smarter use of optimization algorithms for feature selection and hyperparameter selection, which should improve the performance.
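As a concrete baseline for such hyperparameter selection, a random search in the spirit of Bergstra and Bengio [2012] can be sketched. The search space and objective below are hypothetical stand-ins for a real validation-error evaluation.

```python
import random

def random_search(evaluate, space, n_trials=20, seed=1):
    """Randomly sample hyperparameter configurations and keep the best.

    `space` maps a parameter name to a list of candidate values (an assumption
    made for this sketch); `evaluate` returns a validation error to minimize."""
    rng = random.Random(seed)
    best_cfg, best_err = None, float("inf")
    for _ in range(n_trials):
        cfg = {name: rng.choice(choices) for name, choices in space.items()}
        err = evaluate(cfg)
        if err < best_err:
            best_cfg, best_err = cfg, err
    return best_cfg, best_err
```

For example, `space` could hold candidate hidden-layer sizes and learning rates, with `evaluate` training the MLP on a validation split; random search often finds good regions with far fewer trials than a full grid.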

In general, having more types of inputs from sources like SCADA and NWP systems could give a lot of valuable information for better predictions. It is clear that wind speed is a key feature for producing good forecasts, which is also supported by the input analysis in section 4.2. It has been argued that the error in WPF models stems largely from wrong weather forecasts [Giebel et al., 2011], so identifying good sources of weather forecasts would be desirable for more effective model learning. Combining several different weather forecasts could also be investigated, as Nielsen et al. [2007] showed that power forecasts based on a number of different meteorological forecasts were better than forecasts based on a single source.
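A minimal version of such forecast combination, assuming a simple inverse-MSE weighting scheme rather than the optimal-combination methods of Nielsen et al. [2007], could look like this; all function names are illustrative.

```python
def combine_forecasts(forecast_sets, actuals):
    """Weight several forecast sources by their inverse historical MSE.

    forecast_sets: list of historical forecast series, one per source.
    actuals: observed values aligned with those forecasts.
    Returns normalized weights summing to one."""
    def mse(f):
        return sum((p - a) ** 2 for p, a in zip(f, actuals)) / len(actuals)
    inv = [1.0 / max(mse(f), 1e-12) for f in forecast_sets]  # avoid div by zero
    total = sum(inv)
    return [v / total for v in inv]

def combined_prediction(weights, predictions):
    # Weighted sum of each source's prediction for the same target hour.
    return sum(w * p for w, p in zip(weights, predictions))
```

Sources that historically track the observations closely receive most of the weight, so the combined forecast leans on the more reliable NWP provider automatically.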

Regarding HTM CLA, it is worth considering implementing a custom encoder, one specifically targeted at data concerning wind farms. SCADA data could also be streamed directly into a NuPIC model, which would allow for some interesting online predictions and anomaly detection.

Training times are always an issue, especially for very large models. Both NuPIC and Expektra's model would benefit from being implemented in a more parallel fashion, as these algorithms are very well suited for parallel computing. The speed-up would be of great value, so looking into and implementing a GPU version of the algorithms could be worthwhile.

5.3 Conclusions

The field of energy forecasting is still young, and the necessity for good forecasting techniques is constantly growing. This initial study shows that part of Expektra's forecasting method can be used to forecast wind power generation. We observe that this ANN is able to achieve performance comparable to other methods published in Hong et al. [2014]. Additional work is still needed to obtain state-of-the-art results. Numenta's model is able to beat the reference model, but it still needs additional work before we can draw definite conclusions about its performance. Constructing a custom encoder could be a first step in that direction.


Bibliography

Subutai Ahmad and Jeff Hawkins. Properties of sparse distributed representations and their application to hierarchical temporal memory. arXiv preprint arXiv:1503.07469, 2015.

T.C. Akinci. Short term wind speed forecasting with ANN in Batman, Turkey. Elektronika ir Elektrotechnika, 107(1):41–45, 2015.

E. Michael Azoff. Neural Network Time Series Forecasting of Financial Markets. John Wiley & Sons, Inc., 1994.

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13(1):281–305, 2012.

Daniel P. Buxhoeveden and Manuel F. Casanova. The minicolumn hypothesis in neuroscience. Brain, 125(5):935–951, 2002.

Erasmo Cadenas and Wilfrido Rivera. Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks. Renewable Energy, 34(1):274–278, 2009.

Dmitri B. Chklovskii, B.W. Mel, and K. Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782–788, 2004.

A. Costa, A. Crespo, J. Navarro, G. Lizcano, H. Madsen, and E. Feitosa. A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews, 12(6):1725–1744, August 2008. ISSN 1364-0321. doi: 10.1016/j.rser.2007.01.015. URL http://dx.doi.org/10.1016/j.rser.2007.01.015.

Russ C. Eberhart and James Kennedy. A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science, volume 1, pages 39–43, New York, NY, 1995.


Shu Fan, James R. Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. Energy Conversion, IEEE Transactions on, 24(2):474–482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento – a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276. Citeseer, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition EWEC 2008. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving Hierarchical Temporal Memory-Based Trading Models. Springer, 2013.

M.A. Gaertner, C. Gallardo, C. Tejeda, N. Martínez, S. Calabria, N. Martínez, and B. Fernández. The Casandra project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. In Proc. of the 2001 European Wind Energy Conference, EWEC'01, Copenhagen, Denmark, pages 777–780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: a literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G. Lobo. Sipreólico – wind power prediction tool for the Spanish peninsular power system. Proceedings of the CIGRÉ 40th General Session and Exhibition, París, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On Intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357–363, 2014.


Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41–46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585–592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694–709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference, EWEC'07, 2007.

G. Kariniotakis. Position paper on Joule project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J. Halliday, R. Brownsword, Ignacio Marti, Ana Maria Palomares, I. Cruz, H. Madsen, T.S. Nielsen, Henrik Aa. Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power – overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, 2006.

G.N. Kariniotakis, G.S. Stavrakakis, and E.F. Nogaret. Wind power forecasting using advanced neural networks models. Energy Conversion, IEEE Transactions on, 11(4):762–767, 1996.

Stanley J. Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326–334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275–293, 2009.

Lars Landberg and Simon J. Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171–195, 1994.

S. Lang, C. Mohrlen, J. Jorgensen, B.O. Gallachóir, and E. McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International Solar Energy Society UK Section Conference, volume 85, page 89, 2006.


S.M. Lawan, W.A.W.Z. Abidin, W.Y. Chai, A. Baharun, and T. Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760–1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. Smart Grid, IEEE Transactions on, 5(1):501–510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets, 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164–168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313–2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154–3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa. Nielsen, and Torben S. Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475–489, 2005.

Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431–441, 1963.

Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons, 1969.

Marvin Lee Minsky and Oliver G. Selfridge. Learning in random nets. MIT Lincoln Laboratory, 1960.

Jorge J. Moré. The Levenberg–Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105–116. Springer, 1978.

Henrik Aa. Nielsen, Torben S. Nielsen, Henrik Madsen, Maria J. Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471–482, 2007.


Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29–34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power, 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159–167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms, v0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33–57, 2007.

A. Rodrigues, J.A. Peças Lopes, P. Miranda, L. Palma, C. Monteiro, R. Bessa, J. Sousa, C. Rodrigues, and J. Matos. EPREV – a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference EWEC, volume 7, 2007.

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5:3, 1988.

Jürgen Schmidhuber. Deep learning in neural networks: an overview. Neural Networks, 61:85–117, 2015.

S. Sinkevicius, R. Simutis, and V. Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91–96, 2011.

Ke-Sheng Wang, Vishal S. Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61–69, 2014.

WWEA. 2014 half-year report, pages 1–8. Technical report, WWEA, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365–376, 2013.

Hao Yu and Bogdan M. Wilamowski. Levenberg–Marquardt training. Industrial Electronics Handbook, 5:12-1, 2011.


Appendix A

Hyper-parameters

Parameters for the spatial pooler

Alias | Default | Description
columnCount | – | The number of cell columns in a cortical region.
globalInhibition | false | If true, during the inhibition phase the winning columns are selected as the most active columns from the region as a whole.
numActivePerInhArea | 10 | The maximum number of active columns per inhibition area.
synPermActiveInc | 0.1 | The amount by which an active synapse is incremented in each round, specified as a percent of a fully grown synapse.
synPermConnected | 0.10 | Controls the threshold at which synapses are connected.
synPermInactiveDec | 0.01 | The amount by which an inactive synapse is decremented in each round.
potentialRadius | 16 | Determines the extent of the input that each column can potentially be connected to.

Table A1: Configuration parameters for the spatial pooler.
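To illustrate how synPermActiveInc, synPermInactiveDec and synPermConnected interact, here is a simplified sketch of the spatial pooler's Hebbian-style permanence update, based on the white paper's description rather than NuPIC's actual code; the function names are illustrative.

```python
def update_permanences(permanences, active_inputs, inc=0.1, dec=0.01,
                       p_min=0.0, p_max=1.0):
    """For a winning column, reinforce synapses aligned with active input bits
    and weaken the rest (simplified spatial pooler learning rule)."""
    updated = {}
    for input_idx, perm in permanences.items():
        if input_idx in active_inputs:
            perm += inc   # synPermActiveInc: the synapse saw an active bit
        else:
            perm -= dec   # synPermInactiveDec: the synapse saw an inactive bit
        updated[input_idx] = min(max(perm, p_min), p_max)  # clamp to [0, 1]
    return updated

def connected(permanences, threshold=0.10):
    # synPermConnected: a synapse counts as connected once its permanence
    # crosses the threshold.
    return {i for i, p in permanences.items() if p >= threshold}
```

Because increments are larger than decrements, synapses that repeatedly coincide with active inputs cross the connection threshold and stay there, while rarely used ones slowly disconnect.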


Parameters for the scalar encoder

Alias | Symbol | Description
w | w | Number of bits to set in the output.
minval | v_min | The lower bound of the input value.
maxval | v_max | The upper bound of the input value.
n | n | Number of bits in the representation (n must be > w).
radius | r | Inputs separated by more than or equal to this distance will have non-overlapping representations.
resolution | ψ | Inputs separated by more than or equal to this distance will have different representations.

Table A2: Configuration parameters for the scalar encoder.
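The encoder parameters can be illustrated with a simplified scalar encoder: n output bits, of which a contiguous block of w bits is active, the block's position determined by where the value falls between minval and maxval. This is a sketch of the idea, not NuPIC's ScalarEncoder implementation.

```python
def encode_scalar(value, minval, maxval, n, w):
    """Map `value` to an n-bit vector with a contiguous block of w active bits
    (a simplified scalar encoder, not NuPIC's exact code)."""
    assert n > w, "n must be greater than w"
    clipped = min(max(value, minval), maxval)
    # Index of the first active bit, ranging over 0 .. n - w.
    first = round((clipped - minval) / (maxval - minval) * (n - w))
    return [1 if first <= i < first + w else 0 for i in range(n)]
```

Nearby values produce overlapping blocks of active bits, which is what lets the spatial pooler treat similar wind speeds as similar inputs.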

Parameters for the temporal memory

Alias | Default | Description
activationThreshold | 12 | Activation threshold for segments.
cellsPerColumn | 32 | Number of cells per column.
columnCount | 2048 | The number of cell columns in a cortical region.
globalDecay | 0.10 | Decrements all synapses a little bit all the time.
initialPerm | 0.11 | Initial permanence value for a synapse.
inputWidth | – | Size of the input.
maxAge | 100000 | Controls global decay: segments are only decayed if they have not been activated for maxAge iterations, and the global decay loop runs only every maxAge iterations.
maxSegmentsPerCell | – | The maximum number of segments a cell can have.
maxSynapsesPerSegment | – | The maximum number of synapses a segment can have.
minThreshold | 8 | The minimum required activity for a segment to learn.
newSynapseCount | 15 | The maximum number of synapses added to a segment during learning.
permanenceDec | 0.10 | How much permanence is removed from synapses when learning occurs.
permanenceInc | 0.10 | How much permanence is added to synapses when learning occurs.
temporalImp | cpp/py | Controls which temporal memory implementation to use.

Table A3: Configuration parameters for the temporal memory.


Appendix B

Wind characteristics

Figure B1: Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Figure B2: Wind characteristics for WF 3 to WF 7, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Appendix C

Error Distribution


Figure C1: Error distribution for different lead times, WF 1 (NuPIC).

Figure C2: Error distribution for different lead times, WF 2 (NuPIC).

Figure C3: Error distribution for different lead times, WF 3 (NuPIC).

Figure C4: Error distribution for different lead times, WF 4 (NuPIC).

Figure C5: Error distribution for different lead times, WF 5 (NuPIC).

Figure C6: Error distribution for different lead times, WF 6 (NuPIC).

Figure C7: Error distribution for different lead times, WF 7 (NuPIC).

Figure C8: Error distribution for different lead times, WF 1 (Expektra).

Figure C9: Error distribution for different lead times, WF 2 (Expektra).

Figure C10: Error distribution for different lead times, WF 3 (Expektra).

Figure C11: Error distribution for different lead times, WF 4 (Expektra).

Figure C12: Error distribution for different lead times, WF 5 (Expektra).

Figure C13: Error distribution for different lead times, WF 6 (Expektra).

Figure C14: Error distribution for different lead times, WF 7 (Expektra).

List of Figures

2.1 A figure that presents the general outline when forecasting using the statistical approach  6
2.2 A figure that presents the general steps when forecasting using a physical model  7
3.1 The perceptron  20
3.2 Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of the H hidden layers, as well as M input connections. Each edge seen in this graph has a w_ij associated with it  21
3.3 Information flow of a single region predictive model created with the OPF  23
3.4 The CLAClassifier  28
3.5 Training an OPF model  29
4.1 Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)  31
4.2 Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)  32
4.3 Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)  32
4.4 Different error measurements for WF 1  33
4.5 Different error measurements for WF 2  34
4.6 Different error measurements for WF 3  35
4.7 Different error measurements for WF 4  36
4.8 Different error measurements for WF 5  37


4.9 Different error measurements for WF 6  38
4.10 Different error measurements for WF 7  39
4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point: the network when no channel has been exposed to noise. ws = wind speed, hours/week = timestamps, u, v is a directional vector of the wind  41
4.12 Plot showing the unoptimized version vs the optimized when training a network with 10 input neurons; 10, 15, 20 or 25 hidden neurons; and 1 output neuron. Training was done for 100 epochs using LM  42
4.13 Summarized average improvement over all wind farms with 95% confidence intervals  43
B1 Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power  59
B2 Wind characteristics for WF 3 to WF 7, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power  60

C1 Error distribution for different lead times WF 1 62

C2 Error distribution for different lead times WF 2 63

C3 Error distribution for different lead times WF 3 64

C4 Error distribution for different lead times WF 4 65

C5 Error distribution for different lead times WF 5 66

C6 Error distribution for different lead times WF 6 67

C7 Error distribution for different lead times WF 7 68

C8 Error distribution for different lead times WF 1 69

C9 Error distribution for different lead times WF 2 70

C10 Error distribution for different lead times WF 3 71

C11 Error distribution for different lead times WF 4 72

C12 Error distribution for different lead times WF 5 73

C13 Error distribution for different lead times WF 6 74


C14 Error distribution for different lead times WF 7 75

List of Tables

3.1 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, missing data with power observations are available for updating the models  16

3.2 Preprocessed features from the dataset. The Forecast category represents the forecast given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the feature set we will use in training and testing  17

3.3 Example where n = 14, r = 5, ψ = 1 of various encoded scalar values using a ScalarEncoder  24

4.1 NRMSE score of the entries published in [Hong et al., 2014]. The NuPIC model and Expektra model are added so we can easily compare the results  40

A1 Table containing configuration parameters for the spatial pooler  55
A2 Table containing configuration parameters for the scalar encoder  56
A3 Table containing configuration parameters for the temporal memory  57




Acknowledgements

I wish to thank first and foremost my great parents, Yvonne Svensson and Benkt Svensson, for always being there to support my interests.

Secondly, I want to thank my supervisor Pawel Herman, not just for the help I have received during this thesis but for the things I have learned from him while studying at KTH.

I also want to thank the people at Expektra: Niclas Ehn, Gustav Bergman, Mattias Jonsson, Andreas Johansson, Per Åslund and Joel Ekelöf, for introducing me to their area of expertise and energy forecasting in general.

A special thanks to my good friends and classmates Andrea de Giorgio and Vanya Avramova for all the ideas and discussions we have shared.

MORGAN SVENSSON - SUMMER 2015

Nomenclature

Indices, Constants and Variables

X_{1:n}           A sequence of values X = [x1, x2, ..., xn]
k = 1, ..., kmax  Lead time or look-ahead time
kmax              Maximum prediction horizon
N                 Total number of data points
e                 Prediction error
ε                 Normalized prediction error
p_t               Measured power generation at time t
p̂_{t+k|t}         Forecast power generation made at time t for look-ahead time t+k
w_ij              Weight of a synapse in the neural network, row i, layer j
b                 Binary value
s                 Weighted sum, including bias, of a perceptron
x · y             Dot product between x and y
x ∘ y             Row-by-row element-wise multiplication

Units of measurement

MW  Megawatts
GW  Gigawatts

Notes

Keys and dates are given in this form: Year Month Day Hour

Contents

1 Introduction 1
  1.1 Problem Formulation 3
  1.2 The scope of the problem 4

2 Background 5
  2.1 Neural Networks and Time Series Prediction 8

3 Method and Materials 11
  3.1 Preliminaries 11
    3.1.1 Remarks 11
    3.1.2 Definitions 12
    3.1.3 Reference models 13
    3.1.4 Error metrics 13
    3.1.5 Model selection 15
    3.1.6 Evaluation 15
  3.2 Experiments 16
  3.3 Holdback Input Randomization 18
  3.4 Optimization methods 18
  3.5 Neural Networks 19
    3.5.1 Multilayer Perceptron 19
    3.5.2 Numenta Platform for Intelligent Computing 22

4 Result 31
  4.1 Experimental results 40
  4.2 Input Importance 41
    4.2.1 Adaptation and Optimization 42
  4.3 Summary 42

5 Discussion 45
  5.1 Method development issues 45
  5.2 Future improvements and directions 46
  5.3 Conclusions 47

Bibliography 49

Appendices 53

A Hyper-parameters 55

B Wind characteristics 59

C Error Distribution 61

List of Figures 76

List of Tables 78

Chapter 1

Introduction

It has been estimated by the World Wind Energy Association (WWEA) that by the year 2020 around 12% of the world's electricity will be available through wind power, making wind energy one of the fastest growing energy resources [WWEA 2014, Fan et al 2009]. Integrating wind energy into existing electricity supply systems has been a challenge, however, and numerous objections have been put forward by traditional energy suppliers and grid operators, especially against large-scale use of this energy source. The biggest concern is that availability mainly depends on meteorological conditions and that production cannot be adjusted as conveniently as with other, more conventional energy sources, because of our inability to control the wind. A single Wind Turbine (WT) is highly variable and its dependency on wind conditions can result in zero output for thousands of hours during the course of a year; aggregating wind power generation over bigger areas, however, decreases this chance.

This is where wind power forecasting systems come into play: a technology that can greatly improve the integration of wind energy into electricity supply systems, as forecasting systems provide information on how much wind power can be expected at any given point within the next few days. This removes some of the randomness attributed to wind energy and allows a more accurate way to utilize this clean energy source, while offsetting some of our dependency on other, less environmentally friendly sources, which in the long run will cause a smaller degenerative impact on the environment.

There are many commercial forecasting models available. Prediktor1 [Landberg and Watson 1994] is a physical model developed by the Risø National Laboratory, Denmark. It is constructed to refine Numerical Weather Prediction (NWP) data in order

1 http://www.prediktor.no


to transform the data through a power curve to produce the forecast, while improving the error rate with Model Output Statistics (MOS). Previento [Focken et al 2001], developed at the University of Oldenburg, Germany, uses a similar approach to Prediktor but with regional forecasting and uncertainty estimation. The Wind Power Prediction Tool (WPPT)2 [Nielsen et al 2002] is a statistical model developed by the Technical University of Denmark; it consists of a semi-parametric power curve model for wind farms, taking into account both wind speed and direction. It uses dynamical prediction models describing the dynamics of the wind power and any diurnal variations. Zephyr [Giebel et al 2000] is a hybrid model, a combination of both WPPT and Prediktor; in this model each wind farm is assigned a forecast model according to the available data. Sipreólico [Gonzalez et al 2004], developed by Red Eléctrica de España, is a statistical model that was designed to be highly flexible depending on available data. It achieves this by switching between 9 different models. Aiolos Wind3 is a hybrid model developed by Vitec that creates forecasts by combining a statistical model with physical factors such as wind speed at different altitudes, wind direction and air density.

Expektra4 is a service-based company, founded in 2010, that provides and develops a new method for short-term demand and supply forecasting based on an Artificial Neural Network (ANN) and traditional time series analysis. They are currently expanding into the area of wind power forecasting and looking for suitable methods. ANNs have been used successfully for both wind speed forecasting [Lawan et al 2014] and wind power forecasting [Kariniotakis et al 1996]. It was demonstrated in Liu et al [2012] that a complex-valued recurrent neural network was able to predict output with high accuracy, and Huang et al [2015] showed that optimizing the initial weights of a back-propagated neural network with a genetic algorithm gave impressive results, so there are good reasons to think that part of the method Expektra is developing can be used for wind power forecasting.

Neural networks in general are highly flexible, and recent advancements in deep learning have been shown to outperform previous models in different domains [Schmidhuber 2015]. These networks are very good at automatically finding features that by hand would take a lot of time and effort to engineer. One network that shares similarities with deep learning but has received less attention is the Hierarchical Temporal Memory (HTM) with its Cortical Learning Algorithm (CLA), developed by Numenta [Numenta 2011]. This network is also built around the idea of having hierarchical structures creating a deep neural network. HTM/CLA is

2 http://www.enfor.eu
3 http://www.vitecsoftware.com
4 http://www.expektra.se


currently tailored very specifically to time-series problems and has, at the moment, little research published around it, making it an ideal candidate for time series prediction with unknown potential on wind power forecasting problems.

Even though there are a lot of prominent methods already developed for wind power forecasting, there is still room for improvement. Energy forecasting is such an important topic that competitions have been developed around it. The Global Energy Forecasting Competition (GEFCom) [Hong et al 2014] is a competition that can be used to help evaluate the performance of new forecasting models. It has attracted hundreds of contestants from all over the world, which has resulted in the contribution of many new and novel ideas. The GEFCom was created in order to:

1. Bring together state-of-the-art techniques for energy forecasting.

2. Bridge the gap between academic research and industry practice.

3. Promote analytical approaches in power and energy education.

4. Prepare the industry to overcome forecasting challenges posed by the smart grid world.

5. Improve energy forecasting practices.

Benchmark datasets and competitions are a valuable resource when evaluating new models, as they allow for a common ground to stand on. With the publication of Hong et al [2014], the dataset from GEFCom2012 was also published. This dataset comes from 7 different wind farms and spans a time period of three years. It consists of observational data from the energy production together with weather forecasts.

The GEFCom is divided into four tracks: load forecasting, price forecasting, wind power forecasting and solar power forecasting. The specified problem in the wind power forecasting track is built around the real-time operation of wind farms, and the dataset is structured accordingly. Participants try to predict hourly power generation up to 1-48 hours ahead, given meteorological forecasts and historically produced power.

1.1 Problem Formulation

The objective of this thesis is to adapt and analyse the robustness and suitability of Expektra's method when it is applied to the domain of forecasting short-term wind


power generation, i.e. what is needed to adapt this model to wind power forecasting problems?

The second objective of this thesis is to investigate HTM/CLA [Numenta 2011], a modern state-of-the-art computational theory of the neocortex, i.e. is it possible to use the Numenta Platform for Intelligent Computing (NuPIC) for wind power forecasting problems?

These questions will be addressed by evaluating the models against other models published in the area of short-term wind power forecasting, more specifically those that have been published in GEFCom.

1.2 The scope of the problem

1. This study will focus on short-term forecasting, i.e. forecasts done for 1-48 hours ahead. How to perform well on longer forecasts is left to further investigation.

2. Wind power forecasting is closely related to wind speed forecasting, i.e. trying to use local information at the wind turbine instead of using NWP data to forecast wind power. This thesis will not go into any specific details on how to forecast wind speeds; readers can take a look at Li and Shi [2010], Cadenas and Rivera [2009] and Akinci [2015].

3. There are a lot of neural networks that are worth studying, but this thesis will focus on Expektra's method and HTM/CLA inside NuPIC. NuPIC was picked over other deep learning methods because of its direct relation to time series prediction.


Chapter 2

Background

Wind Power Forecasting (WPF) has applications in generation and transmission maintenance planning and energy optimization, as well as in energy trading. WPF models exist at different scales and can be used to predict the production of anything from a single WT to a whole Wind Farm (WF). WPF models are generally divided into two main groups, physical and statistical, but hybrid state-of-the-art methods are also common [Giebel et al 2001, Lang et al 2006]. The physical approach [Landberg and Watson 1994, Gaertner et al 2003] focuses on integrating well-known physical aspects into the model, such as information about the surrounding terrain and properties of the WT. These models try to get as good an estimate of the local wind speed as possible before finally reducing the remaining error with some form of MOS. Statistical approaches [Rodrigues et al 2007, Gonzalez et al 2004] rely more on historical observations and their statistical relation to meteorological predictions, as well as on measurements from Supervisory Control And Data Acquisition (SCADA) systems.

SCADA is a control system that is used in most industrial processes and is the main tool to evaluate the state of individual power plants. As the name indicates, it is not a full control system but rather a system for supervision. In the case of wind turbines it allows remote access to online data that has been gathered by sensors inside and around the plant. Variables accessible through these systems are those directly related to the wind flow as well as the produced power generated by the plant, i.e. things like rotor wind speed, nacelle position, pitch angle, active power, etc.


Figure 2.1: A figure that presents the general outline when forecasting using the statistical approach.

Forecast models that use SCADA data as their primary input source usually have a good forecast accuracy, at least for the first few hours, but they tend to be less useful for longer prediction horizons [Giebel et al 2011]. SCADA data can also be used to detect problems in WTs, something that can be helpful to improve the reliability of WTs [Yang et al 2013, Wang et al 2014].

Statistical models, as seen in figure 2.1, are usually built around Numerical Weather Prediction (NWP) and SCADA data. NWP models are often used to build forecasting models as they introduce weather forecasts for the region where the wind turbines are located. NWP data usually contains information about things like wind speed, wind direction, temperature and humidity. These models are operated two or four times a day by a number of large weather services. The main forecasts usually start at 00 and 12 UTC, corresponding to the worldwide radiosonde launches, which are the only direct observation of the atmospheric state and have been the backbone of atmospheric monitoring for many decades1; extra forecasts usually start at 06 and 18 UTC. Physical models, such as that seen in figure 2.2, include additional information about the physical characteristics of the wind turbine and its surroundings, i.e. terrain data, information about obstacles, the capacity and layout of the site where the turbine is located, and so on. Other useful information for physical models includes the theoretical

1 Some problems with this approach are that it results in less information over large oceans and poorer countries.


power curve, i.e. how much power is expected to be produced given a specific wind speed.

The time scales of WPF methods are generally divided into 3 main groups: very-short-term (up to 9 hours), short-term (up to 72 hours) and medium-term (up to 7 days) [Costa et al 2008], while the time step for these models is in the range of seconds to days depending on the application. Very-short-term models used for wind power forecasting consist of statistical methods like Kalman Filters, Auto-Regressive Moving Average (ARMA), Auto-Regressive with Exogenous Input (ARX), Box-Jenkins, etc. Inputs to these models are historical observations of wind speed, wind direction, temperature, etc., and common applications for this forecast horizon include things like intraday market trading. Since these methods are merely based on past production, they are generally not useful for longer horizons.

[Figure 2.2 omitted: a flowchart of the physical model, with nodes for SCADA data, NWP data, WFC data, downscaling, transformation to hub height, spatial refinements, conversion to power via the WT power curve, Model Output Statistics (MOS), and the resulting wind power forecast.]

Figure 2.2: A figure that presents the general steps when forecasting using a physical model.

Short-term forecasting models include additional data about historical observations, usually obtained from SCADA systems, but more importantly weather forecasts from NWP models. Machine learning methods used in this area include Neural Networks [Kusiak et al 2009], Support Vector Machines [Fugon et al 2008], Nearest Neighbour Search [Jursa et al 2007], Random Forests, etc. Medium-term forecasting models usually incorporate all the methods used for shorter forecasts as well as physical characteristics.


2.1 Neural Networks and Time Series Prediction

One of the most common ANNs is the so-called Multilayer Perceptron (MLP), a network built around some very simple properties of a cortical neuron. This network has been around since the '50s and has matured with a solid mathematical foundation. It has been applied successfully to many different applications. It is built around the idea of having multiple layers of neurons, each layer feeding information forward to the next layer, i.e. a feed-forward neural network.

The main advantage of using multiple layers instead of just a single one is to overcome the limitations pointed out by Minsky and Selfridge [1960] and Minsky and Seymour [1969], where a single-layer perceptron is only capable of learning linearly separable patterns and is thus unable to learn all functions. The MLP is not limited in this way and has been shown to be able to represent a wide variety of functions given appropriate parameters [Hornik et al 1989].

NuPIC is a platform actively developed and maintained by Numenta. It introduces a collection of ideas and algorithms that are inspired by the structural organization of the neocortex. Two of the core concepts found inside NuPIC are the so-called Hierarchical Temporal Memory and the Cortical Learning Algorithm. HTM was introduced in Hawkins and Blakeslee [2007] and refers to a hierarchy of cortical regions in the brain.

The neocortex is classically divided into 6 different layers2; the current implementation in NuPIC is focused on emulating layers 3-4 (specifically layer 3), with extensions and research code being developed to include more layers.

A CLA region consists of a collection of columns, and each column consists of a handful of cells. This structure is based on the minicolumn hypothesis [Buxhoeveden and Casanova 2002]. Each column in the CLA region has its own semantic meaning, and the sparse activity of a handful of active columns will tell us something about the input. The CLA is modelled so that specific cells within each column reflect a temporal context of a pattern, and a single cortical region (a CLA region) is trained with the CLA algorithm.

A typical CLA region consists of around 2K columns containing around 60K neurons in total, while a typical MLP may have less than 100 neurons. Each neuron in an HTM network grows new synapses over time, and it is not uncommon to have around 5K synapses per neuron, meaning we would have around 300M synapses in total. A single region of this size uses around 100 MB of memory, and it takes around 10 ms to do one inference and learning step3.

2 These layers should not be confused with the hierarchy of regions.
3 These values are given in a talk, "Sensor-Motor Integration in the Neocortex, 2013 Hackathon".


The HTM/CLA also differs from the MLP in that it does not directly use scalar weights to represent synaptic connectivity. Synapses inside an HTM have binary weights with a scalar permanence: each synapse is either connected or disconnected, while the MLP uses scalar weights. Connectedness in HTM/CLA is based on a permanence, a value between 0.0 and 1.0. If the permanence is over a certain threshold we have a connected synapse, i.e. we have weights of either 1 or 0. This binary mechanism is there to simulate synapses that are able to form and unform during learning. So essentially we have a weight-change network vs a wiring-change network [Chklovskii et al 2004].
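The permanence mechanism can be sketched in a few lines of Python. This is an illustration of the idea, not NuPIC's actual implementation; the threshold and the learning increments are assumed values:

```python
# Sketch of permanence-based connectivity: each potential synapse has a
# scalar permanence in [0.0, 1.0], but its effective weight is binary.

CONNECTED_THRESHOLD = 0.2  # assumed value; NuPIC makes this configurable

def connected_weights(permanences, threshold=CONNECTED_THRESHOLD):
    """Map scalar permanences to binary synaptic weights (1 or 0)."""
    return [1 if p >= threshold else 0 for p in permanences]

def adapt(permanences, active_inputs, inc=0.1, dec=0.1):
    """Hebbian-style update: reinforce synapses whose input was active,
    weaken the rest. Synapses thereby 'form' and 'unform' by crossing the
    threshold, i.e. a wiring-change rather than weight-change network."""
    return [min(1.0, p + inc) if a else max(0.0, p - dec)
            for p, a in zip(permanences, active_inputs)]

perms = [0.11, 0.25, 0.19, 0.80]
print(connected_weights(perms))      # [0, 1, 0, 1]
perms = adapt(perms, [1, 0, 1, 1])
print(connected_weights(perms))      # [1, 0, 1, 1]: synapse 1 unformed
```

Note that learning only ever moves permanences; the weights seen by the rest of the network stay binary.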

NuPIC today is naturally geared towards time-series prediction, which makes it an interesting candidate for wind forecasting. NuPIC has been shown to outperform the MLP on financial time series [Gabrielsson et al 2013], and it has been used successfully to balance traffic [Sinkevicius et al 2011]. It has also been used on the NASA Aviation Dataset [Lee and Rajabi 2014], but with less promising results. MLPs have been used successfully for many kinds of time series problems [Azoff 1994, Niska et al 2004, Jain and Kumar 2007].


Chapter 3

Method and Materials

The purpose of this chapter is to provide an explanation of the methods used in this thesis. It presents terms and concepts needed and used in the field of Machine Learning, with a focus on how to create forecasts based on time series, and it motivates why these methods are used as well as how they are used. This chapter also contains a description of the implemented artificial neural network, how it is structured and optimized. It provides details about the experimental setup and how these experiments relate to GEFCom, and finally a description of how the datasets are structured and processed.

3.1 Preliminaries

3.1.1 Remarks

The methodology used to evaluate the prediction models presented in this thesis is based on the protocol presented in Madsen et al [2005]1, a complete protocol that can be used to evaluate the performance of short-term WPF and Wind-to-Power (W2P) models. The reason this protocol was chosen for this thesis is that it has been successfully used before as a guideline to evaluate a wide variety of forecast models, such as AWPPS, Prediktor, Previento, Sipreólico and WPPT. The protocol has been used for both on-shore and off-shore wind farms and was developed within the ANEMOS research project [Kariniotakis et al 2006], which brought together many relevant groups involved in the field. The aim of ANEMOS was to develop accurate and robust models that substantially outperform current state-of-the-art methods, and one of its goals was to establish a common set

1 It should be pointed out that no widely agreed standardization exists.


of performance measures that can be used to compare forecasts across systems and locations.

3.1.2 Definitions

Time series

A time series is a sequence X of observations x_t, each with a particular time stamp t, i.e. X = [x1, x2, ..., xt], or in short X_{1:t}; it is a time-dependent collection of variables.

Forecast horizon

A multi-step-ahead prediction is the task of predicting k forecasts X̂_{t+1:t+k} given a collection of historical observations X_{t-p+1:t}. In the case of GEFCom we want forecasts for 1-48 steps ahead.

Point or spot forecast

In this paper we model the forecast p̂_{t+k|t} as a so-called point forecast (or spot forecast), i.e. a single value for each forecast (as opposed to a probability distribution).

Prediction error

The prediction error e for lead time t+k is defined in equation 3.1 as the difference between the actual value and the forecast, where t denotes the time index and k is the look-ahead time; p is the actual (measured, true) wind power and p̂ is the predicted wind power:

e_{t+k|t} = p_{t+k} - p̂_{t+k|t}   (3.1)

and the normalized prediction error ε is defined as seen in equation 3.2:

ε_{t+k|t} = (1/p_inst) e_{t+k|t} = (1/p_inst) (p_{t+k} - p̂_{t+k|t})   (3.2)

where p_inst is the installed capacity of the wind farm (in kW or MW), which is unknown in the competition.


3.1.3 Reference models

Persistence (also called a naïve or plain predictor), as seen in equation 3.3, is the model most commonly used for benchmarking WPF models. This simple model states that the future wind generation value will be the same as the last measured value:

p̂_persistence_{t+k|t} = p_t   (3.3)

An alternative would be to use an even simpler model, a climatology prediction (equation 3.4), i.e. predicting the mean, a value approximated from the training set (see section 3.1.5):

p̂_mean_{t+k|t} = p̄ = (1/N) Σ_{t=1}^{N} p_t   (3.4)

There are other reference models, like the one suggested by Nielsen et al [1998], which have advantages over persistence, but that model was never widely adopted and is not used by GEFCom.
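As a sketch, the two reference models of equations 3.3 and 3.4 can be written as follows; the list-based layout and the k_max parameter are illustrative choices:

```python
# Persistence repeats the last measured value for every horizon (eq. 3.3);
# climatology predicts the training-set mean for every horizon (eq. 3.4).

def persistence(p_t, k_max=48):
    """p-hat_{t+k|t} = p_t for every look-ahead k."""
    return [p_t] * k_max

def climatology(training_power, k_max=48):
    """p-hat_{t+k|t} = mean of the training series for every look-ahead k."""
    mean = sum(training_power) / len(training_power)
    return [mean] * k_max

history = [1.0, 2.0, 3.0, 4.0]           # past power observations
print(persistence(history[-1], k_max=3))  # [4.0, 4.0, 4.0]
print(climatology(history, k_max=3))      # [2.5, 2.5, 2.5]
```

Persistence is hard to beat for the first hour or two, while climatology becomes the stronger baseline at long horizons, which is why both are useful anchors for the improvement scores reported later.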

3.1.4 Error metrics

In order to understand why a specific model performs well, it is usually a good idea to evaluate it against a wide variety of different criteria, as is emphasised by Kariniotakis [1997]. This section describes the error measures used; throughout, N denotes the size of the test set.

Forecast Bias

The Normalized Forecast Bias (NBIAS) describes the systematic error and is defined in equation 3.5; it is estimated by calculating the average error for each step ahead. It gives an indication of the direction of the error:

NBIAS_k = (1/N) Σ_{t=1}^{N} ε_{t+k|t}   (3.5)

This bias is sometimes referred to by some authors as Mean Forecast Error (MFE). A perfect MFE does not mean the forecast is perfect and contains no error, as positive and negative errors cancel each other out, but it gives an indication that the forecasts are on target.


Mean Absolute Error

The Normalized Mean Absolute Error (NMAE) is an error quantity that looks at the average of the absolute error of the prediction and is defined in equation 3.6:

NMAE_k = (1/N) Σ_{t=1}^{N} |ε_{t+k|t}|   (3.6)

Another common name used instead of Mean Absolute Error (MAE) is Mean Absolute Deviation (MAD). This value shows the magnitude of the overall error that has occurred due to forecasting and thus should be as small as possible. This error is also scale dependent and will be affected by data transformations and the scale of measurements.

Mean Squared Error

The Normalized Mean Squared Error (NMSE) is an error quantity that looks at the average of the squared errors ε²_{t+k|t} and is built using the Normalized Sum of Squared Errors (NSSE) defined in equation 3.7:

NSSE_k = Σ_{t=1}^{N} ε²_{t+k|t}   (3.7)

NMSE is defined in equation 3.8:

NMSE_k = (1/N) NSSE_k = (1/N) Σ_{t=1}^{N} ε²_{t+k|t}   (3.8)

With this error, positive and negative errors do not cancel each other out, and large individual errors are penalized more harshly.

Root Mean Squared Error

The Normalized Root Mean Squared Error (NRMSE) is the square root of the NMSE; this error is defined in equation 3.9:

NRMSE_k = NMSE_k^{1/2} = ( (1/N) Σ_{t=1}^{N} ε²_{t+k|t} )^{1/2}   (3.9)

NRMSE is the main metric used in GEFCom, and it shares the same properties as NMSE.
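As a sketch, the metrics of equations 3.5-3.9 can be computed for one look-ahead time k from lists of measured and predicted power. The function names and example numbers are illustrative; p_inst normalizes the errors as in equation 3.2:

```python
import math

def normalized_errors(measured, predicted, p_inst):
    """Equation 3.2: normalized errors for one look-ahead time."""
    return [(p - ph) / p_inst for p, ph in zip(measured, predicted)]

def nbias(eps):   # equation 3.5: direction of the systematic error
    return sum(eps) / len(eps)

def nmae(eps):    # equation 3.6: average magnitude of the error
    return sum(abs(e) for e in eps) / len(eps)

def nmse(eps):    # equation 3.8: penalizes large errors more harshly
    return sum(e * e for e in eps) / len(eps)

def nrmse(eps):   # equation 3.9: the main GEFCom metric
    return math.sqrt(nmse(eps))

eps = normalized_errors([10.0, 20.0, 30.0], [12.0, 18.0, 30.0], p_inst=100.0)
print(nbias(eps))   # 0.0: the +0.02 and -0.02 errors cancel
print(nmae(eps))    # but the absolute errors do not
print(nrmse(eps))
```

The example also shows why NBIAS alone is not enough: the bias is exactly zero even though two of the three forecasts are wrong, which NMAE and NRMSE both pick up.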


3.1.5 Model selection

In regression and classification, one of the main issues we are faced with is: "How do we create a good model?" One way to define this "goodness" would be to look at the model's generalization ability, i.e. its performance on unseen data.

To make sure the model we are building will generalize to new unseen data, it is important how we select and build the model in the first place; what we have control over is the data we have at hand and how we use that data to build our model.

Training, testing and validating

In the case of supervised learning we have access to a target series and its associated features, i.e. the production series (our target) with wind speed and wind direction as features.

Common practice within Machine Learning, which is adhered to in this thesis, is to split the whole dataset into 3 smaller subsets, then use two of these subsets to find a good model (the training set and the validation set), while the remaining one (the test set) is used for evaluation, i.e. to estimate how accurate the model we have created would be on "unseen data". With highly flexible models like an artificial neural network we need to be careful not to overfit the data, which is one of the reasons why we have the validation set: we do not want to create a model that fails to generalize because it is fitted to every minor variation, i.e. it has captured a lot of noise.
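The chronological split described above can be sketched as follows. The 60/20/20 percentages are illustrative only (the thesis follows the fixed GEFCom dates of section 3.2), and for time series the subsets must be contiguous blocks in temporal order rather than a random shuffle:

```python
# Split a time series into train/validation/test blocks, preserving order.
# Integer percentage arithmetic avoids floating-point index surprises.

def chronological_split(series, train_pct=60, val_pct=20):
    n = len(series)
    i = n * train_pct // 100
    j = n * (train_pct + val_pct) // 100
    return series[:i], series[i:j], series[j:]

data = list(range(10))
train, val, test = chronological_split(data)
print(train, val, test)   # [0..5] [6, 7] [8, 9]
```

Shuffling before splitting would leak future information into the training set, which is why the blocks are kept contiguous here.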

3.1.6 Evaluation

The improvement with respect to a considered reference model ref is defined in equation 3.10:

I^ref_{EC,k} = 100 · (EC^ref_k - EC_k) / EC^ref_k  (%)   (3.10)

where the Evaluation Criterion (EC) is any error measurement, such as NMSE, NMAE, etc.
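Equation 3.10 is straightforward to express in code; a positive value means the model improves on the reference:

```python
# Percentage improvement of a model over a reference model, for any
# evaluation criterion EC (NRMSE, NMAE, ...), as in equation 3.10.

def improvement(ec_ref, ec_model):
    """Positive: the model beats the reference; negative: it is worse."""
    return 100.0 * (ec_ref - ec_model) / ec_ref

print(improvement(20.0, 15.0))   # 25.0, i.e. 25% better than the reference
```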


Testing period

Date         Time   Forecast
2011-01-01   01:00  1-48 hours
2011-01-04   13:00  1-48 hours
2011-01-08   01:00  1-48 hours
2011-01-11   13:00  1-48 hours
...
2012-06-23   01:00  1-48 hours
2012-06-26   13:00  1-48 hours

Table 3.1: The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, power observations are available for updating the models.

3.2 Experiments

Training and testing are structured based on the structure of the GEFCom. The dataset spans from midnight of the 1st of July 2009 to noon on the 26th of June 2012. The period from the 1st of July 2009 to the 1st of January 2011 at 01:00 is used for training and validation, while the rest is used for testing. In the testing range a number of 48-hour periods are defined (see table 3.1); in between these testing periods exists additional training data, which enables the models to be updated between forecasts.

The testing periods repeat every 7 days until the end of the dataset, and only meteorological forecasts that were relevant for the periods with missing power data are given; this was done in order to be consistent.

Meteorological forecasts in the GEFCom dataset consist of zonal u and meridional v components collected for surface winds at 10 m above ground level. They were extracted from the archive of the European Centre for Medium-range Weather Forecasts (ECMWF), a service which issues high-resolution deterministic forecasts twice a day, at 00 Coordinated Universal Time (UTC) and 12 UTC; each of these forecast periods consists of data for 1-48 hours ahead. In order to match the hourly resolution of the power measurements, the forecasts were interpolated using cubic splines to an hourly resolution. A summary of the features found in the dataset is given in table 3.2.
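The upsampling step can be sketched as below. For brevity the sketch uses linear interpolation between NWP time stamps; the thesis uses cubic splines (scipy.interpolate.CubicSpline would be a natural choice), which differ only in the smoothness of the curve between the stamps. The 3-hourly source resolution in the example is an assumption for illustration:

```python
# Upsample a coarsely sampled forecast to an hourly grid by linear
# interpolation between consecutive (time, value) pairs.

def interpolate_hourly(times, values):
    """times: increasing integer hours; returns one value per hour."""
    out = []
    for i in range(len(times) - 1):
        t0, t1 = times[i], times[i + 1]
        v0, v1 = values[i], values[i + 1]
        for h in range(t0, t1):
            w = (h - t0) / (t1 - t0)        # position between the stamps
            out.append(v0 + w * (v1 - v0))
    out.append(values[-1])
    return out

# wind speed forecast at a 3-hourly resolution, upsampled to hourly
print(interpolate_hourly([0, 3, 6], [6.0, 9.0, 6.0]))
```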


No  Category  Parameter             Alias  Type
1   Date      Date                  date   String
2   Date      Year                  year   Integer
3   Date      Month                 month  Integer
4   Date      Day                   day    Integer
5   Date      Hour                  hours  Integer
6   Date      Week                  week   Integer
7   Forecast  Wind Speed            ws     Real
8   Forecast  Wind Direction (deg)  wd     Real
9   Forecast  Wind U                u      Real
10  Forecast  Wind V                v      Real
11  Forecast  Issued                hp     Integer
12  SCADA     Production            wp     Real

Table 3.2: Preprocessed features from the dataset. The Forecast category represents the forecast given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the feature we will use in training and testing.


The database for the meteorological forecasts given by the GEFCom dataset contains sections of missing data, each corresponding to a date for which the power information is also missing; these sections were filled out, in a pre-processing step, with the previous best forecast that was available. For example, if we do not have data for the forecast issued at 2011-01-01 12:00, we use data from 2011-01-01 00:00, as a 48-hours-ahead forecast is available for this date. If the previous section also contained missing data, we would go back an additional section and use those forecasts. If no forecast is available 48 hours back, we can use the best known forecast and extend it in the same fashion as the persistence model.
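The fallback rule can be sketched as follows; the dictionary-of-issues layout, the 12-hour issue spacing and the step limit are assumptions for illustration:

```python
# For a missing forecast issue, fall back to the previous issue (12 hours
# earlier), which covers the same target hours at longer lead times; keep
# stepping back while issues are missing.

def best_available(issues, issue_time, max_steps=4, step=12):
    """Return the most recent available forecast at or before issue_time."""
    for n in range(max_steps + 1):
        t = issue_time - n * step
        if t in issues:
            return issues[t]
    return None  # nothing usable: extend the last known forecast instead

# issue time (hours) -> forecast series; the 12:00 issue is missing
issues = {0: ["f00"], 24: ["f24"]}
print(best_available(issues, 12))   # ['f00'], one issue back
print(best_available(issues, 24))   # ['f24'], available directly
```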

Any predictions made by these models should fall within a certain range, so any forecast outside this range is clamped. This is the main post-processing step being done, i.e. there is an upper limit on what the farm can produce.

3.3 Holdback Input Randomization

The Holdback Input Randomization (HIPR) method described in Kemp et al [2007] can be used to investigate the importance of the input parameters. It works by sequentially feeding each data point in the test set to the neural network while replacing the values of one input parameter at a time. The replacement values are uniformly distributed random values in the range the neural network was originally trained on, i.e. (-1, 1). An NRMSE score is calculated for each replacement. The result is information about the relevance of each input.
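A sketch of HIPR, assuming the model is simply a callable from a feature vector to a prediction; the toy model and data here are hypothetical:

```python
import math
import random

def nrmse(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def hipr(model, X, y):
    """For each input, scramble that input across the test set with uniform
    noise in (-1, 1) and record the resulting NRMSE. Inputs whose
    randomization hurts the score most matter most."""
    scores = []
    for f in range(len(X[0])):
        errors = []
        for x, target in zip(X, y):
            x_rand = list(x)
            x_rand[f] = random.uniform(-1.0, 1.0)   # scramble one input
            errors.append(target - model(x_rand))
        scores.append(nrmse(errors))
    return scores  # one NRMSE per input parameter

# toy model that only uses its first input: feature 0 should stand out
model = lambda x: x[0]
X = [[0.5, 0.1], [-0.2, 0.9], [0.8, -0.4]]
y = [0.5, -0.2, 0.8]
print(hipr(model, X, y))   # score for feature 1 is exactly 0.0
```

Because the toy model ignores feature 1, scrambling it changes nothing, while scrambling feature 0 destroys the fit, which is exactly the contrast HIPR is designed to expose.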

3.4 Optimization methods

The two main optimization algorithms that have been used in this thesis are the Particle Swarm Optimization (PSO) algorithm presented by Eberhart and Kennedy [1995] and the Levenberg-Marquardt (LM)2 algorithm, independently developed by Kenneth Levenberg and Donald Marquardt [Marquardt 1963, Levenberg 1944]. The PSO algorithm is a population-based stochastic algorithm similar to a Genetic Algorithm (GA) but based on social-psychological principles instead of evolution. It

2 The LM algorithm was used in the beginning of the project, but the reported results ended up using the PSO algorithm. I have included the LM algorithm in this section because speed optimization was measured using this algorithm.


can be summarized by the following steps. Each network was trained multiple times in order to avoid local minima.

• Step 1: Initialize particles with random velocities and accelerations.

• Step 2: Determine which particle is closest to the goal.

• Step 3: Adjust accelerations toward that particle.

• Step 4: Update particle positions based on their velocities, and update velocities based on accelerations.

• Step 5: Go to step 2.

The LM algorithm is a combination of the Error Back-Propagation (EBP) method [Rumelhart et al 1988] and the Gauss-Newton method. This algorithm has the speed advantage of Gauss-Newton and the stability of EBP [Yu and Wilamowski 2011]. A detailed treatment of LM can be found in Moré [1978] and of PSO in Poli et al [2007].

3.5 Neural Networks

3.5.1 Multilayer Perceptron

The perceptron is built around a nonlinear model of a neuron, the McCulloch-Pitts model [McCulloch and Pitts, 1943]. It basically consists of a two-step process: the cell body contains a summation function of the weighted sum of all inputs, including a bias, and the perceptron is described by equation 3.11. The sum s is then passed through an activation function (see the Activation Functions paragraph below), which mimics the activation, or firing, of the neuron.

s = Σ_{i=1}^{M} wi·xi + x0    (3.11)

Here wi is the weight of the "synapse" of input channel i and is the parameter we want to adjust, xi is the input value, and x0 is the bias. M denotes the number of inputs.
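Equation 3.11 followed by an activation amounts to a few lines of code; a sketch with illustrative names, using the tanh activation described later in this section:

```python
import numpy as np

def perceptron(x, w, x0):
    # Equation 3.11: weighted sum of the inputs plus the bias x0
    s = np.dot(w, x) + x0
    # The sum is passed through an activation function, here tanh
    return np.tanh(s)
```

A zero input with zero bias yields zero output, and large weighted sums saturate toward +/-1.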


CHAPTER 3 METHOD AND MATERIALS

Figure 3.1: The perceptron. The input signals are weighted, summed together with a bias, and passed through the activation function f(s) to produce the output signal.

The structure of the MLP consists of many perceptrons and is shown in figure 3.2. The input signal flows from the input layer at the bottom to the output layer at the top. We have a bias signal, seen on the left side of the diagram, which is set to a fixed number.

MLPs are fully connected networks, meaning that the neurons in any layer of the network are connected to all the neurons in the previous layer. Each connection in the network has a weight wij associated with it. These weights are initialized before any training is done, by randomly assigning very small values to the respective weights in the graph. The architecture used in this study has only one output variable, i.e. the function we approximate produces forecasts of the power generation given a certain input.


Figure 3.2: Architectural graph of the neural network that produces a single output value. It consists of a collection of hidden neurons in each of the H hidden layers as well as M input connections (hours, u, v, week, ws, ws-1, ws-2, ws+1, ws+2); the hidden neurons use tanh activations and the output neuron is linear. Each edge in this graph has a weight wij associated with it.

The performance of neural networks is generally improved if the data is normalised, because using the original data directly can cause convergence problems. Normalization is done using the mapminmax function seen in equation 3.12.

y = (ymax − ymin) · (x − xmin) / (xmax − xmin) + ymin    (3.12)

ymax is the maximum of the specified range, which in this case is 1, and ymin is −1; x is the value to be scaled, xmax is the maximum of the values to be scaled, and xmin is their minimum.
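Equation 3.12 is a plain linear rescaling and can be transcribed directly:

```python
def mapminmax(x, x_min, x_max, y_min=-1.0, y_max=1.0):
    # Equation 3.12: map x from [x_min, x_max] onto [y_min, y_max]
    return (y_max - y_min) * (x - x_min) / (x_max - x_min) + y_min
```

For example, mapminmax(5.0, 0.0, 10.0) gives 0.0, the midpoint of the default (-1, 1) target range.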

The forecasting model consists of 7 different networks, one for each wind farm; these networks are trained on the first section of the dataset, before the test period. A random 60/20/20 split is created, where 60% of the available data is used for training, 20% is used to validate the network, and 20% is set aside as a hold-out set for the hyperparameters. The input features³ fed into the models are

³See table 3.2.


ws, u, v, hours, ws+1, ws+2, ws−1, ws−2, where ws+x and ws−x denote a time shift of x. We can use ws+x as input into the model because ws is a forecast in itself.

Activation Functions

The computation done for each neuron in the multilayer perceptron requires knowledge of the derivative of the activation function; in other words, the activation function we pick needs to be differentiable. In this thesis we use the following two activation functions: the hyperbolic tangent function, seen in equation 3.13, and

f(s) = tanh(s) = (e^s − e^−s) / (e^s + e^−s)    (3.13)

the linear transfer function seen in equation 3.14.

f(s) = { +1 if s ≥ 1;  s if −1 < s < 1;  −1 if s ≤ −1 }    (3.14)
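Both activation functions are cheap to evaluate; a direct transcription of equations 3.13 and 3.14:

```python
import math

def tanh_act(s):
    # Equation 3.13: hyperbolic tangent via exponentials
    return (math.exp(s) - math.exp(-s)) / (math.exp(s) + math.exp(-s))

def satlins(s):
    # Equation 3.14: linear inside (-1, 1), saturating at +/-1
    return max(-1.0, min(1.0, s))
```

The exponential form of equation 3.13 agrees with the library tanh, and satlins clips values outside (-1, 1).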

Hyperparameter optimization

In order to obtain the hyperparameters necessary for each wind farm's model, a random hyperparameter search was performed for all models. A hold-out validation set was used to pick the best hyperparameters. Random search has been shown to work better than grid search when not all parameters are equally important [Bergstra and Bengio, 2012].
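A random search is simple to sketch; the names and the scoring interface below are illustrative, not Expektra's actual code:

```python
import random

def random_search(train_and_score, space, n_trials=50, seed=0):
    # Sample each hyperparameter independently from its sampler and keep
    # the configuration with the best hold-out score (lower is better).
    rng = random.Random(seed)
    best_cfg, best_score = None, float("inf")
    for _ in range(n_trials):
        cfg = {name: sample(rng) for name, sample in space.items()}
        score = train_and_score(cfg)
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```

`space` maps hyperparameter names to sampling functions, and `train_and_score` would train a network with the sampled configuration and return its hold-out error.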

3.5.2 Numenta Platform for Intelligent Computing

This section describes the key principles introduced with NuPIC. It explains in general terms the theory behind the Online Prediction Framework (OPF)⁴, which uses the CLA and HTM algorithms; the OPF works as an API to create predictive models. The OPF consists of 5 major types of components: encoders, a Spatial Pooler (SP), a Temporal Memory (TM), a Temporal Pooler (TP), and classifiers. Together these components construct a single CLA region⁵, and figure 3.3 demonstrates the information flow through a single region.

⁴The OPF is used with Numenta's commercial product Grok.
⁵Currently, models created with the OPF do not use a TP, nor does this client allow creating a hierarchy of regions (i.e. what we would call an HTM); it is only possible to create one region.


Another central concept in NuPIC is the Sparse Distributed Representation (SDR), which refers to the activation of a small percentage of the neurons at any given time. This neuronal activity is represented as an n-dimensional sparse binary vector x = [b0, ..., bn] with around 2% active cells. The outputs from the spatial pooler and the temporal memory are both SDRs, while the output from the encoder does not enforce this and is just a normal binary vector. A general overview of the properties of SDRs is given by Ahmad and Hawkins [2015].

Figure 3.3: Information flow of a single-region predictive model created with the OPF: Encoder → Spatial Pooler → Temporal Memory → Classifier.

The overlap between two different SDRs is defined by

o(x, y) = x · y    (3.15)

A match between two SDRs is defined by

m(x, y) := o(x, y) ≥ θ    (3.16)

where θ is set so that θ ≤ ‖x‖₁ and θ ≤ ‖y‖₁. An interesting property of SDRs, used multiple times inside the temporal memory and especially for predictions, is that a fixed-size set of SDRs can be reliably stored by taking their union as a single pattern. The boolean OR-operator is used to create a


Value   Encoding
1       11111000000000
2       01111100000000
10      00000000011111

Table 3.3: Examples (n = 14, r = 5, ψ = 1) of various scalar values encoded using a ScalarEncoder.

new vector from a set of SDRs. The downside of storing patterns this way is that the more patterns we store, the bigger the probability of false positives.
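Equations 3.15 and 3.16 and the union trick can be sketched directly on binary vectors:

```python
import numpy as np

def overlap(x, y):
    # Equation 3.15: number of bits active in both SDRs
    return int(np.dot(x, y))

def match(x, y, theta):
    # Equation 3.16: the SDRs match if their overlap reaches theta
    return overlap(x, y) >= theta

def union(sdrs):
    # Store a set of SDRs as a single pattern using boolean OR
    out = np.zeros_like(sdrs[0])
    for s in sdrs:
        out = np.logical_or(out, s).astype(int)
    return out
```

With x = 1100 and y = 1010 the overlap is 1; their union 1110 still matches both x and y at their full thresholds, illustrating the storage property, and also why false positives grow as the union fills up with bits.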

Encoders

NuPIC contains many different encoders⁶. The job of an encoder is to convert raw input into a more suitable representation (i.e. a binary vector). The raw input is fed into the model using a dictionary data structure.

Useful encoders include scalar and categorical encoders, while more exotic encoders include one for Global Positioning System (GPS) coordinates, which can be used to extract information about anomalous movements. The entries of the dictionary representation of the raw input are each encoded separately and concatenated using a multi-encoder.

One property of the spatial pooler is that overlapping input patterns are mapped to the same SDR. This means that we want the encoder to encode input so that similar inputs share bits. The ScalarEncoder fulfils this property through the process illustrated in table 3.3.

We calculate the range of values to encode using equation 3.17, where vmin represents the minimum value of the input signal and vmax denotes its upper bound.

vrange = vmax − vmin    (3.17)

There are three mutually exclusive parameters that determine the overall size of the output from a scalar encoder: (n, r, ψ). n directly represents the total number

⁶A full list of all encoders can be found in the API documentation for NuPIC.


of bits in the output, and it must be greater than or equal to w, which represents the number of bits set to encode a single value, i.e. the "width" of the output signal⁷. r and ψ are specified with respect to the input, while w is specified with respect to the output. Two inputs separated by more than the radius have non-overlapping representations; two inputs separated by less than the radius will in general overlap in at least some of their bits; and two inputs separated by at least the resolution are guaranteed to have different representations. ψ can be calculated using equation 3.18.

ψ = r / w    (3.18)

Depending on whether we want periodic behaviour or not, n is calculated a bit differently; see the scalar encoder for implementation details⁸.
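A sketch of the non-periodic case reproduces table 3.3; parameter handling is simplified compared to NuPIC's actual ScalarEncoder:

```python
import numpy as np

def scalar_encode(v, v_min, v_max, w=5, resolution=1.0):
    # A contiguous block of w active bits whose position encodes the
    # value; nearby values share bits because the blocks overlap.
    n_buckets = int(round((v_max - v_min) / resolution)) + 1
    n = w + n_buckets - 1                      # total number of output bits
    i = int(round((v - v_min) / resolution))   # bucket index for v
    out = np.zeros(n, dtype=int)
    out[i:i + w] = 1
    return out
```

With v_min = 1, v_max = 10 and w = 5 this yields n = 14 bits: the values 1 and 2 share four of their five active bits, while 1 and 10 share none, exactly as in table 3.3.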

Spatial Pooler

The spatial pooler receives a binary vector as its input from an encoder and outputs an SDR. The structure of the spatial pooler consists of an input space and a set of columns; the output SDR represents which of the columns in a region are active. Each column has synapses connected to the input space: a set of synapses randomly and potentially connected to around 50% of the input space, the so-called "potential pool". Each synapse connects and disconnects with the input space during learning.

The general information flow of the spatial pooler consists of the following steps. Input stimuli from lower regions lead to activation of the input space, to which each mini-column is synaptically connected; this activation, the so-called "overlap score", is computed for each column. The score is calculated as the total sum of the neurons that try to influence that column, weighted with a "boosting factor" that tries to increase the chance for certain columns to win more easily. A final top percentage (usually around 2% activity) of the columns with the highest influence (biggest overlap score) is chosen; columns also inhibit nearby columns, and the result of this process is a binary vector with few active bits, an SDR. This is illustrated in equation 3.19, where b can be either 0 or 1 and s is a value representing the score.

⁷w must be odd to avoid centering problems.
⁸https://github.com/numenta/nupic


[b1 b2 b3 ... bn] · [b11 b12 b13 ... b1n; b21 b22 b23 ... b2n; ...; bd1 bd2 bd3 ... bdn] = [s1 s2 ... sn] → inhibition → [b1 b2 b3 ... bn]    (3.19)

(input vector · connected synapses for each column = overlap score, which after inhibition yields the output SDR)

Learning in this structure is done by adjusting the permanences of the proximal dendrite to better match the input: for each winning column, increase the permanence of the synapses that correctly matched the input and decrease the permanence of the rest. We also increase the boosting factor of losing columns to give them a bigger chance of winning next time.
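One spatial-pooler step (the flow of equation 3.19) can be sketched with a global top-k winner selection standing in for local inhibition; the 2% activity level follows the text, everything else is illustrative:

```python
import numpy as np

def spatial_pool(input_vec, synapses, boost, active_frac=0.02):
    # Overlap score per column (equation 3.19), weighted by boosting
    overlap = synapses @ input_vec
    score = overlap * boost
    # Let the top ~2% of columns win; a crude stand-in for inhibition
    k = max(1, int(active_frac * len(score)))
    winners = np.argsort(score)[-k:]
    sdr = np.zeros(len(score), dtype=int)
    sdr[winners] = 1
    return sdr
```

With 100 columns and active_frac = 0.02, exactly the two columns with the highest boosted overlap end up active.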

Temporal Memory

The temporal memory receives an SDR from the spatial pooler, representing the active columns; the output of the temporal memory is the activity of the whole CLA region. The general idea of the TM⁹ is that the cells in each mini-column provide temporal context to the pattern identified by the spatial pooler. The TM's job is to form transitions between different SDRs, and it achieves this by allowing active cells to form connections to previously active cells, so that in a future setting each cell is able to predict its own activity.

Each cell in a region has several distal segments, ideally one for each pattern the cell has transitioned from; in practice these segments are connected to a couple of patterns each. A cell enters a predictive state if there is enough activity on a segment, and every segment consists of a collection of connections, synapses, that have been formed to a subset of previously active cells (typically around 10-15 cells).

The Temporal Memory receives a sparse binary vector from the spatial pooler, and the general information flow for this part of the CLA goes through two phases. The first phase has the following steps: 1) for each active column, check whether any cell is in a predictive state; if there is a cell in a predictive

⁹There are a lot of details in how the Temporal Memory is implemented; pseudo-code and more details can be found in [Numenta, 2011], and the nupic git repository is the best source for finer details: https://github.com/numenta/nupic


state, then 2) activate that particular cell. If no cell was found in a predictive state, then 3) activate all cells in that particular column, a process called bursting.

Bursting is analogous to saying: we do not know which cell to activate because we have not seen this instance of the sequence of patterns before, and since we are unable to put the column into the correct temporal context, let us activate all cells to reflect this uncertainty. Bursting does not occur when the temporal context has been predicted by the CLA. Equation 3.20 shows the first phase.

[b1 b2 b3 ... bn] · [b11 b12 b13 ... b1n; b21 b22 b23 ... b2n; ...; bd1 bd2 bd3 ... bdn] = [b11 b12 b13 ... b1n; b21 b22 b23 ... b2n; ...; bd1 bd2 bd3 ... bdn]    (3.20)

(SP SDR · predictive state = active state)

The second phase of the algorithm figures out which cells should be put into a predictive state for the next time step. This is achieved by checking every distal segment of every cell for activity above a certain threshold, i.e. checking for active segments; one active segment is enough to put a cell into a predictive state. The output from the temporal memory is a vector representing the state of all cells in the region. Equation 3.21 shows phase 2.

[b1 b2 b3 ... bn] · [b11 ... b1n; b21 ... b2n; ...; bd1 ... bdn]_X = ([b1 b2 b3 ... bn]_X > τ) → [s1 s2 s3 ... sn]    (3.21)

(active state · segment X = segment activation X; activation above τ puts cells into the predictive state)

If learning is turned on, the permanences of the synapses connected to active distal segments are updated (in the same way as in the spatial pooler). These changes are marked as temporary until we are sure that the cell is in fact able to correctly predict something with them; after that, each change either becomes permanent or is removed. Temporarily marked cells are updated whenever a cell goes from inactive to active through feed-forward input (keep the update, as we correctly predicted the feed-forward activation). If a cell instead went from active to inactive, the change is undone.
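Phase 1, including bursting, can be sketched on a cells-by-columns matrix of predictive states; this is a toy illustration of the activation rule, not NuPIC's implementation:

```python
import numpy as np

def tm_activate(active_columns, predictive):
    # predictive: (cells_per_column, n_columns) binary matrix from the
    # previous time step. For each active column: activate the predicted
    # cells if any exist, otherwise burst (activate the whole column).
    active = np.zeros_like(predictive)
    for c in active_columns:
        if predictive[:, c].any():
            active[:, c] = predictive[:, c]   # context known: stay sparse
        else:
            active[:, c] = 1                  # bursting: context unknown
    return active
```

A column with a predicted cell stays sparse (one active cell), while an unpredicted active column bursts with every cell active.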


NuPIC Classifiers

There are many different classifiers included with NuPIC, such as a k-Nearest Neighbour (kNN) classifier and a Support Vector Machine (SVM); the OPF in particular uses a custom-built classifier called the "CLAClassifier". The purpose of this classifier is to decode predictions made by the CLA. Classification with the CLAClassifier is performed in different ways depending on how the input has been encoded.

Scalar values are decoded using a process where each cell in a CLA region is paired with two histograms. One histogram keeps track of the frequency of encountered patterns associated with each cell; the other keeps track of a moving average for each bucket. This process is illustrated in figure 3.4.

Figure 3.4: The CLAClassifier. Two histograms per cell: one tracks the likelihood of encountered patterns, the other a moving average for each bucket between the min and max values.

Training in NuPIC

Training of the NuPIC model is done using online learning algorithms. The schema seen in figure 3.5 is used to train and test these models. We have 7 different wind farms, so this schema is repeated for each wind farm.

This schema is slightly adjusted from the traditional way the OPF handles multi-step predictions. The default way of doing this is to use 48 different models, one for


every step ahead in the CLAClassifier, which would give us 48 · 7 = 336 models in total (for all the wind farms). Not only does this change match Expektra's approach, but it also reduces the memory requirements of the CLAClassifier.

Figure 3.5: Training an OPF model. A pre-training data chunk is used for hyperparameter setup (PSO swarming or manual setup); the training phase then feeds the training data stream to the OPF model with online learning activated, and the testing phase feeds the testing data with online learning deactivated to produce multi-step predictions.

Input and hyperparameter selection

Finding hyperparameters for an OPF model was partly done using a custom built-in PSO algorithm and partly configured manually, mainly by ensuring that the PSO algorithm did not remove encoders. A single CLA region contains a lot of hyperparameters; these have been included, with corresponding descriptions, in appendix A. The inputs to the model are date, ws, wp, u, v.


Chapter 4

Result

This chapter contains the primary results obtained in this study. Figures 4.4-4.10 show different error measurements on the test set for the respective wind farms. We see that Expektra's ANN model performs well over all wind farms, while the NuPIC model performs worse than Expektra but in general still better than our reference model. NuPIC is off target with a bias error on most wind farms. The graphs also indicate that NuPIC is unable to pick up some trends, visible in the cumulated ε² graphs. Expektra's model, on the other hand, shows no clear problems in cumulated ε² and is on target on all wind farms; appendix C has been included to reflect this at different lead times.
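The error measures shown in these figures are defined earlier in the thesis; as a reading aid, here is one common set of definitions on power normalized to [0, 1]. This is an assumption for illustration, not a restatement of the thesis's exact formulas:

```python
import numpy as np

def nbias(y_true, y_pred):
    # Mean signed error; positive means the model over-predicts on average
    return float(np.mean(y_pred - y_true))

def nrmse(y_true, y_pred):
    # Root-mean-square error on normalized power
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

def nmae(y_true, y_pred):
    # Mean absolute error on normalized power
    return float(np.mean(np.abs(y_pred - y_true)))

def cumulated_eps2(y_true, y_pred):
    # Running sum of squared errors over the test period
    return np.cumsum((y_pred - y_true) ** 2)
```

A flat cumulated ε² curve means the model tracks the series well over that stretch, while trend segments the model misses show up as steep rises.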

Figure 4.1: Left diagram: cumulative probability of wind speed. Right diagram: scatter diagram of the power curve, normalized power output vs wind speed, with wind direction indicated. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


Figure 4.2: Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).

Figure 4.3: Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


Figure 4.4: Different error measurements for Wind Farm 1 (Expektra, NuPIC, persistence): NBIAS and NRMSE vs look-ahead time k (in hours), cumulated ε² over the test period, and NMAE vs look-ahead time.


Figure 4.5: Different error measurements for Wind Farm 2 (Expektra, NuPIC, persistence): NBIAS and NRMSE vs look-ahead time k (in hours), cumulated ε² over the test period, and NMAE vs look-ahead time.


Figure 4.6: Different error measurements for Wind Farm 3 (Expektra, NuPIC, persistence): NBIAS and NRMSE vs look-ahead time k (in hours), cumulated ε² over the test period, and NMAE vs look-ahead time.


Figure 4.7: Different error measurements for Wind Farm 4 (Expektra, NuPIC, persistence): NBIAS and NRMSE vs look-ahead time k (in hours), cumulated ε² over the test period, and NMAE vs look-ahead time.


Figure 4.8: Different error measurements for Wind Farm 5 (Expektra, NuPIC, persistence): NBIAS and NRMSE vs look-ahead time k (in hours), cumulated ε² over the test period, and NMAE vs look-ahead time.


Figure 4.9: Different error measurements for Wind Farm 6 (Expektra, NuPIC, persistence): NBIAS and NRMSE vs look-ahead time k (in hours), cumulated ε² over the test period, and NMAE vs look-ahead time.


Figure 4.10: Different error measurements for Wind Farm 7 (Expektra, NuPIC, persistence): NBIAS and NRMSE vs look-ahead time k (in hours), cumulated ε² over the test period, and NMAE vs look-ahead time.


                        Wind Farm
User            1      2      3      4      5      6      7      All
Leustagos       0.145  0.138  0.168  0.144  0.158  0.133  0.140  0.146
DuckTile        0.143  0.145  0.172  0.145  0.165  0.137  0.146  0.148
MZ              0.141  0.151  0.174  0.145  0.167  0.141  0.145  0.149
Propeller       0.144  0.153  0.177  0.147  0.175  0.141  0.147  0.152
Duehee Lee      0.157  0.144  0.176  0.160  0.169  0.154  0.148  0.155
Expektra        0.165  0.158  0.184  0.164  0.179  0.153  0.153  0.165
MTU EE5260      0.161  0.172  0.193  0.162  0.192  0.156  0.160  0.168
SunWind         0.174  0.177  0.193  0.176  0.179  0.157  0.162  0.172
ymzsmsd         0.163  0.186  0.200  0.164  0.192  0.162  0.167  0.174
4138 Kalchas    0.180  0.179  0.197  0.175  0.200  0.160  0.165  0.177
NuPIC           0.243  0.254  0.264  0.310  0.290  0.224  0.240  0.264
Persistence     0.302  0.338  0.373  0.364  0.388  0.341  0.361  0.355

Table 4.1: NRMSE scores of the entries published in [Hong et al., 2014]. The NuPIC and Expektra models are added so we can easily compare the results.

4.1 Experimental results

Looking at the primary results presented in this study, we see in table 4.1 that Expektra's ANN model is able to achieve performance comparable to the other methods published with Hong et al. [2014]. This can be used as a starting point for further investigations. A similar approach to Expektra's is Duehee Lee's [Lee and Baldick, 2014], as both methods are based on the same neural network architecture but with different optimization techniques. Duehee Lee's network is structured slightly differently from Expektra's model, which has only one output node, whereas Duehee Lee uses five (representing five predictions ahead). Besides these differences, Duehee Lee also includes additional sub-models to create a prediction ensemble, blending the output of the ANN with a Gaussian Process (GP) to improve very short-term predictions. MTU EE5260 is also a neural network approach that converts meteorological forecasts to power, and its score is very close. The HTM CLA beats the persistence model but needs more work to outperform the other models in GEFCom.


4.2 Input Importance

For interpretation purposes, in order to understand the model better, an analysis of the relative input parameter importance was performed. This analysis is illustrated in figure 4.11. Each box represents added noise to that channel; an important feature results in a higher NRMSE score when replaced. A reference point, "all-channels", represents the error distribution of the model with no input replacement. We clearly see in this figure that the wind speed channel ws is the most important attribute: adding noise to this channel greatly affects the NRMSE score. We also see that the wind components u, v show little to no influence, and that the timestamp-related inputs hours and week both indicate that seasonal and daily trends are present in the dataset.

Figure 4.11: Relative input parameter importance using HIPR, shown as NRMSE distributions per channel (all-channels, hours, u, v, week, ws, ws-1, ws-2, ws-3, ws+1, ws+2, ws+3). "all-channels" reflects the reference point, i.e. the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v = directional vector of the wind.


4.2.1 Adaptation and Optimization

All training for Expektra's model was performed on a MacBook Pro with an Intel Core 2 Duo processor (P8600, 2.40 GHz) and 4 GB of RAM, running a 64-bit operating system. After adapting the code to run experiments on the GEFCom dataset, the performance of the core source code provided by Expektra was investigated. This investigation identified some slow parts of the code; after fixing these issues, mainly by using a Math.NET native library instead of the managed provider, a speed test was performed comparing the unoptimized version with the optimized one. Figure 4.12 shows the speed-up achieved during training using the LM algorithm.

Figure 4.12: Training time of the unoptimized ("Normal") version vs the optimized one when training a network with 10 input neurons, 10/15/20/25 hidden neurons, and 1 output neuron. Training was done for 100 epochs using LM.

4.3 Summary

A plot showing the average NRMSE improvement over the persistence model can be seen in figure 4.13. We see that both models are able to perform better than persistence. Expektra's model performs better than the NuPIC model for the whole forecast


horizon, and NuPIC performs better than persistence towards the end of the forecast.

Figure 4.13: Summarized average improvement (%) in NRMSE over persistence across all wind farms, with 95% confidence intervals.


Chapter 5

Discussion

With the exponential growth of computer power it becomes easier to study deeper and more complex networks, and with recent advances in the area of deep learning a new interest in ANNs has resurged. In practice, ANNs have been used by most groups in the field of WPF, but these networks never caught on, as it was argued that the improvements made by ANNs were usually not worth the extra effort of training them [Giebel et al., 2011]. This is steadily becoming less of an issue as computation becomes cheaper and bigger networks perform better.

By using a simple MLP network, we observe that we are able to obtain results similar in performance to other models published in Hong et al. [2014]. This can be used as a starting point for further investigations. The GEFCom dataset has a limited number of input features; if we had access to a wider range of them, the power of neural networks could be studied more in depth. Given the number of features in the dataset, this is harder to do.

Using persistence as a reference model was done in order to have the same baseline as the one used in GEFCom, but a better reference model should be considered in the future, such as the one presented in Nielsen et al. [1998], especially if longer forecasts are to be considered.

5.1 Method development issues

Working with Expektra's model is very straightforward, and no major issues were encountered during development, but some issues were encountered working with NuPIC¹:

¹It should also be pointed out that it helps to have the people who developed the code on the spot, and this is probably the main reason for the little trouble.


1. Installing NuPIC is difficult. This seems to be a general problem, judging by posts on the NuPIC mailing list². A very important issue to fix if they want more people to work with this model.

2. Using the built-in swarming (PSO) did not work well, especially for slightly larger datasets and for 48 prediction steps. The main problems were fitting everything into memory and that the code ran too slowly when swarming over multiple step-ahead predictions, making it practically impossible to use. Scaling down the problem to just a few prediction steps and multiple models helped, but the swarm model kept dismissing wind speed as an important feature, which was fixed by specifically telling it not to touch that encoder.

3. Inconsistencies between the white-paper documentation of HTM/CLA and the actual implementation make it hard to understand the underlying principles of NuPIC. NuPIC presents a very complex network, and any inconsistencies in documentation make it hard to understand. The code is quite well documented, but the additional material is a bit sparse.

These issues are all understandable and are continuously being improved upon. NuPIC is still a young platform, so issues are to be expected given the complexity and research nature of the project. Training the CLAClassifier to use many prediction steps instead of just one will most likely reduce the bias error seen in the results; to investigate this, a more powerful computer is needed.

There are some differences in the inputs between the models. This setup was created because an OPF model needs to encode the value it is going to predict. SCADA data is available, so this is not an issue, but it may introduce some unfairness between NuPIC and Expektra. In the current setup, NuPIC does not seem to gain anything extra from the additional information, and other models in the competition most likely used different inputs as well.

In general, NuPIC differs in the way you feed data: you send in only the front of the signal, and temporal context is achieved through the temporal memory.

5.2 Future improvements and directions

One issue with the GEFCom dataset is that it does not contain any specific information about the location of these 7 wind farms. It is worth taking a look at how to handle different models that are specialized for different types

²http://numenta.org/lists


of terrain, as this has been shown to increase performance [Kariniotakis et al., 2006]. Another thing to investigate could be a more advanced schema for hyperparameter selection: Jursa and Rohrig [2008] show that the WPF error can be reduced by smarter use of optimization algorithms for feature selection and hyperparameter selection, which should improve performance.

In general, having more types of inputs from sources like SCADA and NWP systems could give a lot of valuable information that can be used for better predictions. It is clear that wind speed is a key feature for producing good forecasts, which is also supported by the input analysis in section 4.2. It has been argued that the error in WPF models stems largely from wrong weather forecasts [Giebel et al., 2011], so identifying good sources of weather forecasts would be desirable for more effective model learning. Combining several different weather forecasts could also be investigated, as Nielsen et al. [2007] showed that power forecasts based on a number of different meteorological forecasts were better than those from a single source.

Regarding HTM CLA, it is worth considering implementing a custom encoder, one specifically targeted at data concerning wind farms. SCADA data could also possibly be streamed directly into a NuPIC model, which would allow for some interesting online predictions and anomaly detection.

Training times are always an issue, especially for very large models. Both NuPIC and Expektra's model would benefit from being implemented in a more parallel fashion, as these algorithms are very well suited for parallel computing. The speed-up would be of great value, so looking into and implementing a GPU version of the algorithms could be worthwhile.
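One simple form of parallelism that needs no GPU is to train the seven per-farm models concurrently. The sketch below only illustrates the structure, with a hypothetical train_farm stand-in; CPU-bound training in CPython would need a process pool or a GPU implementation to see a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor

def train_farm(farm_id):
    """Hypothetical stand-in for training one wind-farm model.

    A real implementation would fit the MLP (or NuPIC model) for this
    farm and return its validation error.
    """
    return farm_id, 0.15  # placeholder validation NRMSE

# The seven GEFCom wind farms are independent, so their models can be
# trained concurrently.
with ThreadPoolExecutor(max_workers=7) as pool:
    errors = dict(pool.map(train_farm, range(1, 8)))
```

Since the farms share no state during training, this decomposition is embarrassingly parallel; the per-model training itself (matrix operations in the MLP, column computations in the spatial pooler) is where a GPU port would pay off.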

5.3 Conclusions

The field of energy forecasting is still young, and the necessity for good forecasting techniques is constantly growing. This initial study shows that part of Expektra's forecasting method can be used to forecast wind power generation. We observe that this ANN is able to achieve performance comparable to other methods published in Hong et al. [2014]. Additional work is still needed to obtain state-of-the-art results. Numenta's model is able to beat the reference model but still needs additional work before we can draw definite conclusions about its performance. Constructing a custom encoder could be a first step in that direction.


Bibliography

Subutai Ahmad and Jeff Hawkins. Properties of sparse distributed representations and their application to hierarchical temporal memory. arXiv preprint arXiv:1503.07469, 2015.

T. C. Akinci. Short term wind speed forecasting with ANN in Batman, Turkey. Elektronika ir Elektrotechnika, 107(1):41–45, 2015.

E. Michael Azoff. Neural network time series forecasting of financial markets. John Wiley & Sons, Inc., 1994.

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13(1):281–305, 2012.

Daniel P. Buxhoeveden and Manuel F. Casanova. The minicolumn hypothesis in neuroscience. Brain, 125(5):935–951, 2002.

Erasmo Cadenas and Wilfrido Rivera. Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks. Renewable Energy, 34(1):274–278, 2009.

Dmitri B. Chklovskii, B. W. Mel, and K. Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782–788, 2004.

A. Costa, A. Crespo, J. Navarro, G. Lizcano, H. Madsen, and E. Feitosa. A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews, 12(6):1725–1744, August 2008. ISSN 1364-0321. doi: 10.1016/j.rser.2007.01.015. URL http://dx.doi.org/10.1016/j.rser.2007.01.015.

Russ C. Eberhart and James Kennedy. A new optimizer using particle swarm theory. In Proceedings of the sixth international symposium on micro machine and human science, volume 1, pages 39–43, New York, NY, 1995.


Shu Fan, James R. Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. Energy Conversion, IEEE Transactions on, 24(2):474–482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento: a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276. Citeseer, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition, EWEC 2008. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving hierarchical temporal memory-based trading models. Springer, 2013.

M. A. Gaertner, C. Gallardo, C. Tejeda, N. Martínez, S. Calabria, N. Martínez, and B. Fernández. The Casandra project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. In Proc. of the 2001 European Wind Energy Conference, EWEC'01, Copenhagen, Denmark, pages 777–780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: A literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G. Lobo. Sipreólico: wind power prediction tool for the Spanish peninsular power system. Proceedings of the CIGRÉ 40th general session and exhibition, Paris, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357–363, 2014.


Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41–46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585–592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694–709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference, EWEC'07, 2007.

G. Kariniotakis. Position paper on JOULE project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J. Halliday, R. Brownsword, Ignacio Marti, Ana Maria Palomares, I. Cruz, H. Madsen, T. S. Nielsen, Henrik Aa. Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power: overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, 2006.

G. N. Kariniotakis, G. S. Stavrakakis, and E. F. Nogaret. Wind power forecasting using advanced neural networks models. Energy Conversion, IEEE Transactions on, 11(4):762–767, 1996.

Stanley J. Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326–334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275–293, 2009.

Lars Landberg and Simon J. Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171–195, 1994.

S. Lang, C. Mohrlen, J. Jorgensen, B. O. Gallachóir, and E. McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International Solar Energy Society UK Section Conference, volume 85, page 89, 2006.


S. M. Lawan, W. A. W. Z. Abidin, W. Y. Chai, A. Baharun, and T. Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760–1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. Smart Grid, IEEE Transactions on, 5(1):501–510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets, 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164–168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313–2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154–3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa. Nielsen, and Torben S. Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475–489, 2005.

Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431–441, 1963.

Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons. 1969.

Marvin Lee Minsky and Oliver G. Selfridge. Learning in random nets. MIT Lincoln Laboratory, 1960.

Jorge J. Moré. The Levenberg-Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105–116. Springer, 1978.

Henrik Aa. Nielsen, Torben S. Nielsen, Henrik Madsen, Maria J. Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471–482, 2007.


Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29–34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power, 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159–167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms, v0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33–57, 2007.

A. Rodrigues, J. A. Peças Lopes, P. Miranda, L. Palma, C. Monteiro, R. Bessa, J. Sousa, C. Rodrigues, and J. Matos. EPREV: a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference, EWEC, volume 7, 2007.

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5:3, 1988.

Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85–117, 2015.

S. Sinkevicius, R. Simutis, and V. Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91–96, 2011.

Ke-Sheng Wang, Vishal S. Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61–69, 2014.

WWEA. 2014 half-year report. Technical report, WWEA, pages 1–8, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365–376, 2013.

Hao Yu and Bogdan M. Wilamowski. Levenberg-Marquardt training. Industrial Electronics Handbook, 5:12-1, 2011.


Appendix A

Hyper-parameters

Parameters for the spatial pooler

Alias                Default  Description
columnCount          -        The number of cell columns in a cortical region.
globalInhibition     false    If true, during the inhibition phase the winning columns are selected as the most active columns from the region as a whole.
numActivePerInhArea  10       The maximum number of active columns per inhibition area.
synPermActiveInc     0.1      The amount by which an active synapse is incremented in each round. Specified as a percent of a fully grown synapse.
synPermConnected     0.10     Controls the threshold at which synapses are connected.
synPermInactiveDec   0.01     The amount by which an inactive synapse is decremented in each round.
potentialRadius      16       This parameter determines the extent of the input that each column can potentially be connected to.

Table A1 Table containing configuration parameters for the spatial pooler


Parameters for the scalar encoder

Alias       Symbol  Description
w           w       Number of bits to set in the output.
minval      vmin    The lower bound of the input value.
maxval      vmax    The upper bound of the input value.
n           n       Number of bits in the representation (n must be > w).
radius      r       Inputs separated by more than or equal to this distance will have non-overlapping representations.
resolution  ψ       Inputs separated by more than or equal to this distance will have different representations.

Table A2 Table containing configuration parameters for the scalar encoder
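To make the mapping concrete, here is a minimal sketch of a non-periodic scalar encoder in the spirit of the parameters above: a run of w active bits placed inside an n-bit array according to the input value. The names, clipping, and rounding are simplifications for illustration, not NuPIC's actual implementation.

```python
def scalar_encode(value, w=3, n=14, minval=0.0, maxval=10.0):
    """Encode a scalar as n bits with a contiguous run of w active bits.

    Values close together share active bits; values far enough apart
    get non-overlapping representations.
    """
    if n <= w:
        raise ValueError("n must be greater than w")
    value = max(minval, min(maxval, value))   # clip into [minval, maxval]
    buckets = n - w + 1                       # distinct start positions
    start = int(round((buckets - 1) * (value - minval) / (maxval - minval)))
    bits = [0] * n
    bits[start:start + w] = [1] * w
    return bits
```

With w = 3 and n = 14 there are 12 possible bit patterns; encoding the bounds of the range activates the first and the last three bits, respectively.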


Parameters for the temporal memory

Alias                  Default  Description
activationThreshold    12       Activation threshold for segments.
cellsPerColumn         32       Number of cells per column.
columnCount            2048     The number of cell columns in a cortical region.
globalDecay            0.10     Decrements all synapses a little bit all the time.
initialPerm            0.11     Initial permanence value for a synapse.
inputWidth             -        Size of the input.
maxAge                 100000   Controls global decay. Global decay will only decay segments that have not been activated for maxAge iterations, and will only run the global decay loop every maxAge iterations.
maxSegmentsPerCell     -        The maximum number of segments a cell can have.
maxSynapsesPerSegment  -        The maximum number of synapses a segment can have.
minThreshold           8        The minimum required activity for a segment to learn.
newSynapseCount        15       The maximum number of synapses added to a segment during learning.
permanenceDec          0.10     How much permanence is removed from synapses when learning occurs.
permanenceInc          0.10     How much permanence is added to synapses when learning occurs.
temporalImp            cpp/py   Controls which temporal memory implementation to use.

Table A3 Table containing configuration parameters for the temporal memory


Appendix B

Wind characteristics

Figure B1 Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power


Figure B2 Wind characteristics for WF 3-7, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power


Appendix C

Error Distribution


[Figure: six error-frequency histograms, one per lead time (48, 40, 30, 20, 10, 1); each panel labeled "wf1 using nupic"; x-axis: Error, y-axis: Frequency.]

Figure C1 Error distribution for different lead times WF 1


[Figure: six error-frequency histograms, one per lead time (48, 40, 30, 20, 10, 1); each panel labeled "wf2 using nupic"; x-axis: Error, y-axis: Frequency.]

Figure C2 Error distribution for different lead times WF 2


[Figure: six error-frequency histograms, one per lead time (48, 40, 30, 20, 10, 1); each panel labeled "wf3 using nupic"; x-axis: Error, y-axis: Frequency.]

Figure C3 Error distribution for different lead times WF 3


[Figure: six error-frequency histograms, one per lead time (48, 40, 30, 20, 10, 1); each panel labeled "wf4 using nupic"; x-axis: Error, y-axis: Frequency.]

Figure C4 Error distribution for different lead times WF 4


[Figure: six error-frequency histograms, one per lead time (48, 40, 30, 20, 10, 1); each panel labeled "wf5 using nupic"; x-axis: Error, y-axis: Frequency.]

Figure C5 Error distribution for different lead times WF 5


[Figure: six error-frequency histograms, one per lead time (48, 40, 30, 20, 10, 1); each panel labeled "wf6 using nupic"; x-axis: Error, y-axis: Frequency.]

Figure C6 Error distribution for different lead times WF 6


[Figure: six error-frequency histograms, one per lead time (48, 40, 30, 20, 10, 1); each panel labeled "wf7 using nupic"; x-axis: Error, y-axis: Frequency.]

Figure C7 Error distribution for different lead times WF 7


[Figure: six error-frequency histograms, one per lead time (48, 40, 30, 20, 10, 1); each panel labeled "wf1 using expektra"; x-axis: Error, y-axis: Frequency.]

Figure C8 Error distribution for different lead times WF 1


[Figure: six error-frequency histograms, one per lead time (48, 40, 30, 20, 10, 1); each panel labeled "wf2 using expektra"; x-axis: Error, y-axis: Frequency.]

Figure C9 Error distribution for different lead times WF 2


[Figure: six error-frequency histograms, one per lead time (48, 40, 30, 20, 10, 1); each panel labeled "wf3 using expektra"; x-axis: Error, y-axis: Frequency.]

Figure C10 Error distribution for different lead times WF 3


[Figure: six error-frequency histograms, one per lead time (48, 40, 30, 20, 10, 1); each panel labeled "wf4 using expektra"; x-axis: Error, y-axis: Frequency.]

Figure C11 Error distribution for different lead times WF 4


[Figure: six error-frequency histograms, one per lead time (48, 40, 30, 20, 10, 1); each panel labeled "wf5 using expektra"; x-axis: Error, y-axis: Frequency.]

Figure C12 Error distribution for different lead times WF 5


[Figure: six error-frequency histograms, one per lead time (48, 40, 30, 20, 10, 1); each panel labeled "wf6 using expektra"; x-axis: Error, y-axis: Frequency.]

Figure C13 Error distribution for different lead times WF 6


[Figure: six error-frequency histograms, one per lead time (48, 40, 30, 20, 10, 1); each panel labeled "wf7 using expektra"; x-axis: Error, y-axis: Frequency.]

Figure C14 Error distribution for different lead times WF 7


List of Figures

2.1 A figure that presents the general outline when forecasting using the statistical approach
2.2 A figure that presents the general steps when forecasting using a physical model
3.1 The perceptron
3.2 Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of the H hidden layers, as well as M input connections. Each edge seen in this graph has a weight wij associated with it
3.3 Information flow of a single region predictive model created with the OPF
3.4 The CLAClassifier
3.5 Training an OPF model
4.1 Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.2 Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.3 Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.4 Different error measurements for WF 1
4.5 Different error measurements for WF 2
4.6 Different error measurements for WF 3
4.7 Different error measurements for WF 4
4.8 Different error measurements for WF 5
4.9 Different error measurements for WF 6
4.10 Different error measurements for WF 7
4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point; the reference point is the network when no channel has been exposed to noise. ws = wind speed, hours/week = timestamps, u, v is a directional vector of the wind
4.12 Plot showing the unoptimized version vs the optimized one when training a network with 10 input neurons; 10, 15, 20, or 25 hidden neurons; and 1 output neuron. Training was done for 100 epochs using LM
4.13 Summarized average improvement over all wind farms with 95% confidence intervals
B1 Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power
B2 Wind characteristics for WF 3-7, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power
C1 Error distribution for different lead times, WF 1
C2 Error distribution for different lead times, WF 2
C3 Error distribution for different lead times, WF 3
C4 Error distribution for different lead times, WF 4
C5 Error distribution for different lead times, WF 5
C6 Error distribution for different lead times, WF 6
C7 Error distribution for different lead times, WF 7
C8 Error distribution for different lead times, WF 1
C9 Error distribution for different lead times, WF 2
C10 Error distribution for different lead times, WF 3
C11 Error distribution for different lead times, WF 4
C12 Error distribution for different lead times, WF 5

C13 Error distribution for different lead times, WF 6
C14 Error distribution for different lead times, WF 7

List of Tables

3.1 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, missing data with power observations are available for updating the models
3.2 Preprocessed features from the dataset. The Forecast category represents the forecast given by the NWP; multiple forecasts will exist for a given date. The latest issued forecast available is the feature we will use in training and testing
3.3 Example where n = 14, r = 5, ψ = 1 of various encoded scalar values using a ScalarEncoder
4.1 NRMSE score of the entries published in [Hong et al., 2014]. The NuPIC model and Expektra model are added so we can easily compare the result
A1 Table containing configuration parameters for the spatial pooler
A2 Table containing configuration parameters for the scalar encoder
A3 Table containing configuration parameters for the temporal memory



Nomenclature

Indices, Constants and Variables

X1:n          A sequence of values X = [x1, x2, ..., xn]
k = 1, ..., kmax   Lead time or look-ahead time
kmax          Maximum prediction horizon
N             Total number of data points
e             Prediction error
ε             Normalized prediction error
pt            Measure of power generation at time t
pt+k|t        Forecast of power generation made at time t for look-ahead time t + k
wij           Weight of a synapse in the neural network, row i, layer j
b             Binary value
s             Weighted sum, including bias, of a perceptron
x · y         Dot product between x and y
x ∘ y         Row-by-row element-wise multiplication

Units of measurement

MW            Megawatts
GW            Gigawatts

Notes

Keys and dates are given in this form: Year, Month, Day, Hour.

Contents

1 Introduction
  1.1 Problem Formulation
  1.2 The scope of the problem

2 Background
  2.1 Neural Networks and Time Series Prediction

3 Method and Materials
  3.1 Preliminaries
    3.1.1 Remarks
    3.1.2 Definitions
    3.1.3 Reference models
    3.1.4 Error metrics
    3.1.5 Model selection
    3.1.6 Evaluation
  3.2 Experiments
  3.3 Holdback Input Randomization
  3.4 Optimization methods
  3.5 Neural Networks
    3.5.1 Multilayer Perceptron
    3.5.2 Numenta Platform for Intelligent Computing

4 Result
  4.1 Experimental results
  4.2 Input Importance
    4.2.1 Adaptation and Optimization
  4.3 Summary

5 Discussion
  5.1 Method development issues
  5.2 Future improvements and directions
  5.3 Conclusions

Bibliography

Appendices

A Hyper-parameters

B Wind characteristics

C Error Distribution

List of Figures

List of Tables

Chapter 1

Introduction

It has been estimated by the World Wind Energy Association (WWEA) that by the year 2020 around 12% of the world's electricity will be available through wind power, making wind energy one of the fastest growing energy resources [WWEA, 2014; Fan et al., 2009]. But integrating wind energy into existing electricity supply systems has been a challenge, and numerous objections have been put forward by traditional energy suppliers and grid operators, especially for large-scale use of this energy source. The biggest concern is that availability mainly depends on meteorological conditions and production cannot be adjusted as conveniently as with other, more conventional energy sources; this is because of our inability to control the wind. A single Wind Turbine (WT) is highly variable and its dependency on wind conditions can result in zero output for thousands of hours during the course of a year; however, aggregating wind power generation over bigger areas decreases this chance.

This is where wind power forecasting systems come into play, a technology that can greatly improve the integration of wind energy into electricity supply systems, as forecasting systems provide information on how much wind power can be expected at any given point within the next few days. This removes some of the randomness attributed to wind energy and allows a more accurate way to utilize this clean energy source, while offsetting some of our dependency on other, more environmentally unfriendly sources, which in the long run will cause a smaller degenerative impact on the environment.

There are many commercial forecasting models available. Prediktor1 [Landberg and Watson 1994] is a physical model developed by the Risø National Laboratory, Denmark. It is constructed to refine Numerical Weather Prediction (NWP) in order

1 http://www.prediktor.no


to transform data through a power curve to produce the forecast, while improving the error rate with Model Output Statistics (MOS). Previento [Focken et al 2001], developed at the University of Oldenburg, Germany, uses a similar approach to Prediktor, but with regional forecasting and uncertainty estimation. The Wind Power Prediction Tool (WPPT)2 [Nielsen et al 2002] is a statistical model developed by the Technical University of Denmark; it consists of a semi-parametric power curve model for wind farms, taking into account both wind speed and direction. It uses dynamical prediction models describing the dynamics of the wind power and any diurnal variations. Zephyr [Giebel et al 2000] is a hybrid model that combines the WPPT and Prediktor; in this model each wind farm is assigned a forecast model according to the available data. Sipreólico [Gonzalez et al 2004], developed by Red Eléctrica de España, is a statistical model designed to be highly flexible depending on the available data; it achieves this by switching between 9 different models. Aiolos Wind3 is a hybrid model developed by Vitec that creates forecasts by combining a statistical model with physical factors such as wind speed at different altitudes, wind direction and air density.

Expektra4 is a service-based company, founded in 2010, that provides and develops a new method for short-term demand and supply forecasting based on an Artificial Neural Network (ANN) and traditional time series analysis. They are currently expanding into the area of wind power forecasting and looking for suitable methods. ANNs have been used successfully for both wind speed forecasting [Lawan et al 2014] and wind power forecasting [Kariniotakis et al 1996]. It was demonstrated in Liu et al [2012] that a complex-valued recurrent neural network was able to predict output with high accuracy, and Huang et al [2015] showed that optimizing the initial weights of a back-propagated neural network with a genetic algorithm gave impressive results, so there are good reasons to think that part of the method Expektra is developing can be used for wind power forecasting.

Neural networks in general are highly flexible, and recent advancements in deep learning have been shown to outperform previous models in different domains [Schmidhuber 2015]. These networks are very good at automatically finding features that would take a lot of time and effort to engineer by hand. One network that shares similarities with deep learning, and has received less attention, is the Hierarchical Temporal Memory (HTM) Cortical Learning Algorithm (CLA) developed by Numenta [Numenta 2011]. This network is also built around the idea of hierarchical structures, creating a deep neural network. The HTM CLA is

2 http://www.enfor.eu
3 http://www.vitecsoftware.com
4 http://www.expektra.se


currently tailored very specifically to time-series problems and has, at the moment, little research published around it, making it an ideal candidate for time series prediction, with unknown potential on wind power forecasting problems.

Even though there are many prominent methods already developed for wind power forecasting, there is still room for improvement. Energy forecasting is such an important topic that competitions have been developed around it. The Global Energy Forecasting Competition (GEFCom) [Hong et al 2014] is a competition that can be used to help evaluate the performance of new forecasting models. It has attracted hundreds of contestants from all over the world, which has resulted in the contribution of many new and novel ideas. The GEFCom was created in order to:

1. Bring together state-of-the-art techniques for energy forecasting.

2. Bridge the gap between academic research and industry practice.

3. Promote analytical approaches in power energy education.

4. Prepare the industry to overcome forecasting challenges posed by the smart grid world.

5. Improve energy forecasting practices.

Benchmark datasets and competitions are a valuable resource when evaluating new models, as they provide a common ground to stand on. With the publication of Hong et al [2014], the dataset from GEFCom2012 was also published. This dataset consists of data from 7 different wind farms spanning a time period of three years, containing observational data of the energy production together with weather forecasts.

The GEFCom is divided into four tracks: load forecasting, price forecasting, wind power forecasting and solar power forecasting. The problem specified in the wind power forecasting track is built around the real-time operation of wind farms, and the dataset is structured accordingly. Participants try to predict hourly power generation up to 1-48 hours ahead, given meteorological forecasts and historically produced power.

1.1 Problem Formulation

The objective of this thesis is to adapt and analyse the robustness and suitability of Expektra's method when applied to the domain of forecasting short-term wind


power generation, i.e., what is needed to adapt this model to wind power forecasting problems?

The second objective of this thesis is to investigate the HTM CLA [Numenta 2011], a modern state-of-the-art computational theory of the neocortex, i.e., is it possible to use the Numenta Platform for Intelligent Computing (NuPIC) for wind power forecasting problems?

These questions will be addressed by evaluating the models against other models published in the area of short-term wind power forecasting, more specifically those that have been published in GEFCom.

1.2 The scope of the problem

1. This study will focus on short-term forecasting, i.e. forecasts made for 1-48 hours ahead. How to perform well on longer horizons is left to further investigation.

2. Wind power forecasting is closely related to wind speed forecasting, i.e. trying to use local information at the wind turbine instead of NWP data to forecast wind power. This thesis will not go into any specific details on how to forecast wind speeds; readers can take a look at Li and Shi [2010], Cadenas and Rivera [2009], Akinci [2015].

3. There are many neural networks worth studying, but this thesis will focus on Expektra's method and the HTM CLA inside NuPIC. NuPIC was picked over other deep learning methods because of its direct relation to time series prediction.


Chapter 2

Background

Wind Power Forecasting (WPF) has applications in generation and transmission maintenance planning and energy optimization, as well as in energy trading. WPF models exist at different scales, and they can be used to predict the production of anything from a single WT to a whole Wind Farm (WF). WPF models are generally divided into two main groups, physical and statistical, but hybrid state-of-the-art methods are also common [Giebel et al 2001, Lang et al 2006]. The physical approach [Landberg and Watson 1994, Gaertner et al 2003] focuses on integrating well-known physical aspects into the model, such as information about the surrounding terrain and properties of the WT. These models try to get as good an estimate of the local wind speed as possible before finally reducing the remaining error with some form of MOS. Statistical approaches [Rodrigues et al 2007, Gonzalez et al 2004] rely more on historical observations and their statistical relation to meteorological predictions, as well as on measurements from Supervisory Control And Data Acquisition (SCADA) systems.

SCADA is a control system used in most industrial processes and is the main tool to evaluate the state of individual power plants. As the name indicates, it is not a full control system, but rather a system for supervision. In the case of wind turbines, it allows remote access to online data gathered by sensors inside and around the plant. Variables accessible through these systems are those directly related to the wind flow, as well as the power produced by the plant, i.e. things like rotor wind speed, nacelle position, pitch angle, active power, etc.


Figure 2.1: A figure that presents the general outline when forecasting using the statistical approach.

Forecast models that use SCADA data as their primary input source usually have good forecast accuracy, at least for the first few hours, but they tend to be less useful for longer prediction horizons [Giebel et al 2011]. SCADA data can also be used to detect problems in WTs, something that can be helpful for improving the reliability of WTs [Yang et al 2013, Wang et al 2014].

Statistical models, seen in figure 2.1, are usually built around Numerical Weather Prediction (NWP) and SCADA data. NWP models are often used to build forecasting models, as they introduce weather forecasts for the region where the wind turbines are located. NWP data usually contains information about things like wind speed, wind direction, temperature and humidity. These models are operated two or four times a day by a number of large weather services. The main forecasts usually start at 00 and 12 UTC, corresponding to the worldwide radiosonde launches, which are the only direct observation of the atmospheric state and have been the backbone of atmospheric monitoring for many decades1; extra forecasts usually start at 06 and 18 UTC. Physical models, such as the one seen in figure 2.2, include additional information about the physical characteristics of the wind turbine and its surroundings, i.e. terrain data, information about obstacles, and the capacity and layout of the site where the turbine is located. Other useful information for physical models includes the theoretical

1 Some problems with this approach are that it yields less information over large oceans and poorer countries.


power curve, i.e. how much power is expected to be produced given a specific wind speed.

The time scales of WPF methods are generally divided into 3 main groups: very-short-term (up to 9 hours), short-term (up to 72 hours) and medium-term (up to 7 days) [Costa et al 2008], while the time step of these models ranges from seconds to days depending on the application. Very-short-term models used for wind power forecasting consist of statistical methods like Kalman Filters, Auto-Regressive Moving Average (ARMA), Auto-Regressive with Exogenous Input (ARX), Box-Jenkins, etc. Inputs to these models are historical observations of wind speed, wind direction, temperature, etc., and common applications for this forecast horizon include intraday market trading. Since these methods are based merely on past production, they are generally not useful for longer horizons.

Figure 2.2: A figure that presents the general steps when forecasting using a physical model. (Flow: SCADA, NWP and WFC data feed the physical model, which performs downscaling, transformation to hub height, spatial refinements and conversion to power via the WT power curve, followed by Model Output Statistics (MOS), to produce the wind power generation forecast.)

Short-term forecasting models include additional data about historical observations, usually obtained from SCADA systems, but more importantly weather forecasts from NWP models. Machine learning methods used in this area include Neural Networks [Kusiak et al 2009], Support Vector Machines [Fugon et al 2008], Nearest Neighbour Search [Jursa et al 2007], Random Forests, etc. Medium-term forecasting models usually incorporate all the methods used for shorter forecasts, as well as physical characteristics.


2.1 Neural Networks and Time Series Prediction

One of the most common ANNs is the so-called Multilayer Perceptron (MLP), a network built around some very simple properties of a cortical neuron. This network has been around since the 50's and has matured with a solid mathematical foundation. It has been applied successfully in many different applications. It is built around the idea of having multiple layers of neurons, each layer feeding information forward to the next layer, i.e. a feed-forward neural network.

The main advantage of using multiple layers instead of just a single one is to overcome the limitations pointed out by Minsky and Selfridge [1960] and Minsky and Seymour [1969], where a single-layer perceptron is only capable of learning linearly separable patterns and thus unable to learn all functions. The MLP is not limited in this way and has been shown to be able to represent a wide variety of functions given appropriate parameters [Hornik et al 1989].

NuPIC is a platform actively developed and maintained by Numenta. It introduces a collection of ideas and algorithms inspired by the structural organization of the neocortex. Two of the core concepts found inside NuPIC are the so-called Hierarchical Temporal Memory and the Cortical Learning Algorithm. HTM was introduced in Hawkins and Blakeslee [2007] and refers to a hierarchy of cortical regions in the brain.

The neocortex is classically divided into 6 different layers2. The current implementation in NuPIC is focused on emulating layers 3-4 (specifically layer 3), with extensions and research code being developed to include more layers.

A CLA region consists of a collection of columns, and each column consists of a handful of cells. This structure is based on the minicolumn hypothesis [Buxhoeveden and Casanova 2002]. Each column in the CLA region has its own semantic meaning, and the sparse activity of a handful of active columns tells us something about the input. The CLA is modelled so that specific cells within each column reflect the temporal context of a pattern, and a single cortical region (a CLA region) is trained with the CLA algorithm.

A typical CLA region consists of around 2K columns containing around 60K neurons in total, while a typical MLP may have fewer than 100 neurons. Each neuron in an HTM network grows new synapses over time, and it is not uncommon to have around 5K synapses per neuron, meaning we would have around 300M synapses in total. A single region of this size uses around 100 MB of memory, and it takes around 10 ms to do one inference and learning step3.

2 These layers should not be confused with the hierarchy of regions.
3 These values are given in a talk at the "Sensor-Motor Integration in the Neocortex" 2013 Hackathon.


The HTM/CLA also differs from the MLP in that it does not directly use scalar weights to represent synaptic connectivity. Synapses inside an HTM have binary weights with a scalar permanence: each synapse will either be connected or disconnected, while the MLP uses scalar weights. Connectedness in the HTM/CLA is based on a permanence value between 0.0 and 1.0. If the permanence is over a certain threshold we have a connected synapse, i.e. we have weights of either 1 or 0. This binary mechanism simulates synapses that are able to form and unform during learning. So, essentially, we have a weight-change network vs. a wiring-change network [Chklovskii et al 2004].
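The permanence mechanism can be illustrated with a small sketch. This is a toy illustration only; the threshold, learning increments and array sizes are assumptions, not NuPIC's actual parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Each potential synapse has a scalar permanence in [0, 1]; a synapse is
# "connected" (weight 1) only if its permanence exceeds a threshold,
# otherwise it contributes nothing (weight 0).
PERMANENCE_THRESHOLD = 0.2  # illustrative value

permanences = rng.uniform(0.0, 1.0, size=8)                  # per-synapse permanence
weights = (permanences >= PERMANENCE_THRESHOLD).astype(int)  # binary weights

inputs = rng.integers(0, 2, size=8)         # binary input vector
overlap = int(np.dot(weights, inputs))      # column "overlap" score

# Learning moves permanences, not weights: reinforce synapses whose input
# bit was active, weaken the rest (a wiring-change network).
LEARN_INC, LEARN_DEC = 0.05, 0.02           # illustrative values
permanences = np.clip(
    permanences + np.where(inputs == 1, LEARN_INC, -LEARN_DEC), 0.0, 1.0
)
```

Repeated exposure to a pattern pushes the relevant permanences over the threshold, effectively wiring new synapses in, while unused synapses eventually disconnect.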

NuPIC today is naturally geared towards time-series prediction, which makes it an interesting candidate for wind forecasting. NuPIC has been shown to outperform the MLP on financial time series [Gabrielsson et al 2013], and it has been used successfully to balance traffic [Sinkevicius et al 2011]. It has also been used on the NASA Aviation Dataset [Lee and Rajabi 2014], but with less promising results. MLPs have been used successfully for many kinds of time series problems [Azoff 1994, Niska et al 2004, Jain and Kumar 2007].


Chapter 3

Method and Materials

The purpose of this chapter is to explain the methods used in this thesis. It presents terms and concepts needed and used in the field of Machine Learning, with a focus on how to create forecasts based on time series, and it motivates why these methods are used as well as how they are used. This chapter also contains a description of the implemented artificial neural network, how it is structured and optimized. It provides details about the experimental setup and how these experiments relate to GEFCom, and finally a description of how the dataset is structured and processed.

3.1 Preliminaries

3.1.1 Remarks

The methodology used to evaluate the prediction models presented in this thesis is based on the protocol presented in Madsen et al [2005]1, a complete protocol that can be used to evaluate the performance of short-term WPF and Wind-to-Power (W2P) models. This protocol was chosen because it has successfully been used before as a guideline to evaluate a wide variety of forecast models, such as AWPPS, Prediktor, Previento, Sipreólico and WPPT. The protocol has been used for both on-shore and off-shore wind farms and was developed in the frame of the ANEMOS research project [Kariniotakis et al 2006], which brought together many relevant groups involved in the field. The aim of ANEMOS was to develop accurate and robust models that substantially outperform current state-of-the-art methods, and one of its goals was to establish a common set

1 It should be pointed out that no widely agreed standardization exists.


of performance measures that can be used to compare forecasts across systems and locations.

3.1.2 Definitions

Time series

A time series is a sequence X of observations x_t with a particular time stamp t, i.e. X = [x_1, x_2, \dots, x_t], or in short X_{1:t}; it is a time-dependent collection of variables.

Forecast horizon

A multi-step-ahead prediction is the assignment of predicting k forecasts \hat{X}_{t+1:t+k} given a collection of historical observations X_{t-p+1:t}. In the case of GEFCom we want a forecast for 1-48 steps ahead.

Point or spot forecast

In this thesis we model the forecast \hat{p}_{t+k|t} as a so-called point forecast (or spot forecast), i.e. a single value for each forecast (as opposed to a probability distribution).

Prediction error

The prediction error e for lead time t+k is defined in equation 3.1 as the difference between the actual and the forecast value, where t denotes the time index, k is the look-ahead time, p is the actual (measured, true) wind power and \hat{p} is the predicted wind power:

e_{t+k|t} = p_{t+k} - \hat{p}_{t+k|t}    (3.1)

The normalized prediction error \varepsilon is seen in equation 3.2:

\varepsilon_{t+k|t} = \frac{1}{p_{inst}} e_{t+k|t} = \frac{1}{p_{inst}} (p_{t+k} - \hat{p}_{t+k|t})    (3.2)

where p_{inst} is the installed capacity of the wind farm (in kW or MW), which is unknown in the competition.


3.1.3 Reference models

Persistence (also called a naïve or plain predictor), as seen in equation 3.3, is the model most commonly used for benchmarking WPF models. This simple model states that the future wind generation value will be the same as the last measured value:

\hat{p}^{\,persistence}_{t+k|t} = p_t    (3.3)

An alternative would be to use an even simpler model, a climatology prediction (equation 3.4), i.e. predicting the mean, a value that is approximated from the training set (see section 3.1.5):

\hat{p}^{\,mean}_{t+k|t} = \bar{p} = \frac{1}{N} \sum_{t=1}^{N} p_t    (3.4)

There are other reference models, like the one suggested by Nielsen et al [1998], which have advantages over persistence, but it was never widely adopted and is not used by GEFCom.
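The two reference models above can be sketched in a few lines. This is a minimal illustration with toy data; the function names and the normalized production values are assumptions.

```python
import numpy as np

def persistence_forecast(p_t, k):
    """Persistence (eq. 3.3): every horizon gets the last measured value."""
    return np.full(k, p_t)

def climatology_forecast(train_production, k):
    """Climatology (eq. 3.4): every horizon gets the training-set mean."""
    return np.full(k, np.mean(train_production))

train = np.array([0.3, 0.5, 0.4, 0.6])     # toy normalized production series
print(persistence_forecast(train[-1], 3))  # [0.6 0.6 0.6]
print(climatology_forecast(train, 3))      # [0.45 0.45 0.45]
```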

3.1.4 Error metrics

In order to understand why specific models perform well, it is usually a good idea to evaluate them against a wide variety of criteria, as is emphasised by Kariniotakis [1997]. This section describes the error measures used; throughout, N denotes the size of the test set.

Forecast Bias

The Normalized Forecast Bias (NBIAS) describes the systematic error and is defined in equation 3.5; it is estimated by calculating the average error for each step ahead. It gives an indication of the direction of the error:

NBIAS_k = \frac{1}{N} \sum_{t=1}^{N} \varepsilon_{t+k|t}    (3.5)

This bias is sometimes referred to by some authors as the Mean Forecast Error (MFE). A perfect MFE does not mean the forecast is perfect and contains no error, as positive and negative errors cancel each other out, but it gives an indication that the forecasts are on target.


Mean Absolute Error

The Normalized Mean Absolute Error (NMAE) looks at the average of the absolute error of the prediction and is defined in equation 3.6:

NMAE_k = \frac{1}{N} \sum_{t=1}^{N} |\varepsilon_{t+k|t}|    (3.6)

Another common name for the Mean Absolute Error (MAE) is Mean Absolute Deviation (MAD). This value shows the magnitude of the overall forecasting error and should thus be as small as possible. This error is scale dependent and will be affected by data transformations and the scale of measurements.

Mean Squared Error

The Normalized Mean Squared Error (NMSE) looks at the average of the squared errors \varepsilon^2_{t+k|t} and is built from the Normalized Sum of Squared Errors (NSSE) defined in equation 3.7:

NSSE_k = \sum_{t=1}^{N} \varepsilon^2_{t+k|t}    (3.7)

NMSE is defined in equation 3.8:

NMSE_k = \frac{1}{N} NSSE_k = \frac{1}{N} \sum_{t=1}^{N} \varepsilon^2_{t+k|t}    (3.8)

In this error, positive and negative errors do not cancel each other out, and large individual errors are penalized more harshly.

Root Mean Squared Error

The Normalized Root Mean Squared Error (NRMSE) is the square root of the NMSE and is defined in equation 3.9:

NRMSE_k = NMSE_k^{1/2} = \left( \frac{1}{N} \sum_{t=1}^{N} \varepsilon^2_{t+k|t} \right)^{1/2}    (3.9)

NRMSE is the main metric used in GEFCom, and it shares the same properties as NMSE.
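The four error measures reduce to a few lines of code. A sketch over a vector of normalized errors for a single horizon k; the function names and the toy error vector are assumptions.

```python
import numpy as np

def nbias(err):
    """Normalized forecast bias (eq. 3.5): mean error, sign indicates direction."""
    return np.mean(err)

def nmae(err):
    """Normalized mean absolute error (eq. 3.6)."""
    return np.mean(np.abs(err))

def nmse(err):
    """Normalized mean squared error (eq. 3.8)."""
    return np.mean(err ** 2)

def nrmse(err):
    """Normalized root mean squared error (eq. 3.9), the main GEFCom metric."""
    return np.sqrt(nmse(err))

# Toy normalized errors for one horizon: bias cancels, the squared
# measures penalize the larger errors more harshly.
eps = np.array([0.1, -0.1, 0.2, -0.2])
```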


3.1.5 Model selection

In regression and classification, one of the main issues we face is "How do we create a good model?" One way to define this "goodness" is to look at the model's generalization ability, i.e. its performance on unseen data.

To make sure the model we are building will generalize to new, unseen data, it is important how we select and build the model in the first place; what we have control over is the data we have at hand and how we use that data to build our model.

Training, testing and validating

In the case of supervised learning we have access to a target series and its associated features, i.e. the production series (our target), and wind speed and wind direction as features.

Common practice within Machine Learning, which is adhered to in this thesis, is to split the whole dataset into 3 smaller subsets: two of these subsets are used to find a good model (the training set and the validation set), and the remaining one (the test set) is used for evaluation, i.e. to estimate how accurate the created model would be on "unseen data". With highly flexible models like an artificial neural network, we need to be careful not to overfit the data, which is one of the reasons why we have the validation set: we do not want a model that generalizes poorly because it is fitted to every minor variation, i.e. has captured a lot of noise.
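The three-way split can be sketched as follows, assuming a random 60/20/20 partition of the sample indices (the split used later in this thesis); the seed and interface are assumptions.

```python
import numpy as np

def random_split(n, fracs=(0.6, 0.2, 0.2), seed=0):
    """Randomly partition indices 0..n-1 into train/validation/test sets.

    `fracs` gives the train/validation/test fractions; the remainder
    after the first two blocks becomes the test set.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train = int(n * fracs[0])
    n_val = int(n * fracs[1])
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train_idx, val_idx, test_idx = random_split(100)
print(len(train_idx), len(val_idx), len(test_idx))  # 60 20 20
```

The model is fitted on the training indices, the validation set guards against overfitting during model selection, and the test set is touched only once for the final accuracy estimate.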

3.1.6 Evaluation

The improvement with respect to a considered reference model ref is defined in equation 3.10:

I^{ref}_{EC,k} = 100 \cdot \frac{EC^{ref}_k - EC_k}{EC^{ref}_k} \; (\%)    (3.10)

where the Evaluation Criterion (EC) is any error measurement, such as NMSE or NMAE.
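Equation 3.10 in code form; the function name and the example error values are assumptions.

```python
def improvement(ec_ref, ec):
    """Improvement over a reference model (eq. 3.10), in percent.

    `ec_ref` and `ec` are any evaluation criterion (NRMSE, NMAE, ...)
    for the reference model and the candidate model respectively.
    """
    return 100.0 * (ec_ref - ec) / ec_ref

# A model with NRMSE 0.10 vs a persistence NRMSE of 0.20 halves the
# error, i.e. a 50% improvement over the reference.
gain = improvement(0.20, 0.10)
```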


Testing period

Date Time         Forecast
2011-01-01 01:00  1-48 hours
2011-01-04 13:00  1-48 hours
2011-01-08 01:00  1-48 hours
2011-01-11 13:00  1-48 hours
...
2012-06-23 01:00  1-48 hours
2012-06-26 13:00  1-48 hours

Table 3.1: The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, data with power observations is available for updating the models.

3.2 Experiments

Training and testing are structured based on the setup of the GEFCom. The dataset spans from midnight on the 1st of July 2009 to noon on the 26th of June 2012. The period from the 1st of July 2009 to the 1st of January 2011 at 01:00 is used for training and validation, while the rest is used for testing. In the testing range a number of 48-hour periods are defined (see table 3.1); in between these testing periods exists additional training data, which enables the models to be updated between forecasts.

The testing periods repeat every 7 days until the end of the dataset, and only meteorological forecasts that were relevant for the periods with missing power data are given; this was done in order to be consistent.

Meteorological forecasts in the GEFCom dataset consist of zonal u and meridional v components collected for surface winds at 10 m above ground level. They were extracted from the archive of the European Centre for Medium-range Weather Forecasts (ECMWF), a service which issues high-resolution deterministic forecasts twice a day, at 00 Coordinated Universal Time (UTC) and 12 UTC; each of these forecast periods consists of data for 1-48 hours ahead. In order to match the hourly resolution of the power measurements, the forecasts were interpolated to an hourly resolution using cubic splines. A summary of the features found in the dataset is given in table 3.2.
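The spline interpolation step can be sketched as below, assuming SciPy is available; the coarse 3-hourly step and the wind-speed values are toy numbers, not the actual ECMWF data.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# NWP wind-speed forecasts at a coarse temporal resolution (toy values);
# interpolate them to the hourly resolution of the power measurements.
forecast_hours = np.array([0, 3, 6, 9, 12])       # coarse forecast steps (h)
wind_speed = np.array([5.1, 6.0, 7.2, 6.5, 5.8])  # m/s at those steps

spline = CubicSpline(forecast_hours, wind_speed)
hourly = spline(np.arange(0, 13))                 # hourly values for 0..12 h
```

The spline passes exactly through the original forecast values and fills in the hours between them smoothly, which linear interpolation would not do.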


No.  Category  Parameter             Alias  Type
1    Date      Date                  date   String
2    Date      Year                  year   Integer
3    Date      Month                 month  Integer
4    Date      Day                   day    Integer
5    Date      Hour                  hours  Integer
6    Date      Week                  week   Integer
7    Forecast  Wind Speed            ws     Real
8    Forecast  Wind Direction (deg)  wd     Real
9    Forecast  Wind U                u      Real
10   Forecast  Wind V                v      Real
11   Forecast  Issued                hp     Integer
12   SCADA     Production            wp     Real

Table 3.2: Preprocessed features from the dataset. The Forecast category represents the forecast given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the feature we will use in training and testing.


The database of meteorological forecasts given by the GEFCom dataset contains sections of missing data, each corresponding to a date for which the missing power information exists; in a pre-processing step these sections were filled out with the best previous forecast available. For example, if we do not have data for the forecast issued at 2011-01-01 12:00, we use data from 2011-01-01 00:00, as a 48-hours-ahead forecast is available for that date. If the previous section also contained missing data, we would go back an additional section and use those forecasts. If no forecasts are available 48 hours back, we can use the best known forecast and extend it in the same fashion as the persistence model.
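The fallback logic can be sketched as follows. This is a hypothetical helper, not code from the thesis; the 12-hour issue cycle and the dictionary interface are assumptions.

```python
def best_available_forecast(issue_time, forecasts, step_hours=12, max_back=4):
    """Return the most recent issued forecast at or before `issue_time`.

    `forecasts` maps issue times (in hours) to forecast arrays. If the
    forecast issued at `issue_time` is missing, step back one issue
    cycle at a time, up to `max_back` cycles.
    """
    for back in range(max_back + 1):
        t = issue_time - back * step_hours
        if t in forecasts:
            return forecasts[t]
    return None  # caller falls back to a persistence-style extension

# Toy data: forecasts exist only for issue hours 0 and 24.
issued = {0: [5.0] * 48, 24: [6.0] * 48}
filled = best_available_forecast(36, issued)  # hour 36 missing -> uses hour 24
```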

Any predictions made by these models should fall within a certain range, so any forecast outside this range is clamped. This is the main post-processing step, i.e. there is an upper limit on what can be produced.

3.3 Holdback Input Randomization

The Holdback Input Randomization (HIPR) method, described in Kemp et al [2007], can be used to investigate the importance of the input parameters. It works by sequentially feeding each data point in the test set to the neural network while replacing the values of one input parameter at a time. The replacement values are uniformly distributed random values in the range the neural network was originally trained on, i.e. (-1, 1). An NRMSE score is calculated for each replacement. As a result, we get information about the relevance of each input.
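A sketch of the procedure, assuming `model` is any callable mapping an input matrix to predictions; the interface and toy data are assumptions, not the thesis implementation.

```python
import numpy as np

def hipr_scores(model, X, y, rng=np.random.default_rng(0)):
    """Holdback Input Randomization: replace one input column at a time
    with uniform noise in (-1, 1) and record the resulting NRMSE; a
    large increase marks an important input."""
    def nrmse(pred):
        return np.sqrt(np.mean((pred - y) ** 2))

    scores = []
    for j in range(X.shape[1]):
        X_rand = X.copy()
        X_rand[:, j] = rng.uniform(-1.0, 1.0, size=X.shape[0])
        scores.append(nrmse(model(X_rand)))
    return np.array(scores)

# Toy model that only uses column 0: randomizing column 0 hurts,
# randomizing column 1 changes nothing.
X = np.linspace(-1, 1, 50).reshape(-1, 1).repeat(2, axis=1)
scores = hipr_scores(lambda X: X[:, 0], X, X[:, 0])
```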

3.4 Optimization methods

The two main optimization algorithms used in this thesis are the Particle Swarm Optimization (PSO) algorithm presented by Eberhart and Kennedy [1995] and the Levenberg-Marquardt (LM)2 algorithm, independently developed by Kenneth Levenberg and Donald Marquardt [Marquardt 1963, Levenberg 1944]. The PSO algorithm is a population-based stochastic algorithm, similar to a Genetic Algorithm (GA) but based on social-psychological principles instead of evolution. It

2 The LM algorithm was used in the beginning of the project, but the reported results ended up using the PSO algorithm. The LM algorithm is included in this section because speed optimization was measured using it.


can be summarized by the following steps. Networks were trained multiple times in order to avoid local minima.

• Step 1: Initialize particles with random velocities and accelerations.

• Step 2: Determine which particle is closest to the goal.

• Step 3: Adjust accelerations toward that particle.

• Step 4: Update particle positions based on their velocities, and update velocities based on accelerations.

• Step 5: Go to step 2.
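The steps above can be sketched as a minimal global-best PSO. This is an illustrative implementation; the inertia and acceleration coefficients are common textbook defaults, not the settings used in the thesis.

```python
import numpy as np

def pso(loss, dim, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `loss` over R^dim with a global-best particle swarm.

    Velocities are nudged toward each particle's personal best and the
    swarm's global best; positions are then updated by the velocities.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1, 1, (n_particles, dim))       # positions
    v = rng.uniform(-0.1, 0.1, (n_particles, dim))   # velocities
    pbest = x.copy()                                 # personal bests
    pbest_f = np.apply_along_axis(loss, 1, x)
    g = pbest[np.argmin(pbest_f)].copy()             # global best

    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = x + v
        f = np.apply_along_axis(loss, 1, x)
        better = f < pbest_f
        pbest[better], pbest_f[better] = x[better], f[better]
        g = pbest[np.argmin(pbest_f)].copy()
    return g, loss(g)

# Sanity check on the sphere function: the swarm should converge near 0.
best, best_f = pso(lambda p: np.sum(p ** 2), dim=3)
```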

The LM algorithm is a combination of the Error Back-Propagation (EBP) method [Rumelhart et al 1988] and the Gauss-Newton method. This algorithm has the speed advantage of Gauss-Newton and the stability of EBP [Yu and Wilamowski 2011]. A detailed treatment of LM can be found in Moré [1978], and of PSO in Poli et al [2007].

3.5 Neural Networks

3.5.1 Multilayer Perceptron

The perceptron is built around a nonlinear model of a neuron, the McCulloch-Pitts model [McCulloch and Pitts 1943]. It basically consists of a 2-step process, where the cell body contains a summation function of the weighted sum of all inputs, including a bias. The perceptron is described by equation 3.11. The sum s is passed through an activation function (see section 3.5.1), which mimics the activation, or firing, of the neuron:

s = \sum_{i=1}^{M} w_i x_i + x_0    (3.11)

where w_i is the weight of the "synapse" of the input channel and is the parameter we want to adjust, x_i is the input value, x_0 is the bias, and M denotes the number of inputs.


Figure 3.1: The perceptron. Input signals are weighted, summed together with a bias, and passed through the activation function f(s) to produce the output signal.

The structure of the MLP consists of many perceptrons, and it is shown in figure 3.2. The input signal flows from the input layer at the bottom to the output layer at the top. We have a bias signal, seen on the left side of the diagram, which is set to a fixed number.

MLPs are fully connected networks, meaning that the neurons in any layer of the network are connected to all the neurons in the previous layer. Each connection in the network has a weight w_ij associated with it. The initialization of these weights is done before any training, by randomly assigning very small values to the respective weights in the graph. The architecture used in this study has only one output variable, i.e. the function we approximate has a single output that produces forecasts of the power generation given a certain input.


Figure 3.2: Architectural graph of the neural network, which produces a single output value. It consists of a collection of hidden neurons in each of H hidden layers, as well as M input connections (here: hours, u, v, week, ws and time-shifted ws values); the hidden layers use tanh activations and the output neuron is linear. Each edge in this graph has a weight w_ij associated with it.

The performance of neural networks is generally improved if the data is normalised, since using the original data directly can cause convergence problems. Normalization is done using the mapminmax function seen in equation 3.12:

y = \frac{(y_{max} - y_{min}) \cdot (x - x_{min})}{x_{max} - x_{min}} + y_{min}    (3.12)

where y_{max} is the maximum of the specified range, in this case 1, and y_{min} is -1; x is the value to be scaled, x_{max} is the maximum of the values to be scaled and x_{min} their minimum.
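Equation 3.12 in code form, mapping a feature vector into [-1, 1]; the function mirrors MATLAB-style `mapminmax`, and the example values are toy numbers.

```python
import numpy as np

def mapminmax(x, ymin=-1.0, ymax=1.0):
    """Scale x linearly into [ymin, ymax] (eq. 3.12)."""
    x = np.asarray(x, dtype=float)
    return (ymax - ymin) * (x - x.min()) / (x.max() - x.min()) + ymin

print(mapminmax([0.0, 5.0, 10.0]))  # [-1.  0.  1.]
```

Note that the same x_min and x_max fitted on the training data must be reused when scaling validation and test data, otherwise information leaks between the sets.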

The forecasting model consists of 7 different networks, one for each wind farm; these networks are trained on the first section of the dataset, before the test period. A random 60/20/20 split is created, where 60% of the available data is used for training, 20% is used to validate the network, and 20% is set aside as a hold-out set for the hyperparameters. The input features3 fed into the models are

3 See table 3.2.


CHAPTER 3 METHOD AND MATERIALS

ws, u, v, hours, ws, ws+1, ws+2, ws-1, ws-2, where ws+x and ws-x denote a time shift of x. We can use ws+x as input into the model because ws is a forecast in itself.
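A sketch of how such time-shifted features could be assembled from the wind-speed forecast series (the clamped boundary handling is an assumption for illustration, not taken from the thesis):

```python
def lagged_features(ws, shifts=(-2, -1, 0, 1, 2)):
    """Build time-shifted copies ws_{t+s} of a forecast series.

    Because ws is itself a forecast, future shifts (s > 0) are available
    at prediction time; edges without a neighbour reuse the boundary value.
    """
    n = len(ws)
    return [[ws[min(max(t + s, 0), n - 1)] for s in shifts] for t in range(n)]

feats = lagged_features([1.0, 2.0, 3.0, 4.0])
print(feats[1])  # -> [1.0, 1.0, 2.0, 3.0, 4.0]
```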

Activation Functions

The computation done for each neuron in the multilayer perceptron requires knowledge of the derivative of the activation function; in other words, the activation function we pick needs to be continuous. In this thesis we use the following two activation functions: the hyperbolic tangent function, seen in equation 3.13, and

f(s) = tanh(s) = (e^s - e^(-s)) / (e^s + e^(-s))    (3.13)

the linear transfer function, seen in equation 3.14.

f(s) = { +1  if s ≥ 1
       {  s  if -1 < s < 1    (3.14)
       { -1  if s ≤ -1
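The two transfer functions can be sketched directly from equations 3.13 and 3.14:

```python
import math

def tanh_act(s):
    """Hyperbolic tangent activation (equation 3.13)."""
    return (math.exp(s) - math.exp(-s)) / (math.exp(s) + math.exp(-s))

def satlin(s):
    """Saturating linear transfer function (equation 3.14)."""
    if s >= 1.0:
        return 1.0
    if s <= -1.0:
        return -1.0
    return s

print(abs(tanh_act(0.5) - math.tanh(0.5)) < 1e-12)  # -> True
print(satlin(0.3), satlin(2.0), satlin(-5.0))       # -> 0.3 1.0 -1.0
```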

Hyperparameter optimization

In order to obtain the hyperparameters necessary for each wind farm's model, a random hyperparameter search was performed for all models. A hold-out validation set was used to pick the best hyperparameters. Random search has been shown to work better than grid search when not all parameters are equally important [Bergstra and Bengio, 2012].
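A minimal sketch of such a random search over a hold-out set (the train/validate callables and the search space are illustrative placeholders, not the thesis implementation):

```python
import random

def random_search(train, validate, space, n_trials=20, seed=0):
    """Sample random hyperparameter combinations and keep the best one,
    judged by error on a hold-out validation set."""
    rng = random.Random(seed)
    best_params, best_err = None, float("inf")
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in space.items()}
        err = validate(train(params))
        if err < best_err:
            best_params, best_err = params, err
    return best_params, best_err

# Toy objective whose validation error is minimised at hidden=20, lr=0.01.
space = {"hidden": [5, 10, 20, 25], "lr": [0.1, 0.01, 0.001]}
train = lambda p: p  # stand-in for fitting an MLP with these settings
validate = lambda m: abs(m["hidden"] - 20) + 100 * abs(m["lr"] - 0.01)
params, err = random_search(train, validate, space)
print(params["hidden"] in space["hidden"], err >= 0)  # -> True True
```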

3.5.2 Numenta Platform for Intelligent Computing

This section describes the key principles introduced with NuPIC. It explains in general terms the theory behind the Online Prediction Framework (OPF)4, which uses the CLA and HTM algorithms; the OPF works as an API to create predictive models. The OPF consists of 5 major types of components: Encoders, Spatial Pooler (SP), Temporal Memory (TM), Temporal Pooler (TP) and Classifiers. Together these components construct a single CLA region5, and figure 3.3 demonstrates the information flow through a single region.

4 The OPF is used with Numenta's commercial product Grok.
5 Currently, models created with the OPF do not use a TP, nor does this client allow creation of a hierarchy of regions, i.e. what we would call an HTM; it is only possible to create one region.


Another central concept in NuPIC is the Sparse Distributed Representation (SDR), which refers to the activation of a small percentage of the neurons at any given time. This neuronal activity is represented as an n-dimensional sparse binary vector x = [b0, ..., bn] with around 2% active cells. The outputs from the spatial pooler and the temporal memory are both SDRs, while the output from the encoder does not enforce this and is just a normal binary vector. A general overview of the properties of SDRs has been given by Ahmad and Hawkins [2015].

Encoder -> Spatial Pooler -> Temporal Memory -> Classifier

Figure 3.3: Information flow of a single-region predictive model created with the OPF.

The overlap between two different SDRs is defined by

o(x, y) = x · y    (3.15)

A match between two SDRs is defined by

m(x, y) = o(x, y) ≥ θ    (3.16)

where θ is set so that θ ≤ ||x||_1 and θ ≤ ||y||_1. An interesting property of SDRs, one that is used multiple times inside the temporal memory and especially for predictions, is the fact that a set of fixed-size SDRs can be reliably stored by taking their union as a single pattern. The boolean OR operator is used to create a



Scalar   Encoding
1        11111000000000
2        01111100000000
10       00000000011111

Table 3.3: Example, where n = 14, r = 5, ψ = 1, of various scalar values encoded using a ScalarEncoder.

new vector from a set of SDRs. The downside of storing patterns this way is that the more patterns we store, the bigger the probability of false positives becomes.
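Equations 3.15 and 3.16 and the union storage can be sketched on plain binary lists:

```python
def overlap(x, y):
    """o(x, y) = x · y for binary vectors (equation 3.15)."""
    return sum(a & b for a, b in zip(x, y))

def match(x, y, theta):
    """m(x, y): true when the overlap reaches the threshold (equation 3.16)."""
    return overlap(x, y) >= theta

def union(sdrs):
    """Store a set of SDRs as a single pattern with the boolean OR."""
    out = [0] * len(sdrs[0])
    for v in sdrs:
        out = [a | b for a, b in zip(out, v)]
    return out

a = [1, 1, 0, 0, 1, 0]
b = [1, 0, 0, 0, 1, 1]
print(overlap(a, b))         # -> 2
print(match(a, b, theta=2))  # -> True
u = union([a, b])
print(match(u, a, theta=3))  # a is still recognised in the union -> True
```

The more vectors that are OR-ed into the union, the more bits are set, and the more likely a match becomes a false positive.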

Encoders

NuPIC contains many different encoders6. The job of an encoder is to convert raw input into a more suitable representation (i.e. a binary vector). The raw input is fed into the model using a dictionary data structure.

Useful encoders include scalar and categorical encoders, while more exotic encoders include one for Global Positioning System (GPS) coordinates; this encoder can be used to extract information about anomalous movements. The entries of the dictionary representation of the raw input are each encoded separately and then concatenated using a multi-encoder.

One property of the spatial pooler is that overlapping input patterns are mapped to the same SDR. This means that we want the encoder to encode input so that similar inputs share bits. The ScalarEncoder fulfils this property by the process illustrated in table 3.3.

We calculate the range of values to encode using equation 3.17, where vmin represents the minimum value of the input signal and vmax denotes its upper bound.

v_range = v_max - v_min    (3.17)

There are three mutually exclusive parameters that determine the overall size of the output from a scalar encoder: (n, r, ψ). n directly represents the total number

6 A full list of all encoders can be found in the API documentation for NuPIC.


of bits in the output, and it must be bigger than or equal to w, which represents the number of bits that are set to encode a single value, i.e. the "width" of the output signal7. r and ψ are specified with respect to the input, while w is specified with respect to the output. Two inputs separated by more than the radius have non-overlapping representations. Two inputs separated by less than the radius will in general overlap in at least some of their bits. Two inputs separated by greater than or equal to the resolution are guaranteed to have different representations. ψ can be calculated using equation 3.18.

ψ = r / w    (3.18)

Depending on whether we want periodic behaviour or not, n is calculated a bit differently; see the scalar encoder implementation for details8.
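A simplified, non-periodic sketch of this encoding, reproducing the values in table 3.3 (the real ScalarEncoder additionally handles periodic ranges and parameter validation):

```python
def scalar_encode(value, v_min, n=14, w=5, resolution=1.0):
    """Encode a scalar as n bits with a contiguous run of w active bits.

    Values closer together than `resolution` map to overlapping runs, so
    similar inputs share bits (compare table 3.3: n = 14, w = 5, ψ = 1).
    """
    i = int((value - v_min) / resolution)  # index of the first active bit
    i = max(0, min(i, n - w))              # clamp the run inside the vector
    return [1 if i <= j < i + w else 0 for j in range(n)]

def show(bits):
    return "".join(str(b) for b in bits)

print(show(scalar_encode(1, v_min=1)))   # -> 11111000000000
print(show(scalar_encode(2, v_min=1)))   # -> 01111100000000
print(show(scalar_encode(10, v_min=1)))  # -> 00000000011111
```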

Spatial Pooler

The spatial pooler receives a binary vector as its input from an encoder and outputs an SDR. The structure of the spatial pooler consists of an input space and a set of columns; the output SDR represents which of the columns in a region are active. Each column has synapses connected to the input space. The SP consists of around 50% randomly and potentially connected synapses; this is called the "potential pool". Each synapse will connect and disconnect with the input space during learning.

The general information flow of the spatial pooler consists of the following steps. Input stimuli from lower regions lead to activation of the input space, to which each mini-column is synaptically connected; this activation, the so-called "overlap score", is computed for each column. The score is calculated as the total sum over the neurons that try to influence that column, weighted with a "boosting factor" that increases the chance for certain columns to win more easily. A final top percentage (usually around 2% activity) of the columns with the highest influence (biggest overlap score) is chosen; columns also inhibit nearby columns, and the result of this process is a binary vector with few active bits, an SDR. This is illustrated in equation 3.19, where b can either be 0 or 1 and s is a value representing the score.

7 w must be odd to avoid centering problems.
8 https://github.com/numenta/nupic


[b1 b2 b3 ... bn] · [b11 b12 b13 ... b1n; b21 b22 b23 ... b2n; ...; bd1 bd2 bd3 ... bdn] = [s1 s2 s3 ... sn] --inhibition--> [b1 b2 b3 ... bn]    (3.19)
   input vector              connected synapses for each column                              overlap score                     output SDR

Learning in this structure is done by adjusting the permanences of the proximal dendrites to better match the input: for each winning column, increase the permanence of the synapses that correctly matched the input and decrease the permanence of the rest. We also increase the boosting factor of losing columns to give them a bigger chance of winning next time.
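The winner selection and permanence update can be sketched on a toy region as follows (real spatial poolers also track permanence thresholds, potential pools and boosting updates, which are omitted here):

```python
def spatial_pool(input_vec, synapses, boost, k=2):
    """One simplified spatial pooler step (compare equation 3.19): boosted
    overlap scores per column, then global inhibition keeps the top k."""
    scores = [boost[c] * sum(a & b for a, b in zip(input_vec, row))
              for c, row in enumerate(synapses)]
    winners = sorted(range(len(scores)), key=lambda c: -scores[c])[:k]
    return [1 if c in winners else 0 for c in range(len(synapses))]

def reinforce(input_vec, perm, active_cols, inc=0.25, dec=0.25):
    """Update permanences of winning columns: strengthen synapses whose
    input bit was active, weaken the others (clipped to [0, 1])."""
    for c, on in enumerate(active_cols):
        if on:
            for j, bit in enumerate(input_vec):
                delta = inc if bit else -dec
                perm[c][j] = min(1.0, max(0.0, perm[c][j] + delta))

x = [1, 1, 0, 0]
cols = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0]]
active = spatial_pool(x, cols, boost=[1.0, 1.0, 1.0])
print(active)  # -> [1, 0, 1]

perm = [[0.5] * 4 for _ in cols]
reinforce(x, perm, active)
print(perm[0])  # -> [0.75, 0.75, 0.25, 0.25]
```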

Temporal Memory

The temporal memory receives an SDR from the spatial pooler, which represents the active columns; the output of the temporal memory is the activity of the whole CLA region. The general idea of the TM9 is that the cells in each mini-column provide temporal context to the pattern identified by the spatial pooler. The TM's job is to form transitions between different SDRs, and it achieves this by allowing active cells to form connections to previously active cells, so that in a future setting each cell is able to predict its own activity.

Each cell in a region has several distal segments, ideally one for each pattern the cell has transitioned from; in practice these segments are each connected to a couple of patterns. A cell enters a predictive state if there is enough activity on a segment, and every segment consists of a collection of connections, synapses, that have been formed to a subset of previously active cells (typically around 10-15 cells).

The temporal memory receives a sparse binary vector from the spatial pooler, and the general information flow for this part of the CLA goes through two phases. We have the following steps in the first phase: 1) for each active column, we check to see if there is any cell in a predictive

9 There are a lot of details in how the temporal memory is implemented; pseudo-code and more details can be found in [Numenta, 2011], and the nupic git repository is the best source for finer details: https://github.com/numenta/nupic


state; if there is, then 2) activate that particular cell. If no cell was found in a predictive state, then 3) activate all cells in that particular column, a process called bursting.

Bursting is analogous to saying: we do not know which cell to activate, because we have not seen this instance of the sequence of patterns before and are unable to put the column into the correct temporal context, so let us activate all cells to reflect this uncertainty. Bursting does not occur when the temporal context has been predicted by the CLA. Equation 3.20 shows the first phase.

[b1 b2 b3 ... bn] ∘ [b11 b12 b13 ... b1n; b21 b22 b23 ... b2n; ...; bd1 bd2 bd3 ... bdn] = [b11 b12 b13 ... b1n; b21 b22 b23 ... b2n; ...; bd1 bd2 bd3 ... bdn]    (3.20)
      SP SDR                         predictive state                                                            active state
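Phase 1, including bursting, can be sketched as follows (the region size is illustrative):

```python
def phase1_activate(active_columns, predictive):
    """Phase 1 of the temporal memory (compare equation 3.20).

    `predictive[c]` lists the cells of column c that were predicted; a
    column with no predicted cell bursts, i.e. activates all of its cells.
    """
    cells_per_column = 4  # illustrative region size
    active = set()
    for c in active_columns:
        predicted = predictive.get(c, [])
        if predicted:
            active.update((c, i) for i in predicted)  # temporal context known
        else:
            active.update((c, i) for i in range(cells_per_column))  # burst
    return active

# Column 0 was predicted (cell 2); column 1 was not, and therefore bursts.
print(sorted(phase1_activate([0, 1], {0: [2]})))
# -> [(0, 2), (1, 0), (1, 1), (1, 2), (1, 3)]
```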

The second phase of the algorithm figures out which cells should be turned into a predictive state for the next time step. This is achieved by checking every distal segment on every cell for activity above a certain threshold, i.e. checking for active segments. One active segment is enough to put a cell into a predictive state. The output from the temporal memory is a vector representing the state of all cells in that region. Equation 3.21 shows phase 2.

[b1 b2 b3 ... bn] · [b11 b12 b13 ... b1n; b21 b22 b23 ... b2n; ...; bd1 bd2 bd3 ... bdn]_X = [b1 b2 b3 ... bn]_X > τ  ->  [s1 s2 s3 ... sn]    (3.21)
   active state                           segment X                                           segment activation X         predictive state

If learning is turned on, the permanences of the synapses connected to active distal segments are updated (in the same way as in the spatial pooler). These changes are marked as temporary until we are sure that the cell is in fact able to correctly predict something with them; once this is known, the changes are either made permanent or removed. Temporarily marked cells are updated whenever a cell goes from being inactive to active from feed-forward input (make the update permanent, as we correctly predicted the feed-forward activation). If a cell instead went from active to inactive, undo the change.


NuPIC Classifiers

There are many different classifiers included with NuPIC, such as a k-Nearest Neighbour (kNN) classifier and a Support Vector Machine (SVM); the OPF in particular uses a custom-built classifier called the "CLAClassifier". The purpose of this classifier is to decode predictions made by the CLA. Classification with the CLAClassifier is performed in different ways depending on how the input has been encoded.

Scalar values are decoded using a process where each cell in a CLA region is paired with two histograms. One histogram keeps track of the frequency of encountered patterns associated with each cell; the other keeps track of a moving average for each bucket. This process is illustrated in figure 3.4.

[Figure 3.4: an SDR feeds columns 1...N; each cell keeps 2 histograms, one for likelihood and one for a moving average, spanning the values between the min and max value.]

Figure 3.4: The CLAClassifier.
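A toy decoder in the spirit of this scheme, using one exponential moving-average vote per cell and bucket (a deliberate simplification; the real CLAClassifier differs in its details):

```python
class BucketVoter:
    """Each active cell keeps a moving-average vote per value bucket;
    the prediction is the bucket with the highest summed vote."""

    def __init__(self, n_cells, n_buckets, alpha=0.3):
        self.alpha = alpha
        self.votes = [[0.0] * n_buckets for _ in range(n_cells)]

    def learn(self, active_cells, bucket):
        # Move each active cell's votes towards a one-hot target at `bucket`.
        for c in active_cells:
            for b in range(len(self.votes[c])):
                target = 1.0 if b == bucket else 0.0
                self.votes[c][b] += self.alpha * (target - self.votes[c][b])

    def predict(self, active_cells):
        n_buckets = len(self.votes[0])
        totals = [sum(self.votes[c][b] for c in active_cells)
                  for b in range(n_buckets)]
        return max(range(n_buckets), key=totals.__getitem__)

clf = BucketVoter(n_cells=6, n_buckets=3)
for _ in range(10):
    clf.learn([0, 1], bucket=2)  # pattern {0, 1} co-occurs with bucket 2
    clf.learn([3, 4], bucket=0)
print(clf.predict([0, 1]))  # -> 2
print(clf.predict([3, 4]))  # -> 0
```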

Training in NuPIC

Training of the NuPIC models is done using online learning algorithms. The schema seen in figure 3.5 is used to train and test these models. We have 7 different wind farms, so this schema is repeated for each wind farm.

This schema is slightly adjusted from the traditional way the OPF handles multi-step predictions. The default way of doing this is to use 48 different models, one for


every step ahead in the CLAClassifier, which would give us 48 · 7 = 336 models in total (for all the wind farms). Not only does this change match Expektra's approach, but it also reduces the memory requirements for the CLAClassifier.

[Figure 3.5: hyperparameter setup is done on a pre-training data chunk, via PSO swarming or a manual setup; the OPF model is then trained on the training data stream with online learning activated, and evaluated on the testing data with online learning deactivated, producing multi-step predictions.]

Figure 3.5: Training an OPF model.

Input and hyperparameter selection

Finding hyperparameters for an OPF model was partly done using the custom built-in PSO algorithm and partly configured manually, mainly by ensuring that the PSO algorithm did not remove encoders. A single CLA region contains a lot of hyperparameters; these parameters have been included, with corresponding descriptions, in appendix A. Inputs to the model are date, ws, wp, u, and v.


Chapter 4

Result

This chapter presents the primary results obtained in this study. Figures 4.4-4.10 contain different error measurements on the test set for each wind farm. We see that Expektra's ANN model performs well over all wind farms, while the NuPIC model performs worse than Expektra but in general still better than our reference model. NuPIC is off target, with a bias error on most wind farms. The graphs also indicate that NuPIC is unable to pick up some trends, as seen in the cumulated ε² graph. Expektra's model, on the other hand, shows no clear problems in cumulated ε² and is on target on all wind farms; appendix C has been included to reflect this at different lead times.


Figure 4.1: Left diagram: cumulative probability of wind speed. Right diagram: scatter diagram of the power curve, normalized power output vs wind speed and wind direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


CHAPTER 4 RESULT

Figure 4.2: Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).

Figure 4.3: Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).

[Four panels: NBIAS, NRMSE and NMAE vs look-ahead time k (in hours), and cumulated ε² over time (in hours), each comparing Expektra, NuPIC and Persistence.]

Figure 4.4: Different error measurements for WF 1.

[Same four panels as figure 4.4, for Wind Farm 2.]

Figure 4.5: Different error measurements for WF 2.

[Same four panels as figure 4.4, for Wind Farm 3.]

Figure 4.6: Different error measurements for WF 3.

[Same four panels as figure 4.4, for Wind Farm 4.]

Figure 4.7: Different error measurements for WF 4.

[Same four panels as figure 4.4, for Wind Farm 5.]

Figure 4.8: Different error measurements for WF 5.

[Same four panels as figure 4.4, for Wind Farm 6.]

Figure 4.9: Different error measurements for WF 6.

[Same four panels as figure 4.4, for Wind Farm 7.]

Figure 4.10: Different error measurements for WF 7.


                        Wind Farm
User           1      2      3      4      5      6      7      All
Leustagos      0.145  0.138  0.168  0.144  0.158  0.133  0.140  0.146
DuckTile       0.143  0.145  0.172  0.145  0.165  0.137  0.146  0.148
MZ             0.141  0.151  0.174  0.145  0.167  0.141  0.145  0.149
Propeller      0.144  0.153  0.177  0.147  0.175  0.141  0.147  0.152
Duehee Lee     0.157  0.144  0.176  0.160  0.169  0.154  0.148  0.155
Expektra       0.165  0.158  0.184  0.164  0.179  0.153  0.153  0.165
MTU EE5260     0.161  0.172  0.193  0.162  0.192  0.156  0.160  0.168
SunWind        0.174  0.177  0.193  0.176  0.179  0.157  0.162  0.172
ymzsmsd        0.163  0.186  0.200  0.164  0.192  0.162  0.167  0.174
4138 Kalchas   0.180  0.179  0.197  0.175  0.200  0.160  0.165  0.177
NuPIC          0.243  0.254  0.264  0.310  0.290  0.224  0.240  0.264
Persistence    0.302  0.338  0.373  0.364  0.388  0.341  0.361  0.355

Table 4.1: NRMSE scores of the entries published in [Hong et al., 2014]. The NuPIC and Expektra models are added so we can easily compare the results.
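Assuming NRMSE here is the RMSE of capacity-normalized power (the power values in the dataset are already normalized; this definition is an assumption), the score can be computed as:

```python
import math

def nrmse(predicted, observed):
    """RMSE of capacity-normalised power (assumed NRMSE definition)."""
    se = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(se) / len(se))

print(round(nrmse([0.30, 0.50, 0.70], [0.20, 0.50, 0.60]), 4))  # -> 0.0816
```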

4.1 Experimental results

Looking at the primary results presented in this study, we see in table 4.1 that Expektra's ANN model is able to achieve performance comparable to other methods published in Hong et al. [2014]. This can be used as a starting point for further investigations. A similar approach to Expektra's is Duehee Lee's [Lee and Baldick, 2014], as both methods are based on the same neural network architecture but with different optimization techniques. Duehee Lee's network is structured slightly differently from Expektra's model, which only has one output node, whereas Duehee Lee uses five (representing five predictions ahead). Besides these differences, Duehee Lee also includes additional sub-models to create a prediction ensemble, blending the output of the ANN with a Gaussian Process (GP) to improve very short-term predictions. MTU EE5260 is also a neural network approach that converts meteorological forecasts to power, and here the score is very close. The HTM CLA beats the persistence model but needs more work to outperform the other models in GEFCom.


4.2 Input Importance

For interpretation purposes, in order to understand the model better, an analysis of the relative input parameter importance was performed. This analysis is illustrated in figure 4.11. Each box represents added noise to that channel, which results in a higher NRMSE score if that feature was important. A reference point, "all-channels", represents the error distribution of the model with no input replacement. We clearly see in this figure that the wind speed channel ws is the most important attribute: adding noise to this channel greatly affects the NRMSE score. We also see that the wind components u, v show little to no influence, while the timestamp-related inputs hours and week both indicate that there are seasonal and daily trends present in the dataset.


Figure 4.11: Relative input parameter importance using HIPR. "all-channels" reflects the reference point, i.e. the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is the directional vector of the wind.
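This kind of analysis can be sketched as noise perturbation of one channel at a time (the names and the error function below are illustrative placeholders, not the thesis implementation):

```python
import random

def input_importance(model_error, X, channels, noise=1.0, seed=0):
    """Estimate channel importance by perturbing one input at a time.

    `model_error(X)` returns the model's error on dataset X (rows of
    channel values); a large error increase after adding Gaussian noise
    to a channel marks it as important.
    """
    rng = random.Random(seed)
    baseline = model_error(X)
    scores = {}
    for c in channels:
        Xn = [row[:] for row in X]  # copy so other channels stay untouched
        for row in Xn:
            row[c] += rng.gauss(0.0, noise)
        scores[c] = model_error(Xn) - baseline
    return scores

# Toy model that only uses channel 0, so only channel 0 should matter.
data = [[1.0, 5.0], [2.0, 6.0], [3.0, 7.0]]
err = lambda X: sum(abs(row[0] - t) for row, t in zip(X, [1.0, 2.0, 3.0]))
scores = input_importance(err, data, channels=[0, 1])
print(scores[0] > scores[1])  # -> True
```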


4.2.1 Adaptation and Optimization

All training for Expektra's model was performed on a MacBook Pro with an Intel Core 2 Duo processor (P8600, 2.40 GHz) and 4 GB of installed RAM, running a 64-bit operating system. After adapting the code to run experiments on the GEFCom dataset, an investigation was performed to evaluate the performance of the core source code provided by Expektra. This investigation identified some slow parts of the code; after fixing these issues, mainly by using a MathNET native library instead of the managed provider, a speed test was performed comparing the unoptimized version with the optimized one. Figure 4.12 shows the speed-up achieved during training with the LM algorithm.


Figure 4.12: Plot showing the unoptimized version vs the optimized one when training a network with 10 input neurons, 10/15/20/25 hidden neurons, and 1 output neuron. Training was done for 100 epochs using LM.

4.3 Summary

A plot showing the average NRMSE improvement over the persistence model can be seen in figure 4.13. We see that both models are able to perform better than persistence. Expektra's model performs better than the NuPIC model over the whole forecast


horizon, and NuPIC performs better than persistence towards the end of the forecast.


Figure 4.13: Summarized average improvement over all wind farms, with 95% confidence intervals.


Chapter 5

Discussion

With the exponential growth of computer power it becomes easier to study deeper and more complex networks, and with recent advances in the area of deep learning, interest in ANNs has resurged. In practice, ANNs have been used by most groups in the field of WPF, but these networks never caught on, as it was argued that the improvements made by ANNs were usually not enough to justify the extra effort of training them [Giebel et al., 2011]. This is steadily becoming less of an issue as computation becomes cheaper and bigger networks perform better.

By using a simple MLP network, we observe that we are able to obtain results similar in performance to other models published in Hong et al. [2014]. This can be used as a starting point for further investigations. The GEFCom dataset has a limited number of input features; if we had access to a wider range of them, the power of neural networks could be studied more in depth. Given the number of features in the dataset, this is harder to do.

Using persistence as a reference model was done in order to have the same baseline as the one used in GEFCom, but a better reference model should be considered in the future, such as the one presented in Nielsen et al. [1998], especially if longer forecasts are to be considered.
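For reference, the persistence model simply repeats the last observed production for every horizon, and the improvement score of figure 4.13 compares a model's error against it (a sketch; names are illustrative):

```python
def persistence_forecast(last_observed, horizon):
    """Reference model: the last known production is the forecast for
    every look-ahead step k = 1..horizon."""
    return [last_observed] * horizon

def improvement(ref_error, model_error):
    """Percentage error improvement over the reference model."""
    return 100.0 * (ref_error - model_error) / ref_error

print(persistence_forecast(0.42, 3))  # -> [0.42, 0.42, 0.42]
print(improvement(0.5, 0.25))         # -> 50.0
```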

5.1 Method development issues

Working with Expektra's model is very straightforward, and no major issues were encountered during development, but some issues were encountered working with NuPIC1:

1 It should also be pointed out that it helps to have the people who developed the code on the spot, and this is probably the main reason for the lack of trouble.


1. Installing NuPIC is difficult. This seems to be a general problem, judging by posts on the NuPIC mailing list2; a very important issue to fix if they want more people to work with this model.

2. The built-in swarming (PSO) did not work well, especially for slightly larger datasets and for 48 prediction steps. The main problems were fitting everything into memory and that the code ran too slowly when swarming over multiple steps ahead, making it practically impossible to use. Scaling down the problem, by using just a few prediction steps and multiple models, helped, but the swarm model kept dismissing wind speed as an important feature, which was fixed by specifically telling it not to touch that encoder.

3. Inconsistencies between the white-paper documentation of the HTM/CLA and the actual implementation make it hard to understand the underlying principles of NuPIC. NuPIC implements a very complex network, and any inconsistency in documentation makes it hard to understand. The code is quite well documented, but the additional material is a bit sparse.

These issues are all understandable and are continuously being improved upon. NuPIC is still a young platform, so finding issues is to be expected, given the complexity and research nature of the project. Training the CLAClassifier to use many prediction steps instead of just one would most likely reduce the bias error seen in the results; to investigate this, a more powerful computer is needed.

There are some differences in the inputs between the models. This setup was created because an OPF model needs to encode the value it is going to predict. SCADA data is available, so this is not an issue, but it may introduce some unfairness between NuPIC and Expektra. In the current setup NuPIC does not seem to gain anything extra from the additional information, and the other models in the competition most likely used different inputs as well.

In general, NuPIC differs in the way you feed it data: you do so by sending in the front of the signal, and temporal context is achieved through the temporal memory.

5.2 Future improvements and directions

One issue with the GEFCom dataset is that it does not contain any specific information about the location of these 7 wind farms. It is worth considering taking a look at how to handle different models that are specialized for different types

2 http://numenta.org/lists


53 CONCLUSIONS

of terrain, as this has been shown to increase performance [Kariniotakis et al., 2006]. Another thing to investigate could be a more advanced schema for hyperparameter selection. Jursa and Rohrig [2008] show that the WPF error can be reduced by smarter use of optimization algorithms for feature selection and hyperparameter selection, which should improve performance.

In general, having more types of inputs, from sources like SCADA and NWP systems, could give a lot of valuable information that can be used for better predictions. It is clear that wind speed is a key feature for producing good forecasts, which is also supported by the input analysis in section 4.2. It has been argued that the error in WPF models stems largely from wrong weather forecasts [Giebel et al., 2011], so identifying good sources of weather forecasts would be desirable for more effective model learning. Combining several different weather forecasts could also be investigated, as Nielsen et al. [2007] showed that power forecasts based on a number of different meteorological forecasts were better than those from a single source.

Regarding the HTM CLA, it is worth considering implementing a custom encoder, one specifically targeted at data concerning wind farms. SCADA data could also be streamed directly into a NuPIC model, which would allow for some interesting online predictions and anomaly detection.

Training times are always an issue, especially for very large models. Both NuPIC and Expektra's model would benefit from being implemented in a more parallel fashion, as these algorithms are very well suited for parallel computing. The speed-up would be of great value, so looking into and implementing a GPU version of the algorithms could be worthwhile.

5.3 Conclusions

The field of energy forecasting is still young, and the necessity for good forecasting techniques is constantly growing. This initial study shows that part of Expektra's forecasting method can be used to forecast wind power generation. We observe that this ANN is able to achieve performance comparable to other methods published in Hong et al. [2014]. Additional work is still needed to obtain state-of-the-art results. Numenta's model is able to beat the reference model but still needs additional work before we can draw definite conclusions about its performance; constructing a custom encoder could be a first step in that direction.


Bibliography

Subutai Ahmad and Jeff Hawkins. Properties of sparse distributed representations and their application to hierarchical temporal memory. arXiv preprint arXiv:1503.07469, 2015.

T. C. Akinci. Short term wind speed forecasting with ANN in Batman, Turkey. Elektronika ir Elektrotechnika, 107(1):41-45, 2015.

E. Michael Azoff. Neural network time series forecasting of financial markets. John Wiley & Sons, Inc., 1994.

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13(1):281-305, 2012.

Daniel P. Buxhoeveden and Manuel F. Casanova. The minicolumn hypothesis in neuroscience. Brain, 125(5):935-951, 2002.

Erasmo Cadenas and Wilfrido Rivera. Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks. Renewable Energy, 34(1):274-278, 2009.

Dmitri B. Chklovskii, B. W. Mel, and K. Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782-788, 2004.

A. Costa, A. Crespo, J. Navarro, G. Lizcano, H. Madsen, and E. Feitosa. A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews, 12(6):1725-1744, August 2008. ISSN 1364-0321. doi: 10.1016/j.rser.2007.01.015. URL http://dx.doi.org/10.1016/j.rser.2007.01.015.

Russ C. Eberhart and James Kennedy. A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science, volume 1, pages 39-43, New York, NY, 1995.


Shu Fan, James R. Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. Energy Conversion, IEEE Transactions on, 24(2):474–482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento: a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276. Citeseer, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition, EWEC 2008, 6 pages. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving Hierarchical Temporal Memory-Based Trading Models. Springer, 2013.

M. A. Gaertner, C. Gallardo, C. Tejeda, N. Martínez, S. Calabria, N. Martínez, and B. Fernández. The Casandra project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. In Proc. of the 2001 European Wind Energy Conference, EWEC'01, Copenhagen, Denmark, pages 777–780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: a literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G. Lobo. Sipreólico: wind power prediction tool for the Spanish peninsular power system. In Proceedings of the CIGRÉ 40th General Session and Exhibition, Paris, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On Intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357–363, 2014.


Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41–46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585–592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694–709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference, EWEC'07, 2007.

G. Kariniotakis. Position paper on JOULE project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J. Halliday, R. Brownsword, Ignacio Marti, Ana Maria Palomares, I. Cruz, H. Madsen, T. S. Nielsen, Henrik Aa. Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power: overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, 10 pages, 2006.

G. N. Kariniotakis, G. S. Stavrakakis, and E. F. Nogaret. Wind power forecasting using advanced neural networks models. Energy Conversion, IEEE Transactions on, 11(4):762–767, 1996.

Stanley J. Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326–334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275–293, 2009.

Lars Landberg and Simon J. Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171–195, 1994.

S. Lang, C. Mohrlen, J. Jorgensen, B. O. Gallachóir, and E. McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International Solar Energy Society UK Section Conference, volume 85, page 89, 2006.


S. M. Lawan, W. A. W. Z. Abidin, W. Y. Chai, A. Baharun, and T. Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760–1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. Smart Grid, IEEE Transactions on, 5(1):501–510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets. 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164–168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313–2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154–3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa. Nielsen, and Torben S. Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475–489, 2005.

Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431–441, 1963.

Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons. 1969.

Marvin Lee Minsky and Oliver G. Selfridge. Learning in Random Nets. MIT Lincoln Laboratory, 1960.

Jorge J. Moré. The Levenberg-Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105–116. Springer, 1978.

Henrik Aa. Nielsen, Torben S. Nielsen, Henrik Madsen, Maria J. Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471–482, 2007.


Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29–34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power. 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159–167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms, v0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33–57, 2007.

A. Rodrigues, J. A. Peças Lopes, P. Miranda, L. Palma, C. Monteiro, R. Bessa, J. Sousa, C. Rodrigues, and J. Matos. EPREV: a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference, EWEC, volume 7, 2007.

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5:3, 1988.

Jürgen Schmidhuber. Deep learning in neural networks: an overview. Neural Networks, 61:85–117, 2015.

S. Sinkevicius, R. Simutis, and V. Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91–96, 2011.

Ke-Sheng Wang, Vishal S. Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61–69, 2014.

WWEA. 2014 half-year report. Technical report, WWEA, pages 1–8, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365–376, 2013.

Hao Yu and Bogdan M. Wilamowski. Levenberg-Marquardt training. Industrial Electronics Handbook, 5:12-1, 2011.


Appendix A

Hyper-parameters

Parameters for the spatial pooler

Alias                 Default   Description
columnCount           -         The number of cell columns in a cortical region.
globalInhibition      false     If true, during the inhibition phase the winning columns are selected as the most active columns from the region as a whole.
numActivePerInhArea   10        The maximum number of active columns per inhibition area.
synPermActiveInc      0.1       The amount by which an active synapse is incremented in each round, specified as a percent of a fully grown synapse.
synPermConnected      0.10      Controls the threshold at which synapses are considered connected.
synPermInactiveDec    0.01      The amount by which an inactive synapse is decremented in each round.
potentialRadius       16        Determines the extent of the input that each column can potentially be connected to.

Table A1: Configuration parameters for the spatial pooler
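The interplay of the inhibition and permanence parameters above can be illustrated with a toy spatial-pooler step. This is only a sketch using the parameter names from Table A1, not Numenta's implementation (which, among other things, restricts learning to each column's potential pool and supports local inhibition):

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_columns = 64, 128
# one permanence value per (column, input) synapse
perm = rng.uniform(0.0, 0.2, size=(n_columns, n_inputs))

synPermConnected = 0.10
synPermActiveInc = 0.1
synPermInactiveDec = 0.01
numActivePerInhArea = 10

def spatial_pooler_step(input_bits):
    connected = perm >= synPermConnected      # which synapses are connected
    overlap = connected @ input_bits          # overlap score per column
    # global inhibition: the numActivePerInhArea most active columns win
    winners = np.argsort(overlap)[-numActivePerInhArea:]
    # learning: strengthen synapses to active inputs, weaken the rest
    for c in winners:
        perm[c] += np.where(input_bits == 1, synPermActiveInc, -synPermInactiveDec)
    np.clip(perm, 0.0, 1.0, out=perm)
    return winners

active = spatial_pooler_step(rng.integers(0, 2, size=n_inputs))
print(len(active))  # 10 winning columns
```

With globalInhibition set to false, the argsort over the whole region would instead be performed per local inhibition area.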

Parameters for the scalar encoder

Alias        Symbol   Description
w            w        Number of bits to set in the output.
minval       vmin     The lower bound of the input value.
maxval       vmax     The upper bound of the input value.
n            n        Number of bits in the representation (n must be > w).
radius       r        Inputs separated by more than or equal to this distance will have non-overlapping representations.
resolution   ψ        Inputs separated by more than or equal to this distance will have different representations.

Table A2: Configuration parameters for the scalar encoder
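To make the roles of w, n, minval and maxval concrete, here is a minimal scalar-encoding sketch in the spirit of Table A2. It is an illustration only, not NuPIC's ScalarEncoder, which additionally supports the radius and resolution parameters and periodic inputs:

```python
def encode_scalar(value, minval, maxval, n, w):
    """Encode a scalar as n bits with a contiguous run of w active bits.

    Nearby values share active bits; values far apart share none.
    """
    assert n > w, "n must be greater than w"
    value = max(minval, min(maxval, value))  # clip to [minval, maxval]
    # index of the first active bit, ranging over 0 .. n - w
    i = int(round((n - w) * (value - minval) / (maxval - minval)))
    return [1 if i <= j < i + w else 0 for j in range(n)]

# a run of 3 active bits slides from the left edge to the right edge
print(encode_scalar(0.0, 0.0, 1.0, 14, 3))  # [1, 1, 1, 0, 0, ...]
print(encode_scalar(1.0, 0.0, 1.0, 14, 3))  # [..., 0, 0, 1, 1, 1]
```

Exactly w bits are active for any input, which is what gives the representation its fixed sparsity.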

Parameters for the temporal memory

Alias                   Default   Description
activationThreshold     12        Activation threshold for segments.
cellsPerColumn          32        Number of cells per column.
columnCount             2048      The number of cell columns in a cortical region.
globalDecay             0.10      Decrements all synapses a little bit all the time.
initialPerm             0.11      Initial permanence value for a synapse.
inputWidth              -         Size of the input.
maxAge                  100000    Controls global decay. Global decay will only decay segments that have not been activated for maxAge iterations, and will only run the global decay loop every maxAge iterations.
maxSegmentsPerCell      -         The maximum number of segments a cell can have.
maxSynapsesPerSegment   -         The maximum number of synapses a segment can have.
minThreshold            8         The minimum required activity for a segment to learn.
newSynapseCount         15        The maximum number of synapses added to a segment during learning.
permanenceDec           0.10      How much permanence is removed from synapses when learning occurs.
permanenceInc           0.10      How much permanence is added to synapses when learning occurs.
temporalImp             cpp/py    Controls which temporal memory implementation to use.

Table A3: Configuration parameters for the temporal memory
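One rule from Table A3 can be shown in isolation: a dendritic segment becomes active when at least activationThreshold of its synapses connect to currently active cells. The sketch below is illustrative only (cell ids, the example segment, and the flat cell numbering are made up for the example; this is not Numenta's implementation):

```python
cellsPerColumn = 32
columnCount = 2048
activationThreshold = 12

def segment_active(segment_synapses, active_cells):
    """segment_synapses: presynaptic cell ids of one segment (flat numbering
    over columnCount * cellsPerColumn cells); active_cells: set of cell ids
    that fired at the current time step."""
    matches = sum(1 for cell in segment_synapses if cell in active_cells)
    return matches >= activationThreshold

active_cells = set(range(20))            # pretend cells 0..19 fired
segment = list(range(0, 30, 2))          # synapses to cells 0, 2, ..., 28
print(segment_active(segment, active_cells))  # 10 matches < 12 -> False
```

minThreshold plays the same role for choosing a best-matching segment during learning, with a lower bar than activationThreshold.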

Appendix B

Wind characteristics

Figure B1: Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Figure B2: Wind characteristics for WF 3-7, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Appendix C

Error Distribution


[Six histogram panels, "wf1 using nupic": forecast error on the x-axis (-1.0 to 1.0) against frequency on the y-axis (0 to 70), one panel per lead time: 48, 40, 30, 20, 10 and 1.]

Figure C1: Error distribution for different lead times, WF 1 (NuPIC model).

[Six histogram panels, "wf2 using nupic": error (-1.0 to 1.0) vs frequency (0 to 70) for lead times 48, 40, 30, 20, 10 and 1.]

Figure C2: Error distribution for different lead times, WF 2 (NuPIC model).

[Six histogram panels, "wf3 using nupic": error (-1.0 to 1.0) vs frequency (0 to 70) for lead times 48, 40, 30, 20, 10 and 1.]

Figure C3: Error distribution for different lead times, WF 3 (NuPIC model).

[Six histogram panels, "wf4 using nupic": error (-1.0 to 1.0) vs frequency (0 to 70) for lead times 48, 40, 30, 20, 10 and 1.]

Figure C4: Error distribution for different lead times, WF 4 (NuPIC model).

[Six histogram panels, "wf5 using nupic": error (-1.0 to 1.0) vs frequency (0 to 70) for lead times 48, 40, 30, 20, 10 and 1.]

Figure C5: Error distribution for different lead times, WF 5 (NuPIC model).

[Six histogram panels, "wf6 using nupic": error (-1.0 to 1.0) vs frequency (0 to 70) for lead times 48, 40, 30, 20, 10 and 1.]

Figure C6: Error distribution for different lead times, WF 6 (NuPIC model).

[Six histogram panels, "wf7 using nupic": error (-1.0 to 1.0) vs frequency (0 to 70) for lead times 48, 40, 30, 20, 10 and 1.]

Figure C7: Error distribution for different lead times, WF 7 (NuPIC model).

[Six histogram panels, "wf1 using expektra": error (-1.0 to 1.0) vs frequency (0 to 70) for lead times 48, 40, 30, 20, 10 and 1.]

Figure C8: Error distribution for different lead times, WF 1 (Expektra model).

[Six histogram panels, "wf2 using expektra": error (-1.0 to 1.0) vs frequency (0 to 70) for lead times 48, 40, 30, 20, 10 and 1.]

Figure C9: Error distribution for different lead times, WF 2 (Expektra model).

[Six histogram panels, "wf3 using expektra": error (-1.0 to 1.0) vs frequency (0 to 70) for lead times 48, 40, 30, 20, 10 and 1.]

Figure C10: Error distribution for different lead times, WF 3 (Expektra model).

[Six histogram panels, "wf4 using expektra": error (-1.0 to 1.0) vs frequency (0 to 70) for lead times 48, 40, 30, 20, 10 and 1.]

Figure C11: Error distribution for different lead times, WF 4 (Expektra model).

[Six histogram panels, "wf5 using expektra": error (-1.0 to 1.0) vs frequency (0 to 70) for lead times 48, 40, 30, 20, 10 and 1.]

Figure C12: Error distribution for different lead times, WF 5 (Expektra model).

[Six histogram panels, "wf6 using expektra": error (-1.0 to 1.0) vs frequency (0 to 70) for lead times 48, 40, 30, 20, 10 and 1.]

Figure C13: Error distribution for different lead times, WF 6 (Expektra model).

[Six histogram panels, "wf7 using expektra": error (-1.0 to 1.0) vs frequency (0 to 70) for lead times 48, 40, 30, 20, 10 and 1.]

Figure C14: Error distribution for different lead times, WF 7 (Expektra model).

List of Figures

2.1 A figure that presents the general outline when forecasting using the statistical approach
2.2 A figure that presents the general steps when forecasting using a physical model
3.1 The perceptron
3.2 Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of the H hidden layers, as well as M input connections. Each edge seen in this graph has a weight wij associated with it
3.3 Information flow of a single-region predictive model created with the OPF
3.4 The CLAClassifier
3.5 Training an OPF model
4.1 Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.2 Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.3 Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.4 Different error measurements for WF 1
4.5 Different error measurements for WF 2
4.6 Different error measurements for WF 3
4.7 Different error measurements for WF 4
4.8 Different error measurements for WF 5
4.9 Different error measurements for WF 6
4.10 Different error measurements for WF 7
4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point: the network when no channel has been exposed to noise. ws = wind speed, hours/week = timestamps, u, v is a directional vector of the wind
4.12 Plot showing the unoptimized version vs the optimized one when training a network with 10 input neurons; 10, 15, 20 or 25 hidden neurons; and 1 output neuron. Training was done for 100 epochs using LM
4.13 Summarized average improvement over all wind farms with 95% confidence intervals
B1 Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated
B2 Wind characteristics for WF 3-7, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated
C1 Error distribution for different lead times, WF 1
C2 Error distribution for different lead times, WF 2
C3 Error distribution for different lead times, WF 3
C4 Error distribution for different lead times, WF 4
C5 Error distribution for different lead times, WF 5
C6 Error distribution for different lead times, WF 6
C7 Error distribution for different lead times, WF 7
C8 Error distribution for different lead times, WF 1
C9 Error distribution for different lead times, WF 2
C10 Error distribution for different lead times, WF 3
C11 Error distribution for different lead times, WF 4
C12 Error distribution for different lead times, WF 5
C13 Error distribution for different lead times, WF 6
C14 Error distribution for different lead times, WF 7

List of Tables

3.1 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, missing data with power observations are available for updating the models
3.2 Preprocessed features from the dataset. The Forecast category represents the forecast given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the feature we will use in training and testing
3.3 Example where n = 14, r = 5, ψ = 1 of various encoded scalar values using a ScalarEncoder
4.1 NRMSE score of the entries published in [Hong et al., 2014]. The NuPIC model and Expektra model are added so we can easily compare the results
A1 Configuration parameters for the spatial pooler
A2 Configuration parameters for the scalar encoder
A3 Configuration parameters for the temporal memory


Contents

1 Introduction
  1.1 Problem Formulation
  1.2 The scope of the problem

2 Background
  2.1 Neural Networks and Time Series Prediction

3 Method and Materials
  3.1 Preliminaries
    3.1.1 Remarks
    3.1.2 Definitions
    3.1.3 Reference models
    3.1.4 Error metrics
    3.1.5 Model selection
    3.1.6 Evaluation
  3.2 Experiments
  3.3 Holdback Input Randomization
  3.4 Optimization methods
  3.5 Neural Networks
    3.5.1 Multilayer Perceptron
    3.5.2 Numenta Platform for Intelligent Computing

4 Result
  4.1 Experimental results
  4.2 Input Importance
    4.2.1 Adaptation and Optimization
  4.3 Summary

5 Discussion
  5.1 Method development issues
  5.2 Future improvements and directions
  5.3 Conclusions

Bibliography

Appendices

A Hyper-parameters
B Wind characteristics
C Error Distribution

List of Figures
List of Tables

Chapter 1

Introduction

It has been estimated by the World Wind Energy Association (WWEA) that by the year 2020 around 12% of the world's electricity will be available through wind power, making wind energy one of the fastest growing energy resources [WWEA, 2014; Fan et al., 2009]. But integrating wind energy into existing electricity supply systems has been a challenge, and numerous objections have been put forward by traditional energy suppliers and grid operators, especially for large-scale use of this energy source. The biggest concern is that availability mainly depends on meteorological conditions, and production cannot be adjusted as conveniently as with other, more conventional energy sources, because of our inability to control the wind. A single Wind Turbine (WT) is highly variable, and its dependency on wind conditions can result in zero output for thousands of hours during the course of a year; aggregating wind power generation over bigger areas, however, decreases this chance.

This is where wind power forecasting systems come into play: a technology that can greatly improve the integration of wind energy into electricity supply systems, as forecasting systems provide information on how much wind power can be expected at any given point within the next few days. This removes some of the randomness attributed to wind energy and allows a more accurate way to utilize this clean energy source, while offsetting some of our dependency on other, less environmentally friendly sources, which in the long run will cause a smaller degenerative impact on the environment.

There are many commercial forecasting models available. Prediktor1 [Landberg and Watson, 1994] is a physical model developed by the Risø National Laboratory, Denmark. It is constructed to refine Numerical Weather Prediction (NWP) in order

1 http://www.prediktor.no


to transform data through a power curve to produce the forecast, while improving the error rate with Model Output Statistics (MOS). Previento [Focken et al., 2001], developed at the University of Oldenburg, Germany, uses a similar approach to Prediktor, but with regional forecasting and uncertainty estimation. The Wind Power Prediction Tool (WPPT)2 [Nielsen et al., 2002] is a statistical model developed by the Technical University of Denmark; it consists of a semi-parametric power curve model for wind farms, taking into account both wind speed and direction. It uses dynamical prediction models describing the dynamics of the wind power and any diurnal variations. Zephyr [Giebel et al., 2000] is a hybrid model that combines both WPPT and Prediktor; in this model each wind farm is assigned a forecast model according to the available data. Sipreólico [Gonzalez et al., 2004], developed by Red Eléctrica de España, is a statistical model that was designed to be highly flexible depending on available data. It achieves this by switching between 9 different models. Aiolos Wind3 is a hybrid model developed by Vitec that creates forecasts by combining a statistical model with physical factors such as wind speed at different altitudes, wind direction and air density.

Expektra4 is a service-based company, founded in 2010, that provides and develops a new method for short-term demand and supply forecasting based on Artificial Neural Networks (ANN) and traditional time series analysis. They are currently expanding into the area of wind power forecasting and looking for suitable methods. ANN have been used successfully for both wind speed forecasting [Lawan et al., 2014] and wind power forecasting [Kariniotakis et al., 1996]. It was demonstrated in Liu et al. [2012] that a complex-valued recurrent neural network was able to predict output with high accuracy, and Huang et al. [2015] showed that initial weights optimized with a genetic algorithm in a back-propagated neural network gave impressive results, so there are good reasons to think that part of the method Expektra is developing can be used for wind power forecasting.

Neural networks in general are highly flexible, and recent advancements in deep learning have been shown to outperform previous models in different domains [Schmidhuber, 2015]. These networks are very good at automatically finding features that by hand would take a lot of time and effort to derive. One network that shares similarities with deep learning, and has received less attention, is the Cortical Learning Algorithm (CLA) Hierarchical Temporal Memory (HTM) developed by Numenta [Numenta, 2011]. This network is also built around the idea of having hierarchical structures, creating a deep neural network. CLA HTM is

² http://www.enfor.eu
³ http://www.vitecsoftware.com
⁴ http://www.expektra.se



currently tailored very specifically for time-series problems and has at the moment little research published around it, making it an ideal candidate for time series prediction with unknown potential on wind power forecasting problems.

Even though there are a lot of prominent methods already developed for wind power forecasting, there is still room for improvement. Energy forecasting is such an important topic that competitions have been developed around it. The Global Energy Forecasting Competition (GEFCom) [Hong et al. 2014] is a competition that can be used to help evaluate the performance of new forecasting models. It has attracted hundreds of contestants from all over the world, which has resulted in the contribution of many new and novel ideas. The GEFCom was created in order to:

1. Bring together state-of-the-art techniques for energy forecasting.

2. Bridge the gap between academic research and industry practice.

3. Promote analytical approaches in power/energy education.

4. Prepare the industry to overcome forecasting challenges posed by the smart grid world.

5. Improve energy forecasting practices.

Benchmark datasets and competitions are a valuable resource when evaluating new models, as they allow for a common ground to stand on. With the publication of Hong et al. [2014], the dataset in GEFCom2012 was also published. This dataset consists of data from 7 different wind farms that span a time period of three years. It consists of observational data of the energy production together with weather forecasts.

The GEFCom is divided into four tracks: load forecasting, price forecasting, wind power forecasting and solar power forecasting. The specified problem in the wind power forecasting track is built around the real-time operation of wind farms, and the dataset is structured accordingly. Participants try to predict hourly power generation up to 1-48 hours ahead, given meteorological forecasts and historically produced power.

1.1 Problem Formulation

The objective of this thesis is to adapt and analyse the robustness and suitability of Expektra's method when it is applied to the domain of forecasting short-term wind


CHAPTER 1 INTRODUCTION

power generation, i.e. what is needed to adapt this model to wind power forecasting problems?

The second objective of this thesis is to investigate HTM/CLA [Numenta 2011], a modern state-of-the-art computational theory of the neocortex, i.e. is it possible to use the Numenta Platform for Intelligent Computing (NuPIC) for wind power forecasting problems?

These questions will be addressed by evaluating the models against other models published in the area of short-term wind power forecasting, more specifically those that have been published in GEFCom.

1.2 The scope of the problem

1. This study will focus on short-term forecasting, i.e. forecasts done for 1-48 hours ahead. How to perform well on longer forecasts is left to further investigation.

2. Wind power forecasting is closely related to wind speed forecasting, i.e. trying to use local information at the wind turbine instead of using NWP data to forecast wind power. This thesis will not go into any specific details on how to forecast wind speeds; readers can take a look at Li and Shi [2010], Cadenas and Rivera [2009], Akinci [2015].

3. There are a lot of neural networks that are worth studying, but this thesis will focus on Expektra's method and the HTM/CLA inside NuPIC. NuPIC was picked over other deep learning methods because of its direct relation to time series prediction.


Chapter 2

Background

Wind Power Forecasting (WPF) has applications in generation and transmission maintenance planning and energy optimization, as well as in energy trading. WPF models exist at different scales, and they can be used to predict the production of anything from a single WT to a whole Wind Farm (WF). WPF models are generally divided into two main groups, physical and statistical, but hybrid state-of-the-art methods are also common [Giebel et al. 2001, Lang et al. 2006]. The physical approach [Landberg and Watson 1994, Gaertner et al. 2003] focuses on integrating well-known physical aspects into the model, such as information about the surrounding terrain and properties of the WT. These models try to get as good an estimate of the local wind speed as possible before finally reducing the remaining error with some form of MOS. Statistical approaches [Rodrigues et al. 2007, Gonzalez et al. 2004] rely more on historical observations and their statistical relation to meteorological predictions, as well as on measurements from Supervisory Control And Data Acquisition (SCADA) systems.

SCADA is a control system that is used in most industrial processes and is the main tool to evaluate the state of individual power plants. As the name indicates, it is not a full control system, but rather a system for supervision. In the case of wind turbines it allows remote access to online data that has been gathered by sensors inside and around the plant. Variables accessible through these systems are those directly related to the wind flow, as well as the power produced by the plant, i.e. things like rotor wind speed, nacelle position, pitch angle, active power, etc.


CHAPTER 2 BACKGROUND

Figure 2.1: A figure that presents the general outline when forecasting using the statistical approach.

Forecast models that use SCADA data as their primary input source usually have good forecast accuracy, at least for the first few hours, but they tend to be less useful for longer prediction horizons [Giebel et al. 2011]. SCADA data can also be used to detect problems in WTs, something that can be helpful to improve the reliability of WTs [Yang et al. 2013, Wang et al. 2014].

Statistical models, seen in figure 2.1, are usually built around Numerical Weather Prediction (NWP) and SCADA data. NWP models are often used to build forecasting models, as they introduce weather forecasts for the region where the wind turbines are located. NWP data usually contains information about things like wind speed, wind direction, temperature and humidity. These models are operated two or four times a day by a number of large weather services. The main forecasts usually start at 00 and 12 UTC, corresponding to the worldwide radiosonde launching, which is the only direct observation of the atmospheric state and has been the backbone of atmospheric monitoring for many decades¹; extra forecasts usually start at 06 and 18 UTC. Physical models, such as those seen in figure 2.2, include additional information about the physical characteristics of the wind turbine and its surroundings, i.e. terrain data, information about obstacles, capacity and layout of the farm where the turbine is located, and so on. Other useful information for physical models includes the theoretical

¹ Some problems with this approach are that it results in less information over large oceans and poorer countries.


power curve, i.e. how much power is expected to be produced given a specific wind speed.
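As an illustration, a theoretical power curve can be sketched as a piecewise function of wind speed; the cut-in, rated and cut-out speeds below are hypothetical values for illustration, not the curve of any real turbine:

```python
import numpy as np

def power_curve(ws, cut_in=3.0, rated_ws=12.0, cut_out=25.0, rated_power=1.0):
    """Illustrative theoretical power curve: zero below cut-in, cubic ramp
    up to the rated wind speed, constant at rated power, zero above cut-out."""
    ws = np.asarray(ws, dtype=float)
    p = np.zeros_like(ws)
    ramp = (ws >= cut_in) & (ws < rated_ws)
    p[ramp] = rated_power * ((ws[ramp] - cut_in) / (rated_ws - cut_in)) ** 3
    p[(ws >= rated_ws) & (ws < cut_out)] = rated_power
    return p

forecast_power = power_curve([2.0, 12.0, 30.0])  # below cut-in, rated, above cut-out
```

In a physical model this transformation would be applied to the downscaled wind speed estimate, followed by an MOS correction.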

The time scale of WPF methods is generally divided into 3 main groups: very-short-term (up to 9 hours), short-term (up to 72 hours) and medium-term (up to 7 days) [Costa et al. 2008], while the time step for these models is in the range of seconds to days depending on the application. Very-short-term models used for wind power forecasting consist of statistical methods like Kalman Filters, Auto-Regressive Moving Average (ARMA), Auto-Regressive with Exogenous Input (ARX), Box-Jenkins, etc. Inputs to these models are historical observations of wind speed, wind direction, temperature, etc., and common applications for this forecast horizon include things like intraday market trading. Since these methods are merely based on past production, they are generally not useful for longer horizons.

[Figure 2.2 diagram: SCADA data, NWP data and WFC data feed the physical model, which applies downscaling, transformation to hub height and spatial refinements, then conversion to power via the WT power curve and Model Output Statistics (MOS), producing the wind power generation forecast.]

Figure 2.2: A figure that presents the general steps when forecasting using a physical model.

Short-term forecasting models include additional data about historical observations, usually obtained from SCADA systems, but more importantly weather forecasts from NWP models. Machine learning methods used in this area include Neural Networks [Kusiak et al. 2009], Support Vector Machines [Fugon et al. 2008], Nearest Neighbour Search [Jursa et al. 2007], Random Forests, etc. Medium-term forecasting models usually incorporate all the methods used for shorter forecasts, as well as physical characteristics.


2.1 Neural Networks and Time Series Prediction

One of the most common ANNs is the so-called Multilayer Perceptron (MLP), a network built around some very simple properties of a cortical neuron. This network has been around since the 50's and has matured with a solid mathematical foundation, and it has been applied successfully to many different applications. It is built around the idea of having multiple layers of neurons, each layer feeding information forward to the next layer, i.e. a feed-forward neural network.

The main advantage of using multiple layers instead of just a single layer is to overcome the limitations pointed out by Minsky and Selfridge [1960] and Minsky and Seymour [1969], where a single-layer perceptron is only capable of learning linearly separable patterns and is thus unable to learn all functions. The MLP is not limited in this way and has been shown to be able to represent a wide variety of functions given appropriate parameters [Hornik et al. 1989].

NuPIC is a platform actively developed and maintained by Numenta. It is a platform that introduces a collection of ideas and algorithms that are inspired by the structural organization of the neocortex. Two of the core concepts found inside NuPIC are the so-called Hierarchical Temporal Memory and the Cortical Learning Algorithm. HTM was introduced in Hawkins and Blakeslee [2007] and refers to a hierarchy of cortical regions in the brain.

The neocortex is classically divided into 6 different layers²; the current implementation in NuPIC is focused on emulating layers 3-4 (specifically layer 3), with extensions and research code being developed to include more layers.

A CLA region consists of a collection of columns, and each column consists of a handful of cells. This structure is based on the minicolumn hypothesis [Buxhoeveden and Casanova 2002]. Each column in the CLA region has its own semantic meaning, and the sparse activity of a handful of active columns will tell us something about the input. The CLA is modelled so that specific cells within each column reflect the temporal context of a pattern, and a single cortical region (a CLA region) is trained with the CLA algorithm.

A typical CLA region consists of around 2K columns containing around 60K neurons in total, while a typical MLP may have fewer than 100 neurons. Each neuron in an HTM network grows new synapses over time, and it is not uncommon to have around 5K synapses per neuron, meaning we would have around 300M synapses in total. A single region of this size uses around 100 MB of memory, and it takes around 10 ms to do one inference and learning step³.

² These layers should not be confused with the hierarchy of regions.
³ These values are given in a talk at the "Sensor-Motor Integration in the Neocortex" 2013 Hackathon.


The HTM/CLA also differs from the MLP in that it does not directly use scalar weights to represent synaptic connectivity. Synapses inside an HTM have binary weights with a scalar permanence: each synapse will either be connected or disconnected, while the MLP uses scalar weights. The connectedness in HTM/CLA is based on a permanence, which is a value between 0.0 and 1.0. If the permanence is over a certain threshold we have a connected synapse, i.e. we have weights of either 1 or 0. This binary mechanism is there to simulate synapses that are able to form and unform during learning. So essentially we have a weight-change network vs. a wiring-change network [Chklovskii et al. 2004].

NuPIC today is naturally geared towards time-series prediction, which makes it an interesting candidate for wind forecasting. NuPIC has been shown to outperform the MLP on financial time series [Gabrielsson et al. 2013] and it has been used successfully to balance traffic [Sinkevicius et al. 2011]. It has also been used on the NASA Aviation Dataset [Lee and Rajabi 2014], but with less promising results. MLPs have been used successfully for many kinds of time series problems [Azoff 1994, Niska et al. 2004, Jain and Kumar 2007].


Chapter 3

Method and Materials

The purpose of this chapter is to provide an explanation of the method used in this thesis. It presents terms and concepts needed and used in the field of Machine Learning, with a focus on how to create forecasts based on time series. It presents the motivation for why these methods are used, as well as how they are used. This chapter also contains a description of the implemented artificial neural network, how it is structured and optimized. It provides details about the experimental setup and how these experiments relate to GEFCom, and finally a description of how the datasets are structured and processed.

3.1 Preliminaries

3.1.1 Remarks

The methodology used to evaluate the prediction models presented in this thesis is based on the protocol presented in Madsen et al. [2005]¹, a complete protocol that can be used to evaluate the performance of short-term WPF and Wind-to-Power (W2P) models. This protocol was chosen for this thesis because it has successfully been used before as a guideline to evaluate a wide variety of forecast models, such as AWPPS, Prediktor, Previento, Sipreólico and WPPT. The protocol has been used for both on-shore and off-shore wind farms and was developed in the frame of the ANEMOS research project [Kariniotakis et al. 2006], which brought together many relevant groups involved in the field. The aim of ANEMOS was to develop accurate and robust models that substantially outperform current state-of-the-art methods, and one of its goals was to establish a common set

¹ It should be pointed out that no widely agreed-upon standardization exists.


CHAPTER 3 METHOD AND MATERIALS

of performance measures that can be used to compare forecasts across systems and locations.

3.1.2 Definitions

Time series

A time series is a sequence X of observations x_t, each with a particular time-stamp t, i.e. X = [x_1, x_2, ..., x_t], or in short X_{1:t}: a time-dependent collection of variables.

Forecast horizon

A multi-step-ahead prediction is the assignment of predicting k forecasts X_{t+1:t+k} given a collection of historical observations X_{t-p+1:t}. In the case of GEFCom we want a forecast for 1-48 steps ahead.

Point or spot forecast

In this thesis we model the forecast p̂_{t+k|t} as a so-called point forecast (or spot forecast), i.e. a single value for each forecast (as compared to having a probability distribution).

Prediction error

The prediction error e for lead time t+k is defined in equation 3.1 as the difference between the forecast and the actual value, where t denotes the time index and k is the look-ahead time; p is the actual (measured, true) wind power and p̂ is the predicted wind power.

e_{t+k|t} = p_{t+k} - p̂_{t+k|t}    (3.1)

and the normalized prediction error ε as seen in equation 3.2:

ε_{t+k|t} = (1/p_inst) e_{t+k|t} = (1/p_inst) (p_{t+k} - p̂_{t+k|t})    (3.2)

where p_inst is the installed capacity of the wind farm (in kW or MW), which is unknown in the competition.


3.1.3 Reference models

Persistence (also called a naïve or plain predictor), as seen in equation 3.3, is the model most commonly used for benchmarking WPF models. This simple model states that the future wind generation value will be the same as the last measured value:

p̂^persistence_{t+k|t} = p_t    (3.3)

An alternative would be to use an even simpler model, a climatology prediction (equation 3.4), i.e. predicting the mean, a value that is approximated from the training set (see section 3.1.5):

p̂^mean_{t+k|t} = p̄ = (1/N) Σ_{t=1}^{N} p_t    (3.4)

There are other reference models, like the one suggested by Nielsen et al. [1998], which have advantages over persistence, but it was never widely adopted and is not used by GEFCom.
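Both reference predictors are trivial to express in code; this is a sketch assuming normalized power values:

```python
import numpy as np

def persistence_forecast(last_power, k):
    """Persistence (eq. 3.3): every lead time 1..k gets the last measured value."""
    return np.full(k, last_power)

def climatology_forecast(train_power, k):
    """Climatology (eq. 3.4): every lead time gets the training-set mean."""
    return np.full(k, np.mean(train_power))

train = np.array([0.2, 0.5, 0.3, 0.4])
p_persist = persistence_forecast(train[-1], 3)
p_climate = climatology_forecast(train, 3)
```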

3.1.4 Error metrics

In order to understand why specific models perform well, it is usually a good idea to evaluate them against a wide variety of different criteria, as is emphasised by Kariniotakis [1997]. This section describes the error measures used; throughout, N denotes the size of the test set.

Forecast Bias

The Normalized Forecast Bias (NBIAS) describes the systematic error and is defined in equation 3.5; it is estimated by calculating the average error for each step ahead. It gives an indication of the direction of the error.

NBIAS_k = (1/N) Σ_{t=1}^{N} ε_{t+k|t}    (3.5)

This bias is sometimes referred to by some authors as the Mean Forecast Error (MFE). A perfect MFE does not mean the forecast is perfect and contains no error, as positive and negative errors cancel each other out, but it gives an indication that the forecasts are on target.


Mean Absolute Error

The Normalized Mean Absolute Error (NMAE) is an error quantity that looks at the average of the absolute error of the prediction, and is defined in equation 3.6.

NMAE_k = (1/N) Σ_{t=1}^{N} |ε_{t+k|t}|    (3.6)

Another common name used instead of Mean Absolute Error (MAE) is Mean Absolute Deviation (MAD). This value shows the magnitude of the overall error that has occurred due to forecasting, and should thus be as small as possible. This error is scale dependent and will be affected by data transformations and the scale of measurements.

Mean Squared Error

The Normalized Mean Squared Error (NMSE) is an error quantity that looks at the average of the squared errors ε²_{t+k|t} and is built using the Normalized Sum of Squared Errors (NSSE) defined in equation 3.7:

NSSE_k = Σ_{t=1}^{N} ε²_{t+k|t}    (3.7)

NMSE is defined in equation 3.8:

NMSE_k = (1/N) NSSE_k = (1/N) Σ_{t=1}^{N} ε²_{t+k|t}    (3.8)

With this error, positive and negative errors do not cancel each other out, and large individual errors are penalized more harshly.

Root Mean Squared Error

The Normalized Root Mean Squared Error (NRMSE) is the square root of the NMSE; this error is defined in equation 3.9.

NRMSE_k = NMSE_k^{1/2} = ( (1/N) Σ_{t=1}^{N} ε²_{t+k|t} )^{1/2}    (3.9)

NRMSE is the main metric used in GEFCom, and it shares the same properties as NMSE.
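The error measures above can be collected in a few lines; the example errors at the end are purely illustrative:

```python
import numpy as np

def normalized_errors(p_true, p_pred, p_inst):
    """Normalized prediction errors eps = (p - p_hat) / p_inst (eq. 3.2)."""
    return (np.asarray(p_true) - np.asarray(p_pred)) / p_inst

def nbias(eps):
    return np.mean(eps)            # eq. 3.5

def nmae(eps):
    return np.mean(np.abs(eps))    # eq. 3.6

def nmse(eps):
    return np.mean(eps ** 2)       # eq. 3.8

def nrmse(eps):
    return np.sqrt(nmse(eps))      # eq. 3.9

eps = normalized_errors([0.5, 0.7], [0.4, 0.8], p_inst=1.0)
```

In the evaluation these quantities would be computed separately for each look-ahead time k over the test set.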


3.1.5 Model selection

In regression and classification, one of the main issues we are faced with is "How do we create a good model?". One way to define this "goodness" would be to look at the model's generalization ability, i.e. its performance on unseen data.

To make sure the model we are building will generalize to new unseen data, it is important how we select and build the model in the first place; what we have control over is the data we have at hand and how we use that data to build our model.

Training, testing and validating

In the case of supervised learning we have access to a target series and its associated features, i.e. the production series (our target), and wind speed and wind direction as features.

Common practice within Machine Learning, which is adhered to in this thesis, is to split the whole dataset into 3 smaller subsets: two of these subsets are used to find a good model (i.e. the training set and the validation set), and the remaining one (the test set) is used for evaluation, i.e. to estimate how accurate the model we have created would be on "unseen data". With highly flexible models like an artificial neural network we need to be careful not to overfit the data, which is one of the reasons why we have the validation set: we do not want to create a model that does not generalize well because it is fitted to every minor variation, i.e. it has captured a lot of noise.

3.1.6 Evaluation

The improvement with respect to a considered reference model, ref, is defined in equation 3.10:

I^ref_{EC,k} = 100 · (EC^ref_k - EC_k) / EC^ref_k  (%)    (3.10)

where the Evaluation Criterion (EC) is any error measurement, such as NMSE, NMAE, etc.
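Equation 3.10 in code form, applied to two hypothetical NRMSE scores:

```python
def improvement(ec_ref, ec):
    """Improvement over a reference model in percent (eq. 3.10)."""
    return 100.0 * (ec_ref - ec) / ec_ref

# hypothetical scores: reference NRMSE 0.20, candidate NRMSE 0.15
gain = improvement(0.20, 0.15)  # a 25% improvement over the reference
```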


Testing period

Date         Time    Forecast
2011-01-01   01:00   1-48 hours
2011-01-04   13:00   1-48 hours
2011-01-08   01:00   1-48 hours
2011-01-11   13:00   1-48 hours
...          ...     ...
2012-06-23   01:00   1-48 hours
2012-06-26   13:00   1-48 hours

Table 3.1: The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, data with power observations is available for updating the models.

3.2 Experiments

Training and testing are structured based on the setup of the GEFCom. The dataset spans from midnight of the 1st of July 2009 to noon of the 26th of June 2012. The period from the 1st of July 2009 to the 1st of January 2011 at 01:00 is used for training and validation, while the rest is used for testing. In the testing range a number of 48-hour periods are defined (see table 3.1); in between these testing periods exists additional training data, which enables the models to be updated between forecasts.

The testing periods repeat every 7 days until the end of the dataset, and only meteorological forecasts that were relevant for the periods with missing power data are given; this was done in order to be consistent.

Meteorological forecasts in the GEFCom dataset consist of zonal u and meridional v components collected for surface winds at 10 m above ground level. They were extracted from the archive of the European Centre for Medium-range Weather Forecasts (ECMWF), a service which issues high-resolution deterministic forecasts twice a day, at 00 Coordinated Universal Time (UTC) and 12 UTC; each of these forecast periods consists of data for 1-48 hours ahead. In order to match the hourly resolution of the power measurements, the forecasts were interpolated using cubic splines to an hourly resolution. A summary of the features found in the dataset is given in table 3.2.
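The interpolation step can be sketched with SciPy's CubicSpline; the 3-hourly issue times and wind speeds below are invented for illustration:

```python
import numpy as np
from scipy.interpolate import CubicSpline

# hypothetical NWP wind-speed values given every 3 hours
forecast_hours = np.array([0.0, 3.0, 6.0])
wind_speed = np.array([5.0, 8.0, 6.0])

spline = CubicSpline(forecast_hours, wind_speed)
hourly_ws = spline(np.arange(0.0, 7.0))  # hourly values matching the power data
```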


No.  Category  Parameter             Alias   Type

1    Date      Date                  date    String
2    Date      Year                  year    Integer
3    Date      Month                 month   Integer
4    Date      Day                   day     Integer
5    Date      Hour                  hours   Integer
6    Date      Week                  week    Integer
7    Forecast  Wind Speed            ws      Real
8    Forecast  Wind Direction (deg)  wd      Real
9    Forecast  Wind U                u       Real
10   Forecast  Wind V                v       Real
11   Forecast  Issued                hp      Integer
12   SCADA     Production            wp      Real

Table 3.2: Preprocessed features from the dataset. The Forecast category represents the forecasts given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the one we use in training and testing.


The database for the meteorological forecasts given by the GEFCom dataset contains sections of missing data, each corresponding to a date on which the missing power information exists; these sections were filled out in a pre-processing step with the previous best forecast that was available. For example, if we do not have data for the forecast issued at 2011-01-01 12:00, we use data from 2011-01-01 00:00, as a 48-hours-ahead forecast is available for this date. If the previous section also contained missing data, we would go back an additional section and use those forecasts. If no forecast is available 48 hours back, we use the best known forecast and extend it in the same fashion as the persistence model.

Any predictions made by these models should fall within a certain range, so any forecast outside this range is clamped. This is the main post-processing step being done, i.e. there is an upper limit on what the wind farm can produce.

3.3 Holdback Input Randomization

The Holdback Input Randomization (HIPR) method described in Kemp et al. [2007] can be used to investigate the importance of the input parameters. It works by sequentially feeding each data point in the test set to the neural network while replacing the values of one input parameter at a time. The replacement values are uniformly distributed random values in the range the neural network was originally trained on, i.e. (-1, 1). An NRMSE score is calculated for each replacement. The result is that we get information about the relevance of each input.
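A sketch of the HIPR procedure, assuming a generic fitted prediction function and inputs scaled to (-1, 1); the toy model at the bottom is hypothetical and only there to show the mechanics:

```python
import numpy as np

def hipr_scores(predict, X, y, seed=0):
    """For each input column, replace it with uniform noise in (-1, 1) and
    record the NRMSE of the predictions; a large score marks an important
    input. `predict` stands in for any fitted model's prediction function."""
    rng = np.random.default_rng(seed)
    scores = []
    for j in range(X.shape[1]):
        X_rand = X.copy()
        X_rand[:, j] = rng.uniform(-1.0, 1.0, size=X.shape[0])
        scores.append(np.sqrt(np.mean((y - predict(X_rand)) ** 2)))
    return np.array(scores)

# toy model that only uses the first input column
predict = lambda X: X[:, 0]
X = np.linspace(-1.0, 1.0, 50).reshape(-1, 1).repeat(2, axis=1)
scores = hipr_scores(predict, X, y=X[:, 0])  # column 0 matters, column 1 does not
```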

3.4 Optimization methods

The two main optimization algorithms that have been used in this thesis are the Particle Swarm Optimization (PSO) algorithm presented by Eberhart and Kennedy [1995] and the Levenberg-Marquardt (LM)² algorithm, independently developed by Kenneth Levenberg and Donald Marquardt [Marquardt 1963, Levenberg 1944]. The PSO algorithm is a population-based stochastic algorithm, similar to a Genetic Algorithm (GA) but based on social-psychological principles instead of evolution. It

² The LM algorithm was used in the beginning of the project, but the results reported ended up using the PSO algorithm. I have included the LM algorithm in this section because speed optimization was measured using this algorithm.


can be summarized by the following steps. Each network was trained multiple times in order to avoid local minima.

• Step 1: Initialize particles with random velocities and accelerations.

• Step 2: Determine which particle is closest to the goal.

• Step 3: Adjust accelerations toward that particle.

• Step 4: Update particle positions based on their velocity, and update velocity based on acceleration.

• Step 5: Go to Step 2.
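The steps above can be sketched as a minimal global-best PSO; the inertia and acceleration coefficients are common textbook defaults, not values taken from the thesis:

```python
import numpy as np

def pso_minimize(loss, dim, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimizer: particles are pulled toward their own
    best position (pbest) and the swarm's best position (gbest)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, (n_particles, dim))   # Step 1: random positions
    v = np.zeros_like(x)                              # and velocities
    pbest, pbest_loss = x.copy(), np.array([loss(p) for p in x])
    gbest = pbest[np.argmin(pbest_loss)].copy()       # Step 2: best particle
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)  # Step 3
        x = x + v                                     # Step 4: move particles
        cur = np.array([loss(p) for p in x])
        better = cur < pbest_loss
        pbest[better], pbest_loss[better] = x[better], cur[better]
        gbest = pbest[np.argmin(pbest_loss)].copy()   # Step 5: repeat
    return gbest

best = pso_minimize(lambda p: float(np.sum(p ** 2)), dim=3)  # toy quadratic loss
```

In the setting of this thesis, the particle position would be the network's weight vector and `loss` the validation error of the resulting network.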

The LM algorithm is a combination of the Error Back-Propagation (EBP) method [Rumelhart et al. 1988] and the Gauss-Newton method. This algorithm has the speed advantage of Gauss-Newton and the stability of EBP [Yu and Wilamowski 2011]. A detailed treatment of LM can be found in Moré [1978], and of PSO in Poli et al. [2007].

3.5 Neural Networks

3.5.1 Multilayer Perceptron

The perceptron is built around a nonlinear model of a neuron, the McCulloch-Pitts model [McCulloch and Pitts 1943]. It basically consists of a 2-step process where the cell body contains a summation function of the weighted sum of all inputs, including a bias. The perceptron is described by equation 3.11. The sum s is passed through an activation function (see the activation functions below), which mimics the activation, or firing, of the neuron.

s = Σ_{i=1}^{M} w_i x_i + x_0    (3.11)

w_i is the weight of the "synapse" of the input channel and is the parameter we want to adjust, x_i is the input value, and x_0 is the bias. M denotes the number of inputs we have.
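Equation 3.11 followed by an activation function amounts to only a few lines of code:

```python
import numpy as np

def perceptron(x, w, bias, activation=np.tanh):
    """Weighted sum s = sum_i w_i x_i + bias (eq. 3.11), passed through an
    activation function, here tanh as used in the hidden layer (eq. 3.13)."""
    s = np.dot(w, x) + bias
    return activation(s)

out = perceptron(x=np.array([0.5, -0.2]), w=np.array([0.1, 0.4]), bias=0.0)
```

An MLP simply chains layers of such units, each layer feeding its outputs forward as the next layer's inputs.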


Figure 3.1: The perceptron. Input signals are weighted, summed together with a bias term, and passed through the activation function f(s) to produce the output signal.

The structure of the MLP consists of many perceptrons, and it is shown in figure 3.2. The input signals flow from the input layer at the bottom to the output layer at the top. We have a bias signal, seen on the left side of the diagram, which is set to a fixed number.

MLPs are fully connected networks, meaning that the neurons in any layer of the network are connected to all the neurons in the previous layer. Each connection in the network has a weight w_ij associated with it. The initialization of these weights is done before any training, and is achieved by randomly assigning very small values to the respective weights in the graph. The architecture used in this study has only one output variable, i.e. the function we approximate has an output that produces forecasts of the power generation given a certain input.


Figure 3.2: Architectural graph of the neural network that will produce a single output value. Input signals (hours, u, v, week, ws, ws-1, ws-2, ws+1, ws+2) and a bias signal feed the input layer; the hidden layer uses tanh activations and the output layer a linear activation. It consists of a collection of hidden neurons in each of H hidden layers, as well as M input connections. Each edge seen in this graph has a weight w_ij associated with it.

The performance of neural networks is generally improved if the data is normalised. This is because using the original data directly can cause convergence problems. Normalization is done using the mapminmax function seen in equation 3.12:

y = (y_max - y_min) · (x - x_min) / (x_max - x_min) + y_min    (3.12)

y_max is the max value of the specified range, which in this case is 1, and y_min is -1; x is the value to be scaled, x_max is the max value of the numbers to be scaled, and x_min is the min value of the numbers to be scaled.
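A direct transcription of equation 3.12:

```python
import numpy as np

def mapminmax(x, ymin=-1.0, ymax=1.0):
    """Linearly rescale x into the range [ymin, ymax] (eq. 3.12)."""
    x = np.asarray(x, dtype=float)
    return (ymax - ymin) * (x - x.min()) / (x.max() - x.min()) + ymin

scaled = mapminmax([0.0, 5.0, 10.0])  # [-1.0, 0.0, 1.0]
```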

The forecasting model consists of 7 different networks, one for each wind farm; these networks are trained on the first section of the dataset, before the test period. A random 60/20/20 split is created, where 60% of the available data is used for training and 20% of the data is used to validate the network, while the remaining 20% is set aside as a hold-out set for the hyperparameters. The input features³ fed into the models are

³ See table 3.2.


ws, u, v, hours, ws+1, ws+2, ws-1, ws-2, where ws+x and ws-x denote a time shift of x. We can use ws+x as input into the model because ws is a forecast in itself.

Activation Functions

The computation done for each neuron in the multilayer perceptron requires knowledge of the derivative of the activation function; in other words, the activation function we pick needs to be differentiable. In this thesis we use the following two activation functions: the hyperbolic tangent function, seen in equation 3.13, and

f(s) = tanh(s) = (e^s - e^{-s}) / (e^s + e^{-s})    (3.13)

the saturating linear transfer function, seen in equation 3.14:

f(s) = +1 if s ≥ 1,  f(s) = s if -1 < s < 1,  f(s) = -1 if s ≤ -1    (3.14)

Hyperparameter optimization

In order to obtain the hyperparameters necessary for the model of each wind farm, a random hyperparameter search was performed for all models. A hold-out validation set was used to pick the best hyperparameters. Random search has been shown to work better than grid search when not all parameters are equally important [Bergstra and Bengio 2012].
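A sketch of such a random search; the objective below is a stand-in for training a network and scoring it on the hold-out set, and the parameter names and values are hypothetical:

```python
import random

def random_search(score_fn, space, n_trials=20, seed=0):
    """Sample hyperparameter configurations uniformly at random from `space`
    and keep the one with the lowest hold-out score."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("inf")
    for _ in range(n_trials):
        cfg = {name: rng.choice(values) for name, values in space.items()}
        s = score_fn(cfg)
        if s < best_score:
            best_cfg, best_score = cfg, s
    return best_cfg, best_score

space = {"hidden_units": [5, 10, 20, 40], "learning_rate": [0.1, 0.01, 0.001]}
# stand-in objective pretending 20 units and learning rate 0.01 is best
cfg, score = random_search(
    lambda c: abs(c["hidden_units"] - 20) + abs(c["learning_rate"] - 0.01), space)
```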

3.5.2 Numenta Platform for Intelligent Computing

This section describes the key principles introduced with NuPIC. It explains in general terms the theory behind the Online Prediction Framework (OPF)⁴, which uses the CLA and HTM algorithms; the OPF works as an API to create predictive models. The OPF consists of 5 major types of components: Encoders, Spatial Pooler (SP), Temporal Memory (TM), Temporal Pooler (TP) and Classifiers. Together these components construct a single CLA region⁵, and figure 3.3 demonstrates the information flow through a single region.

⁴ The OPF is used with Numenta's commercial product Grok.
⁵ Currently, models created with the OPF do not use a TP, nor does this client allow creation of a hierarchy of regions, i.e. what we would call an HTM; it is only possible to create one region.


Another central concept in NuPIC is the Sparse Distributed Representation (SDR), which refers to the activation of a small percentage of the neurons at any given time. This neuronal activity is represented as an n-dimensional sparse binary vector x = [b_0, ..., b_n] with around 2% active cells. The outputs from the spatial pooler and the temporal memory are both SDRs, while the output from the encoder does not enforce this and is just a normal binary vector. A general overview of the properties of SDRs is given by Ahmad and Hawkins [2015].

Figure 3.3: Information flow of a single-region predictive model created with the OPF: Encoder → Spatial Pooler → Temporal Memory → Classifier.

The overlap between two different SDRs is defined by

o(x, y) = x · y    (3.15)

A match between two SDRs is defined by

m(x, y) := o(x, y) ≥ θ    (3.16)

where θ is set so that θ ≤ ||x||₁ and θ ≤ ||y||₁. An interesting property of SDRs, something that is used multiple times inside the temporal memory and especially for predictions, is the fact that a set of fixed-size SDRs can be reliably stored as a single pattern by taking their union. The boolean OR-operator is used to create a


Scalar   Encoding

1        11111000000000
2        01111100000000
10       00000000011111

Table 3.3: Examples, with n = 14, r = 5, ψ = 1, of various scalar values encoded using a ScalarEncoder.

new vector from a set of SDRs. The downside of storing patterns this way is that the more patterns we store, the bigger the probability of false positives.
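Equations 3.15 and 3.16 and the union property can be sketched directly on binary vectors:

```python
import numpy as np

def overlap(x, y):
    """Overlap o(x, y) = x . y: the number of shared active bits (eq. 3.15)."""
    return int(np.dot(x, y))

def match(x, y, theta):
    """Two SDRs match when their overlap reaches the threshold (eq. 3.16)."""
    return overlap(x, y) >= theta

def union(sdrs):
    """Store a set of SDRs as one pattern with boolean OR; membership can then
    be tested with `match`, at the cost of possible false positives."""
    return np.any(sdrs, axis=0).astype(int)

a = np.array([1, 1, 0, 0, 1, 0])
b = np.array([0, 1, 0, 1, 1, 0])
u = union([a, b])  # [1, 1, 0, 1, 1, 0]
```

Note that the vectors here are far denser than a real SDR, which would have thousands of bits with only around 2% active.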

Encoders

NuPIC contains many different encoders⁶; the job of an encoder is to convert raw input into a more suitable representation, i.e. a binary vector. The raw inputs are fed into the model using a dictionary data structure.

Useful encoders include scalar and categorical encoders while more exoticencoders include one fore Global Positioning System (GPS) coordinates Thisencoder can be used to extract information about anomalous movements Thedictionary representation of entries of raw inputs are each encoded separately andconcatenated using a multi encoder

One property of the spatial pooler is that overlapping input patterns are mappedto the same SDR This means that we want the encoder to encode input so thatsimilar inputs share bits The ScalarEncoder fulfil this property by the processillustrated in table 33

We calculate the range of values to encode using equation 3.17, where vmin represents the minimum value of the input signal and vmax denotes its upper bound:

vrange = vmax − vmin    (3.17)

There are three mutually exclusive parameters that determine the overall size of the output from a scalar encoder: (n, r, ψ). n directly represents the total number of bits in the output, and it must be greater than or equal to w, which represents the number of bits that are set to encode a single value, i.e. the "width" of the output signal.7 r and ψ are specified with respect to the input, while w is specified with respect to the output. Two inputs separated by more than the radius r have non-overlapping representations; two inputs separated by less than the radius will in general overlap in at least some of their bits. Two inputs separated by at least the resolution ψ are guaranteed to have different representations. ψ can be calculated using equation 3.18.

6A full list of all encoders can be found in the API documentation for NuPIC.

ψ = r / w    (3.18)

Depending on whether we want periodic behaviour or not, n is calculated slightly differently; see the ScalarEncoder implementation for details.8
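A non-periodic version of this encoding can be sketched as follows. This is an assumed simplification that reproduces the Table 3.3 examples, not NuPIC's actual ScalarEncoder, which also handles periodic inputs and resolves the mutually exclusive parameters:

```python
def encode_scalar(value, v_min, v_max, n=14, w=5):
    """Map a scalar to n bits with a run of w contiguous active bits.

    Nearby values share active bits, so similar inputs get similar codes.
    """
    v_range = v_max - v_min                      # equation 3.17
    n_buckets = n - w + 1                        # possible start positions
    frac = (value - v_min) / float(v_range)
    start = int(round(frac * (n_buckets - 1)))   # index of first active bit
    return "".join("1" if start <= i < start + w else "0" for i in range(n))

print(encode_scalar(1, v_min=1, v_max=10))   # 11111000000000
print(encode_scalar(2, v_min=1, v_max=10))   # 01111100000000
print(encode_scalar(10, v_min=1, v_max=10))  # 00000000011111
```

Values one resolution step apart shift the active run by one bit, so they overlap in w − 1 positions, which is exactly the "similar inputs share bits" property the spatial pooler relies on.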

Spatial Pooler

The spatial pooler receives a binary vector as its input from an encoder and outputs an SDR. The structure of the spatial pooler consists of an input space and a set of columns; the output SDR represents which of the columns in a region are active. Each column has synapses connected to the input space: around 50% of them are randomly and potentially connected, the so-called "potential pool". Each synapse connects and disconnects with the input space during learning.

The general information flow of the spatial pooler consists of the following steps. Input stimuli from lower regions lead to activation of the input space, to which each mini-column is synaptically connected. This activation, the so-called "overlap score", is computed for each column: the score is the total sum over the neurons that try to influence that column, weighted with a "boosting factor" that increases the chance for certain columns to win more easily. A final top percentage (usually around 2% activity) of the columns with the highest influence (biggest overlap score) is chosen, and columns also inhibit nearby columns; the result of this process is a binary vector with few active bits, an SDR. This is illustrated in equation 3.19, where each b is either 0 or 1 and s is a value representing the score.

7w must be odd to avoid centering problems.

8https://github.com/numenta/nupic


[b1 b2 b3 ... bn] · [b11 b12 b13 ... b1n; b21 b22 b23 ... b2n; ...; bd1 bd2 bd3 ... bdn] = [s1 s2 s3 ... sn] → (inhibition) → [b1 b2 b3 ... bn]    (3.19)

Here the first vector is the input vector, the matrix holds the connected synapses for each column, the product is the overlap score per column, and inhibition turns the scores into the output SDR.

Learning in this structure is done by adjusting the permanences on the proximal dendrites to better match the input: for each winning column, increase the permanence of the synapses that correctly matched the input and decrease the permanence of the rest. We also increase the boosting factor of losing columns to give them a bigger chance of winning next time.
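The overlap-score and inhibition steps can be sketched as a toy computation. This is illustrative only: in the real spatial pooler inhibition is local rather than global, k is around 2% of the columns, and learning then adjusts the permanences of the winners:

```python
def spatial_pool(input_bits, synapses, boost, k):
    """Return the output SDR: 1 for the k columns with highest boosted overlap."""
    scores = []
    for col, connected in enumerate(synapses):
        overlap = sum(i & c for i, c in zip(input_bits, connected))
        scores.append(boost[col] * overlap)      # boosting factor per column
    winners = sorted(range(len(synapses)), key=lambda c: -scores[c])[:k]
    return [1 if col in winners else 0 for col in range(len(synapses))]

inp = [1, 0, 1, 1]
synapses = [[1, 0, 1, 0],   # column 0: overlap 2
            [0, 1, 0, 0],   # column 1: overlap 0
            [1, 1, 1, 1],   # column 2: overlap 3
            [0, 0, 0, 1]]   # column 3: overlap 1
print(spatial_pool(inp, synapses, boost=[1.0] * 4, k=1))  # [0, 0, 1, 0]
```

Raising the boost of a starved column (say boost[3] = 4.0) would let it win despite a lower raw overlap, which is how boosting spreads activity across columns.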

Temporal Memory

The temporal memory receives an SDR from the spatial pooler, representing the active columns; the output of the temporal memory is the activity of the whole CLA region. The general idea of the TM9 is that the cells in each mini-column provide temporal context to the pattern identified by the spatial pooler. The TM's job is to form transitions between different SDRs, and it achieves this by allowing active cells to form connections to previously active cells, so that in a future setting each cell is able to predict its own activity.

Each cell in a region has several distal segments, ideally one for each pattern the cell has transitioned from; in practice these segments are each connected to a couple of patterns. A cell enters a predictive state if there is enough activity on a segment, and every segment consists of a collection of connections, synapses that have been formed to a subset of previously active cells (typically around 10-15 cells).

The temporal memory receives a sparse binary vector from the spatial pooler, and the general information flow for this part of the CLA goes through two phases. The first phase has the following steps: 1) for each active column, check whether any cell is in a predictive state; if there is such a cell, then 2) activate that particular cell. If no cell was found in a predictive state, then 3) activate all cells in that particular column, a process called bursting.

9There are a lot of details in how the temporal memory is implemented; pseudo-code and more details can be found in [Numenta 2011]. The nupic git repository is the best source for the finer details: https://github.com/numenta/nupic

Bursting is analogous to saying: we do not know which cell to activate, because we have not seen this instance of the sequence of patterns before and are unable to put the column into the correct temporal context, so let us activate all cells to reflect this uncertainty. Bursting does not occur when the temporal context has been predicted by the CLA. Equation 3.20 shows the first phase.

[b1 b2 b3 ... bn] applied to [b11 b12 b13 ... b1n; b21 b22 b23 ... b2n; ...; bd1 bd2 bd3 ... bdn] = [b11 b12 b13 ... b1n; b21 b22 b23 ... b2n; ...; bd1 bd2 bd3 ... bdn]    (3.20)

Here the SP SDR (the active columns) is applied to the matrix of cells in a predictive state, yielding the matrix of cells in an active state.

The second phase of the algorithm figures out which cells should be put into a predictive state for the next time-step. This is achieved by checking every distal segment of every cell for activity above a certain threshold, i.e. checking for active segments; one active segment is enough to put a cell into a predictive state. The output from the temporal memory is a vector representing the state of all cells in that region. Equation 3.21 shows phase 2.

[b1 b2 b3 ... bn] · [b11 b12 b13 ... b1n; b21 b22 b23 ... b2n; ...; bd1 bd2 bd3 ... bdn]X = [b1 b2 b3 ... bn]X > τ → [s1 s2 s3 ... sn]    (3.21)

Here the active state vector is multiplied with segment X to give the activation of segment X; segments whose activation exceeds the threshold τ put their cells into the predictive state.

If learning is turned on, the permanences of the synapses connected to active distal segments are updated (in the same way as in the spatial pooler). These changes are marked as temporary until we are sure that the cell is in fact able to correctly predict something with them; after that, the change is either made permanent or removed. Temporarily marked cells are updated whenever a cell goes from inactive to active from feed-forward input (make the update permanent, as we correctly predicted the feed-forward activation); if a cell instead went from active to inactive, the change is undone.
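The two phases can be sketched on a toy column/cell grid. Assumptions for the sketch: one distal segment per cell, given as a list of presynaptic cell indices into the flattened active-state vector, with no learning step:

```python
def phase1_activate(active_columns, predictive):
    """Phase 1: activate predicted cells in active columns; burst columns
    that were active but had no cell in a predictive state."""
    active = []
    for col_active, col_pred in zip(active_columns, predictive):
        if not col_active:
            active.append([0] * len(col_pred))
        elif any(col_pred):
            active.append(list(col_pred))        # activate predicted cells
        else:
            active.append([1] * len(col_pred))   # bursting
    return active

def phase2_predict(active_cells, segments, tau):
    """Phase 2: a cell becomes predictive if its segment has at least tau
    active presynaptic cells."""
    return [1 if sum(active_cells[j] for j in seg) >= tau else 0
            for seg in segments]

pred = [[0, 1], [0, 0]]                      # column 0, cell 1 was predicted
act = phase1_activate([1, 1], pred)
print(act)                                   # [[0, 1], [1, 1]]  (column 1 bursts)
flat = [c for col in act for c in col]       # flatten to cell indices 0..3
print(phase2_predict(flat, [[0, 1], [2], [1, 3], [0]], tau=2))  # [0, 0, 1, 0]
```

Only cell 2 reaches the segment threshold, so only it will be treated as predicted at the next time-step; its column would then activate that single cell instead of bursting.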


NuPIC Classifiers

There are many different classifiers included with NuPIC, such as a k-Nearest Neighbour (kNN) classifier and a Support Vector Machine (SVM); the OPF in particular uses a custom-built classifier called the "CLAClassifier". The purpose of this classifier is to decode predictions made by the CLA. Classification with the CLAClassifier is performed in different ways depending on how the input has been encoded.

Scalar values are decoded through a process where each cell in a CLA region is paired with two histograms. One histogram keeps track of the frequency of encountered patterns associated with the cell; the other keeps track of a moving average for each bucket. This process is illustrated in figure 3.4.

Figure 3.4: The CLAClassifier: for each column of the input SDR, every cell keeps two histograms over the value buckets between the minimum and maximum value, one for the likelihood and one for a moving average (2 histograms per cell).
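A heavily simplified sketch of this style of decoding follows. The per-cell histograms are assumed to be given, and the bucket names and structure are illustrative, not the CLAClassifier's actual data layout: each active cell votes for the buckets it has historically coincided with, and the winning bucket's moving-average value becomes the prediction:

```python
from collections import defaultdict

def decode(active_cells, cell_bucket_counts, bucket_avg):
    """Sum per-cell bucket frequencies over the active cells, then return
    the moving-average value of the bucket with the most votes."""
    votes = defaultdict(float)
    for cell in active_cells:
        for bucket, count in cell_bucket_counts.get(cell, {}).items():
            votes[bucket] += count
    best = max(votes, key=votes.get)
    return bucket_avg[best]

counts = {0: {"low": 3, "high": 1}, 1: {"high": 4}}
avg = {"low": 2.5, "high": 9.0}
print(decode([0, 1], counts, avg))  # "high" wins (5 votes vs 3) -> 9.0
```

The frequency histogram plays the role of the likelihood in figure 3.4, and the moving average turns the winning bucket back into a scalar prediction.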

Training in NuPIC

Training the NuPIC model is done using online learning algorithms. The scheme shown in figure 3.5 is used to train and test these models; since there are 7 different wind farms, the scheme is repeated for each wind farm.

This scheme is slightly adjusted from the traditional way the OPF handles multi-step predictions. The default is to use 48 different models in the CLAClassifier, one for every step ahead, which would give us 48 · 7 = 336 models in total (for all the wind farms). Not only does this change match Expektra's approach, but it also reduces the memory requirements of the CLAClassifier.

Figure 3.5: Training an OPF model. In the training phase, the hyperparameter setup is found through PSO swarming or manually on a pre-training data chunk; the model is then fed the training data stream with online learning activated. In the testing phase, online learning is deactivated and the model produces multi-step predictions on the testing data.

Input and hyperparameter selection

Finding hyperparameters for an OPF model was partly done using the custom built-in PSO algorithm and partly by manual configuration, mainly to ensure that the PSO algorithm did not remove encoders. A single CLA region contains a lot of hyperparameters; these parameters are listed with a corresponding description in appendix A. The inputs to the model are date, ws, wp, u and v.


Chapter 4

Result

This chapter presents the primary results obtained in this study. Figures 4.4-4.10 contain different error measurements on the test set for each wind farm. We see that Expektra's ANN model performs well over all wind farms, while the NuPIC model performs worse than Expektra but in general still better than our reference model. NuPIC is off target, with a bias error on most wind farms, and the graphs also indicate that NuPIC is unable to pick up some trends, as seen in the cumulated ε² graphs. Expektra's model, on the other hand, shows no clear problems in cumulated ε² and is on target on all wind farms; appendix C has been included to reflect this at different lead times.

Figure 4.1: Left diagram: cumulated probability of the wind speed. Right diagram: scatter diagram of the power curve, normalized power output vs wind speed. Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


Figure 4.2: Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).

Figure 4.3: Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


Figure 4.4: Different error measurements for WF 1: NBIAS, NRMSE and NMAE vs look-ahead time k (in hours), and cumulated ε² over time (in hours), for Expektra, NuPIC and Persistence.


Figure 4.5: Different error measurements for WF 2: NBIAS, NRMSE and NMAE vs look-ahead time k (in hours), and cumulated ε² over time (in hours), for Expektra, NuPIC and Persistence.


Figure 4.6: Different error measurements for WF 3: NBIAS, NRMSE and NMAE vs look-ahead time k (in hours), and cumulated ε² over time (in hours), for Expektra, NuPIC and Persistence.


Figure 4.7: Different error measurements for WF 4: NBIAS, NRMSE and NMAE vs look-ahead time k (in hours), and cumulated ε² over time (in hours), for Expektra, NuPIC and Persistence.


Figure 4.8: Different error measurements for WF 5: NBIAS, NRMSE and NMAE vs look-ahead time k (in hours), and cumulated ε² over time (in hours), for Expektra, NuPIC and Persistence.


Figure 4.9: Different error measurements for WF 6: NBIAS, NRMSE and NMAE vs look-ahead time k (in hours), and cumulated ε² over time (in hours), for Expektra, NuPIC and Persistence.


Figure 4.10: Different error measurements for WF 7: NBIAS, NRMSE and NMAE vs look-ahead time k (in hours), and cumulated ε² over time (in hours), for Expektra, NuPIC and Persistence.


                         Wind Farm
User          1      2      3      4      5      6      7      All
Leustagos     0.145  0.138  0.168  0.144  0.158  0.133  0.140  0.146
DuckTile      0.143  0.145  0.172  0.145  0.165  0.137  0.146  0.148
MZ            0.141  0.151  0.174  0.145  0.167  0.141  0.145  0.149
Propeller     0.144  0.153  0.177  0.147  0.175  0.141  0.147  0.152
Duehee Lee    0.157  0.144  0.176  0.160  0.169  0.154  0.148  0.155
Expektra      0.165  0.158  0.184  0.164  0.179  0.153  0.153  0.165
MTU EE5260    0.161  0.172  0.193  0.162  0.192  0.156  0.160  0.168
SunWind       0.174  0.177  0.193  0.176  0.179  0.157  0.162  0.172
ymzsmsd       0.163  0.186  0.200  0.164  0.192  0.162  0.167  0.174
4138 Kalchas  0.180  0.179  0.197  0.175  0.200  0.160  0.165  0.177
NuPIC         0.243  0.254  0.264  0.310  0.290  0.224  0.240  0.264
Persistence   0.302  0.338  0.373  0.364  0.388  0.341  0.361  0.355

Table 4.1: NRMSE scores of the entries published in [Hong et al., 2014]. The NuPIC and Expektra models are added so we can easily compare the results.

4.1 Experimental results

Looking at the primary results presented in this study, we see in table 4.1 that Expektra's ANN model achieves performance comparable to the other methods published with Hong et al. [2014]; this can be used as a starting point for further investigations. A similar approach to Expektra's is Duehee Lee's [Lee and Baldick, 2014], as both methods are based on the same neural network architecture but with different optimization techniques. Duehee Lee's network is structured slightly differently: Expektra's model has only 1 output node, whereas Duehee Lee uses five (representing five predictions ahead). Besides these differences, Duehee Lee has also included additional sub-models to create a prediction ensemble, blending the output of the ANN with a Gaussian Process (GP) to improve very short-term predictions. MTU EE5260 is also a neural network approach that converts meteorological forecasts to power, and its score is very close. The HTM CLA beats the persistence model but needs more work to outperform the other models in GEFCom.
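As a reference for reading Table 4.1, the NRMSE can be computed as sketched below. This assumes the GEFCom convention where power is already normalized to [0, 1] by installed capacity; the exact normalization used in the thesis is defined in the method chapter:

```python
import math

def nrmse(predicted, actual):
    """Root mean square error of normalized power values in [0, 1]."""
    n = len(predicted)
    mse = sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n
    return math.sqrt(mse)

print(round(nrmse([0.5, 0.3], [0.4, 0.5]), 4))  # 0.1581
```

Because power is normalized, a score of 0.165 (Expektra, "All") means the typical error is about 16.5% of installed capacity.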


4.2 Input Importance

For interpretation purposes, in order to understand the model better, an analysis of the relative input parameter importance was performed. This analysis is illustrated in figure 4.11. Each box represents added noise to that channel, which results in a higher NRMSE score if that feature was important. A reference point, "all-channels", represents the error distribution of the model with no input replacement. We clearly see in this figure that the wind speed channel ws is the most important attribute: adding noise to this channel greatly affects the NRMSE score. We also see that the wind components u and v show little to no influence, and that the timestamp-related inputs hours and week both indicate that there are seasonal and daily trends present in the dataset.

Figure 4.11: Relative input parameter importance using HIPR, shown as NRMSE distributions for each noised channel (all-channels, hours, u, v, week, ws, ws−1 to ws−3, ws+1 to ws+3). "all-channels" is the reference point: the network when no channel has been exposed to noise. ws = wind speed; hours and week = timestamps; u and v form the directional vector of the wind.
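The procedure behind figure 4.11 can be sketched as follows. This is an assumed simplification of the HIPR method of Kemp et al. [2007]: one channel at a time is replaced by noise and the model is re-scored; the names and the uniform-noise choice are illustrative:

```python
import random

def channel_importance(model, inputs, targets, channel, error_fn, trials=10):
    """Mean error of `model` when `channel` is replaced by uniform noise;
    the larger the error, the more important the channel."""
    errors = []
    for _ in range(trials):
        noisy = [row[:] for row in inputs]
        for row in noisy:
            row[channel] = random.random()   # destroy only this channel
        preds = [model(row) for row in noisy]
        errors.append(error_fn(preds, targets))
    return sum(errors) / trials

# Toy model that only looks at channel 0: noising channel 1 changes nothing.
model = lambda row: row[0]
mae = lambda p, t: sum(abs(a - b) for a, b in zip(p, t)) / len(p)
inputs, targets = [[0.5, 3.0]] * 4, [0.5] * 4
print(channel_importance(model, inputs, targets, 1, mae))  # 0.0
```

Repeating this per channel and comparing against the un-noised reference error gives exactly the box-per-channel picture of figure 4.11.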


4.2.1 Adaptation and Optimization

All training for Expektra's model was performed on a MacBook Pro with an Intel Core 2 Duo processor (P8600, 2.40 GHz) and 4 GB of RAM, running a 64-bit operating system. After adapting the code to run experiments on the GEFCom dataset, the performance of the core source code provided by Expektra was evaluated. This investigation identified some slow parts of the code. After fixing these issues, mainly by using a Math.NET native library instead of the managed provider, a speed test was performed comparing the unoptimized version with the optimized one. Figure 4.12 shows the speed-up achieved during training with the LM algorithm.

Figure 4.12: Training time of the unoptimized (Normal) version vs the optimized one, when training a network with 10 input neurons, 10/15/20/25 hidden neurons and 1 output neuron. Training was done for 100 epochs using LM.

4.3 Summary

A plot showing the average NRMSE improvement over the persistence model can be seen in figure 4.13. Both models perform better than persistence: Expektra's model outperforms the NuPIC model over the whole forecast horizon, and NuPIC performs better than persistence towards the end of the forecast.

Figure 4.13: Summarized average improvement (%) in NRMSE over the persistence model, across all wind farms, with 95% confidence intervals, for Expektra and NuPIC as a function of look-ahead time (in hours).
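Assuming improvement is measured as the relative NRMSE reduction against the reference, the summary values are consistent with the "All" column of Table 4.1:

```python
def improvement(ref_error, model_error):
    """Percentage improvement over the reference (persistence) model."""
    return 100.0 * (ref_error - model_error) / ref_error

# Using the "All" column of Table 4.1 (persistence NRMSE 0.355):
print(round(improvement(0.355, 0.165), 1))  # Expektra: 53.5
print(round(improvement(0.355, 0.264), 1))  # NuPIC: 25.6
```

Figure 4.13 applies the same calculation per look-ahead step instead of on the aggregate score.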


Chapter 5

Discussion

With the exponential growth of computer power it becomes easier to study deeper and more complex networks, and with recent advances in the area of deep learning, a new interest in ANNs has resurged. In practice, ANNs have been used by most groups in the field of WPF, but these networks never caught on, as it was argued that the improvements made by ANNs were not usually worth the extra effort of training them [Giebel et al., 2011]. This is steadily becoming less of an issue as computation becomes cheaper and bigger networks perform better.

By using a simple MLP network, we observe that we are able to obtain results similar in performance to other models published in Hong et al. [2014]; this can be used as a starting point for further investigations. The GEFCom dataset has a limited number of input features; if we had access to a wider range of features, the power of neural networks could be studied more in depth, but given the number of features in the dataset this is hard to do.

Using persistence as a reference model was done in order to have the same baseline as the one used in GEFCom, but a better reference model should be considered in the future, such as the one presented in Nielsen et al. [1998], especially if longer forecasts are to be considered.
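For reference, the persistence model itself is trivial: the forecast for every horizon is simply the last observed value. A sketch:

```python
def persistence_forecast(history, horizon):
    """p_hat(t + k | t) = p(t) for every look-ahead step k = 1..horizon."""
    return [history[-1]] * horizon

print(persistence_forecast([0.2, 0.4, 0.35], horizon=3))  # [0.35, 0.35, 0.35]
```

Its strength at short horizons and rapid decay at longer ones is what produces the rising improvement curves in figure 4.13.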

5.1 Method development issues

Working with Expektra's model was very straightforward and no major issues were encountered during development, but some issues were encountered when working with NuPIC1:

1It should also be pointed out that it helps to have the people who developed the code on the spot; this is probably the main reason we ran into so little trouble.


1. Installing NuPIC is difficult. This seems to be a general problem, judging by posts on the nupic mailing list2, and a very important issue to fix if they want more people to work with this model.

2. Using the built-in swarming (PSO) did not work well, especially for slightly larger datasets and for 48 prediction steps. The main problems were fitting everything into memory and that the code ran too slowly when swarming over multiple steps ahead, making it practically impossible to use. Scaling down the problem to just a few prediction steps and multiple models helped, but the swarm model kept dismissing wind speed as an important feature, which was fixed by explicitly telling it not to touch that encoder.

3. Inconsistencies between the white-paper documentation of the HTM/CLA and the actual implementation make it hard to understand the underlying principles of NuPIC. NuPIC presents a very complex network, and any inconsistency in the documentation makes it hard to understand. The code itself is quite well documented, but the additional material is a bit sparse.

These issues are all understandable and are continuously being improved upon; NuPIC is still a young platform, so issues are to be expected given the complexity and research nature of the project. Training the CLAClassifier to use many prediction steps instead of just one would most likely reduce the bias error seen in the results; to investigate this, a more powerful computer is needed.

There are some differences in the inputs between the models. This setup was created because an OPF model needs to encode the value it is going to predict. SCADA data is available, so this is not an issue in itself, but it may introduce some unfairness between NuPIC and Expektra. In the current setup NuPIC does not seem to gain anything extra from the additional information, and other models in the competition most likely used different inputs as well.

In general, NuPIC differs in the way you feed it data: you send in only the front of the signal, and temporal context is achieved through the temporal memory.

5.2 Future improvements and directions

One issue with the GEFCom dataset is that it does not contain any specific information about the location of the 7 wind farms. It is worth taking a look at how to handle different models that are specialized for different types of terrain, as this has been shown to increase performance [Kariniotakis et al., 2006]. Another thing to investigate could be a more advanced scheme for hyperparameter selection: Jursa and Rohrig [2008] show that the WPF error can be reduced by smarter use of optimization algorithms for feature and hyperparameter selection, which should improve the performance.

2http://numenta.org/lists

In general, having more types of inputs from sources like SCADA and NWP systems could give a lot of valuable information that can be used for better predictions. It is clear that wind speed is a key feature for producing good forecasts, which is also supported by the input analysis in section 4.2. It has been argued that the error in WPF models stems largely from wrong weather forecasts [Giebel et al., 2011], so identifying good sources of weather forecasts would be desirable for more effective model learning. Combining several different weather forecasts could also be investigated, as Nielsen et al. [2007] showed that power forecasts based on a number of different meteorological forecasts were better than those from a single source.

Regarding the HTM CLA, it is worth considering implementing a custom encoder, one specifically targeted at data concerning wind farms. SCADA data could also be streamed directly into a NuPIC model, which would allow for some interesting online predictions and anomaly detection.

Training times are always an issue, especially for very large models. Both NuPIC and Expektra's model would benefit from being implemented in a more parallel fashion, as these algorithms are very well suited for parallel computing. The speed-up would be of great value, so implementing a GPU version of the algorithms could be worthwhile.

5.3 Conclusions

The field of energy forecasting is still young, and the necessity for good forecasting techniques is constantly growing. This initial study shows that part of Expektra's forecasting method can be used to forecast wind power generation. We observe that this ANN is able to achieve performance comparable to other methods published in Hong et al. [2014], though additional work is still needed to obtain state-of-the-art results. Numenta's model is able to beat the reference model but needs additional work before we can draw definite conclusions about its performance; constructing a custom encoder could be a first step in that direction.


Bibliography

Subutai Ahmad and Jeff Hawkins. Properties of sparse distributed representations and their application to hierarchical temporal memory. arXiv preprint arXiv:1503.07469, 2015.

T. C. Akinci. Short term wind speed forecasting with ANN in Batman, Turkey. Elektronika ir Elektrotechnika, 107(1):41–45, 2015.

E. Michael Azoff. Neural Network Time Series Forecasting of Financial Markets. John Wiley & Sons, Inc., 1994.

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13(1):281–305, 2012.

Daniel P. Buxhoeveden and Manuel F. Casanova. The minicolumn hypothesis in neuroscience. Brain, 125(5):935–951, 2002.

Erasmo Cadenas and Wilfrido Rivera. Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks. Renewable Energy, 34(1):274–278, 2009.

Dmitri B. Chklovskii, B. W. Mel, and K. Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782–788, 2004.

A. Costa, A. Crespo, J. Navarro, G. Lizcano, H. Madsen, and E. Feitosa. A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews, 12(6):1725–1744, August 2008. ISSN 1364-0321. doi: 10.1016/j.rser.2007.01.015. URL http://dx.doi.org/10.1016/j.rser.2007.01.015.

Russ C. Eberhart and James Kennedy. A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science, volume 1, pages 39–43, New York, NY, 1995.


Shu Fan, James R. Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. Energy Conversion, IEEE Transactions on, 24(2):474–482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento: a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276. Citeseer, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition EWEC 2008, 6 pages. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving Hierarchical Temporal Memory-Based Trading Models. Springer, 2013.

M. A. Gaertner, C. Gallardo, C. Tejeda, N. Martínez, S. Calabria, N. Martínez, and B. Fernández. The Casandra project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. In Proc. of the 2001 European Wind Energy Conference, EWEC'01, Copenhagen, Denmark, pages 777–780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: A literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G. Lobo. Sipreólico: wind power prediction tool for the Spanish peninsular power system. In Proceedings of the CIGRÉ 40th General Session and Exhibition, París, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On Intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357–363, 2014.


Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41–46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585–592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694–709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference, EWEC'07, 2007.

G. Kariniotakis. Position paper on Joule project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J. Halliday, R. Brownsword, Ignacio Marti, Ana Maria Palomares, I. Cruz, H. Madsen, T. S. Nielsen, Henrik Aa. Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power: overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, 10 pages, 2006.

G. N. Kariniotakis, G. S. Stavrakakis, and E. F. Nogaret. Wind power forecasting using advanced neural networks models. Energy Conversion, IEEE Transactions on, 11(4):762–767, 1996.

Stanley J. Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326–334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275–293, 2009.

Lars Landberg and Simon J. Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171–195, 1994.

S. Lang, C. Mohrlen, J. Jorgensen, B. Ó Gallachóir, and E. McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International Solar Energy Society UK Section Conference C, volume 85, page 89, 2006.

51

BIBLIOGRAPHY

SM Lawan WAWZ Abidin WY Chai A Baharun and T Masri Different models ofwind speed prediction a comprehensive review International Journal of Scientificamp Engineering Research 5(1)1760ndash1768 2014

Duehee Lee and Ross Baldick Short-term wind power ensemble prediction basedon gaussian processes and neural networks Smart Grid IEEE Transactions on 5(1)501ndash510 2014

Ritchie Lee and Mariam Rajabi Assessing nupic and cla in a machine learningcontext using nasa aviation datasets 2014

Kenneth Levenberg A method for the solution of certain problems in least squaresQuarterly of applied mathematics 2164ndash168 1944

Gong Li and Jing Shi On comparing three artificial neural networks for wind speedforecasting Applied Energy 87(7)2313ndash2320 2010

Ziqiao Liu Wenzhong Gao Yih-Huei Wan and Eduard Muljadi Wind power plantprediction by using neural networks In Energy Conversion Congress and Exposition(ECCE) 2012 IEEE pages 3154ndash3160 IEEE 2012

Henrik Madsen Pierre Pinson George Kariniotakis Henrik Aa Nielsen and Torben SNielsen Standardizing the performance evaluation of shortterm wind powerprediction models Wind Engineering 29(6)475ndash489 2005

Donald W Marquardt An algorithm for least-squares estimation of nonlinearparameters Journal of the Society for Industrial amp Applied Mathematics 11(2)431ndash441 1963

Warren S McCulloch and Walter Pitts A logical calculus of the ideas immanent innervous activity The bulletin of mathematical biophysics 5(4)115ndash133 1943

Marvin Minsky and Papert Seymour Perceptrons 1969

Marvin Lee Minsky and Oliver G Selfridge Learning in random nets MIT LincolnLaboratory 1960

Jorge J Moreacute The levenberg-marquardt algorithm implementation and theory InNumerical analysis pages 105ndash116 Springer 1978

Henrik Aa Nielsen Torben S Nielsen Henrik Madsen Maria J Pindado and IgnacioMarti Optimal combination of wind power forecasts Wind Energy 10(5)471ndash482 2007

52

Torben Skov Nielsen Alfred Joensen Henrik Madsen Lars Landberg and GregorGiebel A new reference for wind power forecasting Wind energy 1(1)29ndash341998

Torben Skov Nielsen Henrik Madsen Henrik Aalborg Nielsen Gregor Giebel andLars Landberg Prediction of regional wind power 2002

Harri Niska Teri Hiltunen Ari Karppinen Juhani Ruuskanen and MikkoKolehmainen Evolving the neural network model for forecasting air pollutiontime series Engineering Applications of Artificial Intelligence 17(2)159ndash1672004

Numenta Hierarchical temporal memory including htm cortical learning algorithmsv021 Technical report Numenta 2011

Riccardo Poli James Kennedy and Tim Blackwell Particle swarm optimizationSwarm intelligence 1(1)33ndash57 2007

A Rodrigues JA Peccedilas Lopes P Miranda L Palma C Monteiro R Bessa J SousaC Rodrigues and J Matos Eprevndasha wind power forecasting tool for portugal InProceedings of the European Wind Energy Conference EWEC volume 7 2007

David E Rumelhart Geoffrey E Hinton and Ronald J Williams Learning representa-tions by back-propagating errors Cognitive modeling 53 1988

Juumlrgen Schmidhuber Deep learning in neural networks An overview NeuralNetworks 6185ndash117 2015

S Sinkevicius R Simutis and V Raudonis Monitoring of humans traffic usinghierarchical temporal memory algorithms Elektronika ir Elektrotechnika 115(9)91ndash96 2011

Ke-Sheng Wang Vishal S Sharma and Zhen-You Zhang Scada data based conditionmonitoring of wind turbines Advances in Manufacturing 2(1)61ndash69 2014

WWEA 2014 half-year report wwea pp 1ndash8 Technical report 2014

Wenxian Yang Richard Court and Jiesheng Jiang Wind turbine condition moni-toring by the approach of scada data analysis Renewable Energy 53365ndash3762013

Hao Yu and Bogdan M Wilamowski Levenberg-marquardt training IndustrialElectronics Handbook 512ndash1 2011

53

Appendix A

Hyper-parameters

Parameters for the spatial pooler

Alias                Default  Description

columnCount          -        The number of cell columns in a cortical region.
globalInhibition     false    If true, during the inhibition phase the winning columns are selected as the most active columns from the region as a whole.
numActivePerInhArea  10       The maximum number of active columns per inhibition area.
synPermActiveInc     0.1      The amount by which an active synapse is incremented in each round. Specified as a percent of a fully grown synapse.
synPermConnected     0.10     Controls the threshold at which synapses are considered connected.
synPermInactiveDec   0.01     The amount by which an inactive synapse is decremented in each round.
potentialRadius      16       Determines the extent of the input that each column can potentially be connected to.

Table A.1 Configuration parameters for the spatial pooler
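To make the roles of synPermConnected, numActivePerInhArea and globalInhibition concrete, here is a minimal, simplified sketch of the spatial pooler's overlap-and-inhibition step. This is an illustration of the idea only, not NuPIC's actual implementation, and the array sizes are made up: each column counts how many of its connected synapses (permanence at or above synPermConnected) line up with active input bits, and with global inhibition only the numActivePerInhArea columns with the highest overlap become active.

```python
import numpy as np

def spatial_pooler_step(input_bits, permanences,
                        syn_perm_connected=0.10, num_active=10):
    """Simplified spatial-pooler activation (global inhibition only).

    input_bits:  binary input vector, shape (input_size,)
    permanences: per-column synapse permanences, shape (columns, input_size)
    """
    # A synapse only contributes if its permanence crosses the threshold.
    connected = (permanences >= syn_perm_connected).astype(int)
    # Overlap = number of connected synapses aligned with active input bits.
    overlap = connected @ input_bits
    # Global inhibition: keep the num_active columns with highest overlap.
    winners = np.argsort(overlap)[::-1][:num_active]
    active = np.zeros(permanences.shape[0], dtype=bool)
    active[winners] = True
    return active

rng = np.random.default_rng(0)
perms = rng.random((64, 32))            # 64 columns, 32-bit input (made up)
x = (rng.random(32) < 0.3).astype(int)  # sparse binary input
act = spatial_pooler_step(x, perms)
print(act.sum())  # -> 10 active columns
```

With global inhibition the sparsity of the output is fixed by numActivePerInhArea regardless of the input; without it, NuPIC applies the same competition per local inhibition area instead.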


Parameters for the scalar encoder

Alias       Symbol  Description

w           w       Number of bits to set in the output.
minval      vmin    The lower bound of the input value.
maxval      vmax    The upper bound of the input value.
n           n       Number of bits in the representation (n must be > w).
radius      r       Inputs separated by more than or equal to this distance will have non-overlapping representations.
resolution  ψ       Inputs separated by more than or equal to this distance will have different representations.

Table A.2 Configuration parameters for the scalar encoder
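The interplay between w, n, minval and maxval can be illustrated with a toy encoder that places a contiguous block of w ON bits inside an n-bit output, positioned according to where the value falls in [minval, maxval]. This is a simplified sketch of the idea behind a scalar encoder, not NuPIC's actual ScalarEncoder code; it ignores the radius/resolution parameters and periodic inputs:

```python
def encode_scalar(value, minval=0.0, maxval=100.0, n=14, w=3):
    """Toy scalar encoder: a contiguous block of w ON bits whose position
    within the n-bit output encodes where value sits in [minval, maxval]."""
    assert n > w
    value = min(max(value, minval), maxval)   # clip to the valid range
    buckets = n - w + 1                       # possible block start positions
    i = int(round((value - minval) / (maxval - minval) * (buckets - 1)))
    bits = [0] * n
    for j in range(i, i + w):
        bits[j] = 1
    return bits

print(encode_scalar(0.0))    # block of ones at the far left
print(encode_scalar(100.0))  # block of ones at the far right
```

Nearby values share ON bits (overlapping blocks), which is what lets the spatial pooler treat similar inputs similarly; values further apart than the block width get non-overlapping representations.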


Parameters for the temporal memory

Alias                  Default  Description

activationThreshold    12       Activation threshold for segments.
cellsPerColumn         32       Number of cells per column.
columnCount            2048     The number of cell columns in a cortical region.
globalDecay            0.10     Decrements all synapses a little bit all the time.
initialPerm            0.11     Initial permanence value for a synapse.
inputWidth             -        Size of the input.
maxAge                 100000   Controls global decay. Global decay will only decay segments that have not been activated for maxAge iterations, and will only do the global decay loop every maxAge iterations.
maxSegmentsPerCell     -        The maximum number of segments a cell can have.
maxSynapsesPerSegment  -        The maximum number of synapses a segment can have.
minThreshold           8        The minimum required activity for a segment to learn.
newSynapseCount        15       The maximum number of synapses added to a segment during learning.
permanenceDec          0.10     How much permanence is removed from synapses when learning occurs.
permanenceInc          0.10     How much permanence is added to synapses when learning occurs.
temporalImp            cpp/py   Controls which temporal memory implementation to use.

Table A.3 Configuration parameters for the temporal memory
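The permanenceInc/permanenceDec pair drives the Hebbian-style learning rule: synapses on a learning segment that point to previously active cells are reinforced, all others decay, and permanences stay clipped to [0, 1]. A minimal sketch of that update for a single segment follows; it illustrates the idea only, is not NuPIC's implementation, and rounds values for readability:

```python
def learn_on_segment(permanences, presynaptic_active,
                     perm_inc=0.10, perm_dec=0.10):
    """Simplified permanence update for one segment: synapses to cells that
    were active are incremented, the rest are decremented, clipped to [0, 1]."""
    out = []
    for p, active in zip(permanences, presynaptic_active):
        p = p + perm_inc if active else p - perm_dec
        out.append(round(min(1.0, max(0.0, p)), 2))
    return out

perms = [0.05, 0.15, 0.95]          # made-up permanences for three synapses
active = [True, False, True]        # which presynaptic cells were active
print(learn_on_segment(perms, active))  # -> [0.15, 0.05, 1.0]
```

Note how the third synapse saturates at 1.0, and how a synapse can cross the connected threshold (synPermConnected in Table A.1) in either direction as permanence drifts up or down.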


Appendix B

Wind characteristics

Figure B.1 Wind characteristics for WF 1 and WF 2, the GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Figure B.2 Wind characteristics for WF 3–7, the GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Appendix C

Error Distribution

[Figure: six error histograms (wf1 using nupic), one panel per lead time 48, 40, 30, 20, 10 and 1; x-axis: error from −1.0 to 1.0, y-axis: frequency 0–70]

Figure C.1 Error distribution for different lead times, WF 1

[Figure: six error histograms (wf2 using nupic), one panel per lead time 48, 40, 30, 20, 10 and 1; x-axis: error from −1.0 to 1.0, y-axis: frequency 0–70]

Figure C.2 Error distribution for different lead times, WF 2

[Figure: six error histograms (wf3 using nupic), one panel per lead time 48, 40, 30, 20, 10 and 1; x-axis: error from −1.0 to 1.0, y-axis: frequency 0–70]

Figure C.3 Error distribution for different lead times, WF 3

[Figure: six error histograms (wf4 using nupic), one panel per lead time 48, 40, 30, 20, 10 and 1; x-axis: error from −1.0 to 1.0, y-axis: frequency 0–70]

Figure C.4 Error distribution for different lead times, WF 4

[Figure: six error histograms (wf5 using nupic), one panel per lead time 48, 40, 30, 20, 10 and 1; x-axis: error from −1.0 to 1.0, y-axis: frequency 0–70]

Figure C.5 Error distribution for different lead times, WF 5

[Figure: six error histograms (wf6 using nupic), one panel per lead time 48, 40, 30, 20, 10 and 1; x-axis: error from −1.0 to 1.0, y-axis: frequency 0–70]

Figure C.6 Error distribution for different lead times, WF 6

[Figure: six error histograms (wf7 using nupic), one panel per lead time 48, 40, 30, 20, 10 and 1; x-axis: error from −1.0 to 1.0, y-axis: frequency 0–70]

Figure C.7 Error distribution for different lead times, WF 7

[Figure: six error histograms (wf1 using expektra), one panel per lead time 48, 40, 30, 20, 10 and 1; x-axis: error from −1.0 to 1.0, y-axis: frequency 0–70]

Figure C.8 Error distribution for different lead times, WF 1

[Figure: six error histograms (wf2 using expektra), one panel per lead time 48, 40, 30, 20, 10 and 1; x-axis: error from −1.0 to 1.0, y-axis: frequency 0–70]

Figure C.9 Error distribution for different lead times, WF 2

[Figure: six error histograms (wf3 using expektra), one panel per lead time 48, 40, 30, 20, 10 and 1; x-axis: error from −1.0 to 1.0, y-axis: frequency 0–70]

Figure C.10 Error distribution for different lead times, WF 3

[Figure: six error histograms (wf4 using expektra), one panel per lead time 48, 40, 30, 20, 10 and 1; x-axis: error from −1.0 to 1.0, y-axis: frequency 0–70]

Figure C.11 Error distribution for different lead times, WF 4

[Figure: six error histograms (wf5 using expektra), one panel per lead time 48, 40, 30, 20, 10 and 1; x-axis: error from −1.0 to 1.0, y-axis: frequency 0–70]

Figure C.12 Error distribution for different lead times, WF 5

[Figure: six error histograms (wf6 using expektra), one panel per lead time 48, 40, 30, 20, 10 and 1; x-axis: error from −1.0 to 1.0, y-axis: frequency 0–70]

Figure C.13 Error distribution for different lead times, WF 6

[Figure: six error histograms (wf7 using expektra), one panel per lead time 48, 40, 30, 20, 10 and 1; x-axis: error from −1.0 to 1.0, y-axis: frequency 0–70]

Figure C.14 Error distribution for different lead times, WF 7

List of Figures

2.1  A figure that presents the general outline when forecasting using the statistical approach
2.2  A figure that presents the general steps when forecasting using a physical model
3.1  The perceptron
3.2  Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of H hidden layers as well as M input connections. Each edge seen in this graph has a weight wij associated with it
3.3  Information flow of a single region predictive model created with the OPF
3.4  The CLAClassifier
3.5  Training an OPF model
4.1  Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.2  Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.3  Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.4  Different error measurement for WF 1
4.5  Different error measurement for WF 2
4.6  Different error measurement for WF 3
4.7  Different error measurement for WF 4
4.8  Different error measurement for WF 5
4.9  Different error measurement for WF 6
4.10 Different error measurement for WF 7
4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point, i.e. the network when no channel has been exposed to noise. ws = wind speed, hours/week = timestamps, u, v is a directional vector of the wind
4.12 Plot showing the unoptimized version vs the optimized when training a network with 10 input neurons, 10/15/20/25 hidden neurons and 1 output neuron. Training was done for 100 epochs using LM
4.13 Summarized average improvement over all wind farms with 95% confidence intervals
B.1  Wind characteristics for WF 1 and WF 2. The GEFCom dataset showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power
B.2  Wind characteristics for WF 3–7. The GEFCom dataset showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power
C.1  Error distribution for different lead times, WF 1
C.2  Error distribution for different lead times, WF 2
C.3  Error distribution for different lead times, WF 3
C.4  Error distribution for different lead times, WF 4
C.5  Error distribution for different lead times, WF 5
C.6  Error distribution for different lead times, WF 6
C.7  Error distribution for different lead times, WF 7
C.8  Error distribution for different lead times, WF 1
C.9  Error distribution for different lead times, WF 2
C.10 Error distribution for different lead times, WF 3
C.11 Error distribution for different lead times, WF 4
C.12 Error distribution for different lead times, WF 5
C.13 Error distribution for different lead times, WF 6
C.14 Error distribution for different lead times, WF 7

List of Tables

3.1 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, data with power observations are available for updating the models
3.2 Preprocessed features from the dataset. The Forecast category represents the forecast given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the feature we will use in training and testing
3.3 Example where n = 14, r = 5, ψ = 1 of various encoded scalar values using a ScalarEncoder
4.1 NRMSE score of the entries published in [Hong et al., 2014]. The NuPIC model and Expektra model are added so we can easily compare the results
A.1 Table containing configuration parameters for the spatial pooler
A.2 Table containing configuration parameters for the scalar encoder
A.3 Table containing configuration parameters for the temporal memory



5 Discussion
5.1 Method development issues
5.2 Future improvements and directions
5.3 Conclusions

Bibliography

Appendices

A Hyper-parameters

B Wind characteristics

C Error Distribution

List of Figures

List of Tables

Chapter 1

Introduction

It has been estimated by the World Wind Energy Association (WWEA) that by the year 2020 around 12% of the world's electricity will be available through wind power, making wind energy one of the fastest growing energy resources [WWEA, 2014; Fan et al., 2009]. But integrating wind energy into existing electricity supply systems has been a challenge, and numerous objections have been put forward by traditional energy suppliers and grid operators, especially for large-scale use of this energy source. The biggest concern is that availability mainly depends on meteorological conditions, and production cannot be adjusted as conveniently as with other, more conventional energy sources, because of our inability to control the wind. A single Wind Turbine (WT) is highly variable, and its dependency on wind conditions can result in zero output for thousands of hours during the course of a year; aggregating wind power generation over bigger areas, however, decreases this chance.

This is where wind power forecasting systems come into play, a technology that can greatly improve the integration of wind energy into electricity supply systems, as forecasting systems provide information on how much wind power can be expected at any given point within the next few days. This removes some of the randomness attributed to wind energy and allows a more accurate way to utilize this clean energy source, while offsetting some of our dependency on other, less environmentally friendly sources, which in the long run will cause a smaller degenerative impact on the environment.

There are many commercial forecasting models available. Prediktor1 [Landberg and Watson, 1994] is a physical model developed by the Risø National Laboratory, Denmark. It is constructed to refine Numerical Weather Prediction (NWP) in order

1 http://www.prediktor.no


to transform data through a power curve to produce the forecast, while improving the error rate with Model Output Statistics (MOS). Previento [Focken et al., 2001], developed at the University of Oldenburg, Germany, uses a similar approach to Prediktor but with regional forecasting and uncertainty estimation. The Wind Power Prediction Tool (WPPT)2 [Nielsen et al., 2002] is a statistical model developed by the Technical University of Denmark, and it consists of a semi-parametric power curve model for wind farms, taking into account both wind speed and direction. It uses dynamical prediction models describing the dynamics of the wind power and any diurnal variations. Zephyr [Giebel et al., 2000] is a hybrid model that combines both WPPT and Prediktor; in this model each wind farm is assigned a forecast model according to the available data. Sipreólico [Gonzalez et al., 2004], developed by Red Eléctrica de España, is a statistical model that was designed to be highly flexible depending on available data. It achieves this by switching between 9 different models. Aiolos Wind3 is a hybrid model developed by Vitec that creates forecasts by combining a statistical model with physical factors, such as wind speed at different altitudes, wind direction and air density.

Expektra4 is a service-based company, founded in 2010, that provides and develops a new method for short-term demand and supply forecasting based on an Artificial Neural Network (ANN) and traditional time series analysis. They are currently expanding into the area of wind power forecasting and looking for suitable methods. ANNs have been used successfully on both wind speed forecasting [Lawan et al., 2014] and wind power forecasting [Kariniotakis et al., 1996]. It was demonstrated in Liu et al. [2012] that a complex-valued recurrent neural network was able to predict output with high accuracy, and Huang et al. [2015] showed that optimizing the initial weights of a back-propagated neural network with a genetic algorithm gave impressive results, so there are good reasons to think that part of the method Expektra is developing can be used for wind power forecasting.

Neural networks in general are highly flexible, and recent advances in deep learning have been shown to outperform previous models in different domains [Schmidhuber, 2015]. These networks are very good at automatically finding features that by hand would take a lot of time and effort to achieve. One network that shares similarities with deep learning, and has received less attention, is the Hierarchical Temporal Memory (HTM) Cortical Learning Algorithm (CLA) developed by Numenta [Numenta, 2011]. This network is also built around the idea of having hierarchical structures, creating a deep neural network. HTM CLA is

2 http://www.enfor.eu
3 http://www.vitecsoftware.com
4 http://www.expektra.se


currently tailored very specifically to time-series problems and has, at the moment, little research published around it, making it an ideal candidate for time series prediction with unknown potential on wind power forecasting problems.

Even though there are a lot of prominent methods already developed for wind power forecasting, there is still room for improvements. Energy forecasting is such an important topic that competitions have been created around it. The Global Energy Forecasting Competition (GEFCom) [Hong et al., 2014] is a competition that can be used to help evaluate the performance of new forecasting models. It has attracted hundreds of contestants from all over the world, which has resulted in the contribution of many new and novel ideas. The GEFCom was created in order to:

1 Bring together state-of-the-art techniques for energy forecasting

2 Bridge the gap between academic research and industry practice

3 Promote analytical approaches in power and energy education

4 Prepare the industry to overcome forecasting challenges posed by the smart grid world

5 Improve energy forecasting practices

Benchmark datasets and competitions are a valuable resource when evaluating new models, as they allow for a common ground to stand on. With the publication of Hong et al. [2014], the dataset in GEFCom2012 was also published. This dataset consists of data from 7 different wind farms spanning a time period of three years, with observational data of the energy production and weather forecasts.

The GEFCom is divided into four tracks: load forecasting, price forecasting, wind power forecasting and solar power forecasting. The specified problem in the wind power forecasting track is built around the real-time operation of wind farms, and the dataset is structured accordingly. Participants try to predict hourly power generation up to 1–48 hours ahead, given meteorological forecasts and historically produced power.
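A common reference model for this kind of 1–48 hour task is persistence, which simply repeats the last observed production at every horizon; any candidate model is expected to beat it. A minimal sketch of that baseline (the helper name is made up for illustration, and power is assumed normalized to [0, 1] as in the GEFCom data):

```python
def persistence_forecast(last_observed_power, lead_times=range(1, 49)):
    """Persistence baseline: predict the last observed (normalized) power
    for every lead time, here 1..48 hours ahead."""
    return {k: last_observed_power for k in lead_times}

forecast = persistence_forecast(0.42)
print(forecast[1], forecast[48])  # the same value at every horizon
```

Because wind production decorrelates with lead time, persistence is hard to beat in the first hour or two but degrades quickly, which is why NWP-driven models dominate at longer horizons.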

1.1 Problem Formulation

The objective of this thesis is to adapt and analyse the robustness and suitability of Expektra's method when it is applied to the domain of forecasting short-term wind


power generation, i.e. what is needed to adapt this model to wind power forecasting problems?

The second objective of this thesis is to investigate HTM CLA [Numenta, 2011], a modern state-of-the-art computational theory of the neocortex, i.e. is it possible to use the Numenta Platform for Intelligent Computing (NuPIC) for wind power forecasting problems?

These questions will be addressed by evaluating the models against other models published in the area of short-term wind power forecasting, more specifically those that have been published in GEFCom.

1.2 The scope of the problem

1 This study will focus on short-term forecasting, i.e. forecasts done for 1–48 hours ahead. How to perform well on longer forecasts is left to further investigations.

2 Wind power forecasting is closely related to wind speed forecasting, i.e. trying to use local information at the wind turbine instead of NWP data to forecast wind power. This thesis will not go into any specific details on how to forecast wind speeds; readers can take a look at Li and Shi [2010], Cadenas and Rivera [2009], Akinci [2015].

3 There are a lot of neural networks that are worth studying, but this thesis will focus on Expektra's method and the HTM CLA inside NuPIC. NuPIC was picked over other deep learning methods because of its direct relation to time series prediction.


Chapter 2

Background

Wind Power Forecasting (WPF) has applications in generation and transmission maintenance planning and energy optimization, as well as in energy trading. WPF models exist at different scales, and they can be used to predict the production of anything from a single WT to a whole Wind Farm (WF). WPF models are generally divided into two main groups, physical and statistical, but hybrid state-of-the-art methods are also common [Giebel et al., 2001; Lang et al., 2006]. The physical approach [Landberg and Watson, 1994; Gaertner et al., 2003] focuses on integrating well-known physical aspects into the model, such as information about the surrounding terrain and properties of the WT. These models try to get as good an estimate of the local wind speed as possible before finally reducing the remaining error with some form of MOS. Statistical approaches [Rodrigues et al., 2007; Gonzalez et al., 2004] rely more on historical observations and their statistical relation to meteorological predictions, as well as measurements from Supervisory Control And Data Acquisition (SCADA) systems.

SCADA is a control system that is used in most industrial processes and is the main tool to evaluate the state of individual power plants. As the name indicates, it is not a full control system but rather a system for supervision. In the case of wind turbines it allows remote access to online data that has been gathered by sensors inside and around the plant. Variables accessible through these systems are those directly related to the wind flow as well as the power produced by the plant, i.e. things like rotor wind speed, nacelle position, pitch angle, active power, etc.


Figure 2.1: The general outline when forecasting using the statistical approach.

Forecast models that use SCADA data as their primary input source usually have good forecast accuracy, at least for the first few hours, but they tend to be less useful for longer prediction horizons [Giebel et al., 2011]. SCADA data can also be used to detect problems in WTs, something that can be helpful to improve the reliability of WTs [Yang et al., 2013, Wang et al., 2014].

Statistical models, seen in figure 2.1, are usually built around Numerical Weather Prediction (NWP) and SCADA data. NWP models are often used to build forecasting models as they introduce weather forecasts for the region where the wind turbines are located. NWP data usually contains information about things like wind speed, wind direction, temperature and humidity. These models are operated two or four times a day by a number of large weather services. The main forecasts usually start at 00 and 12 UTC, corresponding to the worldwide radiosonde launches, which are the only direct observation of the atmospheric state and have been the backbone of atmospheric monitoring for many decades1; extra forecasts usually start at 06 and 18 UTC. Physical models, such as the one seen in figure 2.2, include additional information about the physical characteristics of the wind turbine and its surroundings, i.e. terrain data, information about obstacles, and the capacity and layout of the site where the turbine is located. Other useful information for physical models includes the theoretical

1One problem with this approach is that it results in less information over large oceans and poorer countries.


power curve: how much power is expected to be produced given a specific wind speed.

The time scale of WPF methods is generally divided into 3 main groups: very-short-term (up to 9 hours), short-term (up to 72 hours) and medium-term (up to 7 days) [Costa et al., 2008], while the time step for these models is in the range of seconds to days depending on the application. Very-short-term models used for wind power forecasting consist of statistical methods like Kalman filters, Auto-Regressive Moving Average (ARMA), Auto-Regressive with Exogenous Input (ARX), Box-Jenkins, etc. Input to these models are historical observations of wind speed, wind direction, temperature, etc., and common applications for this forecast horizon include things like intraday market trading. Since these methods are based merely on past production, they are generally not useful for longer horizons.

Figure 2.2: The general steps when forecasting using a physical model (inputs: SCADA data, NWP data and WFC data; processing steps: downscaling, transformation to hub height, spatial refinements, and conversion to power using the WT power curve; Model Output Statistics (MOS) are applied before the final wind power generation forecast).

Short-term forecasting models include additional data about historical observations, usually obtained from SCADA systems, but more importantly weather forecasts from NWP models. Machine learning methods used in this area include Neural Networks [Kusiak et al., 2009], Support Vector Machines [Fugon et al., 2008], Nearest Neighbour Search [Jursa et al., 2007], Random Forests, etc. Medium-term forecasting models usually incorporate all the methods used for shorter forecasts as well as physical characteristics.


2.1 Neural Networks and Time Series Prediction

One of the most common ANNs is the so-called Multilayer Perceptron (MLP), a network built around some very simple properties of a cortical neuron. This network has been around since the 50's, has matured with a solid mathematical foundation, and has been applied successfully to many different applications. It is built around the idea of having multiple layers of neurons, each layer feeding information forward to the next layer, i.e. a feed-forward neural network.

The main advantage of using multiple layers instead of just a single one is to overcome limitations pointed out by Minsky and Selfridge [1960], Minsky and Seymour [1969], where a single-layer perceptron is only capable of learning linearly separable patterns and thus unable to learn all functions. The MLP is not limited in this way and has been shown to be able to represent a wide variety of functions given appropriate parameters [Hornik et al., 1989].

NuPIC is a platform actively developed and maintained by Numenta. It introduces a collection of ideas and algorithms that are inspired by the structural organization of the neocortex. Two of the core concepts found inside NuPIC are the so-called Hierarchical Temporal Memory and the Cortical Learning Algorithm; HTM was introduced in Hawkins and Blakeslee [2007] and refers to a hierarchy of cortical regions in the brain.

The neocortex is classically divided into 6 different layers2. The current implementation in NuPIC is focused on emulating layers 3-4 (specifically layer 3), with extensions and research code being developed to include more layers.

A CLA region consists of a collection of columns, and each column consists of a handful of cells. This structure is based on the minicolumn hypothesis [Buxhoeveden and Casanova, 2002]. Each column in the CLA region has its own semantic meaning, and the sparse activity of a handful of active columns tells us something about the input. The CLA is modelled so that specific cells within each column reflect the temporal context of a pattern, and a single cortical region (a CLA region) is trained with the CLA algorithm.

A typical CLA region consists of around 2K columns containing around 60K neurons in total, while a typical MLP may have fewer than 100 neurons. Each neuron in an HTM network grows new synapses over time, and it is not uncommon to have around 5K synapses per neuron, meaning we would have around 300M synapses in total. A single region of this size uses around 100 MB of memory and takes around 10 ms for one inference and learning step3.

2These layers should not be confused with the hierarchy of regions.
3These values are given in the talk "Sensor-Motor Integration in the Neocortex" at the 2013 Hackathon.


The HTM/CLA also differs from the MLP in that it does not directly use scalar weights to represent synaptic connectivity. Synapses inside an HTM have binary weights with a scalar permanence: each synapse will either be connected or disconnected, while the MLP uses scalar weights. The connectedness in the HTM/CLA is based on the permanence, a value between 0.0 and 1.0; if the permanence is over a certain threshold we have a connected synapse, i.e. we have weights of either 1 or 0. This binary mechanism is there to simulate synapses that are able to form and unform during learning. So essentially we have a weight-change network versus a wiring-change network [Chklovskii et al., 2004].

NuPIC today is naturally geared towards time-series prediction, which makes it an interesting candidate for wind forecasting. NuPIC has been shown to outperform the MLP on financial time series [Gabrielsson et al., 2013] and it has been used successfully to balance traffic [Sinkevicius et al., 2011]. It has also been used on the NASA Aviation Dataset [Lee and Rajabi, 2014], but with less promising results. MLPs have been used successfully for many kinds of time series problems [Azoff, 1994, Niska et al., 2004, Jain and Kumar, 2007].


Chapter 3

Method and Materials

The purpose of this chapter is to provide an explanation of the methods used in this thesis. It presents terms and concepts needed and used in the field of Machine Learning, with a focus on how to create forecasts based on time series. It presents the motivation for why these methods are used as well as how they are used. This chapter also contains a description of the implemented artificial neural networks, how they are structured and optimized. It provides details about the experimental setup and how these experiments relate to GEFCom, and finally a description of how the datasets are structured and processed.

3.1 Preliminaries

3.1.1 Remarks

The methodology used to evaluate the prediction models presented in this thesis is based on the protocol presented in Madsen et al. [2005]1, a complete protocol that can be used to evaluate the performance of short-term WPF and Wind-to-Power (W2P) models. This protocol was chosen because it has been successfully used before as a guideline to evaluate a wide variety of forecast models, such as AWPPS, Prediktor, Previento, Sipreólico and WPPT. The protocol has been used for both on-shore and off-shore wind farms and was developed in the frame of the ANEMOS research project [Kariniotakis et al., 2006], which brought together many relevant groups involved in the field. The aim of ANEMOS was to develop accurate and robust models that substantially outperform current state-of-the-art methods, and one of its goals was to establish a common set

1It should be pointed out that no widely agreed standardization exists.


of performance measures that can be used to compare forecasts across systems and locations.

3.1.2 Definitions

Time series

A time series is a sequence X of observations $x_t$, each with a particular time-stamp t, i.e. $X = [x_1, x_2, \dots, x_t]$, in short $X_{1:t}$; it is a time-dependent collection of variables.

Forecast horizon

A multi-step-ahead prediction is the task of predicting k forecasts $X_{t+1:t+k}$ given a collection of historical observations $X_{t-p+1:t}$. In the case of GEFCom we want a forecast for 1–48 steps ahead.

Point or spot forecast

In this paper we model the forecast $\hat{p}_{t+k|t}$ as a so-called point forecast (or spot forecast), i.e. a single value for each forecast (compared to having a probability distribution).

Prediction error

The prediction error e for lead time t + k is defined in equation 3.1 as the difference between the forecast and the actual value, where t denotes the time index, k is the look-ahead time, p is the actual (measured, true) wind power and $\hat{p}$ is the predicted wind power:

$e_{t+k|t} = p_{t+k} - \hat{p}_{t+k|t}$ (3.1)

The normalized prediction error $\varepsilon$ is seen in equation 3.2:

$\varepsilon_{t+k|t} = \frac{1}{p_{\text{inst}}} e_{t+k|t} = \frac{1}{p_{\text{inst}}} (p_{t+k} - \hat{p}_{t+k|t})$ (3.2)

where $p_{\text{inst}}$ is the installed capacity of the wind farm (in kW or MW), which is unknown in the competition.

12

31 PRELIMINARIES

3.1.3 Reference models

Persistence (also called a naïve or plain predictor), as seen in equation 3.3, is the model most commonly used for benchmarking WPF models. This simple model states that the future wind generation value will be the same as the last measured value:

$\hat{p}^{\text{persistence}}_{t+k|t} = p_t$ (3.3)

An alternative would be an even simpler model, the climatology prediction of equation 3.4, i.e. predicting the mean, a value approximated from the training set (see section 3.1.5):

$\hat{p}^{\text{mean}}_{t+k|t} = \bar{p} = \frac{1}{N}\sum_{t=1}^{N} p_t$ (3.4)

There are other reference models, like the one suggested by Nielsen et al. [1998], which have advantages over persistence, but it was never widely adopted and is not used by GEFCom.
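As a concrete reference, the two baselines above can be sketched in a few lines (a minimal sketch with made-up normalized power values; GEFCom's actual evaluation pipeline is not reproduced here):

```python
import numpy as np

def persistence_forecast(p_t, k):
    """Persistence model (equation 3.3): every k-step-ahead forecast
    equals the last measured power value."""
    return np.full(k, p_t)

def climatology_forecast(train_series, k):
    """Climatology model (equation 3.4): every forecast equals the
    mean of the training series."""
    return np.full(k, np.mean(train_series))

# Hypothetical normalized power history
history = np.array([0.2, 0.4, 0.3, 0.5])
print(persistence_forecast(history[-1], 3))  # [0.5 0.5 0.5]
print(climatology_forecast(history, 3))      # [0.35 0.35 0.35]
```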

3.1.4 Error metrics

In order to understand why specific models perform well, it is usually a good idea to evaluate them against a wide variety of different criteria, as is emphasised by Kariniotakis [1997]. The following sections describe the error measures used; in this section N denotes the size of the test set.

Forecast Bias

The Normalized Forecast Bias (NBIAS) describes the systematic error and is defined in equation 3.5; it is estimated by calculating the average error for each step ahead. It gives an indication of the direction of the error:

$\text{NBIAS}_k = \frac{1}{N}\sum_{t=1}^{N} \varepsilon_{t+k|t}$ (3.5)

This bias is sometimes referred to by some authors as the Mean Forecast Error (MFE). A perfect MFE does not mean the forecast is perfect and contains no error, as positive and negative errors cancel each other out, but it gives an indication that the forecasts are on target.


Mean Absolute Error

The Normalized Mean Absolute Error (NMAE) is an error quantity that looks at the average of the absolute error of the prediction and is defined in equation 3.6:

$\text{NMAE}_k = \frac{1}{N}\sum_{t=1}^{N} |\varepsilon_{t+k|t}|$ (3.6)

Another common name used instead of Mean Absolute Error (MAE) is Mean Absolute Deviation (MAD). This value shows the magnitude of the overall forecasting error and should thus be as small as possible. This error is scale dependent and will be affected by data transformations and the scale of measurements.

Mean Squared Error

The Normalized Mean Squared Error (NMSE) is an error quantity that looks at the average of the squared errors $\varepsilon^2_{t+k|t}$ and is built using the Normalized Sum of Squared Errors (NSSE) defined in equation 3.7:

$\text{NSSE}_k = \sum_{t=1}^{N} \varepsilon^2_{t+k|t}$ (3.7)

NMSE is defined in equation 3.8:

$\text{NMSE}_k = \frac{1}{N}\text{NSSE}_k = \frac{1}{N}\sum_{t=1}^{N} \varepsilon^2_{t+k|t}$ (3.8)

In this error, positive and negative errors do not cancel each other out, and large individual errors are penalized more harshly.

Root Mean Squared Error

The Normalized Root Mean Squared Error (NRMSE) is the square root of the NMSE and is defined in equation 3.9:

$\text{NRMSE}_k = \text{NMSE}_k^{1/2} = \left(\frac{1}{N}\sum_{t=1}^{N} \varepsilon^2_{t+k|t}\right)^{1/2}$ (3.9)

NRMSE is the main metric used in GEFCom and it shares the same properties as NMSE.
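The error measures of this section can be collected into one small helper (a sketch; the function name and vectorized layout are my own, not from the thesis):

```python
import numpy as np

def error_metrics(p_true, p_pred, p_inst):
    """Per-horizon error metrics from section 3.1.4, computed over a test
    set of N forecasts for one fixed look-ahead time k. p_inst is the
    installed capacity used for normalization (equation 3.2)."""
    eps = (p_true - p_pred) / p_inst   # normalized errors (eq. 3.2)
    nbias = np.mean(eps)               # eq. 3.5
    nmae = np.mean(np.abs(eps))        # eq. 3.6
    nmse = np.mean(eps ** 2)           # eq. 3.8
    nrmse = np.sqrt(nmse)              # eq. 3.9
    return nbias, nmae, nmse, nrmse
```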


3.1.5 Model selection

In regression and classification, one of the main issues we face is: "How do we create a good model?" One way to define this "goodness" is to look at the model's generalization ability, i.e. its performance on unseen data.

To make sure the model we are building will generalize to new unseen data, it is important how we select and build the model in the first place; what we have control over is the data we have at hand and how we use that data to build our model.

Training, testing and validating

In the case of supervised learning we have access to a target series and its associated features, i.e. the production series (our target) with wind speed and wind direction as features.

Common practice within Machine Learning, which is adhered to in this thesis, is to split the whole dataset into 3 smaller subsets, then use two of these subsets to find a good model (the training set and validation set), while the remaining one (the test set) is used for evaluation, i.e. to estimate how accurate the created model will be on "unseen data". With highly flexible models like an artificial neural network we need to be careful not to overfit the data, which is one of the reasons for the validation set: we do not want a model that generalizes poorly because it is fitted to every minor variation, i.e. has captured a lot of noise.

3.1.6 Evaluation

The improvement with respect to a considered reference model ref is defined in equation 3.10:

$I^{\text{ref}}_{\text{EC},k} = 100 \cdot \frac{EC^{\text{ref}}_k - EC_k}{EC^{\text{ref}}_k}\ (\%)$ (3.10)

where Evaluation Criterion (EC) is any error measurement, such as NMSE, NMAE, etc.
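Equation 3.10 in code form (a trivial sketch; any error measure can be passed in as the evaluation criterion):

```python
def improvement(ec_ref, ec):
    """Improvement over a reference model in percent (equation 3.10).
    Positive values mean the model beats the reference."""
    return 100.0 * (ec_ref - ec) / ec_ref

# e.g. NRMSE of 0.15 versus a persistence NRMSE of 0.20
print(improvement(0.20, 0.15))
```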


Testing period

Date        Time   Forecast
2011-01-01  01:00  1-48 hours
2011-01-04  13:00  1-48 hours
2011-01-08  01:00  1-48 hours
2011-01-11  13:00  1-48 hours
...
2012-06-23  01:00  1-48 hours
2012-06-26  13:00  1-48 hours

Table 3.1: The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00; the second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, data with power observations is available for updating the models.

3.2 Experiments

Training and testing are structured based on GEFCom. The dataset spans from midnight of the 1st of July 2009 to noon of the 26th of June 2012. The period from the 1st of July 2009 to the 1st of January 2011 at 01:00 is used for training and validation, while the rest is used for testing. In the testing range a number of 48-hour periods are defined (see table 3.1); in between these testing periods exists additional training data, which enables the models to be updated between forecasts.

The testing periods repeat every 7 days until the end of the dataset, and only meteorological forecasts that were relevant for the periods with missing power data are given; this was done in order to be consistent.

Meteorological forecasts in the GEFCom dataset consist of zonal u and meridional v components collected for surface winds at 10 m above ground level. They were extracted from the archive of the European Centre for Medium-range Weather Forecasts (ECMWF), a service that issues high-resolution deterministic forecasts twice a day, at 00 Coordinated Universal Time (UTC) and at 12 UTC; each of these forecast periods consists of data for 1–48 hours ahead. In order to match the hourly resolution of the power measurements, the forecasts were interpolated using cubic splines to an hourly resolution. A summary of the features found in the dataset is given in table 3.2.
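The interpolation step can be sketched with SciPy's `CubicSpline` (assumed available; the 3-hourly grid and the wind-speed values below are illustrative, not the actual ECMWF data):

```python
import numpy as np
from scipy.interpolate import CubicSpline

# NWP forecasts arrive on a coarser grid than the hourly power data.
# A synthetic 3-hourly wind-speed series stands in for the real forecasts.
t_nwp = np.arange(0, 49, 3)               # forecast lead times in hours
ws_nwp = 8.0 + 2.0 * np.sin(t_nwp / 6.0)  # made-up wind-speed values

spline = CubicSpline(t_nwp, ws_nwp)       # fit a cubic spline to the knots
t_hourly = np.arange(0, 49)
ws_hourly = spline(t_hourly)              # hourly-resolution forecast
```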


No  Category  Parameter             Alias  Type
1   Date      Date                  date   String
2   Date      Year                  year   Integer
3   Date      Month                 month  Integer
4   Date      Day                   day    Integer
5   Date      Hour                  hours  Integer
6   Date      Week                  week   Integer
7   Forecast  Wind Speed            ws     Real
8   Forecast  Wind Direction (deg)  wd     Real
9   Forecast  Wind U                u      Real
10  Forecast  Wind V                v      Real
11  Forecast  Issued                hp     Integer
12  SCADA     Production            wp     Real

Table 3.2: Preprocessed features from the dataset. The Forecast category represents the forecasts given by the NWP; multiple forecasts will exist for a given date, and the latest issued forecast available is the feature we use in training and testing.


The database of meteorological forecasts given by the GEFCom dataset contains sections of missing data, each corresponding to a date for which power information is also missing; these sections were filled out in a pre-processing step with the previous best forecast available. For example, if we do not have data for the forecast issued at 2011-01-01 12:00, we use data from 2011-01-01 00:00, as a 48-hours-ahead forecast is available for this date. If the previous section also contained missing data, we would go back an additional section and use those forecasts. If no forecast is available 48 hours back, we use the best known forecast and extend it in the same fashion as the persistence model.

Any predictions made by these models should fall within a certain range, so any forecast outside this range is clamped. This is the main post-processing step, i.e. there is an upper limit on what can be produced.
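The clamping step is a one-liner (a sketch assuming power normalized to [0, 1], as in the GEFCom data):

```python
import numpy as np

def clamp_forecast(p_pred, p_min=0.0, p_max=1.0):
    """Post-processing: clamp forecasts to the feasible production range."""
    return np.clip(p_pred, p_min, p_max)

print(clamp_forecast(np.array([-0.1, 0.5, 1.3])))  # [0.  0.5 1. ]
```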

3.3 Holdback Input Randomization

The Holdback Input Randomization (HIPR) method, described in Kemp et al. [2007], can be used to investigate the importance of the input parameters. It works by sequentially feeding each data point in the test set to the neural network while replacing the values of one input parameter at a time. The replacement values are uniformly distributed random values in the range the neural network was originally trained on, i.e. (−1, 1). An NRMSE score is calculated for each replacement, and the result is information about the relevance of each input.
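A sketch of HIPR (the `predict` interface and the dummy data are assumptions for illustration, not the thesis's implementation):

```python
import numpy as np

def hipr_scores(model, X_test, y_test, seed=0):
    """Holdback Input Randomization sketch: replace one input column at a
    time with uniform noise in (-1, 1) and record the NRMSE; a large
    increase marks an important feature. `model` is any object with a
    .predict(X) method (hypothetical interface)."""
    rng = np.random.default_rng(seed)

    def nrmse(y, y_hat):
        return np.sqrt(np.mean((y - y_hat) ** 2))

    scores = []
    for j in range(X_test.shape[1]):
        X_rand = X_test.copy()
        X_rand[:, j] = rng.uniform(-1.0, 1.0, size=len(X_rand))
        scores.append(nrmse(y_test, model.predict(X_rand)))
    return np.array(scores)
```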

3.4 Optimization methods

The two main optimization algorithms that have been used in this thesis are the Particle Swarm Optimization (PSO) algorithm presented by Eberhart and Kennedy [1995] and the Levenberg-Marquardt (LM)2 algorithm independently developed by Kenneth Levenberg and Donald Marquardt [Marquardt, 1963, Levenberg, 1944]. The PSO algorithm is a population-based stochastic algorithm similar to a Genetic Algorithm (GA) but based on social-psychological principles instead of evolution. It

2The LM algorithm was used in the beginning of the project, but the reported results ended up using the PSO algorithm. The LM algorithm is included in this section because speed optimization was measured using it.


can be summarized by the following steps. Each network was trained multiple times in order to avoid local minima.

• Step 1: Initialize particles with random velocities and accelerations.

• Step 2: Determine which particle is closest to the goal.

• Step 3: Adjust accelerations toward that particle.

• Step 4: Update particle positions based on their velocities, and update velocities based on accelerations.

• Step 5: Go to step 2.

The LM algorithm is a combination of the Error Back-Propagation (EBP) method [Rumelhart et al., 1988] and the Gauss-Newton method. This algorithm has the speed advantage of Gauss-Newton and the stability of EBP [Yu and Wilamowski, 2011]. A detailed treatment of LM can be found in Moré [1978] and of PSO in Poli et al. [2007].
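The steps above can be sketched as a minimal global-best PSO (the coefficients w, c1, c2 are common textbook defaults, not the settings used in the thesis; the sphere function stands in for a network's validation error):

```python
import numpy as np

def pso_minimize(f, dim, n_particles=30, iters=200, seed=0,
                 w=0.7, c1=1.5, c2=1.5):
    """Minimal global-best PSO sketch. f maps a position vector to a
    scalar cost, e.g. a network's validation error."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-1, 1, (n_particles, dim))
    vel = rng.uniform(-0.1, 0.1, (n_particles, dim))
    pbest = pos.copy()                            # per-particle best positions
    pbest_val = np.apply_along_axis(f, 1, pos)
    g = pbest[np.argmin(pbest_val)].copy()        # swarm's best position
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # pull each particle toward its own best and the swarm's best
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (g - pos)
        pos = pos + vel
        val = np.apply_along_axis(f, 1, pos)
        improved = val < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], val[improved]
        g = pbest[np.argmin(pbest_val)].copy()
    return g, float(f(g))

# Minimizing the sphere function converges toward the origin
best, best_val = pso_minimize(lambda x: float(np.sum(x ** 2)), dim=3)
```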

3.5 Neural Networks

3.5.1 Multilayer Perceptron

The perceptron is built around a nonlinear model of a neuron, the McCulloch-Pitts model [McCulloch and Pitts, 1943]. It basically consists of a 2-step process where the cell body contains a summation function computing the weighted sum of all inputs, including a bias; the perceptron is described by equation 3.11. The sum s is passed through an activation function (see the activation functions below) which mimics the activation, or firing, of the neuron:

$s = \sum_{i=1}^{M} w_i x_i + x_0$ (3.11)

$w_i$ is the weight of the "synapse" of the input channel and is the parameter we want to adjust, $x_i$ is the input value, and $x_0$ is the bias. M denotes the number of inputs.
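Equation 3.11 in code, followed by a tanh activation (a minimal sketch of a single neuron):

```python
import numpy as np

def neuron(x, w, bias):
    """Weighted sum of equation 3.11 followed by a tanh activation."""
    s = np.dot(w, x) + bias
    return np.tanh(s)

# Opposite inputs with equal weights and no bias cancel out
print(neuron(np.array([1.0, -1.0]), np.array([0.5, 0.5]), 0.0))  # 0.0
```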


Figure 3.1: The perceptron. Input signals are weighted, summed together with the bias, and passed through the activation function f(s) to produce the output signal.

The structure of the MLP consists of many perceptrons and is shown in figure 3.2. The input signal flows from the input layer at the bottom to the output layer at the top. There is a bias signal, seen on the left side of the diagram, which is set to a fixed number.

MLPs are fully connected networks, meaning that the neurons in any layer of the network are connected to all the neurons in the previous layer. Each connection in the network has a weight $w_{ij}$ associated with it. The initialization of these weights is done before any training and is achieved by randomly assigning very small values to the respective weights in the graph. The architecture used in this study has only one output variable, i.e. the approximated function produces forecasts of the power generation given a certain input.


Figure 3.2: Architectural graph of the neural network, which produces a single output value. It consists of a collection of hidden neurons in each of H hidden layers as well as M input connections (hours, u, v, week, ws and the shifted wind speeds ws−2, ws−1, ws+1, ws+2); the hidden layers use tanh activations and the output layer a linear activation. Each edge in this graph has a weight $w_{ij}$ associated with it.

The performance of neural networks is generally improved if the data is normalised, because using the original data directly can cause convergence problems. Normalization is done using the mapminmax function seen in equation 3.12:

$y = \frac{(y_{\max} - y_{\min}) \cdot (x - x_{\min})}{x_{\max} - x_{\min}} + y_{\min}$ (3.12)

$y_{\max}$ is the maximum value of the specified range, which in this case is 1, and $y_{\min}$ is −1; x is the value to be scaled, $x_{\max}$ is the maximum of the values to be scaled and $x_{\min}$ their minimum.
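Equation 3.12 as a function (the name follows MATLAB's mapminmax, which the notation suggests; treat that correspondence as an assumption):

```python
import numpy as np

def mapminmax(x, ymin=-1.0, ymax=1.0):
    """Equation 3.12: linearly map x from [x.min(), x.max()]
    to [ymin, ymax]."""
    x = np.asarray(x, dtype=float)
    return (ymax - ymin) * (x - x.min()) / (x.max() - x.min()) + ymin

print(mapminmax([0.0, 5.0, 10.0]))  # [-1.  0.  1.]
```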

The forecasting model consists of 7 different networks, one for each wind farm; these networks are trained on the first section of the dataset, before the test period. A random 60/20/20 split is created, where 60% of the available data is used for training, 20% is used to validate the network, and 20% is set aside as a hold-out set for the hyperparameters. The input features3 fed into the models are

3See table 3.2.


ws, u, v, hours, ws+1, ws+2, ws−1, ws−2, where ws+x and ws−x denote a time shift of x; we can use ws+x as input to the model because ws is a forecast in itself.
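The 60/20/20 random split can be sketched index-based (the seed is arbitrary; this is not the thesis's exact splitting code):

```python
import numpy as np

def random_split(n, seed=0):
    """Random 60/20/20 split into training, validation and
    hyperparameter hold-out indices."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)                 # shuffle sample indices
    a, b = int(0.6 * n), int(0.8 * n)
    return idx[:a], idx[a:b], idx[b:]

train_idx, val_idx, hold_idx = random_split(100)
```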

Activation Functions

The computation done for each neuron in the multilayer perceptron requires knowledge of the derivative of the activation function; in other words, the activation function we pick needs to be continuous. In this thesis we use the following two activation functions: the hyperbolic tangent function seen in equation 3.13, and

$f(s) = \tanh(s) = \frac{e^s - e^{-s}}{e^s + e^{-s}}$ (3.13)

the saturating linear transfer function seen in equation 3.14:

$f(s) = \begin{cases} +1 & \text{if } s \ge 1 \\ s & \text{if } -1 < s < 1 \\ -1 & \text{if } s \le -1 \end{cases}$ (3.14)
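Both transfer functions in code (a sketch; `satlin` is my name for the saturating linear function of equation 3.14, not a name from the thesis):

```python
import numpy as np

def tanh_act(s):
    """Hyperbolic tangent activation (equation 3.13)."""
    return np.tanh(s)

def satlin(s):
    """Saturating linear transfer function (equation 3.14):
    identity on (-1, 1), clipped to ±1 outside."""
    return np.clip(s, -1.0, 1.0)

print(satlin(np.array([-2.0, 0.5, 2.0])))  # [-1.   0.5  1. ]
```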

Hyperparameter optimization

In order to obtain the hyperparameters necessary for the model of each wind farm, a random hyperparameter search was performed for all models, and a hold-out validation set was used to pick the best hyperparameters. Random search has been shown to work better than grid search when not all parameters are equally important [Bergstra and Bengio, 2012].

3.5.2 Numenta Platform for Intelligent Computing

This section describes the key principles introduced with NuPIC. It explains in general terms the theory behind the Online Prediction Framework (OPF)4, which uses the CLA and HTM algorithms; the OPF works as an API to create predictive models. The OPF consists of 5 major types of components: Encoders, Spatial Pooler (SP), Temporal Memory (TM), Temporal Pooler (TP) and Classifiers. Together these components construct a single CLA region5, and figure 3.3 demonstrates the information flow through a single region.

4The OPF is used with Numenta's commercial product GROK.
5Currently, models created with the OPF do not use a TP, nor does this client allow creation of a hierarchy of regions (i.e. what we would call an HTM); it is only possible to create one region.


Another central concept in NuPIC is the Sparse Distributed Representation (SDR), which refers to the activation of a small percentage of the neurons at any given time. This neuronal activity is represented as an n-dimensional sparse binary vector $x = [b_0, \dots, b_n]$ with around 2% active cells. The outputs from the spatial pooler and temporal memory are both SDRs, while the output from the encoder does not enforce this and is just a normal binary vector. A general overview of the properties of SDRs is given by Ahmad and Hawkins [2015].

Figure 3.3: Information flow of a single-region predictive model created with the OPF: Encoder → Spatial Pooler → Temporal Memory → Classifier.

The overlap between two different SDRs is defined by

$o(x, y) = x \cdot y$ (3.15)

A match between two SDRs is defined by

$m(x, y) := o(x, y) \ge \theta$ (3.16)

where θ is set so that $\theta \le \|x\|_1$ and $\theta \le \|y\|_1$. An interesting property of SDRs, used multiple times inside the temporal memory and especially for predictions, is that a fixed-size set of SDRs can be reliably stored by taking their union as a single pattern. The boolean OR operator is used to create a


Scalar  Encoding
1       11111000000000
2       01111100000000
10      00000000011111

Table 3.3: Example encodings of various scalar values using a ScalarEncoder with n = 14, r = 5, ψ = 1.

new vector from a set of SDRs. The downside of storing patterns this way is that the more patterns we store, the bigger the probability of false positives.
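Equations 3.15-3.16 and the union property on toy vectors (sizes are illustrative; a real region uses around 2048 columns with ~2% activity):

```python
import numpy as np

# Two sparse binary vectors with overlapping active bits
x = np.zeros(20, dtype=bool); x[[1, 4, 7, 9]] = True
y = np.zeros(20, dtype=bool); y[[1, 4, 7, 15]] = True

overlap = int(np.sum(x & y))   # o(x, y) = x · y  (equation 3.15)
match = overlap >= 3           # m(x, y) with threshold θ = 3 (equation 3.16)
union = x | y                  # storing a set of SDRs as one pattern

print(overlap, match)          # 3 True
```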

Encoders

NuPIC contains many different encoders6; the job of an encoder is to convert raw input into a more suitable representation (i.e. a binary vector). The raw input is fed into the model using a dictionary data structure.

Useful encoders include scalar and categorical encoders, while more exotic encoders include one for Global Positioning System (GPS) coordinates; this encoder can be used to extract information about anomalous movements. The entries of the dictionary representation of the raw input are each encoded separately and concatenated using a multi-encoder.

One property of the spatial pooler is that overlapping input patterns are mapped to the same SDR. This means that we want the encoder to encode input so that similar inputs share bits. The ScalarEncoder fulfils this property by the process illustrated in table 3.3.

We calculate the range of values to encode using equation 3.17, where $v_{\min}$ represents the minimum value of the input signal and $v_{\max}$ denotes its upper bound:

$v_{\text{range}} = v_{\max} - v_{\min}$ (3.17)

There are three mutually exclusive parameters that determine the overall size of the output from a scalar encoder: (n, r, ψ). n directly represents the total number

6A full list of all encoders can be found in the API documentation for NuPIC.


of bits in the output, and it must be greater than or equal to w, which represents the number of bits that are set to encode a single value, i.e. the "width" of the output signal7. r and ψ are specified with respect to the input, while w is specified with respect to the output. Two inputs separated by more than the radius have non-overlapping representations; two inputs separated by less than the radius will in general overlap in at least some of their bits; two inputs separated by greater than or equal to the resolution are guaranteed to have different representations. ψ can be calculated using equation 3.18:

$\psi = \frac{r}{w}$ (3.18)

Depending on whether we want periodic behaviour or not, n is calculated a bit differently; see the scalar encoder for implementation details8.
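A sketch of a non-periodic scalar encoder that reproduces table 3.3 (this is a simplified stand-in, not NuPIC's actual ScalarEncoder implementation):

```python
import numpy as np

def scalar_encode(v, vmin, vmax, n=14, w=5):
    """Encode v as w consecutive active bits out of n, so nearby values
    share bits (as in table 3.3)."""
    v = max(vmin, min(v, vmax))               # clip to the input range
    buckets = n - w + 1                       # number of start positions
    i = int(round((v - vmin) / (vmax - vmin) * (buckets - 1)))
    out = np.zeros(n, dtype=int)
    out[i:i + w] = 1
    return out

print(scalar_encode(1, 1, 10))   # [1 1 1 1 1 0 0 0 0 0 0 0 0 0]
print(scalar_encode(10, 1, 10))  # [0 0 0 0 0 0 0 0 0 1 1 1 1 1]
```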

Spatial Pooler

The spatial pooler receives a binary vector as its input from an encoder and outputs an SDR. The structure of the spatial pooler consists of an input space and a set of columns, and the output SDR represents which of the columns in a region are active. Each column has synapses connected to the input space: around 50% of the input bits are randomly chosen as potential synapses, the so-called "potential pool", and each synapse connects and disconnects with the input space during learning.

The general information flow of the spatial pooler consists of the following steps: input stimuli from lower regions lead to activation of the input space to which each mini-column is synaptically connected; this activation, the so-called "overlap score", is computed for each column. The score is calculated as the total sum over the neurons that try to influence that column, weighted by a "boosting factor" that increases the chance for certain columns to win more easily. A final top percentage (usually around 2% activity) of the columns with the highest influence (biggest overlap score) is chosen; columns also inhibit nearby columns, and the result of this process is a binary vector with few active bits, an SDR. This is illustrated in equation 3.19, where b can either be 0 or 1 and s is a value representing the score.

7w must be odd to avoid centering problems.
8https://github.com/numenta/nupic


$$\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \cdots & b_n \end{bmatrix}}_{\text{Input vector}} \cdot \underbrace{\begin{bmatrix} b_{11} & b_{12} & b_{13} & \cdots & b_{1n} \\ b_{21} & b_{22} & b_{23} & \cdots & b_{2n} \\ \vdots & & & & \vdots \\ b_{d1} & b_{d2} & b_{d3} & \cdots & b_{dn} \end{bmatrix}}_{\text{Connected synapses for each column}} = \underbrace{\begin{bmatrix} s_1 & s_2 & \cdots & s_n \end{bmatrix}}_{\text{Overlap score}} \xrightarrow{\text{Inhibition}} \underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \cdots & b_n \end{bmatrix}}_{\text{Output SDR}} \quad (3.19)$$

Learning in this structure is done by adjusting the permanences of the proximal dendrites to better match the input: for each winning column, the permanence of the synapses that correctly matched the input is increased and the permanence of the rest is decreased. We also increase the boosting factor of losing columns to give them a bigger chance of winning next time.
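The overlap-and-inhibition step of equation 3.19 can be sketched as follows (boosting and local inhibition are omitted, global inhibition is used instead; all sizes are illustrative):

```python
import numpy as np

def spatial_pool(x, perm, threshold=0.5, active_frac=0.02):
    """Spatial-pooler sketch: each column's overlap score counts input
    bits seen through connected synapses (permanence >= threshold);
    global inhibition keeps the top active_frac of columns."""
    connected = perm >= threshold             # binary synapse matrix
    scores = connected @ x                    # overlap score per column
    k = max(1, int(active_frac * len(scores)))
    winners = np.argsort(scores)[-k:]         # columns surviving inhibition
    sdr = np.zeros(len(scores), dtype=int)
    sdr[winners] = 1
    return sdr

rng = np.random.default_rng(0)
x = rng.integers(0, 2, 64)                    # encoder output
perm = rng.random((128, 64))                  # permanences for 128 columns
sdr = spatial_pool(x, perm)
print(sdr.sum())                              # 2 active columns (~2%)
```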

Temporal Memory

The temporal memory receives an SDR from the spatial pooler, which represents the active columns; the output of the temporal memory is the activity of the whole CLA region. The general idea of the TM9 is that the cells in each mini-column provide temporal context to the pattern identified by the spatial pooler. The TM's job is to form transitions between different SDRs, and it achieves this by allowing active cells to form connections to previously active cells, so that in a future setting each cell is able to predict its own activity.

Each cell in a region has several distal segments, ideally one for each pattern the cell has transitioned from; in practice these segments are each connected to a couple of patterns. A cell enters a predictive state if there is enough activity on a segment, and every segment consists of a collection of connections, synapses that have been formed to a subset of previously active cells (typically around 10-15 cells).

The Temporal Memory receives a sparse binary vector from the spatial pooler, and the general information flow for this part of the CLA goes through two phases. The first phase consists of the following steps: 1) for each active column, check whether any cell is in a predictive state; if so, 2) activate that particular cell. If no cell was found in a predictive state, 3) activate all cells in that particular column, a process called bursting.

Bursting is analogous to saying: we do not know which cell to activate, because we have not seen this instance of the sequence of patterns before and are unable to put the column into the correct temporal context, so let us activate all cells to reflect this uncertainty. Bursting does not occur when the temporal context has been predicted by the CLA. Equation 3.20 shows the first phase.

9 There are a lot of details in how the Temporal Memory is implemented; pseudo-code and more details can be found in [Numenta, 2011]. The nupic git repository is the best source for finer details: https://github.com/numenta/nupic

\[
\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \cdots & b_n \end{bmatrix}}_{\text{SP SDR}}
\cdot
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \cdots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \cdots & b_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \cdots & b_{dn}
\end{bmatrix}}_{\text{Predictive state}}
=
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \cdots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \cdots & b_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \cdots & b_{dn}
\end{bmatrix}}_{\text{Active state}}
\qquad (3.20)
\]
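The two phase-1 rules above (activate predicted cells, otherwise burst) can be sketched as follows. The `(column, cell)` pair representation and parameter names are illustrative, not NuPIC's actual data structures.

```python
def tm_phase1(active_columns, predictive_cells, cells_per_column=32):
    """Phase 1 of a toy temporal-memory step: pick the active cells.

    active_columns   -- column indices of the SDR from the spatial pooler
    predictive_cells -- set of (column, cell) pairs predicted last step
    """
    active_cells = set()
    for col in active_columns:
        predicted = [(col, c) for c in range(cells_per_column)
                     if (col, c) in predictive_cells]
        if predicted:
            # Context was anticipated: activate only the predicted cells.
            active_cells.update(predicted)
        else:
            # Bursting: unknown context, activate every cell in the column.
            active_cells.update((col, c) for c in range(cells_per_column))
    return active_cells

# Column 0 had a predicted cell; column 1 had none and therefore bursts.
active = tm_phase1([0, 1], {(0, 3)}, cells_per_column=4)
```

Here column 0 contributes only its predicted cell while column 1 bursts all four of its cells, mirroring the uncertainty described above.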

The second phase of the algorithm figures out which cells should be put into a predictive state for the next time-step. This is achieved by checking every distal segment on every cell for activity above a certain threshold, i.e. checking for active segments. One active segment is enough to put a cell into a predictive state. The output from the temporal pooler is a vector representing the state of all cells in that region. Equation 3.21 shows phase 2.

\[
\left(
\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \cdots & b_n \end{bmatrix}}_{\text{Active state}}
\cdot
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \cdots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \cdots & b_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \cdots & b_{dn}
\end{bmatrix}_X}_{\text{Segment } X}
=
\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \cdots & b_n \end{bmatrix}_X}_{\text{Segment activation } X}
\right) > \tau
\;\rightarrow\;
\underbrace{\begin{bmatrix} s_1 & s_2 & s_3 & \cdots & s_n \end{bmatrix}}_{\text{Predictive state}}
\qquad (3.21)
\]
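A minimal sketch of this phase-2 check, under the same toy `(column, cell)` representation as before (not NuPIC's real data structures):

```python
def tm_phase2(active_cells, segments, threshold):
    """Phase 2 of a toy temporal-memory step: compute the predictive cells.

    active_cells -- set of (column, cell) pairs active this time-step
    segments     -- dict mapping a cell to its list of distal segments,
                    each segment a set of presynaptic (column, cell) pairs
    """
    predictive = set()
    for cell, segs in segments.items():
        for seg in segs:
            # Segment activation: overlap with the currently active cells.
            if len(seg & active_cells) >= threshold:
                predictive.add(cell)   # one active segment is enough
                break
    return predictive

active_cells = {(0, 0), (0, 1), (1, 2)}
segments = {(5, 0): [{(0, 0), (0, 1)}],    # fully matched by current activity
            (6, 1): [{(2, 2), (3, 3)}]}    # no overlap with current activity
pred = tm_phase2(active_cells, segments, threshold=2)
```

Only the cell whose segment overlaps the active cells at or above the threshold enters the predictive state.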

If learning is turned on, the permanences of the synapses connected to active distal segments are updated (in the same way as in the spatial pooler). These changes are marked as temporary until we are sure that the cell is in fact able to correctly predict something with them; once this is known, the change is either made permanent or removed. Temporarily marked cells are updated whenever a cell goes from inactive to active through feed-forward input (reinforce the permanences, as we correctly predicted the feed-forward activation); if a cell instead went from active to inactive, the change is undone.



NuPIC Classifiers

There are many different classifiers included with NuPIC, such as k-Nearest Neighbour (kNN) and a Support Vector Machine (SVM); the OPF in particular uses a custom-built classifier called the "CLAClassifier". The purpose of this classifier is to decode predictions made by the CLA. Classification with the CLAClassifier is performed in different ways depending on how the input has been encoded.

Scalar values are decoded using a process where each cell in a CLA region is paired with two histograms. One histogram keeps track of the frequency of encountered patterns associated with each cell; the other keeps track of a moving average for each bucket. This process is illustrated in figure 3.4.
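The two-histogram idea can be sketched as a toy decoder. This is illustrative only; NuPIC's real CLAClassifier differs in details such as bucket handling and smoothing, and the class and parameter names here are invented for the sketch.

```python
from collections import defaultdict

class ToyScalarDecoder:
    """Toy version of the two-histogram idea behind the CLAClassifier.

    For every cell we keep (1) a frequency histogram over value buckets and
    (2) a moving average of the actual value per bucket.  Decoding lets the
    active cells vote on a bucket and averages their stored values.
    """

    def __init__(self, alpha=0.3):
        self.freq = defaultdict(lambda: defaultdict(int))  # cell -> bucket -> count
        self.avg = defaultdict(dict)                       # cell -> bucket -> moving avg
        self.alpha = alpha

    def learn(self, active_cells, bucket, actual_value):
        for cell in active_cells:
            self.freq[cell][bucket] += 1
            prev = self.avg[cell].get(bucket, actual_value)
            self.avg[cell][bucket] = (1 - self.alpha) * prev + self.alpha * actual_value

    def decode(self, active_cells):
        votes = defaultdict(float)
        for cell in active_cells:
            total = sum(self.freq[cell].values()) or 1
            for bucket, count in self.freq[cell].items():
                votes[bucket] += count / total             # likelihood vote
        if not votes:
            return None
        best = max(votes, key=votes.get)
        vals = [self.avg[c][best] for c in active_cells if best in self.avg[c]]
        return sum(vals) / len(vals)

dec = ToyScalarDecoder()
for _ in range(3):
    dec.learn({1, 2}, bucket=0, actual_value=10.0)
pred = dec.decode({1, 2})
```

After repeatedly seeing the same cells followed by the same value, decoding that cell activity recovers the learned value.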

Figure 3.4: The CLAClassifier. Each cell in the SDR columns (column 1 to column N) is paired with two histograms, a likelihood histogram and a moving average, over buckets spanning the minimum to the maximum value.

Training in NuPIC

Training of the NuPIC model is done using online learning algorithms. The schema seen in figure 3.5 is used to train and test these models. We have 7 different wind farms, so this schema is repeated for each respective wind farm.

This schema is slightly adjusted from the traditional way the OPF handles multi-step predictions. The default is to use 48 different models, one for every step ahead in the CLAClassifier, which would give us 48 · 7 = 336 models in total (for all the wind farms). Not only does this change match Expektra's approach, but it also reduces the memory requirements for the CLAClassifier.

Figure 3.5: Training an OPF model. A pre-training data chunk from the dataset is used for hyperparameter setup (PSO swarming or manual setup). In the training phase the model runs on the training data stream with online learning activated; in the testing phase online learning is deactivated and the model produces multistep predictions on the test data.

Input and hyperparameter selection

Finding hyperparameters for an OPF model was partly done using the custom built-in PSO algorithm and partly configured manually, mainly by ensuring that the PSO algorithm did not remove encoders. A single CLA region contains a lot of hyperparameters; these parameters have been included with corresponding descriptions in appendix A. Inputs to the model are date, ws, wp, u and v.
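The particle swarm idea behind the swarming step can be sketched as a minimal PSO in the spirit of Eberhart and Kennedy [1995]. NuPIC's swarming layers model management on top of this; the function and all parameter values here are illustrative, not NuPIC's implementation.

```python
import random

def pso_minimize(loss, bounds, n_particles=12, iters=40,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimization over box-bounded parameters.

    loss   -- function from a parameter vector (list of floats) to a score
    bounds -- list of (low, high) pairs, one per hyperparameter
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # each particle's best position
    pbest_val = [loss(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # the swarm's best so far
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia plus attraction to personal and global bests.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = loss(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

bounds = [(-5.0, 5.0), (-5.0, 5.0)]
best, best_val = pso_minimize(lambda p: (p[0] - 3) ** 2 + (p[1] + 1) ** 2, bounds)
```

In a real hyperparameter search the loss would be a model's validation error rather than this toy quadratic.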


Chapter 4

Result

This chapter contains the primary results obtained in this study. Figures 4.4-4.10 contain different error measurements on the test set for each wind farm. We see that Expektra's ANN model performs well over all wind farms, while the NuPIC model performs worse than Expektra but in general still better than our reference model. NuPIC is off target, with a bias error on most wind farms. The graphs also indicate that NuPIC is unable to pick up some trends, visible in the cumulated ε2 graph. Expektra's model, on the other hand, shows no clear problems in cumulated ε2 and is on target on all wind farms; appendix C has been included to reflect this at different lead times.
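For reference, the error measures reported in this chapter reduce to the following for normalized power series. This is a sketch in the spirit of the standardized measures of Madsen et al. [2005]; since power is already normalized, no capacity scaling is applied.

```python
from math import sqrt

def error_measures(predicted, actual):
    """NBIAS, NMAE, NRMSE and the cumulated eps^2 curve for normalized power."""
    errors = [p - a for p, a in zip(predicted, actual)]
    n = len(errors)
    nbias = sum(errors) / n                       # systematic offset
    nmae = sum(abs(e) for e in errors) / n        # mean absolute error
    nrmse = sqrt(sum(e * e for e in errors) / n)  # root-mean-square error
    cumulated, running = [], 0.0
    for e in errors:
        running += e * e
        cumulated.append(running)                 # the cumulated eps^2 curve
    return nbias, nmae, nrmse, cumulated

nbias, nmae, nrmse, cum = error_measures([0.5, 0.5], [0.0, 1.0])
```

Two symmetric errors of ±0.5 cancel in NBIAS but not in NMAE or NRMSE, which is why a model can look unbiased yet still score poorly on the other measures.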

Figure 4.1: Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, normalized power output vs. wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).



Figure 4.2: Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).

Figure 4.3: Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


Figure 4.4: Different error measurements for WF 1: NBIAS, NRMSE and NMAE vs. look-ahead time k (in hours), and cumulated ε2 over time (in hours), for Expektra, NuPIC and Persistence.



Figure 4.5: Different error measurements for WF 2: NBIAS, NRMSE and NMAE vs. look-ahead time k (in hours), and cumulated ε2 over time (in hours), for Expektra, NuPIC and Persistence.


Figure 4.6: Different error measurements for WF 3: NBIAS, NRMSE and NMAE vs. look-ahead time k (in hours), and cumulated ε2 over time (in hours), for Expektra, NuPIC and Persistence.



Figure 4.7: Different error measurements for WF 4: NBIAS, NRMSE and NMAE vs. look-ahead time k (in hours), and cumulated ε2 over time (in hours), for Expektra, NuPIC and Persistence.


Figure 4.8: Different error measurements for WF 5: NBIAS, NRMSE and NMAE vs. look-ahead time k (in hours), and cumulated ε2 over time (in hours), for Expektra, NuPIC and Persistence.



Figure 4.9: Different error measurements for WF 6: NBIAS, NRMSE and NMAE vs. look-ahead time k (in hours), and cumulated ε2 over time (in hours), for Expektra, NuPIC and Persistence.


Figure 4.10: Different error measurements for WF 7: NBIAS, NRMSE and NMAE vs. look-ahead time k (in hours), and cumulated ε2 over time (in hours), for Expektra, NuPIC and Persistence.



User          | WF 1  | WF 2  | WF 3  | WF 4  | WF 5  | WF 6  | WF 7  | All
Leustagos     | 0.145 | 0.138 | 0.168 | 0.144 | 0.158 | 0.133 | 0.140 | 0.146
DuckTile      | 0.143 | 0.145 | 0.172 | 0.145 | 0.165 | 0.137 | 0.146 | 0.148
MZ            | 0.141 | 0.151 | 0.174 | 0.145 | 0.167 | 0.141 | 0.145 | 0.149
Propeller     | 0.144 | 0.153 | 0.177 | 0.147 | 0.175 | 0.141 | 0.147 | 0.152
Duehee Lee    | 0.157 | 0.144 | 0.176 | 0.160 | 0.169 | 0.154 | 0.148 | 0.155
Expektra      | 0.165 | 0.158 | 0.184 | 0.164 | 0.179 | 0.153 | 0.153 | 0.165
MTU EE5260    | 0.161 | 0.172 | 0.193 | 0.162 | 0.192 | 0.156 | 0.160 | 0.168
SunWind       | 0.174 | 0.177 | 0.193 | 0.176 | 0.179 | 0.157 | 0.162 | 0.172
ymzsmsd       | 0.163 | 0.186 | 0.200 | 0.164 | 0.192 | 0.162 | 0.167 | 0.174
4138 Kalchas  | 0.180 | 0.179 | 0.197 | 0.175 | 0.200 | 0.160 | 0.165 | 0.177
NuPIC         | 0.243 | 0.254 | 0.264 | 0.310 | 0.290 | 0.224 | 0.240 | 0.264
Persistence   | 0.302 | 0.338 | 0.373 | 0.364 | 0.388 | 0.341 | 0.361 | 0.355

Table 4.1: NRMSE scores of the entries published in [Hong et al., 2014]. The NuPIC and Expektra models are added so we can easily compare the results.

4.1 Experimental results

Looking at the primary results presented in this study, we see in table 4.1 that Expektra's ANN model is able to achieve performance comparable to other methods published in Hong et al. [2014]. This can be used as a starting point for further investigations. A similar approach to Expektra's is Duehee Lee's [Lee and Baldick, 2014], as both methods are based on the same neural network architecture but with different optimization techniques. Duehee Lee's network is structured slightly differently: Expektra's model has only 1 output node, whereas Duehee Lee uses five (representing five predictions ahead). Besides these differences, Duehee Lee also includes additional sub-models to create a prediction ensemble, blending the output of the ANN with a Gaussian Process (GP) to improve very short-term predictions. MTU EE5260 is also a neural network approach that converts meteorological forecasts to power, and here the score is very close. The HTM CLA beats the persistence model but needs more work to outperform the other models in GEFCom.



4.2 Input Importance

For interpretation purposes, in order to understand the model better, an analysis of the relative input parameter importance was performed. This analysis is illustrated in figure 4.11. Each box represents added noise to that channel, which results in a higher NRMSE score if that feature was important. A reference point, "all-channels", represents the error distribution of the model with no input replacement. We clearly see in this figure that the wind speed channel ws is the most important attribute: adding noise to this channel greatly affects the NRMSE score. We also see that the wind components u, v show little to no influence, and that the timestamp-related inputs hours and week both indicate that there are seasonal and daily trends present in the dataset.
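The perturbation procedure can be sketched as follows, in the spirit of HIPR [Kemp et al., 2007]. Here `model_nrmse` is a hypothetical callable standing in for the trained network's evaluation, and the number of noise repetitions is an illustrative choice.

```python
import random

def input_importance(model_nrmse, inputs, n_repeats=30, seed=1):
    """Replace one channel at a time with uniform noise and record the
    average degradation in NRMSE relative to the untouched inputs."""
    rng = random.Random(seed)
    baseline = model_nrmse(inputs)        # the "all-channels" reference point
    importance = {}
    for channel, series in inputs.items():
        lo, hi = min(series), max(series)
        scores = []
        for _ in range(n_repeats):
            noisy = dict(inputs)          # shallow copy, replace one channel
            noisy[channel] = [rng.uniform(lo, hi) for _ in series]
            scores.append(model_nrmse(noisy))
        importance[channel] = sum(scores) / n_repeats - baseline
    return importance

# Toy stand-in model whose error depends only on the mean of 'ws'.
toy_model = lambda d: abs(sum(d['ws']) / len(d['ws']) - 0.5) + 0.1
inputs = {'ws': [i / 9 for i in range(10)], 'u': [0.2] * 10}
imp = input_importance(toy_model, inputs)
```

Because the toy model only looks at `ws`, noising `ws` raises the error while noising `u` leaves it unchanged, which is exactly the signature an important channel shows in figure 4.11.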

Figure 4.11: Relative input parameter importance using HIPR, showing NRMSE when noising each channel (all-channels, hours, u, v, week, ws, and ws at lags/leads ws-3 to ws+3). "all-channels" reflects the reference point: the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind.



4.2.1 Adaptation and Optimization

All training for Expektra's model was performed on a MacBook Pro with an Intel Core 2 Duo processor (P8600, 2.40 GHz) and 4 GB of installed RAM, running a 64-bit operating system. After adapting the code to run experiments on the GEFCom dataset, an investigation was performed to evaluate the performance of the core source code provided by Expektra. This investigation identified some slow parts of the code; after fixing these issues, mainly by using a Math.NET native library instead of the managed provider, a speed test was performed comparing the unoptimized version with the optimized one. Figure 4.12 shows the speed-up achieved during training using the LM algorithm.

Figure 4.12: Training time of the unoptimized ("Normal") version vs. the optimized one when training a network with 10 input neurons; 10, 15, 20 or 25 hidden neurons; and 1 output neuron. Training was done for 100 epochs using LM.

4.3 Summary

A plot showing the average NRMSE improvement over the persistence model can be seen in figure 4.13. Both models are able to perform better than persistence; Expektra's model performs better than the NuPIC model over the whole forecast horizon, and NuPIC performs better than persistence towards the end of the forecast.

Figure 4.13: Summarized average improvement (%) in NRMSE over persistence across all wind farms, with 95% confidence intervals, for Expektra and NuPIC, as a function of look-ahead time (in hours).


Chapter 5

Discussion

With the exponential growth of computer power it becomes easier to study deeper and more complex networks, and with recent advances in the area of deep learning a new interest in ANNs has resurged. In practice, ANNs have been used by most groups in the field of WPF, but these networks never caught on, as it was argued that the improvements made by ANNs were usually not worth the extra effort of training them [Giebel et al., 2011]. This is steadily becoming less of an issue as computation becomes cheaper and bigger networks perform better.

By using a simple MLP network we observe that we are able to obtain results similar in performance to other models published in Hong et al. [2014]. This can be used as a starting point for further investigations. The GEFCom dataset has a limited number of input features; with access to a wider range of features, the power of neural networks could be studied more in depth, but given the number of features in the dataset this is harder to do.

Using persistence as a reference model was done in order to have the same baseline as the one used in GEFCom, but a better reference model should be considered in the future, such as the one presented in Nielsen et al. [1998], especially if longer forecasts are to be considered.
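For concreteness, the persistence baseline, and the flavour of improved reference proposed by Nielsen et al. [1998], can be sketched as follows. The geometric per-lead weights in `new_reference_forecast` are an illustrative assumption; Nielsen et al. estimate the weights from the autocorrelation of the power series.

```python
def persistence_forecast(history, horizon):
    """Persistence reference: every lead time k gets the last observed value,
    i.e. the forecast for t+k made at t is simply p(t)."""
    return [history[-1]] * horizon

def new_reference_forecast(history, horizon, correlation=0.9):
    """Blend persistence with the climatological mean, trusting persistence
    less as the lead time grows (sketch of the Nielsen et al. [1998] idea)."""
    mean = sum(history) / len(history)
    last = history[-1]
    return [correlation ** k * last + (1 - correlation ** k) * mean
            for k in range(1, horizon + 1)]

history = [0.2, 0.4, 0.6]
pers = persistence_forecast(history, 2)
ref = new_reference_forecast(history, 2)
```

At short lead times the blended reference stays close to persistence; at long lead times it relaxes toward the mean, which is why it is a harder baseline to beat than plain persistence.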

5.1 Method development issues

Working with Expektra's model is very straightforward, and no major issues were encountered during development, but some issues were encountered working with NuPIC1:

1 It should also be pointed out that it helps to have the people who developed the code on the spot, and this is probably the main reason for the lack of trouble.



1. Installing NuPIC is difficult. This seems to be a general problem, judging by posts on the nupic mailing list2, and a very important issue to fix if they want more people to work with this model.

2. The built-in swarming (PSO) did not work well, especially for slightly larger datasets and for 48 prediction steps. The main problems were fitting everything into memory and that the code ran too slowly when swarming over multiple step-ahead predictions, making it practically impossible to use. Scaling down the problem to just a few prediction steps and multiple models helped, but the swarm model kept dismissing wind speed as an important feature, which was fixed by specifically telling it not to touch that encoder.

3. Inconsistencies between the white-paper documentation of the HTM CLA and the actual implementation make it hard to understand the underlying principles of NuPIC. NuPIC presents a very complex network, and any inconsistencies in documentation make it hard to understand. The code is quite well documented, but the additional material is a bit sparse.

These issues are all understandable and are continuously being improved upon. NuPIC is still a young platform, so issues are to be expected given the complexity and research nature of the project. Training the CLAClassifier to use many prediction steps instead of just one will most likely reduce the bias error seen in the results; to investigate this, a more powerful computer is needed.

There are some differences in the inputs between the models. This setup was created because an OPF model needs to encode the value it is going to predict. SCADA data is available, so this is not an issue, but it may introduce some unfairness between NuPIC and Expektra. In the current setup NuPIC does not seem to gain anything extra from the additional information, and other models in the competition most likely used different inputs as well.

In general, NuPIC differs in the way you feed data: you send in only the front of the signal, and temporal context is achieved through the temporal memory.

5.2 Future improvements and directions

One issue with the GEFCom dataset is that it does not contain any specific information about the location of the 7 wind farms. It is worth considering how to handle different models specialized for different types of terrain, as this has been shown to increase performance [Kariniotakis et al., 2006]. Another thing to investigate could be a more advanced scheme for hyperparameter selection: Jursa and Rohrig [2008] show that the WPF error can be reduced by smarter use of optimization algorithms for feature and hyperparameter selection, which should improve performance.

2 http://numenta.org/lists

In general, having more types of inputs from sources like SCADA and NWP systems could provide valuable information for better predictions. It is clear that wind speed is a key feature for producing good forecasts, which is also supported by the input analysis in section 4.2. It has been argued that the error in WPF models stems largely from wrong weather forecasts [Giebel et al., 2011], so identifying good sources of weather forecasts would be desirable for more effective model learning. Combining several different weather forecasts could also be investigated, as Nielsen et al. [2007] showed that power forecasts based on a number of different meteorological forecasts were better than those based on a single source.

Regarding the HTM CLA, it is worth considering implementing a custom encoder specifically targeted at data concerning wind farms. SCADA data could also be streamed directly into a NuPIC model, which would allow for some interesting online predictions and anomaly detection.

Training times are always an issue, especially for very large models. Both NuPIC and Expektra's model would benefit from being implemented in a more parallel fashion, as these algorithms are very well suited for parallel computing. The speed-up would be of great value, so implementing a GPU version of the algorithms could be worthwhile.

5.3 Conclusions

The field of energy forecasting is still young, and the necessity for good forecasting techniques is constantly growing. This initial study shows that part of Expektra's forecasting method can be used to forecast wind power generation. We observe that this ANN is able to achieve performance comparable to other methods published in Hong et al. [2014]. Additional work is still needed to obtain state-of-the-art results. Numenta's model is able to beat the reference model but needs additional work before we can draw definite conclusions about its performance; constructing a custom encoder could be a first step in that direction.


Bibliography

Subutai Ahmad and Jeff Hawkins. Properties of sparse distributed representations and their application to hierarchical temporal memory. arXiv preprint arXiv:1503.07469, 2015.

T.C. Akinci. Short term wind speed forecasting with ANN in Batman, Turkey. Elektronika ir Elektrotechnika, 107(1):41-45, 2015.

E. Michael Azoff. Neural Network Time Series Forecasting of Financial Markets. John Wiley & Sons, Inc., 1994.

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13(1):281-305, 2012.

Daniel P. Buxhoeveden and Manuel F. Casanova. The minicolumn hypothesis in neuroscience. Brain, 125(5):935-951, 2002.

Erasmo Cadenas and Wilfrido Rivera. Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks. Renewable Energy, 34(1):274-278, 2009.

Dmitri B. Chklovskii, B.W. Mel, and K. Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782-788, 2004.

A. Costa, A. Crespo, J. Navarro, G. Lizcano, H. Madsen, and E. Feitosa. A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews, 12(6):1725-1744, August 2008. ISSN 1364-0321. doi: 10.1016/j.rser.2007.01.015. URL http://dx.doi.org/10.1016/j.rser.2007.01.015.

Russ C. Eberhart and James Kennedy. A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science, volume 1, pages 39-43, New York, NY, 1995.

Shu Fan, James R. Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. IEEE Transactions on Energy Conversion, 24(2):474-482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento: a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276. Citeseer, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition, EWEC 2008. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving Hierarchical Temporal Memory-Based Trading Models. Springer, 2013.

M.A. Gaertner, C. Gallardo, C. Tejeda, N. Martínez, S. Calabria, N. Martínez, and B. Fernández. The Casandra project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. In Proceedings of the 2001 European Wind Energy Conference, EWEC'01, Copenhagen, Denmark, pages 777-780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: A literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G. Lobo. Sipreólico: wind power prediction tool for the Spanish peninsular power system. In Proceedings of the CIGRÉ 40th General Session and Exhibition, Paris, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On Intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357-363, 2014.

Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359-366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41-46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585-592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694-709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference, EWEC'07, 2007.

G. Kariniotakis. Position paper on JOULE project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J. Halliday, R. Brownsword, Ignacio Marti, Ana Maria Palomares, I. Cruz, H. Madsen, T.S. Nielsen, Henrik Aa. Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power: overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, 2006.

G.N. Kariniotakis, G.S. Stavrakakis, and E.F. Nogaret. Wind power forecasting using advanced neural networks models. IEEE Transactions on Energy Conversion, 11(4):762-767, 1996.

Stanley J. Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326-334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275-293, 2009.

Lars Landberg and Simon J. Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171-195, 1994.

S. Lang, C. Mohrlen, J. Jorgensen, B.O. Gallachóir, and E. McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International Solar Energy Society UK Section Conference, volume 85, page 89, 2006.

S.M. Lawan, W.A.W.Z. Abidin, W.Y. Chai, A. Baharun, and T. Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760-1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. IEEE Transactions on Smart Grid, 5(1):501-510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets, 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164-168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313-2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154-3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa. Nielsen, and Torben S. Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475-489, 2005.

Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431-441, 1963.

Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115-133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons. 1969.

Marvin Lee Minsky and Oliver G. Selfridge. Learning in random nets. MIT Lincoln Laboratory, 1960.

Jorge J. Moré. The Levenberg-Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105-116. Springer, 1978.

Henrik Aa. Nielsen, Torben S. Nielsen, Henrik Madsen, Maria J. Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471-482, 2007.

Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29-34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power, 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159-167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms, v0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33-57, 2007.

A. Rodrigues, J.A. Peças Lopes, P. Miranda, L. Palma, C. Monteiro, R. Bessa, J. Sousa, C. Rodrigues, and J. Matos. EPREV: a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference, EWEC, volume 7, 2007.

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5:3, 1988.

Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85-117, 2015.

S. Sinkevicius, R. Simutis, and V. Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91-96, 2011.

Ke-Sheng Wang, Vishal S. Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61-69, 2014.

WWEA. 2014 half-year report, pp. 1-8. Technical report, WWEA, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365-376, 2013.

Hao Yu and Bogdan M. Wilamowski. Levenberg-Marquardt training. Industrial Electronics Handbook, 5:12-1, 2011.

Appendix A

Hyper-parameters

Parameters for the spatial pooler

Alias                Default  Description
columnCount          -        The number of cell columns in a cortical region.
globalInhibition     false    If true, in the inhibition phase the winning columns are selected as the most active columns from the region as a whole.
numActivePerInhArea  10       The maximum number of active columns per inhibition area.
synPermActiveInc     0.1      The amount by which an active synapse is incremented in each round, specified as a percent of a fully grown synapse.
synPermConnected     0.10     Controls the threshold at which synapses are connected.
synPermInactiveDec   0.01     The amount by which an inactive synapse is decremented in each round.
potentialRadius      16       Determines the extent of the input that each column can potentially be connected to.

Table A.1: Configuration parameters for the spatial pooler.



Parameters for the scalar encoder

Alias        Symbol   Description
w            w        Number of bits to set in the output.
minval       vmin     The lower bound of the input value.
maxval       vmax     The upper bound of the input value.
n            n        Number of bits in the representation (n must be > w).
radius       r        Inputs separated by more than or equal to this distance will have non-overlapping representations.
resolution   ψ        Inputs separated by more than or equal to this distance will have different representations.

Table A.2: Configuration parameters for the scalar encoder.
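As a rough illustration of how these parameters interact, here is a minimal scalar encoder sketch (not NuPIC's implementation): a value is clipped to [vmin, vmax] and mapped to a contiguous block of w active bits among n.

```python
def encode_scalar(value, vmin=0.0, vmax=100.0, n=14, w=3):
    """Minimal scalar encoder sketch: represent `value` as n bits
    with a contiguous block of w active bits."""
    value = max(vmin, min(vmax, value))   # clip to [vmin, vmax]
    buckets = n - w + 1                   # possible start positions for the block
    i = int(round((value - vmin) / (vmax - vmin) * (buckets - 1)))
    return [1 if i <= j < i + w else 0 for j in range(n)]

print(encode_scalar(0))    # block of ones at the left edge
print(encode_scalar(100))  # block of ones at the right edge
```

Nearby values share active bits (overlapping representations), while values further apart than the radius share none, which is the semantic similarity property the encoder is meant to provide.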


Parameters for the temporal memory

Alias                   Default   Description
activationThreshold     12        Activation threshold for segments.
cellsPerColumn          32        Number of cells per column.
columnCount             2048      The number of cell columns in a cortical region.
globalDecay             0.10      Decrements all synapses a little bit all the time.
initialPerm             0.11      Initial permanence value for a synapse.
inputWidth              –         Size of the input.
maxAge                  100000    Controls global decay. Global decay will only decay segments that have not been activated for maxAge iterations, and will only run the global decay loop every maxAge iterations.
maxSegmentsPerCell      –         The maximum number of segments a cell can have.
maxSynapsesPerSegment   –         The maximum number of synapses a segment can have.
minThreshold            8         The minimum required activity for a segment to learn.
newSynapseCount         15        The maximum number of synapses added to a segment during learning.
permanenceDec           0.10      How much permanence is removed from synapses when learning occurs.
permanenceInc           0.10      How much permanence is added to synapses when learning occurs.
temporalImp             cpp/py    Controls which temporal memory implementation to use.

Table A.3: Configuration parameters for the temporal memory.
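The permanence parameters above drive a Hebbian-style update; a minimal sketch (simplified relative to NuPIC's temporal memory, with the default values from the table):

```python
def update_permanence(perm, active, perm_inc=0.10, perm_dec=0.10, connected=0.10):
    """Hebbian-style permanence update sketch: active synapses are
    reinforced (permanenceInc), inactive ones decay (permanenceDec).
    A synapse counts as connected once its permanence reaches the
    synPermConnected threshold."""
    perm = perm + perm_inc if active else perm - perm_dec
    perm = min(1.0, max(0.0, perm))   # clamp to [0, 1]
    return perm, perm >= connected    # (new permanence, connected?)

print(update_permanence(0.11, active=True))   # reinforced, now connected
print(update_permanence(0.11, active=False))  # decayed, disconnected
```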


Appendix B

Wind characteristics

Figure B.1: Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components. The intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.



Figure B.2: Wind characteristics for WF 3–7, GEFCom dataset, showing zonal and meridional wind components. The intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Appendix C

Error Distribution

[Figures C.1–C.14 are error histograms; only the captions and axis labels were recoverable from the extraction. Each figure contains six panels, one per lead time (48, 40, 30, 20, 10, and 1 hours), with Error on the x-axis (−1.0 to 1.0) and Frequency on the y-axis (0 to 70).]

Figure C.1: Error distribution for different lead times, WF 1 (NuPIC).

Figure C.2: Error distribution for different lead times, WF 2 (NuPIC).

Figure C.3: Error distribution for different lead times, WF 3 (NuPIC).

Figure C.4: Error distribution for different lead times, WF 4 (NuPIC).

Figure C.5: Error distribution for different lead times, WF 5 (NuPIC).

Figure C.6: Error distribution for different lead times, WF 6 (NuPIC).

Figure C.7: Error distribution for different lead times, WF 7 (NuPIC).

Figure C.8: Error distribution for different lead times, WF 1 (Expektra).

Figure C.9: Error distribution for different lead times, WF 2 (Expektra).

Figure C.10: Error distribution for different lead times, WF 3 (Expektra).

Figure C.11: Error distribution for different lead times, WF 4 (Expektra).

Figure C.12: Error distribution for different lead times, WF 5 (Expektra).

Figure C.13: Error distribution for different lead times, WF 6 (Expektra).

Figure C.14: Error distribution for different lead times, WF 7 (Expektra).

List of Figures

2.1  A figure that presents the general outline when forecasting using the statistical approach  6

2.2  A figure that presents the general steps when forecasting using a physical model  7

3.1  The perceptron  20

3.2  Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of the H hidden layers, as well as M input connections. Each edge seen in this graph has a weight wij associated with it  21

3.3  Information flow of a single-region predictive model created with the OPF  23

3.4  The CLAClassifier  28

3.5  Training an OPF model  29

4.1  Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs. wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)  31

4.2  Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)  32

4.3  Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)  32

4.4  Different error measurements for WF 1  33

4.5  Different error measurements for WF 2  34

4.6  Different error measurements for WF 3  35

4.7  Different error measurements for WF 4  36

4.8  Different error measurements for WF 5  37

4.9  Different error measurements for WF 6  38

4.10 Different error measurements for WF 7  39

4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point, i.e. the network when no channel has been exposed to noise. ws = wind speed, hours/week = timestamps, u, v is a directional vector of the wind  41

4.12 Plot showing the unoptimized version vs. the optimized one when training a network with 10 input neurons, 10/15/20/25 hidden neurons, and 1 output neuron. Training was done for 100 epochs using LM  42

4.13 Summarized average improvement over all wind farms, with 95% confidence intervals  43

B.1  Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power  59

B.2  Wind characteristics for WF 3–7, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power  60

C.1  Error distribution for different lead times, WF 1  62

C.2  Error distribution for different lead times, WF 2  63

C.3  Error distribution for different lead times, WF 3  64

C.4  Error distribution for different lead times, WF 4  65

C.5  Error distribution for different lead times, WF 5  66

C.6  Error distribution for different lead times, WF 6  67

C.7  Error distribution for different lead times, WF 7  68

C.8  Error distribution for different lead times, WF 1  69

C.9  Error distribution for different lead times, WF 2  70

C.10 Error distribution for different lead times, WF 3  71

C.11 Error distribution for different lead times, WF 4  72

C.12 Error distribution for different lead times, WF 5  73

C.13 Error distribution for different lead times, WF 6  74

C.14 Error distribution for different lead times, WF 7  75

List of Tables

3.1  The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods of missing data, power observations are available for updating the models  16

3.2  Preprocessed features from the dataset. The Forecast category represents the forecasts given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the feature we will use in training and testing  17

3.3  Example, where n = 14, r = 5, ψ = 1, of various encoded scalar values using a ScalarEncoder  24

4.1  NRMSE scores of the entries published in [Hong et al. 2014]. The NuPIC model and the Expektra model are added so we can easily compare the results  40

A.1  Configuration parameters for the spatial pooler  55

A.2  Configuration parameters for the scalar encoder  56

A.3  Configuration parameters for the temporal memory  57


www.kth.se

  • Contents
  • Introduction
    • Problem Formulation
    • The scope of the problem
  • Background
    • Neural Networks and Time Series Prediction
  • Method and Materials
    • Preliminaries
      • Remarks
      • Definitions
      • Reference models
      • Error metrics
      • Model selection
      • Evaluation
    • Experiments
      • Holdback Input Randomization
      • Optimization methods
    • Neural Networks
      • Multilayer Perceptron
      • Numenta Platform for Intelligent Computing
  • Result
    • Experimental results
    • Input Importance
    • Adaptation and Optimization
    • Summary
  • Discussion
    • Method development issues
    • Future improvements and directions
    • Conclusions
  • Bibliography
  • Appendices
  • Hyper-parameters
  • Wind characteristics
  • Error Distribution
  • List of Figures
  • List of Tables

Chapter 1

Introduction

It has been estimated by the World Wind Energy Association (WWEA) that by the year 2020 around 12% of the world's electricity will be available through wind power, making wind energy one of the fastest growing energy resources [WWEA 2014, Fan et al. 2009]. But integrating wind energy into existing electricity supply systems has been a challenge, and numerous objections have been put forward by traditional energy suppliers and grid operators, especially for large-scale use of this energy source. The biggest concern is that availability mainly depends on meteorological conditions, and production cannot be adjusted as conveniently as with other, more conventional energy sources; this is because of our inability to control the wind. A single Wind Turbine (WT) is highly variable, and its dependency on wind conditions can result in zero output for more than thousands of hours during the course of a year; however, aggregating wind power generation over bigger areas decreases this chance.

This is where wind power forecasting systems come into play: a technology that can greatly improve the integration of wind energy into electricity supply systems, as forecasting systems provide information on how much wind power can be expected at any given point within the next few days. This removes some of the randomness attributed to wind energy and allows a more accurate way to utilize this clean energy source, while offsetting some of our dependency on other, more environmentally unfriendly sources, which in the long run will cause a smaller degenerative impact on the environment.

There are many commercial forecasting models available. Prediktor1 [Landberg and Watson 1994] is a physical model developed by the Risø National Laboratory, Denmark. It is constructed to refine Numerical Weather Prediction (NWP) data in order

1 http://www.prediktor.no



to transform the data through a power curve to produce the forecast, while improving the error rate with Model Output Statistics (MOS). Previento [Focken et al. 2001], developed at the University of Oldenburg, Germany, uses a similar approach to Prediktor but adds regional forecasting and uncertainty estimation. The Wind Power Prediction Tool (WPPT)2 [Nielsen et al. 2002] is a statistical model developed by the Technical University of Denmark; it consists of a semi-parametric power curve model for wind farms, taking into account both wind speed and direction. It uses dynamical prediction models describing the dynamics of the wind power and any diurnal variations. Zephyr [Giebel et al. 2000] is a hybrid model combining the WPPT and Prediktor; in this model each wind farm is assigned a forecast model according to the available data. Sipreólico [Gonzalez et al. 2004], developed by Red Eléctrica de España, is a statistical model designed to be highly flexible depending on the available data; it achieves this by switching between 9 different models. Aiolos Wind3 is a hybrid model developed by Vitec that creates forecasts by combining a statistical model with physical factors, such as wind speed at different altitudes, wind direction, and air density.

Expektra4 is a service-based company, founded in 2010, that provides and develops a new method for short-term demand and supply forecasting based on Artificial Neural Networks (ANN) and traditional time series analysis. They are currently expanding into the area of wind power forecasting and looking for suitable methods. ANNs have been used successfully for both wind speed forecasting [Lawan et al. 2014] and wind power forecasting [Kariniotakis et al. 1996]. It was demonstrated in Liu et al. [2012] that a complex-valued recurrent neural network was able to predict output with high accuracy, and Huang et al. [2015] showed that optimizing the initial weights of a back-propagated neural network with a genetic algorithm gave impressive results, so there are good reasons to think that part of the method Expektra is developing can be used for wind power forecasting.

Neural networks in general are highly flexible, and recent advancements in deep learning have been shown to outperform previous models in several domains [Schmidhuber 2015]. These networks are very good at automatically finding features that would take a lot of time and effort to engineer by hand. One network that shares similarities with deep learning but has received less attention is the Hierarchical Temporal Memory (HTM) Cortical Learning Algorithm (CLA) developed by Numenta [Numenta 2011]. This network is also built around the idea of hierarchical structures, creating a deep neural network. HTM CLA is

2 http://www.enfor.eu
3 http://www.vitecsoftware.com
4 http://www.expektra.se



currently tailored very specifically to time-series problems and has, at the moment, little research published around it, making it an ideal candidate for time series prediction, with unknown potential on wind power forecasting problems.

Even though there are a lot of prominent methods already developed for wind power forecasting, there is still room for improvement. Energy forecasting is such an important topic that competitions have been developed around it. The Global Energy Forecasting Competition (GEFCom) [Hong et al. 2014] is one such competition that can be used to help evaluate the performance of new forecasting models. It has attracted hundreds of contestants from all over the world, which has resulted in the contribution of many new and novel ideas. The GEFCom was created in order to:

1. Bring together state-of-the-art techniques for energy forecasting.

2. Bridge the gap between academic research and industry practice.

3. Promote analytical approaches in power and energy education.

4. Prepare the industry to overcome forecasting challenges posed by the smart grid world.

5. Improve energy forecasting practices.

Benchmark datasets and competitions are a valuable resource when evaluating new models, as they allow for a common ground to stand on. With the publication of Hong et al. [2014], the dataset of GEFCom2012 was also published. This dataset consists of data from 7 different wind farms spanning a time period of three years, comprising observational data of the energy production as well as weather forecasts.

The GEFCom is divided into four tracks: load forecasting, price forecasting, wind power forecasting, and solar power forecasting. The specified problem in the wind power forecasting track is built around the real-time operation of wind farms, and the dataset is structured accordingly. Participants try to predict hourly power generation up to 1–48 hours ahead, given meteorological forecasts and historically produced power.

1.1 Problem Formulation

The objective of this thesis is to adapt and analyse the robustness and suitability of Expektra's method when it is applied to the domain of forecasting short-term wind



power generation, i.e. what is needed to adapt this model to wind power forecasting problems?

The second objective of this thesis is to investigate HTM CLA [Numenta 2011], a modern state-of-the-art computational theory of the neocortex, i.e. is it possible to use the Numenta Platform for Intelligent Computing (NuPIC) for wind power forecasting problems?

These questions will be addressed by evaluating the models against other models published in the area of short-term wind power forecasting, more specifically those that have been published in GEFCom.

1.2 The scope of the problem

1. This study will focus on short-term forecasting, i.e. forecasts done for 1–48 hours ahead. How to perform well on longer forecasts is left to further investigation.

2. Wind power forecasting is closely related to wind speed forecasting, i.e. trying to use local information at the wind turbine instead of NWP data to forecast wind power. This thesis will not go into any specific details on how to forecast wind speeds; readers can take a look at Li and Shi [2010], Cadenas and Rivera [2009], and Akinci [2015].

3. There are a lot of neural networks that are worth studying, but this thesis will focus on Expektra's method and the HTM CLA inside NuPIC. NuPIC was picked over other deep learning methods because of its direct relation to time series prediction.


Chapter 2

Background

Wind Power Forecasting (WPF) has applications in generation and transmission maintenance planning and energy optimization, as well as in energy trading. WPF models exist at different scales, and they can be used to predict the production of anything from a single WT to a whole Wind Farm (WF). WPF models are generally divided into two main groups, physical and statistical, but hybrid state-of-the-art methods are also common [Giebel et al. 2001, Lang et al. 2006]. The physical approach [Landberg and Watson 1994, Gaertner et al. 2003] focuses on integrating well-known physical aspects into the model, such as information about the surrounding terrain and properties of the WT. These models try to get as good an estimate of the local wind speed as possible before finally reducing the remaining error with some form of MOS. Statistical approaches [Rodrigues et al. 2007, Gonzalez et al. 2004] rely more on historical observations and their statistical relation to meteorological predictions, as well as measurements from Supervisory Control And Data Acquisition (SCADA) systems.

SCADA is a control system that is used in most industrial processes and is the main tool to evaluate the state of individual power plants. As the name indicates, it is not a full control system but rather a system for supervision. In the case of wind turbines, it allows remote access to online data that has been gathered by sensors inside and around the plant. Variables accessible through these systems are those directly related to the wind flow, as well as the power produced by the plant, i.e. things like rotor wind speed, nacelle position, pitch angle, active power, etc.



Figure 2.1: A figure that presents the general outline when forecasting using the statistical approach.

Forecast models that use SCADA data as their primary input source usually have good forecast accuracy, at least for the first few hours, but they tend to be less useful for longer prediction horizons [Giebel et al. 2011]. SCADA data can also be used to detect problems in WTs, something that can be helpful for improving the reliability of WTs [Yang et al. 2013, Wang et al. 2014].

Statistical models, as seen in figure 2.1, are usually built around Numerical Weather Prediction (NWP) and SCADA data. NWP models are often used to build forecasting models, as they introduce weather forecasts for the region where the wind turbines are located. NWP data usually contains information about things like wind speed, wind direction, temperature, and humidity. These models are operated twice or four times a day by a number of large weather services. The main forecasts usually start at 00 and 12 UTC, corresponding to the worldwide radiosonde launchings, which are the only direct observation of the atmospheric state and have been the backbone of atmospheric monitoring for many decades1; extra forecasts usually start at 06 and 18 UTC. Physical models, such as the one seen in figure 2.2, include additional information about the physical characteristics of the wind turbine and its surroundings, i.e. terrain data, information about obstacles, and the capacity and layout of the site where the turbine is located, and so on. Other useful information for physical models includes the theoretical

1 Some problems with this approach are that it yields less information over large oceans and poorer countries.


power curve: how much power is expected to be produced given a specific wind speed.
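As an illustration of such a theoretical power curve, the sketch below uses hypothetical cut-in, rated, and cut-out speeds; real curves come from the turbine manufacturer, and the cubic ramp is only a common approximation.

```python
def power_curve(ws, cut_in=3.0, rated=12.0, cut_out=25.0):
    """Illustrative theoretical power curve (normalized output 0..1).
    Below cut-in and above cut-out the turbine produces nothing;
    between cut-in and rated speed output ramps up roughly cubically."""
    if ws < cut_in or ws >= cut_out:
        return 0.0
    if ws >= rated:
        return 1.0
    return ((ws - cut_in) / (rated - cut_in)) ** 3

for ws in (2.0, 8.0, 15.0, 26.0):
    print(ws, power_curve(ws))
```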

The time scale of WPF methods is generally divided into 3 main groups: very-short-term (up to 9 hours), short-term (up to 72 hours), and medium-term (up to 7 days) [Costa et al. 2008], while the time step for these models is in the range of seconds to days, depending on the application. Very-short-term models used for wind power forecasting consist of statistical methods like Kalman filters, Auto-Regressive Moving Average (ARMA), Auto-Regressive with Exogenous Input (ARX), Box-Jenkins, etc. Input to these models are historical observations of wind speed, wind direction, temperature, etc., and common applications for this forecast horizon include things like intraday market trading. Since these methods are based merely on past production, they are generally not useful for longer horizons.
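As a minimal illustration of this family of models, the sketch below fits an AR(1) model to a toy wind speed series and iterates it a few steps ahead; real ARMA/ARX implementations are considerably richer, and the data here are made-up stand-ins.

```python
import numpy as np

def fit_ar1(series):
    """Least-squares fit of x[t] = a * x[t-1] + b (AR(1) sketch)."""
    x, y = series[:-1], series[1:]
    a, b = np.polyfit(x, y, 1)
    return a, b

def forecast_ar1(last, a, b, horizon):
    """Iterate the fitted model `horizon` steps ahead."""
    preds = []
    for _ in range(horizon):
        last = a * last + b
        preds.append(last)
    return preds

speeds = np.array([5.1, 5.3, 5.0, 5.6, 5.8, 5.5, 5.9, 6.1])  # toy wind speeds
a, b = fit_ar1(speeds)
print(forecast_ar1(speeds[-1], a, b, horizon=3))
```

Because each forecast step only recycles the model's own previous output, the prediction quickly converges towards the series mean, which is exactly why such models lose value at longer horizons.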

[Flowchart: SCADA data, NWP data, and WFC data enter the physical model; the NWP input goes through downscaling, transformation to hub height, and spatial refinements, is converted to power via the WT power curve, and is finally corrected with Model Output Statistics (MOS) to produce the forecast wind power generation.]

Figure 2.2: A figure that presents the general steps when forecasting using a physical model.

Short-term forecasting models include additional data about historical observations, usually obtained from SCADA systems, but more importantly weather forecasts from NWP models. Machine learning methods used in this area include Neural Networks [Kusiak et al. 2009], Support Vector Machines [Fugon et al. 2008], Nearest Neighbour Search [Jursa et al. 2007], Random Forests, etc. Medium-term forecasting models usually incorporate all the methods used for shorter forecasts, as well as physical characteristics.



2.1 Neural Networks and Time Series Prediction

One of the most common ANNs is the so-called Multilayer Perceptron (MLP), a network built around some very simple properties of a cortical neuron. This network has been around since the 1950s and has matured with a solid mathematical foundation, and it has been applied successfully to many different applications. It is built around the idea of having multiple layers of neurons, each layer feeding information forward to the next, i.e. a feed-forward neural network.

The main advantage of using multiple layers instead of just a single one is to overcome limitations pointed out by Minsky and Selfridge [1960] and Minsky and Seymour [1969], where a single-layer perceptron is only capable of learning linearly separable patterns and thus unable to learn all functions. The MLP is not limited in this way and has been shown to be able to represent a wide variety of functions given appropriate parameters [Hornik et al., 1989].

NuPIC is a platform actively developed and maintained by Numenta. It introduces a collection of ideas and algorithms that are inspired by the structural organization of the neocortex. Two of the core concepts found inside NuPIC are the so-called Hierarchical Temporal Memory and the Cortical Learning Algorithm. HTM was introduced in Hawkins and Blakeslee [2007] and refers to a hierarchy of cortical regions in the brain.

The neocortex is classically divided into 6 different layers2; the current implementation in NuPIC is focused on emulating layers 3-4 (specifically layer 3), with extensions and research code being developed to include more layers.

A CLA region consists of a collection of columns, and each column consists of a handful of cells. This structure is based on the minicolumn hypothesis [Buxhoeveden and Casanova, 2002]. Each column in the CLA region has its own semantic meaning, and the sparse activity of a handful of active columns will tell us something about the input. The CLA is modelled so that specific cells within each column reflect the temporal context of a pattern, and a single cortical region (a CLA region) is trained with the CLA algorithm.

A typical CLA region consists of around 2K columns containing around 60K neurons in total, while a typical MLP may have fewer than 100 neurons. Each neuron in an HTM network grows new synapses over time, and it is not uncommon to have around 5K synapses per neuron, meaning we would have around 300M synapses in total. A single region of this size uses around 100 MB of memory, and it takes around 10 ms to do one inference and learning step3.

2. These layers should not be confused with the hierarchy of regions.
3. These values are given in the talk "Sensor-Motor Integration in the Neocortex", 2013 Hackathon.


The HTM/CLA also differs from the MLP in that it does not directly use scalar weights to represent synaptic connectivity. Synapses inside an HTM have binary weights with a scalar permanence: each synapse is either connected or disconnected, while the MLP uses scalar weights. The connectedness in HTM/CLA is based on a permanence, which is a value between 0.0 and 1.0. If the permanence is over a certain threshold we have a connected synapse, i.e. we have weights of either 1 or 0. This binary mechanism is there to simulate synapses that are able to form and unform during learning. So essentially we have a weight-change network vs. a wiring-change network [Chklovskii et al., 2004].
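This permanence mechanism can be sketched in a few lines. This is an illustrative toy model, not the NuPIC implementation; the threshold and increment values are assumptions made for the example.

```python
# Illustrative sketch (not NuPIC source): HTM-style binary synapses.
# Each synapse stores a scalar permanence in [0.0, 1.0]; the effective
# weight is binary: 1 if the permanence is over the threshold, else 0.

CONNECTED_THRESHOLD = 0.2  # assumed value, for illustration only

def effective_weights(permanences, threshold=CONNECTED_THRESHOLD):
    """Map scalar permanences to binary connected/disconnected weights."""
    return [1 if p >= threshold else 0 for p in permanences]

def learn(permanences, active_inputs, inc=0.05, dec=0.05):
    """Reinforce synapses whose input was active, weaken the rest.
    Synapses can cross the threshold, i.e. form or unform."""
    return [min(1.0, p + inc) if active else max(0.0, p - dec)
            for p, active in zip(permanences, active_inputs)]

perms = [0.19, 0.50, 0.05]
print(effective_weights(perms))   # [0, 1, 0]
perms = learn(perms, [True, True, False])
print(effective_weights(perms))   # the first synapse has now formed
```

Learning moves permanences, but downstream computation only ever sees the binary weights, which is the "wiring change" view described above.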

NuPIC today is naturally geared towards time-series prediction, which makes it an interesting candidate for wind forecasting. NuPIC has been shown to outperform the MLP on financial time series [Gabrielsson et al., 2013] and it has been used successfully to balance traffic [Sinkevicius et al., 2011]. It has also been used on the NASA Aviation Dataset [Lee and Rajabi, 2014], but with less promising results. MLPs have been used successfully for many kinds of time-series problems [Azoff, 1994; Niska et al., 2004; Jain and Kumar, 2007].


Chapter 3

Method and Materials

The purpose of this chapter is to provide an explanation of the method used in this thesis. It presents terms and concepts needed and used in the field of Machine Learning, with a focus on how to create forecasts based on time series. It motivates why these methods are used as well as how they are used. This chapter also contains a description of the implemented artificial neural network, how it is structured and optimized. It provides details about the experimental setup and how these experiments relate to GEFCom, and finally a description of how the datasets are structured and processed.

3.1 Preliminaries

3.1.1 Remarks

The methodology used to evaluate the prediction models presented in this thesis is based on the protocol presented in Madsen et al. [2005]1, a complete protocol that can be used to evaluate the performance of short-term WPF and Wind-to-Power (W2P) models. This protocol was chosen because it has been successfully used before as a guideline to evaluate a wide variety of forecast models, such as AWPPS, Prediktor, Previento, Sipreólico and WPPT. The protocol has been used for both on-shore and off-shore wind farms and was developed in the frame of the ANEMOS research project [Kariniotakis et al., 2006], which brought together many relevant groups involved in the field. The aim of ANEMOS was to develop accurate and robust models that substantially outperform current state-of-the-art methods, and one of its goals was to establish a common set of performance measures that can be used to compare forecasts across systems and locations.

1. It should be pointed out that no widely agreed standardization exists.

3.1.2 Definitions

Time series

A time series is a sequence X of observations x_t, each with a particular time-stamp t, i.e. X = [x_1, x_2, ..., x_t], in short X_{1:t}; it is a time-dependent collection of variables.

Forecast horizon

A multi-step-ahead prediction is the task of predicting k forecasts X̂_{t+1:t+k} given a collection of historical observations X_{t−p+1:t}. In the case of GEFCom we want a forecast for 1-48 steps ahead.

Point or spot forecast

In this paper we model the forecast p̂_{t+k|t} as a so-called point forecast (or spot forecast), i.e. a single value for each forecast (as opposed to a probability distribution).

Prediction error

The prediction error e for lead time t + k is defined in equation 3.1 as the difference between the forecast and the actual value, where t denotes the time index, k is the look-ahead time, p is the actual (measured) wind power and p̂ is the predicted wind power.

e_{t+k|t} = p_{t+k} − p̂_{t+k|t}    (3.1)

and the normalized prediction error ε as seen in equation 3.2:

ε_{t+k|t} = (1 / p_inst) e_{t+k|t} = (1 / p_inst) (p_{t+k} − p̂_{t+k|t})    (3.2)

where p_inst is the installed capacity of the wind farm (in kW or MW), which is unknown in the competition.


3.1.3 Reference models

Persistence (also called a naïve or plain predictor), as seen in equation 3.3, is the model most commonly used for benchmarking WPF models. This simple model states that the future wind generation value will be the same as the last measured value:

p̂^{persistence}_{t+k|t} = p_t    (3.3)

An alternative would be to use an even simpler model: a climatology prediction (equation 3.4), i.e. predicting the mean, a value approximated from the training set (see section 3.1.5).

p̂^{mean}_{t+k|t} = p̄ = (1/N) Σ_{t=1}^{N} p_t    (3.4)

There are other reference models, like the one suggested by Nielsen et al. [1998], which has advantages over persistence, but it was never widely adopted and is not used by GEFCom.
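The two reference models in equations 3.3 and 3.4 can be sketched directly; the production series here is toy data for illustration.

```python
# Sketch of the two reference models (equations 3.3 and 3.4).

def persistence_forecast(history, k):
    """Persistence: every lead time gets the last measured value."""
    return [history[-1]] * k

def climatology_forecast(training_set, k):
    """Climatology: every lead time gets the training-set mean."""
    mean = sum(training_set) / len(training_set)
    return [mean] * k

production = [0.2, 0.35, 0.3, 0.45]        # toy normalized power series
print(persistence_forecast(production, 3))  # [0.45, 0.45, 0.45]
print(climatology_forecast(production, 2))  # mean of the series, repeated
```

Both produce flat forecasts; persistence tends to win at short horizons, climatology at long ones.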

3.1.4 Error metrics

In order to understand why specific models perform well, it is usually a good idea to evaluate them against a wide variety of different criteria, as is emphasised by Kariniotakis [1997]. The following sections describe the error measures used; throughout, N denotes the size of the test set.

Forecast Bias

The Normalized Forecast Bias (NBIAS) describes the systematic error and is defined in equation 3.5; it is estimated by calculating the average error for each step ahead. It gives an indication of the direction of the error.

NBIAS_k = (1/N) Σ_{t=1}^{N} ε_{t+k|t}    (3.5)

This bias is sometimes referred to by some authors as the Mean Forecast Error (MFE). A perfect MFE does not mean the forecast is perfect and contains no error, as positive and negative errors cancel each other out, but it indicates that the forecasts are on target on average.


Mean Absolute Error

The Normalized Mean Absolute Error (NMAE) is an error quantity that looks at the average of the absolute error of the prediction and is defined in equation 3.6:

NMAE_k = (1/N) Σ_{t=1}^{N} |ε_{t+k|t}|    (3.6)

Another common name used instead of Mean Absolute Error (MAE) is Mean Absolute Deviation (MAD). This value shows the magnitude of the overall error that has occurred due to forecasting and thus should be as small as possible. This error is also scale dependent and will be affected by data transformations and the scale of measurements.

Mean Squared Error

The Normalized Mean Squared Error (NMSE) is an error quantity that looks at the average of the squared errors ε²_{t+k|t} and is built using the Normalized Sum of Squared Errors (NSSE) defined in equation 3.7:

NSSE_k = Σ_{t=1}^{N} ε²_{t+k|t}    (3.7)

NMSE is defined in equation 3.8:

NMSE_k = (1/N) NSSE_k = (1/N) Σ_{t=1}^{N} ε²_{t+k|t}    (3.8)

In this error, positive and negative errors do not cancel each other out, and large individual errors are penalized more harshly.

Root Mean Squared Error

The Normalized Root Mean Squared Error (NRMSE) is the square root of the NMSE; it is defined in equation 3.9:

NRMSE_k = NMSE_k^{1/2} = ( (1/N) Σ_{t=1}^{N} ε²_{t+k|t} )^{1/2}    (3.9)

NRMSE is the main metric used in GEFCom, and it shares the same properties as NMSE.
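The metrics in equations 3.5-3.9 can be sketched as small functions over the normalized errors for a fixed lead time k:

```python
# Sketch of the error metrics in equations 3.5-3.9, computed on a list
# eps of normalized prediction errors for one lead time k.

def nbias(eps):
    return sum(eps) / len(eps)

def nmae(eps):
    return sum(abs(e) for e in eps) / len(eps)

def nsse(eps):
    return sum(e * e for e in eps)

def nmse(eps):
    return nsse(eps) / len(eps)

def nrmse(eps):
    return nmse(eps) ** 0.5

eps = [0.1, -0.1, 0.2, -0.2]   # toy normalized prediction errors
print(nbias(eps))   # 0.0 -- positive and negative errors cancel
print(nmae(eps))    # 0.15
print(nrmse(eps))   # larger than NMAE: big errors are penalized harder
```

The toy values illustrate the discussion above: a bias of zero despite non-zero errors, and NRMSE exceeding NMAE because the squared form weights the larger errors more.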


3.1.5 Model selection

In regression and classification, one of the main issues we are faced with is "How do we create a good model?". One way to define this "goodness" is to look at the model's generalization ability, i.e. its performance on unseen data.

To make sure the model we are building will generalize to new unseen data, it is important how we select and build the model in the first place; what we have control over is the data we have at hand and how we use that data to build our model.

Training, testing and validating

In the case of supervised learning we have access to a target series and its associated features, i.e. the production series (our target) with wind speed and wind direction as features.

Common practice within Machine Learning, which is adhered to in this thesis, is to split the whole dataset into 3 smaller subsets. Two of these subsets are used to find a good model (the training set and the validation set), and the remaining one (the test set) is used for evaluation, i.e. to estimate how accurate the model would be on "unseen data". With highly flexible models like an artificial neural network we need to be careful not to overfit the data, which is one of the reasons why we have the validation set: we do not want a model that generalizes poorly because it has been fitted to every minor variation, i.e. has captured a lot of noise.
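The three-way split described above can be sketched as follows; the proportions are parameters, and the shuffling assumes samples may be treated as exchangeable (for strict time-series evaluation, contiguous blocks are often preferred):

```python
# Minimal sketch of a random train/validation/test split.
import random

def three_way_split(data, train=0.6, val=0.2, seed=0):
    """Randomly split data into train/validation/test subsets."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    n_train = int(train * len(data))
    n_val = int(val * len(data))
    train_set = [data[i] for i in idx[:n_train]]
    val_set = [data[i] for i in idx[n_train:n_train + n_val]]
    test_set = [data[i] for i in idx[n_train + n_val:]]
    return train_set, val_set, test_set

tr, va, te = three_way_split(list(range(100)))
print(len(tr), len(va), len(te))  # 60 20 20
```

The test set is held back entirely until the final evaluation; only the training and validation sets drive model fitting and selection.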

3.1.6 Evaluation

The improvement with respect to a considered reference model ref is defined in equation 3.10:

I^{ref}_{EC,k} = 100 · (EC^{ref}_k − EC_k) / EC^{ref}_k  (%)    (3.10)

where the Evaluation Criterion (EC) is any error measurement, such as NMSE or NMAE.


Table 3.1: Testing periods.

Date       | Time  | Forecast
2011-01-01 | 01:00 | 1-48 hours
2011-01-04 | 13:00 | 1-48 hours
2011-01-08 | 01:00 | 1-48 hours
2011-01-11 | 13:00 | 1-48 hours
...        | ...   | ...
2012-06-23 | 01:00 | 1-48 hours
2012-06-26 | 13:00 | 1-48 hours

The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, power observations are available for updating the models.

3.2 Experiments

Training and testing are structured according to GEFCom. The dataset spans from midnight on the 1st of July 2009 to noon on the 26th of June 2012. The period from the 1st of July 2009 to the 1st of January 2011 at 01:00 is used for training and validating, while the rest is used for testing. In the testing range a number of 48-hour periods are defined (see table 3.1); in between these testing periods additional training data exists, which enables the models to be updated between forecasts.

The testing periods repeat every 7 days until the end of the dataset, and only meteorological forecasts that were relevant for the periods with missing power data are given; this was done in order to be consistent.

Meteorological forecasts in the GEFCom dataset consist of zonal u and meridional v components collected for surface winds at 10 m above ground level. They were extracted from the archive of the European Centre for Medium-range Weather Forecasts (ECMWF), which issues high-resolution deterministic forecasts twice a day, at 00 Coordinated Universal Time (UTC) and 12 UTC; each of these forecast periods consists of data for 1-48 hours ahead. In order to match the hourly resolution of the power measurements, the forecasts were interpolated using cubic splines to an hourly resolution. A summary of the features found in the dataset is given in table 3.2.
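The interpolation step can be sketched with SciPy's cubic splines; the grid and wind values here are toy data, not the GEFCom series:

```python
# Sketch of the preprocessing step: NWP wind components on a coarse time
# grid are interpolated with a cubic spline to hourly resolution.
import numpy as np
from scipy.interpolate import CubicSpline

issue_hours = np.array([0.0, 3.0, 6.0, 9.0, 12.0])   # toy coarse grid
u_component = np.array([1.0, 2.5, 2.0, 3.5, 3.0])    # toy zonal wind

spline = CubicSpline(issue_hours, u_component)
hourly = np.arange(0.0, 12.1, 1.0)                   # hourly resolution
u_hourly = spline(hourly)

# wind speed and direction can then be derived from u and v, e.g.
# ws = sqrt(u**2 + v**2)
print(u_hourly)
```

Because the spline interpolates, the hourly series passes exactly through the original coarse-grid samples.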


No | Category | Parameter            | Alias | Type
1  | Date     | Date                 | date  | String
2  | Date     | Year                 | year  | Integer
3  | Date     | Month                | month | Integer
4  | Date     | Day                  | day   | Integer
5  | Date     | Hour                 | hours | Integer
6  | Date     | Week                 | week  | Integer
7  | Forecast | Wind Speed           | ws    | Real
8  | Forecast | Wind Direction (deg) | wd    | Real
9  | Forecast | Wind U               | u     | Real
10 | Forecast | Wind V               | v     | Real
11 | Forecast | Issued               | hp    | Integer
12 | SCADA    | Production           | wp    | Real

Table 3.2: Preprocessed features from the dataset. The Forecast category represents the forecast given by the NWP; multiple forecasts will exist for a given date. The latest issued forecast available is the feature we will use in training and testing.


The database for the meteorological forecasts given by the GEFCom dataset contains sections of missing data, each corresponding to a date for which the power information is also missing; these sections were filled out, in a pre-processing step, with the previous best forecast available. For example, if we do not have data for the forecast issued at 2011-01-01 12:00, we use data from 2011-01-01 00:00, since a 48-hours-ahead forecast is available for that date. If the previous section also contained missing data, we would go back an additional section and use those forecasts. If no forecast is available 48 hours back, we can use the best known forecast and extend it in the same fashion as the persistence model.
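The fallback rule described above can be sketched as a loop over earlier issue times; the dictionary layout and the look-back limit are assumptions for the example:

```python
# Sketch of the gap-filling rule: if the forecast issued at a given hour
# is missing, step back one 12-hour issue cycle at a time until an
# available forecast is found.

def best_available_forecast(forecasts, issue_time, max_lookback=4, step=12):
    """forecasts: dict mapping issue hour -> forecast data.
    Returns the most recent available issue at or before issue_time."""
    t = issue_time
    for _ in range(max_lookback + 1):
        if t in forecasts:
            return forecasts[t]
        t -= step  # go back one issue cycle
    return None  # nothing usable; extend the last known forecast instead

issues = {0: "forecast@00", 24: "forecast@24"}   # issue at hour 12 missing
print(best_available_forecast(issues, 12))       # falls back to hour 0
```

When the loop exhausts its look-back budget, the caller falls back to persistence-style extension, as described in the text.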

Any predictions made by these models should fall within a certain range, so any forecast outside this range is clamped. This is the main post-processing step, i.e. there is an upper limit on what can be produced.

3.3 Holdback Input Randomization

The Holdback Input Randomization (HIPR) method, described in Kemp et al. [2007], can be used to investigate the importance of the input parameters. It works by sequentially feeding each data point in the test set to the neural network while replacing the values of one input parameter at a time. The replacement values are uniformly distributed random values in the range the neural network was originally trained on, i.e. (−1, 1). An NRMSE score is calculated for each replacement, giving information about the relevance of each input.
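HIPR can be sketched as follows; the model, metric and data here are toy stand-ins, and the feature-wise randomization range (−1, 1) matches the training normalization described later:

```python
# Sketch of Holdback Input Randomization: replace one input feature at a
# time with uniform noise in (-1, 1) and measure how much the error
# degrades; important features degrade the score the most.
import random

def hipr_scores(model, X, y, error_fn, seed=0):
    rng = random.Random(seed)
    scores = {}
    n_features = len(X[0])
    for j in range(n_features):
        X_rand = [row[:] for row in X]
        for row in X_rand:
            row[j] = rng.uniform(-1.0, 1.0)   # randomize feature j only
        scores[j] = error_fn(model(X_rand), y)
    return scores

# toy model that only uses feature 0, so only feature 0 should matter
model = lambda X: [row[0] for row in X]
rmse = lambda p, y: (sum((a - b) ** 2 for a, b in zip(p, y)) / len(y)) ** 0.5
X = [[0.5, -0.3], [-0.2, 0.8], [0.1, 0.0]]
y = [0.5, -0.2, 0.1]
s = hipr_scores(model, X, y, rmse)
print(s)  # score for feature 0 is high, feature 1 stays at zero
```

Randomizing an irrelevant input leaves the predictions unchanged, so its score stays at the baseline error; a relevant input produces a clear degradation.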

3.4 Optimization methods

The two main optimization algorithms used in this thesis are the Particle Swarm Optimization (PSO) algorithm presented by Eberhart and Kennedy [1995] and the Levenberg-Marquardt (LM)2 algorithm independently developed by Kenneth Levenberg and Donald Marquardt [Marquardt, 1963; Levenberg, 1944]. The PSO algorithm is a population-based stochastic algorithm similar to a Genetic Algorithm (GA) but based on social-psychological principles instead of evolution. It can be summarized by the following steps; each network was trained multiple times in order to avoid local minima.

2. The LM algorithm was used in the beginning of the project, but the reported results ended up using the PSO algorithm. I have included the LM algorithm in this section because speed optimization was measured using this algorithm.

• Step 1: Initialize particles with random velocities and accelerations.

• Step 2: Determine which particle is closest to the goal.

• Step 3: Adjust accelerations toward that particle.

• Step 4: Update particle positions based on their velocities, and update velocities based on accelerations.

• Step 5: Go to step 2.
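The steps above can be sketched as a minimal PSO in its common velocity-update form; the inertia and acceleration coefficients and the sphere objective are illustrative assumptions, not the values used in the thesis:

```python
# Minimal PSO sketch, minimizing a function f over d dimensions
# (the sphere function stands in for the network's training error).
import random

def pso(f, d, n_particles=20, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(d)] for _ in range(n_particles)]
    vel = [[0.0] * d for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # each particle's best
    gbest = min(pbest, key=f)[:]                # swarm best (step 2)
    for _ in range(iters):
        for i in range(n_particles):
            for j in range(d):
                # step 3: accelerate toward personal and global best
                vel[i][j] = (w * vel[i][j]
                             + c1 * rng.random() * (pbest[i][j] - pos[i][j])
                             + c2 * rng.random() * (gbest[j] - pos[i][j]))
                pos[i][j] += vel[i][j]          # step 4: move
            if f(pos[i]) < f(pbest[i]):
                pbest[i] = pos[i][:]
        gbest = min(pbest, key=f)[:]            # step 5: repeat from step 2
    return gbest

sphere = lambda x: sum(v * v for v in x)
best = pso(sphere, d=2)
print(sphere(best))  # close to the minimum at the origin
```

In the thesis setting, f would evaluate a candidate weight vector by the network's validation error, and the returned gbest would be the chosen weights.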

The LM algorithm is a combination of the Error Back-Propagation (EBP) method [Rumelhart et al., 1988] and the Gauss-Newton method. This algorithm has the speed advantage of Gauss-Newton and the stability of EBP [Yu and Wilamowski, 2011]. A detailed treatment of LM can be found in Moré [1978] and of PSO in Poli et al. [2007].

3.5 Neural Networks

3.5.1 Multilayer Perceptron

The perceptron is built around a nonlinear model of a neuron, the McCulloch-Pitts model [McCulloch and Pitts, 1943]. It basically consists of a 2-step process where the cell body contains a summation function of the weighted sum of all inputs, including a bias. The perceptron is described by equation 3.11. The sum s is passed through an activation function (see the Activation Functions subsection below), which mimics the activation or firing of the neuron.

s = Σ_{i=1}^{M} w_i x_i + x_0    (3.11)

w_i is the weight of the "synapse" of input channel i and is the parameter we want to adjust, x_i is the input value and x_0 is the bias; M denotes the number of inputs.
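Equation 3.11 followed by an activation function amounts to a few lines; the weights here are arbitrary example values:

```python
# Sketch of a single perceptron: the weighted sum plus bias of
# equation 3.11, squashed with the tanh activation of equation 3.13.
import math

def perceptron(weights, bias, inputs):
    s = sum(w * x for w, x in zip(weights, inputs)) + bias   # eq. 3.11
    return math.tanh(s)                                       # activation

print(perceptron([0.5, -0.25], 0.1, [1.0, 2.0]))  # tanh(0.1), about 0.0997
```

An MLP simply chains layers of such units, each layer's outputs forming the next layer's inputs.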


Figure 3.1: The perceptron. Weighted input signals and a bias are summed, and the sum is passed through the activation function f(s) to produce the output signal.

The structure of the MLP consists of many perceptrons, as shown in figure 3.2. The input signal flows from the input layer at the bottom to the output layer at the top. There is a bias signal, seen on the left side of the diagram, which is set to a fixed number.

MLPs are fully connected networks, meaning that the neurons in any layer of the network are connected to all the neurons in the previous layer. Each connection in the network has a weight w_ij associated with it. These weights are initialized before any training takes place, by randomly assigning very small values to the respective weights in the graph. The architecture used in this study has only one output variable, i.e. the approximated function produces forecasts of the power generation given a certain input.


Figure 3.2: Architectural graph of the neural network that produces a single output value. It consists of a collection of hidden neurons (tanh activations) in each of H hidden layers and a linear output neuron, with M input connections (inputs: hours, u, v, week, ws, ws−1, ws−2, ws+1, ws+2) and a bias signal. Each edge in this graph has a weight w_ij associated with it.

The performance of neural networks generally improves if the data is normalised, because using the original data directly can cause convergence problems. Normalization is done using the mapminmax function seen in equation 3.12:

y = (y_max − y_min) · (x − x_min) / (x_max − x_min) + y_min    (3.12)

y_max is the maximum of the specified range, which in this case is 1, and y_min is −1; x is the value to be scaled, x_max is the maximum of the numbers to be scaled and x_min is their minimum.
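Equation 3.12 and its inverse can be sketched directly; the data below is a toy series:

```python
# Sketch of the mapminmax normalization of equation 3.12, mapping a
# series into [-1, 1], plus the inverse mapping back to original units.

def mapminmax(x, x_min, x_max, y_min=-1.0, y_max=1.0):
    return (y_max - y_min) * (x - x_min) / (x_max - x_min) + y_min

def mapminmax_inverse(y, x_min, x_max, y_min=-1.0, y_max=1.0):
    return (y - y_min) * (x_max - x_min) / (y_max - y_min) + x_min

data = [2.0, 5.0, 8.0]
lo, hi = min(data), max(data)
scaled = [mapminmax(v, lo, hi) for v in data]
print(scaled)  # [-1.0, 0.0, 1.0]
```

The inverse is needed at prediction time, to map the network's output in [−1, 1] back to power units.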

The forecasting model consists of 7 different networks, one for each wind farm; these networks are trained on the first section of the dataset, before the test period. A random 60/20/20 split is created, where 60% of the available data is used for training, 20% is used to validate the network, and 20% is set aside as a hold-out set for the hyperparameters. The input features3 fed into the models are ws, u, v, hours, ws+1, ws+2, ws−1, ws−2, where ws+x and ws−x denote a time shift of x; we can use ws+x as input into the model because ws is a forecast in itself.

3. See table 3.2.

Activation Functions

The computation done for each neuron in the multilayer perceptron requires knowledge of the derivative of the activation function; in other words, the activation function we pick needs to be differentiable. In this thesis we use the following two activation functions: the hyperbolic tangent function, seen in equation 3.13, and

f(s) = tanh(s) = (e^s − e^{−s}) / (e^s + e^{−s})    (3.13)

the linear transfer function seen in equation 3.14:

f(s) = +1 if s ≥ 1;  s if −1 < s < 1;  −1 if s ≤ −1    (3.14)

Hyperparameter optimization

In order to obtain the hyperparameters for each wind farm's model, a random hyperparameter search was performed for all models, and a hold-out validation set was used to pick the best hyperparameters. Random search has been shown to work better than grid search when not all parameters are equally important [Bergstra and Bengio, 2012].
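Random search can be sketched as follows; the parameter names and the toy objective are illustrative assumptions, not the thesis's actual search space:

```python
# Sketch of random hyperparameter search: sample configurations at
# random and keep the one with the lowest validation error.
import random

def random_search(evaluate, space, n_trials=50, seed=0):
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in space.items()}
        score = evaluate(params)            # validation-set error
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

space = {"hidden_units": [5, 10, 20, 40], "learning_rate": [0.001, 0.01, 0.1]}
# toy objective standing in for "train the network, return validation error"
evaluate = lambda p: abs(p["hidden_units"] - 20) + abs(p["learning_rate"] - 0.01)
best, score = random_search(evaluate, space)
print(best, score)
```

Because each trial is independent, random search spends its budget evenly across all dimensions, which is what gives it the edge over grid search when only a few parameters matter.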

3.5.2 Numenta Platform for Intelligent Computing

This section describes the key principles introduced with NuPIC. It explains in general terms the theory behind the Online Prediction Framework (OPF)4, which uses the CLA and HTM algorithms; the OPF works as an API to create predictive models. The OPF consists of 5 major types of components: encoders, the Spatial Pooler (SP), the Temporal Memory (TM), the Temporal Pooler (TP) and classifiers. Together these components construct a single CLA region5, and figure 3.3 demonstrates the information flow through a single region.

4. The OPF is used with Numenta's commercial product Grok.
5. Currently, models created with the OPF do not use a TP, nor does the client allow creation of a hierarchy of regions, i.e. what we would call an HTM; it is only possible to create one region.


Another central concept in NuPIC is the Sparse Distributed Representation (SDR), which refers to the activation of a small percentage of the neurons at any given time; this neuronal activity is represented as an n-dimensional sparse binary vector x = [b_0, ..., b_n] with around 2% active cells. The outputs from the spatial pooler and the temporal memory are both SDRs, while the output from the encoder does not enforce this and is just a normal binary vector. A general overview of the properties of SDRs is given by Ahmad and Hawkins [2015].

Figure 3.3: Information flow of a single-region predictive model created with the OPF: Encoder → Spatial Pooler → Temporal Memory → Classifier.

The overlap between two different SDRs is defined by

o(x, y) = x · y    (3.15)

A match between two SDRs is defined by

m(x, y) := o(x, y) ≥ θ    (3.16)

where θ is set such that θ ≤ ‖x‖₁ and θ ≤ ‖y‖₁. An interesting property of SDRs, used multiple times inside the temporal memory and especially for predictions, is the fact that a set of fixed-size SDRs can be reliably stored as a single pattern by taking their union: the boolean OR operator is used to create a new vector from a set of SDRs. The downside of storing patterns this way is that the more patterns we store, the bigger the probability of false positives.

Table 3.3: Example values encoded with a ScalarEncoder, where n = 14, r = 5, ψ = 1.

Value | Scalar Encoding
1     | 11111000000000
2     | 01111100000000
10    | 00000000011111
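Equations 3.15 and 3.16 and the union property can be sketched over small binary vectors (the vectors below are toy examples, far denser than real SDRs):

```python
# Sketch of SDR overlap (eq. 3.15), match (eq. 3.16) and the union
# property: a set of SDRs stored as a single OR-ed pattern.

def overlap(x, y):
    return sum(a & b for a, b in zip(x, y))          # dot product of bits

def match(x, y, theta):
    return overlap(x, y) >= theta

def union(sdrs):
    return [int(any(bits)) for bits in zip(*sdrs)]   # boolean OR

a = [1, 1, 0, 0, 0, 0]
b = [0, 1, 1, 0, 0, 0]
c = [0, 0, 0, 0, 1, 1]
u = union([a, b])
print(overlap(a, b))         # 1
print(match(u, a, theta=2))  # a is "stored" in the union
print(match(u, c, theta=2))  # c is not
```

As more SDRs are OR-ed into u, its density grows, and eventually unrelated vectors start matching by chance, which is the false-positive risk mentioned above.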

Encoders

NuPIC contains many different encoders6; the job of an encoder is to convert raw input into a more suitable representation (i.e. a binary vector). The raw inputs are fed into the model using a dictionary data structure.

Useful encoders include scalar and categorical encoders, while more exotic ones include an encoder for Global Positioning System (GPS) coordinates, which can be used to extract information about anomalous movements. The entries of the raw-input dictionary are each encoded separately and concatenated using a multi-encoder.

One property of the spatial pooler is that overlapping input patterns are mapped to the same SDR. This means that we want the encoder to encode input so that similar inputs share bits. The ScalarEncoder fulfils this property via the process illustrated in table 3.3.

We calculate the range of values to encode using equation 3.17, where v_min represents the minimum value of the input signal and v_max denotes its upper bound:

v_range = v_max − v_min    (3.17)

There are three mutually exclusive parameters that determine the overall size of the output from a scalar encoder: (n, r, ψ). n directly represents the total number of bits in the output, and it must be greater than or equal to w, which represents the number of bits that are set to encode a single value, i.e. the "width" of the output signal7. r and ψ are specified with respect to the input, while w is specified with respect to the output. Two inputs separated by more than the radius have non-overlapping representations. Two inputs separated by less than the radius will in general overlap in at least some of their bits. Two inputs separated by greater than or equal to the resolution are guaranteed to have different representations. ψ can be calculated using equation 3.18.

6. A full list of all encoders can be found in the API documentation for NuPIC.

ψ = r / w    (3.18)

Depending on whether we want periodic behaviour or not, n is calculated a bit differently; see the ScalarEncoder implementation for details8.
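A non-periodic scalar encoding like the one in table 3.3 can be sketched as a sliding window of w active bits; this is a simplified stand-in for NuPIC's ScalarEncoder, not its actual implementation:

```python
# Sketch of a non-periodic scalar encoding in the style of table 3.3:
# w contiguous active bits slide across n total bits as the value moves
# from v_min to v_max, so nearby values share bits.

def encode_scalar(value, v_min, v_max, n=14, w=5):
    value = max(v_min, min(v_max, value))      # clip into [v_min, v_max]
    n_buckets = n - w + 1                      # possible start positions
    bucket = round((value - v_min) / (v_max - v_min) * (n_buckets - 1))
    return [1 if bucket <= i < bucket + w else 0 for i in range(n)]

for v in (1, 2, 10):
    print(v, "".join(map(str, encode_scalar(v, 1, 10))))
```

With v_min = 1, v_max = 10, n = 14 and w = 5, this reproduces the three encodings shown in table 3.3, and values 1 and 2 share four of their five active bits.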

Spatial Pooler

The spatial pooler receives a binary vector as its input from an encoder and outputs an SDR. The structure of the spatial pooler consists of an input space and a set of columns; the output SDR represents which of the columns in a region are active. Each column has synapses connected to the input space: around 50% randomly chosen, potentially connected synapses, the so-called "potential pool". Each synapse connects and disconnects with the input space during learning.

The general information flow of the spatial pooler consists of the following steps: input stimuli from lower regions lead to activation of the input space to which each mini-column is synaptically connected; this activation, the so-called "overlap score", is computed for each column. The score is calculated as the total sum over the active inputs connected to that column, weighted with a "boosting factor" that increases the chance for certain columns to win more easily. A final top percentage (usually around 2% activity) of the columns with the highest influence (biggest overlap score) is chosen; columns also inhibit nearby columns, and the result of this process is a binary vector with few active bits, an SDR. This is illustrated in equation 3.19, where b can either be 0 or 1 and s is a value representing the score.

7. w must be odd to avoid centering problems.
8. https://github.com/numenta/nupic


[b_1 b_2 ... b_n] (input vector) · [b_ij] (connected synapses for each column) = [s_1 s_2 ... s_n] (overlap score) → inhibition → [b_1 b_2 ... b_n] (output SDR)    (3.19)

Learning in this structure is done by adjusting the permanences of the proximal dendrites to better match the input: for each winning column, the permanence of the synapses that correctly matched the input is increased and the permanence of the rest is decreased. We also increase the boosting factor of losing columns to give them a bigger chance of winning next time.
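The overlap-and-inhibit step of equation 3.19 can be sketched as follows; this is a toy model with global inhibition, binary synapses given directly, and made-up column sizes:

```python
# Sketch of the spatial pooler's overlap-and-inhibit step (eq. 3.19):
# each column's overlap score is the boosted count of its connected
# synapses whose input bit is on; the top fraction of columns wins.

def spatial_pool(input_bits, connected, boost, sparsity=0.02):
    """connected[c] is the binary synapse vector of column c."""
    scores = [boost[c] * sum(i & s for i, s in zip(input_bits, syn))
              for c, syn in enumerate(connected)]
    n_active = max(1, int(sparsity * len(connected)))   # global inhibition
    winners = sorted(range(len(scores)), key=lambda c: -scores[c])[:n_active]
    return [1 if c in winners else 0 for c in range(len(connected))]

inp = [1, 0, 1, 1]
cols = [[1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 1, 1]]   # 3 toy columns
print(spatial_pool(inp, cols, boost=[1.0, 1.0, 1.0]))  # [0, 0, 1]
print(spatial_pool(inp, cols, boost=[2.0, 1.0, 1.0]))  # boosting flips the winner
```

The second call shows the role of boosting: a raised boost factor lets an otherwise losing column win the inhibition round.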

Temporal Memory

The temporal memory receives an SDR from the spatial pooler, which represents the active columns; the output of the temporal memory is the activity of the whole CLA region. The general idea of the TM9 is that the cells in each mini-column provide temporal context to the pattern identified by the spatial pooler. The TM's job is to form transitions between different SDRs, and it achieves this by allowing active cells to form connections to previously active cells, so that in a future setting each cell is able to predict its own activity.

Each cell in a region has several distal segments, ideally one for each pattern the cell has transitioned from; in practice these segments are each connected to a couple of patterns. A cell enters a predictive state if there is enough activity on a segment, and every segment consists of a collection of connections, synapses that have been formed to a subset of previously active cells (typically around 10-15 cells).

The temporal memory receives a sparse binary vector from the spatial pooler, and the general information flow for this part of the CLA goes through two phases. The first phase has the following steps: 1) for each active column, check whether any cell is in a predictive state; if so, 2) activate that particular cell. If no cell was found in a predictive state, 3) activate all cells in that particular column, a process called bursting.

9. There are a lot of details in how the temporal memory is implemented; pseudo-code and more details can be found in [Numenta, 2011]. The NuPIC git repository is the best source for the finer details: https://github.com/numenta/nupic

Bursting is analogous to saying: we do not know which cell to activate, because we have not seen this instance of the sequence of patterns before and are unable to put the column into the correct temporal context, so let us activate all cells, reflecting this uncertainty. Bursting does not occur when the temporal context has been predicted by the CLA. Equation 3.20 shows the first phase.

[b_1 b_2 ... b_n] (SP SDR) applied to [b_ij] (predictive state) = [b_ij] (active state)    (3.20)

The second phase of the algorithm figures out which cells should be turned into a predictive state for the next time step. This is achieved by checking every distal segment of every cell for activity above a certain threshold, i.e. checking for active segments; one active segment is enough to put a cell into a predictive state. The output from the temporal memory is a vector representing the state of all cells in that region. Equation 3.21 shows phase 2.

[b_1 b_2 ... b_n] (active state) · [b_ij]_X (segment X) = [b_1 b_2 ... b_n]_X (segment activation X) > τ → [s_1 s_2 ... s_n] (predictive state)    (3.21)

If learning is turned on, the permanences of the synapses connected to active distal segments are updated (in the same way as in the spatial pooler). These changes are marked as temporary until we are sure that the cell is in fact able to correctly predict something with them; once this is known, the changes either become permanent or are removed. Temporarily marked changes are made permanent whenever a cell goes from inactive to active through feed-forward input (we update the permanences because the cell correctly predicted the feed-forward activation); if a cell instead went from active to inactive, the change is undone.
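The two phases can be sketched at the level of columns and cells; this is a much-simplified toy model (real CLA segments, permanences and learning are omitted, and the column/cell sizes are made up):

```python
# Much-simplified sketch of the two temporal-memory phases: phase 1
# activates predicted cells or bursts whole columns; phase 2 marks
# cells with a sufficiently active distal segment as predictive.

CELLS_PER_COLUMN = 4

def phase1_activate(active_columns, predictive_cells):
    """predictive_cells: set of (column, cell) pairs from the last step."""
    active = set()
    for col in active_columns:
        predicted = [(col, c) for c in range(CELLS_PER_COLUMN)
                     if (col, c) in predictive_cells]
        if predicted:
            active.update(predicted)   # correctly predicted cells only
        else:                          # no prediction: burst the column
            active.update((col, c) for c in range(CELLS_PER_COLUMN))
    return active

def phase2_predict(active_cells, segments, tau=1):
    """segments: dict mapping a cell to its distal segments, each
    segment being a set of presynaptic cells."""
    return {cell for cell, segs in segments.items()
            if any(len(seg & active_cells) >= tau for seg in segs)}

active = phase1_activate([0, 1], predictive_cells={(0, 2)})
print(sorted(active))   # column 0: only cell 2; column 1: bursts (all cells)
nxt = phase2_predict(active, {(2, 0): [{(0, 2)}]})
print(nxt)              # cell (2, 0) becomes predictive for the next step
```

Column 0 was predicted, so only its predicted cell activates; column 1 was not, so it bursts, and the downstream cell whose segment connects to the active cell becomes predictive.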


NuPIC Classifiers

There are many different classifiers included with NuPIC, such as a k-Nearest Neighbour (kNN) classifier and a Support Vector Machine (SVM); the OPF in particular uses a custom-built classifier called the "CLAClassifier". The purpose of this classifier is to decode predictions made by the CLA. Classification with the CLAClassifier is performed in different ways depending on how the input has been encoded.

Scalar values are decoded using a process where each cell in a CLA region is paired with two histograms. One of the histograms keeps track of the frequency of encountered patterns associated with each cell; the other keeps track of a moving average for each bucket. This process is illustrated in figure 3.4.

Figure 3.4: The CLAClassifier. Each cell in every column of the SDR is paired with two histograms over the buckets between the min and max value: one tracking likelihood and one tracking a moving average.

Training in NuPIC

Training the NuPIC model is done using online learning algorithms. The schema seen in figure 3.5 is used to train and test these models; since there are 7 different wind farms, this schema is repeated for each wind farm.

This schema is slightly adjusted from the traditional way the OPF handles multi-step predictions. The default is to use 48 different models in the CLAClassifier, one for every step ahead, which would give us 48 · 7 = 336 models in total (for all the wind farms). Not only does this change match Expektra's approach, but it also reduces the memory requirements of the CLAClassifier.

Figure 3.5: Training an OPF model. In the training phase, hyperparameters are set up via PSO swarming or manually on a pre-training data chunk, and the model then learns online over the training data stream. In the testing phase, online learning is deactivated and the model produces multi-step predictions on the test data.

Input and hyperparameter selection

Hyperparameters for an OPF model were found partly using a custom built-in PSO algorithm and partly by manual configuration, mainly by ensuring that the PSO algorithm did not remove encoders. A single CLA region contains a lot of hyperparameters; these are listed with a corresponding description in appendix A. The inputs to the model are date, ws, wp, u and v.
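As an illustration of the swarming idea, a generic global-best particle swarm optimizer can be sketched as follows (a minimal sketch in the spirit of Eberhart and Kennedy [1995]; function and parameter names are illustrative, not NuPIC's swarming API):

```python
import random

def pso(objective, bounds, n_particles=10, iters=50, w=0.7, c1=1.5, c2=1.5):
    """Minimize `objective` over box `bounds` (list of (lo, hi) per
    dimension) with a plain global-best PSO. Returns (best position, value)."""
    dim = len(bounds)
    pos = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                # Inertia + cognitive pull + social pull.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```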


Chapter 4

Result

This chapter presents the primary results obtained in this study. Figures 4.4-4.10 contain different error measurements on the test set for each wind farm. We see that Expektra's ANN model performs well over all wind farms, while the NuPIC model performs worse than Expektra but in general still better than our reference model. NuPIC is off target, with a bias error on most wind farms. The graphs also indicate that NuPIC is unable to pick up some trends, visible in the cumulated ε² graphs. Expektra's model, on the other hand, shows no clear problems in cumulated ε² and is on target on all wind farms; appendix C has been included to reflect this for different lead times.

Figure 4.1: Left diagram: cumulative probability of wind speed. Right diagram: scatter plot of the power curve, normalized power output vs. wind speed. Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


Figure 4.2: Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).

Figure 4.3: Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


Figure 4.4: Different error measurements (NBIAS, NRMSE, NMAE and cumulated ε²) as a function of look-ahead time for WF 1, comparing Expektra, NuPIC and persistence.


Figure 4.5: Different error measurements (NBIAS, NRMSE, NMAE and cumulated ε²) as a function of look-ahead time for WF 2, comparing Expektra, NuPIC and persistence.


Figure 4.6: Different error measurements (NBIAS, NRMSE, NMAE and cumulated ε²) as a function of look-ahead time for WF 3, comparing Expektra, NuPIC and persistence.


Figure 4.7: Different error measurements (NBIAS, NRMSE, NMAE and cumulated ε²) as a function of look-ahead time for WF 4, comparing Expektra, NuPIC and persistence.


Figure 4.8: Different error measurements (NBIAS, NRMSE, NMAE and cumulated ε²) as a function of look-ahead time for WF 5, comparing Expektra, NuPIC and persistence.


Figure 4.9: Different error measurements (NBIAS, NRMSE, NMAE and cumulated ε²) as a function of look-ahead time for WF 6, comparing Expektra, NuPIC and persistence.


Figure 4.10: Different error measurements (NBIAS, NRMSE, NMAE and cumulated ε²) as a function of look-ahead time for WF 7, comparing Expektra, NuPIC and persistence.


                     Wind Farm
User          1      2      3      4      5      6      7      All
Leustagos     0.145  0.138  0.168  0.144  0.158  0.133  0.140  0.146
DuckTile      0.143  0.145  0.172  0.145  0.165  0.137  0.146  0.148
MZ            0.141  0.151  0.174  0.145  0.167  0.141  0.145  0.149
Propeller     0.144  0.153  0.177  0.147  0.175  0.141  0.147  0.152
Duehee Lee    0.157  0.144  0.176  0.160  0.169  0.154  0.148  0.155
Expektra      0.165  0.158  0.184  0.164  0.179  0.153  0.153  0.165
MTU EE5260    0.161  0.172  0.193  0.162  0.192  0.156  0.160  0.168
SunWind       0.174  0.177  0.193  0.176  0.179  0.157  0.162  0.172
ymzsmsd       0.163  0.186  0.200  0.164  0.192  0.162  0.167  0.174
4138 Kalchas  0.180  0.179  0.197  0.175  0.200  0.160  0.165  0.177
NuPIC         0.243  0.254  0.264  0.310  0.290  0.224  0.240  0.264
Persistence   0.302  0.338  0.373  0.364  0.388  0.341  0.361  0.355

Table 4.1: NRMSE score of the entries published in [Hong et al., 2014]. The NuPIC and Expektra models have been added for easy comparison.

4.1 Experimental results

Looking at the primary results presented in this study, we see in table 4.1 that Expektra's ANN model achieves performance comparable to the other methods published with Hong et al. [2014]. This can be used as a starting point for further investigations. A similar approach to Expektra's is that of Duehee Lee [Lee and Baldick, 2014], as both methods are based on the same neural network architecture but with different optimization techniques. Duehee Lee's network is structured slightly differently: Expektra's model has only 1 output node, whereas Duehee Lee uses five (representing five predictions ahead). Besides these differences, Duehee Lee also includes additional sub-models to create a prediction ensemble, blending the output of the ANN with a Gaussian Process (GP) to improve very short-term predictions. MTU EE5260 is also a neural network approach that converts meteorological forecasts to power, and its score is very close. The HTM CLA beats the persistence model but needs more work to outperform the other models in GEFCom.
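The NRMSE used to score the entries in table 4.1 can be sketched as follows (assuming the standard normalized-RMSE definition, with GEFCom power already normalized to [0, 1] so the capacity defaults to 1):

```python
import numpy as np

def nrmse(predicted, actual, capacity=1.0):
    """Normalized root-mean-square error: RMSE of the power series divided
    by the installed capacity (a sketch of the standard definition)."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean(((predicted - actual) / capacity) ** 2)))
```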


4.2 Input Importance

For interpretation purposes, in order to understand the model better, an analysis of the relative input parameter importance was performed. The analysis is illustrated in figure 4.11. Each box represents noise added to that channel; the NRMSE score will be higher if the feature was important. A reference point, "all-channels", represents the error distribution of the model with no input replacement. We clearly see in this figure that the wind speed channel ws is the most important attribute: adding noise to this channel greatly affects the NRMSE score. We also see that the wind components u, v show little to no influence, and that the timestamp-related inputs hours and week both indicate that there are seasonal and daily trends present in the dataset.

Figure 4.11: Relative input parameter importance using HIPR, shown as NRMSE distributions for the channels all-channels, hours, u, v, week, ws and the shifted wind speeds ws-1, ws-2, ws-3, ws+1, ws+2, ws+3. "all-channels" reflects the reference point, i.e. the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is the directional vector of the wind.
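The perturbation analysis above can be sketched as follows (a generic sketch in the spirit of HIPR [Kemp et al., 2007]; the `model_predict` interface and the choice of uniform noise over each channel's range are assumptions, not the exact procedure used):

```python
import numpy as np

def input_importance(model_predict, X, y, error_fn, rng=None):
    """Replace one input channel at a time with noise drawn from that
    channel's own range and record the resulting error. A large increase
    over the baseline error marks an important channel."""
    rng = rng if rng is not None else np.random.default_rng(0)
    baseline = error_fn(model_predict(X), y)
    scores = {}
    for j in range(X.shape[1]):
        Xp = X.copy()
        lo, hi = X[:, j].min(), X[:, j].max()
        Xp[:, j] = rng.uniform(lo, hi, size=X.shape[0])  # scramble channel j
        scores[j] = error_fn(model_predict(Xp), y)
    return baseline, scores
```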


4.2.1 Adaptation and Optimization

All training for Expektra's model was performed on a MacBook Pro with an Intel Core 2 Duo processor (P8600, 2.40 GHz), a 64-bit operating system and 4 GB of installed RAM. After adapting the code to run experiments on the GEFCom dataset, the performance of the core source code provided by Expektra was evaluated. This investigation identified some slow parts of the code; after addressing these issues, mainly by using a MathNET native library instead of the managed provider, a speed test was performed comparing the unoptimized version with the optimized one. Figure 4.12 shows the speed-up achieved during training with the LM algorithm.

Figure 4.12: Training time of the unoptimized ("Normal") vs. the optimized version for a network with 10 input neurons, 10/15/20/25 hidden neurons and 1 output neuron. Training was done for 100 epochs using LM.

4.3 Summary

A plot showing the average NRMSE improvement over the persistence model can be seen in figure 4.13. Both models are able to perform better than persistence: Expektra's model performs better than the NuPIC model over the whole forecast horizon, and NuPIC performs better than persistence towards the end of the forecast.

Figure 4.13: Summarized average improvement (%) in NRMSE over persistence as a function of look-ahead time, across all wind farms, with 95% confidence intervals, for Expektra and NuPIC.
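The improvement measure plotted above can be computed per lead time as follows (the formula is an assumption based on the standard definition in Madsen et al. [2005]):

```python
def improvement_over_reference(model_err, ref_err):
    """Improvement in percent of a model's error over a reference model's
    error: 100 * (ref - model) / ref. Positive means the model beats the
    reference; 0 means no improvement."""
    return 100.0 * (ref_err - model_err) / ref_err
```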


Chapter 5

Discussion

With the exponential growth of computing power it becomes easier to study deeper and more complex networks, and with recent advances in the area of deep learning, a new interest in ANNs has resurged. In practice, ANNs have been used by most groups in the field of WPF, but these networks never caught on, as it was argued that the improvements made by ANNs were usually not enough to justify the extra effort of training them [Giebel et al., 2011]. This is steadily becoming less of an issue as computation becomes cheaper and bigger networks perform better.

By using a simple MLP network we observe that we are able to obtain results similar in performance to other models published in Hong et al. [2014]. This can be used as a starting point for further investigations. The GEFCom dataset has a limited number of input features; with access to a wider range of features, the power of neural networks could be studied more in depth. Given the number of features in the dataset, this is harder to do.

Persistence was used as the reference model in order to have the same baseline as the one used in GEFCom, but a better reference model should be considered in the future, such as the one presented in Nielsen et al. [1998], especially if longer forecasts are to be considered.
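For reference, the persistence baseline is trivial to implement (a minimal sketch): the forecast for every lead time k is simply the last observed power value, p̂(t + k | t) = p(t).

```python
import numpy as np

def persistence_forecast(power_history, k):
    """Persistence reference model: repeat the last observed value for
    all k lead times."""
    return np.full(k, power_history[-1])
```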

5.1 Method development issues

Working with Expektra's model is very straightforward and no major issues were encountered during development, but some issues were encountered working with NuPIC.¹

¹ It should also be pointed out that it helps to have the people who developed the code on the spot, and this is probably the main reason for the lack of trouble.


1. Installing NuPIC is difficult. This seems to be a general problem, judging by posts on the NuPIC mailing list². A very important issue to fix if they want more people to work with this model.

2. The built-in swarming (PSO) did not work well, especially for slightly larger datasets and for 48 prediction steps. The main problems were fitting everything into memory and that the code ran too slowly when swarming over multiple steps ahead, making it practically impossible to use. Scaling down the problem to just a few prediction steps and multiple models helped, but the swarm model kept dismissing wind speed as an important feature, which was fixed by explicitly telling it not to touch that encoder.

3. Inconsistencies between the white-paper documentation of HTM/CLA and the actual implementation make it hard to understand the underlying principles of NuPIC. NuPIC implements a very complex network, and any inconsistency in the documentation makes it hard to understand. The code is quite well documented, but the additional material is a bit sparse.

These issues are all understandable and are continuously being improved upon. NuPIC is still a young platform, so issues are to be expected given the complexity and research nature of the project. Training the CLAClassifier to use many prediction steps instead of just one will most likely reduce the bias error seen in the results. To investigate this, a more powerful computer is needed.

There are some differences in the inputs between the models. This setup was created because an OPF model needs to encode the value it is going to predict. SCADA data is available, so this is not an issue, but it may introduce some unfairness between NuPIC and Expektra. In the current setup NuPIC does not seem to gain anything extra from the additional information, and the other models in the competition most likely used different inputs as well.

In general, NuPIC differs in the way you feed data: you send in only the front of the signal, and temporal context is achieved through the temporal memory.

5.2 Future improvements and directions

One issue with the GEFCom dataset is that it does not contain any specific information about the location of the 7 wind farms. It is worth considering how to handle different models that are specialized for different types

² http://numenta.org/lists


of terrain, as this has been shown to increase performance [Kariniotakis et al., 2006]. Another thing to investigate could be a more advanced schema for hyperparameter selection. Jursa and Rohrig [2008] show that the WPF error can be reduced by smarter use of optimization algorithms for feature selection and hyperparameter selection, which should improve the performance.

In general, having more types of inputs from sources like SCADA and NWP systems could provide a lot of valuable information for better predictions. It is clear that wind speed is a key feature for producing good forecasts, which is also supported by the input analysis in section 4.2. It has been argued that the error in WPF models stems largely from wrong weather forecasts [Giebel et al., 2011], so identifying good sources of weather forecasts would be desirable for more effective model learning. Combining several different weather forecasts could also be investigated, as Nielsen et al. [2007] showed that power forecasts based on a number of different meteorological forecasts were better than those from a single source.

Regarding HTM CLA, it is worth considering implementing a custom encoder specifically targeted at data concerning wind farms. SCADA data could also be streamed directly into a NuPIC model, which would allow for some interesting online predictions and anomaly detection.

Training times are always an issue, especially for very large models. Both NuPIC and Expektra's model would benefit from being implemented in a more parallel fashion, as these algorithms are very well suited for parallel computing. The speed-up would be of great value, so implementing a GPU version of the algorithms could be worthwhile.

5.3 Conclusions

The field of energy forecasting is still young, and the need for good forecasting techniques is constantly growing. This initial study shows that part of Expektra's forecasting method can be used to forecast wind power generation. We observe that this ANN achieves performance comparable to other methods published in Hong et al. [2014]. Additional work is still needed to obtain state-of-the-art results. Numenta's model is able to beat the reference model but needs additional work before we can draw definite conclusions about its performance. Constructing a custom encoder could be a first step in that direction.


Bibliography

Subutai Ahmad and Jeff Hawkins. Properties of sparse distributed representations and their application to hierarchical temporal memory. arXiv preprint arXiv:1503.07469, 2015.

TC Akinci. Short term wind speed forecasting with ANN in Batman, Turkey. Elektronika ir Elektrotechnika, 107(1):41–45, 2015.

E Michael Azoff. Neural Network Time Series Forecasting of Financial Markets. John Wiley & Sons, Inc., 1994.

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13(1):281–305, 2012.

Daniel P Buxhoeveden and Manuel F Casanova. The minicolumn hypothesis in neuroscience. Brain, 125(5):935–951, 2002.

Erasmo Cadenas and Wilfrido Rivera. Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks. Renewable Energy, 34(1):274–278, 2009.

Dmitri B Chklovskii, BW Mel, and K Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782–788, 2004.

A Costa, A Crespo, J Navarro, G Lizcano, H Madsen, and E Feitosa. A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews, 12(6):1725–1744, August 2008. ISSN 1364-0321. doi: 10.1016/j.rser.2007.01.015.

Russ C Eberhart and James Kennedy. A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science, volume 1, pages 39–43, New York, NY, 1995.


Shu Fan, James R Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. Energy Conversion, IEEE Transactions on, 24(2):474–482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento – a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276. Citeseer, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition, EWEC 2008. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving Hierarchical Temporal Memory-Based Trading Models. Springer, 2013.

MA Gaertner, C Gallardo, C Tejeda, N Martínez, S Calabria, N Martínez, and B Fernández. The CASANDRA project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: The next generation prediction system. In Proc. of the 2001 European Wind Energy Conference, EWEC'01, Copenhagen, Denmark, pages 777–780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: A literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G Lobo. Sipreólico – wind power prediction tool for the Spanish peninsular power system. In Proceedings of the CIGRÉ 40th General Session and Exhibition, París, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On Intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357–363, 2014.


Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41–46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585–592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694–709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference, EWEC'07, 2007.

G Kariniotakis. Position paper on JOULE project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J Halliday, R Brownsword, Ignacio Marti, Ana Maria Palomares, I Cruz, H Madsen, TS Nielsen, Henrik Aa Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power – overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, 2006.

GN Kariniotakis, GS Stavrakakis, and EF Nogaret. Wind power forecasting using advanced neural networks models. Energy Conversion, IEEE Transactions on, 11(4):762–767, 1996.

Stanley J Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326–334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275–293, 2009.

Lars Landberg and Simon J Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171–195, 1994.

S Lang, C Mohrlen, J Jorgensen, B Ó Gallachóir, and E McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International Solar Energy Society UK Section Conference, volume 85, page 89, 2006.


SM Lawan, WAWZ Abidin, WY Chai, A Baharun, and T Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760–1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. Smart Grid, IEEE Transactions on, 5(1):501–510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets, 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164–168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313–2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154–3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa Nielsen, and Torben S Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475–489, 2005.

Donald W Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431–441, 1963.

Warren S McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons. 1969.

Marvin Lee Minsky and Oliver G Selfridge. Learning in Random Nets. MIT Lincoln Laboratory, 1960.

Jorge J Moré. The Levenberg-Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105–116. Springer, 1978.

Henrik Aa Nielsen, Torben S Nielsen, Henrik Madsen, Maria J Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471–482, 2007.


Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29–34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power. 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159–167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms, v0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33–57, 2007.

A Rodrigues, JA Peças Lopes, P Miranda, L Palma, C Monteiro, R Bessa, J Sousa, C Rodrigues, and J Matos. EPREV – a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference, EWEC, volume 7, 2007.

David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5:3, 1988.

Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85–117, 2015.

S Sinkevicius, R Simutis, and V Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91–96, 2011.

Ke-Sheng Wang, Vishal S Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61–69, 2014.

WWEA. 2014 half-year report. Technical report, WWEA, pages 1–8, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365–376, 2013.

Hao Yu and Bogdan M Wilamowski. Levenberg-Marquardt training. Industrial Electronics Handbook, 5:12-1, 2011.


Appendix A

Hyper-parameters

Parameters for the spatial pooler

Alias                Default  Description
columnCount          -        The number of cell columns in a cortical region.
globalInhibition     false    If true, during the inhibition phase the winning columns are selected as the most active columns from the region as a whole.
numActivePerInhArea  10       The maximum number of active columns per inhibition area.
synPermActiveInc     0.1      The amount by which an active synapse is incremented in each round, specified as a percent of a fully grown synapse.
synPermConnected     0.10     Controls the threshold at which synapses are considered connected.
synPermInactiveDec   0.01     The amount by which an inactive synapse is decremented in each round.
potentialRadius      16       Determines the extent of the input that each column can potentially be connected to.

Table A.1: Configuration parameters for the spatial pooler.


Parameters for the scalar encoder

Alias       Symbol  Description
w           w       Number of bits to set in the output.
minval      vmin    The lower bound of the input value.
maxval      vmax    The upper bound of the input value.
n           n       Number of bits in the representation (n must be > w).
radius      r       Inputs separated by more than or equal to this distance will have non-overlapping representations.
resolution  ψ       Inputs separated by more than or equal to this distance will have different representations.

Table A.2: Configuration parameters for the scalar encoder.


Parameters for the temporal memory

Alias                  Default  Description
activationThreshold    12       Activation threshold for segments.
cellsPerColumn         32       Number of cells per column.
columnCount            2048     The number of cell columns in a cortical region.
globalDecay            0.10     Decrements all synapses a little bit all the time.
initialPerm            0.11     Initial permanence value for a synapse.
inputWidth             -        Size of the input.
maxAge                 100000   Controls global decay. Global decay will only decay segments that have not been activated for maxAge iterations, and will only run the global decay loop every maxAge iterations.
maxSegmentsPerCell     -        The maximum number of segments a cell can have.
maxSynapsesPerSegment  -        The maximum number of synapses a segment can have.
minThreshold           8        The minimum required activity for a segment to learn.
newSynapseCount        15       The maximum number of synapses added to a segment during learning.
permanenceDec          0.10     How much permanence is removed from synapses when learning occurs.
permanenceInc          0.10     How much permanence is added to synapses when learning occurs.
temporalImp            cpp/py   Controls which temporal memory implementation to use.

Table A.3: Configuration parameters for the temporal memory.


Appendix B

Wind characteristics

Figure B.1: Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components. The intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Figure B.2: Wind characteristics for WF 3-7, GEFCom dataset, showing zonal and meridional wind components. The intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Appendix C

Error Distribution


Figure C.1: Error distribution (frequency histograms) for different lead times (1, 10, 20, 30, 40 and 48 hours), WF 1, using NuPIC.


Figure C.2: Error distribution (frequency histograms) for different lead times (1, 10, 20, 30, 40 and 48 hours), WF 2, using NuPIC.


Figure C.3: Error distribution (frequency histograms) for different lead times (1, 10, 20, 30, 40 and 48 hours), WF 3, using NuPIC.

64

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using nupic

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using nupic

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using nupic

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using nupic

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using nupic

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using nupic

Figure C4 Error distribution for different lead times WF 4

65

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf5 using nupic

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using nupic

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using nupic

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using nupic

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using nupic

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using nupic

Figure C5 Error distribution for different lead times WF 5

66

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

Figure C6 Error distribution for different lead times WF 6

67

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf7 using nupic

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using nupic

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using nupic

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using nupic

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using nupic

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using nupic

Figure C7 Error distribution for different lead times WF 7

68

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

Figure C8 Error distribution for different lead times WF 1

69

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf2 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

Figure C9 Error distribution for different lead times WF 2

70

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

Figure C10 Error distribution for different lead times WF 3

71

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf4 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

Figure C11 Error distribution for different lead times WF 4

72

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

Figure C12 Error distribution for different lead times WF 5

73

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf6 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

Figure C13 Error distribution for different lead times WF 6

74

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

Figure C14 Error distribution for different lead times WF 7

75

List of Figures

2.1 A figure that presents the general outline when forecasting using the statistical approach 6
2.2 A figure that presents the general steps when forecasting using a physical model 7
3.1 The perceptron 20
3.2 Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of H hidden layers, as well as M input connections. Each edge seen in this graph has a weight wij associated with it 21
3.3 Information flow of a single region predictive model created with the OPF 23
3.4 The CLAClassifier 28
3.5 Training an OPF model 29
4.1 Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) 31
4.2 Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) 32
4.3 Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) 32
4.4 Different error measurements for WF 1 33
4.5 Different error measurements for WF 2 34
4.6 Different error measurements for WF 3 35
4.7 Different error measurements for WF 4 36
4.8 Different error measurements for WF 5 37
4.9 Different error measurements for WF 6 38
4.10 Different error measurements for WF 7 39
4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point; the reference point is the network when no channel has been exposed to noise. ws = wind speed, hours/week = timestamps, u, v is a directional vector of the wind 41
4.12 Plot showing the unoptimized version vs the optimized one when training a network with 10 input neurons; 10, 15, 20, 25 hidden neurons; 1 output neuron. Training was done for 100 epochs using LM 42
4.13 Summarized average improvement over all wind farms, with 95% confidence intervals 43
B.1 Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated, the stronger the more power 59
B.2 Wind characteristics for WF 3-7, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated, the stronger the more power 60
C.1 Error distribution for different lead times, WF 1 62
C.2 Error distribution for different lead times, WF 2 63
C.3 Error distribution for different lead times, WF 3 64
C.4 Error distribution for different lead times, WF 4 65
C.5 Error distribution for different lead times, WF 5 66
C.6 Error distribution for different lead times, WF 6 67
C.7 Error distribution for different lead times, WF 7 68
C.8 Error distribution for different lead times, WF 1 69
C.9 Error distribution for different lead times, WF 2 70
C.10 Error distribution for different lead times, WF 3 71
C.11 Error distribution for different lead times, WF 4 72
C.12 Error distribution for different lead times, WF 5 73
C.13 Error distribution for different lead times, WF 6 74
C.14 Error distribution for different lead times, WF 7 75

List of Tables

3.1 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, missing data with power observations are available for updating the models 16
3.2 Preprocessed features from the dataset. The Forecast category represents the forecast given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the feature we will use in training and testing 17
3.3 Example, where n = 14, r = 5, ψ = 1, of various encoded scalar values using a ScalarEncoder 24
4.1 NRMSE score of the entries published in [Hong et al., 2014]. The NuPIC model and Expektra model are added so we can easily compare the results 40
A.1 Table containing configuration parameters for the encoder 55
A.2 Table containing configuration parameters for the spatial pooler 56
A.3 Table containing configuration parameters for the temporal memory 57



CHAPTER 1 INTRODUCTION

to transform data through a power curve to produce the forecast, while improving the error rate with Model Output Statistics (MOS). Previento [Focken et al., 2001], developed at the University of Oldenburg, Germany, uses a similar approach to Prediktor but adds regional forecasting and uncertainty estimation. The Wind Power Prediction Tool (WPPT)2 [Nielsen et al., 2002] is a statistical model developed by the Technical University of Denmark; it consists of a semi-parametric power curve model for wind farms, taking into account both wind speed and direction. It uses dynamical prediction models describing the dynamics of the wind power and any diurnal variations. Zephyr [Giebel et al., 2000] is a hybrid model combining the WPPT and Prediktor; in this model each wind farm is assigned a forecast model according to the available data. Sipreólico [Gonzalez et al., 2004], developed by Red Eléctrica de España, is a statistical model designed to be highly flexible depending on the available data. It achieves this by switching between 9 different models. Aiolos Wind3 is a hybrid model developed by Vitec that creates forecasts by combining a statistical model with physical factors, such as wind speed at different altitudes together with wind direction and air density.

Expektra4 is a service-based company, founded in 2010, that provides and develops a new method for short-term demand and supply forecasting based on an Artificial Neural Network (ANN) and traditional time series analysis. They are currently expanding into the area of wind power forecasting and looking for suitable methods. ANNs have been used successfully for both wind speed forecasting [Lawan et al., 2014] and wind power forecasting [Kariniotakis et al., 1996]. It was demonstrated in Liu et al. [2012] that a complex-valued recurrent neural network was able to predict output with high accuracy, and Huang et al. [2015] showed that optimizing the initial weights of a back-propagated neural network with a genetic algorithm gave impressive results, so there are good reasons to think that parts of the method Expektra is developing can be used for wind power forecasting.

Neural networks in general are highly flexible, and recent advancements in deep learning have been shown to outperform previous models in different domains [Schmidhuber, 2015]. These networks are very good at automatically finding features that would take a lot of time and effort to craft by hand. One network that shares similarities with deep learning, but has received less attention, is the Cortical Learning Algorithm (CLA) with Hierarchical Temporal Memory (HTM), developed by Numenta [Numenta, 2011]. This network is also built around the idea of having hierarchical structures creating a deep neural network. CLA/HTM is

2http://www.enfor.eu
3http://www.vitecsoftware.com
4http://www.expektra.se


currently tailored very specifically to time-series problems, and has at the moment little research published around it, making it an ideal candidate for time series prediction with unknown potential on wind power forecasting problems.

Even though there are many prominent methods already developed for wind power forecasting, there is still room for improvement. Energy forecasting is such an important topic that competitions have been developed around it. The Global Energy Forecasting Competition (GEFCom) [Hong et al., 2014] is a competition that can be used to help evaluate the performance of new forecasting models. It has attracted hundreds of contestants from all over the world, which has resulted in the contribution of many new and novel ideas. The GEFCom was created in order to:

1 Bring together state-of-the-art techniques for energy forecasting

2 Bridge the gap between academic research and industry practice

3 Promote analytical approaches in power energy education

4 Prepare the industry to overcome forecasting challenges posed by the smart grid world

5 Improve energy forecasting practices

Benchmark datasets and competitions are a valuable resource when evaluating new models, as they allow for a common ground to stand on. With the publication of Hong et al. [2014], the dataset from GEFCom2012 was also published. This dataset consists of data from 7 different wind farms spanning a time period of three years: observational data of the energy production together with weather forecasts.

The GEFCom is divided into four tracks: load forecasting, price forecasting, wind power forecasting and solar power forecasting. The specified problem in the wind power forecasting track is built around the real-time operation of wind farms, and the dataset is structured accordingly. Participants try to predict hourly power generation up to 1-48 hours ahead, given meteorological forecasts and historically produced power.

1.1 Problem Formulation

The objective of this thesis is to adapt and analyse the robustness and suitability of Expektra's method when it is applied to the domain of forecasting short-term wind


power generation, i.e.: What is needed to adapt this model to wind power forecasting problems?

The second objective of this thesis is to investigate HTM/CLA [Numenta, 2011], a modern state-of-the-art computational theory of the neocortex, i.e.: Is it possible to use the Numenta Platform for Intelligent Computing (NuPIC) for wind power forecasting problems?

These questions will be addressed by evaluating the models against other models published in the area of short-term wind power forecasting, more specifically those that have been published in GEFCom.

1.2 The scope of the problem

1 This study will focus on short-term forecasting, i.e. forecasts done for 1-48 hours ahead. How to perform well on longer forecasts is left to further investigation.

2 Wind power forecasting is closely related to wind speed forecasting, i.e. trying to use local information at the wind turbine instead of NWP data to forecast wind power. This thesis will not go into any specific details on how to forecast wind speeds; readers can take a look at Li and Shi [2010], Cadenas and Rivera [2009], Akinci [2015].

3 There are a lot of neural networks that are worth studying, but this thesis will focus on Expektra's method and the HTM/CLA inside NuPIC. NuPIC was picked over other deep learning methods because of its direct relation to time series prediction.


Chapter 2

Background

Wind Power Forecasting (WPF) has applications in generation and transmission maintenance planning and energy optimization, as well as in energy trading. WPF models exist at different scales, and they can be used to predict the production of anything from a single WT to a whole Wind Farm (WF). WPF models are generally divided into two main groups, physical and statistical, but hybrid state-of-the-art methods are also common [Giebel et al., 2001; Lang et al., 2006]. The physical approach [Landberg and Watson, 1994; Gaertner et al., 2003] focuses on integrating well-known physical aspects into the model, such as information about the surrounding terrain and properties of the WT. These models try to get as good an estimate of the local wind speed as possible before finally reducing the remaining error with some form of MOS. Statistical approaches [Rodrigues et al., 2007; Gonzalez et al., 2004] rely more on historical observations and their statistical relation to meteorological predictions, as well as on measurements from Supervisory Control And Data Acquisition (SCADA) systems.

SCADA is a control system that is used in most industrial processes and is the main tool to evaluate the state of individual power plants. As the name indicates, it is not a full control system but rather a system for supervision. In the case of wind turbines it allows remote access to online data that has been gathered by sensors inside and around the plant. Variables accessible through these systems are those directly related to the wind flow, as well as the produced power generated by the plant, i.e. things like rotor wind speed, nacelle position, pitch angle, active power, etc.


Figure 2.1: A figure that presents the general outline when forecasting using the statistical approach

Forecast models that use SCADA data as their primary input source usually have good forecast accuracy, at least for the first few hours, but they tend to be less useful for longer prediction horizons [Giebel et al., 2011]. SCADA data can also be used to detect problems in WTs, something that can be helpful to improve the reliability of WTs [Yang et al., 2013; Wang et al., 2014].

Statistical models, as seen in figure 2.1, are usually built around Numerical Weather Prediction (NWP) and SCADA data. NWP models are often used to build forecasting models as they introduce weather forecasts for the region where the wind turbines are located. NWP data usually contains information about things like wind speed, wind direction, temperature and humidity. These models are operated two or four times a day by a number of large weather services. The main forecasts usually start at 00 and 12 UTC, corresponding to the worldwide radiosonde launches, which are the only direct observation of the atmospheric state and have been the backbone of atmospheric monitoring for many decades1; extra forecasts usually start at 06 and 18 UTC. Physical models, as those seen in figure 2.2, include additional information about the physical characteristics of the wind turbine and its surroundings, i.e. terrain data, information about obstacles, capacity and layout where the turbine is located, and so on. Other useful information for physical models includes the theoretical

1One problem with this approach is that it results in less information over large oceans and poorer countries.


power curve: how much power is expected to be produced given a specific wind speed.

The time scales of WPF methods are generally divided into 3 main groups: very-short-term (up to 9 hours), short-term (up to 72 hours) and medium-term (up to 7 days) [Costa et al., 2008], while the time step for these models is in the range of seconds to days depending on the application. Very-short-term models used for wind power forecasting consist of statistical methods like Kalman Filters, Auto-Regressive Moving Average (ARMA), Auto-Regressive with Exogenous Input (ARX), Box-Jenkins, etc. Inputs to these models are historical observations of wind speed, wind direction, temperature, etc., and common applications for this forecast horizon include things like intraday market trading. Since these methods are merely based on past production, they are generally not useful for longer horizons.

[Figure 2.2 outlines the physical forecasting chain: SCADA data, NWP data and WFC data feed a physical model that performs downscaling, transformation to hub height and spatial refinements, followed by conversion to power via the WT power curve and Model Output Statistics (MOS), yielding the wind power generation forecast.]

Figure 2.2: A figure that presents the general steps when forecasting using a physical model

Short-term forecasting models include additional data about historical observations, usually obtained from SCADA systems, but more importantly weather forecasts from NWP models. Machine learning methods used in this area include Neural Networks [Kusiak et al., 2009], Support Vector Machines [Fugon et al., 2008], Nearest Neighbour Search [Jursa et al., 2007], Random Forests, etc. Medium-term forecasting models usually incorporate all the methods used for shorter forecasts, as well as physical characteristics.


2.1 Neural Networks and Time Series Prediction

One of the most common ANNs is the so-called Multilayer Perceptron (MLP), a network built around some very simple properties of a cortical neuron. This network has been around since the 50's and has matured with a solid mathematical foundation. It has been applied successfully to many different applications. It is built around the idea of having multiple layers of neurons, each layer feeding information forward to the next layer, i.e. a feed-forward neural network.

The main advantage of using multiple layers instead of just a single one is to overcome the limitations pointed out by Minsky and Selfridge [1960] and Minsky and Seymour [1969], where a single-layer perceptron is only capable of learning linearly separable patterns and is thus unable to learn all functions. The MLP is not limited in this way and has been shown to be able to represent a wide variety of functions given appropriate parameters [Hornik et al., 1989].

NuPIC is a platform actively developed and maintained by Numenta. It introduces a collection of ideas and algorithms that are inspired by the structural organization of the neocortex. Two of the core concepts found inside NuPIC are the so-called Hierarchical Temporal Memory and the Cortical Learning Algorithm. HTM was introduced in Hawkins and Blakeslee [2007] and refers to a hierarchy of cortical regions in the brain.

The neocortex is classically divided into 6 different layers2; the current implementation in NuPIC is focused on emulating layers 3-4 (specifically layer 3), with extensions and research code being developed to include more layers.

A CLA region consists of a collection of columns, and each column consists of a handful of cells. This structure is based on the minicolumn hypothesis [Buxhoeveden and Casanova, 2002]. Each column in the CLA region has its own semantic meaning, and the sparse activity of a handful of active columns will tell us something about the input. The CLA is modelled so that specific cells within each column reflect a temporal context of a pattern, and a single cortical region (a CLA region) is trained with the CLA algorithm.

A typical CLA region consists of around 2K columns containing around 60K neurons in total, while a typical MLP may have fewer than 100 neurons. Each neuron in an HTM network grows new synapses over time, and it is not uncommon to have around 5K synapses per neuron, meaning we would have around 300M synapses in total. A single region of this size uses around 100 MB of memory, and it takes around 10 ms to do one inference and learning step3.

2These layers should not be confused with the hierarchy of regions.
3These values are given in the talk "Sensor-Motor Integration in the Neocortex", 2013 Hackathon.
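The sizes quoted above can be sanity-checked with simple arithmetic; the numbers below are the document's approximations, not exact NuPIC defaults.

```python
columns = 2_000              # columns in a typical CLA region
neurons = 60_000             # total cells across all columns
synapses_per_neuron = 5_000  # synapses grown per neuron over time

print(neurons * synapses_per_neuron)  # total synapses: 300000000, i.e. ~300M
print(neurons // columns)             # implied cells per column: 30
```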


The HTM/CLA also differs from the MLP in that it does not directly use scalar weights to represent synaptic connectivity. Synapses inside an HTM have binary weights with a scalar permanence: each synapse is either connected or disconnected, while the MLP uses scalar weights. The connectedness in HTM/CLA is based on a permanence, which is a value between 0.0 and 1.0. If the permanence is over a certain threshold we have a connected synapse, i.e. we have weights of either 1 or 0. This binary mechanism is there to simulate synapses that are able to form and unform during learning. So essentially we have a weight-change network vs a wiring-change network [Chklovskii et al., 2004].
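The permanence mechanism can be sketched in a few lines. This is an illustrative toy, not NuPIC's implementation: the threshold and the learning increments are made-up values (NuPIC exposes such quantities as configuration parameters).

```python
import numpy as np

# Assumed values for illustration only.
CONNECTED_PERM = 0.5             # permanence threshold for a connected synapse
PERM_INC, PERM_DEC = 0.05, 0.03  # hypothetical reinforcement/decay steps

def connected(permanences):
    """Effective binary weights: 1 if permanence >= threshold, else 0."""
    return (permanences >= CONNECTED_PERM).astype(int)

def learn(permanences, active_inputs):
    """Reinforce synapses aligned with active inputs, weaken the rest."""
    delta = np.where(active_inputs == 1, PERM_INC, -PERM_DEC)
    return np.clip(permanences + delta, 0.0, 1.0)

perms = np.array([0.20, 0.48, 0.55, 0.90])
inputs = np.array([1, 1, 0, 0])
print(connected(perms))       # [0 0 1 1]
perms = learn(perms, inputs)
print(connected(perms))       # [0 1 1 1] -- a synapse has "formed"
```

Note how learning changes which synapses exist (wiring) rather than scaling continuous weights, which is the contrast with the MLP drawn above.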

NuPIC today is naturally geared towards time-series prediction, which makes it an interesting candidate for wind forecasting. NuPIC has been shown to outperform the MLP on financial time series [Gabrielsson et al., 2013], and it has been used successfully to balance traffic [Sinkevicius et al., 2011]. It has also been used on the NASA Aviation Dataset [Lee and Rajabi, 2014], but with less promising results. MLPs have been used successfully for many kinds of time series problems [Azoff, 1994; Niska et al., 2004; Jain and Kumar, 2007].


Chapter 3

Method and Materials

The purpose of this chapter is to provide an explanation of the method used in this thesis. It will present terms and concepts needed and used in the field of Machine Learning, with a focus on how to create forecasts based on time series. It presents the motivation for why these methods are used, as well as how they are used. This chapter also contains a description of the implemented artificial neural network, how it is structured and optimized. It provides details about the experimental setup and how these experiments relate to GEFCom, and finally a description of how the dataset is structured and processed.

3.1 Preliminaries

3.1.1 Remarks

The methodology used to evaluate the prediction models presented in this thesis is based on the protocol presented in Madsen et al. [2005]1, which is a complete protocol that can be used to evaluate the performance of short-term WPF and Wind-to-Power (W2P) models. The reason this protocol was chosen for this thesis is that it has been successfully used before as a guideline to evaluate a wide variety of forecast models, such as AWPPS, Prediktor, Previento, Sipreólico and WPPT. The protocol has been used for both on-shore and off-shore wind farms and was developed in the frame of the ANEMOS research project [Kariniotakis et al., 2006], which brought together many relevant groups involved in the field. The aim of ANEMOS was to develop accurate and robust models that substantially outperform current state-of-the-art methods, and one of its goals was to establish a common set

1It should be pointed out that no widely agreed standardization exists.


of performance measures that can be used to compare forecasts across systems and locations.

3.1.2 Definitions

Time series

A time series is a sequence X of observations x_t, each with a particular time stamp t, i.e. X = [x_1, x_2, ..., x_t], or in short X_{1:t}. It is a time-dependent collection of variables.

Forecast horizon

A multi-step-ahead prediction is the task of predicting k forecasts X_{t+1:t+k} given a collection of historical observations X_{t-p+1:t}. In the case of GEFCom we want a forecast for 1-48 steps ahead.
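As a minimal sketch, the multi-step-ahead setup can be produced with a recursive strategy, where each prediction is fed back as input for the next step. The stand-in model here is persistence; any regressor could take its place.

```python
def multi_step_forecast(history, k, model=lambda window: window[-1]):
    """Predict X_{t+1:t+k} from X_{t-p+1:t} by rolling the model forward."""
    window = list(history)
    forecasts = []
    for _ in range(k):
        yhat = model(window)   # one-step-ahead prediction
        forecasts.append(yhat)
        window.append(yhat)    # recursive strategy: the prediction becomes input
    return forecasts

print(multi_step_forecast([0.3, 0.5, 0.4], k=4))  # [0.4, 0.4, 0.4, 0.4]
```

An alternative is the direct strategy, training one model per lead time; GEFCom's 1-48 hour horizon fits either framing.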

Point or spot forecast

In this paper we model the forecast p̂_{t+k|t} as a so-called point forecast (or spot forecast), i.e. a single value for each forecast (as opposed to a probability distribution).

Prediction error

The prediction error e for lead time t + k is defined in equation 3.1 as the difference between the actual and the forecast value, where t denotes the time index, k is the look-ahead time, p is the actual (measured) wind power and p̂ is the predicted wind power:

e_{t+k|t} = p_{t+k} − p̂_{t+k|t}        (3.1)

The normalized prediction error ε is given in equation 3.2:

ε_{t+k|t} = (1/p_inst) e_{t+k|t} = (1/p_inst) (p_{t+k} − p̂_{t+k|t})        (3.2)

where p_inst is the installed capacity of the wind farm (in kW or MW), which is unknown in the competition.


3.1.3 Reference models

Persistence (also called a naïve or plain predictor), as seen in equation 3.3, is the model most commonly used for benchmarking WPF models. This simple model states that the future wind generation value will be the same as the last measured value:

p̂^{persistence}_{t+k|t} = p_t        (3.3)

An alternative would be to use an even simpler model: a climatology prediction (equation 3.4), i.e. predicting the mean, a value estimated from the training set (see section 3.1.5):

p̂^{mean}_{t+k|t} = p̄ = (1/N) Σ_{t=1}^{N} p_t        (3.4)

There are other reference models, like the one suggested by Nielsen et al. [1998], which has advantages over persistence, but it was never widely adopted and is not used by GEFCom.
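As an illustration, the two reference models above can be sketched in a few lines (a minimal sketch; the function names are mine, not from the thesis):

```python
import numpy as np

def persistence_forecast(p_t, k):
    """Persistence reference model (eq. 3.3): every look-ahead step
    repeats the last measured power value p_t."""
    return np.full(k, p_t)

def climatology_forecast(train_power, k):
    """Climatology reference model (eq. 3.4): every look-ahead step
    is the mean power estimated from the training set."""
    return np.full(k, np.mean(train_power))

# Last measured value 0.42, a toy training history, 48-hour horizon
history = np.array([0.25, 0.5, 0.75])
print(persistence_forecast(0.42, 48)[:3])    # [0.42 0.42 0.42]
print(climatology_forecast(history, 48)[0])  # 0.5
```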

3.1.4 Error metrics

In order to understand why specific models perform well, it is usually a good idea to evaluate them against a wide variety of criteria, as is emphasised by Kariniotakis [1997]. The following sections describe the error measures used; throughout this section, N denotes the size of the test set.

Forecast Bias

The Normalized Forecast Bias (NBIAS) describes the systematic error and is defined in equation 3.5; it is estimated by calculating the average error for each step ahead, and gives an indication of the direction of the error.

NBIAS_k = (1/N) Σ_{t=1}^{N} ε_{t+k|t}        (3.5)

This bias is sometimes referred to by some authors as Mean Forecast Error (MFE). A perfect MFE does not mean the forecast is perfect and contains no error, as positive and negative errors cancel each other out, but it gives an indication that the forecasts are on target on average.


Mean Absolute Error

The Normalized Mean Absolute Error (NMAE) is an error quantity that looks at the average of the absolute error of the prediction, and is defined in equation 3.6:

NMAE_k = (1/N) Σ_{t=1}^{N} |ε_{t+k|t}|        (3.6)

Another common name used instead of Mean Absolute Error (MAE) is Mean Absolute Deviation (MAD). This value shows the magnitude of the overall forecasting error and should thus be as small as possible. The error is scale dependent and will be affected by data transformations and the scale of the measurements.

Mean Squared Error

The Normalized Mean Squared Error (NMSE) is an error quantity that looks at the average of the squared errors ε²_{t+k|t}, and is built using the Normalized Sum of Squared Errors (NSSE), defined in equation 3.7:

NSSE_k = Σ_{t=1}^{N} ε²_{t+k|t}        (3.7)

NMSE is defined in equation 3.8:

NMSE_k = (1/N) NSSE_k = (1/N) Σ_{t=1}^{N} ε²_{t+k|t}        (3.8)

With this error, positive and negative errors do not cancel each other out, and large individual errors are penalized more harshly.

Root Mean Squared Error

The Normalized Root Mean Squared Error (NRMSE) is the square root of the NMSE; it is defined in equation 3.9:

NRMSE_k = NMSE_k^{1/2} = ( (1/N) Σ_{t=1}^{N} ε²_{t+k|t} )^{1/2}        (3.9)

NRMSE is the main metric used in GEFCom, and it shares the same properties as NMSE.
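A minimal sketch of the error measures above, assuming the normalized errors ε for one lead time are already available as an array (function names are mine):

```python
import numpy as np

def nbias(errors):
    """Normalized forecast bias (eq. 3.5): mean of the normalized errors."""
    return np.mean(errors)

def nmae(errors):
    """Normalized mean absolute error (eq. 3.6)."""
    return np.mean(np.abs(errors))

def nrmse(errors):
    """Normalized root mean squared error (eq. 3.9), the GEFCom metric."""
    return np.sqrt(np.mean(np.square(errors)))

eps = np.array([0.1, -0.1, 0.2, -0.2])  # normalized errors for one lead time
print(nbias(eps))            # 0.0 -- positive and negative errors cancel
print(nmae(eps))             # 0.15
print(round(nrmse(eps), 4))  # 0.1581
```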


3.1.5 Model selection

In regression and classification, one of the main issues we face is: "How do we create a good model?" One way to define this "goodness" is to look at the model's generalization ability, i.e. its performance on unseen data.

To make sure the model we are building will generalize to new, unseen data, it matters how we select and build the model in the first place; what we have control over is the data at hand and how we use that data to build the model.

Training, testing and validating

In the case of supervised learning we have access to a target series and its associated features, i.e. the production series (our target) with wind speed and wind direction as features.

Common practice within Machine Learning, which is adhered to in this thesis, is to split the whole dataset into three smaller subsets: two of these subsets (the training set and the validation set) are used to find a good model, and the remaining one (the test set) is used for evaluation, i.e. to estimate how accurate the created model will be on unseen data. With highly flexible models like artificial neural networks we need to be careful not to overfit the data, which is one of the reasons for having a validation set: we do not want a model that generalizes poorly because it has been fitted to every minor variation, i.e. has captured a lot of noise.
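The thesis later uses a random 60/20/20 split (section 3.5.1); a generic sketch of such a three-way split could look as follows (the helper name is mine):

```python
import numpy as np

def random_split(n_samples, fractions=(0.6, 0.2, 0.2), seed=0):
    """Randomly partition sample indices into train/validation/test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)          # shuffle all indices once
    n_train = int(fractions[0] * n_samples)
    n_val = int(fractions[1] * n_samples)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train, val, test = random_split(1000)
print(len(train), len(val), len(test))  # 600 200 200
```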

3.1.6 Evaluation

The improvement with respect to a considered reference model ref is defined in equation 3.10:

I^{ref}_{EC,k} = 100 · (EC^{ref}_k − EC_k) / EC^{ref}_k  (%)        (3.10)

where the Evaluation Criterion (EC) is any error measurement, such as NMSE, NMAE, etc.
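As a quick sketch, using the overall NRMSE scores later reported in table 4.1 as example inputs:

```python
def improvement(ec_ref, ec):
    """Improvement over a reference model (eq. 3.10), in percent."""
    return 100.0 * (ec_ref - ec) / ec_ref

# NuPIC (0.264) vs. the persistence reference (0.355), "All" column of table 4.1
print(improvement(0.355, 0.264))  # ~25.6 % improvement over persistence
```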


Date         Time   Forecast
2011-01-01   01:00  1–48 hours
2011-01-04   13:00  1–48 hours
2011-01-08   01:00  1–48 hours
2011-01-11   13:00  1–48 hours
⋮
2012-06-23   01:00  1–48 hours
2012-06-26   13:00  1–48 hours

Table 3.1: Testing periods. The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00; the second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods with missing data, power observations are available for updating the models.

3.2 Experiments

Training and testing are structured according to the structure of GEFCom. The dataset spans from midnight of 1 July 2009 to noon of 26 June 2012. The period from 1 July 2009 to 1 January 2011 at 01:00 is used for training and validating, while the rest is used for testing. In the testing range a number of 48-hour periods are defined (see table 3.1); in between these testing periods there is additional training data, which enables the models to be updated between forecasts.

The testing periods repeat every 7 days until the end of the dataset, and only meteorological forecasts relevant for the periods with missing power data are given; this was done in order to be consistent.

Meteorological forecasts in the GEFCom dataset consist of zonal (u) and meridional (v) components collected for surface winds at 10 m above ground level. They were extracted from the archive of the European Centre for Medium-range Weather Forecasts (ECMWF), a service that issues high-resolution deterministic forecasts twice a day, at 00 Coordinated Universal Time (UTC) and 12 UTC; each of these forecast periods consists of data for 1–48 hours ahead. In order to match the hourly resolution of the power measurements, the forecasts were interpolated to an hourly resolution using cubic splines. A summary of the features found in the dataset is given in table 3.2.
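The interpolation step can be sketched with SciPy's `CubicSpline` (the values below are illustrative, not actual GEFCom data, and the thesis does not specify the exact implementation used):

```python
import numpy as np
from scipy.interpolate import CubicSpline

# NWP components at a coarser resolution than the hourly power data;
# interpolate, e.g., the zonal wind component u onto an hourly grid.
t_forecast = np.array([0.0, 3.0, 6.0, 9.0, 12.0])  # hours
u_forecast = np.array([2.0, 3.5, 2.8, 1.9, 2.4])   # m/s

spline = CubicSpline(t_forecast, u_forecast)
t_hourly = np.arange(0.0, 13.0, 1.0)               # 0, 1, ..., 12
u_hourly = spline(t_hourly)
print(u_hourly[[0, 3, 12]])  # knot values are reproduced: 2.0, 3.5, 2.4
```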


No.  Category  Parameter             Alias  Type
1    Date      Date                  date   String
2    Date      Year                  year   Integer
3    Date      Month                 month  Integer
4    Date      Day                   day    Integer
5    Date      Hour                  hours  Integer
6    Date      Week                  week   Integer
7    Forecast  Wind Speed            ws     Real
8    Forecast  Wind Direction (deg)  wd     Real
9    Forecast  Wind U                u      Real
10   Forecast  Wind V                v      Real
11   Forecast  Issued                hp     Integer
12   SCADA     Production            wp     Real

Table 3.2: Preprocessed features from the dataset. The Forecast category represents the forecasts given by the NWP; multiple forecasts will exist for a given date, and the latest issued forecast available is the feature we use in training and testing.


The database of meteorological forecasts in the GEFCom dataset contains sections of missing data, each corresponding to a date for which the power information is also missing. These sections were filled out, in a pre-processing step, with the best previous forecast available. For example, if we do not have data for the forecast issued at 2011-01-01 12:00, we use data from 2011-01-01 00:00, as a 48-hours-ahead forecast is available for that date. If the previous section also contained missing data, we would go back an additional section and use those forecasts. If no forecast is available 48 hours back, we use the best known forecast and extend it in the same fashion as the persistence model.
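The fallback logic can be sketched as follows (a simplified sketch assuming two forecast issues per day, 12 hours apart, as with ECMWF above; the data structure and function name are mine):

```python
def fill_missing_forecast(forecasts, issue_time, step=12):
    """Walk back in 12-hour steps to the most recent issue time that has
    a forecast; `forecasts` maps issue time (in hours) -> forecast data.
    Returns the fallback forecast, or None if nothing earlier exists."""
    t = issue_time
    while t >= 0:
        if forecasts.get(t) is not None:
            return forecasts[t]
        t -= step
    return None

# Forecasts issued at hours 12 and 24 are missing -> fall back to hour 0.
forecasts = {0: "f@00", 12: None, 24: None}
print(fill_missing_forecast(forecasts, 24))  # f@00
```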

Any prediction made by these models should fall within a certain range, so any forecast outside this range is clamped. This is the main post-processing step being done, i.e. there is an upper limit on what the wind farm can produce.

3.3 Holdback Input Randomization

The Holdback Input Randomization (HIPR) method, described in Kemp et al. [2007], can be used to investigate the importance of the input parameters. It works by sequentially feeding each data point in the test set to the neural network while replacing the values of one input parameter at a time. The replacement values are uniformly distributed random values in the range the neural network was originally trained on, i.e. (−1, 1). An NRMSE score is calculated for each replacement, which gives information about the relevance of each input.
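A sketch of HIPR for a generic model (the names are mine; the toy model below only reads its first input, so randomizing that column should raise the NRMSE while randomizing the unused column should not):

```python
import numpy as np

def hipr_scores(model, X_test, y_test, low=-1.0, high=1.0, seed=0):
    """Holdback Input Randomization: replace one input column at a time
    with uniform random values in the training range and measure the
    resulting NRMSE; important inputs give large scores."""
    rng = np.random.default_rng(seed)
    scores = []
    for j in range(X_test.shape[1]):
        X_rand = X_test.copy()
        X_rand[:, j] = rng.uniform(low, high, size=len(X_test))
        err = model(X_rand) - y_test
        scores.append(np.sqrt(np.mean(err ** 2)))
    return np.array(scores)  # one NRMSE per input parameter

model = lambda X: X[:, 0]                      # ignores its second input
X = np.linspace(-1, 1, 200).reshape(-1, 1).repeat(2, axis=1)
y = X[:, 0]
print(hipr_scores(model, X, y))  # column 0 score is large, column 1 is 0
```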

3.4 Optimization methods

The two main optimization algorithms that have been used in this thesis are the Particle Swarm Optimization (PSO) algorithm, presented by Eberhart and Kennedy [1995], and the Levenberg–Marquardt (LM)² algorithm, independently developed by Kenneth Levenberg and Donald Marquardt [Marquardt, 1963; Levenberg, 1944]. The PSO algorithm is a population-based stochastic algorithm, similar to a Genetic Algorithm (GA) but based on social–psychological principles instead of evolution. It

² The LM algorithm was used in the beginning of the project, but the results reported were produced using the PSO algorithm. The LM algorithm is included in this section because speed optimization was measured using it.


can be summarized by the following steps. Each network was trained multiple times in order to avoid local minima.

• Step 1: Initialize particles with random velocities and accelerations.

• Step 2: Determine which particle is closest to the goal.

• Step 3: Adjust accelerations toward that particle.

• Step 4: Update particle positions based on their velocities, and update velocities based on their accelerations.

• Step 5: Go to step 2.

The LM algorithm is a combination of the Error Back-Propagation (EBP) method [Rumelhart et al., 1988] and the Gauss–Newton method; it has the speed advantage of Gauss–Newton and the stability of EBP [Yu and Wilamowski, 2011]. A detailed treatment of LM can be found in Moré [1978], and of PSO in Poli et al. [2007].
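The five PSO steps above can be sketched as a standard global-best PSO loop (the inertia and acceleration coefficients here are common textbook values, not the thesis settings):

```python
import numpy as np

def pso(f, dim, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal global-best PSO: particles track their personal best and
    are accelerated toward it and toward the swarm's best (steps 1-5)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1, 1, (n_particles, dim))        # positions
    v = np.zeros_like(x)                              # velocities
    pbest, pbest_f = x.copy(), np.array([f(p) for p in x])
    gbest = pbest[pbest_f.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = x + v
        fx = np.array([f(p) for p in x])
        improved = fx < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], fx[improved]
        gbest = pbest[pbest_f.argmin()].copy()
    return gbest, pbest_f.min()

best_x, best_f = pso(lambda p: np.sum((p - 0.3) ** 2), dim=2)
print(best_x)  # close to [0.3, 0.3]
```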

3.5 Neural Networks

3.5.1 Multilayer Perceptron

The perceptron is built around a nonlinear model of a neuron, the McCulloch–Pitts model [McCulloch and Pitts, 1943]. It basically consists of a two-step process: the cell body contains a summation function computing the weighted sum of all inputs, including a bias, as described by equation 3.11; the sum s is then passed through an activation function (see Activation Functions below) which mimics the activation, or firing, of the neuron.

s = Σ_{i=1}^{M} w_i x_i + x_0        (3.11)

w_i is the weight of the "synapse" of input channel i and is the parameter we want to adjust, x_i is the input value, and x_0 is the bias; M denotes the number of inputs.
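Equation 3.11 followed by an activation (here tanh, as used in the hidden layer) amounts to:

```python
import numpy as np

def perceptron(x, w, bias, activation=np.tanh):
    """McCulloch-Pitts style unit (eq. 3.11): weighted sum of the inputs
    plus a bias, passed through an activation function."""
    s = np.dot(w, x) + bias
    return activation(s)

x = np.array([0.5, -0.2, 0.1])   # input signals
w = np.array([0.4, 0.3, -0.6])   # synaptic weights
print(perceptron(x, w, bias=0.05))  # tanh(0.13) ~ 0.1293
```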


Figure 3.1: The perceptron. Input signals are weighted, summed together with a bias, and passed through the activation function f(s) to produce the output signal.

The structure of the MLP consists of many perceptrons, and it is shown in figure 3.2. The input signals flow from the input layer at the bottom to the output layer at the top. There is a bias signal, seen on the left side of the diagram, which is set to a fixed number.

MLPs are fully connected networks, meaning that each neuron in any layer of the network is connected to all the neurons in the previous layer. Each connection in the network has a weight w_ij associated with it. The initialization of these weights is done before any training, by randomly assigning very small values to each weight in the graph. The architecture used in this study has only one output variable, i.e. the approximated function produces a forecast of the power generation given a certain input.


Figure 3.2: Architectural graph of the neural network, which produces a single output value. It consists of a collection of hidden neurons in each of H hidden layers as well as M input connections (hours, u, v, week, ws, ws−1, ws−2, ws+1, ws+2); the hidden units use tanh activations and the output unit is linear. Each edge in this graph has a weight w_ij associated with it.

The performance of neural networks is generally improved if the data is normalised, because using the original data directly can cause convergence problems. Normalization is done using the mapminmax function seen in equation 3.12:

y = (y_max − y_min) · (x − x_min) / (x_max − x_min) + y_min        (3.12)

y_max is the maximum of the specified range, which in this case is 1, and y_min is −1; x is the value to be scaled, x_max is the maximum of the values to be scaled, and x_min is their minimum.
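Equation 3.12 with (y_min, y_max) = (−1, 1) can be sketched as:

```python
import numpy as np

def mapminmax(x, ymin=-1.0, ymax=1.0):
    """Scale x linearly so its minimum maps to ymin and its maximum
    to ymax (eq. 3.12)."""
    x = np.asarray(x, dtype=float)
    return (ymax - ymin) * (x - x.min()) / (x.max() - x.min()) + ymin

speeds = np.array([0.0, 5.0, 10.0, 20.0])
print(mapminmax(speeds))  # scaled to -1, -0.5, 0, 1
```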

The forecasting model consists of 7 different networks, one for each wind farm. These networks are trained on the first section of the dataset, before the test period. A random 60/20/20 split is created, where 60% of the available data is used for training, 20% is used to validate the network, and 20% is set aside as a hold-out set for the hyperparameters. The input features³ fed into the models are

³ See table 3.2.


ws, u, v, hours, ws+1, ws+2, ws−1, ws−2, where ws+x and ws−x denote a time shift of x hours; we can use ws+x as input to the model because ws is itself a forecast.

Activation Functions

The computation done for each neuron in the multilayer perceptron requires knowledge of the derivative of the activation function; in other words, the activation function we pick needs to be continuous. In this thesis we use the following two activation functions: the hyperbolic tangent function, seen in equation 3.13,

f(s) = tanh(s) = (e^s − e^{−s}) / (e^s + e^{−s})        (3.13)

and the saturating linear transfer function, seen in equation 3.14:

f(s) = +1 if s ≥ 1,   s if −1 < s < 1,   −1 if s ≤ −1        (3.14)

Hyperparameter optimization

In order to obtain the hyperparameters necessary for each wind farm's model, a random hyperparameter search was performed for all models. A hold-out validation set was used to pick the best hyperparameters. Random search has been shown to work better than grid search when not all parameters are equally important [Bergstra and Bengio, 2012].

3.5.2 Numenta Platform for Intelligent Computing

This section describes the key principles introduced with NuPIC. It explains in general terms the theory behind the Online Prediction Framework (OPF)⁴, which uses the CLA and HTM algorithms; the OPF works as an API to create predictive models. The OPF consists of five major types of components: Encoders, Spatial Pooler (SP), Temporal Memory (TM), Temporal Pooler (TP), and Classifiers. Together these components construct a single CLA region⁵, and figure 3.3 demonstrates the information flow through a single region.

⁴ The OPF is used with Numenta's commercial product Grok.
⁵ Currently, models created with the OPF do not use a TP, nor does this client allow creation of a hierarchy of regions, i.e. what we would call an HTM; it is only possible to create one region.


Another central concept in NuPIC is the Sparse Distributed Representation (SDR), which refers to the activation of a small percentage of the neurons at any given time. This neuronal activity is represented as an n-dimensional sparse binary vector x = [b_0, …, b_n] with around 2% active cells. The outputs from the spatial pooler and the temporal memory are both SDRs, while the output from the encoder does not enforce this and is just a normal binary vector. A general overview of the properties of SDRs is given by Ahmad and Hawkins [2015].

Figure 3.3: Information flow of a single-region predictive model created with the OPF: Encoder → Spatial Pooler → Temporal Memory → Classifier.

The overlap between two different SDRs is defined by

o(x, y) = x · y        (3.15)

and a match between two SDRs is defined by

m(x, y) := o(x, y) ≥ θ        (3.16)

where θ is set such that θ ≤ ‖x‖₁ and θ ≤ ‖y‖₁. An interesting property of SDRs, used multiple times inside the temporal memory and especially for predictions, is the fact that a fixed-size set of SDRs can be reliably stored by taking their union as a single pattern. The boolean OR operator is used to create a


Scalar encoding

Value  Encoding
1      11111000000000
2      01111100000000
10     00000000011111

Table 3.3: Examples of scalar values encoded with a ScalarEncoder, where n = 14, r = 5, ψ = 1.

new vector from the set of SDRs. The downside of storing patterns this way is that the more patterns we store, the bigger the probability of false positives.
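The three SDR operations above can be sketched directly (a small sketch on 8-bit vectors; real SDRs are much longer and sparser):

```python
import numpy as np

def overlap(x, y):
    """Overlap between two SDRs (eq. 3.15): number of shared active bits."""
    return int(np.dot(x, y))

def match(x, y, theta):
    """Two SDRs match (eq. 3.16) when their overlap reaches theta."""
    return overlap(x, y) >= theta

def union(*sdrs):
    """Store a set of SDRs as one pattern with boolean OR; membership
    tests against the union risk false positives as patterns are added."""
    return np.bitwise_or.reduce(np.array(sdrs))

a = np.array([1, 0, 1, 0, 0, 1, 0, 0])
b = np.array([1, 0, 0, 0, 0, 1, 0, 1])
print(overlap(a, b))         # 2
print(match(a, b, theta=2))  # True
print(union(a, b))           # [1 0 1 0 0 1 0 1]
```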

Encoders

NuPIC contains many different encoders⁶; the job of an encoder is to convert raw input into a more suitable representation, i.e. a binary vector. The raw input is fed into the model using a dictionary data structure.

Useful encoders include the scalar and categorical encoders, while more exotic encoders include one for Global Positioning System (GPS) coordinates, which can be used to extract information about anomalous movements. The entries in the dictionary representation of the raw input are each encoded separately and concatenated using a multi-encoder.

One property of the spatial pooler is that overlapping input patterns are mapped to the same SDR. This means that we want the encoder to encode inputs so that similar inputs share bits. The ScalarEncoder fulfils this property by the process illustrated in table 3.3.

We calculate the range of values to encode using equation 3.17, where v_min represents the minimum value of the input signal and v_max denotes its upper bound:

v_range = v_max − v_min        (3.17)

There are three mutually exclusive parameters that determine the overall size of the output from a scalar encoder: (n, r, ψ). n directly represents the total number

⁶ A full list of all encoders can be found in the API documentation for NuPIC.


of bits in the output, and it must be greater than or equal to w, which represents the number of bits that are set to encode a single value, i.e. the "width" of the output signal⁷. r and ψ are specified with respect to the input, while w is specified with respect to the output. Two inputs separated by more than the radius r have non-overlapping representations; two inputs separated by less than the radius will in general overlap in at least some of their bits; two inputs separated by greater than or equal to the resolution ψ are guaranteed to have different representations. ψ can be calculated using equation 3.18:

ψ = r / w        (3.18)

Depending on whether we want periodic behaviour or not, n is calculated a bit differently; see the scalar encoder implementation for details⁸.
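A sketch of the non-periodic case that reproduces table 3.3 (the bucket arithmetic here is my simplification of the NuPIC implementation):

```python
import numpy as np

def scalar_encode(value, vmin, vmax, w=5, resolution=1.0):
    """Non-periodic scalar encoder sketch: w consecutive active bits whose
    position slides with the value, so nearby values share bits."""
    n_buckets = int(round((vmax - vmin) / resolution)) + 1
    n = n_buckets + w - 1                           # total output bits
    start = int(round((value - vmin) / resolution)) # first active bit
    out = np.zeros(n, dtype=int)
    out[start:start + w] = 1
    return out

# Reproduce table 3.3 (vmin=1, vmax=10, w=5, resolution=1 -> n=14 bits)
print("".join(map(str, scalar_encode(1, 1, 10))))   # 11111000000000
print("".join(map(str, scalar_encode(2, 1, 10))))   # 01111100000000
print("".join(map(str, scalar_encode(10, 1, 10))))  # 00000000011111
```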

Spatial Pooler

The spatial pooler receives a binary vector as its input from an encoder and outputs an SDR. The structure of the spatial pooler consists of an input space and a set of columns; the output SDR represents which of the columns in a region are active. Each column has synapses connected to the input space: around 50% of the input space is randomly, potentially connected to each column, the so-called "potential pool". Each synapse connects to and disconnects from the input space during learning.

The general information flow of the spatial pooler consists of the following steps. Input stimuli from lower regions lead to activation of the input space, to which each mini-column is synaptically connected. This activation, the so-called "overlap score", is computed for each column; the score is calculated as the total sum over the neurons that try to influence that column, weighted by a "boosting factor" that increases the chance for certain columns to win more easily. A final top percentage (usually around 2% activity) of the columns with the highest influence (biggest overlap score) is chosen; columns also inhibit nearby columns, and the result of this process is a binary vector with few active bits: an SDR. This is illustrated in equation 3.19, where b can either be 0 or 1 and s is a value representing the score.

⁷ w must be odd to avoid centering problems.
⁸ https://github.com/numenta/nupic


[b_1 b_2 b_3 ⋯ b_n]  ·  [b_ij]  =  [s_1 s_2 ⋯ s_n]  —inhibition→  [b_1 b_2 ⋯ b_n]        (3.19)
(input vector)   (connected synapses, one row per column)   (overlap score)   (output SDR)

Learning in this structure is done by adjusting the permanences of the proximal dendrites to better match the input: for each winning column, the permanence of the synapses that correctly matched the input is increased and the permanence of the rest is decreased. We also increase the boosting factor of losing columns to give them a bigger chance of winning next time.
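One spatial-pooler step, overlap scoring plus inhibition as in equation 3.19, can be sketched as follows (a simplification: the real SP also uses permanence thresholds, optional local inhibition, and learned boosting):

```python
import numpy as np

def spatial_pool(input_vec, synapses, boost, active_frac=0.02):
    """One spatial-pooler step: boosted overlap score per column, then
    global inhibition keeps only the top ~2% of columns active."""
    scores = (synapses @ input_vec) * boost   # overlap score per column
    k = max(1, int(active_frac * len(scores)))
    winners = np.argsort(scores)[-k:]         # columns surviving inhibition
    sdr = np.zeros(len(scores), dtype=int)
    sdr[winners] = 1
    return sdr

rng = np.random.default_rng(0)
synapses = (rng.random((200, 50)) < 0.5).astype(int)  # 200 columns, 50 inputs
x = (rng.random(50) < 0.2).astype(int)                # encoder output
sdr = spatial_pool(x, synapses, boost=np.ones(200))
print(sdr.sum())  # 4 active columns (2% of 200)
```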

Temporal Memory

The temporal memory receives an SDR from the spatial pooler, representing the active columns; the output of the temporal memory is the activity of the whole CLA region. The general idea of the TM⁹ is that the cells in each mini-column provide temporal context to the pattern identified by the spatial pooler. The TM's job is to form transitions between different SDRs, and it achieves this by allowing active cells to form connections to previously active cells, so that in a future setting each cell is able to predict its own activity.

Each cell in a region has several distal segments, ideally one for each pattern the cell has transitioned from; in practice each segment connects to a couple of patterns. A cell enters a predictive state if there is enough activity on a segment, and every segment consists of a collection of connections, synapses, that have been formed to a subset of previously active cells (typically around 10–15 cells).

The temporal memory receives a sparse binary vector from the spatial pooler, and the general information flow for this part of the CLA goes through two phases. The first phase has the following steps: 1) for each active column, check whether any cell is in a predictive

⁹ There are many details in how the temporal memory is implemented; pseudo-code and further details can be found in [Numenta, 2011]. The NuPIC git repository is the best source for the finer details: https://github.com/numenta/nupic


state; if so, 2) activate that particular cell. If no cell was found in a predictive state, then 3) activate all cells in that particular column, a process called bursting.

Bursting is analogous to saying: "we do not know which cell to activate, because we have not seen this instance of the sequence of patterns before and are unable to put the column into the correct temporal context, so let us activate all cells to reflect this uncertainty". Bursting does not occur when the temporal context has been predicted by the CLA. Equation 3.20 shows the first phase.

[b_1 b_2 b_3 ⋯ b_n]   applied to   [b_ij]   =   [b_ij]        (3.20)
(SP SDR)            (predictive state, d cells × n columns)   (active state)

The second phase of the algorithm figures out which cells should be put into a predictive state for the next time step. This is achieved by checking every distal segment of every cell for activity above a certain threshold, i.e. checking for active segments; one active segment is enough to put a cell into a predictive state. The output from the temporal memory is a vector representing the state of all cells in that region. Equation 3.21 shows phase two.

[b_1 b_2 b_3 ⋯ b_n]  ·  [b_ij]_X  =  [b_1 b_2 b_3 ⋯ b_n]_X  > τ  →  [s_1 s_2 s_3 ⋯ s_n]        (3.21)
(active state)      (segment X)     (segment activation X)          (predictive state)

If learning is turned on, the permanences of the synapses connected to active distal segments are updated, in the same way as in the spatial pooler. These changes are marked as temporary until we are sure that the cell is in fact able to correctly predict something with the new changes; after that, the change becomes either permanent or is removed. Temporarily marked changes are committed whenever the cell goes from inactive to active through feed-forward input (reinforce the change, as we correctly predicted the feed-forward activation); if the cell instead goes from active to inactive, the change is undone.


NuPIC Classifiers

There are many different classifiers included with NuPIC, such as a k-Nearest Neighbour (kNN) classifier and a Support Vector Machine (SVM); the OPF in particular uses a custom-built classifier called the "CLAClassifier". The purpose of this classifier is to decode predictions made by the CLA. Classification with the CLAClassifier is performed in different ways depending on how the input has been encoded.

Scalar values are decoded using a process where each cell in a CLA region is paired with two histograms: one histogram keeps track of the frequency of encountered patterns associated with each cell, and the other keeps track of a moving average for each bucket. This process is illustrated in figure 3.4.

Figure 3.4: The CLAClassifier. Each cell is paired with two histograms, one for pattern likelihood and one for a moving average, over buckets spanning the minimum to the maximum value.

Training in NuPIC

Training of the NuPIC model is done using online learning algorithms. The schema seen in figure 3.5 is used to train and test these models; we have 7 different wind farms, so this schema is repeated for each wind farm.

This schema is slightly adjusted from the traditional way the OPF handles multi-step predictions. The default way of doing this is to use 48 different models, one for


every step ahead in the CLAClassifier, which would give us 48 · 7 = 336 models in total (for all the wind farms). Not only does this change match Expektra's approach, it also reduces the memory requirements of the CLAClassifier.

Figure 3.5: Training an OPF model. A pre-training data chunk is used for hyperparameter setup (PSO swarming or manual configuration); in the training phase the training data is streamed through the OPF model with online learning activated, and in the testing phase the testing data is fed with online learning deactivated, producing multi-step predictions.

Input and hyperparameter selection

Finding hyperparameters for an OPF model was partly done using a custom built-in PSO algorithm and partly configured manually, mainly by ensuring that the PSO algorithm did not remove encoders. A single CLA region contains a lot of hyperparameters; these have been included, with corresponding descriptions, in appendix A. Inputs to the model are date, ws, wp, u and v.


Chapter 4

Result

This chapter contains the primary results obtained in this study. Figures 4.4–4.10 contain different error measurements on the test set for the respective wind farms. We see that Expektra's ANN model performs well over all wind farms, while the NuPIC model performs worse than Expektra but in general still better than our reference model. NuPIC is off target, with a bias error on most wind farms. The graphs also indicate that NuPIC is unable to pick up some trends in the cumulated ε² graph; the Expektra model, on the other hand, shows no clear problems in cumulated ε² and is on target on all wind farms. Appendix C has been included to reflect this at different lead times.

Figure 4.1: Left diagram: cumulative probability of wind speed. Right diagram: scatter diagram of the power curve, normalized power output vs. wind speed (colored by wind direction). Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


Figure 4.2: Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).

Figure 4.3: Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


Figure 4.4: Different error measurements for WF 1. Four panels: NBIAS, NRMSE and NMAE vs. look-ahead time k (in hours), and cumulated ε² vs. time (in hours), each comparing Expektra, NuPIC and Persistence.


Figure 4.5: Different error measurements for WF 2. Four panels: NBIAS, NRMSE and NMAE vs. look-ahead time k (in hours), and cumulated ε² vs. time (in hours), each comparing Expektra, NuPIC and Persistence.


Figure 4.6: Different error measurements for WF 3. Four panels: NBIAS, NRMSE and NMAE vs. look-ahead time k (in hours), and cumulated ε² vs. time (in hours), each comparing Expektra, NuPIC and Persistence.


Figure 4.7: Different error measurements for WF 4. Four panels: NBIAS, NRMSE and NMAE vs. look-ahead time k (in hours), and cumulated ε² vs. time (in hours), each comparing Expektra, NuPIC and Persistence.


Figure 4.8: Different error measurements for WF 5. Four panels: NBIAS, NRMSE and NMAE vs. look-ahead time k (in hours), and cumulated ε² vs. time (in hours), each comparing Expektra, NuPIC and Persistence.


Figure 4.9: Different error measurements for WF 6. Four panels: NBIAS, NRMSE and NMAE vs. look-ahead time k (in hours), and cumulated ε² vs. time (in hours), each comparing Expektra, NuPIC and Persistence.


[Figure 4.10 panels: NBIAS and NRMSE vs. look-ahead time k, cumulated ε² over time, and NMAE vs. look-ahead time k for Expektra, NuPIC, and Persistence, WF 7]

Figure 4.10: Different error measurements for WF 7.


                        Wind Farm
User           1      2      3      4      5      6      7      All

Leustagos      0.145  0.138  0.168  0.144  0.158  0.133  0.140  0.146
DuckTile       0.143  0.145  0.172  0.145  0.165  0.137  0.146  0.148
MZ             0.141  0.151  0.174  0.145  0.167  0.141  0.145  0.149
Propeller      0.144  0.153  0.177  0.147  0.175  0.141  0.147  0.152
Duehee Lee     0.157  0.144  0.176  0.160  0.169  0.154  0.148  0.155
Expektra       0.165  0.158  0.184  0.164  0.179  0.153  0.153  0.165
MTU EE5260     0.161  0.172  0.193  0.162  0.192  0.156  0.160  0.168
SunWind        0.174  0.177  0.193  0.176  0.179  0.157  0.162  0.172
ymzsmsd        0.163  0.186  0.200  0.164  0.192  0.162  0.167  0.174
4138 Kalchas   0.180  0.179  0.197  0.175  0.200  0.160  0.165  0.177
NuPIC          0.243  0.254  0.264  0.310  0.290  0.224  0.240  0.264

Persistence    0.302  0.338  0.373  0.364  0.388  0.341  0.361  0.355

Table 4.1: NRMSE scores of the entries published in Hong et al. [2014]. The NuPIC model and the Expektra model are added so that the results can easily be compared.

4.1 Experimental results

Looking at the primary results presented in this study, we see in Table 4.1 that Expektra's ANN model is able to achieve performance comparable to the other methods published with Hong et al. [2014]. This can be used as a starting point for further investigations. A similar approach to Expektra's is that of Duehee Lee [Lee and Baldick, 2014], as both methods are based on the same neural network architecture but use different optimization techniques. Duehee Lee's network is also structured slightly differently: Expektra's model has only one output node, whereas Duehee Lee uses five (representing five prediction steps ahead). Besides these differences, Duehee Lee has also included additional sub-models to create a prediction ensemble, blending the output of the ANN with a Gaussian Process (GP) to improve very short-term predictions. MTU EE5260 is also a neural network approach that converts meteorological forecasts to power, and its score is very close. The HTM CLA beats the persistence model, but needs more work to outperform the other models in GEFCom.
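The normalized error measures used throughout this chapter (NBIAS, NMAE, and NRMSE, following Madsen et al. [2005]) can be sketched in Python as follows; the function name and the example series are illustrative only:

```python
import numpy as np

def forecast_errors(y_true, y_pred, capacity=1.0):
    """Normalized error measures for a wind power forecast.

    y_true, y_pred: arrays of measured and predicted power.
    capacity: installed capacity used for normalization
    (GEFCom power is already normalized, so capacity = 1).
    """
    e = (y_true - y_pred) / capacity
    nbias = np.mean(e)                 # NBIAS: systematic over/under-prediction
    nmae = np.mean(np.abs(e))          # NMAE: normalized mean absolute error
    nrmse = np.sqrt(np.mean(e ** 2))   # NRMSE: normalized root mean square error
    return nbias, nmae, nrmse

# Example: evaluate a persistence-style forecast (y[t-1] as forecast for y[t])
y = np.array([0.20, 0.25, 0.30, 0.28, 0.35])
print(forecast_errors(y[1:], y[:-1]))
```

In Table 4.1 these NRMSE values are averaged over all look-ahead times and test periods for each wind farm.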


4.2 Input Importance

For interpretation purposes, in order to understand the model better, an analysis of the relative input parameter importance was performed. This analysis is illustrated in Figure 4.11. Each box represents added noise to that channel, which results in a higher NRMSE score if that feature was important. A reference point, "all-channels", represents the error distribution of the model with no input replacement. We clearly see in this figure that the wind speed channel ws is the most important attribute; adding noise to this channel greatly affects the NRMSE score. We also see that the wind components u, v show little to no influence, and that the timestamp-related inputs hours and week both indicate that there are seasonal and daily trends present in the dataset.
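The noise-injection analysis behind this figure can be sketched as follows; this is a minimal illustration of the HIPR idea from Kemp et al. [2007], with a toy model standing in for the trained network (all names are placeholders):

```python
import numpy as np

def channel_importance(model, X, y, n_repeats=20, rng=None):
    """Holdback Input Randomization (HIPR)-style importance.

    For each input channel, replace it with uniform noise and measure
    the NRMSE of the perturbed input; important channels give a large
    error increase over the unperturbed baseline.
    """
    rng = np.random.default_rng(rng)
    def nrmse(pred):
        return np.sqrt(np.mean((y - pred) ** 2))
    baseline = nrmse(model(X))
    scores = {}
    for ch in range(X.shape[1]):
        errs = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, ch] = rng.uniform(X[:, ch].min(), X[:, ch].max(), len(X))
            errs.append(nrmse(model(Xp)))
        scores[ch] = np.mean(errs)
    return baseline, scores

# Toy model that only uses channel 0 (standing in for "wind speed"):
X = np.random.default_rng(0).uniform(0, 1, size=(200, 3))
y = 0.8 * X[:, 0]
model = lambda inputs: 0.8 * inputs[:, 0]
base, imp = channel_importance(model, X, y, rng=1)
# Channel 0 shows a much larger error than the ignored channels 1 and 2.
```

Plotting the per-repeat errors as boxplots, one box per channel, gives a figure of the same shape as Figure 4.11.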

[Figure 4.11: boxplots of NRMSE after adding noise to one channel at a time: all-channels, hours, u, v, week, ws, ws−1, ws−2, ws−3, ws+1, ws+2, ws+3]

Figure 4.11: Relative input parameter importance using HIPR. "all-channels" reflects the reference point, i.e. the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is the directional vector of the wind.


4.2.1 Adaptation and Optimization

All training for Expektra's model was performed on a MacBook Pro with an Intel Core 2 Duo processor (P8600, 2.40 GHz) and 4 GB of RAM, running a 64-bit operating system. After adapting the code to be able to run experiments on the GEFCom dataset, an investigation was performed to evaluate the performance of the core source code provided by Expektra. This investigation identified some slow parts of the code. After addressing these issues, mainly by using a Math.NET native library instead of the managed provider, a speed test was performed comparing the unoptimized version with the optimized one. Figure 4.12 shows the speed-up achieved during training using the LM algorithm.

[Figure 4.12: training time vs. number of hidden neurons (10, 15, 20, 25) for the normal and the optimized implementation]

Figure 4.12: Unoptimized version vs. the optimized one when training a network with 10 input neurons, 10/15/20/25 hidden neurons, and 1 output neuron. Training was done for 100 epochs using LM.

4.3 Summary

A plot showing the average NRMSE improvement over the persistence model can be seen in Figure 4.13. Both models are able to perform better than persistence. Expektra's model performs better than the NuPIC model for the whole forecast


horizon, and NuPIC performs better than persistence towards the end of the forecast.
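The improvement measure plotted in Figure 4.13 is the relative NRMSE reduction with respect to the reference model, as defined in Madsen et al. [2005]; a small sketch using the "All" column of Table 4.1:

```python
def improvement_over_reference(nrmse_model, nrmse_ref):
    """Imp = 100 * (NRMSE_ref - NRMSE_model) / NRMSE_ref,
    the percentage improvement over the reference forecast."""
    return 100.0 * (nrmse_ref - nrmse_model) / nrmse_ref

# Using the "All" column of Table 4.1:
print(improvement_over_reference(0.165, 0.355))  # Expektra vs. persistence, ~53.5 %
print(improvement_over_reference(0.264, 0.355))  # NuPIC vs. persistence, ~25.6 %
```

Figure 4.13 shows this quantity computed per look-ahead time and averaged over the wind farms.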

[Figure 4.13: improvement (%) in NRMSE over persistence vs. look-ahead time (0-48 hours) for Expektra and NuPIC]

Figure 4.13: Summarized average improvement over all wind farms, with 95% confidence intervals.


Chapter 5

Discussion

With the exponential growth of computer power, it becomes easier to study deeper and more complex networks, and with recent advances in the area of deep learning, a new interest in ANNs has resurged. In practice, ANNs have been used by most groups in the field of WPF, but these networks never caught on, as it was argued that the improvements made by ANNs were usually not worth the extra effort of training them [Giebel et al., 2011]. This is steadily becoming less of an issue as computation becomes cheaper and bigger networks perform better.

By using a simple MLP network, we observe that we are able to obtain results similar in performance to other models published in Hong et al. [2014]. This can be used as a starting point for further investigations. The GEFCom dataset has a limited number of input features; if we had access to a wider range of them, the power of neural networks could be studied more in depth. Given the limited number of features in the dataset, this is harder to do.

Using persistence as the reference model was done in order to have the same baseline as the one used in GEFCom, but a better reference model should be considered in the future, such as the one presented in Nielsen et al. [1998], especially if longer forecasts are to be considered.
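For reference, the persistence model used as the baseline simply propagates the last measured value across the whole forecast horizon; a minimal sketch:

```python
import numpy as np

def persistence_forecast(history, horizon):
    """Persistence reference model: predict the last observed
    power value for every look-ahead time k = 1..horizon."""
    return np.full(horizon, history[-1])

print(persistence_forecast(np.array([0.1, 0.4, 0.3]), 4))  # [0.3 0.3 0.3 0.3]
```

This is why persistence is hard to beat for the first few hours but degrades quickly with increasing look-ahead time.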

5.1 Method development issues

Working with Expektra's model is very straightforward and no major issues were encountered during development, but some issues were encountered working with NuPIC.1

1. It should also be pointed out that it helps to have the people who developed the code on the spot, and this is probably the main reason for the little trouble.


1. Installing NuPIC is difficult. This seems to be a general problem, judging by posts in the NuPIC mailing list.2 A very important issue to fix if they want more people to work with this model.

2. Using the built-in swarming (PSO) did not work well, especially for slightly larger datasets and for 48 prediction steps. The main problems were fitting everything into memory and that the code ran too slowly when swarming over multiple steps ahead, making it practically impossible to use. Scaling down the problem by just having a few prediction steps and multiple models helped, but the swarm model kept dismissing wind speed as an important feature, which was fixed by specifically telling it not to touch that encoder.

3. Inconsistencies between the white-paper documentation of HTM CLA and the actual implementation make it hard to understand the underlying principles of NuPIC. NuPIC presents a very complex network, and any inconsistencies in documentation make it hard to understand. The code is quite well documented, but the additional material is a bit sparse.

These issues are all understandable and are continuously being improved upon. NuPIC is still a young platform, so it is expected to find issues due to the complexity and research nature of the project. Training the CLAClassifier to use many step predictions instead of just one will most likely reduce the bias error seen in the results. To investigate this, a more powerful computer is needed.

There are some differences in the inputs between the models. This setup was created because an OPF model needs to encode the value it is going to predict. SCADA data is available, so this is not an issue, but it may introduce some unfairness between NuPIC and Expektra. In the current setup, NuPIC does not seem to gain anything extra from the additional information, and other models in the competition most likely used different inputs as well.

In general, NuPIC is different in the way you feed it data: you do so by sending in the front of the signal, and temporal context is achieved through the temporal memory.

5.2 Future improvements and directions

One issue with the GEFCom dataset is that it does not contain any specific information about the location of the 7 wind farms. It is worth considering taking a look at how to handle different models that are specialized for different types

2. http://numenta.org/lists


of terrain, as this has been shown to increase performance [Kariniotakis et al., 2006]. Another thing to investigate could be a more advanced scheme for hyperparameter selection; Jursa and Rohrig [2008] show that the WPF error can be reduced by smarter use of optimization algorithms for feature selection and hyperparameter selection, which should improve performance.

In general, having more types of inputs from sources like SCADA and NWP systems could give a lot of valuable information that can be used for better predictions. It is clear that wind speed is a key feature for producing good forecasts, which is also supported by the input analysis in Section 4.2. It has been argued that the error in WPF models stems largely from wrong weather forecasts [Giebel et al., 2011], so identifying good sources of weather forecasts would be desirable for more effective model learning. Combining several different weather forecasts could also be investigated, as Nielsen et al. [2007] showed that power forecasts based on a number of different meteorological forecasts were better than those based on a single source.

Regarding HTM CLA, it is worth considering implementing a custom encoder for it, an encoder specifically targeted at data concerning wind farms. SCADA data could also possibly be streamed directly into a NuPIC model, which would allow for some interesting online predictions and anomaly detections.

Training times are always an issue, especially for very large models. Both NuPIC and Expektra's model would benefit from being implemented in a more parallel fashion, as these algorithms are very well suited for parallel computing. The speed-up would be of great value, so looking into and implementing a GPU version of the algorithms could be worthwhile.

5.3 Conclusions

The field of energy forecasting is still young and the necessity for good forecasting techniques is constantly growing. This initial study shows that part of Expektra's forecasting method can be used to forecast wind power generation. We observe that this ANN is able to achieve performance comparable to other methods published in Hong et al. [2014]. Additional work is still needed to obtain state-of-the-art results. Numenta's model is able to beat the reference model, but still needs additional work before we can draw definite conclusions about its performance. Constructing a custom encoder could be a first step in that direction.


Bibliography

Subutai Ahmad and Jeff Hawkins. Properties of sparse distributed representations and their application to hierarchical temporal memory. arXiv preprint arXiv:1503.07469, 2015.

T. C. Akinci. Short term wind speed forecasting with ANN in Batman, Turkey. Elektronika ir Elektrotechnika, 107(1):41–45, 2015.

E. Michael Azoff. Neural network time series forecasting of financial markets. John Wiley & Sons, Inc., 1994.

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13(1):281–305, 2012.

Daniel P. Buxhoeveden and Manuel F. Casanova. The minicolumn hypothesis in neuroscience. Brain, 125(5):935–951, 2002.

Erasmo Cadenas and Wilfrido Rivera. Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks. Renewable Energy, 34(1):274–278, 2009.

Dmitri B. Chklovskii, B. W. Mel, and K. Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782–788, 2004.

A. Costa, A. Crespo, J. Navarro, G. Lizcano, H. Madsen, and E. Feitosa. A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews, 12(6):1725–1744, August 2008. ISSN 1364-0321. doi: 10.1016/j.rser.2007.01.015. URL http://dx.doi.org/10.1016/j.rser.2007.01.015.

Russ C. Eberhart and James Kennedy. A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science, volume 1, pages 39–43, New York, NY, 1995.


Shu Fan, James R. Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. Energy Conversion, IEEE Transactions on, 24(2):474–482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento – a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276. Citeseer, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition EWEC 2008, 6 pages. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving hierarchical temporal memory-based trading models. Springer, 2013.

M. A. Gaertner, C. Gallardo, C. Tejeda, N. Martínez, S. Calabria, N. Martínez, and B. Fernández. The Casandra project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. In Proc. of the 2001 European Wind Energy Conference, EWEC'01, Copenhagen, Denmark, pages 777–780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: A literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G. Lobo. Sipreólico – wind power prediction tool for the Spanish peninsular power system. In Proceedings of the CIGRÉ 40th General Session and Exhibition, París, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357–363, 2014.


Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41–46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585–592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694–709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference, EWEC'07, 2007.

G. Kariniotakis. Position paper on Joule project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J. Halliday, R. Brownsword, Ignacio Marti, Ana Maria Palomares, I. Cruz, H. Madsen, T. S. Nielsen, Henrik Aa. Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power – overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, 10 pages, 2006.

G. N. Kariniotakis, G. S. Stavrakakis, and E. F. Nogaret. Wind power forecasting using advanced neural networks models. Energy Conversion, IEEE Transactions on, 11(4):762–767, 1996.

Stanley J. Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326–334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275–293, 2009.

Lars Landberg and Simon J. Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171–195, 1994.

S. Lang, C. Mohrlen, J. Jorgensen, B. Ó Gallachóir, and E. McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International Solar Energy Society UK Section Conference, volume 85, page 89, 2006.


S. M. Lawan, W. A. W. Z. Abidin, W. Y. Chai, A. Baharun, and T. Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760–1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. Smart Grid, IEEE Transactions on, 5(1):501–510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets, 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164–168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313–2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154–3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa. Nielsen, and Torben S. Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475–489, 2005.

Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431–441, 1963.

Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons. 1969.

Marvin Lee Minsky and Oliver G. Selfridge. Learning in random nets. MIT Lincoln Laboratory, 1960.

Jorge J. Moré. The Levenberg-Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105–116. Springer, 1978.

Henrik Aa. Nielsen, Torben S. Nielsen, Henrik Madsen, Maria J. Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471–482, 2007.


Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29–34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power, 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159–167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms, v0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33–57, 2007.

A. Rodrigues, J. A. Peças Lopes, P. Miranda, L. Palma, C. Monteiro, R. Bessa, J. Sousa, C. Rodrigues, and J. Matos. EPREV – a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference, EWEC, volume 7, 2007.

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5:3, 1988.

Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85–117, 2015.

S. Sinkevicius, R. Simutis, and V. Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91–96, 2011.

Ke-Sheng Wang, Vishal S. Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61–69, 2014.

WWEA. 2014 half-year report, pp. 1–8. Technical report, WWEA, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365–376, 2013.

Hao Yu and Bogdan M. Wilamowski. Levenberg-Marquardt training. Industrial Electronics Handbook, 5:12-1, 2011.


Appendix A

Hyper-parameters

Parameters for the spatial pooler

Alias                  Default  Description

columnCount            -        The number of cell columns in a cortical region.
globalInhibition       false    If true, in the inhibition phase the winning columns are selected as the most active columns from the region as a whole.
numActivePerInhArea    10       The maximum number of active columns per inhibition area.
synPermActiveInc       0.1      The amount by which an active synapse is incremented in each round, specified as a percent of a fully grown synapse.
synPermConnected       0.10     Controls the threshold at which synapses are connected.
synPermInactiveDec     0.01     The amount by which an inactive synapse is decremented in each round.
potentialRadius        16       Determines the extent of the input that each column can potentially be connected to.

Table A.1: Configuration parameters for the spatial pooler.


Parameters for the scalar encoder

Alias       Symbol  Description

w           w       Number of bits to set in the output.
minval      vmin    The lower bound of the input value.
maxval      vmax    The upper bound of the input value.
n           n       Number of bits in the representation (n must be > w).
radius      r       Inputs separated by more than or equal to this distance will have non-overlapping representations.
resolution  ψ       Inputs separated by more than or equal to this distance will have different representations.

Table A.2: Configuration parameters for the scalar encoder.
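To illustrate the parameters above, a toy scalar encoder can be sketched as follows; this is a simplified illustration of the idea, not NuPIC's actual implementation:

```python
import numpy as np

def encode_scalar(value, minval, maxval, n, w):
    """Encode a scalar into an n-bit array with w contiguous active
    bits; nearby values share active bits (overlapping encodings)."""
    assert n > w and minval <= value <= maxval
    n_buckets = n - w + 1                     # number of distinct encodings
    frac = (value - minval) / (maxval - minval)
    start = int(round(frac * (n_buckets - 1)))
    bits = np.zeros(n, dtype=int)
    bits[start:start + w] = 1
    return bits

print(encode_scalar(0.0, 0.0, 10.0, n=12, w=3))   # [1 1 1 0 0 0 0 0 0 0 0 0]
print(encode_scalar(10.0, 0.0, 10.0, n=12, w=3))  # [0 0 0 0 0 0 0 0 0 1 1 1]
```

Here the radius and resolution parameters of Table A.2 follow implicitly from the choice of n, w, and the value range.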


Parameters for the temporal memory

Alias                  Default  Description

activationThreshold    12       Activation threshold for segments.
cellsPerColumn         32       Number of cells per column.
columnCount            2048     The number of cell columns in a cortical region.
globalDecay            0.10     Decrements all synapses a little bit all the time.
initialPerm            0.11     Initial permanence value for a synapse.
inputWidth             -        Size of the input.
maxAge                 100000   Controls global decay. Global decay will only decay segments that have not been activated for maxAge iterations, and will only do the global decay loop every maxAge iterations.
maxSegmentsPerCell     -        The maximum number of segments a cell can have.
maxSynapsesPerSegment  -        The maximum number of synapses a segment can have.
minThreshold           8        The minimum required activity for a segment to learn.
newSynapseCount        15       The maximum number of synapses added to a segment during learning.
permanenceDec          0.10     How much permanence is removed from synapses when learning occurs.
permanenceInc          0.10     How much permanence is added to synapses when learning occurs.
temporalImp            cpp/py   Controls which temporal memory implementation to use.

Table A.3: Configuration parameters for the temporal memory.
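To illustrate how the permanence parameters interact, a single-synapse update step can be sketched as follows (a simplified illustration of the learning rule, not NuPIC's code):

```python
def update_permanence(perm, active, permanenceInc=0.10, permanenceDec=0.10,
                      synPermConnected=0.10):
    """Reinforce a synapse whose input was active, weaken it otherwise;
    a synapse counts as 'connected' once its permanence crosses the
    synPermConnected threshold. Returns (new permanence, connected?)."""
    perm = perm + permanenceInc if active else perm - permanenceDec
    perm = min(1.0, max(0.0, perm))   # permanence is clipped to [0, 1]
    return perm, perm >= synPermConnected

print(update_permanence(0.11, True))    # reinforced, becomes connected
print(update_permanence(0.11, False))   # weakened, stays disconnected
```

With the defaults of Table A.3 (initialPerm just above synPermConnected), a new synapse starts out barely connected and one bad round is enough to disconnect it.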


Appendix B

Wind characteristics

Figure B.1: Wind characteristics for WF 1 and WF 2 in the GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated (the stronger, the more power).


Figure B.2: Wind characteristics for WF 3–7 in the GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated (the stronger, the more power).


Appendix C

Error Distribution


[Figure C.1 panels: error histograms for lead times 48, 40, 30, 20, 10, and 1 hour, wf1 using nupic]

Figure C.1: Error distribution for different lead times, WF 1 (NuPIC model).


[Figure C.2 panels: error histograms for lead times 48, 40, 30, 20, 10, and 1 hour, wf2 using nupic]

Figure C.2: Error distribution for different lead times, WF 2 (NuPIC model).


[Figure C.3 panels: error histograms for lead times 48, 40, 30, 20, 10, and 1 hour, wf3 using nupic]

Figure C.3: Error distribution for different lead times, WF 3 (NuPIC model).


[Figure C.4 panels: error histograms for lead times 48, 40, 30, 20, 10, and 1 hour, wf4 using nupic]

Figure C.4: Error distribution for different lead times, WF 4 (NuPIC model).


[Figure C.5 panels: error histograms for lead times 48, 40, 30, 20, 10, and 1 hour, wf5 using nupic]

Figure C.5: Error distribution for different lead times, WF 5 (NuPIC model).


[Figure C.6 panels: error histograms for lead times 48, 40, 30, 20, 10, and 1 hour, wf6 using nupic]

Figure C.6: Error distribution for different lead times, WF 6 (NuPIC model).


[Figure C.7 panels: error histograms for lead times 48, 40, 30, 20, 10, and 1 hour, wf7 using nupic]

Figure C.7: Error distribution for different lead times, WF 7 (NuPIC model).


[Figure C.8 panels: error histograms for lead times 48, 40, 30, 20, 10, and 1 hour, wf1 using expektra]

Figure C.8: Error distribution for different lead times, WF 1 (Expektra model).


[Figure C.9 panels: error histograms for lead times 48, 40, 30, 20, 10, and 1 hour, wf2 using expektra]

Figure C.9: Error distribution for different lead times, WF 2 (Expektra model).


[Figure C.10 panels: error histograms for lead times 48, 40, 30, 20, 10, and 1 hour, wf3 using expektra]

Figure C.10: Error distribution for different lead times, WF 3 (Expektra model).


[Figure C.11 panels: error histograms for lead times 48, 40, 30, 20, 10, and 1 hour, wf4 using expektra]

Figure C.11: Error distribution for different lead times, WF 4 (Expektra model).


[Histogram panels omitted: error distributions for wf5 at lead times 48, 40, 30, 20, 10 and 1 hours (x-axis: error, −1.0 to 1.0; y-axis: frequency, 0–70).]

Figure C.12: Error distribution for different lead times, WF 5


[Histogram panels omitted: error distributions for wf6 at lead times 48, 40, 30, 20, 10 and 1 hours (x-axis: error, −1.0 to 1.0; y-axis: frequency, 0–70).]

Figure C.13: Error distribution for different lead times, WF 6


[Histogram panels omitted: error distributions for wf7 at lead times 48, 40, 30, 20, 10 and 1 hours (x-axis: error, −1.0 to 1.0; y-axis: frequency, 0–70).]

Figure C.14: Error distribution for different lead times, WF 7


List of Figures

2.1 A figure that presents the general outline when forecasting using the statistical approach
2.2 A figure that presents the general steps when forecasting using a physical model
3.1 The perceptron
3.2 Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of the H hidden layers as well as M input connections. Each edge seen in this graph has a w_ij associated with it
3.3 Information flow of a single-region predictive model created with the OPF
3.4 The CLAClassifier
3.5 Training an OPF model
4.1 Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.2 Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.3 Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.4 Different error measurements for WF 1
4.5 Different error measurements for WF 2
4.6 Different error measurements for WF 3
4.7 Different error measurements for WF 4
4.8 Different error measurements for WF 5
4.9 Different error measurements for WF 6
4.10 Different error measurements for WF 7
4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point, i.e. the network when no channel has been exposed to noise. ws = wind speed, hours/week = timestamps, u, v is a directional vector of the wind
4.12 Plot showing the unoptimized version vs the optimized one when training a network with 10 input neurons, 10/15/20/25 hidden neurons and 1 output neuron. Training was done for 100 epochs using LM
4.13 Summarized average improvement over all wind farms with 95% confidence intervals
B.1 Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated, the stronger the more power
B.2 Wind characteristics for WF 3–7, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated, the stronger the more power
C.1 Error distribution for different lead times, WF 1
C.2 Error distribution for different lead times, WF 2
C.3 Error distribution for different lead times, WF 3
C.4 Error distribution for different lead times, WF 4
C.5 Error distribution for different lead times, WF 5
C.6 Error distribution for different lead times, WF 6
C.7 Error distribution for different lead times, WF 7
C.8 Error distribution for different lead times, WF 1
C.9 Error distribution for different lead times, WF 2
C.10 Error distribution for different lead times, WF 3
C.11 Error distribution for different lead times, WF 4
C.12 Error distribution for different lead times, WF 5
C.13 Error distribution for different lead times, WF 6
C.14 Error distribution for different lead times, WF 7

List of Tables

3.1 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods with missing data, power observations are available for updating the models
3.2 Preprocessed features from the dataset. The Forecast category represents the forecasts given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available gives the features we will use in training and testing
3.3 Example where n = 14, r = 5, ψ = 1 of various encoded scalar values using a ScalarEncoder
4.1 NRMSE score of the entries published in [Hong et al 2014]. The NuPIC model and Expektra model are added so we can easily compare the results
A.1 Table containing configuration parameters for the encoder
A.2 Table containing configuration parameters for the spatial pooler
A.3 Table containing configuration parameters for the temporal memory




currently tailored very specifically for time-series problems and has, at the moment, little research published around it, making it an ideal candidate for time series prediction with unknown potential on wind power forecasting problems.

Even though there are a lot of prominent methods already developed for wind power forecasting, there is still room for improvements. Energy forecasting is such an important topic that competitions have been developed. The Global Energy Forecasting Competition (GEFCom) [Hong et al 2014] is a competition that can be used to help evaluate the performance of new forecasting models. It is a competition that has attracted hundreds of contestants from all over the world, which has resulted in the contribution of many new and novel ideas. The GEFCom was created in order to:

1 Bring together state-of-the-art techniques for energy forecasting

2 Bridge the gap between academic research and industry practice

3 Promote analytical approaches in power and energy education

4 Prepare the industry to overcome forecasting challenges posed by the smart grid world

5 Improve energy forecasting practices

Benchmark datasets and competitions are a valuable resource when evaluating new models, as they allow for a common ground to stand on. With the publication of Hong et al [2014], the dataset in GEFCom2012 was also published. This dataset consists of data from 7 different wind farms that span a time period of three years. It consists of observational data from the energy production and weather forecasts.

The GEFCom is divided into four tracks: load forecasting, price forecasting, wind power forecasting and solar power forecasting. The specified problem in the wind power forecasting track is built around the real-time operation of wind farms, and the dataset is structured accordingly. Participants try to predict hourly power generation up to 1–48 hours ahead given meteorological forecasts and historically produced power.

1.1 Problem Formulation

The objective of this thesis is to adapt and analyse the robustness and suitability of Expektra's method when it is applied to the domain of forecasting short-term wind power generation, i.e. what is needed to adapt this model to wind power forecasting problems?

The second objective of this thesis is to investigate HTM/CLA [Numenta 2011], a modern state-of-the-art computational theory of the neocortex, i.e. is it possible to use the Numenta Platform for Intelligent Computing (NuPIC) for wind power forecasting problems?

These questions will be addressed by evaluating the models against other models published in the area of short-term wind power forecasting, more specifically those that have been published in GEFCom.

1.2 The scope of the problem

1. This study will focus on short-term forecasting, i.e. forecasts done for 1–48 hours ahead. How to perform well on longer forecasts is left to further investigations.

2. Wind power forecasting is closely related to wind speed forecasting, i.e. trying to use local information at the wind turbine instead of using NWP data to forecast wind power. This thesis will not go into any specific details on how to forecast wind speeds; readers can take a look at Li and Shi [2010], Cadenas and Rivera [2009], Akinci [2015].

3. There are a lot of neural networks that are worth studying, but this thesis will focus on Expektra's method and HTM/CLA inside NuPIC. NuPIC was picked over other deep learning methods because of its direct relation to time series prediction.


Chapter 2

Background

Wind Power Forecasting (WPF) has applications in generation and transmission maintenance planning and energy optimization, as well as in energy trading. WPF models exist at different scales, and they can be used to predict the production of anything from a single WT to a whole Wind Farm (WF). WPF models are generally divided into two main groups, physical and statistical, but hybrid state-of-the-art methods are also common [Giebel et al 2001, Lang et al 2006]. The physical approach [Landberg and Watson 1994, Gaertner et al 2003] focuses on integrating well-known physical aspects into the model, such as information about the surrounding terrain and properties of the WT. These models try to get as good an estimate of the local wind speed as possible before finally reducing the remaining error with some form of MOS. Statistical approaches [Rodrigues et al 2007, Gonzalez et al 2004] rely more on historical observations and their statistical relation to meteorological predictions, as well as measurements from Supervisory Control And Data Acquisition (SCADA) systems.

SCADA is a control system that is used in most industrial processes and is the main tool to evaluate the state of individual power plants. As the name indicates, it is not a full control system but rather a system for supervision. In the case of wind turbines, it allows remote access to online data that has been gathered by sensors inside and around the plant. Variables accessible through these systems are those directly related to the wind flow as well as the produced power generated by the plant, i.e. things like rotor wind speed, nacelle position, pitch angle, active power, etc.


Figure 2.1: A figure that presents the general outline when forecasting using the statistical approach.

Forecast models that use SCADA data as their primary input source usually have a good forecast accuracy, at least for the first few hours, but they tend to be less useful for longer prediction horizons [Giebel et al 2011]. SCADA data can also be used to detect problems in WTs, something that can be helpful to improve the reliability of WTs [Yang et al 2013, Wang et al 2014].

Statistical models, seen in figure 2.1, are usually built around Numerical Weather Prediction (NWP) and SCADA data. NWP models are often used to build forecasting models as they introduce weather forecasts for the region where the wind turbines are located. NWP data usually contains information about things like wind speed, wind direction, temperature and humidity. These models are operated twice or four times a day by a number of large weather services. The main forecasts usually start at 00 and 12 UTC, corresponding to the world radiosonde launching, which is the only direct observation of the atmospheric state and has been the backbone of atmospheric monitoring for many decades¹; extra forecasts usually start at 06 and 18 UTC. Physical models, as those seen in figure 2.2, include additional information about physical characteristics of the wind turbine and its surroundings, i.e. terrain data, information about obstacles, capacity and layout where the turbine is located, and so on. Other useful information for physical models includes the theoretical power curve: how much power is expected to be produced given a specific wind speed.

¹ Some problems with this approach are that it results in less information over large oceans and poorer countries.

The time scale of WPF methods is generally divided into 3 main groups: very-short-term (up to 9 hours), short-term (up to 72 hours) and medium-term (up to 7 days) [Costa et al 2008], while the time step for these models is in the range of seconds to days depending on the application. Very-short-term models used for wind power forecasting consist of statistical methods like Kalman Filters, Auto-Regressive Moving Average (ARMA), Auto-Regressive with Exogenous Input (ARX), Box-Jenkins, etc. Inputs to these models are historical observations of wind speed, wind directions, temperature, etc., and common applications for this forecast horizon include things like intraday market trading. Since these methods are merely based on past production, they are generally not useful for longer horizons.

[Diagram omitted: SCADA data, NWP data and WFC data feed a physical model through downscaling, transformation to hub height and spatial refinements; conversion to power via the WT power curve and Model Output Statistics (MOS) yields the forecast wind power generation.]

Figure 2.2: A figure that presents the general steps when forecasting using a physical model.

Short-term forecasting models include additional data about historical observations, usually obtained from SCADA systems, but more importantly weather forecasts from NWP models. Machine learning methods used in this area include Neural Networks [Kusiak et al 2009], Support Vector Machines [Fugon et al 2008], Nearest Neighbour Search [Jursa et al 2007], Random Forests, etc. Medium-term forecasting models usually incorporate all the methods used for shorter forecasts as well as physical characteristics.


2.1 Neural Networks and Time Series Prediction

One of the most common ANNs is the so-called Multilayer Perceptron (MLP), a network built around some very simple properties of a cortical neuron. This network has been around since the '50s and has matured with a solid mathematical foundation, and it has been applied successfully to many different applications. It is built around the idea of having multiple layers of neurons, each layer feeding information forward to the next layer, i.e. a feed-forward neural network.

The main advantage of using multiple layers instead of just a single one is to overcome limitations pointed out by Minsky and Selfridge [1960], Minsky and Seymour [1969], where a single-layer perceptron is only capable of learning linearly separable patterns and is thus unable to learn all functions. The MLP is not limited in this way and has been shown to be able to represent a wide variety of functions given appropriate parameters [Hornik et al 1989].
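To make the layered structure concrete, here is a minimal feed-forward pass for a one-hidden-layer MLP. The weights are hand-picked for illustration (they are not from the thesis) to approximate XOR, the classic function a single-layer perceptron cannot learn:

```python
import math

def mlp_forward(x, W1, b1, W2, b2):
    """One hidden layer with tanh units and a linear output:
    y = W2 . tanh(W1 . x + b1) + b2."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return sum(w * h for w, h in zip(W2, hidden)) + b2

# Two inputs, two hidden neurons, one output; hand-picked weights approximate XOR.
W1 = [[4.0, 4.0], [-4.0, -4.0]]
b1 = [-2.0, 6.0]
W2 = [0.53, 0.53]
b2 = -0.02
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, round(mlp_forward(x, W1, b1, W2, b2)))  # outputs 0, 1, 1, 0
```

With a single layer (no hidden units) no choice of weights can produce this output pattern, which is exactly the limitation the extra layer removes.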

NuPIC is a platform actively developed and maintained by Numenta. It introduces a collection of ideas and algorithms that are inspired by the structural organization of the neocortex. Two of the core concepts found inside NuPIC are the so-called Hierarchical Temporal Memory and the Cortical Learning Algorithm. HTM was introduced in Hawkins and Blakeslee [2007] and refers to a hierarchy of cortical regions in the brain.

The neocortex is classically divided into 6 different layers²; the current implementation in NuPIC is focused on emulating layers 3–4 (specifically layer 3), with extensions and research code being done to include more layers.

A CLA region consists of a collection of columns, and each column consists of a handful of cells. This structure is based on the minicolumn hypothesis [Buxhoeveden and Casanova 2002]. Each column in the CLA region has its own semantic meaning, and the sparse activity of a handful of active columns will tell us something about the input. The CLA is modelled so that specific cells within each column reflect a temporal context of a pattern, and a single cortical region (a CLA region) is trained with the CLA algorithm.

A typical CLA region consists of around 2K columns containing around 60K neurons in total, while a typical MLP may have less than 100 neurons. Each neuron in an HTM network grows new synapses over time, and it is not uncommon to have around 5K synapses per neuron, meaning we would have around 300M synapses in total. A single region of this size uses around 100 MB of memory, and it will take around 10 msec to do one inference and learning step³.

² These layers should not be confused with the hierarchy of regions.
³ These values are given in a talk at the "Sensor-Motor Integration in the Neocortex" 2013 Hackathon.


The HTM/CLA also differs from the MLP in that it does not directly use scalar weights to represent synaptic connectivity. Synapses inside an HTM have binary weights with a scalar permanence: each synapse will either be connected or disconnected, while the MLP uses scalar weights. The connectedness in HTM/CLA is based on a permanence, which is a value between 0.0 and 1.0. If the permanence is over a certain threshold we have a connected synapse, i.e. we have weights of either 1 or 0. This binary mechanism is there to simulate synapses that are able to form and unform during learning. So essentially we have a weight-change network vs a wiring-change network [Chklovskii et al 2004].
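The permanence mechanism can be sketched in a few lines. The threshold and the increment/decrement constants below are illustrative values, not parameters from NuPIC or the thesis:

```python
THRESHOLD = 0.5  # illustrative permanence threshold

def binary_weights(permanences, threshold=THRESHOLD):
    """HTM-style connectivity: each synapse has a scalar permanence in [0, 1],
    but its effective weight is binary, connected (1) or disconnected (0)."""
    return [1 if p >= threshold else 0 for p in permanences]

def learn(permanences, active_inputs, inc=0.05, dec=0.03):
    """Hebbian-like update: synapses on active inputs are reinforced, the rest
    decay. Synapses can thereby form and unform over time (wiring change)."""
    return [min(1.0, p + inc) if active else max(0.0, p - dec)
            for p, active in zip(permanences, active_inputs)]

perms = [0.48, 0.52, 0.30]
print(binary_weights(perms))   # [0, 1, 0]
perms = learn(perms, [True, False, False])
print(binary_weights(perms))   # [1, 0, 0]: one synapse formed, one unformed
```

Note that a small permanence change near the threshold flips the binary weight, while the same change elsewhere leaves the wiring untouched.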

NuPIC today is naturally geared towards time-series prediction, which makes it an interesting candidate for wind forecasting. NuPIC has been shown to outperform the MLP on financial time series [Gabrielsson et al 2013], and it has been used successfully to balance traffic [Sinkevicius et al 2011]. It has also been used on the NASA Aviation Dataset [Lee and Rajabi 2014], but with less promising results. MLPs have been used successfully for many kinds of time series problems [Azoff 1994, Niska et al 2004, Jain and Kumar 2007].


Chapter 3

Method and Materials

The purpose of this chapter is to provide an explanation of the methods used in this thesis. It will present terms and concepts needed and used in the field of Machine Learning, with the focus on how to create forecasts based on time series. It presents a motivational reason for why these methods are used as well as how they are used. This chapter also contains a description of the implemented artificial neural network, how it is structured and optimized. It provides details about the experimental setup and how these experiments relate to GEFCom, and finally a description of how the datasets are structured and processed.

3.1 Preliminaries

3.1.1 Remarks

The methodology used to evaluate the prediction models presented in this thesis is based on the protocol presented in Madsen et al [2005]¹, which is a complete protocol that can be used to evaluate the performance of short-term WPF and Wind-to-Power (W2P) models. The reason this protocol was chosen for this thesis is that it has been successfully used before as a guideline to evaluate a wide variety of forecast models, such as AWPPS, Prediktor, Previento, Sipreólico and WPPT. The protocol has been used for both on-shore and off-shore wind farms and was developed in the frame of the ANEMOS research project [Kariniotakis et al 2006], which brought together many relevant groups involved in the field. The aim of ANEMOS was to develop accurate and robust models that substantially outperform current state-of-the-art methods, and one of its goals was to establish a common set of performance measures that can be used to compare forecasts across systems and locations.

¹ It should be pointed out that no widely agreed standardization exists.

3.1.2 Definitions

Time series

A time series is a sequence $X$ of observations $x_t$ with a particular time stamp $t$, i.e. $X = [x_1, x_2, \dots, x_t]$, or in short $X_{1:t}$; it is a time-dependent collection of variables.

Forecast horizon

A multi-step-ahead prediction is the assignment of predicting $k$ forecasts $\hat{X}_{t+1:t+k}$ given a collection of historical observations $X_{t-p+1:t}$. In the case of GEFCom we want a forecast for 1–48 steps ahead.

Point or spot forecast

In this paper we model the forecast $\hat{p}_{t+k|t}$ as a so-called point forecast (or spot forecast), i.e. a single value for each forecast (compared to having a probability distribution).

Prediction error

The prediction error $e$ for lead time $t+k$ is defined in equation 3.1 as the difference between the forecast and the actual value, where $t$ denotes the time index, $k$ is the look-ahead time, $p$ is the actual (measured, true) wind power and $\hat{p}$ is the predicted wind power:

$$e_{t+k|t} = p_{t+k} - \hat{p}_{t+k|t} \qquad (3.1)$$

The normalized prediction error $\varepsilon$ is seen in equation 3.2:

$$\varepsilon_{t+k|t} = \frac{1}{p_{\text{inst}}} e_{t+k|t} = \frac{1}{p_{\text{inst}}} \left( p_{t+k} - \hat{p}_{t+k|t} \right) \qquad (3.2)$$

where $p_{\text{inst}}$ is the installed capacity of the wind farm (in kW or MW), which is unknown in the competition.


3.1.3 Reference models

Persistence (also called a naïve or plain predictor), as seen in equation 3.3, is the model most commonly used for benchmarking WPF models. This simple model states that the future wind generation value will be the same as the last measured value:

$$\hat{p}^{\text{persistence}}_{t+k|t} = p_t \qquad (3.3)$$

An alternative would be to use an even simpler model, a climatology prediction (equation 3.4), i.e. predicting the mean, a value that would be approximated from the training set (see section 3.1.5):

$$\hat{p}^{\text{mean}}_{t+k|t} = \bar{p} = \frac{1}{N} \sum_{t=1}^{N} p_t \qquad (3.4)$$

There are other reference models, like the one suggested by Nielsen et al [1998], which has advantages over persistence, but it was never widely adopted and is not used by GEFCom.
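The two reference predictors above (equations 3.3 and 3.4) can be sketched in a few lines of Python; this is an illustrative sketch, not code from the thesis:

```python
def persistence_forecast(history, k):
    """Persistence (eq. 3.3): every look-ahead step repeats the last measured value."""
    last = history[-1]
    return [last] * k

def climatology_forecast(history, k):
    """Climatology (eq. 3.4): every look-ahead step is the mean of the training data."""
    mean = sum(history) / len(history)
    return [mean] * k

production = [0.2, 0.5, 0.4, 0.7]           # normalized power observations
print(persistence_forecast(production, 3))  # [0.7, 0.7, 0.7]
print(climatology_forecast(production, 2))  # mean of the history, repeated
```

Persistence is hard to beat for the first few hours, while climatology becomes the stronger baseline for long horizons, which is why both are useful yardsticks.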

3.1.4 Error metrics

In order to understand the reason why specific models perform well, it is usually a good idea to evaluate them against a wide variety of different criteria, as is emphasised by Kariniotakis [1997]. The following sections describe the error measures used; in these sections $N$ denotes the size of the test set.

Forecast Bias

The Normalized Forecast Bias (NBIAS) describes the systematic error and is defined in equation 3.5; it is estimated by calculating the average error for each step ahead. It gives an indication of the direction of the error:

$$\text{NBIAS}_k = \frac{1}{N} \sum_{t=1}^{N} \varepsilon_{t+k|t} \qquad (3.5)$$

This bias is sometimes referred to by some authors as the Mean Forecast Error (MFE). A perfect MFE does not mean the forecast is perfect and contains no error, as positive and negative errors cancel each other out, but it gives an indication that the forecasts are on target.


Mean Absolute Error

The Normalized Mean Absolute Error (NMAE) is an error quantity that looks at the average of the absolute error of the prediction and is defined in equation 3.6:

$$\text{NMAE}_k = \frac{1}{N} \sum_{t=1}^{N} \left| \varepsilon_{t+k|t} \right| \qquad (3.6)$$

Another common name used instead of Mean Absolute Error (MAE) is Mean Absolute Deviation (MAD). This value shows the magnitude of the overall error that has occurred due to forecasting and thus should be as small as possible. This error is also scale-dependent and will be affected by data transformations and the scale of measurements.

Mean Squared Error

The Normalized Mean Squared Error (NMSE) is an error quantity that looks at the average of the squared errors $\varepsilon^2_{t+k|t}$ and is built using the Normalized Sum of Squared Errors (NSSE) defined in equation 3.7:

$$\text{NSSE}_k = \sum_{t=1}^{N} \varepsilon^2_{t+k|t} \qquad (3.7)$$

NMSE is defined in equation 3.8:

$$\text{NMSE}_k = \frac{1}{N} \text{NSSE}_k = \frac{1}{N} \sum_{t=1}^{N} \varepsilon^2_{t+k|t} \qquad (3.8)$$

With this error, positive and negative errors do not cancel each other out, and large individual errors will be penalized more harshly.

Root Mean Squared Error

The Normalized Root Mean Squared Error (NRMSE) is an error quantity that takes the square root of the NMSE; this error is defined in equation 3.9:

$$\text{NRMSE}_k = \text{NMSE}_k^{1/2} = \left( \frac{1}{N} \sum_{t=1}^{N} \varepsilon^2_{t+k|t} \right)^{1/2} \qquad (3.9)$$

NRMSE is the main metric that is used in GEFCom, and it shares the same properties as NMSE.
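For concreteness, the metrics above can be computed from a list of normalized errors for one lead time $k$; this is an illustrative sketch, not code from the thesis:

```python
import math

def nbias(errors):   # eq. 3.5
    return sum(errors) / len(errors)

def nmae(errors):    # eq. 3.6
    return sum(abs(e) for e in errors) / len(errors)

def nmse(errors):    # eq. 3.8
    return sum(e * e for e in errors) / len(errors)

def nrmse(errors):   # eq. 3.9
    return math.sqrt(nmse(errors))

errors = [0.1, -0.1, 0.2, -0.2]  # normalized errors for one lead time
print(nbias(errors))             # 0.0: positive and negative errors cancel out
print(nmae(errors))              # 0.15: the cancellation disappears
print(nrmse(errors))             # weighs the larger errors more heavily than NMAE
```

The example makes the point from the text explicit: a perfect NBIAS (0.0 here) says nothing about the magnitude of the errors, which is what NMAE and NRMSE capture.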


3.1.5 Model selection

In regression and classification, one of the main issues we are faced with is "How do we create a good model?" One way to define this "goodness" would be to look at the model's generalization ability, i.e. its performance on unseen data.

To make sure the model we are building will generalize to new, unseen data, it is important how we select and build the model in the first place; what we have control over is the data we have at hand and how we use that data to build our model.

Training, testing and validating

In the case of supervised learning we have access to a target series and its associated features, i.e. the production series (our target) and wind speed and wind direction as features.

Common practice within Machine Learning, which is adhered to in this thesis, is to split the whole dataset into 3 smaller subsets, then use two of these subsets to find a good model (i.e. the training set and validation set) while the remaining one (the test set) is used for evaluation, i.e. the estimate of how accurate the model we have created would be on "unseen data". With highly flexible models like an artificial neural network we need to be careful not to overfit the data, which is one of the reasons why we have the validation set: we don't want to create a model that doesn't generalize well because the model is fitted to every minor variation, i.e. it has captured a lot of noise.
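For time series the split must be chronological so that the test set lies strictly in the "future" of the training data. A minimal sketch (the 60/20/20 fractions are illustrative, not the thesis's split):

```python
def chronological_split(series, train_frac=0.6, val_frac=0.2):
    """Split a time series into train/validation/test sets WITHOUT shuffling,
    so the test set stays strictly in the 'future' of the training data."""
    n = len(series)
    i = int(n * train_frac)
    j = i + int(n * val_frac)
    return series[:i], series[i:j], series[j:]

data = list(range(10))               # stand-in for the production series
train, val, test = chronological_split(data)
print(train, val, test)              # [0, 1, 2, 3, 4, 5] [6, 7] [8, 9]
```

Shuffling before splitting, as is common for i.i.d. data, would leak future observations into the training set and give an overly optimistic estimate of forecast accuracy.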

3.1.6 Evaluation

The improvement with respect to a considered reference model ref is defined in equation 3.10:

$$I^{\text{ref}}_{\text{EC},k} = 100 \cdot \frac{\text{EC}^{\text{ref}}_k - \text{EC}_k}{\text{EC}^{\text{ref}}_k} \ (\%) \qquad (3.10)$$

where the Evaluation Criterion (EC) is any error measurement, such as NMSE, NMAE, etc.
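Equation 3.10 is a one-liner in practice; for example, a model with an NRMSE of 0.30 against a reference NRMSE of 0.40 improves on the reference by 25% (illustrative numbers, not thesis results):

```python
def improvement(ec_ref, ec):
    """Improvement over a reference model (eq. 3.10), in percent.
    Positive means the model beats the reference on the chosen criterion."""
    return 100.0 * (ec_ref - ec) / ec_ref

print(improvement(0.40, 0.30))  # approximately 25.0 (%)
print(improvement(0.40, 0.50))  # negative: worse than the reference
```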


Testing periods

Date         Time   Forecast
2011-01-01   01:00  1–48 hours
2011-01-04   13:00  1–48 hours
2011-01-08   01:00  1–48 hours
2011-01-11   13:00  1–48 hours
…            …      …
2012-06-23   01:00  1–48 hours
2012-06-26   13:00  1–48 hours

Table 3.1: The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods with missing data, power observations are available for updating the models.

3.2 Experiments

Training and testing are structured based on the structure of the GEFCom. The dataset spans from midnight of 1 July 2009 to noon of 26 June 2012. The period from 1 July 2009 to 1 January 2011 at 01:00 is used for training and validating, while the rest is used for testing. In the testing range a number of 48-hour periods are defined (see table 3.1); in between these testing periods exists additional training data, which enables the models to be updated between forecasts.

The testing periods repeat every 7 days until the end of the dataset, and only meteorological forecasts that were relevant for the periods with missing power data are given; this was done in order to be consistent.

Meteorological forecasts in the GEFCom dataset consist of zonal u and meridional v components collected for surface winds at 10 m above ground level. They were extracted from the archive of the European Centre for Medium-range Weather Forecasts (ECMWF), a service that issues high-resolution deterministic forecasts twice a day, at 00 Coordinated Universal Time (UTC) and 12 UTC; each of these forecast periods consists of data for 1–48 hours ahead. In order to match the hourly resolution of the power measurements, forecasts were interpolated using cubic splines to have an hourly resolution. A summary of the features found in the dataset is found in table 3.2.
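The resampling step can be illustrated with a pure-Python natural cubic spline (the thesis presumably used a library routine such as scipy.interpolate.CubicSpline; this sketch just shows the idea of going from 12-hourly NWP values to an hourly series):

```python
from bisect import bisect_right

def natural_cubic_spline(xs, ys, x_new):
    """Interpolate (xs, ys) at the points x_new with a natural cubic spline
    (second derivative zero at both ends). xs must be strictly increasing."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    # Tridiagonal system for the interior second derivatives M[1..n-1];
    # natural boundary conditions give M[0] = M[n] = 0.
    a = [0.0] * (n + 1); b = [1.0] * (n + 1)
    c = [0.0] * (n + 1); d = [0.0] * (n + 1)
    for i in range(1, n):
        a[i] = h[i - 1]
        b[i] = 2.0 * (h[i - 1] + h[i])
        c[i] = h[i]
        d[i] = 6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
    for i in range(1, n + 1):        # Thomas algorithm: forward elimination
        m = a[i] / b[i - 1]
        b[i] -= m * c[i - 1]
        d[i] -= m * d[i - 1]
    M = [0.0] * (n + 1)
    M[n] = d[n] / b[n]
    for i in range(n - 1, -1, -1):   # back substitution
        M[i] = (d[i] - c[i] * M[i + 1]) / b[i]
    out = []
    for x in x_new:
        i = max(0, min(bisect_right(xs, x) - 1, n - 1))
        t = x - xs[i]
        out.append(ys[i]
                   + t * ((ys[i + 1] - ys[i]) / h[i] - h[i] * (2.0 * M[i] + M[i + 1]) / 6.0)
                   + t * t * M[i] / 2.0
                   + t ** 3 * (M[i + 1] - M[i]) / (6.0 * h[i]))
    return out

# NWP values issued 12-hourly, resampled to the hourly resolution of the power data:
wind_speed_12h = [5.0, 8.0, 6.5]   # values at hours 0, 12, 24
hourly = natural_cubic_spline([0.0, 12.0, 24.0], wind_speed_12h,
                              [float(h) for h in range(25)])
print(hourly[0], hourly[12])       # 5.0 8.0: the spline passes through the knots
```

Unlike linear interpolation, the cubic spline keeps the first derivative continuous, so the hourly series has no artificial kinks at the NWP issue times.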


No  Category  Parameter             Alias  Type

1   Date      Date                  date   String
2   Date      Year                  year   Integer
3   Date      Month                 month  Integer
4   Date      Day                   day    Integer
5   Date      Hour                  hours  Integer
6   Date      Week                  week   Integer
7   Forecast  Wind Speed            ws     Real
8   Forecast  Wind Direction (deg)  wd     Real
9   Forecast  Wind U                u      Real
10  Forecast  Wind V                v      Real
11  Forecast  Issued                hp     Integer
12  SCADA     Production            wp     Real

Table 3.2: Preprocessed features from the dataset. The Forecast category represents the forecasts given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available gives the features we will use in training and testing.


The database for the meteorological forecasts given by the GEFCom dataset contains sections of missing data, each corresponding to the dates in which the missing power information exists; these sections were filled out with the previous best forecast that was available, in a pre-processing step. For example, if we do not have data for the forecast issued at 2011-01-01 12:00, we use data from 2011-01-01 00:00, as a 48-hours-ahead forecast is available for this date. If it were the case that the previous section also contained missing data, we would go back an additional section and use those forecasts. If no forecast is available 48 hours back, we can use the best known forecast and extend it in the same fashion as the persistence model.
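The gap-filling rule above can be sketched as follows (an illustrative sketch; the data layout, with forecasts keyed by issue time in hours, is an assumption for the example):

```python
def fill_missing_forecasts(forecasts, issue_times, step_hours=12):
    """If the forecast issued at time T is missing, reuse the most recent
    earlier issue that exists; its 48-hour horizon still covers T. Walk back
    one issue section (12 h) at a time until a forecast is found."""
    filled = {}
    for t in issue_times:
        lookback = t
        while lookback >= 0 and forecasts.get(lookback) is None:
            lookback -= step_hours   # go back an additional section
        filled[t] = forecasts.get(lookback) if lookback >= 0 else None
    return filled

# Issues every 12 h; the 12:00 issue is missing and inherits the 00:00 one.
forecasts = {0: "forecast@00", 12: None, 24: "forecast@24"}
print(fill_missing_forecasts(forecasts, [0, 12, 24]))
# {0: 'forecast@00', 12: 'forecast@00', 24: 'forecast@24'}
```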

Any predictions made by these models should fall within a certain range, so any forecast outside this range will be clamped. This is the main post-processing step that is being done, i.e. there is an upper limit on what can be produced.

3.3 Holdback Input Randomization

The Holdback Input Randomization (HIPR) method, described in Kemp et al. [2007], can be used to investigate the importance of the input parameters. It works by sequentially feeding each data point in the test set to the neural network while replacing the values of one input parameter at a time. The replacement values are uniformly distributed random values in the range the neural network was originally trained on, i.e. (-1, 1). A NRMSE score is calculated for each replacement; the result is that we get information about the relevance of each input.
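A minimal sketch of HIPR under these assumptions: `model` is any callable mapping an input matrix to predictions, and the NRMSE normalisation by the target range is an assumption, not necessarily the thesis's convention.

```python
import numpy as np

def hipr(model, X, y, rng=np.random.default_rng(0)):
    """Holdback Input Randomization: replace one input column at a time with
    uniform noise in the training range (-1, 1) and record the NRMSE."""
    def nrmse(pred, target):
        return np.sqrt(np.mean((pred - target) ** 2)) / (target.max() - target.min())
    scores = {}
    for j in range(X.shape[1]):
        Xr = X.copy()
        Xr[:, j] = rng.uniform(-1.0, 1.0, size=len(X))
        scores[j] = nrmse(model(Xr), y)
    return scores  # a high score means the feature mattered
```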

3.4 Optimization methods

The two main optimization algorithms used in this thesis are the Particle Swarm Optimization (PSO) algorithm, presented by Eberhart and Kennedy [1995], and the Levenberg-Marquardt (LM)2 algorithm, independently developed by Kenneth Levenberg and Donald Marquardt [Marquardt, 1963; Levenberg, 1944]. The PSO algorithm is a population-based stochastic algorithm similar to a Genetic Algorithm (GA) but based on social-psychological principles instead of evolution. It

2 The LM algorithm was used in the beginning of the project, but the reported results ended up using the PSO algorithm. The LM algorithm is included in this section because speed optimization was measured using it.


can be summarized by the following steps. Each network was trained multiple times in order to avoid local minima.

• Step 1: Initialize particles with random velocities and accelerations.

• Step 2: Determine which particle is closest to the goal.

• Step 3: Adjust accelerations toward that particle.

• Step 4: Update particle positions based on their velocities, and update velocities based on accelerations.

• Step 5: Go to step 2.
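The steps above can be sketched as a minimal global-best PSO. The inertia and acceleration constants here are illustrative defaults, not the exact settings used in the thesis.

```python
import numpy as np

def pso(f, dim, n_particles=30, iters=200, seed=0):
    """Minimal global-best PSO minimizing f over R^dim (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5, 5, (n_particles, dim))        # positions
    v = np.zeros_like(x)                              # velocities
    pbest, pbest_f = x.copy(), np.array([f(p) for p in x])
    gbest = pbest[pbest_f.argmin()].copy()            # particle closest to goal
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        # accelerate toward personal and global bests (steps 3-4)
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
        x = x + v
        fx = np.array([f(p) for p in x])
        better = fx < pbest_f
        pbest[better], pbest_f[better] = x[better], fx[better]
        gbest = pbest[pbest_f.argmin()].copy()        # step 2 again
    return gbest, pbest_f.min()
```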

The LM algorithm is a combination of the Error Back-Propagation (EBP) method [Rumelhart et al., 1988] and the Gauss–Newton method. It has the speed advantage of Gauss–Newton and the stability of EBP [Yu and Wilamowski, 2011]. A detailed treatment of LM can be found in Moré [1978], and of PSO in Poli et al. [2007].

3.5 Neural Networks

3.5.1 Multilayer Perceptron

The perceptron is built around a nonlinear model of a neuron, the McCulloch–Pitts model [McCulloch and Pitts, 1943]. It basically consists of a 2-step process: the cell body contains a summation function of the weighted sum of all inputs, including a bias, as described by equation 3.11. The sum s is then passed through an activation function (see section 3.5.1), which mimics the activation, or firing, of the neuron.

s = \sum_{i=1}^{M} w_i x_i + x_0    (3.11)

w_i is the weight of the "synapse" of input channel i and is the parameter we want to adjust, x_i is the input value, and x_0 is the bias. M denotes the number of inputs we have.
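Equation 3.11 followed by an activation is a one-line forward pass; the sketch below uses tanh as the default activation.

```python
import numpy as np

def perceptron(x, w, bias, f=np.tanh):
    """Weighted sum (eq. 3.11) followed by the activation function f."""
    s = np.dot(w, x) + bias
    return f(s)
```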


Figure 3.1: The perceptron. Input signals and a bias signal are weighted, summed (Σ), and passed through the activation function f(s) to produce the output signal.

The structure of the MLP consists of many perceptrons, as shown in figure 3.2. The input signal flows from the input layer at the bottom to the output layer at the top. A bias signal, seen on the left side of the diagram, is set to a fixed number.

MLPs are fully connected networks, meaning that every neuron in a layer is connected to all neurons in the previous layer. Each connection in the network has a weight w_ij associated with it. These weights are initialized before any training is done, by randomly assigning very small values to each weight in the graph. The architecture used in this study has only one output variable, i.e. the function we approximate produces forecasts of the power generation given a certain input.


Figure 3.2: Architectural graph of the neural network, which produces a single output value. The input layer (hours, u, v, week, ws, ws-1, ws-2, ws+1, ws+2) feeds a tanh hidden layer, followed by a linear output layer; a bias signal feeds into each layer. The network consists of a collection of hidden neurons in each of H hidden layers as well as M input connections, and each edge in this graph has a weight w_ij associated with it.

The performance of neural networks is generally improved if the data is normalised, because using the original data directly can cause convergence problems. Normalization is done using the mapminmax function seen in equation 3.12.

y = (y_{max} - y_{min}) \cdot \frac{x - x_{min}}{x_{max} - x_{min}} + y_{min}    (3.12)

y_max is the maximum of the target range, which in this case is 1, and y_min is -1. x is the value to be scaled, x_max is the maximum of the values to be scaled, and x_min their minimum.
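Equation 3.12 as a function — a sketch equivalent to the MATLAB-style mapminmax scaling described above.

```python
import numpy as np

def mapminmax(x, ymin=-1.0, ymax=1.0):
    """Scale x linearly into [ymin, ymax] (eq. 3.12)."""
    x = np.asarray(x, dtype=float)
    return (ymax - ymin) * (x - x.min()) / (x.max() - x.min()) + ymin
```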

The forecasting model consists of 7 different networks, one for each wind farm. These networks are trained on the first section of the dataset, before the test period. A random 60/20/20 split is created: 60% of the available data is used for training, 20% is used to validate the network, and 20% is set aside as a hold-out set for the hyperparameters. The input features3 fed into the models are

3 See table 3.2.


ws, u, v, hours, ws+1, ws+2, ws-1, ws-2, where ws+x and ws-x denote a time shift of x hours. We can use ws+x as input to the model because ws is a forecast in itself.

Activation Functions

The computation done for each neuron in the multilayer perceptron requires knowledge of the derivative of the activation function; in other words, the activation function we pick needs to be differentiable. In this thesis we use the following two activation functions: the hyperbolic tangent function, seen in equation 3.13, and

f(s) = \tanh(s) = \frac{e^s - e^{-s}}{e^s + e^{-s}}    (3.13)

the linear transfer function, seen in equation 3.14.

f(s) = \begin{cases} +1 & \text{if } s \ge 1 \\ s & \text{if } -1 < s < 1 \\ -1 & \text{if } s \le -1 \end{cases}    (3.14)

Hyperparameter optimization

In order to obtain the hyperparameters for each wind farm's model, a random hyperparameter search was performed for all models. A hold-out validation set was used to pick the best hyperparameters. Random search has been shown to work better than grid search when not all parameters are equally important [Bergstra and Bengio, 2012].
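A random search loop of this kind can be sketched as follows; `train_eval` is a hypothetical callback that trains a model with a given configuration and returns its hold-out score (e.g. validation NRMSE).

```python
import numpy as np

def random_search(train_eval, space, n_trials=60, seed=0):
    """Random hyperparameter search: sample each parameter independently and
    keep the configuration with the best hold-out score (lower is better)."""
    rng = np.random.default_rng(seed)
    best, best_score = None, np.inf
    for _ in range(n_trials):
        cfg = {name: sampler(rng) for name, sampler in space.items()}
        score = train_eval(cfg)
        if score < best_score:
            best, best_score = cfg, score
    return best, best_score
```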

3.5.2 Numenta Platform for Intelligent Computing

This section describes the key principles introduced with NuPIC. It explains in general terms the theory behind the Online Prediction Framework (OPF)4, which uses the CLA and HTM algorithms; the OPF works as an API to create predictive models. The OPF consists of 5 major types of components: Encoders, Spatial Pooler (SP), Temporal Memory (TM), Temporal Pooler (TP) and Classifiers. Together these components construct a single CLA region5, and figure 3.3 demonstrates the information flow through a single region.

4 The OPF is used with Numenta's commercial product GROK.
5 Currently, models created with the OPF do not use a TP, nor does this client allow creation of a hierarchy of regions, i.e. what we would call a HTM; it is only possible to create one region.


Another central concept in NuPIC is the Sparse Distributed Representation (SDR), which refers to the activation of a small percentage of the neurons at any given time. This neuronal activity is represented as an n-dimensional sparse binary vector x = [b_0, ..., b_n] with around 2% of the cells active. The outputs from the spatial pooler and temporal memory are both SDRs, while the output from the encoder does not enforce this and is just a normal binary vector. A general overview of the properties of SDRs is given by Ahmad and Hawkins [2015].

Encoder → Spatial Pooler → Temporal Memory → Classifier

Figure 3.3: Information flow of a single-region predictive model created with the OPF.

The overlap between two different SDRs is defined by

o(x, y) = x \cdot y    (3.15)

A match between two SDRs is defined by

m(x, y) = o(x, y) \ge \theta    (3.16)

where \theta is set such that \theta \le \|x\|_1 and \theta \le \|y\|_1. An interesting property of SDRs, used multiple times inside the temporal memory and especially for predictions, is that a fixed-size set of SDRs can be reliably stored by taking their union as a single pattern; the boolean OR-operator is used to create the new vector from a set of SDRs. The downside of storing patterns this way is that the more patterns we store, the bigger the probability of false positives.

Value  Scalar Encoding
1      11111000000000
2      01111100000000
10     00000000011111

Table 3.3: Examples of scalar values encoded with a ScalarEncoder, where n = 14, r = 5, ψ = 1.
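Equations 3.15-3.16 and the union storage scheme can be sketched directly on binary vectors (an illustration; NuPIC's internal representations differ):

```python
import numpy as np

def overlap(x, y):
    """Eq. 3.15: number of shared active bits."""
    return int(np.dot(x, y))

def match(x, y, theta):
    """Eq. 3.16: the SDRs match if their overlap reaches theta."""
    return overlap(x, y) >= theta

def union(sdrs):
    """Store a set of SDRs as a single pattern via boolean OR."""
    return np.bitwise_or.reduce(np.array(sdrs))
```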

Encoders

NuPIC contains many different encoders6. The job of an encoder is to convert raw input into a more suitable representation (i.e. a binary vector). The raw input is fed into the model using a dictionary data structure.

Useful encoders include scalar and categorical encoders, while more exotic encoders include one for Global Positioning System (GPS) coordinates, which can be used to extract information about anomalous movements. The entries of the dictionary representation of the raw input are each encoded separately and concatenated using a multi-encoder.

One property of the spatial pooler is that overlapping input patterns are mapped to the same SDR. This means that we want the encoder to encode input so that similar inputs share bits. The ScalarEncoder fulfils this property by the process illustrated in table 3.3.

We calculate the range of values to encode using equation 3.17, where v_min represents the minimum value of the input signal and v_max its upper bound.

v_{range} = v_{max} - v_{min}    (3.17)

There are three mutually exclusive parameters that determine the overall size of the output from a scalar encoder: (n, r, ψ). n directly represents the total number

6 A full list of all encoders can be found in the API documentation for NuPIC.


of bits in the output, and it must be greater than or equal to w, which represents the number of bits that are set to encode a single value, i.e. the "width" of the output signal7. r and ψ are specified with respect to the input, while w is specified with respect to the output. Two inputs separated by more than the radius have non-overlapping representations. Two inputs separated by less than the radius will in general overlap in at least some of their bits. Two inputs separated by greater than or equal to the resolution are guaranteed to have different representations. ψ can be calculated using equation 3.18.

\psi = \frac{r}{w}    (3.18)

Depending on whether we want periodic behaviour or not, n is calculated a bit differently; see the scalar encoder for implementation details8.
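A sketch of a non-periodic scalar encoding that reproduces table 3.3 (simplified; the real ScalarEncoder in NuPIC also handles periodic values and clipping):

```python
import numpy as np

def encode_scalar(v, vmin=1, vmax=10, n=14, w=5):
    """Encode v as a contiguous run of w active bits out of n, whose start
    position tracks v between vmin and vmax (non-periodic sketch)."""
    n_buckets = n - w + 1                       # distinct start positions
    i = int(round((v - vmin) / (vmax - vmin) * (n_buckets - 1)))
    out = np.zeros(n, dtype=int)
    out[i:i + w] = 1
    return out
```

Similar values share active bits (1 and 2 overlap in four bits), which is exactly the property the spatial pooler relies on.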

Spatial Pooler

The spatial pooler receives a binary vector as its input from an encoder and outputs an SDR. The structure of the spatial pooler consists of an input space and a set of columns, and the output SDR represents which of the columns in a region are active. Each column has synapses connected to the input space: around 50% of the input bits, randomly chosen, are potentially connected synapses, the so-called "potential pool". Each synapse connects and disconnects with the input space during learning.

The general information flow of the spatial pooler consists of the following steps. Input stimuli from lower regions lead to activation of the input space, to which each mini-column is synaptically connected; this activation, the so-called "overlap score", is set for each column. The score is calculated as the total sum of the input neurons that try to influence that column, weighted by a "boosting factor" that increases the chance for certain columns to win more easily. A final top percentage (usually around 2% activity) of the columns with the highest influence (biggest overlap score) is chosen; columns also inhibit nearby columns, and the result of this process is a binary vector with few active bits, an SDR. This is illustrated in equation 3.19, where b can be either 0 or 1 and s is a value representing the score.

7 w must be odd to avoid centering problems.
8 https://github.com/numenta/nupic


\underbrace{[b_1\ b_2\ b_3\ \dots\ b_n]}_{\text{Input vector}} \cdot \underbrace{\begin{bmatrix} b_{11} & b_{12} & b_{13} & \dots & b_{1n} \\ b_{21} & b_{22} & b_{23} & \dots & b_{2n} \\ \vdots & & & & \vdots \\ b_{d1} & b_{d2} & b_{d3} & \dots & b_{dn} \end{bmatrix}}_{\text{Connected synapses for each column}} = \underbrace{[s_1\ s_2\ \dots\ s_n]}_{\text{Overlap score}} \xrightarrow{\text{Inhibition}} \underbrace{[b_1\ b_2\ b_3\ \dots\ b_n]}_{\text{Output SDR}}    (3.19)

Learning in this structure is done by adjusting the permanences of the proximal dendrites to better match the input: for each winning column, increase the permanence of the synapses that correctly matched the input and decrease the permanence of the rest. We also increase the boosting factor of losing columns to give them a bigger chance of winning next time.
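One spatial pooler timestep (equation 3.19) can be sketched with a dense synapse matrix and global inhibition. This is a simplification: real NuPIC uses permanence thresholds, local inhibition options, and learning updates.

```python
import numpy as np

def spatial_pooler_step(inp, synapses, boost, sparsity=0.02):
    """Boosted overlap scores per column, then global inhibition keeping only
    the top `sparsity` fraction of columns active (the output SDR)."""
    scores = boost * (synapses @ inp)          # overlap score per column
    k = max(1, int(len(scores) * sparsity))    # number of surviving columns
    winners = np.argsort(scores)[-k:]          # highest-scoring columns
    sdr = np.zeros(len(scores), dtype=int)
    sdr[winners] = 1
    return sdr
```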

Temporal Memory

The temporal memory receives an SDR from the spatial pooler, representing the active columns; the output of the temporal memory is the activity of the whole CLA region. The general idea of the TM9 is that the cells in each mini-column provide temporal context to the pattern identified by the spatial pooler. The TM's job is to form transitions between different SDRs, and it achieves this by allowing active cells to form connections to previously active cells, so that in a future setting each cell is able to predict its own activity.

Each cell in a region has several distal segments, ideally one for each pattern the cell has transitioned from; in practice these segments are each connected to a couple of patterns. A cell enters a predictive state if there is enough activity on a segment, and every segment consists of a collection of connections, synapses that have been formed to a subset of previously active cells (typically around 10-15 cells).

The temporal memory receives a sparse binary vector from the spatial pooler, and the general information flow for this part of the CLA goes through two phases. The first phase has the following steps: 1) for each active column, we check whether there is any cell in a predictive

9 There are a lot of details in how the Temporal Memory is implemented; pseudo-code and more details can be found in [Numenta, 2011]. The nupic git repository is the best source for the finer details: https://github.com/numenta/nupic


state; if there is, then 2) activate that particular cell. If no cell was found in a predictive state, then 3) activate all cells in that particular column, a process called bursting.

Bursting is analogous to saying: we do not know which cell to activate, because we have not seen this instance of the sequence of patterns before and are unable to put the column into the correct temporal context, so we activate all cells to reflect this uncertainty. Bursting does not occur when the temporal context has been predicted by the CLA. Equation 3.20 shows the first phase.

\underbrace{[b_1\ b_2\ b_3\ \dots\ b_n]}_{\text{SP SDR}} \;\underbrace{\begin{bmatrix} b_{11} & b_{12} & b_{13} & \dots & b_{1n} \\ b_{21} & b_{22} & b_{23} & \dots & b_{2n} \\ \vdots & & & & \vdots \\ b_{d1} & b_{d2} & b_{d3} & \dots & b_{dn} \end{bmatrix}}_{\text{Predictive State}} = \underbrace{\begin{bmatrix} b_{11} & b_{12} & b_{13} & \dots & b_{1n} \\ b_{21} & b_{22} & b_{23} & \dots & b_{2n} \\ \vdots & & & & \vdots \\ b_{d1} & b_{d2} & b_{d3} & \dots & b_{dn} \end{bmatrix}}_{\text{Active State}}    (3.20)

The second phase of the algorithm figures out which cells should be put into a predictive state for the next time step. This is achieved by checking every distal segment on every cell for activity above a certain threshold, i.e. checking for active segments; one active segment is enough to put a cell into a predictive state. The output from the temporal memory is a vector representing the state of all cells in that region. Equation 3.21 shows phase 2.

\underbrace{[b_1\ b_2\ b_3\ \dots\ b_n]}_{\text{Active State}} \cdot \underbrace{\begin{bmatrix} b_{11} & b_{12} & b_{13} & \dots & b_{1n} \\ b_{21} & b_{22} & b_{23} & \dots & b_{2n} \\ \vdots & & & & \vdots \\ b_{d1} & b_{d2} & b_{d3} & \dots & b_{dn} \end{bmatrix}_X}_{\text{Segment } X} = \underbrace{[b_1\ b_2\ b_3\ \dots\ b_n]_X}_{\text{Segment Activation } X} > \tau \rightarrow \underbrace{[s_1\ s_2\ s_3\ \dots\ s_n]}_{\text{Predictive State}}    (3.21)

If learning is turned on, the permanences of the synapses connected to active distal segments are updated (in the same way as in the spatial pooler). These changes are marked as temporary until we are sure that the cell is in fact able to correctly predict something with them, after which the changes either become permanent or are removed. Temporarily marked cells are updated whenever a cell goes from inactive to active from feed-forward input (make the change permanent, as we correctly predicted the feed-forward activation); if a cell instead went from active to inactive, undo the change.
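The phase-2 check of equation 3.21 can be sketched as a matrix operation. This is a simplified dense representation; in NuPIC, segments are sparse structures grown per cell.

```python
import numpy as np

def predictive_state(active, segments, theta):
    """A cell enters the predictive state when any of its distal segments has
    at least `theta` synapses onto currently active cells (eq. 3.21).
    segments: 0/1 array of shape (n_cells, n_segments, n_cells)."""
    activity = segments @ active                 # per-segment activation count
    return (activity >= theta).any(axis=1).astype(int)
```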


NuPIC Classifiers

There are many different classifiers included with NuPIC, such as a k-Nearest Neighbour (kNN) classifier and a Support Vector Machine (SVM); the OPF in particular uses a custom-built classifier called the "CLAClassifier". The purpose of this classifier is to decode predictions made by the CLA. Classification with the CLAClassifier is performed in different ways depending on how the input has been encoded.

Scalar values are decoded using a process where each cell in a CLA region is paired with two histograms. One histogram keeps track of the frequency of encountered patterns associated with each cell; the other keeps track of a moving average for each bucket. This process is illustrated in figure 3.4.

Figure 3.4: The CLAClassifier. Each cell of the SDR columns (column 1, column 2, ..., column N) is paired with two histograms over the buckets between the min and max values: one tracks the likelihood of each bucket, the other a moving average.

Training in NuPIC

Training of the NuPIC models is done using online learning algorithms. The schema seen in figure 3.5 is used to train and test these models; since there are 7 different wind farms, this schema is repeated for each wind farm.

This schema is slightly adjusted from the traditional way the OPF handles multi-step predictions. The default way is to use 48 different models, one for


every step ahead in the CLAClassifier, which would give us 48 · 7 = 336 models in total (for all the wind farms). Not only does this change match Expektra's approach, it also reduces the memory requirements of the CLAClassifier.

Figure 3.5: Training an OPF model. Hyperparameter setup is done on a pre-training data chunk, either by PSO swarming or by manual setup. In the training phase, the training data is streamed through the OPF model with online learning activated; in the testing phase, the testing data is fed with online learning deactivated, producing the multistep predictions.

Input and hyperparameter selection

Finding hyperparameters for an OPF model was partly done using the custom built-in PSO algorithm and partly configured manually, mainly by ensuring that the PSO algorithm did not remove encoders. A single CLA region contains a lot of hyperparameters; these have been included, with corresponding descriptions, in appendix A. The inputs to the model are date, ws, wp, u, v.


Chapter 4

Result

This chapter contains the primary results obtained in this study. Figures 4.4-4.10 show different error measurements on the test set for each wind farm. We see that Expektra's ANN model performs well over all wind farms, while the NuPIC model performs worse than Expektra but in general still better than our reference model. NuPIC is off target, with a bias error on most wind farms, and the graphs also indicate that NuPIC is unable to pick up some trends in the cumulated ε² graph. Expektra's model, on the other hand, shows no clear problems in cumulated ε² and is on target on all wind farms; appendix C has been included to reflect this at different lead times.
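For reference, the error measures used in these figures can be computed as follows. Normalising by installed capacity is a common WPF convention but an assumption here; the thesis may normalise differently.

```python
import numpy as np

def wpf_errors(pred, obs, capacity=1.0):
    """NBIAS, NMAE, NRMSE of a forecast series, normalised by capacity,
    plus the cumulated squared error eps^2 over time."""
    e = (np.asarray(pred) - np.asarray(obs)) / capacity
    return {"NBIAS": e.mean(),
            "NMAE": np.abs(e).mean(),
            "NRMSE": np.sqrt((e ** 2).mean()),
            "cum_eps2": np.cumsum(e ** 2)}
```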

Figure 4.1: Left diagram: cumulative probability of wind speed. Right diagram: scatter diagram of the power curve, normalized power output vs. wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


Figure 4.2: Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).

Figure 4.3: Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


Figure 4.4: Different error measurements for WF 1: NBIAS, NRMSE and NMAE vs. look-ahead time k (in hours), and cumulated ε² vs. time (in hours), for Expektra, NuPIC and Persistence.


Figure 4.5: Different error measurements for WF 2: NBIAS, NRMSE and NMAE vs. look-ahead time k (in hours), and cumulated ε² vs. time (in hours), for Expektra, NuPIC and Persistence.


Figure 4.6: Different error measurements for WF 3: NBIAS, NRMSE and NMAE vs. look-ahead time k (in hours), and cumulated ε² vs. time (in hours), for Expektra, NuPIC and Persistence.


Figure 4.7: Different error measurements for WF 4: NBIAS, NRMSE and NMAE vs. look-ahead time k (in hours), and cumulated ε² vs. time (in hours), for Expektra, NuPIC and Persistence.


Figure 4.8: Different error measurements for WF 5: NBIAS, NRMSE and NMAE vs. look-ahead time k (in hours), and cumulated ε² vs. time (in hours), for Expektra, NuPIC and Persistence.


Figure 4.9: Different error measurements for WF 6: NBIAS, NRMSE and NMAE vs. look-ahead time k (in hours), and cumulated ε² vs. time (in hours), for Expektra, NuPIC and Persistence.


Figure 4.10: Different error measurements for WF 7: NBIAS, NRMSE and NMAE vs. look-ahead time k (in hours), and cumulated ε² vs. time (in hours), for Expektra, NuPIC and Persistence.


                      Wind Farm
User          1      2      3      4      5      6      7      All
Leustagos     0.145  0.138  0.168  0.144  0.158  0.133  0.140  0.146
DuckTile      0.143  0.145  0.172  0.145  0.165  0.137  0.146  0.148
MZ            0.141  0.151  0.174  0.145  0.167  0.141  0.145  0.149
Propeller     0.144  0.153  0.177  0.147  0.175  0.141  0.147  0.152
Duehee Lee    0.157  0.144  0.176  0.160  0.169  0.154  0.148  0.155
Expektra      0.165  0.158  0.184  0.164  0.179  0.153  0.153  0.165
MTU EE5260    0.161  0.172  0.193  0.162  0.192  0.156  0.160  0.168
SunWind       0.174  0.177  0.193  0.176  0.179  0.157  0.162  0.172
ymzsmsd       0.163  0.186  0.200  0.164  0.192  0.162  0.167  0.174
4138 Kalchas  0.180  0.179  0.197  0.175  0.200  0.160  0.165  0.177
NuPIC         0.243  0.254  0.264  0.310  0.290  0.224  0.240  0.264

Persistence   0.302  0.338  0.373  0.364  0.388  0.341  0.361  0.355

Table 4.1: NRMSE scores of the entries published in [Hong et al., 2014]. The NuPIC and Expektra models are added so we can easily compare the results.

4.1 Experimental results

Looking at the primary results presented in this study, we see in table 4.1 that Expektra's ANN model achieves performance comparable to the other methods published in Hong et al. [2014], which can be used as a starting point for further investigations. A similar approach to Expektra's is Duehee Lee's [Lee and Baldick, 2014], as both methods are based on the same neural network architecture but with different optimization techniques. Duehee Lee's network is structured slightly differently from Expektra's model, which has only one output node, whereas Duehee Lee uses five (representing five predictions ahead). Besides these differences, Duehee Lee also includes additional sub-models to create a prediction ensemble, blending the output of the ANN with a Gaussian Process (GP) to improve very short-term predictions. MTU EE5260 is also a neural network approach that converts meteorological forecasts to power, and here the score is very close. The HTM CLA beats the persistence model but needs more work to outperform the other models in GEFCom.


4.2 Input Importance

For interpretation purposes, in order to understand the model better, an analysis of the relative importance of the input parameters was performed. This analysis is illustrated in figure 4.11. Each box represents added noise to that channel, which results in a higher NRMSE score if that feature was important; a reference point, "all-channels", represents the error distribution of the model with no input replacement. We clearly see in this figure that the wind speed channel ws is the most important attribute: adding noise to this channel greatly affects the NRMSE score. We also see that the wind components u, v show little to no influence, and that the timestamp-related inputs hours and week both indicate that seasonal and daily trends are present in the dataset.

Figure 4.11: Relative input parameter importance using HIPR, shown as NRMSE distributions per channel (all-channels, hours, u, v, week, ws, ws-1, ws-2, ws-3, ws+1, ws+2, ws+3). "all-channels" reflects the reference point, i.e. the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is the directional vector of the wind.


4.2.1 Adaptation and Optimization

All training for Expektra's model was performed on a MacBook Pro with an Intel Core 2 Duo processor (P8600, 2.40 GHz) and 4 GB of RAM on a 64-bit operating system. After adapting the code to run experiments on the GEFCom dataset, the performance of the core source code given by Expektra was evaluated. This investigation identified some slow parts of the code; after fixing these issues, mainly by using a MathNET native library instead of the managed provider, a speed test was performed comparing the unoptimized version with the optimized one. Figure 4.12 shows the speed-up achieved during training using the LM algorithm.

Figure 4.12: Training time of the unoptimized (Normal) version vs. the optimized one when training a network with 10 input neurons, 10/15/20/25 hidden neurons and 1 output neuron. Training was done for 100 epochs using LM.

4.3 Summary

A plot showing the average NRMSE improvement over the persistence model can be seen in figure 4.13. Both models perform better than persistence; Expektra's model performs better than the NuPIC model for the whole forecast


horizon, and NuPIC performs better than persistence towards the end of the forecast.

Figure 4.13: Summarized average improvement (%) in NRMSE over persistence across all wind farms, with 95% confidence intervals, vs. look-ahead time (in hours), for Expektra and NuPIC.


Chapter 5

Discussion

With the exponential growth of computer power it becomes easier to study deeper and more complex networks, and with recent advances in the area of deep learning, interest in ANNs has resurged. In practice, ANNs have been used by most groups in the field of WPF, but these networks never caught on, as it was argued that the improvements made by ANNs were usually not enough to justify the extra effort of training them [Giebel et al., 2011]. This is steadily becoming less of an issue as computation becomes cheaper and bigger networks perform better.

By using a simple MLP network we observe that we are able to obtain results similar in performance to other models published in Hong et al. [2014], which can be used as a starting point for further investigations. The GEFCom dataset has a limited number of input features; if we had access to a wider range of features, the power of neural networks could be studied more in depth.

Using persistence as a reference model was done in order to have the same baseline as the one used in GEFCom, but a better reference model should be considered in the future, such as the one presented in Nielsen et al. [1998], especially if longer forecasts are to be considered.
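The persistence reference model itself is a one-liner: the last observed production value is used as the forecast for every lead time.

```python
import numpy as np

def persistence_forecast(history, horizon=48):
    """Persistence reference model: repeat the last observation."""
    return np.full(horizon, history[-1])
```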

5.1 Method development issues

Working with Expektra's model is very straightforward, and no major issues were encountered during development, but some issues were encountered working with NuPIC1.

1 It should also be pointed out that it helps to have the people who developed the code on the spot, and this is probably the main reason for the little trouble.


1. Installing NuPIC is difficult. This seems to be a general problem, judging by posts on the nupic mailing list2 — a very important thing to fix if they want more people to work with this model.

2. The built-in swarming (PSO) did not work well, especially for slightly larger datasets and for 48 prediction steps. The main problems were fitting everything into memory and that the code ran too slowly when swarming over multiple steps ahead, making it practically impossible to use. Scaling down the problem to just a few prediction steps and multiple models helped, but the swarm model kept dismissing wind speed as an important feature, which was fixed by specifically telling it not to touch that encoder.

3. Inconsistencies between the white-paper documentation of the HTM/CLA and the actual implementation make it hard to understand the underlying principles of NuPIC. NuPIC presents a very complex network, and any inconsistency in the documentation makes it harder to understand. The code is quite well documented, but the additional material is a bit sparse.

These issues are all understandable and are continuously being improved upon; NuPIC is still a young platform, so finding issues is expected given the complexity and research nature of the project. Training the CLAClassifier to use many step predictions instead of just one will most likely reduce the BIAS error seen in the results; to investigate this, a more powerful computer is needed.

There are some differences in the inputs between the models. This setup was created because an OPF model needs to encode the value it is going to predict. SCADA data is available, so this is not an issue, but it may introduce some unfairness between NuPIC and Expektra. In the current setup NuPIC does not seem to gain anything extra from the additional information, and the other models in the competition most likely used different inputs as well.

In general, NuPIC differs in the way you feed data: you send in the front of the signal, and temporal context is achieved through the temporal memory.

5.2 Future improvements and directions

One issue with the GEFCom dataset is that it does not contain any specific information about the location of the 7 wind farms. It is worth considering how to handle different models that are specialized for different types

2 http://numenta.org/lists


of terrain, as this has been shown to increase performance [Kariniotakis et al., 2006]. Another thing to investigate is a more advanced scheme for hyperparameter selection: Jursa and Rohrig [2008] show that the WPF error can be reduced by smarter use of optimization algorithms for feature selection and hyperparameter selection, which should improve the performance.
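As a hedged sketch of what such a scheme could look like, random search in the spirit of Bergstra and Bengio [2012] is a simple baseline; the search space and validation error function below are made up for illustration, not taken from the thesis experiments:

```python
import random

def random_search(evaluate, space, n_trials=200, seed=0):
    """Sample hyper-parameter configurations uniformly at random and keep
    the one with the lowest validation error (lower is better)."""
    rng = random.Random(seed)
    best_cfg, best_err = None, float("inf")
    for _ in range(n_trials):
        cfg = {name: rng.choice(values) for name, values in space.items()}
        err = evaluate(cfg)
        if err < best_err:
            best_cfg, best_err = cfg, err
    return best_cfg, best_err

# Toy validation error pretending 20 hidden neurons and lr = 0.01 are best:
def toy_error(cfg):
    return abs(cfg["hidden"] - 20) / 20 + abs(cfg["lr"] - 0.01)

space = {"hidden": [5, 10, 15, 20, 25], "lr": [0.001, 0.01, 0.1]}
cfg, err = random_search(toy_error, space)
```

In practice `evaluate` would train an MLP on the training split and return its error on a validation split, which is what makes the search expensive.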

In general, having more types of inputs from sources like SCADA and NWP systems could give a lot of valuable information that can be used for better predictions. It is clear that wind speed is a key feature for producing good forecasts, which is also supported by the input analysis in section 4.2. It has been argued that the error in WPF models stems largely from wrong weather forecasts [Giebel et al., 2011], so identifying good sources of weather forecasts would be desirable for more effective model learning. Combining several different weather forecasts could also be investigated, as Nielsen et al. [2007] showed that power forecasts based on a number of different meteorological forecasts were better than a single source.
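One very simple way to combine several meteorological forecasts is a weighted average with weights inversely proportional to each source's historical error. This is an illustrative scheme of my own, not the adaptive combination used by Nielsen et al. [2007], and the numbers below are hypothetical:

```python
def combine_forecasts(forecasts, past_errors):
    """Weight each forecast source inversely to its historical mean absolute
    error, then average the sources for every lead time."""
    weights = [1.0 / e for e in past_errors]
    total = sum(weights)
    weights = [w / total for w in weights]  # normalize to sum to 1
    return [sum(w * f[t] for w, f in zip(weights, forecasts))
            for t in range(len(forecasts[0]))]

# Two hypothetical wind-speed forecasts for three lead times; the first
# source has been three times more accurate historically:
combined = combine_forecasts([[8.0, 9.0, 10.0], [10.0, 11.0, 12.0]],
                             past_errors=[1.0, 3.0])
```

With errors 1.0 and 3.0 the normalized weights are 0.75 and 0.25, so the combined forecast stays closer to the historically better source.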

Regarding HTM CLA, it is worth considering implementing a custom encoder specifically targeted at data concerning wind farms. SCADA data could also be streamed directly into a NuPIC model, which would allow for some interesting online predictions and anomaly detection.

Training times are always an issue, especially for very large models. Both NuPIC and Expektra's model would benefit from being implemented in a more parallel fashion, as these algorithms are very well suited for parallel computing. The speed-up would be of great value, so looking into and implementing a GPU version of the algorithms could be worthwhile.

5.3 Conclusions

The field of energy forecasting is still young, and the necessity for good forecasting techniques is constantly growing. This initial study shows that part of Expektra's forecasting method can be used to forecast wind power generation. We observe that this ANN is able to achieve performance comparable to other methods published in Hong et al. [2014]. Additional work is still needed to obtain state-of-the-art results. Numenta's model is able to beat the reference model, but still needs additional work before we can draw definite conclusions about its performance. Constructing a custom encoder could be a first step in that direction.


Bibliography

Subutai Ahmad and Jeff Hawkins. Properties of sparse distributed representations and their application to hierarchical temporal memory. arXiv preprint arXiv:1503.07469, 2015.

T. C. Akinci. Short term wind speed forecasting with ANN in Batman, Turkey. Elektronika ir Elektrotechnika, 107(1):41–45, 2015.

E. Michael Azoff. Neural Network Time Series Forecasting of Financial Markets. John Wiley & Sons, Inc., 1994.

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13(1):281–305, 2012.

Daniel P. Buxhoeveden and Manuel F. Casanova. The minicolumn hypothesis in neuroscience. Brain, 125(5):935–951, 2002.

Erasmo Cadenas and Wilfrido Rivera. Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks. Renewable Energy, 34(1):274–278, 2009.

Dmitri B. Chklovskii, B. W. Mel, and K. Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782–788, 2004.

A. Costa, A. Crespo, J. Navarro, G. Lizcano, H. Madsen, and E. Feitosa. A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews, 12(6):1725–1744, August 2008. doi: 10.1016/j.rser.2007.01.015.

Russ C. Eberhart and James Kennedy. A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science, volume 1, pages 39–43, New York, NY, 1995.


Shu Fan, James R. Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. IEEE Transactions on Energy Conversion, 24(2):474–482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento – a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition, EWEC 2008. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving Hierarchical Temporal Memory-Based Trading Models. Springer, 2013.

M. A. Gaertner, C. Gallardo, C. Tejeda, N. Martínez, S. Calabria, N. Martínez, and B. Fernández. The CASANDRA project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. In Proc. of the 2001 European Wind Energy Conference, EWEC'01, Copenhagen, Denmark, pages 777–780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: a literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G. Lobo. Sipreólico – wind power prediction tool for the Spanish peninsular power system. In Proceedings of the CIGRÉ 40th General Session and Exhibition, Paris, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On Intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357–363, 2014.

Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41–46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585–592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694–709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference, EWEC'07, 2007.

G. Kariniotakis. Position paper on JOULE project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J. Halliday, R. Brownsword, Ignacio Marti, Ana Maria Palomares, I. Cruz, H. Madsen, T. S. Nielsen, Henrik Aa. Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power – overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, 2006.

G. N. Kariniotakis, G. S. Stavrakakis, and E. F. Nogaret. Wind power forecasting using advanced neural networks models. IEEE Transactions on Energy Conversion, 11(4):762–767, 1996.

Stanley J. Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326–334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275–293, 2009.

Lars Landberg and Simon J. Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171–195, 1994.

S. Lang, C. Mohrlen, J. Jorgensen, B. O. Gallachóir, and E. McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International Solar Energy Society UK Section Conference, volume 85, page 89, 2006.


S. M. Lawan, W. A. W. Z. Abidin, W. Y. Chai, A. Baharun, and T. Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760–1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. IEEE Transactions on Smart Grid, 5(1):501–510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets. 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164–168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313–2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154–3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa. Nielsen, and Torben S. Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475–489, 2005.

Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431–441, 1963.

Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons. 1969.

Marvin Lee Minsky and Oliver G. Selfridge. Learning in Random Nets. MIT Lincoln Laboratory, 1960.

Jorge J. Moré. The Levenberg-Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105–116. Springer, 1978.

Henrik Aa. Nielsen, Torben S. Nielsen, Henrik Madsen, Maria J. Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471–482, 2007.

Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29–34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power. 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159–167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms, v0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33–57, 2007.

A. Rodrigues, J. A. Peças Lopes, P. Miranda, L. Palma, C. Monteiro, R. Bessa, J. Sousa, C. Rodrigues, and J. Matos. EPREV – a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference, EWEC, volume 7, 2007.

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5:3, 1988.

Jürgen Schmidhuber. Deep learning in neural networks: an overview. Neural Networks, 61:85–117, 2015.

S. Sinkevicius, R. Simutis, and V. Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91–96, 2011.

Ke-Sheng Wang, Vishal S. Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61–69, 2014.

WWEA. 2014 half-year report. Technical report, WWEA, pages 1–8, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365–376, 2013.

Hao Yu and Bogdan M. Wilamowski. Levenberg-Marquardt training. Industrial Electronics Handbook, 5:12-1, 2011.


Appendix A

Hyper-parameters

Parameters for the spatial pooler

Alias                Default  Description
columnCount          -        The number of cell columns in a cortical region.
globalInhibition     false    If true, during the inhibition phase the winning columns are selected as the most active columns from the region as a whole.
numActivePerInhArea  10       The maximum number of active columns per inhibition area.
synPermActiveInc     0.1      The amount by which an active synapse is incremented in each round, specified as a percent of a fully grown synapse.
synPermConnected     0.10     Controls the threshold at which synapses count as connected.
synPermInactiveDec   0.01     The amount by which an inactive synapse is decremented in each round.
potentialRadius      16       Determines the extent of the input that each column can potentially be connected to.

Table A.1: Configuration parameters for the spatial pooler.
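To make the globalInhibition and numActivePerInhArea entries concrete, here is a simplified sketch of how winning columns would be chosen across the whole region; it ignores the boosting and tie-breaking used by the real spatial pooler implementation:

```python
def global_inhibition(overlaps, num_active):
    """Global inhibition: the winners are the `num_active` columns with the
    highest overlap scores across the whole region."""
    # Rank column indices by overlap, highest first (stable for ties).
    ranked = sorted(range(len(overlaps)), key=lambda i: overlaps[i],
                    reverse=True)
    # Keep the top num_active winners, returned in column order.
    return sorted(ranked[:num_active])

# Six columns with hypothetical overlap scores; three may stay active:
winners = global_inhibition([3, 9, 1, 7, 7, 0], num_active=3)
```

With numActivePerInhArea = 10 and columnCount in the thousands, this is what keeps the spatial pooler's output a sparse distributed representation.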


Parameters for the scalar encoder

Alias       Symbol  Description
w           w       Number of bits to set in the output.
minval      vmin    The lower bound of the input value.
maxval      vmax    The upper bound of the input value.
n           n       Number of bits in the representation (n must be > w).
radius      r       Inputs separated by at least this distance will have non-overlapping representations.
resolution  ψ       Inputs separated by at least this distance will have different representations.

Table A.2: Configuration parameters for the scalar encoder.
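A minimal sketch of the idea behind these parameters: a contiguous run of w active bits out of n slides with the encoded value, so nearby values share active bits. This is an illustrative reimplementation, not NuPIC's ScalarEncoder, which additionally supports periodic values, clipping, and the radius/resolution variants:

```python
def scalar_encode(value, vmin, vmax, n, w):
    """Encode a scalar in [vmin, vmax] as n bits with w contiguous active
    bits; close values overlap, distant values do not."""
    if not vmin <= value <= vmax:
        raise ValueError("value outside [minval, maxval]")
    buckets = n - w + 1                        # possible run positions
    # Map the value linearly onto a bucket index (rounded to nearest).
    i = int((value - vmin) / (vmax - vmin) * (buckets - 1) + 0.5)
    return [1 if i <= j < i + w else 0 for j in range(n)]

# Encoding 0.0 on the range [-10, 10] with n = 14 bits and w = 3 active bits:
bits = scalar_encode(0.0, vmin=-10.0, vmax=10.0, n=14, w=3)
```

The extremes map to the first and last runs, so the full input range is covered by exactly n − w + 1 distinct representations.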


Parameters for the temporal memory

Alias                  Default  Description
activationThreshold    12       Activation threshold for segments.
cellsPerColumn         32       Number of cells per column.
columnCount            2048     The number of cell columns in a cortical region.
globalDecay            0.10     Decrements all synapses a little bit all the time.
initialPerm            0.11     Initial permanence value for a synapse.
inputWidth             -        Size of the input.
maxAge                 100000   Controls global decay: only segments that have not been activated for maxAge iterations are decayed, and the global decay loop runs only every maxAge iterations.
maxSegmentsPerCell     -        The maximum number of segments a cell can have.
maxSynapsesPerSegment  -        The maximum number of synapses a segment can have.
minThreshold           8        The minimum required activity for a segment to learn.
newSynapseCount        15       The maximum number of synapses added to a segment during learning.
permanenceDec          0.10     How much permanence is removed from synapses when learning occurs.
permanenceInc          0.10     How much permanence is added to synapses when learning occurs.
temporalImp            cpp/py   Controls which temporal memory implementation to use.

Table A.3: Configuration parameters for the temporal memory.


Appendix B

Wind characteristics

Figure B.1: Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Figure B.2: Wind characteristics for WF 3–7, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Appendix C

Error Distribution

Figure C.1: Error distribution for different lead times, WF 1 (NuPIC model). Six histogram panels, one per lead time (48, 40, 30, 20, 10 and 1 hours); x-axis: error from −1.0 to 1.0; y-axis: frequency from 0 to 70.

Figure C.2: Error distribution for different lead times, WF 2 (NuPIC model). Six histogram panels, one per lead time (48, 40, 30, 20, 10 and 1 hours); x-axis: error from −1.0 to 1.0; y-axis: frequency from 0 to 70.

Figure C.3: Error distribution for different lead times, WF 3 (NuPIC model). Six histogram panels, one per lead time (48, 40, 30, 20, 10 and 1 hours); x-axis: error from −1.0 to 1.0; y-axis: frequency from 0 to 70.

Figure C.4: Error distribution for different lead times, WF 4 (NuPIC model). Six histogram panels, one per lead time (48, 40, 30, 20, 10 and 1 hours); x-axis: error from −1.0 to 1.0; y-axis: frequency from 0 to 70.

Figure C.5: Error distribution for different lead times, WF 5 (NuPIC model). Six histogram panels, one per lead time (48, 40, 30, 20, 10 and 1 hours); x-axis: error from −1.0 to 1.0; y-axis: frequency from 0 to 70.

Figure C.6: Error distribution for different lead times, WF 6 (NuPIC model). Six histogram panels, one per lead time (48, 40, 30, 20, 10 and 1 hours); x-axis: error from −1.0 to 1.0; y-axis: frequency from 0 to 70.

Figure C.7: Error distribution for different lead times, WF 7 (NuPIC model). Six histogram panels, one per lead time (48, 40, 30, 20, 10 and 1 hours); x-axis: error from −1.0 to 1.0; y-axis: frequency from 0 to 70.

Figure C.8: Error distribution for different lead times, WF 1 (Expektra model). Six histogram panels, one per lead time (48, 40, 30, 20, 10 and 1 hours); x-axis: error from −1.0 to 1.0; y-axis: frequency from 0 to 70.

Figure C.9: Error distribution for different lead times, WF 2 (Expektra model). Six histogram panels, one per lead time (48, 40, 30, 20, 10 and 1 hours); x-axis: error from −1.0 to 1.0; y-axis: frequency from 0 to 70.

Figure C.10: Error distribution for different lead times, WF 3 (Expektra model). Six histogram panels, one per lead time (48, 40, 30, 20, 10 and 1 hours); x-axis: error from −1.0 to 1.0; y-axis: frequency from 0 to 70.

Figure C.11: Error distribution for different lead times, WF 4 (Expektra model). Six histogram panels, one per lead time (48, 40, 30, 20, 10 and 1 hours); x-axis: error from −1.0 to 1.0; y-axis: frequency from 0 to 70.

Figure C.12: Error distribution for different lead times, WF 5 (Expektra model). Six histogram panels, one per lead time (48, 40, 30, 20, 10 and 1 hours); x-axis: error from −1.0 to 1.0; y-axis: frequency from 0 to 70.

Figure C.13: Error distribution for different lead times, WF 6 (Expektra model). Six histogram panels, one per lead time (48, 40, 30, 20, 10 and 1 hours); x-axis: error from −1.0 to 1.0; y-axis: frequency from 0 to 70.

Figure C.14: Error distribution for different lead times, WF 7 (Expektra model). Six histogram panels, one per lead time (48, 40, 30, 20, 10 and 1 hours); x-axis: error from −1.0 to 1.0; y-axis: frequency from 0 to 70.


List of Figures

2.1  General outline when forecasting using the statistical approach.
2.2  General steps when forecasting using a physical model.
3.1  The perceptron.
3.2  Architectural graph of the neural network that produces a single output value. It consists of a collection of hidden neurons in each of the H hidden layers, as well as M input connections. Each edge in this graph has a weight wij associated with it.
3.3  Information flow of a single-region predictive model created with the OPF.
3.4  The CLAClassifier.
3.5  Training an OPF model.
4.1  Left: cumulated probability of wind speed. Right: scatter diagram of the power curve, production vs. wind speed. Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).
4.2  Left: wind speed vs. production. Right: wind speed vs. direction. Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).
4.3  Left: wind speed vs. production. Right: wind speed vs. direction. Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).
4.4  Different error measurements for WF 1.
4.5  Different error measurements for WF 2.
4.6  Different error measurements for WF 3.
4.7  Different error measurements for WF 4.
4.8  Different error measurements for WF 5.
4.9  Different error measurements for WF 6.
4.10 Different error measurements for WF 7.
4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point: the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind.
4.12 Unoptimized version vs. the optimized one when training a network with 10 input neurons, 10/15/20/25 hidden neurons and 1 output neuron. Training was done for 100 epochs using LM.
4.13 Summarized average improvement over all wind farms with 95% confidence intervals.
B.1  Wind characteristics for WF 1 and WF 2, GEFCom dataset.
B.2  Wind characteristics for WF 3–7, GEFCom dataset.
C.1  Error distribution for different lead times, WF 1.
C.2  Error distribution for different lead times, WF 2.
C.3  Error distribution for different lead times, WF 3.
C.4  Error distribution for different lead times, WF 4.
C.5  Error distribution for different lead times, WF 5.
C.6  Error distribution for different lead times, WF 6.
C.7  Error distribution for different lead times, WF 7.
C.8  Error distribution for different lead times, WF 1.
C.9  Error distribution for different lead times, WF 2.
C.10 Error distribution for different lead times, WF 3.
C.11 Error distribution for different lead times, WF 4.
C.12 Error distribution for different lead times, WF 5.
C.13 Error distribution for different lead times, WF 6.
C.14 Error distribution for different lead times, WF 7.

List of Tables

3.1  The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, missing data with power observations are available for updating the models.
3.2  Preprocessed features from the dataset. The Forecast category represents the forecast given by the NWP; multiple forecasts will exist for a given date. The latest issued forecast available is the feature we use in training and testing.
3.3  Example, with n = 14, r = 5, ψ = 1, of various encoded scalar values using a ScalarEncoder.
4.1  NRMSE scores of the entries published in [Hong et al., 2014]. The NuPIC and Expektra models are added so the results can easily be compared.
A.1  Configuration parameters for the spatial pooler.
A.2  Configuration parameters for the scalar encoder.
A.3  Configuration parameters for the temporal memory.

www.kth.se

Page 12: Short-term wind power forecasting using artificial neural ...865336/FULLTEXT01.pdf · 1. This study will focus on short-term forecasting. i.e forecasts done for 1 −48 hours ahead.

CHAPTER 1 INTRODUCTION

power generation ie What is needed to adapt this model to wind power forecastingproblems

The second objective of this thesis will be to investigate HTM/CLA [Numenta 2011], a modern state-of-the-art computational theory of the neocortex, i.e. is it possible to use the Numenta Platform for Intelligent Computing (NuPIC) for wind power forecasting problems?

These questions will be addressed by evaluating the models against other models published in the area of short-term wind power forecasting, more specifically those that have been published in GEFCom.

1.2 The scope of the problem

1. This study will focus on short-term forecasting, i.e. forecasts done for 1–48 hours ahead. How to perform well on longer forecasts is left to further investigation.

2. Wind power forecasting is closely related to wind speed forecasting, i.e. trying to use local information at the wind turbine instead of using NWP data to forecast wind power. This thesis will not go into any specific details on how to forecast wind speeds; readers can take a look at Li and Shi [2010], Cadenas and Rivera [2009], Akinci [2015].

3. There are a lot of neural networks that are worth studying, but this thesis will focus on Expektra's method and the HTM/CLA inside NuPIC. NuPIC was picked over other deep learning methods because of its direct relation to time series prediction.


Chapter 2

Background

Wind Power Forecasting (WPF) has applications in generation and transmission maintenance planning, energy optimization, as well as in energy trading. WPF models exist at different scales and they can be used to predict the production of anything from a single WT to a whole Wind Farm (WF). WPF models are generally divided into two main groups, physical and statistical, but hybrid state-of-the-art methods are also common [Giebel et al 2001, Lang et al 2006]. The physical approach [Landberg and Watson 1994, Gaertner et al 2003] focuses on integrating well-known physical aspects into the model, such as information about the surrounding terrain and properties of the WT. These models try to get as good an estimate of the local wind speed as possible before finally reducing the remaining error with some form of MOS. Statistical approaches [Rodrigues et al 2007, Gonzalez et al 2004] rely more on historical observations and their statistical relation to meteorological predictions, as well as measurements from Supervisory Control And Data Acquisition (SCADA) systems.

SCADA is a control system that is used in most industrial processes and is the main tool to evaluate the state of individual power plants. As the name indicates, it is not a full control system but rather a system for supervision. In the case of wind turbines it allows remote access to online data that has been gathered by sensors inside and around the plant. Variables accessible through these systems are those directly related to the wind flow as well as the produced power generated by the plant, i.e. things like rotor wind speed, nacelle position, pitch angle, active power, etc.


Figure 2.1 A figure that presents the general outline when forecasting using the statistical approach.

Forecast models that use SCADA data as their primary input source usually have a good forecast accuracy, at least for the first few hours, but they tend to be less useful for longer prediction horizons [Giebel et al 2011]. SCADA data can also be used to detect problems in WTs, something that can be helpful to improve the reliability of WTs [Yang et al 2013, Wang et al 2014].

Statistical models, seen in figure 2.1, are usually built around Numerical Weather Prediction (NWP) and SCADA data. NWP models are often used to build forecasting models as they introduce weather forecasts for the region where the wind turbines are located. NWP data usually contains information about things like wind speed, wind direction, temperature and humidity. These models are operated twice or four times a day by a number of large weather services. The main forecasts usually start at 00 and 12 UTC, corresponding to the world radiosonde launching, which is the only direct observation of the atmospheric state and has been the backbone of atmospheric monitoring for many decades;1 extra forecasts usually start at 06 and 18 UTC. Physical models, such as those seen in figure 2.2, include additional information about the physical characteristics of the wind turbine and its surroundings, i.e. terrain data, information about obstacles, capacity and layout where the turbine is located, and so on. Other useful information for physical models includes the theoretical

1. One problem with this approach is that it results in less information over large oceans and poorer countries.


power curve: how much power is expected to be produced given a specific wind speed.

The time scale of WPF methods is generally divided into 3 main groups: very-short-term (up to 9 hours), short-term (up to 72 hours) and medium-term (up to 7 days) [Costa et al 2008], while the time step for these models is in the range of seconds to days depending on the application. Very-short-term models used for wind power forecasting consist of statistical methods like Kalman Filters, Auto-Regressive Moving Average (ARMA), Auto-Regressive with Exogenous Input (ARX), Box-Jenkins, etc. Input to these models are historical observations of wind speed, wind direction, temperature, etc., and common applications for this forecast horizon include things like intraday market trading. Since these methods are merely based on past production, they are generally not useful for longer horizons.

[Figure: NWP data → Downscaling → Transformation to Hub Height → Spatial Refinements → Conversion to power (using the WT Power Curve) → Model Output Statistics (MOS) → Forecast wind power generation, with SCADA data and WFC data as additional inputs to the physical model.]

Figure 2.2 A figure that presents the general steps when forecasting using a physical model.

Short-term forecasting models include additional data about historical observations, usually obtained from SCADA systems, but more importantly weather forecasts from NWP models. Machine learning methods used in this area include Neural Networks [Kusiak et al 2009], Support Vector Machines [Fugon et al 2008], Nearest Neighbour Search [Jursa et al 2007], Random Forests, etc. Medium-term forecasting models usually incorporate all the methods used for shorter forecasts as well as physical characteristics.


2.1 Neural Networks and Time Series Prediction

One of the most common ANNs is the so called Multilayer Perceptron (MLP), a network built around some very simple properties of a cortical neuron. This network has been around since the 50's and has matured with a solid mathematical foundation. It has been applied successfully to many different applications. It is built around the idea of having multiple layers of neurons, each layer feeding information forward to the next layer, i.e. a feed-forward neural network.

The main advantage of using multiple layers instead of just a single one is to overcome the limitations pointed out by Minsky and Selfridge [1960], Minsky and Seymour [1969], where a single-layer perceptron is only capable of learning linearly separable patterns and thus unable to learn all functions. The MLP is not limited in this way and has been shown to be able to represent a wide variety of functions given appropriate parameters [Hornik et al 1989].

NuPIC is a platform actively developed and maintained by Numenta. It introduces a collection of ideas and algorithms that are inspired by the structural organization of the neocortex. Two of the core concepts found inside NuPIC are the so called Hierarchical Temporal Memory and the Cortical Learning Algorithm. HTM was introduced in Hawkins and Blakeslee [2007] and refers to a hierarchy of cortical regions in the brain.

The neocortex is classically divided into 6 different layers;2 the current implementation in NuPIC is focused on emulating layers 3-4 (specifically layer 3), with extensions and research code being done to include more layers.

A CLA region consists of a collection of columns and each column consists of a handful of cells. This structure is based on the minicolumn hypothesis [Buxhoeveden and Casanova 2002]. Each column in the CLA region has its own semantic meaning, and the sparse activity of a handful of active columns will tell us something about the input. The CLA is modelled so that specific cells within each column reflect a temporal context of a pattern, and a single cortical region (a CLA region) is trained with the CLA algorithm.

A typical CLA region consists of around 2K columns containing around 60K neurons in total, while a typical MLP may have less than 100 neurons. Each neuron in a HTM network grows new synapses over time, and it is not uncommon to have around 5K synapses per neuron, meaning we would have around 300M synapses in total. A single region of this size uses around 100 MB of memory and takes around 10 msec to do one inference and learning step.3

2. These layers should not be confused with the hierarchy of regions.
3. These values are given in a talk, "Sensor-Motor Integration in the Neocortex", 2013 Hackathon.


The HTM/CLA also differs from the MLP in that it does not directly use scalar weights to represent synaptic connectivity. Synapses inside a HTM have binary weights with a scalar permanence: each synapse will either be connected or disconnected, while the MLP uses scalar weights. The connectedness in HTM/CLA is based on a permanence, which is a value between 0.0 and 1.0. If the permanence is over a certain threshold we have a connected synapse, i.e. we have weights of either 1 or 0. This binary mechanism is there to simulate synapses that are able to form and unform during learning. So essentially we have a weight-change network vs a wiring-change network [Chklovskii et al 2004].
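As a toy illustration of this mechanism (a sketch, not NuPIC's actual implementation; the threshold and learning increments below are made-up values), binary weights derived from scalar permanences can be written as:

```python
import numpy as np

def connected_weights(permanences, threshold=0.2):
    # A synapse is "connected" (weight 1) once its permanence crosses
    # the threshold, otherwise "disconnected" (weight 0).
    return (permanences >= threshold).astype(int)

def learn(permanences, active_inputs, inc=0.05, dec=0.01):
    # Reinforce synapses aligned with active inputs and weaken the rest,
    # clipping permanences to [0, 1]; wiring changes emerge as synapses
    # drift across the threshold.
    delta = np.where(active_inputs == 1, inc, -dec)
    return np.clip(permanences + delta, 0.0, 1.0)

perm = np.array([0.15, 0.25, 0.60, 0.19])
print(connected_weights(perm))  # → [0 1 1 0]
```

Repeated calls to `learn` let a disconnected synapse (e.g. permanence 0.15) eventually cross the threshold and become connected, which is the wiring-change behaviour described above.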

NuPIC today is naturally geared towards time-series prediction, which makes it an interesting candidate for wind forecasting. NuPIC has been shown to outperform the MLP on financial time series [Gabrielsson et al 2013] and it has been used successfully to balance traffic [Sinkevicius et al 2011]. It has also been used on the NASA Aviation Dataset [Lee and Rajabi 2014], but with less promising results. MLPs have been used successfully for many kinds of time series problems [Azoff 1994, Niska et al 2004, Jain and Kumar 2007].


Chapter 3

Method and Materials

The purpose of this chapter is to provide an explanation of the method used in this thesis. It will present terms and concepts needed and used in the field of Machine Learning, with a focus on how to create forecasts based on time series. It presents a motivational reason for why these methods are used as well as how they are used. This chapter also contains a description of the implemented artificial neural network, how it is structured and optimized. It provides details about the experimental setup and how these experiments relate to GEFCom, and finally a description of how the datasets are structured and processed.

3.1 Preliminaries

3.1.1 Remarks

The methodology used to evaluate the prediction models presented in this thesis is based on the protocol presented in Madsen et al [2005],1 which is a complete protocol that can be used to evaluate the performance of short-term WPF and Wind-to-Power (W2P) models. The reason this protocol was chosen for this thesis is that it has been successfully used before as a guideline to evaluate a wide variety of forecast models, such as AWPPS, Prediktor, Previento, Sipreólico and WPPT. The protocol has been used for both on-shore and off-shore wind farms and was developed in the frame of the ANEMOS research project [Kariniotakis et al 2006], which brought together many relevant groups involved in the field. The aim of ANEMOS was to develop accurate and robust models that substantially outperform current state-of-the-art methods, and one of its goals was to establish a common set

1. It should be pointed out that no widely agreed standardization exists.


of performance measures that can be used to compare forecasts across systems and locations.

3.1.2 Definitions

Time series

A time series is a sequence X of observations x_t, each with a particular time-stamp t, i.e. X = [x_1, x_2, ..., x_t], in short X_{1:t}; it is a time-dependent collection of variables.

Forecast horizon

A multi-step-ahead prediction is the assignment of predicting k forecasts X_{t+1:t+k} given a collection of historical observations X_{t-p+1:t}. In the case of GEFCom we want a forecast for 1–48 steps ahead.

Point or spot forecast

In this paper we model the forecast p̂_{t+k|t} as a so called point forecast (or spot forecast), i.e. a single value for each forecast (as opposed to a probability distribution).

Prediction error

The prediction error e for lead time t+k is defined in equation 3.1 as the difference between the forecast and the actual value, where t denotes the time index, k is the look-ahead time, p is the actual/measured/true wind power and p̂ is the predicted wind power:

e_{t+k|t} = p_{t+k} - p̂_{t+k|t}   (3.1)

The normalized prediction error ε is seen in equation 3.2:

ε_{t+k|t} = (1/p_inst) e_{t+k|t} = (1/p_inst) (p_{t+k} - p̂_{t+k|t})   (3.2)

where p_inst is the installed capacity of the wind farm (in kW or MW), which is unknown in the competition.
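In code, the error definitions in equations 3.1 and 3.2 amount to the following (the numbers in the example are made up; in GEFCom p_inst is unknown):

```python
def prediction_error(p_actual, p_forecast):
    # Equation 3.1: difference between measured and predicted power.
    return p_actual - p_forecast

def normalized_error(p_actual, p_forecast, p_inst):
    # Equation 3.2: the error scaled by the installed capacity p_inst.
    return prediction_error(p_actual, p_forecast) / p_inst

eps = normalized_error(p_actual=12.0, p_forecast=10.0, p_inst=20.0)  # → 0.1
```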


3.1.3 Reference models

Persistence (also called a naïve or plain predictor), as seen in equation 3.3, is the model most commonly used for benchmarking WPF models. This simple model states that the future wind generation value will be the same as the last measured value:

p̂^persistence_{t+k|t} = p_t   (3.3)

An alternative would be to use an even simpler model, a climatology prediction (equation 3.4), i.e. predicting the mean, a value that would be approximated from the training set (see section 3.1.5):

p̂^mean_{t+k|t} = p̄ = (1/N) Σ_{t=1}^{N} p_t   (3.4)

There are other reference models, like the one suggested by Nielsen et al [1998], which has advantages over persistence, but it was never widely adopted and is not used by GEFCom.
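A minimal sketch of the two reference models, assuming `history` holds the past power observations:

```python
def persistence(history, k):
    # Equation 3.3: every look-ahead step gets the last measured value.
    return [history[-1]] * k

def climatology(history, k):
    # Equation 3.4: every look-ahead step gets the mean of the history.
    mean = sum(history) / len(history)
    return [mean] * k

p = persistence([3.0, 5.0, 6.0], k=3)  # → [6.0, 6.0, 6.0]
m = climatology([3.0, 5.0, 6.0], k=3)
```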

3.1.4 Error metrics

In order to understand why specific models perform well, it is usually a good idea to evaluate them against a wide variety of different criteria, as is emphasised by Kariniotakis [1997]. The following sections describe the error measures used; in these sections N denotes the size of the test set.

Forecast Bias

The Normalized Forecast Bias (NBIAS) describes the systematic error and is defined in equation 3.5; it is estimated by calculating the average error for each step ahead. It gives an indication of the direction of the error:

NBIAS_k = (1/N) Σ_{t=1}^{N} ε_{t+k|t}   (3.5)

This bias is sometimes referred to by some authors as Mean Forecast Error (MFE). A perfect MFE does not mean the forecast is perfect and contains no error, as positive and negative errors cancel each other out, but it gives an indication that the forecasts are on target.


Mean Absolute Error

The Normalized Mean Absolute Error (NMAE) is an error quantity that looks at the average of the absolute errors of the prediction and is defined in equation 3.6:

NMAE_k = (1/N) Σ_{t=1}^{N} |ε_{t+k|t}|   (3.6)

Another common name used instead of Mean Absolute Error (MAE) is Mean Absolute Deviation (MAD). This value shows the magnitude of the overall error that has occurred due to forecasting and thus should be as small as possible. This error is also scale dependent and will be affected by data transformations and the scale of measurements.

Mean Squared Error

The Normalized Mean Squared Error (NMSE) is an error quantity that looks at the average of the squared errors ε²_{t+k|t} and is built using the Normalized Sum of Squared Errors (NSSE) defined in equation 3.7:

NSSE_k = Σ_{t=1}^{N} ε²_{t+k|t}   (3.7)

NMSE is defined in equation 3.8:

NMSE_k = (1/N) NSSE_k = (1/N) Σ_{t=1}^{N} ε²_{t+k|t}   (3.8)

In this error, positive and negative errors do not cancel each other out, and large individual errors are penalized more harshly.

Root Mean Squared Error

The Normalized Root Mean Squared Error (NRMSE) is the square root of the NMSE and is defined in equation 3.9:

NRMSE_k = NMSE_k^{1/2} = ( (1/N) Σ_{t=1}^{N} ε²_{t+k|t} )^{1/2}   (3.9)

NRMSE is the main metric used in GEFCom and it shares the same properties as NMSE.
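A sketch of the error metrics above (equations 3.5–3.9), given a list of normalized errors ε for one look-ahead time k:

```python
import math

def nbias(errors):
    # Equation 3.5: the mean error, indicating the direction of the error.
    return sum(errors) / len(errors)

def nmae(errors):
    # Equation 3.6: the mean absolute error.
    return sum(abs(e) for e in errors) / len(errors)

def nmse(errors):
    # Equation 3.8: the mean squared error.
    return sum(e * e for e in errors) / len(errors)

def nrmse(errors):
    # Equation 3.9: the root of the NMSE, the main GEFCom metric.
    return math.sqrt(nmse(errors))

errors = [0.1, -0.1, 0.2, -0.2]
# nbias is 0 here although every forecast is wrong, illustrating why a
# perfect MFE does not imply a perfect forecast.
```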


3.1.5 Model selection

In regression and classification, one of the main issues we are faced with is "How do we create a good model?" One way to define this "goodness" would be to look at the model's generalization ability, i.e. its performance on unseen data.

To make sure the model we are building will generalize to new unseen data, it is important how we select and build the model in the first place; what we have control over is the data we have at hand and how we use that data to build our model.

Training, testing and validating

In the case of supervised learning we have access to a target series and its associated features, i.e. the production series (our target) and wind speed and wind direction as features.

Common practice within Machine Learning, which is adhered to in this thesis, is to split the whole dataset into 3 smaller subsets, then use two of these subsets to find a good model (i.e. the training set and validation set), while the remaining subset (the test set) is used for evaluation, i.e. to estimate how accurate the model we have created would be on "unseen data". With highly flexible models like an artificial neural network we need to be careful not to overfit the data, which is one of the reasons why we have the validation set: we don't want to create a model that doesn't generalize well because it is fitted to every minor variation, i.e. it has captured a lot of noise.

3.1.6 Evaluation

The improvement with respect to a considered reference model ref is defined in equation 3.10:

I^{ref}_{EC,k} = 100 · (EC^{ref}_k - EC_k) / EC^{ref}_k  (%)   (3.10)

where the Evaluation Criterion (EC) is any error measurement, such as NMSE or NMAE, etc.


Testing period

Date         Time    Forecast

2011-01-01   01:00   1–48 hours
2011-01-04   13:00   1–48 hours
2011-01-08   01:00   1–48 hours
2011-01-11   13:00   1–48 hours
...
2012-06-23   01:00   1–48 hours
2012-06-26   13:00   1–48 hours

Table 3.1 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, the missing power observations are made available for updating the models.

3.2 Experiments

Training and testing is structured based on the structure of GEFCom. The dataset spans from midnight of the 1st of July 2009 to noon of the 26th of June 2012. The period from the 1st of July 2009 to the 1st of January 2011 at 01:00 is used for training and validating, while the rest is used for testing. In the testing range a number of 48-hour periods are defined (see table 3.1); in between these testing periods exists additional training data, which enables the models to be updated in between forecasts.

The testing periods repeat every 7 days until the end of the dataset, and only meteorological forecasts that were relevant for the periods with missing power data are given; this was done in order to be consistent.

Meteorological forecasts in the GEFCom dataset consist of zonal u and meridional v components collected for surface winds at 10 m above ground level. They were extracted from the archive of the European Centre for Medium-range Weather Forecasts (ECMWF), a service which issues high-resolution deterministic forecasts twice a day, at 00 Coordinated Universal Time (UTC) and 12 UTC; each of these forecast periods consists of data for 1–48 hours ahead. In order to match the hourly resolution of the power measurements, the forecasts were interpolated using cubic splines to an hourly resolution. A summary of the features found in the dataset is given in table 3.2.


No  Category  Parameter             Alias  Type

1   Date      Date                  date   String
2   Date      Year                  year   Integer
3   Date      Month                 month  Integer
4   Date      Day                   day    Integer
5   Date      Hour                  hours  Integer
6   Date      Week                  week   Integer
7   Forecast  Wind Speed            ws     Real
8   Forecast  Wind Direction (deg)  wd     Real
9   Forecast  Wind U                u      Real
10  Forecast  Wind V                v      Real
11  Forecast  Issued                hp     Integer
12  SCADA     Production            wp     Real

Table 3.2 Preprocessed features from the dataset. The Forecast category represents the forecast given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the feature we will use in training and testing.


The database for the meteorological forecasts given by the GEFCom dataset contains sections of missing data, each corresponding to the dates for which the missing power information exists; these sections were filled out with the previous best available forecast in a pre-processing step. For example, if we do not have data for the forecast issued at 2011-01-01 12:00, we use data from 2011-01-01 00:00, as a 48-hours-ahead forecast is available for that date. If it were the case that the previous section also contained missing data, we would go back an additional section and use those forecasts. If no forecast is available 48 hours back, we use the best known forecast and extend it in the same fashion as the persistence model.

Any predictions made by these models should fall within a certain range, so any forecast outside this range will be clamped. This is the main post-processing step being done, i.e. there is an upper limit on what can be produced.

3.3 Holdback Input Randomization

The Holdback Input Randomization (HIPR) method described in Kemp et al [2007] can be used to investigate the importance of the input parameters. It works by sequentially feeding each data point in the test set to the neural network while replacing the values of one input parameter at a time. This replacement is done with uniformly distributed random values in the range the neural network was originally trained on, i.e. (-1, 1). A NRMSE score is calculated for each replacement. The result is information about the relevance of each input.
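A sketch of the HIPR procedure under these assumptions: `model` is any callable mapping a feature matrix to predictions, inputs were scaled to (-1, 1), and the RMSE computed inline stands in for the NRMSE of section 3.1.4. This is an illustration, not the exact procedure from Kemp et al [2007]:

```python
import numpy as np

def holdback_input_randomization(model, X, targets, rng=None):
    # For each input parameter, replace its column with uniform noise in
    # (-1, 1) and record the resulting error; inputs whose randomization
    # hurts the score the most are the most relevant.
    rng = rng or np.random.default_rng(0)
    scores = []
    for j in range(X.shape[1]):
        X_rand = X.copy()
        X_rand[:, j] = rng.uniform(-1, 1, size=X.shape[0])
        errors = model(X_rand) - targets
        scores.append(np.sqrt(np.mean(errors ** 2)))
    return scores

# Toy model that only uses the first of three inputs: randomizing
# column 0 raises the error while columns 1 and 2 leave it at zero.
model = lambda X: X[:, 0]
X = np.linspace(-1, 1, 50).reshape(-1, 1).repeat(3, axis=1)
scores = holdback_input_randomization(model, X, targets=X[:, 0])
```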

3.4 Optimization methods

The two main optimization algorithms that have been used in this thesis are the Particle Swarm Optimization (PSO) algorithm presented by Eberhart and Kennedy [1995] and the Levenberg-Marquardt (LM)2 algorithm, independently developed by Kenneth Levenberg and Donald Marquardt [Marquardt 1963, Levenberg 1944]. The PSO algorithm is a population-based stochastic algorithm similar to a Genetic Algorithm (GA), but based on social-psychological principles instead of evolution. It

2. The LM algorithm was used in the beginning of the project, but the reported results ended up using the PSO algorithm. I have included the LM algorithm in this section because speed optimization was measured using this algorithm.


can be summarized by the following steps. Each network was trained multiple times in order to avoid local minima.

• Step 1: Initialize particles with random velocities and accelerations.

• Step 2: Determine which particle is closest to the goal.

• Step 3: Adjust accelerations toward that particle.

• Step 4: Update particle positions based on their velocities, and update velocities based on accelerations.

• Step 5: Go to step 2.
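The steps above can be sketched as follows; the swarm size, inertia and acceleration constants are generic textbook values, not the ones used in the thesis:

```python
import random

def pso_minimize(f, dim, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5):
    # Minimal particle swarm optimizer: particles accelerate toward their
    # own best position (pbest) and the swarm's best position (gbest).
    pos = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[random.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Example: minimize a simple quadratic. When training an MLP, f would
# instead evaluate the network's error for a given weight vector.
best, best_val = pso_minimize(lambda x: sum(xi ** 2 for xi in x), dim=3)
```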

The LM algorithm is a combination of the Error Back-Propagation (EBP) method [Rumelhart et al 1988] and the Gauss-Newton method. This algorithm has the speed advantage of Gauss-Newton and the stability of EBP [Yu and Wilamowski 2011]. A detailed treatment of LM can be found in Moré [1978], and of PSO in Poli et al [2007].

3.5 Neural Networks

3.5.1 Multilayer Perceptron

The perceptron is built around a nonlinear model of a neuron, the McCulloch-Pitts model [McCulloch and Pitts 1943]. It basically consists of a 2-step process where the cell body contains a summation function of the weighted sum of all inputs, including a bias. The perceptron is described by equation 3.11. The sum s is passed through an activation function (see section 3.5.1), which mimics the activation, or firing, of the neuron:

s = Σ_{i=1}^{M} w_i x_i + x_0   (3.11)

w_i is the weight of the "synapse" of the input channel and is the parameter we want to adjust, x_i is the input value and x_0 is the bias. M denotes the number of inputs we have.
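Equation 3.11, followed by a tanh activation (equation 3.13, introduced later in this section), can be written directly; the weights and inputs below are made-up values:

```python
import math

def perceptron(x, w, bias):
    # Equation 3.11: weighted sum of the inputs plus the bias,
    # then passed through a tanh activation.
    s = sum(wi * xi for wi, xi in zip(w, x)) + bias
    return math.tanh(s)

out = perceptron(x=[0.5, -0.2], w=[0.8, 0.1], bias=0.05)
```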


[Figure: input signals and a bias, each multiplied by a weight w, are summed (Σ) and passed through the activation function f(s) to produce the output signal.]

Figure 3.1 The perceptron.

The structure of the MLP consists of many perceptrons and is shown in figure 3.2. The input signal flows from the input layer at the bottom to the output layer at the top. We have a bias signal, seen on the left side of the diagram, which is set to a fixed number.

MLPs are fully connected networks, meaning that the neurons in any layer of the network are connected to all the neurons in the previous layer. Each connection in the network has a weight w_ij associated with it. The initialization of these weights is done before any training and is achieved by randomly assigning very small values to the respective weights in the graph. The architecture used in this study has only one output variable, i.e. the function we approximate has an output that produces forecasts of the power generation given a certain input.


[Figure: an input layer with the features hours, u, v, week, ws, ws-1, ws-2, ws+1, ws+2; hidden layers with tanh activations; a linear output layer; and a bias signal feeding each layer.]

Figure 3.2 Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of H hidden layers, as well as M input connections. Each edge seen in this graph has a w_ij associated with it.

The performance of neural networks is generally improved if the data is normalised. This is because using the original data directly can cause convergence problems. Normalization is done using the mapminmax function seen in equation 3.12:

y = (y_max - y_min) · (x - x_min) / (x_max - x_min) + y_min   (3.12)

y_max is the max value of the specified range, which in this case is 1, and y_min is -1; x is the value to be scaled, x_max is the max value of the numbers to be scaled, and x_min is the min value of the numbers to be scaled.
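Equation 3.12 translated directly into code:

```python
def mapminmax(x, x_min, x_max, y_min=-1.0, y_max=1.0):
    # Equation 3.12: rescale x linearly from [x_min, x_max] to [y_min, y_max].
    return (y_max - y_min) * (x - x_min) / (x_max - x_min) + y_min

scaled = [mapminmax(v, 0.0, 10.0) for v in (0.0, 5.0, 10.0)]  # → [-1.0, 0.0, 1.0]
```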

The forecasting model consists of 7 different networks, one for each wind farm; these networks are trained on the first section of the dataset, before the test period. A random 60/20/20 split is created, where 60% of the available data is used for training, 20% of the data is used to validate the network, and 20% is set to be a hold-out set for the hyperparameters. Input features3 fed into the models are

3see table 32


ws, u, v, hours, ws+1, ws+2, ws-1, ws-2, where ws+x and ws-x denote a time shift of x. We can use ws+x as input into the model because ws is a forecast in itself.

Activation Functions

The computation done for each neuron in the multilayer perceptron requires knowledge of the derivative of the activation function; in other words, the activation function we pick needs to be differentiable. In this thesis we use the following two activation functions: the hyperbolic tangent function seen in equation 3.13,

f(s) = tanh(s) = (e^s - e^{-s}) / (e^s + e^{-s})   (3.13)

and the saturating linear transfer function seen in equation 3.14:

f(s) = +1 if s ≥ 1;  s if -1 < s < 1;  -1 if s ≤ -1   (3.14)

Hyperparameter optimization

In order to obtain the hyperparameters necessary for the model of each wind farm, a random hyperparameter search was performed for all models. A hold-out validation set was used to pick the best hyperparameters. Random search has been shown to work better than grid search when not all parameters are equally important [Bergstra and Bengio 2012].

3.5.2 Numenta Platform for Intelligent Computing

This section describes the key principles introduced with NuPIC. It explains in general terms the theory behind the Online Prediction Framework (OPF),4 which uses the CLA and HTM algorithms; the OPF works as an API to create predictive models. The OPF consists of 5 major types of components: Encoders, Spatial Pooler (SP), Temporal Memory (TM), Temporal Pooler (TP) and Classifiers. Together these components construct a single CLA region,5 and figure 3.3 demonstrates the information flow through a single region.

4. The OPF is used with Numenta's commercial product GROK.
5. Currently, models created with the OPF do not use a TP, nor does this client allow creation of a hierarchy of regions, i.e. what we would call a HTM; it is just possible to create one region.


Another central concept in NuPIC is the Sparse Distributed Representation (SDR), which refers to the activation of a small percentage of the neurons at any given time. This neuronal activity is represented as an n-dimensional sparse binary vector x = [b_0, ..., b_n] with around 2% active cells. The outputs from the spatial pooler and temporal memory are both SDRs, while the output from the encoder does not enforce this and is just a normal binary vector. A general overview of the properties of SDRs has been given by Ahmad and Hawkins [2015].

Encoder → Spatial Pooler → Temporal Memory → Classifier

Figure 3.3 Information flow of a single-region predictive model created with the OPF.

The overlap between two different SDRs is defined by

o(x, y) = x · y   (3.15)

A match between two SDRs is defined by

m(x, y) = o(x, y) ≥ θ   (3.16)

where θ is set so that θ ≤ ||x||_1 and θ ≤ ||y||_1. An interesting property of SDRs, something that is used multiple times inside the temporal memory and especially for predictions, is the fact that a set of fixed-size SDRs can be reliably stored by taking their union as a single pattern. The boolean OR-operator is used to create a


Value  Scalar Encoding

1      11111000000000
2      01111100000000
10     00000000011111

Table 3.3 Example where n = 14, r = 5, ψ = 1 of various encoded scalar values using a ScalarEncoder.

new vector from a set of SDRs. The downside of storing patterns this way is that the more patterns we store, the bigger the probability of false positives.
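Equations 3.15 and 3.16 and the union mechanism can be sketched on plain binary vectors:

```python
def overlap(x, y):
    # Equation 3.15: the dot product, i.e. the number of positions
    # where both vectors are active.
    return sum(a & b for a, b in zip(x, y))

def match(x, y, theta):
    # Equation 3.16: true when the overlap reaches the threshold theta.
    return overlap(x, y) >= theta

def union(sdrs):
    # Store a set of SDRs as one pattern with the boolean OR-operator.
    return [int(any(bits)) for bits in zip(*sdrs)]

a = [1, 1, 0, 0, 1, 0]
b = [1, 0, 0, 0, 1, 0]
u = union([a, b])
# Both stored patterns match the union, which is how unions support
# prediction, at the cost of a growing false-positive probability.
```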

Encoders

NuPIC contains many different encoders;6 the job of an encoder is to convert raw input into a more suitable representation (i.e. a binary vector). The raw input is fed into the model using a dictionary data structure.

Useful encoders include scalar and categorical encoders, while more exotic encoders include one for Global Positioning System (GPS) coordinates; this encoder can be used to extract information about anomalous movements. The entries of the dictionary representation of the raw input are each encoded separately and concatenated using a multi-encoder.

One property of the spatial pooler is that overlapping input patterns are mapped to the same SDR. This means that we want the encoder to encode input so that similar inputs share bits. The ScalarEncoder fulfils this property by the process illustrated in table 3.3.

We calculate the range of values to encode using equation 3.17, where v_min represents the minimum value of the input signal and v_max denotes the upper bound of the input signal:

v_range = v_max - v_min   (3.17)

There are three mutually exclusive parameters that determine the overall size of the output from a scalar encoder (n, r, ψ). n directly represents the total number

6. A full list of all encoders can be found in the API documentation for NuPIC.


of bits in the output, and it must be greater than or equal to w, which represents the number of bits that are set to encode a single value, i.e. the "width" of the output signal.7 r and ψ are specified with respect to the input, while w is specified with respect to the output. Two inputs separated by more than the radius have non-overlapping representations. Two inputs separated by less than the radius will in general overlap in at least some of their bits. Two inputs separated by greater than or equal to the resolution are guaranteed to have different representations. ψ can be calculated using equation 3.18:

ψ = r / w   (3.18)

Depending on whether we want a periodic behaviour or not, n is calculated a bit differently; see the scalar encoder for implementation details.8
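A simplified non-periodic scalar encoding that reproduces table 3.3 (a sketch of the principle, not NuPIC's actual ScalarEncoder; parameter names follow the text):

```python
def encode_scalar(value, v_min=1, v_max=10, w=5, resolution=1):
    # w consecutive bits are set, and the start position shifts with
    # the value, so nearby values share bits while values separated
    # by at least the resolution differ.
    n_buckets = int((v_max - v_min) / resolution) + 1
    n = n_buckets + w - 1  # total number of output bits (here 14)
    bucket = int((min(max(value, v_min), v_max) - v_min) / resolution)
    return [1 if bucket <= i < bucket + w else 0 for i in range(n)]

print("".join(map(str, encode_scalar(1))))   # → 11111000000000
print("".join(map(str, encode_scalar(2))))   # → 01111100000000
print("".join(map(str, encode_scalar(10))))  # → 00000000011111
```

Note how the encodings of 1 and 2 overlap in four bits, while 1 and 10 share none, which is exactly the similarity-preserving property the spatial pooler needs.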

Spatial Pooler

The spatial pooler receives a binary vector as its input from an encoder and willoutput a SDR The structure of the spatial pooler consist of a input space and a setof columns The output SDR represents which of the columns in a region that areactive Each column have synapses connected to the input space The SP consistsof around 50 randomly and potentially connected synapses this is called theldquopotential poolrdquo Each synapse will connect and disconnect with the input spaceduring learning

The general information flow of the spatial pooler consists of the following steps: input stimuli from lower regions lead to activation of the input space, to which each mini-column is synaptically connected. This activation, the so-called "overlap score", is computed for each column as the total sum of the active inputs that try to influence that column, weighted with a "boosting factor" that increases the chance for certain columns to win more easily. A final top percentage (usually around 2% activity) of the columns with the highest influence (biggest overlap score) is chosen; columns also inhibit nearby columns, and the result of this process is a binary vector with few active bits, an SDR. This is illustrated in equation 3.19, where b can either be 0 or 1 and s is a value representing the score.

7 w must be odd to avoid centering problems.
8 https://github.com/numenta/nupic


$$
\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \cdots & b_n \end{bmatrix}}_{\text{Input vector}}
\cdot
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \cdots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \cdots & b_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \cdots & b_{dn}
\end{bmatrix}}_{\text{Connected synapses for each column}}
=
\underbrace{\begin{bmatrix} s_1 & s_2 & \cdots & s_n \end{bmatrix}}_{\text{Overlap score}}
\xrightarrow{\text{Inhibition}}
\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \cdots & b_n \end{bmatrix}}_{\text{Output SDR}}
\tag{3.19}
$$

Learning in this structure is done by adjusting the permanences of the proximal dendrite to better match the input: for each winning column, increase the permanence of the synapses that correctly matched the input and decrease the permanence of the rest. We also increase the boosting factor of losing columns to give them a bigger chance of winning next time.
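The overlap-and-inhibition computation of equation 3.19 can be sketched as follows. This is a simplified, assumed implementation (global inhibition only, learning omitted); the names `spatial_pool`, `potential`, and `boost` are illustrative, not NuPIC's API.

```python
import numpy as np

def spatial_pool(input_bits, potential, permanence, boost, sparsity=0.02,
                 threshold=0.5):
    """One global-inhibition step of a spatial-pooler-like computation.

    A column's overlap score is the boosted count of its connected synapses
    (permanence above threshold, within the potential pool) that align with
    active input bits; the top ~2% of columns win.
    """
    connected = (permanence >= threshold) & potential     # connected synapses
    overlap = boost * (connected.astype(int) @ input_bits)  # score per column
    k = max(1, int(sparsity * len(overlap)))              # number of winners
    winners = np.argsort(overlap)[-k:]                    # global inhibition
    sdr = np.zeros(len(overlap), dtype=int)
    sdr[winners] = 1
    return sdr

rng = np.random.default_rng(0)
n_cols, n_in = 200, 64
potential = rng.random((n_cols, n_in)) < 0.5   # ~50% potential pool
permanence = rng.random((n_cols, n_in))
boost = np.ones(n_cols)
x = (rng.random(n_in) < 0.3).astype(int)
sdr = spatial_pool(x, potential, permanence, boost)
print(sdr.sum())  # 4 active columns out of 200, i.e. 2% sparsity
```

Whatever the input looks like, the output always has the same small number of active bits, which is what makes it a sparse distributed representation.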

Temporal Memory

The temporal memory receives an SDR from the spatial pooler, which represents the active columns; the output of the temporal memory is the activity of the whole CLA region. The general idea of the TM9 is that the cells in each mini-column provide temporal context to the pattern identified by the spatial pooler. The TM's job is to form transitions between different SDRs, and it achieves this by allowing active cells to form connections to previously active cells, so that in a future setting each cell is able to predict its own activity.

Each cell in a region has several distal segments, ideally one for each pattern the cell has transitioned from; in practice these segments are each connected to a couple of patterns. A cell enters a predictive state if there is enough activity on a segment, and every segment consists of a collection of connections, synapses that have been formed to a subset of previously active cells (typically around 10-15 cells).

The temporal memory receives a sparse binary vector from the spatial pooler, and the general information flow for this part of the CLA goes through two phases. The first phase has the following steps: 1) for each active column, we check whether there is any cell in a predictive

9 There are a lot of details in how the temporal memory is implemented; pseudo-code and more details can be found in [Numenta 2011], and the nupic git repository is the best source for finer details: https://github.com/numenta/nupic


state; if so, then 2) activate that particular cell. If no cell was found in a predictive state, then 3) activate all cells in that particular column, a process called bursting.

Bursting is analogous to saying: we do not know which cell to activate, because we have not seen this instance of the sequence of patterns before and are unable to put the column into the correct temporal context, so we activate all cells to reflect this uncertainty. Bursting does not occur when the temporal context has been predicted by the CLA. Equation 3.20 shows the first phase.

$$
\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \cdots & b_n \end{bmatrix}}_{\text{SP SDR}}
\cdot
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \cdots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \cdots & b_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \cdots & b_{dn}
\end{bmatrix}}_{\text{Predictive State}}
=
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \cdots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \cdots & b_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \cdots & b_{dn}
\end{bmatrix}}_{\text{Active State}}
\tag{3.20}
$$

The second phase of the algorithm figures out which cells should be put into a predictive state for the next time-step. This is achieved by checking every distal segment on every cell for activity above a certain threshold, i.e. checking for active segments. One active segment is enough to put a cell into a predictive state. The output from the temporal memory is a vector representing the state of all cells in the region. Equation 3.21 shows phase 2.

$$
\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \cdots & b_n \end{bmatrix}}_{\text{Active State}}
\cdot
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \cdots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \cdots & b_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \cdots & b_{dn}
\end{bmatrix}_X}_{\text{Segment } X}
=
\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \cdots & b_n \end{bmatrix}_X}_{\text{Segment Activation } X}
> \tau
\;\rightarrow\;
\underbrace{\begin{bmatrix} s_1 & s_2 & s_3 & \cdots & s_n \end{bmatrix}}_{\text{Predictive State}}
\tag{3.21}
$$

If learning is turned on, the permanences of the synapses connected to active distal segments are updated (in the same way as in the spatial pooler). These changes are marked as temporary until we are sure that the cell is in fact able to correctly predict something with the new changes; once this is known, the change is either made permanent or removed. Temporarily marked changes are committed whenever the cell goes from being inactive to active from feed-forward input (make the update permanent, as we correctly predicted the feed-forward activation). If the cell instead went from active to inactive, undo the change.
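The two phases above, phase 1 with bursting (equation 3.20) and phase 2 with segment activation (equation 3.21), can be sketched in a few lines. This is an assumed, heavily simplified model: `tm_step` and the segment representation are illustrative, and permanence learning is omitted.

```python
import numpy as np

def tm_step(active_cols, predictive, segments, tau=2):
    """One temporal-memory step, sketching phases 1 and 2.

    active_cols : (n,) 0/1 vector of active columns (the SP SDR).
    predictive  : (d, n) bool matrix of cells predicted at the last step.
    segments    : list of ((i, j), mask) pairs -- cell (i, j) has a distal
                  segment whose synapses connect to the cells marked True
                  in the (d, n) bool `mask`.
    """
    d, n = predictive.shape
    active = np.zeros((d, n), dtype=bool)
    for col in np.flatnonzero(active_cols):
        if predictive[:, col].any():
            active[:, col] = predictive[:, col]   # phase 1: predicted cell wins
        else:
            active[:, col] = True                 # bursting: no temporal context
    next_predictive = np.zeros((d, n), dtype=bool)
    for (i, j), mask in segments:
        if (active & mask).sum() > tau:           # phase 2: active segment
            next_predictive[i, j] = True
    return active, next_predictive

# Example: columns 1 and 3 become active with no prior prediction, so they
# burst; a segment synapsed onto column 1's cells then predicts cell (0, 5).
predictive = np.zeros((4, 8), dtype=bool)
cols = np.zeros(8, dtype=int)
cols[[1, 3]] = 1
seg = np.zeros((4, 8), dtype=bool)
seg[:, 1] = True
active, nxt = tm_step(cols, predictive, [((0, 5), seg)], tau=2)
```

Because nothing was predicted, both active columns burst (all four cells activate), and the single segment sees enough activity to put its cell into the predictive state for the next step.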


NuPIC Classifiers

There are many different classifiers included with NuPIC, such as a k-Nearest Neighbour (kNN) classifier and a Support Vector Machine (SVM); the OPF in particular uses a custom-built classifier called the "CLAClassifier". The purpose of this classifier is to decode predictions made by the CLA. Classification with the CLAClassifier is performed in different ways depending on how the input has been encoded.

Scalar values are decoded using a process where each cell in a CLA region is paired with two histograms. One histogram keeps track of the frequency of encountered patterns associated with each cell; the other keeps track of a moving average for each bucket. This process is illustrated in figure 3.4.

Figure 3.4: The CLAClassifier: two histograms per cell, tracking the likelihood of each value bucket and a moving average per bucket, from the minimum to the maximum value.
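The decoding scheme described above can be sketched as follows. This `HistogramDecoder` is a hypothetical simplification of the CLAClassifier idea, not its actual implementation: each active cell votes for the value buckets it has co-occurred with, and the prediction is the running mean of the winning bucket. The real classifier additionally handles multi-step lookups and likelihood distributions.

```python
from collections import defaultdict

class HistogramDecoder:
    """Sketch of histogram-based SDR decoding (illustrative only)."""

    def __init__(self, alpha=0.3):
        # cell -> bucket -> co-occurrence frequency (first histogram)
        self.counts = defaultdict(lambda: defaultdict(int))
        # bucket -> moving average of actual values (second histogram)
        self.means = {}
        self.alpha = alpha

    def learn(self, active_cells, bucket, actual_value):
        for c in active_cells:
            self.counts[c][bucket] += 1
        m = self.means.get(bucket, actual_value)
        self.means[bucket] = m + self.alpha * (actual_value - m)

    def predict(self, active_cells):
        votes = defaultdict(int)
        for c in active_cells:
            for bucket, freq in self.counts[c].items():
                votes[bucket] += freq
        best = max(votes, key=votes.get)   # most likely bucket
        return self.means[best]            # decoded scalar value

dec = HistogramDecoder()
for _ in range(3):
    dec.learn({1, 2, 3}, bucket=0, actual_value=10.0)
    dec.learn({6, 7}, bucket=1, actual_value=20.0)
```

After a few learning steps, presenting the cell pattern {1, 2, 3} decodes back to roughly 10.0 and {6, 7} to roughly 20.0.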

Training in NuPIC

Training the NuPIC model is done using online learning algorithms. The schema seen in figure 3.5 is used to train and test these models. We have 7 different wind farms, so this schema is applied to each respective wind farm.

This schema is slightly adjusted from the traditional way the OPF handles multi-step predictions. The default way of doing this is to use 48 different models, one for


every step ahead in the CLAClassifier, which would give us 48 · 7 = 336 models in total (for all the wind farms). Not only does this change match Expektra's approach, but it also reduces the memory requirements for the CLAClassifier.

Figure 3.5: Training an OPF model. Hyperparameter setup is done via PSO swarming or manual setup on a pre-training data chunk; the training phase feeds the training data stream with online learning activated, and the testing phase feeds the testing data with online learning deactivated, producing the multistep predictions.

Input and hyperparameter selection

Finding hyperparameters for an OPF model was partly done using the custom built-in PSO algorithm and partly configured manually, mainly by ensuring that the PSO algorithm did not remove encoders. A single CLA region contains a lot of hyperparameters; these parameters are included with corresponding descriptions in appendix A. The inputs to the model are date, ws, wp, u, v.


Chapter 4

Result

This chapter contains the primary results obtained in this study. Figures 4.4-4.10 contain different error measurements on the test set for each respective wind farm. We see that Expektra's ANN model performs well over all wind farms, while the NuPIC model performs worse than Expektra but in general still better than our reference model. NuPIC is off target, with a bias error on most wind farms. The graphs also indicate that NuPIC is unable to pick up some trends, visible in the cumulated ε² graph. Expektra's model, on the other hand, shows no clear problems in cumulated ε² and is on target on all wind farms; appendix C has been included to reflect this for different lead times.
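The error measures used throughout this chapter can be written down compactly. The definitions below are a sketch following the standardized evaluation measures of Madsen et al. [2005]; since GEFCom power is already normalized, the capacity defaults to 1, and the sign convention for the bias (forecast minus observation) is an assumption.

```python
import numpy as np

def nbias(p, p_hat, capacity=1.0):
    """Normalized bias: mean forecast error over installed capacity."""
    return np.mean(p_hat - p) / capacity

def nmae(p, p_hat, capacity=1.0):
    """Normalized mean absolute error."""
    return np.mean(np.abs(p_hat - p)) / capacity

def nrmse(p, p_hat, capacity=1.0):
    """Normalized root mean square error."""
    return np.sqrt(np.mean((p_hat - p) ** 2)) / capacity

def cumulated_eps2(p, p_hat):
    """Running sum of squared errors over time (the cumulated eps^2 curves)."""
    return np.cumsum((p_hat - p) ** 2)
```

A model with a flat cumulated ε² segment is making no errors there, while a systematic offset shows up in the NBIAS curve rather than in NMAE or NRMSE.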

Figure 4.1: Left diagram: cumulative probability of wind speed. Right diagram: scatter diagram of the power curve, normalized power output vs wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


Figure 4.2: Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).

Figure 4.3: Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


Figure 4.4: Different error measurements for WF 1 (NBIAS, NRMSE, and NMAE vs look-ahead time, and cumulated ε² over time, for Expektra, NuPIC, and Persistence).


Figure 4.5: Different error measurements for WF 2 (NBIAS, NRMSE, and NMAE vs look-ahead time, and cumulated ε² over time, for Expektra, NuPIC, and Persistence).


Figure 4.6: Different error measurements for WF 3 (NBIAS, NRMSE, and NMAE vs look-ahead time, and cumulated ε² over time, for Expektra, NuPIC, and Persistence).


Figure 4.7: Different error measurements for WF 4 (NBIAS, NRMSE, and NMAE vs look-ahead time, and cumulated ε² over time, for Expektra, NuPIC, and Persistence).


Figure 4.8: Different error measurements for WF 5 (NBIAS, NRMSE, and NMAE vs look-ahead time, and cumulated ε² over time, for Expektra, NuPIC, and Persistence).


Figure 4.9: Different error measurements for WF 6 (NBIAS, NRMSE, and NMAE vs look-ahead time, and cumulated ε² over time, for Expektra, NuPIC, and Persistence).


Figure 4.10: Different error measurements for WF 7 (NBIAS, NRMSE, and NMAE vs look-ahead time, and cumulated ε² over time, for Expektra, NuPIC, and Persistence).


                                   Wind Farm
User           1      2      3      4      5      6      7      All
Leustagos      0.145  0.138  0.168  0.144  0.158  0.133  0.140  0.146
DuckTile       0.143  0.145  0.172  0.145  0.165  0.137  0.146  0.148
MZ             0.141  0.151  0.174  0.145  0.167  0.141  0.145  0.149
Propeller      0.144  0.153  0.177  0.147  0.175  0.141  0.147  0.152
Duehee Lee     0.157  0.144  0.176  0.160  0.169  0.154  0.148  0.155
Expektra       0.165  0.158  0.184  0.164  0.179  0.153  0.153  0.165
MTU EE5260     0.161  0.172  0.193  0.162  0.192  0.156  0.160  0.168
SunWind        0.174  0.177  0.193  0.176  0.179  0.157  0.162  0.172
ymzsmsd        0.163  0.186  0.200  0.164  0.192  0.162  0.167  0.174
4138 Kalchas   0.180  0.179  0.197  0.175  0.200  0.160  0.165  0.177
NuPIC          0.243  0.254  0.264  0.310  0.290  0.224  0.240  0.264

Persistence    0.302  0.338  0.373  0.364  0.388  0.341  0.361  0.355

Table 4.1: NRMSE scores of the entries published in [Hong et al. 2014]. The NuPIC model and Expektra model are added so we can easily compare the results.

4.1 Experimental results

Looking at the primary results presented in this study, we see in table 4.1 that Expektra's ANN model is able to achieve performance comparable to the other methods published with Hong et al. [2014]. This can be used as a starting point for further investigations. A similar approach to Expektra's is Duehee Lee's [Lee and Baldick, 2014], as both methods are based on the same neural network architecture but use different optimization techniques. Duehee Lee's network is structured slightly differently from Expektra's model, which consists of only 1 output node, whereas Duehee Lee uses five (representing five predictions ahead). Besides these differences, Duehee Lee also includes additional sub-models to create a prediction ensemble, blending the output of the ANN with a Gaussian Process (GP) to improve very short-term predictions. MTU EE5260 is also a neural network approach that converts meteorological forecasts to power, and here the score is very close. The HTM CLA beats the persistence model but needs more work to outperform the other models in GEFCom.


4.2 Input Importance

For interpretation purposes, in order to understand the model better, an analysis of the relative input parameter importance was performed. This analysis is illustrated in figure 4.11. Each box represents added noise to that channel, which results in a higher NRMSE score if that feature was important. A reference point, "all-channels", represents the error distribution of the model with no input replacement. We clearly see in this figure that the wind speed channel ws is the most important attribute; adding noise to this channel greatly affects the NRMSE score. We also see that the wind components u, v show little to no influence, and that the timestamp-related inputs hours and week both indicate that there are seasonal and daily trends present in the dataset.
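The noise-perturbation analysis can be sketched as follows. This is an assumed simplification in the spirit of the HIPR method of Kemp et al. [2007]: each input channel is replaced in turn by random values drawn from its own range, and the increase in error over the unperturbed "all-channels" reference indicates that channel's importance. The names `input_importance`, `model_fn`, and `error_fn` are illustrative.

```python
import numpy as np

def input_importance(model_fn, X, y, error_fn, rng=None):
    """HIPR-style importance: error increase when one channel is randomized.

    model_fn maps an input matrix to predictions; the returned dict maps a
    channel index to its error increase over the unperturbed baseline.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    baseline = error_fn(y, model_fn(X))            # "all-channels" reference
    scores = {}
    for j in range(X.shape[1]):
        Xp = X.copy()
        # replace channel j with noise drawn from its own observed range
        Xp[:, j] = rng.uniform(X[:, j].min(), X[:, j].max(), size=len(X))
        scores[j] = error_fn(y, model_fn(Xp)) - baseline
    return scores
```

A channel the model ignores leaves the error unchanged (score near zero), while randomizing a channel the model depends on, such as wind speed here, raises the error sharply.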

Figure 4.11: Relative input parameter importance using HIPR, measured as NRMSE. "all-channels" reflects the reference point, i.e. the network when no channel has been exposed to noise; the remaining boxes are hours, u, v, week, and the wind-speed channels ws and its lagged/lead variants ws−1 ... ws−3 and ws+1 ... ws+3. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind.


4.2.1 Adaptation and Optimization

All training for Expektra's model was performed on a MacBook Pro with an Intel Core 2 Duo processor (P8600, 2.40 GHz) on a 64-bit operating system with 4 GB of installed RAM. After adapting the code to run experiments using the GEFCom dataset, an investigation was performed to evaluate the performance of the core source code provided by Expektra. This investigation identified some slow parts of the code; after fixing these issues, mainly by using a MathNET native library instead of the managed provider, a speed test was performed comparing the unoptimized version with the optimized one. Figure 4.12 shows the speed-up achieved during training using the LM algorithm.

Figure 4.12: Training time of the unoptimized ("Normal") version vs the optimized one when training a network with 10 input neurons; 10, 15, 20, or 25 hidden neurons; and 1 output neuron. Training was done for 100 epochs using LM.

4.3 Summary

A plot showing the average NRMSE improvement over the persistence model can be seen in figure 4.13. We see that both models are able to perform better than persistence. Expektra's model performs better than the NuPIC model over the whole forecast


horizon, and NuPIC performs better than persistence towards the end of the forecast.
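The persistence baseline and the improvement measure plotted in figure 4.13 can be sketched as follows; the alignment convention (forecasts issued at time t for t + k) is an assumption, and the function names are illustrative.

```python
import numpy as np

def persistence_forecast(p, k):
    """Persistence reference model: p_hat(t + k | t) = p(t).

    Returns forecasts aligned against the observations p[k:].
    """
    return p[:-k]

def improvement_over_persistence(p, p_hat, k):
    """Percentage NRMSE improvement of a k-step forecast over persistence."""
    err = lambda a, b: np.sqrt(np.mean((a - b) ** 2))
    e_model = err(p[k:], p_hat)
    e_persist = err(p[k:], persistence_forecast(p, k))
    return 100.0 * (e_persist - e_model) / e_persist
```

A perfect forecast scores 100%, persistence itself scores 0%, and a forecast worse than persistence comes out negative, matching how the curves in figure 4.13 are read.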

Figure 4.13: Summarized average improvement (% NRMSE) over persistence, over all wind farms and look-ahead times, with 95% confidence intervals, for Expektra and NuPIC.


Chapter 5

Discussion

With the exponential growth of computer power it becomes easier to study deeper and more complex networks, and with recent advances in the area of deep learning a new interest in ANNs has resurged. In practice, ANNs have been used by most groups in the field of WPF, but these networks never caught on, as it was argued that the improvements made by ANNs were usually not enough to justify the extra effort of training them [Giebel et al. 2011]. This is steadily becoming less of an issue as computation becomes cheaper and bigger networks perform better.

Using a simple MLP network, we observe that we are able to obtain results similar in performance to the other models published in Hong et al. [2014]. This can be used as a starting point for further investigations. The GEFCom dataset has a limited number of input features; if we had access to a wider range of them, the power of neural networks could be studied more in depth, but given the number of features in the dataset this is harder to do.

Using persistence as a reference model was done in order to have the same baseline as the one used in GEFCom, but a better reference model should be considered in the future, such as the one presented in Nielsen et al. [1998], especially if longer forecasts are to be considered.

5.1 Method development issues

Working with Expektra's model is very straightforward and no major issues were encountered during development, but some issues were encountered working with NuPIC.1

1 It should also be pointed out that it helps to have the people who developed the code on the spot, and this is probably the main reason for the little trouble.


1. Installing NuPIC is difficult. This seems to be a general problem, judging by posts on the NuPIC mailing list.2 A very important issue to fix if they want more people to work with this model.

2. The built-in swarming (PSO) did not work well, especially for slightly larger datasets and for 48 prediction steps. The main problems were fitting everything into memory and that the code ran too slowly when swarming over multiple steps ahead, making it practically impossible to use. Scaling down the problem to just a few prediction steps and multiple models helped, but the swarm model kept dismissing wind speed as an important feature, which was fixed by specifically telling it not to touch that encoder.

3. Inconsistencies between the white-paper documentation of the HTM CLA and the actual implementation make it hard to understand the underlying principles of NuPIC. NuPIC presents a very complex network, and any inconsistency in the documentation makes it hard to understand. The code is quite well documented, but the additional material is a bit sparse.

These issues are all understandable and are continuously being improved upon. NuPIC is still a young platform, so finding issues is expected given the complexity and research nature of the project. Training the CLAClassifier to use many prediction steps instead of just one will most likely reduce the bias error seen in the results; to investigate this, a more powerful computer is needed.

There are some differences in the inputs between the models. This setup was created because an OPF model needs to encode the value it is going to predict. SCADA data is available, so this is not an issue, but it may introduce some unfairness between NuPIC and Expektra. In the current setup NuPIC does not seem to gain anything extra from the additional information, and the other models in the competition most likely used different inputs as well.

In general, NuPIC is different in the way you feed it data: you send in only the front of the signal, and temporal context is achieved through the temporal memory.

5.2 Future improvements and directions

One issue with the GEFCom dataset is that it does not contain any specific information about the location of the 7 wind farms. It is worth considering how to handle different models that are specialized for different types

2 http://numenta.org/lists


of terrain, as this has been shown to increase performance [Kariniotakis et al. 2006]. Another thing to investigate could be a more advanced schema for hyperparameter selection: Jursa and Rohrig [2008] show that the WPF error can be reduced by smarter use of optimization algorithms for feature selection and hyperparameter selection, which should improve performance.

In general, having more types of inputs from sources like SCADA and NWP systems could give a lot of valuable information for better predictions. It is clear that wind speed is a key feature for producing good forecasts, which is also supported by the input analysis in section 4.2. It has been argued that the error in WPF models stems largely from wrong weather forecasts [Giebel et al. 2011], so identifying good sources of weather forecasts would be desirable for more effective model learning. Combining several different weather forecasts could also be investigated, as Nielsen et al. [2007] showed that power forecasts based on a number of different meteorological forecasts were better than those based on a single source.

Regarding the HTM CLA, it is worth considering implementing a custom encoder for it, one specifically targeted at data concerning wind farms. SCADA data could also be streamed directly into a NuPIC model, which would allow for some interesting online predictions and anomaly detection.

Training times are always an issue, especially for very large models. Both NuPIC and Expektra's model would benefit from being implemented in a more parallel fashion, as these algorithms are well suited for parallel computing. The speed-up would be of great value, so looking into and implementing a GPU version of the algorithms could be worthwhile.

5.3 Conclusions

The field of energy forecasting is still young, and the necessity for good forecasting techniques is constantly growing. This initial study shows that part of Expektra's forecasting method can be used to forecast wind power generation. We observe that this ANN is able to achieve performance comparable to other methods published in Hong et al. [2014]; additional work is still needed to obtain state-of-the-art results. Numenta's model is able to beat the reference model but still needs additional work before we can draw definite conclusions about its performance. Constructing a custom encoder could be a first step in that direction.


Bibliography

Subutai Ahmad and Jeff Hawkins. Properties of sparse distributed representations and their application to hierarchical temporal memory. arXiv preprint arXiv:1503.07469, 2015.

TC Akinci. Short term wind speed forecasting with ANN in Batman, Turkey. Elektronika ir Elektrotechnika, 107(1):41–45, 2015.

E Michael Azoff. Neural network time series forecasting of financial markets. John Wiley & Sons, Inc., 1994.

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13(1):281–305, 2012.

Daniel P Buxhoeveden and Manuel F Casanova. The minicolumn hypothesis in neuroscience. Brain, 125(5):935–951, 2002.

Erasmo Cadenas and Wilfrido Rivera. Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks. Renewable Energy, 34(1):274–278, 2009.

Dmitri B Chklovskii, BW Mel, and K Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782–788, 2004.

A Costa, A Crespo, J Navarro, G Lizcano, H Madsen, and E Feitosa. A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews, 12(6):1725–1744, August 2008. ISSN 13640321. doi: 10.1016/j.rser.2007.01.015. URL http://dx.doi.org/10.1016/j.rser.2007.01.015.

Russ C Eberhart and James Kennedy. A new optimizer using particle swarm theory. In Proceedings of the sixth international symposium on micro machine and human science, volume 1, pages 39–43. New York, NY, 1995.


Shu Fan, James R Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. Energy Conversion, IEEE Transactions on, 24(2):474–482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento - a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276. Citeseer, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition EWEC 2008, 6 pages. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving hierarchical temporal memory-based trading models. Springer, 2013.

MA Gaertner, C Gallardo, C Tejeda, N Martínez, S Calabria, N Martínez, and B Fernández. The Casandra project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: The next generation prediction system. In Proc. of the 2001 European Wind Energy Conference, EWEC'01, Copenhagen, Denmark, pages 777–780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: A literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G Lobo. Sipreólico - wind power prediction tool for the Spanish peninsular power system. Proceedings of the CIGRÉ 40th general session and exhibition, París, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357–363, 2014.


Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41–46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585–592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694–709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference, EWEC'07, 2007.

G Kariniotakis. Position paper on Joule project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J Halliday, R Brownsword, Ignacio Marti, Ana Maria Palomares, I Cruz, H Madsen, TS Nielsen, Henrik Aa Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power: overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, 10 pages, 2006.

GN Kariniotakis, GS Stavrakakis, and EF Nogaret. Wind power forecasting using advanced neural networks models. Energy Conversion, IEEE Transactions on, 11(4):762–767, 1996.

Stanley J Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326–334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275–293, 2009.

Lars Landberg and Simon J Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171–195, 1994.

S Lang, C Mohrlen, J Jorgensen, B Ó Gallachóir, and E McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International Solar Energy Society UK Section Conference C, volume 85, page 89, 2006.


SM Lawan, WAWZ Abidin, WY Chai, A Baharun, and T Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760–1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. Smart Grid, IEEE Transactions on, 5(1):501–510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets, 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164–168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313–2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154–3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa Nielsen, and Torben S Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475–489, 2005.

Donald W Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431–441, 1963.

Warren S McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons. 1969.

Marvin Lee Minsky and Oliver G Selfridge. Learning in random nets. MIT Lincoln Laboratory, 1960.

Jorge J Moré. The Levenberg-Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105–116. Springer, 1978.

Henrik Aa Nielsen, Torben S Nielsen, Henrik Madsen, Maria J Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471–482, 2007.


Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29–34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power, 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159–167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms, v0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33–57, 2007.

A Rodrigues, JA Peças Lopes, P Miranda, L Palma, C Monteiro, R Bessa, J Sousa, C Rodrigues, and J Matos. EPREV - a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference, EWEC, volume 7, 2007.

David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5:3, 1988.

Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85–117, 2015.

S Sinkevicius, R Simutis, and V Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91–96, 2011.

Ke-Sheng Wang, Vishal S Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61–69, 2014.

WWEA. 2014 half-year report, WWEA, pp. 1–8. Technical report, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365–376, 2013.

Hao Yu and Bogdan M Wilamowski. Levenberg-Marquardt training. Industrial Electronics Handbook, 5:12–1, 2011.


Appendix A

Hyper-parameters

Parameters for the spatial pooler

Alias                Default  Description

columnCount          -        The number of cell columns in a cortical region.
globalInhibition     false    If true, during the inhibition phase the winning
                              columns are selected as the most active columns
                              from the region as a whole.
numActivePerInhArea  10       The maximum number of active columns per
                              inhibition area.
synPermActiveInc     0.1      The amount by which an active synapse is
                              incremented in each round. Specified as a
                              percent of a fully grown synapse.
synPermConnected     0.10     Controls the threshold of when synapses are
                              connected.
synPermInactiveDec   0.01     The amount by which an inactive synapse is
                              decremented in each round.
potentialRadius      16       This parameter determines the extent of the
                              input that each column can potentially be
                              connected to.

Table A.1: Table containing configuration parameters for the spatial pooler
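The increment/decrement parameters above drive a simple permanence update each learning round. A minimal sketch of that rule, illustrative only and not NuPIC's actual implementation, with the table's default values as arguments:

```python
def update_permanences(perms, active, inc=0.1, dec=0.01):
    """One learning round: active synapses gain `inc` (synPermActiveInc),
    inactive ones lose `dec` (synPermInactiveDec), clipped to [0.0, 1.0]."""
    return [min(1.0, p + inc) if a else max(0.0, p - dec)
            for p, a in zip(perms, active)]

print(update_permanences([0.5, 0.005], [True, False]))  # [0.6, 0.0]
```

The clipping keeps permanences in the [0, 1] range assumed by the connectedness threshold (synPermConnected).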


Parameters for the scalar encoder

Alias       Symbol  Description

w           w       Number of bits to set in the output.
minval      vmin    The lower bound of the input value.
maxval      vmax    The upper bound of the input value.
n           n       Number of bits in the representation (n must be > w).
radius      r       Inputs separated by more than or equal to this distance
                    will have non-overlapping representations.
resolution  ψ       Inputs separated by more than or equal to this distance
                    will have different representations.

Table A.2: Table containing configuration parameters for the scalar encoder
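The encoder parameters above can be illustrated with a minimal sketch of a contiguous scalar encoder. This is a simplification of NuPIC's ScalarEncoder that assumes clipping at the bounds and the non-periodic case only:

```python
def scalar_encode(value, minval, maxval, n, w):
    """Map `value` onto a run of `w` active bits inside an n-bit array.
    Nearby values share bits; values further apart than the radius do not."""
    assert n > w
    value = max(minval, min(maxval, value))  # clip to [minval, maxval]
    # index of the first active bit; there are n - w + 1 possible positions
    i = int(round((value - minval) / (maxval - minval) * (n - w)))
    return [1 if i <= j < i + w else 0 for j in range(n)]

print(scalar_encode(0, 0, 10, 14, 3))   # three active bits at the left edge
print(scalar_encode(10, 0, 10, 14, 3))  # three active bits at the right edge
```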


Parameters for the temporal memory

Alias                  Default  Description

activationThreshold    12       Activation threshold for segments.
cellsPerColumn         32       Number of cells per column.
columnCount            2048     The number of cell columns in a cortical region.
globalDecay            0.10     Decrements all synapses a little bit all the time.
initialPerm            0.11     Initial permanence value for a synapse.
inputWidth             -        Size of the input.
maxAge                 100000   Controls global decay. Global decay will only
                                decay segments that have not been activated
                                for maxAge iterations, and will only do the
                                global decay loop every maxAge iterations.
maxSegmentsPerCell     -        The maximum number of segments a cell can have.
maxSynapsesPerSegment  -        The maximum number of synapses a segment can have.
minThreshold           8        The minimum required activity for a segment to learn.
newSynapseCount        15       The maximum number of synapses added to a segment
                                during learning.
permanenceDec          0.10     How much permanence is removed from synapses
                                when learning occurs.
permanenceInc          0.10     How much permanence is added to synapses
                                when learning occurs.
temporalImp            cpp/py   Controls what temporal memory implementation to use.

Table A.3: Table containing configuration parameters for the temporal memory


Appendix B

Wind characteristics

Figure B.1: Wind characteristics for WF 1 and WF 2. The GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Figure B.2: Wind characteristics for WF 3-7. The GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Appendix C

Error Distribution


[Figure: six histograms of the error, one per lead time (48, 40, 30, 20, 10, 1); x-axis: error from −1.0 to 1.0, y-axis: frequency 0-70; each panel titled "wf1 using nupic".]

Figure C1 Error distribution for different lead times WF 1

[Figure: six histograms of the error, one per lead time (48, 40, 30, 20, 10, 1); x-axis: error from −1.0 to 1.0, y-axis: frequency 0-70; each panel titled "wf2 using nupic".]

Figure C2 Error distribution for different lead times WF 2

[Figure: six histograms of the error, one per lead time (48, 40, 30, 20, 10, 1); x-axis: error from −1.0 to 1.0, y-axis: frequency 0-70; each panel titled "wf3 using nupic".]

Figure C3 Error distribution for different lead times WF 3

[Figure: six histograms of the error, one per lead time (48, 40, 30, 20, 10, 1); x-axis: error from −1.0 to 1.0, y-axis: frequency 0-70; each panel titled "wf4 using nupic".]

Figure C4 Error distribution for different lead times WF 4

[Figure: six histograms of the error, one per lead time (48, 40, 30, 20, 10, 1); x-axis: error from −1.0 to 1.0, y-axis: frequency 0-70; each panel titled "wf5 using nupic".]

Figure C5 Error distribution for different lead times WF 5

[Figure: six histograms of the error, one per lead time (48, 40, 30, 20, 10, 1); x-axis: error from −1.0 to 1.0, y-axis: frequency 0-70; each panel titled "wf6 using nupic".]

Figure C6 Error distribution for different lead times WF 6

[Figure: six histograms of the error, one per lead time (48, 40, 30, 20, 10, 1); x-axis: error from −1.0 to 1.0, y-axis: frequency 0-70; each panel titled "wf7 using nupic".]

Figure C7 Error distribution for different lead times WF 7

[Figure: six histograms of the error, one per lead time (48, 40, 30, 20, 10, 1); x-axis: error from −1.0 to 1.0, y-axis: frequency 0-70; each panel titled "wf1 using expektra".]

Figure C8 Error distribution for different lead times WF 1

[Figure: six histograms of the error, one per lead time (48, 40, 30, 20, 10, 1); x-axis: error from −1.0 to 1.0, y-axis: frequency 0-70; each panel titled "wf2 using expektra".]

Figure C9 Error distribution for different lead times WF 2

[Figure: six histograms of the error, one per lead time (48, 40, 30, 20, 10, 1); x-axis: error from −1.0 to 1.0, y-axis: frequency 0-70; each panel titled "wf3 using expektra".]

Figure C10 Error distribution for different lead times WF 3

[Figure: six histograms of the error, one per lead time (48, 40, 30, 20, 10, 1); x-axis: error from −1.0 to 1.0, y-axis: frequency 0-70; each panel titled "wf4 using expektra".]

Figure C11 Error distribution for different lead times WF 4

[Figure: six histograms of the error, one per lead time (48, 40, 30, 20, 10, 1); x-axis: error from −1.0 to 1.0, y-axis: frequency 0-70; each panel titled "wf5 using expektra".]

Figure C12 Error distribution for different lead times WF 5

[Figure: six histograms of the error, one per lead time (48, 40, 30, 20, 10, 1); x-axis: error from −1.0 to 1.0, y-axis: frequency 0-70; each panel titled "wf6 using expektra".]

Figure C13 Error distribution for different lead times WF 6

[Figure: six histograms of the error, one per lead time (48, 40, 30, 20, 10, 1); x-axis: error from −1.0 to 1.0, y-axis: frequency 0-70; each panel titled "wf7 using expektra".]

Figure C14 Error distribution for different lead times WF 7


List of Figures

2.1 A figure that presents the general outline when forecasting using the statistical approach  6
2.2 A figure that presents the general steps when forecasting using a physical model  7
3.1 The perceptron  20
3.2 Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of H hidden layers, as well as M input connections. Each edge seen in this graph has a weight wij associated with it  21
3.3 Information flow of a single region predictive model created with the OPF  23
3.4 The CLAClassifier  28
3.5 Training an OPF model  29
4.1 Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)  31
4.2 Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)  32
4.3 Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)  32
4.4 Different error measurements for WF 1  33
4.5 Different error measurements for WF 2  34
4.6 Different error measurements for WF 3  35
4.7 Different error measurements for WF 4  36
4.8 Different error measurements for WF 5  37
4.9 Different error measurements for WF 6  38
4.10 Different error measurements for WF 7  39
4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point; the reference point is the network when no channel has been exposed to noise. ws = wind speed, hours/week = timestamps, u, v is a directional vector of the wind  41
4.12 Plot showing the unoptimized version vs the optimized when training a network with 10 input neurons, 10/15/20/25 hidden neurons and 1 output neuron. Training was done for 100 epochs using LM  42
4.13 Summarized average improvement over all wind farms with 95% confidence intervals  43
B.1 Wind characteristics for WF 1 and WF 2  59
B.2 Wind characteristics for WF 3-7  60
C.1 Error distribution for different lead times WF 1  62
C.2 Error distribution for different lead times WF 2  63
C.3 Error distribution for different lead times WF 3  64
C.4 Error distribution for different lead times WF 4  65
C.5 Error distribution for different lead times WF 5  66
C.6 Error distribution for different lead times WF 6  67
C.7 Error distribution for different lead times WF 7  68
C.8 Error distribution for different lead times WF 1  69
C.9 Error distribution for different lead times WF 2  70
C.10 Error distribution for different lead times WF 3  71
C.11 Error distribution for different lead times WF 4  72
C.12 Error distribution for different lead times WF 5  73
C.13 Error distribution for different lead times WF 6  74

C.14 Error distribution for different lead times WF 7  75

List of Tables

3.1 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, missing data with power observations are available for updating the models  16

3.2 Preprocessed features from the dataset. The Forecast category represents the forecasts given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available provides the features we will use in training and testing  17

3.3 Example where n = 14, r = 5, ψ = 1, of various encoded scalar values using a ScalarEncoder  24

4.1 NRMSE scores of the entries published in [Hong et al 2014]. The NuPIC model and Expektra model are added so we can easily compare the results  40

A.1 Table containing configuration parameters for the spatial pooler  55
A.2 Table containing configuration parameters for the scalar encoder  56
A.3 Table containing configuration parameters for the temporal memory  57


www.kth.se


Chapter 2

Background

Wind Power Forecasting (WPF) has applications in generation and transmission maintenance planning, energy optimization, as well as in energy trading. WPF models exist at different scales, and they can be used to predict the production of anything from a single WT to a whole Wind Farm (WF). WPF models are generally divided into two main groups, physical and statistical, but hybrid state-of-the-art methods are also common [Giebel et al 2001, Lang et al 2006]. The physical approach [Landberg and Watson 1994, Gaertner et al 2003] focuses on integrating well-known physical aspects into the model, such as information about the surrounding terrain and properties of the WT. These models try to get as good an estimate of the local wind speed as possible before finally reducing the remaining error with some form of MOS. Statistical approaches [Rodrigues et al 2007, Gonzalez et al 2004] rely more on historical observations and their statistical relation to meteorological predictions, as well as measurements from Supervisory Control And Data Acquisition (SCADA) systems.

SCADA is a control system that is used in most industrial processes and is the main tool to evaluate the state of individual power plants. As the name indicates, it is not a full control system but rather a system for supervision. In the case of wind turbines, it allows remote access to online data that has been gathered by sensors inside and around the plant. Variables accessible through these systems are those directly related to the wind flow, as well as the produced power generated by the plant, i.e. things like rotor wind speed, nacelle position, pitch angle, active power, etc.


Figure 2.1: A figure that presents the general outline when forecasting using the statistical approach

Forecast models that use SCADA data as their primary input source usually have good forecast accuracy, at least for the first few hours, but they tend to be less useful for longer prediction horizons [Giebel et al 2011]. SCADA data can also be used to detect problems in WTs, something that can be helpful to improve the reliability of WTs [Yang et al 2013, Wang et al 2014].

Statistical models, seen in figure 2.1, are usually built around Numerical Weather Prediction (NWP) and SCADA data. NWP models are often used to build forecasting models as they introduce weather forecasts for the region where the wind turbines are located. NWP data usually contains information about things like wind speed, wind direction, temperature and humidity. These models are operated twice or four times a day by a number of large weather services. The main forecasts usually start at 00 and 12 UTC, corresponding to the world radiosonde launching, which is the only direct observation of the atmospheric state and has been the backbone of atmospheric monitoring for many decades1; extra forecasts usually start at 06 and 18 UTC. Physical models, such as those seen in figure 2.2, include additional information about the physical characteristics of the wind turbine and its surroundings, i.e. terrain data, information about obstacles, capacity and layout where the turbine is located, and so on. Other useful information for physical models includes the theoretical

1 Some problems with this approach are that it results in less information over large oceans and poorer countries.


power curve, i.e. how much power is expected to be produced given a specific wind speed.

The time scales of WPF methods are generally divided into three main groups: very-short-term (up to 9 hours), short-term (up to 72 hours) and medium-term (up to 7 days) [Costa et al 2008], while the time step for these models is in the range of seconds to days, depending on the application. Very-short-term models used for wind power forecasting consist of statistical methods like Kalman Filters, Auto-Regressive Moving Average (ARMA), Auto-Regressive with Exogenous Input (ARX), Box-Jenkins, etc. Inputs to these models are historical observations of wind speed, wind direction, temperature, etc., and common applications for this forecast horizon include things like intraday market trading. Since these methods are merely based on past production, they are generally not useful for longer horizons.

[Figure: block diagram with the elements SCADA data, NWP data, WFC data, physical model, downscaling, transformation to hub height, spatial refinements, conversion to power (WT power curve), Model Output Statistics (MOS), and forecast wind power generation.]

Figure 2.2: A figure that presents the general steps when forecasting using a physical model

Short-term forecasting models include additional data about historical observations, usually obtained from SCADA systems, but more importantly weather forecasts from NWP models. Machine learning methods used in this area include Neural Networks [Kusiak et al 2009], Support Vector Machines [Fugon et al 2008], Nearest Neighbour Search [Jursa et al 2007], Random Forests, etc. Medium-term forecasting models usually incorporate all the methods used for shorter forecasts, as well as physical characteristics.


2.1 Neural Networks and Time Series Prediction

One of the most common ANNs is the so-called Multilayer Perceptron (MLP), a network built around some very simple properties of a cortical neuron. This network has been around since the 50s, has matured with a solid mathematical foundation, and has been applied successfully to many different applications. It is built around the idea of having multiple layers of neurons, each layer feeding information forward to the next layer, i.e. a feed-forward neural network.

The main advantage of using multiple layers instead of just a single layer is to overcome limitations pointed out by Minsky and Selfridge [1960] and Minsky and Seymour [1969], where a single-layer perceptron is only capable of learning linearly separable patterns and thus unable to learn all functions. The MLP is not limited in this way and has been shown to be able to represent a wide variety of functions given appropriate parameters [Hornik et al 1989].
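As a minimal illustration of this point (not the network used in this thesis), a two-layer feed-forward pass with hand-picked weights can represent XOR, a non-linearly-separable function that no single-layer perceptron can learn:

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """Two-layer feed-forward pass: hidden layer with a step activation,
    then a thresholded linear output neuron."""
    h = (W1 @ x + b1 > 0).astype(float)  # hidden layer activations
    return int(W2 @ h + b2 > 0)          # output neuron (0 or 1)

# Hand-picked weights realizing XOR: h1 = x1 OR x2, h2 = x1 AND x2,
# output = h1 AND NOT h2.
W1 = np.array([[1.0, 1.0], [1.0, 1.0]])
b1 = np.array([-0.5, -1.5])
W2 = np.array([1.0, -2.0])
b2 = -0.5
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, mlp_forward(np.array(x, dtype=float), W1, b1, W2, b2))
# prints 0, 1, 1, 0 -- the XOR truth table
```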

NuPIC is a platform actively developed and maintained by Numenta. It introduces a collection of ideas and algorithms that are inspired by the structural organization of the neocortex. Two of the core concepts found inside NuPIC are the so-called Hierarchical Temporal Memory (HTM) and the Cortical Learning Algorithm (CLA). HTM was introduced in Hawkins and Blakeslee [2007] and refers to a hierarchy of cortical regions in the brain.

The neocortex is classically divided into 6 different layers2; the current implementation in NuPIC is focused on emulating layers 3-4 (specifically layer 3), with extensions and research code being done to include more layers.

A CLA region consists of a collection of columns, and each column consists of a handful of cells. This structure is based on the minicolumn hypothesis [Buxhoeveden and Casanova 2002]. Each column in the CLA region has its own semantic meaning, and the sparse activity of a handful of active columns will tell us something about the input. The CLA is modelled so that specific cells within each column reflect the temporal context of a pattern, and a single cortical region (a CLA region) is trained with the CLA algorithm.

A typical CLA region consists of around 2K columns containing around 60K neurons in total, while a typical MLP may have less than 100 neurons. Each neuron in an HTM network grows new synapses over time, and it is not uncommon to have around 5K synapses per neuron, meaning we would have around 300M synapses in total. A single region of this size uses around 100 MB of memory, and it takes around 10 ms to do one inference and learning step3.

2 These layers should not be confused with the hierarchy of regions.
3 These values are given in the talk "Sensor-Motor Integration in the Neocortex" at the 2013 Hackathon.


The HTM/CLA also differs from the MLP in that it does not directly use scalar weights to represent synaptic connectivity. Synapses inside an HTM have binary weights with a scalar permanence: each synapse will either be connected or disconnected, while the MLP uses scalar weights. The connectedness in HTM/CLA is based on a permanence, which is a value between 0.0 and 1.0. If the permanence is over a certain threshold we have a connected synapse, i.e. we have weights of either 1 or 0. This binary mechanism is there to simulate synapses that are able to form and unform during learning. So essentially we have a weight-change network vs a wiring-change network [Chklovskii et al 2004].
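The permanence mechanism can be sketched in a few lines; the 0.2 threshold below is an illustrative choice (echoing the synPermConnected-style parameter), not a value taken from the thesis:

```python
import numpy as np

# Scalar permanences in [0, 1] yield binary effective weights:
# a synapse counts as weight 1 only if its permanence clears the threshold.
permanences = np.array([0.05, 0.19, 0.21, 0.90])
threshold = 0.2
weights = (permanences >= threshold).astype(int)
print(weights)  # [0 0 1 1]
```

Learning then moves permanences up or down, so synapses cross the threshold and effectively rewire the network rather than merely reweighting it.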

NuPIC today is naturally geared towards time-series prediction, which makes it an interesting candidate for wind forecasting. NuPIC has been shown to outperform the MLP on financial time series [Gabrielsson et al 2013], and it has been used successfully to balance traffic [Sinkevicius et al 2011]. It has also been used on the NASA Aviation Dataset [Lee and Rajabi 2014], but with less promising results. MLPs have been used successfully for many kinds of time series problems [Azoff 1994, Niska et al 2004, Jain and Kumar 2007].


Chapter 3

Method and Materials

The purpose of this chapter is to provide an explanation of the method used in this thesis. It presents terms and concepts needed and used in the field of Machine Learning, with a focus on how to create forecasts based on time series. It presents the motivation for why these methods are used, as well as how they are used. This chapter also contains a description of the implemented artificial neural network, how it is structured and optimized. It provides details about the experimental setup and how these experiments relate to GEFCom, and finally a description of how the datasets are structured and processed.

3.1 Preliminaries

3.1.1 Remarks

The methodology used to evaluate the prediction models presented in this thesis is based on the protocol presented in Madsen et al [2005]1, which is a complete protocol that can be used to evaluate the performance of short-term WPF and Wind-to-Power (W2P) models. The reason this protocol was chosen for this thesis is that it has successfully been used before as a guideline to evaluate a wide variety of forecast models, such as AWPPS, Prediktor, Previento, Sipreólico and WPPT. The protocol has been used for both on-shore and off-shore wind farms and was developed in the frame of the ANEMOS research project [Kariniotakis et al 2006], which brought together many relevant groups involved in the field. The aim of ANEMOS was to develop accurate and robust models that substantially outperform current state-of-the-art methods, and one of its goals was to establish a common set

1 It should be pointed out that no widely agreed standardization exists.


of performance measures that can be used to compare forecasts across systems and locations.

3.1.2 Definitions

Time series

A time series is a sequence X of observations x_t, each with a particular time stamp t, i.e. X = [x_1, x_2, ..., x_t], or in short X_{1:t}; it is a time-dependent collection of variables.

Forecast horizon

A multi-step-ahead prediction is the assignment of predicting k forecasts X̂_{t+1:t+k} given a collection of historical observations X_{t−p+1:t}. In the case of GEFCom we want a forecast for 1–48 steps ahead.
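The windowing implied by this definition can be sketched as follows; this is a generic construction of (history, target) training pairs, not the thesis's exact preprocessing:

```python
def multistep_pairs(series, p, k):
    """Build (history, target) pairs for k-step-ahead forecasting:
    inputs X_{t-p+1:t}, targets X_{t+1:t+k} (k = 48 in GEFCom)."""
    pairs = []
    for t in range(p - 1, len(series) - k):
        pairs.append((series[t - p + 1:t + 1], series[t + 1:t + k + 1]))
    return pairs

pairs = multistep_pairs(list(range(10)), p=3, k=2)
print(pairs[0])  # ([0, 1, 2], [3, 4])
```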

Point or spot forecast

In this paper we model the forecast \hat{p}_{t+k|t} as a so-called point forecast (or spot forecast), i.e. a single value for each forecast (compared to having a probability distribution).

Prediction error

The prediction error e for lead time t + k is defined in equation 3.1 as the difference between the actual value and the forecast, where t denotes the time index, k is the look-ahead time, p is the actual (measured, true) wind power and \hat{p} is the predicted wind power:

e_{t+k|t} = p_{t+k} - \hat{p}_{t+k|t}    (3.1)

and the normalized prediction error ε is defined in equation 3.2:

ε_{t+k|t} = \frac{1}{p_{inst}} e_{t+k|t} = \frac{1}{p_{inst}} (p_{t+k} - \hat{p}_{t+k|t})    (3.2)

where p_{inst} is the installed capacity of the wind farm (in kW or MW), which is unknown in the competition.


3.1.3 Reference models

Persistence (also called a naïve or plain predictor), as seen in equation 3.3, is the model most commonly used for benchmarking WPF models. This simple model states that the future wind generation value will be the same as the last measured value:

\hat{p}^{persistence}_{t+k|t} = p_t    (3.3)

An alternative would be to use an even simpler model: a climatology prediction (equation 3.4), i.e. predicting the mean, a value that would be approximated from the training set (see section 3.1.5):

\hat{p}^{mean}_{t+k|t} = \bar{p} = \frac{1}{N} \sum_{t=1}^{N} p_t    (3.4)

There are other reference models, like the one suggested by Nielsen et al. [1998], which have advantages over persistence, but that model was never widely adopted and is not used by GEFCom.
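The two reference models are simple enough to sketch directly. The following Python snippet is illustrative (function names are not from the thesis), assuming power values normalized to [0, 1]:

```python
def persistence_forecast(last_measured_power, horizon):
    """Persistence (eq. 3.3): every lead time k gets the last measured value p_t."""
    return [last_measured_power] * horizon

def climatology_forecast(training_power, horizon):
    """Climatology (eq. 3.4): every lead time k gets the mean of the training series."""
    mean_power = sum(training_power) / len(training_power)
    return [mean_power] * horizon

# A persistence forecast simply repeats the last observation for all 48 lead times.
forecast = persistence_forecast(0.42, horizon=48)
```

Both models ignore the look-ahead time entirely, which is exactly why they degrade quickly for longer horizons and make useful baselines.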

3.1.4 Error metrics

In order to understand why specific models perform well, it is usually a good idea to evaluate them against a wide variety of different criteria, as is emphasised by Kariniotakis [1997]. This section describes the error measures used; throughout, N denotes the size of the test set.

Forecast Bias

The Normalized Forecast Bias (NBIAS) describes the systematic error and is defined in equation 3.5; it is estimated by calculating the average error for each step ahead, and it gives an indication of the direction of the error:

NBIAS_k = \frac{1}{N} \sum_{t=1}^{N} ε_{t+k|t}    (3.5)

This bias is sometimes referred to by some authors as Mean Forecast Error (MFE). A perfect MFE does not mean the forecast is perfect and contains no error, as positive and negative errors cancel each other out, but it gives an indication that forecasts are on target on average.


Mean Absolute Error

The Normalized Mean Absolute Error (NMAE) is an error quantity that looks at the average of the absolute error of the prediction and is defined in equation 3.6:

NMAE_k = \frac{1}{N} \sum_{t=1}^{N} |ε_{t+k|t}|    (3.6)

Another common name used instead of Mean Absolute Error (MAE) is Mean Absolute Deviation (MAD). This value shows the magnitude of the overall forecasting error and should thus be as small as possible. This error is also scale dependent and will be affected by data transformations and the scale of measurements.

Mean Squared Error

The Normalized Mean Squared Error (NMSE) is an error quantity that looks at the average of the squared errors ε²_{t+k|t} and is built using the Normalized Sum of Squared Errors (NSSE), defined in equation 3.7:

NSSE_k = \sum_{t=1}^{N} ε^2_{t+k|t}    (3.7)

NMSE is defined in equation 3.8:

NMSE_k = \frac{1}{N} NSSE_k = \frac{1}{N} \sum_{t=1}^{N} ε^2_{t+k|t}    (3.8)

With this error, positive and negative errors do not cancel each other out, and large individual errors are penalized more harshly.

Root Mean Squared Error

The Normalized Root Mean Squared Error (NRMSE) is the square root of the NMSE; this error is defined in equation 3.9:

NRMSE_k = NMSE_k^{1/2} = \left( \frac{1}{N} \sum_{t=1}^{N} ε^2_{t+k|t} \right)^{1/2}    (3.9)

NRMSE is the main metric used in GEFCom, and it shares the same properties as NMSE.
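As a sketch, the error measures of this section can be transcribed directly from equations 3.2 and 3.5-3.9 (pure Python; function names are illustrative):

```python
import math

def normalized_errors(actual, predicted, p_inst):
    """Normalized prediction errors eps_{t+k|t} (eq. 3.2) for one lead time k."""
    return [(p - p_hat) / p_inst for p, p_hat in zip(actual, predicted)]

def nbias(eps):
    """Normalized forecast bias (eq. 3.5): average signed error."""
    return sum(eps) / len(eps)

def nmae(eps):
    """Normalized mean absolute error (eq. 3.6)."""
    return sum(abs(e) for e in eps) / len(eps)

def nmse(eps):
    """Normalized mean squared error (eq. 3.8)."""
    return sum(e * e for e in eps) / len(eps)

def nrmse(eps):
    """Normalized root mean squared error (eq. 3.9), the main GEFCom metric."""
    return math.sqrt(nmse(eps))
```

Note that a symmetric pair of errors such as -0.5 and +0.5 gives NBIAS = 0 while NMAE and NRMSE stay at 0.5, which illustrates why bias alone is not a sufficient criterion.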


3.1.5 Model selection

In regression and classification, one of the main issues we are faced with is "How do we create a good model?" One way to define this "goodness" is to look at the model's generalization ability, i.e. its performance on unseen data.

To make sure the model we are building will generalize to new unseen data, it is important how we select and build the model in the first place; what we have control over is the data we have at hand and how we use that data to build our model.

Training testing and validating

In the case of supervised learning we have access to a target series and its associated features, i.e. the production series (our target) with wind speed and wind direction as features.

Common practice within Machine Learning, which is adhered to in this thesis, is to split the whole dataset into 3 smaller subsets. Two of these subsets are used to find a good model (the training set and the validation set), and the remaining one (the test set) is used for evaluation, i.e. to estimate how accurate the model we have created would be on "unseen data". With highly flexible models like an artificial neural network we need to be careful not to overfit the data, which is one of the reasons for the validation set: we don't want to create a model that doesn't generalize well because it is fitted to every minor variation, i.e. it has captured a lot of noise.
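The split itself can be sketched as follows; the 60/20/20 fractions are the ones used later in this thesis, while the seed and function name are illustrative:

```python
import random

def train_val_test_split(data, train_frac=0.6, val_frac=0.2, seed=0):
    """Random 60/20/20 split into training, validation and test subsets."""
    indices = list(range(len(data)))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    n_train = int(train_frac * len(data))
    n_val = int(val_frac * len(data))
    train = [data[i] for i in indices[:n_train]]
    val = [data[i] for i in indices[n_train:n_train + n_val]]
    test = [data[i] for i in indices[n_train + n_val:]]
    return train, val, test
```

For the GEFCom evaluation proper, the test data is not drawn at random but fixed by the competition's testing periods; this random split applies only to the data before the test range.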

3.1.6 Evaluation

The improvement with respect to a considered reference model ref is defined in equation 3.10:

I^{ref}_{EC,k} = 100 \cdot \frac{EC^{ref}_k - EC_k}{EC^{ref}_k} \; (\%)    (3.10)

where the Evaluation Criterion (EC) is any error measurement, such as NMSE, NMAE, etc.


Testing periods

Date         Time    Forecast
2011-01-01   01:00   1-48 hours
2011-01-04   13:00   1-48 hours
2011-01-08   01:00   1-48 hours
2011-01-11   13:00   1-48 hours
...          ...     ...
2012-06-23   01:00   1-48 hours
2012-06-26   13:00   1-48 hours

Table 3.1: The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods with missing data, power observations are available for updating the models.

3.2 Experiments

Training and testing are structured according to GEFCom. The dataset spans from midnight on the 1st of July 2009 to noon on the 26th of June 2012. The period from the 1st of July 2009 to the 1st of January 2011 at 01:00 is used for training and validating, while the rest is used for testing. In the testing range a number of 48-hour periods are defined (see table 3.1); in-between these testing periods additional training data exists, which enables the models to be updated in-between forecasts.

The testing periods repeat every 7 days until the end of the dataset, and only meteorological forecasts that were relevant for the periods with missing power data are given; this was done in order to be consistent.

Meteorological forecasts in the GEFCom dataset consist of zonal u and meridional v components collected for surface winds at 10 m above ground level. They were extracted from the archive of the European Centre for Medium-range Weather Forecasts (ECMWF), a service which issues high-resolution deterministic forecasts twice a day, at 00 Coordinated Universal Time (UTC) and 12 UTC; each of these forecast periods consists of data for 1-48 hours ahead. In order to match the hourly resolution of the power measurements, forecasts were interpolated to an hourly resolution using cubic splines. A summary of the features found in the dataset is given in table 3.2.


No.  Category  Parameter             Alias   Type

1    Date      Date                  date    String
2    Date      Year                  year    Integer
3    Date      Month                 month   Integer
4    Date      Day                   day     Integer
5    Date      Hour                  hours   Integer
6    Date      Week                  week    Integer
7    Forecast  Wind Speed            ws      Real
8    Forecast  Wind Direction (deg)  wd      Real
9    Forecast  Wind U                u       Real
10   Forecast  Wind V                v       Real
11   Forecast  Issued                hp      Integer
12   SCADA     Production            wp      Real

Table 3.2: Preprocessed features from the dataset. The Forecast category represents the forecasts given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the feature we will use in training and testing.


The database for the meteorological forecasts given by the GEFCom dataset contains sections of missing data, each corresponding to a date for which the power information is also missing; in a pre-processing step, these sections were filled out with the previous best forecast available. For example, if we do not have data for the forecast issued at 2011-01-01 12:00, we use data from 2011-01-01 00:00, as a 48-hours-ahead forecast is available for this date. If the previous section also contained missing data, we would go back an additional section and use those forecasts. If no forecast is available 48 hours back, we use the best known forecast and extend it in the same fashion as the persistence model.

Any predictions made by these models should fall within a certain range, so any forecast outside this range will be clamped. This is the main post-processing step being done, i.e. there is an upper limit on what can be produced.

3.3 Holdback Input Randomization

The Holdback Input Randomization (HIPR) method, described in Kemp et al. [2007], can be used to investigate the importance of the input parameters. It works by sequentially feeding each data point in the test set to the neural network while replacing the values of one input parameter at a time. This replacement is done with uniformly distributed random values in the range the neural network was originally trained on, i.e. (-1, 1). An NRMSE score is calculated for each replacement. The result is that we get information about the relevance of each input.
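A minimal sketch of HIPR as described above; `model` is any callable mapping a feature vector to a prediction, `nrmse_fn` scores targets against predictions, and all names are illustrative:

```python
import random

def hipr_scores(model, test_inputs, test_targets, nrmse_fn,
                low=-1.0, high=1.0, seed=0):
    """Score the model once per input feature, with that feature replaced by
    uniform noise in the training range (-1, 1). A large increase in NRMSE
    marks an important feature."""
    rng = random.Random(seed)
    n_features = len(test_inputs[0])
    scores = []
    for j in range(n_features):
        predictions = []
        for x in test_inputs:
            x_randomized = list(x)
            x_randomized[j] = rng.uniform(low, high)  # randomize feature j only
            predictions.append(model(x_randomized))
        scores.append(nrmse_fn(test_targets, predictions))
    return scores
```

A feature the model never uses keeps its original NRMSE when randomized; features the model depends on produce a clear jump in the score.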

3.4 Optimization methods

The two main optimization algorithms used in this thesis are the Particle Swarm Optimization (PSO) algorithm, presented by Eberhart and Kennedy [1995], and the Levenberg-Marquardt (LM)² algorithm, independently developed by Kenneth Levenberg and Donald Marquardt [Marquardt, 1963; Levenberg, 1944]. The PSO algorithm is a population-based stochastic algorithm similar to a Genetic Algorithm (GA), but based on social-psychological principles instead of evolution. It

² The LM algorithm was used in the beginning of the project, but the results reported ended up using the PSO algorithm. I have included the LM algorithm in this section because speed optimization was measured using this algorithm.


can be summarized by the following steps. Each network was trained multiple times in order to avoid local minima.

• Step 1: Initialize particles with random velocities and accelerations.

• Step 2: Determine which particle is closest to the goal.

• Step 3: Adjust accelerations toward that particle.

• Step 4: Update particle positions based on their velocities, and update velocities based on accelerations.

• Step 5: Go to step 2.

The LM algorithm is a combination of the Error Back-Propagation (EBP) method [Rumelhart et al., 1988] and the Gauss-Newton method. This algorithm has the speed advantage of Gauss-Newton and the stability of EBP [Yu and Wilamowski, 2011]. A detailed treatment of LM can be found in Moré [1978] and of PSO in Poli et al. [2007].
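The five PSO steps above can be sketched as follows (global-best variant; the inertia and acceleration constants are common defaults, not necessarily the settings used in this thesis):

```python
import random

def pso_minimize(f, dim, n_particles=20, iters=100, seed=0):
    """Minimal global-best PSO sketch of steps 1-5 for minimizing f."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # each particle's best position
    pbest_val = [f(p) for p in pos]
    g = pbest[min(range(n_particles), key=lambda i: pbest_val[i])][:]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # accelerate toward the personal best and the swarm's best particle
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * r1 * (pbest[i][d] - pos[i][d])
                             + 1.5 * r2 * (g[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < f(g):
                    g = pos[i][:]
    return g
```

In the thesis setting, f would be the network's validation error as a function of its weights; here a simple quadratic bowl stands in for it.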

3.5 Neural Networks

3.5.1 Multilayer Perceptron

The perceptron is built around a nonlinear model of a neuron, the McCulloch-Pitts model [McCulloch and Pitts, 1943]. It basically consists of a 2-step process where the cell body contains a summation function of the weighted sum of all inputs, including a bias. The perceptron is described by equation 3.11. The sum s is passed through an activation function (see section 3.5.1), which mimics the activation or firing of the neuron:

s = \sum_{i=1}^{M} w_i x_i + x_0    (3.11)

w_i is the weight of the "synapse" of input channel i and is the parameter we want to adjust, x_i is the input value and x_0 is the bias. M denotes the number of inputs we have.


Figure 3.1: The perceptron. [Diagram: weighted input signals and a bias are summed (Σ) and passed through an activation function f(s) to produce the output signal.]

The structure of the MLP consists of many perceptrons, and it is shown in figure 3.2. The input signal flows from the input layer at the bottom to the output layer at the top. We have a bias signal, seen on the left side of the diagram, which is set to a fixed number.

MLPs are fully connected networks, meaning that the neurons in any layer of the network are connected to all the neurons in the previous layer. Each connection in the network has a weight w_ij associated with it. The initialization of these weights is done before any training, by randomly assigning very small values to each weight in the graph. The architecture used in this study has only one output variable, i.e. the function we approximate produces forecasts of the power generation given a certain input.


Figure 3.2: Architectural graph of the neural network, which produces a single output value. It consists of a collection of hidden neurons in each of H hidden layers as well as M input connections (hours, u, v, week, ws-2, ws-1, ws, ws+1, ws+2); the hidden layers use tanh activations and the output layer a linear activation. Each edge in this graph has a weight w_ij associated with it. [Diagram: input layer at the bottom, hidden layer(s) in the middle, output layer at the top, with bias signals on the left.]

The performance of neural networks is generally improved if data is normalised, because using the original data directly can cause convergence problems. Normalization is done using the mapminmax function seen in equation 3.12:

y = \frac{(y_{max} - y_{min}) \cdot (x - x_{min})}{x_{max} - x_{min}} + y_{min}    (3.12)

y_max is the max value of the specified range, which in this case is 1, and y_min is -1; x is the value to be scaled, x_max is the max value of the numbers to be scaled, and x_min is the min value of the numbers to be scaled.
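Equation 3.12 transcribes directly to code, with the (-1, 1) target range used here as defaults:

```python
def mapminmax(x, x_min, x_max, y_min=-1.0, y_max=1.0):
    """Linearly scale x from [x_min, x_max] to [y_min, y_max] (eq. 3.12)."""
    return (y_max - y_min) * (x - x_min) / (x_max - x_min) + y_min
```

The endpoints of the input range map exactly onto the endpoints of the target range, and the midpoint onto its midpoint.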

The forecasting model consists of 7 different networks, one for each wind farm; these networks are trained on the first section of the dataset, before the test period. A random 60/20/20 split is created, where 60% of the available data is used for training, 20% is used to validate the network, and 20% is set aside as a hold-out set for the hyperparameters. The input features³ fed into the models are

³ See table 3.2.


ws, u, v, hours, ws+1, ws+2, ws-1 and ws-2, where ws+x and ws-x denote a time shift of x. We can use ws+x as input into the model because ws is a forecast in itself.

Activation Functions

The computation done for each neuron in the multilayer perceptron requires knowledge of the derivative of the activation function; in other words, the activation function we pick needs to be continuous. In this thesis we use the following two activation functions: the hyperbolic tangent function, seen in equation 3.13, and

f(s) = \tanh(s) = \frac{e^s - e^{-s}}{e^s + e^{-s}}    (3.13)

the linear transfer function, seen in equation 3.14:

f(s) = \begin{cases} +1 & \text{if } s \geq 1 \\ s & \text{if } -1 < s < 1 \\ -1 & \text{if } s \leq -1 \end{cases}    (3.14)

Hyperparameter optimization

In order to obtain the hyperparameters necessary for each wind farm's model, a random hyperparameter search was performed for all models. A hold-out validation set was used to pick the best hyperparameters. Random search has been shown to work better than grid search when not all parameters are equally important [Bergstra and Bengio, 2012].

3.5.2 Numenta Platform for Intelligent Computing

This section describes the key principles introduced with NuPIC. It explains in general terms the theory behind the Online Prediction Framework (OPF)⁴, which uses the CLA and HTM algorithms; the OPF works as an API to create predictive models. The OPF consists of 5 major types of components: Encoders, Spatial Pooler (SP), Temporal Memory (TM), Temporal Pooler (TP) and Classifiers. Together these components construct a single CLA region⁵, and figure 3.3 demonstrates the information flow through a single region.

⁴ The OPF is used with Numenta's commercial product GROK.
⁵ Currently, models created with the OPF do not use a TP, nor does this client allow creation of a hierarchy of regions, i.e. what we would call an HTM; it is only possible to create one region.


Another central concept in NuPIC is the Sparse Distributed Representation (SDR), which refers to the activation of a small percentage of the neurons at any given time; this neuronal activity is represented as an n-dimensional sparse binary vector x = [b_0, ..., b_n] with around 2% active cells. The outputs from the spatial pooler and the temporal memory are both SDRs, while the output from the encoder does not enforce this and is just a normal binary vector. A general overview of the properties of SDRs is given by Ahmad and Hawkins [2015].

Figure 3.3: Information flow of a single-region predictive model created with the OPF: Encoder → Spatial Pooler → Temporal Memory → Classifier.

The overlap between two different SDRs is defined by

o(x, y) = x \cdot y    (3.15)

A match between two SDRs is defined by

m(x, y) := o(x, y) \geq θ    (3.16)

where θ is set such that θ ≤ ‖x‖₁ and θ ≤ ‖y‖₁. An interesting property of SDRs, something that is used multiple times inside the temporal memory and especially for predictions, is the fact that a set of fixed-size SDRs can be reliably stored by taking their union as a single pattern. The boolean OR operator is used to create a


Scalar  Encoding

1       11111000000000
2       01111100000000
10      00000000011111

Table 3.3: Examples of scalar values encoded with a ScalarEncoder, where n = 14, r = 5, ψ = 1.

new vector from a set of SDRs. The downside of storing patterns this way is that the more patterns we store, the bigger the probability of false positives.
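Overlap, match and the OR-union described above (equations 3.15-3.16) can be sketched on plain binary lists:

```python
def overlap(x, y):
    """Overlap of two binary vectors (eq. 3.15): their dot product."""
    return sum(a & b for a, b in zip(x, y))

def match(x, y, theta):
    """Match (eq. 3.16): overlap at or above the threshold theta."""
    return overlap(x, y) >= theta

def union(sdrs):
    """Store a set of SDRs as a single pattern using boolean OR."""
    return [int(any(bits)) for bits in zip(*sdrs)]
```

The union still matches each stored pattern at that pattern's full overlap, which is exactly the property exploited for predictions, and also why false positives become more likely as the union fills up with active bits.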

Encoders

NuPIC contains many different encoders⁶; the job of an encoder is to convert raw input into a more suitable representation (i.e. a binary vector). The raw input is fed into the model using a dictionary data structure.

Useful encoders include scalar and categorical encoders, while more exotic encoders include one for Global Positioning System (GPS) coordinates, which can be used to extract information about anomalous movements. The entries in the dictionary representation of the raw input are each encoded separately and concatenated using a multi-encoder.

One property of the spatial pooler is that overlapping input patterns are mapped to the same SDR. This means that we want the encoder to encode input so that similar inputs share bits. The ScalarEncoder fulfils this property by the process illustrated in table 3.3.

We calculate the range of values to encode using equation 3.17, where v_min represents the minimum value of the input signal and v_max denotes its upper bound:

v_{range} = v_{max} - v_{min}    (3.17)

There are three mutually exclusive parameters that determine the overall size of the output from a scalar encoder: (n, r, ψ). n directly represents the total number

⁶ A full list of all encoders can be found in the API documentation for NuPIC.


of bits in the output, and it must be bigger than or equal to w, which represents the number of bits that are set to encode a single value, i.e. the "width" of the output signal⁷. r and ψ are specified w.r.t. the input, while w is specified w.r.t. the output. Two inputs separated by more than the radius have non-overlapping representations. Two inputs separated by less than the radius will in general overlap in at least some of their bits. Two inputs separated by greater than or equal to the resolution are guaranteed to have different representations. ψ can be calculated using equation 3.18:

ψ = \frac{r}{w}    (3.18)

Depending on whether we want periodic behaviour or not, n is calculated a bit differently; see the scalar encoder for implementation details⁸.
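A simplified non-periodic scalar encoder reproducing table 3.3 (n = 14, w = 5 over the value range 1-10); this is a sketch, not NuPIC's actual implementation:

```python
def encode_scalar(value, v_min=1, v_max=10, n=14, w=5):
    """Encode a scalar as w consecutive active bits out of n, shifted
    according to the value's position in [v_min, v_max] (cf. table 3.3)."""
    v_range = v_max - v_min                          # eq. 3.17
    start = round((value - v_min) / v_range * (n - w))  # leftmost active bit
    return [1 if start <= i < start + w else 0 for i in range(n)]
```

Nearby values share most of their active bits (values 1 and 2 overlap in 4 of 5 bits), which is the similarity-preserving property the spatial pooler relies on.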

Spatial Pooler

The spatial pooler receives a binary vector as its input from an encoder and outputs an SDR. The structure of the spatial pooler consists of an input space and a set of columns; the output SDR represents which of the columns in a region are active. Each column has synapses connected to the input space: around 50% of the input space is randomly and potentially connected to each column, the so-called "potential pool". Each synapse will connect and disconnect with the input space during learning.

The general information flow of the spatial pooler consists of the following steps: input stimuli from lower regions lead to activation of the input space to which each mini-column is synaptically connected; this activation, the so-called "overlap score", is computed for each column. The score is calculated as the total sum of the active inputs influencing that column, weighted with a "boosting factor" that increases the chance for certain columns to win more easily. A final top percentage (usually around 2% activity) of the columns with the highest influence (biggest overlap score) is chosen; columns also inhibit nearby columns, and the result of this process is a binary vector with few active bits, an SDR. This is illustrated in equation 3.19, where b can either be 0 or 1 and s is a value representing the score.

⁷ w must be odd to avoid centering problems.
⁸ https://github.com/numenta/nupic


\underbrace{[b_1 \; b_2 \; b_3 \; \cdots \; b_n]}_{\text{Input vector}} \cdot \underbrace{\begin{bmatrix} b_{11} & b_{12} & b_{13} & \cdots & b_{1n} \\ b_{21} & b_{22} & b_{23} & \cdots & b_{2n} \\ \vdots & & & & \vdots \\ b_{d1} & b_{d2} & b_{d3} & \cdots & b_{dn} \end{bmatrix}}_{\text{Connected synapses for each column}} = \underbrace{[s_1 \; s_2 \; \cdots \; s_n]}_{\text{Overlap score}} \; \xrightarrow{\text{Inhibition}} \; \underbrace{[b_1 \; b_2 \; b_3 \; \cdots \; b_n]}_{\text{Output SDR}}    (3.19)

Learning in this structure is done by adjusting the permanences of the proximal dendrites to better match the input: for each winning column, increase the permanence of the synapses that correctly matched the input and decrease the permanence of the rest. We also increase the boosting factor of losing columns to give them a bigger chance of winning next time.
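One spatial-pooler step, i.e. equation 3.19 with boosting and global inhibition, can be sketched as follows (the 2% activity level follows the text; synapse rows are plain binary lists and all names are illustrative):

```python
def spatial_pool(input_bits, synapses, boost, active_fraction=0.02):
    """One SP step: boosted overlap score per column, then global inhibition
    keeping roughly the top 2% of columns as the output SDR."""
    n_cols = len(synapses)
    # Overlap score: boosted count of active inputs on each column's synapses.
    scores = [boost[c] * sum(i & s for i, s in zip(input_bits, synapses[c]))
              for c in range(n_cols)]
    n_active = max(1, int(active_fraction * n_cols))
    winners = set(sorted(range(n_cols), key=lambda c: scores[c],
                         reverse=True)[:n_active])
    return [1 if c in winners else 0 for c in range(n_cols)]
```

Real CLA regions use permanence thresholds and local inhibition neighbourhoods; this sketch keeps only the overlap-and-inhibit skeleton.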

Temporal Memory

The temporal memory receives an SDR from the spatial pooler, representing the active columns; the output of the temporal memory is the activity of the whole CLA region. The general idea of the TM⁹ is that the cells in each mini-column provide temporal context to the pattern identified by the spatial pooler. The TM's job is to form transitions between different SDRs, and it achieves this by allowing active cells to form connections to previously active cells, so that in a future setting each cell is able to predict its own activity.

Each cell in a region has several distal segments, ideally one for each pattern the cell has transitioned from; in practice these segments are connected to a couple of patterns each. A cell enters a predictive state if there is enough activity on a segment, and every segment consists of a collection of connections, synapses that have been formed to a subset of previously active cells (typically around 10-15 cells).

The temporal memory receives a sparse binary vector from the spatial pooler, and the general information flow for this part of the CLA goes through two phases. The first phase has the following steps: 1) for each active column, we check whether any cell is in a predictive

⁹ There are many details in how the Temporal Memory is implemented; pseudo-code and more details can be found in [Numenta, 2011]. The nupic git repository is the best source for the finer details: https://github.com/numenta/nupic


state; if so, then 2) activate that particular cell. If no cell was found in a predictive state, then 3) activate all cells in that particular column, a process called bursting.

Bursting is analogous to saying: "we do not know which cell to activate, because we have not seen this instance of the sequence of patterns before and are unable to put the column into the correct temporal context, so let us activate all cells to reflect this uncertainty". Bursting does not occur when the temporal context has been predicted by the CLA. Equation 3.20 shows the first phase.
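Phase 1, including bursting, can be sketched as follows; cells are identified by (column, cell) pairs and `predictive` is the set of cells that were in a predictive state at the previous time step (names illustrative):

```python
def tm_phase1(active_columns, predictive, cells_per_column):
    """Temporal-memory phase 1: activate predicted cells in each active
    column; if none were predicted, burst the whole column."""
    active_cells = set()
    for col in active_columns:
        predicted = [c for c in range(cells_per_column)
                     if (col, c) in predictive]
        if predicted:
            # Temporal context known: activate only the predicted cells.
            active_cells.update((col, c) for c in predicted)
        else:
            # Bursting: context unknown, activate every cell in the column.
            active_cells.update((col, c) for c in range(cells_per_column))
    return active_cells
```

A column with one predicted cell contributes exactly that cell, while an unpredicted column contributes all of its cells, which is the uncertainty signal described above.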

\underbrace{[b_1 \; b_2 \; b_3 \; \cdots \; b_n]}_{\text{SP SDR}} \cdot \underbrace{\begin{bmatrix} b_{11} & b_{12} & b_{13} & \cdots & b_{1n} \\ b_{21} & b_{22} & b_{23} & \cdots & b_{2n} \\ \vdots & & & & \vdots \\ b_{d1} & b_{d2} & b_{d3} & \cdots & b_{dn} \end{bmatrix}}_{\text{Predictive state}} = \underbrace{\begin{bmatrix} b_{11} & b_{12} & b_{13} & \cdots & b_{1n} \\ b_{21} & b_{22} & b_{23} & \cdots & b_{2n} \\ \vdots & & & & \vdots \\ b_{d1} & b_{d2} & b_{d3} & \cdots & b_{dn} \end{bmatrix}}_{\text{Active state}}    (3.20)

The second phase of the algorithm figures out which cells should be put into a predictive state for the next time step. This is achieved by checking every distal segment of every cell for activity above a certain threshold, i.e. checking for active segments; one active segment is enough to put a cell into a predictive state. The output from the temporal memory is a vector representing the state of all cells in the region. Equation 3.21 shows phase 2.

\underbrace{[b_1 \; b_2 \; b_3 \; \cdots \; b_n]}_{\text{Active state}} \cdot \underbrace{\begin{bmatrix} b_{11} & b_{12} & b_{13} & \cdots & b_{1n} \\ b_{21} & b_{22} & b_{23} & \cdots & b_{2n} \\ \vdots & & & & \vdots \\ b_{d1} & b_{d2} & b_{d3} & \cdots & b_{dn} \end{bmatrix}_X}_{\text{Segment } X} = \underbrace{[b_1 \; b_2 \; b_3 \; \cdots \; b_n]_X}_{\text{Segment activation } X} > \tau \;\rightarrow\; \underbrace{[s_1 \; s_2 \; s_3 \; \cdots \; s_n]}_{\text{Predictive state}}    (3.21)

If learning is turned on, we update the permanences of the synapses connected to active distal segments (in the same way as in the spatial pooler). These changes are marked as temporary until we are sure that the cell is in fact able to correctly predict something with them; after that, the changes either become permanent or are removed. Temporarily marked cells are updated whenever a cell goes from inactive to active due to feed-forward input (reinforce the change, as we correctly predicted the feed-forward activation); if a cell instead went from active to inactive, the change is undone.


NuPIC Classifiers

There are many different classifiers included with NuPIC, such as k-Nearest Neighbour (kNN) and a Support Vector Machine (SVM); the OPF in particular uses a custom-built classifier called the "CLAClassifier". The purpose of this classifier is to decode predictions made by the CLA. Classification with the CLAClassifier is performed in different ways depending on how the input has been encoded.

Scalar values are decoded using a process where each cell in a CLA region is paired with two histograms: one histogram keeps track of the frequency of encountered patterns associated with the cell, while the other keeps track of a moving average for each bucket. This process is illustrated in figure 3.4.

Figure 3.4: The CLAClassifier. [Diagram: each column of the SDR is paired with two histograms per cell, one tracking pattern likelihood and one tracking a moving average, over buckets spanning the min value to the max value.]

Training in NuPIC

Training of the NuPIC models is done using online learning algorithms. The schema seen in figure 3.5 is used to train and test these models; since we have 7 different wind farms, this schema is repeated for each wind farm.

This schema is slightly adjusted from the traditional way the OPF handles multi-step predictions. The default way of doing this is to use 48 different models, one for


every step ahead in the CLAClassifier, which would give us 48 · 7 = 336 models in total (for all the wind farms). Not only does this change match Expektra's approach, but it also reduces the memory requirements for the CLAClassifier.

Figure 3.5: Training an OPF model. [Diagram: hyperparameter setup is done with PSO swarming or a manual setup on a pre-training data chunk; in the training phase, online learning is activated and the model consumes the training data stream, producing predictions; in the testing phase, online learning is deactivated and the model produces multi-step predictions on the testing data.]

Input and hyperparameter selection

Finding hyperparameters for an OPF model was partly done using a custom built-in PSO algorithm and partly configured manually, mainly by ensuring that the PSO algorithm did not remove encoders. A single CLA region contains a lot of hyperparameters; these parameters are listed, with corresponding descriptions, in appendix A. Inputs to the model are date, ws, wp, u and v.


Chapter 4

Result

This chapter presents the primary results obtained in this study. Figures 4.4-4.10 contain different error measurements on the test set for each wind farm. We see that Expektra's ANN model performs well over all wind farms, while the NuPIC model performs worse than Expektra but in general still better than our reference model. NuPIC is off target, with a bias error on most wind farms. The graphs also indicate that NuPIC is unable to pick up some trends, as seen in the cumulated ε² graphs. The Expektra model, on the other hand, shows no clear problems in cumulated ε² and is on target on all wind farms; appendix C has been included to reflect this for different lead times.

Figure 4.1: Left diagram: cumulative probability of wind speed. Right diagram: scatter diagram of the power curve, normalized power output vs. wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


Figure 4.2: Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).

Figure 4.3: Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


Figure 4.4: Different error measurements for WF 1. [Panels: NBIAS, NRMSE and NMAE vs. look-ahead time k (in hours), and cumulated ε² vs. time (in hours), for Expektra, NuPIC and Persistence.]


Figure 4.5: Different error measurements for WF 2. [Panels: NBIAS, NRMSE and NMAE vs. look-ahead time k (in hours), and cumulated ε² vs. time (in hours), for Expektra, NuPIC and Persistence.]


Figure 4.6: Different error measurements for WF 3. [Panels: NBIAS, NRMSE and NMAE vs. look-ahead time k (in hours), and cumulated ε² vs. time (in hours), for Expektra, NuPIC and Persistence.]


Figure 4.7: Different error measurements for WF 4. [Panels: NBIAS, NRMSE and NMAE vs. look-ahead time k (in hours), and cumulated ε² vs. time (in hours), for Expektra, NuPIC and Persistence.]


Figure 4.8: Different error measurements for WF 5. [Panels: NBIAS, NRMSE and NMAE vs. look-ahead time k (in hours), and cumulated ε² vs. time (in hours), for Expektra, NuPIC and Persistence.]


Figure 4.9: Different error measurements for WF 6. [Panels: NBIAS, NRMSE and NMAE vs. look-ahead time k (in hours), and cumulated ε² vs. time (in hours), for Expektra, NuPIC and Persistence.]

38

[Figure: four panels for Wind Farm 7, each comparing Expektra, NuPIC and Persistence: NBIAS, NRMSE and NMAE versus look-ahead time k (in hours), and cumulated ε² versus time (in hours).]

Figure 4.10: Different error measurements for WF 7.

              Wind Farm
User          1      2      3      4      5      6      7      All
Leustagos     0.145  0.138  0.168  0.144  0.158  0.133  0.140  0.146
DuckTile      0.143  0.145  0.172  0.145  0.165  0.137  0.146  0.148
MZ            0.141  0.151  0.174  0.145  0.167  0.141  0.145  0.149
Propeller     0.144  0.153  0.177  0.147  0.175  0.141  0.147  0.152
Duehee Lee    0.157  0.144  0.176  0.160  0.169  0.154  0.148  0.155
Expektra      0.165  0.158  0.184  0.164  0.179  0.153  0.153  0.165
MTU EE5260    0.161  0.172  0.193  0.162  0.192  0.156  0.160  0.168
SunWind       0.174  0.177  0.193  0.176  0.179  0.157  0.162  0.172
ymzsmsd       0.163  0.186  0.200  0.164  0.192  0.162  0.167  0.174
4138 Kalchas  0.180  0.179  0.197  0.175  0.200  0.160  0.165  0.177
NuPIC         0.243  0.254  0.264  0.310  0.290  0.224  0.240  0.264
Persistence   0.302  0.338  0.373  0.364  0.388  0.341  0.361  0.355

Table 4.1: NRMSE scores of the entries published in Hong et al. [2014]. The NuPIC and Expektra models are added so the results can easily be compared.

4.1 Experimental results

Looking at the primary results of this study, we see in Table 4.1 that Expektra's ANN model achieves performance comparable to the other methods published with Hong et al. [2014]. This can be used as a starting point for further investigations. A similar approach to Expektra's is that of Duehee Lee [Lee and Baldick, 2014], as both methods are based on the same neural network architecture, but with different optimization techniques. Duehee Lee's network is structured slightly differently: Expektra's model has a single output node, whereas Duehee Lee uses five (representing five prediction steps ahead). Beside these differences, Duehee Lee also includes additional sub-models to create a prediction ensemble, blending the output of the ANN with a Gaussian Process (GP) to improve very short-term predictions. MTU EE5260 is also a neural network approach that converts meteorological forecasts to power, and its score is very close. The HTM CLA beats the persistence model, but needs more work to outperform the other models in GEFCom.
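The NRMSE scores in Table 4.1 are straightforward to compute once power is normalized by installed capacity; a minimal sketch in the spirit of the evaluation protocol of Madsen et al. [2005] (the function name is ours):

```python
import math

def nrmse(actual, predicted):
    """RMSE of capacity-normalized power series; since both series
    are already normalized to [0, 1], the RMSE is itself the NRMSE."""
    assert len(actual) == len(predicted) and actual
    se = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(se / len(actual))

# A perfect forecast scores 0; forecasting a constant 0.5 for an
# alternating 0/1 series scores 0.5.
print(nrmse([0.0, 1.0, 0.0, 1.0], [0.5, 0.5, 0.5, 0.5]))  # 0.5
```

The NBIAS and NMAE panels of the figures in this chapter follow the same pattern, using the mean error and the mean absolute error respectively.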


4.2 Input Importance

For interpretation purposes, in order to understand the model better, an analysis of the relative input parameter importance was performed. This analysis is illustrated in Figure 4.11. Each box represents added noise to that channel; a higher NRMSE score indicates that the feature was important. A reference point, "all-channels", represents the error distribution of the model with no input replacement. We clearly see in this figure that the wind speed channel ws is the most important attribute: adding noise to this channel greatly affects the NRMSE score. We also see that the wind components u and v show little to no influence, and that the timestamp-related inputs hours and week both indicate that there are seasonal and daily trends present in the dataset.
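The noise-substitution procedure behind this analysis can be sketched as follows. This is an illustrative reconstruction in the spirit of HIPR [Kemp et al., 2007], not Expektra's actual code, and all names are ours:

```python
import random

def hipr_importance(model, inputs, targets, error_fn, seed=0):
    """Replace one input channel at a time with uniform noise drawn
    from that channel's observed range, and record the resulting
    error: the larger the jump over the baseline, the more important
    the channel."""
    rng = random.Random(seed)
    baseline = error_fn(model(inputs), targets)
    scores = {}
    for ch in range(len(inputs[0])):
        lo = min(row[ch] for row in inputs)
        hi = max(row[ch] for row in inputs)
        noisy = [row[:ch] + [rng.uniform(lo, hi)] + row[ch + 1:]
                 for row in inputs]
        scores[ch] = error_fn(model(noisy), targets)
    return baseline, scores

# Toy model that only ever looks at channel 0: noising channel 0
# hurts, noising channel 1 changes nothing.
model = lambda rows: [row[0] for row in rows]
mae = lambda pred, true: sum(abs(p - t) for p, t in zip(pred, true)) / len(pred)
inputs = [[0.1, 5.0], [0.9, -3.0], [0.4, 2.0]]
targets = [0.1, 0.9, 0.4]
baseline, scores = hipr_importance(model, inputs, targets, mae)
print(baseline, scores[0] > scores[1])  # 0.0 True
```

In the thesis the same idea is applied per channel of the GEFCom inputs, with the NRMSE as the error function.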

[Figure: NRMSE distributions after noise injection, per channel: all-channels, hours, u, v, week, ws, ws−1, ws−2, ws−3, ws+1, ws+2, ws+3.]

Figure 4.11: Relative input parameter importance using HIPR. "all-channels" reflects the reference point: the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v = directional components of the wind.

4.2.1 Adaptation and Optimization

All training for Expektra's model was performed on a MacBook Pro with an Intel Core 2 Duo processor (P8600, 2.40 GHz), a 64-bit operating system and 4 GB of installed RAM. After adapting the code to run experiments on the GEFCom dataset, an investigation was performed to evaluate the performance of the core source code provided by Expektra. This investigation identified some slow parts of the code. After addressing these issues, mainly by using a Math.NET native library instead of the managed provider, a speed test was performed comparing the unoptimized version with the optimized one. Figure 4.12 shows the speed-up achieved during training using the LM algorithm.

[Figure: training time for networks with 10, 15, 20 and 25 hidden neurons, normal versus optimized version.]

Figure 4.12: Unoptimized versus optimized version when training a network with 10 input neurons; 10, 15, 20 or 25 hidden neurons; and 1 output neuron. Training was done for 100 epochs using LM.
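The LM update used in training can be illustrated with a toy one-parameter fit. This is a sketch of the idea, not the Math.NET implementation, and all names are ours: the step blends Gauss-Newton with gradient descent through a damping factor that shrinks on success and grows on failure.

```python
def levenberg_marquardt_1d(f, df, xs, ys, a0, lam=0.01, iters=50):
    """Minimal one-parameter Levenberg-Marquardt loop.
    f(x, a) is the model and df(x, a) its derivative w.r.t. a."""
    a = a0
    sse = sum((y - f(x, a)) ** 2 for x, y in zip(xs, ys))
    for _ in range(iters):
        jtj = sum(df(x, a) ** 2 for x in xs)
        jtr = sum(df(x, a) * (y - f(x, a)) for x, y in zip(xs, ys))
        step = jtr / (jtj + lam)  # scalar form of (J^T J + lam I)^-1 J^T r
        new_sse = sum((y - f(x, a + step)) ** 2 for x, y in zip(xs, ys))
        if new_sse < sse:         # accept: move toward Gauss-Newton
            a, sse, lam = a + step, new_sse, lam * 0.5
        else:                     # reject: move toward gradient descent
            lam *= 2.0
    return a

# Fit y = a * x to noiseless data generated with a = 3.
xs = [1.0, 2.0, 3.0]
ys = [3.0, 6.0, 9.0]
a = levenberg_marquardt_1d(lambda x, a: a * x, lambda x, a: x, xs, ys, a0=0.0)
print(round(a, 3))  # 3.0
```

The real training loop does the same with a full Jacobian over all network weights, which is why fast matrix routines dominate the running time.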

4.3 Summary

A plot showing the average NRMSE improvement over the persistence model can be seen in Figure 4.13. Both models are able to perform better than persistence: Expektra's model performs better than the NuPIC model over the whole forecast horizon, and NuPIC performs better than persistence towards the end of the forecast.
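The improvement plotted here is simply the relative NRMSE reduction against persistence; a minimal sketch (the function name is ours):

```python
def improvement_over_persistence(model_nrmse, persistence_nrmse):
    """Percentage improvement over the persistence reference:
    positive values mean the model beats persistence."""
    return 100.0 * (persistence_nrmse - model_nrmse) / persistence_nrmse

# Using the "All" column of Table 4.1: Expektra 0.165 vs. persistence 0.355.
print(round(improvement_over_persistence(0.165, 0.355), 1))  # 53.5
```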

[Figure: average improvement (%) in NRMSE over persistence versus look-ahead time (in hours), for Expektra and NuPIC.]

Figure 4.13: Summarized average improvement over all wind farms, with 95% confidence intervals.

Chapter 5

Discussion

With the exponential growth of computer power it becomes easier to study deeper and more complex networks, and with recent advances in the area of deep learning a new interest in ANNs has resurged. In practice, ANNs have been used by most groups in the field of WPF, but these networks never caught on, as it was argued that the improvements made by ANNs were not usually worth the extra effort of training them [Giebel et al., 2011]. This is steadily becoming less of an issue as computation becomes cheaper and bigger networks perform better.

By using a simple MLP network, we observe that we are able to obtain results similar in performance to other models published in Hong et al. [2014]. This can be used as a starting point for further investigations. The GEFCom dataset has a limited number of input features; if we had access to a wider range of them, the power of neural networks could be studied more in depth. Given the limited number of features in the dataset, this is harder to do.

Using persistence as a reference model was done in order to have the same baseline as the one used in GEFCom, but a better reference model should be considered in the future, such as the one presented in Nielsen et al. [1998], especially if longer forecasts are to be considered.
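For reference, the improved baseline of Nielsen et al. [1998] blends persistence with the long-term mean power, weighted by the lag-k autocorrelation of the power series; a minimal sketch (all names are ours):

```python
def new_reference_forecast(p_t, k, autocorr, p_mean):
    """p_hat(t + k) = a_k * p(t) + (1 - a_k) * mean(p), where a_k is
    the lag-k autocorrelation of the power series.  Reduces to pure
    persistence for a_k = 1 and to the mean forecast for a_k = 0."""
    a_k = autocorr(k)
    return a_k * p_t + (1.0 - a_k) * p_mean

# With an illustrative autocorrelation that decays linearly with k:
ac = lambda k: max(0.0, 1.0 - 0.02 * k)
print(new_reference_forecast(0.8, 0, ac, 0.3))   # 0.8  (persistence)
print(new_reference_forecast(0.8, 50, ac, 0.3))  # 0.3  (mean forecast)
```

Unlike plain persistence, this baseline does not degrade without bound at long horizons, which makes improvement scores more honest for long forecasts.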

5.1 Method development issues

Working with Expektra's model is very straightforward, and no major issues were encountered during development, but some issues were encountered working with NuPIC¹.

¹ It should also be pointed out that it helps to have the people who developed the code on the spot, and this is probably the main reason there was so little trouble.


1. Installing NuPIC is difficult. This seems to be a general problem, judging by posts on the NuPIC mailing list². A very important issue to fix if they want more people to work with this model.

2. The built-in swarming (PSO) did not work well, especially for slightly larger datasets and for 48 prediction steps. The main problems were fitting everything into memory and that the code ran too slowly when swarming over multiple step-ahead predictions, making it practically impossible to use. Scaling the problem down to just a few prediction steps and multiple models helped, but the swarm model kept dismissing wind speed as an important feature, which was fixed by explicitly telling it not to touch that encoder.

3. Inconsistencies between the white-paper documentation of HTM CLA and the actual implementation make it hard to understand the underlying principles of NuPIC. NuPIC implements a very complex network, and any inconsistency in the documentation makes it harder to understand. The code is quite well documented, but the additional material is a bit sparse.

These issues are all understandable and are continuously being improved upon. NuPIC is still a young platform, so issues are to be expected given the complexity and research nature of the project. Training the CLAClassifier to use many prediction steps instead of just one would most likely reduce the bias error seen in the results; investigating this requires a more powerful computer.

There are some differences in the inputs between the models. This setup was created because an OPF model needs to encode the value it is going to predict. SCADA data is available, so this is not an issue in itself, but it may introduce some unfairness between NuPIC and Expektra. In the current setup NuPIC does not seem to gain anything extra from the additional information, and other models in the competition most likely used different inputs as well.

In general, NuPIC differs in how data is fed to it: only the front of the signal is sent in, and temporal context is achieved through the temporal memory.

5.2 Future improvements and directions

One issue with the GEFCom dataset is that it does not contain any specific information about the location of the 7 wind farms. It is worth considering how to handle different models that are specialized for different types of terrain, as this has been shown to increase performance [Kariniotakis et al., 2006]. Another thing to investigate could be a more advanced scheme for hyperparameter selection: Jursa and Rohrig [2008] show that the WPF error can be reduced by smarter use of optimization algorithms for feature selection and hyperparameter selection, which should improve the performance.

² http://numenta.org/lists

In general, having more types of inputs from sources like SCADA and NWP systems could give a lot of valuable information for better predictions. It is clear that wind speed is a key feature for producing good forecasts, which is also supported by the input analysis in section 4.2. It has been argued that the error in WPF models stems largely from wrong weather forecasts [Giebel et al., 2011], so identifying good sources of weather forecasts would be desirable for more effective model learning. Combining several different weather forecasts could also be investigated, as Nielsen et al. [2007] showed that power forecasts based on a number of different meteorological forecasts were better than those based on a single source.

Regarding HTM CLA, it is worth considering implementing a custom encoder specifically targeted at data concerning wind farms. SCADA data could also be streamed directly into a NuPIC model, which would allow for some interesting online predictions and anomaly detections.

Training times are always an issue, especially for very large models. Both NuPIC and Expektra's model would benefit from being implemented in a more parallel fashion, as these algorithms are well suited for parallel computing. The speed-up would be of great value, so implementing a GPU version of the algorithms could be worthwhile.

5.3 Conclusions

The field of energy forecasting is still young, and the necessity for good forecasting techniques is constantly growing. This initial study shows that part of Expektra's forecasting method can be used to forecast wind power generation. We observe that this ANN is able to achieve performance comparable to other methods published in Hong et al. [2014]. Additional work is still needed to obtain state-of-the-art results. Numenta's model is able to beat the reference model, but needs additional work before we can draw definite conclusions about its performance. Constructing a custom encoder could be a first step in that direction.


Bibliography

Subutai Ahmad and Jeff Hawkins. Properties of sparse distributed representations and their application to hierarchical temporal memory. arXiv preprint arXiv:1503.07469, 2015.

T.C. Akinci. Short term wind speed forecasting with ANN in Batman, Turkey. Elektronika ir Elektrotechnika, 107(1):41–45, 2015.

E. Michael Azoff. Neural network time series forecasting of financial markets. John Wiley & Sons, Inc., 1994.

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13(1):281–305, 2012.

Daniel P. Buxhoeveden and Manuel F. Casanova. The minicolumn hypothesis in neuroscience. Brain, 125(5):935–951, 2002.

Erasmo Cadenas and Wilfrido Rivera. Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks. Renewable Energy, 34(1):274–278, 2009.

Dmitri B. Chklovskii, B.W. Mel, and K. Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782–788, 2004.

A. Costa, A. Crespo, J. Navarro, G. Lizcano, H. Madsen, and E. Feitosa. A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews, 12(6):1725–1744, August 2008. ISSN 13640321. doi: 10.1016/j.rser.2007.01.015. URL http://dx.doi.org/10.1016/j.rser.2007.01.015.

Russ C. Eberhart and James Kennedy. A new optimizer using particle swarm theory. In Proceedings of the sixth international symposium on micro machine and human science, volume 1, pages 39–43. New York, NY, 1995.


Shu Fan, James R. Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. Energy Conversion, IEEE Transactions on, 24(2):474–482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento: a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276. Citeseer, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition EWEC 2008, pages 6–pages. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving hierarchical temporal memory-based trading models. Springer, 2013.

M.A. Gaertner, C. Gallardo, C. Tejeda, N. Martínez, S. Calabria, N. Martínez, and B. Fernández. The casandra project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: The next generation prediction system. In Proc. of the 2001 European Wind Energy Conference, EWEC'01, Copenhagen, Denmark, pages 777–780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: A literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G. Lobo. Sipreólico: wind power prediction tool for the Spanish peninsular power system. Proceedings of the CIGRÉ 40th general session and exhibition, Paris, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357–363, 2014.


Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41–46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585–592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694–709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference, EWEC'07, 2007.

G. Kariniotakis. Position paper on Joule project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J. Halliday, R. Brownsword, Ignacio Marti, Ana Maria Palomares, I. Cruz, H. Madsen, T.S. Nielsen, Henrik Aa. Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power: overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, pages 10–pages, 2006.

G.N. Kariniotakis, G.S. Stavrakakis, and E.F. Nogaret. Wind power forecasting using advanced neural networks models. Energy Conversion, IEEE Transactions on, 11(4):762–767, 1996.

Stanley J. Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326–334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275–293, 2009.

Lars Landberg and Simon J. Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171–195, 1994.

S. Lang, C. Mohrlen, J. Jorgensen, B. Ó. Gallachóir, and E. McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International Solar Energy Society UK Section Conference, volume 85, page 89, 2006.


S.M. Lawan, W.A.W.Z. Abidin, W.Y. Chai, A. Baharun, and T. Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760–1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. Smart Grid, IEEE Transactions on, 5(1):501–510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets, 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164–168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313–2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154–3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa. Nielsen, and Torben S. Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475–489, 2005.

Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431–441, 1963.

Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons. 1969.

Marvin Lee Minsky and Oliver G. Selfridge. Learning in random nets. MIT Lincoln Laboratory, 1960.

Jorge J. Moré. The Levenberg-Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105–116. Springer, 1978.

Henrik Aa. Nielsen, Torben S. Nielsen, Henrik Madsen, Maria J. Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471–482, 2007.


Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29–34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power, 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159–167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms v0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33–57, 2007.

A. Rodrigues, J.A. Peças Lopes, P. Miranda, L. Palma, C. Monteiro, R. Bessa, J. Sousa, C. Rodrigues, and J. Matos. EPrev: a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference, EWEC, volume 7, 2007.

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5:3, 1988.

Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85–117, 2015.

S. Sinkevicius, R. Simutis, and V. Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91–96, 2011.

Ke-Sheng Wang, Vishal S. Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61–69, 2014.

WWEA. 2014 half-year report, pages 1–8. Technical report, WWEA, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365–376, 2013.

Hao Yu and Bogdan M. Wilamowski. Levenberg-Marquardt training. Industrial Electronics Handbook, 5:12-1, 2011.


Appendix A

Hyper-parameters

Parameters for the spatial pooler

Alias                Default  Description
columnCount          -        The number of cell columns in a cortical region.
globalInhibition     false    If true, during the inhibition phase the winning columns are selected as the most active columns from the region as a whole.
numActivePerInhArea  10       The maximum number of active columns per inhibition area.
synPermActiveInc     0.1      The amount by which an active synapse is incremented in each round, specified as a percent of a fully grown synapse.
synPermConnected     0.10     Controls the threshold at which synapses are connected.
synPermInactiveDec   0.01     The amount by which an inactive synapse is decremented in each round.
potentialRadius      16       Determines the extent of the input that each column can potentially be connected to.

Table A.1: Configuration parameters for the spatial pooler.


Parameters for the scalar encoder

Alias       Symbol  Description
w           w       Number of bits to set in the output.
minval      v_min   The lower bound of the input value.
maxval      v_max   The upper bound of the input value.
n           n       Number of bits in the representation (n must be > w).
radius      r       Inputs separated by more than or equal to this distance will have non-overlapping representations.
resolution  ψ       Inputs separated by more than or equal to this distance will have different representations.

Table A.2: Configuration parameters for the scalar encoder.
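To make these parameters concrete, here is a minimal NuPIC-style scalar encoder sketch (ours, not NuPIC's actual implementation): n output bits with a contiguous run of w active bits, so that nearby values share active bits and values further apart than the radius share none.

```python
def encode_scalar(value, minval, maxval, n, w):
    """Map a value in [minval, maxval] onto n bits with a contiguous
    run of w active bits (a simplified scalar encoder)."""
    buckets = n - w + 1          # number of distinct run positions
    assert buckets > 1
    value = min(max(value, minval), maxval)
    resolution = (maxval - minval) / (buckets - 1)
    i = int(round((value - minval) / resolution))
    bits = [0] * n
    for j in range(i, i + w):
        bits[j] = 1
    return bits

print(encode_scalar(0.0, 0.0, 1.0, 8, 3))  # [1, 1, 1, 0, 0, 0, 0, 0]
print(encode_scalar(1.0, 0.0, 1.0, 8, 3))  # [0, 0, 0, 0, 0, 1, 1, 1]
```

Overlap between encodings is what lets the spatial pooler generalize across nearby wind speeds.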


Parameters for the temporal memory

Alias                  Default  Description
activationThreshold    12       Activation threshold for segments.
cellsPerColumn         32       Number of cells per column.
columnCount            2048     The number of cell columns in a cortical region.
globalDecay            0.10     Decrements all synapses a little bit all the time.
initialPerm            0.11     Initial permanence value for a synapse.
inputWidth             -        Size of the input.
maxAge                 100000   Controls global decay. Global decay will only decay segments that have not been activated for maxAge iterations, and will only run the global decay loop every maxAge iterations.
maxSegmentsPerCell     -        The maximum number of segments a cell can have.
maxSynapsesPerSegment  -        The maximum number of synapses a segment can have.
minThreshold           8        The minimum required activity for a segment to learn.
newSynapseCount        15       The maximum number of synapses added to a segment during learning.
permanenceDec          0.10     How much permanence is removed from synapses when learning occurs.
permanenceInc          0.10     How much permanence is added to synapses when learning occurs.
temporalImp            cpp/py   Controls which temporal memory implementation to use.

Table A.3: Configuration parameters for the temporal memory.


Appendix B

Wind characteristics

Figure B.1: Wind characteristics for WF 1 and WF 2 in the GEFCom dataset, showing zonal and meridional wind components. The intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Figure B.2: Wind characteristics for WF 3-7 in the GEFCom dataset, showing zonal and meridional wind components. The intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Appendix C

Error Distribution


[Figure: histograms of forecast error at lead times 1, 10, 20, 30, 40 and 48 hours for WF 1 using NuPIC.]

Figure C.1: Error distribution for different lead times, WF 1.

[Figure: histograms of forecast error at lead times 1, 10, 20, 30, 40 and 48 hours for WF 2 using NuPIC.]

Figure C.2: Error distribution for different lead times, WF 2.

[Figure: histograms of forecast error at lead times 1, 10, 20, 30, 40 and 48 hours for WF 3 using NuPIC.]

Figure C.3: Error distribution for different lead times, WF 3.

[Figure: histograms of forecast error at lead times 1, 10, 20, 30, 40 and 48 hours for WF 4 using NuPIC.]

Figure C.4: Error distribution for different lead times, WF 4.

[Figure: histograms of forecast error at lead times 1, 10, 20, 30, 40 and 48 hours for WF 5 using NuPIC.]

Figure C.5: Error distribution for different lead times, WF 5.

[Figure: histograms of forecast error at lead times 1, 10, 20, 30, 40 and 48 hours for WF 6 using NuPIC.]

Figure C.6: Error distribution for different lead times, WF 6.

[Figure: histograms of forecast error at lead times 1, 10, 20, 30, 40 and 48 hours for WF 7 using NuPIC.]

Figure C.7: Error distribution for different lead times, WF 7.

[Figure: histograms of forecast error at lead times 1, 10, 20, 30, 40 and 48 hours for WF 1 using Expektra's model.]

Figure C.8: Error distribution for different lead times, WF 1.

[Figure: histograms of forecast error at lead times 1, 10, 20, 30, 40 and 48 hours for WF 2 using Expektra's model.]

Figure C.9: Error distribution for different lead times, WF 2.

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

Figure C10 Error distribution for different lead times WF 3


[Figure omitted: histogram panels "wf4 using expektra" for lead times 48, 40, 30, 20, 10 and 1; x-axis: error (−1.0 to 1.0), y-axis: frequency (0-70).]

Figure C.11: Error distribution for different lead times, WF 4.


[Figure omitted: histogram panels "wf5 using expektra" for lead times 48, 40, 30, 20, 10 and 1; x-axis: error (−1.0 to 1.0), y-axis: frequency (0-70).]

Figure C.12: Error distribution for different lead times, WF 5.


[Figure omitted: histogram panels "wf6 using expektra" for lead times 48, 40, 30, 20, 10 and 1; x-axis: error (−1.0 to 1.0), y-axis: frequency (0-70).]

Figure C.13: Error distribution for different lead times, WF 6.


[Figure omitted: histogram panels "wf7 using expektra" for lead times 48, 40, 30, 20, 10 and 1; x-axis: error (−1.0 to 1.0), y-axis: frequency (0-70).]

Figure C.14: Error distribution for different lead times, WF 7.


List of Figures

2.1 A figure that presents the general outline when forecasting using the statistical approach
2.2 A figure that presents the general steps when forecasting using a physical model
3.1 The perceptron
3.2 Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of H hidden layers, as well as M input connections. Each edge seen in this graph has a w_ij associated with it
3.3 Information flow of a single-region predictive model created with the OPF
3.4 The CLAClassifier
3.5 Training an OPF model
4.1 Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs. wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.2 Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.3 Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.4 Different error measurements for WF 1
4.5 Different error measurements for WF 2
4.6 Different error measurements for WF 3
4.7 Different error measurements for WF 4
4.8 Different error measurements for WF 5
4.9 Different error measurements for WF 6
4.10 Different error measurements for WF 7
4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point, i.e. the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind
4.12 Plot showing the unoptimized version vs. the optimized when training a network with 10 input neurons, 10/15/20/25 hidden neurons and 1 output neuron. Training was done for 100 epochs using LM
4.13 Summarized average improvement over all wind farms with 95% confidence intervals
B.1 Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power
B.2 Wind characteristics for WF 3-7, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power
C.1 Error distribution for different lead times, WF 1
C.2 Error distribution for different lead times, WF 2
C.3 Error distribution for different lead times, WF 3
C.4 Error distribution for different lead times, WF 4
C.5 Error distribution for different lead times, WF 5
C.6 Error distribution for different lead times, WF 6
C.7 Error distribution for different lead times, WF 7
C.8 Error distribution for different lead times, WF 1
C.9 Error distribution for different lead times, WF 2
C.10 Error distribution for different lead times, WF 3
C.11 Error distribution for different lead times, WF 4
C.12 Error distribution for different lead times, WF 5
C.13 Error distribution for different lead times, WF 6
C.14 Error distribution for different lead times, WF 7

List of Tables

3.1 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, power observations are available for updating the models
3.2 Preprocessed features from the dataset. The Forecast category represents the forecast given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the feature we will use in training and testing
3.3 Example where n = 14, r = 5, ψ = 1, of various encoded scalar values using a ScalarEncoder
4.1 NRMSE scores of the entries published in [Hong et al., 2014]. The NuPIC model and Expektra model are added so we can easily compare the results
A.1 Configuration parameters for the encoder
A.2 Configuration parameters for the spatial pooler
A.3 Configuration parameters for the temporal memory


  • Contents
  • Introduction
    • Problem Formulation
    • The scope of the problem
  • Background
    • Neural Networks and Time Series Prediction
  • Method and Materials
    • Preliminaries
      • Remarks
      • Definitions
      • Reference models
      • Error metrics
      • Model selection
      • Evaluation
    • Experiments
    • Holdback Input Randomization
    • Optimization methods
    • Neural Networks
      • Multilayer Perceptron
      • Numenta Platform for Intelligent Computing
  • Result
    • Experimental results
    • Input Importance
    • Adaptation and Optimization
    • Summary
  • Discussion
    • Method development issues
    • Future improvements and directions
    • Conclusions
  • Bibliography
  • Appendices
    • Hyper-parameters
    • Wind characteristics
    • Error Distribution
  • List of Figures
  • List of Tables

CHAPTER 2 BACKGROUND

Figure 2.1: A figure that presents the general outline when forecasting using the statistical approach.

Forecast models that use SCADA data as their primary input source usually have a good forecast accuracy, at least for the first few hours, but they tend to be less useful for longer prediction horizons [Giebel et al., 2011]. SCADA data can also be used to detect problems in WTs, something that can be helpful to improve the reliability of WTs [Yang et al., 2013; Wang et al., 2014].

Statistical models, seen in figure 2.1, are usually built around Numerical Weather Prediction (NWP) and SCADA data. NWP models are often used to build forecasting models, as they introduce weather forecasts for the region where the wind turbines are located. NWP data usually contains information about things like wind speed, wind direction, temperature and humidity. These models are operated twice or four times a day by a number of large weather services. The main forecasts usually start at 00 and 12 UTC, corresponding to the world radiosonde launching, which is the only direct observation of the atmospheric state and has been the backbone of atmospheric monitoring for many decades¹; extra forecasts usually start at 06 and 18 UTC. Physical models, such as those seen in figure 2.2, include additional information about the physical characteristics of the wind turbine and its surroundings, i.e. terrain data, information about obstacles, capacity and the layout where the turbine is located, and so on. Other useful information for physical models includes the theoretical

¹ One problem with this approach is that it results in less information over large oceans and poorer countries.


power curve: how much power is expected to be produced given a specific wind speed.

The time scale of WPF methods is generally divided into three main groups: very-short-term (up to 9 hours), short-term (up to 72 hours) and medium-term (up to 7 days) [Costa et al., 2008], while the time step for these models is in the range of seconds to days, depending on the application. Very-short-term models used for wind power forecasting consist of statistical methods like Kalman filters, Auto-Regressive Moving Average (ARMA), Auto-Regressive with Exogenous Input (ARX), Box-Jenkins, etc. Input to these models are historical observations of wind speed, wind direction, temperature, etc., and common applications for this forecast horizon include things like intraday market trading. Since these methods are merely based on past production, they are generally not useful for longer horizons.

[Figure omitted: block diagram with components SCADA Data, NWP Data, WFC Data, Physical Model, Downscaling, Transformation to Hub Height, Spatial Refinements, Conversion to Power (WT Power Curve), Model Output Statistics (MOS) and the resulting forecast of wind power generation.]

Figure 2.2: A figure that presents the general steps when forecasting using a physical model.

Short-term forecasting models include additional data about historical observations, usually obtained from SCADA systems, but more importantly weather forecasts from NWP models. Machine learning methods used in this area include neural networks [Kusiak et al., 2009], Support Vector Machines [Fugon et al., 2008], Nearest Neighbour Search [Jursa et al., 2007], Random Forests, etc. Medium-term forecasting models usually incorporate all the methods used for shorter forecasts as well as physical characteristics.


2.1 Neural Networks and Time Series Prediction

One of the most common ANNs is the so-called Multilayer Perceptron (MLP), a network built around some very simple properties of a cortical neuron. This network has been around since the 50's and has matured with a solid mathematical foundation, and it has been applied successfully to many different applications. It is built around the idea of having multiple layers of neurons, each layer feeding information forward to the next layer, i.e. a feed-forward neural network.

The main advantage of using multiple layers instead of just a single one is to overcome the limitations pointed out by Minsky and Selfridge [1960] and Minsky and Seymour [1969], where a single-layer perceptron is only capable of learning linearly separable patterns and thus unable to learn all functions. The MLP is not limited in this way and has been shown to be able to represent a wide variety of functions given appropriate parameters [Hornik et al., 1989].

NuPIC is a platform actively developed and maintained by Numenta. It introduces a collection of ideas and algorithms that are inspired by the structural organization of the neocortex. Two of the core concepts found inside NuPIC are the so-called Hierarchical Temporal Memory and the Cortical Learning Algorithm. HTM was introduced in Hawkins and Blakeslee [2007] and refers to a hierarchy of cortical regions in the brain.

The neocortex is classically divided into 6 different layers²; the current implementation in NuPIC is focused on emulating layers 3-4 (specifically layer 3), with extensions and research code being done to include more layers.

A CLA region consists of a collection of columns, and each column consists of a handful of cells. This structure is based on the minicolumn hypothesis [Buxhoeveden and Casanova, 2002]. Each column in the CLA region has its own semantic meaning, and the sparse activity of a handful of active columns will tell us something about the input. The CLA is modelled so that specific cells within each column reflect a temporal context of a pattern, and a single cortical region (a CLA region) is trained with the CLA algorithm.

A typical CLA region consists of around 2K columns containing around 60K neurons in total, while a typical MLP may have less than 100 neurons. Each neuron in a HTM network grows new synapses over time, and it is not uncommon to have around 5K synapses per neuron, meaning we would have around 300M synapses in total. A single region of this size uses around 100 MB of memory, and it takes around 10 ms to do one inference and learning step³.

² These layers should not be confused with the hierarchy of regions. ³ These values are given in the talk "Sensor-Motor Integration in the Neocortex", 2013 Hackathon.



The HTM/CLA also differs from the MLP in that it does not directly use scalar weights to represent synaptic connectivity. Synapses inside a HTM have binary weights with a scalar permanence: each synapse is either connected or disconnected, while the MLP uses scalar weights. The connectedness in the HTM/CLA is based on a permanence, which is a value between 0.0 and 1.0. If the permanence is over a certain threshold we have a connected synapse, i.e. we have weights of either 1 or 0. This binary mechanism is there to simulate synapses that are able to form and unform during learning. So essentially we have a weight-change network vs. a wiring-change network [Chklovskii et al., 2004].

NuPIC today is naturally geared towards time-series prediction, which makes it an interesting candidate for wind forecasting. NuPIC has been shown to outperform the MLP on financial time series [Gabrielsson et al., 2013], and it has been used successfully to balance traffic [Sinkevicius et al., 2011]. It has also been used on the NASA Aviation Dataset [Lee and Rajabi, 2014], but with less promising results. MLPs have been used successfully for many kinds of time-series problems [Azoff, 1994; Niska et al., 2004; Jain and Kumar, 2007].


Chapter 3

Method and Materials

The purpose of this chapter is to provide an explanation of the method used in this thesis. It presents terms and concepts needed and used in the field of Machine Learning, with a focus on how to create forecasts based on time series. It presents the motivation for why these methods are used as well as how they are used. This chapter also contains a description of the implemented artificial neural network, how it is structured and optimized. It provides details about the experimental setup and how these experiments relate to GEFCom, and finally a description of how the datasets are structured and processed.

3.1 Preliminaries

3.1.1 Remarks

The methodology used to evaluate the prediction models presented in this thesis is based on the protocol presented in Madsen et al. [2005]¹, which is a complete protocol that can be used to evaluate the performance of short-term WPF and Wind-to-Power (W2P) models. The reason this protocol was chosen for this thesis is that it has been successfully used before as a guideline to evaluate a wide variety of forecast models, such as AWPPS, Prediktor, Previento, Sipreólico and WPPT. The protocol has been used for both on-shore and off-shore wind farms and was developed in the frame of the ANEMOS research project [Kariniotakis et al., 2006], which brought together many relevant groups involved in the field. The aim of ANEMOS was to develop accurate and robust models that substantially outperform current state-of-the-art methods, and one of its goals was to establish a common set

¹ It should be pointed out that no widely agreed standardization exists.


of performance measures that can be used to compare forecasts across systems and locations.

3.1.2 Definitions

Time series

A time series is a sequence X of observations x_t with a particular time stamp t, i.e. X = [x_1, x_2, ..., x_t], or in short X_{1:t}; it is a time-dependent collection of variables.

Forecast horizon

A multi-step-ahead prediction is the task of predicting k forecasts X_{t+1:t+k} given a collection of historical observations X_{t−p+1:t}. In the case of GEFCom we want forecasts for 1-48 steps ahead.

Point or spot forecast

In this thesis we model the forecast p̂_{t+k|t} as a so-called point forecast (or spot forecast), i.e. a single value for each forecast (as opposed to a probability distribution).

Prediction error

The prediction error e for lead time t+k is defined in equation 3.1 as the difference between the forecast and the actual value, where t denotes the time index, k is the look-ahead time, p is the actual (measured, true) wind power and p̂ is the predicted wind power:

    e_{t+k|t} = p_{t+k} − p̂_{t+k|t}    (3.1)

The normalized prediction error ε is given in equation 3.2:

    ε_{t+k|t} = (1/p_inst) e_{t+k|t} = (1/p_inst) (p_{t+k} − p̂_{t+k|t})    (3.2)

where p_inst is the installed capacity of the wind farm (in kW or MW), which is unknown in the competition.


3.1.3 Reference models

Persistence (also called a naïve or plain predictor), seen in equation 3.3, is the model most commonly used for benchmarking WPF models. This simple model states that the future wind generation value will be the same as the last measured value:

    p̂_persistence_{t+k|t} = p_t    (3.3)

An alternative would be to use an even simpler model: a climatology prediction (equation 3.4), i.e. predicting the mean, a value that is estimated from the training set (see section 3.1.5):

    p̂_mean_{t+k|t} = p̄ = (1/N) Σ_{t=1}^{N} p_t    (3.4)
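As an illustration (not the thesis code; NumPy assumed, function names hypothetical), the two reference predictors reduce to a few lines:

```python
import numpy as np

def persistence_forecast(last_power, k):
    # Eq. 3.3: repeat the last measured power for all k look-ahead times.
    return np.full(k, last_power)

def climatology_forecast(train_power, k):
    # Eq. 3.4: forecast the mean power estimated from the training set.
    return np.full(k, np.mean(train_power))
```

Persistence is hard to beat for the first few hours, while climatology bounds the error for long horizons.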

There are other reference models, like the one suggested by Nielsen et al. [1998], which have advantages over persistence, but that model was never widely adopted and is not used by GEFCom.

3.1.4 Error metrics

In order to understand the reason why specific models perform well, it is usually a good idea to evaluate them against a wide variety of different criteria, as is emphasised by Kariniotakis [1997]. The following sections describe the error measures used; in this section, N denotes the size of the test set.

Forecast Bias

The Normalized Forecast Bias (NBIAS) describes the systematic error and is defined in equation 3.5; it is estimated by calculating the average error for each step ahead, and it gives an indication of the direction of the error:

    NBIAS_k = (1/N) Σ_{t=1}^{N} ε_{t+k|t}    (3.5)

This bias is sometimes referred to by some authors as the Mean Forecast Error (MFE). A perfect MFE does not mean the forecast is perfect and contains no error, as positive and negative errors cancel each other out, but it gives an indication that the forecasts are on target.


Mean Absolute Error

The Normalized Mean Absolute Error (NMAE) looks at the average of the absolute error of the prediction and is defined in equation 3.6:

    NMAE_k = (1/N) Σ_{t=1}^{N} |ε_{t+k|t}|    (3.6)

Another common name for the Mean Absolute Error (MAE) is Mean Absolute Deviation (MAD). This value shows the magnitude of the overall error that has occurred due to forecasting and should thus be as small as possible. This error is also scale dependent and will be affected by data transformations and the scale of measurements.

Mean Squared Error

The Normalized Mean Squared Error (NMSE) looks at the average of the squared errors ε²_{t+k|t} and is built from the Normalized Sum of Squared Errors (NSSE) defined in equation 3.7:

    NSSE_k = Σ_{t=1}^{N} ε²_{t+k|t}    (3.7)

NMSE is defined in equation 3.8:

    NMSE_k = (1/N) NSSE_k = (1/N) Σ_{t=1}^{N} ε²_{t+k|t}    (3.8)

In this error measure, positive and negative errors do not cancel each other out, and large individual errors are penalized more harshly.

Root Mean Squared Error

The Normalized Root Mean Squared Error (NRMSE) is the square root of the NMSE and is defined in equation 3.9:

    NRMSE_k = NMSE_k^{1/2} = ( (1/N) Σ_{t=1}^{N} ε²_{t+k|t} )^{1/2}    (3.9)

NRMSE is the main metric used in GEFCom, and it shares the same properties as NMSE.
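For concreteness, the four normalized metrics above can be computed per lead time as follows (a sketch assuming NumPy arrays; p_inst is the installed capacity from eq. 3.2):

```python
import numpy as np

def wpf_metrics(p_true, p_pred, p_inst):
    # Normalized prediction errors, eq. 3.2.
    eps = (p_true - p_pred) / p_inst
    nbias = eps.mean()          # eq. 3.5: direction of the error
    nmae = np.abs(eps).mean()   # eq. 3.6: magnitude of the error
    nmse = (eps ** 2).mean()    # eq. 3.8: penalizes large errors harder
    nrmse = np.sqrt(nmse)       # eq. 3.9: the main GEFCom metric
    return nbias, nmae, nmse, nrmse
```

Note how two opposite errors give a zero NBIAS but a nonzero NMAE and NRMSE.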


3.1.5 Model selection

In regression and classification, one of the main issues we face is: "How do we create a good model?" One way to define this "goodness" is to look at the model's generalization ability, i.e. its performance on unseen data.

To make sure the model we are building will generalize to new, unseen data, it matters how we select and build the model in the first place; what we have control over is the data at hand and how we use that data to build our model.

Training, testing and validating

In the case of supervised learning we have access to a target series and its associated features, i.e. the production series (our target), with wind speed and wind direction as features.

Common practice within Machine Learning, adhered to in this thesis, is to split the whole dataset into 3 smaller subsets, then use two of these subsets to find a good model (the training set and the validation set), while the remaining one (the test set) is used for evaluation, i.e. to estimate how accurate the model would be on "unseen data". With highly flexible models like an artificial neural network we need to be careful not to overfit the data, which is one of the reasons why we have the validation set: we do not want a model that generalizes poorly because it has been fitted to every minor variation, i.e. has captured a lot of noise.

3.1.6 Evaluation

The improvement with respect to a considered reference model ref is defined in equation 3.10:

    I^ref_{EC,k} = 100 · (EC^ref_k − EC_k) / EC^ref_k  (%)    (3.10)

where the Evaluation Criterion (EC) is any error measurement, such as NMSE or NMAE.
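Equation 3.10 is a one-liner; for example, halving the reference model's error gives an improvement of 50% (sketch, names hypothetical):

```python
def improvement(ec_ref, ec):
    # Eq. 3.10: percentage improvement over a reference model's
    # evaluation criterion (e.g. NRMSE); positive means better.
    return 100.0 * (ec_ref - ec) / ec_ref
```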


Testing period

Date        Time   Forecast
2011-01-01  01:00  1-48 hours
2011-01-04  13:00  1-48 hours
2011-01-08  01:00  1-48 hours
2011-01-11  13:00  1-48 hours
...
2012-06-23  01:00  1-48 hours
2012-06-26  13:00  1-48 hours

Table 3.1: The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, power observations are available for updating the models.

3.2 Experiments

Training and testing are structured based on the structure of the GEFCom. The dataset spans from midnight of 1 July 2009 to noon of 26 June 2012. The period from 1 July 2009 to 1 January 2011 at 01:00 is used for training and validating, while the rest is used for testing. In the testing range, a number of 48-hour periods are defined (see table 3.1); in between these testing periods exists additional training data, which enables the models to be updated between forecasts.

The testing periods repeat every 7 days until the end of the dataset, and only meteorological forecasts that were relevant for the periods with missing power data are given; this was done in order to be consistent.

Meteorological forecasts in the GEFCom dataset consist of zonal u and meridional v components collected for surface winds at 10 m above ground level. They were extracted from the archive of the European Centre for Medium-range Weather Forecasts (ECMWF), a service that issues high-resolution deterministic forecasts twice a day, at 00 Coordinated Universal Time (UTC) and 12 UTC; each of these forecast periods consists of data for 1-48 hours ahead. In order to match the hourly resolution of the power measurements, the forecasts were interpolated using cubic splines to an hourly resolution. A summary of the features found in the dataset is given in table 3.2.


No.  Category  Parameter             Alias  Type

1    Date      Date                  date   String
2    Date      Year                  year   Integer
3    Date      Month                 month  Integer
4    Date      Day                   day    Integer
5    Date      Hour                  hours  Integer
6    Date      Week                  week   Integer
7    Forecast  Wind Speed            ws     Real
8    Forecast  Wind Direction (deg)  wd     Real
9    Forecast  Wind U                u      Real
10   Forecast  Wind V                v      Real
11   Forecast  Issued                hp     Integer
12   SCADA     Production            wp     Real

Table 3.2: Preprocessed features from the dataset. The Forecast category represents the forecast given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the feature we will use in training and testing.


The database for the meteorological forecasts given by the GEFCom dataset contains sections of missing data, each corresponding to a date at which the missing power information exists; these sections were filled out with the previous best available forecast in a pre-processing step. For example, if we do not have data for the forecast issued at 2011-01-01 12:00, we use data from 2011-01-01 00:00, as a 48-hours-ahead forecast is available for this date. If the previous section also contained missing data, we would go back an additional section and use those forecasts. If no forecast is available 48 hours back, we use the best known forecast and extend it in the same fashion as the persistence model.

Any predictions made by these models should fall within a certain range, so any forecast outside this range is clamped. This is the main post-processing step being done, i.e. there is an upper limit on what can be produced.

3.3 Holdback Input Randomization

The Holdback Input Randomization (HIPR) method, described in Kemp et al. [2007], can be used to investigate the importance of the input parameters. It works by sequentially feeding each data point in the test set to the neural network while replacing the values of one input parameter at a time. The replacement values are uniformly distributed random values in the range the neural network was originally trained on, i.e. (-1, 1). An NRMSE score is calculated for each replacement. The result is that we obtain information about the relevance of each input.
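A minimal sketch of this procedure (not the thesis code; NumPy assumed, and `predict` stands in for a trained network): one input column at a time is replaced with uniform noise in (-1, 1) and the NRMSE is recomputed.

```python
import numpy as np

def hipr_importance(predict, X, y, seed=0):
    # Replace one input channel at a time with uniform noise in (-1, 1)
    # and record the NRMSE; a large score marks an important input.
    rng = np.random.default_rng(seed)
    scores = []
    for j in range(X.shape[1]):
        Xr = X.copy()
        Xr[:, j] = rng.uniform(-1.0, 1.0, size=len(X))
        err = y - predict(Xr)
        scores.append(float(np.sqrt(np.mean(err ** 2))))
    return scores
```

A channel the model ignores scores zero, since randomizing it leaves the predictions unchanged.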

3.4 Optimization methods

The two main optimization algorithms that have been used in this thesis are the Particle Swarm Optimization (PSO) algorithm, presented by Eberhart and Kennedy [1995], and the Levenberg-Marquardt (LM)² algorithm, independently developed by Kenneth Levenberg and Donald Marquardt [Marquardt, 1963; Levenberg, 1944]. The PSO algorithm is a population-based stochastic algorithm similar to a Genetic Algorithm (GA), but based on social-psychological principles instead of evolution. It

² The LM algorithm was used in the beginning of the project, but the reported results ended up using the PSO algorithm. The LM algorithm is included in this section because speed optimization was measured using this algorithm.


can be summarized by the following steps (each network was trained multiple times in order to avoid local minima):

• Step 1: Initialize particles with random velocities and accelerations.

• Step 2: Determine which particle is closest to the goal.

• Step 3: Adjust accelerations toward that particle.

• Step 4: Update particle positions based on their velocities, and update velocities based on accelerations.

• Step 5: Go to step 2.
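The steps above correspond to a standard global-best PSO loop. A minimal sketch (not the thesis implementation; NumPy assumed, and the inertia and acceleration constants are illustrative) minimizing an arbitrary cost function, such as a network's validation error over its weight vector:

```python
import numpy as np

def pso_minimize(f, dim, n_particles=30, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1, 1, (n_particles, dim))      # Step 1: positions
    v = rng.uniform(-0.1, 0.1, (n_particles, dim))  # ... and velocities
    pbest, pbest_f = x.copy(), np.apply_along_axis(f, 1, x)
    gbest = pbest[np.argmin(pbest_f)].copy()        # Step 2: best particle
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # Step 3: accelerate toward personal and global bests.
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
        x = x + v                                   # Step 4: move particles
        fx = np.apply_along_axis(f, 1, x)
        better = fx < pbest_f
        pbest[better], pbest_f[better] = x[better], fx[better]
        gbest = pbest[np.argmin(pbest_f)].copy()    # Step 5: repeat
    return gbest, float(pbest_f.min())
```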

The LM algorithm is a combination of the Error Back-Propagation (EBP) method [Rumelhart et al., 1988] and the Gauss-Newton method. This algorithm has the speed advantage of Gauss-Newton and the stability of EBP [Yu and Wilamowski, 2011]. A detailed treatment of LM can be found in Moré [1978], and of PSO in Poli et al. [2007].

3.5 Neural Networks

3.5.1 Multilayer Perceptron

The perceptron is built around a nonlinear model of a neuron: the McCulloch-Pitts model [McCulloch and Pitts, 1943]. It basically consists of a 2-step process where the cell body contains a summation function of the weighted sum of all inputs, including a bias. The perceptron is described by equation 3.11; the sum s is passed through an activation function (see section 3.5.1), which mimics the activation, or firing, of the neuron:

    s = Σ_{i=1}^{M} w_i x_i + x_0    (3.11)

where w_i is the weight of the "synapse" of input channel i and is the parameter we want to adjust, x_i is the input value and x_0 is the bias; M denotes the number of inputs.
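Equation 3.11 together with the activation gives the whole neuron; as a sketch (NumPy assumed):

```python
import numpy as np

def perceptron(x, w, bias, f=np.tanh):
    # Eq. 3.11: weighted sum of the inputs plus the bias ...
    s = np.dot(w, x) + bias
    # ... passed through the activation function f.
    return f(s)
```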


[Figure omitted: diagram of the perceptron, showing input signals weighted by w, a bias, a summation node and an activation f(s) producing the output signal.]

Figure 3.1: The perceptron.

The structure of the MLP consists of many perceptrons, and it is shown in figure 3.2. The input signal flows from the input layer at the bottom to the output layer at the top. We have a bias signal, seen on the left side of the diagram, which is set to a fixed number.

MLPs are fully connected networks, meaning that each neuron in any layer of the network is connected to all the neurons in the previous layer. Each connection in the network has a weight w_ij associated with it. The initialization of these weights is done before any training, by randomly assigning very small values to the respective weights in the graph. The architecture used in this study has only one output variable, i.e. the function we approximate has an output that produces forecasts of the power generation given a certain input.


[Figure omitted: MLP architecture with an input layer (features hours, u, v, week, ws, ws-1, ws-2, ws+1, ws+2), tanh hidden layers, a linear output layer and a bias signal.]

Figure 3.2: Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of H hidden layers, as well as M input connections. Each edge seen in this graph has a w_ij associated with it.

The performance of neural networks is generally improved if the data is normalised, because using the original data directly can cause convergence problems. Normalization is done using the mapminmax function seen in equation 3.12:

    y = (y_max − y_min) · (x − x_min) / (x_max − x_min) + y_min    (3.12)

where y_max is the maximum of the specified range, in this case 1, and y_min is −1; x is the value to be scaled, and x_max and x_min are the maximum and minimum of the values to be scaled.
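A direct transcription of equation 3.12 (sketch; works on scalars or NumPy arrays):

```python
def mapminmax(x, x_min, x_max, y_min=-1.0, y_max=1.0):
    # Eq. 3.12: linearly rescale x from [x_min, x_max] to [y_min, y_max].
    return (y_max - y_min) * (x - x_min) / (x_max - x_min) + y_min
```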

The forecasting model consists of 7 different networks one for each wind farmthese networks are trained on the first section of the dataset before the test periodA random split of 602020 is created where 60 of the available data is used fortraining and 20 of the data is used to validate the network And 20 is set tobe a hold-out set for the hyperparamters Input features3 feed into the models are

3. See table 3.2.


CHAPTER 3 METHOD AND MATERIALS

ws, u, v, hours, ws, ws+1, ws+2, ws−1, ws−2, where ws+x and ws−x denote a time shift of x. We can use ws+x as input into the model because ws is a forecast in itself.
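The time-shifted inputs can be collected with a small helper; this sketch (a hypothetical function, with boundary clamping as an assumed convention) illustrates how the ws±x features are gathered for one time step:

```python
def shifted_features(ws, t, shifts=(-2, -1, 0, 1, 2)):
    """Collect the forecast wind speed at time t plus shifted copies ws[t+x].

    Because ws is itself a forecast series, 'future' shifts (x > 0) are
    legitimately available as model inputs at prediction time.
    Out-of-range shifts are clamped to the series boundary.
    """
    n = len(ws)
    return [ws[min(max(t + x, 0), n - 1)] for x in shifts]

ws = [3.0, 4.0, 5.0, 6.0, 7.0]
features = shifted_features(ws, 2)   # [ws-2, ws-1, ws, ws+1, ws+2]
edge = shifted_features(ws, 0)       # shifts before the series start are clamped
```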

Activation Functions

The computation done for each neuron in the multilayer perceptron requires knowledge of the derivative of the activation function; in other words, the activation function we pick needs to be differentiable. In this thesis we use the following two activation functions: the hyperbolic tangent function in equation 3.13, and

f(s) = tanh(s) = (e^s − e^−s) / (e^s + e^−s)    (3.13)

the linear transfer function seen in equation 314

f(s) = +1 if s ≥ 1,  s if −1 < s < 1,  −1 if s ≤ −1    (3.14)
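A minimal sketch of the two activation functions in plain Python (equation numbers refer to the text above):

```python
import math

def tanh_act(s):
    """Hyperbolic tangent activation (equation 3.13)."""
    return (math.exp(s) - math.exp(-s)) / (math.exp(s) + math.exp(-s))

def linear_act(s):
    """Saturating linear transfer function (equation 3.14)."""
    if s >= 1:
        return 1.0
    if s <= -1:
        return -1.0
    return s
```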

Hyperparameter optimization

To obtain the hyperparameters for each wind farm's model, a random hyperparameter search was performed for all models, and a hold-out validation set was used to pick the best hyperparameters. Random search has been shown to work better than grid search when not all parameters are equally important [Bergstra and Bengio, 2012].
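A sketch of such a random search (illustrative only; the search space, trial count, and the toy objective standing in for training an MLP are assumptions, not the thesis's actual configuration):

```python
import random

def random_search(train_and_score, space, n_trials=20, seed=0):
    """Randomly sample hyperparameter settings and keep the one with the
    lowest validation score (per Bergstra and Bengio, 2012)."""
    rng = random.Random(seed)
    best, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in space.items()}
        score = train_and_score(params)
        if score < best_score:
            best, best_score = params, score
    return best, best_score

# Toy objective standing in for "train an MLP, return hold-out NRMSE".
space = {"hidden": [5, 10, 15, 20], "lr": [0.1, 0.01, 0.001]}
best, score = random_search(lambda p: abs(p["hidden"] - 10) + p["lr"], space)
```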

3.5.2 Numenta Platform for Intelligent Computing

This section describes the key principles introduced with NuPIC. It explains in general terms the theory behind the Online Prediction Framework (OPF)4, which uses the CLA and HTM algorithms; the OPF works as an API for creating predictive models. The OPF consists of 5 major types of components: Encoders, Spatial Pooler (SP), Temporal Memory (TM), Temporal Pooler (TP), and Classifiers. Together these components construct a single CLA region5, and Figure 3.3 shows the information flow through a single region.

4. The OPF is used with Numenta's commercial product GROK.
5. Currently, models created with the OPF do not use a TP, nor does this client allow creation of a hierarchy of regions (i.e. what we would call an HTM); it is only possible to create one region.


Another central concept in NuPIC is the Sparse Distributed Representation (SDR), which refers to the activation of a small percentage of the neurons at any given time. This neuronal activity is represented as an n-dimensional sparse binary vector x = [b_0, …, b_n] with around 2% active cells. The output from the spatial pooler and the temporal memory are both SDRs, while the output from the encoder does not enforce this and is just a normal binary vector. A general overview of the properties of SDRs is given by Ahmad and Hawkins [2015].

(Figure content: Encoder → Spatial Pooler → Temporal Memory → Classifier.)

Figure 3.3: Information flow of a single-region predictive model created with the OPF.

The overlap between two different SDRs is defined by

o(x, y) = x · y    (3.15)

A match between two SDRs is defined by

m(x, y) = o(x, y) ≥ θ    (3.16)

where θ is set so that θ ≤ ‖x‖₁ and θ ≤ ‖y‖₁. An interesting property of SDRs, used multiple times inside the temporal memory and especially for predictions, is that a fixed-size set of SDRs can be reliably stored by taking their union as a single pattern. The boolean OR operator is used to create a


Value    Scalar Encoding
1        11111000000000
2        01111100000000
10       00000000011111

Table 3.3: Example where n = 14, r = 5, ψ = 1 of various encoded scalar values using a ScalarEncoder.

new vector from a set of SDRs. The downside of storing patterns this way is that the more patterns we store, the higher the probability of false positives.
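The overlap, match, and union operations can be sketched with SDRs represented as sets of active bit indices (a representation chosen here for illustration; NuPIC stores them as binary arrays):

```python
def overlap(x, y):
    """Overlap o(x, y): number of bits active in both SDRs (equation 3.15)."""
    return len(x & y)

def match(x, y, theta):
    """Match m(x, y): true when the overlap reaches threshold θ (equation 3.16)."""
    return overlap(x, y) >= theta

def union(patterns):
    """Store a fixed set of SDRs as one pattern with boolean OR.

    Membership can then be tested with match(), at the cost of a growing
    false-positive rate as more patterns are OR-ed in.
    """
    out = set()
    for p in patterns:
        out |= p
    return out

a = {1, 5, 9, 20}
b = {5, 9, 30, 41}
u = union([a, b])
```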

Encoders

NuPIC contains many different encoders6. The job of an encoder is to convert raw input into a more suitable representation (i.e. a binary vector). The raw input is fed into the model using a dictionary data structure.

Useful encoders include scalar and categorical encoders, while more exotic encoders include one for Global Positioning System (GPS) coordinates; this encoder can be used to extract information about anomalous movements. The entries in the dictionary of raw inputs are each encoded separately and concatenated using a multi-encoder.

One property of the spatial pooler is that overlapping input patterns are mapped to the same SDR. This means that we want the encoder to encode input so that similar inputs share bits. The ScalarEncoder fulfils this property through the process illustrated in table 3.3.

We calculate the range of values to encode using equation 3.17, where v_min represents the minimum value of the input signal and v_max denotes its upper bound.

v_range = v_max − v_min    (3.17)

There are three mutually exclusive parameters that determine the overall size of the output from a scalar encoder: (n, r, ψ). n directly represents the total number

6. A full list of all encoders can be found in the API documentation for NuPIC.


of bits in the output, and it must be greater than or equal to w, which represents the number of bits that are set to encode a single value, i.e. the "width" of the output signal7. r and ψ are specified with respect to the input, while w is specified with respect to the output. Two inputs separated by more than the radius have non-overlapping representations. Two inputs separated by less than the radius will in general overlap in at least some of their bits. Two inputs separated by greater than or equal to the resolution are guaranteed to have different representations. ψ can be calculated using equation 3.18.

ψ = r / w    (3.18)

Depending on whether we want periodic behaviour or not, n is calculated a bit differently; see the scalar encoder implementation for details8.
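A toy non-periodic scalar encoder that reproduces the encodings in table 3.3 might look as follows (a simplification; the real NuPIC ScalarEncoder handles periodic ranges, clipping, and parameter validation differently):

```python
def scalar_encode(value, v_min=1.0, v_max=10.0, n=14, w=5):
    """Toy non-periodic scalar encoder: w contiguous ON bits out of n,
    positioned proportionally to the value (compare table 3.3).

    Nearby values share ON bits, which is the overlap property the
    spatial pooler relies on.
    """
    v_range = v_max - v_min            # equation 3.17
    resolution = v_range / (n - w)     # value step per bit position; here 9/9 = 1
    clamped = min(max(value, v_min), v_max)
    i = int(round((clamped - v_min) / resolution))
    return "".join("1" if i <= k < i + w else "0" for k in range(n))
```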

Spatial Pooler

The spatial pooler receives a binary vector as its input from an encoder and outputs an SDR. The structure of the spatial pooler consists of an input space and a set of columns; the output SDR represents which of the columns in a region are active. Each column has synapses connected to the input space: a set of potential synapses, the so-called "potential pool", of which around 50% are randomly connected. Each synapse connects and disconnects with the input space during learning.

The general information flow of the spatial pooler consists of the following steps. Input stimuli from lower regions lead to activation of the input space to which each mini-column is synaptically connected; this activation, the so-called "overlap score", is computed for each column. The score is calculated as the total sum over the active neurons influencing that column, weighted by a "boosting factor" that increases the chance for certain columns to win more easily. A final top percentage (usually around 2% activity) of the columns with the highest influence (biggest overlap score) is chosen; columns also inhibit nearby columns, and the result of this process is a binary vector with few active bits, an SDR. This is illustrated in equation 3.19, where b can be either 0 or 1 and s is a value representing the score.

7. w must be odd to avoid centering problems.
8. https://github.com/numenta/nupic


[b_1, b_2, …, b_n] (input vector) · [b_ij]_{d×n} (connected synapses for each column) = [s_1, s_2, …, s_d] (overlap score) → inhibition → [b_1, b_2, …, b_d] (output SDR)    (3.19)

Learning in this structure is done by adjusting the permanences on the proximal dendrite to better match the input: for each winning column, increase the permanence of the synapses that correctly matched the input and decrease the permanence of the rest. We also increase the boosting factor for losing columns, to give them a bigger chance of winning next time.
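The overlap-score and inhibition steps above can be sketched as follows (a toy region; real spatial pooling also updates permanences and boosting factors during learning, which is omitted here):

```python
def spatial_pool(input_bits, synapses, boost, active_frac=0.02):
    """One simplified spatial-pooler step (compare equation 3.19).

    synapses[c] is the set of input indices column c is connected to.
    The overlap score per column is the boosted count of active connected
    inputs; inhibition then keeps only the top fraction of columns.
    """
    scores = [boost[c] * len(input_bits & conn) for c, conn in enumerate(synapses)]
    n_active = max(1, int(len(synapses) * active_frac))
    cutoff = sorted(scores, reverse=True)[n_active - 1]
    return {c for c, s in enumerate(scores) if s >= cutoff and s > 0}

# Toy region: 4 columns, each synaptically connected to a few input bits.
synapses = [{0, 1, 2}, {2, 3, 4}, {5, 6, 7}, {0, 6, 7}]
boost = [1.0, 1.0, 1.0, 1.0]
active = spatial_pool({0, 1, 2, 3}, synapses, boost, active_frac=0.25)
```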

Temporal Memory

The temporal memory receives an SDR from the spatial pooler, which represents the active columns; the output of the temporal memory is the activity of the whole CLA region. The general idea of the TM9 is that cells in each mini-column provide temporal context to the pattern identified by the spatial pooler. The TM's job is to form transitions between different SDRs, and it achieves this by allowing active cells to form connections to previously active cells, so that in a future setting each cell is able to predict its own activity.

Each cell in a region has several distal segments, ideally one for each pattern the cell has transitioned from; in practice these segments are each connected to a couple of patterns. A cell enters a predictive state if there is enough activity on a segment, and every segment consists of a collection of connections, synapses that have been formed to a subset of previously active cells (typically around 10-15 cells).

The Temporal Memory receives a sparse binary vector from the spatial pooler, and the general information flow for this part of the CLA goes through two phases. The first phase has the following steps: 1) for each active column, check whether any cell is in a predictive state; if there is a cell in a predictive

9. There are a lot of details in how the Temporal Memory is implemented; pseudo-code and more details can be found in [Numenta, 2011]. The nupic git repository is the best source for finer details: https://github.com/numenta/nupic


state, then 2) activate that particular cell. If no cell was found in a predictive state, then 3) activate all cells in that particular column, a process called bursting.

Bursting is analogous to saying: we do not know which cell to activate, because we have not seen this instance of the sequence of patterns before and are unable to put the column into the correct temporal context, so let us activate all cells to reflect this uncertainty. Bursting does not occur when the temporal context has been predicted by the CLA. Equation 3.20 shows the first phase.

[b_1, b_2, …, b_n] (SP SDR) combined with [b_ij]_{d×n} (predictive state) = [b_ij]_{d×n} (active state)    (3.20)

The second phase of the algorithm figures out which cells should be put into a predictive state for the next time step. This is achieved by checking every distal segment on every cell for activity above a certain threshold, i.e. checking for active segments; one active segment is enough to put a cell into a predictive state. The output from the temporal memory is a vector representing the state of all cells in that region. Equation 3.21 shows phase 2.

[b_1, b_2, …, b_n] (active state) · [b_ij]_X (segment X) = [b_1, b_2, …, b_n]_X (segment activation X) > τ → [s_1, s_2, …, s_n] (predictive state)    (3.21)

If learning is turned on, the permanences on the synapses connected to active distal segments are updated (in the same way as in the spatial pooler). These changes are marked as temporary until we are sure that the cell is in fact able to correctly predict something with the new values; after that, the change is either made permanent or removed. Temporarily marked changes are confirmed whenever a cell goes from inactive to active through feed-forward input (reinforce the permanence, as we correctly predicted the feed-forward activation); if a cell instead went from active to inactive, the change is undone.
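Phase 1 — activating predicted cells, with bursting as the fallback — can be sketched as follows (a simplified model; the cell indexing and data structures are assumptions, not NuPIC's internals):

```python
def tm_phase1(active_columns, predictive, cells_per_column=4):
    """Phase 1 of the temporal memory (compare equation 3.20).

    predictive is the set of (column, cell) pairs predicted at t-1.
    Columns with a predicted cell activate just that cell; columns with
    no prediction burst, activating every cell in the column.
    """
    active_cells, burst_columns = set(), set()
    for col in active_columns:
        predicted = {(c, i) for (c, i) in predictive if c == col}
        if predicted:
            active_cells |= predicted            # steps 1-2: verified prediction
        else:
            burst_columns.add(col)               # step 3: bursting
            active_cells |= {(col, i) for i in range(cells_per_column)}
    return active_cells, burst_columns

# Column 3 was predicted (cell 2); column 7 was not and therefore bursts.
active, burst = tm_phase1({3, 7}, {(3, 2), (9, 0)})
```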


NuPIC Classifiers

There are many different classifiers included with NuPIC, such as k-Nearest Neighbours (kNN) and a Support Vector Machine (SVM); the OPF in particular uses a custom-built classifier called the "CLAClassifier". The purpose of this classifier is to decode predictions made by the CLA. Classification with the CLAClassifier is performed in different ways depending on how the input has been encoded.

Scalar values are decoded using a process where each cell in a CLA region is paired with two histograms. One histogram keeps track of the frequency of encountered patterns associated with each cell; the other keeps track of a moving average for each bucket. This process is illustrated in figure 3.4.

(Figure content: an SDR across columns 1…N, where each cell is paired with 2 histograms — a likelihood histogram and a moving average — over buckets spanning the min value to the max value.)

Figure 3.4: The CLAClassifier.
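The per-cell bookkeeping can be sketched as follows (a toy stand-in for the CLAClassifier's internals; the exact update rule and the smoothing factor α are assumptions):

```python
def update_cell_stats(stats, active_cells, bucket, actual, alpha=0.1):
    """Toy version of the CLAClassifier bookkeeping: for every active cell,
    bump the frequency histogram for the encountered bucket and update a
    moving average of the actual value seen with that bucket."""
    for cell in active_cells:
        freq, avg = stats.setdefault(cell, ({}, {}))
        freq[bucket] = freq.get(bucket, 0) + 1
        avg[bucket] = actual if bucket not in avg else \
            (1 - alpha) * avg[bucket] + alpha * actual

stats = {}
update_cell_stats(stats, {(0, 1)}, bucket=3, actual=0.5)
update_cell_stats(stats, {(0, 1)}, bucket=3, actual=0.7)
```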

Training in NuPIC

The NuPIC model is trained using online learning. The schema in figure 3.5 is used to train and test these models; since there are 7 different wind farms, the schema is repeated for each wind farm.

This schema is slightly adjusted from the traditional way the OPF handles multi-step predictions. The default is to use 48 different models, one for


every step ahead in the CLAClassifier, which would give us 48 · 7 = 336 models in total (for all the wind farms). Not only does this change match Expektra's approach, it also reduces the memory requirements of the CLAClassifier.

(Figure content: in the training phase, a pre-training data chunk feeds the hyperparameter setup via PSO swarming or a manual setup; the OPF model then consumes the training data stream with online learning activated and emits predictions. In the testing phase, online learning is deactivated and the model produces multi-step predictions on the testing data.)

Figure 3.5: Training an OPF model.

Input and hyperparameter selection

Finding hyperparameters for an OPF model was partly done using the custom built-in PSO algorithm and partly configured manually, mainly by ensuring that the PSO algorithm did not remove encoders. A single CLA region contains a lot of hyperparameters; these parameters are listed with a corresponding description in appendix A. The inputs to the model are date, ws, wp, u, v.


Chapter 4

Result

This chapter presents the primary results obtained in this study. Figures 4.4-4.10 contain different error measurements on the test set for each wind farm. We see that Expektra's ANN model performs well over all wind farms, while the NuPIC model performs worse than Expektra but in general still better than our reference model. NuPIC is off target, with a bias error on most wind farms. The graphs also indicate that NuPIC is unable to pick up some trends in the cumulated ε² graph. Expektra's model, on the other hand, shows no clear problems in cumulated ε² and is on target on all wind farms; appendix C has been included to reflect this at different lead times.

Figure 4.1: Left diagram: cumulative probability of wind speed. Right diagram: scatter diagram of the power curve, normalized power output vs wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


Figure 4.2: Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).

Figure 4.3: Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


Figure 4.4: Different error measurements for WF 1 (NBIAS, NRMSE, cumulated ε², and NMAE vs look-ahead time, for Expektra, NuPIC, and Persistence).


Figure 4.5: Different error measurements for WF 2 (same panels as figure 4.4).


Figure 4.6: Different error measurements for WF 3 (same panels as figure 4.4).


Figure 4.7: Different error measurements for WF 4 (same panels as figure 4.4).


Figure 4.8: Different error measurements for WF 5 (same panels as figure 4.4).


Figure 4.9: Different error measurements for WF 6 (same panels as figure 4.4).


Figure 4.10: Different error measurements for WF 7 (same panels as figure 4.4).


                              Wind Farm
User           1      2      3      4      5      6      7      All
Leustagos      0.145  0.138  0.168  0.144  0.158  0.133  0.140  0.146
DuckTile       0.143  0.145  0.172  0.145  0.165  0.137  0.146  0.148
MZ             0.141  0.151  0.174  0.145  0.167  0.141  0.145  0.149
Propeller      0.144  0.153  0.177  0.147  0.175  0.141  0.147  0.152
Duehee Lee     0.157  0.144  0.176  0.160  0.169  0.154  0.148  0.155
Expektra       0.165  0.158  0.184  0.164  0.179  0.153  0.153  0.165
MTU EE5260     0.161  0.172  0.193  0.162  0.192  0.156  0.160  0.168
SunWind        0.174  0.177  0.193  0.176  0.179  0.157  0.162  0.172
ymzsmsd        0.163  0.186  0.200  0.164  0.192  0.162  0.167  0.174
4138 Kalchas   0.180  0.179  0.197  0.175  0.200  0.160  0.165  0.177
NuPIC          0.243  0.254  0.264  0.310  0.290  0.224  0.240  0.264
Persistence    0.302  0.338  0.373  0.364  0.388  0.341  0.361  0.355

Table 4.1: NRMSE scores of the entries published in [Hong et al., 2014]. The NuPIC and Expektra models are added so we can easily compare the results.

4.1 Experimental results

Looking at the primary results presented in this study, we see in table 4.1 that Expektra's ANN model achieves performance comparable to the other methods published in Hong et al. [2014]. This can be used as a starting point for further investigations. A similar approach to Expektra's is that of Duehee Lee [Lee and Baldick, 2014], as both methods are based on the same neural network architecture but with different optimization techniques. Duehee Lee's network is structured slightly differently from Expektra's model, which has only 1 output node, whereas Duehee Lee uses five (representing five predictions ahead). Besides these differences, Duehee Lee also includes additional sub-models to create a prediction ensemble, blending the output of the ANN with a Gaussian Process (GP) to improve very short-term predictions. MTU EE5260 is also a neural network approach that converts meteorological forecasts to power, and its score is very close. The HTM CLA beats the persistence model but needs more work to outperform the other models in GEFCom.
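The NRMSE scores compared in table 4.1 can be sketched as follows (the exact GEFCom normalization is not restated here, so the capacity-based normalization is an assumption of this sketch):

```python
import math

def nrmse(actual, predicted, capacity=1.0):
    """Normalized RMSE: root-mean-square error of the power forecast,
    divided by a normalizing constant (here: installed capacity, assumed)."""
    n = len(actual)
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n
    return math.sqrt(mse) / capacity

# Toy example with normalized power values in [0, 1].
score = nrmse([0.2, 0.5, 0.9], [0.25, 0.45, 0.8])
```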


4.2 Input Importance

For interpretation purposes, in order to understand the model better, an analysis of the relative importance of the input parameters was performed. This analysis is illustrated in figure 4.11. Each box represents noise added to that channel, which results in a higher NRMSE score if that feature was important; a reference point, "all-channels", represents the error distribution of the model with no input replacement. We clearly see in this figure that the wind speed channel ws is the most important attribute: adding noise to this channel greatly affects the NRMSE score. We also see that the wind components u, v show little to no influence, and that the timestamp-related inputs hours and week both indicate that there are seasonal and daily trends present in the dataset.

Figure 4.11: Relative input parameter importance using HIPR; "all-channels" reflects the reference point, i.e. the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind.
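The noise-injection probe can be sketched as follows (an illustrative HIPR-style procedure; the noise distribution, trial count, and the toy model standing in for the trained network are assumptions):

```python
import random

def input_importance(model_error, X, n_features, trials=30, seed=0):
    """HIPR-style relevance probe: replace one input channel at a time with
    noise and record how much the error grows; important channels give a
    large increase. model_error(X) -> error is a stand-in for the trained net."""
    rng = random.Random(seed)
    baseline = model_error(X)
    importance = {}
    for f in range(n_features):
        errs = []
        for _ in range(trials):
            noisy = [row[:] for row in X]
            for row in noisy:
                row[f] = rng.uniform(0.0, 1.0)   # scramble channel f only
            errs.append(model_error(noisy))
        importance[f] = sum(errs) / trials - baseline
    return importance

# Toy "model" whose error depends only on feature 0 (think: wind speed).
err = lambda X: sum(abs(row[0] - 0.5) for row in X) / len(X)
imp = input_importance(err, [[0.5, 0.1], [0.5, 0.9]], n_features=2)
```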


4.2.1 Adaptation and Optimization

All training for Expektra's model was performed on a MacBook Pro with an Intel Core 2 Duo processor (P8600, 2.40 GHz) on a 64-bit operating system with 4 GB of installed RAM. After adapting the code to run experiments on the GEFCom dataset, the performance of the core source code provided by Expektra was evaluated. This investigation identified some slow parts of the code; after fixing these issues, mainly by using a MathNET native library instead of the managed provider, a speed test was performed comparing the unoptimized version with the optimized one. Figure 4.12 shows the speed-up achieved during training using the LM algorithm.

Figure 4.12: Training time for the unoptimized ("Normal") vs the optimized version when training a network with 10 input neurons; 10, 15, 20, or 25 hidden neurons; and 1 output neuron. Training was done for 100 epochs using LM.

4.3 Summary

A plot showing the average NRMSE improvement over the persistence model can be seen in figure 4.13. Both models perform better than persistence: Expektra's model performs better than the NuPIC model over the whole forecast horizon, and NuPIC performs better than persistence towards the end of the forecast.

Figure 4.13: Summarized average improvement (%) in NRMSE over the persistence model, across all wind farms, with 95% confidence intervals, for Expektra and NuPIC as a function of look-ahead time.


Chapter 5

Discussion

With the exponential growth of computing power it becomes easier to study deeper and more complex networks, and with recent advances in the area of deep learning, interest in ANNs has resurged. In practice, ANNs have been used by most groups in the field of WPF, but these networks never caught on, as it was argued that the improvements made by ANNs were usually not enough to justify the extra effort of training them [Giebel et al., 2011]. This is steadily becoming less of an issue as computation becomes cheaper and bigger networks perform better.

By using a simple MLP network we observe that we are able to obtain results similar in performance to other models published in Hong et al. [2014]. This can be used as a starting point for further investigations. The GEFCom dataset has a limited number of input features; if we had access to a wider range of them, the power of neural networks could be studied more in depth, but given the number of features in the dataset this is hard to do.

Persistence was used as the reference model in order to have the same baseline as the one used in GEFCom, but a better reference model should be considered in the future, such as the one presented in Nielsen et al. [1998], especially if longer forecasts are to be considered.

5.1 Method development issues

Working with Expektra's model is very straightforward, and no major issues were encountered during development, but some issues were encountered working with NuPIC1:

1. It should also be pointed out that it helps to have the people who developed the code on the spot, and this is probably the main reason for the lack of trouble.


1. Installing NuPIC is difficult. This seems to be a general problem, judging by posts on the nupic mailing list2, and a very important issue to fix if they want more people to work with this model.

2. The built-in swarming (PSO) did not work well, especially for slightly larger datasets and for 48 prediction steps. The main problems were fitting everything into memory and that the code ran too slowly when swarming over multiple steps ahead, making it practically impossible to use. Scaling down the problem to just a few prediction steps and multiple models helped, but the swarm model kept dismissing wind speed as an important feature, which was fixed by specifically telling it not to touch that encoder.

3. Inconsistencies between the white-paper documentation of the HTM CLA and the actual implementation make it hard to understand the underlying principles of NuPIC. NuPIC presents a very complex network, and any inconsistency in documentation makes it hard to understand. The code is quite well documented, but the additional material is a bit sparse.

These issues are all understandable and are continuously being improved upon; NuPIC is still a young platform, so issues are to be expected given the complexity and research nature of the project. Training the CLAClassifier to use many step predictions instead of just one will most likely reduce the bias error seen in the results; investigating this requires a more powerful computer.

There are some differences in the inputs between the models. This setup was created because an OPF model needs to encode the value it is going to predict. SCADA data is available, so this is not an issue, but it may introduce some unfairness between NuPIC and Expektra. In the current setup NuPIC does not seem to gain anything extra from the additional information, and other models in the competition most likely used different inputs as well.

In general, NuPIC differs in the way you feed data: you send in only the front of the signal, and temporal context is achieved through the temporal memory.

5.2 Future improvements and directions

One issue with the GEFCom dataset is that it does not contain any specific information about the location of the 7 wind farms. It is worth considering how to handle different models that are specialized for different types

2. http://numenta.org/lists


of terrain, as this has been shown to increase performance [Kariniotakis et al., 2006]. Another thing to investigate is a more advanced schema for hyperparameter selection: Jursa and Rohrig [2008] show that the WPF error can be reduced by smarter use of optimization algorithms for feature selection and hyperparameter selection, which should improve performance.

In general, having more types of inputs from sources like SCADA and NWP systems could give a lot of valuable information for better predictions. It is clear that wind speed is a key feature for producing good forecasts, which is also supported by the input analysis in section 4.2. It has been argued that the error in WPF models stems largely from wrong weather forecasts [Giebel et al., 2011], so identifying good sources of weather forecasts would be desirable for more effective model learning. Combining several different weather forecasts could also be investigated, as Nielsen et al. [2007] showed that power forecasts based on a number of different meteorological forecasts were better than those from a single source.

Regarding the HTM CLA, it is worth considering implementing a custom encoder, specifically targeted at wind farm data. SCADA data could also be streamed directly into a NuPIC model, which would allow for some interesting online predictions and anomaly detection.

Training times are always an issue, especially for very large models. Both NuPIC and Expektra's model would benefit from being implemented in a more parallel fashion, as these algorithms are well suited for parallel computing. The speed-up would be of great value, so implementing a GPU version of the algorithms could be worthwhile.

5.3 Conclusions

The field of energy forecasting is still young, and the necessity for good forecasting techniques is constantly growing. This initial study shows that part of Expektra's forecasting method can be used to forecast wind power generation. We observe that this ANN achieves performance comparable to other methods published in Hong et al. [2014]; additional work is still needed to obtain state-of-the-art results. Numenta's model beats the reference model but needs additional work before we can draw definite conclusions about its performance. Constructing a custom encoder could be a first step in that direction.


Bibliography

Subutai Ahmad and Jeff Hawkins Properties of sparse distributed representa-tions and their application to hierarchical temporal memory arXiv preprintarXiv150307469 2015

TC Akinci Short term wind speed forecasting with ann in batman turkey Elektron-ika ir Elektrotechnika 107(1)41ndash45 2015

E Michael Azoff Neural network time series forecasting of financial markets JohnWiley amp Sons Inc 1994

James Bergstra and Yoshua Bengio Random search for hyper-parameter optimiza-tion The Journal of Machine Learning Research 13(1)281ndash305 2012

Daniel P Buxhoeveden and Manuel F Casanova The minicolumn hypothesis inneuroscience Brain 125(5)935ndash951 2002

Erasmo Cadenas and Wilfrido Rivera Short term wind speed forecasting in laventa oaxaca meacutexico using artificial neural networks Renewable Energy 34(1)274ndash278 2009

Dmitri B Chklovskii BW Mel and K Svoboda Cortical rewiring and informationstorage Nature 431(7010)782ndash788 2004

A Costa A Crespo J Navarro G Lizcano H Madsen and E Feitosa A reviewon the young history of the wind power short-term prediction Renewable andSustainable Energy Reviews 12(6)1725ndash1744 August 2008 ISSN 13640321 doi101016jrser200701015 URL httpdxdoiorg101016jrser200701015

Russ C Eberhart and James Kennedy A new optimizer using particle swarm theoryIn Proceedings of the sixth international symposium on micro machine and humanscience volume 1 pages 39ndash43 New York NY 1995

49

BIBLIOGRAPHY

Shu Fan, James R. Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. Energy Conversion, IEEE Transactions on, 24(2):474-482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento: a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276. Citeseer, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition EWEC 2008, 6 pages. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving hierarchical temporal memory-based trading models. Springer, 2013.

M.A. Gaertner, C. Gallardo, C. Tejeda, N. Martínez, S. Calabria, N. Martínez, and B. Fernández. The Casandra project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. In Proc. of the 2001 European Wind Energy Conference, EWEC'01, Copenhagen, Denmark, pages 777-780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: A literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G. Lobo. Sipreólico: wind power prediction tool for the Spanish peninsular power system. Proceedings of the CIGRÉ 40th general session and exhibition, París, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357-363, 2014.


Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359-366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41-46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585-592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694-709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference, EWEC'07, 2007.

G. Kariniotakis. Position paper on joule project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J. Halliday, R. Brownsword, Ignacio Marti, Ana Maria Palomares, I. Cruz, H. Madsen, T.S. Nielsen, Henrik Aa. Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power: overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, 10 pages, 2006.

G.N. Kariniotakis, G.S. Stavrakakis, and E.F. Nogaret. Wind power forecasting using advanced neural networks models. Energy Conversion, IEEE Transactions on, 11(4):762-767, 1996.

Stanley J. Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326-334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275-293, 2009.

Lars Landberg and Simon J. Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171-195, 1994.

S. Lang, C. Mohrlen, J. Jorgensen, B.O. Gallachóir, and E. McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International Solar Energy Society UK Section Conference C, volume 85, page 89, 2006.


S.M. Lawan, W.A.W.Z. Abidin, W.Y. Chai, A. Baharun, and T. Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760-1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. Smart Grid, IEEE Transactions on, 5(1):501-510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets, 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164-168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313-2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154-3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa. Nielsen, and Torben S. Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475-489, 2005.

Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431-441, 1963.

Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115-133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons, 1969.

Marvin Lee Minsky and Oliver G. Selfridge. Learning in random nets. MIT Lincoln Laboratory, 1960.

Jorge J. Moré. The Levenberg-Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105-116. Springer, 1978.

Henrik Aa. Nielsen, Torben S. Nielsen, Henrik Madsen, Maria J. Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471-482, 2007.


Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29-34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power, 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159-167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms, v0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33-57, 2007.

A. Rodrigues, J.A. Peças Lopes, P. Miranda, L. Palma, C. Monteiro, R. Bessa, J. Sousa, C. Rodrigues, and J. Matos. EPREV: a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference, EWEC, volume 7, 2007.

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5:3, 1988.

Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85-117, 2015.

S. Sinkevicius, R. Simutis, and V. Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91-96, 2011.

Ke-Sheng Wang, Vishal S. Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61-69, 2014.

WWEA. 2014 half-year report, WWEA, pp. 1-8. Technical report, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365-376, 2013.

Hao Yu and Bogdan M. Wilamowski. Levenberg-Marquardt training. Industrial Electronics Handbook, 5:12-1, 2011.


Appendix A

Hyper-parameters

Parameters for the spatial pooler

Alias | Default | Description
columnCount | - | The number of cell columns in a cortical region
globalInhibition | false | If true, during the inhibition phase the winning columns are selected as the most active columns from the region as a whole
numActivePerInhArea | 10 | The maximum number of active columns per inhibition area
synPermActiveInc | 0.1 | The amount by which an active synapse is incremented in each round. Specified as a percent of a fully grown synapse
synPermConnected | 0.10 | Controls the threshold at which synapses are connected
synPermInactiveDec | 0.01 | The amount by which an inactive synapse is decremented in each round
potentialRadius | 16 | This parameter determines the extent of the input that each column can potentially be connected to

Table A.1 Table containing configuration parameters for the spatial pooler
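To make the role of numActivePerInhArea concrete, here is a minimal sketch (our illustration, not NuPIC code) of how global inhibition picks the most active columns of a region. The overlap scores below are made up for the example.

```python
def global_inhibition(overlaps, num_active):
    """Return indices of the `num_active` columns with the highest overlap.

    With globalInhibition enabled, the winners are chosen from the region
    as a whole rather than per local inhibition area.
    """
    ranked = sorted(range(len(overlaps)), key=lambda i: overlaps[i], reverse=True)
    return sorted(ranked[:num_active])

# Made-up overlap scores for 8 columns; numActivePerInhArea = 3
overlaps = [2, 9, 4, 7, 1, 9, 0, 3]
print(global_inhibition(overlaps, 3))  # [1, 3, 5]
```

In NuPIC, learning would then nudge the permanences of the winning columns' synapses by synPermActiveInc and synPermInactiveDec, with synPermConnected deciding which synapses count as connected.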


Parameters for the scalar encoder

Alias | Symbol | Description
w | w | number of bits to set in the output
minval | vmin | the lower bound of the input value
maxval | vmax | the upper bound of the input value
n | n | number of bits in the representation (n must be > w)
radius | r | inputs separated by more than or equal to this distance will have non-overlapping representations
resolution | ψ | inputs separated by more than or equal to this distance will have different representations

Table A.2 Table containing configuration parameters for the scalar encoder
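A simplified sketch of the scalar encoding idea (our illustration; the parameter values are made up, and NuPIC's actual ScalarEncoder also derives n from radius or resolution):

```python
def scalar_encode(value, minval=0.0, maxval=100.0, n=14, w=3):
    """Encode a scalar into an n-bit array with w contiguous active bits.

    Nearby values share active bits; values far enough apart do not overlap.
    """
    assert n > w
    value = max(minval, min(maxval, value))          # clip to [minval, maxval]
    i = int((n - w) * (value - minval) / (maxval - minval))  # first active bit
    return [1 if i <= j < i + w else 0 for j in range(n)]

print(scalar_encode(0))    # active bits at the low end of the array
print(scalar_encode(50))   # active bits near the middle
print(scalar_encode(100))  # active bits at the high end
```

Every encoding has exactly w active bits, which is what gives the representation its fixed sparsity.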


Parameters for the temporal memory

Alias | Default | Description
activationThreshold | 12 | Activation threshold for segments
cellsPerColumn | 32 | Number of cells per column
columnCount | 2048 | The number of cell columns in a cortical region
globalDecay | 0.10 | Decrements all synapses a little bit all the time
initialPerm | 0.11 | Initial permanence value for a synapse
inputWidth | - | Size of the input
maxAge | 100000 | Controls global decay. Global decay will only decay segments that have not been activated for maxAge iterations, and will only do the global decay loop every maxAge iterations
maxSegmentsPerCell | - | The maximum number of segments a cell can have
maxSynapsesPerSegment | - | The maximum number of synapses a segment can have
minThreshold | 8 | The minimum required activity for a segment to learn
newSynapseCount | 15 | The maximum number of synapses added to a segment during learning
permanenceDec | 0.10 | How much permanence is removed from synapses when learning occurs
permanenceInc | 0.10 | How much permanence is added to synapses when learning occurs
temporalImp | cpp/py | Controls which temporal memory implementation to use

Table A.3 Table containing configuration parameters for the temporal memory


Appendix B

Wind characteristics

Figure B.1 Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power


Figure B.2 Wind characteristics for WF 3-7, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power

60

Appendix C

Error Distribution


[Six histogram panels, "wf1 using nupic": error distributions for lead times 48, 40, 30, 20, 10 and 1; x-axis: error (-1.0 to 1.0), y-axis: frequency (0 to 70).]

Figure C.1 Error distribution for different lead times, WF 1

[Six histogram panels, "wf2 using nupic": error distributions for lead times 48, 40, 30, 20, 10 and 1; x-axis: error (-1.0 to 1.0), y-axis: frequency (0 to 70).]

Figure C.2 Error distribution for different lead times, WF 2

[Six histogram panels, "wf3 using nupic": error distributions for lead times 48, 40, 30, 20, 10 and 1; x-axis: error (-1.0 to 1.0), y-axis: frequency (0 to 70).]

Figure C.3 Error distribution for different lead times, WF 3

[Six histogram panels, "wf4 using nupic": error distributions for lead times 48, 40, 30, 20, 10 and 1; x-axis: error (-1.0 to 1.0), y-axis: frequency (0 to 70).]

Figure C.4 Error distribution for different lead times, WF 4

[Six histogram panels, "wf5 using nupic": error distributions for lead times 48, 40, 30, 20, 10 and 1; x-axis: error (-1.0 to 1.0), y-axis: frequency (0 to 70).]

Figure C.5 Error distribution for different lead times, WF 5

[Six histogram panels, "wf6 using nupic": error distributions for lead times 48, 40, 30, 20, 10 and 1; x-axis: error (-1.0 to 1.0), y-axis: frequency (0 to 70).]

Figure C.6 Error distribution for different lead times, WF 6

[Six histogram panels, "wf7 using nupic": error distributions for lead times 48, 40, 30, 20, 10 and 1; x-axis: error (-1.0 to 1.0), y-axis: frequency (0 to 70).]

Figure C.7 Error distribution for different lead times, WF 7

[Six histogram panels, "wf1 using expektra": error distributions for lead times 48, 40, 30, 20, 10 and 1; x-axis: error (-1.0 to 1.0), y-axis: frequency (0 to 70).]

Figure C.8 Error distribution for different lead times, WF 1

[Six histogram panels, "wf2 using expektra": error distributions for lead times 48, 40, 30, 20, 10 and 1; x-axis: error (-1.0 to 1.0), y-axis: frequency (0 to 70).]

Figure C.9 Error distribution for different lead times, WF 2

[Six histogram panels, "wf3 using expektra": error distributions for lead times 48, 40, 30, 20, 10 and 1; x-axis: error (-1.0 to 1.0), y-axis: frequency (0 to 70).]

Figure C.10 Error distribution for different lead times, WF 3

[Six histogram panels, "wf4 using expektra": error distributions for lead times 48, 40, 30, 20, 10 and 1; x-axis: error (-1.0 to 1.0), y-axis: frequency (0 to 70).]

Figure C.11 Error distribution for different lead times, WF 4

[Six histogram panels, "wf5 using expektra": error distributions for lead times 48, 40, 30, 20, 10 and 1; x-axis: error (-1.0 to 1.0), y-axis: frequency (0 to 70).]

Figure C.12 Error distribution for different lead times, WF 5

[Six histogram panels, "wf6 using expektra": error distributions for lead times 48, 40, 30, 20, 10 and 1; x-axis: error (-1.0 to 1.0), y-axis: frequency (0 to 70).]

Figure C.13 Error distribution for different lead times, WF 6

[Six histogram panels, "wf7 using expektra": error distributions for lead times 48, 40, 30, 20, 10 and 1; x-axis: error (-1.0 to 1.0), y-axis: frequency (0 to 70).]

Figure C.14 Error distribution for different lead times, WF 7

List of Figures

2.1 A figure that presents the general outline when forecasting using the statistical approach 6
2.2 A figure that presents the general steps when forecasting using a physical model 7
3.1 The perceptron 20
3.2 Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of H hidden layers as well as M input connections. Each edge seen in this graph has a weight wij associated with it 21
3.3 Information flow of a single region predictive model created with the OPF 23
3.4 The CLAClassifier 28
3.5 Training an OPF model 29
4.1 Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) 31
4.2 Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) 32
4.3 Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) 32
4.4 Different error measurement for WF 1 33
4.5 Different error measurement for WF 2 34
4.6 Different error measurement for WF 3 35
4.7 Different error measurement for WF 4 36
4.8 Different error measurement for WF 5 37
4.9 Different error measurement for WF 6 38
4.10 Different error measurement for WF 7 39
4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point; the reference point is the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind 41
4.12 Plot showing the unoptimized version vs the optimized when training a network with 10 input neurons, 10/15/20/25 hidden neurons and 1 output neuron. Training was done for 100 epochs using LM 42
4.13 Summarized average improvement over all wind farms with 95% confidence intervals 43
B.1 Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power 59
B.2 Wind characteristics for WF 3-7, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power 60
C.1 Error distribution for different lead times, WF 1 62
C.2 Error distribution for different lead times, WF 2 63
C.3 Error distribution for different lead times, WF 3 64
C.4 Error distribution for different lead times, WF 4 65
C.5 Error distribution for different lead times, WF 5 66
C.6 Error distribution for different lead times, WF 6 67
C.7 Error distribution for different lead times, WF 7 68
C.8 Error distribution for different lead times, WF 1 69
C.9 Error distribution for different lead times, WF 2 70
C.10 Error distribution for different lead times, WF 3 71
C.11 Error distribution for different lead times, WF 4 72
C.12 Error distribution for different lead times, WF 5 73
C.13 Error distribution for different lead times, WF 6 74
C.14 Error distribution for different lead times, WF 7 75

List of Tables

3.1 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, missing data with power observations are available for updating the models 16

3.2 Preprocessed features from the dataset. The Forecast category represents the forecast given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the feature we will use in training and testing 17

3.3 Example where n = 14, r = 5, ψ = 1 of various encoded scalar values using a ScalarEncoder 24

4.1 NRMSE score of the entries published in [Hong et al., 2014]. The NuPIC model and Expektra model are added so we can easily compare the results 40

A.1 Table containing configuration parameters for the spatial pooler 55
A.2 Table containing configuration parameters for the scalar encoder 56
A.3 Table containing configuration parameters for the temporal memory 57



• Contents
• Introduction
  • Problem Formulation
  • The scope of the problem
• Background
  • Neural Networks and Time Series Prediction
• Method and Materials
  • Preliminaries
    • Remarks
    • Definitions
    • Reference models
    • Error metrics
    • Model selection
    • Evaluation
  • Experiments
    • Holdback Input Randomization
    • Optimization methods
  • Neural Networks
    • Multilayer Perceptron
    • Numenta Platform for Intelligent Computing
• Result
  • Experimental results
  • Input Importance
  • Adaptation and Optimization
  • Summary
• Discussion
  • Method development issues
  • Future improvements and directions
  • Conclusions
• Bibliography
• Appendices
  • Hyper-parameters
  • Wind characteristics
  • Error Distribution
• List of Figures
• List of Tables
                                  • List of Tables

power curve: how much power is expected to be produced given a specific wind speed.

The time scale of WPF methods is generally divided into three main groups: very-short-term (up to 9 hours), short-term (up to 72 hours) and medium-term (up to 7 days) [Costa et al., 2008], while the time step for these models is in the range of seconds to days depending on the application. Very-short-term models used for wind power forecasting consist of statistical methods like Kalman filters, Auto-Regressive Moving Average (ARMA), Auto-Regressive with Exogenous Input (ARX), Box-Jenkins, etc. Input to these models are historical observations of wind speed, wind direction, temperature, etc., and common applications for this forecast horizon include things like intraday market trading. Since these methods are merely based on past production, they are generally not useful for longer horizons.
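To illustrate the flavour of such purely statistical very-short-term methods, the following sketch (ours, not from the thesis) compares a persistence forecast with an AR(1) model fit by ordinary least squares on a made-up normalized production series:

```python
def persistence_forecast(history, horizon):
    """Persistence: the next `horizon` values equal the last observation."""
    return [history[-1]] * horizon

def ar1_forecast(history, horizon):
    """AR(1): y[t+1] = a*y[t] + b, with a and b fit by least squares."""
    x, y = history[:-1], history[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    a = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    b = my - a * mx
    preds, last = [], history[-1]
    for _ in range(horizon):
        last = a * last + b   # feed each prediction back in
        preds.append(last)
    return preds

power = [0.30, 0.35, 0.33, 0.40, 0.42, 0.39, 0.45, 0.44]  # made-up values
print(persistence_forecast(power, 3))  # [0.44, 0.44, 0.44]
print(ar1_forecast(power, 3))
```

Persistence is also the classical reference model that more elaborate forecasters are judged against; both baselines degrade quickly as the horizon grows, which is exactly why NWP input becomes necessary for short-term forecasts.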

[Block diagram: SCADA data, NWP data and WFC data feed the physical model; processing steps are downscaling, transformation to hub height, spatial refinements and conversion to power via the WT power curve, followed by Model Output Statistics (MOS), producing the wind power generation forecast.]

Figure 2.2 A figure that presents the general steps when forecasting using a physical model

Short-term forecasting models include additional data about historical observations, usually obtained from SCADA systems, but more importantly weather forecasts from NWP models. Machine learning methods used in this area include Neural Networks [Kusiak et al., 2009], Support Vector Machines [Fugon et al., 2008], Nearest Neighbour Search [Jursa et al., 2007], Random Forests, etc. Medium-term forecasting models usually incorporate all the methods used for shorter forecasts as well as physical characteristics.


2.1 Neural Networks and Time Series Prediction

One of the most common ANNs is the so called Multilayer Perceptron (MLP), a network built around some very simple properties of a cortical neuron. This network has been around since the 1950s and has matured with a solid mathematical foundation, and it has been applied successfully to many different applications. It is built around the idea of having multiple layers of neurons, each layer feeding information forward to the next layer, i.e. a feed-forward neural network.

The main advantage of using multiple layers instead of just a single one is to overcome the limitations pointed out by Minsky and Selfridge [1960] and Minsky and Papert [1969], where a single-layer perceptron is only capable of learning linearly separable patterns and is thus unable to learn all functions. The MLP is not limited in this way and has been shown to be able to represent a wide variety of functions given appropriate parameters [Hornik et al., 1989].
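As a concrete illustration of why a hidden layer matters, this sketch (ours, with hand-picked weights) shows a tiny 2-2-1 network with step activations computing XOR, the standard example of a function no single-layer perceptron can represent:

```python
def step(x):
    # Heaviside step activation with threshold at 0
    return 1 if x > 0 else 0

def xor_mlp(x1, x2):
    # Hidden layer: h1 fires for OR(x1, x2), h2 fires for AND(x1, x2)
    h1 = step(x1 + x2 - 0.5)
    h2 = step(x1 + x2 - 1.5)
    # Output: "OR but not AND" is exactly XOR
    return step(h1 - h2 - 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_mlp(a, b))
```

In practice the weights are of course learned (e.g. with backpropagation) rather than hand-picked, but the example shows how the hidden layer composes two linear separations into a non-linearly-separable function.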

NuPIC is a platform actively developed and maintained by Numenta. It introduces a collection of ideas and algorithms that are inspired by the structural organization of the neocortex. Two of the core concepts found inside NuPIC are the so called Hierarchical Temporal Memory and the Cortical Learning Algorithm. HTM was introduced in Hawkins and Blakeslee [2007] and refers to a hierarchy of cortical regions in the brain.

The neocortex is classically divided into 6 different layers²; the current implementation in NuPIC is focused on emulating layers 3-4 (specifically layer 3), with extensions and research code being done to include more layers.

A CLA region consists of a collection of columns, and each column consists of a handful of cells. This structure is based on the minicolumn hypothesis [Buxhoeveden and Casanova, 2002]. Each column in the CLA region has its own semantic meaning, and the sparse activity of a handful of active columns will tell us something about the input. The CLA is modelled so that specific cells within each column reflect a temporal context of a pattern, and a single cortical region (a CLA region) is trained with the CLA algorithm.

A typical CLA region consists of around 2K columns containing around 60K neurons in total, while a typical MLP may have fewer than 100 neurons. Each neuron in a HTM network grows new synapses over time, and it is not uncommon to have around 5K synapses per neuron, meaning we would have around 300M synapses in total. A single region of this size uses around 100 MB of memory, and it takes around 10 ms to do one inference and learning step³.

² These layers should not be confused with the hierarchy of regions.
³ These values are given in the talk "Sensor-Motor Integration in the Neocortex", 2013 Hackathon.


The HTM/CLA also differs from the MLP in that it does not directly use scalar weights to represent synaptic connectivity. Synapses inside a HTM have binary weights with a scalar permanence: each synapse will either be connected or disconnected, while the MLP uses scalar weights. The connectedness in HTM/CLA is based on a permanence, which is a value between 0.0 and 1.0. If the permanence is over a certain threshold we have a connected synapse, i.e. we have weights of either 1 or 0. This binary mechanism is there to simulate synapses that are able to form and unform during learning. So essentially we have a weight-change network vs a wiring-change network [Chklovskii et al., 2004].
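A minimal sketch of this mechanism (our illustration, not NuPIC code; the threshold and increments are made-up values): each synapse stores a scalar permanence, the effective weight is binary, and learning only nudges permanences up or down.

```python
CONNECTED_PERM = 0.2   # permanence threshold for a connected synapse (assumed)
PERM_INC = 0.03        # reinforcement for synapses on active inputs (assumed)
PERM_DEC = 0.015       # decay for synapses on inactive inputs (assumed)

def effective_weight(permanence):
    # Binary weight: the synapse either conducts (1) or it does not (0)
    return 1 if permanence >= CONNECTED_PERM else 0

def learn(permanences, active_inputs):
    # Reinforce synapses on active inputs, decay the rest,
    # clipping permanences to [0, 1]
    return [
        min(1.0, p + PERM_INC) if active else max(0.0, p - PERM_DEC)
        for p, active in zip(permanences, active_inputs)
    ]

perms = [0.19, 0.25, 0.05]
print([effective_weight(p) for p in perms])  # [0, 1, 0]
perms = learn(perms, [True, False, True])
print([effective_weight(p) for p in perms])  # [1, 1, 0]: synapse 0 just connected
```

Note how the first synapse crosses the threshold and "forms" after one learning step, without its binary weight ever taking an intermediate value: that is the wiring-change behaviour described above.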

NuPIC today is naturally geared towards time-series prediction, which makes it an interesting candidate for wind forecasting. NuPIC has been shown to outperform the MLP on financial time series [Gabrielsson et al., 2013], and it has been used successfully to balance traffic [Sinkevicius et al., 2011]. It has also been used on the NASA Aviation Dataset [Lee and Rajabi, 2014], but with less promising results. MLPs have been used successfully for many kinds of time series problems [Azoff, 1994; Niska et al., 2004; Jain and Kumar, 2007].


Chapter 3

Method and Materials

The purpose of this chapter is to provide an explanation of the method used in this thesis. It will present terms and concepts needed and used in the field of Machine Learning, with a focus on how to create forecasts based on time series. It presents the motivation for why these methods are used as well as how they are used. This chapter also contains a description of the implemented artificial neural network, how it is structured and optimized. It provides details about the experimental setup and how these experiments relate to GEFCom, and finally a description of how the datasets are structured and processed.

3.1 Preliminaries

3.1.1 Remarks

The methodology used to evaluate the prediction models presented in this thesis is based on the protocol presented in Madsen et al. [2005],1 which is a complete protocol that can be used to evaluate the performance of short-term WPF and Wind-to-Power (W2P) models. The reason this protocol was chosen for this thesis is that it has been successfully used before as a guideline to evaluate a wide variety of forecast models, such as AWPPS, Prediktor, Previento, Sipreólico and WPPT. The protocol has been used for both on-shore and off-shore wind farms and was developed in the frame of the ANEMOS research project [Kariniotakis et al., 2006], which brought together many relevant groups involved in the field. The aim of ANEMOS was to develop accurate and robust models that substantially outperform current state-of-the-art methods, and one of its goals was to establish a common set

1 It should be pointed out that no widely agreed standardization exists.


of performance measures that can be used to compare forecasts across systems and locations.

3.1.2 Definitions

Time series

A time series is a sequence X of observations x_t, each with a particular time-stamp t, i.e. X = [x_1, x_2, ..., x_t], in short X_{1:t}; it is a time-dependent collection of variables.

Forecast horizon

A multi-step-ahead prediction is the assignment of predicting k forecasts X_{t+1:t+k} given a collection of historical observations X_{t-p+1:t}. In the case of GEFCom we want a forecast for 1-48 steps ahead.

Point or spot forecast

In this paper we model the forecast p̂_{t+k|t} as a so-called point forecast (or spot forecast), i.e. a single value for each forecast (compared to having a probability distribution).

Prediction error

The Prediction Error e for lead time t + k is defined in equation 3.1 as the difference between the forecast and the actual value, where t denotes the time index, k is the look-ahead time, p is the actual (measured, true) wind power and p̂ is the predicted wind power:

e_{t+k|t} = p_{t+k} - p̂_{t+k|t}   (3.1)

and the normalized prediction error ε as seen in equation 3.2:

ε_{t+k|t} = (1/p_inst) e_{t+k|t} = (1/p_inst)(p_{t+k} - p̂_{t+k|t})   (3.2)

where p_inst is the installed capacity of the wind farm (in kW or MW), which is unknown in the competition.


3.1.3 Reference models

Persistence (also called a naïve or plain predictor), as seen in equation 3.3, is the model most commonly used for benchmarking WPF models. This simple model states that the future wind generation value will be the same as the last measured value:

p̂^{persistence}_{t+k|t} = p_t   (3.3)

An alternative would be to use an even simpler model: a climatology prediction (equation 3.4), i.e. predicting the mean, a value that would be approximated from the training set (see section 3.1.5):

p̂^{mean}_{t+k|t} = p̄ = (1/N) Σ_{t=1}^{N} p_t   (3.4)

There are other reference models, like the one suggested by Nielsen et al. [1998], which has advantages over persistence, but it was never widely adopted and is not used by GEFCom.
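Equations 3.3 and 3.4 translate directly into code (a minimal sketch; the function names are our own):

```python
import numpy as np

def persistence_forecast(p_t, k):
    """Equation 3.3: every one of the k look-ahead steps equals the last measured value."""
    return np.full(k, p_t)

def climatology_forecast(training_power, k):
    """Equation 3.4: every one of the k look-ahead steps equals the training-set mean."""
    return np.full(k, np.mean(training_power))

history = np.array([0.2, 0.4, 0.6])
print(persistence_forecast(history[-1], 3))   # [0.6 0.6 0.6]
print(climatology_forecast(history, 3))       # [0.4 0.4 0.4]
```

Persistence is hard to beat at very short horizons, while climatology becomes the stronger baseline as the horizon grows; this is why both appear as references in the WPF literature.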

3.1.4 Error metrics

In order to understand the reasons why specific models perform well, it is usually a good idea to evaluate them against a wide variety of different criteria, as is emphasised by Kariniotakis [1997]. The following sections describe the error measures used; in this section N denotes the size of the test set.

Forecast Bias

The Normalized Forecast Bias (NBIAS) describes the systematic error and is defined in equation 3.5; it is estimated by calculating the average error for each step ahead. It gives an indication of the direction of the error:

NBIAS_k = (1/N) Σ_{t=1}^{N} ε_{t+k|t}   (3.5)

This bias is sometimes referred to by some authors as the Mean Forecast Error (MFE). A perfect MFE does not mean the forecast is perfect and contains no error, as positive and negative errors cancel each other out, but it gives an indication that the forecasts are on target on average.


Mean Absolute Error

The Normalized Mean Absolute Error (NMAE) is an error quantity that looks at the average of the absolute error of the prediction and is defined in equation 3.6:

NMAE_k = (1/N) Σ_{t=1}^{N} |ε_{t+k|t}|   (3.6)

Another common name used instead of Mean Absolute Error (MAE) is Mean Absolute Deviation (MAD). This value shows the magnitude of the overall error that has occurred due to forecasting and thus should be as small as possible. This error is also scale dependent and will be affected by data transformations and the scale of measurements.

Mean Squared Error

The Normalized Mean Squared Error (NMSE) is an error quantity that looks at the average of the squared errors ε²_{t+k|t} and is built using the Normalized Sum of Squared Errors (NSSE) defined in equation 3.7:

NSSE_k = Σ_{t=1}^{N} ε²_{t+k|t}   (3.7)

NMSE is defined in equation 3.8:

NMSE_k = (1/N) NSSE_k = (1/N) Σ_{t=1}^{N} ε²_{t+k|t}   (3.8)

In this error, positive and negative errors do not cancel each other out, and large individual errors will be penalized more harshly.

Root Mean Squared Error

The Normalized Root Mean Squared Error (NRMSE) is an error quantity that takes the square root of the NMSE; this error is defined in equation 3.9:

NRMSE_k = NMSE_k^{1/2} = ( (1/N) Σ_{t=1}^{N} ε²_{t+k|t} )^{1/2}   (3.9)

NRMSE is the main metric that is used in GEFCom, and it shares the same properties as NMSE.
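Assuming the errors have already been normalized by p_inst, the error metrics of equations 3.5-3.9 reduce to a few lines of NumPy (a sketch; the function names are our own):

```python
import numpy as np

def nbias(errors):
    """Equation 3.5: mean of the normalized errors for one look-ahead step k."""
    return np.mean(errors)

def nmae(errors):
    """Equation 3.6: mean absolute normalized error."""
    return np.mean(np.abs(errors))

def nmse(errors):
    """Equation 3.8: mean squared normalized error."""
    return np.mean(errors ** 2)

def nrmse(errors):
    """Equation 3.9: square root of the NMSE (GEFCom's main metric)."""
    return np.sqrt(nmse(errors))

eps = np.array([0.1, -0.1, 0.3, -0.3])   # normalized errors ε for one horizon k
print(nbias(eps))   # 0.0 (positive and negative errors cancel)
print(nmae(eps))    # 0.2
print(nrmse(eps))   # about 0.2236
```

Note how the same error vector gives a zero NBIAS but a clearly nonzero NMAE and NRMSE, which is exactly why several criteria are evaluated side by side.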


3.1.5 Model selection

In regression and classification, one of the main issues we are faced with is: "How do we create a good model?" One way to define this "goodness" would be to look at the model's generalization ability, i.e. its performance on unseen data.

To make sure the model we are building will generalize to new unseen data, it is important how we select and build the model in the first place; what we have control over is the data we have at hand and how we use that data to build our model.

Training, testing and validating

In the case of supervised learning, we have access to a target series and its associated features, i.e. the production series (our target) with wind speed and wind direction as features.

Common practice within Machine Learning, which is adhered to in this thesis, is to split the whole dataset into 3 smaller subsets, then use two of these subsets to find a good model (i.e. the training set and validation set) while the remaining one (the test set) is used for evaluation, i.e. to estimate how accurate the model we have created would be on "unseen data". With highly flexible models like an artificial neural network, we need to be careful not to overfit the data, which is one of the reasons why we have the validation set: we don't want to create a model that doesn't generalize well because the model is fitted to every minor variation, i.e. it has captured a lot of noise.

3.1.6 Evaluation

The improvement with respect to a considered reference model ref is defined in equation 3.10:

I^{ref}_{EC,k} = 100 · (EC^{ref}_k - EC_k) / EC^{ref}_k  (%)   (3.10)

where the Evaluation Criterion (EC) is any error measurement, such as NMSE, NMAE, etc.


Testing periods

Date        Time   Forecast
2011-01-01  01:00  1-48 hours
2011-01-04  13:00  1-48 hours
2011-01-08  01:00  1-48 hours
2011-01-11  13:00  1-48 hours
...
2012-06-23  01:00  1-48 hours
2012-06-26  13:00  1-48 hours

Table 3.1: The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods with missing data, power observations are available for updating the models.

3.2 Experiments

Training and testing are structured based on the structure of GEFCom. The dataset spans from midnight on the 1st of July 2009 to noon on the 26th of June 2012. The period from the 1st of July 2009 to the 1st of January 2011 at 01:00 is used for training and validating, while the rest is used for testing. In the testing range, a number of 48-hour periods are defined (see table 3.1); in between these testing periods exists additional training data, which enables the models to be updated in between forecasts.

The testing periods repeat every 7 days until the end of the dataset, and only meteorological forecasts that were relevant for the periods with missing power data are given; this was done in order to be consistent.

Meteorological forecasts in the GEFCom dataset consist of zonal u and meridional v components collected for surface winds at 10 m above ground level. They were extracted from the archive of the European Centre for Medium-range Weather Forecasts (ECMWF), a service that issues high-resolution deterministic forecasts twice a day, at 00 Coordinated Universal Time (UTC) and at 12 UTC; each of these forecast periods consists of data for 1-48 hours ahead. In order to match the hourly resolution of the power measurements, the forecasts were interpolated using cubic splines to an hourly resolution. A summary of the features found in the dataset is given in table 3.2.
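The interpolation step can be sketched with SciPy's CubicSpline (an illustration with made-up numbers; the GEFCom files define the actual forecast grid):

```python
import numpy as np
from scipy.interpolate import CubicSpline

# NWP values are available at a coarser resolution than the hourly power
# data; a cubic spline fills in the hourly values in between.
forecast_hours = np.array([0, 3, 6, 9, 12])        # native forecast steps (illustrative)
wind_speed = np.array([5.0, 6.2, 7.1, 6.8, 6.0])   # m/s at those steps (illustrative)

spline = CubicSpline(forecast_hours, wind_speed)
hourly = spline(np.arange(0, 13))                  # hourly resolution, 0..12 h
print(hourly[3])   # about 6.2: the spline passes through the original points
```

A cubic spline is preferred over linear interpolation here because wind speed varies smoothly, and the spline avoids the artificial kinks a piecewise-linear fill would introduce at each forecast step.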


No.  Category  Parameter             Alias  Type
1    Date      Date                  date   String
2    Date      Year                  year   Integer
3    Date      Month                 month  Integer
4    Date      Day                   day    Integer
5    Date      Hour                  hours  Integer
6    Date      Week                  week   Integer
7    Forecast  Wind Speed            ws     Real
8    Forecast  Wind Direction (deg)  wd     Real
9    Forecast  Wind U                u      Real
10   Forecast  Wind V                v      Real
11   Forecast  Issued                hp     Integer
12   SCADA     Production            wp     Real

Table 3.2: Preprocessed features from the dataset. The Forecast category represents the forecast given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the one we will use in training and testing.


The database for the meteorological forecasts given by the GEFCom dataset contains sections of missing data, each corresponding to the dates for which the missing power information exists; these sections were filled out, in a pre-processing step, with the previous best forecast that was available. For example, if we do not have data for the forecast issued at 2011-01-01 12:00, we use data from 2011-01-01 00:00, as a 48-hours-ahead forecast is available for this date. If it were the case that the previous section also contained missing data, we would go back an additional section and use those forecasts. If no forecast is available 48 hours back, we can use the best known forecast and extend it in the same fashion as the persistence model.
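The fallback logic can be sketched as follows (a simplified illustration; the names and data structures are invented, and the real pre-processing works on 48-hour forecast sections rather than single entries):

```python
def fill_missing_forecasts(issues, forecasts):
    """Fill gaps by falling back to the most recent issue that has data.

    `issues` is an ordered list of issue timestamps; `forecasts` maps an
    issue timestamp to its 1-48 h forecast (or None when missing).
    """
    filled = {}
    last_known = None
    for t in issues:
        if forecasts.get(t) is not None:
            last_known = forecasts[t]          # remember the newest available issue
        filled[t] = forecasts.get(t) if forecasts.get(t) is not None else last_known
    return filled

issues = ["2011-01-01 00:00", "2011-01-01 12:00"]
data = {"2011-01-01 00:00": [5.0] * 48, "2011-01-01 12:00": None}
filled = fill_missing_forecasts(issues, data)
print(filled["2011-01-01 12:00"][0])   # 5.0, taken from the earlier issue
```

When even this walk-back finds nothing, the text's persistence-style extension of the best known forecast takes over.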

Any predictions made by these models should fall within a certain range, so any forecast outside this range will be clamped. This is the main post-processing step being done, i.e. there is an upper limit on what can be produced.

3.3 Holdback Input Randomization

The Holdback Input Randomization (HIPR) method, described in Kemp et al. [2007], is a method that can be used to investigate the importance of the input parameters. It works by sequentially feeding each data point in the test set to the neural network while replacing the values of one input parameter at a time. This replacement is done with uniformly distributed random values in the range the neural network was originally trained on, i.e. (-1, 1). A NRMSE score is calculated for each replacement. The result of this is that we get information about the relevance of each input.
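A sketch of HIPR, assuming a model object with a `predict` method (all names here are our own; the dummy model only serves to make the example runnable):

```python
import numpy as np

def holdback_input_randomization(model, X_test, y_test, rng=np.random.default_rng(0)):
    """HIPR sketch: replace one input column at a time with uniform noise in the
    training range (-1, 1) and record the NRMSE each replacement produces."""
    scores = []
    for col in range(X_test.shape[1]):
        X_perturbed = X_test.copy()
        X_perturbed[:, col] = rng.uniform(-1.0, 1.0, size=X_test.shape[0])
        errors = y_test - model.predict(X_perturbed)
        scores.append(np.sqrt(np.mean(errors ** 2)))
    return np.array(scores)      # one NRMSE per input parameter

class MeanModel:                 # dummy stand-in for the trained network
    def predict(self, X):
        return X.mean(axis=1)

X = np.zeros((10, 3))
y = np.zeros(10)
print(holdback_input_randomization(MeanModel(), X, y))
```

A larger score for a column means the network degrades more when that input is randomized, i.e. the input is more relevant.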

3.4 Optimization methods

The two main optimization algorithms that have been used in this thesis are the Particle Swarm Optimization (PSO) algorithm presented by Eberhart and Kennedy [1995] and the Levenberg-Marquardt (LM)2 algorithm, independently developed by Kenneth Levenberg and Donald Marquardt [Marquardt, 1963; Levenberg, 1944]. The PSO algorithm is a population-based stochastic algorithm similar to a Genetic Algorithm (GA) but based on social-psychological principles instead of evolution. It

2 The LM algorithm was used in the beginning of the project, but the results reported ended up using the PSO algorithm. I have included the LM algorithm in this section because speed optimization was measured using this algorithm.


can be summarized by the following steps (each network was trained multiple times in order to avoid local minima):

• Step 1: Initialize particles with random velocities and accelerations.

• Step 2: Determine which particle is closest to the goal.

• Step 3: Adjust accelerations toward that particle.

• Step 4: Update particle positions based on their velocity, and update velocities based on acceleration.

• Step 5: Go to step 2.
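The steps above can be sketched as a minimal global-best PSO (standard inertia and acceleration constants are assumed; this is a generic sketch, not the exact variant used in the thesis):

```python
import numpy as np

def pso(f, dim, n_particles=20, iters=200, seed=0):
    """Minimal global-best PSO following the steps above."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5, 5, (n_particles, dim))     # Step 1: random positions
    v = rng.uniform(-1, 1, (n_particles, dim))     #         and velocities
    pbest = x.copy()
    pbest_val = np.apply_along_axis(f, 1, x)
    for _ in range(iters):
        gbest = pbest[np.argmin(pbest_val)]        # Step 2: particle closest to goal
        r1, r2 = rng.random((2, n_particles, dim))
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)   # Step 3
        x = x + v                                  # Step 4: move the particles
        vals = np.apply_along_axis(f, 1, x)        # Step 5: repeat from step 2
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
    return pbest[np.argmin(pbest_val)]

best = pso(lambda p: np.sum(p ** 2), dim=2)        # minimize a simple bowl
print(np.round(best, 3))                           # close to [0, 0]
```

Because the objective is only ever evaluated, never differentiated, the same loop works for hyperparameter search and for training network weights, which is how it is used later in this thesis.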

The LM algorithm is a combination of the Error Back-Propagation (EBP) method [Rumelhart et al., 1988] and the Gauss-Newton method. This algorithm has the speed advantage of Gauss-Newton and the stability of EBP [Yu and Wilamowski, 2011]. A detailed treatment of LM can be found in Moré [1978] and of PSO in Poli et al. [2007].

3.5 Neural Networks

3.5.1 Multilayer Perceptron

The perceptron is built around a nonlinear model of a neuron, the McCulloch-Pitts model [McCulloch and Pitts, 1943]. It basically consists of a 2-step process, where the cell body contains a summation function of the weighted sum of all inputs, including a bias. The perceptron is described by equation 3.11. The sum s is passed through an activation function (see section 3.5.1), which mimics the activation or firing of the neuron:

s = Σ_{i=1}^{M} w_i x_i + x_0   (3.11)

w_i is the weight of the "synapse" of the input channel and is the parameter we want to adjust, x_i is the input value, and x_0 is the bias. M denotes the number of inputs we have.


[Figure 3.1: The perceptron. Weighted input signals and a bias are summed (Σ) and passed through the activation function f(s) to produce the output signal.]

The structure of the MLP consists of many perceptrons, and it is shown in figure 3.2. The input signal flows from the input layer at the bottom to the output layer at the top. We have a bias signal, seen on the left side of the diagram, which is set to a fixed number.

MLPs are fully connected networks, meaning that each neuron in any layer of the network is connected to all the neurons in the previous layer. Each connection in the network has a weight w_ij associated with it. The initialization of these weights is done before any training takes place and is achieved by randomly assigning very small values to the respective weights in the graph. The architecture used in this study has only one output variable, i.e. the function we approximate produces forecasts of the power generation given a certain input.


[Figure 3.2: Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of H hidden layers as well as M input connections (inputs: hours, u, v, week, ws, ws-1, ws-2, ws+1, ws+2; tanh hidden units and a linear output unit; a bias signal feeds each layer). Each edge seen in this graph has a weight w_ij associated with it.]

The performance of neural networks is generally improved if the data is normalised; using the original data directly can cause convergence problems. Normalization is done using the mapminmax function seen in equation 3.12:

y = (y_max - y_min) · (x - x_min) / (x_max - x_min) + y_min   (3.12)

y_max is the max value of the specified range, which in this case is 1, and y_min is -1; x is the value to be scaled, x_max is the max value of the numbers to be scaled, and x_min is the min value of the numbers to be scaled.
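Equation 3.12 translates directly into code (a one-line sketch using the thesis's parameter names):

```python
def mapminmax(x, x_min, x_max, y_min=-1.0, y_max=1.0):
    """Equation 3.12: linearly rescale x from [x_min, x_max] to [y_min, y_max]."""
    return (y_max - y_min) * (x - x_min) / (x_max - x_min) + y_min

print(mapminmax(0.0, 0.0, 10.0))    # -1.0, the lower bound of the target range
print(mapminmax(5.0, 0.0, 10.0))    # 0.0, the midpoint maps to the middle of (-1, 1)
print(mapminmax(10.0, 0.0, 10.0))   # 1.0, the upper bound
```

The same x_min and x_max must be taken from the training data and reused when scaling the validation and test sets, otherwise the test inputs would be scaled inconsistently.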

The forecasting model consists of 7 different networks, one for each wind farm; these networks are trained on the first section of the dataset, before the test period. A random 60/20/20 split is created, where 60% of the available data is used for training, 20% of the data is used to validate the network, and 20% is set aside as a hold-out set for the hyperparameters. The input features3 fed into the models are

3 See table 3.2.


ws, u, v, hours, ws, ws+1, ws+2, ws-1, ws-2, where ws+x and ws-x denote a time shift of x. We can use ws+x as input into the model because ws is a forecast in itself.

Activation Functions

The computation done for each neuron in the multilayer perceptron requires knowledge of the derivative of the activation function; in other words, the activation function we pick needs to be differentiable. In this thesis we use the following two activation functions: the hyperbolic tangent function seen in equation 3.13, and

f(s) = tanh(s) = (e^s - e^{-s}) / (e^s + e^{-s})   (3.13)

the linear transfer function seen in equation 3.14:

f(s) = { +1 if s ≥ 1;  s if -1 < s < 1;  -1 if s ≤ -1 }   (3.14)

Hyperparameter optimization

In order to obtain the hyperparameters necessary for the respective model for each wind farm, a random hyperparameter search was performed for all models. A hold-out validation set was used to pick the best hyperparameters. Random search has been shown to work better than grid search when not all parameters are equally important [Bergstra and Bengio, 2012].
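Random search itself is only a few lines (a sketch; the objective here is a toy stand-in for training a network and scoring it on the hold-out set, and all names are our own):

```python
import random

def random_search(train_and_score, param_space, n_trials=50, seed=0):
    """Sample each parameter independently and keep the configuration with the
    best hold-out validation score (lower is better)."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in param_space.items()}
        score = train_and_score(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

space = {"hidden_units": [5, 10, 20, 40], "learning_rate": [0.001, 0.01, 0.1]}
# toy objective: pretend 20 hidden units with learning rate 0.01 validate best
objective = lambda p: abs(p["hidden_units"] - 20) + abs(p["learning_rate"] - 0.01)
best, score = random_search(objective, space)
print(best)
```

Because each trial samples all parameters independently, the important ones are probed at many distinct values, which is exactly the advantage over a grid that Bergstra and Bengio describe.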

3.5.2 Numenta Platform for Intelligent Computing

This section describes the key principles introduced with NuPIC. It explains in general terms the theory behind the Online Prediction Framework (OPF),4 which uses the CLA and HTM algorithms; the OPF works as an API to create predictive models. The OPF consists of 5 major types of components: Encoders, Spatial Pooler (SP), Temporal Memory (TM), Temporal Pooler (TP) and Classifiers. Together these components construct a single CLA region,5 and figure 3.3 demonstrates the information flow through a single region.

4 The OPF is used with Numenta's commercial product GROK.
5 Currently, models created with the OPF do not use a TP, nor does this client allow the creation of a hierarchy of regions, i.e. what we would call a HTM; it is only possible to create one region.


Another central concept in NuPIC is the Sparse Distributed Representation (SDR), which refers to the activation of a small percentage of the neurons at any given time; this neuronal activity is represented as an n-dimensional sparse binary vector x = [b_0, ..., b_n] with around 2% active cells. The outputs from the spatial pooler and the temporal memory are both SDRs, while the output from the encoder does not enforce this and is just a normal binary vector. A general overview of the properties of SDRs has been given by Ahmad and Hawkins [2015].

[Figure 3.3: Information flow of a single-region predictive model created with the OPF: Encoder → Spatial Pooler → Temporal Memory → Classifier.]

The overlap between two different SDRs is defined by

o(x, y) = x · y   (3.15)

and a match between two SDRs is defined by

m(x, y) := o(x, y) ≥ θ   (3.16)

where θ is set so that θ ≤ ‖x‖_1 and θ ≤ ‖y‖_1. An interesting property of SDRs, something that is used multiple times inside the temporal memory and especially for predictions, is the fact that a set of fixed-size SDRs can be reliably stored by taking their union as a single pattern. The boolean OR operator is used to create a


Value  Scalar Encoding

1      11111000000000
2      01111100000000
10     00000000011111

Table 3.3: Example, with n = 14, r = 5, ψ = 1, of various encoded scalar values using a ScalarEncoder.

new vector from a set of SDRs. The downside of storing patterns this way is that the more patterns we store, the bigger the probability of false positives.
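Equations 3.15-3.16 and the union trick can be illustrated with a few lines of NumPy (toy vectors; real SDRs are much larger and sparser):

```python
import numpy as np

x = np.array([1, 0, 1, 0, 1, 0, 0, 0])   # toy SDRs; real ones have roughly 2%
y = np.array([1, 0, 1, 0, 0, 1, 0, 0])   # of thousands of bits active

overlap = int(np.dot(x, y))               # equation 3.15
theta = 2
match = overlap >= theta                  # equation 3.16
union = np.logical_or(x, y).astype(int)   # store both patterns in one vector

print(overlap)   # 2
print(match)     # True
print(union)     # [1 0 1 0 1 1 0 0]
```

The union still matches each stored pattern, but as more patterns are OR-ed in, the vector fills up and the probability of a false-positive match grows, which is exactly the downside noted above.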

Encoders

NuPIC contains many different encoders;6 the job of an encoder is to convert raw input into a more suitable representation (i.e. a binary vector). The raw input is fed into the model using a dictionary data structure.

Useful encoders include scalar and categorical encoders, while more exotic encoders include one for Global Positioning System (GPS) coordinates; this encoder can be used to extract information about anomalous movements. The entries in the dictionary representation of the raw input are each encoded separately and concatenated using a multi-encoder.

One property of the spatial pooler is that overlapping input patterns are mapped to the same SDR. This means that we want the encoder to encode input so that similar inputs share bits. The ScalarEncoder fulfils this property by the process illustrated in table 3.3.

We calculate the range of values to encode using equation 3.17, where v_min represents the minimum value of the input signal and v_max denotes its upper bound:

v_range = v_max - v_min   (3.17)

There are three mutually exclusive parameters that determine the overall size of the output from a scalar encoder (n, r, ψ). n directly represents the total number

6 A full list of all encoders can be found in the API documentation for NuPIC.


of bits in the output, and it must be bigger than or equal to w, which represents the number of bits that are set to encode a single value, i.e. the "width" of the output signal.7 r and ψ are specified with respect to the input, while w is specified with respect to the output. Two inputs separated by more than the radius have non-overlapping representations. Two inputs separated by less than the radius will in general overlap in at least some of their bits. Two inputs separated by greater than or equal to the resolution are guaranteed to have different representations. ψ can be calculated using equation 3.18:

ψ = r / w   (3.18)

Depending on whether we want a periodic behaviour or not, n is calculated a bit differently; see the scalar encoder for implementation details.8
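A toy non-periodic scalar encoder reproducing table 3.3 might look like this (a sketch, not NuPIC's actual ScalarEncoder; the parameter grid matches the table, where the resolution of 1 gives w = r/ψ = 5 active bits):

```python
def encode_scalar(value, v_min=1, v_max=10, n=14, w=5):
    """Toy non-periodic ScalarEncoder sketch: w consecutive bits out of n,
    positioned by where `value` falls in [v_min, v_max]."""
    n_buckets = n - w + 1                                   # 10 distinct start positions
    i = round((value - v_min) / (v_max - v_min) * (n_buckets - 1))
    return "".join("1" if i <= j < i + w else "0" for j in range(n))

print(encode_scalar(1))    # 11111000000000
print(encode_scalar(2))    # 01111100000000
print(encode_scalar(10))   # 00000000011111
```

Note how the encodings of 1 and 2 share four of their five active bits, while 1 and 10 share none: nearby values overlap, distant values do not, which is the property the spatial pooler relies on.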

Spatial Pooler

The spatial pooler receives a binary vector as its input from an encoder and will output an SDR. The structure of the spatial pooler consists of an input space and a set of columns; the output SDR represents which of the columns in a region are active. Each column has synapses connected to the input space: a set of around 50% randomly selected, potentially connected synapses, called the "potential pool". Each synapse will connect and disconnect with the input space during learning.

The general information flow of the spatial pooler consists of the following steps: input stimuli from lower regions lead to activation of the input space, to which each mini-column is synaptically connected; this activation, the so-called "overlap score", is set for each column. The score is calculated based on the total sum of the neurons that try to influence that column, weighted with a "boosting factor" that tries to increase the chance for certain columns to win more easily. A final top percentage (usually around 2% activity) of the columns with the highest influence (biggest overlap score) will be chosen; columns also inhibit close-by columns, and the result of this process is a binary vector with few active bits: an SDR. This is illustrated in equation 3.19, where b can be either 0 or 1 and s is a value representing the score.

7 w must be odd to avoid centering problems.
8 https://github.com/numenta/nupic


[b_1 b_2 b_3 ... b_n] (input vector) · [b_ij] (connected synapses for each column, d×n) = [s_1 s_2 ... s_n] (overlap score) → inhibition → [b_1 b_2 b_3 ... b_n] (output SDR)   (3.19)

Learning in this structure is done by adjusting the permanences of the proximal dendrite to better match the input: for each winning column, increase the permanence of the synapses that correctly matched the input and decrease the permanence of the rest. We also increase the boosting factor for losing columns to give them a bigger chance of winning next time.
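The overlap-and-inhibit step of equation 3.19 can be sketched as follows (a simplified sketch with global inhibition and invented sizes; real NuPIC also supports local inhibition, stimulus thresholds and more):

```python
import numpy as np

def spatial_pooler_step(input_vec, synapses, boost, active_fraction=0.02):
    """One SP step: boosted overlap score per column, then inhibition keeps
    only the top columns, yielding a sparse binary output (an SDR)."""
    overlap = (synapses @ input_vec) * boost        # overlap score per column
    n_active = max(1, int(len(overlap) * active_fraction))
    winners = np.argsort(overlap)[-n_active:]       # inhibition: top-k survive
    sdr = np.zeros(len(overlap), dtype=int)
    sdr[winners] = 1
    return sdr

rng = np.random.default_rng(0)
synapses = (rng.random((100, 40)) > 0.5).astype(int)   # binary connected-synapse matrix
boost = np.ones(100)                                    # neutral boosting factors
inp = (rng.random(40) > 0.7).astype(int)                # encoder output
sdr = spatial_pooler_step(inp, synapses, boost)
print(sdr.sum())   # 2, i.e. around 2% of the 100 columns are active
```

Raising a column's boost value scales its overlap score, which is how persistently losing columns are given a better chance to win.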

Temporal Memory

The temporal memory receives an SDR from the spatial pooler, which represents the active columns; the output of the temporal memory is the activity of the whole CLA region. The general idea of the TM9 is that cells in each mini-column provide temporal context to the pattern identified by the spatial pooler. The TM's job is to form transitions between different SDRs, and it achieves this by allowing active cells to form connections to previously active cells, so that in a future setting each cell is able to predict its own activity.

Each cell in a region has several distal segments, ideally one for each pattern the cell has transitioned from; in practice these segments are connected to a couple of patterns each. A cell enters a predictive state if there is enough activity on a segment, and every segment consists of a collection of connections, synapses that have been formed to a subset of previously active cells (typically around 10-15 cells).

The Temporal Memory receives a sparse binary vector from the spatial pooler, and the general information flow for this part of the CLA goes through two phases. We have the following steps in the first phase: 1) for each active column, check to see if there is any cell in a predictive state; if there is a cell in a predictive

9 There are a lot of details in how the Temporal Memory is implemented; pseudo-code and more details can be found in [Numenta, 2011]. The NuPIC git repository is the best source for the finer details: https://github.com/numenta/nupic


state, then 2) activate that particular cell. If no cell was found in a predictive state, then 3) activate all cells in that particular column, a process called bursting.

Bursting is analogous to saying: we do not know which cell to activate, because we have not seen this instance of the sequence of patterns before and are unable to put the column into the correct temporal context, so let us activate all cells, reflecting this uncertainty. Bursting does not occur when the temporal context has been predicted by the CLA. Equation 3.20 shows the first phase:

[b_1 b_2 b_3 ... b_n] (SP SDR) applied to [b_ij] (predictive state, d×n) = [b_ij] (active state, d×n)   (3.20)

The second phase of the algorithm is there to figure out which cells should be put into a predictive state for the next time step. This is achieved by checking every distal segment on every cell for activity above a certain threshold, i.e. checking for active segments; one active segment is enough to put a cell into a predictive state. The output from the temporal memory is a vector representing the state of all cells in that region. Equation 3.21 shows phase 2:

[b_1 b_2 b_3 ... b_n] (active state) · [b_ij]_X (segment X) = [b_1 b_2 b_3 ... b_n]_X (segment activation X) > τ → [s_1 s_2 s_3 ... s_n] (predictive state)   (3.21)

If learning is turned on, we update the permanences of the synapses connected to active distal segments (in the same way as in the spatial pooler). These changes are marked as temporary until we are sure that a cell is in fact able to correctly predict something with them; after this fact is known, the changes either become permanent or are removed. Temporarily marked cells are updated whenever a cell goes from being inactive to active from a feed-forward input (make the changes permanent, as we correctly predicted the feed-forward activation). If a cell instead went from active to inactive, undo the changes.
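Phase 1 can be sketched as follows (a toy illustration of the activate-or-burst rule; the function and variable names are our own):

```python
import numpy as np

def tm_phase1(active_columns, predictive_state):
    """Phase 1 of the temporal memory as described above: in each active
    column, activate the predicted cells; if none were predicted, burst
    the column by activating every cell in it."""
    active_state = np.zeros_like(predictive_state)
    for c in np.flatnonzero(active_columns):
        predicted = np.flatnonzero(predictive_state[:, c])
        if predicted.size:                 # context known: keep only those cells
            active_state[predicted, c] = 1
        else:                              # unseen context: burst the column
            active_state[:, c] = 1
    return active_state

pred = np.zeros((4, 3), dtype=int)
pred[2, 0] = 1                             # one predicted cell in column 0
cols = np.array([1, 1, 0])                 # columns 0 and 1 are active
act = tm_phase1(cols, pred)
print(act[:, 0])   # [0 0 1 0] (the predicted cell wins)
print(act[:, 1])   # [1 1 1 1] (column 1 bursts)
```

Which individual cells end up active is what carries the temporal context forward: the same spatial pattern seen in two different sequences yields two different cell-level activations.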


NuPIC Classifiers

There are many different classifiers included with NuPIC, such as a k-Nearest Neighbour (kNN) classifier and a Support Vector Machine (SVM); the OPF in particular uses a custom-built classifier called the "CLAClassifier". The purpose of this classifier is to decode predictions made by the CLA. Classification with the CLAClassifier is performed in different ways depending on how the input has been encoded.

Scalar values are decoded using a process where each cell in a CLA region is paired with two histograms. One of the histograms keeps track of the frequency of encountered patterns associated with each cell; the other keeps track of a moving average for each bucket. This process is illustrated in figure 3.4.

[Figure 3.4: The CLAClassifier. Each cell of the SDR (across columns 1...N) is paired with 2 histograms, a likelihood histogram and a moving average, spanning buckets from the min value to the max value.]

Training in NuPIC

Training the NuPIC model is done using online learning algorithms. The schema seen in figure 3.5 is used to train and test these models. We have 7 different wind farms, so this schema is repeated for each respective wind farm.

This schema is slightly adjusted from the traditional way the OPF handles multi-step predictions. The default way of doing this is to use 48 different models, one for


every step ahead in the CLAClassifier, which would give us 48 · 7 = 336 models in total (for all the wind farms). Not only does this change match Expektra's approach, but it also reduces the memory requirements for the CLAClassifier.

[Figure 3.5: Training an OPF model. In the training phase, a pre-training data chunk drives the hyperparameter setup (PSO swarming or manual setup); the OPF model then consumes the training data stream with online learning activated and produces predictions. In the testing phase, online learning is deactivated and the model produces multistep predictions on the testing data.]

Input and hyperparameter selection

Finding hyperparameters for an OPF model was partly done using the custom built-in PSO algorithm and partly configured manually, mainly by ensuring that the PSO algorithm did not remove encoders. A single CLA region contains a lot of hyperparameters; these parameters have been included, with corresponding descriptions, in appendix A. The inputs to the model are date, ws, wp, u, v.


Chapter 4

Result

This chapter contains the primary results that were obtained in this study. Figures 4.4-4.10 contain different error measurements on the test set for the respective wind farms. We see that Expektra's ANN model is able to perform well over all wind farms, while the NuPIC model performs worse than Expektra's but in general still better than our reference model. NuPIC is off target, with a bias error on most wind farms. The graphs also indicate that NuPIC is unable to pick up some trends in the cumulated ε² graph. Expektra's model, on the other hand, shows no clear problems in cumulated ε² and is on target on all wind farms; appendix C has been included to reflect this at different lead times.

Figure 4.1: Left diagram: cumulative probability of wind speed. Right diagram: scatter diagram of the power curve, normalized power output vs wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


Figure 4.2: Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).

Figure 4.3: Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


Figure 4.4: Different error measurements (NBIAS, NRMSE, cumulated ε² and NMAE vs look-ahead time) for WF 1, comparing Expektra, NuPIC and Persistence.

Figure 4.5: Different error measurements for WF 2 (NBIAS, NRMSE and NMAE vs. look-ahead time k, and cumulated ε² over time) for Expektra, NuPIC and persistence.

Figure 4.6: Different error measurements for WF 3 (NBIAS, NRMSE and NMAE vs. look-ahead time k, and cumulated ε² over time) for Expektra, NuPIC and persistence.

Figure 4.7: Different error measurements for WF 4 (NBIAS, NRMSE and NMAE vs. look-ahead time k, and cumulated ε² over time) for Expektra, NuPIC and persistence.

Figure 4.8: Different error measurements for WF 5 (NBIAS, NRMSE and NMAE vs. look-ahead time k, and cumulated ε² over time) for Expektra, NuPIC and persistence.

Figure 4.9: Different error measurements for WF 6 (NBIAS, NRMSE and NMAE vs. look-ahead time k, and cumulated ε² over time) for Expektra, NuPIC and persistence.

Figure 4.10: Different error measurements for WF 7 (NBIAS, NRMSE and NMAE vs. look-ahead time k, and cumulated ε² over time) for Expektra, NuPIC and persistence.


                            Wind Farm
User          1      2      3      4      5      6      7      All

Leustagos     0.145  0.138  0.168  0.144  0.158  0.133  0.140  0.146
DuckTile      0.143  0.145  0.172  0.145  0.165  0.137  0.146  0.148
MZ            0.141  0.151  0.174  0.145  0.167  0.141  0.145  0.149
Propeller     0.144  0.153  0.177  0.147  0.175  0.141  0.147  0.152
Duehee Lee    0.157  0.144  0.176  0.160  0.169  0.154  0.148  0.155
Expektra      0.165  0.158  0.184  0.164  0.179  0.153  0.153  0.165
MTU EE5260    0.161  0.172  0.193  0.162  0.192  0.156  0.160  0.168
SunWind       0.174  0.177  0.193  0.176  0.179  0.157  0.162  0.172
ymzsmsd       0.163  0.186  0.200  0.164  0.192  0.162  0.167  0.174
4138 Kalchas  0.180  0.179  0.197  0.175  0.200  0.160  0.165  0.177
NuPIC         0.243  0.254  0.264  0.310  0.290  0.224  0.240  0.264

Persistence   0.302  0.338  0.373  0.364  0.388  0.341  0.361  0.355

Table 4.1: NRMSE scores of the entries published in [Hong et al., 2014]. The NuPIC and Expektra models are added so the results can easily be compared.

4.1 Experimental results

Looking at the primary results presented in this study, we see in Table 4.1 that Expektra's ANN model is able to achieve performance comparable to the other methods published in Hong et al. [2014]. This can be used as a starting point for further investigations. A similar approach to Expektra's is that of Duehee Lee [Lee and Baldick, 2014], as both methods are based on the same neural network architecture but with different optimization techniques. Duehee Lee's network is structured slightly differently: Expektra's model has only one output node, whereas Duehee Lee uses five (representing five prediction steps ahead). Besides these differences, Duehee Lee has also included additional sub-models to create a prediction ensemble, blending the output of the ANN with a Gaussian Process (GP) to improve very short-term predictions. MTU EE5260 is also a neural network approach that converts meteorological forecasts to power, and its score is very close. The HTM CLA beats the persistence model but needs more work to outperform the other models in GEFCom.


4.2 Input Importance

For interpretation purposes, in order to understand the model better, an analysis of the relative input parameter importance was performed. This analysis is illustrated in Figure 4.11. Each box represents added noise on that channel, which results in a higher NRMSE score if that feature was important. A reference point, "all-channels", represents the error distribution of the model with no input replacement. We clearly see in this figure that the wind speed channel ws is the most important attribute: adding noise to this channel greatly affects the NRMSE score. We also see that the wind components u, v show little to no influence, and that the timestamp-related inputs hours and week both indicate that there are seasonal and daily trends present in the dataset.
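The analysis above can be reproduced with a generic perturbation scheme in the spirit of HIPR [Kemp et al., 2007]; this is a sketch with our own names, not the code used in the thesis:

```python
import numpy as np

rng = np.random.default_rng(0)

def nrmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def input_importance(predict, X, y, n_rounds=20):
    """HIPR-style importance: replace one input channel at a time with
    uniform noise and record how much the NRMSE score degrades."""
    baseline = nrmse(y, predict(X))
    scores = {}
    for j in range(X.shape[1]):
        errs = []
        for _ in range(n_rounds):
            Xp = X.copy()
            Xp[:, j] = rng.uniform(X[:, j].min(), X[:, j].max(), size=len(X))
            errs.append(nrmse(y, predict(Xp)))
        scores[j] = np.mean(errs)  # much larger than baseline => important
    return baseline, scores

# toy model that only uses feature 0, so only channel 0 should matter
X = rng.uniform(0.0, 1.0, size=(200, 3))
y = X[:, 0]
baseline, scores = input_importance(lambda A: A[:, 0], X, y)
```

The same logic applies to Figure 4.11: channels whose perturbed score stays near the "all-channels" reference carry little information that the model uses.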


Figure 4.11: Relative input parameter importance using HIPR. "all-channels" reflects the reference point, i.e. the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v = directional vector of the wind.


4.2.1 Adaptation and Optimization

All training for Expektra's model was performed on a MacBook Pro with an Intel Core 2 Duo processor (P8600, 2.40 GHz), a 64-bit operating system and 4 GB of installed RAM. After adapting the code to run experiments on the GEFCom dataset, an investigation was performed to evaluate the performance of the core source code provided by Expektra. This investigation identified some slow parts of the code. After addressing these issues, mainly by using a Math.NET native library instead of the managed provider, a speed test was performed comparing the unoptimized version with the optimized one. Figure 4.12 shows the speed-up achieved during training with the LM algorithm.


Figure 4.12: Unoptimized version vs. the optimized one when training a network with 10 input neurons; 10, 15, 20 or 25 hidden neurons; and 1 output neuron. Training was done for 100 epochs using LM.
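For reference, one damped Levenberg-Marquardt update solves (J^T J + lam*I) * delta = -J^T r for the weight change delta; a generic sketch on a toy linear fit (our own code, not Expektra's implementation):

```python
import numpy as np

def lm_step(residual_fn, jacobian_fn, w, lam):
    """One Levenberg-Marquardt update: solve (J^T J + lam*I) delta = -J^T r."""
    r = residual_fn(w)
    J = jacobian_fn(w)
    A = J.T @ J + lam * np.eye(len(w))
    delta = np.linalg.solve(A, -J.T @ r)
    return w + delta

# toy problem: fit y = a*x + b; the Jacobian is constant for a linear model
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 1.0
residual = lambda w: w[0] * x + w[1] - y
jacobian = lambda w: np.column_stack([x, np.ones_like(x)])

w = np.zeros(2)
for _ in range(20):
    w = lm_step(residual, jacobian, w, lam=1e-3)
```

In practice the damping factor lam is adapted between iterations (increased when a step fails to reduce the error, decreased otherwise); the fixed value here is only for illustration.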

4.3 Summary

A plot showing the average NRMSE improvement over the persistence model can be seen in Figure 4.13. We see that both models are able to perform better than persistence. Expektra's model performs better than the NuPIC model for the whole forecast


horizon, and NuPIC performs better than persistence towards the end of the forecast.


Figure 4.13: Summarized average improvement over persistence across all wind farms, with 95% confidence intervals.


Chapter 5

Discussion

With the exponential growth of computer power, it becomes easier to study deeper and more complex networks, and with recent advances in the area of deep learning, a new interest in ANNs has resurged. In practice, ANNs have been used by most groups in the field of WPF, but these networks never caught on, as it was argued that the improvements made by ANNs were usually not worth the extra effort of training them [Giebel et al., 2011]. This is steadily becoming less of an issue as computation becomes cheaper and bigger networks perform better.

By using a simple MLP network, we observe that we are able to obtain results similar in performance to other models published in Hong et al. [2014]. This can be used as a starting point for further investigations. The GEFCom dataset has a limited number of input features; if we had access to a wider range of features, the power of neural networks could be studied more in depth. Given the number of features in the dataset, this is harder to do.

Using persistence as a reference model was done in order to have the same baseline as the one used in GEFCom, but a better reference model should be considered in the future, such as the one presented in Nielsen et al. [1998], especially if longer forecasts are to be considered.
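The idea behind that reference can be sketched as follows: Nielsen et al. [1998] blend persistence with the series mean, weighted by the lag-k autocorrelation (this is our own formulation of the idea, not their code):

```python
import numpy as np

def persistence(y, k):
    """Persistence: the forecast k steps ahead equals the last observation.

    Returns predictions for y[k:], i.e. y shifted by k steps."""
    return y[:-k]

def new_reference(y_train, y_last, k):
    """Sketch of a Nielsen et al. (1998) style reference model:
    blend persistence with the series mean, weighted by the lag-k
    autocorrelation a_k estimated from a (non-constant) training series."""
    y_mean = y_train.mean()
    d = y_train - y_mean
    a_k = np.sum(d[:-k] * d[k:]) / np.sum(d * d)  # lag-k autocorrelation
    return a_k * y_last + (1.0 - a_k) * y_mean

# usage: on a smooth series the lag-1 reference stays close to persistence
y = np.sin(np.linspace(0.0, 10.0, 200))
forecast_1h = new_reference(y, y[-1], k=1)
```

For short horizons the weight is close to 1 and the reference behaves like persistence; for long horizons it decays towards the climatological mean, which is exactly where pure persistence is weakest.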

5.1 Method development issues

Working with Expektra's model is very straightforward and no major issues were encountered during development, but some issues were encountered working with NuPIC.1

1. It should also be pointed out that it helps to have the people who developed the code close at hand, and this is probably the main reason why so little trouble was encountered.


1. Installing NuPIC is difficult. This seems to be a general problem, judging by posts on the NuPIC mailing list.2 This is a very important issue to fix if they want more people to work with this model.

2. Using the built-in swarming (PSO) did not work well, especially for slightly larger datasets and for 48 prediction steps. The main problems were fitting everything into memory and that the code ran too slowly when swarming over multiple step-ahead predictions, making it practically impossible to use. Scaling down the problem to just a few prediction steps and multiple models helped, but the swarm model kept dismissing wind speed as an important feature, which was fixed by specifically telling it not to touch that encoder.

3. Inconsistencies between the white-paper documentation of the HTM CLA and the actual implementation make it hard to understand the underlying principles of NuPIC. NuPIC presents a very complex network, and any inconsistencies in documentation make it hard to understand. The code is quite well documented, but the additional material is a bit sparse.

These issues are all understandable and are continuously being improved upon. NuPIC is still a young platform, so issues are to be expected given the complexity and research nature of the project. Training the CLAClassifier to use many step predictions instead of just one will most likely reduce the bias error seen in the results. To investigate this, a more powerful computer is needed.

There are some differences in the input between the models. This setup was created because an OPF model needs to encode the value it is going to predict. SCADA data is available, so this is not an issue, but it may introduce some unfairness between NuPIC and Expektra. In the current setup, NuPIC does not seem to gain anything extra from the additional information, and other models in the competition will most likely use different inputs as well.

In general, NuPIC differs in the way you feed data: you send in the front of the signal, and temporal context is achieved through the temporal memory.

5.2 Future improvements and directions

One issue with the GEFCom dataset is that it does not contain any specific information about the location of these 7 wind farms. It is worth considering taking a look at how to handle different models that are specialized for different types

2. http://numenta.org/lists


of terrain, as this has been shown to increase performance [Kariniotakis et al., 2006]. Another thing to investigate could be a more advanced scheme for hyperparameter selection. Jursa and Rohrig [2008] show that the WPF error can be reduced by smarter use of optimization algorithms for feature selection and hyperparameter selection, which should improve performance.
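One simple scheme along these lines is random search [Bergstra and Bengio, 2012]; the search space and the toy objective below are purely hypothetical, only meant to show the shape of the procedure:

```python
import random

random.seed(0)

def random_search(train_and_score, space, n_trials=50):
    """Random search over hyperparameters (Bergstra and Bengio, 2012).
    `space` maps each hyperparameter name to a sampling function;
    `train_and_score` returns a validation error to minimize."""
    best_err, best_cfg = float("inf"), None
    for _ in range(n_trials):
        cfg = {name: sample() for name, sample in space.items()}
        err = train_and_score(cfg)
        if err < best_err:
            best_err, best_cfg = err, cfg
    return best_cfg, best_err

# hypothetical search space for an MLP-style forecaster
space = {
    "hidden": lambda: random.choice([10, 15, 20, 25]),
    "lr":     lambda: 10 ** random.uniform(-4, -1),  # log-uniform learning rate
}
# toy objective: pretend 20 hidden units and lr near 1e-2 are optimal
score = lambda c: abs(c["hidden"] - 20) / 20 + abs(c["lr"] - 1e-2)
best_cfg, best_err = random_search(score, space, n_trials=50)
```

In a real setup `train_and_score` would train the forecaster and return its validation NRMSE.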

In general, having more types of inputs from sources like SCADA and NWP systems could give a lot of valuable information that can be used for better predictions. It is clear that wind speed is a key feature for producing good forecasts, which is also supported by the input analysis in Section 4.2. It has been argued that the error in WPF models stems largely from wrong weather forecasts [Giebel et al., 2011], so identifying good sources for weather forecasts would be desirable for more effective model learning. Combining several different weather forecasts could also be investigated, as Nielsen et al. [2007] showed that power forecasts based on a number of different meteorological forecasts were better than those based on a single source.
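A minimal sketch of such a combination, in the spirit of Nielsen et al. [2007]: fit least-squares weights for the individual forecast providers on historical data (toy data and our own names, not their method in detail):

```python
import numpy as np

def combination_weights(F, y):
    """Least-squares weights for combining several power forecasts.
    F has one column per forecast provider; the weights minimize
    ||F @ w - y||^2 over the historical record."""
    w, *_ = np.linalg.lstsq(F, y, rcond=None)
    return w

# toy data: a true signal plus two noisy "meteorological" forecasts
rng = np.random.default_rng(1)
y = rng.uniform(0.0, 1.0, 500)
F = np.column_stack([y + 0.1 * rng.normal(size=500),
                     y + 0.2 * rng.normal(size=500)])

w = combination_weights(F, y)
combined = F @ w  # the combined forecast
```

By construction the in-sample error of the combined forecast is never worse than that of the best individual provider, since each provider corresponds to a particular choice of weights.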

Regarding HTM CLA, it is worth considering implementing a custom encoder, one specifically targeted at data concerning wind farms. SCADA data could also be streamed directly into a NuPIC model, which would allow for some interesting online predictions and anomaly detection.

Training times are always an issue, especially for very large models. Both NuPIC and Expektra's model would benefit from being implemented in a more parallel fashion, as these algorithms are very well suited for parallel computing. The speed-up would be of great value, so looking into and implementing a GPU version of the algorithms could be worthwhile.

5.3 Conclusions

The field of energy forecasting is still young and the necessity for good forecasting techniques is constantly growing. This initial study shows that part of Expektra's forecasting method can be used to forecast wind power generation. We observe that this ANN is able to achieve performance comparable to other methods published in Hong et al. [2014]. Additional work is still needed to obtain state-of-the-art results. Numenta's model is able to beat the reference model but still needs additional work before we can draw definite conclusions about its performance. Constructing a custom encoder could be a first step in that direction.


Bibliography

Subutai Ahmad and Jeff Hawkins. Properties of sparse distributed representations and their application to hierarchical temporal memory. arXiv preprint arXiv:1503.07469, 2015.

T. C. Akinci. Short term wind speed forecasting with ANN in Batman, Turkey. Elektronika ir Elektrotechnika, 107(1):41–45, 2015.

E. Michael Azoff. Neural network time series forecasting of financial markets. John Wiley & Sons, Inc., 1994.

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13(1):281–305, 2012.

Daniel P. Buxhoeveden and Manuel F. Casanova. The minicolumn hypothesis in neuroscience. Brain, 125(5):935–951, 2002.

Erasmo Cadenas and Wilfrido Rivera. Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks. Renewable Energy, 34(1):274–278, 2009.

Dmitri B. Chklovskii, B. W. Mel, and K. Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782–788, 2004.

A. Costa, A. Crespo, J. Navarro, G. Lizcano, H. Madsen, and E. Feitosa. A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews, 12(6):1725–1744, August 2008. doi: 10.1016/j.rser.2007.01.015.

Russ C. Eberhart and James Kennedy. A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science, volume 1, pages 39–43. New York, NY, 1995.


Shu Fan, James R. Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. Energy Conversion, IEEE Transactions on, 24(2):474–482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento – a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276. Citeseer, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition, EWEC 2008, 6 pages. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving hierarchical temporal memory-based trading models. Springer, 2013.

M. A. Gaertner, C. Gallardo, C. Tejeda, N. Martínez, S. Calabria, N. Martínez, and B. Fernández. The Casandra project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. In Proc. of the 2001 European Wind Energy Conference, EWEC'01, Copenhagen, Denmark, pages 777–780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: A literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G. Lobo. Sipreólico – wind power prediction tool for the Spanish peninsular power system. Proceedings of the CIGRÉ 40th General Session and Exhibition, Paris, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357–363, 2014.


Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41–46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585–592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694–709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference, EWEC'07, 2007.

G. Kariniotakis. Position paper on JOULE project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J. Halliday, R. Brownsword, Ignacio Marti, Ana Maria Palomares, I. Cruz, H. Madsen, T. S. Nielsen, Henrik Aa. Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power – overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, 10 pages, 2006.

G. N. Kariniotakis, G. S. Stavrakakis, and E. F. Nogaret. Wind power forecasting using advanced neural networks models. Energy Conversion, IEEE Transactions on, 11(4):762–767, 1996.

Stanley J. Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326–334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275–293, 2009.

Lars Landberg and Simon J. Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171–195, 1994.

S. Lang, C. Mohrlen, J. Jorgensen, B. O. Gallachóir, and E. McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International Solar Energy Society UK Section Conference, volume 85, page 89, 2006.


S. M. Lawan, W. A. W. Z. Abidin, W. Y. Chai, A. Baharun, and T. Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760–1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. Smart Grid, IEEE Transactions on, 5(1):501–510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets, 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164–168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313–2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154–3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa. Nielsen, and Torben S. Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475–489, 2005.

Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431–441, 1963.

Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons. 1969.

Marvin Lee Minsky and Oliver G. Selfridge. Learning in random nets. MIT Lincoln Laboratory, 1960.

Jorge J. Moré. The Levenberg-Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105–116. Springer, 1978.

Henrik Aa. Nielsen, Torben S. Nielsen, Henrik Madsen, Maria J. Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471–482, 2007.


Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29–34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power, 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159–167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms, v0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33–57, 2007.

A. Rodrigues, J. A. Peças Lopes, P. Miranda, L. Palma, C. Monteiro, R. Bessa, J. Sousa, C. Rodrigues, and J. Matos. EPREV – a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference, EWEC, volume 7, 2007.

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5:3, 1988.

Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85–117, 2015.

S. Sinkevicius, R. Simutis, and V. Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91–96, 2011.

Ke-Sheng Wang, Vishal S. Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61–69, 2014.

WWEA. 2014 half-year report, pages 1–8. Technical report, WWEA, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365–376, 2013.

Hao Yu and Bogdan M. Wilamowski. Levenberg-Marquardt training. Industrial Electronics Handbook, 5:12–1, 2011.


Appendix A

Hyper-parameters

Parameters for the spatial pooler

Alias                Default  Description

columnCount          -        The number of cell columns in a cortical region.
globalInhibition     false    If true, during the inhibition phase the winning columns are selected as the most active columns from the region as a whole.
numActivePerInhArea  10       The maximum number of active columns per inhibition area.
synPermActiveInc     0.1      The amount by which an active synapse is incremented in each round, specified as a percent of a fully grown synapse.
synPermConnected     0.10     Controls the threshold at which synapses are connected.
synPermInactiveDec   0.01     The amount by which an inactive synapse is decremented in each round.
potentialRadius      16       Determines the extent of the input that each column can potentially be connected to.

Table A.1: Configuration parameters for the spatial pooler.


Parameters for the scalar encoder

Alias       Symbol  Description

w           w       Number of bits to set in the output.
minval      vmin    The lower bound of the input value.
maxval      vmax    The upper bound of the input value.
n           n       Number of bits in the representation (n must be > w).
radius      r       Inputs separated by more than or equal to this distance will have non-overlapping representations.
resolution  ψ       Inputs separated by more than or equal to this distance will have different representations.

Table A.2: Configuration parameters for the scalar encoder.


Parameters for the temporal memory

Alias                  Default  Description

activationThreshold    12       Activation threshold for segments.
cellsPerColumn         32       Number of cells per column.
columnCount            2048     The number of cell columns in a cortical region.
globalDecay            0.10     Decrements all synapses a little bit all the time.
initialPerm            0.11     Initial permanence value for a synapse.
inputWidth             -        Size of the input.
maxAge                 100000   Controls global decay. Global decay will only decay segments that have not been activated for maxAge iterations, and will only do the global decay loop every maxAge iterations.
maxSegmentsPerCell     -        The maximum number of segments a cell can have.
maxSynapsesPerSegment  -        The maximum number of synapses a segment can have.
minThreshold           8        The minimum required activity for a segment to learn.
newSynapseCount        15       The maximum number of synapses added to a segment during learning.
permanenceDec          0.10     How much permanence is removed from synapses when learning occurs.
permanenceInc          0.10     How much permanence is added to synapses when learning occurs.
temporalImp            cpp/py   Controls which temporal memory implementation to use.

Table A.3: Configuration parameters for the temporal memory.


Appendix B

Wind characteristics

Figure B.1: Wind characteristics for WF 1 and WF 2, the GEFCom dataset, showing zonal and meridional wind components. The intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Figure B.2: Wind characteristics for WF 3–7, the GEFCom dataset, showing zonal and meridional wind components. The intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Appendix C

Error Distribution



Figure C.1: Error distribution for different lead times, WF 1 (NuPIC).


Figure C.2: Error distribution for different lead times, WF 2 (NuPIC).


Figure C.3: Error distribution for different lead times, WF 3 (NuPIC).


Figure C.4: Error distribution for different lead times, WF 4 (NuPIC).


Figure C.5: Error distribution for different lead times, WF 5 (NuPIC).


Figure C.6: Error distribution for different lead times, WF 6 (NuPIC).


Figure C7 Error distribution for different lead times WF 7

[Six histograms: forecast error (−1.0 to 1.0) vs. frequency (0–70) for lead times 1, 10, 20, 30, 40 and 48; panels titled "wf1 using expektra"]

Figure C.8: Error distribution for different lead times, WF 1 (Expektra model).


[Six histograms: forecast error (−1.0 to 1.0) vs. frequency (0–70) for lead times 1, 10, 20, 30, 40 and 48; panels titled "wf2 using expektra"]

Figure C.9: Error distribution for different lead times, WF 2 (Expektra model).

[Six histograms: forecast error (−1.0 to 1.0) vs. frequency (0–70) for lead times 1, 10, 20, 30, 40 and 48; panels titled "wf3 using expektra"]

Figure C.10: Error distribution for different lead times, WF 3 (Expektra model).


[Six histograms: forecast error (−1.0 to 1.0) vs. frequency (0–70) for lead times 1, 10, 20, 30, 40 and 48; panels titled "wf4 using expektra"]

Figure C.11: Error distribution for different lead times, WF 4 (Expektra model).

[Six histograms: forecast error (−1.0 to 1.0) vs. frequency (0–70) for lead times 1, 10, 20, 30, 40 and 48; panels titled "wf5 using expektra"]

Figure C.12: Error distribution for different lead times, WF 5 (Expektra model).


[Six histograms: forecast error (−1.0 to 1.0) vs. frequency (0–70) for lead times 1, 10, 20, 30, 40 and 48; panels titled "wf6 using expektra"]

Figure C.13: Error distribution for different lead times, WF 6 (Expektra model).

[Six histograms: forecast error (−1.0 to 1.0) vs. frequency (0–70) for lead times 1, 10, 20, 30, 40 and 48; panels titled "wf7 using expektra"]

Figure C.14: Error distribution for different lead times, WF 7 (Expektra model).


List of Figures

2.1 A figure that presents the general outline when forecasting using the statistical approach . . . 6
2.2 A figure that presents the general steps when forecasting using a physical model . . . 7
3.1 The perceptron . . . 20
3.2 Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of H hidden layers, as well as M input connections. Each edge seen in this graph has a w_ij associated with it . . . 21
3.3 Information flow of a single-region predictive model created with the OPF . . . 23
3.4 The CLAClassifier . . . 28
3.5 Training an OPF model . . . 29
4.1 Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs. wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) . . . 31
4.2 Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) . . . 32
4.3 Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) . . . 32
4.4 Different error measurements for WF 1 . . . 33
4.5 Different error measurements for WF 2 . . . 34
4.6 Different error measurements for WF 3 . . . 35
4.7 Different error measurements for WF 4 . . . 36
4.8 Different error measurements for WF 5 . . . 37
4.9 Different error measurements for WF 6 . . . 38
4.10 Different error measurements for WF 7 . . . 39
4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point: the network when no channel has been exposed to noise. ws = wind speed, hours/week = timestamps, u, v is a directional vector of the wind . . . 41
4.12 Plot showing the unoptimized version vs. the optimized when training a network with 10 input neurons, 10/15/20/25 hidden neurons and 1 output neuron. Training was done for 100 epochs using LM . . . 42
4.13 Summarized average improvement over all wind farms, with 95% confidence intervals . . . 43
B.1 Wind characteristics for WF 1 and WF 2, the GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power . . . 59
B.2 Wind characteristics for WF 3-7, the GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power . . . 60
C.1 Error distribution for different lead times, WF 1 . . . 62
C.2 Error distribution for different lead times, WF 2 . . . 63
C.3 Error distribution for different lead times, WF 3 . . . 64
C.4 Error distribution for different lead times, WF 4 . . . 65
C.5 Error distribution for different lead times, WF 5 . . . 66
C.6 Error distribution for different lead times, WF 6 . . . 67
C.7 Error distribution for different lead times, WF 7 . . . 68
C.8 Error distribution for different lead times, WF 1 . . . 69
C.9 Error distribution for different lead times, WF 2 . . . 70
C.10 Error distribution for different lead times, WF 3 . . . 71
C.11 Error distribution for different lead times, WF 4 . . . 72
C.12 Error distribution for different lead times, WF 5 . . . 73
C.13 Error distribution for different lead times, WF 6 . . . 74
C.14 Error distribution for different lead times, WF 7 . . . 75

List of Tables

3.1 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, power observations are available for updating the models . . . 16
3.2 Preprocessed features from the dataset. The Forecast category represents the forecast given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the feature we will use in training and testing . . . 17
3.3 Example where n = 14, r = 5, ψ = 1 of various encoded scalar values using a ScalarEncoder . . . 24
4.1 NRMSE scores of the entries published in [Hong et al., 2014]. The NuPIC model and Expektra model are added so we can easily compare the results . . . 40
A.1 Table containing configuration parameters for the encoder . . . 55
A.2 Table containing configuration parameters for the spatial pooler . . . 56
A.3 Table containing configuration parameters for the temporal memory . . . 57



CHAPTER 2 BACKGROUND

2.1 Neural Networks and Time Series Prediction

One of the most common ANNs is the so-called Multilayer Perceptron (MLP), a network built around some very simple properties of a cortical neuron. This network has been around since the 1950s and has matured with a solid mathematical foundation, and it has been applied successfully to many different applications. It is built around the idea of having multiple layers of neurons, each layer feeding information forward to the next layer, i.e. a feed-forward neural network.

The main advantage of using multiple layers instead of just a single one is to overcome the limitations pointed out by Minsky and Selfridge [1960] and Minsky and Seymour [1969], where a single-layer perceptron is only capable of learning linearly separable patterns and is thus unable to learn all functions. The MLP is not limited in this way and has been shown to be able to represent a wide variety of functions given appropriate parameters [Hornik et al., 1989].

NuPIC is a platform actively developed and maintained by Numenta. It introduces a collection of ideas and algorithms that are inspired by the structural organization of the neocortex. Two of the core concepts found inside NuPIC are the so-called Hierarchical Temporal Memory and the Cortical Learning Algorithm. HTM was introduced in Hawkins and Blakeslee [2007] and refers to a hierarchy of cortical regions in the brain.

The neocortex is classically divided into 6 different layers2; the current implementation in NuPIC is focused on emulating layers 3-4 (specifically layer 3), with extensions and research code being developed to include more layers.

A CLA region consists of a collection of columns, and each column consists of a handful of cells. This structure is based on the minicolumn hypothesis [Buxhoeveden and Casanova, 2002]. Each column in the CLA region has its own semantic meaning, and the sparse activity of a handful of active columns will tell us something about the input. The CLA is modelled so that specific cells within each column reflect the temporal context of a pattern, and a single cortical region (a CLA region) is trained with the CLA algorithm.

A typical CLA region consists of around 2K columns containing around 60K neurons in total, while a typical MLP may have fewer than 100 neurons. Each neuron in an HTM network grows new synapses over time, and it is not uncommon to have around 5K synapses per neuron, meaning we would have around 300M synapses in total. A single region of this size uses around 100 MB of memory, and it takes around 10 ms to do one inference and learning step3.

2These layers should not be confused with the hierarchy of regions.
3These values are given in a talk, "Sensor-Motor Integration in the Neocortex", at the 2013 Hackathon.


The HTM/CLA also differs from the MLP in that it does not directly use scalar weights to represent synaptic connectivity. Synapses inside an HTM have binary weights with a scalar permanence: each synapse is either connected or disconnected, while the MLP uses scalar weights. Connectedness in the HTM/CLA is based on the permanence, which is a value between 0.0 and 1.0. If the permanence is over a certain threshold we have a connected synapse, i.e. we have weights of either 1 or 0. This binary mechanism is there to simulate synapses that are able to form and unform during learning. So essentially we have a weight-change network vs. a wiring-change network [Chklovskii et al., 2004].
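The permanence mechanism described above can be sketched in a few lines; the permanence values, threshold and learning increment below are illustrative, not NuPIC's actual defaults.

```python
# Hedged sketch of the binary-synapse mechanism: scalar permanences are
# thresholded into binary weights (all numbers here are illustrative).
permanences = [0.12, 0.55, 0.49, 0.91]
THRESHOLD = 0.5

weights = [1 if p >= THRESHOLD else 0 for p in permanences]
# learning nudges permanences up or down, so synapses can form and unform
permanences = [min(1.0, p + 0.1) for p in permanences]
new_weights = [1 if p >= THRESHOLD else 0 for p in permanences]
```

Note how the third synapse becomes connected after learning even though no scalar weight was ever changed: only its permanence crossed the threshold.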

NuPIC today is naturally geared towards time-series prediction, which makes it an interesting candidate for wind forecasting. NuPIC has been shown to outperform the MLP on financial time series [Gabrielsson et al., 2013] and it has been used successfully to balance traffic [Sinkevicius et al., 2011]. It has also been used on the NASA Aviation Dataset [Lee and Rajabi, 2014], but with less promising results. MLPs have been used successfully for many kinds of time-series problems [Azoff, 1994; Niska et al., 2004; Jain and Kumar, 2007].


Chapter 3

Method and Materials

The purpose of this chapter is to provide an explanation of the method used in this thesis. It presents terms and concepts needed and used in the field of Machine Learning, with a focus on how to create forecasts based on time series, and motivates why these methods are used as well as how they are used. This chapter also contains a description of the implemented artificial neural network, how it is structured and optimized. It provides details about the experimental setup and how these experiments relate to GEFCom, and finally a description of how the dataset is structured and processed.

3.1 Preliminaries

3.1.1 Remarks

The methodology used to evaluate the prediction models presented in this thesis is based on the protocol presented in Madsen et al. [2005]1, which is a complete protocol that can be used to evaluate the performance of short-term WPF and Wind-to-Power (W2P) models. The reason this protocol was chosen for this thesis is that it has been successfully used before as a guideline to evaluate a wide variety of forecast models, such as AWPPS, Prediktor, Previento, Sipreólico and WPPT. The protocol has been used for both on-shore and off-shore wind farms and was developed in the frame of the ANEMOS research project [Kariniotakis et al., 2006], which brought together many relevant groups involved in the field. The aim of ANEMOS was to develop accurate and robust models that substantially outperform current state-of-the-art methods, and one of its goals was to establish a common set of performance measures that can be used to compare forecasts across systems and locations.

1It should be pointed out that no widely agreed standardization exists.

3.1.2 Definitions

Time series

A time series is a sequence X of observations $x_t$, each with a particular time-stamp t, i.e. $X = [x_1, x_2, \ldots, x_t]$, in short $X_{1:t}$; it is a time-dependent collection of variables.

Forecast horizon

A multi-step-ahead prediction is the task of predicting k forecasts $\hat{X}_{t+1:t+k}$ given a collection of historical observations $X_{t-p+1:t}$. In the case of GEFCom we want forecasts for 1–48 steps ahead.

Point or spot forecast

In this thesis we model the forecast $\hat{p}_{t+k|t}$ as a so-called point forecast (or spot forecast), i.e. a single value for each forecast (as opposed to a probability distribution).

Prediction error

The prediction error e for lead time t + k is defined in equation 3.1 as the difference between the forecast and the actual value, where t denotes the time index, k is the look-ahead time, $p$ is the actual (measured, true) wind power and $\hat{p}$ is the predicted wind power:

$$e_{t+k|t} = p_{t+k} - \hat{p}_{t+k|t} \tag{3.1}$$

and the normalized prediction error $\varepsilon$ is seen in equation 3.2:

$$\varepsilon_{t+k|t} = \frac{1}{p_{inst}} e_{t+k|t} = \frac{1}{p_{inst}} \left( p_{t+k} - \hat{p}_{t+k|t} \right) \tag{3.2}$$

where $p_{inst}$ is the installed capacity of the wind farm (in kW or MW), which is unknown in the competition.


3.1.3 Reference models

Persistence (also called a naïve or plain predictor), as seen in equation 3.3, is the model most commonly used for benchmarking WPF models. This simple model states that the future wind generation value will be the same as the last measured value:

$$\hat{p}^{\,persistence}_{t+k|t} = p_t \tag{3.3}$$

An alternative would be to use an even simpler model, a climatology prediction (equation 3.4), i.e. predicting the mean, a value that would be approximated from the training set (see section 3.1.5):

$$\hat{p}^{\,mean}_{t+k|t} = \bar{p} = \frac{1}{N} \sum_{t=1}^{N} p_t \tag{3.4}$$

There are other reference models, like the one suggested by Nielsen et al. [1998], which have advantages over persistence, but it was never widely adopted and is not used by GEFCom.
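Both reference models above reduce to repeating a single value over the forecast horizon, which the following sketch makes concrete (the toy production series is illustrative):

```python
import numpy as np

def persistence_forecast(history, k):
    """Persistence (eq. 3.3): repeat the last measured value for k steps."""
    return np.full(k, history[-1])

def climatology_forecast(history, k):
    """Climatology (eq. 3.4): repeat the training-set mean for k steps."""
    return np.full(k, np.mean(history))

power = np.array([0.30, 0.35, 0.50, 0.45])  # toy normalized production series
print(persistence_forecast(power, 3))       # [0.45 0.45 0.45]
print(climatology_forecast(power, 2))       # [0.4 0.4]
```

Persistence is hard to beat at very short lead times, while climatology becomes the stronger baseline as the lead time grows; a useful model should outperform both.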

3.1.4 Error metrics

In order to understand why specific models perform well, it is usually a good idea to evaluate them against a wide variety of different criteria, as is emphasised by Kariniotakis [1997]. The following sections describe the error measures used; throughout, N denotes the size of the test set.

Forecast Bias

The Normalized Forecast Bias (NBIAS) describes the systematic error and is defined in equation 3.5; it is estimated by calculating the average error for each step ahead and gives an indication of the direction of the error:

$$NBIAS_k = \frac{1}{N} \sum_{t=1}^{N} \varepsilon_{t+k|t} \tag{3.5}$$

This bias is sometimes referred to by some authors as the Mean Forecast Error (MFE). A perfect MFE does not mean the forecast is perfect and contains no error, as positive and negative errors cancel each other out, but it gives an indication that the forecasts are centred on the proper target.


Mean Absolute Error

The Normalized Mean Absolute Error (NMAE) is an error quantity that looks at the average of the absolute error of the prediction and is defined in equation 3.6:

$$NMAE_k = \frac{1}{N} \sum_{t=1}^{N} \left| \varepsilon_{t+k|t} \right| \tag{3.6}$$

Another common name used instead of Mean Absolute Error (MAE) is Mean Absolute Deviation (MAD). This value shows the magnitude of the overall error that has occurred due to forecasting and thus should be as small as possible. This error is also scale dependent and will be affected by data transformations and the scale of the measurements.

Mean Squared Error

The Normalized Mean Squared Error (NMSE) is an error quantity that looks at the average of the squared errors $\varepsilon^2_{t+k|t}$ and is built using the Normalized Sum of Squared Errors (NSSE), defined in equation 3.7:

$$NSSE_k = \sum_{t=1}^{N} \varepsilon^2_{t+k|t} \tag{3.7}$$

NMSE is defined in equation 3.8:

$$NMSE_k = \frac{1}{N} NSSE_k = \frac{1}{N} \sum_{t=1}^{N} \varepsilon^2_{t+k|t} \tag{3.8}$$

In this error, positive and negative errors do not cancel each other out, and large individual errors are penalized more harshly.

Root Mean Squared Error

The Normalized Root Mean Squared Error (NRMSE) is an error quantity that takes the square root of the NMSE; it is defined in equation 3.9:

$$NRMSE_k = NMSE_k^{1/2} = \left( \frac{1}{N} \sum_{t=1}^{N} \varepsilon^2_{t+k|t} \right)^{1/2} \tag{3.9}$$

NRMSE is the main metric used in GEFCom, and it shares the same properties as NMSE.
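The four metrics above (equations 3.5–3.9) can be computed from a single array of normalized errors; the toy values below are illustrative, and `p_inst` defaults to 1 since the installed capacity is unknown in the competition:

```python
import numpy as np

def error_metrics(p_true, p_pred, p_inst=1.0):
    """Normalized error metrics (eqs. 3.5-3.9) for a single lead time."""
    eps = (p_true - p_pred) / p_inst          # normalized errors (eq. 3.2)
    nbias = eps.mean()                        # NBIAS (eq. 3.5)
    nmae = np.abs(eps).mean()                 # NMAE  (eq. 3.6)
    nmse = (eps ** 2).mean()                  # NMSE  (eq. 3.8)
    nrmse = np.sqrt(nmse)                     # NRMSE (eq. 3.9)
    return nbias, nmae, nmse, nrmse

p_true = np.array([0.2, 0.4, 0.6, 0.8])       # toy measured power
p_pred = np.array([0.3, 0.3, 0.6, 0.9])       # toy forecasts
nbias, nmae, nmse, nrmse = error_metrics(p_true, p_pred)
```

Note how the signed errors here partly cancel in NBIAS while NMAE and NRMSE do not, which is exactly why several criteria are reported side by side.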


3.1.5 Model selection

In regression and classification, one of the main issues we are faced with is: "How do we create a good model?" One way to define this "goodness" is to look at the model's generalization ability, i.e. its performance on unseen data.

To make sure the model we are building will generalize to new, unseen data, it is important how we select and build the model in the first place; what we have control over is the data we have at hand and how we use that data to build our model.

Training, testing and validating

In the case of supervised learning we have access to a target series and its associated features, i.e. the production series (our target) with wind speed and wind direction as features.

Common practice within Machine Learning, which is adhered to in this thesis, is to split the whole dataset into 3 smaller subsets, then use two of these subsets to find a good model (the training set and the validation set) while the remaining one (the test set) is used for evaluation, i.e. to estimate how accurate the model we have created would be on "unseen data". With highly flexible models like an artificial neural network we need to be careful not to overfit the data, which is one of the reasons why we have the validation set: we don't want a model that fails to generalize because it is fitted to every minor variation, i.e. it has captured a lot of noise.

3.1.6 Evaluation

The improvement with respect to a considered reference model ref is defined in equation 3.10:

$$I^{ref}_{EC,k} = 100 \cdot \frac{EC^{ref}_k - EC_k}{EC^{ref}_k} \ (\%) \tag{3.10}$$

where the Evaluation Criterion (EC) is any error measurement, such as NMSE, NMAE, etc.


Testing periods

Date        Time   Forecast
2011-01-01  01:00  1–48 hours
2011-01-04  13:00  1–48 hours
2011-01-08  01:00  1–48 hours
2011-01-11  13:00  1–48 hours
...         ...    ...
2012-06-23  01:00  1–48 hours
2012-06-26  13:00  1–48 hours

Table 3.1: The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, power observations are available for updating the models.

3.2 Experiments

Training and testing are structured based on the setup of GEFCom. The dataset spans from midnight on the 1st of July 2009 to noon on the 26th of June 2012. The period from the 1st of July 2009 to the 1st of January 2011 at 01:00 is used for training and validating, while the rest is used for testing. In the testing range, a number of 48-hour periods are defined (see table 3.1); in between these testing periods exists additional training data, which enables the models to be updated between forecasts.

The testing periods repeat every 7 days until the end of the dataset, and only meteorological forecasts that were relevant for the periods with missing power data are given; this was done in order to be consistent.

Meteorological forecasts in the GEFCom dataset consist of zonal u and meridional v components collected for surface winds at 10 m above ground level. They were extracted from the archive of the European Centre for Medium-range Weather Forecasts (ECMWF), a service which issues high-resolution deterministic forecasts twice a day, at 00 Coordinated Universal Time (UTC) and 12 UTC; each of these forecast periods consists of data for 1–48 hours ahead. In order to match the hourly resolution of the power measurements, forecasts were interpolated using cubic splines to an hourly resolution. A summary of the features found in the dataset is given in table 3.2.
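The cubic-spline upsampling step can be sketched as follows; the coarse time steps and wind-speed values are illustrative stand-ins, not the actual ECMWF data:

```python
# Hedged sketch: upsampling a forecast to hourly resolution with cubic
# splines, analogous to the preprocessing described above.
import numpy as np
from scipy.interpolate import CubicSpline

issue_hours = np.array([0, 12, 24, 36, 48])       # coarse forecast steps (h)
wind_speed = np.array([5.0, 7.5, 6.0, 8.0, 7.0])  # toy wind speeds (m/s)

spline = CubicSpline(issue_hours, wind_speed)
hourly = spline(np.arange(0, 49))                  # hourly values for 0..48 h
```

The spline reproduces the coarse values exactly at the original time steps and smoothly interpolates in between, which is why it is preferred over simple linear interpolation for wind-speed curves.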


No.  Category  Parameter             Alias  Type
1    Date      Date                  date   String
2    Date      Year                  year   Integer
3    Date      Month                 month  Integer
4    Date      Day                   day    Integer
5    Date      Hour                  hours  Integer
6    Date      Week                  week   Integer
7    Forecast  Wind Speed            ws     Real
8    Forecast  Wind Direction (deg)  wd     Real
9    Forecast  Wind U                u      Real
10   Forecast  Wind V                v      Real
11   Forecast  Issued                hp     Integer
12   SCADA     Production            wp     Real

Table 3.2: Preprocessed features from the dataset. The Forecast category represents the forecast given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the feature we will use in training and testing.


The meteorological forecast database given by the GEFCom dataset contains sections of missing data, each corresponding to a date for which the power information is also missing; in a pre-processing step these sections were filled with the best previous forecast available. For example, if we do not have data for the forecast issued at 2011-01-01 12:00, we use data from 2011-01-01 00:00, as a 48-hours-ahead forecast is available for this date. If the previous section also contained missing data, we would go back an additional section and use those forecasts. If no forecast is available 48 hours back, we use the best known forecast and extend it in the same fashion as the persistence model.

Any predictions made by these models should fall within a certain range, so any forecast outside this range will be clamped. This is the main post-processing step being done, i.e. there is an upper limit on what the wind farm can produce.
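The clamping step amounts to clipping forecasts to the valid production range; the range [0, 1] below assumes normalized production values, which is an assumption for illustration:

```python
import numpy as np

# Hedged sketch: clamp forecasts to the valid normalized production range.
forecast = np.array([-0.05, 0.42, 1.10])   # toy raw model outputs
clamped = np.clip(forecast, 0.0, 1.0)      # post-processing step from above
```

Values below zero or above the capacity limit are physically impossible, so clipping can only reduce the reported error.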

3.3 Holdback Input Randomization

The Holdback Input Randomization (HIPR) method, described in Kemp et al. [2007], can be used to investigate the importance of the input parameters. It works by sequentially feeding each data point in the test set to the neural network while replacing the values of one input parameter at a time. The replacement values are uniformly distributed random values drawn from the range the neural network was originally trained on, i.e. (−1, 1). An NRMSE score is calculated for each replacement, and the result gives us information about the relevance of each input.
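The HIPR procedure can be sketched as below; `model_predict` stands in for the trained network's prediction function, and the NRMSE helper is a plain RMSE on already-normalized errors, both assumptions for illustration:

```python
# Hedged sketch of Holdback Input Randomization (HIPR).
import numpy as np

rng = np.random.default_rng(0)

def nrmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def hipr_scores(model_predict, X_test, y_test):
    """Randomize one input column at a time and record the NRMSE."""
    scores = []
    for j in range(X_test.shape[1]):
        X_rand = X_test.copy()
        # replace column j with uniform noise in the training range (-1, 1)
        X_rand[:, j] = rng.uniform(-1.0, 1.0, size=len(X_rand))
        scores.append(nrmse(y_test, model_predict(X_rand)))
    return scores  # higher score => input j was more important
```

An input whose randomization barely moves the NRMSE contributes little to the prediction, while a large jump marks an important channel, which is how figure 4.11 is produced.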

3.4 Optimization methods

The two main optimization algorithms used in this thesis are the Particle Swarm Optimization (PSO) algorithm, presented by Eberhart and Kennedy [1995], and the Levenberg-Marquardt (LM)2 algorithm, independently developed by Kenneth Levenberg and Donald Marquardt [Marquardt, 1963; Levenberg, 1944]. The PSO algorithm is a population-based stochastic algorithm similar to a Genetic Algorithm (GA), but based on social-psychological principles instead of evolution. It can be summarized by the following steps (each network was trained multiple times in order to avoid local minima):

• Step 1: Initialize particles with random velocities and accelerations.

• Step 2: Determine which particle is closest to the goal.

• Step 3: Adjust accelerations toward that particle.

• Step 4: Update particle positions based on their velocities, and update velocities based on accelerations.

• Step 5: Go to step 2.

2The LM algorithm was used in the beginning of the project, but the results reported ended up using the PSO algorithm. The LM algorithm is included in this section because speed optimization was measured using it.
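The steps above can be sketched as a minimal global-best PSO; the inertia and acceleration coefficients, swarm size and the sphere objective are illustrative assumptions, not the thesis' actual training configuration:

```python
# Hedged, minimal PSO sketch following the steps above (global-best variant).
import numpy as np

rng = np.random.default_rng(1)

def pso_minimize(f, dim=2, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5):
    pos = rng.uniform(-1, 1, (n_particles, dim))      # Step 1: random init
    vel = rng.uniform(-0.1, 0.1, (n_particles, dim))
    best_pos = pos.copy()                             # personal bests
    best_val = np.array([f(p) for p in pos])
    g = best_pos[best_val.argmin()].copy()            # Step 2: global best
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # Step 3: accelerate toward personal and global bests
        vel = w * vel + c1 * r1 * (best_pos - pos) + c2 * r2 * (g - pos)
        pos = pos + vel                                # Step 4: move particles
        vals = np.array([f(p) for p in pos])
        improved = vals < best_val
        best_pos[improved] = pos[improved]
        best_val[improved] = vals[improved]
        g = best_pos[best_val.argmin()].copy()        # Step 5: repeat
    return g, f(g)

sol, val = pso_minimize(lambda x: np.sum(x ** 2))     # sphere test function
```

In the thesis setting, `f` would be the validation NRMSE of a network whose weights are given by the particle's position.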

The LM algorithm is a combination of the Error Back-Propagation (EBP) method [Rumelhart et al., 1988] and the Gauss-Newton method. This algorithm has the speed advantage of Gauss-Newton and the stability of EBP [Yu and Wilamowski, 2011]. A detailed treatment of LM can be found in Moré [1978], and of PSO in Poli et al. [2007].

3.5 Neural Networks

3.5.1 Multilayer Perceptron

The perceptron is built around a nonlinear model of a neuron, the McCulloch-Pitts model [McCulloch and Pitts, 1943]. It basically consists of a 2-step process where the cell body contains a summation function over the weighted sum of all inputs, including a bias. The perceptron is described by equation 3.11; the sum s is passed through an activation function (see section 3.5.1), which mimics the activation, or firing, of the neuron:

$$s = \sum_{i=1}^{M} w_i x_i + x_0 \tag{3.11}$$

where $w_i$ is the weight of the "synapse" of the input channel and is the parameter we want to adjust, $x_i$ is the input value and $x_0$ is the bias; M denotes the number of inputs.


[Diagram: a perceptron with weighted input signals and a bias feeding a summation Σ followed by an activation function f(s), producing the output signal]

Figure 3.1: The perceptron.

The structure of the MLP consists of many perceptrons, and it is shown in figure 3.2. The input signal flows from the input layer at the bottom to the output layer at the top. We have a bias signal, seen on the left side of the diagram, which is set to a fixed number.

MLPs are fully connected networks, meaning that the neurons in any layer of the network are connected to all the neurons in the previous layer. Each connection in the network has a weight $w_{ij}$ associated with it. The initialization of these weights is done before any training and is achieved by randomly assigning very small values to the respective weights in the graph. The architecture used in this study has only one output variable, i.e. the function we approximate produces forecasts of the power generation given a certain input.
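A forward pass through such a network can be sketched as below; the layer sizes, the single tanh hidden layer and the small random initial weights are illustrative assumptions standing in for the architecture of figure 3.2:

```python
# Hedged sketch of a forward pass through an MLP: tanh hidden layer,
# single linear output neuron; sizes and weights are illustrative.
import numpy as np

rng = np.random.default_rng(2)

def forward(x, W_hidden, b_hidden, W_out, b_out):
    h = np.tanh(W_hidden @ x + b_hidden)   # hidden layer: eq. 3.11 + tanh
    return W_out @ h + b_out               # single linear output neuron

x = rng.uniform(-1, 1, 10)                 # 10 normalized input features
W_hidden = rng.normal(0, 0.1, (15, 10))    # small random initial weights
b_hidden = np.zeros(15)
W_out = rng.normal(0, 0.1, 15)
b_out = 0.0
y = forward(x, W_hidden, b_hidden, W_out, b_out)  # scalar power forecast
```

Training then amounts to adjusting `W_hidden` and `W_out` (e.g. with PSO or LM, section 3.4) so that `y` matches the observed production.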


[Diagram: feed-forward network with an input layer (hours, u, v, week, ws, ws−1, ws−2, ws+1, ws+2), tanh hidden layers, a linear output layer and a bias signal]

Figure 3.2: Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of H hidden layers, as well as M input connections. Each edge seen in this graph has a $w_{ij}$ associated with it.

The performance of neural networks is generally improved if the data is normalised, because using the original data directly can cause convergence problems. Normalization is done using the mapminmax function seen in equation 3.12:

$$y = \frac{(y_{max} - y_{min}) \cdot (x - x_{min})}{x_{max} - x_{min}} + y_{min} \tag{3.12}$$

where $y_{max}$ is the maximum of the specified range, which in this case is 1, and $y_{min}$ is −1; x is the value to be scaled, $x_{max}$ is the maximum of the numbers to be scaled and $x_{min}$ is their minimum.

The forecasting model consists of 7 different networks, one for each wind farm. These networks are trained on the first section of the dataset, before the test period. A random 60/20/20 split is created, where 60% of the available data is used for training, 20% is used to validate the network, and 20% is set aside as a hold-out set for the hyperparameters. The input features3 fed into the models are ws, u, v, hours, ws+1, ws+2, ws−1 and ws−2, where ws+x and ws−x denote a time shift of x; we can use ws+x as input into the model because ws is a forecast in itself.

3See table 3.2.

Activation Functions

The computation done for each neuron in the multilayer perceptron requires knowl-edge of the derivative of the activation function In other words the activationfunction we pick needs to be continuous In this thesis we use the following twoactivation functions The hyperbolic tangent function seen in equation 313 and

f(s) = tanh(s) = (e^s − e^−s) / (e^s + e^−s)    (3.13)

the linear transfer function seen in equation 3.14.

f(s) =
  +1   if s ≥ 1
  s    if −1 < s < 1
  −1   if s ≤ −1        (3.14)
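Both activation functions translate directly into code (a sketch; `satlins` is the MATLAB-style name commonly used for the saturating linear function of equation 3.14):

```python
import math

def tanh_activation(s):
    # Equation 3.13: hyperbolic tangent
    return (math.exp(s) - math.exp(-s)) / (math.exp(s) + math.exp(-s))

def satlins(s):
    # Equation 3.14: identity on (-1, 1), saturated to +/-1 outside
    if s >= 1:
        return 1.0
    if s <= -1:
        return -1.0
    return float(s)
```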

Hyperparameter optimization

In order to obtain the hyperparameters necessary for each wind farm's model, a random hyperparameter search was performed for all models, and a hold-out validation set was used to pick the best hyperparameters. Random search has been shown to work better than grid search when not all parameters are equally important [Bergstra and Bengio, 2012].
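The procedure can be sketched as follows (the search space and scoring function here are toy examples, not the ones used in this study):

```python
import random

def random_search(score_fn, space, n_trials=30, seed=1):
    # Sample random configurations and keep the one with the lowest
    # validation error on the hold-out set
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in space.items()}
        score = score_fn(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

# hypothetical search space and toy validation score
space = {"hidden_neurons": [10, 15, 20, 25], "learning_rate": [0.001, 0.01, 0.1]}
toy_score = lambda p: abs(p["hidden_neurons"] - 20) + 100 * abs(p["learning_rate"] - 0.01)
best, best_err = random_search(toy_score, space)
```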

3.5.2 Numenta Platform for Intelligent Computing

This section describes the key principles introduced with NuPIC. It explains in general terms the theory behind the Online Prediction Framework (OPF)4, which uses the CLA and HTM algorithms; the OPF works as an API to create predictive models. The OPF consists of 5 major types of components: Encoders, Spatial Pooler (SP), Temporal Memory (TM), Temporal Pooler (TP) and Classifiers. Together these components construct a single CLA region5, and Figure 3.3 demonstrates the information flow through a single region.

4 The OPF is used with Numenta's commercial product Grok.
5 Currently, models created with the OPF do not use a TP, nor does this client allow creation of a hierarchy of regions, i.e. what we would call an HTM; it is only possible to create one region.


Another central concept in NuPIC is the Sparse Distributed Representation (SDR), which refers to the activation of a small percentage of the neurons at any given time. This neuronal activity is represented as an n-dimensional sparse binary vector x = [b0, …, bn] with around 2% active cells. The outputs from the spatial pooler and temporal memory are both SDRs, while the output from the encoder does not enforce this and is just a normal binary vector. A general overview of the properties of SDRs has been given by Ahmad and Hawkins [2015].

Encoder → Spatial Pooler → Temporal Memory → Classifier

Figure 3.3: Information flow of a single-region predictive model created with the OPF.

The overlap between two different SDRs is defined by

o(x, y) = x · y    (3.15)

A match between two SDRs is defined by

m(x, y) = o(x, y) ≥ θ    (3.16)

where θ is set such that θ ≤ ‖x‖1 and θ ≤ ‖y‖1. An interesting property of SDRs, used multiple times inside the temporal memory and especially for predictions, is the fact that a set of fixed-size SDRs can be reliably stored by taking their union as a single pattern. The boolean OR-operator is used to create a



Scalar | Encoding
1      | 11111000000000
2      | 01111100000000
10     | 00000000011111

Table 3.3: Example, with n = 14, r = 5, ψ = 1, of various encoded scalar values using a ScalarEncoder.

new vector from a set of SDRs. The downside of storing patterns this way is that the more patterns we store, the bigger the probability of false positives.
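Equations 3.15-3.16 and the union property translate directly into code (a small sketch on 8-bit vectors; real SDRs are much larger and sparser):

```python
def overlap(x, y):
    # Equation 3.15: count of positions active in both SDRs
    return sum(a & b for a, b in zip(x, y))

def match(x, y, theta):
    # Equation 3.16: the SDRs match if their overlap reaches theta
    return overlap(x, y) >= theta

def union(sdrs):
    # Boolean OR stores a set of fixed-size SDRs as one pattern
    return [int(any(bits)) for bits in zip(*sdrs)]

a = [1, 1, 0, 0, 0, 0, 0, 0]
b = [0, 1, 1, 0, 0, 0, 0, 0]
u = union([a, b])
```

Note that a pattern that merely shares bits with the stored ones can also match the union, which is exactly the false-positive risk described above.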

Encoders

NuPIC contains many different encoders6. The job of an encoder is to convert raw input into a more suitable representation (i.e. a binary vector). The raw input is fed into the model using a dictionary data structure.

Useful encoders include the scalar and categorical encoders, while more exotic encoders include one for Global Positioning System (GPS) coordinates, which can be used to extract information about anomalous movements. The entries of the dictionary representation of the raw input are each encoded separately and concatenated using a multi-encoder.

One property of the spatial pooler is that overlapping input patterns are mapped to the same SDR. This means that we want the encoder to encode input so that similar inputs share bits. The ScalarEncoder fulfils this property by the process illustrated in table 3.3.

We calculate the range of values to encode using equation 3.17; vmin represents the minimum value of the input signal, while vmax denotes its upper bound.

vrange = vmax − vmin    (3.17)

There are three mutually exclusive parameters that determine the overall size of the output from a scalar encoder: (n, r, ψ). n directly represents the total number

6 A full list of all encoders can be found in the API documentation for NuPIC.


of bits in the output, and it must be bigger than or equal to w, which represents the number of bits that are set to encode a single value, i.e. the "width" of the output signal7. r and ψ are specified with respect to the input, while w is specified with respect to the output. Two inputs separated by more than the radius have non-overlapping representations. Two inputs separated by less than the radius will in general overlap in at least some of their bits. Two inputs separated by greater than or equal to the resolution are guaranteed to have different representations. ψ can be calculated using equation 3.18:

ψ = r / w    (3.18)

Depending on whether we want periodic behaviour or not, n is calculated a bit differently; see the scalar encoder for implementation details8.
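A simplified, non-periodic encoder that reproduces table 3.3 can be sketched as follows (an approximation of the idea behind NuPIC's ScalarEncoder, not its actual implementation):

```python
def scalar_encode(value, v_min, v_max, n, w):
    # Non-periodic ScalarEncoder-style sketch: w consecutive active bits
    # whose start index tracks the value, so nearby values share bits
    v = max(v_min, min(v_max, value))
    n_buckets = n - w + 1
    bucket = int((v - v_min) * (n_buckets - 1) / (v_max - v_min))
    return [1 if bucket <= i < bucket + w else 0 for i in range(n)]

# reproduces table 3.3 (n = 14, w = 5, resolution psi = 1, values in [1, 10])
enc1 = scalar_encode(1, 1, 10, n=14, w=5)
enc2 = scalar_encode(2, 1, 10, n=14, w=5)
enc10 = scalar_encode(10, 1, 10, n=14, w=5)
```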

Spatial Pooler

The spatial pooler receives a binary vector as its input from an encoder and outputs an SDR. The structure of the spatial pooler consists of an input space and a set of columns, and the output SDR represents which of the columns in a region are active. Each column has synapses connected to the input space: each column is randomly and potentially connected to around 50% of the input space, the so-called "potential pool". Each synapse will connect to and disconnect from the input space during learning.

The general information flow of the spatial pooler consists of the following steps: input stimuli from lower regions lead to activation of the input space to which each mini-column is synaptically connected; this activation, the so-called "overlap score", is computed for each column. The score is calculated as the total sum of the active inputs that try to influence that column, weighted with a "boosting factor" that increases the chance for certain columns to win more easily. A final top percentage (usually around 2% activity) of the columns with the highest influence (biggest overlap score) will be chosen; columns also inhibit nearby columns, and the result of this process is a binary vector with few active bits, an SDR. This is illustrated in equation 3.19, where b can either be 0 or 1 and s is a value representing the score.

7 w must be odd to avoid centering problems.
8 https://github.com/numenta/nupic


[b1 b2 b3 … bn] · [b11 b12 b13 … b1n; b21 b22 b23 … b2n; …; bd1 bd2 bd3 … bdn] = [s1 s2 … sn] → (inhibition) → [b1 b2 b3 … bn]    (3.19)
   input vector          connected synapses for each column                       overlap score                    output SDR

Learning in this structure is done by adjusting the permanences of the proximal dendrites to better match the input: for each winning column, increase the permanence of the synapses that correctly matched the input and decrease the permanence of the rest. We also increase the boosting factor of losing columns to give them a bigger chance of winning next time.
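The overlap-and-inhibition step can be sketched as follows (a toy version with global inhibition and fixed boost factors; the real spatial pooler also handles local inhibition, permanence learning and boosting updates):

```python
import random

def spatial_pool(input_bits, potential, boost, active_frac=0.02):
    # Overlap score per column: boosted count of active inputs reaching
    # the column through its potential synapses (cf. equation 3.19)
    scores = [boost[c] * sum(input_bits[i] for i in pool)
              for c, pool in enumerate(potential)]
    # Global inhibition: only the top ~2% of columns stay active -> an SDR
    k = max(1, int(len(potential) * active_frac))
    winners = set(sorted(range(len(scores)), key=lambda c: -scores[c])[:k])
    return [1 if c in winners else 0 for c in range(len(potential))]

random.seed(0)
n_inputs, n_columns = 100, 200
# each column is potentially connected to about 50% of the input space
potential = [random.sample(range(n_inputs), n_inputs // 2) for _ in range(n_columns)]
boost = [1.0] * n_columns
sdr = spatial_pool([1] * 10 + [0] * 90, potential, boost)
```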

Temporal Memory

The temporal memory receives an SDR from the spatial pooler, which represents the active columns; the output of the temporal memory is the activity of the whole CLA region. The general idea of the TM9 is that the cells in each mini-column provide temporal context to the pattern identified by the spatial pooler. The TM's job is to form transitions between different SDRs, and it achieves this by allowing active cells to form connections to previously active cells, so that in a future setting each cell is able to predict its own activity.

Each cell in a region has several distal segments, ideally one for each pattern the cell has transitioned from; in practice these segments are connected to a couple of patterns each. A cell enters a predictive state if there is enough activity on a segment, and every segment consists of a collection of connections, synapses that have been formed to a subset of previously active cells (typically around 10-15 cells).

The temporal memory receives a sparse binary vector from the spatial pooler, and the general information flow for this part of the CLA goes through two phases. The first phase has the following steps: 1) for each active column, check whether any cell is in a predictive state; if there is a cell in a predictive

9 There are a lot of details in how the temporal memory is implemented; pseudo-code and more details can be found in [Numenta 2011]. The NuPIC git repository is the best source for finer details: https://github.com/numenta/nupic


state, then 2) activate that particular cell. If no cell was found in a predictive state, then 3) activate all cells in that particular column, a process called bursting.

Bursting is analogous to saying: we do not know which cell to activate, because we have not seen this instance of the sequence of patterns before; since we are unable to put the column into the correct temporal context, we activate all cells to reflect this uncertainty. Bursting does not occur when the temporal context has been predicted by the CLA. Equation 3.20 shows the first phase.

[b1 b2 b3 … bn] ∘ [b11 b12 b13 … b1n; b21 b22 b23 … b2n; …; bd1 bd2 bd3 … bdn] = [b11 b12 b13 … b1n; b21 b22 b23 … b2n; …; bd1 bd2 bd3 … bdn]    (3.20)
     SP SDR                        predictive state                                                   active state

The second phase of the algorithm figures out which cells should be put into a predictive state for the next time-step. This is achieved by checking every distal segment on every cell for activity above a certain threshold, i.e. checking for active segments; one active segment is enough to put a cell into a predictive state. The output from the temporal memory is a vector representing the state of all cells in that region. Equation 3.21 shows phase 2.

[b1 b2 b3 … bn] · [b11 b12 b13 … b1n; b21 b22 b23 … b2n; …; bd1 bd2 bd3 … bdn]X = [b1 b2 b3 … bn]X > τ → [s1 s2 s3 … sn]    (3.21)
  active state                       segment X                                    segment activation X      predictive state

If learning is turned on, update the permanences of the synapses connected to active distal segments (in the same way as in the spatial pooler). These changes are marked as temporary until we are sure that the cell is in fact able to correctly predict something with them; after this is known, the change either becomes permanent or is removed. Temporarily marked cells are updated whenever a cell goes from being inactive to active from a feed-forward input (update the permanence, as we correctly predicted the feed-forward activation). If a cell instead went from active to inactive, undo the change.
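Phase 1 of this process can be sketched as follows (a toy version with 4 cells per column; the real implementation also tracks segments, synapses and permanences):

```python
CELLS_PER_COLUMN = 4

def phase1_activate(active_columns, predictive):
    # Phase 1: in each active column, activate the predicted cells; if no
    # cell was predicted, burst the whole column to signal unknown context
    active = {}
    for col in active_columns:
        predicted = predictive.get(col, set())
        active[col] = set(predicted) if predicted else set(range(CELLS_PER_COLUMN))
    return active

# column 0 had cell 2 in a predictive state; column 1 was not predicted
active = phase1_activate([0, 1], {0: {2}})
```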


NuPIC Classifiers

There are many different classifiers included with NuPIC, such as a k-Nearest Neighbour (kNN) classifier and a Support Vector Machine (SVM); the OPF in particular uses a custom-built classifier called the "CLAClassifier". The purpose of this classifier is to decode predictions made by the CLA. Classification with the CLAClassifier is performed in different ways depending on how the input has been encoded.

Scalar values are decoded using a process where each cell in a CLA region is paired with two histograms: one histogram keeps track of the frequency of encountered patterns associated with each cell, and the other keeps track of a moving average for each bucket. This process is illustrated in figure 3.4.

Figure 3.4: The CLAClassifier. Each column of the SDR is paired with two histograms per cell (likelihood and moving average), spanning buckets from the min value to the max value.
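The moving-average side of this decoding can be sketched as follows (a toy version for a single cell; `alpha` is an assumed smoothing rate, not a value taken from NuPIC):

```python
def update_histogram(hist, actual_bucket, alpha=0.1):
    # Moving-average histogram for one active cell: nudge the likelihood
    # of the observed bucket up and all other buckets down
    for b in range(len(hist)):
        target = 1.0 if b == actual_bucket else 0.0
        hist[b] += alpha * (target - hist[b])

hist = [0.0] * 5
for _ in range(50):            # the cell repeatedly co-occurs with bucket 3
    update_histogram(hist, actual_bucket=3)
predicted_bucket = max(range(5), key=lambda b: hist[b])
```

After enough co-occurrences the histogram peaks at the bucket the cell's activity predicts, which is how an SDR is decoded back into a scalar forecast.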

Training in NuPIC

Training of the NuPIC model is done using online learning algorithms. The schema seen in figure 3.5 is used to train and test these models; we have 7 different wind farms, so this schema is repeated for each wind farm.

This schema is slightly adjusted from the traditional way the OPF handles multi-step predictions. The default way of doing this is to use 48 different models, one for


every step ahead in the CLAClassifier, which would give us 48 · 7 = 336 models in total (for all the wind farms). Not only does this change match Expektra's approach, but it also reduces the memory requirements of the CLAClassifier.

Training phase: the dataset feeds a pre-training data chunk to PSO swarming (or a manual setup) for the hyperparameter setup, after which the OPF model, with online learning activated, consumes the training data stream and produces predictions. Testing phase: the dataset feeds the testing data to the OPF model with online learning deactivated, producing multi-step predictions.

Figure 3.5: Training an OPF model.

Input and hyperparameter selection

Finding hyperparameters for an OPF model was partly done using the custom built-in PSO algorithm and partly configured manually, mainly by ensuring that the PSO algorithm did not remove encoders. A single CLA region contains a lot of hyperparameters; these parameters have been included, with corresponding descriptions, in appendix A. Inputs to the model are date, ws, wp, u, v.


Chapter 4

Result

This chapter contains the primary results obtained in this study. Figures 4.4-4.10 contain different error measurements on the test set for each wind farm. We see that Expektra's ANN model performs well over all wind farms, while the NuPIC model performs worse than Expektra but in general still better than our reference model. NuPIC is off target, with a bias error on most wind farms. The graphs also indicate that NuPIC is unable to pick up some trends, visible in the cumulated ε2 graph. Expektra's model, on the other hand, shows no clear problems in cumulated ε2 and is on target on all wind farms; appendix C has been included to reflect this for different lead times.

Figure 4.1: Left diagram: cumulative probability of wind speed. Right diagram: scatter diagram of the power curve, normalised power output vs. wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


Figure 4.2: Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).

Figure 4.3: Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


Figure 4.4: Different error measurements (NBIAS, NRMSE, cumulated ε2 and NMAE vs. look-ahead time) for WF 1, comparing Expektra, NuPIC and Persistence.


Figure 4.5: Different error measurements (NBIAS, NRMSE, cumulated ε2 and NMAE vs. look-ahead time) for WF 2, comparing Expektra, NuPIC and Persistence.


Figure 4.6: Different error measurements (NBIAS, NRMSE, cumulated ε2 and NMAE vs. look-ahead time) for WF 3, comparing Expektra, NuPIC and Persistence.


Figure 4.7: Different error measurements (NBIAS, NRMSE, cumulated ε2 and NMAE vs. look-ahead time) for WF 4, comparing Expektra, NuPIC and Persistence.


Figure 4.8: Different error measurements (NBIAS, NRMSE, cumulated ε2 and NMAE vs. look-ahead time) for WF 5, comparing Expektra, NuPIC and Persistence.


Figure 4.9: Different error measurements (NBIAS, NRMSE, cumulated ε2 and NMAE vs. look-ahead time) for WF 6, comparing Expektra, NuPIC and Persistence.


Figure 4.10: Different error measurements (NBIAS, NRMSE, cumulated ε2 and NMAE vs. look-ahead time) for WF 7, comparing Expektra, NuPIC and Persistence.


User          | WF 1  | WF 2  | WF 3  | WF 4  | WF 5  | WF 6  | WF 7  | All
Leustagos     | 0.145 | 0.138 | 0.168 | 0.144 | 0.158 | 0.133 | 0.140 | 0.146
DuckTile      | 0.143 | 0.145 | 0.172 | 0.145 | 0.165 | 0.137 | 0.146 | 0.148
MZ            | 0.141 | 0.151 | 0.174 | 0.145 | 0.167 | 0.141 | 0.145 | 0.149
Propeller     | 0.144 | 0.153 | 0.177 | 0.147 | 0.175 | 0.141 | 0.147 | 0.152
Duehee Lee    | 0.157 | 0.144 | 0.176 | 0.160 | 0.169 | 0.154 | 0.148 | 0.155
Expektra      | 0.165 | 0.158 | 0.184 | 0.164 | 0.179 | 0.153 | 0.153 | 0.165
MTU EE5260    | 0.161 | 0.172 | 0.193 | 0.162 | 0.192 | 0.156 | 0.160 | 0.168
SunWind       | 0.174 | 0.177 | 0.193 | 0.176 | 0.179 | 0.157 | 0.162 | 0.172
ymzsmsd       | 0.163 | 0.186 | 0.200 | 0.164 | 0.192 | 0.162 | 0.167 | 0.174
4138 Kalchas  | 0.180 | 0.179 | 0.197 | 0.175 | 0.200 | 0.160 | 0.165 | 0.177
NuPIC         | 0.243 | 0.254 | 0.264 | 0.310 | 0.290 | 0.224 | 0.240 | 0.264
Persistence   | 0.302 | 0.338 | 0.373 | 0.364 | 0.388 | 0.341 | 0.361 | 0.355

Table 4.1: NRMSE scores of the entries published in [Hong et al. 2014]. The NuPIC model and the Expektra model are added so we can easily compare the results.

4.1 Experimental results

Looking at the primary results presented in this study, we see in table 4.1 that Expektra's ANN model is able to achieve performance comparable to the other methods published with Hong et al. [2014]. This can be used as a starting point for further investigations. A similar approach to Expektra's is that of Duehee Lee [Lee and Baldick, 2014], as both methods are based on the same neural network architecture but with different optimization techniques. Duehee Lee's network is structured slightly differently from Expektra's model, which consists of only 1 output node, whereas Duehee Lee uses five (representing five predictions ahead). Besides these differences, Duehee Lee has also included additional sub-models to create a prediction ensemble, blending the output of the ANN with a Gaussian Process (GP) to improve very short-term predictions. MTU EE5260 is also a neural network approach that converts meteorological forecasts to power, and here the score is very close. The HTM CLA beats the persistence model but needs more work to outperform the other models in GEFCom.
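For reference, the error score used in the comparison can be sketched as follows (assuming, as in the GEFCom data, that power is already normalised to [0, 1], so the RMSE is itself a normalised score; the competition's exact normalisation may differ in detail):

```python
import math

def nrmse(actual, predicted):
    # Root-mean-square error of normalised power output
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)

perfect = nrmse([0.2, 0.5, 0.8], [0.2, 0.5, 0.8])
biased = nrmse([0.2, 0.5, 0.8], [0.3, 0.6, 0.9])
```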


4.2 Input Importance

For interpretation purposes, in order to understand the model better, an analysis of the relative input parameter importance was performed. This analysis is illustrated in figure 4.11. Each box represents added noise to that channel, which results in a higher NRMSE score if that feature was important. A reference point, "all-channels", represents the error distribution of the model with no input replacement. We clearly see in this figure that the wind speed channel ws is the most important attribute: adding noise to this channel greatly affects the NRMSE score. We also see that the wind components u, v show little to no influence, and that the timestamp-related inputs hours and week both indicate that there are seasonal and daily trends present in the dataset.
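The noise-injection probe can be sketched as follows (a toy model and toy data; the real analysis perturbs the input channels of the trained network and compares full NRMSE distributions):

```python
import random

def channel_importance(model, inputs, targets, channel, trials=20, seed=0):
    # HIPR-style probe: replace one input channel with noise and measure
    # how much the mean squared error rises relative to the clean inputs
    rng = random.Random(seed)
    def mse(xs):
        return sum((model(x) - t) ** 2 for x, t in zip(xs, targets)) / len(xs)
    base = mse(inputs)
    rise = 0.0
    for _ in range(trials):
        noisy = [list(x) for x in inputs]
        for row in noisy:
            row[channel] = rng.uniform(-1.0, 1.0)
        rise += mse(noisy) - base
    return rise / trials

# toy model that only reads channel 0 (think: wind speed)
model = lambda x: x[0]
inputs = [[0.1, 0.9], [0.4, 0.2], [0.8, 0.5]]
targets = [0.1, 0.4, 0.8]
ws_importance = channel_importance(model, inputs, targets, channel=0)
other_importance = channel_importance(model, inputs, targets, channel=1)
```

A channel the model ignores leaves the error unchanged, while noising a channel the model relies on drives the score up, mirroring the pattern seen in figure 4.11.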

Figure 4.11: Relative input parameter importance (NRMSE) using HIPR, for the channels all-channels, hours, u, v, week, ws, ws−1, ws−2, ws−3, ws+1, ws+2, ws+3. "all-channels" is the reference point: the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind.


4.2.1 Adaptation and Optimization

All training for Expektra's model was performed on a MacBook Pro with an Intel Core 2 Duo processor (P8600, 2.40 GHz) on a 64-bit operating system with 4 GB of installed RAM. After adapting the code to run experiments on the GEFCom dataset, an investigation was performed to evaluate the performance of the core source code provided by Expektra. This investigation identified some slow parts of the code; after fixing these issues, mainly by using a Math.NET native library instead of the managed provider, a speed test was performed comparing the unoptimized version with the optimized one. Figure 4.12 shows the speed-up achieved during training using the LM algorithm.

Figure 4.12: Plot showing training time for the unoptimized (Normal) version vs. the optimized one when training a network with 10 input neurons; 10, 15, 20 or 25 hidden neurons; and 1 output neuron. Training was done for 100 epochs using LM.

4.3 Summary

A plot showing the average NRMSE improvement over the persistence model can be seen in figure 4.13. We see that both models are able to perform better than persistence. Expektra's model performs better than the NuPIC model over the whole forecast


horizon, and NuPIC performs better than persistence towards the end of the forecast horizon.
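Using the all-farm NRMSE scores from table 4.1, the improvement metric behind this plot can be sketched as follows (the function name is illustrative):

```python
def improvement_over_persistence(model_nrmse, persistence_nrmse):
    # Percentage NRMSE improvement relative to the persistence reference
    return 100.0 * (persistence_nrmse - model_nrmse) / persistence_nrmse

# all-farm NRMSE scores from Table 4.1
expektra = improvement_over_persistence(0.165, 0.355)
nupic = improvement_over_persistence(0.264, 0.355)
```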

Figure 4.13: Summarized average improvement (%, NRMSE) over persistence across all wind farms, with 95% confidence intervals.


Chapter 5

Discussion

With the exponential growth of computing power it becomes easier to study deeper and more complex networks, and with recent advances in the area of deep learning, interest in ANNs has resurged. In practice, ANNs had been used by most groups in the field of WPF, but these networks never caught on, as it was argued that the improvements made by ANNs were not usually enough to justify the extra effort of training them [Giebel et al., 2011]. This is steadily becoming less of an issue as computation becomes cheaper and bigger networks perform better.

By using a simple MLP network we observe that we are able to obtain results similar in performance to other models published in Hong et al. [2014]. This can be used as a starting point for further investigations. The GEFCom dataset has a limited number of input features; if we had access to a wider range of them, the power of neural networks could be studied more in depth, but given the number of features in the dataset this is hard to do.

Using persistence as a reference model was done in order to have the same baseline as the one used in GEFCom, but a better reference model should be considered in the future, such as the one presented in Nielsen et al. [1998], especially if longer forecasts are to be considered.

5.1 Method development issues

Working with Expektra's model is very straightforward and no major issues were encountered during development, but some issues were encountered working with NuPIC1:

1 It should also be pointed out that it helps to have the people who developed the code on the spot, and this is probably the main reason for the little trouble we had.


1. Installing NuPIC is difficult. This seems to be a general problem, judging by posts on the NuPIC mailing list2, and a very important issue to fix if they want more people to work with this model.

2. The built-in swarming (PSO) did not work well, especially for slightly larger datasets and for 48 prediction steps. The main problems were fitting everything into memory and that the code ran too slowly when swarming over multiple steps ahead, making it practically impossible to use. Scaling down the problem to just a few prediction steps and multiple models helped, but the swarm model kept dismissing wind speed as an important feature, which was fixed by specifically telling it not to touch that encoder.

3. Inconsistencies between the white-paper documentation of HTM/CLA and the actual implementation make it hard to understand the underlying principles of NuPIC. NuPIC implements a very complex network, and any inconsistencies in the documentation make it hard to understand. The code is quite well documented, but the additional material is a bit sparse.

These issues are all understandable and are continuously being improved upon; NuPIC is still a young platform, so issues are to be expected given the complexity and research nature of the project. Training the CLAClassifier to use many prediction steps instead of just one will most likely reduce the bias error seen in the results; to investigate this, a more powerful computer is needed.

There are some differences in the input between the models. This setup was created because an OPF model needs to encode the value it is going to predict. SCADA data is available, so this is not an issue, but it may introduce some unfairness between NuPIC and Expektra. In the current setup NuPIC does not seem to gain anything extra from the additional information, and other models in the competition will most likely use different inputs as well.

In general, NuPIC is different in the way you feed it data: you do so by sending in the front of the signal, and temporal context is achieved through the temporal memory.

5.2 Future improvements and directions

One issue with the GEFCom dataset is that it does not contain any specific information about the location of these 7 wind farms. It is worth considering taking a look at how to handle different models that are specialized for different types

2 http://numenta.org/lists/


of terrain, as this has been shown to increase performance [Kariniotakis et al., 2006]. Another thing to investigate could be a more advanced schema for hyperparameter selection: Jursa and Rohrig [2008] show that the WPF error can be reduced by smarter use of optimization algorithms for feature selection and hyperparameter selection, which should improve the performance.

In general, having more types of inputs from sources like SCADA and NWP systems could provide a lot of valuable information for better predictions. It is clear that wind speed is a key feature for producing good forecasts, which is also supported by the input analysis in section 4.2. It has been argued that the error in WPF models stems largely from wrong weather forecasts [Giebel et al., 2011], so identifying good sources of weather forecasts would be desirable for more effective model learning. Combining several different weather forecasts could also be investigated, as Nielsen et al. [2007] showed that power forecasts based on a number of different meteorological forecasts were better than those from a single source.

Regarding the HTM CLA, it is worth considering implementing a custom encoder for it, an encoder specifically targeted at data concerning wind farms. SCADA data could also be streamed directly into a NuPIC model, which would allow for some interesting online predictions and anomaly detection.

Training times are always an issue, especially for very large models. Both NuPIC and Expektra's model would benefit from being implemented in a more parallel fashion, as these algorithms are very well suited for parallel computing. The speed-up would be of great value, so looking into and implementing a GPU version of the algorithms could be worthwhile.

5.3 Conclusions

The field of energy forecasting is still young, and the need for good forecasting techniques is constantly growing. This initial study shows that part of Expektra's forecasting method can be used to forecast wind power generation. We observe that this ANN is able to achieve performance comparable to other methods published in Hong et al. [2014], though additional work is still needed to obtain state-of-the-art results. Numenta's model is able to beat the reference model but needs additional work before we can draw definite conclusions about its performance; constructing a custom encoder could be a first step in that direction.


Bibliography

Subutai Ahmad and Jeff Hawkins Properties of sparse distributed representa-tions and their application to hierarchical temporal memory arXiv preprintarXiv150307469 2015

TC Akinci Short term wind speed forecasting with ann in batman turkey Elektron-ika ir Elektrotechnika 107(1)41ndash45 2015

E Michael Azoff Neural network time series forecasting of financial markets JohnWiley amp Sons Inc 1994

James Bergstra and Yoshua Bengio Random search for hyper-parameter optimiza-tion The Journal of Machine Learning Research 13(1)281ndash305 2012

Daniel P Buxhoeveden and Manuel F Casanova The minicolumn hypothesis inneuroscience Brain 125(5)935ndash951 2002

Erasmo Cadenas and Wilfrido Rivera Short term wind speed forecasting in laventa oaxaca meacutexico using artificial neural networks Renewable Energy 34(1)274ndash278 2009

Dmitri B Chklovskii BW Mel and K Svoboda Cortical rewiring and informationstorage Nature 431(7010)782ndash788 2004

A Costa A Crespo J Navarro G Lizcano H Madsen and E Feitosa A reviewon the young history of the wind power short-term prediction Renewable andSustainable Energy Reviews 12(6)1725ndash1744 August 2008 ISSN 13640321 doi101016jrser200701015 URL httpdxdoiorg101016jrser200701015

Russ C Eberhart and James Kennedy A new optimizer using particle swarm theoryIn Proceedings of the sixth international symposium on micro machine and humanscience volume 1 pages 39ndash43 New York NY 1995

49

BIBLIOGRAPHY

Shu Fan, James R. Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. Energy Conversion, IEEE Transactions on, 24(2):474–482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento: a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276. Citeseer, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition EWEC 2008, 6 pages. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving hierarchical temporal memory-based trading models. Springer, 2013.

M.A. Gaertner, C. Gallardo, C. Tejeda, N. Martínez, S. Calabria, N. Martínez, and B. Fernández. The Casandra project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: The next generation prediction system. In Proc. of the 2001 European Wind Energy Conference, EWEC'01, Copenhagen, Denmark, pages 777–780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: A literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G. Lobo. Sipreólico: wind power prediction tool for the Spanish peninsular power system. Proceedings of the CIGRÉ 40th General Session and Exhibition, París, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On Intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357–363, 2014.


Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41–46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585–592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694–709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference, EWEC'07, 2007.

G. Kariniotakis. Position paper on Joule project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J. Halliday, R. Brownsword, Ignacio Marti, Ana Maria Palomares, I. Cruz, H. Madsen, T.S. Nielsen, Henrik Aa. Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power: overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, 10 pages, 2006.

G.N. Kariniotakis, G.S. Stavrakakis, and E.F. Nogaret. Wind power forecasting using advanced neural networks models. Energy Conversion, IEEE Transactions on, 11(4):762–767, 1996.

Stanley J. Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326–334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275–293, 2009.

Lars Landberg and Simon J. Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171–195, 1994.

S. Lang, C. Mohrlen, J. Jorgensen, B.O. Gallachóir, and E. McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International Solar Energy Society UK Section Conference, volume 85, page 89, 2006.


S.M. Lawan, W.A.W.Z. Abidin, W.Y. Chai, A. Baharun, and T. Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760–1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. Smart Grid, IEEE Transactions on, 5(1):501–510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets, 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164–168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313–2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154–3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa. Nielsen, and Torben S. Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475–489, 2005.

Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431–441, 1963.

Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons. 1969.

Marvin Lee Minsky and Oliver G. Selfridge. Learning in random nets. MIT Lincoln Laboratory, 1960.

Jorge J. Moré. The Levenberg-Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105–116. Springer, 1978.

Henrik Aa. Nielsen, Torben S. Nielsen, Henrik Madsen, Maria J. Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471–482, 2007.


Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29–34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power, 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159–167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms, v0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33–57, 2007.

A. Rodrigues, J.A. Peças Lopes, P. Miranda, L. Palma, C. Monteiro, R. Bessa, J. Sousa, C. Rodrigues, and J. Matos. EPREV: a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference, EWEC, volume 7, 2007.

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5:3, 1988.

Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85–117, 2015.

S. Sinkevicius, R. Simutis, and V. Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91–96, 2011.

Ke-Sheng Wang, Vishal S. Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61–69, 2014.

WWEA. 2014 half-year report, pp. 1–8. Technical report, WWEA, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365–376, 2013.

Hao Yu and Bogdan M. Wilamowski. Levenberg-Marquardt training. Industrial Electronics Handbook, 5:12–1, 2011.


Appendix A

Hyper-parameters

Parameters for the spatial pooler

Alias | Default | Description
columnCount | - | The number of cell columns in a cortical region.
globalInhibition | false | If true, during the inhibition phase the winning columns are selected as the most active columns from the region as a whole.
numActivePerInhArea | 10 | The maximum number of active columns per inhibition area.
synPermActiveInc | 0.1 | The amount by which an active synapse is incremented in each round, specified as a percent of a fully grown synapse.
synPermConnected | 0.10 | Controls the threshold at which synapses become connected.
synPermInactiveDec | 0.01 | The amount by which an inactive synapse is decremented in each round.
potentialRadius | 16 | Determines the extent of the input that each column can potentially be connected to.

Table A.1: Configuration parameters for the spatial pooler.


Parameters for the scalar encoder

Alias | Symbol | Description
w | w | Number of bits to set in the output.
minval | vmin | The lower bound of the input value.
maxval | vmax | The upper bound of the input value.
n | n | Number of bits in the representation (n must be > w).
radius | r | Inputs separated by more than or equal to this distance will have non-overlapping representations.
resolution | ψ | Inputs separated by more than or equal to this distance will have different representations.

Table A.2: Configuration parameters for the scalar encoder.
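As a rough illustration of how these parameters interact, the following sketch encodes a scalar as a contiguous run of w active bits inside an n-bit vector. This is a deliberate simplification of NuPIC's ScalarEncoder (it ignores periodic inputs and the radius/resolution variants), and the function name is my own:

```python
def encode_scalar(value, minval, maxval, n, w):
    """Place a run of w ones inside an n-bit vector according to
    where value falls in [minval, maxval]. Nearby values share
    active bits; distant values get non-overlapping representations."""
    assert n > w, "n must be greater than w"
    buckets = n - w + 1                       # possible run positions
    clipped = min(max(value, minval), maxval)
    fraction = (clipped - minval) / (maxval - minval)
    start = int(round(fraction * (buckets - 1)))
    return [1 if start <= i < start + w else 0 for i in range(n)]

# Example: a 14-bit output with 3 active bits; the run sits at the
# left edge for minval and at the right edge for maxval.
print(encode_scalar(0.0, 0.0, 10.0, 14, 3))
print(encode_scalar(10.0, 0.0, 10.0, 14, 3))
```

Two inputs closer than the bucket width share some of their active bits, which is what gives the encoder its overlap-based notion of similarity.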


Parameters for the temporal memory

Alias | Default | Description
activationThreshold | 12 | Activation threshold for segments.
cellsPerColumn | 32 | Number of cells per column.
columnCount | 2048 | The number of cell columns in a cortical region.
globalDecay | 0.10 | Decrements all synapses a little bit all the time.
initialPerm | 0.11 | Initial permanence value for a synapse.
inputWidth | - | Size of the input.
maxAge | 100000 | Controls global decay. Global decay will only decay segments that have not been activated for maxAge iterations, and will only run the global decay loop every maxAge iterations.
maxSegmentsPerCell | - | The maximum number of segments a cell can have.
maxSynapsesPerSegment | - | The maximum number of synapses a segment can have.
minThreshold | 8 | The minimum required activity for a segment to learn.
newSynapseCount | 15 | The maximum number of synapses added to a segment during learning.
permanenceDec | 0.10 | How much permanence is removed from synapses when learning occurs.
permanenceInc | 0.10 | How much permanence is added to synapses when learning occurs.
temporalImp | cpp/py | Controls which temporal memory implementation to use.

Table A.3: Configuration parameters for the temporal memory.


Appendix B

Wind characteristics

Figure B.1: Wind characteristics for WF 1 and WF 2. The GEFCom dataset showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Figure B.2: Wind characteristics for WF 3-7. The GEFCom dataset showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Appendix C

Error Distribution

Figure C.1: Error distribution for different lead times, WF 1, using NuPIC. [Figure: six histograms of the normalized error on [-1.0, 1.0] for lead times 48, 40, 30, 20, 10 and 1.]

Figure C.2: Error distribution for different lead times, WF 2, using NuPIC. [Figure: six histograms of the normalized error on [-1.0, 1.0] for lead times 48, 40, 30, 20, 10 and 1.]

Figure C.3: Error distribution for different lead times, WF 3, using NuPIC. [Figure: six histograms of the normalized error on [-1.0, 1.0] for lead times 48, 40, 30, 20, 10 and 1.]

Figure C.4: Error distribution for different lead times, WF 4, using NuPIC. [Figure: six histograms of the normalized error on [-1.0, 1.0] for lead times 48, 40, 30, 20, 10 and 1.]

Figure C.5: Error distribution for different lead times, WF 5, using NuPIC. [Figure: six histograms of the normalized error on [-1.0, 1.0] for lead times 48, 40, 30, 20, 10 and 1.]

Figure C.6: Error distribution for different lead times, WF 6, using NuPIC. [Figure: six histograms of the normalized error on [-1.0, 1.0] for lead times 48, 40, 30, 20, 10 and 1.]

Figure C.7: Error distribution for different lead times, WF 7, using NuPIC. [Figure: six histograms of the normalized error on [-1.0, 1.0] for lead times 48, 40, 30, 20, 10 and 1.]

Figure C.8: Error distribution for different lead times, WF 1, using Expektra. [Figure: six histograms of the normalized error on [-1.0, 1.0] for lead times 48, 40, 30, 20, 10 and 1.]

Figure C.9: Error distribution for different lead times, WF 2, using Expektra. [Figure: six histograms of the normalized error on [-1.0, 1.0] for lead times 48, 40, 30, 20, 10 and 1.]

Figure C.10: Error distribution for different lead times, WF 3, using Expektra. [Figure: six histograms of the normalized error on [-1.0, 1.0] for lead times 48, 40, 30, 20, 10 and 1.]

Figure C.11: Error distribution for different lead times, WF 4, using Expektra. [Figure: six histograms of the normalized error on [-1.0, 1.0] for lead times 48, 40, 30, 20, 10 and 1.]

Figure C.12: Error distribution for different lead times, WF 5, using Expektra. [Figure: six histograms of the normalized error on [-1.0, 1.0] for lead times 48, 40, 30, 20, 10 and 1.]

Figure C.13: Error distribution for different lead times, WF 6, using Expektra. [Figure: six histograms of the normalized error on [-1.0, 1.0] for lead times 48, 40, 30, 20, 10 and 1.]

Figure C.14: Error distribution for different lead times, WF 7, using Expektra. [Figure: six histograms of the normalized error on [-1.0, 1.0] for lead times 48, 40, 30, 20, 10 and 1.]


List of Figures

2.1 A figure that presents the general outline when forecasting using the statistical approach
2.2 A figure that presents the general steps when forecasting using a physical model
3.1 The perceptron
3.2 Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of H hidden layers as well as M input connections. Each edge seen in this graph has a w_ij associated with it
3.3 Information flow of a single region predictive model created with the OPF
3.4 The CLAClassifier
3.5 Training an OPF model
4.1 Left diagram: cumulative probability of wind speed. Right diagram: scatter diagram of the power curve, production vs. wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.2 Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.3 Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.4 Different error measurements for WF 1
4.5 Different error measurements for WF 2
4.6 Different error measurements for WF 3
4.7 Different error measurements for WF 4
4.8 Different error measurements for WF 5
4.9 Different error measurements for WF 6
4.10 Different error measurements for WF 7
4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point, i.e. the network when no channel has been exposed to noise. ws = wind speed, hours, week = timestamps, u, v is a directional vector of the wind
4.12 Plot showing the unoptimized version vs. the optimized when training a network with 10 input neurons, 10/15/20/25 hidden neurons and 1 output neuron. Training was done for 100 epochs using LM
4.13 Summarized average improvement over all wind farms with 95% confidence intervals
B.1 Wind characteristics for WF 1 and WF 2. The GEFCom dataset showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power
B.2 Wind characteristics for WF 3-7. The GEFCom dataset showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power

C.1 Error distribution for different lead times, WF 1
C.2 Error distribution for different lead times, WF 2
C.3 Error distribution for different lead times, WF 3
C.4 Error distribution for different lead times, WF 4
C.5 Error distribution for different lead times, WF 5
C.6 Error distribution for different lead times, WF 6
C.7 Error distribution for different lead times, WF 7
C.8 Error distribution for different lead times, WF 1
C.9 Error distribution for different lead times, WF 2
C.10 Error distribution for different lead times, WF 3
C.11 Error distribution for different lead times, WF 4
C.12 Error distribution for different lead times, WF 5
C.13 Error distribution for different lead times, WF 6


C.14 Error distribution for different lead times, WF 7

List of Tables

3.1 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods with missing data, power observations are available for updating the models
3.2 Preprocessed features from the dataset. The Forecast category represents the forecast given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available contains the features we will use in training and testing
3.3 Example where n = 14, r = 5, ψ = 1 of various encoded scalar values using a ScalarEncoder
4.1 NRMSE scores of the entries published in [Hong et al., 2014]. The NuPIC model and the Expektra model are added so we can easily compare the results
A.1 Table containing configuration parameters for the spatial pooler
A.2 Table containing configuration parameters for the scalar encoder
A.3 Table containing configuration parameters for the temporal memory




2.1 NEURAL NETWORKS AND TIME SERIES PREDICTION

The HTM CLA also differs from the MLP in that it does not directly use scalar weights to represent synaptic connectivity. Synapses inside an HTM have binary weights with a scalar permanence: each synapse will either be connected or disconnected, whereas the MLP uses scalar weights. Connectedness in the HTM CLA is based on a permanence, which is a value between 0.0 and 1.0. If the permanence is over a certain threshold we have a connected synapse, i.e. we have weights of either 1 or 0. This binary mechanism is there to simulate synapses that are able to form and unform during learning. So essentially we have a weight-change network vs. a wiring-change network [Chklovskii et al., 2004].
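The two connectivity schemes can be contrasted in a few lines. This is an illustrative sketch only, not the thesis implementation; the values are arbitrary and the threshold name merely mirrors the synPermConnected parameter from Appendix A:

```python
import numpy as np

rng = np.random.default_rng(0)

# MLP-style connectivity: scalar weights, adjusted directly by learning.
mlp_weights = rng.normal(size=8)

# HTM-style connectivity: each synapse stores a scalar permanence in
# [0.0, 1.0], but the effective weight is binary: connected (1) if the
# permanence exceeds a threshold, otherwise disconnected (0).
permanences = rng.uniform(size=8)
SYN_PERM_CONNECTED = 0.10   # threshold, cf. synPermConnected

connected = (permanences >= SYN_PERM_CONNECTED).astype(int)

# Learning nudges permanences rather than weights; a synapse crossing
# the threshold flips between connected and disconnected, which is the
# "wiring change" rather than "weight change" described above.
permanences = np.clip(permanences + 0.01, 0.0, 1.0)
```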

NuPIC today is naturally geared towards time-series prediction, which makes it an interesting candidate for wind forecasting. NuPIC has been shown to outperform the MLP on financial time series [Gabrielsson et al., 2013] and has been used successfully to monitor traffic [Sinkevicius et al., 2011]. It has also been used on the NASA Aviation Dataset [Lee and Rajabi, 2014], but with less promising results. MLPs have been used successfully for many kinds of time series problems [Azoff, 1994; Niska et al., 2004; Jain and Kumar, 2007].


Chapter 3

Method and Materials

The purpose of this chapter is to provide an explanation of the method used in this thesis. It will present terms and concepts needed and used in the field of Machine Learning, with a focus on how to create forecasts based on time series, and motivate why these methods are used as well as how they are used. This chapter also contains a description of the implemented artificial neural network, how it is structured and optimized. It provides details about the experimental setup and how these experiments relate to GEFCom, and finally a description of how the dataset is structured and processed.

3.1 Preliminaries

3.1.1 Remarks

The methodology used to evaluate the prediction models presented in this thesis is based on the protocol presented in Madsen et al. [2005]¹, a complete protocol that can be used to evaluate the performance of short-term WPF and Wind-to-Power (W2P) models. This protocol was chosen because it has been successfully used before as a guideline to evaluate a wide variety of forecast models, such as AWPPS, Prediktor, Previento, Sipreólico and WPPT. The protocol has been used for both on-shore and off-shore wind farms and was developed in the frame of the ANEMOS research project [Kariniotakis et al., 2006], which brought together many relevant groups involved in the field. The aim of ANEMOS was to develop accurate and robust models that substantially outperform current state-of-the-art methods, and one of its goals was to establish a common set of performance measures that can be used to compare forecasts across systems and locations.

¹ It should be pointed out that no widely agreed standardization exists.

3.1.2 Definitions

Time series

A time series is a sequence X of observations x_t, each with a particular time stamp t, i.e. X = [x_1, x_2, \ldots, x_t], in short X_{1:t}; it is a time-dependent collection of variables.

Forecast horizon

A multi-step-ahead prediction is the task of predicting k forecasts X_{t+1:t+k} given a collection of historical observations X_{t-p+1:t}. In the case of GEFCom we want forecasts for 1-48 steps ahead.

Point or spot forecast

In this thesis we model the forecast \hat{p}_{t+k|t} as a so-called point forecast (or spot forecast), i.e. a single value for each forecast (as opposed to a probability distribution).

Prediction error

The prediction error e for lead time t+k is defined in equation 3.1 as the difference between the actual value and the forecast, where t denotes the time index and k is the look-ahead time; p is the actual (measured, true) wind power and \hat{p} is the predicted wind power:

e_{t+k|t} = p_{t+k} - \hat{p}_{t+k|t}    (3.1)

The normalized prediction error \varepsilon is given in equation 3.2:

\varepsilon_{t+k|t} = \frac{1}{p_{inst}} e_{t+k|t} = \frac{1}{p_{inst}} \left( p_{t+k} - \hat{p}_{t+k|t} \right)    (3.2)

where p_{inst} is the installed capacity of the wind farm (in kW or MW), which is unknown in the competition.


3.1.3 Reference models

Persistence (also called a naïve or plain predictor), seen in equation 3.3, is the model most commonly used for benchmarking WPF models. This simple model states that the future wind generation value will be the same as the last measured value:

\hat{p}^{persistence}_{t+k|t} = p_t    (3.3)

An alternative would be to use an even simpler model, a climatology prediction (equation 3.4), i.e. predicting the mean, a value that would be approximated from the training set (see section 3.1.5):

\hat{p}^{mean}_{t+k|t} = \bar{p} = \frac{1}{N} \sum_{t=1}^{N} p_t    (3.4)

There are other reference models, like the one suggested by Nielsen et al. [1998], which have advantages over persistence, but that model was never widely adopted and is not used by GEFCom.
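The two reference models above can be written down directly. This is a minimal sketch of equations 3.3 and 3.4; the function names are my own:

```python
def persistence_forecast(history, k):
    """Persistence model (eq. 3.3): every look-ahead step up to k
    is predicted as the last measured value."""
    return [history[-1]] * k

def climatology_forecast(history, k):
    """Climatology model (eq. 3.4): every look-ahead step is
    predicted as the mean of the training observations."""
    mean = sum(history) / len(history)
    return [mean] * k

# With hourly normalized power observations, a 3-step-ahead forecast:
power = [0.2, 0.5, 0.4, 0.6]
print(persistence_forecast(power, 3))   # [0.6, 0.6, 0.6]
```

Despite their simplicity, both are flat forecasts: persistence is hard to beat for the first few lead times, while climatology becomes the stronger baseline at long horizons.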

3.1.4 Error metrics

In order to understand why specific models perform well, it is usually a good idea to evaluate them against a wide variety of different criteria, as is emphasised by Kariniotakis [1997]. The following sections describe the error measures used; throughout, N denotes the size of the test set.

Forecast Bias

The Normalized Forecast Bias (NBIAS), defined in equation 3.5, describes the systematic error. It is estimated by calculating the average error for each step ahead and gives an indication of the direction of the error:

NBIAS_k = \frac{1}{N} \sum_{t=1}^{N} \varepsilon_{t+k|t}    (3.5)

This bias is sometimes referred to as the Mean Forecast Error (MFE). A perfect MFE does not mean the forecast is perfect and contains no error, as positive and negative errors cancel each other out, but it gives an indication that the forecasts are centered on the target.


CHAPTER 3 METHOD AND MATERIALS

Mean Absolute Error

The Normalized Mean Absolute Error (NMAE) is an error quantity that looks at the average of the absolute error of the prediction and is defined in equation 3.6:

NMAE_k = (1/N) Σ_{t=1}^{N} |ε_{t+k|t}|   (3.6)

Another common name for the Mean Absolute Error (MAE) is Mean Absolute Deviation (MAD). This value shows the magnitude of the overall forecasting error and should thus be as small as possible. The error is scale dependent and will be affected by data transformations and the scale of the measurements.

Mean Squared Error

The Normalized Mean Squared Error (NMSE) is an error quantity that looks at the average of the squared errors ε²_{t+k|t}, and it is built using the Normalized Sum of Squared Errors (NSSE) defined in equation 3.7:

NSSE_k = Σ_{t=1}^{N} ε²_{t+k|t}   (3.7)

NMSE is defined in equation 3.8:

NMSE_k = (1/N) NSSE_k = (1/N) Σ_{t=1}^{N} ε²_{t+k|t}   (3.8)

In this error measure, positive and negative errors do not cancel each other out, and large individual errors are penalized more harshly.

Root Mean Squared Error

The Normalized Root Mean Squared Error (NRMSE) is the square root of the NMSE; this error is defined in equation 3.9:

NRMSE_k = NMSE_k^{1/2} = ( (1/N) Σ_{t=1}^{N} ε²_{t+k|t} )^{1/2}   (3.9)

NRMSE is the main metric used in GEFCom, and it shares the same properties as NMSE.
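For one look-ahead step k, the four metrics of this section can be computed directly from the list of normalized errors (a sketch; the function name is mine):

```python
import math


def error_metrics(errors):
    """NBIAS, NMAE, NMSE and NRMSE (equations 3.5-3.9) for one look-ahead
    step k, given the normalized errors eps_{t+k|t} over N test points."""
    n = len(errors)
    nbias = sum(errors) / n                      # equation 3.5
    nmae = sum(abs(e) for e in errors) / n       # equation 3.6
    nsse = sum(e * e for e in errors)            # equation 3.7
    nmse = nsse / n                              # equation 3.8
    nrmse = math.sqrt(nmse)                      # equation 3.9
    return {"NBIAS": nbias, "NMAE": nmae, "NMSE": nmse, "NRMSE": nrmse}
```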


3.1.5 Model selection

In regression and classification, one of the main issues we face is: "How do we create a good model?" One way to define this "goodness" is to look at the model's generalization ability, i.e. its performance on unseen data.

To make sure the model will generalize to new, unseen data, it matters how we select and build the model in the first place; what we have control over is the data at hand and how we use that data to build the model.

Training, testing and validating

In the case of supervised learning we have access to a target series and its associated features, i.e. the production series (our target) with wind speed and wind direction as features.

Common practice within machine learning, which is adhered to in this thesis, is to split the whole dataset into three smaller subsets: two of these subsets are used to find a good model (the training set and the validation set), and the remaining one (the test set) is used for evaluation, i.e. to estimate how accurate the created model would be on "unseen data". With highly flexible models like an artificial neural network we need to be careful not to overfit the data, which is one of the reasons for the validation set: we do not want a model that generalizes poorly because it has been fitted to every minor variation, i.e. has captured a lot of noise.

3.1.6 Evaluation

The improvement with respect to a considered reference model ref is defined in equation 3.10:

I_{EC,k}^{ref} = 100 · (EC_k^{ref} − EC_k) / EC_k^{ref}   (%)   (3.10)

where the Evaluation Criterion (EC) is any error measurement, such as NMSE or NMAE.
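Equation 3.10 translates directly to code (a one-line sketch):

```python
def improvement(ec_ref, ec_model):
    # Equation 3.10: percentage improvement over the reference model,
    # for any evaluation criterion such as NMSE or NMAE.
    return 100.0 * (ec_ref - ec_model) / ec_ref
```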


Testing period

Date        Time   Forecast
2011-01-01  01:00  1-48 hours
2011-01-04  13:00  1-48 hours
2011-01-08  01:00  1-48 hours
2011-01-11  13:00  1-48 hours
...
2012-06-23  01:00  1-48 hours
2012-06-26  13:00  1-48 hours

Table 3.1: Testing periods. The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00; the second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, power observations are available for updating the models.

3.2 Experiments

Training and testing are structured based on the setup of the GEFCom. The dataset spans from midnight of the 1st of July 2009 to noon on the 26th of June 2012. The period from the 1st of July 2009 to the 1st of January 2011 at 01:00 is used for training and validating, while the rest is used for testing. In the testing range a number of 48-hour periods are defined (see table 3.1); in between these testing periods there is additional training data, which enables the models to be updated between forecasts.

The testing periods repeat every 7 days until the end of the dataset, and only meteorological forecasts that were relevant for the periods with missing power data are given; this was done in order to be consistent.

Meteorological forecasts in the GEFCom dataset consist of zonal u and meridional v components collected for surface winds at 10 m above ground level. They were extracted from the archive of the European Centre for Medium-range Weather Forecasts (ECMWF), a service which issues high-resolution deterministic forecasts twice a day, at 00 Coordinated Universal Time (UTC) and 12 UTC; each of these forecast periods consists of data for 1-48 hours ahead. In order to match the hourly resolution of the power measurements, forecasts were interpolated using cubic splines to an hourly resolution. A summary of the features found in the dataset is given in table 3.2.
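The spline interpolation step can be sketched with SciPy's `CubicSpline`, assuming SciPy is available (this illustrates the preprocessing idea, not GEFCom's actual code):

```python
import numpy as np
from scipy.interpolate import CubicSpline


def to_hourly(issue_hours, values):
    """Interpolate a coarse NWP series to hourly resolution with a cubic
    spline, mirroring the preprocessing of the u/v forecast components."""
    spline = CubicSpline(issue_hours, values)
    hourly = np.arange(issue_hours[0], issue_hours[-1] + 1)
    return hourly, spline(hourly)
```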


No  Category  Parameter             Alias  Type
1   Date      Date                  date   String
2   Date      Year                  year   Integer
3   Date      Month                 month  Integer
4   Date      Day                   day    Integer
5   Date      Hour                  hours  Integer
6   Date      Week                  week   Integer
7   Forecast  Wind Speed            ws     Real
8   Forecast  Wind Direction (deg)  wd     Real
9   Forecast  Wind U                u      Real
10  Forecast  Wind V                v      Real
11  Forecast  Issued                hp     Integer
12  SCADA     Production            wp     Real

Table 3.2: Preprocessed features from the dataset. The Forecast category represents the forecasts given by the NWP; there will be multiple forecasts for a given date, and the latest issued forecast available is the feature we will use in training and testing.


The meteorological-forecast database in the GEFCom dataset contains sections of missing data, each corresponding to the dates for which the power information is missing. In a pre-processing step, these sections were filled out with the previous best forecast available. For example, if we do not have data for the forecast issued at 2011-01-01 12:00, we use data from 2011-01-01 00:00, as a 48-hours-ahead forecast is available for this date. If the previous section also contained missing data, we would go back an additional section and use those forecasts. If no forecast is available 48 hours back, we can use the best known forecast and extend it in the same fashion as the persistence model.

Any predictions made by these models should fall within a certain range, so any forecast outside this range is clamped. This is the main post-processing step, i.e. there is an upper limit on what the farm can produce.

3.3 Holdback Input Randomization

The Holdback Input Randomization (HIPR) method, described in Kemp et al. [2007], can be used to investigate the importance of the input parameters. It works by sequentially feeding each data point in the test set to the neural network while replacing the values of one input parameter at a time. The replacement values are drawn from a uniform distribution over the range in which the neural network was originally trained, i.e. (−1, 1). An NRMSE score is calculated for each replacement; as a result, we obtain information about the relevance of each input.
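A sketch of HIPR, assuming the caller supplies the trained model's prediction function and an NRMSE scorer (both are placeholder names, not the thesis code):

```python
import numpy as np


def hipr(model_predict, X, y_true, nrmse, low=-1.0, high=1.0, seed=0):
    """Holdback Input Randomization sketch: replace one input column at a
    time with uniform noise in the training range and record the NRMSE."""
    rng = np.random.default_rng(seed)
    scores = {}
    for j in range(X.shape[1]):
        Xr = X.copy()
        Xr[:, j] = rng.uniform(low, high, size=X.shape[0])
        scores[j] = nrmse(y_true, model_predict(Xr))
    return scores
```

A column whose randomization barely changes the score contributes little to the model; a large jump in NRMSE marks an important input.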

3.4 Optimization methods

The two main optimization algorithms used in this thesis are the Particle Swarm Optimization (PSO) algorithm presented by Eberhart and Kennedy [1995] and the Levenberg-Marquardt (LM)² algorithm, independently developed by Kenneth Levenberg and Donald Marquardt [Marquardt 1963, Levenberg 1944]. The PSO algorithm is a population-based stochastic algorithm similar to a Genetic Algorithm (GA), but based on social-psychological principles instead of evolution. It

² The LM algorithm was used in the beginning of the project, but the reported results ended up using the PSO algorithm. The LM algorithm is included in this section because speed optimization was measured using it.


can be summarized by the following steps. Each network was trained multiple times in order to avoid local minima.

• Step 1: Initialize particles with random velocities and accelerations.

• Step 2: Determine which particle is closest to the goal.

• Step 3: Adjust accelerations toward that particle.

• Step 4: Update particle positions based on their velocities, and update velocities based on accelerations.

• Step 5: Go to step 2.

The LM algorithm is a combination of the Error Back-Propagation (EBP) method [Rumelhart et al. 1988] and the Gauss-Newton method; it has the speed advantage of Gauss-Newton and the stability of EBP [Yu and Wilamowski 2011]. A detailed treatment of LM can be found in Moré [1978], and of PSO in Poli et al. [2007].
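The steps above can be sketched as a minimal PSO minimizer (the inertia and acceleration constants here are common defaults from the PSO literature, not necessarily the values used in this thesis):

```python
import random


def pso(f, dim, n_particles=20, iters=100, seed=1):
    """Minimal particle swarm minimizing f over [-5, 5]^dim."""
    rnd = random.Random(seed)
    pos = [[rnd.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                    # personal bests
    pbest_f = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]       # global best (step 2)
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rnd.random(), rnd.random()
                # accelerate toward personal and global best (steps 3-4)
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * r1 * (pbest[i][d] - pos[i][d])
                             + 1.5 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            fi = f(pos[i])
            if fi < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], fi
                if fi < gbest_f:
                    gbest, gbest_f = pos[i][:], fi
    return gbest
```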

3.5 Neural Networks

3.5.1 Multilayer Perceptron

The perceptron is built around a nonlinear model of a neuron, the McCulloch-Pitts model [McCulloch and Pitts 1943]. It basically consists of a 2-step process in which the cell body computes the weighted sum of all inputs, including a bias; the perceptron is described by equation 3.11. The sum s is then passed through an activation function (see section 3.5.1) which mimics the activation, or firing, of the neuron:

s = Σ_{i=1}^{M} w_i x_i + x_0   (3.11)

w_i is the weight of the "synapse" of input channel i and is the parameter we want to adjust, x_i is the input value and x_0 is the bias; M denotes the number of inputs.


Figure 3.1: The perceptron. Input signals are weighted, summed together with the bias, and passed through the activation function f(s) to produce the output signal.

The structure of the MLP consists of many perceptrons, and it is shown in figure 3.2. The input signal flows from the input layer at the bottom to the output layer at the top; the bias signal, seen on the left side of the diagram, is set to a fixed number.

MLPs are fully connected networks, meaning that the neurons in any layer are connected to all neurons in the previous layer. Each connection in the network has a weight w_ij associated with it. The initialization of these weights is done before any training, by randomly assigning very small values to each weight in the graph. The architecture used in this study has only one output variable, i.e. the approximated function produces forecasts of the power generation given a certain input.


Figure 3.2: Architectural graph of the neural network, which produces a single output value. It consists of a collection of hidden neurons in each of H hidden layers as well as M input connections (hours, u, v, week, ws, ws±1, ws±2, plus a bias signal); the hidden units use tanh activations and the output unit a linear activation. Each edge seen in this graph has a weight w_ij associated with it.

The performance of neural networks is generally improved if the data are normalised, since using the original data directly can cause convergence problems. Normalization is done using the mapminmax function seen in equation 3.12:

y = (y_max − y_min) · (x − x_min) / (x_max − x_min) + y_min   (3.12)

y_max is the maximum of the target range, which in this case is 1, and y_min is −1; x is the value to be scaled, while x_max and x_min are the maximum and minimum of the values to be scaled.
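Equation 3.12 in code (a direct transcription):

```python
def mapminmax(x, x_min, x_max, y_min=-1.0, y_max=1.0):
    # Equation 3.12: linearly rescale x from [x_min, x_max] to [y_min, y_max].
    return (y_max - y_min) * (x - x_min) / (x_max - x_min) + y_min
```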

The forecasting model consists of 7 different networks, one for each wind farm; these networks are trained on the first section of the dataset, before the test period. A random 60/20/20 split is created, where 60% of the available data is used for training, 20% is used to validate the network, and 20% is set aside as a hold-out set for the hyperparameters. The input features³ fed into the models are

³ See table 3.2.


ws, u, v, hours, ws+1, ws+2, ws−1, ws−2, where ws+x and ws−x denote a time shift of x hours; we can use ws+x as input into the model because ws is a forecast in itself.
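The split and the time-shifted wind-speed features can be sketched as follows (a simplified illustration; the exact shift range and feature order vary between experiments in this thesis, and the function names are mine):

```python
import random


def make_features(ws, u, v, hours, week, max_shift=2):
    """Build rows [ws, u, v, hours, week, ws+1..ws+max_shift, ws-1..ws-max_shift].
    Shifting ws forward is possible because ws is itself a forecast series."""
    rows = []
    for t in range(max_shift, len(ws) - max_shift):
        row = [ws[t], u[t], v[t], hours[t], week[t]]
        row += [ws[t + s] for s in range(1, max_shift + 1)]
        row += [ws[t - s] for s in range(1, max_shift + 1)]
        rows.append(row)
    return rows


def split_indices(n, seed=0):
    # Random 60/20/20 split: training / validation / hyperparameter hold-out.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    a, b = int(0.6 * n), int(0.8 * n)
    return idx[:a], idx[a:b], idx[b:]
```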

Activation Functions

The computation done for each neuron in the multilayer perceptron requires knowledge of the derivative of the activation function; in other words, the activation function we pick needs to be differentiable. In this thesis we use the following two activation functions: the hyperbolic tangent function seen in equation 3.13,

f(s) = tanh(s) = (e^s − e^{−s}) / (e^s + e^{−s})   (3.13)

and the linear transfer function seen in equation 3.14:

f(s) = +1 if s ≥ 1,   f(s) = s if −1 < s < 1,   f(s) = −1 if s ≤ −1   (3.14)
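Both activation functions in code (a direct transcription of equations 3.13 and 3.14):

```python
import math


def tanh_activation(s):
    # Equation 3.13: hyperbolic tangent.
    return (math.exp(s) - math.exp(-s)) / (math.exp(s) + math.exp(-s))


def satlin(s):
    # Equation 3.14: linear transfer function, saturated at +/- 1.
    if s >= 1.0:
        return 1.0
    if s <= -1.0:
        return -1.0
    return s
```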

Hyperparameter optimization

To obtain the hyperparameters for each wind farm's model, a random hyperparameter search was performed for all models, and a hold-out validation set was used to pick the best hyperparameters. Random search has been shown to work better than grid search when not all parameters are equally important [Bergstra and Bengio 2012].

3.5.2 Numenta Platform for Intelligent Computing

This section describes the key principles introduced with NuPIC. It explains in general terms the theory behind the Online Prediction Framework (OPF)⁴, which uses the CLA and HTM algorithms; the OPF works as an API for creating predictive models. The OPF consists of five major types of components: Encoders, Spatial Pooler (SP), Temporal Memory (TM), Temporal Pooler (TP) and Classifiers. Together these components construct a single CLA region⁵, and figure 3.3 demonstrates the information flow through a single region.

⁴ The OPF is used with Numenta's commercial product Grok.
⁵ Currently, models created with the OPF do not use a TP, nor does this client allow creation of a hierarchy of regions, i.e. what we would call an HTM; it is only possible to create one region.


Another central concept in NuPIC is the Sparse Distributed Representation (SDR), which refers to the activation of a small percentage of the neurons at any given time. This neuronal activity is represented as an n-dimensional sparse binary vector x = [b_0, ..., b_n] with around 2% active cells. The outputs from the spatial pooler and the temporal memory are both SDRs, while the output from the encoder does not enforce this and is just a normal binary vector. A general overview of the properties of SDRs is given by Ahmad and Hawkins [2015].

Encoder → Spatial Pooler → Temporal Memory → Classifier

Figure 3.3: Information flow of a single-region predictive model created with the OPF.

The overlap between two different SDRs is defined by

o(x, y) = x · y   (3.15)

and a match between two SDRs is defined by

m(x, y) = o(x, y) ≥ θ   (3.16)

where θ ≤ ‖x‖₁ and θ ≤ ‖y‖₁. An interesting property of SDRs, used multiple times inside the temporal memory and especially for predictions, is that a set of fixed-size SDRs can be reliably stored as a single pattern by taking their union. The boolean OR operator is used to create a


Value  Scalar Encoding
1      11111000000000
2      01111100000000
10     00000000011111

Table 3.3: Examples of scalar values encoded with a ScalarEncoder, where n = 14, r = 5, ψ = 1.

new vector from a set of SDRs. The downside of storing patterns this way is that the more patterns we store, the bigger the probability of false positives.
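The overlap, match and union operations on SDRs represented as binary vectors can be sketched with NumPy:

```python
import numpy as np


def overlap(x, y):
    # Equation 3.15: number of shared active bits.
    return int(np.dot(x, y))


def match(x, y, theta):
    # Equation 3.16: overlap at or above the threshold theta.
    return overlap(x, y) >= theta


def union(sdrs):
    # Store a set of SDRs as a single pattern with boolean OR;
    # any stored SDR still matches the union, at the cost of false positives.
    out = np.zeros_like(sdrs[0])
    for s in sdrs:
        out = np.logical_or(out, s).astype(int)
    return out
```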

Encoders

NuPIC contains many different encoders⁶; the job of an encoder is to convert raw input into a more suitable representation (i.e. a binary vector). The raw inputs are fed into the model using a dictionary data structure.

Useful encoders include scalar and categorical encoders, while more exotic encoders include one for Global Positioning System (GPS) coordinates, which can be used to extract information about anomalous movements. The entries of the dictionary representation of the raw inputs are each encoded separately and concatenated using a multi-encoder.

One property of the spatial pooler is that overlapping input patterns are mapped to the same SDR. This means that we want the encoder to encode inputs so that similar inputs share bits. The ScalarEncoder fulfils this property by the process illustrated in table 3.3.
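A sketch reproducing the encoding process of table 3.3 (the real ScalarEncoder's parameter handling is more involved; this covers only the non-periodic case with the table's parameters, and the function name is mine):

```python
def scalar_encode(value, v_min, n=14, w=5, resolution=1.0):
    """Sketch of a ScalarEncoder: w contiguous bits whose start position
    moves with the value, so nearby values share bits (cf. table 3.3)."""
    start = int(round((value - v_min) / resolution))
    bits = [0] * n
    for i in range(start, start + w):
        bits[i] = 1
    return bits
```

With v_min = 1 this reproduces the three rows of table 3.3 exactly: values 1 and 2 overlap in four bits, while values 1 and 10 share none.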

We calculate the range of values to encode using equation 3.17, where v_min represents the minimum value of the input signal and v_max denotes its upper bound:

v_range = v_max − v_min   (3.17)

There are three mutually exclusive parameters that determine the overall size of the output from a scalar encoder: (n, r, ψ). n directly represents the total number

⁶ A full list of all encoders can be found in the API documentation for NuPIC.


of bits in the output, and it must be greater than or equal to w, which represents the number of bits that are set to encode a single value, i.e. the "width" of the output signal⁷. r (the radius) and ψ (the resolution) are specified with respect to the input, while w is specified with respect to the output. Two inputs separated by more than the radius have non-overlapping representations; two inputs separated by less than the radius will in general overlap in at least some of their bits; two inputs separated by at least the resolution are guaranteed to have different representations. ψ can be calculated using equation 3.18:

ψ = r / w   (3.18)

Depending on whether we want periodic behaviour or not, n is calculated a bit differently; see the scalar encoder for implementation details⁸.

Spatial Pooler

The spatial pooler receives a binary vector as its input from an encoder and outputs an SDR. The structure of the spatial pooler consists of an input space and a set of columns; the output SDR represents which of the columns in a region are active. Each column has synapses connected to the input space: around 50% of the input bits are randomly and potentially connected, the so-called "potential pool". Each synapse connects to and disconnects from the input space during learning.

The general information flow of the spatial pooler consists of the following steps: input stimuli from lower regions lead to activation of the input space, to which each mini-column is synaptically connected; this activation, the so-called "overlap score", is computed for each column. The score is calculated as the total sum of the neurons that try to influence that column, weighted with a "boosting factor" that increases the chance for certain columns to win more easily. A final top percentage (usually around 2% activity) of the columns with the highest influence (biggest overlap score) is chosen; columns also inhibit nearby columns, and the result of this process is a binary vector with few active bits, an SDR. This is illustrated in equation 3.19, where b can be either 0 or 1 and s is a value representing the score.

⁷ w must be odd to avoid centering problems.
⁸ https://github.com/numenta/nupic

25

CHAPTER 3 METHOD AND MATERIALS

[b_1, b_2, b_3, ..., b_n]   (input vector)
·
[b_11 b_12 b_13 ... b_1n]
[b_21 b_22 b_23 ... b_2n]
[ ...                ... ]
[b_d1 b_d2 b_d3 ... b_dn]   (connected synapses for each column)
=
[s_1, s_2, ..., s_n]   (overlap score)
→ inhibition →
[b_1, b_2, b_3, ..., b_n]   (output SDR)   (3.19)

Learning in this structure is done by adjusting the permanences of the proximal dendrites to better match the input: for each winning column, increase the permanence of the synapses that correctly matched the input and decrease the permanence of the rest. We also increase the boosting factor of losing columns to give them a bigger chance of winning next time.
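One spatial-pooler step can be sketched as boosted overlap scores followed by global inhibition (a simplification: real columns have potential pools and permanence thresholds, whereas here the binary synapse matrix is given directly):

```python
import numpy as np


def spatial_pool(input_vec, synapses, boost, active_frac=0.02):
    """One spatial-pooler step (cf. equation 3.19): boosted overlap scores
    per column, then global inhibition keeps only the top columns."""
    scores = boost * (synapses @ input_vec)      # overlap score per column
    k = max(1, int(active_frac * len(scores)))   # ~2% of columns stay active
    winners = np.argsort(scores)[-k:]
    sdr = np.zeros(len(scores), dtype=int)
    sdr[winners] = 1
    return sdr
```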

Temporal Memory

The temporal memory receives an SDR from the spatial pooler, representing the active columns; the output of the temporal memory is the activity of the whole CLA region. The general idea of the TM⁹ is that the cells in each mini-column provide temporal context for the pattern identified by the spatial pooler. The TM's job is to form transitions between different SDRs, which it achieves by allowing active cells to form connections to previously active cells, so that in a future setting each cell is able to predict its own activity.

Each cell in a region has several distal segments, ideally one for each pattern the cell has transitioned from; in practice each segment connects to a couple of patterns. A cell enters a predictive state if there is enough activity on a segment, and every segment consists of a collection of connections, synapses that have been formed to a subset of previously active cells (typically around 10-15 cells).

The temporal memory receives a sparse binary vector from the spatial pooler, and the general information flow for this part of the CLA goes through two phases. The first phase has the following steps: 1) for each active column, check whether any cell is in a predictive

⁹ There are many details in how the temporal memory is implemented; pseudo-code and more details can be found in [Numenta 2011], and the NuPIC git repository is the best source for finer details: https://github.com/numenta/nupic


state, then 2) activate that particular cell. If no cell was found in a predictive state, then 3) activate all cells in that particular column, a process called bursting.

Bursting is analogous to saying: we do not know which cell to activate because we have not seen this instance of the sequence of patterns before, and since we are unable to put the column into the correct temporal context, we activate all cells to reflect this uncertainty. Bursting does not occur when the temporal context has been predicted by the CLA. Equation 3.20 shows the first phase.

[b_1, b_2, b_3, ..., b_n]   (SP SDR)
·
[b_11 b_12 b_13 ... b_1n]
[b_21 b_22 b_23 ... b_2n]
[ ...                ... ]
[b_d1 b_d2 b_d3 ... b_dn]   (predictive state)
=
[b_11 b_12 b_13 ... b_1n]
[b_21 b_22 b_23 ... b_2n]
[ ...                ... ]
[b_d1 b_d2 b_d3 ... b_dn]   (active state)   (3.20)

The second phase of the algorithm determines which cells should be put into a predictive state for the next time step. This is achieved by checking every distal segment of every cell for activity above a certain threshold, i.e. checking for active segments; one active segment is enough to put a cell into a predictive state. The output from the temporal memory is a vector representing the state of all cells in the region. Equation 3.21 shows phase 2.

[b_1, b_2, b_3, ..., b_n]   (active state)
·
[b_11 b_12 b_13 ... b_1n]
[b_21 b_22 b_23 ... b_2n]
[ ...                ... ]
[b_d1 b_d2 b_d3 ... b_dn]_X   (segment X)
=
[b_1, b_2, b_3, ..., b_n]_X   (segment activation X)  > τ
→
[s_1, s_2, s_3, ..., s_n]   (predictive state)   (3.21)

If learning is turned on, the permanences of the synapses connected to active distal segments are updated (in the same way as in the spatial pooler). These changes are marked as temporary until we are sure that the cell is in fact able to correctly predict something with them; once this is known, the change either becomes permanent or is removed. Temporarily marked cells are updated whenever a cell goes from inactive to active through feed-forward input (reinforce the permanences, as we correctly predicted the feed-forward activation); if a cell instead went from active to inactive, the change is undone.
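Phase 1 of the temporal memory, including bursting, can be sketched as follows (here `predictive` is a cells-per-column × columns binary matrix; this is a strong simplification of the real implementation):

```python
import numpy as np


def tm_phase1(active_columns, predictive):
    """Phase 1 sketch: in each active column, activate the predicted cells;
    if no cell was predicted, burst the whole column."""
    active = np.zeros_like(predictive)
    for col in active_columns:
        cells = predictive[:, col]
        if cells.any():
            active[:, col] = cells   # activate only the predicted cells
        else:
            active[:, col] = 1       # bursting: activate every cell
    return active
```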


NuPIC Classifiers

There are many different classifiers included with NuPIC, such as k-Nearest Neighbour (kNN) and a Support Vector Machine (SVM); the OPF in particular uses a custom-built classifier called the "CLAClassifier". The purpose of this classifier is to decode predictions made by the CLA. Classification with the CLAClassifier is performed in different ways depending on how the input has been encoded.

Scalar values are decoded using a process where each cell in a CLA region is paired with two histograms: one histogram keeps track of the frequency of encountered patterns associated with the cell, while the other keeps track of a moving average for each bucket. This process is illustrated in figure 3.4.

Figure 3.4: The CLAClassifier. Each cell behind the columns of the SDR is paired with two histograms, one tracking the likelihood of encountered patterns and one tracking a moving average per bucket, spanning the minimum to maximum value.

Training in NuPIC

Training of the NuPIC model is done using online learning algorithms. The scheme seen in figure 3.5 is used to train and test these models; since there are 7 different wind farms, this scheme is applied to each wind farm.

This scheme is slightly adjusted from the traditional way the OPF handles multi-step predictions. The default approach is to use 48 different models, one for


every step ahead in the CLAClassifier, which would give 48 · 7 = 336 models in total (for all the wind farms). Not only does this change match Expektra's approach, it also reduces the memory requirements of the CLAClassifier.

Figure 3.5: Training an OPF model. In the training phase, a pre-training data chunk feeds the hyperparameter setup (PSO swarming or a manual setup), after which the training data is streamed to the OPF model with online learning activated. In the testing phase, the testing data is fed to the model with online learning deactivated, producing multistep predictions.

Input and hyperparameter selection

Hyperparameters for an OPF model were found partly using a custom built-in PSO algorithm and partly configured manually, mainly by ensuring that the PSO algorithm did not remove encoders. A single CLA region contains a lot of hyperparameters; these parameters are listed with a corresponding description in appendix A. The inputs to the model are date, ws, wp, u, v.


Chapter 4

Result

This chapter presents the primary results obtained in this study. Figures 4.4-4.10 contain different error measurements on the test set for each wind farm. We see that Expektra's ANN model performs well over all wind farms, while the NuPIC model performs worse than Expektra but in general still better than our reference model. NuPIC is off target, with a bias error on most wind farms, and the graphs also indicate that NuPIC is unable to pick up some trends in the cumulated ε² graph. The Expektra model, on the other hand, shows no clear problems in cumulated ε² and is on target on all wind farms; appendix C has been included to reflect this for different lead times.

[Plots omitted: left, cumulative probability vs. wind speed; right, normalized power output vs. wind speed and wind direction.]

Figure 4.1: Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs. wind speed. Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


Figure 4.2: Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).

Figure 4.3: Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


[Four-panel plot omitted: NBIAS, NRMSE and NMAE vs. look-ahead time, and cumulated ε² vs. time, for Expektra, NuPIC and Persistence.]

Figure 4.4: Different error measurements for WF 1.


[Four-panel plot omitted: NBIAS, NRMSE and NMAE vs. look-ahead time, and cumulated ε² vs. time, for Expektra, NuPIC and Persistence.]

Figure 4.5: Different error measurements for WF 2.


[Four-panel plot omitted: NBIAS, NRMSE and NMAE vs. look-ahead time, and cumulated ε² vs. time, for Expektra, NuPIC and Persistence.]

Figure 4.6: Different error measurements for WF 3.


[Four-panel plot omitted: NBIAS, NRMSE and NMAE vs. look-ahead time, and cumulated ε² vs. time, for Expektra, NuPIC and Persistence.]

Figure 4.7: Different error measurements for WF 4.


[Four-panel plot omitted: NBIAS, NRMSE and NMAE vs. look-ahead time, and cumulated ε² vs. time, for Expektra, NuPIC and Persistence.]

Figure 4.8: Different error measurements for WF 5.


[Four-panel plot omitted: NBIAS, NRMSE and NMAE vs. look-ahead time, and cumulated ε² vs. time, for Expektra, NuPIC and Persistence.]

Figure 4.9: Different error measurements for WF 6.


[Four-panel plot omitted: NBIAS, NRMSE and NMAE vs. look-ahead time, and cumulated ε² vs. time, for Expektra, NuPIC and Persistence.]

Figure 4.10: Different error measurements for WF 7.


                          Wind Farm
User           1      2      3      4      5      6      7      All
Leustagos      0.145  0.138  0.168  0.144  0.158  0.133  0.140  0.146
DuckTile       0.143  0.145  0.172  0.145  0.165  0.137  0.146  0.148
MZ             0.141  0.151  0.174  0.145  0.167  0.141  0.145  0.149
Propeller      0.144  0.153  0.177  0.147  0.175  0.141  0.147  0.152
Duehee Lee     0.157  0.144  0.176  0.160  0.169  0.154  0.148  0.155
Expektra       0.165  0.158  0.184  0.164  0.179  0.153  0.153  0.165
MTU EE5260     0.161  0.172  0.193  0.162  0.192  0.156  0.160  0.168
SunWind        0.174  0.177  0.193  0.176  0.179  0.157  0.162  0.172
ymzsmsd        0.163  0.186  0.200  0.164  0.192  0.162  0.167  0.174
4138 Kalchas   0.180  0.179  0.197  0.175  0.200  0.160  0.165  0.177
NuPIC          0.243  0.254  0.264  0.310  0.290  0.224  0.240  0.264

Persistence    0.302  0.338  0.373  0.364  0.388  0.341  0.361  0.355

Table 4.1: NRMSE scores of the entries published in [Hong et al. 2014]. The NuPIC and Expektra models are added so we can easily compare the results.

4.1 Experimental results

Looking at the primary results of this study, we see in table 4.1 that Expektra's ANN model achieves performance comparable to the other methods published with Hong et al. [2014]; this can be used as a starting point for further investigations. A similar approach to Expektra's is Duehee Lee's [Lee and Baldick 2014], as both methods are based on the same neural network architecture but with different optimization techniques. Duehee Lee's network is structured slightly differently from Expektra's model, which consists of only one output node, whereas Duehee Lee uses five (representing five predictions ahead). Besides these differences, Duehee Lee also includes additional sub-models to create a prediction ensemble, blending the output of the ANN with a Gaussian Process (GP) to improve very short-term predictions. MTU EE5260 is also a neural network approach that converts meteorological forecasts to power, and here the score is very close. The HTM CLA beats the persistence model but needs more work to outperform the other models in GEFCom.


4.2 Input Importance

For interpretation purposes, in order to understand the model better, an analysis of the relative input parameter importance was performed. This analysis is illustrated in Figure 4.11. Each box represents added noise to that channel; a higher NRMSE score indicates that the feature was important. A reference point, "all-channels", represents the error distribution of the model with no input replacement. We clearly see in this figure that the wind speed channel ws is the most important attribute: adding noise to this channel greatly affects the NRMSE score. We also see that the wind components u and v show little to no influence, and that the timestamp-related inputs hours and week both indicate that there are seasonal and daily trends present in the dataset.
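The HIPR analysis described above [Kemp et al., 2007] can be sketched as follows; the function and variable names are illustrative placeholders for the trained network and validation data, not taken from the thesis code:

```python
import numpy as np

def hipr_importance(model_predict, X, y, rng=None):
    # Holdback Input Randomization (HIPR), sketched: replace one input
    # channel at a time with uniform noise over that channel's observed
    # range and record the resulting NRMSE.
    if rng is None:
        rng = np.random.default_rng(0)

    def nrmse(p, t):
        return float(np.sqrt(np.mean((p - t) ** 2)))

    baseline = nrmse(model_predict(X), y)  # the "all-channels" reference
    scores = {}
    for ch in range(X.shape[1]):
        Xn = X.copy()
        lo, hi = Xn[:, ch].min(), Xn[:, ch].max()
        Xn[:, ch] = rng.uniform(lo, hi, size=len(Xn))  # noise in one channel
        scores[ch] = nrmse(model_predict(Xn), y)
    return baseline, scores
```

A channel whose randomization barely moves the NRMSE above the baseline contributes little, which is how the u, v components show up in Figure 4.11.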

Figure 4.11: Relative input parameter importance using HIPR. "all-channels" reflects the reference point, i.e. the network when no channel has been exposed to noise. ws = wind speed (with ws−3 … ws+3 its lagged and lead values), hours, week = timestamps, u, v = directional vector of the wind.


4.2.1 Adaptation and Optimization

All training for Expektra's model was performed on a MacBook Pro with an Intel Core 2 Duo processor (P8600, 2.40 GHz) and 4 GB of RAM, running a 64-bit operating system. After adapting the code to run experiments on the GEFCom dataset, the performance of the core source code provided by Expektra was evaluated. This investigation identified some slow parts of the code. After addressing these issues, mainly by using a Math.NET native library instead of the managed provider, a speed test was performed comparing the unoptimized version with the optimized one. Figure 4.12 shows the speed-up achieved during training using the LM algorithm.
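The LM algorithm referred to above is standard Levenberg-Marquardt; a generic single-step sketch of the update (not Expektra's implementation) is:

```python
import numpy as np

def lm_step(J, r, params, lam):
    # One Levenberg-Marquardt update: solve (J^T J + lam*I) delta = J^T r
    # and step the parameters. J is the Jacobian of the residuals r with
    # respect to params; lam blends between Gauss-Newton (lam -> 0) and
    # gradient descent (large lam).
    A = J.T @ J + lam * np.eye(J.shape[1])
    delta = np.linalg.solve(A, J.T @ r)
    return params - delta
```

In practice the damping lam is increased when a step fails to reduce the error and decreased when it succeeds; with lam = 0 the step reduces to Gauss-Newton.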

Figure 4.12: Unoptimized vs. optimized training time when training a network with 10 input neurons, 10/15/20/25 hidden neurons, and 1 output neuron. Training was done for 100 epochs using LM.

4.3 Summary

A plot showing the average NRMSE improvement over the persistence model can be seen in Figure 4.13. Both models are able to perform better than persistence, and Expektra's model performs better than the NuPIC model over the whole forecast


horizon, and NuPIC performs better than persistence towards the end of the forecast.
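The improvement metric plotted in Figure 4.13 is just the relative NRMSE reduction against the persistence baseline; a minimal sketch:

```python
def improvement_over_persistence(nrmse_model, nrmse_persistence):
    # Percentage improvement in NRMSE over the persistence baseline;
    # positive values mean the model beats persistence.
    return 100.0 * (nrmse_persistence - nrmse_model) / nrmse_persistence
```

With the overall scores from Table 4.1 this gives about 53% for Expektra (0.165 vs. 0.355) and about 26% for NuPIC (0.264 vs. 0.355).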

Figure 4.13: Summarized average improvement (%) in NRMSE over persistence across all wind farms, with 95% confidence intervals, plotted against look-ahead time (in hours) for the Expektra and NuPIC models.


Chapter 5

Discussion

With the exponential growth of computing power it becomes easier to study deeper and more complex networks, and with recent advances in deep learning a new interest in ANNs has resurged. In practice, ANNs have been used by most groups in the field of WPF, but these networks never caught on, as it was argued that the improvements made by ANNs were usually not enough to justify the extra effort of training them [Giebel et al., 2011]. This is steadily becoming less of an issue as computation becomes cheaper and bigger networks perform better.

By using a simple MLP network we observe that we are able to obtain results similar in performance to other models published in Hong et al. [2014]. This can be used as a starting point for further investigations. The GEFCom dataset has a limited number of input features; with access to a wider range of features, the power of neural networks could be studied more in depth, but given the number of features in this dataset such a study is harder to do.

Using persistence as a reference model was done in order to have the same baseline as the one used in GEFCom, but a better reference model should be considered in the future, such as the one presented in Nielsen et al. [1998], especially if longer forecasts are to be considered.
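The reference model of Nielsen et al. [1998] combines persistence with the long-term mean production, weighted by the correlation between production values k hours apart; a minimal sketch of the point forecast (variable names are illustrative):

```python
def new_reference_forecast(p_now, p_mean, corr_k):
    # p_hat(t + k) = a_k * p(t) + (1 - a_k) * p_mean, where a_k is the
    # correlation between production at times t and t + k. For k = 0 this
    # reduces to persistence; as the correlation decays with the horizon k,
    # the forecast falls back to the long-term mean.
    return corr_k * p_now + (1.0 - corr_k) * p_mean
```

This is why persistence becomes a weak baseline at long horizons: the blended reference keeps tracking the mean instead.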

5.1 Method development issues

Working with Expektra's model was very straightforward and no major issues were encountered during development, but some issues were encountered working with NuPIC:1

1 It should also be pointed out that it helps to have the people who developed the code on the spot, and this is probably the main reason for the little trouble encountered.


1. Installing NuPIC is difficult. This seems to be a general problem, judging by posts in the NuPIC mailing list.2 This is an important issue to fix if they want more people to work with this model.

2. The built-in swarming (PSO) did not work well, especially for slightly larger datasets and for 48 prediction steps. The main problems were fitting everything into memory and that the code ran too slowly when swarming over multiple steps ahead, making it practically impossible to use. Scaling down the problem to just a few prediction steps and multiple models helped, but the swarm model kept dismissing wind speed as an important feature, which was fixed by specifically telling it not to touch that encoder.

3. Inconsistencies between the white-paper documentation of the HTM CLA and the actual implementation make it hard to understand the underlying principles of NuPIC. NuPIC presents a very complex network, and any inconsistencies in documentation make it hard to understand. The code is quite well documented, but the additional material is a bit sparse.

These issues are all understandable and are continuously being improved upon. NuPIC is still a young platform, so issues are to be expected given the complexity and research nature of the project. Training the CLAClassifier to use many step predictions instead of just one will most likely reduce the bias error seen in the results; to investigate this, a more powerful computer is needed.

There are some differences in the input between the models. This setup was created because an OPF model needs to encode the value it is going to predict. SCADA data is available, so this is not an issue, but it may introduce some unfairness between NuPIC and Expektra. In the current setup NuPIC does not seem to gain anything extra from the additional information, and other models in the competition will most likely use different inputs as well.

In general, NuPIC differs in the way data is fed to it: only the front of the signal is sent in, and temporal context is achieved through the temporal memory.

5.2 Future improvements and directions

One issue with the GEFCom dataset is that it does not contain any specific information about the location of the 7 wind farms. It is worth considering how to handle different models that are specialized for different types

2 http://numenta.org/lists


of terrain, as this has been shown to increase performance [Kariniotakis et al., 2006]. Another thing to investigate could be a more advanced scheme for hyperparameter selection: Jursa and Rohrig [2008] show that the WPF error can be reduced by smarter use of optimization algorithms for feature selection and hyperparameter selection, which should improve the performance.

In general, having more types of inputs from sources like SCADA and NWP systems could give a lot of valuable information that can be used for better predictions. It is clear that wind speed is a key feature for producing good forecasts, which is also supported by the input analysis in Section 4.2. It has been argued that the error in WPF models stems largely from wrong weather forecasts [Giebel et al., 2011], so identifying good sources for weather forecasts would be desirable for more effective model learning. Combining several different weather forecasts could also be investigated, as Nielsen et al. [2007] showed that power forecasts based on a number of different meteorological forecasts were better than those based on a single source.
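Such a combination can be as simple as a weighted average of the power forecasts derived from each meteorological source, in the spirit of Nielsen et al. [2007]; a minimal sketch (in practice the weights would be estimated per lead time, and the names here are illustrative):

```python
import numpy as np

def combine_forecasts(forecasts, weights):
    # Weighted combination of power forecasts from several NWP sources.
    # forecasts has shape (sources, horizon); weights are normalized to
    # sum to one before mixing.
    forecasts = np.asarray(forecasts, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return w @ forecasts
```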

Regarding the HTM CLA, it is worth considering implementing a custom encoder specifically targeted at data concerning wind farms. SCADA data could also be streamed directly into a NuPIC model, which would allow for some interesting online predictions and anomaly detection.

Training times are always an issue, especially for very large models. Both NuPIC and Expektra's model would benefit from being implemented in a more parallel fashion, as these algorithms are very well suited for parallel computing. The speed-up would be of great value, so implementing a GPU version of the algorithms could be worthwhile.

5.3 Conclusions

The field of energy forecasting is still young, and the need for good forecasting techniques is constantly growing. This initial study shows that part of Expektra's forecasting method can be used to forecast wind power generation. We observe that this ANN is able to achieve performance comparable to other methods published in Hong et al. [2014], although additional work is still needed to obtain state-of-the-art results. Numenta's model is able to beat the reference model but still needs additional work before we can draw definite conclusions about its performance. Constructing a custom encoder could be a first step in that direction.


Bibliography

Subutai Ahmad and Jeff Hawkins. Properties of sparse distributed representations and their application to hierarchical temporal memory. arXiv preprint arXiv:1503.07469, 2015.

T. C. Akinci. Short term wind speed forecasting with ANN in Batman, Turkey. Elektronika ir Elektrotechnika, 107(1):41–45, 2015.

E. Michael Azoff. Neural network time series forecasting of financial markets. John Wiley & Sons, Inc., 1994.

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13(1):281–305, 2012.

Daniel P. Buxhoeveden and Manuel F. Casanova. The minicolumn hypothesis in neuroscience. Brain, 125(5):935–951, 2002.

Erasmo Cadenas and Wilfrido Rivera. Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks. Renewable Energy, 34(1):274–278, 2009.

Dmitri B. Chklovskii, B. W. Mel, and K. Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782–788, 2004.

A. Costa, A. Crespo, J. Navarro, G. Lizcano, H. Madsen, and E. Feitosa. A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews, 12(6):1725–1744, August 2008. ISSN 13640321. doi: 10.1016/j.rser.2007.01.015. URL http://dx.doi.org/10.1016/j.rser.2007.01.015.

Russ C. Eberhart and James Kennedy. A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science, volume 1, pages 39–43, New York, NY, 1995.

Shu Fan, James R. Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. Energy Conversion, IEEE Transactions on, 24(2):474–482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento – a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276. Citeseer, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition EWEC 2008, 6 pages. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving hierarchical temporal memory-based trading models. Springer, 2013.

M. A. Gaertner, C. Gallardo, C. Tejeda, N. Martínez, S. Calabria, N. Martínez, and B. Fernández. The Casandra project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. In Proc. of the 2001 European Wind Energy Conference, EWEC'01, Copenhagen, Denmark, pages 777–780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: A literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G. Lobo. Sipreólico – wind power prediction tool for the Spanish peninsular power system. Proceedings of the CIGRÉ 40th General Session and Exhibition, París, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357–363, 2014.

Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41–46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585–592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694–709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference, EWEC'07, 2007.

G. Kariniotakis. Position paper on Joule project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J. Halliday, R. Brownsword, Ignacio Marti, Ana Maria Palomares, I. Cruz, H. Madsen, T. S. Nielsen, Henrik Aa. Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power – overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, 10 pages, 2006.

G. N. Kariniotakis, G. S. Stavrakakis, and E. F. Nogaret. Wind power forecasting using advanced neural networks models. Energy Conversion, IEEE Transactions on, 11(4):762–767, 1996.

Stanley J. Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326–334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275–293, 2009.

Lars Landberg and Simon J. Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171–195, 1994.

S. Lang, C. Mohrlen, J. Jorgensen, B. O. Gallachóir, and E. McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International Solar Energy Society UK Section Conference, volume 85, page 89, 2006.

S. M. Lawan, W. A. W. Z. Abidin, W. Y. Chai, A. Baharun, and T. Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760–1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. Smart Grid, IEEE Transactions on, 5(1):501–510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets, 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164–168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313–2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154–3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa. Nielsen, and Torben S. Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475–489, 2005.

Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431–441, 1963.

Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons, 1969.

Marvin Lee Minsky and Oliver G. Selfridge. Learning in random nets. MIT Lincoln Laboratory, 1960.

Jorge J. Moré. The Levenberg-Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105–116. Springer, 1978.

Henrik Aa. Nielsen, Torben S. Nielsen, Henrik Madsen, Maria J. Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471–482, 2007.

Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29–34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power, 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159–167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms, v0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33–57, 2007.

A. Rodrigues, J. A. Peças Lopes, P. Miranda, L. Palma, C. Monteiro, R. Bessa, J. Sousa, C. Rodrigues, and J. Matos. EPREV – a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference, EWEC, volume 7, 2007.

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5:3, 1988.

Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85–117, 2015.

S. Sinkevicius, R. Simutis, and V. Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91–96, 2011.

Ke-Sheng Wang, Vishal S. Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61–69, 2014.

WWEA. 2014 half-year report. Technical report, WWEA, pages 1–8, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365–376, 2013.

Hao Yu and Bogdan M. Wilamowski. Levenberg-Marquardt training. Industrial Electronics Handbook, 5:12-1, 2011.

Appendix A

Hyper-parameters

Parameters for the spatial pooler

Alias                Default  Description
columnCount          -        The number of cell columns in a cortical region.
globalInhibition     false    If true, during the inhibition phase the winning columns are selected as the most active columns from the region as a whole.
numActivePerInhArea  10       The maximum number of active columns per inhibition area.
synPermActiveInc     0.1      The amount by which an active synapse is incremented in each round, specified as a percent of a fully grown synapse.
synPermConnected     0.10     Controls the threshold at which synapses are considered connected.
synPermInactiveDec   0.01     The amount by which an inactive synapse is decremented in each round.
potentialRadius      16       Determines the extent of the input that each column can potentially be connected to.

Table A.1: Configuration parameters for the spatial pooler.


Parameters for the scalar encoder

Alias       Symbol  Description
w           w       Number of bits to set in the output.
minval      vmin    The lower bound of the input value.
maxval      vmax    The upper bound of the input value.
n           n       Number of bits in the representation (n must be > w).
radius      r       Inputs separated by more than or equal to this distance will have non-overlapping representations.
resolution  ψ       Inputs separated by more than or equal to this distance will have different representations.

Table A.2: Configuration parameters for the scalar encoder.


Parameters for the temporal memory

Alias                  Default  Description
activationThreshold    12       Activation threshold for segments.
cellsPerColumn         32       Number of cells per column.
columnCount            2048     The number of cell columns in a cortical region.
globalDecay            0.10     Decrements all synapses a little bit all the time.
initialPerm            0.11     Initial permanence value for a synapse.
inputWidth             -        Size of the input.
maxAge                 100000   Controls global decay: global decay will only decay segments that have not been activated for maxAge iterations, and the global decay loop runs only every maxAge iterations.
maxSegmentsPerCell     -        The maximum number of segments a cell can have.
maxSynapsesPerSegment  -        The maximum number of synapses a segment can have.
minThreshold           8        The minimum required activity for a segment to learn.
newSynapseCount        15       The maximum number of synapses added to a segment during learning.
permanenceDec          0.10     How much permanence is removed from synapses when learning occurs.
permanenceInc          0.10     How much permanence is added to synapses when learning occurs.
temporalImp            cpp/py   Controls which temporal memory implementation to use.

Table A.3: Configuration parameters for the temporal memory.


Appendix B

Wind characteristics

Figure B.1: Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing the zonal and meridional wind components. The intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Figure B.2: Wind characteristics for WF 3–7, GEFCom dataset, showing the zonal and meridional wind components. The intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Appendix C

Error Distribution


Figure C.1: Error distribution for different lead times (1, 10, 20, 30, 40, 48 hours), WF 1, NuPIC model.

Figure C.2: Error distribution for different lead times (1, 10, 20, 30, 40, 48 hours), WF 2, NuPIC model.

Figure C.3: Error distribution for different lead times (1, 10, 20, 30, 40, 48 hours), WF 3, NuPIC model.

Figure C.4: Error distribution for different lead times (1, 10, 20, 30, 40, 48 hours), WF 4, NuPIC model.

Figure C.5: Error distribution for different lead times (1, 10, 20, 30, 40, 48 hours), WF 5, NuPIC model.

Figure C.6: Error distribution for different lead times (1, 10, 20, 30, 40, 48 hours), WF 6, NuPIC model.

Figure C.7: Error distribution for different lead times (1, 10, 20, 30, 40, 48 hours), WF 7, NuPIC model.

Figure C.8: Error distribution for different lead times (1, 10, 20, 30, 40, 48 hours), WF 1, Expektra model.

Figure C.9: Error distribution for different lead times (1, 10, 20, 30, 40, 48 hours), WF 2, Expektra model.

Figure C.10: Error distribution for different lead times (1, 10, 20, 30, 40, 48 hours), WF 3, Expektra model.

Figure C.11: Error distribution for different lead times (1, 10, 20, 30, 40, 48 hours), WF 4, Expektra model.

Figure C.12: Error distribution for different lead times (1, 10, 20, 30, 40, 48 hours), WF 5, Expektra model.

Figure C.13: Error distribution for different lead times (1, 10, 20, 30, 40, 48 hours), WF 6, Expektra model.

Figure C.14: Error distribution for different lead times (1, 10, 20, 30, 40, 48 hours), WF 7, Expektra model.


List of Figures

2.1 A figure that presents the general outline when forecasting using the statistical approach. 6
2.2 A figure that presents the general steps when forecasting using a physical model. 7
3.1 The perceptron. 20
3.2 Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of H hidden layers, as well as M input connections. Each edge seen in this graph has a w_ij associated with it. 21
3.3 Information flow of a single-region predictive model created with the OPF. 23
3.4 The CLAClassifier. 28
3.5 Training an OPF model. 29
4.1 Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs. wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012). 31
4.2 Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012). 32
4.3 Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012). 32
4.4 Different error measurements for WF 1. 33
4.5 Different error measurements for WF 2. 34
4.6 Different error measurements for WF 3. 35
4.7 Different error measurements for WF 4. 36
4.8 Different error measurements for WF 5. 37
4.9 Different error measurements for WF 6. 38
4.10 Different error measurements for WF 7. 39
4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point, i.e. the network when no channel has been exposed to noise. ws = wind speed, hours/week = timestamps, u, v is a directional vector of the wind. 41
4.12 Plot showing the unoptimized version vs. the optimized one when training a network with 10 input neurons, 10/15/20/25 hidden neurons and 1 output neuron. Training was done for 100 epochs using LM. 42
4.13 Summarized average improvement over all wind farms, with 95% confidence intervals. 43
B.1 Wind characteristics for WF 1 and WF 2, the GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated (the stronger, the more power). 59
B.2 Wind characteristics for WF 3-7, the GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated (the stronger, the more power). 60
C.1 Error distribution for different lead times, WF 1. 62
C.2 Error distribution for different lead times, WF 2. 63
C.3 Error distribution for different lead times, WF 3. 64
C.4 Error distribution for different lead times, WF 4. 65
C.5 Error distribution for different lead times, WF 5. 66
C.6 Error distribution for different lead times, WF 6. 67
C.7 Error distribution for different lead times, WF 7. 68
C.8 Error distribution for different lead times, WF 1. 69
C.9 Error distribution for different lead times, WF 2. 70
C.10 Error distribution for different lead times, WF 3. 71
C.11 Error distribution for different lead times, WF 4. 72
C.12 Error distribution for different lead times, WF 5. 73
C.13 Error distribution for different lead times, WF 6. 74
C.14 Error distribution for different lead times, WF 7. 75

List of Tables

3.1 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, power observations are available for updating the models. 16
3.2 Preprocessed features from the dataset. The Forecast category represents the forecasts given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the feature we will use in training and testing. 17
3.3 Example, with n = 14, r = 5, ψ = 1, of various scalar values encoded using a ScalarEncoder. 24
4.1 NRMSE scores of the entries published in [Hong et al., 2014]. The NuPIC model and the Expektra model are added so we can easily compare the results. 40
A.1 Table containing configuration parameters for the encoder. 55
A.2 Table containing configuration parameters for the spatial pooler. 56
A.3 Table containing configuration parameters for the temporal memory. 57



Chapter 3

Method and Materials

The purpose of this chapter is to provide an explanation of the method used in this thesis. It presents the terms and concepts needed and used in the field of Machine Learning, with a focus on how to create forecasts based on time series. It gives the motivation for why these methods are used, as well as how they are used. The chapter also contains a description of the implemented artificial neural network, how it is structured and optimized. It provides details about the experimental setup and how these experiments relate to GEFCom, and finally a description of how the datasets are structured and processed.

3.1 Preliminaries

3.1.1 Remarks

The methodology used to evaluate the prediction models presented in this thesis is based on the protocol presented in Madsen et al. [2005]¹, a complete protocol that can be used to evaluate the performance of short-term WPF and Wind-to-Power (W2P) models. This protocol was chosen because it has successfully been used before as a guideline to evaluate a wide variety of forecast models, such as AWPPS, Prediktor, Previento, Sipreólico and WPPT. The protocol has been used for both on-shore and off-shore wind farms and was developed in the frame of the ANEMOS research project [Kariniotakis et al., 2006], which brought together many relevant groups involved in the field. The aim of ANEMOS was to develop accurate and robust models that substantially outperform current state-of-the-art methods, and one of its goals was to establish a common set

¹It should be pointed out that no widely agreed-upon standardization exists.


CHAPTER 3 METHOD AND MATERIALS

of performance measures that can be used to compare forecasts across systems and locations.

3.1.2 Definitions

Time series

A time series is a sequence X of observations x_t, each with a particular time-stamp t, i.e. X_{1:t} = [x_1, x_2, \dots, x_t]; it is a time-dependent collection of variables.

Forecast horizon

A multi-step-ahead prediction is the assignment of predicting k forecasts \hat{X}_{t+1:t+k} given a collection of historical observations X_{t-p+1:t}. In the case of GEFCom we want a forecast for 1-48 steps ahead.

Point or spot forecast

In this paper we model the forecast \hat{p}_{t+k|t} as a so-called point forecast (or spot forecast), i.e. a single value for each forecast (as opposed to a probability distribution).

Prediction error

The prediction error e for lead time t + k is defined in equation 3.1 as the difference between the actual and the forecast value, where t denotes the time index, k is the look-ahead time, p is the actual (measured, true) wind power and \hat{p} is the predicted wind power:

e_{t+k|t} = p_{t+k} - \hat{p}_{t+k|t}    (3.1)

The normalized prediction error \varepsilon is given in equation 3.2:

\varepsilon_{t+k|t} = \frac{1}{p_{inst}} e_{t+k|t} = \frac{1}{p_{inst}} \left( p_{t+k} - \hat{p}_{t+k|t} \right)    (3.2)

where p_{inst} is the installed capacity of the wind farm (in kW or MW), which is unknown in the competition.


3.1.3 Reference models

Persistence (also called a naïve or plain predictor), seen in equation 3.3, is the model most commonly used for benchmarking WPF models. This simple model states that the future wind generation value will be the same as the last measured value:

\hat{p}^{persistence}_{t+k|t} = p_t    (3.3)

An alternative would be an even simpler model, a climatology prediction (equation 3.4), i.e. predicting the mean, a value that is approximated from the training set (see section 3.1.5):

\hat{p}^{mean}_{t+k|t} = \bar{p} = \frac{1}{N} \sum_{t=1}^{N} p_t    (3.4)

There are other reference models, like the one suggested by Nielsen et al. [1998], which have advantages over persistence, but that model was never widely adopted and is not used by GEFCom.
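As an illustration (the thesis publishes no code), the two reference models of equations 3.3 and 3.4 could be sketched in Python; the function names and numbers below are invented:

```python
import numpy as np

def persistence_forecast(p_t: float, k: int) -> np.ndarray:
    """Persistence (eq. 3.3): every lead time gets the last measured value."""
    return np.full(k, p_t)

def climatology_forecast(training_power: np.ndarray, k: int) -> np.ndarray:
    """Climatology (eq. 3.4): every lead time gets the training-set mean."""
    return np.full(k, training_power.mean())

# Toy normalized power history (invented numbers)
history = np.array([0.31, 0.35, 0.40, 0.42])
print(persistence_forecast(history[-1], 3))   # three copies of 0.42
print(climatology_forecast(history, 3))       # three copies of the mean, 0.37
```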

3.1.4 Error metrics

In order to understand why specific models perform well, it is usually a good idea to evaluate them against a wide variety of criteria, as is emphasised by Kariniotakis [1997]. The following sections describe the error measures used; throughout, N denotes the size of the test set.

Forecast Bias

The Normalized Forecast Bias (NBIAS) describes the systematic error and is defined in equation 3.5. It is estimated by calculating the average error for each step ahead and gives an indication of the direction of the error:

NBIAS_k = \frac{1}{N} \sum_{t=1}^{N} \varepsilon_{t+k|t}    (3.5)

This bias is sometimes referred to by some authors as the Mean Forecast Error (MFE). A perfect MFE does not mean the forecast is perfect and contains no error, as positive and negative errors cancel each other out, but it gives an indication that the forecasts are on target.


Mean Absolute Error

The Normalized Mean Absolute Error (NMAE) is an error quantity that looks at the average of the absolute error of the prediction and is defined in equation 3.6:

NMAE_k = \frac{1}{N} \sum_{t=1}^{N} \left| \varepsilon_{t+k|t} \right|    (3.6)

Another common name used instead of Mean Absolute Error (MAE) is Mean Absolute Deviation (MAD). This value shows the magnitude of the overall forecasting error and thus should be as small as possible. The error is scale dependent and will be affected by data transformations and the scale of measurements.

Mean Squared Error

The Normalized Mean Squared Error (NMSE) is an error quantity that looks at the average of the squared errors \varepsilon^2_{t+k|t} and is built using the Normalized Sum of Squared Errors (NSSE) defined in equation 3.7:

NSSE_k = \sum_{t=1}^{N} \varepsilon^2_{t+k|t}    (3.7)

NMSE is defined in equation 3.8:

NMSE_k = \frac{1}{N} NSSE_k = \frac{1}{N} \sum_{t=1}^{N} \varepsilon^2_{t+k|t}    (3.8)

With this error, positive and negative errors do not cancel each other out, and large individual errors are penalized more harshly.

Root Mean Squared Error

The Normalized Root Mean Squared Error (NRMSE) is the square root of the NMSE and is defined in equation 3.9:

NRMSE_k = NMSE_k^{1/2} = \left( \frac{1}{N} \sum_{t=1}^{N} \varepsilon^2_{t+k|t} \right)^{1/2}    (3.9)

NRMSE is the main metric used in GEFCom, and it shares the same properties as NMSE.
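The four metrics can be computed together. A sketch for a single lead time, with invented numbers (and the installed capacity set to 1, as for normalized power):

```python
import numpy as np

def error_metrics(p_true, p_pred, p_inst=1.0):
    """NBIAS (3.5), NMAE (3.6), NMSE (3.8) and NRMSE (3.9) for one lead
    time, from the normalized errors eps = (p_true - p_pred) / p_inst."""
    eps = (np.asarray(p_true) - np.asarray(p_pred)) / p_inst
    return {
        "NBIAS": eps.mean(),
        "NMAE": np.abs(eps).mean(),
        "NMSE": (eps ** 2).mean(),
        "NRMSE": np.sqrt((eps ** 2).mean()),
    }

m = error_metrics([0.5, 0.6, 0.7], [0.4, 0.7, 0.7])
# NBIAS is ~0 here: the +0.1 and -0.1 errors cancel, while NMAE keeps them
```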


3.1.5 Model selection

In regression and classification, one of the main issues we face is: "How do we create a good model?" One way to define this "goodness" is to look at the model's generalization ability, i.e. its performance on unseen data.

To make sure the model we are building will generalize to new, unseen data, it matters how we select and build the model in the first place; what we have control over is the data at hand and how we use that data to build the model.

Training, testing and validating

In the case of supervised learning we have access to a target series and its associated features, i.e. the production series (our target) with wind speed and wind direction as features.

Common practice within Machine Learning, which is adhered to in this thesis, is to split the whole dataset into 3 smaller subsets: two of these subsets are used to find a good model (the training set and the validation set), and the remaining one (the test set) is used for evaluation, i.e. to estimate how accurate the model we have created would be on "unseen data". With highly flexible models like an artificial neural network, we need to be careful not to overfit the data, which is one of the reasons for the validation set: we do not want a model that generalizes poorly because it is fitted to every minor variation, i.e. has captured a lot of noise.

3.1.6 Evaluation

The improvement with respect to a considered reference model "ref" is defined in equation 3.10:

I^{ref}_{EC,k} = 100 \cdot \frac{EC^{ref}_k - EC_k}{EC^{ref}_k}\ (\%)    (3.10)

where the Evaluation Criterion (EC) is any error measurement, such as NMSE or NMAE.
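Equation 3.10 in code, as a small helper (the names and numbers are illustrative):

```python
def improvement(ec_ref: float, ec_model: float) -> float:
    """Improvement over a reference model in percent (eq. 3.10)."""
    return 100.0 * (ec_ref - ec_model) / ec_ref

# e.g. a reference NRMSE of 0.20 vs. a model NRMSE of 0.15 -> 25% improvement
print(improvement(0.20, 0.15))
```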


Testing periods

Date        Time   Forecast
2011-01-01  01:00  1-48 hours
2011-01-04  13:00  1-48 hours
2011-01-08  01:00  1-48 hours
2011-01-11  13:00  1-48 hours
...
2012-06-23  01:00  1-48 hours
2012-06-26  13:00  1-48 hours

Table 3.1: The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, power observations are available for updating the models.

3.2 Experiments

Training and testing are structured according to GEFCom. The dataset spans from midnight of 1 July 2009 to noon of 26 June 2012. The period from 1 July 2009 to 1 January 2011 at 01:00 is used for training and validation, while the rest is used for testing. In the testing range a number of 48-hour periods are defined (see table 3.1); in between these testing periods additional training data exists, which enables the models to be updated between forecasts.

The testing periods repeat every 7 days until the end of the dataset, and only meteorological forecasts that were relevant for the periods with missing power data are given; this was done in order to be consistent.

Meteorological forecasts in the GEFCom dataset consist of zonal u and meridional v components collected for surface winds at 10 m above ground level. They were extracted from the archive of the European Centre for Medium-range Weather Forecasts (ECMWF), a service that issues high-resolution deterministic forecasts twice a day, at 00 Coordinated Universal Time (UTC) and at 12 UTC; each forecast period consists of data for 1-48 hours ahead. In order to match the hourly resolution of the power measurements, the forecasts were interpolated to an hourly resolution using cubic splines. A summary of the features found in the dataset is given in table 3.2.
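The resampling step can be illustrated with SciPy's CubicSpline; the coarse 3-hourly values below are made up for the sketch, not the actual ECMWF data:

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Hypothetical coarse NWP wind-speed forecast for one issue time
t_coarse = np.array([0.0, 3.0, 6.0, 9.0, 12.0])      # hours ahead
ws_coarse = np.array([4.0, 5.2, 6.1, 5.5, 4.8])      # m/s

spline = CubicSpline(t_coarse, ws_coarse)
t_hourly = np.arange(0, 13)                          # hourly resolution
ws_hourly = spline(t_hourly)                         # aligns with the power series
```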


No.  Category  Parameter             Alias  Type
1    Date      Date                  date   String
2    Date      Year                  year   Integer
3    Date      Month                 month  Integer
4    Date      Day                   day    Integer
5    Date      Hour                  hours  Integer
6    Date      Week                  week   Integer
7    Forecast  Wind Speed            ws     Real
8    Forecast  Wind Direction (deg)  wd     Real
9    Forecast  Wind U                u      Real
10   Forecast  Wind V                v      Real
11   Forecast  Issued                hp     Integer
12   SCADA     Production            wp     Real

Table 3.2: Preprocessed features from the dataset. The Forecast category represents the forecasts given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the feature we will use in training and testing.


The database for the meteorological forecasts given by the GEFCom dataset contains sections of missing data, each corresponding to a date for which the power information is also missing; in a pre-processing step, these sections were filled with the previous best forecast available. For example, if we do not have data for the forecast issued at 2011-01-01 12:00, we use data from 2011-01-01 00:00, as a 48-hours-ahead forecast is available for that date. If the previous section also contained missing data, we would go back an additional section and use those forecasts. If no forecast is available 48 hours back, we use the best known forecast and extend it in the same fashion as the persistence model.

Any predictions made by these models should fall within a certain range, so any forecast outside this range is clamped. This is the main post-processing step being done, i.e. there is an upper limit on what the wind farm can produce.
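A minimal sketch of this clamping step, assuming normalized power so the feasible range is [0, 1]:

```python
import numpy as np

# Raw model outputs (invented); negatives go to 0, values above 1 go to 1
raw_forecast = np.array([-0.07, 0.34, 1.12, 0.99])
clamped = np.clip(raw_forecast, 0.0, 1.0)
```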

3.3 Holdback Input Randomization

The Holdback Input Randomization (HIPR) method, described in Kemp et al. [2007], can be used to investigate the importance of the input parameters. It works by sequentially feeding each data point in the test set to the neural network while replacing the values of one input parameter at a time. The replacement values are uniformly distributed random values in the range the neural network was originally trained on, i.e. (-1, 1). A NRMSE score is calculated for each replacement; the result gives information about the relevance of each input.
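A sketch of HIPR, assuming a model object with a scikit-learn style predict method (an assumption for illustration, not the thesis code):

```python
import numpy as np

def hipr_importance(model, X_test, y_test, seed=0):
    """Holdback Input Randomization sketch: replace one input column at a
    time with uniform noise in (-1, 1), matching the trained input range,
    and record the NRMSE per column."""
    rng = np.random.default_rng(seed)
    nrmse = lambda y, yhat: float(np.sqrt(np.mean((y - yhat) ** 2)))
    scores = {}
    for j in range(X_test.shape[1]):
        X_noisy = X_test.copy()
        X_noisy[:, j] = rng.uniform(-1.0, 1.0, size=len(X_test))
        scores[j] = nrmse(y_test, model.predict(X_noisy))
    return scores  # a high score means the randomized input mattered
```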

3.4 Optimization methods

The two main optimization algorithms used in this thesis are the Particle Swarm Optimization (PSO) algorithm, presented by Eberhart and Kennedy [1995], and the Levenberg-Marquardt (LM)² algorithm, independently developed by Kenneth Levenberg and Donald Marquardt [Marquardt, 1963; Levenberg, 1944]. The PSO algorithm is a population-based stochastic algorithm, similar to a Genetic Algorithm (GA) but based on social-psychological principles instead of evolution. It

²The LM algorithm was used in the beginning of the project, but the results reported ended up using the PSO algorithm. The LM algorithm is included in this section because speed optimization was measured using it.


can be summarized by the following steps. Each network was trained multiple times in order to avoid local minima.

• Step 1: Initialize particles with random velocities and accelerations.
• Step 2: Determine which particle is closest to the goal.
• Step 3: Adjust accelerations toward that particle.
• Step 4: Update particle positions based on their velocities, and update velocities based on accelerations.
• Step 5: Go to step 2.

The LM algorithm is a combination of the Error Back-Propagation (EBP) method [Rumelhart et al., 1988] and the Gauss-Newton method. This algorithm has the speed advantage of Gauss-Newton and the stability of EBP [Yu and Wilamowski, 2011]. A detailed treatment of LM can be found in Moré [1978], and of PSO in Poli et al. [2007].
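The steps above can be sketched as a minimal PSO; the swarm size, inertia and acceleration coefficients below are common textbook values, not the exact variant or settings used in the thesis:

```python
import numpy as np

def pso_minimize(f, dim, n_particles=30, iters=200, seed=0):
    """Minimal PSO sketch: particles track personal and global bests and
    accelerate toward them (steps 1-5 above)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1, 1, (n_particles, dim))        # positions
    v = rng.uniform(-0.1, 0.1, (n_particles, dim))    # velocities
    pbest = x.copy()
    pbest_f = np.array([f(p) for p in x])
    gbest = pbest[pbest_f.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # adjust velocities toward the personal and global bests, then move
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
        x = x + v
        fx = np.array([f(p) for p in x])
        better = fx < pbest_f
        pbest[better], pbest_f[better] = x[better], fx[better]
        gbest = pbest[pbest_f.argmin()].copy()
    return gbest, pbest_f.min()

# Toy objective: the sphere function, minimized at the origin
best, val = pso_minimize(lambda p: float(np.sum(p ** 2)), dim=3)
```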

3.5 Neural Networks

3.5.1 Multilayer Perceptron

The perceptron is built around a nonlinear model of a neuron, the McCulloch-Pitts model [McCulloch and Pitts, 1943]. It basically consists of a 2-step process, where the cell body contains a summation function of the weighted sum of all inputs, including a bias. The perceptron is described by equation 3.11. The sum s is passed through an activation function (see the Activation Functions subsection below) which mimics the activation, or firing, of the neuron.

s = \sum_{i=1}^{M} w_i x_i + x_0    (3.11)

w_i is the weight of the "synapse" of input channel i and is the parameter we want to adjust, x_i is the input value and x_0 is the bias; M denotes the number of inputs.
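Equation 3.11 followed by an activation is a one-liner; the numbers below are illustrative:

```python
import numpy as np

def perceptron(x, w, bias, activation=np.tanh):
    """Eq. 3.11 plus activation: f(sum_i w_i x_i + x_0)."""
    s = np.dot(w, x) + bias   # weighted sum of inputs plus bias
    return activation(s)

out = perceptron(x=np.array([0.2, -0.5]), w=np.array([0.4, 0.1]), bias=0.03)
# s = 0.08 - 0.05 + 0.03 = 0.06, so out = tanh(0.06)
```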

19

CHAPTER 3 METHOD AND MATERIALS

[Diagram: input signals weighted by w, summed together with a bias, and passed through the activation f(s) to produce the output signal]

Figure 3.1: The perceptron.

The structure of the MLP consists of many perceptrons, and it is shown in figure 3.2. The input signal flows from the input layer at the bottom to the output layer at the top. We have a bias signal, seen on the left side of the diagram, which is set to a fixed number.

MLPs are fully connected networks, meaning that every neuron in any layer of the network is connected to all the neurons in the previous layer. Each connection in the network has a weight w_ij associated with it. The initialization of these weights is done before any training, by randomly assigning very small values to the respective weights in the graph. The architecture used in this study has only one output variable, i.e. the approximated function produces forecasts of the power generation given a certain input.


[Diagram: input layer (hours, u, v, week, ws, ws-1, ws-2, ws, ws+1, ws+2), tanh hidden layers and a linear output layer, with a bias signal feeding each layer]

Figure 3.2: Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of H hidden layers, as well as M input connections. Each edge seen in this graph has a w_ij associated with it.

The performance of neural networks is generally improved if the data is normalised; if we were to use the original data directly, it could cause convergence problems. Normalization is done using the mapminmax function seen in equation 3.12.

y = \frac{(y_{max} - y_{min}) \cdot (x - x_{min})}{x_{max} - x_{min}} + y_{min}    (3.12)

Here y_max is the max value of the specified range, which in this case is 1, and y_min is -1; x is the value to be scaled, while x_max and x_min are the max and min values of the numbers to be scaled.
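A sketch of equation 3.12; the name mapminmax follows the text, and the defaults map into [-1, 1]:

```python
import numpy as np

def mapminmax(x, ymin=-1.0, ymax=1.0):
    """Eq. 3.12: linearly rescale x into [ymin, ymax]."""
    x = np.asarray(x, dtype=float)
    return (ymax - ymin) * (x - x.min()) / (x.max() - x.min()) + ymin

print(mapminmax([0.0, 5.0, 10.0]))   # midpoint maps to 0, extremes to -1 and 1
```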

The forecasting model consists of 7 different networks, one for each wind farm. These networks are trained on the first section of the dataset, before the test period. A random 60/20/20 split is created, where 60% of the available data is used for training, 20% is used to validate the network, and the remaining 20% is a hold-out set for the hyperparameters. The input features³ fed into the models are

³See table 3.2.


ws, u, v, hours, ws, ws+1, ws+2, ws-1, ws-2, where ws+x and ws-x denote a time shift of x hours; we can use ws+x as input into the model because ws is a forecast in itself.

Activation Functions

The computation done for each neuron in the multilayer perceptron requires knowledge of the derivative of the activation function; in other words, the activation function we pick needs to be continuous. In this thesis we use the following two activation functions: the hyperbolic tangent function, seen in equation 3.13, and

f(s) = \tanh(s) = \frac{e^s - e^{-s}}{e^s + e^{-s}}    (3.13)

the saturating linear transfer function, seen in equation 3.14:

f(s) = \begin{cases} +1 & \text{if } s \ge 1 \\ s & \text{if } -1 < s < 1 \\ -1 & \text{if } s \le -1 \end{cases}    (3.14)

Hyperparameter optimization

In order to obtain the hyperparameters necessary for each wind farm's model, a random hyperparameter search was performed for all models, and a hold-out validation set was used to pick the best hyperparameters. Random search has been shown to work better than grid search when not all parameters are equally important [Bergstra and Bengio, 2012].
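A sketch of such a random search; the search space and the toy error function are invented for illustration, standing in for "train a network and return its hold-out error":

```python
import random

def random_search(evaluate, space, n_trials=200, seed=0):
    """Random hyperparameter search sketch: sample each parameter
    independently from its candidate list and keep the configuration
    with the lowest validation error."""
    rng = random.Random(seed)
    best_cfg, best_err = None, float("inf")
    for _ in range(n_trials):
        cfg = {name: rng.choice(values) for name, values in space.items()}
        err = evaluate(cfg)
        if err < best_err:
            best_cfg, best_err = cfg, err
    return best_cfg, best_err

# Toy stand-in for a training run; the true optimum is 20 neurons, lr 0.01
space = {"hidden_neurons": [10, 15, 20, 25], "learning_rate": [0.1, 0.01]}
toy_error = lambda cfg: abs(cfg["hidden_neurons"] - 20) + cfg["learning_rate"]
best_cfg, best_err = random_search(toy_error, space)
```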

3.5.2 Numenta Platform for Intelligent Computing

This section describes the key principles introduced with NuPIC. It explains in general terms the theory behind the Online Prediction Framework (OPF)⁴, which uses the CLA and HTM algorithms; the OPF works as an API for creating predictive models. The OPF consists of 5 major types of components: Encoders, Spatial Pooler (SP), Temporal Memory (TM), Temporal Pooler (TP) and Classifiers. Together these components construct a single CLA region⁵, and figure 3.3 demonstrates the information flow through such a region.

⁴The OPF is used with Numenta's commercial product GROK.
⁵Currently, models created with the OPF do not use a TP, nor does this client allow creation of a hierarchy of regions, i.e. what we would call an HTM; it is only possible to create one region.


Another central concept in NuPIC is the Sparse Distributed Representation (SDR), which refers to the activation of only a small percentage of the neurons at any given time. This neuronal activity is represented as an n-dimensional sparse binary vector x = [b_0, ..., b_n] with around 2% active cells. The outputs from the spatial pooler and the temporal memory are both SDRs, while the output from the encoder does not enforce this and is just a normal binary vector. A general overview of the properties of SDRs is given by Ahmad and Hawkins [2015].

[Diagram: Encoder → Spatial Pooler → Temporal Memory → Classifier]

Figure 3.3: Information flow of a single-region predictive model created with the OPF.

The overlap between two different SDRs is defined by

o(x, y) = x \cdot y    (3.15)

and a match between two SDRs is defined by

m(x, y) := o(x, y) \ge \theta    (3.16)

where \theta is set such that \theta \le \|x\|_1 and \theta \le \|y\|_1. An interesting property of SDRs, used multiple times inside the temporal memory and especially for predictions, is that a set of fixed-size SDRs can be reliably stored by taking their union as a single pattern. The boolean OR operator is used to create a


Scalar  Encoding
1       11111000000000
2       01111100000000
10      00000000011111

Table 3.3: Example, with n = 14, r = 5, ψ = 1, of various scalar values encoded using a ScalarEncoder.

new vector from a set of SDRs. The downside of storing patterns this way is that the more patterns we store, the bigger the probability of false positives.
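Equations 3.15 and 3.16, and the union trick, in a few lines; the 8-bit vectors are toy sizes rather than realistic SDR dimensions:

```python
import numpy as np

def overlap(x, y):
    """Eq. 3.15: the number of bits active in both SDRs."""
    return int(np.dot(x, y))

def match(x, y, theta):
    """Eq. 3.16: two SDRs match if their overlap reaches theta."""
    return overlap(x, y) >= theta

x = np.array([0, 1, 1, 0, 0, 1, 0, 0])
y = np.array([0, 1, 0, 0, 0, 1, 0, 1])
union = x | y   # stores both patterns in one vector, at the cost of false positives
print(overlap(x, y), match(x, y, 2))   # 2 True
```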

Encoders

NuPIC contains many different encoders⁶; the job of an encoder is to convert raw input into a more suitable representation, i.e. a binary vector. The raw input is fed into the model using a dictionary data structure.

Useful encoders include scalar and categorical encoders, while more exotic encoders include one for Global Positioning System (GPS) coordinates, which can be used to extract information about anomalous movements. The entries of the raw-input dictionary are each encoded separately and concatenated using a multi-encoder.

One property of the spatial pooler is that overlapping input patterns are mapped to the same SDR. This means that we want the encoder to encode inputs so that similar inputs share bits. The ScalarEncoder fulfils this property through the process illustrated in table 3.3.

We calculate the range of values to encode using equation 3.17, where v_{min} represents the minimum value of the input signal and v_{max} denotes its upper bound:

v_{range} = v_{max} - v_{min}    (3.17)

There are three mutually exclusive parameters that determine the overall size of the output from a scalar encoder: n, r and ψ. n directly represents the total number

⁶A full list of all encoders can be found in the API documentation for NuPIC.


of bits in the output, and it must be greater than or equal to w, which represents the number of bits that are set to encode a single value, i.e. the "width" of the output signal⁷. r and ψ are specified with respect to the input, while w is specified with respect to the output. Two inputs separated by more than the radius have non-overlapping representations. Two inputs separated by less than the radius will in general overlap in at least some of their bits. Two inputs separated by greater than or equal to the resolution are guaranteed to have different representations. ψ can be calculated using equation 3.18:

\psi = \frac{r}{w}    (3.18)

Depending on whether we want periodic behaviour or not, n is calculated a bit differently; see the scalar encoder for implementation details⁸.
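A simplified non-periodic scalar encoder reproducing table 3.3; the real ScalarEncoder in NuPIC has more parameters, and vmin = 1, vmax = 10 are assumptions chosen to match the table (with w = r/ψ = 5):

```python
import numpy as np

def encode_scalar(v, vmin=1.0, vmax=10.0, n=14, w=5):
    """Sketch of a non-periodic scalar encoding: a contiguous block of w
    active bits whose position moves with v, so nearby values share bits."""
    v = max(vmin, min(v, vmax))                      # clamp into the value range
    # index of the first active bit, spread over the n - w possible positions
    i = int(round((n - w) * (v - vmin) / (vmax - vmin)))
    out = np.zeros(n, dtype=int)
    out[i:i + w] = 1
    return out

print("".join(map(str, encode_scalar(1))))    # 11111000000000
```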

Spatial Pooler

The spatial pooler receives a binary vector as its input from an encoder and outputs an SDR. The structure of the spatial pooler consists of an input space and a set of columns, and the output SDR represents which of the columns in a region are active. Each column has synapses connected to the input space: around 50% of the input bits, chosen at random, are potential connections, the so-called "potential pool". Each synapse connects and disconnects with the input space during learning.

The general information flow of the spatial pooler consists of the following steps. Input stimuli from lower regions lead to activation of the input space, to which each mini-column is synaptically connected; this activation gives each column a so-called "overlap score". The score is calculated from the total number of active inputs reaching that column through connected synapses, weighted by a "boosting factor" that increases the chance for certain columns to win more easily. A final top percentage (usually around 2% activity) of the columns with the highest influence (biggest overlap score) is chosen; columns also inhibit nearby columns, and the result of this process is a binary vector with few active bits, an SDR. This is illustrated in equation 3.19, where each b can either be 0 or 1 and s is a value representing the score.

7w must be odd to avoid centering problems8httpsgithubcomnumentanupic


\underbrace{[b_1\ b_2\ b_3\ \dots\ b_n]}_{\text{input vector}}
\cdot
\underbrace{\begin{bmatrix} b_{11} & b_{12} & \dots & b_{1n} \\ b_{21} & b_{22} & \dots & b_{2n} \\ \vdots & & & \vdots \\ b_{d1} & b_{d2} & \dots & b_{dn} \end{bmatrix}}_{\text{connected synapses for each column}}
=
\underbrace{[s_1\ s_2\ \dots\ s_d]}_{\text{overlap score}}
\ \xrightarrow{\text{inhibition}}\
\underbrace{[b_1\ b_2\ \dots\ b_d]}_{\text{output SDR}}
\qquad (3.19)

Learning in this structure is done by adjusting the permanences of the proximal dendrite to better match the input: for each winning column, increase the permanence of the synapses that correctly matched the input and decrease the permanence of the rest. We also increase the boosting factor of losing columns to give them a bigger chance of winning next time.
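The boosted-overlap-plus-inhibition step of equation 3.19 can be sketched as follows; the sizes and the 25% sparsity are toy values for illustration (a real region would use around 2%):

```python
import numpy as np

def spatial_pooler_step(input_vec, synapses, boost, sparsity=0.25):
    """One spatial-pooler step in the spirit of eq. 3.19: compute a boosted
    overlap score per column, then let global inhibition keep only the top
    `sparsity` fraction of columns active (the output SDR)."""
    scores = boost * (synapses @ input_vec)       # overlap score per column
    n_active = max(1, int(sparsity * len(scores)))
    winners = np.argsort(scores)[-n_active:]      # columns with the biggest score
    sdr = np.zeros(len(scores), dtype=int)
    sdr[winners] = 1
    return sdr, scores

synapses = np.array([[1, 1, 0],    # column 0 is connected to inputs 0 and 1
                     [1, 0, 0],
                     [0, 0, 1],
                     [0, 1, 0]])
sdr, scores = spatial_pooler_step(np.array([1, 1, 0]), synapses, boost=np.ones(4))
print(sdr)   # [1 0 0 0]: column 0 has the largest overlap with the input
```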

Temporal Memory

The temporal memory receives an SDR from the spatial pooler, representing the active columns; the output of the temporal memory is the activity of the whole CLA region. The general idea of the TM⁹ is that the cells in each mini-column provide temporal context to the pattern identified by the spatial pooler. The TM's job is to form transitions between different SDRs, which it achieves by allowing active cells to form connections to previously active cells, so that, in a future setting, each cell is able to predict its own activity.

Each cell in a region has several distal segments, ideally one for each pattern the cell has transitioned from; in practice these segments are connected to a couple of patterns each. A cell enters a predictive state if there is enough activity on a segment, and every segment consists of a collection of connections, synapses, that have been formed to a subset of previously active cells (typically around 10-15 cells).

The temporal memory receives a sparse binary vector from the spatial pooler, and the general information flow for this part of the CLA goes through two phases. The first phase has the following steps: 1) for each active column, we check whether any cell is in a predictive state; if there is a cell in a predictive

⁹There are a lot of details in how the Temporal Memory is implemented; pseudo-code and more details can be found in [Numenta, 2011]. The nupic git repository is the best source for finer details: https://github.com/numenta/nupic


state, then 2) we activate that particular cell. If no cell was found in a predictive state, then 3) we activate all cells in that particular column, a process called bursting.

Bursting is analogous to saying: we do not know which cell to activate, because we have not seen this instance of the sequence of patterns before and are unable to put the column into the correct temporal context, so let us activate all cells to reflect this uncertainty. Bursting does not occur when the temporal context has been predicted by the CLA. Equation 3.20 shows the first phase.

\[
\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \cdots & b_n \end{bmatrix}}_{\text{SP SDR}}
\;
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \cdots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \cdots & b_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \cdots & b_{dn}
\end{bmatrix}}_{\text{Predictive State}}
=
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \cdots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \cdots & b_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \cdots & b_{dn}
\end{bmatrix}}_{\text{Active State}}
\tag{3.20}
\]
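Phase 1, including bursting, can be sketched in a few lines of NumPy. This is an illustrative reimplementation of the rule in equation 3.20, not NuPIC's actual code; the function name and the d x n array layout (d cells per mini-column, n columns) are our own.

```python
import numpy as np

def tm_phase1(active_columns, predictive_state):
    """Compute the active-state matrix for one time step.

    active_columns   : length-n binary vector from the spatial pooler (SP SDR)
    predictive_state : d x n binary matrix of cells predicted at t-1
    Returns the d x n active-state matrix of equation (3.20).
    """
    d, n = predictive_state.shape
    active = np.zeros((d, n), dtype=int)
    for col in np.flatnonzero(active_columns):
        predicted = np.flatnonzero(predictive_state[:, col])
        if predicted.size > 0:
            active[predicted, col] = 1   # activate only the predicted cells
        else:
            active[:, col] = 1           # bursting: no prediction, activate all
    return active
```

Inactive columns stay silent regardless of their predictive state; only active columns with no predicted cell burst.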

The second phase of the algorithm figures out which cells should be put into a predictive state for the next time step. This is achieved by checking every distal segment on every cell for activity above a certain threshold, i.e. checking for active segments. One active segment is enough to put a cell into a predictive state. The output from the temporal memory is a vector representing the state of all cells in the region. Equation 3.21 shows phase 2.

\[
\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \cdots & b_n \end{bmatrix}}_{\text{Active State}}
\cdot
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \cdots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \cdots & b_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \cdots & b_{dn}
\end{bmatrix}_{X}}_{\text{Segment } X}
=
\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \cdots & b_n \end{bmatrix}_{X}}_{\text{Segment Activation } X}
> \tau
\;\rightarrow\;
\underbrace{\begin{bmatrix} s_1 & s_2 & s_3 & \cdots & s_n \end{bmatrix}}_{\text{Predictive State}}
\tag{3.21}
\]
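Phase 2 can be sketched the same way; again an illustrative reimplementation of the rule in equation 3.21 (names and data layout are ours, not NuPIC's): a segment is active when its dot product with the region's cell activity exceeds the threshold, and one active segment suffices to mark its cell predictive.

```python
import numpy as np

def tm_phase2(active_state, segments, tau):
    """Compute the next predictive state (equation 3.21).

    active_state : d x n binary active-state matrix; flattened, it is the
                   activity vector of all cells in the region
    segments     : per cell, a list of distal segments, each a binary
                   vector over all cells (its synapse targets)
    tau          : activation threshold for a segment
    """
    activity = active_state.flatten()
    predictive = np.zeros(activity.size, dtype=int)
    for cell, cell_segments in enumerate(segments):
        for seg in cell_segments:
            if np.dot(seg, activity) > tau:  # segment activation > tau
                predictive[cell] = 1         # one active segment is enough
                break
    return predictive
```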

If learning is turned on, the permanences of the synapses connected to active distal segments are updated (in the same way as in the spatial pooler). These changes are marked as temporary until we are sure that the cell is in fact able to correctly predict something with the new changes; once this is known, the changes either become permanent or are removed. Temporarily marked cells are updated whenever a cell goes from inactive to active from a feed-forward input (update the permanences, as we correctly predicted the feed-forward activation). If a cell instead went from active to inactive, the change is undone.
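The basic reinforce/weaken rule on one segment can be sketched as follows; the temporary-mark/commit bookkeeping described above is omitted, and the function name and defaults are ours, not NuPIC's.

```python
def update_permanences(perms, presyn_active, inc=0.10, dec=0.10):
    """Permanence update on one active distal segment: synapses to cells
    that were active are reinforced, the rest are weakened, clipped to
    [0, 1].  A sketch of the rule described above."""
    return [min(1.0, p + inc) if active else max(0.0, p - dec)
            for p, active in zip(perms, presyn_active)]
```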


CHAPTER 3 METHOD AND MATERIALS

NuPIC Classifiers

There are many different classifiers included with NuPIC, such as a k-Nearest Neighbour (kNN) classifier and a Support Vector Machine (SVM); the OPF in particular uses a custom-built classifier called the "CLAClassifier". The purpose of this classifier is to decode predictions made by the CLA. Classification with the CLAClassifier is performed in different ways depending on how the input has been encoded.

Scalar values are decoded using a process where each cell in a CLA region is paired with two histograms. One of the histograms keeps track of the frequency of encountered patterns associated with each cell; the other keeps track of a moving average for each bucket. This process is illustrated in figure 3.4.

Figure 3.4: The CLAClassifier. [Diagram: an SDR divided into columns 1 to N; each cell is paired with two histograms, one tracking the likelihood of each value bucket and one a moving average, spanning the minimum to maximum value.]
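The two-histogram idea can be sketched as a toy decoder. This is our illustrative reconstruction, not NuPIC's CLAClassifier code: each cell keeps a frequency histogram over value buckets and a per-bucket moving average, and decoding is a likelihood-weighted vote over the active cells.

```python
from collections import defaultdict

class TwoHistogramDecoder:
    """Toy sketch of the two-histogram scheme: per cell, a bucket-frequency
    histogram and a per-bucket moving average (names and details are ours)."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha
        self.freq = defaultdict(lambda: defaultdict(int))  # cell -> bucket -> count
        self.avg = defaultdict(dict)                       # cell -> bucket -> moving avg

    def learn(self, active_cells, bucket, value):
        for cell in active_cells:
            self.freq[cell][bucket] += 1
            old = self.avg[cell].get(bucket, value)
            self.avg[cell][bucket] = old + self.alpha * (value - old)

    def decode(self, active_cells):
        votes = defaultdict(float)
        for cell in active_cells:
            total = sum(self.freq[cell].values())
            for bucket, count in self.freq[cell].items():
                votes[bucket] += count / total             # likelihood vote
        if not votes:
            return None
        best = max(votes, key=votes.get)                   # winning bucket
        vals = [self.avg[c][best] for c in active_cells if best in self.avg[c]]
        return sum(vals) / len(vals)
```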

Training in NuPIC

Training of the NuPIC model is done using online learning algorithms. The schema seen in figure 3.5 is used to train and test these models. We have 7 different wind farms, so this schema is applied to each wind farm.

This schema is slightly adjusted from the traditional way the OPF handles multi-step predictions. The default way of doing this is to use 48 different models, one for every step ahead in the CLAClassifier, which would give us 48 · 7 = 336 models in total (for all the wind farms). Not only does this change match Expektra's approach, it also reduces the memory requirements for the CLAClassifier.

Figure 3.5: Training an OPF model. [Diagram: hyperparameter setup is done either by PSO swarming or manually, using a pre-training data chunk. In the training phase, online learning is activated and the training data stream is fed through the OPF model, producing predictions. In the testing phase, online learning is deactivated and the testing data is fed through the model, producing multistep predictions.]
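The train/test schema above can be sketched as follows. `RunningMeanModel` is a stand-in for the OPF model (the real OPF API differs); the point is only that online learning stays on during the training phase and is switched off before the test phase.

```python
class RunningMeanModel:
    """Stand-in for an OPF model: predicts the running mean of what it has
    learned so far.  Purely illustrative."""
    def __init__(self):
        self.learning_enabled = True
        self.total = 0.0
        self.count = 0
    def learn(self, value):
        if self.learning_enabled:
            self.total += value
            self.count += 1
    def predict(self):
        return self.total / self.count if self.count else 0.0

def run_schema(model, train_stream, test_stream):
    """Sketch of figure 3.5: train with online learning on, then freeze."""
    for value in train_stream:        # training phase: online learning activated
        model.predict()               # predictions are produced while learning
        model.learn(value)
    model.learning_enabled = False    # testing phase: online learning deactivated
    return [model.predict() for _ in test_stream]
```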

Input and hyperparameter selection

Finding hyperparameters for an OPF model was partly done using the custom built-in PSO algorithm and partly configured manually, mainly by ensuring that the PSO algorithm did not remove encoders. A single CLA region contains a lot of hyperparameters; these parameters are listed, with corresponding descriptions, in appendix A. Inputs to the model are date, ws, wp, u, and v.
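For reference, the particle swarm idea behind such hyperparameter searches looks like this in miniature; an illustrative textbook PSO in the spirit of Eberhart and Kennedy [1995], not NuPIC's swarming code (all names and defaults are ours).

```python
import numpy as np

def pso_minimize(f, dim, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5,
                 bounds=(-5.0, 5.0), seed=0):
    """Minimal particle swarm optimizer: particles track personal bests and
    are pulled toward the global best."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, (n_particles, dim))       # positions
    v = np.zeros((n_particles, dim))                  # velocities
    pbest = x.copy()
    pbest_val = np.array([f(p) for p in x])
    g = pbest[pbest_val.argmin()].copy()              # global best
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        vals = np.array([f(p) for p in x])
        better = vals < pbest_val
        pbest[better], pbest_val[better] = x[better], vals[better]
        g = pbest[pbest_val.argmin()].copy()
    return g, pbest_val.min()
```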


Chapter 4

Result

This chapter presents the primary results obtained in this study. Figures 4.4-4.10 contain different error measurements on the test set for each wind farm. We see that Expektra's ANN model performs well over all wind farms, while the NuPIC model performs worse than Expektra but in general still better than our reference model. NuPIC is off target, with a bias error on most wind farms. The graphs also indicate that NuPIC is unable to pick up some trends, visible in the cumulated ε² graph. Expektra's model, on the other hand, shows no clear problems in cumulated ε² and is on target on all wind farms; appendix C has been included to reflect this at different lead times.
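The error measures plotted in the figures of this chapter can be computed as below; a sketch following the usual normalized definitions (cf. Madsen et al. [2005]), with power already normalized so the capacity term is 1. The function name and dictionary layout are our own.

```python
import numpy as np

def error_measures(predicted, observed, capacity=1.0):
    """NBIAS, NMAE, NRMSE and the cumulated squared error, normalized by
    installed capacity."""
    e = np.asarray(predicted) - np.asarray(observed)
    return {
        "NBIAS": e.mean() / capacity,
        "NMAE": np.abs(e).mean() / capacity,
        "NRMSE": np.sqrt((e ** 2).mean()) / capacity,
        "cum_eps2": np.cumsum(e ** 2),   # the cumulated ε² curves
    }
```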

Figure 4.1: Left diagram: cumulative probability of wind speed (m/s). Right diagram: scatter diagram of the power curve, normalized power output vs. wind speed and wind direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


Figure 4.2: Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).

Figure 4.3: Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


Figure 4.4: Different error measurements for WF 1. [Four panels comparing Expektra, NuPIC, and Persistence: NBIAS, NRMSE, and NMAE vs. look-ahead time k (in hours), and cumulated ε² vs. time (in hours).]


Figure 4.5: Different error measurements for WF 2. [Four panels comparing Expektra, NuPIC, and Persistence: NBIAS, NRMSE, and NMAE vs. look-ahead time k (in hours), and cumulated ε² vs. time (in hours).]


Figure 4.6: Different error measurements for WF 3. [Four panels comparing Expektra, NuPIC, and Persistence: NBIAS, NRMSE, and NMAE vs. look-ahead time k (in hours), and cumulated ε² vs. time (in hours).]


Figure 4.7: Different error measurements for WF 4. [Four panels comparing Expektra, NuPIC, and Persistence: NBIAS, NRMSE, and NMAE vs. look-ahead time k (in hours), and cumulated ε² vs. time (in hours).]


Figure 4.8: Different error measurements for WF 5. [Four panels comparing Expektra, NuPIC, and Persistence: NBIAS, NRMSE, and NMAE vs. look-ahead time k (in hours), and cumulated ε² vs. time (in hours).]


Figure 4.9: Different error measurements for WF 6. [Four panels comparing Expektra, NuPIC, and Persistence: NBIAS, NRMSE, and NMAE vs. look-ahead time k (in hours), and cumulated ε² vs. time (in hours).]


Figure 4.10: Different error measurements for WF 7. [Four panels comparing Expektra, NuPIC, and Persistence: NBIAS, NRMSE, and NMAE vs. look-ahead time k (in hours), and cumulated ε² vs. time (in hours).]


                        Wind Farm
User            1      2      3      4      5      6      7      All
Leustagos       0.145  0.138  0.168  0.144  0.158  0.133  0.140  0.146
DuckTile        0.143  0.145  0.172  0.145  0.165  0.137  0.146  0.148
MZ              0.141  0.151  0.174  0.145  0.167  0.141  0.145  0.149
Propeller       0.144  0.153  0.177  0.147  0.175  0.141  0.147  0.152
Duehee Lee      0.157  0.144  0.176  0.160  0.169  0.154  0.148  0.155
Expektra        0.165  0.158  0.184  0.164  0.179  0.153  0.153  0.165
MTU EE5260      0.161  0.172  0.193  0.162  0.192  0.156  0.160  0.168
SunWind         0.174  0.177  0.193  0.176  0.179  0.157  0.162  0.172
ymzsmsd         0.163  0.186  0.200  0.164  0.192  0.162  0.167  0.174
4138 Kalchas    0.180  0.179  0.197  0.175  0.200  0.160  0.165  0.177
NuPIC           0.243  0.254  0.264  0.310  0.290  0.224  0.240  0.264
Persistence     0.302  0.338  0.373  0.364  0.388  0.341  0.361  0.355

Table 4.1: NRMSE scores of the entries published in [Hong et al., 2014]. The NuPIC and Expektra models are added so we can easily compare the results.

4.1 Experimental results

Looking at the primary results presented in this study, we see in table 4.1 that Expektra's ANN model is able to achieve performance comparable to the other methods published in Hong et al. [2014]. This can be used as a starting point for further investigations. A similar approach to Expektra's is Duehee Lee's [Lee and Baldick, 2014], as both methods are based on the same neural network architecture but with different optimization techniques. Duehee Lee's network is structured slightly differently: Expektra's model has only one output node, whereas Duehee Lee uses five (representing five prediction steps ahead). Besides these differences, Duehee Lee also includes additional sub-models to create a prediction ensemble, blending the output of the ANN with a Gaussian Process (GP) to improve very short-term predictions. MTU EE5260 is also a neural network approach that converts meteorological forecasts to power, and here the score is very close. The HTM CLA beats the persistence model but needs more work to outperform the other models in GEFCom.


4.2 Input Importance

For interpretation purposes, in order to understand the model better, an analysis of the relative input parameter importance was performed. This analysis is illustrated in figure 4.11. Each box represents added noise on one channel and will result in a higher NRMSE score if that feature was important. A reference point, "all-channels", represents the error distribution of the model with no input replacement. We clearly see in this figure that the wind speed channel ws is the most important attribute; adding noise to this channel greatly affects the NRMSE score. We also see that the wind components u, v show little to no influence, and that the timestamp-related inputs hours and week both indicate that there are seasonal and daily trends present in the dataset.

Figure 4.11: Relative input parameter importance using HIPR, shown as NRMSE distributions when noise is added to each channel (all-channels, hours, u, v, week, ws, ws−1, ws−2, ws−3, ws+1, ws+2, ws+3). "all-channels" reflects the reference point: the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind.
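The perturbation analysis can be sketched as follows; an illustrative take on the idea behind HIPR (Kemp et al. [2007]) where, for simplicity, the "noise" is a random permutation of the channel, and `OneFeatureModel` is a toy stand-in for the trained network. All names are ours.

```python
import numpy as np

class OneFeatureModel:
    """Toy model that only uses the first input column."""
    def predict(self, X):
        return X[:, 0]

def input_importance(model, X, y, n_repeats=10, seed=0):
    """Corrupt one input channel at a time and record how much the NRMSE
    grows relative to the unperturbed baseline."""
    rng = np.random.default_rng(seed)
    nrmse = lambda p, o: float(np.sqrt(np.mean((p - o) ** 2)))
    base = nrmse(model.predict(X), y)
    scores = {}
    for j in range(X.shape[1]):
        errs = []
        for _ in range(n_repeats):
            Xn = X.copy()
            Xn[:, j] = rng.permutation(Xn[:, j])  # destroy the channel's information
            errs.append(nrmse(model.predict(Xn), y))
        scores[j] = float(np.mean(errs)) - base   # importance = error increase
    return scores
```

An unimportant channel leaves the error unchanged, so its score stays near zero.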


4.2.1 Adaptation and Optimization

All training for Expektra's model was performed on a MacBook Pro with an Intel Core 2 Duo processor (P8600, 2.40 GHz) on a 64-bit operating system with 4 GB of installed RAM. After adapting the code to run experiments on the GEFCom dataset, an investigation was performed to evaluate the performance of the core source code provided by Expektra. This investigation identified some slow parts of the code; after fixing these issues, mainly by using a MathNET native library instead of the managed provider, a speed test was performed comparing the unoptimized version with the optimized one. Figure 4.12 shows the speed-up achieved during training using the LM algorithm.

Figure 4.12: Training time of the unoptimized (Normal) vs. the optimized version when training a network with 10 input neurons, 10/15/20/25 hidden neurons, and 1 output neuron. Training was done for 100 epochs using LM.

4.3 Summary

A plot showing the average NRMSE improvement over the persistence model can be seen in figure 4.13. We see that both models are able to perform better than persistence. Expektra's model performs better than the NuPIC model over the whole forecast horizon, and NuPIC performs better than persistence towards the end of the forecast.

Figure 4.13: Summarized average improvement (%) in NRMSE over the persistence model for all wind farms, with 95% confidence intervals, vs. look-ahead time (in hours), for Expektra and NuPIC.
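The persistence baseline and the improvement measure summarized above are simple enough to state in code; this is a sketch of the usual definitions (function names are ours).

```python
def persistence_forecast(series, k):
    """Persistence reference model: the forecast for time t+k is simply the
    last observed value at time t."""
    return series[:-k]

def improvement_over_persistence(model_nrmse, persistence_nrmse):
    """Percentage NRMSE improvement over the persistence model."""
    return 100.0 * (1.0 - model_nrmse / persistence_nrmse)
```

For k-step-ahead evaluation, `persistence_forecast(series, k)` is compared against the shifted observations `series[k:]`.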


Chapter 5

Discussion

With the exponential growth of computer power it becomes easier to study deeper and more complex networks, and with recent advances in the area of deep learning a new interest in ANNs has resurged. In practice, ANNs have been used by most groups in the field of WPF, but these networks never caught on, as it was argued that the improvements made by ANNs were usually not enough to justify the extra effort of training these networks [Giebel et al., 2011]. This is steadily becoming less of an issue as computation becomes cheaper and bigger networks perform better.

By using a simple MLP network, we observe that we are able to obtain results similar in performance to other models published in Hong et al. [2014]. This can be used as a starting point for further investigations. The GEFCom dataset has a limited number of input features; if we had access to a wider range of them, the power of neural networks could be studied more in depth. Given the number of features in the dataset, this is harder to do.

Using persistence as a reference model was done in order to have the same baseline as the one used in GEFCom, but a better reference model should be considered in the future, such as the one presented in Nielsen et al. [1998], especially if longer forecasts are to be considered.

5.1 Method development issues

Working with Expektra's model is very straightforward, and no major issues were encountered during development, but some issues were encountered working with NuPIC.[1]

[1] It should also be pointed out that it helps to have the people who developed the code on the spot, and this is probably the main reason for the little trouble.


1. Installing NuPIC is difficult. This seems to be a general problem, judging by posts on the NuPIC mailing list.[2] A very important issue to fix if they want more people to work with this model.

2. The built-in swarming (PSO) did not work well, especially for slightly larger datasets and for 48 prediction steps. The main problems were fitting everything into memory and that the code ran too slowly when swarming over multiple steps ahead, making it practically impossible to use. Scaling down the problem, by having just a few prediction steps and multiple models, helped, but the swarm model kept dismissing wind speed as an important feature, which was fixed by explicitly telling it not to touch that encoder.

3. Inconsistencies between the white-paper documentation of HTM/CLA and the actual implementation make it hard to understand the underlying principles of NuPIC. NuPIC presents a very complex network, and any inconsistency in the documentation makes it hard to understand. The code is quite well documented, but the additional material is a bit sparse.

These issues are all understandable and are continuously being improved upon. NuPIC is still a young platform, so issues are to be expected given the complexity and research nature of the project. Training the CLAClassifier to use many prediction steps instead of just one will most likely reduce the bias error seen in the results. To investigate this, a more powerful computer is needed.

There are some differences in the inputs between the models. This setup was created because an OPF model needs to encode the value it is going to predict. SCADA data is available, so this is not an issue, but it may introduce some unfairness between NuPIC and Expektra. In the current setup NuPIC does not seem to gain anything extra from the additional information, and other models in the competition will most likely use different inputs as well.

In general, NuPIC is different in the way you feed it data: you send in only the front of the signal, and temporal context is achieved through the temporal memory.

5.2 Future improvements and directions

One issue with the GEFCom dataset is that it does not contain any specific information about the location of the 7 wind farms. It is worth considering taking a look at how to handle different models that are specialized for different types

[2] http://numenta.org/lists


of terrain, as this has been shown to increase performance [Kariniotakis et al., 2006]. Another thing to investigate could be a more advanced schema for hyperparameter selection: Jursa and Rohrig [2008] show that the WPF error can be reduced by smarter use of optimization algorithms for feature selection and hyperparameter selection, which should improve the performance.

In general, having more types of inputs from sources like SCADA and NWP systems could give a lot of valuable information that can be used for better predictions. It is clear that wind speed is a key feature for producing good forecasts, which is also supported by the input analysis in section 4.2. It has been argued that the error in WPF models stems largely from wrong weather forecasts [Giebel et al., 2011], so identifying good sources of weather forecasts would be desirable for more effective model learning. Combining several different weather forecasts could also be investigated, as Nielsen et al. [2007] showed that power forecasts based on a number of different meteorological forecasts were better than those from a single source.

Regarding HTM CLA, it is worth considering implementing a custom encoder for it, one specifically targeted at data concerning wind farms. SCADA data could also be streamed directly into a NuPIC model, which would allow for some interesting online predictions and anomaly detection.

Training times are always an issue, especially for very large models. Both NuPIC and Expektra's model would benefit from being implemented in a more parallel fashion, as these algorithms are very well suited to parallel computing. The speed-up would be of great value, so looking into and implementing a GPU version of the algorithms could be worthwhile.

5.3 Conclusions

The field of energy forecasting is still young, and the necessity for good forecasting techniques is constantly growing. This initial study shows that part of Expektra's forecasting method can be used to forecast wind power generation. We observe that this ANN is able to achieve performance comparable to other methods published in Hong et al. [2014]. Additional work is still needed to obtain state-of-the-art results. Numenta's model is able to beat the reference model but still needs additional work before we can draw definite conclusions about its performance. Constructing a custom encoder could be a first step in that direction.


Bibliography

Subutai Ahmad and Jeff Hawkins. Properties of sparse distributed representations and their application to hierarchical temporal memory. arXiv preprint arXiv:1503.07469, 2015.

T. C. Akinci. Short term wind speed forecasting with ANN in Batman, Turkey. Elektronika ir Elektrotechnika, 107(1):41-45, 2015.

E. Michael Azoff. Neural Network Time Series Forecasting of Financial Markets. John Wiley & Sons, Inc., 1994.

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13(1):281-305, 2012.

Daniel P. Buxhoeveden and Manuel F. Casanova. The minicolumn hypothesis in neuroscience. Brain, 125(5):935-951, 2002.

Erasmo Cadenas and Wilfrido Rivera. Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks. Renewable Energy, 34(1):274-278, 2009.

Dmitri B. Chklovskii, B. W. Mel, and K. Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782-788, 2004.

A. Costa, A. Crespo, J. Navarro, G. Lizcano, H. Madsen, and E. Feitosa. A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews, 12(6):1725-1744, August 2008. ISSN 1364-0321. doi: 10.1016/j.rser.2007.01.015. URL http://dx.doi.org/10.1016/j.rser.2007.01.015.

Russ C. Eberhart and James Kennedy. A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science, volume 1, pages 39-43, New York, NY, 1995.


Shu Fan, James R. Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. Energy Conversion, IEEE Transactions on, 24(2):474-482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento - a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276. Citeseer, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition EWEC 2008, 6 pages. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving Hierarchical Temporal Memory-Based Trading Models. Springer, 2013.

M. A. Gaertner, C. Gallardo, C. Tejeda, N. Martínez, S. Calabria, N. Martínez, and B. Fernández. The Casandra project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: The next generation prediction system. In Proc. of the 2001 European Wind Energy Conference, EWEC'01, Copenhagen, Denmark, pages 777-780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: A literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G. Lobo. Sipreólico - wind power prediction tool for the Spanish peninsular power system. In Proceedings of the CIGRÉ 40th General Session and Exhibition, París, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On Intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357-363, 2014.


Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359-366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41-46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585-592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694-709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference, EWEC'07, 2007.

G. Kariniotakis. Position paper on Joule project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J. Halliday, R. Brownsword, Ignacio Marti, Ana Maria Palomares, I. Cruz, H. Madsen, T. S. Nielsen, Henrik Aa. Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power - overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, 10 pages, 2006.

G. N. Kariniotakis, G. S. Stavrakakis, and E. F. Nogaret. Wind power forecasting using advanced neural networks models. Energy Conversion, IEEE Transactions on, 11(4):762-767, 1996.

Stanley J. Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326-334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275-293, 2009.

Lars Landberg and Simon J. Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171-195, 1994.

S. Lang, C. Mohrlen, J. Jorgensen, B. O. Gallachóir, and E. McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International Solar Energy Society UK Section - Conference C, volume 85, page 89, 2006.


S. M. Lawan, W. A. W. Z. Abidin, W. Y. Chai, A. Baharun, and T. Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760-1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. Smart Grid, IEEE Transactions on, 5(1):501-510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets, 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164-168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313-2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154-3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa. Nielsen, and Torben S. Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475-489, 2005.

Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431-441, 1963.

Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115-133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons. 1969.

Marvin Lee Minsky and Oliver G. Selfridge. Learning in Random Nets. MIT Lincoln Laboratory, 1960.

Jorge J. Moré. The Levenberg-Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105-116. Springer, 1978.

Henrik Aa. Nielsen, Torben S. Nielsen, Henrik Madsen, Maria J. Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471-482, 2007.


Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29-34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power. 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159-167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms, v0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33-57, 2007.

A. Rodrigues, J. A. Peças Lopes, P. Miranda, L. Palma, C. Monteiro, R. Bessa, J. Sousa, C. Rodrigues, and J. Matos. EPREV - a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference EWEC, volume 7, 2007.

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5:3, 1988.

Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85-117, 2015.

S. Sinkevicius, R. Simutis, and V. Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91-96, 2011.

Ke-Sheng Wang, Vishal S. Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61-69, 2014.

WWEA. 2014 half-year report, WWEA, pp. 1-8. Technical report, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365-376, 2013.

Hao Yu and Bogdan M. Wilamowski. Levenberg-Marquardt training. Industrial Electronics Handbook, 5:12-1, 2011.


Appendix A

Hyper-parameters

Parameters for the spatial pooler

Alias                 Default  Description
columnCount           -        The number of cell columns in a cortical region.
globalInhibition      false    If true, during the inhibition phase the winning columns are selected as the most active columns from the region as a whole.
numActivePerInhArea   10       The maximum number of active columns per inhibition area.
synPermActiveInc      0.1      The amount by which an active synapse is incremented in each round. Specified as a percent of a fully grown synapse.
synPermConnected      0.10     Controls the threshold at which synapses are connected.
synPermInactiveDec    0.01     The amount by which an inactive synapse is decremented in each round.
potentialRadius       16       Determines the extent of the input that each column can potentially be connected to.

Table A.1: Configuration parameters for the spatial pooler.


Parameters for the scalar encoder

Alias       Symbol  Description
w           w       Number of bits to set in the output.
minval      vmin    The lower bound of the input value.
maxval      vmax    The upper bound of the input value.
n           n       Number of bits in the representation (n must be > w).
radius      r       Inputs separated by more than or equal to this distance will have non-overlapping representations.
resolution  ψ       Inputs separated by more than or equal to this distance will have different representations.

Table A.2: Configuration parameters for the scalar encoder.
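As an illustration of how these encoder parameters interact, here is a minimal scalar encoder in the spirit of NuPIC's; our own simplified sketch, not the library code: a contiguous block of w active bits out of n, positioned by where the value falls between minval and maxval.

```python
def encode_scalar(value, vmin, vmax, n, w):
    """Encode a scalar as n bits with a contiguous run of w ones whose
    position reflects where value lies in [vmin, vmax]."""
    assert n > w
    value = min(max(value, vmin), vmax)        # clip to the encoder's range
    n_buckets = n - w + 1                      # number of distinct positions
    bucket = int(round((value - vmin) / (vmax - vmin) * (n_buckets - 1)))
    return [1 if bucket <= i < bucket + w else 0 for i in range(n)]
```

Nearby values share most of their active bits, which is what gives the SDR its semantic overlap.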


Parameters for the temporal memory

Alias                  Default  Description
activationThreshold    12       Activation threshold for segments.
cellsPerColumn         32       Number of cells per column.
columnCount            2048     The number of cell columns in a cortical region.
globalDecay            0.10     Decrements all synapses a little bit all the time.
initialPerm            0.11     Initial permanence value for a synapse.
inputWidth             -        Size of the input.
maxAge                 100000   Controls global decay. Global decay will only decay segments that have not been activated for maxAge iterations, and will only do the global decay loop every maxAge iterations.
maxSegmentsPerCell     -        The maximum number of segments a cell can have.
maxSynapsesPerSegment  -        The maximum number of synapses a segment can have.
minThreshold           8        The minimum required activity for a segment to learn.
newSynapseCount        15       The maximum number of synapses added to a segment during learning.
permanenceDec          0.10     How much permanence is removed from synapses when learning occurs.
permanenceInc          0.10     How much permanence is added to synapses when learning occurs.
temporalImp            cpp/py   Controls which temporal memory implementation to use.

Table A.3: Configuration parameters for the temporal memory.

57

Appendix B

Wind characteristics

Figure B.1: Wind characteristics for WF 1 and WF 2. The GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger the dot, the more power.


Figure B.2: Wind characteristics for WF 3-7. The GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger the dot, the more power.


Appendix C

Error Distribution


[Figure C.1: Error distribution for different lead times, WF 1 (NuPIC). Six histogram panels for lead times 1, 10, 20, 30, 40 and 48; x-axis: error from -1.0 to 1.0; y-axis: frequency from 0 to 70.]

[Figure C.2: Error distribution for different lead times, WF 2 (NuPIC). Same panel layout.]

[Figure C.3: Error distribution for different lead times, WF 3 (NuPIC). Same panel layout.]

[Figure C.4: Error distribution for different lead times, WF 4 (NuPIC). Same panel layout.]

[Figure C.5: Error distribution for different lead times, WF 5 (NuPIC). Same panel layout.]

[Figure C.6: Error distribution for different lead times, WF 6 (NuPIC). Same panel layout.]

[Figure C.7: Error distribution for different lead times, WF 7 (NuPIC). Same panel layout.]

[Figure C.8: Error distribution for different lead times, WF 1 (Expektra). Same panel layout.]

[Figure C.9: Error distribution for different lead times, WF 2 (Expektra). Same panel layout.]

[Figure C.10: Error distribution for different lead times, WF 3 (Expektra). Same panel layout.]

[Figure C.11: Error distribution for different lead times, WF 4 (Expektra). Same panel layout.]

[Figure C.12: Error distribution for different lead times, WF 5 (Expektra). Same panel layout.]

[Figure C.13: Error distribution for different lead times, WF 6 (Expektra). Same panel layout.]

[Figure C.14: Error distribution for different lead times, WF 7 (Expektra). Same panel layout.]

List of Figures

2.1 A figure that presents the general outline when forecasting using the statistical approach . . . 6
2.2 A figure that presents the general steps when forecasting using a physical model . . . 7
3.1 The perceptron . . . 20
3.2 Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of H hidden layers, as well as M input connections. Each edge seen in this graph has a wij associated with it . . . 21
3.3 Information flow of a single region predictive model created with the OPF . . . 23
3.4 The CLAClassifier . . . 28
3.5 Training an OPF model . . . 29
4.1 Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) . . . 31
4.2 Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) . . . 32
4.3 Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) . . . 32
4.4 Different error measurements for WF 1 . . . 33
4.5 Different error measurements for WF 2 . . . 34
4.6 Different error measurements for WF 3 . . . 35
4.7 Different error measurements for WF 4 . . . 36
4.8 Different error measurements for WF 5 . . . 37
4.9 Different error measurements for WF 6 . . . 38
4.10 Different error measurements for WF 7 . . . 39
4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point; the reference point is the network when no channel has been exposed to noise. ws = wind speed, hours/week = timestamps, u, v is a directional vector of the wind . . . 41
4.12 Plot showing the unoptimized version vs the optimized when training a network with 10 input neurons, 10/15/20/25 hidden neurons and 1 output neuron. Training was done for 100 epochs using LM . . . 42
4.13 Summarized average improvement over all wind farms, with 95% confidence intervals . . . 43
B.1 Wind characteristics for WF 1 and WF 2. The GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power . . . 59
B.2 Wind characteristics for WF 3-7. The GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power . . . 60
C.1 Error distribution for different lead times, WF 1 . . . 62
C.2 Error distribution for different lead times, WF 2 . . . 63
C.3 Error distribution for different lead times, WF 3 . . . 64
C.4 Error distribution for different lead times, WF 4 . . . 65
C.5 Error distribution for different lead times, WF 5 . . . 66
C.6 Error distribution for different lead times, WF 6 . . . 67
C.7 Error distribution for different lead times, WF 7 . . . 68
C.8 Error distribution for different lead times, WF 1 . . . 69
C.9 Error distribution for different lead times, WF 2 . . . 70
C.10 Error distribution for different lead times, WF 3 . . . 71
C.11 Error distribution for different lead times, WF 4 . . . 72
C.12 Error distribution for different lead times, WF 5 . . . 73
C.13 Error distribution for different lead times, WF 6 . . . 74
C.14 Error distribution for different lead times, WF 7 . . . 75

List of Tables

3.1 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, power observations are available for updating the models . . . 16
3.2 Preprocessed features from the dataset. The Forecast category represents the forecasts given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the feature we will use in training and testing . . . 17
3.3 Example where n = 14, r = 5, ψ = 1 of various encoded scalar values using a ScalarEncoder . . . 24
4.1 NRMSE score of the entries published in [Hong et al. 2014]. The NuPIC model and Expektra model are added so we can easily compare the results . . . 40
A.1 Table containing configuration parameters for the encoder . . . 55
A.2 Table containing configuration parameters for the spatial pooler . . . 56
A.3 Table containing configuration parameters for the temporal memory . . . 57


• Contents
• Introduction
  • Problem Formulation
  • The scope of the problem
• Background
  • Neural Networks and Time Series Prediction
• Method and Materials
  • Preliminaries
    • Remarks
    • Definitions
    • Reference models
    • Error metrics
    • Model selection
    • Evaluation
  • Experiments
  • Holdback Input Randomization
  • Optimization methods
  • Neural Networks
    • Multilayer Perceptron
    • Numenta Platform for Intelligent Computing
• Result
  • Experimental results
  • Input Importance
  • Adaptation and Optimization
  • Summary
• Discussion
  • Method development issues
  • Future improvements and directions
  • Conclusions
• Bibliography
• Appendices
  • Hyper-parameters
  • Wind characteristics
  • Error Distribution
• List of Figures
• List of Tables

CHAPTER 3 METHOD AND MATERIALS

of performance measures that can be used to compare forecasts across systems and locations.

3.1.2 Definitions

Time series

A time series is a sequence X of observations x_t, each with a particular time-stamp t, i.e. X = [x_1, x_2, \dots, x_t], or in short X_{1:t}; it is a time-dependent collection of variables.

Forecast horizon

A multi-step-ahead prediction is the task of predicting k forecasts X_{t+1:t+k} given a collection of historical observations X_{t-p+1:t}. In the case of GEFCom we want a forecast for 1-48 steps ahead.

Point or spot forecast

In this paper we model the forecast \hat{p}_{t+k|t} as a so-called point forecast (or spot forecast), i.e. a single value for each forecast (as opposed to a probability distribution).

Prediction error

The prediction error e for lead time t + k is defined in equation 3.1 as the difference between the forecast and the actual value, where t denotes the time index and k is the look-ahead time; p is the actual (measured, true) wind power and \hat{p} is the predicted wind power:

e_{t+k|t} = p_{t+k} - \hat{p}_{t+k|t}   (3.1)

and the normalized prediction error \varepsilon is seen in equation 3.2:

\varepsilon_{t+k|t} = \frac{1}{p_{inst}} e_{t+k|t} = \frac{1}{p_{inst}} (p_{t+k} - \hat{p}_{t+k|t})   (3.2)

where p_{inst} is the installed capacity of the wind farm (in kW or MW), which is unknown in the competition.

3.1 PRELIMINARIES

3.1.3 Reference models

Persistence (also called a naïve or plain predictor), seen in equation 3.3, is the model most commonly used for benchmarking WPF models. This simple model states that the future wind generation value will be the same as the last measured value:

\hat{p}^{persistence}_{t+k|t} = p_t   (3.3)

An alternative would be to use an even simpler model: a climatology prediction (equation 3.4), i.e. predicting the mean, a value approximated from the training set (see section 3.1.5):

\hat{p}^{mean}_{t+k|t} = \bar{p} = \frac{1}{N} \sum_{t=1}^{N} p_t   (3.4)

There are other reference models, like the one suggested by Nielsen et al. [1998], which has advantages over persistence, but it was never widely adopted and is not used by GEFCom.
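The two reference predictors can be sketched as follows; `persistence` and `climatology` are illustrative helper names, not code from the thesis:

```python
# Sketch of the two reference models, assuming `history` holds the power
# observations p_1..p_t available at forecast time t.

def persistence(history, k):
    """Naive predictor (eq. 3.3): every lead time gets the last measured value p_t."""
    return [history[-1]] * k

def climatology(history, k):
    """Climatology predictor (eq. 3.4): every lead time gets the mean of the history."""
    mean = sum(history) / len(history)
    return [mean] * k

history = [0.2, 0.4, 0.6]
print(persistence(history, 2))  # [0.6, 0.6]
print(climatology(history, 2))  # ≈ [0.4, 0.4]
```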

3.1.4 Error metrics

In order to understand the reason why specific models perform well, it is usually a good idea to evaluate them against a wide variety of different criteria, as is emphasised by Kariniotakis [1997]. The following sections describe the error measures used; in these sections N denotes the size of the test set.

Forecast Bias

The Normalized Forecast Bias (NBIAS) describes the systematic error and is defined in equation 3.5; it is estimated by calculating the average error for each step ahead. It gives an indication of the direction of the error:

NBIAS_k = \frac{1}{N} \sum_{t=1}^{N} \varepsilon_{t+k|t}   (3.5)

This bias is sometimes referred to by some authors as Mean Forecast Error (MFE). A perfect MFE does not mean the forecast is perfect and contains no error, as positive and negative errors cancel each other out, but it gives an indication that the forecasts are on target.


Mean Absolute Error

The Normalized Mean Absolute Error (NMAE) is an error quantity that looks at the average of the absolute error of the prediction and is defined in equation 3.6:

NMAE_k = \frac{1}{N} \sum_{t=1}^{N} |\varepsilon_{t+k|t}|   (3.6)

Another common name used instead of Mean Absolute Error (MAE) is Mean Absolute Deviation (MAD). This value shows the magnitude of the overall error that has occurred due to forecasting and thus should be as small as possible. This error is scale dependent and will be affected by data transformations and the scale of measurements.

Mean Squared Error

The Normalized Mean Squared Error (NMSE) is an error quantity that looks at the average of the squared errors \varepsilon^2_{t+k|t} and is built from the Normalized Sum of Squared Errors (NSSE) defined in equation 3.7:

NSSE_k = \sum_{t=1}^{N} \varepsilon^2_{t+k|t}   (3.7)

NMSE is defined in equation 3.8:

NMSE_k = \frac{1}{N} NSSE_k = \frac{1}{N} \sum_{t=1}^{N} \varepsilon^2_{t+k|t}   (3.8)

In this error, positive and negative errors do not cancel each other out, and large individual errors are penalized more harshly.

Root Mean Squared Error

The Normalized Root Mean Squared Error (NRMSE) is the square root of the NMSE; this error is defined in equation 3.9:

NRMSE_k = NMSE_k^{1/2} = \left( \frac{1}{N} \sum_{t=1}^{N} \varepsilon^2_{t+k|t} \right)^{1/2}   (3.9)

NRMSE is the main metric used in GEFCom, and it shares the same properties as NMSE.
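Equations 3.5-3.9 can be sketched in a few lines; `error_metrics` is an illustrative helper, and the normalization by installed capacity follows equation 3.2:

```python
import math

def error_metrics(actual, predicted, p_inst=1.0):
    """NBIAS, NMAE, NMSE and NRMSE for a single lead time k (eqs. 3.5-3.9).
    `actual` and `predicted` are sequences of p_{t+k} and p-hat_{t+k|t};
    errors are normalized by the installed capacity p_inst (eq. 3.2)."""
    eps = [(a - p) / p_inst for a, p in zip(actual, predicted)]
    n = len(eps)
    nbias = sum(eps) / n                       # eq. 3.5: direction of the error
    nmae = sum(abs(e) for e in eps) / n        # eq. 3.6: magnitude of the error
    nmse = sum(e * e for e in eps) / n         # eq. 3.8: penalizes large errors
    nrmse = math.sqrt(nmse)                    # eq. 3.9: GEFCom's main metric
    return nbias, nmae, nmse, nrmse

nbias, nmae, nmse, nrmse = error_metrics([1.0, 0.5], [0.5, 1.0])
print(nbias, nmae, nrmse)  # 0.0 0.5 0.5
```

Note how the two opposite errors of 0.5 cancel in NBIAS but not in NMAE or NRMSE.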


3.1.5 Model selection

In regression and classification, one of the main issues we are faced with is: "How do we create a good model?" One way to define this "goodness" is to look at the model's generalization ability, i.e. its performance on unseen data.

To make sure the model we are building will generalize to new, unseen data, it is important how we select and build the model in the first place; what we have control over is the data we have at hand and how we use that data to build our model.

Training, testing and validating

In the case of supervised learning we have access to a target series and its associated features, i.e. the production series (our target) with wind speed and wind direction as features.

Common practice within Machine Learning, which is adhered to in this thesis, is to split the whole dataset into 3 smaller subsets. Two of these subsets are used to find a good model (the training set and the validation set) and the remaining test set is used for evaluation, i.e. to estimate how accurate the model we have created would be on "unseen data". With highly flexible models like an artificial neural network we need to be careful not to overfit the data, which is one of the reasons why we have the validation set: we don't want to create a model that doesn't generalize well because it is fitted to every minor variation, i.e. it has captured a lot of noise.

3.1.6 Evaluation

The improvement with respect to a considered reference model ref is defined in equation 3.10:

I^{ref}_{EC,k} = 100 \cdot \frac{EC^{ref}_k - EC_k}{EC^{ref}_k}  (%)   (3.10)

where the Evaluation Criterion (EC) is any error measurement, such as NMSE, NMAE, etc.
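Equation 3.10 translates directly into code; `improvement` is a hypothetical helper name:

```python
def improvement(ec_ref, ec):
    """Improvement in percent over a reference model (eq. 3.10).
    `ec_ref` and `ec` are any evaluation criterion, e.g. NRMSE values."""
    return 100.0 * (ec_ref - ec) / ec_ref

# Halving the reference model's error is a 50% improvement.
print(improvement(0.2, 0.1))  # 50.0
```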


Testing periods

Date          Time    Forecast
2011-01-01    01:00   1-48 hours
2011-01-04    13:00   1-48 hours
2011-01-08    01:00   1-48 hours
2011-01-11    13:00   1-48 hours
...           ...     ...
2012-06-23    01:00   1-48 hours
2012-06-26    13:00   1-48 hours

Table 3.1: The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, power observations are available for updating the models.

3.2 Experiments

Training and testing are structured based on the setup of GEFCom. The dataset spans from midnight on the 1st of July 2009 to noon on the 26th of June 2012. The period from the 1st of July 2009 to the 1st of January 2011 at 01:00 is used for training and validating, while the rest is used for testing. In the testing range a number of 48-hour periods are defined (see table 3.1); in between these testing periods exists additional training data, which enables the models to be updated in between forecasts.

The testing periods repeat every 7 days until the end of the dataset, and only meteorological forecasts that were relevant for the periods with missing power data are given; this was done in order to be consistent.

Meteorological forecasts in the GEFCom dataset consist of zonal u and meridional v components collected for surface winds at 10 m above ground level. They were extracted from the archive of the European Centre for Medium-range Weather Forecasts (ECMWF), a service that issues high-resolution deterministic forecasts twice a day, at 00 Coordinated Universal Time (UTC) and 12 UTC; each of these forecast periods consists of data for 1-48 hours ahead. In order to match the hourly resolution of the power measurements, forecasts were interpolated using cubic splines to an hourly resolution. A summary of the features found in the dataset is given in table 3.2.


No.  Category  Parameter             Alias  Type

1    Date      Date                  date   String
2    Date      Year                  year   Integer
3    Date      Month                 month  Integer
4    Date      Day                   day    Integer
5    Date      Hour                  hours  Integer
6    Date      Week                  week   Integer
7    Forecast  Wind Speed            ws     Real
8    Forecast  Wind Direction (deg)  wd     Real
9    Forecast  Wind U                u      Real
10   Forecast  Wind V                v      Real
11   Forecast  Issued                hp     Integer
12   SCADA     Production            wp     Real

Table 3.2: Preprocessed features from the dataset. The Forecast category represents the forecasts given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the feature we will use in training and testing.


The database for the meteorological forecasts given by the GEFCom dataset contains sections of missing data, each corresponding to a date for which missing power information exists; in a pre-processing step these sections were filled out with the previous best forecast that was available. For example, if we do not have data for the forecast issued at 2011-01-01 12:00, we use data from 2011-01-01 00:00, as a 48-hours-ahead forecast is available for this date. If it were the case that the previous section also contained missing data, we would go back an additional section and use those forecasts. If no forecast is available 48 hours back, we use the best known forecast and extend it in the same fashion as the persistence model.
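The fall-back part of this rule can be sketched as follows, assuming (hypothetically) that forecast issues are keyed by issue hour and arrive every 12 hours; `fill_missing_issue` is an illustrative name:

```python
def fill_missing_issue(issues, t):
    """Return the forecast for issue time t (in hours); if that issue is
    missing, fall back to the most recent earlier issue. Each issue covers
    48 hours ahead, so one or two issues back still covers the lead times."""
    while t not in issues:
        t -= 12  # forecasts are issued twice a day (00 and 12 UTC)
        if t < 0:
            raise KeyError("no earlier forecast issue available")
    return issues[t]
```

For example, with issues at hours 0 and 24 only, a request for the missing issue at hour 36 falls back to the issue at hour 24.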

Any predictions made by these models should fall within a certain range, so any forecast outside this range will be clamped. This is the main post-processing step being done, i.e. there is an upper limit on what can be produced.

3.3 Holdback Input Randomization

The Holdback Input Randomization (HIPR) method, described in Kemp et al. [2007], can be used to investigate the importance of the input parameters. It works by sequentially feeding each data point in the test set to the neural network while replacing the values of one input parameter at a time. The replacement values are uniformly distributed random values in the range the neural network was originally trained on, i.e. (-1, 1). An NRMSE score is calculated for each replacement. The result is information about the relevance of each input.
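A sketch of the procedure under these assumptions (`model` is any callable mapping input rows to predictions; all names are illustrative, not the thesis's code):

```python
import random

def hipr_scores(model, X, y, nrmse, low=-1.0, high=1.0, seed=0):
    """Holdback Input Randomization sketch: replace one input column at a
    time with uniform random values in the training range and record the
    resulting NRMSE. A large score for column j suggests input j matters."""
    rng = random.Random(seed)
    scores = []
    n_inputs = len(X[0])
    for j in range(n_inputs):
        X_rand = [row[:] for row in X]     # copy the test set
        for row in X_rand:
            row[j] = rng.uniform(low, high)  # scramble only column j
        scores.append(nrmse(y, model(X_rand)))
    return scores  # one NRMSE per input parameter
```

A constant model is insensitive to every input, so all of its HIPR scores equal its baseline error.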

3.4 Optimization methods

The two main optimization algorithms that have been used in this thesis are the Particle Swarm Optimization (PSO) algorithm presented by Eberhart and Kennedy [1995] and the Levenberg-Marquardt (LM)² algorithm, independently developed by Kenneth Levenberg and Donald Marquardt [Marquardt 1963; Levenberg 1944]. The PSO algorithm is a population-based stochastic algorithm similar to a Genetic Algorithm (GA) but based on social-psychological principles instead of evolution. It can be summarized by the following steps (each network was trained multiple times in order to avoid local minima):

• Step 1: Initialize particles with random velocities and accelerations.

• Step 2: Determine which particle is closest to the goal.

• Step 3: Adjust accelerations toward that particle.

• Step 4: Update particle positions based on their velocities, and update velocities based on accelerations.

• Step 5: Go to step 2.

²The LM algorithm was used in the beginning of the project, but the results reported ended up using the PSO algorithm. The LM algorithm is included in this section because speed optimization was measured using it.

The LM algorithm is a combination of the Error Back-Propagation (EBP) method [Rumelhart et al. 1988] and the Gauss-Newton method. This algorithm has the speed advantage of Gauss-Newton and the stability of EBP [Yu and Wilamowski 2011]. A detailed treatment of LM can be found in Moré [1978] and of PSO in Poli et al. [2007].

3.5 Neural Networks

3.5.1 Multilayer Perceptron

The perceptron is built around a nonlinear model of a neuron, the McCulloch-Pitts model [McCulloch and Pitts 1943]. It basically consists of a 2-step process where the cell body contains a summation function of the weighted sum of all inputs, including a bias. The perceptron is described by equation 3.11; the sum s is passed through an activation function (see section 3.5.1), which mimics the activation, or firing, of the neuron:

s = \sum_{i=1}^{M} w_i x_i + x_0   (3.11)

w_i is the weight of the "synapse" of the input channel and is the parameter we want to adjust, x_i is the input value and x_0 is the bias; M denotes the number of inputs we have.


[Figure 3.1: The perceptron. Input signals enter weighted connections w together with a bias, are combined in a summation node Σ, and pass through an activation function f(s) to produce the output signal.]

The structure of the MLP consists of many perceptrons, and it is shown in figure 3.2. The input signal flows from the input layer at the bottom to the output layer at the top. We have a bias signal, seen on the left side of the diagram, which is set to a fixed number.

MLPs are fully connected networks, meaning that the neurons in any layer of the network are connected to all the neurons in the previous layer. Each connection in the network has a weight wij associated with it. The initialization of these weights is done before any training, and is achieved by randomly assigning very small values to the respective weights in the graph. The architecture used in this study has only one output variable, i.e. the function we approximate produces forecasts of the power generation given a certain input.
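A forward pass through such a network can be sketched as follows; for brevity the sketch uses a single tanh hidden layer and a linear output, and the weights shown are illustrative, not the thesis's trained values:

```python
import math

def mlp_forward(x, W_hidden, b_hidden, w_out, b_out):
    """Forward pass of a one-hidden-layer MLP: each hidden neuron computes
    tanh of its weighted input sum plus bias (eqs. 3.11 and 3.13), and the
    single output neuron combines the hidden activations linearly."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W_hidden, b_hidden)]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

# Two inputs, two hidden neurons, one linear output (illustrative weights).
y = mlp_forward([0.5, -0.2],
                W_hidden=[[0.1, 0.4], [-0.3, 0.2]], b_hidden=[0.0, 0.1],
                w_out=[0.7, -0.5], b_out=0.05)
```

Because the hidden activations are bounded in (-1, 1), the output here is bounded by the magnitudes of the output weights and bias.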


[Figure 3.2: Architectural graph of the neural network that will produce a single output value. The input layer (hours, u, v, week, ws, ws-1, ws-2, ws, ws+1, ws+2) feeds tanh hidden layers and a linear output layer, with a bias signal on the left. It consists of a collection of hidden neurons in each of H hidden layers, as well as M input connections. Each edge seen in this graph has a wij associated with it.]

The performance of neural networks is generally improved if the data is normalised; if we were to use the original data directly, it could cause a convergence problem. Normalization is done using the mapminmax function seen in equation 3.12:

y = \frac{(y_{max} - y_{min}) \cdot (x - x_{min})}{x_{max} - x_{min}} + y_{min}   (3.12)

y_{max} is the max value of the specified range, which in this case is 1, and y_{min} is -1; x is the value to be scaled, x_{max} is the max value of the numbers to be scaled, and x_{min} is the min value of the numbers to be scaled.
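Equation 3.12 as a function for a single value (mirroring the behaviour of MATLAB's mapminmax with the default (-1, 1) range):

```python
def mapminmax(x, x_min, x_max, y_min=-1.0, y_max=1.0):
    """Scale x linearly from [x_min, x_max] to [y_min, y_max] (eq. 3.12)."""
    return (y_max - y_min) * (x - x_min) / (x_max - x_min) + y_min

# The midpoint of [0, 10] maps to 0, the endpoints to -1 and 1.
print(mapminmax(5.0, 0.0, 10.0))   # 0.0
print(mapminmax(10.0, 0.0, 10.0))  # 1.0
```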

The forecasting model consists of 7 different networks, one for each wind farm; these networks are trained on the first section of the dataset, before the test period. A random 60/20/20 split is created, where 60% of the available data is used for training, 20% is used to validate the network, and 20% is set aside as a hold-out set for the hyperparameters. The input features³ fed into the models are ws, u, v, hours, ws+1, ws+2, ws-1, ws-2, where ws+x and ws-x denote a time shift of x; we can use ws+x as input into the model because ws is a forecast in itself.

³See table 3.2.

Activation Functions

The computation done for each neuron in the multilayer perceptron requires knowledge of the derivative of the activation function; in other words, the activation function we pick needs to be differentiable. In this thesis we use the following two activation functions: the hyperbolic tangent function seen in equation 3.13, and

\[ f(s) = \tanh(s) = \frac{e^{s} - e^{-s}}{e^{s} + e^{-s}} \tag{3.13} \]

the linear transfer function seen in equation 3.14.

\[ f(s) = \begin{cases} +1 & \text{if } s \ge 1 \\ s & \text{if } -1 < s < 1 \\ -1 & \text{if } s \le -1 \end{cases} \tag{3.14} \]
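Both activation functions can be sketched directly from equations 3.13 and 3.14 (the linear transfer function saturates outside [−1, 1], as the piecewise definition above shows):

```python
import math

def tanh_act(s):
    """Hyperbolic tangent activation (eq. 3.13)."""
    return (math.exp(s) - math.exp(-s)) / (math.exp(s) + math.exp(-s))

def satlins(s):
    """Saturating linear transfer function (eq. 3.14)."""
    if s >= 1:
        return 1.0
    if s <= -1:
        return -1.0
    return float(s)

print(tanh_act(0.0), satlins(0.5), satlins(3.2))  # 0.0 0.5 1.0
```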

Hyperparameter optimization

In order to obtain the hyperparameters necessary for each wind farm's model, a random hyperparameter search was performed for all models. A hold-out validation set was used to pick the best hyperparameters. Random search has been shown to work better than grid search when not all parameters are equally important [Bergstra and Bengio, 2012].
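A minimal random-search sketch in the spirit of Bergstra and Bengio [2012]; the search space and the toy objective are illustrative stand-ins for training an MLP and scoring it on the hold-out set:

```python
import random

def random_search(train_and_score, space, n_trials=20, seed=0):
    """Random hyperparameter search: sample each parameter independently
    and keep the setting with the lowest validation error."""
    rng = random.Random(seed)
    best_params, best_err = None, float("inf")
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in space.items()}
        err = train_and_score(params)
        if err < best_err:
            best_params, best_err = params, err
    return best_params, best_err

# Toy objective standing in for "train an MLP, return hold-out NRMSE"
space = {"hidden": [10, 15, 20, 25], "lr": [0.1, 0.01, 0.001]}
objective = lambda p: abs(p["hidden"] - 20) / 100 + p["lr"]
print(random_search(objective, space))
```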

3.5.2 Numenta Platform for Intelligent Computing

This section describes the key principles introduced with NuPIC. It explains in general terms the theory behind the Online Prediction Framework (OPF)4, which uses the CLA and HTM algorithms; the OPF works as an API for creating predictive models. The OPF consists of 5 major types of components: Encoders, Spatial Pooler (SP), Temporal Memory (TM), Temporal Pooler (TP) and Classifiers. Together these components construct a single CLA region5, and figure 3.3 demonstrates the information flow through a single region.

4 The OPF is used with Numenta's commercial product Grok.
5 Currently, models created with the OPF do not use a TP, nor does this client allow creation of a hierarchy of regions, i.e. what we would call an HTM; it is only possible to create one region.


Another central concept in NuPIC is the Sparse Distributed Representation (SDR), which refers to the activation of a small percentage of the neurons at any given time. This neuronal activity is represented as an n-dimensional sparse binary vector x = [b0, ..., bn] with around 2% active cells. The outputs from the spatial pooler and the temporal memory are both SDRs, while the output from the encoder does not enforce this and is just a normal binary vector. A general overview of the properties of SDRs is given by Ahmad and Hawkins [2015].

Encoder → Spatial Pooler → Temporal Memory → Classifier

Figure 3.3: Information flow of a single-region predictive model created with the OPF.

The overlap between two different SDRs is defined by

\[ o(x, y) = x \cdot y \tag{3.15} \]

A match between two SDRs is defined by

\[ m(x, y) = o(x, y) \ge \theta \tag{3.16} \]

where θ is set such that θ ≤ ‖x‖₁ and θ ≤ ‖y‖₁. An interesting property of SDRs, used multiple times inside the temporal memory and especially for predictions, is the fact that a fixed-size set of SDRs can be reliably stored by taking their union as a single pattern. The boolean OR operator is used to create a


value → scalar encoding
1  → 11111000000000
2  → 01111100000000
10 → 00000000011111

Table 3.3: Examples where n = 14, r = 5, ψ = 1 of various scalar values encoded using a ScalarEncoder.

new vector from a set of SDRs. The downside of storing patterns this way is that the more patterns we store, the bigger the probability of false positives.
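The overlap, match, and union operations described above (equations 3.15-3.16 and the OR-union) can be sketched as:

```python
import numpy as np

def overlap(x, y):
    """Overlap between two SDRs (eq. 3.15): count of shared active bits."""
    return int(np.dot(x, y))

def match(x, y, theta):
    """Match predicate (eq. 3.16); theta must not exceed either activity count."""
    return overlap(x, y) >= theta

def union(sdrs):
    """Store a set of SDRs as one pattern with boolean OR; cheap, but
    false-positive matches become more likely as more patterns are added."""
    out = np.zeros_like(sdrs[0])
    for s in sdrs:
        out |= s
    return out

a = np.array([1, 0, 1, 0, 1, 0, 0, 0], dtype=np.int64)
b = np.array([1, 0, 1, 0, 0, 0, 0, 1], dtype=np.int64)
print(overlap(a, b))         # 2
print(match(a, b, theta=2))  # True
print(union([a, b]))         # [1 0 1 0 1 0 0 1]
```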

Encoders

NuPIC contains many different encoders6. The job of an encoder is to convert raw input into a more suitable representation (i.e. a binary vector). The raw inputs are fed into the model using a dictionary data structure.

Useful encoders include scalar and categorical encoders, while more exotic encoders include one for Global Positioning System (GPS) coordinates; that encoder can be used to extract information about anomalous movements. The entries of the dictionary of raw inputs are each encoded separately and concatenated using a multi-encoder.

One property of the spatial pooler is that overlapping input patterns are mapped to the same SDR. This means that we want the encoder to encode inputs so that similar inputs share bits. The ScalarEncoder fulfils this property by the process illustrated in table 3.3.

We calculate the range of values to encode using equation 3.17, where vmin represents the minimum value of the input signal and vmax its upper bound.

\[ v_{\text{range}} = v_{\max} - v_{\min} \tag{3.17} \]

There are three mutually exclusive parameters that determine the overall size of the output from a scalar encoder: (n, r, ψ). n directly represents the total number of bits in the output and must be greater than or equal to w, which represents the number of bits that are set to encode a single value, i.e. the "width" of the output signal7. r and ψ are specified with respect to the input, while w is specified with respect to the output. Two inputs separated by more than the radius have non-overlapping representations. Two inputs separated by less than the radius will in general overlap in at least some of their bits. Two inputs separated by greater than or equal to the resolution are guaranteed to have different representations. ψ can be calculated using equation 3.18.

6 A full list of all encoders can be found in the API documentation for NuPIC.

\[ \psi = \frac{r}{w} \tag{3.18} \]

Depending on whether we want periodic behaviour or not, n is calculated a bit differently; see the scalar encoder for implementation details8.
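A toy non-periodic scalar encoder with the parameters of table 3.3 (a simplified sketch for illustration, not NuPIC's actual ScalarEncoder):

```python
def encode_scalar(value, v_min, v_max, n=14, w=5):
    """Toy non-periodic scalar encoder: a contiguous block of w active bits
    whose position among the n output bits tracks the input value."""
    value = max(v_min, min(v_max, value))  # clip into [v_min, v_max]
    buckets = n - w + 1                    # number of possible block positions
    i = round((value - v_min) * (buckets - 1) / (v_max - v_min))
    return [1 if i <= j < i + w else 0 for j in range(n)]

# Close values share bits; distant values do not (cf. table 3.3)
print("".join(map(str, encode_scalar(1, 1, 10))))   # 11111000000000
print("".join(map(str, encode_scalar(2, 1, 10))))   # 01111100000000
print("".join(map(str, encode_scalar(10, 1, 10))))  # 00000000011111
```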

Spatial Pooler

The spatial pooler receives a binary vector as its input from an encoder and outputs an SDR. The structure of the spatial pooler consists of an input space and a set of columns; the output SDR represents which of the columns in a region are active. Each column has synapses connected to the input space. The SP consists of around 50% randomly and potentially connected synapses, the so-called "potential pool". Each synapse will connect to and disconnect from the input space during learning.

The general information flow of the spatial pooler consists of the following steps. Input stimuli from lower regions lead to activation of the input space to which each mini-column is synaptically connected; this activation, the so-called "overlap score", is computed for each column. The score is calculated as the total sum over the neurons that try to influence that column, weighted by a "boosting factor" that increases the chance for certain columns to win more easily. A final top percentage (usually around 2% activity) of the columns with the highest influence (biggest overlap score) is chosen; columns also inhibit nearby columns, and the result of this process is a binary vector with few active bits, an SDR. This is illustrated in equation 3.19, where b can be either 0 or 1 and s is a value representing the score.

7 w must be odd to avoid centering problems.
8 https://github.com/numenta/nupic


\[ \underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \dots & b_n \end{bmatrix}}_{\text{Input vector}} \cdot \underbrace{\begin{bmatrix} b_{11} & b_{12} & \dots & b_{1n} \\ b_{21} & b_{22} & \dots & b_{2n} \\ \vdots & & & \vdots \\ b_{d1} & b_{d2} & \dots & b_{dn} \end{bmatrix}}_{\text{Connected synapses for each column}} = \underbrace{\begin{bmatrix} s_1 & s_2 & \dots & s_n \end{bmatrix}}_{\text{Overlap score}} \xrightarrow{\text{Inhibition}} \underbrace{\begin{bmatrix} b_1 & b_2 & \dots & b_n \end{bmatrix}}_{\text{Output SDR}} \tag{3.19} \]

Learning in this structure is done by adjusting the permanences of the proximal dendrites to better match the input: for each winning column, the permanence of the synapses that correctly matched the input is increased and the permanence of the rest is decreased. We also increase the boosting factor of losing columns, giving them a bigger chance of winning next time.
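A compact sketch of one spatial-pooler step (overlap scores, boosting, global inhibition at roughly 2% sparsity); the random matrices are illustrative stand-ins, not a trained pooler, and learning is omitted:

```python
import numpy as np

def spatial_pool(input_sdr, synapses, boost, sparsity=0.02):
    """One spatial-pooler step: overlap scores, boosting, global inhibition."""
    overlap = synapses @ input_sdr          # per-column overlap score
    score = overlap * boost                 # boosting helps weak columns win
    k = max(1, int(len(score) * sparsity))  # ~2% of columns stay active
    winners = np.argsort(score)[-k:]        # columns with the highest score
    out = np.zeros(len(score), dtype=np.int64)
    out[winners] = 1
    return out

rng = np.random.default_rng(0)
input_sdr = (rng.random(100) < 0.1).astype(np.int64)
synapses = (rng.random((200, 100)) < 0.5).astype(np.int64)  # ~50% potential pool
boost = np.ones(200)
out = spatial_pool(input_sdr, synapses, boost)
print(out.sum())  # 4  (2% of 200 columns)
```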

Temporal Memory

The temporal memory receives an SDR from the spatial pooler, which represents the active columns; the output of the temporal memory is the activity of the whole CLA region. The general idea of the TM9 is that the cells in each mini-column provide temporal context to the pattern identified by the spatial pooler. The TM's job is to form transitions between different SDRs, and it achieves this by allowing active cells to form connections to previously active cells, so that in a future setting each cell is able to predict its own activity.

Each cell in a region has several distal segments, ideally one for each pattern the cell has transitioned from; in practice these segments are connected to a couple of patterns each. A cell enters a predictive state if there is enough activity on a segment, and every segment consists of a collection of connections, synapses, that have been formed to a subset of previously active cells (typically around 10-15 cells).

The temporal memory receives a sparse binary vector from the spatial pooler, and the general information flow for this part of the CLA goes through two phases. The first phase has the following steps: 1) for each active column, check whether any cells are in a predictive state; if there are, 2) activate those particular cells; if no cell was found in a predictive state, 3) activate all cells in that particular column, a process called bursting.

9 There are a lot of details in how the temporal memory is implemented; pseudo-code and more details can be found in [Numenta 2011]. The nupic git repository is the best source for finer details: https://github.com/numenta/nupic

Bursting is analogous to saying: we do not know which cell to activate because we have not seen this instance of the sequence of patterns before, and since we are unable to put the column into the correct temporal context, we activate all cells to reflect this uncertainty. Bursting does not occur when the temporal context has been predicted by the CLA. Equation 3.20 shows the first phase.

\[ \underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \dots & b_n \end{bmatrix}}_{\text{SP SDR}} \cdot \underbrace{\begin{bmatrix} b_{11} & b_{12} & \dots & b_{1n} \\ b_{21} & b_{22} & \dots & b_{2n} \\ \vdots & & & \vdots \\ b_{d1} & b_{d2} & \dots & b_{dn} \end{bmatrix}}_{\text{Predictive state}} = \underbrace{\begin{bmatrix} b_{11} & b_{12} & \dots & b_{1n} \\ b_{21} & b_{22} & \dots & b_{2n} \\ \vdots & & & \vdots \\ b_{d1} & b_{d2} & \dots & b_{dn} \end{bmatrix}}_{\text{Active state}} \tag{3.20} \]

The second phase of the algorithm figures out which cells should be put into a predictive state for the next time step. This is achieved by checking every distal segment of every cell for activity above a certain threshold, i.e. checking for active segments; one active segment is enough to put a cell into a predictive state. The output from the temporal memory is a vector representing the state of all cells in that region. Equation 3.21 shows phase 2.

\[ \underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \dots & b_n \end{bmatrix}}_{\text{Active state}} \cdot \underbrace{\begin{bmatrix} b_{11} & b_{12} & \dots & b_{1n} \\ b_{21} & b_{22} & \dots & b_{2n} \\ \vdots & & & \vdots \\ b_{d1} & b_{d2} & \dots & b_{dn} \end{bmatrix}_X}_{\text{Segment } X} = \underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \dots & b_n \end{bmatrix}_X}_{\text{Segment activation } X} > \tau \;\rightarrow\; \underbrace{\begin{bmatrix} s_1 & s_2 & s_3 & \dots & s_n \end{bmatrix}}_{\text{Predictive state}} \tag{3.21} \]

If learning is turned on, the permanences of the synapses connected to active distal segments are updated (in the same way as in the spatial pooler). These changes are marked as temporary until we are sure that the cell is in fact able to correctly predict something with the new changes; after that, each change either becomes permanent or is removed. Temporarily marked cells are updated whenever a cell goes from inactive to active from feed-forward input (the permanences are committed, as we correctly predicted the feed-forward activation). If a cell instead went from active to inactive, the changes are undone.
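Phase 1, including bursting, can be sketched as follows (a simplification of the real implementation; segments and permanences are omitted, and the function names are illustrative):

```python
def tm_phase1(active_columns, predictive, cells_per_column):
    """Temporal-memory phase 1: in every active column, activate the
    predicted cells; if none were predicted, burst the whole column."""
    active_cells = set()
    for col in active_columns:
        cells = [(col, i) for i in range(cells_per_column)]
        predicted = [c for c in cells if c in predictive]
        if predicted:
            active_cells.update(predicted)  # temporal context known
        else:
            active_cells.update(cells)      # bursting: context unknown
    return active_cells

# Column 0 was predicted (cell 2); column 1 was not and therefore bursts
predictive = {(0, 2)}
print(sorted(tm_phase1([0, 1], predictive, cells_per_column=4)))
# [(0, 2), (1, 0), (1, 1), (1, 2), (1, 3)]
```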


NuPIC Classifiers

There are many different classifiers included with NuPIC, such as k-Nearest Neighbour (kNN) and a Support Vector Machine (SVM); the OPF in particular uses a custom-built classifier called the "CLAClassifier". The purpose of this classifier is to decode predictions made by the CLA. Classification with the CLAClassifier is performed in different ways depending on how the input has been encoded.

Scalar values are decoded using a process where each cell in a CLA region is paired with two histograms: one histogram keeps track of the frequency of encountered patterns associated with each cell, and the other keeps track of a moving average for each bucket. This process is illustrated in figure 3.4.

[Figure 3.4 diagram: an SDR feeding columns 1 to N; each cell is paired with 2 histograms (pattern likelihood and moving average) spanning the min to max value.]

Figure 3.4: The CLAClassifier.
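A loose sketch of this voting idea (a deliberate simplification, not the actual CLAClassifier; the class and its methods are invented for illustration, and the moving-average histogram is omitted):

```python
from collections import defaultdict

class BitHistogramClassifier:
    """Toy decoder: each SDR bit keeps a histogram over target buckets;
    active bits vote, and the most frequently co-occurring bucket wins."""
    def __init__(self):
        self.hist = defaultdict(lambda: defaultdict(int))

    def learn(self, active_bits, bucket):
        for b in active_bits:
            self.hist[b][bucket] += 1

    def infer(self, active_bits):
        votes = defaultdict(int)
        for b in active_bits:
            for bucket, count in self.hist[b].items():
                votes[bucket] += count
        return max(votes, key=votes.get)

clf = BitHistogramClassifier()
clf.learn({1, 4, 7}, bucket=0)  # pattern A -> low-power bucket
clf.learn({2, 5, 8}, bucket=3)  # pattern B -> high-power bucket
print(clf.infer({1, 4, 7}))     # 0
print(clf.infer({2, 5, 8}))     # 3
```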

Training in NuPIC

Training of the NuPIC model is done using online learning. The schema seen in figure 3.5 is used to train and test these models; we have 7 different wind farms, so this schema is repeated for each wind farm.

This schema is slightly adjusted from the traditional way the OPF handles multi-step predictions. The default way is to use 48 different models, one for every step ahead in the CLAClassifier, which would give us 48 · 7 = 336 models in total (for all the wind farms). Not only does this change match Expektra's approach, but it also reduces the memory requirements of the CLAClassifier.

[Figure 3.5 diagram. Training phase: the dataset feeds a hyperparameter setup (PSO swarming or manual setup) on a pre-training data chunk; the OPF model, with online learning activated, then consumes the training data stream and emits predictions. Testing phase: online learning is deactivated and the OPF model consumes the testing data, emitting multi-step predictions.]

Figure 3.5: Training an OPF model.

Input and hyperparamter selection

Finding hyperparameters for an OPF model was partly done using a custom built-inPSO algorithm and configured manually mainly by ensuring that the PSO-algorithmdid not remove encoders A single CLA region contains a lot of hyper-parametersthese parameters have been included with a corresponding description in appendixA Input to the model are date ws wp u v


Chapter 4

Result

This chapter presents the primary results obtained in this study. Figures 4.4-4.10 contain different error measurements on the test set for each wind farm. We see that Expektra's ANN model performs well over all wind farms, while the NuPIC model performs worse than Expektra but in general still better than our reference model. NuPIC is off target, with a bias error on most wind farms; the graphs also indicate that NuPIC is unable to pick up some trends in the cumulated ε² graph. The Expektra model, on the other hand, shows no clear problems in cumulated ε² and is on target on all wind farms; appendix C has been included to reflect this at different lead times.


Figure 4.1: Left diagram: cumulative probability of wind speed. Right diagram: scatter diagram of the power curve, production vs wind speed. Shown for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


Figure 4.2: Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Shown for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).

Figure 4.3: Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Shown for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


Figure 4.4: Different error measurements (NBIAS, NRMSE, cumulated ε², NMAE) for WF 1, comparing Expektra, NuPIC, and persistence.


Figure 4.5: Different error measurements (NBIAS, NRMSE, cumulated ε², NMAE) for WF 2, comparing Expektra, NuPIC, and persistence.


Figure 4.6: Different error measurements (NBIAS, NRMSE, cumulated ε², NMAE) for WF 3, comparing Expektra, NuPIC, and persistence.


Figure 4.7: Different error measurements (NBIAS, NRMSE, cumulated ε², NMAE) for WF 4, comparing Expektra, NuPIC, and persistence.


Figure 4.8: Different error measurements (NBIAS, NRMSE, cumulated ε², NMAE) for WF 5, comparing Expektra, NuPIC, and persistence.


Figure 4.9: Different error measurements (NBIAS, NRMSE, cumulated ε², NMAE) for WF 6, comparing Expektra, NuPIC, and persistence.


Figure 4.10: Different error measurements (NBIAS, NRMSE, cumulated ε², NMAE) for WF 7, comparing Expektra, NuPIC, and persistence.


                       Wind Farm
User          1     2     3     4     5     6     7     All
Leustagos     0.145 0.138 0.168 0.144 0.158 0.133 0.140 0.146
DuckTile      0.143 0.145 0.172 0.145 0.165 0.137 0.146 0.148
MZ            0.141 0.151 0.174 0.145 0.167 0.141 0.145 0.149
Propeller     0.144 0.153 0.177 0.147 0.175 0.141 0.147 0.152
Duehee Lee    0.157 0.144 0.176 0.160 0.169 0.154 0.148 0.155
Expektra      0.165 0.158 0.184 0.164 0.179 0.153 0.153 0.165
MTU EE5260    0.161 0.172 0.193 0.162 0.192 0.156 0.160 0.168
SunWind       0.174 0.177 0.193 0.176 0.179 0.157 0.162 0.172
ymzsmsd       0.163 0.186 0.200 0.164 0.192 0.162 0.167 0.174
4138 Kalchas  0.180 0.179 0.197 0.175 0.200 0.160 0.165 0.177
NuPIC         0.243 0.254 0.264 0.310 0.290 0.224 0.240 0.264
Persistence   0.302 0.338 0.373 0.364 0.388 0.341 0.361 0.355

Table 4.1: NRMSE scores of the entries published in [Hong et al., 2014]. The NuPIC and Expektra models are added so we can easily compare the results.

4.1 Experimental results

Looking at the primary results presented in this study, we see in table 4.1 that Expektra's ANN model is able to achieve performance comparable to other methods published in Hong et al. [2014]. This can be used as a starting point for further investigations. A similar approach to Expektra's is that of Duehee Lee [Lee and Baldick, 2014], as both methods are based on the same neural network architecture but with different optimization techniques. Duehee Lee's network is structured slightly differently: Expektra's model has only 1 output node, whereas Duehee Lee uses five (representing five predictions ahead). Besides these differences, Duehee Lee has also included additional sub-models to create a prediction ensemble, blending the output of the ANN with a Gaussian Process (GP) to improve very short-term predictions. MTU EE5260 is also a neural network approach that converts meteorological forecasts to power, and here the score is very close. The HTM CLA beats the persistence model but needs more work to outperform the other models in GEFCom.


4.2 Input Importance

For interpretation purposes, in order to understand the model better, an analysis of the relative input parameter importance was performed. This analysis is illustrated in figure 4.11. Each box represents added noise on that channel; noise on an important feature results in a higher NRMSE score. A reference point, "all-channels", represents the error distribution of the model with no input replacement. We clearly see in this figure that the wind speed channel ws is the most important attribute: adding noise to this channel greatly affects the NRMSE score. We also see that the wind components u, v show little to no influence, and the timestamp-related inputs hours and week both indicate that there are seasonal and daily trends present in the dataset.


Figure 4.11: Relative input parameter importance using HIPR for the channels hours, u, v, week, ws, ws±1, ws±2, ws±3. "all-channels" reflects the reference point: the network with no channel exposed to noise. ws = wind speed; hours, week = timestamps; u, v = directional components of the wind.
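The noise-injection analysis above can be sketched as follows (an illustrative reconstruction, not the thesis code; the toy model, the data, and the helper names are assumptions):

```python
import random

def nrmse(y_true, y_pred):
    """Normalized root-mean-square error."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return mse ** 0.5 / (max(y_true) - min(y_true))

def channel_importance(model, X, y, channel, seed=0):
    """Error increase when one input channel is replaced by uniform noise."""
    rng = random.Random(seed)
    X_noisy = [row[:] for row in X]
    for row in X_noisy:
        row[channel] = rng.random()
    base = nrmse(y, [model(r) for r in X])
    noisy = nrmse(y, [model(r) for r in X_noisy])
    return noisy - base

# Toy model that only uses channel 0 ("wind speed"); channel 1 is irrelevant
model = lambda row: 2.0 * row[0]
X = [[0.1 * i, 0.5] for i in range(10)]
y = [2.0 * r[0] for r in X]
print(channel_importance(model, X, y, channel=0) > 0)   # True
print(channel_importance(model, X, y, channel=1) == 0)  # True
```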


4.2.1 Adaptation and Optimization

All training for Expektra's model was performed on a MacBook Pro with an Intel Core 2 Duo processor (P8600, 2.40 GHz) on a 64-bit operating system with 4 GB of installed RAM. After adapting the code to run experiments on the GEFCom dataset, an investigation was performed to evaluate the performance of the core source code provided by Expektra. This investigation identified some slow parts of the code. After fixing these issues, mainly by using a MathNET native library instead of the managed provider, a speed test was performed comparing the unoptimized version with the optimized one. Figure 4.12 shows the speed-up achieved during training with the LM algorithm.


Figure 4.12: Training time of the unoptimized version vs the optimized one when training a network with 10 input neurons; 10, 15, 20, or 25 hidden neurons; and 1 output neuron. Training was done for 100 epochs using LM.

4.3 Summary

A plot showing the average NRMSE improvement over the persistence model can be seen in figure 4.13. We see that both models are able to perform better than persistence: Expektra's model performs better than the NuPIC model over the whole forecast horizon, and NuPIC performs better than persistence towards the end of the forecast.
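The persistence reference and the improvement measure can be sketched as follows (illustrative helpers; the numbers in the usage lines are the "All"-column NRMSE values from table 4.1):

```python
def persistence_forecast(history, horizon):
    """Persistence reference model: every lead time gets the last observation."""
    return [history[-1]] * horizon

def improvement(ref_err, model_err):
    """Percentage improvement in error over the reference model."""
    return 100.0 * (ref_err - model_err) / ref_err

print(persistence_forecast([0.2, 0.4, 0.35], horizon=3))  # [0.35, 0.35, 0.35]
print(round(improvement(ref_err=0.355, model_err=0.165), 1))  # 53.5
```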


Figure 4.13: Summarized average improvement (% NRMSE) over all wind farms, with 95% confidence intervals.


Chapter 5

Discussion

With the exponential growth of computing power it becomes easier to study deeper and more complex networks, and with recent advances in the area of deep learning, interest in ANNs has resurged. In practice, ANNs have been used by most groups in the field of WPF, but these networks never caught on, as it was argued that the improvements made by ANNs were usually not enough for the extra effort of training them [Giebel et al., 2011]. This is steadily becoming less of an issue as computation becomes cheaper and bigger networks perform better.

By using a simple MLP network, we observe that we are able to obtain results similar in performance to other models published in Hong et al. [2014]. This can be used as a starting point for further investigations. The GEFCom dataset has a limited number of input features; if we had access to a wider range of them, the power of neural networks could be studied more in depth. Given the number of features in the dataset, this is harder to do.

Using persistence as a reference model was done in order to have the same baseline as the one used in GEFCom, but a better reference model should be considered in the future, such as the one presented in Nielsen et al. [1998], especially if longer forecasts are to be considered.

5.1 Method development issues

Working with Expektra's model is very straightforward, and no major issues were encountered during development, but some issues were encountered working with NuPIC1:

1 It should also be pointed out that it helps to have the people who developed the code on the spot; this is probably the main reason for the little trouble.


1. Installing NuPIC is difficult. This seems to be a general problem, judging by posts in the nupic mailing list2, and a very important thing to fix if they want more people to work with this model.

2. Using the built-in swarming (PSO) did not work well, especially for slightly larger datasets and for 48 prediction steps. The main problems were fitting everything into memory and that the code ran too slowly when swarming over multiple steps ahead, making it practically impossible to use. Scaling down the problem, by using just a few prediction steps and multiple models, helped, but the swarm model kept dismissing wind speed as an important feature, which was fixed by explicitly telling it not to touch that encoder.

3. Inconsistencies between the white-paper documentation of the HTM/CLA and the actual implementation make it hard to understand the underlying principles of NuPIC. NuPIC presents a very complex network, and any inconsistencies in the documentation make it hard to understand. The code is quite well documented, but the additional material is a bit sparse.

These issues are all understandable and are continuously being improved upon. NuPIC is still a young platform, so finding issues is expected given the complexity and research nature of the project. Training the CLAClassifier to use many-step predictions instead of just one will most likely reduce the bias error seen in the results; to investigate this, a more powerful computer is needed.

There are some differences in the inputs between the models. This setup was created because an OPF model needs to encode the value it is going to predict. SCADA data is available, so this is not an issue, but it may introduce some unfairness between NuPIC and Expektra. In the current setup NuPIC does not seem to gain anything extra from the additional information, and other models in the competition will most likely use different inputs as well.

In general, NuPIC is different in the way you feed data: you send in only the front of the signal, and temporal context is achieved through the temporal memory.

5.2 Future improvements and directions

One issue with the GEFCom dataset is that it does not contain any specific information about the location of the 7 wind farms. It is worth considering how to handle different models specialized for different types of terrain, as this has been shown to increase performance [Kariniotakis et al., 2006]. Another thing to investigate could be a more advanced schema for hyperparameter selection: Jursa and Rohrig [2008] show that the WPF error can be reduced by smarter use of optimization algorithms for feature selection and hyperparameter selection, which should improve the performance.

2 http://numenta.org/lists

In general, having more types of inputs from sources like SCADA and NWP systems could give a lot of valuable information for better predictions. It is clear that wind speed is a key feature for producing good forecasts, which is also supported by the input analysis in section 4.2. It has been argued that the error in WPF models stems largely from wrong weather forecasts [Giebel et al., 2011], so identifying good sources of weather forecasts would be desirable for more effective model learning. Combining several different weather forecasts could also be investigated, as Nielsen et al. [2007] showed that power forecasts based on a number of different meteorological forecasts were better than those based on a single source.

Regarding the HTM CLA, it is worth considering implementing a custom encoder: one specifically targeted at data concerning wind farms. SCADA data could also be streamed directly into a NuPIC model, which would allow for some interesting online predictions and anomaly detections.

Training times are always an issue, especially for very large models. Both NuPIC and Expektra's model would benefit from being implemented in a more parallel fashion, as these algorithms are very well suited for parallel computing. The speed-up would be of great value, so looking into and implementing a GPU version of the algorithms could be worthwhile.

5.3 Conclusions

The field of energy forecasting is still young, and the necessity for good forecasting techniques is constantly growing. This initial study shows that part of Expektra's forecasting method can be used to forecast wind power generation. We observe that this ANN is able to achieve performance comparable to other methods published in Hong et al. [2014]. Additional work is still needed to obtain state-of-the-art results. Numenta's model is able to beat the reference model but needs additional work before we can draw definite conclusions about its performance; constructing a custom encoder could be a first step in that direction.


Bibliography

Subutai Ahmad and Jeff Hawkins Properties of sparse distributed representa-tions and their application to hierarchical temporal memory arXiv preprintarXiv150307469 2015

TC Akinci Short term wind speed forecasting with ann in batman turkey Elektron-ika ir Elektrotechnika 107(1)41ndash45 2015

E Michael Azoff Neural network time series forecasting of financial markets JohnWiley amp Sons Inc 1994

James Bergstra and Yoshua Bengio Random search for hyper-parameter optimiza-tion The Journal of Machine Learning Research 13(1)281ndash305 2012

Daniel P Buxhoeveden and Manuel F Casanova The minicolumn hypothesis inneuroscience Brain 125(5)935ndash951 2002

Erasmo Cadenas and Wilfrido Rivera Short term wind speed forecasting in laventa oaxaca meacutexico using artificial neural networks Renewable Energy 34(1)274ndash278 2009

Dmitri B Chklovskii BW Mel and K Svoboda Cortical rewiring and informationstorage Nature 431(7010)782ndash788 2004

A Costa A Crespo J Navarro G Lizcano H Madsen and E Feitosa A reviewon the young history of the wind power short-term prediction Renewable andSustainable Energy Reviews 12(6)1725ndash1744 August 2008 ISSN 13640321 doi101016jrser200701015 URL httpdxdoiorg101016jrser200701015

Russ C Eberhart and James Kennedy A new optimizer using particle swarm theoryIn Proceedings of the sixth international symposium on micro machine and humanscience volume 1 pages 39ndash43 New York NY 1995

49

BIBLIOGRAPHY

Shu Fan, James R. Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. Energy Conversion, IEEE Transactions on, 24(2):474–482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento – a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276. Citeseer, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition, EWEC 2008. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving hierarchical temporal memory-based trading models. Springer, 2013.

M. A. Gaertner, C. Gallardo, C. Tejeda, N. Martínez, S. Calabria, N. Martínez, and B. Fernández. The Casandra project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. In Proc. of the 2001 European Wind Energy Conference, EWEC'01, Copenhagen, Denmark, pages 777–780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: a literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G. Lobo. Sipreólico – wind power prediction tool for the Spanish peninsular power system. In Proceedings of the CIGRÉ 40th General Session and Exhibition, Paris, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On Intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357–363, 2014.


Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41–46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585–592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694–709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference, EWEC'07, 2007.

G. Kariniotakis. Position paper on Joule project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J. Halliday, R. Brownsword, Ignacio Marti, Ana Maria Palomares, I. Cruz, H. Madsen, T. S. Nielsen, Henrik Aa. Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power: overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, 2006.

G. N. Kariniotakis, G. S. Stavrakakis, and E. F. Nogaret. Wind power forecasting using advanced neural networks models. Energy Conversion, IEEE Transactions on, 11(4):762–767, 1996.

Stanley J. Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326–334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275–293, 2009.

Lars Landberg and Simon J. Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171–195, 1994.

S. Lang, C. Mohrlen, J. Jorgensen, B. O. Gallachóir, and E. McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International Solar Energy Society UK Section Conference, volume 85, page 89, 2006.


S. M. Lawan, W. A. W. Z. Abidin, W. Y. Chai, A. Baharun, and T. Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760–1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. Smart Grid, IEEE Transactions on, 5(1):501–510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets, 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164–168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313–2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154–3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa. Nielsen, and Torben S. Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475–489, 2005.

Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431–441, 1963.

Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons. 1969.

Marvin Lee Minsky and Oliver G. Selfridge. Learning in random nets. MIT Lincoln Laboratory, 1960.

Jorge J. Moré. The Levenberg-Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105–116. Springer, 1978.

Henrik Aa. Nielsen, Torben S. Nielsen, Henrik Madsen, Maria J. Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471–482, 2007.


Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29–34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power, 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159–167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms, v0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33–57, 2007.

A. Rodrigues, J. A. Peças Lopes, P. Miranda, L. Palma, C. Monteiro, R. Bessa, J. Sousa, C. Rodrigues, and J. Matos. EPREV – a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference, EWEC, volume 7, 2007.

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5:3, 1988.

Jürgen Schmidhuber. Deep learning in neural networks: an overview. Neural Networks, 61:85–117, 2015.

S. Sinkevicius, R. Simutis, and V. Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91–96, 2011.

Ke-Sheng Wang, Vishal S. Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61–69, 2014.

WWEA. 2014 half-year report. Technical report, WWEA, pages 1–8, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365–376, 2013.

Hao Yu and Bogdan M. Wilamowski. Levenberg-Marquardt training. Industrial Electronics Handbook, 5:12-1, 2011.


Appendix A

Hyper-parameters

Parameters for the spatial pooler

Alias Default Description

columnCount - The number of cell columns in a corti-cal region

globalInhibition false If true inhibition phase of the winningcolumns are selected as the most activecolumns from the region as a whole

numActivePerInhArea 10 The maximum number of activecolumns per inhibition area

synPermActiveInc 01 The amount by which an activesynapse is incremented in each roundSpecified as a percent of a fully grownsynapse

synPermConnected 010 Controls the threshold of whensynapses are connected

synPermInactiveDec 001 The amount by which an inactivesynapse is decremented in each round

potentialRadius 16 This parameter determines the extentof the input that each column can po-tentially be connected to

Table A1 Table containing configuration parameters for the encoder


Parameters for the scalar encoder

Alias       Symbol  Description
w           w       Number of bits to set in the output.
minval      v_min   The lower bound of the input value.
maxval      v_max   The upper bound of the input value.
n           n       Number of bits in the representation (n must be > w).
radius      r       Inputs separated by more than or equal to this distance will have non-overlapping representations.
resolution  ψ       Inputs separated by more than or equal to this distance will have different representations.

Table A.2: Configuration parameters for the scalar encoder.
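To make these parameters concrete, here is a minimal stand-alone sketch of how a scalar encoder of the kind described by Table A.2 maps a value into an n-bit output with w consecutive active bits. This is an illustrative re-implementation, not the NuPIC `ScalarEncoder` API; the function name and the bucket arithmetic are assumptions.

```python
# Illustrative sketch of a scalar encoder: a value in [minval, maxval] is
# mapped to an n-bit vector with a block of w consecutive 1-bits whose
# position encodes the value. Not the NuPIC ScalarEncoder API.

def encode_scalar(value, minval, maxval, n, w):
    """Encode `value` into n bits with a block of w 1-bits (n must be > w)."""
    assert n > w, "n must be greater than w"
    value = max(minval, min(maxval, value))         # clip into [minval, maxval]
    buckets = n - w + 1                             # possible block positions
    resolution = (maxval - minval) / (buckets - 1)  # input distance per bucket
    start = int(round((value - minval) / resolution))
    bits = [0] * n
    for i in range(start, start + w):
        bits[i] = 1
    return bits

print(encode_scalar(0.0, 0.0, 10.0, n=14, w=5))   # 1s at positions 0-4
print(encode_scalar(10.0, 0.0, 10.0, n=14, w=5))  # 1s at positions 9-13
```

Nearby values share many active bits, which is what gives the spatial pooler overlapping representations to work with.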


Alias                  Default  Description
activationThreshold    12       Activation threshold for segments.
cellsPerColumn         32       Number of cells per column.
columnCount            2048     The number of cell columns in a cortical region.
globalDecay            0.10     Decrements all synapses a little bit all the time.
initialPerm            0.11     Initial permanence value for a synapse.
inputWidth             -        Size of the input.
maxAge                 100000   Controls global decay: global decay will only decay segments that have not been activated for maxAge iterations, and the global decay loop runs only every maxAge iterations.
maxSegmentsPerCell     -        The maximum number of segments a cell can have.
maxSynapsesPerSegment  -        The maximum number of synapses a segment can have.
minThreshold           8        The minimum required activity for a segment to learn.
newSynapseCount        15       The maximum number of synapses added to a segment during learning.
permanenceDec          0.10     How much permanence is removed from synapses when learning occurs.
permanenceInc          0.10     How much permanence is added to synapses when learning occurs.
temporalImp            cpp/py   Controls which temporal memory implementation to use.

Table A.3: Configuration parameters for the temporal memory.


Appendix B

Wind characteristics

Figure B.1: Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components. The intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Figure B.2: Wind characteristics for WF 3-7, GEFCom dataset, showing zonal and meridional wind components. The intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Appendix C

Error Distribution

[Figure C.1: Error distribution for different lead times, WF 1 (NuPIC model). Each figure in this appendix shows error histograms for lead times 48, 40, 30, 20, 10, and 1.]

[Figure C.2: Error distribution for different lead times, WF 2 (NuPIC model).]

[Figure C.3: Error distribution for different lead times, WF 3 (NuPIC model).]

[Figure C.4: Error distribution for different lead times, WF 4 (NuPIC model).]

[Figure C.5: Error distribution for different lead times, WF 5 (NuPIC model).]

[Figure C.6: Error distribution for different lead times, WF 6 (NuPIC model).]

[Figure C.7: Error distribution for different lead times, WF 7 (NuPIC model).]

[Figure C.8: Error distribution for different lead times, WF 1 (Expektra model).]

[Figure C.9: Error distribution for different lead times, WF 2 (Expektra model).]

[Figure C.10: Error distribution for different lead times, WF 3 (Expektra model).]

[Figure C.11: Error distribution for different lead times, WF 4 (Expektra model).]

[Figure C.12: Error distribution for different lead times, WF 5 (Expektra model).]

[Figure C.13: Error distribution for different lead times, WF 6 (Expektra model).]

[Figure C.14: Error distribution for different lead times, WF 7 (Expektra model).]

List of Figures

2.1  A figure that presents the general outline when forecasting using the statistical approach.
2.2  A figure that presents the general steps when forecasting using a physical model.
3.1  The perceptron.
3.2  Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of the H hidden layers, as well as M input connections. Each edge seen in this graph has a weight w_ij associated with it.
3.3  Information flow of a single-region predictive model created with the OPF.
3.4  The CLAClassifier.
3.5  Training an OPF model.
4.1  Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs. wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).
4.2  Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).
4.3  Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).
4.4  Different error measurements for WF 1.
4.5  Different error measurements for WF 2.
4.6  Different error measurements for WF 3.
4.7  Different error measurements for WF 4.
4.8  Different error measurements for WF 5.
4.9  Different error measurements for WF 6.
4.10 Different error measurements for WF 7.
4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point: the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind.
4.12 Plot showing the unoptimized version vs. the optimized when training a network with 10 input neurons, 10/15/20/25 hidden neurons, and 1 output neuron. Training was done for 100 epochs using LM.
4.13 Summarized average improvement over all wind farms, with 95% confidence intervals.
B.1  Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.
B.2  Wind characteristics for WF 3-7, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.
C.1  Error distribution for different lead times, WF 1.
C.2  Error distribution for different lead times, WF 2.
C.3  Error distribution for different lead times, WF 3.
C.4  Error distribution for different lead times, WF 4.
C.5  Error distribution for different lead times, WF 5.
C.6  Error distribution for different lead times, WF 6.
C.7  Error distribution for different lead times, WF 7.
C.8  Error distribution for different lead times, WF 1.
C.9  Error distribution for different lead times, WF 2.
C.10 Error distribution for different lead times, WF 3.
C.11 Error distribution for different lead times, WF 4.
C.12 Error distribution for different lead times, WF 5.
C.13 Error distribution for different lead times, WF 6.

C.14 Error distribution for different lead times, WF 7.

List of Tables

3.1  The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, power observations are available for updating the models.
3.2  Preprocessed features from the dataset. The Forecast category represents the forecast given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the feature we will use in training and testing.
3.3  Example where n = 14, r = 5, ψ = 1, of various encoded scalar values using a ScalarEncoder.
4.1  NRMSE scores of the entries published in [Hong et al., 2014]. The NuPIC model and Expektra model are added so we can easily compare the results.
A.1  Configuration parameters for the spatial pooler.
A.2  Configuration parameters for the scalar encoder.
A.3  Configuration parameters for the temporal memory.


3.1 PRELIMINARIES

3.1.3 Reference models

Persistence (also called a naïve or plain predictor), as seen in equation 3.3, is the model most commonly used for benchmarking WPF models. This simple model states that the future wind generation value will be the same as the last measured value:

\hat{p}^{\text{persistence}}_{t+k|t} = p_t \quad (3.3)

An alternative would be to use an even simpler model: a climatology prediction (equation 3.4), i.e. predicting the mean, a value approximated from the training set (see section 3.1.5):

\hat{p}^{\text{mean}}_{t+k|t} = \bar{p} = \frac{1}{N} \sum_{t=1}^{N} p_t \quad (3.4)

There are other reference models, like the one suggested by Nielsen et al. [1998], which has advantages over persistence, but it was never widely adopted and is not used by GEFCom.
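The two reference models above are straightforward to implement. The sketch below assumes a normalized power series stored as a plain list; the function names are illustrative, not taken from the thesis code.

```python
# Sketch of the two reference models in equations 3.3 and 3.4.

def persistence_forecast(p, t, k):
    """Eq. 3.3: the forecast for t+k is the last measured value p_t."""
    return p[t]  # independent of the horizon k

def climatology_forecast(p_train):
    """Eq. 3.4: the forecast is the mean of the training series."""
    return sum(p_train) / len(p_train)

p = [0.2, 0.4, 0.3, 0.5]
print(persistence_forecast(p, t=3, k=12))  # 0.5, regardless of k
print(climatology_forecast(p))             # ~0.35
```

Persistence is hard to beat at very short horizons, while the climatology mean becomes the stronger baseline as the lead time grows.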

3.1.4 Error metrics

In order to understand why specific models perform well, it is usually a good idea to evaluate them against a wide variety of different criteria, as is emphasised by Kariniotakis [1997]. The following sections describe the error measures used; throughout, N denotes the size of the test set.

Forecast Bias

The Normalized Forecast Bias (NBIAS) describes the systematic error and is defined in equation 3.5; it is estimated by calculating the average error for each step ahead, and it gives an indication of the direction of the error:

\text{NBIAS}_k = \frac{1}{N} \sum_{t=1}^{N} \varepsilon_{t+k|t} \quad (3.5)

This bias is sometimes referred to by some authors as the Mean Forecast Error (MFE). A perfect MFE does not mean the forecast is perfect and contains no error, as positive and negative errors cancel each other out, but it gives an indication that the forecasts are centred on the target.


CHAPTER 3 METHOD AND MATERIALS

Mean Absolute Error

The Normalized Mean Absolute Error (NMAE) looks at the average of the absolute error of the prediction and is defined in equation 3.6:

\text{NMAE}_k = \frac{1}{N} \sum_{t=1}^{N} \left|\varepsilon_{t+k|t}\right| \quad (3.6)

Another common name used instead of Mean Absolute Error (MAE) is Mean Absolute Deviation (MAD). This value shows the magnitude of the overall forecasting error and thus should be as small as possible. This error is scale dependent and will be affected by data transformations and the scale of the measurements.

Mean Squared Error

The Normalized Mean Squared Error (NMSE) looks at the average of the squared errors ε²_{t+k|t} and is built from the Normalized Sum of Squared Errors (NSSE), defined in equation 3.7:

\text{NSSE}_k = \sum_{t=1}^{N} \varepsilon^2_{t+k|t} \quad (3.7)

NMSE is defined in equation 3.8:

\text{NMSE}_k = \frac{1}{N}\,\text{NSSE}_k = \frac{1}{N} \sum_{t=1}^{N} \varepsilon^2_{t+k|t} \quad (3.8)

In this error, positive and negative errors do not cancel each other out, and large individual errors are penalized more harshly.

Root Mean Squared Error

The Normalized Root Mean Squared Error (NRMSE) is the square root of the NMSE; it is defined in equation 3.9:

\text{NRMSE}_k = \text{NMSE}_k^{1/2} = \left(\frac{1}{N} \sum_{t=1}^{N} \varepsilon^2_{t+k|t}\right)^{1/2} \quad (3.9)

NRMSE is the main metric used in GEFCom, and it shares the same properties as NMSE.
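The metrics above can be sketched as follows, assuming the normalized errors ε_{t+k|t} for a fixed lead time k are already collected in a list; the function names are illustrative.

```python
# Sketch of the error metrics in equations 3.5-3.9.

import math

def nbias(errors):
    """Eq. 3.5: average signed error (systematic error)."""
    return sum(errors) / len(errors)

def nmae(errors):
    """Eq. 3.6: average absolute error."""
    return sum(abs(e) for e in errors) / len(errors)

def nmse(errors):
    """Eq. 3.8: average squared error."""
    return sum(e * e for e in errors) / len(errors)

def nrmse(errors):
    """Eq. 3.9: square root of the NMSE (the main GEFCom metric)."""
    return math.sqrt(nmse(errors))

errors = [0.1, -0.1, 0.2, -0.2]
print(nbias(errors))  # 0.0: positive and negative errors cancel out
print(nmae(errors))   # ~0.15: the cancellation disappears
print(nrmse(errors))  # ~0.158: large errors weigh more heavily
```

The example illustrates why a zero bias alone says little: the same error list gives a clearly nonzero NMAE and NRMSE.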


3.1.5 Model selection

In regression and classification, one of the main issues we are faced with is: "How do we create a good model?" One way to define this "goodness" is to look at the model's generalization ability, i.e. its performance on unseen data.

To make sure the model we are building will generalize to new, unseen data, it matters how we select and build the model in the first place; what we have control over is the data we have at hand and how we use that data to build our model.

Training, testing and validating

In the case of supervised learning we have access to a target series and its associated features, i.e. the production series (our target) with wind speed and wind direction as features.

Common practice within Machine Learning, which is adhered to in this thesis, is to split the whole dataset into three smaller subsets. Two of these subsets are used to find a good model (the training set and the validation set), and the remaining one (the test set) is used for evaluation, i.e. to estimate how accurate the model would be on "unseen data". With highly flexible models like an artificial neural network we need to be careful not to overfit the data, which is one of the reasons why we have the validation set: we don't want a model that generalizes poorly because it has been fitted to every minor variation, i.e. it has captured a lot of noise.
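The chronological three-way split described above can be sketched as follows. The 60/20/20 proportions are illustrative assumptions; the thesis splits on fixed dates instead.

```python
# Sketch of a chronological train/validation/test split for a time series.

def chronological_split(series, train_frac=0.6, val_frac=0.2):
    """Split a time series without shuffling, so the test set is
    genuinely 'unseen' future data."""
    n = len(series)
    i = int(n * train_frac)
    j = int(n * (train_frac + val_frac))
    return series[:i], series[i:j], series[j:]

data = list(range(10))
train, val, test = chronological_split(data)
print(train, val, test)  # [0, 1, 2, 3, 4, 5] [6, 7] [8, 9]
```

Keeping the split chronological matters for forecasting: shuffling would leak future information into the training set.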

3.1.6 Evaluation

The improvement with respect to a considered reference model ref is defined in equation 3.10:

I^{\text{ref}}_{\text{EC},k} = 100 \cdot \frac{\text{EC}^{\text{ref}}_k - \text{EC}_k}{\text{EC}^{\text{ref}}_k} \ (\%) \quad (3.10)

where the Evaluation Criterion (EC) is any error measurement, such as NMSE or NMAE.
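Equation 3.10 can be sketched as a one-liner; the NRMSE values in the example are made up.

```python
# Sketch of the improvement criterion in equation 3.10.

def improvement(ec_ref, ec_model):
    """Percentage improvement of a model over a reference model,
    where both are measured with the same evaluation criterion EC."""
    return 100.0 * (ec_ref - ec_model) / ec_ref

# Persistence NRMSE 0.30 vs. candidate model NRMSE 0.24:
print(improvement(0.30, 0.24))  # ~20.0 (% improvement over the reference)
```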


Testing period

Date        Time   Forecast
2011-01-01  01:00  1-48 hours
2011-01-04  13:00  1-48 hours
2011-01-08  01:00  1-48 hours
2011-01-11  13:00  1-48 hours
...
2012-06-23  01:00  1-48 hours
2012-06-26  13:00  1-48 hours

Table 3.1: The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, power observations are available for updating the models.

32 ExperimentsTraining and testing is structured based on the structure of the GEFCom The datasetspans from midnight of 1st of July 2009 to noon the 26th of June 2012 The periodfrom 1st of July 2009 to 1st January 2011 at 0100 is used for training and validatingwhile rest is used for testing In the testing range a number of 48-hour periods aredefined (See table 31) in-between these testing periods exists additional trainingdata which enables the models to be updated in-between forecasts

The testing periods repeat every 7 days until the end of the dataset, and only meteorological forecasts that were relevant for the periods with missing power data are given; this was done in order to be consistent.

Meteorological forecasts in the GEFCom dataset consist of zonal u and meridional v components collected for surface winds at 10 m above ground level. They were extracted from the archive of the European Centre for Medium-range Weather Forecasts (ECMWF), a service that issues high-resolution deterministic forecasts twice a day, at 00 Coordinated Universal Time (UTC) and at 12 UTC; each of these forecast periods consists of data for 1-48 hours ahead. In order to match the hourly resolution of the power measurements, forecasts were interpolated using cubic splines to an hourly resolution. A summary of the features found in the dataset is given in table 3.2.
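A sketch of this interpolation step using SciPy's cubic splines; the wind speeds are made up, and the 3-hourly spacing of the raw forecast is an assumption for illustration:

```python
# Interpolate a coarse NWP wind-speed forecast to hourly resolution with
# a cubic spline. The 3-hourly step and the values are illustrative only.
import numpy as np
from scipy.interpolate import CubicSpline

coarse_hours = np.arange(0, 49, 3)             # forecast lead times, every 3 h
coarse_ws = 8 + 2 * np.sin(coarse_hours / 6)   # made-up wind speeds (m/s)

spline = CubicSpline(coarse_hours, coarse_ws)
hourly_hours = np.arange(0, 49)                # 1-hour resolution
hourly_ws = spline(hourly_hours)

# The spline passes exactly through the original forecast points.
assert np.allclose(hourly_ws[::3], coarse_ws)
```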


No  Category  Parameter             Alias  Type
1   Date      Date                  date   String
2   Date      Year                  year   Integer
3   Date      Month                 month  Integer
4   Date      Day                   day    Integer
5   Date      Hour                  hours  Integer
6   Date      Week                  week   Integer
7   Forecast  Wind Speed            ws     Real
8   Forecast  Wind Direction (deg)  wd     Real
9   Forecast  Wind U                u      Real
10  Forecast  Wind V                v      Real
11  Forecast  Issued                hp     Integer
12  SCADA     Production            wp     Real

Table 3.2: Preprocessed features from the dataset. The Forecast category represents the forecasts given by the NWP; multiple forecasts will exist for a given date. The latest issued forecast available is the one we use in training and testing.


The database for the meteorological forecasts given by the GEFCom dataset contains sections of missing data, each corresponding to a date for which the power information is also missing; these sections were filled out with the previous best available forecast in a pre-processing step. For example, if we do not have data for the forecast issued at 2011-01-01 12:00, we use data from 2011-01-01 00:00, as a 48-hours-ahead forecast is available for that date. If the previous section also contained missing data, we would go back an additional section and use those forecasts. If no forecast is available 48 hours back, we can use the best known forecast and extend it in the same fashion as the persistence model.
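The fallback rule can be sketched as follows; the 12-hour issue step matches the twice-daily ECMWF issues, and the dictionary stands in for the real forecast database (all names and times here are hypothetical):

```python
# Sketch of the fill-in rule: if the forecast issued at a given time is
# missing, step back one issue period (12 h) at a time and reuse the most
# recent forecast whose 48 h horizon may still cover the target times.
from datetime import datetime, timedelta

ISSUE_STEP = timedelta(hours=12)

def best_available_forecast(issued, forecasts, max_steps=4):
    """Return the forecast issued at `issued`, falling back to earlier issues."""
    t = issued
    for _ in range(max_steps + 1):
        if t in forecasts:
            return forecasts[t]
        t -= ISSUE_STEP  # try the previous issue
    # No forecast within reach: extend the last known one persistence-style.
    return None

forecasts = {datetime(2011, 1, 1, 0): "forecast@00"}
print(best_available_forecast(datetime(2011, 1, 1, 12), forecasts))  # forecast@00
```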

Any predictions made by these models should fall within a certain range, so any forecast outside this range will be clamped. This is the main post-processing step being done, i.e. there is an upper limit on what the farm can produce.

3.3 Holdback Input Randomization

The Holdback Input Randomization (HIPR) method, described in Kemp et al [2007], can be used to investigate the importance of the input parameters. It works by sequentially feeding each data point in the test set to the neural network while replacing the values of one input parameter at a time. The replacement values are uniformly distributed random values in the range in which the neural network was originally trained, i.e. (-1, 1). An NRMSE score is calculated for each replacement. The result is that we get information about the relevance of each input.
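A minimal sketch of HIPR; the `predict` callable stands in for the trained network, and normalising the RMSE by the target range is one common convention assumed here:

```python
# Holdback Input Randomization sketch: replace one input column at a time
# with uniform noise in the training range (-1, 1) and record the NRMSE.
import numpy as np

rng = np.random.default_rng(0)

def nrmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2)) / (y_true.max() - y_true.min())

def hipr(predict, X, y):
    scores = {}
    for col in range(X.shape[1]):
        X_noisy = X.copy()
        X_noisy[:, col] = rng.uniform(-1.0, 1.0, size=len(X))
        scores[col] = nrmse(y, predict(X_noisy))
    return scores  # a high score means the channel mattered

# Toy model that only uses column 0: randomizing it hurts, column 1 does not.
X = rng.uniform(-1, 1, size=(200, 2))
y = X[:, 0]
scores = hipr(lambda A: A[:, 0], X, y)
assert scores[0] > scores[1]
```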

3.4 Optimization methods

The two main optimization algorithms that have been used in this thesis are the Particle Swarm Optimization (PSO) algorithm presented by Eberhart and Kennedy [1995] and the Levenberg-Marquardt (LM)2 algorithm, independently developed by Kenneth Levenberg and Donald Marquardt [Marquardt 1963, Levenberg 1944]. The PSO algorithm is a population-based stochastic algorithm similar to a Genetic Algorithm (GA) but based on social-psychological principles instead of evolution. It

2The LM algorithm was used in the beginning of the project, but the results reported ended up using the PSO algorithm. I have included the LM algorithm in this section because speed optimization was measured using this algorithm.


can be summarized by the following steps. Each network was trained multiple times in order to avoid local minima.

• Step 1: Initialize particles with random velocities and accelerations.

• Step 2: Determine which particle is closest to the goal.

• Step 3: Adjust accelerations toward that particle.

• Step 4: Update particle positions based on their velocities, and update velocities based on accelerations.

• Step 5: Go to step 2.
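The steps above can be sketched as a minimal global-best PSO; the inertia and acceleration coefficients are common textbook values, not the thesis settings, and the sphere function is just a toy objective:

```python
# Minimal global-best particle swarm optimizer (illustrative coefficients).
import random

def pso(f, dim, n_particles=20, iters=100, lo=-5.0, hi=5.0):
    rng = random.Random(0)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]          # personal best positions
    gbest = min(pbest, key=f)            # step 2: global best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # step 3: accelerate toward personal and global best
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * r1 * (pbest[i][d] - pos[i][d])
                             + 1.5 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]   # step 4: move by velocity
            if f(pos[i]) < f(pbest[i]):
                pbest[i] = pos[i][:]
        gbest = min(pbest, key=f)        # step 5: back to step 2
    return gbest

best = pso(lambda x: sum(v * v for v in x), dim=2)  # minimize a toy sphere
```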

The LM algorithm is a combination of the Error Back-Propagation (EBP) method [Rumelhart et al 1988] and the Gauss-Newton method. This algorithm has the speed advantage of Gauss-Newton and the stability of EBP [Yu and Wilamowski 2011]. A detailed treatment of LM can be found in Moré [1978] and of PSO in Poli et al [2007].

3.5 Neural Networks

3.5.1 Multilayer Perceptron

The perceptron is built around a nonlinear model of a neuron, the McCulloch-Pitts model [McCulloch and Pitts 1943]. It basically consists of a 2-step process where the cell body contains a summation function of the weighted sum of all inputs, including a bias. The perceptron is described by equation 3.11. The sum s is passed through an activation function (see Activation Functions below) which mimics the activation, or firing, of the neuron.

$$s = \sum_{i=1}^{M} w_i x_i + x_0 \qquad (3.11)$$

wi is the weight of the "synapse" of input channel i and is the parameter we want to adjust, xi is the input value, and x0 is the bias. M denotes the number of inputs we have.
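Equation 3.11 followed by an activation function amounts to a few lines of code (a sketch, not Expektra's implementation):

```python
# One perceptron: s = sum_i w_i * x_i + x_0 (eq. 3.11), then f(s).
import math

def perceptron(weights, bias, inputs, f=math.tanh):
    s = sum(w * x for w, x in zip(weights, inputs)) + bias  # eq. 3.11
    return f(s)

# s = 0.5*1.0 - 0.25*2.0 + 0.1 = 0.1
out = perceptron([0.5, -0.25], 0.1, [1.0, 2.0])
assert abs(out - math.tanh(0.1)) < 1e-12
```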


Figure 3.1: The perceptron: weighted input signals and a bias are summed and passed through the activation function f(s) to produce the output signal.

The structure of the MLP consists of many perceptrons and is shown in figure 3.2. The input signal flows from the input layer at the bottom to the output layer at the top. We have a bias signal, seen on the left side of the diagram, which is set to a fixed number.

MLPs are fully connected networks, meaning that the neurons in any layer of the network are connected to all the neurons in the previous layer. Each connection in the network has a weight wij associated with it. The initialization of these weights is done before any training and is achieved by randomly assigning very small values to each weight in the graph. The architecture used in this study has only one output variable, i.e. the function we approximate produces forecasts of the power generation given a certain input.


Figure 3.2: Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of H hidden layers as well as M input connections; each edge in this graph has a weight wij associated with it. The input layer receives hours, u, v, week, ws, ws-1, ws-2, ws+1, ws+2; the hidden layer uses tanh activations and the output layer is linear, with a bias signal feeding each layer.

The performance of neural networks is generally improved if the data is normalised, because using the original data directly can cause convergence problems. Normalization is done using the mapminmax function seen in equation 3.12.

$$y = (y_{max} - y_{min}) \cdot \frac{x - x_{min}}{x_{max} - x_{min}} + y_{min} \qquad (3.12)$$

ymax is the max value of the specified range, which in this case is 1, and ymin is -1. x is the value to be scaled, xmax is the max value of the numbers to be scaled, and xmin is their min value.
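A direct transcription of equation 3.12, with the ranges used in the thesis (ymax = 1, ymin = -1):

```python
# mapminmax scaling per eq. 3.12: maps [x_min, x_max] onto [y_min, y_max].
def mapminmax(x, x_min, x_max, y_min=-1.0, y_max=1.0):
    return (y_max - y_min) * (x - x_min) / (x_max - x_min) + y_min

assert mapminmax(0.0, 0.0, 10.0) == -1.0   # lower bound maps to -1
assert mapminmax(10.0, 0.0, 10.0) == 1.0   # upper bound maps to +1
assert mapminmax(5.0, 0.0, 10.0) == 0.0    # midpoint maps to 0
```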

The forecasting model consists of 7 different networks, one for each wind farm; these networks are trained on the first section of the dataset, before the test period. A random 60/20/20 split is created, where 60% of the available data is used for training, 20% is used to validate the network, and 20% is set aside as a hold-out set for the hyperparameters. The input features3 fed into the models are

3see table 3.2


ws, u, v, hours, ws+1, ws+2, ws-1, ws-2, where ws+x and ws-x denote a time shift of x; we can use ws+x as input into the model because ws is a forecast in itself.

Activation Functions

The computation done for each neuron in the multilayer perceptron requires knowledge of the derivative of the activation function. In other words, the activation function we pick needs to be continuous. In this thesis we use the following two activation functions: the hyperbolic tangent function, seen in equation 3.13, and

$$f(s) = \tanh(s) = \frac{e^{s} - e^{-s}}{e^{s} + e^{-s}} \qquad (3.13)$$

the linear transfer function, seen in equation 3.14.

$$f(s) = \begin{cases} +1 & \text{if } s \ge 1 \\ s & \text{if } -1 < s < 1 \\ -1 & \text{if } s \le -1 \end{cases} \qquad (3.14)$$
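Both activation functions transcribe directly; the piecewise function of equation 3.14 is a saturating linear transfer function:

```python
# The two activation functions of equations 3.13 and 3.14.
import math

def tanh_act(s):                      # eq. 3.13
    return (math.exp(s) - math.exp(-s)) / (math.exp(s) + math.exp(-s))

def satlin(s):                        # eq. 3.14 (saturating linear)
    if s >= 1:
        return 1.0
    if s <= -1:
        return -1.0
    return s

assert abs(tanh_act(0.5) - math.tanh(0.5)) < 1e-12
assert satlin(3.0) == 1.0 and satlin(-3.0) == -1.0 and satlin(0.25) == 0.25
```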

Hyperparameter optimization

In order to obtain the hyperparameters necessary for the model of each wind farm, a random hyperparameter search was performed for all models. A hold-out validation set was used to pick the best hyperparameters. Random search has been shown to work better than grid search when not all parameters are equally important [Bergstra and Bengio 2012].
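A minimal random-search sketch; `train_and_score` is a hypothetical stand-in for training a network and returning its validation error, and the sampled ranges are illustrative, not the thesis settings:

```python
# Random hyperparameter search: sample configurations, score each on a
# hold-out validation set, keep the best one.
import random

def random_search(train_and_score, n_trials=50, seed=0):
    rng = random.Random(seed)
    best_cfg, best_score = None, float("inf")
    for _ in range(n_trials):
        cfg = {
            "hidden": rng.choice([5, 10, 15, 20, 25]),  # hidden-layer size
            "lr": 10 ** rng.uniform(-4, -1),            # log-uniform rate
        }
        score = train_and_score(cfg)
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# Toy objective: pretend 15 hidden units and a tiny learning rate are best.
cfg, score = random_search(lambda c: abs(c["hidden"] - 15) + c["lr"])
```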

3.5.2 Numenta Platform for Intelligent Computing

This section describes the key principles introduced with NuPIC. It explains in general terms the theory behind the Online Prediction Framework (OPF)4, which uses the CLA and HTM algorithms; the OPF works as an API to create predictive models. The OPF consists of 5 major types of components: Encoders, Spatial Pooler (SP), Temporal Memory (TM), Temporal Pooler (TP) and Classifiers. Together these components construct a single CLA region5, and figure 3.3 demonstrates the information flow through a single region.

4The OPF is used with Numenta's commercial product GROK.
5Currently, models created with the OPF do not use a TP, nor does this client allow the creation of a hierarchy of regions, i.e. what we would call a HTM; it is only possible to create one region.


Another central concept in NuPIC is the Sparse Distributed Representation (SDR), which refers to the activation of a small percentage of the neurons at any given time. This neuronal activity is represented as an n-dimensional sparse binary vector x = [b0, ..., bn] with around 2% active cells. The outputs from the spatial pooler and the temporal memory are both SDRs, while the output from the encoder does not enforce this and is just a normal binary vector. A general overview of the properties of SDRs has been given by Ahmad and Hawkins [2015].

Encoder → Spatial Pooler → Temporal Memory → Classifier

Figure 3.3: Information flow of a single-region predictive model created with the OPF.

The overlap between two different SDRs is defined by

$$o(x, y) = x \cdot y \qquad (3.15)$$

A match between two SDRs is defined by

$$m(x, y) = o(x, y) \ge \theta \qquad (3.16)$$

where θ is set such that θ ≤ ‖x‖1 and θ ≤ ‖y‖1. An interesting property of SDRs, something that is used multiple times inside the temporal memory and especially for predictions, is the fact that a set of fixed-size SDRs can be reliably stored by taking their union as a single pattern. The boolean OR operator is used to create a


Scalar    Encoding
1         11111000000000
2         01111100000000
10        00000000011111

Table 3.3: Examples of various scalar values encoded using a ScalarEncoder with n = 14, r = 5, ψ = 1.

new vector from a set of SDRs. The downside of storing patterns this way is that the more patterns we store, the bigger the probability of false positives.
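The overlap, match, and union operations are easy to sketch by representing an SDR as the set of its active bit indices:

```python
# SDR operations: overlap (eq. 3.15), match (eq. 3.16), and union storage.
def overlap(x, y):
    return len(x & y)

def match(x, y, theta):
    return overlap(x, y) >= theta

a = {1, 5, 9, 20, 33}
b = {1, 5, 9, 21, 40}
assert overlap(a, b) == 3
assert match(a, b, theta=3) and not match(a, b, theta=4)

# Store several SDRs as one union pattern (boolean OR); every stored
# pattern still matches the union, but as the union fills up, unrelated
# patterns may start to match too (false positives).
stored = [a, b, {2, 6, 9, 22, 41}]
union = set().union(*stored)
assert all(match(p, union, theta=len(p)) for p in stored)
```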

Encoders

NuPIC contains many different encoders6; the job of an encoder is to convert raw input into a more suitable representation (i.e. a binary vector). The raw input is fed into the model using a dictionary data structure.

Useful encoders include scalar and categorical encoders, while more exotic encoders include one for Global Positioning System (GPS) coordinates, which can be used to extract information about anomalous movements. The entries of the dictionary of raw inputs are each encoded separately and concatenated using a multi-encoder.

One property of the spatial pooler is that overlapping input patterns are mapped to the same SDR. This means that we want the encoder to encode input so that similar inputs share bits. The ScalarEncoder fulfils this property by the process illustrated in table 3.3.

We calculate the range of values to encode using equation 3.17. vmin represents the minimum value of the input signal, while vmax denotes its upper bound.

$$v_{range} = v_{max} - v_{min} \qquad (3.17)$$

There are three mutually exclusive parameters that determine the overall size of the output from a scalar encoder: (n, r, ψ). n directly represents the total number

6A full list of all encoders can be found in the API documentation for NuPIC.


of bits in the output, and it must be bigger than or equal to w, which represents the number of bits that are set to encode a single value, i.e. the "width" of the output signal7. r and ψ are specified w.r.t. the input, while w is specified w.r.t. the output. Two inputs separated by more than the radius have non-overlapping representations. Two inputs separated by less than the radius will in general overlap in at least some of their bits. Two inputs separated by greater than or equal to the resolution are guaranteed to have different representations. ψ can be calculated using equation 3.18.

$$\psi = \frac{r}{w} \qquad (3.18)$$

Depending on whether we want periodic behaviour or not, n is calculated a bit differently; see the scalar encoder for implementation details8.
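A simplified non-periodic scalar encoder that reproduces the encodings of table 3.3; the value range vmin = 1, vmax = 10 is an assumption for the example, and NuPIC's real ScalarEncoder handles many more cases:

```python
# Minimal non-periodic scalar encoder: w contiguous active bits whose
# start position moves with the value (resolution psi = r / w, eq. 3.18).
def encode_scalar(value, v_min, v_max, n=14, w=5):
    psi = (v_max - v_min) / (n - w)        # one bucket per start position
    bucket = int(round((value - v_min) / psi))
    bucket = max(0, min(n - w, bucket))    # clamp into the valid range
    bits = ["0"] * n
    for i in range(bucket, bucket + w):
        bits[i] = "1"
    return "".join(bits)

# Reproduces the rows of table 3.3 (v_min = 1, v_max = 10 assumed).
assert encode_scalar(1, 1, 10) == "11111000000000"
assert encode_scalar(2, 1, 10) == "01111100000000"
assert encode_scalar(10, 1, 10) == "00000000011111"
```

Note how neighbouring values (1 and 2) share four of their five active bits, which is exactly the "similar inputs share bits" property the spatial pooler relies on.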

Spatial Pooler

The spatial pooler receives a binary vector as its input from an encoder and outputs an SDR. The structure of the spatial pooler consists of an input space and a set of columns; the output SDR represents which of the columns in a region are active. Each column has synapses connected to the input space, consisting of around 50% randomly and potentially connected synapses, the so-called "potential pool". Each synapse will connect and disconnect with the input space during learning.

The general information flow of the spatial pooler consists of the following steps. Input stimuli from lower regions lead to activation of the input space, to which each mini-column is synaptically connected; this activation, the so-called "overlap score", is set for each column. The score is calculated as the total sum of the neurons that try to influence that column, weighted with a "boosting factor" that tries to increase the chance for certain columns to win more easily. A final top percentage (usually around 2% activity) of the columns with the highest influence (biggest overlap score) will be chosen; columns also inhibit nearby columns, and the result of this process is a binary vector with few active bits, an SDR. This is illustrated in equation 3.19, where b can either be 0 or 1 and s is a value representing the score.

7w must be odd to avoid centering problems.
8https://github.com/numenta/nupic


$$\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \dots & b_n \end{bmatrix}}_{\text{Input vector}} \cdot \underbrace{\begin{bmatrix} b_{11} & b_{12} & b_{13} & \dots & b_{1n} \\ b_{21} & b_{22} & b_{23} & \dots & b_{2n} \\ \vdots & & & & \vdots \\ b_{d1} & b_{d2} & b_{d3} & \dots & b_{dn} \end{bmatrix}}_{\text{Connected synapses for each column}} = \underbrace{\begin{bmatrix} s_1 & s_2 & \dots & s_n \end{bmatrix}}_{\text{Overlap score}} \xrightarrow{\text{Inhibition}} \underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \dots & b_n \end{bmatrix}}_{\text{Output SDR}} \qquad (3.19)$$

Learning in this structure is done by adjusting the permanences of the proximal dendrites to better match the input: for each winning column, increase the permanence of the synapses that correctly matched the input and decrease the permanence of the rest. We also increase the boosting factor of losing columns to give them a bigger chance of winning next time.
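The overlap-and-inhibit step of equation 3.19 can be sketched as follows; boosting and permanence learning are omitted, and the sizes and 50% connection density are illustrative:

```python
# Spatial pooler sketch: overlap score per column, then global inhibition
# keeping only the top ~2% of columns active.
import numpy as np

rng = np.random.default_rng(42)
n_inputs, n_cols, sparsity = 100, 200, 0.02

# Random binary connection matrix: which input bits each column sees.
connected = (rng.random((n_inputs, n_cols)) < 0.5).astype(int)

def spatial_pool(input_bits):
    scores = input_bits @ connected            # overlap score per column
    k = max(1, int(sparsity * n_cols))         # ~2% of columns survive
    winners = np.argsort(scores)[-k:]          # inhibition: keep top-k only
    sdr = np.zeros(n_cols, dtype=int)
    sdr[winners] = 1
    return sdr

x = (rng.random(n_inputs) < 0.1).astype(int)   # sparse input vector
sdr = spatial_pool(x)
assert sdr.sum() == int(sparsity * n_cols)     # sparse output: 4 active columns
```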

Temporal Memory

The temporal memory receives an SDR from the spatial pooler, representing the active columns; the output of the temporal memory is the activity of the whole CLA region. The general idea of the TM9 is that the cells in each mini-column provide temporal context to the pattern identified by the spatial pooler. The TM's job is to form transitions between different SDRs, and it achieves this by allowing active cells to form connections to previously active cells, so that in a future setting each cell is able to predict its own activity.

Each cell in a region has several distal segments, ideally one for each pattern the cell has transitioned from; in practice these segments are connected to a couple of patterns each. A cell enters a predictive state if there is enough activity on a segment, and every segment consists of a collection of connections, synapses, that have been formed to a subset of previously active cells (typically around 10-15 cells).

The Temporal Memory receives a sparse binary vector from the spatial pooler, and the general information flow for this part of the CLA goes through two phases. The first phase has the following steps: 1) for each active column, check whether any cell is in a predictive state; if there is a cell in a predictive

9There are a lot of details in how the Temporal Memory is implemented; pseudo-code and more details can be found in [Numenta 2011]. The nupic git repository is the best source for finer details: https://github.com/numenta/nupic


state, then 2) activate that particular cell. If no cell was found in a predictive state, then 3) activate all cells in that particular column, a process called bursting.

Bursting is analogous to saying: we do not know which cell to activate because we have not seen this instance of the sequence of patterns before and are unable to put the column into the correct temporal context, so let us activate all cells, reflecting this uncertainty. Bursting does not occur when the temporal context has been predicted by the CLA. Equation 3.20 shows the first phase.

$$\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \dots & b_n \end{bmatrix}}_{\text{SP SDR}} \; \underbrace{\begin{bmatrix} b_{11} & b_{12} & b_{13} & \dots & b_{1n} \\ b_{21} & b_{22} & b_{23} & \dots & b_{2n} \\ \vdots & & & & \vdots \\ b_{d1} & b_{d2} & b_{d3} & \dots & b_{dn} \end{bmatrix}}_{\text{Predictive State}} = \underbrace{\begin{bmatrix} b_{11} & b_{12} & b_{13} & \dots & b_{1n} \\ b_{21} & b_{22} & b_{23} & \dots & b_{2n} \\ \vdots & & & & \vdots \\ b_{d1} & b_{d2} & b_{d3} & \dots & b_{dn} \end{bmatrix}}_{\text{Active State}} \qquad (3.20)$$

The second phase of the algorithm figures out which cells should be put into a predictive state for the next time step. This is achieved by checking every distal segment on every cell for activity above a certain threshold, i.e. checking for active segments; one active segment is enough to put a cell into a predictive state. The output from the temporal memory is a vector representing the state of all cells in that region. Equation 3.21 shows phase 2.

$$\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \dots & b_n \end{bmatrix}}_{\text{Active State}} \cdot \underbrace{\begin{bmatrix} b_{11} & b_{12} & b_{13} & \dots & b_{1n} \\ b_{21} & b_{22} & b_{23} & \dots & b_{2n} \\ \vdots & & & & \vdots \\ b_{d1} & b_{d2} & b_{d3} & \dots & b_{dn} \end{bmatrix}_X}_{\text{Segment } X} = \underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \dots & b_n \end{bmatrix}_X}_{\text{Segment Activation } X} > \tau \;\rightarrow\; \underbrace{\begin{bmatrix} s_1 & s_2 & s_3 & \dots & s_n \end{bmatrix}}_{\text{Predictive State}} \qquad (3.21)$$

If learning is turned on, we update the permanences of the synapses connected to active distal segments (in the same way as in the spatial pooler). These changes are marked as temporary until we are sure that the cell is in fact able to correctly predict something with the new changes; after this is known, the change either becomes permanent or is removed. Temporarily marked cells are updated whenever a cell goes from inactive to active from a feed-forward input (update the permanence, as we correctly predicted the feed-forward activation). If a cell instead went from active to inactive, we undo the change.


NuPIC Classifiers

There are many different classifiers included with NuPIC, such as a k-Nearest Neighbour (kNN) classifier and a Support Vector Machine (SVM); the OPF in particular uses a custom-built classifier called the "CLAClassifier". The purpose of this classifier is to decode predictions made by the CLA. All classification performed with the CLAClassifier is done in different ways depending on how the input has been encoded.

Scalar values are decoded using a process where each cell in a CLA region is paired with two histograms. One histogram keeps track of the frequency of encountered patterns associated with each cell; the other keeps track of a moving average for each bucket. This process is illustrated in figure 3.4.

Figure 3.4: The CLAClassifier. Each cell is paired with 2 histograms, one tracking the likelihood of encountered patterns and one tracking a moving average per bucket, spanning the min to max value.

Training in NuPIC

Training the NuPIC model is done using online learning algorithms. The schema seen in figure 3.5 is used to train and test these models. We have 7 different wind farms, so this schema is applied for each respective wind farm.

This schema is slightly adjusted from the traditional way the OPF handles multi-step predictions. The default way of doing this is to use 48 different models, one for


every step ahead in the CLAClassifier, which would give us 48 · 7 = 336 models in total (for all the wind farms). Not only does this change match Expektra's approach, but it also reduces the memory requirements of the CLAClassifier.

Figure 3.5: Training an OPF model. In the training phase, hyperparameters are set up via PSO swarming or manually using a pre-training data chunk; the model then runs with online learning activated on the training data stream, producing predictions. In the testing phase, online learning is deactivated and the model produces multistep predictions on the testing data.

Input and hyperparameter selection

Finding hyperparameters for an OPF model was partly done using a custom built-in PSO algorithm and partly configured manually, mainly by ensuring that the PSO algorithm did not remove encoders. A single CLA region contains a lot of hyperparameters; these parameters have been included with corresponding descriptions in appendix A. The inputs to the model are date, ws, wp, u, v.


Chapter 4

Result

This chapter contains the primary results that were obtained in this study. Figures 4.4-4.10 contain different error measurements on the test set for each respective wind farm. We see that Expektra's ANN model is able to perform well over all wind farms, while the NuPIC model performs worse than Expektra but in general still better than our reference model. NuPIC is off target, with a bias error on most wind farms. The graphs also indicate that NuPIC is unable to pick up some trends in the cumulated ε² graph. The Expektra model, on the other hand, shows no clear problems in cumulated ε² and is on target on all wind farms; appendix C has been included to reflect this at different lead times.

Figure 4.1: Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs. wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


Figure 4.2: Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).

Figure 4.3: Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


Figure 4.4: Different error measurements for WF 1. (Panels: NBIAS, NRMSE, and NMAE vs. look-ahead time k in hours, and cumulated ε² vs. time in hours, for the Expektra, NuPIC, and Persistence models.)


Figure 4.5: Different error measurements for WF 2. (Panels: NBIAS, NRMSE, and NMAE vs. look-ahead time k in hours, and cumulated ε² vs. time in hours, for the Expektra, NuPIC, and Persistence models.)


Figure 4.6: Different error measurements for WF 3. (Panels: NBIAS, NRMSE, and NMAE vs. look-ahead time k in hours, and cumulated ε² vs. time in hours, for the Expektra, NuPIC, and Persistence models.)


Figure 4.7: Different error measurements for WF 4. (Panels: NBIAS, NRMSE, and NMAE vs. look-ahead time k in hours, and cumulated ε² vs. time in hours, for the Expektra, NuPIC, and Persistence models.)


Figure 4.8: Different error measurements for WF 5. (Panels: NBIAS, NRMSE, and NMAE vs. look-ahead time k in hours, and cumulated ε² vs. time in hours, for the Expektra, NuPIC, and Persistence models.)


Figure 4.9: Different error measurements for WF 6. (Panels: NBIAS, NRMSE, and NMAE vs. look-ahead time k in hours, and cumulated ε² vs. time in hours, for the Expektra, NuPIC, and Persistence models.)


Figure 4.10: Different error measurements for WF 7. (Panels: NBIAS, NRMSE, and NMAE vs. look-ahead time k in hours, and cumulated ε² vs. time in hours, for the Expektra, NuPIC, and Persistence models.)


Wind Farm

User          1      2      3      4      5      6      7      All
Leustagos     0.145  0.138  0.168  0.144  0.158  0.133  0.140  0.146
DuckTile      0.143  0.145  0.172  0.145  0.165  0.137  0.146  0.148
MZ            0.141  0.151  0.174  0.145  0.167  0.141  0.145  0.149
Propeller     0.144  0.153  0.177  0.147  0.175  0.141  0.147  0.152
Duehee Lee    0.157  0.144  0.176  0.160  0.169  0.154  0.148  0.155
Expektra      0.165  0.158  0.184  0.164  0.179  0.153  0.153  0.165
MTU EE5260    0.161  0.172  0.193  0.162  0.192  0.156  0.160  0.168
SunWind       0.174  0.177  0.193  0.176  0.179  0.157  0.162  0.172
ymzsmsd       0.163  0.186  0.200  0.164  0.192  0.162  0.167  0.174
4138 Kalchas  0.180  0.179  0.197  0.175  0.200  0.160  0.165  0.177
NuPIC         0.243  0.254  0.264  0.310  0.290  0.224  0.240  0.264

Persistence   0.302  0.338  0.373  0.364  0.388  0.341  0.361  0.355

Table 4.1: NRMSE scores of the entries published in [Hong et al 2014]. The NuPIC and Expektra models are added so the results can easily be compared.

4.1 Experimental results

Looking at the primary results presented in this study, we see in table 4.1 that Expektra's ANN model is able to achieve performance comparable to the other methods published with Hong et al [2014]. This can be used as a starting point for further investigations. A similar approach to Expektra's is Duehee Lee's [Lee and Baldick 2014], as both methods are based on the same neural network architecture but with different optimization techniques. Duehee Lee's network is structured slightly differently from Expektra's model, which only consists of 1 output node, whereas Duehee Lee uses five (representing five predictions ahead). Besides these differences, Duehee Lee has also included additional sub-models to create a prediction ensemble, blending the output of the ANN with a Gaussian Process (GP) to improve very short-term predictions. MTU EE5260 is also a neural network approach that converts meteorological forecasts to power, and here the score is very close. The HTM CLA beats the persistence model but needs more work to outperform the other models in GEFCom.


4.2 Input Importance

For interpretation purposes, in order to understand the model better, an analysis of the relative input parameter importance was performed. This analysis is illustrated in figure 4.11. Each box represents added noise on that channel and will result in a higher NRMSE score if that feature was important. A reference point, "all-channels", represents the error distribution of the model with no input replacement. We clearly see in this figure that the wind speed channel ws is the most important attribute; adding noise to this channel greatly affects the NRMSE score. We also see that the wind components u, v show little to no influence, and that the timestamp-related inputs hours and week both indicate that there are seasonal and daily trends present in the dataset.

Figure 4.11: Relative input parameter importance using HIPR; the channels shown are all-channels, hours, u, v, week, ws, ws-1, ws-2, ws-3, ws+1, ws+2, ws+3. "all-channels" reflects the reference point: the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind.


4.2.1 Adaptation and Optimization

All training for Expektra's model was performed on a MacBook Pro with an Intel Core 2 Duo processor (P8600, 2.40 GHz) on a 64-bit operating system with 4 GB of installed RAM. After adapting the code to be able to run experiments using the GEFCom dataset, an investigation was performed to evaluate the performance of the core source code given by Expektra. This investigation resulted in the identification of some slow parts of the code; after fixing these issues, mainly by using a Math.NET native library instead of the managed provider, a speed test was performed comparing the unoptimized version with the optimized one. Figure 4.12 shows the speed-up achieved during training using the LM algorithm.

Figure 4.12: Plot showing the unoptimized ("Normal") version vs. the optimized one when training a network with 10 input neurons, 10/15/20/25 hidden neurons and 1 output neuron. Training was done for 100 epochs using LM.

4.3 Summary

A plot that shows the average NRMSE improvement over the persistence model can be seen in figure 4.13; we see that both models are able to perform better than persistence. Expektra's model performs better than the NuPIC model over the whole forecast


horizon, and NuPIC performs better than persistence towards the end of the forecast.

Figure 4.13: Summarized average improvement over all wind farms, with 95% confidence intervals (improvement in % NRMSE over persistence vs. look-ahead time, for Expektra and NuPIC).


Chapter 5

Discussion

With the exponential growth of computer power it becomes easier to study deeper and more complex networks, and with recent advances in the area of deep learning a new interest in ANNs has resurged. In practice, ANNs have been used by most groups in the field of WPF, but these networks never caught on, as it was argued that the improvements made by ANNs were not usually enough for the extra effort of training them [Giebel et al 2011]. This is steadily becoming less of an issue as computation becomes cheaper and bigger networks perform better.

By using a simple MLP network we observe that we are able to obtain results similar in performance to other models published in Hong et al [2014]. This can be used as a starting point for further investigations. The GEFCom dataset has a limited number of input features; if we had access to a wider range of them, the power of neural networks could be studied more in depth. Given the number of features in the dataset, this is harder to do.

Using persistence as a reference model was done in order to have the same baseline as the one used in GEFCom, but a better reference model should be considered in the future, such as the one presented in Nielsen et al. [1998], especially if longer forecasts are to be considered.

5.1 Method development issues

Working with Expektra's model is very straightforward and no major issues were encountered during development, but some issues were encountered working with NuPIC:1

1 It should also be pointed out that it helps to have the people who developed the code on the spot, and this is probably the main reason for the little trouble.


1. Installing NuPIC is difficult. This seems to be a general problem, judging by posts on the NuPIC mailing list2, and a very important topic to fix if they want more people to work with this model.

2. The built-in swarming (PSO) did not work well, especially for slightly larger datasets and for 48 prediction steps. The main problems were fitting everything into memory and that the code ran too slowly when swarming over multiple step-ahead horizons, making it practically impossible to use. Scaling down the problem to just a few prediction steps and multiple models helped, but the swarm model kept dismissing wind speed as an important feature, which was fixed by explicitly telling it not to touch that encoder.

3. Inconsistencies between the white-paper documentation of HTM CLA and the actual implementation make it hard to understand the underlying principles of NuPIC. NuPIC presents a very complex network, and any inconsistencies in the documentation make it hard to understand. The code is quite well documented, but the additional material is a bit sparse.
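For context on item 2: swarming is built on particle swarm optimization [Eberhart and Kennedy, 1995], where candidate hyper-parameter settings move through the search space, attracted to their own and the swarm's best positions so far. A minimal, self-contained PSO sketch on a toy objective (standing in for validation error; this is not NuPIC's swarming code):

```python
import random

def pso(objective, bounds, n_particles=12, iters=60, seed=1):
    # Minimal PSO: each particle tracks its personal best position;
    # the swarm tracks the global best. Velocities mix inertia with
    # attraction to both bests.
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    w, c1, c2 = 0.7, 1.4, 1.4  # inertia and acceleration coefficients
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d],
                                    bounds[d][0]), bounds[d][1])
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Toy objective: validation error as a function of two hyper-parameters.
best, err = pso(lambda p: (p[0] - 3) ** 2 + (p[1] + 1) ** 2, [(-5, 5), (-5, 5)])
```

Each objective evaluation in the real setting means training and validating a full model, which is why swarming over many step-ahead horizons became impractically slow.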

These issues are all understandable and are continuously being improved upon. NuPIC is still a young platform, so issues are to be expected given the complexity and research nature of the project. Training the CLAClassifier to use multi-step predictions instead of just one will most likely reduce the bias error seen in the results. To investigate this, a more powerful computer is needed.

There are some differences in the input between the models. This setup was created because an OPF model needs to encode the value it is going to predict. SCADA data is available, so this is not an issue, but it may introduce some unfairness between NuPIC and Expektra. In the current setup NuPIC does not seem to gain anything extra from the additional information, and other models in the competition will most likely use different inputs as well.

In general, NuPIC differs in the way you feed it data: you send in only the front of the signal, and temporal context is achieved through the temporal memory.

5.2 Future improvements and directions

One issue with the GEFCom dataset is that it does not contain any specific information about the location of the 7 wind farms. It is worth taking a look at how to handle different models that are specialized for different types

2 http://numenta.org/lists


of terrain, as this has been shown to increase performance [Kariniotakis et al., 2006]. Another thing to investigate could be a more advanced scheme for hyperparameter selection. Jursa and Rohrig [2008] show that the WPF error can be reduced by smarter use of optimization algorithms for feature selection and hyperparameter selection, which should improve performance.

In general, having more types of inputs from sources like SCADA and NWP systems could give a lot of valuable information that can be used for better predictions. It is clear that wind speed is a key feature for producing good forecasts, which is also supported by the input analysis in section 4.2. It has been argued that the error in WPF models stems largely from wrong weather forecasts [Giebel et al., 2011], so identifying good sources of weather forecasts would be desirable for more effective model learning. Combining several different weather forecasts could also be investigated, as Nielsen et al. [2007] showed that power forecasts based on a number of different meteorological forecasts were better than those from a single source.

Regarding HTM CLA, it is worth considering implementing a custom encoder, one specifically targeted at data concerning wind farms. SCADA data could also be streamed directly into a NuPIC model, which would allow for some interesting online predictions and anomaly detections.
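As an illustration of what such a custom encoder might address: wind direction wraps around, so 350° and 10° are nearly identical, something a plain scalar encoder cannot represent. A hypothetical wrap-around encoder (a sketch, not NuPIC's API) could be:

```python
def cyclic_encode(value, period, n, w):
    # Sketch of a cyclic (wrap-around) encoder for periodic quantities
    # such as wind direction in degrees: the run of w active bits wraps
    # past the end of the n-bit representation, so values near 0 and
    # near the period get overlapping encodings.
    start = int(round(n * (value % period) / period)) % n
    return [1 if (i - start) % n < w else 0 for i in range(n)]

a = cyclic_encode(350.0, 360.0, n=36, w=5)
b = cyclic_encode(10.0, 360.0, n=36, w=5)
# Overlapping active bits reflect the angular closeness of 350 and 10 degrees.
overlap = sum(x & y for x, y in zip(a, b))
```

An encoder along these lines, combined with scalar encodings of wind speed, would give the temporal memory a representation better matched to wind-farm data.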

Training times are always an issue, especially for very large models. Both NuPIC and Expektra's model would benefit from being implemented in a more parallel fashion, as these algorithms are very well suited for parallel computing. The speed-up would be of great value, so looking into and implementing a GPU version of the algorithms could be worthwhile.

5.3 Conclusions

The field of energy forecasting is still young and the need for good forecasting techniques is constantly growing. This initial study shows that part of Expektra's forecasting method can be used to forecast wind power generation. We observe that this ANN is able to achieve performance comparable to other methods published in Hong et al. [2014]. Additional work is still needed to obtain state-of-the-art results. Numenta's model is able to beat the reference model but still needs additional work before we can draw definite conclusions about its performance. Constructing a custom encoder could be a first step in that direction.


Bibliography

Subutai Ahmad and Jeff Hawkins. Properties of sparse distributed representations and their application to hierarchical temporal memory. arXiv preprint arXiv:1503.07469, 2015.

T.C. Akinci. Short term wind speed forecasting with ANN in Batman, Turkey. Elektronika ir Elektrotechnika, 107(1):41–45, 2015.

E. Michael Azoff. Neural network time series forecasting of financial markets. John Wiley & Sons, Inc., 1994.

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13(1):281–305, 2012.

Daniel P. Buxhoeveden and Manuel F. Casanova. The minicolumn hypothesis in neuroscience. Brain, 125(5):935–951, 2002.

Erasmo Cadenas and Wilfrido Rivera. Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks. Renewable Energy, 34(1):274–278, 2009.

Dmitri B. Chklovskii, B.W. Mel, and K. Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782–788, 2004.

A. Costa, A. Crespo, J. Navarro, G. Lizcano, H. Madsen, and E. Feitosa. A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews, 12(6):1725–1744, August 2008. ISSN 1364-0321. doi: 10.1016/j.rser.2007.01.015. URL http://dx.doi.org/10.1016/j.rser.2007.01.015.

Russ C. Eberhart and James Kennedy. A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science, volume 1, pages 39–43, New York, NY, 1995.

Shu Fan, James R. Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. Energy Conversion, IEEE Transactions on, 24(2):474–482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento: a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276. Citeseer, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition, EWEC 2008. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving hierarchical temporal memory-based trading models. Springer, 2013.

M.A. Gaertner, C. Gallardo, C. Tejeda, N. Martínez, S. Calabria, N. Martínez, and B. Fernández. The CASANDRA project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. In Proc. of the 2001 European Wind Energy Conference, EWEC'01, Copenhagen, Denmark, pages 777–780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: A literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G. Lobo. Sipreólico: wind power prediction tool for the Spanish peninsular power system. Proceedings of the CIGRÉ 40th general session and exhibition, París, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357–363, 2014.

Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41–46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585–592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694–709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference, EWEC'07, 2007.

G. Kariniotakis. Position paper on JOULE project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J. Halliday, R. Brownsword, Ignacio Marti, Ana Maria Palomares, I. Cruz, H. Madsen, T.S. Nielsen, Henrik Aa. Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power: overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, 2006.

G.N. Kariniotakis, G.S. Stavrakakis, and E.F. Nogaret. Wind power forecasting using advanced neural networks models. Energy Conversion, IEEE Transactions on, 11(4):762–767, 1996.

Stanley J. Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326–334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275–293, 2009.

Lars Landberg and Simon J. Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171–195, 1994.

S. Lang, C. Mohrlen, J. Jorgensen, B. O Gallachóir, and E. McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International Solar Energy Society UK Section Conference, volume 85, page 89, 2006.

S.M. Lawan, W.A.W.Z. Abidin, W.Y. Chai, A. Baharun, and T. Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760–1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. Smart Grid, IEEE Transactions on, 5(1):501–510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets, 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164–168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313–2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154–3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa. Nielsen, and Torben S. Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475–489, 2005.

Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431–441, 1963.

Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons. 1969.

Marvin Lee Minsky and Oliver G. Selfridge. Learning in random nets. MIT Lincoln Laboratory, 1960.

Jorge J. Moré. The Levenberg-Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105–116. Springer, 1978.

Henrik Aa. Nielsen, Torben S. Nielsen, Henrik Madsen, Maria J. Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471–482, 2007.

Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29–34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power. 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159–167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms, v0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33–57, 2007.

A. Rodrigues, J.A. Peças Lopes, P. Miranda, L. Palma, C. Monteiro, R. Bessa, J. Sousa, C. Rodrigues, and J. Matos. EPREV: a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference, EWEC, volume 7, 2007.

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5:3, 1988.

Jürgen Schmidhuber. Deep learning in neural networks: an overview. Neural Networks, 61:85–117, 2015.

S. Sinkevicius, R. Simutis, and V. Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91–96, 2011.

Ke-Sheng Wang, Vishal S. Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61–69, 2014.

WWEA. 2014 half-year report, pp. 1–8. Technical report, WWEA, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365–376, 2013.

Hao Yu and Bogdan M. Wilamowski. Levenberg-Marquardt training. Industrial Electronics Handbook, 5:12–1, 2011.

Appendix A

Hyper-parameters

Parameters for the spatial pooler

columnCount (default: -): The number of cell columns in a cortical region.

globalInhibition (default: false): If true, during the inhibition phase the winning columns are selected as the most active columns from the region as a whole.

numActivePerInhArea (default: 10): The maximum number of active columns per inhibition area.

synPermActiveInc (default: 0.1): The amount by which an active synapse is incremented in each round, specified as a percent of a fully grown synapse.

synPermConnected (default: 0.10): Controls the threshold at which synapses are considered connected.

synPermInactiveDec (default: 0.01): The amount by which an inactive synapse is decremented in each round.

potentialRadius (default: 16): Determines the extent of the input that each column can potentially be connected to.

Table A1 Table containing configuration parameters for the spatial pooler
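To make these parameters concrete, one spatial-pooler step with global inhibition can be sketched as follows. This is a toy illustration, not the NuPIC implementation: each column's overlap counts the active input bits reached through synapses whose permanence is at least synPermConnected, and the numActivePerInhArea columns with the highest overlap win.

```python
def spatial_pooler_step(input_bits, potential, perms,
                        syn_perm_connected=0.10, num_active=10):
    # potential[col] lists the input indices in the column's potential pool;
    # perms[col] maps input index -> synapse permanence.
    overlaps = []
    for col, pool in enumerate(potential):
        overlaps.append(sum(input_bits[i] for i in pool
                            if perms[col].get(i, 0.0) >= syn_perm_connected))
    # Global inhibition: keep the num_active columns with the highest overlap.
    ranked = sorted(range(len(potential)), key=lambda c: overlaps[c], reverse=True)
    return sorted(ranked[:num_active])

input_bits = [1, 0, 1, 1, 0, 1]
potential = [[0, 1, 2], [2, 3, 4], [0, 4, 5], [1, 3, 5]]
perms = [{0: 0.20, 1: 0.30, 2: 0.05},   # column 0: inputs 0 and 1 connected
         {2: 0.15, 3: 0.20, 4: 0.20},   # column 1: inputs 2, 3, 4 connected
         {0: 0.05, 4: 0.12, 5: 0.20},   # column 2: inputs 4 and 5 connected
         {1: 0.11, 3: 0.04, 5: 0.09}]   # column 3: only input 1 connected
winners = spatial_pooler_step(input_bits, potential, perms, num_active=2)
```

Learning (synPermActiveInc/synPermInactiveDec) would then adjust the permanences of the winning columns' synapses toward the current input.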


Parameters for the scalar encoder

w (symbol: w): Number of bits to set in the output.

minval (symbol: v_min): The lower bound of the input value.

maxval (symbol: v_max): The upper bound of the input value.

n (symbol: n): Number of bits in the representation (n must be > w).

radius (symbol: r): Inputs separated by more than or equal to this distance will have non-overlapping representations.

resolution (symbol: ψ): Inputs separated by more than or equal to this distance will have different representations.

Table A2 Table containing configuration parameters for the scalar encoder
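The encoding these parameters describe can be sketched as follows. This is a simplified illustration using n and w only (the real encoder can also derive n from radius or resolution), not NuPIC's ScalarEncoder itself.

```python
def scalar_encode(value, minval, maxval, n, w):
    # Simplified scalar encoder: a contiguous block of w active bits out of
    # n total, whose position moves linearly with the input value.
    assert n > w
    v = min(max(value, minval), maxval)  # clip into [minval, maxval]
    start = int(round((n - w) * (v - minval) / (maxval - minval)))
    return [1 if start <= i < start + w else 0 for i in range(n)]

low = scalar_encode(0.0, 0.0, 14.0, n=14, w=5)    # block at the left edge
high = scalar_encode(14.0, 0.0, 14.0, n=14, w=5)  # block at the right edge
```

Nearby values share most of their w active bits, which is what lets the spatial pooler treat similar wind speeds as similar inputs.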


Parameters for the temporal memory

activationThreshold (default: 12): Activation threshold for segments.

cellsPerColumn (default: 32): Number of cells per column.

columnCount (default: 2048): The number of cell columns in a cortical region.

globalDecay (default: 0.10): Decrements all synapses a little bit all the time.

initialPerm (default: 0.11): Initial permanence value for a synapse.

inputWidth (default: -): Size of the input.

maxAge (default: 100000): Controls global decay. Global decay will only decay segments that have not been activated for maxAge iterations, and will only run the global decay loop every maxAge iterations.

maxSegmentsPerCell (default: -): The maximum number of segments a cell can have.

maxSynapsesPerSegment (default: -): The maximum number of synapses a segment can have.

minThreshold (default: 8): The minimum required activity for a segment to learn.

newSynapseCount (default: 15): The maximum number of synapses added to a segment during learning.

permanenceDec (default: 0.10): How much permanence is removed from synapses when learning occurs.

permanenceInc (default: 0.10): How much permanence is added to synapses when learning occurs.

temporalImp (default: cpp/py): Controls which temporal memory implementation to use.

Table A3 Table containing configuration parameters for the temporal memory


Appendix B

Wind characteristics

Figure B1 Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated (the stronger, the more power).


Figure B2 Wind characteristics for WF 3-7, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated (the stronger, the more power).


Appendix C

Error Distribution

[Figures C1-C14: histograms of the forecast error, with error on the x-axis (-1.0 to 1.0) and frequency on the y-axis, shown for lead times 1, 10, 20, 30, 40, and 48 hours.]

Figure C1 Error distribution for different lead times, WF 1 (NuPIC)

Figure C2 Error distribution for different lead times, WF 2 (NuPIC)

Figure C3 Error distribution for different lead times, WF 3 (NuPIC)

Figure C4 Error distribution for different lead times, WF 4 (NuPIC)

Figure C5 Error distribution for different lead times, WF 5 (NuPIC)

Figure C6 Error distribution for different lead times, WF 6 (NuPIC)

Figure C7 Error distribution for different lead times, WF 7 (NuPIC)

Figure C8 Error distribution for different lead times, WF 1 (Expektra)

Figure C9 Error distribution for different lead times, WF 2 (Expektra)

Figure C10 Error distribution for different lead times, WF 3 (Expektra)

Figure C11 Error distribution for different lead times, WF 4 (Expektra)

Figure C12 Error distribution for different lead times, WF 5 (Expektra)

Figure C13 Error distribution for different lead times, WF 6 (Expektra)

Figure C14 Error distribution for different lead times, WF 7 (Expektra)

List of Figures

2.1 A figure that presents the general outline when forecasting using the statistical approach

2.2 A figure that presents the general steps when forecasting using a physical model

3.1 The perceptron

3.2 Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of H hidden layers, as well as M input connections. Each edge seen in this graph has a weight wij associated with it

3.3 Information flow of a single region predictive model created with the OPF

3.4 The CLAClassifier

3.5 Training an OPF model

4.1 Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)

4.2 Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)

4.3 Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)

4.4 Different error measurements for WF 1

4.5 Different error measurements for WF 2

4.6 Different error measurements for WF 3

4.7 Different error measurements for WF 4

4.8 Different error measurements for WF 5

4.9 Different error measurements for WF 6

4.10 Different error measurements for WF 7

4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point: the network when no channel has been exposed to noise. ws = wind speed, hours/week = timestamps, u, v is a directional vector of the wind

4.12 Training time for the unoptimized version vs the optimized one when training a network with 10 input neurons; 10, 15, 20, and 25 hidden neurons; and 1 output neuron. Training was done for 100 epochs using LM

4.13 Summarized average improvement over all wind farms with 95% confidence intervals

B1 Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated (the stronger, the more power)

B2 Wind characteristics for WF 3-7, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated (the stronger, the more power)

C1 Error distribution for different lead times, WF 1

C2 Error distribution for different lead times, WF 2

C3 Error distribution for different lead times, WF 3

C4 Error distribution for different lead times, WF 4

C5 Error distribution for different lead times, WF 5

C6 Error distribution for different lead times, WF 6

C7 Error distribution for different lead times, WF 7

C8 Error distribution for different lead times, WF 1

C9 Error distribution for different lead times, WF 2

C10 Error distribution for different lead times, WF 3

C11 Error distribution for different lead times, WF 4

C12 Error distribution for different lead times, WF 5

C13 Error distribution for different lead times, WF 6

C14 Error distribution for different lead times, WF 7

List of Tables

3.1 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, missing data with power observations are available for updating the models

3.2 Preprocessed features from the dataset. The Forecast category represents the forecast given by the NWP; multiple forecasts exist for a given date. The latest issued forecast available is the set of features we will use in training and testing

3.3 Example, where n = 14, r = 5, ψ = 1, of various encoded scalar values using a ScalarEncoder

4.1 NRMSE score of the entries published in [Hong et al., 2014]. The NuPIC model and Expektra model are added so we can easily compare the results

A1 Table containing configuration parameters for the spatial pooler

A2 Table containing configuration parameters for the scalar encoder

A3 Table containing configuration parameters for the temporal memory

78


  • Contents
  • Introduction
    • Problem Formulation
    • The scope of the problem
  • Background
    • Neural Networks and Time Series Prediction
  • Method and Materials
    • Preliminaries
      • Remarks
      • Definitions
      • Reference models
      • Error metrics
      • Model selection
      • Evaluation
    • Experiments
    • Holdback Input Randomization
    • Optimization methods
    • Neural Networks
      • Multilayer Perceptron
      • Numenta Platform for Intelligent Computing
  • Result
    • Experimental results
    • Input Importance
    • Adaptation and Optimization
    • Summary
  • Discussion
    • Method development issues
    • Future improvements and directions
    • Conclusions
  • Bibliography
  • Appendices
    • Hyper-parameters
    • Wind characteristics
    • Error Distribution
  • List of Figures
  • List of Tables

CHAPTER 3 METHOD AND MATERIALS

Mean Absolute Error

The Normalized Mean Absolute Error (NMAE) is an error quantity that looks at the average of the absolute errors of the prediction and is defined in equation 3.6.

NMAE_k = (1/N) · Σ_{t=1}^{N} |ε_{t+k|t}|    (3.6)

Another common name for the Mean Absolute Error (MAE) is Mean Absolute Deviation (MAD). This value shows the magnitude of the overall error that has occurred due to forecasting and should thus be as small as possible. The error is scale dependent and will be affected by data transformations and the scale of measurements.

Mean Squared Error

The Normalized Mean Squared Error (NMSE) is an error quantity that looks at the average of the squared errors ε²_{t+k|t} and is built using the Normalized Sum of Squared Errors (NSSE) defined in 3.7.

NSSE_k = Σ_{t=1}^{N} ε²_{t+k|t}    (3.7)

NMSE is defined in equation 3.8.

NMSE_k = (1/N) · NSSE_k = (1/N) · Σ_{t=1}^{N} ε²_{t+k|t}    (3.8)

In this error, positive and negative errors do not cancel each other out, and large individual errors are penalized more harshly.

Root Mean Squared Error

The Normalized Root Mean Squared Error (NRMSE) is the square root of the NMSE and is defined in equation 3.9.

NRMSE_k = NMSE_k^{1/2} = ( (1/N) · Σ_{t=1}^{N} ε²_{t+k|t} )^{1/2}    (3.9)

NRMSE is the main metric used in GEFCom, and it shares the same properties as NMSE.
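The three normalized error measures can be sketched in a few lines of Python; this is an illustrative implementation, assuming the errors ε_{t+k|t} have already been normalized (e.g. by installed capacity):

```python
import numpy as np

def nmae(errors):
    # equation 3.6: mean of the absolute errors
    return float(np.mean(np.abs(errors)))

def nmse(errors):
    # equation 3.8: mean of the squared errors (NSSE / N)
    return float(np.mean(np.asarray(errors) ** 2))

def nrmse(errors):
    # equation 3.9: square root of the NMSE
    return float(np.sqrt(nmse(errors)))
```

For example, the errors [0.1, -0.2, 0.05, -0.05] give an NMAE of 0.1.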


3.1 PRELIMINARIES

3.1.5 Model selection

In regression and classification, one of the main issues we face is: "How do we create a good model?" One way to define this "goodness" is to look at the model's generalization ability, i.e. its performance on unseen data.

To make sure the model we are building will generalize to new, unseen data, it is important how we select and build the model in the first place; what we have control over is the data we have at hand and how we use that data to build our model.

Training, testing and validating

In the case of supervised learning we have access to a target series and its associated features, i.e. the production series (our target) with wind speed and wind direction as features.

Common practice within Machine Learning, which is adhered to in this thesis, is to split the whole dataset into 3 smaller subsets. Two of these subsets are used to find a good model (the training set and the validation set) and the remaining one (the test set) is used for evaluation, i.e. to estimate how accurate the created model would be on unseen data. With highly flexible models like artificial neural networks we need to be careful not to overfit the data, which is one of the reasons for the validation set: we do not want a model that generalizes poorly because it is fitted to every minor variation, i.e. because it has captured a lot of noise.

3.1.6 Evaluation

The improvement with respect to a considered reference model ref is defined in equation 3.10.

I^{ref}_{EC,k} = 100 · (EC^{ref}_k − EC_k) / EC^{ref}_k  (%)    (3.10)

where the Evaluation Criterion (EC) is any error measurement, such as NMSE, NMAE, etc.
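As a sketch, the improvement criterion of equation 3.10 is a one-liner; for instance, an NRMSE of 0.264 against a persistence reference of 0.355 corresponds to roughly a 26% improvement:

```python
def improvement(ec_ref, ec):
    # equation 3.10: percentage improvement over the reference model
    return 100.0 * (ec_ref - ec) / ec_ref
```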


Testing period

Date        Time   Forecast
2011-01-01  01:00  1-48 hours
2011-01-04  13:00  1-48 hours
2011-01-08  01:00  1-48 hours
2011-01-11  13:00  1-48 hours
...
2012-06-23  01:00  1-48 hours
2012-06-26  13:00  1-48 hours

Table 3.1: The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods with missing data, power observations are available for updating the models.

3.2 Experiments

Training and testing are structured according to the setup of the GEFCom. The dataset spans from midnight of 1 July 2009 to noon of 26 June 2012. The period from 1 July 2009 to 1 January 2011 at 01:00 is used for training and validating, while the rest is used for testing. In the testing range a number of 48-hour periods are defined (see table 3.1); in between these testing periods additional training data exists, which enables the models to be updated between forecasts.

The testing periods repeat every 7 days until the end of the dataset, and only meteorological forecasts that were relevant for the periods with missing power data are given; this was done in order to be consistent.

Meteorological forecasts in the GEFCom dataset consist of zonal u and meridional v components collected for surface winds at 10 m above ground level. They were extracted from the archive of the European Centre for Medium-range Weather Forecasts (ECMWF), a service which issues high-resolution deterministic forecasts twice a day, at 00 Coordinated Universal Time (UTC) and 12 UTC; each of these forecast periods consists of data for 1-48 hours ahead. In order to match the hourly resolution of the power measurements, forecasts were interpolated using cubic splines to an hourly resolution. A summary of the features found in the dataset is given in 3.2.


No.  Category  Parameter            Alias  Type
1    Date      Date                 date   String
2    Date      Year                 year   Integer
3    Date      Month                month  Integer
4    Date      Day                  day    Integer
5    Date      Hour                 hours  Integer
6    Date      Week                 week   Integer
7    Forecast  Wind Speed           ws     Real
8    Forecast  Wind Direction (deg) wd     Real
9    Forecast  Wind U               u      Real
10   Forecast  Wind V               v      Real
11   Forecast  Issued               hp     Integer
12   SCADA     Production           wp     Real

Table 3.2: Preprocessed features from the dataset. The Forecast category represents the forecasts given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the feature we will use in training and testing.


The database for the meteorological forecasts given by the GEFCom dataset contains sections of missing data, each corresponding to the dates for which the missing power information exists. These sections were filled out in a pre-processing step with the previous best forecast that was available. For example, if we do not have data for the forecast issued at 2011-01-01 12:00, we use data from 2011-01-01 00:00, as a 48-hours-ahead forecast is available for this date. If the previous section also contained missing data, we would go back an additional section and use those forecasts. If no forecast is available 48 hours back, we use the best known forecast and extend it in the same fashion as the persistence model.

Any predictions made by these models should fall within a certain range, so any forecast outside this range is clamped. This is the main post-processing step being done, i.e. there is an upper limit on what can be produced.
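The clamping step amounts to a simple clip; the bounds 0 and 1 below are an assumption corresponding to power normalized to the unit interval:

```python
import numpy as np

# post-processing: clamp forecasts to the feasible normalized power range [0, 1]
raw = np.array([-0.05, 0.30, 1.20])
clamped = np.clip(raw, 0.0, 1.0)
```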

3.3 Holdback Input Randomization

The Holdback Input Randomization (HIPR) method described in Kemp et al. [2007] can be used to investigate the importance of the input parameters. It works by sequentially feeding each data point in the test set to the neural network while replacing the values of one input parameter at a time. The replacement values are drawn from a uniform distribution over the range in which the neural network was originally trained, i.e. (−1, 1). An NRMSE score is calculated for each replacement; the result is information about the relevance of each input.
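A minimal sketch of HIPR, assuming `model` is any object with a `predict(X)` method and that the inputs were scaled to (−1, 1):

```python
import numpy as np

def hipr(model, X, y, seed=0):
    # Holdback Input Randomization: replace one input column at a time with
    # uniform noise in (-1, 1) and record the resulting NRMSE per input.
    rng = np.random.RandomState(seed)
    scores = {}
    for j in range(X.shape[1]):
        X_noisy = X.copy()
        X_noisy[:, j] = rng.uniform(-1.0, 1.0, size=X.shape[0])
        err = model.predict(X_noisy) - y
        scores[j] = float(np.sqrt(np.mean(err ** 2)))
    return scores  # inputs whose score rises the most matter the most
```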

3.4 Optimization methods

The two main optimization algorithms used in this thesis are the Particle Swarm Optimization (PSO) algorithm presented by Eberhart and Kennedy [1995] and the Levenberg-Marquardt (LM)² algorithm independently developed by Kenneth Levenberg and Donald Marquardt [Marquardt, 1963; Levenberg, 1944]. The PSO algorithm is a population-based stochastic algorithm similar to a Genetic Algorithm (GA) but based on social-psychological principles instead of evolution. Networks were trained multiple times in order to avoid local minima. PSO can be summarized by the following steps:

² The LM algorithm was used in the beginning of the project, but the reported results ended up using the PSO algorithm. The LM algorithm is included in this section because speed optimization was measured using it.

• Step 1: Initialize particles with random velocities and accelerations.

• Step 2: Determine which particle is closest to the goal.

• Step 3: Adjust accelerations toward that particle.

• Step 4: Update particle positions based on their velocity, and update velocities based on acceleration.

• Step 5: Go to step 2.
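The steps above can be sketched as a minimal PSO that minimizes an objective function. This illustrative version uses commonly cited inertia and acceleration constants, not the configuration from the thesis, and omits how network weights would be mapped to particle positions:

```python
import numpy as np

def pso(f, dim, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    # Minimal particle swarm optimizer: particles track their personal best
    # and are accelerated toward both it and the global best.
    rng = np.random.RandomState(seed)
    x = rng.uniform(-1.0, 1.0, (n_particles, dim))   # positions
    v = np.zeros((n_particles, dim))                 # velocities
    pbest = x.copy()                                 # personal bests
    pbest_val = np.array([f(p) for p in x])
    gbest = pbest[pbest_val.argmin()].copy()         # global best
    for _ in range(iters):
        r1 = rng.rand(n_particles, dim)
        r2 = rng.rand(n_particles, dim)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = x + v
        vals = np.array([f(p) for p in x])
        better = vals < pbest_val
        pbest[better] = x[better]
        pbest_val[better] = vals[better]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, float(pbest_val.min())
```

Minimizing the sphere function with this sketch converges quickly to the origin.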

The LM algorithm is a combination of the Error Back-Propagation (EBP) method [Rumelhart et al., 1988] and the Gauss-Newton method. It has the speed advantage of Gauss-Newton and the stability of EBP [Yu and Wilamowski, 2011]. A detailed treatment of LM can be found in Moré [1978] and of PSO in Poli et al. [2007].

3.5 Neural Networks

3.5.1 Multilayer Perceptron

The perceptron is built around a nonlinear model of a neuron, the McCulloch-Pitts model [McCulloch and Pitts, 1943]. It basically consists of a two-step process: the cell body contains a summation function computing the weighted sum of all inputs, including a bias, as described by equation 3.11. The sum s is then passed through an activation function which mimics the activation, or firing, of the neuron.

s = Σ_{i=1}^{M} w_i · x_i + x_0    (3.11)

w_i is the weight of the "synapse" of input channel i and is the parameter we want to adjust, x_i is the input value, and x_0 is the bias. M denotes the number of inputs.
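Equation 3.11 together with the activation corresponds to the following sketch:

```python
import numpy as np

def perceptron_output(x, w, bias, f=np.tanh):
    # equation 3.11: s = sum_i w_i * x_i + x_0, then pass s through activation f
    s = float(np.dot(w, x) + bias)
    return f(s)
```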


Figure 3.1: The perceptron. Input signals are weighted, summed together with a bias, and passed through the activation function f(s) to produce the output signal.

The structure of the MLP consists of many perceptrons, as shown in figure 3.2. The input signal flows from the input layer at the bottom to the output layer at the top. The bias signal, seen on the left side of the diagram, is set to a fixed number.

MLPs are fully connected networks, meaning that the neurons in any layer of the network are connected to all neurons in the previous layer. Each connection in the network has a weight w_ij associated with it. These weights are initialized before any training takes place, by randomly assigning very small values to each weight in the graph. The architecture used in this study has only one output variable, i.e. the approximated function produces forecasts of the power generation given a certain input.


Figure 3.2: Architectural graph of the neural network, producing a single output value. It consists of a collection of hidden neurons (tanh activations) in each of H hidden layers, a linear output neuron, and M input connections (hours, u, v, week, ws, ws−1, ws−2, ws+1, ws+2); each edge in the graph has a weight w_ij associated with it.

The performance of neural networks is generally improved if the data is normalised, because using the original data directly can cause convergence problems. Normalization is done using the mapminmax function seen in equation 3.12.

y = (y_max − y_min) · (x − x_min) / (x_max − x_min) + y_min    (3.12)

y_max is the maximum of the specified range, in this case 1, and y_min is −1; x is the value to be scaled, x_max is the maximum of the values to be scaled and x_min is their minimum.
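A direct transcription of equation 3.12, with the (−1, 1) target range used here as the default:

```python
def mapminmax(x, x_min, x_max, y_min=-1.0, y_max=1.0):
    # equation 3.12: linear rescaling of x from [x_min, x_max] to [y_min, y_max]
    return (y_max - y_min) * (x - x_min) / (x_max - x_min) + y_min
```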

The forecasting model consists of 7 different networks, one for each wind farm; these networks are trained on the first section of the dataset, before the test period. A random 60/20/20 split is created, where 60% of the available data is used for training, 20% is used to validate the network, and 20% is set aside as a hold-out set for the hyperparameters. The input features³ fed into the models are ws, u, v, hours, ws+1, ws+2, ws−1, ws−2, where ws+x and ws−x denote a time shift of x; we can use ws+x as input into the model because ws is a forecast in itself.

³ See table 3.2.
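The 60/20/20 split described above can be sketched as follows; the seed and the index-based approach are illustrative assumptions:

```python
import numpy as np

def split_60_20_20(n_samples, seed=0):
    # random 60/20/20 split into train / validation / hold-out index sets
    rng = np.random.RandomState(seed)
    idx = rng.permutation(n_samples)
    n_train = int(0.6 * n_samples)
    n_val = int(0.2 * n_samples)
    return (idx[:n_train],
            idx[n_train:n_train + n_val],
            idx[n_train + n_val:])
```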

Activation Functions

The computation done for each neuron in the multilayer perceptron requires knowledge of the derivative of the activation function; in other words, the activation function we pick needs to be differentiable. In this thesis we use the following two activation functions: the hyperbolic tangent function seen in equation 3.13, and

f(s) = tanh(s) = (e^s − e^{−s}) / (e^s + e^{−s})    (3.13)

the (saturating) linear transfer function seen in equation 3.14.

f(s) = +1 if s ≥ 1;  s if −1 < s < 1;  −1 if s ≤ −1    (3.14)
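Both activations are one-liners; "satlin" is a common name for the saturating linear function of equation 3.14:

```python
import numpy as np

def satlin(s):
    # equation 3.14: saturating linear transfer function
    return np.clip(s, -1.0, 1.0)

def tanh(s):
    # equation 3.13: hyperbolic tangent
    return np.tanh(s)
```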

Hyperparameter optimization

In order to obtain the hyperparameters for each wind farm's model, a random hyperparameter search was performed for all models, and a hold-out validation set was used to pick the best hyperparameters. Random search has been shown to work better than grid search when not all parameters are equally important [Bergstra and Bengio, 2012].

3.5.2 Numenta Platform for Intelligent Computing

This section describes the key principles introduced with NuPIC. It explains in general terms the theory behind the Online Prediction Framework (OPF)⁴, which uses the CLA and HTM algorithms; the OPF works as an API to create predictive models. The OPF consists of 5 major types of components: Encoders, Spatial Pooler (SP), Temporal Memory (TM), Temporal Pooler (TP) and Classifiers. Together these components construct a single CLA region⁵, and figure 3.3 demonstrates the information flow through a single region.

⁴ The OPF is used with Numenta's commercial product GROK.
⁵ Currently, models created with the OPF do not use a TP, nor does this client allow creation of a hierarchy of regions, i.e. what we would call an HTM; it is only possible to create one region.

Another central concept in NuPIC is the Sparse Distributed Representation (SDR), which refers to the activation of a small percentage of the neurons at any given time. This neuronal activity is represented as an n-dimensional sparse binary vector x = [b_0, …, b_n] with around 2% active cells. The outputs from the spatial pooler and the temporal memory are both SDRs, while the output from the encoder does not enforce this and is just a normal binary vector. A general overview of the properties of SDRs is given by Ahmad and Hawkins [2015].

Figure 3.3: Information flow of a single-region predictive model created with the OPF: Encoder → Spatial Pooler → Temporal Memory → Classifier.

The overlap between two different SDRs is defined by

o(x, y) = x · y    (3.15)

A match between two SDRs is defined by

m(x, y) := o(x, y) ≥ θ    (3.16)

where θ is set such that θ ≤ ‖x‖₁ and θ ≤ ‖y‖₁. An interesting property of SDRs, used multiple times inside the temporal memory and especially for predictions, is the fact that a set of fixed-size SDRs can be reliably stored by taking their union as a single pattern. The boolean OR-operator is used to create a


Scalar Encoding

Value  Encoding
1      11111000000000
2      01111100000000
10     00000000011111

Table 3.3: Example, where n = 14, r = 5, ψ = 1, of various scalar values encoded using a ScalarEncoder.

new vector from a set of SDRs. The downside of storing patterns this way is that the more patterns we store, the bigger the probability of false positives.
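Equations 3.15-3.16 and the union trick translate directly to code; a sketch with SDRs as 0/1 integer vectors:

```python
import numpy as np

def overlap(x, y):
    # equation 3.15: number of bits active in both SDRs
    return int(np.dot(x, y))

def match(x, y, theta):
    # equation 3.16: the SDRs match if their overlap reaches theta
    return overlap(x, y) >= theta

def union(sdrs):
    # store a set of SDRs as a single pattern with the boolean OR-operator
    return np.bitwise_or.reduce(np.asarray(sdrs, dtype=int), axis=0)
```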

Encoders

NuPIC contains many different encoders⁶; the job of an encoder is to convert raw input into a more suitable representation, i.e. a binary vector. The raw input is fed into the model using a dictionary data structure.

Useful encoders include scalar and categorical encoders, while more exotic encoders include one for Global Positioning System (GPS) coordinates, which can be used to extract information about anomalous movements. The entries of the raw-input dictionary are each encoded separately and concatenated using a multi-encoder.

One property of the spatial pooler is that overlapping input patterns are mapped to the same SDR. This means that we want the encoder to encode input so that similar inputs share bits. The ScalarEncoder fulfils this property by the process illustrated in table 3.3.

We calculate the range of values to encode using equation 3.17, where v_min represents the minimum value of the input signal and v_max denotes its upper bound.

v_range = v_max − v_min    (3.17)

There are three mutually exclusive parameters that determine the overall size of the output from a scalar encoder: (n, r, ψ). n directly represents the total number of bits in the output, and it must be bigger than or equal to w, which represents the number of bits that are set to encode a single value, i.e. the "width" of the output signal⁷. r (the radius) and ψ (the resolution) are specified with respect to the input, while w is specified with respect to the output. Two inputs separated by more than the radius have non-overlapping representations; two inputs separated by less than the radius will in general overlap in at least some of their bits; two inputs separated by greater than or equal to the resolution are guaranteed to have different representations. ψ can be calculated using equation 3.18.

ψ = r / w    (3.18)

Depending on whether we want periodic behaviour or not, n is calculated a bit differently; see the scalar encoder for implementation details⁸.
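The table 3.3 example can be reproduced with a small non-periodic encoder sketch (NuPIC's real ScalarEncoder adds parameter validation, periodic variants and more); with v_min = 1, v_max = 10, n = 14 and w = 5, encoding the value 1 activates the first five bits:

```python
import numpy as np

def scalar_encode(value, v_min, v_max, n=14, w=5):
    # Non-periodic scalar encoding sketch: w consecutive active bits out of n.
    # There are n - w + 1 distinct bucket positions across the value range.
    n_buckets = n - w + 1
    frac = (value - v_min) / float(v_max - v_min)
    bucket = min(n_buckets - 1, int(frac * n_buckets))
    bits = np.zeros(n, dtype=int)
    bits[bucket:bucket + w] = 1
    return bits
```

Encoding the values 1, 2 and 10 reproduces the three rows of table 3.3.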

Spatial Pooler

The spatial pooler receives a binary vector as its input from an encoder and outputs an SDR. The structure of the spatial pooler consists of an input space and a set of columns; the output SDR represents which of the columns in a region are active. Each column has synapses connected to the input space: around 50% of the input bits, randomly chosen, are potentially connected, the so-called "potential pool". Each synapse will connect to and disconnect from the input space during learning.

The general information flow of the spatial pooler consists of the following steps. Input stimuli from lower regions lead to activation of the input space, to which each mini-column is synaptically connected. This activation, the so-called "overlap score", is computed for each column: the total sum of the active inputs that influence that column, weighted with a "boosting factor" that increases the chance for certain columns to win more easily. A final top percentage (usually around 2% activity) of the columns with the highest influence (biggest overlap score) is chosen; columns also inhibit nearby columns, and the result of this process is a binary vector with few active bits, an SDR. This is illustrated in 3.19, where b can either be 0 or 1 and s is a value representing the score.

⁷ w must be odd to avoid centering problems.
⁸ https://github.com/numenta/nupic

[b_1, b_2, …, b_n] · [b_ij] = [s_1, s_2, …, s_n] → inhibition → [b_1, b_2, …, b_n]    (3.19)
  (input vector)   (connected synapses       (overlap scores)       (output SDR)
                    for each column)

Learning in this structure is done by adjusting the permanences of the proximal dendrites to better match the input: for each winning column, increase the permanence of the synapses that correctly matched the input and decrease the permanence of the rest. We also increase the boosting factor of losing columns to give them a bigger chance of winning next time.
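One spatial pooler step, as in equation 3.19, can be sketched as follows; the permanence-learning and local-inhibition details of the real SP are omitted:

```python
import numpy as np

def spatial_pooler_step(input_bits, synapses, boost, active_frac=0.02):
    # Overlap score per column (equation 3.19), weighted by a boosting factor,
    # followed by global inhibition keeping only the top-scoring columns.
    scores = boost * (synapses @ input_bits)
    k = max(1, int(active_frac * len(scores)))
    winners = np.argsort(scores)[-k:]
    sdr = np.zeros(len(scores), dtype=int)
    sdr[winners] = 1
    return sdr
```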

Temporal Memory

The temporal memory receives an SDR from the spatial pooler, which represents the active columns; the output of the temporal memory is the activity of the whole CLA region. The general idea of the TM⁹ is that the cells in each mini-column provide temporal context to the pattern identified by the spatial pooler. The TM's job is to form transitions between different SDRs, which it achieves by allowing active cells to form connections to previously active cells, so that in a future setting each cell is able to predict its own activity.

Each cell in a region has several distal segments, ideally one for each pattern the cell has transitioned from; in practice these segments are each connected to a couple of patterns. A cell enters a predictive state if there is enough activity on a segment, and every segment consists of a collection of connections, synapses that have been formed to a subset of previously active cells (typically around 10-15 cells).

The Temporal Memory receives a sparse binary vector from the spatial pooler, and the general information flow for this part of the CLA goes through two phases. The first phase has the following steps: 1) for each active column, check whether there is any cell in a predictive

⁹ There are many details in how the Temporal Memory is implemented; pseudo-code and more details can be found in [Numenta, 2011], and the nupic git repository is the best source for finer details: https://github.com/numenta/nupic


state; if so, 2) activate that particular cell. If no cell was found in a predictive state, then 3) activate all cells in that particular column, a process called bursting.

Bursting is analogous to saying: we do not know which cell to activate because we have not seen this instance of the sequence of patterns before, and since we are unable to put the column into the correct temporal context, we activate all cells to reflect this uncertainty. Bursting does not occur when the temporal context has been predicted by the CLA. Equation 3.20 shows the first phase.

[b_1, b_2, …, b_n] ⊙ [b_ij] = [b_ij]    (3.20)
     (SP SDR)     (predictive  (active
                     state)     state)

The second phase of the algorithm figures out which cells should be put into a predictive state for the next time step. This is achieved by checking every distal segment of every cell for activity above a certain threshold, i.e. checking for active segments; one active segment is enough to put a cell into a predictive state. The output of the temporal memory is a vector representing the state of all cells in that region. Equation 3.21 shows phase 2.

[b_1, b_2, …, b_n] · [b_ij]_X = [b_1, b_2, …, b_n]_X > τ → [s_1, s_2, …, s_n]    (3.21)
   (active state)   (segment X)  (segment activation X)       (predictive state)

If learning is turned on, the permanences of the synapses connected to active distal segments are updated (in the same way as in the spatial pooler). These changes are marked as temporary until we are sure that the cell is in fact able to correctly predict something with the new changes; after that fact is known, the changes either become permanent or are removed. Temporarily marked cells are updated whenever a cell goes from inactive to active from feed-forward input (reinforce the change, as we correctly predicted the feed-forward activation); if a cell instead went from active to inactive, the change is undone.


NuPIC Classifiers

There are many different classifiers included with NuPIC, such as k-Nearest Neighbour (kNN) and a Support Vector Machine (SVM); the OPF in particular uses a custom-built classifier called the "CLAClassifier". The purpose of this classifier is to decode predictions made by the CLA. Classification with the CLAClassifier is performed in different ways depending on how the input has been encoded.

Scalar values are decoded using a process where each cell in a CLA region is paired with two histograms. One of the histograms keeps track of the frequency of encountered patterns associated with each cell; the other keeps track of a moving average for each bucket. This process is illustrated in figure 3.4.

Figure 3.4: The CLAClassifier. Each cell of the SDR columns is paired with two histograms over the bucketed output range (min value to max value): one for the likelihood and one for a moving average.

Training in NuPIC

Training the NuPIC model is done using online learning algorithms. The schema seen in figure 3.5 is used to train and test these models; since there are 7 different wind farms, this schema is applied to each wind farm.

This schema is slightly adjusted from the traditional way the OPF handles multi-step predictions. The default way is to use 48 different models, one for every step ahead in the CLAClassifier, which would give 48 · 7 = 336 models in total (for all the wind farms). Not only does this change match Expektra's approach, but it also reduces the memory requirements of the CLAClassifier.

Figure 3.5: Training an OPF model. A pre-training data chunk from the dataset drives the hyperparameter setup (PSO swarming or manual setup); in the training phase the OPF model consumes the training data stream with online learning activated; in the testing phase online learning is deactivated and the model produces multistep predictions on the testing data.

Input and hyperparameter selection

Finding hyperparameters for an OPF model was partly done using a custom built-in PSO algorithm and partly configured manually, mainly by ensuring that the PSO algorithm did not remove encoders. A single CLA region contains a lot of hyperparameters; these parameters are listed with corresponding descriptions in appendix A. The inputs to the model are date, ws, wp, u, v.


Chapter 4

Result

This chapter contains the primary results obtained in this study. Figures 4.4-4.10 contain different error measurements on the test set for each wind farm. We see that Expektra's ANN model performs well over all wind farms, while the NuPIC model performs worse than Expektra but in general still better than our reference model. NuPIC is off target, with a bias error on most wind farms. The graphs also indicate that NuPIC is unable to pick up some trends in the cumulated ε² graph; the Expektra model, on the other hand, shows no clear problems in cumulated ε² and is on target on all wind farms. Appendix C has been included to reflect this at different lead times.

Figure 4.1: Left diagram: cumulative probability of wind speed. Right diagram: scatter diagram of the power curve, normalized power output vs. wind speed (colored by wind direction). Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


Figure 4.2: Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).

Figure 4.3: Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


Figure 4.4: Different error measurements for WF 1. The panels show NBIAS, NRMSE and NMAE as a function of look-ahead time k (in hours), and cumulated ε² over time (in hours), for Expektra, NuPIC and Persistence.

Figure 4.5: Different error measurements for WF 2. The panels show NBIAS, NRMSE and NMAE as a function of look-ahead time k (in hours), and cumulated ε² over time (in hours), for Expektra, NuPIC and Persistence.

Figure 4.6: Different error measurements for WF 3. The panels show NBIAS, NRMSE and NMAE as a function of look-ahead time k (in hours), and cumulated ε² over time (in hours), for Expektra, NuPIC and Persistence.

Figure 4.7: Different error measurements for WF 4. The panels show NBIAS, NRMSE and NMAE as a function of look-ahead time k (in hours), and cumulated ε² over time (in hours), for Expektra, NuPIC and Persistence.

Figure 4.8: Different error measurements for WF 5. The panels show NBIAS, NRMSE and NMAE as a function of look-ahead time k (in hours), and cumulated ε² over time (in hours), for Expektra, NuPIC and Persistence.

10 20 30 40

look-ahead time k (in hours)

minus03

minus02

minus01

00

01

02

03

NB

IAS

Wind Farm 6

Expektra

NuPIC

Persistence

10 20 30 40

look-ahead time k (in hours)

00

01

02

03

04

05

NR

MS

E

Wind Farm 6

Expektra

NuPIC

Persistence

0 1000 2000 3000 4000 5000 6000 7000 8000

time (in hours)

0

200

400

600

800

1000

cum

ula

tedε2

Wind Farm 6

Expektra

NuPIC

Persistence

10 20 30 40

look-ahead time k (in hours)

00

01

02

03

04

05

NM

AE

Wind Farm 6

Expektra

NuPIC

Persistence

Figure 49 Different error measurement for WF 6

Figure 4.10: Different error measurements for WF 7. Panels show NBIAS, NRMSE, and NMAE vs. look-ahead time k (in hours), and cumulated ε² vs. time (in hours), for Expektra, NuPIC, and Persistence.

                     Wind Farm
User          1      2      3      4      5      6      7      All
Leustagos     0.145  0.138  0.168  0.144  0.158  0.133  0.140  0.146
DuckTile      0.143  0.145  0.172  0.145  0.165  0.137  0.146  0.148
MZ            0.141  0.151  0.174  0.145  0.167  0.141  0.145  0.149
Propeller     0.144  0.153  0.177  0.147  0.175  0.141  0.147  0.152
Duehee Lee    0.157  0.144  0.176  0.160  0.169  0.154  0.148  0.155
Expektra      0.165  0.158  0.184  0.164  0.179  0.153  0.153  0.165
MTU EE5260    0.161  0.172  0.193  0.162  0.192  0.156  0.160  0.168
SunWind       0.174  0.177  0.193  0.176  0.179  0.157  0.162  0.172
ymzsmsd       0.163  0.186  0.200  0.164  0.192  0.162  0.167  0.174
4138 Kalchas  0.180  0.179  0.197  0.175  0.200  0.160  0.165  0.177
NuPIC         0.243  0.254  0.264  0.310  0.290  0.224  0.240  0.264
Persistence   0.302  0.338  0.373  0.364  0.388  0.341  0.361  0.355

Table 4.1: NRMSE scores of the entries published in [Hong et al., 2014]. The NuPIC and Expektra models are added so the results can easily be compared.
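The NRMSE values reported in the table can be computed as sketched below. This is a minimal illustration, not the competition's exact scoring code; following the GEFCom convention, power is normalized by installed capacity, so with power already in [0, 1] the capacity is 1.0.

```python
import numpy as np

def nrmse(p_true, p_pred, capacity=1.0):
    """Root-mean-square error normalized by installed capacity."""
    err = (np.asarray(p_true) - np.asarray(p_pred)) / capacity
    return float(np.sqrt(np.mean(err ** 2)))

# Example on a tiny normalized power series
score = nrmse([0.2, 0.5, 0.8], [0.3, 0.5, 0.6])
```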

4.1 Experimental results

Looking at the primary results presented in this study, we see in table 4.1 that Expektra's ANN model is able to achieve performance comparable to the other methods published with Hong et al. [2014]. This can be used as a starting point for further investigations. A similar approach to Expektra's is Duehee Lee's [Lee and Baldick, 2014], as both methods are based on the same neural network architecture but with different optimization techniques. Duehee Lee's network is structured slightly differently from Expektra's model, which has only one output node, whereas Duehee Lee uses five (representing five prediction steps ahead). Besides these differences, Duehee Lee has also included additional sub-models to create a prediction ensemble, blending the output of the ANN with a Gaussian Process (GP) to improve very short-term predictions. MTU EE5260 is also a neural network approach that converts meteorological forecasts to power, and here the score is very close. The HTM CLA beats the persistence model but needs more work to outperform the other models in GEFCom.


4.2 Input Importance

For interpretation purposes, in order to understand the model better, an analysis of the relative input parameter importance was performed. This analysis is illustrated in figure 4.11. Each box represents added noise to one channel, which results in a higher NRMSE score if that feature was important. A reference point, "all-channels", represents the error distribution of the model with no input replacement. We clearly see in this figure that the wind speed channel ws is the most important attribute: adding noise to this channel greatly affects the NRMSE score. We also see that the wind components u, v show little to no influence, and that the timestamp-related inputs hours and week both indicate that there are seasonal and daily trends present in the dataset.
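The noise-injection procedure behind this analysis can be sketched roughly as follows. This is a simplified illustration in the spirit of HIPR [Kemp et al., 2007], not the exact code used in this study; the `predict` callable and the error scorer are generic assumptions. Each input channel in turn is replaced with noise, and the resulting NRMSE distribution indicates that channel's importance.

```python
import numpy as np

def input_importance(predict, X, y, error_fn, n_repeats=30, seed=0):
    """Score each input channel by the error obtained when that channel
    is replaced with uniform noise drawn over its observed range."""
    rng = np.random.default_rng(seed)
    scores = {}
    for ch in range(X.shape[1]):
        errs = []
        for _ in range(n_repeats):
            Xn = X.copy()
            # Replace one channel with noise; the rest stay untouched
            Xn[:, ch] = rng.uniform(X[:, ch].min(), X[:, ch].max(), X.shape[0])
            errs.append(error_fn(y, predict(Xn)))
        scores[ch] = errs  # error distribution per channel, as in the box plot
    return scores

# Toy check: a "model" that uses only channel 0
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, (200, 2))
y = X[:, 0].copy()
rms = lambda t, p: float(np.sqrt(np.mean((t - p) ** 2)))
scores = input_importance(lambda X: X[:, 0], X, y, rms)
```

Noising the channel the model actually depends on inflates the error, while noising an unused channel leaves it unchanged.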

Figure 4.11: Relative input parameter importance using HIPR. "all-channels" reflects the reference point: the network when no channel has been exposed to noise. ws = wind speed (and its shifted copies ws-3 ... ws+3), hours and week = timestamps, u, v is a directional vector of the wind.


4.2.1 Adaptation and Optimization

All training for Expektra's model was performed on a MacBook Pro with an Intel Core 2 Duo processor (P8600, 2.40 GHz) on a 64-bit operating system with 4 GB of installed RAM. After adapting the code to run experiments on the GEFCom dataset, an investigation was performed to evaluate the performance of the core source code provided by Expektra. This investigation identified some slow parts of the code; after addressing these issues, mainly by using a Math.NET native library instead of the managed provider, a speed test was performed comparing the unoptimized version with the optimized one. Figure 4.12 shows the speed-up achieved during training using the LM algorithm.
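The LM iteration at the core of this training loop can be sketched as follows. This is a minimal NumPy illustration of the Levenberg-Marquardt update, not Expektra's actual implementation; the damping schedule and the toy linear fit are simplified assumptions.

```python
import numpy as np

def levenberg_marquardt(residual, jacobian, w0, n_iter=50, mu=1e-2):
    """Minimal LM loop: w <- w - (J^T J + mu I)^-1 J^T r."""
    w = np.asarray(w0, dtype=float)
    for _ in range(n_iter):
        r = residual(w)
        J = jacobian(w)
        A = J.T @ J + mu * np.eye(w.size)
        step = np.linalg.solve(A, J.T @ r)
        w_new = w - step
        if np.sum(residual(w_new) ** 2) < np.sum(r ** 2):
            w, mu = w_new, mu * 0.5   # accept step, trust the quadratic model more
        else:
            mu *= 2.0                 # reject step, lean towards gradient descent
    return w

# Toy check: fit y = a*x + b to slightly noisy data
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.01, x.size)
res = lambda w: w[0] * x + w[1] - y
jac = lambda w: np.column_stack([x, np.ones_like(x)])
w = levenberg_marquardt(res, jac, [0.0, 0.0])
```

The damping term mu interpolates between Gauss-Newton (small mu) and gradient descent (large mu), which is what makes LM robust for training small networks.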

Figure 4.12: Training time of the unoptimized vs. the optimized version when training a network with 10 input neurons, 10/15/20/25 hidden neurons, and 1 output neuron. Training was done for 100 epochs using LM.

4.3 Summary

A plot showing the average NRMSE improvement over the persistence model can be seen in figure 4.13. Both models are able to perform better than persistence: Expektra's model performs better than the NuPIC model over the whole forecast horizon, and NuPIC performs better than persistence towards the end of the forecast.
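The persistence baseline and the improvement score plotted here can be sketched as follows. This is a minimal illustration under the usual definitions (cf. Madsen et al. [2005]); treating NRMSE as a plain RMSE on power already normalized to [0, 1] is an assumption.

```python
import numpy as np

def persistence_forecast(p, k):
    """Persistence: the k-hour-ahead forecast is the last observed value."""
    return np.asarray(p)[:-k]  # forecasts aligned with p[k:]

def nrmse(p_true, p_pred):
    return float(np.sqrt(np.mean((np.asarray(p_true) - np.asarray(p_pred)) ** 2)))

def improvement(p_true, model_pred, ref_pred):
    """Percent improvement in NRMSE over a reference model."""
    return 100.0 * (1.0 - nrmse(p_true, model_pred) / nrmse(p_true, ref_pred))

# A perfect model improves 100% over persistence on a varying series
p = np.array([0.1, 0.4, 0.2, 0.7, 0.5])
truth = p[1:]
ref = persistence_forecast(p, 1)
imp = improvement(truth, truth, ref)
```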

Figure 4.13: Summarized average improvement (%) in NRMSE over persistence across all wind farms, with 95% confidence intervals, vs. look-ahead time (in hours), for Expektra and NuPIC.


Chapter 5

Discussion

With the exponential growth of computer power it becomes easier to study deeper and more complex networks, and with recent advances in the area of deep learning, interest in ANNs has resurged. In practice, ANNs have been used by most groups in the field of WPF, but these networks never caught on, as it was argued that the improvements made by ANNs were not usually enough to justify the extra effort of training them [Giebel et al., 2011]. This is steadily becoming less of an issue as computation becomes cheaper and bigger networks perform better.

By using a simple MLP network, we observe that we are able to obtain results similar in performance to other models published in Hong et al. [2014]. This can be used as a starting point for further investigations. The GEFCom dataset has a limited number of input features; with access to a wider range of features, the power of neural networks could be studied more in depth, but given the small number of features in the dataset this is hard to do.

Using persistence as a reference model was done in order to have the same baseline as the one used in GEFCom, but a better reference model should be considered in the future, such as the one presented in Nielsen et al. [1998], especially if longer forecasts are to be considered.

5.1 Method development issues

Working with Expektra's model is very straightforward and no major issues were encountered during development, but some issues were encountered working with NuPIC:1

1 It should also be pointed out that it helps to have the people who developed the code on the spot, which is probably the main reason for the little trouble.


1. Installing NuPIC is difficult. This seems to be a general problem, judging by posts on the NuPIC mailing list.2 This is a very important issue to fix if they want more people to work with this model.

2. The built-in swarming (PSO) did not work well, especially for slightly larger datasets and for 48 prediction steps. The main problems were fitting everything into memory and that the code ran too slowly when swarming over multiple step-ahead predictions, making it practically impossible to use. Scaling down the problem, with just a few prediction steps and multiple models, helped, but the swarm model kept dismissing wind speed as an important feature, which was fixed by explicitly telling it not to touch that encoder.

3. Inconsistencies between the white-paper documentation of HTM CLA and the actual implementation make it hard to understand the underlying principles of NuPIC. NuPIC presents a very complex network, and any inconsistencies in documentation make it hard to understand. The code is quite well documented, but the additional material is a bit sparse.

These issues are all understandable and are continuously being improved upon. NuPIC is still a young platform, so issues are to be expected given the complexity and research nature of the project. Training the CLAClassifier to use multiple step predictions instead of just one will most likely reduce the bias error seen in the result. To investigate this, a more powerful computer is needed.

There are some differences in the input between the models. This setup was created because an OPF model needs to encode the value it is going to predict. SCADA data is available, so this is not an issue, but it may introduce some unfairness between NuPIC and Expektra. In the current setup, NuPIC does not seem to gain anything extra from the additional information, and other models in the competition will most likely use different inputs as well.

In general, NuPIC is different in the way you feed data: you do so by sending in the front of the signal, and temporal context is achieved through the temporal memory.

5.2 Future improvements and directions

One issue with the GEFCom dataset is that it does not contain any specific information about the location of these 7 wind farms. It is worth considering how to handle different models that are specialized for different types

2 http://numenta.org/lists


of terrain, as this has been shown to increase performance [Kariniotakis et al., 2006]. Another thing to investigate could be a more advanced scheme for hyperparameter selection: Jursa and Rohrig [2008] show that the WPF error can be reduced by smarter use of optimization algorithms for feature selection and hyperparameter selection, which should improve the performance.

In general, having more types of inputs from sources like SCADA and NWP systems could give a lot of valuable information that can be used for better predictions. It is clear that wind speed is a key feature for producing good forecasts, which is also supported by the input analysis in section 4.2. It has been argued that the error in WPF models stems largely from wrong weather forecasts [Giebel et al., 2011], so identifying good sources for weather forecasts would be desirable for more effective model learning. Combining several different weather forecasts could also be investigated, as Nielsen et al. [2007] showed that power forecasts based on a number of different meteorological forecasts were better than those from a single source.

Regarding HTM CLA, it is worth considering implementing a custom encoder for it: an encoder specifically targeted at data concerning wind farms. SCADA data could also be streamed directly into a NuPIC model, which would allow for some interesting online predictions and anomaly detections.

Training times are always an issue, especially for very large models. Both NuPIC and Expektra's model would benefit from being implemented in a more parallel fashion, as these algorithms are very well suited for parallel computing. The speed-up would be of great value, so looking into and implementing a GPU version of the algorithms could be worthwhile.

5.3 Conclusions

The field of energy forecasting is still young, and the necessity for good forecasting techniques is constantly growing. This initial study shows that part of Expektra's forecasting method can be used to forecast wind power generation. We observe that this ANN is able to achieve performance comparable to other methods published in Hong et al. [2014]. Additional work is still needed to obtain state-of-the-art results. Numenta's model is able to beat the reference model but still needs additional work before we can draw definite conclusions about its performance. Constructing a custom encoder could be a first step in that direction.


Bibliography

Subutai Ahmad and Jeff Hawkins. Properties of sparse distributed representations and their application to hierarchical temporal memory. arXiv preprint arXiv:1503.07469, 2015.

T. C. Akinci. Short term wind speed forecasting with ANN in Batman, Turkey. Elektronika ir Elektrotechnika, 107(1):41-45, 2015.

E. Michael Azoff. Neural network time series forecasting of financial markets. John Wiley & Sons, Inc., 1994.

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13(1):281-305, 2012.

Daniel P. Buxhoeveden and Manuel F. Casanova. The minicolumn hypothesis in neuroscience. Brain, 125(5):935-951, 2002.

Erasmo Cadenas and Wilfrido Rivera. Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks. Renewable Energy, 34(1):274-278, 2009.

Dmitri B. Chklovskii, B. W. Mel, and K. Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782-788, 2004.

A. Costa, A. Crespo, J. Navarro, G. Lizcano, H. Madsen, and E. Feitosa. A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews, 12(6):1725-1744, August 2008. ISSN 1364-0321. doi: 10.1016/j.rser.2007.01.015. URL http://dx.doi.org/10.1016/j.rser.2007.01.015.

Russ C. Eberhart and James Kennedy. A new optimizer using particle swarm theory. In Proceedings of the sixth international symposium on micro machine and human science, volume 1, pages 39-43. New York, NY, 1995.


Shu Fan, James R. Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. Energy Conversion, IEEE Transactions on, 24(2):474-482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento - a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276. Citeseer, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition EWEC 2008, 6 pages. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving hierarchical temporal memory-based trading models. Springer, 2013.

M. A. Gaertner, C. Gallardo, C. Tejeda, N. Martínez, S. Calabria, N. Martínez, and B. Fernández. The casandra project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: The next generation prediction system. In Proc. of the 2001 European Wind Energy Conference EWEC'01, Copenhagen, Denmark, pages 777-780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: A literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G. Lobo. Sipreólico - wind power prediction tool for the Spanish peninsular power system. Proceedings of the CIGRÉ 40th general session and exhibition, París, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357-363, 2014.

Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359-366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41-46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585-592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694-709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference EWEC'07, 2007.

G. Kariniotakis. Position paper on Joule project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J. Halliday, R. Brownsword, Ignacio Marti, Ana Maria Palomares, I. Cruz, H. Madsen, T. S. Nielsen, Henrik Aa. Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power - overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, 10 pages, 2006.

G. N. Kariniotakis, G. S. Stavrakakis, and E. F. Nogaret. Wind power forecasting using advanced neural networks models. Energy Conversion, IEEE Transactions on, 11(4):762-767, 1996.

Stanley J. Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326-334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275-293, 2009.

Lars Landberg and Simon J. Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171-195, 1994.

S. Lang, C. Mohrlen, J. Jorgensen, B. O. Gallachóir, and E. McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International solar energy society UK section conference C, volume 85, page 89, 2006.


S. M. Lawan, W. A. W. Z. Abidin, W. Y. Chai, A. Baharun, and T. Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760-1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. Smart Grid, IEEE Transactions on, 5(1):501-510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets, 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164-168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313-2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154-3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa. Nielsen, and Torben S. Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475-489, 2005.

Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431-441, 1963.

Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115-133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons. 1969.

Marvin Lee Minsky and Oliver G. Selfridge. Learning in random nets. MIT Lincoln Laboratory, 1960.

Jorge J. Moré. The Levenberg-Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105-116. Springer, 1978.

Henrik Aa. Nielsen, Torben S. Nielsen, Henrik Madsen, Maria J. Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471-482, 2007.

Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29-34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power, 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159-167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms, v0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33-57, 2007.

A. Rodrigues, J. A. Peças Lopes, P. Miranda, L. Palma, C. Monteiro, R. Bessa, J. Sousa, C. Rodrigues, and J. Matos. Eprev - a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference EWEC, volume 7, 2007.

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5:3, 1988.

Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85-117, 2015.

S. Sinkevicius, R. Simutis, and V. Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91-96, 2011.

Ke-Sheng Wang, Vishal S. Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61-69, 2014.

WWEA. 2014 half-year report, WWEA, pp. 1-8. Technical report, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365-376, 2013.

Hao Yu and Bogdan M. Wilamowski. Levenberg-Marquardt training. Industrial Electronics Handbook, 5:12-1, 2011.

Appendix A

Hyper-parameters

Parameters for the spatial pooler

Alias                Default  Description
columnCount          -        The number of cell columns in a cortical region
globalInhibition     false    If true, the winning columns in the inhibition phase are selected as the most active columns from the region as a whole
numActivePerInhArea  10       The maximum number of active columns per inhibition area
synPermActiveInc     0.1      The amount by which an active synapse is incremented in each round, specified as a percent of a fully grown synapse
synPermConnected     0.10     Controls the threshold at which synapses are considered connected
synPermInactiveDec   0.01     The amount by which an inactive synapse is decremented in each round
potentialRadius      16       Determines the extent of the input that each column can potentially be connected to

Table A.1: Configuration parameters for the spatial pooler.


Parameters for the scalar encoder

Alias       Symbol  Description
w           w       Number of bits to set in the output
minval      vmin    The lower bound of the input value
maxval      vmax    The upper bound of the input value
n           n       Number of bits in the representation (n must be > w)
radius      r       Inputs separated by more than or equal to this distance will have non-overlapping representations
resolution  ψ       Inputs separated by more than or equal to this distance will have different representations

Table A.2: Configuration parameters for the scalar encoder.
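As a rough illustration of how a scalar encoder of this kind works (a simplified sketch, not NuPIC's actual implementation), a value in [minval, maxval] is mapped to a contiguous run of w active bits inside an n-bit array, so that nearby values share active bits:

```python
def encode_scalar(value, minval, maxval, n, w):
    """Encode a scalar as n bits with a contiguous run of w active bits.

    Nearby values get overlapping runs; values far enough apart share no bits.
    """
    if not minval <= value <= maxval:
        raise ValueError("value out of range")
    n_buckets = n - w + 1  # number of possible start positions for the run
    bucket = int((value - minval) / (maxval - minval) * (n_buckets - 1) + 0.5)
    bits = [0] * n
    for i in range(bucket, bucket + w):
        bits[i] = 1
    return bits

# Two nearby wind speeds produce overlapping encodings
a = encode_scalar(5.0, 0.0, 25.0, n=14, w=3)
b = encode_scalar(6.0, 0.0, 25.0, n=14, w=3)
```

This overlap property is what lets the spatial pooler treat similar wind speeds as similar inputs.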

Parameters for the temporal memory

Alias                  Default  Description
activationThreshold    12       Activation threshold for segments
cellsPerColumn         32       Number of cells per column
columnCount            2048     The number of cell columns in a cortical region
globalDecay            0.10     Decrements all synapses a little bit all the time
initialPerm            0.11     Initial permanence value for a synapse
inputWidth             -        Size of the input
maxAge                 100000   Controls global decay: global decay will only decay segments that have not been activated for maxAge iterations, and will only run the global decay loop every maxAge iterations
maxSegmentsPerCell     -        The maximum number of segments a cell can have
maxSynapsesPerSegment  -        The maximum number of synapses a segment can have
minThreshold           8        The minimum required activity for a segment to learn
newSynapseCount        15       The maximum number of synapses added to a segment during learning
permanenceDec          0.10     How much permanence is removed from synapses when learning occurs
permanenceInc          0.10     How much permanence is added to synapses when learning occurs
temporalImp            cpp/py   Controls which temporal memory implementation to use

Table A.3: Configuration parameters for the temporal memory.

Appendix B

Wind characteristics

Figure B.1: Wind characteristics for WF 1 and WF 2 in the GEFCom dataset, showing zonal and meridional wind components. The intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Figure B.2: Wind characteristics for WF 3-7 in the GEFCom dataset, showing zonal and meridional wind components. The intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.

Appendix C

Error Distribution

Figure C.1: Error distribution for different lead times (1, 10, 20, 30, 40, 48 hours), WF 1 using NuPIC.

Figure C.2: Error distribution for different lead times (1, 10, 20, 30, 40, 48 hours), WF 2 using NuPIC.

Figure C.3: Error distribution for different lead times (1, 10, 20, 30, 40, 48 hours), WF 3 using NuPIC.

Figure C.4: Error distribution for different lead times (1, 10, 20, 30, 40, 48 hours), WF 4 using NuPIC.

Figure C.5: Error distribution for different lead times (1, 10, 20, 30, 40, 48 hours), WF 5 using NuPIC.

Figure C.6: Error distribution for different lead times (1, 10, 20, 30, 40, 48 hours), WF 6 using NuPIC.

Figure C.7: Error distribution for different lead times (1, 10, 20, 30, 40, 48 hours), WF 7 using NuPIC.

Figure C.8: Error distribution for different lead times (1, 10, 20, 30, 40, 48 hours), WF 1 using Expektra.

Figure C.9: Error distribution for different lead times (1, 10, 20, 30, 40, 48 hours), WF 2 using Expektra.

Figure C.10: Error distribution for different lead times (1, 10, 20, 30, 40, 48 hours), WF 3 using Expektra.

71


Figure C.11: Error distribution for different lead times, WF 4. (Histogram panels of error, −1.0 to 1.0, vs. frequency, for lead times 1, 10, 20, 30, 40 and 48 hours.)

Figure C.12: Error distribution for different lead times, WF 5. (Histogram panels of error, −1.0 to 1.0, vs. frequency, for lead times 1, 10, 20, 30, 40 and 48 hours.)


Figure C.13: Error distribution for different lead times, WF 6. (Histogram panels of error, −1.0 to 1.0, vs. frequency, for lead times 1, 10, 20, 30, 40 and 48 hours.)

Figure C.14: Error distribution for different lead times, WF 7. (Histogram panels of error, −1.0 to 1.0, vs. frequency, for lead times 1, 10, 20, 30, 40 and 48 hours.)

List of Figures

2.1 A figure that presents the general outline when forecasting using the statistical approach
2.2 A figure that presents the general steps when forecasting using a physical model
3.1 The perceptron
3.2 Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of H hidden layers, as well as M input connections. Each edge seen in this graph has a w_ij associated with it
3.3 Information flow of a single region predictive model created with the OPF
3.4 The CLAClassifier
3.5 Training a OPF model
4.1 Left diagram: Cumulated probability of wind speed. Right diagram: Scatter diagram of the power curve, production vs. wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.2 Left diagram: Wind speed vs. production. Right diagram: Wind speed vs. direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.3 Left diagram: Wind speed vs. production. Right diagram: Wind speed vs. direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.4 Different error measurements for WF 1
4.5 Different error measurements for WF 2
4.6 Different error measurements for WF 3
4.7 Different error measurements for WF 4
4.8 Different error measurements for WF 5
4.9 Different error measurements for WF 6
4.10 Different error measurements for WF 7
4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point; the reference point is the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind
4.12 Plot showing the unoptimized version vs. the optimized when training a network with 10 input neurons; 10, 15, 20, 25 hidden neurons; 1 output neuron. Training was done for 100 epochs using LM
4.13 Summarized average improvement over all wind farms with 95% confidence intervals
B.1 Wind characteristics for WF 1 and WF 2. The GEFCom dataset showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated, the stronger the more power
B.2 Wind characteristics for WF 3-7. The GEFCom dataset showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated, the stronger the more power
C.1 Error distribution for different lead times, WF 1
C.2 Error distribution for different lead times, WF 2
C.3 Error distribution for different lead times, WF 3
C.4 Error distribution for different lead times, WF 4
C.5 Error distribution for different lead times, WF 5
C.6 Error distribution for different lead times, WF 6
C.7 Error distribution for different lead times, WF 7
C.8 Error distribution for different lead times, WF 1
C.9 Error distribution for different lead times, WF 2
C.10 Error distribution for different lead times, WF 3
C.11 Error distribution for different lead times, WF 4
C.12 Error distribution for different lead times, WF 5
C.13 Error distribution for different lead times, WF 6
C.14 Error distribution for different lead times, WF 7

List of Tables

3.1 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, missing data with power observations are available for updating the models
3.2 Preprocessed features from the dataset. The Forecast category represents the forecast given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the feature we will use in training and testing
3.3 Example, where n = 14, r = 5, ψ = 1, of various encoded scalar values using a ScalarEncoder
4.1 NRMSE score of the entries published in [Hong et al., 2014]. The NuPIC model and Expektra model are added so we can easily compare the results
A.1 Table containing configuration parameters for the encoder
A.2 Table containing configuration parameters for the spatial pooler
A.3 Table containing configuration parameters for the temporal memory



• Contents
• Introduction
  • Problem Formulation
  • The scope of the problem
• Background
  • Neural Networks and Time Series Prediction
• Method and Materials
  • Preliminaries
    • Remarks
    • Definitions
    • Reference models
    • Error metrics
    • Model selection
    • Evaluation
  • Experiments
  • Holdback Input Randomization
  • Optimization methods
  • Neural Networks
    • Multilayer Perceptron
    • Numenta Platform for Intelligent Computing
• Result
  • Experimental results
  • Input Importance
  • Adaptation and Optimization
  • Summary
• Discussion
  • Method development issues
  • Future improvements and directions
  • Conclusions
• Bibliography
• Appendices
  • Hyper-parameters
  • Wind characteristics
  • Error Distribution
• List of Figures
• List of Tables

3.1 PRELIMINARIES

3.1.5 Model selection

In regression and classification, one of the main issues we are faced with is "How do we create a good model?" One way to define this "goodness" is to look at the model's generalization ability, i.e. its performance on unseen data.

To make sure the model we are building will generalize to new, unseen data, it is important how we select and build the model in the first place. What we have control over is the data we have at hand and how we use that data to build our model.

Training, testing and validating

In the case of supervised learning we have access to a target series and its associated features, i.e. the production series (our target) and wind speed and wind direction as features.

Common practice within machine learning, which is adhered to in this thesis, is to split the whole dataset into 3 smaller subsets. Two of these subsets are used to find a good model (the training set and the validation set), and the remaining one (the test set) is used for evaluation, i.e. to estimate how accurate the model we have created would be on "unseen data". With highly flexible models like an artificial neural network we need to be careful not to overfit the data, which is one of the reasons why we have the validation set: we don't want to create a model that doesn't generalize well because it is fitted to every minor variation, i.e. it has captured a lot of noise.
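A minimal sketch of such a three-way split (a simple sequential split for illustration; the 60/20/20 proportions are the ones used later in this chapter for the MLP models, and the function name is my own):

```python
def train_val_test_split(data, train_frac=0.6, val_frac=0.2):
    """Split a dataset into training, validation and test subsets.

    The training and validation sets are used to find a good model;
    the test set is held out to estimate performance on unseen data.
    """
    n = len(data)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = data[:n_train]
    val = data[n_train:n_train + n_val]
    test = data[n_train + n_val:]  # never touched during model selection
    return train, val, test

train, val, test = train_val_test_split(list(range(100)))
# len(train) == 60, len(val) == 20, len(test) == 20
```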

3.1.6 Evaluation

The improvement with respect to a considered reference model ref is defined in equation 3.10:

I_{EC_k}^{ref} = 100 \cdot \frac{EC_k^{ref} - EC_k}{EC_k^{ref}} \;(\%) \qquad (3.10)

where the Evaluation Criterion (EC) is any error measurement, such as NMSE or NMAE, etc.
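Equation 3.10 translates directly into a small helper (a sketch; the names are illustrative):

```python
def improvement(ec_ref, ec_model):
    """Improvement (%) of a model over a reference model, eq. 3.10.

    `ec_ref` and `ec_model` are values of any evaluation criterion
    (e.g. NMAE or NRMSE) for the reference and the candidate model.
    A positive value means the candidate beats the reference.
    """
    return 100.0 * (ec_ref - ec_model) / ec_ref

improvement(0.20, 0.15)  # ~25: the candidate improves on the reference by 25%
```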


CHAPTER 3 METHOD AND MATERIALS

Testing period

Date         Time   Forecast
2011-01-01   01:00  1-48 hours
2011-01-04   13:00  1-48 hours
2011-01-08   01:00  1-48 hours
2011-01-11   13:00  1-48 hours
...          ...    ...
2012-06-23   01:00  1-48 hours
2012-06-26   13:00  1-48 hours

Table 3.1: The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, missing data with power observations are available for updating the models.

3.2 Experiments

Training and testing are structured based on the structure of the GEFCom. The dataset spans from midnight of 1 July 2009 to noon of 26 June 2012. The period from 1 July 2009 to 1 January 2011 at 01:00 is used for training and validating, while the rest is used for testing. In the testing range a number of 48-hour periods are defined (see table 3.1); in between these testing periods exists additional training data, which enables the models to be updated between forecasts.

The testing periods repeat every 7 days until the end of the dataset, and only meteorological forecasts that were relevant for the periods with missing power data are given; this was done in order to be consistent.

Meteorological forecasts in the GEFCom dataset consist of zonal u and meridional v components collected for surface winds at 10 m above ground level. They were extracted from the archive of the European Centre for Medium-range Weather Forecasts (ECMWF), a service that issues high-resolution deterministic forecasts twice a day, at 00 Coordinated Universal Time (UTC) and 12 UTC; each of these forecast periods consists of data for 1-48 hours ahead. In order to match the hourly resolution of the power measurements, forecasts were interpolated using cubic splines to an hourly resolution. A summary of the features found in the dataset is given in table 3.2.



No  Category  Parameter             Alias  Type

1   Date      Date                  date   String
2   Date      Year                  year   Integer
3   Date      Month                 month  Integer
4   Date      Day                   day    Integer
5   Date      Hour                  hours  Integer
6   Date      Week                  week   Integer
7   Forecast  Wind Speed            ws     Real
8   Forecast  Wind Direction (deg)  wd     Real
9   Forecast  Wind U                u      Real
10  Forecast  Wind V                v      Real
11  Forecast  Issued                hp     Integer
12  SCADA     Production            wp     Real

Table 3.2: Preprocessed features from the dataset. The Forecast category represents the forecast given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the feature we will use in training and testing.



The database for the meteorological forecasts given by the GEFCom dataset contains sections of missing data, each corresponding to a date for which the missing power information exists; these sections were filled out, in a pre-processing step, with the previous best forecast that was available. For example, if we do not have data for the forecast issued at 2011-01-01 12:00, we use data from 2011-01-01 00:00, as a 48-hours-ahead forecast is available for this date. If it were the case that the previous section also contained missing data, we would go back an additional section and use those forecasts. If no forecasts are available 48 hours back, we can use the best known forecast and extend it in the same fashion as the persistence model.

Any predictions made by these models should fall within a certain range, so any forecast outside this range will be clamped. This is the main post-processing step being done, i.e. there is an upper limit on what we can produce.
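A sketch of this clamping step, assuming normalized power in [0, 1] (the bounds are illustrative, not stated explicitly in the thesis):

```python
def clamp_forecast(p, lower=0.0, upper=1.0):
    """Post-process a power forecast: values outside the physically
    possible range are clamped to its limits."""
    return max(lower, min(upper, p))

[clamp_forecast(p) for p in (-0.1, 0.42, 1.3)]  # [0.0, 0.42, 1.0]
```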

3.3 Holdback Input Randomization

The Holdback Input Randomization (HIPR) method, described in Kemp et al. [2007], can be used to investigate the importance of the input parameters. It works by sequentially feeding each data point in the test set to the neural network while replacing the values of one input parameter at a time. The replacement values are uniformly distributed random values in the range the neural network was originally trained on, i.e. (-1, 1). An NRMSE score is calculated for each replacement. The result is that we get information about the relevance of each input.
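A sketch of HIPR under these assumptions (`model_predict` and `nrmse` are placeholder callables, not the implementation used in the thesis):

```python
import random

def hipr(model_predict, X, y, nrmse, rng=random.Random(0)):
    """Holdback Input Randomization sketch: score each input channel
    by the NRMSE obtained when that channel alone is replaced with
    uniform noise in [-1, 1] (the range the network was trained on).

    X is a list of feature vectors (lists), y the target values.
    A channel whose randomization barely changes the error is
    unimportant; a large error increase marks an important channel.
    """
    scores = []
    n_features = len(X[0])
    for j in range(n_features):
        X_noisy = [row[:] for row in X]       # copy the test set
        for row in X_noisy:
            row[j] = rng.uniform(-1.0, 1.0)   # randomize channel j only
        preds = [model_predict(row) for row in X_noisy]
        scores.append(nrmse(y, preds))
    return scores  # one NRMSE per input channel
```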

3.4 Optimization methods

The two main optimization algorithms that have been used in this thesis are the Particle Swarm Optimization (PSO) algorithm, presented by Eberhart and Kennedy [1995], and the Levenberg-Marquardt (LM)² algorithm, independently developed by Kenneth Levenberg and Donald Marquardt [Marquardt, 1963; Levenberg, 1944]. The PSO algorithm is a population-based stochastic algorithm similar to a Genetic Algorithm (GA), but based on social-psychological principles instead of evolution. It

² The LM algorithm was used in the beginning of the project, but the results reported ended up using the PSO algorithm. I have included the LM algorithm in this section because speed optimization was measured using this algorithm.



can be summarized by the following steps. Each network was trained multiple times in order to avoid local minima.

• Step 1: Initialize particles with random velocities and accelerations.

• Step 2: Determine which particle is closest to the goal.

• Step 3: Adjust accelerations toward that particle.

• Step 4: Update particle positions based on their velocity, and update velocities based on acceleration.

• Step 5: Go to step 2.
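The steps above can be sketched as a toy global-best PSO minimizing an arbitrary function (the inertia 0.7 and acceleration coefficients 1.5 are common textbook choices, not necessarily the ones used in this thesis):

```python
import random

def pso(f, dim, n_particles=20, iters=100, rng=random.Random(1)):
    """Minimal global-best PSO sketch: minimize f over R^dim."""
    pos = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]               # personal best positions
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # step 2: global best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # step 3: accelerate toward personal and global best
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * r1 * (pbest[i][d] - pos[i][d])
                             + 1.5 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]        # step 4: move the particle
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

best, val = pso(lambda x: sum(v * v for v in x), dim=3)
```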

The LM algorithm is a combination of the Error Back-Propagation (EBP) method [Rumelhart et al., 1988] and the Gauss-Newton method. This algorithm has the speed advantage of Gauss-Newton and the stability of EBP [Yu and Wilamowski, 2011]. A detailed treatment of LM can be found in Moré [1978] and of PSO in Poli et al. [2007].

3.5 Neural Networks

3.5.1 Multilayer Perceptron

The perceptron is built around a nonlinear model of a neuron, the McCulloch-Pitts model of a neuron [McCulloch and Pitts, 1943]. It basically consists of a 2-step process where the cell body contains a summation function of the weighted sum of all inputs, including a bias. The perceptron is described by equation 3.11. The sum s is passed through an activation function (see section 3.5.1) which mimics the activation, or firing, of the neuron.

s = \sum_{i=1}^{M} w_i x_i + x_0 \qquad (3.11)

w_i is the weight of the "synapse" of the input channel and is the parameter we want to adjust, x_i is the input value and x_0 is the bias; M denotes the number of inputs we have.



Figure 3.1: The perceptron. (Input signals and a bias are weighted, summed, and passed through the activation function f(s) to produce the output signal.)

The structure of the MLP consists of many perceptrons, and it is shown in figure 3.2. The input signals flow from the input layer at the bottom to the output layer at the top. We have a bias signal, seen on the left side of the diagram, which is set to a fixed number.

MLPs are fully connected networks, meaning that the neurons in any layer of the network are connected to all the neurons in the previous layer. Each connection in the network has a weight w_ij associated with it. The initialization of these weights is done before any training and is achieved by randomly assigning very small values to the respective weights in the graph. The architecture used in this study has only one output variable, i.e. the function we approximate has an output that produces forecasts of the power generation given a certain input.



Figure 3.2: Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of H hidden layers, as well as M input connections (inputs: hours, u, v, week, ws, ws-1, ws-2, ws, ws+1, ws+2; tanh hidden layers, linear output; a bias signal feeds each layer). Each edge seen in this graph has a w_ij associated with it.

The performance of neural networks is generally improved if the data is normalised. This is because using the original data directly can cause convergence problems. Normalization is done using the mapminmax function, seen in equation 3.12.

y = \frac{(y_{max} - y_{min}) \cdot (x - x_{min})}{x_{max} - x_{min}} + y_{min} \qquad (3.12)

y_max is the max value of the specified range, which in this case is 1, and y_min is -1; x is the value to be scaled, x_max is the max value of the numbers to be scaled, and x_min is the min value of the numbers to be scaled.
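A direct translation of equation 3.12 (a sketch; this simplified version maps a single sequence, whereas MATLAB's mapminmax also handles matrices row-wise):

```python
def mapminmax(x, ymin=-1.0, ymax=1.0):
    """Normalize a sequence to [ymin, ymax] as in equation 3.12."""
    xmin, xmax = min(x), max(x)
    return [(ymax - ymin) * (v - xmin) / (xmax - xmin) + ymin for v in x]

mapminmax([0.0, 5.0, 10.0])  # [-1.0, 0.0, 1.0]
```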

The forecasting model consists of 7 different networks, one for each wind farm; these networks are trained on the first section of the dataset, before the test period. A random 60/20/20 split is created, where 60% of the available data is used for training, 20% of the data is used to validate the network, and 20% is set to be a hold-out set for the hyperparameters. Input features³ fed into the models are

³ See table 3.2.



ws, u, v, hours, ws, ws+1, ws+2, ws-1, ws-2, where ws+x and ws-x denote a time shift of x; we can use ws+x as input into the model because ws is a forecast in itself.

Activation Functions

The computation done for each neuron in the multilayer perceptron requires knowledge of the derivative of the activation function; in other words, the activation function we pick needs to be differentiable. In this thesis we use the following two activation functions: the hyperbolic tangent function, seen in equation 3.13,

f(s) = \tanh(s) = \frac{e^s - e^{-s}}{e^s + e^{-s}} \qquad (3.13)

and the linear transfer function, seen in equation 3.14.

f(s) = \begin{cases} +1 & \text{if } s \ge 1 \\ s & \text{if } -1 < s < 1 \\ -1 & \text{if } s \le -1 \end{cases} \qquad (3.14)
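The two activation functions can be written directly (a sketch; `satlin` is the common name for the saturating linear transfer of equation 3.14):

```python
import math

def tanh_act(s):
    """Hyperbolic tangent activation, equation 3.13."""
    return math.tanh(s)

def satlin(s):
    """Saturating linear transfer function, equation 3.14."""
    if s >= 1:
        return 1.0
    if s <= -1:
        return -1.0
    return s

[satlin(v) for v in (-2.0, 0.3, 2.0)]  # [-1.0, 0.3, 1.0]
```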

Hyperparameter optimization

In order to obtain the hyperparameters necessary for the respective model for each wind farm, a random hyperparameter search was performed for all models. A hold-out validation set was used to pick the best hyperparameters. Random search has been shown to work better than grid search when not all parameters are equally important [Bergstra and Bengio, 2012].
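A sketch of such a random search (the search space and the `train_eval` callable, which should train a model and return its hold-out validation error, are illustrative placeholders):

```python
import random

def random_search(train_eval, space, n_trials=20, rng=random.Random(0)):
    """Random hyperparameter search: sample each parameter
    independently and keep the configuration with the best
    hold-out validation score."""
    best_cfg, best_err = None, float("inf")
    for _ in range(n_trials):
        cfg = {name: rng.choice(values) for name, values in space.items()}
        err = train_eval(cfg)
        if err < best_err:
            best_cfg, best_err = cfg, err
    return best_cfg, best_err

# Toy example: pretend the best network has 20 hidden neurons, small lr.
space = {"hidden": [10, 15, 20, 25], "lr": [0.1, 0.01, 0.001]}
cfg, err = random_search(lambda c: abs(c["hidden"] - 20) + c["lr"], space)
```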

3.5.2 Numenta Platform for Intelligent Computing

This section describes the key principles introduced with NuPIC. It explains in general terms the theory behind the Online Prediction Framework (OPF)⁴, which uses the CLA and HTM algorithms; the OPF works as an API to create predictive models. The OPF consists of 5 major types of components: Encoders, Spatial Pooler (SP), Temporal Memory (TM), Temporal Pooler (TP) and Classifiers. Together these components construct a single CLA region⁵, and figure 3.3 demonstrates the information flow through a single region.

⁴ The OPF is used with Numenta's commercial product GROK.
⁵ Currently, models created with the OPF do not use a TP, nor does this client allow creation of a hierarchy of regions, i.e. what we would call an HTM; it is just possible to create one region.



Another central concept in NuPIC is the Sparse Distributed Representation (SDR), which refers to the activation of a small percentage of the neurons at any given time; this neuronal activity is represented as an n-dimensional sparse binary vector x = [b_0, ..., b_n] with around 2% active cells. The output from the spatial pooler and the temporal memory are both SDRs, while the output from the encoder does not enforce this and is just a normal binary vector. A general overview of the properties of SDRs has been discussed by Ahmad and Hawkins [2015].

Figure 3.3: Information flow of a single-region predictive model created with the OPF (Encoder → Spatial Pooler → Temporal Memory → Classifier).

The overlap between two different SDRs is defined by

o(x, y) = x \cdot y \qquad (3.15)

A match between two SDRs is defined by

m(x, y) = o(x, y) \ge \theta \qquad (3.16)

where θ is set so that θ ≤ ‖x‖₁ and θ ≤ ‖y‖₁. An interesting property of SDRs, something that is used multiple times inside the temporal memory and especially for predictions, is the fact that a fixed-size set of SDRs can be reliably stored by taking the union as a single pattern. The boolean OR operator is used to create a



Scalar  Encoding

1       11111000000000
2       01111100000000
10      00000000011111

Table 3.3: Example, where n = 14, r = 5, ψ = 1, of various encoded scalar values using a ScalarEncoder.

new vector from a set of SDRs. The downside of storing patterns this way is that the more patterns we store, the bigger the probability of false positives.
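The overlap, match and union operations described above (equations 3.15-3.16 and the OR-union) can be sketched on plain binary lists:

```python
def overlap(x, y):
    """Overlap between two binary vectors, eq. 3.15 (dot product)."""
    return sum(a & b for a, b in zip(x, y))

def match(x, y, theta):
    """Match predicate, eq. 3.16: overlap of at least theta bits."""
    return overlap(x, y) >= theta

def union(sdrs):
    """Store a set of SDRs as one pattern using boolean OR."""
    return [int(any(bits)) for bits in zip(*sdrs)]

a = [1, 0, 1, 0, 0, 1]
b = [1, 0, 0, 0, 1, 1]
overlap(a, b)         # 2
u = union([a, b])     # [1, 0, 1, 0, 1, 1]
match(a, u, theta=3)  # True: the union still matches its members
```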

Encoders

NuPIC contains many different encoders⁶; the job of an encoder is to convert raw input into a more suitable representation (i.e. a binary vector). The raw input is fed into the model using a dictionary data structure.

Useful encoders include scalar and categorical encoders, while more exotic encoders include one for Global Positioning System (GPS) coordinates; this encoder can be used to extract information about anomalous movements. The entries of the dictionary representation of the raw inputs are each encoded separately and concatenated using a multi-encoder.

One property of the spatial pooler is that overlapping input patterns are mapped to the same SDR. This means that we want the encoder to encode input so that similar inputs share bits. The ScalarEncoder fulfils this property by the process illustrated in table 3.3.

We calculate the range of values to encode using equation 3.17; v_min represents the minimum value of the input signal, while v_max denotes the upper bound of the input signal.

v_{range} = v_{max} - v_{min} \qquad (3.17)

There are three mutually exclusive parameters that determine the overall size of the output from a scalar encoder: (n, r, ψ). n directly represents the total number

⁶ A full list of all encoders can be found in the API documentation for NuPIC.



of bits in the output, and it must be bigger than or equal to w, which represents the number of bits that are set to encode a single value, i.e. the "width" of the output signal⁷. r and ψ are specified with respect to the input, while w is specified with respect to the output. Two inputs separated by more than the radius have non-overlapping representations. Two inputs separated by less than the radius will in general overlap in at least some of their bits. Two inputs separated by greater than or equal to the resolution are guaranteed to have different representations. ψ can be calculated using equation 3.18.

\psi = \frac{r}{w} \qquad (3.18)

Depending on whether we want a periodic behaviour or not, n is calculated a bit differently; see the scalar encoder for implementation details⁸.
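A simplified, non-periodic sketch of this encoding, reproducing the rows of table 3.3 (the real NuPIC ScalarEncoder takes more parameters and also handles periodic inputs):

```python
def scalar_encode(value, vmin, vmax, n=14, w=5):
    """Toy ScalarEncoder: w contiguous active bits out of n,
    positioned by where `value` falls in [vmin, vmax]; nearby
    values share bits, distant values do not (cf. table 3.3)."""
    value = max(vmin, min(vmax, value))
    # index of the first active bit; n - w + 1 possible positions
    i = int(round((value - vmin) / (vmax - vmin) * (n - w)))
    return [1 if i <= j < i + w else 0 for j in range(n)]

scalar_encode(1, 1, 10)   # [1,1,1,1,1,0,0,0,0,0,0,0,0,0]
scalar_encode(2, 1, 10)   # [0,1,1,1,1,1,0,0,0,0,0,0,0,0]
scalar_encode(10, 1, 10)  # [0,0,0,0,0,0,0,0,0,1,1,1,1,1]
```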

Spatial Pooler

The spatial pooler receives a binary vector as its input from an encoder and outputs an SDR. The structure of the spatial pooler consists of an input space and a set of columns; the output SDR represents which of the columns in a region are active. Each column has synapses connected to the input space. The SP consists of around 50% randomly and potentially connected synapses; this is called the "potential pool". Each synapse will connect and disconnect with the input space during learning.

The general information flow of the spatial pooler consists of the following steps: input stimuli from lower regions lead to activation of the input space, to which each mini-column is synaptically connected; this activation, the so-called "overlap score", is set for each column. The score is calculated based on the total sum of the neurons that try to influence that column, weighted with a "boosting factor" that tries to increase the chance for certain columns to win more easily. A final top percentage (usually around 2% activity) of the columns with the highest influence (biggest overlap score) will be chosen; columns also inhibit nearby columns, and the result of this process is a binary vector with few active bits: an SDR. This is illustrated in equation 3.19, where b can either be 0 or 1 and s is a value representing the score.

⁷ w must be odd to avoid centering problems.
⁸ https://github.com/numenta/nupic



\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \dots & b_n \end{bmatrix}}_{\text{Input vector}} \cdot \underbrace{\begin{bmatrix} b_{11} & b_{12} & \dots & b_{1n} \\ b_{21} & b_{22} & \dots & b_{2n} \\ \vdots & & & \vdots \\ b_{d1} & b_{d2} & \dots & b_{dn} \end{bmatrix}}_{\text{Connected synapses for each column}} = \underbrace{\begin{bmatrix} s_1 & s_2 & \dots & s_n \end{bmatrix}}_{\text{Overlap score}} \xrightarrow{\text{Inhibition}} \underbrace{\begin{bmatrix} b_1 & b_2 & \dots & b_n \end{bmatrix}}_{\text{Output SDR}} \qquad (3.19)

Learning in this structure is done by adjusting the permanences of the proximal dendrites to better match the input: for each winning column, increase the permanence of the synapses that correctly matched the input and decrease the permanence of the rest. We also increase the boosting factor of losing columns to give them a bigger chance of winning next time.
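A toy version of the overlap-plus-inhibition step of equation 3.19 (learning is omitted, boosting is reduced to a fixed factor per column, and all names are illustrative):

```python
def spatial_pool(input_vec, synapses, boost, k):
    """One spatial-pooler step: boosted overlap scores from each
    column's connected synapses, then global inhibition keeps only
    the top-k columns. `synapses[c]` is column c's binary
    connection vector; `boost[c]` its boosting factor."""
    scores = [boost[c] * sum(a & b for a, b in zip(input_vec, syn))
              for c, syn in enumerate(synapses)]
    winners = sorted(range(len(synapses)), key=lambda c: -scores[c])[:k]
    return [1 if c in winners else 0 for c in range(len(synapses))]

inp = [1, 1, 0, 1]
synapses = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 0, 1]]
spatial_pool(inp, synapses, boost=[1.0, 1.0, 1.0], k=1)  # [0, 0, 1]
```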

Temporal Memory

The temporal memory receives an SDR from the spatial pooler, which represents the active columns; the output of the temporal memory is the activity of the whole CLA region. The general idea of the TM⁹ is that the cells in each mini-column provide temporal context to the pattern identified by the spatial pooler. The TM's job is to form transitions between different SDRs, and it achieves this by allowing active cells to form connections to previously active cells, so that in a future setting each cell is able to predict its own activity.

Each cell in a region has several distal segments, ideally one for each pattern the cell has transitioned from; in practice these segments are connected to a couple of patterns each. A cell enters a predictive state if there is enough activity on a segment, and every segment consists of a collection of connections, synapses that have been formed to a subset of previously active cells (typically around 10-15 cells).

The temporal memory receives a sparse binary vector from the spatial pooler, and the general information flow for this part of the CLA goes through two phases. We have the following steps in the first phase: 1) for each active column, we check to see if there is any cell in a predictive state; if there is a cell in a predictive

⁹ There are a lot of details in how the Temporal Memory is implemented; pseudo-code and more details can be found in [Numenta, 2011]. The nupic git repository is the best source for finer details: https://github.com/numenta/nupic



state, then 2) activate that particular cell. If no cell was found in a predictive state, then 3) activate all cells in that particular column, a process called bursting.

Bursting is analogous to: "we do not know which cell to activate, because we have not seen this instance of the sequence of patterns before and we are unable to put the column into the correct temporal context, so let us activate all cells, reflecting this uncertainty". Bursting does not occur when the temporal context has been predicted by the CLA. Equation 3.20 shows the first phase.

\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \dots & b_n \end{bmatrix}}_{\text{SP SDR}} \cdot \underbrace{\begin{bmatrix} b_{11} & b_{12} & \dots & b_{1n} \\ b_{21} & b_{22} & \dots & b_{2n} \\ \vdots & & & \vdots \\ b_{d1} & b_{d2} & \dots & b_{dn} \end{bmatrix}}_{\text{Predictive state}} = \underbrace{\begin{bmatrix} b_{11} & b_{12} & \dots & b_{1n} \\ b_{21} & b_{22} & \dots & b_{2n} \\ \vdots & & & \vdots \\ b_{d1} & b_{d2} & \dots & b_{dn} \end{bmatrix}}_{\text{Active state}} \qquad (3.20)

The second phase of the algorithm figures out which cells should be turned into a predictive state for the next time step. This is achieved by checking every distal segment on every cell for activity above a certain threshold, i.e. checking for active segments. One active segment is enough to put a cell into a predictive state. The output from the temporal pooler is a vector representing the state of all cells in that region. Equation 3.21 shows phase 2.

\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \dots & b_n \end{bmatrix}}_{\text{Active state}} \cdot \underbrace{\begin{bmatrix} b_{11} & b_{12} & \dots & b_{1n} \\ b_{21} & b_{22} & \dots & b_{2n} \\ \vdots & & & \vdots \\ b_{d1} & b_{d2} & \dots & b_{dn} \end{bmatrix}_X}_{\text{Segment } X} = \underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \dots & b_n \end{bmatrix}_X}_{\text{Segment activation } X} > \tau \rightarrow \underbrace{\begin{bmatrix} s_1 & s_2 & s_3 & \dots & s_n \end{bmatrix}}_{\text{Predictive state}} \qquad (3.21)

If learning is turned on, we update the permanence of the synapses connected to active distal segments (in the same way as in the spatial pooler). These changes are marked as temporary until we are sure that a cell is in fact able to correctly predict something with them; after this fact is known, the changes either become permanent or are removed. Temporarily marked cells are updated whenever a cell goes from being inactive to active from a feed-forward input (update the permanence, as we correctly predicted the feed-forward activation). If a cell instead went from active to inactive, we undo the change.
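Phase 1 of the temporal memory (activation with bursting) can be sketched as a toy function over (column, cell) pairs (illustrative only; segments, thresholds and learning are omitted):

```python
def tm_activate(active_columns, predictive, cells_per_column):
    """Phase 1 of the temporal memory: in each active column,
    activate the predicted cells if there are any; otherwise burst
    the whole column. States are sets of (column, cell) pairs."""
    active_cells = set()
    for col in active_columns:
        predicted = {(c, i) for (c, i) in predictive if c == col}
        if predicted:
            active_cells |= predicted          # temporal context known
        else:                                  # bursting: activate all cells
            active_cells |= {(col, i) for i in range(cells_per_column)}
    return active_cells

tm_activate({0, 1}, predictive={(0, 2)}, cells_per_column=4)
# {(0, 2), (1, 0), (1, 1), (1, 2), (1, 3)}
```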



NuPIC Classifiers

There are many different classifiers included with NuPIC, such as a k-Nearest Neighbour (kNN) classifier and a Support Vector Machine (SVM); the OPF in particular uses a custom-built classifier called the "CLAClassifier". The purpose of this classifier is to decode predictions made by the CLA. Classification with the CLAClassifier is performed in different ways depending on how the input has been encoded.

Scalar values are decoded using a process where each cell in a CLA region is paired with two histograms. One of the histograms keeps track of the frequency of encountered patterns associated with each cell; the other one keeps track of a moving average for each bucket. This process is illustrated in figure 3.4.

Figure 3.4: The CLAClassifier. (Two histograms per cell, likelihood and moving average, kept per column between the min and max values of the SDR input.)

Training in NuPIC

Training the NuPIC model is done using online learning algorithms. The following schema, seen in figure 3.5, is used to train and test these models. We have 7 different wind farms, so this schema is applied for each respective wind farm.

This schema is slightly adjusted from the traditional way the OPF handles multi-step predictions. The default way of doing this is to use 48 different models, one for



every step ahead in the CLAClassifier, which would give us 48 · 7 = 336 models in total (for all the wind farms). Not only does this change match Expektra's approach, but it also reduces the memory requirements for the CLAClassifier.

Figure 3.5: Training an OPF model. (Hyperparameter setup on a pre-training data chunk, via PSO swarming or manual setup; a training phase with online learning activated, streaming training data through the OPF model; then a testing phase with online learning deactivated, producing multistep predictions on the testing data.)

Input and hyperparameter selection

Finding hyperparameters for an OPF model was partly done using the custom built-in PSO algorithm and partly configured manually, mainly by ensuring that the PSO algorithm did not remove encoders. A single CLA region contains a lot of hyperparameters; these parameters are listed with a corresponding description in appendix A. Inputs to the model are date, ws, wp, u and v.
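The swarming mentioned above is based on particle swarm optimization. A minimal, generic PSO over a single continuous hyperparameter might look like this (an illustrative sketch, not NuPIC's swarming code; the objective and all parameter values are made up):

```python
import random

def pso(objective, bounds, n_particles=10, iters=50, w=0.7, c1=1.5, c2=1.5):
    """Minimal particle swarm optimizer minimizing a 1-D objective."""
    lo, hi = bounds
    pos = [random.uniform(lo, hi) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    pbest, pbest_val = pos[:], [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            r1, r2 = random.random(), random.random()
            # velocity pulled towards the particle's own best and the swarm best
            vel[i] = w * vel[i] + c1 * r1 * (pbest[i] - pos[i]) + c2 * r2 * (gbest - pos[i])
            pos[i] = min(hi, max(lo, pos[i] + vel[i]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i], val
    return gbest

random.seed(0)
best = pso(lambda x: (x - 3.0) ** 2, bounds=(0.0, 10.0))  # converges near 3.0
```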


Chapter 4

Result

This chapter contains the primary results obtained in this study. Figures 4.4-4.10 contain different error measurements on the test set for each wind farm. We see that Expektra's ANN model performs well over all wind farms, while the NuPIC model performs worse than Expektra but in general still better than our reference model. NuPIC is off target, with a bias error on most wind farms. The graphs also indicate that NuPIC is unable to pick up some trends, as seen in the cumulated ε² graph. Expektra's model, on the other hand, shows no clear problems in cumulated ε² and is on target on all wind farms; appendix C has been included to reflect this for different lead times.
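The error measures used in these figures can be stated compactly; since power in the GEFCom data is already normalized by installed capacity, the normalization factor is 1 (a minimal sketch with made-up numbers):

```python
import math

def error_measures(predicted, observed):
    """NBIAS, NMAE and NRMSE on normalized power (installed capacity = 1)."""
    errors = [p - o for p, o in zip(predicted, observed)]
    n = len(errors)
    nbias = sum(errors) / n                            # systematic offset
    nmae = sum(abs(e) for e in errors) / n             # average error magnitude
    nrmse = math.sqrt(sum(e * e for e in errors) / n)  # penalizes large errors
    return nbias, nmae, nrmse

def cumulated_squared_error(predicted, observed):
    """Running sum of squared errors, as plotted against time."""
    total, out = 0.0, []
    for p, o in zip(predicted, observed):
        total += (p - o) ** 2
        out.append(total)
    return out

nbias, nmae, nrmse = error_measures([0.5, 0.7, 0.2], [0.4, 0.6, 0.4])
```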

Figure 4.1: Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, normalized power output vs wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).



Figure 4.2: Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).

Figure 4.3: Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


Figure 4.4: Different error measurements for WF 1: NBIAS, NRMSE and NMAE against look-ahead time, and cumulated ε² against time, for Expektra, NuPIC and persistence.



Figure 4.5: Different error measurements for WF 2: NBIAS, NRMSE and NMAE against look-ahead time, and cumulated ε² against time, for Expektra, NuPIC and persistence.


Figure 4.6: Different error measurements for WF 3: NBIAS, NRMSE and NMAE against look-ahead time, and cumulated ε² against time, for Expektra, NuPIC and persistence.



Figure 4.7: Different error measurements for WF 4: NBIAS, NRMSE and NMAE against look-ahead time, and cumulated ε² against time, for Expektra, NuPIC and persistence.


Figure 4.8: Different error measurements for WF 5: NBIAS, NRMSE and NMAE against look-ahead time, and cumulated ε² against time, for Expektra, NuPIC and persistence.



Figure 4.9: Different error measurements for WF 6: NBIAS, NRMSE and NMAE against look-ahead time, and cumulated ε² against time, for Expektra, NuPIC and persistence.


Figure 4.10: Different error measurements for WF 7: NBIAS, NRMSE and NMAE against look-ahead time, and cumulated ε² against time, for Expektra, NuPIC and persistence.



                         Wind Farm
User          1     2     3     4     5     6     7     All
Leustagos    0.145 0.138 0.168 0.144 0.158 0.133 0.140 0.146
DuckTile     0.143 0.145 0.172 0.145 0.165 0.137 0.146 0.148
MZ           0.141 0.151 0.174 0.145 0.167 0.141 0.145 0.149
Propeller    0.144 0.153 0.177 0.147 0.175 0.141 0.147 0.152
Duehee Lee   0.157 0.144 0.176 0.160 0.169 0.154 0.148 0.155
Expektra     0.165 0.158 0.184 0.164 0.179 0.153 0.153 0.165
MTU EE5260   0.161 0.172 0.193 0.162 0.192 0.156 0.160 0.168
SunWind      0.174 0.177 0.193 0.176 0.179 0.157 0.162 0.172
ymzsmsd      0.163 0.186 0.200 0.164 0.192 0.162 0.167 0.174
4138 Kalchas 0.180 0.179 0.197 0.175 0.200 0.160 0.165 0.177
NuPIC        0.243 0.254 0.264 0.310 0.290 0.224 0.240 0.264
Persistence  0.302 0.338 0.373 0.364 0.388 0.341 0.361 0.355

Table 4.1: NRMSE scores of the entries published in Hong et al. [2014]. The NuPIC and Expektra models are added so we can easily compare the results.

4.1 Experimental results

Looking at the primary results presented in this study, we see in table 4.1 that Expektra's ANN model is able to achieve performance comparable to other methods published in Hong et al. [2014]. This can be used as a starting point for further investigations. A similar approach to Expektra's is Duehee Lee's [Lee and Baldick, 2014], as both methods are based on the same neural network architecture but with different optimization techniques. Duehee Lee's network is structured slightly differently: Expektra's model has only one output node, whereas Duehee Lee uses five (representing five predictions ahead). Besides these differences, Duehee Lee also includes additional sub-models to create a prediction ensemble, blending the output of the ANN with a Gaussian Process (GP) to improve very short-term predictions. MTU EE5260 is also a neural network approach that converts meteorological forecasts to power, and here the score is very close. The HTM CLA beats the persistence model but needs more work to outperform the other models in GEFCom.



4.2 Input Importance

For interpretation purposes, in order to understand the model better, an analysis of the relative input parameter importance was performed. This analysis is illustrated in figure 4.11. Each box represents added noise to that channel, which results in a higher NRMSE score if that feature was important. A reference point, "all-channels", represents the error distribution of the model with no input replacement. We clearly see in this figure that the wind speed channel ws is the most important attribute; adding noise to this channel greatly affects the NRMSE score. We also see that the wind components u, v show little to no influence, and that the timestamp-related inputs hours and week both indicate that there are seasonal and daily trends present in the dataset.
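The noise-replacement procedure (HIPR, Kemp et al. [2007]) can be sketched as follows (a toy illustration with a hypothetical model, not the code used in the study):

```python
import math
import random

def nrmse(preds, targets):
    """Normalized RMSE (power already normalized to [0, 1])."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets))

def channel_importance(model, inputs, targets, channel, trials=20):
    """HIPR-style check: replace one input channel with uniform noise and
    report the average NRMSE over several noisy trials."""
    scores = []
    for _ in range(trials):
        noisy = [list(x) for x in inputs]
        for row in noisy:
            row[channel] = random.uniform(0.0, 1.0)
        scores.append(nrmse([model(x) for x in noisy], targets))
    return sum(scores) / trials

# toy model that only looks at channel 0 (standing in for wind speed)
model = lambda x: x[0]
inputs = [[0.2, 0.9], [0.5, 0.1], [0.8, 0.4]]
targets = [x[0] for x in inputs]  # the model is exact on clean data

random.seed(1)
baseline = nrmse([model(x) for x in inputs], targets)        # 0.0, the reference point
ws_score = channel_importance(model, inputs, targets, 0)     # error inflates
other_score = channel_importance(model, inputs, targets, 1)  # unaffected channel
```

An important channel shows up as a large gap between its noisy score and the baseline, which is exactly how ws stands out in figure 4.11.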

Figure 4.11: Relative input parameter importance using HIPR, showing NRMSE when noise is added to each channel (hours, u, v, week, ws and the lagged channels ws−1 to ws−3 and ws+1 to ws+3). "all-channels" reflects the reference point, i.e. the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v = directional vector of the wind.



4.2.1 Adaptation and Optimization

All training for Expektra's model was performed on a MacBook Pro with an Intel Core 2 Duo processor (P8600, 2.40 GHz) on a 64-bit operating system with 4 GB of installed RAM. After adapting the code to run experiments on the GEFCom dataset, an investigation was performed to evaluate the performance of the core source code given by Expektra. This investigation identified some slow parts of the code; after optimizing these issues, mainly by using a MathNET native library instead of the managed provider, a speed test was performed comparing the unoptimized version with the optimized one. Figure 4.12 shows the speed-up achieved during training using the LM algorithm.

Figure 4.12: The unoptimized version vs the optimized one when training a network with 10 input neurons; 10, 15, 20 or 25 hidden neurons; and 1 output neuron. Training was done for 100 epochs using LM.

4.3 Summary

A plot showing the average NRMSE improvement over the persistence model can be seen in figure 4.13. We see that both models are able to perform better than persistence. Expektra's model performs better than the NuPIC model over the whole forecast horizon, and NuPIC performs better than persistence towards the end of the forecast.
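The improvement plotted here is the relative NRMSE reduction against persistence; using the WF 1 and WF 2 scores from table 4.1 for illustration:

```python
def improvement_over_reference(model_nrmse, reference_nrmse):
    """Percentage NRMSE improvement of a model over the reference model."""
    return [100.0 * (ref - m) / ref for m, ref in zip(model_nrmse, reference_nrmse)]

# Expektra vs persistence for WF 1 and WF 2 (values from table 4.1)
imp = improvement_over_reference([0.165, 0.158], [0.302, 0.338])
# roughly 45% and 53% improvement
```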

Figure 4.13: Summarized average improvement (%) in NRMSE over persistence across all wind farms, with 95% confidence intervals, for Expektra and NuPIC over look-ahead times of 0-48 hours.


Chapter 5

Discussion

With the exponential growth of computer power, it becomes easier to study deeper and more complex networks, and with recent advances in the area of deep learning, interest in ANNs has resurged. In practice, ANNs have been used by most groups in the field of WPF, but these networks never caught on, as it was argued that the improvements made by ANNs were usually not enough to justify the extra effort of training them [Giebel et al., 2011]. This is steadily becoming less of an issue as computation becomes cheaper and bigger networks perform better.

By using a simple MLP network, we observe that we are able to obtain results similar in performance to other models published in Hong et al. [2014]. This can be used as a starting point for further investigations. The GEFCom dataset has a limited number of input features; if we had access to a wider range of them, the power of neural networks could be studied more in depth, but given the number of features in the dataset this is harder to do.

Using persistence as a reference model was done in order to have the same baseline as the one used in GEFCom, but a better reference model should be considered in the future, such as the one presented in Nielsen et al. [1998], especially if longer forecasts are to be considered.
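For reference, persistence and the Nielsen et al. [1998] "new reference" model (which blends persistence with the mean production, weighted by the lag-k correlation) can be sketched as follows (illustrative values; the `corr_k` numbers are hypothetical):

```python
def persistence_forecast(last_observed, horizon):
    """Persistence: every lead time is forecast as the last observed power."""
    return [last_observed] * horizon

def new_reference_forecast(last_observed, mean_power, corr_k):
    """Nielsen et al. [1998]: blend persistence and the mean production,
    weighted by the lag-k autocorrelation a_k for each lead time k."""
    return [a * last_observed + (1.0 - a) * mean_power for a in corr_k]

pers = persistence_forecast(0.6, 3)
blend = new_reference_forecast(0.6, 0.3, corr_k=[0.9, 0.5, 0.1])
```

As the correlation decays with lead time, the blended baseline falls back towards the mean, which is why it is a harder reference than persistence for longer horizons.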

5.1 Method development issues

Working with Expektra's model is very straightforward, and no major issues were encountered during development, but some issues were encountered working with NuPIC.1

1 It should also be pointed out that it helps to have the people who developed the code on the spot, and this is probably the main reason for the lack of trouble.



1. Installing NuPIC is difficult. This seems to be a general problem, judging by posts on the NuPIC mailing list.2 A very important issue to fix if they want more people to work with this model.

2. The built-in swarming (PSO) did not work well, especially for slightly larger datasets and for 48 prediction steps. The main problems were fitting everything into memory and that the code ran too slowly when swarming over multiple steps ahead, making it practically impossible to use. Scaling the problem down to just a few prediction steps and multiple models helped, but the swarm model kept dismissing wind speed as an important feature, which was fixed by explicitly telling it not to touch that encoder.

3. Inconsistencies between the white-paper documentation of HTM/CLA and the actual implementation make it hard to understand the underlying principles of NuPIC. NuPIC implements a very complex network, and any inconsistency in the documentation makes it harder to understand. The code is quite well documented, but the additional material is a bit sparse.

These issues are all understandable and are continuously being improved upon. NuPIC is still a young platform, so issues are to be expected given the complexity and research nature of the project. Training the CLAClassifier to use many step predictions instead of just one will most likely reduce the bias error seen in the results; to investigate this, a more powerful computer is needed.

There are some differences in the inputs between the models. This setup was created because an OPF model needs to encode the value it is going to predict. SCADA data is available, so this is not an issue, but it may introduce some unfairness between NuPIC and Expektra. In the current setup, NuPIC does not seem to gain anything extra from the additional information, and other models in the competition most likely used different inputs as well.

In general, NuPIC differs in the way you feed data: you send in the front of the signal, and temporal context is achieved through the temporal memory.

2 http://numenta.org/lists

5.2 Future improvements and directions

One issue with the GEFCom dataset is that it does not contain any specific information about the location of these 7 wind farms. It is worth taking a look at how to handle different models that are specialized for different types of terrain, as this has been shown to increase performance [Kariniotakis et al., 2006]. Another thing to investigate could be a more advanced scheme for hyperparameter selection; Jursa and Rohrig [2008] show that the WPF error can be reduced by smarter use of optimization algorithms for feature selection and hyperparameter selection, which should improve performance.

In general, having more types of inputs from sources like SCADA and NWP systems could provide a lot of valuable information for better predictions. It is clear that wind speed is a key feature for producing good forecasts, which is also supported by the input analysis in section 4.2. It has been argued that the error in WPF models stems largely from wrong weather forecasts [Giebel et al., 2011], so identifying good sources of weather forecasts would be desirable for more effective model learning. Combining several different weather forecasts could also be investigated, as Nielsen et al. [2007] showed that power forecasts based on a number of different meteorological forecasts were better than those from a single source.

Regarding HTM CLA, it is worth considering implementing a custom encoder, one that would specifically target data concerning wind farms. SCADA data could also be streamed directly into a NuPIC model, which would allow for some interesting online predictions and anomaly detection.

Training times are always an issue, especially for very large models. Both NuPIC and Expektra's model would benefit from being implemented in a more parallel fashion, as these algorithms are very well suited for parallel computing. The speed-up would be of great value, so looking into and implementing a GPU version of the algorithms could be worthwhile.

5.3 Conclusions

The field of energy forecasting is still young, and the necessity for good forecasting techniques is constantly growing. This initial study shows that part of Expektra's forecasting method can be used to forecast wind power generation. We observe that this ANN is able to achieve performance comparable to other methods published in Hong et al. [2014]. Additional work is still needed to obtain state-of-the-art results. Numenta's model is able to beat the reference model but still needs additional work before we can draw definite conclusions about its performance. Constructing a custom encoder could be a first step in that direction.


Bibliography

Subutai Ahmad and Jeff Hawkins. Properties of sparse distributed representations and their application to hierarchical temporal memory. arXiv preprint arXiv:1503.07469, 2015.

T.C. Akinci. Short term wind speed forecasting with ANN in Batman, Turkey. Elektronika ir Elektrotechnika, 107(1):41–45, 2015.

E. Michael Azoff. Neural network time series forecasting of financial markets. John Wiley & Sons, Inc., 1994.

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13(1):281–305, 2012.

Daniel P. Buxhoeveden and Manuel F. Casanova. The minicolumn hypothesis in neuroscience. Brain, 125(5):935–951, 2002.

Erasmo Cadenas and Wilfrido Rivera. Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks. Renewable Energy, 34(1):274–278, 2009.

Dmitri B. Chklovskii, B.W. Mel, and K. Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782–788, 2004.

A. Costa, A. Crespo, J. Navarro, G. Lizcano, H. Madsen, and E. Feitosa. A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews, 12(6):1725–1744, August 2008. ISSN 1364-0321. doi: 10.1016/j.rser.2007.01.015. URL http://dx.doi.org/10.1016/j.rser.2007.01.015.

Russ C. Eberhart and James Kennedy. A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science, volume 1, pages 39–43. New York, NY, 1995.

Shu Fan, James R. Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. Energy Conversion, IEEE Transactions on, 24(2):474–482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento - a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276. Citeseer, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition EWEC 2008, 6 pp. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving hierarchical temporal memory-based trading models. Springer, 2013.

M.A. Gaertner, C. Gallardo, C. Tejeda, N. Martínez, S. Calabria, N. Martínez, and B. Fernández. The Casandra project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. In Proc. of the 2001 European Wind Energy Conference, EWEC'01, Copenhagen, Denmark, pages 777–780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: A literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G. Lobo. Sipreólico - wind power prediction tool for the Spanish peninsular power system. In Proceedings of the CIGRÉ 40th General Session and Exhibition, París, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357–363, 2014.

Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41–46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585–592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694–709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference, EWEC'07, 2007.

G. Kariniotakis. Position paper on Joule project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J. Halliday, R. Brownsword, Ignacio Marti, Ana Maria Palomares, I. Cruz, H. Madsen, T.S. Nielsen, Henrik Aa. Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power: overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, 10 pp., 2006.

G.N. Kariniotakis, G.S. Stavrakakis, and E.F. Nogaret. Wind power forecasting using advanced neural networks models. Energy Conversion, IEEE Transactions on, 11(4):762–767, 1996.

Stanley J. Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326–334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275–293, 2009.

Lars Landberg and Simon J. Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171–195, 1994.

S. Lang, C. Mohrlen, J. Jorgensen, B. O. Gallachóir, and E. McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International Solar Energy Society UK Section Conference, volume 85, page 89, 2006.

S.M. Lawan, W.A.W.Z. Abidin, W.Y. Chai, A. Baharun, and T. Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760–1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. Smart Grid, IEEE Transactions on, 5(1):501–510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets. 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164–168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313–2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154–3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa. Nielsen, and Torben S. Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475–489, 2005.

Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431–441, 1963.

Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons. 1969.

Marvin Lee Minsky and Oliver G. Selfridge. Learning in random nets. MIT Lincoln Laboratory, 1960.

Jorge J. Moré. The Levenberg-Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105–116. Springer, 1978.

Henrik Aa. Nielsen, Torben S. Nielsen, Henrik Madsen, Maria J. Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471–482, 2007.

Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29–34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power. 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159–167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms, v0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33–57, 2007.

A. Rodrigues, J.A. Peças Lopes, P. Miranda, L. Palma, C. Monteiro, R. Bessa, J. Sousa, C. Rodrigues, and J. Matos. EPREV - a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference EWEC, volume 7, 2007.

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5:3, 1988.

Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85–117, 2015.

S. Sinkevicius, R. Simutis, and V. Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91–96, 2011.

Ke-Sheng Wang, Vishal S. Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61–69, 2014.

WWEA. 2014 half-year report, pp. 1–8. Technical report, WWEA, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365–376, 2013.

Hao Yu and Bogdan M. Wilamowski. Levenberg-Marquardt training. Industrial Electronics Handbook, 5:12-1, 2011.

Appendix A

Hyper-parameters

Parameters for the spatial pooler

Alias                Default  Description
columnCount          -        The number of cell columns in a cortical region.
globalInhibition     false    If true, during the inhibition phase the winning columns are selected as the most active columns from the region as a whole.
numActivePerInhArea  10       The maximum number of active columns per inhibition area.
synPermActiveInc     0.1      The amount by which an active synapse is incremented in each round, specified as a percent of a fully grown synapse.
synPermConnected     0.10     Controls the threshold at which synapses are connected.
synPermInactiveDec   0.01     The amount by which an inactive synapse is decremented in each round.
potentialRadius      16       Determines the extent of the input that each column can potentially be connected to.

Table A.1: Configuration parameters for the spatial pooler.



Parameters for the scalar encoder

Alias       Symbol  Description
w           w       Number of bits to set in the output.
minval      vmin    The lower bound of the input value.
maxval      vmax    The upper bound of the input value.
n           n       Number of bits in the representation (n must be > w).
radius      r       Inputs separated by more than or equal to this distance will have non-overlapping representations.
resolution  ψ       Inputs separated by more than or equal to this distance will have different representations.

Table A.2: Configuration parameters for the scalar encoder.


Parameters for the temporal memory

Alias                  Default  Description
activationThreshold    12       Activation threshold for segments.
cellsPerColumn         32       Number of cells per column.
columnCount            2048     The number of cell columns in a cortical region.
globalDecay            0.10     Decrements all synapses a little bit all the time.
initialPerm            0.11     Initial permanence value for a synapse.
inputWidth             -        Size of the input.
maxAge                 100000   Controls global decay: only segments that have not been activated for maxAge iterations are decayed, and the global decay loop runs only every maxAge iterations.
maxSegmentsPerCell     -        The maximum number of segments a cell can have.
maxSynapsesPerSegment  -        The maximum number of synapses a segment can have.
minThreshold           8        The minimum required activity for a segment to learn.
newSynapseCount        15       The maximum number of synapses added to a segment during learning.
permanenceDec          0.10     How much permanence is removed from synapses when learning occurs.
permanenceInc          0.10     How much permanence is added to synapses when learning occurs.
temporalImp            cpp/py   Controls which temporal memory implementation to use.

Table A.3: Configuration parameters for the temporal memory.


Appendix B

Wind characteristics

Figure B.1: Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components. The intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.



Figure B.2: Wind characteristics for WF 3-7, GEFCom dataset, showing zonal and meridional wind components. The intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Appendix C

Error Distribution



Figure C.1: Error distribution for different lead times (1, 10, 20, 30, 40 and 48 hours), WF 1, using NuPIC.


Figure C.2: Error distribution for different lead times (1, 10, 20, 30, 40 and 48 hours), WF 2, using NuPIC.



Figure C.3: Error distribution for different lead times (1, 10, 20, 30, 40 and 48 hours), WF 3, using NuPIC.

64

Figure C.4: Error distribution for different lead times, WF 4 (wf4 using NuPIC). [Figure: six histogram panels as in Figure C.1.]

Figure C.5: Error distribution for different lead times, WF 5 (wf5 using NuPIC). [Figure: six histogram panels as in Figure C.1.]

Figure C.6: Error distribution for different lead times, WF 6 (wf6 using NuPIC). [Figure: six histogram panels as in Figure C.1.]

Figure C.7: Error distribution for different lead times, WF 7 (wf7 using NuPIC). [Figure: six histogram panels as in Figure C.1.]

Figure C.8: Error distribution for different lead times, WF 1 (wf1 using Expektra). [Figure: six histogram panels as in Figure C.1.]

Figure C.9: Error distribution for different lead times, WF 2 (wf2 using Expektra). [Figure: six histogram panels as in Figure C.1.]

Figure C.10: Error distribution for different lead times, WF 3 (wf3 using Expektra). [Figure: six histogram panels as in Figure C.1.]

Figure C.11: Error distribution for different lead times, WF 4 (wf4 using Expektra). [Figure: six histogram panels as in Figure C.1.]

Figure C.12: Error distribution for different lead times, WF 5 (wf5 using Expektra). [Figure: six histogram panels as in Figure C.1.]

Figure C.13: Error distribution for different lead times, WF 6 (wf6 using Expektra). [Figure: six histogram panels as in Figure C.1.]

Figure C.14: Error distribution for different lead times, WF 7 (wf7 using Expektra). [Figure: six histogram panels as in Figure C.1.]

List of Figures

2.1  A figure that presents the general outline when forecasting using the statistical approach  6
2.2  A figure that presents the general steps when forecasting using a physical model  7
3.1  The perceptron  20
3.2  Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of H hidden layers, as well as M input connections. Each edge seen in this graph has a w_ij associated with it  21
3.3  Information flow of a single region predictive model created with the OPF  23
3.4  The CLAClassifier  28
3.5  Training an OPF model  29
4.1  Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)  31
4.2  Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)  32
4.3  Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)  32
4.4  Different error measurements for WF 1  33
4.5  Different error measurements for WF 2  34
4.6  Different error measurements for WF 3  35
4.7  Different error measurements for WF 4  36
4.8  Different error measurements for WF 5  37
4.9  Different error measurements for WF 6  38
4.10 Different error measurements for WF 7  39
4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point, i.e. the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind  41
4.12 Plot showing the unoptimized version vs the optimized one when training a network with 10 input neurons, 10/15/20/25 hidden neurons and 1 output neuron. Training was done for 100 epochs using LM  42
4.13 Summarized average improvement over all wind farms, with 95% confidence intervals  43
B.1  Wind characteristics for WF 1 and WF 2, the GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated, the stronger the more power  59
B.2  Wind characteristics for WF 3-7, the GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated, the stronger the more power  60
C.1  Error distribution for different lead times, WF 1  62
C.2  Error distribution for different lead times, WF 2  63
C.3  Error distribution for different lead times, WF 3  64
C.4  Error distribution for different lead times, WF 4  65
C.5  Error distribution for different lead times, WF 5  66
C.6  Error distribution for different lead times, WF 6  67
C.7  Error distribution for different lead times, WF 7  68
C.8  Error distribution for different lead times, WF 1  69
C.9  Error distribution for different lead times, WF 2  70
C.10 Error distribution for different lead times, WF 3  71
C.11 Error distribution for different lead times, WF 4  72
C.12 Error distribution for different lead times, WF 5  73
C.13 Error distribution for different lead times, WF 6  74

C.14 Error distribution for different lead times, WF 7  75

List of Tables

3.1  The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, data with power observations are available for updating the models  16
3.2  Preprocessed features from the dataset. The Forecast category represents the forecasts given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the feature we use in training and testing  17
3.3  Example, where n = 14, r = 5, ψ = 1, of various encoded scalar values using a ScalarEncoder  24
4.1  NRMSE score of the entries published in [Hong et al., 2014]. The NuPIC model and Expektra model are added so we can easily compare the results  40
A.1  Table containing configuration parameters for the encoder  55
A.2  Table containing configuration parameters for the spatial pooler  56
A.3  Table containing configuration parameters for the temporal memory  57


  • Contents
  • Introduction
    • Problem Formulation
    • The scope of the problem
  • Background
    • Neural Networks and Time Series Prediction
  • Method and Materials
    • Preliminaries
      • Remarks
      • Definitions
      • Reference models
      • Error metrics
      • Model selection
      • Evaluation
    • Experiments
    • Holdback Input Randomization
    • Optimization methods
    • Neural Networks
      • Multilayer Perceptron
      • Numenta Platform for Intelligent Computing
  • Result
    • Experimental results
    • Input Importance
    • Adaptation and Optimization
    • Summary
  • Discussion
    • Method development issues
    • Future improvements and directions
    • Conclusions
  • Bibliography
  • Appendices
    • Hyper-parameters
    • Wind characteristics
    • Error Distribution
  • List of Figures
  • List of Tables

CHAPTER 3 METHOD AND MATERIALS

Testing period

Date         Time    Forecast
2011-01-01   01:00   1-48 hours
2011-01-04   13:00   1-48 hours
2011-01-08   01:00   1-48 hours
2011-01-11   13:00   1-48 hours
...
2012-06-23   01:00   1-48 hours
2012-06-26   13:00   1-48 hours

Table 3.1: The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, data with power observations are available for updating the models.

3.2 Experiments

Training and testing are structured according to the setup of the GEFCom. The dataset spans from midnight on the 1st of July 2009 to noon on the 26th of June 2012. The period from the 1st of July 2009 to the 1st of January 2011 at 01:00 is used for training and validation, while the rest is used for testing. In the testing range a number of 48-hour periods are defined (see table 3.1); in between these testing periods additional training data exists, which enables the models to be updated between forecasts.

The testing periods repeat every 7 days until the end of the dataset, and only meteorological forecasts that were relevant for the periods with missing power data are given; this was done in order to be consistent.

Meteorological forecasts in the GEFCom dataset consist of zonal u and meridional v components, collected for surface winds at 10 m above ground level. They were extracted from the archive of the European Centre for Medium-range Weather Forecasts (ECMWF), a service which issues high-resolution deterministic forecasts twice a day, at 00 Coordinated Universal Time (UTC) and 12 UTC; each of these forecast periods consists of data for 1-48 hours ahead. In order to match the hourly resolution of the power measurements, the forecasts were interpolated to an hourly resolution using cubic splines. A summary of the features found in the dataset is given in table 3.2.
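The interpolation step can be sketched as follows; a minimal example using SciPy's CubicSpline with made-up forecast values (the thesis does not specify the exact implementation, so treat this as illustrative only):

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Hypothetical NWP forecast values at a coarse temporal resolution.
forecast_hours = np.array([0.0, 12.0, 24.0, 36.0, 48.0])
wind_speed = np.array([4.0, 6.5, 5.2, 7.1, 6.0])

# Fit a cubic spline through the forecast points and evaluate it hourly,
# matching the hourly resolution of the power measurements.
spline = CubicSpline(forecast_hours, wind_speed)
hourly = spline(np.arange(0, 49))  # 0..48 hours ahead
```

The spline passes exactly through the original forecast values, so the coarse-resolution information is preserved while the in-between hours are filled smoothly.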


No.  Category  Parameter             Alias  Type

1    Date      Date                  date   String
2    Date      Year                  year   Integer
3    Date      Month                 month  Integer
4    Date      Day                   day    Integer
5    Date      Hour                  hours  Integer
6    Date      Week                  week   Integer
7    Forecast  Wind Speed            ws     Real
8    Forecast  Wind Direction (deg)  wd     Real
9    Forecast  Wind U                u      Real
10   Forecast  Wind V                v      Real
11   Forecast  Issued                hp     Integer
12   SCADA     Production            wp     Real

Table 3.2: Preprocessed features from the dataset. The Forecast category represents the forecasts given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the feature we use in training and testing.


The database of meteorological forecasts given by the GEFCom dataset contains sections of missing data, each corresponding to a date for which the power information is also missing; these sections were filled, in a pre-processing step, with the previous best forecast available. For example, if we do not have data for the forecast issued at 2011-01-01 12:00, we use data from 2011-01-01 00:00, as a 48-hours-ahead forecast is available for that date. If the previous section also contained missing data, we would go back an additional section and use those forecasts. If no forecast is available 48 hours back, we use the best known forecast and extend it in the same fashion as the persistence model.

Any prediction made by these models should fall within a certain range, so any forecast outside this range is clamped. This is the main post-processing step; i.e., there is an upper limit on what can be produced.
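The fallback fill and the clamping post-processing step can be sketched as follows. The data layout and the names `fill_forecast` and `clamp` are hypothetical, not taken from the thesis; the sketch only mirrors the logic described above:

```python
import numpy as np

def fill_forecast(forecasts, issue_time, step=12):
    """Return the forecast issued at `issue_time`, falling back to earlier
    issues in steps of `step` hours when the data is missing.
    `forecasts` maps issue hour -> array of 48 hourly values."""
    t = issue_time
    while t >= 0:
        if t in forecasts:
            return forecasts[t]
        t -= step  # go back one issue section and try again
    return None  # no earlier forecast exists at all

def clamp(prediction, lower=0.0, upper=1.0):
    """Post-processing: normalized power must stay within [lower, upper]."""
    return np.clip(prediction, lower, upper)
```

With a forecast available only at issue hour 0, `fill_forecast(forecasts, 24)` walks back through hours 12 and 0 and returns the hour-0 forecast.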

3.3 Holdback Input Randomization

The Holdback Input Randomization (HIPR) method, described in Kemp et al. [2007], can be used to investigate the importance of the input parameters. It works by sequentially feeding each data point in the test set to the neural network while replacing the values of one input parameter at a time. The replacement values are drawn uniformly at random from the range the neural network was originally trained on, i.e. (-1, 1). An NRMSE score is calculated for each replacement; the result gives information about the relevance of each input.
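The procedure can be sketched as follows; a minimal implementation assuming a generic model callable and one common definition of NRMSE (the thesis's exact normalization may differ):

```python
import numpy as np

def nrmse(y_true, y_pred):
    # Root-mean-square error normalized by the observed range
    # (one common convention; illustrative only).
    return np.sqrt(np.mean((y_true - y_pred) ** 2)) / (y_true.max() - y_true.min())

def holdback_input_randomization(model, X, y, rng=None):
    """Score each input column by replacing it with uniform noise in the
    training range (-1, 1) and re-evaluating the model on the test set.
    `model` is any callable mapping an (n, m) array to n predictions."""
    rng = rng or np.random.default_rng(0)
    scores = {}
    for j in range(X.shape[1]):
        Xr = X.copy()
        Xr[:, j] = rng.uniform(-1.0, 1.0, size=len(X))  # randomize input j
        scores[j] = nrmse(y, model(Xr))
    return scores  # higher NRMSE => input j mattered more
```

An input whose randomization barely moves the NRMSE carries little information; an input whose randomization degrades the score strongly is important.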

3.4 Optimization methods

The two main optimization algorithms used in this thesis are the Particle Swarm Optimization (PSO) algorithm, presented by Eberhart and Kennedy [1995], and the Levenberg-Marquardt (LM)2 algorithm, independently developed by Kenneth Levenberg and Donald Marquardt [Marquardt, 1963; Levenberg, 1944]. The PSO algorithm is a population-based stochastic algorithm, similar to a Genetic Algorithm (GA) but based on social-psychological principles instead of evolution. It

2 The LM algorithm was used at the beginning of the project, but the results reported ended up using the PSO algorithm. The LM algorithm is included in this section because speed optimization was measured using it.


can be summarized by the following steps. Each network was trained multiple times in order to avoid local minima.

• Step 1: Initialize particles with random velocities and accelerations.

• Step 2: Determine which particle is closest to the goal.

• Step 3: Adjust accelerations toward that particle.

• Step 4: Update particle positions based on their velocities, and update velocities based on accelerations.

• Step 5: Go to step 2.
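The steps above can be sketched as a standard global-best PSO. The inertia and acceleration coefficients shown are common textbook defaults, not necessarily the values used in the thesis:

```python
import numpy as np

def pso(loss, dim, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal global-best particle swarm optimizer (a sketch of the
    general method, not the thesis's exact variant)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1, 1, (n_particles, dim))   # positions
    v = np.zeros((n_particles, dim))             # velocities
    pbest = x.copy()                             # personal bests
    pbest_f = np.array([loss(p) for p in x])
    gbest = pbest[pbest_f.argmin()].copy()       # particle closest to the goal
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # Accelerate toward personal and global bests, then move (steps 3-4).
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = x + v
        f = np.array([loss(p) for p in x])
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        gbest = pbest[pbest_f.argmin()].copy()   # step 2, repeated
    return gbest
```

For a network, `loss` would evaluate the validation error of the weight vector `p`; here any callable works, e.g. minimizing the sphere function `sum(p**2)` drives the swarm toward the origin.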

The LM algorithm is a combination of the Error Back-Propagation (EBP) method [Rumelhart et al., 1988] and the Gauss-Newton method. It has the speed advantage of Gauss-Newton and the stability of EBP [Yu and Wilamowski, 2011]. A detailed treatment of LM can be found in Moré [1978], and of PSO in Poli et al. [2007].

3.5 Neural Networks

3.5.1 Multilayer Perceptron

The perceptron is built around a nonlinear model of a neuron, the McCulloch-Pitts model [McCulloch and Pitts, 1943]. It basically consists of a 2-step process: the cell body contains a summation function of the weighted sum of all inputs, including a bias. The perceptron is described by equation 3.11. The sum s is passed through an activation function (see section 3.5.1) which mimics the activation, or firing, of the neuron.

s = \sum_{i=1}^{M} w_i x_i + x_0 \qquad (3.11)

w_i is the weight of the "synapse" of input channel i and is the parameter we want to adjust, x_i is the input value, and x_0 is the bias. M denotes the number of inputs we have.


Figure 3.1: The perceptron. [Diagram: weighted input signals and a bias signal are summed, and the sum is passed through an activation function f(s) to produce the output signal.]

The structure of the MLP consists of many perceptrons, and it is shown in figure 3.2. The input signal flows from the input layer at the bottom to the output layer at the top. We have a bias signal, seen on the left side of the diagram, which is set to a fixed number.

MLPs are fully connected networks, meaning that each neuron in any layer of the network is connected to all the neurons in the previous layer. Each connection in the network has a weight w_ij associated with it. These weights are initialized before any training is done, by randomly assigning very small values to each weight in the graph. The architecture used in this study has only one output variable, i.e. the function we approximate produces forecasts of the power generation given a certain input.


Figure 3.2: Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of H hidden layers, as well as M input connections; each edge seen in this graph has a weight w_ij associated with it. [Diagram: input layer (hours, u, v, week, ws, ws-1, ws-2, ws+1, ws+2), tanh hidden layers, a linear output layer, and bias signals.]

The performance of neural networks is generally improved if the data is normalised, because using the original data directly can cause convergence problems. Normalization is done using the mapminmax function, seen in equation 3.12.

y = (y_{max} - y_{min}) \cdot \frac{x - x_{min}}{x_{max} - x_{min}} + y_{min} \qquad (3.12)

y_max is the maximum of the specified range, which in this case is 1, and y_min is -1; x is the value to be scaled, x_max is the maximum of the values to be scaled, and x_min their minimum.
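Equation 3.12 translates directly into a small helper; a sketch with NumPy:

```python
import numpy as np

def mapminmax(x, ymin=-1.0, ymax=1.0):
    """Scale an array linearly so that its minimum maps to ymin and its
    maximum maps to ymax (the mapping of equation 3.12)."""
    x = np.asarray(x, dtype=float)
    return (ymax - ymin) * (x - x.min()) / (x.max() - x.min()) + ymin
```

For example, `mapminmax([0, 5, 10])` maps the values onto the training range (-1, 1) as [-1, 0, 1].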

The forecasting model consists of 7 different networks, one for each wind farm; these networks are trained on the first section of the dataset, before the test period. A random 60/20/20 split is created, where 60% of the available data is used for training and 20% is used to validate the network, while the remaining 20% is a hold-out set for the hyperparameters. The input features3 fed into the models are

3 See table 3.2.


ws, u, v, hours, ws+1, ws+2, ws-1 and ws-2, where ws+x and ws-x denote a time shift of x. We can use ws+x as input into the model because ws is a forecast in itself.

Activation Functions

The computation done for each neuron in the multilayer perceptron requires knowledge of the derivative of the activation function; in other words, the activation function we pick needs to be continuous. In this thesis we use the following two activation functions: the hyperbolic tangent function, seen in equation 3.13, and

f(s) = \tanh(s) = \frac{e^s - e^{-s}}{e^s + e^{-s}} \qquad (3.13)

the saturating linear transfer function, seen in equation 3.14.

f(s) = \begin{cases} +1 & \text{if } s \ge 1 \\ s & \text{if } -1 < s < 1 \\ -1 & \text{if } s \le -1 \end{cases} \qquad (3.14)

Hyperparameter optimization

In order to obtain the hyperparameters for each wind farm's model, a random hyperparameter search was performed for all models, and a hold-out validation set was used to pick the best hyperparameters. Random search has been shown to work better than grid search when not all parameters are equally important [Bergstra and Bengio, 2012].

3.5.2 Numenta Platform for Intelligent Computing

This section describes the key principles introduced with NuPIC. It explains in general terms the theory behind the Online Prediction Framework (OPF)4, which uses the CLA and HTM algorithms; the OPF works as an API for creating predictive models. The OPF consists of 5 major types of components: Encoders, Spatial Pooler (SP), Temporal Memory (TM), Temporal Pooler (TP) and Classifiers. Together these components construct a single CLA region5, and figure 3.3 demonstrates the information flow through a single region.

4 The OPF is used with Numenta's commercial product GROK.
5 Currently, models created with the OPF do not use a TP, nor does this client allow creation of a hierarchy of regions, i.e. what we would call an HTM; it is only possible to create one region.


Another central concept in NuPIC is the Sparse Distributed Representation (SDR), which refers to the activation of a small percentage of the neurons at any given time. This neuronal activity is represented as an n-dimensional sparse binary vector x = [b_0, ..., b_n] with around 2% active cells. The outputs from the spatial pooler and the temporal memory are both SDRs, while the output from the encoder does not enforce this and is just a normal binary vector. A general overview of the properties of SDRs is given by Ahmad and Hawkins [2015].

Figure 3.3: Information flow of a single-region predictive model created with the OPF: Encoder → Spatial Pooler → Temporal Memory → Classifier.

The overlap between two different SDRs is defined by

o(x, y) = x \cdot y \qquad (3.15)

A match between two SDRs is defined by

m(x, y) = o(x, y) \ge \theta \qquad (3.16)

where θ is set so that θ ≤ ‖x‖₁ and θ ≤ ‖y‖₁. An interesting property of SDRs, used multiple times inside the temporal memory and especially for predictions, is that a fixed-size set of SDRs can be reliably stored by taking their union as a single pattern. The boolean OR operator is used to create a


Scalar   Encoding

1        11111000000000
2        01111100000000
10       00000000011111

Table 3.3: Example, where n = 14, r = 5, ψ = 1, of various scalar values encoded using a ScalarEncoder.

new vector from a set of SDRs. The downside of storing patterns this way is that the more patterns we store, the bigger the probability of false positives.
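Equations 3.15 and 3.16 and the union property translate into a few lines of NumPy; a sketch over binary vectors:

```python
import numpy as np

def overlap(x, y):
    # Equation 3.15: the dot product counts shared active bits.
    return int(np.dot(x, y))

def match(x, y, theta):
    # Equation 3.16: the SDRs match if their overlap reaches theta.
    return overlap(x, y) >= theta

def union(sdrs):
    # Store a set of SDRs as one pattern with the boolean OR operator.
    out = np.zeros_like(sdrs[0])
    for s in sdrs:
        out = np.logical_or(out, s).astype(int)
    return out
```

A stored union still "matches" each of its member SDRs, which is what makes it useful for prediction, but as more patterns are OR-ed in, unrelated vectors start to match too: the false-positive effect described above.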

Encoders

NuPIC contains many different encoders6; the job of an encoder is to convert raw input into a more suitable representation (i.e. a binary vector). The raw input is fed into the model using a dictionary data structure.

Useful encoders include the scalar and categorical encoders, while more exotic encoders include one for Global Positioning System (GPS) coordinates, which can be used to extract information about anomalous movements. The entries in the dictionary of raw inputs are each encoded separately and concatenated using a multi-encoder.

One property of the spatial pooler is that overlapping input patterns are mapped to the same SDR. This means that we want the encoder to encode inputs such that similar inputs share bits. The ScalarEncoder fulfils this property by the process illustrated in table 3.3.

We calculate the range of values to encode using equation 3.17, where v_min represents the minimum value of the input signal and v_max denotes its upper bound.

v_{range} = v_{max} - v_{min} \qquad (3.17)

There are three mutually exclusive parameters that determine the overall size of the output from a scalar encoder: (n, r, ψ). n directly represents the total number

6 A full list of all encoders can be found in the API documentation for NuPIC.


of bits in the output, and it must be greater than or equal to w, which represents the number of bits that are set to encode a single value, i.e. the "width" of the output signal7. r and ψ are specified with respect to the input, while w is specified with respect to the output. Two inputs separated by more than the radius have non-overlapping representations; two inputs separated by less than the radius will in general overlap in at least some of their bits; two inputs separated by greater than or equal to the resolution are guaranteed to have different representations. ψ can be calculated using equation 3.18.

\psi = \frac{r}{w} \qquad (3.18)

Depending on whether we want periodic behaviour or not, n is calculated a bit differently; see the scalar encoder for implementation details8.
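The encoding process of table 3.3 can be sketched as follows. This is a simplified non-periodic encoder, not NuPIC's exact implementation; the parameter defaults are chosen to reproduce the table:

```python
import numpy as np

def scalar_encode(value, vmin=1, vmax=10, n=14, w=5):
    """Simplified scalar encoder: w consecutive active bits whose start
    position moves with the value across the n-bit output."""
    vrange = vmax - vmin                              # equation 3.17
    # Index of the first active bit, scaled over the available positions.
    start = int(round((value - vmin) / vrange * (n - w)))
    bits = np.zeros(n, dtype=int)
    bits[start:start + w] = 1
    return bits
```

With vmin = 1 and vmax = 10 this reproduces the rows of table 3.3: nearby values share most of their active bits, while values further apart than the radius share none.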

Spatial Pooler

The spatial pooler receives a binary vector as its input from an encoder and outputs an SDR. The structure of the spatial pooler consists of an input space and a set of columns; the output SDR represents which of the columns in a region are active. Each column has synapses connected to the input space: each column is randomly and potentially connected to around 50% of the input space, the so-called "potential pool". Each synapse will connect to and disconnect from the input space during learning.

The general information flow of the spatial pooler consists of the following steps. Input stimuli from lower regions lead to activation of the input space, to which each mini-column is synaptically connected; this activation, the so-called "overlap score", is computed for each column. The score is the total sum over the neurons that try to influence the column, weighted with a "boosting factor" that increases the chance for certain columns to win more easily. A final top percentage (usually around 2% activity) of the columns with the highest influence (biggest overlap score) is chosen; columns also inhibit nearby columns, and the result of this process is a binary vector with few active bits, an SDR. This is illustrated in equation 3.19, where b can be either 0 or 1 and s is a value representing the score.

7 w must be odd to avoid centering problems.
8 https://github.com/numenta/nupic


\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \dots & b_n \end{bmatrix}}_{\text{Input vector}}
\cdot
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \dots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \dots & b_{2n} \\
\vdots & & & & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \dots & b_{dn}
\end{bmatrix}}_{\text{Connected synapses for each column}}
=
\underbrace{\begin{bmatrix} s_1 & s_2 & \dots & s_n \end{bmatrix}}_{\text{Overlap score}}
\xrightarrow{\text{Inhibition}}
\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \dots & b_n \end{bmatrix}}_{\text{Output SDR}}
\qquad (3.19)

Learning in this structure is done by adjusting the permanences of the proximal dendrites to better match the input: for each winning column, the permanence of the synapses that correctly matched the input is increased and the permanence of the rest is decreased. We also increase the boosting factor of losing columns, to give them a bigger chance of winning next time.
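One inference step of this flow can be sketched as follows. The sketch uses global inhibition only (no topology, no learning) and is a much-simplified stand-in for NuPIC's spatial pooler:

```python
import numpy as np

def spatial_pool(input_vec, synapses, boost, sparsity=0.02):
    """Compute boosted overlap scores (equation 3.19) and activate the
    top `sparsity` fraction of columns; all other columns are inhibited.
    `synapses` is a (columns x input bits) binary connection matrix."""
    overlap = synapses @ input_vec        # overlap score per column
    scores = overlap * boost              # apply the boosting factors
    n_active = max(1, int(len(scores) * sparsity))
    winners = np.argsort(scores)[-n_active:]  # highest-scoring columns win
    sdr = np.zeros(len(scores), dtype=int)
    sdr[winners] = 1
    return sdr
```

Whatever the input density, the output always has the fixed ~2% activity, which is what makes it an SDR.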

Temporal Memory

The temporal memory receives an SDR from the spatial pooler, which represents the active columns; the output of the temporal memory is the activity of the whole CLA region. The general idea of the TM9 is that the cells in each mini-column provide temporal context to the pattern identified by the spatial pooler. The TM's job is to form transitions between different SDRs, and it achieves this by allowing active cells to form connections to previously active cells, so that in a future setting each cell is able to predict its own activity.

Each cell in a region has several distal segments, ideally one for each pattern the cell has transitioned from; in practice these segments are each connected to a couple of patterns. A cell enters a predictive state if there is enough activity on a segment, and every segment consists of a collection of connections, synapses, that have been formed to a subset of previously active cells (typically around 10-15 cells).

The temporal memory receives a sparse binary vector from the spatial pooler, and the general information flow for this part of the CLA goes through two phases. We have the following steps in the first phase: 1) for each active column, we check whether there is any cell in a predictive

9 There are many details in how the Temporal Memory is implemented; pseudo-code and further details can be found in [Numenta, 2011], and the NuPIC git repository is the best source for the finer points: https://github.com/numenta/nupic


state; if so, then 2) activate that particular cell. If no cell was found in a predictive state, then 3) activate all cells in that particular column, a process called bursting.

Bursting is analogous to saying: we do not know which cell to activate, because we have not seen this instance of the sequence of patterns before and are unable to put the column into the correct temporal context, so let us activate all cells, reflecting this uncertainty. Bursting does not occur when the temporal context has been predicted by the CLA. Equation 3.20 shows the first phase.

\[
\underbrace{[\,b_1 \; b_2 \; b_3 \; \cdots \; b_n\,]}_{\text{SP SDR}}
\;\circ\;
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \cdots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \cdots & b_{2n} \\
\vdots &        &        &        & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \cdots & b_{dn}
\end{bmatrix}}_{\text{Predictive State}}
=
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \cdots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \cdots & b_{2n} \\
\vdots &        &        &        & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \cdots & b_{dn}
\end{bmatrix}}_{\text{Active State}}
\tag{3.20}
\]
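The first phase can also be sketched in code. This is a simplified illustration of the activate-or-burst logic, not NuPIC's implementation; cells are identified here by (column, cell) pairs.

```python
def phase_one(active_columns, predictive_cells, cells_per_column):
    """First phase of the temporal memory, as a minimal sketch.

    active_columns   : iterable of column indices from the spatial pooler
    predictive_cells : set of (column, cell) pairs predicted last step
    Returns the set of active (column, cell) pairs.  If a column has no
    predicted cell we 'burst' it: every cell becomes active, reflecting
    the uncertainty about the temporal context.
    """
    active_cells = set()
    for col in active_columns:
        predicted = [(col, c) for c in range(cells_per_column)
                     if (col, c) in predictive_cells]
        if predicted:
            # Step 2: activate the cell(s) that predicted this input.
            active_cells.update(predicted)
        else:
            # Step 3: burst the whole column.
            active_cells.update((col, c) for c in range(cells_per_column))
    return active_cells
```
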

The second phase of the algorithm figures out which cells should be put into a predictive state for the next time step. This is achieved by checking every distal segment on every cell for activity above a certain threshold, i.e. checking for active segments. One active segment is enough to put a cell into a predictive state. The output from the temporal pooler is a vector representing the state of all cells in that region. Equation 3.21 shows phase 2.

\[
\underbrace{[\,b_1 \; b_2 \; b_3 \; \cdots \; b_n\,]}_{\text{Active State}}
\cdot
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \cdots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \cdots & b_{2n} \\
\vdots &        &        &        & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \cdots & b_{dn}
\end{bmatrix}_X}_{\text{Segment } X}
=
\underbrace{[\,b_1 \; b_2 \; b_3 \; \cdots \; b_n\,]_X}_{\text{Segment Activation } X}
> \tau
\;\rightarrow\;
\underbrace{[\,s_1 \; s_2 \; s_3 \; \cdots \; s_n\,]}_{\text{Predictive State}}
\tag{3.21}
\]

If learning is turned on, we update the permanences of the synapses connected to active distal segments (in the same way as in the spatial pooler). These changes are marked as temporary until we are sure that the cell is in fact able to correctly predict something with them; once this is known, the change is either made permanent or removed. Temporarily marked cells are updated whenever a cell goes from being inactive to active from a feed-forward input (the change is confirmed, as we correctly predicted the feed-forward activation). If a cell instead went from active to inactive, the change is undone.


NuPIC Classifiers

There are many different classifiers included with NuPIC, such as a k-Nearest Neighbour (kNN) and a Support Vector Machine (SVM); the OPF in particular uses a custom-built classifier called the "CLAClassifier". The purpose of this classifier is to decode predictions made by the CLA. Classification with the CLAClassifier is performed in different ways depending on how the input has been encoded.

Scalar values are decoded using a process where each cell in a CLA region is paired with two histograms. One histogram keeps track of the frequency of encountered patterns associated with each cell; the other keeps track of a moving average for each bucket. This process is illustrated in figure 3.4.
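The two-histogram idea can be illustrated with a small sketch. This is a deliberate simplification of the CLAClassifier, not its actual code; the class name, `alpha` (the moving-average rate) and the prediction rule are assumptions for illustration.

```python
from collections import defaultdict

class TwoHistogramDecoder:
    """Sketch of decoding scalar predictions from active cells.

    Each cell keeps (a) a count of how often it was active together with
    each value bucket and (b) a moving average of the actual values seen
    while it was active.  Here a prediction is simply the average of the
    per-cell moving averages; the frequency histograms combine into a
    likelihood distribution over buckets.
    """
    def __init__(self, alpha=0.3):
        self.alpha = alpha
        self.freq = defaultdict(lambda: defaultdict(int))  # cell -> bucket -> count
        self.avg = {}                                      # cell -> moving average

    def learn(self, active_cells, bucket, value):
        for cell in active_cells:
            self.freq[cell][bucket] += 1
            prev = self.avg.get(cell, value)
            self.avg[cell] = (1 - self.alpha) * prev + self.alpha * value

    def predict(self, active_cells):
        known = [self.avg[c] for c in active_cells if c in self.avg]
        return sum(known) / len(known) if known else None

    def likelihood(self, active_cells):
        """Combine per-cell bucket counts into a normalized distribution."""
        totals = defaultdict(int)
        for cell in active_cells:
            for bucket, count in self.freq[cell].items():
                totals[bucket] += count
        s = sum(totals.values())
        return {b: c / s for b, c in totals.items()} if s else {}
```
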

Figure 3.4: The CLAClassifier. Each column of the SDR is paired with two histograms per cell: one tracking the likelihood of each value bucket (between min value and max value) and one tracking a moving average.

Training in NuPIC

Training of the NuPIC model is done using online learning algorithms. The schema seen in figure 3.5 is used to train and test these models. We have 7 different wind farms, so this schema is repeated for each respective wind farm.

This schema is slightly adjusted from the traditional way the OPF handles multi-step predictions. The default is to use 48 different models, one for


every step ahead in the CLAClassifier, which would give us 48 · 7 = 336 models in total (for all the wind farms). Not only does this change match Expektra's approach, but it also reduces the memory requirements for the CLAClassifier.

Figure 3.5: Training an OPF model. A pre-training data chunk is used for hyperparameter setup, either via PSO swarming or via a manual setup. In the training phase the OPF model consumes the training data stream with online learning activated; in the testing phase online learning is deactivated and the model produces multi-step predictions on the testing data.

Input and hyperparameter selection

Finding hyperparameters for an OPF model was partly done using the custom built-in PSO algorithm and partly done manually, mainly by ensuring that the PSO algorithm did not remove encoders. A single CLA region contains a lot of hyperparameters; these parameters have been included with a corresponding description in appendix A. The inputs to the model are date, ws, wp, u and v.


Chapter 4

Result

This chapter contains the primary results obtained in this study. Figures 4.4-4.10 contain different error measurements on the test set for each respective wind farm. We see that Expektra's ANN model performs well over all wind farms, while the NuPIC model performs worse than Expektra's but in general still better than our reference model. NuPIC is off target, with a bias error on most wind farms. The graphs also indicate that NuPIC is unable to pick up some trends, visible in the cumulated ε² graphs. Expektra's model, on the other hand, shows no clear problems in cumulated ε² and is on target on all wind farms; appendix C has been included to reflect this at different lead times.

Figure 4.1: Left diagram: cumulative probability of wind speed (m/s). Right diagram: scatter diagram of the power curve, normalized power output vs wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


Figure 4.2: Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).

Figure 4.3: Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


Figure 4.4: Different error measurements for WF 1: NBIAS, NRMSE and NMAE as a function of look-ahead time (hours), and cumulated ε² over time, for Expektra, NuPIC and persistence.


Figure 4.5: Different error measurements for WF 2: NBIAS, NRMSE and NMAE as a function of look-ahead time (hours), and cumulated ε² over time, for Expektra, NuPIC and persistence.


Figure 4.6: Different error measurements for WF 3: NBIAS, NRMSE and NMAE as a function of look-ahead time (hours), and cumulated ε² over time, for Expektra, NuPIC and persistence.


Figure 4.7: Different error measurements for WF 4: NBIAS, NRMSE and NMAE as a function of look-ahead time (hours), and cumulated ε² over time, for Expektra, NuPIC and persistence.


Figure 4.8: Different error measurements for WF 5: NBIAS, NRMSE and NMAE as a function of look-ahead time (hours), and cumulated ε² over time, for Expektra, NuPIC and persistence.


Figure 4.9: Different error measurements for WF 6: NBIAS, NRMSE and NMAE as a function of look-ahead time (hours), and cumulated ε² over time, for Expektra, NuPIC and persistence.


Figure 4.10: Different error measurements for WF 7: NBIAS, NRMSE and NMAE as a function of look-ahead time (hours), and cumulated ε² over time, for Expektra, NuPIC and persistence.


                              Wind Farm
User           1      2      3      4      5      6      7      All
Leustagos      0.145  0.138  0.168  0.144  0.158  0.133  0.140  0.146
DuckTile       0.143  0.145  0.172  0.145  0.165  0.137  0.146  0.148
MZ             0.141  0.151  0.174  0.145  0.167  0.141  0.145  0.149
Propeller      0.144  0.153  0.177  0.147  0.175  0.141  0.147  0.152
Duehee Lee     0.157  0.144  0.176  0.160  0.169  0.154  0.148  0.155
Expektra       0.165  0.158  0.184  0.164  0.179  0.153  0.153  0.165
MTU EE5260     0.161  0.172  0.193  0.162  0.192  0.156  0.160  0.168
SunWind        0.174  0.177  0.193  0.176  0.179  0.157  0.162  0.172
ymzsmsd        0.163  0.186  0.200  0.164  0.192  0.162  0.167  0.174
4138 Kalchas   0.180  0.179  0.197  0.175  0.200  0.160  0.165  0.177
NuPIC          0.243  0.254  0.264  0.310  0.290  0.224  0.240  0.264
Persistence    0.302  0.338  0.373  0.364  0.388  0.341  0.361  0.355

Table 4.1: NRMSE scores of the entries published in [Hong et al. 2014]. The NuPIC and Expektra models are added so we can easily compare the results.

4.1 Experimental results

Looking at the primary results presented in this study, we see in table 4.1 that Expektra's ANN model achieves performance comparable to other methods published with Hong et al. [2014]. This can be used as a starting point for further investigations. A similar approach to Expektra's is Duehee Lee's [Lee and Baldick 2014], as both methods are based on the same neural network architecture but with different optimization techniques. Duehee Lee's network is structured slightly differently: Expektra's model has only 1 output node, whereas Duehee Lee uses five (representing five predictions ahead). Besides these differences, Duehee Lee has also included additional sub-models to create a prediction ensemble, blending the output of the ANN with a Gaussian Process (GP) to improve very short-term predictions. MTU EE5260 is also a neural network approach that converts meteorological forecasts to power, and here the score is very close. The HTM CLA beats the persistence model but needs more work to outperform the other models in GEFCom.
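The NRMSE metric behind table 4.1 can be sketched as follows. This uses one common convention, dividing the RMSE by the installed capacity; since GEFCom power output is already normalized to [0, 1], capacity is taken as 1.0 here, and the competition's exact definition may differ in detail.

```python
import numpy as np

def nrmse(predicted, observed, capacity=1.0):
    """Normalized root-mean-square error.

    With GEFCom-style data the power series is already normalized to
    [0, 1], so with capacity = 1.0 this is directly the RMSE of the
    normalized series.
    """
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return np.sqrt(np.mean((predicted - observed) ** 2)) / capacity
```
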


4.2 Input Importance

For interpretation purposes, in order to understand the model better, an analysis of the relative input parameter importance was performed. This analysis is illustrated in figure 4.11. Each box represents added noise to that channel; the result is a higher NRMSE score if that feature was important. A reference point, "all-channels", represents the error distribution of the model with no input replacement. We clearly see in this figure that the wind speed channel ws is the most important attribute: adding noise to this channel greatly affects the NRMSE score. We also see that the wind components u, v show little to no influence, and that the timestamp-related inputs hours and week both indicate that there are seasonal and daily trends present in the dataset.
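The analysis can be sketched as a noise-replacement loop in the spirit of HIPR [Kemp et al. 2007]. This illustrates the idea, not the thesis' exact code; `model_predict` and `error_fn` are hypothetical placeholders for the trained network and the chosen error measure.

```python
import numpy as np

def input_importance(model_predict, X, y, error_fn, n_repeats=10, seed=0):
    """HIPR-style relative input importance sketch.

    For each input channel, replace the column with uniform random noise
    drawn over the channel's observed range and record the resulting
    error; an important channel shows a large error increase over the
    untouched reference.  `model_predict` maps an (n, d) array to a
    length-n prediction vector.
    """
    rng = np.random.default_rng(seed)
    baseline = error_fn(model_predict(X), y)
    scores = {}
    for j in range(X.shape[1]):
        errs = []
        for _ in range(n_repeats):
            Xn = X.copy()
            Xn[:, j] = rng.uniform(X[:, j].min(), X[:, j].max(), size=len(X))
            errs.append(error_fn(model_predict(Xn), y))
        # Importance = mean error increase caused by destroying channel j.
        scores[j] = np.mean(errs) - baseline
    return baseline, scores
```
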

Figure 4.11: Relative input parameter importance using HIPR, shown as NRMSE distributions per noised channel (all-channels, hours, u, v, week, ws, ws−1 ... ws+3). "all-channels" reflects the reference point, i.e. the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is the directional vector of the wind.


4.2.1 Adaptation and Optimization

All training for Expektra's model was performed on a MacBook Pro with an Intel Core 2 Duo processor (P8600, 2.40 GHz) on a 64-bit operating system with 4 GB of installed RAM. After adapting the code to run experiments on the GEFCom dataset, an investigation was performed to evaluate the performance of the core source code provided by Expektra. This investigation identified some slow parts of the code; after fixing these issues, mainly by using a Math.NET native library instead of the managed provider, a speed test was performed comparing the unoptimized version with the optimized one. Figure 4.12 shows the speed-up achieved during training using the LM algorithm.

Figure 4.12: Plot showing the unoptimized version vs the optimized one when training networks with 10 input neurons, 10/15/20/25 hidden neurons and 1 output neuron. Training was done for 100 epochs using LM.

4.3 Summary

A plot showing the average NRMSE improvement over the persistence model can be seen in figure 4.13. Both models perform better than persistence; Expektra's model performs better than the NuPIC model over the whole forecast


horizon, and NuPIC performs better than persistence towards the end of the forecast.

Figure 4.13: Summarized average improvement (%) in NRMSE over persistence for all wind farms, with 95% confidence intervals, as a function of look-ahead time.


Chapter 5

Discussion

With the exponential growth of computing power it becomes easier to study deeper and more complex networks, and with recent advances in the area of deep learning a new interest in ANNs has resurged. In practice, ANNs have been used by most groups in the field of WPF, but these networks never caught on, as it was argued that the improvements made by ANNs were not usually enough for the extra effort of training these networks [Giebel et al. 2011]. This is steadily becoming less of an issue as computation becomes cheaper and bigger networks perform better.

By using a simple MLP network we observe that we are able to obtain results similar in performance to other models published in Hong et al. [2014]. This can be used as a starting point for further investigations. The GEFCom dataset has a limited set of input features; with access to a wider range of them, the power of neural networks could be studied more in depth. Given the small number of features in the dataset, this is harder to do.

Persistence was used as the reference model in order to have the same baseline as the one used in GEFCom, but a better reference model should be considered in the future, such as the one presented in Nielsen et al. [1998], especially if longer forecasts are to be considered.
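Both baselines are simple to state. This is a minimal sketch: the persistence model repeats the last observation for every horizon, and the Nielsen et al. [1998]-style reference blends persistence with the series mean using the lag-k autocorrelation, which is assumed here to be estimated elsewhere.

```python
def persistence_forecast(series, k):
    """Persistence reference model: the forecast for every horizon up
    to k steps ahead is simply the last observed value."""
    last = series[-1]
    return [last] * k

def new_reference_forecast(last_value, series_mean, corr_k):
    """Nielsen et al. [1998]-style reference for one horizon: weight
    the last observation by the lag-k autocorrelation corr_k and shrink
    towards the series mean as the horizon grows (corr_k decreases)."""
    return corr_k * last_value + (1 - corr_k) * series_mean
```
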

5.1 Method development issues

Working with Expektra's model is very straightforward and no major issues were encountered during development, but some issues were encountered working with NuPIC1:

1It should also be pointed out that it helps to have the people who developed the code on the spot, and this is probably the main reason there was so little trouble.


1. Installing NuPIC is difficult. This seems to be a general problem, judging by posts on the nupic mailing list2. A very important issue to fix if they want more people to work with this model.

2. The built-in swarming (PSO) did not work well, especially for slightly larger datasets and for 48 prediction steps. The main problems were fitting everything into memory and that the code ran too slowly when swarming over multiple steps ahead, making it practically impossible to use. Scaling down the problem to just a few prediction steps and multiple models helped, but the swarm model kept dismissing wind speed as an important feature, which was fixed by explicitly telling it not to touch that encoder.

3. Inconsistencies between the white-paper documentation of the HTM CLA and the actual implementation make it hard to understand the underlying principles of NuPIC. NuPIC presents a very complex network, and any inconsistencies in documentation make it hard to understand. The code is quite well documented, but the additional material is a bit sparse.

These issues are all understandable and are continuously being improved upon. NuPIC is still a young platform, so it is expected to find issues due to the complexity and research nature of the project. Training the CLAClassifier to use many step predictions instead of just one will most likely reduce the bias error seen in the results. To investigate this, a more powerful computer is needed.

There are some differences in the input between the models. This setup was created because an OPF model needs to encode the value it is going to predict. SCADA data is available, so this is not an issue, but it may introduce some unfairness between NuPIC and Expektra. In the current setup NuPIC does not seem to gain anything extra from the additional information, and other models in the competition will most likely use different inputs as well.

In general, NuPIC differs in the way you feed data: you do so by sending in the front of the signal, and temporal context is achieved through the temporal memory.

5.2 Future improvements and directions

One issue with the GEFCom dataset is that it does not contain any specific information about the locations of these 7 wind farms. It is worth considering taking a look at how to handle different models that are specialized for different types

2http://numenta.org/lists


of terrain, as this has been shown to increase performance [Kariniotakis et al. 2006]. Another thing to investigate could be a more advanced schema for hyperparameter selection. Jursa and Rohrig [2008] show that the WPF error can be reduced by smarter use of optimization algorithms for feature selection and hyperparameter selection, which should improve the performance.

In general, having more types of inputs from sources like SCADA and NWP systems could give a lot of valuable information that can be used for better predictions. It is clear that wind speed is a key feature for producing good forecasts, which is also supported by the input analysis in section 4.2. It has been argued that the error in WPF models stems largely from wrong weather forecasts [Giebel et al. 2011], so identifying good sources of weather forecasts would be desirable for more effective model learning. Combining several different weather forecasts could also be investigated, as Nielsen et al. [2007] showed that power forecasts based on a number of different meteorological forecasts were better than those from a single source.

Regarding the HTM CLA, it is worth considering implementing a custom encoder for it, an encoder specifically targeted at data concerning wind farms. SCADA data could also possibly be streamed directly into a NuPIC model, which would allow for some interesting online predictions and anomaly detections.

Training times are always an issue, especially for very large models. Both NuPIC and Expektra's model would benefit from being implemented in a more parallel fashion, as these algorithms are very well suited for parallel computing. The speed-up would be of great value, so looking into and implementing a GPU version of the algorithms could be worthwhile.

5.3 Conclusions

The field of energy forecasting is still young, and the necessity for good forecasting techniques is constantly growing. This initial study shows that part of Expektra's forecasting method can be used to forecast wind power generation. We observe that this ANN is able to achieve performance comparable to other methods published in Hong et al. [2014]. Additional work is still needed to obtain state-of-the-art results. Numenta's model is able to beat the reference model but still needs additional work before we can draw definite conclusions about its performance. Constructing a custom encoder could be a first step in that direction.


Bibliography

Subutai Ahmad and Jeff Hawkins. Properties of sparse distributed representations and their application to hierarchical temporal memory. arXiv preprint arXiv:1503.07469, 2015.

T. C. Akinci. Short term wind speed forecasting with ANN in Batman, Turkey. Elektronika ir Elektrotechnika, 107(1):41–45, 2015.

E. Michael Azoff. Neural network time series forecasting of financial markets. John Wiley & Sons, Inc., 1994.

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13(1):281–305, 2012.

Daniel P. Buxhoeveden and Manuel F. Casanova. The minicolumn hypothesis in neuroscience. Brain, 125(5):935–951, 2002.

Erasmo Cadenas and Wilfrido Rivera. Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks. Renewable Energy, 34(1):274–278, 2009.

Dmitri B. Chklovskii, B. W. Mel, and K. Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782–788, 2004.

A. Costa, A. Crespo, J. Navarro, G. Lizcano, H. Madsen, and E. Feitosa. A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews, 12(6):1725–1744, August 2008. ISSN 1364-0321. doi: 10.1016/j.rser.2007.01.015. URL http://dx.doi.org/10.1016/j.rser.2007.01.015.

Russ C. Eberhart and James Kennedy. A new optimizer using particle swarm theory. In Proceedings of the sixth international symposium on micro machine and human science, volume 1, pages 39–43. New York, NY, 1995.

Shu Fan, James R. Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. Energy Conversion, IEEE Transactions on, 24(2):474–482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento - a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276. Citeseer, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition EWEC 2008, 6 pages. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving hierarchical temporal memory-based trading models. Springer, 2013.

M. A. Gaertner, C. Gallardo, C. Tejeda, N. Martínez, S. Calabria, N. Martínez, and B. Fernández. The casandra project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: The next generation prediction system. In Proc. of the 2001 European Wind Energy Conference, EWEC'01, Copenhagen, Denmark, pages 777–780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: A literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G. Lobo. Sipreólico - wind power prediction tool for the Spanish peninsular power system. Proceedings of the CIGRÉ 40th general session and exhibition, París, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357–363, 2014.

Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41–46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585–592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694–709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference, EWEC'07, 2007.

G. Kariniotakis. Position paper on Joule project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J. Halliday, R. Brownsword, Ignacio Marti, Ana Maria Palomares, I. Cruz, H. Madsen, T. S. Nielsen, Henrik Aa. Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power - overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, 10 pages, 2006.

G. N. Kariniotakis, G. S. Stavrakakis, and E. F. Nogaret. Wind power forecasting using advanced neural networks models. Energy Conversion, IEEE Transactions on, 11(4):762–767, 1996.

Stanley J. Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326–334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275–293, 2009.

Lars Landberg and Simon J. Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171–195, 1994.

S. Lang, C. Mohrlen, J. Jorgensen, B. O. Gallachóir, and E. McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International Solar Energy Society UK section conference, volume 85, page 89, 2006.

S. M. Lawan, W. A. W. Z. Abidin, W. Y. Chai, A. Baharun, and T. Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760–1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. Smart Grid, IEEE Transactions on, 5(1):501–510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets, 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164–168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313–2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154–3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa. Nielsen, and Torben S. Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475–489, 2005.

Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431–441, 1963.

Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons. 1969.

Marvin Lee Minsky and Oliver G. Selfridge. Learning in random nets. MIT Lincoln Laboratory, 1960.

Jorge J. Moré. The Levenberg-Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105–116. Springer, 1978.

Henrik Aa. Nielsen, Torben S. Nielsen, Henrik Madsen, Maria J. Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471–482, 2007.

Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29–34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power, 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159–167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms, v0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33–57, 2007.

A. Rodrigues, J. A. Peças Lopes, P. Miranda, L. Palma, C. Monteiro, R. Bessa, J. Sousa, C. Rodrigues, and J. Matos. EPREV - a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference, EWEC, volume 7, 2007.

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5:3, 1988.

Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85–117, 2015.

S. Sinkevicius, R. Simutis, and V. Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91–96, 2011.

Ke-Sheng Wang, Vishal S. Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61–69, 2014.

WWEA. 2014 half-year report, WWEA, pp. 1–8. Technical report, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365–376, 2013.

Hao Yu and Bogdan M. Wilamowski. Levenberg-Marquardt training. Industrial Electronics Handbook, 5:12–1, 2011.

Appendix A

Hyper-parameters

Parameters for the spatial pooler

Alias                 Default  Description
columnCount           -        The number of cell columns in a cortical region.
globalInhibition      false    If true, the winning columns of the inhibition phase are selected as the most active columns from the region as a whole.
numActivePerInhArea   10       The maximum number of active columns per inhibition area.
synPermActiveInc      0.1      The amount by which an active synapse is incremented in each round, specified as a percent of a fully grown synapse.
synPermConnected      0.10     Controls the threshold at which synapses are connected.
synPermInactiveDec    0.01     The amount by which an inactive synapse is decremented in each round.
potentialRadius       16       Determines the extent of the input that each column can potentially be connected to.

Table A.1: Configuration parameters for the spatial pooler.


Parameters for the scalar encoder

Alias       Symbol  Description
w           w       Number of bits to set in the output.
minval      vmin    The lower bound of the input value.
maxval      vmax    The upper bound of the input value.
n           n       Number of bits in the representation (n must be > w).
radius      r       Inputs separated by more than or equal to this distance will have non-overlapping representations.
resolution  ψ       Inputs separated by more than or equal to this distance will have different representations.

Table A.2: Configuration parameters for the scalar encoder.
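The interplay of these parameters can be illustrated with a minimal encoder sketch. This is a simplification that ignores radius, resolution and periodic inputs, and is not NuPIC's actual encoder code.

```python
def encode_scalar(value, vmin, vmax, n, w):
    """Minimal sketch of scalar encoding.

    Maps `value` in [vmin, vmax] onto n bits containing a contiguous run
    of w active bits; nearby values share active bits, so their
    representations overlap.
    """
    if not vmin <= value <= vmax:
        raise ValueError("value outside encoder range")
    buckets = n - w + 1  # number of distinct starting positions
    i = int(round((value - vmin) / (vmax - vmin) * (buckets - 1)))
    bits = [0] * n
    for j in range(i, i + w):
        bits[j] = 1
    return bits
```
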


Parameters for the temporal memory

Alias                  Default  Description
activationThreshold    12       Activation threshold for segments.
cellsPerColumn         32       Number of cells per column.
columnCount            2048     The number of cell columns in a cortical region.
globalDecay            0.10     Decrements all synapses a little bit all the time.
initialPerm            0.11     Initial permanence value for a synapse.
inputWidth             -        Size of the input.
maxAge                 100000   Controls global decay. Global decay will only decay segments that have not been activated for maxAge iterations, and will only do the global decay loop every maxAge iterations.
maxSegmentsPerCell     -        The maximum number of segments a cell can have.
maxSynapsesPerSegment  -        The maximum number of synapses a segment can have.
minThreshold           8        The minimum required activity for a segment to learn.
newSynapseCount        15       The maximum number of synapses added to a segment during learning.
permanenceDec          0.10     How much permanence is removed from synapses when learning occurs.
permanenceInc          0.10     How much permanence is added to synapses when learning occurs.
temporalImp            cpp/py   Controls which temporal memory implementation to use.

Table A.3: Configuration parameters for the temporal memory.


Appendix B

Wind characteristics

Figure B.1: Wind characteristics for WF 1 and WF 2. The GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Figure B.2: Wind characteristics for WF 3–7. The GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Appendix C

Error Distribution

[Six histograms of forecast error (x-axis: −1.0 to 1.0; y-axis: frequency, 0–70), one panel per lead time — 48, 40, 30, 20, 10 and 1 — each titled "wf1 using nupic"]

Figure C.1: Error distribution for different lead times, WF 1 (NuPIC model).

[Six histograms of forecast error (x-axis: −1.0 to 1.0; y-axis: frequency, 0–70), one panel per lead time — 48, 40, 30, 20, 10 and 1 — each titled "wf2 using nupic"]

Figure C.2: Error distribution for different lead times, WF 2 (NuPIC model).

[Six histograms of forecast error (x-axis: −1.0 to 1.0; y-axis: frequency, 0–70), one panel per lead time — 48, 40, 30, 20, 10 and 1 — each titled "wf3 using nupic"]

Figure C.3: Error distribution for different lead times, WF 3 (NuPIC model).

[Six histograms of forecast error (x-axis: −1.0 to 1.0; y-axis: frequency, 0–70), one panel per lead time — 48, 40, 30, 20, 10 and 1 — each titled "wf4 using nupic"]

Figure C.4: Error distribution for different lead times, WF 4 (NuPIC model).

[Six histograms of forecast error (x-axis: −1.0 to 1.0; y-axis: frequency, 0–70), one panel per lead time — 48, 40, 30, 20, 10 and 1 — each titled "wf5 using nupic"]

Figure C.5: Error distribution for different lead times, WF 5 (NuPIC model).

[Six histograms of forecast error (x-axis: −1.0 to 1.0; y-axis: frequency, 0–70), one panel per lead time — 48, 40, 30, 20, 10 and 1 — each titled "wf6 using nupic"]

Figure C.6: Error distribution for different lead times, WF 6 (NuPIC model).

[Six histograms of forecast error (x-axis: −1.0 to 1.0; y-axis: frequency, 0–70), one panel per lead time — 48, 40, 30, 20, 10 and 1 — each titled "wf7 using nupic"]

Figure C.7: Error distribution for different lead times, WF 7 (NuPIC model).

[Six histograms of forecast error (x-axis: −1.0 to 1.0; y-axis: frequency, 0–70), one panel per lead time — 48, 40, 30, 20, 10 and 1 — each titled "wf1 using expektra"]

Figure C.8: Error distribution for different lead times, WF 1 (Expektra model).

[Six histograms of forecast error (x-axis: −1.0 to 1.0; y-axis: frequency, 0–70), one panel per lead time — 48, 40, 30, 20, 10 and 1 — each titled "wf2 using expektra"]

Figure C.9: Error distribution for different lead times, WF 2 (Expektra model).

[Six histograms of forecast error (x-axis: −1.0 to 1.0; y-axis: frequency, 0–70), one panel per lead time — 48, 40, 30, 20, 10 and 1 — each titled "wf3 using expektra"]

Figure C.10: Error distribution for different lead times, WF 3 (Expektra model).

[Six histograms of forecast error (x-axis: −1.0 to 1.0; y-axis: frequency, 0–70), one panel per lead time — 48, 40, 30, 20, 10 and 1 — each titled "wf4 using expektra"]

Figure C.11: Error distribution for different lead times, WF 4 (Expektra model).

[Six histograms of forecast error (x-axis: −1.0 to 1.0; y-axis: frequency, 0–70), one panel per lead time — 48, 40, 30, 20, 10 and 1 — each titled "wf5 using expektra"]

Figure C.12: Error distribution for different lead times, WF 5 (Expektra model).

[Six histograms of forecast error (x-axis: −1.0 to 1.0; y-axis: frequency, 0–70), one panel per lead time — 48, 40, 30, 20, 10 and 1 — each titled "wf6 using expektra"]

Figure C.13: Error distribution for different lead times, WF 6 (Expektra model).

[Six histograms of forecast error (x-axis: −1.0 to 1.0; y-axis: frequency, 0–70), one panel per lead time — 48, 40, 30, 20, 10 and 1 — each titled "wf7 using expektra"]

Figure C.14: Error distribution for different lead times, WF 7 (Expektra model).

List of Figures

2.1 A figure that presents the general outline when forecasting using the statistical approach
2.2 A figure that presents the general steps when forecasting using a physical model
3.1 The perceptron
3.2 Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of H hidden layers, as well as M input connections. Each edge seen in this graph has a weight w_ij associated with it
3.3 Information flow of a single-region predictive model created with the OPF
3.4 The CLAClassifier
3.5 Training an OPF model
4.1 Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs. wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.2 Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.3 Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.4 Different error measurements for WF 1
4.5 Different error measurements for WF 2
4.6 Different error measurements for WF 3
4.7 Different error measurements for WF 4
4.8 Different error measurements for WF 5
4.9 Different error measurements for WF 6
4.10 Different error measurements for WF 7
4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point, i.e. the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind
4.12 Plot showing the unoptimized version vs. the optimized one when training a network with 10 input neurons, 10/15/20/25 hidden neurons and 1 output neuron. Training was done for 100 epochs using LM
4.13 Summarized average improvement over all wind farms, with 95% confidence intervals
B.1 Wind characteristics for WF 1 and WF 2
B.2 Wind characteristics for WF 3–7
C.1 Error distribution for different lead times, WF 1
C.2 Error distribution for different lead times, WF 2
C.3 Error distribution for different lead times, WF 3
C.4 Error distribution for different lead times, WF 4
C.5 Error distribution for different lead times, WF 5
C.6 Error distribution for different lead times, WF 6
C.7 Error distribution for different lead times, WF 7
C.8 Error distribution for different lead times, WF 1
C.9 Error distribution for different lead times, WF 2
C.10 Error distribution for different lead times, WF 3
C.11 Error distribution for different lead times, WF 4
C.12 Error distribution for different lead times, WF 5
C.13 Error distribution for different lead times, WF 6
C.14 Error distribution for different lead times, WF 7

List of Tables

3.1 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, missing data with power observations are available for updating the models
3.2 Preprocessed features from the dataset. The Forecast category represents the forecast given by the NWP; multiple forecasts will exist for a given date. The latest issued forecast available is the feature set we will use in training and testing
3.3 Examples, where n = 14, r = 5, ψ = 1, of various encoded scalar values using a ScalarEncoder
4.1 NRMSE score of the entries published in [Hong et al., 2014]. The NuPIC model and Expektra model are added so we can easily compare the results
A.1 Configuration parameters for the spatial pooler
A.2 Configuration parameters for the scalar encoder
A.3 Configuration parameters for the temporal memory

www.kth.se


3.2 EXPERIMENTS

No  Category  Parameter             Alias  Type
1   Date      Date                  date   String
2   Date      Year                  year   Integer
3   Date      Month                 month  Integer
4   Date      Day                   day    Integer
5   Date      Hour                  hours  Integer
6   Date      Week                  week   Integer
7   Forecast  Wind Speed            ws     Real
8   Forecast  Wind Direction (deg)  wd     Real
9   Forecast  Wind U                u      Real
10  Forecast  Wind V                v      Real
11  Forecast  Issued                hp     Integer
12  SCADA     Production            wp     Real

Table 3.2: Preprocessed features from the dataset. The Forecast category represents the forecast given by the NWP; multiple forecasts will exist for a given date. The latest issued forecast available is the feature set we will use in training and testing.


CHAPTER 3 METHOD AND MATERIALS

The database of meteorological forecasts given by the GEFCom dataset contains sections of missing data, each corresponding to a date for which the power information is also missing; in a pre-processing step these sections were filled with the previous best forecast available. For example, if we do not have data for the forecast issued at 2011-01-01 12:00, we use data from 2011-01-01 00:00, as a 48-hours-ahead forecast is available for this date. If the previous section also contained missing data, we would go back an additional section and use those forecasts. If no forecast is available 48 hours back, we use the best known forecast and extend it in the same fashion as the persistence model.

Any predictions made by these models should fall within a certain range, so any forecast outside this range is clamped. This is the main post-processing step, i.e. there is an upper limit on what we can produce.

3.3 Holdback Input Randomization

The Holdback Input Randomization (HIPR) method, described in Kemp et al. [2007], can be used to investigate the importance of the input parameters. It works by sequentially feeding each data point in the test set to the neural network while replacing the values of one input parameter at a time. The replacement values are uniformly distributed random values in the range the neural network was originally trained on, i.e. (−1, 1). An NRMSE score is calculated for each replacement; the result is that we get information about the relevance of each input.
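As an illustration, HIPR can be sketched in a few lines of numpy. This is not the thesis implementation; `predict` stands in for any trained model's forward function, and the NRMSE normalization by the observed range is one common convention:

```python
import numpy as np

def nrmse(y_true, y_pred):
    # Root-mean-square error normalized by the range of the observations
    return np.sqrt(np.mean((y_true - y_pred) ** 2)) / (y_true.max() - y_true.min())

def hipr(predict, X, y, rng=np.random.default_rng(0)):
    """Holdback Input Randomization sketch: replace one input column at a
    time with uniform noise in (-1, 1) and record the resulting NRMSE."""
    baseline = nrmse(y, predict(X))
    scores = {}
    for j in range(X.shape[1]):
        Xr = X.copy()
        Xr[:, j] = rng.uniform(-1.0, 1.0, size=len(Xr))  # randomize column j only
        scores[j] = nrmse(y, predict(Xr))
    return baseline, scores
```

Columns whose score rises most above the baseline are the most relevant inputs.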

3.4 Optimization methods

The two main optimization algorithms used in this thesis are the Particle Swarm Optimization (PSO) algorithm presented by Eberhart and Kennedy [1995] and the Levenberg–Marquardt (LM)2 algorithm, independently developed by Kenneth Levenberg and Donald Marquardt [Marquardt, 1963; Levenberg, 1944]. The PSO algorithm is a population-based stochastic algorithm, similar to a Genetic Algorithm (GA) but based on social–psychological principles instead of evolution. It can be summarized by the following steps. Each network was trained multiple times in order to avoid local minima.

2 The LM algorithm was used in the beginning of the project, but the results reported ended up using the PSO algorithm. I have included the LM algorithm in this section because speed optimization was measured using this algorithm.

• Step 1: Initialize particles with random velocities and accelerations.

• Step 2: Determine which particle is closest to the goal.

• Step 3: Adjust accelerations toward that particle.

• Step 4: Update particle positions based on their velocity, and update velocities based on acceleration.

• Step 5: Go to step 2.
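The steps above can be sketched as a minimal global-best PSO. This is an illustrative sketch, not the thesis implementation; the inertia and acceleration coefficients are common textbook defaults, not values from the thesis:

```python
import numpy as np

def pso(f, dim, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal global-best PSO minimizing cost function f over R^dim."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1, 1, (n_particles, dim))      # Step 1: random positions ...
    v = rng.uniform(-0.1, 0.1, (n_particles, dim))  # ... and velocities
    pbest = x.copy()
    pbest_cost = np.apply_along_axis(f, 1, x)
    g = pbest[pbest_cost.argmin()].copy()           # Step 2: particle closest to goal
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # Step 3: accelerate toward personal and global bests
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = x + v                                   # Step 4: move the particles
        cost = np.apply_along_axis(f, 1, x)
        improved = cost < pbest_cost
        pbest[improved], pbest_cost[improved] = x[improved], cost[improved]
        g = pbest[pbest_cost.argmin()].copy()       # Step 5: repeat from step 2
    return g, pbest_cost.min()

# e.g. minimizing the sphere function drives the swarm toward the origin
best, cost = pso(lambda p: float(np.sum(p ** 2)), dim=3)
```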

The LM algorithm is a combination of the Error Back-Propagation (EBP) method [Rumelhart et al., 1988] and the Gauss–Newton method. This algorithm has the speed advantage of Gauss–Newton and the stability of EBP [Yu and Wilamowski, 2011]. A detailed treatment of LM can be found in Moré [1978], and of PSO in Poli et al. [2007].

3.5 Neural Networks

3.5.1 Multilayer Perceptron

The perceptron is built around a nonlinear model of a neuron, the McCulloch–Pitts model [McCulloch and Pitts, 1943]. It basically consists of a 2-step process, where the cell body contains a summation function of the weighted sum of all inputs, including a bias. The perceptron is described by equation 3.11. The sum s is passed through an activation function (see section 3.5.1) which mimics the activation, or firing, of the neuron.

s = \sum_{i=1}^{M} w_i x_i + x_0    (3.11)

w_i is the weight of the "synapse" of input channel i and is the parameter we want to adjust, x_i is the input value, and x_0 is the bias. M denotes the number of inputs we have.
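As a small worked example of equation 3.11 (illustrative only, not thesis code), the weighted sum is just a dot product plus the bias:

```python
import numpy as np

def perceptron_sum(w, x, x0):
    """The weighted sum of equation 3.11: s = sum_i w_i * x_i + x0."""
    return float(np.dot(w, x) + x0)

# w = [0.5, -0.2], x = [1.0, 2.0], bias 0.1: s = 0.5 - 0.4 + 0.1 ≈ 0.2
s = perceptron_sum(np.array([0.5, -0.2]), np.array([1.0, 2.0]), 0.1)
```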


[Diagram: input signals and a bias, each weighted by w, feed a summation Σ whose output is passed through the activation function f(s) to give the output signal]

Figure 3.1: The perceptron.

The structure of the MLP consists of many perceptrons, and it is shown in figure 3.2. The input signal flows from the input layer at the bottom to the output layer at the top. We have a bias signal, seen on the left side of the diagram, which is set to a fixed number.

MLPs are fully connected networks, meaning that the neurons in any layer of the network are connected to all the neurons in the previous layer. Each connection in the network has a weight w_ij associated with it. The initialization of these weights is done before any training, and is achieved by randomly assigning very small values to the respective weights in the graph. The architecture used in this study has only one output variable, i.e. the function we approximate produces forecasts of the power generation given a certain input.


[Diagram: a fully connected feed-forward network with an input layer (hours, u, v, week, ws, ws−1, ws−2, ws, ws+1, ws+2) plus a bias signal, tanh hidden layers, and a linear output layer producing the output signal]

Figure 3.2: Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of H hidden layers, as well as M input connections. Each edge seen in this graph has a weight w_ij associated with it.

The performance of neural networks is generally improved if the data is normalised, because using the original data directly can cause convergence problems. Normalization is done using the mapminmax function, seen in equation 3.12.

y = (y_{max} - y_{min}) \cdot \frac{x - x_{min}}{x_{max} - x_{min}} + y_{min}    (3.12)

y_max is the max value of the specified range, which in this case is 1, and y_min is −1; x is the value to be scaled, x_max is the max of the values to be scaled, and x_min is the min of the values to be scaled.
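Equation 3.12 corresponds directly to the following sketch (named after MATLAB's mapminmax, which the thesis uses; this is a simplification, not its full API):

```python
import numpy as np

def mapminmax(x, ymin=-1.0, ymax=1.0):
    """Linearly rescale x into [ymin, ymax], as in equation 3.12."""
    x = np.asarray(x, dtype=float)
    return (ymax - ymin) * (x - x.min()) / (x.max() - x.min()) + ymin

scaled = mapminmax([0.0, 5.0, 10.0])  # → [-1.0, 0.0, 1.0]
```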

The forecasting model consists of 7 different networks, one for each wind farm; these networks are trained on the first section of the dataset, before the test period. A random 60/20/20 split is created, where 60% of the available data is used for training, 20% is used to validate the network, and 20% is set aside as a hold-out set for the hyperparameters. The input features3 fed into the models are ws, u, v, hours, ws, ws+1, ws+2, ws−1, ws−2, where ws+x and ws−x denote a time shift of x. We can use ws+x as input into the model because ws is itself a forecast.

3 See table 3.2.

Activation Functions

The computation done for each neuron in the multilayer perceptron requires knowledge of the derivative of the activation function; in other words, the activation function we pick needs to be continuous. In this thesis we use the following two activation functions: the hyperbolic tangent function, seen in equation 3.13,

f(s) = \tanh(s) = \frac{e^s - e^{-s}}{e^s + e^{-s}}    (3.13)

and the linear transfer function, seen in equation 3.14.

f(s) = \begin{cases} +1 & \text{if } s \ge 1 \\ s & \text{if } -1 < s < 1 \\ -1 & \text{if } s \le -1 \end{cases}    (3.14)
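The two transfer functions can be written down directly (a sketch; `satlin` is a hypothetical name for the saturating linear function of equation 3.14, which is exactly a clip to [−1, 1]):

```python
import numpy as np

def tanh_act(s):
    """Hyperbolic tangent activation, equation 3.13."""
    return np.tanh(s)

def satlin(s):
    """Saturating linear transfer function, equation 3.14:
    +1 for s >= 1, s for -1 < s < 1, -1 for s <= -1."""
    return np.clip(s, -1.0, 1.0)
```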

Hyperparameter optimization

In order to obtain the hyperparameters necessary for each wind farm's model, a random hyperparameter search was performed for all models. A hold-out validation set was used to pick the best hyperparameters. Random search has been shown to work better than grid search when not all parameters are equally important [Bergstra and Bengio, 2012].
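A random search of this kind can be sketched as follows. The parameter names, value grids, and `evaluate` function are hypothetical placeholders, not the thesis's actual search space:

```python
import random

def random_search(evaluate, space, n_trials=50, seed=0):
    """Sample each hyperparameter independently and keep the configuration
    with the lowest validation score (e.g. NRMSE on the hold-out set)."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in space.items()}
        score = evaluate(params)  # validation score for this configuration
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

# hypothetical search space
space = {"hidden_neurons": [10, 15, 20, 25], "learning_rate": [0.001, 0.01, 0.1]}
```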

3.5.2 Numenta Platform for Intelligent Computing

This section describes the key principles introduced with NuPIC. It explains in general terms the theory behind the Online Prediction Framework (OPF)4, which uses the CLA and HTM algorithms; the OPF works as an API to create predictive models. The OPF consists of 5 major types of components: Encoders, Spatial Pooler (SP), Temporal Memory (TM), Temporal Pooler (TP) and Classifiers. Together these components construct a single CLA region5, and Figure 3.3 demonstrates the information flow through a single region.

4 The OPF is used with Numenta's commercial product Grok.
5 Currently, models created with the OPF do not use a TP, nor does this client allow creation of a hierarchy of regions, i.e. what we would call an HTM; it is only possible to create one region.


Another central concept in NuPIC is the Sparse Distributed Representation (SDR), which refers to the activation of a small percentage of the neurons at any given time; this neuronal activity is represented as an n-dimensional sparse binary vector x = [b0, ..., bn] with around 2% active cells. The outputs from the spatial pooler and temporal memory are both SDRs, while the output from the encoder does not enforce this and is just a normal binary vector. A general overview of the properties of SDRs has been given by Ahmad and Hawkins [2015].

Encoder → Spatial Pooler → Temporal Memory → Classifier

Figure 3.3: Information flow of a single-region predictive model created with the OPF.

The overlap between two different SDRs is defined by

o(x, y) = x \cdot y    (3.15)

A match between two SDRs is defined by

m(x, y) := o(x, y) \ge \theta    (3.16)

where θ is set so that θ ≤ ||x||_1 and θ ≤ ||y||_1. An interesting property of SDRs, one that is used multiple times inside the temporal memory and especially for predictions, is the fact that a fixed-size set of SDRs can be reliably stored by taking their union as a single pattern; the boolean OR operator is used to create a new vector from a set of SDRs. The downside of storing patterns this way is that the more patterns we store, the bigger the probability of false positives.

Value  Scalar encoding
1      11111000000000
2      01111100000000
10     00000000011111

Table 3.3: Examples, where n = 14, r = 5, ψ = 1, of various encoded scalar values using a ScalarEncoder.
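Equations 3.15–3.16 and the union property amount to a few lines of numpy (an illustrative sketch, not NuPIC code):

```python
import numpy as np

def overlap(x, y):
    """Equation 3.15: the number of shared active bits."""
    return int(np.dot(x, y))

def match(x, y, theta):
    """Equation 3.16: the SDRs match if their overlap reaches theta."""
    return overlap(x, y) >= theta

def union(sdrs):
    """Store a set of SDRs as one pattern using boolean OR."""
    return np.logical_or.reduce(sdrs).astype(int)

a = np.array([1, 1, 0, 0, 0])
b = np.array([0, 1, 1, 0, 0])
u = union([a, b])  # b still "matches" the stored union, at the cost of
                   # possible false positives as more patterns are OR-ed in
```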

Encoders

NuPIC contains many different encoders6; the job of an encoder is to convert raw input into a more suitable representation (i.e. a binary vector). The raw input is fed into the model using a dictionary data structure.

Useful encoders include scalar and categorical encoders, while more exotic encoders include one for Global Positioning System (GPS) coordinates; this encoder can be used to extract information about anomalous movements. The entries of the dictionary holding the raw input are each encoded separately and concatenated using a multi-encoder.

One property of the spatial pooler is that overlapping input patterns are mapped to the same SDR. This means that we want the encoder to encode the input so that similar inputs share bits. The ScalarEncoder fulfils this property by the process illustrated in table 3.3.

We calculate the range of values to encode using equation 3.17; v_min represents the minimum value of the input signal, while v_max denotes its upper bound.

v_{range} = v_{max} - v_{min}    (3.17)

6 A full list of all encoders can be found in the API documentation for NuPIC.

There are three mutually exclusive parameters that determine the overall size of the output from a scalar encoder: (n, r, ψ). n directly represents the total number of bits in the output, and it must be bigger than or equal to w, which represents the number of bits that are set to encode a single value, i.e. the "width" of the output signal7. r and ψ are specified with respect to the input, while w is specified with respect to the output. Two inputs separated by more than the radius have non-overlapping representations; two inputs separated by less than the radius will in general overlap in at least some of their bits; two inputs separated by greater than or equal to the resolution are guaranteed to have different representations. ψ can be calculated using equation 3.18.

\psi = r / w    (3.18)

Depending on whether we want periodic behaviour or not, n is calculated a bit differently; see the scalar encoder for implementation details8.
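A toy non-periodic scalar encoder reproduces the encodings of table 3.3 for n = 14, w = 5 and values in [1, 10] (the value range is an assumption read off the table). The real NuPIC ScalarEncoder is configured via radius/resolution and also supports periodic inputs, which this sketch omits:

```python
def scalar_encode(v, vmin=1, vmax=10, n=14, w=5):
    """Sketch of a non-periodic ScalarEncoder: a run of w consecutive bits
    whose start position encodes where v falls in [vmin, vmax]."""
    v = max(vmin, min(vmax, v))           # clamp to the value range
    buckets = n - w + 1                   # number of distinct start positions
    i = int(round((v - vmin) / (vmax - vmin) * (buckets - 1)))
    return [1 if i <= j < i + w else 0 for j in range(n)]

# scalar_encode(1) → 11111000000000, as in table 3.3
```

Note how adjacent values (1 and 2) share four of their five active bits, while values far apart (1 and 10) share none; this is exactly the similarity-preserving property the spatial pooler relies on.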

Spatial Pooler

The spatial pooler receives a binary vector as its input from an encoder and outputs an SDR. The structure of the spatial pooler consists of an input space and a set of columns; the output SDR represents which of the columns in a region are active. Each column has synapses connected to the input space. The SP consists of around 50% randomly and potentially connected synapses; this is called the "potential pool". Each synapse will connect and disconnect with the input space during learning.

The general information flow of the spatial pooler consists of the following steps. Input stimuli from lower regions lead to activation of the input space to which each mini-column is synaptically connected; this activation, the so-called "overlap score", is set for each column. The score is calculated as the total sum of the neurons that try to influence that column, weighted with a "boosting factor" that increases the chance for certain columns to win more easily. A final top percentage (usually around 2% activity) of the columns with the highest influence (biggest overlap score) is chosen; columns also inhibit nearby columns, and the result of this process is a binary vector with few active bits, an SDR. This is illustrated in 3.19, where b can be either 0 or 1 and s is a value representing the score.

7 w must be odd to avoid centering problems.
8 https://github.com/numenta/nupic


\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \cdots & b_n \end{bmatrix}}_{\text{input vector}} \cdot \underbrace{\begin{bmatrix} b_{11} & b_{12} & b_{13} & \cdots & b_{1n} \\ b_{21} & b_{22} & b_{23} & \cdots & b_{2n} \\ \vdots & & & & \vdots \\ b_{d1} & b_{d2} & b_{d3} & \cdots & b_{dn} \end{bmatrix}}_{\text{connected synapses for each column}} = \underbrace{\begin{bmatrix} s_1 & s_2 & s_3 & \cdots & s_n \end{bmatrix}}_{\text{overlap score}} \xrightarrow{\text{inhibition}} \underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \cdots & b_n \end{bmatrix}}_{\text{output SDR}}    (3.19)

Learning in this structure is done by adjusting the permanence of the proximal dendrites to better match the input: for each winning column, increase the permanence of the synapses that correctly matched the input and decrease the permanence of the rest. We also increase the boosting factor of losing columns, to give them a bigger chance of winning next time.
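The overlap-inhibition-learn cycle can be sketched as follows. The helper names are hypothetical, and a real spatial pooler additionally uses permanence thresholds and local (rather than global) inhibition, which are simplified away here:

```python
import numpy as np

def spatial_pool(input_vec, synapses, boost, n_active):
    """Sketch of equation 3.19 plus inhibition: compute boosted overlap
    scores per column, then keep only the top n_active columns (the SDR)."""
    scores = (synapses @ input_vec) * boost     # overlap score per column
    winners = np.argsort(scores)[-n_active:]    # global inhibition: top columns win
    sdr = np.zeros(len(synapses), dtype=int)
    sdr[winners] = 1
    return sdr, winners

def sp_learn(perms, input_vec, winners, inc=0.1, dec=0.01):
    """Learning step: in each winning column, strengthen permanences of
    synapses aligned with active input bits and weaken the rest."""
    for c in winners:
        perms[c] += np.where(input_vec > 0, inc, -dec)
        np.clip(perms[c], 0.0, 1.0, out=perms[c])
    return perms
```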

Temporal Memory

The temporal memory receives an SDR from the spatial pooler, which represents the active columns; the output of the temporal memory is the activity of the whole CLA region. The general idea of the TM9 is that cells in each mini-column provide temporal context to the pattern identified by the spatial pooler. The TM's job is to form transitions between different SDRs, and it achieves this by allowing active cells to form connections to previously active cells, so that in a future setting each cell is able to predict its own activity.

Each cell in a region has several distal segments, ideally one for each pattern the cell has transitioned from; in practice each segment connects to a couple of patterns. A cell enters a predictive state if there is enough activity on a segment, and every segment consists of a collection of connections, synapses, that have been formed to a subset of previously active cells (typically around 10-15 cells).

The temporal memory receives a sparse binary vector from the spatial pooler, and the general information flow for this part of the CLA goes through two phases. The first phase has the following steps: 1) for each active column, we check whether there are any cells in a predictive

9 There are a lot of details in how the temporal memory is implemented; pseudo-code and more details can be found in [Numenta 2011]. The nupic git repository is the best source for finer details: https://github.com/numenta/nupic


3.5 NEURAL NETWORKS

state; if so, then 2) activate that particular cell. If no cell was found in a predictive state, then 3) activate all cells in that particular column, a process called bursting.

Bursting is analogous to saying: we do not know which cell to activate because we have not seen this instance of the sequence of patterns before, and since we are unable to put the column into the correct temporal context, we activate all cells to reflect this uncertainty. Bursting does not occur when the temporal context has been predicted by the CLA. Equation 3.20 shows the first phase.

$$
\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \dots & b_n \end{bmatrix}}_{\text{SP SDR}}
\;,\;
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \dots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \dots & b_{2n} \\
\vdots & & & & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \dots & b_{dn}
\end{bmatrix}}_{\text{Predictive state}}
=
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \dots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \dots & b_{2n} \\
\vdots & & & & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \dots & b_{dn}
\end{bmatrix}}_{\text{Active state}}
\tag{3.20}
$$

The second phase of the algorithm figures out which cells should be put into a predictive state for the next time-step. This is achieved by checking every distal segment on every cell for activity above a certain threshold, i.e. checking for active segments. One active segment is enough to put a cell into a predictive state. The output from the temporal memory is a vector representing the state of all cells in that region. Equation 3.21 shows phase 2.

$$
\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \dots & b_n \end{bmatrix}}_{\text{Active state}}
\cdot
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \dots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \dots & b_{2n} \\
\vdots & & & & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \dots & b_{dn}
\end{bmatrix}_X}_{\text{Segment } X}
=
\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \dots & b_n \end{bmatrix}_X}_{\text{Segment activation } X}
> \tau
\rightarrow
\underbrace{\begin{bmatrix} s_1 & s_2 & s_3 & \dots & s_n \end{bmatrix}}_{\text{Predictive state}}
\tag{3.21}
$$

If learning is turned on, update the permanence of the synapses connected to active distal segments (in the same way as in the spatial pooler). These changes are marked as temporary until we are sure that the cell is in fact able to correctly predict something with the new changes; once this is known, the change either becomes permanent or is removed. Temporarily marked changes are made permanent whenever a cell goes from inactive to active from a feed-forward input (keep the update, as we correctly predicted the feed-forward activation). If a cell instead went from active to inactive, undo the change.
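The two phases above can be sketched minimally as follows, assuming a flat cell numbering (column index times cells per column) and segments stored as plain sets of presynaptic cells. The real NuPIC implementation tracks much more state (learning cells, segment updates, permanences); this only illustrates activation, bursting and prediction.

```python
def tm_step(active_columns, predictive_prev, segments,
            cells_per_column, threshold):
    """One simplified temporal-memory step.

    active_columns  -- active column indices (the SP's SDR)
    predictive_prev -- set of cells that were predictive last step
    segments        -- dict: cell -> list of distal segments, each a
                       set of presynaptic cell ids
    Returns (active_cells, predictive_next)."""
    # Phase 1: activate predicted cells; burst columns with no prediction.
    active_cells = set()
    for col in active_columns:
        cells = [col * cells_per_column + i for i in range(cells_per_column)]
        predicted = [c for c in cells if c in predictive_prev]
        if predicted:
            active_cells.update(predicted)  # correctly predicted cells
        else:
            active_cells.update(cells)      # bursting: activate them all
    # Phase 2: a cell becomes predictive if any of its distal segments
    # has at least `threshold` synapses onto currently active cells.
    predictive_next = set()
    for cell, segs in segments.items():
        for seg in segs:
            if len(seg & active_cells) >= threshold:
                predictive_next.add(cell)
                break
    return active_cells, predictive_next
```

Fed with successive SDRs from the spatial pooler, the predictive set at each step is the region's guess about which cells (and therefore which columns) will be active next.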


NuPIC Classifiers

There are many different classifiers included with NuPIC, such as k-Nearest Neighbour (kNN) and a Support Vector Machine (SVM); the OPF in particular uses a custom-built classifier called the "CLAClassifier". The purpose of this classifier is to decode predictions made by the CLA. Classification with the CLAClassifier is performed in different ways depending on how the input has been encoded.

Scalar values are decoded using a process where each cell in a CLA region is paired with two histograms. One histogram keeps track of the frequency of encountered patterns associated with each cell; the other keeps track of a moving average for each bucket. This process is illustrated in figure 3.4.
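This decoding idea can be sketched roughly as follows. Note that this is an interpretation of the description above, not the actual CLAClassifier code: each cell keeps a histogram of how often it was active for each value bucket, each bucket keeps a moving average of the actual values, and inference averages the bucket likelihoods of the active cells.

```python
class SimpleCLAClassifier:
    """Illustrative sketch of the CLAClassifier idea (not NuPIC's code)."""

    def __init__(self, n_buckets, alpha=0.1):
        self.n_buckets = n_buckets
        self.alpha = alpha                    # moving-average update rate
        self.freq = {}                        # cell -> count per bucket
        self.bucket_avg = [0.0] * n_buckets   # moving average per bucket
        self.bucket_seen = [False] * n_buckets

    def learn(self, active_cells, bucket, actual_value):
        # Histogram 1: how often each cell was active for this bucket.
        for cell in active_cells:
            hist = self.freq.setdefault(cell, [0] * self.n_buckets)
            hist[bucket] += 1
        # Histogram 2: moving average of the actual value per bucket.
        if self.bucket_seen[bucket]:
            self.bucket_avg[bucket] += self.alpha * (
                actual_value - self.bucket_avg[bucket])
        else:
            self.bucket_avg[bucket] = actual_value
            self.bucket_seen[bucket] = True

    def infer(self, active_cells):
        # Each active cell votes for buckets according to its history.
        votes = [0.0] * self.n_buckets
        for cell in active_cells:
            hist = self.freq.get(cell)
            if hist:
                total = sum(hist)
                for b in range(self.n_buckets):
                    votes[b] += hist[b] / total
        total = sum(votes)
        likelihood = [v / total for v in votes] if total else votes
        value = sum(p * self.bucket_avg[b] for b, p in enumerate(likelihood))
        return likelihood, value
```

Inference thus returns both a likelihood over buckets and a point estimate, the likelihood-weighted mean of the bucket moving averages.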

Figure 3.4: The CLAClassifier. (Diagram: for each column of the SDR, every cell is paired with two histograms, a likelihood histogram and a moving average, over buckets spanning the minimum to the maximum value.)

Training in NuPIC

Training the NuPIC model is done using online learning algorithms. The schema seen in figure 3.5 is used to train and test these models. We have 7 different wind farms, so this schema is repeated for each wind farm.

This schema is slightly adjusted from the traditional way the OPF handles multi-step predictions. The default way of doing this is to use 48 different models, one for


every step ahead in the CLAClassifier, which would give us 48 · 7 = 336 models in total (for all the wind farms). Not only does this change match Expektra's approach, but it also reduces the memory requirements of the CLAClassifier.

Figure 3.5: Training an OPF model. (Diagram: a pre-training data chunk is used for hyperparameter setup, via PSO swarming or a manual setup; in the training phase the dataset is streamed into the OPF model with online learning activated; in the testing phase testing data is fed with online learning deactivated, yielding multi-step predictions.)

Input and hyperparameter selection

Finding hyperparameters for an OPF model was partly done using a custom built-in PSO algorithm and partly configured manually, mainly by ensuring that the PSO algorithm did not remove encoders. A single CLA region contains a lot of hyperparameters; these parameters are included with corresponding descriptions in appendix A. The inputs to the model are date, ws, wp, u and v.


Chapter 4

Result

This chapter presents the primary results obtained in this study. Figures 4.4-4.10 contain different error measurements on the test set for each wind farm. We see that Expektra's ANN model performs well over all wind farms, while the NuPIC model performs worse than Expektra but in general still better than our reference model. NuPIC is off target, with a bias error on most wind farms. The graphs also indicate that NuPIC is unable to pick up some trends in the cumulated ε² graph. Expektra's model, on the other hand, shows no clear problems in cumulated ε² and is on target on all wind farms; appendix C has been included to reflect this at different lead times.
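The error measurements shown in these figures (NBIAS, NMAE, NRMSE) can be computed per lead time roughly as below, assuming normalization by the installed capacity in the spirit of Madsen et al. [2005]; the exact normalization used for the GEFCom data (where power is already normalized) may differ.

```python
import math

def forecast_errors(actual, predicted, capacity):
    """Normalized bias, MAE and RMSE for one forecast horizon.

    actual, predicted -- equal-length power series for a given lead time
    capacity          -- normalization constant (installed capacity);
                         an assumption of this sketch, use 1.0 for
                         already-normalized power."""
    n = len(actual)
    errors = [(a - p) / capacity for a, p in zip(actual, predicted)]
    nbias = sum(errors) / n                          # NBIAS
    nmae = sum(abs(e) for e in errors) / n           # NMAE
    nrmse = math.sqrt(sum(e * e for e in errors) / n)  # NRMSE
    return nbias, nmae, nrmse
```

Evaluating this once per look-ahead time k reproduces the kind of curves plotted in figures 4.4-4.10.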

Figure 4.1: Left diagram: cumulated probability of wind speed (m/s). Right diagram: scatter diagram of the power curve, normalized power output vs wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


CHAPTER 4 RESULT

Figure 4.2: Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).

Figure 4.3: Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


Figure 4.4: Different error measurements for WF 1. (Panels: NBIAS, NRMSE and NMAE against look-ahead time k in hours, and cumulated ε² against time in hours, for Expektra, NuPIC and persistence.)


Figure 4.5: Different error measurements for WF 2. (Panels: NBIAS, NRMSE and NMAE against look-ahead time k in hours, and cumulated ε² against time in hours, for Expektra, NuPIC and persistence.)


Figure 4.6: Different error measurements for WF 3. (Panels: NBIAS, NRMSE and NMAE against look-ahead time k in hours, and cumulated ε² against time in hours, for Expektra, NuPIC and persistence.)


Figure 4.7: Different error measurements for WF 4. (Panels: NBIAS, NRMSE and NMAE against look-ahead time k in hours, and cumulated ε² against time in hours, for Expektra, NuPIC and persistence.)


Figure 4.8: Different error measurements for WF 5. (Panels: NBIAS, NRMSE and NMAE against look-ahead time k in hours, and cumulated ε² against time in hours, for Expektra, NuPIC and persistence.)


Figure 4.9: Different error measurements for WF 6. (Panels: NBIAS, NRMSE and NMAE against look-ahead time k in hours, and cumulated ε² against time in hours, for Expektra, NuPIC and persistence.)


Figure 4.10: Different error measurements for WF 7. (Panels: NBIAS, NRMSE and NMAE against look-ahead time k in hours, and cumulated ε² against time in hours, for Expektra, NuPIC and persistence.)


                             Wind Farm
User          1      2      3      4      5      6      7      All
Leustagos     0.145  0.138  0.168  0.144  0.158  0.133  0.140  0.146
DuckTile      0.143  0.145  0.172  0.145  0.165  0.137  0.146  0.148
MZ            0.141  0.151  0.174  0.145  0.167  0.141  0.145  0.149
Propeller     0.144  0.153  0.177  0.147  0.175  0.141  0.147  0.152
Duehee Lee    0.157  0.144  0.176  0.160  0.169  0.154  0.148  0.155
Expektra      0.165  0.158  0.184  0.164  0.179  0.153  0.153  0.165
MTU EE5260    0.161  0.172  0.193  0.162  0.192  0.156  0.160  0.168
SunWind       0.174  0.177  0.193  0.176  0.179  0.157  0.162  0.172
ymzsmsd       0.163  0.186  0.200  0.164  0.192  0.162  0.167  0.174
4138 Kalchas  0.180  0.179  0.197  0.175  0.200  0.160  0.165  0.177
NuPIC         0.243  0.254  0.264  0.310  0.290  0.224  0.240  0.264

Persistence   0.302  0.338  0.373  0.364  0.388  0.341  0.361  0.355

Table 4.1: NRMSE scores of the entries published in [Hong et al 2014]. The NuPIC and Expektra models are added so we can easily compare the results.

4.1 Experimental results

Looking at the primary results presented in this study, we see in table 4.1 that Expektra's ANN model is able to achieve performance comparable to other methods published in Hong et al. [2014]. This can be used as a starting point for further investigations. A similar approach to Expektra's is Duehee Lee's [Lee and Baldick, 2014], as both methods are based on the same neural network architecture but with different optimization techniques. Duehee Lee's network is structured slightly differently: Expektra's model consists of only one output node, whereas Duehee Lee's uses five (representing five predictions ahead). Besides these differences, Duehee Lee also includes additional sub-models to create a prediction ensemble, blending the output of the ANN with a Gaussian Process (GP) to improve very short-term predictions. MTU EE5260 is also a neural network approach that converts meteorological forecasts to power, and its score is very close. The HTM CLA beats the persistence model but needs more work to outperform the other models in GEFCom.



4.2 Input Importance

For interpretation purposes, in order to understand the model better, an analysis of the relative importance of the input parameters was performed. This analysis is illustrated in figure 4.11. Each box represents added noise on that channel; the result is a higher NRMSE score if that feature was important. A reference point, "all-channels", represents the error distribution of the model with no input replacement. We clearly see in this figure that the wind speed channel ws is the most important attribute: adding noise to this channel greatly affects the NRMSE score. We also see that the wind components u, v show little to no influence, and that the timestamp-related inputs hours and week both indicate that there are seasonal and daily trends present in the dataset.
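The analysis can be sketched as follows. This is a permutation-based variant of the idea in Kemp et al. [2007]: each input channel is replaced with noise (here, a shuffled copy of itself) while the trained model is held fixed, and the resulting increase in NRMSE indicates that channel's importance. The function name and the callable interface are illustrative, not taken from the thesis code.

```python
import random

def input_importance(model, X, y, nrmse, n_repeats=30, seed=0):
    """HIPR-style relative input importance (illustrative sketch).

    model     -- callable mapping one input row to a prediction
    X, y      -- inputs (list of rows) and targets
    nrmse     -- callable computing the error between predictions and y
    Returns (baseline error, {feature index: mean error increase})."""
    rng = random.Random(seed)
    baseline = nrmse([model(row) for row in X], y)
    importance = {}
    n_features = len(X[0])
    for j in range(n_features):
        scores = []
        column = [row[j] for row in X]
        for _ in range(n_repeats):
            noisy = [list(row) for row in X]
            shuffled = column[:]
            rng.shuffle(shuffled)  # "noise": permute the channel's values
            for row, v in zip(noisy, shuffled):
                row[j] = v
            scores.append(nrmse([model(row) for row in noisy], y))
        importance[j] = sum(scores) / n_repeats - baseline
    return baseline, importance
```

A channel whose randomization leaves the error near the baseline (like u, v above) contributes little; a channel whose randomization inflates the error (like ws) is important.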

Figure 4.11: Relative input parameter importance using HIPR. "all-channels" is the reference point: the error distribution of the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is the directional vector of the wind. (Panels: NRMSE box plots for all-channels, hours, u, v, week, ws and the shifted wind speeds ws−1, ws−2, ws−3, ws+1, ws+2, ws+3.)


4.2.1 Adaptation and Optimization

All training for Expektra's model was performed on a MacBook Pro with an Intel Core 2 Duo processor (P8600, 2.40 GHz) on a 64-bit operating system with 4 GB of installed RAM. After adapting the code to run experiments on the GEFCom dataset, an investigation was performed to evaluate the performance of the core source code provided by Expektra. This investigation identified some slow parts of the code; after optimizing these, mainly by using a MathNET native library instead of the managed provider, a speed test was performed comparing the unoptimized version with the optimized one. Figure 4.12 shows the speed-up achieved during training using the LM algorithm.

Figure 4.12: Plot showing the unoptimized version vs the optimized one when training a network with 10 input neurons; 10, 15, 20 or 25 hidden neurons; and 1 output neuron. Training was done for 100 epochs using LM.

4.3 Summary

A plot showing the average NRMSE improvement over the persistence model can be seen in figure 4.13. We see that both models are able to perform better than persistence. Expektra's model performs better than the NuPIC model over the whole forecast


horizon, and NuPIC performs better than persistence towards the end of the forecast.

Figure 4.13: Summarized average improvement (%) in NRMSE over persistence across all wind farms, with 95% confidence intervals, against look-ahead time (in hours).


Chapter 5

Discussion

With the exponential growth of computing power it becomes easier to study deeper and more complex networks, and with recent advances in the area of deep learning a new interest in ANNs has resurged. In practice, ANNs have been used by most groups in the field of WPF, but these networks never caught on, as it was argued that the improvements made by ANNs were usually not enough to justify the extra effort of training them [Giebel et al., 2011]. This is steadily becoming less of an issue as computation becomes cheaper and bigger networks perform better.

By using a simple MLP network we observe that we are able to obtain results similar in performance to other models published in Hong et al. [2014]. This can be used as a starting point for further investigations. The GEFCom dataset has a limited number of input features; if we had access to a wider range of them, the power of neural networks could be studied more in depth. Given the number of features in the dataset, this is harder to do.

Using persistence as a reference model was done in order to have the same baseline as the one used in GEFCom, but a better reference model should be considered in the future, such as the one presented in Nielsen et al. [1998], especially if longer forecasts are to be considered.
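For context, the Nielsen et al. [1998] model blends persistence with the climatological mean instead of using the last observation alone. A sketch follows; estimating the blending coefficient from the lag-k autocorrelation of the series is an assumption of this illustration.

```python
def new_reference_forecast(series, k, a_k=None):
    """Sketch of the Nielsen et al. [1998]-style reference model:

        p_hat(t + k) = a_k * p(t) + (1 - a_k) * mean(p)

    With a_k = 1 this reduces to persistence; with a_k = 0 to the
    climatological mean. When a_k is not supplied it is estimated here
    as the lag-k autocorrelation of the series (an assumption)."""
    mean = sum(series) / len(series)
    if a_k is None:
        num = sum((series[t] - mean) * (series[t + k] - mean)
                  for t in range(len(series) - k))
        den = sum((x - mean) ** 2 for x in series)
        a_k = num / den
    last = series[-1]
    return a_k * last + (1 - a_k) * mean
```

Because the correlation decays with k, this reference is much harder to beat than plain persistence at long horizons, which is exactly why it matters for longer forecasts.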

5.1 Method development issues

Working with Expektra's model is very straightforward and no major issues were encountered during development, but some issues were encountered working with NuPIC1.

1 It should also be pointed out that it helps to have the people who developed the code on the spot; this is probably the main reason for the lack of trouble.


CHAPTER 5 DISCUSSION

1. Installing NuPIC is difficult. This seems to be a general problem, judging by posts on the nupic mailing list2. A very important thing to fix if they want more people to work with this model.

2. The built-in swarming (PSO) did not work well, especially for slightly larger datasets and for 48 prediction steps. The main problems were fitting everything into memory and that the code ran too slowly when swarming over multiple steps ahead, making it practically impossible to use. Scaling down the problem to just a few prediction steps and multiple models helped, but the swarm model kept dismissing wind speed as an important feature, which was fixed by specifically telling it not to touch that encoder.

3. Inconsistencies between the white-paper documentation of the HTM CLA and the actual implementation make it hard to understand the underlying principles of NuPIC. NuPIC presents a very complex network, and any inconsistencies in documentation make it hard to understand. The code is quite well documented, but the additional material is a bit sparse.

These issues are all understandable and are continuously being improved upon. NuPIC is still a young platform, so issues are to be expected given the complexity and research nature of the project. Training the CLAClassifier to use many-step predictions instead of just one will most likely reduce the bias error seen in the results. To investigate this, a more powerful computer is needed.

There are some differences in the inputs between the models. This setup was created because an OPF model needs to encode the value it is going to predict. SCADA data is available, so this is not an issue, but it may introduce some unfairness between NuPIC and Expektra. In the current setup NuPIC does not seem to gain anything extra from the additional information, and the other models in the competition most likely used different inputs as well.

In general, NuPIC differs in the way you feed it data: you only send in the front of the signal, and temporal context is achieved through the temporal memory.

5.2 Future improvements and directions

One issue with the GEFCom dataset is that it does not contain any specific information about the locations of the 7 wind farms. It is worth considering how to handle different models that are specialized for different types

2 http://numenta.org/lists


of terrain, as this has been shown to increase performance [Kariniotakis et al., 2006]. Another thing to investigate could be a more advanced schema for hyperparameter selection: Jursa and Rohrig [2008] show that the WPF error can be reduced by smarter use of optimization algorithms for feature selection and hyperparameter selection, which should improve the performance.

In general, having more types of inputs from sources like SCADA and NWP systems could give a lot of valuable information that can be used for better predictions. It is clear that wind speed is a key feature for producing good forecasts, which is also supported by the input analysis in section 4.2. It has been argued that the error in WPF models stems largely from wrong weather forecasts [Giebel et al., 2011], so identifying good sources of weather forecasts would be desirable for more effective model learning. Combining several different weather forecasts could also be investigated, as Nielsen et al. [2007] showed that power forecasts based on a number of different meteorological forecasts were better than those based on a single source.

Regarding the HTM CLA, it is worth considering implementing a custom encoder for it, an encoder specifically targeted at data concerning wind farms. SCADA data could also be streamed directly into a NuPIC model, which would allow for some interesting online predictions and anomaly detections.

Training times are always an issue, especially for very large models. Both NuPIC and Expektra's model would benefit from being implemented in a more parallel fashion, as these algorithms are very well suited for parallel computing. The speed-up would be of great value, so implementing a GPU version of the algorithms could be worthwhile.

5.3 Conclusions

The field of energy forecasting is still young and the necessity for good forecasting techniques is constantly growing. This initial study shows that part of Expektra's forecasting method can be used to forecast wind power generation. We observe that this ANN is able to achieve performance comparable to other methods published in Hong et al. [2014]. Additional work is still needed to obtain state-of-the-art results. Numenta's model is able to beat the reference model but still needs additional work before we can draw definite conclusions about its performance. Constructing a custom encoder could be a first step in that direction.


Bibliography

Subutai Ahmad and Jeff Hawkins. Properties of sparse distributed representations and their application to hierarchical temporal memory. arXiv preprint arXiv:1503.07469, 2015.

T. C. Akinci. Short term wind speed forecasting with ANN in Batman, Turkey. Elektronika ir Elektrotechnika, 107(1):41–45, 2015.

E. Michael Azoff. Neural network time series forecasting of financial markets. John Wiley & Sons, Inc., 1994.

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13(1):281–305, 2012.

Daniel P. Buxhoeveden and Manuel F. Casanova. The minicolumn hypothesis in neuroscience. Brain, 125(5):935–951, 2002.

Erasmo Cadenas and Wilfrido Rivera. Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks. Renewable Energy, 34(1):274–278, 2009.

Dmitri B. Chklovskii, B. W. Mel, and K. Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782–788, 2004.

A. Costa, A. Crespo, J. Navarro, G. Lizcano, H. Madsen, and E. Feitosa. A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews, 12(6):1725–1744, August 2008. ISSN 1364-0321. doi: 10.1016/j.rser.2007.01.015. URL http://dx.doi.org/10.1016/j.rser.2007.01.015.

Russ C. Eberhart and James Kennedy. A new optimizer using particle swarm theory. In Proceedings of the sixth international symposium on micro machine and human science, volume 1, pages 39–43. New York, NY, 1995.


Shu Fan, James R. Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. Energy Conversion, IEEE Transactions on, 24(2):474–482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento, a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276. Citeseer, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition EWEC 2008, 6 pages. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving hierarchical temporal memory-based trading models. Springer, 2013.

M. A. Gaertner, C. Gallardo, C. Tejeda, N. Martínez, S. Calabria, N. Martínez, and B. Fernández. The Casandra project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. In Proc. of the 2001 European Wind Energy Conference, EWEC'01, Copenhagen, Denmark, pages 777–780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: A literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G. Lobo. Sipreólico, wind power prediction tool for the Spanish peninsular power system. Proceedings of the CIGRÉ 40th general session and exhibition, París, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357–363, 2014.


Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41–46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585–592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694–709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference, EWEC'07, 2007.

G. Kariniotakis. Position paper on Joule project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J. Halliday, R. Brownsword, Ignacio Marti, Ana Maria Palomares, I. Cruz, H. Madsen, T. S. Nielsen, Henrik Aa. Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power: overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, 10 pages, 2006.

G. N. Kariniotakis, G. S. Stavrakakis, and E. F. Nogaret. Wind power forecasting using advanced neural networks models. Energy Conversion, IEEE Transactions on, 11(4):762–767, 1996.

Stanley J. Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326–334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275–293, 2009.

Lars Landberg and Simon J. Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171–195, 1994.

S. Lang, C. Mohrlen, J. Jorgensen, B. Ó Gallachóir, and E. McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International solar energy society UK section conference, volume 85, page 89, 2006.


S. M. Lawan, W. A. W. Z. Abidin, W. Y. Chai, A. Baharun, and T. Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760–1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. Smart Grid, IEEE Transactions on, 5(1):501–510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets, 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164–168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313–2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154–3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa. Nielsen, and Torben S. Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475–489, 2005.

Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431–441, 1963.

Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons. 1969.

Marvin Lee Minsky and Oliver G. Selfridge. Learning in random nets. MIT Lincoln Laboratory, 1960.

Jorge J. Moré. The Levenberg-Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105–116. Springer, 1978.

Henrik Aa. Nielsen, Torben S. Nielsen, Henrik Madsen, Maria J. Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471–482, 2007.


Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29–34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power, 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159–167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms, v0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33–57, 2007.

A. Rodrigues, J. A. Peças Lopes, P. Miranda, L. Palma, C. Monteiro, R. Bessa, J. Sousa, C. Rodrigues, and J. Matos. EPREV, a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference, EWEC, volume 7, 2007.

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5:3, 1988.

Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85–117, 2015.

S. Sinkevicius, R. Simutis, and V. Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91–96, 2011.

Ke-Sheng Wang, Vishal S. Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61–69, 2014.

WWEA. 2014 half-year report, pp. 1–8. Technical report, WWEA, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365–376, 2013.

Hao Yu and Bogdan M. Wilamowski. Levenberg-Marquardt training. Industrial Electronics Handbook, 5:12–1, 2011.


Appendix A

Hyper-parameters

Parameters for the spatial pooler

Alias                  Default  Description
columnCount            -        The number of cell columns in a cortical region
globalInhibition       false    If true, the winning columns in the inhibition phase are selected as the most active columns from the region as a whole
numActivePerInhArea    10       The maximum number of active columns per inhibition area
synPermActiveInc       0.1      The amount by which an active synapse is incremented in each round, specified as a percent of a fully grown synapse
synPermConnected       0.10     Controls the threshold at which synapses are considered connected
synPermInactiveDec     0.01     The amount by which an inactive synapse is decremented in each round
potentialRadius        16       Determines the extent of the input that each column can potentially be connected to

Table A1: Configuration parameters for the spatial pooler.


Parameters for the scalar encoder

Alias       Symbol  Description
w           w       number of bits to set in the output
minval      vmin    the lower bound of the input value
maxval      vmax    the upper bound of the input value
n           n       number of bits in the representation (n must be > w)
radius      r       inputs separated by more than or equal to this distance will have non-overlapping representations
resolution  ψ       inputs separated by more than or equal to this distance will have different representations

Table A2: Configuration parameters for the scalar encoder.

Alias                  Default  Description
activationThreshold    12       Activation threshold for segments
cellsPerColumn         32       Number of cells per column
columnCount            2048     The number of cell columns in a cortical region
globalDecay            0.10     Decrements all synapses a little bit all the time
initialPerm            0.11     Initial permanence value for a synapse
inputWidth             -        Size of the input
maxAge                 100000   Controls global decay: only segments that have not been activated for maxAge iterations are decayed, and the global decay loop runs only every maxAge iterations
maxSegmentsPerCell     -        The maximum number of segments a cell can have
maxSynapsesPerSegment  -        The maximum number of synapses a segment can have
minThreshold           8        The minimum required activity for a segment to learn
newSynapseCount        15       The maximum number of synapses added to a segment during learning
permanenceDec          0.10     How much permanence is removed from synapses when learning occurs
permanenceInc          0.10     How much permanence is added to synapses when learning occurs
temporalImp            cpp/py   Controls which temporal memory implementation to use

Table A3: Configuration parameters for the temporal memory.


Appendix B

Wind characteristics

Figure B1: Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Figure B2: Wind characteristics for WF 3–7, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Appendix C

Error Distribution

Figure C1: Error distribution for different lead times, WF 1 (NuPIC model). Panels show error histograms for lead times 48, 40, 30, 20, 10, and 1 hour.

Figure C2: Error distribution for different lead times, WF 2 (NuPIC model). Panels show error histograms for lead times 48, 40, 30, 20, 10, and 1 hour.

Figure C3: Error distribution for different lead times, WF 3 (NuPIC model). Panels show error histograms for lead times 48, 40, 30, 20, 10, and 1 hour.

Figure C4: Error distribution for different lead times, WF 4 (NuPIC model). Panels show error histograms for lead times 48, 40, 30, 20, 10, and 1 hour.

Figure C5: Error distribution for different lead times, WF 5 (NuPIC model). Panels show error histograms for lead times 48, 40, 30, 20, 10, and 1 hour.

Figure C6: Error distribution for different lead times, WF 6 (NuPIC model). Panels show error histograms for lead times 48, 40, 30, 20, 10, and 1 hour.

Figure C7: Error distribution for different lead times, WF 7 (NuPIC model). Panels show error histograms for lead times 48, 40, 30, 20, 10, and 1 hour.

Figure C8: Error distribution for different lead times, WF 1 (Expektra model). Panels show error histograms for lead times 48, 40, 30, 20, 10, and 1 hour.

Figure C9: Error distribution for different lead times, WF 2 (Expektra model). Panels show error histograms for lead times 48, 40, 30, 20, 10, and 1 hour.

Figure C10: Error distribution for different lead times, WF 3 (Expektra model). Panels show error histograms for lead times 48, 40, 30, 20, 10, and 1 hour.

Figure C11: Error distribution for different lead times, WF 4 (Expektra model). Panels show error histograms for lead times 48, 40, 30, 20, 10, and 1 hour.

Figure C12: Error distribution for different lead times, WF 5 (Expektra model). Panels show error histograms for lead times 48, 40, 30, 20, 10, and 1 hour.

Figure C13: Error distribution for different lead times, WF 6 (Expektra model). Panels show error histograms for lead times 48, 40, 30, 20, 10, and 1 hour.

Figure C14: Error distribution for different lead times, WF 7 (Expektra model). Panels show error histograms for lead times 48, 40, 30, 20, 10, and 1 hour.

List of Figures

2.1 A figure that presents the general outline when forecasting using the statistical approach
2.2 A figure that presents the general steps when forecasting using a physical model
3.1 The perceptron
3.2 Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of H hidden layers, as well as M input connections. Each edge seen in this graph has a weight wij associated with it
3.3 Information flow of a single region predictive model created with the OPF
3.4 The CLAClassifier
3.5 Training an OPF model
4.1 Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.2 Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.3 Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.4 Different error measurements for WF 1
4.5 Different error measurements for WF 2
4.6 Different error measurements for WF 3
4.7 Different error measurements for WF 4
4.8 Different error measurements for WF 5
4.9 Different error measurements for WF 6
4.10 Different error measurements for WF 7
4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point: the network when no channel has been exposed to noise. ws = wind speed, hours/week = timestamps, u, v is a directional vector of the wind
4.12 Plot showing the unoptimized version vs the optimized one when training a network with 10 input neurons, 10/15/20/25 hidden neurons, and 1 output neuron. Training was done for 100 epochs using LM
4.13 Summarized average improvement over all wind farms, with 95% confidence intervals
B.1 Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power
B.2 Wind characteristics for WF 3–7, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power
C.1 Error distribution for different lead times, WF 1
C.2 Error distribution for different lead times, WF 2
C.3 Error distribution for different lead times, WF 3
C.4 Error distribution for different lead times, WF 4
C.5 Error distribution for different lead times, WF 5
C.6 Error distribution for different lead times, WF 6
C.7 Error distribution for different lead times, WF 7
C.8 Error distribution for different lead times, WF 1
C.9 Error distribution for different lead times, WF 2
C.10 Error distribution for different lead times, WF 3
C.11 Error distribution for different lead times, WF 4
C.12 Error distribution for different lead times, WF 5
C.13 Error distribution for different lead times, WF 6
C.14 Error distribution for different lead times, WF 7

List of Tables

3.1 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, power observations are available for updating the models
3.2 Preprocessed features from the dataset. The Forecast category represents the forecast given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the feature we will use in training and testing
3.3 Example where n = 14, r = 5, ψ = 1, of various encoded scalar values using a ScalarEncoder
4.1 NRMSE score of the entries published in [Hong et al., 2014]. The NuPIC model and Expektra model are added so we can easily compare the results
A.1 Configuration parameters for the spatial pooler
A.2 Configuration parameters for the scalar encoder
A.3 Configuration parameters for the temporal memory


CHAPTER 3 METHOD AND MATERIALS

The database for the meteorological forecasts given by the GEFCom dataset contains sections of missing data, each corresponding to a date for which the power information is also missing; these sections were filled in with the previous best available forecast in a pre-processing step. For example, if we do not have data for the forecast issued at 2011-01-01 12:00, we use data from 2011-01-01 00:00, as a 48-hours-ahead forecast is available for this date. If the previous section also contained missing data, we would go back an additional section and use those forecasts. If no forecasts are available 48 hours back, we use the best known forecast and extend it in the same fashion as the persistence model.

Any predictions made by these models should fall within a certain range, so any forecast outside this range is clamped. This is the main post-processing step being done, i.e. there is an upper limit on what we can produce.
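As a minimal sketch of this post-processing step (assuming power is normalized to [0, 1]; the exact bounds used in the thesis are not restated here), the clamping can be written as:

```python
import numpy as np

def postprocess(forecast, p_min=0.0, p_max=1.0):
    """Clamp model output to a valid (normalized) power range.
    The bounds here are illustrative assumptions."""
    return np.clip(forecast, p_min, p_max)

postprocess(np.array([-0.2, 0.5, 1.3]))  # -> array([0. , 0.5, 1. ])
```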

3.3 Holdback Input Randomization

The Holdback Input Randomization (HIPR) method, described in Kemp et al. [2007], can be used to investigate the importance of the input parameters. It works by sequentially feeding each data point in the test set to the neural network while replacing the values of one input parameter at a time. The replacement values are uniformly distributed random values in the range in which the neural network was originally trained, i.e. (-1, 1). An NRMSE score is calculated for each replacement; the result is that we get information about the relevance of each input.
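A minimal sketch of the HIPR procedure described above (illustrative names; `predict` stands in for a trained network, and the NRMSE definition used here, normalized by the observed range, is an assumption):

```python
import numpy as np

def nrmse(y_true, y_pred):
    # Root-mean-square error normalized by the observed range (an assumption;
    # the thesis defines its error metrics in an earlier section).
    return np.sqrt(np.mean((y_true - y_pred) ** 2)) / (y_true.max() - y_true.min())

def hipr(predict, X_test, y_test, rng=np.random.default_rng(0)):
    """Holdback Input Randomization: replace one input column at a time
    with uniform noise in (-1, 1) and record the resulting NRMSE."""
    scores = []
    for j in range(X_test.shape[1]):
        X_noisy = X_test.copy()
        X_noisy[:, j] = rng.uniform(-1.0, 1.0, size=len(X_test))
        scores.append(nrmse(y_test, predict(X_noisy)))
    return np.array(scores)  # higher score => input j matters more
```

Inputs whose randomization degrades the forecast the most are the most relevant ones.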

3.4 Optimization methods

The two main optimization algorithms used in this thesis are the Particle Swarm Optimization (PSO) algorithm presented by Eberhart and Kennedy [1995], and the Levenberg–Marquardt (LM)2 algorithm independently developed by Kenneth Levenberg and Donald Marquardt [Marquardt 1963, Levenberg 1944]. The PSO algorithm is a population-based stochastic algorithm similar to a Genetic Algorithm (GA), but based on social–psychological principles instead of evolution. It

2 The LM algorithm was used in the beginning of the project, but the results reported ended up using the PSO algorithm. I have included the LM algorithm in this section because speed optimization was measured using this algorithm.


can be summarized by the following steps. Each network was trained multiple times in order to avoid getting stuck in a local minimum.

• Step 1: Initialize particles with random velocities and accelerations.

• Step 2: Determine which particle is closest to the goal.

• Step 3: Adjust accelerations toward that particle.

• Step 4: Update particle positions based on their velocities, and update velocities based on accelerations.

• Step 5: Go to step 2.

The LM algorithm is a combination of the Error Back-Propagation (EBP) method [Rumelhart et al. 1988] and the Gauss–Newton method. This algorithm has the speed advantage of Gauss–Newton and the stability of EBP [Yu and Wilamowski 2011]. A detailed treatment of LM can be found in Moré [1978], and of PSO in Poli et al. [2007].
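The steps above can be sketched as a generic global-best PSO; the inertia and acceleration coefficients below are common textbook values, not necessarily the ones used in the thesis:

```python
import numpy as np

def pso(f, dim, n_particles=30, iters=200, seed=0):
    """Minimal global-best particle swarm optimizer (illustrative sketch,
    not the exact variant used in the thesis)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1, 1, (n_particles, dim))    # positions
    v = np.zeros_like(x)                          # velocities
    pbest = x.copy()                              # personal bests
    pbest_f = np.apply_along_axis(f, 1, x)
    gbest = pbest[pbest_f.argmin()].copy()        # global best
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        # accelerate toward personal and global bests (w=0.7, c1=c2=1.5)
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
        x = x + v
        fx = np.apply_along_axis(f, 1, x)
        better = fx < pbest_f
        pbest[better], pbest_f[better] = x[better], fx[better]
        gbest = pbest[pbest_f.argmin()].copy()
    return gbest, pbest_f.min()

# e.g. minimizing the sphere function in 5 dimensions
best_x, best_f = pso(lambda z: float(np.sum(z * z)), dim=5)
```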

35 Neural Networks

3.5.1 Multilayer Perceptron

The perceptron is built around a nonlinear model of a neuron, the McCulloch–Pitts model of a neuron [McCulloch and Pitts 1943]. It basically consists of a 2-step process: the cell body contains a summation function of the weighted sum of all inputs, including a bias. The perceptron is described by equation 3.11. The sum s is passed through an activation function (see section 3.5.1) which mimics the activation, or firing, of the neuron.

s = \sum_{i=1}^{M} w_i x_i + x_0    (3.11)

w_i is the weight of the "synapse" of input channel i and is the parameter we want to adjust, x_i is the input value, and x_0 is the bias. M denotes the number of inputs we have.


Figure 3.1: The perceptron. Input signals are weighted, summed together with a bias, and passed through the activation function f(s) to produce the output signal.

The structure of the MLP consists of many perceptrons, and it is shown in figure 3.2. The input signal flows from the input layer at the bottom to the output layer at the top. We have a bias signal, seen on the left side of the diagram, which is set to a fixed number.

MLPs are fully connected networks, meaning that each neuron in any layer of the network is connected to all the neurons in the previous layer. Each connection in the network has a weight wij associated with it. These weights are initialized before any training is done, by randomly assigning very small values to each weight in the graph. The architecture used in this study has only one output variable, i.e. the function we approximate produces forecasts of the power generation given a certain input.


Figure 3.2: Architectural graph of the neural network that produces a single output value. It consists of a collection of hidden neurons in each of H hidden layers (tanh activations) and a linear output unit, as well as M input connections (hours, u, v, week, ws, ws-1, ws-2, ws, ws+1, ws+2) plus a bias signal. Each edge in this graph has a weight wij associated with it.

The performance of neural networks is generally improved if the data is normalised, because using the original data directly can cause convergence problems. Normalization is done using the mapminmax function seen in equation 3.12.

y = (y_{max} - y_{min}) \cdot \frac{x - x_{min}}{x_{max} - x_{min}} + y_{min}    (3.12)

y_max is the maximum value of the specified range, which in this case is 1, and y_min is -1. x is the value to be scaled, x_max is the maximum of the values to be scaled, and x_min is their minimum.
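Equation 3.12 translates directly to code; for example:

```python
def mapminmax(x, x_min, x_max, y_min=-1.0, y_max=1.0):
    # equation 3.12: linearly map x from [x_min, x_max] to [y_min, y_max]
    return (y_max - y_min) * (x - x_min) / (x_max - x_min) + y_min

mapminmax(5.0, 0.0, 10.0)  # midpoint of [0, 10] maps to 0.0
```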

The forecasting model consists of 7 different networks, one for each wind farm; these networks are trained on the first section of the dataset, before the test period. A random 60/20/20 split is created, where 60% of the available data is used for training, 20% is used to validate the network, and 20% is set aside as a hold-out set for the hyperparameters. The input features3 fed into the models are

3 See table 3.2.


ws, u, v, hours, ws, ws+1, ws+2, ws-1, ws-2, where ws+x and ws-x denote a time shift of x. We can use ws+x as input into the model because ws is a forecast in itself.

Activation Functions

The computation done for each neuron in the multilayer perceptron requires knowledge of the derivative of the activation function; in other words, the activation function we pick needs to be continuous. In this thesis we use the following two activation functions: the hyperbolic tangent function, seen in equation 3.13, and

f(s) = \tanh(s) = \frac{e^s - e^{-s}}{e^s + e^{-s}}    (3.13)

the linear transfer function, seen in equation 3.14.

f(s) = \begin{cases} +1 & \text{if } s \ge 1 \\ s & \text{if } -1 < s < 1 \\ -1 & \text{if } s \le -1 \end{cases}    (3.14)
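Both transfer functions are straightforward to implement; the piecewise "linear" function of equation 3.14 amounts to clipping s to [-1, 1]:

```python
import numpy as np

def tanh_act(s):
    # hyperbolic tangent, equation 3.13
    return np.tanh(s)

def satlin(s):
    # the piecewise linear transfer function of equation 3.14:
    # identity inside (-1, 1), saturated at +/-1 outside
    return np.clip(s, -1.0, 1.0)
```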

Hyperparameter optimization

In order to obtain the hyperparameters for each wind farm's model, a random hyperparameter search was performed for all models. A hold-out validation set was used to pick the best hyperparameters. Random search has been shown to work better than grid search when not all parameters are equally important [Bergstra and Bengio 2012].
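A random hyperparameter search of this kind can be sketched as follows (the search space and scoring function are illustrative placeholders, not the thesis' actual settings):

```python
import random

def random_search(train_and_score, space, n_trials=50, seed=0):
    """Sample hyperparameters uniformly at random from `space` and keep the
    configuration with the best hold-out score (lower is better)."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("inf")
    for _ in range(n_trials):
        cfg = {name: rng.choice(values) for name, values in space.items()}
        score = train_and_score(cfg)  # train model, return hold-out error
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# hypothetical search space
space = {"hidden_neurons": [10, 15, 20, 25], "learning_rate": [0.1, 0.01, 0.001]}
```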

3.5.2 Numenta Platform for Intelligent Computing

This section describes the key principles introduced with NuPIC. It explains in general terms the theory behind the Online Prediction Framework (OPF)4, which uses the CLA and HTM algorithms; the OPF works as an API to create predictive models. The OPF consists of 5 major types of components: Encoders, Spatial Pooler (SP), Temporal Memory (TM), Temporal Pooler (TP), and Classifiers. Together these components construct a single CLA region5, and figure 3.3 demonstrates the information flow through a single region.

4 The OPF is used with Numenta's commercial product GROK. 5 Currently, models created with the OPF do not use a TP, nor does this client allow creation of a hierarchy of regions, i.e. what we would call an HTM; it is only possible to create one region.


Another central concept in NuPIC is the Sparse Distributed Representation (SDR), which refers to the activation of a small percentage of the neurons at any given time. This neuronal activity is represented as an n-dimensional sparse binary vector x = [b_0, ..., b_n] with around 2% active cells. The outputs from the spatial pooler and temporal memory are both SDRs, while the output from the encoder does not enforce this and is just a normal binary vector. A general overview of the properties of SDRs has been given by Ahmad and Hawkins [2015].

Figure 3.3: Information flow of a single-region predictive model created with the OPF: Encoder → Spatial Pooler → Temporal Memory → Classifier.

The overlap between two different SDRs is defined by

o(x, y) = x \cdot y    (3.15)

A match between two SDRs is defined by

m(x, y) := o(x, y) \ge \theta    (3.16)

where θ is set such that θ ≤ ‖x‖₁ and θ ≤ ‖y‖₁. An interesting property of SDRs, used multiple times inside the temporal memory and especially for predictions, is the fact that a fixed-size set of SDRs can be reliably stored as a single pattern by taking their union. The boolean OR operator is used to create a


Value  Scalar encoding
1      11111000000000
2      01111100000000
10     00000000011111

Table 3.3: Example where n = 14, r = 5, ψ = 1, of various encoded scalar values using a ScalarEncoder.

new vector from a set of SDRs. The downside of storing patterns this way is that the more patterns we store, the bigger the probability of false positives.
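Equations 3.15 and 3.16 and the union property can be illustrated directly on binary vectors:

```python
import numpy as np

def overlap(x, y):
    # o(x, y) = x . y  (equation 3.15): number of shared active bits
    return int(np.dot(x, y))

def match(x, y, theta):
    # m(x, y) = o(x, y) >= theta  (equation 3.16)
    return overlap(x, y) >= theta

def union(sdrs):
    # store a set of SDRs as a single pattern with boolean OR
    out = np.zeros_like(sdrs[0])
    for s in sdrs:
        out = np.logical_or(out, s).astype(int)
    return out

a = np.array([1, 0, 1, 0, 0, 1])
b = np.array([1, 0, 0, 0, 1, 1])
overlap(a, b)                     # -> 2 shared active bits
match(union([a, b]), a, theta=3)  # -> True: a still matches the stored union
```

The denser the union gets, the more likely an unrelated vector also matches it, which is exactly the false-positive trade-off mentioned above.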

Encoders

NuPIC contains many different encoders6; the job of an encoder is to convert raw input into a more suitable representation (i.e. a binary vector). The raw input is fed into the model using a dictionary data structure.

Useful encoders include scalar and categorical encoders, while more exotic encoders include one for Global Positioning System (GPS) coordinates, which can be used to extract information about anomalous movements. The entries of the dictionary representation of the raw input are each encoded separately and concatenated using a multi-encoder.

One property of the spatial pooler is that overlapping input patterns are mapped to the same SDR. This means that we want the encoder to encode input so that similar inputs share bits. The ScalarEncoder fulfils this property by the process illustrated in table 3.3.

We calculate the range of values to encode using equation 3.17. v_min represents the minimum value of the input signal, while v_max denotes its upper bound.

v_{range} = v_{max} - v_{min}    (3.17)

There are three mutually exclusive parameters that determine the overall size of the output from a scalar encoder (n, r, ψ). n directly represents the total number

6 A full list of all encoders can be found in the API documentation for NuPIC.


of bits in the output, and it must be bigger than or equal to w, which represents the number of bits that are set to encode a single value, i.e. the "width" of the output signal7. r and ψ are specified with respect to the input, while w is specified with respect to the output. Two inputs separated by more than the radius have non-overlapping representations; two inputs separated by less than the radius will in general overlap in at least some of their bits. Two inputs separated by greater than or equal to the resolution are guaranteed to have different representations. ψ can be calculated using equation 3.18.

\psi = \frac{r}{w}    (3.18)

Depending on whether we want periodic behaviour or not, n is calculated a bit differently; see the scalar encoder for implementation details8.
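A simplified non-periodic encoder reproducing the example of table 3.3 (the real NuPIC ScalarEncoder also handles periodic ranges and parameter validation; the value range below is an assumption chosen to match the table) could look like:

```python
import numpy as np

def scalar_encode(v, v_min=1.0, v_max=10.0, w=5, resolution=1.0):
    """Simplified non-periodic scalar encoder: a contiguous block of w bits
    whose position encodes the value (cf. table 3.3, where n = 14)."""
    n = int((v_max - v_min) / resolution) + w   # total number of output bits
    i = int((min(max(v, v_min), v_max) - v_min) / resolution)  # block start
    bits = np.zeros(n, dtype=int)
    bits[i:i + w] = 1
    return bits

"".join(map(str, scalar_encode(1)))  # -> '11111000000000'
```

Note how encodings of nearby values (1 and 2) share four of their five set bits, while 1 and 10 share none, which is the similarity-preserving property described above.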

Spatial Pooler

The spatial pooler receives a binary vector as its input from an encoder and outputs an SDR. The structure of the spatial pooler consists of an input space and a set of columns; the output SDR represents which of the columns in a region are active. Each column has synapses connected to the input space: around 50% of the input is randomly, potentially connected to each column, the so-called "potential pool". Each synapse connects and disconnects with the input space during learning.

The general information flow of the spatial pooler consists of the following steps: input stimuli from lower regions lead to activation of the input space to which each mini-column is synaptically connected. This activation, the so-called "overlap score", is set for each column; the score is calculated as the total sum over the neurons that try to influence that column, weighted with a "boosting factor" that increases the chance for certain columns to win more easily. A final top percentage (usually around 2% activity) of the columns with the highest influence (biggest overlap score) is chosen; columns also inhibit nearby columns, and the result of this process is a binary vector with few active bits: an SDR. This is illustrated in equation 3.19, where b can either be 0 or 1 and s is a value representing the score.

[7] w must be odd to avoid centering problems.
[8] https://github.com/numenta/nupic


CHAPTER 3 METHOD AND MATERIALS

\[
\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \cdots & b_n \end{bmatrix}}_{\text{Input vector}}
\cdot
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \cdots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \cdots & b_{2n} \\
\vdots & & & & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \cdots & b_{dn}
\end{bmatrix}}_{\text{Connected synapses for each column}}
=
\underbrace{\begin{bmatrix} s_1 & s_2 & s_3 & \cdots & s_n \end{bmatrix}}_{\text{Overlap score}}
\xrightarrow{\text{Inhibition}}
\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \cdots & b_n \end{bmatrix}}_{\text{Output SDR}}
\tag{3.19}
\]

Learning in this structure is done by adjusting the permanences of the proximal dendrite to better match the input: for each winning column, increase the permanence of the synapses that correctly matched the input and decrease the permanence of the rest. We also increase the boosting factor of losing columns to give them a bigger chance of winning next time.
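As a rough illustration of the steps above (overlap scoring, boosting, top-2% global inhibition and permanence learning), consider this minimal sketch. The array shapes, thresholds and the global-inhibition shortcut are assumptions for the example, not Numenta's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

n_input, n_columns = 100, 50
sparsity = 0.02                       # keep roughly the top 2% of columns
perm = rng.uniform(0.0, 1.0, size=(n_columns, n_input))  # synapse permanences
connected_thresh = 0.5                # cf. synPermConnected in appendix A
boost = np.ones(n_columns)            # boosting factor per column

def spatial_pooler_step(x, learn=True, inc=0.1, dec=0.01):
    """One step of a simplified spatial pooler (a sketch, not Numenta's code)."""
    connected = (perm >= connected_thresh).astype(float)  # binary synapse matrix
    overlap = boost * (connected @ x)                     # overlap score per column
    k = max(1, int(sparsity * n_columns))
    winners = np.argsort(overlap)[-k:]                    # global inhibition: top-k
    sdr = np.zeros(n_columns, dtype=int)
    sdr[winners] = 1
    if learn:
        for c in winners:                                 # reinforce matching synapses
            perm[c] += np.where(x > 0, inc, -dec)
            np.clip(perm[c], 0.0, 1.0, out=perm[c])
    return sdr

x = (rng.random(n_input) < 0.1).astype(float)             # a sparse binary input
sdr = spatial_pooler_step(x)
```

The output is the sparse column-activity vector that the temporal memory consumes in the next stage.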

Temporal Memory

The temporal memory receives an SDR from the spatial pooler, which represents the active columns; the output of the temporal memory is the activity of the whole CLA region. The general idea of the TM[9] is that the cells in each mini-column provide temporal context to the pattern identified by the spatial pooler. The TM's job is to form transitions between different SDRs, and it achieves this by allowing active cells to form connections to previously active cells, so that in a future setting each cell is able to predict its own activity.

Each cell in a region has several distal segments, ideally one for each pattern the cell has transitioned from; in practice these segments are connected to a couple of patterns each. A cell enters a predictive state if there is enough activity on a segment, and every segment consists of a collection of connections, synapses, that have been formed to a subset of previously active cells (typically around 10-15 cells).

The temporal memory receives a sparse binary vector from the spatial pooler, and the general information flow for this part of the CLA goes through two phases. The first phase has the following steps: 1) for each active column, check whether any cell is in a predictive state; if so, 2) activate that particular cell; if no cell was found in a predictive state, 3) activate all cells in that particular column, a process called bursting.

[9] There are a lot of details in how the temporal memory is implemented; pseudo-code and more details can be found in [Numenta, 2011]. The nupic git repository is the best source for finer details: https://github.com/numenta/nupic

Bursting is analogous to saying: we do not know which cell to activate, because we have not seen this instance of the sequence of patterns before and are unable to put the column into the correct temporal context, so we activate all cells to reflect this uncertainty. Bursting does not occur when the temporal context has been predicted by the CLA. Equation 3.20 shows the first phase.

\[
\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \cdots & b_n \end{bmatrix}}_{\text{SP SDR}}
\cdot
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \cdots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \cdots & b_{2n} \\
\vdots & & & & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \cdots & b_{dn}
\end{bmatrix}}_{\text{Predictive state}}
=
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \cdots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \cdots & b_{2n} \\
\vdots & & & & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \cdots & b_{dn}
\end{bmatrix}}_{\text{Active state}}
\tag{3.20}
\]

The second phase of the algorithm figures out which cells should be put into a predictive state for the next time step. This is achieved by checking every distal segment on every cell for activity above a certain threshold, i.e. checking for active segments. One active segment is enough to put a cell into a predictive state. The output from the temporal memory is a vector representing the state of all cells in that region. Equation 3.21 shows phase 2.

\[
\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \cdots & b_n \end{bmatrix}}_{\text{Active state}}
\cdot
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \cdots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \cdots & b_{2n} \\
\vdots & & & & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \cdots & b_{dn}
\end{bmatrix}_X}_{\text{Segment } X}
=
\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \cdots & b_n \end{bmatrix}_X}_{\text{Segment activation } X}
> \tau
\;\rightarrow\;
\underbrace{\begin{bmatrix} s_1 & s_2 & s_3 & \cdots & s_n \end{bmatrix}}_{\text{Predictive state}}
\tag{3.21}
\]

If learning is turned on, the permanences of the synapses connected to active distal segments are updated (in the same way as in the spatial pooler). These changes are marked as temporary until we are sure that the cell is in fact able to correctly predict something with them; after that, the changes either become permanent or are removed. Temporarily marked changes are made permanent whenever a cell goes from inactive to active through feed-forward input (we update the permanences, since the cell correctly predicted the feed-forward activation). If a cell instead went from active to inactive, the change is undone.
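The two phases and the bursting behaviour described above can be condensed into a small sketch. Using a single segment matrix per cell and a fixed threshold τ are simplifying assumptions; a real temporal memory grows many segments per cell.

```python
import numpy as np

rng = np.random.default_rng(1)
n_columns, cells_per_col = 50, 4
n_cells = n_columns * cells_per_col
# one distal "segment" per cell for brevity: row i holds cell i's synapses
segments = (rng.random((n_cells, n_cells)) < 0.02).astype(int)
tau = 2                                   # segment activation threshold

def tm_step(active_columns, prev_predictive):
    """One simplified temporal-memory step (phases 1 and 2); a sketch only."""
    active = np.zeros(n_cells, dtype=int)
    # Phase 1: activate predicted cells, or burst the whole column
    for col in np.flatnonzero(active_columns):
        cells = np.arange(col * cells_per_col, (col + 1) * cells_per_col)
        predicted = cells[prev_predictive[cells] == 1]
        if predicted.size > 0:
            active[predicted] = 1         # temporal context was predicted
        else:
            active[cells] = 1             # bursting: unknown context
    # Phase 2: a cell with enough activity on a segment becomes predictive
    segment_activity = segments @ active
    predictive = (segment_activity >= tau).astype(int)
    return active, predictive

cols = np.zeros(n_columns, dtype=int)
cols[:3] = 1                              # three active columns from the SP
active, predictive = tm_step(cols, np.zeros(n_cells, dtype=int))
```

On this first step no cell is predictive, so all three columns burst; feeding the returned `predictive` vector back into the next `tm_step` call is what gives the region its temporal context.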


NuPIC Classifiers

There are many different classifiers included with NuPIC, such as a k-Nearest Neighbour (kNN) classifier and a Support Vector Machine (SVM); the OPF in particular uses a custom-built classifier called the "CLAClassifier". The purpose of this classifier is to decode predictions made by the CLA. Classification with the CLAClassifier is performed in different ways depending on how the input has been encoded.

Scalar values are decoded using a process where each cell in a CLA region is paired with two histograms. One histogram keeps track of the frequency of encountered patterns associated with each cell; the other keeps track of a moving average for each bucket. This process is illustrated in figure 3.4.
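A toy version of this decoding idea might look as follows: each active bit of the SDR votes, via a learned moving-average histogram, for the bucket (discretized target value) that tended to follow it. The class name and update rule are illustrative simplifications, not the actual CLAClassifier.

```python
from collections import defaultdict

class SimpleCLAClassifier:
    """Sketch of the CLAClassifier idea (not Numenta's implementation)."""

    def __init__(self, n_buckets, alpha=0.1):
        self.alpha = alpha                 # moving-average learning rate
        self.n_buckets = n_buckets
        # one histogram over buckets per SDR bit
        self.hist = defaultdict(lambda: [0.0] * n_buckets)

    def learn(self, active_bits, actual_bucket):
        for bit in active_bits:
            h = self.hist[bit]
            for b in range(self.n_buckets):           # moving-average update
                target = 1.0 if b == actual_bucket else 0.0
                h[b] += self.alpha * (target - h[b])

    def infer(self, active_bits):
        votes = [0.0] * self.n_buckets
        for bit in active_bits:                       # sum the bits' votes
            for b, v in enumerate(self.hist[bit]):
                votes[b] += v
        return max(range(self.n_buckets), key=votes.__getitem__)

clf = SimpleCLAClassifier(n_buckets=4)
for _ in range(20):
    clf.learn({1, 5, 9}, actual_bucket=2)   # this SDR always precedes bucket 2
prediction = clf.infer({1, 5, 9})           # -> 2
```

The bucket index is finally mapped back to a scalar between the min and max values of the encoder's range.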

[Figure 3.4: The CLAClassifier. Each cell in the SDR columns (column 1 … column N) is paired with two histograms over the value range from min value to max value: one for the likelihood and one for a moving average, i.e. two histograms per cell.]

Training in NuPIC

Training the NuPIC model is done using online learning algorithms. The schema seen in figure 3.5 is used to train and test these models. We have 7 different wind farms, so this schema is repeated for each wind farm.

This schema is slightly adjusted from the traditional way the OPF handles multi-step predictions. The default way is to use 48 different models in the CLAClassifier, one for every step ahead, which would give us 48 · 7 = 336 models in total (for all the wind farms). Not only does this change match Expektra's approach, but it also reduces the memory requirements for the CLAClassifier.

[Figure 3.5: Training an OPF model. Hyperparameter setup: a pre-training data chunk is used for PSO swarming or a manual setup of the OPF model. Training phase: the training data stream is fed to the OPF model with online learning activated, producing predictions. Testing phase: the testing data is fed to the model with online learning deactivated, producing multistep predictions.]

Input and hyperparameter selection

Finding hyperparameters for an OPF model was done partly using the custom built-in PSO algorithm and partly by manual configuration, mainly by ensuring that the PSO algorithm did not remove encoders. A single CLA region contains a lot of hyperparameters; these parameters are included with corresponding descriptions in appendix A. Inputs to the model are date, ws, wp, u, and v.


Chapter 4

Result

This chapter contains the primary results obtained in this study. Figures 4.4-4.10 contain different error measurements on the test set for each wind farm. We see that Expektra's ANN model performs well over all wind farms, while the NuPIC model performs worse than Expektra but in general still better than our reference model. NuPIC is off target, with a bias error on most wind farms. The graphs also indicate that NuPIC is unable to pick up some trends in the cumulated ε² graph. The Expektra model, on the other hand, shows no clear problems in cumulated ε² and is on target on all wind farms; appendix C has been included to reflect this for different lead times.
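The error measures plotted in these figures can be computed as follows. This is a sketch; normalizing the errors by the installed capacity, per the convention of Madsen et al. [2005], is an assumption here.

```python
import numpy as np

def error_measures(y_true, y_pred, capacity=1.0):
    """NBIAS, NMAE, NRMSE and cumulated squared error, with errors
    normalized by the installed capacity (assumed convention)."""
    eps = (y_pred - y_true) / capacity
    return {
        "NBIAS": float(np.mean(eps)),                 # systematic over/under-prediction
        "NMAE": float(np.mean(np.abs(eps))),          # average absolute error
        "NRMSE": float(np.sqrt(np.mean(eps ** 2))),   # penalizes large errors
        "cumulated_eps2": np.cumsum(eps ** 2),        # plotted over time in the figures
    }

y_true = np.array([0.2, 0.4, 0.6])
y_pred = np.array([0.3, 0.4, 0.4])
m = error_measures(y_true, y_pred)
```

A flat stretch in the cumulated ε² curve means the model tracked the signal well over that period, while a steep stretch marks a trend it missed.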

[Figure 4.1: Left diagram: cumulative probability of wind speed (m/s). Right diagram: scatter diagram of the power curve, normalized power output vs wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).]


Figure 4.2: Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).

Figure 4.3: Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


[Figure 4.4: Different error measurements for WF 1: NBIAS, NRMSE and NMAE vs look-ahead time k (in hours), and cumulated ε² vs time (in hours), for the Expektra, NuPIC and Persistence models.]


[Figure 4.5: Different error measurements for WF 2: NBIAS, NRMSE and NMAE vs look-ahead time k (in hours), and cumulated ε² vs time (in hours), for the Expektra, NuPIC and Persistence models.]


[Figure 4.6: Different error measurements for WF 3: NBIAS, NRMSE and NMAE vs look-ahead time k (in hours), and cumulated ε² vs time (in hours), for the Expektra, NuPIC and Persistence models.]


[Figure 4.7: Different error measurements for WF 4: NBIAS, NRMSE and NMAE vs look-ahead time k (in hours), and cumulated ε² vs time (in hours), for the Expektra, NuPIC and Persistence models.]


[Figure 4.8: Different error measurements for WF 5: NBIAS, NRMSE and NMAE vs look-ahead time k (in hours), and cumulated ε² vs time (in hours), for the Expektra, NuPIC and Persistence models.]


[Figure 4.9: Different error measurements for WF 6: NBIAS, NRMSE and NMAE vs look-ahead time k (in hours), and cumulated ε² vs time (in hours), for the Expektra, NuPIC and Persistence models.]


[Figure 4.10: Different error measurements for WF 7: NBIAS, NRMSE and NMAE vs look-ahead time k (in hours), and cumulated ε² vs time (in hours), for the Expektra, NuPIC and Persistence models.]


                        Wind Farm
User           1      2      3      4      5      6      7      All
Leustagos      0.145  0.138  0.168  0.144  0.158  0.133  0.140  0.146
DuckTile       0.143  0.145  0.172  0.145  0.165  0.137  0.146  0.148
MZ             0.141  0.151  0.174  0.145  0.167  0.141  0.145  0.149
Propeller      0.144  0.153  0.177  0.147  0.175  0.141  0.147  0.152
Duehee Lee     0.157  0.144  0.176  0.160  0.169  0.154  0.148  0.155
Expektra       0.165  0.158  0.184  0.164  0.179  0.153  0.153  0.165
MTU EE5260     0.161  0.172  0.193  0.162  0.192  0.156  0.160  0.168
SunWind        0.174  0.177  0.193  0.176  0.179  0.157  0.162  0.172
ymzsmsd        0.163  0.186  0.200  0.164  0.192  0.162  0.167  0.174
4138 Kalchas   0.180  0.179  0.197  0.175  0.200  0.160  0.165  0.177
NuPIC          0.243  0.254  0.264  0.310  0.290  0.224  0.240  0.264

Persistence    0.302  0.338  0.373  0.364  0.388  0.341  0.361  0.355

Table 4.1: NRMSE scores of the entries published in [Hong et al., 2014]. The NuPIC model and the Expektra model are added so the results can easily be compared.

4.1 Experimental results

Looking at the primary results presented in this study, we see in table 4.1 that Expektra's ANN model achieves performance comparable to the other methods published with Hong et al. [2014]. This can be used as a starting point for further investigations. A similar approach to Expektra's is Duehee Lee's [Lee and Baldick, 2014], as both methods are based on the same neural network architecture but with different optimization techniques. Duehee Lee's network is structured slightly differently: Expektra's model has only one output node, whereas Duehee Lee uses five (representing five predictions ahead). Besides these differences, Duehee Lee has also included additional sub-models to create a prediction ensemble, blending the output of the ANN with a Gaussian Process (GP) to improve very short-term predictions. MTU EE5260 is also a neural network approach that converts meteorological forecasts to power, and here the score is very close. The HTM CLA beats the persistence model but needs more work to outperform the other models in GEFCom.


4.2 Input Importance

For interpretation purposes, in order to understand the model better, an analysis of the relative input parameter importance was performed. This analysis is illustrated in figure 4.11. Each box represents added noise to that channel; a higher NRMSE score means that the feature was important. A reference point, "all-channels", represents the error distribution of the model with no input replacement. We clearly see in this figure that the wind speed channel ws is the most important attribute: adding noise to this channel greatly affects the NRMSE score. We also see that the wind components u, v show little to no influence, and the timestamp-related inputs hours and week both indicate that there are seasonal and daily trends present in the dataset.
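The perturbation analysis can be sketched generically: each channel is replaced with uniform noise over its observed range and the resulting error is compared with the all-channels reference. The function name and the exact noise model are assumptions; the thesis follows the HIPR approach of Kemp et al. [2007].

```python
import numpy as np

def input_importance(model, X, y, metric, n_repeats=30, seed=0):
    """Noise-perturbation input importance (a sketch of the HIPR idea).

    model  -- any callable mapping an input matrix X to predictions
    metric -- error function, e.g. NRMSE; a higher score after perturbing
              a channel means that channel mattered more
    """
    rng = np.random.default_rng(seed)
    baseline = metric(y, model(X))            # the "all-channels" reference point
    scores = {}
    for j in range(X.shape[1]):
        errs = []
        lo, hi = X[:, j].min(), X[:, j].max()
        for _ in range(n_repeats):
            Xn = X.copy()
            Xn[:, j] = rng.uniform(lo, hi, size=X.shape[0])  # noise replacement
            errs.append(metric(y, model(Xn)))
        scores[j] = float(np.mean(errs))      # error distribution per channel
    return baseline, scores

# toy check with a "model" that only uses column 0
demo_X = np.random.default_rng(1).random((200, 2))
demo_y = demo_X[:, 0].copy()
base, scores = input_importance(lambda Z: Z[:, 0], demo_X, demo_y,
                                lambda t, p: float(np.sqrt(np.mean((t - p) ** 2))))
```

In the toy check, perturbing the used channel inflates the error while perturbing the ignored one leaves it unchanged, which is exactly the signal read off figure 4.11.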

[Figure 4.11: Relative input parameter importance using HIPR. The perturbed channels are hours, u, v, week, ws and the time-shifted wind speeds ws−3 … ws+3; the y-axis shows the resulting NRMSE. "all-channels" reflects the reference point: the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind.]


4.2.1 Adaptation and Optimization

All training for Expektra's model was performed on a MacBook Pro with an Intel Core 2 Duo processor (P8600, 2.40 GHz) on a 64-bit operating system with 4 GB of installed RAM. After adapting the code to run experiments on the GEFCom dataset, an investigation was performed to evaluate the performance of the core source code provided by Expektra. This investigation identified some slow parts of the code. After fixing these issues, mainly by using a Math.NET native library instead of the managed provider, a speed test was performed comparing the unoptimized version with the optimized one. Figure 4.12 shows the speed-up achieved during training with the LM algorithm.

[Figure 4.12: Training time for the unoptimized ("Normal") version vs the optimized one when training a network with 10 input neurons; 10, 15, 20 or 25 hidden neurons; and 1 output neuron. Training was done for 100 epochs using LM.]

4.3 Summary

A plot showing the average NRMSE improvement over the persistence model can be seen in figure 4.13. Both models are able to perform better than persistence. Expektra's model performs better than the NuPIC model over the whole forecast horizon, and NuPIC performs better than persistence towards the end of the forecast.
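The improvement plotted here is assumed to follow the usual convention: the relative NRMSE reduction against persistence at each look-ahead time.

```python
def improvement_over_persistence(nrmse_model, nrmse_reference):
    """Percentage improvement in NRMSE over the persistence reference,
    per look-ahead time (the convention assumed for this plot)."""
    return [100.0 * (ref - m) / ref
            for m, ref in zip(nrmse_model, nrmse_reference)]

# Using the overall ("All") NRMSE scores from table 4.1:
imp = improvement_over_persistence([0.165, 0.264], [0.355, 0.355])
# Expektra ~53.5%, NuPIC ~25.6%
```

A value of 0% means no better than persistence; 100% would mean a perfect forecast.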

[Figure 4.13: Summarized average improvement (%) in NRMSE over persistence, for look-ahead times 0-48 hours, over all wind farms, with 95% confidence intervals, for the Expektra and NuPIC models.]


Chapter 5

Discussion

With the exponential growth of computer power it becomes easier to study deeper and more complex networks, and with recent advances in the area of deep learning a new interest in ANNs has resurged. In practice, ANNs have been used by most groups in the field of WPF, but these networks never caught on, as it was argued that the improvements made by ANNs were usually not enough to justify the extra effort of training them [Giebel et al., 2011]. This is steadily becoming less of an issue as computation becomes cheaper and bigger networks perform better.

By using a simple MLP network, we observe that we are able to obtain results similar in performance to other models published in Hong et al. [2014]. This can be used as a starting point for further investigations. The GEFCom dataset has a limited number of input features; the power of neural networks could be studied more in depth if a wider range of features were available.

Using persistence as a reference model was done in order to have the same baseline as the one used in GEFCom, but a better reference model should be considered in the future, such as the one presented in Nielsen et al. [1998], especially if longer forecasts are to be considered.
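The persistence baseline, and the improved reference of Nielsen et al. [1998] mentioned above, can be sketched as follows. Estimating the weight a_k as the lag-k autocorrelation of the series is an assumption here; the original paper's estimation procedure may differ in detail.

```python
import numpy as np

def persistence_forecast(y_t, k):
    """Persistence: the forecast for any horizon k is the last observed value."""
    return y_t

def new_reference_forecast(series, k):
    """Sketch of the improved reference model of Nielsen et al. [1998]:
    a weighted average of persistence and the series mean, with the
    lag-k autocorrelation a_k as the weight."""
    y = np.asarray(series, dtype=float)
    mean = y.mean()
    a_k = np.corrcoef(y[:-k], y[k:])[0, 1]   # lag-k correlation coefficient
    return a_k * y[-1] + (1.0 - a_k) * mean
```

For short horizons a_k is close to 1 and the model behaves like persistence; for long horizons a_k decays and the forecast relaxes towards the mean, which is why persistence alone is a weak baseline at long lead times.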

5.1 Method development issues

Working with Expektra's model is very straightforward and no major issues were encountered during development, but some issues were encountered working with NuPIC:[1]

[1] It should also be pointed out that it helps to have the people who developed the code on the spot, and this is probably the main reason for encountering so little trouble.


1. Installing NuPIC is difficult. This seems to be a general problem, judging by posts on the NuPIC mailing list.[2] A very important issue to fix if they want more people to work with this model.

2. The built-in swarming (PSO) did not work well, especially for slightly larger datasets and for 48 prediction steps. The main problems were fitting everything into memory and that the code ran too slowly when swarming over multiple steps ahead, making it practically impossible to use. Scaling down the problem to just a few prediction steps and multiple models helped, but the swarm model kept dismissing wind speed as an important feature, which was fixed by specifically telling it not to touch that encoder.

3. Inconsistencies between the white-paper documentation of HTM/CLA and the actual implementation make it hard to understand the underlying principles of NuPIC. NuPIC implements a very complex network, and any inconsistency in the documentation makes it harder to understand. The code is quite well documented, but the additional material is a bit sparse.

These issues are all understandable and are continuously being improved upon. NuPIC is still a young platform, so issues are to be expected given the complexity and research nature of the project. Training the CLAClassifier to use many-step predictions instead of just one would most likely reduce the bias error seen in the results; to investigate this, a more powerful computer is needed.

There are some differences in the inputs between the models. This setup was created because an OPF model needs to encode the value it is going to predict. SCADA data is available, so this is not an issue, but it may introduce some unfairness between NuPIC and Expektra. In the current setup, NuPIC does not seem to gain anything extra from the additional information, and other models in the competition most likely used different inputs as well.

In general, NuPIC differs in the way data is fed: only the front of the signal is sent in, and temporal context is achieved through the temporal memory.

5.2 Future improvements and directions

One issue with the GEFCom dataset is that it does not contain any specific information about the locations of the 7 wind farms. It is worth considering how to handle different models that are specialized for different types of terrain, as this has been shown to increase performance [Kariniotakis et al., 2006]. Another thing to investigate could be a more advanced scheme for hyperparameter selection: Jursa and Rohrig [2008] show that the WPF error can be reduced by smarter use of optimization algorithms for feature selection and hyperparameter selection, which should improve performance.

[2] http://numenta.org/lists

In general, more types of inputs from sources like SCADA and NWP systems could give a lot of valuable information for better predictions. It is clear that wind speed is a key feature for producing good forecasts, which is also supported by the input analysis in section 4.2. It has been argued that the error in WPF models stems largely from wrong weather forecasts [Giebel et al., 2011], so identifying good sources of weather forecasts would be desirable for more effective model learning. Combining several different weather forecasts could also be investigated, as Nielsen et al. [2007] showed that power forecasts based on a number of different meteorological forecasts were better than those from a single source.

Regarding HTM CLA, it is worth considering implementing a custom encoder, one specifically targeted at data concerning wind farms. SCADA data could also be streamed directly into a NuPIC model, which would allow for some interesting online predictions and anomaly detection.

Training times are always an issue, especially for very large models. Both NuPIC and Expektra's model would benefit from being implemented in a more parallel fashion, as these algorithms are very well suited for parallel computing. The speed-up would be of great value, so implementing a GPU version of the algorithms could be worthwhile.

5.3 Conclusions

The field of energy forecasting is still young, and the necessity for good forecasting techniques is constantly growing. This initial study shows that part of Expektra's forecasting method can be used to forecast wind power generation. We observe that this ANN is able to achieve performance comparable to other methods published in Hong et al. [2014]. Additional work is still needed to obtain state-of-the-art results. Numenta's model is able to beat the reference model but still needs additional work before we can draw definite conclusions about its performance. Constructing a custom encoder could be a first step in that direction.


Bibliography

Subutai Ahmad and Jeff Hawkins. Properties of sparse distributed representations and their application to hierarchical temporal memory. arXiv preprint arXiv:1503.07469, 2015.

T.C. Akinci. Short term wind speed forecasting with ANN in Batman, Turkey. Elektronika ir Elektrotechnika, 107(1):41–45, 2015.

E. Michael Azoff. Neural network time series forecasting of financial markets. John Wiley & Sons, Inc., 1994.

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13(1):281–305, 2012.

Daniel P. Buxhoeveden and Manuel F. Casanova. The minicolumn hypothesis in neuroscience. Brain, 125(5):935–951, 2002.

Erasmo Cadenas and Wilfrido Rivera. Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks. Renewable Energy, 34(1):274–278, 2009.

Dmitri B. Chklovskii, B.W. Mel, and K. Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782–788, 2004.

A. Costa, A. Crespo, J. Navarro, G. Lizcano, H. Madsen, and E. Feitosa. A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews, 12(6):1725–1744, August 2008. ISSN 1364-0321. doi: 10.1016/j.rser.2007.01.015. URL http://dx.doi.org/10.1016/j.rser.2007.01.015.

Russ C. Eberhart and James Kennedy. A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science, volume 1, pages 39–43, New York, NY, 1995.


Shu Fan, James R. Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. Energy Conversion, IEEE Transactions on, 24(2):474–482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento – a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276. Citeseer, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition EWEC 2008, 6 pages. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving hierarchical temporal memory-based trading models. Springer, 2013.

M.A. Gaertner, C. Gallardo, C. Tejeda, N. Martínez, S. Calabria, N. Martínez, and B. Fernández. The Casandra project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. In Proc. of the 2001 European Wind Energy Conference, EWEC'01, Copenhagen, Denmark, pages 777–780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: a literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G. Lobo. Sipreólico – wind power prediction tool for the Spanish peninsular power system. Proceedings of the CIGRÉ 40th General Session and Exhibition, París, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357–363, 2014.

Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41–46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585–592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694–709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference, EWEC'07, 2007.

G. Kariniotakis. Position paper on Joule project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J. Halliday, R. Brownsword, Ignacio Marti, Ana Maria Palomares, I. Cruz, H. Madsen, T.S. Nielsen, Henrik Aa. Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power – overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, 10 pages, 2006.

G.N. Kariniotakis, G.S. Stavrakakis, and E.F. Nogaret. Wind power forecasting using advanced neural networks models. Energy Conversion, IEEE Transactions on, 11(4):762–767, 1996.

Stanley J. Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326–334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275–293, 2009.

Lars Landberg and Simon J. Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171–195, 1994.

S. Lang, C. Mohrlen, J. Jorgensen, B. Ó Gallachóir, and E. McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International Solar Energy Society UK Section Conference, volume 85, page 89, 2006.


S.M. Lawan, W.A.W.Z. Abidin, W.Y. Chai, A. Baharun, and T. Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760–1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. Smart Grid, IEEE Transactions on, 5(1):501–510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets, 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164–168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313–2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154–3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa. Nielsen, and Torben S. Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475–489, 2005.

Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431–441, 1963.

Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons. 1969.

Marvin Lee Minsky and Oliver G. Selfridge. Learning in random nets. MIT Lincoln Laboratory, 1960.

Jorge J. Moré. The Levenberg-Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105–116. Springer, 1978.

Henrik Aa. Nielsen, Torben S. Nielsen, Henrik Madsen, Maria J. Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471–482, 2007.

Torben Skov Nielsen Alfred Joensen Henrik Madsen Lars Landberg and GregorGiebel A new reference for wind power forecasting Wind energy 1(1)29ndash341998

Torben Skov Nielsen Henrik Madsen Henrik Aalborg Nielsen Gregor Giebel andLars Landberg Prediction of regional wind power 2002

Harri Niska Teri Hiltunen Ari Karppinen Juhani Ruuskanen and MikkoKolehmainen Evolving the neural network model for forecasting air pollutiontime series Engineering Applications of Artificial Intelligence 17(2)159ndash1672004

Numenta Hierarchical temporal memory including htm cortical learning algorithmsv021 Technical report Numenta 2011

Riccardo Poli James Kennedy and Tim Blackwell Particle swarm optimizationSwarm intelligence 1(1)33ndash57 2007

A Rodrigues JA Peccedilas Lopes P Miranda L Palma C Monteiro R Bessa J SousaC Rodrigues and J Matos Eprevndasha wind power forecasting tool for portugal InProceedings of the European Wind Energy Conference EWEC volume 7 2007

David E Rumelhart Geoffrey E Hinton and Ronald J Williams Learning representa-tions by back-propagating errors Cognitive modeling 53 1988

Juumlrgen Schmidhuber Deep learning in neural networks An overview NeuralNetworks 6185ndash117 2015

S Sinkevicius R Simutis and V Raudonis Monitoring of humans traffic usinghierarchical temporal memory algorithms Elektronika ir Elektrotechnika 115(9)91ndash96 2011

Ke-Sheng Wang Vishal S Sharma and Zhen-You Zhang Scada data based conditionmonitoring of wind turbines Advances in Manufacturing 2(1)61ndash69 2014

WWEA 2014 half-year report wwea pp 1ndash8 Technical report 2014

Wenxian Yang Richard Court and Jiesheng Jiang Wind turbine condition moni-toring by the approach of scada data analysis Renewable Energy 53365ndash3762013

Hao Yu and Bogdan M Wilamowski Levenberg-marquardt training IndustrialElectronics Handbook 512ndash1 2011


Appendix A

Hyper-parameters

Parameters for the spatial pooler

Alias                Default  Description
columnCount          -        The number of cell columns in a cortical region.
globalInhibition     false    If true, during the inhibition phase the winning columns are selected as the most active columns from the region as a whole.
numActivePerInhArea  10       The maximum number of active columns per inhibition area.
synPermActiveInc     0.1      The amount by which an active synapse is incremented in each round, specified as a percent of a fully grown synapse.
synPermConnected     0.10     Controls the threshold at which synapses are connected.
synPermInactiveDec   0.01     The amount by which an inactive synapse is decremented in each round.
potentialRadius      16       Determines the extent of the input that each column can potentially be connected to.

Table A.1: Configuration parameters for the spatial pooler.


APPENDIX A HYPER-PARAMETERS

Parameters for the scalar encoder

Alias       Symbol  Description
w           w       Number of bits to set in the output.
minval      v_min   The lower bound of the input value.
maxval      v_max   The upper bound of the input value.
n           n       Number of bits in the representation (n must be > w).
radius      r       Inputs separated by more than or equal to this distance will have non-overlapping representations.
resolution  ψ       Inputs separated by more than or equal to this distance will have different representations.

Table A.2: Configuration parameters for the scalar encoder.


Parameters for the temporal memory

Alias                  Default  Description
activationThreshold    12       Activation threshold for segments.
cellsPerColumn         32       Number of cells per column.
columnCount            2048     The number of cell columns in a cortical region.
globalDecay            0.10     Decrements all synapses a little bit all the time.
initialPerm            0.11     Initial permanence value for a synapse.
inputWidth             -        Size of the input.
maxAge                 100000   Controls global decay. Global decay will only decay segments that have not been activated for maxAge iterations, and will only run the global decay loop every maxAge iterations.
maxSegmentsPerCell     -        The maximum number of segments a cell can have.
maxSynapsesPerSegment  -        The maximum number of synapses a segment can have.
minThreshold           8        The minimum required activity for a segment to learn.
newSynapseCount        15       The maximum number of synapses added to a segment during learning.
permanenceDec          0.10     How much permanence is removed from synapses when learning occurs.
permanenceInc          0.10     How much permanence is added to synapses when learning occurs.
temporalImp            cpp/py   Controls which temporal memory implementation to use.

Table A.3: Configuration parameters for the temporal memory.


Appendix B

Wind characteristics

Figure B.1: Wind characteristics for WF 1 and WF 2. The GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


APPENDIX B WIND CHARACTERISTICS

Figure B.2: Wind characteristics for WF 3-7. The GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Appendix C

Error Distribution


APPENDIX C ERROR DISTRIBUTION

[Six histogram panels: error at lead times 48, 40, 30, 20, 10 and 1 against frequency (0-70), wf1 using nupic.]

Figure C.1: Error distribution for different lead times, WF 1.

[Six histogram panels: error at lead times 48, 40, 30, 20, 10 and 1 against frequency (0-70), wf2 using nupic.]

Figure C.2: Error distribution for different lead times, WF 2.

[Six histogram panels: error at lead times 48, 40, 30, 20, 10 and 1 against frequency (0-70), wf3 using nupic.]

Figure C.3: Error distribution for different lead times, WF 3.

[Six histogram panels: error at lead times 48, 40, 30, 20, 10 and 1 against frequency (0-70), wf4 using nupic.]

Figure C.4: Error distribution for different lead times, WF 4.

[Six histogram panels: error at lead times 48, 40, 30, 20, 10 and 1 against frequency (0-70), wf5 using nupic.]

Figure C.5: Error distribution for different lead times, WF 5.

[Six histogram panels: error at lead times 48, 40, 30, 20, 10 and 1 against frequency (0-70), wf6 using nupic.]

Figure C.6: Error distribution for different lead times, WF 6.

[Six histogram panels: error at lead times 48, 40, 30, 20, 10 and 1 against frequency (0-70), wf7 using nupic.]

Figure C.7: Error distribution for different lead times, WF 7.

[Six histogram panels: error at lead times 48, 40, 30, 20, 10 and 1 against frequency (0-70), wf1 using expektra.]

Figure C.8: Error distribution for different lead times, WF 1.

[Six histogram panels: error at lead times 48, 40, 30, 20, 10 and 1 against frequency (0-70), wf2 using expektra.]

Figure C.9: Error distribution for different lead times, WF 2.

[Six histogram panels: error at lead times 48, 40, 30, 20, 10 and 1 against frequency (0-70), wf3 using expektra.]

Figure C.10: Error distribution for different lead times, WF 3.

[Six histogram panels: error at lead times 48, 40, 30, 20, 10 and 1 against frequency (0-70), wf4 using expektra.]

Figure C.11: Error distribution for different lead times, WF 4.

[Six histogram panels: error at lead times 48, 40, 30, 20, 10 and 1 against frequency (0-70), wf5 using expektra.]

Figure C.12: Error distribution for different lead times, WF 5.

[Six histogram panels: error at lead times 48, 40, 30, 20, 10 and 1 against frequency (0-70), wf6 using expektra.]

Figure C.13: Error distribution for different lead times, WF 6.

[Six histogram panels: error at lead times 48, 40, 30, 20, 10 and 1 against frequency (0-70), wf7 using expektra.]

Figure C.14: Error distribution for different lead times, WF 7.

List of Figures

2.1  A figure that presents the general outline when forecasting using the statistical approach
2.2  A figure that presents the general steps when forecasting using a physical model
3.1  The perceptron
3.2  Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of H hidden layers, as well as M input connections. Each edge in this graph has a weight w_ij associated with it
3.3  Information flow of a single region predictive model created with the OPF
3.4  The CLAClassifier
3.5  Training an OPF model
4.1  Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs. wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.2  Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.3  Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.4  Different error measurements for WF 1
4.5  Different error measurements for WF 2
4.6  Different error measurements for WF 3
4.7  Different error measurements for WF 4
4.8  Different error measurements for WF 5
4.9  Different error measurements for WF 6
4.10 Different error measurements for WF 7
4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point; the reference point is the network when no channel has been exposed to noise. ws = wind speed, hours and week = timestamps, u, v is a directional vector of the wind
4.12 Plot showing the unoptimized version vs. the optimized one when training a network with 10 input neurons, 10/15/20/25 hidden neurons and 1 output neuron. Training was done for 100 epochs using LM
4.13 Summarized average improvement over all wind farms with 95% confidence intervals
B.1  Wind characteristics for WF 1 and WF 2. The GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power
B.2  Wind characteristics for WF 3-7. The GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power
C.1  Error distribution for different lead times, WF 1
C.2  Error distribution for different lead times, WF 2
C.3  Error distribution for different lead times, WF 3
C.4  Error distribution for different lead times, WF 4
C.5  Error distribution for different lead times, WF 5
C.6  Error distribution for different lead times, WF 6
C.7  Error distribution for different lead times, WF 7
C.8  Error distribution for different lead times, WF 1
C.9  Error distribution for different lead times, WF 2
C.10 Error distribution for different lead times, WF 3
C.11 Error distribution for different lead times, WF 4
C.12 Error distribution for different lead times, WF 5
C.13 Error distribution for different lead times, WF 6


C.14 Error distribution for different lead times, WF 7

List of Tables

3.1  The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, missing data with power observations are available for updating the models
3.2  Preprocessed features from the dataset. The Forecast category represents the forecast given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available gives the features we will use in training and testing
3.3  Example where n = 14, r = 5, ψ = 1 of various encoded scalar values using a ScalarEncoder
4.1  NRMSE score of the entries published in [Hong et al., 2014]. The NuPIC model and Expektra model are added so we can easily compare the results
A.1  Table containing configuration parameters for the spatial pooler
A.2  Table containing configuration parameters for the scalar encoder
A.3  Table containing configuration parameters for the temporal memory


www.kth.se

  • Contents
  • Introduction
    • Problem Formulation
    • The scope of the problem
  • Background
    • Neural Networks and Time Series Prediction
  • Method and Materials
    • Preliminaries
      • Remarks
      • Definitions
      • Reference models
      • Error metrics
      • Model selection
      • Evaluation
    • Experiments
    • Holdback Input Randomization
    • Optimization methods
    • Neural Networks
      • Multilayer Perceptron
      • Numenta Platform for Intelligent Computing
  • Result
    • Experimental results
    • Input Importance
    • Adaptation and Optimization
    • Summary
  • Discussion
    • Method development issues
    • Future improvements and directions
    • Conclusions
  • Bibliography
  • Appendices
  • Hyper-parameters
  • Wind characteristics
  • Error Distribution
  • List of Figures
  • List of Tables

35 NEURAL NETWORKS

can be summarized by the following steps. Each network was trained multiple times in order to avoid local minima.

• Step 1: Initialize particles with random velocities and accelerations.

• Step 2: Determine which particle is closest to the goal.

• Step 3: Adjust accelerations toward that particle.

• Step 4: Update particle positions based on their velocity, and update velocities based on acceleration.

• Step 5: Go to step 2.
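The loop above can be sketched in a few lines. This is a generic PSO minimizer with illustrative parameter names (inertia w, attraction coefficients c1 and c2), not the implementation used in the thesis:

```python
import random

def pso_minimize(f, dim, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5):
    """Minimal particle swarm optimization sketch for minimizing f."""
    pos = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                   # each particle's best position
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm-wide best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                # velocity update: inertia + pull toward personal and global best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

With w = 0.7 and c1 = c2 = 1.5 the swarm contracts toward the best found position, which corresponds to steps 2-5 above.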

The LM algorithm is a combination of the Error Back-Propagation (EBP) method [Rumelhart et al., 1988] and the Gauss-Newton method. The algorithm has the speed advantage of Gauss-Newton and the stability of EBP [Yu and Wilamowski, 2011]. A detailed treatment of LM can be found in Moré [1978], and of PSO in Poli et al. [2007].
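A single LM update computes Δw = (JᵀJ + μI)⁻¹Jᵀe, interpolating between Gauss-Newton (small μ) and gradient descent (large μ). The sketch below uses illustrative names and is not the thesis implementation; with μ = 0 it is a pure Gauss-Newton step, which for a linear model lands exactly on the least-squares solution:

```python
import numpy as np

def lm_step(jacobian, residuals, weights, mu):
    """One Levenberg-Marquardt update: w_new = w - (J^T J + mu*I)^-1 J^T e."""
    J = np.asarray(jacobian, dtype=float)
    e = np.asarray(residuals, dtype=float)
    A = J.T @ J + mu * np.eye(J.shape[1])   # damped normal-equations matrix
    delta = np.linalg.solve(A, J.T @ e)
    return weights - delta
```

In practice μ is decreased after a step that lowers the error and increased otherwise, which gives LM its stability.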

3.5 Neural Networks

3.5.1 Multilayer Perceptron

The perceptron is built around a nonlinear model of a neuron, the McCulloch-Pitts model [McCulloch and Pitts, 1943]. It basically consists of a 2-step process: the cell body contains a summation function of the weighted sum of all inputs, including a bias. The perceptron is described by equation 3.11. The sum s is passed through an activation function (see section 3.5.1), which mimics the activation, or firing, of the neuron.

s = Σ_{i=1}^{M} w_i x_i + x_0        (3.11)

w_i is the weight of the "synapse" of the input channel and is the parameter we want to adjust, x_i is the input value, and x_0 is the bias. M denotes the number of inputs.
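Equation 3.11 translates directly into code; this is a sketch with illustrative names, not the thesis code:

```python
def weighted_sum(x, w, bias):
    """Equation 3.11: s = sum_i w_i * x_i + x_0 for one perceptron."""
    return sum(wi * xi for wi, xi in zip(w, x)) + bias
```

The result s is then fed to the activation function described below.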


CHAPTER 3 METHOD AND MATERIALS

[Figure 3.1 shows the perceptron: weighted input signals and a bias enter the summation Σ, whose output s passes through the activation function f(s) to give the output signal.]

Figure 3.1: The perceptron.

The structure of the MLP consists of many perceptrons, and it is shown in figure 3.2. The input signal flows from the input layer at the bottom to the output layer at the top. We have a bias signal, seen on the left side of the diagram, which is set to a fixed number.

MLPs are fully connected networks, meaning that the neurons in any layer of the network are connected to all the neurons in the previous layer. Each connection in the network has a weight w_ij associated with it. These weights are initialized before any training is done, by randomly assigning very small values to each weight in the graph. The architecture used in this study has only one output variable, i.e. the function we approximate produces forecasts of the power generation given a certain input.


[Figure 3.2 shows the layered network: the input layer (hours, u, v, week, ws, ws-1, ws-2, ws, ws+1, ws+2) at the bottom, tanh hidden layers, a linear output layer at the top, and a bias signal on the left.]

Figure 3.2: Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of H hidden layers, as well as M input connections. Each edge in this graph has a weight w_ij associated with it.

The performance of neural networks is generally improved if data is normalised; if we were to use the original data directly, it could cause convergence problems. Normalization is done using the mapminmax function, seen in equation 3.12.

y = (y_max − y_min) · (x − x_min) / (x_max − x_min) + y_min        (3.12)

y_max is the maximum value of the specified range, which in this case is 1, and y_min is −1. x is the value to be scaled, x_max is the maximum of the values to be scaled, and x_min is their minimum.
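Equation 3.12 as a small function (a sketch mirroring MATLAB's mapminmax behaviour, with the default [−1, 1] target range used here):

```python
def mapminmax(x, x_min, x_max, y_min=-1.0, y_max=1.0):
    """Equation 3.12: linearly rescale x from [x_min, x_max] to [y_min, y_max]."""
    return (y_max - y_min) * (x - x_min) / (x_max - x_min) + y_min
```

For example, with x_min = 0 and x_max = 10, the inputs 0, 5 and 10 map to −1, 0 and 1 respectively.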

The forecasting model consists of 7 different networks, one for each wind farm. These networks are trained on the first section of the dataset, before the test period. A random 60/20/20 split is created, where 60% of the available data is used for training, 20% is used to validate the network, and 20% is set aside as a hold-out set for the hyperparameters. The input features³ fed into the models are

³ See table 3.2.


ws, u, v, hours, ws, ws+1, ws+2, ws−1, ws−2, where ws+x and ws−x denote a time shift of x. We can use ws+x as input into the model because ws is a forecast in itself.

Activation Functions

The computation done for each neuron in the multilayer perceptron requires knowledge of the derivative of the activation function; in other words, the activation function we pick needs to be differentiable. In this thesis we use the following two activation functions: the hyperbolic tangent function, seen in equation 3.13, and

f(s) = tanh(s) = (e^s − e^{−s}) / (e^s + e^{−s})        (3.13)

the linear transfer function, seen in equation 3.14.

f(s) = +1 if s ≥ 1,
       s  if −1 < s < 1,
       −1 if s ≤ −1        (3.14)
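The two activation functions can be written out directly; this is an illustrative sketch (equation 3.14 is the saturating linear transfer function, satlin in MATLAB terms):

```python
import math

def tanh_act(s):
    """Equation 3.13: hyperbolic tangent activation."""
    return math.tanh(s)

def satlin(s):
    """Equation 3.14: linear in (-1, 1), saturating at +/-1 outside it."""
    if s >= 1:
        return 1.0
    if s <= -1:
        return -1.0
    return float(s)
```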

Hyperparameter optimization

In order to obtain the hyperparameters for each wind farm's model, a random hyperparameter search was performed for all models. A hold-out validation set was used to pick the best hyperparameters. Random search has been shown to work better than grid search when not all parameters are equally important [Bergstra and Bengio, 2012].
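Random search is simple to implement: sample each parameter independently and keep the configuration with the best hold-out score. The function and parameter names below are illustrative, not the thesis code; `train_and_score` stands in for training a network and returning its hold-out error (lower is better):

```python
import random

def random_search(train_and_score, space, n_trials=20, seed=0):
    """Random hyperparameter search sketch over a dict of candidate values."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("inf")
    for _ in range(n_trials):
        # sample one value per parameter, independently
        cfg = {name: rng.choice(values) for name, values in space.items()}
        score = train_and_score(cfg)
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```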

3.5.2 Numenta Platform for Intelligent Computing

This section describes the key principles introduced with NuPIC. It explains in general terms the theory behind the Online Prediction Framework (OPF)⁴, which uses the CLA and HTM algorithms; the OPF works as an API to create predictive models. The OPF consists of 5 major types of components: Encoders, Spatial Pooler (SP), Temporal Memory (TM), Temporal Pooler (TP), and Classifiers. Together these components construct a single CLA region⁵, and figure 3.3 demonstrates the information flow through a single region.

⁴ The OPF is used with Numenta's commercial product Grok.
⁵ Currently, models created with the OPF do not use a TP, nor does this client allow creation of a hierarchy of regions, i.e. what we would call an HTM; it is only possible to create one region.


Another central concept in NuPIC is the Sparse Distributed Representation (SDR), which refers to the activation of a small percentage of the neurons at any given time. This neuronal activity is represented as an n-dimensional sparse binary vector x = [b_0, ..., b_n] with around 2% active cells. The outputs from the spatial pooler and the temporal memory are both SDRs, while the output from the encoder does not enforce this and is just a normal binary vector. A general overview of the properties of SDRs has been given by Ahmad and Hawkins [2015].

Encoder → Spatial Pooler → Temporal Memory → Classifier

Figure 3.3: Information flow of a single region predictive model created with the OPF.

The overlap between two SDRs is defined by

o(x, y) = x · y        (3.15)

A match between two SDRs is defined by

m(x, y) ≡ o(x, y) ≥ θ        (3.16)

where θ is set such that θ ≤ ‖x‖₁ and θ ≤ ‖y‖₁. An interesting property of SDRs, one that is used multiple times inside the temporal memory and especially for predictions, is that a set of fixed-size SDRs can be reliably stored by taking their union as a single pattern. The boolean OR operator is used to create a


Value  Encoding
1      11111000000000
2      01111100000000
10     00000000011111

Table 3.3: Example where n = 14, r = 5, ψ = 1 of various scalar values encoded using a ScalarEncoder.

new vector from a set of SDRs. The downside of storing patterns this way is that the more patterns we store, the bigger the probability of false positives.
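Equations 3.15 and 3.16 and the union trick can be sketched on plain binary lists (illustrative code, not NuPIC's implementation):

```python
def overlap(x, y):
    """Equation 3.15: dot product of binary vectors = number of shared active bits."""
    return sum(xi & yi for xi, yi in zip(x, y))

def match(x, y, theta):
    """Equation 3.16: the SDRs match if their overlap reaches the threshold theta."""
    return overlap(x, y) >= theta

def union(sdrs):
    """Store a set of SDRs as one pattern via boolean OR (false-positive risk grows
    with the number of stored patterns)."""
    out = [0] * len(sdrs[0])
    for sdr in sdrs:
        out = [a | b for a, b in zip(out, sdr)]
    return out
```

Membership of a pattern x in a stored set can then be tested with match(union(...), x, ‖x‖₁).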

Encoders

NuPIC contains many different encoders⁶; the job of an encoder is to convert raw input into a more suitable representation (i.e. a binary vector). The raw inputs are fed into the model using a dictionary data structure.

Useful encoders include scalar and categorical encoders, while more exotic encoders include one for Global Positioning System (GPS) coordinates, which can be used to extract information about anomalous movements. The entries of the dictionary of raw inputs are each encoded separately and concatenated using a multi encoder.

One property of the spatial pooler is that overlapping input patterns are mapped to the same SDR. This means that we want the encoder to encode input so that similar inputs share bits. The ScalarEncoder fulfils this property by the process illustrated in table 3.3.

We calculate the range of values to encode using equation 3.17; v_min represents the minimum value of the input signal, while v_max denotes its upper bound.

v_range = v_max − v_min        (3.17)

There are three mutually exclusive parameters that determine the overall size of the output from a scalar encoder (n, r, ψ). n directly represents the total number

⁶ A full list of all encoders can be found in the API documentation for NuPIC.


of bits in the output, and it must be bigger than or equal to w, which represents the number of bits that are set to encode a single value, i.e. the "width" of the output signal⁷. r and ψ are specified with respect to the input, while w is specified with respect to the output. Two inputs separated by more than the radius have non-overlapping representations; two inputs separated by less than the radius will in general overlap in at least some of their bits. Two inputs separated by greater than or equal to the resolution are guaranteed to have different representations. ψ can be calculated using equation 3.18.

ψ = r / w        (3.18)

Depending on whether we want periodic behaviour or not, n is calculated a bit differently; see the scalar encoder for implementation details⁸.
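A minimal non-periodic scalar encoder that reproduces table 3.3 can be sketched as follows. This is an approximation of NuPIC's ScalarEncoder, not its exact implementation; the function name and parameters are illustrative:

```python
def scalar_encode(value, v_min, v_max, n, w):
    """Place a run of w active bits inside an n-bit output so that
    nearby values share bits (non-periodic ScalarEncoder sketch)."""
    if not v_min <= value <= v_max:
        raise ValueError("value outside [v_min, v_max]")
    n_buckets = n - w + 1                            # possible start positions
    resolution = (v_max - v_min) / (n_buckets - 1)   # = psi when r = w * psi
    start = int(round((value - v_min) / resolution))
    return [1 if start <= i < start + w else 0 for i in range(n)]
```

With n = 14, w = 5 and the input range [1, 10] this gives exactly the encodings of table 3.3: the values 1, 2 and 10 produce runs of five set bits starting at positions 0, 1 and 9.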

Spatial Pooler

The spatial pooler receives a binary vector as its input from an encoder and outputs an SDR. The structure of the spatial pooler consists of an input space and a set of columns; the output SDR represents which of the columns in a region are active. Each column has synapses connected to the input space, consisting of around 50% randomly and potentially connected synapses; this is called the "potential pool". Each synapse will connect to and disconnect from the input space during learning.

The general information flow of the spatial pooler consists of the following steps: input stimuli from lower regions lead to activation of the input space, to which each mini-column is synaptically connected. This activation, the so-called "overlap score", is computed for each column. The score is calculated as the total sum of the inputs trying to influence that column, weighted with a "boosting factor" that increases the chance for certain columns to win more easily. A final top percentage (usually around 2% activity) of the columns with the highest influence (biggest overlap score) is chosen; columns also inhibit nearby columns, and the result of this process is a binary vector with few active bits, an SDR. This is illustrated in equation 3.19, where b can either be 0 or 1 and s is a value representing the score.

⁷ w must be odd to avoid centering problems.
⁸ https://github.com/numenta/nupic


    x = [b_1, b_2, ..., b_n]                 (input vector)
    S = (b_ij), i = 1..d, j = 1..n           (connected synapses for each of the d columns)
    [s_1, s_2, ..., s_d] = S · x             (overlap scores)
    --inhibition-->  y = [b_1, b_2, ..., b_d] (output SDR)        (3.19)

Learning in this structure is done by adjusting the permanences of the proximal dendrite to better match the input: for each winning column, increase the permanence of the synapses that correctly matched the input and decrease the permanence of the rest. We also increase the boosting factor of losing columns to give them a bigger chance of winning next time.
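The overlap-and-inhibition step of equation 3.19 and the permanence update just described can be sketched with a dense synapse matrix. This is an illustrative simplification (no connected-permanence threshold, no boosting update), not NuPIC's spatial pooler:

```python
import numpy as np

def spatial_pooler_step(inp, synapses, boost, active_frac=0.02):
    """Boosted overlap scores per column, then k-winners-take-all inhibition."""
    overlap_scores = boost * (synapses @ inp)          # one score per column
    k = max(1, int(active_frac * synapses.shape[0]))   # ~2% of columns stay active
    winners = np.argsort(overlap_scores)[-k:]
    sdr = np.zeros(synapses.shape[0], dtype=int)
    sdr[winners] = 1
    return sdr

def sp_learn(inp, permanence, sdr, inc=0.05, dec=0.01):
    """For each winning column: reinforce synapses on active input bits,
    weaken the rest (modifies `permanence` in place, returns the clipped result)."""
    for c in np.flatnonzero(sdr):
        permanence[c] += np.where(inp > 0, inc, -dec)
    return np.clip(permanence, 0.0, 1.0)
```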

Temporal Memory

The temporal memory receives an SDR from the spatial pooler, which represents the active columns; the output of the temporal memory is the activity of the whole CLA region. The general idea of the TM⁹ is that the cells in each mini-column provide temporal context to the pattern identified by the spatial pooler. The TM's job is to form transitions between different SDRs, and it achieves this by allowing active cells to form connections to previously active cells, so that in a future setting each cell is able to predict its own activity.

Each cell in a region has several distal segments, ideally one for each pattern the cell has transitioned from; in practice, these segments each connect to a couple of patterns. A cell enters a predictive state if there is enough activity on a segment, and every segment consists of a collection of connections (synapses) that have been formed to a subset of previously active cells (typically around 10-15 cells).

The Temporal Memory receives a sparse binary vector from the spatial pooler, and the general information flow for this part of the CLA goes through two phases. The first phase consists of the following steps: 1) for each active column, check whether there is any cell in a predictive

9 There are a lot of details in how the Temporal Memory is implemented; pseudo-code and more details can be found in [Numenta, 2011]. The NuPIC git repository is the best source for the finer details: https://github.com/numenta/nupic


state; if so, then 2) activate that particular cell. If no cell was found in a predictive state, then 3) activate all cells in that particular column, a process called bursting.

Bursting is analogous to saying: we do not know which cell to activate, because we have not seen this instance of the sequence of patterns before and are unable to put the column into the correct temporal context, so let us activate all cells to reflect this uncertainty. Bursting does not occur when the temporal context has been predicted by the CLA. Equation 3.20 shows the first phase.

$$
\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \cdots & b_n \end{bmatrix}}_{\text{SP SDR}}
\cdot
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \cdots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \cdots & b_{2n} \\
\vdots & \vdots & \vdots & & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \cdots & b_{dn}
\end{bmatrix}}_{\text{Predictive state}}
=
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \cdots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \cdots & b_{2n} \\
\vdots & \vdots & \vdots & & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \cdots & b_{dn}
\end{bmatrix}}_{\text{Active state}}
\tag{3.20}
$$

The second phase of the algorithm determines which cells should be put into a predictive state for the next time-step. This is achieved by checking every distal segment on every cell for activity above a certain threshold, i.e. checking for active segments; one active segment is enough to put a cell into a predictive state. The output from the temporal memory is a vector representing the state of all cells in that region. Equation 3.21 shows phase 2.

$$
\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \cdots & b_n \end{bmatrix}}_{\text{Active state}}
\cdot
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \cdots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \cdots & b_{2n} \\
\vdots & \vdots & \vdots & & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \cdots & b_{dn}
\end{bmatrix}_X}_{\text{Segment } X}
=
\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \cdots & b_n \end{bmatrix}_X}_{\text{Segment activation } X}
> \tau
\;\rightarrow\;
\underbrace{\begin{bmatrix} s_1 & s_2 & s_3 & \cdots & s_n \end{bmatrix}}_{\text{Predictive state}}
\tag{3.21}
$$

If learning is turned on, the permanences of the synapses connected to active distal segments are updated (in the same way as in the spatial pooler). These changes are marked as temporary until we are sure that the cell is in fact able to correctly predict something with them; once this is known, the change either becomes permanent or is removed. Temporarily marked changes are made permanent whenever the cell goes from being inactive to active from feed-forward input (reinforce the permanences, as we correctly predicted the feed-forward activation); if the cell instead went from active to inactive, the change is undone.
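The two phases above can be sketched in NumPy as follows. This is a simplified illustration that assumes a single distal segment per cell and a dense connection tensor; the names and the threshold value are ours for exposition, not NuPIC's actual data structures.

```python
import numpy as np

def temporal_memory_step(active_columns, predictive, segments, threshold=12):
    """Simplified TM step: phase 1 (activate or burst), phase 2 (predict next step).

    active_columns : (n,) binary SDR from the spatial pooler
    predictive     : (d, n) binary matrix of cells predicted at the previous step
    segments       : (d, n, d, n) distal connections (one segment per cell here)
    """
    d, n = predictive.shape
    active = np.zeros((d, n), dtype=int)
    for col in np.flatnonzero(active_columns):
        predicted_cells = np.flatnonzero(predictive[:, col])
        if predicted_cells.size:          # phase 1: activate correctly predicted cells
            active[predicted_cells, col] = 1
        else:                             # bursting: no prediction, activate every cell
            active[:, col] = 1
    # Phase 2: a cell becomes predictive if its segment sees enough activity
    seg_activity = np.einsum('ijkl,kl->ij', segments, active)
    next_predictive = (seg_activity >= threshold).astype(int)
    return active, next_predictive
```

In the sketch, a column with a predicted cell activates only that cell, while a column with no prediction bursts.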


NuPIC Classifiers

There are many different classifiers included with NuPIC, such as k-Nearest Neighbour (kNN) and a Support Vector Machine (SVM); the OPF in particular uses a custom-built classifier called the "CLAClassifier". The purpose of this classifier is to decode predictions made by the CLA. Classification with the CLAClassifier is performed in different ways depending on how the input has been encoded.

Scalar values are decoded using a process where each cell in a CLA region is paired with two histograms. One histogram keeps track of the frequency of encountered patterns associated with each cell; the other keeps track of a moving average for each bucket. This process is illustrated in figure 3.4.
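The two-histogram idea can be sketched as below; the class and method names are hypothetical and the bucketing is simplified compared to the actual CLAClassifier.

```python
import numpy as np

class CellHistogramDecoder:
    """Illustrative sketch of two-histogram decoding: each cell keeps a
    frequency count of how often it was active for each value bucket, and each
    bucket keeps a moving average of the actual values; a prediction is a
    frequency-weighted vote over the currently active cells."""

    def __init__(self, n_cells, n_buckets, alpha=0.1):
        self.freq = np.zeros((n_cells, n_buckets))   # pattern frequency per cell
        self.avg = np.zeros(n_buckets)               # moving average per bucket
        self.alpha = alpha                           # moving-average step size

    def learn(self, active_cells, bucket, value):
        self.freq[active_cells, bucket] += 1
        self.avg[bucket] += self.alpha * (value - self.avg[bucket])

    def decode(self, active_cells):
        votes = self.freq[active_cells].sum(axis=0)  # likelihood per bucket
        return self.avg[votes.argmax()]              # value of the most likely bucket
```

The frequency histograms pick the most likely bucket, and the moving average turns that bucket back into a scalar prediction.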

Figure 3.4: The CLAClassifier. Each cell in each column of the SDR is paired with two histograms (a likelihood histogram and a moving average), covering the value range between minvalue and maxvalue.

Training in NuPIC

Training the NuPIC model is done using online learning algorithms. The schema seen in figure 3.5 is used to train and test these models; since there are 7 different wind farms, the schema is repeated for each wind farm.

This schema is slightly adjusted from the traditional way the OPF handles multi-step predictions. The default way is to use 48 different models, one for


every step ahead in the CLAClassifier, which would give us 48 · 7 = 336 models in total (for all the wind farms). Not only does this change match Expektra's approach, it also reduces the memory requirements for the CLAClassifier.

Figure 3.5: Training an OPF model. Hyperparameters are set up on a pre-training data chunk, either by PSO swarming or manually. In the training phase the training data is streamed through the OPF model with online learning activated; in the testing phase the testing data is fed through the model with online learning deactivated, producing multi-step predictions.

Input and hyperparameter selection

Finding hyperparameters for an OPF model was partly done using the custom built-in PSO algorithm and partly configured manually, mainly by ensuring that the PSO algorithm did not remove encoders. A single CLA region contains a lot of hyperparameters; these have been included, with corresponding descriptions, in appendix A. Inputs to the model are date, ws, wp, u, v.


Chapter 4

Result

This chapter contains the primary results obtained in this study. Figures 4.4-4.10 contain different error measurements on the test set for each wind farm. We see that Expektra's ANN model performs well over all wind farms, while the NuPIC model performs worse than Expektra but in general still better than our reference model. NuPIC is off target, with a bias error on most wind farms. The graphs also indicate that NuPIC is unable to pick up some trends, visible in the cumulated ε² graphs; Expektra's model, on the other hand, shows no clear problems in cumulated ε² and is on target on all wind farms. Appendix C has been included to reflect this for different lead times.
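For reference, the normalized error measures reported below (NBIAS, NMAE, NRMSE) can be computed as in the following sketch. It assumes normalization by installed capacity, in the spirit of the evaluation protocol of Madsen et al. [2005]; the function name is ours.

```python
import numpy as np

def forecast_errors(predicted, observed, capacity):
    """Normalized bias, MAE and RMSE for wind power forecasts,
    with errors normalized by the installed capacity."""
    eps = (observed - predicted) / capacity
    nbias = eps.mean()                     # systematic over/under-prediction
    nmae = np.abs(eps).mean()              # average magnitude of the error
    nrmse = np.sqrt((eps ** 2).mean())     # penalizes large errors more
    return nbias, nmae, nrmse
```

In practice these measures are computed separately for each look-ahead time k, which is how figures 4.4-4.10 are organized.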

Figure 4.1: Left diagram: cumulative probability of wind speed. Right diagram: scatter diagram of the power curve, normalized power output vs wind speed. Shown for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


Figure 4.2: Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Shown for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).

Figure 4.3: Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Shown for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


Figure 4.4: Different error measurements for WF 1: NBIAS and NRMSE vs look-ahead time k (in hours), cumulated ε² over time (in hours), and NMAE vs look-ahead time, for Expektra, NuPIC and Persistence.


Figure 4.5: Different error measurements for WF 2: NBIAS and NRMSE vs look-ahead time k (in hours), cumulated ε² over time (in hours), and NMAE vs look-ahead time, for Expektra, NuPIC and Persistence.


Figure 4.6: Different error measurements for WF 3: NBIAS and NRMSE vs look-ahead time k (in hours), cumulated ε² over time (in hours), and NMAE vs look-ahead time, for Expektra, NuPIC and Persistence.


Figure 4.7: Different error measurements for WF 4: NBIAS and NRMSE vs look-ahead time k (in hours), cumulated ε² over time (in hours), and NMAE vs look-ahead time, for Expektra, NuPIC and Persistence.


Figure 4.8: Different error measurements for WF 5: NBIAS and NRMSE vs look-ahead time k (in hours), cumulated ε² over time (in hours), and NMAE vs look-ahead time, for Expektra, NuPIC and Persistence.


Figure 4.9: Different error measurements for WF 6: NBIAS and NRMSE vs look-ahead time k (in hours), cumulated ε² over time (in hours), and NMAE vs look-ahead time, for Expektra, NuPIC and Persistence.


Figure 4.10: Different error measurements for WF 7: NBIAS and NRMSE vs look-ahead time k (in hours), cumulated ε² over time (in hours), and NMAE vs look-ahead time, for Expektra, NuPIC and Persistence.


User          WF 1   WF 2   WF 3   WF 4   WF 5   WF 6   WF 7   All
Leustagos     0.145  0.138  0.168  0.144  0.158  0.133  0.140  0.146
DuckTile      0.143  0.145  0.172  0.145  0.165  0.137  0.146  0.148
MZ            0.141  0.151  0.174  0.145  0.167  0.141  0.145  0.149
Propeller     0.144  0.153  0.177  0.147  0.175  0.141  0.147  0.152
Duehee Lee    0.157  0.144  0.176  0.160  0.169  0.154  0.148  0.155
Expektra      0.165  0.158  0.184  0.164  0.179  0.153  0.153  0.165
MTU EE5260    0.161  0.172  0.193  0.162  0.192  0.156  0.160  0.168
SunWind       0.174  0.177  0.193  0.176  0.179  0.157  0.162  0.172
ymzsmsd       0.163  0.186  0.200  0.164  0.192  0.162  0.167  0.174
4138 Kalchas  0.180  0.179  0.197  0.175  0.200  0.160  0.165  0.177
NuPIC         0.243  0.254  0.264  0.310  0.290  0.224  0.240  0.264
Persistence   0.302  0.338  0.373  0.364  0.388  0.341  0.361  0.355

Table 4.1: NRMSE scores of the entries published in [Hong et al., 2014]. The NuPIC and Expektra models are added so we can easily compare the results.

4.1 Experimental results

Looking at the primary results presented in this study, we see in table 4.1 that Expektra's ANN model is able to achieve performance comparable to the other methods published with Hong et al. [2014]. This can be used as a starting point for further investigations. A similar approach to Expektra's is that of Duehee Lee [Lee and Baldick, 2014], as both methods are based on the same neural network architecture but with different optimization techniques. Duehee Lee's network is structured slightly differently: Expektra's model has only 1 output node, whereas Duehee Lee uses five (representing five predictions ahead). Besides these differences, Duehee Lee also includes additional sub-models to create a prediction ensemble, blending the output of the ANN with a Gaussian Process (GP) to improve very short-term predictions. MTU EE5260 is also a neural network approach that converts meteorological forecasts to power, and here the score is very close. The HTM CLA beats the persistence model but needs more work to outperform the other models in GEFCom.


4.2 Input Importance

For interpretation purposes, in order to understand the model better, an analysis of the relative input parameter importance was performed. This analysis is illustrated in figure 4.11. Each box represents added noise to that channel, which results in a higher NRMSE score if that feature was important. A reference point, "all-channels", represents the error distribution of the model with no input replacement. We clearly see in this figure that the wind speed channel ws is the most important attribute; adding noise to this channel greatly affects the NRMSE score. We also see that the wind components u, v show little to no influence, and that the timestamp-related inputs hours and week both indicate that there are seasonal and daily trends present in the dataset.
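A HIPR-style analysis, in the spirit of Kemp et al. [2007], can be sketched as follows: one channel at a time is replaced with noise (here a permutation of its own values) and the NRMSE is recomputed. The function name, the use of permutation as the noise source, and the normalization are illustrative assumptions.

```python
import numpy as np

def input_importance(model_fn, X, y, capacity, rng=None):
    """Relative input importance: destroy one input channel at a time and
    record the resulting NRMSE; `model_fn` is any fitted predictor X -> y_hat."""
    rng = rng or np.random.default_rng()

    def nrmse(pred):
        return np.sqrt((((y - pred) / capacity) ** 2).mean())

    scores = {'all-channels': nrmse(model_fn(X))}    # reference point, no replacement
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])          # destroy the channel's information
        scores[f'channel {j}'] = nrmse(model_fn(Xp))
    return scores
```

A channel whose score barely moves above the "all-channels" reference carries little information for the model, which is the pattern seen for u and v in figure 4.11.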

Figure 4.11: Relative input parameter importance using HIPR, measured as NRMSE. "all-channels" is the reference point, i.e. the network when no channel has been exposed to noise. ws = wind speed (with lagged channels ws−3 … ws+3), hours, week = timestamps, and u, v is a directional vector of the wind.


4.2.1 Adaptation and Optimization

All training for Expektra's model was performed on a MacBook Pro with an Intel Core 2 Duo processor (P8600, 2.40 GHz) and a 64-bit operating system with 4 GB of installed RAM. After adapting the code to run experiments on the GEFCom dataset, the performance of the core source code provided by Expektra was evaluated. This investigation identified some slow parts of the code; after fixing these issues, mainly by using a Math.NET native library instead of the managed provider, a speed test was performed comparing the unoptimized version with the optimized one. Figure 4.12 shows the speed-up achieved during training with the LM algorithm.

Figure 4.12: Training time of the unoptimized version vs the optimized one, when training a network with 10 input neurons, 10/15/20/25 hidden neurons and 1 output neuron. Training was done for 100 epochs using LM.

4.3 Summary

A plot showing the average NRMSE improvement over the persistence model can be seen in figure 4.13. Both models are able to perform better than persistence; Expektra's model performs better than the NuPIC model for the whole forecast


horizon, and NuPIC performs better than persistence towards the end of the forecast.

Figure 4.13: Summarized average improvement (%) in NRMSE over persistence, across all wind farms, with 95% confidence intervals, as a function of look-ahead time (in hours), for Expektra and NuPIC.


Chapter 5

Discussion

With the exponential growth of computing power, it has become easier to study deeper and more complex networks, and with recent advances in the area of deep learning, interest in ANNs has resurged. In practice, ANNs have been used by most groups in the field of WPF, but these networks never caught on, as it was argued that the improvements made by ANNs were not usually worth the extra effort of training them [Giebel et al., 2011]. This is steadily becoming less of an issue as computation becomes cheaper and bigger networks perform better.

By using a simple MLP network, we observe that we are able to obtain results similar in performance to other models published in Hong et al. [2014]. This can be used as a starting point for further investigations. The GEFCom dataset has a limited number of input features; if we had access to a wider range of them, the power of neural networks could be studied more in depth. Given the number of features in the dataset, this is harder to do.

Using persistence as a reference model was done in order to have the same baseline as the one used in GEFCom, but a better reference model should be considered in the future, such as the one presented in Nielsen et al. [1998], especially if longer forecasts are to be considered.
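The two reference models can be sketched as follows. Persistence simply repeats the last observed production; the improved reference of Nielsen et al. [1998] blends persistence with the long-term mean production, with weights related to the lag-k autocorrelation. In this sketch the blending weights are supplied by the caller, and the function names are ours.

```python
import numpy as np

def persistence_forecast(p_t, horizon):
    """Persistence reference: the last observed production is the
    forecast for every look-ahead step."""
    return np.full(horizon, p_t)

def blended_reference(p_t, p_mean, weights):
    """Nielsen et al. [1998]-style reference: a per-horizon weighted blend of
    persistence and the long-term mean; `weights[k]` should shrink toward 0
    as the look-ahead grows (e.g. the lag-k correlation coefficient)."""
    weights = np.asarray(weights, dtype=float)
    return weights * p_t + (1.0 - weights) * p_mean
```

With weight 1 the blend reduces to persistence, and with weight 0 to the climatological mean, which is why it is a harder baseline at long horizons.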

5.1 Method development issues

Working with Expektra's model is very straightforward, and no major issues were encountered during development, but some issues were encountered working with NuPIC1:

1 It should also be pointed out that it helps to have the people who developed the code on the spot, and this is probably the main reason for the lack of trouble.


1. Installing NuPIC is difficult. This seems to be a general problem, judging by posts on the NuPIC mailing list2, and a very important topic to fix if they want more people to work with this model.

2. The built-in swarming (PSO) did not work well, especially for a slightly larger dataset and for 48 prediction steps. The main problems were fitting everything into memory and that the code ran too slowly when swarming over multiple steps ahead, making it practically impossible to use. Scaling the problem down to just a few prediction steps and multiple models helped, but the swarm model kept dismissing wind speed as an important feature, which was fixed by specifically telling it not to touch that encoder.

3. Inconsistencies between the white-paper documentation of HTM/CLA and the actual implementation make it hard to understand the underlying principles of NuPIC. NuPIC implements a very complex network, and any inconsistencies in the documentation make it hard to understand. The code is quite well documented, but the additional material is a bit sparse.

These issues are all understandable and are continuously being improved upon; NuPIC is still a young platform, so issues are to be expected given the complexity and research nature of the project. Training the CLAClassifier to use many prediction steps instead of just one will most likely reduce the bias error seen in the results. To investigate this, a more powerful computer is needed.

There are some differences in the inputs between the models. This setup was created because an OPF model needs to encode the value it is going to predict. SCADA data is available, so this is not an issue, but it may introduce some unfairness between NuPIC and Expektra. In the current setup NuPIC does not seem to gain anything extra from the additional information, and other models in the competition will most likely have used different inputs as well.

In general, NuPIC differs in the way you feed it data: you send in only the front of the signal, and temporal context is achieved through the temporal memory.

5.2 Future improvements and directions

One issue with the GEFCom dataset is that it does not contain any specific information about the locations of these 7 wind farms. It is worth considering how to handle different models that are specialized for different types

2 http://numenta.org/lists


of terrain, as this has been shown to increase performance [Kariniotakis et al., 2006]. Another thing to investigate could be a more advanced schema for hyperparameter selection; Jursa and Rohrig [2008] show that the WPF error can be reduced by smarter use of optimization algorithms for feature selection and hyperparameter selection, which should improve performance.

In general, more types of inputs from sources like SCADA and NWP systems could give a lot of valuable information for better predictions. It is clear that wind speed is a key feature for producing good forecasts, which is also supported by the input analysis in section 4.2. It has been argued that the error in WPF models stems largely from wrong weather forecasts [Giebel et al., 2011], so identifying good sources of weather forecasts would be desirable for more effective model learning. Combining several different weather forecasts could also be investigated, as Nielsen et al. [2007] showed that power forecasts based on a number of different meteorological forecasts were better than those based on a single source.

Regarding HTM CLA, it is worth considering implementing a custom encoder, one specifically targeted at data concerning wind farms. SCADA data could also be streamed directly into a NuPIC model, which would allow for some interesting online predictions and anomaly detection.
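A custom encoder would follow the same contract as NuPIC's scalar encoder described in appendix A: a contiguous block of w active bits whose position encodes the value (with w odd to avoid centering problems). The following is a minimal sketch of that contract, not NuPIC's implementation.

```python
import numpy as np

def scalar_encode(value, minval, maxval, n=64, w=7):
    """Sketch of a scalar encoder: w consecutive active bits out of n,
    positioned proportionally to the (clamped) input value."""
    assert w % 2 == 1 and n > w
    value = min(max(value, minval), maxval)            # clamp to [minval, maxval]
    # index of the first active bit, scaled over the available positions
    start = int(round((value - minval) / (maxval - minval) * (n - w)))
    bits = np.zeros(n, dtype=int)
    bits[start:start + w] = 1
    return bits
```

Nearby values share most of their active bits, which is the overlap property the spatial pooler relies on; a wind-farm-specific encoder could, for example, encode wind direction circularly on top of this.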

Training times are always an issue, especially for very large models. Both NuPIC and Expektra's model would benefit from being implemented in a more parallel fashion, as these algorithms are well suited to parallel computing. The speed-up would be of great value, so looking into and implementing a GPU version of the algorithms could be worthwhile.

5.3 Conclusions

The field of energy forecasting is still young, and the necessity for good forecasting techniques is constantly growing. This initial study shows that part of Expektra's forecasting method can be used to forecast wind power generation. We observe that this ANN is able to achieve performance comparable to other methods published in Hong et al. [2014]; additional work is still needed to obtain state-of-the-art results. Numenta's model is able to beat the reference model but needs additional work before we can draw definite conclusions about its performance. Constructing a custom encoder could be a first step in that direction.


Bibliography

Subutai Ahmad and Jeff Hawkins. Properties of sparse distributed representations and their application to hierarchical temporal memory. arXiv preprint arXiv:1503.07469, 2015.

T.C. Akinci. Short term wind speed forecasting with ANN in Batman, Turkey. Elektronika ir Elektrotechnika, 107(1):41-45, 2015.

E. Michael Azoff. Neural network time series forecasting of financial markets. John Wiley & Sons, Inc., 1994.

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13(1):281-305, 2012.

Daniel P. Buxhoeveden and Manuel F. Casanova. The minicolumn hypothesis in neuroscience. Brain, 125(5):935-951, 2002.

Erasmo Cadenas and Wilfrido Rivera. Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks. Renewable Energy, 34(1):274-278, 2009.

Dmitri B. Chklovskii, B.W. Mel, and K. Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782-788, 2004.

A. Costa, A. Crespo, J. Navarro, G. Lizcano, H. Madsen, and E. Feitosa. A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews, 12(6):1725-1744, August 2008. ISSN 1364-0321. doi: 10.1016/j.rser.2007.01.015. URL http://dx.doi.org/10.1016/j.rser.2007.01.015.

Russ C. Eberhart and James Kennedy. A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science, volume 1, pages 39-43, New York, NY, 1995.

Shu Fan, James R. Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. Energy Conversion, IEEE Transactions on, 24(2):474-482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento: a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276. Citeseer, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition EWEC 2008, 6 pages. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving hierarchical temporal memory-based trading models. Springer, 2013.

M.A. Gaertner, C. Gallardo, C. Tejeda, N. Martínez, S. Calabria, N. Martínez, and B. Fernández. The Casandra project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. In Proc. of the 2001 European Wind Energy Conference, EWEC'01, Copenhagen, Denmark, pages 777-780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: a literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G. Lobo. Sipreólico: wind power prediction tool for the Spanish peninsular power system. In Proceedings of the CIGRÉ 40th General Session and Exhibition, París, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357-363, 2014.

Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359-366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41-46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585-592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694-709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference, EWEC'07, 2007.

G. Kariniotakis. Position paper on Joule project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J. Halliday, R. Brownsword, Ignacio Marti, Ana Maria Palomares, I. Cruz, H. Madsen, T.S. Nielsen, Henrik Aa. Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power: overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, 10 pages, 2006.

G.N. Kariniotakis, G.S. Stavrakakis, and E.F. Nogaret. Wind power forecasting using advanced neural networks models. Energy Conversion, IEEE Transactions on, 11(4):762-767, 1996.

Stanley J. Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326-334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275-293, 2009.

Lars Landberg and Simon J. Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171-195, 1994.

S. Lang, C. Mohrlen, J. Jorgensen, B.O. Gallachóir, and E. McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International Solar Energy Society UK Section Conference C, volume 85, page 89, 2006.

S.M. Lawan, W.A.W.Z. Abidin, W.Y. Chai, A. Baharun, and T. Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760-1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. Smart Grid, IEEE Transactions on, 5(1):501-510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets, 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164-168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313-2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154-3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa. Nielsen, and Torben S. Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475-489, 2005.

Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431-441, 1963.

Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115-133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons. 1969.

Marvin Lee Minsky and Oliver G. Selfridge. Learning in random nets. MIT Lincoln Laboratory, 1960.

Jorge J. Moré. The Levenberg-Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105-116. Springer, 1978.

Henrik Aa. Nielsen, Torben S. Nielsen, Henrik Madsen, Maria J. Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471-482, 2007.

Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29-34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power, 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159-167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms, v0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33-57, 2007.

A. Rodrigues, J.A. Peças Lopes, P. Miranda, L. Palma, C. Monteiro, R. Bessa, J. Sousa, C. Rodrigues, and J. Matos. EPREV: a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference, EWEC, volume 7, 2007.

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5:3, 1988.

Jürgen Schmidhuber. Deep learning in neural networks: an overview. Neural Networks, 61:85-117, 2015.

S. Sinkevicius, R. Simutis, and V. Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91-96, 2011.

Ke-Sheng Wang, Vishal S. Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61-69, 2014.

WWEA. 2014 half-year report, pages 1-8. Technical report, WWEA, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365-376, 2013.

Hao Yu and Bogdan M. Wilamowski. Levenberg-Marquardt training. Industrial Electronics Handbook, 5:12-1, 2011.

Appendix A

Hyper-parameters

Parameters for the spatial pooler

Alias                Default  Description

columnCount          -        The number of cell columns in a cortical region.

globalInhibition     false    If true, in the inhibition phase the winning columns are selected as the most active columns from the region as a whole.

numActivePerInhArea  10       The maximum number of active columns per inhibition area.

synPermActiveInc     0.1      The amount by which an active synapse is incremented in each round, specified as a percent of a fully grown synapse.

synPermConnected     0.10     Controls the threshold at which synapses count as connected.

synPermInactiveDec   0.01     The amount by which an inactive synapse is decremented in each round.

potentialRadius      16       Determines the extent of the input that each column can potentially be connected to.

Table A.1: Table containing configuration parameters for the spatial pooler.

55


Parameters for the scalar encoder

w (symbol: w)  Number of bits to set in the output.
minval (symbol: vmin)  The lower bound of the input value.
maxval (symbol: vmax)  The upper bound of the input value.
n (symbol: n)  Number of bits in the representation (n must be > w).
radius (symbol: r)  Inputs separated by more than or equal to this distance will have non-overlapping representations.
resolution (symbol: ψ)  Inputs separated by more than or equal to this distance will have different representations.

Table A2 Table containing configuration parameters for the scalar encoder

Parameters for the temporal memory

activationThreshold (default: 12)  Activation threshold for segments.
cellsPerColumn (default: 32)  Number of cells per column.
columnCount (default: 2048)  The number of cell columns in a cortical region.
globalDecay (default: 0.10)  Decrements all synapses a little bit all the time.
initialPerm (default: 0.11)  Initial permanence value for a synapse.
inputWidth (default: −)  Size of the input.
maxAge (default: 100000)  Controls global decay: global decay will only decay segments that have not been activated for maxAge iterations, and the global decay loop runs only every maxAge iterations.
maxSegmentsPerCell (default: −)  The maximum number of segments a cell can have.
maxSynapsesPerSegment (default: −)  The maximum number of synapses a segment can have.
minThreshold (default: 8)  The minimum required activity for a segment to learn.
newSynapseCount (default: 15)  The maximum number of synapses added to a segment during learning.
permanenceDec (default: 0.10)  How much permanence is removed from synapses when learning occurs.
permanenceInc (default: 0.10)  How much permanence is added to synapses when learning occurs.
temporalImp (default: cpp/py)  Controls which temporal memory implementation to use.

Table A3 Table containing configuration parameters for the temporal memory

Appendix B

Wind characteristics

Figure B1 Wind characteristics for WF 1 and WF 2. The GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Figure B2 Wind characteristics for WF 3-7. The GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Appendix C

Error Distribution


[Figure: six histogram panels ("wf1 using nupic"), one per lead time (48, 40, 30, 20, 10 and 1); each plots forecast error (−1.0 to 1.0) on the x-axis against frequency (0 to 70) on the y-axis.]

Figure C1 Error distribution for different lead times, WF 1

[Figure: six histogram panels ("wf2 using nupic"), one per lead time (48, 40, 30, 20, 10 and 1); each plots forecast error (−1.0 to 1.0) against frequency (0 to 70).]

Figure C2 Error distribution for different lead times, WF 2

[Figure: six histogram panels ("wf3 using nupic"), one per lead time (48, 40, 30, 20, 10 and 1); each plots forecast error (−1.0 to 1.0) against frequency (0 to 70).]

Figure C3 Error distribution for different lead times, WF 3

[Figure: six histogram panels ("wf4 using nupic"), one per lead time (48, 40, 30, 20, 10 and 1); each plots forecast error (−1.0 to 1.0) against frequency (0 to 70).]

Figure C4 Error distribution for different lead times, WF 4

[Figure: six histogram panels ("wf5 using nupic"), one per lead time (48, 40, 30, 20, 10 and 1); each plots forecast error (−1.0 to 1.0) against frequency (0 to 70).]

Figure C5 Error distribution for different lead times, WF 5

[Figure: six histogram panels ("wf6 using nupic"), one per lead time (48, 40, 30, 20, 10 and 1); each plots forecast error (−1.0 to 1.0) against frequency (0 to 70).]

Figure C6 Error distribution for different lead times, WF 6

[Figure: six histogram panels ("wf7 using nupic"), one per lead time (48, 40, 30, 20, 10 and 1); each plots forecast error (−1.0 to 1.0) against frequency (0 to 70).]

Figure C7 Error distribution for different lead times, WF 7

[Figure: six histogram panels ("wf1 using expektra"), one per lead time (48, 40, 30, 20, 10 and 1); each plots forecast error (−1.0 to 1.0) against frequency (0 to 70).]

Figure C8 Error distribution for different lead times, WF 1

[Figure: six histogram panels ("wf2 using expektra"), one per lead time (48, 40, 30, 20, 10 and 1); each plots forecast error (−1.0 to 1.0) against frequency (0 to 70).]

Figure C9 Error distribution for different lead times, WF 2

[Figure: six histogram panels ("wf3 using expektra"), one per lead time (48, 40, 30, 20, 10 and 1); each plots forecast error (−1.0 to 1.0) against frequency (0 to 70).]

Figure C10 Error distribution for different lead times, WF 3

[Figure: six histogram panels ("wf4 using expektra"), one per lead time (48, 40, 30, 20, 10 and 1); each plots forecast error (−1.0 to 1.0) against frequency (0 to 70).]

Figure C11 Error distribution for different lead times, WF 4

[Figure: six histogram panels ("wf5 using expektra"), one per lead time (48, 40, 30, 20, 10 and 1); each plots forecast error (−1.0 to 1.0) against frequency (0 to 70).]

Figure C12 Error distribution for different lead times, WF 5

[Figure: six histogram panels ("wf6 using expektra"), one per lead time (48, 40, 30, 20, 10 and 1); each plots forecast error (−1.0 to 1.0) against frequency (0 to 70).]

Figure C13 Error distribution for different lead times, WF 6

[Figure: six histogram panels ("wf7 using expektra"), one per lead time (48, 40, 30, 20, 10 and 1); each plots forecast error (−1.0 to 1.0) against frequency (0 to 70).]

Figure C14 Error distribution for different lead times, WF 7


List of Figures

21 A figure that presents the general outline when forecasting using the statistical approach  6
22 A figure that presents the general steps when forecasting using a physical model  7
31 The perceptron  20
32 Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of H hidden layers as well as M input connections. Each edge seen in this graph has a weight wij associated with it  21
33 Information flow of a single region predictive model created with the OPF  23
34 The CLAClassifier  28
35 Training an OPF model  29
41 Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)  31
42 Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)  32
43 Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)  32
44 Different error measurements for WF 1  33
45 Different error measurements for WF 2  34
46 Different error measurements for WF 3  35
47 Different error measurements for WF 4  36
48 Different error measurements for WF 5  37
49 Different error measurements for WF 6  38
410 Different error measurements for WF 7  39
411 Relative input parameter importance using HIPR. "all-channels" reflects the reference point: the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind  41
412 Plot showing the unoptimized version vs the optimized when training a network with 10 input neurons, 10/15/20/25 hidden neurons and 1 output neuron. Training was done for 100 epochs using LM  42
413 Summarized average improvement over all wind farms with 95% confidence intervals  43
B1 Wind characteristics for WF 1 and WF 2  59
B2 Wind characteristics for WF 3-7  60
C1 Error distribution for different lead times, WF 1  62
C2 Error distribution for different lead times, WF 2  63
C3 Error distribution for different lead times, WF 3  64
C4 Error distribution for different lead times, WF 4  65
C5 Error distribution for different lead times, WF 5  66
C6 Error distribution for different lead times, WF 6  67
C7 Error distribution for different lead times, WF 7  68
C8 Error distribution for different lead times, WF 1  69
C9 Error distribution for different lead times, WF 2  70
C10 Error distribution for different lead times, WF 3  71
C11 Error distribution for different lead times, WF 4  72
C12 Error distribution for different lead times, WF 5  73
C13 Error distribution for different lead times, WF 6  74
C14 Error distribution for different lead times, WF 7  75

List of Tables

31 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, missing data with power observations are available for updating the models  16
32 Preprocessed features from the dataset. The Forecast category represents the forecast given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the feature we will use in training and testing  17
33 Example, with n = 14, r = 5, ψ = 1, of various encoded scalar values using a ScalarEncoder  24
41 NRMSE score of the entries published in [Hong et al., 2014]. The NuPIC model and Expektra model are added so we can easily compare the results  40
A1 Table containing configuration parameters for the spatial pooler  55
A2 Table containing configuration parameters for the scalar encoder  56
A3 Table containing configuration parameters for the temporal memory  57

www.kth.se

  • Contents
  • Introduction
    • Problem Formulation
    • The scope of the problem
  • Background
    • Neural Networks and Time Series Prediction
  • Method and Materials
    • Preliminaries
      • Remarks
      • Definitions
      • Reference models
      • Error metrics
      • Model selection
      • Evaluation
    • Experiments
      • Holdback Input Randomization
      • Optimization methods
      • Neural Networks
        • Multilayer Perceptron
        • Numenta Platform for Intelligent Computing
  • Result
    • Experimental results
    • Input Importance
    • Adaptation and Optimization
    • Summary
  • Discussion
    • Method development issues
    • Future improvements and directions
    • Conclusions
  • Bibliography
  • Appendices
  • Hyper-parameters
  • Wind characteristics
  • Error Distribution
  • List of Figures
  • List of Tables

CHAPTER 3 METHOD AND MATERIALS

[Diagram: input signals and a bias signal, each multiplied by a weight w, are summed and passed through an activation function f(s) to produce the output signal.]

Figure 31 The perceptron

The structure of the MLP consists of many perceptrons, and it is shown in figure 32. The input signals flow from the input layer at the bottom to the output layer at the top. There is a bias signal, seen on the left side of the diagram, which is set to a fixed number.

MLPs are fully connected networks, meaning that the neurons in any layer of the network are connected to all the neurons in the previous layer. Each connection in the network has a weight wij associated with it. The initialization of these weights is done before any training takes place, by randomly assigning very small values to each weight in the graph. The architecture used in this study has only one output variable, i.e. the function we approximate produces forecasts of the power generation given a certain input.
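This structure can be sketched in a few lines (NumPy assumed; the layer sizes and the helper names `init_layer` and `forward` are illustrative, not the thesis' actual code):

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(n_in, n_out, scale=0.1):
    # Initialize weights with very small random values; the extra row is the
    # weight for the fixed bias signal.
    return scale * rng.standard_normal((n_in + 1, n_out))

def forward(x, hidden_weights, output_weights):
    # Propagate the input signal from the input layer up through tanh hidden
    # layers to a single linear output neuron; a bias input of 1.0 is
    # appended at every layer.
    a = np.asarray(x, dtype=float)
    for W in hidden_weights:
        a = np.tanh(np.append(a, 1.0) @ W)
    return (np.append(a, 1.0) @ output_weights).item()

hidden = [init_layer(10, 15)]        # 10 inputs, one hidden layer of 15 neurons
output = init_layer(15, 1)
y = forward(np.zeros(10), hidden, output)
```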


35 NEURAL NETWORKS

[Diagram: a fully connected feed-forward network with bias signals, an input layer (hours, u, v, week, ws, ws−1, ws−2, ws, ws+1, ws+2), tanh hidden layers and a single linear output neuron.]

Figure 32 Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of H hidden layers, as well as M input connections. Each edge seen in this graph has a weight wij associated with it.

The performance of neural networks is generally improved if the data is normalised, because using the original data directly can cause convergence problems. Normalization is done using the mapminmax function seen in equation 312.

y = (ymax − ymin) · (x − xmin) / (xmax − xmin) + ymin        (312)

ymax is the maximum of the target range, which in this case is 1, and ymin is −1. x is the value to be scaled, xmax is the maximum of the values to be scaled, and xmin is their minimum.
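Equation 312 translates directly to code (a sketch assuming NumPy; `mapminmax` here is our own function, mirroring the MATLAB routine of the same name):

```python
import numpy as np

def mapminmax(x, ymin=-1.0, ymax=1.0):
    # Equation 312: linearly map x so that x.min() -> ymin and x.max() -> ymax.
    x = np.asarray(x, dtype=float)
    return (ymax - ymin) * (x - x.min()) / (x.max() - x.min()) + ymin

scaled = mapminmax([0.0, 5.0, 10.0])  # -> array([-1., 0., 1.])
```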

The forecasting model consists of 7 different networks, one for each wind farm. These networks are trained on the first section of the dataset, before the test period. A random 60/20/20 split is created, where 60% of the available data is used for training, 20% is used to validate the network, and 20% is set aside as a hold-out set for the hyperparameters. The input features3 fed into the models are

3 See table 32.


ws, u, v, hours, ws+1, ws+2, ws−1, ws−2, where ws+x and ws−x denote a time shift of x. We can use ws+x as input to the model because ws is a forecast in itself.

Activation Functions

The computation done for each neuron in the multilayer perceptron requires knowledge of the derivative of the activation function; in other words, the activation function we pick needs to be differentiable. In this thesis we use the following two activation functions: the hyperbolic tangent function, seen in equation 313, and

f(s) = tanh(s) = (e^s − e^(−s)) / (e^s + e^(−s))        (313)

the linear transfer function seen in equation 314

f(s) = +1 if s ≥ 1,   s if −1 < s < 1,   −1 if s ≤ −1        (314)
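Both transfer functions can be written as a short sketch (NumPy assumed; note that equation 314 is a saturating linear function, so clipping implements it directly):

```python
import numpy as np

def tanh_activation(s):
    # Equation 313: the hyperbolic tangent.
    return np.tanh(s)

def linear_transfer(s):
    # Equation 314: identity inside [-1, 1], saturating at +1 / -1 outside.
    return np.clip(s, -1.0, 1.0)
```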

Hyperparameter optimization

In order to obtain the hyperparameters for each wind farm's model, a random hyperparameter search was performed for all models, and a hold-out validation set was used to pick the best hyperparameters. Random search has been shown to work better than grid search when not all parameters are equally important [Bergstra and Bengio, 2012].
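A random search of this kind can be sketched as follows (the search space, trial count and the toy scoring function are illustrative assumptions, not the thesis' actual configuration):

```python
import random

random.seed(1)  # deterministic for the example

def random_search(score_on_holdout, space, n_trials=20):
    # Sample hyperparameter settings uniformly at random and keep the one
    # with the lowest hold-out validation score [Bergstra and Bengio, 2012].
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {name: random.choice(values) for name, values in space.items()}
        score = score_on_holdout(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

space = {"hidden_neurons": [10, 15, 20, 25], "learning_rate": [0.001, 0.01, 0.1]}
# Toy objective: pretend validation error is smallest at 20 hidden neurons.
best, score = random_search(lambda p: abs(p["hidden_neurons"] - 20), space)
```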

352 Numenta Platform for Intelligent Computing

This section describes the key principles introduced with NuPIC. It explains in general terms the theory behind the Online Prediction Framework (OPF)4, which uses the CLA and HTM algorithms; the OPF works as an API to create predictive models. The OPF consists of 5 major types of components: Encoders, Spatial Pooler (SP), Temporal Memory (TM), Temporal Pooler (TP) and Classifiers. Together these components construct a single CLA region5, and figure 33 demonstrates the information flow through a single region.

4 The OPF is used with Numenta's commercial product GROK.
5 Currently, models created with the OPF do not use a TP, nor does this client allow creation of a hierarchy of regions, i.e. what we would call an HTM; it is only possible to create one region.


Another central concept in NuPIC is the Sparse Distributed Representation (SDR), which refers to the activation of a small percentage of the neurons at any given time. This neuronal activity is represented as an n-dimensional sparse binary vector x = [b0, ..., bn] with around 2% active cells. The outputs from the spatial pooler and the temporal memory are both SDRs, while the output from the encoder does not enforce this and is just a normal binary vector. A general overview of the properties of SDRs has been given by Ahmad and Hawkins [2015].

[Diagram: Encoder → Spatial Pooler → Temporal Memory → Classifier]

Figure 33 Information flow of a single region predictive model created with the OPF

The overlap between two different SDRs is defined by

o(x, y) = x · y        (315)

A match between two SDRs is defined by

m(x, y) ≡ o(x, y) ≥ θ        (316)

where θ ≤ ‖x‖1 and θ ≤ ‖y‖1. An interesting property of SDRs, one that is used multiple times inside the temporal memory and especially for predictions, is that a fixed-size set of SDRs can be reliably stored by taking their union as a single pattern. The boolean OR operator is used to create a



Scalar  Encoding

1       11111000000000
2       01111100000000
10      00000000011111

Table 33 Example, with n = 14, r = 5, ψ = 1, of various encoded scalar values using a ScalarEncoder

new vector from a set of SDRs. The downside of storing patterns this way is that the more patterns we store, the bigger the probability of false positives.
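The overlap, match and union operations (equations 315 and 316) can be sketched on binary vectors (NumPy assumed; the vectors and threshold below are illustrative):

```python
import numpy as np

def overlap(x, y):
    # Equation 315: o(x, y) = x . y, the number of bits active in both SDRs.
    return int(np.dot(x, y))

def match(x, y, theta):
    # Equation 316: a match when the overlap reaches the threshold theta.
    return overlap(x, y) >= theta

def union(sdrs):
    # Store a set of SDRs as one pattern with boolean OR; the more patterns
    # stored this way, the higher the probability of false-positive matches.
    out = np.zeros_like(sdrs[0])
    for s in sdrs:
        out |= s
    return out

a = np.array([1, 1, 0, 0, 1, 0])
b = np.array([1, 0, 0, 0, 1, 1])
u = union([a, b])
```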

Encoders

NuPIC contains many different encoders6. The job of an encoder is to convert raw input into a more suitable representation, i.e. a binary vector. The raw input is fed into the model using a dictionary data structure.

Useful encoders include scalar and categorical encoders, while more exotic encoders include one for Global Positioning System (GPS) coordinates, which can be used to extract information about anomalous movements. The entries in the dictionary representation of the raw input are each encoded separately and concatenated using a multi-encoder.

One property of the spatial pooler is that overlapping input patterns are mapped to the same SDR. This means that we want the encoder to encode input so that similar inputs share bits. The ScalarEncoder fulfils this property through the process illustrated in table 33.

We calculate the range of values to encode using equation 317, where vmin represents the minimum value of the input signal and vmax denotes its upper bound.

vrange = vmax − vmin        (317)

There are three mutually exclusive parameters that determine the overall size of the output from a scalar encoder: n, r and ψ. n directly represents the total number

6 A full list of all encoders can be found in the API documentation for NuPIC.


of bits in the output, and it must be greater than or equal to w, which represents the number of bits that are set to encode a single value, i.e. the "width" of the output signal7. r and ψ are specified with respect to the input, while w is specified with respect to the output. Two inputs separated by more than the radius have non-overlapping representations; two inputs separated by less than the radius will in general overlap in at least some of their bits; and two inputs separated by greater than or equal to the resolution are guaranteed to have different representations. ψ can be calculated using equation 318.

ψ = r / w        (318)

Depending on whether we want periodic behaviour or not, n is calculated a bit differently; see the scalar encoder for implementation details8.
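A non-periodic scalar encoder along these lines can be sketched so that it reproduces the table 33 example (the choice vmin = 1, vmax = 10 is our assumption, made so that n = 14 and w = 5 give a resolution ψ = 1):

```python
def encode_scalar(v, vmin=1.0, vmax=10.0, n=14, w=5):
    # Non-periodic scalar encoding: a contiguous run of w set bits whose
    # position moves with v, so nearby values share bits (cf. table 33).
    resolution = (vmax - vmin) / (n - w)  # here (10 - 1) / (14 - 5) = 1
    i = int(round((v - vmin) / resolution))
    bits = [0] * n
    for j in range(i, i + w):
        bits[j] = 1
    return bits

e1 = encode_scalar(1)    # 11111000000000
e10 = encode_scalar(10)  # 00000000011111
```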

Spatial Pooler

The spatial pooler receives a binary vector as its input from an encoder and outputs an SDR. The structure of the spatial pooler consists of an input space and a set of columns; the output SDR represents which of the columns in a region are active. Each column has synapses connected to the input space: each column is randomly and potentially connected to around 50% of the input space, the so-called "potential pool". Each synapse can connect and disconnect from the input space during learning.

The general information flow of the spatial pooler consists of the following steps. Input stimuli from lower regions lead to activation of the input space, to which each mini-column is synaptically connected; this activation, the so-called "overlap score", is computed for each column. The score is based on the total number of active neurons that try to influence that column, weighted with a "boosting factor" that increases the chance for certain columns to win more easily. A final top percentage (usually around 2% activity) of the columns with the highest influence (biggest overlap score) is chosen; columns also inhibit nearby columns, and the result of this process is a binary vector with few active bits, an SDR. This is illustrated in equation 319, where b can be either 0 or 1 and s is a value representing the score.

7 w must be odd to avoid centering problems.
8 https://github.com/numenta/nupic


[b1 b2 b3 ... bn]       (input vector)

        ·

[b11 b12 b13 ... b1n]
[b21 b22 b23 ... b2n]
[ ...               ]
[bd1 bd2 bd3 ... bdn]   (connected synapses for each column)

        =

[s1 s2 ... sn]          (overlap score)

        → inhibition →

[b1 b2 b3 ... bn]       (output SDR)        (319)

Learning in this structure is done by adjusting the permanences of the proximal dendrites to better match the input: for each winning column, increase the permanence of the synapses that correctly matched the input and decrease the permanence of the rest. We also increase the boosting factor of losing columns, giving them a bigger chance of winning next time.
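One spatial-pooler step, overlap scoring (equation 319) followed by global inhibition and the learning rule above, can be sketched as follows (sizes, sparsity and permanence values are illustrative assumptions, not NuPIC's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def spatial_pooler_step(x, potential, perm, boost,
                        connected_thresh=0.10, active_frac=0.02,
                        inc=0.1, dec=0.01):
    # Overlap score: count connected synapses on active input bits,
    # weighted by each column's boosting factor.
    connected = potential & (perm >= connected_thresh)
    scores = (connected @ x) * boost
    # Global inhibition: only the top ~2% of columns stay active.
    k = max(1, int(active_frac * len(scores)))
    winners = np.argsort(scores)[-k:]
    sdr = np.zeros(len(scores), dtype=int)
    sdr[winners] = 1
    # Learning: winning columns reinforce synapses that matched the input
    # and weaken the rest; losing columns get their boost increased.
    for c in winners:
        delta = np.where(x == 1, inc, -dec) * potential[c]
        perm[c] = np.clip(perm[c] + delta, 0.0, 1.0)
    boost[sdr == 0] *= 1.01
    return sdr

n_in, n_cols = 40, 100
potential = rng.random((n_cols, n_in)) < 0.5     # ~50% potential pool
perm = rng.random((n_cols, n_in)) * 0.2          # initial permanences
boost = np.ones(n_cols)
x = (rng.random(n_in) < 0.3).astype(int)
sdr = spatial_pooler_step(x, potential, perm, boost)
```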

Temporal Memory

The temporal memory receives an SDR from the spatial pooler, representing the active columns; the output of the temporal memory is the activity of the whole CLA region. The general idea of the TM9 is that the cells in each mini-column provide temporal context to the pattern identified by the spatial pooler. The TM's job is to form transitions between different SDRs, and it achieves this by allowing active cells to form connections to previously active cells, so that in a future setting each cell is able to predict its own activity.

Each cell in a region has several distal segments, ideally one for each pattern the cell has transitioned from; in practice each segment is connected to a couple of patterns. A cell enters a predictive state if there is enough activity on a segment, and every segment consists of a collection of connections, synapses that have been formed to a subset of previously active cells (typically around 10-15 cells).

The temporal memory receives a sparse binary vector from the spatial pooler, and the general information flow for this part of the CLA goes through two phases. The first phase has the following steps: 1) for each active column, we check whether any cell is in a predictive state; if there is a cell in a predictive

9 There are a lot of details in how the Temporal Memory is implemented; pseudo-code and more details can be found in [Numenta, 2011]. The nupic git repository is the best source for finer details: https://github.com/numenta/nupic


state, then 2) activate that particular cell. If no cell was found in a predictive state, then 3) activate all cells in that particular column, a process called bursting.

Bursting is analogous to saying: we do not know which cell to activate, because we have not seen this instance of the sequence of patterns before and are unable to put the column into the correct temporal context, so let us activate all cells to reflect this uncertainty. Bursting does not occur when the temporal context has been predicted by the CLA. Equation 320 shows the first phase.

[b1 b2 b3 ... bn]       (SP SDR)

        ·

[b11 b12 b13 ... b1n]
[b21 b22 b23 ... b2n]
[ ...               ]
[bd1 bd2 bd3 ... bdn]   (predictive state)

        =

[b11 b12 b13 ... b1n]
[b21 b22 b23 ... b2n]
[ ...               ]
[bd1 bd2 bd3 ... bdn]   (active state)        (320)

The second phase of the algorithm figures out which cells should be put into a predictive state for the next time step. This is achieved by checking every distal segment on every cell for activity above a certain threshold, i.e. checking for active segments; one active segment is enough to put a cell into a predictive state. The output from the temporal memory is a vector representing the state of all cells in that region. Equation 321 shows phase 2.

[b1 b2 b3 ... bn]       (active state)

        ·

[b11 b12 b13 ... b1n]
[ ...               ]
[bd1 bd2 bd3 ... bdn]X  (segment X)

        =

[b1 b2 b3 ... bn]X      (segment activation X)  > τ

        →

[s1 s2 s3 ... sn]       (predictive state)        (321)

If learning is turned on, we update the permanences of the synapses connected to active distal segments (in the same way as in the spatial pooler). These changes are marked as temporary until we are sure that the cell is in fact able to correctly predict something with them; once this is known, the change is either made permanent or removed. Temporarily marked changes are confirmed whenever a cell goes from inactive to active through feed-forward input (we correctly predicted the feed-forward activation); if a cell instead went from active to inactive, the change is undone.
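The two phases (equations 320 and 321) can be sketched on small cells × columns binary matrices; representing a distal segment as a binary mask over all cells is our simplification of NuPIC's actual data structures:

```python
import numpy as np

def tm_phase1(active_columns, predictive):
    # Phase 1 (equation 320): in every active column, activate the cells that
    # were in a predictive state; if none were, burst the whole column.
    active = np.zeros_like(predictive)
    for c in np.flatnonzero(active_columns):
        if predictive[:, c].any():
            active[:, c] = predictive[:, c]
        else:
            active[:, c] = 1  # bursting: temporal context unknown
    return active

def tm_phase2(active, segments, tau):
    # Phase 2 (equation 321): a cell enters a predictive state when any of
    # its distal segments is active above the threshold tau.
    flat = active.flatten()
    predictive = np.zeros_like(active)
    for (cell, col), segs in segments.items():
        for seg in segs:  # each segment is a binary mask over all cells
            if int(seg @ flat) >= tau:
                predictive[cell, col] = 1
                break
    return predictive

pred = np.array([[0, 1, 0],
                 [0, 0, 0]])                     # cell 0 in column 1 is predictive
active = tm_phase1(np.array([1, 1, 0]), pred)    # column 0 bursts
segs = {(0, 2): [np.array([1, 1, 0, 1, 0, 0])]}  # one distal segment on cell (0, 2)
nxt = tm_phase2(active, segs, tau=2)
```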


CHAPTER 3 METHOD AND MATERIALS

NuPIC Classifiers

There are many different classifiers included with NuPIC, such as a k-Nearest Neighbour (kNN) classifier and a Support Vector Machine (SVM); the OPF in particular uses a custom-built classifier called the "CLAClassifier". The purpose of this classifier is to decode predictions made by the CLA. Classification with the CLAClassifier is performed in different ways depending on how the input has been encoded.

Scalar values are decoded using a process where each cell in a CLA region is paired with two histograms. One histogram keeps track of the frequency of encountered patterns associated with each cell; the other keeps track of a moving average for each bucket. This process is illustrated in Figure 3.4.

[Figure 3.4: The CLAClassifier. Each cell in the SDR columns is paired with two histograms, one tracking likelihood and one tracking a moving average, mapping cell activity back to scalar values between minvalue and maxvalue.]

Training in NuPIC

Training the NuPIC model is done using online learning algorithms. The schema shown in Figure 3.5 is used to train and test these models. We have 7 different wind farms, so this schema is repeated for each wind farm.

This schema is slightly adjusted from the traditional way the OPF handles multi-step predictions. The default is to use 48 different models in the CLAClassifier, one for every step ahead, which would give us 48 · 7 = 336 models in total (for all the wind farms). Not only does this change match Expektra's approach, but it also reduces the memory requirements for the CLAClassifier.

[Figure 3.5: Training an OPF model. A pre-training data chunk feeds the hyperparameter setup (PSO swarming or manual setup). In the training phase, online learning is activated and the training data stream produces predictions; in the testing phase, online learning is deactivated and the testing data produces multistep predictions.]
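The schema in Figure 3.5 amounts to the following loop. The model interface here is hypothetical (the real OPF model is driven through NuPIC's API); the point is the two phases: online learning on during training, frozen during testing.

```python
# Sketch of the per-wind-farm train/test schema: hyperparameters are assumed
# fixed beforehand (PSO swarming or manual setup).
def run_schema(model, pretrain, train_stream, test_stream):
    model.enable_learning()
    for record in pretrain + train_stream:   # online learning on the stream
        model.run(record)                    # each call also yields a prediction
    model.disable_learning()                 # freeze the model for evaluation
    return [model.run(record) for record in test_stream]  # multistep predictions
```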

Input and hyperparameter selection

Finding hyperparameters for an OPF model was partly done using the custom built-in PSO algorithm and partly configured manually, mainly by ensuring that the PSO algorithm did not remove encoders. A single CLA region contains a lot of hyperparameters; these are listed with descriptions in Appendix A. Inputs to the model are date, ws, wp, u, v.


Chapter 4

Result

This chapter contains the primary results obtained in this study. Figures 4.4-4.10 show different error measurements on the test set for each wind farm. We see that Expektra's ANN model performs well over all wind farms, while the NuPIC model performs worse than Expektra but in general still better than our reference model. NuPIC is off target, with a bias error on most wind farms. The graphs also indicate that NuPIC is unable to pick up some trends, visible in the cumulated ε² graph. Expektra's model, on the other hand, shows no clear problems in cumulated ε² and is on target on all wind farms; Appendix C has been included to reflect this for different lead times.
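The error measures plotted below can be computed as in this sketch (normalized bias, mean absolute error, root mean square error, and the running sum of squared errors); since the power values are already normalized, no division by installed capacity is needed.

```python
import math

# Error measures used in the per-farm evaluation figures.
def error_measures(y_true, y_pred):
    eps = [p - t for p, t in zip(y_pred, y_true)]          # forecast errors
    n = len(eps)
    cum, total = [], 0.0
    for e in eps:                                          # cumulated eps^2 curve
        total += e * e
        cum.append(total)
    return {
        "NBIAS": sum(eps) / n,                             # systematic offset
        "NMAE": sum(abs(e) for e in eps) / n,              # mean absolute error
        "NRMSE": math.sqrt(sum(e * e for e in eps) / n),   # root mean square error
        "cumulated_eps2": cum,
    }
```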

[Figure 4.1: Left: cumulated probability of wind speed. Right: scatter diagram of the power curve, normalized power output vs. wind speed. Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).]


[Figure 4.2: Left: wind speed vs. production. Right: wind speed vs. direction. Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).]

[Figure 4.3: Left: wind speed vs. production. Right: wind speed vs. direction. Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).]


[Figure 4.4: Error measurements for Wind Farm 1: NBIAS, NRMSE, and NMAE vs. look-ahead time k (in hours), and cumulated ε² vs. time (in hours), for Expektra, NuPIC, and persistence.]


[Figure 4.5: Error measurements for Wind Farm 2: NBIAS, NRMSE, and NMAE vs. look-ahead time k (in hours), and cumulated ε² vs. time (in hours), for Expektra, NuPIC, and persistence.]


[Figure 4.6: Error measurements for Wind Farm 3: NBIAS, NRMSE, and NMAE vs. look-ahead time k (in hours), and cumulated ε² vs. time (in hours), for Expektra, NuPIC, and persistence.]


[Figure 4.7: Error measurements for Wind Farm 4: NBIAS, NRMSE, and NMAE vs. look-ahead time k (in hours), and cumulated ε² vs. time (in hours), for Expektra, NuPIC, and persistence.]


[Figure 4.8: Error measurements for Wind Farm 5: NBIAS, NRMSE, and NMAE vs. look-ahead time k (in hours), and cumulated ε² vs. time (in hours), for Expektra, NuPIC, and persistence.]


[Figure 4.9: Error measurements for Wind Farm 6: NBIAS, NRMSE, and NMAE vs. look-ahead time k (in hours), and cumulated ε² vs. time (in hours), for Expektra, NuPIC, and persistence.]


[Figure 4.10: Error measurements for Wind Farm 7: NBIAS, NRMSE, and NMAE vs. look-ahead time k (in hours), and cumulated ε² vs. time (in hours), for Expektra, NuPIC, and persistence.]


                       Wind Farm
User          1      2      3      4      5      6      7      All
Leustagos     0.145  0.138  0.168  0.144  0.158  0.133  0.140  0.146
DuckTile      0.143  0.145  0.172  0.145  0.165  0.137  0.146  0.148
MZ            0.141  0.151  0.174  0.145  0.167  0.141  0.145  0.149
Propeller     0.144  0.153  0.177  0.147  0.175  0.141  0.147  0.152
Duehee Lee    0.157  0.144  0.176  0.160  0.169  0.154  0.148  0.155
Expektra      0.165  0.158  0.184  0.164  0.179  0.153  0.153  0.165
MTU EE5260    0.161  0.172  0.193  0.162  0.192  0.156  0.160  0.168
SunWind       0.174  0.177  0.193  0.176  0.179  0.157  0.162  0.172
ymzsmsd       0.163  0.186  0.200  0.164  0.192  0.162  0.167  0.174
4138 Kalchas  0.180  0.179  0.197  0.175  0.200  0.160  0.165  0.177
NuPIC         0.243  0.254  0.264  0.310  0.290  0.224  0.240  0.264
Persistence   0.302  0.338  0.373  0.364  0.388  0.341  0.361  0.355

Table 4.1: NRMSE scores of the entries published in [Hong et al., 2014]. The NuPIC and Expektra models are added so the results can easily be compared.

4.1 Experimental results

Looking at the primary results presented in this study, we see in Table 4.1 that Expektra's ANN model achieves performance comparable to the other methods published with Hong et al. [2014]. This can be used as a starting point for further investigations. A similar approach to Expektra's is Duehee Lee's [Lee and Baldick, 2014], as both methods are based on the same neural network architecture but with different optimization techniques. Duehee Lee's network is structured slightly differently from Expektra's model, which has only one output node, whereas Duehee Lee uses five (representing five prediction steps ahead). Besides these differences, Duehee Lee also includes additional sub-models to form a prediction ensemble, blending the output of the ANN with a Gaussian Process (GP) to improve very short-term predictions. MTU EE5260 is also a neural network approach that converts meteorological forecasts to power, and its score is very close. The HTM CLA beats the persistence model but needs more work to outperform the other models in GEFCom.


4.2 Input Importance

For interpretation purposes, in order to understand the model better, an analysis of the relative input parameter importance was performed. This analysis is illustrated in Figure 4.11. Each box represents added noise to that channel, which results in a higher NRMSE score if that feature was important. A reference point, "all-channels", represents the error distribution of the model with no input replacement. We clearly see in this figure that the wind speed channel ws is the most important attribute: adding noise to this channel greatly affects the NRMSE score. We also see that the wind components u, v show little to no influence, and that the timestamp-related inputs hours and week both indicate that there are seasonal and daily trends present in the dataset.
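The noise-injection procedure behind this analysis can be sketched as follows. This is an assumed rendering of the HIPR idea (Kemp et al. [2007]), not the exact procedure used here; `model_predict` stands in for the trained network.

```python
import math
import random

# Replace one input channel with uniform noise (drawn from that channel's
# observed range) and measure the resulting NRMSE; repeat over trials to get
# an error distribution per channel, as in the importance analysis.
def input_importance(model_predict, X, y, channel, trials=30, seed=0):
    rng = random.Random(seed)
    lo = min(row[channel] for row in X)
    hi = max(row[channel] for row in X)
    scores = []
    for _ in range(trials):
        noisy = [row[:channel] + [rng.uniform(lo, hi)] + row[channel + 1:]
                 for row in X]                     # scramble one channel only
        errs = [model_predict(row) - t for row, t in zip(noisy, y)]
        scores.append(math.sqrt(sum(e * e for e in errs) / len(errs)))
    return scores  # NRMSE distribution with this channel degraded
```

An important channel yields clearly higher scores than the reference; an irrelevant one leaves the error unchanged.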

[Figure 4.11: Relative input parameter importance using HIPR, shown as the NRMSE distribution when noise is added to one channel at a time (hours, u, v, week, ws, and ws at offsets -3 to +3). "all-channels" is the reference point: the network with no channel exposed to noise. ws = wind speed; hours, week = timestamps; u, v = directional components of the wind.]


4.2.1 Adaptation and Optimization

All training for Expektra's model was performed on a MacBook Pro with an Intel Core 2 Duo processor (P8600, 2.40 GHz) and 4 GB of RAM, running a 64-bit operating system. After adapting the code to run experiments on the GEFCom dataset, the performance of the core source code provided by Expektra was evaluated. This investigation identified some slow parts of the code; after fixing these issues, mainly by using a Math.NET native library instead of the managed provider, a speed test was performed comparing the unoptimized version with the optimized one. Figure 4.12 shows the speed-up achieved during training with the LM algorithm.

[Figure 4.12: Training time for the unoptimized vs. the optimized version when training a network with 10 input neurons, 10/15/20/25 hidden neurons, and 1 output neuron. Training was done for 100 epochs using LM.]

4.3 Summary

A plot showing the average NRMSE improvement over the persistence model can be seen in Figure 4.13. Both models perform better than persistence: Expektra's model performs better than the NuPIC model over the whole forecast horizon, and NuPIC performs better than persistence towards the end of the forecast.
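The improvement score averaged in this plot can be computed per lead time as in this sketch (the percentage reduction in NRMSE relative to the persistence baseline; function name illustrative).

```python
# Percentage NRMSE improvement of a model over the persistence baseline,
# computed element-wise per look-ahead time.
def improvement_over_persistence(nrmse_model, nrmse_persistence):
    return [100.0 * (1.0 - m / p)
            for m, p in zip(nrmse_model, nrmse_persistence)]
```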

[Figure 4.13: Improvement (%) in NRMSE over persistence vs. look-ahead time (in hours), summarized as the average over all wind farms with 95% confidence intervals, for Expektra and NuPIC.]


Chapter 5

Discussion

With the exponential growth of computing power it becomes easier to study deeper and more complex networks, and with recent advances in the area of deep learning a new interest in ANNs has resurged. In practice ANNs have been used by most groups in the field of WPF, but these networks never caught on, as it was argued that the improvements made by ANNs were not usually worth the extra effort of training them [Giebel et al., 2011]. This is steadily becoming less of an issue as computation becomes cheaper and bigger networks perform better.

By using a simple MLP network we obtain results similar in performance to other models published in Hong et al. [2014]. This can be used as a starting point for further investigations. The GEFCom dataset has a limited set of input features; with access to a wider range of features, the power of neural networks could be studied more in depth, but given the small number of features in the dataset this is hard to do here.

Using persistence as a reference model was done in order to have the same baseline as the one used in GEFCom, but a better reference model should be considered in the future, such as the one presented in Nielsen et al. [1998], especially if longer forecasts are to be considered.
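For reference, the persistence baseline used throughout is trivial to state: the forecast for every lead time is simply the last observed power value.

```python
# Persistence reference model: p(t + k) = p(t) for all lead times k.
def persistence_forecast(history, horizon=48):
    last = history[-1]           # most recent observation
    return [last] * horizon      # same value repeated for every lead time
```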

5.1 Method development issues

Working with Expektra's model is very straightforward and no major issues were encountered during development, but some issues were encountered working with NuPIC.¹

¹ It should also be pointed out that having the people who developed the code on site helps, and this is probably the main reason for the little trouble encountered.


1. Installing NuPIC is difficult. This seems to be a general problem, judging by posts on the NuPIC mailing list², and an important issue to fix if they want more people to work with this model.

2. Using the built-in swarming (PSO) did not work well, especially for slightly larger datasets and for 48 prediction steps. The main problems were fitting everything into memory and that the code ran too slowly when swarming over multiple steps ahead, making it practically impossible to use. Scaling down the problem to just a few prediction steps and multiple models helped, but the swarm model kept dismissing wind speed as an important feature, which was fixed by explicitly telling it not to touch that encoder.

3. Inconsistencies between the white-paper documentation of HTM CLA and the actual implementation make it hard to understand the underlying principles of NuPIC. NuPIC implements a very complex network, and any inconsistency in documentation makes it harder to understand. The code is quite well documented, but the additional material is a bit sparse.

These issues are all understandable and are continuously being improved upon. NuPIC is still a young platform, so issues are to be expected given the complexity and research nature of the project. Training the CLAClassifier to use many prediction steps instead of just one would most likely reduce the bias error seen in the results; to investigate this, a more powerful computer is needed.

There are some differences in the input between the models. This setup was chosen because an OPF model needs to encode the value it is going to predict. SCADA data is available, so this is not an issue, but it may introduce some unfairness between NuPIC and Expektra. In the current setup NuPIC does not seem to gain anything extra from the additional information, and other models in the competition most likely used different inputs as well.

In general, NuPIC is different in the way data is fed: only the front of the signal is sent in, and temporal context is achieved through the temporal memory.

5.2 Future improvements and directions

One issue with the GEFCom dataset is that it does not contain any specific information about the location of the 7 wind farms. It is worth considering how to handle different models that are specialized for different types of terrain, as this has been shown to increase performance [Kariniotakis et al., 2006]. Another thing to investigate could be a more advanced schema for hyperparameter selection: Jursa and Rohrig [2008] show that the WPF error can be reduced by smarter use of optimization algorithms for feature selection and hyperparameter selection, which should improve performance.

² http://numenta.org/lists

In general, having more types of inputs from sources like SCADA and NWP systems could give a lot of valuable information for better predictions. It is clear that wind speed is a key feature for producing good forecasts, which is also supported by the input analysis in Section 4.2. It has been argued that the error in WPF models stems largely from wrong weather forecasts [Giebel et al., 2011], so identifying good sources of weather forecasts would be desirable for more effective model learning. Combining several different weather forecasts could also be investigated, as Nielsen et al. [2007] showed that power forecasts based on a number of different meteorological forecasts were better than those from a single source.

Regarding HTM CLA, it is worth considering implementing a custom encoder, one specifically targeted at data concerning wind farms. SCADA data could also be streamed directly into a NuPIC model, which would allow for some interesting online predictions and anomaly detection.

Training times are always an issue, especially for very large models. Both NuPIC and Expektra's model would benefit from a more parallel implementation, as these algorithms are very well suited for parallel computing. The speed-up would be of great value, so implementing a GPU version of the algorithms could be worthwhile.

5.3 Conclusions

The field of energy forecasting is still young and the need for good forecasting techniques is constantly growing. This initial study shows that part of Expektra's forecasting method can be used to forecast wind power generation. We observe that this ANN achieves performance comparable to other methods published in Hong et al. [2014]; additional work is still needed to obtain state-of-the-art results. Numenta's model beats the reference model but needs additional work before definite conclusions can be drawn about its performance. Constructing a custom encoder could be a first step in that direction.


Bibliography

Subutai Ahmad and Jeff Hawkins. Properties of sparse distributed representations and their application to hierarchical temporal memory. arXiv preprint arXiv:1503.07469, 2015.

T. C. Akinci. Short term wind speed forecasting with ANN in Batman, Turkey. Elektronika ir Elektrotechnika, 107(1):41-45, 2015.

E. Michael Azoff. Neural Network Time Series Forecasting of Financial Markets. John Wiley & Sons, Inc., 1994.

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13(1):281-305, 2012.

Daniel P. Buxhoeveden and Manuel F. Casanova. The minicolumn hypothesis in neuroscience. Brain, 125(5):935-951, 2002.

Erasmo Cadenas and Wilfrido Rivera. Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks. Renewable Energy, 34(1):274-278, 2009.

Dmitri B. Chklovskii, B. W. Mel, and K. Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782-788, 2004.

A. Costa, A. Crespo, J. Navarro, G. Lizcano, H. Madsen, and E. Feitosa. A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews, 12(6):1725-1744, August 2008. ISSN 1364-0321. doi: 10.1016/j.rser.2007.01.015.

Russ C. Eberhart and James Kennedy. A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science, volume 1, pages 39-43, New York, NY, 1995.

Shu Fan, James R. Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. IEEE Transactions on Energy Conversion, 24(2):474-482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento - a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition (EWEC 2008), 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving Hierarchical Temporal Memory-Based Trading Models. Springer, 2013.

M. A. Gaertner, C. Gallardo, C. Tejeda, N. Martínez, S. Calabria, N. Martínez, and B. Fernández. The Casandra project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. In Proc. of the 2001 European Wind Energy Conference (EWEC'01), Copenhagen, Denmark, pages 777-780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: A literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G. Lobo. Sipreólico - wind power prediction tool for the Spanish peninsular power system. In Proceedings of the CIGRÉ 40th General Session and Exhibition, Paris, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On Intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357-363, 2014.

Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359-366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41-46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585-592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694-709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference (EWEC'07), 2007.

G. Kariniotakis. Position paper on JOULE project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J. Halliday, R. Brownsword, Ignacio Marti, Ana Maria Palomares, I. Cruz, H. Madsen, T. S. Nielsen, Henrik Aa. Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power - overview of the ANEMOS project. In European Wind Energy Conference (EWEC 2006), 2006.

G. N. Kariniotakis, G. S. Stavrakakis, and E. F. Nogaret. Wind power forecasting using advanced neural networks models. IEEE Transactions on Energy Conversion, 11(4):762-767, 1996.

Stanley J. Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326-334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275-293, 2009.

Lars Landberg and Simon J. Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171-195, 1994.

S. Lang, C. Mohrlen, J. Jorgensen, B. O. Gallachóir, and E. McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International Solar Energy Society UK Section Conference, volume 85, page 89, 2006.

S. M. Lawan, W. A. W. Z. Abidin, W. Y. Chai, A. Baharun, and T. Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760-1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. IEEE Transactions on Smart Grid, 5(1):501-510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets, 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164-168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313-2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154-3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa. Nielsen, and Torben S. Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475-489, 2005.

Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431-441, 1963.

Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115-133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons. 1969.

Marvin Lee Minsky and Oliver G. Selfridge. Learning in Random Nets. MIT Lincoln Laboratory, 1960.

Jorge J. Moré. The Levenberg-Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105-116. Springer, 1978.

Henrik Aa. Nielsen, Torben S. Nielsen, Henrik Madsen, Maria J. Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471-482, 2007.

Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29-34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power, 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159-167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms, v0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33-57, 2007.

A. Rodrigues, J. A. Peças Lopes, P. Miranda, L. Palma, C. Monteiro, R. Bessa, J. Sousa, C. Rodrigues, and J. Matos. EPREV - a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference (EWEC), volume 7, 2007.

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5:3, 1988.

Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85-117, 2015.

S. Sinkevicius, R. Simutis, and V. Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91-96, 2011.

Ke-Sheng Wang, Vishal S. Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61-69, 2014.

WWEA. 2014 half-year report, pp. 1-8. Technical report, WWEA, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365-376, 2013.

Hao Yu and Bogdan M. Wilamowski. Levenberg-Marquardt training. Industrial Electronics Handbook, 5:12-1, 2011.


Appendix A

Hyper-parameters

Parameters for the spatial pooler

Alias                 Default  Description
columnCount           -        The number of cell columns in a cortical region.
globalInhibition      false    If true, in the inhibition phase the winning columns are selected as the most active columns from the region as a whole.
numActivePerInhArea   10       The maximum number of active columns per inhibition area.
synPermActiveInc      0.1      The amount by which an active synapse is incremented in each round, specified as a percent of a fully grown synapse.
synPermConnected      0.10     Controls the threshold at which synapses are connected.
synPermInactiveDec    0.01     The amount by which an inactive synapse is decremented in each round.
potentialRadius       16       Determines the extent of the input that each column can potentially be connected to.

Table A.1: Configuration parameters for the spatial pooler.


Parameters for the scalar encoder

Alias        Symbol  Description
w            w       Number of bits to set in the output.
minval       vmin    The lower bound of the input value.
maxval       vmax    The upper bound of the input value.
n            n       Number of bits in the representation (n must be > w).
radius       r       Inputs separated by more than or equal to this distance will have non-overlapping representations.
resolution   ψ       Inputs separated by more than or equal to this distance will have different representations.

Table A.2: Configuration parameters for the scalar encoder.


Parameters for the temporal memory

Alias                   Default  Description
activationThreshold     12       Activation threshold for segments.
cellsPerColumn          32       Number of cells per column.
columnCount             2048     The number of cell columns in a cortical region.
globalDecay             0.10     Decrements all synapses a little bit all the time.
initialPerm             0.11     Initial permanence value for a synapse.
inputWidth              -        Size of the input.
maxAge                  100000   Controls global decay. Global decay will only decay segments that have not been activated for maxAge iterations, and the global decay loop only runs every maxAge iterations.
maxSegmentsPerCell      -        The maximum number of segments a cell can have.
maxSynapsesPerSegment   -        The maximum number of synapses a segment can have.
minThreshold            8        The minimum required activity for a segment to learn.
newSynapseCount         15       The maximum number of synapses added to a segment during learning.
permanenceDec           0.10     How much permanence is removed from synapses when learning occurs.
permanenceInc           0.10     How much permanence is added to synapses when learning occurs.
temporalImp             cpp/py   Controls which temporal memory implementation to use.

Table A.3: Configuration parameters for the temporal memory.


Appendix B

Wind characteristics

Figure B.1: Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components. The intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Figure B.2: Wind characteristics for WF 3-7, GEFCom dataset, showing zonal and meridional wind components. The intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Appendix C

Error Distribution


[Figure C.1: Error distributions for WF 1 using NuPIC at lead times 1, 10, 20, 30, 40, and 48 hours.]

Figure C.2: Error distribution for different lead times, WF 2 (NuPIC model; panels for lead times 1, 10, 20, 30, 40 and 48 hours).

Figure C.3: Error distribution for different lead times, WF 3 (NuPIC model; panels for lead times 1, 10, 20, 30, 40 and 48 hours).

Figure C.4: Error distribution for different lead times, WF 4 (NuPIC model; panels for lead times 1, 10, 20, 30, 40 and 48 hours).

Figure C.5: Error distribution for different lead times, WF 5 (NuPIC model; panels for lead times 1, 10, 20, 30, 40 and 48 hours).

Figure C.6: Error distribution for different lead times, WF 6 (NuPIC model; panels for lead times 1, 10, 20, 30, 40 and 48 hours).

Figure C.7: Error distribution for different lead times, WF 7 (NuPIC model; panels for lead times 1, 10, 20, 30, 40 and 48 hours).

Figure C.8: Error distribution for different lead times, WF 1 (Expektra model; panels for lead times 1, 10, 20, 30, 40 and 48 hours).

Figure C.9: Error distribution for different lead times, WF 2 (Expektra model; panels for lead times 1, 10, 20, 30, 40 and 48 hours).

Figure C.10: Error distribution for different lead times, WF 3 (Expektra model; panels for lead times 1, 10, 20, 30, 40 and 48 hours).

Figure C.11: Error distribution for different lead times, WF 4 (Expektra model; panels for lead times 1, 10, 20, 30, 40 and 48 hours).

Figure C.12: Error distribution for different lead times, WF 5 (Expektra model; panels for lead times 1, 10, 20, 30, 40 and 48 hours).

Figure C.13: Error distribution for different lead times, WF 6 (Expektra model; panels for lead times 1, 10, 20, 30, 40 and 48 hours).

Figure C.14: Error distribution for different lead times, WF 7 (Expektra model; panels for lead times 1, 10, 20, 30, 40 and 48 hours).


List of Figures

2.1  A figure that presents the general outline when forecasting using the statistical approach . . . 6
2.2  A figure that presents the general steps when forecasting using a physical model . . . 7
3.1  The perceptron . . . 20
3.2  Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of H hidden layers, as well as M input connections. Each edge seen in this graph has a weight wij associated with it . . . 21
3.3  Information flow of a single-region predictive model created with the OPF . . . 23
3.4  The CLAClassifier . . . 28
3.5  Training an OPF model . . . 29
4.1  Left diagram: cumulative probability of wind speed. Right diagram: scatter diagram of the power curve, production vs. wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) . . . 31
4.2  Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) . . . 32
4.3  Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) . . . 32
4.4  Different error measurements for WF 1 . . . 33
4.5  Different error measurements for WF 2 . . . 34
4.6  Different error measurements for WF 3 . . . 35
4.7  Different error measurements for WF 4 . . . 36
4.8  Different error measurements for WF 5 . . . 37
4.9  Different error measurements for WF 6 . . . 38
4.10 Different error measurements for WF 7 . . . 39
4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point; the reference point is the network when no channel has been exposed to noise. ws = wind speed, hours/week = timestamps, u, v is a directional vector of the wind . . . 41
4.12 Plot showing the unoptimized version vs. the optimized when training a network with 10 input neurons; 10, 15, 20 or 25 hidden neurons; and 1 output neuron. Training was done for 100 epochs using LM . . . 42
4.13 Summarized average improvement over all wind farms with 95% confidence intervals . . . 43
B.1  Wind characteristics for WF 1 and WF 2, GEFCom dataset . . . 59
B.2  Wind characteristics for WF 3-7, GEFCom dataset . . . 60
C.1  Error distribution for different lead times, WF 1 . . . 62
C.2  Error distribution for different lead times, WF 2 . . . 63
C.3  Error distribution for different lead times, WF 3 . . . 64
C.4  Error distribution for different lead times, WF 4 . . . 65
C.5  Error distribution for different lead times, WF 5 . . . 66
C.6  Error distribution for different lead times, WF 6 . . . 67
C.7  Error distribution for different lead times, WF 7 . . . 68
C.8  Error distribution for different lead times, WF 1 . . . 69
C.9  Error distribution for different lead times, WF 2 . . . 70
C.10 Error distribution for different lead times, WF 3 . . . 71
C.11 Error distribution for different lead times, WF 4 . . . 72
C.12 Error distribution for different lead times, WF 5 . . . 73
C.13 Error distribution for different lead times, WF 6 . . . 74

C.14 Error distribution for different lead times, WF 7 . . . 75

List of Tables

3.1  The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00; the second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods of missing data, power observations are available for updating the models . . . 16
3.2  Preprocessed features from the dataset. The Forecast category represents the forecasts given by the NWP; multiple forecasts exist for a given date, and the latest issued forecast available provides the features we use in training and testing . . . 17
3.3  Example, where n = 14, r = 5, ψ = 1, of various encoded scalar values using a ScalarEncoder . . . 24
4.1  NRMSE scores of the entries published in [Hong et al., 2014]. The NuPIC model and the Expektra model are added so we can easily compare the results . . . 40
A.1  Configuration parameters for the encoder . . . 55
A.2  Configuration parameters for the spatial pooler . . . 56
A.3  Configuration parameters for the temporal memory . . . 57


www.kth.se


3.5 NEURAL NETWORKS

Figure 3.2: Architectural graph of the neural network that will produce a single output value. The input layer receives the signals hours, u, v, week, ws, ws-1, ws-2, ws+1 and ws+2; the hidden layers use tanh units and the output layer a linear unit, with a bias signal feeding each layer. The network consists of a collection of hidden neurons in each of the H hidden layers, as well as M input connections. Each edge seen in this graph has a weight wij associated with it.

The performance of neural networks is generally improved if the data is normalised, because using the original data directly can cause convergence problems. Normalization is done using the mapminmax function seen in equation 3.12:

y = (ymax - ymin) · (x - xmin) / (xmax - xmin) + ymin    (3.12)

where ymax is the maximum of the specified range, in this case 1, and ymin is the minimum, -1; x is the value to be scaled, and xmax and xmin are the maximum and minimum of the values to be scaled.
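As a concrete illustration, the mapping in equation 3.12 can be written as a small helper (a sketch; the name mirrors MATLAB's mapminmax, but this is not that library routine):

```python
def mapminmax(x, x_min, x_max, y_min=-1.0, y_max=1.0):
    """Scale x from [x_min, x_max] into [y_min, y_max] (equation 3.12)."""
    return (y_max - y_min) * (x - x_min) / (x_max - x_min) + y_min
```

Note that x_min and x_max must come from the training data and be reused unchanged when scaling the validation and test sets.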

The forecasting model consists of 7 different networks, one for each wind farm. These networks are trained on the first section of the dataset, before the test period. A random 60/20/20 split is created, where 60% of the available data is used for training, 20% of the data is used to validate the network, and the remaining 20% is set aside as a hold-out set for the hyperparameters. The input features fed into the models (see table 3.2) are ws, u, v, hours, ws+1, ws+2, ws-1, ws-2, where ws+x and ws-x denote a time shift of x; we can use ws+x as input into the model because ws is a forecast in itself.
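A minimal sketch of how such time-shifted features and a 60/20/20 split might be constructed (hypothetical helper names; the thesis' actual preprocessing code is not shown, and the shuffle used for the random split is omitted here):

```python
def time_shifted_rows(ws, shifts=(-2, -1, 0, 1, 2)):
    """Build rows [ws-2, ws-1, ws, ws+1, ws+2] from a forecast series;
    boundary timestamps lacking a neighbour are dropped."""
    lo, hi = -min(shifts), max(shifts)
    return [[ws[t + s] for s in shifts] for t in range(lo, len(ws) - hi)]

def split_60_20_20(rows):
    """60/20/20 split into training, validation and hyperparameter hold-out."""
    a, b = int(0.6 * len(rows)), int(0.8 * len(rows))
    return rows[:a], rows[a:b], rows[b:]
```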

Activation Functions

The computation done for each neuron in the multilayer perceptron requires knowledge of the derivative of the activation function; in other words, the activation function we pick needs to be differentiable. In this thesis we use the following two activation functions: the hyperbolic tangent function, seen in equation 3.13,

f(s) = tanh(s) = (e^s - e^-s) / (e^s + e^-s)    (3.13)

and the saturating linear transfer function, seen in equation 3.14:

f(s) = +1 if s >= 1;  s if -1 < s < 1;  -1 if s <= -1    (3.14)
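Sketched in code, the two transfer functions of equations 3.13 and 3.14 are:

```python
import math

def tanh_activation(s):
    """Hyperbolic tangent, equation 3.13."""
    return math.tanh(s)

def saturating_linear(s):
    """Saturating linear transfer function, equation 3.14."""
    if s >= 1:
        return 1.0
    if s <= -1:
        return -1.0
    return float(s)
```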

Hyperparameter optimization

In order to obtain the hyperparameters necessary for each wind farm's model, a random hyperparameter search was performed for all models. A hold-out validation set was used to pick the best hyperparameters. Random search has been shown to work better than grid search when not all parameters are equally important [Bergstra and Bengio, 2012].
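A random search of this kind can be sketched as follows (train_and_validate and the parameter ranges are placeholders, not the thesis' actual training code):

```python
import random

def random_search(train_and_validate, space, n_trials=20, seed=1):
    """Sample settings uniformly from `space` and keep the one with the
    lowest validation error, in the spirit of [Bergstra and Bengio, 2012]."""
    rng = random.Random(seed)
    best_params, best_err = None, float("inf")
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in space.items()}
        err = train_and_validate(params)
        if err < best_err:
            best_params, best_err = params, err
    return best_params, best_err
```

For example, space could be {"hidden_neurons": [10, 15, 20, 25], "epochs": [50, 100]}, with train_and_validate returning the hold-out validation error for a trained network.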

3.5.2 Numenta Platform for Intelligent Computing

This section describes the key principles introduced with NuPIC. It explains in general terms the theory behind the Online Prediction Framework (OPF)⁴, which uses the CLA and HTM algorithms; the OPF works as an API to create predictive models. The OPF consists of 5 major types of components: encoders, Spatial Pooler (SP), Temporal Memory (TM), Temporal Pooler (TP), and classifiers. Together these components construct a single CLA region⁵, and figure 3.3 demonstrates the information flow through a single region.

4. The OPF is used with Numenta's commercial product Grok.
5. Currently, models created with the OPF do not use a TP, nor does this client allow creation of a hierarchy of regions (i.e., what we would call an HTM); it is only possible to create one region.


Another central concept in NuPIC is the Sparse Distributed Representation (SDR), which refers to the activation of a small percentage of the neurons at any given time. This neuronal activity is represented as an n-dimensional sparse binary vector x = [b0, ..., bn] with around 2% active cells. The outputs from the spatial pooler and the temporal memory are both SDRs, while the output from the encoder does not enforce this and is just a normal binary vector. A general overview of the properties of SDRs has been given by Ahmad and Hawkins [2015].

Figure 3.3: Information flow of a single-region predictive model created with the OPF: Encoder → Spatial Pooler → Temporal Memory → Classifier.

The overlap between two different SDRs is defined by

o(x, y) = x · y    (3.15)

and a match between two SDRs is defined by

m(x, y) := o(x, y) >= θ    (3.16)

where θ is set such that θ <= ||x||₁ and θ <= ||y||₁.

Value | Encoding (n = 14, w = 5)
1     | 11111000000000
2     | 01111100000000
10    | 00000000011111

Table 3.3: Example, where n = 14, r = 5, ψ = 1, of various encoded scalar values using a ScalarEncoder.

An interesting property of SDRs, used multiple times inside the temporal memory and especially for predictions, is the fact that a set of fixed-size SDRs can be reliably stored by taking their union as a single pattern; the boolean OR operator is used to create the new vector from the set of SDRs. The downside of storing patterns this way is that the more patterns we store, the bigger the probability of false positives.
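The overlap, match and union operations on SDRs (equations 3.15 and 3.16) reduce to a few lines over binary vectors; a sketch:

```python
def overlap(x, y):
    """Overlap score o(x, y) = x · y for binary vectors (equation 3.15)."""
    return sum(a & b for a, b in zip(x, y))

def match(x, y, theta):
    """m(x, y) := o(x, y) >= theta (equation 3.16)."""
    return overlap(x, y) >= theta

def union(sdrs):
    """Store a set of SDRs as one pattern via boolean OR."""
    out = [0] * len(sdrs[0])
    for s in sdrs:
        out = [a | b for a, b in zip(out, s)]
    return out
```

Any member of the set still matches the union, which is what makes union-based storage useful for checking predictions, at the cost of a growing false-positive rate as more patterns are stored.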

Encoders

NuPIC contains many different encoders⁶; the job of an encoder is to convert raw input into a more suitable representation (i.e., a binary vector). The raw input is fed into the model using a dictionary data structure.

Useful encoders include scalar and categorical encoders, while more exotic encoders include one for Global Positioning System (GPS) coordinates; that encoder can be used to extract information about anomalous movements. The entries of the raw-input dictionary are each encoded separately and concatenated using a multi-encoder.

One property of the spatial pooler is that overlapping input patterns are mapped to the same SDR. This means that we want the encoder to encode input so that similar inputs share bits. The ScalarEncoder fulfils this property by the process illustrated in table 3.3.

We calculate the range of values to encode using equation 3.17, where vmin represents the minimum value of the input signal and vmax denotes its upper bound:

vrange = vmax - vmin    (3.17)

6. A full list of all encoders can be found in the API documentation for NuPIC.

There are three mutually exclusive parameters that determine the overall size of the output from a scalar encoder: (n, r, ψ). n directly represents the total number of bits in the output, and it must be bigger than or equal to w, which represents the number of bits that are set to encode a single value, i.e., the "width" of the output signal⁷. r and ψ are specified with respect to the input, while w is specified with respect to the output. Two inputs separated by more than the radius r have non-overlapping representations; two inputs separated by less than the radius will in general overlap in at least some of their bits. Two inputs separated by greater than or equal to the resolution ψ are guaranteed to have different representations. ψ can be calculated using equation 3.18:

ψ = r / w    (3.18)

Depending on whether we want periodic behaviour or not, n is calculated a bit differently; see the scalar encoder for implementation details⁸.
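A non-periodic scalar encoder in the spirit of table 3.3 can be sketched as follows (a simplification of NuPIC's ScalarEncoder, not its actual implementation):

```python
def encode_scalar(v, v_min, v_max, n=14, w=5):
    """Place a run of w active bits among n total bits according to where
    v falls in [v_min, v_max]; nearby values share bits (cf. table 3.3)."""
    v = max(v_min, min(v, v_max))        # clip to the encoder's range
    buckets = n - w + 1                  # number of distinct bit positions
    i = int((v - v_min) * buckets / (v_max - v_min + 1e-9))
    i = min(i, buckets - 1)
    return [1 if i <= k < i + w else 0 for k in range(n)]
```

With n = 14 and w = 5 over the range [1, 10], the values 1, 2 and 10 encode to 11111000000000, 01111100000000 and 00000000011111, matching table 3.3; the encodings of 1 and 2 overlap in 4 bits, while 1 and 10 share none.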

Spatial Pooler

The spatial pooler receives a binary vector as its input from an encoder and outputs an SDR. The structure of the spatial pooler consists of an input space and a set of columns; the output SDR represents which of the columns in a region are active. Each column has synapses connected to the input space: the SP consists of around 50% randomly and potentially connected synapses, the so-called "potential pool", and each synapse will connect and disconnect with the input space during learning.

The general information flow of the spatial pooler consists of the following steps. Input stimuli from lower regions lead to activation of the input space, to which each mini-column is synaptically connected. This activation, the so-called "overlap score", is computed for each column: the score is the total sum of the active inputs that try to influence that column, weighted with a "boosting factor" that increases the chance for certain columns to win more easily. A final top percentage (usually around 2% activity) of the columns with the highest influence (biggest overlap score) is chosen; columns also inhibit nearby columns, and the result of this process is a binary vector with few active bits: an SDR. This is illustrated in equation 3.19, where each b can either be 0 or 1 and s is a value representing the score.

7. w must be odd to avoid centering problems.
8. https://github.com/numenta/nupic


    input vector [b1 b2 b3 ... bn]  ·  connected synapses per column [bij] (d × n)
        = overlap score [s1 s2 ... sn]  --(inhibition)-->  output SDR [b1 b2 ... bn]    (3.19)

Learning in this structure is done by adjusting the permanences of the proximal dendrite to better match the input: for each winning column, increase the permanence of the synapses that correctly matched the input and decrease the permanence of the rest. We also increase the boosting factor of losing columns to give them a bigger chance of winning next time.
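The overlap-score-and-inhibition step of equation 3.19 can be sketched as a toy function (real inhibition in NuPIC is local and the boost and permanence updates are omitted here):

```python
def spatial_pooler_step(input_bits, columns, boost, k=2):
    """Compute each column's boosted overlap with the input, then keep
    roughly the top-k columns (global inhibition); ties may admit extras."""
    scores = [b * sum(i & s for i, s in zip(input_bits, col))
              for col, b in zip(columns, boost)]
    cutoff = sorted(scores, reverse=True)[k - 1]
    return [1 if s >= cutoff and s > 0 else 0 for s in scores]
```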

Temporal Memory

The temporal memory receives an SDR from the spatial pooler which represents the active columns; the output of the temporal memory is the activity of the whole CLA region. The general idea of the TM⁹ is that the cells in each mini-column provide temporal context to the pattern identified by the spatial pooler. The TM's job is to form transitions between different SDRs, and it achieves this by allowing active cells to form connections to previously active cells, so that in a future setting each cell is able to predict its own activity.

Each cell in a region has several distal segments, ideally one for each pattern the cell has transitioned from; in practice these segments each connect to a couple of patterns. A cell enters a predictive state if there is enough activity on a segment, and every segment consists of a collection of connections, synapses that have been formed to a subset of previously active cells (typically around 10-15 cells).

The Temporal Memory receives a sparse binary vector from the spatial pooler, and the general information flow for this part of the CLA goes through two phases. The first phase has the following steps: 1) for each active column, check whether any cell is in a predictive state; if there is such a cell, 2) activate that particular cell. If no cell was found in a predictive state, 3) activate all cells in that particular column, a process called bursting.

9. There are a lot of details in how the Temporal Memory is implemented; pseudo-code and more details can be found in [Numenta, 2011]. The nupic git repository is the best source for finer details: https://github.com/numenta/nupic

Bursting is analogous to saying: we do not know which cell to activate, because we have not seen this instance of the sequence of patterns before and we are unable to put the column into the correct temporal context, so let us activate all cells, reflecting this uncertainty. Bursting does not occur when the temporal context has been predicted by the CLA. Equation 3.20 shows the first phase.

    SP SDR [b1 b2 b3 ... bn]  applied to  predictive state [bij] (d × n)
        = active state [bij] (d × n)    (3.20)

The second phase of the algorithm figures out which cells should be put into a predictive state for the next time step. This is achieved by checking every distal segment on every cell for activity above a certain threshold, i.e., checking for active segments; one active segment is enough to put a cell into a predictive state. The output from the temporal memory is a vector representing the state of all cells in that region. Equation 3.21 shows phase 2.

    active state [b1 b2 b3 ... bn]  ·  segment X [bij] (d × n)
        = segment activation X [b1 b2 ... bn];  if > τ  -->  predictive state [s1 s2 ... sn]    (3.21)

If learning is turned on, the permanences of the synapses connected to active distal segments are updated (in the same way as in the spatial pooler). These changes are marked as temporary until we are sure that the cell is in fact able to correctly predict something with them; after that fact is known, the changes either become permanent or are removed. Temporarily marked changes are made permanent whenever a cell goes from being inactive to active from feed-forward input (update the permanences, as we correctly predicted the feed-forward activation); if a cell instead went from active to inactive, the changes are undone.


NuPIC Classifiers

There are many different classifiers included with NuPIC, such as a k-Nearest Neighbour (kNN) classifier and a Support Vector Machine (SVM); the OPF in particular uses a custom-built classifier called the "CLAClassifier". The purpose of this classifier is to decode predictions made by the CLA. Classification with the CLAClassifier is performed in different ways depending on how the input has been encoded.

Scalar values are decoded using a process where each cell in a CLA region is paired with two histograms: one histogram keeps track of the frequency of encountered patterns associated with each cell, and the other keeps track of a moving average for each bucket. This process is illustrated in figure 3.4.

Figure 3.4: The CLAClassifier. The SDR's columns (column 1 ... column N) are paired with two histograms per cell, spanning the min value to the max value: one tracking likelihood and one tracking a moving average.

Training in NuPIC

Training the NuPIC model is done using online learning algorithms. The schema seen in figure 3.5 is used to train and test these models; we have 7 different wind farms, so this schema is repeated for each respective wind farm.

This schema is slightly adjusted from the traditional way the OPF handles multi-step predictions. The default way of doing this is to use 48 different models, one for every step ahead in the CLAClassifier, which would give us 48 · 7 = 336 models in total (for all the wind farms). Not only does this change match Expektra's approach, but it also reduces the memory requirements of the CLAClassifier.

Figure 3.5: Training an OPF model. Training phase: a pre-training data chunk drives the hyperparameter setup (PSO swarming or manual setup); the OPF model then consumes the training data stream with online learning activated, producing predictions. Testing phase: the model consumes the testing data with online learning deactivated, producing multi-step predictions.

Input and hyperparameter selection

Finding hyperparameters for an OPF model was partly done using the custom built-in PSO algorithm and partly configured manually, mainly by ensuring that the PSO algorithm did not remove encoders. A single CLA region contains a lot of hyperparameters; these parameters have been included, with corresponding descriptions, in appendix A. Inputs to the model are date, ws, wp, u, v.


Chapter 4

Result

This chapter contains the primary results obtained in this study. Figures 4.4-4.10 contain different error measurements on the test set for each wind farm. We see that the Expektra ANN model is able to perform well over all wind farms, while the NuPIC model performs worse than Expektra but in general still better than our reference model. NuPIC is off target, with a bias error on most wind farms; the graphs also indicate that NuPIC is unable to pick up some trends in the cumulated ε² graph. The Expektra model, on the other hand, shows no clear problems in cumulated ε² and is on target on all wind farms; appendix C has been included to reflect this for different lead times.

Figure 4.1: Left diagram: cumulative probability of wind speed. Right diagram: scatter diagram of the power curve, production vs. wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


Figure 4.2: Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).

Figure 4.3: Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


Figure 4.4: Different error measurements (NBIAS, NRMSE and NMAE over look-ahead time, and cumulated ε² over time) for WF 1, comparing Expektra, NuPIC and Persistence.


Figure 4.5: Different error measurements (NBIAS, NRMSE and NMAE over look-ahead time, and cumulated ε² over time) for WF 2, comparing Expektra, NuPIC and Persistence.


Figure 4.6: Different error measurements (NBIAS, NRMSE and NMAE over look-ahead time, and cumulated ε² over time) for WF 3, comparing Expektra, NuPIC and Persistence.


Figure 4.7: Different error measurements (NBIAS, NRMSE and NMAE over look-ahead time, and cumulated ε² over time) for WF 4, comparing Expektra, NuPIC and Persistence.


Figure 4.8: Different error measurements (NBIAS, NRMSE and NMAE over look-ahead time, and cumulated ε² over time) for WF 5, comparing Expektra, NuPIC and Persistence.


Figure 4.9: Different error measurements (NBIAS, NRMSE and NMAE over look-ahead time, and cumulated ε² over time) for WF 6, comparing Expektra, NuPIC and Persistence.


Figure 4.10: Different error measurements (NBIAS, NRMSE and NMAE over look-ahead time, and cumulated ε² over time) for WF 7, comparing Expektra, NuPIC and Persistence.


                     Wind Farm
User          1      2      3      4      5      6      7      All

Leustagos     0.145  0.138  0.168  0.144  0.158  0.133  0.140  0.146
DuckTile      0.143  0.145  0.172  0.145  0.165  0.137  0.146  0.148
MZ            0.141  0.151  0.174  0.145  0.167  0.141  0.145  0.149
Propeller     0.144  0.153  0.177  0.147  0.175  0.141  0.147  0.152
Duehee Lee    0.157  0.144  0.176  0.160  0.169  0.154  0.148  0.155
Expektra      0.165  0.158  0.184  0.164  0.179  0.153  0.153  0.165
MTU EE5260    0.161  0.172  0.193  0.162  0.192  0.156  0.160  0.168
SunWind       0.174  0.177  0.193  0.176  0.179  0.157  0.162  0.172
ymzsmsd       0.163  0.186  0.200  0.164  0.192  0.162  0.167  0.174
4138 Kalchas  0.180  0.179  0.197  0.175  0.200  0.160  0.165  0.177
NuPIC         0.243  0.254  0.264  0.310  0.290  0.224  0.240  0.264

Persistence   0.302  0.338  0.373  0.364  0.388  0.341  0.361  0.355

Table 4.1: NRMSE scores of the entries published in [Hong et al., 2014]. The NuPIC and Expektra models have been added so we can easily compare the results.

4.1 Experimental results

Looking at the primary results presented in this study, we see in Table 4.1 that Expektra's ANN model achieves performance comparable to the other methods published with Hong et al. [2014]. This can be used as a starting point for further investigations. A similar approach to Expektra's is that of Duehee Lee [Lee and Baldick, 2014], as both methods are based on the same neural network architecture but with different optimization techniques. Duehee Lee's network is structured slightly differently from Expektra's model, which has only one output node, whereas Duehee Lee uses five (representing five predictions ahead). Besides these differences, Duehee Lee also includes additional sub-models to create a prediction ensemble, blending the output of the ANN with a Gaussian Process (GP) to improve very short-term predictions. MTU EE5260 is also a neural network approach that converts meteorological forecasts to power, and here the score is very close. The HTM CLA beats the persistence model but needs more work to outperform the other models in GEFCom.


4.2 Input Importance

For interpretation purposes, in order to understand the model better, an analysis of the relative input parameter importance was performed. This analysis is illustrated in Figure 4.11. Each box represents added noise to that channel, which results in a higher NRMSE score if that feature was important. A reference point, "all-channels", represents the error distribution of the model with no input replacement. We clearly see in this figure that the wind speed channel ws is the most important attribute; adding noise to this channel greatly affects the NRMSE score. We also see that the wind components u, v show little to no influence, and that the timestamp-related inputs hours and week both indicate that there are seasonal and daily trends present in the dataset.
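This style of analysis can be sketched as follows; a perturbation-based importance score in the spirit of HIPR [Kemp et al., 2007], not the exact procedure used here, where `predict` is any trained model's prediction function.

```python
import numpy as np

def hipr_importance(predict, X, y, n_repeats=10, seed=0):
    """Replace one input channel at a time with uniform noise over its
    observed range and record how much the NRMSE rises above the
    unperturbed "all-channels" reference point."""
    rng = np.random.default_rng(seed)
    def nrmse(p):
        return float(np.sqrt(np.mean((p - y) ** 2)))
    baseline = nrmse(predict(X))          # the "all-channels" reference
    scores = {}
    for j in range(X.shape[1]):
        errs = []
        for _ in range(n_repeats):
            Xn = X.copy()
            Xn[:, j] = rng.uniform(X[:, j].min(), X[:, j].max(), size=len(X))
            errs.append(nrmse(predict(Xn)))
        scores[j] = float(np.mean(errs)) - baseline
    return baseline, scores
```

A channel the model truly depends on (such as ws here) gets a large score; an ignored channel scores near zero.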

[Boxplots of NRMSE when noise is added to each channel: all-channels, hours, u, v, week, ws, ws−1, ws−2, ws−3, ws+1, ws+2, ws+3.]

Figure 4.11: Relative input parameter importance using HIPR. "all-channels" reflects the reference point: the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v = the directional vector of the wind.


4.2.1 Adaptation and Optimization

All training for Expektra's model was performed on a MacBook Pro with an Intel Core 2 Duo processor (P8600, 2.40 GHz) and a 64-bit operating system with 4 GB of installed RAM. After adapting the code to run experiments using the GEFCom dataset, an investigation was performed to evaluate the performance of the core source code provided by Expektra. This investigation identified some slow parts of the code. After optimizing these issues, mainly by using a MathNET native library instead of the managed provider, a speed test was performed comparing the unoptimized version with the optimized one. Figure 4.12 shows the speed-up achieved during training using the LM algorithm.
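As background for the training step being timed, the LM damping loop can be sketched as below; a minimal numpy illustration of Levenberg-Marquardt [Levenberg, 1944; Marquardt, 1963], not Expektra's actual C#/MathNET implementation. Here `f` returns model outputs for parameters `theta` and `jac` returns the Jacobian of the outputs with respect to the parameters.

```python
import numpy as np

def levenberg_marquardt(f, jac, theta, y, n_iter=50, lam=1e-2):
    """One common formulation of the LM loop: try a damped Gauss-Newton
    step; accept it and reduce damping if the squared error improves,
    otherwise increase damping and retry."""
    theta = np.asarray(theta, dtype=float)
    for _ in range(n_iter):
        r = f(theta) - y                        # residuals
        J = jac(theta)
        A = J.T @ J + lam * np.eye(len(theta))  # damped normal equations
        step = np.linalg.solve(A, J.T @ r)
        candidate = theta - step
        if np.sum((f(candidate) - y) ** 2) < np.sum(r ** 2):
            theta, lam = candidate, lam * 0.7   # accept, trust the model more
        else:
            lam *= 2.0                          # reject, damp harder
    return theta
```

The J^T J product is exactly the kind of dense linear algebra where a native BLAS backend, as in the MathNET optimization above, pays off.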

Figure 4.12: Training time, unoptimized ("Normal") vs. optimized version, when training a network with 10 input neurons; 10, 15, 20 or 25 hidden neurons; and 1 output neuron. Training was done for 100 epochs using LM.

4.3 Summary

A plot showing the average NRMSE improvement over the persistence model can be seen in Figure 4.13. We see that both models perform better than persistence. Expektra's model performs better than the NuPIC model over the whole forecast


horizon, and NuPIC performs better than persistence towards the end of the forecast.
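The improvement measure plotted here is a simple relative NRMSE reduction; a small sketch, with the overall ("All") scores from Table 4.1 used in the example.

```python
def improvement_over_persistence(nrmse_model, nrmse_persistence):
    """Percentage NRMSE improvement of a model over the persistence
    reference: 100 * (ref - model) / ref."""
    return 100.0 * (nrmse_persistence - nrmse_model) / nrmse_persistence
```

For example, Expektra's overall 0.165 against persistence's 0.355 corresponds to about a 53.5% improvement.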


Figure 4.13: Summarized average improvement over all wind farms, with 95% confidence intervals.


Chapter 5

Discussion

With the exponential growth of computing power it becomes easier to study deeper and more complex networks, and with recent advances in the area of deep learning a new interest in ANNs has resurged. In practice, ANNs have been used by most groups in the field of WPF, but these networks never caught on, as it was argued that the improvements made by ANNs were not usually enough to justify the extra effort of training these networks [Giebel et al., 2011]. This is steadily becoming less of an issue as computation becomes cheaper and bigger networks perform better.

By using a simple MLP network we observe that we are able to obtain results similar in performance to other models published in Hong et al. [2014]. This can be used as a starting point for further investigations. The GEFCom dataset has a limited number of input features; if we had access to a wider range of them, the power of neural networks could be studied more in depth. Given the number of features in the dataset, this is harder to do.

Using persistence as a reference model was done in order to have the same baseline as the one used in GEFCom, but a better reference model should be considered in the future, such as the one presented in Nielsen et al. [1998], especially if longer forecasts are to be considered.
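Both reference models are simple to state. A sketch, where the "new reference" of Nielsen et al. [1998] blends persistence with the series mean using the estimated lag-k autocorrelation, so long horizons fall back towards the mean:

```python
import numpy as np

def persistence_forecast(p, k):
    """Persistence: the forecast for time t + k is the value observed at t."""
    p = np.asarray(p, dtype=float)
    return p[:-k]

def new_reference_forecast(p, k):
    """Sketch of the 'new reference' model: a_k * p(t) + (1 - a_k) * mean,
    with a_k the lag-k autocorrelation estimated from the series."""
    p = np.asarray(p, dtype=float)
    mean = p.mean()
    d = p - mean
    a_k = np.corrcoef(d[:-k], d[k:])[0, 1]   # estimated lag-k autocorrelation
    return a_k * p[:-k] + (1.0 - a_k) * mean
```

Both return forecasts aligned with the observations p[k:]; for small k the new reference behaves like persistence, while for large k it approaches the climatological mean.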

5.1 Method development issues

Working with Expektra's model was very straightforward and no major issues were encountered during development, but some issues were encountered working with NuPIC.¹

¹It should also be pointed out that it helps to have the people who developed the code on the spot, and this is probably the main reason we had so little trouble.


1. Installing NuPIC is difficult. This seems to be a general problem, judging by posts on the NuPIC mailing list.² This is a very important issue to fix if they want more people to work with this model.

2. The built-in swarming (PSO) did not work well, especially for slightly larger datasets and for 48 prediction steps. The main problems were fitting everything into memory and that the code ran too slowly when swarming over multiple steps ahead, making it practically impossible to use. Scaling down the problem, by using just a few prediction steps and multiple models, helped, but the swarm model kept dismissing wind speed as an important feature, which was fixed by specifically telling it not to touch that encoder.

3. Inconsistencies between the white-paper documentation of HTM CLA and the actual implementation make it hard to understand the underlying principles of NuPIC. NuPIC presents a very complex network, and any inconsistencies in documentation make it hard to understand. The code is quite well documented, but the additional material is a bit sparse.

These issues are all understandable and are continuously being improved upon. NuPIC is still a young platform, so it is expected to find issues, given the complexity and research nature of the project. Training the CLAClassifier to use many step predictions instead of just one will most likely reduce the bias error seen in the results. To investigate this, a more powerful computer is needed.

There are some differences in the input between the models. This setup was created because an OPF model needs to encode the value it is going to predict. SCADA data is available, so this is not an issue, but it may introduce some unfairness between NuPIC and Expektra. In the current setup NuPIC does not seem to gain anything extra from the additional information, and other models in the competition will most likely use different inputs as well.

In general, NuPIC is different in the way you feed data: you do so by sending in the front of the signal, and temporal context is achieved through the temporal memory.

5.2 Future improvements and directions

One issue with the GEFCom dataset is that it does not contain any specific information about the location of these 7 wind farms. It is worth considering taking a look at how to handle different models that are specialized for different types

²http://numenta.org/lists


of terrain, as this has been shown to increase performance [Kariniotakis et al., 2006]. Another thing to investigate could be a more advanced scheme for hyperparameter selection. Jursa and Rohrig [2008] show that the WPF error can be reduced by smarter use of optimization algorithms for feature selection and hyperparameter selection, which should improve the performance.

In general, having more types of inputs from sources like SCADA and NWP systems could give a lot of valuable information that can be used for better predictions. It is clear that wind speed is a key feature for producing good forecasts, which is also supported by the input analysis in section 4.2. It has been argued that the error in WPF models stems largely from wrong weather forecasts [Giebel et al., 2011], so identifying good sources of weather forecasts would be desirable for more effective model learning. Combining several different weather forecasts could also be investigated, as Nielsen et al. [2007] showed that power forecasts based on a number of different meteorological forecasts were better than those from a single source.

Regarding HTM CLA, it is worth considering implementing a custom encoder for it, one specifically targeted at data concerning wind farms. SCADA data could also possibly be streamed directly into a NuPIC model, which would allow for some interesting online predictions and anomaly detections.

Training times are always an issue, especially for very large models. Both NuPIC and Expektra's model would benefit from being implemented in a more parallel fashion, as these algorithms are very well suited for parallel computing. The speed-up would be of great value, so looking into and implementing a GPU version of the algorithms could be worthwhile.

5.3 Conclusions

The field of energy forecasting is still young and the necessity for good forecasting techniques is constantly growing. This initial study shows that part of Expektra's forecasting method can be used to forecast wind power generation. We observe that this ANN is able to achieve performance comparable to other methods published in Hong et al. [2014]. Additional work is still needed to obtain state-of-the-art results. Numenta's model is able to beat the reference model but still needs additional work before we can draw definite conclusions about its performance. Constructing a custom encoder could be a first step in that direction.


Bibliography

Subutai Ahmad and Jeff Hawkins. Properties of sparse distributed representations and their application to hierarchical temporal memory. arXiv preprint arXiv:1503.07469, 2015.

T.C. Akinci. Short term wind speed forecasting with ANN in Batman, Turkey. Elektronika ir Elektrotechnika, 107(1):41–45, 2015.

E. Michael Azoff. Neural network time series forecasting of financial markets. John Wiley & Sons, Inc., 1994.

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13(1):281–305, 2012.

Daniel P. Buxhoeveden and Manuel F. Casanova. The minicolumn hypothesis in neuroscience. Brain, 125(5):935–951, 2002.

Erasmo Cadenas and Wilfrido Rivera. Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks. Renewable Energy, 34(1):274–278, 2009.

Dmitri B. Chklovskii, B.W. Mel, and K. Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782–788, 2004.

A. Costa, A. Crespo, J. Navarro, G. Lizcano, H. Madsen, and E. Feitosa. A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews, 12(6):1725–1744, August 2008. ISSN 1364-0321. doi: 10.1016/j.rser.2007.01.015. URL http://dx.doi.org/10.1016/j.rser.2007.01.015.

Russ C. Eberhart and James Kennedy. A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science, volume 1, pages 39–43. New York, NY, 1995.


Shu Fan, James R. Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. Energy Conversion, IEEE Transactions on, 24(2):474–482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento—a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276. Citeseer, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition EWEC 2008, 6 pages. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving hierarchical temporal memory-based trading models. Springer, 2013.

M.A. Gaertner, C. Gallardo, C. Tejeda, N. Martínez, S. Calabria, N. Martínez, and B. Fernández. The Casandra project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: The next generation prediction system. In Proc. of the 2001 European Wind Energy Conference, EWEC'01, Copenhagen, Denmark, pages 777–780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: A literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G. Lobo. Sipreólico—wind power prediction tool for the Spanish peninsular power system. In Proceedings of the CIGRÉ 40th General Session and Exhibition, París, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On Intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357–363, 2014.


Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41–46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585–592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694–709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference, EWEC'07, 2007.

G. Kariniotakis. Position paper on JOULE project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J. Halliday, R. Brownsword, Ignacio Marti, Ana Maria Palomares, I. Cruz, H. Madsen, T.S. Nielsen, Henrik Aa. Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power—overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, 10 pages, 2006.

G.N. Kariniotakis, G.S. Stavrakakis, and E.F. Nogaret. Wind power forecasting using advanced neural networks models. Energy Conversion, IEEE Transactions on, 11(4):762–767, 1996.

Stanley J. Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326–334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275–293, 2009.

Lars Landberg and Simon J. Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171–195, 1994.

S. Lang, C. Mohrlen, J. Jorgensen, B.O. Gallachóir, and E. McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International Solar Energy Society UK Section Conference, volume 85, page 89, 2006.


S.M. Lawan, W.A.W.Z. Abidin, W.Y. Chai, A. Baharun, and T. Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760–1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. Smart Grid, IEEE Transactions on, 5(1):501–510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets, 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164–168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313–2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154–3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa. Nielsen, and Torben S. Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475–489, 2005.

Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431–441, 1963.

Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons. 1969.

Marvin Lee Minsky and Oliver G. Selfridge. Learning in random nets. MIT Lincoln Laboratory, 1960.

Jorge J. Moré. The Levenberg-Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105–116. Springer, 1978.

Henrik Aa. Nielsen, Torben S. Nielsen, Henrik Madsen, Maria J. Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471–482, 2007.


Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29–34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power, 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159–167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms, v0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33–57, 2007.

A. Rodrigues, J.A. Peças Lopes, P. Miranda, L. Palma, C. Monteiro, R. Bessa, J. Sousa, C. Rodrigues, and J. Matos. EPREV—a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference, EWEC, volume 7, 2007.

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5:3, 1988.

Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85–117, 2015.

S. Sinkevicius, R. Simutis, and V. Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91–96, 2011.

Ke-Sheng Wang, Vishal S. Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61–69, 2014.

WWEA. 2014 half-year report. Technical report, WWEA, pages 1–8, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365–376, 2013.

Hao Yu and Bogdan M. Wilamowski. Levenberg-Marquardt training. Industrial Electronics Handbook, 5:12-1, 2011.


Appendix A

Hyper-parameters

Parameters for the spatial pooler

Alias                Default  Description

columnCount          -        The number of cell columns in a cortical region.
globalInhibition     false    If true, during the inhibition phase the winning columns are selected as the most active columns from the region as a whole.
numActivePerInhArea  10       The maximum number of active columns per inhibition area.
synPermActiveInc     0.1      The amount by which an active synapse is incremented in each round. Specified as a percent of a fully grown synapse.
synPermConnected     0.10     Controls the threshold at which synapses are connected.
synPermInactiveDec   0.01     The amount by which an inactive synapse is decremented in each round.
potentialRadius      16       This parameter determines the extent of the input that each column can potentially be connected to.

Table A.1: Configuration parameters for the spatial pooler.


Parameters for the scalar encoder

Alias       Symbol  Description

w           w       Number of bits to set in the output.
minval      vmin    The lower bound of the input value.
maxval      vmax    The upper bound of the input value.
n           n       Number of bits in the representation (n must be > w).
radius      r       Inputs separated by more than or equal to this distance will have non-overlapping representations.
resolution  ψ       Inputs separated by more than or equal to this distance will have different representations.

Table A.2: Configuration parameters for the scalar encoder.
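The interplay between w, n, minval and maxval can be illustrated with a simplified encoder; a sketch only, not NuPIC's exact implementation (which also handles radius, resolution and periodic values).

```python
import numpy as np

def scalar_encode(value, minval, maxval, n, w):
    """Illustrative scalar encoder: w contiguous active bits out of n,
    whose position encodes the clipped input value."""
    assert n > w
    value = min(max(value, minval), maxval)   # clip to [minval, maxval]
    span = n - w                              # number of start positions - 1
    i = int(round(span * (value - minval) / (maxval - minval)))
    bits = np.zeros(n, dtype=int)
    bits[i:i + w] = 1
    return bits
```

Nearby values produce overlapping bit blocks, which is what lets the spatial pooler treat similar wind speeds as semantically similar inputs.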


Parameters for the temporal memory

Alias                  Default  Description

activationThreshold    12       Activation threshold for segments.
cellsPerColumn         32       Number of cells per column.
columnCount            2048     The number of cell columns in a cortical region.
globalDecay            0.10     Decrements all synapses a little bit all the time.
initialPerm            0.11     Initial permanence value for a synapse.
inputWidth             -        Size of the input.
maxAge                 100000   Controls global decay. Global decay will only decay segments that have not been activated for maxAge iterations, and will only do the global decay loop every maxAge iterations.
maxSegmentsPerCell     -        The maximum number of segments a cell can have.
maxSynapsesPerSegment  -        The maximum number of synapses a segment can have.
minThreshold           8        The minimum required activity for a segment to learn.
newSynapseCount        15       The maximum number of synapses added to a segment during learning.
permanenceDec          0.10     How much permanence is removed from synapses when learning occurs.
permanenceInc          0.10     How much permanence is added to synapses when learning occurs.
temporalImp            cpp/py   Controls which temporal memory implementation to use.

Table A.3: Configuration parameters for the temporal memory.


Appendix B

Wind characteristics

Figure B.1: Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated (the stronger, the more power).


Figure B.2: Wind characteristics for WF 3–7, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated (the stronger, the more power).


Appendix C

Error Distribution


Figure C.1: Error distribution for different lead times (1, 10, 20, 30, 40, 48 hours), WF 1, NuPIC model.


Figure C.2: Error distribution for different lead times (1, 10, 20, 30, 40, 48 hours), WF 2, NuPIC model.


Figure C.3: Error distribution for different lead times (1, 10, 20, 30, 40, 48 hours), WF 3, NuPIC model.


Figure C.4: Error distribution for different lead times (1, 10, 20, 30, 40, 48 hours), WF 4, NuPIC model.


Figure C.5: Error distribution for different lead times (1, 10, 20, 30, 40, 48 hours), WF 5, NuPIC model.


Figure C.6: Error distribution for different lead times (1, 10, 20, 30, 40, 48 hours), WF 6, NuPIC model.

67

[Histograms of the forecast error (range −1.0 to 1.0) at lead times 1, 10, 20, 30, 40, and 48 hours; WF 7 using NuPIC.]

Figure C.7: Error distribution for different lead times, WF 7.

[Histograms of the forecast error (range −1.0 to 1.0) at lead times 1, 10, 20, 30, 40, and 48 hours; WF 1 using Expektra.]

Figure C.8: Error distribution for different lead times, WF 1.

[Histograms of the forecast error (range −1.0 to 1.0) at lead times 1, 10, 20, 30, 40, and 48 hours; WF 2 using Expektra.]

Figure C.9: Error distribution for different lead times, WF 2.

[Histograms of the forecast error (range −1.0 to 1.0) at lead times 1, 10, 20, 30, 40, and 48 hours; WF 3 using Expektra.]

Figure C.10: Error distribution for different lead times, WF 3.

[Histograms of the forecast error (range −1.0 to 1.0) at lead times 1, 10, 20, 30, 40, and 48 hours; WF 4 using Expektra.]

Figure C.11: Error distribution for different lead times, WF 4.

[Histograms of the forecast error (range −1.0 to 1.0) at lead times 1, 10, 20, 30, 40, and 48 hours; WF 5 using Expektra.]

Figure C.12: Error distribution for different lead times, WF 5.

[Histograms of the forecast error (range −1.0 to 1.0) at lead times 1, 10, 20, 30, 40, and 48 hours; WF 6 using Expektra.]

Figure C.13: Error distribution for different lead times, WF 6.

[Histograms of the forecast error (range −1.0 to 1.0) at lead times 1, 10, 20, 30, 40, and 48 hours; WF 7 using Expektra.]

Figure C.14: Error distribution for different lead times, WF 7.

List of Figures

2.1 A figure that presents the general outline when forecasting using the statistical approach
2.2 A figure that presents the general steps when forecasting using a physical model
3.1 The perceptron
3.2 Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of H hidden layers as well as M input connections. Each edge seen in this graph has a weight wij associated with it
3.3 Information flow of a single region predictive model created with the OPF
3.4 The CLAClassifier
3.5 Training an OPF model
4.1 Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.2 Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.3 Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.4 Different error measurements for WF 1
4.5 Different error measurements for WF 2
4.6 Different error measurements for WF 3
4.7 Different error measurements for WF 4
4.8 Different error measurements for WF 5
4.9 Different error measurements for WF 6
4.10 Different error measurements for WF 7
4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point; the reference point is the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind
4.12 Plot showing the unoptimized version vs the optimized when training a network with 10 input neurons, 10/15/20/25 hidden neurons, and 1 output neuron. Training was done for 100 epochs using LM
4.13 Summarized average improvement over all wind farms with 95% confidence intervals
B.1 Wind characteristics for WF 1 and WF 2. The GEFCom dataset showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power
B.2 Wind characteristics for WF 3-7. The GEFCom dataset showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power
C.1 Error distribution for different lead times, WF 1
C.2 Error distribution for different lead times, WF 2
C.3 Error distribution for different lead times, WF 3
C.4 Error distribution for different lead times, WF 4
C.5 Error distribution for different lead times, WF 5
C.6 Error distribution for different lead times, WF 6
C.7 Error distribution for different lead times, WF 7
C.8 Error distribution for different lead times, WF 1
C.9 Error distribution for different lead times, WF 2
C.10 Error distribution for different lead times, WF 3
C.11 Error distribution for different lead times, WF 4
C.12 Error distribution for different lead times, WF 5
C.13 Error distribution for different lead times, WF 6
C.14 Error distribution for different lead times, WF 7

List of Tables

3.1 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, missing data with power observations are available for updating the models
3.2 Preprocessed features from the dataset. The Forecast category represents the forecast given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the feature we will use in training and testing
3.3 Example where n = 14, r = 5, ψ = 1 of various encoded scalar values using a ScalarEncoder
4.1 NRMSE score of the entries published in [Hong et al., 2014]. The NuPIC model and Expektra model are added so we can easily compare the results
A.1 Table containing configuration parameters for the encoder
A.2 Table containing configuration parameters for the spatial pooler
A.3 Table containing configuration parameters for the temporal memory

www.kth.se


CHAPTER 3 METHOD AND MATERIALS

ws, u, v, hours, ws, ws+1, ws+2, ws−1, ws−2, where ws+x and ws−x denote a time shift of x. We can use ws+x as input to the model because ws is a forecast in itself.
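As a minimal sketch, time-shifted inputs like these can be assembled from a forecast series as follows (plain Python; the edge-padding strategy at the series boundaries is an assumption made for the example, not taken from the thesis):

```python
def make_lagged_features(ws, shifts=(-2, -1, 1, 2)):
    """Build input rows [ws_t, ws_{t-2}, ws_{t-1}, ws_{t+1}, ws_{t+2}].

    Because ws is itself a forecast, the future-shifted values ws_{t+x}
    are legitimately available as model inputs.
    """
    rows = []
    n = len(ws)
    for t in range(n):
        row = [ws[t]]
        for s in shifts:
            idx = t + s
            # pad at the series boundaries by repeating the edge value
            idx = min(max(idx, 0), n - 1)
            row.append(ws[idx])
        rows.append(row)
    return rows

speeds = [5.0, 6.0, 7.5, 7.0, 6.5]
features = make_lagged_features(speeds)
```

Each row then serves as one input vector to the network, alongside the remaining channels (u, v, hours).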

Activation Functions

The computation done for each neuron in the multilayer perceptron requires knowledge of the derivative of the activation function. In other words, the activation function we pick needs to be differentiable. In this thesis we use the following two activation functions: the hyperbolic tangent function, seen in equation 3.13, and

f(s) = tanh(s) = (e^s − e^(−s)) / (e^s + e^(−s))    (3.13)

the linear transfer function, seen in equation 3.14.

f(s) = { +1  if s ≥ 1
       {  s  if −1 < s < 1
       { −1  if s ≤ −1          (3.14)
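As a small illustration, the two activation functions above, and the tanh derivative that gradient-based training needs, can be sketched as:

```python
import math

def tanh_act(s):
    # hyperbolic tangent, equation 3.13
    return math.tanh(s)

def tanh_deriv(s):
    # derivative used by backpropagation: 1 - tanh(s)^2
    return 1.0 - math.tanh(s) ** 2

def satlin(s):
    # saturating linear transfer function, equation 3.14
    if s >= 1.0:
        return 1.0
    if s <= -1.0:
        return -1.0
    return s
```

Inside the saturation region the linear function simply passes its input through, which is why it is typically used at the output node.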

Hyperparameter optimization

In order to obtain the hyperparameters necessary for each model on each wind farm, a random hyperparameter search was performed for all models. A hold-out validation set was used to pick the best hyperparameters. Random search has been shown to work better than grid search when not all parameters are equally important [Bergstra and Bengio, 2012].
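The random search with hold-out validation can be sketched as follows (a minimal illustration; the search space and the toy scoring function are invented for the example, not taken from the thesis):

```python
import random

def random_search(train_and_score, space, n_trials=20, seed=0):
    """Random hyperparameter search: sample each parameter independently
    and keep the configuration with the lowest hold-out validation error."""
    rng = random.Random(seed)
    best_cfg, best_err = None, float("inf")
    for _ in range(n_trials):
        cfg = {name: rng.choice(values) for name, values in space.items()}
        err = train_and_score(cfg)  # e.g. NRMSE on a hold-out set
        if err < best_err:
            best_cfg, best_err = cfg, err
    return best_cfg, best_err

# toy objective standing in for "train a network, score on validation data"
space = {"hidden": [10, 15, 20, 25], "lr": [0.001, 0.01, 0.1]}
cfg, err = random_search(lambda c: abs(c["hidden"] - 20) + c["lr"], space)
```

Unlike grid search, adding an unimportant parameter to `space` does not dilute the number of distinct values tried for the important ones.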

3.5.2 Numenta Platform for Intelligent Computing

This section describes the key principles introduced with NuPIC. It explains in general terms the theory behind the Online Prediction Framework (OPF)4, which uses the CLA and HTM algorithms; the OPF works as an API to create predictive models. The OPF consists of 5 major types of components: Encoders, Spatial Pooler (SP), Temporal Memory (TM), Temporal Pooler (TP), and Classifiers. Together these components construct a single CLA region5, and figure 3.3 demonstrates the information flow through a single region.

4. The OPF is used with Numenta's commercial product Grok.
5. Currently, models created with the OPF do not use a TP, nor does this client allow creation of a hierarchy of regions, i.e. what we would call an HTM; it is only possible to create one region.

3.5 NEURAL NETWORKS

Another central concept in NuPIC is the Sparse Distributed Representation (SDR), which refers to the activation of a small percentage of the neurons at any given time. This neuronal activity is represented as an n-dimensional sparse binary vector x = [b0, ..., bn] with around 2% active cells. The outputs from the spatial pooler and the temporal memory are both SDRs, while the output from the encoder does not enforce this and is just a normal binary vector. A general overview of the properties of SDRs has been given by Ahmad and Hawkins [2015].

[Encoder → Spatial Pooler → Temporal Memory → Classifier]

Figure 3.3: Information flow of a single region predictive model created with the OPF.

The overlap between two different SDRs is defined by

o(x, y) = x · y    (3.15)

A match between two SDRs is defined by

m(x, y) := o(x, y) ≥ θ    (3.16)

where θ is set such that θ ≤ ‖x‖1 and θ ≤ ‖y‖1. An interesting property of SDRs, something that is used multiple times inside the temporal memory and especially for predictions, is the fact that a set of fixed-size SDRs can be reliably stored by taking the union as a single pattern. The boolean OR operator is used to create a


Value   Scalar encoding
1       11111000000000
2       01111100000000
10      00000000011111

Table 3.3: Example where n = 14, r = 5, ψ = 1 of various encoded scalar values using a ScalarEncoder.

new vector from a set of SDRs. The downside of storing patterns this way is that the more patterns we store, the bigger the probability of false positives.
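These overlap, match, and union operations can be sketched directly on binary vectors (a minimal illustration; the vectors and thresholds are arbitrary examples):

```python
def overlap(x, y):
    # equation 3.15: dot product of two binary vectors
    return sum(a & b for a, b in zip(x, y))

def match(x, y, theta):
    # equation 3.16: "enough" overlap counts as a match
    return overlap(x, y) >= theta

def union(sdrs):
    # store a set of SDRs as a single pattern with boolean OR
    out = [0] * len(sdrs[0])
    for s in sdrs:
        out = [a | b for a, b in zip(out, s)]
    return out

a = [1, 1, 0, 0, 1, 0, 0, 0]
b = [0, 1, 1, 0, 1, 0, 0, 0]
u = union([a, b])  # [1, 1, 1, 0, 1, 0, 0, 0]

# membership test against the union: every stored SDR still matches,
# but the more patterns the union absorbs, the likelier a false positive
stored = match(a, u, 3)
```

With realistic sizes (thousands of bits, ~2% active) the false-positive probability stays very low until many patterns have been absorbed.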

Encoders

NuPIC contains many different encoders6; the job of an encoder is to convert raw input into a more suitable representation (i.e. a binary vector). The raw input is fed into the model using a dictionary data structure.

Useful encoders include scalar and categorical encoders, while more exotic encoders include one for Global Positioning System (GPS) coordinates; this encoder can be used to extract information about anomalous movements. The entries of the raw-input dictionary are each encoded separately and concatenated using a multi-encoder.

One property of the spatial pooler is that overlapping input patterns are mapped to the same SDR. This means that we want the encoder to encode input so that similar inputs share bits. The ScalarEncoder fulfils this property by the process illustrated in table 3.3.

We calculate the range of values to encode using equation 3.17, where vmin represents the minimum value of the input signal and vmax denotes the upper bound of the input signal.

vrange = vmax − vmin    (3.17)

There are three mutually exclusive parameters that determine the overall size of the output from a scalar encoder: (n, r, ψ). n directly represents the total number

6. A full list of all encoders can be found in the API documentation for NuPIC.


of bits in the output, and it must be bigger than or equal to w, which represents the number of bits that are set to encode a single value, i.e. the "width" of the output signal7. r and ψ are specified w.r.t. the input, while w is specified w.r.t. the output. Two inputs separated by more than the radius have non-overlapping representations. Two inputs separated by less than the radius will in general overlap in at least some of their bits. Two inputs separated by greater than or equal to the resolution are guaranteed to have different representations. ψ can be calculated using equation 3.18.

ψ = r / w    (3.18)

Depending on whether we want periodic behaviour or not, n is calculated a bit differently; see the scalar encoder implementation for details8.
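The encoding in table 3.3 can be sketched as a minimal non-periodic scalar encoder (an illustration of the idea, not NuPIC's exact implementation; n = 14, w = 5, ψ = 1 as in the table):

```python
def scalar_encode(v, v_min=1, resolution=1.0, n=14, w=5):
    """Minimal non-periodic scalar encoder: values closer than the
    resolution share a representation; values further apart than the
    radius (w * resolution) do not overlap at all."""
    bucket = int(round((v - v_min) / resolution))
    bucket = min(max(bucket, 0), n - w)  # clip to the representable range
    return [1 if bucket <= i < bucket + w else 0 for i in range(n)]

e1 = scalar_encode(1)    # 11111000000000
e2 = scalar_encode(2)    # 01111100000000
e10 = scalar_encode(10)  # 00000000011111
```

Note how 1 and 2 (closer than the radius r = 5) share four bits, while 1 and 10 share none, which is exactly the similarity structure the spatial pooler relies on.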

Spatial Pooler

The spatial pooler receives a binary vector as its input from an encoder and outputs an SDR. The structure of the spatial pooler consists of an input space and a set of columns; the output SDR represents which of the columns in a region are active. Each column has synapses connected to the input space. The SP consists of around 50% randomly and potentially connected synapses; this is called the "potential pool". Each synapse will connect and disconnect with the input space during learning.

The general information flow of the spatial pooler consists of the following steps. Input stimuli from lower regions lead to activation of the input space, to which each mini-column is synaptically connected. This activation, the so-called "overlap score", is set for each column. The score is calculated as the total sum of the neurons that try to influence that column, weighted with a "boosting factor" that increases the chance for certain columns to win more easily. A final top percentage (usually around 2% activity) of the columns with the highest influence (biggest overlap score) will be chosen; columns also inhibit nearby columns, and the result of this process is a binary vector with few active bits, an SDR. This is illustrated in equation 3.19, where b can either be 0 or 1 and s is a value representing the score.

7. w must be odd to avoid centering problems.
8. https://github.com/numenta/nupic


[b1 b2 b3 ... bn]  ·  [b11 b12 b13 ... b1n; b21 b22 b23 ... b2n; ...; bd1 bd2 bd3 ... bdn]  =  [s1 s2 s3 ... sn]  --inhibition-->  [b1 b2 b3 ... bn]    (3.19)
(input vector)        (connected synapses for each column)                                     (overlap score)                     (output SDR)

Learning in this structure is done by adjusting the permanence of the proximal dendrite to better match the input: for each winning column, increase the permanence of the synapses that correctly matched the input and decrease the permanence of the rest. We also increase the boosting factor of losing columns to give them a bigger chance of winning next time.
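One step of the overlap, boost, and inhibit process described above can be sketched as follows (a simplified global-inhibition variant; the sizes, the 50% potential pool, and the 2% sparsity are illustrative assumptions):

```python
import random

def spatial_pool(input_vec, synapses, boost, active_frac=0.02):
    """One spatial-pooler step (sketch): boosted overlap score per column,
    then global inhibition keeping only the top columns."""
    n_cols = len(synapses)
    scores = []
    for c in range(n_cols):
        ov = sum(input_vec[i] for i in synapses[c])  # connected & active bits
        scores.append(ov * boost[c])
    k = max(1, int(n_cols * active_frac))            # ~2% of columns win
    winners = sorted(range(n_cols), key=lambda c: scores[c], reverse=True)[:k]
    return [1 if c in winners else 0 for c in range(n_cols)]

rng = random.Random(42)
n_in, n_cols = 64, 128
# each column's potential pool: 50% of the input space, chosen at random
synapses = [rng.sample(range(n_in), n_in // 2) for _ in range(n_cols)]
boost = [1.0] * n_cols
x = [rng.randint(0, 1) for _ in range(n_in)]
sdr = spatial_pool(x, synapses, boost)
```

Learning (not shown) would then nudge the permanences of the winning columns' synapses toward the input, as described above.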

Temporal Memory

The temporal memory receives an SDR from the spatial pooler, which represents the active columns; the output of the temporal memory is the activity of the whole CLA region. The general idea of the TM9 is that the cells in each mini-column provide temporal context to the pattern identified by the spatial pooler. The TM's job is to form transitions between different SDRs, and it achieves this by allowing active cells to form connections to previously active cells, so that in a future setting each cell is able to predict its own activity.

Each cell in a region has several distal segments, ideally one for each pattern the cell has transitioned from; in practice these segments are each connected to a couple of patterns. A cell enters a predictive state if there is enough activity on a segment, and every segment consists of a collection of connections, synapses that have been formed to a subset of previously active cells (typically around 10-15 cells).

The Temporal Memory receives a sparse binary vector from the spatial pooler, and the general information flow for this part of the CLA goes through two phases. The first phase has the following steps: 1) for each active column, check to see if there is any cell in a predictive state; if there is a cell in a predictive

9. There are a lot of details in how the Temporal Memory is implemented; pseudo-code and more details can be found in [Numenta, 2011]. The nupic git repository is the best source for finer details: https://github.com/numenta/nupic


state, then 2) activate that particular cell. If no cell was found in a predictive state, then 3) activate all cells in that particular column, a process called bursting.

Bursting is analogous to saying: we do not know which cell to activate, because we have not seen this instance of the sequence of patterns before and are unable to put the column into the correct temporal context, so let us activate all cells to reflect this uncertainty. Bursting does not occur when the temporal context has been predicted by the CLA. Equation 3.20 shows the first phase.

[b1 b2 b3 ... bn]  ∘  [b11 b12 b13 ... b1n; b21 b22 b23 ... b2n; ...; bd1 bd2 bd3 ... bdn]  =  [b11 b12 b13 ... b1n; b21 b22 b23 ... b2n; ...; bd1 bd2 bd3 ... bdn]    (3.20)
(SP SDR)              (predictive state)                                                      (active state)

The second phase of the algorithm is there to figure out which cells should be put into a predictive state for the next time step. This is achieved by checking every distal segment on every cell for activity above a certain threshold, i.e. checking for active segments. One active segment is enough to put a cell into a predictive state. The output from the temporal memory is a vector representing the state of all cells in that region. Equation 3.21 shows phase 2.

[b1 b2 b3 ... bn]  ·  [b11 b12 b13 ... b1n; b21 b22 b23 ... b2n; ...; bd1 bd2 bd3 ... bdn]_X  =  [b1 b2 b3 ... bn]_X  > τ  -->  [s1 s2 s3 ... sn]    (3.21)
(active state)        (segment X)                                                                (segment activation X)         (predictive state)

If learning is turned on, the permanences of the synapses connected to active distal segments are updated (in the same way as in the spatial pooler). These changes are marked as temporary until we are sure that the cell is in fact able to correctly predict something with these new changes; once this fact is known, the change becomes either permanent or is removed. Temporarily marked cells are updated whenever a cell goes from being inactive to active from feed-forward input (reinforce the update, as we correctly predicted the feed-forward activation). If a cell instead went from active to inactive, undo the change.
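The two phases can be sketched with sets of (column, cell) pairs (a heavily simplified illustration of the activation and prediction logic, not NuPIC's implementation; learning is omitted):

```python
def tm_activate(active_columns, predictive, cells_per_column=4):
    """Phase 1: for each active column, activate predicted cells,
    or burst (activate every cell) if none was predicted."""
    active = set()
    for col in active_columns:
        cells = [(col, i) for i in range(cells_per_column)]
        predicted = [c for c in cells if c in predictive]
        active.update(predicted if predicted else cells)  # bursting
    return active

def tm_predict(active, segments, threshold=2):
    """Phase 2: a cell enters the predictive state if one of its distal
    segments has at least `threshold` synapses to active cells."""
    predictive = set()
    for cell, segs in segments.items():
        for seg in segs:
            if len(seg & active) >= threshold:
                predictive.add(cell)
                break
    return predictive

# cell (1, 0) has one distal segment connected to two cells of column 0
segments = {(1, 0): [{(0, 0), (0, 1)}]}
active = tm_activate([0], predictive=set())   # column 0 bursts (unseen context)
pred = tm_predict(active, segments)           # (1, 0) becomes predictive
active2 = tm_activate([1], predictive=pred)   # no burst: only (1, 0) activates
```

The second activation shows the point of the temporal context: once the transition has been learned, only one cell per column fires instead of the whole column.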


NuPIC Classifiers

There are many different classifiers included with NuPIC, such as a k-Nearest Neighbour (kNN) classifier and a Support Vector Machine (SVM); the OPF in particular uses a custom-built classifier called the "CLAClassifier". The purpose of this classifier is to decode predictions made by the CLA. Classification with the CLAClassifier is performed in different ways depending on how the input has been encoded.

Scalar values are decoded using a process where each cell in a CLA region is paired with two histograms. One of the histograms keeps track of the frequency of encountered patterns associated with each cell; the other keeps track of a moving average for each bucket. This process is illustrated in figure 3.4.

[Diagram: for columns 1..N of the SDR, each cell holds 2 histograms over the value range (min value to max value): a likelihood histogram and a moving average.]

Figure 3.4: The CLAClassifier.

Training in NuPIC

Training the NuPIC model is done using online learning algorithms. The schema seen in figure 3.5 is used to train and test these models. We have 7 different wind farms, so this schema is carried out for each respective wind farm.

This schema is slightly adjusted from the traditional way the OPF handles multi-step predictions. The default way of doing this is to use 48 different models, one for


every step ahead in the CLAClassifier, which would give us 48 · 7 = 336 models in total (for all the wind farms). Not only does this change match Expektra's approach, but it also reduces the memory requirements of the CLAClassifier.

[Diagram, training phase: a pre-training data chunk from the dataset feeds the hyperparameter setup (PSO swarming or manual setup); the OPF model is then trained on the training data stream with online learning activated, producing predictions. Testing phase: the OPF model, with online learning deactivated, produces multistep predictions on the testing data.]

Figure 3.5: Training an OPF model.

Input and hyperparameter selection

Finding hyperparameters for an OPF model was partly done using the custom built-in PSO algorithm and partly configured manually, mainly by ensuring that the PSO algorithm did not remove encoders. A single CLA region contains a lot of hyperparameters; these parameters have been included, with corresponding descriptions, in appendix A. Inputs to the model are date, ws, wp, u, v.


Chapter 4

Result

This chapter contains the primary results obtained in this study. Figures 4.4-4.10 contain different error measurements on the test set for each wind farm. We see that Expektra's ANN model is able to perform well over all wind farms, while the NuPIC model performs worse than Expektra but in general still better than our reference model. NuPIC is off target with a bias error on most wind farms. The graphs also indicate that NuPIC is unable to pick up some trends, seen in the cumulated ε² graph. The Expektra model, on the other hand, shows no clear problems in cumulated ε² and is on target on all wind farms; appendix C has been included to reflect this at different lead times.

Figure 4.1: Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


Figure 4.2: Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).

Figure 4.3: Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).

[Four panels for Wind Farm 1, each comparing Expektra, NuPIC, and Persistence: NBIAS, NRMSE, and NMAE vs look-ahead time k (in hours), and cumulated ε² vs time (in hours).]

Figure 4.4: Different error measurements for WF 1.

Figure 4.5: Different error measurements for WF 2.

Figure 4.6: Different error measurements for WF 3.

Figure 4.7: Different error measurements for WF 4.

Figure 4.8: Different error measurements for WF 5.

Figure 4.9: Different error measurements for WF 6.

Figure 4.10: Different error measurements for WF 7.

              Wind Farm
User          1      2      3      4      5      6      7      All
Leustagos     0.145  0.138  0.168  0.144  0.158  0.133  0.140  0.146
DuckTile      0.143  0.145  0.172  0.145  0.165  0.137  0.146  0.148
MZ            0.141  0.151  0.174  0.145  0.167  0.141  0.145  0.149
Propeller     0.144  0.153  0.177  0.147  0.175  0.141  0.147  0.152
Duehee Lee    0.157  0.144  0.176  0.160  0.169  0.154  0.148  0.155
Expektra      0.165  0.158  0.184  0.164  0.179  0.153  0.153  0.165
MTU EE5260    0.161  0.172  0.193  0.162  0.192  0.156  0.160  0.168
SunWind       0.174  0.177  0.193  0.176  0.179  0.157  0.162  0.172
ymzsmsd       0.163  0.186  0.200  0.164  0.192  0.162  0.167  0.174
4138 Kalchas  0.180  0.179  0.197  0.175  0.200  0.160  0.165  0.177
NuPIC         0.243  0.254  0.264  0.310  0.290  0.224  0.240  0.264
Persistence   0.302  0.338  0.373  0.364  0.388  0.341  0.361  0.355

Table 4.1: NRMSE score of the entries published in [Hong et al., 2014]. The NuPIC model and the Expektra model are added so we can easily compare the results.

4.1 Experimental results

Looking at the primary results of this study, we see in Table 4.1 that Expektra's ANN model achieves performance comparable to the other methods published in Hong et al. [2014]. This can be used as a starting point for further investigations. The approach most similar to Expektra's is that of Duehee Lee [Lee and Baldick, 2014], as both methods are based on the same neural network architecture but with different optimization techniques. Duehee Lee's network is structured slightly differently: Expektra's model has only one output node, whereas Duehee Lee uses five (representing five prediction steps ahead). Duehee Lee also includes additional sub-models to create a prediction ensemble, blending the output of the ANN with a Gaussian Process (GP) to improve very short-term predictions. MTU EE5260 is also a neural network approach that converts meteorological forecasts to power, and its score is very close. The HTM CLA beats the persistence model but needs more work to outperform the other models in GEFCom.
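The scores above follow the standard normalized error definitions of Madsen et al. [2005]; since GEFCom power is already normalized to installed capacity, the normalization constant is 1. A minimal sketch (function names are my own):

```python
import numpy as np

def nrmse(y_true, y_pred, capacity=1.0):
    """Normalized root mean square error."""
    return np.sqrt(np.mean((np.asarray(y_pred) - np.asarray(y_true)) ** 2)) / capacity

def nmae(y_true, y_pred, capacity=1.0):
    """Normalized mean absolute error."""
    return np.mean(np.abs(np.asarray(y_pred) - np.asarray(y_true))) / capacity

def nbias(y_true, y_pred, capacity=1.0):
    """Normalized bias; positive means systematic over-prediction."""
    return np.mean(np.asarray(y_pred) - np.asarray(y_true)) / capacity
```

In practice each measure is computed separately per look-ahead time k, which produces the per-horizon curves shown in the figures.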


4.2 Input Importance

For interpretation purposes, an analysis of the relative input parameter importance was performed in order to better understand the model. The analysis is illustrated in Figure 4.11. Each box represents noise added to that channel; adding noise to an important feature results in a higher NRMSE score. The reference point "all-channels" represents the error distribution of the model with no input replaced. The figure clearly shows that the wind speed channel ws is the most important attribute: adding noise to this channel greatly affects the NRMSE score. The wind components u and v show little to no influence, while the timestamp-related inputs hours and week both indicate that seasonal and daily trends are present in the dataset.
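The noise-injection procedure in the spirit of HIPR [Kemp et al., 2007] can be sketched as follows. This is a simplified illustration, not the exact code used: the function names and the uniform-noise choice are assumptions, and features are assumed scaled to [0, 1].

```python
import numpy as np

def input_importance(model_predict, X, y, n_repeats=10, rng=None):
    """Replace one input channel at a time with uniform noise and record
    the resulting RMSE; a large increase over the baseline marks an
    important channel. `model_predict` maps (n_samples, n_features) -> predictions."""
    rng = np.random.default_rng(rng)
    base = np.sqrt(np.mean((model_predict(X) - y) ** 2))  # "all-channels" reference
    scores = {}
    for j in range(X.shape[1]):
        errs = []
        for _ in range(n_repeats):
            Xn = X.copy()
            Xn[:, j] = rng.uniform(0.0, 1.0, size=X.shape[0])  # scramble channel j
            errs.append(np.sqrt(np.mean((model_predict(Xn) - y) ** 2)))
        scores[j] = np.mean(errs)
    return base, scores
```

Repeating the randomization gives a distribution of errors per channel, which is what the box plots in Figure 4.11 summarize.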

[Figure 4.11 omitted in extraction: box plots of NRMSE (roughly 0.2-0.6) with noise applied to each input channel: all-channels, hours, u, v, week, ws, ws−1, ws−2, ws−3, ws+1, ws+2, ws+3.]

Figure 4.11: Relative input parameter importance using HIPR. "all-channels" reflects the reference point: the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v = directional vector of the wind.


4.2.1 Adaptation and Optimization

All training for Expektra's model was performed on a MacBook Pro with an Intel Core 2 Duo processor (P8600, 2.40 GHz) and 4 GB of RAM, running a 64-bit operating system. After adapting the code to run experiments on the GEFCom dataset, the performance of the core source code provided by Expektra was evaluated. This investigation identified some slow parts of the code. After addressing these issues, mainly by using a Math.NET native library instead of the managed provider, a speed test was performed comparing the unoptimized version with the optimized one. Figure 4.12 shows the speed-up achieved during training using the LM algorithm.
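The LM training step referred to here [Levenberg, 1944; Marquardt, 1963] solves a damped least-squares system each iteration, w ← w − (JᵀJ + λI)⁻¹Jᵀr, which is why the linear algebra provider dominates training time. A minimal numpy sketch of one step (names are my own, not Expektra's code):

```python
import numpy as np

def lm_step(residual, jacobian, w, lam):
    """One Levenberg-Marquardt update: w <- w - (J^T J + lam*I)^(-1) J^T r.
    residual(w) returns the error vector r; jacobian(w) returns J = dr/dw."""
    r = residual(w)
    J = jacobian(w)
    A = J.T @ J + lam * np.eye(w.size)   # damped Gauss-Newton normal matrix
    return w - np.linalg.solve(A, J.T @ r)
```

For small λ this behaves like Gauss-Newton; for large λ it approaches a small gradient-descent step, which is what makes the method robust for training MLPs.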

[Figure 4.12 omitted in extraction: training time ("Normal" vs. "Optimized") for 10-25 hidden neurons.]

Figure 4.12: Plot showing the unoptimized version vs. the optimized one when training a network with 10 input neurons; 10, 15, 20, or 25 hidden neurons; and 1 output neuron. Training was done for 100 epochs using LM.

4.3 Summary

Figure 4.13 plots the average NRMSE improvement over the persistence model. Both models perform better than persistence. Expektra's model performs better than the NuPIC model over the whole forecast


horizon, and NuPIC performs better than persistence towards the end of the forecast.
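The improvement measure plotted in Figure 4.13 is the relative reduction in NRMSE over the reference model, averaged over the seven farms. A sketch (the normal-approximation confidence interval is an assumption about how the 95% bands were formed):

```python
import numpy as np

def improvement(err_model, err_ref):
    """Percentage improvement over a reference model: 100*(ref - model)/ref."""
    err_model = np.asarray(err_model, dtype=float)
    err_ref = np.asarray(err_ref, dtype=float)
    return 100.0 * (err_ref - err_model) / err_ref

def mean_with_ci(per_farm, z=1.96):
    """Mean over farms (rows) with an approximate 95% confidence half-width."""
    v = np.asarray(per_farm, dtype=float)
    half = z * v.std(axis=0, ddof=1) / np.sqrt(v.shape[0])
    return v.mean(axis=0), half
```

With per-farm improvement curves stacked as rows (one column per look-ahead time), `mean_with_ci` yields the averaged curve and its error band.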

[Figure 4.13 omitted in extraction: Improvement (%) in NRMSE over persistence (0-100%) versus look-ahead time (in hours, 0-50); curves: Expektra, NuPIC.]

Figure 4.13: Summarized average improvement over all wind farms with 95% confidence intervals


Chapter 5

Discussion

With the exponential growth of computing power it has become easier to study deeper and more complex networks, and with recent advances in the area of deep learning, interest in ANNs has resurged. In practice, ANNs have been used by most groups in the field of WPF, but these networks never caught on, as it was argued that the improvements they offered were usually not worth the extra effort of training them [Giebel et al., 2011]. This is steadily becoming less of an issue as computation becomes cheaper and bigger networks perform better.

By using a simple MLP network, we observe that we are able to obtain results similar in performance to other models published in Hong et al. [2014]. This can be used as a starting point for further investigations. The GEFCom dataset has a limited number of input features; with access to a wider range of features, the power of neural networks could be studied more in depth.

Persistence was used as the reference model in order to have the same baseline as the one used in GEFCom, but a better reference model should be considered in the future, such as the one presented in Nielsen et al. [1998], especially if longer forecasts are to be considered.
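For reference, persistence simply carries the last observed power forward, while the "new reference" of Nielsen et al. [1998] blends persistence with the series mean using the lag-k autocorrelation, so long horizons revert toward the mean. A sketch under those definitions (function names are my own):

```python
import numpy as np

def persistence(p_t, k):
    """Persistence: the forecast for any horizon k is the last observed power."""
    return p_t

def new_reference(p, t, k):
    """Forecast p(t+k) as a_k * p(t) + (1 - a_k) * mean(p), where a_k is the
    lag-k autocorrelation estimated from the power series `p` (1-D array)."""
    p = np.asarray(p, dtype=float)
    p_bar = p.mean()
    d = p - p_bar
    a_k = np.sum(d[:-k] * d[k:]) / np.sum(d * d)  # lag-k autocorrelation
    return a_k * p[t] + (1.0 - a_k) * p_bar
```

At short horizons a_k is close to 1 and the model behaves like persistence; at long horizons it approaches the climatological mean, which is why it is a harder baseline to beat at 48 hours.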

5.1 Method development issues

Working with Expektra's model was very straightforward and no major issues were encountered during development, but some issues were encountered working with NuPIC:¹

¹ It should also be pointed out that it helps to have the people who developed the code on site, and this is probably the main reason for the lack of trouble.


1. Installing NuPIC is difficult. This seems to be a general problem, judging by posts on the NuPIC mailing list.² This is a very important issue to fix if they want more people to work with this model.

2. The built-in swarming (PSO) did not work well, especially for slightly larger datasets and for 48 prediction steps. The main problems were fitting everything into memory and that the code ran too slowly when swarming over multiple steps ahead, making it practically impossible to use. Scaling the problem down to just a few prediction steps and multiple models helped, but the swarm model kept dismissing wind speed as an important feature, which was fixed by specifically telling it not to touch that encoder.

3. Inconsistencies between the white-paper documentation of HTM CLA and the actual implementation make it hard to understand the underlying principles of NuPIC. NuPIC implements a very complex network, and any inconsistency in the documentation makes it hard to understand. The code is quite well documented, but the additional material is a bit sparse.

These issues are all understandable and are continuously being improved upon. NuPIC is still a young platform, so finding issues is expected given the complexity and research nature of the project. Training the CLAClassifier to use many step predictions instead of just one will most likely reduce the bias error seen in the results. To investigate this, a more powerful computer is needed.

There are some differences in the input between the models. This setup was created because an OPF model needs to encode the value it is going to predict. SCADA data is available, so this is not an issue, but it may introduce some unfairness between NuPIC and Expektra. In the current setup NuPIC does not seem to gain anything extra from the additional information, and other models in the competition will most likely use different inputs as well.

In general, NuPIC differs in the way data is fed to it: you send in the front of the signal, and temporal context is achieved through the temporal memory.

5.2 Future improvements and directions

One issue with the GEFCom dataset is that it does not contain any specific information about the location of the 7 wind farms. It is worth considering how to handle different models that are specialized for different types

² http://numenta.org/lists


of terrain, as this has been shown to increase performance [Kariniotakis et al., 2006]. Another avenue to investigate is a more advanced scheme for hyperparameter selection: Jursa and Rohrig [2008] show that the WPF error can be reduced by smarter use of optimization algorithms for feature and hyperparameter selection, which should improve performance.

In general, more types of inputs from sources like SCADA and NWP systems could give a lot of valuable information for better predictions. It is clear that wind speed is a key feature for producing good forecasts, which is also supported by the input analysis in Section 4.2. It has been argued that the error in WPF models stems largely from wrong weather forecasts [Giebel et al., 2011], so identifying good sources of weather forecasts would be desirable for more effective model learning. Combining several different weather forecasts could also be investigated, as Nielsen et al. [2007] showed that power forecasts based on a number of different meteorological forecasts were better than those from a single source.

Regarding HTM CLA, it is worth considering implementing a custom encoder specifically targeted at data concerning wind farms. SCADA data could also be streamed directly into a NuPIC model, which would allow for some interesting online predictions and anomaly detection.

Training times are always an issue, especially for very large models. Both NuPIC and Expektra's model would benefit from being implemented in a more parallel fashion, as these algorithms are very well suited for parallel computing. The speed-up would be of great value, so implementing a GPU version of the algorithms could be worthwhile.

5.3 Conclusions

The field of energy forecasting is still young, and the necessity for good forecasting techniques is constantly growing. This initial study shows that part of Expektra's forecasting method can be used to forecast wind power generation. We observe that this ANN is able to achieve performance comparable to other methods published in Hong et al. [2014]. Additional work is still needed to obtain state-of-the-art results. Numenta's model is able to beat the reference model but needs additional work before we can draw definite conclusions about its performance. Constructing a custom encoder could be a first step in that direction.


Bibliography

Subutai Ahmad and Jeff Hawkins. Properties of sparse distributed representations and their application to hierarchical temporal memory. arXiv preprint arXiv:1503.07469, 2015.

T.C. Akinci. Short term wind speed forecasting with ANN in Batman, Turkey. Elektronika ir Elektrotechnika, 107(1):41-45, 2015.

E. Michael Azoff. Neural Network Time Series Forecasting of Financial Markets. John Wiley & Sons, Inc., 1994.

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13(1):281-305, 2012.

Daniel P. Buxhoeveden and Manuel F. Casanova. The minicolumn hypothesis in neuroscience. Brain, 125(5):935-951, 2002.

Erasmo Cadenas and Wilfrido Rivera. Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks. Renewable Energy, 34(1):274-278, 2009.

Dmitri B. Chklovskii, B.W. Mel, and K. Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782-788, 2004.

A. Costa, A. Crespo, J. Navarro, G. Lizcano, H. Madsen, and E. Feitosa. A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews, 12(6):1725-1744, August 2008. ISSN 1364-0321. doi: 10.1016/j.rser.2007.01.015. URL http://dx.doi.org/10.1016/j.rser.2007.01.015.

Russ C. Eberhart and James Kennedy. A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science, volume 1, pages 39-43, New York, NY, 1995.


Shu Fan, James R. Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. Energy Conversion, IEEE Transactions on, 24(2):474-482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento: a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276. Citeseer, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition, EWEC 2008, 6 pages. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving Hierarchical Temporal Memory-Based Trading Models. Springer, 2013.

M.A. Gaertner, C. Gallardo, C. Tejeda, N. Martínez, S. Calabria, N. Martínez, and B. Fernández. The Casandra project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. In Proc. of the 2001 European Wind Energy Conference, EWEC'01, Copenhagen, Denmark, pages 777-780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: A literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G. Lobo. Sipreólico: wind power prediction tool for the Spanish peninsular power system. In Proceedings of the CIGRÉ 40th General Session and Exhibition, Paris, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On Intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357-363, 2014.

Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359-366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41-46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585-592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694-709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference, EWEC'07, 2007.

G. Kariniotakis. Position paper on Joule project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J. Halliday, R. Brownsword, Ignacio Marti, Ana Maria Palomares, I. Cruz, H. Madsen, T.S. Nielsen, Henrik Aa. Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power: overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, 10 pages, 2006.

G.N. Kariniotakis, G.S. Stavrakakis, and E.F. Nogaret. Wind power forecasting using advanced neural networks models. Energy Conversion, IEEE Transactions on, 11(4):762-767, 1996.

Stanley J. Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326-334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275-293, 2009.

Lars Landberg and Simon J. Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171-195, 1994.

S. Lang, C. Mohrlen, J. Jorgensen, B.O. Gallachóir, and E. McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International Solar Energy Society UK Section Conference, volume 85, page 89, 2006.


S.M. Lawan, W.A.W.Z. Abidin, W.Y. Chai, A. Baharun, and T. Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760-1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. Smart Grid, IEEE Transactions on, 5(1):501-510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets, 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164-168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313-2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154-3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa. Nielsen, and Torben S. Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475-489, 2005.

Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431-441, 1963.

Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115-133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons, 1969.

Marvin Lee Minsky and Oliver G. Selfridge. Learning in Random Nets. MIT Lincoln Laboratory, 1960.

Jorge J. Moré. The Levenberg-Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105-116. Springer, 1978.

Henrik Aa. Nielsen, Torben S. Nielsen, Henrik Madsen, Maria J. Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471-482, 2007.


Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29-34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power, 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159-167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms, v0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33-57, 2007.

A. Rodrigues, J.A. Peças Lopes, P. Miranda, L. Palma, C. Monteiro, R. Bessa, J. Sousa, C. Rodrigues, and J. Matos. EPREV: a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference, EWEC, volume 7, 2007.

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5(3), 1988.

Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85-117, 2015.

S. Sinkevicius, R. Simutis, and V. Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91-96, 2011.

Ke-Sheng Wang, Vishal S. Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61-69, 2014.

WWEA. 2014 half-year report. Technical report, WWEA, pp. 1-8, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365-376, 2013.

Hao Yu and Bogdan M. Wilamowski. Levenberg-Marquardt training. Industrial Electronics Handbook, 5:12-1, 2011.


Appendix A

Hyper-parameters

Parameters for the spatial pooler

Alias                 Default  Description

columnCount           -        The number of cell columns in a cortical region.
globalInhibition      false    If true, during the inhibition phase the winning columns are selected as the most active columns from the region as a whole.
numActivePerInhArea   10       The maximum number of active columns per inhibition area.
synPermActiveInc      0.1      The amount by which an active synapse is incremented in each round, specified as a percent of a fully grown synapse.
synPermConnected      0.10     Controls the threshold at which synapses are considered connected.
synPermInactiveDec    0.01     The amount by which an inactive synapse is decremented in each round.
potentialRadius       16       Determines the extent of the input that each column can potentially be connected to.

Table A.1: Configuration parameters for the spatial pooler.


Parameters for the scalar encoder

Alias       Symbol  Description

w           w       Number of bits to set in the output.
minval      vmin    The lower bound of the input value.
maxval      vmax    The upper bound of the input value.
n           n       Number of bits in the representation (n must be > w).
radius      r       Inputs separated by more than or equal to this distance will have non-overlapping representations.
resolution  ψ       Inputs separated by more than or equal to this distance will have different representations.

Table A.2: Configuration parameters for the scalar encoder.
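The interplay of these parameters can be illustrated with a minimal non-periodic scalar encoder in the spirit of NuPIC's: w contiguous active bits out of n, positioned by the value. This is a simplified sketch, not the actual implementation; the resolution formula (vmax − vmin)/(n − w) is an assumption made for this sketch.

```python
import numpy as np

def scalar_encode(value, minval, maxval, n, w):
    """Encode a scalar as n bits with w contiguous active bits."""
    assert n > w
    value = min(max(value, minval), maxval)          # clip to the valid range
    resolution = (maxval - minval) / (n - w)
    i = int(round((value - minval) / resolution))    # index of the first active bit
    bits = np.zeros(n, dtype=int)
    bits[i:i + w] = 1
    return bits
```

Nearby values share active bits, which is what gives the spatial pooler overlapping input representations to work with.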


Parameters for the temporal memory

Alias                  Default  Description

activationThreshold    12       Activation threshold for segments.
cellsPerColumn         32       Number of cells per column.
columnCount            2048     The number of cell columns in a cortical region.
globalDecay            0.10     Decrements all synapses a little bit all the time.
initialPerm            0.11     Initial permanence value for a synapse.
inputWidth             -        Size of the input.
maxAge                 100000   Controls global decay. Global decay will only decay segments that have not been activated for maxAge iterations, and will only do the global decay loop every maxAge iterations.
maxSegmentsPerCell     -        The maximum number of segments a cell can have.
maxSynapsesPerSegment  -        The maximum number of synapses a segment can have.
minThreshold           8        The minimum required activity for a segment to learn.
newSynapseCount        15       The maximum number of synapses added to a segment during learning.
permanenceDec          0.10     How much permanence is removed from synapses when learning occurs.
permanenceInc          0.10     How much permanence is added to synapses when learning occurs.
temporalImp            cpp/py   Controls which temporal memory implementation to use.

Table A.3: Configuration parameters for the temporal memory.


Appendix B

Wind characteristics

Figure B.1: Wind characteristics for WF 1 and WF 2 in the GEFCom dataset, showing zonal and meridional wind components. The intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Figure B.2: Wind characteristics for WF 3-7 in the GEFCom dataset, showing zonal and meridional wind components. The intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Appendix C

Error Distribution


minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf1 using nupic

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using nupic

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using nupic

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using nupic

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using nupic

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using nupic

Figure C1 Error distribution for different lead times WF 1

62

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using nupic

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using nupic

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using nupic

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using nupic

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using nupic

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using nupic

Figure C2 Error distribution for different lead times WF 2

63

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf3 using nupic

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using nupic

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using nupic

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using nupic

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using nupic

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using nupic

Figure C3 Error distribution for different lead times WF 3

64

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using nupic

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using nupic

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using nupic

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using nupic

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using nupic

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using nupic

Figure C4 Error distribution for different lead times WF 4

65

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf5 using nupic

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using nupic

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using nupic

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using nupic

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using nupic

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using nupic

Figure C5 Error distribution for different lead times WF 5

66

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

Figure C6 Error distribution for different lead times WF 6

67

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf7 using nupic

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using nupic

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using nupic

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using nupic

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using nupic

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using nupic

Figure C7 Error distribution for different lead times WF 7

68

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

Figure C8 Error distribution for different lead times WF 1

69

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf2 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

Figure C9 Error distribution for different lead times WF 2

70

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

Figure C10 Error distribution for different lead times WF 3

71

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf4 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

Figure C11 Error distribution for different lead times WF 4

72

[Figure: histograms of forecast error (x-axis: error, -1.0 to 1.0; y-axis: frequency, 0 to 70) for lead times 48, 40, 30, 20, 10 and 1, wf5 using Expektra]

Figure C.12: Error distribution for different lead times, WF 5


[Figure: histograms of forecast error (x-axis: error, -1.0 to 1.0; y-axis: frequency, 0 to 70) for lead times 48, 40, 30, 20, 10 and 1, wf6 using Expektra]

Figure C.13: Error distribution for different lead times, WF 6


[Figure: histograms of forecast error (x-axis: error, -1.0 to 1.0; y-axis: frequency, 0 to 70) for lead times 48, 40, 30, 20, 10 and 1, wf7 using Expektra]

Figure C.14: Error distribution for different lead times, WF 7


List of Figures

2.1 A figure that presents the general outline when forecasting using the statistical approach 6

2.2 A figure that presents the general steps when forecasting using a physical model 7

3.1 The perceptron 20

3.2 Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of the H hidden layers, as well as M input connections. Each edge seen in this graph has a weight wij associated with it 21

3.3 Information flow of a single-region predictive model created with the OPF 23

3.4 The CLAClassifier 28

3.5 Training an OPF model 29

4.1 Left diagram: cumulative probability of wind speed. Right diagram: scatter diagram of the power curve, production vs wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) 31

4.2 Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) 32

4.3 Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) 32

4.4 Different error measurements for WF 1 33
4.5 Different error measurements for WF 2 34
4.6 Different error measurements for WF 3 35
4.7 Different error measurements for WF 4 36
4.8 Different error measurements for WF 5 37
4.9 Different error measurements for WF 6 38
4.10 Different error measurements for WF 7 39

4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point: the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind 41

4.12 Plot showing the unoptimized version vs the optimized one when training a network with 10 input neurons, 10/15/20/25 hidden neurons and 1 output neuron. Training was done for 100 epochs using LM 42

4.13 Summarized average improvement over all wind farms with 95% confidence intervals 43

B.1 Wind characteristics for WF 1 and WF 2. The GEFCom dataset showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power 59

B.2 Wind characteristics for WF 3-7. The GEFCom dataset showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power 60

C.1 Error distribution for different lead times, WF 1 62
C.2 Error distribution for different lead times, WF 2 63
C.3 Error distribution for different lead times, WF 3 64
C.4 Error distribution for different lead times, WF 4 65
C.5 Error distribution for different lead times, WF 5 66
C.6 Error distribution for different lead times, WF 6 67
C.7 Error distribution for different lead times, WF 7 68
C.8 Error distribution for different lead times, WF 1 69
C.9 Error distribution for different lead times, WF 2 70
C.10 Error distribution for different lead times, WF 3 71
C.11 Error distribution for different lead times, WF 4 72
C.12 Error distribution for different lead times, WF 5 73
C.13 Error distribution for different lead times, WF 6 74

C.14 Error distribution for different lead times, WF 7 75

List of Tables

3.1 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, data with power observations are available for updating the models 16

3.2 Preprocessed features from the dataset. The Forecast category represents the forecasts given by the NWP; multiple forecasts exist for a given date. The latest issued forecast available is the feature we will use in training and testing 17

3.3 Example where n = 14, r = 5, ψ = 1, of various scalar values encoded using a ScalarEncoder 24

4.1 NRMSE scores of the entries published in [Hong et al. 2014]. The NuPIC and Expektra models are added so we can easily compare the results 40

A.1 Table containing configuration parameters for the encoder 55
A.2 Table containing configuration parameters for the spatial pooler 56
A.3 Table containing configuration parameters for the temporal memory 57


  • Contents
  • Introduction
    • Problem Formulation
    • The scope of the problem
  • Background
    • Neural Networks and Time Series Prediction
  • Method and Materials
    • Preliminaries
      • Remarks
      • Definitions
      • Reference models
      • Error metrics
      • Model selection
      • Evaluation
    • Experiments
    • Holdback Input Randomization
    • Optimization methods
    • Neural Networks
      • Multilayer Perceptron
      • Numenta Platform for Intelligent Computing
  • Result
    • Experimental results
    • Input Importance
      • Adaptation and Optimization
    • Summary
  • Discussion
    • Method development issues
    • Future improvements and directions
    • Conclusions
  • Bibliography
  • Appendices
  • Hyper-parameters
  • Wind characteristics
  • Error Distribution
  • List of Figures
  • List of Tables

3.5 NEURAL NETWORKS

Another central concept in NuPIC is the Sparse Distributed Representation (SDR), which refers to the activation of a small percentage of the neurons at any given time. This neuronal activity is represented as an n-dimensional sparse binary vector x = [b_0, ..., b_n] with around 2% of the cells active. The outputs from the spatial pooler and the temporal memory are both SDRs, while the output from the encoder does not enforce this and is just a normal binary vector. A general overview of the properties of SDRs has been given by Ahmad and Hawkins [2015].

Encoder → Spatial Pooler → Temporal Memory → Classifier

Figure 3.3: Information flow of a single-region predictive model created with the OPF

The overlap between two different SDRs is defined by

o(x, y) = x · y   (3.15)

A match between two SDRs is defined by

m(x, y) ≡ o(x, y) ≥ θ   (3.16)

where θ is set so that θ ≤ ‖x‖₁ and θ ≤ ‖y‖₁. An interesting property of SDRs, one that is used multiple times inside the temporal memory and especially for predictions, is that a fixed-size set of SDRs can be reliably stored by taking their union as a single pattern. The boolean OR operator is used to create a


CHAPTER 3 METHOD AND MATERIALS

Value   Scalar Encoding
1       11111000000000
2       01111100000000
10      00000000011111

Table 3.3: Example where n = 14, r = 5, ψ = 1, of various scalar values encoded using a ScalarEncoder

new vector from a set of SDRs. The downside of storing patterns this way is that the more patterns we store, the bigger the probability of false positives.
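The overlap, match and union operations above can be sketched with plain binary vectors. This is a minimal illustration, not NuPIC's implementation; the vector width and activity level are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2048                                   # SDR width (illustrative)

def random_sdr(active=40):
    """Random SDR with roughly 2% of the bits active."""
    x = np.zeros(n, dtype=int)
    x[rng.choice(n, size=active, replace=False)] = 1
    return x

def overlap(x, y):
    """Overlap of two SDRs: count of shared active bits (eq. 3.15)."""
    return int(x @ y)

def match(x, y, theta):
    """Two SDRs match when their overlap reaches the threshold theta (eq. 3.16)."""
    return overlap(x, y) >= theta

# Store a fixed set of SDRs as a single pattern by taking the boolean OR (union).
patterns = [random_sdr() for _ in range(5)]
union = np.logical_or.reduce(patterns).astype(int)

# Every stored pattern still matches the union with theta set to its own activity.
assert all(match(p, union, theta=int(p.sum())) for p in patterns)
```

A fresh random SDR overlaps the union only by chance, which is why false positives stay rare until many patterns have been OR-ed together.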

Encoders

NuPIC contains many different encoders6; the job of an encoder is to convert raw input into a more suitable representation (i.e. a binary vector). The raw inputs are fed into the model using a dictionary data structure.

Useful encoders include scalar and categorical encoders, while more exotic encoders include one for Global Positioning System (GPS) coordinates; this encoder can be used to extract information about anomalous movements. The entries of the dictionary of raw inputs are each encoded separately and concatenated using a multi-encoder.

One property of the spatial pooler is that overlapping input patterns are mapped to the same SDR. This means that we want the encoder to encode the input so that similar inputs share bits. The ScalarEncoder fulfils this property by the process illustrated in table 3.3.

We calculate the range of values to encode using equation 3.17, where vmin represents the minimum value of the input signal and vmax denotes its upper bound.

vrange = vmax minus vmin (317)

There are three mutually exclusive parameters that determine the overall size of the output from a scalar encoder: (n, r, ψ). n directly represents the total number

6 A full list of all encoders can be found in the API documentation for NuPIC.


of bits in the output, and it must be greater than or equal to w, which represents the number of bits that are set to encode a single value, i.e. the "width" of the output signal7. r and ψ are specified with respect to the input, while w is specified with respect to the output. Two inputs separated by more than the radius have non-overlapping representations. Two inputs separated by less than the radius will in general overlap in at least some of their bits. Two inputs separated by greater than or equal to the resolution are guaranteed to have different representations. ψ can be calculated using equation 3.18.

ψ = r / w   (3.18)

Depending on whether we want periodic behaviour or not, n is calculated a bit differently; see the scalar encoder for implementation details8.
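The mapping in table 3.3 can be reproduced in a few lines. This is a non-periodic sketch only; NuPIC's actual ScalarEncoder supports periodic inputs and fuller parameter validation:

```python
def scalar_encode(value, vmin, vmax, n=14, w=5):
    """Encode a scalar as an n-bit string with a contiguous run of w active bits.
    Nearby values share bits; values at least one resolution apart differ."""
    if not vmin <= value <= vmax:
        raise ValueError("value outside [vmin, vmax]")
    n_buckets = n - w + 1                       # distinct start positions for the run
    resolution = (vmax - vmin) / (n_buckets - 1)
    start = int(round((value - vmin) / resolution))
    bits = ['0'] * n
    bits[start:start + w] = ['1'] * w
    return ''.join(bits)

print(scalar_encode(1, 1, 10))    # 11111000000000
print(scalar_encode(2, 1, 10))    # 01111100000000
print(scalar_encode(10, 1, 10))   # 00000000011111
```

With vmin = 1 and vmax = 10 this reproduces the three rows of table 3.3 exactly: consecutive values overlap in w - 1 = 4 bits.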

Spatial Pooler

The spatial pooler receives a binary vector as its input from an encoder and outputs an SDR. The structure of the spatial pooler consists of an input space and a set of columns; the output SDR represents which of the columns in a region are active. Each column has synapses connected to the input space. The SP connects each column to around 50% of the input space through randomly chosen potential synapses; this is called the "potential pool". Each synapse will connect and disconnect with the input space during learning.

The general information flow of the spatial pooler consists of the following steps: input stimuli from lower regions lead to activation of the input space to which each mini-column is synaptically connected; this activation, the so-called "overlap score", is computed for each column. The score is calculated as the total sum of the inputs of each neuron that tries to influence that column, weighted with a "boosting factor" that tries to increase the chance for certain columns to win more easily. A final top percentage (usually around 2% activity) of the columns with the highest influence (biggest overlap score) is chosen; columns also inhibit nearby columns, and the result of this process is a binary vector with few active bits, an SDR. This is illustrated in equation 3.19, where b can be either 0 or 1 and s is a value representing the score.

7 w must be odd to avoid centering problems.
8 https://github.com/numenta/nupic


\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \dots & b_n \end{bmatrix}}_{\text{Input vector}}
\cdot
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \dots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \dots & b_{2n} \\
\vdots & & & & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \dots & b_{dn}
\end{bmatrix}}_{\text{Connected synapses for each column}}
=
\underbrace{\begin{bmatrix} s_1 & s_2 & \dots & s_n \end{bmatrix}}_{\text{Overlap score}}
\xrightarrow{\text{Inhibition}}
\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \dots & b_n \end{bmatrix}}_{\text{Output SDR}}
\qquad (3.19)

Learning in this structure is done by adjusting the permanences of the proximal dendrite to better match the input: for each winning column, increase the permanence of the synapses that correctly matched the input and decrease the permanence of the rest. We also increase the boosting factor of losing columns to give them a bigger chance of winning next time.
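The overlap-score and inhibition steps of equation 3.19 can be sketched as follows. This is a simplified global-inhibition version; the column count, input size and 50% potential connectivity are illustrative choices, and permanence learning and boost updates are omitted:

```python
import numpy as np

def spatial_pool(input_vec, synapses, boost, sparsity=0.02):
    """One spatial-pooler step (sketch): overlap score per column, boosting,
    then global inhibition keeping only the top `sparsity` fraction of columns."""
    overlap = synapses @ input_vec             # overlap score per column (eq. 3.19)
    score = overlap * boost                    # boosting favours starved columns
    k = max(1, int(sparsity * len(score)))     # number of winning columns (~2%)
    winners = np.argsort(score)[-k:]
    sdr = np.zeros(len(score), dtype=int)
    sdr[winners] = 1
    return sdr

rng = np.random.default_rng(1)
n_cols, n_in = 200, 100
synapses = (rng.random((n_cols, n_in)) < 0.5).astype(int)   # ~50% potential pool
x = (rng.random(n_in) < 0.1).astype(int)                    # sparse input vector
out = spatial_pool(x, synapses, boost=np.ones(n_cols))
assert out.sum() == 4                                       # 2% of 200 columns active
```

Whatever the input density, the output always has the same fixed sparsity, which is the property the temporal memory relies on.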

Temporal Memory

The temporal memory receives an SDR from the spatial pooler, which represents the active columns; the output of the temporal memory is the activity of the whole CLA region. The general idea of the TM9 is that the cells in each mini-column provide temporal context to the pattern identified by the spatial pooler. The TM's job is to form transitions between different SDRs, and it achieves this by allowing active cells to form connections to previously active cells, so that in a future setting each cell is able to predict its own activity.

Each cell in a region has several distal segments, ideally one for each pattern the cell has transitioned from; in practice these segments are connected to a couple of patterns each. A cell enters a predictive state if there is enough activity on a segment, and every segment consists of a collection of connections (synapses) that have been formed to a subset of previously active cells (typically around 10-15 cells).

The temporal memory receives a sparse binary vector from the spatial pooler, and the general information flow for this part of the CLA goes through two phases. The first phase has the following steps: 1) for each active column, check whether there is any cell in a predictive

9 There are a lot of details in how the Temporal Memory is implemented; pseudo-code and more details can be found in [Numenta 2011]. The nupic git repository is the best source for finer details: https://github.com/numenta/nupic


state; if so, then 2) activate that particular cell. If no cell was found in a predictive state, then 3) activate all cells in that particular column, a process called bursting.

Bursting is analogous to saying: we do not know which cell to activate, because we have not seen this instance of the sequence of patterns before, and since we are unable to put the column into the correct temporal context, we activate all cells to reflect this uncertainty. Bursting does not occur when the temporal context has been predicted by the CLA. Equation 3.20 shows the first phase.

\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \dots & b_n \end{bmatrix}}_{\text{SP SDR}}
\circ
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \dots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \dots & b_{2n} \\
\vdots & & & & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \dots & b_{dn}
\end{bmatrix}}_{\text{Predictive State}}
=
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \dots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \dots & b_{2n} \\
\vdots & & & & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \dots & b_{dn}
\end{bmatrix}}_{\text{Active State}}
\qquad (3.20)

The second phase of the algorithm figures out which cells should be put into a predictive state for the next time step. This is achieved by checking every distal segment on every cell for activity above a certain threshold, i.e. checking for active segments. One active segment is enough to put a cell into a predictive state. The output from the temporal memory is a vector representing the state of all cells in that region. Equation 3.21 shows phase 2.

\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \dots & b_n \end{bmatrix}}_{\text{Active State}}
\cdot
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \dots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \dots & b_{2n} \\
\vdots & & & & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \dots & b_{dn}
\end{bmatrix}_X}_{\text{Segment } X}
=
\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \dots & b_n \end{bmatrix}_X}_{\text{Segment Activation } X}
> \tau
\;\rightarrow\;
\underbrace{\begin{bmatrix} s_1 & s_2 & s_3 & \dots & s_n \end{bmatrix}}_{\text{Predictive State}}
\qquad (3.21)

If learning is turned on, update the permanences of the synapses connected to active distal segments (in the same way as in the spatial pooler). These changes are marked as temporary until we are sure that a cell is in fact able to correctly predict something with these new changes; once this is known, the change either becomes permanent or is removed. Temporarily marked cells are updated whenever a cell goes from being inactive to active from a feed-forward input (confirm the change, as we correctly predicted the feed-forward activation). If a cell instead went from active to inactive, undo the change.


NuPIC Classifiers

There are many different classifiers included with NuPIC, such as a k-Nearest Neighbour (kNN) classifier and a Support Vector Machine (SVM); the OPF in particular uses a custom-built classifier called the "CLAClassifier". The purpose of this classifier is to decode predictions made by the CLA. Classification with the CLAClassifier is performed in different ways depending on how the input has been encoded.

Scalar values are decoded using a process where each cell in a CLA region is paired with two histograms: one histogram keeps track of the frequency of encountered patterns associated with each cell, while the other keeps track of a moving average for each bucket. This process is illustrated in figure 3.4.

[Figure: an SDR across columns 1 to N; each cell keeps two histograms over the bucket range (min value to max value), one for pattern likelihood and one for a moving average]

Figure 3.4: The CLAClassifier

Training in NuPIC

Training the NuPIC model is done using online learning algorithms. The schema seen in figure 3.5 is used to train and test these models. We have 7 different wind farms, so this schema is repeated for each respective wind farm.

This schema is slightly adjusted from the traditional way the OPF handles multi-step predictions. The default way of doing this is to use 48 different models, one for


every step ahead in the CLAClassifier, which would give us 48 · 7 = 336 models in total (for all the wind farms). Not only does this change match Expektra's approach, but it also reduces the memory requirements for the CLAClassifier.

[Figure: training phase: the dataset feeds a pre-training data chunk into PSO swarming or a manual hyperparameter setup, producing an OPF model with online learning activated that consumes the training data stream and emits predictions; testing phase: the dataset feeds the OPF model with online learning deactivated, which produces multistep predictions on the testing data]

Figure 3.5: Training an OPF model

Input and hyperparameter selection

Finding hyperparameters for an OPF model was partly done using the custom built-in PSO algorithm and partly configured manually, mainly by ensuring that the PSO algorithm did not remove encoders. A single CLA region contains a lot of hyperparameters; these parameters are listed, with corresponding descriptions, in appendix A. The inputs to the model are date, ws, wp, u and v.


Chapter 4

Result

This chapter presents the primary results obtained in this study. Figures 4.4-4.10 contain different error measurements on the test set for the respective wind farms. We see that Expektra's ANN model performs well over all wind farms, while the NuPIC model performs worse than Expektra but in general still better than our reference model. NuPIC is off target, with a bias error on most wind farms. The graphs also indicate that NuPIC is unable to pick up some trends, visible in the cumulated ε² graph; the Expektra model, on the other hand, shows no clear problems in cumulated ε² and is on target on all wind farms. Appendix C has been included to show this for different lead times.

[Figure: left panel, cumulative probability vs wind speed (0 to 20 m/s); right panel, normalized power output vs wind speed and wind direction]

Figure 4.1: Left diagram: cumulative probability of wind speed. Right diagram: scatter diagram of the power curve, production vs wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)


CHAPTER 4 RESULT

Figure 4.2: Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)

Figure 4.3: Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)


[Figure: four panels for Wind Farm 1 comparing Expektra, NuPIC and Persistence: NBIAS, NRMSE and NMAE vs look-ahead time k (in hours), and cumulated ε² vs time (in hours)]

Figure 4.4: Different error measurements for WF 1


[Figure: four panels for Wind Farm 2 comparing Expektra, NuPIC and Persistence: NBIAS, NRMSE and NMAE vs look-ahead time k (in hours), and cumulated ε² vs time (in hours)]

Figure 4.5: Different error measurements for WF 2


[Figure: four panels for Wind Farm 3 comparing Expektra, NuPIC and Persistence: NBIAS, NRMSE and NMAE vs look-ahead time k (in hours), and cumulated ε² vs time (in hours)]

Figure 4.6: Different error measurements for WF 3


[Figure: four panels for Wind Farm 4 comparing Expektra, NuPIC and Persistence: NBIAS, NRMSE and NMAE vs look-ahead time k (in hours), and cumulated ε² vs time (in hours)]

Figure 4.7: Different error measurements for WF 4


[Figure: four panels for Wind Farm 5 comparing Expektra, NuPIC and Persistence: NBIAS, NRMSE and NMAE vs look-ahead time k (in hours), and cumulated ε² vs time (in hours)]

Figure 4.8: Different error measurements for WF 5


[Figure: four panels for Wind Farm 6 comparing Expektra, NuPIC and Persistence: NBIAS, NRMSE and NMAE vs look-ahead time k (in hours), and cumulated ε² vs time (in hours)]

Figure 4.9: Different error measurements for WF 6


[Figure: four panels for Wind Farm 7 comparing Expektra, NuPIC and Persistence: NBIAS, NRMSE and NMAE vs look-ahead time k (in hours), and cumulated ε² vs time (in hours)]

Figure 4.10: Different error measurements for WF 7


                          Wind Farm
User            1      2      3      4      5      6      7      All
Leustagos       0.145  0.138  0.168  0.144  0.158  0.133  0.140  0.146
DuckTile        0.143  0.145  0.172  0.145  0.165  0.137  0.146  0.148
MZ              0.141  0.151  0.174  0.145  0.167  0.141  0.145  0.149
Propeller       0.144  0.153  0.177  0.147  0.175  0.141  0.147  0.152
Duehee Lee      0.157  0.144  0.176  0.160  0.169  0.154  0.148  0.155
Expektra        0.165  0.158  0.184  0.164  0.179  0.153  0.153  0.165
MTU EE5260      0.161  0.172  0.193  0.162  0.192  0.156  0.160  0.168
SunWind         0.174  0.177  0.193  0.176  0.179  0.157  0.162  0.172
ymzsmsd         0.163  0.186  0.200  0.164  0.192  0.162  0.167  0.174
4138 Kalchas    0.180  0.179  0.197  0.175  0.200  0.160  0.165  0.177
NuPIC           0.243  0.254  0.264  0.310  0.290  0.224  0.240  0.264
Persistence     0.302  0.338  0.373  0.364  0.388  0.341  0.361  0.355

Table 4.1: NRMSE scores of the entries published in [Hong et al. 2014]. The NuPIC and Expektra models are added so we can easily compare the results.

4.1 Experimental results

Looking at the primary results presented in this study, we see in table 4.1 that Expektra's ANN model achieves performance comparable to the other methods published with Hong et al. [2014]. This can be used as a starting point for further investigations. A similar approach to Expektra's is Duehee Lee's [Lee and Baldick 2014], as both methods are based on the same neural network architecture but with different optimization techniques. Duehee Lee's network is structured slightly differently from Expektra's model, which consists of only one output node, whereas Duehee Lee uses five (representing five predictions ahead). Besides these differences, Duehee Lee also includes additional sub-models to create a prediction ensemble, blending the output of the ANN with a Gaussian Process (GP) to improve very short-term predictions. MTU EE5260 is also a neural network approach that converts meteorological forecasts to power, and here the score is very close. The HTM CLA beats the persistence model but needs more work to outperform the other models in GEFCom.


4.2 Input Importance

For interpretation purposes, in order to understand the model better, an analysis of the relative input parameter importance was performed. This analysis is illustrated in figure 4.11. Each box represents added noise on that channel, which results in a higher NRMSE score if the feature was important. A reference point, "all-channels", represents the error distribution of the model with no input replacement. We clearly see in this figure that the wind speed channel ws is the most important attribute; adding noise to this channel greatly affects the NRMSE score. We also see that the wind components u, v show little to no influence, and that the timestamp-related inputs hours and week both indicate that there are seasonal and daily trends present in the dataset.
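The HIPR procedure described above can be sketched generically: corrupt one channel at a time with uniform noise and record the increase in error over the all-channels reference. The function and parameter names below are illustrative, not the thesis implementation:

```python
import numpy as np

def hipr_importance(predict, X, y, error_fn, seed=0):
    """Holdback Input Randomization (sketch): replace each input channel with
    uniform noise over its own range and measure how much the error grows
    relative to the untouched "all-channels" baseline."""
    rng = np.random.default_rng(seed)
    baseline = error_fn(predict(X), y)          # all-channels reference point
    importance = {}
    for j in range(X.shape[1]):
        Xn = X.copy()
        lo, hi = X[:, j].min(), X[:, j].max()
        Xn[:, j] = rng.uniform(lo, hi, size=len(X))
        importance[j] = error_fn(predict(Xn), y) - baseline
    return importance

# Toy check: a "model" that only uses channel 0 should rank it highest.
rmse = lambda p, t: float(np.sqrt(np.mean((p - t) ** 2)))
X = np.random.default_rng(1).uniform(0, 1, size=(200, 2))
y = X[:, 0]
scores = hipr_importance(lambda X: X[:, 0], X, y, rmse)
assert scores[0] > scores[1]
```

An unused channel scores zero by construction, which is the behaviour seen for u and v in figure 4.11.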

[Figure: box plots of NRMSE (0.2 to 0.6) for each noised channel: all-channels, hours, u, v, week, ws, ws-1, ws-2, ws-3, ws+1, ws+2, ws+3]

Figure 4.11: Relative input parameter importance using HIPR. "all-channels" reflects the reference point: the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind.


4.2.1 Adaptation and Optimization

All training for Expektra's model was performed on a MacBook Pro with an Intel Core 2 Duo processor (P8600, 2.40 GHz), a 64-bit operating system and 4 GB of installed RAM. After adapting the code to run experiments on the GEFCom dataset, an investigation was performed to evaluate the performance of the core source code given by Expektra. This investigation identified some slow parts of the code; after fixing these issues, mainly by using a Math.NET native library instead of the managed provider, a speed test was performed comparing the unoptimized version with the optimized one. Figure 4.12 shows the speed-up achieved during training using the LM algorithm.

[Figure: training time for the Normal vs the Optimized implementation as the number of hidden neurons grows (10, 15, 20, 25)]

Figure 4.12: Plot showing the unoptimized version vs the optimized one when training a network with 10 input neurons, 10/15/20/25 hidden neurons and 1 output neuron. Training was done for 100 epochs using LM.

4.3 Summary

A plot showing the average NRMSE improvement over the persistence model can be seen in figure 4.13. We see that both models perform better than persistence; Expektra's model performs better than the NuPIC model over the whole forecast


horizon, and NuPIC performs better than persistence towards the end of the forecast horizon.

[Figure: improvement (%) in NRMSE over persistence vs look-ahead time (0 to 50 hours) for Expektra and NuPIC]

Figure 4.13: Summarized average improvement over all wind farms, with 95% confidence intervals.


Chapter 5

Discussion

With the exponential growth of computing power it becomes easier to study deeper and more complex networks, and with recent advances in the area of deep learning a new interest in ANNs has resurged. In practice, ANNs have been used by most groups in the field of WPF, but these networks never caught on, as it was argued that the improvements made by ANNs were not usually enough to justify the extra effort of training them [Giebel et al. 2011]. This is steadily becoming less of an issue as computation becomes cheaper and bigger networks perform better.

By using a simple MLP network we observe that we are able to obtain results similar in performance to other models published in Hong et al. [2014]. This can be used as a starting point for further investigations. The GEFCom dataset has a limited number of input features; if we had access to a wider range of them, the power of neural networks could be studied more in depth. Given the number of features in the dataset, this is harder to do.

Using persistence as a reference model was done in order to have the same baseline as the one used in GEFCom, but a better reference model should be considered in the future, such as the one presented in Nielsen et al. [1998], especially if longer forecasts are to be considered.
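For reference, the persistence baseline and the NRMSE scoring used throughout this work are both a few lines each. This is a sketch; power is assumed to be normalized to installed capacity, as in the GEFCom data, so the normalization constant is 1:

```python
import numpy as np

def persistence(power, k):
    """Persistence forecast: the k-step-ahead prediction is simply the last
    observed value, p_hat(t + k | t) = p(t)."""
    return np.asarray(power)[:-k]

def nrmse(pred, actual):
    """Root-mean-square error on normalized power (capacity = 1)."""
    pred, actual = np.asarray(pred), np.asarray(actual)
    return float(np.sqrt(np.mean((pred - actual) ** 2)))

p = np.array([0.10, 0.20, 0.40, 0.30, 0.50])      # normalized production series
pred = persistence(p, k=1)                        # [0.10, 0.20, 0.40, 0.30]
err = nrmse(pred, p[1:])
assert 0.15 < err < 0.17                          # sqrt of the mean squared step change
```

Because persistence simply repeats the last observation, its error grows with the look-ahead k, which is why the improvement curves in figure 4.13 widen at longer horizons.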

5.1 Method development issues

Working with Expektra's model is very straightforward and no major issues were encountered during development, but some issues were encountered working with NuPIC1:

1 It should also be pointed out that it helps to have the people who developed the code on the spot, and this is probably the main reason for the little trouble.


CHAPTER 5 DISCUSSION

1. Installing NuPIC is difficult. This seems to be a general problem, judging from posts on the NuPIC mailing list2, and a very important issue to fix if they want more people to work with this model.

2. The built-in swarming (PSO) did not work well, especially for slightly larger datasets and for 48 prediction steps. The main problems were fitting everything into memory and that the code ran too slowly when swarming over multiple steps ahead, making it practically impossible to use. Scaling down the problem by using only a few prediction steps and multiple models helped, but the swarm model kept dismissing wind speed as an important feature, which was fixed by specifically telling it not to touch that encoder.

3. Inconsistencies between the white-paper documentation of the HTM CLA and the actual implementation make it hard to understand the underlying principles of NuPIC. NuPIC presents a very complex network, and any inconsistencies in the documentation make it hard to understand. The code is quite well documented, but the additional material is a bit sparse.

These issues are all understandable and are continuously being improved upon. NuPIC is still a young platform, so issues are to be expected given the complexity and research nature of the project. Training the CLAClassifier to use many-step predictions instead of just one will most likely reduce the bias error seen in the results; to investigate this, a more powerful computer is needed.

There are some differences in the input between the models. This setup was created because an OPF model needs to encode the value it is going to predict. SCADA data is available, so this is not an issue, but it may introduce some unfairness between NuPIC and Expektra. In the current setup NuPIC does not seem to gain anything extra from the additional information, and other models in the competition will most likely use different inputs as well.

In general, NuPIC differs in the way you feed it data: you send in only the front of the signal, and temporal context is achieved through the temporal memory.

5.2 Future improvements and directions

One issue with the GEFCom dataset is that it does not contain any specific information about the location of these 7 wind farms. It is worth considering taking a look at how to handle different models that are specialized for different types

2 http://numenta.org/lists


of terrain, as this has been shown to increase performance [Kariniotakis et al., 2006]. Another thing to investigate could be a more advanced scheme for hyperparameter selection; Jursa and Rohrig [2008] show that the WPF error can be reduced by smarter use of optimization algorithms for feature selection and hyperparameter selection, which should improve performance.

In general, having more types of inputs from sources like SCADA and NWP systems could give a lot of valuable information that can be used for better predictions. It is clear that wind speed is a key feature for producing good forecasts, which is also supported by the input analysis in section 4.2. It has been argued that the error in WPF models stems largely from wrong weather forecasts [Giebel et al., 2011], so identifying good sources of weather forecasts would be desirable for more effective model learning. Combining several different weather forecasts could also be investigated, as Nielsen et al. [2007] showed that power forecasts based on a number of different meteorological forecasts were better than those from a single source.
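The forecast-combination idea can be sketched as a simple weighted average over several power forecasts for the same lead times; note that Nielsen et al. [2007] estimate the weights more carefully (typically per lead time), so the equal-weight default below is only an illustrative assumption:

```python
def combine_forecasts(forecasts, weights=None):
    """Combine several power forecasts (lists indexed by lead time) into one.
    With no weights given this is a plain average; in practice the weights
    would be estimated from past forecast errors, e.g. per lead time."""
    k = len(forecasts)
    if weights is None:
        weights = [1.0 / k] * k
    horizon = len(forecasts[0])
    return [sum(w * f[i] for w, f in zip(weights, forecasts)) for i in range(horizon)]
```

Even this naive combination tends to reduce variance when the individual meteorological forecasts have partly independent errors.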

Regarding HTM CLA, it is worth considering implementing a custom encoder, one specifically targeted at data concerning wind farms. SCADA data could also be streamed directly into a NuPIC model, which would allow for some interesting online predictions and anomaly detection.
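One concrete candidate for such a custom encoder is a cyclic encoding of wind direction, so that 359° and 1° receive overlapping representations. This is only an illustrative sketch, not NuPIC's encoder API; the ring size n and active width w are assumed values:

```python
def encode_wind_direction(deg, n=72, w=9):
    """Cyclic encoding of wind direction: w active bits on a ring of n bits,
    so directions near 0 and 360 degrees get overlapping representations."""
    bits = [0] * n
    start = int(round((deg % 360) / 360.0 * n))
    for j in range(start, start + w):
        bits[j % n] = 1          # wrap around the ring
    return bits
```

A plain scalar encoding would place 1° and 359° at opposite ends of the bit array; the ring wrap-around is what makes the representation respect the circular nature of wind direction.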

Training times are always an issue, especially for very large models. Both NuPIC and Expektra's model would benefit from being implemented in a more parallel fashion, as these algorithms are very well suited for parallel computing. The speed-up would be of great value, so looking into and implementing a GPU version of the algorithms could be worthwhile.

5.3 Conclusions

The field of energy forecasting is still young, and the necessity for good forecasting techniques is constantly growing. This initial study shows that part of Expektra's forecasting method can be used to forecast wind power generation. We observe that this ANN is able to achieve performance comparable to other methods published in Hong et al. [2014]. Additional work is still needed to obtain state-of-the-art results. Numenta's model is able to beat the reference model but still needs additional work before we can draw definite conclusions about its performance. Constructing a custom encoder could be a first step in that direction.


Bibliography

Subutai Ahmad and Jeff Hawkins. Properties of sparse distributed representations and their application to hierarchical temporal memory. arXiv preprint arXiv:1503.07469, 2015.

T.C. Akinci. Short term wind speed forecasting with ANN in Batman, Turkey. Elektronika ir Elektrotechnika, 107(1):41–45, 2015.

E. Michael Azoff. Neural Network Time Series Forecasting of Financial Markets. John Wiley & Sons, Inc., 1994.

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13(1):281–305, 2012.

Daniel P. Buxhoeveden and Manuel F. Casanova. The minicolumn hypothesis in neuroscience. Brain, 125(5):935–951, 2002.

Erasmo Cadenas and Wilfrido Rivera. Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks. Renewable Energy, 34(1):274–278, 2009.

Dmitri B. Chklovskii, B.W. Mel, and K. Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782–788, 2004.

A. Costa, A. Crespo, J. Navarro, G. Lizcano, H. Madsen, and E. Feitosa. A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews, 12(6):1725–1744, August 2008. ISSN 1364-0321. doi: 10.1016/j.rser.2007.01.015. URL http://dx.doi.org/10.1016/j.rser.2007.01.015.

Russ C. Eberhart and James Kennedy. A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science, volume 1, pages 39–43, New York, NY, 1995.

Shu Fan, James R. Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. Energy Conversion, IEEE Transactions on, 24(2):474–482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento: a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276. Citeseer, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition EWEC 2008, 6 pages. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving Hierarchical Temporal Memory-Based Trading Models. Springer, 2013.

M.A. Gaertner, C. Gallardo, C. Tejeda, N. Martínez, S. Calabria, N. Martínez, and B. Fernández. The Casandra project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. In Proc. of the 2001 European Wind Energy Conference, EWEC'01, Copenhagen, Denmark, pages 777–780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: a literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G. Lobo. Sipreólico: wind power prediction tool for the Spanish peninsular power system. In Proceedings of the CIGRÉ 40th General Session and Exhibition, París, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On Intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357–363, 2014.

Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41–46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585–592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694–709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference, EWEC'07, 2007.

G. Kariniotakis. Position paper on Joule project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J. Halliday, R. Brownsword, Ignacio Marti, Ana Maria Palomares, I. Cruz, H. Madsen, T.S. Nielsen, Henrik Aa. Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power: overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, 10 pages, 2006.

G.N. Kariniotakis, G.S. Stavrakakis, and E.F. Nogaret. Wind power forecasting using advanced neural networks models. Energy Conversion, IEEE Transactions on, 11(4):762–767, 1996.

Stanley J. Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326–334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275–293, 2009.

Lars Landberg and Simon J. Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171–195, 1994.

S. Lang, C. Mohrlen, J. Jorgensen, B. Ó Gallachóir, and E. McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International Solar Energy Society UK Section Conference, volume 85, page 89, 2006.

S.M. Lawan, W.A.W.Z. Abidin, W.Y. Chai, A. Baharun, and T. Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760–1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. Smart Grid, IEEE Transactions on, 5(1):501–510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets, 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164–168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313–2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154–3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa. Nielsen, and Torben S. Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475–489, 2005.

Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431–441, 1963.

Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons. 1969.

Marvin Lee Minsky and Oliver G. Selfridge. Learning in random nets. MIT Lincoln Laboratory, 1960.

Jorge J. Moré. The Levenberg-Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105–116. Springer, 1978.

Henrik Aa. Nielsen, Torben S. Nielsen, Henrik Madsen, Maria J. Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471–482, 2007.

Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29–34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power, 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159–167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms, v0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33–57, 2007.

A. Rodrigues, J.A. Peças Lopes, P. Miranda, L. Palma, C. Monteiro, R. Bessa, J. Sousa, C. Rodrigues, and J. Matos. EPREV: a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference, EWEC, volume 7, 2007.

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5:3, 1988.

Jürgen Schmidhuber. Deep learning in neural networks: an overview. Neural Networks, 61:85–117, 2015.

S. Sinkevicius, R. Simutis, and V. Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91–96, 2011.

Ke-Sheng Wang, Vishal S. Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61–69, 2014.

WWEA. 2014 half-year report, pp. 1–8. Technical report, WWEA, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365–376, 2013.

Hao Yu and Bogdan M. Wilamowski. Levenberg-Marquardt training. Industrial Electronics Handbook, 5:12–1, 2011.

Appendix A

Hyper-parameters

Parameters for the spatial pooler

Alias                  Default  Description
columnCount            -        The number of cell columns in a cortical region.
globalInhibition       false    If true, during the inhibition phase the winning columns are selected as the most active columns from the region as a whole.
numActivePerInhArea    10       The maximum number of active columns per inhibition area.
synPermActiveInc       0.1      The amount by which an active synapse is incremented in each round, specified as a percent of a fully grown synapse.
synPermConnected       0.10     Controls the threshold at which synapses count as connected.
synPermInactiveDec     0.01     The amount by which an inactive synapse is decremented in each round.
potentialRadius        16       Determines the extent of the input that each column can potentially be connected to.

Table A.1: Configuration parameters for the spatial pooler.


Parameters for the scalar encoder

Alias       Symbol  Description
w           w       Number of bits to set in the output.
minval      vmin    The lower bound of the input value.
maxval      vmax    The upper bound of the input value.
n           n       Number of bits in the representation (n must be > w).
radius      r       Inputs separated by more than or equal to this distance will have non-overlapping representations.
resolution  ψ       Inputs separated by more than or equal to this distance will have different representations.

Table A.2: Configuration parameters for the scalar encoder.


Parameters for the temporal memory

Alias                  Default  Description
activationThreshold    12       Activation threshold for segments.
cellsPerColumn         32       Number of cells per column.
columnCount            2048     The number of cell columns in a cortical region.
globalDecay            0.10     Decrements all synapses a little bit all the time.
initialPerm            0.11     Initial permanence value for a synapse.
inputWidth             -        Size of the input.
maxAge                 100000   Controls global decay. Global decay will only decay segments that have not been activated for maxAge iterations, and will only do the global decay loop every maxAge iterations.
maxSegmentsPerCell     -        The maximum number of segments a cell can have.
maxSynapsesPerSegment  -        The maximum number of synapses a segment can have.
minThreshold           8        The minimum required activity for a segment to learn.
newSynapseCount        15       The maximum number of synapses added to a segment during learning.
permanenceDec          0.10     How much permanence is removed from synapses when learning occurs.
permanenceInc          0.10     How much permanence is added to synapses when learning occurs.
temporalImp            cpp/py   Controls which temporal memory implementation to use.

Table A.3: Configuration parameters for the temporal memory.
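For reference, the defaults above can be collected into plain parameter dictionaries. This is a hedged sketch using the aliases from the tables; a real NuPIC model nests similar values under modelParams/spParams and tmParams, and its exact schema differs:

```python
# Hypothetical flat dictionaries built from the defaults in Tables A.1 and A.3.
SPATIAL_POOLER_DEFAULTS = {
    "globalInhibition": False,   # winners selected per inhibition area, not globally
    "numActivePerInhArea": 10,
    "synPermActiveInc": 0.1,
    "synPermConnected": 0.10,
    "synPermInactiveDec": 0.01,
    "potentialRadius": 16,
}

TEMPORAL_MEMORY_DEFAULTS = {
    "activationThreshold": 12,
    "cellsPerColumn": 32,
    "columnCount": 2048,
    "globalDecay": 0.10,
    "initialPerm": 0.11,
    "maxAge": 100000,
    "minThreshold": 8,
    "newSynapseCount": 15,
    "permanenceDec": 0.10,
    "permanenceInc": 0.10,
}
```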


Appendix B

Wind characteristics

Figure B.1: Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components. The intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Figure B.2: Wind characteristics for WF 3-7, GEFCom dataset, showing zonal and meridional wind components. The intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Appendix C

Error Distribution


Each figure in this appendix consists of six histogram panels showing the forecast error (x-axis, from −1.0 to 1.0) against its frequency (y-axis, from 0 to 70), one panel per lead time: 48, 40, 30, 20, 10 and 1.

Figure C.1: Error distribution for different lead times, WF 1 (NuPIC).

Figure C.2: Error distribution for different lead times, WF 2 (NuPIC).

Figure C.3: Error distribution for different lead times, WF 3 (NuPIC).

Figure C.4: Error distribution for different lead times, WF 4 (NuPIC).

Figure C.5: Error distribution for different lead times, WF 5 (NuPIC).

Figure C.6: Error distribution for different lead times, WF 6 (NuPIC).

Figure C.7: Error distribution for different lead times, WF 7 (NuPIC).

Figure C.8: Error distribution for different lead times, WF 1 (Expektra).

Figure C.9: Error distribution for different lead times, WF 2 (Expektra).

Figure C.10: Error distribution for different lead times, WF 3 (Expektra).

Figure C.11: Error distribution for different lead times, WF 4 (Expektra).

Figure C.12: Error distribution for different lead times, WF 5 (Expektra).

Figure C.13: Error distribution for different lead times, WF 6 (Expektra).

Figure C.14: Error distribution for different lead times, WF 7 (Expektra).

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

Figure C8 Error distribution for different lead times WF 1

69

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf2 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

Figure C9 Error distribution for different lead times WF 2

70

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

Figure C10 Error distribution for different lead times WF 3

71

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf4 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

Figure C11 Error distribution for different lead times WF 4

72

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

Figure C12 Error distribution for different lead times WF 5

73

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf6 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

Figure C13 Error distribution for different lead times WF 6

74

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

Figure C14 Error distribution for different lead times WF 7

75

List of Figures

2.1  A figure that presents the general outline when forecasting using the statistical approach
2.2  A figure that presents the general steps when forecasting using a physical model
3.1  The perceptron
3.2  Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of H hidden layers as well as M input connections. Each edge seen in this graph has a weight wij associated with it
3.3  Information flow of a single region predictive model created with the OPF
3.4  The CLAClassifier
3.5  Training an OPF model
4.1  Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.2  Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.3  Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.4  Different error measurements for WF 1
4.5  Different error measurements for WF 2
4.6  Different error measurements for WF 3
4.7  Different error measurements for WF 4
4.8  Different error measurements for WF 5
4.9  Different error measurements for WF 6
4.10 Different error measurements for WF 7
4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point: the network when no channel has been exposed to noise. ws = wind speed, hours/week = timestamps, u, v is a directional vector of the wind
4.12 Plot showing the unoptimized version vs the optimized one when training a network with 10 input neurons, 10/15/20/25 hidden neurons and 1 output neuron. Training was done for 100 epochs using LM
4.13 Summarized average improvement over all wind farms with 95% confidence intervals
B.1  Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated
B.2  Wind characteristics for WF 3-7, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated
C.1  Error distribution for different lead times, WF 1
C.2  Error distribution for different lead times, WF 2
C.3  Error distribution for different lead times, WF 3
C.4  Error distribution for different lead times, WF 4
C.5  Error distribution for different lead times, WF 5
C.6  Error distribution for different lead times, WF 6
C.7  Error distribution for different lead times, WF 7
C.8  Error distribution for different lead times, WF 1
C.9  Error distribution for different lead times, WF 2
C.10 Error distribution for different lead times, WF 3
C.11 Error distribution for different lead times, WF 4
C.12 Error distribution for different lead times, WF 5
C.13 Error distribution for different lead times, WF 6
C.14 Error distribution for different lead times, WF 7

List of Tables

3.1  The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, missing data with power observations are available for updating the models
3.2  Preprocessed features from the dataset. The Forecast category represents the forecast given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the feature we will use in training and testing
3.3  Example where n = 14, r = 5, ψ = 1 of various encoded scalar values using a ScalarEncoder
4.1  NRMSE score of the entries published in [Hong et al., 2014]. The NuPIC model and Expektra model are added so we can easily compare the results
A.1  Configuration parameters for the spatial pooler
A.2  Configuration parameters for the scalar encoder
A.3  Configuration parameters for the temporal memory



  • Contents
  • Introduction
    • Problem Formulation
    • The scope of the problem
  • Background
    • Neural Networks and Time Series Prediction
  • Method and Materials
    • Preliminaries
      • Remarks
      • Definitions
      • Reference models
      • Error metrics
      • Model selection
      • Evaluation
    • Experiments
    • Holdback Input Randomization
    • Optimization methods
    • Neural Networks
      • Multilayer Perceptron
      • Numenta Platform for Intelligent Computing
  • Result
    • Experimental results
    • Input Importance
    • Adaptation and Optimization
    • Summary
  • Discussion
    • Method development issues
    • Future improvements and directions
    • Conclusions
  • Bibliography
  • Appendices
  • Hyper-parameters
  • Wind characteristics
  • Error Distribution
  • List of Figures
  • List of Tables

CHAPTER 3 METHOD AND MATERIALS

Value  Scalar Encoding
1      11111000000000
2      01111100000000
10     00000000011111

Table 3.3: Example where n = 14, r = 5, ψ = 1 of various scalar values encoded using a ScalarEncoder.

new vector from a set of SDRs. The downside of storing patterns this way is that the more patterns we store, the bigger the probability of false positives.
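The false-positive chance mentioned above can be quantified with a hypergeometric overlap argument, following the analysis in Ahmad and Hawkins [2015]; a minimal sketch (the function name and the at-least-θ match criterion are assumptions for illustration):

```python
from math import comb

def sdr_false_positive_prob(n, w, theta):
    # Probability that a random SDR with w of n bits active overlaps a
    # stored SDR in at least theta bits. Storing more patterns in a
    # union multiplies this per-pattern risk, which is why false
    # positives grow with the number of stored patterns.
    total = comb(n, w)
    hits = sum(comb(w, k) * comb(n - w, w - k) for k in range(theta, w + 1))
    return hits / total
```

Lowering the match threshold θ makes matches easier and the false-positive probability correspondingly larger.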

Encoders

NuPIC contains many different encoders.⁶ The job of an encoder is to convert raw input into a more suitable representation (i.e. a binary vector). The raw input is fed into the model using a dictionary data structure.

Useful encoders include scalar and categorical encoders, while more exotic encoders include one for Global Positioning System (GPS) coordinates; this encoder can be used to extract information about anomalous movements. The entries of the dictionary representation of raw inputs are each encoded separately and concatenated using a multi encoder.

One property of the spatial pooler is that overlapping input patterns are mapped to the same SDR. This means that we want the encoder to encode input so that similar inputs share bits. The ScalarEncoder fulfils this property by the process illustrated in table 3.3.

We calculate the range of values to encode using equation 3.17, where v_min represents the minimum value of the input signal and v_max denotes its upper bound.

v_range = v_max − v_min    (3.17)

There are three mutually exclusive parameters that determine the overall size of the output from a scalar encoder: (n, r, ψ). n directly represents the total number of bits in the output, and it must be bigger than or equal to w, which represents the number of bits that are set to encode a single value, i.e. the "width" of the output signal.⁷ r and ψ are specified with respect to the input, while w is specified with respect to the output. Two inputs separated by more than the radius have non-overlapping representations. Two inputs separated by less than the radius will in general overlap in at least some of their bits. Two inputs separated by greater than or equal to the resolution are guaranteed to have different representations. ψ can be calculated using equation 3.18.

⁶A full list of all encoders can be found in the API documentation for NuPIC.

ψ = r / w    (3.18)

Depending on whether we want a periodic behaviour or not, n is calculated a bit differently; see the scalar encoder for implementation details.⁸
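The interaction between these parameters can be sketched for the non-periodic case as follows (the function name and rounding details are assumptions; the real ScalarEncoder handles periodic inputs and edge cases differently):

```python
def scalar_encode(value, v_min, v_max, n=14, w=5):
    # Sketch of non-periodic scalar encoding: a run of w contiguous
    # active bits whose start position is proportional to the value's
    # offset within [v_min, v_max] (equation 3.17).
    v_range = v_max - v_min
    start = int(round((value - v_min) / v_range * (n - w)))
    bits = [0] * n
    for i in range(start, start + w):
        bits[i] = 1
    return bits
```

With v_min = 1 and v_max = 10 this reproduces the three encodings of table 3.3, and nearby values share bits, which is exactly the overlap property the spatial pooler relies on.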

Spatial Pooler

The spatial pooler receives a binary vector as its input from an encoder and outputs an SDR. The structure of the spatial pooler consists of an input space and a set of columns; the output SDR represents which of the columns in a region are active. Each column has synapses connected to the input space. The SP consists of around 50% randomly and potentially connected synapses; this is called the "potential pool". Each synapse will connect and disconnect with the input space during learning.

The general information flow of the spatial pooler consists of the following steps. Input stimuli from lower regions lead to activation of the input space, to which each mini-column is synaptically connected; this activation, the so-called "overlap score", is set for each column. The score is calculated based on the total sum of the neurons that try to influence that column, weighted with a "boosting factor" that tries to increase the chance for certain columns to win more easily. A final top percentage (usually around 2% activity) of the columns with the highest influence (biggest overlap score) will be chosen; columns also inhibit close-by columns, and the result of this process is a binary vector with few active bits, an SDR. This is illustrated in 3.19, where b can either be 0 or 1 and s is a value representing the score.

⁷w must be odd to avoid centering problems.
⁸https://github.com/numenta/nupic


\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \dots & b_n \end{bmatrix}}_{\text{Input vector}}
\cdot
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \dots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \dots & b_{2n} \\
\vdots & & & & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \dots & b_{dn}
\end{bmatrix}}_{\text{Connected synapses for each column}}
=
\underbrace{\begin{bmatrix} s_1 & s_2 & \dots & s_n \end{bmatrix}}_{\text{Overlap score}}
\xrightarrow{\text{Inhibition}}
\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \dots & b_n \end{bmatrix}}_{\text{Output SDR}}
\qquad (3.19)

Learning in this structure is done by adjusting the permanence of the proximal dendrite to better match the input: for each winning column, increase the permanence of the synapses that correctly matched the input and decrease the permanence of the rest. We also increase the boosting factor for losing columns to give them a bigger chance of winning next time.
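The overlap-score and inhibition steps of equation 3.19 can be sketched as follows (a simplified dense version with assumed names; the real spatial pooler also applies permanence thresholds and supports local rather than global inhibition):

```python
import numpy as np

def spatial_pool(input_vec, synapses, boost, active_frac=0.02):
    # Overlap score: boosted count of connected synapses that see an
    # active input bit (equation 3.19), one score per column.
    overlap = (synapses @ input_vec) * boost
    # Global inhibition: keep only the top ~2% of columns as winners.
    k = max(1, int(len(overlap) * active_frac))
    winners = np.argsort(overlap)[-k:]
    sdr = np.zeros(len(overlap), dtype=int)
    sdr[winners] = 1
    return sdr
```

Here `synapses` has one row of connected synapses per column, so the matrix product yields exactly the overlap-score vector of equation 3.19.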

Temporal Memory

The temporal memory receives an SDR from the spatial pooler, which represents the active columns; the output of the temporal memory is the activity of the whole CLA region. The general idea of the TM⁹ is that cells in each mini-column provide temporal context to the pattern identified by the spatial pooler. The TM's job is to form transitions between different SDRs, and it achieves this by allowing active cells to form connections to previously active cells, so that in a future setting each cell is able to predict its own activity.

Each cell in a region has several distal segments, ideally one for each pattern the cell has transitioned from; in practice these segments are connected to a couple of patterns each. A cell enters a predictive state if there is enough activity on a segment, and every segment consists of a collection of connections, synapses, that have been formed to a subset of previously active cells (typically around 10-15 cells).

The Temporal Memory receives a sparse binary vector from the spatial pooler, and the general information flow for this part of the CLA goes through two phases. The first phase has the following steps: 1) for each active column, check whether any cell is in a predictive

⁹There are a lot of details in how the Temporal Memory is implemented; pseudo-code and more details can be found in [Numenta, 2011]. The nupic git repository is the best source for finer details: https://github.com/numenta/nupic


state; if there is, then 2) activate that particular cell. If no cell was found in a predictive state, then 3) activate all cells in that particular column, a process called bursting.

Bursting is analogous to saying: we do not know which cell to activate because we have not seen this instance of the sequence of patterns before, and we are unable to put the column into the correct temporal context, so let us activate all cells, reflecting this uncertainty. Bursting does not occur when the temporal context has been predicted by the CLA. Equation 3.20 shows the first phase.

\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \dots & b_n \end{bmatrix}}_{\text{SP SDR}}
\,
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \dots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \dots & b_{2n} \\
\vdots & & & & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \dots & b_{dn}
\end{bmatrix}}_{\text{Predictive State}}
=
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \dots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \dots & b_{2n} \\
\vdots & & & & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \dots & b_{dn}
\end{bmatrix}}_{\text{Active State}}
\qquad (3.20)
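The first phase, including bursting, can be sketched as follows (a simplified dense version; the names and the matrix layout are assumptions for illustration):

```python
import numpy as np

def tm_phase1(active_columns, predictive):
    # Sketch of TM phase 1 (equation 3.20): in each active column,
    # activate the cells that were in a predictive state; if none were,
    # burst, i.e. activate every cell in the column.
    # predictive has shape (cells_per_column, n_columns).
    active = np.zeros_like(predictive)
    for col in active_columns:
        if predictive[:, col].any():
            active[:, col] = predictive[:, col]
        else:
            active[:, col] = 1  # bursting: no temporal context known
    return active
```

Only active columns (the SP SDR) are touched; all cells in inactive columns stay inactive, matching the column-selection role of the SP SDR in equation 3.20.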

The second phase of the algorithm is there to figure out which cells should be turned into a predictive state for the next time-step. This is achieved by checking every distal segment on every cell for activity above a certain threshold, i.e. checking for active segments. One active segment is enough to put a cell into a predictive state. The output from the temporal pooler is a vector representing the state of all cells in that region. Equation 3.21 shows phase 2.

\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \dots & b_n \end{bmatrix}}_{\text{Active State}}
\cdot
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \dots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \dots & b_{2n} \\
\vdots & & & & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \dots & b_{dn}
\end{bmatrix}_X}_{\text{Segment } X}
=
\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \dots & b_n \end{bmatrix}_X}_{\text{Segment Activation } X}
> \tau
\;\rightarrow\;
\underbrace{\begin{bmatrix} s_1 & s_2 & s_3 & \dots & s_n \end{bmatrix}}_{\text{Predictive State}}
\qquad (3.21)
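Phase 2 can be sketched as a thresholded dot product per distal segment (a simplified version; representing each segment as a binary mask over cells is an assumption):

```python
import numpy as np

def tm_phase2(active_state, segments, tau):
    # Sketch of TM phase 2 (equation 3.21): a cell enters the predictive
    # state if any of its distal segments overlaps the currently active
    # cells by more than the threshold tau; one active segment suffices.
    # segments[cell] is a list of binary masks over all cells.
    predictive = np.zeros(len(segments), dtype=int)
    for cell, cell_segments in enumerate(segments):
        for seg in cell_segments:
            if int(np.dot(seg, active_state)) > tau:
                predictive[cell] = 1
                break  # one active segment is enough
    return predictive
```

The dot product plays the role of "Segment Activation X" in equation 3.21, and the threshold comparison decides the predictive state.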

If learning is turned on, update the permanence of the synapses connected to active distal segments (in the same way as in the spatial pooler). These changes are marked as temporary until we are sure that a cell is in fact able to correctly predict something with these new changes; once this is known, the change either becomes permanent or is removed. Temporarily marked cells are updated whenever a cell goes from being inactive to active from a feed-forward input (confirm the change, as we correctly predicted the feed-forward activation). If a cell instead went from active to inactive, undo the change.


NuPIC Classifiers

There are many different classifiers included with NuPIC, such as a k-Nearest Neighbour (kNN) classifier and a Support Vector Machine (SVM); the OPF in particular uses a custom-built classifier called the "CLAClassifier". The purpose of this classifier is to decode predictions made by the CLA. All classification performed with the CLAClassifier is done in different ways depending on how the input has been encoded.

Scalar values are decoded using a process where each cell in a CLA region is paired with two histograms. One of the histograms keeps track of the frequency of encountered patterns associated with each cell; the other one keeps track of a moving average for each bucket. This process is illustrated in figure 3.4.
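The frequency-histogram half of this idea can be sketched with a decaying count per cell and bucket (the function names and the decay rate alpha are assumptions; the real CLAClassifier additionally tracks a moving average of actual values per bucket):

```python
def update_classifier(hist, active_cells, bucket, alpha=0.1):
    # For each active cell, decay its bucket histogram and reinforce
    # the bucket that was actually observed (a frequency estimate).
    for cell in active_cells:
        counts = hist.setdefault(cell, {})
        for b in counts:
            counts[b] *= (1.0 - alpha)
        counts[bucket] = counts.get(bucket, 0.0) + alpha
    return hist

def decode(hist, active_cells):
    # Predict the bucket with the highest summed likelihood over the
    # currently active cells.
    votes = {}
    for cell in active_cells:
        for b, p in hist.get(cell, {}).items():
            votes[b] = votes.get(b, 0.0) + p
    return max(votes, key=votes.get) if votes else None
```

Decoding thus turns a set of active cells back into the most likely scalar bucket, which is what lets the CLA emit numeric predictions.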

Figure 3.4: The CLAClassifier. Each cell in the SDR's columns is paired with two histograms over the buckets between the min value and the max value: a pattern-likelihood histogram and a moving average.

Training in NuPIC

Training the NuPIC model is done using online learning algorithms. The schema seen in figure 3.5 is used to train and test these models. We have 7 different wind farms, so this schema is repeated for each respective wind farm.

This schema is slightly adjusted from the traditional way the OPF handles multi-step predictions. The default way of doing this is to use 48 different models, one for every step ahead in the CLAClassifier, which would give us 48 · 7 = 336 models in total (for all the wind farms). Not only does this change match Expektra's approach, but it also reduces the memory requirements for the CLAClassifier.

Figure 3.5: Training an OPF model. A pre-training data chunk from the dataset is used for hyperparameter setup (PSO swarming or manual setup); in the training phase the training data stream is fed to the OPF model with online learning activated, and in the testing phase the testing data is fed with online learning deactivated, producing multi-step predictions.
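The schema of figure 3.5 can be sketched with a hypothetical model interface (the `configure`/`run` methods and the `learning` flag are assumptions for illustration, not the actual OPF API):

```python
def run_schema(model, pretrain_chunk, train_stream, test_stream):
    # Hyperparameters come from a pre-training chunk (PSO swarming or
    # manual setup), the model then learns online over the training
    # stream, and learning is frozen for the testing phase.
    model.configure(pretrain_chunk)
    model.learning = True
    for record in train_stream:
        model.run(record)             # predict and adapt online
    model.learning = False
    return [model.run(record) for record in test_stream]
```

Freezing learning before the testing phase is what makes the test-set errors comparable across models.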

Input and hyperparameter selection

Finding hyperparameters for an OPF model was partly done using a custom built-in PSO algorithm and partly configured manually, mainly by ensuring that the PSO algorithm did not remove encoders. A single CLA region contains a lot of hyper-parameters; these parameters have been included with a corresponding description in appendix A. Inputs to the model are date, ws, wp, u, and v.


Chapter 4

Result

This chapter contains the primary results obtained in this study. Figures 4.4-4.10 contain different error measurements on the test set for each respective wind farm. We see that Expektra's ANN model is able to perform well over all wind farms, while the NuPIC model performs worse than Expektra but in general still better than our reference model. NuPIC is off target with a bias error on most wind farms. The graphs also indicate that NuPIC is unable to pick up some trends in the cumulated ε² graph. The Expektra model, on the other hand, shows no clear problems in cumulated ε² and is on target on all wind farms; appendix C has been included to reflect this at different lead times.

Figure 4.1: Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, normalized power output vs wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


Figure 4.2: Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).

Figure 4.3: Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


Figure 4.4: Different error measurements for WF 1: NBIAS, NRMSE, and NMAE vs look-ahead time k (in hours), and cumulated ε² vs time (in hours), for Expektra, NuPIC, and Persistence.


Figure 4.5: Different error measurements for WF 2: NBIAS, NRMSE, and NMAE vs look-ahead time k (in hours), and cumulated ε² vs time (in hours), for Expektra, NuPIC, and Persistence.


Figure 4.6: Different error measurements for WF 3: NBIAS, NRMSE, and NMAE vs look-ahead time k (in hours), and cumulated ε² vs time (in hours), for Expektra, NuPIC, and Persistence.


Figure 4.7: Different error measurements for WF 4: NBIAS, NRMSE, and NMAE vs look-ahead time k (in hours), and cumulated ε² vs time (in hours), for Expektra, NuPIC, and Persistence.


Figure 4.8: Different error measurements for WF 5: NBIAS, NRMSE, and NMAE vs look-ahead time k (in hours), and cumulated ε² vs time (in hours), for Expektra, NuPIC, and Persistence.


Figure 4.9: Different error measurements for WF 6: NBIAS, NRMSE, and NMAE vs look-ahead time k (in hours), and cumulated ε² vs time (in hours), for Expektra, NuPIC, and Persistence.


Figure 4.10: Different error measurements for WF 7: NBIAS, NRMSE, and NMAE vs look-ahead time k (in hours), and cumulated ε² vs time (in hours), for Expektra, NuPIC, and Persistence.


                            Wind Farm
User           1      2      3      4      5      6      7      All
Leustagos      0.145  0.138  0.168  0.144  0.158  0.133  0.140  0.146
DuckTile       0.143  0.145  0.172  0.145  0.165  0.137  0.146  0.148
MZ             0.141  0.151  0.174  0.145  0.167  0.141  0.145  0.149
Propeller      0.144  0.153  0.177  0.147  0.175  0.141  0.147  0.152
Duehee Lee     0.157  0.144  0.176  0.160  0.169  0.154  0.148  0.155
Expektra       0.165  0.158  0.184  0.164  0.179  0.153  0.153  0.165
MTU EE5260     0.161  0.172  0.193  0.162  0.192  0.156  0.160  0.168
SunWind        0.174  0.177  0.193  0.176  0.179  0.157  0.162  0.172
ymzsmsd        0.163  0.186  0.200  0.164  0.192  0.162  0.167  0.174
4138 Kalchas   0.180  0.179  0.197  0.175  0.200  0.160  0.165  0.177
NuPIC          0.243  0.254  0.264  0.310  0.290  0.224  0.240  0.264
Persistence    0.302  0.338  0.373  0.364  0.388  0.341  0.361  0.355

Table 4.1: NRMSE scores of the entries published in [Hong et al., 2014]. The NuPIC model and Expektra model are added so we can easily compare the results.

4.1 Experimental results

Looking at the primary results presented in this study, we see in table 4.1 that Expektra's ANN model is able to achieve performance comparable to other methods published with Hong et al. [2014]. This can be used as a starting point for further investigations. A similar approach to Expektra's is that of Duehee Lee [Lee and Baldick, 2014], as both methods are based on the same neural network architecture but with different optimization techniques. Duehee Lee's network is structured slightly differently to Expektra's model, which consists of only one output node, whereas Duehee Lee uses five (representing five predictions ahead). Besides these differences, Duehee Lee has also included additional sub-models to create a prediction ensemble, blending the output of the ANN with a Gaussian Process (GP) to improve very short-term predictions. MTU EE5260 is also a neural network approach that converts meteorological forecasts to power, and here the score is very close. The HTM CLA beats the persistence model but needs more work to outperform the other models in GEFCom.
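The NRMSE metric behind table 4.1 and the persistence reference can be sketched as follows (power values are assumed to be already normalized, as the GEFCom power data are; the alignment convention for the persistence forecast is an assumption):

```python
import numpy as np

def nrmse(predicted, observed):
    # Root-mean-square error on normalized power values.
    err = np.asarray(predicted, float) - np.asarray(observed, float)
    return float(np.sqrt(np.mean(err ** 2)))

def persistence_forecast(observed, k):
    # Persistence reference model: the k-step-ahead forecast is simply
    # the last observed value; output aligns with observed[k:].
    return np.asarray(observed, float)[:-k]
```

A persistence forecast is trivially cheap, which is why beating it is the minimum requirement for any forecasting model.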


4.2 Input Importance

For interpretation purposes, in order to understand the model better, an analysis of the relative input parameter importance was performed. This analysis is illustrated in figure 4.11. Each box represents added noise to that channel and will result in a higher NRMSE score if that feature was important. A reference point, "all-channels", represents the error distribution of the model with no input replacement. We clearly see in this figure that the wind speed channel ws is the most important attribute; adding noise to this channel will greatly affect the NRMSE score. We also see that the wind components u, v show little to no influence, and that the timestamp-related inputs hours and week both indicate that there are seasonal and daily trends present in the dataset.

Figure 4.11: Relative input parameter importance using HIPR, shown as NRMSE distributions for the channels all-channels, hours, u, v, week, ws, and the time-shifted wind speeds ws−1, ws−2, ws−3, ws+1, ws+2, ws+3. "all-channels" reflects the reference point, i.e. the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind.
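The HIPR procedure described above can be sketched as follows (the noise distribution, model interface, and function names are assumptions; see Kemp et al. [2007] for the original method):

```python
import numpy as np

def hipr_importance(model_fn, X, y, rng=None):
    # Holdback Input Randomization sketch: replace one input channel at
    # a time with uniform noise over its observed range and measure the
    # resulting NRMSE. Channels whose score rises well above the
    # all-channels baseline are important.
    rng = rng or np.random.default_rng(0)
    def nrmse(pred, obs):
        return float(np.sqrt(np.mean((pred - obs) ** 2)))
    baseline = nrmse(model_fn(X), y)  # "all-channels" reference point
    scores = {}
    for col in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, col] = rng.uniform(X[:, col].min(), X[:, col].max(), len(Xp))
        scores[col] = nrmse(model_fn(Xp), y)
    return baseline, scores
```

Repeating the randomization several times per channel yields the error distributions shown as boxes in figure 4.11.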


4.2.1 Adaptation and Optimization

All training for Expektra's model was performed on a MacBook Pro with an Intel Core 2 Duo processor (P8600, 2.40 GHz) running a 64-bit operating system with 4 GB of installed RAM. After adapting the code to run experiments on the GEFCom dataset, an investigation was performed to evaluate the performance of the core source code provided by Expektra. This investigation identified some slow parts of the code; after addressing these issues, mainly by using a Math.NET native library instead of the managed provider, a speed test was performed comparing the unoptimized version with the optimized one. Figure 4.12 shows the speed-up achieved during training using the LM algorithm.

Figure 4.12: Training time for the unoptimized (Normal) version vs the optimized one when training a network with 10 input neurons, 10/15/20/25 hidden neurons, and 1 output neuron. Training was done for 100 epochs using LM.

4.3 Summary

A plot showing the average NRMSE improvement over the persistence model can be seen in figure 4.13. We see that both models are able to perform better than persistence. Expektra's model performs better than the NuPIC model over the whole forecast horizon, and NuPIC performs better than persistence towards the end of the forecast.
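The improvement metric plotted in figure 4.13 can be sketched as the percentage reduction in NRMSE relative to the persistence reference (the exact formula used in the thesis is an assumption):

```python
def improvement_over_persistence(model_nrmse, persistence_nrmse):
    # Percentage NRMSE improvement over the persistence model;
    # 0% means no better than persistence, 100% would mean zero error.
    return 100.0 * (1.0 - model_nrmse / persistence_nrmse)
```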

Figure 4.13: Summarized average NRMSE improvement (%) over persistence vs look-ahead time (in hours), averaged over all wind farms with 95% confidence intervals, for Expektra and NuPIC.


Chapter 5

Discussion

With the exponential growth of computer power it becomes easier to study deeper and more complex networks, and with recent advances in the area of deep learning a new interest in ANNs has resurged. In practice, ANNs have been used by most groups in the field of WPF, but these networks never caught on, as it was argued that the improvements made by ANNs were not usually enough for the extra effort of training these networks [Giebel et al., 2011]. This is steadily becoming less of an issue as computation becomes cheaper and bigger networks perform better.

By using a simple MLP network, we observe that we are able to obtain results similar in performance to other models published in Hong et al. [2014]. This can be used as a starting point for further investigations. The GEFCom dataset has a limited number of input features; if we had access to a wider range of them, the power of neural networks could be studied more in-depth. Given the number of features in the dataset, this is harder to do.

Using persistence as a reference model was done in order to have the same baseline as the one used in GEFCom, but a better reference model should be considered in the future, such as the one presented in Nielsen et al. [1998], especially if longer forecasts are to be considered.

5.1 Method development issues

Working with Expektra's model is very straightforward and no major issues were encountered during development, but some issues were encountered working with NuPIC.¹

¹It should also be pointed out that it helps to have the people who developed the code on the spot, and this is probably the main reason for little trouble.


1. Installing NuPIC is difficult. This seems to be a general problem, judging by posts in the nupic mailing list.² A very important issue to fix if they want more people to work with this model.

2. Using the built-in swarming (PSO) did not work well, especially for a slightly larger dataset and for 48 prediction steps. The main problems were fitting everything into memory and that the code ran too slowly when swarming over multiple steps ahead, making it practically impossible to use. Scaling down the problem by having just a few prediction steps and multiple models helped, but the swarm model kept dismissing wind speed as an important feature, which was fixed by specifically telling it not to touch that encoder.

3. Inconsistencies between the white-paper documentation of the HTM CLA and the actual implementation make it hard to understand the underlying principles of NuPIC. NuPIC presents a very complex network, and any inconsistencies in the documentation make it hard to understand. The code is quite well documented, but the additional material is a bit sparse.

These issues are all understandable and are continuously being improved upon. NuPIC is still a young platform, so it is expected that issues will be found, given the complexity and research nature of the project. Training the CLAClassifier to use many step predictions instead of just one will most likely reduce the bias error seen in the results. To investigate this, a more powerful computer is needed.

There are some differences in the input between the models. This setup was created because an OPF model needs to encode the value it is going to predict. SCADA data is available, so this is not an issue, but it may introduce some unfairness between NuPIC and Expektra. In the current setup NuPIC does not seem to gain anything extra from the additional information, and other models in the competition will most likely use different inputs as well.

In general, NuPIC is different in the way you feed it data: you do so by sending in the front of the signal, and temporal context is achieved through the temporal memory.

5.2 Future improvements and directions

One issue with the GEFCom dataset is that it does not contain any specific information about the location of these 7 wind farms. It is worth considering taking a look at how to handle different models that are specialized for different types

²http://numenta.org/lists


of terrain, as this has been shown to increase performance [Kariniotakis et al., 2006]. Another thing to investigate could be a more advanced schema for hyperparameter selection: Jursa and Rohrig [2008] show that the WPF error can be reduced by smarter use of optimization algorithms for feature selection and hyperparameter selection, which should improve the performance.

In general, having more types of inputs from sources like SCADA and NWP systems could give a lot of valuable information that can be used for better predictions. It is clear that wind speed is a key feature for producing good forecasts, which is also supported by the input analysis in section 4.2. It has been argued that the error in WPF models stems largely from wrong weather forecasts [Giebel et al., 2011], so identifying good sources of weather forecasts would be desirable for more effective model learning. Combining several different weather forecasts could also be investigated, as Nielsen et al. [2007] showed that power forecasts based on a number of different meteorological forecasts were better than those based on a single source.

Regarding the HTM CLA, it is worth considering implementing a custom encoder for it, an encoder that would specifically be targeted at data concerning wind farms. SCADA data could also possibly be streamed directly into a NuPIC model, which would allow for some interesting online predictions and anomaly detections.

Training times are always an issue, especially for very large models. Both NuPIC and Expektra's model would benefit from being implemented in a more parallel fashion, as these algorithms are very well suited for parallel computing. The speed-up would be of great value, so looking into and implementing a GPU version of the algorithms could be worthwhile.

5.3 Conclusions

The field of energy forecasting is still young, and the necessity for good forecasting techniques is constantly growing. This initial study shows that part of Expektra's forecasting method can be used to forecast wind power generation. We observe that this ANN is able to achieve performance comparable to other methods published in Hong et al. [2014]. Additional work is still needed to obtain state-of-the-art results. Numenta's model is able to beat the reference model but still needs additional work before we can draw definite conclusions about its performance. Constructing a custom encoder could be a first step in that direction.


Bibliography

Subutai Ahmad and Jeff Hawkins. Properties of sparse distributed representations and their application to hierarchical temporal memory. arXiv preprint arXiv:1503.07469, 2015.

T. C. Akinci. Short term wind speed forecasting with ANN in Batman, Turkey. Elektronika ir Elektrotechnika, 107(1):41–45, 2015.

E. Michael Azoff. Neural network time series forecasting of financial markets. John Wiley & Sons, Inc., 1994.

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13(1):281–305, 2012.

Daniel P. Buxhoeveden and Manuel F. Casanova. The minicolumn hypothesis in neuroscience. Brain, 125(5):935–951, 2002.

Erasmo Cadenas and Wilfrido Rivera. Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks. Renewable Energy, 34(1):274–278, 2009.

Dmitri B. Chklovskii, B. W. Mel, and K. Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782–788, 2004.

A. Costa, A. Crespo, J. Navarro, G. Lizcano, H. Madsen, and E. Feitosa. A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews, 12(6):1725–1744, August 2008. ISSN 1364-0321. doi: 10.1016/j.rser.2007.01.015. URL http://dx.doi.org/10.1016/j.rser.2007.01.015.

Russ C. Eberhart and James Kennedy. A new optimizer using particle swarm theory. In Proceedings of the sixth international symposium on micro machine and human science, volume 1, pages 39–43. New York, NY, 1995.

Shu Fan, James R. Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. Energy Conversion, IEEE Transactions on, 24(2):474–482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento: a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276. Citeseer, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition EWEC 2008, 6 pages. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving hierarchical temporal memory-based trading models. Springer, 2013.

M. A. Gaertner, C. Gallardo, C. Tejeda, N. Martínez, S. Calabria, N. Martínez, and B. Fernández. The CASANDRA project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. In Proc. of the 2001 European Wind Energy Conference, EWEC'01, Copenhagen, Denmark, pages 777–780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: A literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G. Lobo. Sipreólico: wind power prediction tool for the Spanish peninsular power system. Proceedings of the CIGRÉ 40th general session and exhibition, París, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357–363, 2014.

Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41–46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585–592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694–709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference, EWEC'07, 2007.

G. Kariniotakis. Position paper on JOULE project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J. Halliday, R. Brownsword, Ignacio Marti, Ana Maria Palomares, I. Cruz, H. Madsen, T. S. Nielsen, Henrik Aa. Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power: overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, 10 pages, 2006.

G. N. Kariniotakis, G. S. Stavrakakis, and E. F. Nogaret. Wind power forecasting using advanced neural networks models. Energy Conversion, IEEE Transactions on, 11(4):762–767, 1996.

Stanley J. Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326–334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275–293, 2009.

Lars Landberg and Simon J. Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171–195, 1994.

S. Lang, C. Mohrlen, J. Jorgensen, B. O. Gallachóir, and E. McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International Solar Energy Society UK Section Conference, volume 85, page 89, 2006.

S. M. Lawan, W. A. W. Z. Abidin, W. Y. Chai, A. Baharun, and T. Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760–1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. Smart Grid, IEEE Transactions on, 5(1):501–510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets, 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164–168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313–2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154–3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa. Nielsen, and Torben S. Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475–489, 2005.

Donald W Marquardt An algorithm for least-squares estimation of nonlinearparameters Journal of the Society for Industrial amp Applied Mathematics 11(2)431ndash441 1963

Warren S McCulloch and Walter Pitts A logical calculus of the ideas immanent innervous activity The bulletin of mathematical biophysics 5(4)115ndash133 1943

Marvin Minsky and Papert Seymour Perceptrons 1969

Marvin Lee Minsky and Oliver G Selfridge Learning in random nets MIT LincolnLaboratory 1960

Jorge J Moreacute The levenberg-marquardt algorithm implementation and theory InNumerical analysis pages 105ndash116 Springer 1978

Henrik Aa Nielsen Torben S Nielsen Henrik Madsen Maria J Pindado and IgnacioMarti Optimal combination of wind power forecasts Wind Energy 10(5)471ndash482 2007

52

Torben Skov Nielsen Alfred Joensen Henrik Madsen Lars Landberg and GregorGiebel A new reference for wind power forecasting Wind energy 1(1)29ndash341998

Torben Skov Nielsen Henrik Madsen Henrik Aalborg Nielsen Gregor Giebel andLars Landberg Prediction of regional wind power 2002

Harri Niska Teri Hiltunen Ari Karppinen Juhani Ruuskanen and MikkoKolehmainen Evolving the neural network model for forecasting air pollutiontime series Engineering Applications of Artificial Intelligence 17(2)159ndash1672004

Numenta Hierarchical temporal memory including htm cortical learning algorithmsv021 Technical report Numenta 2011

Riccardo Poli James Kennedy and Tim Blackwell Particle swarm optimizationSwarm intelligence 1(1)33ndash57 2007

A Rodrigues JA Peccedilas Lopes P Miranda L Palma C Monteiro R Bessa J SousaC Rodrigues and J Matos Eprevndasha wind power forecasting tool for portugal InProceedings of the European Wind Energy Conference EWEC volume 7 2007

David E Rumelhart Geoffrey E Hinton and Ronald J Williams Learning representa-tions by back-propagating errors Cognitive modeling 53 1988

Juumlrgen Schmidhuber Deep learning in neural networks An overview NeuralNetworks 6185ndash117 2015

S Sinkevicius R Simutis and V Raudonis Monitoring of humans traffic usinghierarchical temporal memory algorithms Elektronika ir Elektrotechnika 115(9)91ndash96 2011

Ke-Sheng Wang Vishal S Sharma and Zhen-You Zhang Scada data based conditionmonitoring of wind turbines Advances in Manufacturing 2(1)61ndash69 2014

WWEA 2014 half-year report wwea pp 1ndash8 Technical report 2014

Wenxian Yang Richard Court and Jiesheng Jiang Wind turbine condition moni-toring by the approach of scada data analysis Renewable Energy 53365ndash3762013

Hao Yu and Bogdan M Wilamowski Levenberg-marquardt training IndustrialElectronics Handbook 512ndash1 2011

53

Appendix A

Hyper-parameters

Parameters for the spatial pooler

Alias                Default  Description
columnCount          -        The number of cell columns in a cortical region.
globalInhibition     false    If true, the winning columns of the inhibition phase are selected as the most active columns from the region as a whole.
numActivePerInhArea  10       The maximum number of active columns per inhibition area.
synPermActiveInc     0.1      The amount by which an active synapse is incremented in each round, specified as a percent of a fully grown synapse.
synPermConnected     0.10     Controls the threshold at which synapses are considered connected.
synPermInactiveDec   0.01     The amount by which an inactive synapse is decremented in each round.
potentialRadius      16       Determines the extent of the input that each column can potentially be connected to.

Table A.1: Configuration parameters for the spatial pooler.

Parameters for the scalar encoder

Alias       Symbol  Description
w           w       Number of bits to set in the output.
minval      vmin    The lower bound of the input value.
maxval      vmax    The upper bound of the input value.
n           n       Number of bits in the representation (n must be > w).
radius      r       Inputs separated by more than or equal to this distance will have non-overlapping representations.
resolution  ψ       Inputs separated by more than or equal to this distance will have different representations.

Table A.2: Configuration parameters for the scalar encoder.

Parameters for the temporal memory

Alias                  Default  Description
activationThreshold    12       Activation threshold for segments.
cellsPerColumn         32       Number of cells per column.
columnCount            2048     The number of cell columns in a cortical region.
globalDecay            0.10     Decrements all synapses a little bit all the time.
initialPerm            0.11     Initial permanence value for a synapse.
inputWidth             -        Size of the input.
maxAge                 100000   Controls global decay: only segments that have not been activated for maxAge iterations are decayed, and the global decay loop runs only every maxAge iterations.
maxSegmentsPerCell     -        The maximum number of segments a cell can have.
maxSynapsesPerSegment  -        The maximum number of synapses a segment can have.
minThreshold           8        The minimum required activity for a segment to learn.
newSynapseCount        15       The maximum number of synapses added to a segment during learning.
permanenceDec          0.10     How much permanence is removed from synapses when learning occurs.
permanenceInc          0.10     How much permanence is added to synapses when learning occurs.
temporalImp            cpp/py   Controls which temporal memory implementation to use.

Table A.3: Configuration parameters for the temporal memory.

Appendix B

Wind characteristics

Figure B.1: Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Figure B.2: Wind characteristics for WF 3-7, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Appendix C

Error Distribution

Figure C.1: Error distribution for different lead times (1, 10, 20, 30, 40, 48), WF 1 using NuPIC. Each panel plots the frequency of errors in the range -1.0 to 1.0.

Figure C.2: Error distribution for different lead times, WF 2 using NuPIC.

Figure C.3: Error distribution for different lead times, WF 3 using NuPIC.

Figure C.4: Error distribution for different lead times, WF 4 using NuPIC.

Figure C.5: Error distribution for different lead times, WF 5 using NuPIC.

Figure C.6: Error distribution for different lead times, WF 6 using NuPIC.

Figure C.7: Error distribution for different lead times, WF 7 using NuPIC.

Figure C.8: Error distribution for different lead times, WF 1 using Expektra.

Figure C.9: Error distribution for different lead times, WF 2 using Expektra.

Figure C.10: Error distribution for different lead times, WF 3 using Expektra.

Figure C.11: Error distribution for different lead times, WF 4 using Expektra.

Figure C.12: Error distribution for different lead times, WF 5 using Expektra.

Figure C.13: Error distribution for different lead times, WF 6 using Expektra.

Figure C.14: Error distribution for different lead times, WF 7 using Expektra.

List of Figures

2.1  A figure that presents the general outline when forecasting using the statistical approach. 6
2.2  A figure that presents the general steps when forecasting using a physical model. 7
3.1  The perceptron. 20
3.2  Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of the H hidden layers, as well as M input connections. Each edge seen in this graph has a weight w_ij associated with it. 21
3.3  Information flow of a single-region predictive model created with the OPF. 23
3.4  The CLAClassifier. 28
3.5  Training an OPF model. 29
4.1  Left diagram: cumulative probability of wind speed. Right diagram: scatter diagram of the power curve, production vs. wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012). 31
4.2  Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012). 32
4.3  Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012). 32
4.4  Different error measurements for WF 1. 33
4.5  Different error measurements for WF 2. 34
4.6  Different error measurements for WF 3. 35
4.7  Different error measurements for WF 4. 36
4.8  Different error measurements for WF 5. 37
4.9  Different error measurements for WF 6. 38
4.10 Different error measurements for WF 7. 39
4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point: the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind. 41
4.12 Plot showing the unoptimized version vs. the optimized one when training a network with 10 input neurons, 10/15/20/25 hidden neurons, and 1 output neuron. Training was done for 100 epochs using LM. 42
4.13 Summarized average improvement over all wind farms, with 95% confidence intervals. 43
B.1  Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power. 59
B.2  Wind characteristics for WF 3-7, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power. 60
C.1  Error distribution for different lead times, WF 1. 62
C.2  Error distribution for different lead times, WF 2. 63
C.3  Error distribution for different lead times, WF 3. 64
C.4  Error distribution for different lead times, WF 4. 65
C.5  Error distribution for different lead times, WF 5. 66
C.6  Error distribution for different lead times, WF 6. 67
C.7  Error distribution for different lead times, WF 7. 68
C.8  Error distribution for different lead times, WF 1. 69
C.9  Error distribution for different lead times, WF 2. 70
C.10 Error distribution for different lead times, WF 3. 71
C.11 Error distribution for different lead times, WF 4. 72
C.12 Error distribution for different lead times, WF 5. 73
C.13 Error distribution for different lead times, WF 6. 74
C.14 Error distribution for different lead times, WF 7. 75

List of Tables

3.1  The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, missing data with power observations are available for updating the models. 16
3.2  Preprocessed features from the dataset. The Forecast category represents the forecast given by the NWP; multiple forecasts exist for a given date. The latest issued forecast available contains the features we will use in training and testing. 17
3.3  Example, where n = 14, r = 5, ψ = 1, of various encoded scalar values using a ScalarEncoder. 24
4.1  NRMSE score of the entries published in [Hong et al. 2014]. The NuPIC model and Expektra model are added so we can easily compare the results. 40
A.1  Configuration parameters for the spatial pooler. 55
A.2  Configuration parameters for the scalar encoder. 56
A.3  Configuration parameters for the temporal memory. 57



3.5 NEURAL NETWORKS

of bits in the output, and it must be greater than or equal to w, which represents the number of bits that are set to encode a single value, i.e. the "width" of the output signal.⁷ r and ψ are specified with respect to the input, while w is specified with respect to the output. Two inputs separated by more than the radius have non-overlapping representations. Two inputs separated by less than the radius will in general overlap in at least some of their bits. Two inputs separated by greater than or equal to the resolution are guaranteed to have different representations. ψ can be calculated using equation 3.18:

\psi = \frac{r}{w} \tag{3.18}

Depending on whether we want periodic behaviour or not, n is calculated a bit differently; see the scalar encoder for implementation details.⁸
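As a minimal sketch, the non-periodic case can be illustrated in Python (parameter names mirror table A.2; this is a simplified toy, not NuPIC's actual ScalarEncoder):

```python
import numpy as np

def encode_scalar(value, vmin=0.0, vmax=100.0, n=14, w=5):
    """Sketch of a non-periodic scalar encoder: w contiguous bits out of n.

    The value is clipped to [vmin, vmax] and mapped to one of n - w + 1
    buckets; bucket i sets bits i .. i + w - 1. Values in the same bucket
    share a representation; adjacent buckets overlap in w - 1 bits.
    """
    assert w % 2 == 1 and n >= w   # w must be odd to avoid centering problems
    value = min(max(value, vmin), vmax)
    buckets = n - w + 1
    start = int(round((value - vmin) / (vmax - vmin) * (buckets - 1)))
    out = np.zeros(n, dtype=int)
    out[start:start + w] = 1
    return out
```

With n = 14 and w = 5 there are ten distinct representations; two values separated by at least one bucket width are guaranteed to differ, mirroring the resolution behaviour described above.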

Spatial Pooler

The spatial pooler receives a binary vector as its input from an encoder and outputs an SDR. The structure of the spatial pooler consists of an input space and a set of columns; the output SDR represents which of the columns in a region are active. Each column has synapses connected to the input space. Each column's set of potential synapses covers around 50% of the input space, chosen randomly; this is called the "potential pool". Each synapse connects and disconnects with the input space during learning.

The general information flow of the spatial pooler consists of the following steps. Input stimuli from lower regions lead to activation of the input space, to which each mini-column is synaptically connected. This activation, the so-called "overlap score", is computed for each column: the score is the total sum of active inputs that try to influence that column, weighted with a "boosting factor" that increases the chance for certain columns to win more easily. A final top percentage (usually around 2% activity) of the columns with the highest influence (biggest overlap score) is chosen; columns also inhibit nearby columns. The result of this process is a binary vector with few active bits, an SDR. This is illustrated in equation 3.19, where each b can be either 0 or 1 and s is a value representing the score.

⁷ w must be odd to avoid centering problems.
⁸ https://github.com/numenta/nupic


\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \cdots & b_n \end{bmatrix}}_{\text{Input vector}}
\cdot
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \cdots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \cdots & b_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \cdots & b_{dn}
\end{bmatrix}}_{\text{Connected synapses for each column}}
=
\underbrace{\begin{bmatrix} s_1 & s_2 & s_3 & \cdots & s_n \end{bmatrix}}_{\text{Overlap score}}
\;\xrightarrow{\text{Inhibition}}\;
\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \cdots & b_n \end{bmatrix}}_{\text{Output SDR}}
\tag{3.19}

Learning in this structure is done by adjusting the permanence of the proximal dendrites to better match the input: for each winning column, increase the permanence of the synapses that correctly matched the input and decrease the permanence of the rest. We also increase the boosting factor of losing columns, to give them a bigger chance of winning next time.
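The overlap-and-inhibition flow, together with the permanence update just described, can be sketched as follows (a toy sketch, not NuPIC's implementation; the region size and 50% potential pool are illustrative, and the increments echo the defaults in appendix A):

```python
import numpy as np

rng = np.random.default_rng(42)
n_inputs, n_columns, n_active = 32, 16, 3   # toy sizes; real regions use 2048 columns

# Potential pool: each column may connect to roughly 50% of the input space.
potential = rng.random((n_columns, n_inputs)) < 0.5
permanence = rng.random((n_columns, n_inputs)) * potential
boost = np.ones(n_columns)                   # boosting factor per column

def spatial_pool(input_vec, learn=True,
                 syn_perm_connected=0.10,
                 syn_perm_active_inc=0.10,
                 syn_perm_inactive_dec=0.01):
    connected = permanence > syn_perm_connected
    # Overlap score: connected synapses aligned with active input bits,
    # weighted by the column's boosting factor.
    overlap = boost * (connected @ input_vec)
    # Global inhibition: only the top n_active columns stay active.
    winners = np.argsort(overlap)[-n_active:]
    sdr = np.zeros(n_columns, dtype=int)
    sdr[winners] = 1
    if learn:
        for c in winners:
            # Reinforce synapses that matched the input, weaken the rest.
            delta = np.where(input_vec == 1,
                             syn_perm_active_inc, -syn_perm_inactive_dec)
            permanence[c] = np.clip(permanence[c] + delta * potential[c], 0.0, 1.0)
    return sdr
```

The output always has exactly n_active bits set, which is what makes the representation sparse regardless of how dense the input vector is.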

Temporal Memory

The temporal memory receives an SDR from the spatial pooler, which represents the active columns; the output of the temporal memory is the activity of the whole CLA region. The general idea of the TM⁹ is that the cells in each mini-column provide temporal context to the pattern identified by the spatial pooler. The TM's job is to form transitions between different SDRs, and it achieves this by allowing active cells to form connections to previously active cells, so that in a future setting each cell is able to predict its own activity.

Each cell in a region has several distal segments, ideally one for each pattern the cell has transitioned from; in practice, these segments are connected to a couple of patterns each. A cell enters a predictive state if there is enough activity on a segment, and every segment consists of a collection of connections, synapses, that have been formed to a subset of previously active cells (typically around 10-15 cells).

The Temporal Memory receives a sparse binary vector from the spatial pooler, and the general information flow for this part of the CLA goes through two phases. The first phase consists of the following steps: 1) for each active column, check whether any cell is in a predictive state; if there is, 2) activate that particular cell. If no cell was found in a predictive state, 3) activate all cells in that particular column, a process called bursting.

⁹ There are a lot of details in how the Temporal Memory is implemented; pseudo-code and more details can be found in [Numenta 2011]. The nupic git repository is the best source for the finer details: https://github.com/numenta/nupic

Bursting is analogous to saying: we do not know which cell to activate, because we have not seen this instance of the sequence of patterns before and we are unable to put the column into the correct temporal context, so let us activate all cells to reflect this uncertainty. Bursting does not occur when the temporal context has been predicted by the CLA. Equation 3.20 shows the first phase.

\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \cdots & b_n \end{bmatrix}}_{\text{SP SDR}}
\circ
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \cdots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \cdots & b_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \cdots & b_{dn}
\end{bmatrix}}_{\text{Predictive state}}
=
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \cdots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \cdots & b_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \cdots & b_{dn}
\end{bmatrix}}_{\text{Active state}}
\tag{3.20}

The second phase of the algorithm is there to figure out which cells should be put into a predictive state for the next time step. This is achieved by checking every distal segment of every cell for activity above a certain threshold, i.e. checking for active segments. One active segment is enough to put a cell into a predictive state. The output from the temporal pooler is a vector representing the state of all cells in that region. Equation 3.21 shows phase two.

\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \cdots & b_n \end{bmatrix}}_{\text{Active state}}
\cdot
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \cdots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \cdots & b_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \cdots & b_{dn}
\end{bmatrix}_X}_{\text{Segment } X}
=
\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \cdots & b_n \end{bmatrix}_X}_{\text{Segment activation } X}
> \tau
\;\rightarrow\;
\underbrace{\begin{bmatrix} s_1 & s_2 & s_3 & \cdots & s_n \end{bmatrix}}_{\text{Predictive state}}
\tag{3.21}

If learning is turned on, update the permanence of the synapses connected to active distal segments (in the same way as in the spatial pooler). These changes are marked as temporary until we are sure that the cell is in fact able to correctly predict something with the new changes; once this is known, the changes either become permanent or are removed. Temporarily marked changes are made permanent whenever a cell goes from inactive to active from feed-forward input (the cell correctly predicted the feed-forward activation). If a cell instead went from active to inactive, the change is undone.
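Phase one, including bursting, can be sketched like this (an illustrative sketch under simplifying assumptions: the predictive state from the previous time step is given as a precomputed cells-by-columns matrix rather than derived from distal segments):

```python
import numpy as np

n_columns, cells_per_column = 8, 4   # toy region

def tm_activate(active_columns, predictive):
    """Phase 1: choose the active cells for each winning column.

    active_columns : binary vector of length n_columns (the SP's SDR)
    predictive     : (cells_per_column, n_columns) binary matrix of cells
                     put into a predictive state at the previous time step
    A column with no predicted cell bursts: all of its cells activate.
    """
    active = np.zeros((cells_per_column, n_columns), dtype=int)
    for col in np.flatnonzero(active_columns):
        predicted = np.flatnonzero(predictive[:, col])
        if predicted.size > 0:
            active[predicted, col] = 1   # temporal context was predicted
        else:
            active[:, col] = 1           # bursting: context is unknown
    return active
```

When the sequence has been seen before, only one cell per column activates, so the same column can participate in many sequences while still disambiguating them.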


NuPIC Classifiers

There are many different classifiers included with NuPIC, such as k-Nearest Neighbour (kNN) and a Support Vector Machine (SVM); the OPF in particular uses a custom-built classifier called the "CLAClassifier". The purpose of this classifier is to decode predictions made by the CLA. Classification with the CLAClassifier is performed in different ways depending on how the input has been encoded.

Scalar values are decoded using a process where each cell in a CLA region is paired with two histograms. One histogram keeps track of the frequency of encountered patterns associated with each cell; the other keeps track of a moving average for each bucket. This process is illustrated in figure 3.4.
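A stripped-down sketch of this decoding scheme (hypothetical region size, bucket count, and learning rate; the real CLAClassifier additionally handles multi-step lookups and pattern history):

```python
import numpy as np

n_cells, n_buckets = 16, 5   # hypothetical region size and value-bucket count
alpha = 0.3                  # moving-average learning rate

bucket_freq = np.zeros((n_cells, n_buckets))  # pattern-frequency histogram per cell
bucket_avg = np.zeros((n_cells, n_buckets))   # moving average of actual values

def classifier_learn(active_cells, bucket, actual_value):
    """Update both histograms for every currently active cell."""
    for c in active_cells:
        bucket_freq[c, bucket] += 1
        bucket_avg[c, bucket] += alpha * (actual_value - bucket_avg[c, bucket])

def classifier_infer(active_cells):
    """Vote with the frequency histograms of the active cells, then read
    the moving-average value of the winning bucket."""
    cells = list(active_cells)
    votes = bucket_freq[cells].sum(axis=0)
    best = int(votes.argmax())
    return best, bucket_avg[cells, best].mean()
```

The frequency histograms answer "which bucket is most likely given this cell activity", and the moving averages turn the winning bucket back into a scalar prediction.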


Figure 3.4: The CLAClassifier.

Training in NuPIC

Training the NuPIC model is done using online learning. The following schema, seen in figure 3.5, is used to train and test these models. We have 7 different wind farms, so this schema is applied to each wind farm.

This schema is slightly adjusted from the traditional way the OPF handles multi-step predictions. The default is to use 48 different models, one for every step ahead in the CLAClassifier, which would give us 48 · 7 = 336 models in total (for all the wind farms). Not only does this change match Expektra's approach, but it also reduces the memory requirements of the CLAClassifier.

Figure 3.5: Training an OPF model. Hyperparameters are set up via PSO swarming or manually, using a pre-training data chunk; the model is then trained on the training data stream with online learning activated, and evaluated on the testing data with online learning deactivated, producing multi-step predictions.

Input and hyperparameter selection

Finding hyperparameters for an OPF model was partly done using the custom built-in PSO algorithm and partly configured manually, mainly by ensuring that the PSO algorithm did not remove encoders. A single CLA region contains a lot of hyperparameters; these parameters are listed, with descriptions, in appendix A. The inputs to the model are date, ws, wp, u and v.
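The particle swarm idea behind the built-in swarming can be sketched generically. This is a textbook PSO in the spirit of Eberhart and Kennedy [1995], not NuPIC's swarming code; in practice the `objective` would be a validation error computed from a trained model:

```python
# Minimal particle swarm optimization (PSO) sketch for hyperparameter
# search. Each particle is a point in hyperparameter space, attracted
# to its own best position and the swarm's best position.
import random

def pso(objective, bounds, n_particles=10, iters=50, w=0.7, c1=1.5, c2=1.5):
    dim = len(bounds)
    pos = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

The manual intervention described above (pinning the wind speed encoder) corresponds to fixing one dimension of this search space instead of letting the swarm explore it.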


Chapter 4

Result

This chapter presents the primary results obtained in this study. Figures 4.4-4.10 contain different error measurements on the test set for each wind farm. We see that Expektra's ANN model performs well over all wind farms, while the NuPIC model performs worse than Expektra but in general still better than our reference model. NuPIC is off target, with a bias error on most wind farms. The graphs also indicate that NuPIC is unable to pick up some trends, visible in the cumulated ε² graph. Expektra's model, on the other hand, shows no clear problems in cumulated ε² and is on target on all wind farms; appendix C has been included to reflect this at different lead times.

Figure 4.1: Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs. wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


Figure 4.2: Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).

Figure 4.3: Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


Figure 4.4: Different error measurements for WF 1: NBIAS, NRMSE and NMAE as a function of look-ahead time k (in hours), and cumulated ε² over time (in hours), for Expektra, NuPIC and persistence.


Figure 4.5: Different error measurements for WF 2: NBIAS, NRMSE and NMAE as a function of look-ahead time k (in hours), and cumulated ε² over time (in hours), for Expektra, NuPIC and persistence.


Figure 4.6: Different error measurements for WF 3: NBIAS, NRMSE and NMAE as a function of look-ahead time k (in hours), and cumulated ε² over time (in hours), for Expektra, NuPIC and persistence.


Figure 4.7: Different error measurements for WF 4: NBIAS, NRMSE and NMAE as a function of look-ahead time k (in hours), and cumulated ε² over time (in hours), for Expektra, NuPIC and persistence.


Figure 4.8: Different error measurements for WF 5: NBIAS, NRMSE and NMAE as a function of look-ahead time k (in hours), and cumulated ε² over time (in hours), for Expektra, NuPIC and persistence.


Figure 4.9: Different error measurements for WF 6: NBIAS, NRMSE and NMAE as a function of look-ahead time k (in hours), and cumulated ε² over time (in hours), for Expektra, NuPIC and persistence.


Figure 4.10: Different error measurements for WF 7: NBIAS, NRMSE and NMAE as a function of look-ahead time k (in hours), and cumulated ε² over time (in hours), for Expektra, NuPIC and persistence.


                                 Wind Farm
User           1      2      3      4      5      6      7      All
Leustagos      0.145  0.138  0.168  0.144  0.158  0.133  0.140  0.146
DuckTile       0.143  0.145  0.172  0.145  0.165  0.137  0.146  0.148
MZ             0.141  0.151  0.174  0.145  0.167  0.141  0.145  0.149
Propeller      0.144  0.153  0.177  0.147  0.175  0.141  0.147  0.152
Duehee Lee     0.157  0.144  0.176  0.160  0.169  0.154  0.148  0.155
Expektra       0.165  0.158  0.184  0.164  0.179  0.153  0.153  0.165
MTU EE5260     0.161  0.172  0.193  0.162  0.192  0.156  0.160  0.168
SunWind        0.174  0.177  0.193  0.176  0.179  0.157  0.162  0.172
ymzsmsd        0.163  0.186  0.200  0.164  0.192  0.162  0.167  0.174
4138 Kalchas   0.180  0.179  0.197  0.175  0.200  0.160  0.165  0.177
NuPIC          0.243  0.254  0.264  0.310  0.290  0.224  0.240  0.264

Persistence    0.302  0.338  0.373  0.364  0.388  0.341  0.361  0.355

Table 4.1: NRMSE score of the entries published in Hong et al. [2014]. The NuPIC model and the Expektra model have been added so we can easily compare the results.

4.1 Experimental results

Looking at the primary results presented in this study, we see in table 4.1 that Expektra's ANN model is able to achieve performance comparable to the other methods published with Hong et al. [2014]. This can be used as a starting point for further investigations. A similar approach to Expektra's is Duehee Lee's [Lee and Baldick, 2014], as both methods are based on the same neural network architecture but with different optimization techniques. Duehee Lee's network is structured slightly differently from Expektra's model, which consists of only one output node, whereas Duehee Lee's uses five (representing five predictions ahead). Besides these differences, Duehee Lee has also included additional sub-models to create a prediction ensemble, blending the output of the ANN with a Gaussian Process (GP) to improve very short-term predictions. MTU EE5260 is also a neural network approach that converts meteorological forecasts to power, and here the score is very close. The HTM CLA beats the persistence model but needs more work to outperform the other models in GEFCom.


4.2 Input Importance

For interpretation purposes, in order to understand the model better, an analysis of the relative input parameter importance was performed. This analysis is illustrated in figure 4.11. Each box represents added noise on that channel, which will result in a higher NRMSE score if the feature was important. A reference point, "all-channels", represents the error distribution of the model with no input replacement. We clearly see in this figure that the wind speed channel ws is the most important attribute: adding noise to this channel greatly affects the NRMSE score. We also see that the wind components u, v show little to no influence, and that the timestamp-related inputs hours and week both indicate that there are seasonal and daily trends present in the dataset.
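The HIPR procedure of Kemp et al. [2007] amounts to the following sketch; here `model_error` is a hypothetical stand-in for evaluating the trained network's NRMSE on a dataset:

```python
# Sketch of the noise-replacement idea behind HIPR: replace one input
# channel at a time with random noise and record how much the model's
# error degrades relative to the untouched reference.
import random

def hipr_importance(model_error, X, channels, n_repeats=10, seed=0):
    rng = random.Random(seed)
    reference = model_error(X)
    scores = {}
    for ch in channels:
        degraded = []
        for _ in range(n_repeats):
            Xn = [row[:] for row in X]              # copy the dataset
            for row in Xn:
                row[ch] = rng.random()              # overwrite channel with noise
            degraded.append(model_error(Xn))
        scores[ch] = sum(degraded) / n_repeats - reference
    return scores   # larger score -> more important channel
```

Channels the model ignores (like u and v above) leave the error essentially unchanged, so their score stays near zero.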

Figure 4.11: Relative input parameter importance using HIPR. Noise is added to one channel at a time (hours, u, v, week, ws, ws−1, ws−2, ws−3, ws+1, ws+2, ws+3) and the resulting NRMSE distribution is shown as a box plot. "all-channels" reflects the reference point: the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind.


4.2.1 Adaptation and Optimization

All training for Expektra's model was performed on a MacBook Pro with an Intel Core 2 Duo processor (P8600, 2.40 GHz) and a 64-bit operating system with 4 GB of installed RAM. After adapting the code to run experiments on the GEFCom dataset, an investigation was performed to evaluate the performance of the core source code provided by Expektra. This investigation identified some slow parts of the code; after optimizing these, mainly by using a MathNET native library instead of the managed provider, a speed test was performed comparing the unoptimized version with the optimized one. Figure 4.12 shows the speed-up achieved during training using the LM algorithm.
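The LM update being timed is, per iteration, Δw = (JᵀJ + λI)⁻¹ Jᵀe [Levenberg, 1944; Marquardt, 1963]. A self-contained sketch of one such step (pure-Python linear algebra is used only to keep the sketch dependency-free; the thesis code relies on MathNET for this):

```python
# One Levenberg-Marquardt step for a generic least-squares problem:
#   dw = (J^T J + lambda I)^-1 J^T e

def lm_step(J, e, lam):
    """J: m x n Jacobian (list of rows), e: residuals, lam: damping."""
    m, n = len(J), len(J[0])
    # A = J^T J + lam * I
    A = [[sum(J[k][i] * J[k][j] for k in range(m)) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    # b = J^T e
    b = [sum(J[k][i] * e[k] for k in range(m)) for i in range(n)]
    # Solve A dw = b by Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    dw = [0.0] * n
    for r in range(n - 1, -1, -1):
        dw[r] = (b[r] - sum(A[r][c] * dw[c] for c in range(r + 1, n))) / A[r][r]
    return dw
```

The n×n solve at each step is exactly the part that benefits from a native linear algebra provider, which is why the MathNET switch dominates the speed-up in figure 4.12.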

Figure 4.12: Training time of the unoptimized version vs. the optimized one when training a network with 10 input neurons, 10/15/20/25 hidden neurons and 1 output neuron. Training was done for 100 epochs using LM.

4.3 Summary

A plot showing the average NRMSE improvement over the persistence model can be seen in figure 4.13. Both models are able to perform better than persistence. Expektra's model performs better than the NuPIC model over the whole forecast horizon, and NuPIC performs better than persistence towards the end of the forecast.
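The improvement measure can be sketched under the common convention of percentage NRMSE reduction relative to persistence (an assumed formula, consistent with the 0-100% axis of figure 4.13); the example numbers come from the "All" column of table 4.1:

```python
# Hypothetical but conventional "improvement over persistence":
# percentage reduction in NRMSE relative to the persistence reference.

def improvement(nrmse_model, nrmse_ref):
    return 100.0 * (nrmse_ref - nrmse_model) / nrmse_ref

# Example with the "All" column of table 4.1:
# Expektra 0.165, NuPIC 0.264, persistence 0.355.
```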

Figure 4.13: Summarized average improvement (%) in NRMSE over persistence, for all wind farms, with 95% confidence intervals, as a function of look-ahead time (in hours).


Chapter 5

Discussion

With the exponential growth of computing power it becomes easier to study deeper and more complex networks, and with recent advances in the area of deep learning a new interest in ANNs has resurged. In practice, ANNs have been used by most groups in the field of WPF, but these networks never caught on, as it was argued that the improvements made by ANNs were usually not enough to justify the extra effort of training them [Giebel et al., 2011]. This is steadily becoming less of an issue as computation becomes cheaper and bigger networks perform better.

By using a simple MLP network we observe that we are able to obtain results similar in performance to other models published in Hong et al. [2014]. This can be used as a starting point for further investigations. The GEFCom dataset has a limited number of input features; with access to a wider range of features, the power of neural networks could be studied more in depth.

Using persistence as a reference model was done in order to have the same baseline as the one used in GEFCom, but a better reference model should be considered in the future, such as the one presented in Nielsen et al. [1998], especially if longer forecasts are to be considered.

5.1 Method development issues

Working with Expektra's model is very straightforward and no major issues were encountered during development, but some issues were encountered working with NuPIC1:

1 It should also be pointed out that it helps to have the people who developed the code on the spot, and this is probably the main reason for there being little trouble.


1. Installing NuPIC is difficult. This seems to be a general problem, judging by posts on the NuPIC mailing list2. A very important issue to fix if they want more people to work with this model.

2. Using the built-in swarming (PSO) did not work well, especially for slightly larger datasets and for 48 prediction steps. The main problems were fitting everything into memory and that the code ran too slowly when swarming over multiple steps ahead, making it practically impossible to use. Scaling down the problem to just a few prediction steps and multiple models helped, but the swarm model kept dismissing wind speed as an important feature, which was fixed by specifically telling it not to touch that encoder.

3. Inconsistencies between the white-paper documentation of HTM/CLA and the actual implementation make it hard to understand the underlying principles of NuPIC. NuPIC presents a very complex network, and any inconsistency in the documentation makes it hard to understand. The code is quite well documented, but the additional material is a bit sparse.

These issues are all understandable and are continuously being improved upon. NuPIC is still a young platform, so issues are to be expected given the complexity and research nature of the project. Training the CLAClassifier to use many prediction steps instead of just one will most likely reduce the bias error seen in the results. To investigate this, a more powerful computer is needed.

There are some differences in the inputs between the models. This setup was created because an OPF model needs to encode the value it is going to predict. SCADA data is available, so this is not an issue, but it may introduce some unfairness between NuPIC and Expektra. In the current setup NuPIC does not seem to gain anything extra from the additional information, and other models in the competition most likely used different inputs as well.

In general, NuPIC differs in the way you feed it data: you send in the front of the signal, and temporal context is achieved through the temporal memory.

5.2 Future improvements and directions

One issue with the GEFCom dataset is that it does not contain any specific information about the location of the 7 wind farms. It is worth considering how to handle different models that are specialized for different types

2 http://numenta.org/lists


of terrain, as this has been shown to increase performance [Kariniotakis et al., 2006]. Another thing to investigate could be a more advanced schema for hyperparameter selection: Jursa and Rohrig [2008] show that the WPF error can be reduced by smarter use of optimization algorithms for feature selection and hyperparameter selection, which should improve performance.

In general, having more types of inputs from sources like SCADA and NWP systems could provide a lot of valuable information for better predictions. It is clear that wind speed is a key feature for producing good forecasts, which is also supported by the input analysis in section 4.2. It has been argued that the error in WPF models stems largely from wrong weather forecasts [Giebel et al., 2011], so identifying good sources of weather forecasts would be desirable for more effective model learning. Combining several different weather forecasts could also be investigated, as Nielsen et al. [2007] showed that power forecasts based on a number of different meteorological forecasts were better than those based on a single source.

Regarding HTM/CLA, it is worth considering implementing a custom encoder, one specifically targeted at data concerning wind farms. SCADA data could also be streamed directly into a NuPIC model, which would allow for some interesting online predictions and anomaly detection.

Training times are always an issue, especially for very large models. Both NuPIC and Expektra's model would benefit from being implemented in a more parallel fashion, as these algorithms are very well suited for parallel computing. The speed-up would be of great value, so looking into and implementing a GPU version of the algorithms could be worthwhile.

5.3 Conclusions

The field of energy forecasting is still young and the need for good forecasting techniques is constantly growing. This initial study shows that part of Expektra's forecasting method can be used to forecast wind power generation. We observe that this ANN is able to achieve performance comparable to other methods published in Hong et al. [2014]; additional work is still needed to obtain state-of-the-art results. Numenta's model is able to beat the reference model but needs additional work before we can draw definite conclusions about its performance. Constructing a custom encoder could be a first step in that direction.


Bibliography

Subutai Ahmad and Jeff Hawkins. Properties of sparse distributed representations and their application to hierarchical temporal memory. arXiv preprint arXiv:1503.07469, 2015.

T. C. Akinci. Short term wind speed forecasting with ANN in Batman, Turkey. Elektronika ir Elektrotechnika, 107(1):41–45, 2015.

E. Michael Azoff. Neural network time series forecasting of financial markets. John Wiley & Sons, Inc., 1994.

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13(1):281–305, 2012.

Daniel P. Buxhoeveden and Manuel F. Casanova. The minicolumn hypothesis in neuroscience. Brain, 125(5):935–951, 2002.

Erasmo Cadenas and Wilfrido Rivera. Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks. Renewable Energy, 34(1):274–278, 2009.

Dmitri B. Chklovskii, B. W. Mel, and K. Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782–788, 2004.

A. Costa, A. Crespo, J. Navarro, G. Lizcano, H. Madsen, and E. Feitosa. A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews, 12(6):1725–1744, August 2008. ISSN 1364-0321. doi: 10.1016/j.rser.2007.01.015. URL http://dx.doi.org/10.1016/j.rser.2007.01.015.

Russ C. Eberhart and James Kennedy. A new optimizer using particle swarm theory. In Proceedings of the sixth international symposium on micro machine and human science, volume 1, pages 39–43, New York, NY, 1995.

Shu Fan, James R. Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. Energy Conversion, IEEE Transactions on, 24(2):474–482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento – a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276. Citeseer, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition EWEC 2008, 6 pages. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving hierarchical temporal memory-based trading models. Springer, 2013.

M. A. Gaertner, C. Gallardo, C. Tejeda, N. Martínez, S. Calabria, N. Martínez, and B. Fernández. The CASANDRA project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. In Proc. of the 2001 European Wind Energy Conference, EWEC'01, Copenhagen, Denmark, pages 777–780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: A literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G. Lobo. Sipreólico – wind power prediction tool for the Spanish peninsular power system. In Proceedings of the CIGRÉ 40th general session and exhibition, Paris, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357–363, 2014.

Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41–46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585–592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694–709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference, EWEC'07, 2007.

G. Kariniotakis. Position paper on JOULE project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J. Halliday, R. Brownsword, Ignacio Marti, Ana Maria Palomares, I. Cruz, H. Madsen, T. S. Nielsen, Henrik Aa. Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power – overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, 10 pages, 2006.

G. N. Kariniotakis, G. S. Stavrakakis, and E. F. Nogaret. Wind power forecasting using advanced neural networks models. Energy Conversion, IEEE Transactions on, 11(4):762–767, 1996.

Stanley J. Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326–334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275–293, 2009.

Lars Landberg and Simon J. Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171–195, 1994.

S. Lang, C. Mohrlen, J. Jorgensen, B. Ó Gallachóir, and E. McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International solar energy society UK section conference, volume 85, page 89, 2006.

S. M. Lawan, W. A. W. Z. Abidin, W. Y. Chai, A. Baharun, and T. Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760–1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. Smart Grid, IEEE Transactions on, 5(1):501–510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets, 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164–168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313–2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154–3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa. Nielsen, and Torben S. Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475–489, 2005.

Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431–441, 1963.

Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons. 1969.

Marvin Lee Minsky and Oliver G. Selfridge. Learning in random nets. MIT Lincoln Laboratory, 1960.

Jorge J. Moré. The Levenberg-Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105–116. Springer, 1978.

Henrik Aa. Nielsen, Torben S. Nielsen, Henrik Madsen, Maria J. Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471–482, 2007.

Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29–34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power, 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159–167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms, v0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33–57, 2007.

A. Rodrigues, J. A. Peças Lopes, P. Miranda, L. Palma, C. Monteiro, R. Bessa, J. Sousa, C. Rodrigues, and J. Matos. EPREV – a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference, EWEC, volume 7, 2007.

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5:3, 1988.

Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85–117, 2015.

S. Sinkevicius, R. Simutis, and V. Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91–96, 2011.

Ke-Sheng Wang, Vishal S. Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61–69, 2014.

WWEA. 2014 half-year report, pp. 1–8. Technical report, WWEA, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365–376, 2013.

Hao Yu and Bogdan M. Wilamowski. Levenberg-Marquardt training. Industrial Electronics Handbook, 5:12-1, 2011.


Appendix A

Hyper-parameters

Parameters for the spatial pooler

Alias                 Default  Description

columnCount           -        The number of cell columns in a cortical region.
globalInhibition      false    If true, during the inhibition phase the winning columns are selected as the most active columns from the region as a whole.
numActivePerInhArea   10       The maximum number of active columns per inhibition area.
synPermActiveInc      0.1      The amount by which an active synapse is incremented in each round. Specified as a percent of a fully grown synapse.
synPermConnected      0.10     Controls the threshold at which synapses are considered connected.
synPermInactiveDec    0.01     The amount by which an inactive synapse is decremented in each round.
potentialRadius       16       Determines the extent of the input that each column can potentially be connected to.

Table A.1: Configuration parameters for the spatial pooler.


Parameters for the scalar encoder

Alias       Symbol  Description

w           w       Number of bits to set in the output.
minval      vmin    The lower bound of the input value.
maxval      vmax    The upper bound of the input value.
n           n       Number of bits in the representation (n must be > w).
radius      r       Inputs separated by more than or equal to this distance will have non-overlapping representations.
resolution  ψ       Inputs separated by more than or equal to this distance will have different representations.

Table A.2: Configuration parameters for the scalar encoder.
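A minimal sketch of how a scalar encoder could turn these parameters into an SDR. This is a simplification for illustration; the real NuPIC encoder also supports radius/resolution-driven configuration and periodic values:

```python
# Simplified scalar encoder: a contiguous block of w active bits out of
# n, sliding across the range [minval, maxval]. Nearby values share
# bits, giving overlapping representations.

def encode_scalar(value, minval, maxval, n, w):
    assert n > w, "n must be greater than w"
    value = min(max(value, minval), maxval)          # clip to the valid range
    n_buckets = n - w + 1                            # number of start positions
    bucket = int((value - minval) / (maxval - minval) * (n_buckets - 1) + 0.5)
    bits = [0] * n
    for i in range(bucket, bucket + w):              # contiguous block of w ones
        bits[i] = 1
    return bits
```

With n = 12 and w = 3 there are 10 distinct buckets; values closer than one bucket apart overlap in at least one bit, which is the property the CLA relies on for generalization.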


Parameters for the temporal memory

Alias                  Default  Description

activationThreshold    12       Activation threshold for segments.
cellsPerColumn         32       Number of cells per column.
columnCount            2048     The number of cell columns in a cortical region.
globalDecay            0.10     Decrements all synapses a little bit all the time.
initialPerm            0.11     Initial permanence value for a synapse.
inputWidth             -        Size of the input.
maxAge                 100000   Controls global decay. Global decay will only decay segments that have not been activated for maxAge iterations, and will only run the global decay loop every maxAge iterations.
maxSegmentsPerCell     -        The maximum number of segments a cell can have.
maxSynapsesPerSegment  -        The maximum number of synapses a segment can have.
minThreshold           8        The minimum required activity for a segment to learn.
newSynapseCount        15       The maximum number of synapses added to a segment during learning.
permanenceDec          0.10     How much permanence is removed from synapses when learning occurs.
permanenceInc          0.10     How much permanence is added to synapses when learning occurs.
temporalImp            cpp/py   Controls which temporal memory implementation to use.

Table A.3: Configuration parameters for the temporal memory.


Appendix B

Wind characteristics

Figure B.1: Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Figure B.2: Wind characteristics for WF 3-7, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Appendix C

Error Distribution


Figure C.1: Error distribution for different lead times (1, 10, 20, 30, 40 and 48 hours), WF 1, using NuPIC.


Figure C.2: Error distribution for different lead times (1, 10, 20, 30, 40 and 48 hours), WF 2, using NuPIC.


Figure C.3: Error distribution for different lead times (1, 10, 20, 30, 40 and 48 hours), WF 3, using NuPIC.

64

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using nupic

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using nupic

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using nupic

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using nupic

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using nupic

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using nupic

Figure C4 Error distribution for different lead times WF 4

65

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf5 using nupic

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using nupic

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using nupic

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using nupic

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using nupic

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using nupic

Figure C5 Error distribution for different lead times WF 5

66

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

Figure C6 Error distribution for different lead times WF 6

67

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf7 using nupic

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using nupic

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using nupic

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using nupic

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using nupic

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using nupic

Figure C7 Error distribution for different lead times WF 7

68

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

Figure C8 Error distribution for different lead times WF 1

69

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf2 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

Figure C9 Error distribution for different lead times WF 2

70

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

Figure C10 Error distribution for different lead times WF 3

71

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf4 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

Figure C11 Error distribution for different lead times WF 4

72

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

Figure C12 Error distribution for different lead times WF 5

73

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf6 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

Figure C13 Error distribution for different lead times WF 6

74

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

Figure C14 Error distribution for different lead times WF 7

75

List of Figures

2.1 A figure that presents the general outline when forecasting using the statistical approach 6

2.2 A figure that presents the general steps when forecasting using a physical model 7

3.1 The perceptron 20

3.2 Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of the H hidden layers, as well as M input connections. Each edge seen in this graph has a weight wij associated with it 21

3.3 Information flow of a single-region predictive model created with the OPF 23

3.4 The CLAClassifier 28

3.5 Training an OPF model 29

4.1 Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) 31

4.2 Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) 32

4.3 Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) 32

4.4 Different error measurements for WF 1 33

4.5 Different error measurements for WF 2 34

4.6 Different error measurements for WF 3 35

4.7 Different error measurements for WF 4 36

4.8 Different error measurements for WF 5 37

4.9 Different error measurements for WF 6 38

4.10 Different error measurements for WF 7 39

4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point: the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind 41

4.12 Plot showing the unoptimized version vs the optimized one when training a network with 10 input neurons; 10, 15, 20, 25 hidden neurons; 1 output neuron. Training was done for 100 epochs using LM 42

4.13 Summarized average improvement over all wind farms with 95% confidence intervals 43

B1 Wind characteristics for WF 1 and WF 2. The GEFCom dataset showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated, the stronger the more power 59

B2 Wind characteristics for WF 3-7. The GEFCom dataset showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated, the stronger the more power 60

C1 Error distribution for different lead times WF 1 62

C2 Error distribution for different lead times WF 2 63

C3 Error distribution for different lead times WF 3 64

C4 Error distribution for different lead times WF 4 65

C5 Error distribution for different lead times WF 5 66

C6 Error distribution for different lead times WF 6 67

C7 Error distribution for different lead times WF 7 68

C8 Error distribution for different lead times WF 1 69

C9 Error distribution for different lead times WF 2 70

C10 Error distribution for different lead times WF 3 71

C11 Error distribution for different lead times WF 4 72

C12 Error distribution for different lead times WF 5 73

C13 Error distribution for different lead times WF 6 74

C14 Error distribution for different lead times WF 7 75

List of Tables

3.1 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, missing data with power observations are available for updating the models 16

3.2 Preprocessed features from the dataset. The Forecast category represents the forecast given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the feature we will use in training and testing 17

3.3 Example, where n = 14, r = 5, ψ = 1, of various encoded scalar values using a ScalarEncoder 24

4.1 NRMSE score of the entries published in [Hong et al. 2014]. The NuPIC model and Expektra model are added so we can easily compare the result 40

A1 Table containing configuration parameters for the encoder 55

A2 Table containing configuration parameters for the spatial pooler 56

A3 Table containing configuration parameters for the temporal memory 57



CHAPTER 3 METHOD AND MATERIALS

$$
\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \cdots & b_n \end{bmatrix}}_{\text{Input vector}}
\cdot
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \cdots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \cdots & b_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \cdots & b_{dn}
\end{bmatrix}}_{\text{Connected synapses for each column}}
=
\underbrace{\begin{bmatrix} s_1 & s_2 & s_3 & \cdots & s_n \end{bmatrix}}_{\text{Overlap score}}
\xrightarrow{\text{Inhibition}}
\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \cdots & b_n \end{bmatrix}}_{\text{Output SDR}}
\tag{3.19}
$$

Learning in this structure is done by adjusting the permanences of the proximal dendrite to better match the input: for each winning column, increase the permanence of the synapses that correctly matched the input and decrease the permanence of the rest. We also increase the boosting factor of losing columns to give them a bigger chance of winning next time.
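The overlap-inhibition-learning loop described above (equation 3.19) can be condensed into a few lines of NumPy. This is a minimal sketch of the principle, not the NuPIC implementation; all sizes and learning constants are made-up example values.

```python
import numpy as np

rng = np.random.default_rng(0)

n_inputs, n_columns = 32, 16
k_active = 4                  # columns left active after global inhibition
perm_thresh = 0.5             # permanence above this => connected synapse
perm_inc, perm_dec = 0.05, 0.03

# Each column holds a permanence value towards every input bit.
permanences = rng.uniform(0.3, 0.7, size=(n_columns, n_inputs))

def spatial_pooler_step(input_sdr, learn=True):
    """One overlap -> inhibition -> learning step (cf. equation 3.19)."""
    connected = permanences > perm_thresh        # binary synapse matrix
    overlap = connected @ input_sdr              # overlap score per column
    # Global inhibition: keep only the k columns with the highest overlap.
    winners = np.argsort(overlap)[-k_active:]
    output_sdr = np.zeros(n_columns, dtype=int)
    output_sdr[winners] = 1
    if learn:
        for c in winners:
            # Reinforce synapses that matched the input, weaken the rest.
            permanences[c, input_sdr == 1] += perm_inc
            permanences[c, input_sdr == 0] -= perm_dec
        np.clip(permanences, 0.0, 1.0, out=permanences)
    return output_sdr

x = (rng.random(n_inputs) < 0.2).astype(int)     # a sparse input vector
out = spatial_pooler_step(x)                     # SDR with k_active ones
```

Repeated presentation of the same input makes the same columns win with ever larger overlap, which is the essence of the spatial pooler's learning.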

Temporal Memory

The temporal memory receives an SDR from the spatial pooler, which represents the active columns; the output of the temporal memory is the activity of the whole CLA region. The general idea of the TM9 is that cells in each mini-column provide temporal context to the pattern identified by the spatial pooler. The TM's job is to form transitions between different SDRs, and it achieves this by allowing active cells to form connections to previously active cells, so that in a future setting each cell is able to predict its own activity.

Each cell in a region has several distal segments, ideally one for each pattern the cell has transitioned from; in practice these segments are connected to a couple of patterns each. A cell enters a predictive state if there is enough activity on a segment, and every segment consists of a collection of connections, synapses, that have been formed to a subset of previously active cells (typically around 10-15 cells).

The Temporal Memory receives a sparse binary vector from the spatial pooler, and the general information flow for this part of the CLA goes through two phases. The first phase has the following steps: 1) for each active column, check whether there is any cell in a predictive state; if there is, then 2) activate that particular cell. If no cell was found in a predictive state, then 3) activate all cells in that particular column, a process called bursting.

9There are a lot of details in how the Temporal Memory is implemented; pseudo-code and more details can be found in [Numenta 2011], and the nupic git repository is the best source for finer details: https://github.com/numenta/nupic

Bursting is analogous to "we do not know which cell to activate, because we have not seen this instance of the sequence of patterns before and we are unable to put the column into the correct temporal context, so let us activate all cells, reflecting this uncertainty". Bursting does not occur when the temporal context has been predicted by the CLA. Equation 3.20 shows the first phase.

$$
\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \cdots & b_n \end{bmatrix}}_{\text{SP SDR}}
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \cdots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \cdots & b_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \cdots & b_{dn}
\end{bmatrix}}_{\text{Predictive State}}
=
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \cdots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \cdots & b_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \cdots & b_{dn}
\end{bmatrix}}_{\text{Active State}}
\tag{3.20}
$$

The second phase of the algorithm figures out which cells should be put into a predictive state for the next time step. This is achieved by checking every distal segment on every cell for activity above a certain threshold, i.e. checking for active segments; one active segment is enough to put a cell into a predictive state. The output from the temporal pooler is a vector representing the state of all cells in that region. Equation 3.21 shows phase 2.

$$
\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \cdots & b_n \end{bmatrix}}_{\text{Active State}}
\cdot
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \cdots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \cdots & b_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \cdots & b_{dn}
\end{bmatrix}_X}_{\text{Segment } X}
=
\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \cdots & b_n \end{bmatrix}_X}_{\text{Segment Activation } X}
> \tau
\;\rightarrow\;
\underbrace{\begin{bmatrix} s_1 & s_2 & s_3 & \cdots & s_n \end{bmatrix}}_{\text{Predictive State}}
\tag{3.21}
$$

If learning is turned on, update the permanences of the synapses connected to active distal segments (in the same way as in the spatial pooler). These changes are marked as temporary until we are sure that a cell is in fact able to correctly predict something with the new changes in place; after that fact is known, the change becomes either permanent or is removed. Temporarily marked changes are confirmed whenever a cell goes from being inactive to active from a feed-forward input (keep the update, as we correctly predicted the feed-forward activation). If a cell instead went from active to inactive, undo the change.
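The two phases can be condensed into a toy sketch. For clarity it uses a single cell-to-cell connection matrix instead of per-segment bookkeeping, so it is an illustration of equations 3.20-3.21 rather than the NuPIC code; all sizes and the threshold are made-up example values.

```python
import numpy as np

n_columns, cells_per_col = 16, 4
n_cells = n_columns * cells_per_col
tau = 2   # active synapses needed to put a cell into a predictive state

# Simplification: one distal "segment" per cell, as binary cell-to-cell links.
segments = np.zeros((n_cells, n_cells), dtype=int)

def tm_step(active_columns, predictive_prev):
    # Phase 1: activate predicted cells, or burst whole columns.
    active = np.zeros(n_cells, dtype=bool)
    for col in active_columns:
        cells = np.arange(col * cells_per_col, (col + 1) * cells_per_col)
        predicted = cells[predictive_prev[cells]]
        if predicted.size:        # correctly predicted: activate those cells
            active[predicted] = True
        else:                     # unanticipated input: bursting
            active[cells] = True
    # Phase 2: segments with enough active synapses put their cells
    # into the predictive state for the next time step (eq. 3.21).
    segment_activation = segments @ active
    predictive = segment_activation > tau
    return active, predictive

prev = np.zeros(n_cells, dtype=bool)
active, predictive = tm_step([1, 5], prev)   # nothing predicted: both burst
```

With no learned connections yet, both columns burst (all of their cells become active) and nothing is predicted, which matches the behaviour described above.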


NuPIC Classifiers

There are many different classifiers included with NuPIC, such as a k-Nearest Neighbour (kNN) classifier and a Support Vector Machine (SVM); the OPF in particular uses a custom-built classifier called the "CLAClassifier". The purpose of this classifier is to decode predictions made by the CLA. All classification performed with the CLAClassifier is done in different ways depending on how the input has been encoded.

Scalar values are decoded using a process where each cell in a CLA region is paired with two histograms. One histogram keeps track of the frequency of encountered patterns associated with each cell; the other keeps track of a moving average for each bucket. This process is illustrated in figure 3.4.
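The two-histogram idea can be illustrated with a toy decoder. The class below is a deliberate simplification for illustration only (the real CLAClassifier's bucketing, likelihood computation and alpha handling differ): it tracks a pattern-frequency count per cell and a moving average of the actual values per bucket.

```python
from collections import defaultdict

class ToyScalarDecoder:
    """Sketch of two-histogram decoding: per cell, how often it fired
    together with each value bucket; per bucket, a moving average of
    the actual scalar values seen."""
    def __init__(self, alpha=0.1):
        self.freq = defaultdict(lambda: defaultdict(int))  # cell -> bucket -> count
        self.avg = {}                                       # bucket -> moving average
        self.alpha = alpha

    def learn(self, active_cells, bucket, value):
        for cell in active_cells:
            self.freq[cell][bucket] += 1
        if bucket not in self.avg:
            self.avg[bucket] = value
        else:
            self.avg[bucket] += self.alpha * (value - self.avg[bucket])

    def infer(self, active_cells):
        votes = defaultdict(int)
        for cell in active_cells:
            for bucket, count in self.freq[cell].items():
                votes[bucket] += count
        best = max(votes, key=votes.get)   # most frequently co-active bucket...
        return self.avg[best]              # ...decoded through its average

dec = ToyScalarDecoder()
dec.learn({1, 2, 3}, bucket=7, value=0.42)
dec.learn({1, 2, 3}, bucket=7, value=0.44)
print(dec.infer({2, 3}))   # -> 0.422 (bucket 7's moving average)
```

A partial match of the learned cell set is enough to recover the bucket, which is what makes the decoding robust to the sparse, distributed representation.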

[Figure: an SDR feeding columns 1 to N; each cell is paired with 2 histograms, a likelihood histogram and a moving average, spanning the min value to the max value]

Figure 3.4: The CLAClassifier

Training in NuPIC

Training the NuPIC model is done using online learning algorithms. The schema seen in figure 3.5 is used to train and test these models. We have 7 different wind farms, so this schema is repeated for each respective wind farm.

This schema is slightly adjusted from the traditional way the OPF handles multi-step predictions. The default way of doing this is to use 48 different models, one for every step ahead in the CLAClassifier, which would give us 48 · 7 = 336 models in total (for all the wind farms). Not only does this change match Expektra's approach, but it also reduces the memory requirements for the CLAClassifier.

[Figure: training phase: a pre-training data chunk from the dataset feeds PSO swarming or a manual setup, which produces the hyperparameter setup for an OPF model with online learning activated; the training data stream then yields predictions. Testing phase: the testing data is fed to the OPF model with online learning deactivated, yielding multistep predictions]

Figure 3.5: Training an OPF model

Input and hyperparameter selection

Finding hyperparameters for an OPF model was partly done using a custom built-in PSO algorithm and partly by manual configuration, mainly by ensuring that the PSO algorithm did not remove encoders. A single CLA region contains a lot of hyperparameters; these parameters have been included, with a corresponding description, in appendix A. Inputs to the model are date, ws, wp, u, v.


Chapter 4

Result

This chapter contains the primary results obtained in this study. Figures 4.4-4.10 contain different error measurements on the test set for the respective wind farms. We see that Expektra's ANN model is able to perform well over all wind farms, while the NuPIC model performs worse than Expektra but in general still better than our reference model. NuPIC is off target, with a bias error on most wind farms. The graphs also indicate that NuPIC is unable to pick up some trends, visible in the cumulated ε² graph. The Expektra model, on the other hand, shows no clear problems in cumulated ε² and is on target on all wind farms; appendix C has been included to reflect this for different lead times.

[Figure: left panel: cumulative probability vs wind speed (0-20 m/s); right panel: normalized power output vs wind speed, colored by wind direction (0-350 degrees)]

Figure 4.1: Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)


Figure 4.2: Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)

Figure 4.3: Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)

[Figures 4.4-4.10: four panels per wind farm comparing Expektra, NuPIC and Persistence: NBIAS, NRMSE and NMAE vs look-ahead time k (in hours), and cumulated ε² vs time (in hours)]

Figure 4.4: Different error measurements for WF 1

Figure 4.5: Different error measurements for WF 2

Figure 4.6: Different error measurements for WF 3

Figure 4.7: Different error measurements for WF 4

Figure 4.8: Different error measurements for WF 5

Figure 4.9: Different error measurements for WF 6

Figure 4.10: Different error measurements for WF 7

Wind Farm

User           1      2      3      4      5      6      7      All
Leustagos      0.145  0.138  0.168  0.144  0.158  0.133  0.140  0.146
DuckTile       0.143  0.145  0.172  0.145  0.165  0.137  0.146  0.148
MZ             0.141  0.151  0.174  0.145  0.167  0.141  0.145  0.149
Propeller      0.144  0.153  0.177  0.147  0.175  0.141  0.147  0.152
Duehee Lee     0.157  0.144  0.176  0.160  0.169  0.154  0.148  0.155
Expektra       0.165  0.158  0.184  0.164  0.179  0.153  0.153  0.165
MTU EE5260     0.161  0.172  0.193  0.162  0.192  0.156  0.160  0.168
SunWind        0.174  0.177  0.193  0.176  0.179  0.157  0.162  0.172
ymzsmsd        0.163  0.186  0.200  0.164  0.192  0.162  0.167  0.174
4138 Kalchas   0.180  0.179  0.197  0.175  0.200  0.160  0.165  0.177
NuPIC          0.243  0.254  0.264  0.310  0.290  0.224  0.240  0.264

Persistence    0.302  0.338  0.373  0.364  0.388  0.341  0.361  0.355

Table 4.1: NRMSE score of the entries published in [Hong et al. 2014]. The NuPIC model and the Expektra model are added so we can easily compare the results.

4.1 Experimental results

Looking at the primary results presented in this study, we see in table 4.1 that Expektra's ANN model is able to achieve performance comparable to the other methods published with Hong et al. [2014]. This can be used as a starting point for further investigations. A similar approach to Expektra's is Duehee Lee's [Lee and Baldick 2014], as both methods are based on the same neural network architecture but with different optimization techniques. Duehee Lee's network is structured slightly differently from Expektra's model, which consists of only 1 output node, whereas Duehee Lee uses five (representing five predictions ahead). Besides these differences, Duehee Lee has also included additional sub-models to create a prediction ensemble, blending the output of the ANN with a Gaussian Process (GP) to improve very short-term predictions. MTU EE5260 is also a neural network approach that converts meteorological forecasts to power, and here the score is very close. The HTM CLA beats the persistence model but needs more work to outperform the other models in GEFCom.


4.2 Input Importance

For interpretation purposes, in order to understand the model better, an analysis of the relative input parameter importance was performed. This analysis is illustrated in figure 4.11. Each box represents added noise to that channel, which will result in a higher NRMSE score if that feature was important. A reference point, "all-channels", represents the error distribution of the model with no input replacement. We clearly see in this figure that the wind speed channel ws is the most important attribute; adding noise to this channel will greatly affect the NRMSE score. We also see that the wind components u, v show little to no influence, and that the timestamp-related inputs hours and week both indicate that there are seasonal and daily trends present in the dataset.

[Figure: box plots of NRMSE (0.2-0.6) after noise injection, one box per channel: all-channels, hours, u, v, week, ws, ws-1, ws-2, ws-3, ws+1, ws+2, ws+3]

Figure 4.11: Relative input parameter importance using HIPR. "all-channels" reflects the reference point: the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind.


4.2.1 Adaptation and Optimization

All training for Expektra's model was performed on a MacBook Pro with an Intel Core 2 Duo processor (P8600, 2.40 GHz) and a 64-bit operating system with 4 GB of installed RAM. After adapting the code to be able to run experiments using the GEFCom dataset, an investigation was performed to evaluate the performance of the core source code given by Expektra. This investigation identified some slow parts of the code; after fixing these issues, mainly by using a MathNET native library instead of the managed provider, a speed test was performed comparing the unoptimized version with the optimized one. Figure 4.12 shows the speed-up achieved during training using the LM algorithm.

[Figure: training time vs number of hidden neurons (10-25) for the normal and the optimized version]

Figure 4.12: Plot showing the unoptimized version vs the optimized one when training a network with 10 input neurons; 10, 15, 20, 25 hidden neurons; 1 output neuron. Training was done for 100 epochs using LM.

4.3 Summary

A plot that shows the average NRMSE improvement over the persistence model can be seen in figure 4.13. We see that both models are able to perform better than persistence. Expektra's model performs better than the NuPIC model over the whole forecast horizon, and NuPIC performs better than persistence towards the end of the forecast.

[Figure: improvement (%) in NRMSE over persistence vs look-ahead time (0-50 hours) for Expektra and NuPIC]

Figure 4.13: Summarized average improvement over all wind farms, with 95% confidence intervals.


Chapter 5

Discussion

With the exponential growth of computer power it becomes easier to study deeper and more complex networks, and with recent advances in the area of deep learning a new interest in ANNs has resurged. In practice, ANNs have been used by most groups in the field of WPF, but these networks never caught on, as it was argued that the improvements made by ANNs were usually not enough to justify the extra effort of training them [Giebel et al. 2011]. This is steadily becoming less of an issue as computation becomes cheaper and bigger networks perform better.

By using a simple MLP network, we observe that we are able to obtain results similar in performance to other models published in Hong et al. [2014]. This can be used as a starting point for further investigations. The GEFCom dataset has a limited number of input features; if we had access to a wider range of them, the power of neural networks could be studied more in depth, but given the number of features in the dataset this is harder to do.

Using persistence as a reference model was done in order to have the same baseline as the one used in GEFCom, but a better reference model should be considered in the future, such as the one presented in Nielsen et al. [1998], especially if longer forecast horizons are to be considered.
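For context, the improved baseline of Nielsen et al. [1998] blends persistence with the climatological mean, weighted by the lag-k autocorrelation, so that long-horizon forecasts decay towards the mean. A hedged sketch (simplified; the published model estimates its coefficients somewhat differently):

```python
import numpy as np

def new_reference_forecast(series, k):
    """Forecast y(t+k) = a_k * y(t) + (1 - a_k) * mean(y), where a_k is
    the lag-k autocorrelation estimated from the history. For large k,
    a_k tends towards 0 and the forecast falls back to the mean,
    which is why this baseline beats pure persistence at long horizons."""
    y = np.asarray(series, dtype=float)
    mean = y.mean()
    d = y - mean
    a_k = np.sum(d[:-k] * d[k:]) / np.sum(d * d)
    return a_k * y[-1] + (1.0 - a_k) * mean

history = [0.30, 0.35, 0.40, 0.38, 0.42, 0.45]
print(new_reference_forecast(history, k=1))  # between the mean and the last value
```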

5.1 Method development issues

Working with Expektra's model was very straightforward and no major issues were encountered during development, but some issues were encountered working with NuPIC.¹

¹ It should also be pointed out that it helps to have the people who developed the code on the spot, and this is probably the main reason for the little trouble encountered.


1. Installing NuPIC is difficult. This seems to be a general problem, judging by posts in the NuPIC mailing list:² a very important issue to fix if they want more people to work with this model.

2. Using the built-in swarming (PSO) did not work well, especially for slightly larger datasets and for 48 prediction steps. The main problems were fitting everything into memory and that the code ran too slowly when swarming over multiple step-ahead predictions, making it practically impossible to use. Scaling down the problem to just a few prediction steps and multiple models helped, but the swarm model kept dismissing wind speed as an important feature, which was fixed by specifically telling it not to touch that encoder.

3. Inconsistencies between the white-paper documentation of HTM CLA and the actual implementation make it hard to understand the underlying principles of NuPIC. NuPIC presents a very complex network, and any inconsistency in the documentation makes it hard to understand. The code is quite well documented, but the additional material is a bit sparse.

These issues are all understandable and are continuously being improved upon. NuPIC is still a young platform, so issues are to be expected given the complexity and research nature of the project. Training the CLAClassifier to use many prediction steps instead of just one will most likely reduce the bias error seen in the results; to investigate this, a more powerful computer is needed.

There are some differences in the inputs between the models. This setup was created because an OPF model needs to encode the value it is going to predict. SCADA data is available, so this is not an issue, but it may introduce some unfairness between NuPIC and Expektra. In the current setup, NuPIC does not seem to gain anything extra from the additional information, and other models in the competition most likely used different inputs as well.

In general, NuPIC is different in the way you feed it data: you send in the front of the signal, and temporal context is achieved through the temporal memory.

5.2 Future improvements and directions

One issue with the GEFCom dataset is that it does not contain any specific information about the location of the 7 wind farms. It is worth considering how to handle different models that are specialized for different types of terrain, as this has been shown to increase performance [Kariniotakis et al., 2006]. Another thing to investigate could be a more advanced scheme for hyperparameter selection: Jursa and Rohrig [2008] show that the WPF error can be reduced by smarter use of optimization algorithms for feature selection and hyperparameter selection, which should improve performance.

² http://numenta.org/lists
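One simple such scheme, also cited in this thesis, is the random search of Bergstra and Bengio [2012]. A minimal sketch, where `fake_eval` is a hypothetical stand-in for training an MLP and returning its validation NRMSE:

```python
import random

def random_search(evaluate, space, n_trials=50, seed=0):
    """Sample hyperparameter combinations at random and keep the one
    with the lowest validation error returned by `evaluate`."""
    rng = random.Random(seed)
    best_params, best_err = None, float("inf")
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in space.items()}
        err = evaluate(params)
        if err < best_err:
            best_params, best_err = params, err
    return best_params, best_err

# Toy objective: error is smallest at 20 hidden neurons and the lowest
# learning rate; a real evaluate() would train and validate a network.
space = {"hidden": [5, 10, 15, 20, 25], "lr": [0.001, 0.01, 0.1]}
fake_eval = lambda p: abs(p["hidden"] - 20) / 20.0 + p["lr"]
print(random_search(fake_eval, space))
```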

In general, having more types of inputs from sources like SCADA and NWP systems could give a lot of valuable information that can be used for better predictions. It is clear that wind speed is a key feature for producing good forecasts, which is also supported by the input analysis in Section 4.2. It has been argued that the error in WPF models stems largely from wrong weather forecasts [Giebel et al., 2011], so identifying good sources of weather forecasts would be desirable for more effective model learning. Combining several different weather forecasts could also be investigated, as Nielsen et al. [2007] showed that power forecasts based on a number of different meteorological forecasts were better than those based on a single source.
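A generic way to combine several meteorological sources is to weight each forecast inversely to its historical error. This is only an illustrative scheme, not the actual combination method of Nielsen et al. [2007]:

```python
import numpy as np

def combine_forecasts(forecasts, rmse):
    """Inverse-variance weighting: sources with lower historical RMSE
    get proportionally more weight in the combined forecast."""
    f = np.asarray(forecasts, dtype=float)
    w = 1.0 / np.asarray(rmse, dtype=float) ** 2
    w /= w.sum()                      # normalize weights to sum to 1
    return float(w @ f)

# Three hypothetical NWP wind-speed forecasts (m/s) and their RMSEs.
print(combine_forecasts([8.0, 9.0, 7.5], rmse=[1.0, 2.0, 1.5]))  # ~8.02
```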

Regarding HTM CLA, it is worth considering implementing a custom encoder: one specifically targeted at data concerning wind farms. SCADA data could also be streamed directly into a NuPIC model, which would allow for some interesting online predictions and anomaly detection.

Training times are always an issue, especially for very large models. Both NuPIC and Expektra's model would benefit from being implemented in a more parallel fashion, as these algorithms are very well suited for parallel computing. The speed-up would be of great value, so looking into and implementing a GPU version of the algorithms could be worthwhile.

5.3 Conclusions

The field of energy forecasting is still young, and the necessity for good forecasting techniques is constantly growing. This initial study shows that part of Expektra's forecasting method can be used to forecast wind power generation. We observe that this ANN is able to achieve performance comparable to other methods published in Hong et al. [2014], although additional work is still needed to obtain state-of-the-art results. Numenta's model is able to beat the reference model but still needs additional work before we can draw definite conclusions about its performance; constructing a custom encoder could be a first step in that direction.


Bibliography

Subutai Ahmad and Jeff Hawkins. Properties of sparse distributed representations and their application to hierarchical temporal memory. arXiv preprint arXiv:1503.07469, 2015.

T.C. Akinci. Short term wind speed forecasting with ANN in Batman, Turkey. Elektronika ir Elektrotechnika, 107(1):41–45, 2015.

E. Michael Azoff. Neural Network Time Series Forecasting of Financial Markets. John Wiley & Sons, Inc., 1994.

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13(1):281–305, 2012.

Daniel P. Buxhoeveden and Manuel F. Casanova. The minicolumn hypothesis in neuroscience. Brain, 125(5):935–951, 2002.

Erasmo Cadenas and Wilfrido Rivera. Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks. Renewable Energy, 34(1):274–278, 2009.

Dmitri B. Chklovskii, B.W. Mel, and K. Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782–788, 2004.

A. Costa, A. Crespo, J. Navarro, G. Lizcano, H. Madsen, and E. Feitosa. A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews, 12(6):1725–1744, August 2008. ISSN 1364-0321. doi: 10.1016/j.rser.2007.01.015. URL http://dx.doi.org/10.1016/j.rser.2007.01.015.

Russ C. Eberhart and James Kennedy. A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science, volume 1, pages 39–43, New York, NY, 1995.


Shu Fan, James R. Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. Energy Conversion, IEEE Transactions on, 24(2):474–482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento: a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276. Citeseer, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition EWEC 2008, 6 pages. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving Hierarchical Temporal Memory-Based Trading Models. Springer, 2013.

M.A. Gaertner, C. Gallardo, C. Tejeda, N. Martínez, S. Calabria, N. Martínez, and B. Fernández. The Casandra project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. In Proc. of the 2001 European Wind Energy Conference, EWEC'01, Copenhagen, Denmark, pages 777–780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: a literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G. Lobo. Sipreólico: wind power prediction tool for the Spanish peninsular power system. In Proceedings of the CIGRÉ 40th General Session and Exhibition, Paris, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On Intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357–363, 2014.


Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41–46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585–592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694–709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference, EWEC'07, 2007.

G. Kariniotakis. Position paper on Joule project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J. Halliday, R. Brownsword, Ignacio Marti, Ana Maria Palomares, I. Cruz, H. Madsen, T.S. Nielsen, Henrik Aa. Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power: overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, 10 pages, 2006.

G.N. Kariniotakis, G.S. Stavrakakis, and E.F. Nogaret. Wind power forecasting using advanced neural networks models. Energy Conversion, IEEE Transactions on, 11(4):762–767, 1996.

Stanley J. Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326–334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275–293, 2009.

Lars Landberg and Simon J. Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171–195, 1994.

S. Lang, C. Mohrlen, J. Jorgensen, B.O. Gallachóir, and E. McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International Solar Energy Society UK Section Conference, volume 85, page 89, 2006.


S.M. Lawan, W.A.W.Z. Abidin, W.Y. Chai, A. Baharun, and T. Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760–1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. Smart Grid, IEEE Transactions on, 5(1):501–510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets, 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164–168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313–2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154–3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa. Nielsen, and Torben S. Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475–489, 2005.

Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431–441, 1963.

Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons. 1969.

Marvin Lee Minsky and Oliver G. Selfridge. Learning in Random Nets. MIT Lincoln Laboratory, 1960.

Jorge J. Moré. The Levenberg-Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105–116. Springer, 1978.

Henrik Aa. Nielsen, Torben S. Nielsen, Henrik Madsen, Maria J. Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471–482, 2007.


Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29–34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power, 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159–167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms, v0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33–57, 2007.

A. Rodrigues, J.A. Peças Lopes, P. Miranda, L. Palma, C. Monteiro, R. Bessa, J. Sousa, C. Rodrigues, and J. Matos. EPREV: a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference, EWEC, volume 7, 2007.

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5:3, 1988.

Jürgen Schmidhuber. Deep learning in neural networks: an overview. Neural Networks, 61:85–117, 2015.

S. Sinkevicius, R. Simutis, and V. Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91–96, 2011.

Ke-Sheng Wang, Vishal S. Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61–69, 2014.

WWEA. 2014 half-year report, pp. 1–8. Technical report, WWEA, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365–376, 2013.

Hao Yu and Bogdan M. Wilamowski. Levenberg-Marquardt training. Industrial Electronics Handbook, 5:12-1, 2011.


Appendix A

Hyper-parameters

Parameters for the spatial pooler

Alias                Default  Description
columnCount          -        The number of cell columns in a cortical region.
globalInhibition     false    If true, during the inhibition phase the winning columns are selected as the most active columns from the region as a whole.
numActivePerInhArea  10       The maximum number of active columns per inhibition area.
synPermActiveInc     0.1      The amount by which an active synapse is incremented in each round. Specified as a percent of a fully grown synapse.
synPermConnected     0.10     Controls the threshold at which synapses are connected.
synPermInactiveDec   0.01     The amount by which an inactive synapse is decremented in each round.
potentialRadius      16       This parameter determines the extent of the input that each column can potentially be connected to.

Table A.1: Configuration parameters for the spatial pooler.


Parameters for the scalar encoder

Alias       Symbol  Description
w           w       Number of bits to set in the output.
minval      v_min   The lower bound of the input value.
maxval      v_max   The upper bound of the input value.
n           n       Number of bits in the representation (n must be > w).
radius      r       Inputs separated by more than or equal to this distance will have non-overlapping representations.
resolution  ψ       Inputs separated by more than or equal to this distance will have different representations.

Table A.2: Configuration parameters for the scalar encoder.
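The behaviour of these parameters can be illustrated with a stripped-down encoder: a run of w active bits out of n, positioned according to where the value falls between minval and maxval. This is a simplified sketch, not NuPIC's actual ScalarEncoder (which also handles radius, resolution, and periodic values):

```python
def scalar_encode(value, minval, maxval, n, w):
    """Encode a scalar as n bits with w contiguous active bits; nearby
    values produce overlapping bit patterns."""
    assert n > w, "n must be greater than w"
    value = min(max(value, minval), maxval)      # clip to [minval, maxval]
    buckets = n - w + 1                          # number of possible positions
    i = int(round((value - minval) / (maxval - minval) * (buckets - 1)))
    bits = [0] * n
    bits[i:i + w] = [1] * w
    return bits

print(scalar_encode(5.0, minval=0.0, maxval=10.0, n=14, w=3))
```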


Parameters for the temporal memory

Alias                  Default  Description
activationThreshold    12       Activation threshold for segments.
cellsPerColumn         32       Number of cells per column.
columnCount            2048     The number of cell columns in a cortical region.
globalDecay            0.10     Decrements all synapses a little bit all the time.
initialPerm            0.11     Initial permanence value for a synapse.
inputWidth             -        Size of the input.
maxAge                 100000   Controls global decay. Global decay will only decay segments that have not been activated for maxAge iterations, and will only do the global decay loop every maxAge iterations.
maxSegmentsPerCell     -        The maximum number of segments a cell can have.
maxSynapsesPerSegment  -        The maximum number of synapses a segment can have.
minThreshold           8        The minimum required activity for a segment to learn.
newSynapseCount        15       The maximum number of synapses added to a segment during learning.
permanenceDec          0.10     How much permanence is removed from synapses when learning occurs.
permanenceInc          0.10     How much permanence is added to synapses when learning occurs.
temporalImp            cpp/py   Controls which temporal memory implementation to use.

Table A.3: Configuration parameters for the temporal memory.


Appendix B

Wind characteristics

Figure B.1: Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Figure B.2: Wind characteristics for WF 3-7, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Appendix C

Error Distribution


(Each figure below showed six histogram panels of forecast error, one per lead time of 1, 10, 20, 30, 40, and 48 hours; x-axis: error in [-1.0, 1.0], y-axis: frequency.)

Figure C.1: Error distribution for different lead times, WF 1 (using NuPIC).

Figure C.2: Error distribution for different lead times, WF 2 (using NuPIC).

Figure C.3: Error distribution for different lead times, WF 3 (using NuPIC).

Figure C.4: Error distribution for different lead times, WF 4 (using NuPIC).

Figure C.5: Error distribution for different lead times, WF 5 (using NuPIC).

Figure C.6: Error distribution for different lead times, WF 6 (using NuPIC).

Figure C.7: Error distribution for different lead times, WF 7 (using NuPIC).

Figure C.8: Error distribution for different lead times, WF 1 (using Expektra).

Figure C.9: Error distribution for different lead times, WF 2 (using Expektra).

Figure C.10: Error distribution for different lead times, WF 3 (using Expektra).

Figure C.11: Error distribution for different lead times, WF 4 (using Expektra).

Figure C.12: Error distribution for different lead times, WF 5 (using Expektra).

Figure C.13: Error distribution for different lead times, WF 6 (using Expektra).

Figure C.14: Error distribution for different lead times, WF 7 (using Expektra).

List of Figures

21 A figure that presents the general outline when forecasting using thestatistical approach 6

22 A figure that presents the general steps when forecasting using a physicalmodel 7

31 The perceptron 2032 Architectural graph of the neural network that will produce a single

output value It consists of a collection of hidden neurons in each Hhidden layers as well as M input connections Each edge seen in thisgraph have a wij associated with it 21

33 Information flow of a single region predictive model created with the OPF 2334 The CLAClassifier 2835 Training a OPF model 29

41 Left Diagram Cumulated probability of wind-speed Right diagramScatter diagram of the power curve production vs wind speed Seenfor Wind Farm 1 GEFCom dataset (hourly data from January 1 2010 toJanuary 1 2012) 31

42 Left Diagram Wind-Speed vs Production Right diagram Wind-Speedvs Direction Seen for Wind Farm 1 GEFCom dataset (hourly data fromJanuary 1 2010 to January 1 2012) 32

43 Left Diagram Wind-Speed vs Production Right diagram Wind-Speedvs Direction Seen for Wind Farm 2 GEFCom dataset (hourly data fromJanuary 1 2010 to January 1 2012) 32

44 Different error measurement for WF 1 3345 Different error measurement for WF 2 3446 Different error measurement for WF 3 3547 Different error measurement for WF 4 3648 Different error measurement for WF 5 37

76

49 Different error measurement for WF 6 38

410 Different error measurement for WF 7 39

411 Relative Input parameter importance using HIPR ldquoall-channelsrdquo reflectsthe reference point The reference point is the network when no channelhave been exposed to noise ws = wind speed hours week = times-tamps u v is a directional vector of the wind 41

412 Plot showing unoptimized version vs the optimized when training anetwork with 10 input neurons 10152025 hidden neurons 1 outputneuron Training was done for 100 epochs using LM 42

413 Summarized average improvement over all wind-farms with 95 confi-dence intervals 43

B1 Wind characteristics for WF 1 and WF The GEFCom dataset showingzonal and meridional wind components the intensity (alpha) of eachdot reflects how much normalized power was generated the strongerthe more power 59

B2 Wind characteristics for WF 3-7 and WF The GEFCom dataset showingzonal and meridional wind components the intensity (alpha) of eachdot reflects how much normalized power was generated the strongerthe more power 60

C1 Error distribution for different lead times WF 1 62

C2 Error distribution for different lead times WF 2 63

C3 Error distribution for different lead times WF 3 64

C4 Error distribution for different lead times WF 4 65

C5 Error distribution for different lead times WF 5 66

C6 Error distribution for different lead times WF 6 67

C7 Error distribution for different lead times WF 7 68

C8 Error distribution for different lead times WF 1 69

C9 Error distribution for different lead times WF 2 70

C10 Error distribution for different lead times WF 3 71

C11 Error distribution for different lead times WF 4 72

C12 Error distribution for different lead times WF 5 73

C13 Error distribution for different lead times WF 6 74

77

List of Tables

C14 Error distribution for different lead times WF 7 75

List of Tables

31 The first repetition of the first period is 8 January 2011 at 0100 to 10January 2011 at 0000 The second repetition of the first period is 15January 2011 at 0100 to 17 January 2011 at 0000 In between theseperiods missing data with power observations are available for updatingthe models 16

3.2 Preprocessed features from the dataset. The Forecast category represents the forecast given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the feature we will use in training and testing. 17

3.3 Example, where n = 14, r = 5, ψ = 1, of various encoded scalar values using a ScalarEncoder. 24

4.1 NRMSE scores of the entries published in [Hong et al., 2014]. The NuPIC model and Expektra model are added so we can easily compare the results. 40

A.1 Table containing configuration parameters for the encoder. 55
A.2 Table containing configuration parameters for the spatial pooler. 56
A.3 Table containing configuration parameters for the temporal memory. 57


• Contents
• Introduction
  • Problem Formulation
  • The scope of the problem
• Background
  • Neural Networks and Time Series Prediction
• Method and Materials
  • Preliminaries
    • Remarks
    • Definitions
    • Reference models
    • Error metrics
    • Model selection
    • Evaluation
  • Experiments
  • Holdback Input Randomization
  • Optimization methods
  • Neural Networks
    • Multilayer Perceptron
    • Numenta Platform for Intelligent Computing
• Result
  • Experimental results
  • Input Importance
    • Adaptation and Optimization
  • Summary
• Discussion
  • Method development issues
  • Future improvements and directions
  • Conclusions
• Bibliography
• Appendices
  • Hyper-parameters
  • Wind characteristics
  • Error Distribution
• List of Figures
• List of Tables

3.5 NEURAL NETWORKS

state, then 2) activate that particular cell. If no cell was found in a predictive state, then 3) activate all cells in that particular column, a process called bursting.

Bursting is analogous to: we do not know which cell to activate, because we have not seen this instance of the sequence of patterns before and are unable to put the column into the correct temporal context, so we activate all cells to reflect this uncertainty. Bursting does not occur when the temporal context has been predicted by the CLA. Equation 3.20 shows the first phase.

\[
\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \cdots & b_n \end{bmatrix}}_{\text{SP SDR}}
\circ
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \cdots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \cdots & b_{2n} \\
\vdots & & & & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \cdots & b_{dn}
\end{bmatrix}}_{\text{Predictive State}}
=
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \cdots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \cdots & b_{2n} \\
\vdots & & & & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \cdots & b_{dn}
\end{bmatrix}}_{\text{Active State}}
\tag{3.20}
\]

The second phase of the algorithm is there to figure out which cells should be turned into a predictive state for the next time-step. This is achieved by checking every distal segment on every cell for activity above a certain threshold, i.e. checking for active segments. One active segment is enough to put a cell into a predictive state. The output from the temporal pooler is a vector representing the state of all cells in that region. Equation 3.21 shows phase 2.

\[
\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \cdots & b_n \end{bmatrix}}_{\text{Active State}}
\cdot
\underbrace{\begin{bmatrix}
b_{11} & b_{12} & b_{13} & \cdots & b_{1n} \\
b_{21} & b_{22} & b_{23} & \cdots & b_{2n} \\
\vdots & & & & \vdots \\
b_{d1} & b_{d2} & b_{d3} & \cdots & b_{dn}
\end{bmatrix}_X}_{\text{Segment } X}
=
\underbrace{\begin{bmatrix} b_1 & b_2 & b_3 & \cdots & b_n \end{bmatrix}_X}_{\text{Segment Activation } X}
> \tau
\;\rightarrow\;
\underbrace{\begin{bmatrix} s_1 & s_2 & s_3 & \cdots & s_n \end{bmatrix}}_{\text{Predictive State}}
\tag{3.21}
\]

If learning is turned on, update the permanences of the synapses connected to active distal segments (in the same way as in the spatial pooler). These changes are marked as temporary until we are sure that a cell is in fact able to correctly predict something with these new changes; once this is known, the changes either become permanent or are removed. Temporarily marked changes are committed whenever a cell goes from being inactive to active from a feed-forward input (the permanence update is kept, as we correctly predicted the feed-forward activation). If a cell instead went from active to inactive, the change is undone.
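To make the two phases concrete, here is a toy sketch in Python (an illustration only, not NuPIC's implementation; the segment representation, the function names, and the 6-column, 4-cell region size are simplifying assumptions):

```python
import numpy as np

n_columns, n_cells = 6, 4   # toy region: 6 columns, 4 cells per column
threshold = 2               # segment activation threshold (tau)

def phase1(active_columns, predictive_state):
    """Phase 1: pick active cells inside the winning SP columns.
    active_columns: (n_columns,) binary SP SDR.
    predictive_state: (n_cells, n_columns) binary matrix from the previous step."""
    active_state = np.zeros_like(predictive_state)
    for col in np.flatnonzero(active_columns):
        predicted = np.flatnonzero(predictive_state[:, col])
        if predicted.size:                    # a cell expected this input:
            active_state[predicted, col] = 1  # activate only that cell
        else:                                 # unexpected input: burst,
            active_state[:, col] = 1          # activating every cell in the column
    return active_state

def phase2(active_state, segments):
    """Phase 2: a cell becomes predictive if any of its distal segments has
    enough synapses onto currently active cells (one active segment suffices).
    segments[c][col] is a list of segments; each segment is a list of
    (cell, column) synapses."""
    predictive_state = np.zeros_like(active_state)
    for c in range(n_cells):
        for col in range(n_columns):
            for synapses in segments[c][col]:
                overlap = sum(active_state[cc, k] for cc, k in synapses)
                if overlap >= threshold:
                    predictive_state[c, col] = 1
                    break
    return predictive_state
```

Feeding an SDR with an unpredicted column through `phase1` activates all of that column's cells (bursting), while a predicted column activates only the predicted cell.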


CHAPTER 3 METHOD AND MATERIALS

NuPIC Classifiers

There are many different classifiers included with NuPIC, such as a k-Nearest Neighbour (kNN) and a Support Vector Machine (SVM); the OPF in particular uses a custom-built classifier called the "CLAClassifier". The purpose of this classifier is to decode predictions made by the CLA. Classification with the CLAClassifier is performed in different ways depending on how the input has been encoded.

Scalar values are decoded using a process where each cell in a CLA region is paired with two histograms. One of the histograms keeps track of the frequency of encountered patterns associated with each cell; the other keeps track of a moving average for each bucket. This process is illustrated in figure 3.4.
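This decoding scheme can be sketched as follows (a simplified stand-in, not the actual CLAClassifier; the class name, the `alpha` smoothing factor, and the single-step interface are assumptions):

```python
from collections import defaultdict

class ToyCLAClassifier:
    """Pairs every cell with (a) a frequency count per bucket, recording how
    often the cell was active when that bucket occurred, and (b) a moving
    average of the actual values seen per bucket."""
    def __init__(self, alpha=0.1):
        self.alpha = alpha
        self.freq = defaultdict(lambda: defaultdict(float))  # cell -> bucket -> count
        self.avg = {}                                        # bucket -> moving average

    def learn(self, active_cells, bucket, actual_value):
        for cell in active_cells:
            self.freq[cell][bucket] += 1.0
        prev = self.avg.get(bucket, actual_value)
        self.avg[bucket] = (1 - self.alpha) * prev + self.alpha * actual_value

    def infer(self, active_cells):
        votes = defaultdict(float)
        for cell in active_cells:
            for bucket, count in self.freq[cell].items():
                votes[bucket] += count
        if not votes:
            return None
        best = max(votes, key=votes.get)
        return self.avg[best]   # decode the winning bucket back to a value
```

Training associates active cells with the bucket of the observed value; inference lets the active cells vote for a bucket and returns that bucket's moving average.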

Figure 3.4: The CLAClassifier. Each cell of the SDR (columns 1 to N) is paired with two histograms, one tracking likelihood and one a moving average, spanning minvalue to maxvalue.

Training in NuPIC

Training the NuPIC model is done using online learning algorithms. The schema seen in figure 3.5 is used to train and test these models. We have 7 different wind farms, so this schema is repeated for each wind farm.

This schema is slightly adjusted from the traditional way the OPF handles multi-step predictions. The default way of doing this is to use 48 different models, one for every step ahead in the CLAClassifier, which would give us 48 · 7 = 336 models in total (for all the wind farms). Not only does this change match Expektra's approach, but it also reduces the memory requirements for the CLAClassifier.
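The per-farm schema amounts to the following loop (a sketch; `DummyModel` and its method names stand in for the real OPF model interface, which is not reproduced here):

```python
class DummyModel:
    """Stand-in for an OPF model; the method names (swarm, configure, run, ...)
    are assumptions, not NuPIC's actual OPF API."""
    def __init__(self):
        self.learning = False
        self.records_learned = 0

    def swarm(self, chunk):
        return {"chunk_size": len(chunk)}  # pretend PSO found hyperparameters

    def configure(self, params):
        self.params = params

    def enable_learning(self):
        self.learning = True

    def disable_learning(self):
        self.learning = False

    def run(self, record):
        if self.learning:
            self.records_learned += 1      # online update would happen here
        return record                      # echo back as the "prediction"

def run_schema(model, pretrain_chunk, train_stream, test_stream):
    """The per-farm schema of figure 3.5: hyperparameter setup on a
    pre-training chunk, online learning over the training stream, then
    testing with learning deactivated."""
    model.configure(model.swarm(pretrain_chunk))  # PSO swarming or manual setup
    model.enable_learning()                       # training phase
    train_pred = [model.run(r) for r in train_stream]
    model.disable_learning()                      # testing phase: model frozen
    test_pred = [model.run(r) for r in test_stream]
    return train_pred, test_pred
```

The key design point is the learning toggle: predictions are produced continuously, but permanence updates only happen while the training stream is flowing.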

Figure 3.5: Training an OPF model. A pre-training data chunk drives the hyperparameter setup (PSO swarming or manual setup); the training phase then runs the OPF model over the training data stream with online learning activated, and the testing phase runs over the test data with online learning deactivated, producing multistep predictions.

Input and hyperparameter selection

Finding hyperparameters for an OPF model was partly done using the custom built-in PSO algorithm and partly configured manually, mainly by ensuring that the PSO algorithm did not remove encoders. A single CLA region contains a lot of hyperparameters; these parameters have been included, with corresponding descriptions, in appendix A. Inputs to the model are date, ws, wp, u, v.


Chapter 4

Result

This chapter contains the primary results obtained in this study. Figures 4.4–4.10 contain different error measurements on the test set for each wind farm. We see that Expektra's ANN model is able to perform well over all wind farms, while the NuPIC model performs worse than Expektra but in general still better than our reference model. NuPIC is off target with a bias error on most wind farms. The graphs also indicate that NuPIC is unable to pick up some trends, seen in the cumulated ε² graph. The Expektra model, on the other hand, shows no clear problems in cumulated ε² and is on target on all wind farms; appendix C has been included to reflect this for different lead times.
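The error measures plotted in these figures follow the standard normalized metrics for wind power forecast evaluation [Madsen et al., 2005]; a minimal sketch, assuming the GEFCom power values are already normalized so the capacity divisor is 1:

```python
import numpy as np

def lead_time_errors(y_true, y_pred, capacity=1.0):
    """NBIAS, NMAE and NRMSE for one lead time, with errors normalized by the
    installed capacity (GEFCom power is already normalized, so capacity=1).
    Also returns the running cumulated eps^2 curve shown in the figures."""
    eps = (np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)) / capacity
    nbias = eps.mean()                  # signed mean error: off-target models show bias
    nmae = np.abs(eps).mean()           # normalized mean absolute error
    nrmse = np.sqrt((eps ** 2).mean())  # normalized root mean square error
    cumulated = np.cumsum(eps ** 2)     # cumulated eps^2 over time
    return nbias, nmae, nrmse, cumulated
```

A model with symmetric errors has NBIAS near zero even when NMAE and NRMSE are large, which is exactly the distinction the NBIAS panels draw between Expektra and NuPIC.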

Figure 4.1: Left diagram: cumulated probability of wind speed (m/s). Right diagram: scatter diagram of the power curve, normalized power output vs. wind speed, colored by wind direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


Figure 4.2: Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).

Figure 4.3: Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


Figure 4.4: Different error measurements for WF 1 (NBIAS, NRMSE, cumulated ε², and NMAE versus look-ahead time k, in hours, for Expektra, NuPIC, and Persistence).


Figure 4.5: Different error measurements for WF 2 (NBIAS, NRMSE, cumulated ε², and NMAE versus look-ahead time k, in hours, for Expektra, NuPIC, and Persistence).


Figure 4.6: Different error measurements for WF 3 (NBIAS, NRMSE, cumulated ε², and NMAE versus look-ahead time k, in hours, for Expektra, NuPIC, and Persistence).


Figure 4.7: Different error measurements for WF 4 (NBIAS, NRMSE, cumulated ε², and NMAE versus look-ahead time k, in hours, for Expektra, NuPIC, and Persistence).


Figure 4.8: Different error measurements for WF 5 (NBIAS, NRMSE, cumulated ε², and NMAE versus look-ahead time k, in hours, for Expektra, NuPIC, and Persistence).


Figure 4.9: Different error measurements for WF 6 (NBIAS, NRMSE, cumulated ε², and NMAE versus look-ahead time k, in hours, for Expektra, NuPIC, and Persistence).


Figure 4.10: Different error measurements for WF 7 (NBIAS, NRMSE, cumulated ε², and NMAE versus look-ahead time k, in hours, for Expektra, NuPIC, and Persistence).


                                     Wind Farm
User           1      2      3      4      5      6      7      All
Leustagos      0.145  0.138  0.168  0.144  0.158  0.133  0.140  0.146
DuckTile       0.143  0.145  0.172  0.145  0.165  0.137  0.146  0.148
MZ             0.141  0.151  0.174  0.145  0.167  0.141  0.145  0.149
Propeller      0.144  0.153  0.177  0.147  0.175  0.141  0.147  0.152
Duehee Lee     0.157  0.144  0.176  0.160  0.169  0.154  0.148  0.155
Expektra       0.165  0.158  0.184  0.164  0.179  0.153  0.153  0.165
MTU EE5260     0.161  0.172  0.193  0.162  0.192  0.156  0.160  0.168
SunWind        0.174  0.177  0.193  0.176  0.179  0.157  0.162  0.172
ymzsmsd        0.163  0.186  0.200  0.164  0.192  0.162  0.167  0.174
4138 Kalchas   0.180  0.179  0.197  0.175  0.200  0.160  0.165  0.177
NuPIC          0.243  0.254  0.264  0.310  0.290  0.224  0.240  0.264
Persistence    0.302  0.338  0.373  0.364  0.388  0.341  0.361  0.355

Table 4.1: NRMSE scores of the entries published in [Hong et al., 2014]. The NuPIC model and Expektra model are added so we can easily compare the results.

4.1 Experimental results

Looking at the primary results presented in this study, we see in table 4.1 that Expektra's ANN model is able to achieve performance comparable to the other methods published with Hong et al. [2014]. This can be used as a starting point for further investigations. A similar approach to Expektra's is Duehee Lee's [Lee and Baldick, 2014], as both methods are based on the same neural network architecture but with different optimization techniques. Duehee Lee's network is structured slightly differently from Expektra's model, which consists of only one output node, whereas Duehee Lee uses five (representing five predictions ahead). Besides these differences, Duehee Lee has also included additional sub-models to create a prediction ensemble, blending the output of the ANN with a Gaussian Process (GP) to improve very short-term predictions. MTU EE5260 is also a neural network approach that converts meteorological forecasts to power, and here the score is very close. The HTM CLA beats the persistence model but needs more work to outperform the other models in GEFCom.


4.2 Input Importance

For interpretation purposes, in order to understand the model better, an analysis of the relative input parameter importance was performed. This analysis is illustrated in figure 4.11. Each box represents added noise to one channel and will result in a higher NRMSE score if that feature was important. A reference point, "all-channels", represents the error distribution of the model with no input replacement. We clearly see in this figure that the wind speed channel ws is the most important attribute; adding noise to this channel greatly affects the NRMSE score. We also see that the wind components u, v show little to no influence, while the timestamp-related inputs hours and week both indicate that there are seasonal and daily trends present in the dataset.
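The Holdback Input Randomization procedure [Kemp et al., 2007] behind this analysis can be sketched as follows (a simplified version; the function signature and the choice of uniform noise over each channel's observed range are assumptions):

```python
import numpy as np

def hipr(predict, X, y, n_repeats=30, seed=0):
    """Holdback Input Randomization: replace one input channel at a time with
    noise drawn uniformly over that channel's observed range, and measure how
    the NRMSE degrades relative to the untouched 'all-channels' reference."""
    rng = np.random.default_rng(seed)

    def nrmse(y_hat):
        return np.sqrt(np.mean((y - y_hat) ** 2))

    reference = nrmse(predict(X))        # 'all-channels': no input replaced
    scores = {}
    for ch in range(X.shape[1]):
        lo, hi = X[:, ch].min(), X[:, ch].max()
        runs = []
        for _ in range(n_repeats):
            Xn = X.copy()
            Xn[:, ch] = rng.uniform(lo, hi, size=len(X))  # noise in one channel
            runs.append(nrmse(predict(Xn)))
        scores[ch] = float(np.mean(runs))
    return reference, scores   # important channels score well above the reference
```

A channel the model ignores leaves the NRMSE at the reference level, while a channel the model relies on (such as ws here) pushes it up sharply.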

Figure 4.11: Relative input parameter importance using HIPR. Box plots of the NRMSE when noise is added to each channel in turn (all-channels, hours, u, v, week, ws, ws−1, ws−2, ws−3, ws+1, ws+2, ws+3). 'all-channels' reflects the reference point: the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind.


4.2.1 Adaptation and Optimization

All training for Expektra's model was performed on a MacBook Pro with an Intel Core 2 Duo processor (P8600, 2.40 GHz) and a 64-bit operating system with 4 GB of installed RAM. After adapting the code to be able to run experiments using the GEFCom dataset, an investigation was performed to evaluate the performance of the core source code given by Expektra. This investigation identified some slow parts of the code; after fixing these issues, mainly by using a Math.NET native library instead of the managed provider, a speed test was performed comparing the unoptimized version with the optimized one. Figure 4.12 shows the speed-up achieved during training with the LM algorithm.

Figure 4.12: Plot showing the unoptimized ('Normal') version vs. the optimized one when training a network with 10 input neurons; 10, 15, 20, 25 hidden neurons; 1 output neuron. Training was done for 100 epochs using LM.

4.3 Summary

A plot showing the average NRMSE improvement over the persistence model can be seen in figure 4.13. We see that both models are able to perform better than persistence; Expektra's model performs better than the NuPIC model over the whole forecast horizon, and NuPIC performs better than persistence towards the end of the forecast.
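The improvement measure can be sketched as follows (a simplified reconstruction; persistence predicts the last known power value for every lead time, and `forecasts[k]` is assumed to hold the model's lead-k output aligned with the series):

```python
import numpy as np

def improvement_over_persistence(power, forecasts, horizon=48):
    """Percentage NRMSE improvement of a forecast over persistence, per lead
    time k. Persistence simply predicts p(t+k) = p(t)."""
    power = np.asarray(power, dtype=float)
    improvements = []
    for k in range(1, horizon + 1):
        actual = power[k:]
        persist = power[:-k]                  # last known value carried forward
        model = np.asarray(forecasts[k][: len(actual)], dtype=float)
        e_p = np.sqrt(np.mean((actual - persist) ** 2))
        e_m = np.sqrt(np.mean((actual - model) ** 2))
        improvements.append(100.0 * (e_p - e_m) / e_p)
    return improvements
```

A perfect forecast scores 100% at every lead time; a forecast no better than persistence scores 0%, which is roughly where NuPIC starts before pulling ahead at longer horizons.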

Figure 4.13: Summarized average improvement (%) in NRMSE over persistence, per look-ahead time (in hours), averaged over all wind farms with 95% confidence intervals, for Expektra and NuPIC.


Chapter 5

Discussion

With the exponential growth of computer power it becomes easier to study deeper and more complex networks, and with recent advances in the area of deep learning, a new interest in ANNs has resurged. In practice, ANNs have been used by most groups in the field of WPF, but these networks never caught on, as it was argued that the improvements made by ANNs were usually not enough to justify the extra effort of training them [Giebel et al., 2011]. This is steadily becoming less of an issue as computation becomes cheaper and bigger networks perform better.

By using a simple MLP network we observe that we are able to obtain results similar in performance to the other models published in Hong et al. [2014]. This can be used as a starting point for further investigations. The GEFCom dataset has a limited number of input features; the power of neural networks could be studied more in depth if we had access to a wider range of them, but given the small number of features in the dataset, this is harder to do.

Using persistence as a reference model was done in order to have the same baseline as the one used in GEFCom, but a better reference model should be considered in the future, such as the one presented in Nielsen et al. [1998], especially if longer forecasts are to be considered.

5.1 Method development issues

Working with Expektra's model is very straightforward, and no major issues were encountered during development, but some issues were encountered working with NuPIC.¹

¹ It should also be pointed out that it helps to have the people who developed the code on the spot, and this is probably the main reason for the little trouble.


1. Installing NuPIC is difficult. This seems to be a general problem, judging by posts in the NuPIC mailing list.² A very important issue to fix if they want more people to work with this model.

2. Using the built-in swarming (PSO) did not work well, especially for slightly larger datasets and for 48 prediction steps. The main problems were fitting everything into memory and that the code ran too slowly when swarming over multiple steps ahead, making it practically impossible to use. Scaling down the problem to just a few prediction steps and multiple models helped, but the swarm model kept dismissing wind speed as an important feature, which was fixed by explicitly telling it not to touch that encoder.

3. Inconsistencies between the white-paper documentation of HTM/CLA and the actual implementation make it hard to understand the underlying principles of NuPIC. NuPIC presents a very complex network, and any inconsistency in the documentation makes it hard to understand. The code is quite well documented, but the additional material is a bit sparse.

These issues are all understandable and are continuously being improved upon. NuPIC is still a young platform, so finding issues is expected, given the complexity and research nature of the project. Training the CLAClassifier to use many prediction steps instead of just one will most likely reduce the bias error seen in the results. To investigate this, a more powerful computer is needed.

There are some differences in the inputs between the models. This setup was created because an OPF model needs to encode the value it is going to predict. SCADA data is available, so this is not an issue, but it may introduce some unfairness between NuPIC and Expektra. In the current setup NuPIC does not seem to gain anything extra from the additional information, and other models in the competition most likely used different inputs as well.

In general, NuPIC differs in the way you feed data: you send in only the front of the signal, and temporal context is achieved through the temporal memory.

5.2 Future improvements and directions

One issue with the GEFCom dataset is that it does not contain any specific information about the location of these 7 wind farms. It is worth considering taking a look at how to handle different models that are specialized for different types

² http://numenta.org/lists


of terrain, as this has been shown to increase performance [Kariniotakis et al., 2006]. Another thing to investigate could be a more advanced schema for hyperparameter selection: Jursa and Rohrig [2008] show that the WPF error can be reduced by smarter use of optimization algorithms for feature and hyperparameter selection, which should improve performance.

In general, having more types of inputs from sources like SCADA and NWP systems could give a lot of valuable information for better predictions. It is clear that wind speed is a key feature for producing good forecasts, which is also supported by the input analysis in section 4.2. It has been argued that the error in WPF models stems largely from wrong weather forecasts [Giebel et al., 2011], so identifying good sources of weather forecasts would be desirable for more effective model learning. Combining several different weather forecasts could also be investigated, as Nielsen et al. [2007] showed that power forecasts based on a number of different meteorological forecasts were better than those from a single source.

Regarding HTM CLA, it is worth considering implementing a custom encoder, one specifically targeted at data concerning wind farms. SCADA data could also possibly be streamed directly into a NuPIC model, which would allow for some interesting online predictions and anomaly detection.

Training times are always an issue, especially for very large models. Both NuPIC and Expektra's model would benefit from being implemented in a more parallel fashion, as these algorithms are very well suited for parallel computing. The speed-up would be of great value, so looking into and implementing a GPU version of the algorithms could be worthwhile.

5.3 Conclusions

The field of energy forecasting is still young, and the necessity for good forecasting techniques is constantly growing. This initial study shows that part of Expektra's forecasting method can be used to forecast wind power generation. We observe that this ANN is able to achieve performance comparable to other methods published in Hong et al. [2014]. Additional work is still needed to obtain state-of-the-art results. Numenta's model is able to beat the reference model but still needs additional work before we can draw definite conclusions about its performance. Constructing a custom encoder could be a first step in that direction.


Bibliography

Subutai Ahmad and Jeff Hawkins. Properties of sparse distributed representations and their application to hierarchical temporal memory. arXiv preprint arXiv:1503.07469, 2015.

T.C. Akinci. Short term wind speed forecasting with ANN in Batman, Turkey. Elektronika ir Elektrotechnika, 107(1):41–45, 2015.

E. Michael Azoff. Neural network time series forecasting of financial markets. John Wiley & Sons, Inc., 1994.

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13(1):281–305, 2012.

Daniel P. Buxhoeveden and Manuel F. Casanova. The minicolumn hypothesis in neuroscience. Brain, 125(5):935–951, 2002.

Erasmo Cadenas and Wilfrido Rivera. Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks. Renewable Energy, 34(1):274–278, 2009.

Dmitri B. Chklovskii, B.W. Mel, and K. Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782–788, 2004.

A. Costa, A. Crespo, J. Navarro, G. Lizcano, H. Madsen, and E. Feitosa. A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews, 12(6):1725–1744, August 2008. ISSN 1364-0321. doi: 10.1016/j.rser.2007.01.015.

Russ C. Eberhart and James Kennedy. A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science, volume 1, pages 39–43, New York, NY, 1995.


Shu Fan, James R. Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. Energy Conversion, IEEE Transactions on, 24(2):474–482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento – a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276. Citeseer, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition EWEC 2008, 6 pages. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving hierarchical temporal memory-based trading models. Springer, 2013.

M.A. Gaertner, C. Gallardo, C. Tejeda, N. Martínez, S. Calabria, N. Martínez, and B. Fernández. The Casandra project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: The next generation prediction system. In Proc. of the 2001 European Wind Energy Conference, EWEC'01, Copenhagen, Denmark, pages 777–780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: A literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G. Lobo. Sipreólico – wind power prediction tool for the Spanish peninsular power system. Proceedings of the CIGRÉ 40th General Session and Exhibition, París, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357–363, 2014.


Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41–46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585–592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694–709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference, EWEC'07, 2007.

G. Kariniotakis. Position paper on Joule project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J. Halliday, R. Brownsword, Ignacio Marti, Ana Maria Palomares, I. Cruz, H. Madsen, T.S. Nielsen, Henrik Aa. Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power – overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, 2006.

G.N. Kariniotakis, G.S. Stavrakakis, and E.F. Nogaret. Wind power forecasting using advanced neural networks models. Energy Conversion, IEEE Transactions on, 11(4):762–767, 1996.

Stanley J. Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326–334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275–293, 2009.

Lars Landberg and Simon J. Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171–195, 1994.

S. Lang, C. Mohrlen, J. Jorgensen, B. Ó Gallachóir, and E. McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International Solar Energy Society UK Section Conference, volume 85, page 89, 2006.


S.M. Lawan, W.A.W.Z. Abidin, W.Y. Chai, A. Baharun, and T. Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760–1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. Smart Grid, IEEE Transactions on, 5(1):501–510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets, 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164–168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313–2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154–3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa. Nielsen, and Torben S. Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475–489, 2005.

Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431–441, 1963.

Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons. 1969.

Marvin Lee Minsky and Oliver G. Selfridge. Learning in random nets. MIT Lincoln Laboratory, 1960.

Jorge J. Moré. The Levenberg–Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105–116. Springer, 1978.

Henrik Aa. Nielsen, Torben S. Nielsen, Henrik Madsen, Maria J. Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471–482, 2007.


Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29–34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power, 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159–167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms, v0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33–57, 2007.

A. Rodrigues, J.A. Peças Lopes, P. Miranda, L. Palma, C. Monteiro, R. Bessa, J. Sousa, C. Rodrigues, and J. Matos. EPrev – a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference EWEC, volume 7, 2007.

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5:3, 1988.

Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85–117, 2015.

S. Sinkevicius, R. Simutis, and V. Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91–96, 2011.

Ke-Sheng Wang, Vishal S. Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61–69, 2014.

WWEA. 2014 half-year report, pp. 1–8. Technical report, WWEA, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365–376, 2013.

Hao Yu and Bogdan M. Wilamowski. Levenberg–Marquardt training. Industrial Electronics Handbook, 5:12–1, 2011.


Appendix A

Hyper-parameters

Parameters for the spatial pooler

Alias                 Default  Description
columnCount           -        The number of cell columns in a cortical region.
globalInhibition      false    If true, in the inhibition phase the winning columns are selected as the most active columns from the region as a whole.
numActivePerInhArea   10       The maximum number of active columns per inhibition area.
synPermActiveInc      0.1      The amount by which an active synapse is incremented in each round, specified as a percent of a fully grown synapse.
synPermConnected      0.10     Controls the threshold at which synapses are connected.
synPermInactiveDec    0.01     The amount by which an inactive synapse is decremented in each round.
potentialRadius       16       This parameter determines the extent of the input that each column can potentially be connected to.

Table A1: Table containing configuration parameters for the spatial pooler.


Parameters for the scalar encoder

Alias       Symbol  Description
w           w       Number of bits to set in the output.
minval      vmin    The lower bound of the input value.
maxval      vmax    The upper bound of the input value.
n           n       Number of bits in the representation (n must be > w).
radius      r       Inputs separated by more than or equal to this distance will have non-overlapping representations.
resolution  ψ       Inputs separated by more than or equal to this distance will have different representations.

Table A2: Table containing configuration parameters for the scalar encoder.
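To make the parameters above concrete, here is a minimal sketch of how a scalar encoder of this kind can work (an illustrative toy, not NuPIC's actual ScalarEncoder; the choice of w = 3, n = 14 and the value range are assumptions): the value is clipped to [minval, maxval] and a contiguous run of w bits out of n is set, positioned according to where the value falls in the range.

```python
def encode_scalar(value, w=3, n=14, minval=0.0, maxval=10.0):
    """Encode a scalar as n bits with a contiguous run of w active bits."""
    value = max(minval, min(maxval, value))   # clip to [minval, maxval]
    buckets = n - w + 1                       # distinct positions for the block
    i = int(round((value - minval) / (maxval - minval) * (buckets - 1)))
    return [1 if i <= j < i + w else 0 for j in range(n)]

print(encode_scalar(0.0))   # block of ones at the left edge
print(encode_scalar(10.0))  # block of ones at the right edge
```

Nearby values then share active bits, which is what gives overlapping representations for inputs closer than the radius.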


Parameters for the temporal memory

Alias                  Default  Description
activationThreshold    12       Activation threshold for segments.
cellsPerColumn         32       Number of cells per column.
columnCount            2048     The number of cell columns in a cortical region.
globalDecay            0.10     Decrements all synapses a little bit all the time.
initialPerm            0.11     Initial permanence value for a synapse.
inputWidth             -        Size of the input.
maxAge                 100000   Controls global decay. Global decay will only decay segments that have not been activated for maxAge iterations, and will only do the global decay loop every maxAge iterations.
maxSegmentsPerCell     -        The maximum number of segments a cell can have.
maxSynapsesPerSegment  -        The maximum number of synapses a segment can have.
minThreshold           8        The minimum required activity for a segment to learn.
newSynapseCount        15       The maximum number of synapses added to a segment during learning.
permanenceDec          0.10     How much permanence is removed from synapses when learning occurs.
permanenceInc          0.10     How much permanence is added to synapses when learning occurs.
temporalImp            cpp/py   Controls which temporal memory implementation to use.

Table A3: Table containing configuration parameters for the temporal memory.
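The permanence-related parameters above control a Hebbian-style update rule. A minimal sketch of that rule (illustrative only, not NuPIC's implementation; the rounding is just to keep printed values tidy): active synapses are incremented, inactive ones decremented, and a synapse counts as connected once its permanence reaches the connection threshold.

```python
def update_permanences(perms, active, inc=0.10, dec=0.01, connected=0.10):
    """One learning step: reinforce active synapses, weaken inactive ones.

    perms  : permanence values in [0, 1]
    active : booleans, True if the presynaptic cell was active
    Returns new permanences and the indices of connected synapses.
    """
    new = [round(min(1.0, p + inc), 3) if a else round(max(0.0, p - dec), 3)
           for p, a in zip(perms, active)]
    return new, [i for i, p in enumerate(new) if p >= connected]

perms, conn = update_permanences([0.05, 0.09, 0.50], [True, False, True])
print(perms)  # → [0.15, 0.08, 0.6]
print(conn)   # → [0, 2]
```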


Appendix B

Wind characteristics

Figure B1: Wind characteristics for WF 1 and WF 2. The GEFCom dataset showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated (the stronger, the more power).


Figure B2: Wind characteristics for WF 3-7. The GEFCom dataset showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated (the stronger, the more power).


Appendix C

Error Distribution


APPENDIX C ERROR DISTRIBUTION

[Six histogram panels showing the error distribution for WF 1 using NuPIC, one panel per lead time (48, 40, 30, 20, 10, and 1 hour); x-axis: error from -1.0 to 1.0; y-axis: frequency from 0 to 70.]

Figure C1: Error distribution for different lead times, WF 1.

[Six histogram panels showing the error distribution for WF 2 using NuPIC, one panel per lead time (48, 40, 30, 20, 10, and 1 hour); x-axis: error from -1.0 to 1.0; y-axis: frequency from 0 to 70.]

Figure C2: Error distribution for different lead times, WF 2.

[Six histogram panels showing the error distribution for WF 3 using NuPIC, one panel per lead time (48, 40, 30, 20, 10, and 1 hour); x-axis: error from -1.0 to 1.0; y-axis: frequency from 0 to 70.]

Figure C3: Error distribution for different lead times, WF 3.

[Six histogram panels showing the error distribution for WF 4 using NuPIC, one panel per lead time (48, 40, 30, 20, 10, and 1 hour); x-axis: error from -1.0 to 1.0; y-axis: frequency from 0 to 70.]

Figure C4: Error distribution for different lead times, WF 4.

[Six histogram panels showing the error distribution for WF 5 using NuPIC, one panel per lead time (48, 40, 30, 20, 10, and 1 hour); x-axis: error from -1.0 to 1.0; y-axis: frequency from 0 to 70.]

Figure C5: Error distribution for different lead times, WF 5.

[Six histogram panels showing the error distribution for WF 6 using NuPIC, one panel per lead time (48, 40, 30, 20, 10, and 1 hour); x-axis: error from -1.0 to 1.0; y-axis: frequency from 0 to 70.]

Figure C6: Error distribution for different lead times, WF 6.

[Six histogram panels showing the error distribution for WF 7 using NuPIC, one panel per lead time (48, 40, 30, 20, 10, and 1 hour); x-axis: error from -1.0 to 1.0; y-axis: frequency from 0 to 70.]

Figure C7: Error distribution for different lead times, WF 7.

[Six histogram panels showing the error distribution for WF 1 using Expektra, one panel per lead time (48, 40, 30, 20, 10, and 1 hour); x-axis: error from -1.0 to 1.0; y-axis: frequency from 0 to 70.]

Figure C8: Error distribution for different lead times, WF 1.

[Six histogram panels showing the error distribution for WF 2 using Expektra, one panel per lead time (48, 40, 30, 20, 10, and 1 hour); x-axis: error from -1.0 to 1.0; y-axis: frequency from 0 to 70.]

Figure C9: Error distribution for different lead times, WF 2.

[Six histogram panels showing the error distribution for WF 3 using Expektra, one panel per lead time (48, 40, 30, 20, 10, and 1 hour); x-axis: error from -1.0 to 1.0; y-axis: frequency from 0 to 70.]

Figure C10: Error distribution for different lead times, WF 3.

[Six histogram panels showing the error distribution for WF 4 using Expektra, one panel per lead time (48, 40, 30, 20, 10, and 1 hour); x-axis: error from -1.0 to 1.0; y-axis: frequency from 0 to 70.]

Figure C11: Error distribution for different lead times, WF 4.

[Six histogram panels showing the error distribution for WF 5 using Expektra, one panel per lead time (48, 40, 30, 20, 10, and 1 hour); x-axis: error from -1.0 to 1.0; y-axis: frequency from 0 to 70.]

Figure C12: Error distribution for different lead times, WF 5.

[Six histogram panels showing the error distribution for WF 6 using Expektra, one panel per lead time (48, 40, 30, 20, 10, and 1 hour); x-axis: error from -1.0 to 1.0; y-axis: frequency from 0 to 70.]

Figure C13: Error distribution for different lead times, WF 6.

[Six histogram panels showing the error distribution for WF 7 using Expektra, one panel per lead time (48, 40, 30, 20, 10, and 1 hour); x-axis: error from -1.0 to 1.0; y-axis: frequency from 0 to 70.]

Figure C14: Error distribution for different lead times, WF 7.

List of Figures

2.1 A figure that presents the general outline when forecasting using the statistical approach 6

2.2 A figure that presents the general steps when forecasting using a physical model 7

3.1 The perceptron 20

3.2 Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of H hidden layers as well as M input connections. Each edge seen in this graph has a weight wij associated with it 21

3.3 Information flow of a single-region predictive model created with the OPF 23

3.4 The CLAClassifier 28

3.5 Training an OPF model 29

4.1 Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs. wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) 31

4.2 Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) 32

4.3 Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) 32

4.4 Different error measurements for WF 1 33
4.5 Different error measurements for WF 2 34
4.6 Different error measurements for WF 3 35
4.7 Different error measurements for WF 4 36
4.8 Different error measurements for WF 5 37


4.9 Different error measurements for WF 6 38

4.10 Different error measurements for WF 7 39

4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point: the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind 41

4.12 Plot showing the unoptimized version vs. the optimized one when training a network with 10 input neurons, 10/15/20/25 hidden neurons, and 1 output neuron. Training was done for 100 epochs using LM 42

4.13 Summarized average improvement over all wind farms with 95% confidence intervals 43

B1 Wind characteristics for WF 1 and WF 2. The GEFCom dataset showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated (the stronger, the more power) 59

B2 Wind characteristics for WF 3-7. The GEFCom dataset showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated (the stronger, the more power) 60

C1 Error distribution for different lead times WF 1 62

C2 Error distribution for different lead times WF 2 63

C3 Error distribution for different lead times WF 3 64

C4 Error distribution for different lead times WF 4 65

C5 Error distribution for different lead times WF 5 66

C6 Error distribution for different lead times WF 6 67

C7 Error distribution for different lead times WF 7 68

C8 Error distribution for different lead times WF 1 69

C9 Error distribution for different lead times WF 2 70

C10 Error distribution for different lead times WF 3 71

C11 Error distribution for different lead times WF 4 72

C12 Error distribution for different lead times WF 5 73

C13 Error distribution for different lead times WF 6 74


C14 Error distribution for different lead times WF 7 75

List of Tables

3.1 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, data with power observations are available for updating the models 16

3.2 Preprocessed features from the dataset. The Forecast category represents the forecasts given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available provides the features we will use in training and testing 17

3.3 Example, where n = 14, r = 5, ψ = 1, of various encoded scalar values using a ScalarEncoder 24

4.1 NRMSE scores of the entries published in [Hong et al., 2014]. The NuPIC model and Expektra model are added so we can easily compare the results 40

A1 Table containing configuration parameters for the spatial pooler 55
A2 Table containing configuration parameters for the scalar encoder 56
A3 Table containing configuration parameters for the temporal memory 57


www.kth.se

• Contents
• Introduction
  • Problem Formulation
  • The scope of the problem
• Background
  • Neural Networks and Time Series Prediction
• Method and Materials
  • Preliminaries
    • Remarks
    • Definitions
    • Reference models
    • Error metrics
    • Model selection
    • Evaluation
  • Experiments
  • Holdback Input Randomization
  • Optimization methods
  • Neural Networks
    • Multilayer Perceptron
    • Numenta Platform for Intelligent Computing
• Result
  • Experimental results
  • Input Importance
    • Adaptation and Optimization
  • Summary
• Discussion
  • Method development issues
  • Future improvements and directions
  • Conclusions
• Bibliography
• Appendices
  • Hyper-parameters
  • Wind characteristics
  • Error Distribution
• List of Figures
• List of Tables

CHAPTER 3 METHOD AND MATERIALS

NuPIC Classifiers

There are many different classifiers included with NuPIC, such as a k-Nearest Neighbour (kNN) classifier and a Support Vector Machine (SVM); the OPF in particular uses a custom-built classifier called the "CLAClassifier". The purpose of this classifier is to decode predictions made by the CLA. All classification performed with the CLAClassifier is performed in different ways depending on how the input has been encoded.

Scalar values are decoded using a process where each cell in a CLA region is paired with two histograms. One of the histograms keeps track of the frequency of encountered patterns associated with each cell; the other one keeps track of a moving average for each bucket. This process is illustrated in figure 3.4.
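The two-histogram idea can be sketched with a toy decoder (an illustrative simplification, not the actual CLAClassifier; the class name, parameter names, and the moving-average step size are made up): each active cell votes for buckets it has previously been active with, and the winning bucket's moving average of actual values becomes the decoded prediction.

```python
from collections import defaultdict

class ToyBucketClassifier:
    """Toy decoder in the spirit described above: per-cell bucket
    frequencies plus a per-bucket moving average of actual values."""
    def __init__(self, alpha=0.3):  # moving-average step size (assumed)
        self.cell_hist = defaultdict(lambda: defaultdict(int))
        self.bucket_avg = {}
        self.alpha = alpha

    def learn(self, active_cells, bucket, actual_value):
        for c in active_cells:
            self.cell_hist[c][bucket] += 1   # frequency histogram per cell
        avg = self.bucket_avg.get(bucket, actual_value)
        self.bucket_avg[bucket] = avg + self.alpha * (actual_value - avg)

    def infer(self, active_cells):
        votes = defaultdict(int)
        for c in active_cells:
            for b, count in self.cell_hist[c].items():
                votes[b] += count
        best = max(votes, key=votes.get)     # most likely bucket
        return best, self.bucket_avg[best]   # bucket and its decoded value

clf = ToyBucketClassifier()
for _ in range(5):
    clf.learn(active_cells=[1, 4, 7], bucket=2, actual_value=0.62)
clf.learn(active_cells=[1, 9], bucket=5, actual_value=0.10)
print(clf.infer([1, 4, 7]))  # → (2, 0.62)
```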

[Schematic: an SDR of N columns, with two histograms per cell (a likelihood histogram and a moving average), mapped against the minvalue-maxvalue range.]

Figure 3.4: The CLAClassifier.

Training in NuPIC

Training the NuPIC model is done using online learning algorithms. The schema seen in figure 3.5 is used to train and test these models. We have 7 different wind farms, so this schema is repeated for each respective wind farm.

This schema is slightly adjusted from the traditional way the OPF handles multi-step predictions. The default way of doing this is to use 48 different models, one for every step ahead in the CLAClassifier, which would give us 48 · 7 = 336 models in total (for all the wind farms). Not only does this change match Expektra's approach, but it also reduces the memory requirements for the CLAClassifier.

[Flowchart: Training phase: the dataset feeds a pre-training data chunk into hyperparameter setup (PSO swarming or manual setup); the resulting OPF model, with online learning activated, consumes the training data stream and produces predictions. Testing phase: online learning is deactivated and the model produces multistep predictions on the testing data.]

Figure 3.5: Training an OPF model.

Input and hyperparameter selection

Finding hyperparameters for an OPF model was partly done using a custom built-in PSO algorithm and partly configured manually, mainly by ensuring that the PSO algorithm did not remove encoders. A single CLA region contains a lot of hyper-parameters; these parameters have been included with a corresponding description in appendix A. Inputs to the model are date, ws, wp, u, and v.


Chapter 4

Result

This chapter contains the primary results obtained in this study. Figures 4.4-4.10 contain different error measurements on the test set for each respective wind farm. We see that Expektra's ANN model is able to perform well over all wind farms, while the NuPIC model performs worse than Expektra but in general still better than our reference model. NuPIC is off target with a bias error on most wind farms. The graphs also indicate that NuPIC is unable to pick up some trends in the cumulated ε² graph. The Expektra model, on the other hand, shows no clear problems in cumulated ε² and is on target on all wind farms; appendix C has been included to reflect this for different lead times.
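The error measures shown in these figures can be sketched as follows (the exact normalization is an assumption; since GEFCom power is already normalized to [0, 1], the errors are used directly):

```python
import math

def error_measures(pred, obs):
    """NBIAS, NRMSE, NMAE and the cumulated squared error for normalized power."""
    eps = [p - o for p, o in zip(pred, obs)]
    n = len(eps)
    nbias = sum(eps) / n                                # signed bias
    nrmse = math.sqrt(sum(e * e for e in eps) / n)      # root-mean-square error
    nmae = sum(abs(e) for e in eps) / n                 # mean absolute error
    cumulated = []                                      # running sum of squared errors
    total = 0.0
    for e in eps:
        total += e * e
        cumulated.append(total)
    return nbias, nrmse, nmae, cumulated

nbias, nrmse, nmae, cum = error_measures([0.5, 0.7, 0.2], [0.4, 0.7, 0.4])
print(round(nbias, 3), round(nrmse, 3), round(nmae, 3))  # → -0.033 0.129 0.1
```

A near-zero NBIAS with a large NRMSE indicates scatter without systematic offset, which is the distinction drawn between the two models above.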

[Left panel: cumulative probability vs. wind speed (0-20 m/s). Right panel: normalized power output vs. wind speed and wind direction.]

Figure 4.1: Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs. wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).

CHAPTER 4 RESULT

Figure 4.2: Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).

Figure 4.3: Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


[Four panels for Wind Farm 1: NBIAS, NRMSE, and NMAE vs. look-ahead time k (in hours), and cumulated ε² vs. time (in hours), each comparing Expektra, NuPIC, and Persistence.]

Figure 4.4: Different error measurements for WF 1.

[Four panels for Wind Farm 2: NBIAS, NRMSE, and NMAE vs. look-ahead time k (in hours), and cumulated ε² vs. time (in hours), each comparing Expektra, NuPIC, and Persistence.]

Figure 4.5: Different error measurements for WF 2.

[Four panels for Wind Farm 3: NBIAS, NRMSE, and NMAE vs. look-ahead time k (in hours), and cumulated ε² vs. time (in hours), each comparing Expektra, NuPIC, and Persistence.]

Figure 4.6: Different error measurements for WF 3.

[Four panels for Wind Farm 4: NBIAS, NRMSE, and NMAE vs. look-ahead time k (in hours), and cumulated ε² vs. time (in hours), each comparing Expektra, NuPIC, and Persistence.]

Figure 4.7: Different error measurements for WF 4.

[Four panels for Wind Farm 5: NBIAS, NRMSE, and NMAE vs. look-ahead time k (in hours), and cumulated ε² vs. time (in hours), each comparing Expektra, NuPIC, and Persistence.]

Figure 4.8: Different error measurements for WF 5.

[Four panels for Wind Farm 6: NBIAS, NRMSE, and NMAE vs. look-ahead time k (in hours), and cumulated ε² vs. time (in hours), each comparing Expektra, NuPIC, and Persistence.]

Figure 4.9: Different error measurements for WF 6.

[Four panels for Wind Farm 7: NBIAS, NRMSE, and NMAE vs. look-ahead time k (in hours), and cumulated ε² vs. time (in hours), each comparing Expektra, NuPIC, and Persistence.]

Figure 4.10: Different error measurements for WF 7.

Wind Farm

User          1      2      3      4      5      6      7      All
Leustagos     0.145  0.138  0.168  0.144  0.158  0.133  0.140  0.146
DuckTile      0.143  0.145  0.172  0.145  0.165  0.137  0.146  0.148
MZ            0.141  0.151  0.174  0.145  0.167  0.141  0.145  0.149
Propeller     0.144  0.153  0.177  0.147  0.175  0.141  0.147  0.152
Duehee Lee    0.157  0.144  0.176  0.160  0.169  0.154  0.148  0.155
Expektra      0.165  0.158  0.184  0.164  0.179  0.153  0.153  0.165
MTU EE5260    0.161  0.172  0.193  0.162  0.192  0.156  0.160  0.168
SunWind       0.174  0.177  0.193  0.176  0.179  0.157  0.162  0.172
ymzsmsd       0.163  0.186  0.200  0.164  0.192  0.162  0.167  0.174
4138 Kalchas  0.180  0.179  0.197  0.175  0.200  0.160  0.165  0.177
NuPIC         0.243  0.254  0.264  0.310  0.290  0.224  0.240  0.264
Persistence   0.302  0.338  0.373  0.364  0.388  0.341  0.361  0.355

Table 4.1: NRMSE scores of the entries published in [Hong et al., 2014]. The NuPIC model and Expektra model are added so we can easily compare the results.

4.1 Experimental results

Looking at the primary results presented in this study, we see in table 4.1 that Expektra's ANN model is able to achieve performance comparable to the other methods published with Hong et al. [2014]. This can be used as a starting point for further investigations. A similar approach to Expektra's is Duehee Lee's [Lee and Baldick, 2014], as both methods are based on the same neural network architecture but with different optimization techniques. Duehee Lee's network is structured slightly differently from Expektra's model, which consists of only one output node, whereas Duehee Lee uses five (representing five predictions ahead). Besides these differences, Duehee Lee has also included additional sub-models to create a prediction ensemble, blending the output of the ANN with a Gaussian Process (GP) to improve very short-term predictions. MTU EE5260 is also a neural network approach that converts meteorological forecasts to power, and here the score is very close. The HTM CLA beats the persistence model but needs more work to outperform the other models in GEFCom.


4.2 Input Importance

For interpretation purposes, in order to understand the model better, an analysis of the relative input parameter importance was performed. This analysis is illustrated in figure 4.11. Each box represents added noise to that channel; a higher NRMSE score results if that feature was important. A reference point, "all-channels", represents the error distribution of the model with no input replacement. We clearly see in this figure that the wind speed channel ws is the most important attribute: adding noise to this channel will greatly affect the NRMSE score. We also see that the wind components u, v show little to no influence, and that the timestamp-related inputs hours and week both indicate that there are seasonal and daily trends present in the dataset.
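The HIPR procedure can be sketched as follows (an illustrative implementation; the model, the uniform noise distribution, and the number of randomization rounds are assumptions):

```python
import math
import random

def hipr(predict, X, y, nrmse, n_rounds=20, seed=0):
    """Holdback Input Randomization: re-score the model with one input
    channel at a time replaced by noise; a large NRMSE increase means
    the channel mattered. X is a list of rows (one list per sample)."""
    rng = random.Random(seed)
    baseline = nrmse(predict(X), y)
    importance = {}
    for ch in range(len(X[0])):
        scores = []
        for _ in range(n_rounds):
            noisy = [row[:] for row in X]
            for row in noisy:
                row[ch] = rng.uniform(-1.0, 1.0)  # replace this channel with noise
            scores.append(nrmse(predict(noisy), y))
        importance[ch] = sum(scores) / n_rounds - baseline
    return importance

# Tiny demo: the "model" only reads channel 0, so only channel 0 matters.
X = [[0.1, 5.0], [0.4, -3.0], [0.8, 2.0]]
y = [0.1, 0.4, 0.8]
predict = lambda rows: [r[0] for r in rows]
nrmse = lambda p, o: math.sqrt(sum((a - b) ** 2 for a, b in zip(p, o)) / len(o))
imp = hipr(predict, X, y, nrmse)
print(imp[0] > imp[1])  # → True
```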

[Box plot of NRMSE per perturbed channel: all-channels, hours, u, v, week, ws, ws-1, ws-2, ws-3, ws+1, ws+2, ws+3; NRMSE axis roughly 0.2 to 0.6.]

Figure 4.11: Relative input parameter importance using HIPR. "all-channels" reflects the reference point: the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind.

4.2.1 Adaptation and Optimization

All training for Expektra's model was performed on a MacBook Pro with an Intel Core 2 Duo processor (P8600, 2.40 GHz) on a 64-bit operating system with 4 GB of installed RAM. After adapting the code to be able to run experiments using the GEFCom dataset, an investigation was performed to evaluate the performance of the core source code given by Expektra. This investigation identified some slow parts of the code; after addressing these issues, mainly by using a Math.NET native library instead of the managed provider, a speed test was performed comparing the unoptimized version with the optimized one. Figure 4.12 shows the speed-up achieved during training using the LM algorithm.

Figure 4.12: Training time for the unoptimized version vs. the optimized one when training a network with 10 input neurons; 10, 15, 20 or 25 hidden neurons; and 1 output neuron. Training was done for 100 epochs using LM.

4.3 Summary

A plot showing the average NRMSE improvement over the persistence model can be seen in Figure 4.13. Both models are able to perform better than persistence: Expektra's model performs better than the NuPIC model over the whole forecast horizon, and NuPIC performs better than persistence towards the end of the forecast.
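The improvement measure used here can be written down compactly. The sketch below is a minimal illustration (function names are ours, and NRMSE is taken as RMSE normalized by the observed range, one common convention):

```python
import numpy as np

def nrmse(y_true, y_pred):
    """Root-mean-square error normalized by the range of the observations."""
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return rmse / (y_true.max() - y_true.min())

def improvement_over_persistence(y, y_model, lead):
    """Percentage NRMSE improvement of a model over the persistence
    forecast y_hat(t + lead) = y(t), for a single lead time."""
    y_true = y[lead:]
    persist = y[:-lead]              # last known value carried forward
    e_ref = nrmse(y_true, persist)
    e_mod = nrmse(y_true, y_model[lead:])
    return 100.0 * (e_ref - e_mod) / e_ref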

Figure 4.13: Summarized average improvement (%) in NRMSE over persistence for all wind farms, with 95% confidence intervals, plotted against look-ahead time (in hours) for Expektra and NuPIC.


Chapter 5

Discussion

With the exponential growth of computer power it becomes easier to study deeper and more complex networks, and with recent advances in the area of deep learning a new interest in ANNs has resurged. In practice, ANNs have been used by most groups in the field of WPF, but these networks never caught on, as it was argued that the improvements made by ANNs were usually not worth the extra effort of training them [Giebel et al., 2011]. This is steadily becoming less of an issue as computation becomes cheaper and bigger networks perform better.

By using a simple MLP network, we observe that we are able to obtain results similar in performance to other models published in Hong et al. [2014]. This can be used as a starting point for further investigations. The GEFCom dataset has a limited number of input features; with access to a wider range of features, the power of neural networks could be studied more in depth.

Using persistence as a reference model was done in order to have the same baseline as the one used in GEFCom, but a better reference model should be considered in the future, such as the one presented in Nielsen et al. [1998], especially if longer forecasts are to be considered.
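The "new reference" of Nielsen et al. [1998] blends persistence with the climatological mean, weighted by the lag-k autocorrelation of the power series. A minimal sketch (the function name is ours, and the plain sample autocorrelation estimate is an assumption, not necessarily the paper's exact estimator):

```python
import numpy as np

def new_reference_forecast(y, k):
    """'New reference' forecast: y_hat(t + k) = a_k * y(t) + (1 - a_k) * mean(y),
    where a_k is the lag-k sample autocorrelation of y."""
    y = np.asarray(y, dtype=float)
    mean = y.mean()
    d = y - mean
    a_k = np.dot(d[:-k], d[k:]) / np.dot(d, d)   # lag-k autocorrelation
    return a_k * y + (1.0 - a_k) * mean          # k-step-ahead forecast per t
```

For short horizons a_k is close to 1 and the forecast is close to persistence; for long horizons a_k shrinks and the forecast approaches the mean, which is why this baseline is harder to beat than plain persistence at long lead times.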

5.1 Method development issues

Working with Expektra's model was very straightforward and no major issues were encountered during development, but some issues were encountered working with NuPIC:1

1 It should also be pointed out that it helps to have the people who developed the code on the spot, and this is probably the main reason for the little trouble.


1. Installing NuPIC is difficult. This seems to be a general problem, judging by posts in the NuPIC mailing list.2 This is a very important issue to fix if they want more people to work with this model.

2. The built-in swarming (PSO) did not work well, especially for slightly larger datasets and for 48 prediction steps. The main problems were fitting everything into memory and that the code ran too slowly when swarming over multiple step-ahead predictions, making it practically impossible to use. Scaling down the problem to just a few prediction steps and multiple models helped, but the swarm model kept dismissing wind speed as an important feature, which was fixed by specifically telling it not to touch that encoder.

3. Inconsistencies between the white-paper documentation of HTM CLA and the actual implementation make it hard to understand the underlying principles of NuPIC. NuPIC presents a very complex network, and any inconsistency in the documentation makes it harder to understand. The code is quite well documented, but the additional material is a bit sparse.

These issues are all understandable and are continuously being improved upon. NuPIC is still a young platform, so finding issues is expected, given the complexity and research nature of the project. Training the CLAClassifier to use many prediction steps instead of just one will most likely reduce the bias error seen in the results. To investigate this, a more powerful computer is needed.

There are some differences in the inputs between the models. This setup was created because an OPF model needs to encode the value it is going to predict. SCADA data is available, so this is not an issue, but it may introduce some unfairness between NuPIC and Expektra. In the current setup NuPIC does not seem to gain anything extra from the additional information, and other models in the competition will most likely use different inputs as well.

In general, NuPIC differs in the way you feed data: you send in only the front of the signal, and temporal context is achieved through the temporal memory.

5.2 Future improvements and directions

One issue with the GEFCom dataset is that it does not contain any specific information about the location of the 7 wind farms. It is worth considering how to handle different models that are specialized for different types

2 http://numenta.org/lists


of terrain, as this has been shown to increase performance [Kariniotakis et al., 2006]. Another thing to investigate could be a more advanced scheme for hyperparameter selection: Jursa and Rohrig [2008] show that the WPF error can be reduced by smarter use of optimization algorithms for feature selection and hyperparameter selection, which should improve performance.

In general, having more types of inputs from sources like SCADA and NWP systems could give a lot of valuable information for better predictions. It is clear that wind speed is a key feature for producing good forecasts, which is also supported by the input analysis in Section 4.2. It has been argued that the error in WPF models stems largely from wrong weather forecasts [Giebel et al., 2011], so identifying good sources of weather forecasts would be desirable for more effective model learning. Combining several different weather forecasts could also be investigated, as Nielsen et al. [2007] showed that power forecasts based on a number of different meteorological forecasts were better than those based on a single source.

Regarding HTM CLA, it is worth considering implementing a custom encoder, specifically targeted at data concerning wind farms. SCADA data could also be streamed directly into a NuPIC model, which would allow for some interesting online predictions and anomaly detection.

Training times are always an issue, especially for very large models. Both NuPIC and Expektra's model would benefit from being implemented in a more parallel fashion, as these algorithms are very well suited for parallel computing. The speed-up would be of great value, so looking into and implementing a GPU version of the algorithms could be worthwhile.

5.3 Conclusions

The field of energy forecasting is still young and the necessity for good forecasting techniques is constantly growing. This initial study shows that part of Expektra's forecasting method can be used to forecast wind power generation. We observe that this ANN is able to achieve performance comparable to other methods published in Hong et al. [2014]. Additional work is still needed to obtain state-of-the-art results. Numenta's model is able to beat the reference model but still needs additional work before we can draw definite conclusions about its performance. Constructing a custom encoder could be a first step in that direction.


Bibliography

Subutai Ahmad and Jeff Hawkins. Properties of sparse distributed representations and their application to hierarchical temporal memory. arXiv preprint arXiv:1503.07469, 2015.

T. C. Akinci. Short term wind speed forecasting with ANN in Batman, Turkey. Elektronika ir Elektrotechnika, 107(1):41–45, 2015.

E. Michael Azoff. Neural Network Time Series Forecasting of Financial Markets. John Wiley & Sons, Inc., 1994.

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13(1):281–305, 2012.

Daniel P. Buxhoeveden and Manuel F. Casanova. The minicolumn hypothesis in neuroscience. Brain, 125(5):935–951, 2002.

Erasmo Cadenas and Wilfrido Rivera. Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks. Renewable Energy, 34(1):274–278, 2009.

Dmitri B. Chklovskii, B. W. Mel, and K. Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782–788, 2004.

A. Costa, A. Crespo, J. Navarro, G. Lizcano, H. Madsen, and E. Feitosa. A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews, 12(6):1725–1744, August 2008. ISSN 1364-0321. doi: 10.1016/j.rser.2007.01.015. URL http://dx.doi.org/10.1016/j.rser.2007.01.015.

Russ C. Eberhart and James Kennedy. A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science, volume 1, pages 39–43, New York, NY, 1995.


Shu Fan, James R. Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. Energy Conversion, IEEE Transactions on, 24(2):474–482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento: a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276. Citeseer, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition, EWEC 2008. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving Hierarchical Temporal Memory-Based Trading Models. Springer, 2013.

M. A. Gaertner, C. Gallardo, C. Tejeda, N. Martínez, S. Calabria, N. Martínez, and B. Fernández. The Casandra project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. In Proc. of the 2001 European Wind Energy Conference, EWEC'01, Copenhagen, Denmark, pages 777–780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: a literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G. Lobo. Sipreólico: wind power prediction tool for the Spanish peninsular power system. In Proceedings of the CIGRÉ 40th General Session and Exhibition, Paris, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On Intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357–363, 2014.


Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41–46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585–592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694–709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference, EWEC'07, 2007.

G. Kariniotakis. Position paper on Joule project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J. Halliday, R. Brownsword, Ignacio Marti, Ana Maria Palomares, I. Cruz, H. Madsen, T. S. Nielsen, Henrik Aa. Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power: overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, 2006.

G. N. Kariniotakis, G. S. Stavrakakis, and E. F. Nogaret. Wind power forecasting using advanced neural networks models. Energy Conversion, IEEE Transactions on, 11(4):762–767, 1996.

Stanley J. Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326–334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275–293, 2009.

Lars Landberg and Simon J. Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171–195, 1994.

S. Lang, C. Mohrlen, J. Jorgensen, B. O. Gallachóir, and E. McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International Solar Energy Society UK Section Conference, volume 85, page 89, 2006.


S. M. Lawan, W. A. W. Z. Abidin, W. Y. Chai, A. Baharun, and T. Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760–1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. Smart Grid, IEEE Transactions on, 5(1):501–510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets, 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164–168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313–2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154–3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa. Nielsen, and Torben S. Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475–489, 2005.

Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431–441, 1963.

Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons. 1969.

Marvin Lee Minsky and Oliver G. Selfridge. Learning in random nets. MIT Lincoln Laboratory, 1960.

Jorge J. Moré. The Levenberg-Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105–116. Springer, 1978.

Henrik Aa. Nielsen, Torben S. Nielsen, Henrik Madsen, Maria J. Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471–482, 2007.


Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29–34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power, 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159–167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms, v0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33–57, 2007.

A. Rodrigues, J. A. Peças Lopes, P. Miranda, L. Palma, C. Monteiro, R. Bessa, J. Sousa, C. Rodrigues, and J. Matos. EPREV: a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference, EWEC, volume 7, 2007.

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5:3, 1988.

Jürgen Schmidhuber. Deep learning in neural networks: an overview. Neural Networks, 61:85–117, 2015.

S. Sinkevicius, R. Simutis, and V. Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91–96, 2011.

Ke-Sheng Wang, Vishal S. Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61–69, 2014.

WWEA. 2014 half-year report, pp. 1–8. Technical report, WWEA, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365–376, 2013.

Hao Yu and Bogdan M. Wilamowski. Levenberg-Marquardt training. Industrial Electronics Handbook, 5:12–1, 2011.


Appendix A

Hyper-parameters

Parameters for the spatial pooler

Alias (default): Description

columnCount (-): The number of cell columns in a cortical region.
globalInhibition (false): If true, during the inhibition phase the winning columns are selected as the most active columns from the region as a whole.
numActivePerInhArea (10): The maximum number of active columns per inhibition area.
synPermActiveInc (0.1): The amount by which an active synapse is incremented in each round, specified as a percent of a fully grown synapse.
synPermConnected (0.10): Controls the threshold at which synapses become connected.
synPermInactiveDec (0.01): The amount by which an inactive synapse is decremented in each round.
potentialRadius (16): Determines the extent of the input that each column can potentially be connected to.

Table A.1: Configuration parameters for the spatial pooler.


Parameters for the scalar encoder

Alias (symbol): Description

w (w): number of bits to set in the output.
minval (vmin): the lower bound of the input value.
maxval (vmax): the upper bound of the input value.
n (n): number of bits in the representation (n must be > w).
radius (r): inputs separated by more than or equal to this distance will have non-overlapping representations.
resolution (ψ): inputs separated by more than or equal to this distance will have different representations.

Table A.2: Configuration parameters for the scalar encoder.
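To illustrate how w, n, minval, maxval and the derived resolution interact, here is a minimal, hypothetical scalar-encoder sketch (not NuPIC's implementation; the function name, clipping and rounding behaviour are assumptions):

```python
def encode_scalar(value, minval, maxval, n, w):
    """Map a value in [minval, maxval] to an n-bit array with w contiguous
    active bits (non-periodic). Nearby values share active bits; values far
    enough apart get non-overlapping representations."""
    assert n > w, "n must be greater than w"
    value = min(max(value, minval), maxval)       # clip out-of-range input
    n_buckets = n - w + 1                         # number of distinct codes
    resolution = (maxval - minval) / (n_buckets - 1)
    i = min(int(round((value - minval) / resolution)), n - w)
    bits = [0] * n
    for j in range(i, i + w):
        bits[j] = 1
    return bits
```

For example, with minval=0, maxval=10, n=14, w=5 there are 10 distinct codes; the lowest value activates bits 0-4 and the highest activates bits 9-13.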


Parameters for the temporal memory

Alias (default): Description

activationThreshold (12): Activation threshold for segments.
cellsPerColumn (32): Number of cells per column.
columnCount (2048): The number of cell columns in a cortical region.
globalDecay (0.10): Decrements all synapses a little bit all the time.
initialPerm (0.11): Initial permanence value for a synapse.
inputWidth (-): Size of the input.
maxAge (100000): Controls global decay. Global decay will only decay segments that have not been activated for maxAge iterations, and will only do the global decay loop every maxAge iterations.
maxSegmentsPerCell (-): The maximum number of segments a cell can have.
maxSynapsesPerSegment (-): The maximum number of synapses a segment can have.
minThreshold (8): The minimum required activity for a segment to learn.
newSynapseCount (15): The maximum number of synapses added to a segment during learning.
permanenceDec (0.10): How much permanence is removed from synapses when learning occurs.
permanenceInc (0.10): How much permanence is added to synapses when learning occurs.
temporalImp (cpp/py): Controls which temporal memory implementation to use.

Table A.3: Configuration parameters for the temporal memory.


Appendix B

Wind characteristics

Figure B.1: Wind characteristics for WF 1 and WF 2, the GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Figure B.2: Wind characteristics for WF 3–7, the GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Appendix C

Error Distribution


Figure C.1: Error distribution for different lead times (48, 40, 30, 20, 10, 1), WF 1, using NuPIC. Each panel plots error (x-axis) against frequency (y-axis).

Figure C.2: Error distribution for different lead times (48, 40, 30, 20, 10, 1), WF 2, using NuPIC. Each panel plots error (x-axis) against frequency (y-axis).

Figure C.3: Error distribution for different lead times (48, 40, 30, 20, 10, 1), WF 3, using NuPIC. Each panel plots error (x-axis) against frequency (y-axis).

Figure C.4: Error distribution for different lead times (48, 40, 30, 20, 10, 1), WF 4, using NuPIC. Each panel plots error (x-axis) against frequency (y-axis).

Figure C.5: Error distribution for different lead times (48, 40, 30, 20, 10, 1), WF 5, using NuPIC. Each panel plots error (x-axis) against frequency (y-axis).

Figure C.6: Error distribution for different lead times (48, 40, 30, 20, 10, 1), WF 6, using NuPIC. Each panel plots error (x-axis) against frequency (y-axis).

Figure C.7: Error distribution for different lead times (48, 40, 30, 20, 10, 1), WF 7, using NuPIC. Each panel plots error (x-axis) against frequency (y-axis).

Figure C.8: Error distribution for different lead times (48, 40, 30, 20, 10, 1), WF 1, using Expektra. Each panel plots error (x-axis) against frequency (y-axis).

Figure C.9: Error distribution for different lead times (48, 40, 30, 20, 10, 1), WF 2, using Expektra. Each panel plots error (x-axis) against frequency (y-axis).

Figure C.10: Error distribution for different lead times (48, 40, 30, 20, 10, 1), WF 3, using Expektra. Each panel plots error (x-axis) against frequency (y-axis).

Figure C.11: Error distribution for different lead times (48, 40, 30, 20, 10, 1), WF 4, using Expektra. Each panel plots error (x-axis) against frequency (y-axis).

Figure C.12: Error distribution for different lead times (48, 40, 30, 20, 10, 1), WF 5, using Expektra. Each panel plots error (x-axis) against frequency (y-axis).

Figure C.13: Error distribution for different lead times (48, 40, 30, 20, 10, 1), WF 6, using Expektra. Each panel plots error (x-axis) against frequency (y-axis).

Figure C.14: Error distribution for different lead times (48, 40, 30, 20, 10, 1), WF 7, using Expektra. Each panel plots error (x-axis) against frequency (y-axis).

List of Figures

2.1 A figure that presents the general outline when forecasting using the statistical approach 6

2.2 A figure that presents the general steps when forecasting using a physical model 7

3.1 The perceptron 20

3.2 Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of H hidden layers, as well as M input connections. Each edge seen in this graph has a weight wij associated with it. 21

3.3 Information flow of a single-region predictive model created with the OPF 23

3.4 The CLAClassifier 28

3.5 Training an OPF model 29

4.1 Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs. wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012). 31

4.2 Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012). 32

4.3 Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012). 32

4.4 Different error measurements for WF 1 33
4.5 Different error measurements for WF 2 34
4.6 Different error measurements for WF 3 35
4.7 Different error measurements for WF 4 36
4.8 Different error measurements for WF 5 37


4.9 Different error measurements for WF 6 38

4.10 Different error measurements for WF 7 39

4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point: the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind. 41

4.12 Plot showing the unoptimized version vs. the optimized one when training a network with 10 input neurons; 10, 15, 20 or 25 hidden neurons; and 1 output neuron. Training was done for 100 epochs using LM. 42

4.13 Summarized average improvement over all wind farms with 95% confidence intervals 43

B.1 Wind characteristics for WF 1 and WF 2, the GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power. 59

B.2 Wind characteristics for WF 3–7, the GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power. 60

C1 Error distribution for different lead times WF 1 62

C2 Error distribution for different lead times WF 2 63

C3 Error distribution for different lead times WF 3 64

C4 Error distribution for different lead times WF 4 65

C5 Error distribution for different lead times WF 5 66

C6 Error distribution for different lead times WF 6 67

C7 Error distribution for different lead times WF 7 68

C8 Error distribution for different lead times WF 1 69

C9 Error distribution for different lead times WF 2 70

C10 Error distribution for different lead times WF 3 71

C11 Error distribution for different lead times WF 4 72

C12 Error distribution for different lead times WF 5 73

C13 Error distribution for different lead times WF 6 74

77

List of Tables

C14 Error distribution for different lead times WF 7 75

List of Tables

31 The first repetition of the first period is 8 January 2011 at 0100 to 10January 2011 at 0000 The second repetition of the first period is 15January 2011 at 0100 to 17 January 2011 at 0000 In between theseperiods missing data with power observations are available for updatingthe models 16

32 Preprocessed features from the dataset The Forecast category representthe forecast given by the NWP there will exist multiple forecast for agiven date The latest issued forecast available is the features will willuse in training and testing 17

33 Example were n = 14 r = 5 ψ = 1 of various encoded scalar valuesusing a ScalarEncoder 24

41 NRMSE score of the entries published in [Hong et al 2014] The NuPICmodel and Expektra model are added so we can easily compare the result 40

A1 Table containing configuration parameters for the encoder 55A2 Table containing configuration parameters for the spatial pooler 56A3 Table containing configuration parameters for the temporal memory 57

78


3.5 NEURAL NETWORKS

every step ahead in the CLAClassifier, which would give us 48 · 7 = 336 models in total (for all the wind farms). Not only does this change match Expektra's approach, but it also reduces the memory requirements of the CLAClassifier.

[Flowchart: in the training phase, a pre-training data chunk from the dataset drives the hyperparameter setup (PSO swarming or manual setup) of the OPF model, which is then trained on the training data stream with online learning activated, producing predictions. In the testing phase, the OPF model runs on the testing data with online learning deactivated, producing multistep predictions.]

Figure 3.5: Training an OPF model.

Input and hyperparameter selection

Finding hyperparameters for an OPF model was done partly with a custom, built-in PSO algorithm and partly by manual configuration, mainly by ensuring that the PSO algorithm did not remove encoders. A single CLA region contains a large number of hyperparameters; these have been included with corresponding descriptions in appendix A. Inputs to the model are date, ws, wp, u and v.
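NuPIC's built-in swarming is based on particle swarm optimization [Eberhart and Kennedy, 1995]. As a rough illustration of the underlying idea only (this is not NuPIC's implementation; the function name, coefficient defaults and bounds handling are illustrative), a minimal PSO over a continuous hyperparameter space can be sketched like this:

```python
import random

def pso(objective, bounds, n_particles=20, iters=50,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimization sketch.

    `objective` maps a parameter vector to a score to minimize (e.g. the
    validation NRMSE of a model built with those hyperparameters);
    `bounds` is a list of (low, high) tuples, one per hyperparameter.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # each particle's best position
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm-wide best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # inertia + pull towards personal best + pull towards global best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d],
                                    bounds[d][0]), bounds[d][1])
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

In a real hyperparameter search the objective would train and validate a model per evaluation, which is exactly why swarming over many step-ahead models becomes expensive (see section 5.1).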


Chapter 4

Result

This chapter presents the primary results obtained in this study. Figures 4.4–4.10 show different error measurements on the test set for the respective wind farms. We see that Expektra's ANN model performs well over all wind farms, while the NuPIC model performs worse than Expektra but in general still better than our reference model. NuPIC is off target, with a bias error on most wind farms. The graphs also indicate that NuPIC is unable to pick up some trends, visible in the cumulated ε² graph. Expektra's model, on the other hand, shows no clear problems in cumulated ε² and is on target on all wind farms; appendix C has been included to reflect this for different lead times.
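For reference, the error measures shown in these figures (NBIAS, NMAE, NRMSE and the cumulated squared error ε²) can be sketched as below. The function and argument names are illustrative; the GEFCom power values are already normalized, so the normalizing capacity is taken as 1:

```python
import numpy as np

def normalized_errors(y_true, y_pred, capacity=1.0):
    """Sketch of the normalized error measures used in figures 4.4-4.10."""
    e = (y_true - y_pred) / capacity
    nbias = np.mean(e)                # systematic over/under-prediction
    nmae = np.mean(np.abs(e))         # average error magnitude
    nrmse = np.sqrt(np.mean(e ** 2))  # penalizes large errors harder
    cum_sq = np.cumsum(e ** 2)        # running sum of squared errors over time
    return nbias, nmae, nrmse, cum_sq
```

In the figures, NBIAS, NMAE and NRMSE are computed per look-ahead time k, while the cumulated ε² is plotted against time.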

[Left panel: cumulative probability vs. wind speed (0–20 m/s). Right panel: scatter of normalized power output against wind speed and wind direction.]

Figure 4.1: Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs. wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


CHAPTER 4 RESULT

Figure 4.2: Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).

Figure 4.3: Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


[Four panels for Wind Farm 1, comparing Expektra, NuPIC and Persistence: NBIAS vs. look-ahead time k (hours), NRMSE vs. k, cumulated ε² vs. time (hours), and NMAE vs. k.]

Figure 4.4: Different error measurements for WF 1.


[Four panels for Wind Farm 2, comparing Expektra, NuPIC and Persistence: NBIAS vs. look-ahead time k (hours), NRMSE vs. k, cumulated ε² vs. time (hours), and NMAE vs. k.]

Figure 4.5: Different error measurements for WF 2.


[Four panels for Wind Farm 3, comparing Expektra, NuPIC and Persistence: NBIAS vs. look-ahead time k (hours), NRMSE vs. k, cumulated ε² vs. time (hours), and NMAE vs. k.]

Figure 4.6: Different error measurements for WF 3.


[Four panels for Wind Farm 4, comparing Expektra, NuPIC and Persistence: NBIAS vs. look-ahead time k (hours), NRMSE vs. k, cumulated ε² vs. time (hours), and NMAE vs. k.]

Figure 4.7: Different error measurements for WF 4.


[Four panels for Wind Farm 5, comparing Expektra, NuPIC and Persistence: NBIAS vs. look-ahead time k (hours), NRMSE vs. k, cumulated ε² vs. time (hours), and NMAE vs. k.]

Figure 4.8: Different error measurements for WF 5.


[Four panels for Wind Farm 6, comparing Expektra, NuPIC and Persistence: NBIAS vs. look-ahead time k (hours), NRMSE vs. k, cumulated ε² vs. time (hours), and NMAE vs. k.]

Figure 4.9: Different error measurements for WF 6.


[Four panels for Wind Farm 7, comparing Expektra, NuPIC and Persistence: NBIAS vs. look-ahead time k (hours), NRMSE vs. k, cumulated ε² vs. time (hours), and NMAE vs. k.]

Figure 4.10: Different error measurements for WF 7.


                        Wind Farm
User            1      2      3      4      5      6      7      All
Leustagos       0.145  0.138  0.168  0.144  0.158  0.133  0.140  0.146
DuckTile        0.143  0.145  0.172  0.145  0.165  0.137  0.146  0.148
MZ              0.141  0.151  0.174  0.145  0.167  0.141  0.145  0.149
Propeller       0.144  0.153  0.177  0.147  0.175  0.141  0.147  0.152
Duehee Lee      0.157  0.144  0.176  0.160  0.169  0.154  0.148  0.155
Expektra        0.165  0.158  0.184  0.164  0.179  0.153  0.153  0.165
MTU EE5260      0.161  0.172  0.193  0.162  0.192  0.156  0.160  0.168
SunWind         0.174  0.177  0.193  0.176  0.179  0.157  0.162  0.172
ymzsmsd         0.163  0.186  0.200  0.164  0.192  0.162  0.167  0.174
4138 Kalchas    0.180  0.179  0.197  0.175  0.200  0.160  0.165  0.177
NuPIC           0.243  0.254  0.264  0.310  0.290  0.224  0.240  0.264
Persistence     0.302  0.338  0.373  0.364  0.388  0.341  0.361  0.355

Table 4.1: NRMSE score of the entries published in [Hong et al., 2014]. The NuPIC model and the Expektra model are added so we can easily compare the results.

4.1 Experimental results

Looking at the primary results presented in this study, we see in table 4.1 that Expektra's ANN model is able to achieve performance comparable to the other methods published with Hong et al. [2014]. This can be used as a starting point for further investigations. A similar approach to Expektra's is DueheeLee [Lee and Baldick, 2014], as both methods are based on the same neural network architecture but with different optimization techniques. DueheeLee's network is structured slightly differently from Expektra's model, which consists of only one output node, whereas DueheeLee uses five (representing five predictions ahead). Besides these differences, DueheeLee has also included additional sub-models to create a prediction ensemble, blending the output of the ANN with a Gaussian Process (GP) to improve very short-term predictions. MTU EE5260 is also a neural network approach that converts meteorological forecasts to power, and here the score is very close. The HTM CLA beats the persistence model but needs more work to outperform the other models in GEFCom.


4.2 Input Importance

For interpretation purposes, in order to understand the model better, an analysis of the relative input parameter importance was performed. This analysis is illustrated in figure 4.11. Each box represents added noise to that channel; the result is a higher NRMSE score if that feature was important. A reference point, "all-channels", represents the error distribution of the model with no input replacement. We clearly see in this figure that the wind speed channel ws is the most important attribute: adding noise to this channel greatly affects the NRMSE score. We also see that the wind components u, v show little to no influence, and that the timestamp-related inputs, hours and week, both indicate that there are seasonal and daily trends present in the dataset.
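The HIPR procedure [Kemp et al., 2007] behind this analysis can be sketched as follows: each input channel is replaced in turn with uniform noise drawn over that channel's range, and the resulting increase in NRMSE over the all-channels reference indicates the channel's importance. The function names and the exact noise choice here are illustrative assumptions, not the thesis's exact implementation:

```python
import numpy as np

def hipr(predict, X, y, seed=0):
    """Holdback Input Randomization sketch.

    `predict` is the trained network's forward pass; X has one column per
    input channel. Returns the reference NRMSE and, per channel, the NRMSE
    obtained when that channel is replaced with uniform noise.
    """
    rng = np.random.default_rng(seed)

    def nrmse(y_true, y_pred):
        return np.sqrt(np.mean((y_true - y_pred) ** 2))

    baseline = nrmse(y, predict(X))   # the "all-channels" reference point
    importance = {}
    for ch in range(X.shape[1]):
        Xn = X.copy()
        Xn[:, ch] = rng.uniform(X[:, ch].min(), X[:, ch].max(), size=len(X))
        importance[ch] = nrmse(y, predict(Xn))  # large increase => important
    return baseline, importance
```

In figure 4.11 this is repeated over many noise draws, giving a distribution of NRMSE scores per channel rather than a single value.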

[Box plot: NRMSE (roughly 0.2–0.6) after adding noise to each input channel in turn: all-channels (reference), hours, u, v, week, ws, ws−1, ws−2, ws−3, ws+1, ws+2, ws+3.]

Figure 4.11: Relative input parameter importance using HIPR. "all-channels" reflects the reference point, i.e. the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind.


4.2.1 Adaptation and Optimization

All training for Expektra's model was performed on a MacBook Pro with an Intel Core 2 Duo processor (P8600, 2.40 GHz) and 4 GB of installed RAM, running a 64-bit operating system. After adapting the code to run experiments on the GEFCom dataset, an investigation was performed to evaluate the performance of the core source code given by Expektra. This investigation identified some slow parts of the code; after fixing these issues, mainly by using a Math.NET native library instead of the managed provider, a speed test was performed comparing the unoptimized version with the optimized one. Figure 4.12 shows the speed-up achieved during training with the LM algorithm.

[Bar chart: training time for networks with 10, 15, 20 and 25 hidden neurons, normal vs. optimized implementation; the optimized version is substantially faster in all cases.]

Figure 4.12: Plot showing the unoptimized version vs. the optimized one when training a network with 10 input neurons, 10/15/20/25 hidden neurons and 1 output neuron. Training was done for 100 epochs using LM.

4.3 Summary

A plot showing the average NRMSE improvement over the persistence model can be seen in figure 4.13. Both models are able to perform better than persistence; Expektra's model performs better than the NuPIC model over the whole forecast horizon, and NuPIC performs better than persistence towards the end of the forecast.
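The persistence baseline and the improvement score plotted in figure 4.13 are simple to state; a minimal sketch (function names are illustrative):

```python
import numpy as np

def persistence_forecast(power, k):
    """Persistence reference model: the forecast k hours ahead is simply
    the last observed production, p_hat(t + k) = p(t)."""
    return power[:-k]          # aligned with the observations power[k:]

def improvement_over_persistence(nrmse_model, nrmse_persistence):
    """Percentage NRMSE improvement over persistence, as in figure 4.13."""
    return 100.0 * (nrmse_persistence - nrmse_model) / nrmse_persistence
```

For example, with the "All" column of table 4.1, Expektra's 0.165 against persistence's 0.355 corresponds to an improvement of roughly 54%.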

[Line plot: improvement (%) in NRMSE over persistence vs. look-ahead time (0–50 hours), for Expektra and NuPIC.]

Figure 4.13: Summarized average improvement over all wind farms, with 95% confidence intervals.


Chapter 5

Discussion

With the exponential growth of computer power it becomes easier to study deeper and more complex networks, and with recent advances in the area of deep learning a new interest in ANNs has resurged. In practice, ANNs have been used by most groups in the field of WPF, but these networks never caught on, as it was argued that the improvements made by ANNs were usually not enough to justify the extra effort of training them [Giebel et al., 2011]. This is steadily becoming less of an issue as computation becomes cheaper and bigger networks perform better.

By using a simple MLP network we observe that we are able to obtain results similar in performance to other models published in Hong et al. [2014]. This can be used as a starting point for further investigations. The GEFCom dataset has a limited number of input features; if we had access to a wider range of them, the power of neural networks could be studied more in depth, but given the number of features in the dataset this is hard to do.

Using persistence as a reference model was done in order to have the same baseline as the one used in GEFCom, but a better reference model should be considered in the future, such as the one presented in Nielsen et al. [1998], especially if longer forecasts are to be considered.

5.1 Method development issues

Working with Expektra's model is very straightforward, and no major issues were encountered during development, but some issues were encountered working with NuPIC:¹

¹ It should also be pointed out that it helps to have the people who developed the code on the spot, and this is probably the main reason for the little trouble.


1. Installing NuPIC is difficult. This seems to be a general problem, judging by posts on the NuPIC mailing list². A very important thing to fix if they want more people to work with this model.

2. The built-in swarming (PSO) did not work well, especially for slightly larger datasets and for 48 prediction steps. The main problems were fitting everything into memory and that the code ran too slowly when swarming over multiple step-ahead models, making it practically impossible to use. Scaling down the problem to just a few prediction steps and multiple models helped, but the swarm model kept dismissing wind speed as an important feature, which was fixed by specifically telling it not to touch that encoder.

3. Inconsistencies between the white-paper documentation of HTM CLA and the actual implementation make it hard to understand the underlying principles of NuPIC. NuPIC is a very complex network, and any inconsistency in the documentation makes it harder to understand. The code is quite well documented, but the additional material is a bit sparse.

These issues are all understandable and are continuously being improved upon. NuPIC is still a young platform, so finding issues is expected given the complexity and research nature of the project. Training the CLAClassifier to use many-step predictions instead of just one will most likely reduce the bias error seen in the results; to investigate this, a more powerful computer is needed.

There are some differences in the inputs between the models. This setup was created because an OPF model needs to encode the value it is going to predict. SCADA data is available, so this is not an issue, but it may introduce some unfairness between NuPIC and Expektra. In the current setup NuPIC does not seem to gain anything extra from the additional information, and other models in the competition will most likely use different inputs as well.

In general, NuPIC differs in the way you feed it data: you send in only the front of the signal, and temporal context is achieved through the temporal memory.

5.2 Future improvements and directions

One issue with the GEFCom dataset is that it does not contain any specific information about the location of these 7 wind farms. It is worth taking a look at how to handle different models that are specialized for different types

² http://numenta.org/lists


of terrain, as this has been shown to increase performance [Kariniotakis et al., 2006]. Another thing to investigate could be a more advanced scheme for hyperparameter selection: Jursa and Rohrig [2008] show that the WPF error can be reduced by smarter use of optimization algorithms for feature and hyperparameter selection, which should improve performance.

In general, having more types of inputs from sources like SCADA and NWP systems could give a lot of valuable information for better predictions. It is clear that wind speed is a key feature for producing good forecasts, which is also supported by the input analysis in section 4.2. It has been argued that the error in WPF models stems largely from wrong weather forecasts [Giebel et al., 2011], so identifying good sources of weather forecasts would be desirable for more effective model learning. Combining several different weather forecasts could also be investigated, as Nielsen et al. [2007] showed that power forecasts based on a number of different meteorological forecasts were better than those based on a single source.

Regarding HTM CLA, it is worth considering implementing a custom encoder for it, one specifically targeted at data concerning wind farms. SCADA data could also be streamed directly into a NuPIC model, which would allow for some interesting online predictions and anomaly detection.

Training times are always an issue, especially for very large models. Both NuPIC and Expektra's model would benefit from being implemented in a more parallel fashion, as these algorithms are very well suited for parallel computing. The speed-up would be of great value, so looking into and implementing a GPU version of the algorithms could be worthwhile.

5.3 Conclusions

The field of energy forecasting is still young, and the need for good forecasting techniques is constantly growing. This initial study shows that part of Expektra's forecasting method can be used to forecast wind power generation. We observe that this ANN is able to achieve performance comparable to other methods published in Hong et al. [2014]; additional work is still needed to obtain state-of-the-art results. Numenta's model is able to beat the reference model but needs additional work before we can draw definite conclusions about its performance. Constructing a custom encoder could be a first step in that direction.


Bibliography

Subutai Ahmad and Jeff Hawkins. Properties of sparse distributed representations and their application to hierarchical temporal memory. arXiv preprint arXiv:1503.07469, 2015.

T. C. Akinci. Short term wind speed forecasting with ANN in Batman, Turkey. Elektronika ir Elektrotechnika, 107(1):41–45, 2015.

E. Michael Azoff. Neural network time series forecasting of financial markets. John Wiley & Sons, Inc., 1994.

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13(1):281–305, 2012.

Daniel P. Buxhoeveden and Manuel F. Casanova. The minicolumn hypothesis in neuroscience. Brain, 125(5):935–951, 2002.

Erasmo Cadenas and Wilfrido Rivera. Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks. Renewable Energy, 34(1):274–278, 2009.

Dmitri B. Chklovskii, B. W. Mel, and K. Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782–788, 2004.

A. Costa, A. Crespo, J. Navarro, G. Lizcano, H. Madsen, and E. Feitosa. A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews, 12(6):1725–1744, August 2008. ISSN 1364-0321. doi: 10.1016/j.rser.2007.01.015. URL http://dx.doi.org/10.1016/j.rser.2007.01.015.

Russ C. Eberhart and James Kennedy. A new optimizer using particle swarm theory. In Proceedings of the sixth international symposium on micro machine and human science, volume 1, pages 39–43. New York, NY, 1995.

Shu Fan, James R. Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. Energy Conversion, IEEE Transactions on, 24(2):474–482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento: a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276. Citeseer, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition, EWEC 2008. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving hierarchical temporal memory-based trading models. Springer, 2013.

M. A. Gaertner, C. Gallardo, C. Tejeda, N. Martínez, S. Calabria, N. Martínez, and B. Fernández. The Casandra project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. In Proc. of the 2001 European Wind Energy Conference, EWEC'01, Copenhagen, Denmark, pages 777–780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: A literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G. Lobo. Sipreólico: wind power prediction tool for the Spanish peninsular power system. In Proceedings of the CIGRÉ 40th general session and exhibition, París, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357–363, 2014.

Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41–46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585–592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694–709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference, EWEC'07, 2007.

G. Kariniotakis. Position paper on Joule project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J. Halliday, R. Brownsword, Ignacio Marti, Ana Maria Palomares, I. Cruz, H. Madsen, T. S. Nielsen, Henrik Aa. Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power: overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, 2006.

G. N. Kariniotakis, G. S. Stavrakakis, and E. F. Nogaret. Wind power forecasting using advanced neural networks models. Energy Conversion, IEEE Transactions on, 11(4):762–767, 1996.

Stanley J. Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326–334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275–293, 2009.

Lars Landberg and Simon J. Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171–195, 1994.

S. Lang, C. Mohrlen, J. Jorgensen, B. Ó Gallachóir, and E. McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International Solar Energy Society UK Section Conference, volume 85, page 89, 2006.

S. M. Lawan, W. A. W. Z. Abidin, W. Y. Chai, A. Baharun, and T. Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760–1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. Smart Grid, IEEE Transactions on, 5(1):501–510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets, 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164–168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313–2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154–3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa. Nielsen, and Torben S. Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475–489, 2005.

Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431–441, 1963.

Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons. 1969.

Marvin Lee Minsky and Oliver G. Selfridge. Learning in random nets. MIT Lincoln Laboratory, 1960.

Jorge J. Moré. The Levenberg–Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105–116. Springer, 1978.

Henrik Aa. Nielsen, Torben S. Nielsen, Henrik Madsen, Maria J. Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471–482, 2007.

Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29–34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power. 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159–167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms, v0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33–57, 2007.

A. Rodrigues, J. A. Peças Lopes, P. Miranda, L. Palma, C. Monteiro, R. Bessa, J. Sousa, C. Rodrigues, and J. Matos. EPREV: a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference, EWEC, volume 7, 2007.

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5(3):1, 1988.

Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85–117, 2015.

S. Sinkevicius, R. Simutis, and V. Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91–96, 2011.

Ke-Sheng Wang, Vishal S. Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61–69, 2014.

WWEA. 2014 half-year report. Technical report, WWEA, pages 1–8, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365–376, 2013.

Hao Yu and Bogdan M. Wilamowski. Levenberg–Marquardt training. Industrial Electronics Handbook, 5:12-1, 2011.

Appendix A

Hyper-parameters

Parameters for the spatial pooler

Alias                 Default   Description
columnCount           -         The number of cell columns in a cortical region.
globalInhibition      false     If true, the winning columns in the inhibition phase are selected as the most active columns from the region as a whole.
numActivePerInhArea   10        The maximum number of active columns per inhibition area.
synPermActiveInc      0.1       The amount by which an active synapse is incremented in each round, specified as a percent of a fully grown synapse.
synPermConnected      0.10      Controls the threshold at which synapses are considered connected.
synPermInactiveDec    0.01      The amount by which an inactive synapse is decremented in each round.
potentialRadius       16        Determines the extent of the input that each column can potentially be connected to.

Table A.1: Table containing configuration parameters for the spatial pooler.


Parameters for the scalar encoder

Alias        Symbol   Description
w            w        Number of bits to set in the output.
minval       vmin     The lower bound of the input value.
maxval       vmax     The upper bound of the input value.
n            n        Number of bits in the representation (n must be > w).
radius       r        Inputs separated by more than or equal to this distance will have non-overlapping representations.
resolution   ψ        Inputs separated by more than or equal to this distance will have different representations.

Table A.2: Configuration parameters for the scalar encoder.
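To make the roles of n, w, and the value bounds concrete, here is a minimal illustrative reimplementation of such a scalar encoder. This is not the NuPIC ScalarEncoder itself; the function name and default values below are assumptions chosen for the sketch:

```python
def encode_scalar(value, n=14, w=3, minval=0.0, maxval=10.0):
    """Illustrative scalar encoder: map a value in [minval, maxval] to an
    n-bit array with w contiguous active bits; nearby values overlap."""
    value = max(minval, min(value, maxval))   # clip to the encoder's range
    buckets = n - w + 1                       # possible start positions
    i = int(round((value - minval) / (maxval - minval) * (buckets - 1)))
    bits = [0] * n
    for j in range(i, i + w):                 # activate w contiguous bits
        bits[j] = 1
    return bits
```

With these assumed defaults, values at opposite ends of the range get disjoint representations, mirroring the radius/resolution behaviour described in the table.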


Parameters for the temporal memory

Alias                   Default   Description
activationThreshold     12        Activation threshold for segments.
cellsPerColumn          32        Number of cells per column.
columnCount             2048      The number of cell columns in a cortical region.
globalDecay             0.10      Decrements all synapses a little bit all the time.
initialPerm             0.11      Initial permanence value for a synapse.
inputWidth              -         Size of the input.
maxAge                  100000    Controls global decay. Global decay will only decay segments that have not been activated for maxAge iterations, and will only do the global decay loop every maxAge iterations.
maxSegmentsPerCell      -         The maximum number of segments a cell can have.
maxSynapsesPerSegment   -         The maximum number of synapses a segment can have.
minThreshold            8         The minimum required activity for a segment to learn.
newSynapseCount         15        The maximum number of synapses added to a segment during learning.
permanenceDec           0.10      How much permanence is removed from synapses when learning occurs.
permanenceInc           0.10      How much permanence is added to synapses when learning occurs.
temporalImp             cpp/py    Controls which temporal memory implementation to use.

Table A.3: Configuration parameters for the temporal memory.


Appendix B

Wind characteristics

Figure B.1: Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Figure B.2: Wind characteristics for WF 3–7, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Appendix C

Error Distribution


[Six histogram panels, "wf1 using nupic", for lead times 48, 40, 30, 20, 10, and 1; error on the x-axis (−1.0 to 1.0), frequency on the y-axis (0 to 70).]

Figure C.1: Error distribution for different lead times, WF 1 (NuPIC model).


[Six histogram panels, "wf2 using nupic", for lead times 48, 40, 30, 20, 10, and 1; error on the x-axis (−1.0 to 1.0), frequency on the y-axis (0 to 70).]

Figure C.2: Error distribution for different lead times, WF 2 (NuPIC model).


[Six histogram panels, "wf3 using nupic", for lead times 48, 40, 30, 20, 10, and 1; error on the x-axis (−1.0 to 1.0), frequency on the y-axis (0 to 70).]

Figure C.3: Error distribution for different lead times, WF 3 (NuPIC model).


[Six histogram panels, "wf4 using nupic", for lead times 48, 40, 30, 20, 10, and 1; error on the x-axis (−1.0 to 1.0), frequency on the y-axis (0 to 70).]

Figure C.4: Error distribution for different lead times, WF 4 (NuPIC model).


[Six histogram panels, "wf5 using nupic", for lead times 48, 40, 30, 20, 10, and 1; error on the x-axis (−1.0 to 1.0), frequency on the y-axis (0 to 70).]

Figure C.5: Error distribution for different lead times, WF 5 (NuPIC model).


[Six histogram panels, "wf6 using nupic", for lead times 48, 40, 30, 20, 10, and 1; error on the x-axis (−1.0 to 1.0), frequency on the y-axis (0 to 70).]

Figure C.6: Error distribution for different lead times, WF 6 (NuPIC model).


[Six histogram panels, "wf7 using nupic", for lead times 48, 40, 30, 20, 10, and 1; error on the x-axis (−1.0 to 1.0), frequency on the y-axis (0 to 70).]

Figure C.7: Error distribution for different lead times, WF 7 (NuPIC model).


[Six histogram panels, "wf1 using expektra", for lead times 48, 40, 30, 20, 10, and 1; error on the x-axis (−1.0 to 1.0), frequency on the y-axis (0 to 70).]

Figure C.8: Error distribution for different lead times, WF 1 (Expektra model).


[Six histogram panels, "wf2 using expektra", for lead times 48, 40, 30, 20, 10, and 1; error on the x-axis (−1.0 to 1.0), frequency on the y-axis (0 to 70).]

Figure C.9: Error distribution for different lead times, WF 2 (Expektra model).


[Six histogram panels, "wf3 using expektra", for lead times 48, 40, 30, 20, 10, and 1; error on the x-axis (−1.0 to 1.0), frequency on the y-axis (0 to 70).]

Figure C.10: Error distribution for different lead times, WF 3 (Expektra model).


[Six histogram panels, "wf4 using expektra", for lead times 48, 40, 30, 20, 10, and 1; error on the x-axis (−1.0 to 1.0), frequency on the y-axis (0 to 70).]

Figure C.11: Error distribution for different lead times, WF 4 (Expektra model).


[Six histogram panels, "wf5 using expektra", for lead times 48, 40, 30, 20, 10, and 1; error on the x-axis (−1.0 to 1.0), frequency on the y-axis (0 to 70).]

Figure C.12: Error distribution for different lead times, WF 5 (Expektra model).


[Six histogram panels, "wf6 using expektra", for lead times 48, 40, 30, 20, 10, and 1; error on the x-axis (−1.0 to 1.0), frequency on the y-axis (0 to 70).]

Figure C.13: Error distribution for different lead times, WF 6 (Expektra model).


[Six histogram panels, "wf7 using expektra", for lead times 48, 40, 30, 20, 10, and 1; error on the x-axis (−1.0 to 1.0), frequency on the y-axis (0 to 70).]

Figure C.14: Error distribution for different lead times, WF 7 (Expektra model).


List of Figures

2.1 A figure that presents the general outline when forecasting using the statistical approach  6
2.2 A figure that presents the general steps when forecasting using a physical model  7
3.1 The perceptron  20
3.2 Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of H hidden layers, as well as M input connections. Each edge seen in this graph has a weight wij associated with it  21
3.3 Information flow of a single-region predictive model created with the OPF  23
3.4 The CLAClassifier  28
3.5 Training an OPF model  29
4.1 Left: cumulated probability of wind speed. Right: scatter diagram of the power curve, production vs. wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)  31
4.2 Left: wind speed vs. production. Right: wind speed vs. direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)  32
4.3 Left: wind speed vs. production. Right: wind speed vs. direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)  32
4.4 Different error measurements for WF 1  33
4.5 Different error measurements for WF 2  34
4.6 Different error measurements for WF 3  35
4.7 Different error measurements for WF 4  36
4.8 Different error measurements for WF 5  37
4.9 Different error measurements for WF 6  38
4.10 Different error measurements for WF 7  39
4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point, i.e. the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind  41
4.12 Plot showing the unoptimized version vs. the optimized one when training a network with 10 input neurons, 10/15/20/25 hidden neurons, and 1 output neuron. Training was done for 100 epochs using LM  42
4.13 Summarized average improvement over all wind farms with 95% confidence intervals  43
B.1 Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power  59
B.2 Wind characteristics for WF 3–7, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power  60
C.1 Error distribution for different lead times, WF 1  62
C.2 Error distribution for different lead times, WF 2  63
C.3 Error distribution for different lead times, WF 3  64
C.4 Error distribution for different lead times, WF 4  65
C.5 Error distribution for different lead times, WF 5  66
C.6 Error distribution for different lead times, WF 6  67
C.7 Error distribution for different lead times, WF 7  68
C.8 Error distribution for different lead times, WF 1  69
C.9 Error distribution for different lead times, WF 2  70
C.10 Error distribution for different lead times, WF 3  71
C.11 Error distribution for different lead times, WF 4  72
C.12 Error distribution for different lead times, WF 5  73
C.13 Error distribution for different lead times, WF 6  74
C.14 Error distribution for different lead times, WF 7  75

List of Tables

3.1 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, missing data with power observations are available for updating the models  16
3.2 Preprocessed features from the dataset. The Forecast category represents the forecasts given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the feature we will use in training and testing  17
3.3 Example, where n = 14, r = 5, ψ = 1, of various encoded scalar values using a ScalarEncoder  24
4.1 NRMSE scores of the entries published in [Hong et al., 2014]. The NuPIC model and Expektra model are added so we can easily compare the results  40
A.1 Configuration parameters for the spatial pooler  55
A.2 Configuration parameters for the scalar encoder  56
A.3 Configuration parameters for the temporal memory  57


www.kth.se


Chapter 4

Result

This chapter presents the primary results obtained in this study. Figures 4.4–4.10 contain different error measurements on the test set for the respective wind farms. We see that the Expektra ANN model performs well over all wind farms, while the NuPIC model performs worse than Expektra but in general still better than our reference model. NuPIC is off target, with a bias error on most wind farms. The graphs also indicate that NuPIC is unable to pick up some trends, as seen in the cumulated ε² graphs. The Expektra model, on the other hand, shows no clear problems in cumulated ε² and is on target on all wind farms; appendix C has been included to reflect this for different lead times.
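The error measures used throughout this chapter can be sketched as follows. The exact normalization used in the thesis is not restated here, so treat these definitions (bias, RMSE, and MAE of normalized power, plus the running sum of squared errors) as assumptions:

```python
import numpy as np

def error_metrics(y_true, y_pred):
    """Sketch of the per-farm error measures: NBIAS (mean error),
    NRMSE (root mean squared error), NMAE (mean absolute error),
    and the cumulated eps^2 curve (running sum of squared errors).
    Inputs are normalized power in [0, 1]."""
    err = np.asarray(y_pred) - np.asarray(y_true)
    return {
        "NBIAS": float(err.mean()),
        "NRMSE": float(np.sqrt((err ** 2).mean())),
        "NMAE": float(np.abs(err).mean()),
        "cum_eps2": np.cumsum(err ** 2),
    }
```

A flat cumulated ε² curve then corresponds to a model that is "on target", while systematic drift in NBIAS shows up as the bias error mentioned above.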

[Left panel: cumulative probability (0 to 1.0) vs. wind speed (0 to 20 m/s). Right panel: normalized power output vs. wind speed, with wind direction indicated by color.]

Figure 4.1: Left: cumulated probability of wind speed. Right: scatter diagram of the power curve, production vs. wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


Figure 4.2: Left: wind speed vs. production. Right: wind speed vs. direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).

Figure 4.3: Left: wind speed vs. production. Right: wind speed vs. direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


[Four panels for Wind Farm 1 comparing Expektra, NuPIC, and Persistence: NBIAS vs. look-ahead time k (in hours), NRMSE vs. look-ahead time k, cumulated ε² vs. time (in hours), and NMAE vs. look-ahead time k.]

Figure 4.4: Different error measurements for WF 1.


[Four panels for Wind Farm 2 comparing Expektra, NuPIC, and Persistence: NBIAS vs. look-ahead time k (in hours), NRMSE vs. look-ahead time k, cumulated ε² vs. time (in hours), and NMAE vs. look-ahead time k.]

Figure 4.5: Different error measurements for WF 2.


[Four panels for Wind Farm 3 comparing Expektra, NuPIC, and Persistence: NBIAS vs. look-ahead time k (in hours), NRMSE vs. look-ahead time k, cumulated ε² vs. time (in hours), and NMAE vs. look-ahead time k.]

Figure 4.6: Different error measurements for WF 3.


[Four panels for Wind Farm 4 comparing Expektra, NuPIC, and Persistence: NBIAS vs. look-ahead time k (in hours), NRMSE vs. look-ahead time k, cumulated ε² vs. time (in hours), and NMAE vs. look-ahead time k.]

Figure 4.7: Different error measurements for WF 4.


[Four panels for Wind Farm 5 comparing Expektra, NuPIC, and Persistence: NBIAS vs. look-ahead time k (in hours), NRMSE vs. look-ahead time k, cumulated ε² vs. time (in hours), and NMAE vs. look-ahead time k.]

Figure 4.8: Different error measurements for WF 5.


[Four panels for Wind Farm 6 comparing Expektra, NuPIC, and Persistence: NBIAS vs. look-ahead time k (in hours), NRMSE vs. look-ahead time k, cumulated ε² vs. time (in hours), and NMAE vs. look-ahead time k.]

Figure 4.9: Different error measurements for WF 6.


[Four panels for Wind Farm 7 comparing Expektra, NuPIC, and Persistence: NBIAS vs. look-ahead time k (in hours), NRMSE vs. look-ahead time k, cumulated ε² vs. time (in hours), and NMAE vs. look-ahead time k.]

Figure 4.10: Different error measurements for WF 7.


Wind Farm

User           1      2      3      4      5      6      7      All
Leustagos      0.145  0.138  0.168  0.144  0.158  0.133  0.140  0.146
DuckTile       0.143  0.145  0.172  0.145  0.165  0.137  0.146  0.148
MZ             0.141  0.151  0.174  0.145  0.167  0.141  0.145  0.149
Propeller      0.144  0.153  0.177  0.147  0.175  0.141  0.147  0.152
Duehee Lee     0.157  0.144  0.176  0.160  0.169  0.154  0.148  0.155
Expektra       0.165  0.158  0.184  0.164  0.179  0.153  0.153  0.165
MTU EE5260     0.161  0.172  0.193  0.162  0.192  0.156  0.160  0.168
SunWind        0.174  0.177  0.193  0.176  0.179  0.157  0.162  0.172
ymzsmsd        0.163  0.186  0.200  0.164  0.192  0.162  0.167  0.174
4138 Kalchas   0.180  0.179  0.197  0.175  0.200  0.160  0.165  0.177
NuPIC          0.243  0.254  0.264  0.310  0.290  0.224  0.240  0.264
Persistence    0.302  0.338  0.373  0.364  0.388  0.341  0.361  0.355

Table 4.1: NRMSE scores of the entries published in [Hong et al., 2014]. The NuPIC model and Expektra model are added so we can easily compare the results.

4.1 Experimental results

Looking at the primary results presented in this study, we see in Table 4.1 that Expektra's ANN model is able to achieve performance comparable to the other methods published with Hong et al. [2014]. This can be used as a starting point for further investigations. A similar approach to Expektra's is Duehee Lee's [Lee and Baldick, 2014], as both methods are based on the same neural network architecture, but with different optimization techniques. Duehee Lee's network is structured slightly differently from Expektra's model, which consists of only one output node, whereas Duehee Lee uses five (representing five prediction steps ahead). Besides these differences, Duehee Lee has also included additional sub-models to create a prediction ensemble, blending the output of the ANN with a Gaussian Process (GP) to improve very short-term predictions. MTU EE5260 is also a neural network approach that converts meteorological forecasts to power, and here the score is very close. The HTM CLA beats the persistence model but needs more work to outperform the other models in GEFCom.


4.2 Input Importance

For interpretation purposes, in order to understand the model better, an analysis of the relative input parameter importance was performed. This analysis is illustrated in Figure 4.11. Each box represents noise added to that channel, which will result in a higher NRMSE score if that feature was important. The reference point, "all-channels", represents the error distribution of the model with no input replacement. We clearly see in this figure that the wind speed channel ws is the most important attribute: adding noise to this channel greatly affects the NRMSE score. We also see that the wind components u, v show little to no influence, and that the timestamp-related inputs hours and week both indicate that there are seasonal and daily trends present in the dataset.
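The HIPR procedure described above can be sketched as follows. This is a generic reimplementation under an assumed model interface, not the code used in the thesis: each input channel is in turn replaced by uniform noise drawn from that channel's observed range, and the resulting NRMSE is compared against the "all-channels" reference:

```python
import numpy as np

def hipr_importance(model_predict, X, y, seed=0):
    """Holdback Input Randomization (HIPR) sketch: randomize one input
    channel at a time and record the NRMSE; channels whose randomization
    raises the error the most are the most important."""
    rng = np.random.default_rng(seed)

    def nrmse(y_true, y_pred):
        return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

    scores = {"all-channels": nrmse(y, model_predict(X))}  # reference point
    for ch in range(X.shape[1]):
        Xn = X.copy()
        Xn[:, ch] = rng.uniform(X[:, ch].min(), X[:, ch].max(), size=len(X))
        scores[ch] = nrmse(y, model_predict(Xn))
    return scores
```

In practice the randomization would be repeated several times per channel to obtain the error distributions shown as boxes in Figure 4.11.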

[Box plots of NRMSE (y-axis, 0.2 to 0.6) for each input channel: all-channels, hours, u, v, week, ws, ws−1, ws−2, ws−3, ws+1, ws+2, ws+3.]

Figure 4.11: Relative input parameter importance using HIPR. "all-channels" reflects the reference point, i.e. the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind.


4.2.1 Adaptation and Optimization

All training for Expektra's model was performed on a MacBook Pro with an Intel Core 2 Duo processor (P8600, 2.40 GHz) on a 64-bit operating system with 4 GB of installed RAM. After adapting the code to run experiments on the GEFCom dataset, an investigation was performed to evaluate the performance of the core source code given by Expektra. This investigation identified some slow parts of the code. After addressing these issues, mainly by using a Math.NET native library instead of the managed provider, a speed test was performed comparing the unoptimized version with the optimized one. Figure 4.12 shows the speed-up achieved during training using the LM algorithm.
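The LM update at the core of this training loop takes the damped Gauss-Newton form w ← w − (JᵀJ + µI)⁻¹Jᵀe [Yu and Wilamowski, 2011]. A minimal numpy sketch of one such step (illustrative only, not Expektra's implementation):

```python
import numpy as np

def lm_step(J, e, w, mu=0.01):
    """One Levenberg-Marquardt update: J is the Jacobian of the
    per-sample errors e with respect to the weights w, and mu is the
    damping term blending gradient descent and Gauss-Newton."""
    H = J.T @ J + mu * np.eye(len(w))   # damped approximate Hessian
    return w - np.linalg.solve(H, J.T @ e)
```

Each epoch recomputes J and e, and mu is typically decreased after a successful step and increased otherwise; the matrix solve is where a native linear algebra backend pays off.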

[Plot of training time (y-axis, 0 to 1,600,000) vs. number of hidden neurons (10, 15, 20, 25) for the normal and optimized versions.]

Figure 4.12: Plot showing the unoptimized version vs. the optimized one when training a network with 10 input neurons, 10/15/20/25 hidden neurons, and 1 output neuron. Training was done for 100 epochs using LM.

4.3 Summary

A plot showing the average NRMSE improvement over the persistence model can be seen in Figure 4.13. We see that both models are able to perform better than persistence; Expektra's model performs better than the NuPIC model over the whole forecast horizon, and NuPIC performs better than persistence towards the end of the forecast.
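The improvement curve is, under an assumed but standard definition, the relative reduction in NRMSE with respect to the persistence model:

```python
def improvement_over_persistence(nrmse_model, nrmse_persistence):
    """Improvement (%) over the persistence reference. For example,
    using the 'All' column of Table 4.1, Expektra's 0.165 against
    persistence's 0.355 is roughly a 53.5% improvement."""
    return 100.0 * (nrmse_persistence - nrmse_model) / nrmse_persistence
```

Averaging this quantity per look-ahead time over all seven wind farms yields curves of the kind plotted below.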

[Plot of improvement (%) in NRMSE over persistence (y-axis, 0 to 100) vs. look-ahead time (0 to 50 hours) for Expektra and NuPIC.]

Figure 4.13: Summarized average improvement over all wind farms, with 95% confidence intervals.


Chapter 5

Discussion

With the exponential growth of computer power it becomes easier to study deeper and more complex networks, and with recent advances in the area of deep learning a new interest in ANNs has resurged. In practice, ANNs have been used by most groups in the field of WPF, but these networks never caught on, as it was argued that the improvements made by ANNs were usually not enough to justify the extra effort of training them [Giebel et al., 2011]. This is steadily becoming less of an issue as computation becomes cheaper and bigger networks perform better.

By using a simple MLP network we observe that we are able to obtain results similar in performance to other models published in Hong et al. [2014]. This can be used as a starting point for further investigations. The GEFCom dataset has a limited number of input features; if we had access to a wider range of them, the power of neural networks could be studied more in depth. Given the number of features in the dataset, this is harder to do.

Using persistence as a reference model was done in order to have the same baseline as the one used in GEFCom, but a better reference model should be considered in the future, such as the one presented in Nielsen et al. [1998], especially if longer forecasts are to be considered.
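For concreteness, the persistence reference simply repeats the last observed power value for every lead time, while the new reference of Nielsen et al. [1998] blends persistence with the long-term mean. A sketch of both, where the correlation coefficient a_k is an assumed input:

```python
def persistence_forecast(p_t, horizon):
    """Persistence: the forecast for every lead time 1..horizon equals
    the last observed normalized power p_t."""
    return [p_t] * horizon

def new_reference_forecast(p_t, p_mean, a_k):
    """Nielsen et al. [1998]-style reference: a weighted blend of
    persistence and the long-term mean power p_mean, with a_k the
    lag-k correlation of the power series."""
    return a_k * p_t + (1.0 - a_k) * p_mean
```

Because a_k shrinks with lead time, the blended reference approaches the climatological mean for long horizons, which is exactly where plain persistence becomes a weak baseline.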

5.1 Method development issues

Working with Expektra's model is very straightforward, and no major issues were encountered during development, but some issues were encountered when working with NuPIC:¹

¹ It should also be pointed out that it helps to have the people who developed the code on the spot, and this is probably the main reason there was so little trouble.


1. Installing NuPIC is difficult. This seems to be a general problem, judging by posts on the NuPIC mailing list.² This is a very important issue to fix if they want more people to work with this model.

2. The built-in swarming (PSO) did not work well, especially for slightly larger datasets and for 48 prediction steps. The main problems were fitting everything into memory and that the code ran too slowly when swarming over multiple steps ahead, making it practically impossible to use. Scaling down the problem to just a few prediction steps and multiple models helped, but the swarm model kept dismissing wind speed as an important feature, which was fixed by explicitly telling it not to touch that encoder.

3. Inconsistencies between the white-paper documentation of HTM CLA and the actual implementation make it hard to understand the underlying principles of NuPIC. NuPIC presents a very complex network, and any inconsistencies in the documentation make it hard to understand. The code is quite well documented, but the additional material is a bit sparse.

These issues are all understandable and are continuously being improved upon. NuPIC is still a young platform, so finding issues is to be expected given the complexity and research nature of the project. Training the CLAClassifier to use many prediction steps instead of just one will most likely reduce the bias error seen in the results. To investigate this, a more powerful computer is needed.

There are some differences in the input between the models. This setup was created because an OPF model needs to encode the value it is going to predict. SCADA data is available, so this is not an issue, but it may introduce some unfairness between NuPIC and Expektra. In the current setup NuPIC does not seem to gain anything extra from the additional information, and other models in the competition will most likely use different inputs as well.

In general, NuPIC differs in how you feed it data: you only send in the front of the signal, and temporal context is instead achieved through the temporal memory.

[2] http://numenta.org/lists

5.2 Future improvements and directions

One issue with the GEFCom dataset is that it does not contain any specific information about the location of the 7 wind farms. It is worth taking a look at how to handle different models that are specialized for different types of terrain, as this has been shown to increase performance [Kariniotakis et al., 2006]. Another thing to investigate could be a more advanced scheme for hyperparameter selection: Jursa and Rohrig [2008] show that the WPF error can be reduced by smarter use of optimization algorithms for feature and hyperparameter selection, which should improve performance.

In general, more types of inputs from sources like SCADA and NWP systems could provide valuable information for better predictions. It is clear that wind speed is a key feature for producing good forecasts, which is also supported by the input analysis in Section 4.2. It has been argued that the error in WPF models stems largely from wrong weather forecasts [Giebel et al., 2011], so identifying good sources of weather forecasts would be desirable for more effective model learning. Combining several different weather forecasts could also be investigated, as Nielsen et al. [2007] showed that power forecasts based on a number of different meteorological forecasts were better than forecasts based on a single source.
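The combination idea can be illustrated with a minimal sketch: fit least-squares weights for the individual meteorological forecasts on historical data. Nielsen et al. [2007] use a more elaborate adaptive scheme; this only shows the basic principle, with toy data standing in for real NWP-based forecasts.

```python
import numpy as np

def fit_combination_weights(forecasts, observed):
    """forecasts: (n_samples, n_sources) matrix of competing power
    forecasts for the same target.  Returns least-squares weights
    (the first element is an intercept)."""
    X = np.column_stack([np.ones(len(observed)), forecasts])
    w, *_ = np.linalg.lstsq(X, observed, rcond=None)
    return w

def combine(forecasts, w):
    """Apply the fitted weights to new forecasts."""
    X = np.column_stack([np.ones(len(forecasts)), forecasts])
    return X @ w

# Toy data: two noisy forecasts of the same normalized power series.
rng = np.random.default_rng(0)
truth = rng.uniform(0.0, 1.0, 200)
f1 = truth + rng.normal(0.0, 0.05, 200)   # the better source
f2 = truth + rng.normal(0.0, 0.15, 200)   # the worse source
F = np.column_stack([f1, f2])
w = fit_combination_weights(F, truth)
combined = combine(F, w)
```

In-sample, the combined forecast can be no worse than either source alone, since using a single source is itself one admissible choice of weights.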

Regarding HTM CLA, it is worth considering implementing a custom encoder, one specifically targeted at data concerning wind farms. SCADA data could also be streamed directly into a NuPIC model, which would allow for some interesting online predictions and anomaly detection.

Training times are always an issue, especially for very large models. Both NuPIC and Expektra's model would benefit from being implemented in a more parallel fashion, as these algorithms are well suited for parallel computing. The speed-up would be of great value, so implementing a GPU version of the algorithms could be worthwhile.

5.3 Conclusions

The field of energy forecasting is still young, and the necessity for good forecasting techniques is constantly growing. This initial study shows that part of Expektra's forecasting method can be used to forecast wind power generation. We observe that this ANN is able to achieve performance comparable to other methods published in Hong et al. [2014]. Additional work is still needed to obtain state-of-the-art results. Numenta's model is able to beat the reference model, but it needs additional work before we can draw definite conclusions about its performance; constructing a custom encoder could be a first step in that direction.


Bibliography

Subutai Ahmad and Jeff Hawkins. Properties of sparse distributed representations and their application to hierarchical temporal memory. arXiv preprint arXiv:1503.07469, 2015.

T. C. Akinci. Short term wind speed forecasting with ANN in Batman, Turkey. Elektronika ir Elektrotechnika, 107(1):41–45, 2015.

E. Michael Azoff. Neural Network Time Series Forecasting of Financial Markets. John Wiley & Sons, Inc., 1994.

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13(1):281–305, 2012.

Daniel P. Buxhoeveden and Manuel F. Casanova. The minicolumn hypothesis in neuroscience. Brain, 125(5):935–951, 2002.

Erasmo Cadenas and Wilfrido Rivera. Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks. Renewable Energy, 34(1):274–278, 2009.

Dmitri B. Chklovskii, B. W. Mel, and K. Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782–788, 2004.

A. Costa, A. Crespo, J. Navarro, G. Lizcano, H. Madsen, and E. Feitosa. A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews, 12(6):1725–1744, August 2008. ISSN 1364-0321. doi: 10.1016/j.rser.2007.01.015. URL http://dx.doi.org/10.1016/j.rser.2007.01.015.

Russ C. Eberhart and James Kennedy. A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science, volume 1, pages 39–43, New York, NY, 1995.


Shu Fan, James R. Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. Energy Conversion, IEEE Transactions on, 24(2):474–482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento: a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276. Citeseer, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition EWEC 2008, 6 pages. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving hierarchical temporal memory-based trading models. Springer, 2013.

M. A. Gaertner, C. Gallardo, C. Tejeda, N. Martínez, S. Calabria, N. Martínez, and B. Fernández. The CASANDRA project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. In Proc. of the 2001 European Wind Energy Conference, EWEC'01, Copenhagen, Denmark, pages 777–780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: A literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G. Lobo. Sipreólico: wind power prediction tool for the Spanish peninsular power system. In Proceedings of the CIGRÉ 40th General Session and Exhibition, Paris, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On Intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357–363, 2014.


Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41–46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585–592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694–709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference, EWEC'07, 2007.

G. Kariniotakis. Position paper on JOULE project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J. Halliday, R. Brownsword, Ignacio Marti, Ana Maria Palomares, I. Cruz, H. Madsen, T. S. Nielsen, Henrik Aa. Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power: overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, 10 pages, 2006.

G. N. Kariniotakis, G. S. Stavrakakis, and E. F. Nogaret. Wind power forecasting using advanced neural networks models. Energy Conversion, IEEE Transactions on, 11(4):762–767, 1996.

Stanley J. Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326–334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275–293, 2009.

Lars Landberg and Simon J. Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171–195, 1994.

S. Lang, C. Mohrlen, J. Jorgensen, B. O. Gallachóir, and E. McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International Solar Energy Society UK Section Conference, volume 85, page 89, 2006.


S. M. Lawan, W. A. W. Z. Abidin, W. Y. Chai, A. Baharun, and T. Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760–1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. Smart Grid, IEEE Transactions on, 5(1):501–510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets, 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164–168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313–2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154–3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa. Nielsen, and Torben S. Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475–489, 2005.

Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431–441, 1963.

Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons. 1969.

Marvin Lee Minsky and Oliver G. Selfridge. Learning in random nets. MIT Lincoln Laboratory, 1960.

Jorge J. Moré. The Levenberg-Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105–116. Springer, 1978.

Henrik Aa. Nielsen, Torben S. Nielsen, Henrik Madsen, Maria J. Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471–482, 2007.


Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29–34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power, 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159–167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms, v0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33–57, 2007.

A. Rodrigues, J. A. Peças Lopes, P. Miranda, L. Palma, C. Monteiro, R. Bessa, J. Sousa, C. Rodrigues, and J. Matos. EPREV: a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference EWEC, volume 7, 2007.

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5:3, 1988.

Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85–117, 2015.

S. Sinkevicius, R. Simutis, and V. Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91–96, 2011.

Ke-Sheng Wang, Vishal S. Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61–69, 2014.

WWEA. 2014 half-year report, pages 1–8. Technical report, WWEA, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365–376, 2013.

Hao Yu and Bogdan M. Wilamowski. Levenberg-Marquardt training. Industrial Electronics Handbook, 5:12-1, 2011.


Appendix A

Hyper-parameters

Parameters for the spatial pooler:

Alias                Default  Description
columnCount          -        The number of cell columns in a cortical region.
globalInhibition     false    If true, in the inhibition phase the winning columns are selected as the most active columns from the region as a whole.
numActivePerInhArea  10       The maximum number of active columns per inhibition area.
synPermActiveInc     0.1      The amount by which an active synapse is incremented in each round, specified as a percent of a fully grown synapse.
synPermConnected     0.10     Controls the threshold at which synapses are connected.
synPermInactiveDec   0.01     The amount by which an inactive synapse is decremented in each round.
potentialRadius      16       Determines the extent of the input that each column can potentially be connected to.

Table A.1: Configuration parameters for the spatial pooler.
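As an illustration of how a few of these parameters interact, the sketch below implements a heavily simplified spatial-pooler step with global inhibition: synapses count toward a column's overlap only when their permanence reaches synPermConnected, the numActivePerInhArea columns with the highest overlap win, and winning columns adapt with synPermActiveInc/synPermInactiveDec. This is not NuPIC's actual implementation, only the basic mechanism.

```python
import numpy as np

def spatial_pooler_step(input_bits, permanences,
                        syn_perm_connected=0.10, num_active=10):
    """One simplified spatial-pooler step.  A synapse contributes to a
    column's overlap only if its permanence >= synPermConnected; with
    global inhibition, the num_active columns with the highest overlap
    become active (cf. numActivePerInhArea)."""
    connected = (permanences >= syn_perm_connected).astype(int)
    overlaps = connected @ input_bits              # overlap per column
    winners = np.sort(np.argsort(overlaps)[::-1][:num_active])
    return winners, overlaps

def learn(permanences, input_bits, winners, inc=0.1, dec=0.01):
    """Winning columns reinforce synapses to active inputs
    (synPermActiveInc) and weaken the rest (synPermInactiveDec)."""
    for c in winners:
        permanences[c] = np.clip(
            permanences[c] + np.where(input_bits > 0, inc, -dec), 0.0, 1.0)

# 5 columns, 4 input bits; columns 2 and 4 only have weak synapses.
perms = np.array([[0.20, 0.20, 0.00, 0.00],
                  [0.00, 0.20, 0.20, 0.00],
                  [0.00, 0.00, 0.05, 0.05],
                  [0.20, 0.00, 0.00, 0.20],
                  [0.05, 0.05, 0.05, 0.05]])
active_input = np.array([1, 1, 1, 1])
winners, overlaps = spatial_pooler_step(active_input, perms, num_active=2)
```

Columns whose synapses all sit below synPermConnected get zero overlap and can never win, which is exactly why the permanence thresholds matter.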


Parameters for the scalar encoder:

Alias       Symbol  Description
w           w       Number of bits to set in the output.
minval      v_min   The lower bound of the input value.
maxval      v_max   The upper bound of the input value.
n           n       Number of bits in the representation (n must be > w).
radius      r       Inputs separated by more than or equal to this distance will have non-overlapping representations.
resolution  ψ       Inputs separated by more than or equal to this distance will have different representations.

Table A.2: Configuration parameters for the scalar encoder.
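A minimal sketch of the scalar encoder described by these parameters (a simplification of NuPIC's ScalarEncoder that ignores the radius/resolution distinction and periodic inputs):

```python
import numpy as np

def scalar_encode(value, minval, maxval, n, w):
    """Simplified scalar encoder: clip to [minval, maxval], map the
    value to one of (n - w + 1) start positions, and set w contiguous
    bits.  Nearby values share bits; values far enough apart get
    non-overlapping representations."""
    assert n > w, "n must be > w"
    v = min(max(value, minval), maxval)
    start = int(round((n - w) * (v - minval) / (maxval - minval)))
    bits = np.zeros(n, dtype=int)
    bits[start:start + w] = 1
    return bits

# With n = 14 and w = 5, the endpoints of [0, 10] share no bits.
low = scalar_encode(0, 0, 10, n=14, w=5)
high = scalar_encode(10, 0, 10, n=14, w=5)
```

The overlap between two encodings is what lets the spatial pooler treat similar wind speeds as similar inputs.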


Parameters for the temporal memory:

Alias                  Default  Description
activationThreshold    12       Activation threshold for segments.
cellsPerColumn         32       Number of cells per column.
columnCount            2048     The number of cell columns in a cortical region.
globalDecay            0.10     Decrements all synapses a little bit all the time.
initialPerm            0.11     Initial permanence value for a synapse.
inputWidth             -        Size of the input.
maxAge                 100000   Controls global decay. Global decay will only decay segments that have not been activated for maxAge iterations, and will only do the global decay loop every maxAge iterations.
maxSegmentsPerCell     -        The maximum number of segments a cell can have.
maxSynapsesPerSegment  -        The maximum number of synapses a segment can have.
minThreshold           8        The minimum required activity for a segment to learn.
newSynapseCount        15       The maximum number of synapses added to a segment during learning.
permanenceDec          0.10     How much permanence is removed from synapses when learning occurs.
permanenceInc          0.10     How much permanence is added to synapses when learning occurs.
temporalImp            cpp/py   Controls which temporal memory implementation to use.

Table A.3: Configuration parameters for the temporal memory.
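A few of these parameters can be illustrated with a simplified sketch of segment activation and permanence adaptation (not NuPIC's actual code; the 0.5 connected-permanence threshold is an assumption introduced here, and the defaults mirror Table A.3):

```python
import numpy as np

def segment_active(perms, presyn_active, connected=0.5,
                   activation_threshold=12):
    """A dendritic segment is active when at least activationThreshold
    of its connected synapses (permanence >= connected) have an active
    presynaptic cell."""
    live = (perms >= connected) & presyn_active
    return bool(live.sum() >= activation_threshold)

def adapt_segment(perms, presyn_active, inc=0.10, dec=0.10):
    """permanenceInc / permanenceDec: reinforce synapses whose
    presynaptic cell was active, punish the rest."""
    return np.clip(perms + np.where(presyn_active, inc, -dec), 0.0, 1.0)

# A tiny segment with four synapses; two are both connected and active.
perms = np.array([0.6, 0.6, 0.4, 0.6])
presyn = np.array([True, True, True, False])
```

Repeated adaptation pushes useful synapses past the connected threshold, which is how the temporal memory ends up predicting the next element of the sequence.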


Appendix B

Wind characteristics

Figure B.1: Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Figure B.2: Wind characteristics for WF 3-7, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Appendix C

Error Distribution

[Each figure below shows six histograms of the forecast error for one wind farm, one histogram per lead time (1, 10, 20, 30, 40 and 48 hours); the x-axis spans errors from −1.0 to 1.0 and the y-axis shows the frequency (0 to 70).]

Figure C.1: Error distribution for different lead times, WF 1 (NuPIC).

Figure C.2: Error distribution for different lead times, WF 2 (NuPIC).

Figure C.3: Error distribution for different lead times, WF 3 (NuPIC).

Figure C.4: Error distribution for different lead times, WF 4 (NuPIC).

Figure C.5: Error distribution for different lead times, WF 5 (NuPIC).

Figure C.6: Error distribution for different lead times, WF 6 (NuPIC).

Figure C.7: Error distribution for different lead times, WF 7 (NuPIC).

Figure C.8: Error distribution for different lead times, WF 1 (Expektra).

Figure C.9: Error distribution for different lead times, WF 2 (Expektra).

Figure C.10: Error distribution for different lead times, WF 3 (Expektra).

Figure C.11: Error distribution for different lead times, WF 4 (Expektra).

Figure C.12: Error distribution for different lead times, WF 5 (Expektra).

Figure C.13: Error distribution for different lead times, WF 6 (Expektra).

Figure C.14: Error distribution for different lead times, WF 7 (Expektra).

List of Figures

2.1 A figure that presents the general outline when forecasting using the statistical approach . . . 6
2.2 A figure that presents the general steps when forecasting using a physical model . . . 7
3.1 The perceptron . . . 20
3.2 Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of the H hidden layers, as well as M input connections. Each edge seen in this graph has a weight w_ij associated with it . . . 21
3.3 Information flow of a single region predictive model created with the OPF . . . 23
3.4 The CLAClassifier . . . 28
3.5 Training an OPF model . . . 29
4.1 Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) . . . 31
4.2 Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) . . . 32
4.3 Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) . . . 32
4.4 Different error measurement for WF 1 . . . 33
4.5 Different error measurement for WF 2 . . . 34
4.6 Different error measurement for WF 3 . . . 35
4.7 Different error measurement for WF 4 . . . 36
4.8 Different error measurement for WF 5 . . . 37
4.9 Different error measurement for WF 6 . . . 38
4.10 Different error measurement for WF 7 . . . 39
4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point; the reference point is the network when no channel has been exposed to noise. ws = wind speed, hours/week = timestamps, u, v is a directional vector of the wind . . . 41
4.12 Plot showing the unoptimized version vs the optimized when training a network with 10 input neurons; 10, 15, 20, 25 hidden neurons; 1 output neuron. Training was done for 100 epochs using LM . . . 42
4.13 Summarized average improvement over all wind farms, with 95% confidence intervals . . . 43
B.1 Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power . . . 59
B.2 Wind characteristics for WF 3-7, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power . . . 60
C.1 Error distribution for different lead times, WF 1 . . . 62
C.2 Error distribution for different lead times, WF 2 . . . 63
C.3 Error distribution for different lead times, WF 3 . . . 64
C.4 Error distribution for different lead times, WF 4 . . . 65
C.5 Error distribution for different lead times, WF 5 . . . 66
C.6 Error distribution for different lead times, WF 6 . . . 67
C.7 Error distribution for different lead times, WF 7 . . . 68
C.8 Error distribution for different lead times, WF 1 . . . 69
C.9 Error distribution for different lead times, WF 2 . . . 70
C.10 Error distribution for different lead times, WF 3 . . . 71
C.11 Error distribution for different lead times, WF 4 . . . 72
C.12 Error distribution for different lead times, WF 5 . . . 73
C.13 Error distribution for different lead times, WF 6 . . . 74
C.14 Error distribution for different lead times, WF 7 . . . 75

List of Tables

3.1 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, data with power observations is available for updating the models . . . 16
3.2 Preprocessed features from the dataset. The Forecast category represents the forecasts given by the NWP; multiple forecasts will exist for a given date. The latest issued forecast available is the feature we will use in training and testing . . . 17
3.3 Example, where n = 14, r = 5, ψ = 1, of various encoded scalar values using a ScalarEncoder . . . 24
4.1 NRMSE score of the entries published in Hong et al. [2014]. The NuPIC model and the Expektra model are added so we can easily compare the results . . . 40
A.1 Configuration parameters for the spatial pooler . . . 55
A.2 Configuration parameters for the scalar encoder . . . 56
A.3 Configuration parameters for the temporal memory . . . 57

www.kth.se


CHAPTER 4 RESULT

Figure 4.2: Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).

Figure 4.3: Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).


[Figure 4.4: Different error measurements for WF 1. Panels: NBIAS, NRMSE, and NMAE vs. look-ahead time k (1–48 hours), and cumulated ε² vs. time (0–8000 hours), for Expektra, NuPIC, and Persistence.]

[Figure 4.5: Different error measurements for WF 2. Panels: NBIAS, NRMSE, and NMAE vs. look-ahead time k (1–48 hours), and cumulated ε² vs. time (0–8000 hours), for Expektra, NuPIC, and Persistence.]

[Figure 4.6: Different error measurements for WF 3. Panels: NBIAS, NRMSE, and NMAE vs. look-ahead time k (1–48 hours), and cumulated ε² vs. time (0–8000 hours), for Expektra, NuPIC, and Persistence.]

[Figure 4.7: Different error measurements for WF 4. Panels: NBIAS, NRMSE, and NMAE vs. look-ahead time k (1–48 hours), and cumulated ε² vs. time (0–8000 hours), for Expektra, NuPIC, and Persistence.]

[Figure 4.8: Different error measurements for WF 5. Panels: NBIAS, NRMSE, and NMAE vs. look-ahead time k (1–48 hours), and cumulated ε² vs. time (0–8000 hours), for Expektra, NuPIC, and Persistence.]

[Figure 4.9: Different error measurements for WF 6. Panels: NBIAS, NRMSE, and NMAE vs. look-ahead time k (1–48 hours), and cumulated ε² vs. time (0–8000 hours), for Expektra, NuPIC, and Persistence.]

[Figure 4.10: Different error measurements for WF 7. Panels: NBIAS, NRMSE, and NMAE vs. look-ahead time k (1–48 hours), and cumulated ε² vs. time (0–8000 hours), for Expektra, NuPIC, and Persistence.]

                        Wind Farm
User           1      2      3      4      5      6      7      All
Leustagos      0.145  0.138  0.168  0.144  0.158  0.133  0.140  0.146
DuckTile       0.143  0.145  0.172  0.145  0.165  0.137  0.146  0.148
MZ             0.141  0.151  0.174  0.145  0.167  0.141  0.145  0.149
Propeller      0.144  0.153  0.177  0.147  0.175  0.141  0.147  0.152
Duehee Lee     0.157  0.144  0.176  0.160  0.169  0.154  0.148  0.155
Expektra       0.165  0.158  0.184  0.164  0.179  0.153  0.153  0.165
MTU EE5260     0.161  0.172  0.193  0.162  0.192  0.156  0.160  0.168
SunWind        0.174  0.177  0.193  0.176  0.179  0.157  0.162  0.172
ymzsmsd        0.163  0.186  0.200  0.164  0.192  0.162  0.167  0.174
4138 Kalchas   0.180  0.179  0.197  0.175  0.200  0.160  0.165  0.177
NuPIC          0.243  0.254  0.264  0.310  0.290  0.224  0.240  0.264
Persistence    0.302  0.338  0.373  0.364  0.388  0.341  0.361  0.355

Table 4.1: NRMSE scores of the entries published in Hong et al. [2014]. The NuPIC and Expektra models are added so we can easily compare the results.

4.1 Experimental results

Looking at the primary results presented in this study, we see in table 4.1 that Expektra's ANN model achieves performance comparable to the other methods published with Hong et al. [2014]. This can be used as a starting point for further investigations. A similar approach to Expektra's is that of Duehee Lee [Lee and Baldick, 2014], as both methods are based on the same neural network architecture but use different optimization techniques. Duehee Lee's network is also structured slightly differently: Expektra's model has a single output node, whereas Duehee Lee's uses five (representing five predictions ahead). Besides these differences, Duehee Lee also includes additional sub-models to create a prediction ensemble, blending the output of the ANN with a Gaussian Process (GP) to improve very short-term predictions. MTU EE5260 is also a neural network approach that converts meteorological forecasts to power, and its score is very close. The HTM CLA beats the persistence model but needs more work to outperform the other models in GEFCom.
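Table 4.1 ranks the entries by NRMSE. Assuming the score is the root-mean-square error normalized by installed capacity (GEFCom power observations are already normalized to [0, 1], so the capacity divisor defaults to 1), it can be sketched as:

```python
import math

def nrmse(predicted, observed, capacity=1.0):
    """Root-mean-square error normalized by installed capacity.

    GEFCom power values are already normalized to [0, 1], so
    capacity defaults to 1.0; this is a sketch of the metric,
    not the competition's official scoring code."""
    assert len(predicted) == len(observed) and predicted
    mse = sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)
    return math.sqrt(mse) / capacity

# A forecast that is off by 0.1 everywhere scores NRMSE = 0.1:
score = nrmse([0.5, 0.6, 0.7], [0.4, 0.5, 0.6])
```

Lower is better: Leustagos' overall 0.146 therefore corresponds to a typical error of roughly 15% of capacity.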


4.2 Input Importance

For interpretation purposes, in order to understand the model better, an analysis of the relative input parameter importance was performed. This analysis is illustrated in figure 4.11. Each box represents added noise on that channel; a higher NRMSE score indicates that the feature was important. A reference point, "all-channels", represents the error distribution of the model with no input replacement. We clearly see in this figure that the wind speed channel ws is the most important attribute: adding noise to this channel greatly affects the NRMSE score. We also see that the wind components u, v show little to no influence, and that the timestamp-related inputs hours and week both indicate that there are seasonal and daily trends present in the dataset.
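The Holdback Input Randomization procedure described above can be sketched as follows; the trained network is replaced by a hypothetical stand-in `model` function, and the noise range and trial count are illustrative assumptions rather than the settings used in the study:

```python
import random

def hipr_scores(model, inputs, targets, channel, trials=30, seed=0):
    """Holdback Input Randomization sketch: replace one input channel
    with uniform noise and collect the resulting NRMSE over several
    trials. A large increase over the no-replacement reference score
    indicates an important channel."""
    rng = random.Random(seed)
    scores = []
    for _ in range(trials):
        noisy = [list(row) for row in inputs]
        for row in noisy:
            row[channel] = rng.uniform(0.0, 1.0)  # wipe out the channel
        preds = [model(row) for row in noisy]
        mse = sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)
        scores.append(mse ** 0.5)
    return scores

# Stand-in "model" that only reads channel 0 (think: wind speed):
model = lambda row: row[0]
inputs = [[0.2, 0.9], [0.5, 0.1], [0.8, 0.4]]
targets = [0.2, 0.5, 0.8]

# Randomizing channel 0 degrades the score; randomizing channel 1 does not.
important = hipr_scores(model, inputs, targets, channel=0)
unused = hipr_scores(model, inputs, targets, channel=1)
```

Plotting the per-channel score distributions as box plots yields a figure of the kind shown below.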

[Figure 4.11: Relative input parameter importance using HIPR. Box plots of NRMSE (roughly 0.2–0.6) for the channels all-channels, hours, u, v, week, ws, ws−1, ws−2, ws−3, ws+1, ws+2, ws+3. "all-channels" reflects the reference point: the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v form a directional vector of the wind.]


4.2.1 Adaptation and Optimization

All training for Expektra's model was performed on a MacBook Pro with an Intel Core 2 Duo processor (P8600, 2.40 GHz), a 64-bit operating system, and 4 GB of installed RAM. After adapting the code to run experiments on the GEFCom dataset, the performance of the core source code provided by Expektra was evaluated. This investigation identified some slow parts of the code. After addressing these issues, mainly by using a Math.NET native library instead of the managed provider, a speed test was performed comparing the unoptimized version with the optimized one. Figure 4.12 shows the speed-up achieved during training using the LM algorithm.

[Figure 4.12: Training time of the unoptimized ("Normal") vs. the optimized version when training a network with 10 input neurons, 10/15/20/25 hidden neurons, and 1 output neuron. Training was run for 100 epochs using LM.]
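Expektra's implementation is not shown in this thesis; as a hedged illustration of the Levenberg-Marquardt update it trains with, here is a single damped Gauss-Newton step, Δw = (JᵀJ + λI)⁻¹Jᵀe, applied to a toy linear least-squares problem (all names and the damping value are illustrative):

```python
import numpy as np

def lm_step(w, x, y, lam=1e-2):
    """One Levenberg-Marquardt step for fitting y ≈ w[0] + w[1]*x.
    J is the Jacobian of the residuals with respect to the weights;
    lam is the damping factor that blends Gauss-Newton with gradient
    descent."""
    residuals = (w[0] + w[1] * x) - y
    J = np.column_stack([np.ones_like(x), x])   # d(residual)/dw
    H = J.T @ J + lam * np.eye(2)               # damped approximate Hessian
    return w - np.linalg.solve(H, J.T @ residuals)

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 + 0.5 * x          # noiseless target line
w = np.zeros(2)
for _ in range(20):
    w = lm_step(w, x, y)
# w now approaches [2.0, 0.5]
```

In an MLP the Jacobian is computed by backpropagation instead of being written out analytically, but the damped linear solve at each step is the same, which is why a fast native linear algebra provider pays off.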

4.3 Summary

A plot showing the average NRMSE improvement over the persistence model can be seen in figure 4.13. We see that both models are able to perform better than persistence. Expektra's model performs better than the NuPIC model over the whole forecast horizon, and NuPIC performs better than persistence towards the end of the forecast.

[Figure 4.13: Summarized average improvement (%) in NRMSE over persistence across all wind farms, with 95% confidence intervals, for Expektra and NuPIC over look-ahead times 0–48 hours.]
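Assuming the improvement plotted in figure 4.13 is the relative NRMSE reduction with respect to the reference, it can be computed directly from the scores in table 4.1:

```python
def improvement_over_persistence(model_nrmse, persistence_nrmse):
    """Relative improvement (%) of a model over the persistence reference,
    assumed here to be 100 * (NRMSE_ref - NRMSE_model) / NRMSE_ref."""
    return 100.0 * (persistence_nrmse - model_nrmse) / persistence_nrmse

# Overall ("All") scores from table 4.1:
expektra = improvement_over_persistence(0.165, 0.355)  # about 53.5%
nupic = improvement_over_persistence(0.264, 0.355)     # about 25.6%
```

These overall numbers are consistent with the averaged curves in the figure: Expektra sits well above NuPIC across the horizon.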


Chapter 5

Discussion

With the exponential growth of computing power it becomes easier to study deeper and more complex networks, and with recent advances in the area of deep learning, a new interest in ANNs has resurged. In practice, ANNs have been used by most groups in the field of WPF, but these networks never caught on, as it was argued that the improvements made by ANNs were usually not enough to justify the extra effort of training them [Giebel et al., 2011]. This is steadily becoming less of an issue as computation becomes cheaper and bigger networks perform better.

By using a simple MLP network, we observe that we are able to obtain results similar in performance to other models published in Hong et al. [2014]. This can be used as a starting point for further investigations. The GEFCom dataset has a limited number of input features; with access to a wider range of features, the power of neural networks could be studied more in depth, but given the number of features in the dataset this is harder to do.

Using persistence as a reference model was done in order to have the same baseline as the one used in GEFCom, but a better reference model should be considered in the future, such as the one presented in Nielsen et al. [1998], especially if longer forecasts are to be considered.
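The persistence reference used throughout the results is simply ŷ(t+k|t) = y(t). A sketch of it, together with a hedged sketch of the blended-reference idea from Nielsen et al. [1998] (the horizon-dependent coefficient a_k must be estimated from data; its estimation is not shown here):

```python
def persistence_forecast(history, horizon):
    """Persistence reference model: the forecast for every look-ahead
    time k is simply the last observed power value."""
    last = history[-1]
    return [last] * horizon

def blended_reference(last, long_term_mean, a):
    """Sketch of the 'new reference' idea of Nielsen et al. [1998]:
    blend persistence with the long-term mean, shifting weight towards
    the mean as the horizon grows (a is the horizon-dependent weight,
    estimated from data in the original paper)."""
    return a * last + (1.0 - a) * long_term_mean

forecast = persistence_forecast([0.10, 0.40, 0.35], horizon=4)
# forecast == [0.35, 0.35, 0.35, 0.35]
```

Persistence is hard to beat for the first few hours but degrades quickly, which is exactly the behavior visible in figures 4.4 to 4.10.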

5.1 Method development issues

Working with Expektra's model is very straightforward, and no major issues were encountered during development, but some issues were encountered working with NuPIC.¹

¹It should also be pointed out that it helps to have the people who developed the code on the spot, and this is probably the main reason for the little trouble encountered.


1. Installing NuPIC is difficult. This seems to be a general problem, judging by posts on the NuPIC mailing list² and is a very important issue to fix if they want more people to work with this model.

2. The built-in swarming (PSO) did not work well, especially for slightly larger datasets and for 48 prediction steps. The main problems were fitting everything into memory and that the code ran too slowly when swarming over multiple steps ahead, making it practically impossible to use. Scaling the problem down to just a few prediction steps and multiple models helped, but the swarm model kept dismissing wind speed as an important feature, which was fixed by explicitly telling it not to touch that encoder.

3. Inconsistencies between the white-paper documentation of the HTM CLA and the actual implementation make it hard to understand the underlying principles of NuPIC. NuPIC implements a very complex network, and any inconsistencies in documentation make it harder to understand. The code is quite well documented, but the additional material is a bit sparse.

These issues are all understandable and are continuously being improved upon. NuPIC is still a young platform, so issues are to be expected given the complexity and research nature of the project. Training the CLAClassifier to use many prediction steps instead of just one would most likely reduce the bias error seen in the results. To investigate this, a more powerful computer is needed.

There are some differences in the input between the models. This setup was created because an OPF model needs to encode the value it is going to predict. SCADA data is available, so this is not an issue, but it may introduce some unfairness between NuPIC and Expektra. In the current setup, NuPIC does not seem to gain anything extra from the additional information, and other models in the competition most likely used different inputs as well.

In general, NuPIC differs in the way you feed it data: you send in only the front of the signal, and temporal context is achieved through the temporal memory.

5.2 Future improvements and directions

One issue with the GEFCom dataset is that it does not contain any specific information about the location of these 7 wind farms. It is worth taking a look at how to handle different models that are specialized for different types

²http://numenta.org/lists


of terrain, as this has been shown to increase performance [Kariniotakis et al., 2006]. Another thing to investigate could be a more advanced scheme for hyperparameter selection. Jursa and Rohrig [2008] show that the WPF error can be reduced by smarter use of optimization algorithms for feature and hyperparameter selection, which should improve performance.

In general, having more types of inputs from sources like SCADA and NWP systems could give a lot of valuable information that can be used for better predictions. It is clear that wind speed is a key feature for producing good forecasts, which is also supported by the input analysis in section 4.2. It has been argued that the error in WPF models stems largely from wrong weather forecasts [Giebel et al., 2011], so identifying good sources of weather forecasts would be desirable for more effective model learning. Combining several different weather forecasts could also be investigated, as Nielsen et al. [2007] showed that power forecasts based on a number of different meteorological forecasts were better than those from a single source.
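One simple way to combine forecasts from several NWP sources is to weight each source inversely to its historical error; this is a hypothetical illustrative scheme, not the method of Nielsen et al. [2007], who estimate the combination weights more rigorously:

```python
def combine_forecasts(forecasts, past_rmse):
    """Combine power forecasts from several NWP sources by weighting
    each source inversely to its historical RMSE (an illustrative
    scheme; the weights could instead be fit by regression on past
    forecast errors)."""
    weights = [1.0 / e for e in past_rmse]
    total = sum(weights)
    weights = [w / total for w in weights]
    horizon = len(forecasts[0])
    return [sum(w * f[i] for w, f in zip(weights, forecasts))
            for i in range(horizon)]

# Two sources, two lead times; the historically better source dominates:
combined = combine_forecasts([[0.4, 0.5], [0.6, 0.7]], past_rmse=[0.1, 0.3])
```

Here the first source gets weight 0.75 and the second 0.25, pulling the combination towards the more reliable forecast.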

Regarding the HTM CLA, it is worth considering implementing a custom encoder for it: an encoder specifically targeted at data concerning wind farms. SCADA data could also be streamed directly into a NuPIC model, which would allow for some interesting online predictions and anomaly detection.

Training times are always an issue, especially for very large models. Both NuPIC and Expektra's model would benefit from being implemented in a more parallel fashion, as these algorithms are well suited for parallel computing. The speed-up would be of great value, so implementing a GPU version of the algorithms could be worthwhile.

5.3 Conclusions

The field of energy forecasting is still young, and the necessity for good forecasting techniques is constantly growing. This initial study shows that part of Expektra's forecasting method can be used to forecast wind power generation. We observe that this ANN is able to achieve performance comparable to other methods published in Hong et al. [2014]. Additional work is still needed to obtain state-of-the-art results. Numenta's model is able to beat the reference model but still needs additional work before we can draw definite conclusions about its performance. Constructing a custom encoder could be a first step in that direction.


Bibliography

Subutai Ahmad and Jeff Hawkins. Properties of sparse distributed representations and their application to hierarchical temporal memory. arXiv preprint arXiv:1503.07469, 2015.

T.C. Akinci. Short term wind speed forecasting with ANN in Batman, Turkey. Elektronika ir Elektrotechnika, 107(1):41–45, 2015.

E. Michael Azoff. Neural network time series forecasting of financial markets. John Wiley & Sons, Inc., 1994.

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13(1):281–305, 2012.

Daniel P. Buxhoeveden and Manuel F. Casanova. The minicolumn hypothesis in neuroscience. Brain, 125(5):935–951, 2002.

Erasmo Cadenas and Wilfrido Rivera. Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks. Renewable Energy, 34(1):274–278, 2009.

Dmitri B. Chklovskii, B.W. Mel, and K. Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782–788, 2004.

A. Costa, A. Crespo, J. Navarro, G. Lizcano, H. Madsen, and E. Feitosa. A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews, 12(6):1725–1744, August 2008. ISSN 1364-0321. doi: 10.1016/j.rser.2007.01.015. URL http://dx.doi.org/10.1016/j.rser.2007.01.015.

Russ C. Eberhart and James Kennedy. A new optimizer using particle swarm theory. In Proceedings of the sixth international symposium on micro machine and human science, volume 1, pages 39–43. New York, NY, 1995.


Shu Fan, James R. Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. Energy Conversion, IEEE Transactions on, 24(2):474–482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento: a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276. Citeseer, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition EWEC 2008, 6 pages. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving hierarchical temporal memory-based trading models. Springer, 2013.

M.A. Gaertner, C. Gallardo, C. Tejeda, N. Martínez, S. Calabria, N. Martínez, and B. Fernández. The Casandra project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. In Proc. of the 2001 European Wind Energy Conference, EWEC'01, Copenhagen, Denmark, pages 777–780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: A literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G. Lobo. Sipreólico: wind power prediction tool for the Spanish peninsular power system. Proceedings of the CIGRÉ 40th general session and exhibition, Paris, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357–363, 2014.


Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41–46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585–592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694–709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference, EWEC'07, 2007.

G. Kariniotakis. Position paper on JOULE project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J. Halliday, R. Brownsword, Ignacio Marti, Ana Maria Palomares, I. Cruz, H. Madsen, T.S. Nielsen, Henrik Aa. Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power: overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, 10 pages, 2006.

G.N. Kariniotakis, G.S. Stavrakakis, and E.F. Nogaret. Wind power forecasting using advanced neural networks models. Energy Conversion, IEEE Transactions on, 11(4):762–767, 1996.

Stanley J. Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326–334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275–293, 2009.

Lars Landberg and Simon J. Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171–195, 1994.

S. Lang, C. Mohrlen, J. Jorgensen, B. Ó Gallachóir, and E. McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International Solar Energy Society UK section conference C, volume 85, page 89, 2006.


S.M. Lawan, W.A.W.Z. Abidin, W.Y. Chai, A. Baharun, and T. Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760–1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. Smart Grid, IEEE Transactions on, 5(1):501–510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets. 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164–168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313–2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154–3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa. Nielsen, and Torben S. Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475–489, 2005.

Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431–441, 1963.

Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons. 1969.

Marvin Lee Minsky and Oliver G. Selfridge. Learning in random nets. MIT Lincoln Laboratory, 1960.

Jorge J. Moré. The Levenberg-Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105–116. Springer, 1978.

Henrik Aa. Nielsen, Torben S. Nielsen, Henrik Madsen, Maria J. Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471–482, 2007.


Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29–34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power. 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159–167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms, v0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33–57, 2007.

A. Rodrigues, J.A. Peças Lopes, P. Miranda, L. Palma, C. Monteiro, R. Bessa, J. Sousa, C. Rodrigues, and J. Matos. EPrev: a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference, EWEC, volume 7, 2007.

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5(3), 1988.

Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85–117, 2015.

S. Sinkevicius, R. Simutis, and V. Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91–96, 2011.

Ke-Sheng Wang, Vishal S. Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61–69, 2014.

WWEA. 2014 half-year report. Technical report, WWEA, pages 1–8, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365–376, 2013.

Hao Yu and Bogdan M. Wilamowski. Levenberg-Marquardt training. Industrial Electronics Handbook, 5:12-1, 2011.


Appendix A

Hyper-parameters

Parameters for the spatial pooler

Alias                Default  Description
columnCount          -        The number of cell columns in a cortical region.
globalInhibition     false    If true, during the inhibition phase the winning columns are selected as the most active columns from the region as a whole.
numActivePerInhArea  10       The maximum number of active columns per inhibition area.
synPermActiveInc     0.1      The amount by which an active synapse is incremented in each round. Specified as a percent of a fully grown synapse.
synPermConnected     0.10     Controls the threshold at which synapses are connected.
synPermInactiveDec   0.01     The amount by which an inactive synapse is decremented in each round.
potentialRadius      16       Determines the extent of the input that each column can potentially be connected to.

Table A.1: Table containing configuration parameters for the spatial pooler.
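The global inhibition controlled by numActivePerInhArea can be sketched as a top-k selection over column overlap scores. This is a simplification for illustration: the real spatial pooler also maintains synapse permanences (synPermActiveInc, synPermInactiveDec) and boosting, which are omitted here.

```python
def spatial_pooler_activate(overlaps, num_active):
    """Global inhibition sketch: the num_active columns with the highest
    overlap scores become active (ties broken by column index). Returns
    the sorted indices of the winning columns, i.e. the sparse
    distributed representation of the input."""
    ranked = sorted(range(len(overlaps)), key=lambda i: (-overlaps[i], i))
    return sorted(ranked[:num_active])

# Six columns with these overlap scores; the three strongest win:
active = spatial_pooler_activate([3, 7, 1, 7, 0, 5], num_active=3)
# active == [1, 3, 5]
```

After selecting the winners, the real algorithm would then increment permanences of synapses aligned with the input and decrement the rest.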


Parameters for the scalar encoder

Alias       Symbol  Description
w           w       Number of bits to set in the output.
minval      vmin    The lower bound of the input value.
maxval      vmax    The upper bound of the input value.
n           n       Number of bits in the representation (n must be > w).
radius      r       Inputs separated by more than or equal to this distance will have non-overlapping representations.
resolution  ψ       Inputs separated by more than or equal to this distance will have different representations.

Table A.2: Table containing configuration parameters for the scalar encoder.
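The parameters above can be illustrated with a minimal non-periodic scalar encoder: a value is clipped to [vmin, vmax] and mapped to a run of w consecutive on-bits within an n-bit output, so that nearby values share on-bits. This is a sketch in the spirit of table A.2, not NuPIC's actual ScalarEncoder implementation (which also derives n from radius/resolution and supports periodic values).

```python
def encode_scalar(value, minval, maxval, n, w):
    """Non-periodic scalar encoder sketch: clip the value to
    [minval, maxval], map it to one of (n - w + 1) start positions,
    and set w consecutive bits. Nearby values get overlapping
    representations; distant values get disjoint ones."""
    assert n > w
    v = min(max(value, minval), maxval)
    start = int(round((n - w) * (v - minval) / (maxval - minval)))
    return [1 if start <= i < start + w else 0 for i in range(n)]

bits = encode_scalar(0.5, 0.0, 1.0, n=14, w=5)  # 5 on-bits out of 14
```

With n = 14 and w = 5 this matches the shape of the example in figure 3.3: each encoded value is a contiguous block of five on-bits whose position tracks the input.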


Parameters for the temporal memory

Alias                  Default  Description
activationThreshold    12       Activation threshold for segments.
cellsPerColumn         32       Number of cells per column.
columnCount            2048     The number of cell columns in a cortical region.
globalDecay            0.10     Decrements all synapses a little bit all the time.
initialPerm            0.11     Initial permanence value for a synapse.
inputWidth             -        Size of the input.
maxAge                 100000   Controls global decay. Global decay will only decay segments that have not been activated for maxAge iterations, and will only do the global decay loop every maxAge iterations.
maxSegmentsPerCell     -        The maximum number of segments a cell can have.
maxSynapsesPerSegment  -        The maximum number of synapses a segment can have.
minThreshold           8        The minimum required activity for a segment to learn.
newSynapseCount        15       The maximum number of synapses added to a segment during learning.
permanenceDec          0.10     How much permanence is removed from synapses when learning occurs.
permanenceInc          0.10     How much permanence is added to synapses when learning occurs.
temporalImp            cpp/py   Controls which temporal memory implementation to use.

Table A.3: Table containing configuration parameters for the temporal memory.


Appendix B

Wind characteristics

Figure B.1: Wind characteristics for WF 1 and WF 2, the GEFCom dataset, showing zonal and meridional wind components. The intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Figure B.2: Wind characteristics for WF 3–7, the GEFCom dataset, showing zonal and meridional wind components. The intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Appendix C

Error Distribution


[Figure C.1: Error distribution (frequency histograms over error in [−1.0, 1.0]) for lead times 1, 10, 20, 30, 40, and 48 hours, WF 1 using NuPIC.]


[Figure C.2: Error distribution (frequency histograms over error in [−1.0, 1.0]) for lead times 1, 10, 20, 30, 40, and 48 hours, WF 2 using NuPIC.]


[Figure C.3: Error distribution (frequency histograms over error in [−1.0, 1.0]) for lead times 1, 10, 20, 30, 40, and 48 hours, WF 3 using NuPIC.]


[Figure C.4: Error distribution (frequency histograms over error in [−1.0, 1.0]) for lead times 1, 10, 20, 30, 40, and 48 hours, WF 4 using NuPIC.]


[Figure C.5: Error distribution (frequency histograms over error in [−1.0, 1.0]) for lead times 1, 10, 20, 30, 40, and 48 hours, WF 5 using NuPIC.]

66

[Six histogram panels: error (x-axis, −1.0 to 1.0) vs. frequency (y-axis, 0–70) for lead times 1, 10, 20, 30, 40 and 48; wf6 using NuPIC]

Figure C.6: Error distribution for different lead times, WF 6

[Six histogram panels: error (x-axis, −1.0 to 1.0) vs. frequency (y-axis, 0–70) for lead times 1, 10, 20, 30, 40 and 48; wf7 using NuPIC]

Figure C.7: Error distribution for different lead times, WF 7

[Six histogram panels: error (x-axis, −1.0 to 1.0) vs. frequency (y-axis, 0–70) for lead times 1, 10, 20, 30, 40 and 48; wf1 using Expektra]

Figure C.8: Error distribution for different lead times, WF 1

[Six histogram panels: error (x-axis, −1.0 to 1.0) vs. frequency (y-axis, 0–70) for lead times 1, 10, 20, 30, 40 and 48; wf2 using Expektra]

Figure C.9: Error distribution for different lead times, WF 2

[Six histogram panels: error (x-axis, −1.0 to 1.0) vs. frequency (y-axis, 0–70) for lead times 1, 10, 20, 30, 40 and 48; wf3 using Expektra]

Figure C.10: Error distribution for different lead times, WF 3

[Six histogram panels: error (x-axis, −1.0 to 1.0) vs. frequency (y-axis, 0–70) for lead times 1, 10, 20, 30, 40 and 48; wf4 using Expektra]

Figure C.11: Error distribution for different lead times, WF 4

[Six histogram panels: error (x-axis, −1.0 to 1.0) vs. frequency (y-axis, 0–70) for lead times 1, 10, 20, 30, 40 and 48; wf5 using Expektra]

Figure C.12: Error distribution for different lead times, WF 5

[Six histogram panels: error (x-axis, −1.0 to 1.0) vs. frequency (y-axis, 0–70) for lead times 1, 10, 20, 30, 40 and 48; wf6 using Expektra]

Figure C.13: Error distribution for different lead times, WF 6

[Six histogram panels: error (x-axis, −1.0 to 1.0) vs. frequency (y-axis, 0–70) for lead times 1, 10, 20, 30, 40 and 48; wf7 using Expektra]

Figure C.14: Error distribution for different lead times, WF 7

List of Figures

2.1 A figure that presents the general outline when forecasting using the statistical approach
2.2 A figure that presents the general steps when forecasting using a physical model
3.1 The perceptron
3.2 Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of the H hidden layers, as well as M input connections. Each edge seen in this graph has a weight wij associated with it
3.3 Information flow of a single-region predictive model created with the OPF
3.4 The CLAClassifier
3.5 Training an OPF model
4.1 Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs. wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.2 Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.3 Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.4 Different error measurements for WF 1
4.5 Different error measurements for WF 2
4.6 Different error measurements for WF 3
4.7 Different error measurements for WF 4
4.8 Different error measurements for WF 5
4.9 Different error measurements for WF 6
4.10 Different error measurements for WF 7
4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point: the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind
4.12 Plot showing the unoptimized version vs. the optimized one when training a network with 10 input neurons; 10, 15, 20 or 25 hidden neurons; and 1 output neuron. Training was done for 100 epochs using LM
4.13 Summarized average improvement over all wind farms with 95% confidence intervals
B.1 Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power
B.2 Wind characteristics for WF 3–7, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power
C.1 Error distribution for different lead times, WF 1
C.2 Error distribution for different lead times, WF 2
C.3 Error distribution for different lead times, WF 3
C.4 Error distribution for different lead times, WF 4
C.5 Error distribution for different lead times, WF 5
C.6 Error distribution for different lead times, WF 6
C.7 Error distribution for different lead times, WF 7
C.8 Error distribution for different lead times, WF 1
C.9 Error distribution for different lead times, WF 2
C.10 Error distribution for different lead times, WF 3
C.11 Error distribution for different lead times, WF 4
C.12 Error distribution for different lead times, WF 5
C.13 Error distribution for different lead times, WF 6
C.14 Error distribution for different lead times, WF 7

List of Tables

3.1 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods of missing data, power observations are available for updating the models
3.2 Preprocessed features from the dataset. The Forecast category represents the forecast given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the one we will use in training and testing
3.3 Example where n = 14, r = 5, ψ = 1, of various encoded scalar values using a ScalarEncoder
4.1 NRMSE scores of the entries published in [Hong et al., 2014]. The NuPIC model and the Expektra model are added so we can easily compare the results
A.1 Configuration parameters for the encoder
A.2 Configuration parameters for the spatial pooler
A.3 Configuration parameters for the temporal memory


  • Contents
  • Introduction
    • Problem Formulation
    • The scope of the problem
  • Background
    • Neural Networks and Time Series Prediction
  • Method and Materials
    • Preliminaries
      • Remarks
      • Definitions
      • Reference models
      • Error metrics
      • Model selection
      • Evaluation
        • Experiments
        • Holdback Input Randomization
        • Optimization methods
    • Neural Networks
      • Multilayer Perceptron
      • Numenta Platform for Intelligent Computing
  • Result
    • Experimental results
    • Input Importance
      • Adaptation and Optimization
    • Summary
  • Discussion
    • Method development issues
    • Future improvements and directions
    • Conclusions
  • Bibliography
  • Appendices
    • Hyper-parameters
    • Wind characteristics
    • Error Distribution
  • List of Figures
  • List of Tables

[Four panels for Wind Farm 1: NBIAS, NRMSE and NMAE vs. look-ahead time k (in hours), and cumulated ε² vs. time (in hours), each comparing Expektra, NuPIC and Persistence]

Figure 4.4: Different error measurements for WF 1

CHAPTER 4 RESULT

[Four panels for Wind Farm 2: NBIAS, NRMSE and NMAE vs. look-ahead time k (in hours), and cumulated ε² vs. time (in hours), each comparing Expektra, NuPIC and Persistence]

Figure 4.5: Different error measurements for WF 2

[Four panels for Wind Farm 3: NBIAS, NRMSE and NMAE vs. look-ahead time k (in hours), and cumulated ε² vs. time (in hours), each comparing Expektra, NuPIC and Persistence]

Figure 4.6: Different error measurements for WF 3

[Four panels for Wind Farm 4: NBIAS, NRMSE and NMAE vs. look-ahead time k (in hours), and cumulated ε² vs. time (in hours), each comparing Expektra, NuPIC and Persistence]

Figure 4.7: Different error measurements for WF 4

[Four panels for Wind Farm 5: NBIAS, NRMSE and NMAE vs. look-ahead time k (in hours), and cumulated ε² vs. time (in hours), each comparing Expektra, NuPIC and Persistence]

Figure 4.8: Different error measurements for WF 5

[Four panels for Wind Farm 6: NBIAS, NRMSE and NMAE vs. look-ahead time k (in hours), and cumulated ε² vs. time (in hours), each comparing Expektra, NuPIC and Persistence]

Figure 4.9: Different error measurements for WF 6

[Four panels for Wind Farm 7: NBIAS, NRMSE and NMAE vs. look-ahead time k (in hours), and cumulated ε² vs. time (in hours), each comparing Expektra, NuPIC and Persistence]

Figure 4.10: Different error measurements for WF 7

                       Wind Farm
User            1      2      3      4      5      6      7      All

Leustagos       0.145  0.138  0.168  0.144  0.158  0.133  0.140  0.146
DuckTile        0.143  0.145  0.172  0.145  0.165  0.137  0.146  0.148
MZ              0.141  0.151  0.174  0.145  0.167  0.141  0.145  0.149
Propeller       0.144  0.153  0.177  0.147  0.175  0.141  0.147  0.152
Duehee Lee      0.157  0.144  0.176  0.160  0.169  0.154  0.148  0.155
Expektra        0.165  0.158  0.184  0.164  0.179  0.153  0.153  0.165
MTU EE5260      0.161  0.172  0.193  0.162  0.192  0.156  0.160  0.168
SunWind         0.174  0.177  0.193  0.176  0.179  0.157  0.162  0.172
ymzsmsd         0.163  0.186  0.200  0.164  0.192  0.162  0.167  0.174
4138 Kalchas    0.180  0.179  0.197  0.175  0.200  0.160  0.165  0.177
NuPIC           0.243  0.254  0.264  0.310  0.290  0.224  0.240  0.264

Persistence     0.302  0.338  0.373  0.364  0.388  0.341  0.361  0.355

Table 4.1: NRMSE scores of the entries published in [Hong et al., 2014]. The NuPIC model and the Expektra model are added so we can easily compare the results.

4.1 Experimental results

Looking at the primary results presented in this study, we see in Table 4.1 that Expektra's ANN model achieves performance comparable to the other methods published with Hong et al. [2014]. This can be used as a starting point for further investigations. A similar approach to Expektra's is that of Duehee Lee [Lee and Baldick, 2014], as both methods are based on the same neural network architecture but with different optimization techniques. Duehee Lee's network is structured slightly differently: Expektra's model has only one output node, whereas Duehee Lee's uses five (representing five prediction steps ahead). Besides these differences, Duehee Lee also includes additional sub-models to create a prediction ensemble, blending the output of the ANN with a Gaussian Process (GP) to improve very short-term predictions. MTU EE5260 is also a neural network approach that converts meteorological forecasts to power, and its score is very close. The HTM CLA beats the persistence model but needs more work to outperform the other models in GEFCom.
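The scores in Table 4.1 and the panels in Figures 4.4–4.10 use normalized error metrics. A minimal sketch of those metrics is given below, assuming power values already scaled by nominal capacity to [0, 1] as in the GEFCom data; the sign convention for NBIAS (prediction minus observation) is an assumption, not taken from the thesis.

```python
import numpy as np

def nrmse(y_true, y_pred, capacity=1.0):
    """Normalized root-mean-square error: RMSE of the forecasts divided
    by the nominal capacity (defaults to 1 for already-normalized power)."""
    err = (np.asarray(y_pred, float) - np.asarray(y_true, float)) / capacity
    return float(np.sqrt(np.mean(err ** 2)))

def nmae(y_true, y_pred, capacity=1.0):
    """Normalized mean absolute error."""
    err = (np.asarray(y_pred, float) - np.asarray(y_true, float)) / capacity
    return float(np.mean(np.abs(err)))

def nbias(y_true, y_pred, capacity=1.0):
    """Normalized bias: the mean signed error (sign convention assumed)."""
    err = (np.asarray(y_pred, float) - np.asarray(y_true, float)) / capacity
    return float(np.mean(err))
```

Computed per lead time over the evaluation periods, these three quantities reproduce the kind of curves shown in Figures 4.4–4.10.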


4.2 Input Importance

For interpretation purposes, in order to understand the model better, an analysis of the relative input parameter importance was performed. This analysis is illustrated in Figure 4.11. Each box represents noise added to that channel; a higher NRMSE score means that the feature was important. The reference point, "all-channels", represents the error distribution of the model with no input replacement. We clearly see in this figure that the wind speed channel, ws, is the most important attribute: adding noise to this channel greatly affects the NRMSE score. We also see that the wind components u and v show little to no influence, and that the timestamp-related inputs, hours and week, both indicate that there are seasonal and daily trends present in the dataset.
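The randomization procedure can be sketched as follows. This is a minimal illustration of Holdback Input Randomization in the spirit of Kemp et al. [2007], not the thesis implementation: `model_predict` stands in for any trained model, and replacing a channel with uniform noise over its observed range is an assumed design choice.

```python
import numpy as np

def hipr(model_predict, X, y, n_repeats=30, rng=None):
    """Holdback Input Randomization sketch: replace one input channel at a
    time with noise and record the resulting NRMSE distribution. Channels
    whose randomization raises the error most are the most important."""
    rng = np.random.default_rng(rng)

    def nrmse(pred):
        return float(np.sqrt(np.mean((np.asarray(pred) - y) ** 2)))

    # Reference point: error with no input replacement ("all-channels")
    scores = {"all-channels": [nrmse(model_predict(X))]}
    for j in range(X.shape[1]):
        errs = []
        for _ in range(n_repeats):
            Xp = X.copy()
            # Replace channel j with uniform noise over its observed range
            Xp[:, j] = rng.uniform(X[:, j].min(), X[:, j].max(), size=len(X))
            errs.append(nrmse(model_predict(Xp)))
        scores[f"channel-{j}"] = errs
    return scores
```

Plotting the per-channel NRMSE distributions side by side against the "all-channels" reference yields a figure of the same shape as Figure 4.11.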

[Box plots of NRMSE (y-axis, 0.2–0.6), one per input channel with noise added: all-channels (reference), hours, u, v, week, ws, ws−1, ws−2, ws−3, ws+1, ws+2, ws+3]

Figure 4.11: Relative input parameter importance using HIPR. "all-channels" reflects the reference point: the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind.

4.2.1 Adaptation and Optimization

All training for Expektra's model was performed on a MacBook Pro with an Intel Core 2 Duo processor (P8600, 2.40 GHz) running a 64-bit operating system with 4 GB of installed RAM. After adapting the code to run experiments on the GEFCom dataset, the performance of the core source code provided by Expektra was evaluated. This investigation identified some slow parts of the code. After fixing these issues, mainly by using a Math.NET native library instead of the managed provider, a speed test was performed comparing the unoptimized version with the optimized one. Figure 4.12 shows the speed-up achieved during training using the LM algorithm.

[Plot of training time vs. number of hidden neurons (10, 15, 20, 25), comparing the Normal and the Optimized version]

Figure 4.12: Plot showing the unoptimized version vs. the optimized one when training a network with 10 input neurons; 10, 15, 20 or 25 hidden neurons; and 1 output neuron. Training was done for 100 epochs using LM.

4.3 Summary

A plot showing the average NRMSE improvement over the persistence model can be seen in Figure 4.13. Both models are able to perform better than persistence. Expektra's model performs better than the NuPIC model over the whole forecast horizon, and NuPIC performs better than persistence towards the end of the forecast.

[Line plot: improvement (%) in NRMSE over persistence vs. look-ahead time (0–50 hours), for Expektra and NuPIC, with 95% confidence intervals]

Figure 4.13: Summarized average improvement over all wind farms with 95% confidence intervals

Chapter 5

Discussion

With the exponential growth of computer power, it becomes easier to study deeper and more complex networks, and with recent advances in the area of deep learning, a new interest in ANNs has resurged. In practice, ANNs have been used by most groups in the field of WPF, but these networks never caught on, as it was argued that the improvements made by ANNs were usually not enough to justify the extra effort of training them [Giebel et al., 2011]. This is steadily becoming less of an issue as computation becomes cheaper and bigger networks perform better.

By using a simple MLP network, we observe that we are able to obtain results similar in performance to other models published in Hong et al. [2014]. This can be used as a starting point for further investigations. The GEFCom dataset has a limited set of input features; with access to a wider range of features, the power of neural networks could be studied more in depth, but given the number of features in the dataset, this is harder to do.

Using persistence as a reference model was done in order to have the same baseline as the one used in GEFCom, but a better reference model should be considered in the future, such as the one presented in Nielsen et al. [1998], especially if longer forecasts are to be considered.
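For reference, the persistence baseline itself is trivial to state: the forecast issued at time t for lead time k is simply the last observed value. A minimal sketch (the alignment convention below is illustrative):

```python
import numpy as np

def persistence_forecast(y, k):
    """Persistence reference model: y_hat(t + k | t) = y(t).
    Returns an array aligned with the targets y[k:], so that
    forecast[i] is the prediction for y[i + k]."""
    y = np.asarray(y, dtype=float)
    return y[:-k]

# Hypothetical normalized power series
y = np.array([0.30, 0.35, 0.33, 0.40, 0.38])
forecast = persistence_forecast(y, k=2)  # values [0.30, 0.35, 0.33]
errors = y[2:] - forecast                # errors at lead time 2
```

Its error grows quickly with the lead time k, which is why persistence is easy to beat at long horizons and hard to beat at very short ones.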

5.1 Method development issues

Working with Expektra's model was very straightforward, and no major issues were encountered during development, but some issues were encountered when working with NuPIC.¹

¹ It should also be pointed out that it helps to have the people who developed the code on the spot, and this is probably the main reason for the lack of trouble.


1. Installing NuPIC is difficult. This seems to be a general problem, judging by posts on the NuPIC mailing list². A very important issue to fix if they want more people to work with this model.

2. Using the built-in swarming (PSO) did not work well, especially for the slightly larger dataset and for 48 prediction steps. The main problems were fitting everything into memory and that the code ran too slowly when swarming over multiple step-ahead predictions, making it practically impossible to use. Scaling the problem down to just a few prediction steps and multiple models helped, but the swarm model kept dismissing wind speed as an important feature, which was fixed by explicitly telling it not to touch that encoder.

3. Inconsistencies between the white-paper documentation of the HTM CLA and the actual implementation make it hard to understand the underlying principles of NuPIC. NuPIC presents a very complex network, and any inconsistencies in the documentation make it hard to understand. The code is quite well documented, but the additional material is a bit sparse.

These issues are all understandable and are continuously being improved upon. NuPIC is still a young platform, so issues are to be expected given the complexity and research nature of the project. Training the CLAClassifier to use many step predictions instead of just one will most likely reduce the bias error seen in the results. To investigate this, a more powerful computer is needed.

There are some differences in the inputs between the models. This setup was created because an OPF model needs to encode the value it is going to predict. SCADA data is available, so this is not an issue, but it may introduce some unfairness between NuPIC and Expektra. In the current setup, NuPIC does not seem to gain anything extra from the additional information, and other models in the competition will most likely use different inputs as well.

In general, NuPIC is different in the way you feed it data: you send in the front of the signal, and temporal context is achieved through the temporal memory.

5.2 Future improvements and directions

One issue with the GEFCom dataset is that it does not contain any specific information about the location of these 7 wind farms. It is worth considering how to handle different models that are specialized for different types

² http://numenta.org/lists


of terrain, as this has been shown to increase performance [Kariniotakis et al., 2006]. Another thing to investigate could be a more advanced scheme for hyperparameter selection: Jursa and Rohrig [2008] show that the WPF error can be reduced by smarter use of optimization algorithms for feature selection and hyperparameter selection, which should improve the performance.

In general, having more types of inputs from sources like SCADA and NWP systems could give a lot of valuable information that can be used for better predictions. It is clear that wind speed is a key feature for producing good forecasts, which is also supported by the input analysis in Section 4.2. It has been argued that the error in WPF models stems largely from wrong weather forecasts [Giebel et al., 2011], so identifying good sources of weather forecasts would be desirable for more effective model learning. Combining several different weather forecasts could also be investigated, as Nielsen et al. [2007] showed that power forecasts based on a number of different meteorological forecasts were better than those from a single source.

Regarding the HTM CLA, it is worth considering implementing a custom encoder, one specifically targeted at data concerning wind farms. SCADA data could also be streamed directly into a NuPIC model, which would allow for some interesting online predictions and anomaly detection.
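A custom encoder would follow the same pattern as the ScalarEncoder discussed in the method chapter: a value is mapped to a sparse binary vector in which a contiguous run of active bits slides with the value, so that nearby values share overlapping bits. A minimal sketch of that pattern (the parameter names `n` and `w` are illustrative, loosely mirroring Table 3.3's n = 14, r = 5; this is not NuPIC's actual implementation):

```python
import numpy as np

def scalar_encode(value, vmin, vmax, n=14, w=5):
    """Sketch of a scalar encoder: map a value in [vmin, vmax] to an
    n-bit vector with a run of w active bits whose position encodes
    the value. Nearby values produce overlapping encodings."""
    value = min(max(value, vmin), vmax)        # clip to the valid range
    i = int(round((value - vmin) / (vmax - vmin) * (n - w)))
    bits = np.zeros(n, dtype=int)
    bits[i:i + w] = 1                          # contiguous active run
    return bits
```

A wind-farm-specific encoder could apply the same idea jointly to wind speed and direction, e.g. encoding direction on a circular range so that 359° and 1° overlap.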

Training times are always an issue, especially for very large models. Both NuPIC and Expektra's model would benefit from being implemented in a more parallel fashion, as these algorithms are very well suited for parallel computing. The speed-up would be of great value, so looking into and implementing a GPU version of the algorithms could be worthwhile.

5.3 Conclusions

The field of energy forecasting is still young, and the necessity for good forecasting techniques is constantly growing. This initial study shows that part of Expektra's forecasting method can be used to forecast wind power generation. We observe that this ANN is able to achieve performance comparable to other methods published in Hong et al. [2014]. Additional work is still needed to obtain state-of-the-art results. Numenta's model is able to beat the reference model but still needs additional work before we can draw definite conclusions about its performance. Constructing a custom encoder could be a first step in that direction.


Bibliography

Subutai Ahmad and Jeff Hawkins. Properties of sparse distributed representations and their application to hierarchical temporal memory. arXiv preprint arXiv:1503.07469, 2015.

T. C. Akinci. Short term wind speed forecasting with ANN in Batman, Turkey. Elektronika ir Elektrotechnika, 107(1):41–45, 2015.

E. Michael Azoff. Neural network time series forecasting of financial markets. John Wiley & Sons, Inc., 1994.

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13(1):281–305, 2012.

Daniel P. Buxhoeveden and Manuel F. Casanova. The minicolumn hypothesis in neuroscience. Brain, 125(5):935–951, 2002.

Erasmo Cadenas and Wilfrido Rivera. Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks. Renewable Energy, 34(1):274–278, 2009.

Dmitri B. Chklovskii, B. W. Mel, and K. Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782–788, 2004.

A. Costa, A. Crespo, J. Navarro, G. Lizcano, H. Madsen, and E. Feitosa. A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews, 12(6):1725–1744, August 2008. ISSN 1364-0321. doi: 10.1016/j.rser.2007.01.015. URL http://dx.doi.org/10.1016/j.rser.2007.01.015.

Russ C. Eberhart and James Kennedy. A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science, volume 1, pages 39–43, New York, NY, 1995.

Shu Fan, James R. Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. Energy Conversion, IEEE Transactions on, 24(2):474–482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento: a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276. Citeseer, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition EWEC 2008, 6 pages. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving hierarchical temporal memory-based trading models. Springer, 2013.

M. A. Gaertner, C. Gallardo, C. Tejeda, N. Martínez, S. Calabria, N. Martínez, and B. Fernández. The Casandra project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. In Proc. of the 2001 European Wind Energy Conference, EWEC'01, Copenhagen, Denmark, pages 777–780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: A literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G. Lobo. Sipreólico: wind power prediction tool for the Spanish peninsular power system. Proceedings of the CIGRÉ 40th General Session and Exhibition, París, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357–363, 2014.

Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41–46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585–592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694–709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference EWEC'07, 2007.

G. Kariniotakis. Position paper on JOULE project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J. Halliday, R. Brownsword, Ignacio Marti, Ana Maria Palomares, I. Cruz, H. Madsen, T. S. Nielsen, Henrik Aa. Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power: overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, 10 pages, 2006.

G. N. Kariniotakis, G. S. Stavrakakis, and E. F. Nogaret. Wind power forecasting using advanced neural networks models. IEEE Transactions on Energy Conversion, 11(4):762–767, 1996.

Stanley J. Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326–334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275–293, 2009.

Lars Landberg and Simon J. Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171–195, 1994.

S. Lang, C. Mohrlen, J. Jorgensen, B. O. Gallachóir, and E. McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International Solar Energy Society UK Section Conference, volume 85, page 89, 2006.

S. M. Lawan, W. A. W. Z. Abidin, W. Y. Chai, A. Baharun, and T. Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760–1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. IEEE Transactions on Smart Grid, 5(1):501–510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets. 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164–168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313–2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154–3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa. Nielsen, and Torben S. Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475–489, 2005.

Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431–441, 1963.

Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons. 1969.

Marvin Lee Minsky and Oliver G. Selfridge. Learning in random nets. MIT Lincoln Laboratory, 1960.

Jorge J. Moré. The Levenberg-Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105–116. Springer, 1978.

Henrik Aa. Nielsen, Torben S. Nielsen, Henrik Madsen, Maria J. Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471–482, 2007.

Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29–34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power. 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159–167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms, v0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33–57, 2007.

A. Rodrigues, J. A. Peças Lopes, P. Miranda, L. Palma, C. Monteiro, R. Bessa, J. Sousa, C. Rodrigues, and J. Matos. EPREV: a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference EWEC, volume 7, 2007.

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5:3, 1988.

Jürgen Schmidhuber. Deep learning in neural networks: an overview. Neural Networks, 61:85–117, 2015.

S. Sinkevicius, R. Simutis, and V. Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91–96, 2011.

Ke-Sheng Wang, Vishal S. Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61–69, 2014.

WWEA. 2014 half-year report, pp. 1–8. Technical report, WWEA, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365–376, 2013.

Hao Yu and Bogdan M. Wilamowski. Levenberg-Marquardt training. Industrial Electronics Handbook, 5:12-1, 2011.

Appendix A

Hyper-parameters

Parameters for the spatial pooler

Alias | Default | Description
columnCount | - | The number of cell columns in a cortical region.
globalInhibition | false | If true, during the inhibition phase the winning columns are selected as the most active columns from the region as a whole.
numActivePerInhArea | 10 | The maximum number of active columns per inhibition area.
synPermActiveInc | 0.1 | The amount by which an active synapse is incremented in each round. Specified as a percent of a fully grown synapse.
synPermConnected | 0.10 | Controls the threshold at which synapses are considered connected.
synPermInactiveDec | 0.01 | The amount by which an inactive synapse is decremented in each round.
potentialRadius | 16 | Determines the extent of the input that each column can potentially be connected to.

Table A1: Configuration parameters for the spatial pooler.
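To make the inhibition and permanence parameters concrete, the following is a minimal sketch of one global-inhibition round, not NuPIC's actual implementation: the dense permanence matrix, the overlap computation, and the learning order are simplifying assumptions, with defaults taken from the table above.

```python
import numpy as np

def spatial_pooler_step(input_bits, permanences,
                        num_active=10,                # numActivePerInhArea
                        syn_perm_connected=0.10,      # synPermConnected
                        syn_perm_active_inc=0.1,      # synPermActiveInc
                        syn_perm_inactive_dec=0.01):  # synPermInactiveDec
    """One simplified round with global inhibition: the num_active
    columns with the highest overlap win, and only the winners adapt
    their synapse permanences toward the current input."""
    connected = (permanences >= syn_perm_connected).astype(float)
    overlaps = connected @ input_bits              # overlap score per column
    winners = np.argsort(overlaps)[::-1][:num_active]
    for c in winners:                              # learn only on winning columns
        permanences[c] += np.where(input_bits == 1.0,
                                   syn_perm_active_inc,
                                   -syn_perm_inactive_dec)
    np.clip(permanences, 0.0, 1.0, out=permanences)
    return winners
```

With globalInhibition enabled, the "inhibition area" is the whole region, which is what the top-k selection above imitates.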

Parameters for the scalar encoder

Alias | Symbol | Description
w | w | Number of bits to set in the output.
minval | vmin | The lower bound of the input value.
maxval | vmax | The upper bound of the input value.
n | n | Number of bits in the representation (n must be > w).
radius | r | Inputs separated by more than or equal to this distance will have non-overlapping representations.
resolution | ψ | Inputs separated by more than or equal to this distance will have different representations.

Table A2: Configuration parameters for the scalar encoder.
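As a rough illustration of how w, n, vmin and vmax interact, a sketch of the idea behind a scalar encoder (not Numenta's ScalarEncoder itself, and the bucket rounding is a simplifying assumption): the value is clamped to [vmin, vmax] and mapped to a contiguous run of w active bits inside an n-bit output, so nearby values share bits while distant values do not.

```python
def encode_scalar(value, w=3, n=14, minval=0.0, maxval=10.0):
    """Map a scalar to n bits containing a contiguous run of w ones;
    the run's start position encodes the value."""
    assert n > w, "n must be greater than w"
    value = min(max(value, minval), maxval)   # clamp to [minval, maxval]
    n_buckets = n - w + 1                     # possible start positions
    bucket = int(round((value - minval) / (maxval - minval) * (n_buckets - 1)))
    bits = [0] * n
    bits[bucket:bucket + w] = [1] * w
    return bits
```

For example, the smallest and largest inputs produce runs at opposite ends of the output and share no active bits.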

Parameters for the temporal memory

Alias | Default | Description
activationThreshold | 12 | Activation threshold for segments.
cellsPerColumn | 32 | Number of cells per column.
columnCount | 2048 | The number of cell columns in a cortical region.
globalDecay | 0.10 | Decrements all synapses a little bit all the time.
initialPerm | 0.11 | Initial permanence value for a synapse.
inputWidth | - | Size of the input.
maxAge | 100000 | Controls global decay. Global decay will only decay segments that have not been activated for maxAge iterations, and will only run the global decay loop every maxAge iterations.
maxSegmentsPerCell | - | The maximum number of segments a cell can have.
maxSynapsesPerSegment | - | The maximum number of synapses a segment can have.
minThreshold | 8 | The minimum required activity for a segment to learn.
newSynapseCount | 15 | The maximum number of synapses added to a segment during learning.
permanenceDec | 0.10 | How much permanence is removed from synapses when learning occurs.
permanenceInc | 0.10 | How much permanence is added to synapses when learning occurs.
temporalImp | cpp/py | Controls which temporal memory implementation to use.

Table A3: Configuration parameters for the temporal memory.
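The activation and learning rules these thresholds govern can be sketched as follows. This is a toy illustration, not NuPIC's data structures: a segment is assumed to be a simple list of (presynaptic cell, permanence) pairs.

```python
def segment_is_active(segment, active_cells,
                      connected_perm=0.11,       # initialPerm as a stand-in threshold
                      activation_threshold=12):  # activationThreshold
    """A distal segment fires when at least activation_threshold of its
    connected synapses point at currently active cells."""
    hits = sum(1 for cell, perm in segment
               if perm >= connected_perm and cell in active_cells)
    return hits >= activation_threshold

def reinforce(segment, active_cells, perm_inc=0.10, perm_dec=0.10):
    """When learning occurs, synapses to active cells gain permanence
    (permanenceInc) and the rest lose permanence (permanenceDec)."""
    return [(cell, min(1.0, perm + perm_inc)) if cell in active_cells
            else (cell, max(0.0, perm - perm_dec))
            for cell, perm in segment]
```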

Appendix B

Wind characteristics

Figure B1: Wind characteristics for WF 1 and WF 2, the GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.

Figure B2: Wind characteristics for WF 3–7, the GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.

Appendix C

Error Distribution

[Figures C1–C14 each show histogram panels of forecast error for lead times 48, 40, 30, 20, 10, and 1; x-axis: error (−1.0 to 1.0); y-axis: frequency (0 to 70).]

Figure C1: Error distribution for different lead times, WF 1 (NuPIC).
Figure C2: Error distribution for different lead times, WF 2 (NuPIC).
Figure C3: Error distribution for different lead times, WF 3 (NuPIC).
Figure C4: Error distribution for different lead times, WF 4 (NuPIC).
Figure C5: Error distribution for different lead times, WF 5 (NuPIC).
Figure C6: Error distribution for different lead times, WF 6 (NuPIC).
Figure C7: Error distribution for different lead times, WF 7 (NuPIC).
Figure C8: Error distribution for different lead times, WF 1 (Expektra).
Figure C9: Error distribution for different lead times, WF 2 (Expektra).
Figure C10: Error distribution for different lead times, WF 3 (Expektra).
Figure C11: Error distribution for different lead times, WF 4 (Expektra).
Figure C12: Error distribution for different lead times, WF 5 (Expektra).
Figure C13: Error distribution for different lead times, WF 6 (Expektra).
Figure C14: Error distribution for different lead times, WF 7 (Expektra).

List of Figures

2.1 A figure that presents the general outline when forecasting using the statistical approach
2.2 A figure that presents the general steps when forecasting using a physical model
3.1 The perceptron
3.2 Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of the H hidden layers, as well as M input connections. Each edge seen in this graph has a weight wij associated with it
3.3 Information flow of a single region predictive model created with the OPF
3.4 The CLAClassifier
3.5 Training an OPF model
4.1 Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs. wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.2 Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.3 Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.4 Different error measurements for WF 1
4.5 Different error measurements for WF 2
4.6 Different error measurements for WF 3
4.7 Different error measurements for WF 4
4.8 Different error measurements for WF 5
4.9 Different error measurements for WF 6
4.10 Different error measurements for WF 7
4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point: the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind
4.12 Plot showing the unoptimized version vs. the optimized one when training a network with 10 input neurons; 10, 15, 20, 25 hidden neurons; 1 output neuron. Training was done for 100 epochs using LM
4.13 Summarized average improvement over all wind farms with 95% confidence intervals
B1 Wind characteristics for WF 1 and WF 2, the GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power
B2 Wind characteristics for WF 3–7, the GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power
C1 Error distribution for different lead times, WF 1
C2 Error distribution for different lead times, WF 2
C3 Error distribution for different lead times, WF 3
C4 Error distribution for different lead times, WF 4
C5 Error distribution for different lead times, WF 5
C6 Error distribution for different lead times, WF 6
C7 Error distribution for different lead times, WF 7
C8 Error distribution for different lead times, WF 1
C9 Error distribution for different lead times, WF 2
C10 Error distribution for different lead times, WF 3
C11 Error distribution for different lead times, WF 4
C12 Error distribution for different lead times, WF 5
C13 Error distribution for different lead times, WF 6
C14 Error distribution for different lead times, WF 7

List of Tables

3.1 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, missing data with power observations are available for updating the models
3.2 Preprocessed features from the dataset. The Forecast category represents the forecast given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the feature we will use in training and testing
3.3 Example where n = 14, r = 5, ψ = 1 of various encoded scalar values using a ScalarEncoder
4.1 NRMSE score of the entries published in [Hong et al., 2014]. The NuPIC model and Expektra model are added so we can easily compare the results
A1 Configuration parameters for the spatial pooler
A2 Configuration parameters for the scalar encoder
A3 Configuration parameters for the temporal memory

• Contents
• Introduction
  • Problem Formulation
  • The scope of the problem
• Background
  • Neural Networks and Time Series Prediction
• Method and Materials
  • Preliminaries
    • Remarks
    • Definitions
    • Reference models
    • Error metrics
    • Model selection
    • Evaluation
  • Experiments
  • Holdback Input Randomization
  • Optimization methods
  • Neural Networks
    • Multilayer Perceptron
    • Numenta Platform for Intelligent Computing
• Result
  • Experimental results
  • Input Importance
    • Adaptation and Optimization
  • Summary
• Discussion
  • Method development issues
  • Future improvements and directions
  • Conclusions
• Bibliography
• Appendices
• Hyper-parameters
• Wind characteristics
• Error Distribution
• List of Figures
• List of Tables
CHAPTER 4 RESULT

[Figures 4.5–4.10 each show four panels for the given wind farm: NBIAS, NRMSE, and NMAE vs. look-ahead time k (in hours), and cumulated ε² vs. time (in hours, 0–8000), for the Expektra, NuPIC, and Persistence models.]

Figure 4.5: Different error measurements for WF 2.
Figure 4.6: Different error measurements for WF 3.
Figure 4.7: Different error measurements for WF 4.
Figure 4.8: Different error measurements for WF 5.
Figure 4.9: Different error measurements for WF 6.
Figure 4.10: Different error measurements for WF 7.

User | WF 1 | WF 2 | WF 3 | WF 4 | WF 5 | WF 6 | WF 7 | All
Leustagos | 0.145 | 0.138 | 0.168 | 0.144 | 0.158 | 0.133 | 0.140 | 0.146
DuckTile | 0.143 | 0.145 | 0.172 | 0.145 | 0.165 | 0.137 | 0.146 | 0.148
MZ | 0.141 | 0.151 | 0.174 | 0.145 | 0.167 | 0.141 | 0.145 | 0.149
Propeller | 0.144 | 0.153 | 0.177 | 0.147 | 0.175 | 0.141 | 0.147 | 0.152
Duehee Lee | 0.157 | 0.144 | 0.176 | 0.160 | 0.169 | 0.154 | 0.148 | 0.155
Expektra | 0.165 | 0.158 | 0.184 | 0.164 | 0.179 | 0.153 | 0.153 | 0.165
MTU EE5260 | 0.161 | 0.172 | 0.193 | 0.162 | 0.192 | 0.156 | 0.160 | 0.168
SunWind | 0.174 | 0.177 | 0.193 | 0.176 | 0.179 | 0.157 | 0.162 | 0.172
ymzsmsd | 0.163 | 0.186 | 0.200 | 0.164 | 0.192 | 0.162 | 0.167 | 0.174
4138 Kalchas | 0.180 | 0.179 | 0.197 | 0.175 | 0.200 | 0.160 | 0.165 | 0.177
NuPIC | 0.243 | 0.254 | 0.264 | 0.310 | 0.290 | 0.224 | 0.240 | 0.264
Persistence | 0.302 | 0.338 | 0.373 | 0.364 | 0.388 | 0.341 | 0.361 | 0.355

Table 4.1: NRMSE score per wind farm of the entries published in [Hong et al., 2014]. The NuPIC model and Expektra model are added so we can easily compare the results.

4.1 Experimental results

Looking at the primary results presented in this study, we see in Table 4.1 that Expektra's ANN model achieves performance comparable to the other methods published in Hong et al. [2014]. This can be used as a starting point for further investigations. A similar approach to Expektra's is Duehee Lee's [Lee and Baldick, 2014], as both methods are based on the same neural network architecture but use different optimization techniques. Duehee Lee's network is structured slightly differently from Expektra's model, which consists of only one output node, whereas Duehee Lee uses five (representing five predictions ahead). Besides these differences, Duehee Lee has also included additional sub-models to create a prediction ensemble, blending the output of the ANN with a Gaussian Process (GP) to improve very short-term predictions. MTU EE5260 is also a neural network approach that converts meteorological forecasts to power, and its score is very close. The HTM CLA beats the persistence model but needs more work to outperform the other models in GEFCom.
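The error measures behind Table 4.1 and Figures 4.5–4.10 can be computed as below. This is a sketch assuming power is already normalized by installed capacity, as in the GEFCom data, and taking the error as observed minus predicted (the sign convention is an assumption).

```python
import numpy as np

def error_metrics(p_obs, p_pred):
    """Normalized bias, mean absolute error and RMSE over a forecast
    series of capacity-normalized power values."""
    e = np.asarray(p_obs) - np.asarray(p_pred)
    return {
        "NBIAS": float(np.mean(e)),                 # systematic over/under-prediction
        "NMAE": float(np.mean(np.abs(e))),          # average absolute miss
        "NRMSE": float(np.sqrt(np.mean(e ** 2))),   # the score used in Table 4.1
    }
```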

4.2 Input Importance

For interpretation purposes, in order to understand the model better, an analysis of the relative input parameter importance was performed. This analysis is illustrated in Figure 4.11. Each box represents added noise on that channel, which results in a higher NRMSE score if that feature was important. A reference point, "all-channels", represents the error distribution of the model with no input replacement. We clearly see in this figure that the wind speed channel ws is the most important attribute: adding noise to this channel greatly affects the NRMSE score. We also see that the wind components u, v show little to no influence, and that the timestamp-related inputs hours and week both indicate that there are seasonal and daily trends present in the dataset.
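The HIPR procedure of Kemp et al. [2007] can be sketched as below. Here `model` is a generic callable mapping an input matrix to predictions (a stand-in, not Expektra's network), and noise is injected by permuting one column at a time, which is one common variant of the randomization step.

```python
import numpy as np

def hipr_importance(model, X, y, n_repeats=10, rng=None):
    """Holdback Input Randomization: randomize one input channel at a
    time and record how much the NRMSE degrades relative to the
    untouched 'all-channels' reference."""
    rng = np.random.default_rng(rng)

    def nrmse(y_hat):
        return float(np.sqrt(np.mean((y - y_hat) ** 2)))

    baseline = nrmse(model(X))                     # no channel replaced
    scores = {}
    for j in range(X.shape[1]):
        vals = []
        for _ in range(n_repeats):
            Xn = X.copy()
            Xn[:, j] = rng.permutation(Xn[:, j])   # inject noise into channel j
            vals.append(nrmse(model(Xn)))
        scores[j] = float(np.mean(vals)) - baseline
    return baseline, scores
```

An important channel (like ws) yields a large positive score; an ignored channel leaves the NRMSE essentially unchanged.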

Figure 4.11: Relative input parameter importance using HIPR. "all-channels" reflects the reference point: the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind. [Box plots of NRMSE, roughly 0.2 to 0.6, for the channels all-channels, hours, u, v, week, ws, ws−1, ws−2, ws−3, ws+1, ws+2, ws+3.]

4.2.1 Adaptation and Optimization

All training for Expektra's model was performed on a MacBook Pro with an Intel Core 2 Duo processor (P8600, 2.40 GHz) on a 64-bit operating system with 4 GB of installed RAM. After adapting the code to run experiments on the GEFCom dataset, the performance of the core source code provided by Expektra was evaluated. This investigation identified some slow parts of the code. After addressing these issues, mainly by using a MathNET native library instead of the managed provider, a speed test was performed comparing the unoptimized version with the optimized one. Figure 4.12 shows the speed-up achieved during training using the LM algorithm.


Figure 4.12: Plot showing the unoptimized version vs. the optimized one when training a network with 10 input neurons; 10, 15, 20, or 25 hidden neurons; and 1 output neuron. Training was done for 100 epochs using LM.

4.3 Summary

A plot showing the average NRMSE improvement over the persistence model can be seen in Figure 4.13. Both models are able to perform better than persistence; Expektra's model performs better than the NuPIC model over the whole forecast


horizon, and NuPIC performs better than persistence towards the end of the forecast.
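The improvement metric summarized here is presumably the relative NRMSE reduction over the persistence reference at each lead time, averaged over wind farms with a normal-approximation confidence interval. A minimal sketch under that assumption (function names are illustrative):

```python
import numpy as np

def improvement_over_persistence(nrmse_model, nrmse_ref):
    """Per-lead-time relative improvement (%): 100 * (ref - model) / ref."""
    m = np.asarray(nrmse_model, dtype=float)
    r = np.asarray(nrmse_ref, dtype=float)
    return 100.0 * (r - m) / r

def mean_with_ci(improvements, z=1.96):
    """Average improvement over wind farms with a normal-approximation 95%
    confidence interval (rows = wind farms, columns = lead times)."""
    imp = np.asarray(improvements, dtype=float)
    mean = imp.mean(axis=0)
    half = z * imp.std(axis=0, ddof=1) / np.sqrt(imp.shape[0])
    return mean, mean - half, mean + half
```

A positive value means the model beats persistence at that horizon; zero means it matches it.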


Figure 4.13: Summarized average improvement over all wind farms, with 95% confidence intervals.


Chapter 5

Discussion

With the exponential growth of computing power it becomes easier to study deeper and more complex networks, and with recent advances in the area of deep learning a new interest in ANNs has resurged. In practice, ANNs have been used by most groups in the field of WPF, but these networks never caught on, as it was argued that the improvements made by ANNs were usually not enough to justify the extra effort of training them [Giebel et al., 2011]. This is steadily becoming less of an issue as computation becomes cheaper and bigger networks perform better.

By using a simple MLP network we observe that we are able to obtain results similar in performance to other models published in Hong et al. [2014]. This can be used as a starting point for further investigations. The GEFCom dataset has a limited number of input features; if we had access to a wider range of them, the power of neural networks could be studied more in depth. Given the number of features in the dataset, this is harder to do.

Using persistence as a reference model was done in order to have the same baseline as the one used in GEFCom, but a better reference model should be considered in the future, such as the one presented in Nielsen et al. [1998], especially if longer forecasts are to be considered.
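A minimal sketch of the two baselines, assuming the standard formulations: plain persistence, and a Nielsen et al. [1998]-style reference that blends persistence with the series mean via the lag-k autocorrelation (function names are illustrative):

```python
import numpy as np

def persistence_forecast(y, k):
    """Persistence reference: predict y(t+k) = y(t).
    Element i of the result is the forecast for observation y[i + k]."""
    return np.asarray(y, dtype=float)[:-k]

def new_reference_forecast(y, k):
    """A reference in the spirit of Nielsen et al. [1998]: blend persistence
    with the series mean, weighted by the lag-k autocorrelation a_k:
        yhat(t + k) = a_k * y(t) + (1 - a_k) * mean(y)
    For short horizons a_k is near 1 (persistence); for long horizons it
    decays towards 0 (climatology), which plain persistence handles poorly."""
    y = np.asarray(y, dtype=float)
    mu = y.mean()
    d = y - mu
    a_k = np.sum(d[:-k] * d[k:]) / np.sum(d * d)  # lag-k autocorrelation
    return a_k * y[:-k] + (1.0 - a_k) * mu
```

At long horizons the blended reference approaches the series mean, which is why it is a harder baseline than persistence for 48-hour forecasts.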

5.1 Method development issues

Working with Expektra's model is very straightforward and no major issues were encountered during development, but some issues were encountered working with NuPIC.1

1 It should also be pointed out that it helps to have the people who developed the code on the spot, and this is probably the main reason for encountering so little trouble.


1. Installing NuPIC is difficult. This seems to be a general problem, judging by posts on the NuPIC mailing list.2 This is an important issue to fix if they want more people to work with this model.

2. Using the built-in swarming (PSO) did not work well, especially for slightly larger datasets and for 48 prediction steps. The main problems were fitting everything into memory and that the code ran too slowly when swarming over multiple steps ahead, making it practically impossible to use. Scaling down the problem to just a few prediction steps and multiple models helped, but the swarm model kept dismissing wind speed as an important feature, which was fixed by specifically telling it not to touch that encoder.

3. Inconsistencies between the white-paper documentation of HTM CLA and the actual implementation make it hard to understand the underlying principles of NuPIC. NuPIC presents a very complex network, and any inconsistency in documentation makes it hard to understand. The code is quite well documented, but the additional material is a bit sparse.

These issues are all understandable and are continuously being improved upon. NuPIC is still a young platform, so issues are to be expected given the complexity and research nature of the project. Training the CLAClassifier to use many step predictions instead of just one will most likely reduce the bias error seen in the results. To investigate this, a more powerful computer is needed.

There are some differences in the input between the models. This setup was created because an OPF model needs to encode the value it is going to predict. SCADA data is available, so this is not an issue, but it may introduce some unfairness between NuPIC and Expektra. In the current setup NuPIC does not seem to gain anything extra from the additional information, and other models in the competition will most likely use different inputs as well.

In general, NuPIC is different in the way you feed it data: you send in only the front of the signal, and temporal context is achieved through the temporal memory.

5.2 Future improvements and directions

One issue with the GEFCom dataset is that it does not contain any specific information about the location of these 7 wind farms. It is worth considering how to handle different models that are specialized for different types

2 http://numenta.org/lists


of terrain, as this has been shown to increase performance [Kariniotakis et al., 2006]. Another thing to investigate could be a more advanced scheme for hyperparameter selection; Jursa and Rohrig [2008] show that the WPF error can be reduced by smarter use of optimization algorithms for feature selection and hyperparameter selection, which should improve performance.

In general, more types of inputs from sources like SCADA and NWP systems could give a lot of valuable information that can be used for better predictions. It is clear that wind speed is a key feature for producing good forecasts, which is also supported by the input analysis in Section 4.2. It has been argued that the error in WPF models stems largely from wrong weather forecasts [Giebel et al., 2011], so identifying good sources of weather forecasts would be desirable for more effective model learning. Combining several different weather forecasts could also be investigated, as Nielsen et al. [2007] showed that power forecasts based on a number of different meteorological forecasts were better than those from a single source.

Regarding HTM CLA, it is worth considering implementing a custom encoder, one specifically targeted at data concerning wind farms. SCADA data could also be streamed directly into a NuPIC model, which would allow for some interesting online predictions and anomaly detection.

Training times are always an issue, especially for very large models. Both NuPIC and Expektra's model would benefit from being implemented in a more parallel fashion, as these algorithms are very well suited for parallel computing. The speed-up would be of great value, so implementing a GPU version of the algorithms could be worthwhile.

5.3 Conclusions

The field of energy forecasting is still young and the necessity for good forecasting techniques is constantly growing. This initial study shows that part of Expektra's forecasting method can be used to forecast wind power generation. We observe that this ANN is able to achieve performance comparable to other methods published in Hong et al. [2014]. Additional work is still needed to obtain state-of-the-art results. Numenta's model is able to beat the reference model, but still needs additional work before we can draw definite conclusions about its performance. Constructing a custom encoder could be a first step in that direction.


Bibliography

Subutai Ahmad and Jeff Hawkins. Properties of sparse distributed representations and their application to hierarchical temporal memory. arXiv preprint arXiv:1503.07469, 2015.

T.C. Akinci. Short term wind speed forecasting with ANN in Batman, Turkey. Elektronika ir Elektrotechnika, 107(1):41–45, 2015.

E. Michael Azoff. Neural network time series forecasting of financial markets. John Wiley & Sons, Inc., 1994.

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13(1):281–305, 2012.

Daniel P. Buxhoeveden and Manuel F. Casanova. The minicolumn hypothesis in neuroscience. Brain, 125(5):935–951, 2002.

Erasmo Cadenas and Wilfrido Rivera. Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks. Renewable Energy, 34(1):274–278, 2009.

Dmitri B. Chklovskii, B.W. Mel, and K. Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782–788, 2004.

A. Costa, A. Crespo, J. Navarro, G. Lizcano, H. Madsen, and E. Feitosa. A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews, 12(6):1725–1744, August 2008. ISSN 1364-0321. doi: 10.1016/j.rser.2007.01.015. URL http://dx.doi.org/10.1016/j.rser.2007.01.015.

Russ C. Eberhart and James Kennedy. A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science, volume 1, pages 39–43, New York, NY, 1995.

Shu Fan, James R. Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. Energy Conversion, IEEE Transactions on, 24(2):474–482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento: a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276. Citeseer, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition, EWEC 2008. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving hierarchical temporal memory-based trading models. Springer, 2013.

M.A. Gaertner, C. Gallardo, C. Tejeda, N. Martínez, S. Calabria, N. Martínez, and B. Fernández. The Casandra project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. In Proc. of the 2001 European Wind Energy Conference, EWEC'01, Copenhagen, Denmark, pages 777–780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: A literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G. Lobo. Sipreólico: wind power prediction tool for the Spanish peninsular power system. Proceedings of the CIGRÉ 40th General Session and Exhibition, Paris, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357–363, 2014.

Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41–46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585–592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694–709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference, EWEC'07, 2007.

G. Kariniotakis. Position paper on Joule project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J. Halliday, R. Brownsword, Ignacio Marti, Ana Maria Palomares, I. Cruz, H. Madsen, T.S. Nielsen, Henrik Aa. Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power: overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, 2006.

G.N. Kariniotakis, G.S. Stavrakakis, and E.F. Nogaret. Wind power forecasting using advanced neural networks models. Energy Conversion, IEEE Transactions on, 11(4):762–767, 1996.

Stanley J. Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326–334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275–293, 2009.

Lars Landberg and Simon J. Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171–195, 1994.

S. Lang, C. Mohrlen, J. Jorgensen, B.O. Gallachóir, and E. McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International Solar Energy Society UK Section Conference, volume 85, page 89, 2006.

S.M. Lawan, W.A.W.Z. Abidin, W.Y. Chai, A. Baharun, and T. Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760–1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. Smart Grid, IEEE Transactions on, 5(1):501–510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets, 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164–168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313–2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154–3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa. Nielsen, and Torben S. Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475–489, 2005.

Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431–441, 1963.

Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons. 1969.

Marvin Lee Minsky and Oliver G. Selfridge. Learning in random nets. MIT Lincoln Laboratory, 1960.

Jorge J. Moré. The Levenberg-Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105–116. Springer, 1978.

Henrik Aa. Nielsen, Torben S. Nielsen, Henrik Madsen, Maria J. Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471–482, 2007.

Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29–34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power, 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159–167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms, v0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33–57, 2007.

A. Rodrigues, J.A. Peças Lopes, P. Miranda, L. Palma, C. Monteiro, R. Bessa, J. Sousa, C. Rodrigues, and J. Matos. EPREV: a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference, EWEC, volume 7, 2007.

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5:3, 1988.

Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85–117, 2015.

S. Sinkevicius, R. Simutis, and V. Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91–96, 2011.

Ke-Sheng Wang, Vishal S. Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61–69, 2014.

WWEA. 2014 half-year report. Technical report, WWEA, pages 1–8, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365–376, 2013.

Hao Yu and Bogdan M. Wilamowski. Levenberg-Marquardt training. Industrial Electronics Handbook, 5:12–1, 2011.


Appendix A

Hyper-parameters

Parameters for the spatial pooler

Alias                 Default   Description
columnCount           -         The number of cell columns in a cortical region.
globalInhibition      false     If true, during the inhibition phase the winning columns are selected as the most active columns from the region as a whole.
numActivePerInhArea   10        The maximum number of active columns per inhibition area.
synPermActiveInc      0.1       The amount by which an active synapse is incremented in each round, specified as a percent of a fully grown synapse.
synPermConnected      0.10      Controls the threshold at which synapses become connected.
synPermInactiveDec    0.01      The amount by which an inactive synapse is decremented in each round.
potentialRadius       16        Determines the extent of the input that each column can potentially be connected to.

Table A.1: Configuration parameters for the spatial pooler.


Parameters for the scalar encoder

Alias        Symbol   Description
w            w        Number of bits to set in the output.
minval       vmin     The lower bound of the input value.
maxval       vmax     The upper bound of the input value.
n            n        Number of bits in the representation (n must be > w).
radius       r        Inputs separated by more than or equal to this distance will have non-overlapping representations.
resolution   ψ        Inputs separated by more than or equal to this distance will have different representations.

Table A.2: Configuration parameters for the scalar encoder.


Parameters for the temporal memory

Alias                   Default   Description
activationThreshold     12        Activation threshold for segments.
cellsPerColumn          32        Number of cells per column.
columnCount             2048      The number of cell columns in a cortical region.
globalDecay             0.10      Decrements all synapses a little bit all the time.
initialPerm             0.11      Initial permanence value for a synapse.
inputWidth              -         Size of the input.
maxAge                  100000    Controls global decay: global decay will only decay segments that have not been activated for maxAge iterations, and the global decay loop runs only every maxAge iterations.
maxSegmentsPerCell      -         The maximum number of segments a cell can have.
maxSynapsesPerSegment   -         The maximum number of synapses a segment can have.
minThreshold            8         The minimum required activity for a segment to learn.
newSynapseCount         15        The maximum number of synapses added to a segment during learning.
permanenceDec           0.10      How much permanence is removed from synapses when learning occurs.
permanenceInc           0.10      How much permanence is added to synapses when learning occurs.
temporalImp             cpp/py    Controls which temporal memory implementation to use.

Table A.3: Configuration parameters for the temporal memory.


Appendix B

Wind characteristics

Figure B.1: Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Figure B.2: Wind characteristics for WF 3–7, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Appendix C

Error Distribution

[Figures C.1–C.14 each show error-frequency histograms at lead times 48, 40, 30, 20, 10, and 1 hour; only the captions are reproduced here.]

Figure C.1: Error distribution for different lead times, WF 1 (NuPIC).

Figure C.2: Error distribution for different lead times, WF 2 (NuPIC).

Figure C.3: Error distribution for different lead times, WF 3 (NuPIC).

Figure C.4: Error distribution for different lead times, WF 4 (NuPIC).

Figure C.5: Error distribution for different lead times, WF 5 (NuPIC).

Figure C.6: Error distribution for different lead times, WF 6 (NuPIC).

Figure C.7: Error distribution for different lead times, WF 7 (NuPIC).

Figure C.8: Error distribution for different lead times, WF 1 (Expektra).

Figure C.9: Error distribution for different lead times, WF 2 (Expektra).

Figure C.10: Error distribution for different lead times, WF 3 (Expektra).

Figure C.11: Error distribution for different lead times, WF 4 (Expektra).

Figure C.12: Error distribution for different lead times, WF 5 (Expektra).

Figure C.13: Error distribution for different lead times, WF 6 (Expektra).

Figure C.14: Error distribution for different lead times, WF 7 (Expektra).

List of Figures

21 A figure that presents the general outline when forecasting using thestatistical approach 6

22 A figure that presents the general steps when forecasting using a physicalmodel 7

31 The perceptron 2032 Architectural graph of the neural network that will produce a single

output value It consists of a collection of hidden neurons in each Hhidden layers as well as M input connections Each edge seen in thisgraph have a wij associated with it 21

33 Information flow of a single region predictive model created with the OPF 2334 The CLAClassifier 2835 Training a OPF model 29

41 Left Diagram Cumulated probability of wind-speed Right diagramScatter diagram of the power curve production vs wind speed Seenfor Wind Farm 1 GEFCom dataset (hourly data from January 1 2010 toJanuary 1 2012) 31

42 Left Diagram Wind-Speed vs Production Right diagram Wind-Speedvs Direction Seen for Wind Farm 1 GEFCom dataset (hourly data fromJanuary 1 2010 to January 1 2012) 32

43 Left Diagram Wind-Speed vs Production Right diagram Wind-Speedvs Direction Seen for Wind Farm 2 GEFCom dataset (hourly data fromJanuary 1 2010 to January 1 2012) 32

44 Different error measurement for WF 1 3345 Different error measurement for WF 2 3446 Different error measurement for WF 3 3547 Different error measurement for WF 4 3648 Different error measurement for WF 5 37

76

4.9 Different error measurements for WF 6 . . . 38

4.10 Different error measurements for WF 7 . . . 39

4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point: the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind . . . 41

4.12 Plot showing the unoptimized version vs. the optimized one when training a network with 10 input neurons; 10, 15, 20, 25 hidden neurons; and 1 output neuron. Training was done for 100 epochs using LM . . . 42

4.13 Summarized average improvement over all wind farms, with 95% confidence intervals . . . 43

B.1 Wind characteristics for WF 1 and WF 2, the GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated, the stronger the more power . . . 59

B.2 Wind characteristics for WF 3-7, the GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated, the stronger the more power . . . 60

C.1 Error distribution for different lead times, WF 1 . . . 62

C.2 Error distribution for different lead times, WF 2 . . . 63

C.3 Error distribution for different lead times, WF 3 . . . 64

C.4 Error distribution for different lead times, WF 4 . . . 65

C.5 Error distribution for different lead times, WF 5 . . . 66

C.6 Error distribution for different lead times, WF 6 . . . 67

C.7 Error distribution for different lead times, WF 7 . . . 68

C.8 Error distribution for different lead times, WF 1 . . . 69

C.9 Error distribution for different lead times, WF 2 . . . 70

C.10 Error distribution for different lead times, WF 3 . . . 71

C.11 Error distribution for different lead times, WF 4 . . . 72

C.12 Error distribution for different lead times, WF 5 . . . 73

C.13 Error distribution for different lead times, WF 6 . . . 74



C.14 Error distribution for different lead times, WF 7 . . . 75

List of Tables

3.1 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, missing data with power observations are available for updating the models . . . 16

3.2 Preprocessed features from the dataset. The Forecast category represents the forecast given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the feature we will use in training and testing . . . 17

3.3 Example, where n = 14, r = 5, ψ = 1, of various encoded scalar values using a ScalarEncoder . . . 24

4.1 NRMSE scores of the entries published in Hong et al. [2014]. The NuPIC and Expektra models are added so we can easily compare the results . . . 40

A.1 Configuration parameters for the spatial pooler . . . 55

A.2 Configuration parameters for the scalar encoder . . . 56

A.3 Configuration parameters for the temporal memory . . . 57



Figure 4.6: Different error measurements for WF 3. (Panels: NBIAS, NRMSE, and NMAE vs. look-ahead time k in hours, and cumulated ε2 vs. time in hours; series: Expektra, NuPIC, Persistence.)


CHAPTER 4 RESULT

Figure 4.7: Different error measurements for WF 4. (Panels: NBIAS, NRMSE, and NMAE vs. look-ahead time k in hours, and cumulated ε2 vs. time in hours; series: Expektra, NuPIC, Persistence.)


Figure 4.8: Different error measurements for WF 5. (Panels: NBIAS, NRMSE, and NMAE vs. look-ahead time k in hours, and cumulated ε2 vs. time in hours; series: Expektra, NuPIC, Persistence.)


Figure 4.9: Different error measurements for WF 6. (Panels: NBIAS, NRMSE, and NMAE vs. look-ahead time k in hours, and cumulated ε2 vs. time in hours; series: Expektra, NuPIC, Persistence.)


Figure 4.10: Different error measurements for WF 7. (Panels: NBIAS, NRMSE, and NMAE vs. look-ahead time k in hours, and cumulated ε2 vs. time in hours; series: Expektra, NuPIC, Persistence.)


                          Wind Farm
User           1      2      3      4      5      6      7      All
Leustagos     0.145  0.138  0.168  0.144  0.158  0.133  0.140  0.146
DuckTile      0.143  0.145  0.172  0.145  0.165  0.137  0.146  0.148
MZ            0.141  0.151  0.174  0.145  0.167  0.141  0.145  0.149
Propeller     0.144  0.153  0.177  0.147  0.175  0.141  0.147  0.152
Duehee Lee    0.157  0.144  0.176  0.160  0.169  0.154  0.148  0.155
Expektra      0.165  0.158  0.184  0.164  0.179  0.153  0.153  0.165
MTU EE5260    0.161  0.172  0.193  0.162  0.192  0.156  0.160  0.168
SunWind       0.174  0.177  0.193  0.176  0.179  0.157  0.162  0.172
ymzsmsd       0.163  0.186  0.200  0.164  0.192  0.162  0.167  0.174
4138 Kalchas  0.180  0.179  0.197  0.175  0.200  0.160  0.165  0.177
NuPIC         0.243  0.254  0.264  0.310  0.290  0.224  0.240  0.264

Persistence   0.302  0.338  0.373  0.364  0.388  0.341  0.361  0.355

Table 4.1: NRMSE scores of the entries published in Hong et al. [2014]. The NuPIC and Expektra models are added so we can easily compare the results.

4.1 Experimental results

Looking at the primary results presented in this study, we see in Table 4.1 that Expektra's ANN model is able to achieve performance comparable to the other methods published with Hong et al. [2014]. This can be used as a starting point for further investigations. A similar approach to Expektra's is Duehee Lee's [Lee and Baldick, 2014], as both methods are based on the same neural network architecture but with different optimization techniques. Duehee Lee's network is structured slightly differently from Expektra's model, which consists of only one output node, whereas Duehee Lee uses five (representing five predictions ahead). Besides these differences, Duehee Lee has also included additional sub-models to create a prediction ensemble, blending the output of the ANN with a Gaussian Process (GP) to improve very short-term predictions. MTU EE5260 is also a neural network approach that converts meteorological forecasts to power, and here the score is very close. The HTM CLA beats the persistence model but needs more work to outperform the other models in GEFCom.


4.2 Input Importance

For interpretation purposes, in order to understand the model better, an analysis of the relative input parameter importance was performed. This analysis is illustrated in Figure 4.11. Each box represents added noise to that channel, which will result in a higher NRMSE score if that feature was important. A reference point, "all-channels", represents the error distribution of the model with no input replacement. We clearly see in this figure that the wind speed channel ws is the most important attribute; adding noise to this channel greatly affects the NRMSE score. We also see that the wind components u, v show little to no influence, and that the timestamp-related inputs hours and week both indicate that there are seasonal and daily trends present in the dataset.

Figure 4.11: Relative input parameter importance using HIPR. "all-channels" reflects the reference point: the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind. (Boxplot of NRMSE per noised channel: all-channels, hours, u, v, week, ws, ws-1, ws-2, ws-3, ws+1, ws+2, ws+3.)
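The HIPR procedure described above can be sketched in a few lines. This is a minimal illustration of Holdback Input Randomization [Kemp et al., 2007], not the exact code used in the study: the toy model, data, and the choice of uniform noise are assumptions made for the example.

```python
import numpy as np

def hipr_scores(predict, X, y, seed=0):
    """Holdback Input Randomization: corrupt one input channel at a time
    with uniform noise over that channel's range and record the resulting
    NRMSE. Channels whose corruption raises the error the most are the
    most important to the model."""
    rng = np.random.default_rng(seed)
    nrmse = lambda p: float(np.sqrt(np.mean((y - p) ** 2)))
    scores = {"all-channels": nrmse(predict(X))}  # reference point, no noise
    for j in range(X.shape[1]):
        Xn = X.copy()
        Xn[:, j] = rng.uniform(X[:, j].min(), X[:, j].max(), size=len(X))
        scores[j] = nrmse(predict(Xn))
    return scores

# Toy example: the "model" uses only channel 0, so corrupting channel 1
# leaves the error unchanged while corrupting channel 0 raises it.
X = np.column_stack([np.linspace(0, 1, 100), np.linspace(5, 6, 100)])
y = X[:, 0]
scores = hipr_scores(lambda X: X[:, 0], X, y)
```

In the thesis the role of `predict` is played by the trained MLP, and the channels correspond to ws, hours, week, u, v and the lagged/led wind speeds.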


4.2.1 Adaptation and Optimization

All training for Expektra's model was performed on a MacBook Pro with an Intel Core 2 Duo processor (P8600, 2.40 GHz) on a 64-bit operating system with 4 GB of installed RAM. After adapting the code to be able to run experiments using the GEFCom dataset, an investigation was performed to evaluate the performance of the core source code given by Expektra. This investigation identified some slow parts of the code; after fixing these issues, mainly by using a Math.NET native library instead of the managed provider, a speed test was performed comparing the unoptimized version with the optimized one. Figure 4.12 shows the speed-up achieved during training using the LM algorithm.

Figure 4.12: Plot showing the unoptimized version vs. the optimized one when training a network with 10 input neurons; 10, 15, 20, 25 hidden neurons; and 1 output neuron. Training was done for 100 epochs using LM. (Panels: training time vs. number of hidden neurons; series: Normal, Optimized.)

4.3 Summary

A plot showing the average NRMSE improvement over the persistence model can be seen in Figure 4.13. We see that both models are able to perform better than persistence. Expektra's model performs better than the NuPIC model for the whole forecast


horizon, and NuPIC performs better than persistence towards the end of the forecast.

Figure 4.13: Summarized average improvement over all wind farms, with 95% confidence intervals. (Improvement (%) in NRMSE over persistence vs. look-ahead time in hours; series: Expektra, NuPIC.)
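To make the improvement score concrete, the sketch below computes NRMSE for the persistence reference and the relative improvement of a model over it. This is an illustration under the assumption that power is already normalized to [0, 1], as in the GEFCom data; the short series is made up, and the two NRMSE values in the last assertion are taken from the "All" column of Table 4.1.

```python
import numpy as np

def nrmse(y_true, y_pred):
    """Normalized root-mean-square error. Power here is already
    normalized to [0, 1], so no division by installed capacity is needed."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def persistence(power, k):
    """Persistence reference: the k-hour-ahead forecast is simply the
    last observed value, p_hat(t + k) = p(t). Returns the forecast
    series and the matching observations."""
    power = np.asarray(power)
    return power[:-k], power[k:]

def improvement(model_nrmse, persistence_nrmse):
    """Improvement over persistence in percent, as plotted in Figure 4.13."""
    return 100.0 * (persistence_nrmse - model_nrmse) / persistence_nrmse

forecast, observed = persistence([0.2, 0.4, 0.5, 0.3, 0.6], k=1)
base = nrmse(observed, forecast)
```

For example, with the "All" scores from Table 4.1 (Expektra 0.165, persistence 0.355), `improvement(0.165, 0.355)` gives roughly the 50%+ average seen for Expektra in Figure 4.13.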


Chapter 5

Discussion

With the exponential growth of computer power, it becomes easier to study deeper and more complex networks, and with recent advances in the area of deep learning, interest in ANNs has resurged. In practice, ANNs have been used by most groups in the field of WPF, but these networks never caught on, as it was argued that the improvements made by ANNs were usually not enough to justify the extra effort of training them [Giebel et al., 2011]. This is steadily becoming less of an issue as computation becomes cheaper and bigger networks perform better.

Using a simple MLP network, we observe that we are able to obtain results similar in performance to other models published in Hong et al. [2014]. This can be used as a starting point for further investigations. The GEFCom dataset has a limited number of input features; if we had access to a wider range of them, the power of neural networks could be studied more in depth. Given the number of features in the dataset, this is harder to do.

Using persistence as a reference model was done in order to have the same baseline as the one used in GEFCom, but a better reference model should be considered in the future, such as the one presented in Nielsen et al. [1998], especially if longer forecasts are to be considered.

5.1 Method development issues

Working with Expektra's model is very straightforward, and no major issues were encountered during development, but some issues were encountered working with NuPIC.1

1 It should also be pointed out that it helps to have the people who developed the code on the spot, and this is probably the main reason for the little trouble.


CHAPTER 5 DISCUSSION

1. Installing NuPIC is difficult. This seems to be a general problem, judging by posts in the NuPIC mailing list.2 A very important issue to fix if they want more people to work with this model.

2. Using the built-in swarming (PSO) did not work well, especially for slightly larger datasets and for 48 prediction steps. The main problem was fitting everything into memory, and the code ran too slowly when swarming over multiple steps ahead, making it practically impossible to use. Scaling down the problem by using just a few prediction steps and multiple models helped, but the swarm model kept dismissing wind speed as an important feature, which was fixed by specifically telling it not to touch that encoder.

3. Inconsistencies between the white-paper documentation of HTM CLA and the actual implementation make it hard to understand the underlying principles of NuPIC. NuPIC presents a very complex network, and any inconsistencies in documentation make it hard to understand. The code is quite well documented, but the additional material is a bit sparse.

These issues are all understandable and are continuously being improved upon. NuPIC is still a young platform, so finding issues is to be expected, given the complexity and research nature of the project. Training the CLAClassifier to use many step predictions instead of just one will most likely reduce the bias error seen in the results. To investigate this, a more powerful computer is needed.

There are some differences in the inputs between the models. This setup was created because an OPF model needs to encode the value it is going to predict. SCADA data is available, so this is not an issue, but it may introduce some unfairness between NuPIC and Expektra. In the current setup, NuPIC does not seem to gain anything extra from the additional information, and other models in the competition will most likely use different inputs as well.

In general, NuPIC is different in the way you feed data: you do so by sending in the front of the signal, and temporal context is achieved through the temporal memory.

5.2 Future improvements and directions

One issue with the GEFCom dataset is that it does not contain any specific information about the location of these 7 wind farms. It is worth considering taking a look at how to handle different models that are specialized for different types

2 http://numenta.org/lists


of terrain, as this has been shown to increase performance [Kariniotakis et al., 2006]. Another thing to investigate could be a more advanced scheme for hyperparameter selection; Jursa and Rohrig [2008] show that the WPF error can be reduced by smarter use of optimization algorithms for feature and hyperparameter selection, which should improve the performance.

In general, having more types of inputs from sources like SCADA and NWP systems could give a lot of valuable information that can be used for better predictions. It is clear that wind speed is a key feature for producing good forecasts, which is also supported by the input analysis in Section 4.2. It has been argued that the error in WPF models stems largely from wrong weather forecasts [Giebel et al., 2011], so identifying good sources of weather forecasts would be desirable for more effective model learning. Combining several different weather forecasts could also be investigated, as Nielsen et al. [2007] showed that power forecasts based on a number of different meteorological forecasts were better than those based on a single source.

Regarding HTM CLA, it is worth considering implementing a custom encoder for it, one specifically targeted at data concerning wind farms. SCADA data could also possibly be streamed directly into a NuPIC model, which would allow for some interesting online predictions and anomaly detections.

Training times are always an issue, especially for very large models. Both NuPIC and Expektra's model would benefit from being implemented in a more parallel fashion, as these algorithms are very well suited for parallel computing. The speed-up would be of great value, so looking into and implementing a GPU version of the algorithms could be worthwhile.

5.3 Conclusions

The field of energy forecasting is still young, and the necessity for good forecasting techniques is constantly growing. This initial study shows that part of Expektra's forecasting method can be used to forecast wind power generation. We observe that this ANN is able to achieve performance comparable to other methods published in Hong et al. [2014]. Additional work is still needed to obtain state-of-the-art results. Numenta's model is able to beat the reference model but still needs additional work before we can draw definite conclusions about its performance. Constructing a custom encoder could be a first step in that direction.


Bibliography

Subutai Ahmad and Jeff Hawkins. Properties of sparse distributed representations and their application to hierarchical temporal memory. arXiv preprint arXiv:1503.07469, 2015.

T. C. Akinci. Short term wind speed forecasting with ANN in Batman, Turkey. Elektronika ir Elektrotechnika, 107(1):41–45, 2015.

E. Michael Azoff. Neural Network Time Series Forecasting of Financial Markets. John Wiley & Sons, Inc., 1994.

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13(1):281–305, 2012.

Daniel P. Buxhoeveden and Manuel F. Casanova. The minicolumn hypothesis in neuroscience. Brain, 125(5):935–951, 2002.

Erasmo Cadenas and Wilfrido Rivera. Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks. Renewable Energy, 34(1):274–278, 2009.

Dmitri B. Chklovskii, B. W. Mel, and K. Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782–788, 2004.

A. Costa, A. Crespo, J. Navarro, G. Lizcano, H. Madsen, and E. Feitosa. A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews, 12(6):1725–1744, August 2008. ISSN 1364-0321. doi: 10.1016/j.rser.2007.01.015. URL http://dx.doi.org/10.1016/j.rser.2007.01.015.

Russ C. Eberhart and James Kennedy. A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science, volume 1, pages 39–43. New York, NY, 1995.


Shu Fan, James R. Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. Energy Conversion, IEEE Transactions on, 24(2):474–482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento: a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276. Citeseer, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition EWEC 2008, 6 pages. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving Hierarchical Temporal Memory-Based Trading Models. Springer, 2013.

M. A. Gaertner, C. Gallardo, C. Tejeda, N. Martínez, S. Calabria, N. Martínez, and B. Fernández. The Casandra project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: The next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: The next generation prediction system. In Proc. of the 2001 European Wind Energy Conference, EWEC'01, Copenhagen, Denmark, pages 777–780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: A literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G. Lobo. Sipreólico: wind power prediction tool for the Spanish peninsular power system. Proceedings of the CIGRÉ 40th General Session and Exhibition, París, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On Intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357–363, 2014.


Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41–46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585–592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694–709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference, EWEC'07, 2007.

G. Kariniotakis. Position paper on JOULE project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J. Halliday, R. Brownsword, Ignacio Marti, Ana Maria Palomares, I. Cruz, H. Madsen, T. S. Nielsen, Henrik Aa. Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power: overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, 10 pages, 2006.

G. N. Kariniotakis, G. S. Stavrakakis, and E. F. Nogaret. Wind power forecasting using advanced neural networks models. Energy Conversion, IEEE Transactions on, 11(4):762–767, 1996.

Stanley J. Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326–334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275–293, 2009.

Lars Landberg and Simon J. Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171–195, 1994.

S. Lang, C. Mohrlen, J. Jorgensen, B. Ó Gallachóir, and E. McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International Solar Energy Society UK Section Conference, volume 85, page 89, 2006.


S. M. Lawan, W. A. W. Z. Abidin, W. Y. Chai, A. Baharun, and T. Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760–1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. Smart Grid, IEEE Transactions on, 5(1):501–510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets, 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164–168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313–2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154–3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa. Nielsen, and Torben S. Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475–489, 2005.

Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431–441, 1963.

Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons. 1969.

Marvin Lee Minsky and Oliver G. Selfridge. Learning in Random Nets. MIT Lincoln Laboratory, 1960.

Jorge J. Moré. The Levenberg-Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105–116. Springer, 1978.

Henrik Aa. Nielsen, Torben S. Nielsen, Henrik Madsen, Maria J. Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471–482, 2007.


Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29–34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power, 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159–167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms, v0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33–57, 2007.

A. Rodrigues, J. A. Peças Lopes, P. Miranda, L. Palma, C. Monteiro, R. Bessa, J. Sousa, C. Rodrigues, and J. Matos. EPREV: a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference, EWEC, volume 7, 2007.

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5:3, 1988.

Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85–117, 2015.

S. Sinkevicius, R. Simutis, and V. Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91–96, 2011.

Ke-Sheng Wang, Vishal S. Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61–69, 2014.

WWEA. 2014 half-year report, pp. 1–8. Technical report, WWEA, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365–376, 2013.

Hao Yu and Bogdan M. Wilamowski. Levenberg-Marquardt training. Industrial Electronics Handbook, 5:12-1, 2011.


Appendix A

Hyper-parameters

Parameters for the spatial pooler

Alias                Default  Description
columnCount          -        The number of cell columns in a cortical region.
globalInhibition     false    If true, during the inhibition phase the winning columns are selected as the most active columns from the region as a whole.
numActivePerInhArea  10       The maximum number of active columns per inhibition area.
synPermActiveInc     0.1      The amount by which an active synapse is incremented in each round, specified as a percent of a fully grown synapse.
synPermConnected     0.10     Controls the threshold at which synapses are considered connected.
synPermInactiveDec   0.01     The amount by which an inactive synapse is decremented in each round.
potentialRadius      16       Determines the extent of the input that each column can potentially be connected to.

Table A.1: Configuration parameters for the spatial pooler.


APPENDIX A HYPER-PARAMETERS

Parameters for the scalar encoder

Alias       Symbol  Description
w           w       Number of bits to set in the output.
minval      vmin    The lower bound of the input value.
maxval      vmax    The upper bound of the input value.
n           n       Number of bits in the representation (n must be > w).
radius      r       Inputs separated by more than or equal to this distance will have non-overlapping representations.
resolution  ψ       Inputs separated by more than or equal to this distance will have different representations.

Table A.2: Configuration parameters for the scalar encoder.
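To make these parameters concrete, the following is a minimal sketch of how a NuPIC-style scalar encoder maps a value onto n bits with a contiguous block of w active bits, so that nearby values share active bits. It deliberately ignores the radius/resolution handling and the exact bucket arithmetic of the real ScalarEncoder, and the default parameter values here are example assumptions.

```python
def scalar_encode(value, vmin, vmax, n=14, w=3):
    """Map a scalar in [vmin, vmax] to an n-bit vector with a contiguous
    run of w active bits. Overlapping representations for nearby values
    are what the spatial pooler consumes downstream."""
    assert n > w  # n must be > w (see Table A.2)
    buckets = n - w + 1                    # possible start positions
    clipped = min(max(value, vmin), vmax)  # clip into the valid range
    frac = (clipped - vmin) / (vmax - vmin)
    start = int(round(frac * (buckets - 1)))
    bits = [0] * n
    bits[start:start + w] = [1] * w
    return bits

low = scalar_encode(0, 0, 10)    # active block at the left edge
high = scalar_encode(10, 0, 10)  # active block at the right edge
```

With these settings, every encoding has exactly w active bits, and values closer together in [vmin, vmax] produce bit vectors with more overlapping ones.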


Parameters for the temporal memory

Alias                  Default  Description
activationThreshold    12       Activation threshold for segments.
cellsPerColumn         32       Number of cells per column.
columnCount            2048     The number of cell columns in a cortical region.
globalDecay            0.10     Decrements all synapses a little bit all the time.
initialPerm            0.11     Initial permanence value for a synapse.
inputWidth             -        Size of the input.
maxAge                 100000   Controls global decay. Global decay will only decay segments that have not been activated for maxAge iterations, and will only run the global decay loop every maxAge iterations.
maxSegmentsPerCell     -        The maximum number of segments a cell can have.
maxSynapsesPerSegment  -        The maximum number of synapses a segment can have.
minThreshold           8        The minimum required activity for a segment to learn.
newSynapseCount        15       The maximum number of synapses added to a segment during learning.
permanenceDec          0.10     How much permanence is removed from synapses when learning occurs.
permanenceInc          0.10     How much permanence is added to synapses when learning occurs.
temporalImp            cpp/py   Controls which temporal memory implementation to use.

Table A.3: Configuration parameters for the temporal memory.


Appendix B

Wind characteristics

Figure B.1: Wind characteristics for WF 1 and WF 2, the GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated, the stronger the more power.


APPENDIX B WIND CHARACTERISTICS

Figure B.2: Wind characteristics for WF 3-7, the GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated, the stronger the more power.


Appendix C

Error Distribution


APPENDIX C ERROR DISTRIBUTION

Figure C.1: Error distribution for different lead times, WF 1. (Panels: error histograms, frequency vs. error, for lead times 48, 40, 30, 20, 10, and 1; NuPIC model.)


Figure C.2: Error distribution for different lead times, WF 2. (Panels: error histograms, frequency vs. error, for lead times 48, 40, 30, 20, 10, and 1; NuPIC model.)

63


[Figure: six histograms of the forecast error for WF 3 using NuPIC, one panel per lead time (48, 40, 30, 20, 10, 1); x-axis: error, −1.0 to 1.0; y-axis: frequency, 0 to 70]

Figure C.3: Error distribution for different lead times, WF 3.

64

[Figure: six histograms of the forecast error for WF 4 using NuPIC, one panel per lead time (48, 40, 30, 20, 10, 1); x-axis: error, −1.0 to 1.0; y-axis: frequency, 0 to 70]

Figure C.4: Error distribution for different lead times, WF 4.

65


[Figure: six histograms of the forecast error for WF 5 using NuPIC, one panel per lead time (48, 40, 30, 20, 10, 1); x-axis: error, −1.0 to 1.0; y-axis: frequency, 0 to 70]

Figure C.5: Error distribution for different lead times, WF 5.

66

[Figure: six histograms of the forecast error for WF 6 using NuPIC, one panel per lead time (48, 40, 30, 20, 10, 1); x-axis: error, −1.0 to 1.0; y-axis: frequency, 0 to 70]

Figure C.6: Error distribution for different lead times, WF 6.

67


[Figure: six histograms of the forecast error for WF 7 using NuPIC, one panel per lead time (48, 40, 30, 20, 10, 1); x-axis: error, −1.0 to 1.0; y-axis: frequency, 0 to 70]

Figure C.7: Error distribution for different lead times, WF 7.

68

[Figure: six histograms of the forecast error for WF 1 using Expektra, one panel per lead time (48, 40, 30, 20, 10, 1); x-axis: error, −1.0 to 1.0; y-axis: frequency, 0 to 70]

Figure C.8: Error distribution for different lead times, WF 1.

69


[Figure: six histograms of the forecast error for WF 2 using Expektra, one panel per lead time (48, 40, 30, 20, 10, 1); x-axis: error, −1.0 to 1.0; y-axis: frequency, 0 to 70]

Figure C.9: Error distribution for different lead times, WF 2.

70

[Figure: six histograms of the forecast error for WF 3 using Expektra, one panel per lead time (48, 40, 30, 20, 10, 1); x-axis: error, −1.0 to 1.0; y-axis: frequency, 0 to 70]

Figure C.10: Error distribution for different lead times, WF 3.

71


[Figure: six histograms of the forecast error for WF 4 using Expektra, one panel per lead time (48, 40, 30, 20, 10, 1); x-axis: error, −1.0 to 1.0; y-axis: frequency, 0 to 70]

Figure C.11: Error distribution for different lead times, WF 4.

72

[Figure: six histograms of the forecast error for WF 5 using Expektra, one panel per lead time (48, 40, 30, 20, 10, 1); x-axis: error, −1.0 to 1.0; y-axis: frequency, 0 to 70]

Figure C.12: Error distribution for different lead times, WF 5.

73


[Figure: six histograms of the forecast error for WF 6 using Expektra, one panel per lead time (48, 40, 30, 20, 10, 1); x-axis: error, −1.0 to 1.0; y-axis: frequency, 0 to 70]

Figure C.13: Error distribution for different lead times, WF 6.

74

[Figure: six histograms of the forecast error for WF 7 using Expektra, one panel per lead time (48, 40, 30, 20, 10, 1); x-axis: error, −1.0 to 1.0; y-axis: frequency, 0 to 70]

Figure C.14: Error distribution for different lead times, WF 7.

75

List of Figures

2.1 A figure that presents the general outline when forecasting using the statistical approach . . . 6

2.2 A figure that presents the general steps when forecasting using a physical model . . . 7

3.1 The perceptron . . . 20
3.2 Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of H hidden layers, as well as M input connections. Each edge seen in this graph has a weight wij associated with it . . . 21
3.3 Information flow of a single-region predictive model created with the OPF . . . 23
3.4 The CLAClassifier . . . 28
3.5 Training an OPF model . . . 29

4.1 Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) . . . 31

4.2 Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) . . . 32

4.3 Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) . . . 32

4.4 Different error measurements for WF 1 . . . 33
4.5 Different error measurements for WF 2 . . . 34
4.6 Different error measurements for WF 3 . . . 35
4.7 Different error measurements for WF 4 . . . 36
4.8 Different error measurements for WF 5 . . . 37

76

4.9 Different error measurements for WF 6 . . . 38
4.10 Different error measurements for WF 7 . . . 39
4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point: the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind . . . 41
4.12 Plot showing the unoptimized version vs the optimized one when training a network with 10 input neurons, 10/15/20/25 hidden neurons, and 1 output neuron. Training was done for 100 epochs using LM . . . 42
4.13 Summarized average improvement over all wind farms with 95% confidence intervals . . . 43

B.1 Wind characteristics for WF 1 and WF 2 in the GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power . . . 59
B.2 Wind characteristics for WF 3-7 in the GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power . . . 60

C.1 Error distribution for different lead times, WF 1 . . . 62
C.2 Error distribution for different lead times, WF 2 . . . 63
C.3 Error distribution for different lead times, WF 3 . . . 64
C.4 Error distribution for different lead times, WF 4 . . . 65
C.5 Error distribution for different lead times, WF 5 . . . 66
C.6 Error distribution for different lead times, WF 6 . . . 67
C.7 Error distribution for different lead times, WF 7 . . . 68
C.8 Error distribution for different lead times, WF 1 . . . 69
C.9 Error distribution for different lead times, WF 2 . . . 70
C.10 Error distribution for different lead times, WF 3 . . . 71
C.11 Error distribution for different lead times, WF 4 . . . 72
C.12 Error distribution for different lead times, WF 5 . . . 73
C.13 Error distribution for different lead times, WF 6 . . . 74

77

C.14 Error distribution for different lead times, WF 7 . . . 75

List of Tables

3.1 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, missing data with power observations are available for updating the models . . . 16

3.2 Preprocessed features from the dataset. The Forecast category represents the forecast given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available gives the features we will use in training and testing . . . 17

3.3 Example where n = 14, r = 5, ψ = 1 of various encoded scalar values using a ScalarEncoder . . . 24

4.1 NRMSE score of the entries published in [Hong et al., 2014]. The NuPIC model and Expektra model are added so we can easily compare the results . . . 40

A.1 Table containing configuration parameters for the encoder . . . 55
A.2 Table containing configuration parameters for the spatial pooler . . . 56
A.3 Table containing configuration parameters for the temporal memory . . . 57

78

www.kth.se

  • Contents
  • Introduction
    • Problem Formulation
    • The scope of the problem
  • Background
    • Neural Networks and Time Series Prediction
  • Method and Materials
    • Preliminaries
      • Remarks
      • Definitions
      • Reference models
      • Error metrics
      • Model selection
      • Evaluation
    • Experiments
    • Holdback Input Randomization
    • Optimization methods
    • Neural Networks
      • Multilayer Perceptron
      • Numenta Platform for Intelligent Computing
  • Result
    • Experimental results
    • Input Importance
    • Adaptation and Optimization
    • Summary
  • Discussion
    • Method development issues
    • Future improvements and directions
    • Conclusions
  • Bibliography
  • Appendices
    • Hyper-parameters
    • Wind characteristics
    • Error Distribution
  • List of Figures
  • List of Tables

CHAPTER 4 RESULT

[Figure: four panels for Wind Farm 4 comparing Expektra, NuPIC, and Persistence: NBIAS, NRMSE, and NMAE against look-ahead time k (in hours), and cumulated ε² against time (in hours)]

Figure 4.7: Different error measurements for WF 4.

36

[Figure: four panels for Wind Farm 5 comparing Expektra, NuPIC, and Persistence: NBIAS, NRMSE, and NMAE against look-ahead time k (in hours), and cumulated ε² against time (in hours)]

Figure 4.8: Different error measurements for WF 5.

37


[Figure: four panels for Wind Farm 6 comparing Expektra, NuPIC, and Persistence: NBIAS, NRMSE, and NMAE against look-ahead time k (in hours), and cumulated ε² against time (in hours)]

Figure 4.9: Different error measurements for WF 6.

38

[Figure: four panels for Wind Farm 7 comparing Expektra, NuPIC, and Persistence: NBIAS, NRMSE, and NMAE against look-ahead time k (in hours), and cumulated ε² against time (in hours)]

Figure 4.10: Different error measurements for WF 7.

39


User          WF 1   WF 2   WF 3   WF 4   WF 5   WF 6   WF 7   All

Leustagos     0.145  0.138  0.168  0.144  0.158  0.133  0.140  0.146
DuckTile      0.143  0.145  0.172  0.145  0.165  0.137  0.146  0.148
MZ            0.141  0.151  0.174  0.145  0.167  0.141  0.145  0.149
Propeller     0.144  0.153  0.177  0.147  0.175  0.141  0.147  0.152
Duehee Lee    0.157  0.144  0.176  0.160  0.169  0.154  0.148  0.155
Expektra      0.165  0.158  0.184  0.164  0.179  0.153  0.153  0.165
MTU EE5260    0.161  0.172  0.193  0.162  0.192  0.156  0.160  0.168
SunWind       0.174  0.177  0.193  0.176  0.179  0.157  0.162  0.172
ymzsmsd       0.163  0.186  0.200  0.164  0.192  0.162  0.167  0.174
4138 Kalchas  0.180  0.179  0.197  0.175  0.200  0.160  0.165  0.177
NuPIC         0.243  0.254  0.264  0.310  0.290  0.224  0.240  0.264

Persistence   0.302  0.338  0.373  0.364  0.388  0.341  0.361  0.355

Table 4.1: NRMSE score of the entries published in [Hong et al., 2014]. The NuPIC model and Expektra model are added so we can easily compare the results.

4.1 Experimental results

Looking at the primary results presented in this study, we see in Table 4.1 that Expektra's ANN model is able to achieve performance comparable to other methods published with Hong et al. [2014]. This can be used as a starting point for further investigations. A similar approach to Expektra's is Duehee Lee's [Lee and Baldick, 2014], as both methods are based on the same neural network architecture but with different optimization techniques. Duehee Lee's network is structured slightly differently from Expektra's model, which consists of only one output node, whereas Duehee Lee uses five (representing five prediction steps ahead). Besides these differences, Duehee Lee has also included additional sub-models to create a prediction ensemble, blending the output of the ANN with a Gaussian Process (GP) to improve very short-term predictions. MTU EE5260 is also a neural network approach that converts meteorological forecasts to power, and here the score is very close. The HTM CLA beats the persistence model but needs more work to outperform the other models in GEFCom.
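Since GEFCom power observations are normalized to [0, 1], the NRMSE scores in Table 4.1 reduce to the RMSE of the normalized series. A minimal sketch of the metric, with a toy series rather than competition data:

```python
import math

def nrmse(forecast, observed):
    """Root-mean-square error of normalized power (both series in [0, 1]).

    With power already normalized by installed capacity, RMSE of the
    normalized series is the normalized RMSE reported in Table 4.1.
    """
    errors = [(f - o) ** 2 for f, o in zip(forecast, observed)]
    return math.sqrt(sum(errors) / len(errors))

# toy example: a forecast off by 0.1 at every hour has NRMSE 0.1
obs = [0.2, 0.5, 0.7, 0.4]
fc = [o + 0.1 for o in obs]
print(round(nrmse(fc, obs), 3))  # 0.1
```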

40


4.2 Input Importance

For interpretation purposes, in order to understand the model better, an analysis of the relative input parameter importance was performed. This analysis is illustrated in Figure 4.11. Each box represents added noise to that channel and will result in a higher NRMSE score if that feature was important. A reference point, "all-channels", represents the error distribution of the model with no input replacement. We clearly see in this figure that the wind speed channel ws is the most important attribute; adding noise to this channel will greatly affect the NRMSE score. We also see that the wind components u, v show little to no influence, and that the timestamp-related inputs hours and week both indicate that there are seasonal and daily trends present in the dataset.
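The Holdback Input Randomization procedure behind this analysis [Kemp et al., 2007] can be sketched as follows. This is a simplified illustration, not the thesis implementation: `model` is any callable mapping an input matrix to predictions, and the noise source is assumed uniform over a channel normalized to [0, 1]:

```python
import random

def hipr_importance(model, X, y, error_fn, n_channels, repeats=30):
    """Holdback Input Randomization: replace one input channel at a time
    with noise and record the resulting error; channels whose
    randomization inflates the error the most are the most important."""
    scores = {}
    for ch in range(n_channels):
        errs = []
        for _ in range(repeats):
            X_noisy = [row[:] for row in X]          # copy the input matrix
            for row in X_noisy:
                row[ch] = random.uniform(0.0, 1.0)   # scramble one channel
            errs.append(error_fn(model(X_noisy), y))
        scores[ch] = sum(errs) / len(errs)           # mean error per channel
    return scores
```

Run against a toy model that only reads channel 0, the score for channel 0 rises while the ignored channel keeps the baseline error, which is exactly the pattern Figure 4.11 shows for ws versus u and v.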

[Figure: NRMSE distributions when noise is added to one channel at a time: all-channels (reference), hours, u, v, week, ws, ws−1, ws−2, ws−3, ws+1, ws+2, ws+3; y-axis: NRMSE, 0.2 to 0.6]

Figure 4.11: Relative input parameter importance using HIPR. "all-channels" reflects the reference point: the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind.

41


4.2.1 Adaptation and Optimization

All training for Expektra's model was performed on a MacBook Pro with an Intel Core 2 Duo processor (P8600, 2.40 GHz) on a 64-bit operating system with 4 GB of installed RAM. After adapting the code to run experiments on the GEFCom dataset, an investigation was performed to evaluate the performance of the core source code given by Expektra. This investigation identified some slow parts of the code; after fixing these issues, mainly by using a Math.NET native library instead of the managed provider, a speed test was performed comparing the unoptimized version with the optimized one. Figure 4.12 shows the speed-up achieved during training using the LM algorithm.

[Figure: training time for the Normal vs the Optimized implementation, for networks with 10 to 25 hidden neurons]

Figure 4.12: Plot showing the unoptimized version vs the optimized one when training a network with 10 input neurons, 10/15/20/25 hidden neurons, and 1 output neuron. Training was done for 100 epochs using LM.

4.3 Summary

A plot that shows the average NRMSE improvement over the persistence model can be seen in Figure 4.13. We see that both models are able to perform better than persistence. Expektra's model performs better than the NuPIC model for the whole forecast

42


horizon, and NuPIC performs better than persistence towards the end of the forecast horizon.

[Figure: improvement (%) in NRMSE over persistence against look-ahead time (0 to 50 hours), for Expektra and NuPIC]

Figure 4.13: Summarized average improvement over all wind farms, with 95% confidence intervals.
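The improvement measure plotted in Figure 4.13 is the relative NRMSE reduction against the persistence baseline. A sketch of the computation; the figure averages this per lead time over all farms, whereas the numbers below are illustrative, taken from the "All" column of Table 4.1:

```python
def improvement_over_persistence(nrmse_model, nrmse_persistence):
    """Percentage NRMSE reduction relative to the persistence baseline."""
    return 100.0 * (nrmse_persistence - nrmse_model) / nrmse_persistence

# "All" column of Table 4.1: Expektra 0.165, NuPIC 0.264, persistence 0.355
print(round(improvement_over_persistence(0.165, 0.355), 1))  # 53.5
print(round(improvement_over_persistence(0.264, 0.355), 1))  # 25.6
```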

43

Chapter 5

Discussion

With the exponential growth of computing power it becomes easier to study deeper and more complex networks, and with recent advances in the area of deep learning a new interest in ANNs has resurged. In practice, ANNs have been used by most groups in the field of WPF, but these networks never caught on, as it was argued that the improvements made by ANNs were usually not enough to justify the extra effort of training them [Giebel et al., 2011]. This is steadily becoming less of an issue as computation becomes cheaper and bigger networks perform better.

By using a simple MLP network, we observe that we are able to obtain results similar in performance to other models published in Hong et al. [2014]. This can be used as a starting point for further investigations. The GEFCom dataset has a limited number of input features; if we had access to a wider range of them, the power of neural networks could be studied more in depth. Given the number of features in the dataset, this is harder to do.

Using persistence as a reference model was done in order to have the same baseline as the one used in GEFCom, but a better reference model should be considered in the future, such as the one presented in Nielsen et al. [1998], especially if longer forecasts are to be considered.
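The persistence baseline used throughout is simply "the future equals the present". A minimal sketch, with the stronger Nielsen et al. [1998] alternative noted in a comment:

```python
def persistence_forecast(power_history, k):
    """Persistence: predict that production k hours ahead equals the
    latest observed value, regardless of the horizon k. Nielsen et al.
    [1998] instead blend the latest value with the series mean, which
    makes a harder-to-beat baseline at longer horizons."""
    return power_history[-1]

history = [0.31, 0.35, 0.40]  # normalized power, most recent last
print(persistence_forecast(history, k=24))  # 0.4
```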

5.1 Method development issues

Working with Expektra's model is very straightforward and no major issues were encountered during development, but some issues were encountered working with NuPIC.1

1It should also be pointed out that it helps to have the people who developed the code on the spot, and this is probably the main reason there was so little trouble.

45

CHAPTER 5 DISCUSSION

1. Installing NuPIC is difficult. This seems to be a general problem, judging by posts in the NuPIC mailing list.2 A very important topic to fix if they want more people to work with this model.

2. Using the built-in swarming (PSO) did not work well, especially for slightly larger datasets and for 48 prediction steps. The main problems were fitting everything into memory and that the code ran too slowly when swarming over multiple steps ahead, making it practically impossible to use. Scaling down the problem, by having just a few prediction steps and multiple models, helped, but the swarm model kept dismissing wind speed as an important feature; this was fixed by specifically telling it not to touch that encoder.

3. Inconsistencies between the white-paper documentation of HTM CLA and the actual implementation make it hard to understand the underlying principles of NuPIC. NuPIC presents a very complex network, and any inconsistency in documentation makes it hard to understand. The code is quite well documented, but the additional material is a bit sparse.

These issues are all understandable and are continuously being improved upon. NuPIC is still a young platform, so finding issues is expected given the complexity and research nature of the project. Training the CLAClassifier to use many step predictions instead of just one will most likely reduce the BIAS error seen in the results. To investigate this, a more powerful computer is needed.

There are some differences in the inputs between the models. This setup was created because an OPF model needs to encode the value it is going to predict. SCADA data is available, so this is not an issue, but it may introduce some unfairness between NuPIC and Expektra. In the current setup NuPIC does not seem to gain anything extra from the additional information, and other models in the competition will most likely use different inputs as well.

In general, NuPIC differs in the way you feed data: you send in only the front of the signal, and temporal context is achieved through the temporal memory.

5.2 Future improvements and directions

One issue with the GEFCom dataset is that it does not contain any specific information about the location of these 7 wind farms. It is worth considering taking a look at how to handle different models that are specialized for different types

2http://numenta.org/lists

46


of terrain, as this has been shown to increase performance [Kariniotakis et al., 2006]. Another thing to investigate could be a more advanced scheme for hyperparameter selection: Jursa and Rohrig [2008] show that the WPF error can be reduced by smarter use of optimization algorithms for feature selection and hyperparameter selection, which should improve performance.

In general, having more types of inputs from sources like SCADA and NWP systems could give a lot of valuable information that can be used for better predictions. It is clear that wind speed is a key feature for producing good forecasts, which is also supported by the input analysis in section 4.2. It has been argued that the error in WPF models stems largely from wrong weather forecasts [Giebel et al., 2011], so identifying good sources of weather forecasts would be desirable for more effective model learning. Combining several different weather forecasts could also be investigated, as Nielsen et al. [2007] showed that power forecasts based on a number of different meteorological forecasts were better than those from a single source.
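A simple way to combine predictions driven by several meteorological forecasts, in the spirit of Nielsen et al. [2007], is a weighted average. A sketch; the source names and weights below are hypothetical, and in practice the weights would be fitted on held-out data:

```python
def combine_forecasts(forecasts, weights):
    """Weighted combination of power forecasts from different NWP sources.

    `forecasts` is a list of equal-length series; the weights are
    assumed to sum to one so the combination stays in [0, 1]."""
    assert abs(sum(weights) - 1.0) < 1e-9
    return [sum(w * f[i] for w, f in zip(weights, forecasts))
            for i in range(len(forecasts[0]))]

ecmwf_based = [0.30, 0.50]  # hypothetical NWP-driven forecast series
gfs_based = [0.40, 0.40]
print([round(v, 2) for v in
       combine_forecasts([ecmwf_based, gfs_based], [0.6, 0.4])])  # [0.34, 0.46]
```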

Regarding HTM CLA, it is worth considering implementing a custom encoder for it: an encoder specifically targeted at data concerning wind farms. SCADA data could also possibly be streamed directly into a NuPIC model, which would allow for some interesting online predictions and anomaly detection.

Training times are always an issue, especially for very large models. Both NuPIC and Expektra's model would benefit from being implemented in a more parallel fashion, as these algorithms are very well suited for parallel computing. The speed-up would be of great value, so looking into and implementing a GPU version of the algorithms could be worthwhile.

5.3 Conclusions

The field of energy forecasting is still young and the necessity for good forecasting techniques is constantly growing. This initial study shows that part of Expektra's forecasting method can be used to forecast wind power generation. We observe that this ANN is able to achieve performance comparable to other methods published in Hong et al. [2014]. Additional work is still needed to obtain state-of-the-art results. Numenta's model is able to beat the reference model but still needs additional work before we can draw definite conclusions about its performance. Constructing a custom encoder could be a first step in that direction.

47

Bibliography

Subutai Ahmad and Jeff Hawkins. Properties of sparse distributed representations and their application to hierarchical temporal memory. arXiv preprint arXiv:1503.07469, 2015.

T. C. Akinci. Short term wind speed forecasting with ANN in Batman, Turkey. Elektronika ir Elektrotechnika, 107(1):41–45, 2015.

E. Michael Azoff. Neural Network Time Series Forecasting of Financial Markets. John Wiley & Sons, Inc., 1994.

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13(1):281–305, 2012.

Daniel P. Buxhoeveden and Manuel F. Casanova. The minicolumn hypothesis in neuroscience. Brain, 125(5):935–951, 2002.

Erasmo Cadenas and Wilfrido Rivera. Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks. Renewable Energy, 34(1):274–278, 2009.

Dmitri B. Chklovskii, B. W. Mel, and K. Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782–788, 2004.

A. Costa, A. Crespo, J. Navarro, G. Lizcano, H. Madsen, and E. Feitosa. A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews, 12(6):1725–1744, August 2008. ISSN 1364-0321. doi: 10.1016/j.rser.2007.01.015. URL http://dx.doi.org/10.1016/j.rser.2007.01.015.

Russ C. Eberhart and James Kennedy. A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science, volume 1, pages 39–43, New York, NY, 1995.

49


Shu Fan, James R. Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. Energy Conversion, IEEE Transactions on, 24(2):474–482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento: a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276. Citeseer, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition EWEC 2008, 6 pages. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving Hierarchical Temporal Memory-Based Trading Models. Springer, 2013.

M. A. Gaertner, C. Gallardo, C. Tejeda, N. Martínez, S. Calabria, N. Martínez, and B. Fernández. The Casandra project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. In Proc. of the 2001 European Wind Energy Conference, EWEC'01, Copenhagen, Denmark, pages 777–780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: A literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G. Lobo. Sipreólico: wind power prediction tool for the Spanish peninsular power system. In Proceedings of the CIGRÉ 40th General Session and Exhibition, París, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On Intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357–363, 2014.


Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41–46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585–592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694–709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference, EWEC'07, 2007.

G. Kariniotakis. Position paper on Joule project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J. Halliday, R. Brownsword, Ignacio Marti, Ana Maria Palomares, I. Cruz, H. Madsen, T. S. Nielsen, Henrik Aa. Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power: overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, 10 pages, 2006.

G. N. Kariniotakis, G. S. Stavrakakis, and E. F. Nogaret. Wind power forecasting using advanced neural networks models. Energy Conversion, IEEE Transactions on, 11(4):762–767, 1996.

Stanley J. Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326–334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275–293, 2009.

Lars Landberg and Simon J. Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171–195, 1994.

S. Lang, C. Mohrlen, J. Jorgensen, B. O. Gallachóir, and E. McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International Solar Energy Society UK Section Conference, volume 85, page 89, 2006.


S. M. Lawan, W. A. W. Z. Abidin, W. Y. Chai, A. Baharun, and T. Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760–1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. Smart Grid, IEEE Transactions on, 5(1):501–510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets, 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164–168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313–2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154–3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa. Nielsen, and Torben S. Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475–489, 2005.

Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431–441, 1963.

Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons. 1969.

Marvin Lee Minsky and Oliver G. Selfridge. Learning in random nets. MIT Lincoln Laboratory, 1960.

Jorge J. Moré. The Levenberg-Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105–116. Springer, 1978.

Henrik Aa. Nielsen, Torben S. Nielsen, Henrik Madsen, Maria J. Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471–482, 2007.


Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29–34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power, 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159–167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms, v0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33–57, 2007.

A. Rodrigues, J. A. Peças Lopes, P. Miranda, L. Palma, C. Monteiro, R. Bessa, J. Sousa, C. Rodrigues, and J. Matos. EPREV: a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference, EWEC, volume 7, 2007.

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5:3, 1988.

Jürgen Schmidhuber. Deep learning in neural networks: an overview. Neural Networks, 61:85–117, 2015.

S. Sinkevicius, R. Simutis, and V. Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91–96, 2011.

Ke-Sheng Wang, Vishal S. Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61–69, 2014.

WWEA. 2014 half-year report, pages 1–8. Technical report, WWEA, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365–376, 2013.

Hao Yu and Bogdan M. Wilamowski. Levenberg-Marquardt training. Industrial Electronics Handbook, 5:12–1, 2011.


Appendix A

Hyper-parameters

Parameters for the spatial pooler

Alias                Default  Description
columnCount          -        The number of cell columns in a cortical region.
globalInhibition     false    If true, in the inhibition phase the winning columns are selected as the most active columns from the region as a whole.
numActivePerInhArea  10       The maximum number of active columns per inhibition area.
synPermActiveInc     0.1      The amount by which an active synapse is incremented in each round. Specified as a percent of a fully grown synapse.
synPermConnected     0.10     Controls the threshold at which synapses become connected.
synPermInactiveDec   0.01     The amount by which an inactive synapse is decremented in each round.
potentialRadius      16       Determines the extent of the input that each column can potentially be connected to.

Table A.1: Configuration parameters for the spatial pooler.


Parameters for the scalar encoder

Alias       Symbol  Description
w           w       Number of bits to set in the output.
minval      v_min   The lower bound of the input value.
maxval      v_max   The upper bound of the input value.
n           n       Number of bits in the representation (n must be > w).
radius      r       Inputs separated by more than or equal to this distance will have non-overlapping representations.
resolution  ψ       Inputs separated by more than or equal to this distance will have different representations.

Table A.2: Configuration parameters for the scalar encoder.
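To make the encoder parameters above concrete, here is a minimal sketch of a scalar encoder in the spirit of NuPIC's ScalarEncoder: a value in [minval, maxval] is mapped to an n-bit vector with w consecutive active bits. This is a simplified assumption of the real implementation (bucket placement and clipping details may differ in NuPIC).

```python
# Simplified scalar encoder sketch (assumption: not the exact NuPIC algorithm).
def encode_scalar(value, w=3, minval=0.0, maxval=10.0, n=14):
    value = max(minval, min(maxval, value))       # clip to [minval, maxval]
    resolution = (maxval - minval) / (n - w)      # value distance per bucket
    i = int(round((value - minval) / resolution)) # index of first active bit
    bits = [0] * n
    for j in range(i, i + w):                     # set w consecutive bits
        bits[j] = 1
    return bits
```

Nearby values share most of their active bits, which is what gives the spatial pooler overlapping representations to work with.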


Parameters for the temporal memory

Alias                  Default  Description
activationThreshold    12       Activation threshold for segments.
cellsPerColumn         32       Number of cells per column.
columnCount            2048     The number of cell columns in a cortical region.
globalDecay            0.10     Decrements all synapses a little bit all the time.
initialPerm            0.11     Initial permanence value for a synapse.
inputWidth             -        Size of the input.
maxAge                 100000   Controls global decay. Global decay will only decay segments that have not been activated for maxAge iterations, and will only run the global decay loop every maxAge iterations.
maxSegmentsPerCell     -        The maximum number of segments a cell can have.
maxSynapsesPerSegment  -        The maximum number of synapses a segment can have.
minThreshold           8        The minimum required activity for a segment to learn.
newSynapseCount        15       The maximum number of synapses added to a segment during learning.
permanenceDec          0.10     How much permanence is removed from synapses when learning occurs.
permanenceInc          0.10     How much permanence is added to synapses when learning occurs.
temporalImp            cpp/py   Controls which temporal memory implementation to use.

Table A.3: Configuration parameters for the temporal memory.
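As a hedged illustration, the tables above might be collected into a NuPIC-style model-parameter dictionary roughly as follows. The key names follow the tables; the exact OPF schema and the grouping under "spParams"/"tpParams" are assumptions and vary between NuPIC versions.

```python
# Hypothetical sketch of a NuPIC-style parameter dictionary built from
# the defaults in Tables A.1 and A.3 (not the thesis's actual config file).
MODEL_PARAMS = {
    "spParams": {                      # spatial pooler (Table A.1)
        "columnCount": 2048,
        "globalInhibition": 1,
        "numActivePerInhArea": 10,
        "synPermActiveInc": 0.1,
        "synPermConnected": 0.10,
        "synPermInactiveDec": 0.01,
        "potentialRadius": 16,
    },
    "tpParams": {                      # temporal memory (Table A.3)
        "activationThreshold": 12,
        "cellsPerColumn": 32,
        "columnCount": 2048,
        "globalDecay": 0.10,
        "initialPerm": 0.11,
        "maxAge": 100000,
        "minThreshold": 8,
        "newSynapseCount": 15,
        "permanenceDec": 0.10,
        "permanenceInc": 0.10,
        "temporalImp": "cpp",
    },
}
```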


Appendix B

Wind characteristics

Figure B.1: Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components. The intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Figure B.2: Wind characteristics for WF 3–7, GEFCom dataset, showing zonal and meridional wind components. The intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Appendix C

Error Distribution

[Figures C.1–C.7: Error distributions (histograms of error in [−1, 1] vs. frequency) for lead times 1, 10, 20, 30, 40, and 48 hours, for WF 1–7 using NuPIC.]

[Figures C.8–C.14: Error distributions for the same lead times, for WF 1–7 using Expektra.]

List of Figures

2.1  A figure that presents the general outline when forecasting using the statistical approach
2.2  A figure that presents the general steps when forecasting using a physical model
3.1  The perceptron
3.2  Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of H hidden layers, as well as M input connections. Each edge seen in this graph has a weight w_ij associated with it
3.3  Information flow of a single-region predictive model created with the OPF
3.4  The CLAClassifier
3.5  Training an OPF model
4.1  Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs. wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.2  Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.3  Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.4  Different error measurements for WF 1
4.5  Different error measurements for WF 2
4.6  Different error measurements for WF 3
4.7  Different error measurements for WF 4
4.8  Different error measurements for WF 5
4.9  Different error measurements for WF 6
4.10 Different error measurements for WF 7
4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point: the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind
4.12 Plot showing the unoptimized version vs. the optimized one when training a network with 10 input neurons; 10, 15, 20, 25 hidden neurons; 1 output neuron. Training was done for 100 epochs using LM
4.13 Summarized average improvement over all wind farms with 95% confidence intervals
B.1  Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power
B.2  Wind characteristics for WF 3–7, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power
C.1  Error distribution for different lead times, WF 1
C.2  Error distribution for different lead times, WF 2
C.3  Error distribution for different lead times, WF 3
C.4  Error distribution for different lead times, WF 4
C.5  Error distribution for different lead times, WF 5
C.6  Error distribution for different lead times, WF 6
C.7  Error distribution for different lead times, WF 7
C.8  Error distribution for different lead times, WF 1
C.9  Error distribution for different lead times, WF 2
C.10 Error distribution for different lead times, WF 3
C.11 Error distribution for different lead times, WF 4
C.12 Error distribution for different lead times, WF 5
C.13 Error distribution for different lead times, WF 6
C.14 Error distribution for different lead times, WF 7

List of Tables

3.1  The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, missing data with power observations are available for updating the models
3.2  Preprocessed features from the dataset. The Forecast category represents the forecast given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the feature we will use in training and testing
3.3  Example where n = 14, r = 5, ψ = 1, of various encoded scalar values using a ScalarEncoder
4.1  NRMSE score of the entries published in [Hong et al., 2014]. The NuPIC model and Expektra model are added so we can easily compare the results
A.1  Configuration parameters for the spatial pooler
A.2  Configuration parameters for the scalar encoder
A.3  Configuration parameters for the temporal memory



[Figure 4.8: Different error measurements for Wind Farm 5. Four panels vs. look-ahead time k (in hours): NBIAS, NRMSE, cumulated squared error ε², and NMAE, comparing Expektra, NuPIC, and Persistence.]

[Figure 4.9: Different error measurements for Wind Farm 6 (NBIAS, NRMSE, cumulated ε², and NMAE vs. look-ahead time; Expektra, NuPIC, Persistence).]

[Figure 4.10: Different error measurements for Wind Farm 7 (NBIAS, NRMSE, cumulated ε², and NMAE vs. look-ahead time; Expektra, NuPIC, Persistence).]


                                  Wind Farm
User          1      2      3      4      5      6      7      All
Leustagos     0.145  0.138  0.168  0.144  0.158  0.133  0.140  0.146
DuckTile      0.143  0.145  0.172  0.145  0.165  0.137  0.146  0.148
MZ            0.141  0.151  0.174  0.145  0.167  0.141  0.145  0.149
Propeller     0.144  0.153  0.177  0.147  0.175  0.141  0.147  0.152
Duehee Lee    0.157  0.144  0.176  0.160  0.169  0.154  0.148  0.155
Expektra      0.165  0.158  0.184  0.164  0.179  0.153  0.153  0.165
MTU EE5260    0.161  0.172  0.193  0.162  0.192  0.156  0.160  0.168
SunWind       0.174  0.177  0.193  0.176  0.179  0.157  0.162  0.172
ymzsmsd       0.163  0.186  0.200  0.164  0.192  0.162  0.167  0.174
4138 Kalchas  0.180  0.179  0.197  0.175  0.200  0.160  0.165  0.177
NuPIC         0.243  0.254  0.264  0.310  0.290  0.224  0.240  0.264
Persistence   0.302  0.338  0.373  0.364  0.388  0.341  0.361  0.355

Table 4.1: NRMSE score of the entries published in [Hong et al., 2014]. The NuPIC model and Expektra model are added so we can easily compare the results.

4.1 Experimental results

Looking at the primary results presented in this study, we see in Table 4.1 that Expektra's ANN model achieves performance comparable to the other methods published with Hong et al. [2014]. This can be used as a starting point for further investigations. A similar approach to Expektra is Duehee Lee [Lee and Baldick, 2014], as both methods are based on the same neural network architecture but with different optimization techniques. Duehee Lee's network is structured slightly differently from Expektra's model, which consists of only one output node, whereas Duehee Lee uses five (representing five predictions ahead). Besides these differences, Duehee Lee also includes additional sub-models to create a prediction ensemble, blending the output of the ANN with a Gaussian Process (GP) to improve very short-term predictions. MTU EE5260 is also a neural network approach that converts meteorological forecasts to power, and here the score is very close. The HTM CLA beats the persistence model but needs more work to outperform the other models in GEFCom.
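For reference, the normalized error measures compared above can be sketched as follows. This assumes errors are computed on capacity-normalized power in [0, 1], following the evaluation protocol of Madsen et al. [2005]; the exact normalization used by each GEFCom entrant is not restated here.

```python
import math

# Sketch of the normalized error metrics (NBIAS, NMAE, NRMSE) on
# capacity-normalized power series of equal length.
def nbias(pred, actual):
    return sum(p - a for p, a in zip(pred, actual)) / len(pred)

def nmae(pred, actual):
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(pred)

def nrmse(pred, actual):
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(pred))
```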


4.2 Input Importance

For interpretation purposes, in order to understand the model better, an analysis of the relative input parameter importance was performed. This analysis is illustrated in Figure 4.11. Each box represents added noise to that channel, which results in a higher NRMSE score if that feature was important. A reference point, "all-channels", represents the error distribution of the model with no input replacement. We clearly see in this figure that the wind speed channel ws is the most important attribute: adding noise to this channel greatly affects the NRMSE score. We also see that the wind components u, v show little to no influence, and that the timestamp-related inputs hours and week both indicate that there are seasonal and daily trends present in the dataset.
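The procedure above can be sketched in a few lines: scramble one channel at a time and re-score the model. This follows the holdback input randomization idea of Kemp et al. [2007]; `model.predict`, the channel layout, and the uniform noise range are hypothetical placeholders.

```python
import random

# Sketch of Holdback Input Randomization (HIPR): replace one input channel
# at a time with noise and measure the resulting NRMSE per channel.
def hipr(model, X, y, nrmse, n_channels, seed=0):
    rng = random.Random(seed)
    scores = {}
    for c in range(n_channels):
        noisy = [row[:] for row in X]       # copy rows; other channels intact
        for row in noisy:
            row[c] = rng.uniform(0.0, 1.0)  # scramble channel c only
        scores[c] = nrmse(model.predict(noisy), y)
    return scores  # higher score => that channel was more important
```

Comparing each channel's score against the "all-channels" reference then ranks the inputs by importance.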

[Figure 4.11: Relative input parameter importance using HIPR. Boxes show NRMSE per perturbed channel (all-channels, hours, u, v, week, ws, ws−1, ws−2, ws−3, ws+1, ws+2, ws+3). "all-channels" reflects the reference point: the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind.]


4.2.1 Adaptation and Optimization

All training for Expektra's model was performed on a MacBook Pro with an Intel Core 2 Duo processor (P8600, 2.40 GHz) and a 64-bit operating system with 4 GB of installed RAM. After adapting the code to run experiments on the GEFCom dataset, an investigation was performed to evaluate the performance of the core source code provided by Expektra. This investigation identified some slow parts of the code. After fixing these issues, mainly by using a Math.NET native library instead of the managed provider, a speed test was performed comparing the unoptimized version with the optimized one. Figure 4.12 shows the speed-up achieved during training with the LM algorithm.
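For context, the core of LM training is repeated damped Gauss-Newton steps, which is where the linear-algebra backend dominates the runtime. A minimal sketch of one Levenberg-Marquardt weight update (standard textbook form; the actual Expektra/Math.NET implementation is not shown in the thesis):

```python
import numpy as np

# One Levenberg-Marquardt step: w <- w - (J^T J + mu I)^{-1} J^T e,
# where J is the Jacobian of the residuals e w.r.t. the weights w.
def lm_step(w, jacobian, residuals, mu):
    J = np.asarray(jacobian)                  # shape (n_samples, n_weights)
    e = np.asarray(residuals)                 # shape (n_samples,)
    H = J.T @ J + mu * np.eye(J.shape[1])     # damped Gauss-Newton Hessian
    return w - np.linalg.solve(H, J.T @ e)
```

With a small damping factor mu and a linear model, a single step reduces to the ordinary least-squares solution, which makes the update easy to sanity-check.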

[Figure 4.12: Training time, unoptimized vs. optimized version, for a network with 10 input neurons; 10, 15, 20, 25 hidden neurons; 1 output neuron. Training was done for 100 epochs using LM.]

4.3 Summary

A plot that shows the average NRMSE improvement over the persistence model can be seen in Figure 4.13. We see that both models are able to perform better than persistence. Expektra's model performs better than the NuPIC model over the whole forecast horizon, and NuPIC performs better than persistence towards the end of the forecast.
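The improvement measure plotted here can be read as the percentage reduction in NRMSE relative to persistence (an assumption about the exact formula, which is not restated in the text):

```python
# Percentage improvement of a model's NRMSE over the persistence baseline.
def improvement(nrmse_model, nrmse_persistence):
    return 100.0 * (nrmse_persistence - nrmse_model) / nrmse_persistence
```

For example, with the "All" column of Table 4.1, Expektra's 0.165 against persistence's 0.355 corresponds to roughly a 54% improvement.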

[Figure 4.13: Summarized average improvement (%) in NRMSE over persistence across all wind farms, with 95% confidence intervals; Expektra and NuPIC vs. look-ahead time (in hours).]


Chapter 5

Discussion

With the exponential growth of computing power it becomes easier to study deeper and more complex networks, and with recent advances in the area of deep learning, interest in ANNs has resurged. In practice, ANNs have been used by most groups in the field of WPF, but these networks never caught on, as it was argued that the improvements made by ANNs were usually not worth the extra effort of training them [Giebel et al., 2011]. This is steadily becoming less of an issue as computation becomes cheaper and bigger networks perform better.

By using a simple MLP network we observe that we are able to obtain results similar in performance to other models published in Hong et al. [2014]. This can be used as a starting point for further investigations. The GEFCom dataset has a limited number of input features; with access to a wider range of features the power of neural networks could be studied more in depth, but given the features actually available in the dataset this is harder to do.

Using persistence as a reference model was done in order to have the same baseline as the one used in GEFCom, but a better reference model should be considered in the future, such as the one presented in Nielsen et al. [1998], especially if longer forecast horizons are to be considered.
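For reference, the persistence baseline and the improved reference model, as we read Nielsen et al. [1998], can be sketched as follows (function names are illustrative): persistence repeats the last observation, while the new reference blends it with the series mean, weighted by the lag-k autocorrelation, so that long horizons fall back towards climatology.

```python
import numpy as np

def persistence(power, k):
    """Persistence: the last observed value is the k-step-ahead forecast."""
    return power[-1]

def new_reference(power, k):
    """Blend of persistence and the series mean, weighted by the empirical
    lag-k autocorrelation a_k (in the spirit of Nielsen et al. [1998])."""
    p = np.asarray(power, dtype=float)
    mean = p.mean()
    d = p - mean
    a_k = np.dot(d[:-k], d[k:]) / np.dot(d, d)  # empirical autocorrelation
    return a_k * p[-1] + (1.0 - a_k) * mean
```

For short horizons a_k is close to 1 and the blend behaves like persistence; for long horizons it approaches the mean, which is why persistence alone is a weak baseline at 48 hours.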

5.1 Method development issues

Working with Expektra's model is very straightforward and no major issues were encountered during development, but some issues were encountered working with NuPIC.1

1 It should also be pointed out that it helps to have the people who developed the code on the spot, and this is probably the main reason for the little trouble encountered.


1. Installing NuPIC is difficult. This seems to be a general problem, judging by posts on the NuPIC mailing list.2 This is an important issue to fix if they want more people to work with this model.

2. The built-in swarming (PSO) did not work well, especially for slightly larger datasets and for 48 prediction steps. The main problems were fitting everything into memory and that the code ran too slowly when swarming over multiple step-ahead horizons, making it practically impossible to use. Scaling the problem down to just a few prediction steps and multiple models helped, but the swarm model kept dismissing wind speed as an important feature, which was fixed by explicitly telling it not to touch that encoder.

3. Inconsistencies between the white-paper documentation of HTM CLA and the actual implementation make it hard to understand the underlying principles of NuPIC. NuPIC presents a very complex network, and any inconsistencies in documentation make it hard to understand. The code is quite well documented, but the additional material is a bit sparse.

These issues are all understandable and are continuously being improved upon. NuPIC is still a young platform, so issues are to be expected given the complexity and research nature of the project. Training the CLAClassifier to use many step predictions instead of just one will most likely reduce the bias error seen in the results. To investigate this, a more powerful computer is needed.

There are some differences in the input between the models. This setup was created because an OPF model needs to encode the value it is going to predict. SCADA data is available, so this is not an issue, but it may introduce some unfairness between NuPIC and Expektra. In the current setup NuPIC does not seem to gain anything extra from the additional information, and other models in the competition will most likely use different inputs as well.

In general, NuPIC differs in the way you feed data: you send in only the front of the signal, one record at a time, and temporal context is achieved through the temporal memory.
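A toy illustration of this streaming style, with the temporal context held inside the model rather than packed into the input row (the class and its crude averaging rule are illustrative stand-ins, not NuPIC's temporal memory):

```python
from collections import deque

class StreamingForecaster:
    """Toy online model in the NuPIC style: records arrive one at a time,
    and temporal context lives inside the model, not in the input row."""

    def __init__(self, context=3):
        self.history = deque(maxlen=context)  # stands in for temporal memory

    def run(self, record):
        self.history.append(record["power"])
        # naive prediction from internal state: mean of recent values
        return sum(self.history) / len(self.history)

model = StreamingForecaster()
preds = [model.run({"power": p, "ws": w})
         for p, w in [(0.2, 5), (0.4, 6), (0.6, 7)]]
```

An MLP, by contrast, receives the temporal context explicitly, as lagged features (ws-1, ws-2, ...) in every input row.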

5.2 Future improvements and directions

One issue with the GEFCom dataset is that it does not contain any specific information about the location of the 7 wind farms. It is worth considering taking a look at how to handle different models that are specialized for different types

2 http://numenta.org/lists


of terrain, as this has been shown to increase performance [Kariniotakis et al., 2006]. Another thing to investigate could be a more advanced scheme for hyperparameter selection. Jursa and Rohrig [2008] show that the WPF error can be reduced by smarter use of optimization algorithms for feature selection and hyperparameter selection, which should improve performance.
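One such scheme, random search over a discrete hyperparameter space (Bergstra and Bengio [2012]), can be sketched as below. The space and the objective function are hypothetical; in practice the objective would be the validation NRMSE of a trained model.

```python
import random

def random_search(evaluate, space, n_iter=20, seed=0):
    """Random search over a hyperparameter space: `space` maps each
    parameter name to a list of candidate values; lower score is better."""
    rng = random.Random(seed)
    best, best_score = None, float("inf")
    for _ in range(n_iter):
        params = {name: rng.choice(values) for name, values in space.items()}
        score = evaluate(params)  # e.g. validation NRMSE of a trained MLP
        if score < best_score:
            best, best_score = params, score
    return best, best_score

space = {"hidden_neurons": [10, 15, 20, 25], "lags": [1, 2, 3]}
# hypothetical objective: pretend 20 hidden neurons and 3 lags are optimal
best, score = random_search(
    lambda p: abs(p["hidden_neurons"] - 20) + abs(p["lags"] - 3), space)
```

Random search is a simple drop-in alternative to the grid and evolutionary approaches discussed here, and is easy to parallelize across trials.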

In general, having more types of inputs from sources like SCADA and NWP systems could give a lot of valuable information that can be used for better predictions. It is clear that wind speed is a key feature for producing good forecasts, which is also supported by the input analysis in section 4.2. It has been argued that the error in WPF models stems largely from wrong weather forecasts [Giebel et al., 2011], so identifying good sources of weather forecasts would be desirable for more effective model learning. Combining several different weather forecasts could also be investigated, as Nielsen et al. [2007] showed that power forecasts based on a number of different meteorological forecasts were better than those from a single source.
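A simple way to combine competing forecasts is to fit least-squares weights against past observations, in the spirit of Nielsen et al. [2007]. The sketch below uses made-up numbers; a real setup would fit separate weights per lead time.

```python
import numpy as np

def combination_weights(forecasts, observed):
    """Least-squares weights for combining several competing forecasts.
    `forecasts` is (n, m): n samples, m forecast providers."""
    F = np.asarray(forecasts, dtype=float)
    y = np.asarray(observed, dtype=float)
    w, *_ = np.linalg.lstsq(F, y, rcond=None)
    return w

# hypothetical example: truth is the average of two biased forecasts
f1 = np.array([0.2, 0.4, 0.6, 0.8])
f2 = np.array([0.4, 0.6, 0.8, 1.0])
y = 0.5 * f1 + 0.5 * f2
w = combination_weights(np.column_stack([f1, f2]), y)
```

The fitted weights then combine new forecasts as a weighted sum, downweighting systematically worse providers automatically.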

Regarding HTM CLA, it is worth considering implementing a custom encoder, one specifically targeted at data concerning wind farms. SCADA data could also be streamed directly into a NuPIC model, which would allow for some interesting online predictions and anomaly detection.

Training times are always an issue, especially for very large models. Both NuPIC and Expektra's model would benefit from being implemented in a more parallel fashion, as these algorithms are very well suited for parallel computing. The speed-up would be of great value, so looking into and implementing a GPU version of the algorithms could be worthwhile.

5.3 Conclusions

The field of energy forecasting is still young and the necessity for good forecasting techniques is constantly growing. This initial study shows that part of Expektra's forecasting method can be used to forecast wind power generation. We observe that this ANN is able to achieve performance comparable to other methods published in Hong et al. [2014]. Additional work is still needed to obtain state-of-the-art results. Numenta's model is able to beat the reference model but still needs additional work before we can draw definite conclusions about its performance. Constructing a custom encoder could be a first step in that direction.


Bibliography

Subutai Ahmad and Jeff Hawkins. Properties of sparse distributed representations and their application to hierarchical temporal memory. arXiv preprint arXiv:1503.07469, 2015.

T.C. Akinci. Short term wind speed forecasting with ANN in Batman, Turkey. Elektronika ir Elektrotechnika, 107(1):41–45, 2015.

E. Michael Azoff. Neural network time series forecasting of financial markets. John Wiley & Sons, Inc., 1994.

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13(1):281–305, 2012.

Daniel P. Buxhoeveden and Manuel F. Casanova. The minicolumn hypothesis in neuroscience. Brain, 125(5):935–951, 2002.

Erasmo Cadenas and Wilfrido Rivera. Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks. Renewable Energy, 34(1):274–278, 2009.

Dmitri B. Chklovskii, B.W. Mel, and K. Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782–788, 2004.

A. Costa, A. Crespo, J. Navarro, G. Lizcano, H. Madsen, and E. Feitosa. A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews, 12(6):1725–1744, August 2008. ISSN 1364-0321. doi: 10.1016/j.rser.2007.01.015. URL http://dx.doi.org/10.1016/j.rser.2007.01.015.

Russ C. Eberhart and James Kennedy. A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science, volume 1, pages 39–43, New York, NY, 1995.


Shu Fan, James R. Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. Energy Conversion, IEEE Transactions on, 24(2):474–482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento - a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276. Citeseer, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition, EWEC 2008, 6 pages. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving hierarchical temporal memory-based trading models. Springer, 2013.

M.A. Gaertner, C. Gallardo, C. Tejeda, N. Martínez, S. Calabria, N. Martínez, and B. Fernández. The Casandra project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. In Proc. of the 2001 European Wind Energy Conference, EWEC'01, Copenhagen, Denmark, pages 777–780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: a literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G. Lobo. Sipreólico - wind power prediction tool for the Spanish peninsular power system. In Proceedings of the CIGRÉ 40th General Session and Exhibition, París, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357–363, 2014.


Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41–46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585–592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694–709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference, EWEC'07, 2007.

G. Kariniotakis. Position paper on Joule project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J. Halliday, R. Brownsword, Ignacio Marti, Ana Maria Palomares, I. Cruz, H. Madsen, T.S. Nielsen, Henrik Aa. Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power - overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, 10 pages, 2006.

G.N. Kariniotakis, G.S. Stavrakakis, and E.F. Nogaret. Wind power forecasting using advanced neural networks models. Energy Conversion, IEEE Transactions on, 11(4):762–767, 1996.

Stanley J. Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326–334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275–293, 2009.

Lars Landberg and Simon J. Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171–195, 1994.

S. Lang, C. Mohrlen, J. Jorgensen, B. O Gallachóir, and E. McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International Solar Energy Society UK Section Conference, volume 85, page 89, 2006.


S.M. Lawan, W.A.W.Z. Abidin, W.Y. Chai, A. Baharun, and T. Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760–1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. Smart Grid, IEEE Transactions on, 5(1):501–510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets, 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164–168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313–2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154–3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa. Nielsen, and Torben S. Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475–489, 2005.

Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431–441, 1963.

Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons. 1969.

Marvin Lee Minsky and Oliver G. Selfridge. Learning in random nets. MIT Lincoln Laboratory, 1960.

Jorge J. Moré. The Levenberg-Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105–116. Springer, 1978.

Henrik Aa. Nielsen, Torben S. Nielsen, Henrik Madsen, Maria J. Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471–482, 2007.


Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29–34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power, 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159–167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms, v0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33–57, 2007.

A. Rodrigues, J.A. Peças Lopes, P. Miranda, L. Palma, C. Monteiro, R. Bessa, J. Sousa, C. Rodrigues, and J. Matos. EPREV - a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference, EWEC, volume 7, 2007.

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5:3, 1988.

Jürgen Schmidhuber. Deep learning in neural networks: an overview. Neural Networks, 61:85–117, 2015.

S. Sinkevicius, R. Simutis, and V. Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91–96, 2011.

Ke-Sheng Wang, Vishal S. Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61–69, 2014.

WWEA. 2014 half-year report, WWEA, pp. 1–8. Technical report, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365–376, 2013.

Hao Yu and Bogdan M. Wilamowski. Levenberg-Marquardt training. Industrial Electronics Handbook, 5:12-1, 2011.


Appendix A

Hyper-parameters

Parameters for the spatial pooler

columnCount (-): The number of cell columns in a cortical region.

globalInhibition (false): If true, during the inhibition phase the winning columns are selected as the most active columns from the region as a whole.

numActivePerInhArea (10): The maximum number of active columns per inhibition area.

synPermActiveInc (0.1): The amount by which an active synapse is incremented in each round, specified as a percent of a fully grown synapse.

synPermConnected (0.10): Controls the threshold at which synapses are connected.

synPermInactiveDec (0.01): The amount by which an inactive synapse is decremented in each round.

potentialRadius (16): Determines the extent of the input that each column can potentially be connected to.

Table A.1: Configuration parameters for the spatial pooler.
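A greatly simplified sketch of how some of these parameters interact in one global-inhibition step (assumed behavior for illustration, not NuPIC's source): synapses with permanence at or above synPermConnected count towards a column's overlap with the input, and the numActivePerInhArea columns with the largest overlap win.

```python
import numpy as np

def spatial_pooler_step(input_bits, permanences,
                        syn_perm_connected=0.10, num_active=10):
    """One global-inhibition step of a (greatly simplified) spatial pooler:
    columns compete globally and those with the largest overlap win."""
    connected = permanences >= syn_perm_connected  # synPermConnected threshold
    overlaps = connected @ input_bits              # overlap score per column
    k = min(num_active, len(overlaps))             # numActivePerInhArea winners
    winners = np.argsort(overlaps)[::-1][:k]
    return set(winners.tolist())
```

With globalInhibition set to false, the same competition would instead run locally within each column's inhibition radius.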


Parameters for the scalar encoder

w (symbol w): Number of bits to set in the output.

minval (symbol vmin): The lower bound of the input value.

maxval (symbol vmax): The upper bound of the input value.

n (symbol n): Number of bits in the representation (n must be > w).

radius (symbol r): Inputs separated by more than or equal to this distance will have non-overlapping representations.

resolution (symbol ψ): Inputs separated by more than or equal to this distance will have different representations.

Table A.2: Configuration parameters for the scalar encoder.
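A minimal sketch of how these parameters shape an encoding: w contiguous active bits slide across n positions as the value moves from minval to maxval, so nearby values share active bits while distant values do not (illustrative only, not NuPIC's exact ScalarEncoder).

```python
def scalar_encode(value, minval=0.0, maxval=1.0, n=14, w=3):
    """Encode a scalar as n bits with w contiguous active bits; the
    position of the active block tracks the value's place in the range."""
    value = min(max(value, minval), maxval)  # clip to [minval, maxval]
    n_buckets = n - w + 1
    # round to the nearest bucket start index
    i = int((value - minval) / (maxval - minval) * (n_buckets - 1) + 0.5)
    bits = [0] * n
    for j in range(i, i + w):
        bits[j] = 1
    return bits
```

Because adjacent values map to overlapping blocks of bits, the downstream spatial pooler sees similar inputs as similar patterns, which is the point of this encoding.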


activationThreshold (12): Activation threshold for segments.

cellsPerColumn (32): Number of cells per column.

columnCount (2048): The number of cell columns in a cortical region.

globalDecay (0.10): Decrements all synapses a little bit all the time.

initialPerm (0.11): Initial permanence value for a synapse.

inputWidth (-): Size of the input.

maxAge (100000): Controls global decay. Global decay will only decay segments that have not been activated for maxAge iterations, and will only run the global decay loop every maxAge iterations.

maxSegmentsPerCell (-): The maximum number of segments a cell can have.

maxSynapsesPerSegment (-): The maximum number of synapses a segment can have.

minThreshold (8): The minimum required activity for a segment to learn.

newSynapseCount (15): The maximum number of synapses added to a segment during learning.

permanenceDec (0.10): How much permanence is removed from synapses when learning occurs.

permanenceInc (0.10): How much permanence is added to synapses when learning occurs.

temporalImp (cpp/py): Controls which temporal memory implementation to use.

Table A.3: Configuration parameters for the temporal memory.
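The permanenceInc and permanenceDec parameters drive a simple Hebbian-style update, sketched here as assumed behavior for illustration: synapses to presynaptic cells that were active are reinforced, the rest decay, and permanence stays within [0, 1].

```python
def update_permanences(perms, active_presyn, perm_inc=0.10, perm_dec=0.10):
    """Hebbian-style learning rule: synapses to active presynaptic cells
    gain permanenceInc, the rest lose permanenceDec, clipped to [0, 1]."""
    out = []
    for perm, active in zip(perms, active_presyn):
        perm = perm + perm_inc if active else perm - perm_dec
        out.append(min(1.0, max(0.0, perm)))
    return out
```

A synapse whose permanence crosses the connected threshold (synPermConnected in the spatial pooler, or the segment's own threshold here) starts contributing to segment activation, which is how repeated temporal patterns get memorized.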


Appendix B

Wind characteristics

Figure B.1: Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components. The intensity (alpha) of each dot reflects how much normalized power was generated: the stronger the dot, the more power.


Figure B.2: Wind characteristics for WF 3-7, GEFCom dataset, showing zonal and meridional wind components. The intensity (alpha) of each dot reflects how much normalized power was generated: the stronger the dot, the more power.


Appendix C

Error Distribution


Figure C.1: Error distribution for different lead times (1, 10, 20, 30, 40, and 48 hours), WF 1, NuPIC model. Each panel plots error on the x-axis against frequency on the y-axis.

Figure C.2: Error distribution for different lead times, WF 2, NuPIC model.

Figure C.3: Error distribution for different lead times, WF 3, NuPIC model.

Figure C.4: Error distribution for different lead times, WF 4, NuPIC model.

Figure C.5: Error distribution for different lead times, WF 5, NuPIC model.

Figure C.6: Error distribution for different lead times, WF 6, NuPIC model.

Figure C.7: Error distribution for different lead times, WF 7, NuPIC model.

Figure C.8: Error distribution for different lead times, WF 1, Expektra model.

Figure C.9: Error distribution for different lead times, WF 2, Expektra model.

Figure C.10: Error distribution for different lead times, WF 3, Expektra model.

Figure C.11: Error distribution for different lead times, WF 4, Expektra model.

Figure C.12: Error distribution for different lead times, WF 5, Expektra model.

Figure C.13: Error distribution for different lead times, WF 6, Expektra model.

Figure C.14: Error distribution for different lead times, WF 7, Expektra model.

List of Figures

2.1 A figure that presents the general outline when forecasting using the statistical approach
2.2 A figure that presents the general steps when forecasting using a physical model
3.1 The perceptron
3.2 Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of the H hidden layers, as well as M input connections. Each edge seen in this graph has a weight wij associated with it
3.3 Information flow of a single-region predictive model created with the OPF
3.4 The CLAClassifier
3.5 Training an OPF model
4.1 Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs. wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.2 Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.3 Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.4 Different error measurements for WF 1
4.5 Different error measurements for WF 2
4.6 Different error measurements for WF 3
4.7 Different error measurements for WF 4
4.8 Different error measurements for WF 5
4.9 Different error measurements for WF 6
4.10 Different error measurements for WF 7
4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point: the network when no channel has been exposed to noise. ws = wind speed; hours and week = timestamps; u, v is a directional vector of the wind
4.12 Plot showing the unoptimized version vs. the optimized one when training a network with 10 input neurons, 10/15/20/25 hidden neurons, and 1 output neuron. Training was done for 100 epochs using LM
4.13 Summarized average improvement over all wind farms with 95% confidence intervals
B.1 Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated
B.2 Wind characteristics for WF 3-7, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated
C.1 Error distribution for different lead times, WF 1
C.2 Error distribution for different lead times, WF 2
C.3 Error distribution for different lead times, WF 3
C.4 Error distribution for different lead times, WF 4
C.5 Error distribution for different lead times, WF 5
C.6 Error distribution for different lead times, WF 6
C.7 Error distribution for different lead times, WF 7
C.8 Error distribution for different lead times, WF 1
C.9 Error distribution for different lead times, WF 2
C.10 Error distribution for different lead times, WF 3
C.11 Error distribution for different lead times, WF 4
C.12 Error distribution for different lead times, WF 5
C.13 Error distribution for different lead times, WF 6
C.14 Error distribution for different lead times, WF 7

List of Tables

3.1 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods of missing data, power observations are available for updating the models
3.2 Preprocessed features from the dataset. The Forecast category represents the forecast given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the feature we use in training and testing
3.3 Example, where n = 14, r = 5, ψ = 1, of various encoded scalar values using a ScalarEncoder
4.1 NRMSE score of the entries published in [Hong et al., 2014]. The NuPIC model and Expektra model are added so we can easily compare the results
A.1 Configuration parameters for the spatial pooler
A.2 Configuration parameters for the scalar encoder
A.3 Configuration parameters for the temporal memory


• Contents
• Introduction
  • Problem Formulation
  • The scope of the problem
• Background
  • Neural Networks and Time Series Prediction
• Method and Materials
  • Preliminaries
    • Remarks
    • Definitions
    • Reference models
    • Error metrics
    • Model selection
    • Evaluation
  • Experiments
    • Holdback Input Randomization
    • Optimization methods
  • Neural Networks
    • Multilayer Perceptron
    • Numenta Platform for Intelligent Computing
• Result
  • Experimental results
  • Input Importance
    • Adaptation and Optimization
  • Summary
• Discussion
  • Method development issues
  • Future improvements and directions
  • Conclusions
• Bibliography
• Appendices
  • Hyper-parameters
  • Wind characteristics
  • Error Distribution
• List of Figures
• List of Tables

CHAPTER 4 RESULT

[Figure: four panels for Wind Farm 6 comparing Expektra, NuPIC, and Persistence: NBIAS vs. look-ahead time k (in hours), NRMSE vs. look-ahead time k, cumulated ε² vs. time (in hours), and NMAE vs. look-ahead time k.]

Figure 4.9: Different error measurements for WF 6.

[Figure: four panels for Wind Farm 7 comparing Expektra, NuPIC, and Persistence: NBIAS vs. look-ahead time k (in hours), NRMSE vs. look-ahead time k, cumulated ε² vs. time (in hours), and NMAE vs. look-ahead time k.]

Figure 4.10: Different error measurements for WF 7.


                            Wind Farm
User          1      2      3      4      5      6      7      All
Leustagos     0.145  0.138  0.168  0.144  0.158  0.133  0.140  0.146
DuckTile      0.143  0.145  0.172  0.145  0.165  0.137  0.146  0.148
MZ            0.141  0.151  0.174  0.145  0.167  0.141  0.145  0.149
Propeller     0.144  0.153  0.177  0.147  0.175  0.141  0.147  0.152
Duehee Lee    0.157  0.144  0.176  0.160  0.169  0.154  0.148  0.155
Expektra      0.165  0.158  0.184  0.164  0.179  0.153  0.153  0.165
MTU EE5260    0.161  0.172  0.193  0.162  0.192  0.156  0.160  0.168
SunWind       0.174  0.177  0.193  0.176  0.179  0.157  0.162  0.172
ymzsmsd       0.163  0.186  0.200  0.164  0.192  0.162  0.167  0.174
4138 Kalchas  0.180  0.179  0.197  0.175  0.200  0.160  0.165  0.177
NuPIC         0.243  0.254  0.264  0.310  0.290  0.224  0.240  0.264

Persistence   0.302  0.338  0.373  0.364  0.388  0.341  0.361  0.355

Table 4.1: NRMSE scores of the entries published in [Hong et al., 2014]. The NuPIC and Expektra models are added so we can easily compare the results.

4.1 Experimental results

Looking at the primary results presented in this study, we see in Table 4.1 that Expektra's ANN model is able to achieve performance comparable to the other methods published with Hong et al. [2014]. This can be used as a starting point for further investigations. A similar approach to Expektra's is that of Duehee Lee [Lee and Baldick, 2014], as both methods are based on the same neural network architecture but with different optimization techniques. Duehee Lee's network is structured slightly differently from Expektra's model, which consists of only one output node, whereas Duehee Lee uses five (representing five predictions ahead). Besides these differences, Duehee Lee also includes additional sub-models to create a prediction ensemble, blending the output of the ANN with a Gaussian Process (GP) to improve very short-term predictions. MTU EE5260 is also a neural network approach that converts meteorological forecasts to power, and here the score is very close. The HTM CLA beats the persistence model but needs more work to outperform the other models in GEFCom.
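For reference, the kind of NRMSE score reported in Table 4.1 can be computed from a pair of series as follows. This is a sketch, assuming power values are already normalized by installed capacity (as in the GEFCom data), so the plain RMSE of the normalized series is the normalized score:

```python
import math

def nrmse(predictions, observations):
    """RMSE of capacity-normalized power series; with both series
    scaled to [0, 1], this is a normalized RMSE as in Table 4.1."""
    errors = [p - o for p, o in zip(predictions, observations)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# toy example with three hourly values of normalized power
print(round(nrmse([0.5, 0.6, 0.7], [0.4, 0.6, 0.9]), 3))  # prints 0.129
```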


4.2 Input Importance

For interpretation purposes, in order to understand the model better, an analysis of the relative input parameter importance was performed. This analysis is illustrated in Figure 4.11. Each box represents noise added to that channel; a higher NRMSE score indicates that the feature was important. A reference point, "all-channels", represents the error distribution of the model with no input replacement. We clearly see in this figure that the wind speed channel ws is the most important attribute: adding noise to this channel greatly affects the NRMSE score. We also see that the wind components u, v show little to no influence, and that the timestamp-related inputs hours and week both indicate that there are seasonal and daily trends present in the dataset.
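The HIPR procedure of Kemp et al. [2007] used here can be sketched as follows; `model_predict` is a placeholder for any trained model, and uniform-noise replacement is one simple choice of randomization:

```python
import math
import random

def nrmse(pred, obs):
    # root-mean-square error of normalized series
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def hipr(model_predict, inputs, targets, n_channels, seed=0):
    """Holdback Input Randomization, sketched: replace one input
    channel at a time with random noise and record how much the
    NRMSE degrades relative to the untouched 'all-channels' run."""
    rng = random.Random(seed)
    scores = {"all-channels": nrmse(model_predict(inputs), targets)}
    for c in range(n_channels):
        noisy = [row[:] for row in inputs]
        for row in noisy:
            row[c] = rng.uniform(0.0, 1.0)  # randomize channel c only
        scores[c] = nrmse(model_predict(noisy), targets)
    return scores
```

Channels whose randomization barely moves the score (like u and v above) contribute little to the fitted mapping.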

[Figure: box plots of NRMSE after adding noise to each input channel: all-channels, hours, u, v, week, ws, ws−1, ws−2, ws−3, ws+1, ws+2, ws+3.]

Figure 4.11: Relative input parameter importance using HIPR. "all-channels" reflects the reference point: the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind.


4.2.1 Adaptation and Optimization

All training for Expektra's model was performed on a MacBook Pro with an Intel Core 2 Duo processor (P8600, 2.40 GHz) and 4 GB of RAM, running a 64-bit operating system. After adapting the code to run experiments on the GEFCom dataset, an investigation was performed to evaluate the performance of the core source code provided by Expektra. This investigation identified some slow parts of the code; after addressing these issues, mainly by using a Math.NET native library instead of the managed provider, a speed test was performed comparing the unoptimized version with the optimized one. Figure 4.12 shows the speed-up achieved during training using the LM algorithm.

[Figure: training time for the Normal vs. Optimized implementation as the number of hidden neurons grows from 10 to 25.]

Figure 4.12: Plot showing the unoptimized version vs. the optimized one when training a network with 10 input neurons; 10, 15, 20, or 25 hidden neurons; and 1 output neuron. Training was done for 100 epochs using LM.

4.3 Summary

A plot showing the average NRMSE improvement over the persistence model can be seen in Figure 4.13. Both models are able to perform better than persistence. Expektra's model performs better than the NuPIC model over the whole forecast


horizon, while NuPIC performs better than persistence towards the end of the forecast.
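The improvement plotted here is the usual relative gain over the reference model, 100 * (NRMSE_ref − NRMSE_model) / NRMSE_ref, following the evaluation protocol of Madsen et al. [2005]. A minimal sketch:

```python
def improvement_over_reference(model_score, reference_score):
    """Relative improvement (%) of a model over a reference model
    for one lead time: 100 * (ref - model) / ref."""
    return 100.0 * (reference_score - model_score) / reference_score

# Using the "All" column of Table 4.1: Expektra 0.165 vs. persistence 0.355
print(round(improvement_over_reference(0.165, 0.355), 1))  # prints 53.5
```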

[Figure: improvement (%) in NRMSE over persistence vs. look-ahead time (0-50 hours), for Expektra and NuPIC.]

Figure 4.13: Summarized average improvement over all wind farms, with 95% confidence intervals.


Chapter 5

Discussion

With the exponential growth of computer power, it becomes easier to study deeper and more complex networks, and with recent advances in the area of deep learning, a new interest in ANNs has resurged. In practice, ANNs have been used by most groups in the field of WPF, but these networks never caught on, as it was argued that the improvements made by ANNs were not usually enough to justify the extra effort of training them [Giebel et al., 2011]. This is steadily becoming less of an issue as computation becomes cheaper and bigger networks perform better.

By using a simple MLP network, we observe that we are able to obtain results similar in performance to other models published in Hong et al. [2014]. This can be used as a starting point for further investigations. The GEFCom dataset has a limited number of input features; with access to a wider range of features, the power of neural networks could be studied more in depth.

Using persistence as a reference model was done in order to have the same baseline as the one used in GEFCom, but a better reference model should be considered in the future, such as the one presented in Nielsen et al. [1998], especially if longer forecasts are to be considered.
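For context, the persistence baseline and a Nielsen et al. [1998]-style reference can be sketched as follows; the blend weight (taken as the lag-k autocorrelation of production) and the mean production are assumptions supplied by the user:

```python
def persistence(y_t, k):
    """Persistence reference: the forecast for t+k is simply the
    latest observed production y(t), for any horizon k."""
    return y_t

def new_reference(y_t, mean_power, autocorr_k):
    """Sketch of an improved reference in the spirit of Nielsen et
    al. [1998]: a blend of persistence and the mean production,
        y_hat(t+k) = a_k * y(t) + (1 - a_k) * y_bar,
    where a_k is the lag-k autocorrelation of production. For small
    k this behaves like persistence; for large k it decays towards
    the climatological mean."""
    return autocorr_k * y_t + (1.0 - autocorr_k) * mean_power
```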

5.1 Method development issues

Working with Expektra's model is very straightforward, and no major issues were encountered during development, but some issues were encountered working with NuPIC.1

1 It should also be pointed out that it helps to have the people who developed the code on the spot, and this is probably the main reason for the little trouble.


1. Installing NuPIC is difficult. This seems to be a general problem, judging by posts on the NuPIC mailing list2. A very important issue to fix if they want more people to work with this model.

2. Using the built-in swarming (PSO) did not work well, especially for slightly larger datasets and for 48 prediction steps. The main problems were fitting everything into memory and that the code ran too slowly when swarming over multiple steps ahead, making it practically impossible to use. Scaling down the problem to just a few prediction steps and multiple models helped, but the swarm model kept dismissing wind speed as an important feature, which was fixed by explicitly telling it not to touch that encoder.

3. Inconsistencies between the white-paper documentation of HTM CLA and the actual implementation make it hard to understand the underlying principles of NuPIC. NuPIC presents a very complex network, and any inconsistencies in documentation make it hard to understand. The code is quite well documented, but the additional material is a bit sparse.

These issues are all understandable and are continuously being improved upon. NuPIC is still a young platform, so finding issues is expected given the complexity and research nature of the project. Training the CLAClassifier to use many step predictions instead of just one will most likely reduce the bias error seen in the results. To investigate this, a more powerful computer is needed.

There are some differences in the input between the models. This setup was created because an OPF model needs to encode the value it is going to predict. SCADA data is available, so this is not an issue, but it may introduce some unfairness between NuPIC and Expektra. In the current setup, NuPIC does not seem to gain anything extra from the additional information, and other models in the competition will most likely use different inputs as well.

In general, NuPIC is different in the way you feed data: you do so by sending in the front of the signal, and temporal context is achieved through the temporal memory.

5.2 Future improvements and directions

One issue with the GEFCom dataset is that it does not contain any specific information about the location of these 7 wind farms. It is worth considering how to handle different models that are specialized for different types

2 http://numenta.org/lists


of terrain, as this has been shown to increase performance [Kariniotakis et al., 2006]. Another thing to investigate could be a more advanced scheme for hyperparameter selection: Jursa and Rohrig [2008] show that the WPF error can be reduced by smarter use of optimization algorithms for feature selection and hyperparameter selection, which should improve performance.

In general, having more types of inputs from sources like SCADA and NWP systems could give a lot of valuable information that can be used for better predictions. It is clear that wind speed is a key feature for producing good forecasts, which is also supported by the input analysis in Section 4.2. It has been argued that the error in WPF models stems largely from wrong weather forecasts [Giebel et al., 2011], so identifying good sources of weather forecasts would be desirable for more effective model learning. Combining several different weather forecasts could also be investigated, as Nielsen et al. [2007] showed that power forecasts based on a number of different meteorological forecasts were better than those from a single source.

Regarding HTM CLA, it is worth considering implementing a custom encoder for it, one specifically targeted at data concerning wind farms. SCADA data could also possibly be streamed directly into a NuPIC model, which would allow for some interesting online predictions and anomaly detections.

Training times are always an issue, especially for very large models. Both NuPIC and Expektra's model would benefit from being implemented in a more parallel fashion, as these algorithms are very well suited for parallel computing. The speed-up would be of great value, so looking into and implementing a GPU version of the algorithms could be worthwhile.

5.3 Conclusions

The field of energy forecasting is still young, and the necessity for good forecasting techniques is constantly growing. This initial study shows that part of Expektra's forecasting method can be used to forecast wind power generation. We observe that this ANN is able to achieve performance comparable to other methods published in Hong et al. [2014]. Additional work is still needed to obtain state-of-the-art results. Numenta's model is able to beat the reference model but still needs additional work before we can draw definite conclusions about its performance. Constructing a custom encoder could be a first step in that direction.


Bibliography

Subutai Ahmad and Jeff Hawkins. Properties of sparse distributed representations and their application to hierarchical temporal memory. arXiv preprint arXiv:1503.07469, 2015.

T. C. Akinci. Short term wind speed forecasting with ANN in Batman, Turkey. Elektronika ir Elektrotechnika, 107(1):41–45, 2015.

E. Michael Azoff. Neural network time series forecasting of financial markets. John Wiley & Sons, Inc., 1994.

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13(1):281–305, 2012.

Daniel P. Buxhoeveden and Manuel F. Casanova. The minicolumn hypothesis in neuroscience. Brain, 125(5):935–951, 2002.

Erasmo Cadenas and Wilfrido Rivera. Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks. Renewable Energy, 34(1):274–278, 2009.

Dmitri B. Chklovskii, B. W. Mel, and K. Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782–788, 2004.

A. Costa, A. Crespo, J. Navarro, G. Lizcano, H. Madsen, and E. Feitosa. A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews, 12(6):1725–1744, August 2008. ISSN 1364-0321. doi: 10.1016/j.rser.2007.01.015. URL http://dx.doi.org/10.1016/j.rser.2007.01.015.

Russ C. Eberhart and James Kennedy. A new optimizer using particle swarm theory. In Proceedings of the sixth international symposium on micro machine and human science, volume 1, pages 39–43. New York, NY, 1995.

Shu Fan, James R. Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. Energy Conversion, IEEE Transactions on, 24(2):474–482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento: a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276. Citeseer, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition, EWEC 2008, 6 pages. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving hierarchical temporal memory-based trading models. Springer, 2013.

M. A. Gaertner, C. Gallardo, C. Tejeda, N. Martínez, S. Calabria, N. Martínez, and B. Fernández. The Casandra project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. In Proc. of the 2001 European Wind Energy Conference, EWEC'01, Copenhagen, Denmark, pages 777–780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: A literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G. Lobo. Sipreólico: wind power prediction tool for the Spanish peninsular power system. Proceedings of the CIGRÉ 40th general session and exhibition, París, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357–363, 2014.

Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41–46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585–592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694–709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference, EWEC'07, 2007.

G. Kariniotakis. Position paper on Joule project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J. Halliday, R. Brownsword, Ignacio Marti, Ana Maria Palomares, I. Cruz, H. Madsen, T. S. Nielsen, Henrik Aa. Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power: overview of the Anemos project. In European Wind Energy Conference, EWEC 2006, 10 pages, 2006.

G. N. Kariniotakis, G. S. Stavrakakis, and E. F. Nogaret. Wind power forecasting using advanced neural networks models. Energy Conversion, IEEE Transactions on, 11(4):762–767, 1996.

Stanley J. Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326–334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275–293, 2009.

Lars Landberg and Simon J. Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171–195, 1994.

S. Lang, C. Mohrlen, J. Jorgensen, B. Ó Gallachóir, and E. McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International Solar Energy Society UK Section-Conference-C, volume 85, page 89, 2006.

S. M. Lawan, W. A. W. Z. Abidin, W. Y. Chai, A. Baharun, and T. Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760–1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. Smart Grid, IEEE Transactions on, 5(1):501–510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets, 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164–168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313–2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154–3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa. Nielsen, and Torben S. Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475–489, 2005.

Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431–441, 1963.

Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons. 1969.

Marvin Lee Minsky and Oliver G. Selfridge. Learning in random nets. MIT Lincoln Laboratory, 1960.

Jorge J. Moré. The Levenberg-Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105–116. Springer, 1978.

Henrik Aa. Nielsen, Torben S. Nielsen, Henrik Madsen, Maria J. Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471–482, 2007.

Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29–34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power. 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159–167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms, v0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33–57, 2007.

A. Rodrigues, J. A. Peças Lopes, P. Miranda, L. Palma, C. Monteiro, R. Bessa, J. Sousa, C. Rodrigues, and J. Matos. EPREV: a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference, EWEC, volume 7, 2007.

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5:3, 1988.

Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85–117, 2015.

S. Sinkevicius, R. Simutis, and V. Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91–96, 2011.

Ke-Sheng Wang, Vishal S. Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61–69, 2014.

WWEA. 2014 half-year report. Technical report, WWEA, pages 1–8, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365–376, 2013.

Hao Yu and Bogdan M. Wilamowski. Levenberg-Marquardt training. Industrial Electronics Handbook, 5:12–1, 2011.

Appendix A

Hyper-parameters

Parameters for the spatial pooler

Alias                Default  Description
columnCount          -        The number of cell columns in a cortical region.
globalInhibition     false    If true, during the inhibition phase the winning
                              columns are selected as the most active columns
                              from the region as a whole.
numActivePerInhArea  10       The maximum number of active columns per
                              inhibition area.
synPermActiveInc     0.1      The amount by which an active synapse is
                              incremented in each round, specified as a percent
                              of a fully grown synapse.
synPermConnected     0.10     Controls the threshold at which synapses are
                              connected.
synPermInactiveDec   0.01     The amount by which an inactive synapse is
                              decremented in each round.
potentialRadius      16       Determines the extent of the input that each
                              column can potentially be connected to.

Table A.1: Configuration parameters for the spatial pooler.


Parameters for the scalar encoder

Alias       Symbol  Description
w           w       Number of bits set in the output.
minval      v_min   The lower bound of the input value.
maxval      v_max   The upper bound of the input value.
n           n       Number of bits in the representation (n must be > w).
radius      r       Inputs separated by more than or equal to this distance
                    will have non-overlapping representations.
resolution  ψ       Inputs separated by more than or equal to this distance
                    will have different representations.

Table A.2: Configuration parameters for the scalar encoder.
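To illustrate how these parameters interact, here is a toy bucketed encoder in plain Python (not the NuPIC API): n output bits, a contiguous run of w active bits, and the run's position encoding the value, so that nearby values share active bits:

```python
def scalar_encode(value, vmin, vmax, n, w):
    """Toy sketch of scalar encoding: map value in [vmin, vmax] to
    one of n - w + 1 buckets, then activate w consecutive bits
    starting at the bucket index. Overlap between encodings
    reflects how close two values are."""
    if not vmin <= value <= vmax:
        raise ValueError("value outside [vmin, vmax]")
    n_buckets = n - w + 1
    bucket = int((value - vmin) / (vmax - vmin) * (n_buckets - 1) + 0.5)
    bits = [0] * n
    for i in range(bucket, bucket + w):
        bits[i] = 1
    return bits
```

For example, with n = 14 and w = 3, the minimum value activates bits 0-2 and the maximum value activates bits 11-13.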


Parameters for the temporal memory

Alias                  Default  Description
activationThreshold    12       Activation threshold for segments.
cellsPerColumn         32       Number of cells per column.
columnCount            2048     The number of cell columns in a cortical region.
globalDecay            0.10     Decrements all synapses a little bit all the
                                time.
initialPerm            0.11     Initial permanence value for a synapse.
inputWidth             -        Size of the input.
maxAge                 100000   Controls global decay. Global decay will only
                                decay segments that have not been activated for
                                maxAge iterations, and will only do the global
                                decay loop every maxAge iterations.
maxSegmentsPerCell     -        The maximum number of segments a cell can have.
maxSynapsesPerSegment  -        The maximum number of synapses a segment can
                                have.
minThreshold           8        The minimum required activity for a segment to
                                learn.
newSynapseCount        15       The maximum number of synapses added to a
                                segment during learning.
permanenceDec          0.10     How much permanence is removed from synapses
                                when learning occurs.
permanenceInc          0.10     How much permanence is added to synapses when
                                learning occurs.
temporalImp            cpp/py   Controls which temporal memory implementation
                                to use.

Table A.3: Configuration parameters for the temporal memory.


Appendix B

Wind characteristics

Figure B.1: Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger the dot, the more power.


Figure B.2: Wind characteristics for WF 3-7, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger the dot, the more power.


Appendix C

Error Distribution


[Figure: histograms of forecast error for lead times 1, 10, 20, 30, 40, and 48 hours, wf1 using nupic.]

Figure C.1: Error distribution for different lead times, WF 1.


[Figure: histograms of forecast error for lead times 1, 10, 20, 30, 40, and 48 hours, wf2 using nupic.]

Figure C.2: Error distribution for different lead times, WF 2.


[Figure: histograms of forecast error for lead times 1, 10, 20, 30, 40, and 48 hours, wf3 using nupic.]

Figure C.3: Error distribution for different lead times, WF 3.


[Figure: histograms of forecast error for lead times 1, 10, 20, 30, 40, and 48 hours, wf4 using nupic.]

Figure C.4: Error distribution for different lead times, WF 4.


[Figure: histograms of forecast error for lead times 1, 10, 20, 30, 40, and 48 hours, wf5 using nupic.]

Figure C.5: Error distribution for different lead times, WF 5.

66

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

Figure C6 Error distribution for different lead times WF 6

67

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf7 using nupic

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using nupic

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using nupic

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using nupic

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using nupic

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using nupic

Figure C7 Error distribution for different lead times WF 7

68

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

Figure C8 Error distribution for different lead times WF 1

69

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf2 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

Figure C9 Error distribution for different lead times WF 2

70

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

Figure C10 Error distribution for different lead times WF 3

71

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf4 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

Figure C11 Error distribution for different lead times WF 4

72

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

Figure C12 Error distribution for different lead times WF 5

73

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf6 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

Figure C13 Error distribution for different lead times WF 6

74

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

Figure C14 Error distribution for different lead times WF 7

75

List of Figures

2.1 A figure that presents the general outline when forecasting using the statistical approach . . . 6
2.2 A figure that presents the general steps when forecasting using a physical model . . . 7
3.1 The perceptron . . . 20
3.2 Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of the H hidden layers as well as M input connections. Each edge seen in this graph has a weight w_ij associated with it . . . 21
3.3 Information flow of a single-region predictive model created with the OPF . . . 23
3.4 The CLAClassifier . . . 28
3.5 Training an OPF model . . . 29
4.1 Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs. wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) . . . 31
4.2 Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) . . . 32
4.3 Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) . . . 32
4.4 Different error measurement for WF 1 . . . 33
4.5 Different error measurement for WF 2 . . . 34
4.6 Different error measurement for WF 3 . . . 35
4.7 Different error measurement for WF 4 . . . 36
4.8 Different error measurement for WF 5 . . . 37
4.9 Different error measurement for WF 6 . . . 38
4.10 Different error measurement for WF 7 . . . 39
4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point: the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind . . . 41
4.12 Plot showing the unoptimized version vs. the optimized one when training a network with 10 input neurons, 10/15/20/25 hidden neurons, and 1 output neuron. Training was done for 100 epochs using LM . . . 42
4.13 Summarized average improvement over all wind farms with 95% confidence intervals . . . 43
B.1 Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated, the stronger the more power . . . 59
B.2 Wind characteristics for WF 3–7, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated, the stronger the more power . . . 60
C.1 Error distribution for different lead times, WF 1 . . . 62
C.2 Error distribution for different lead times, WF 2 . . . 63
C.3 Error distribution for different lead times, WF 3 . . . 64
C.4 Error distribution for different lead times, WF 4 . . . 65
C.5 Error distribution for different lead times, WF 5 . . . 66
C.6 Error distribution for different lead times, WF 6 . . . 67
C.7 Error distribution for different lead times, WF 7 . . . 68
C.8 Error distribution for different lead times, WF 1 . . . 69
C.9 Error distribution for different lead times, WF 2 . . . 70
C.10 Error distribution for different lead times, WF 3 . . . 71
C.11 Error distribution for different lead times, WF 4 . . . 72
C.12 Error distribution for different lead times, WF 5 . . . 73
C.13 Error distribution for different lead times, WF 6 . . . 74
C.14 Error distribution for different lead times, WF 7 . . . 75

List of Tables

3.1 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, data with power observations are available for updating the models . . . 16
3.2 Preprocessed features from the dataset. The Forecast category represents the forecast given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the feature we will use in training and testing . . . 17
3.3 Example, where n = 14, r = 5, ψ = 1, of various encoded scalar values using a ScalarEncoder . . . 24
4.1 NRMSE score of the entries published in [Hong et al., 2014]. The NuPIC and Expektra models are added so we can easily compare the results . . . 40
A.1 Table containing configuration parameters for the encoder . . . 55
A.2 Table containing configuration parameters for the spatial pooler . . . 56
A.3 Table containing configuration parameters for the temporal memory . . . 57


  • Contents
  • Introduction
    • Problem Formulation
    • The scope of the problem
  • Background
    • Neural Networks and Time Series Prediction
  • Method and Materials
    • Preliminaries
      • Remarks
      • Definitions
      • Reference models
      • Error metrics
      • Model selection
      • Evaluation
    • Experiments
    • Holdback Input Randomization
    • Optimization methods
    • Neural Networks
      • Multilayer Perceptron
      • Numenta Platform for Intelligent Computing
  • Result
    • Experimental results
    • Input Importance
      • Adaptation and Optimization
    • Summary
  • Discussion
    • Method development issues
    • Future improvements and directions
    • Conclusions
  • Bibliography
  • Appendices
    • Hyper-parameters
    • Wind characteristics
    • Error Distribution
  • List of Figures
  • List of Tables

[Figure 4.10: Different error measurements for WF 7. Four panels plot NBIAS, NRMSE, cumulated ε², and NMAE against look-ahead time k (in hours) for the Expektra, NuPIC, and Persistence models.]

CHAPTER 4 RESULT

                         Wind Farm
User           1      2      3      4      5      6      7      All
Leustagos      0.145  0.138  0.168  0.144  0.158  0.133  0.140  0.146
DuckTile       0.143  0.145  0.172  0.145  0.165  0.137  0.146  0.148
MZ             0.141  0.151  0.174  0.145  0.167  0.141  0.145  0.149
Propeller      0.144  0.153  0.177  0.147  0.175  0.141  0.147  0.152
Duehee Lee     0.157  0.144  0.176  0.160  0.169  0.154  0.148  0.155
Expektra       0.165  0.158  0.184  0.164  0.179  0.153  0.153  0.165
MTU EE5260     0.161  0.172  0.193  0.162  0.192  0.156  0.160  0.168
SunWind        0.174  0.177  0.193  0.176  0.179  0.157  0.162  0.172
ymzsmsd        0.163  0.186  0.200  0.164  0.192  0.162  0.167  0.174
4138 Kalchas   0.180  0.179  0.197  0.175  0.200  0.160  0.165  0.177
NuPIC          0.243  0.254  0.264  0.310  0.290  0.224  0.240  0.264
Persistence    0.302  0.338  0.373  0.364  0.388  0.341  0.361  0.355

Table 4.1: NRMSE score of the entries published in [Hong et al., 2014]. The NuPIC and Expektra models are added so we can easily compare the results.

4.1 Experimental results

Looking at the primary results presented in this study, we see in table 4.1 that Expektra's ANN model achieves performance comparable to the other methods published in Hong et al. [2014]. This can be used as a starting point for further investigations. A similar approach to Expektra's is Duehee Lee's [Lee and Baldick, 2014], as both methods are based on the same neural network architecture but with different optimization techniques. Duehee Lee's network is structured slightly differently: Expektra's model consists of only one output node, whereas Duehee Lee uses five (representing five predictions ahead). Besides these differences, Duehee Lee has also included additional sub-models to create a prediction ensemble, blending the output of the ANN with a Gaussian Process (GP) to improve very short-term predictions. MTU EE5260 is also a neural network approach that converts meteorological forecasts to power, and here the score is very close. The HTM CLA beats the persistence model but needs more work to outperform the other models in GEFCom.
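The NRMSE scores in table 4.1 can be reproduced from forecasts and observations; below is a minimal sketch, assuming (as in the GEFCom data) that power is normalized by nominal capacity so values lie in [0, 1]; the function name and sample numbers are illustrative, not the thesis code.

```python
import numpy as np

def nrmse(y_true, y_pred, capacity=1.0):
    """Root-mean-square error normalized by nominal capacity."""
    err = (np.asarray(y_pred) - np.asarray(y_true)) / capacity
    return float(np.sqrt(np.mean(err ** 2)))

# toy example with normalized power values
obs = np.array([0.2, 0.5, 0.7, 0.4])
fc = np.array([0.25, 0.45, 0.65, 0.5])
score = nrmse(obs, fc)
```

A lower score is better; the persistence row in table 4.1 gives the baseline each model is measured against.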


4.2 Input Importance

For interpretation purposes, in order to understand the model better, an analysis of the relative input parameter importance was performed. This analysis is illustrated in figure 4.11. Each box represents added noise to that channel; a higher NRMSE score indicates that the feature was important. The reference point, "all-channels", represents the error distribution of the model with no input replacement. We clearly see in this figure that the wind speed channel ws is the most important attribute: adding noise to this channel greatly affects the NRMSE score. We also see that the wind components u, v show little to no influence, and that the timestamp-related inputs hours and week both indicate that there are seasonal and daily trends present in the dataset.
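The HIPR procedure above (replace one input channel at a time with noise and record the resulting error) can be sketched as follows; `predict`, `X`, and `y` stand for a hypothetical trained model and test set, not the thesis implementation.

```python
import numpy as np

def hipr(predict, X, y, rng=None, n_repeats=20):
    """Holdback Input Randomization: estimate channel importance by
    replacing one input channel at a time with uniform noise and
    recording the NRMSE of the perturbed predictions."""
    rng = rng or np.random.default_rng(0)

    def nrmse(yp):
        return float(np.sqrt(np.mean((yp - y) ** 2)))

    scores = {"all-channels": [nrmse(predict(X))]}  # reference: no noise
    for ch in range(X.shape[1]):
        vals = []
        for _ in range(n_repeats):
            Xn = X.copy()
            Xn[:, ch] = rng.uniform(X[:, ch].min(), X[:, ch].max(), len(X))
            vals.append(nrmse(predict(Xn)))
        scores[ch] = vals
    return scores

# toy "model" that only uses channel 0, so channel 0 should dominate
X = np.random.default_rng(1).uniform(size=(200, 3))
y = X[:, 0]
scores = hipr(lambda A: A[:, 0], X, y)
```

Plotting `scores` per channel as box plots reproduces the style of figure 4.11: channels whose randomization raises NRMSE most are most important.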

[Figure 4.11: Relative input parameter importance using HIPR. Box plots of NRMSE (roughly 0.2 to 0.6) after adding noise to each channel: all-channels (the reference point, i.e. the network when no channel has been exposed to noise), hours, u, v, week, ws, ws−1, ws−2, ws−3, ws+1, ws+2, ws+3. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind.]


4.2.1 Adaptation and Optimization

All training for Expektra's model was performed on a MacBook Pro with an Intel Core 2 Duo processor (P8600, 2.40 GHz), a 64-bit operating system, and 4 GB of installed RAM. After adapting the code to run experiments on the GEFCom dataset, an investigation was performed to evaluate the performance of the core source code provided by Expektra. This investigation identified some slow parts of the code; after fixing these issues, mainly by using a MathNET native library instead of the managed provider, a speed test was performed comparing the unoptimized version with the optimized one. Figure 4.12 shows the speed-up achieved during training using the LM algorithm.
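The LM algorithm mentioned above interpolates between Gauss-Newton and gradient descent through a damping factor. As an illustrative numpy sketch of the update rule on a toy linear least-squares fit (not Expektra's implementation):

```python
import numpy as np

def lm_fit(f, jac, w0, y, lam=1e-2, iters=50):
    """Minimal Levenberg-Marquardt loop:
    w <- w - (J^T J + lam*I)^-1 J^T r,
    decreasing lam on success (toward Gauss-Newton) and
    increasing it on failure (toward gradient descent)."""
    w = np.asarray(w0, dtype=float)
    for _ in range(iters):
        r = f(w) - y                  # residuals at current weights
        J = jac(w)                    # Jacobian of f w.r.t. w
        step = np.linalg.solve(J.T @ J + lam * np.eye(len(w)), J.T @ r)
        if np.sum((f(w - step) - y) ** 2) < np.sum(r ** 2):
            w, lam = w - step, lam * 0.5   # accept step
        else:
            lam *= 2.0                     # reject step, damp harder
    return w

# fit y = a*x + b on noiseless toy data
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + 1.0
w = lm_fit(lambda w: w[0] * x + w[1],
           lambda w: np.stack([x, np.ones_like(x)], axis=1),
           [0.0, 0.0], y)
```

In an MLP the Jacobian is taken with respect to all weights, which is why LM is fast for small networks but memory-hungry for large ones.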

[Figure 4.12: Training time of the unoptimized ("Normal") version vs. the optimized one when training a network with 10 input neurons, 10/15/20/25 hidden neurons, and 1 output neuron. Training was done for 100 epochs using LM.]

4.3 Summary

A plot showing the average NRMSE improvement over the persistence model can be seen in figure 4.13. Both models are able to perform better than persistence; Expektra's model performs better than the NuPIC model over the whole forecast horizon, and NuPIC performs better than persistence towards the end of the forecast.
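The improvement measure plotted in figure 4.13 is the relative NRMSE reduction against the reference model. A small sketch, here applied to the "All" column of table 4.1 (the per-lead-time version simply passes arrays instead of scalars):

```python
import numpy as np

def improvement(nrmse_model, nrmse_ref):
    """Percentage improvement of a model over a reference model."""
    m, r = np.asarray(nrmse_model), np.asarray(nrmse_ref)
    return 100.0 * (r - m) / r

expektra_vs_persistence = float(improvement(0.165, 0.355))  # ~53.5 %
nupic_vs_persistence = float(improvement(0.264, 0.355))     # ~25.6 %
```

Positive values mean the model beats the reference; zero means it is no better than persistence.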

[Figure 4.13: Summarized average improvement (% NRMSE) over the persistence model across all wind farms, with 95% confidence intervals, for look-ahead times up to 50 hours (Expektra and NuPIC).]

Chapter 5

Discussion

With the exponential growth of computer power it becomes easier to study deeper and more complex networks, and with recent advances in the area of deep learning a renewed interest in ANNs has emerged. In practice, ANNs have been used by most groups in the field of WPF, but these networks never caught on, as it was argued that the improvements made by ANNs were usually not enough to justify the extra effort of training them [Giebel et al., 2011]. This is steadily becoming less of an issue as computation becomes cheaper and bigger networks perform better.

By using a simple MLP network we observe that we are able to obtain results similar in performance to other models published in Hong et al. [2014]. This can be used as a starting point for further investigations. The GEFCom dataset has a limited number of input features; if we had access to a wider range of them, the power of neural networks could be studied more in depth.

Using persistence as a reference model was done in order to have the same baseline as the one used in GEFCom, but a better reference model should be considered in the future, such as the one presented in Nielsen et al. [1998], especially if longer forecasts are to be considered.
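The reference model of Nielsen et al. [1998] blends persistence with the long-term mean, weighted by the lag-k correlation of the power series. The sketch below is my reading of that reference, with hypothetical names, not code from the thesis:

```python
import numpy as np

def new_reference_forecast(p, k):
    """Nielsen et al. (1998)-style reference:
    p_hat(t + k) = a_k * p(t) + (1 - a_k) * p_mean,
    where a_k is the lag-k correlation coefficient of the series p.
    Returns the forecast issued at each time t for time t + k."""
    p = np.asarray(p, dtype=float)
    a_k = np.corrcoef(p[:-k], p[k:])[0, 1]
    return a_k * p + (1.0 - a_k) * p.mean()

# toy "power" series: for short lead times a_k ~ 1, so the
# forecast is close to pure persistence
p = np.sin(np.linspace(0.0, 20.0, 500))
ref1 = new_reference_forecast(p, k=1)
```

For long lead times a_k tends to zero and the forecast collapses to the climatological mean, which is why this baseline is harder to beat than plain persistence.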

5.1 Method development issues

Working with Expektra's model was very straightforward and no major issues were encountered during development, but some issues were encountered working with NuPIC.¹

¹ It should also be pointed out that it helps to have the people who developed the code on the spot, and this is probably the main reason for the lack of trouble.


1. Installing NuPIC is difficult. This seems to be a general problem, judging by posts on the NuPIC mailing list²; a very important issue to fix if they want more people to work with this model.

2. The built-in swarming (PSO) did not work well, especially for slightly larger datasets and for 48 prediction steps. The main problems were fitting everything into memory and that the code ran too slowly when swarming over multiple steps ahead, making it practically impossible to use. Scaling the problem down to just a few prediction steps and multiple models helped, but the swarm model kept dismissing wind speed as an important feature, which was fixed by explicitly telling it not to touch that encoder.

3. Inconsistencies between the white-paper documentation of HTM CLA and the actual implementation make it hard to understand the underlying principles of NuPIC. NuPIC presents a very complex network, and any inconsistency in the documentation makes it harder to grasp. The code is quite well documented, but the additional material is a bit sparse.

These issues are all understandable and are continuously being improved upon. NuPIC is still a young platform, so finding issues is expected given the complexity and research nature of the project. Training the CLAClassifier to use many prediction steps instead of just one will most likely reduce the bias error seen in the results. To investigate this, a more powerful computer is needed.

There are some differences in the inputs between the models. This setup was created because an OPF model needs to encode the value it is going to predict. SCADA data is available, so this is not an issue, but it may introduce some unfairness between NuPIC and Expektra. In the current setup NuPIC does not seem to gain anything extra from the additional information, and other models in the competition will most likely use different inputs as well.

In general, NuPIC is different in the way you feed it data: you send in only the front of the signal, and temporal context is achieved through the temporal memory.

5.2 Future improvements and directions

One issue with the GEFCom dataset is that it does not contain any specific information about the location of these 7 wind farms. It is worth considering taking a look at how to handle different models that are specialized for different types

² http://numenta.org/lists


of terrain, as this has been shown to increase performance [Kariniotakis et al., 2006]. Another thing to investigate could be a more advanced scheme for hyperparameter selection: Jursa and Rohrig [2008] show that the WPF error can be reduced by smarter use of optimization algorithms for feature selection and hyperparameter selection, which should improve performance.

In general, having more types of inputs from sources like SCADA and NWP systems could give a lot of valuable information that can be used for better predictions. It is clear that wind speed is a key feature for producing good forecasts, which is also supported by the input analysis in section 4.2. It has been argued that the error in WPF models stems largely from wrong weather forecasts [Giebel et al., 2011], so identifying good sources of weather forecasts would be desirable for more effective model learning. Combining several different weather forecasts could be investigated, as Nielsen et al. [2007] showed that power forecasts based on a number of different meteorological forecasts were better than those from a single source.
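Nielsen et al. [2007] estimate optimal combination weights statistically; as a simpler illustrative stand-in, the sketch below weights each forecast by the inverse of its historical error variance, the minimum-variance rule under independent, unbiased errors. All names and numbers here are hypothetical:

```python
import numpy as np

def combine_forecasts(forecasts, past_errors):
    """Combine K forecasts with weights inversely proportional to each
    forecast's historical error variance; weights sum to one."""
    var = np.array([np.var(e) for e in past_errors])
    w = (1.0 / var) / np.sum(1.0 / var)
    combined = np.tensordot(w, np.asarray(forecasts), axes=1)
    return w, combined

# two hypothetical NWP-based power forecasts for the same lead times;
# the first source has been much more accurate historically
w, comb = combine_forecasts(
    [np.array([0.4, 0.6]), np.array([0.5, 0.7])],
    [np.array([0.01, -0.02, 0.015]), np.array([0.10, -0.08, 0.12])],
)
```

The combined forecast leans toward the historically better source while still hedging with the other, which is the intuition behind multi-source schemes.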

Regarding HTM CLA, it is worth considering implementing a custom encoder: one specifically targeted at data concerning wind farms. SCADA data could also be streamed directly into a NuPIC model, which would allow for some interesting online predictions and anomaly detection.

Training times are always an issue, especially for very large models. Both NuPIC and Expektra's model would benefit from being implemented in a more parallel fashion, as these algorithms are very well suited for parallel computing. The speed-up would be of great value, so looking into and implementing a GPU version of the algorithms could be worthwhile.

5.3 Conclusions

The field of energy forecasting is still young and the necessity for good forecasting techniques is constantly growing. This initial study shows that part of Expektra's forecasting method can be used to forecast wind power generation. We observe that this ANN is able to achieve performance comparable to other methods published in Hong et al. [2014]. Additional work is still needed to obtain state-of-the-art results. Numenta's model is able to beat the reference model but still needs additional work before we can draw definite conclusions about its performance. Constructing a custom encoder could be a first step in that direction.


Bibliography

Subutai Ahmad and Jeff Hawkins. Properties of sparse distributed representations and their application to hierarchical temporal memory. arXiv preprint arXiv:1503.07469, 2015.

T.C. Akinci. Short term wind speed forecasting with ANN in Batman, Turkey. Elektronika ir Elektrotechnika, 107(1):41–45, 2015.

E. Michael Azoff. Neural network time series forecasting of financial markets. John Wiley & Sons, Inc., 1994.

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13(1):281–305, 2012.

Daniel P. Buxhoeveden and Manuel F. Casanova. The minicolumn hypothesis in neuroscience. Brain, 125(5):935–951, 2002.

Erasmo Cadenas and Wilfrido Rivera. Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks. Renewable Energy, 34(1):274–278, 2009.

Dmitri B. Chklovskii, B.W. Mel, and K. Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782–788, 2004.

A. Costa, A. Crespo, J. Navarro, G. Lizcano, H. Madsen, and E. Feitosa. A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews, 12(6):1725–1744, August 2008. ISSN 13640321. doi: 10.1016/j.rser.2007.01.015. URL http://dx.doi.org/10.1016/j.rser.2007.01.015.

Russ C. Eberhart and James Kennedy. A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science, volume 1, pages 39–43. New York, NY, 1995.

Shu Fan, James R. Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. Energy Conversion, IEEE Transactions on, 24(2):474–482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento - a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276. Citeseer, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition, EWEC 2008, 6 pages. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving hierarchical temporal memory-based trading models. Springer, 2013.

M.A. Gaertner, C. Gallardo, C. Tejeda, N. Martínez, S. Calabria, N. Martínez, and B. Fernández. The CASANDRA project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. In Proc. of the 2001 European Wind Energy Conference, EWEC'01, Copenhagen, Denmark, pages 777–780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: A literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G. Lobo. Sipreólico - wind power prediction tool for the Spanish peninsular power system. Proceedings of the CIGRÉ 40th General Session and Exhibition, París, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357–363, 2014.

Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41–46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585–592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694–709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference, EWEC'07, 2007.

G. Kariniotakis. Position paper on JOULE project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J. Halliday, R. Brownsword, Ignacio Marti, Ana Maria Palomares, I. Cruz, H. Madsen, T.S. Nielsen, Henrik Aa. Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power: overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, 10 pages, 2006.

G.N. Kariniotakis, G.S. Stavrakakis, and E.F. Nogaret. Wind power forecasting using advanced neural networks models. Energy Conversion, IEEE Transactions on, 11(4):762–767, 1996.

Stanley J. Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326–334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275–293, 2009.

Lars Landberg and Simon J. Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171–195, 1994.

S. Lang, C. Mohrlen, J. Jorgensen, B. O. Gallachóir, and E. McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International Solar Energy Society UK Section Conference, volume 85, page 89, 2006.

S.M. Lawan, W.A.W.Z. Abidin, W.Y. Chai, A. Baharun, and T. Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760–1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. Smart Grid, IEEE Transactions on, 5(1):501–510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets. 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164–168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313–2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154–3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa. Nielsen, and Torben S. Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475–489, 2005.

Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431–441, 1963.

Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons. 1969.

Marvin Lee Minsky and Oliver G. Selfridge. Learning in random nets. MIT Lincoln Laboratory, 1960.

Jorge J. Moré. The Levenberg-Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105–116. Springer, 1978.

Henrik Aa. Nielsen, Torben S. Nielsen, Henrik Madsen, Maria J. Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471–482, 2007.

Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29–34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power. 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159–167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms, v0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33–57, 2007.

A. Rodrigues, J.A. Peças Lopes, P. Miranda, L. Palma, C. Monteiro, R. Bessa, J. Sousa, C. Rodrigues, and J. Matos. EPREV - a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference, EWEC, volume 7, 2007.

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5:3, 1988.

Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85–117, 2015.

S. Sinkevicius, R. Simutis, and V. Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91–96, 2011.

Ke-Sheng Wang, Vishal S. Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61–69, 2014.

WWEA. 2014 half-year report, WWEA, pp. 1–8. Technical report, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365–376, 2013.

Hao Yu and Bogdan M. Wilamowski. Levenberg-Marquardt training. Industrial Electronics Handbook, 5:12-1, 2011.

Appendix A

Hyper-parameters

Parameters for the spatial pooler

Alias                Default  Description

columnCount          -        The number of cell columns in a cortical region.
globalInhibition     false    If true, during the inhibition phase the winning columns are selected as the most active columns from the region as a whole.
numActivePerInhArea  10       The maximum number of active columns per inhibition area.
synPermActiveInc     0.1      The amount by which an active synapse is incremented in each round, specified as a percent of a fully grown synapse.
synPermConnected     0.10     Controls the threshold at which synapses are considered connected.
synPermInactiveDec   0.01     The amount by which an inactive synapse is decremented in each round.
potentialRadius      16       Determines the extent of the input that each column can potentially be connected to.

Table A.1 Table containing configuration parameters for the spatial pooler


Parameters for the scalar encoder

Alias       Symbol  Description

w           w       Number of bits to set in the output.
minval      vmin    The lower bound of the input value.
maxval      vmax    The upper bound of the input value.
n           n       Number of bits in the representation (n must be > w).
radius      r       Inputs separated by more than or equal to this distance will have non-overlapping representations.
resolution  ψ       Inputs separated by more than or equal to this distance will have different representations.

Table A.2 Table containing configuration parameters for the scalar encoder
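Table 3.3 in the main text gives an example of encoded scalar values with n = 14. A minimal sketch of such an encoding, a contiguous run of w active bits inside an n-bit array so that nearby values get overlapping representations, is shown below. It is illustrative only, not the actual NuPIC ScalarEncoder: the bucket mapping here is a simplification, and the real encoder is parameterized by radius/resolution as in Table A.2.

```python
def encode_scalar(value, w=3, n=14, minval=0.0, maxval=10.0):
    """Toy scalar encoder in the spirit of Table A.2: a contiguous block of
    w active bits inside an n-bit array; nearby values overlap, distant
    values do not. (Sketch only, not the NuPIC implementation.)"""
    assert n > w, "n must be > w (Table A.2)"
    buckets = n - w + 1                 # distinct starting positions
    value = min(max(value, minval), maxval)
    i = int(round((value - minval) / (maxval - minval) * (buckets - 1)))
    bits = [0] * n
    for j in range(i, i + w):           # set the block of w active bits
        bits[j] = 1
    return bits
```

With the defaults above, encode_scalar(0.0) and encode_scalar(0.5) share active bits, while encode_scalar(0.0) and encode_scalar(9.5) do not.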

Parameters for the temporal memory

Alias                  Default  Description

activationThreshold    12       Activation threshold for segments.
cellsPerColumn         32       Number of cells per column.
columnCount            2048     The number of cell columns in a cortical region.
globalDecay            0.10     Decrements all synapses a little bit all the time.
initialPerm            0.11     Initial permanence value for a synapse.
inputWidth             -        Size of the input.
maxAge                 100000   Controls global decay: global decay will only decay segments that have not been activated for maxAge iterations, and the global decay loop is only run every maxAge iterations.
maxSegmentsPerCell     -        The maximum number of segments a cell can have.
maxSynapsesPerSegment  -        The maximum number of synapses a segment can have.
minThreshold           8        The minimum required activity for a segment to learn.
newSynapseCount        15       The maximum number of synapses added to a segment during learning.
permanenceDec          0.10     How much permanence is removed from synapses when learning occurs.
permanenceInc          0.10     How much permanence is added to synapses when learning occurs.
temporalImp            cpp/py   Controls which temporal memory implementation to use.

Table A.3 Table containing configuration parameters for the temporal memory
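Collected into one place, the defaults of Tables A.1 and A.3 might be passed to a model as a plain parameter dictionary. The spParams/tmParams grouping below mirrors how NuPIC-style OPF models are typically configured, but the exact schema here is an illustrative assumption, not the NuPIC API itself.

```python
# Defaults from Tables A.1 and A.3 gathered into one dictionary; only the
# values listed in the tables are included, and the grouping is illustrative.
htm_params = {
    "spParams": {                      # spatial pooler (Table A.1)
        "globalInhibition": False,
        "numActivePerInhArea": 10,
        "synPermActiveInc": 0.1,
        "synPermConnected": 0.10,
        "synPermInactiveDec": 0.01,
        "potentialRadius": 16,
    },
    "tmParams": {                      # temporal memory (Table A.3)
        "activationThreshold": 12,
        "cellsPerColumn": 32,
        "columnCount": 2048,
        "globalDecay": 0.10,
        "initialPerm": 0.11,
        "maxAge": 100000,
        "minThreshold": 8,
        "newSynapseCount": 15,
        "permanenceDec": 0.10,
        "permanenceInc": 0.10,
    },
}
```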


Appendix B

Wind characteristics

Figure B.1 Wind characteristics for WF 1 and WF 2, the GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Figure B.2 Wind characteristics for WF 3–7, the GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Appendix C

Error Distribution

[Figures C.1–C.14: six histogram panels each, one per lead time (48, 40, 30, 20, 10 and 1 hours ahead); x-axis: error (−1.0 to 1.0); y-axis: frequency (0 to 70).]

Figure C.1 Error distribution for different lead times, WF 1 (NuPIC)

Figure C.2 Error distribution for different lead times, WF 2 (NuPIC)

Figure C.3 Error distribution for different lead times, WF 3 (NuPIC)

Figure C.4 Error distribution for different lead times, WF 4 (NuPIC)

Figure C.5 Error distribution for different lead times, WF 5 (NuPIC)

Figure C.6 Error distribution for different lead times, WF 6 (NuPIC)

Figure C.7 Error distribution for different lead times, WF 7 (NuPIC)

Figure C.8 Error distribution for different lead times, WF 1 (Expektra)

Figure C.9 Error distribution for different lead times, WF 2 (Expektra)

Figure C.10 Error distribution for different lead times, WF 3 (Expektra)

Figure C.11 Error distribution for different lead times, WF 4 (Expektra)

Figure C.12 Error distribution for different lead times, WF 5 (Expektra)

Figure C.13 Error distribution for different lead times, WF 6 (Expektra)

Figure C.14 Error distribution for different lead times, WF 7 (Expektra)

List of Figures

2.1  A figure that presents the general outline when forecasting using the statistical approach . . . 6
2.2  A figure that presents the general steps when forecasting using a physical model . . . 7
3.1  The perceptron . . . 20
3.2  Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of H hidden layers, as well as M input connections. Each edge seen in this graph has a weight wij associated with it . . . 21
3.3  Information flow of a single-region predictive model created with the OPF . . . 23
3.4  The CLAClassifier . . . 28
3.5  Training an OPF model . . . 29
4.1  Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) . . . 31
4.2  Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) . . . 32
4.3  Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) . . . 32
4.4  Different error measurements for WF 1 . . . 33
4.5  Different error measurements for WF 2 . . . 34
4.6  Different error measurements for WF 3 . . . 35
4.7  Different error measurements for WF 4 . . . 36
4.8  Different error measurements for WF 5 . . . 37
4.9  Different error measurements for WF 6 . . . 38
4.10 Different error measurements for WF 7 . . . 39
4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point: the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind . . . 41
4.12 Plot showing the unoptimized version vs the optimized one when training a network with 10 input neurons; 10, 15, 20 or 25 hidden neurons; and 1 output neuron. Training was done for 100 epochs using LM . . . 42
4.13 Summarized average improvement over all wind farms with 95% confidence intervals . . . 43
B.1  Wind characteristics for WF 1 and WF 2. The GEFCom dataset showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power . . . 59
B.2  Wind characteristics for WF 3–7. The GEFCom dataset showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power . . . 60
C.1  Error distribution for different lead times, WF 1 . . . 62
C.2  Error distribution for different lead times, WF 2 . . . 63
C.3  Error distribution for different lead times, WF 3 . . . 64
C.4  Error distribution for different lead times, WF 4 . . . 65
C.5  Error distribution for different lead times, WF 5 . . . 66
C.6  Error distribution for different lead times, WF 6 . . . 67
C.7  Error distribution for different lead times, WF 7 . . . 68
C.8  Error distribution for different lead times, WF 1 . . . 69
C.9  Error distribution for different lead times, WF 2 . . . 70
C.10 Error distribution for different lead times, WF 3 . . . 71
C.11 Error distribution for different lead times, WF 4 . . . 72
C.12 Error distribution for different lead times, WF 5 . . . 73
C.13 Error distribution for different lead times, WF 6 . . . 74
C.14 Error distribution for different lead times, WF 7 . . . 75

List of Tables

3.1  The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, missing data with power observations are available for updating the models . . . 16
3.2  Preprocessed features from the dataset. The Forecast category represents the forecast given by the NWP; multiple forecasts will exist for a given date. The latest issued forecast available is the one we use in training and testing . . . 17
3.3  Example, with n = 14, r = 5, ψ = 1, of various encoded scalar values using a ScalarEncoder . . . 24
4.1  NRMSE scores of the entries published in [Hong et al., 2014]. The NuPIC model and Expektra model are added so we can easily compare the results . . . 40
A.1  Table containing configuration parameters for the spatial pooler . . . 55
A.2  Table containing configuration parameters for the scalar encoder . . . 56
A.3  Table containing configuration parameters for the temporal memory . . . 57

www.kth.se


CHAPTER 4 RESULT

                            Wind Farm
User          1      2      3      4      5      6      7      All
Leustagos     0.145  0.138  0.168  0.144  0.158  0.133  0.140  0.146
DuckTile      0.143  0.145  0.172  0.145  0.165  0.137  0.146  0.148
MZ            0.141  0.151  0.174  0.145  0.167  0.141  0.145  0.149
Propeller     0.144  0.153  0.177  0.147  0.175  0.141  0.147  0.152
Duehee Lee    0.157  0.144  0.176  0.160  0.169  0.154  0.148  0.155
Expektra      0.165  0.158  0.184  0.164  0.179  0.153  0.153  0.165
MTU EE5260    0.161  0.172  0.193  0.162  0.192  0.156  0.160  0.168
SunWind       0.174  0.177  0.193  0.176  0.179  0.157  0.162  0.172
ymzsmsd       0.163  0.186  0.200  0.164  0.192  0.162  0.167  0.174
4138 Kalchas  0.180  0.179  0.197  0.175  0.200  0.160  0.165  0.177
NuPIC         0.243  0.254  0.264  0.310  0.290  0.224  0.240  0.264

Persistence   0.302  0.338  0.373  0.364  0.388  0.341  0.361  0.355

Table 4.1 NRMSE scores of the entries published in [Hong et al., 2014]. The NuPIC model and Expektra model are added so we can easily compare the results.

4.1 Experimental results

Looking at the primary results presented in this study, we see in Table 4.1 that Expektra's ANN model is able to achieve performance comparable to the other methods published with Hong et al. [2014]. This can be used as a starting point for further investigations. A similar approach to Expektra is DueheeLee [Lee and Baldick, 2014], as both methods are based on the same neural network architecture but with different optimization techniques. DueheeLee's network is structured slightly differently: Expektra's model consists of only one output node, whereas DueheeLee uses five (representing five prediction steps ahead). Besides these differences, DueheeLee has also included additional sub-models to create a prediction ensemble, blending the output of the ANN with a Gaussian Process (GP) to improve very short-term predictions. MTU EE5260 is also a neural network approach that converts meteorological forecasts to power, and here the score is very close. The HTM CLA beats the persistence model but needs more work to outperform the other models in GEFCom.
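The scores in Table 4.1 are NRMSE values computed on normalized power (values in [0, 1]). A minimal sketch of such a score is shown below; it assumes the root-mean-square error needs no further scaling because the power is already normalized by capacity, which may differ in detail from the exact GEFCom scoring.

```python
import math

def nrmse(predicted, observed):
    """Root-mean-square error of normalized power values in [0, 1]. No
    extra normalization is applied, on the assumption that the power is
    already scaled by capacity."""
    assert len(predicted) == len(observed) > 0
    se = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(se) / len(se))
```

A perfect forecast scores 0; the Persistence row of Table 4.1 (0.355 over all farms) is the baseline every model should beat.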


4.2 Input Importance

For interpretation purposes, in order to understand the model better, an analysis of the relative input parameter importance was performed. This analysis is illustrated in Figure 4.11. Each box represents added noise to that channel, which results in a higher NRMSE score if that feature was important. A reference point, "all-channels", represents the error distribution of the model with no input replacement. We clearly see in this figure that the wind speed channel ws is the most important attribute: adding noise to this channel greatly affects the NRMSE score. We also see that the wind components u, v show little to no influence, and that the timestamp-related inputs hours and week both indicate that there are seasonal and daily trends present in the dataset.

[Box plot: NRMSE (0.2–0.6) on the y-axis; one box per input channel: all-channels, hours, u, v, week, ws, ws−1, ws−2, ws−3, ws+1, ws+2, ws+3.]

Figure 4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point: the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind.
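The HIPR procedure behind this analysis can be sketched as follows: one input channel at a time is replaced with random noise while the trained model and the remaining channels stay fixed, and the resulting NRMSE shows how much the model relied on that channel. The helper names and toy model below are placeholders, not the thesis network.

```python
import math
import random

def hipr(model, X, y, n_channels):
    """Holdback Input Randomization sketch: replace one input channel at a
    time with uniform random noise, keep everything else fixed, and record
    the NRMSE that results. Channels whose replacement inflates the error
    the most were the most important to the model."""
    def nrmse(pred, obs):
        return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

    scores = {}
    for ch in range(n_channels):
        noisy = [row[:] for row in X]          # copy so X itself is untouched
        for row in noisy:
            row[ch] = random.random()          # uniform noise in [0, 1]
        scores[ch] = nrmse([model(row) for row in noisy], y)
    return scores

# Toy check: a "model" that only looks at channel 0 should be hurt most there.
random.seed(0)
X = [[random.random(), random.random()] for _ in range(200)]

def toy_model(row):
    return row[0]                              # ignores channel 1 entirely

y = [toy_model(row) for row in X]
scores = hipr(toy_model, X, y, n_channels=2)   # scores[0] >> scores[1] == 0
```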


4.2.1 Adaptation and Optimization

All training for Expektra's model was performed on a MacBook Pro with an Intel Core 2 Duo processor (P8600, 2.40 GHz) and 4 GB of installed RAM, running a 64-bit operating system. After adapting the code to run experiments on the GEFCom dataset, the performance of the core source code provided by Expektra was evaluated. This investigation identified some slow parts of the code; after fixing these issues, mainly by using a Math.NET native library instead of the managed provider, a speed test was performed comparing the unoptimized version with the optimized one. Figure 4.12 shows the speed-up achieved during training using the LM algorithm.

[Plot: training time (y-axis, 0 to 1,600,000) vs number of hidden neurons (10, 15, 20, 25); two series: Normal and Optimized.]

Figure 4.12 Plot showing the unoptimized version vs the optimized one when training a network with 10 input neurons; 10, 15, 20 or 25 hidden neurons; and 1 output neuron. Training was done for 100 epochs using LM.

4.3 Summary

A plot showing the average NRMSE improvement over the persistence model can be seen in Figure 4.13. We see that both models are able to perform better than persistence. Expektra's model performs better than the NuPIC model for the whole forecast


horizon, and NuPIC performs better than persistence towards the end of the forecast.

[Line plot: look-ahead time in hours (0–50) on the x-axis, improvement (%) in NRMSE (0–100) on the y-axis; one curve per model: Expektra and NuPIC.]

Figure 4.13 Summarized average improvement over all wind farms, with 95% confidence intervals.
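The quantity plotted here is the relative NRMSE reduction against the persistence reference, which simply repeats the last observed power for every lead time. A minimal sketch, using the standard percentage formula (an assumption about the exact definition used):

```python
def persistence_forecast(history, horizon):
    """Persistence reference model: the last observed power value is the
    forecast for every lead time in the horizon."""
    return [history[-1]] * horizon

def improvement_pct(nrmse_model, nrmse_reference):
    """Relative NRMSE improvement of a model over a reference, in percent."""
    return 100.0 * (1.0 - nrmse_model / nrmse_reference)
```

With the "All" column of Table 4.1, improvement_pct(0.165, 0.355) gives roughly 53.5% for Expektra over persistence.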


Chapter 5

Discussion

With the exponential growth of computer power it becomes easier to study deeper and more complex networks, and with recent advances in the area of deep learning a new interest in ANNs has resurged. In practice, ANNs have been used by most groups in the field of WPF, but these networks never caught on, as it was argued that the improvements made by ANNs were not usually enough for the extra effort of training them [Giebel et al., 2011]. This is steadily becoming less of an issue as computation becomes cheaper and bigger networks perform better.

By using a simple MLP network we observe that we are able to obtain results similar in performance to other models published in Hong et al. [2014]. This can be used as a starting point for further investigations. The GEFCom dataset has a limited number of input features; if we had access to a wider range of them, the power of neural networks could be studied more in depth. Given the number of features in the dataset, this is harder to do.

Using persistence as a reference model was done in order to have the same baseline as the one used in GEFCom, but a better reference model should be considered in the future, such as the one presented in Nielsen et al. [1998], especially if longer forecasts are to be considered.

5.1 Method development issues

Working with Expektra's model is very straightforward, and no major issues were encountered during development, but some issues were encountered working with NuPIC.1

1 It should also be pointed out that it helps to have the people who developed the code on the spot, and this is probably the main reason for the lack of trouble.


1. Installing NuPIC is difficult. This seems to be a general problem, judging by posts in the NuPIC mailing list.2 A very important issue to fix if they want more people to work with this model.

2. The built-in swarming (PSO) did not work well, especially for slightly larger datasets and for 48 prediction steps. The main problems were fitting everything into memory and that the code ran too slowly when swarming over multiple steps ahead, making it practically impossible to use. Scaling down the problem to just a few prediction steps and multiple models helped, but the swarm model kept dismissing wind speed as an important feature, which was fixed by specifically telling it not to touch that encoder.

3. Inconsistencies between the white-paper documentation of HTM CLA and the actual implementation make it hard to understand the underlying principles of NuPIC. NuPIC presents a very complex network, and any inconsistencies in the documentation make it hard to understand. The code is quite well documented, but the additional material is a bit sparse.

These issues are all understandable and are continuously being improved upon. NuPIC is still a young platform, so it is expected to find issues due to the complexity and research nature of the project. Training the CLAClassifier to use many step predictions instead of just one will most likely reduce the bias error seen in the result. To investigate this, a more powerful computer is needed.

There are some differences in the inputs between the models. This setup was created because an OPF model needs to encode the value it is going to predict. SCADA data is available, so this is not an issue, but it may introduce some unfairness between NuPIC and Expektra. In the current setup, NuPIC does not seem to gain anything extra from the additional information, and other models in the competition will most likely use different inputs as well.

In general, NuPIC is different in the way you feed it data: you do so by sending in the front of the signal, and temporal context is achieved through the temporal memory.

5.2 Future improvements and directions

One issue with the GEFCom dataset is that it does not contain any specific information about the location of the 7 wind farms. It is worth considering how to handle different models that are specialized for different types

2 http://numenta.org/lists


of terrain, as this has been shown to increase performance [Kariniotakis et al., 2006]. Another thing to investigate could be a more advanced scheme for hyperparameter selection: Jursa and Rohrig [2008] show that the WPF error can be reduced by smarter use of optimization algorithms for feature selection and hyperparameter selection, which should improve performance.

In general, having more types of inputs from sources like SCADA and NWP systems could give a lot of valuable information that can be used for better predictions. It is clear that wind speed is a key feature for producing good forecasts, which is also supported by the input analysis in section 4.2. It has been argued that the error in WPF models stems largely from wrong weather forecasts [Giebel et al., 2011], so identifying good sources of weather forecasts would be desirable for more effective model learning. Combining several different weather forecasts could be investigated, as Nielsen et al. [2007] showed that power forecasts based on a number of different meteorological forecasts were better than those based on a single source.

Regarding HTM CLA, it is worth considering implementing a custom encoder for it, an encoder that would be specifically targeted at data concerning wind farms. SCADA data could also possibly be streamed directly into a NuPIC model, which would allow for some interesting online predictions and anomaly detection.

Training times are always an issue, especially for very large models. Both NuPIC and Expektra's model would benefit from being implemented in a more parallel fashion, as these algorithms are very well suited for parallel computing. The speed-up would be of great value, so looking into and implementing a GPU version of the algorithms could be worthwhile.

5.3 Conclusions

The field of energy forecasting is still young, and the necessity for good forecasting techniques is constantly growing. This initial study shows that part of Expektra's forecasting method can be used to forecast wind power generation. We observe that this ANN is able to achieve performance comparable to other methods published in Hong et al. [2014]. Additional work is still needed to obtain state-of-the-art results. Numenta's model is able to beat the reference model but still needs additional work before we can draw definite conclusions about its performance. Constructing a custom encoder could be a first step in that direction.


Bibliography

Subutai Ahmad and Jeff Hawkins. Properties of sparse distributed representations and their application to hierarchical temporal memory. arXiv preprint arXiv:1503.07469, 2015.

T.C. Akinci. Short term wind speed forecasting with ANN in Batman, Turkey. Elektronika ir Elektrotechnika, 107(1):41–45, 2015.

E. Michael Azoff. Neural network time series forecasting of financial markets. John Wiley & Sons, Inc., 1994.

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13(1):281–305, 2012.

Daniel P. Buxhoeveden and Manuel F. Casanova. The minicolumn hypothesis in neuroscience. Brain, 125(5):935–951, 2002.

Erasmo Cadenas and Wilfrido Rivera. Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks. Renewable Energy, 34(1):274–278, 2009.

Dmitri B. Chklovskii, B.W. Mel, and K. Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782–788, 2004.

A. Costa, A. Crespo, J. Navarro, G. Lizcano, H. Madsen, and E. Feitosa. A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews, 12(6):1725–1744, August 2008. ISSN 1364-0321. doi: 10.1016/j.rser.2007.01.015. URL http://dx.doi.org/10.1016/j.rser.2007.01.015.

Russ C. Eberhart and James Kennedy. A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science, volume 1, pages 39–43, New York, NY, 1995.

Shu Fan, James R. Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. Energy Conversion, IEEE Transactions on, 24(2):474–482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento: a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276. Citeseer, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition, EWEC 2008, 6 pages. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving hierarchical temporal memory-based trading models. Springer, 2013.

M.A. Gaertner, C. Gallardo, C. Tejeda, N. Martínez, S. Calabria, N. Martínez, and B. Fernández. The Casandra project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: The next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: The next generation prediction system. In Proc. of the 2001 European Wind Energy Conference, EWEC'01, Copenhagen, Denmark, pages 777–780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: A literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G. Lobo. Sipreólico: wind power prediction tool for the Spanish peninsular power system. Proceedings of the CIGRÉ 40th General Session and Exhibition, Paris, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357–363, 2014.

Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41–46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585–592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694–709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference, EWEC'07, 2007.

G. Kariniotakis. Position paper on Joule project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J. Halliday, R. Brownsword, Ignacio Marti, Ana Maria Palomares, I. Cruz, H. Madsen, T.S. Nielsen, Henrik Aa. Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power: overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, 10 pages, 2006.

G.N. Kariniotakis, G.S. Stavrakakis, and E.F. Nogaret. Wind power forecasting using advanced neural networks models. Energy Conversion, IEEE Transactions on, 11(4):762–767, 1996.

Stanley J. Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326–334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275–293, 2009.

Lars Landberg and Simon J. Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171–195, 1994.

S. Lang, C. Mohrlen, J. Jorgensen, B. Ó Gallachóir, and E. McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International Solar Energy Society UK Section Conference, volume 85, page 89, 2006.

S.M. Lawan, W.A.W.Z. Abidin, W.Y. Chai, A. Baharun, and T. Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760–1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. Smart Grid, IEEE Transactions on, 5(1):501–510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets, 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164–168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313–2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154–3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa. Nielsen, and Torben S. Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475–489, 2005.

Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431–441, 1963.

Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons. 1969.

Marvin Lee Minsky and Oliver G. Selfridge. Learning in random nets. MIT Lincoln Laboratory, 1960.

Jorge J. Moré. The Levenberg-Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105–116. Springer, 1978.

Henrik Aa. Nielsen, Torben S. Nielsen, Henrik Madsen, Maria J. Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471–482, 2007.

Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29–34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power, 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159–167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms, v0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33–57, 2007.

A. Rodrigues, J.A. Peças Lopes, P. Miranda, L. Palma, C. Monteiro, R. Bessa, J. Sousa, C. Rodrigues, and J. Matos. EPREV: a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference, EWEC, volume 7, 2007.

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5:3, 1988.

Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85–117, 2015.

S. Sinkevicius, R. Simutis, and V. Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91–96, 2011.

Ke-Sheng Wang, Vishal S. Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61–69, 2014.

WWEA. 2014 half-year report, pages 1–8. Technical report, WWEA, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365–376, 2013.

Hao Yu and Bogdan M. Wilamowski. Levenberg-Marquardt training. Industrial Electronics Handbook, 5:12–1, 2011.

Appendix A

Hyper-parameters

Parameters for the spatial pooler

Alias                Default  Description
columnCount          -        The number of cell columns in a cortical region.
globalInhibition     false    If true, during the inhibition phase the winning columns are selected as the most active columns from the region as a whole.
numActivePerInhArea  10       The maximum number of active columns per inhibition area.
synPermActiveInc     0.1      The amount by which an active synapse is incremented in each round, specified as a percent of a fully grown synapse.
synPermConnected     0.10     Controls the threshold at which synapses are considered connected.
synPermInactiveDec   0.01     The amount by which an inactive synapse is decremented in each round.
potentialRadius      16       Determines the extent of the input that each column can potentially be connected to.

Table A.1: Configuration parameters for the spatial pooler.

Parameters for the scalar encoder

Alias       Symbol  Description
w           w       Number of bits to set in the output.
minval      vmin    The lower bound of the input value.
maxval      vmax    The upper bound of the input value.
n           n       Number of bits in the representation (n must be > w).
radius      r       Inputs separated by more than or equal to this distance will have non-overlapping representations.
resolution  ψ       Inputs separated by more than or equal to this distance will have different representations.

Table A.2: Configuration parameters for the scalar encoder.

Parameters for the temporal memory

Alias                  Default  Description
activationThreshold    12       Activation threshold for segments.
cellsPerColumn         32       Number of cells per column.
columnCount            2048     The number of cell columns in a cortical region.
globalDecay            0.10     Decrements all synapses a little bit all the time.
initialPerm            0.11     Initial permanence value for a synapse.
inputWidth             -        Size of the input.
maxAge                 100000   Controls global decay: only segments that have not been activated for maxAge iterations are decayed, and the global decay loop runs only every maxAge iterations.
maxSegmentsPerCell     -        The maximum number of segments a cell can have.
maxSynapsesPerSegment  -        The maximum number of synapses a segment can have.
minThreshold           8        The minimum required activity for a segment to learn.
newSynapseCount        15       The maximum number of synapses added to a segment during learning.
permanenceDec          0.10     How much permanence is removed from synapses when learning occurs.
permanenceInc          0.10     How much permanence is added to synapses when learning occurs.
temporalImp            cpp/py   Controls which temporal memory implementation to use.

Table A.3: Configuration parameters for the temporal memory.

Appendix B

Wind characteristics

Figure B.1: Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated (the stronger, the more power).

Figure B.2: Wind characteristics for WF 3–7, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated (the stronger, the more power).

Appendix C

Error Distribution


[Six histogram panels ("wf1 using nupic"): forecast error (x-axis, -1.0 to 1.0) vs. frequency (y-axis, 0-70) for lead times 48, 40, 30, 20, 10, and 1.]

Figure C.1: Error distribution for different lead times, WF 1.

[Six histogram panels ("wf2 using nupic"): forecast error vs. frequency for lead times 48, 40, 30, 20, 10, and 1.]

Figure C.2: Error distribution for different lead times, WF 2.

[Six histogram panels ("wf3 using nupic"): forecast error vs. frequency for lead times 48, 40, 30, 20, 10, and 1.]

Figure C.3: Error distribution for different lead times, WF 3.

[Six histogram panels ("wf4 using nupic"): forecast error vs. frequency for lead times 48, 40, 30, 20, 10, and 1.]

Figure C.4: Error distribution for different lead times, WF 4.

[Six histogram panels ("wf5 using nupic"): forecast error vs. frequency for lead times 48, 40, 30, 20, 10, and 1.]

Figure C.5: Error distribution for different lead times, WF 5.

[Six histogram panels ("wf6 using nupic"): forecast error vs. frequency for lead times 48, 40, 30, 20, 10, and 1.]

Figure C.6: Error distribution for different lead times, WF 6.

[Six histogram panels ("wf7 using nupic"): forecast error vs. frequency for lead times 48, 40, 30, 20, 10, and 1.]

Figure C.7: Error distribution for different lead times, WF 7.

[Six histogram panels ("wf1 using expektra"): forecast error vs. frequency for lead times 48, 40, 30, 20, 10, and 1.]

Figure C.8: Error distribution for different lead times, WF 1.

[Six histogram panels ("wf2 using expektra"): forecast error vs. frequency for lead times 48, 40, 30, 20, 10, and 1.]

Figure C.9: Error distribution for different lead times, WF 2.

[Six histogram panels ("wf3 using expektra"): forecast error vs. frequency for lead times 48, 40, 30, 20, 10, and 1.]

Figure C.10: Error distribution for different lead times, WF 3.

[Six histogram panels ("wf4 using expektra"): forecast error vs. frequency for lead times 48, 40, 30, 20, 10, and 1.]

Figure C.11: Error distribution for different lead times, WF 4.

[Six histogram panels ("wf5 using expektra"): forecast error vs. frequency for lead times 48, 40, 30, 20, 10, and 1.]

Figure C.12: Error distribution for different lead times, WF 5.

[Six histogram panels ("wf6 using expektra"): forecast error vs. frequency for lead times 48, 40, 30, 20, 10, and 1.]

Figure C.13: Error distribution for different lead times, WF 6.

[Six histogram panels ("wf7 using expektra"): forecast error vs. frequency for lead times 48, 40, 30, 20, 10, and 1.]

Figure C.14: Error distribution for different lead times, WF 7.

List of Figures

2.1 A figure that presents the general outline when forecasting using the statistical approach
2.2 A figure that presents the general steps when forecasting using a physical model
3.1 The perceptron
3.2 Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of the H hidden layers, as well as M input connections. Each edge seen in this graph has a weight wij associated with it
3.3 Information flow of a single-region predictive model created with the OPF
3.4 The CLAClassifier
3.5 Training an OPF model
4.1 Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs. wind speed. Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.2 Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.3 Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.4 Different error measurements for WF 1
4.5 Different error measurements for WF 2
4.6 Different error measurements for WF 3
4.7 Different error measurements for WF 4
4.8 Different error measurements for WF 5
4.9 Different error measurements for WF 6
4.10 Different error measurements for WF 7
4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point: the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind
4.12 Plot showing the unoptimized version vs. the optimized one when training a network with 10 input neurons, 10/15/20/25 hidden neurons, and 1 output neuron. Training was done for 100 epochs using LM
4.13 Summarized average improvement over all wind farms with 95% confidence intervals
B.1 Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated
B.2 Wind characteristics for WF 3–7, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated
C.1 Error distribution for different lead times, WF 1
C.2 Error distribution for different lead times, WF 2
C.3 Error distribution for different lead times, WF 3
C.4 Error distribution for different lead times, WF 4
C.5 Error distribution for different lead times, WF 5
C.6 Error distribution for different lead times, WF 6
C.7 Error distribution for different lead times, WF 7
C.8 Error distribution for different lead times, WF 1
C.9 Error distribution for different lead times, WF 2
C.10 Error distribution for different lead times, WF 3
C.11 Error distribution for different lead times, WF 4
C.12 Error distribution for different lead times, WF 5
C.13 Error distribution for different lead times, WF 6
C.14 Error distribution for different lead times, WF 7

List of Tables

3.1 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, missing data with power observations are available for updating the models
3.2 Preprocessed features from the dataset. The Forecast category represents the forecast given by the NWP; multiple forecasts will exist for a given date. The latest issued forecast available gives the features we will use in training and testing
3.3 Example, where n = 14, r = 5, ψ = 1, of various encoded scalar values using a ScalarEncoder
4.1 NRMSE scores of the entries published in [Hong et al., 2014]. The NuPIC model and the Expektra model are added so we can easily compare the results
A.1 Configuration parameters for the spatial pooler
A.2 Configuration parameters for the scalar encoder
A.3 Configuration parameters for the temporal memory

www.kth.se

  • Contents
  • Introduction
    • Problem Formulation
    • The scope of the problem
  • Background
    • Neural Networks and Time Series Prediction
  • Method and Materials
    • Preliminaries
      • Remarks
      • Definitions
      • Reference models
      • Error metrics
      • Model selection
      • Evaluation
    • Experiments
      • Holdback Input Randomization
      • Optimization methods
    • Neural Networks
      • Multilayer Perceptron
      • Numenta Platform for Intelligent Computing
  • Result
    • Experimental results
    • Input Importance
      • Adaptation and Optimization
    • Summary
  • Discussion
    • Method development issues
    • Future improvements and directions
    • Conclusions
  • Bibliography
  • Appendices
    • Hyper-parameters
    • Wind characteristics
    • Error Distribution
  • List of Figures
  • List of Tables

4.2 Input Importance

For interpretation purposes, and in order to understand the model better, an analysis of the relative input parameter importance was performed. This analysis is illustrated in Figure 4.11. Each box represents noise added to that channel; a higher NRMSE score means that the feature was important. The reference point, "all-channels", represents the error distribution of the model with no input replacement. We clearly see in this figure that the wind speed channel ws is the most important attribute: adding noise to this channel greatly affects the NRMSE score. We also see that the wind components u, v show little to no influence, and that the timestamp-related inputs hours and week both indicate that there are seasonal and daily trends present in the dataset.
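The HIPR procedure itself is straightforward to sketch: hold the trained model fixed, replace one input channel at a time with noise drawn from that channel's range, and record the resulting NRMSE. The code below is a minimal illustration with a toy model, not the experiment code used here:

```python
import numpy as np

def nrmse(pred, target):
    """RMSE normalized by the range of the target."""
    return float(np.sqrt(np.mean((pred - target) ** 2))
                 / (target.max() - target.min()))

def hipr(predict, X, y, seed=0):
    """Holdback Input Randomization: keep the trained model fixed, replace
    one input channel at a time with uniform noise drawn from that
    channel's range, and record the resulting NRMSE per channel."""
    rng = np.random.default_rng(seed)
    base = nrmse(predict(X), y)
    scores = {}
    for j in range(X.shape[1]):
        Xn = X.copy()
        Xn[:, j] = rng.uniform(X[:, j].min(), X[:, j].max(), len(X))
        scores[j] = nrmse(predict(Xn), y)
    return base, scores

# Toy model that only looks at channel 0 ("wind speed"): randomizing
# channel 1 leaves the error unchanged, randomizing channel 0 raises it.
X = np.column_stack([np.linspace(0.0, 1.0, 200), np.ones(200)])
y = X[:, 0].copy()
base, scores = hipr(lambda Z: Z[:, 0], X, y)
```

Channels whose randomization raises the error the most are the most important, which is exactly the reading of Figure 4.11.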

[Box plots of NRMSE (roughly 0.2 to 0.6) per perturbed channel: all-channels, hours, u, v, week, ws, ws-1, ws-2, ws-3, ws+1, ws+2, ws+3.]

Figure 4.11: Relative input parameter importance using HIPR. "all-channels" reflects the reference point: the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind.

CHAPTER 4 RESULT

4.2.1 Adaptation and Optimization

All training for Expektra's model was performed on a MacBook Pro with an Intel Core 2 Duo processor (P8600, 2.40 GHz) running a 64-bit operating system with 4 GB of installed RAM. After adapting the code to run experiments on the GEFCom dataset, an investigation was performed to evaluate the performance of the core source code provided by Expektra. This investigation identified some slow parts of the code. After addressing these issues, mainly by using a MathNET native library instead of the managed provider, a speed test was performed comparing the unoptimized version with the optimized one. Figure 4.12 shows the speed-up achieved during training using the LM algorithm.

[Plot: training time vs. number of hidden neurons (10, 15, 20, 25) for the Normal and Optimized versions; y-axis: Time, 0 to 1,600,000.]

Figure 4.12: Plot showing the unoptimized version vs. the optimized one when training a network with 10 input neurons, 10/15/20/25 hidden neurons, and 1 output neuron. Training was done for 100 epochs using LM.
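The gain from switching to a native provider is essentially the gain of delegating inner loops to optimized BLAS code. The Python analogue below illustrates the same effect (it is an illustration of the principle, not Expektra's C# code): a naive triple-loop matrix multiply vs. NumPy's BLAS-backed operator:

```python
import time
import numpy as np

def matmul_pure_python(A, B):
    """Naive triple-loop matrix multiply: a stand-in for a managed,
    non-vectorized linear-algebra provider."""
    n, p, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            s = 0.0
            for k in range(p):
                s += A[i][k] * B[k][j]
            C[i][j] = s
    return C

size = 120
A = np.random.rand(size, size)
B = np.random.rand(size, size)

t0 = time.perf_counter()
matmul_pure_python(A.tolist(), B.tolist())
t_naive = time.perf_counter() - t0

t0 = time.perf_counter()
A @ B  # delegated to optimized native BLAS code
t_blas = time.perf_counter() - t0
```

Since LM training is dominated by exactly this kind of dense linear algebra (Jacobian products and linear solves), the native provider dominates the measured speed-up.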

4.3 Summary

A plot showing the average NRMSE improvement over the persistence model can be seen in Figure 4.13. We see that both models are able to perform better than persistence. Expektra's model performs better than the NuPIC model over the whole forecast horizon, and NuPIC performs better than persistence towards the end of the forecast.

[Line plot: improvement over persistence (% NRMSE, 0-100) vs. look-ahead time (0-50 hours) for Expektra and NuPIC.]

Figure 4.13: Summarized average improvement over all wind farms, with 95% confidence intervals.
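The improvement measure plotted here is, presumably, the relative reduction in NRMSE with respect to persistence; the exact definition is not restated at this point, so treat the following as an assumption:

```python
def improvement_over_persistence(nrmse_model, nrmse_persistence):
    """Percentage reduction in NRMSE relative to the persistence
    reference; positive values mean the model beats persistence."""
    return 100.0 * (nrmse_persistence - nrmse_model) / nrmse_persistence

# e.g. persistence NRMSE 0.40 vs. model NRMSE 0.30 -> 25% improvement
imp = improvement_over_persistence(0.30, 0.40)
```

Under this definition, 0% means parity with persistence and negative values mean the model is worse than the reference.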


Chapter 5

Discussion

With the exponential growth of computer power, it becomes easier to study deeper and more complex networks, and with recent advances in the area of deep learning, a new interest in ANNs has resurged. In practice, ANNs have been used by most groups in the field of WPF, but these networks never caught on, as it was argued that the improvements made by ANNs were usually not worth the extra effort of training them [Giebel et al., 2011]. This is steadily becoming less of an issue as computation becomes cheaper and bigger networks perform better.

By using a simple MLP network we observe that we are able to obtain results similar in performance to other models published in Hong et al. [2014]. This can be used as a starting point for further investigations. The GEFCom dataset has a limited number of input features; if we had access to a wider range of them, the power of neural networks could be studied more in depth, but given the number of features in the dataset this is harder to do.

Using persistence as a reference model was done in order to have the same baseline as the one used in GEFCom, but a better reference model should be considered in the future, such as the one presented in Nielsen et al. [1998], especially if longer forecasts are to be considered.
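For reference, the two baselines can be sketched as follows. The per-lead-time blend weights a_k in the second function (for example the lag-k autocorrelation of the power series, which shrinks with the horizon) are an assumption of this sketch of the "new reference" of Nielsen et al. [1998]:

```python
def persistence(last_observed, horizon):
    """Persistence: every lead time is forecast as the last observation."""
    return [last_observed] * horizon

def new_reference(last_observed, long_term_mean, a):
    """Blend the last observation and the long-term mean; a[k] is the
    weight on persistence for lead time k and decreases with k."""
    return [ak * last_observed + (1.0 - ak) * long_term_mean for ak in a]
```

As the weights decay, the new reference falls back on the long-term mean, which is why it is a harder baseline than pure persistence at long horizons.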

5.1 Method development issues

Working with Expektra's model is very straightforward and no major issues were encountered during development, but some issues were encountered working with NuPIC.1

1 It should also be pointed out that it helps to have the people who developed the code on the spot, and this is probably the main reason for encountering so little trouble.


1. Installing NuPIC is difficult. This seems to be a general problem, judging by posts on the NuPIC mailing list.2 It is a very important issue to fix if they want more people to work with this model.

2. The built-in swarming (PSO) did not work well, especially for slightly larger datasets and for 48 prediction steps. The main problems were fitting everything into memory and that the code ran too slowly when swarming over multiple step-ahead predictions, making it practically impossible to use. Scaling down the problem to just a few prediction steps and multiple models helped, but the swarm model kept dismissing wind speed as an important feature, which was fixed by specifically telling it not to touch that encoder.

3. Inconsistencies between the white-paper documentation of HTM CLA and the actual implementation make it hard to understand the underlying principles of NuPIC. NuPIC presents a very complex network, and any inconsistency in documentation makes it hard to understand. The code is quite well documented, but the additional material is a bit sparse.

These issues are all understandable and are continuously being improved upon. NuPIC is still a young platform, so issues are to be expected given the complexity and research nature of the project. Training the CLAClassifier to use many prediction steps instead of just one will most likely reduce the bias error seen in the results. To investigate this, a more powerful computer is needed.

There are some differences in the input between the models. This setup was created because an OPF model needs to encode the value it is going to predict. SCADA data is available, so this is not an issue, but it may introduce some unfairness between NuPIC and Expektra. In the current setup NuPIC does not seem to gain anything extra from the additional information, and other models in the competition will most likely use different inputs as well.

In general, NuPIC differs in the way you feed it data: you send in only the front of the signal, and temporal context is achieved through the temporal memory.
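The difference can be illustrated with a toy stateful predictor: instead of an explicit lag window of past samples (as for an MLP), state is carried between samples. The exponential trace below is only a stand-in for the temporal memory, not NuPIC's algorithm:

```python
class StreamingPredictor:
    """Consumes one sample at a time (the front of the signal) and keeps
    temporal context in internal state rather than in an input lag window."""

    def __init__(self, decay=0.5):
        self.state = 0.0
        self.decay = decay

    def step(self, x):
        # fold the newest sample into the running state
        self.state = self.decay * self.state + (1.0 - self.decay) * x
        return self.state  # prediction derived from internal state
```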

5.2 Future improvements and directions

One issue with the GEFCom dataset is that it does not contain any specific information about the location of these 7 wind farms. It is worth considering taking a look at how to handle different models that are specialized for different types of terrain, as this has been shown to increase performance [Kariniotakis et al., 2006]. Another thing to investigate could be a more advanced scheme for hyperparameter selection. Jursa and Rohrig [2008] show that the WPF error can be reduced by smarter use of optimization algorithms for feature selection and hyperparameter selection, which should improve performance.

2 http://numenta.org/lists
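A smarter selection scheme need not be complex; even random search (Bergstra and Bengio [2012], cited above) is a strong baseline. A minimal sketch, where the search space and error function are hypothetical placeholders:

```python
import random

def random_search(evaluate, space, n_trials=20, seed=0):
    """Sample each hyper-parameter independently and keep the best trial.
    `space` maps parameter names to lists of candidate values;
    `evaluate` returns a validation error to minimize."""
    rng = random.Random(seed)
    best_cfg, best_err = None, float("inf")
    for _ in range(n_trials):
        cfg = {name: rng.choice(values) for name, values in space.items()}
        err = evaluate(cfg)
        if err < best_err:
            best_cfg, best_err = cfg, err
    return best_cfg, best_err
```

For example, `space` could hold candidate hidden-layer sizes and learning rates, with `evaluate` training a small network and returning its validation NRMSE.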

In general, having more types of inputs from sources like SCADA and NWP systems could give a lot of valuable information that can be used for better predictions. It is clear that wind speed is a key feature for producing good forecasts, which is also supported by the input analysis in section 4.2. It has been argued that the error in WPF models stems largely from wrong weather forecasts [Giebel et al., 2011], so identifying good sources of weather forecasts would be desirable for more effective model learning. Combining several different weather forecasts could also be investigated, as Nielsen et al. [2007] showed that power forecasts based on a number of different meteorological forecasts were better than those from a single source.
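Such a combination can be as simple as a per-source weighted average; a minimal sketch follows, where inverse-error weighting is an illustrative choice rather than the scheme of Nielsen et al. [2007]:

```python
def inverse_error_weights(past_errors):
    """Weight each forecast source by its inverse historical error."""
    inv = [1.0 / e for e in past_errors]
    total = sum(inv)
    return [x / total for x in inv]

def combine(forecasts, weights):
    """Weighted average of several forecast series of equal length."""
    horizon = len(forecasts[0])
    return [sum(w * f[k] for f, w in zip(forecasts, weights))
            for k in range(horizon)]
```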

Regarding HTM CLA, it is worth considering implementing a custom encoder, one specifically targeted at data concerning wind farms. SCADA data could also be streamed directly into a NuPIC model, which would allow for some interesting online predictions and anomaly detection.

Training times are always an issue, especially for very large models. Both NuPIC and Expektra's model would benefit from being implemented in a more parallel fashion, as these algorithms are very well suited for parallel computing. The speed-up would be of great value, so looking into and implementing a GPU version of the algorithms could be worthwhile.

5.3 Conclusions

The field of energy forecasting is still young and the necessity for good forecasting techniques is constantly growing. This initial study shows that part of Expektra's forecasting method can be used to forecast wind power generation. We observe that this ANN is able to achieve performance comparable to other methods published in Hong et al. [2014]. Additional work is still needed to obtain state-of-the-art results. Numenta's model is able to beat the reference model but still needs additional work before we can draw definite conclusions about its performance. Constructing a custom encoder could be a first step in that direction.


Bibliography

Subutai Ahmad and Jeff Hawkins. Properties of sparse distributed representations and their application to hierarchical temporal memory. arXiv preprint arXiv:1503.07469, 2015.

TC Akinci. Short term wind speed forecasting with ANN in Batman, Turkey. Elektronika ir Elektrotechnika, 107(1):41–45, 2015.

E Michael Azoff. Neural network time series forecasting of financial markets. John Wiley & Sons Inc, 1994.

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13(1):281–305, 2012.

Daniel P Buxhoeveden and Manuel F Casanova. The minicolumn hypothesis in neuroscience. Brain, 125(5):935–951, 2002.

Erasmo Cadenas and Wilfrido Rivera. Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks. Renewable Energy, 34(1):274–278, 2009.

Dmitri B Chklovskii, BW Mel, and K Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782–788, 2004.

A Costa, A Crespo, J Navarro, G Lizcano, H Madsen, and E Feitosa. A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews, 12(6):1725–1744, August 2008. ISSN 1364-0321. doi: 10.1016/j.rser.2007.01.015.

Russ C Eberhart and James Kennedy. A new optimizer using particle swarm theory. In Proceedings of the sixth international symposium on micro machine and human science, volume 1, pages 39–43, New York, NY, 1995.


Shu Fan, James R Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. Energy Conversion, IEEE Transactions on, 24(2):474–482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento - a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276. Citeseer, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition EWEC 2008, 6 pages. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving hierarchical temporal memory-based trading models. Springer, 2013.

MA Gaertner, C Gallardo, C Tejeda, N Martínez, S Calabria, N Martínez, and B Fernández. The Casandra project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. In Proc. of the 2001 European Wind Energy Conference, EWEC'01, Copenhagen, Denmark, pages 777–780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: A literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G Lobo. Sipreólico - wind power prediction tool for the Spanish peninsular power system. Proceedings of the CIGRÉ 40th general session and exhibition, París, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357–363, 2014.


Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41–46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585–592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694–709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference, EWEC'07, 2007.

G Kariniotakis. Position paper on Joule project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J Halliday, R Brownsword, Ignacio Marti, Ana Maria Palomares, I Cruz, H Madsen, TS Nielsen, Henrik Aa Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power - overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, 10 pages, 2006.

GN Kariniotakis, GS Stavrakakis, and EF Nogaret. Wind power forecasting using advanced neural networks models. Energy Conversion, IEEE Transactions on, 11(4):762–767, 1996.

Stanley J Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326–334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275–293, 2009.

Lars Landberg and Simon J Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171–195, 1994.

S Lang, C Mohrlen, J Jorgensen, B Ó Gallachóir, and E McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International Solar Energy Society UK Section Conference C, volume 85, page 89, 2006.


SM Lawan, WAWZ Abidin, WY Chai, A Baharun, and T Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760–1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. Smart Grid, IEEE Transactions on, 5(1):501–510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets, 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164–168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313–2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154–3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa Nielsen, and Torben S Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475–489, 2005.

Donald W Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431–441, 1963.

Warren S McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons. 1969.

Marvin Lee Minsky and Oliver G Selfridge. Learning in random nets. MIT Lincoln Laboratory, 1960.

Jorge J Moré. The Levenberg-Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105–116. Springer, 1978.

Henrik Aa Nielsen, Torben S Nielsen, Henrik Madsen, Maria J Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471–482, 2007.


Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29–34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power, 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159–167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms, version 0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33–57, 2007.

A Rodrigues, JA Peças Lopes, P Miranda, L Palma, C Monteiro, R Bessa, J Sousa, C Rodrigues, and J Matos. EPREV - a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference, EWEC, volume 7, 2007.

David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5:3, 1988.

Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85–117, 2015.

S Sinkevicius, R Simutis, and V Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91–96, 2011.

Ke-Sheng Wang, Vishal S Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61–69, 2014.

WWEA. 2014 half-year report, pages 1–8. Technical report, WWEA, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365–376, 2013.

Hao Yu and Bogdan M Wilamowski. Levenberg-Marquardt training. Industrial Electronics Handbook, 5:12-1, 2011.


Appendix A

Hyper-parameters

Parameters for the spatial pooler

Alias                Default  Description
columnCount          -        The number of cell columns in a cortical region.
globalInhibition     false    If true, during the inhibition phase the winning columns are selected as the most active columns from the region as a whole.
numActivePerInhArea  10       The maximum number of active columns per inhibition area.
synPermActiveInc     0.1      The amount by which an active synapse is incremented in each round, specified as a percent of a fully grown synapse.
synPermConnected     0.10     Controls the threshold at which synapses are connected.
synPermInactiveDec   0.01     The amount by which an inactive synapse is decremented in each round.
potentialRadius      16       This parameter determines the extent of the input that each column can potentially be connected to.

Table A1: Configuration parameters for the spatial pooler.
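The learning rule behind synPermActiveInc and synPermInactiveDec can be sketched as follows (a simplified illustration of the permanence update, not NuPIC's implementation):

```python
def update_permanences(perms, active_inputs, inc=0.1, dec=0.01):
    """Synapses aligned with active input bits are reinforced, the rest
    decay slightly; permanence values stay clipped to [0, 1]."""
    return [min(1.0, p + inc) if active else max(0.0, p - dec)
            for p, active in zip(perms, active_inputs)]
```

Once a permanence crosses synPermConnected, the synapse counts as connected; the asymmetric increment and decrement make connections easy to form but slow to forget.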


Parameters for the scalar encoder

Alias       Symbol  Description
w           w       Number of bits to set in the output.
minval      vmin    The lower bound of the input value.
maxval      vmax    The upper bound of the input value.
n           n       Number of bits in the representation (n must be > w).
radius      r       Inputs separated by more than or equal to this distance will have non-overlapping representations.
resolution  ψ       Inputs separated by more than or equal to this distance will have different representations.

Table A2: Configuration parameters for the scalar encoder.
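A simplified scalar encoder in the spirit of Table A2 can be sketched as follows (an illustrative sketch, not NuPIC's ScalarEncoder):

```python
def encode_scalar(value, minval, maxval, n, w):
    """Represent a value in [minval, maxval] as n bits with a contiguous
    block of w active bits whose position tracks the value (n must be > w)."""
    assert n > w
    value = max(minval, min(maxval, value))   # clip into range
    positions = n - w + 1                     # possible block positions
    i = int(round((value - minval) / (maxval - minval) * (positions - 1)))
    bits = [0] * n
    bits[i:i + w] = [1] * w
    return bits
```

Nearby values share active bits, which is what gives the encoding its overlap-based notion of similarity.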


Parameters for the temporal memory

Alias                  Default  Description
activationThreshold    12       Activation threshold for segments.
cellsPerColumn         32       Number of cells per column.
columnCount            2048     The number of cell columns in a cortical region.
globalDecay            0.10     Decrements all synapses a little bit all the time.
initialPerm            0.11     Initial permanence value for a synapse.
inputWidth             -        Size of the input.
maxAge                 100000   Controls global decay. Global decay will only decay segments that have not been activated for maxAge iterations, and will only do the global decay loop every maxAge iterations.
maxSegmentsPerCell     -        The maximum number of segments a cell can have.
maxSynapsesPerSegment  -        The maximum number of synapses a segment can have.
minThreshold           8        The minimum required activity for a segment to learn.
newSynapseCount        15       The maximum number of synapses added to a segment during learning.
permanenceDec          0.10     How much permanence is removed from synapses when learning occurs.
permanenceInc          0.10     How much permanence is added to synapses when learning occurs.
temporalImp            cpp/py   Controls which temporal memory implementation to use.

Table A3: Configuration parameters for the temporal memory.


Appendix B

Wind characteristics

Figure B1: Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components. The intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Figure B2: Wind characteristics for WF 3-7, GEFCom dataset, showing zonal and meridional wind components. The intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Appendix C

Error Distribution


[Histogram panels omitted: error distribution of wf1 using nupic for lead times 48, 40, 30, 20, 10, and 1; x-axis: Error, y-axis: Frequency.]

Figure C1 Error distribution for different lead times WF 1


[Histogram panels omitted: error distribution of wf2 using nupic for lead times 48, 40, 30, 20, 10, and 1; x-axis: Error, y-axis: Frequency.]

Figure C2 Error distribution for different lead times WF 2


[Histogram panels omitted: error distribution of wf3 using nupic for lead times 48, 40, 30, 20, 10, and 1; x-axis: Error, y-axis: Frequency.]

Figure C3 Error distribution for different lead times WF 3


[Histogram panels omitted: error distribution of wf4 using nupic for lead times 48, 40, 30, 20, 10, and 1; x-axis: Error, y-axis: Frequency.]

Figure C4 Error distribution for different lead times WF 4


[Histogram panels omitted: error distribution of wf5 using nupic for lead times 48, 40, 30, 20, 10, and 1; x-axis: Error, y-axis: Frequency.]

Figure C5 Error distribution for different lead times WF 5


[Histogram panels omitted: error distribution of wf6 using nupic for lead times 48, 40, 30, 20, 10, and 1; x-axis: Error, y-axis: Frequency.]

Figure C6 Error distribution for different lead times WF 6


[Histogram panels omitted: error distribution of wf7 using nupic for lead times 48, 40, 30, 20, 10, and 1; x-axis: Error, y-axis: Frequency.]

Figure C7 Error distribution for different lead times WF 7


[Histogram panels omitted: error distribution of wf1 using expektra for lead times 48, 40, 30, 20, 10, and 1; x-axis: Error, y-axis: Frequency.]

Figure C8 Error distribution for different lead times WF 1


[Histogram panels omitted: error distribution of wf2 using expektra for lead times 48, 40, 30, 20, 10, and 1; x-axis: Error, y-axis: Frequency.]

Figure C9 Error distribution for different lead times WF 2


[Histogram panels omitted: error distribution of wf3 using expektra for lead times 48, 40, 30, 20, 10, and 1; x-axis: Error, y-axis: Frequency.]

Figure C10 Error distribution for different lead times WF 3


[Histogram panels omitted: error distribution of wf4 using expektra for lead times 48, 40, 30, 20, 10, and 1; x-axis: Error, y-axis: Frequency.]

Figure C11 Error distribution for different lead times WF 4


[Histogram panels omitted: error distribution of wf5 using expektra for lead times 48, 40, 30, 20, 10, and 1; x-axis: Error, y-axis: Frequency.]

Figure C12 Error distribution for different lead times WF 5


[Histogram panels omitted: error distribution of wf6 using expektra for lead times 48, 40, 30, 20, 10, and 1; x-axis: Error, y-axis: Frequency.]

Figure C13 Error distribution for different lead times WF 6


[Histogram panels omitted: error distribution of wf7 using expektra for lead times 48, 40, 30, 20, 10, and 1; x-axis: Error, y-axis: Frequency.]

Figure C14 Error distribution for different lead times WF 7


List of Figures

21 A figure that presents the general outline when forecasting using thestatistical approach 6

22 A figure that presents the general steps when forecasting using a physicalmodel 7

31 The perceptron 2032 Architectural graph of the neural network that will produce a single

output value It consists of a collection of hidden neurons in each Hhidden layers as well as M input connections Each edge seen in thisgraph have a wij associated with it 21

33 Information flow of a single region predictive model created with the OPF 2334 The CLAClassifier 2835 Training a OPF model 29

41 Left Diagram Cumulated probability of wind-speed Right diagramScatter diagram of the power curve production vs wind speed Seenfor Wind Farm 1 GEFCom dataset (hourly data from January 1 2010 toJanuary 1 2012) 31

42 Left Diagram Wind-Speed vs Production Right diagram Wind-Speedvs Direction Seen for Wind Farm 1 GEFCom dataset (hourly data fromJanuary 1 2010 to January 1 2012) 32

43 Left Diagram Wind-Speed vs Production Right diagram Wind-Speedvs Direction Seen for Wind Farm 2 GEFCom dataset (hourly data fromJanuary 1 2010 to January 1 2012) 32

44 Different error measurement for WF 1 3345 Different error measurement for WF 2 3446 Different error measurement for WF 3 3547 Different error measurement for WF 4 3648 Different error measurement for WF 5 37

76

49 Different error measurement for WF 6 38

410 Different error measurement for WF 7 39

411 Relative Input parameter importance using HIPR ldquoall-channelsrdquo reflectsthe reference point The reference point is the network when no channelhave been exposed to noise ws = wind speed hours week = times-tamps u v is a directional vector of the wind 41

412 Plot showing unoptimized version vs the optimized when training anetwork with 10 input neurons 10152025 hidden neurons 1 outputneuron Training was done for 100 epochs using LM 42

413 Summarized average improvement over all wind-farms with 95 confi-dence intervals 43

B1 Wind characteristics for WF 1 and WF The GEFCom dataset showingzonal and meridional wind components the intensity (alpha) of eachdot reflects how much normalized power was generated the strongerthe more power 59

B2 Wind characteristics for WF 3-7 and WF The GEFCom dataset showingzonal and meridional wind components the intensity (alpha) of eachdot reflects how much normalized power was generated the strongerthe more power 60

C.1 Error distribution for different lead times, WF 1 . . . 62
C.2 Error distribution for different lead times, WF 2 . . . 63
C.3 Error distribution for different lead times, WF 3 . . . 64
C.4 Error distribution for different lead times, WF 4 . . . 65
C.5 Error distribution for different lead times, WF 5 . . . 66
C.6 Error distribution for different lead times, WF 6 . . . 67
C.7 Error distribution for different lead times, WF 7 . . . 68
C.8 Error distribution for different lead times, WF 1 . . . 69
C.9 Error distribution for different lead times, WF 2 . . . 70
C.10 Error distribution for different lead times, WF 3 . . . 71
C.11 Error distribution for different lead times, WF 4 . . . 72
C.12 Error distribution for different lead times, WF 5 . . . 73
C.13 Error distribution for different lead times, WF 6 . . . 74

C.14 Error distribution for different lead times, WF 7 . . . 75

List of Tables

3.1 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, missing data with power observations are available for updating the models . . . 16

3.2 Preprocessed features from the dataset. The Forecast category represents the forecasts given by the NWP; multiple forecasts exist for a given date. The latest issued forecast available is the feature we will use in training and testing . . . 17

3.3 Example, where n = 14, r = 5, ψ = 1, of various encoded scalar values using a ScalarEncoder . . . 24

4.1 NRMSE scores of the entries published in Hong et al. [2014]. The NuPIC model and Expektra model are added so we can easily compare the results . . . 40

A.1 Configuration parameters for the spatial pooler . . . 55
A.2 Configuration parameters for the scalar encoder . . . 56
A.3 Configuration parameters for the temporal memory . . . 57


• Contents
• Introduction
  • Problem Formulation
  • The scope of the problem
• Background
  • Neural Networks and Time Series Prediction
• Method and Materials
  • Preliminaries
    • Remarks
    • Definitions
    • Reference models
    • Error metrics
    • Model selection
    • Evaluation
  • Experiments
  • Holdback Input Randomization
  • Optimization methods
  • Neural Networks
    • Multilayer Perceptron
    • Numenta Platform for Intelligent Computing
• Result
  • Experimental results
  • Input Importance
    • Adaptation and Optimization
  • Summary
• Discussion
  • Method development issues
  • Future improvements and directions
  • Conclusions
• Bibliography
• Appendices
  • Hyper-parameters
  • Wind characteristics
  • Error Distribution
• List of Figures
• List of Tables

CHAPTER 4 RESULT

4.2.1 Adaptation and Optimization

All training for Expektra's model was performed on a MacBook Pro with an Intel Core 2 Duo processor (P8600, 2.40 GHz) on a 64-bit operating system with 4 GB of installed RAM. After adapting the code to run experiments on the GEFCom dataset, an investigation was performed to evaluate the performance of the core source code provided by Expektra. This investigation identified some slow parts of the code. After addressing these issues, mainly by using a MathNET native library instead of the managed provider, a speed test was performed comparing the unoptimized version with the optimized one. Figure 4.12 shows the speed-up achieved during training using the LM algorithm.
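As a hedged illustration of the update rule behind LM training, the sketch below runs Levenberg-Marquardt on a small curve-fitting problem rather than the thesis MLP; the model, data, and damping value mu are illustrative assumptions, not the Expektra implementation.

```python
import numpy as np

def lm_step(w, x, y, mu):
    """One Levenberg-Marquardt update for the toy model y_hat = w0 * exp(w1 * x):
    solve (J^T J + mu*I) dw = -J^T e, where e is the residual vector."""
    y_hat = w[0] * np.exp(w[1] * x)
    e = y - y_hat                                  # residuals
    # Jacobian of the residuals with respect to (w0, w1)
    J = np.column_stack([-np.exp(w[1] * x),
                         -w[0] * x * np.exp(w[1] * x)])
    dw = np.linalg.solve(J.T @ J + mu * np.eye(len(w)), -J.T @ e)
    return w + dw

x = np.linspace(0.0, 1.0, 50)
y = 2.0 * np.exp(1.5 * x)          # noiseless synthetic target
w = np.array([1.0, 1.0])           # initial guess
for _ in range(200):
    w = lm_step(w, x, y, mu=1e-3)
print(np.round(w, 3))              # approaches the true parameters [2.0, 1.5]
```

The damping term mu interpolates between Gauss-Newton (small mu) and gradient descent (large mu); production implementations adapt mu per iteration.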

[Figure 4.12 omitted: bar chart of training time for 10, 15, 20, and 25 hidden neurons, comparing the normal (unoptimized) and optimized versions.]

Figure 4.12. Plot showing the unoptimized version vs. the optimized one when training a network with 10 input neurons; 10, 15, 20, or 25 hidden neurons; and 1 output neuron. Training was done for 100 epochs using LM.

4.3 Summary

A plot showing the average NRMSE improvement over the persistence model can be seen in Figure 4.13. We see that both models are able to perform better than persistence. Expektra's model performs better than the NuPIC model over the whole forecast horizon, and NuPIC performs better than persistence towards the end of the forecast.
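The improvement metric summarized here can be sketched as follows; the data below is synthetic rather than GEFCom data, and the normalization assumes power is already scaled to [0, 1].

```python
import numpy as np

def nrmse(y_true, y_pred):
    """Root-mean-square error; normalized capacity assumed, power in [0, 1]."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

rng = np.random.default_rng(0)
y = np.clip(0.5 + 0.2 * rng.standard_normal(100), 0, 1)   # observed power
model = y + 0.05 * rng.standard_normal(100)               # a model forecast
persistence = np.roll(y, 1)                               # last observed value

# Improvement(%) = 100 * (NRMSE_persistence - NRMSE_model) / NRMSE_persistence
imp = 100 * (nrmse(y, persistence) - nrmse(y, model)) / nrmse(y, persistence)
print(round(imp, 1))   # positive when the model beats persistence
```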

[Figure 4.13 omitted: line plot of improvement (%) in NRMSE over persistence vs. look-ahead time (0-50 hours) for Expektra and NuPIC.]

Figure 4.13. Summarized average improvement over all wind farms, with 95% confidence intervals.


Chapter 5

Discussion

With the exponential growth of computer power it becomes easier to study deeper and more complex networks, and with recent advances in the area of deep learning, a new interest in ANNs has resurged. In practice, ANNs have been used by most groups in the field of WPF, but these networks never caught on, as it was argued that the improvements made by ANNs were not usually enough to justify the extra effort of training them [Giebel et al., 2011]. This is steadily becoming less of an issue as computation becomes cheaper and bigger networks perform better.

By using a simple MLP network we observe that we are able to obtain results similar in performance to other models published in Hong et al. [2014]. This can be used as a starting point for further investigations. The GEFCom dataset has a limited set of input features; if we had access to a wider range of them, the power of neural networks could be studied more in depth. Given the small number of features in the dataset, this is harder to do.

Using persistence as a reference model was done in order to have the same baseline as the one used in GEFCom, but a better reference model should be considered in the future, such as the one presented in Nielsen et al. [1998], especially if longer forecasts are to be considered.
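As a hedged illustration, the sketch below implements a reference of the kind proposed in Nielsen et al. [1998], blending persistence with the mean production, y_hat(t+k) = a_k * y(t) + (1 - a_k) * y_mean, where a_k is the estimated lag-k correlation; the synthetic series and parameter choices are illustrative, not the thesis setup.

```python
import numpy as np

def new_reference_forecast(y_train, y_now, k):
    """Blend persistence and mean production using the lag-k correlation."""
    y_mean = y_train.mean()
    y0 = y_train[:-k] - y_mean
    yk = y_train[k:] - y_mean
    a_k = np.sum(y0 * yk) / np.sum(y0 ** 2)   # estimated lag-k correlation
    return a_k * y_now + (1 - a_k) * y_mean

rng = np.random.default_rng(1)
y = np.empty(500)                 # AR(1)-like synthetic power series
y[0] = 0.5
for t in range(1, 500):
    y[t] = 0.5 + 0.9 * (y[t - 1] - 0.5) + 0.05 * rng.standard_normal()

f = new_reference_forecast(y, y_now=y[-1], k=6)
print(round(f, 3))
```

For short horizons the forecast stays close to the last observation (as persistence does); for long horizons it relaxes towards the mean, which is why this reference is harder to beat than pure persistence at large lead times.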

5.1 Method development issues

Working with Expektra's model was very straightforward and no major issues were encountered during development, but some issues were encountered working with NuPIC.1

1. It should also be pointed out that it helps to have the people who developed the code on the spot, and this is probably the main reason for the little trouble.


1. Installing NuPIC is difficult. This seems to be a general problem, judging by posts on the NuPIC mailing list.2 A very important issue to fix if they want more people to work with this model.

2. Using the built-in swarming (PSO) did not work well, especially for slightly larger datasets and for 48 prediction steps. The main problems were fitting everything into memory and that the code ran too slowly when swarming over multiple steps ahead, making it practically impossible to use. Scaling the problem down to just a few prediction steps and multiple models helped, but the swarm model kept dismissing wind speed as an important feature, which was fixed by specifically telling it not to touch that encoder.

3. Inconsistencies between the white-paper documentation of HTM CLA and the actual implementation make it hard to understand the underlying principles of NuPIC. NuPIC implements a very complex network, and any inconsistency in documentation makes it hard to understand. The code is quite well documented, but the additional material is a bit sparse.

These issues are all understandable and are continuously being improved upon. NuPIC is still a young platform, so issues are to be expected given the complexity and research nature of the project. Training the CLAClassifier to use many step predictions instead of just one will most likely reduce the bias error seen in the results. To investigate this, a more powerful computer is needed.

There are some differences in the inputs between the models. This setup was created because an OPF model needs to encode the value it is going to predict. SCADA data is available, so this is not an issue, but it may introduce some unfairness between NuPIC and Expektra. In the current setup, NuPIC does not seem to gain anything extra from the additional information, and other models in the competition will most likely use different inputs as well.

In general, NuPIC differs in the way you feed data: you send in only the front of the signal, and temporal context is achieved through the temporal memory.

5.2 Future improvements and directions

One issue with the GEFCom dataset is that it does not contain any specific information about the locations of these 7 wind farms. It is worth considering how to handle different models that are specialized for different types

2. http://numenta.org/lists


of terrain, as this has been shown to increase performance [Kariniotakis et al., 2006]. Another thing to investigate could be a more advanced scheme for hyperparameter selection: Jursa and Rohrig [2008] show that the WPF error can be reduced by smarter use of optimization algorithms for feature selection and hyperparameter selection, which should improve performance.

In general, having more types of inputs from sources like SCADA and NWP systems could provide a lot of valuable information for better predictions. It is clear that wind speed is a key feature for producing good forecasts, which is also supported by the input analysis in Section 4.2. It has been argued that the error in WPF models stems largely from wrong weather forecasts [Giebel et al., 2011], so identifying good sources of weather forecasts would be desirable for more effective model learning. Combining several different weather forecasts could also be investigated, as Nielsen et al. [2007] showed that power forecasts based on a number of different meteorological forecasts were better than those from a single source.
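One simple way to combine several forecasts, sketched below under the assumption that past observations are available for fitting, is to learn least-squares combination weights; the three "forecasts" here are synthetic stand-ins for real NWP sources, not the combination method of Nielsen et al. [2007].

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.uniform(0, 1, 200)                       # observed power
# three imperfect forecasts with different error levels
F = np.column_stack([y + s * rng.standard_normal(200) for s in (0.05, 0.10, 0.15)])

w, *_ = np.linalg.lstsq(F, y, rcond=None)        # combination weights
combined = F @ w

rmse = lambda e: np.sqrt(np.mean(e ** 2))
# in-sample, the least-squares combination is at least as good as any single source
print(rmse(y - combined) <= min(rmse(y - F[:, i]) for i in range(3)))
```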

Regarding HTM CLA, it is worth considering implementing a custom encoder, one specifically targeted at data concerning wind farms. SCADA data could also be streamed directly into a NuPIC model, which would allow for some interesting online predictions and anomaly detection.

Training times are always an issue, especially for very large models. Both NuPIC and Expektra's model would benefit from being implemented in a more parallel fashion, as these algorithms are very well suited for parallel computing. The speed-up would be of great value, so implementing a GPU version of the algorithms could be worthwhile.

5.3 Conclusions

The field of energy forecasting is still young, and the necessity for good forecasting techniques is constantly growing. This initial study shows that part of Expektra's forecasting method can be used to forecast wind power generation. We observe that this ANN is able to achieve performance comparable to other methods published in Hong et al. [2014]. Additional work is still needed to obtain state-of-the-art results. Numenta's model is able to beat the reference model but still needs additional work before we can draw definite conclusions about its performance. Constructing a custom encoder could be a first step in that direction.


Bibliography

Subutai Ahmad and Jeff Hawkins. Properties of sparse distributed representations and their application to hierarchical temporal memory. arXiv preprint arXiv:1503.07469, 2015.

TC Akinci. Short term wind speed forecasting with ANN in Batman, Turkey. Elektronika ir Elektrotechnika, 107(1):41–45, 2015.

E Michael Azoff. Neural network time series forecasting of financial markets. John Wiley & Sons, Inc., 1994.

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13(1):281–305, 2012.

Daniel P Buxhoeveden and Manuel F Casanova. The minicolumn hypothesis in neuroscience. Brain, 125(5):935–951, 2002.

Erasmo Cadenas and Wilfrido Rivera. Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks. Renewable Energy, 34(1):274–278, 2009.

Dmitri B Chklovskii, BW Mel, and K Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782–788, 2004.

A Costa, A Crespo, J Navarro, G Lizcano, H Madsen, and E Feitosa. A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews, 12(6):1725–1744, August 2008. ISSN 1364-0321. doi: 10.1016/j.rser.2007.01.015. URL http://dx.doi.org/10.1016/j.rser.2007.01.015.

Russ C Eberhart and James Kennedy. A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science, volume 1, pages 39–43, New York, NY, 1995.

Shu Fan, James R Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. Energy Conversion, IEEE Transactions on, 24(2):474–482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento: a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276. Citeseer, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition EWEC 2008. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving hierarchical temporal memory-based trading models. Springer, 2013.

MA Gaertner, C Gallardo, C Tejeda, N Martínez, S Calabria, N Martínez, and B Fernández. The Casandra project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. In Proc. of the 2001 European Wind Energy Conference, EWEC'01, Copenhagen, Denmark, pages 777–780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: A literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G Lobo. Sipreólico: wind power prediction tool for the Spanish peninsular power system. Proceedings of the CIGRÉ 40th General Session and Exhibition, París, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357–363, 2014.

Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41–46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585–592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694–709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference, EWEC'07, 2007.

G Kariniotakis. Position paper on Joule project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J Halliday, R Brownsword, Ignacio Marti, Ana Maria Palomares, I Cruz, H Madsen, TS Nielsen, Henrik Aa Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power: overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, 2006.

GN Kariniotakis, GS Stavrakakis, and EF Nogaret. Wind power forecasting using advanced neural networks models. Energy Conversion, IEEE Transactions on, 11(4):762–767, 1996.

Stanley J Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326–334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275–293, 2009.

Lars Landberg and Simon J Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171–195, 1994.

S Lang, C Mohrlen, J Jorgensen, B O Gallachóir, and E McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International Solar Energy Society UK Section Conference, volume 85, page 89, 2006.

SM Lawan, WAWZ Abidin, WY Chai, A Baharun, and T Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760–1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. Smart Grid, IEEE Transactions on, 5(1):501–510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets, 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164–168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313–2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154–3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa Nielsen, and Torben S Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475–489, 2005.

Donald W Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431–441, 1963.

Warren S McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons, 1969.

Marvin Lee Minsky and Oliver G Selfridge. Learning in random nets. MIT Lincoln Laboratory, 1960.

Jorge J Moré. The Levenberg-Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105–116. Springer, 1978.

Henrik Aa Nielsen, Torben S Nielsen, Henrik Madsen, Maria J Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471–482, 2007.

Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29–34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power, 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159–167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms, v0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33–57, 2007.

A Rodrigues, JA Peças Lopes, P Miranda, L Palma, C Monteiro, R Bessa, J Sousa, C Rodrigues, and J Matos. EPrev: a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference, EWEC, volume 7, 2007.

David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5(3), 1988.

Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85–117, 2015.

S Sinkevicius, R Simutis, and V Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91–96, 2011.

Ke-Sheng Wang, Vishal S Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61–69, 2014.

WWEA. 2014 half-year report, pp. 1–8. Technical report, WWEA, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365–376, 2013.

Hao Yu and Bogdan M Wilamowski. Levenberg-Marquardt training. Industrial Electronics Handbook, 5:12-1, 2011.

Appendix A

Hyper-parameters

Parameters for the spatial pooler

Alias                  Default   Description
columnCount            -         The number of cell columns in a cortical region.
globalInhibition       false     If true, in the inhibition phase the winning columns are selected as the most active columns from the region as a whole.
numActivePerInhArea    10        The maximum number of active columns per inhibition area.
synPermActiveInc       0.1       The amount by which an active synapse is incremented in each round, specified as a percent of a fully grown synapse.
synPermConnected       0.10      Controls the threshold at which synapses are connected.
synPermInactiveDec     0.01      The amount by which an inactive synapse is decremented in each round.
potentialRadius        16        Determines the extent of the input that each column can potentially be connected to.

Table A.1. Configuration parameters for the spatial pooler.


Parameters for the scalar encoder

Alias        Symbol   Description
w            w        Number of bits to set in the output.
minval       v_min    The lower bound of the input value.
maxval       v_max    The upper bound of the input value.
n            n        Number of bits in the representation (n must be > w).
radius       r        Inputs separated by more than or equal to this distance will have non-overlapping representations.
resolution   ψ        Inputs separated by more than or equal to this distance will have different representations.

Table A.2. Configuration parameters for the scalar encoder.
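For intuition, the sketch below shows what such a scalar encoder does in its simplest form: a value in [minval, maxval] becomes an n-bit vector with w contiguous active bits whose position encodes the value. This is a simplified stand-in for NuPIC's ScalarEncoder (it ignores the radius and resolution parameters), and the parameter values are illustrative.

```python
import numpy as np

def encode_scalar(value, minval=0.0, maxval=10.0, n=14, w=5):
    """Encode a scalar as an n-bit vector with w contiguous active bits."""
    value = min(max(value, minval), maxval)      # clip to the encoder's range
    n_buckets = n - w + 1                        # possible start positions
    bucket = int(round((value - minval) / (maxval - minval) * (n_buckets - 1)))
    bits = np.zeros(n, dtype=int)
    bits[bucket:bucket + w] = 1
    return bits

print(encode_scalar(0.0))    # active bits at the left edge of the vector
print(encode_scalar(10.0))   # active bits at the right edge of the vector
```

Nearby values share active bits, which is what gives the spatial pooler overlapping representations for similar inputs.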


Parameters for the temporal memory

Alias                    Default   Description
activationThreshold      12        Activation threshold for segments.
cellsPerColumn           32        Number of cells per column.
columnCount              2048      The number of cell columns in a cortical region.
globalDecay              0.10      Decrements all synapses a little bit all the time.
initialPerm              0.11      Initial permanence value for a synapse.
inputWidth               -         Size of the input.
maxAge                   100000    Controls global decay. Global decay will only decay segments that have not been activated for maxAge iterations, and will only run the global decay loop every maxAge iterations.
maxSegmentsPerCell       -         The maximum number of segments a cell can have.
maxSynapsesPerSegment    -         The maximum number of synapses a segment can have.
minThreshold             8         The minimum required activity for a segment to learn.
newSynapseCount          15        The maximum number of synapses added to a segment during learning.
permanenceDec            0.10      How much permanence is removed from synapses when learning occurs.
permanenceInc            0.10      How much permanence is added to synapses when learning occurs.
temporalImp              cpp/py    Controls which temporal memory implementation to use.

Table A.3. Configuration parameters for the temporal memory.


Appendix B

Wind characteristics

Figure B.1. Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Figure B.2. Wind characteristics for WF 3-7, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Appendix C

Error Distribution


[Figure C.1 omitted: six error histograms for WF 1 (NuPIC model), lead times 48, 40, 30, 20, 10, and 1 hour; x-axis: error (-1.0 to 1.0), y-axis: frequency (0 to 70).]

Figure C.1. Error distribution for different lead times, WF 1.

[Figure C.2 omitted: six error histograms for WF 2 (NuPIC model), lead times 48, 40, 30, 20, 10, and 1 hour; x-axis: error (-1.0 to 1.0), y-axis: frequency (0 to 70).]

Figure C.2. Error distribution for different lead times, WF 2.

[Figure C.3 omitted: six error histograms for WF 3 (NuPIC model), lead times 48, 40, 30, 20, 10, and 1 hour; x-axis: error (-1.0 to 1.0), y-axis: frequency (0 to 70).]

Figure C.3. Error distribution for different lead times, WF 3.

[Figure C.4 omitted: six error histograms for WF 4 (NuPIC model), lead times 48, 40, 30, 20, 10, and 1 hour; x-axis: error (-1.0 to 1.0), y-axis: frequency (0 to 70).]

Figure C.4. Error distribution for different lead times, WF 4.

[Figure C.5 omitted: six error histograms for WF 5 (NuPIC model), lead times 48, 40, 30, 20, 10, and 1 hour; x-axis: error (-1.0 to 1.0), y-axis: frequency (0 to 70).]

Figure C.5. Error distribution for different lead times, WF 5.

[Figure C.6 omitted: six error histograms for WF 6 (NuPIC model), lead times 48, 40, 30, 20, 10, and 1 hour; x-axis: error (-1.0 to 1.0), y-axis: frequency (0 to 70).]

Figure C.6. Error distribution for different lead times, WF 6.

[Figure C.7 omitted: six error histograms for WF 7 (NuPIC model), lead times 48, 40, 30, 20, 10, and 1 hour; x-axis: error (-1.0 to 1.0), y-axis: frequency (0 to 70).]

Figure C.7. Error distribution for different lead times, WF 7.

[Figure C.8 omitted: six error histograms for WF 1 (Expektra model), lead times 48, 40, 30, 20, 10, and 1 hour; x-axis: error (-1.0 to 1.0), y-axis: frequency (0 to 70).]

Figure C.8. Error distribution for different lead times, WF 1.

[Figure C.9 omitted: six error histograms for WF 2 (Expektra model), lead times 48, 40, 30, 20, 10, and 1 hour; x-axis: error (-1.0 to 1.0), y-axis: frequency (0 to 70).]

Figure C.9. Error distribution for different lead times, WF 2.

[Figure C.10 omitted: six error histograms for WF 3 (Expektra model), lead times 48, 40, 30, 20, 10, and 1 hour; x-axis: error (-1.0 to 1.0), y-axis: frequency (0 to 70).]

Figure C.10. Error distribution for different lead times, WF 3.

[Figure C.11 omitted: six error histograms for WF 4 (Expektra model), lead times 48, 40, 30, 20, 10, and 1 hour; x-axis: error (-1.0 to 1.0), y-axis: frequency (0 to 70).]

Figure C.11. Error distribution for different lead times, WF 4.

[Figure C.12 omitted: six error histograms for WF 5 (Expektra model), lead times 48, 40, 30, 20, 10, and 1 hour; x-axis: error (-1.0 to 1.0), y-axis: frequency (0 to 70).]

Figure C.12. Error distribution for different lead times, WF 5.

[Figure: histograms of forecast error (x-axis: error, −1.0 to 1.0; y-axis: frequency, 0–70), one panel per lead time (48, 40, 30, 20, 10, 1), panels titled "wf6 using expektra".]

Figure C13 Error distribution for different lead times WF 6

[Figure: histograms of forecast error (x-axis: error, −1.0 to 1.0; y-axis: frequency, 0–70), one panel per lead time (48, 40, 30, 20, 10, 1), panels titled "wf7 using expektra".]

Figure C14 Error distribution for different lead times WF 7

List of Figures

2.1 A figure that presents the general outline when forecasting using the statistical approach
2.2 A figure that presents the general steps when forecasting using a physical model
3.1 The perceptron
3.2 Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of H hidden layers, as well as M input connections. Each edge seen in this graph has a weight wij associated with it
3.3 Information flow of a single region predictive model created with the OPF
3.4 The CLAClassifier
3.5 Training an OPF model
4.1 Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.2 Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.3 Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.4 Different error measurements for WF 1
4.5 Different error measurements for WF 2
4.6 Different error measurements for WF 3
4.7 Different error measurements for WF 4
4.8 Different error measurements for WF 5
4.9 Different error measurements for WF 6
4.10 Different error measurements for WF 7
4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point. The reference point is the network when no channel has been exposed to noise. ws = wind speed, hours, week = timestamps, u, v is a directional vector of the wind
4.12 Plot showing the unoptimized version vs the optimized when training a network with 10 input neurons; 10, 15, 20, 25 hidden neurons; 1 output neuron. Training was done for 100 epochs using LM
4.13 Summarized average improvement over all wind farms with 95% confidence intervals
B1 Wind characteristics for WF 1 and WF 2, the GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power
B2 Wind characteristics for WF 3-7, the GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power
C1 Error distribution for different lead times, WF 1
C2 Error distribution for different lead times, WF 2
C3 Error distribution for different lead times, WF 3
C4 Error distribution for different lead times, WF 4
C5 Error distribution for different lead times, WF 5
C6 Error distribution for different lead times, WF 6
C7 Error distribution for different lead times, WF 7
C8 Error distribution for different lead times, WF 1
C9 Error distribution for different lead times, WF 2
C10 Error distribution for different lead times, WF 3
C11 Error distribution for different lead times, WF 4
C12 Error distribution for different lead times, WF 5
C13 Error distribution for different lead times, WF 6
C14 Error distribution for different lead times, WF 7

List of Tables

3.1 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods of missing data, power observations are available for updating the models

3.2 Preprocessed features from the dataset. The Forecast category represents the forecasts given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the feature we will use in training and testing

3.3 Example, where n = 14, r = 5, ψ = 1, of various encoded scalar values using a ScalarEncoder

4.1 NRMSE score of the entries published in Hong et al [2014]. The NuPIC model and Expektra model are added so we can easily compare the results

A1 Table containing configuration parameters for the spatial pooler
A2 Table containing configuration parameters for the scalar encoder
A3 Table containing configuration parameters for the temporal memory


  • Contents
  • Introduction
    • Problem Formulation
    • The scope of the problem
  • Background
    • Neural Networks and Time Series Prediction
  • Method and Materials
    • Preliminaries
      • Remarks
      • Definitions
      • Reference models
      • Error metrics
      • Model selection
      • Evaluation
    • Experiments
    • Holdback Input Randomization
    • Optimization methods
    • Neural Networks
      • Multilayer Perceptron
      • Numenta Platform for Intelligent Computing
  • Result
    • Experimental results
    • Input Importance
    • Adaptation and Optimization
    • Summary
  • Discussion
    • Method development issues
    • Future improvements and directions
    • Conclusions
  • Bibliography
  • Appendices
    • Hyper-parameters
    • Wind characteristics
    • Error Distribution
  • List of Figures
  • List of Tables

4.3 SUMMARY

horizon, and NuPIC is performing better than persistence towards the end of the forecast.

[Figure: improvement over persistence, in % of NRMSE, plotted against look-ahead time in hours (0–50), for Expektra and NuPIC.]

Figure 4.13 Summarized average improvement over all wind-farms with 95% confidence intervals
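For clarity, the improvement measure in this figure is the relative reduction in NRMSE with respect to persistence. The sketch below is my own illustration, not code from the thesis; it assumes power is already normalized to [0, 1] as in the GEFCom data, so no extra normalization constant is needed:

```python
import math

def nrmse(actual, predicted):
    """Root-mean-square error of normalized power values."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def improvement(ref_error, model_error):
    """Percentage improvement of a model over the reference (persistence)."""
    return 100.0 * (ref_error - model_error) / ref_error

# Toy data for one lead time (hypothetical values, for illustration only).
actual      = [0.2, 0.5, 0.4, 0.7]
persistence = [0.4, 0.2, 0.5, 0.4]   # forecast y(t+k) as the last observed value
model       = [0.3, 0.45, 0.45, 0.6]

imp = improvement(nrmse(actual, persistence), nrmse(actual, model))
```

Computing this per lead time k and averaging over the seven wind farms gives curves of the kind plotted above.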

Chapter 5

Discussion

With the exponential growth of computer power it becomes easier to study deeper and more complex networks, and with recent advances in the area of deep learning a new interest in ANNs has resurged. In practice, ANNs have been used by most groups in the field of WPF, but these networks never caught on, as it was argued that the improvements made by ANNs were usually not enough to justify the extra effort of training them [Giebel et al 2011]. This is steadily becoming less of an issue as computation becomes cheaper and bigger networks perform better.

By using a simple MLP network we observe that we are able to obtain results similar in performance to other models published in Hong et al [2014]. This can be used as a starting point for further investigations. The GEFCom dataset has a limited number of input features; if we had access to a wider range of them, the power of neural networks could be studied more in depth, but given the features actually available this is harder to do.

Using persistence as a reference model was done in order to have the same baseline as the one used in GEFCom, but a better reference model should be considered in the future, such as the one presented in Nielsen et al [1998], especially if longer forecasts are to be considered.
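As a sketch of what such a reference would look like (my own illustration, assuming the standard formulation), the new reference model of Nielsen et al [1998] blends persistence with the climatological mean, ŷ(t + k) = a_k y(t) + (1 − a_k) ȳ, where a_k is the lag-k correlation coefficient estimated from the training data:

```python
def lag_correlation(series, k):
    """Pearson correlation between y(t) and y(t+k), estimated from history."""
    x, y = series[:-k], series[k:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy)

def new_reference_forecast(history, k):
    """Nielsen et al. [1998]: a_k * y(t) + (1 - a_k) * mean. As the lag-k
    correlation decays with the horizon, the forecast falls back from pure
    persistence towards the climatological mean of the series."""
    a = lag_correlation(history, k)
    mean = sum(history) / len(history)
    return a * history[-1] + (1.0 - a) * mean
```

For short horizons (a_k close to 1) this behaves like persistence; for long horizons it converges to the mean, which is why it is a fairer baseline for 48-hour forecasts.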

5.1 Method development issues

Working with Expektra's model is very straightforward and no major issues were encountered during development, but some issues were encountered working with NuPIC.¹

¹It should also be pointed out that it helps to have the people who developed the code on the spot, and this is probably the main reason for the little trouble.


1. Installing NuPIC is difficult. This seems to be a general problem, judging by posts in the NuPIC mailing list². A very important issue to fix if they want more people to work with this model.

2. Using the built-in swarming (PSO) did not work well, especially for slightly larger datasets and for 48 prediction steps. The main problems were that everything had to fit into memory and that the code ran too slowly when swarming over multiple step-ahead predictions, making it practically impossible to use. Scaling down the problem to just a few prediction steps and multiple models helped, but the swarm model kept dismissing wind speed as an important feature, which was fixed by specifically telling it not to touch that encoder.

3. Inconsistencies between the white-paper documentation of HTM CLA and the actual implementation make it hard to understand the underlying principles of NuPIC. NuPIC presents a very complex network, and any inconsistencies in the documentation make it hard to understand. The code itself is quite well documented, but the additional material is a bit sparse.

These issues are all understandable and are continuously being improved upon. NuPIC is still a young platform, so issues are to be expected given the complexity and research nature of the project. Training the CLAClassifier to use many prediction steps instead of just one will most likely reduce the bias error seen in the results; to investigate this, a more powerful computer is needed.

There are some differences in the inputs between the models. This setup was created because an OPF model needs to encode the value it is going to predict. SCADA data is available, so this is not an issue, but it may introduce some unfairness between NuPIC and Expektra. In the current setup NuPIC does not seem to gain anything extra from the additional information, and other models in the competition will most likely use different inputs as well.

In general, NuPIC differs in the way you feed it data: you send in only the front of the signal, and temporal context is achieved through the temporal memory.

5.2 Future improvements and directions

One issue with the GEFCom dataset is that it does not contain any specific information about the location of these 7 wind farms. It is worth considering taking a look at how to handle different models that are specialized for different types

²http://numenta.org/lists


of terrain, as this has been shown to increase performance [Kariniotakis et al 2006]. Another thing to investigate could be a more advanced scheme for hyperparameter selection: Jursa and Rohrig [2008] show that the WPF error can be reduced by smarter use of optimization algorithms for feature selection and hyperparameter selection, which should improve the performance.
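One simple scheme in this spirit, shown here as a hedged sketch rather than what either cited paper implements, is random search over hyperparameters, as advocated by Bergstra and Bengio [2012]. The `evaluate` callable and the search space below are hypothetical stand-ins for training a network and returning its validation NRMSE:

```python
import random

def random_search(evaluate, space, n_iter=20, seed=0):
    """Sample settings uniformly from `space` and keep the configuration
    with the lowest validation score."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_iter):
        params = {name: rng.choice(values) for name, values in space.items()}
        score = evaluate(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical search space for an MLP like the one used in this thesis.
space = {"hidden_neurons": [10, 15, 20, 25], "learning_rate": [0.001, 0.01, 0.1]}

# Toy stand-in for "train the network, return validation NRMSE".
toy = lambda p: abs(p["hidden_neurons"] - 20) * 0.01 + p["learning_rate"]

best, score = random_search(toy, space, n_iter=50, seed=1)
```

The same loop extends naturally to feature selection by including binary include/exclude choices per input channel in the space.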

In general, having more types of inputs from sources like SCADA and NWP systems could give a lot of valuable information that can be used for better predictions. It is clear that wind speed is a key feature for producing good forecasts, which is also supported by the input analysis in section 4.2. It has been argued that the error in WPF models stems largely from wrong weather forecasts [Giebel et al 2011], so identifying good sources for weather forecasts would be desirable for more effective model learning. Combining several different weather forecasts could be investigated, as Nielsen et al [2007] showed that power forecasts based on a number of different meteorological forecasts were better than a single source.
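A very simple combination scheme in that direction (my own illustration; Nielsen et al [2007] use a more elaborate regression-based approach) weights each meteorological source inversely to its historical mean squared error:

```python
def inverse_mse_weights(past_errors_per_source, eps=1e-9):
    """Weight each forecast source proportionally to 1 / MSE of its past errors."""
    inv = []
    for errors in past_errors_per_source:
        mse = sum(e * e for e in errors) / len(errors)
        inv.append(1.0 / (mse + eps))   # eps guards against a zero-error source
    total = sum(inv)
    return [w / total for w in inv]

def combine(forecasts, weights):
    """Weighted average of the individual power forecasts."""
    return sum(f * w for f, w in zip(forecasts, weights))

# Source A has MSE 0.01, source B has MSE 0.04, so A gets 4x the weight.
weights = inverse_mse_weights([[0.1, -0.1], [0.2, -0.2]])
combined = combine([0.5, 0.7], weights)
```

In practice the weights would be re-estimated per lead time, since the relative quality of NWP sources varies with the horizon.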

Regarding HTM CLA, it is worth considering implementing a custom encoder for it, one specifically targeted at data concerning wind farms. SCADA data could also possibly be streamed directly into a NuPIC model, which would allow for some interesting online predictions and anomaly detection.

Training times are always an issue, especially for very large models. Both NuPIC and Expektra's model would benefit from being implemented in a more parallel fashion, as these algorithms are very well suited for parallel computing. The speed-up would be of great value, so looking into and implementing a GPU version of the algorithms could be worthwhile.

5.3 Conclusions

The field of energy forecasting is still young and the necessity for good forecasting techniques is constantly growing. This initial study shows that part of Expektra's forecasting method can be used to forecast wind power generation. We observe that this ANN is able to achieve performance comparable to other methods published in Hong et al [2014]; additional work is still needed to obtain state-of-the-art results. Numenta's model is able to beat the reference model but still needs additional work before we can draw definite conclusions about its performance. Constructing a custom encoder could be a first step in that direction.


Bibliography

Subutai Ahmad and Jeff Hawkins. Properties of sparse distributed representations and their application to hierarchical temporal memory. arXiv preprint arXiv:1503.07469, 2015.

T.C. Akinci. Short term wind speed forecasting with ANN in Batman, Turkey. Elektronika ir Elektrotechnika, 107(1):41–45, 2015.

E. Michael Azoff. Neural network time series forecasting of financial markets. John Wiley & Sons, Inc., 1994.

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13(1):281–305, 2012.

Daniel P. Buxhoeveden and Manuel F. Casanova. The minicolumn hypothesis in neuroscience. Brain, 125(5):935–951, 2002.

Erasmo Cadenas and Wilfrido Rivera. Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks. Renewable Energy, 34(1):274–278, 2009.

Dmitri B. Chklovskii, B.W. Mel, and K. Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782–788, 2004.

A. Costa, A. Crespo, J. Navarro, G. Lizcano, H. Madsen, and E. Feitosa. A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews, 12(6):1725–1744, August 2008. ISSN 1364-0321. doi: 10.1016/j.rser.2007.01.015. URL http://dx.doi.org/10.1016/j.rser.2007.01.015.

Russ C. Eberhart and James Kennedy. A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science, volume 1, pages 39–43, New York, NY, 1995.


Shu Fan, James R. Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. Energy Conversion, IEEE Transactions on, 24(2):474–482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento - a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276. Citeseer, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition, EWEC 2008. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving hierarchical temporal memory-based trading models. Springer, 2013.

M.A. Gaertner, C. Gallardo, C. Tejeda, N. Martínez, S. Calabria, N. Martínez, and B. Fernández. The Casandra project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. In Proc. of the 2001 European Wind Energy Conference, EWEC'01, Copenhagen, Denmark, pages 777–780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: a literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G. Lobo. Sipreólico - wind power prediction tool for the Spanish peninsular power system. In Proceedings of the CIGRÉ 40th General Session and Exhibition, París, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357–363, 2014.


Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41–46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585–592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694–709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference, EWEC'07, 2007.

G. Kariniotakis. Position paper on Joule project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J. Halliday, R. Brownsword, Ignacio Marti, Ana Maria Palomares, I. Cruz, H. Madsen, T.S. Nielsen, Henrik Aa. Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power - overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, 2006.

G.N. Kariniotakis, G.S. Stavrakakis, and E.F. Nogaret. Wind power forecasting using advanced neural networks models. Energy Conversion, IEEE Transactions on, 11(4):762–767, 1996.

Stanley J. Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326–334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275–293, 2009.

Lars Landberg and Simon J. Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171–195, 1994.

S. Lang, C. Mohrlen, J. Jorgensen, B. Ó Gallachóir, and E. McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International Solar Energy Society UK Section Conference, volume 85, page 89, 2006.


S.M. Lawan, W.A.W.Z. Abidin, W.Y. Chai, A. Baharun, and T. Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760–1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. Smart Grid, IEEE Transactions on, 5(1):501–510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets, 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164–168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313–2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154–3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa. Nielsen, and Torben S. Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475–489, 2005.

Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431–441, 1963.

Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons. 1969.

Marvin Lee Minsky and Oliver G. Selfridge. Learning in random nets. MIT Lincoln Laboratory, 1960.

Jorge J. Moré. The Levenberg-Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105–116. Springer, 1978.

Henrik Aa. Nielsen, Torben S. Nielsen, Henrik Madsen, Maria J. Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471–482, 2007.


Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29–34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power. 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159–167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms, v0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33–57, 2007.

A. Rodrigues, J.A. Peças Lopes, P. Miranda, L. Palma, C. Monteiro, R. Bessa, J. Sousa, C. Rodrigues, and J. Matos. EPREV - a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference, EWEC, volume 7, 2007.

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5:3, 1988.

Jürgen Schmidhuber. Deep learning in neural networks: an overview. Neural Networks, 61:85–117, 2015.

S. Sinkevicius, R. Simutis, and V. Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91–96, 2011.

Ke-Sheng Wang, Vishal S. Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61–69, 2014.

WWEA. 2014 half-year report, pp. 1–8. Technical report, WWEA, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365–376, 2013.

Hao Yu and Bogdan M. Wilamowski. Levenberg-Marquardt training. Industrial Electronics Handbook, 5:12-1, 2011.


Appendix A

Hyper-parameters

Parameters for the spatial pooler

Alias Default Description

columnCount - The number of cell columns in a cortical region

globalInhibition false If true, during the inhibition phase the winning columns are selected as the most active columns from the region as a whole

numActivePerInhArea 10 The maximum number of active columns per inhibition area

synPermActiveInc 0.1 The amount by which an active synapse is incremented in each round, specified as a percent of a fully grown synapse

synPermConnected 0.10 Controls the threshold at which synapses are connected

synPermInactiveDec 0.01 The amount by which an inactive synapse is decremented in each round

potentialRadius 16 This parameter determines the extent of the input that each column can potentially be connected to

Table A1 Table containing configuration parameters for the spatial pooler


Parameters for the scalar encoder

Alias Symbol Description

w w Number of bits to set in the output

minval vmin The lower bound of the input value

maxval vmax The upper bound of the input value

n n Number of bits in the representation (n must be > w)

radius r Inputs separated by more than or equal to this distance will have non-overlapping representations

resolution ψ Inputs separated by more than or equal to this distance will have different representations

Table A2 Table containing configuration parameters for the scalar encoder
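To make these parameters concrete, the following is a simplified sketch (my own illustration, not NuPIC's actual implementation) of how a scalar encoder maps a value to a bit array: w consecutive bits are set, starting at a bucket index derived from the resolution ψ, so values at least r = w · ψ apart share no bits while nearby values overlap:

```python
def encode_scalar(value, minval=0.0, resolution=1.0, w=3, n=14):
    """Set w consecutive bits; the start bucket advances one step per
    `resolution`, so the radius (no-overlap distance) is w * resolution."""
    bucket = int(round((value - minval) / resolution))
    bucket = max(0, min(bucket, n - w))          # clip into the representation
    return [1 if bucket <= i < bucket + w else 0 for i in range(n)]

def overlap(a, b):
    """Number of shared active bits between two encodings."""
    return sum(x & y for x, y in zip(a, b))
```

With these (hypothetical) defaults, encode_scalar(0) and encode_scalar(1) share two bits, while values three or more apart have disjoint representations, which is the semantic-similarity property the spatial pooler relies on.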


Parameters for the temporal memory

Alias Default Description

activationThreshold 12 Activation threshold for segments

cellsPerColumn 32 Number of cells per column

columnCount 2048 The number of cell columns in a cortical region

globalDecay 0.10 Decrements all synapses a little bit all the time

initialPerm 0.11 Initial permanence value for a synapse

inputWidth - Size of the input

maxAge 100000 Controls global decay. Global decay will only decay segments that have not been activated for maxAge iterations, and will only do the global decay loop every maxAge iterations

maxSegmentsPerCell - The maximum number of segments a cell can have

maxSynapsesPerSegment - The maximum number of synapses a segment can have

minThreshold 8 The minimum required activity for a segment to learn

newSynapseCount 15 The maximum number of synapses added to a segment during learning

permanenceDec 0.10 How much permanence is removed from synapses when learning occurs

permanenceInc 0.10 How much permanence is added to synapses when learning occurs

temporalImp cpp/py Controls what temporal memory implementation to use

Table A3 Table containing configuration parameters for the temporal memory


Appendix B

Wind characteristics

Figure B1 Wind characteristics for WF 1 and WF 2, the GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power


Figure B2 Wind characteristics for WF 3-7, the GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power


Appendix C

Error Distribution


[Figure: six histograms of forecast error (x-axis: error, −1.0 to 1.0; y-axis: frequency, 0–70), one per lead time (48, 40, 30, 20, 10, 1), panels titled "wf1 using nupic".]

Figure C1 Error distribution for different lead times WF 1

[Figure: six histograms of forecast error (x-axis: error, −1.0 to 1.0; y-axis: frequency, 0–70), one per lead time (48, 40, 30, 20, 10, 1), panels titled "wf2 using nupic".]

Figure C2 Error distribution for different lead times WF 2

[Figure: six histograms of forecast error (x-axis: error, −1.0 to 1.0; y-axis: frequency, 0–70), one per lead time (48, 40, 30, 20, 10, 1), panels titled "wf3 using nupic".]

Figure C3 Error distribution for different lead times WF 3

[Figure: six histograms of forecast error (x-axis: error, −1.0 to 1.0; y-axis: frequency, 0–70), one per lead time (48, 40, 30, 20, 10, 1), panels titled "wf4 using nupic".]

Figure C4 Error distribution for different lead times WF 4

[Figure: six histograms of forecast error (x-axis: error, −1.0 to 1.0; y-axis: frequency, 0–70), one per lead time (48, 40, 30, 20, 10, 1), panels titled "wf5 using nupic".]

Figure C5 Error distribution for different lead times WF 5

[Figure: six histograms of forecast error (x-axis: error, −1.0 to 1.0; y-axis: frequency, 0–70), one per lead time (48, 40, 30, 20, 10, 1), panels titled "wf6 using nupic".]

Figure C6 Error distribution for different lead times WF 6

APPENDIX C ERROR DISTRIBUTION

Figure C.7 Error distribution for different lead times, WF 7 (wf7 using NuPIC; panels show histograms of forecast error on [−1.0, 1.0] against frequency for lead times 48, 40, 30, 20, 10 and 1)


Figure C.8 Error distribution for different lead times, WF 1 (wf1 using Expektra; panels show histograms of forecast error on [−1.0, 1.0] against frequency for lead times 48, 40, 30, 20, 10 and 1)


Figure C.9 Error distribution for different lead times, WF 2 (wf2 using Expektra; panels show histograms of forecast error on [−1.0, 1.0] against frequency for lead times 48, 40, 30, 20, 10 and 1)


Figure C.10 Error distribution for different lead times, WF 3 (wf3 using Expektra; panels show histograms of forecast error on [−1.0, 1.0] against frequency for lead times 48, 40, 30, 20, 10 and 1)


Figure C.11 Error distribution for different lead times, WF 4 (wf4 using Expektra; panels show histograms of forecast error on [−1.0, 1.0] against frequency for lead times 48, 40, 30, 20, 10 and 1)


Figure C.12 Error distribution for different lead times, WF 5 (wf5 using Expektra; panels show histograms of forecast error on [−1.0, 1.0] against frequency for lead times 48, 40, 30, 20, 10 and 1)


Figure C.13 Error distribution for different lead times, WF 6 (wf6 using Expektra; panels show histograms of forecast error on [−1.0, 1.0] against frequency for lead times 48, 40, 30, 20, 10 and 1)


Figure C.14 Error distribution for different lead times, WF 7 (wf7 using Expektra; panels show histograms of forecast error on [−1.0, 1.0] against frequency for lead times 48, 40, 30, 20, 10 and 1)


List of Figures

2.1 A figure that presents the general outline when forecasting using the statistical approach. 6
2.2 A figure that presents the general steps when forecasting using a physical model. 7
3.1 The perceptron. 20
3.2 Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of the H hidden layers, as well as M input connections. Each edge seen in this graph has a weight wij associated with it. 21
3.3 Information flow of a single-region predictive model created with the OPF. 23
3.4 The CLAClassifier. 28
3.5 Training an OPF model. 29
4.1 Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012). 31
4.2 Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012). 32
4.3 Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012). 32
4.4 Different error measurements for WF 1. 33
4.5 Different error measurements for WF 2. 34
4.6 Different error measurements for WF 3. 35
4.7 Different error measurements for WF 4. 36
4.8 Different error measurements for WF 5. 37
4.9 Different error measurements for WF 6. 38
4.10 Different error measurements for WF 7. 39
4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point: the network when no channel has been exposed to noise. ws = wind speed, hours/week = timestamps, u, v is a directional vector of the wind. 41
4.12 Plot showing the unoptimized version vs the optimized one when training a network with 10 input neurons, 10/15/20/25 hidden neurons and 1 output neuron. Training was done for 100 epochs using LM. 42
4.13 Summarized average improvement over all wind farms with 95% confidence intervals. 43
B.1 Wind characteristics for WF 1 and WF 2. The GEFCom dataset showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power. 59
B.2 Wind characteristics for WF 3–7. The GEFCom dataset showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power. 60
C.1 Error distribution for different lead times, WF 1. 62
C.2 Error distribution for different lead times, WF 2. 63
C.3 Error distribution for different lead times, WF 3. 64
C.4 Error distribution for different lead times, WF 4. 65
C.5 Error distribution for different lead times, WF 5. 66
C.6 Error distribution for different lead times, WF 6. 67
C.7 Error distribution for different lead times, WF 7. 68
C.8 Error distribution for different lead times, WF 1. 69
C.9 Error distribution for different lead times, WF 2. 70
C.10 Error distribution for different lead times, WF 3. 71
C.11 Error distribution for different lead times, WF 4. 72
C.12 Error distribution for different lead times, WF 5. 73
C.13 Error distribution for different lead times, WF 6. 74
C.14 Error distribution for different lead times, WF 7. 75

List of Tables

3.1 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods of missing data, power observations are available for updating the models. 16
3.2 Preprocessed features from the dataset. The Forecast category represents the forecast given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the one we will use in training and testing. 17
3.3 Example, where n = 14, r = 5, ψ = 1, of various encoded scalar values using a ScalarEncoder. 24
4.1 NRMSE score of the entries published in Hong et al. [2014]. The NuPIC model and the Expektra model are added so we can easily compare the results. 40
A.1 Table containing configuration parameters for the spatial pooler. 55
A.2 Table containing configuration parameters for the scalar encoder. 56
A.3 Table containing configuration parameters for the temporal memory. 57


www.kth.se

  • Contents
  • Introduction
    • Problem Formulation
    • The scope of the problem
  • Background
    • Neural Networks and Time Series Prediction
  • Method and Materials
    • Preliminaries
      • Remarks
      • Definitions
    • Reference models
    • Error metrics
    • Model selection
    • Evaluation
    • Experiments
    • Holdback Input Randomization
    • Optimization methods
    • Neural Networks
      • Multilayer Perceptron
      • Numenta Platform for Intelligent Computing
  • Result
    • Experimental results
    • Input Importance
    • Adaptation and Optimization
    • Summary
  • Discussion
    • Method development issues
    • Future improvements and directions
    • Conclusions
  • Bibliography
  • Appendices
    • Hyper-parameters
    • Wind characteristics
    • Error Distribution
  • List of Figures
  • List of Tables

Chapter 5

Discussion

With the exponential growth of computing power it becomes easier to study deeper and more complex networks, and with recent advances in the area of deep learning a new interest in ANNs has resurged. In practice, ANNs have been used by most groups in the field of WPF, but these networks never caught on, as it was argued that the improvements made by ANNs were not usually enough to justify the extra effort of training them [Giebel et al., 2011]. This is steadily becoming less of an issue as computation becomes cheaper and bigger networks perform better.

Using a simple MLP network, we observe that we are able to obtain results similar in performance to other models published in Hong et al. [2014]. This can be used as a starting point for further investigations. The GEFCom dataset has a limited number of input features; if we had access to a wider range of them, the power of neural networks could be studied more in depth, but given the number of features in the dataset this is harder to do.
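The comparison above is made with the NRMSE score reported in Table 4.1. As a point of reference, a minimal sketch of the metric, assuming the power series is already normalized to [0, 1] as in the GEFCom data (so no extra capacity scaling is applied):

```python
import numpy as np

def nrmse(forecast, observed):
    """Root-mean-square error of normalized power. GEFCom power is
    already normalized by installed capacity, so the RMSE of the
    normalized series is the normalized RMSE."""
    forecast = np.asarray(forecast, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.sqrt(np.mean((forecast - observed) ** 2)))
```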

Using persistence as a reference model was done in order to have the same baseline as the one used in GEFCom, but a better reference model should be considered in the future, such as the one presented in Nielsen et al. [1998], especially if longer forecasts are to be considered.
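For illustration, a sketch of both baselines: persistence, and the "new reference" of Nielsen et al. [1998], which blends the last observation with the series mean using the lag-k autocorrelation. This is a hedged reconstruction from the paper's idea, not the thesis code, and it assumes a non-constant history longer than the horizon:

```python
import numpy as np

def persistence_forecast(p_t, k):
    """Persistence: the forecast for every lead time up to k is the
    last observed power value."""
    return np.full(k, p_t)

def new_reference_forecast(history, k):
    """'New reference' model: p_hat(t+k|t) = a_k * p(t) + (1 - a_k) * p_mean,
    where a_k is the lag-k autocorrelation of the power series."""
    p = np.asarray(history, dtype=float)
    p_mean = p.mean()
    d = p - p_mean
    denom = (d * d).sum()          # assumes a non-constant series
    out = np.empty(k)
    for lead in range(1, k + 1):
        a_k = (d[:-lead] * d[lead:]).sum() / denom  # lag-k autocorrelation
        out[lead - 1] = a_k * p[-1] + (1.0 - a_k) * p_mean
    return out
```

As the lead time grows the autocorrelation shrinks, so the forecast relaxes from the last observation toward the climatological mean, which is exactly why persistence is a weak baseline at longer horizons.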

5.1 Method development issues

Working with Expektra's model is very straightforward and no major issues were encountered during development, but some issues were encountered working with NuPIC.[1]

[1] It should also be pointed out that it helps to have the people who developed the code on the spot, and this is probably the main reason for the little trouble.


1. Installing NuPIC is difficult. This seems to be a general problem, judging by posts in the NuPIC mailing list.[2] It is a very important issue to fix if they want more people to work with this model.

2. Using the built-in swarming (PSO) did not work well, especially for slightly larger datasets and for 48 prediction steps. The main problems were fitting everything into memory and that the code ran too slowly when swarming over multiple step-ahead horizons, making it practically impossible to use. Scaling down the problem to just a few prediction steps and multiple models helped, but the swarm model kept dismissing wind speed as an important feature, which was fixed by explicitly telling it not to touch that encoder.

3. Inconsistencies between the white-paper documentation of HTM CLA and the actual implementation make it hard to understand the underlying principles of NuPIC. NuPIC presents a very complex network, and any inconsistencies in documentation make it hard to understand. The code is quite well documented, but the additional material is a bit sparse.
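The swarming in item 2 is particle swarm optimization [Eberhart and Kennedy, 1995] run over model parameters. A minimal, self-contained PSO sketch (pure Python with a stand-in objective; not NuPIC's actual swarming code) shows the mechanism:

```python
import random

def pso(objective, bounds, n_particles=20, iters=60, w=0.7, c1=1.5, c2=1.5, seed=1):
    """Minimize objective over a box. Each particle remembers its own best
    position; the swarm shares a global best that attracts all particles."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # inertia + pull toward personal best + pull toward global best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], bounds[d][0]), bounds[d][1])
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

In NuPIC's case the "position" is a point in hyperparameter space (encoder settings, column counts, etc.) and the objective is the prediction error of a trained model, which is why each swarm iteration is so expensive.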

These issues are all understandable and are continuously being improved upon. NuPIC is still a young platform, so it is expected to find issues, due to the complexity and research nature of the project. Training the CLAClassifier to use many step predictions instead of just one will most likely reduce the bias error seen in the results. To investigate this, a more powerful computer is needed.

There are some differences in the input between the models. This setup was created because an OPF model needs to encode the value it is going to predict. SCADA data is available, so this is not an issue, but it may introduce some unfairness between NuPIC and Expektra. In the current setup NuPIC does not seem to gain anything extra from the additional information, and other models in the competition will most likely use different inputs as well.

In general, NuPIC is different in the way you feed data: you do so by sending in the front of the signal, and temporal context is achieved through the temporal memory.

5.2 Future improvements and directions

One issue with the GEFCom dataset is that it does not contain any specific information about the location of these 7 wind farms. It is worth considering taking a look at how to handle different models that are specialized for different types

[2] http://numenta.org/lists


of terrain, as this has been shown to increase performance [Kariniotakis et al., 2006]. Another thing to investigate could be a more advanced scheme for hyperparameter selection: Jursa and Rohrig [2008] show that the WPF error can be reduced by smarter use of optimization algorithms for feature selection and hyperparameter selection, which should improve performance.

In general, having more types of inputs from sources like SCADA and NWP systems could give a lot of valuable information that can be used for better predictions. It is clear that wind speed is a key feature for producing good forecasts, which is also supported by the input analysis in section 4.2. It has been argued that the error in WPF models stems largely from wrong weather forecasts [Giebel et al., 2011], so identifying good sources of weather forecasts would be desirable for more effective model learning. Combining several different weather forecasts could also be investigated, as Nielsen et al. [2007] showed that power forecasts based on a number of different meteorological forecasts were better than those from a single source.
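The input analysis referred to here is Holdback Input Randomization [Kemp et al., 2007]: one input channel at a time is randomized, and the increase in error over the untouched "all-channels" reference indicates that channel's importance. A sketch of the idea, assuming a generic `predict` function rather than the thesis's trained network:

```python
import numpy as np

def hipr_importance(predict, X, y, n_rounds=10, seed=0):
    """Shuffle one input column at a time and measure how much the
    model's RMSE degrades relative to the untouched reference."""
    rng = np.random.default_rng(seed)
    ref = np.sqrt(np.mean((predict(X) - y) ** 2))  # "all-channels" reference
    importance = []
    for col in range(X.shape[1]):
        errs = []
        for _ in range(n_rounds):
            Xr = X.copy()
            rng.shuffle(Xr[:, col])  # destroy the information in one channel
            errs.append(np.sqrt(np.mean((predict(Xr) - y) ** 2)))
        importance.append(np.mean(errs) - ref)
    return ref, importance
```

A channel the model relies on (wind speed, in this study) shows a large error increase when shuffled; an irrelevant channel shows essentially none.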

Regarding HTM CLA, it is worth considering implementing a custom encoder for it, an encoder that would specifically be targeted at data concerning wind farms. SCADA data could also be streamed directly into a NuPIC model, which would allow for some interesting online predictions and anomaly detection.

Training times are always an issue, especially for very large models. Both NuPIC and Expektra's model would benefit from being implemented in a more parallel fashion, as these algorithms are very well suited for parallel computing. The speed-up would be of great value, so looking into and implementing a GPU version of the algorithms could be worthwhile.

5.3 Conclusions

The field of energy forecasting is still young and the necessity for good forecasting techniques is constantly growing. This initial study shows that part of Expektra's forecasting method can be used to forecast wind power generation. We observe that this ANN is able to achieve performance comparable to other methods published in Hong et al. [2014]. Additional work is still needed to obtain state-of-the-art results. Numenta's model is able to beat the reference model but still needs additional work before we can draw definite conclusions about its performance. Constructing a custom encoder could be a first step in that direction.


Bibliography

Subutai Ahmad and Jeff Hawkins. Properties of sparse distributed representations and their application to hierarchical temporal memory. arXiv preprint arXiv:1503.07469, 2015.

T.C. Akinci. Short term wind speed forecasting with ANN in Batman, Turkey. Elektronika ir Elektrotechnika, 107(1):41–45, 2015.

E. Michael Azoff. Neural network time series forecasting of financial markets. John Wiley & Sons, Inc., 1994.

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13(1):281–305, 2012.

Daniel P. Buxhoeveden and Manuel F. Casanova. The minicolumn hypothesis in neuroscience. Brain, 125(5):935–951, 2002.

Erasmo Cadenas and Wilfrido Rivera. Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks. Renewable Energy, 34(1):274–278, 2009.

Dmitri B. Chklovskii, B.W. Mel, and K. Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782–788, 2004.

A. Costa, A. Crespo, J. Navarro, G. Lizcano, H. Madsen, and E. Feitosa. A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews, 12(6):1725–1744, August 2008. ISSN 1364-0321. doi: 10.1016/j.rser.2007.01.015. URL http://dx.doi.org/10.1016/j.rser.2007.01.015.

Russ C. Eberhart and James Kennedy. A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science, volume 1, pages 39–43, New York, NY, 1995.

Shu Fan, James R. Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. Energy Conversion, IEEE Transactions on, 24(2):474–482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento: a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276. Citeseer, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition EWEC 2008. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving hierarchical temporal memory-based trading models. Springer, 2013.

M.A. Gaertner, C. Gallardo, C. Tejeda, N. Martínez, S. Calabria, N. Martínez, and B. Fernández. The Casandra project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. In Proc. of the 2001 European Wind Energy Conference, EWEC'01, Copenhagen, Denmark, pages 777–780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: a literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G. Lobo. Sipreólico: wind power prediction tool for the Spanish peninsular power system. In Proceedings of the CIGRÉ 40th General Session and Exhibition, París, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357–363, 2014.

Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41–46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585–592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694–709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference, EWEC'07, 2007.

G. Kariniotakis. Position paper on Joule project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J. Halliday, R. Brownsword, Ignacio Marti, Ana Maria Palomares, I. Cruz, H. Madsen, T.S. Nielsen, Henrik Aa. Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power: overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, 2006.

G.N. Kariniotakis, G.S. Stavrakakis, and E.F. Nogaret. Wind power forecasting using advanced neural networks models. Energy Conversion, IEEE Transactions on, 11(4):762–767, 1996.

Stanley J. Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326–334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275–293, 2009.

Lars Landberg and Simon J. Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171–195, 1994.

S. Lang, C. Mohrlen, J. Jorgensen, B. O Gallachóir, and E. McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International Solar Energy Society UK Section Conference, volume 85, page 89, 2006.

S.M. Lawan, W.A.W.Z. Abidin, W.Y. Chai, A. Baharun, and T. Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760–1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. Smart Grid, IEEE Transactions on, 5(1):501–510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets, 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164–168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313–2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154–3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa. Nielsen, and Torben S. Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475–489, 2005.

Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431–441, 1963.

Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons. 1969.

Marvin Lee Minsky and Oliver G. Selfridge. Learning in random nets. MIT Lincoln Laboratory, 1960.

Jorge J. Moré. The Levenberg-Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105–116. Springer, 1978.

Henrik Aa. Nielsen, Torben S. Nielsen, Henrik Madsen, Maria J. Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471–482, 2007.

Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29–34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power. 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159–167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms, v0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33–57, 2007.

A. Rodrigues, J.A. Peças Lopes, P. Miranda, L. Palma, C. Monteiro, R. Bessa, J. Sousa, C. Rodrigues, and J. Matos. EPREV: a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference, EWEC, volume 7, 2007.

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5:3, 1988.

Jürgen Schmidhuber. Deep learning in neural networks: an overview. Neural Networks, 61:85–117, 2015.

S. Sinkevicius, R. Simutis, and V. Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91–96, 2011.

Ke-Sheng Wang, Vishal S. Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61–69, 2014.

WWEA. 2014 half-year report, pp. 1–8. Technical report, WWEA, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365–376, 2013.

Hao Yu and Bogdan M. Wilamowski. Levenberg-Marquardt training. Industrial Electronics Handbook, 5:12-1, 2011.

Appendix A

Hyper-parameters

Parameters for the spatial pooler

Alias                Default  Description
columnCount          -        The number of cell columns in a cortical region.
globalInhibition     false    If true, during the inhibition phase the winning columns are selected as the most active columns from the region as a whole.
numActivePerInhArea  10       The maximum number of active columns per inhibition area.
synPermActiveInc     0.1      The amount by which an active synapse is incremented in each round, specified as a percent of a fully grown synapse.
synPermConnected     0.10     Controls the threshold at which synapses become connected.
synPermInactiveDec   0.01     The amount by which an inactive synapse is decremented in each round.
potentialRadius      16       Determines the extent of the input that each column can potentially be connected to.

Table A.1 Table containing configuration parameters for the spatial pooler


Parameters for the scalar encoder

Alias       Symbol  Description
w           w       Number of bits to set in the output.
minval      vmin    The lower bound of the input value.
maxval      vmax    The upper bound of the input value.
n           n       Number of bits in the representation (n must be > w).
radius      r       Inputs separated by more than or equal to this distance will have non-overlapping representations.
resolution  ψ       Inputs separated by more than or equal to this distance will have different representations.

Table A.2 Table containing configuration parameters for the scalar encoder
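As a concrete illustration of these parameters, here is a minimal sketch of the scalar-encoding idea: a value in [minval, maxval] becomes an n-bit vector with a block of w consecutive active bits whose position tracks the value. This is a simplified stand-in, not NuPIC's actual ScalarEncoder, which additionally handles radius, resolution and periodic ranges:

```python
def encode_scalar(value, minval, maxval, n, w):
    """Map a scalar to an n-bit list with w consecutive 1-bits whose
    position slides from the start (value = minval) to the end
    (value = maxval). Assumes n > w."""
    assert n > w, "n must be greater than w"
    x = min(max(value, minval), maxval)                    # clamp into range
    first = int(round((x - minval) / (maxval - minval) * (n - w)))
    bits = [0] * n
    for i in range(first, first + w):
        bits[i] = 1
    return bits
```

Nearby values share active bits (their blocks overlap), while distant values do not; this overlap structure is exactly what the radius and resolution parameters above control in the real encoder.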


Parameters for the temporal memory

Alias                  Default  Description
activationThreshold    12       Activation threshold for segments.
cellsPerColumn         32       Number of cells per column.
columnCount            2048     The number of cell columns in a cortical region.
globalDecay            0.10     Decrements all synapses a little bit all the time.
initialPerm            0.11     Initial permanence value for a synapse.
inputWidth             -        Size of the input.
maxAge                 100000   Controls global decay. Global decay will only decay segments that have not been activated for maxAge iterations, and will only do the global decay loop every maxAge iterations.
maxSegmentsPerCell     -        The maximum number of segments a cell can have.
maxSynapsesPerSegment  -        The maximum number of synapses a segment can have.
minThreshold           8        The minimum required activity for a segment to learn.
newSynapseCount        15       The maximum number of synapses added to a segment during learning.
permanenceDec          0.10     How much permanence is removed from synapses when learning occurs.
permanenceInc          0.10     How much permanence is added to synapses when learning occurs.
temporalImp            cpp/py   Controls which temporal memory implementation to use.

Table A.3 Table containing configuration parameters for the temporal memory


Appendix B

Wind characteristics

Figure B.1 Wind characteristics for WF 1 and WF 2. The GEFCom dataset showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Figure B.2 Wind characteristics for WF 3–7. The GEFCom dataset showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Appendix C

Error Distribution
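The figures in this appendix are histograms of the forecast error (predicted minus observed normalized power) for each lead time. They can be reproduced from an error array along these lines (synthetic stand-in data shown; the real errors come from the model runs):

```python
import numpy as np

# Synthetic stand-in for one lead time's forecast errors (normalized power).
rng = np.random.default_rng(0)
errors = np.clip(rng.normal(loc=0.0, scale=0.15, size=500), -1.0, 1.0)

# Bin the errors over [-1, 1], matching the x-axis of the figures.
counts, edges = np.histogram(errors, bins=20, range=(-1.0, 1.0))

bias = errors.mean()    # systematic over-/under-prediction
spread = errors.std()   # width of the error distribution
```

A distribution centred at zero indicates an unbiased forecast; widening of the histogram with lead time reflects the growing uncertainty of longer horizons.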



Figure C.1 Error distribution for different lead times, WF 1 (wf1 using NuPIC; panels show histograms of forecast error on [−1.0, 1.0] against frequency for lead times 48, 40, 30, 20, 10 and 1)


Figure C.2 Error distribution for different lead times, WF 2 (wf2 using NuPIC; panels show histograms of forecast error on [−1.0, 1.0] against frequency for lead times 48, 40, 30, 20, 10 and 1)

63

Figure C3: Error distribution for different lead times, WF 3. [Histogram panels "wf3 using nupic" for lead times 1, 10, 20, 30, 40, and 48; x-axis: error, -1.0 to 1.0; y-axis: frequency, 0 to 70.]

Figure C4: Error distribution for different lead times, WF 4. [Histogram panels "wf4 using nupic" for lead times 1, 10, 20, 30, 40, and 48; x-axis: error, -1.0 to 1.0; y-axis: frequency, 0 to 70.]

Figure C5: Error distribution for different lead times, WF 5. [Histogram panels "wf5 using nupic" for lead times 1, 10, 20, 30, 40, and 48; x-axis: error, -1.0 to 1.0; y-axis: frequency, 0 to 70.]

Figure C6: Error distribution for different lead times, WF 6. [Histogram panels "wf6 using nupic" for lead times 1, 10, 20, 30, 40, and 48; x-axis: error, -1.0 to 1.0; y-axis: frequency, 0 to 70.]

Figure C7: Error distribution for different lead times, WF 7. [Histogram panels "wf7 using nupic" for lead times 1, 10, 20, 30, 40, and 48; x-axis: error, -1.0 to 1.0; y-axis: frequency, 0 to 70.]

Figure C8: Error distribution for different lead times, WF 1. [Histogram panels "wf1 using expektra" for lead times 1, 10, 20, 30, 40, and 48; x-axis: error, -1.0 to 1.0; y-axis: frequency, 0 to 70.]

Figure C9: Error distribution for different lead times, WF 2. [Histogram panels "wf2 using expektra" for lead times 1, 10, 20, 30, 40, and 48; x-axis: error, -1.0 to 1.0; y-axis: frequency, 0 to 70.]

Figure C10: Error distribution for different lead times, WF 3. [Histogram panels "wf3 using expektra" for lead times 1, 10, 20, 30, 40, and 48; x-axis: error, -1.0 to 1.0; y-axis: frequency, 0 to 70.]

Figure C11: Error distribution for different lead times, WF 4. [Histogram panels "wf4 using expektra" for lead times 1, 10, 20, 30, 40, and 48; x-axis: error, -1.0 to 1.0; y-axis: frequency, 0 to 70.]

Figure C12: Error distribution for different lead times, WF 5. [Histogram panels "wf5 using expektra" for lead times 1, 10, 20, 30, 40, and 48; x-axis: error, -1.0 to 1.0; y-axis: frequency, 0 to 70.]

Figure C13: Error distribution for different lead times, WF 6. [Histogram panels "wf6 using expektra" for lead times 1, 10, 20, 30, 40, and 48; x-axis: error, -1.0 to 1.0; y-axis: frequency, 0 to 70.]

Figure C14: Error distribution for different lead times, WF 7. [Histogram panels "wf7 using expektra" for lead times 1, 10, 20, 30, 40, and 48; x-axis: error, -1.0 to 1.0; y-axis: frequency, 0 to 70.]

List of Figures

2.1 A figure that presents the general outline when forecasting using the statistical approach  6
2.2 A figure that presents the general steps when forecasting using a physical model  7
3.1 The perceptron  20
3.2 Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of H hidden layers, as well as M input connections. Each edge seen in this graph has a weight w_ij associated with it  21
3.3 Information flow of a single-region predictive model created with the OPF  23
3.4 The CLAClassifier  28
3.5 Training an OPF model  29
4.1 Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)  31
4.2 Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)  32
4.3 Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)  32
4.4 Different error measurements for WF 1  33
4.5 Different error measurements for WF 2  34
4.6 Different error measurements for WF 3  35
4.7 Different error measurements for WF 4  36
4.8 Different error measurements for WF 5  37
4.9 Different error measurements for WF 6  38
4.10 Different error measurements for WF 7  39
4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point; the reference point is the network when no channel has been exposed to noise. ws = wind speed, hours/week = timestamps, u, v is a directional vector of the wind  41
4.12 Plot showing the unoptimized version vs the optimized one when training a network with 10 input neurons, 10/15/20/25 hidden neurons, and 1 output neuron. Training was done for 100 epochs using LM  42
4.13 Summarized average improvement over all wind farms with 95% confidence intervals  43
B1 Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power  59
B2 Wind characteristics for WF 3-7, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power  60
C1 Error distribution for different lead times, WF 1  62
C2 Error distribution for different lead times, WF 2  63
C3 Error distribution for different lead times, WF 3  64
C4 Error distribution for different lead times, WF 4  65
C5 Error distribution for different lead times, WF 5  66
C6 Error distribution for different lead times, WF 6  67
C7 Error distribution for different lead times, WF 7  68
C8 Error distribution for different lead times, WF 1  69
C9 Error distribution for different lead times, WF 2  70
C10 Error distribution for different lead times, WF 3  71
C11 Error distribution for different lead times, WF 4  72
C12 Error distribution for different lead times, WF 5  73
C13 Error distribution for different lead times, WF 6  74

C14 Error distribution for different lead times, WF 7  75

List of Tables

3.1 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods of missing data, power observations are available for updating the models  16
3.2 Preprocessed features from the dataset. The Forecast category represents the forecast given by the NWP; multiple forecasts will exist for a given date. The latest issued forecast available gives the features we will use in training and testing  17
3.3 Example, where n = 14, r = 5, ψ = 1, of various encoded scalar values using a ScalarEncoder  24
4.1 NRMSE score of the entries published in [Hong et al., 2014]. The NuPIC model and Expektra model are added so we can easily compare the results  40
A1 Configuration parameters for the spatial pooler  55
A2 Configuration parameters for the scalar encoder  56
A3 Configuration parameters for the temporal memory  57

www.kth.se


CHAPTER 5 DISCUSSION

1. Installing NuPIC is difficult. This seems to be a general problem, judging by posts on the NuPIC mailing list.2 This is an important issue to fix if they want more people to work with this model.

2. Using the built-in swarming (PSO) did not work well, especially for slightly larger datasets and for 48 prediction steps. The main problem was fitting everything into memory, and the code ran too slowly when swarming over multiple step-ahead horizons, making it practically impossible to use. Scaling down the problem by using only a few prediction steps and multiple models helped, but the swarm model kept dismissing wind speed as an important feature, which was fixed by explicitly telling it not to touch that encoder.

3. Inconsistencies between the white-paper documentation of HTM CLA and the actual implementation make it hard to understand the underlying principles of NuPIC. NuPIC presents a very complex network, and any inconsistencies in the documentation make it hard to understand. The code is quite well documented, but the additional material is a bit sparse.

These issues are all understandable and are continuously being improved upon. NuPIC is still a young platform, so issues are to be expected given the complexity and research nature of the project. Training the CLAClassifier to use many step predictions instead of just one will most likely reduce the bias error seen in the results. To investigate this, a more powerful computer is needed.

There are some differences in the input between the models. This setup was created because an OPF model needs to encode the value it is going to predict. SCADA data is available, so this is not an issue, but it may introduce some unfairness between NuPIC and Expektra. In the current setup, NuPIC does not seem to gain anything extra from the additional information, and other models in the competition will most likely use different inputs as well.

In general, NuPIC differs in the way you feed it data: you send in only the front of the signal, and temporal context is achieved through the temporal memory.

5.2 Future improvements and directions

One issue with the GEFCom dataset is that it does not contain any specific information about the location of these 7 wind farms. It is worth considering taking a look at how to handle different models that are specialized for different types

2 http://numenta.org/lists


of terrain, as this has been shown to increase performance [Kariniotakis et al., 2006]. Another thing to investigate could be a more advanced scheme for hyperparameter selection: Jursa and Rohrig [2008] show that the WPF error can be reduced by smarter use of optimization algorithms for feature selection and hyperparameter selection, which should improve performance.

In general, having more types of inputs from sources like SCADA and NWP systems could provide a lot of valuable information for better predictions. It is clear that wind speed is a key feature for producing good forecasts, which is also supported by the input analysis in section 4.2. It has been argued that the error in WPF models stems largely from wrong weather forecasts [Giebel et al., 2011], so identifying good sources of weather forecasts would be desirable for more effective model learning. Combining several different weather forecasts could also be investigated, as Nielsen et al. [2007] showed that power forecasts based on a number of different meteorological forecasts were better than those based on a single source.
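The combination idea can be illustrated with a small sketch that weights each forecast source by the inverse of its historical error. This is illustrative only, not the combination method of Nielsen et al. [2007], and all names are hypothetical:

```python
def combine_forecasts(forecasts, past_errors):
    """Weight each forecast source inversely to its historical mean absolute
    error, so more reliable sources contribute more to the combined forecast.
    forecasts: list of per-source value lists (same horizon); past_errors:
    one positive error score per source."""
    weights = [1.0 / e for e in past_errors]
    total = sum(weights)
    weights = [w / total for w in weights]  # normalize to sum to 1
    return [
        sum(w * f[t] for w, f in zip(weights, forecasts))
        for t in range(len(forecasts[0]))
    ]

# two NWP sources for the same 3-hour horizon; source 1 has been twice as accurate
combined = combine_forecasts([[10.0, 12.0, 11.0], [14.0, 16.0, 15.0]],
                             past_errors=[1.0, 2.0])
```

With these weights the first source contributes two thirds of each combined value.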

Regarding HTM CLA, it is worth considering implementing a custom encoder specifically targeted at data concerning wind farms. SCADA data could also be streamed directly into a NuPIC model, which would allow for some interesting online predictions and anomaly detection.

Training times are always an issue, especially for very large models. Both NuPIC and Expektra's model would benefit from being implemented in a more parallel fashion, as these algorithms are very well suited for parallel computing. The speed-up would be of great value, so investigating and implementing a GPU version of the algorithms could be worthwhile.

5.3 Conclusions

The field of energy forecasting is still young, and the necessity for good forecasting techniques is constantly growing. This initial study shows that part of Expektra's forecasting method can be used to forecast wind power generation. We observe that this ANN is able to achieve performance comparable to other methods published in Hong et al. [2014]. Additional work is still needed to obtain state-of-the-art results. Numenta's model is able to beat the reference model, but it needs additional work before we can draw definite conclusions about its performance. Constructing a custom encoder could be a first step in that direction.


Bibliography

Subutai Ahmad and Jeff Hawkins. Properties of sparse distributed representations and their application to hierarchical temporal memory. arXiv preprint arXiv:1503.07469, 2015.

T.C. Akinci. Short term wind speed forecasting with ANN in Batman, Turkey. Elektronika ir Elektrotechnika, 107(1):41–45, 2015.

E. Michael Azoff. Neural network time series forecasting of financial markets. John Wiley & Sons Inc., 1994.

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13(1):281–305, 2012.

Daniel P. Buxhoeveden and Manuel F. Casanova. The minicolumn hypothesis in neuroscience. Brain, 125(5):935–951, 2002.

Erasmo Cadenas and Wilfrido Rivera. Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks. Renewable Energy, 34(1):274–278, 2009.

Dmitri B. Chklovskii, B.W. Mel, and K. Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782–788, 2004.

A. Costa, A. Crespo, J. Navarro, G. Lizcano, H. Madsen, and E. Feitosa. A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews, 12(6):1725–1744, August 2008. ISSN 1364-0321. doi: 10.1016/j.rser.2007.01.015. URL http://dx.doi.org/10.1016/j.rser.2007.01.015.

Russ C. Eberhart and James Kennedy. A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science, volume 1, pages 39–43, New York, NY, 1995.


Shu Fan, James R. Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. Energy Conversion, IEEE Transactions on, 24(2):474–482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento: a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276. Citeseer, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition EWEC 2008, 6 pages. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving hierarchical temporal memory-based trading models. Springer, 2013.

M.A. Gaertner, C. Gallardo, C. Tejeda, N. Martínez, S. Calabria, N. Martínez, and B. Fernández. The Casandra project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. In Proc. of the 2001 European Wind Energy Conference, EWEC'01, Copenhagen, Denmark, pages 777–780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: A literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G. Lobo. Sipreólico: wind power prediction tool for the Spanish peninsular power system. In Proceedings of the CIGRÉ 40th General Session and Exhibition, París, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357–363, 2014.


Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41–46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585–592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694–709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference, EWEC'07, 2007.

G. Kariniotakis. Position paper on Joule project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J. Halliday, R. Brownsword, Ignacio Marti, Ana Maria Palomares, I. Cruz, H. Madsen, T.S. Nielsen, Henrik Aa. Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power: overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, 10 pages, 2006.

G.N. Kariniotakis, G.S. Stavrakakis, and E.F. Nogaret. Wind power forecasting using advanced neural networks models. Energy Conversion, IEEE Transactions on, 11(4):762–767, 1996.

Stanley J. Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326–334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275–293, 2009.

Lars Landberg and Simon J. Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171–195, 1994.

S. Lang, C. Mohrlen, J. Jorgensen, B. O. Gallachóir, and E. McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International Solar Energy Society UK Section Conference C, volume 85, page 89, 2006.


S.M. Lawan, W.A.W.Z. Abidin, W.Y. Chai, A. Baharun, and T. Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760–1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. Smart Grid, IEEE Transactions on, 5(1):501–510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets, 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164–168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313–2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154–3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa. Nielsen, and Torben S. Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475–489, 2005.

Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431–441, 1963.

Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons. 1969.

Marvin Lee Minsky and Oliver G. Selfridge. Learning in random nets. MIT Lincoln Laboratory, 1960.

Jorge J. Moré. The Levenberg-Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105–116. Springer, 1978.

Henrik Aa. Nielsen, Torben S. Nielsen, Henrik Madsen, Maria J. Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471–482, 2007.


Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29–34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power, 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159–167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms, v0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33–57, 2007.

A. Rodrigues, J.A. Peças Lopes, P. Miranda, L. Palma, C. Monteiro, R. Bessa, J. Sousa, C. Rodrigues, and J. Matos. EPREV: a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference, EWEC, volume 7, 2007.

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5(3), 1988.

Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85–117, 2015.

S. Sinkevicius, R. Simutis, and V. Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91–96, 2011.

Ke-Sheng Wang, Vishal S. Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61–69, 2014.

WWEA. 2014 half-year report. Technical report, WWEA, pp. 1–8, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365–376, 2013.

Hao Yu and Bogdan M. Wilamowski. Levenberg-Marquardt training. Industrial Electronics Handbook, 5:12–1, 2011.


Appendix A

Hyper-parameters

Parameters for the spatial pooler

Alias | Default | Description
columnCount | - | The number of cell columns in a cortical region
globalInhibition | false | If true, during the inhibition phase the winning columns are selected as the most active columns from the region as a whole
numActivePerInhArea | 10 | The maximum number of active columns per inhibition area
synPermActiveInc | 0.1 | The amount by which an active synapse is incremented in each round, specified as a percent of a fully grown synapse
synPermConnected | 0.10 | Controls the threshold at which synapses become connected
synPermInactiveDec | 0.01 | The amount by which an inactive synapse is decremented in each round
potentialRadius | 16 | Determines the extent of the input that each column can potentially be connected to

Table A1: Configuration parameters for the spatial pooler
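The three permanence parameters above govern a Hebbian-style update: synapses aligned with active input bits are incremented by synPermActiveInc, the rest are decremented by synPermInactiveDec, and a synapse counts as connected once its permanence reaches synPermConnected. A minimal sketch of that rule (illustrative only, not NuPIC's actual spatial pooler code):

```python
SYN_PERM_ACTIVE_INC = 0.1     # increment for synapses on active input bits
SYN_PERM_INACTIVE_DEC = 0.01  # decrement for synapses on inactive input bits
SYN_PERM_CONNECTED = 0.10     # threshold above which a synapse is "connected"

def update_permanences(permanences, active_input):
    """One learning step for a winning column's potential synapses.
    permanences: list of floats, one per input bit; active_input: set of
    indices of currently active input bits. Returns updated permanences,
    clipped to [0, 1]."""
    out = []
    for i, perm in enumerate(permanences):
        if i in active_input:
            perm = min(1.0, perm + SYN_PERM_ACTIVE_INC)
        else:
            perm = max(0.0, perm - SYN_PERM_INACTIVE_DEC)
        out.append(perm)
    return out

def connected(permanences):
    """Indices of synapses whose permanence has crossed the threshold."""
    return [i for i, p in enumerate(permanences) if p >= SYN_PERM_CONNECTED]

# synapses 0 and 2 see active input and are reinforced past the threshold
perms = update_permanences([0.05, 0.02, 0.09], active_input={0, 2})
```

Repeated exposure to the same input pattern thus gradually "wires" a column to that pattern.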

Parameters for the scalar encoder

Alias | Symbol | Description
w | w | number of bits to set in the output
minval | v_min | the lower bound of the input value
maxval | v_max | the upper bound of the input value
n | n | number of bits in the representation (n must be > w)
radius | r | inputs separated by more than or equal to this distance will have non-overlapping representations
resolution | ψ | inputs separated by more than or equal to this distance will have different representations

Table A2: Configuration parameters for the scalar encoder
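One simplified way to realize such an encoding is to slide a block of w active bits across an n-bit output as the value moves through [minval, maxval]; nearby values then share active bits. This sketch approximates, but is not identical to, NuPIC's ScalarEncoder:

```python
def encode_scalar(value, minval, maxval, n, w):
    """Encode a scalar as n bits with a contiguous block of w active bits;
    nearby values get overlapping blocks. Simplified sketch of a scalar
    encoder, not NuPIC's exact implementation."""
    if not minval <= value <= maxval:
        raise ValueError("value outside [minval, maxval]")
    # choose one of the n - w + 1 possible start positions for the block
    start = int(round((n - w) * (value - minval) / (maxval - minval)))
    return [1 if start <= i < start + w else 0 for i in range(n)]

# the minimum value puts the block of 3 active bits at the left edge
bits = encode_scalar(0.0, minval=0.0, maxval=10.0, n=14, w=3)
```

Every encoding has exactly w active bits, which is what the spatial pooler downstream expects.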

Parameters for the temporal memory

Alias | Default | Description
activationThreshold | 12 | Activation threshold for segments
cellsPerColumn | 32 | Number of cells per column
columnCount | 2048 | The number of cell columns in a cortical region
globalDecay | 0.10 | Decrements all synapses a little bit all the time
initialPerm | 0.11 | Initial permanence value for a synapse
inputWidth | - | Size of the input
maxAge | 100000 | Controls global decay; global decay will only decay segments that have not been activated for maxAge iterations, and the global decay loop runs only every maxAge iterations
maxSegmentsPerCell | - | The maximum number of segments a cell can have
maxSynapsesPerSegment | - | The maximum number of synapses a segment can have
minThreshold | 8 | The minimum required activity for a segment to learn
newSynapseCount | 15 | The maximum number of synapses added to a segment during learning
permanenceDec | 0.10 | How much permanence is removed from synapses when learning occurs
permanenceInc | 0.10 | How much permanence is added to synapses when learning occurs
temporalImp | cpp/py | Controls which temporal memory implementation to use

Table A3: Configuration parameters for the temporal memory
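In NuPIC, values like these are typically collected into one nested model-parameter description handed to the OPF. The sketch below groups the defaults from Tables A1-A3 in that spirit; the group key names and the filled-in encoder values are illustrative, not the exact OPF schema:

```python
# Hypothetical grouping of the hyper-parameters from Tables A1-A3 into one
# nested description, mirroring the shape of an OPF model-params dict.
# Entries marked "illustrative" have no default in the tables.
MODEL_PARAMS = {
    "encoderParams": {            # scalar encoder (Table A2)
        "w": 21,                  # illustrative; the table lists no default
        "minval": 0.0,            # illustrative (normalized power)
        "maxval": 1.0,            # illustrative
        "n": 50,                  # illustrative; must satisfy n > w
    },
    "spParams": {                 # spatial pooler (Table A1)
        "globalInhibition": False,
        "numActivePerInhArea": 10,
        "synPermActiveInc": 0.1,
        "synPermConnected": 0.10,
        "synPermInactiveDec": 0.01,
        "potentialRadius": 16,
    },
    "tmParams": {                 # temporal memory (Table A3)
        "activationThreshold": 12,
        "cellsPerColumn": 32,
        "columnCount": 2048,
        "globalDecay": 0.10,
        "initialPerm": 0.11,
        "maxAge": 100000,
        "minThreshold": 8,
        "newSynapseCount": 15,
        "permanenceDec": 0.10,
        "permanenceInc": 0.10,
        "temporalImp": "cpp",
    },
}

# consistency constraint stated in Table A2
assert MODEL_PARAMS["encoderParams"]["n"] > MODEL_PARAMS["encoderParams"]["w"]
```

Keeping all of the values in one structure like this makes the experiments in Chapter 4 reproducible from a single configuration object.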

Appendix B

Wind characteristics

Figure B1 Wind characteristics for WF 1 and WF The GEFCom dataset showingzonal and meridional wind components the intensity (alpha) of each dot reflectshow much normalized power was generated the stronger the more power


Figure B2: Wind characteristics for WF 3-7, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Appendix C

Error Distribution

[Figure omitted: six histogram panels of forecast error (x-axis, −1.0 to 1.0) against frequency (y-axis, 0 to 70) at lead times 48, 40, 30, 20, 10, and 1; wf1 using nupic.]

Figure C1: Error distribution for different lead times, WF 1 (NuPIC).
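Histograms like the ones summarized above can be produced by binning the normalized forecast errors per lead time. The sketch below uses synthetic, illustrative errors (tight around zero for short lead times, wider for long ones), not the thesis data:

```python
import random

def histogram(errors, bins=20, lo=-1.0, hi=1.0):
    """Bin normalized forecast errors (observed minus predicted power)
    into equal-width histogram counts over [lo, hi], mirroring one
    panel of the appendix figures."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for e in errors:
        if lo <= e <= hi:
            counts[min(int((e - lo) / width), bins - 1)] += 1
    return counts

# Hypothetical errors: tight around zero at lead time 1, wider at 48.
rng = random.Random(0)
lead_1 = [rng.gauss(0.0, 0.05) for _ in range(500)]
lead_48 = [rng.gauss(0.0, 0.25) for _ in range(500)]
print(max(histogram(lead_1)) > max(histogram(lead_48)))  # True
```

The general pattern visible in the figures is exactly this: the error distribution flattens and spreads as the lead time grows.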

[Figure omitted: six histogram panels of forecast error (−1.0 to 1.0) against frequency (0 to 70) at lead times 48, 40, 30, 20, 10, and 1; wf2 using nupic.]

Figure C2: Error distribution for different lead times, WF 2 (NuPIC).

[Figure omitted: six histogram panels of forecast error (−1.0 to 1.0) against frequency (0 to 70) at lead times 48, 40, 30, 20, 10, and 1; wf3 using nupic.]

Figure C3: Error distribution for different lead times, WF 3 (NuPIC).

[Figure omitted: six histogram panels of forecast error (−1.0 to 1.0) against frequency (0 to 70) at lead times 48, 40, 30, 20, 10, and 1; wf4 using nupic.]

Figure C4: Error distribution for different lead times, WF 4 (NuPIC).

[Figure omitted: six histogram panels of forecast error (−1.0 to 1.0) against frequency (0 to 70) at lead times 48, 40, 30, 20, 10, and 1; wf5 using nupic.]

Figure C5: Error distribution for different lead times, WF 5 (NuPIC).

[Figure omitted: six histogram panels of forecast error (−1.0 to 1.0) against frequency (0 to 70) at lead times 48, 40, 30, 20, 10, and 1; wf6 using nupic.]

Figure C6: Error distribution for different lead times, WF 6 (NuPIC).

[Figure omitted: six histogram panels of forecast error (−1.0 to 1.0) against frequency (0 to 70) at lead times 48, 40, 30, 20, 10, and 1; wf7 using nupic.]

Figure C7: Error distribution for different lead times, WF 7 (NuPIC).

[Figure omitted: six histogram panels of forecast error (−1.0 to 1.0) against frequency (0 to 70) at lead times 48, 40, 30, 20, 10, and 1; wf1 using expektra.]

Figure C8: Error distribution for different lead times, WF 1 (Expektra).

[Figure omitted: six histogram panels of forecast error (−1.0 to 1.0) against frequency (0 to 70) at lead times 48, 40, 30, 20, 10, and 1; wf2 using expektra.]

Figure C9: Error distribution for different lead times, WF 2 (Expektra).

[Figure omitted: six histogram panels of forecast error (−1.0 to 1.0) against frequency (0 to 70) at lead times 48, 40, 30, 20, 10, and 1; wf3 using expektra.]

Figure C10: Error distribution for different lead times, WF 3 (Expektra).

[Figure omitted: six histogram panels of forecast error (−1.0 to 1.0) against frequency (0 to 70) at lead times 48, 40, 30, 20, 10, and 1; wf4 using expektra.]

Figure C11: Error distribution for different lead times, WF 4 (Expektra).

[Figure omitted: six histogram panels of forecast error (−1.0 to 1.0) against frequency (0 to 70) at lead times 48, 40, 30, 20, 10, and 1; wf5 using expektra.]

Figure C12: Error distribution for different lead times, WF 5 (Expektra).

[Figure omitted: six histogram panels of forecast error (−1.0 to 1.0) against frequency (0 to 70) at lead times 48, 40, 30, 20, 10, and 1; wf6 using expektra.]

Figure C13: Error distribution for different lead times, WF 6 (Expektra).

[Figure omitted: six histogram panels of forecast error (−1.0 to 1.0) against frequency (0 to 70) at lead times 48, 40, 30, 20, 10, and 1; wf7 using expektra.]

Figure C14: Error distribution for different lead times, WF 7 (Expektra).

List of Figures

2.1 A figure that presents the general outline when forecasting using the statistical approach.
2.2 A figure that presents the general steps when forecasting using a physical model.
3.1 The perceptron.
3.2 Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of the H hidden layers, as well as M input connections. Each edge seen in this graph has a weight w_ij associated with it.
3.3 Information flow of a single-region predictive model created with the OPF.
3.4 The CLAClassifier.
3.5 Training an OPF model.
4.1 Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs. wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).
4.2 Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).
4.3 Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012).
4.4 Different error measurements for WF 1.
4.5 Different error measurements for WF 2.
4.6 Different error measurements for WF 3.
4.7 Different error measurements for WF 4.
4.8 Different error measurements for WF 5.
4.9 Different error measurements for WF 6.
4.10 Different error measurements for WF 7.
4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point: the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind.
4.12 Plot showing the unoptimized version vs. the optimized one when training a network with 10 input neurons; 10, 15, 20, or 25 hidden neurons; and 1 output neuron. Training was done for 100 epochs using LM.
4.13 Summarized average improvement over all wind farms, with 95% confidence intervals.
B1 Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.
B2 Wind characteristics for WF 3-7, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.
C1 Error distribution for different lead times, WF 1 (NuPIC).
C2 Error distribution for different lead times, WF 2 (NuPIC).
C3 Error distribution for different lead times, WF 3 (NuPIC).
C4 Error distribution for different lead times, WF 4 (NuPIC).
C5 Error distribution for different lead times, WF 5 (NuPIC).
C6 Error distribution for different lead times, WF 6 (NuPIC).
C7 Error distribution for different lead times, WF 7 (NuPIC).
C8 Error distribution for different lead times, WF 1 (Expektra).
C9 Error distribution for different lead times, WF 2 (Expektra).
C10 Error distribution for different lead times, WF 3 (Expektra).
C11 Error distribution for different lead times, WF 4 (Expektra).
C12 Error distribution for different lead times, WF 5 (Expektra).
C13 Error distribution for different lead times, WF 6 (Expektra).

C14 Error distribution for different lead times, WF 7 (Expektra).

List of Tables

3.1 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, missing data with power observations are available for updating the models.
3.2 Preprocessed features from the dataset. The Forecast category represents the forecasts given by the NWP; multiple forecasts will exist for a given date. The latest issued forecast available gives the features we will use in training and testing.
3.3 Example, where n = 14, r = 5, ψ = 1, of various encoded scalar values using a ScalarEncoder.
4.1 NRMSE score of the entries published in [Hong et al., 2014]. The NuPIC model and the Expektra model are added so we can easily compare the results.
A1 Configuration parameters for the spatial pooler.
A2 Configuration parameters for the scalar encoder.
A3 Configuration parameters for the temporal memory.


                                  • List of Tables

of terrain, as this has been shown to increase the performance [Kariniotakis et al., 2006]. Another thing to investigate could be to use a more advanced scheme for hyper-parameter selection: Jursa and Rohrig [2008] show that the WPF error can be reduced by smarter use of optimization algorithms for feature selection and hyper-parameter selection, which should improve the performance.
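One simple scheme of the kind referred to here is plain random search over the hyper-parameter space, in the spirit of Bergstra and Bengio [2012]. The sketch below is illustrative: the search space and the toy objective stand in for a real validation-error measure such as NRMSE, and are not the thesis setup:

```python
import random

def random_search(objective, space, n_iter=200, seed=1):
    """Random search over a hyper-parameter space: sample each parameter
    independently from its candidate list and keep the configuration
    with the lowest validation error."""
    rng = random.Random(seed)
    best_cfg, best_err = None, float("inf")
    for _ in range(n_iter):
        cfg = {name: rng.choice(choices) for name, choices in space.items()}
        err = objective(cfg)
        if err < best_err:
            best_cfg, best_err = cfg, err
    return best_cfg, best_err

# Hypothetical toy objective standing in for a validation NRMSE.
space = {"hidden": [5, 10, 15, 20, 25], "lr": [0.001, 0.01, 0.1]}
toy = lambda c: abs(c["hidden"] - 15) / 25 + abs(c["lr"] - 0.01)
best, err = random_search(toy, space)
print(best)  # typically settles near hidden=15, lr=0.01
```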

In general, having more types of inputs from sources like SCADA and NWP systems could give a lot of valuable information that can be used for better predictions. It is clear that wind speed is a key feature for producing good forecasts, which is also supported by the input analysis in section 4.2. It has been argued that the error in WPF models stems largely from wrong weather forecasts [Giebel et al., 2011], so identifying good sources of weather forecasts would be desirable for more effective model learning. Combining several different weather forecasts could also be investigated, as Nielsen et al. [2007] showed that power forecasts based on a number of different meteorological forecasts were better than those based on a single source.
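A minimal sketch of such a combination, assuming each provider supplies a normalized power forecast per hour; the provider names are illustrative, and in practice the weights would be fitted to past performance (e.g. inverse RMSE), not fixed:

```python
def combine_forecasts(forecasts, weights=None):
    """Combine power forecast series from several meteorological
    providers by a weighted average (equal weights by default)."""
    if weights is None:
        weights = [1.0 / len(forecasts)] * len(forecasts)
    total = sum(weights)
    return [sum(w * f[t] for w, f in zip(weights, forecasts)) / total
            for t in range(len(forecasts[0]))]

nwp_a = [0.30, 0.40, 0.50]   # normalized power forecasts, provider A
nwp_b = [0.40, 0.50, 0.60]   # provider B
print([round(x, 2) for x in combine_forecasts([nwp_a, nwp_b])])  # [0.35, 0.45, 0.55]
```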

Regarding HTM CLA, it is worth considering implementing a custom encoder for it: an encoder specifically targeted at data concerning wind farms. SCADA data could also possibly be streamed directly into a NuPIC model, which would allow for some interesting online predictions and anomaly detections.

Training times are always an issue, especially for very large models. Both NuPIC and Expektra's model would benefit from being implemented in a more parallel fashion, as these algorithms are very well suited for parallel computing. The speed-up would be of great value, so looking into and implementing a GPU version of the algorithms could be worthwhile.

5.3 Conclusions

The field of energy forecasting is still young, and the necessity for good forecasting techniques is constantly growing. This initial study shows that part of Expektra's forecasting method can be used to forecast wind power generation. We observe that this ANN is able to achieve performance comparable to other methods published in Hong et al. [2014]. Additional work is still needed to obtain state-of-the-art results. Numenta's model is able to beat the reference model but still needs additional work before we can draw definite conclusions about its performance. Constructing a custom encoder could be a first step in that direction.


Bibliography

Subutai Ahmad and Jeff Hawkins. Properties of sparse distributed representations and their application to hierarchical temporal memory. arXiv preprint arXiv:1503.07469, 2015.

T. C. Akinci. Short term wind speed forecasting with ANN in Batman, Turkey. Elektronika ir Elektrotechnika, 107(1):41-45, 2015.

E. Michael Azoff. Neural Network Time Series Forecasting of Financial Markets. John Wiley & Sons, Inc., 1994.

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13(1):281-305, 2012.

Daniel P. Buxhoeveden and Manuel F. Casanova. The minicolumn hypothesis in neuroscience. Brain, 125(5):935-951, 2002.

Erasmo Cadenas and Wilfrido Rivera. Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks. Renewable Energy, 34(1):274-278, 2009.

Dmitri B. Chklovskii, B. W. Mel, and K. Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782-788, 2004.

A. Costa, A. Crespo, J. Navarro, G. Lizcano, H. Madsen, and E. Feitosa. A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews, 12(6):1725-1744, August 2008. ISSN 1364-0321. doi: 10.1016/j.rser.2007.01.015.

Russ C. Eberhart and James Kennedy. A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science, volume 1, pages 39-43, New York, NY, 1995.

Shu Fan, James R. Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. Energy Conversion, IEEE Transactions on, 24(2):474-482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento - a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276. Citeseer, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition EWEC 2008, 6 pages. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving Hierarchical Temporal Memory-Based Trading Models. Springer, 2013.

M. A. Gaertner, C. Gallardo, C. Tejeda, N. Martínez, S. Calabria, N. Martínez, and B. Fernández. The CASANDRA project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. In Proc. of the 2001 European Wind Energy Conference, EWEC'01, Copenhagen, Denmark, pages 777-780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: A literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G. Lobo. Sipreólico - wind power prediction tool for the Spanish peninsular power system. In Proceedings of the CIGRÉ 40th General Session and Exhibition, París, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On Intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357-363, 2014.

Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359-366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41-46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585-592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694-709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference, EWEC'07, 2007.

G. Kariniotakis. Position paper on Joule project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J. Halliday, R. Brownsword, Ignacio Marti, Ana Maria Palomares, I. Cruz, H. Madsen, T. S. Nielsen, Henrik Aa. Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power - overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, 10 pages, 2006.

G. N. Kariniotakis, G. S. Stavrakakis, and E. F. Nogaret. Wind power forecasting using advanced neural networks models. Energy Conversion, IEEE Transactions on, 11(4):762-767, 1996.

Stanley J. Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326-334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275-293, 2009.

Lars Landberg and Simon J. Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171-195, 1994.

S. Lang, C. Mohrlen, J. Jorgensen, B. Ó. Gallachóir, and E. McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International Solar Energy Society UK Section Conference, volume 85, page 89, 2006.

S. M. Lawan, W. A. W. Z. Abidin, W. Y. Chai, A. Baharun, and T. Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760-1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. Smart Grid, IEEE Transactions on, 5(1):501-510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets, 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164-168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313-2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154-3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa. Nielsen, and Torben S. Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475-489, 2005.

Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431-441, 1963.

Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115-133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons. 1969.

Marvin Lee Minsky and Oliver G. Selfridge. Learning in random nets. MIT Lincoln Laboratory, 1960.

Jorge J. Moré. The Levenberg-Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105-116. Springer, 1978.

Henrik Aa. Nielsen, Torben S. Nielsen, Henrik Madsen, Maria J. Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471-482, 2007.

Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29-34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power, 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159-167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms, v0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33-57, 2007.

A. Rodrigues, J. A. Peças Lopes, P. Miranda, L. Palma, C. Monteiro, R. Bessa, J. Sousa, C. Rodrigues, and J. Matos. EPREV - a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference EWEC, volume 7, 2007.

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5:3, 1988.

Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85-117, 2015.

S. Sinkevicius, R. Simutis, and V. Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91-96, 2011.

Ke-Sheng Wang, Vishal S. Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61-69, 2014.

WWEA. 2014 half-year report, pp. 1-8. Technical report, WWEA, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365-376, 2013.

Hao Yu and Bogdan M. Wilamowski. Levenberg-Marquardt training. Industrial Electronics Handbook, 5:12-1, 2011.


Appendix B

Wind characteristics

Figure B1 Wind characteristics for WF 1 and WF The GEFCom dataset showingzonal and meridional wind components the intensity (alpha) of each dot reflectshow much normalized power was generated the stronger the more power

59

APPENDIX B WIND CHARACTERISTICS

Figure B2 Wind characteristics for WF 3-7 and WF The GEFCom dataset showingzonal and meridional wind components the intensity (alpha) of each dot reflectshow much normalized power was generated the stronger the more power

60

Appendix C

Error Distribution

61

APPENDIX C ERROR DISTRIBUTION

Figure C1: Error distribution for different lead times, WF 1 (using NuPIC). [Figure: histogram panels for lead times 48, 40, 30, 20, 10 and 1; x-axis: error, -1.0 to 1.0; y-axis: frequency, 0 to 70.]

Figure C2: Error distribution for different lead times, WF 2 (using NuPIC). [Figure: histogram panels for lead times 48, 40, 30, 20, 10 and 1; x-axis: error, -1.0 to 1.0; y-axis: frequency, 0 to 70.]

Figure C3: Error distribution for different lead times, WF 3 (using NuPIC). [Figure: histogram panels for lead times 48, 40, 30, 20, 10 and 1; x-axis: error, -1.0 to 1.0; y-axis: frequency, 0 to 70.]

Figure C4: Error distribution for different lead times, WF 4 (using NuPIC). [Figure: histogram panels for lead times 48, 40, 30, 20, 10 and 1; x-axis: error, -1.0 to 1.0; y-axis: frequency, 0 to 70.]

Figure C5: Error distribution for different lead times, WF 5 (using NuPIC). [Figure: histogram panels for lead times 48, 40, 30, 20, 10 and 1; x-axis: error, -1.0 to 1.0; y-axis: frequency, 0 to 70.]

Figure C6: Error distribution for different lead times, WF 6 (using NuPIC). [Figure: histogram panels for lead times 48, 40, 30, 20, 10 and 1; x-axis: error, -1.0 to 1.0; y-axis: frequency, 0 to 70.]

Figure C7: Error distribution for different lead times, WF 7 (using NuPIC). [Figure: histogram panels for lead times 48, 40, 30, 20, 10 and 1; x-axis: error, -1.0 to 1.0; y-axis: frequency, 0 to 70.]

Figure C8: Error distribution for different lead times, WF 1 (using Expektra). [Figure: histogram panels for lead times 48, 40, 30, 20, 10 and 1; x-axis: error, -1.0 to 1.0; y-axis: frequency, 0 to 70.]

Figure C9: Error distribution for different lead times, WF 2 (using Expektra). [Figure: histogram panels for lead times 48, 40, 30, 20, 10 and 1; x-axis: error, -1.0 to 1.0; y-axis: frequency, 0 to 70.]

Figure C10: Error distribution for different lead times, WF 3 (using Expektra). [Figure: histogram panels for lead times 48, 40, 30, 20, 10 and 1; x-axis: error, -1.0 to 1.0; y-axis: frequency, 0 to 70.]

Figure C11: Error distribution for different lead times, WF 4 (using Expektra). [Figure: histogram panels for lead times 48, 40, 30, 20, 10 and 1; x-axis: error, -1.0 to 1.0; y-axis: frequency, 0 to 70.]

Figure C12: Error distribution for different lead times, WF 5 (using Expektra). [Figure: histogram panels for lead times 48, 40, 30, 20, 10 and 1; x-axis: error, -1.0 to 1.0; y-axis: frequency, 0 to 70.]

Figure C13: Error distribution for different lead times, WF 6 (using Expektra). [Figure: histogram panels for lead times 48, 40, 30, 20, 10 and 1; x-axis: error, -1.0 to 1.0; y-axis: frequency, 0 to 70.]

Figure C14: Error distribution for different lead times, WF 7 (using Expektra). [Figure: histogram panels for lead times 48, 40, 30, 20, 10 and 1; x-axis: error, -1.0 to 1.0; y-axis: frequency, 0 to 70.]

List of Figures

2.1 A figure that presents the general outline when forecasting using the statistical approach . . . 6
2.2 A figure that presents the general steps when forecasting using a physical model . . . 7
3.1 The perceptron . . . 20
3.2 Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of H hidden layers, as well as M input connections. Each edge seen in this graph has a w_ij associated with it . . . 21
3.3 Information flow of a single-region predictive model created with the OPF . . . 23
3.4 The CLAClassifier . . . 28
3.5 Training an OPF model . . . 29
4.1 Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) . . . 31
4.2 Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) . . . 32
4.3 Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) . . . 32
4.4 Different error measurements for WF 1 . . . 33
4.5 Different error measurements for WF 2 . . . 34
4.6 Different error measurements for WF 3 . . . 35
4.7 Different error measurements for WF 4 . . . 36
4.8 Different error measurements for WF 5 . . . 37
4.9 Different error measurements for WF 6 . . . 38
4.10 Different error measurements for WF 7 . . . 39
4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point; the reference point is the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind . . . 41
4.12 Plot showing the unoptimized version vs the optimized one when training a network with 10 input neurons, 10/15/20/25 hidden neurons, and 1 output neuron. Training was done for 100 epochs using LM . . . 42
4.13 Summarized average improvement over all wind farms with 95% confidence intervals . . . 43
B.1 Wind characteristics for WF 1 and WF 2. The GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power . . . 59
B.2 Wind characteristics for WF 3-7. The GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power . . . 60
C.1 Error distribution for different lead times, WF 1 . . . 62
C.2 Error distribution for different lead times, WF 2 . . . 63
C.3 Error distribution for different lead times, WF 3 . . . 64
C.4 Error distribution for different lead times, WF 4 . . . 65
C.5 Error distribution for different lead times, WF 5 . . . 66
C.6 Error distribution for different lead times, WF 6 . . . 67
C.7 Error distribution for different lead times, WF 7 . . . 68
C.8 Error distribution for different lead times, WF 1 . . . 69
C.9 Error distribution for different lead times, WF 2 . . . 70
C.10 Error distribution for different lead times, WF 3 . . . 71
C.11 Error distribution for different lead times, WF 4 . . . 72
C.12 Error distribution for different lead times, WF 5 . . . 73
C.13 Error distribution for different lead times, WF 6 . . . 74
C.14 Error distribution for different lead times, WF 7 . . . 75

List of Tables

3.1 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, missing data with power observations are available for updating the models . . . 16
3.2 Preprocessed features from the dataset. The Forecast category represents the forecast given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the feature we will use in training and testing . . . 17
3.3 Example where n = 14, r = 5, ψ = 1 of various encoded scalar values using a ScalarEncoder . . . 24
4.1 NRMSE score of the entries published in [Hong et al., 2014]. The NuPIC model and the Expektra model are added so we can easily compare the results . . . 40
A.1 Table containing configuration parameters for the encoder . . . 55
A.2 Table containing configuration parameters for the spatial pooler . . . 56
A.3 Table containing configuration parameters for the temporal memory . . . 57

www.kth.se


Bibliography

Subutai Ahmad and Jeff Hawkins. Properties of sparse distributed representations and their application to hierarchical temporal memory. arXiv preprint arXiv:1503.07469, 2015.

T.C. Akinci. Short term wind speed forecasting with ANN in Batman, Turkey. Elektronika ir Elektrotechnika, 107(1):41–45, 2015.

E. Michael Azoff. Neural network time series forecasting of financial markets. John Wiley & Sons Inc., 1994.

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13(1):281–305, 2012.

Daniel P. Buxhoeveden and Manuel F. Casanova. The minicolumn hypothesis in neuroscience. Brain, 125(5):935–951, 2002.

Erasmo Cadenas and Wilfrido Rivera. Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks. Renewable Energy, 34(1):274–278, 2009.

Dmitri B. Chklovskii, B.W. Mel, and K. Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782–788, 2004.

A. Costa, A. Crespo, J. Navarro, G. Lizcano, H. Madsen, and E. Feitosa. A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews, 12(6):1725–1744, August 2008. ISSN 1364-0321. doi: 10.1016/j.rser.2007.01.015. URL http://dx.doi.org/10.1016/j.rser.2007.01.015.

Russ C. Eberhart and James Kennedy. A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science, volume 1, pages 39–43. New York, NY, 1995.

Shu Fan, James R. Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. Energy Conversion, IEEE Transactions on, 24(2):474–482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento: a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276. Citeseer, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition EWEC 2008, 6 pages. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving hierarchical temporal memory-based trading models. Springer, 2013.

M.A. Gaertner, C. Gallardo, C. Tejeda, N. Martínez, S. Calabria, N. Martínez, and B. Fernández. The Casandra project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. In Proc. of the 2001 European Wind Energy Conference, EWEC'01, Copenhagen, Denmark, pages 777–780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: a literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G. Lobo. Sipreólico: wind power prediction tool for the Spanish peninsular power system. Proceedings of the CIGRÉ 40th General Session and Exhibition, París, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357–363, 2014.

Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41–46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585–592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694–709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference, EWEC'07, 2007.

G. Kariniotakis. Position paper on JOULE project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J. Halliday, R. Brownsword, Ignacio Marti, Ana Maria Palomares, I. Cruz, H. Madsen, T.S. Nielsen, Henrik Aa. Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power: overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, 10 pages, 2006.

G.N. Kariniotakis, G.S. Stavrakakis, and E.F. Nogaret. Wind power forecasting using advanced neural networks models. Energy Conversion, IEEE Transactions on, 11(4):762–767, 1996.

Stanley J. Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326–334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275–293, 2009.

Lars Landberg and Simon J. Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171–195, 1994.

S. Lang, C. Mohrlen, J. Jorgensen, B. Ó Gallachóir, and E. McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International Solar Energy Society UK Section Conference C, volume 85, page 89, 2006.

S.M. Lawan, W.A.W.Z. Abidin, W.Y. Chai, A. Baharun, and T. Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760–1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. Smart Grid, IEEE Transactions on, 5(1):501–510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets, 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164–168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313–2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154–3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa. Nielsen, and Torben S. Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475–489, 2005.

Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431–441, 1963.

Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons. 1969.

Marvin Lee Minsky and Oliver G. Selfridge. Learning in random nets. MIT Lincoln Laboratory, 1960.

Jorge J. Moré. The Levenberg-Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105–116. Springer, 1978.

Henrik Aa. Nielsen, Torben S. Nielsen, Henrik Madsen, Maria J. Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471–482, 2007.

Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29–34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power, 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159–167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms, v0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33–57, 2007.

A. Rodrigues, J.A. Peças Lopes, P. Miranda, L. Palma, C. Monteiro, R. Bessa, J. Sousa, C. Rodrigues, and J. Matos. EPREV: a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference, EWEC, volume 7, 2007.

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5(3), 1988.

Jürgen Schmidhuber. Deep learning in neural networks: an overview. Neural Networks, 61:85–117, 2015.

S. Sinkevicius, R. Simutis, and V. Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91–96, 2011.

Ke-Sheng Wang, Vishal S. Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61–69, 2014.

WWEA. 2014 half-year report, pp. 1–8. Technical report, WWEA, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365–376, 2013.

Hao Yu and Bogdan M. Wilamowski. Levenberg-Marquardt training. Industrial Electronics Handbook, 5:12-1, 2011.

Appendix A

Hyper-parameters

Parameters for the scalar encoder

w (symbol w): number of bits to set in the output.
minval (symbol v_min): the lower bound of the input value.
maxval (symbol v_max): the upper bound of the input value.
n (symbol n): number of bits in the representation (n must be > w).
radius (symbol r): inputs separated by more than or equal to this distance will have non-overlapping representations.
resolution (symbol ψ): inputs separated by more than or equal to this distance will have different representations.

Table A1: Table containing configuration parameters for the encoder

Parameters for the spatial pooler

columnCount (default -): the number of cell columns in a cortical region.
globalInhibition (default false): if true, the winning columns in the inhibition phase are selected as the most active columns from the region as a whole.
numActivePerInhArea (default 10): the maximum number of active columns per inhibition area.
synPermActiveInc (default 0.1): the amount by which an active synapse is incremented in each round, specified as a percent of a fully grown synapse.
synPermConnected (default 0.10): controls the threshold at which synapses are considered connected.
synPermInactiveDec (default 0.01): the amount by which an inactive synapse is decremented in each round.
potentialRadius (default 16): determines the extent of the input that each column can potentially be connected to.

Table A2: Table containing configuration parameters for the spatial pooler
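The encoder parameters in Table A1 can be illustrated with a minimal scalar encoder: w contiguous bits out of n are set, and the active window slides across the range [minval, maxval], so nearby values share bits. This is a toy sketch, not NuPIC's ScalarEncoder; the function name and the defaults (n = 14, w = 3) are chosen here for illustration.

```python
# Toy scalar encoder: map a value in [minval, maxval] to an n-bit
# representation with w contiguous 1s.  Nearby values get overlapping
# windows of active bits; distant values do not.
# Illustrative only; not NuPIC's ScalarEncoder.

def encode_scalar(value, n=14, w=3, minval=0.0, maxval=1.0):
    """Return an n-element list of 0/1 with w contiguous 1s."""
    if n <= w:
        raise ValueError("n must be greater than w")
    value = max(minval, min(value, maxval))          # clip to the range
    buckets = n - w + 1                              # possible window starts
    start = int(round((buckets - 1) * (value - minval) / (maxval - minval)))
    return [1 if start <= i < start + w else 0 for i in range(n)]
```

For example, `encode_scalar(0.0)` sets the first three bits and `encode_scalar(1.0)` the last three, while values such as 0.5 and 0.6 produce partially overlapping representations.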

Parameter

Alias Default Description

activationThreshold 12 Activation threshold for segmentscellsPerColumn 32 Number of cells per columncolumnCount 2048 The number of cell columns in a cortical

regionglobalDecay 010 Decremented all synapses a little bit all the

timeinitialPerm 011 Initial permanence value for a synapseinputWidth - Size of the inputmaxAge 100000 Controls global decay Global decay will only

decay segments that have not been activatedfor maxAge iterations and will only do theglobal decay loop every maxAge iterations

maxSegmentsPerCel - The maximum number of segments can havemaxSynapsesPerSegment - The maximum number of cells a segment

can haveminThreshold 8 The minimum required activity for a seg-

ment to learnnewSynapseCount 15 The maximum number of synapses added to

a segment during learningpermanenceDec 010 How much permanence that is removed

from synapses when learning occurspermanenceInc 010 How much permanence that is added from

synapses when learning occurstemporalImp cpppy Controls what temporal memory to use

Table A3 Table containing configuration parameters for the temporal memory

57

Appendix B

Wind characteristics

Figure B1 Wind characteristics for WF 1 and WF The GEFCom dataset showingzonal and meridional wind components the intensity (alpha) of each dot reflectshow much normalized power was generated the stronger the more power

59

APPENDIX B WIND CHARACTERISTICS

Figure B2 Wind characteristics for WF 3-7 and WF The GEFCom dataset showingzonal and meridional wind components the intensity (alpha) of each dot reflectshow much normalized power was generated the stronger the more power

60

Appendix C

Error Distribution

61

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf1 using nupic

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using nupic

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using nupic

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using nupic

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using nupic

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using nupic

Figure C1 Error distribution for different lead times WF 1

62

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using nupic

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using nupic

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using nupic

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using nupic

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using nupic

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using nupic

Figure C2 Error distribution for different lead times WF 2

63

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf3 using nupic

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using nupic

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using nupic

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using nupic

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using nupic

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using nupic

Figure C3 Error distribution for different lead times WF 3

64

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using nupic

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using nupic

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using nupic

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using nupic

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using nupic

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using nupic

Figure C4 Error distribution for different lead times WF 4

Figure C.5: Error distribution for different lead times, WF 5 (NuPIC). [Six histogram panels, lead times 1, 10, 20, 30, 40 and 48; x-axis: error in [-1.0, 1.0]; y-axis: frequency.]

Figure C.6: Error distribution for different lead times, WF 6 (NuPIC). [Six histogram panels, lead times 1, 10, 20, 30, 40 and 48; x-axis: error in [-1.0, 1.0]; y-axis: frequency.]

Figure C.7: Error distribution for different lead times, WF 7 (NuPIC). [Six histogram panels, lead times 1, 10, 20, 30, 40 and 48; x-axis: error in [-1.0, 1.0]; y-axis: frequency.]

Figure C.8: Error distribution for different lead times, WF 1 (Expektra). [Six histogram panels, lead times 1, 10, 20, 30, 40 and 48; x-axis: error in [-1.0, 1.0]; y-axis: frequency.]

Figure C.9: Error distribution for different lead times, WF 2 (Expektra). [Six histogram panels, lead times 1, 10, 20, 30, 40 and 48; x-axis: error in [-1.0, 1.0]; y-axis: frequency.]

Figure C.10: Error distribution for different lead times, WF 3 (Expektra). [Six histogram panels, lead times 1, 10, 20, 30, 40 and 48; x-axis: error in [-1.0, 1.0]; y-axis: frequency.]

Figure C.11: Error distribution for different lead times, WF 4 (Expektra). [Six histogram panels, lead times 1, 10, 20, 30, 40 and 48; x-axis: error in [-1.0, 1.0]; y-axis: frequency.]

Figure C.12: Error distribution for different lead times, WF 5 (Expektra). [Six histogram panels, lead times 1, 10, 20, 30, 40 and 48; x-axis: error in [-1.0, 1.0]; y-axis: frequency.]

Figure C.13: Error distribution for different lead times, WF 6 (Expektra). [Six histogram panels, lead times 1, 10, 20, 30, 40 and 48; x-axis: error in [-1.0, 1.0]; y-axis: frequency.]

Figure C.14: Error distribution for different lead times, WF 7 (Expektra). [Six histogram panels, lead times 1, 10, 20, 30, 40 and 48; x-axis: error in [-1.0, 1.0]; y-axis: frequency.]


List of Figures

2.1 A figure that presents the general outline when forecasting using the statistical approach
2.2 A figure that presents the general steps when forecasting using a physical model
3.1 The perceptron
3.2 Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of the H hidden layers, as well as M input connections. Each edge seen in this graph has a weight wij associated with it
3.3 Information flow of a single-region predictive model created with the OPF
3.4 The CLAClassifier
3.5 Training an OPF model
4.1 Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.2 Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.3 Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)
4.4 Different error measurements for WF 1
4.5 Different error measurements for WF 2
4.6 Different error measurements for WF 3
4.7 Different error measurements for WF 4
4.8 Different error measurements for WF 5
4.9 Different error measurements for WF 6
4.10 Different error measurements for WF 7
4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point; the reference point is the network when no channel has been exposed to noise. ws = wind speed, hours/week = timestamps, u, v is a directional vector of the wind
4.12 Plot showing the unoptimized version vs the optimized when training a network with 10 input neurons; 10, 15, 20 or 25 hidden neurons; and 1 output neuron. Training was done for 100 epochs using LM
4.13 Summarized average improvement over all wind farms, with 95% confidence intervals
B.1 Wind characteristics for WF 1 and WF 2. The GEFCom dataset showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power
B.2 Wind characteristics for WF 3–7. The GEFCom dataset showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power
C.1 Error distribution for different lead times, WF 1
C.2 Error distribution for different lead times, WF 2
C.3 Error distribution for different lead times, WF 3
C.4 Error distribution for different lead times, WF 4
C.5 Error distribution for different lead times, WF 5
C.6 Error distribution for different lead times, WF 6
C.7 Error distribution for different lead times, WF 7
C.8 Error distribution for different lead times, WF 1
C.9 Error distribution for different lead times, WF 2
C.10 Error distribution for different lead times, WF 3
C.11 Error distribution for different lead times, WF 4
C.12 Error distribution for different lead times, WF 5
C.13 Error distribution for different lead times, WF 6

C.14 Error distribution for different lead times, WF 7

List of Tables

3.1 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, power observations are available for updating the models
3.2 Preprocessed features from the dataset. The Forecast category represents the forecasts given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the one we will use in training and testing
3.3 Example, where n = 14, r = 5, ψ = 1, of various encoded scalar values using a ScalarEncoder
4.1 NRMSE scores of the entries published in [Hong et al., 2014]. The NuPIC model and Expektra model are added so we can easily compare the results
A.1 Table containing configuration parameters for the spatial pooler
A.2 Table containing configuration parameters for the scalar encoder
A.3 Table containing configuration parameters for the temporal memory

www.kth.se

• Contents
• Introduction
  • Problem Formulation
  • The scope of the problem
• Background
  • Neural Networks and Time Series Prediction
• Method and Materials
  • Preliminaries
    • Remarks
    • Definitions
    • Reference models
    • Error metrics
    • Model selection
    • Evaluation
  • Experiments
  • Holdback Input Randomization
  • Optimization methods
  • Neural Networks
    • Multilayer Perceptron
    • Numenta Platform for Intelligent Computing
• Result
  • Experimental results
  • Input Importance
  • Adaptation and Optimization
  • Summary
• Discussion
  • Method development issues
  • Future improvements and directions
  • Conclusions
• Bibliography
• Appendices
  • Hyper-parameters
  • Wind characteristics
  • Error Distribution
• List of Figures
• List of Tables

BIBLIOGRAPHY

Shu Fan, James R. Liao, Ryuichi Yokoyama, Luonan Chen, and Wei-Jen Lee. Forecasting the wind generation using a two-stage network based on meteorological information. Energy Conversion, IEEE Transactions on, 24(2):474–482, 2009.

Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. Previento: a wind power prediction system with an innovative upscaling algorithm. In Proceedings of the European Wind Energy Conference, Copenhagen, Denmark, volume 276. Citeseer, 2001.

Lionel Fugon, Jérémie Juban, and Georges Kariniotakis. Data mining for wind power forecasting. In European Wind Energy Conference & Exhibition EWEC 2008, 6 pages. EWEC, 2008.

Patrick Gabrielsson, Rikard König, and Ulf Johansson. Evolving hierarchical temporal memory-based trading models. Springer, 2013.

M. A. Gaertner, C. Gallardo, C. Tejeda, N. Martínez, S. Calabria, N. Martínez, and B. Fernández. The CASANDRA project: results of wind power 72-h range daily operational forecasting in Spain. In European Wind Energy Conference, 2003.

Gregor Giebel, Lars Landberg, Alfred Joensen, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. Wind Power for the 21st Century, Kassel, 2000.

Gregor Giebel, Lars Landberg, Torben Skov Nielsen, and Henrik Madsen. The Zephyr project: the next generation prediction system. In Proc. of the 2001 European Wind Energy Conference, EWEC'01, Copenhagen, Denmark, pages 777–780, 2001.

Gregor Giebel, Richard Brownsword, George Kariniotakis, Michael Denhard, and Caroline Draxl. The state-of-the-art in short-term prediction of wind power: A literature overview. Technical report, ANEMOS.plus, 2011.

Gerardo Gonzalez, Belen Diaz-Guerra, Fernando Soto, Sara Lopez, Ismael Sanchez, Julio Usaola, Monica Alonso, and Miguel G. Lobo. Sipreólico: wind power prediction tool for the Spanish peninsular power system. Proceedings of the CIGRÉ 40th general session and exhibition, Paris, France, 2004.

Jeff Hawkins and Sandra Blakeslee. On intelligence. Macmillan, 2007.

Tao Hong, Pierre Pinson, and Shu Fan. Global energy forecasting competition 2012. International Journal of Forecasting, 30(2):357–363, 2014.

Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41–46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585–592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694–709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference, EWEC'07, 2007.

G. Kariniotakis. Position paper on Joule project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J. Halliday, R. Brownsword, Ignacio Marti, Ana Maria Palomares, I. Cruz, H. Madsen, T. S. Nielsen, Henrik Aa. Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power: overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, 10 pages, 2006.

G. N. Kariniotakis, G. S. Stavrakakis, and E. F. Nogaret. Wind power forecasting using advanced neural networks models. Energy Conversion, IEEE Transactions on, 11(4):762–767, 1996.

Stanley J. Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326–334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275–293, 2009.

Lars Landberg and Simon J. Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171–195, 1994.

S. Lang, C. Mohrlen, J. Jorgensen, B. O. Gallachóir, and E. McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International Solar Energy Society UK Section Conference, volume 85, page 89, 2006.

S. M. Lawan, W. A. W. Z. Abidin, W. Y. Chai, A. Baharun, and T. Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760–1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. Smart Grid, IEEE Transactions on, 5(1):501–510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets, 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164–168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313–2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154–3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa. Nielsen, and Torben S. Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475–489, 2005.

Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431–441, 1963.

Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons. 1969.

Marvin Lee Minsky and Oliver G. Selfridge. Learning in random nets. MIT Lincoln Laboratory, 1960.

Jorge J. Moré. The Levenberg-Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105–116. Springer, 1978.

Henrik Aa. Nielsen, Torben S. Nielsen, Henrik Madsen, Maria J. Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471–482, 2007.

Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29–34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power, 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159–167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms, v0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33–57, 2007.

A. Rodrigues, J. A. Peças Lopes, P. Miranda, L. Palma, C. Monteiro, R. Bessa, J. Sousa, C. Rodrigues, and J. Matos. EPREV: a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference EWEC, volume 7, 2007.

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5:3, 1988.

Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85–117, 2015.

S. Sinkevicius, R. Simutis, and V. Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91–96, 2011.

Ke-Sheng Wang, Vishal S. Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61–69, 2014.

WWEA. 2014 half-year report, WWEA, pp. 1–8. Technical report, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365–376, 2013.

Hao Yu and Bogdan M. Wilamowski. Levenberg-Marquardt training. Industrial Electronics Handbook, 5:12–1, 2011.

Appendix A

Hyper-parameters

Parameters for the spatial pooler

Alias                 Default  Description
columnCount           -        The number of cell columns in a cortical region.
globalInhibition      false    If true, during the inhibition phase the winning columns are selected as the most active columns from the region as a whole.
numActivePerInhArea   10       The maximum number of active columns per inhibition area.
synPermActiveInc      0.1      The amount by which an active synapse is incremented in each round, specified as a percent of a fully grown synapse.
synPermConnected      0.10     Controls the threshold at which synapses are considered connected.
synPermInactiveDec    0.01     The amount by which an inactive synapse is decremented in each round.
potentialRadius       16       This parameter determines the extent of the input that each column can potentially be connected to.

Table A.1: Table containing configuration parameters for the spatial pooler.
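The globalInhibition and numActivePerInhArea parameters above together describe a k-winners-take-all step: when globalInhibition is true, the numActivePerInhArea columns with the highest overlap scores in the whole region become active. A minimal plain-Python sketch of that selection (the overlap scores below are hypothetical; this is not the NuPIC implementation):

```python
def select_active_columns(overlaps, num_active_per_inh_area):
    """Region-wide k-winners-take-all: return the indices of the
    num_active_per_inh_area columns with the highest overlap scores."""
    ranked = sorted(range(len(overlaps)),
                    key=lambda i: overlaps[i], reverse=True)
    return sorted(ranked[:num_active_per_inh_area])

# Hypothetical overlap scores for an 8-column region:
overlaps = [3, 0, 7, 1, 9, 2, 5, 4]
print(select_active_columns(overlaps, 3))  # -> [2, 4, 6]
```

With local (non-global) inhibition the same selection would instead be applied separately within each inhibition neighbourhood.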


Parameters for the scalar encoder

Alias       Symbol  Description
w           w       Number of bits to set in the output.
minval      vmin    The lower bound of the input value.
maxval      vmax    The upper bound of the input value.
n           n       Number of bits in the representation (n must be > w).
radius      r       Inputs separated by more than or equal to this distance will have non-overlapping representations.
resolution  ψ       Inputs separated by more than or equal to this distance will have different representations.

Table A.2: Table containing configuration parameters for the scalar encoder.
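The interplay of n, w, minval and maxval can be illustrated with a simplified scalar encoder: the value is mapped to one of the n − w + 1 possible positions and a run of w consecutive bits is set, so nearby values get overlapping representations. A minimal sketch under these assumptions (not the NuPIC ScalarEncoder itself; rounding details differ):

```python
def encode_scalar(value, n=14, w=3, minval=0.0, maxval=10.0):
    """Encode a scalar as n bits with a run of w ones.
    Nearby values share set bits; distant values share none."""
    if not minval <= value <= maxval:
        raise ValueError("value outside [minval, maxval]")
    # Map the value to the start index of the run of ones (0 .. n - w).
    idx = int((value - minval) / (maxval - minval) * (n - w) + 0.5)
    bits = [0] * n
    for i in range(idx, idx + w):
        bits[i] = 1
    return bits

print(encode_scalar(0.0))   # -> [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(encode_scalar(10.0))  # -> [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
```

Exactly w bits are always active, which is the fixed-sparsity property the spatial pooler relies on.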


Parameters for the temporal memory

Alias                  Default  Description
activationThreshold    12       Activation threshold for segments.
cellsPerColumn         32       Number of cells per column.
columnCount            2048     The number of cell columns in a cortical region.
globalDecay            0.10     Decrements all synapses a little bit all the time.
initialPerm            0.11     Initial permanence value for a synapse.
inputWidth             -        Size of the input.
maxAge                 100000   Controls global decay. Global decay will only decay segments that have not been activated for maxAge iterations, and will only do the global decay loop every maxAge iterations.
maxSegmentsPerCell     -        The maximum number of segments a cell can have.
maxSynapsesPerSegment  -        The maximum number of synapses a segment can have.
minThreshold           8        The minimum required activity for a segment to learn.
newSynapseCount        15       The maximum number of synapses added to a segment during learning.
permanenceDec          0.10     How much permanence is removed from synapses when learning occurs.
permanenceInc          0.10     How much permanence is added to synapses when learning occurs.
temporalImp            cpp/py   Controls which temporal memory implementation to use.

Table A.3: Table containing configuration parameters for the temporal memory.


Appendix B

Wind characteristics

Figure B.1: Wind characteristics for WF 1 and WF 2. The GEFCom dataset showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Figure B.2: Wind characteristics for WF 3–7. The GEFCom dataset showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.


Appendix C

Error Distribution


Figure C.1: Error distribution for different lead times, WF 1 (NuPIC). [Six histogram panels, lead times 1, 10, 20, 30, 40 and 48; x-axis: error in [-1.0, 1.0]; y-axis: frequency.]
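The histograms in this appendix count how often each normalized error value occurs at a given lead time. A minimal sketch of how such a distribution can be tallied from paired forecasts and observations (the series below are illustrative values, not the GEFCom data):

```python
def error_histogram(predicted, observed, bins=8, lo=-1.0, hi=1.0):
    """Tally normalized errors (predicted - observed) into
    equal-width bins covering [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for p, o in zip(predicted, observed):
        err = p - o  # both series are normalized to [0, 1]
        idx = min(int((err - lo) / width), bins - 1)
        counts[idx] += 1
    return counts

# Illustrative forecast/observation pairs:
pred = [0.2, 0.5, 0.9, 0.4]
obs  = [0.3, 0.5, 0.6, 0.9]
print(error_histogram(pred, obs, bins=4))  # -> [0, 2, 2, 0]
```

A narrow, zero-centred histogram therefore indicates a model whose errors are small and unbiased at that lead time.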


minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

Figure C13 Error distribution for different lead times WF 6

74

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

Figure C14 Error distribution for different lead times WF 7

75

List of Figures

21 A figure that presents the general outline when forecasting using thestatistical approach 6

22 A figure that presents the general steps when forecasting using a physicalmodel 7

31 The perceptron 2032 Architectural graph of the neural network that will produce a single

output value It consists of a collection of hidden neurons in each Hhidden layers as well as M input connections Each edge seen in thisgraph have a wij associated with it 21

33 Information flow of a single region predictive model created with the OPF 2334 The CLAClassifier 2835 Training a OPF model 29

41 Left Diagram Cumulated probability of wind-speed Right diagramScatter diagram of the power curve production vs wind speed Seenfor Wind Farm 1 GEFCom dataset (hourly data from January 1 2010 toJanuary 1 2012) 31

42 Left Diagram Wind-Speed vs Production Right diagram Wind-Speedvs Direction Seen for Wind Farm 1 GEFCom dataset (hourly data fromJanuary 1 2010 to January 1 2012) 32

43 Left Diagram Wind-Speed vs Production Right diagram Wind-Speedvs Direction Seen for Wind Farm 2 GEFCom dataset (hourly data fromJanuary 1 2010 to January 1 2012) 32

44 Different error measurement for WF 1 3345 Different error measurement for WF 2 3446 Different error measurement for WF 3 3547 Different error measurement for WF 4 3648 Different error measurement for WF 5 37

76

49 Different error measurement for WF 6 38

410 Different error measurement for WF 7 39

411 Relative Input parameter importance using HIPR ldquoall-channelsrdquo reflectsthe reference point The reference point is the network when no channelhave been exposed to noise ws = wind speed hours week = times-tamps u v is a directional vector of the wind 41

412 Plot showing unoptimized version vs the optimized when training anetwork with 10 input neurons 10152025 hidden neurons 1 outputneuron Training was done for 100 epochs using LM 42

413 Summarized average improvement over all wind-farms with 95 confi-dence intervals 43

B1 Wind characteristics for WF 1 and WF The GEFCom dataset showingzonal and meridional wind components the intensity (alpha) of eachdot reflects how much normalized power was generated the strongerthe more power 59

B2 Wind characteristics for WF 3-7 and WF The GEFCom dataset showingzonal and meridional wind components the intensity (alpha) of eachdot reflects how much normalized power was generated the strongerthe more power 60

C1 Error distribution for different lead times WF 1 62

C2 Error distribution for different lead times WF 2 63

C3 Error distribution for different lead times WF 3 64

C4 Error distribution for different lead times WF 4 65

C5 Error distribution for different lead times WF 5 66

C6 Error distribution for different lead times WF 6 67

C7 Error distribution for different lead times WF 7 68

C8 Error distribution for different lead times WF 1 69

C9 Error distribution for different lead times WF 2 70

C10 Error distribution for different lead times WF 3 71

C11 Error distribution for different lead times WF 4 72

C12 Error distribution for different lead times WF 5 73

C13 Error distribution for different lead times WF 6 74

77

List of Tables

C14 Error distribution for different lead times WF 7 75

List of Tables

31 The first repetition of the first period is 8 January 2011 at 0100 to 10January 2011 at 0000 The second repetition of the first period is 15January 2011 at 0100 to 17 January 2011 at 0000 In between theseperiods missing data with power observations are available for updatingthe models 16

32 Preprocessed features from the dataset The Forecast category representthe forecast given by the NWP there will exist multiple forecast for agiven date The latest issued forecast available is the features will willuse in training and testing 17

33 Example were n = 14 r = 5 ψ = 1 of various encoded scalar valuesusing a ScalarEncoder 24

41 NRMSE score of the entries published in [Hong et al 2014] The NuPICmodel and Expektra model are added so we can easily compare the result 40

A1 Table containing configuration parameters for the encoder 55A2 Table containing configuration parameters for the spatial pooler 56A3 Table containing configuration parameters for the temporal memory 57

78

wwwkthse

  • Contents
  • Introduction
    • Problem Formulation
    • The scope of the problem
      • Background
        • Neural Networks and Time Series Prediction
          • Method and Materials
            • Preliminaries
              • Remarks
              • Definitions
              • Reference models
              • Error metrics
              • Model selection
              • Evaluation
                • Experiments
                • Holdback Input Randomization
                • Optimization methods
                • Neural Networks
                  • Multilayer Perceptron
                  • Numenta Platform for Intelligent Computing
                      • Result
                        • Experimental results
                        • Input Importance
                          • Adaptation and Optimization
                            • Summary
                              • Discussion
                                • Method development issues
                                • Future improvements and directions
                                • Conclusions
                                  • Bibliography
                                  • Appendices
                                  • Hyper-parameters
                                  • Wind characteristics
                                  • Error Distribution
                                  • List of Figures
                                  • List of Tables
Page 55: Short-term wind power forecasting using artificial neural ...865336/FULLTEXT01.pdf · 1. This study will focus on short-term forecasting. i.e forecasts done for 1 −48 hours ahead.

Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

Daizheng Huang, Renxi Gong, and Shu Gong. Prediction of wind power by chaos and BP artificial neural networks approach based on genetic algorithm. Journal of Electrical Engineering & Technology, 10(1):41–46, 2015.

Ashu Jain and Avadhnam Madhav Kumar. Hybrid neural network models for hydrologic time series forecasting. Applied Soft Computing, 7(2):585–592, 2007.

René Jursa and Kurt Rohrig. Short-term wind power forecasting using evolutionary algorithms for the automated specification of artificial intelligence models. International Journal of Forecasting, 24(4):694–709, 2008.

René Jursa et al. Wind power prediction with different artificial intelligence models. In Proceedings of the European Wind Energy Conference, EWEC'07, 2007.

G. Kariniotakis. Position paper on Joule project JOR3-CT96-0119, 1997.

Georges Kariniotakis, J. Halliday, R. Brownsword, Ignacio Marti, Ana Maria Palomares, I. Cruz, H. Madsen, T. S. Nielsen, Henrik Aa. Nielsen, Ulrich Focken, et al. Next generation short-term forecasting of wind power: overview of the ANEMOS project. In European Wind Energy Conference, EWEC 2006, 10 pages, 2006.

G. N. Kariniotakis, G. S. Stavrakakis, and E. F. Nogaret. Wind power forecasting using advanced neural networks models. IEEE Transactions on Energy Conversion, 11(4):762–767, 1996.

Stanley J. Kemp, Patricia Zaradic, and Frank Hansen. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204(3):326–334, 2007.

Andrew Kusiak, Haiyang Zheng, and Zhe Song. Wind farm power prediction: a data-mining approach. Wind Energy, 12(3):275–293, 2009.

Lars Landberg and Simon J. Watson. Short-term prediction of local wind conditions. Boundary-Layer Meteorology, 70(1-2):171–195, 1994.

S. Lang, C. Mohrlen, J. Jorgensen, B. O. Gallachóir, and E. McKeogh. Aggregate forecasting of wind generation on the Irish grid using a multi-scheme ensemble prediction system. In International Solar Energy Society UK Section Conference, volume 85, page 89, 2006.

S. M. Lawan, W. A. W. Z. Abidin, W. Y. Chai, A. Baharun, and T. Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760–1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. IEEE Transactions on Smart Grid, 5(1):501–510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets. 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164–168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313–2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154–3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa. Nielsen, and Torben S. Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475–489, 2005.

Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431–441, 1963.

Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons. 1969.

Marvin Lee Minsky and Oliver G. Selfridge. Learning in random nets. MIT Lincoln Laboratory, 1960.

Jorge J. Moré. The Levenberg-Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105–116. Springer, 1978.

Henrik Aa. Nielsen, Torben S. Nielsen, Henrik Madsen, Maria J. Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471–482, 2007.

Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29–34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power. 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159–167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms, v0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33–57, 2007.

A. Rodrigues, J. A. Peças Lopes, P. Miranda, L. Palma, C. Monteiro, R. Bessa, J. Sousa, C. Rodrigues, and J. Matos. EPREV: a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference, EWEC, volume 7, 2007.

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5:3, 1988.

Jürgen Schmidhuber. Deep learning in neural networks: an overview. Neural Networks, 61:85–117, 2015.

S. Sinkevicius, R. Simutis, and V. Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91–96, 2011.

Ke-Sheng Wang, Vishal S. Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61–69, 2014.

WWEA. 2014 half-year report, pp. 1–8. Technical report, WWEA, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365–376, 2013.

Hao Yu and Bogdan M. Wilamowski. Levenberg-Marquardt training. Industrial Electronics Handbook, 5:12-1, 2011.

Appendix A

Hyper-parameters

Parameters for the spatial pooler

Alias | Default | Description
columnCount | - | The number of cell columns in a cortical region.
globalInhibition | false | If true, in the inhibition phase the winning columns are selected as the most active columns from the region as a whole.
numActivePerInhArea | 10 | The maximum number of active columns per inhibition area.
synPermActiveInc | 0.1 | The amount by which an active synapse is incremented in each round, specified as a percent of a fully grown synapse.
synPermConnected | 0.10 | Controls the threshold at which synapses are connected.
synPermInactiveDec | 0.01 | The amount by which an inactive synapse is decremented in each round.
potentialRadius | 16 | Determines the extent of the input that each column can potentially be connected to.

Table A.1: Table containing configuration parameters for the spatial pooler.

55

Parameters for the scalar encoder

Alias | Symbol | Description
w | w | Number of bits to set in the output.
minval | vmin | The lower bound of the input value.
maxval | vmax | The upper bound of the input value.
n | n | Number of bits in the representation (n must be > w).
radius | r | Inputs separated by more than or equal to this distance will have non-overlapping representations.
resolution | ψ | Inputs separated by more than or equal to this distance will have different representations.

Table A.2: Table containing configuration parameters for the scalar encoder.

56
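To make the interaction between n, w, and the value range concrete, the following is a minimal, hand-written sketch of how a NuPIC-style scalar encoder maps a value to a contiguous block of w active bits out of n. It is an illustration only, not the actual nupic.encoders.ScalarEncoder implementation; the function name and rounding scheme are assumptions.

```python
def encode_scalar(value, minval=0.0, maxval=10.0, n=14, w=5):
    """Illustrative NuPIC-style scalar encoding: n bits, w of them active.

    Nearby values share active bits; values far apart have little or no
    overlap, which is what gives the representation its semantics.
    """
    assert n > w, "n must be > w"
    # Clip into range, then pick the index of the first active bit.
    value = max(minval, min(maxval, value))
    buckets = n - w + 1  # number of distinct start positions for the block
    i = int((value - minval) / (maxval - minval) * (buckets - 1) + 0.5)
    return [1 if i <= j < i + w else 0 for j in range(n)]

print(encode_scalar(0.0))   # block of w ones at the left edge
print(encode_scalar(10.0))  # block of w ones at the right edge
```

With n = 14 and w = 5 there are n - w + 1 = 10 possible bit blocks, so at most 10 distinct values can be told apart at this resolution.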

Parameters for the temporal memory

Alias | Default | Description
activationThreshold | 12 | Activation threshold for segments.
cellsPerColumn | 32 | Number of cells per column.
columnCount | 2048 | The number of cell columns in a cortical region.
globalDecay | 0.10 | Decrements all synapses a little bit all the time.
initialPerm | 0.11 | Initial permanence value for a synapse.
inputWidth | - | Size of the input.
maxAge | 100000 | Controls global decay. Global decay will only decay segments that have not been activated for maxAge iterations, and will only do the global decay loop every maxAge iterations.
maxSegmentsPerCell | - | The maximum number of segments a cell can have.
maxSynapsesPerSegment | - | The maximum number of synapses a segment can have.
minThreshold | 8 | The minimum required activity for a segment to learn.
newSynapseCount | 15 | The maximum number of synapses added to a segment during learning.
permanenceDec | 0.10 | How much permanence is removed from synapses when learning occurs.
permanenceInc | 0.10 | How much permanence is added to synapses when learning occurs.
temporalImp | cpp/py | Controls which temporal memory implementation to use.

Table A.3: Table containing configuration parameters for the temporal memory.

57
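In practice these spatial pooler and temporal memory parameters are supplied together as one nested configuration. The fragment below is only an illustration of that shape, using the defaults listed in the tables above; the exact schema NuPIC's OPF expects may differ, and the dictionary keys here are assumptions.

```python
# Illustrative parameter dictionary mirroring Tables A.1 and A.3.
# Not a verified NuPIC OPF config; it only shows how the documented
# defaults fit together.
model_params = {
    "spatialPooler": {
        "columnCount": 2048,
        "globalInhibition": True,
        "numActivePerInhArea": 10,
        "synPermActiveInc": 0.1,
        "synPermConnected": 0.10,
        "synPermInactiveDec": 0.01,
        "potentialRadius": 16,
    },
    "temporalMemory": {
        "activationThreshold": 12,
        "cellsPerColumn": 32,
        "columnCount": 2048,
        "globalDecay": 0.10,
        "initialPerm": 0.11,
        "maxAge": 100000,
        "minThreshold": 8,
        "newSynapseCount": 15,
        "permanenceDec": 0.10,
        "permanenceInc": 0.10,
        "temporalImp": "cpp",
    },
}

# The two algorithms operate on the same columns, so the counts must agree.
assert (model_params["spatialPooler"]["columnCount"]
        == model_params["temporalMemory"]["columnCount"])
```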

Appendix B

Wind characteristics

Figure B.1: Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.

59


Figure B.2: Wind characteristics for WF 3-7, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.

60

Appendix C

Error Distribution

61
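The figures in this appendix are frequency histograms of the normalized forecast error (predicted minus observed power), one panel per lead time and wind farm. A minimal sketch of how one such panel's bin counts can be computed, assuming an input list of errors in [-1, 1] (the function name and binning are illustrative, not taken from the thesis code):

```python
def error_histogram(errors, bins=40, lo=-1.0, hi=1.0):
    """Bin counts for a frequency-vs-error panel over [lo, hi].

    errors: iterable of normalized forecast errors for one wind farm
    and one lead time. Out-of-range values are clipped to the edges.
    """
    counts = [0] * bins
    width = (hi - lo) / bins
    for e in errors:
        e = min(max(e, lo), hi)          # clip into the plotted range
        i = min(int((e - lo) / width), bins - 1)
        counts[i] += 1
    return counts

counts = error_histogram([-0.9, -0.05, 0.0, 0.02, 0.4], bins=4)
print(counts)  # [1, 1, 3, 0]
```

A narrow peak around zero, as in the panels below, indicates that most forecasts land close to the observed production.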


[Figure: six histograms of forecast error frequency for wf1 using nupic, one panel per lead time (48, 40, 30, 20, 10, 1 hours); x-axis: error, -1.0 to 1.0; y-axis: frequency, 0 to 70.]

Figure C.1: Error distribution for different lead times, WF 1.

62

[Figure: six histograms of forecast error frequency for wf2 using nupic, one panel per lead time (48, 40, 30, 20, 10, 1 hours); x-axis: error, -1.0 to 1.0; y-axis: frequency, 0 to 70.]

Figure C.2: Error distribution for different lead times, WF 2.

63

[Figure: six histograms of forecast error frequency for wf3 using nupic, one panel per lead time (48, 40, 30, 20, 10, 1 hours); x-axis: error, -1.0 to 1.0; y-axis: frequency, 0 to 70.]

Figure C.3: Error distribution for different lead times, WF 3.

64

[Figure: six histograms of forecast error frequency for wf4 using nupic, one panel per lead time (48, 40, 30, 20, 10, 1 hours); x-axis: error, -1.0 to 1.0; y-axis: frequency, 0 to 70.]

Figure C.4: Error distribution for different lead times, WF 4.

65

[Figure: six histograms of forecast error frequency for wf5 using nupic, one panel per lead time (48, 40, 30, 20, 10, 1 hours); x-axis: error, -1.0 to 1.0; y-axis: frequency, 0 to 70.]

Figure C.5: Error distribution for different lead times, WF 5.

66

[Figure: six histograms of forecast error frequency for wf6 using nupic, one panel per lead time (48, 40, 30, 20, 10, 1 hours); x-axis: error, -1.0 to 1.0; y-axis: frequency, 0 to 70.]

Figure C.6: Error distribution for different lead times, WF 6.

67

[Figure: six histograms of forecast error frequency for wf7 using nupic, one panel per lead time (48, 40, 30, 20, 10, 1 hours); x-axis: error, -1.0 to 1.0; y-axis: frequency, 0 to 70.]

Figure C.7: Error distribution for different lead times, WF 7.

68

[Figure: six histograms of forecast error frequency for wf1 using expektra, one panel per lead time (48, 40, 30, 20, 10, 1 hours); x-axis: error, -1.0 to 1.0; y-axis: frequency, 0 to 70.]

Figure C.8: Error distribution for different lead times, WF 1.

69

[Figure: six histograms of forecast error frequency for wf2 using expektra, one panel per lead time (48, 40, 30, 20, 10, 1 hours); x-axis: error, -1.0 to 1.0; y-axis: frequency, 0 to 70.]

Figure C.9: Error distribution for different lead times, WF 2.

70

[Figure: six histograms of forecast error frequency for wf3 using expektra, one panel per lead time (48, 40, 30, 20, 10, 1 hours); x-axis: error, -1.0 to 1.0; y-axis: frequency, 0 to 70.]

Figure C.10: Error distribution for different lead times, WF 3.

71

[Figure: six histograms of forecast error frequency for wf4 using expektra, one panel per lead time (48, 40, 30, 20, 10, 1 hours); x-axis: error, -1.0 to 1.0; y-axis: frequency, 0 to 70.]

Figure C.11: Error distribution for different lead times, WF 4.

72

[Figure: six histograms of forecast error frequency for wf5 using expektra, one panel per lead time (48, 40, 30, 20, 10, 1 hours); x-axis: error, -1.0 to 1.0; y-axis: frequency, 0 to 70.]

Figure C.12: Error distribution for different lead times, WF 5.

73

[Figure: six histograms of forecast error frequency for wf6 using expektra, one panel per lead time (48, 40, 30, 20, 10, 1 hours); x-axis: error, -1.0 to 1.0; y-axis: frequency, 0 to 70.]

Figure C.13: Error distribution for different lead times, WF 6.

74

[Figure: six histograms of forecast error frequency for wf7 using expektra, one panel per lead time (48, 40, 30, 20, 10, 1 hours); x-axis: error, -1.0 to 1.0; y-axis: frequency, 0 to 70.]

Figure C.14: Error distribution for different lead times, WF 7.

75

List of Figures

2.1 A figure that presents the general outline when forecasting using the statistical approach. 6
2.2 A figure that presents the general steps when forecasting using a physical model. 7
3.1 The perceptron. 20
3.2 Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of H hidden layers, as well as M input connections. Each edge seen in this graph has a weight wij associated with it. 21
3.3 Information flow of a single-region predictive model created with the OPF. 23
3.4 The CLAClassifier. 28
3.5 Training an OPF model. 29
4.1 Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012). 31
4.2 Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012). 32
4.3 Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012). 32
4.4 Different error measurements for WF 1. 33
4.5 Different error measurements for WF 2. 34
4.6 Different error measurements for WF 3. 35
4.7 Different error measurements for WF 4. 36
4.8 Different error measurements for WF 5. 37
4.9 Different error measurements for WF 6. 38
4.10 Different error measurements for WF 7. 39
4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point, i.e. the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind. 41
4.12 Plot showing the unoptimized version vs the optimized one when training a network with 10 input neurons, 10/15/20/25 hidden neurons, and 1 output neuron. Training was done for 100 epochs using LM. 42
4.13 Summarized average improvement over all wind farms, with 95% confidence intervals. 43
B.1 Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power. 59
B.2 Wind characteristics for WF 3-7, GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power. 60
C.1 Error distribution for different lead times, WF 1. 62
C.2 Error distribution for different lead times, WF 2. 63
C.3 Error distribution for different lead times, WF 3. 64
C.4 Error distribution for different lead times, WF 4. 65
C.5 Error distribution for different lead times, WF 5. 66
C.6 Error distribution for different lead times, WF 6. 67
C.7 Error distribution for different lead times, WF 7. 68
C.8 Error distribution for different lead times, WF 1. 69
C.9 Error distribution for different lead times, WF 2. 70
C.10 Error distribution for different lead times, WF 3. 71
C.11 Error distribution for different lead times, WF 4. 72
C.12 Error distribution for different lead times, WF 5. 73
C.13 Error distribution for different lead times, WF 6. 74
C.14 Error distribution for different lead times, WF 7. 75

List of Tables

3.1 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, the missing data with power observations are available for updating the models. 16
3.2 Preprocessed features from the dataset. The Forecast category represents the forecasts given by the NWP; multiple forecasts will exist for a given date. The latest issued forecast available is the feature we will use in training and testing. 17
3.3 Example where n = 14, r = 5, ψ = 1 of various encoded scalar values using a ScalarEncoder. 24
4.1 NRMSE score of the entries published in [Hong et al., 2014]. The NuPIC model and Expektra model are added so we can easily compare the results. 40
A.1 Table containing configuration parameters for the spatial pooler. 55
A.2 Table containing configuration parameters for the scalar encoder. 56
A.3 Table containing configuration parameters for the temporal memory. 57

78

wwwkthse

  • Contents
  • Introduction
    • Problem Formulation
    • The scope of the problem
      • Background
        • Neural Networks and Time Series Prediction
          • Method and Materials
            • Preliminaries
              • Remarks
              • Definitions
              • Reference models
              • Error metrics
              • Model selection
              • Evaluation
                • Experiments
                • Holdback Input Randomization
                • Optimization methods
                • Neural Networks
                  • Multilayer Perceptron
                  • Numenta Platform for Intelligent Computing
                      • Result
                        • Experimental results
                        • Input Importance
                          • Adaptation and Optimization
                            • Summary
                              • Discussion
                                • Method development issues
                                • Future improvements and directions
                                • Conclusions
                                  • Bibliography
                                  • Appendices
                                  • Hyper-parameters
                                  • Wind characteristics
                                  • Error Distribution
                                  • List of Figures
                                  • List of Tables
Page 56: Short-term wind power forecasting using artificial neural ...865336/FULLTEXT01.pdf · 1. This study will focus on short-term forecasting. i.e forecasts done for 1 −48 hours ahead.

BIBLIOGRAPHY

S.M. Lawan, W.A.W.Z. Abidin, W.Y. Chai, A. Baharun, and T. Masri. Different models of wind speed prediction: a comprehensive review. International Journal of Scientific & Engineering Research, 5(1):1760–1768, 2014.

Duehee Lee and Ross Baldick. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. Smart Grid, IEEE Transactions on, 5(1):501–510, 2014.

Ritchie Lee and Mariam Rajabi. Assessing NuPIC and CLA in a machine learning context using NASA aviation datasets. 2014.

Kenneth Levenberg. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2:164–168, 1944.

Gong Li and Jing Shi. On comparing three artificial neural networks for wind speed forecasting. Applied Energy, 87(7):2313–2320, 2010.

Ziqiao Liu, Wenzhong Gao, Yih-Huei Wan, and Eduard Muljadi. Wind power plant prediction by using neural networks. In Energy Conversion Congress and Exposition (ECCE), 2012 IEEE, pages 3154–3160. IEEE, 2012.

Henrik Madsen, Pierre Pinson, George Kariniotakis, Henrik Aa. Nielsen, and Torben S. Nielsen. Standardizing the performance evaluation of short-term wind power prediction models. Wind Engineering, 29(6):475–489, 2005.

Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431–441, 1963.

Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.

Marvin Minsky and Seymour Papert. Perceptrons. 1969.

Marvin Lee Minsky and Oliver G. Selfridge. Learning in random nets. MIT Lincoln Laboratory, 1960.

Jorge J. Moré. The Levenberg-Marquardt algorithm: implementation and theory. In Numerical Analysis, pages 105–116. Springer, 1978.

Henrik Aa. Nielsen, Torben S. Nielsen, Henrik Madsen, Maria J. Pindado, and Ignacio Marti. Optimal combination of wind power forecasts. Wind Energy, 10(5):471–482, 2007.

Torben Skov Nielsen, Alfred Joensen, Henrik Madsen, Lars Landberg, and Gregor Giebel. A new reference for wind power forecasting. Wind Energy, 1(1):29–34, 1998.

Torben Skov Nielsen, Henrik Madsen, Henrik Aalborg Nielsen, Gregor Giebel, and Lars Landberg. Prediction of regional wind power. 2002.

Harri Niska, Teri Hiltunen, Ari Karppinen, Juhani Ruuskanen, and Mikko Kolehmainen. Evolving the neural network model for forecasting air pollution time series. Engineering Applications of Artificial Intelligence, 17(2):159–167, 2004.

Numenta. Hierarchical temporal memory including HTM cortical learning algorithms, v0.2.1. Technical report, Numenta, 2011.

Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33–57, 2007.

A. Rodrigues, J.A. Peças Lopes, P. Miranda, L. Palma, C. Monteiro, R. Bessa, J. Sousa, C. Rodrigues, and J. Matos. Eprev – a wind power forecasting tool for Portugal. In Proceedings of the European Wind Energy Conference, EWEC, volume 7, 2007.

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5:3, 1988.

Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85–117, 2015.

S. Sinkevicius, R. Simutis, and V. Raudonis. Monitoring of humans traffic using hierarchical temporal memory algorithms. Elektronika ir Elektrotechnika, 115(9):91–96, 2011.

Ke-Sheng Wang, Vishal S. Sharma, and Zhen-You Zhang. SCADA data based condition monitoring of wind turbines. Advances in Manufacturing, 2(1):61–69, 2014.

WWEA. 2014 half-year report, WWEA, pp. 1–8. Technical report, 2014.

Wenxian Yang, Richard Court, and Jiesheng Jiang. Wind turbine condition monitoring by the approach of SCADA data analysis. Renewable Energy, 53:365–376, 2013.

Hao Yu and Bogdan M. Wilamowski. Levenberg-Marquardt training. Industrial Electronics Handbook, 5:12–1, 2011.

Appendix A

Hyper-parameters

Parameters for the spatial pooler

Alias                 Default  Description
columnCount           -        The number of cell columns in a cortical region.
globalInhibition      false    If true, during the inhibition phase the winning columns are selected as the most active columns from the region as a whole.
numActivePerInhArea   10       The maximum number of active columns per inhibition area.
synPermActiveInc      0.1      The amount by which an active synapse is incremented in each round. Specified as a percent of a fully grown synapse.
synPermConnected      0.10     Controls the threshold at which synapses are connected.
synPermInactiveDec    0.01     The amount by which an inactive synapse is decremented in each round.
potentialRadius       16       Determines the extent of the input that each column can potentially be connected to.

Table A.1: Table containing configuration parameters for the spatial pooler.
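To make the synPerm* parameters above concrete, here is a minimal sketch of the spatial pooler's Hebbian-style permanence update for a winning column. The constants are the defaults from the table; the function names are hypothetical and this is an illustrative simplification, not NuPIC's actual implementation.

```python
# Defaults from Table A.1 (illustrative sketch, not NuPIC's real code).
SYN_PERM_ACTIVE_INC = 0.1     # increment for synapses whose input bit was active
SYN_PERM_INACTIVE_DEC = 0.01  # decrement for synapses whose input bit was inactive
SYN_PERM_CONNECTED = 0.10     # permanence at or above this means "connected"

def update_permanences(permanences, active_inputs):
    """One learning round for a winning column: strengthen synapses aligned
    with active input bits, weaken the rest, and clip to [0, 1]."""
    updated = []
    for perm, active in zip(permanences, active_inputs):
        perm += SYN_PERM_ACTIVE_INC if active else -SYN_PERM_INACTIVE_DEC
        updated.append(min(1.0, max(0.0, perm)))
    return updated

def connected_synapses(permanences):
    """Indices of synapses whose permanence has crossed synPermConnected."""
    return [i for i, p in enumerate(permanences) if p >= SYN_PERM_CONNECTED]
```

Repeated rounds of this update are what let frequently co-active input bits become connected (permanence ≥ synPermConnected) while rarely useful synapses decay away.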

Parameters for the scalar encoder

Alias       Symbol  Description
w           w       Number of bits to set in the output.
minval      v_min   The lower bound of the input value.
maxval      v_max   The upper bound of the input value.
n           n       Number of bits in the representation (n must be > w).
radius      r       Inputs separated by more than or equal to this distance will have non-overlapping representations.
resolution  ψ       Inputs separated by more than or equal to this distance will have different representations.

Table A.2: Table containing configuration parameters for the scalar encoder.
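The interaction of w, n, minval and maxval can be illustrated with a small sketch: a scalar is mapped to a block of w consecutive active bits inside an n-bit array. This assumes a simple linear bucketing scheme; the function name encode_scalar is hypothetical and the radius/resolution variants are omitted, so this is not NuPIC's actual ScalarEncoder.

```python
def encode_scalar(value, minval, maxval, n, w):
    """Encode a scalar into an n-bit array with w consecutive active bits.
    Values outside [minval, maxval] are clipped to the bounds."""
    value = max(minval, min(maxval, value))
    # There are n - w + 1 possible start positions for the active block.
    buckets = n - w + 1
    start = int(round((value - minval) / (maxval - minval) * (buckets - 1)))
    bits = [0] * n
    for j in range(start, start + w):
        bits[j] = 1
    return bits
```

With n = 14 and w = 5 (close to the example in Table 3.3), nearby scalar values produce overlapping bit patterns, which is what gives the encoding its semantic-similarity property.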

Parameters for the temporal memory

Alias                  Default  Description
activationThreshold    12       Activation threshold for segments.
cellsPerColumn         32       Number of cells per column.
columnCount            2048     The number of cell columns in a cortical region.
globalDecay            0.10     Decrements all synapses a little bit all the time.
initialPerm            0.11     Initial permanence value for a synapse.
inputWidth             -        Size of the input.
maxAge                 100000   Controls global decay. Global decay will only decay segments that have not been activated for maxAge iterations, and will only do the global decay loop every maxAge iterations.
maxSegmentsPerCell     -        The maximum number of segments a cell can have.
maxSynapsesPerSegment  -        The maximum number of synapses a segment can have.
minThreshold           8        The minimum required activity for a segment to learn.
newSynapseCount        15       The maximum number of synapses added to a segment during learning.
permanenceDec          0.10     How much permanence is removed from synapses when learning occurs.
permanenceInc          0.10     How much permanence is added to synapses when learning occurs.
temporalImp            cpp/py   Controls which temporal memory implementation to use.

Table A.3: Table containing configuration parameters for the temporal memory.
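The roles of activationThreshold and minThreshold can be sketched as follows: a dendrite segment is active (its cell enters the predictive state) when enough connected synapses see active cells, while a weaker bar decides whether the segment is still a candidate for learning. The helper names are hypothetical and this is a simplified illustration, not NuPIC's actual API.

```python
ACTIVATION_THRESHOLD = 12  # connected active synapses needed for an active segment
MIN_THRESHOLD = 8          # weaker bar used when picking a segment for learning
SYN_PERM_CONNECTED = 0.10  # permanence at or above this means "connected"

def segment_activity(synapses, active_cells):
    """Count a segment's synapses onto currently active cells, split into
    connected (permanence >= threshold) and potential (any permanence)."""
    connected = sum(1 for cell, perm in synapses
                    if cell in active_cells and perm >= SYN_PERM_CONNECTED)
    potential = sum(1 for cell, perm in synapses if cell in active_cells)
    return connected, potential

def classify_segment(synapses, active_cells):
    """Classify a segment for this timestep: active, matching, or inactive."""
    connected, potential = segment_activity(synapses, active_cells)
    if connected >= ACTIVATION_THRESHOLD:
        return "active"    # the segment puts its cell into the predictive state
    if potential >= MIN_THRESHOLD:
        return "matching"  # eligible to be reinforced during learning
    return "inactive"
```

A "matching" segment that best fits the current input is the one that receives up to newSynapseCount new synapses and the permanenceInc/permanenceDec updates during learning.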

Appendix B

Wind characteristics

Figure B.1: Wind characteristics for WF 1 and WF 2. The GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.

Figure B.2: Wind characteristics for WF 3–7. The GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.

Appendix C

Error Distribution

[Each figure below consisted of six histogram panels, one per lead time (1, 10, 20, 30, 40 and 48 hours), with the forecast error on the x-axis (-1.0 to 1.0) and the frequency on the y-axis; only the captions and panel titles are recoverable here.]

Figure C.1: Error distribution for different lead times, WF 1 (NuPIC).

Figure C.2: Error distribution for different lead times, WF 2 (NuPIC).

Figure C.3: Error distribution for different lead times, WF 3 (NuPIC).

Figure C.4: Error distribution for different lead times, WF 4 (NuPIC).

Figure C.5: Error distribution for different lead times, WF 5 (NuPIC).

Figure C.6: Error distribution for different lead times, WF 6 (NuPIC).

Figure C.7: Error distribution for different lead times, WF 7 (NuPIC).

Figure C.8: Error distribution for different lead times, WF 1 (Expektra).

Figure C.9: Error distribution for different lead times, WF 2 (Expektra).

Figure C.10: Error distribution for different lead times, WF 3 (Expektra).

Figure C.11: Error distribution for different lead times, WF 4 (Expektra).

Figure C.12: Error distribution for different lead times, WF 5 (Expektra).

Figure C.13: Error distribution for different lead times, WF 6 (Expektra).

Figure C.14: Error distribution for different lead times, WF 7 (Expektra).

List of Figures

2.1 A figure that presents the general outline when forecasting using the statistical approach. 6

2.2 A figure that presents the general steps when forecasting using a physical model. 7

3.1 The perceptron. 20

3.2 Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of the H hidden layers, as well as M input connections. Each edge seen in this graph has a weight w_ij associated with it. 21

3.3 Information flow of a single-region predictive model created with the OPF. 23

3.4 The CLAClassifier. 28

3.5 Training an OPF model. 29

4.1 Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs. wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012). 31

4.2 Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012). 32

4.3 Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012). 32

4.4 Different error measurements for WF 1. 33

4.5 Different error measurements for WF 2. 34

4.6 Different error measurements for WF 3. 35

4.7 Different error measurements for WF 4. 36

4.8 Different error measurements for WF 5. 37

4.9 Different error measurements for WF 6. 38

4.10 Different error measurements for WF 7. 39

4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point; the reference point is the network when no channel has been exposed to noise. ws = wind speed, hours/week = timestamps, u, v is a directional vector of the wind. 41

4.12 Plot showing the unoptimized version vs. the optimized one when training a network with 10 input neurons; 10, 15, 20 or 25 hidden neurons; and 1 output neuron. Training was done for 100 epochs using LM. 42

4.13 Summarized average improvement over all wind farms, with 95% confidence intervals. 43

B.1 Wind characteristics for WF 1 and WF 2. The GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power. 59

B.2 Wind characteristics for WF 3–7. The GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power. 60

C.1 Error distribution for different lead times, WF 1. 62

C.2 Error distribution for different lead times, WF 2. 63

C.3 Error distribution for different lead times, WF 3. 64

C.4 Error distribution for different lead times, WF 4. 65

C.5 Error distribution for different lead times, WF 5. 66

C.6 Error distribution for different lead times, WF 6. 67

C.7 Error distribution for different lead times, WF 7. 68

C.8 Error distribution for different lead times, WF 1. 69

C.9 Error distribution for different lead times, WF 2. 70

C.10 Error distribution for different lead times, WF 3. 71

C.11 Error distribution for different lead times, WF 4. 72

C.12 Error distribution for different lead times, WF 5. 73

C.13 Error distribution for different lead times, WF 6. 74

C.14 Error distribution for different lead times, WF 7. 75

List of Tables

3.1 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods of missing data, power observations are available for updating the models. 16

3.2 Preprocessed features from the dataset. The Forecast category represents the forecasts given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the feature we will use in training and testing. 17

3.3 Example, where n = 14, r = 5, ψ = 1, of various encoded scalar values using a ScalarEncoder. 24

4.1 NRMSE scores of the entries published in [Hong et al., 2014]. The NuPIC model and the Expektra model are added so we can easily compare the results. 40

A.1 Table containing configuration parameters for the spatial pooler. 55

A.2 Table containing configuration parameters for the scalar encoder. 56

A.3 Table containing configuration parameters for the temporal memory. 57


www.kth.se

  • Contents
  • Introduction
    • Problem Formulation
    • The scope of the problem
  • Background
    • Neural Networks and Time Series Prediction
  • Method and Materials
    • Preliminaries
      • Remarks
      • Definitions
    • Reference models
    • Error metrics
    • Model selection
    • Evaluation
    • Experiments
    • Holdback Input Randomization
    • Optimization methods
    • Neural Networks
      • Multilayer Perceptron
      • Numenta Platform for Intelligent Computing
  • Result
    • Experimental results
    • Input Importance
    • Adaptation and Optimization
    • Summary
  • Discussion
    • Method development issues
    • Future improvements and directions
    • Conclusions
  • Bibliography
  • Appendices
    • Hyper-parameters
    • Wind characteristics
    • Error Distribution
  • List of Figures
  • List of Tables
Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using nupic

Figure C5 Error distribution for different lead times WF 5

66

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

Figure C6 Error distribution for different lead times WF 6

67

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf7 using nupic

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using nupic

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using nupic

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using nupic

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using nupic

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using nupic

Figure C7 Error distribution for different lead times WF 7

68

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

Figure C8 Error distribution for different lead times WF 1

69

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf2 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

Figure C9 Error distribution for different lead times WF 2

70

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

Figure C10 Error distribution for different lead times WF 3

71

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf4 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

Figure C11 Error distribution for different lead times WF 4

72

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

Figure C12 Error distribution for different lead times WF 5

73

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf6 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

Figure C13 Error distribution for different lead times WF 6

74

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

Figure C14 Error distribution for different lead times WF 7

75

List of Figures

21 A figure that presents the general outline when forecasting using thestatistical approach 6

22 A figure that presents the general steps when forecasting using a physicalmodel 7

31 The perceptron 2032 Architectural graph of the neural network that will produce a single

output value It consists of a collection of hidden neurons in each Hhidden layers as well as M input connections Each edge seen in thisgraph have a wij associated with it 21

33 Information flow of a single region predictive model created with the OPF 2334 The CLAClassifier 2835 Training a OPF model 29

41 Left Diagram Cumulated probability of wind-speed Right diagramScatter diagram of the power curve production vs wind speed Seenfor Wind Farm 1 GEFCom dataset (hourly data from January 1 2010 toJanuary 1 2012) 31

42 Left Diagram Wind-Speed vs Production Right diagram Wind-Speedvs Direction Seen for Wind Farm 1 GEFCom dataset (hourly data fromJanuary 1 2010 to January 1 2012) 32

43 Left Diagram Wind-Speed vs Production Right diagram Wind-Speedvs Direction Seen for Wind Farm 2 GEFCom dataset (hourly data fromJanuary 1 2010 to January 1 2012) 32

44 Different error measurement for WF 1 3345 Different error measurement for WF 2 3446 Different error measurement for WF 3 3547 Different error measurement for WF 4 3648 Different error measurement for WF 5 37

76

49 Different error measurement for WF 6 38

410 Different error measurement for WF 7 39

411 Relative Input parameter importance using HIPR ldquoall-channelsrdquo reflectsthe reference point The reference point is the network when no channelhave been exposed to noise ws = wind speed hours week = times-tamps u v is a directional vector of the wind 41

412 Plot showing unoptimized version vs the optimized when training anetwork with 10 input neurons 10152025 hidden neurons 1 outputneuron Training was done for 100 epochs using LM 42

413 Summarized average improvement over all wind-farms with 95 confi-dence intervals 43

B1 Wind characteristics for WF 1 and WF The GEFCom dataset showingzonal and meridional wind components the intensity (alpha) of eachdot reflects how much normalized power was generated the strongerthe more power 59

B2 Wind characteristics for WF 3-7 and WF The GEFCom dataset showingzonal and meridional wind components the intensity (alpha) of eachdot reflects how much normalized power was generated the strongerthe more power 60

C1 Error distribution for different lead times WF 1 62

C2 Error distribution for different lead times WF 2 63

C3 Error distribution for different lead times WF 3 64

C4 Error distribution for different lead times WF 4 65

C5 Error distribution for different lead times WF 5 66

C6 Error distribution for different lead times WF 6 67

C7 Error distribution for different lead times WF 7 68

C8 Error distribution for different lead times WF 1 69

C9 Error distribution for different lead times WF 2 70

C10 Error distribution for different lead times WF 3 71

C11 Error distribution for different lead times WF 4 72

C12 Error distribution for different lead times WF 5 73

C13 Error distribution for different lead times WF 6 74

77

List of Tables

C14 Error distribution for different lead times WF 7 75

List of Tables

31 The first repetition of the first period is 8 January 2011 at 0100 to 10January 2011 at 0000 The second repetition of the first period is 15January 2011 at 0100 to 17 January 2011 at 0000 In between theseperiods missing data with power observations are available for updatingthe models 16

32 Preprocessed features from the dataset The Forecast category representthe forecast given by the NWP there will exist multiple forecast for agiven date The latest issued forecast available is the features will willuse in training and testing 17

33 Example were n = 14 r = 5 ψ = 1 of various encoded scalar valuesusing a ScalarEncoder 24

41 NRMSE score of the entries published in [Hong et al 2014] The NuPICmodel and Expektra model are added so we can easily compare the result 40

A1 Table containing configuration parameters for the encoder 55A2 Table containing configuration parameters for the spatial pooler 56A3 Table containing configuration parameters for the temporal memory 57

78

wwwkthse

  • Contents
  • Introduction
    • Problem Formulation
    • The scope of the problem
      • Background
        • Neural Networks and Time Series Prediction
          • Method and Materials
            • Preliminaries
              • Remarks
              • Definitions
              • Reference models
              • Error metrics
              • Model selection
              • Evaluation
                • Experiments
                • Holdback Input Randomization
                • Optimization methods
                • Neural Networks
                  • Multilayer Perceptron
                  • Numenta Platform for Intelligent Computing
                      • Result
                        • Experimental results
                        • Input Importance
                          • Adaptation and Optimization
                            • Summary
                              • Discussion
                                • Method development issues
                                • Future improvements and directions
                                • Conclusions
                                  • Bibliography
                                  • Appendices
                                  • Hyper-parameters
                                  • Wind characteristics
                                  • Error Distribution
                                  • List of Figures
                                  • List of Tables
Page 58: Short-term wind power forecasting using artificial neural ...865336/FULLTEXT01.pdf · 1. This study will focus on short-term forecasting. i.e forecasts done for 1 −48 hours ahead.

Appendix A

Hyper-parameters

Parameters for the spatial pooler

Alias                 Default  Description

columnCount           -        The number of cell columns in a cortical region.
globalInhibition      false    If true, the winning columns in the inhibition phase are selected as the most active columns from the region as a whole.
numActivePerInhArea   10       The maximum number of active columns per inhibition area.
synPermActiveInc      0.1      The amount by which an active synapse is incremented in each round, specified as a percent of a fully grown synapse.
synPermConnected      0.10     The threshold at which a synapse is considered connected.
synPermInactiveDec    0.01     The amount by which an inactive synapse is decremented in each round.
potentialRadius       16       The extent of the input that each column can potentially be connected to.

Table A1: Table containing configuration parameters for the spatial pooler
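As an illustration of how the learning parameters above interact, the following is a minimal sketch of the per-round permanence update the spatial pooler applies to one column's synapses. It only illustrates the rule that synPermActiveInc, synPermInactiveDec and synPermConnected describe; it is not NuPIC's actual implementation.

```python
def update_permanences(permanences, active_inputs,
                       inc=0.1, dec=0.01, connected=0.10):
    """Sketch of a spatial-pooler learning round for one column.

    Synapses whose input bit is active are reinforced by `inc`
    (synPermActiveInc); all others decay by `dec`
    (synPermInactiveDec). A synapse counts as connected once its
    permanence reaches `connected` (synPermConnected).
    """
    new_perms = []
    for idx, perm in enumerate(permanences):
        if idx in active_inputs:
            perm = min(1.0, perm + inc)   # reinforce active synapse
        else:
            perm = max(0.0, perm - dec)   # decay inactive synapse
        new_perms.append(perm)
    connected_mask = [perm >= connected for perm in new_perms]
    return new_perms, connected_mask
```

After enough rounds in which a given input is active, its synapse crosses the synPermConnected threshold and starts contributing to the column's overlap score.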


Parameters for the scalar encoder

Alias       Symbol  Description

w           w       Number of bits to set in the output.
minval      v_min   The lower bound of the input value.
maxval      v_max   The upper bound of the input value.
n           n       Number of bits in the representation (n must be > w).
radius      r       Inputs separated by more than or equal to this distance will have non-overlapping representations.
resolution  ψ       Inputs separated by more than or equal to this distance will have different representations.

Table A2: Table containing configuration parameters for the scalar encoder
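To make the encoder parameters concrete, here is a minimal sketch of a NuPIC-style scalar encoding: w contiguous bits are set out of n, positioned by where the value falls in [minval, maxval], so that nearby values share active bits. The linear bucket mapping is a simplification; NuPIC's ScalarEncoder also derives the layout from radius and resolution.

```python
def encode_scalar(value, minval=0.0, maxval=100.0, n=14, w=5):
    """Sketch of a scalar encoder: a contiguous block of w active
    bits in an n-bit output, positioned by the value's place in
    [minval, maxval]. Nearby values get overlapping blocks."""
    # Clip the input into the supported range.
    value = max(minval, min(maxval, value))
    # There are n - w + 1 possible start positions ("buckets").
    buckets = n - w + 1
    # Map the value linearly onto a bucket index.
    i = int(round((value - minval) / (maxval - minval) * (buckets - 1)))
    bits = [0] * n
    for j in range(i, i + w):
        bits[j] = 1
    return bits
```

With these defaults, encode_scalar(0.0) activates the first five bits and encode_scalar(100.0) the last five, while values such as 50 and 55 share most of their active bits, which is the overlap property the radius and resolution parameters control.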


Parameters for the temporal memory

Alias                  Default  Description

activationThreshold    12       Activation threshold for segments.
cellsPerColumn         32       Number of cells per column.
columnCount            2048     The number of cell columns in a cortical region.
globalDecay            0.10     Decrements all synapses a little bit all the time.
initialPerm            0.11     Initial permanence value for a synapse.
inputWidth             -        Size of the input.
maxAge                 100000   Controls global decay. Global decay will only decay segments that have not been activated for maxAge iterations, and the global decay loop is only run every maxAge iterations.
maxSegmentsPerCell     -        The maximum number of segments a cell can have.
maxSynapsesPerSegment  -        The maximum number of synapses a segment can have.
minThreshold           8        The minimum required activity for a segment to learn.
newSynapseCount        15       The maximum number of synapses added to a segment during learning.
permanenceDec          0.10     How much permanence is removed from synapses when learning occurs.
permanenceInc          0.10     How much permanence is added to synapses when learning occurs.
temporalImp            cpp/py   Controls which temporal memory implementation to use.

Table A3: Table containing configuration parameters for the temporal memory
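A small sketch of how activationThreshold and minThreshold are typically used when evaluating a dendritic segment: the segment drives a prediction when enough of its connected synapses are active, and at the lower minThreshold it is merely a candidate for learning. The "matching" label below is shorthand for illustration, not a NuPIC identifier.

```python
def segment_state(active_connected_synapses,
                  activation_threshold=12, min_threshold=8):
    """Classify a dendritic segment by its count of active,
    connected synapses.

    - 'active':   >= activationThreshold; the segment fires and puts
                  its cell into a predictive state.
    - 'matching': >= minThreshold; not enough to fire, but enough to
                  be selected for learning.
    - 'inactive': below both thresholds.
    """
    if active_connected_synapses >= activation_threshold:
        return "active"
    if active_connected_synapses >= min_threshold:
        return "matching"
    return "inactive"
```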


Appendix B

Wind characteristics

Figure B1: Wind characteristics for WF 1 and WF 2. The GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger the intensity, the more power.


Figure B2: Wind characteristics for WF 3-7. The GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger the intensity, the more power.


Appendix C

Error Distribution


[Each of Figures C1-C14 consists of six histogram panels, one per lead time (48, 40, 30, 20, 10 and 1 hours ahead), with the error on the x-axis (-1.0 to 1.0) and the frequency on the y-axis (0 to 70).]

Figure C1: Error distribution for different lead times, WF 1 (NuPIC model).

Figure C2: Error distribution for different lead times, WF 2 (NuPIC model).

Figure C3: Error distribution for different lead times, WF 3 (NuPIC model).

Figure C4: Error distribution for different lead times, WF 4 (NuPIC model).

Figure C5: Error distribution for different lead times, WF 5 (NuPIC model).

Figure C6: Error distribution for different lead times, WF 6 (NuPIC model).

Figure C7: Error distribution for different lead times, WF 7 (NuPIC model).

Figure C8: Error distribution for different lead times, WF 1 (Expektra model).

Figure C9: Error distribution for different lead times, WF 2 (Expektra model).

Figure C10: Error distribution for different lead times, WF 3 (Expektra model).

Figure C11: Error distribution for different lead times, WF 4 (Expektra model).

Figure C12: Error distribution for different lead times, WF 5 (Expektra model).

Figure C13: Error distribution for different lead times, WF 6 (Expektra model).

Figure C14: Error distribution for different lead times, WF 7 (Expektra model).

List of Figures

2.1  A figure that presents the general outline when forecasting using the statistical approach .... 6
2.2  A figure that presents the general steps when forecasting using a physical model .... 7
3.1  The perceptron .... 20
3.2  Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of the H hidden layers, as well as M input connections. Each edge seen in this graph has a weight w_ij associated with it .... 21
3.3  Information flow of a single-region predictive model created with the OPF .... 23
3.4  The CLAClassifier .... 28
3.5  Training an OPF model .... 29
4.1  Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) .... 31
4.2  Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) .... 32
4.3  Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) .... 32
4.4  Different error measurements for WF 1 .... 33
4.5  Different error measurements for WF 2 .... 34
4.6  Different error measurements for WF 3 .... 35
4.7  Different error measurements for WF 4 .... 36
4.8  Different error measurements for WF 5 .... 37
4.9  Different error measurements for WF 6 .... 38
4.10 Different error measurements for WF 7 .... 39
4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point; the reference point is the network when no channel has been exposed to noise. ws = wind speed, hours/week = timestamps, u, v is a directional vector of the wind .... 41
4.12 Plot showing the unoptimized version vs the optimized one when training a network with 10 input neurons; 10, 15, 20 or 25 hidden neurons; and 1 output neuron. Training was done for 100 epochs using LM .... 42
4.13 Summarized average improvement over all wind farms, with 95% confidence intervals .... 43
B.1  Wind characteristics for WF 1 and WF 2. The GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger the intensity, the more power .... 59
B.2  Wind characteristics for WF 3-7. The GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger the intensity, the more power .... 60
C.1  Error distribution for different lead times, WF 1 .... 62
C.2  Error distribution for different lead times, WF 2 .... 63
C.3  Error distribution for different lead times, WF 3 .... 64
C.4  Error distribution for different lead times, WF 4 .... 65
C.5  Error distribution for different lead times, WF 5 .... 66
C.6  Error distribution for different lead times, WF 6 .... 67
C.7  Error distribution for different lead times, WF 7 .... 68
C.8  Error distribution for different lead times, WF 1 .... 69
C.9  Error distribution for different lead times, WF 2 .... 70
C.10 Error distribution for different lead times, WF 3 .... 71
C.11 Error distribution for different lead times, WF 4 .... 72
C.12 Error distribution for different lead times, WF 5 .... 73
C.13 Error distribution for different lead times, WF 6 .... 74
C.14 Error distribution for different lead times, WF 7 .... 75

List of Tables

3.1  The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, missing data with power observations are available for updating the models .... 16
3.2  Preprocessed features from the dataset. The Forecast category represents the forecast given by the NWP; multiple forecasts will exist for a given date. The latest issued forecast available is the feature we will use in training and testing .... 17
3.3  Example, where n = 14, r = 5, ψ = 1, of various encoded scalar values using a ScalarEncoder .... 24
4.1  NRMSE score of the entries published in [Hong et al., 2014]. The NuPIC model and the Expektra model are added so we can easily compare the results .... 40
A.1  Table containing configuration parameters for the spatial pooler .... 55
A.2  Table containing configuration parameters for the scalar encoder .... 56
A.3  Table containing configuration parameters for the temporal memory .... 57

78

wwwkthse

  • Contents
  • Introduction
    • Problem Formulation
    • The scope of the problem
      • Background
        • Neural Networks and Time Series Prediction
          • Method and Materials
            • Preliminaries
              • Remarks
              • Definitions
              • Reference models
              • Error metrics
              • Model selection
              • Evaluation
                • Experiments
                • Holdback Input Randomization
                • Optimization methods
                • Neural Networks
                  • Multilayer Perceptron
                  • Numenta Platform for Intelligent Computing
                      • Result
                        • Experimental results
                        • Input Importance
                          • Adaptation and Optimization
                            • Summary
                              • Discussion
                                • Method development issues
                                • Future improvements and directions
                                • Conclusions
                                  • Bibliography
                                  • Appendices
                                  • Hyper-parameters
                                  • Wind characteristics
                                  • Error Distribution
                                  • List of Figures
                                  • List of Tables
Page 59: Short-term wind power forecasting using artificial neural ...865336/FULLTEXT01.pdf · 1. This study will focus on short-term forecasting. i.e forecasts done for 1 −48 hours ahead.

APPENDIX A HYPER-PARAMETERS

Parameters for the scalar encoder

Alias        Symbol   Description
w            w        number of bits to set in the output
minval       vmin     the lower bound of the input value
maxval       vmax     the upper bound of the input value
n            n        number of bits in the representation (n must be > w)
radius       r        inputs separated by more than or equal to this distance will have non-overlapping representations
resolution   ψ        inputs separated by more than or equal to this distance will have different representations

Table A2 Table containing configuration parameters for the spatial pooler

56
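The encoder parameters above can be illustrated with a small sketch. This is a simplified, hypothetical re-implementation of the scalar encoding idea, not NuPIC's actual ScalarEncoder: w contiguous bits out of n are set to one, and the resolution is derived from n and w for a non-periodic encoder.

```python
# Simplified sketch of a scalar encoder (assumption: non-periodic encoding;
# parameter names follow the table above).
def scalar_encode(value, minval, maxval, n, w):
    """Encode a scalar into an n-bit list with w contiguous active bits."""
    assert n > w, "n must be > w"
    # Clip the input into [minval, maxval].
    value = max(minval, min(maxval, value))
    # Inputs closer than the resolution share a representation.
    resolution = (maxval - minval) / (n - w)
    # Index of the first active bit.
    i = int(round((value - minval) / resolution))
    bits = [0] * n
    for j in range(i, i + w):
        bits[j] = 1
    return bits
```

For example, scalar_encode(0, 0, 10, 14, 5) activates the first five of fourteen bits, and nearby inputs produce overlapping bit patterns, which is the property the spatial pooler relies on.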

Parameter               Default   Description
activationThreshold     12        Activation threshold for segments
cellsPerColumn          32        Number of cells per column
columnCount             2048      The number of cell columns in a cortical region
globalDecay             0.10      Decrements all synapses a little bit all the time
initialPerm             0.11      Initial permanence value for a synapse
inputWidth              -         Size of the input
maxAge                  100000    Controls global decay. Global decay will only decay segments that have not been activated for maxAge iterations, and will only run the global decay loop every maxAge iterations
maxSegmentsPerCell      -         The maximum number of segments a cell can have
maxSynapsesPerSegment   -         The maximum number of synapses a segment can have
minThreshold            8         The minimum required activity for a segment to learn
newSynapseCount         15        The maximum number of synapses added to a segment during learning
permanenceDec           0.10      How much permanence is removed from synapses when learning occurs
permanenceInc           0.10      How much permanence is added to synapses when learning occurs
temporalImp             cpp/py    Controls which temporal memory implementation to use

Table A3: Table containing configuration parameters for the temporal memory

57
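The roles of permanenceInc and permanenceDec in Table A3 can be sketched as follows. This is an illustrative simplification of Hebbian-style permanence learning, not NuPIC's actual code: synapses connected to active inputs are reinforced, all others are weakened, and permanences stay in [0, 1].

```python
# Sketch (assumption: simplified HTM-style learning rule) of how
# permanenceInc and permanenceDec adjust synapse permanences.
def update_permanences(permanences, active_inputs,
                       permanence_inc=0.10, permanence_dec=0.10):
    """Reinforce synapses to active inputs, weaken the rest."""
    updated = []
    for i, p in enumerate(permanences):
        if i in active_inputs:
            p += permanence_inc   # reinforce contributing synapses
        else:
            p -= permanence_dec   # weaken the others
        updated.append(min(1.0, max(0.0, p)))  # clip to [0, 1]
    return updated
```

A synapse whose permanence crosses a connection threshold (initialPerm gives the starting value) becomes a connected synapse; repeated decrements, as with globalDecay, eventually disconnect unused ones.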

Appendix B

Wind characteristics

Figure B1: Wind characteristics for WF 1 and WF 2. The GEFCom dataset showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.

59


Figure B2: Wind characteristics for WF 3-7. The GEFCom dataset showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power.

60

Appendix C

Error Distribution

61

Figures C1-C14 each show six histograms of the forecast error, one panel per lead time (1, 10, 20, 30, 40 and 48 hours ahead); the x-axis spans errors from -1.0 to 1.0 and the y-axis shows frequencies from 0 to 70.

Figure C1: Error distribution for different lead times, WF 1 (NuPIC model)
Figure C2: Error distribution for different lead times, WF 2 (NuPIC model)
Figure C3: Error distribution for different lead times, WF 3 (NuPIC model)
Figure C4: Error distribution for different lead times, WF 4 (NuPIC model)
Figure C5: Error distribution for different lead times, WF 5 (NuPIC model)
Figure C6: Error distribution for different lead times, WF 6 (NuPIC model)
Figure C7: Error distribution for different lead times, WF 7 (NuPIC model)
Figure C8: Error distribution for different lead times, WF 1 (Expektra model)
Figure C9: Error distribution for different lead times, WF 2 (Expektra model)
Figure C10: Error distribution for different lead times, WF 3 (Expektra model)
Figure C11: Error distribution for different lead times, WF 4 (Expektra model)
Figure C12: Error distribution for different lead times, WF 5 (Expektra model)
Figure C13: Error distribution for different lead times, WF 6 (Expektra model)
Figure C14: Error distribution for different lead times, WF 7 (Expektra model)

List of Figures

2.1 A figure that presents the general outline when forecasting using the statistical approach ... 6
2.2 A figure that presents the general steps when forecasting using a physical model ... 7
3.1 The perceptron ... 20
3.2 Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of the H hidden layers, as well as M input connections. Each edge seen in this graph has a weight wij associated with it ... 21
3.3 Information flow of a single-region predictive model created with the OPF ... 23
3.4 The CLAClassifier ... 28
3.5 Training an OPF model ... 29
4.1 Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) ... 31
4.2 Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) ... 32
4.3 Left diagram: wind speed vs production. Right diagram: wind speed vs direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) ... 32
4.4 Different error measurements for WF 1 ... 33
4.5 Different error measurements for WF 2 ... 34
4.6 Different error measurements for WF 3 ... 35
4.7 Different error measurements for WF 4 ... 36
4.8 Different error measurements for WF 5 ... 37
4.9 Different error measurements for WF 6 ... 38
4.10 Different error measurements for WF 7 ... 39
4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point; the reference point is the network when no channel has been exposed to noise. ws = wind speed, hours, week = timestamps, u, v is a directional vector of the wind ... 41
4.12 Plot showing the unoptimized version vs the optimized when training a network with 10 input neurons, 10, 15, 20, 25 hidden neurons and 1 output neuron. Training was done for 100 epochs using LM ... 42
4.13 Summarized average improvement over all wind farms, with 95% confidence intervals ... 43
B.1 Wind characteristics for WF 1 and WF 2. The GEFCom dataset showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power ... 59
B.2 Wind characteristics for WF 3-7. The GEFCom dataset showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power ... 60
C.1 Error distribution for different lead times, WF 1 ... 62
C.2 Error distribution for different lead times, WF 2 ... 63
C.3 Error distribution for different lead times, WF 3 ... 64
C.4 Error distribution for different lead times, WF 4 ... 65
C.5 Error distribution for different lead times, WF 5 ... 66
C.6 Error distribution for different lead times, WF 6 ... 67
C.7 Error distribution for different lead times, WF 7 ... 68
C.8 Error distribution for different lead times, WF 1 ... 69
C.9 Error distribution for different lead times, WF 2 ... 70
C.10 Error distribution for different lead times, WF 3 ... 71
C.11 Error distribution for different lead times, WF 4 ... 72
C.12 Error distribution for different lead times, WF 5 ... 73
C.13 Error distribution for different lead times, WF 6 ... 74
C.14 Error distribution for different lead times, WF 7 ... 75

List of Tables

3.1 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods of missing data, power observations are available for updating the models ... 16

3.2 Preprocessed features from the dataset. The Forecast category represents the forecasts given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the feature we will use in training and testing ... 17

3.3 Example, where n = 14, r = 5, ψ = 1, of various encoded scalar values using a ScalarEncoder ... 24

4.1 NRMSE score of the entries published in [Hong et al., 2014]. The NuPIC model and Expektra model are added so we can easily compare the results ... 40

A.1 Table containing configuration parameters for the encoder ... 55
A.2 Table containing configuration parameters for the spatial pooler ... 56
A.3 Table containing configuration parameters for the temporal memory ... 57

78

www.kth.se

  • Contents
  • Introduction
    • Problem Formulation
    • The scope of the problem
  • Background
    • Neural Networks and Time Series Prediction
  • Method and Materials
    • Preliminaries
      • Remarks
      • Definitions
    • Reference models
    • Error metrics
    • Model selection
    • Evaluation
    • Experiments
    • Holdback Input Randomization
    • Optimization methods
    • Neural Networks
      • Multilayer Perceptron
      • Numenta Platform for Intelligent Computing
  • Result
    • Experimental results
    • Input Importance
    • Adaptation and Optimization
    • Summary
  • Discussion
    • Method development issues
    • Future improvements and directions
    • Conclusions
  • Bibliography
  • Appendices
    • Hyper-parameters
    • Wind characteristics
    • Error Distribution
  • List of Figures
  • List of Tables

Parameter

Alias Default Description

activationThreshold 12 Activation threshold for segmentscellsPerColumn 32 Number of cells per columncolumnCount 2048 The number of cell columns in a cortical

regionglobalDecay 010 Decremented all synapses a little bit all the

timeinitialPerm 011 Initial permanence value for a synapseinputWidth - Size of the inputmaxAge 100000 Controls global decay Global decay will only

decay segments that have not been activatedfor maxAge iterations and will only do theglobal decay loop every maxAge iterations

maxSegmentsPerCel - The maximum number of segments can havemaxSynapsesPerSegment - The maximum number of cells a segment

can haveminThreshold 8 The minimum required activity for a seg-

ment to learnnewSynapseCount 15 The maximum number of synapses added to

a segment during learningpermanenceDec 010 How much permanence that is removed

from synapses when learning occurspermanenceInc 010 How much permanence that is added from

synapses when learning occurstemporalImp cpppy Controls what temporal memory to use

Table A3 Table containing configuration parameters for the temporal memory

57

Appendix B

Wind characteristics

Figure B1 Wind characteristics for WF 1 and WF The GEFCom dataset showingzonal and meridional wind components the intensity (alpha) of each dot reflectshow much normalized power was generated the stronger the more power

59

APPENDIX B WIND CHARACTERISTICS

Figure B2 Wind characteristics for WF 3-7 and WF The GEFCom dataset showingzonal and meridional wind components the intensity (alpha) of each dot reflectshow much normalized power was generated the stronger the more power

60

Appendix C

Error Distribution

61

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf1 using nupic

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using nupic

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using nupic

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using nupic

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using nupic

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using nupic

Figure C1 Error distribution for different lead times WF 1

62

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using nupic

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using nupic

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using nupic

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using nupic

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using nupic

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using nupic

Figure C2 Error distribution for different lead times WF 2

63

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf3 using nupic

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using nupic

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using nupic

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using nupic

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using nupic

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using nupic

Figure C3 Error distribution for different lead times WF 3

64

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using nupic

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using nupic

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using nupic

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using nupic

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using nupic

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using nupic

Figure C4 Error distribution for different lead times WF 4

65

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf5 using nupic

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using nupic

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using nupic

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using nupic

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using nupic

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using nupic

Figure C5 Error distribution for different lead times WF 5

66

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

Figure C6 Error distribution for different lead times WF 6

67

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf7 using nupic

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using nupic

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using nupic

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using nupic

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using nupic

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using nupic

Figure C7 Error distribution for different lead times WF 7

68

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

Figure C8 Error distribution for different lead times WF 1

69

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf2 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

Figure C9 Error distribution for different lead times WF 2

70

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

Figure C10 Error distribution for different lead times WF 3

71

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf4 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

Figure C11 Error distribution for different lead times WF 4

72

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

Figure C12 Error distribution for different lead times WF 5

73

APPENDIX C ERROR DISTRIBUTION

Figure C13: Error distribution for different lead times, WF 6. [Histogram panels for lead times 1, 10, 20, 30, 40, and 48; x-axis: error (−1.0 to 1.0), y-axis: frequency (0 to 70).]

Figure C14: Error distribution for different lead times, WF 7. [Histogram panels for lead times 1, 10, 20, 30, 40, and 48; x-axis: error (−1.0 to 1.0), y-axis: frequency (0 to 70).]

List of Figures

2.1 A figure that presents the general outline when forecasting using the statistical approach . . . 6

2.2 A figure that presents the general steps when forecasting using a physical model . . . 7

3.1 The perceptron . . . 20

3.2 Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of H hidden layers, as well as M input connections. Each edge seen in this graph has a weight wij associated with it . . . 21

3.3 Information flow of a single-region predictive model created with the OPF . . . 23

3.4 The CLAClassifier . . . 28

3.5 Training an OPF model . . . 29

4.1 Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs. wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) . . . 31

4.2 Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) . . . 32

4.3 Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) . . . 32

4.4 Different error measurements for WF 1 . . . 33

4.5 Different error measurements for WF 2 . . . 34

4.6 Different error measurements for WF 3 . . . 35

4.7 Different error measurements for WF 4 . . . 36

4.8 Different error measurements for WF 5 . . . 37

4.9 Different error measurements for WF 6 . . . 38

4.10 Different error measurements for WF 7 . . . 39

4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point; the reference point is the network when no channel has been exposed to noise. ws = wind speed, hours/week = timestamps, u, v is a directional vector of the wind . . . 41

4.12 Plot showing the unoptimized version vs. the optimized one when training a network with 10 input neurons; 10, 15, 20, 25 hidden neurons; 1 output neuron. Training was done for 100 epochs using LM . . . 42

4.13 Summarized average improvement over all wind farms with 95% confidence intervals . . . 43

B.1 Wind characteristics for WF 1 and WF 2. The GEFCom dataset showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated, the stronger the more power . . . 59

B.2 Wind characteristics for WF 3-7. The GEFCom dataset showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated, the stronger the more power . . . 60

C.1 Error distribution for different lead times, WF 1 . . . 62

C.2 Error distribution for different lead times, WF 2 . . . 63

C.3 Error distribution for different lead times, WF 3 . . . 64

C.4 Error distribution for different lead times, WF 4 . . . 65

C.5 Error distribution for different lead times, WF 5 . . . 66

C.6 Error distribution for different lead times, WF 6 . . . 67

C.7 Error distribution for different lead times, WF 7 . . . 68

C.8 Error distribution for different lead times, WF 1 . . . 69

C.9 Error distribution for different lead times, WF 2 . . . 70

C.10 Error distribution for different lead times, WF 3 . . . 71

C.11 Error distribution for different lead times, WF 4 . . . 72

C.12 Error distribution for different lead times, WF 5 . . . 73

C.13 Error distribution for different lead times, WF 6 . . . 74

C.14 Error distribution for different lead times, WF 7 . . . 75

List of Tables

3.1 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, missing data with power observations are available for updating the models . . . 16

3.2 Preprocessed features from the dataset. The Forecast category represents the forecast given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the set of features we will use in training and testing . . . 17

3.3 Example, where n = 14, r = 5, ψ = 1, of various encoded scalar values using a ScalarEncoder . . . 24

4.1 NRMSE score of the entries published in [Hong et al., 2014]. The NuPIC model and the Expektra model are added so we can easily compare the results . . . 40

A.1 Table containing configuration parameters for the encoder . . . 55

A.2 Table containing configuration parameters for the spatial pooler . . . 56

A.3 Table containing configuration parameters for the temporal memory . . . 57

www.kth.se


minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

Figure C13 Error distribution for different lead times WF 6

74

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

Figure C14 Error distribution for different lead times WF 7

75

List of Figures

2.1 A figure that presents the general outline when forecasting using the statistical approach 6
2.2 A figure that presents the general steps when forecasting using a physical model 7
3.1 The perceptron 20
3.2 Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of the H hidden layers, as well as M input connections. Each edge seen in this graph has a weight w_ij associated with it 21
3.3 Information flow of a single-region predictive model created with the OPF 23
3.4 The CLAClassifier 28
3.5 Training an OPF model 29
4.1 Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs. wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) 31
4.2 Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) 32
4.3 Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) 32
4.4 Different error measurements for WF 1 33
4.5 Different error measurements for WF 2 34
4.6 Different error measurements for WF 3 35
4.7 Different error measurements for WF 4 36
4.8 Different error measurements for WF 5 37
4.9 Different error measurements for WF 6 38
4.10 Different error measurements for WF 7 39
4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point; the reference point is the network when no channel has been exposed to noise. ws = wind speed, hours/week = timestamps, (u, v) is a directional vector of the wind 41
4.12 Plot showing the unoptimized version vs. the optimized when training a network with 10 input neurons, 10/15/20/25 hidden neurons and 1 output neuron. Training was done for 100 epochs using LM 42
4.13 Summarized average improvement over all wind farms, with 95% confidence intervals 43
B.1 Wind characteristics for WF 1 and WF 2. The GEFCom dataset showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power 59
B.2 Wind characteristics for WF 3–7. The GEFCom dataset showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power 60
C.1 Error distribution for different lead times, WF 1 62
C.2 Error distribution for different lead times, WF 2 63
C.3 Error distribution for different lead times, WF 3 64
C.4 Error distribution for different lead times, WF 4 65
C.5 Error distribution for different lead times, WF 5 66
C.6 Error distribution for different lead times, WF 6 67
C.7 Error distribution for different lead times, WF 7 68
C.8 Error distribution for different lead times, WF 1 69
C.9 Error distribution for different lead times, WF 2 70
C.10 Error distribution for different lead times, WF 3 71
C.11 Error distribution for different lead times, WF 4 72
C.12 Error distribution for different lead times, WF 5 73
C.13 Error distribution for different lead times, WF 6 74
C.14 Error distribution for different lead times, WF 7 75

List of Tables

3.1 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods of missing data, power observations are available for updating the models 16
3.2 Preprocessed features from the dataset. The Forecast category represents the forecasts given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the feature we will use in training and testing 17
3.3 Example, where n = 14, r = 5, ψ = 1, of various encoded scalar values using a ScalarEncoder 24
4.1 NRMSE score of the entries published in [Hong et al., 2014]. The NuPIC model and Expektra model are added so we can easily compare the results 40
A.1 Table containing configuration parameters for the encoder 55
A.2 Table containing configuration parameters for the spatial pooler 56
A.3 Table containing configuration parameters for the temporal memory 57

www.kth.se


Appendix C

Error Distribution


[Figure: six histogram panels of forecast error (x-axis −1.0 to 1.0) against frequency (y-axis 0–70) for wf1 using NuPIC, at lead times 48, 40, 30, 20, 10 and 1.]

Figure C1: Error distribution for different lead times, WF 1.

[Figure: six histogram panels of forecast error (x-axis −1.0 to 1.0) against frequency (y-axis 0–70) for wf2 using NuPIC, at lead times 48, 40, 30, 20, 10 and 1.]

Figure C2: Error distribution for different lead times, WF 2.

[Figure: six histogram panels of forecast error (x-axis −1.0 to 1.0) against frequency (y-axis 0–70) for wf3 using NuPIC, at lead times 48, 40, 30, 20, 10 and 1.]

Figure C3: Error distribution for different lead times, WF 3.

[Figure: six histogram panels of forecast error (x-axis −1.0 to 1.0) against frequency (y-axis 0–70) for wf4 using NuPIC, at lead times 48, 40, 30, 20, 10 and 1.]

Figure C4: Error distribution for different lead times, WF 4.

[Figure: six histogram panels of forecast error (x-axis −1.0 to 1.0) against frequency (y-axis 0–70) for wf5 using NuPIC, at lead times 48, 40, 30, 20, 10 and 1.]

Figure C5: Error distribution for different lead times, WF 5.

[Figure: six histogram panels of forecast error (x-axis −1.0 to 1.0) against frequency (y-axis 0–70) for wf6 using NuPIC, at lead times 48, 40, 30, 20, 10 and 1.]

Figure C6: Error distribution for different lead times, WF 6.

[Figure: six histogram panels of forecast error (x-axis −1.0 to 1.0) against frequency (y-axis 0–70) for wf7 using NuPIC, at lead times 48, 40, 30, 20, 10 and 1.]

Figure C7: Error distribution for different lead times, WF 7.

[Figure: six histogram panels of forecast error (x-axis −1.0 to 1.0) against frequency (y-axis 0–70) for wf1 using Expektra, at lead times 48, 40, 30, 20, 10 and 1.]

Figure C8: Error distribution for different lead times, WF 1.

[Figure: six histogram panels of forecast error (x-axis −1.0 to 1.0) against frequency (y-axis 0–70) for wf2 using Expektra, at lead times 48, 40, 30, 20, 10 and 1.]

Figure C9: Error distribution for different lead times, WF 2.

[Figure: six histogram panels of forecast error (x-axis −1.0 to 1.0) against frequency (y-axis 0–70) for wf3 using Expektra, at lead times 48, 40, 30, 20, 10 and 1.]

Figure C10: Error distribution for different lead times, WF 3.

[Figure: six histogram panels of forecast error (x-axis −1.0 to 1.0) against frequency (y-axis 0–70) for wf4 using Expektra, at lead times 48, 40, 30, 20, 10 and 1.]

Figure C11: Error distribution for different lead times, WF 4.

[Figure: six histogram panels of forecast error (x-axis −1.0 to 1.0) against frequency (y-axis 0–70) for wf5 using Expektra, at lead times 48, 40, 30, 20, 10 and 1.]

Figure C12: Error distribution for different lead times, WF 5.

[Figure: six histogram panels of forecast error (x-axis −1.0 to 1.0) against frequency (y-axis 0–70) for wf6 using Expektra, at lead times 48, 40, 30, 20, 10 and 1.]

Figure C13: Error distribution for different lead times, WF 6.

[Figure: six histogram panels of forecast error (x-axis −1.0 to 1.0) against frequency (y-axis 0–70) for wf7 using Expektra, at lead times 48, 40, 30, 20, 10 and 1.]

Figure C14: Error distribution for different lead times, WF 7.


APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf1 using nupic

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using nupic

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using nupic

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using nupic

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using nupic

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using nupic

Figure C1 Error distribution for different lead times WF 1

62

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using nupic

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using nupic

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using nupic

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using nupic

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using nupic

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using nupic

Figure C2 Error distribution for different lead times WF 2

63

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf3 using nupic

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using nupic

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using nupic

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using nupic

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using nupic

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using nupic

Figure C3 Error distribution for different lead times WF 3

64

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using nupic

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using nupic

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using nupic

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using nupic

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using nupic

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using nupic

Figure C4 Error distribution for different lead times WF 4

65

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf5 using nupic

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using nupic

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using nupic

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using nupic

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using nupic

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using nupic

Figure C5 Error distribution for different lead times WF 5

66

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

Figure C6 Error distribution for different lead times WF 6

67

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf7 using nupic

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using nupic

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using nupic

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using nupic

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using nupic

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using nupic

Figure C7 Error distribution for different lead times WF 7

68

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

Figure C8 Error distribution for different lead times WF 1

69

APPENDIX C ERROR DISTRIBUTION

[Figure C.9 residue: six histogram panels of forecast error (x-axis −1.0 to 1.0, frequency 0–70) for WF 2 using Expektra, at lead times 48, 40, 30, 20, 10, and 1.]

Figure C.9: Error distribution for different lead times, WF 2.

[Figure C.10 residue: six histogram panels of forecast error (x-axis −1.0 to 1.0, frequency 0–70) for WF 3 using Expektra, at lead times 48, 40, 30, 20, 10, and 1.]

Figure C.10: Error distribution for different lead times, WF 3.


[Figure C.11 residue: six histogram panels of forecast error (x-axis −1.0 to 1.0, frequency 0–70) for WF 4 using Expektra, at lead times 48, 40, 30, 20, 10, and 1.]

Figure C.11: Error distribution for different lead times, WF 4.

[Figure C.12 residue: six histogram panels of forecast error (x-axis −1.0 to 1.0, frequency 0–70) for WF 5 using Expektra, at lead times 48, 40, 30, 20, 10, and 1.]

Figure C.12: Error distribution for different lead times, WF 5.


[Figure C.13 residue: six histogram panels of forecast error (x-axis −1.0 to 1.0, frequency 0–70) for WF 6 using Expektra, at lead times 48, 40, 30, 20, 10, and 1.]

Figure C.13: Error distribution for different lead times, WF 6.

[Figure C.14 residue: six histogram panels of forecast error (x-axis −1.0 to 1.0, frequency 0–70) for WF 7 using Expektra, at lead times 48, 40, 30, 20, 10, and 1.]

Figure C.14: Error distribution for different lead times, WF 7.

List of Figures

2.1 A figure that presents the general outline when forecasting using the statistical approach . . . 6
2.2 A figure that presents the general steps when forecasting using a physical model . . . 7
3.1 The perceptron . . . 20
3.2 Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of the H hidden layers, as well as M input connections. Each edge seen in this graph has a weight wij associated with it . . . 21
3.3 Information flow of a single-region predictive model created with the OPF . . . 23
3.4 The CLAClassifier . . . 28
3.5 Training an OPF model . . . 29
4.1 Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs. wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) . . . 31
4.2 Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) . . . 32
4.3 Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) . . . 32
4.4 Different error measurements for WF 1 . . . 33
4.5 Different error measurements for WF 2 . . . 34
4.6 Different error measurements for WF 3 . . . 35
4.7 Different error measurements for WF 4 . . . 36
4.8 Different error measurements for WF 5 . . . 37
4.9 Different error measurements for WF 6 . . . 38
4.10 Different error measurements for WF 7 . . . 39
4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point; the reference point is the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind . . . 41
4.12 Plot showing the unoptimized version vs. the optimized one when training a network with 10 input neurons; 10, 15, 20, or 25 hidden neurons; and 1 output neuron. Training was done for 100 epochs using LM . . . 42
4.13 Summarized average improvement over all wind farms, with 95% confidence intervals . . . 43
B.1 Wind characteristics for WF 1 and WF 2. The GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power . . . 59
B.2 Wind characteristics for WF 3–7. The GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power . . . 60
C.1 Error distribution for different lead times, WF 1 . . . 62
C.2 Error distribution for different lead times, WF 2 . . . 63
C.3 Error distribution for different lead times, WF 3 . . . 64
C.4 Error distribution for different lead times, WF 4 . . . 65
C.5 Error distribution for different lead times, WF 5 . . . 66
C.6 Error distribution for different lead times, WF 6 . . . 67
C.7 Error distribution for different lead times, WF 7 . . . 68
C.8 Error distribution for different lead times, WF 1 . . . 69
C.9 Error distribution for different lead times, WF 2 . . . 70
C.10 Error distribution for different lead times, WF 3 . . . 71
C.11 Error distribution for different lead times, WF 4 . . . 72
C.12 Error distribution for different lead times, WF 5 . . . 73
C.13 Error distribution for different lead times, WF 6 . . . 74

C.14 Error distribution for different lead times, WF 7 . . . 75

List of Tables

3.1 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods with missing data, power observations are available for updating the models . . . 16
3.2 Preprocessed features from the dataset. The Forecast category represents the forecasts given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the feature we will use in training and testing . . . 17
3.3 Example where n = 14, r = 5, ψ = 1, of various encoded scalar values using a ScalarEncoder . . . 24
4.1 NRMSE scores of the entries published in [Hong et al., 2014]. The NuPIC model and the Expektra model are added so we can easily compare the results . . . 40
A.1 Table containing configuration parameters for the encoder . . . 55
A.2 Table containing configuration parameters for the spatial pooler . . . 56
A.3 Table containing configuration parameters for the temporal memory . . . 57

www.kth.se

• Contents
• Introduction
  • Problem Formulation
  • The scope of the problem
• Background
  • Neural Networks and Time Series Prediction
• Method and Materials
  • Preliminaries
    • Remarks
    • Definitions
    • Reference models
    • Error metrics
    • Model selection
    • Evaluation
  • Experiments
  • Holdback Input Randomization
  • Optimization methods
  • Neural Networks
    • Multilayer Perceptron
    • Numenta Platform for Intelligent Computing
• Result
  • Experimental results
  • Input Importance
  • Adaptation and Optimization
  • Summary
• Discussion
  • Method development issues
  • Future improvements and directions
  • Conclusions
• Bibliography
• Appendices
  • Hyper-parameters
  • Wind characteristics
  • Error Distribution
• List of Figures
• List of Tables


APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf3 using nupic

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using nupic

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using nupic

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using nupic

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using nupic

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using nupic

Figure C3 Error distribution for different lead times WF 3

64

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using nupic

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using nupic

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using nupic

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using nupic

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using nupic

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using nupic

Figure C4 Error distribution for different lead times WF 4

65

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf5 using nupic

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using nupic

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using nupic

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using nupic

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using nupic

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using nupic

Figure C5 Error distribution for different lead times WF 5

66

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

Figure C6 Error distribution for different lead times WF 6

67

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf7 using nupic

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using nupic

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using nupic

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using nupic

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using nupic

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using nupic

Figure C7 Error distribution for different lead times WF 7

68

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

Figure C8 Error distribution for different lead times WF 1

69

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf2 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

Figure C9 Error distribution for different lead times WF 2

70

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

Figure C10 Error distribution for different lead times WF 3

71

APPENDIX C ERROR DISTRIBUTION

[Figure: histograms of the forecast error for WF 4 using the Expektra model, one panel per lead time (48, 40, 30, 20, 10 and 1 hours); x-axis: error from -1.0 to 1.0, y-axis: frequency from 0 to 70.]

Figure C11: Error distribution for different lead times, WF 4

[Figure: histograms of the forecast error for WF 5 using the Expektra model, one panel per lead time (48, 40, 30, 20, 10 and 1 hours); x-axis: error from -1.0 to 1.0, y-axis: frequency from 0 to 70.]

Figure C12: Error distribution for different lead times, WF 5

[Figure: histograms of the forecast error for WF 6 using the Expektra model, one panel per lead time (48, 40, 30, 20, 10 and 1 hours); x-axis: error from -1.0 to 1.0, y-axis: frequency from 0 to 70.]

Figure C13: Error distribution for different lead times, WF 6

[Figure: histograms of the forecast error for WF 7 using the Expektra model, one panel per lead time (48, 40, 30, 20, 10 and 1 hours); x-axis: error from -1.0 to 1.0, y-axis: frequency from 0 to 70.]

Figure C14: Error distribution for different lead times, WF 7
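The figures in this appendix are plain histograms of the signed forecast error on normalized power, binned over [-1.0, 1.0]. As an illustrative sketch only (this is not the thesis code; the function and variable names are assumed), the binning behind such a histogram, and an NRMSE score of the kind compared in Table 4.1, could be computed as:

```python
# Sketch: bin signed forecast errors as in the Appendix C histograms
# and compute a normalized RMSE. Power values are assumed to already
# be normalized to [0, 1], so the error lies in [-1, 1].
import numpy as np

def error_histogram(predicted, observed, bins=20):
    """Histogram counts of forecast errors (predicted - observed) over [-1, 1]."""
    errors = np.asarray(predicted) - np.asarray(observed)
    counts, edges = np.histogram(errors, bins=bins, range=(-1.0, 1.0))
    return counts, edges

def nrmse(predicted, observed):
    """Root-mean-square error; already normalized since power is in [0, 1]."""
    errors = np.asarray(predicted) - np.asarray(observed)
    return float(np.sqrt(np.mean(errors ** 2)))
```

Plotting one such histogram per lead time (1, 10, 20, 30, 40, 48 hours) and per wind farm reproduces the layout of Figures C1 to C14.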

List of Figures

2.1 A figure that presents the general outline when forecasting using the statistical approach  6
2.2 A figure that presents the general steps when forecasting using a physical model  7
3.1 The perceptron  20
3.2 Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of the H hidden layers, as well as M input connections. Each edge seen in this graph has a weight wij associated with it  21
3.3 Information flow of a single-region predictive model created with the OPF  23
3.4 The CLAClassifier  28
3.5 Training an OPF model  29
4.1 Left diagram: cumulative probability of wind speed. Right diagram: scatter diagram of the power curve, production vs. wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)  31
4.2 Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)  32
4.3 Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012)  32
4.4 Different error measurements for WF 1  33
4.5 Different error measurements for WF 2  34
4.6 Different error measurements for WF 3  35
4.7 Different error measurements for WF 4  36
4.8 Different error measurements for WF 5  37
4.9 Different error measurements for WF 6  38
4.10 Different error measurements for WF 7  39
4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point, i.e. the network when no channel has been exposed to noise. ws = wind speed, hours/week = timestamps, u, v = a directional vector of the wind  41
4.12 Plot showing the unoptimized version vs. the optimized one when training a network with 10 input neurons, 10/15/20/25 hidden neurons and 1 output neuron. Training was done for 100 epochs using LM  42
4.13 Summarized average improvement over all wind farms, with 95% confidence intervals  43
B.1 Wind characteristics for WF 1 and WF 2, GEFCom dataset, showing zonal and meridional wind components. The intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power  59
B.2 Wind characteristics for WF 3-7, GEFCom dataset, showing zonal and meridional wind components. The intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power  60
C.1 Error distribution for different lead times, WF 1  62
C.2 Error distribution for different lead times, WF 2  63
C.3 Error distribution for different lead times, WF 3  64
C.4 Error distribution for different lead times, WF 4  65
C.5 Error distribution for different lead times, WF 5  66
C.6 Error distribution for different lead times, WF 6  67
C.7 Error distribution for different lead times, WF 7  68
C.8 Error distribution for different lead times, WF 1  69
C.9 Error distribution for different lead times, WF 2  70
C.10 Error distribution for different lead times, WF 3  71
C.11 Error distribution for different lead times, WF 4  72
C.12 Error distribution for different lead times, WF 5  73
C.13 Error distribution for different lead times, WF 6  74
C.14 Error distribution for different lead times, WF 7  75

List of Tables

3.1 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, power observations are available for updating the models  16
3.2 Preprocessed features from the dataset. The Forecast category represents the forecasts given by the NWP; multiple forecasts will exist for a given date. The latest issued forecast available is the feature we will use in training and testing  17
3.3 Example, where n = 14, r = 5, ψ = 1, of various encoded scalar values using a ScalarEncoder  24
4.1 NRMSE scores of the entries published in [Hong et al., 2014]. The NuPIC model and the Expektra model are added so we can easily compare the results  40
A.1 Table containing configuration parameters for the encoder  55
A.2 Table containing configuration parameters for the spatial pooler  56
A.3 Table containing configuration parameters for the temporal memory  57


  • Contents
  • Introduction
    • Problem Formulation
    • The scope of the problem
  • Background
    • Neural Networks and Time Series Prediction
  • Method and Materials
    • Preliminaries
      • Remarks
      • Definitions
      • Reference models
      • Error metrics
      • Model selection
      • Evaluation
    • Experiments
    • Holdback Input Randomization
    • Optimization methods
    • Neural Networks
      • Multilayer Perceptron
      • Numenta Platform for Intelligent Computing
  • Result
    • Experimental results
    • Input Importance
    • Adaptation and Optimization
    • Summary
  • Discussion
    • Method development issues
    • Future improvements and directions
    • Conclusions
  • Bibliography
  • Appendices
    • Hyper-parameters
    • Wind characteristics
    • Error Distribution
  • List of Figures
  • List of Tables

[Figure: histograms of the forecast error for WF 4 using the NuPIC model, one panel per lead time (48, 40, 30, 20, 10 and 1 hours); x-axis: error from -1.0 to 1.0, y-axis: frequency from 0 to 70.]

Figure C4: Error distribution for different lead times, WF 4

[Figure: histograms of the forecast error for WF 5 using the NuPIC model, one panel per lead time (48, 40, 30, 20, 10 and 1 hours); x-axis: error from -1.0 to 1.0, y-axis: frequency from 0 to 70.]

Figure C5: Error distribution for different lead times, WF 5

[Figure: histograms of the forecast error for WF 6 using the NuPIC model, one panel per lead time (48, 40, 30, 20, 10 and 1 hours); x-axis: error from -1.0 to 1.0, y-axis: frequency from 0 to 70.]

Figure C6: Error distribution for different lead times, WF 6

[Figure: histograms of the forecast error for WF 7 using the NuPIC model, one panel per lead time (48, 40, 30, 20, 10 and 1 hours); x-axis: error from -1.0 to 1.0, y-axis: frequency from 0 to 70.]

Figure C7: Error distribution for different lead times, WF 7

[Figure: histograms of the forecast error for WF 1 using the Expektra model, one panel per lead time (48, 40, 30, 20, 10 and 1 hours); x-axis: error from -1.0 to 1.0, y-axis: frequency from 0 to 70.]

Figure C8: Error distribution for different lead times, WF 1

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf2 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

Figure C9 Error distribution for different lead times WF 2

70

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

Figure C10 Error distribution for different lead times WF 3

71

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf4 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

Figure C11 Error distribution for different lead times WF 4

72

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

Figure C12 Error distribution for different lead times WF 5

73

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf6 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

Figure C13 Error distribution for different lead times WF 6

74

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

Figure C14 Error distribution for different lead times WF 7

75

List of Figures

21 A figure that presents the general outline when forecasting using thestatistical approach 6

22 A figure that presents the general steps when forecasting using a physicalmodel 7

31 The perceptron 2032 Architectural graph of the neural network that will produce a single

output value It consists of a collection of hidden neurons in each Hhidden layers as well as M input connections Each edge seen in thisgraph have a wij associated with it 21

33 Information flow of a single region predictive model created with the OPF 2334 The CLAClassifier 2835 Training a OPF model 29

41 Left Diagram Cumulated probability of wind-speed Right diagramScatter diagram of the power curve production vs wind speed Seenfor Wind Farm 1 GEFCom dataset (hourly data from January 1 2010 toJanuary 1 2012) 31

42 Left Diagram Wind-Speed vs Production Right diagram Wind-Speedvs Direction Seen for Wind Farm 1 GEFCom dataset (hourly data fromJanuary 1 2010 to January 1 2012) 32

43 Left Diagram Wind-Speed vs Production Right diagram Wind-Speedvs Direction Seen for Wind Farm 2 GEFCom dataset (hourly data fromJanuary 1 2010 to January 1 2012) 32

44 Different error measurement for WF 1 3345 Different error measurement for WF 2 3446 Different error measurement for WF 3 3547 Different error measurement for WF 4 3648 Different error measurement for WF 5 37

76

49 Different error measurement for WF 6 38

410 Different error measurement for WF 7 39

411 Relative Input parameter importance using HIPR ldquoall-channelsrdquo reflectsthe reference point The reference point is the network when no channelhave been exposed to noise ws = wind speed hours week = times-tamps u v is a directional vector of the wind 41

412 Plot showing unoptimized version vs the optimized when training anetwork with 10 input neurons 10152025 hidden neurons 1 outputneuron Training was done for 100 epochs using LM 42

413 Summarized average improvement over all wind-farms with 95 confi-dence intervals 43

B1 Wind characteristics for WF 1 and WF The GEFCom dataset showingzonal and meridional wind components the intensity (alpha) of eachdot reflects how much normalized power was generated the strongerthe more power 59

B2 Wind characteristics for WF 3-7 and WF The GEFCom dataset showingzonal and meridional wind components the intensity (alpha) of eachdot reflects how much normalized power was generated the strongerthe more power 60

C1 Error distribution for different lead times WF 1 62

C2 Error distribution for different lead times WF 2 63

C3 Error distribution for different lead times WF 3 64

C4 Error distribution for different lead times WF 4 65

C5 Error distribution for different lead times WF 5 66

C6 Error distribution for different lead times WF 6 67

C7 Error distribution for different lead times WF 7 68

C8 Error distribution for different lead times WF 1 69

C9 Error distribution for different lead times WF 2 70

C10 Error distribution for different lead times WF 3 71

C11 Error distribution for different lead times WF 4 72

C12 Error distribution for different lead times WF 5 73

C13 Error distribution for different lead times WF 6 74

77

List of Tables

C14 Error distribution for different lead times WF 7 75

List of Tables

31 The first repetition of the first period is 8 January 2011 at 0100 to 10January 2011 at 0000 The second repetition of the first period is 15January 2011 at 0100 to 17 January 2011 at 0000 In between theseperiods missing data with power observations are available for updatingthe models 16

32 Preprocessed features from the dataset The Forecast category representthe forecast given by the NWP there will exist multiple forecast for agiven date The latest issued forecast available is the features will willuse in training and testing 17

33 Example were n = 14 r = 5 ψ = 1 of various encoded scalar valuesusing a ScalarEncoder 24

41 NRMSE score of the entries published in [Hong et al 2014] The NuPICmodel and Expektra model are added so we can easily compare the result 40

A1 Table containing configuration parameters for the encoder 55A2 Table containing configuration parameters for the spatial pooler 56A3 Table containing configuration parameters for the temporal memory 57

78

wwwkthse

  • Contents
  • Introduction
    • Problem Formulation
    • The scope of the problem
      • Background
        • Neural Networks and Time Series Prediction
          • Method and Materials
            • Preliminaries
              • Remarks
              • Definitions
              • Reference models
              • Error metrics
              • Model selection
              • Evaluation
                • Experiments
                • Holdback Input Randomization
                • Optimization methods
                • Neural Networks
                  • Multilayer Perceptron
                  • Numenta Platform for Intelligent Computing
                      • Result
                        • Experimental results
                        • Input Importance
                          • Adaptation and Optimization
                            • Summary
                              • Discussion
                                • Method development issues
                                • Future improvements and directions
                                • Conclusions
                                  • Bibliography
                                  • Appendices
                                  • Hyper-parameters
                                  • Wind characteristics
                                  • Error Distribution
                                  • List of Figures
                                  • List of Tables
Page 68: Short-term wind power forecasting using artificial neural ...865336/FULLTEXT01.pdf · 1. This study will focus on short-term forecasting. i.e forecasts done for 1 −48 hours ahead.

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf5 using nupic

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using nupic

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using nupic

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using nupic

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using nupic

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using nupic

Figure C5 Error distribution for different lead times WF 5

66

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using nupic

Figure C6 Error distribution for different lead times WF 6

67

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf7 using nupic

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using nupic

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using nupic

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using nupic

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using nupic

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using nupic

Figure C7 Error distribution for different lead times WF 7

68

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

Figure C8 Error distribution for different lead times WF 1

69

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf2 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

Figure C9 Error distribution for different lead times WF 2

70

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

Figure C10 Error distribution for different lead times WF 3

71

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf4 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

Figure C11 Error distribution for different lead times WF 4

72

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

Figure C12 Error distribution for different lead times WF 5

73

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf6 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

Figure C13 Error distribution for different lead times WF 6

74

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

Figure C14 Error distribution for different lead times WF 7

75

List of Figures

2.1 A figure that presents the general outline when forecasting using the statistical approach. 6

2.2 A figure that presents the general steps when forecasting using a physical model. 7

3.1 The perceptron. 20

3.2 Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of the H hidden layers, as well as M input connections. Each edge seen in this graph has a weight w_ij associated with it. 21

3.3 Information flow of a single-region predictive model created with the OPF. 23

3.4 The CLAClassifier. 28

3.5 Training an OPF model. 29

4.1 Left diagram: cumulated probability of wind speed. Right diagram: scatter diagram of the power curve, production vs. wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012). 31

4.2 Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012). 32

4.3 Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012). 32

4.4 Different error measurements for WF 1. 33

4.5 Different error measurements for WF 2. 34

4.6 Different error measurements for WF 3. 35

4.7 Different error measurements for WF 4. 36

4.8 Different error measurements for WF 5. 37

4.9 Different error measurements for WF 6. 38

4.10 Different error measurements for WF 7. 39

4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point; the reference point is the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind. 41

4.12 Plot showing the unoptimized version vs. the optimized one when training a network with 10 input neurons; 10, 15, 20 or 25 hidden neurons; and 1 output neuron. Training was done for 100 epochs using LM. 42

4.13 Summarized average improvement over all wind farms, with 95% confidence intervals. 43

B.1 Wind characteristics for WF 1 and WF 2. The GEFCom dataset showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power. 59

B.2 Wind characteristics for WF 3-7. The GEFCom dataset showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power. 60

C.1 Error distribution for different lead times, WF 1. 62

C.2 Error distribution for different lead times, WF 2. 63

C.3 Error distribution for different lead times, WF 3. 64

C.4 Error distribution for different lead times, WF 4. 65

C.5 Error distribution for different lead times, WF 5. 66

C.6 Error distribution for different lead times, WF 6. 67

C.7 Error distribution for different lead times, WF 7. 68

C.8 Error distribution for different lead times, WF 1. 69

C.9 Error distribution for different lead times, WF 2. 70

C.10 Error distribution for different lead times, WF 3. 71

C.11 Error distribution for different lead times, WF 4. 72

C.12 Error distribution for different lead times, WF 5. 73

C.13 Error distribution for different lead times, WF 6. 74

C.14 Error distribution for different lead times, WF 7. 75

List of Tables

3.1 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these missing-data periods, power observations are available for updating the models. 16

3.2 Preprocessed features from the dataset. The Forecast category represents the forecasts given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the feature we will use in training and testing. 17

3.3 Example, where n = 14, r = 5, ψ = 1, of various encoded scalar values using a ScalarEncoder. 24

4.1 NRMSE score of the entries published in [Hong et al., 2014]. The NuPIC model and the Expektra model are added so we can easily compare the results. 40

A.1 Table containing configuration parameters for the encoder. 55

A.2 Table containing configuration parameters for the spatial pooler. 56

A.3 Table containing configuration parameters for the temporal memory. 57


www.kth.se

  • Contents
  • Introduction
    • Problem Formulation
    • The scope of the problem
  • Background
    • Neural Networks and Time Series Prediction
  • Method and Materials
    • Preliminaries
      • Remarks
      • Definitions
      • Reference models
      • Error metrics
      • Model selection
      • Evaluation
    • Experiments
    • Holdback Input Randomization
    • Optimization methods
    • Neural Networks
      • Multilayer Perceptron
      • Numenta Platform for Intelligent Computing
  • Result
    • Experimental results
    • Input Importance
    • Adaptation and Optimization
    • Summary
  • Discussion
    • Method development issues
    • Future improvements and directions
    • Conclusions
  • Bibliography
  • Appendices
    • Hyper-parameters
    • Wind characteristics
    • Error Distribution
  • List of Figures
  • List of Tables

Figure C.6: Error distribution for different lead times, WF 6 (NuPIC model). Six histogram panels, one per lead time (48, 40, 30, 20, 10 and 1 hours); x-axis: error in [-1.0, 1.0], y-axis: frequency (0 to 70).

APPENDIX C ERROR DISTRIBUTION

Figure C.7: Error distribution for different lead times, WF 7 (NuPIC model). Six histogram panels, one per lead time (48, 40, 30, 20, 10 and 1 hours); x-axis: error in [-1.0, 1.0], y-axis: frequency (0 to 70).

Figure C.8: Error distribution for different lead times, WF 1 (Expektra model). Six histogram panels, one per lead time (48, 40, 30, 20, 10 and 1 hours); x-axis: error in [-1.0, 1.0], y-axis: frequency (0 to 70).

Figure C.9: Error distribution for different lead times, WF 2 (Expektra model). Six histogram panels, one per lead time (48, 40, 30, 20, 10 and 1 hours); x-axis: error in [-1.0, 1.0], y-axis: frequency (0 to 70).

Figure C.10: Error distribution for different lead times, WF 3 (Expektra model). Six histogram panels, one per lead time (48, 40, 30, 20, 10 and 1 hours); x-axis: error in [-1.0, 1.0], y-axis: frequency (0 to 70).

Figure C.11: Error distribution for different lead times, WF 4 (Expektra model). Six histogram panels, one per lead time (48, 40, 30, 20, 10 and 1 hours); x-axis: error in [-1.0, 1.0], y-axis: frequency (0 to 70).

Figure C.12: Error distribution for different lead times, WF 5 (Expektra model). Six histogram panels, one per lead time (48, 40, 30, 20, 10 and 1 hours); x-axis: error in [-1.0, 1.0], y-axis: frequency (0 to 70).

Figure C.13: Error distribution for different lead times, WF 6 (Expektra model). Six histogram panels, one per lead time (48, 40, 30, 20, 10 and 1 hours); x-axis: error in [-1.0, 1.0], y-axis: frequency (0 to 70).

Figure C.14: Error distribution for different lead times, WF 7 (Expektra model). Six histogram panels, one per lead time (48, 40, 30, 20, 10 and 1 hours); x-axis: error in [-1.0, 1.0], y-axis: frequency (0 to 70).

75

List of Figures

21 A figure that presents the general outline when forecasting using thestatistical approach 6

22 A figure that presents the general steps when forecasting using a physicalmodel 7

31 The perceptron 2032 Architectural graph of the neural network that will produce a single

output value It consists of a collection of hidden neurons in each Hhidden layers as well as M input connections Each edge seen in thisgraph have a wij associated with it 21

33 Information flow of a single region predictive model created with the OPF 2334 The CLAClassifier 2835 Training a OPF model 29

41 Left Diagram Cumulated probability of wind-speed Right diagramScatter diagram of the power curve production vs wind speed Seenfor Wind Farm 1 GEFCom dataset (hourly data from January 1 2010 toJanuary 1 2012) 31

42 Left Diagram Wind-Speed vs Production Right diagram Wind-Speedvs Direction Seen for Wind Farm 1 GEFCom dataset (hourly data fromJanuary 1 2010 to January 1 2012) 32

43 Left Diagram Wind-Speed vs Production Right diagram Wind-Speedvs Direction Seen for Wind Farm 2 GEFCom dataset (hourly data fromJanuary 1 2010 to January 1 2012) 32

44 Different error measurement for WF 1 3345 Different error measurement for WF 2 3446 Different error measurement for WF 3 3547 Different error measurement for WF 4 3648 Different error measurement for WF 5 37

76

49 Different error measurement for WF 6 38

410 Different error measurement for WF 7 39

411 Relative Input parameter importance using HIPR ldquoall-channelsrdquo reflectsthe reference point The reference point is the network when no channelhave been exposed to noise ws = wind speed hours week = times-tamps u v is a directional vector of the wind 41

412 Plot showing unoptimized version vs the optimized when training anetwork with 10 input neurons 10152025 hidden neurons 1 outputneuron Training was done for 100 epochs using LM 42

413 Summarized average improvement over all wind-farms with 95 confi-dence intervals 43

B1 Wind characteristics for WF 1 and WF The GEFCom dataset showingzonal and meridional wind components the intensity (alpha) of eachdot reflects how much normalized power was generated the strongerthe more power 59

B2 Wind characteristics for WF 3-7 and WF The GEFCom dataset showingzonal and meridional wind components the intensity (alpha) of eachdot reflects how much normalized power was generated the strongerthe more power 60

C1 Error distribution for different lead times WF 1 62

C2 Error distribution for different lead times WF 2 63

C3 Error distribution for different lead times WF 3 64

C4 Error distribution for different lead times WF 4 65

C5 Error distribution for different lead times WF 5 66

C6 Error distribution for different lead times WF 6 67

C7 Error distribution for different lead times WF 7 68

C8 Error distribution for different lead times WF 1 69

C9 Error distribution for different lead times WF 2 70

C10 Error distribution for different lead times WF 3 71

C11 Error distribution for different lead times WF 4 72

C12 Error distribution for different lead times WF 5 73

C13 Error distribution for different lead times WF 6 74

77

List of Tables

C14 Error distribution for different lead times WF 7 75

List of Tables

31 The first repetition of the first period is 8 January 2011 at 0100 to 10January 2011 at 0000 The second repetition of the first period is 15January 2011 at 0100 to 17 January 2011 at 0000 In between theseperiods missing data with power observations are available for updatingthe models 16

32 Preprocessed features from the dataset The Forecast category representthe forecast given by the NWP there will exist multiple forecast for agiven date The latest issued forecast available is the features will willuse in training and testing 17

33 Example were n = 14 r = 5 ψ = 1 of various encoded scalar valuesusing a ScalarEncoder 24

41 NRMSE score of the entries published in [Hong et al 2014] The NuPICmodel and Expektra model are added so we can easily compare the result 40

A1 Table containing configuration parameters for the encoder 55A2 Table containing configuration parameters for the spatial pooler 56A3 Table containing configuration parameters for the temporal memory 57

78

wwwkthse

  • Contents
  • Introduction
    • Problem Formulation
    • The scope of the problem
      • Background
        • Neural Networks and Time Series Prediction
          • Method and Materials
            • Preliminaries
              • Remarks
              • Definitions
              • Reference models
              • Error metrics
              • Model selection
              • Evaluation
                • Experiments
                • Holdback Input Randomization
                • Optimization methods
                • Neural Networks
                  • Multilayer Perceptron
                  • Numenta Platform for Intelligent Computing
                      • Result
                        • Experimental results
                        • Input Importance
                          • Adaptation and Optimization
                            • Summary
                              • Discussion
                                • Method development issues
                                • Future improvements and directions
                                • Conclusions
                                  • Bibliography
                                  • Appendices
                                  • Hyper-parameters
                                  • Wind characteristics
                                  • Error Distribution
                                  • List of Figures
                                  • List of Tables
Page 70: Short-term wind power forecasting using artificial neural ...865336/FULLTEXT01.pdf · 1. This study will focus on short-term forecasting. i.e forecasts done for 1 −48 hours ahead.

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf7 using nupic

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using nupic

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using nupic

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using nupic

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using nupic

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using nupic

Figure C7 Error distribution for different lead times WF 7

68

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

Figure C8 Error distribution for different lead times WF 1

69

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf2 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

Figure C9 Error distribution for different lead times WF 2

70

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

Figure C10 Error distribution for different lead times WF 3

71

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf4 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

Figure C11 Error distribution for different lead times WF 4

72

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

Figure C12 Error distribution for different lead times WF 5

73

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf6 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

Figure C13 Error distribution for different lead times WF 6

74

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

Figure C14 Error distribution for different lead times WF 7

75

List of Figures

21 A figure that presents the general outline when forecasting using thestatistical approach 6

22 A figure that presents the general steps when forecasting using a physicalmodel 7

31 The perceptron 2032 Architectural graph of the neural network that will produce a single

output value It consists of a collection of hidden neurons in each Hhidden layers as well as M input connections Each edge seen in thisgraph have a wij associated with it 21

33 Information flow of a single region predictive model created with the OPF 2334 The CLAClassifier 2835 Training a OPF model 29

41 Left Diagram Cumulated probability of wind-speed Right diagramScatter diagram of the power curve production vs wind speed Seenfor Wind Farm 1 GEFCom dataset (hourly data from January 1 2010 toJanuary 1 2012) 31

42 Left Diagram Wind-Speed vs Production Right diagram Wind-Speedvs Direction Seen for Wind Farm 1 GEFCom dataset (hourly data fromJanuary 1 2010 to January 1 2012) 32

43 Left Diagram Wind-Speed vs Production Right diagram Wind-Speedvs Direction Seen for Wind Farm 2 GEFCom dataset (hourly data fromJanuary 1 2010 to January 1 2012) 32

44 Different error measurement for WF 1 3345 Different error measurement for WF 2 3446 Different error measurement for WF 3 3547 Different error measurement for WF 4 3648 Different error measurement for WF 5 37

76

49 Different error measurement for WF 6 38

410 Different error measurement for WF 7 39

411 Relative Input parameter importance using HIPR ldquoall-channelsrdquo reflectsthe reference point The reference point is the network when no channelhave been exposed to noise ws = wind speed hours week = times-tamps u v is a directional vector of the wind 41

412 Plot showing unoptimized version vs the optimized when training anetwork with 10 input neurons 10152025 hidden neurons 1 outputneuron Training was done for 100 epochs using LM 42

413 Summarized average improvement over all wind-farms with 95 confi-dence intervals 43

B1 Wind characteristics for WF 1 and WF The GEFCom dataset showingzonal and meridional wind components the intensity (alpha) of eachdot reflects how much normalized power was generated the strongerthe more power 59

B2 Wind characteristics for WF 3-7 and WF The GEFCom dataset showingzonal and meridional wind components the intensity (alpha) of eachdot reflects how much normalized power was generated the strongerthe more power 60

C1 Error distribution for different lead times WF 1 62

C2 Error distribution for different lead times WF 2 63

C3 Error distribution for different lead times WF 3 64

C4 Error distribution for different lead times WF 4 65

C5 Error distribution for different lead times WF 5 66

C6 Error distribution for different lead times WF 6 67

C7 Error distribution for different lead times WF 7 68

C8 Error distribution for different lead times WF 1 69

C9 Error distribution for different lead times WF 2 70

C10 Error distribution for different lead times WF 3 71

C11 Error distribution for different lead times WF 4 72

C12 Error distribution for different lead times WF 5 73

C13 Error distribution for different lead times WF 6 74

77

List of Tables

C14 Error distribution for different lead times WF 7 75

List of Tables

31 The first repetition of the first period is 8 January 2011 at 0100 to 10January 2011 at 0000 The second repetition of the first period is 15January 2011 at 0100 to 17 January 2011 at 0000 In between theseperiods missing data with power observations are available for updatingthe models 16

32 Preprocessed features from the dataset The Forecast category representthe forecast given by the NWP there will exist multiple forecast for agiven date The latest issued forecast available is the features will willuse in training and testing 17

33 Example were n = 14 r = 5 ψ = 1 of various encoded scalar valuesusing a ScalarEncoder 24

41 NRMSE score of the entries published in [Hong et al 2014] The NuPICmodel and Expektra model are added so we can easily compare the result 40

A1 Table containing configuration parameters for the encoder 55A2 Table containing configuration parameters for the spatial pooler 56A3 Table containing configuration parameters for the temporal memory 57

78

wwwkthse

  • Contents
  • Introduction
    • Problem Formulation
    • The scope of the problem
      • Background
        • Neural Networks and Time Series Prediction
          • Method and Materials
            • Preliminaries
              • Remarks
              • Definitions
              • Reference models
              • Error metrics
              • Model selection
              • Evaluation
                • Experiments
                • Holdback Input Randomization
                • Optimization methods
                • Neural Networks
                  • Multilayer Perceptron
                  • Numenta Platform for Intelligent Computing
                      • Result
                        • Experimental results
                        • Input Importance
                          • Adaptation and Optimization
                            • Summary
                              • Discussion
                                • Method development issues
                                • Future improvements and directions
                                • Conclusions
                                  • Bibliography
                                  • Appendices
                                  • Hyper-parameters
                                  • Wind characteristics
                                  • Error Distribution
                                  • List of Figures
                                  • List of Tables
Page 71: Short-term wind power forecasting using artificial neural ...865336/FULLTEXT01.pdf · 1. This study will focus on short-term forecasting. i.e forecasts done for 1 −48 hours ahead.

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf1 using expektra

Figure C8 Error distribution for different lead times WF 1

69

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf2 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

Figure C9 Error distribution for different lead times WF 2

70

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

Figure C10 Error distribution for different lead times WF 3

71

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf4 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

Figure C11 Error distribution for different lead times WF 4

72

Figure C.12: Error distribution for different lead times, WF 5. (Six histogram panels, lead times 48, 40, 30, 20, 10 and 1; x-axis: error, −1.0 to 1.0; y-axis: frequency, 0 to 70; each panel titled "wf5 using expektra".)


Figure C.13: Error distribution for different lead times, WF 6. (Six histogram panels, lead times 48, 40, 30, 20, 10 and 1; x-axis: error, −1.0 to 1.0; y-axis: frequency, 0 to 70; each panel titled "wf6 using expektra".)

Figure C.14: Error distribution for different lead times, WF 7. (Six histogram panels, lead times 48, 40, 30, 20, 10 and 1; x-axis: error, −1.0 to 1.0; y-axis: frequency, 0 to 70; each panel titled "wf7 using expektra".)
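The per-lead-time error histograms summarized in Figures C.9 to C.14 can be reproduced with a short script. The following is a minimal sketch, not the code used in the thesis, assuming production and forecasts are normalized to [0, 1] so that the error falls in [−1, 1]:

```python
import numpy as np

def error_histograms(observed, forecast, lead_times, bins=20):
    """Per-lead-time error histograms, with error = forecast - observed.

    observed, forecast: arrays of shape (n_samples, n_lead_times),
    normalized to [0, 1], so every error lies in [-1, 1].
    Returns a dict mapping lead time -> (counts, bin_edges).
    """
    errors = forecast - observed
    hists = {}
    for i, lt in enumerate(lead_times):
        counts, edges = np.histogram(errors[:, i], bins=bins, range=(-1.0, 1.0))
        hists[lt] = (counts, edges)
    return hists

# Toy example with random data standing in for one wind farm's test set.
rng = np.random.default_rng(0)
obs = rng.random((200, 6))
fc = np.clip(obs + rng.normal(0.0, 0.1, obs.shape), 0.0, 1.0)
h = error_histograms(obs, fc, lead_times=[1, 10, 20, 30, 40, 48])
```

Plotting one histogram per lead time from `h` then gives the panel layout seen in the appendix figures.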

List of Figures

2.1 A figure that presents the general outline when forecasting using the statistical approach 6
2.2 A figure that presents the general steps when forecasting using a physical model 7
3.1 The perceptron 20
3.2 Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of H hidden layers, as well as M input connections. Each edge seen in this graph has a weight w_ij associated with it 21
3.3 Information flow of a single-region predictive model created with the OPF 23
3.4 The CLAClassifier 28
3.5 Training an OPF model 29
4.1 Left diagram: cumulative probability of wind speed. Right diagram: scatter diagram of the power curve, production vs. wind speed. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) 31
4.2 Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) 32
4.3 Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Seen for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) 32
4.4 Different error measurements for WF 1 33
4.5 Different error measurements for WF 2 34
4.6 Different error measurements for WF 3 35
4.7 Different error measurements for WF 4 36
4.8 Different error measurements for WF 5 37
4.9 Different error measurements for WF 6 38
4.10 Different error measurements for WF 7 39
4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point; the reference point is the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind 41
4.12 Plot showing the unoptimized version vs. the optimized one when training a network with 10 input neurons, 10/15/20/25 hidden neurons and 1 output neuron. Training was done for 100 epochs using LM 42
4.13 Summarized average improvement over all wind farms, with 95% confidence intervals 43
B.1 Wind characteristics for WF 1 and WF 2. The GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power 59
B.2 Wind characteristics for WF 3-7. The GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power 60
C.1 Error distribution for different lead times, WF 1 62
C.2 Error distribution for different lead times, WF 2 63
C.3 Error distribution for different lead times, WF 3 64
C.4 Error distribution for different lead times, WF 4 65
C.5 Error distribution for different lead times, WF 5 66
C.6 Error distribution for different lead times, WF 6 67
C.7 Error distribution for different lead times, WF 7 68
C.8 Error distribution for different lead times, WF 1 69
C.9 Error distribution for different lead times, WF 2 70
C.10 Error distribution for different lead times, WF 3 71
C.11 Error distribution for different lead times, WF 4 72
C.12 Error distribution for different lead times, WF 5 73
C.13 Error distribution for different lead times, WF 6 74
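The HIPR (Holdback Input Randomization) procedure behind Figure 4.11 ranks inputs by how much the error degrades when one channel's information is destroyed. A minimal sketch, not the thesis implementation, using column shuffling as the noise source:

```python
import numpy as np

def hipr_importance(predict, X, y, rng=None):
    """Holdback Input Randomization: shuffle one input channel at a time
    and measure how much the RMSE degrades relative to intact inputs.

    predict: function mapping an (n, d) array to (n,) predictions.
    Returns (baseline_rmse, per_channel_rmse); the baseline corresponds
    to the "all-channels" reference point in Figure 4.11.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    rmse = lambda p: float(np.sqrt(np.mean((p - y) ** 2)))
    baseline = rmse(predict(X))
    scores = []
    for j in range(X.shape[1]):
        Xr = X.copy()
        rng.shuffle(Xr[:, j])  # destroy channel j's information only
        scores.append(rmse(predict(Xr)))
    return baseline, np.array(scores)

# Toy model that only uses channel 0: randomizing it should hurt most.
rng = np.random.default_rng(1)
X = rng.random((500, 3))
y = 2.0 * X[:, 0]
baseline, scores = hipr_importance(lambda A: 2.0 * A[:, 0], X, y, rng)
```

Channels whose randomization raises the error far above the baseline are the important ones; channels whose score stays near the baseline carry little usable information.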

C.14 Error distribution for different lead times, WF 7 75

List of Tables

3.1 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods of missing data, power observations are available for updating the models 16
3.2 Preprocessed features from the dataset. The Forecast category represents the forecasts given by the NWP; multiple forecasts will exist for a given date. The latest issued forecast available is the feature we will use in training and testing 17
3.3 Example, where n = 14, r = 5, ψ = 1, of various encoded scalar values using a ScalarEncoder 24
4.1 NRMSE scores of the entries published in [Hong et al., 2014]. The NuPIC and Expektra models are added so we can easily compare the results 40
A.1 Table containing configuration parameters for the encoder 55
A.2 Table containing configuration parameters for the spatial pooler 56
A.3 Table containing configuration parameters for the temporal memory 57
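The kind of scalar encoding illustrated by Table 3.3 can be sketched in a few lines. This is a hedged, minimal stand-in for NuPIC's ScalarEncoder, not its actual implementation; mapping the table's (n, r, ψ) parameters onto "n total bits, w active bits" here is an assumption:

```python
def encode_scalar(value, min_val, max_val, n=14, w=5):
    """Encode a scalar as an n-bit list with a contiguous run of w active bits.

    CLA-style scalar encoding: nearby values produce overlapping bit
    patterns, while distant values share no active bits.
    """
    if not (min_val <= value <= max_val):
        raise ValueError("value outside encoder range")
    n_buckets = n - w + 1  # number of possible start positions for the run
    frac = (value - min_val) / (max_val - min_val)
    start = min(int(frac * n_buckets), n_buckets - 1)
    bits = [0] * n
    for i in range(start, start + w):
        bits[i] = 1
    return bits

# Nearby values overlap in their active bits; distant values do not.
a = encode_scalar(0.0, 0.0, 1.0)
b = encode_scalar(0.1, 0.0, 1.0)
c = encode_scalar(1.0, 0.0, 1.0)
```

This overlap property is what lets the spatial pooler treat similar wind speeds as similar inputs.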

www.kth.se

  • Contents
  • Introduction
    • Problem Formulation
    • The scope of the problem
  • Background
    • Neural Networks and Time Series Prediction
  • Method and Materials
    • Preliminaries
      • Remarks
      • Definitions
      • Reference models
      • Error metrics
      • Model selection
      • Evaluation
    • Experiments
    • Holdback Input Randomization
    • Optimization methods
    • Neural Networks
      • Multilayer Perceptron
      • Numenta Platform for Intelligent Computing
  • Result
    • Experimental results
    • Input Importance
    • Adaptation and Optimization
    • Summary
  • Discussion
    • Method development issues
    • Future improvements and directions
    • Conclusions
  • Bibliography
  • Appendices
    • Hyper-parameters
    • Wind characteristics
    • Error Distribution
  • List of Figures
  • List of Tables
Page 72: Short-term wind power forecasting using artificial neural ...865336/FULLTEXT01.pdf · 1. This study will focus on short-term forecasting. i.e forecasts done for 1 −48 hours ahead.

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf2 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf2 using expektra

Figure C9 Error distribution for different lead times WF 2

70

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

Figure C10 Error distribution for different lead times WF 3

71

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf4 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

Figure C11 Error distribution for different lead times WF 4

72

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

Figure C12 Error distribution for different lead times WF 5

73

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf6 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

Figure C13 Error distribution for different lead times WF 6

74

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

Figure C14 Error distribution for different lead times WF 7

75

List of Figures

21 A figure that presents the general outline when forecasting using thestatistical approach 6

22 A figure that presents the general steps when forecasting using a physicalmodel 7

31 The perceptron 2032 Architectural graph of the neural network that will produce a single

output value It consists of a collection of hidden neurons in each Hhidden layers as well as M input connections Each edge seen in thisgraph have a wij associated with it 21

33 Information flow of a single region predictive model created with the OPF 2334 The CLAClassifier 2835 Training a OPF model 29

41 Left Diagram Cumulated probability of wind-speed Right diagramScatter diagram of the power curve production vs wind speed Seenfor Wind Farm 1 GEFCom dataset (hourly data from January 1 2010 toJanuary 1 2012) 31

42 Left Diagram Wind-Speed vs Production Right diagram Wind-Speedvs Direction Seen for Wind Farm 1 GEFCom dataset (hourly data fromJanuary 1 2010 to January 1 2012) 32

43 Left Diagram Wind-Speed vs Production Right diagram Wind-Speedvs Direction Seen for Wind Farm 2 GEFCom dataset (hourly data fromJanuary 1 2010 to January 1 2012) 32

44 Different error measurement for WF 1 3345 Different error measurement for WF 2 3446 Different error measurement for WF 3 3547 Different error measurement for WF 4 3648 Different error measurement for WF 5 37

76

49 Different error measurement for WF 6 38

410 Different error measurement for WF 7 39

411 Relative Input parameter importance using HIPR ldquoall-channelsrdquo reflectsthe reference point The reference point is the network when no channelhave been exposed to noise ws = wind speed hours week = times-tamps u v is a directional vector of the wind 41

412 Plot showing unoptimized version vs the optimized when training anetwork with 10 input neurons 10152025 hidden neurons 1 outputneuron Training was done for 100 epochs using LM 42

413 Summarized average improvement over all wind-farms with 95 confi-dence intervals 43

B1 Wind characteristics for WF 1 and WF The GEFCom dataset showingzonal and meridional wind components the intensity (alpha) of eachdot reflects how much normalized power was generated the strongerthe more power 59

B2 Wind characteristics for WF 3-7 and WF The GEFCom dataset showingzonal and meridional wind components the intensity (alpha) of eachdot reflects how much normalized power was generated the strongerthe more power 60

C1 Error distribution for different lead times WF 1 62

C2 Error distribution for different lead times WF 2 63

C3 Error distribution for different lead times WF 3 64

C4 Error distribution for different lead times WF 4 65

C5 Error distribution for different lead times WF 5 66

C6 Error distribution for different lead times WF 6 67

C7 Error distribution for different lead times WF 7 68

C8 Error distribution for different lead times WF 1 69

C9 Error distribution for different lead times WF 2 70

C10 Error distribution for different lead times WF 3 71

C11 Error distribution for different lead times WF 4 72

C12 Error distribution for different lead times WF 5 73

C13 Error distribution for different lead times WF 6 74

77

List of Tables

C14 Error distribution for different lead times WF 7 75

List of Tables

31 The first repetition of the first period is 8 January 2011 at 0100 to 10January 2011 at 0000 The second repetition of the first period is 15January 2011 at 0100 to 17 January 2011 at 0000 In between theseperiods missing data with power observations are available for updatingthe models 16

32 Preprocessed features from the dataset The Forecast category representthe forecast given by the NWP there will exist multiple forecast for agiven date The latest issued forecast available is the features will willuse in training and testing 17

33 Example were n = 14 r = 5 ψ = 1 of various encoded scalar valuesusing a ScalarEncoder 24

41 NRMSE score of the entries published in [Hong et al 2014] The NuPICmodel and Expektra model are added so we can easily compare the result 40

A1 Table containing configuration parameters for the encoder 55A2 Table containing configuration parameters for the spatial pooler 56A3 Table containing configuration parameters for the temporal memory 57

78

wwwkthse

  • Contents
  • Introduction
    • Problem Formulation
    • The scope of the problem
      • Background
        • Neural Networks and Time Series Prediction
          • Method and Materials
            • Preliminaries
              • Remarks
              • Definitions
              • Reference models
              • Error metrics
              • Model selection
              • Evaluation
                • Experiments
                • Holdback Input Randomization
                • Optimization methods
                • Neural Networks
                  • Multilayer Perceptron
                  • Numenta Platform for Intelligent Computing
                      • Result
                        • Experimental results
                        • Input Importance
                          • Adaptation and Optimization
                            • Summary
                              • Discussion
                                • Method development issues
                                • Future improvements and directions
                                • Conclusions
                                  • Bibliography
                                  • Appendices
                                  • Hyper-parameters
                                  • Wind characteristics
                                  • Error Distribution
                                  • List of Figures
                                  • List of Tables
Page 73: Short-term wind power forecasting using artificial neural ...865336/FULLTEXT01.pdf · 1. This study will focus on short-term forecasting. i.e forecasts done for 1 −48 hours ahead.

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf3 using expektra

Figure C10 Error distribution for different lead times WF 3

71

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf4 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

Figure C11 Error distribution for different lead times WF 4

72

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

Figure C12 Error distribution for different lead times WF 5

73

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf6 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

Figure C13 Error distribution for different lead times WF 6

74

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

Figure C14 Error distribution for different lead times WF 7

75

List of Figures

21 A figure that presents the general outline when forecasting using thestatistical approach 6

22 A figure that presents the general steps when forecasting using a physicalmodel 7

31 The perceptron 2032 Architectural graph of the neural network that will produce a single

output value It consists of a collection of hidden neurons in each Hhidden layers as well as M input connections Each edge seen in thisgraph have a wij associated with it 21

33 Information flow of a single region predictive model created with the OPF 2334 The CLAClassifier 2835 Training a OPF model 29

41 Left Diagram Cumulated probability of wind-speed Right diagramScatter diagram of the power curve production vs wind speed Seenfor Wind Farm 1 GEFCom dataset (hourly data from January 1 2010 toJanuary 1 2012) 31

42 Left Diagram Wind-Speed vs Production Right diagram Wind-Speedvs Direction Seen for Wind Farm 1 GEFCom dataset (hourly data fromJanuary 1 2010 to January 1 2012) 32

43 Left Diagram Wind-Speed vs Production Right diagram Wind-Speedvs Direction Seen for Wind Farm 2 GEFCom dataset (hourly data fromJanuary 1 2010 to January 1 2012) 32

44 Different error measurement for WF 1 3345 Different error measurement for WF 2 3446 Different error measurement for WF 3 3547 Different error measurement for WF 4 3648 Different error measurement for WF 5 37

76

49 Different error measurement for WF 6 38

410 Different error measurement for WF 7 39

411 Relative Input parameter importance using HIPR ldquoall-channelsrdquo reflectsthe reference point The reference point is the network when no channelhave been exposed to noise ws = wind speed hours week = times-tamps u v is a directional vector of the wind 41

412 Plot showing unoptimized version vs the optimized when training anetwork with 10 input neurons 10152025 hidden neurons 1 outputneuron Training was done for 100 epochs using LM 42

413 Summarized average improvement over all wind-farms with 95 confi-dence intervals 43

B1 Wind characteristics for WF 1 and WF The GEFCom dataset showingzonal and meridional wind components the intensity (alpha) of eachdot reflects how much normalized power was generated the strongerthe more power 59

B2 Wind characteristics for WF 3-7 and WF The GEFCom dataset showingzonal and meridional wind components the intensity (alpha) of eachdot reflects how much normalized power was generated the strongerthe more power 60

C1 Error distribution for different lead times WF 1 62

C2 Error distribution for different lead times WF 2 63

C3 Error distribution for different lead times WF 3 64

C4 Error distribution for different lead times WF 4 65

C5 Error distribution for different lead times WF 5 66

C6 Error distribution for different lead times WF 6 67

C7 Error distribution for different lead times WF 7 68

C8 Error distribution for different lead times WF 1 69

C9 Error distribution for different lead times WF 2 70

C10 Error distribution for different lead times WF 3 71

C11 Error distribution for different lead times WF 4 72

C12 Error distribution for different lead times WF 5 73

C13 Error distribution for different lead times WF 6 74

77

List of Tables

C14 Error distribution for different lead times WF 7 75

List of Tables

31 The first repetition of the first period is 8 January 2011 at 0100 to 10January 2011 at 0000 The second repetition of the first period is 15January 2011 at 0100 to 17 January 2011 at 0000 In between theseperiods missing data with power observations are available for updatingthe models 16

32 Preprocessed features from the dataset The Forecast category representthe forecast given by the NWP there will exist multiple forecast for agiven date The latest issued forecast available is the features will willuse in training and testing 17

33 Example were n = 14 r = 5 ψ = 1 of various encoded scalar valuesusing a ScalarEncoder 24

41 NRMSE score of the entries published in [Hong et al 2014] The NuPICmodel and Expektra model are added so we can easily compare the result 40

A1 Table containing configuration parameters for the encoder 55A2 Table containing configuration parameters for the spatial pooler 56A3 Table containing configuration parameters for the temporal memory 57

78

wwwkthse

  • Contents
  • Introduction
    • Problem Formulation
    • The scope of the problem
      • Background
        • Neural Networks and Time Series Prediction
          • Method and Materials
            • Preliminaries
              • Remarks
              • Definitions
              • Reference models
              • Error metrics
              • Model selection
              • Evaluation
                • Experiments
                • Holdback Input Randomization
                • Optimization methods
                • Neural Networks
                  • Multilayer Perceptron
                  • Numenta Platform for Intelligent Computing
                      • Result
                        • Experimental results
                        • Input Importance
                          • Adaptation and Optimization
                            • Summary
                              • Discussion
                                • Method development issues
                                • Future improvements and directions
                                • Conclusions
                                  • Bibliography
                                  • Appendices
                                  • Hyper-parameters
                                  • Wind characteristics
                                  • Error Distribution
                                  • List of Figures
                                  • List of Tables
Page 74: Short-term wind power forecasting using artificial neural ...865336/FULLTEXT01.pdf · 1. This study will focus on short-term forecasting. i.e forecasts done for 1 −48 hours ahead.

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf4 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf4 using expektra

Figure C11 Error distribution for different lead times WF 4

72

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf5 using expektra

Figure C12 Error distribution for different lead times WF 5

73

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf6 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

Figure C13 Error distribution for different lead times WF 6

74

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

Figure C14 Error distribution for different lead times WF 7

75

List of Figures

2.1 A figure that presents the general outline when forecasting using the statistical approach . . . 6
2.2 A figure that presents the general steps when forecasting using a physical model . . . 7
3.1 The perceptron . . . 20
3.2 Architectural graph of the neural network that will produce a single output value. It consists of a collection of hidden neurons in each of the H hidden layers, as well as M input connections. Each edge seen in this graph has a weight w_ij associated with it . . . 21
3.3 Information flow of a single-region predictive model created with the OPF . . . 23
3.4 The CLAClassifier . . . 28
3.5 Training an OPF model . . . 29
4.1 Left diagram: cumulative probability of wind speed. Right diagram: scatter diagram of the power curve, production vs. wind speed. Shown for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) . . . 31
4.2 Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Shown for Wind Farm 1, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) . . . 32
4.3 Left diagram: wind speed vs. production. Right diagram: wind speed vs. direction. Shown for Wind Farm 2, GEFCom dataset (hourly data from January 1, 2010 to January 1, 2012) . . . 32
4.4 Different error measurements for WF 1 . . . 33
4.5 Different error measurements for WF 2 . . . 34
4.6 Different error measurements for WF 3 . . . 35
4.7 Different error measurements for WF 4 . . . 36
4.8 Different error measurements for WF 5 . . . 37
4.9 Different error measurements for WF 6 . . . 38
4.10 Different error measurements for WF 7 . . . 39
4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point, i.e. the network when no channel has been exposed to noise. ws = wind speed; hours, week = timestamps; u, v is a directional vector of the wind . . . 41
4.12 Plot showing the unoptimized version vs. the optimized one when training a network with 10 input neurons; 10, 15, 20 or 25 hidden neurons; and 1 output neuron. Training was done for 100 epochs using LM . . . 42
4.13 Summarized average improvement over all wind farms, with 95% confidence intervals . . . 43
B.1 Wind characteristics for WF 1 and WF 2, the GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power . . . 59
B.2 Wind characteristics for WF 3–7, the GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power . . . 60
C.1 Error distribution for different lead times, WF 1 . . . 62
C.2 Error distribution for different lead times, WF 2 . . . 63
C.3 Error distribution for different lead times, WF 3 . . . 64
C.4 Error distribution for different lead times, WF 4 . . . 65
C.5 Error distribution for different lead times, WF 5 . . . 66
C.6 Error distribution for different lead times, WF 6 . . . 67
C.7 Error distribution for different lead times, WF 7 . . . 68
C.8 Error distribution for different lead times, WF 1 . . . 69
C.9 Error distribution for different lead times, WF 2 . . . 70
C.10 Error distribution for different lead times, WF 3 . . . 71
C.11 Error distribution for different lead times, WF 4 . . . 72
C.12 Error distribution for different lead times, WF 5 . . . 73
C.13 Error distribution for different lead times, WF 6 . . . 74
C.14 Error distribution for different lead times, WF 7 . . . 75

List of Tables

3.1 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods, missing data with power observations are available for updating the models . . . 16
3.2 Preprocessed features from the dataset. The Forecast category represents the forecast given by the NWP; there will exist multiple forecasts for a given date. The latest issued forecast available is the feature set we will use in training and testing . . . 17
3.3 Example, where n = 14, r = 5, ψ = 1, of various encoded scalar values using a ScalarEncoder . . . 24
4.1 NRMSE scores of the entries published in [Hong et al., 2014]. The NuPIC model and Expektra model are added so we can easily compare the results . . . 40
A.1 Table containing configuration parameters for the encoder . . . 55
A.2 Table containing configuration parameters for the spatial pooler . . . 56
A.3 Table containing configuration parameters for the temporal memory . . . 57

www.kth.se

  • Contents
  • Introduction
    • Problem Formulation
    • The scope of the problem
  • Background
    • Neural Networks and Time Series Prediction
  • Method and Materials
    • Preliminaries
      • Remarks
      • Definitions
      • Reference models
      • Error metrics
      • Model selection
      • Evaluation
    • Experiments
      • Holdback Input Randomization
      • Optimization methods
    • Neural Networks
      • Multilayer Perceptron
      • Numenta Platform for Intelligent Computing
  • Result
    • Experimental results
    • Input Importance
    • Adaptation and Optimization
    • Summary
  • Discussion
    • Method development issues
    • Future improvements and directions
    • Conclusions
  • Bibliography
  • Appendices
    • Hyper-parameters
    • Wind characteristics
    • Error Distribution
  • List of Figures
  • List of Tables
APPENDIX C. ERROR DISTRIBUTION

Figure C.12: Error distribution for different lead times, WF 5. (Histogram panels for lead times 48, 40, 30, 20, 10 and 1; x-axis: error, −1.0 to 1.0; y-axis: frequency, 0–70; series "wf5 using expektra".)

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf6 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

Figure C13 Error distribution for different lead times WF 6

74

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

Figure C14 Error distribution for different lead times WF 7

75

List of Figures

21 A figure that presents the general outline when forecasting using thestatistical approach 6

22 A figure that presents the general steps when forecasting using a physicalmodel 7

31 The perceptron 2032 Architectural graph of the neural network that will produce a single

output value It consists of a collection of hidden neurons in each Hhidden layers as well as M input connections Each edge seen in thisgraph have a wij associated with it 21

33 Information flow of a single region predictive model created with the OPF 2334 The CLAClassifier 2835 Training a OPF model 29

41 Left Diagram Cumulated probability of wind-speed Right diagramScatter diagram of the power curve production vs wind speed Seenfor Wind Farm 1 GEFCom dataset (hourly data from January 1 2010 toJanuary 1 2012) 31

42 Left Diagram Wind-Speed vs Production Right diagram Wind-Speedvs Direction Seen for Wind Farm 1 GEFCom dataset (hourly data fromJanuary 1 2010 to January 1 2012) 32

43 Left Diagram Wind-Speed vs Production Right diagram Wind-Speedvs Direction Seen for Wind Farm 2 GEFCom dataset (hourly data fromJanuary 1 2010 to January 1 2012) 32

44 Different error measurement for WF 1 3345 Different error measurement for WF 2 3446 Different error measurement for WF 3 3547 Different error measurement for WF 4 3648 Different error measurement for WF 5 37

76

49 Different error measurement for WF 6 38

410 Different error measurement for WF 7 39

411 Relative Input parameter importance using HIPR ldquoall-channelsrdquo reflectsthe reference point The reference point is the network when no channelhave been exposed to noise ws = wind speed hours week = times-tamps u v is a directional vector of the wind 41

412 Plot showing unoptimized version vs the optimized when training anetwork with 10 input neurons 10152025 hidden neurons 1 outputneuron Training was done for 100 epochs using LM 42

413 Summarized average improvement over all wind-farms with 95 confi-dence intervals 43

B1 Wind characteristics for WF 1 and WF The GEFCom dataset showingzonal and meridional wind components the intensity (alpha) of eachdot reflects how much normalized power was generated the strongerthe more power 59

B2 Wind characteristics for WF 3-7 and WF The GEFCom dataset showingzonal and meridional wind components the intensity (alpha) of eachdot reflects how much normalized power was generated the strongerthe more power 60

C1 Error distribution for different lead times WF 1 62

C2 Error distribution for different lead times WF 2 63

C3 Error distribution for different lead times WF 3 64

C4 Error distribution for different lead times WF 4 65

C5 Error distribution for different lead times WF 5 66

C6 Error distribution for different lead times WF 6 67

C7 Error distribution for different lead times WF 7 68

C8 Error distribution for different lead times WF 1 69

C9 Error distribution for different lead times WF 2 70

C10 Error distribution for different lead times WF 3 71

C11 Error distribution for different lead times WF 4 72

C12 Error distribution for different lead times WF 5 73

C13 Error distribution for different lead times WF 6 74

77

List of Tables

C14 Error distribution for different lead times WF 7 75

List of Tables

31 The first repetition of the first period is 8 January 2011 at 0100 to 10January 2011 at 0000 The second repetition of the first period is 15January 2011 at 0100 to 17 January 2011 at 0000 In between theseperiods missing data with power observations are available for updatingthe models 16

32 Preprocessed features from the dataset The Forecast category representthe forecast given by the NWP there will exist multiple forecast for agiven date The latest issued forecast available is the features will willuse in training and testing 17

33 Example were n = 14 r = 5 ψ = 1 of various encoded scalar valuesusing a ScalarEncoder 24

41 NRMSE score of the entries published in [Hong et al 2014] The NuPICmodel and Expektra model are added so we can easily compare the result 40

A1 Table containing configuration parameters for the encoder 55A2 Table containing configuration parameters for the spatial pooler 56A3 Table containing configuration parameters for the temporal memory 57

78

wwwkthse

  • Contents
  • Introduction
    • Problem Formulation
    • The scope of the problem
      • Background
        • Neural Networks and Time Series Prediction
          • Method and Materials
            • Preliminaries
              • Remarks
              • Definitions
              • Reference models
              • Error metrics
              • Model selection
              • Evaluation
                • Experiments
                • Holdback Input Randomization
                • Optimization methods
                • Neural Networks
                  • Multilayer Perceptron
                  • Numenta Platform for Intelligent Computing
                      • Result
                        • Experimental results
                        • Input Importance
                          • Adaptation and Optimization
                            • Summary
                              • Discussion
                                • Method development issues
                                • Future improvements and directions
                                • Conclusions
                                  • Bibliography
                                  • Appendices
                                  • Hyper-parameters
                                  • Wind characteristics
                                  • Error Distribution
                                  • List of Figures
                                  • List of Tables
Page 76: Short-term wind power forecasting using artificial neural ...865336/FULLTEXT01.pdf · 1. This study will focus on short-term forecasting. i.e forecasts done for 1 −48 hours ahead.

APPENDIX C ERROR DISTRIBUTION

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cywf6 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf6 using expektra

Figure C13 Error distribution for different lead times WF 6

74

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

Figure C14 Error distribution for different lead times WF 7

75

List of Figures

21 A figure that presents the general outline when forecasting using thestatistical approach 6

22 A figure that presents the general steps when forecasting using a physicalmodel 7

31 The perceptron 2032 Architectural graph of the neural network that will produce a single

output value It consists of a collection of hidden neurons in each Hhidden layers as well as M input connections Each edge seen in thisgraph have a wij associated with it 21

33 Information flow of a single region predictive model created with the OPF 2334 The CLAClassifier 2835 Training a OPF model 29

41 Left Diagram Cumulated probability of wind-speed Right diagramScatter diagram of the power curve production vs wind speed Seenfor Wind Farm 1 GEFCom dataset (hourly data from January 1 2010 toJanuary 1 2012) 31

42 Left Diagram Wind-Speed vs Production Right diagram Wind-Speedvs Direction Seen for Wind Farm 1 GEFCom dataset (hourly data fromJanuary 1 2010 to January 1 2012) 32

43 Left Diagram Wind-Speed vs Production Right diagram Wind-Speedvs Direction Seen for Wind Farm 2 GEFCom dataset (hourly data fromJanuary 1 2010 to January 1 2012) 32

44 Different error measurement for WF 1 3345 Different error measurement for WF 2 3446 Different error measurement for WF 3 3547 Different error measurement for WF 4 3648 Different error measurement for WF 5 37

76

49 Different error measurement for WF 6 38

410 Different error measurement for WF 7 39

411 Relative Input parameter importance using HIPR ldquoall-channelsrdquo reflectsthe reference point The reference point is the network when no channelhave been exposed to noise ws = wind speed hours week = times-tamps u v is a directional vector of the wind 41

412 Plot showing unoptimized version vs the optimized when training anetwork with 10 input neurons 10152025 hidden neurons 1 outputneuron Training was done for 100 epochs using LM 42

413 Summarized average improvement over all wind-farms with 95 confi-dence intervals 43

B1 Wind characteristics for WF 1 and WF The GEFCom dataset showingzonal and meridional wind components the intensity (alpha) of eachdot reflects how much normalized power was generated the strongerthe more power 59

B2 Wind characteristics for WF 3-7 and WF The GEFCom dataset showingzonal and meridional wind components the intensity (alpha) of eachdot reflects how much normalized power was generated the strongerthe more power 60

C1 Error distribution for different lead times WF 1 62

C2 Error distribution for different lead times WF 2 63

C3 Error distribution for different lead times WF 3 64

C4 Error distribution for different lead times WF 4 65

C5 Error distribution for different lead times WF 5 66

C6 Error distribution for different lead times WF 6 67

C7 Error distribution for different lead times WF 7 68

C8 Error distribution for different lead times WF 1 69

C9 Error distribution for different lead times WF 2 70

C10 Error distribution for different lead times WF 3 71

C11 Error distribution for different lead times WF 4 72

C12 Error distribution for different lead times WF 5 73

C13 Error distribution for different lead times WF 6 74

77

List of Tables

C14 Error distribution for different lead times WF 7 75

List of Tables

31 The first repetition of the first period is 8 January 2011 at 0100 to 10January 2011 at 0000 The second repetition of the first period is 15January 2011 at 0100 to 17 January 2011 at 0000 In between theseperiods missing data with power observations are available for updatingthe models 16

32 Preprocessed features from the dataset The Forecast category representthe forecast given by the NWP there will exist multiple forecast for agiven date The latest issued forecast available is the features will willuse in training and testing 17

33 Example were n = 14 r = 5 ψ = 1 of various encoded scalar valuesusing a ScalarEncoder 24

41 NRMSE score of the entries published in [Hong et al 2014] The NuPICmodel and Expektra model are added so we can easily compare the result 40

A1 Table containing configuration parameters for the encoder 55A2 Table containing configuration parameters for the spatial pooler 56A3 Table containing configuration parameters for the temporal memory 57

78

wwwkthse

  • Contents
  • Introduction
    • Problem Formulation
    • The scope of the problem
      • Background
        • Neural Networks and Time Series Prediction
          • Method and Materials
            • Preliminaries
              • Remarks
              • Definitions
              • Reference models
              • Error metrics
              • Model selection
              • Evaluation
                • Experiments
                • Holdback Input Randomization
                • Optimization methods
                • Neural Networks
                  • Multilayer Perceptron
                  • Numenta Platform for Intelligent Computing
                      • Result
                        • Experimental results
                        • Input Importance
                          • Adaptation and Optimization
                            • Summary
                              • Discussion
                                • Method development issues
                                • Future improvements and directions
                                • Conclusions
                                  • Bibliography
                                  • Appendices
                                  • Hyper-parameters
                                  • Wind characteristics
                                  • Error Distribution
                                  • List of Figures
                                  • List of Tables
Page 77: Short-term wind power forecasting using artificial neural ...865336/FULLTEXT01.pdf · 1. This study will focus on short-term forecasting. i.e forecasts done for 1 −48 hours ahead.

minus10 minus05 00 05 10

Error (lead time 48)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 40)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 30)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 20)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 10)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

minus10 minus05 00 05 10

Error (lead time 1)

0

10

20

30

40

50

60

70

Fre

quen

cy

wf7 using expektra

Figure C14 Error distribution for different lead times WF 7

75

List of Figures

21 A figure that presents the general outline when forecasting using thestatistical approach 6

22 A figure that presents the general steps when forecasting using a physicalmodel 7

31 The perceptron 2032 Architectural graph of the neural network that will produce a single

output value It consists of a collection of hidden neurons in each Hhidden layers as well as M input connections Each edge seen in thisgraph have a wij associated with it 21

33 Information flow of a single region predictive model created with the OPF 2334 The CLAClassifier 2835 Training a OPF model 29

41 Left Diagram Cumulated probability of wind-speed Right diagramScatter diagram of the power curve production vs wind speed Seenfor Wind Farm 1 GEFCom dataset (hourly data from January 1 2010 toJanuary 1 2012) 31

42 Left Diagram Wind-Speed vs Production Right diagram Wind-Speedvs Direction Seen for Wind Farm 1 GEFCom dataset (hourly data fromJanuary 1 2010 to January 1 2012) 32

43 Left Diagram Wind-Speed vs Production Right diagram Wind-Speedvs Direction Seen for Wind Farm 2 GEFCom dataset (hourly data fromJanuary 1 2010 to January 1 2012) 32

44 Different error measurement for WF 1 3345 Different error measurement for WF 2 3446 Different error measurement for WF 3 3547 Different error measurement for WF 4 3648 Different error measurement for WF 5 37

76

49 Different error measurement for WF 6 38

410 Different error measurement for WF 7 39

411 Relative Input parameter importance using HIPR ldquoall-channelsrdquo reflectsthe reference point The reference point is the network when no channelhave been exposed to noise ws = wind speed hours week = times-tamps u v is a directional vector of the wind 41

412 Plot showing unoptimized version vs the optimized when training anetwork with 10 input neurons 10152025 hidden neurons 1 outputneuron Training was done for 100 epochs using LM 42

413 Summarized average improvement over all wind-farms with 95 confi-dence intervals 43

B1 Wind characteristics for WF 1 and WF The GEFCom dataset showingzonal and meridional wind components the intensity (alpha) of eachdot reflects how much normalized power was generated the strongerthe more power 59

B2 Wind characteristics for WF 3-7 and WF The GEFCom dataset showingzonal and meridional wind components the intensity (alpha) of eachdot reflects how much normalized power was generated the strongerthe more power 60

C1 Error distribution for different lead times WF 1 62

C2 Error distribution for different lead times WF 2 63

C3 Error distribution for different lead times WF 3 64

C4 Error distribution for different lead times WF 4 65

C5 Error distribution for different lead times WF 5 66

C6 Error distribution for different lead times WF 6 67

C7 Error distribution for different lead times WF 7 68

C8 Error distribution for different lead times WF 1 69

C9 Error distribution for different lead times WF 2 70

C10 Error distribution for different lead times WF 3 71

C11 Error distribution for different lead times WF 4 72

C12 Error distribution for different lead times WF 5 73

C13 Error distribution for different lead times WF 6 74

77

List of Tables

C14 Error distribution for different lead times WF 7 75

List of Tables

31 The first repetition of the first period is 8 January 2011 at 0100 to 10January 2011 at 0000 The second repetition of the first period is 15January 2011 at 0100 to 17 January 2011 at 0000 In between theseperiods missing data with power observations are available for updatingthe models 16

32 Preprocessed features from the dataset The Forecast category representthe forecast given by the NWP there will exist multiple forecast for agiven date The latest issued forecast available is the features will willuse in training and testing 17

33 Example were n = 14 r = 5 ψ = 1 of various encoded scalar valuesusing a ScalarEncoder 24

41 NRMSE score of the entries published in [Hong et al 2014] The NuPICmodel and Expektra model are added so we can easily compare the result 40

A1 Table containing configuration parameters for the encoder 55A2 Table containing configuration parameters for the spatial pooler 56A3 Table containing configuration parameters for the temporal memory 57

78

wwwkthse

  • Contents
  • Introduction
    • Problem Formulation
    • The scope of the problem
      • Background
        • Neural Networks and Time Series Prediction
          • Method and Materials
            • Preliminaries
              • Remarks
              • Definitions
              • Reference models
              • Error metrics
              • Model selection
              • Evaluation
                • Experiments
                • Holdback Input Randomization
                • Optimization methods
                • Neural Networks
                  • Multilayer Perceptron
                  • Numenta Platform for Intelligent Computing
                      • Result
                        • Experimental results
                        • Input Importance
                          • Adaptation and Optimization
                            • Summary
                              • Discussion
                                • Method development issues
                                • Future improvements and directions
                                • Conclusions
                                  • Bibliography
                                  • Appendices
                                  • Hyper-parameters
                                  • Wind characteristics
                                  • Error Distribution
                                  • List of Figures
                                  • List of Tables
Page 78: Short-term wind power forecasting using artificial neural ...865336/FULLTEXT01.pdf · 1. This study will focus on short-term forecasting. i.e forecasts done for 1 −48 hours ahead.

List of Figures

21 A figure that presents the general outline when forecasting using thestatistical approach 6

22 A figure that presents the general steps when forecasting using a physicalmodel 7

31 The perceptron 2032 Architectural graph of the neural network that will produce a single

output value It consists of a collection of hidden neurons in each Hhidden layers as well as M input connections Each edge seen in thisgraph have a wij associated with it 21

33 Information flow of a single region predictive model created with the OPF 2334 The CLAClassifier 2835 Training a OPF model 29

41 Left Diagram Cumulated probability of wind-speed Right diagramScatter diagram of the power curve production vs wind speed Seenfor Wind Farm 1 GEFCom dataset (hourly data from January 1 2010 toJanuary 1 2012) 31

42 Left Diagram Wind-Speed vs Production Right diagram Wind-Speedvs Direction Seen for Wind Farm 1 GEFCom dataset (hourly data fromJanuary 1 2010 to January 1 2012) 32

43 Left Diagram Wind-Speed vs Production Right diagram Wind-Speedvs Direction Seen for Wind Farm 2 GEFCom dataset (hourly data fromJanuary 1 2010 to January 1 2012) 32

44 Different error measurement for WF 1 3345 Different error measurement for WF 2 3446 Different error measurement for WF 3 3547 Different error measurement for WF 4 3648 Different error measurement for WF 5 37

76

49 Different error measurement for WF 6 38

410 Different error measurement for WF 7 39

411 Relative Input parameter importance using HIPR ldquoall-channelsrdquo reflectsthe reference point The reference point is the network when no channelhave been exposed to noise ws = wind speed hours week = times-tamps u v is a directional vector of the wind 41

412 Plot showing unoptimized version vs the optimized when training anetwork with 10 input neurons 10152025 hidden neurons 1 outputneuron Training was done for 100 epochs using LM 42

413 Summarized average improvement over all wind-farms with 95 confi-dence intervals 43

B1 Wind characteristics for WF 1 and WF The GEFCom dataset showingzonal and meridional wind components the intensity (alpha) of eachdot reflects how much normalized power was generated the strongerthe more power 59

B2 Wind characteristics for WF 3-7 and WF The GEFCom dataset showingzonal and meridional wind components the intensity (alpha) of eachdot reflects how much normalized power was generated the strongerthe more power 60

C1 Error distribution for different lead times WF 1 62

C2 Error distribution for different lead times WF 2 63

C3 Error distribution for different lead times WF 3 64

C4 Error distribution for different lead times WF 4 65

C5 Error distribution for different lead times WF 5 66

C6 Error distribution for different lead times WF 6 67

C7 Error distribution for different lead times WF 7 68

C8 Error distribution for different lead times WF 1 69

C9 Error distribution for different lead times WF 2 70

C10 Error distribution for different lead times WF 3 71

C11 Error distribution for different lead times WF 4 72


4.9 Different error measurements for WF 6 38

4.10 Different error measurements for WF 7 39

4.11 Relative input parameter importance using HIPR. "all-channels" reflects the reference point, i.e. the network when no channel has been exposed to noise. ws = wind speed, hours, week = timestamps, (u, v) is a directional vector of the wind 41

4.12 Plot showing the unoptimized version vs the optimized one when training a network with 10 input neurons, 10, 15, 20, 25 hidden neurons, and 1 output neuron. Training was done for 100 epochs using LM 42

4.13 Summarized average improvement over all wind farms with 95% confidence intervals 43

B.1 Wind characteristics for WF 1 and WF 2, the GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power 59

B.2 Wind characteristics for WF 3-7, the GEFCom dataset, showing zonal and meridional wind components; the intensity (alpha) of each dot reflects how much normalized power was generated: the stronger, the more power 60

C.1 Error distribution for different lead times, WF 1 62

C.2 Error distribution for different lead times, WF 2 63

C.3 Error distribution for different lead times, WF 3 64

C.4 Error distribution for different lead times, WF 4 65

C.5 Error distribution for different lead times, WF 5 66

C.6 Error distribution for different lead times, WF 6 67

C.7 Error distribution for different lead times, WF 7 68

C.8 Error distribution for different lead times, WF 1 69

C.9 Error distribution for different lead times, WF 2 70

C.10 Error distribution for different lead times, WF 3 71

C.11 Error distribution for different lead times, WF 4 72

C.12 Error distribution for different lead times, WF 5 73

C.13 Error distribution for different lead times, WF 6 74

C.14 Error distribution for different lead times, WF 7 75

List of Tables

3.1 The first repetition of the first period is 8 January 2011 at 01:00 to 10 January 2011 at 00:00. The second repetition of the first period is 15 January 2011 at 01:00 to 17 January 2011 at 00:00. In between these periods of missing data, power observations are available for updating the models 16

3.2 Preprocessed features from the dataset. The Forecast category represents the forecast given by the NWP; multiple forecasts will exist for a given date. The latest issued forecast available provides the features we will use in training and testing 17

3.3 Example, where n = 14, r = 5, ψ = 1, of various encoded scalar values using a ScalarEncoder 24
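The ScalarEncoder referenced in Table 3.3 maps a scalar to a sparse binary vector in which a contiguous block of active bits slides with the value. The following is a minimal sketch of that idea, assuming n is the total bit count, w the number of active bits (the caption's r), and a fixed [0, 10] input range; the parameter names and range are illustrative, not NuPIC's full `ScalarEncoder` API.

```python
def scalar_encode(value, n=14, w=5, minval=0.0, maxval=10.0):
    """Sketch of scalar encoding: n bits total, w contiguous active bits."""
    # Clamp the input into the encoder's range.
    value = max(minval, min(maxval, value))
    # Number of distinct positions the active block can start at.
    buckets = n - w + 1
    # Map the value linearly onto a start index for the active block.
    i = int(round((value - minval) / (maxval - minval) * (buckets - 1)))
    # Build the n-bit vector with w contiguous active bits.
    return [1 if i <= j < i + w else 0 for j in range(n)]
```

Nearby scalar values produce overlapping bit patterns, which is what lets the spatial pooler treat similar inputs similarly.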

4.1 NRMSE scores of the entries published in [Hong et al 2014]. The NuPIC model and Expektra model are added so we can easily compare the results 40
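Table 4.1 compares models by NRMSE. As a sketch of how such a score can be computed: since power in the GEFCom benchmark is normalized to [0, 1], the root-mean-square error needs no further normalization constant here; that choice of normalization is an assumption, as conventions vary.

```python
import math

def nrmse(predicted, observed):
    """Sketch of a normalized root-mean-square error score."""
    # Mean of squared errors over all forecast/observation pairs.
    mse = sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)
    # Root of the mean; power is assumed already normalized to [0, 1],
    # so no extra normalization constant is applied.
    return math.sqrt(mse)
```

Lower is better: a perfect forecast scores 0, and errors are penalized quadratically before averaging.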

A.1 Table containing configuration parameters for the encoder 55

A.2 Table containing configuration parameters for the spatial pooler 56

A.3 Table containing configuration parameters for the temporal memory 57


www.kth.se

  • Contents
  • Introduction
    • Problem Formulation
    • The scope of the problem
  • Background
    • Neural Networks and Time Series Prediction
  • Method and Materials
    • Preliminaries
      • Remarks
      • Definitions
      • Reference models
      • Error metrics
      • Model selection
      • Evaluation
    • Experiments
      • Holdback Input Randomization
      • Optimization methods
    • Neural Networks
      • Multilayer Perceptron
      • Numenta Platform for Intelligent Computing
  • Result
    • Experimental results
    • Input Importance
    • Adaptation and Optimization
    • Summary
  • Discussion
    • Method development issues
    • Future improvements and directions
    • Conclusions
  • Bibliography
  • Appendices
    • Hyper-parameters
    • Wind characteristics
    • Error Distribution
  • List of Figures
  • List of Tables