Eric Plummer
Computer Science Department
University of Wyoming
April 8, 2023
Time Series Forecasting With Feed-Forward Neural Networks:
Guidelines And Limitations
Topics
• Thesis Goals
• Time Series Forecasting
• Neural Networks
• K-Nearest-Neighbor
• Test-Bed Application
• Empirical Evaluation
• Data Preprocessing
• Contributions
• Future Work
• Conclusion
• Demonstration
Thesis Goals
• Compare neural networks and k-nearest-neighbor for time series forecasting
• Analyze the response of various configurations to data series with specific characteristics
• Identify when neural networks and k-nearest-neighbor are inadequate
• Evaluate the effectiveness of data preprocessing
Time Series Forecasting – Description
• What is it?
– Given an existing data series, observe or model the data series to make accurate forecasts
• Example data series
– Financial (e.g., stocks, rates)
– Physically observed (e.g., weather, sunspots)
– Mathematical (e.g., Fibonacci sequence)
Time Series Forecasting – Difficulties
• Why is it difficult?
– Limited quantity of data
  • Observed data series sometimes too short to partition
– Noise
  • Erroneous data points
  • Obscuring component
  • Preprocessing option: moving average
– Nonstationarity
  • Fundamentals change over time
  • Nonstationary mean: “ascending” data series
  • Preprocessing option: first-difference
– Forecasting method selection
  • Statistics
  • Artificial intelligence
Time Series Forecasting – Importance
• Why is it important?
– Preventing undesirable events by forecasting the event, identifying the circumstances preceding the event, and taking corrective action so the event can be avoided (e.g., inflationary economic period)
– Forecasting undesirable, yet unavoidable, events to preemptively lessen their impact (e.g., solar maximum w/ sunspots)
– Profiting from forecasting (e.g., financial markets)
Neural Networks – Background
• Loosely based on the human brain’s neuron structure
• Timeline
– 1940’s – McCulloch and Pitts – proposed neuron models in the form of binary threshold devices and stochastic algorithms
– 1950’s & 1960’s – Rosenblatt – class of learning machines called perceptrons
– Late 1960’s – Minsky and Papert – discouraging analysis of perceptrons (they can only learn linearly separable classes)
– 1980’s – Rumelhart, Hinton, and Williams – generalized delta rule for learning by back-propagation for training multilayer perceptrons
– Present – many new training algorithms and architectures, but nothing “revolutionary”
Neural Networks – Architecture
• A feed-forward neural network can have any number of:
– Layers
– Units per layer
– Network inputs
– Network outputs
• Hidden layers (A, B)
• Output layer (C)
Neural Networks – Units
• A unit has:
– Connections
– Weights
– Bias
– Activation function
• Weights and bias are randomly initialized before training
• Unit’s input consists of:
– Sum of the products of each connection value and its associated weight
– Plus the bias
• Input is then fed into the unit’s activation function
• Unit’s output is the output of the activation function
– Hidden layers: sigmoid
– Output layer: linear
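The unit computation described on this slide can be sketched in a few lines of Python. This is only an illustration: the function names and the sample weights are assumptions, not code or values from FORECASTER.

```python
import math


def sigmoid(x):
    """Hidden-layer activation: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def linear(x):
    """Output-layer activation: passes the input through unchanged."""
    return x


def unit_output(inputs, weights, bias, activation):
    """Sum of connection values times weights, plus the bias, fed to the activation."""
    net_input = sum(i * w for i, w in zip(inputs, weights)) + bias
    return activation(net_input)


# Illustrative values only (not taken from the slides).
hidden = unit_output([1.0, 2.0], [0.5, -0.25], 0.0, sigmoid)  # net input is 0.0
output = unit_output([hidden], [2.0], 1.0, linear)            # 2.0 * 0.5 + 1.0
print(hidden, output)
```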
Neural Networks – Training
• Partition data series into:
– Training set
– Validation set (optional)
– Test set (optional)
• Typically, the training procedure is:
– Perform backpropagation training with the training set
– After n epochs, compute total squared error on the training set and validation set
– If validation error consistently rises while training error consistently falls, stop training
• Overfitting: training set learned too well
• Generalization: given inputs not in the training and validation sets, able to forecast accurately
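A minimal sketch of the stopping rule above, assuming "consistently" means several consecutive checks in a row. The `patience` parameter and the function name are hypothetical; the slides do not say how many checks are required.

```python
def should_stop(train_errors, val_errors, patience=3):
    """Return True when, over the last `patience` checks, validation error
    rose every time while training error fell every time."""
    if len(train_errors) <= patience or len(val_errors) <= patience:
        return False  # not enough history to call the trend consistent
    recent_train = train_errors[-(patience + 1):]
    recent_val = val_errors[-(patience + 1):]
    val_rising = all(b > a for a, b in zip(recent_val, recent_val[1:]))
    train_falling = all(b < a for a, b in zip(recent_train, recent_train[1:]))
    return val_rising and train_falling


# Errors recorded after each block of n epochs (made-up numbers).
print(should_stop([5.0, 4.0, 3.0, 2.0, 1.0], [3.0, 2.0, 2.5, 2.8, 3.0]))
```

Training error alone keeps falling in this example, which is why the validation set is what signals overfitting.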
Neural Networks – Training
• Backpropagation training:
– First, examples in the form of <input, output> pairs are extracted from the data series
– Then, the network is trained with backpropagation on the examples:
1. Present an example’s input vector to the network inputs and run the network sequentially forward
2. Propagate the error sequentially backward from the output layer
3. For every connection, change the weight modifying that connection in proportion to the error
– When all three steps have been performed for all examples, one epoch has occurred
– Goal is to converge to a near-optimal solution based on the total squared error
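The three steps can be sketched for a network with one sigmoid hidden layer and a single linear output unit. This is a hedged illustration, not FORECASTER's implementation: the class and function names, the toy sawtooth, the learning rate of 0.05, and the initial weight range are all assumptions.

```python
import math
import random


def make_examples(series, n_inputs, step_ahead=1):
    """Slide a window over the series to build <input, output> pairs."""
    examples = []
    for start in range(len(series) - n_inputs - step_ahead + 1):
        window = series[start:start + n_inputs]
        target = series[start + n_inputs + step_ahead - 1]
        examples.append((window, target))
    return examples


class Network:
    """One sigmoid hidden layer feeding a single linear output unit."""

    def __init__(self, n_inputs, n_hidden, rng):
        rand = lambda: rng.uniform(-0.5, 0.5)  # random initialization
        self.w_hidden = [[rand() for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b_hidden = [rand() for _ in range(n_hidden)]
        self.w_out = [rand() for _ in range(n_hidden)]
        self.b_out = rand()

    def forward(self, inputs):
        hidden = [1.0 / (1.0 + math.exp(-(sum(i * w for i, w in zip(inputs, ws)) + b)))
                  for ws, b in zip(self.w_hidden, self.b_hidden)]
        output = sum(h * w for h, w in zip(hidden, self.w_out)) + self.b_out
        return hidden, output

    def train_example(self, inputs, desired, rate):
        hidden, output = self.forward(inputs)            # step 1: run forward
        delta_out = desired - output                     # linear unit, derivative is 1
        delta_hidden = [h * (1.0 - h) * delta_out * w    # step 2: propagate error back
                        for h, w in zip(hidden, self.w_out)]
        for c, h in enumerate(hidden):                   # step 3: change every weight
            self.w_out[c] += rate * delta_out * h
        self.b_out += rate * delta_out
        for c, d in enumerate(delta_hidden):
            for p, value in enumerate(inputs):
                self.w_hidden[c][p] += rate * d * value
            self.b_hidden[c] += rate * d


def total_squared_error(net, examples):
    return 0.5 * sum((d - net.forward(x)[1]) ** 2 for x, d in examples)


rng = random.Random(0)
series = [(t % 5) / 4.0 for t in range(40)]   # toy sawtooth scaled to [0, 1]
examples = make_examples(series, n_inputs=5)
net = Network(n_inputs=5, n_hidden=3, rng=rng)
before = total_squared_error(net, examples)
for _ in range(200):                          # one pass over all examples = one epoch
    for inputs, desired in examples:
        net.train_example(inputs, desired, rate=0.05)
after = total_squared_error(net, examples)
print(before, after)
```

Weights are updated after every example here (online training); updating once per epoch would be an equally valid reading of the slide.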
Neural Networks – Training
[Figure: Backpropagation training cycle]
Neural Networks – Forecasting
• Forecasting method depends on examples
• Examples depend on step-ahead size
– If step-ahead size is one: iterative forecasting
– If step-ahead size is greater than one: direct forecasting
Neural Networks – Forecasting
[Figure: Iterative forecasting]
• Can continue this indefinitely
Neural Networks – Forecasting
[Figure: Directly forecasting n steps]
• This is the only forecast
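The difference between the two forecasting modes can be sketched as follows. The `predict` callables stand in for a trained network; the linear-extrapolation lambdas are assumptions used only to make the example runnable.

```python
def iterative_forecast(predict, history, n_inputs, n_steps):
    """Apply a one-step-ahead model repeatedly, feeding each forecast back in."""
    window = list(history[-n_inputs:])
    forecasts = []
    for _ in range(n_steps):
        next_value = predict(window)
        forecasts.append(next_value)
        window = window[1:] + [next_value]  # slide the window over the forecast
    return forecasts


def direct_forecast(predict_n_ahead, history, n_inputs):
    """A model trained with step-ahead size n yields one value, n steps ahead."""
    return predict_n_ahead(list(history[-n_inputs:]))


history = [1.0, 2.0, 3.0, 4.0]
one_step = lambda w: 2 * w[-1] - w[-2]             # stand-in for a trained network
three_step = lambda w: w[-1] + 3 * (w[-1] - w[-2])  # stand-in trained 3 steps ahead
print(iterative_forecast(one_step, history, n_inputs=2, n_steps=3))
print(direct_forecast(three_step, history, n_inputs=2))
```

Iterative forecasting can run for as many steps as desired, but every step builds on earlier forecasts, so errors can accumulate.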
K-Nearest-Neighbor – Forecasting
• No model to train
• Simple linear search
• Compare reference to candidates
• Select k candidates with lowest error
• Forecast is average of k next values
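The search can be sketched as below. The slide does not say how the error between reference and candidate is measured, so the sum of squared differences used here is an assumption, as is the function name.

```python
def knn_forecast(series, window, k):
    """Forecast the next value as the average of the values that followed
    the k windows most similar to the most recent window."""
    reference = series[-window:]
    scored = []
    # Candidates must end before the last point so that a next value exists.
    for start in range(len(series) - window):
        candidate = series[start:start + window]
        error = sum((c - r) ** 2 for c, r in zip(candidate, reference))
        scored.append((error, series[start + window]))
    scored.sort(key=lambda pair: pair[0])   # lowest error first
    best = scored[:k]
    return sum(next_value for _, next_value in best) / len(best)


series = [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2]
print(knn_forecast(series, window=3, k=2))
```

The reference window [0, 1, 2] matches two earlier windows exactly, and both were followed by 3, so the forecast is their average.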
Test-Bed Application – FORECASTER
• Written in Visual C++ with MFC
• Object-oriented
• Multithreaded
• Wizard-based
• Easily modified
• Implements feed-forward neural networks & k-nearest-neighbor
• Used for time series forecasting
• Eventually will be upgraded for classification problems
Empirical Evaluation – Data Series
[Figures: Original; Original with Less Noisy; Original with More Noisy; Original with Ascending (each Value vs. Data Point, 0-210); Sunspots 1784-1983 (Count vs. Year)]
• Data series: Original, Less Noisy, More Noisy, Ascending, Sunspots
Empirical Evaluation – Neural Network Architectures
• Number of network inputs based on data series
• Need to make unambiguous examples
• For “sawtooths”:
– 24 inputs are necessary
– Test networks with 25 & 35 inputs
– Test networks with 1 hidden layer with 2, 10, & 20 hidden layer units
– One output layer unit
• For sunspots:
– 30 inputs
– 1 hidden layer with 30 units
• For real-world data series, selection may be trial-and-error!
Empirical Evaluation – Neural Network Training
• Heuristic method:
– Start with aggressive learning rate
– Gradually lower learning rate as validation error increases
– Stop training when learning rate cannot be lowered anymore
• Simple method:
– Use conservative learning rate
– Training stops when:
  • Number of training epochs equals the epochs limit -or-
  • Training error is less than or equal to error limit
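Both stopping rules can be sketched as small control loops. The slides give no concrete numbers, so the reduction `factor`, the `min_rate` floor, the `max_epochs` safety cap, and the choice to compare each validation error with the previous one are assumptions; `train_epoch`, `training_error`, and `validation_error` are hypothetical callables supplied by the caller.

```python
def train_simple(train_epoch, training_error, rate, epochs_limit, error_limit):
    """Train at a fixed rate until the epochs limit or the error limit is reached."""
    for epoch in range(1, epochs_limit + 1):
        train_epoch(rate)
        if training_error() <= error_limit:
            return epoch
    return epochs_limit


def train_heuristic(train_epoch, validation_error, rate, factor, min_rate, max_epochs):
    """Lower the rate whenever validation error rises; stop when it cannot go lower."""
    previous = validation_error()
    for epoch in range(1, max_epochs + 1):
        train_epoch(rate)
        current = validation_error()
        if current > previous:
            if rate * factor < min_rate:
                return epoch, rate   # rate cannot be lowered any further
            rate *= factor
        previous = current
    return max_epochs, rate


# Stub "network" whose training error halves every epoch (illustration only).
demo = {"err": 8.0}
epochs_used = train_simple(lambda r: demo.update(err=demo["err"] / 2),
                           lambda: demo["err"], rate=0.1,
                           epochs_limit=10, error_limit=1.0)
print(epochs_used)
```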
Empirical Evaluation – Neural Network Forecasting
• Metric to compare forecasts: coefficient of determination
– Value may be in (-∞, 1]
– Want value between 0 and 1, where 0 is equivalent to forecasting the mean of the data series and 1 is forecasting the actual values exactly
– Must have actual values to compare with forecasted values
• For networks trained on original, less noisy, and more noisy data series, forecast will be compared to original series
• For networks trained on ascending data series, forecast will be compared to continuation of ascending series
• For networks trained on sunspots data series, forecast will be compared to test set
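The metric can be computed directly from its usual definition, one minus the ratio of squared forecast error to squared deviation from the mean. The function name and the sample numbers are illustrative assumptions.

```python
def coefficient_of_determination(actual, forecast):
    """1 for a perfect forecast, 0 for forecasting the mean, negative if worse."""
    mean = sum(actual) / len(actual)
    residual = sum((x - f) ** 2 for x, f in zip(actual, forecast))
    total = sum((x - mean) ** 2 for x in actual)
    return 1.0 - residual / total


actual = [1.0, 2.0, 3.0, 4.0]
print(coefficient_of_determination(actual, actual))                 # perfect forecast
print(coefficient_of_determination(actual, [2.5, 2.5, 2.5, 2.5]))   # forecasting the mean
print(coefficient_of_determination(actual, [4.0, 3.0, 2.0, 1.0]))   # worse than the mean
```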
Empirical Evaluation – K-Nearest-Neighbor
• Choosing window size analogous to choosing number of neural network inputs
• For sawtooth data series:
– k = 2
– Test window sizes of 20, 24, and 30
• For sunspots data series:
– k = 3
– Window size of 10
• Compare forecasts via coefficient of determination
Empirical Evaluation – Candidate Selection
• Neural networks
– For each training method, data series, and architecture, 3 candidates were trained
– Also, average of 3 candidates’ forecasts was taken: forecasting by committee
– Best forecast was selected based on coefficient of determination
• K-nearest-neighbor
– For each data series, k, and window size, only one search was performed (only one needed)
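Forecasting by committee and selecting the best candidate can be sketched as follows. The helper names and the three made-up candidate forecasts are assumptions for illustration.

```python
def coefficient_of_determination(actual, forecast):
    mean = sum(actual) / len(actual)
    residual = sum((x - f) ** 2 for x, f in zip(actual, forecast))
    total = sum((x - mean) ** 2 for x in actual)
    return 1.0 - residual / total


def committee_forecast(candidate_forecasts):
    """Average the candidates' forecasts point by point."""
    return [sum(values) / len(values) for values in zip(*candidate_forecasts)]


def best_candidate(actual, forecasts):
    """Return the name of the forecast with the highest coefficient of determination."""
    return max(forecasts, key=lambda name: coefficient_of_determination(actual, forecasts[name]))


actual = [1.0, 2.0, 3.0]
forecasts = {"net1": [1.0, 2.0, 3.0], "net2": [3.0, 2.0, 1.0], "net3": [2.0, 2.0, 2.0]}
forecasts["committee"] = committee_forecast(
    [forecasts["net1"], forecasts["net2"], forecasts["net3"]])
print(forecasts["committee"], best_candidate(actual, forecasts))
```

Selecting by coefficient of determination requires the actual values, which is why the choice is hard when they are unavailable.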
Empirical Evaluation – Original Data Series
[Figures (Value vs. Data Point, 216-285), labeled Simple NN, Heuristic NN, Smaller NN, and K-N-N: two plots of Nets Trained on Original (series: Original; 35,2; 35,10; 35,20); Nets Trained on Original (series: Original; 25,10; 25,20); K-Nearest-Neighbor on Original (series: Original; 2,20; 2,24; 2,30)]
Empirical Evaluation – Less Noisy Data Series
[Figures (Value vs. Data Point, 216-285), labeled Simple NN, Heuristic NN, and K-N-N: two plots of Nets Trained on Less Noisy (series: Original; 35,2; 35,10; 35,20); K-Nearest-Neighbor on Less Noisy (series: Original; 2,20; 2,24; 2,30)]
Empirical Evaluation – More Noisy Data Series
[Figures (Value vs. Data Point, 216-285), labeled Simple NN, Heuristic NN, and K-N-N: two plots of Nets Trained on More Noisy (series: Original; 35,10; 35,20); K-Nearest-Neighbor on More Noisy (series: Original; 2,20; 2,24; 2,30)]
Empirical Evaluation – Ascending Data Series
[Figures (Value vs. Data Point, 216-285), labeled Simple NN and Heuristic NN: Nets Trained on Ascending (series: Ascending; 35,10; 35,20); Nets Trained on Ascending (series: Ascending; 35,2; 35,10; 35,20)]
Empirical Evaluation – Longer Forecast
[Figures (Value vs. Data Point, 216-356), labeled Heuristic NN: Nets Trained on Less Noisy (Longer Forecast) (series: Original; 35,2; 35,10; 35,20); Nets Trained on More Noisy (Longer Forecast) (series: Original; 35,10; 35,20)]
Empirical Evaluation – Sunspots Data Series
[Figure, labeled Simple NN & K-N-N: Sunspots 1950-1983 (Count vs. Year; series: Test Set; 30,30 Neural Net; 3,10 K-Nearest-Neighbor)]
Empirical Evaluation – Discussion
• Heuristic training method observations:
– Networks train longer (more epochs) on smoother data series like the original and ascending data series
– The total squared error and unscaled error are higher for noisy data series
– Neither the number of epochs nor the errors appear to correlate well with the coefficient of determination
– In most cases, the committee forecast is worse than the best candidate's forecast
• When actual values are unavailable, choosing the best candidate is difficult!
Empirical Evaluation – Discussion
• Simple training method observations:
– The total squared error and unscaled error are higher for noisy data series with the exception of the 35:10:1 network trained on the more noisy data series
– The errors do not appear to correlate well with the coefficient of determination
– In most cases, the committee forecast is worse than the best candidate's forecast
– There are four networks whose coefficient of determination is negative, compared with two for the heuristic training method
[Figures: two Coefficient of Determination Comparison charts (Coefficient of Determination, -0.5 to 1, vs. Data Series: Original, Less Noisy, More Noisy, Ascending; series: 35,2; 35,10; 35,20)]
Empirical Evaluation – Discussion
• General observations:
– Neither training method appeared to be clearly better than the other
– Increasingly noisy data series increasingly degraded the forecasting performance
– Nonstationarity in the mean degraded the performance
– Too few hidden units (e.g., 35:2:1) forecasted well on simpler data series, but failed for more complex ones
– Excessive numbers of hidden units (e.g., 35:20:1) did not hurt performance
– Twenty-five network inputs were not sufficient
– K-nearest-neighbor was consistently better than the neural networks
– Feed-forward neural networks are extremely sensitive to architecture and parameter choices, and making such choices is currently more art than science, more trial-and-error than absolute, more practice than theory!
Data Preprocessing
• First-difference
– For ascending data series, a neural network trained on first-difference can forecast near perfectly
– In that case, it is better to train and forecast on first-difference
– FORECASTER reconstitutes forecast from its first-difference
• Moving average
– For noisy data series, moving average would eliminate much of the noise
– But would also smooth out peaks and valleys
– Series may then be easier to learn and forecast
– But in some series, the “noise” may be important data (e.g., utility load forecasting)
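Both preprocessing steps are short enough to sketch. The function names are assumptions, and the simple trailing-window average is one common choice; the slides do not specify which moving-average variant was considered.

```python
def first_difference(series):
    """Replace each value with its change from the previous value."""
    return [b - a for a, b in zip(series, series[1:])]


def reconstitute(last_value, differenced_forecast):
    """Rebuild a forecast in original units from forecasted differences."""
    values = []
    current = last_value
    for change in differenced_forecast:
        current += change
        values.append(current)
    return values


def moving_average(series, window):
    """Average each run of `window` consecutive values to smooth out noise."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]


ascending = [1, 3, 6, 10]
print(first_difference(ascending))          # the upward trend becomes small changes
print(reconstitute(ascending[-1], [5, 6]))  # forecasted changes back to values
print(moving_average([1, 2, 3, 4, 5], 3))
```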
Contributions
• Filled a void within the feed-forward neural network time series forecasting literature: showed how networks respond to various data series characteristics in a controlled environment
• Showed that k-nearest-neighbor is a better forecasting method for the data series used in this research
• Reaffirmed that neural networks are very sensitive to architecture, parameter, and learning method changes
• Presented some insight into neural network architecture selection: selecting number of network inputs based on data series
• Presented a neural network training heuristic that produced good results
Future Work
• Upgrade FORECASTER to work with classification problems
• Add more complex network types, including wavelet networks for time series forecasting
• Investigate k-nearest-neighbor further
• Add other forecasting methods (e.g., decision trees for classification)
Conclusion
• Presented:
– Time series forecasting
– Neural networks
– K-nearest-neighbor
– Empirical evaluation
• Learned a lot about the implementation details of the forecasting techniques
• Learned a lot about MFC programming
Demonstration
Various files can be found at: http://w3.uwyo.edu/~eplummer
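Unit Output, Error, and Weight Change Formulas
Hidden-layer unit output: $O_c = h_{Hidden}\left(\sum_{p=1}^{P} i_{c,p}\, w_{c,p} + b_c\right)$, where $h_{Hidden}(x) = \frac{1}{1 + e^{-x}}$
Output-layer unit output: $O_c = h_{Output}\left(\sum_{p=1}^{P} i_{c,p}\, w_{c,p} + b_c\right)$, where $h_{Output}(x) = x$
Output-layer unit error: $\delta_c = h'_{Output}(x)\,(D_c - O_c)$
Hidden-layer unit error: $\delta_c = h'_{Hidden}(x) \sum_{n=1}^{N} \delta_n\, w_{n,c}$
Weight change: $\Delta w_{c,p} = \alpha\, \delta_c\, O_p$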
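Forecast Error Formulas
Total squared error: $E_C = \frac{1}{2} \sum_{c=1}^{C} (D_c - O_c)^2$
Unscaled error: $UE_C = \sum_{c=1}^{C} \left| UD_c - UO_c \right|$
Coefficient of determination: $r^2 = 1 - \frac{\sum_{i=1}^{n} (x_i - \hat{x}_i)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$
$r^2 = \begin{cases} 1 & \text{if } \forall i,\ \hat{x}_i = x_i \\ 0 < k < 1 & \text{if } \hat{x}_i \text{ is a better forecast than } \bar{x} \\ 0 & \text{if generally } \hat{x}_i = \bar{x} \\ k < 0 & \text{if } \hat{x}_i \text{ is a worse forecast than } \bar{x} \end{cases}$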
Related Work
• Drossu and Obradovic (1996): hybrid stochastic and neural network approach to time series forecasting
• Zhang and Thearling (1994): parallel implementations of neural networks and memory-based reasoning
• Geva (1998): multiscale fast wavelet transform and an array of feed-forward neural networks
• Lawrence, Tsoi, and Giles (1996): encodes the series with a self-organizing map and uses recurrent neural networks
• Kingdon (1997): automated intelligent system for financial forecasting that uses neural networks and genetic algorithms