Layered Cascade Artificial Neural Network
A thesis submitted for the degree of
Master of Computing in Computer Science of
The Australian National University
by
Tengfei Shen
Supervisor: Prof. Tom Gedeon
Research School of Computer Science
College of Engineering & Computer Science
The Australian National University
November 2011
Tengfei Shen (4981890)
Acknowledgement
I am heartily thankful to my supervisor, Prof. Tom Gedeon, whose encouragement, guidance and support helped me in many respects during the completion of the project.
Tengfei Shen
Layered Cascade Neural Network
Abstract
Constructive algorithms have proved to be powerful methods for training feedforward
neural networks. The CasPer algorithm is a constructive neural network algorithm that
generates networks from a simple architecture and then expands it. The A_CasPer
algorithm is a modified version of the CasPer algorithm which trains a candidate pool
of neurons instead of a single neuron. This project extends the A_CasPer algorithm in
terms of the network architecture, giving the Layered_CasPer algorithm. In the new
network structure the hidden neurons are arranged in layers, which results in a lower
computational cost. Beyond the network structure, other aspects
of Layered_CasPer are the same as A_CasPer. The Layered_CasPer algorithm extension is
benchmarked on a number of classification and regression problems and compared to
other constructive algorithms, which are CasCor, CasPer, A_CasPer, and AT_CasPer. It is
shown that Layered_CasPer has a better performance on the datasets which have a large
number of inputs for classification tasks. The Layered_CasPer algorithm has an advantage
over other cascade style constructive algorithms in being more similar in topology to the
familiar layered structure of traditional feedforward neural networks. This may lead to
good acceptance of this technique.
Furthermore, an implementation of CasPer, A_CasPer, AT_CasPer and Layered_CasPer is
presented in this thesis. At the end of the thesis, two new ideas for improving the
Layered_CasPer algorithm are suggested for future work.
Table of Contents
1 Introduction
1.1 Motivation
1.2 Objective of Project
1.3 Contribution
1.4 Report Organization
2 Relevant Techniques and Concepts
2.1 Cascade Correlation Algorithm (CasCor Algorithm)
2.2 CasPer Algorithm & A_CasPer Algorithm
2.3 AT_CasPer Algorithm
3 Implementation of Layered Cascade Neural Network
3.1 Layered Cascade Neural Network
3.2 Program Introduction
3.3 Key Techniques
3.4 Testing of Program
4 Evaluation of Layered CasPer Neural Network
4.1 Experiment 1: Results Comparison of Classification Tasks
4.1.1 Experiment Description
4.1.2 The Process and Evaluation of Experiment
4.2 Experiment 2: Two Spirals Benchmark
4.2.1 Experiment Description
4.2.2 Evaluation of Experiment
4.3 Experiment 3: Results Comparison of Regression Tasks
4.3.1 Experiment Description
4.3.2 The Process and Evaluation of Experiment
5 Potential of Layered Cascade Neural Network
5.1 Self-Evaluating Layered Cascade Neural Network
5.2 Random Limit Layered Cascade Neural Network
5.3 Limited Connections Layered Cascade Neural Network
6 Conclusion
References
Appendix A: Screenshots of Cascade Neural Network Toolbox v1.2
Appendix B: Experiment Results
List of Figures
Fig. 1 The Cascade architecture, initial state and after adding two hidden units
Fig. 2 The CasPer structure (Fahlman, 1990)
Fig. 3 A cascade tower architecture with a tower size of 3 (Treadgold & Gedeon, 1998)
Fig. 4.1 New hidden neuron installation comparison between CasPer and Layered_CasPer
Fig. 4.2 New hidden neuron installation comparison between CasPer and Layered_CasPer in Fahlman display method
Fig. 5 Sample of net file
Fig. 6 Working process of the program
Fig. 7.1 Sample of the connection matrix
Fig. 7.2 Network Structure Representations of Fig. 7.1
Fig. 8 Glass hidden neuron results
Fig. 9 Soybean hidden neuron results
Fig. 10 Soybean Test Results
Fig. 11 Cancer Test Results
Fig. 12 Connections in a given size network
Fig. 13 The two spirals training set
Fig. 14 The two spirals testing set
Fig. 15 Result of the Layered_CasPer algorithm
Fig. 16 Result of the CasPer algorithm (Treadgold & Gedeon, 1998)
Fig. 17 Result of the CasCor algorithm (Treadgold & Gedeon, 1998)
Fig. 18 Harm test results - noise free
Fig. 19 Harm test results - noisy
Fig. 20 Sample of Random Limit Layered Cascade Neural Network without connection details
List of Tables
Table 1 Function Table
Table 2 Attributes of Proben 1 data set used in the experiment
Table 3 Comparison of A_CasPer, AT_CasPer, Layered_CasPer1 (without candidate pool) and Layered_CasPer2 (with candidate pool)
Table 4 Test results on data sets with large number of inputs
Table 5 Comparison of FVUs of A_CasPer, AT_CasPer and Layered_CasPer
1 Introduction
1.1 Motivation
Model selection is a challenge in the field of feedforward neural networks: it involves
matching the complexity of the model to the complexity of the function to be approximated.
The factors that determine the complexity of a model are its connection topology and the
number and magnitude of its weights. Underfitting and poor generalization occur if a model
does not have enough complexity to approximate the target function, whereas overfitting and
poor generalization occur if a model is too complex. There are three groups of selection
techniques: “those that perform a search through models, those that begin with an overly
complex model which is then simplified, and those that begin with a simple model whose
complexity is increased” [1]. Cascade Correlation (CasCor) [2], CasPer [3], [4], A_CasPer
[1] and Layered_CasPer are four constructive algorithms that start from a small initial
network, so they all belong to the third group. These algorithms spend less time on network
training than algorithms which start training with an oversized network. They
also tend to avoid the problem of encountering poorly performing local minima.
Layered_CasPer was suggested by Tom Gedeon in 2011. This constructive algorithm
makes the connection scheme of the network easier to understand,
as it is more similar to the familiar layered structure of many neural network models. The
hidden neurons are arranged in layers in the new network architecture. For the same number
of hidden neurons, a Layered_CasPer network has fewer connections than a CasPer network,
so its computational cost is lower. The motivation of this project is to implement and
evaluate this proposed algorithm.
1.2 Objective of Project
The aim of this project is to understand, implement and evaluate the Layered_CasPer
constructive algorithm. Matlab is selected as the programming platform to achieve the
objective of this project.
1.3 Contribution
The main contribution of this report is a Matlab program that implements the
Layered_CasPer constructive learning algorithm. Another significant contribution
of the report is a series of experiments evaluating the Layered_CasPer constructive
learning algorithm.
1.4 Report Organization
Chapter 2 gives an overview of the relevant techniques and concepts. Chapter 3
introduces the Layered_CasPer algorithm and the program which can implement
Layered_CasPer as well as some comparison algorithms. Chapter 4 describes three
experiments which evaluate the Layered_CasPer algorithm. Chapter 5 introduces two
ideas for improving the Layered_CasPer algorithm, and chapter 6 is
the conclusion of this thesis.
2 Relevant Techniques and Concepts
2.1 Cascade Correlation Algorithm (CasCor Algorithm)
Cascade-Correlation is a constructive, supervised learning algorithm for neural
networks. It was introduced by Scott Fahlman and Christian Lebiere in 1991.
Cascade-Correlation starts with a minimal size network, then repeatedly trains and installs new
hidden neurons one by one, generating a multi-layer topology instead of just adjusting
the weights in a network with a fixed topology. A very interesting feature of this
algorithm is that a new hidden neuron’s input weights are frozen once it has been
installed into the network. This unit then becomes a fixed unit in the network, available
for giving outputs for generating other more complex units.
As shown in Fig. 1, the architecture of the Cascade-Correlation algorithm begins with the
preset inputs and outputs but without hidden neurons. The initial structure of the network is
dictated by the problem and by the I/O representation which is chosen by the
experimenter. There is also a bias input, which is constantly set to +1. Hidden neurons are
installed into the network one by one. Every new hidden neuron receives a connection from
each previous hidden neuron and from the original inputs of the network. The installed
hidden neuron’s input weights are frozen; only the output connections are trained.
Fig. 1. The Cascade architecture, initial state and after adding two hidden units. The
vertical lines sum all incoming activation. Boxed connections (□) are frozen, X
connections (x) are trained repeatedly [2]
The Cascade-Correlation architecture has several advantages over existing algorithms: “It
learns very quickly, the network determines its own size and topology, it retains the
structures it has built even if the training set changes, and it requires no back-
propagation of error signals through the connections of the network” [2].
2.2 CasPer Algorithm & A_CasPer Algorithm
The CasPer algorithm was introduced by Nick Treadgold and Tom Gedeon in 1996. As a
constructive neural network algorithm, CasPer builds network structures in a similar way
to Cascade Correlation: both begin with a single hidden neuron and successively install
hidden neurons. The main distinction between CasPer and Cascade Correlation is the
training method. As previously mentioned, in Cascade Correlation the hidden neurons’ input
weights are frozen and only the output connections are trained, whereas CasPer
trains all connections of the network.
A further difference is that CasPer uses a modified version of the RPROP algorithm,
Progressive RPROP, to train the network after a new hidden neuron is added.
RPROP is a gradient descent algorithm using individual adaptive learning rates for each
weight, which starts with an initial learning rate that is then adapted based on the sign of
the error gradient seen by that weight as it traverses the error surface [4]. Fig. 2 shows that
the weights of the network are separated into three groups, and each group has its own
learning rate: LR1, LR2 and LR3. The first group includes all weights which connect to the new
neuron from previous hidden neurons and inputs. The second group is made up of all
weights that connect the output of the new hidden unit to the outputs. The third group
consists of the rest of the weights. The relationship between the magnitudes of LR1, LR2
and LR3 is LR1>>LR2>LR3. The effect is similar to the correlation measure of Cascade
Correlation: the high value of LR1 allows the new hidden unit to learn the remaining
network error. Similarly, the high value of LR2 compared to LR3 allows the new
hidden unit to reduce the network error without excessive interference from other
weights.
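The grouping of weights described above can be sketched as follows. This is only an illustrative Python sketch, not the thesis program (which is written in Matlab): the function name `weight_group`, the unit numbering and the numeric learning rates are assumptions chosen to satisfy LR1>>LR2>LR3.

```python
from typing import Iterable


def weight_group(src: int, dst: int, new_neuron: int, outputs: Iterable[int]) -> int:
    """Return the learning-rate group (1, 2 or 3) of the weight src -> dst."""
    if dst == new_neuron:
        return 1  # weights feeding the new hidden neuron use LR1
    if src == new_neuron and dst in set(outputs):
        return 2  # weights from the new hidden neuron to the outputs use LR2
    return 3      # every other weight in the network uses LR3


# Hypothetical values, chosen only so that LR1 >> LR2 > LR3 holds.
LEARNING_RATES = {1: 0.2, 2: 0.005, 3: 0.001}

# Example: units 6 and 7 are outputs, unit 5 is the newly installed hidden neuron.
print(weight_group(1, 5, 5, [6, 7]))  # input -> new neuron
print(weight_group(5, 6, 5, [6, 7]))  # new neuron -> output
print(weight_group(3, 6, 5, [6, 7]))  # older hidden neuron -> output
```

Each time a neuron is installed, the groups are recomputed, so a weight that was in the first group for one neuron falls into the third group once the next neuron is added.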
Fig. 2. The CasPer structure – a second hidden neuron has
just been added. The vertical lines sum all incoming values.
(This display method was introduced by Fahlman 1990)
A_CasPer is a modified version of the CasPer algorithm with the following modifications.
First, a candidate pool of hidden neurons is trained instead of a single hidden
neuron. Each hidden neuron in the pool is continuously connected to the network in the
usual manner of CasPer. Each hidden neuron in the candidate pool has its own training
process and weights. The network with the best generalization performance is then
selected, and its weights are kept. A new candidate pool is then generated and the process
is repeated until the convergence criterion is satisfied. Second, a
different decay level is used for the network each time a new neuron in the pool is
inserted in that process.
The CasPer algorithm has been shown to create networks with fewer hidden units than
the CasCor algorithm, and also has better generalization [5]. “A_CasPer is generally able
to improve generalization results compared to CasPer using optimized decay levels. This
is especially apparent in the data sets containing noise, where A_CasPer not only obtains
better generalization results, but are also able to avoid overfitting as the network
continues to grow”[6].
These four algorithms are compared in the experiments of this report because they
are all constructive algorithms that use a cascade network architecture and have good
evaluation results. Of CasPer, A_CasPer and AT_CasPer, the A_CasPer algorithm performs
best on classification and regression tasks, so the most relevant comparison for
Layered_CasPer is the A_CasPer algorithm.
2.3 AT_CasPer Algorithm
The AT_CasPer algorithm is a modified version of the CasPer algorithm which uses a series
of cascade towers instead of a single cascade of hidden neurons to build the networks. The
main aim of this algorithm is to limit the network depth. AT_CasPer trains the network
in the same manner as CasPer. Each hidden neuron receives a connection from the
inputs and connects to the outputs. Hidden neurons connect to each other only within the
same tower. When the maximum cascade depth is reached, the next hidden neuron
begins a new cascade tower. There is no connection between towers. An example of this
structure is shown in Fig. 3. The AT_CasPer algorithm produces slightly worse results
than A_CasPer but reduces computational cost.
Fig. 3. A cascade tower architecture with a tower size of 3[7]
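The tower rule described above can be illustrated with a small Python sketch. It assumes hidden neurons are numbered from 0 in installation order; the function names are hypothetical and are not taken from the thesis program.

```python
from typing import List, Tuple


def tower_position(k: int, tower_size: int) -> Tuple[int, int]:
    """Return (tower index, depth inside the tower) of hidden neuron k (0-based)."""
    return divmod(k, tower_size)


def tower_predecessors(k: int, tower_size: int) -> List[int]:
    """Hidden neurons feeding neuron k: only earlier neurons of the same tower."""
    tower, depth = tower_position(k, tower_size)
    first = tower * tower_size  # index of the first neuron in this tower
    return list(range(first, first + depth))


# With a tower size of 3, neuron 4 is the second neuron of the second tower
# and receives a hidden connection from neuron 3 only.
print(tower_position(4, 3))
print(tower_predecessors(4, 3))
```

Connections from the inputs and to the outputs are not listed here, since every hidden neuron has them regardless of its tower.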
3 Implementation of Layered Cascade Neural Network
3.1 Layered Cascade Neural Network
The Layered Cascade model is an idea for improving the CasPer algorithm which was
suggested by Tom Gedeon. It is a modified version of the CasPer algorithm for
constructing networks. Layered_CasPer builds cascade networks in a similar manner to
CasPer: it begins with a simple architecture, installs single hidden
neurons successively, and uses the RPROP gradient descent algorithm to train the whole
network each time a hidden neuron is installed. The candidate pool can also be used in
Layered_CasPer. The maximum size of each layer is a very important parameter and must
be set first.
This modification of CasPer focuses on the architecture of the network. In the layered
cascade neural network, the hidden neurons are arranged in layers and there are no
connections between neurons in the same layer. Fig. 4.1 shows how a new hidden neuron
is added in CasPer and in Layered_CasPer. Fig. 4.2 illustrates the
same comparison in Fahlman’s display method. New neurons are added beside previous
neurons until the layer reaches its size limit; the next neuron then starts a new layer.
Each new neuron receives a connection from each of the network’s original inputs and
from every hidden neuron of each pre-existing layer. Within a layer, every hidden neuron
has the same connection pattern and the neurons do not connect to each other. The CasPer
neural network is in fact a
special case of a Layered_CasPer neural network in which the size of the layer is 1, and
Layered_CasPer is effectively a CasPer network which copies each hidden neuron several
times. The advantage of the Layered_CasPer algorithm known before the
experiments is that it requires fewer connections than the CasPer algorithm for
the same number of hidden neurons. The reduction in the number of connections can be
calculated as:
𝐶 = (𝑁/𝑆) ∗ (𝑆 − 1) + (𝑁 𝑚𝑜𝑑 𝑆) − 1
where N is the number of installed neurons and S is the size of each layer. That means the
computational cost of Layered_CasPer is much lower than CasPer’s given a network with
a large number of hidden neurons. Of course, fewer connections tend to reduce the power
of the network and may affect its ability to generalize. This will be evaluated by
experiments.
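As a rough cross-check of the connection saving, the hidden-to-hidden connections of both architectures can be counted directly from the connection rules described in this section. The Python sketch below is an illustration under stated assumptions: it numbers hidden neurons from 0, counts only connections between hidden neurons (input, bias and output connections are the same in both architectures), and uses hypothetical function names. It is a direct count and is not meant to reproduce the closed-form expression for C above, whose counting convention may differ.

```python
def casper_hidden_links(n: int) -> int:
    """CasPer: hidden neuron k receives from all k earlier hidden neurons."""
    return sum(range(n))


def layered_hidden_links(n: int, layer_size: int) -> int:
    """Layered_CasPer: neuron k sits in layer k // layer_size and receives
    from every hidden neuron of the earlier layers, none from its own layer."""
    return sum((k // layer_size) * layer_size for k in range(n))


n, s = 15, 3  # 15 hidden neurons in layers of 3
print(casper_hidden_links(n), layered_hidden_links(n, s))
print("saved:", casper_hidden_links(n) - layered_hidden_links(n, s))
```

With a layer size of 1 the two counts coincide, which matches the observation that CasPer is the special case of Layered_CasPer with single-neuron layers.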
Fig. 4.1 New hidden neuron installation comparison between CasPer and Layered_CasPer
Fig. 4.2 New hidden neuron installation comparison between CasPer and Layered_CasPer in
Fahlman’s display method
3.2 Program Introduction
The program for implementing Layered_CasPer is written in Matlab.
The name of this program is “Cascade Neural Network Toolbox”. It has more
than 2000 lines of code, and the whole development process took over 8 weeks with 3
releases. The program allows users to design their experimental tasks, including setting
the task type, training algorithm, training cycles, the number of times to run a task and so
on. The final performance statistics can be displayed as tables and diagrams and also
saved as a csv file.
What the program can do:
- Implement CasPer, A_CasPer, AT_CasPer and Layered_CasPer cascade neural
  networks.
- Display statistics for each stage of the network building process, including
    - Number of epochs trained
    - Number of installed hidden neurons
    - Number of cross connections
    - The best training root-mean-square error (RMSE)
    - RMSE of validation set
    - Correct percent for classification or fraction of variance unexplained (FVU) for
      regression of validation
    - RMSE of test set
    - Correct percent for classification or fraction of variance unexplained (FVU) for
      regression of testing
    - Training epoch-RMSE Curve Diagram
- Save above statistics as a csv file.
- Display the final performance statistics and write as a csv file, including
    - Number of total epochs trained
    - Number of installed hidden neurons
    - Number of cross connections
    - RMSE of validation set
    - Correct percent for classification or fraction of variance unexplained (FVU) for
      regression of validation
    - RMSE of test set
    - Correct percent for classification or fraction of variance unexplained (FVU) for
      regression of testing
- Save above statistics as a csv file.
- Save the matrix of final weights as a csv file
- Read a weight matrix from a csv file into current weight matrix
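The two error measures named in the lists above, RMSE and FVU, can be sketched in Python as follows. These are the usual textbook definitions for a single output; how the toolbox averages over several outputs or patterns is not stated here, so the normalization is an assumption and the function names are hypothetical.

```python
import math
from typing import Sequence


def rmse(desired: Sequence[float], actual: Sequence[float]) -> float:
    """Root-mean-square error over all patterns."""
    squared = sum((d - a) ** 2 for d, a in zip(desired, actual))
    return math.sqrt(squared / len(desired))


def fvu(desired: Sequence[float], actual: Sequence[float]) -> float:
    """Fraction of variance unexplained: residual sum of squares divided by
    the sum of squared deviations of the desired values from their mean."""
    mean = sum(desired) / len(desired)
    residual = sum((d - a) ** 2 for d, a in zip(desired, actual))
    total = sum((d - mean) ** 2 for d in desired)
    return residual / total


targets = [1.0, 2.0, 3.0]
print(rmse(targets, targets))        # a perfect prediction gives zero error
print(fvu(targets, [2.0, 2.0, 2.0]))  # always predicting the mean explains nothing
```

An FVU near 0 means the network explains almost all of the variance of the target, while an FVU of 1 is no better than predicting the mean.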
All information and data for a task are read from a .net file which is specified by the
user. This .net file is an extension of the .net file used in the original Quickprop
implementation of Regier [8]. It contains the neuron output type, the initial network
architecture and attribute information, and the training set, validation set and testing
set of a dataset. The following figure is a sample .net file. The second line describes the
initial architecture of the network, including the number of input units, hidden units and
output units. The third line defines the output type of the neurons: 1 represents the
sigmoid type, where the output ranges from -0.5 to +0.5, and 2 represents the asymmetric
sigmoid, where the output ranges from 0.0 to 1.0.
The fourth and fifth lines contain the connection parameters which are used to
build CasPer, A_CasPer and AT_CasPer cascade neural networks. The following lines
describe the training set, validation set and testing set with their number of patterns.
Fig. 5. Sample of net file
Beyond the net file, the user needs to set other parameters for an experiment task:
Task Type: Classification or Regression
Algorithm: CasPer, A_CasPer, AT_CasPer or Layered_CasPer
Training Epoch Limit
Candidate Pool: Install new hidden neurons by using the candidate pool or not
Maximum size of layer for Layered_CasPer or size of tower for AT_CasPer
Maximum number of neurons to install
The number of times to run tasks
The main working process of the program is shown in Fig. 6. First the program reads a
net file and generates all required data according to the parameters set by the user. Then
the related functions construct the network and initialize it. After the network is built, the
training stage starts, and the program trains the network repeatedly using the RPROP
training algorithm until it reaches the maximum number of training epochs. In the next
stage, testing, the program tests the network using the validation set and the testing set and
returns the output RMSE. If the output error is worse than the expected RMSE and the
network has not reached the maximum number of neurons to install, a new hidden neuron
is added to the current network. Before a new hidden neuron is added, the program
checks whether the current layer is full. If it is full, a new layer is generated
and the new neuron is added to it as the first neuron. All related parameters are updated
when a new hidden neuron or layer is added. Beyond the normal parameters, they also
include the learning rates of the different groups of weights. Then the program builds,
trains and tests the network again. This cycle of building, training, testing and adding
continues until the network reaches the maximum number of installed neurons or its
testing RMSE is smaller than the target RMSE. Finally the program displays the final
performance statistics for this run and saves them as a csv file.
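The control flow of this process can be sketched in Python as follows. This is a simplified illustration, not the toolbox code: the callable `train_and_test` stands in for the build, train and test stages and is a hypothetical placeholder, the network is tracked only as a list of layer sizes, and starting from an empty hidden structure is an assumption.

```python
from typing import Callable, List


def grow_network(train_and_test: Callable[[List[int]], float],
                 target_rmse: float,
                 max_neurons: int,
                 layer_size: int) -> List[int]:
    """Grow a layered network and return its final layer sizes, e.g. [3, 2]."""
    layers: List[int] = []  # assumption: no hidden neurons at the start
    while True:
        # Build, train up to the epoch limit, then test (all hidden in the callable).
        error = train_and_test(layers)
        if error < target_rmse or sum(layers) >= max_neurons:
            return layers
        if not layers or layers[-1] == layer_size:
            layers.append(1)   # current layer is full: the new neuron starts a new layer
        else:
            layers[-1] += 1    # otherwise the new neuron joins the current layer


def fake_error(layers: List[int]) -> float:
    """Toy stand-in: the error shrinks as hidden neurons are added."""
    return 1.0 / (1 + sum(layers))


print(grow_network(fake_error, target_rmse=0.2, max_neurons=15, layer_size=3))
```

Keeping the stopping test ahead of the installation step mirrors the order in the text: a neuron is only added when the error target has not been met and the neuron limit has not been reached.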
Fig. 6. Working process of the program
start
1. Get architecture parameters, training set, validation set and testing set from a net file
   Related functions: GET_NETWORK_CONFIGURATION, BUILD_DATA_STRUCTURES
2. Build and initialize network
   Related functions: CONNECT_LAYERS, CONNECT_LAYERS2
3. Training
   Related functions: TRAIN, TRAIN_ONE_EPOCH, FORWARD_PASS, BACKWARD_PASS, ACTIVATION, ACTIVATION_PRIME, ERRFUN, UPDATE_WEIGHTS
4. Testing
   Related functions: FORWARD_PASS, ACTIVATION, ERRFUN
5. Reach expected error?
   yes: go to step 8
   no: go to step 6
6. Reach maximum number of neurons to install?
   yes: go to step 8
   no: go to step 7
7. Is current layer full?
   yes: Install this new hidden neuron as a new layer
   no: Install a new hidden neuron
   Related functions: CHANGE_NET_CONFIG
   Then return to step 2
8. Display performance statistics
   Related functions: PRINT_STATS, PRINT_OUT, PRINT_OUT2
end
All functions of the program with their descriptions are given in the function table:
Function Table

Function Name
    Description

Main Functions:

GET_NETWORK_CONFIGURATION(FileName)
    Get the parameters of the network from a .net file and initialize it.
    Get the training set, validation set and test set from the .net file.

BUILD_DATA_STRUCTURES(ninputs, nhidden, noutputs)
    Sub-function of GET_NETWORK_CONFIGURATION.
    Set the parameters of the network: the number of units, inputs, outputs and hidden
    neurons, and the indices of the first hidden neuron and the first output neuron in
    the connection matrix.
    Install the bias unit into the network.

CONNECT_LAYERS(start1, end1, start2, end2)
    Build the CasPer and A_CasPer networks from the set parameters.
    Connect a layer (neurons from start1 to end1) to a layer (neurons from start2 to
    end2) and generate the connection matrix.
    Generate the weight and slope matrices.

CONNECT_LAYERS2(end)
    Directly build the whole Layered_CasPer network from the set parameters.

TRAIN()
    Train the network until the error plateaus.

TRAIN_ONE_EPOCH()
    Sub-function of TRAIN().
    Perform forward and back propagation once for each pattern in the training set,
    collecting deltas. Then burn in the weights.

FORWARD_PASS(input)
    Perform the forward pass of the backpropagation algorithm and return the output of
    each neuron.

BACKWARD_PASS(goal)
    goal is a matrix of desired values for the output neurons. Propagate the error back
    through the net, accumulating weight deltas.

ACTIVATION(sum, type)
    Sub-function of FORWARD_PASS(input).
    Given the sum of weighted inputs, compute the unit's activation value.
    Defined neuron types are SIGMOID and ASYMSIGMOID.

ACTIVATION_PRIME(value, type)
    Sub-function of BACKWARD_PASS(goal).
    Given the sum of weighted inputs and the neuron's activation value, compute the
    derivative of the activation with respect to the sum.
    Defined neuron types are SIGMOID and ASYMSIGMOID.

ERRFUN(desired, actual)
    Compute the squared error for one output neuron.

UPDATE_WEIGHTS()
    Update all weights of the network using each weight's current slope, previous slope
    and the size of the last jump.

TEST(print)
    Test the current network and return the error rate for classification or the FVU
    for regression.

CHANGE_NET_CONFIG()
    Install a new neuron into the current network.
    Update the parameters of the network and reconstruct the network.

Other Functions:

RESTORE_WEIGHTS()
    Restore the previous weights for each neuron.

RANDOM_WEIGHT(range)
    Generate a double between -range and +range for the initial weights.

CLEAR_SLOPES()
    Save the current slope matrix as the previous slope matrix and clear the current
    slope matrix.

RESET_PARAMS(group)
    Reset some parameters.

WEIGHT_SAVE()
    Save the current weight matrix.

DUMP_WEIGHTS(fname)
    Write the current weight matrix into a .csv file.

GET_WEIGHTS(fname)
    Read a weight matrix from a .csv file as the current weight matrix.

PRINT_OUT() / PRINT_OUT2() / PRINT_STATS()
    Display the performance statistics after a new neuron is added, including
    - Number of epochs trained
    - Number of installed hidden neurons
    - Number of connections
    - The best training root-mean-square error (RMSE)
    - RMSE of validation set
    - Correct percent for classification or fraction of variance unexplained (FVU)
      for regression of validation
    - RMSE of test set
    - Correct percent for classification or fraction of variance unexplained (FVU)
      for regression of testing
    PRINT_STATS prints the same information for the network which has the best
    validation result.

INITIALIZE_GLOBALS()
    Reset all global variables.
Table 1. Function Table
3.3 Key Techniques
How to represent a neural network is the core of the program. Fig. 7.1 illustrates the
network representation method of this program. In the program, the network is
represented as a matrix, which is called the connection matrix. This matrix contains the
connection information of all hidden neurons: cell(x,y) means that the yth neuron
connects to the xth neuron. For example, in Fig. 7.1, cell(4,2) represents the connection
between the 2nd input and the 2nd hidden neuron. To reduce the number of dimensions
that must be searched, the abscissa indices are stored in the cells themselves, and an array
holding the number of connections of each neuron supports the search. For instance,
suppose the program needs the connection information of the 2nd hidden neuron. Each
neuron has its own index, which is set as a global variable, and the index of the 2nd hidden
neuron is 4. First the program obtains the number of connections of the 2nd hidden neuron
from the “Nconnection Array”; this is recorded in the 4th place and is 3. Then the program
gets the 4th array of the connection matrix and collects its first 3 cells: 0, 1 and 2. The
program now knows that the 2nd hidden neuron connects to the bias input, the 1st input
and the 2nd input. Fig. 7.2 shows the
network architecture of the sample in Fig. 7.1.
Fig. 7.1 Sample of the connection matrix1
Fig. 7.2 Network Structure Representations of Fig. 7.1
* Array of number of connections of neurons
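The lookup described in this section can be sketched in Python. The sketch assumes the unit numbering of the worked example (0 for the bias, 1 and 2 for the inputs, 3 and 4 for the hidden neurons). Only the row for unit 4 is taken from the text; the other rows, the padding value `PAD` and the function name `sources_of` are hypothetical.

```python
from typing import List

PAD = -1  # hypothetical filler for unused cells of a fixed-width row

# One row per unit; each used cell stores the index of a unit feeding that row's unit.
connection_matrix: List[List[int]] = [
    [PAD, PAD, PAD],  # unit 0: bias, receives nothing
    [PAD, PAD, PAD],  # unit 1: 1st input
    [PAD, PAD, PAD],  # unit 2: 2nd input
    [0, 1, 2],        # unit 3: 1st hidden neuron (assumed for illustration)
    [0, 1, 2],        # unit 4: 2nd hidden neuron (from the worked example)
]

# Number of valid cells in each row, playing the role of the "Nconnection Array".
nconnections: List[int] = [0, 0, 0, 3, 3]


def sources_of(unit: int) -> List[int]:
    """Return the units that connect into `unit` by reading only the valid cells."""
    return connection_matrix[unit][:nconnections[unit]]


print(sources_of(4))  # the bias, the 1st input and the 2nd input
```

Because the count array says how many cells of a row are in use, a lookup reads a short prefix of one row instead of scanning a full units-by-units matrix.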
3.4 Testing of Program
The program was tested using standard software testing methods. The main testing
method was white box testing. Beyond the white box testing of individual
functions, 20 testing tasks were run on the whole program using fixed initial weights,
and all outputs of these tasks were checked for correctness. The program implements CasPer,
A_CasPer, AT_CasPer and Layered_CasPer strictly according to each algorithm as
described in the available published documents.
4 Evaluation of Layered CasPer Neural Network
4.1 Experiment 1: Results Comparison of Classification Tasks
4.1.1 Experiment Description
The main goal of this experiment is to compare performance on classification tasks.
The Cascade Correlation (CasCor) algorithm, the CasPer algorithm, the A_CasPer
algorithm, the AT_CasPer algorithm and a modified version of Layered_CasPer which
does not include the candidate pool are compared to the Layered_CasPer algorithm
introduced in this thesis, on some data sets from Proben1.
The Proben1 data sets are a collection of “real world” data sets and consist of ten
classification and four regression tasks [1]. Table 2 contains the attribute information for
each data set.
The comparison in this experiment focuses on the number of hidden neurons, the number of
connection crossings and the error rate percentage.
Table 2. Attributes of the Proben1 data sets used in the experiment
4.1.2 The Process and Evaluation of Experiment
First, the Layered_CasPer algorithm is compared to the A_CasPer and AT_CasPer algorithms
on the Proben1 data sets in terms of the average number of hidden neurons at which the
network achieves its best result, the average number of connection crossings and the mean
test error percentage.
The number of training epochs is 100 and the maximum number of installed hidden neurons
is 15 for each algorithm. Where used, the size of each tower/layer and of the candidate
pool is 3 for AT_CasPer/Layered_CasPer. Two versions of Layered_CasPer are included in
this comparison: Layered_CasPer1 does not use a candidate pool, while Layered_CasPer2
does. Table 3 shows the test results of A_CasPer, AT_CasPer, Layered_CasPer1 and
Layered_CasPer2. Layered_CasPer1 is clearly not as good as Layered_CasPer2, although it
has fewer connection crossings. The difference can be attributed to the candidate pool,
which helps the training process produce the best performance on the data sets Card,
Gene, Horse, Soybean, Glass and Heartc. Interestingly, the first four of these are the
four data sets with the largest number of inputs. A further comparison is therefore made
on these four data sets, with the CasCor algorithm added. Table 4 shows this comparison,
with the CasCor results taken from Treadgold and Gedeon’s paper [1]. These results
suggest that Layered_CasPer may perform well on data sets which have a large number of
inputs; however, more experiments are needed to demonstrate this conclusively. For the
Heart data set, the result of the Layered_CasPer algorithm is better than that of the
Pym-Tower algorithm [9]. Only in the average number of hidden neurons does
Layered_CasPer have a higher value than the other algorithms. A high number of hidden
neurons may lead to poorer generalization.
Table 3. Comparison of A_CasPer, AT_CasPer, Layered_CasPer1 (without candidate pool) and Layered_CasPer2 (with candidate pool)
Table 4. Test results on data sets with large number of inputs
Another comparison in this experiment focuses on the number of hidden neurons at which
the network achieves its best result and on the test error percentage. It compares the
A_CasPer, AT_CasPer and Layered_CasPer algorithms in these respects. For the hidden
neuron comparison, the data sets Glass (9 inputs), Cancer (9 inputs) and Soybean (82
inputs) are used. As Fig. 8 shows, Layered_CasPer converges better than AT_CasPer but
worse than A_CasPer, though its overall performance is good. This means that, on the
Glass data set, Layered_CasPer may have a relatively steady number of hidden neurons at
which the network has the best validation result. In Fig. 9, it can be seen that although
some results of Layered_CasPer are much better than those of AT_CasPer and A_CasPer, the
variance of its results is large. Figs. 10 and 11 show that Layered_CasPer also has a
large variance in test error percentage on Soybean and normal centrality on Cancer. From
this comparison, it seems that Layered_CasPer may have a large variance in the number
of hidden neurons for data sets which have a large number of inputs. Fig. 12 compares the
number of connections of AT_CasPer, A_CasPer and Layered_CasPer, where the number of
inputs is 3 and the size of each tower/layer for AT_CasPer/Layered_CasPer is 3. The
number of connections for Layered_CasPer is lower than that of A_CasPer and follows a
similar curve.
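As a rough illustration of why a layered structure needs fewer connections than a fully cascaded one, the following sketch counts the incoming connections of the hidden neurons under two stated assumptions: in a cascade network each hidden neuron receives the bias, all inputs and all previously installed hidden neurons; in a layered network each hidden neuron receives the bias, all inputs and all neurons of earlier layers, but none from its own layer. These assumptions and the function names are illustrative; output connections and the AT_CasPer tower structure are not modelled.

```python
def cascade_connections(n_inputs, n_hidden):
    """Incoming connections of all hidden neurons in a full cascade.

    Assumption: hidden neuron k (0-based) receives the bias, every input
    and the k hidden neurons installed before it.
    """
    return sum(1 + n_inputs + k for k in range(n_hidden))


def layered_connections(n_inputs, n_hidden, layer_size):
    """Incoming connections of all hidden neurons in a layered cascade.

    Assumption: a hidden neuron receives the bias, every input and every
    neuron of the earlier layers, but nothing from its own layer.
    """
    total = 0
    for k in range(n_hidden):
        neurons_in_earlier_layers = (k // layer_size) * layer_size
        total += 1 + n_inputs + neurons_in_earlier_layers
    return total


# 3 inputs and a layer size of 3, the setting described for Fig. 12.
for n_hidden in (3, 6, 9, 12, 15):
    print(n_hidden,
          cascade_connections(3, n_hidden),
          layered_connections(3, n_hidden, 3))
```

Under these assumptions the two counts grow along a similar curve, and the layered count is smaller by exactly the number of within-layer connections that are omitted.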
Fig. 8. Glass hidden neuron results
Fig. 9. Soybean hidden neuron results
Fig. 10. Soybean Test Results
Fig. 11. Cancer Test Results
Fig. 12. Connections in a given size network
4.2 Experiment 2: Two Spirals Benchmark
4.2.1 Experiment Description
The two spirals benchmark is used in this experiment. As Fig. 13 shows, it contains two
interlocked spirals, each made up of 97 points. The network needs to learn to
distinguish these two spirals. This problem was used by Fahlman [2] to evaluate the
CasCor algorithm. A simple version of this benchmark is used in the experiment, and only
the result figures are compared. The CasCor algorithm, the CasPer algorithm and the
Layered_CasPer algorithm are compared in this experiment. In this comparison, the size
of the training set is 194, the size of the testing set is 17,161, the maximum number of
hidden neurons to install is 15 and, where used, the layer size of the Layered_CasPer
algorithm is 3.
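For readers who want to reproduce the data, the sketch below generates a two spirals training set with 97 points per spiral using the generator formula commonly distributed with the CMU benchmark collection. Whether the thesis used exactly this generator is an assumption. The testing grid is also an assumption: a 131 × 131 grid is simply one layout that yields 17,161 points, and its extent of ±6.5 is chosen here to match the largest spiral radius.

```python
import math


def two_spirals(points_per_spiral=97):
    """Return a list of (x, y, label) tuples for the two spirals problem.

    Uses the commonly cited benchmark formula; label 1 and label 2 mark
    the two interlocked spirals.
    """
    data = []
    for i in range(points_per_spiral):
        angle = i * math.pi / 16.0
        radius = 6.5 * (104 - i) / 104.0
        x = radius * math.sin(angle)
        y = radius * math.cos(angle)
        data.append((x, y, 1))    # point on the first spiral
        data.append((-x, -y, 2))  # mirrored point on the second spiral
    return data


def test_grid(side=131, limit=6.5):
    """Return a regular side x side grid of (x, y) points (assumed layout)."""
    step = 2.0 * limit / (side - 1)
    return [(-limit + col * step, -limit + row * step)
            for row in range(side)
            for col in range(side)]


training_set = two_spirals()
grid = test_grid()
print(len(training_set), len(grid))
```

Classifying every grid point with a trained network and plotting class 1 in black and class 2 in white gives decision-region pictures of the kind compared in this experiment.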
Fig. 13. The two spirals training set
4.2.2 Evaluation of Experiment
Fig. 14 plots the testing set, and Figs. 15, 16 and 17 plot the results of the
Layered_CasPer algorithm, the CasPer algorithm and the CasCor algorithm respectively.
Each instance of class 1 is black, and each instance of class 2 is white. It can be seen
that the result of the Layered_CasPer algorithm resembles the testing set much more
closely than that of the CasCor algorithm and is similar to that of the CasPer
algorithm. Compared with other methods for solving the two spirals problem, the result of
Layered_CasPer is close to those of the IDS Method [10], the MLP with Neuro-Glial
Network [11], the Chaos Glial Network [12], the Neuro-Fuzzy Classifier [13] and Gao’s
paper [14].
Fig. 14. The two spirals testing set
Fig. 15. Result of the Layered_CasPer algorithm (15 hidden neurons)
Fig. 16. Result of the CasPer algorithm (12 hidden neurons) [3]
Fig. 17. Result of the CasCor algorithm (17 hidden neurons) [3]
4.3 Experiment 3: Results Comparison of Regression Tasks
4.3.1 Experiment Description
The main purpose of this experiment is to compare performance on regression tasks.
The Cascade Correlation algorithm (CasCor), the CasPer algorithm and the modified
version of the CasPer algorithm which includes a candidate pool (A_CasPer) are compared to
the Layered_CasPer algorithm on a number of data sets. This is a series of five regression
data sets of varying complexity, as originally described in [1]. The number of attributes
and the sizes of the training, validation and testing sets are the same for every data
set: 2 inputs, 1 output, 225 training patterns, 110 validation patterns and 10000 testing
patterns. The comparison in this experiment focuses on the number of hidden neurons
and the value of the fraction of variance unexplained (FVU). The FVU is the measure used
to compare performance on the test set. In its standard form, the FVU is defined as:

$$\mathrm{FVU} = \frac{\sum_{i=1}^{N} \left(\hat{f}(x_i) - f(x_i)\right)^2}{\sum_{i=1}^{N} \left(f(x_i) - \bar{f}\right)^2}$$

where $f(x_i)$ is the target value of the $i$th test pattern, $\hat{f}(x_i)$ is the
network output for that pattern, $\bar{f}$ is the mean target value and $N$ is the number
of test patterns.
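As an illustrative sketch, assuming the usual definition of the FVU (the sum of squared prediction errors over the test set divided by the sum of squared deviations of the targets from their mean), the measure can be computed as follows. The function name `fvu` and its argument names are hypothetical.

```python
def fvu(predictions, targets):
    """Fraction of variance unexplained, assuming the standard definition.

    0.0 means a perfect fit; 1.0 means the model does no better than
    always predicting the mean of the targets.
    """
    if len(predictions) != len(targets) or not targets:
        raise ValueError("predictions and targets must be non-empty and equal in length")
    mean_target = sum(targets) / len(targets)
    residual = sum((p - t) ** 2 for p, t in zip(predictions, targets))
    spread = sum((t - mean_target) ** 2 for t in targets)
    if spread == 0:
        raise ValueError("targets have zero variance, so the FVU is undefined")
    return residual / spread


# Predicting the target mean everywhere leaves all of the variance unexplained.
print(fvu([2.0, 2.0, 2.0], [1.0, 2.0, 3.0]))
```

Because the measure is normalised by the variance of the targets, values are comparable across data sets whose outputs have different scales.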
4.3.2 The Process and Evaluation of Experiment
The first comparison is a standard comparison of the FVU results of CasCor, A_CasPer and
Layered_CasPer on the Sif, Cadd, Harm and Cif data sets. Table 5 shows this comparison.
The performance of Layered_CasPer is average: it is not as good as that of A_CasPer, but
much better than that of CasCor on most data sets. On the Cadd data set, the standard
deviation of the results of Layered_CasPer is higher than those of both CasCor and
A_CasPer. The stability problem (high variance) already discussed for the classification
tasks appears again in this experiment.
Table 5. Comparison of FVUs of A_CasPer, AT_CasPer and Layered_CasPer
The second comparison focuses on the relationship between the FVU and the number of
hidden neurons installed. In this comparison, the number of training epochs is 100, the
maximum number of hidden neurons to install is 30 and, where used, the size of each
tower/layer for AT_CasPer/Layered_CasPer is 3. Two data sets are used in this
comparison, Harm without noise and Harm with noise. Fig. 18 illustrates the FVU after
each hidden neuron is added on the Harm data set without noise. It can be seen that the
performance of Layered_CasPer is better than that of CasCor and similar to that of
A_CasPer. In Fig. 19, on the Harm data set with noise, the performance of Layered_CasPer
with small numbers of hidden neurons is worse than that of CasCor and A_CasPer, until
there are “enough” hidden neurons in the network.
Fig. 18. Harm test results – noise free
Fig. 19. Harm test results – noisy
5 Potential of Layered Cascade Neural Network
This chapter presents three ideas for improving the layered cascade neural network
introduced in this thesis. The first two ideas focus on making the layered cascade neural
network “really automatically constructed” by removing the layer size selection step. The
goal of the third is to reduce the number of connections in these neural networks.
5.1 Self-Evaluating Layered Cascade Neural Network
One idea for improving the layered cascade neural network is to let the algorithm decide
whether to install the new hidden neuron in the current layer or to add it as a new layer.
This could be done with a version of the candidate pool which has six candidate neurons,
where three are candidates for the current layer and the other three are candidates for a
new layer. The network could be pre-trained with each candidate neuron installed, and the
candidate with the best validation result would then be selected for installation. This
may give better performance than the normal layered cascade neural network, but the more
complex candidate pool would have a higher computational cost. This may be acceptable, as
the size of the layers would not need to be set in the first instance.
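The selection step of this idea could look like the following sketch. It only illustrates the proposal and is not an existing implementation: `validation_error` is a hypothetical callback that pre-trains the network with the given candidate installed and returns its validation error, and the placement labels are made up for the example.

```python
def choose_candidate(validation_error, pool_size=3):
    """Pick the candidate and placement with the lowest validation error.

    validation_error(placement, candidate_id) is a hypothetical callback
    returning the validation error after pre-training with that candidate.
    """
    candidates = [(placement, candidate_id)
                  for placement in ("current_layer", "new_layer")
                  for candidate_id in range(pool_size)]
    return min(candidates, key=lambda c: validation_error(c[0], c[1]))


# Made-up validation errors standing in for real pre-training runs.
example_errors = {
    ("current_layer", 0): 0.30, ("current_layer", 1): 0.27, ("current_layer", 2): 0.31,
    ("new_layer", 0): 0.29, ("new_layer", 1): 0.22, ("new_layer", 2): 0.26,
}
best = choose_candidate(lambda placement, cid: example_errors[(placement, cid)])
print(best)
```

With a pool of three candidates per placement, each installation step requires six pre-training runs instead of three, which is the extra computational cost mentioned above.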
5.2 Random Limit Layered Cascade Neural Network
In the previous experiments, better results frequently came together with high variance.
Enhancing the randomness of the network building process is therefore one idea for
improving the layered cascade neural network. In a layered cascade neural network with
random layer size, the size of each layer is a random number whose lower and upper limits
are set by the user or left at default values. Otherwise, the working process of the
random limit layered cascade neural network would be the same as that of the layered
cascade neural network. The change would increase the variance of the results in order to
increase the probability of a better result occurring. The final step of constructing a
layered cascade neural network is selecting the structure and weights of the network
which has the best result. To solve a “real-world” problem, obtaining the best structure
is most important. The random limit layered CasPer algorithm would generate networks with
random structures, reach extreme cases frequently, and then take the best result from
those extreme cases. Fig. 20 shows a sample drawing of a random limit layered cascade
neural network without connection details.
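A minimal sketch of how the random layer sizes could be drawn is given below. The function name, the default limits of 2 and 8 (taken from the sample in Fig. 20) and the decision to truncate the last layer when the neuron budget runs out are all assumptions made for illustration.

```python
import random


def random_layer_sizes(max_hidden, lower=2, upper=8, rng=None):
    """Draw layer sizes uniformly in [lower, upper] until max_hidden is used.

    The last layer is truncated if fewer neurons remain, so it may be
    smaller than the lower limit (an assumption of this sketch).
    """
    if not 1 <= lower <= upper:
        raise ValueError("limits must satisfy 1 <= lower <= upper")
    if rng is None:
        rng = random.Random()
    sizes = []
    remaining = max_hidden
    while remaining > 0:
        size = min(rng.randint(lower, upper), remaining)
        sizes.append(size)
        remaining -= size
    return sizes


# Two runs with different seeds give differently shaped networks.
print(random_layer_sizes(15, rng=random.Random(1)))
print(random_layer_sizes(15, rng=random.Random(2)))
```

Repeating the construction with different draws and keeping the network with the best validation result corresponds to the selection of the best structure described above.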
Fig. 20. Sample of Random Limit Layered Cascade Neural Network without connection details (upper limit of random number is 8, lower limit of random number is 2)
5.3 Limited Connections Layered Cascade Neural Network
The Layered_CasPer algorithm reduces the number of connections compared to CasPer or
CasCor. To reduce the number of connections further, a limit on the maximum number of
input connections could be set. If this limit were lower than the number of inputs, some
systematic or random procedure could be used to choose the input connections for any new
neuron. This would most likely work best with a candidate pool. For neurons in the second
or later layers, connections could be biased towards previous layer neurons over inputs,
ranging from no bias (any connections) to a complete bias (only previous layer
connections, more similar to traditional feedforward neural networks).
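One possible random procedure for choosing the connections of a new neuron is sketched below. It is an illustration only: the function name, the `bias` parameter in [0, 1] and the weighting scheme (inputs are drawn with weight 1 - bias, previous layer neurons with weight 1) are assumptions, not part of the thesis.

```python
import random


def choose_sources(input_ids, previous_layer_ids, max_connections, bias, rng=None):
    """Randomly choose at most max_connections source neurons.

    bias = 0.0 treats inputs and previous layer neurons alike (no bias);
    bias = 1.0 allows only previous layer neurons (complete bias).
    """
    if not 0.0 <= bias <= 1.0:
        raise ValueError("bias must lie in [0, 1]")
    if rng is None:
        rng = random.Random()
    pool = [(i, 1.0 - bias) for i in input_ids]
    pool += [(p, 1.0) for p in previous_layer_ids]
    pool = [(node, weight) for node, weight in pool if weight > 0.0]
    chosen = []
    # Weighted sampling without replacement, one source at a time.
    while pool and len(chosen) < max_connections:
        weights = [weight for _, weight in pool]
        pick = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        chosen.append(pool.pop(pick)[0])
    return chosen


# Inputs have indices 1-3, previous layer neurons 4-6 (made-up example).
print(choose_sources([1, 2, 3], [4, 5, 6], max_connections=4, bias=0.5,
                     rng=random.Random(0)))
```

Intermediate bias values give networks between the two extremes, which would make it possible to study how much the direct input connections contribute.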
6 Conclusion
The Layered_CasPer extension of the A_CasPer algorithm, which uses a layered network
architecture, achieves performance similar to A_CasPer on most data sets and better
performance on classification data sets which have a large number of inputs. However,
Layered_CasPer does not otherwise surpass A_CasPer and AT_CasPer in terms of
generalization and convergence on regression data sets. As the computational cost is
reduced, the variance of the results of Layered_CasPer increases compared to that of the
A_CasPer algorithm. Experimental results on the two spirals problem and additional
regression problems show similar trends to those on the other benchmark data sets, with
results consistently slightly worse than those of A_CasPer. The key benefit of the
Layered_CasPer algorithm is that it performs similarly to A_CasPer and has layers. Neural
networks with layers are more familiar to neural network users, which could lead to
greater acceptance of these cascade neural networks.
References
[1] N. K. Treadgold and T. D. Gedeon, “Exploring constructive cascade networks,” IEEE Trans. Neural Netw., vol. 10, no. 6, pp. 1335–1350, Nov. 1999.
[2] S. E. Fahlman and C. Lebiere, “The cascade-correlation learning architecture,” Advances in Neural Information Processing, vol. 2, D. S. Touretzky, Ed. San Mateo, CA: Morgan Kaufmann, 1990, pp. 524–532.
[3] N. K. Treadgold and T. D. Gedeon, “A Cascade Network Employing Progressive RPROP,” Int. Work-Conf. on Artificial and Natural Neural Networks, 1997, pp. 733–742.
[4] N. K. Treadgold and T. D. Gedeon, “Extending CasPer: A Regression Survey,” Int. Conf. on Neural Information Processing, 1997, pp. 310–313.
[5] M. Riedmiller and H. Braun, “A direct adaptive method for faster backpropagation learning: The RPROP algorithm,” in Proc. ICNN 93, San Francisco, CA, 1993, pp. 586–591.
[6] N. K. Treadgold and T. D. Gedeon, “Increased Generalization through Selective Decay in a Constructive Cascade Network,” IEEE International Conference on Systems, Man, and Cybernetics, vol. 5, no. 10, pp. 4465–4469, Oct. 1998.
[7] N. K. Treadgold and T. D. Gedeon, “Exploring architecture variations in constructive cascade networks,” IEEE World Congress on Computational Intelligence, IEEE International Joint Conference on Neural Networks Proceedings, vol. 1, pp. 343–348, May 1998.
[8] Terry Regier 1988, Quickpin Program, Quickprop 1, Design of Intelligent Systems, Michigan State University, US, viewed August 2011, <www.cse.msu.edu/~cse449/spr98/demos/quickprop.c>
[9] L. Su and S.-U. Guan, “Two-dimensional extensions of cascade correlation networks,” in The Fourth International Conference on High-Performance Computing in the Asia-Pacific Region, vol. 1, IEEE, 2000, pp. 138–141.
[10] Masayuki Murakami and Nakaji Honda, “Classification Performance of the IDS Method Based on the Two-Spiral Benchmark,” 2005 IEEE International Conference on Systems, Man and Cybernetics, Oct. 2005.
[11] Chihiro Ikuta, Yoko Uwate and Yoshifumi Nishio, “Multi-Layer Perceptron Having Neuro-Glia Network,” 2010 International Symposium on Nonlinear Theory and its Applications (NOLTA2010), Krakow, Poland, September 5-8, 2010.
[12] C. Ikuta, Y. Uwate and Y. Nishio, “Chaos Glial Network Connected to Multi-Layer Perceptron for Solving Two-Spiral Problem,” Proc. ISCAS’10, May 2010.
[13] C. T. Sun and J. S. Jang, “A neuro-fuzzy classifier and its applications,” in Proc. IEEE Int. Conf. Fuzzy Syst., San Francisco, CA, vol. I, pp. 94–98, Mar. 1993.
[14] Gao Daqi, Li Hao and Yang Yunfan, “Task Decomposition and Modular Perceptrons with Sigmoid Activation Functions for Solving the Two-Spirals Problem,” Networking, Sensing and Control, 2006 (ICNSC), pp. 218–223, 2006.
Appendix A: Screenshots of Cascade Neural Network Toolbox v1.2
Output: Diagram of Training Errors
Output: Hinton diagram of the weights of the network
Output: Statistics information for each new neuron added
Output: Final performance statistics of the network which has the best validation/testing result
Appendix B: Experiment Results
Experiment Results: Classification Task