
Layered Cascade Artificial Neural Network

A thesis submitted for the degree of

Master of Computing in Computer Science of

The Australian National University

by

Tengfei Shen

<[email protected]>

Supervisor: Prof. Tom Gedeon

Research School of Computer Science

College of Engineering & Computer Science

The Australian National University

November 2011

Tengfei Shen (4981890)


Acknowledgement

I am heartily thankful to my supervisor, Prof. Tom Gedeon, whose encouragement, guidance and support helped me in many respects during the completion of the project.

Tengfei Shen


Abstract

Constructive algorithms have proved to be powerful methods for training feedforward

neural networks. The CasPer algorithm is a constructive neural network algorithm: it

generates networks from a simple architecture and then expands it. The A_CasPer

algorithm is a modified version of the CasPer algorithm which uses a candidate pool

instead of a single neuron being trained. This project adds an extension to the A_CasPer

algorithm in terms of the network architecture: the Layered_CasPer algorithm. The

hidden neurons are arranged in layers in the new version of the network structure, which

reduces the computational cost required. Beyond the network structure, other aspects

of Layered_CasPer are the same as A_CasPer. The Layered_CasPer algorithm extension is

benchmarked on a number of classification and regression problems and compared to

other constructive algorithms, which are CasCor, CasPer, A_CasPer, and AT_CasPer. It is

shown that Layered_CasPer performs better on classification data sets which have a large

number of inputs. The Layered_CasPer algorithm has an advantage

over other cascade style constructive algorithms in being more similar in topology to the

familiar layered structure of traditional feedforward neural networks. This may lead to

good acceptance of this technique.

Furthermore, an implementation of CasPer, A_CasPer, AT_CasPer and Layered_CasPer is

presented in this thesis. At the end of the thesis, two new ideas for improving the

Layered_CasPer algorithm are suggested for future work.



Table of Contents

1 Introduction
1.1 Motivation
1.2 Objective of Project
1.3 Contribution
1.4 Report Organization
2 Relevant Techniques and Concepts
2.1 Cascade Correlation Algorithm (CasCor Algorithm)
2.2 CasPer Algorithm & A_CasPer Algorithm
2.3 AT_CasPer Algorithm
3 Implementation of Layered Cascade Neural Network
3.1 Layered Cascade Neural Network
3.2 Program Introduction
3.3 Key Techniques
3.4 Testing of Program
4 Evaluation of Layered CasPer Neural Network
4.1 Experiment 1: Results Comparison of Classification Tasks
4.1.1 Experiment Description
4.1.2 The Process and Evaluation of Experiment
4.2 Experiment 2: Two Spirals Benchmark
4.2.2 Evaluation of Experiment
4.3 Experiment 3: Results Comparison of Regression Tasks
4.3.2 The Process and Evaluation of Experiment
5 Potential of Layered Cascade Neural Network
5.1 Self-Evaluating Layered Cascade Neural Network
5.2 Random Limit Layered Cascade Neural Network
5.3 Limited Connections Layered Cascade Neural Network
6 Conclusion
References
Appendix A: Screenshots of Cascade Neural Network Toolbox v1.2
Appendix B: Experiment Results



List of Figures

Fig. 1 The Cascade architecture, initial state and after adding two hidden units
Fig. 2 The CasPer structure (Fahlman, 1990)
Fig. 3 A cascade tower architecture with a tower size of 3 (Treadgold & Gedeon, 1998)
Fig. 4.1 New hidden neuron installation comparison between CasPer and Layered_CasPer
Fig. 4.2 New hidden neuron installation comparison between CasPer and Layered_CasPer in Fahlman display method
Fig. 5 Sample of net file
Fig. 6 Working process of the program
Fig. 7.1 Sample of the connection matrix
Fig. 7.2 Network Structure Representations of Fig. 7.1
Fig. 8 Glass hidden neuron results
Fig. 9 Soybean hidden neuron results
Fig. 10 Soybean Test Results
Fig. 11 Cancer Test Results
Fig. 12 Connections in a given size network
Fig. 13 The two spirals training set
Fig. 14 The two spirals testing set
Fig. 15 Result of the Layered_CasPer algorithm
Fig. 16 Result of the CasPer algorithm (Treadgold & Gedeon, 1998)
Fig. 17 Result of the CasCor algorithm (Treadgold & Gedeon, 1998)
Fig. 18 Harm test results - noise free
Fig. 19 Harm test results - noisy
Fig. 20 Sample of Random Limit Layered Cascade Neural Network without connection details



List of Tables

Table 1 Function Table
Table 2 Attributes of Proben 1 data set used in the experiment
Table 3 Comparison of A_CasPer, AT_CasPer, Layered_CasPer1 (without candidate pool) and Layered_CasPer2 (with candidate pool)
Table 4 Test results on data sets with large number of inputs
Table 5 Comparison of FVUs of A_CasPer, AT_CasPer and Layered_CasPer



1 Introduction

1.1 Motivation

Model selection is a central challenge in the field of feedforward neural networks: it involves

matching the complexity of the model to the complexity of the function being approximated. The

factors that determine the complexity of a model are its connection topology and the number

and magnitude of its weights. Underfitting and poor generalization happen if a model does not have

enough complexity to approximate the target function. Conversely, overfitting and poor

generalization occur if a model is too complex. There are three groups of selection

techniques: “those that perform a search through models, those that begin with an overly

complex model which is then simplified, and those that begin with a simple model whose

complexity is increased” [1]. Cascade Correlation (CasCor) [2], CasPer [3], [4], A_CasPer

[1] and Layered_CasPer are four constructive algorithms that start from a small initial

network, so they all belong to the third group. These algorithms spend less network

training time than algorithms which start training with an oversized network. They

also tend to avoid the problem of encountering poorly performing local minima.

Layered_CasPer was suggested by Tom Gedeon in 2011. This constructive algorithm

provides an improvement in understandability of the connection method of the network

as it is more similar to the familiar layered structure of many neural network models. The

hidden neurons are arranged in layers in the new network architecture. For the same

number of hidden neurons, a Layered_CasPer network has fewer connections than a CasPer network, so

its computational cost is lower. The motivation of this project is to implement and

evaluate this proposed algorithm.

1.2 Objective of Project

The aim of this project is to understand, implement and evaluate the Layered_CasPer

constructive algorithm. Matlab is selected as the programming platform to achieve the

objective of this project.

1.3 Contribution

A Matlab program implementing the Layered_CasPer constructive learning algorithm

is the main contribution of this report. Another significant contribution

of the report is a series of experiments for evaluating the Layered_CasPer constructive

learning algorithm.



1.4 Report Organization

Chapter 2 gives an overview of the relevant techniques and concepts. Chapter 3

introduces the Layered_CasPer algorithm and the program which can implement

Layered_CasPer as well as some comparison algorithms. Chapter 4 describes three

experiments which evaluate the Layered_CasPer algorithm. Chapter 5 introduces

two ideas for improving the Layered_CasPer algorithm, and Chapter 6 is

the conclusion of this thesis.

2 Relevant Techniques and Concepts

2.1 Cascade Correlation Algorithm (CasCor Algorithm)

Cascade-Correlation is a constructive, supervised learning algorithm for neural

networks. It was introduced by Scott Fahlman and Christian Lebiere in 1991.

Cascade-Correlation starts with a minimal network, then repeatedly trains and installs new

hidden neurons one by one, generating a multi-layer topology instead of just adjusting

the weights in a network with a fixed topology. A notable feature of this

algorithm is that a new hidden neuron’s input weights are frozen once it has been

installed into the network. This unit then becomes a fixed unit in the network, whose

output is available as an input to later, more complex units.

As shown in Fig. 1, the Cascade-Correlation architecture begins with the preset

inputs and outputs but without hidden neurons. The initial structure of the network is

dictated by the problem and by the I/O representation chosen by the

experimenter. There is also a bias input, which is permanently set to +1. Hidden neurons are

installed into the network one by one. Every new hidden neuron receives a connection from

each previous hidden neuron and from each of the original inputs of the network. The input

weights of an installed hidden neuron are frozen; only the output connections are trained.



Fig. 1. The Cascade architecture, initial state and after adding two hidden units. The

vertical lines sum all incoming activation. Boxed connections (□) are frozen, X

connections (x) are trained repeatedly [2]



The Cascade-Correlation architecture has several advantages over existing algorithms: “It

learns very quickly, the network determines its own size and topology, it retains the

structures it has built even if the training set changes, and it requires no back-

propagation of error signals through the connections of the network” [2].

2.2 CasPer Algorithm & A_CasPer Algorithm

The CasPer algorithm was introduced by Nick Treadgold and Tom Gedeon in 1996. As a

constructive neural network algorithm, CasPer builds network structures in a similar way

to Cascade Correlation: both begin with a small network and successively install

hidden neurons. The main distinction between CasPer and Cascade Correlation is the

training method. As previously mentioned, the hidden neurons’ input weights are frozen

and only the output connections are trained in Cascade Correlation, whereas CasPer

trains all connections of the network.

A further difference between CasPer and Cascade Correlation is that CasPer uses a modified

version of the RPROP algorithm, Progressive RPROP, to train the network after each new hidden neuron is added.

RPROP is a gradient descent algorithm using individual adaptive learning rates for each

weight, which starts with an initial learning rate that is then adapted based on the sign of

the error gradient seen by that weight as it traverses the error surface [4]. Fig. 2 shows that

the weights of the network are separated into three groups, and each group has its own initial learning

rate: LR1, LR2 and LR3. The first group includes all weights which connect to the new

neuron from previous hidden neurons and inputs. The second group is made up of all

weights that connect the output of the new hidden neuron to the outputs. The third group

consists of the rest of the weights. The relationship between the magnitudes of LR1, LR2

and LR3 is LR1 >> LR2 > LR3. The effect is similar to that of the correlation measure of Cascade

Correlation: the high value of LR1 allows the new hidden neuron to learn the remaining

network error. Similarly, the high value of LR2 compared to LR3 allows the new

hidden neuron to reduce the network error without excessive interference from other

weights.
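
The grouping of weights described above can be sketched as follows. This is a minimal Python illustration rather than the Matlab code of this project; the function name weight_group, the neuron labels and the learning-rate values are assumptions chosen only so that LR1 >> LR2 > LR3 holds.

```python
def weight_group(src, dst, new_hidden):
    """Return the learning-rate group (1, 2 or 3) of the weight src -> dst.

    new_hidden is the label of the most recently installed hidden neuron.
    """
    if dst == new_hidden:
        return 1  # weights entering the new neuron (from inputs and older hidden neurons)
    if src == new_hidden:
        return 2  # weights from the new neuron to the outputs
    return 3      # all remaining weights


# Assumed illustrative values only; they merely satisfy LR1 >> LR2 > LR3.
INITIAL_LR = {1: 0.2, 2: 0.005, 3: 0.001}

# Hypothetical network: inputs x1, x2; hidden h1 (old) and h2 (new); output y.
connections = [("x1", "h1"), ("x2", "h1"), ("x1", "h2"), ("x2", "h2"),
               ("h1", "h2"), ("x1", "y"), ("x2", "y"), ("h1", "y"), ("h2", "y")]
groups = {c: weight_group(c[0], c[1], new_hidden="h2") for c in connections}
```

Each weight would then start its RPROP adaptation from INITIAL_LR[groups[connection]] after the new neuron is installed.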



Fig. 2. The CasPer structure – a second hidden neuron has

just been added. The vertical lines sum all incoming values.

(This display method was introduced by Fahlman 1990)

A_CasPer is a modified version of the CasPer algorithm with the following modifications.

First, there is a candidate pool of hidden neurons trained instead of a single hidden

neuron. Each hidden neuron in the pool is separately connected to the network in the

usual manner of CasPer. Each hidden neuron in the candidate pool has its own training

process and weights. Finally, the network with the best generalization performance is

selected, and its weights are kept. A new candidate pool is then generated and the process

is repeated until the convergence criterion is satisfied. Second, a

different decay level is used for the network each time a new neuron from the pool is

inserted in that process.
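
The candidate pool step can be sketched as follows. This is a minimal Python outline under stated assumptions, not the A_CasPer implementation: train_candidate and validation_error are hypothetical callables standing in for the real training and testing routines, and the seed argument stands in for whatever differs between candidates (initial weights, decay level).

```python
import random


def grow_with_candidate_pool(network, pool_size, train_candidate, validation_error):
    """Train pool_size candidates independently and keep the best one.

    train_candidate(network, seed) -> new network with one more hidden neuron
    validation_error(network)      -> float, lower is better
    Both callables are hypothetical placeholders.
    """
    best_net, best_err = None, float("inf")
    for seed in range(pool_size):
        candidate = train_candidate(network, seed)  # own weights, own training run
        err = validation_error(candidate)
        if err < best_err:
            best_net, best_err = candidate, err
    return best_net, best_err


# Toy usage: a "network" is just a list of per-neuron quality scores.
def toy_train(network, seed):
    rng = random.Random(seed)
    return network + [rng.random()]


def toy_error(network):
    return 1.0 / (1.0 + sum(network))


net, err = grow_with_candidate_pool([], pool_size=3,
                                    train_candidate=toy_train,
                                    validation_error=toy_error)
```

Calling the function repeatedly on its own result grows the network one neuron at a time, each time from a fresh pool.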

The CasPer algorithm has been shown to create networks with fewer hidden units than

the CasCor algorithm, and also has better generalization [5]. “A_CasPer is generally able

to improve generalization results compared to CasPer using optimized decay levels. This

is especially apparent in the data sets containing noise, where A_CasPer not only obtains

better generalization results, but are also able to avoid overfitting as the network

continues to grow”[6].



These four algorithms are compared in the experiments of this report because they

are all constructive algorithms that use a cascade network architecture and have shown good

evaluation results. Of CasPer, A_CasPer and AT_CasPer, the A_CasPer algorithm performs best

for classification and regression tasks, so the most relevant comparison for

Layered_CasPer is the A_CasPer algorithm.

2.3 AT_CasPer Algorithm

The AT_CasPer algorithm is a modified version of the CasPer algorithm which uses a series

of cascade towers instead of a single cascade of hidden neurons to build the network. The

main aim of this algorithm is to limit the network depth. AT_CasPer trains the network

in the same manner as CasPer. Each hidden neuron receives a connection from the

inputs and connects to the outputs. Hidden neurons connect to each other only within the

same tower. When the maximum cascade depth is reached, the next hidden neuron

begins a new cascade tower. There are no connections between towers. An example of this

structure is shown in Fig. 3. The AT_CasPer algorithm produces slightly worse results

than A_CasPer but reduces the computational cost.

Fig. 3. A cascade tower architecture with a tower size of 3 [7]
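
The tower connection rule of this section can be sketched as a small Python function. The 0-based numbering of hidden neurons in installation order and the name tower_sources are assumptions made for illustration.

```python
def tower_sources(k, tower_size):
    """Hidden neurons (0-based installation order) that feed hidden neuron k.

    Only earlier neurons in the same tower are connected to neuron k;
    inputs are connected to every hidden neuron and are not listed here.
    """
    tower_start = (k // tower_size) * tower_size
    return list(range(tower_start, k))
```

With a tower size of 3, tower_sources(2, 3) lists the two earlier neurons of the first tower, while tower_sources(3, 3) is empty because neuron 3 begins a new tower.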



3 Implementation of Layered Cascade Neural Network

3.1 Layered Cascade Neural Network

The Layered Cascade model, suggested by Tom Gedeon, is a modified version of the CasPer

algorithm for constructing networks. Layered_CasPer builds cascade networks in a similar manner to

CasPer: it begins with a simple architecture, installs single hidden

neurons successively, and uses the RPROP gradient descent algorithm to train the whole

network each time a hidden neuron is installed. The candidate pool can also be used in

Layered_CasPer. The maximum size of each layer is a very important parameter and must

be set first.

This modification of CasPer focuses on the architecture of the network. In the layered

cascade neural network, the hidden neurons are arranged in layers and there are no connections

between neurons in the same layer. Fig. 4.1 compares how a new hidden neuron is

added in CasPer and in Layered_CasPer. Fig. 4.2 illustrates the

same comparison in Fahlman’s display method. New neurons are added beside previous

neurons until the layer reaches its size limit; the next neuron then starts a new layer. Each new neuron receives a

connection from each of the network’s original inputs and from every hidden neuron of each

pre-existing layer. Within a layer, every hidden neuron has the same connection pattern

and the neurons do not connect to each other. The CasPer neural network is therefore a

special case of a Layered_CasPer neural network in which the size of the layer is 1, and

Layered_CasPer is effectively a CasPer network which copies each hidden neuron several

times. The advantage of the Layered_CasPer algorithm that is known before any

experiments is that it requires fewer connections than the CasPer algorithm for

the same number of hidden neurons. The reduction in the number of connections can be

calculated as:

𝐶 = (𝑁/𝑆) ∗ (𝑆 − 1) + (𝑁 𝑚𝑜𝑑 𝑆) − 1

where N is the number of installed neurons and S is the size of each layer. That means the

computational cost of Layered_CasPer is much lower than CasPer’s given a network with

a large number of hidden neurons. Of course, fewer connections tend to reduce the power

of the network and may affect its ability to generalize. This will be evaluated by

experiments.
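
The difference in connection counts can also be obtained by counting directly from the connection rule stated above, rather than from the closed-form expression. The following Python sketch counts only hidden-to-hidden connections, assuming that input and output connections are the same in both architectures; the function names are hypothetical.

```python
def cascade_hidden_links(n_hidden):
    """Hidden-to-hidden connections in a full cascade (CasPer-style) network."""
    return n_hidden * (n_hidden - 1) // 2


def layered_hidden_links(n_hidden, layer_size):
    """Hidden-to-hidden connections when neurons fill layers of layer_size."""
    total = 0
    for k in range(n_hidden):                # k is the 0-based installation order
        neurons_in_earlier_layers = (k // layer_size) * layer_size
        total += neurons_in_earlier_layers   # one link from each earlier-layer neuron
    return total
```

For example, with 15 hidden neurons and a layer size of 3, the cascade count is 105 and the layered count is 90. With a layer size of 1 the two counts coincide, which matches the observation that CasPer is the special case of Layered_CasPer with single-neuron layers.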



Fig. 4.1 New hidden neuron installation comparison between CasPer and Layered_CasPer



Fig. 4.2 New hidden neuron installation comparison between CasPer and Layered_CasPer in

Fahlman’s display method



3.2 Program Introduction

The program implementing Layered_CasPer is written in Matlab

and is named “Cascade Neural Network Toolbox”. It has more

than 2000 lines of code, and development took over 8 weeks with 3

releases. The program allows users to design their experimental tasks, including setting the

task type, training algorithm, training cycles, the number of times to run tasks and so

on. The final performance statistics can be displayed as tables and diagrams and also

saved as a csv file.

What the program can do:

- Implement CasPer, A_CasPer, AT_CasPer and Layered_CasPer cascade neural networks.
- Display statistics for each stage of the network building process, including
  - Number of epochs trained
  - Number of installed hidden neurons
  - Number of cross connections
  - The best training root-mean-square error (RMSE)
  - RMSE of validation set
  - Correct percent for classification or fraction of variance unexplained (FVU) for regression, on the validation set
  - RMSE of test set
  - Correct percent for classification or fraction of variance unexplained (FVU) for regression, on the test set
  - Training epoch-RMSE curve diagram
- Save the above statistics as a csv file.
- Display the final performance statistics and write them to a csv file, including
  - Number of total epochs trained
  - Number of installed hidden neurons
  - Number of cross connections
  - The best training root-mean-square error (RMSE)
  - RMSE of validation set
  - Correct percent for classification or fraction of variance unexplained (FVU) for regression, on the validation set
  - RMSE of test set
  - Correct percent for classification or fraction of variance unexplained (FVU) for regression, on the test set
- Save the above statistics as a csv file.
- Save the matrix of final weights as a csv file.
- Read a weight matrix from a csv file into the current weight matrix.
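
The two error statistics named in the list above can be sketched in Python as follows. The sketch assumes a single output value per pattern and the usual definition of FVU as residual sum of squares divided by the total sum of squares of the targets; how the toolbox averages over multiple outputs is not specified here, so that part is an assumption.

```python
import math


def rmse(targets, outputs):
    """Root-mean-square error over a set of patterns (single output assumed)."""
    n = len(targets)
    return math.sqrt(sum((t - o) ** 2 for t, o in zip(targets, outputs)) / n)


def fvu(targets, outputs):
    """Fraction of variance unexplained: residual SS / total SS of the targets."""
    mean_t = sum(targets) / len(targets)
    residual = sum((t - o) ** 2 for t, o in zip(targets, outputs))
    total = sum((t - mean_t) ** 2 for t in targets)
    return residual / total
```

Under this definition a model that always outputs the mean of the targets has an FVU of 1, and a perfect model has an FVU of 0.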



All information and data for a task are read from a .net file which is specified by the

user. This .net file is an extension of the .net file used in the original Quickprop

implementation of Regier [8]. It contains the neuron output type, the initial network

architecture and attribute information, and the training set, validation set and testing set of a

dataset. Fig. 5 shows a sample .net file. The second line describes the initial

architecture of the network, including the number of input units, hidden units and output

units. The third line defines the output type of the neurons: 1 represents a sigmoid

whose output ranges from -0.5 to +0.5, and 2 represents an asymmetric sigmoid whose output ranges from 0.0 to 1.0.

The fourth and fifth lines contain the connection parameters which are used to

build CasPer, A_CasPer and AT_CasPer cascade neural networks. The following lines

describe the training set, validation set and testing set with their number of patterns.

Fig. 5. Sample of net file
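
The two neuron output types can be sketched in Python as follows. The type codes 1 and 2 follow the description of the .net file; the function names mirror ACTIVATION and ACTIVATION_PRIME only loosely, and the clamping of the input sum is an assumption added to avoid numeric overflow, not a documented feature of the program.

```python
import math

SIGMOID, ASYMSIGMOID = 1, 2  # output type codes as described for the .net file


def activation(total, kind):
    """Activation value for a weighted input sum."""
    total = max(-50.0, min(50.0, total))      # assumed clamp against overflow
    logistic = 1.0 / (1.0 + math.exp(-total))
    if kind == SIGMOID:
        return logistic - 0.5                 # output in (-0.5, +0.5)
    if kind == ASYMSIGMOID:
        return logistic                       # output in (0.0, 1.0)
    raise ValueError("unknown neuron type")


def activation_prime(value, kind):
    """Derivative of the activation with respect to the sum, from its value."""
    if kind == SIGMOID:
        return 0.25 - value * value
    if kind == ASYMSIGMOID:
        return value * (1.0 - value)
    raise ValueError("unknown neuron type")
```

Both derivatives are written in terms of the activation value, so the backward pass can reuse the outputs stored during the forward pass.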



Beyond the net file, the user needs to set other parameters for an experiment task:

- Task Type: Classification or Regression
- Algorithm: CasPer, A_CasPer, AT_CasPer or Layered_CasPer
- Training Epoch Limit
- Candidate Pool: Install new hidden neurons by using the candidate pool or not
- Maximum size of layer for Layered_CasPer or size of tower for AT_CasPer
- Maximum number of neurons to install
- The number of times to run tasks

The main working process of the program is shown in Fig. 6. First, the program reads a

net file and generates all required data according to the parameters set by the user. Then the

related functions construct the network and initialize it. After building the network, the

training stage starts, and the program trains the network repeatedly using the RPROP

training algorithm until it reaches the maximum number of training epochs. In the next stage,

testing, the program tests the network using the validation set and the testing set and

returns the output RMSE. If the output error is worse than the expected RMSE and the

network has not reached the maximum number of neurons to install, a new hidden neuron

is added to the current network. Before a new hidden neuron is added, the program

checks whether the current layer is full. If it is full, a new layer is generated

and the new neuron is added to it as its first neuron. All related parameters are updated

when a new hidden neuron or layer is added. Beyond the normal parameters, these

include the learning rates of the different groups of weights. Then the program builds,

trains and tests the network again. The building, training, testing and adding process

continues until the network reaches the maximum number of installed neurons or its

testing RMSE is smaller than the target RMSE. Finally, the program displays the final

performance statistics for this run and saves them as a csv file.
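
The control flow described above can be outlined in Python as follows. This is a skeleton under stated assumptions and not the toolbox code: train and test_rmse are hypothetical callables, the network is reduced to a list of layer sizes, and starting from a single hidden neuron is an assumed initial architecture.

```python
def build_layered_network(train, test_rmse, target_rmse, max_neurons, layer_size):
    """Skeleton of the build, train, test, add loop.

    train(layers)     -> None; trains the network (hypothetical placeholder)
    test_rmse(layers) -> float; RMSE of the current network (hypothetical)
    layers is a list of layer sizes, for example [3, 3, 1].
    """
    layers = [1]                          # assumed start: one hidden neuron
    while True:
        train(layers)
        error = test_rmse(layers)
        if error <= target_rmse or sum(layers) >= max_neurons:
            return layers, error
        if layers[-1] >= layer_size:      # current layer is full
            layers.append(1)              # new neuron becomes the first of a new layer
        else:
            layers[-1] += 1               # new neuron joins the current layer


# Toy usage: the error shrinks as neurons are added; no real training is done.
final_layers, final_error = build_layered_network(
    train=lambda layers: None,
    test_rmse=lambda layers: 1.0 / (1 + sum(layers)),
    target_rmse=0.0, max_neurons=7, layer_size=3)
```

In the toy run the target error is never reached, so the loop stops at the neuron limit with the layer sizes [3, 3, 1].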



Fig. 6. Working process of the program

[Flowchart. Start; get architecture parameters, training set, validation set and testing set from a net file (related functions: GET_NETWORK_CONFIGURATION, BUILD_DATA_STRUCTURES); build and initialize network (related functions: CONNECT_LAYERS, CONNECT_LAYERS2); training (related functions: TRAIN, TRAIN_ONE_EPOCH, FORWARD_PASS, BACKWARD_PASS, ACTIVATION, ACTIVATION_PRIME, ERRFUN, UPDATE_WEIGHTS); testing (related functions: FORWARD_PASS, ACTIVATION, ERRFUN); decision: reach expected error?; decision: reach maximum number of neurons to install?; decision: is current layer full?; install a new hidden neuron, as a new layer if the current layer is full (related functions: CHANGE_NET_CONFIG); display performance statistics (related functions: PRINT_STATS, PRINT_OUT, PRINT_OUT2); end.]



All functions of the program with their descriptions are given in the function table:

Function Table

Function Name: Description

Main Functions:

GET_NETWORK_CONFIGURATION(FileName)
    Get the parameters of the network from a .net file and initialize it.
    Get the training set, validation set and test set from the .net file.

BUILD_DATA_STRUCTURES(ninputs, nhidden, noutputs)
    Sub-function of GET_NETWORK_CONFIGURATION.
    Set the parameters of the network: number of units, inputs, outputs and hidden neurons, and the indices of the first hidden neuron and the first output neuron in the connection matrix.
    Install the bias unit into the network.

CONNECT_LAYERS(start1, end1, start2, end2)
    Build the CasPer and A_CasPer network from the set parameters.
    Connect layer (neurons from start1 to end1) to layer (neurons from start2 to end2) and generate the connection matrix.
    Generate the weights and slopes matrices.

CONNECT_LAYERS2(end)
    Directly build the whole Layered_CasPer network from the set parameters.

TRAIN()
    Train the network until the error plateaus.

TRAIN_ONE_EPOCH()
    Sub-function of TRAIN().
    Perform forward and back propagation once for each pattern in the training set, collecting deltas. Then update the weights.

FORWARD_PASS(input)
    Perform the forward pass of the backpropagation algorithm and return the output of each neuron.

BACKWARD_PASS(goal)
    Goal is a matrix of desired values for the output neurons. Propagate the error back through the net, accumulating weight deltas.

ACTIVATION(sum, type)
    Sub-function of FORWARD_PASS(input).
    Given the sum of weighted inputs, compute the unit's activation value.
    Defined neuron types are SIGMOID and ASYMSIGMOID.

ACTIVATION_PRIME(value, type)
    Sub-function of BACKWARD_PASS(goal).
    Given the sum of weighted inputs and the neuron's activation value, compute the derivative of the activation with respect to the sum.
    Defined neuron types are SIGMOID and ASYMSIGMOID.

ERRFUN(desired, actual)
    Compute the squared error for one output neuron.

UPDATE_WEIGHTS()
    Update all weights of the network using each weight's current slope, previous slope, and the size of the last jump.

TEST(print)
    Test the current network and return the error rate for classification or the FVU for regression.

CHANGE_NET_CONFIG()
    Install a new neuron into the current network.
    Update the parameters of the network and reconstruct the network.

Other Functions:

RESTORE_WEIGHTS()
    Restore the previous weights for each neuron.

RANDOM_WEIGHT(range)
    Generate a double between -range and +range for initial weights.

CLEAR_SLOPES()
    Save the current slope matrix as the previous slope matrix and clear the current slope matrix.

RESET_PARAMS(group)
    Reset some parameters.

WEIGHT_SAVE()
    Save the current weight matrix.

DUMP_WEIGHTS(fname)
    Write the current weight matrix into a .csv file.

GET_WEIGHTS(fname)
    Read a weight matrix from a .csv file as the current weight matrix.

PRINT_OUT() / PRINT_OUT2() / PRINT_STATS()
    Display the performance statistics after a new neuron is added, including
    - Number of epochs trained
    - Number of installed hidden neurons
    - Number of connections
    - The best training root-mean-square error (RMSE)
    - RMSE of validation set
    - Correct percent for classification or fraction of variance unexplained (FVU) for regression, on the validation set
    - RMSE of test set
    - Correct percent for classification or fraction of variance unexplained (FVU) for regression, on the test set
    PRINT_STATS prints the same information for the network which has the best validation result.

INITIALIZE_GLOBALS()
    Reset all global variables.

Table 1. Function Table

3.3 Key Techniques

How to represent a neural network is the core of the program. Fig. 7.1 illustrates the

network representation used by this program. The network is

represented as a matrix, called the connection matrix. This matrix contains the

connection information of all hidden neurons: cell(x,y) records a connection from the yth neuron

to the xth neuron. For example, in Fig. 7.1, cell(4,2) represents the connection

between the 2nd input and the 2nd hidden neuron. To avoid searching across both dimensions,

the indices of the source neurons are stored in the cells themselves, and an array holding the number of

connections of each neuron supports the search. For instance, suppose the program needs the

connection information of the 2nd hidden neuron. Each neuron has its own index, which

is set as a global variable, and the index of the 2nd hidden neuron is 4. First, the program

obtains the number of connections of the 2nd hidden neuron from the “Nconnection Array”, where it

is recorded in the 4th place; it is 3. Then the program takes the 4th array of the connection

matrix and collects its first 3 cells: 0, 1 and 2. The program now knows that the 2nd

hidden neuron connects to the bias input, the 1st input and the 2nd input. Fig. 7.2 shows the

network architecture of the sample in Fig. 7.1.
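
The lookup described above can be sketched in Python as follows. Only the row of the 2nd hidden neuron (index 4, sources 0, 1 and 2) is taken from the text; the rest of the example network, the 0-based indexing, the padding value and the name sources_of are assumptions made for illustration.

```python
PAD = -1  # assumed marker for unused cells in a row

# Assumed indices: 0 = bias, 1-2 = inputs, 3-4 = hidden neurons, 5 = output.
# Each row lists the indices of the neurons feeding that neuron, then padding.
connection_matrix = [
    [PAD, PAD, PAD, PAD, PAD],  # 0: bias receives no connections
    [PAD, PAD, PAD, PAD, PAD],  # 1: 1st input
    [PAD, PAD, PAD, PAD, PAD],  # 2: 2nd input
    [0, 1, 2, PAD, PAD],        # 3: 1st hidden neuron
    [0, 1, 2, PAD, PAD],        # 4: 2nd hidden neuron, same layer as the 1st
    [0, 1, 2, 3, 4],            # 5: output neuron (hypothetical)
]

# Number of valid cells in each row (the "Nconnection Array" of the text).
n_connections = [0, 0, 0, 3, 3, 5]


def sources_of(neuron):
    """Return the indices of the neurons connected into the given neuron."""
    count = n_connections[neuron]
    return connection_matrix[neuron][:count]
```

Here sources_of(4) reads the count 3 from n_connections and returns the first three cells of row 4, so no scan over the padded cells is needed.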

Fig. 7.1 Sample of the connection matrix1

Fig. 7.2 Network Structure Representations of Fig. 7.1

* Array of number of connections of neurons



3.4 Testing of Program

The program was tested using standard software testing methods, mainly

white box testing. Beyond the white box testing of individual

functions, 20 testing tasks were run on the whole program with fixed initial weights,

and all outputs of these tasks were checked for correctness. The program implements CasPer,

A_CasPer, AT_CasPer and Layered_CasPer strictly according to the principles of each algorithm as

described in the available published documents.

4 Evaluation of Layered CasPer Neural Network

4.1 Experiment 1: Results Comparison of Classification Tasks

4.1.1 Experiment Description

The main goal of this experiment is to compare performance on classification tasks.

The Cascade Correlation (CasCor) algorithm, the CasPer algorithm, the A_CasPer

algorithm, the AT_CasPer algorithm and a modified version of Layered_CasPer which

does not include the candidate pool are compared to the Layered_CasPer algorithm

introduced in this thesis, on some data sets from Proben1.

The Proben1 data sets are a collection of “real world” data sets and consist of ten

classification and four regression tasks [1]. Table 2 contains the attribute information for

each dataset.

The comparison in this experiment focuses on the number of hidden neurons, connection

crossings and error rate percentage.

Table 2. Attributes of the Proben1 data sets used in the experiment



4.1.2 The Process and Evaluation of Experiment

Firstly the Layered_CasPer algorithm is compared to the A_CasPer algorithm and the

AT_CasPer algorithm on Proben1 data sets in terms of the average number of hidden

neurons for which the network gets the best result, average connection crossings and

mean of test error percentage.

Each algorithm is trained for 100 epochs and may install at most 15 hidden neurons;

where used, the size of each tower/layer and of the candidate pool is 3 for

AT_CasPer/Layered_CasPer. Two versions of Layered_CasPer appear in

this comparison: Layered_CasPer1 does not use a candidate pool, while

Layered_CasPer2 does. Table 3 shows the test results of A_CasPer, AT_CasPer,

Layered_CasPer1 and Layered_CasPer2. It is clear that Layered_CasPer1 is not as good as

Layered_CasPer2 but has fewer connection crossings. The difference can be attributed to the

candidate pool, which helps the training process produce the best performance on the data

sets Card, Gene, Horse, Soybean, Glass and Heartc. Interestingly, the first four of

these are the four data sets with the largest number of inputs. Therefore a

further comparison is made on these four data sets, with the CasCor algorithm added.

Table 4 shows this comparison, with the results of the CasCor algorithm taken from Treadgold

and Gedeon’s paper [1]. These results suggest that Layered_CasPer may perform well

on data sets which have a large number of inputs; however, more experiments

are needed to demonstrate that conclusively. For the Heart data set, the result of

the Layered_CasPer algorithm is better than that of the Pym-Tower algorithm [9]. Only in

the average number of hidden neurons does Layered_CasPer have a higher value than

the other algorithms. A high number of hidden neurons may lead to poorer

generalization.



Table 3. Comparison of A_CasPer, AT_CasPer, Layered_CasPer1 (without candidate pool) and Layered_CasPer2 (with candidate pool)



Table 4. Test results on data sets with large number of inputs

Another comparison in this experiment focuses on the number of hidden neurons for the

network which gets the best result and on the test error percentage. It compares the

A_CasPer, AT_CasPer and Layered_CasPer algorithms on these two

aspects. For the hidden neuron number comparison, the data sets Glass (9 inputs), Cancer (9

inputs) and Soybean (82 inputs) are used. As Fig. 8 shows, Layered_CasPer converges better

than AT_CasPer but worse than A_CasPer, though its overall performance

is good. This suggests that, on the Glass data set, the number of hidden neurons at which

Layered_CasPer reaches its best validation result may be relatively steady. In Fig. 9, it can be

seen that although some results of Layered_CasPer are much better than those of AT_CasPer and

A_CasPer, the variance of its results is large. Figs. 10 and 11 show that Layered_CasPer also has a

large variance in test error percentage on Soybean and normal centrality on Cancer. From

this comparison, it seems that Layered_CasPer may have a large variance in the number

of hidden neurons for data sets which have a large number of inputs. Fig. 12 illustrates the

comparison of the number of connections between AT_CasPer, A_CasPer and Layered_CasPer,

where the number of inputs is 3 and the size of each tower/layer for AT_CasPer/

Layered_CasPer is 3. The number of connections for Layered_CasPer is less than, and

follows a similar curve to, that of A_CasPer.
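The gap in connection counts can be illustrated with a small calculation. The sketch below rests on assumptions about the wiring that are not spelled out in this section: in the cascade case each new hidden neuron receives connections from the bias, every input and every earlier hidden neuron; in the layered case a hidden neuron receives connections from the bias, every input and every neuron in earlier layers, but none from its own layer; a single output neuron connects to everything. The function names are hypothetical, and AT_CasPer is left out because its tower wiring is not described here.

```python
def cascade_connections(n_inputs, n_hidden, n_outputs=1):
    """Count weights when every hidden neuron sees all earlier hidden neurons (assumed wiring)."""
    sources = n_inputs + 1  # inputs plus the bias
    hidden = sum(sources + k for k in range(n_hidden))
    return hidden + n_outputs * (sources + n_hidden)


def layered_connections(n_inputs, n_hidden, layer_size, n_outputs=1):
    """Count weights when hidden neurons only see neurons in earlier layers (assumed wiring)."""
    sources = n_inputs + 1
    # k // layer_size is the number of complete layers installed before neuron k.
    hidden = sum(sources + layer_size * (k // layer_size) for k in range(n_hidden))
    return hidden + n_outputs * (sources + n_hidden)


# Settings taken from the text: 3 inputs, layer size 3.
for h in (3, 6, 9, 12, 15):
    print(h, cascade_connections(3, h), layered_connections(3, h, 3))
```

Under these assumptions, 6 hidden neurons give 49 connections for the cascade arrangement and 43 for the layered one, and the difference widens as more neurons are installed.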



Fig. 8. Glass hidden neuron results

Fig. 9. Soybean hidden neuron results



Fig. 10. Soybean Test Results

Fig. 11. Cancer Test Results



Fig. 12. Connections in a given size network

4.2 Experiment 2: Two Spirals Benchmark

4.2.1 Experiment Description


The two spirals benchmark is used in this experiment. As Fig. 13 shows, it contains two

interlocked spirals, each spiral made up of 97 points. The network needs to learn to

distinguish these two spirals. This problem was used by Fahlman [2] to evaluate the

CasCor algorithm. A simple version of this benchmark is used in the experiment, and only

the result figures are compared. The CasCor algorithm, the CasPer algorithm and the

Layered_CasPer algorithm are compared in this experiment. In this comparison, the size

of the training set is 194, the size of the testing set is 17,161, the maximum number of

hidden neurons to install is 15, and, where used, the layer size of the Layered_CasPer

algorithm is 3.
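The benchmark data can be generated in a few lines. The sketch below uses the parameterisation commonly distributed with the two spirals benchmark, with the second spiral being the point reflection of the first. Since the experiment uses “a simple version” of the benchmark, the exact constants are an assumption, as are the function names and the 131 × 131 grid for the testing set, which is suggested only by the fact that 131 × 131 = 17,161.

```python
import math


def two_spirals(points_per_spiral=97):
    """Return (x, y, label) triples for the two interlocked spirals."""
    data = []
    for i in range(points_per_spiral):
        angle = i * math.pi / 16.0
        radius = 6.5 * (104 - i) / 104.0
        x = radius * math.sin(angle)
        y = radius * math.cos(angle)
        data.append((x, y, 1))    # class 1
        data.append((-x, -y, 2))  # class 2: the same point reflected through the origin
    return data


def test_grid(side=131, extent=6.5):
    """Return a side x side grid of (x, y) points covering [-extent, extent]^2 (assumed layout)."""
    step = 2 * extent / (side - 1)
    return [(-extent + col * step, -extent + row * step)
            for row in range(side) for col in range(side)]


training_set = two_spirals()
testing_points = test_grid()
print(len(training_set), len(testing_points))
```

Classifying every grid point with the trained network and colouring it by class gives the kind of picture used to compare the algorithms visually.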



Fig. 13. The two spirals training set

4.2.2 Evaluation of Experiment

Fig. 14 plots the testing set, and Figs. 15, 16 and 17 plot the results of the

Layered_CasPer algorithm, the CasPer algorithm and the CasCor algorithm. Each instance

of class 1 is black, and each instance of class 2 is white. It can be seen that the result of the

Layered_CasPer algorithm resembles the testing set much more closely than the CasCor

algorithm’s and is similar to the CasPer algorithm’s. Compared with other methods for

solving the two spirals problem, the result of Layered_CasPer is close to those of the

IDS Method [10], the MLP with Neuro-Glial Network [11], the Chaos Glial Network [12],

the Neuro-Fuzzy Classifier [13] and Gao’s paper [14].



Fig. 14. The two spirals testing set

Fig. 15. Result of the Layered_CasPer algorithm (15 hidden neurons)

Fig. 16. Result of the CasPer algorithm (12 hidden neurons) [3]

Fig. 17. Result of the CasCor algorithm (17 hidden neurons) [3]



4.3 Experiment 3: Results Comparison of Regression Tasks

4.3.1 Experiment Description


Comparing the performance of regression tasks is the main purpose of this experiment.

The Cascade Correlation algorithm (CasCor), the CasPer algorithm and the modified

version of CasPer algorithm which includes a candidate pool (A_CasPer) are compared to

the Layered_CasPer algorithm on a number of data sets. This is a series of five regression

data sets of varying complexity, as originally described in [1]. The number of attributes

and the sizes of the training, validation and testing sets are the same for every data set: each has

2 inputs, 1 output, 225 training patterns, 110 validation patterns and 10,000 testing

patterns. The comparison in this experiment focuses on the number of hidden neurons

and the value of the fraction of variance unexplained (FVU). The FVU is the measure that

compares the performance on the test set. The FVU is defined as:
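In its standard form, which is assumed here to be the one intended, the FVU is the ratio of the sum of squared errors to the total sum of squares of the target. A value of 0 is a perfect fit, and a value of 1 is no better than always predicting the mean of the target. The symbols (N test patterns x_i, target function f, network output \hat{f} and target mean \bar{f}) are notation chosen for this sketch:

```latex
\mathrm{FVU} =
  \frac{\sum_{i=1}^{N} \left( \hat{f}(x_i) - f(x_i) \right)^{2}}
       {\sum_{i=1}^{N} \left( f(x_i) - \bar{f} \right)^{2}},
\qquad
\bar{f} = \frac{1}{N} \sum_{i=1}^{N} f(x_i)
```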

4.3.2 The Process and Evaluation of Experiment

The first comparison is a standard test result comparison of the FVU results of

CasCor, A_CasPer and Layered_CasPer on the Sif, Cadd, Harm and Cif data sets. Table 5 shows

this comparison. The performance of Layered_CasPer is moderate: it is not as good as

A_CasPer's, but much better than CasCor's on most data sets. On the Cadd data set, the

standard deviation of the results of Layered_CasPer is higher than those of both CasCor and

A_CasPer. The problem of stability (high variance), already discussed for

the classification tasks, appears again in this experiment.



Table 5. Comparison of FVUs of A_CasPer, AT_CasPer and Layered_CasPer

The second comparison focuses on the relationship between FVU and number of hidden

neurons installed. In this comparison, the number of training epochs is 100, the

maximum number of hidden neurons to install is 30, and, where used, the size of

tower/layer for AT_CasPer/Layered_CasPer is 3. Two data sets are used in this

comparison: Harm without noise and Harm with noise. Fig. 18 illustrates the FVU as

each hidden neuron is added on the Harm data set without noise. It can be seen that the

performance of Layered_CasPer is better than CasCor's and similar to A_CasPer's. In Fig.

19, on the Harm data set with noise, the performance of Layered_CasPer with small numbers

of hidden neurons is worse than that of CasCor and A_CasPer, until there are “enough” hidden

neurons in the network.



Fig. 18. Harm test results – noise free

Fig. 19. Harm test results – noisy



5 Potential of Layered Cascade Neural Network

This chapter presents three ideas for improving the layered cascade neural

network introduced here. The first two aim to make the layered cascade neural network “really

automatically constructed” by removing the layer size selection step. The goal of the third

is to reduce the number of connections in these neural networks.

5.1 Self-Evaluating Layered Cascade Neural Network

One idea for improving the layered cascade neural network is to let the algorithm decide

whether to install the new hidden neuron in the current layer or to add it as a new layer.

This could be done with a version of the candidate pool which has six candidate neurons,

where three are potentially for the current layer, and the other three are for a new layer.

The network could be pre-trained by installing all candidate neurons, and then the one

neuron which has the best validation result would be selected for installation. This may

perform better than normal layered cascade neural networks, but the more complex candidate

pool would have a higher computational cost. This may be acceptable, as the size of the layers

would not need to be set in the first instance.
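The selection step of such a pool can be sketched as follows. This is only an illustration of the idea: `choose_candidate` and `validation_error` are hypothetical names, the callable stands in for pre-training the network with a given candidate installed, and the error values in the usage example are made up.

```python
def choose_candidate(validation_error, pool_per_option=3):
    """Pick the best of the candidates: half for the current layer, half for a new layer.

    validation_error is a hypothetical callable (placement, slot) -> float that
    stands in for pre-training the network with that candidate installed.
    """
    candidates = [(placement, slot)
                  for placement in ("current_layer", "new_layer")
                  for slot in range(pool_per_option)]
    # The candidate with the lowest validation error decides where the neuron goes.
    return min(candidates, key=lambda c: validation_error(*c))


# Made-up validation errors, for illustration only.
toy_errors = {
    ("current_layer", 0): 0.31, ("current_layer", 1): 0.27, ("current_layer", 2): 0.30,
    ("new_layer", 0): 0.29, ("new_layer", 1): 0.22, ("new_layer", 2): 0.35,
}
placement, slot = choose_candidate(lambda p, s: toy_errors[(p, s)])
print(placement, slot)
```

Because the placement is part of what is selected, the layer size falls out of the training run instead of being fixed beforehand.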

5.2 Random Limit Layered Cascade Neural Network

In the previous experiments, better results frequently come together with high variances.

Therefore enhancing the randomness of the network building process is one idea for

improving the layered cascade neural network. In layered cascade neural networks with

random layer size, the size of each layer is a random number whose lower limit and

upper limit are either set by the user or left at default values. Otherwise, the

working process of the random limit layered cascade neural network would be the same

as that of the layered cascade neural network. The change would increase the variance of

results in order to increase the probability that a better result occurs. The final step of

constructing a layered cascade neural network is selecting the structure and weights of the

network which has the best result. To solve a “real-world” problem, getting the best structure is

most important. The random limit layered CasPer algorithm would generate networks

with random structures, reach extreme cases frequently, and then keep the best

result from those extreme cases. Fig. 20 shows a sample drawing of a random limit layered

cascade neural network without connection details.



Fig. 20. Sample of Random Limit Layered Cascade Neural Network without connection details (upper limit of random number is 8, lower limit of random number is 2)
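Drawing the layer sizes for such a network could look like the sketch below. The function name, the seed parameter and the rule that the last layer is cut short when the neuron budget runs out are assumptions; the limits of 2 and 8 are the ones shown in Fig. 20.

```python
import random


def random_layer_sizes(max_hidden, lower=2, upper=8, seed=None):
    """Draw layer sizes uniformly in [lower, upper] until max_hidden neurons are placed."""
    rng = random.Random(seed)
    sizes = []
    remaining = max_hidden
    while remaining > 0:
        # The final layer may be smaller than `lower` if few neurons remain (assumed rule).
        size = min(rng.randint(lower, upper), remaining)
        sizes.append(size)
        remaining -= size
    return sizes


print(random_layer_sizes(15, seed=0))
```

Running the construction several times with different seeds, and keeping the network with the best validation result, is what turns the extra variance into a chance of finding a better structure.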

5.3 Limited Connections Layered Cascade Neural Network

The Layered_CasPer algorithm reduces the number of connections compared to CasPer or

CasCor. To further reduce the number of connections, a limit on the maximum number of

input connections could be set. If this were lower than the number of inputs then some

systematic or random procedure could be used to choose input connections for any new

neurons. This would most likely work best with a candidate pool. For neurons in the second

or later layers, connections could be biased towards neurons of previous layers over inputs,

ranging from no bias (any connections) to a complete bias (only previous layer

connections, more similar to traditional feedforward neural networks).
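One possible random procedure is sketched below. It is an assumption about how the idea might be realised, not a description of an implemented algorithm: `choose_sources`, `max_fan_in` and `layer_bias` are hypothetical names, and the weighting scheme (inputs weighted by `1 - layer_bias`, previous layer neurons weighted by 1) is just one way to interpolate between no bias and complete bias.

```python
import random


def choose_sources(inputs, previous_layer, max_fan_in, layer_bias=0.0, seed=None):
    """Randomly pick at most max_fan_in distinct sources for a new neuron.

    layer_bias = 0.0 treats inputs and previous layer neurons alike;
    layer_bias = 1.0 allows only previous layer neurons (when there are any).
    """
    rng = random.Random(seed)
    # First layer neurons have no previous layer, so inputs keep full weight.
    input_weight = 1.0 if not previous_layer else 1.0 - layer_bias
    weights = {src: input_weight for src in inputs}
    weights.update({src: 1.0 for src in previous_layer})

    chosen = []
    while len(chosen) < max_fan_in:
        pool = [s for s, w in weights.items() if w > 0.0 and s not in chosen]
        if not pool:
            break  # fewer eligible sources than the limit allows
        pick = rng.choices(pool, weights=[weights[s] for s in pool], k=1)[0]
        chosen.append(pick)
    return sorted(chosen)


inputs = list(range(82))         # e.g. a data set with 82 inputs
previous_layer = [82, 83, 84]    # indices of a previous layer of size 3
print(choose_sources(inputs, previous_layer, max_fan_in=5, layer_bias=1.0))
```

With a candidate pool, each candidate could draw its own set of sources, so the pool also searches over which connections are worth keeping.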

6 Conclusion

The Layered_CasPer extension of the A_CasPer algorithm, which uses a layered network

architecture, results in a similar performance on most datasets to A_CasPer and better

performance on classification datasets which have a large number of inputs. However



Layered_CasPer does not otherwise surpass A_CasPer and AT_CasPer in terms of

generalization and convergence on regression datasets. While the computational cost

is reduced, the variance of the results of Layered_CasPer is increased compared to that of the

A_CasPer algorithm. Experimental results on the two spirals problem and additional

regression problems show similar trends to those for the other benchmark datasets, with

results consistently slightly below those of A_CasPer. The key benefit of the Layered_CasPer

algorithm is that it performs similarly to A_CasPer, and has layers. Neural networks with

layers are more familiar to neural network users, which could lead to greater acceptance

of these cascade neural networks.



References

[1] N. K. Treadgold and T. D. Gedeon, “Exploring constructive cascade networks,” IEEE Trans. Neural Netw., vol. 10, no. 6, pp. 1335–1350, Nov. 1999.

[2] S. E. Fahlman and C. Lebiere, “The cascade-correlation learning architecture,” Advances in Neural Information Processing, vol. 2, D. S. Touretzky, Ed. San Mateo, CA: Morgan Kaufmann, 1990, pp. 524–532.

[3] N. K. Treadgold and T. D. Gedeon, “A Cascade Network Employing Progressive RPROP,” Int. Work-Conf. on Artificial and Natural Neural Networks, 1997, pp. 733–742.

[4] N. K. Treadgold and T. D. Gedeon, “Extending CasPer: A Regression Survey,” Int. Conf. on Neural Information Processing, 1997, pp. 310–313.

[5] M. Riedmiller and H. Braun, “A direct adaptive method for faster backpropagation learning: The RPROP algorithm,” in Proc. ICNN 93, San Francisco, CA, 1993, pp. 586–591.

[6] N. K. Treadgold and T. D. Gedeon, “Increased Generalization through Selective Decay in a Constructive Cascade Network,” IEEE International Conference on Systems, Man, and Cybernetics, vol. 5, no. 10, pp. 4465–4469, Oct. 1998.

[7] N. K. Treadgold and T. D. Gedeon, “Exploring architecture variations in constructive cascade networks,” IEEE World Congress on Computational Intelligence, IEEE International Joint Conference on Neural Networks Proceedings, vol. 1, pp. 343–348, May 1998.

[8] Terry Regier 1988, Quickpin Program, Quickprop 1, Design of Intelligent Systems, Michigan State University, US, viewed August 2011, <www.cse.msu.edu/~cse449/spr98/demos/quickprop.c>

[9] L. Su and S.-U. Guan, “Two-dimensional extensions of cascade correlation networks,” in The Fourth International Conference on High-Performance Computing in the Asia-Pacific Region, vol. 1, IEEE, 2000, pp. 138–141.

[10] Masayuki Murakami and Nakaji Honda, “Classification Performance of the IDS Method Based on the Two-Spiral Benchmark,” 2005 IEEE International Conference on Systems, Man and Cybernetics, Oct. 2005.

[11] Chihiro Ikuta, Yoko Uwate and Yoshifumi Nishio, “Multi-Layer Perceptron Having Neuro-Glia Network,” 2010 International Symposium on Nonlinear Theory and its Applications NOLTA2010, Krakow, Poland, September 5-8, 2010.

[12] C. Ikuta, Y. Uwate and Y. Nishio, “Chaos Glial Network Connected to Multi-Layer Perceptron for Solving Two-Spiral Problem,” Proc. ISCAS’10, May 2010.

[13] C. T. Sun and J. S. Jang, “A neuro-fuzzy classifier and its applications,” in Proc. IEEE Int. Conf. Fuzzy Syst., San Francisco, CA, vol. I, pp. 94–98, Mar. 1993.

[14] Gao Daqi, Li Hao and Yang Yunfan, “Task Decomposition and Modular Perceptrons with Sigmoid Activation Functions for Solving the Two-Spirals Problem,” Networking, Sensing and Control, 2006. ICNSC, pp. 218–223, 2006.



Appendix A: Screenshots of Cascade Neural Network Toolbox v1.2

Output: Diagram of Training Errors



Output: Hinton diagram of the weights of the network



Output: Statistics information for each new neuron added



Output: Final performance statistics of the network which has the best validation/testing result



Appendix B: Experiment Results

Experiment Results: Classification Task
