
DEPARTMENT OF ELECTRICAL & COMPUTER ENGINEERING

CAP 6615 Neural Networks

Rajesh Pydipati


Fall 2003

Introduction

The objective of taking this course was to gain a clear understanding of the concepts involved in neural network computing, so that the technology can be tailored to solve real-world problems with wide-ranging applications in the fields mentioned below. The emphasis was mainly on programming implementations of the algorithms.

Course Description

• Objectives: Understand the concepts and learn the techniques of neural network computing.
• Prerequisites: A familiarity with basic concepts in calculus, linear algebra, and probability theory.

Calculus requirements include: differentiation, chain rule, integration. Linear algebra requirements include: matrix multiplication, inverse, pseudo-inverse.

• Main topics: Introduction to neural computational models including classification, association, optimization, and self-organization. Learning and discovery. Knowledge-based neural network design and algorithms.

• Applications include: pattern recognition, expert systems, control, signal analysis, and computer vision.

Syllabus

• Basic neural computational models
• Feedforward networks
• Learning / back propagation
• Association networks
• Classification
• Self-organization
• Radial Basis Function networks
• Support Vector Machines
• Networks based on lattice computation
• Applications

Projects: A set of four projects was completed as part of this course. A detailed description of each project, with the approach to its solution, is presented next.


Project 1

Problem statement:

Project 1a: Implement the SLP learning algorithm. Implement the algorithm yourselves; do not use any ANN package. Train your SLP to classify the capital letter patterns A, B, C, and D into two classes, C1 and -C1, as follows: A belongs to C1; B, C, and D all belong to -C1. After training, test whether your SLP correctly classifies the same four patterns. You may use either the unipolar or the bipolar version of the patterns.

Approach: This problem was meant to teach the basic working of the simplest neural network circuit, the perceptron. The task was to identify four different letter patterns, each fed to the network as a stream of 0s and 1s. After the network architecture was constructed, it was trained on the pattern data, and the trained network was then tested to assess the efficacy of the algorithm. The algorithm was written in MATLAB; a minimal sketch of the perceptron update rule is given after the Project 2a description below.

Results: The network was able to classify all four letter patterns correctly.

Project 2a: Implement an SLP to solve the following problem:

1. Randomly choose 1000 points on either side of the line y = 0.5x + 2. Do not choose points exactly on the line. Also, pick the points between fixed bounds b1 <= x <= b2 such that b2-b1 < 100, as shown in the figure below.

2. Train the SLP to discriminate between the two classes of points. Use a sequential application of the points. Then pick 5 test points (not from the training set!) from either side of the line to test your SLP on. Do they classify correctly? (The difficulty will be when the test points are close to the line.) Print out the equation of the line obtained from the final weights. If the 5 points were not correctly classified, use them as additional training points and retrain the network. Then pick another 5 points and test again.
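For illustration, a minimal sketch of the perceptron learning rule used in Projects 1a and 2a is given below. It assumes bipolar targets t = +1/-1 and one training pattern per row of X; the function and variable names are illustrative and are not taken from the original code.

% Minimal single-layer perceptron (SLP) training sketch for Projects 1a/2a.
% X: n-by-d matrix of training patterns (one pattern per row).
% t: n-by-1 vector of bipolar class labels (+1 or -1).
% eta: learning rate; maxEpochs: maximum number of passes over the data.
function w = train_slp(X, t, eta, maxEpochs)
    [n, d] = size(X);
    Xb = [X, ones(n, 1)];             % append a constant bias input
    w  = 0.1 * randn(d + 1, 1);       % small random initial weights
    for epoch = 1:maxEpochs
        errors = 0;
        for i = 1:n                   % sequential (online) presentation
            y = sign(Xb(i, :) * w);
            if y == 0, y = -1; end    % break ties toward class -1
            if y ~= t(i)
                w = w + eta * t(i) * Xb(i, :)';   % perceptron update on a mistake
                errors = errors + 1;
            end
        end
        if errors == 0, break; end    % stop once every pattern is classified correctly
    end
end

For Project 2a, each pattern is a 2-D point (x, y) labelled according to the side of the line it falls on, and the decision line implied by the final weights is w(1)*x + w(2)*y + w(3) = 0.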


Approach: This problem was meant to teach how a single-layer perceptron forms decision surfaces that separate clusters of data. The task was to classify the data points into two clusters by learning a hyperplane as the decision boundary. After the network architecture was constructed, it was trained on the generated data, and further data points were then tested to assess the efficacy of the algorithm. The algorithm was written in MATLAB.

Results: The network was able to classify the data points perfectly according to the decision boundary.

Project 2

Problem statement:

Project 2 is more of a research project and consists of implementing the backpropagation training algorithm for multilayer perceptrons. Use Fisher's Iris dataset to train and test a multiclass MLP using the backpropagation method. The dataset comprises 3 groups (classes) of 50 patterns each. One group corresponds to one species of Iris flower. Every pattern has 4 real-valued features. The number of input and output neurons is known; the number of hidden layers and hidden neurons is your choice.

• Train your network using 13 exemplar patterns from each class (roughly 25% of the patterns) picked at random. Then use the remaining patterns in the dataset to test the network and report the results.

• Next, train your network using 25 exemplar patterns from each class (i.e., 50% of the patterns) picked at random. Use the remaining patterns in the set to test the network and report the results.

• Next, train your network using 38 exemplar patterns from each class (roughly 75% of the patterns) picked at random. Use the remaining patterns to test the network and report the results.

• Finally, train your network on the entire pattern set. Then use the same patterns to test the network and report the results. Note that this last experiment may pose serious convergence problems.

Explore techniques such as momentum to increase convergence speed, try various network architectures (number of hidden layers and neurons in the hidden layers), and investigate various stopping criteria and ways to adjust the learning rate and other parameters you might have.

Fisher's Iris Dataset

R.A. Fisher's Iris dataset is often referenced in the field of pattern recognition. It consists of 3 groups (classes) of 50 patterns each. One group corresponds to one species of Iris flower: Iris Setosa (class C1), Iris Versicolor (class C2), and Iris Virginica (class C3). Every pattern has 4 features (attributes), representing petal width, petal length, sepal width, and sepal length (expressed in centimeters). The dataset file contains one pattern per line, starting with the class number, followed by the 4 features. Lines are terminated with single LF characters. Patterns are grouped by class.
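A sketch of how the dataset can be loaded and split into random per-class training and test sets is shown below. The file name iris.dat is an assumption; only the layout described above (class number followed by four features, one pattern per line) is relied on.

% Load the Iris data and draw a random per-class training/test split (sketch).
data     = load('iris.dat');                 % 150-by-5 numeric matrix (assumed file name)
labels   = data(:, 1);                       % class numbers 1..3
features = data(:, 2:5);                     % the four measurements
nTrain   = 13;                               % 13, 25, 38, or 50 exemplars per class
trainIdx = [];
for c = 1:3
    idx = find(labels == c);                 % indices of this class
    idx = idx(randperm(numel(idx)));         % shuffle within the class
    trainIdx = [trainIdx; idx(1:nTrain)];    % keep the first nTrain shuffled patterns
end
testIdx = setdiff((1:size(data, 1))', trainIdx);   % the remaining patterns are used for testing

When nTrain is 50, testIdx is empty and the training set itself is reused for testing, as in the last experiment.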


[Figure: Irises by Vincent van Gogh (oil on canvas, 1889)]

Approach: A network based on the backpropagation principle was constructed to train on the data and classify the three classes mentioned above. The material below explains the architecture in detail.

Basic functions implemented in the neural network algorithm: a one-hidden-layer, feed-forward MLP network, trained and tested on different, randomly selected training and test data sets.

Software used: MATLAB (Run on Windows Platform)

Important considerations: number of layers, number of processing elements in each layer, randomizing the training and test data sets, expressive power, training error, activation function, scaling input, target values, initializing weights, learning rate, momentum learning, stopping criterion, criterion function.

1. Number of layers:

Multilayer neural networks implement linear discriminants in a space where the inputs have been mapped non-linearly. Non-linear multilayer networks have greater computational or expressive power than simple 2-layer networks (input and output layers only), and can implement more functions. Given a sufficient number of hidden units, any continuous function can be approximated to arbitrary accuracy. For this project, a one-hidden-layer MLP was chosen in order to keep the complexity of the decision surface low.

2. Number of Processing elements in each layer:

The number of PEs in the input and output layers follows directly from the structure of the input and output spaces. Observation of each feature set reveals the principal components that can serve as distinguishing features between the three classes of Iris flowers to be classified. Every pattern has 4 features (attributes), representing petal width, petal length, sepal width, and sepal length (expressed in centimeters). An attempt was made to reduce the input space, in order to reduce the overall complexity of the classifier. However, it should not be


forgotten that neglecting any of the features without proper reasoning might amount to losing key information and hence reduce the accuracy of the classifier. The input space was therefore analyzed for its principal components using the PCA algorithm, which was also implemented in the source code of the neural-network-based solution for this project. The number of output PEs is 3, since each input pattern must be classified into one of the three classes given in the problem definition. Choosing the number of PEs in the hidden layer is, however, a more intuitive task; 3 hidden PEs were found to give the best results, and varying this number within an acceptable range did not alter the accuracies much.
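For reference, a generic PCA computation over the four input features, of the kind mentioned above, is sketched below; the original source code is not reproduced here, and the variable names are illustrative.

% Principal component analysis of the feature matrix X (n-by-4), as a sketch.
Xc = X - repmat(mean(X, 1), size(X, 1), 1);    % center each feature at zero mean
C  = cov(Xc);                                  % 4-by-4 covariance matrix
[V, D]   = eig(C);                             % eigenvectors (columns of V) and eigenvalues
[~, ord] = sort(diag(D), 'descend');           % order components by explained variance
V  = V(:, ord);
scores = Xc * V;                               % patterns expressed in the principal-component basis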

3. Randomizing the training and test data sets:

For most practical purposes, randomizing the training and test data sets is important. A total of 13, 25, 38, and 50 training patterns per class, respectively, were used in the four parts of the project. Correspondingly, 37, 25, 12, and 50 test patterns per class were used. The data were randomly permuted before being fed forward through the network.

4. Expressive power:

Although one could use networks with a different activation function for each layer, or even for each unit of each layer, identical non-linear activation functions were used in order to simplify the mathematical analysis.

5. Training error:

The training error on a pattern is the sum over the output units of the squared difference between the desired output d_k and the actual output y_k:

J(w) = (1/2) * sum_k (d_k - y_k)^2 = (1/2) * ||d - y||^2

The error terms for the hidden layer are obtained by backpropagating the output-layer errors.
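A sketch of one online backpropagation step for the one-hidden-layer sigmoid MLP described here is shown below. Bias terms are omitted for brevity, and the variable names are illustrative rather than taken from the original code.

% One online backpropagation step (sketch).
% x: d-by-1 input pattern, d_k: c-by-1 desired output, eta: learning rate.
% W1: h-by-d hidden-layer weights, W2: c-by-h output-layer weights.
sig = @(a) 1 ./ (1 + exp(-a));                    % logistic activation

z = sig(W1 * x);                                  % hidden-layer outputs (h-by-1)
y = sig(W2 * z);                                  % network outputs (c-by-1)

deltaOut = (d_k - y) .* y .* (1 - y);             % output-layer error terms
deltaHid = (W2' * deltaOut) .* z .* (1 - z);      % errors backpropagated to the hidden layer

W2 = W2 + eta * deltaOut * z';                    % gradient-descent weight updates
W1 = W1 + eta * deltaHid * x';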

6. Activation function:

The important constraint is that the activation function be continuous and differentiable. The sigmoid is a smooth, differentiable, non-linear, saturating function, and a further benefit is that its derivative can be expressed in terms of the function value itself. For these reasons the sigmoid was chosen.
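The "derivative expressed in terms of itself" property is simply the following identity, which is also what the update sketch above uses:

% If y = 1/(1 + exp(-a)) is the sigmoid output, its derivative is y*(1 - y).
sig  = @(a) 1 ./ (1 + exp(-a));
dsig = @(y) y .* (1 - y);        % evaluated directly from the output y = sig(a)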

7. Scaling input:

In order to avoid difficulties caused by differences in scale between the inputs, the input patterns should ideally be shifted so that the average of each feature over the training set is zero. However, since online protocols do not have the full data set available at any one time, input scaling was not applied here.

8. Target values:


For a finite value of net_k, the output can never reach the saturation value of the activation function, so some error always remains. If the saturation values themselves were used as targets, full training would never terminate, because the weights would grow extremely large as net_k is driven toward plus or minus infinity. Target values corresponding to 2*(desired - 1) were therefore used here.

9. Initializing weights:

For uniform learning, i.e., for all weights to reach their equilibrium values at roughly the same time, weight initialization is crucial. With non-uniform learning, one category is learned well before the others, and the overall error rate is typically higher than necessary because of the resulting redistribution of error. To encourage uniform learning, the weights of each layer were initialized randomly.
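The report states only that the weights were initialized randomly; a common choice, shown here purely as an assumption, is to draw them uniformly from a range of about +/- 1/sqrt(fan-in).

% Illustrative random weight initialization (the +/- 1/sqrt(fan-in) range is an assumption).
d = 4; h = 3; c = 3;                      % input, hidden, and output layer sizes
W1 = (2 * rand(h, d) - 1) / sqrt(d);      % hidden-layer weights in roughly [-1/sqrt(d), 1/sqrt(d)]
W2 = (2 * rand(c, h) - 1) / sqrt(h);      % output-layer weights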

10. Learning rate:

The optimal step size is given by step_opt = (d^2 J / d w^2)^(-1). The setting of this parameter greatly affects both convergence and classification accuracy. After many trials, it was found that this parameter should be quite small (a value in the range 1/10 to 1/1000) to obtain accurate results.

11. Momentum learning:

Error surfaces often have plateaus and multiple minima in which dJ(w)/dw is very small. These arise when there are many weights, so that the error depends only weakly on any one of them. Momentum allows the network to learn more quickly in such regions. In narrow, steep regions of the weight space, the effect of the momentum term is to focus movement in the downhill direction by averaging out the components of the gradient that alternate in sign [Gupta, Homma et al.]. After many trials, it was found that the momentum parameter should be less than 1 for this application. Varying this parameter between 0 and 1 did not adversely affect performance, although it should be noted that increasing its value significantly reduced the convergence speed.
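As a sketch, the momentum form of the output-layer update from the earlier backpropagation example looks as follows; alpha is the momentum factor, and dW2_prev is assumed to start as zeros of the same size as W2.

% Weight update with momentum (sketch; continues the earlier backprop example).
dW2      = eta * deltaOut * z' + alpha * dW2_prev;   % current step plus a fraction of the previous step
W2       = W2 + dW2;
dW2_prev = dW2;                                      % remembered for the next pattern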

12. Stopping criterion:

Usually, the stopping criterion is based on the error on a separate validation set (one containing no patterns that also appear in the training set). Here, however, after training and testing individually for each class, it was found that the error is typically about 0.4 at convergence, so reaching this error level was used as the stopping criterion, both because it is simple to implement and to avoid overfitting.

13. Criterion function:

The squared error has been used as the criterion function for this project.


Results:

The results of the neural-network-based classifier are presented in the form of a confusion matrix, whose rows represent the true class numbers and whose columns represent the classification results. The diagonal entries give the correct classifications, and the off-diagonal entries in each column give the misclassifications for that particular class. Various experiments were conducted to maximize the network's performance. A sample result is shown below.

trainConf: classification results on the training data
testConf: classification results on the test data

Step size change

a) Initial step size = 0.0400
   Number of processing elements in the hidden layer = 3
   Momentum factor = 0.9000

trainConf =
    13     0     0
     0    13     0
     0     0    13

testConf =
    37     0     0
     0    33     0
     0     4    37
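For reference, a confusion matrix such as the ones above can be assembled from the true and assigned class labels as sketched below; trueLabels and predLabels are assumed to be n-by-1 vectors with values 1..3.

% Build a 3-by-3 confusion matrix: rows indexed by the true class, columns by the assigned class.
conf = zeros(3, 3);
for i = 1:numel(trueLabels)
    conf(trueLabels(i), predLabels(i)) = conf(trueLabels(i), predLabels(i)) + 1;
end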


Project 3

Problem statement:

Project 3 consists of implementing a training algorithm for a morphological perceptron with dendritic structures (MPDS). Write a program that trains a two-input, two-output MPDS to solve the embedded spirals problem. The parametric equations of the two spirals are:

x1(theta) = theta * cos(theta) * 2/pi
y1(theta) = theta * sin(theta) * 2/pi

and

x2(theta) = -x1(theta)
y2(theta) = -y1(theta)

where

theta = 0*pi/16 + pi/2, 1*pi/16 + pi/2, 2*pi/16 + pi/2, ..., 64*pi/16 + pi/2.

The spirals are initially sampled at 65 points, at angles ranging from pi/2 to 4*pi + pi/2 in uniform increments of pi/16. These 2*65 points are provided for your convenience as a dataset file in the Datasets section and represent the first training set. The program runs in stages. At each stage, it trains the SLMP, then doubles the number of points by sampling the spirals more finely (substituting new values of theta), and then tests the SLMP on the entire set (consisting of the original training points together with the intermediate test points). The stages are repeated until either correct classification occurs for all points, or the number of points per spiral reaches 1025. The figure below illustrates the two spirals, each with the initial 65 training points depicted as solid dots and the first test set of 64 intermediate points as empty circles.
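A sketch of generating the initial 65 training points per spiral from the parametric equations above is given below; the +1/-1 class labels are an illustrative convention, not part of the original specification.

% Generate the initial 65 points per spiral (sketch).
theta = (0:64) * pi/16 + pi/2;           % 65 angles from pi/2 to 4*pi + pi/2
x1 =  theta .* cos(theta) * 2/pi;        % first spiral
y1 =  theta .* sin(theta) * 2/pi;
x2 = -x1;                                % second spiral: point reflection of the first
y2 = -y1;
P = [x1' y1'; x2' y2'];                  % 130-by-2 matrix of training points
t = [ones(65, 1); -ones(65, 1)];         % class labels (+1 for spiral 1, -1 for spiral 2)
% Doubling the sampling for the next stage amounts to halving the theta step,
% e.g. theta = (0:128) * pi/32 + pi/2 gives 129 points per spiral.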

To summarize, your implementation must perform the following tasks:

• constructs an SLMP and trains it on the initial training set;
• generates 64 intermediate points per spiral, each point lying on the spiral (not on the edge connecting two points) and dividing an arc piece into two halves, resulting in 129 points per spiral (as in the above figure);
• tests the SLMP on the entire 2*129-point set and reports the results;
• if recognition is 100% correct, then the program stops; otherwise, it retrains the SLMP on the new set of data and continues;
• doubles the number of points by generating a new set of intermediate points on each spiral; the new set will thus consist of 2*257 points;
• tests the SLMP on the entire 2*257-point set and reports the results;
• repeats this procedure (retraining, doubling the number of points, and then testing) until either recognition is 100% accurate or the total number of points per spiral has reached 1025; then reports the classification results on the last test set and stops.


Approach:

As specified in the problem statement, an algorithm was written in MATLAB to implement the morphological perceptron with dendritic structures. The model is inspired by the way biological neurons grow and prune dendrites as learning progresses, which is the origin of its name.

Results:

The algorithm was able to classify the spirals accurately.


Project 4

Problem statement:

Project 4 consists of implementing several types of associative memories.

• Project 4a: Implement the algorithm to create a Hopfield auto-associative memory that stores the capital letter patterns A, B, C, E, and X. Test the memory on all of the following patterns:

o perfect (undistorted) A, B, C, E, and X;
o corrupted A5%, B5%, C5%, E5%, and X5%, with 5% dilative, erosive, and random noise;
o corrupted A10%, B10%, C10%, E10%, and X10%, with 10% dilative, erosive, and random noise.

Remember, however, that the Hopfield network requires bipolar data, so be sure to make the necessary conversions.

• Project 4b: Implement the algorithm to create a pair of morphological auto-associative memories M and W that store the same capital letter patterns A, B, C, E, and X. As before, test the memory on all of the following patterns:

o perfect (undistorted) A, B, C, E, and X;
o corrupted A5%, B5%, C5%, E5%, and X5%, with 5% dilative, erosive, and random noise;
o corrupted A10%, B10%, C10%, E10%, and X10%, with 10% dilative, erosive, and random noise.

Approach:

Hopfield associative memory

The code for the Hopfield associative memory was written in MATLAB (version 6.12, on the Windows platform), and no significant problems were encountered while running it. The results are plotted with MATLAB's 'imshow' function, so the Image Processing Toolbox must be available to view them.

Observations:

1) In all cases (undistorted, and with 5% and 10% dilative, erosive, and random noise), the letter patterns 'A' and 'X' were recalled correctly. The likely reason is that these patterns are largely dissimilar to the others, since there is a wide disparity between their letter forms. The other patterns were not recalled correctly, most likely because the letter patterns 'B', 'C', and 'E' are somewhat similar to one another. The recalled patterns are evident from the figures produced when the code is run, and the relevant insights are summarized here.
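A sketch of the standard outer-product (Hebbian) construction and synchronous recall for such a Hopfield memory is given below; X is assumed to hold the bipolar letter patterns as columns, and probe is a (possibly noisy) bipolar input. This is a sketch of the standard rule, not the original project code.

% Hopfield auto-associative memory: storage and recall (sketch).
[n, k] = size(X);                         % n pixels per pattern, k stored patterns
W = zeros(n, n);
for p = 1:k
    W = W + X(:, p) * X(:, p)';           % sum of bipolar outer products
end
W(1:n+1:end) = 0;                         % zero the diagonal (no self-connections)

x = probe;                                % noisy or perfect bipolar input (n-by-1)
for it = 1:50                             % synchronous updates until the state is stable
    xNew = sign(W * x);
    xNew(xNew == 0) = 1;                  % resolve zero activations arbitrarily
    if isequal(xNew, x), break; end
    x = xNew;
end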


Results: Some results are shown here for the case of patterns with 5% dilative noise added:

Pattern 'A': recalled
Pattern 'B': not recalled
Pattern 'C': not recalled
Pattern 'E': not recalled
Pattern 'X': recalled

Final Observation: With the Hopfield associative memory, recall is successful only when the stored patterns are not very similar to one another. When the patterns are similar, the memory is confused and produces garbage output.


Morphological Associative memories using matrices ‘W’ and ‘M’
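For reference, a sketch of the standard construction and recall of the morphological auto-associative memories W and M (in the Ritter-Sussner formulation) is given below; X holds the patterns as columns and xc is a possibly corrupted input. The noise-robustness properties noted in the observations below follow from this min/max construction. This is a sketch of the textbook formulation, not the original project code.

% Morphological auto-associative memories W and M: storage and recall (sketch).
[n, k] = size(X);                                        % one pattern per column
W =  inf(n, n);
M = -inf(n, n);
for p = 1:k
    D = X(:, p) * ones(1, n) - ones(n, 1) * X(:, p)';    % D(i,j) = x_i - x_j for this pattern
    W = min(W, D);                                       % W is robust to erosive noise
    M = max(M, D);                                       % M is robust to dilative noise
end

% Recall of a (possibly corrupted) pattern xc (n-by-1):
yW = max(W + repmat(xc', n, 1), [], 2);   % max-plus product: yW(i) = max_j (W(i,j) + xc(j))
yM = min(M + repmat(xc', n, 1), [], 2);   % min-plus product: yM(i) = min_j (M(i,j) + xc(j))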

Observations: In these memories, all of the letter patterns (undistorted, and with 5% dilative, erosive, and random noise) were recalled correctly by both the W and M associative memories. In the 10% noise case we observe that:

1) W is robust in the presence of erosive noise.
2) M is robust in the presence of dilative noise.

Even this is subtle to observe, since recall is perfect for all patterns with both the W and M memories, except for the pattern 'E' in the 10% erosive-noise case, where W performs better than M, reinforcing the observations above. If more noise were added, the effects described in (1) and (2) would likely be easier to appreciate.

Results: For patterns corrupted with 10% erosive noise, the following recalls were obtained.


[Figure: recalled pattern 'E' (not clearly visible against the white background)]

Final Observation:

In general, the morphological memories performed better than the Hopfield associative memory.


Additional patterns that were generated, and their recalls using the W and M memories:


Results of additional patterns with Hopfield associative memory


References:

1) Madan M. Gupta, Liang Jin, and Noriyasu Homma, Static and Dynamic Neural Networks: From Fundamentals to Advanced Theory.

2) Simon S. Haykin, Neural Networks: A Comprehensive Foundation.

3) Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification.

4) Class notes of Prof. Gerhard Ritter, CAP 6615.