
Chapter XVI Artificial Neural Networks

Xiaojun Yang, Florida State University, USA

Copyright © 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.

Abstract

Artificial neural networks are increasingly being used to model complex, nonlinear phenomena. The purpose of this chapter is to review the fundamentals of artificial neural networks and their major applications in geoinformatics. It begins with a discussion of the basic structure of artificial neural networks, with a focus on multilayer perceptron networks given their robustness and popularity. This is followed by a review of the major applications of artificial neural networks in geoinformatics, including pattern recognition and image classification, hydrological modeling, and urban growth prediction. Finally, several areas are identified for further research in order to improve the success of artificial neural networks for problem solving in geoinformatics.

Introduction

An artificial neural network (commonly just neural network) is an interconnected assemblage of artificial neurons that uses a mathematical or computational model of theorized mind and brain activity, attempting to parallel and simulate the powerful capabilities for knowledge acquisition, recall, synthesis, and problem solving. It originated from the concept of the artificial neuron introduced by McCulloch and Pitts in 1943. Over the past six decades, artificial neural networks have evolved from the preliminary development of the artificial neuron, through the rediscovery and popularization of the back-propagation training algorithm, to the implementation of artificial neural networks using dedicated hardware. Theoretically, artificial neural networks are highly robust to data distribution and can handle incomplete, noisy, and ambiguous data. They are well suited for modeling complex, nonlinear phenomena ranging from financial management and hydrological modeling to natural hazard prediction. The purpose of this article is to introduce the basic structure of artificial neural networks, review their major applications in geoinformatics, and discuss future and emerging trends.

Background

The basic structure of an artificial neural network involves a network of many interconnected neurons. These neurons are very simple processing elements that individually handle pieces of a larger problem. A neuron computes an output using an activation function applied to the weighted sum of all its inputs. Activation functions can take many different forms, but the logistic sigmoid function is quite common:

$$f(x) = \frac{1}{1 + e^{-x}} \qquad (1)$$

where f(x) is the output of a neuron and x represents the weighted sum of the inputs to the neuron. As Equation 1 suggests, the principles of computation at the neuron level are quite simple; the power of neural computation comes from the use of distributed, adaptive, and nonlinear computing. The distributed computing environment is realized through the massive number of interconnected neurons that share the load of the overall processing task. The adaptive property is embedded in the network by adjusting the weights that interconnect the neurons during the training phase. The use of an activation function in each neuron introduces nonlinear behavior into the network.
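To make Equation 1 concrete, here is a minimal sketch in Python (the function name and sample values are illustrative, not from the chapter) of how a single neuron turns a weighted input sum into an output:

```python
import math

def neuron_output(inputs, weights, bias=0.0):
    """Logistic sigmoid of the weighted sum of a neuron's inputs (Equation 1)."""
    x = sum(i * w for i, w in zip(inputs, weights)) + bias  # weighted sum of inputs
    return 1.0 / (1.0 + math.exp(-x))                       # f(x) = 1 / (1 + e^-x)

# Example: a neuron with three inputs
print(neuron_output([0.5, -1.2, 0.3], [0.8, 0.1, -0.4]))  # about 0.54
```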

There are many different types of neural networks, but most fall into one of the five major paradigms listed in Table 1. Each paradigm has advantages and disadvantages depending upon the specific application. A detailed discussion of these paradigms can be found elsewhere (e.g., Bishop, 1995; Rojas, 1996; Haykin, 1999; Principe et al., 2000). This article concentrates upon multilayer perceptron networks due to their technological robustness and popularity (Bishop, 1995).

Figure 1 illustrates a simple multilayer perceptron neural network with a 4×5×4×1 structure. This is a typical feed-forward network, in which the connections between neurons flow in one direction. Information flow starts from the neurons in the input layer and then moves along weighted links to neurons in the hidden layers for processing. The weights are normally determined through training. Each neuron contains a nonlinear activation function that combines information from all neurons in the preceding layer. The output layer is thus a complex function of the inputs and the internal network transformations.
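As an illustration of this information flow, the following sketch (NumPy-based; the weight values are random placeholders rather than trained values) propagates one input vector through a 4×5×4×1 network like the one in Figure 1:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))  # Equation 1, applied element-wise

rng = np.random.default_rng(0)
layer_sizes = [4, 5, 4, 1]  # the 4x5x4x1 structure of Figure 1

# Random initial weights and biases; in practice these are set by training.
weights = [rng.normal(scale=0.5, size=(m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

def forward(x):
    """Propagate an input vector through the network, layer by layer."""
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(a @ W + b)  # weighted sum, then nonlinear activation
    return a

print(forward(np.array([0.2, 0.7, -0.1, 0.5])))  # a single output value
```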

The topology of a neural network is critical for neural computing to solve problems with reasonable training time and performance. Training time is almost always the biggest bottleneck in neural computing, and thus every effort is needed to make training effective and affordable. Training time is a function of the complexity of the network topology, which is ultimately determined by the combination of hidden layers and neurons. A trade-off is needed to balance the processing power of the hidden layers against the training time required. A network without a hidden layer can only solve linear problems. To tackle a nonlinear problem, a reasonable number of hidden layers is needed. A network with one hidden layer has the power to approximate any function, provided that the number of neurons and the training time are not constrained (Hornik, 1993). In practice, however, many functions are difficult to approximate with one hidden layer, and thus Flood and Kartam (1994) suggested using two hidden layers as a starting point.


Table 1. Classification of artificial neural networks (Source: Haykin, 1999)

| No. | Type | Example | Brief description |
|-----|------|---------|--------------------|
| 1 | Feed-forward neural network | Multi-layer perceptron | It consists of multiple layers of processing units that are usually interconnected in a feed-forward way |
| | | Radial basis functions | As powerful interpolation techniques, they are used to replace the sigmoidal hidden-layer transfer function in multi-layer perceptrons |
| | | Kohonen self-organizing networks | They use a form of unsupervised learning to map points in an input space to coordinates in an output space |
| 2 | Recurrent network | Simple recurrent networks; Hopfield network | Contrary to feed-forward networks, recurrent neural networks use bi-directional data flow and propagate data from later processing stages to earlier stages |
| 3 | Stochastic neural networks | Boltzmann machine | They introduce random variations, often viewed as a form of statistical sampling, into the networks |
| 4 | Modular neural networks | Committee of machines | They use several small networks that cooperate or compete to solve problems |
| 5 | Other types | Dynamic neural networks | They not only deal with nonlinear multivariate behavior, but also include learning of time-dependent behavior |
| | | Cascading neural networks | They begin training without any hidden neurons; when the output error reaches a predefined threshold, the network adds a new hidden neuron |
| | | Neuro-fuzzy networks | They embed a fuzzy inference system, introducing processes such as fuzzification, inference, aggregation, and defuzzification into a neural network |

Figure 1. A simple multilayer perceptron (MLP) neural network with a 4×5×4×1 structure


The numbers of neurons in the input and output layers are defined by the research problem in an actual application. The critical choice concerns the number of neurons in the hidden layers, and hence the number of connection weights. If there are too few neurons in the hidden layers, the network may be unable to approximate very complex functions because of insufficient degrees of freedom. On the other hand, if there are too many neurons, the network tends to have a large number of degrees of freedom, which may lead to overtraining and hence poor generalization performance (Rojas, 1996). Thus, it is crucial to find the 'optimum' number of hidden-layer neurons that adequately captures the relationship in the training data. This optimization can be achieved by trial and error or by systematic approaches such as pruning and constructive algorithms (Reed, 1993; Kwok and Yeung, 1997).

Training is a learning process by which the connection weights are adjusted until the network is optimal. This involves the use of training samples, an error measure, and a learning algorithm. Training samples are presented to the network with input and output data over many iterations. They should not only be large in size but also be representative of the entire data set to ensure sufficient generalization ability. There are several different error measures, such as the mean squared error (MSE), the mean squared relative error (MSRE), the coefficient of efficiency (CE), and the coefficient of determination (r²) (Dawson and Wilby, 2001). The MSE is the most commonly used. The overall goal of training is to minimize the error through either a local or a global learning algorithm. Local methods adjust the weights of the network by using localized input signals and localized first or second derivatives of the error function. They are computationally efficient for changing the weights in a feed-forward network but are susceptible to local minima in the error surface. Global methods are able to escape local minima in the error surface and thus can find optimal weight configurations (Maier and Dandy, 2000).
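For illustration, here is a minimal sketch of two of these error measures (function names are ours, not from the chapter):

```python
import numpy as np

def mse(observed, predicted):
    """Mean squared error (MSE)."""
    return np.mean((observed - predicted) ** 2)

def r_squared(observed, predicted):
    """Coefficient of determination (r^2): fraction of variance explained."""
    ss_res = np.sum((observed - predicted) ** 2)        # residual sum of squares
    ss_tot = np.sum((observed - observed.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

obs = np.array([1.0, 2.0, 3.0, 4.0])
pred = np.array([1.1, 1.9, 3.2, 3.8])
print(mse(obs, pred), r_squared(obs, pred))  # small MSE, r^2 close to 1
```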

By far the most popular algorithm for optimizing feed-forward neural networks is error back-propagation (Rumelhart et al., 1986). This is a first-order local method based on steepest descent, in which the descent direction is the negative of the gradient of the error. The drawback of this method is that its search for the optimal weights can become trapped in local minima, resulting in suboptimal solutions. This vulnerability increases when the step size taken in weight space becomes too small. Increasing the step size can help escape local error minima, but when the step size becomes too large, training can fall into oscillatory traps (Rojas, 1996). If that happens, the algorithm will diverge and the error will increase rather than decrease.
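At its core, the steepest-descent rule updates each weight as w ← w − η ∂E/∂w, where η is the step size. Below is a minimal sketch of one back-propagation step for a network with a single hidden layer (sigmoid activations, squared error; all names and sizes are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
W1, b1 = rng.normal(scale=0.5, size=(4, 5)), np.zeros(5)  # input -> hidden
W2, b2 = rng.normal(scale=0.5, size=(5, 1)), np.zeros(1)  # hidden -> output
eta = 0.1  # step size (learning rate)

def backprop_step(x, t):
    """One steepest-descent update of all weights for a single training pair."""
    # Forward pass
    h = sigmoid(x @ W1 + b1)                 # hidden-layer activations
    y = sigmoid(h @ W2 + b2)                 # network output
    # Backward pass: gradient of the squared error E = 0.5 * (y - t)^2
    delta2 = (y - t) * y * (1 - y)           # output-layer error signal
    delta1 = (delta2 @ W2.T) * h * (1 - h)   # error propagated back to hidden layer
    # Descend along the negative gradient
    W2 -= eta * np.outer(h, delta2); b2 -= eta * delta2
    W1 -= eta * np.outer(x, delta1); b1 -= eta * delta1
    return 0.5 * np.sum((y - t) ** 2)        # current error

x, t = np.array([0.2, 0.7, -0.1, 0.5]), np.array([1.0])
for _ in range(5):
    print(backprop_step(x, t))  # error shrinks over successive steps
```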

Clearly, it is difficult to find a step size that balances high learning speed against the risk of divergence. Recently, several algorithms have been introduced that adapt step sizes during training (e.g., Maier and Dandy, 2000). In practice, however, a trial-and-error approach has often been used to optimize the step size. Another sensitive issue in back-propagation training is the choice of initial weights. In the absence of any a priori knowledge, random values should be used for the initial weights.

The stopping criteria for learning are very important. Training can be stopped when a specified number of iterations or a target error value is reached, or when training reaches the point of diminishing returns. It should be noted that stopping at a low error level is not always safe, because of possible overtraining or overfitting. When this happens, the network memorizes the training patterns and loses the ability to generalize. A highly recommended method for stopping the training is cross-validation (e.g., Amari et al., 1997). In doing so, an independent data set is required for testing, and close monitoring of the error in both the training set and the test set is needed. Once the error in the test set increases, the training should be stopped, since the point of best generalization has been reached.
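A minimal sketch of this stopping rule (assuming hypothetical train_epoch and test_error callables supplied by the surrounding training code):

```python
def train_with_early_stopping(train_epoch, test_error, max_epochs=1000, patience=10):
    """Stop training once the test-set error starts to rise.

    train_epoch(): runs one pass of weight updates over the training set
    test_error():  returns the current error on an independent test set
    """
    best_error, best_epoch, epochs_without_improvement = float("inf"), 0, 0
    for epoch in range(max_epochs):
        train_epoch()
        err = test_error()
        if err < best_error:
            best_error, best_epoch = err, epoch
            epochs_without_improvement = 0   # still improving; keep training
        else:
            epochs_without_improvement += 1  # test error rising: possible overfitting
            if epochs_without_improvement >= patience:
                break  # point of best generalization has passed
    return best_epoch, best_error
```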

Applications

Artificial neural networks are applicable whenever a relationship between independent and dependent variables exists. They have been applied to such generic tasks as regression analysis, time series prediction and modeling, pattern recognition and image classification, and data processing. Applications of artificial neural networks in geoinformatics have concentrated on a few major areas, such as pattern recognition and image classification (Bruzzone et al., 1999), hydrological modeling (Maier and Dandy, 2000), and urban growth prediction (Yang, 2009). The following paragraphs provide a brief review of these areas.

Pattern recognition and image classification are among the most common applications of artificial neural networks in remote sensing, and the documented cases overwhelmingly rely upon multilayer perceptron networks. The major advantages of artificial neural networks over conventional parametric statistical approaches to image classification, such as the Euclidean, maximum likelihood (ML), and Mahalanobis distance classifiers, are that they are distribution-free, requiring less severe statistical assumptions, and that they are well suited to integrating data from various sources (Foody, 1995). Artificial neural networks have been found to be accurate in the classification of remotely sensed data, although improvements in accuracy have generally been small or modest (Campbell, 2002).

Artificial neural networks are being used increasingly to predict and forecast water resource variables such as algae concentration, nitrogen concentration, runoff, total volume, discharge, or flow (Maier and Dandy, 2000; Dawson and Wilby, 2001). Most of the documented cases used a multilayer perceptron trained with the back-propagation algorithm. Based on the results obtained so far, there is little doubt that artificial neural networks have the potential to be a useful tool for the prediction and forecasting of water resource variables.

The application of artificial neural networks to urban predictive modeling is a new but rapidly expanding area of research (Yang, 2009). Neural networks have been used to compute development probability by integrating a set of predictive variables, serving as the core of a land transformation model (e.g., Pijanowski et al., 2002) or of a cellular automata-based model (e.g., Yeh and Li, 2003). All of the applications documented so far have involved the multilayer perceptron network, a grid-based modeling framework, and a Geographic Information System (GIS) that was loosely or tightly integrated with the network for input data preparation, model validation, and analysis.

Conclusion and Future Trends

Based on the many applications documented in recent years, the prospects of artificial neural networks in geoinformatics seem quite promising. On the other hand, the capability of neural networks tends to be oversold, as if they were an all-inclusive 'black box' capable of formulating an optimal solution to any problem regardless of network architecture, system conceptualization, or data quality. As a result, this field has been characterized by inconsistent research design and poor modeling practice. Several researchers have recently emphasized the need to adopt a systematic approach to neural network model development that considers problem conceptualization, data preprocessing, network architecture design, training methods, and model validation in a sequential mode (e.g., Maier and Dandy, 2000; Dawson and Wilby, 2001; Yang, 2009).

There are a few areas where further research is needed. Firstly, there are many arbitrary decisions involved in the construction of a neural network model; therefore, there is a need to develop guidance that helps identify the circumstances under which particular approaches should be adopted and how to optimize the parameters that control them. For this purpose, more empirical inter-model comparisons and rigorous assessments of neural network performance with different inputs, architectures, and internal parameters are needed. Secondly, data preprocessing is an area where little guidance can be found. There are many theoretical assumptions that have not been confirmed by empirical trials, and it is not clear how different preprocessing methods affect model outcomes. Future investigation is needed to explore the impact of data quality and of different methods of data division, data standardization, and data reduction. Thirdly, continuing research is needed to develop effective strategies and probing tools for mining the knowledge contained in the connection weights of trained neural network models for prediction purposes. This can help open up the 'black-box' construction of a neural network, thus facilitating understanding of the physical meaning of spatial factors and their contributions to geoinformatics. This should help improve the success of neural network applications for problem solving in geoinformatics.

References

Amari, S., Murata, N., Muller, K. R., Finke, M., & Yang, H. H. (1997). Asymptotic statistical theory of overtraining and cross-validation. IEEE Transactions on Neural Networks, 8(5), 985-996.

Bishop, C. (1995). Neural Networks for Pattern Recognition (p. 504). Oxford: Oxford University Press.

Bruzzone, L., Prieto, D. F., & Serpico, S. B. (1999). A neural-statistical approach to multitemporal and multisource remote-sensing image classification. IEEE Transactions on Geoscience and Remote Sensing, 37(3), 1350-1359.

Campbell, J. B. (2002). Introduction to Remote Sensing (3rd ed.) (p. 620). New York: The Guilford Press.

Dawson, C. W., & Wilby, R. L. (2001). Hydrological modelling using artificial neural networks. Progress in Physical Geography, 25(1), 80-108.

Flood, I., & Kartam, N. (1994). Neural networks in civil engineering. II: Systems and application. Journal of Computing in Civil Engineering, 8(2), 149-162.

Foody, G. M. (1995). Land cover classification using an artificial neural network with ancillary information. International Journal of Geographical Information Systems, 9, 527-542.

Haykin, S. (1999). Neural Networks: A Comprehensive Foundation (p. 842). Prentice Hall.

Hornik, K. (1993). Some new results on neural-network approximation. Neural Networks, 6(8), 1069-1072.

Kwok, T. Y., & Yeung, D. Y. (1997). Constructive algorithms for structure learning in feed-forward neural networks for regression problems. IEEE Transactions on Neural Networks, 8(3), 630-645.

Maier, H. R., & Dandy, G. C. (2000). Neural networks for the prediction and forecasting of water resources variables: A review of modeling issues and applications. Environmental Modelling & Software, 15, 101-124.

Pijanowski, B. C., Brown, D., Shellito, B., & Manik, G. (2002). Using neural networks and GIS to forecast land use changes: A land transformation model. Computers, Environment and Urban Systems, 26, 553-575.

Principe, J. C., Euliano, N. R., & Lefebvre, W. C. (2000). Neural and Adaptive Systems: Fundamentals Through Simulations (p. 565). New York: John Wiley & Sons.


Reed, R. (1993). Pruning algorithms: A survey. IEEE Transactions on Neural Networks, 4(5), 740-747.

Rojas, R. (1996). Neural Networks: A Systematic Introduction (p. 502). Berlin: Springer-Verlag.

Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning internal representations by error propagation. In D. E. Rumelhart & J. L. McClelland (Eds.), Parallel Distributed Processing. Cambridge: MIT Press.

Yang, X. (2009). Artificial neural networks for urban modeling. In M. Madden (Ed.), Manual of Geographic Information Systems. American Society for Photogrammetry and Remote Sensing (in press).

Yeh, A. G. O., & Li, X. (2003). Simulation of development alternatives using neural networks, cellular automata, and GIS for urban planning. Photogrammetric Engineering and Remote Sensing, 69(9), 1043-1052.

Key Terms

Architecture: The structure of a neural network, including the number and connectivity of its neurons. A network generally consists of an input layer, one or more hidden layers, and an output layer.

Back-Propagation: The training algorithm for feed-forward multilayer perceptron networks that works by propagating errors back through the network and adjusting weights in the direction of steepest descent (the negative of the local error gradient).

Error Space: The n-dimensional surface over which the weights in a network are adjusted by the back-propagation algorithm to minimize model error.

Feed-Forward: A network in which all the connections between neurons flow in one direction, from an input layer, through hidden layers, to an output layer.

Multilayer Perceptron: The most popular type of network, consisting of multiple layers of processing units interconnected in a feed-forward way.

Neuron: The basic building block of a neural network. A neuron sums its weighted inputs, processes them using an activation function, and produces an output response.

Pruning Algorithm: A training algorithm that optimizes the number of hidden-layer neurons by removing or disabling unnecessary weights or neurons from a large network that is initially constructed to capture the input-output relationship.

Training/Learning: The process by which the connection weights are adjusted until the network is optimal.