Page 1: [IEEE information Services (ICICIS) - Hong Kong (2011.09.17-2011.09.18)] 2011 International Conference on Internet Computing and Information Services - A Heuristic Genetic Neural Network

A Heuristic Genetic Neural Network for Intrusion Detection

Biying Zhang College of Computer and Information Engineering

Harbin University of Commerce Harbin, China

[email protected]

Abstract—In order to model normal behaviors accurately and improve the performance of intrusion detection, a heuristic genetic neural network (HGNN) is presented. Feature selection, structure design and weight adaptation are evolved jointly, in consideration of the interdependence of input features, network structure and connection weights. Penalty factors for the numbers of input nodes and hidden nodes are introduced into the fitness function. A crossover operator based on the generated subnet is adopted, taking into account the relationship between genotype and phenotype. An adaptive mutation rate is applied, and the mutation type is selected heuristically from weight adaptation, node deletion and node addition. When the population does not improve for many consecutive generations, the mutation rate is increased and the mutation type is changed in order to escape local optima and extend the search space. Experimental results on the KDD-99 dataset show that the HGNN achieves better detection performance in terms of detection rate and false positive rate.

Keywords- intrusion detection; neural network; genetic algorithm; mutation operator; penalty factor

I. INTRODUCTION

With the rapid development of computer networks, the number of attacks and crimes involving computer networks keeps increasing. Intrusion detection is an emerging technique to protect computer networks and has become an essential tool for network security [1]. According to the type of pattern used, intrusion detection techniques can be classified into two categories: misuse detection and anomaly detection [1][2]. Misuse detection, a rule-based approach, uses stored signatures of known attacks to detect intrusions. This approach can detect known intrusions reliably with a low false positive rate. However, it fails to detect novel attacks, and the signature database has to be updated manually when new attacks occur. Anomaly detection constructs models of normal behavior, and any deviation from the constructed models is considered an anomaly [3]. However, it is very difficult to model all normal behaviors precisely, and normal behaviors are easily misclassified as attacks, which results in a high false positive rate. Therefore, the key to anomaly detection is selecting an appropriate model for normal behaviors.

Recently, machine learning approaches, such as rule learning [4], hidden Markov model [2], support vector machine [5] and neural network [1][6]-[11], have been employed in the field of intrusion detection. Since the neural network has the capability to generalize from limited, noisy and incomplete information, the intrusion detector modeled by neural network can recognize not only previously observed attacks but also future unseen attacks

[1]. Therefore, the neural network has been considered as a promising technique for intrusion detection.

Feature selection, structure design and weight adaptation are considered the three key tasks in the application of neural networks, and a number of studies on these three problems have been conducted [1][6]-[15]. Since an attribute corresponds to an input node in a neural network, feature selection with a neural network is actually the procedure of selecting input nodes and can be considered a special case of structure optimization. Generally, feature selection and structure design are performed separately, i.e., one task is performed without considering the other [1][6]-[10][12][13]. However, this neglects the fact that the subset of input features and the structure of the neural network are interdependent and make a joint contribution to the performance of the neural network. Recently, a variety of integrated approaches to feature selection and structure design have been proposed to solve this problem [11][14], i.e., the input feature subset and the network structure are optimized simultaneously. The relationship between the input feature subset and the network structure is fully taken into consideration in these approaches, which improves the performance of the neural network.

In the simultaneous optimization of input features and network structure, in order to evaluate the combined goodness of the selected features and the selected network structure, connection weights have to be learned after a near-optimal feature subset and network structure are found. However, one major problem of this kind of approach is noisy evaluation, which results in inaccurate and inefficient optimization. In order to alleviate this problem, we proposed a joint evolutionary neural network (JENN) in previous work [15], in which input features, network structure and connection weights were evolved jointly with a genetic algorithm. However, crossover was the primary mechanism of the joint evolution, and the mutation adopted a simple strategy that randomly selected the mutation type from five mutation operations. Because of the limitations of the crossover operator and the simplicity of the mutation, the algorithm is easily trapped in local optima and cannot search the whole solution space. In this paper, on the basis of our previous work in [15], an improved genetic algorithm is presented to evolve the neural network, in which a heuristic mutation operator is used to solve this problem.

II. HEURISTIC EVOLUTION OF NEURAL NETWORK

A. Hybrid Representation

The feedforward neural network structure is employed in this paper. In order to perform feature selection, structure determination and connection-weight training jointly, a hybrid representation scheme is used. The genotype of an individual is represented by a connection matrix and a node vector, as shown in Fig. 1. To save space, the compact matrix (see Fig. 1(a)) proposed in [13] is used as the connection matrix. The size of the connection matrix is (h + n)×(m + h), where m and h are the maximum numbers of input nodes and hidden nodes, and n is the number of output nodes. The entry wij of the matrix is a real number indicating the weight from node j to node m + i; there is no connection from node j to node m + i if wij = 0. Feed-forward neural networks contain only forward connections, so the entries of the top-right triangle are 0.

2011 International Conference on Internet Computing and Information Services
978-0-7695-4539-4/11 $26.00 © 2011 IEEE
DOI 10.1109/ICICIS.2011.133

The node vector is shown in Fig. 1(b). The dimension of the vector is m+h, whose entries can only be 0 or 1. The entry indicates whether the node corresponding to the vector index is available or not, where 1 means available and 0 means not available. The former m entries denote input nodes and the latter h entries denote hidden nodes.
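The representation above can be sketched in code. This is an illustrative construction only: the sizes and values are made up, and only the zero-triangle constraint from the text is enforced.

```python
import random

# Illustrative sketch of the hybrid genotype (values are made up).
# m = max input nodes, h = max hidden nodes, n = output nodes.
m, h, n = 4, 3, 2
rng = random.Random(0)

# Connection matrix: (h + n) rows by (m + h) columns; W[i][j] is the
# weight from node j to node m + i, and 0.0 means "no connection".
W = [[rng.uniform(-1.0, 1.0) for _ in range(m + h)] for _ in range(h + n)]

# Feed-forward networks have no self- or backward connections among the
# hidden nodes, so the top-right triangle of the matrix is forced to 0.
for i in range(h):
    for j in range(m + i, m + h):
        W[i][j] = 0.0

# Node vector: m + h binary entries, 1 = node available, 0 = pruned.
# The first m entries flag input nodes, the last h flag hidden nodes.
node_vector = [1, 0, 1, 1] + [1, 1, 0]
```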

B. Fitness Evaluation

To evolve input features and network structure simultaneously, the fitness, which is based on the detection accuracy rate, includes a penalty factor for the number of input nodes and a penalty factor for the number of hidden nodes. The fitness function of the individual a is defined by

fit(a) = drate(a) × ψ(a) × φ(a)    (1)

where drate is the detection accuracy rate, ψ is the penalty factor for the number of input nodes, and φ is the penalty factor for the number of hidden nodes.

The detection accuracy rate of the individual a is defined by

drate(a) = correct(a) / sum    (2)

where correct is the number of accurate detections, and sum is the total number of detections, which includes both normal and abnormal instances.

The penalty factor ψ is defined by

ψ(a) = 1 − (cin(a) − mi) × pin    (3)

where cin is the number of input nodes, mi is the minimum number of input nodes, and pin is a very small user-defined parameter which controls the influence of the number of input nodes on fitness evaluation.

The penalty factor φ is defined by

φ(a) = 1 − (chide(a) − mhide) × phide    (4)

where chide is the number of hidden nodes, mhide is the minimum number of hidden nodes, and phide is a very small user-defined parameter which controls the influence of the number of hidden nodes on fitness evaluation.
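Equations (1)-(4) can be put together in a short sketch. The penalty values (pin, phide) and minima (mi, mhide) below are illustrative defaults, and the sketch assumes the three factors in (1) multiply, as written above.

```python
# Minimal sketch of the fitness in (1)-(4); pin/phide/mi/mhide are
# illustrative user-defined values, not the paper's settings.
def fitness(drate, cin, chide, mi=1, mhide=1, pin=0.001, phide=0.001):
    psi = 1.0 - (cin - mi) * pin          # input-node penalty, eq. (3)
    phi = 1.0 - (chide - mhide) * phide   # hidden-node penalty, eq. (4)
    return drate * psi * phi              # eq. (1)

# drate = correct / sum, eq. (2): e.g. 930 correct out of 1000 detections.
f = fitness(drate=930 / 1000, cin=15, chide=9)
```

Note how the penalties reward compact networks: for equal detection rates, an individual with fewer input or hidden nodes scores higher.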

C. Subnet Crossover Operator

The generated subnet Gen(i) of the node i is defined by

Gen(i) = CO(i),                if i ∈ M
Gen(i) = {i} ∪ CI(i) ∪ CO(i),  if i ∈ H
Gen(i) = {i} ∪ CI(i),          if i ∈ P    (5)

where M is the set of input nodes, H is the set of hidden nodes, P is the set of output nodes, CI(i) is the set of all input connections of the node i, CO(i) is the set of all output connections of the node i .

The generated subnet of an input node is the set of all its output connections. The generated subnet of a hidden node consists of the node itself and all its input and output connections. The generated subnet of an output node consists of the node itself and all its input connections. The connection information includes whether or not the connection exists and the connection weight value. The node information includes the bias and the activation function.
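Definition (5) can be sketched over an explicit connection list. The node sets and connections below are toy values chosen for illustration.

```python
# Sketch of the generated subnet Gen(i) in (5); M, H, P are the index
# sets of input, hidden and output nodes, conns is a (src, dst) list.
def gen_subnet(i, M, H, P, conns):
    ci = {c for c in conns if c[1] == i}   # CI(i): input connections of i
    co = {c for c in conns if c[0] == i}   # CO(i): output connections of i
    if i in M:
        return co                # input node: all its output connections
    if i in H:
        return {i} | ci | co     # hidden node: itself + inputs + outputs
    return {i} | ci              # output node: itself + its inputs

M, H, P = {0, 1}, {2}, {3}                # toy inputs, hidden, outputs
conns = [(0, 2), (1, 2), (2, 3), (0, 3)]  # toy (src, dst) pairs
```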

In this paper, the subnet crossover operator is employed in consideration of the relationship between genotype and phenotype.

The overall procedure of the crossover operator is as follows:

1. Randomly select a node i ∈ M ∪ H ∪ P.
2. Cross the generated subnet Gen(i) of node i between the two parents.

D. Heuristic Mutation Operator

1) Adaptive mutation rate

The adaptive mutation rate is defined by

P(g) = (1 + u × MG) × 0.05    (6)

where g denotes the current generation, MG denotes the maximum number of generations for which the population has not improved continuously, and u denotes a user-defined parameter which controls the increase of the mutation rate.

If there are better solutions at each generation than at the previous generation, MG is equal to 0 and P(g) is equal to 0.05. In this case the crossover operator can be considered to be working efficiently, and the mutation operator is not necessary for evolution. The more generations for which the population is not enhanced, the larger MG becomes, which increases the mutation rate P(g). As the mutation rate increases, the evolutionary procedure driven by the crossover operator may be trapped in local minima, it becomes more difficult to find the optimal solution by crossover alone, and the mutation operator should be used to extend the space of solutions.
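The behavior described above can be checked with a one-line implementation of (6), assuming the reconstructed linear form; the value of u is illustrative.

```python
# Adaptive mutation rate, eq. (6): P(g) = (1 + u * MG) * 0.05.
# u is an illustrative choice here, not the paper's setting.
def mutation_rate(MG, u=0.5):
    return (1 + u * MG) * 0.05
```

With MG = 0 (the population is still improving) the rate stays at the base value 0.05; each further stagnant generation raises it linearly.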

2) Heuristic selection of mutation operations

Fig. 1. Representation scheme. (a) Connection matrix. (b) Node vector.


The mutation operator is composed of three operations: weight adaptation, node deletion and node addition.

Weight adaptation is implemented by perturbing the weights of the neural network with Gaussian noise, which is described by

w′ = w + N(0, a × P(g))    (7)

where N(0, a × P(g)) is a Gaussian random variable with mean 0 and standard deviation a × P(g), w is a weight, P(g) is the adaptive mutation rate defined in (6), and a is a user-defined constant used to scale P(g).

Node deletion means setting all connections on the deleted node to 0, and node addition means assigning connection weights between the added node and the other relevant nodes. Under the encoding scheme in this paper, deleting a node sets the corresponding column and row to 0, and adding a node assigns random values to the corresponding column and row.

The mutation operator is defined by

Operator = WA,  if P(g) ≤ m
Operator = ND,  if m < P(g) ≤ n
Operator = NA,  if P(g) > n    (8)

where WA denotes the adaptation of connection weights, ND denotes node deletion, NA denotes node addition, P(g) is the adaptive mutation rate defined in (6), and m and n are user-defined parameters which satisfy 0 < m < n < 1. Obviously, which mutation operation is performed depends highly on the mutation rate, and the mutation rate is determined by the maximum number of generations for which the population does not improve. Hence, the adaptive mutation rate given in (6) is the foundation of the whole mutation operator. When the subnet crossover operator is working efficiently and improving the performance of the neural network continuously, MG and P(g) are very small, and weight adaptation is performed. As the efficiency of the crossover operator becomes worse, MG and P(g) become larger, and node deletion and addition are carried out. Node deletion has a higher priority than node addition in order to favor smaller network structures.
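A minimal sketch of (7) and (8) follows. The thresholds m and n and the scale a are illustrative values, not the paper's settings.

```python
import random

# Sketch of the operator choice in (8); thresholds m < n route the
# adaptive rate P(g) to WA, ND or NA (illustrative threshold values).
def select_operation(p_g, m=0.1, n=0.3):
    if p_g <= m:
        return "WA"   # weight adaptation: small Gaussian perturbation
    elif p_g <= n:
        return "ND"   # node deletion, preferred over addition
    return "NA"       # node addition with random weights in [-1, 1]

# Eq. (7): w' = w + N(0, a * P(g)); a is an illustrative scale constant.
def adapt_weight(w, p_g, a=1.0, rng=random.Random(0)):
    return w + rng.gauss(0.0, a * p_g)
```

As P(g) grows with stagnation, the mutation escalates from gentle weight perturbation to structural changes, with deletion tried before addition.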

3) The heuristic mutation procedure

The heuristic mutation operator is described as follows:

1:  Calculate the adaptive mutation rate P(g);
2:  While (each individual in the population)
3:    If (U(0,1) < P(g))
4:      Select a mutation operation from weight adaptation, node deletion and node addition according to (8);
5:      If (weight adaptation is selected)
6:        Uniformly select a connection weight;
7:        Adapt the selected weight with (7);
8:      Else If (node deletion is selected)
9:        Uniformly select a node on which not all connection weights are 0;
10:       Set all connection weights between the node and the other nodes to 0;
11:      Else If (node addition is selected)
12:       Uniformly select a node on which all connection weights are 0;
13:       Assign random values between -1 and 1 to the connections between the node and the other nodes;
14:      End If
15:    End If
16:  End While

E. Overall evolutionary framework

The initial population of neural networks is generated with random weights and full connectivity. First, all initial or evolved individuals are evaluated with the fitness function, and then the best ones are selected from the parent and child individuals. Subsequently, the subnet crossover operator is performed on the selected individuals. Finally, the proposed heuristic mutation operator is carried out.

The overall evolutionary procedure is described as follows:

1. Initialize: Randomly generate an initial population of neural networks.
2. Evaluation: Evaluate all individuals of the initial population according to the fitness function.
3. While (stopping conditions are not satisfied)
4.   Selection: Select the best n individuals from all parent and child individuals according to the fitness function.
5.   Crossover: Perform the subnet crossover operator on the selected individuals.
6.   Mutation: Perform the heuristic mutation operator on the selected individuals.
7.   Evaluation: Evaluate all evolved individuals.
8. End While
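The loop above can be sketched generically. This is a toy scaffold: the genome here is a plain number standing in for a network, and the fitness, crossover and mutate arguments are placeholders for the operators defined in the preceding subsections.

```python
import random

# High-level sketch of the evolutionary loop in II.E. Selection is
# elitist: parents survive alongside their children, so the best
# individual is never lost between generations.
def evolve(pop, fitness, crossover, mutate, n_survivors, generations, rng):
    pop = sorted(pop, key=fitness, reverse=True)
    for _ in range(generations):
        parents = pop[:n_survivors]                   # selection
        children = [mutate(crossover(rng.choice(parents),
                                     rng.choice(parents)))
                    for _ in range(len(pop) - n_survivors)]
        pop = sorted(parents + children, key=fitness, reverse=True)
    return pop[0]

# Toy run: maximize a scalar "genome" (stands in for a real network).
rng = random.Random(0)
pop = [rng.uniform(0.0, 1.0) for _ in range(10)]
best = evolve(pop, fitness=lambda x: x,
              crossover=lambda a, b: (a + b) / 2.0,
              mutate=lambda x: x + rng.gauss(0.0, 0.1),
              n_survivors=4, generations=20, rng=rng)
```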

III. EXPERIMENTS

A. Datasets

The experiments were performed on the KDD Cup 1999 dataset to assess the effectiveness of the proposed heuristic genetic neural network (HGNN). The dataset is classified into five major categories: Normal, denial of service (DOS), remote to local (R2L), user to root (U2R) and Probe. A 10% subset of the training dataset is used to train the neural network. The numbers of samples of the various types in the training dataset are listed in Table 1. After the neural network is evolved, the best individual is picked from the population and tested with the labeled test dataset. The numbers of samples of the various types in the test dataset are listed in Table 2, where "known" denotes attacks whose types are present in the training set, while "unknown" denotes attacks whose types do not appear in the training set.

B. Results and Analysis

Firstly, the proposed HGNN was compared with the JENN and the constrained JENN of [15]. The JENN evolved input features, network structure and connection weights jointly using a random mutation operator. The constrained JENN, which used the same evolutionary algorithm as the JENN, only evolved network structure and connection weights simultaneously, without feature selection, by constraining the minimum number of input nodes to be equal to 41. Table 3 shows the comparative results of the HGNN, the JENN and the constrained JENN. Although the HGNN did not select fewer input features than the JENN, it obtained fewer hidden nodes, i.e., a more compact network structure.


Especially, the total detection rate and false positive rate achieved by the HGNN are superior to those of the JENN. It can be seen that the HGNN, using the heuristic operator, performs better than the JENN, which uses a random operator.

Secondly, the HGNN is compared with some other neural-network-based methods, such as RWNN [8], BMPNN [9] and ENN [10], which do not optimize all three aspects of input features, network structure and connection weights. The comparative results are shown in Table 4. The detection rates for DOS, Probe and U2R achieved by the HGNN are 98.28%, 96.39% and 55.17%, respectively, and are superior to those of the other methods. More importantly, the false positive rate achieved by the HGNN is only 1.14%, much lower than those of the other methods. A disadvantage of the HGNN is that its detection rate for R2L attacks, 60.32%, is inferior to that of RWNN, which may result from the few training samples of R2L. However, the HGNN still outperforms the other neural-network-based methods as a whole. This results from the unified consideration of input features, network structure and connection weights, because these three aspects make joint contributions to the performance of the neural network.

IV. CONCLUSIONS

A heuristic genetic neural network was proposed to improve the performance of intrusion detection, in which input features, network structure and connection weights were evolved jointly. The experimental results showed that the HGNN accomplishes feature selection, structure optimization and weight adaptation effectively. Through the comparative analysis, it can be seen that the proposed HGNN achieves better detection performance and a more compact neural network structure.

REFERENCES

[1] S. J. Han and S. B. Cho, "Evolutionary neural networks for anomaly detection based on the behavior of a program", IEEE Transactions on Systems, Man, and Cybernetics, Part B, Vol. 36, No. 3, pp. 559-570, Jun. 2006.

[2] S.B. Cho, “Incorporating Soft Computing Techniques Into a Probabilistic Intrusion Detection System”, IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 32, no. 2, pp. 154 -160, May 2002.

[3] W. Hu, W. Hu, and S. Maybank, "AdaBoost-Based Algorithm for Network Intrusion Detection", IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 38, no. 2, pp. 577-583, Apr. 2008.

[4] L. Li, D. Yang, and F. Shen, “A novel rule-based Intrusion Detection System using data mining”, Proceedings of 2010 IEEE International Conference on Computer Science and Information Technology (ICCSIT), vol.6, pp.169-172, July 2010.

[5] G. Zhu and J. Liao, “Research of Intrusion Detection Based on Support Vector Machine”, Proceedings of International Conference on Advanced Computer Theory and Engineering, pp. 434-438, Dec. 2008.

[6] W. Tian and J. Liu, “Network intrusion detection analysis with neural network and particle swarm optimization algorithm”, Proceedings of Chinese Control and Decision Conference (CCDC), pp.1749-1752, May 2010.

[7] H. Deng and Y. Wang, “An artificial-neural-network-based multiple classifiers intrusion detection system”, Proceedings of International Conference on Wavelet Analysis and Pattern Recognition, vol. 2, pp. 683-686, Nov. 2007.

[8] L. Yu, B. Chen, and J. Xiao, "An Integrated System of Intrusion Detection Based on Rough Set and Wavelet Neural Network", Proceedings of the Third International Conference on Natural Computation, Vol. 3, pp. 194-199, Aug. 2007.

[9] T.P. Tran and T. Jan, “Boosted Modified Probabilistic Neural Network (BMPNN) for Network Intrusion Detection”, Proceedings of International Joint Conference on Neural Networks, Vancouver, pp. 2354-2361, Jul. 2006.

[10] E. Michailidis, S.K. Katsikas, and E. Georgopoulos, “Intrusion Detection Using Evolutionary Neural Networks”, Proceedings of Panhellenic Conference on Informatics, pp. 8-12, Aug. 2008.

[11] O. Buchtala, M. Klimek, and B. Sick, "Evolutionary optimization of radial basis function classifiers for data mining applications", IEEE Transactions on Systems, Man, and Cybernetics, Part B, Vol. 35, No. 5, pp. 928-947, Oct. 2005.

[12] R. J. May, H. R. Maier, G. C. Dandy, and T. M. K. G. Fernando, “Non-linear variable selection for artificial neural networks using partial mutual information”, Environmental Modelling & Software, Vol. 23, No. 10-11, pp. 1312-1326, Oct. 2008.

[13] N. Li, Z. Xie, J. Xie, and S. Chen, “SEFNN - A Feed-Forward Neural Network Design Algorithm Based on Structure Evolution”, Journal of Computer Research and Development, Vol. 43, No. 10, pp. 1713-1718, 2006. (in Chinese)

[14] R. Li, Z. Wang, “A Structure-Adaptive Approach for Neural-Network-Based Feature Selection”, Journal of Computer Research and Development, Vol. 39, No. 12, pp. 1613-1617, Dec. 2002. (in Chinese)

[15] B. Zhang and X. Jin, “A joint evolutionary neural network for intrusion detection”, Proceedings of 2009 International Conference on Information Engineering and Computer Science (ICIECS 2009), Wuhan, China, pp.1-4, Dec. 2009.

Table 3. Comparison of the proposed algorithm, the JENN and the constrained JENN

Method            Num. of input  Num. of hidden  Total detection  False positive
                  features       nodes           rate (%)         rate (%)
constrained JENN  41             16              87.46            3.45
JENN              15             10              91.51            1.31
HGNN              15              9              92.86            1.14

Table 4. Comparison between the proposed algorithm and other methods

Method   Detection rate (%)                 False positive
         DOS     Probe   R2L     U2R        rate (%)
RWNN     95.53   91.30   80.01   54.70      9.06
BMPNN    96.78   96.05   48.47   38.62      3.12
ENN      97.74   92.20    8.30   52.86      4.83
HGNN     98.28   96.39   60.32   55.17      1.14

Table 1. Number of Samples of Various Types in Training Set

Normal  DOS     Probe  R2L   U2R  Total
97278   391458  4107   1126  52   494021

Table 2. Number of Samples of Various Types in Test Set

         Normal  DOS     Probe  R2L    U2R  Total
Known    60593   223298  2377   5993   39   292300
Unknown  0       6555    1789   10196  189  18729
Total    60593   229853  4166   16189  228  311029
