
International Journal of Computational Intelligence and Information Security, June 2013, Vol. 4, No. 6, ISSN: 1837-7823


    Significance of One-Class Classification in Outlier Detection

Anandkumar Prakasam1
Nickolas Savarimuthu2

1ROOT Research Consultancy, Tiruchirappalli, Tamilnadu, India. [email protected]
2Professor, National Institute of Technology, Tiruchirappalli, Tamilnadu, India. [email protected]

    Abstract

Outlier or novelty detection is one of the important aspects of the current learning scenario. It helps to discover unknown knowledge from the available data. Novelty detection is also called outlier detection since, in some areas, these data are considered out of the ordinary and need to be eliminated. This paper discusses various methods for detecting novelties and compares them to identify the best method for novelty detection. The paper considers the Generalized Extreme Studentized Deviate (GESD) test, an outlier detection method; SVM, a binary classifier; the Naïve Bayes classifier, a multi-class classifier; and a one-class classification method. Results reveal that the one-class classification method provides the best results in most scenarios, where training data for the outliers is minimal and sometimes unavailable.

    Keywords

One-Class Classification; GESD; SVM; Naïve Bayes; Outlier Detection; Novelty Detection; Classification

1. Introduction

Multi-class classification is one of the most widely used techniques in data mining. However, it is not always necessary to classify the data into multiple classes. If there is only one class of interest, it is sufficient to separate that specific class from the rest of the data. This kind of data mining is called one-class classification. Usually, in one-class classification, the data instances that do not belong to the normal data, or to the majority of the data, are separated.

For example, credit card fraud [1] or intrusion detection [2] can be regarded as anomaly detection, while detecting previously unobserved patterns in data can be regarded as novelty detection [3][4]. Such novelty detection techniques can be used, for example, for detecting a new discussion topic in newsgroups. The difference between anomaly and novelty detection is that the novelty detection method often incorporates the discovered novelty patterns into the model [5]. One-class classification can be considered, for example, when the number of anomalies is much smaller than the number of normal data instances [21][22].


1.1 Anomaly Detection Techniques

Anomaly detection techniques can be divided into four categories: classification-based, distance-based, statistical, and other techniques. Classification-based anomaly detection techniques use a model to predict whether a test instance is normal or anomalous. In general, a set of training data is provided and the system is then given actual data on which to perform classification. If there are multiple normal classes, the technique is considered multi-class [6]; if there is only one normal class, it is considered a one-class anomaly detection technique [5]. Both of these classifiers are trained with only the normal data instances, so they belong to the semi-supervised learning methods. Typical classification-based anomaly detection techniques include, for example, Bayesian networks [7], neural networks [8], rule-based techniques [7] and support vector machines [9]. Distance-based techniques use the distance between points as the basic measure for detecting anomalies. Statistical techniques fit statistical models to a given data set, assign probabilities to test instances, and declare whether they are anomalous or normal.
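To make the distance-based category concrete, the sketch below scores each point by its distance to its k-th nearest neighbour, so that isolated points receive large scores. This is an illustrative example rather than a method from the paper; it assumes scikit-learn and NumPy are available, and the 97th-percentile cut-off is an arbitrary choice for the demo.

```python
# Distance-based anomaly scoring: a point's score is its distance to its
# k-th nearest neighbour; isolated points get large scores.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_outlier_scores(X, k=5):
    # k + 1 neighbours are requested because each point is its own
    # nearest neighbour at distance zero
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    distances, _ = nn.kneighbors(X)
    return distances[:, -1]  # distance to the k-th true neighbour

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 2)),    # dense "normal" cluster
               rng.uniform(-6, 6, (5, 2))])   # a few scattered points
scores = knn_outlier_scores(X, k=5)
flagged = np.where(scores > np.percentile(scores, 97))[0]  # demo threshold
```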

The support vector machine (SVM) is a method used in pattern recognition and classification. It is a classifier that predicts or classifies patterns into two categories, for example fraudulent or non-fraudulent, and is well suited to binary classification. Like any machine learning tool, it has to be trained to obtain a learned model. SVM has been used in many classification and pattern recognition problems such as text categorization and face detection. SVM is grounded in non-parametric applied statistics, neural networks and machine learning [10][11][12][13].
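As an illustration of the binary setting just described, the following sketch trains an RBF-kernel SVM on synthetic fraudulent/non-fraudulent data; the data and hyperparameters are invented for the example and are not taken from the paper.

```python
# Binary SVM: classify transactions as non-fraudulent (0) or fraudulent (1).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (500, 4)),   # synthetic legitimate data
               rng.normal(3, 1, (50, 4))])   # synthetic fraudulent data
y = np.array([0] * 500 + [1] * 50)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```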

Weighted SVM implements cost-sensitive learning. Like the standard SVM, the weighted SVM maximizes the margin of separation and minimizes the classification error, with the margin boundary separating the classes. In cost-sensitive SVM (CS-SVM), different weights are assigned to the classes, and an effective decision boundary is learned by adjusting these per-class weights. This improves prediction accuracy.
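Continuing the previous sketch, this cost-sensitive behaviour can be approximated in scikit-learn by assigning per-class weights, which scales the misclassification penalty C per class; the 1:10 ratio below is an illustrative assumption, normally tuned by validation.

```python
# Cost-sensitive SVM: penalise errors on the rare class (1) ten times more,
# which shifts the learned decision boundary toward the majority class.
from sklearn.svm import SVC

cs_svm = SVC(kernel="rbf", class_weight={0: 1, 1: 10}).fit(X_tr, y_tr)
# class_weight="balanced" would instead derive weights from class frequencies
```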

All anomaly detection techniques have their strengths and weaknesses, and no single technique suits every situation. One has to analyze each technique's suitability for a particular application and choose accordingly.

In this paper, we provide a comparative study of GESD (Generalized Extreme Studentized Deviate), Naïve Bayes and SVM classifiers against one-class classifiers. The rest of this paper is organized as follows. Section 2 provides an overview of the one-class classification technique used for the comparison study, Section 3 describes the datasets used, Section 4 presents the experimental results, and Section 5 concludes the study.

2. One-Class Classification: An Overview

Traditional classification methods use data from all classes to build models. Such models are discriminatory in nature, since they learn to discriminate between classes. However, in many real-world situations it is only possible to obtain data from one class, the target class; data from the other classes, the outlier classes, is either very difficult or impossible to obtain. Examples of such problems include fraud detection, medicine, machine fault detection, wireless sensor networks, intrusion detection, and object recognition such as face detection. These are the situations where one-class classification plays a major role in detecting anomalies.


Magic parameters: When these parameters are set correctly, good performances can be achieved. These parameters are often called magic parameters because they tend to have a large influence on the final performance, yet no clear rules exist for how to set them. Such values cannot be given intuitively beforehand; a reasonable network size, for instance, is found only by trial and error.

Computation and storage requirements: A final consideration is the computational requirements of the methods. Although computers offer more power and storage every year, methods that require several minutes to evaluate a single test object might be unusable in practice. Since training is often done offline, training costs are usually less important. However, when a method has to be adapted to a changing environment, these training costs can become very important. The most straightforward way to obtain a one-class classifier is to estimate the density of the training data and to set a threshold on this density.
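A minimal sketch of that density-threshold idea, assuming scikit-learn's kernel density estimator; the bandwidth and the 5% rejection fraction are illustrative assumptions, not values from the paper.

```python
# One-class classification via density estimation: fit a density to the
# target (normal) class only and reject test points in low-density regions.
import numpy as np
from sklearn.neighbors import KernelDensity

def fit_density_occ(X_target, reject_fraction=0.05, bandwidth=0.5):
    kde = KernelDensity(bandwidth=bandwidth).fit(X_target)
    log_density = kde.score_samples(X_target)
    # threshold set so ~5% of the training (target) data falls below it
    threshold = np.percentile(log_density, 100 * reject_fraction)
    return kde, threshold

def predict_occ(kde, threshold, X_test):
    # +1 = accepted as target class, -1 = rejected as outlier
    return np.where(kde.score_samples(X_test) >= threshold, 1, -1)
```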

In our approach we have used GESD for outlier detection, the SVM and Naïve Bayes classifiers for multi-class classification, and the one-class classification method described in [15] for the analysis of outliers.
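For reference, here is a sketch of the GESD test as usually stated (Rosner's generalized ESD test); the paper does not list its exact implementation, so this is a textbook version assuming SciPy for the t-distribution quantiles.

```python
# Generalized ESD test: iteratively remove the most extreme point and
# compare each test statistic R_i against its critical value lambda_i.
import numpy as np
from scipy import stats

def gesd_outliers(x, max_outliers, alpha=0.05):
    x = np.asarray(x, dtype=float)
    n = len(x)
    work, work_idx = x.copy(), np.arange(n)
    removed, num_outliers = [], 0
    for i in range(1, max_outliers + 1):
        dev = np.abs(work - work.mean())
        j = dev.argmax()
        R = dev[j] / work.std(ddof=1)          # test statistic R_i
        removed.append(work_idx[j])
        work = np.delete(work, j)
        work_idx = np.delete(work_idx, j)
        p = 1 - alpha / (2 * (n - i + 1))      # critical value lambda_i
        t = stats.t.ppf(p, n - i - 1)
        lam = (n - i) * t / np.sqrt((n - i - 1 + t**2) * (n - i + 1))
        if R > lam:
            num_outliers = i                   # largest i with R_i > lambda_i
    return removed[:num_outliers]              # indices of detected outliers
```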

3. Dataset Description

3.1. KEEL Dataset

The KEEL datasets [23] available under the Classification category were used for the analysis. Each dataset includes target instances as well as outlier instances. All the datasets are binary problems and contain no missing attributes. No categorical attributes are considered; all attributes in the datasets are numerical.

Table 1: Dataset Description

Name           No. of Instances
Banana         5300
Phoneme        5404
Appendicitis   106
Titanic        2201
Mammographic   830
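KEEL distributes each dataset as a .dat file whose header lines start with "@" (e.g. @attribute, @data), followed by comma-separated rows; a minimal loader under that assumption might look as follows (the file name is a hypothetical local path).

```python
# Minimal KEEL .dat loader: skip '@' header lines, parse CSV-style rows
# where the last field is the class label.
import numpy as np

def load_keel(path):
    X, y = [], []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("@"):  # header/metadata line
                continue
            *features, label = line.split(",")
            X.append([float(v) for v in features])
            y.append(label.strip())
    return np.array(X), np.array(y)

# X, y = load_keel("banana.dat")  # hypothetical local copy of the dataset
```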

4. Experimental Results and Discussion

4.1. Results on KEEL Dataset

Analysis of the results on the various datasets shows that the one-class classifiers detect more outliers than the other classification methods. While the SVM shows broadly similar performance, it remains slightly below the one-class classifiers. Figures 1-5 show the performance of GESD, Naïve Bayes, SVM and one-class classifiers on the various datasets.
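The paper's one-class method is the one from [15]; as a rough stand-in for reproducing this kind of comparison, the sketch below counts correctly flagged outliers for Naïve Bayes, SVM and scikit-learn's OneClassSVM, assuming labels of 0 for normal instances and 1 for outliers. GESD is omitted here since it is a statistical test rather than a trained classifier.

```python
# Count the outliers each method flags correctly on a held-out test set;
# OneClassSVM stands in for the one-class method of [15].
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC, OneClassSVM

def count_detected(X_train, y_train, X_test, y_test):
    counts = {}
    for name, clf in [("NaiveBayes", GaussianNB()), ("SVM", SVC())]:
        pred = clf.fit(X_train, y_train).predict(X_test)
        counts[name] = int(np.sum((pred == 1) & (y_test == 1)))
    # the one-class model is trained on the normal class (label 0) only
    occ = OneClassSVM(nu=0.05, gamma="scale").fit(X_train[y_train == 0])
    pred = occ.predict(X_test)                # -1 = outlier, +1 = normal
    counts["OCC"] = int(np.sum((pred == -1) & (y_test == 1)))
    return counts
```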


Figure 1: Banana Dataset (Y-axis: No. of Outliers; series: GESD, NaiveBayes, SVM, OCC)

Figure 2: Phoneme Dataset (Y-axis: No. of Outliers; series: GESD, NaiveBayes, SVM, OCC)


Figure 3: Appendicitis Dataset (Y-axis: No. of Outliers; series: GESD, NaiveBayes, SVM, OCC)

Figure 4: Titanic Dataset (Y-axis: No. of Outliers; series: GESD, NaiveBayes, SVM, OCC)


Figure 5: Mammographic Dataset (Y-axis: No. of Outliers; series: GESD, NaiveBayes, SVM, OCC)

Experiments were conducted by varying the number of outliers available during the training phase and observing the performance of each method. Since GESD is a statistics-based outlier detection method, its performance deteriorated only slightly. Naïve Bayes and SVM, by contrast, rely completely on the training data when classifying, so as the number of outliers in the training data decreases, their detection rates drop drastically. The one-class classifier relies entirely on the normal data and not on the anomalies, hence its drop in detection rate is very small compared to the other methods.

Figure 6: Banana Dataset (X-axis: percentage of outliers in training dataset; Y-axis: number of outliers detected)



Figures 6-10 show the detection rates of the various algorithms as we increased the imbalance in the training dataset. Analysis of the datasets shows consistent performance for the one-class classifier, while deviations are observed for the other detection methods.
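A sketch of the imbalance sweep behind Figures 6-10, reusing the count_detected helper from Section 4.1: the fraction of outlier examples retained in the training set is reduced from 100% to 10% while the test set stays fixed. The exact subsampling protocol is an assumption, since the paper does not spell it out.

```python
# Reduce the share of outlier examples in the training set step by step
# and record how many outliers each method still detects.
import numpy as np

def imbalance_sweep(X_train, y_train, X_test, y_test, seed=0):
    rng = np.random.default_rng(seed)
    out_idx = np.where(y_train == 1)[0]       # outlier training examples
    norm_idx = np.where(y_train == 0)[0]      # normal training examples
    results = {}
    for pct in range(100, 0, -10):
        keep = rng.choice(out_idx, size=max(1, len(out_idx) * pct // 100),
                          replace=False)
        idx = np.concatenate([norm_idx, keep])
        results[pct] = count_detected(X_train[idx], y_train[idx],
                                      X_test, y_test)
    return results
```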

Figure 7: Phoneme Dataset (X-axis: percentage of outliers in training dataset; Y-axis: number of outliers detected)

Figure 8: Appendicitis Dataset (X-axis: percentage of outliers in training dataset; Y-axis: number of outliers detected)



Figure 9: Titanic Dataset (X-axis: percentage of outliers in training dataset; Y-axis: number of outliers detected)

Figure 10: Mammographic Dataset (X-axis: percentage of outliers in training dataset; Y-axis: number of outliers detected)



4.2. Discussion

From Figures 6-10, where we analyzed and compared the detection rates of the various methods, the slope of each line shows that the one-class classification approach changes little even when the imbalance in the training dataset is increased substantially. SVM, a popular binary classifier, performed considerably well at first but deteriorated once higher levels of imbalance were introduced. The slopes for GESD, which is based entirely on a statistical approach, and for one-class classification, which does not rely on outlier training samples at all, remained nearly constant, showing that they are not affected by increasing imbalance in the training datasets. This indicates that the one-class classifier can be used as a reliable technique when class imbalance in the training data is very high.

5. Conclusion

One-class classification becomes significant when, in a conventional classification problem, the classes in the (training) data are highly imbalanced, i.e. one of the classes is severely under-represented due to the measuring costs for that class, caused by its low frequency of occurrence. It may also be completely unclear what the representative distribution of the data is. Since most real-time data contain few or no instances of the anomalies, conventional classification fails in most such cases. Multi-class classification methods generally require a training set containing samples of both the legitimate data and the anomalies; hence, as the number of anomalies decreases, the accuracy of these methods deteriorates. As for the outlier detection methods, their false positive rates are high, making them unreliable in real-time scenarios. One-class classification therefore proves to be the best approach for real-time scenarios where anomalies occur rarely and data about them cannot be obtained. Feature selection [17][18][20] can be incorporated into one-class classification to provide better, more optimal results; it also helps to discard unimportant parameters and increase accuracy. Selective sampling [19] can likewise be used to reduce cost by labeling only the most informative samples.

    References

[1] R. Brause, T. Langsdorf and M. Hepp, "Neural Data Mining for Credit Card Fraud Detection", in Proceedings of the 11th IEEE International Conference on Tools with Artificial Intelligence, pages 103-106, Washington, DC, USA, 1999.
[2] F. A. González and D. Dasgupta, "Anomaly Detection Using Real-Valued Negative Selection", Genetic Programming and Evolvable Machines, Volume 4, Issue 4, pages 383-403, Kluwer Academic Publishers, Hingham, MA, USA, December 2003.
[3] M. Markou and S. Singh, "Novelty Detection: A Review - Part 1: Statistical Approaches", Signal Processing, Volume 83, Issue 12, pages 2481-2497, December 2003.
[4] M. Markou and S. Singh, "Novelty Detection: A Review - Part 2: Neural Network Based Approaches", Signal Processing, Volume 83, Issue 12, pages 2499-2521, December 2003.
[5] V. Chandola, A. Banerjee and V. Kumar, "Anomaly Detection: A Survey", ACM Computing Surveys, Volume 41, Number 3, Article 15, ACM, New York, USA, July 2009.
[6] D. Barbará, N. Wu and S. Jajodia, "Detecting Novel Network Intrusions Using Bayes Estimators", in Proceedings of the First SIAM Conference on Data Mining, Chicago, April 2001.


[7] Ethem Alpaydin, Introduction to Machine Learning, 2nd edition, pages 109-112 and 489-493, The MIT Press, Cambridge, Massachusetts, 2010.
[8] Christopher M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, Oxford, 1996.
[9] Corinna Cortes and Vladimir Vapnik, "Support-Vector Networks", Machine Learning, 20, pages 273-297, 1995.
[10] Piyaphol Phoungphol, Yanqing Zhang, Yichuan Zhao and Bismita Srichandan, "Multiclass SVM with Ramp Loss for Imbalanced Data Classification", IEEE International Conference on Granular Computing, 2012.
[11] Yuchun Tang, Bo Jin, Yi Sun and Yan-Qing Zhang, "Granular Support Vector Machines for Medical Binary Classification Problems", IEEE, 2004.
[12] Yuchun Tang, Yan-Qing Zhang, Nitesh V. Chawla and Sven Krasser, "SVMs Modeling for Highly Imbalanced Classification", IEEE, 2009.
[13] Chih-Wei Hsu, Chih-Chung Chang and Chih-Jen Lin, "A Practical Guide to Support Vector Classification", 2010.
[14] David Martinus Johannes Tax, "One-Class Classification: Concept-Learning in the Absence of Counter-Examples", PhD thesis, ISBN 90-75691-05-x, 2001.
[15] Zineb Noumir, Paul Honeine and Cédric Richard, "On Simple One-Class Classification Methods", in 2012 IEEE International Symposium on Information Theory Proceedings, IEEE, 2012.
[16] Defeng Wang and Daniel S. Yeung, "Structured One-Class Classification", IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, Vol. 36, No. 6, December 2006.
[17] Young-Seon Jeong, In-Ho Kang, Myong-Kee Jeong and Dongjoon Kong, "A New Feature Selection Method for One-Class Classification Problems", IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, Vol. 42, No. 6, November 2012.
[18] George Gomes Cabral and Adriano Lorena Inacio de Oliveira, "A Novel One-Class Classification Method Based on Feature Analysis and Prototype Reduction", IEEE, 2011.
[19] Piotr Juszczak and Robert P. W. Duin, "Selective Sampling Methods in One-Class Classification Problems", in O. Kaynak et al. (Eds.): ICANN/ICONIP 2003, LNCS 2714, pp. 140-148, 2003.
[20] David M. J. Tax and Klaus-R. Müller, "Feature Extraction for One-Class Classification", in O. Kaynak et al. (Eds.): ICANN/ICONIP 2003, LNCS 2714, pp. 342-349, 2003.
[21] Kathryn Hempstalk and Eibe Frank, "Discriminating Against New Classes: One-Class versus Multi-Class Classification", in W. Wobcke and M. Zhang (Eds.): AI 2008, LNAI 5360, pp. 325-336, 2008.
[22] Kenneth Kennedy, Brian Mac Namee and Sarah Jane Delany, "Learning without Default: A Study of One-Class Classification and the Low-Default Portfolio Problem", in L. Coyle and J. Freyne (Eds.): AICS 2009, LNAI 6206, pp. 174-187, 2010.
[23] KEEL (Knowledge Extraction based on Evolutionary Learning) Datasets: http://sci2s.ugr.es/keel/datasets.php.