
Fuzzy Entropy based feature selection for classification of hyperspectral data

Mahesh Pal

Department of Civil Engineering

National Institute of Technology

Kurukshetra

Hyperspectral data

1. Measures radiation from the visible to the infrared spectral region in many finely spaced spectral wavebands.

2. Provides greater detail on the spectral variation of targets than conventional multispectral systems.

3. The availability of large amounts of data represents a challenge to classification analyses.

4. Each spectral waveband used in the classification process should add an independent set of information. However, the features are often highly correlated, suggesting a degree of redundancy in the available information, which can have a negative impact on classification accuracy.

An example:

MULTISPECTRAL DATA: discrete wavebands, for example Landsat 7

Band 1: 0.45-0.515 µm; Band 2: 0.525-0.605 µm

Between 0.45 and 2.235 µm: a total of six bands

HYPERSPECTRAL DATA: DAIS data

Between 0.502 and 2.395 µm: a total of 72 bands

Continuous bands at 10-45 nm bandwidth

0.4-0.7 µm: visible; 0.7-1.3 µm: NIR; 1.0-3.0 µm: MIR; 3-100 µm: thermal

Several approaches can be adopted for the classification of high-dimensional data:

1. Adoption of a classifier that is relatively insensitive to the Hughes effect (Vapnik, 1995), i.e. the eventual decline in classification accuracy as the number of features grows while the training set size stays fixed.

2. Use of methods that effectively increase the training set size, i.e. semi-supervised classification (Chi and Bruzzone, 2005) and the use of unlabelled data (Shahshahani and Landgrebe, 1994).

3. Use of some form of dimensionality reduction procedure prior to the classification analysis.

Feature reduction

1. The two broad categories are feature selection and feature extraction.

2. Feature reduction may speed up the classification process by reducing the data set size.

3. It may increase predictive accuracy.

4. It may make the classification rules easier to understand.

5. Feature selection selects a subset of the original features that maintains the information useful for separating the classes, by removing redundant features.

Feature selection

Three approaches to feature selection are:

Filters: use a search algorithm to search through the space of possible features and evaluate each feature using a filter measure such as correlation or mutual information.

Wrappers: use a search algorithm to search through the space of possible feature subsets and evaluate each subset using a classification algorithm.

Embedded: some classification algorithms, such as random forest, produce a ranked list of features during classification.

This study aims to explore the usefulness of four filter-based feature selection approaches.

Feature selection approaches

Four filter-based feature selection approaches were used:

1. Entropy

2. Fuzzy entropy

3. Signal-to-noise ratio

4. RELIEF

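Entropy and Fuzzy Entropy

For a finite set $X = \{x_1, x_2, \ldots, x_n\}$, if $P$ is a probability distribution on $X$, Yager's entropy is defined by:

$$H(X) = -\sum_{x \in X} p(x) \log_2 \pi(x)$$

where $\pi(x) = \sum_{y \in X} p(y)\, S(x, y)$ for a similarity relation $S$ on $X$. When $S$ is the crisp identity relation, $\pi(x) = p(x)$ and $H(X)$ reduces to Shannon's entropy.

Consider a fuzzy information system $(U, A, V, f)$ (Hu and Yu, 2005), where $U$ is a finite set of objects and $A$ is the set of features, i.e. $A = \{a_1, a_2, \ldots, a_n\}$. If $Q \subseteq A$ and $R_Q$ is the fuzzy relation matrix induced by an indiscernibility relation on $Q$, with $H(Q)$ denoting the fuzzy entropy of $R_Q$, the significance of an attribute $a \in Q$ is defined as:

$$\mathrm{Sig}(a \mid Q - a) = H(a \mid Q - a) = H(Q) - H(Q - a)$$

If $\mathrm{Sig}(a \mid Q - a) = 0$, attribute $a$ is considered redundant.

Further details of this algorithm can be found in Hu and Yu (2005).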
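The entropy quantities above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the author's implementation: the per-feature fuzzy similarity r_ij = max(0, 1 − |v_i − v_j| / width), the `width` parameter, and combining features with an elementwise minimum are our own choices, and the fuzzy entropy H(R) = −(1/n) Σ_i log2(|[x_i]_R| / n) is written as we understand Hu and Yu's formulation. `yager_entropy` follows the equation above; with a uniform p it gives the same value as `fuzzy_entropy`. When every r_ij is 0 or 1 (a crisp relation, for example on discretized features, which is how we assume the crisp entropy approach would be set up), `fuzzy_entropy` returns Shannon's entropy of the induced partition.

```python
import numpy as np


def feature_relation(values, width):
    """Fuzzy similarity matrix for one feature.

    Hypothetical choice: r_ij = max(0, 1 - |v_i - v_j| / width), so r_ii = 1.
    """
    v = np.asarray(values, dtype=float)
    return np.clip(1.0 - np.abs(v[:, None] - v[None, :]) / width, 0.0, 1.0)


def subset_relation(X, subset, width):
    """R_Q for a feature subset Q: elementwise minimum of the per-feature relations.

    The empty subset gives the all-ones relation (every object indiscernible).
    """
    n = X.shape[0]
    R = np.ones((n, n))
    for a in subset:
        R = np.minimum(R, feature_relation(X[:, a], width))
    return R


def fuzzy_entropy(R):
    """H(R) = -(1/n) * sum_i log2(|[x_i]_R| / n), where |[x_i]_R| = sum_j r_ij."""
    n = R.shape[0]
    card = R.sum(axis=1)  # fuzzy cardinality of each object's similarity class (>= 1)
    return float(-np.mean(np.log2(card / n)))


def yager_entropy(p, S):
    """H = -sum_x p(x) log2 pi(x), with pi(x) = sum_y p(y) S(x, y)."""
    p = np.asarray(p, dtype=float)
    pi = np.asarray(S, dtype=float) @ p
    mask = p > 0  # terms with p(x) = 0 contribute nothing
    return float(-np.sum(p[mask] * np.log2(pi[mask])))


def significance(X, Q, a, width):
    """Sig(a | Q - a) = H(Q) - H(Q - a); a value of (near) zero marks a as redundant."""
    rest = [b for b in Q if b != a]
    h_q = fuzzy_entropy(subset_relation(X, Q, width))
    h_rest = fuzzy_entropy(subset_relation(X, rest, width))
    return h_q - h_rest
```

With two identical feature columns, Sig(a | Q − a) is zero for either column within Q = {both}, so one of them would be flagged as redundant. The search strategy that turns these significance values into a selected subset is not shown.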

Signal to noise ratio

$$\mathrm{SNR} = \frac{\left|\mathrm{mean}_{\mathrm{class}\,1} - \mathrm{mean}_{\mathrm{class}\,2}\right|}{\mathrm{stddev}_{\mathrm{class}\,1} + \mathrm{stddev}_{\mathrm{class}\,2}}$$

This approach ranks all features according to how well each feature discriminates between two classes. To use it for a multiclass classification problem, a one-against-one approach was used in this study.
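A minimal NumPy sketch of the signal to noise ratio ranking. The source does not say how the one-against-one scores are combined into a single ranking, so averaging over all class pairs (`snr_one_against_one`) is a hypothetical choice, as is the small `eps` that guards against a zero denominator. Standard deviations use NumPy's default population form (ddof = 0).

```python
import itertools

import numpy as np


def snr_two_class(X, y, c1, c2, eps=1e-12):
    """Per-feature |mean_c1 - mean_c2| / (std_c1 + std_c2) for one pair of classes."""
    a, b = X[y == c1], X[y == c2]
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / (a.std(axis=0) + b.std(axis=0) + eps)


def snr_one_against_one(X, y):
    """Score every class pair, then average (hypothetical aggregation)."""
    classes = np.unique(y)
    scores = [snr_two_class(X, y, c1, c2) for c1, c2 in itertools.combinations(classes, 2)]
    return np.mean(scores, axis=0)


def rank_features(scores):
    """Feature indices ordered from the highest to the lowest score."""
    return np.argsort(scores)[::-1]
```

Keeping the top k indices from `rank_features` gives a filter-selected subset of size k.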

RELIEF

The general idea of RELIEF is to choose the features that best distinguish between classes.

At each step of an iterative process, an instance X is chosen at random from the dataset, and the weight of each feature is updated according to the distance of this instance to its near-miss and near-hit (Kira and Rendell, 1992).

An instance from the dataset is a near-hit of X if it belongs to the close neighbourhood of X and belongs to the same class as X.

An instance is called a near-miss of X if it belongs to the neighbourhood of X but not to the same class as X.
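A minimal sketch of the RELIEF weight update described above, following Kira and Rendell's two-class idea: each feature loses weight for differing from the near-hit and gains weight for differing from the near-miss. Assumptions: features are rescaled to [0, 1], distances are Manhattan, and for multiclass data the near-miss is simply the nearest instance of any other class (ReliefF, by Kononenko, handles multiple classes differently). Each class needs at least two instances so that a near-hit exists.

```python
import numpy as np


def relief(X, y, n_iter, rng=None):
    """Return one RELIEF weight per feature (higher = more discriminative)."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xn = (X - X.min(axis=0)) / span  # assumption: scale each feature to [0, 1]
    n, d = Xn.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        i = rng.integers(n)  # instance chosen at random
        dist = np.abs(Xn - Xn[i]).sum(axis=1)  # Manhattan distance to every instance
        dist[i] = np.inf  # the instance is not its own neighbour
        same = y == y[i]
        hit = np.argmin(np.where(same, dist, np.inf))  # near-hit: nearest, same class
        miss = np.argmin(np.where(~same, dist, np.inf))  # near-miss: nearest, other class
        w += (Xn[miss] - Xn[i]) ** 2 - (Xn[hit] - Xn[i]) ** 2
    return w / n_iter
```

Sorting the returned weights in decreasing order gives the RELIEF feature ranking.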

Data Set

1. Data were acquired by the DAIS 7915 sensor of the German Space Agency, flown on 29 June 2000.

2. The sensor acquires information in 79 bands at a spatial resolution of 5 m in the wavelength range of 0.502–12.278 µm.

3. Seven features located in the mid- and thermal infrared region, and seven features from the 0.502–2.395 µm spectral region, were removed because of striping noise.

4. An area of 512 × 512 pixels with 65 features covering the test site was used.

Training and test data

1. Random sampling was used to collect training and test data from a ground reference image.

2. Eight land cover classes were used: wheat, water, salt lake, hydrophytic vegetation, vineyards, bare soil, pasture and built-up land.

3. A total of 800 training pixels and 3800 test pixels were used.
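A sketch of drawing disjoint random training and test pixels from a ground reference image. The function name, the seed, and the convention that label 0 marks unlabelled pixels are hypothetical; the source only states that random sampling was used. Simple random sampling like this does not guarantee a fixed number of pixels per class.

```python
import numpy as np


def random_train_test_pixels(reference, n_train, n_test, seed=0):
    """Draw disjoint random train/test pixels from a labelled reference image.

    Assumes label 0 marks unlabelled pixels (hypothetical convention).
    Returns (rows, cols, labels) for the training set and for the test set.
    """
    rows, cols = np.nonzero(reference)  # coordinates of all labelled pixels
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(rows))
    if n_train + n_test > len(idx):
        raise ValueError("not enough labelled pixels")
    train_idx, test_idx = idx[:n_train], idx[n_train:n_train + n_test]

    def pick(i):
        return rows[i], cols[i], reference[rows[i], cols[i]]

    return pick(train_idx), pick(test_idx)
```

For the sample sizes above, this would be called with `n_train=800` and `n_test=3800` on the 512 × 512 reference image.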

Classification Method

1. Support vector machines with a one-against-one approach for multiclass data were used.

2. A radial basis function kernel was used.

3. A regularisation parameter (C) of 5000 and a gamma of 2 were used.

4. For each feature selection approach, classification accuracy was obtained on the test dataset.

5. Non-inferiority was tested using the McNemar test.

Feature selection method | Selected features
Entropy | 32, 51, 63, 35, 8, 49, 42, 27, 48, 64, 6, 50, 65, 11, 53, 39, 22
Fuzzy entropy | 32, 41, 50, 6, 27, 63, 36, 49, 10, 22, 65, 51, 40, 48
Relief | 3, 4, 2, 11, 10, 5, 8, 6, 9, 7, 12, 1, 13, 23, 22, 25, 24, 20, 31, 30
Signal to noise ratio | 5, 7, 8, 9, 6, 10, 11, 4, 12, 3, 32, 31, 33, 30, 24, 23, 25, 29, 13, 26

Selected features with different feature selection approaches

Feature selection method | Number of features used in classification | Classification accuracy (%)
No feature selection | 65 | 91.76
Fuzzy entropy | 14 | 91.68
Entropy | 17 | 91.61
Signal to noise ratio | 20 | 91.68
Relief | 20 | 88.61

Classification accuracy of the SVM classifier with the different selected feature sets

Number of features (method) | Accuracy (%) | Difference in accuracy (%) | 95% confidence interval | Conclusion (at 0.05 level of significance)
65 (no feature selection) | 91.76 | 0.00 | 0.000-0.000 | -
14 (fuzzy entropy) | 91.68 | 0.36 | 0.071-0.089 | Non-inferior
17 (entropy) | 91.61 | 0.13 | 0.142-0.158 | Non-inferior
20 (signal to noise ratio) | 91.68 | 0.26 | 0.071-0.089 | Non-inferior
20 (Relief) | 88.61 | 3.00 | 3.140-3.160 | Inferior

Difference and non-inferiority test results, based on the 95% confidence interval of the estimated difference between the accuracy achieved with all 65 features and that achieved with the feature sets selected by each approach.
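A sketch of the paired comparison behind a McNemar-based non-inferiority test. Because both classifiers are evaluated on the same test pixels, only the discordant counts matter: b (correct with all features, wrong with the subset) and c (the reverse). The accuracy difference is then (b − c)/n and the McNemar statistic is (b − c)/√(b + c). The Wald-type interval for paired proportions and the `margin` parameter are assumptions, since the source does not state its interval formula or non-inferiority margin; the function names are hypothetical.

```python
import math


def paired_difference_ci(y_true, pred_full, pred_subset, z=1.96):
    """Accuracy difference (full minus subset) on shared test pixels, with a Wald CI."""
    n = len(y_true)
    triples = list(zip(y_true, pred_full, pred_subset))
    # b: correct with all features but wrong with the subset; c: the reverse
    b = sum(1 for t, f, s in triples if f == t and s != t)
    c = sum(1 for t, f, s in triples if f != t and s == t)
    diff = (b - c) / n
    se = math.sqrt(b + c - (b - c) ** 2 / n) / n
    return diff, (diff - z * se, diff + z * se), b, c


def mcnemar_z(b, c):
    """McNemar z statistic from the discordant counts (0 when there are none)."""
    return 0.0 if b + c == 0 else (b - c) / math.sqrt(b + c)


def is_non_inferior(ci, margin):
    """Non-inferior if the whole CI for (full - subset) lies below the margin."""
    return ci[1] < margin
```

Multiplying `diff` and the interval bounds by 100 expresses them in percentage points, as in the table above.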

Conclusions

The fuzzy entropy based feature selection approach works well with this dataset, giving accuracy comparable to that of the full dataset with the smallest number of selected features (14).

The accuracy achieved by the signal to noise ratio and entropy based approaches is also comparable to that achieved with the full dataset, but these approaches require more selected features (20 and 17, respectively) than the fuzzy entropy based approach.

Results with the Relief based approach show a significant decline in classification accuracy in comparison with the full dataset.
