Research Article Subspace Learning via Local Probability...

18
Research Article Subspace Learning via Local Probability Distribution for Hyperspectral Image Classification Huiwu Luo, Yuan Yan Tang, and Lina Yang Department of Computer and Information Science, University of Macau, Avenida Padre Tomas Pereira, Macau Correspondence should be addressed to Yuan Yan Tang; [email protected] Received 18 November 2014; Accepted 7 January 2015 Academic Editor: Zhike Peng Copyright © 2015 Huiwu Luo et al. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. e computational procedure of hyperspectral image (HSI) is extremely complex, not only due to the high dimensional information, but also due to the highly correlated data structure. e need of effective processing and analyzing of HSI has met many difficulties. It has been evidenced that dimensionality reduction has been found to be a powerful tool for high dimensional data analysis. Local Fisher’s liner discriminant analysis (LFDA) is an effective method to treat HSI processing. In this paper, a novel approach, called PD-LFDA, is proposed to overcome the weakness of LFDA. PD-LFDA emphasizes the probability distribution (PD) in LFDA, where the maximum distance is replaced with local variance for the construction of weight matrix and the class prior probability is applied to compute the affinity matrix. e proposed approach increases the discriminant ability of the transformed features in low dimensional space. Experimental results on Indian Pines 1992 data indicate that the proposed approach significantly outperforms the traditional alternatives. 1. Introduction With the rapid technological advancement of remote sensing, the technology of high dimensional data analysis has been gotten forward. With the great demand of need and appetite to automatical process, the remote sensing data in very high dimensional space, a series of analytical methods, and applicable toolkits were engendered one aſter another. Hyperspectral imaging (HSI) typically have hundreds even thousands of electromagnetic spectral bands for each pixel, and these bands are oſten highly correlated. To make full use of rich spectrum and to enable effective processing of HSI data, it is oſten dramatic to extract useful fea- tures, preventing negative effect and precaution caused by redundant data. Dimensionality reduction is an efficient technique to eliminate the redundance among data samples. Dimensionality reduction also eliminates the effects brought by the uncorrelated features and simultaneously “selects” or “extracts” the features that are beneficial to precious classification. To be specific, the aim of dimensionality reduc- tion is to decrease computational complexity and ameliorate statistical ill-conditioning by discarding redundant features that potentially deteriorate classification performance [1]. Nevertheless, how to suppress the redundance and preserve the most valuable features still remains an open topic in the community of high dimensional data analysis. Dimension reduction techniques can be roughly catego- rized into linear approaches and nonlinear ones. e linear approaches include principal component analysis (PCA) [2], random projection (RP) [3], linear discriminant analysis (LDA) [4], and locality preserving projection (LPP), whereas the nonlinear approaches include isomap mapping (Isomaps) [5, 6], diffusion maps (DMaps) [7], and locally linear embed- ding (LLE) [8]. e common drawback of nonlinear embedding methods is that these techniques are too expensive to compute HSI data when the size of samples becomes large. For instance, Isomaps employs geodesic distance to measure the distance of data samples rather than Euclidean distance, that is the classical straight-line distance. However, the theory of Isomaps is established on the basis of training samples, which is excessively reliant on the assumption of manifold-like distribution. Meantime, the mapping found by Isomaps is recessive and implicit. For the new data points, the geodesic distance has to be recomputed on the new training set to obtain the low dimensional embedding. ere is no exact Hindawi Publishing Corporation Mathematical Problems in Engineering Volume 2015, Article ID 145136, 17 pages http://dx.doi.org/10.1155/2015/145136

Transcript of Research Article Subspace Learning via Local Probability...

Page 1: Research Article Subspace Learning via Local Probability ...downloads.hindawi.com/journals/mpe/2015/145136.pdf · and applicable toolkits were engendered one a er another. Hyperspectral

Research ArticleSubspace Learning via Local Probability Distribution forHyperspectral Image Classification

Huiwu Luo Yuan Yan Tang and Lina Yang

Department of Computer and Information Science University of Macau Avenida Padre Tomas Pereira Macau

Correspondence should be addressed to Yuan Yan Tang yytangumacmo

Received 18 November 2014 Accepted 7 January 2015

Academic Editor Zhike Peng

Copyright copy 2015 Huiwu Luo et al This is an open access article distributed under the Creative Commons Attribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

The computational procedure of hyperspectral image (HSI) is extremely complex not only due to the high dimensional informationbut also due to the highly correlated data structureThe need of effective processing and analyzing of HSI has met many difficultiesIt has been evidenced that dimensionality reduction has been found to be a powerful tool for high dimensional data analysis LocalFisherrsquos liner discriminant analysis (LFDA) is an effective method to treat HSI processing In this paper a novel approach calledPD-LFDA is proposed to overcome the weakness of LFDA PD-LFDA emphasizes the probability distribution (PD) in LFDAwhere the maximum distance is replaced with local variance for the construction of weight matrix and the class prior probability isapplied to compute the affinity matrixThe proposed approach increases the discriminant ability of the transformed features in lowdimensional space Experimental results on Indian Pines 1992 data indicate that the proposed approach significantly outperformsthe traditional alternatives

1 Introduction

With the rapid technological advancement of remote sensingthe technology of high dimensional data analysis has beengotten forward With the great demand of need and appetiteto automatical process the remote sensing data in veryhigh dimensional space a series of analytical methodsand applicable toolkits were engendered one after anotherHyperspectral imaging (HSI) typically have hundreds eventhousands of electromagnetic spectral bands for each pixeland these bands are often highly correlated To make fulluse of rich spectrum and to enable effective processingof HSI data it is often dramatic to extract useful fea-tures preventing negative effect and precaution caused byredundant data Dimensionality reduction is an efficienttechnique to eliminate the redundance among data samplesDimensionality reduction also eliminates the effects broughtby the uncorrelated features and simultaneously ldquoselectsrdquoor ldquoextractsrdquo the features that are beneficial to preciousclassification To be specific the aim of dimensionality reduc-tion is to decrease computational complexity and amelioratestatistical ill-conditioning by discarding redundant featuresthat potentially deteriorate classification performance [1]

Nevertheless how to suppress the redundance and preservethe most valuable features still remains an open topic in thecommunity of high dimensional data analysis

Dimension reduction techniques can be roughly catego-rized into linear approaches and nonlinear ones The linearapproaches include principal component analysis (PCA) [2]random projection (RP) [3] linear discriminant analysis(LDA) [4] and locality preserving projection (LPP) whereasthe nonlinear approaches include isomapmapping (Isomaps)[5 6] diffusion maps (DMaps) [7] and locally linear embed-ding (LLE) [8]

The commondrawback of nonlinear embeddingmethodsis that these techniques are too expensive to compute HSIdata when the size of samples becomes large For instanceIsomaps employs geodesic distance to measure the distanceof data samples rather than Euclidean distance that isthe classical straight-line distance However the theory ofIsomaps is established on the basis of training samples whichis excessively reliant on the assumption of manifold-likedistribution Meantime the mapping found by Isomaps isrecessive and implicit For the new data points the geodesicdistance has to be recomputed on the new training set toobtain the low dimensional embedding There is no exact

Hindawi Publishing CorporationMathematical Problems in EngineeringVolume 2015 Article ID 145136 17 pageshttpdxdoiorg1011552015145136

2 Mathematical Problems in Engineering

computational expression of new data points in Isomaps It isclear that such computation is explicitly complex and unap-plicable for the large capacity of HSI data For this reasonIsomaps is impractical for the dimensionality reduction ofHSI data Similar drawback occurs in the construction of LLE[7] Recent interests in discovering the intrinsic manifold ofdata structure have been a trend and the theory is still inthe progress of development [9] yet some achievements havebeen gained and reported in many research articles [10]

Nevertheless the linear approaches are efficient to dealwith this issue [11 12] PCA an unsupervised approach findsthe global scatter as the best projected direction with the aimof minimizing the least square error of reconstruction datapoints [13] Due to its ldquounsupervisedrdquo nature the learningprocedure is often blind and the projected direction foundby PCA is usually not the optimal direction [14] LDA isa supervised methodology which absorbs the advantage ofpurpose of learning [15] Toward that goal LDA seeks thedirection that minimizes the classified error However thewithin-class scatter matrix of LDA is often singular when itis applied to the small size of samples [16] Consequentlythe optimal solution of LDA is unable to solve and theprojected direction is failed to achieve These drawbacks willlimit the wide promotion of LDA [4] To cope with thisissue a derived discriminant analysis which puts additionalconstraint on the objective function [17] was proposed insome research papers [18ndash20] for example Join Global andLocal Discriminant Analysis (JGLDA) [21] The commonscheme of these methods is that they are easy to compute andimplement and the mapping is explicit Yet they have showneffeciency in most cases despite the simple models

The linear algorithms would have more advantages inthe dimensionality reduction of HSI data in general As amatter of fact conventional linear approaches such as PCALDA and LPP make the assumption that the distributionsof data samples are Gaussian distribution or mixed Gaussiandistribution However the assumption is often failed [22]since the distribution of real HSI data is preferred to bemultimodal instead of a single modal To be specific thedistribution of HSI data is usually unknown [23] and thesingle Gaussian model or Gaussian mixture model can notcapture the distribution of all landmarks of the HSI datasince the landmarks from different classes are multimodal[24] In this case the conventional methodologies workpoorly In view of this some methods extend the idea ofLDA and formulate extend-LDA algorithms for exampleSugiyama [25] proposed Local Fisher Linear DiscriminantAnalysis (LFDA) formultimodal clusters LFDA incorporatesthe supervised nature of LDA with local description ofLPP and then the optimal projection is obtained under theconstraint of multimodal samples Li et al [1] apply LFDAwith maximum likelihood estimate (MLE) support vectormachine (SVM) and Gaussian mixture model (GMM) toHSI data As reported in his paper LFDA is superior notonly in the computational time but also in the classifiedaccuracy In a word LFDA is especially appropriate forthe landmarks classification of HSI data Nevertheless theconventional LFDA ignores the distribution of data samplesin the construction procedure of affinity matrix

In LFDA the computation of affinity matrix is importantNote that there are clearly many different ways to define anaffinity matrix but the heat kernel derived from LPP hasbeen shown to result in very effective locality preservingproperties [26] In this way the local scaling of data samplesin the 119870-nearest neighborhood is utilized 119870 is a self-tuning predefined parameter To simplify the calculationprocedure of parameters [1 27 28] employing a fixed valueof 119870 = 7 for experiments Note that such calculation mayignore the distribution of data samples in the constructionprocedure of affinity matrix Actually the simplification oflocal distribution by the distance between the samples andthe 119870th nearest neighbor sample may be unreasonable andthe results by using this simplification may raise some error

Thus in this paper to overcome the weakness of con-ventional LFDA a novel approach is proposed where byadopting the local variance of local patch instead of farthestdistance for weight matrix and the class prior probability foraffinitymatrix theweightmatrix of proposed algorithm takesinto account both the distribution of HSI data samples andthe objective function of HSI data after dimension reductionThis novel approach is called PD-LFDA because the prob-ability distribution (PD) is used in LFDA algorithm To bespecific PD-LFDA incorporates two key points namely

(1) The class prior probability is applied to compute theaffinity matrix

(2) The distribution of local patch is represented bythe ldquolocal variancerdquo instead of ldquofarthest distancerdquo toconstruct the weight matrix

The proposed approach essentially increases the discriminantability of transformed features in low dimensional spaceThe pattern found by PD-LFDA is expected to be moreaccurate and is coincide with the character of HSI data andis conducive to classify HSI data

The rest of this paper is organized as follows In the begin-ning of this paper the most basic concepts of conventionallinear approaches related to our work will be introducedin Section 2 Precisely Fisherrsquos linear discriminant analysis(LDA) and locality preserve projection (LPP) as well aslocal Fisher discriminant analysis (LFDA) will be presentedProposed algorithm is developed and formalized in Section 3which is the core of this paper The experimental results withcomparison on real HSI dateset are provided in Section 4Finally we conclude our work in Section 5

2 Related Work

The purpose of linear approaches is to find an optimalprojected direction where the information of embeddingfeatures is preserved as much as possible To formulate ourproblem let 119909

119894be the 119901-dimensional feature in the original

space and let 1199091 1199092 119909

119873 be the119873 samples For the case of

supervised learning let 119897119894be label of 119909

119894 and then the label set

of all samples can be represented by notation 1198971 1198972 119897119873

Suppose that there are119862 classes in all and the sample numberof the 119888th is119873

119888that fulfils the condition119873 = sum119862

119888=1119873119888That is

Mathematical Problems in Engineering 3

the number of all samples is the total sum of each class Let 119909119888119894

be the 119894th sample of the 119888th class Then the correspondingsample mean becomes 119898

119888= (1119873

119888) sum119873119888

119894=1119909119888

119894 yet the data

center of all samples is denoted by 119898 = (1119873)sum119873

119894=1119909119894

Suppose that the data set 119883 in 119901-dimensional hyperspaceis distributed on a low 119902-dimensional subspace A generalproblem of linear discriminant is to find a transformation119879 isin R119901times119902 that maps the 119901-dimensional data into a low119902-dimensional subspace data by 119884 = 119879

119879119883 such that

each 119910119894represents 119909

119894without losing useful information The

transformed matrix 119879 is pursued by different methods anddifferent objective function resulting in different algorithm

21 Fisherrsquos Linear Discriminant Analysis (LDA) LDA intro-duces the within-scatter matrix 119878

119908and between-scatter

matrix 119878119887to describe the distribution of data samples

119878119908=

119862

sum

119888=1

119873119888

sum

119894=1

(119909119888

119894minus 119898119888) (119909119888

119894minus 119898119888)119879 (1)

119878119887=

119862

sum

119888=1

119873119888(119898119888minus 119898) (119898

119888minus 119898)119879 (2)

Fisher criterion seeks a transformation 119879 that maximizedthe between-class scatter while minimized the within-classscatter This can be achieved by optimizing the followingobjective function

119879LDA = arg max119879isinR119901times119902

tr (119879119879119878119887119879)

tr (119879119879119878119908119879) (3)

It is implicitly assumed that 119879119879119878119908119879 is full rank Under

this assumption the problem can then be attributed to thegeneralized eigenvectors 120593

1 1205932 120593

119889 by solving

119878119887120593 = 120582119878

119908120593 (4)

Finally the solution of 119879LDA is given by 119879LDA =

1205931 1205932 120593

119902 which are associated with the first 119902

largest eigenvalues 1205821ge 1205822ge sdot sdot sdot ge 120582

119902 Since the rank of

between-class scatter matrix 119878119887is at most 119862 minus 1 there are

119862minus 1meaningful features in conventional LDA To deal withthis issue a regularization procedure is essential in practice

22 Locality Preserve Projection (LPP) Adrawback of LDA isthat it does not consider the local structure among data points[29] and the distribution of realHSI data is oftenmultimodalLocality preserving projection meets this requirement [30]The goal of LPP is to preserve the local structure of neighbor-hood points Toward this goal a graph ismodeled explicitly todescribe the relationship using 119896-nearest neighborhood Let119860 denote the affinity matrix where 119860(119894 119895) isin [0 1] representsthe similarity between points119909

119894and119909119895The larger the value of

119860(119894 119895) the closer the relationship between 119909119894and119909119895 A simple

and effective way to define affinity matrix 119860 is given by

119860 (119894 119895) =

exp(minus10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

1205722) if 119909

119894isin KNN (119909

119895 119896)

or 119909119895isin KNN (119909

119894 119896)

0 otherwise(5)

where lowast 2 denotes the square 2-norm Euclidean distance120572 is a tuning parameter and KNN(119909 119896) represents the 119870-nearest neighborhoods of 119909 under parameter 119896

The transformed matrix of LPP is achieved in the follow-ing criterion [31]

119879LPP = arg min119879isinR119901times119902

1

2

119899

sum

119894119895=1

119860 (119894 119895)10038171003817100381710038171003817119910119894minus 119910119895

10038171003817100381710038171003817

2

st 119879119879119883119863119883119879119879 = 119868

(6)

where 119863 = diag(119863119894119894) is a diagonal matrix whose entries are

the column sum (also can be a row sum since119860 is symmetric)of 119860 that is 119863

119894119894= sum119895119860119894119895 Arbitrary scaling invariance and

degeneracy are guaranteed by the constraint of (6)The solution of LPP problem can be gained by solving the

eigenvector problem of

119883119871119883119879120593 = 120582119883119863119883

119879120593 (7)

where 119871 equiv 119863 minus 119860 denotes the graph-Laplacian matrix inthe community of spectral analysis and can be viewed as thediscrete version of Laplace Beltrami operator on a compactRimannian manifold [29] And finally the transformationmatrix 119879 is given by 119879LPP = 1205931 1205932 120593119902 isin R119901times119902 thatcorrespond to eigenvalue 0 = 120582

0le 1205821le 1205822le sdot sdot sdot le 120582

119902le

sdot sdot sdot le 120582119896

23 Local Fisher Discriminant Analysis (LFDA) Local Fisherdiscriminant analysis (LFDA) [32] measures the ldquoweightsrdquoof two data points by the corresponding distance and thenthe affinity matrix is calculated by these weights Notethat the ldquopairwiserdquo representation of within-scatter matrixand between-scatter matrix is very important for LFDAFollowing simple algebra steps the within-scatter matrix (1)of LDA can be transformed into the following forms

119878119908=

119862

sum

119888=1

119873119888

sum

119894=1

(119909119888

119894minus 119898119888) (119909119888

119894minus 119898119888)119879

=

119862

sum

119888=1

119873119888

sum

119894=1

(119909119888

119894minus1

119873119888

119873119888

sum

119895=1

119909119888

119895)(119909

119888

119894minus1

119873119888

119873119888

sum

119895=1

119909119888

119895)

119879

=

119873

sum

119894=1

119909119894119909119879

119894minus

119862

sum

119888=1

1

119873119888

119873119888

sum

119894119895=1

119909119888

119894119909119888

119895

119879

=

119873

sum

119894=1

(

119873

sum

119895=1

119875119908(119894 119895))119909

119894119909119879

119894minus

119873

sum

119894119895=1

119875119908(119894 119895) 119909

119894119909119879

119895

4 Mathematical Problems in Engineering

=1

2

119873

sum

119894119895=1

119875119908(119894 119895) (119909

119894119909119879

119894+ 119909119895119909119879

119895minus 119909119894119909119879

119895minus 119909119895119909119879

119894)

=1

2

119873

sum

119894119895=1

119875119908(119894 119895) (119909

119894minus 119909119895) (119909119894minus 119909119895)119879

(8)

where

119875119908(119894 119895) =

1

119873119888 if 119897

119894= 119897119895= 119888

0 if 119897119894= 119897119895

(9)

Let 119878119905be the total mixed matrix of LDA and then we gain

119878119887= 119878119905minus 119878119908

=1

2

119873

sum

119894119895=1

119875119887(119894 119895) (119909

119894minus 119909119895) (119909119894minus 119909119895)119879

(10)

where

119875119887(119894 119895) =

1

119873minus1

119873119888 if 119897

119894= 119897119895= 119888

1

119873 if 119897

119894= 119897119895

(11)

LFDA is achieved by weighting the pairwise data points

119878119908=1

2

119873

sum

119894119895=1

119908(119894 119895) (119909

119894minus 119909119895) (119909119894minus 119909119895)119879

119878119887=1

2

119873

sum

119894119895=1

119887(119894 119895) (119909

119894minus 119909119895) (119909119894minus 119909119895)119879

(12)

where 119908(119894 119895) and

119887(119894 119895) denote the weight matrix of

different pairwise points for the within-class samples andbetween-class samples respectively

119908(119894 119895) equiv

119882(119894 119895)

119873119888 if 119897

119894= 119897119895= 119888

0 if 119897119894= 119897119895

(13)

119887(119894 119895) equiv

119882(119894 119895) (1

119873minus1

119873119888) if 119897

119894= 119897119895= 119888

1

119873 if 119897

119894= 119897119895

(14)

where119882 indicates the affinity matrixThe construction of119882is critical for the performance of classified accuracy therebythe investigation of construction is in great need to be furtherelaborated in the following section

3 Proposed Scheme

The calculation of (13) and (14) is very important to theperformance of LFDA There are many methods to computethe affinitymatrix119882The simplest one is that119882 is equivalentto a constant that is

119882(119894 119895) equiv 119886 (15)

where 119886 in the above equation is a real nonnegative numberHowever the equations of (13) and (14) are derived to thestate-of-the-art Fisherrsquos linear discriminant analysis underthis construction

Another construction adopts the heat kernel derivedfrom LPP

119882(119894 119895) = exp(minus10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

1205902) (16)

where 120590 is a tuning parameter Yet the affinity is valued by thedistance of data points and the computation is too simple torepresent the locality of data patches Amore adaptive version[26] of (16) is proposed as follows

119882(119894 119895) =

exp(minus10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

1205902) if 119909

119894isin KNN (119909

119895 119870)

or 119909119895isin KNN (119909

119894 119870)

0 otherwise(17)

Compared with the former computation (17) is in con-junction with 119870-nearest data points which is computation-ally fast and light Moreover the property of local patches canbe characterized by (17) However the affinity defined in (16)and (17) is globally computed thus it may be apt to overfitthe training points and be sensitive to noise Furthermorethe density of HSI data pointsmay vary according to differentpatches Hence a local scaling technique is proposed inLFDA to cope with this issue [29] where the sophisticatedcomputation is given by

119882(119894 119895) =

exp(minus10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

) if 119909119894isin KNN (119909

119895 119870)

or 119909119895isin KNN (119909

119894 119870)

0 otherwise(18)

where 120588119894denotes the local scaling around the corresponding

sample 119909119894with the following definition

120588119894=10038171003817100381710038171003817119909119894minus 119909119870

119894

10038171003817100381710038171003817 (19)

where 119909119870119894represents the 119870th nearest neighbor of 119909

119894 lowast 2

denotes the square Euclidean distance and119870 is a self-tuningpredefined parameter

To simplify the calculation many researches considered afixed value of119870 and a recommended value of119870 = 7 is studiedin [1 28] Note that 120588

119894is used to represent the distribution

of local data around sample 119909119894 However the above work

ignored the distribution around each individual sample Thediversity of adjacent HSI pixels is approximate thus thespectrum of the neighboring landmarks has great similarityThat is the pixels of HSI data which have resembling spec-trums tend to be of the same landmark This phenomenonindicates that the adjacency of local patches not only lies in

Mathematical Problems in Engineering 5

the spectrum space but also in the spatial space For a localpoint the calculation of making use of the diversity of its119870thnearest neighborhoods is not fully correct

An evident example is illustrated in Figure 1 where twogroups of points have different distributions In group (a)most neighbor points are closed to point 119909

0 while in group

(b) most neighbor points are far from point 1199090 However the

measurement of two cases are the same according to (19)Thiscan be found in Figure 1 where the distances between point1199090and its 119870th nearest neighborhoods (119870 = 7) are same in

both distributions which can be shown in Figures 1(a) and1(b)119871

1= 1198712This example indicates that the simplification of

local distribution by the distance between the sample 119909119894and

the 119870th nearest neighbor sample is unreasonable Actuallythe result by using of this simplification may raise someerrors

Based on the discussion above a novel approach whichis called PD-LFDA is proposed to overcome the weakness ofLFDA To be specific PD-LFDA incorporates two key pointsnamely

(1) The class prior probability is applied to compute theaffinity matrix

(2) The distribution of local patch is represented by theldquolocal variancerdquo instead of the ldquofarthest distancerdquo toconstruct the weight matrix

The proposed approach essentially increases the discriminantability of transformed features in low dimensional space Thepattern found by PD-LFDA is expected to be more accurateand coincids with the character of HSI data and is conduciveto classify HSI data

In this way a more sophisticated construction of affinitymatrix which is derived from [29] is proposed as follows

119882(119894 119895)

=

119901 (119897119894)2 exp(minus

10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

)

sdot(1 + exp(minus10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

)) if 119897119894= 119897119895= 119888

0 if 119897119894= 119897119895

(20)

where 119901(119897119894) stands for the class prior probability of class 119909

119894

and 120588119894indicates the local variance Note that the denominator

item of (13) is 1119873119888 which will cancel out our prior effectif we use 119901(119897

119894) to replace 119901(119897

119894)2 (the construction of 119901(119897

119894)

will be given in (21)) Different part of this derivation playsthe same role as the original formulation for example forthe last item on one hand it plays the role of intraclassdiscriminating weight and on the other hand the productresult of119882 may reach zero if the Euclidean square distance sdot is very small for some data points For this case an extraitem (1 + exp(minus119909

119894minus 1199091198952120588119894120588119895)) is added to the construc-

tion of intraclass discriminating weight to prevent accuracytruncation By doing so our derivation can be viewed as an

integration of class prior probability the local weight andthe discriminating weight This construction is expected topreserve both the local neighborhood structure and the classinformation Besides this construction is expected to sharethe same advantages detailed in the original work

It is clear that (20) consists of two new factors comparedwith LFDA method (1) class prior probability 119901(119897

119894) and (2)

local variance 120588119894

Suppose class 119909119894to be class 119888 that is 119897

119894= 119888 so that the

probability of class 119909119894can be calculated by

119901 (119897119894) = 119901 (119888) =

119873119888

119873 (21)

where 119873119888 is the number of the samples in class 119888 whole 119873denotes the total number of samples and119873 = sum119862

119888=1119873119888

Please note that the item (1+exp(minus119909119894minus1199091198952120588119894120588119895)) in (20)

is used to prevent the extra rounding error produced from thefirst two items and to keep the total value of

119901 (119897119894)2 exp(minus

10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

)(1 + exp(minus10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

))

(22)

which does not reach the minimum Here 120588lowastdenotes the

local scaling around 119909lowast In this paper a local scaling 119909

lowastis

measured by the standard deviation of local square distanceAssume that 119909(119894)

1 119909(119894)

2 119909

(119894)

119870are the119870-nearest samples of 119909

119894

and then the square distance between 119909119894and 119909(119894)

119896is given by

119889(119894)

119896=10038171003817100381710038171003817119909119894minus 119909(119894)

119896

10038171003817100381710038171003817

2

119896 = 1 2 119870 (23)

The corresponding mean (119894) can be defined as

(119894)=1

119870

119870

sum

119896=1

119889(119894)

119896

=1

119870

119870

sum

119896=1

10038171003817100381710038171003817119909119894minus 119909(119894)

119896

10038171003817100381710038171003817

2

(24)

where lowast 2 represents a square Euclidean distance and 119870 isa predefined parameter whose recommended value is 119870 = 7The standard deviation can be calculated as

120588119894= radic

1

119870

119870

sum

119896=1

(119889(119894)

119896minus (119894))

2

(25)

Note that in the above equation the item 1119870 becomesa constant that can be shifted outside Thus an equivalentformula is given by

120588119894=1

119870radic

119870

sum

119896=1

(119889(119894)

119896minus (119894))

2

(26)

Similar procedure can be deduced to 119909119895 Hence we have

120588119898=1

119870

119870

sum

119894=1

radic100381710038171003817100381710038171003817119909(119898119896)

119894

100381710038171003817100381710038171003817

2

minus1

119870

119870

sum

119895=1

100381710038171003817100381710038171003817119909(119898119896)

119895

100381710038171003817100381710038171003817

2

(27)

6 Mathematical Problems in Engineering

x1

x2

x3

x4

x5

x6

x7

L1

x0

(a)

x1

x2

x3

x4

x5

x6

x7

L2

x0

(b)

Figure 1 Different distributions of 1199090and the corresponding 119870-nearest neighborhoods (119870 = 7) (a) Most neighbors are closed to point 119909

0

(b) Most neighbors are far from point 1199090 The distances between point 119909

0and its 119870th nearest neighbors are the same in both distributions

1198711= 1198712

Comparing (19) with (27) it is noticeable that (28) holds

120588119898le 120588119894 (28)

Compared with the former definitions our definition has atleast the following advantages

(i) By incorporating the prior probability of each classwith local technique 119901(119897

119894) the proposed scheme is

expect to be a benefit for the classified accuracy(ii) The representation of local patches equation (26) is

described by local standard deviation 120588119894rather than

absolute diversity in (19) which is more accurate inmeasuring the local variance of data samples

(iii) Compared with the global calculation the proposedcalculation is taken on local patches which is efficientin getting rid of over-fitting

(iv) The proposed local scaling technique meets the char-acter of HSI data which is more applicable for theprocessing of hyperspectral image in real applications

Based on the above affinity defined an extended affinitymatrix can also be defined in a similar way Our definitiononly provides a heuristic exploration for reference Theaffinity can be further sparse for example by introducing theidea of 120576-nearest neighborhoods [31]

Theoptimal solution of improved scheme can be achievedby maximize the following criterion

119879PD-LFDA equiv arg max119879isinR119901times119902

tr (119879119879119878119887119879)

tr (119879119879119878119908119879)

(29)

It is evident that (29) has the similar form of (3) Thisfinding enlightens us that the transformation119879 can be simplyachieved by solving the generalized eigenvalue decomposi-tion of 119878minus1

119908119878119887 Moreover Let 119866 isin R119902times119902 be a 119902-dimensional

invertible square matrix It is clear that 119879PD-LFDA119866 is alsoan optimal solution of (29) This property indicates that

the optimal solution is not uniquely determined becauseof arbitrary arithmetic transformation of 119879PD-LFDA119866 Let 120593119894be the eigenvector of 119878minus1

119908119878119887corresponding to eigenvalue

119894

that is 119878119887120593119894= 119894119878119908120593119894 To cope with this issue a rescaling

procedure is adopted [25] Each eigenvector 120593119894119902

119894=1is rescaled

to satisfy the following constraint

120593119894119878119908120593119895= 1 if 119894 = 1198950 if 119894 = 119895

(30)

Then each eigenvector is weighted by the square root of itsassociated eigenvalue The transformed matrix 119879PD-LFDA ofthe proposed scheme is finally given by

119879PD-LFDA = radic11205931 radic21205932 radic1199021120593119902 isin R119901times119902 (31)

with descending order 1ge 2ge sdot sdot sdot ge

119902

For a new testing points 119909 the projected point in the newfeature space can be captured by 119910 = 119879119879PFDA119909 thus it can befurther analyzed in the transformed space

According to the above analysis we can design an algo-rithm which is called PD-LFDA Algorithm to perform ourproposed method The detailed description of this algorithmcan be found in the appendix (Algorithm 2) A summary ofthe calculation steps of PD-LFDA Algorithm is presented inAlgorithm 1

The advantage of PD-LFDA is discussed as followsFirstly to investigate the rank of the between-class scatter

matrix 119878119887of LDA 119878

119887can be rewritten as

119878119887=

119862

sum

119897=1

119873119897(119898119897minus 119898) (119898

119897minus 119898)119879

= [1198731(119898119897minus 119898) 119873

2(119898119897minus 119898) 119873

119871(119898119897minus 119898)]

sdot [1198731(119898119897minus 119898) 119873

2(119898119897minus 119898) 119873

119871(119898119897minus 119898)]

119879

(32)

Mathematical Problems in Engineering 7

Input HSI training samples119883 isin R119901times119873 dimensionality to be embedded 119902 the parameter 119870 of 119870NNand a test sample 119909

119905isin R119901

Step 1 For each sample 119909119894from the same class calculate the119882

119894119895by (20)

where the local scaling factor 120588119894is calculated via (26) or (27)

Step 2 Equations (13) and (14) can be globally and uniformly transformed into an equivalent formula via119908= 119882 sdot 119882

1

119887= 119882 sdot (119882

2minus1198821)

(i)

where the operator 119860 sdot 119861 denotes the dot product between 119860 and 119861 and

119882(119894 119895) =

119901(119897119894)2 exp(minus

10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

)(1 + exp(minus10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

))

if 119897119894= 119897119895= 119888

if 119897119894= 119897119895

0

(iia)

1198821=

1

119873119888 if 119897

119894= 119897119895= 119888

0 others(iib)

1198822=1

119873(1119873times1

11times119873) (iic)

By the above formulas the product of elements in different matrices can be achieved via dot productbetween matrices The equations (iia) (iib) and (iic) can be gained by integrating the number ofeach class119873119888 the number of total training samples119873 and the local scaling 120588

119894 then matrices119882119882

11198822

can be calculatedStep 3 Construct within-scatter matrix

119908and between-scatter matrix

119887 according to (i)

Step 4 Define Laplacian matrices 119871lowast below119871lowast= 119863lowastminus lowast (iii)

where119863lowast is the row sum or column sum of119882lowast119863lowast119894119894= sum119895lowast(119894 119895) (or119863lowast

119894119894= sum119894lowast(119894 119895)) and the

notation lowast denotes one letter in 119908 119887Step 5 On the basis of (29) and (30) the transformation matrix can be achieved viaeigenvectors 119879 = radic

11205931 radic21205932 radic

1199021120593119902 119879 isin R119901times119902 that corresponding the 119902 leading

eigenvalues 119887120593119894= 119894119908120593119894in solving the general problem of

119887120593119894= 119894119908120593119894

Step 6 For a testing sample 119909119905isin R119901 the extracted feature is 119911

119905= 119879119879119909119905isin R119902

Output Transformation matrix 119879 and the extracted feature 119911119905

Algorithm 1 PD-LFDA Algorithm

Thereby

rank (119878119887) ⩽ rank ([119873

1(119898119897minus 119898) 119873

2(119898119897minus 119898)

119873119871(119898119897minus 119898)]) ⩽ 119862 minus 1

(33)

It is easy to infer that the rank of the between-class scattermatrix 119878

119887is119862minus1 atmost thus there are up to119862minus1meaningful

subfeatures that can be extracted Thanks to the help ofaffinity matrix 119882 when compared with the conventionalLDA the reduced subspace of proposed PD-LFDA can beany subdimensional space On the other hand the classicallocal fisherrsquos linear discriminant only weights the value ofsample pairs in the same classes while our method also takesin account the sample pairs in different classes Hereafter theproposedmethod will bemore flexible and the results will bemore adaptiveThe objective function of proposed method isquite similar to the conventional LDA hereby the optimalsolution is almost same as the conventional LDA whichindicates that it is also simple to implement and easy to revise

To further explore the relationship of LDA and PD-LFDAwe now rewrite the objective function of LDAandPD-LFDA respectively

119879LDA = arg max119879isinR119901times119902

trace 119879119879119878119887119879

subject to 119879119879119878119908119879 = 119868

(34)

119879PD-LFDA equiv arg max119879isinR119901times119902

trace 119879119879119878119887119879

subject to 119879119879119878119908119879 = 119868

(35)

This implies that LDA tries to maximize the between-class scatter and simultaneously constraint the within-classscatter to a certain level However such restriction is hardto constraint and no relaxation is imposed When the data isnot a single modal that is multimodal or unknown modalLDA often fails On the other hand benefiting from theflexible designing of affinity matrix119882 PD-LFDA gains morefreedom in (35) That is the separability of PD-LFDA will bemore distinct and the degree of freedom remains more than

8 Mathematical Problems in Engineering

Input HSI data samples119883 = 1199091 1199092 119909

119873 isin R119901times119873 the objective dimension to be embedded 119902

the nearest neighbor parameter 119870 (default 119870 equiv 7) and the test sample 119909119905isin R119901

Output Transformation matrix 119879 isin R119901times119902Steps are as follows

(1) Initialize matrices(2) 119878119908larr 0119901times119901

within-class scatter(3) 119878119887larr 0119901times119901

between-class scatter(4)(5) Compute within-class affinity matrix119882119908(6) for 119888 = 1 2 119862 do in a classwise manner(7) 119909

119888

119894119873119888

119894=1larr 119909

119895| 119897119895= 119888 the 119888th class data samples

(8) 119883 larr 119909119888

1 119909119888

2 119909

119888

119873119888 sample matrix

(9) 119882119894=(1119873119888times111times119873119888)

119873119888

(10)(11) Determine the local scaling(12) for 119894 = 1 2 119873119888 do(13) 119909

(119894)

119896larr the 119896th nearest neighbor of 119909119888

119894 119896 = 1 2 119870

(14) for 119896 = 1 2 119870 do(15) 119889

(119894)

119896= 119909119894minus 119909(119894)

1198962

(16) end for

(17) (119894)larr1

119870

119870

sum

119896=1

10038171003817100381710038171003817119909119894minus 119909(119894)

119896

10038171003817100381710038171003817

2

(18) 120588(119894)= radic

1

119870

119870

sum

119896=1

(119889(119894)

119896minus (119894))2

(19) end for(20)(21) Define local affinity matrix(22) for 119894 119895 = 1 2 119873119888 do(23) 119901(119897

119894) larr 119873

119888119873 prior probability

(24) 119860119894119895larr 119901(119897

119894) exp(minus

10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

)(1 + exp(minus10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

))

(25) end for(26) 119860

119888= 119860

(27) end for(28)119882

119908= diag119882

11198822 119882

119862 in block diagonal manner

(29) 119860119908= diag119860

1 1198602 119860

119862 also in block diagonal manner

(30) for 119894 119895 = 1 2 119873 do(31)

119908(119894 119895) larr 119860

119908(119894 119895)119882

119908(119894 119895)

(32) end for(33)(34) Compute between-class affinity matrix119882119887

(35)119882119887larr(1119873times1

11times119873)

119873minus diag119882

11198822 119882

119862

(36) Let 119865nz denotes the nonzero flag of elements in119882119908 119865nz(119894 119895) = 1 if119882119908(119894 119895) = 0 119865nz(119894 119895) = 0 if119882119908(119894 119895) = 0

(37) 119865nz larr (1198821015840

119887= 0)

(38) 119860119887larr 0119873times119873

(39) 119860

119887(119865nz) = 119860119908(119865nz)

(40) 119860119887(not119865nz) = 1

(41) for 119894 119895 = 1 2 119873 do(42)

119887(119894 119895) larr 119860

119887(119894 119895)119882

119887(119894 119895)

(43) end for(44) Now construct Laplacian matrix for within affinity matrix

119908and between affinity matrix

119887

(45) Let(46)119863119908

119894119894= sum

119895

119908(119894 119895)119863119887

119894119894= sum

119895

119887(119894 119895)

(47) Then

Algorithm 2 Continued

Mathematical Problems in Engineering 9

(48) 119871119908= 119908minus 119863119908 119871119887= 119887minus 119863119887

(49) Construct two matrixs below(50) 119878

119887= 119883119871

119887119883119879 119878119908= 119883119871

119908119883119879

(51) Let 1205931 1205932 120593

119902 be the general eigenvector of

(52) 119878119887120593119894= 119894119878119908120593119894 forall119894 isin 1 2 119902

(53) with the corresponding eigenvalue in descending order 1ge 2ge sdot sdot sdot ge

119902

(54)(55) Finally the transformation matrix can be represented as(56) 119879 = radic120582

11205931 radic12058211205932 radic120582

1120593119902 isin R119901times119902

(57)(58) For a new test sample 119909

119905 the embedding 119911

119905is given by

(59) 119911119905= 119879119879119909119905isin R119902

Algorithm 2 Proposed PD-LFDA method

the conventional LDA thus our method is expected to bemore robust and significantly preponderant

For large scale data sets we discuss a scheme that canaccelerate the computation procedure of the within-scattermatrix 119878

119908 In our algorithm owning to the fact that we have

put penalty on the affinity matrix for different class samplesin constructing the between-scatter matrix the acceleratedprocedure will remain for further discussion

The within-class scatter 119878119908can be reformulated as

119878119908=1

2

119873

sum

119894119895=1

119908(119894 119895) (119909

119894minus 119909119895) (119909119894minus 119909119895)119879

=1

2

119873

sum

119894119895=1

119908(119894 119895) (119909

119894119909119879

119894+ 119909119895119909119879

119895minus 119909119894119909119879

119895minus 119909119895119909119879

119894)

=

119873

sum

119894=1

(

119873

sum

119895=1

119908(119894 119895))119909

119894119909119879

119894minus

119873

sum

119894119895=1

119908(119894 119895) 119909

119894119909119879

119895

= 119883(119863119908minus 119908)119883119879

= 119883119908119883119879

(36)

Here

119863119908 (119894 119894) =

119873

sum

119895=1

119908(119894 119895)

119908= 119863119908minus 119908

(37)

119908can be block diagonal if all samples 119909

119894119873

119894=1are sorted

according to their labels This property implies that 119863119908and

119908can also be block diagonal matrix Hence if we compute

119878119908

through (36) then the procedure will be much moreefficient Similarly 119878

119887can also be formulated as

119878119887= 119883

119887119883119879

= 119883(119863119887minus 119887)119883119879

(38)

Nevertheless 119887is dense and can not be further sim-

plified However the simplified computational procedure of119908

saves for us part of time in a way In this paper weadopt the above procedure to accelerate 119878

119908and pursue 119878

119887

normally In addition to locality structure some papers showthat another property for example marginal information isalso important and should be preserved in the reduced spaceThe theory of extended LDA and LPP algorithm is developedrapidly recently Yan et al [33] summarized these algorithmsin a graph embedding framework and also proposed amarginal fisher analysis embedding (MFA) algorithm underthis framework

In MFA the criterion is characterized by intraclasscompactness and interclass marginal superability which isreplaced for thewithin-class scatter and between-class scatterseverally The intraclass relationship is reflected by an intrin-sic graph which is constructed by 119870-nearest neighborhoodsample data points in the same class while the interclasssuperability is mirrored by a penalty graph computed formarginal points from different classes Following this ideathe intraclass compactness is given as follows

119878119894= sum

119894119895 119894isin119873(119896)(119895)or 119895isin119873(119896)(119894)

10038171003817100381710038171003817119879119879119909119894minus 119879119879119909119895

10038171003817100381710038171003817

2

= 2119879119879119883 (119863 minus119882)119883

119879119879

(39)

where

119882(119894 119895) = 1 if 119894 isin 119873(119896) (119895) or 119895 isin 119873(119896) (119894) 0 otherwise

(40)

Here 119873(119896)(119895) represents the 119870-nearest neighborhood indexset of 119909

119895from the same class and119863 is the row sum (or column

sum) of 119882 119863(119894 119894) = sum119894119882119894119895 Interclass separability is indi-

cated by a penalty graph whose term is expressed as follows

119878119890= sum

119894119895 (119894119895)isin119875(119896)(119897119895)

or (119894119895)isin119875(119896)(119897119894)

10038171003817100381710038171003817119879119879119909119894minus 119879119879119909119895

10038171003817100381710038171003817

2

= 2119879119879119883(119863 minus )119883

119879119879

(41)

10 Mathematical Problems in Engineering

of which

(119894 119895) = 1 if (119894 119895) isin 119875(119896) (119897

119895) or (119894 119895) isin 119875(119896) (119897

119894)

0 otherwise(42)

Note that 119878119894and 119878

119890are corresponding to ldquowithin-scatter

matrixrdquo and ldquobetween-scatter matrixrdquo of the traditional LDAalternatively The optimal solution of MFA can be achievedby solving the following minimization problem That is

= arg119879

min 119879119879119883(119863 minus119882)119883

119879119879

119879119879119883(119863 minus )119883119879119879

(43)

We know that (43) is also a general eigenvalue decompositionproblem Let 119879PCA indicate the transformation matrix fromthe original space to PCA subspace with certain energyremaining and then the final projection of MFA is output as

119879MFA = 119879PCA (44)

As can be seenMFA constructs twoweightedmatrices119882and according to the intraclass compactness and interclassseparability In LFDA and PD-LFDA only one affinity isconstructed The difference lies in that the ldquoweightrdquo in LFDAand PD-LFDA is in the range of [0 1] according to the levelof difference Yet MFA distributes the same weight to its 119870-nearest neighborhoodsThe optimal solution ofMFA LFDAand PD-LFDA can be attributed to a general eigenvaluedecomposition problem Hence the idea of MFA LFDA andPD-LFDA is approximately similar in a certain interpretationRelationship with other methodologies can be analyzed in ananalogous way

4 Experimental Results

To illustrate the performance of PD-LFDA experimentson a real hyperspectral remote sensing image data setmdashAVIRIS Indian Pine 1992 are conducted in this sectionThe AVIRIS Indian Pines 1992 data set was gathered byNational Aeronautics and Space Administrator (NASA) withAirborne VisibleInfrared Imaging Spectrometer (AVIRIS)sensor over the Indian Pine test site in northwest Indians inJune 1992 This data set consists of 145 times 145 pixels and 224spectral reflectance bands ranging from 04 120583m to 245 120583mwith a spatial resolution of 20m The Indian Pines scene iscomposed of two-third agriculture and one-third forest orother natural perennial vegetation Some other landmarkssuch as dual lane highways rail line low density housing andsmaller roads are also in this image Since the scene was takenin June some main crops for example soybeans and cornare in their early growth stage with less than 5 coveragewhile the no-till min-till and clean-till indicate the amountof previous crop residue remaining The region map can bereferred to Figure 5(a) The 20 water absorption bands (ie[108ndash112 154ndash167] 224) were discarded

In this section performance of different dimensionreduction methods that is PCA LPP LFDA LDA JGLDAand RP [34] is compared with the proposed PD-LFDA

Table 1 Training set in AVIRIS Indian Pines 1992 database

ID Class Name1 Alfalfa (4618 3913)2 Corn-notill (1428136 952)3 Corn-mintill (83087 1048)4 Corn (23734 1434)5 Grass-pasture (48354 1118)6 Grass-trees (73071 973)7 Grass-pasture-mowed (2817 6071)8 Hay-windrowed (47850 1046)9 Oats (2015 7500)10 Soybean-notill (97286 884)11 Soybean-mintill (2455214 872)12 Soybean-clean (59354 911)13 Wheat (20528 1366)14 Woods (1265102 806)15 Buildings-Grass-Trees-Drives (38639 1010)16 Stone-Steel-Towers (9324 2581)Total 102491029 1004Numerical value in this table refers to number of samples number oftraining samples and pc respectively

Classification accuracy is reported via the concrete classifiersGenerally many dimension reduction research papers adopt119870-nearest neighborhood classifier (KNN) and support vectormachine (SVM) classifier to measure the performance of theextracting features after the dimension reduction where theoverall accuracy and kappa coefficient are detailed in thereports Hereby in this paper we also adopt KNN classifierand SVM classifier for performance measurement In KNNclassifier we select the value of 119870 as 1 5 and 9 so thatthree classifiers based on nearest neighborhoods are formedwhich are called 1NN 5NN and 9NN In SVM classifier weseek a hyperplane to separate classes in kernel-induced spacewhere the linear nonseparable classes in the original featurespace can be separated via kernel trick SVM as a robustand successful classifier has been widely used to evaluatethe performance of multifarious methods in many areas Forsimplicity and convenience we use LIBSVM package [35] forexperiments Accuracy of dimension reduced performancewill be reported by classified performance from SVM clas-sifier In the following schedule the feature subspace will becalculated at the first step from training samples by differentdimensional algorithms Table 1 gives a numerical statisticsof training samples corresponding to each class Then thenew sample will be projected into a low subspace by thetransformedmatrix Finally all the new samples are classifiedby SVM classifier

In this experiment a total of 1029 samples were selectedfor training and the remaining samples are used for testingNote that all the labeled samples in database are unbalancedand the available samples of each category differ dramaticallyThe following strategy is imposed for sample division A fixednumber of 15 samples are randomly selected to form thetraining sample yet the absent samples are randomly selected

Mathematical Problems in Engineering 11

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

(a) 1NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

(b) 5NN

7 9 11 13 15Reduced space

0

01

02

03

04

05

06

07

08

09

1

Ove

rall

accu

racy

(c) 9NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

(d) Liner SVM

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

PCALPPLFDA

LDAPD-LFDAJGLDA

RP

(e) Polynomial SVM

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

PCALPPLFDA

LDAPD-LFDAJGLDA

RP

(f) RBF-SVM

Figure 2 Overall accuracy by different dimension reduction methods and different classifiers applied to AVIRIS Indian Pines database

12 Mathematical Problems in Engineering

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

(a) 1NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

(b) 5NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

(c) 9NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

(d) Linear SVM

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

PCALPPLFDA

LDAPD-LFDAJGLDA

RP

(e) Polynomial SVM

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

PCALPPLFDA

LDAPD-LFDAJGLDA

RP

(f) RBF SVM

Figure 3 Kappa coefficient by different dimension reduction methods and different classifiers applied to AVIRIS Indian Pines database

Mathematical Problems in Engineering 13

(a) Pseudo 3-channel color image band = [12 79 140]

1

2

2

2

2

2 2

3

3

3

3

3

4

5

5

5

5

6

6 6 6

7

8

9

10

10

10 10

11

11

11

11

11 12

12 12

13 14

14

14 15

15

16

(b) Ground truth

(c) Distribution of training sample

Figure 4 Illustration of sample partition

from the remaining samples Under this strategy the trainingsamples and testing samples are listed in Table 1

Figure 2 shows the overall accuracy of different dimen-sion reduction methods applied to AVIRIS Indian 92AV3Cdata set The neighborhood of KNN classifier is selected as 15 and 9 respectively which produces three classifiers thatis 1NN 5NN and 9NN Three different kernel functionsare adopted for SVM classifier Then derived classifiers arealso used in this experiment that is linear SVM polynomialSVM and RBFSVM It can be deduced from Figures 2(a)sim2(c) that when the embedding space is greater than 5 pro-posed PD-LFDA performs the best while JGLDA performsthe worst The results produced by RP is slightly better thanJGLDA PCA LDA LPP and RP show the similar classifiedresults under KNN classifierThat is the proposed PD-LFDAoutperforms the others Meantime compared with LFDAthe proposed PD-LFDA leads to 2 more improvementson average Moreover it can be observed from (d) that theclassified accuracy increases steadily as the embedded spaceincreases However LDA demonstrates the highest overallaccuracywhen the reduced features vary to 9 while LFDAhas

the significant improvements when the number of reducedfeatures is greater than 9 This phenomenon of Figure 2(d)indicates the instability of linear SVM Nevertheless thesituation reversed for polynomial SVM and RBF SVM inFigures 2(e) and 2(f) wherein the proposed PD-LFDAwins a little improvement against LFDA and has significantimprovement compared with the others Inspired effects ofproposed PD-LFDA algorithm were achieved in all casesFurthermore Table 2 gives the detailed overall accuracyunder different feature dimensions using 3NN 7NN andRBFSVM classifiers which validates the feasibility of theproposed scheme in this paper

Figure 3 displays the results of kappa coefficient obtainedusing the different dimension reduction algorithms underKNN classifier and SVM classifiers The experimental cir-cumstance of Figure 3 is same as that of Figure 2 We canfind that from these results JGLDA performs the worst inmost cases except in Figure 3(e) The proposed PD-LFDAmethod outperforms the other methods and achieves thehighest kappa numerical value in most cases except usingthe linear SVM as the classifier In fact none of them work

14 Mathematical Problems in Engineering

686864717360

(a) PCA-7NN

687564707607

(b) LPP-7NN

792376538609

(c) LFDA-7NN

739070418130

(d) LDA-7NN

619556516209

(e) JGLDA-7NN

670962947164

(f) RP-7NN

799277288570

(g) PCA-RBFSVM

767473558288

(h) LPP-RBFSVM

837581518822

(i) LFDA-RBFSVM

728769288071

(j) LDA-RBFSVM

582453777121

(k) JGLDA-RBFSVM

763173278365

(l) RP-RBFSVM

Figure 5 Classified map generated by different dimension reduction methods where the overall accuracy kappa coefficient and averageclassification accuracy are listed at the top of each map respectively

steady in the case of linear SVM (Figure 3(d)) Note that thesituation is improved in polynomial SVM where the kappanumerical value of the proposed PD-LFDA is significantlybetter than the others All the achievements demonstrate therobustness of our contribution in PD-LFDA Simultaneouslyit is noticeable that LPP exhibits an average kappa levelThe kappa value gained by LPP is not seriously bad and isnot dramatically good The kappa results produced by RPare approximately the same as LPP A significant advantageof RP is the simple construction and computation wherethe accuracy is closed to LPP More details are summarizedin Table 3 It can be concluded that the kappa coefficientof proposed algorithm is higher than the other approacheswhich is more appropriate for the classification of HSI data

The visual results of all methods are presented in Figures5sim6 where the class labels are converted to pseudocolor

image The pseudocolor image of the hyperspectral imagefrom Indian 92AV3C database is shown in Figure 4(a) Theavailable labeled image which represents the truth groundis illustrated in Figure 4(b) where the labels are made byhuman The training samples are selected from the labeledimage represented as points in the image as shown inFigure 4(c) Each label number (ID) corresponds to eachclass name which is indexed in Table 1 In this experimentall the available labeled samples are used for testing whileapproximate 10 of samples are used for training Thesubspace is fixed to 13 (the number here is only used forreference it can be changed) For each experiment thedimension from original feature space is reduced to theobjective dimensionality thereafter the classified maps areinduced by 7NN classifier and RBF-SVM classifier Theoverall accuracy kappa coefficient and average accuracy are

Mathematical Problems in Engineering 15

837981698991

(a) PD-LFDA-7NN

848682798968

(b) D-LFDA-RBFSVM

Figure 6 Classified map of the proposed method

Table 2 Overall accuracy obtained using different dimensions offeature and different classifiers for Indian Pine Scene database ()

Item Dims7 9 11 13 15

3NNPCA 7007 6996 7026 7035 7087LPP 6575 6871 6835 6695 6546LFDA 7513 7895 7963 7992 8009LDA 6415 6606 6616 6596 6536PD-LFDA 7713 8094 816 8218 8263JGLDA 5405 5602 5704 5642 5751RP 6173 6444 6745 6632 6652

7NNPCA 6902 6902 691 6933 6972LPP 6782 7057 7027 6892 6681LFDA 7419 7777 7827 7863 7862LDA 6615 6909 6899 692 6882PD-LFDA 7736 809 8155 8172 8235JGLDA 5665 5811 5809 5812 5914RP 6194 6481 6696 6613 6603

RBF-SVMPCA 8051 7794 7718 7935 8176LPP 7101 7514 7614 7501 7314LFDA 7826 8187 82 8113 7915LDA 677 6834 6877 6642 6792PD-LFDA 7858 8267 8395 8423 8166JGLDA 5686 5606 5663 5689 5879RP 6957 7282 7441 7416 7543

also included at the top of each map respectively Figure 5displays the classifiedmaps of classicmethods in pseudocolorimages and the classified maps of proposed PD-LFDA arepresented in Figure 6 It can be observed from these mapsthat the best performance is achieved by PD-LFDA when7NN classifier is used in this case the overall accuracyis 8379 the kappa coefficient is 8169 and the averageaccuracy is 8991 Moreover the worst algorithm is JGLDAwhose overall accuracy is 6195 the kappa coefficient

Table 3 Kappa coefficient by different dimension reduction meth-ods and different classifiers applied to Indian Pine Scene database()

Item Dims7 9 11 13 15

3NNPCA 6592 658 6612 6624 6681LPP 6084 6422 6382 6219 605LFDA 7145 7586 7664 7697 7718LDA 5913 6126 6137 6118 6048PD-LFDA 7392 7824 7901 7966 8016JGLDA 4778 4992 5082 5006 5119RP 564 5963 6303 6172 6192

7NNPCA 6466 6466 6475 6501 6545LPP 6304 6629 6585 6429 6185LFDA 7047 7457 7515 7557 7557LDA 6132 646 6453 6478 6431PD-LFDA 7417 782 7893 7913 7985JGLDA 5029 519 5161 5159 5263RP 5638 5992 6237 6135 6124

RBF-SVMPCA 7767 747 7393 7636 7900LPP 6677 715 7264 7115 6921LFDA 7486 7915 7924 7822 7584LDA 629 6362 6427 6153 6323PD-LFDA 7551 8011 8159 8187 7899JGLDA 5109 5066 5133 5147 5369RP 653 6908 7085 706 7203

becomes 5651 and the average accuracy is only 6209Other methods such as PCA LPP and RP can produce thecomparable results and no one can outperform the otherHowever LDAoutperformsPCA LPP andRP yielding betterresult Similar conclusions can be achieved in the groupof RBF-SVM Generally proposed PD-LFDA significantlyoutperforms the rest in this experiment which indicates thecorrectness of improvements in proposed PD-LFDA

16 Mathematical Problems in Engineering

Table 4 Performance of dimension reduction on the whole labeled samples ()

Evaluated item MethodologiesPCA LDA LPP LFDA JGLDA RP PD-LFDA

7NNOverall accuracy 6868 739 6875 7923 6195 6709 8379Kappa coefficient 6471 7041 647 7653 5651 6294 8169Average accuracy 737 813 7607 8609 6209 7164 8991

RBF-SVMOverall accuracy 7992 7287 7674 8375 5824 7631 8486Kappa coefficient 7728 6928 7355 8151 5377 7327 8279Average accuracy 857 8071 8288 8822 7121 8365 8968

Finally details of assessment deduced by 7NN and RBF-SVM is summarized in Table 4 where the correspondingoverall accuracy kappa coefficient and average accuracy ofdifferent methods can be located collectively

5 Conclusions

In this paper we have analyzed local Fisher discriminantanalysis (LFDA) and found its weakness By replacing themaximum distance with local variance for the constructionof weight matrix introducing class prior probability into thecomputation of affinitymatrix an improved LFDA algorithmhas been proposed This novel approach is called PD-LFDAbecause the probability distribution (PD) is applied in LFDAalgorithm The proposed approach essentially can increasethe discriminant ability of transformed features in lowdimensional space The pattern found by the new approachis expected to be more accurate and coincides with thecharacter of HSI data and is conducive to classify HSI dataPD-LFDA has been evaluated on a real removing sensingAVIRIS Indian Pine 92AV3C data set We have compared theperformance of the proposed PD-LFDA with that of PCALPP LFDA LDA JGLDA and RP Both the numerical resultsand visual inspection of class maps have been obtained Inthe experiments KNN classifier and SVM classifier havebeen used We have argued that the proposed PD-LFDAexhibits the best performance and serves as a very effectivedimensionality reduction tool for high dimensional datasuch as hyperspectral image (HSI) data

Appendix

Procedure of Proposed Algorithm

The brief description of the algorithm to perform the pro-posed PD-LFDA method is already presented in Section 3The details of the algorithm are provided in Algorithm 2

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work was supported by the Research of University ofMacau under Grants no MYRG205(Y1-L4)-FST11-TYY noMYRG187(Y1-L3)-FST11-TYY and no SRG010-FST11-TYYand by the National Natural Science Foundation of Chinaunder Grant no 61273244 This research project was alsosupported by the Science andTechnologyDevelopment Fund(FDCT) of Macau under Contract no 100-2012-A3

References

[1] W Li S Prasad Z Ye J E Fowler and M Cui ldquoLocality-preserving discriminant analysis for hyperspectral image clas-sification using local spatial informationrdquo in Proceedings ofthe 32nd IEEE International Geoscience and Remote SensingSymposium (IGARSS rsquo12) pp 4134ndash4137MunichGermany July2012

[2] HND LeM S Kim andD-H Kim ldquoComparison of singularvalue decomposition and principal component analysis appliedto hyperspectral imaging of biofilmrdquo in Proceedings of the IEEEPhotonics Conference (IPC rsquo12) pp 6ndash7 2012

[3] C K Chui and J Wang ldquoRandomized anisotropic transformfor nonlinear dimensionality reductionrdquo GEMmdashInternationalJournal on Geomathematics vol 1 no 1 pp 23ndash50 2010

[4] T V Bandos L Bruzzone and G Camps-Valls ldquoClassificationof hyperspectral images with regularized linear discriminantanalysisrdquo IEEE Transactions on Geoscience and Remote Sensingvol 47 no 3 pp 862ndash873 2009

[5] D Guangjun Z Yongsheng and J Song ldquoDimensionalityreduction of hyperspectral data based on ISOMAP algorithmrdquoin Proceedings of the 8th International Conference on ElectronicMeasurement and Instruments (ICEMI rsquo07) pp 3935ndash3938Xirsquoan China August 2007

[6] X Luo and M-F Jiang ldquoThe application of manifold learn-ing in dimensionality analysis for hyperspectral imageryrdquo inProceedings of the International Conference on Remote SensingEnvironment and Transportation Engineering (RSETE rsquo11) pp4572ndash4575 June 2011

[7] J Khodr and R Younes ldquoDimensionality reduction on hyper-spectral images a comparative review based on artificial datasrdquoin Proceedings of the 4th International Congress on Image andSignal Processing (CISP rsquo11) vol 4 pp 1875ndash1883 October 2011

Mathematical Problems in Engineering 17

[8] J Wen Z Tian H She and W Yan ldquoFeature extraction ofhyperspectral images based on preserving neighborhood dis-criminant embeddingrdquo in Proceedings of the 2nd InternationalConference on Image Analysis and Signal Processing (IASP rsquo10)pp 257ndash262 Zhejiang China April 2010

[9] Y-R Yeh S-Y Huang and Y-J Lee ldquoNonlinear dimensionreduction with kernel sliced inverse regressionrdquo IEEE Trans-actions on Knowledge and Data Engineering vol 21 no 11 pp1590ndash1603 2009

[10] J He L Zhang Q Wang and Z Li ldquoUsing diffusion geometriccoordinates for hyperspectral imagery representationrdquo IEEEGeoscience and Remote Sensing Letters vol 6 no 4 pp 767ndash7712009

[11] J Peng P Zhang and N Riedel ldquoDiscriminant learninganalysisrdquo IEEE Transactions on Systems Man and CyberneticsPart B Cybernetics vol 38 no 6 pp 1614ndash1625 2008

[12] F S Tsai and K L Chan ldquoDimensionality reduction techniquesfor data explorationrdquo in Proceedings of the 6th InternationalConference on Information Communications and Signal Process-ing pp 1ndash5 December 2007

[13] M D Farrell Jr and R M Mersereau ldquoOn the impact of PCAdimension reduction for hyperspectral detection of difficulttargetsrdquo IEEE Geoscience and Remote Sensing Letters vol 2 no2 pp 192ndash195 2005

[14] S Prasad and L M Bruce ldquoLimitations of principal com-ponents analysis for hyperspectral target recognitionrdquo IEEEGeoscience and Remote Sensing Letters vol 5 no 4 pp 625ndash629 2008

[15] J Yu Q Tian T Rui and T S Huang ldquoIntegrating discriminantand descriptive information for dimension reduction and clas-sificationrdquo IEEE Transactions on Circuits and Systems for VideoTechnology vol 17 no 3 pp 372ndash377 2007

[16] J Kong S Wang J Wang L Ma B Fu and Y Lu ldquoAnovel approach for face recognition based on supervisedlocality preserving projection and maximummargin criterionrdquoin Proceedings of the International Conference on ComputerEngineering and Technology (ICCET rsquo09) vol 1 pp 419ndash423Singapore January 2009

[17] M Loog and R P W Duin ldquoLinear dimensionality reductionvia a heteroscedastic extension of LDA the Chernoff criterionrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 26 no 6 pp 732ndash739 2004

[18] A C Jensen A Berge and A S Solberg ldquoRegressionapproaches to small sample inverse covariance matrix estima-tion for hyperspectral image classificationrdquo IEEE TransactionsonGeoscience andRemote Sensing vol 46 no 10 pp 2814ndash28222008

[19] J Jin BWang and L Zhang ldquoA novel approach based on Fisherdiscriminant null space for decomposition of mixed pixels inhyperspectral imageryrdquo IEEE Geoscience and Remote SensingLetters vol 7 no 4 pp 699ndash703 2010

[20] J Wen Z Tian X Liu and W Lin ldquoNeighborhood preservingorthogonal pnmf feature extraction for hyperspectral imageclassificationrdquo IEEE Journal of Selected Topics in Applied EarthObservations and Remote Sensing vol 6 no 2 pp 759ndash7682013

[21] Y Ren G Zhang G Yu and X Li ldquoLocal and global structurepreserving based feature selectionrdquoNeurocomputing vol 89 pp147ndash157 2012

[22] Z Fan Y Xu and D Zhang ldquoLocal linear discriminant analysisframework using sample neighborsrdquo IEEE Transactions onNeural Networks vol 22 no 7 pp 1119ndash1132 2011

[23] Y Wang S Huang D Liu and B Wang ldquoResearch advanceon band selection-based dimension reduction of hyperspectralremote sensing imagesrdquo in Proceedings of the 2nd InternationalConference on Remote Sensing Environment and TransportationEngineering (RSETE rsquo12) pp 1ndash4 IEEE Nanjing China June2012

[24] B Waske S van der Linden J A Benediktsson A Rabe andP Hostert ldquoSensitivity of support vector machines to randomfeature selection in classification of hyperspectral datardquo IEEETransactions on Geoscience and Remote Sensing vol 48 no 7pp 2880ndash2889 2010

[25] M Sugiyama ldquoLocal fisher discriminant analysis for superviseddimensionality reductionrdquo in Proceedings of the 23rd Interna-tional Conference onMachine Learning (ICML rsquo06) pp 905ndash912ACM June 2006

[26] Y Cheng ldquoMean shift mode seeking and clusteringrdquo IEEETransactions on Pattern Analysis and Machine Intelligence vol17 no 8 pp 790ndash799 1995

[27] W Li S Prasad J E Fowler and L M Bruce ldquoLocality-preserving dimensionality reduction and classification forhyperspectral image analysisrdquo IEEE Transactions on Geoscienceand Remote Sensing vol 50 no 4 pp 1185ndash1198 2012

[28] L Zelnik-Manor and P Perona ldquoSelf-tuning spectral cluster-ingrdquo in Proceedings of the 18th Annual Conference on NeuralInformation Processing Systems pp 1601ndash1608 December 2004

[29] W K Wong and H T Zhao ldquoSupervised optimal localitypreserving projectionrdquo Pattern Recognition vol 45 no 1 pp186ndash197 2012

[30] X He and P Niyogi ldquoLocality preserving projectionsrdquo inAdvances in Neural Information Processing Systems 16 SThrunL Saul and B Scholkopf Eds MIT Press Cambridge MassUSA 2004

[31] H Wang S Chen Z Hu and W Zheng ldquoLocality-preservedmaximum information projectionrdquo IEEE Transactions on Neu-ral Networks vol 19 no 4 pp 571ndash585 2008

[32] M Sugiyama ldquoDimensionality reduction ofmultimodal labeleddata by local fisher discriminant analysisrdquo Journal of MachineLearning Research vol 8 pp 1027ndash1061 2007

[33] S Yan D Xu B Zhang H J Zhang Q Yang and S Lin ldquoGraphembedding and extensions a general framework for dimen-sionality reductionrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 29 no 1 pp 40ndash51 2007

[34] C K Chui and J Wang ldquoDimensionality reduction of hyper-spectral imagery data for feature classificationrdquo in Handbookof Geomathematics pp 1005ndash1047 Springer Heidelberg Ger-many 2010

[35] C-C Chang and C-J Lin ldquoLIBSVM a Library for supportvector machinesrdquo ACM Transactions on Intelligent Systems andTechnology vol 2 no 3 article 27 2011

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 2: Research Article Subspace Learning via Local Probability ...downloads.hindawi.com/journals/mpe/2015/145136.pdf · and applicable toolkits were engendered one a er another. Hyperspectral

2 Mathematical Problems in Engineering

computational expression of new data points in Isomaps It isclear that such computation is explicitly complex and unap-plicable for the large capacity of HSI data For this reasonIsomaps is impractical for the dimensionality reduction ofHSI data Similar drawback occurs in the construction of LLE[7] Recent interests in discovering the intrinsic manifold ofdata structure have been a trend and the theory is still inthe progress of development [9] yet some achievements havebeen gained and reported in many research articles [10]

Nevertheless the linear approaches are efficient to dealwith this issue [11 12] PCA an unsupervised approach findsthe global scatter as the best projected direction with the aimof minimizing the least square error of reconstruction datapoints [13] Due to its ldquounsupervisedrdquo nature the learningprocedure is often blind and the projected direction foundby PCA is usually not the optimal direction [14] LDA isa supervised methodology which absorbs the advantage ofpurpose of learning [15] Toward that goal LDA seeks thedirection that minimizes the classified error However thewithin-class scatter matrix of LDA is often singular when itis applied to the small size of samples [16] Consequentlythe optimal solution of LDA is unable to solve and theprojected direction is failed to achieve These drawbacks willlimit the wide promotion of LDA [4] To cope with thisissue a derived discriminant analysis which puts additionalconstraint on the objective function [17] was proposed insome research papers [18ndash20] for example Join Global andLocal Discriminant Analysis (JGLDA) [21] The commonscheme of these methods is that they are easy to compute andimplement and the mapping is explicit Yet they have showneffeciency in most cases despite the simple models

The linear algorithms would have more advantages inthe dimensionality reduction of HSI data in general As amatter of fact conventional linear approaches such as PCALDA and LPP make the assumption that the distributionsof data samples are Gaussian distribution or mixed Gaussiandistribution However the assumption is often failed [22]since the distribution of real HSI data is preferred to bemultimodal instead of a single modal To be specific thedistribution of HSI data is usually unknown [23] and thesingle Gaussian model or Gaussian mixture model can notcapture the distribution of all landmarks of the HSI datasince the landmarks from different classes are multimodal[24] In this case the conventional methodologies workpoorly In view of this some methods extend the idea ofLDA and formulate extend-LDA algorithms for exampleSugiyama [25] proposed Local Fisher Linear DiscriminantAnalysis (LFDA) formultimodal clusters LFDA incorporatesthe supervised nature of LDA with local description ofLPP and then the optimal projection is obtained under theconstraint of multimodal samples Li et al [1] apply LFDAwith maximum likelihood estimate (MLE) support vectormachine (SVM) and Gaussian mixture model (GMM) toHSI data As reported in his paper LFDA is superior notonly in the computational time but also in the classifiedaccuracy In a word LFDA is especially appropriate forthe landmarks classification of HSI data Nevertheless theconventional LFDA ignores the distribution of data samplesin the construction procedure of affinity matrix

In LFDA the computation of affinity matrix is importantNote that there are clearly many different ways to define anaffinity matrix but the heat kernel derived from LPP hasbeen shown to result in very effective locality preservingproperties [26] In this way the local scaling of data samplesin the 119870-nearest neighborhood is utilized 119870 is a self-tuning predefined parameter To simplify the calculationprocedure of parameters [1 27 28] employing a fixed valueof 119870 = 7 for experiments Note that such calculation mayignore the distribution of data samples in the constructionprocedure of affinity matrix Actually the simplification oflocal distribution by the distance between the samples andthe 119870th nearest neighbor sample may be unreasonable andthe results by using this simplification may raise some error

Thus in this paper to overcome the weakness of con-ventional LFDA a novel approach is proposed where byadopting the local variance of local patch instead of farthestdistance for weight matrix and the class prior probability foraffinitymatrix theweightmatrix of proposed algorithm takesinto account both the distribution of HSI data samples andthe objective function of HSI data after dimension reductionThis novel approach is called PD-LFDA because the prob-ability distribution (PD) is used in LFDA algorithm To bespecific PD-LFDA incorporates two key points namely

(1) The class prior probability is applied to compute theaffinity matrix

(2) The distribution of local patch is represented bythe ldquolocal variancerdquo instead of ldquofarthest distancerdquo toconstruct the weight matrix

The proposed approach essentially increases the discriminantability of transformed features in low dimensional spaceThe pattern found by PD-LFDA is expected to be moreaccurate and is coincide with the character of HSI data andis conducive to classify HSI data

The rest of this paper is organized as follows In the begin-ning of this paper the most basic concepts of conventionallinear approaches related to our work will be introducedin Section 2 Precisely Fisherrsquos linear discriminant analysis(LDA) and locality preserve projection (LPP) as well aslocal Fisher discriminant analysis (LFDA) will be presentedProposed algorithm is developed and formalized in Section 3which is the core of this paper The experimental results withcomparison on real HSI dateset are provided in Section 4Finally we conclude our work in Section 5

2 Related Work

The purpose of linear approaches is to find an optimalprojected direction where the information of embeddingfeatures is preserved as much as possible To formulate ourproblem let 119909

119894be the 119901-dimensional feature in the original

space and let 1199091 1199092 119909

119873 be the119873 samples For the case of

supervised learning let 119897119894be label of 119909

119894 and then the label set

of all samples can be represented by notation 1198971 1198972 119897119873

Suppose that there are119862 classes in all and the sample numberof the 119888th is119873

119888that fulfils the condition119873 = sum119862

119888=1119873119888That is

Mathematical Problems in Engineering 3

the number of all samples is the total sum of each class Let 119909119888119894

be the 119894th sample of the 119888th class Then the correspondingsample mean becomes 119898

119888= (1119873

119888) sum119873119888

119894=1119909119888

119894 yet the data

center of all samples is denoted by 119898 = (1119873)sum119873

119894=1119909119894

Suppose that the data set 119883 in 119901-dimensional hyperspaceis distributed on a low 119902-dimensional subspace A generalproblem of linear discriminant is to find a transformation119879 isin R119901times119902 that maps the 119901-dimensional data into a low119902-dimensional subspace data by 119884 = 119879

119879119883 such that

each 119910119894represents 119909

119894without losing useful information The

transformed matrix 119879 is pursued by different methods anddifferent objective function resulting in different algorithm

21 Fisherrsquos Linear Discriminant Analysis (LDA) LDA intro-duces the within-scatter matrix 119878

119908and between-scatter

matrix 119878119887to describe the distribution of data samples

119878119908=

119862

sum

119888=1

119873119888

sum

119894=1

(119909119888

119894minus 119898119888) (119909119888

119894minus 119898119888)119879 (1)

119878119887=

119862

sum

119888=1

119873119888(119898119888minus 119898) (119898

119888minus 119898)119879 (2)

Fisher criterion seeks a transformation 119879 that maximizedthe between-class scatter while minimized the within-classscatter This can be achieved by optimizing the followingobjective function

119879LDA = arg max119879isinR119901times119902

tr (119879119879119878119887119879)

tr (119879119879119878119908119879) (3)

It is implicitly assumed that 119879119879119878119908119879 is full rank Under

this assumption the problem can then be attributed to thegeneralized eigenvectors 120593

1 1205932 120593

119889 by solving

119878119887120593 = 120582119878

119908120593 (4)

Finally the solution of 119879LDA is given by 119879LDA =

1205931 1205932 120593

119902 which are associated with the first 119902

largest eigenvalues 1205821ge 1205822ge sdot sdot sdot ge 120582

119902 Since the rank of

between-class scatter matrix 119878119887is at most 119862 minus 1 there are

119862minus 1meaningful features in conventional LDA To deal withthis issue a regularization procedure is essential in practice

22 Locality Preserve Projection (LPP) Adrawback of LDA isthat it does not consider the local structure among data points[29] and the distribution of realHSI data is oftenmultimodalLocality preserving projection meets this requirement [30]The goal of LPP is to preserve the local structure of neighbor-hood points Toward this goal a graph ismodeled explicitly todescribe the relationship using 119896-nearest neighborhood Let119860 denote the affinity matrix where 119860(119894 119895) isin [0 1] representsthe similarity between points119909

119894and119909119895The larger the value of

119860(119894 119895) the closer the relationship between 119909119894and119909119895 A simple

and effective way to define affinity matrix 119860 is given by

119860 (119894 119895) =

exp(minus10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

1205722) if 119909

119894isin KNN (119909

119895 119896)

or 119909119895isin KNN (119909

119894 119896)

0 otherwise(5)

where lowast 2 denotes the square 2-norm Euclidean distance120572 is a tuning parameter and KNN(119909 119896) represents the 119870-nearest neighborhoods of 119909 under parameter 119896

The transformed matrix of LPP is achieved in the follow-ing criterion [31]

119879LPP = arg min119879isinR119901times119902

1

2

119899

sum

119894119895=1

119860 (119894 119895)10038171003817100381710038171003817119910119894minus 119910119895

10038171003817100381710038171003817

2

st 119879119879119883119863119883119879119879 = 119868

(6)

where 119863 = diag(119863119894119894) is a diagonal matrix whose entries are

the column sum (also can be a row sum since119860 is symmetric)of 119860 that is 119863

119894119894= sum119895119860119894119895 Arbitrary scaling invariance and

degeneracy are guaranteed by the constraint of (6)The solution of LPP problem can be gained by solving the

eigenvector problem of

119883119871119883119879120593 = 120582119883119863119883

119879120593 (7)

where 119871 equiv 119863 minus 119860 denotes the graph-Laplacian matrix inthe community of spectral analysis and can be viewed as thediscrete version of Laplace Beltrami operator on a compactRimannian manifold [29] And finally the transformationmatrix 119879 is given by 119879LPP = 1205931 1205932 120593119902 isin R119901times119902 thatcorrespond to eigenvalue 0 = 120582

0le 1205821le 1205822le sdot sdot sdot le 120582

119902le

sdot sdot sdot le 120582119896

23 Local Fisher Discriminant Analysis (LFDA) Local Fisherdiscriminant analysis (LFDA) [32] measures the ldquoweightsrdquoof two data points by the corresponding distance and thenthe affinity matrix is calculated by these weights Notethat the ldquopairwiserdquo representation of within-scatter matrixand between-scatter matrix is very important for LFDAFollowing simple algebra steps the within-scatter matrix (1)of LDA can be transformed into the following forms

119878119908=

119862

sum

119888=1

119873119888

sum

119894=1

(119909119888

119894minus 119898119888) (119909119888

119894minus 119898119888)119879

=

119862

sum

119888=1

119873119888

sum

119894=1

(119909119888

119894minus1

119873119888

119873119888

sum

119895=1

119909119888

119895)(119909

119888

119894minus1

119873119888

119873119888

sum

119895=1

119909119888

119895)

119879

=

119873

sum

119894=1

119909119894119909119879

119894minus

119862

sum

119888=1

1

119873119888

119873119888

sum

119894119895=1

119909119888

119894119909119888

119895

119879

=

119873

sum

119894=1

(

119873

sum

119895=1

119875119908(119894 119895))119909

119894119909119879

119894minus

119873

sum

119894119895=1

119875119908(119894 119895) 119909

119894119909119879

119895

4 Mathematical Problems in Engineering

=1

2

119873

sum

119894119895=1

119875119908(119894 119895) (119909

119894119909119879

119894+ 119909119895119909119879

119895minus 119909119894119909119879

119895minus 119909119895119909119879

119894)

=1

2

119873

sum

119894119895=1

119875119908(119894 119895) (119909

119894minus 119909119895) (119909119894minus 119909119895)119879

(8)

where

119875119908(119894 119895) =

1

119873119888 if 119897

119894= 119897119895= 119888

0 if 119897119894= 119897119895

(9)

Let 119878119905be the total mixed matrix of LDA and then we gain

119878119887= 119878119905minus 119878119908

=1

2

119873

sum

119894119895=1

119875119887(119894 119895) (119909

119894minus 119909119895) (119909119894minus 119909119895)119879

(10)

where

119875119887(119894 119895) =

1

119873minus1

119873119888 if 119897

119894= 119897119895= 119888

1

119873 if 119897

119894= 119897119895

(11)

LFDA is achieved by weighting the pairwise data points

119878119908=1

2

119873

sum

119894119895=1

119908(119894 119895) (119909

119894minus 119909119895) (119909119894minus 119909119895)119879

119878119887=1

2

119873

sum

119894119895=1

119887(119894 119895) (119909

119894minus 119909119895) (119909119894minus 119909119895)119879

(12)

where 119908(119894 119895) and

119887(119894 119895) denote the weight matrix of

different pairwise points for the within-class samples andbetween-class samples respectively

119908(119894 119895) equiv

119882(119894 119895)

119873119888 if 119897

119894= 119897119895= 119888

0 if 119897119894= 119897119895

(13)

119887(119894 119895) equiv

119882(119894 119895) (1

119873minus1

119873119888) if 119897

119894= 119897119895= 119888

1

119873 if 119897

119894= 119897119895

(14)

where119882 indicates the affinity matrixThe construction of119882is critical for the performance of classified accuracy therebythe investigation of construction is in great need to be furtherelaborated in the following section

3 Proposed Scheme

The calculation of (13) and (14) is very important to theperformance of LFDA There are many methods to computethe affinitymatrix119882The simplest one is that119882 is equivalentto a constant that is

119882(119894 119895) equiv 119886 (15)

where 119886 in the above equation is a real nonnegative numberHowever the equations of (13) and (14) are derived to thestate-of-the-art Fisherrsquos linear discriminant analysis underthis construction

Another construction adopts the heat kernel derivedfrom LPP

119882(119894 119895) = exp(minus10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

1205902) (16)

where 120590 is a tuning parameter Yet the affinity is valued by thedistance of data points and the computation is too simple torepresent the locality of data patches Amore adaptive version[26] of (16) is proposed as follows

119882(119894 119895) =

exp(minus10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

1205902) if 119909

119894isin KNN (119909

119895 119870)

or 119909119895isin KNN (119909

119894 119870)

0 otherwise(17)

Compared with the former computation (17) is in con-junction with 119870-nearest data points which is computation-ally fast and light Moreover the property of local patches canbe characterized by (17) However the affinity defined in (16)and (17) is globally computed thus it may be apt to overfitthe training points and be sensitive to noise Furthermorethe density of HSI data pointsmay vary according to differentpatches Hence a local scaling technique is proposed inLFDA to cope with this issue [29] where the sophisticatedcomputation is given by

119882(119894 119895) =

exp(minus10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

) if 119909119894isin KNN (119909

119895 119870)

or 119909119895isin KNN (119909

119894 119870)

0 otherwise(18)

where 120588119894denotes the local scaling around the corresponding

sample 119909119894with the following definition

120588119894=10038171003817100381710038171003817119909119894minus 119909119870

119894

10038171003817100381710038171003817 (19)

where 119909119870119894represents the 119870th nearest neighbor of 119909

119894 lowast 2

denotes the square Euclidean distance and119870 is a self-tuningpredefined parameter

To simplify the calculation many researches considered afixed value of119870 and a recommended value of119870 = 7 is studiedin [1 28] Note that 120588

119894is used to represent the distribution

of local data around sample 119909119894 However the above work

ignored the distribution around each individual sample Thediversity of adjacent HSI pixels is approximate thus thespectrum of the neighboring landmarks has great similarityThat is the pixels of HSI data which have resembling spec-trums tend to be of the same landmark This phenomenonindicates that the adjacency of local patches not only lies in

Mathematical Problems in Engineering 5

the spectrum space but also in the spatial space For a localpoint the calculation of making use of the diversity of its119870thnearest neighborhoods is not fully correct

An evident example is illustrated in Figure 1 where twogroups of points have different distributions In group (a)most neighbor points are closed to point 119909

0 while in group

(b) most neighbor points are far from point 1199090 However the

measurement of two cases are the same according to (19)Thiscan be found in Figure 1 where the distances between point1199090and its 119870th nearest neighborhoods (119870 = 7) are same in

both distributions which can be shown in Figures 1(a) and1(b)119871

1= 1198712This example indicates that the simplification of

local distribution by the distance between the sample 119909119894and

the 119870th nearest neighbor sample is unreasonable Actuallythe result by using of this simplification may raise someerrors

Based on the discussion above a novel approach whichis called PD-LFDA is proposed to overcome the weakness ofLFDA To be specific PD-LFDA incorporates two key pointsnamely

(1) The class prior probability is applied to compute theaffinity matrix

(2) The distribution of local patch is represented by theldquolocal variancerdquo instead of the ldquofarthest distancerdquo toconstruct the weight matrix

The proposed approach essentially increases the discriminantability of transformed features in low dimensional space Thepattern found by PD-LFDA is expected to be more accurateand coincids with the character of HSI data and is conduciveto classify HSI data

In this way a more sophisticated construction of affinitymatrix which is derived from [29] is proposed as follows

119882(119894 119895)

=

119901 (119897119894)2 exp(minus

10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

)

sdot(1 + exp(minus10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

)) if 119897119894= 119897119895= 119888

0 if 119897119894= 119897119895

(20)

where 119901(119897119894) stands for the class prior probability of class 119909

119894

and 120588119894indicates the local variance Note that the denominator

item of (13) is 1119873119888 which will cancel out our prior effectif we use 119901(119897

119894) to replace 119901(119897

119894)2 (the construction of 119901(119897

119894)

will be given in (21)) Different part of this derivation playsthe same role as the original formulation for example forthe last item on one hand it plays the role of intraclassdiscriminating weight and on the other hand the productresult of119882 may reach zero if the Euclidean square distance sdot is very small for some data points For this case an extraitem (1 + exp(minus119909

119894minus 1199091198952120588119894120588119895)) is added to the construc-

tion of intraclass discriminating weight to prevent accuracytruncation By doing so our derivation can be viewed as an

integration of class prior probability the local weight andthe discriminating weight This construction is expected topreserve both the local neighborhood structure and the classinformation Besides this construction is expected to sharethe same advantages detailed in the original work

It is clear that (20) consists of two new factors comparedwith LFDA method (1) class prior probability 119901(119897

119894) and (2)

local variance 120588119894

Suppose class 119909119894to be class 119888 that is 119897

119894= 119888 so that the

probability of class 119909119894can be calculated by

119901 (119897119894) = 119901 (119888) =

119873119888

119873 (21)

where 119873119888 is the number of the samples in class 119888 whole 119873denotes the total number of samples and119873 = sum119862

119888=1119873119888

Please note that the item (1+exp(minus119909119894minus1199091198952120588119894120588119895)) in (20)

is used to prevent the extra rounding error produced from thefirst two items and to keep the total value of

119901 (119897119894)2 exp(minus

10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

)(1 + exp(minus10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

))

(22)

which does not reach the minimum Here 120588lowastdenotes the

local scaling around 119909lowast In this paper a local scaling 119909

lowastis

measured by the standard deviation of local square distanceAssume that 119909(119894)

1 119909(119894)

2 119909

(119894)

119870are the119870-nearest samples of 119909

119894

and then the square distance between 119909119894and 119909(119894)

119896is given by

119889(119894)

119896=10038171003817100381710038171003817119909119894minus 119909(119894)

119896

10038171003817100381710038171003817

2

119896 = 1 2 119870 (23)

The corresponding mean (119894) can be defined as

(119894)=1

119870

119870

sum

119896=1

119889(119894)

119896

=1

119870

119870

sum

119896=1

10038171003817100381710038171003817119909119894minus 119909(119894)

119896

10038171003817100381710038171003817

2

(24)

where lowast 2 represents a square Euclidean distance and 119870 isa predefined parameter whose recommended value is 119870 = 7The standard deviation can be calculated as

120588119894= radic

1

119870

119870

sum

119896=1

(119889(119894)

119896minus (119894))

2

(25)

Note that in the above equation the item 1119870 becomesa constant that can be shifted outside Thus an equivalentformula is given by

120588119894=1

119870radic

119870

sum

119896=1

(119889(119894)

119896minus (119894))

2

(26)

Similar procedure can be deduced to 119909119895 Hence we have

120588119898=1

119870

119870

sum

119894=1

radic100381710038171003817100381710038171003817119909(119898119896)

119894

100381710038171003817100381710038171003817

2

minus1

119870

119870

sum

119895=1

100381710038171003817100381710038171003817119909(119898119896)

119895

100381710038171003817100381710038171003817

2

(27)

6 Mathematical Problems in Engineering

x1

x2

x3

x4

x5

x6

x7

L1

x0

(a)

x1

x2

x3

x4

x5

x6

x7

L2

x0

(b)

Figure 1 Different distributions of 1199090and the corresponding 119870-nearest neighborhoods (119870 = 7) (a) Most neighbors are closed to point 119909

0

(b) Most neighbors are far from point 1199090 The distances between point 119909

0and its 119870th nearest neighbors are the same in both distributions

1198711= 1198712

Comparing (19) with (27) it is noticeable that (28) holds

120588119898le 120588119894 (28)

Compared with the former definitions our definition has atleast the following advantages

(i) By incorporating the prior probability of each classwith local technique 119901(119897

119894) the proposed scheme is

expect to be a benefit for the classified accuracy(ii) The representation of local patches equation (26) is

described by local standard deviation 120588119894rather than

absolute diversity in (19) which is more accurate inmeasuring the local variance of data samples

(iii) Compared with the global calculation the proposedcalculation is taken on local patches which is efficientin getting rid of over-fitting

(iv) The proposed local scaling technique meets the char-acter of HSI data which is more applicable for theprocessing of hyperspectral image in real applications

Based on the above affinity defined an extended affinitymatrix can also be defined in a similar way Our definitiononly provides a heuristic exploration for reference Theaffinity can be further sparse for example by introducing theidea of 120576-nearest neighborhoods [31]

Theoptimal solution of improved scheme can be achievedby maximize the following criterion

119879PD-LFDA equiv arg max119879isinR119901times119902

tr (119879119879119878119887119879)

tr (119879119879119878119908119879)

(29)

It is evident that (29) has the similar form of (3) Thisfinding enlightens us that the transformation119879 can be simplyachieved by solving the generalized eigenvalue decomposi-tion of 119878minus1

119908119878119887 Moreover Let 119866 isin R119902times119902 be a 119902-dimensional

invertible square matrix It is clear that 119879PD-LFDA119866 is alsoan optimal solution of (29) This property indicates that

the optimal solution is not uniquely determined becauseof arbitrary arithmetic transformation of 119879PD-LFDA119866 Let 120593119894be the eigenvector of 119878minus1

119908119878119887corresponding to eigenvalue

119894

that is 119878119887120593119894= 119894119878119908120593119894 To cope with this issue a rescaling

procedure is adopted [25] Each eigenvector 120593119894119902

119894=1is rescaled

to satisfy the following constraint

120593119894119878119908120593119895= 1 if 119894 = 1198950 if 119894 = 119895

(30)

Then each eigenvector is weighted by the square root of itsassociated eigenvalue The transformed matrix 119879PD-LFDA ofthe proposed scheme is finally given by

119879PD-LFDA = radic11205931 radic21205932 radic1199021120593119902 isin R119901times119902 (31)

with descending order 1ge 2ge sdot sdot sdot ge

119902

For a new testing points 119909 the projected point in the newfeature space can be captured by 119910 = 119879119879PFDA119909 thus it can befurther analyzed in the transformed space

According to the above analysis we can design an algo-rithm which is called PD-LFDA Algorithm to perform ourproposed method The detailed description of this algorithmcan be found in the appendix (Algorithm 2) A summary ofthe calculation steps of PD-LFDA Algorithm is presented inAlgorithm 1

The advantage of PD-LFDA is discussed as followsFirstly to investigate the rank of the between-class scatter

matrix 119878119887of LDA 119878

119887can be rewritten as

119878119887=

119862

sum

119897=1

119873119897(119898119897minus 119898) (119898

119897minus 119898)119879

= [1198731(119898119897minus 119898) 119873

2(119898119897minus 119898) 119873

119871(119898119897minus 119898)]

sdot [1198731(119898119897minus 119898) 119873

2(119898119897minus 119898) 119873

119871(119898119897minus 119898)]

119879

(32)

Mathematical Problems in Engineering 7

Input HSI training samples119883 isin R119901times119873 dimensionality to be embedded 119902 the parameter 119870 of 119870NNand a test sample 119909

119905isin R119901

Step 1 For each sample 119909119894from the same class calculate the119882

119894119895by (20)

where the local scaling factor 120588119894is calculated via (26) or (27)

Step 2 Equations (13) and (14) can be globally and uniformly transformed into an equivalent formula via119908= 119882 sdot 119882

1

119887= 119882 sdot (119882

2minus1198821)

(i)

where the operator 119860 sdot 119861 denotes the dot product between 119860 and 119861 and

119882(119894 119895) =

119901(119897119894)2 exp(minus

10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

)(1 + exp(minus10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

))

if 119897119894= 119897119895= 119888

if 119897119894= 119897119895

0

(iia)

1198821=

1

119873119888 if 119897

119894= 119897119895= 119888

0 others(iib)

1198822=1

119873(1119873times1

11times119873) (iic)

By the above formulas the product of elements in different matrices can be achieved via dot productbetween matrices The equations (iia) (iib) and (iic) can be gained by integrating the number ofeach class119873119888 the number of total training samples119873 and the local scaling 120588

119894 then matrices119882119882

11198822

can be calculatedStep 3 Construct within-scatter matrix

119908and between-scatter matrix

119887 according to (i)

Step 4 Define Laplacian matrices 119871lowast below119871lowast= 119863lowastminus lowast (iii)

where119863lowast is the row sum or column sum of119882lowast119863lowast119894119894= sum119895lowast(119894 119895) (or119863lowast

119894119894= sum119894lowast(119894 119895)) and the

notation lowast denotes one letter in 119908 119887Step 5 On the basis of (29) and (30) the transformation matrix can be achieved viaeigenvectors 119879 = radic

11205931 radic21205932 radic

1199021120593119902 119879 isin R119901times119902 that corresponding the 119902 leading

eigenvalues 119887120593119894= 119894119908120593119894in solving the general problem of

119887120593119894= 119894119908120593119894

Step 6 For a testing sample 119909119905isin R119901 the extracted feature is 119911

119905= 119879119879119909119905isin R119902

Output Transformation matrix 119879 and the extracted feature 119911119905

Algorithm 1 PD-LFDA Algorithm

Thereby

rank (119878119887) ⩽ rank ([119873

1(119898119897minus 119898) 119873

2(119898119897minus 119898)

119873119871(119898119897minus 119898)]) ⩽ 119862 minus 1

(33)

It is easy to infer that the rank of the between-class scattermatrix 119878

119887is119862minus1 atmost thus there are up to119862minus1meaningful

subfeatures that can be extracted Thanks to the help ofaffinity matrix 119882 when compared with the conventionalLDA the reduced subspace of proposed PD-LFDA can beany subdimensional space On the other hand the classicallocal fisherrsquos linear discriminant only weights the value ofsample pairs in the same classes while our method also takesin account the sample pairs in different classes Hereafter theproposedmethod will bemore flexible and the results will bemore adaptiveThe objective function of proposed method isquite similar to the conventional LDA hereby the optimalsolution is almost same as the conventional LDA whichindicates that it is also simple to implement and easy to revise

To further explore the relationship of LDA and PD-LFDAwe now rewrite the objective function of LDAandPD-LFDA respectively

119879LDA = arg max119879isinR119901times119902

trace 119879119879119878119887119879

subject to 119879119879119878119908119879 = 119868

(34)

119879PD-LFDA equiv arg max119879isinR119901times119902

trace 119879119879119878119887119879

subject to 119879119879119878119908119879 = 119868

(35)

This implies that LDA tries to maximize the between-class scatter and simultaneously constraint the within-classscatter to a certain level However such restriction is hardto constraint and no relaxation is imposed When the data isnot a single modal that is multimodal or unknown modalLDA often fails On the other hand benefiting from theflexible designing of affinity matrix119882 PD-LFDA gains morefreedom in (35) That is the separability of PD-LFDA will bemore distinct and the degree of freedom remains more than

8 Mathematical Problems in Engineering

Input HSI data samples119883 = 1199091 1199092 119909

119873 isin R119901times119873 the objective dimension to be embedded 119902

the nearest neighbor parameter 119870 (default 119870 equiv 7) and the test sample 119909119905isin R119901

Output Transformation matrix 119879 isin R119901times119902Steps are as follows

(1) Initialize matrices(2) 119878119908larr 0119901times119901

within-class scatter(3) 119878119887larr 0119901times119901

between-class scatter(4)(5) Compute within-class affinity matrix119882119908(6) for 119888 = 1 2 119862 do in a classwise manner(7) 119909

119888

119894119873119888

119894=1larr 119909

119895| 119897119895= 119888 the 119888th class data samples

(8) 119883 larr 119909119888

1 119909119888

2 119909

119888

119873119888 sample matrix

(9) 119882119894=(1119873119888times111times119873119888)

119873119888

(10)(11) Determine the local scaling(12) for 119894 = 1 2 119873119888 do(13) 119909

(119894)

119896larr the 119896th nearest neighbor of 119909119888

119894 119896 = 1 2 119870

(14) for 119896 = 1 2 119870 do(15) 119889

(119894)

119896= 119909119894minus 119909(119894)

1198962

(16) end for

(17) (119894)larr1

119870

119870

sum

119896=1

10038171003817100381710038171003817119909119894minus 119909(119894)

119896

10038171003817100381710038171003817

2

(18) 120588(119894)= radic

1

119870

119870

sum

119896=1

(119889(119894)

119896minus (119894))2

(19) end for(20)(21) Define local affinity matrix(22) for 119894 119895 = 1 2 119873119888 do(23) 119901(119897

119894) larr 119873

119888119873 prior probability

(24) 119860119894119895larr 119901(119897

119894) exp(minus

10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

)(1 + exp(minus10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

))

(25) end for(26) 119860

119888= 119860

(27) end for(28)119882

119908= diag119882

11198822 119882

119862 in block diagonal manner

(29) 119860119908= diag119860

1 1198602 119860

119862 also in block diagonal manner

(30) for 119894 119895 = 1 2 119873 do(31)

119908(119894 119895) larr 119860

119908(119894 119895)119882

119908(119894 119895)

(32) end for(33)(34) Compute between-class affinity matrix119882119887

(35)119882119887larr(1119873times1

11times119873)

119873minus diag119882

11198822 119882

119862

(36) Let 119865nz denotes the nonzero flag of elements in119882119908 119865nz(119894 119895) = 1 if119882119908(119894 119895) = 0 119865nz(119894 119895) = 0 if119882119908(119894 119895) = 0

(37) 119865nz larr (1198821015840

119887= 0)

(38) 119860119887larr 0119873times119873

(39) 119860

119887(119865nz) = 119860119908(119865nz)

(40) 119860119887(not119865nz) = 1

(41) for 119894 119895 = 1 2 119873 do(42)

119887(119894 119895) larr 119860

119887(119894 119895)119882

119887(119894 119895)

(43) end for(44) Now construct Laplacian matrix for within affinity matrix

119908and between affinity matrix

119887

(45) Let(46)119863119908

119894119894= sum

119895

119908(119894 119895)119863119887

119894119894= sum

119895

119887(119894 119895)

(47) Then

Algorithm 2 Continued

Mathematical Problems in Engineering 9

(48) 119871119908= 119908minus 119863119908 119871119887= 119887minus 119863119887

(49) Construct two matrixs below(50) 119878

119887= 119883119871

119887119883119879 119878119908= 119883119871

119908119883119879

(51) Let 1205931 1205932 120593

119902 be the general eigenvector of

(52) 119878119887120593119894= 119894119878119908120593119894 forall119894 isin 1 2 119902

(53) with the corresponding eigenvalue in descending order 1ge 2ge sdot sdot sdot ge

119902

(54)(55) Finally the transformation matrix can be represented as(56) 119879 = radic120582

11205931 radic12058211205932 radic120582

1120593119902 isin R119901times119902

(57)(58) For a new test sample 119909

119905 the embedding 119911

119905is given by

(59) 119911119905= 119879119879119909119905isin R119902

Algorithm 2 Proposed PD-LFDA method

the conventional LDA thus our method is expected to bemore robust and significantly preponderant

For large scale data sets we discuss a scheme that canaccelerate the computation procedure of the within-scattermatrix 119878

119908 In our algorithm owning to the fact that we have

put penalty on the affinity matrix for different class samplesin constructing the between-scatter matrix the acceleratedprocedure will remain for further discussion

The within-class scatter 119878119908can be reformulated as

119878119908=1

2

119873

sum

119894119895=1

119908(119894 119895) (119909

119894minus 119909119895) (119909119894minus 119909119895)119879

=1

2

119873

sum

119894119895=1

119908(119894 119895) (119909

119894119909119879

119894+ 119909119895119909119879

119895minus 119909119894119909119879

119895minus 119909119895119909119879

119894)

=

119873

sum

119894=1

(

119873

sum

119895=1

119908(119894 119895))119909

119894119909119879

119894minus

119873

sum

119894119895=1

119908(119894 119895) 119909

119894119909119879

119895

= 119883(119863119908minus 119908)119883119879

= 119883119908119883119879

(36)

Here

119863119908 (119894 119894) =

119873

sum

119895=1

119908(119894 119895)

119908= 119863119908minus 119908

(37)

119908can be block diagonal if all samples 119909

119894119873

119894=1are sorted

according to their labels This property implies that 119863119908and

119908can also be block diagonal matrix Hence if we compute

119878119908

through (36) then the procedure will be much moreefficient Similarly 119878

119887can also be formulated as

119878119887= 119883

119887119883119879

= 119883(119863119887minus 119887)119883119879

(38)

Nevertheless 119887is dense and can not be further sim-

plified However the simplified computational procedure of119908

saves for us part of time in a way In this paper weadopt the above procedure to accelerate 119878

119908and pursue 119878

119887

normally In addition to locality structure some papers showthat another property for example marginal information isalso important and should be preserved in the reduced spaceThe theory of extended LDA and LPP algorithm is developedrapidly recently Yan et al [33] summarized these algorithmsin a graph embedding framework and also proposed amarginal fisher analysis embedding (MFA) algorithm underthis framework

In MFA the criterion is characterized by intraclasscompactness and interclass marginal superability which isreplaced for thewithin-class scatter and between-class scatterseverally The intraclass relationship is reflected by an intrin-sic graph which is constructed by 119870-nearest neighborhoodsample data points in the same class while the interclasssuperability is mirrored by a penalty graph computed formarginal points from different classes Following this ideathe intraclass compactness is given as follows

119878119894= sum

119894119895 119894isin119873(119896)(119895)or 119895isin119873(119896)(119894)

10038171003817100381710038171003817119879119879119909119894minus 119879119879119909119895

10038171003817100381710038171003817

2

= 2119879119879119883 (119863 minus119882)119883

119879119879

(39)

where

119882(119894 119895) = 1 if 119894 isin 119873(119896) (119895) or 119895 isin 119873(119896) (119894) 0 otherwise

(40)

Here 119873(119896)(119895) represents the 119870-nearest neighborhood indexset of 119909

119895from the same class and119863 is the row sum (or column

sum) of 119882 119863(119894 119894) = sum119894119882119894119895 Interclass separability is indi-

cated by a penalty graph whose term is expressed as follows

119878119890= sum

119894119895 (119894119895)isin119875(119896)(119897119895)

or (119894119895)isin119875(119896)(119897119894)

10038171003817100381710038171003817119879119879119909119894minus 119879119879119909119895

10038171003817100381710038171003817

2

= 2119879119879119883(119863 minus )119883

119879119879

(41)

10 Mathematical Problems in Engineering

of which

(119894 119895) = 1 if (119894 119895) isin 119875(119896) (119897

119895) or (119894 119895) isin 119875(119896) (119897

119894)

0 otherwise(42)

Note that 119878119894and 119878

119890are corresponding to ldquowithin-scatter

matrixrdquo and ldquobetween-scatter matrixrdquo of the traditional LDAalternatively The optimal solution of MFA can be achievedby solving the following minimization problem That is

= arg119879

min 119879119879119883(119863 minus119882)119883

119879119879

119879119879119883(119863 minus )119883119879119879

(43)

We know that (43) is also a general eigenvalue decompositionproblem Let 119879PCA indicate the transformation matrix fromthe original space to PCA subspace with certain energyremaining and then the final projection of MFA is output as

119879MFA = 119879PCA (44)

As can be seenMFA constructs twoweightedmatrices119882and according to the intraclass compactness and interclassseparability In LFDA and PD-LFDA only one affinity isconstructed The difference lies in that the ldquoweightrdquo in LFDAand PD-LFDA is in the range of [0 1] according to the levelof difference Yet MFA distributes the same weight to its 119870-nearest neighborhoodsThe optimal solution ofMFA LFDAand PD-LFDA can be attributed to a general eigenvaluedecomposition problem Hence the idea of MFA LFDA andPD-LFDA is approximately similar in a certain interpretationRelationship with other methodologies can be analyzed in ananalogous way

4 Experimental Results

To illustrate the performance of PD-LFDA experimentson a real hyperspectral remote sensing image data setmdashAVIRIS Indian Pine 1992 are conducted in this sectionThe AVIRIS Indian Pines 1992 data set was gathered byNational Aeronautics and Space Administrator (NASA) withAirborne VisibleInfrared Imaging Spectrometer (AVIRIS)sensor over the Indian Pine test site in northwest Indians inJune 1992 This data set consists of 145 times 145 pixels and 224spectral reflectance bands ranging from 04 120583m to 245 120583mwith a spatial resolution of 20m The Indian Pines scene iscomposed of two-third agriculture and one-third forest orother natural perennial vegetation Some other landmarkssuch as dual lane highways rail line low density housing andsmaller roads are also in this image Since the scene was takenin June some main crops for example soybeans and cornare in their early growth stage with less than 5 coveragewhile the no-till min-till and clean-till indicate the amountof previous crop residue remaining The region map can bereferred to Figure 5(a) The 20 water absorption bands (ie[108ndash112 154ndash167] 224) were discarded

In this section performance of different dimensionreduction methods that is PCA LPP LFDA LDA JGLDAand RP [34] is compared with the proposed PD-LFDA

Table 1 Training set in AVIRIS Indian Pines 1992 database

ID Class Name1 Alfalfa (4618 3913)2 Corn-notill (1428136 952)3 Corn-mintill (83087 1048)4 Corn (23734 1434)5 Grass-pasture (48354 1118)6 Grass-trees (73071 973)7 Grass-pasture-mowed (2817 6071)8 Hay-windrowed (47850 1046)9 Oats (2015 7500)10 Soybean-notill (97286 884)11 Soybean-mintill (2455214 872)12 Soybean-clean (59354 911)13 Wheat (20528 1366)14 Woods (1265102 806)15 Buildings-Grass-Trees-Drives (38639 1010)16 Stone-Steel-Towers (9324 2581)Total 102491029 1004Numerical value in this table refers to number of samples number oftraining samples and pc respectively

Classification accuracy is reported via the concrete classifiersGenerally many dimension reduction research papers adopt119870-nearest neighborhood classifier (KNN) and support vectormachine (SVM) classifier to measure the performance of theextracting features after the dimension reduction where theoverall accuracy and kappa coefficient are detailed in thereports Hereby in this paper we also adopt KNN classifierand SVM classifier for performance measurement In KNNclassifier we select the value of 119870 as 1 5 and 9 so thatthree classifiers based on nearest neighborhoods are formedwhich are called 1NN 5NN and 9NN In SVM classifier weseek a hyperplane to separate classes in kernel-induced spacewhere the linear nonseparable classes in the original featurespace can be separated via kernel trick SVM as a robustand successful classifier has been widely used to evaluatethe performance of multifarious methods in many areas Forsimplicity and convenience we use LIBSVM package [35] forexperiments Accuracy of dimension reduced performancewill be reported by classified performance from SVM clas-sifier In the following schedule the feature subspace will becalculated at the first step from training samples by differentdimensional algorithms Table 1 gives a numerical statisticsof training samples corresponding to each class Then thenew sample will be projected into a low subspace by thetransformedmatrix Finally all the new samples are classifiedby SVM classifier

In this experiment a total of 1029 samples were selectedfor training and the remaining samples are used for testingNote that all the labeled samples in database are unbalancedand the available samples of each category differ dramaticallyThe following strategy is imposed for sample division A fixednumber of 15 samples are randomly selected to form thetraining sample yet the absent samples are randomly selected

Mathematical Problems in Engineering 11

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

(a) 1NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

(b) 5NN

7 9 11 13 15Reduced space

0

01

02

03

04

05

06

07

08

09

1

Ove

rall

accu

racy

(c) 9NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

(d) Liner SVM

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

PCALPPLFDA

LDAPD-LFDAJGLDA

RP

(e) Polynomial SVM

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

PCALPPLFDA

LDAPD-LFDAJGLDA

RP

(f) RBF-SVM

Figure 2 Overall accuracy by different dimension reduction methods and different classifiers applied to AVIRIS Indian Pines database

12 Mathematical Problems in Engineering

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

(a) 1NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

(b) 5NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

(c) 9NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

(d) Linear SVM

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

PCALPPLFDA

LDAPD-LFDAJGLDA

RP

(e) Polynomial SVM

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

PCALPPLFDA

LDAPD-LFDAJGLDA

RP

(f) RBF SVM

Figure 3 Kappa coefficient by different dimension reduction methods and different classifiers applied to AVIRIS Indian Pines database

Mathematical Problems in Engineering 13

(a) Pseudo 3-channel color image band = [12 79 140]

1

2

2

2

2

2 2

3

3

3

3

3

4

5

5

5

5

6

6 6 6

7

8

9

10

10

10 10

11

11

11

11

11 12

12 12

13 14

14

14 15

15

16

(b) Ground truth

(c) Distribution of training sample

Figure 4 Illustration of sample partition

from the remaining samples Under this strategy the trainingsamples and testing samples are listed in Table 1

Figure 2 shows the overall accuracy of different dimen-sion reduction methods applied to AVIRIS Indian 92AV3Cdata set The neighborhood of KNN classifier is selected as 15 and 9 respectively which produces three classifiers thatis 1NN 5NN and 9NN Three different kernel functionsare adopted for SVM classifier Then derived classifiers arealso used in this experiment that is linear SVM polynomialSVM and RBFSVM It can be deduced from Figures 2(a)sim2(c) that when the embedding space is greater than 5 pro-posed PD-LFDA performs the best while JGLDA performsthe worst The results produced by RP is slightly better thanJGLDA PCA LDA LPP and RP show the similar classifiedresults under KNN classifierThat is the proposed PD-LFDAoutperforms the others Meantime compared with LFDAthe proposed PD-LFDA leads to 2 more improvementson average Moreover it can be observed from (d) that theclassified accuracy increases steadily as the embedded spaceincreases However LDA demonstrates the highest overallaccuracywhen the reduced features vary to 9 while LFDAhas

the significant improvements when the number of reducedfeatures is greater than 9 This phenomenon of Figure 2(d)indicates the instability of linear SVM Nevertheless thesituation reversed for polynomial SVM and RBF SVM inFigures 2(e) and 2(f) wherein the proposed PD-LFDAwins a little improvement against LFDA and has significantimprovement compared with the others Inspired effects ofproposed PD-LFDA algorithm were achieved in all casesFurthermore Table 2 gives the detailed overall accuracyunder different feature dimensions using 3NN 7NN andRBFSVM classifiers which validates the feasibility of theproposed scheme in this paper

Figure 3 displays the results of kappa coefficient obtainedusing the different dimension reduction algorithms underKNN classifier and SVM classifiers The experimental cir-cumstance of Figure 3 is same as that of Figure 2 We canfind that from these results JGLDA performs the worst inmost cases except in Figure 3(e) The proposed PD-LFDAmethod outperforms the other methods and achieves thehighest kappa numerical value in most cases except usingthe linear SVM as the classifier In fact none of them work

14 Mathematical Problems in Engineering

686864717360

(a) PCA-7NN

687564707607

(b) LPP-7NN

792376538609

(c) LFDA-7NN

739070418130

(d) LDA-7NN

619556516209

(e) JGLDA-7NN

670962947164

(f) RP-7NN

799277288570

(g) PCA-RBFSVM

767473558288

(h) LPP-RBFSVM

837581518822

(i) LFDA-RBFSVM

728769288071

(j) LDA-RBFSVM

582453777121

(k) JGLDA-RBFSVM

763173278365

(l) RP-RBFSVM

Figure 5 Classified map generated by different dimension reduction methods where the overall accuracy kappa coefficient and averageclassification accuracy are listed at the top of each map respectively

steady in the case of linear SVM (Figure 3(d)) Note that thesituation is improved in polynomial SVM where the kappanumerical value of the proposed PD-LFDA is significantlybetter than the others All the achievements demonstrate therobustness of our contribution in PD-LFDA Simultaneouslyit is noticeable that LPP exhibits an average kappa levelThe kappa value gained by LPP is not seriously bad and isnot dramatically good The kappa results produced by RPare approximately the same as LPP A significant advantageof RP is the simple construction and computation wherethe accuracy is closed to LPP More details are summarizedin Table 3 It can be concluded that the kappa coefficientof proposed algorithm is higher than the other approacheswhich is more appropriate for the classification of HSI data

The visual results of all methods are presented in Figures5sim6 where the class labels are converted to pseudocolor

image The pseudocolor image of the hyperspectral imagefrom Indian 92AV3C database is shown in Figure 4(a) Theavailable labeled image which represents the truth groundis illustrated in Figure 4(b) where the labels are made byhuman The training samples are selected from the labeledimage represented as points in the image as shown inFigure 4(c) Each label number (ID) corresponds to eachclass name which is indexed in Table 1 In this experimentall the available labeled samples are used for testing whileapproximate 10 of samples are used for training Thesubspace is fixed to 13 (the number here is only used forreference it can be changed) For each experiment thedimension from original feature space is reduced to theobjective dimensionality thereafter the classified maps areinduced by 7NN classifier and RBF-SVM classifier Theoverall accuracy kappa coefficient and average accuracy are

Mathematical Problems in Engineering 15

837981698991

(a) PD-LFDA-7NN

848682798968

(b) D-LFDA-RBFSVM

Figure 6 Classified map of the proposed method

Table 2 Overall accuracy obtained using different dimensions offeature and different classifiers for Indian Pine Scene database ()

Item Dims7 9 11 13 15

3NNPCA 7007 6996 7026 7035 7087LPP 6575 6871 6835 6695 6546LFDA 7513 7895 7963 7992 8009LDA 6415 6606 6616 6596 6536PD-LFDA 7713 8094 816 8218 8263JGLDA 5405 5602 5704 5642 5751RP 6173 6444 6745 6632 6652

7NNPCA 6902 6902 691 6933 6972LPP 6782 7057 7027 6892 6681LFDA 7419 7777 7827 7863 7862LDA 6615 6909 6899 692 6882PD-LFDA 7736 809 8155 8172 8235JGLDA 5665 5811 5809 5812 5914RP 6194 6481 6696 6613 6603

RBF-SVMPCA 8051 7794 7718 7935 8176LPP 7101 7514 7614 7501 7314LFDA 7826 8187 82 8113 7915LDA 677 6834 6877 6642 6792PD-LFDA 7858 8267 8395 8423 8166JGLDA 5686 5606 5663 5689 5879RP 6957 7282 7441 7416 7543

also included at the top of each map respectively Figure 5displays the classifiedmaps of classicmethods in pseudocolorimages and the classified maps of proposed PD-LFDA arepresented in Figure 6 It can be observed from these mapsthat the best performance is achieved by PD-LFDA when7NN classifier is used in this case the overall accuracyis 8379 the kappa coefficient is 8169 and the averageaccuracy is 8991 Moreover the worst algorithm is JGLDAwhose overall accuracy is 6195 the kappa coefficient

Table 3 Kappa coefficient by different dimension reduction meth-ods and different classifiers applied to Indian Pine Scene database()

Item Dims7 9 11 13 15

3NNPCA 6592 658 6612 6624 6681LPP 6084 6422 6382 6219 605LFDA 7145 7586 7664 7697 7718LDA 5913 6126 6137 6118 6048PD-LFDA 7392 7824 7901 7966 8016JGLDA 4778 4992 5082 5006 5119RP 564 5963 6303 6172 6192

7NNPCA 6466 6466 6475 6501 6545LPP 6304 6629 6585 6429 6185LFDA 7047 7457 7515 7557 7557LDA 6132 646 6453 6478 6431PD-LFDA 7417 782 7893 7913 7985JGLDA 5029 519 5161 5159 5263RP 5638 5992 6237 6135 6124

RBF-SVMPCA 7767 747 7393 7636 7900LPP 6677 715 7264 7115 6921LFDA 7486 7915 7924 7822 7584LDA 629 6362 6427 6153 6323PD-LFDA 7551 8011 8159 8187 7899JGLDA 5109 5066 5133 5147 5369RP 653 6908 7085 706 7203

becomes 5651 and the average accuracy is only 6209Other methods such as PCA LPP and RP can produce thecomparable results and no one can outperform the otherHowever LDAoutperformsPCA LPP andRP yielding betterresult Similar conclusions can be achieved in the groupof RBF-SVM Generally proposed PD-LFDA significantlyoutperforms the rest in this experiment which indicates thecorrectness of improvements in proposed PD-LFDA

16 Mathematical Problems in Engineering

Table 4 Performance of dimension reduction on the whole labeled samples ()

Evaluated item MethodologiesPCA LDA LPP LFDA JGLDA RP PD-LFDA

7NNOverall accuracy 6868 739 6875 7923 6195 6709 8379Kappa coefficient 6471 7041 647 7653 5651 6294 8169Average accuracy 737 813 7607 8609 6209 7164 8991

RBF-SVMOverall accuracy 7992 7287 7674 8375 5824 7631 8486Kappa coefficient 7728 6928 7355 8151 5377 7327 8279Average accuracy 857 8071 8288 8822 7121 8365 8968

Finally details of assessment deduced by 7NN and RBF-SVM is summarized in Table 4 where the correspondingoverall accuracy kappa coefficient and average accuracy ofdifferent methods can be located collectively

5 Conclusions

In this paper we have analyzed local Fisher discriminantanalysis (LFDA) and found its weakness By replacing themaximum distance with local variance for the constructionof weight matrix introducing class prior probability into thecomputation of affinitymatrix an improved LFDA algorithmhas been proposed This novel approach is called PD-LFDAbecause the probability distribution (PD) is applied in LFDAalgorithm The proposed approach essentially can increasethe discriminant ability of transformed features in lowdimensional space The pattern found by the new approachis expected to be more accurate and coincides with thecharacter of HSI data and is conducive to classify HSI dataPD-LFDA has been evaluated on a real removing sensingAVIRIS Indian Pine 92AV3C data set We have compared theperformance of the proposed PD-LFDA with that of PCALPP LFDA LDA JGLDA and RP Both the numerical resultsand visual inspection of class maps have been obtained Inthe experiments KNN classifier and SVM classifier havebeen used We have argued that the proposed PD-LFDAexhibits the best performance and serves as a very effectivedimensionality reduction tool for high dimensional datasuch as hyperspectral image (HSI) data

Appendix

Procedure of Proposed Algorithm

The brief description of the algorithm to perform the pro-posed PD-LFDA method is already presented in Section 3The details of the algorithm are provided in Algorithm 2

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work was supported by the Research of University ofMacau under Grants no MYRG205(Y1-L4)-FST11-TYY noMYRG187(Y1-L3)-FST11-TYY and no SRG010-FST11-TYYand by the National Natural Science Foundation of Chinaunder Grant no 61273244 This research project was alsosupported by the Science andTechnologyDevelopment Fund(FDCT) of Macau under Contract no 100-2012-A3

References

[1] W Li S Prasad Z Ye J E Fowler and M Cui ldquoLocality-preserving discriminant analysis for hyperspectral image clas-sification using local spatial informationrdquo in Proceedings ofthe 32nd IEEE International Geoscience and Remote SensingSymposium (IGARSS rsquo12) pp 4134ndash4137MunichGermany July2012

[2] HND LeM S Kim andD-H Kim ldquoComparison of singularvalue decomposition and principal component analysis appliedto hyperspectral imaging of biofilmrdquo in Proceedings of the IEEEPhotonics Conference (IPC rsquo12) pp 6ndash7 2012

[3] C K Chui and J Wang ldquoRandomized anisotropic transformfor nonlinear dimensionality reductionrdquo GEMmdashInternationalJournal on Geomathematics vol 1 no 1 pp 23ndash50 2010

[4] T V Bandos L Bruzzone and G Camps-Valls ldquoClassificationof hyperspectral images with regularized linear discriminantanalysisrdquo IEEE Transactions on Geoscience and Remote Sensingvol 47 no 3 pp 862ndash873 2009

[5] D Guangjun Z Yongsheng and J Song ldquoDimensionalityreduction of hyperspectral data based on ISOMAP algorithmrdquoin Proceedings of the 8th International Conference on ElectronicMeasurement and Instruments (ICEMI rsquo07) pp 3935ndash3938Xirsquoan China August 2007

[6] X Luo and M-F Jiang ldquoThe application of manifold learn-ing in dimensionality analysis for hyperspectral imageryrdquo inProceedings of the International Conference on Remote SensingEnvironment and Transportation Engineering (RSETE rsquo11) pp4572ndash4575 June 2011

[7] J Khodr and R Younes ldquoDimensionality reduction on hyper-spectral images a comparative review based on artificial datasrdquoin Proceedings of the 4th International Congress on Image andSignal Processing (CISP rsquo11) vol 4 pp 1875ndash1883 October 2011

Mathematical Problems in Engineering 17

[8] J Wen Z Tian H She and W Yan ldquoFeature extraction ofhyperspectral images based on preserving neighborhood dis-criminant embeddingrdquo in Proceedings of the 2nd InternationalConference on Image Analysis and Signal Processing (IASP rsquo10)pp 257ndash262 Zhejiang China April 2010

[9] Y-R Yeh S-Y Huang and Y-J Lee ldquoNonlinear dimensionreduction with kernel sliced inverse regressionrdquo IEEE Trans-actions on Knowledge and Data Engineering vol 21 no 11 pp1590ndash1603 2009

[10] J He L Zhang Q Wang and Z Li ldquoUsing diffusion geometriccoordinates for hyperspectral imagery representationrdquo IEEEGeoscience and Remote Sensing Letters vol 6 no 4 pp 767ndash7712009

[11] J Peng P Zhang and N Riedel ldquoDiscriminant learninganalysisrdquo IEEE Transactions on Systems Man and CyberneticsPart B Cybernetics vol 38 no 6 pp 1614ndash1625 2008

[12] F S Tsai and K L Chan ldquoDimensionality reduction techniquesfor data explorationrdquo in Proceedings of the 6th InternationalConference on Information Communications and Signal Process-ing pp 1ndash5 December 2007

[13] M D Farrell Jr and R M Mersereau ldquoOn the impact of PCAdimension reduction for hyperspectral detection of difficulttargetsrdquo IEEE Geoscience and Remote Sensing Letters vol 2 no2 pp 192ndash195 2005

[14] S Prasad and L M Bruce ldquoLimitations of principal com-ponents analysis for hyperspectral target recognitionrdquo IEEEGeoscience and Remote Sensing Letters vol 5 no 4 pp 625ndash629 2008

[15] J Yu Q Tian T Rui and T S Huang ldquoIntegrating discriminantand descriptive information for dimension reduction and clas-sificationrdquo IEEE Transactions on Circuits and Systems for VideoTechnology vol 17 no 3 pp 372ndash377 2007

[16] J Kong S Wang J Wang L Ma B Fu and Y Lu ldquoAnovel approach for face recognition based on supervisedlocality preserving projection and maximummargin criterionrdquoin Proceedings of the International Conference on ComputerEngineering and Technology (ICCET rsquo09) vol 1 pp 419ndash423Singapore January 2009

[17] M Loog and R P W Duin ldquoLinear dimensionality reductionvia a heteroscedastic extension of LDA the Chernoff criterionrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 26 no 6 pp 732ndash739 2004

[18] A C Jensen A Berge and A S Solberg ldquoRegressionapproaches to small sample inverse covariance matrix estima-tion for hyperspectral image classificationrdquo IEEE TransactionsonGeoscience andRemote Sensing vol 46 no 10 pp 2814ndash28222008

[19] J Jin BWang and L Zhang ldquoA novel approach based on Fisherdiscriminant null space for decomposition of mixed pixels inhyperspectral imageryrdquo IEEE Geoscience and Remote SensingLetters vol 7 no 4 pp 699ndash703 2010

[20] J Wen Z Tian X Liu and W Lin ldquoNeighborhood preservingorthogonal pnmf feature extraction for hyperspectral imageclassificationrdquo IEEE Journal of Selected Topics in Applied EarthObservations and Remote Sensing vol 6 no 2 pp 759ndash7682013

[21] Y Ren G Zhang G Yu and X Li ldquoLocal and global structurepreserving based feature selectionrdquoNeurocomputing vol 89 pp147ndash157 2012

[22] Z Fan Y Xu and D Zhang ldquoLocal linear discriminant analysisframework using sample neighborsrdquo IEEE Transactions onNeural Networks vol 22 no 7 pp 1119ndash1132 2011

[23] Y Wang S Huang D Liu and B Wang ldquoResearch advanceon band selection-based dimension reduction of hyperspectralremote sensing imagesrdquo in Proceedings of the 2nd InternationalConference on Remote Sensing Environment and TransportationEngineering (RSETE rsquo12) pp 1ndash4 IEEE Nanjing China June2012

[24] B Waske S van der Linden J A Benediktsson A Rabe andP Hostert ldquoSensitivity of support vector machines to randomfeature selection in classification of hyperspectral datardquo IEEETransactions on Geoscience and Remote Sensing vol 48 no 7pp 2880ndash2889 2010

[25] M Sugiyama ldquoLocal fisher discriminant analysis for superviseddimensionality reductionrdquo in Proceedings of the 23rd Interna-tional Conference onMachine Learning (ICML rsquo06) pp 905ndash912ACM June 2006

[26] Y Cheng ldquoMean shift mode seeking and clusteringrdquo IEEETransactions on Pattern Analysis and Machine Intelligence vol17 no 8 pp 790ndash799 1995

[27] W Li S Prasad J E Fowler and L M Bruce ldquoLocality-preserving dimensionality reduction and classification forhyperspectral image analysisrdquo IEEE Transactions on Geoscienceand Remote Sensing vol 50 no 4 pp 1185ndash1198 2012

[28] L Zelnik-Manor and P Perona ldquoSelf-tuning spectral cluster-ingrdquo in Proceedings of the 18th Annual Conference on NeuralInformation Processing Systems pp 1601ndash1608 December 2004

[29] W K Wong and H T Zhao ldquoSupervised optimal localitypreserving projectionrdquo Pattern Recognition vol 45 no 1 pp186ndash197 2012

[30] X He and P Niyogi ldquoLocality preserving projectionsrdquo inAdvances in Neural Information Processing Systems 16 SThrunL Saul and B Scholkopf Eds MIT Press Cambridge MassUSA 2004

[31] H Wang S Chen Z Hu and W Zheng ldquoLocality-preservedmaximum information projectionrdquo IEEE Transactions on Neu-ral Networks vol 19 no 4 pp 571ndash585 2008

[32] M Sugiyama ldquoDimensionality reduction ofmultimodal labeleddata by local fisher discriminant analysisrdquo Journal of MachineLearning Research vol 8 pp 1027ndash1061 2007

[33] S Yan D Xu B Zhang H J Zhang Q Yang and S Lin ldquoGraphembedding and extensions a general framework for dimen-sionality reductionrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 29 no 1 pp 40ndash51 2007

[34] C K Chui and J Wang ldquoDimensionality reduction of hyper-spectral imagery data for feature classificationrdquo in Handbookof Geomathematics pp 1005ndash1047 Springer Heidelberg Ger-many 2010

[35] C-C Chang and C-J Lin ldquoLIBSVM a Library for supportvector machinesrdquo ACM Transactions on Intelligent Systems andTechnology vol 2 no 3 article 27 2011

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 3: Research Article Subspace Learning via Local Probability ...downloads.hindawi.com/journals/mpe/2015/145136.pdf · and applicable toolkits were engendered one a er another. Hyperspectral

Mathematical Problems in Engineering 3

the number of all samples is the total sum of each class Let 119909119888119894

be the 119894th sample of the 119888th class Then the correspondingsample mean becomes 119898

119888= (1119873

119888) sum119873119888

119894=1119909119888

119894 yet the data

center of all samples is denoted by 119898 = (1119873)sum119873

119894=1119909119894

Suppose that the data set 119883 in 119901-dimensional hyperspaceis distributed on a low 119902-dimensional subspace A generalproblem of linear discriminant is to find a transformation119879 isin R119901times119902 that maps the 119901-dimensional data into a low119902-dimensional subspace data by 119884 = 119879

119879119883 such that

each 119910119894represents 119909

119894without losing useful information The

transformed matrix 119879 is pursued by different methods anddifferent objective function resulting in different algorithm

21 Fisherrsquos Linear Discriminant Analysis (LDA) LDA intro-duces the within-scatter matrix 119878

119908and between-scatter

matrix 119878119887to describe the distribution of data samples

119878119908=

119862

sum

119888=1

119873119888

sum

119894=1

(119909119888

119894minus 119898119888) (119909119888

119894minus 119898119888)119879 (1)

119878119887=

119862

sum

119888=1

119873119888(119898119888minus 119898) (119898

119888minus 119898)119879 (2)

Fisher criterion seeks a transformation 119879 that maximizedthe between-class scatter while minimized the within-classscatter This can be achieved by optimizing the followingobjective function

119879LDA = arg max119879isinR119901times119902

tr (119879119879119878119887119879)

tr (119879119879119878119908119879) (3)

It is implicitly assumed that 119879119879119878119908119879 is full rank Under

this assumption the problem can then be attributed to thegeneralized eigenvectors 120593

1 1205932 120593

119889 by solving

119878119887120593 = 120582119878

119908120593 (4)

Finally the solution of 119879LDA is given by 119879LDA =

1205931 1205932 120593

119902 which are associated with the first 119902

largest eigenvalues 1205821ge 1205822ge sdot sdot sdot ge 120582

119902 Since the rank of

between-class scatter matrix 119878119887is at most 119862 minus 1 there are

119862minus 1meaningful features in conventional LDA To deal withthis issue a regularization procedure is essential in practice

22 Locality Preserve Projection (LPP) Adrawback of LDA isthat it does not consider the local structure among data points[29] and the distribution of realHSI data is oftenmultimodalLocality preserving projection meets this requirement [30]The goal of LPP is to preserve the local structure of neighbor-hood points Toward this goal a graph ismodeled explicitly todescribe the relationship using 119896-nearest neighborhood Let119860 denote the affinity matrix where 119860(119894 119895) isin [0 1] representsthe similarity between points119909

119894and119909119895The larger the value of

119860(119894 119895) the closer the relationship between 119909119894and119909119895 A simple

and effective way to define affinity matrix 119860 is given by

119860 (119894 119895) =

exp(minus10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

1205722) if 119909

119894isin KNN (119909

119895 119896)

or 119909119895isin KNN (119909

119894 119896)

0 otherwise(5)

where lowast 2 denotes the square 2-norm Euclidean distance120572 is a tuning parameter and KNN(119909 119896) represents the 119870-nearest neighborhoods of 119909 under parameter 119896

The transformed matrix of LPP is achieved in the follow-ing criterion [31]

119879LPP = arg min119879isinR119901times119902

1

2

119899

sum

119894119895=1

119860 (119894 119895)10038171003817100381710038171003817119910119894minus 119910119895

10038171003817100381710038171003817

2

st 119879119879119883119863119883119879119879 = 119868

(6)

where 119863 = diag(119863119894119894) is a diagonal matrix whose entries are

the column sum (also can be a row sum since119860 is symmetric)of 119860 that is 119863

119894119894= sum119895119860119894119895 Arbitrary scaling invariance and

degeneracy are guaranteed by the constraint of (6)The solution of LPP problem can be gained by solving the

eigenvector problem of

119883119871119883119879120593 = 120582119883119863119883

119879120593 (7)

where 119871 equiv 119863 minus 119860 denotes the graph-Laplacian matrix inthe community of spectral analysis and can be viewed as thediscrete version of Laplace Beltrami operator on a compactRimannian manifold [29] And finally the transformationmatrix 119879 is given by 119879LPP = 1205931 1205932 120593119902 isin R119901times119902 thatcorrespond to eigenvalue 0 = 120582

0le 1205821le 1205822le sdot sdot sdot le 120582

119902le

sdot sdot sdot le 120582119896

23 Local Fisher Discriminant Analysis (LFDA) Local Fisherdiscriminant analysis (LFDA) [32] measures the ldquoweightsrdquoof two data points by the corresponding distance and thenthe affinity matrix is calculated by these weights Notethat the ldquopairwiserdquo representation of within-scatter matrixand between-scatter matrix is very important for LFDAFollowing simple algebra steps the within-scatter matrix (1)of LDA can be transformed into the following forms

119878119908=

119862

sum

119888=1

119873119888

sum

119894=1

(119909119888

119894minus 119898119888) (119909119888

119894minus 119898119888)119879

=

119862

sum

119888=1

119873119888

sum

119894=1

(119909119888

119894minus1

119873119888

119873119888

sum

119895=1

119909119888

119895)(119909

119888

119894minus1

119873119888

119873119888

sum

119895=1

119909119888

119895)

119879

=

119873

sum

119894=1

119909119894119909119879

119894minus

119862

sum

119888=1

1

119873119888

119873119888

sum

119894119895=1

119909119888

119894119909119888

119895

119879

=

119873

sum

119894=1

(

119873

sum

119895=1

119875119908(119894 119895))119909

119894119909119879

119894minus

119873

sum

119894119895=1

119875119908(119894 119895) 119909

119894119909119879

119895

4 Mathematical Problems in Engineering

=1

2

119873

sum

119894119895=1

119875119908(119894 119895) (119909

119894119909119879

119894+ 119909119895119909119879

119895minus 119909119894119909119879

119895minus 119909119895119909119879

119894)

=1

2

119873

sum

119894119895=1

119875119908(119894 119895) (119909

119894minus 119909119895) (119909119894minus 119909119895)119879

(8)

where

119875119908(119894 119895) =

1

119873119888 if 119897

119894= 119897119895= 119888

0 if 119897119894= 119897119895

(9)

Let 119878119905be the total mixed matrix of LDA and then we gain

119878119887= 119878119905minus 119878119908

=1

2

119873

sum

119894119895=1

119875119887(119894 119895) (119909

119894minus 119909119895) (119909119894minus 119909119895)119879

(10)

where

119875119887(119894 119895) =

1

119873minus1

119873119888 if 119897

119894= 119897119895= 119888

1

119873 if 119897

119894= 119897119895

(11)

LFDA is achieved by weighting the pairwise data points

119878119908=1

2

119873

sum

119894119895=1

119908(119894 119895) (119909

119894minus 119909119895) (119909119894minus 119909119895)119879

119878119887=1

2

119873

sum

119894119895=1

119887(119894 119895) (119909

119894minus 119909119895) (119909119894minus 119909119895)119879

(12)

where 119908(119894 119895) and

119887(119894 119895) denote the weight matrix of

different pairwise points for the within-class samples andbetween-class samples respectively

119908(119894 119895) equiv

119882(119894 119895)

119873119888 if 119897

119894= 119897119895= 119888

0 if 119897119894= 119897119895

(13)

119887(119894 119895) equiv

119882(119894 119895) (1

119873minus1

119873119888) if 119897

119894= 119897119895= 119888

1

119873 if 119897

119894= 119897119895

(14)

where119882 indicates the affinity matrixThe construction of119882is critical for the performance of classified accuracy therebythe investigation of construction is in great need to be furtherelaborated in the following section

3 Proposed Scheme

The calculation of (13) and (14) is very important to theperformance of LFDA There are many methods to computethe affinitymatrix119882The simplest one is that119882 is equivalentto a constant that is

119882(119894 119895) equiv 119886 (15)

where 119886 in the above equation is a real nonnegative numberHowever the equations of (13) and (14) are derived to thestate-of-the-art Fisherrsquos linear discriminant analysis underthis construction

Another construction adopts the heat kernel derivedfrom LPP

119882(119894 119895) = exp(minus10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

1205902) (16)

where 120590 is a tuning parameter Yet the affinity is valued by thedistance of data points and the computation is too simple torepresent the locality of data patches Amore adaptive version[26] of (16) is proposed as follows

119882(119894 119895) =

exp(minus10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

1205902) if 119909

119894isin KNN (119909

119895 119870)

or 119909119895isin KNN (119909

119894 119870)

0 otherwise(17)

Compared with the former computation (17) is in con-junction with 119870-nearest data points which is computation-ally fast and light Moreover the property of local patches canbe characterized by (17) However the affinity defined in (16)and (17) is globally computed thus it may be apt to overfitthe training points and be sensitive to noise Furthermorethe density of HSI data pointsmay vary according to differentpatches Hence a local scaling technique is proposed inLFDA to cope with this issue [29] where the sophisticatedcomputation is given by

119882(119894 119895) =

exp(minus10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

) if 119909119894isin KNN (119909

119895 119870)

or 119909119895isin KNN (119909

119894 119870)

0 otherwise(18)

where 120588119894denotes the local scaling around the corresponding

sample 119909119894with the following definition

120588119894=10038171003817100381710038171003817119909119894minus 119909119870

119894

10038171003817100381710038171003817 (19)

where 119909119870119894represents the 119870th nearest neighbor of 119909

119894 lowast 2

denotes the square Euclidean distance and119870 is a self-tuningpredefined parameter

To simplify the calculation many researches considered afixed value of119870 and a recommended value of119870 = 7 is studiedin [1 28] Note that 120588

119894is used to represent the distribution

of local data around sample 119909119894 However the above work

ignored the distribution around each individual sample Thediversity of adjacent HSI pixels is approximate thus thespectrum of the neighboring landmarks has great similarityThat is the pixels of HSI data which have resembling spec-trums tend to be of the same landmark This phenomenonindicates that the adjacency of local patches not only lies in

Mathematical Problems in Engineering 5

the spectrum space but also in the spatial space For a localpoint the calculation of making use of the diversity of its119870thnearest neighborhoods is not fully correct

An evident example is illustrated in Figure 1 where twogroups of points have different distributions In group (a)most neighbor points are closed to point 119909

0 while in group

(b) most neighbor points are far from point 1199090 However the

measurement of two cases are the same according to (19)Thiscan be found in Figure 1 where the distances between point1199090and its 119870th nearest neighborhoods (119870 = 7) are same in

both distributions which can be shown in Figures 1(a) and1(b)119871

1= 1198712This example indicates that the simplification of

local distribution by the distance between the sample 119909119894and

the 119870th nearest neighbor sample is unreasonable Actuallythe result by using of this simplification may raise someerrors

Based on the discussion above a novel approach whichis called PD-LFDA is proposed to overcome the weakness ofLFDA To be specific PD-LFDA incorporates two key pointsnamely

(1) The class prior probability is applied to compute theaffinity matrix

(2) The distribution of local patch is represented by theldquolocal variancerdquo instead of the ldquofarthest distancerdquo toconstruct the weight matrix

The proposed approach essentially increases the discriminantability of transformed features in low dimensional space Thepattern found by PD-LFDA is expected to be more accurateand coincids with the character of HSI data and is conduciveto classify HSI data

In this way a more sophisticated construction of affinitymatrix which is derived from [29] is proposed as follows

119882(119894 119895)

=

119901 (119897119894)2 exp(minus

10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

)

sdot(1 + exp(minus10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

)) if 119897119894= 119897119895= 119888

0 if 119897119894= 119897119895

(20)

where 119901(119897119894) stands for the class prior probability of class 119909

119894

and 120588119894indicates the local variance Note that the denominator

item of (13) is 1119873119888 which will cancel out our prior effectif we use 119901(119897

119894) to replace 119901(119897

119894)2 (the construction of 119901(119897

119894)

will be given in (21)) Different part of this derivation playsthe same role as the original formulation for example forthe last item on one hand it plays the role of intraclassdiscriminating weight and on the other hand the productresult of119882 may reach zero if the Euclidean square distance sdot is very small for some data points For this case an extraitem (1 + exp(minus119909

119894minus 1199091198952120588119894120588119895)) is added to the construc-

tion of intraclass discriminating weight to prevent accuracytruncation By doing so our derivation can be viewed as an

integration of class prior probability the local weight andthe discriminating weight This construction is expected topreserve both the local neighborhood structure and the classinformation Besides this construction is expected to sharethe same advantages detailed in the original work

It is clear that (20) consists of two new factors comparedwith LFDA method (1) class prior probability 119901(119897

119894) and (2)

local variance 120588119894

Suppose class 119909119894to be class 119888 that is 119897

119894= 119888 so that the

probability of class 119909119894can be calculated by

119901 (119897119894) = 119901 (119888) =

119873119888

119873 (21)

where 119873119888 is the number of the samples in class 119888 whole 119873denotes the total number of samples and119873 = sum119862

119888=1119873119888

Please note that the item (1+exp(minus119909119894minus1199091198952120588119894120588119895)) in (20)

is used to prevent the extra rounding error produced from thefirst two items and to keep the total value of

119901 (119897119894)2 exp(minus

10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

)(1 + exp(minus10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

))

(22)

which does not reach the minimum Here 120588lowastdenotes the

local scaling around 119909lowast In this paper a local scaling 119909

lowastis

measured by the standard deviation of local square distanceAssume that 119909(119894)

1 119909(119894)

2 119909

(119894)

119870are the119870-nearest samples of 119909

119894

and then the square distance between 119909119894and 119909(119894)

119896is given by

119889(119894)

119896=10038171003817100381710038171003817119909119894minus 119909(119894)

119896

10038171003817100381710038171003817

2

119896 = 1 2 119870 (23)

The corresponding mean (119894) can be defined as

(119894)=1

119870

119870

sum

119896=1

119889(119894)

119896

=1

119870

119870

sum

119896=1

10038171003817100381710038171003817119909119894minus 119909(119894)

119896

10038171003817100381710038171003817

2

(24)

where lowast 2 represents a square Euclidean distance and 119870 isa predefined parameter whose recommended value is 119870 = 7The standard deviation can be calculated as

120588119894= radic

1

119870

119870

sum

119896=1

(119889(119894)

119896minus (119894))

2

(25)

Note that in the above equation the item 1119870 becomesa constant that can be shifted outside Thus an equivalentformula is given by

120588119894=1

119870radic

119870

sum

119896=1

(119889(119894)

119896minus (119894))

2

(26)

Similar procedure can be deduced to 119909119895 Hence we have

120588119898=1

119870

119870

sum

119894=1

radic100381710038171003817100381710038171003817119909(119898119896)

119894

100381710038171003817100381710038171003817

2

minus1

119870

119870

sum

119895=1

100381710038171003817100381710038171003817119909(119898119896)

119895

100381710038171003817100381710038171003817

2

(27)

6 Mathematical Problems in Engineering

x1

x2

x3

x4

x5

x6

x7

L1

x0

(a)

x1

x2

x3

x4

x5

x6

x7

L2

x0

(b)

Figure 1 Different distributions of 1199090and the corresponding 119870-nearest neighborhoods (119870 = 7) (a) Most neighbors are closed to point 119909

0

(b) Most neighbors are far from point 1199090 The distances between point 119909

0and its 119870th nearest neighbors are the same in both distributions

1198711= 1198712

Comparing (19) with (27) it is noticeable that (28) holds

120588119898le 120588119894 (28)

Compared with the former definitions our definition has atleast the following advantages

(i) By incorporating the prior probability of each classwith local technique 119901(119897

119894) the proposed scheme is

expect to be a benefit for the classified accuracy(ii) The representation of local patches equation (26) is

described by local standard deviation 120588119894rather than

absolute diversity in (19) which is more accurate inmeasuring the local variance of data samples

(iii) Compared with the global calculation the proposedcalculation is taken on local patches which is efficientin getting rid of over-fitting

(iv) The proposed local scaling technique meets the char-acter of HSI data which is more applicable for theprocessing of hyperspectral image in real applications

Based on the above affinity defined an extended affinitymatrix can also be defined in a similar way Our definitiononly provides a heuristic exploration for reference Theaffinity can be further sparse for example by introducing theidea of 120576-nearest neighborhoods [31]

Theoptimal solution of improved scheme can be achievedby maximize the following criterion

119879PD-LFDA equiv arg max119879isinR119901times119902

tr (119879119879119878119887119879)

tr (119879119879119878119908119879)

(29)

It is evident that (29) has the similar form of (3) Thisfinding enlightens us that the transformation119879 can be simplyachieved by solving the generalized eigenvalue decomposi-tion of 119878minus1

119908119878119887 Moreover Let 119866 isin R119902times119902 be a 119902-dimensional

invertible square matrix It is clear that 119879PD-LFDA119866 is alsoan optimal solution of (29) This property indicates that

the optimal solution is not uniquely determined becauseof arbitrary arithmetic transformation of 119879PD-LFDA119866 Let 120593119894be the eigenvector of 119878minus1

119908119878119887corresponding to eigenvalue

119894

that is 119878119887120593119894= 119894119878119908120593119894 To cope with this issue a rescaling

procedure is adopted [25] Each eigenvector 120593119894119902

119894=1is rescaled

to satisfy the following constraint

120593119894119878119908120593119895= 1 if 119894 = 1198950 if 119894 = 119895

(30)

Then each eigenvector is weighted by the square root of itsassociated eigenvalue The transformed matrix 119879PD-LFDA ofthe proposed scheme is finally given by

119879PD-LFDA = radic11205931 radic21205932 radic1199021120593119902 isin R119901times119902 (31)

with descending order 1ge 2ge sdot sdot sdot ge

119902

For a new testing points 119909 the projected point in the newfeature space can be captured by 119910 = 119879119879PFDA119909 thus it can befurther analyzed in the transformed space

According to the above analysis we can design an algo-rithm which is called PD-LFDA Algorithm to perform ourproposed method The detailed description of this algorithmcan be found in the appendix (Algorithm 2) A summary ofthe calculation steps of PD-LFDA Algorithm is presented inAlgorithm 1

The advantage of PD-LFDA is discussed as followsFirstly to investigate the rank of the between-class scatter

matrix 119878119887of LDA 119878

119887can be rewritten as

119878119887=

119862

sum

119897=1

119873119897(119898119897minus 119898) (119898

119897minus 119898)119879

= [1198731(119898119897minus 119898) 119873

2(119898119897minus 119898) 119873

119871(119898119897minus 119898)]

sdot [1198731(119898119897minus 119898) 119873

2(119898119897minus 119898) 119873

119871(119898119897minus 119898)]

119879

(32)

Mathematical Problems in Engineering 7

Input HSI training samples119883 isin R119901times119873 dimensionality to be embedded 119902 the parameter 119870 of 119870NNand a test sample 119909

119905isin R119901

Step 1 For each sample 119909119894from the same class calculate the119882

119894119895by (20)

where the local scaling factor 120588119894is calculated via (26) or (27)

Step 2 Equations (13) and (14) can be globally and uniformly transformed into an equivalent formula via119908= 119882 sdot 119882

1

119887= 119882 sdot (119882

2minus1198821)

(i)

where the operator 119860 sdot 119861 denotes the dot product between 119860 and 119861 and

119882(119894 119895) =

119901(119897119894)2 exp(minus

10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

)(1 + exp(minus10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

))

if 119897119894= 119897119895= 119888

if 119897119894= 119897119895

0

(iia)

1198821=

1

119873119888 if 119897

119894= 119897119895= 119888

0 others(iib)

1198822=1

119873(1119873times1

11times119873) (iic)

By the above formulas the product of elements in different matrices can be achieved via dot productbetween matrices The equations (iia) (iib) and (iic) can be gained by integrating the number ofeach class119873119888 the number of total training samples119873 and the local scaling 120588

119894 then matrices119882119882

11198822

can be calculatedStep 3 Construct within-scatter matrix

119908and between-scatter matrix

119887 according to (i)

Step 4 Define Laplacian matrices 119871lowast below119871lowast= 119863lowastminus lowast (iii)

where119863lowast is the row sum or column sum of119882lowast119863lowast119894119894= sum119895lowast(119894 119895) (or119863lowast

119894119894= sum119894lowast(119894 119895)) and the

notation lowast denotes one letter in 119908 119887Step 5 On the basis of (29) and (30) the transformation matrix can be achieved viaeigenvectors 119879 = radic

11205931 radic21205932 radic

1199021120593119902 119879 isin R119901times119902 that corresponding the 119902 leading

eigenvalues 119887120593119894= 119894119908120593119894in solving the general problem of

119887120593119894= 119894119908120593119894

Step 6 For a testing sample 119909119905isin R119901 the extracted feature is 119911

119905= 119879119879119909119905isin R119902

Output Transformation matrix 119879 and the extracted feature 119911119905

Algorithm 1 PD-LFDA Algorithm

Thereby

rank (119878119887) ⩽ rank ([119873

1(119898119897minus 119898) 119873

2(119898119897minus 119898)

119873119871(119898119897minus 119898)]) ⩽ 119862 minus 1

(33)

It is easy to infer that the rank of the between-class scattermatrix 119878

119887is119862minus1 atmost thus there are up to119862minus1meaningful

subfeatures that can be extracted Thanks to the help ofaffinity matrix 119882 when compared with the conventionalLDA the reduced subspace of proposed PD-LFDA can beany subdimensional space On the other hand the classicallocal fisherrsquos linear discriminant only weights the value ofsample pairs in the same classes while our method also takesin account the sample pairs in different classes Hereafter theproposedmethod will bemore flexible and the results will bemore adaptiveThe objective function of proposed method isquite similar to the conventional LDA hereby the optimalsolution is almost same as the conventional LDA whichindicates that it is also simple to implement and easy to revise

To further explore the relationship of LDA and PD-LFDAwe now rewrite the objective function of LDAandPD-LFDA respectively

119879LDA = arg max119879isinR119901times119902

trace 119879119879119878119887119879

subject to 119879119879119878119908119879 = 119868

(34)

119879PD-LFDA equiv arg max119879isinR119901times119902

trace 119879119879119878119887119879

subject to 119879119879119878119908119879 = 119868

(35)

This implies that LDA tries to maximize the between-class scatter and simultaneously constraint the within-classscatter to a certain level However such restriction is hardto constraint and no relaxation is imposed When the data isnot a single modal that is multimodal or unknown modalLDA often fails On the other hand benefiting from theflexible designing of affinity matrix119882 PD-LFDA gains morefreedom in (35) That is the separability of PD-LFDA will bemore distinct and the degree of freedom remains more than

8 Mathematical Problems in Engineering

Input HSI data samples119883 = 1199091 1199092 119909

119873 isin R119901times119873 the objective dimension to be embedded 119902

the nearest neighbor parameter 119870 (default 119870 equiv 7) and the test sample 119909119905isin R119901

Output Transformation matrix 119879 isin R119901times119902Steps are as follows

(1) Initialize matrices(2) 119878119908larr 0119901times119901

within-class scatter(3) 119878119887larr 0119901times119901

between-class scatter(4)(5) Compute within-class affinity matrix119882119908(6) for 119888 = 1 2 119862 do in a classwise manner(7) 119909

119888

119894119873119888

119894=1larr 119909

119895| 119897119895= 119888 the 119888th class data samples

(8) 119883 larr 119909119888

1 119909119888

2 119909

119888

119873119888 sample matrix

(9) 119882119894=(1119873119888times111times119873119888)

119873119888

(10)(11) Determine the local scaling(12) for 119894 = 1 2 119873119888 do(13) 119909

(119894)

119896larr the 119896th nearest neighbor of 119909119888

119894 119896 = 1 2 119870

(14) for 119896 = 1 2 119870 do(15) 119889

(119894)

119896= 119909119894minus 119909(119894)

1198962

(16) end for

(17) (119894)larr1

119870

119870

sum

119896=1

10038171003817100381710038171003817119909119894minus 119909(119894)

119896

10038171003817100381710038171003817

2

(18) 120588(119894)= radic

1

119870

119870

sum

119896=1

(119889(119894)

119896minus (119894))2

(19) end for(20)(21) Define local affinity matrix(22) for 119894 119895 = 1 2 119873119888 do(23) 119901(119897

119894) larr 119873

119888119873 prior probability

(24) 119860119894119895larr 119901(119897

119894) exp(minus

10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

)(1 + exp(minus10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

))

(25) end for(26) 119860

119888= 119860

(27) end for(28)119882

119908= diag119882

11198822 119882

119862 in block diagonal manner

(29) 119860119908= diag119860

1 1198602 119860

119862 also in block diagonal manner

(30) for 119894 119895 = 1 2 119873 do(31)

119908(119894 119895) larr 119860

119908(119894 119895)119882

119908(119894 119895)

(32) end for(33)(34) Compute between-class affinity matrix119882119887

(35)119882119887larr(1119873times1

11times119873)

119873minus diag119882

11198822 119882

119862

(36) Let 119865nz denotes the nonzero flag of elements in119882119908 119865nz(119894 119895) = 1 if119882119908(119894 119895) = 0 119865nz(119894 119895) = 0 if119882119908(119894 119895) = 0

(37) 119865nz larr (1198821015840

119887= 0)

(38) 119860119887larr 0119873times119873

(39) 119860

119887(119865nz) = 119860119908(119865nz)

(40) 119860119887(not119865nz) = 1

(41) for 119894 119895 = 1 2 119873 do(42)

119887(119894 119895) larr 119860

119887(119894 119895)119882

119887(119894 119895)

(43) end for(44) Now construct Laplacian matrix for within affinity matrix

119908and between affinity matrix

119887

(45) Let(46)119863119908

119894119894= sum

119895

119908(119894 119895)119863119887

119894119894= sum

119895

119887(119894 119895)

(47) Then

Algorithm 2 Continued

Mathematical Problems in Engineering 9

(48) 119871119908= 119908minus 119863119908 119871119887= 119887minus 119863119887

(49) Construct two matrixs below(50) 119878

119887= 119883119871

119887119883119879 119878119908= 119883119871

119908119883119879

(51) Let 1205931 1205932 120593

119902 be the general eigenvector of

(52) 119878119887120593119894= 119894119878119908120593119894 forall119894 isin 1 2 119902

(53) with the corresponding eigenvalue in descending order 1ge 2ge sdot sdot sdot ge

119902

(54)(55) Finally the transformation matrix can be represented as(56) 119879 = radic120582

11205931 radic12058211205932 radic120582

1120593119902 isin R119901times119902

(57)(58) For a new test sample 119909

119905 the embedding 119911

119905is given by

(59) 119911119905= 119879119879119909119905isin R119902

Algorithm 2 Proposed PD-LFDA method

the conventional LDA thus our method is expected to bemore robust and significantly preponderant

For large scale data sets we discuss a scheme that canaccelerate the computation procedure of the within-scattermatrix 119878

119908 In our algorithm owning to the fact that we have

put penalty on the affinity matrix for different class samplesin constructing the between-scatter matrix the acceleratedprocedure will remain for further discussion

The within-class scatter 119878119908can be reformulated as

119878119908=1

2

119873

sum

119894119895=1

119908(119894 119895) (119909

119894minus 119909119895) (119909119894minus 119909119895)119879

=1

2

119873

sum

119894119895=1

119908(119894 119895) (119909

119894119909119879

119894+ 119909119895119909119879

119895minus 119909119894119909119879

119895minus 119909119895119909119879

119894)

=

119873

sum

119894=1

(

119873

sum

119895=1

119908(119894 119895))119909

119894119909119879

119894minus

119873

sum

119894119895=1

119908(119894 119895) 119909

119894119909119879

119895

= 119883(119863119908minus 119908)119883119879

= 119883119908119883119879

(36)

Here

119863119908 (119894 119894) =

119873

sum

119895=1

119908(119894 119895)

119908= 119863119908minus 119908

(37)

119908can be block diagonal if all samples 119909

119894119873

119894=1are sorted

according to their labels This property implies that 119863119908and

119908can also be block diagonal matrix Hence if we compute

119878119908

through (36) then the procedure will be much moreefficient Similarly 119878

119887can also be formulated as

119878119887= 119883

119887119883119879

= 119883(119863119887minus 119887)119883119879

(38)

Nevertheless 119887is dense and can not be further sim-

plified However the simplified computational procedure of119908

saves for us part of time in a way In this paper weadopt the above procedure to accelerate 119878

119908and pursue 119878

119887

normally In addition to locality structure some papers showthat another property for example marginal information isalso important and should be preserved in the reduced spaceThe theory of extended LDA and LPP algorithm is developedrapidly recently Yan et al [33] summarized these algorithmsin a graph embedding framework and also proposed amarginal fisher analysis embedding (MFA) algorithm underthis framework

In MFA the criterion is characterized by intraclasscompactness and interclass marginal superability which isreplaced for thewithin-class scatter and between-class scatterseverally The intraclass relationship is reflected by an intrin-sic graph which is constructed by 119870-nearest neighborhoodsample data points in the same class while the interclasssuperability is mirrored by a penalty graph computed formarginal points from different classes Following this ideathe intraclass compactness is given as follows

119878119894= sum

119894119895 119894isin119873(119896)(119895)or 119895isin119873(119896)(119894)

10038171003817100381710038171003817119879119879119909119894minus 119879119879119909119895

10038171003817100381710038171003817

2

= 2119879119879119883 (119863 minus119882)119883

119879119879

(39)

where

119882(119894 119895) = 1 if 119894 isin 119873(119896) (119895) or 119895 isin 119873(119896) (119894) 0 otherwise

(40)

Here 119873(119896)(119895) represents the 119870-nearest neighborhood indexset of 119909

119895from the same class and119863 is the row sum (or column

sum) of 119882 119863(119894 119894) = sum119894119882119894119895 Interclass separability is indi-

cated by a penalty graph whose term is expressed as follows

119878119890= sum

119894119895 (119894119895)isin119875(119896)(119897119895)

or (119894119895)isin119875(119896)(119897119894)

10038171003817100381710038171003817119879119879119909119894minus 119879119879119909119895

10038171003817100381710038171003817

2

= 2119879119879119883(119863 minus )119883

119879119879

(41)

10 Mathematical Problems in Engineering

of which

(119894 119895) = 1 if (119894 119895) isin 119875(119896) (119897

119895) or (119894 119895) isin 119875(119896) (119897

119894)

0 otherwise(42)

Note that 119878119894and 119878

119890are corresponding to ldquowithin-scatter

matrixrdquo and ldquobetween-scatter matrixrdquo of the traditional LDAalternatively The optimal solution of MFA can be achievedby solving the following minimization problem That is

= arg119879

min 119879119879119883(119863 minus119882)119883

119879119879

119879119879119883(119863 minus )119883119879119879

(43)

We know that (43) is also a general eigenvalue decompositionproblem Let 119879PCA indicate the transformation matrix fromthe original space to PCA subspace with certain energyremaining and then the final projection of MFA is output as

119879MFA = 119879PCA (44)

As can be seenMFA constructs twoweightedmatrices119882and according to the intraclass compactness and interclassseparability In LFDA and PD-LFDA only one affinity isconstructed The difference lies in that the ldquoweightrdquo in LFDAand PD-LFDA is in the range of [0 1] according to the levelof difference Yet MFA distributes the same weight to its 119870-nearest neighborhoodsThe optimal solution ofMFA LFDAand PD-LFDA can be attributed to a general eigenvaluedecomposition problem Hence the idea of MFA LFDA andPD-LFDA is approximately similar in a certain interpretationRelationship with other methodologies can be analyzed in ananalogous way

4 Experimental Results

To illustrate the performance of PD-LFDA experimentson a real hyperspectral remote sensing image data setmdashAVIRIS Indian Pine 1992 are conducted in this sectionThe AVIRIS Indian Pines 1992 data set was gathered byNational Aeronautics and Space Administrator (NASA) withAirborne VisibleInfrared Imaging Spectrometer (AVIRIS)sensor over the Indian Pine test site in northwest Indians inJune 1992 This data set consists of 145 times 145 pixels and 224spectral reflectance bands ranging from 04 120583m to 245 120583mwith a spatial resolution of 20m The Indian Pines scene iscomposed of two-third agriculture and one-third forest orother natural perennial vegetation Some other landmarkssuch as dual lane highways rail line low density housing andsmaller roads are also in this image Since the scene was takenin June some main crops for example soybeans and cornare in their early growth stage with less than 5 coveragewhile the no-till min-till and clean-till indicate the amountof previous crop residue remaining The region map can bereferred to Figure 5(a) The 20 water absorption bands (ie[108ndash112 154ndash167] 224) were discarded

In this section performance of different dimensionreduction methods that is PCA LPP LFDA LDA JGLDAand RP [34] is compared with the proposed PD-LFDA

Table 1 Training set in AVIRIS Indian Pines 1992 database

ID Class Name1 Alfalfa (4618 3913)2 Corn-notill (1428136 952)3 Corn-mintill (83087 1048)4 Corn (23734 1434)5 Grass-pasture (48354 1118)6 Grass-trees (73071 973)7 Grass-pasture-mowed (2817 6071)8 Hay-windrowed (47850 1046)9 Oats (2015 7500)10 Soybean-notill (97286 884)11 Soybean-mintill (2455214 872)12 Soybean-clean (59354 911)13 Wheat (20528 1366)14 Woods (1265102 806)15 Buildings-Grass-Trees-Drives (38639 1010)16 Stone-Steel-Towers (9324 2581)Total 102491029 1004Numerical value in this table refers to number of samples number oftraining samples and pc respectively

Classification accuracy is reported via the concrete classifiersGenerally many dimension reduction research papers adopt119870-nearest neighborhood classifier (KNN) and support vectormachine (SVM) classifier to measure the performance of theextracting features after the dimension reduction where theoverall accuracy and kappa coefficient are detailed in thereports Hereby in this paper we also adopt KNN classifierand SVM classifier for performance measurement In KNNclassifier we select the value of 119870 as 1 5 and 9 so thatthree classifiers based on nearest neighborhoods are formedwhich are called 1NN 5NN and 9NN In SVM classifier weseek a hyperplane to separate classes in kernel-induced spacewhere the linear nonseparable classes in the original featurespace can be separated via kernel trick SVM as a robustand successful classifier has been widely used to evaluatethe performance of multifarious methods in many areas Forsimplicity and convenience we use LIBSVM package [35] forexperiments Accuracy of dimension reduced performancewill be reported by classified performance from SVM clas-sifier In the following schedule the feature subspace will becalculated at the first step from training samples by differentdimensional algorithms Table 1 gives a numerical statisticsof training samples corresponding to each class Then thenew sample will be projected into a low subspace by thetransformedmatrix Finally all the new samples are classifiedby SVM classifier

In this experiment a total of 1029 samples were selectedfor training and the remaining samples are used for testingNote that all the labeled samples in database are unbalancedand the available samples of each category differ dramaticallyThe following strategy is imposed for sample division A fixednumber of 15 samples are randomly selected to form thetraining sample yet the absent samples are randomly selected

Mathematical Problems in Engineering 11

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

(a) 1NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

(b) 5NN

7 9 11 13 15Reduced space

0

01

02

03

04

05

06

07

08

09

1

Ove

rall

accu

racy

(c) 9NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

(d) Liner SVM

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

PCALPPLFDA

LDAPD-LFDAJGLDA

RP

(e) Polynomial SVM

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

PCALPPLFDA

LDAPD-LFDAJGLDA

RP

(f) RBF-SVM

Figure 2 Overall accuracy by different dimension reduction methods and different classifiers applied to AVIRIS Indian Pines database

12 Mathematical Problems in Engineering

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

(a) 1NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

(b) 5NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

(c) 9NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

(d) Linear SVM

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

PCALPPLFDA

LDAPD-LFDAJGLDA

RP

(e) Polynomial SVM

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

PCALPPLFDA

LDAPD-LFDAJGLDA

RP

(f) RBF SVM

Figure 3 Kappa coefficient by different dimension reduction methods and different classifiers applied to AVIRIS Indian Pines database

Mathematical Problems in Engineering 13

(a) Pseudo 3-channel color image band = [12 79 140]

1

2

2

2

2

2 2

3

3

3

3

3

4

5

5

5

5

6

6 6 6

7

8

9

10

10

10 10

11

11

11

11

11 12

12 12

13 14

14

14 15

15

16

(b) Ground truth

(c) Distribution of training sample

Figure 4 Illustration of sample partition

from the remaining samples Under this strategy the trainingsamples and testing samples are listed in Table 1

Figure 2 shows the overall accuracy of different dimen-sion reduction methods applied to AVIRIS Indian 92AV3Cdata set The neighborhood of KNN classifier is selected as 15 and 9 respectively which produces three classifiers thatis 1NN 5NN and 9NN Three different kernel functionsare adopted for SVM classifier Then derived classifiers arealso used in this experiment that is linear SVM polynomialSVM and RBFSVM It can be deduced from Figures 2(a)sim2(c) that when the embedding space is greater than 5 pro-posed PD-LFDA performs the best while JGLDA performsthe worst The results produced by RP is slightly better thanJGLDA PCA LDA LPP and RP show the similar classifiedresults under KNN classifierThat is the proposed PD-LFDAoutperforms the others Meantime compared with LFDAthe proposed PD-LFDA leads to 2 more improvementson average Moreover it can be observed from (d) that theclassified accuracy increases steadily as the embedded spaceincreases However LDA demonstrates the highest overallaccuracywhen the reduced features vary to 9 while LFDAhas

the significant improvements when the number of reducedfeatures is greater than 9 This phenomenon of Figure 2(d)indicates the instability of linear SVM Nevertheless thesituation reversed for polynomial SVM and RBF SVM inFigures 2(e) and 2(f) wherein the proposed PD-LFDAwins a little improvement against LFDA and has significantimprovement compared with the others Inspired effects ofproposed PD-LFDA algorithm were achieved in all casesFurthermore Table 2 gives the detailed overall accuracyunder different feature dimensions using 3NN 7NN andRBFSVM classifiers which validates the feasibility of theproposed scheme in this paper

Figure 3 displays the results of kappa coefficient obtainedusing the different dimension reduction algorithms underKNN classifier and SVM classifiers The experimental cir-cumstance of Figure 3 is same as that of Figure 2 We canfind that from these results JGLDA performs the worst inmost cases except in Figure 3(e) The proposed PD-LFDAmethod outperforms the other methods and achieves thehighest kappa numerical value in most cases except usingthe linear SVM as the classifier In fact none of them work

14 Mathematical Problems in Engineering

686864717360

(a) PCA-7NN

687564707607

(b) LPP-7NN

792376538609

(c) LFDA-7NN

739070418130

(d) LDA-7NN

619556516209

(e) JGLDA-7NN

670962947164

(f) RP-7NN

799277288570

(g) PCA-RBFSVM

767473558288

(h) LPP-RBFSVM

837581518822

(i) LFDA-RBFSVM

728769288071

(j) LDA-RBFSVM

582453777121

(k) JGLDA-RBFSVM

763173278365

(l) RP-RBFSVM

Figure 5 Classified map generated by different dimension reduction methods where the overall accuracy kappa coefficient and averageclassification accuracy are listed at the top of each map respectively

steady in the case of linear SVM (Figure 3(d)) Note that thesituation is improved in polynomial SVM where the kappanumerical value of the proposed PD-LFDA is significantlybetter than the others All the achievements demonstrate therobustness of our contribution in PD-LFDA Simultaneouslyit is noticeable that LPP exhibits an average kappa levelThe kappa value gained by LPP is not seriously bad and isnot dramatically good The kappa results produced by RPare approximately the same as LPP A significant advantageof RP is the simple construction and computation wherethe accuracy is closed to LPP More details are summarizedin Table 3 It can be concluded that the kappa coefficientof proposed algorithm is higher than the other approacheswhich is more appropriate for the classification of HSI data

The visual results of all methods are presented in Figures5sim6 where the class labels are converted to pseudocolor

image The pseudocolor image of the hyperspectral imagefrom Indian 92AV3C database is shown in Figure 4(a) Theavailable labeled image which represents the truth groundis illustrated in Figure 4(b) where the labels are made byhuman The training samples are selected from the labeledimage represented as points in the image as shown inFigure 4(c) Each label number (ID) corresponds to eachclass name which is indexed in Table 1 In this experimentall the available labeled samples are used for testing whileapproximate 10 of samples are used for training Thesubspace is fixed to 13 (the number here is only used forreference it can be changed) For each experiment thedimension from original feature space is reduced to theobjective dimensionality thereafter the classified maps areinduced by 7NN classifier and RBF-SVM classifier Theoverall accuracy kappa coefficient and average accuracy are

Mathematical Problems in Engineering 15

837981698991

(a) PD-LFDA-7NN

848682798968

(b) D-LFDA-RBFSVM

Figure 6 Classified map of the proposed method

Table 2 Overall accuracy obtained using different dimensions offeature and different classifiers for Indian Pine Scene database ()

Item Dims7 9 11 13 15

3NNPCA 7007 6996 7026 7035 7087LPP 6575 6871 6835 6695 6546LFDA 7513 7895 7963 7992 8009LDA 6415 6606 6616 6596 6536PD-LFDA 7713 8094 816 8218 8263JGLDA 5405 5602 5704 5642 5751RP 6173 6444 6745 6632 6652

7NNPCA 6902 6902 691 6933 6972LPP 6782 7057 7027 6892 6681LFDA 7419 7777 7827 7863 7862LDA 6615 6909 6899 692 6882PD-LFDA 7736 809 8155 8172 8235JGLDA 5665 5811 5809 5812 5914RP 6194 6481 6696 6613 6603

RBF-SVMPCA 8051 7794 7718 7935 8176LPP 7101 7514 7614 7501 7314LFDA 7826 8187 82 8113 7915LDA 677 6834 6877 6642 6792PD-LFDA 7858 8267 8395 8423 8166JGLDA 5686 5606 5663 5689 5879RP 6957 7282 7441 7416 7543

also included at the top of each map respectively Figure 5displays the classifiedmaps of classicmethods in pseudocolorimages and the classified maps of proposed PD-LFDA arepresented in Figure 6 It can be observed from these mapsthat the best performance is achieved by PD-LFDA when7NN classifier is used in this case the overall accuracyis 8379 the kappa coefficient is 8169 and the averageaccuracy is 8991 Moreover the worst algorithm is JGLDAwhose overall accuracy is 6195 the kappa coefficient

Table 3 Kappa coefficient by different dimension reduction meth-ods and different classifiers applied to Indian Pine Scene database()

Item Dims7 9 11 13 15

3NNPCA 6592 658 6612 6624 6681LPP 6084 6422 6382 6219 605LFDA 7145 7586 7664 7697 7718LDA 5913 6126 6137 6118 6048PD-LFDA 7392 7824 7901 7966 8016JGLDA 4778 4992 5082 5006 5119RP 564 5963 6303 6172 6192

7NNPCA 6466 6466 6475 6501 6545LPP 6304 6629 6585 6429 6185LFDA 7047 7457 7515 7557 7557LDA 6132 646 6453 6478 6431PD-LFDA 7417 782 7893 7913 7985JGLDA 5029 519 5161 5159 5263RP 5638 5992 6237 6135 6124

RBF-SVMPCA 7767 747 7393 7636 7900LPP 6677 715 7264 7115 6921LFDA 7486 7915 7924 7822 7584LDA 629 6362 6427 6153 6323PD-LFDA 7551 8011 8159 8187 7899JGLDA 5109 5066 5133 5147 5369RP 653 6908 7085 706 7203

becomes 5651 and the average accuracy is only 6209Other methods such as PCA LPP and RP can produce thecomparable results and no one can outperform the otherHowever LDAoutperformsPCA LPP andRP yielding betterresult Similar conclusions can be achieved in the groupof RBF-SVM Generally proposed PD-LFDA significantlyoutperforms the rest in this experiment which indicates thecorrectness of improvements in proposed PD-LFDA

16 Mathematical Problems in Engineering

Table 4 Performance of dimension reduction on the whole labeled samples ()

Evaluated item MethodologiesPCA LDA LPP LFDA JGLDA RP PD-LFDA

7NNOverall accuracy 6868 739 6875 7923 6195 6709 8379Kappa coefficient 6471 7041 647 7653 5651 6294 8169Average accuracy 737 813 7607 8609 6209 7164 8991

RBF-SVMOverall accuracy 7992 7287 7674 8375 5824 7631 8486Kappa coefficient 7728 6928 7355 8151 5377 7327 8279Average accuracy 857 8071 8288 8822 7121 8365 8968

Finally details of assessment deduced by 7NN and RBF-SVM is summarized in Table 4 where the correspondingoverall accuracy kappa coefficient and average accuracy ofdifferent methods can be located collectively

5 Conclusions

In this paper we have analyzed local Fisher discriminantanalysis (LFDA) and found its weakness By replacing themaximum distance with local variance for the constructionof weight matrix introducing class prior probability into thecomputation of affinitymatrix an improved LFDA algorithmhas been proposed This novel approach is called PD-LFDAbecause the probability distribution (PD) is applied in LFDAalgorithm The proposed approach essentially can increasethe discriminant ability of transformed features in lowdimensional space The pattern found by the new approachis expected to be more accurate and coincides with thecharacter of HSI data and is conducive to classify HSI dataPD-LFDA has been evaluated on a real removing sensingAVIRIS Indian Pine 92AV3C data set We have compared theperformance of the proposed PD-LFDA with that of PCALPP LFDA LDA JGLDA and RP Both the numerical resultsand visual inspection of class maps have been obtained Inthe experiments KNN classifier and SVM classifier havebeen used We have argued that the proposed PD-LFDAexhibits the best performance and serves as a very effectivedimensionality reduction tool for high dimensional datasuch as hyperspectral image (HSI) data

Appendix

Procedure of Proposed Algorithm

The brief description of the algorithm to perform the pro-posed PD-LFDA method is already presented in Section 3The details of the algorithm are provided in Algorithm 2

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work was supported by the Research of University ofMacau under Grants no MYRG205(Y1-L4)-FST11-TYY noMYRG187(Y1-L3)-FST11-TYY and no SRG010-FST11-TYYand by the National Natural Science Foundation of Chinaunder Grant no 61273244 This research project was alsosupported by the Science andTechnologyDevelopment Fund(FDCT) of Macau under Contract no 100-2012-A3

References

[1] W Li S Prasad Z Ye J E Fowler and M Cui ldquoLocality-preserving discriminant analysis for hyperspectral image clas-sification using local spatial informationrdquo in Proceedings ofthe 32nd IEEE International Geoscience and Remote SensingSymposium (IGARSS rsquo12) pp 4134ndash4137MunichGermany July2012

[2] HND LeM S Kim andD-H Kim ldquoComparison of singularvalue decomposition and principal component analysis appliedto hyperspectral imaging of biofilmrdquo in Proceedings of the IEEEPhotonics Conference (IPC rsquo12) pp 6ndash7 2012

[3] C K Chui and J Wang ldquoRandomized anisotropic transformfor nonlinear dimensionality reductionrdquo GEMmdashInternationalJournal on Geomathematics vol 1 no 1 pp 23ndash50 2010

[4] T V Bandos L Bruzzone and G Camps-Valls ldquoClassificationof hyperspectral images with regularized linear discriminantanalysisrdquo IEEE Transactions on Geoscience and Remote Sensingvol 47 no 3 pp 862ndash873 2009

[5] D Guangjun Z Yongsheng and J Song ldquoDimensionalityreduction of hyperspectral data based on ISOMAP algorithmrdquoin Proceedings of the 8th International Conference on ElectronicMeasurement and Instruments (ICEMI rsquo07) pp 3935ndash3938Xirsquoan China August 2007

[6] X Luo and M-F Jiang ldquoThe application of manifold learn-ing in dimensionality analysis for hyperspectral imageryrdquo inProceedings of the International Conference on Remote SensingEnvironment and Transportation Engineering (RSETE rsquo11) pp4572ndash4575 June 2011

[7] J Khodr and R Younes ldquoDimensionality reduction on hyper-spectral images a comparative review based on artificial datasrdquoin Proceedings of the 4th International Congress on Image andSignal Processing (CISP rsquo11) vol 4 pp 1875ndash1883 October 2011

Mathematical Problems in Engineering 17

[8] J Wen Z Tian H She and W Yan ldquoFeature extraction ofhyperspectral images based on preserving neighborhood dis-criminant embeddingrdquo in Proceedings of the 2nd InternationalConference on Image Analysis and Signal Processing (IASP rsquo10)pp 257ndash262 Zhejiang China April 2010

[9] Y-R Yeh S-Y Huang and Y-J Lee ldquoNonlinear dimensionreduction with kernel sliced inverse regressionrdquo IEEE Trans-actions on Knowledge and Data Engineering vol 21 no 11 pp1590ndash1603 2009

[10] J He L Zhang Q Wang and Z Li ldquoUsing diffusion geometriccoordinates for hyperspectral imagery representationrdquo IEEEGeoscience and Remote Sensing Letters vol 6 no 4 pp 767ndash7712009

[11] J Peng P Zhang and N Riedel ldquoDiscriminant learninganalysisrdquo IEEE Transactions on Systems Man and CyberneticsPart B Cybernetics vol 38 no 6 pp 1614ndash1625 2008

[12] F S Tsai and K L Chan ldquoDimensionality reduction techniquesfor data explorationrdquo in Proceedings of the 6th InternationalConference on Information Communications and Signal Process-ing pp 1ndash5 December 2007

[13] M D Farrell Jr and R M Mersereau ldquoOn the impact of PCAdimension reduction for hyperspectral detection of difficulttargetsrdquo IEEE Geoscience and Remote Sensing Letters vol 2 no2 pp 192ndash195 2005

[14] S Prasad and L M Bruce ldquoLimitations of principal com-ponents analysis for hyperspectral target recognitionrdquo IEEEGeoscience and Remote Sensing Letters vol 5 no 4 pp 625ndash629 2008

[15] J Yu Q Tian T Rui and T S Huang ldquoIntegrating discriminantand descriptive information for dimension reduction and clas-sificationrdquo IEEE Transactions on Circuits and Systems for VideoTechnology vol 17 no 3 pp 372ndash377 2007

[16] J Kong S Wang J Wang L Ma B Fu and Y Lu ldquoAnovel approach for face recognition based on supervisedlocality preserving projection and maximummargin criterionrdquoin Proceedings of the International Conference on ComputerEngineering and Technology (ICCET rsquo09) vol 1 pp 419ndash423Singapore January 2009

[17] M Loog and R P W Duin ldquoLinear dimensionality reductionvia a heteroscedastic extension of LDA the Chernoff criterionrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 26 no 6 pp 732ndash739 2004

[18] A C Jensen A Berge and A S Solberg ldquoRegressionapproaches to small sample inverse covariance matrix estima-tion for hyperspectral image classificationrdquo IEEE TransactionsonGeoscience andRemote Sensing vol 46 no 10 pp 2814ndash28222008

[19] J Jin BWang and L Zhang ldquoA novel approach based on Fisherdiscriminant null space for decomposition of mixed pixels inhyperspectral imageryrdquo IEEE Geoscience and Remote SensingLetters vol 7 no 4 pp 699ndash703 2010

[20] J Wen Z Tian X Liu and W Lin ldquoNeighborhood preservingorthogonal pnmf feature extraction for hyperspectral imageclassificationrdquo IEEE Journal of Selected Topics in Applied EarthObservations and Remote Sensing vol 6 no 2 pp 759ndash7682013

[21] Y Ren G Zhang G Yu and X Li ldquoLocal and global structurepreserving based feature selectionrdquoNeurocomputing vol 89 pp147ndash157 2012

[22] Z Fan Y Xu and D Zhang ldquoLocal linear discriminant analysisframework using sample neighborsrdquo IEEE Transactions onNeural Networks vol 22 no 7 pp 1119ndash1132 2011

[23] Y Wang S Huang D Liu and B Wang ldquoResearch advanceon band selection-based dimension reduction of hyperspectralremote sensing imagesrdquo in Proceedings of the 2nd InternationalConference on Remote Sensing Environment and TransportationEngineering (RSETE rsquo12) pp 1ndash4 IEEE Nanjing China June2012

[24] B Waske S van der Linden J A Benediktsson A Rabe andP Hostert ldquoSensitivity of support vector machines to randomfeature selection in classification of hyperspectral datardquo IEEETransactions on Geoscience and Remote Sensing vol 48 no 7pp 2880ndash2889 2010

[25] M Sugiyama ldquoLocal fisher discriminant analysis for superviseddimensionality reductionrdquo in Proceedings of the 23rd Interna-tional Conference onMachine Learning (ICML rsquo06) pp 905ndash912ACM June 2006

[26] Y Cheng ldquoMean shift mode seeking and clusteringrdquo IEEETransactions on Pattern Analysis and Machine Intelligence vol17 no 8 pp 790ndash799 1995

[27] W Li S Prasad J E Fowler and L M Bruce ldquoLocality-preserving dimensionality reduction and classification forhyperspectral image analysisrdquo IEEE Transactions on Geoscienceand Remote Sensing vol 50 no 4 pp 1185ndash1198 2012

[28] L Zelnik-Manor and P Perona ldquoSelf-tuning spectral cluster-ingrdquo in Proceedings of the 18th Annual Conference on NeuralInformation Processing Systems pp 1601ndash1608 December 2004

[29] W K Wong and H T Zhao ldquoSupervised optimal localitypreserving projectionrdquo Pattern Recognition vol 45 no 1 pp186ndash197 2012

[30] X He and P Niyogi ldquoLocality preserving projectionsrdquo inAdvances in Neural Information Processing Systems 16 SThrunL Saul and B Scholkopf Eds MIT Press Cambridge MassUSA 2004

[31] H Wang S Chen Z Hu and W Zheng ldquoLocality-preservedmaximum information projectionrdquo IEEE Transactions on Neu-ral Networks vol 19 no 4 pp 571ndash585 2008

[32] M Sugiyama ldquoDimensionality reduction ofmultimodal labeleddata by local fisher discriminant analysisrdquo Journal of MachineLearning Research vol 8 pp 1027ndash1061 2007

[33] S Yan D Xu B Zhang H J Zhang Q Yang and S Lin ldquoGraphembedding and extensions a general framework for dimen-sionality reductionrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 29 no 1 pp 40ndash51 2007

[34] C K Chui and J Wang ldquoDimensionality reduction of hyper-spectral imagery data for feature classificationrdquo in Handbookof Geomathematics pp 1005ndash1047 Springer Heidelberg Ger-many 2010

[35] C-C Chang and C-J Lin ldquoLIBSVM a Library for supportvector machinesrdquo ACM Transactions on Intelligent Systems andTechnology vol 2 no 3 article 27 2011

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 4: Research Article Subspace Learning via Local Probability ...downloads.hindawi.com/journals/mpe/2015/145136.pdf · and applicable toolkits were engendered one a er another. Hyperspectral

4 Mathematical Problems in Engineering

=1

2

119873

sum

119894119895=1

119875119908(119894 119895) (119909

119894119909119879

119894+ 119909119895119909119879

119895minus 119909119894119909119879

119895minus 119909119895119909119879

119894)

=1

2

119873

sum

119894119895=1

119875119908(119894 119895) (119909

119894minus 119909119895) (119909119894minus 119909119895)119879

(8)

where

119875119908(119894 119895) =

1

119873119888 if 119897

119894= 119897119895= 119888

0 if 119897119894= 119897119895

(9)

Let 119878119905be the total mixed matrix of LDA and then we gain

119878119887= 119878119905minus 119878119908

=1

2

119873

sum

119894119895=1

119875119887(119894 119895) (119909

119894minus 119909119895) (119909119894minus 119909119895)119879

(10)

where

119875119887(119894 119895) =

1

119873minus1

119873119888 if 119897

119894= 119897119895= 119888

1

119873 if 119897

119894= 119897119895

(11)

LFDA is achieved by weighting the pairwise data points

119878119908=1

2

119873

sum

119894119895=1

119908(119894 119895) (119909

119894minus 119909119895) (119909119894minus 119909119895)119879

119878119887=1

2

119873

sum

119894119895=1

119887(119894 119895) (119909

119894minus 119909119895) (119909119894minus 119909119895)119879

(12)

where 119908(119894 119895) and

119887(119894 119895) denote the weight matrix of

different pairwise points for the within-class samples andbetween-class samples respectively

119908(119894 119895) equiv

119882(119894 119895)

119873119888 if 119897

119894= 119897119895= 119888

0 if 119897119894= 119897119895

(13)

119887(119894 119895) equiv

119882(119894 119895) (1

119873minus1

119873119888) if 119897

119894= 119897119895= 119888

1

119873 if 119897

119894= 119897119895

(14)

where119882 indicates the affinity matrixThe construction of119882is critical for the performance of classified accuracy therebythe investigation of construction is in great need to be furtherelaborated in the following section

3 Proposed Scheme

The calculation of (13) and (14) is very important to theperformance of LFDA There are many methods to computethe affinitymatrix119882The simplest one is that119882 is equivalentto a constant that is

119882(119894 119895) equiv 119886 (15)

where 119886 in the above equation is a real nonnegative numberHowever the equations of (13) and (14) are derived to thestate-of-the-art Fisherrsquos linear discriminant analysis underthis construction

Another construction adopts the heat kernel derivedfrom LPP

119882(119894 119895) = exp(minus10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

1205902) (16)

where 120590 is a tuning parameter Yet the affinity is valued by thedistance of data points and the computation is too simple torepresent the locality of data patches Amore adaptive version[26] of (16) is proposed as follows

119882(119894 119895) =

exp(minus10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

1205902) if 119909

119894isin KNN (119909

119895 119870)

or 119909119895isin KNN (119909

119894 119870)

0 otherwise(17)

Compared with the former computation (17) is in con-junction with 119870-nearest data points which is computation-ally fast and light Moreover the property of local patches canbe characterized by (17) However the affinity defined in (16)and (17) is globally computed thus it may be apt to overfitthe training points and be sensitive to noise Furthermorethe density of HSI data pointsmay vary according to differentpatches Hence a local scaling technique is proposed inLFDA to cope with this issue [29] where the sophisticatedcomputation is given by

119882(119894 119895) =

exp(minus10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

) if 119909119894isin KNN (119909

119895 119870)

or 119909119895isin KNN (119909

119894 119870)

0 otherwise(18)

where 120588119894denotes the local scaling around the corresponding

sample 119909119894with the following definition

120588119894=10038171003817100381710038171003817119909119894minus 119909119870

119894

10038171003817100381710038171003817 (19)

where 119909119870119894represents the 119870th nearest neighbor of 119909

119894 lowast 2

denotes the square Euclidean distance and119870 is a self-tuningpredefined parameter

To simplify the calculation many researches considered afixed value of119870 and a recommended value of119870 = 7 is studiedin [1 28] Note that 120588

119894is used to represent the distribution

of local data around sample 119909119894 However the above work

ignored the distribution around each individual sample Thediversity of adjacent HSI pixels is approximate thus thespectrum of the neighboring landmarks has great similarityThat is the pixels of HSI data which have resembling spec-trums tend to be of the same landmark This phenomenonindicates that the adjacency of local patches not only lies in

Mathematical Problems in Engineering 5

the spectrum space but also in the spatial space For a localpoint the calculation of making use of the diversity of its119870thnearest neighborhoods is not fully correct

An evident example is illustrated in Figure 1 where twogroups of points have different distributions In group (a)most neighbor points are closed to point 119909

0 while in group

(b) most neighbor points are far from point 1199090 However the

measurement of two cases are the same according to (19)Thiscan be found in Figure 1 where the distances between point1199090and its 119870th nearest neighborhoods (119870 = 7) are same in

both distributions which can be shown in Figures 1(a) and1(b)119871

1= 1198712This example indicates that the simplification of

local distribution by the distance between the sample 119909119894and

the 119870th nearest neighbor sample is unreasonable Actuallythe result by using of this simplification may raise someerrors

Based on the discussion above a novel approach whichis called PD-LFDA is proposed to overcome the weakness ofLFDA To be specific PD-LFDA incorporates two key pointsnamely

(1) The class prior probability is applied to compute theaffinity matrix

(2) The distribution of local patch is represented by theldquolocal variancerdquo instead of the ldquofarthest distancerdquo toconstruct the weight matrix

The proposed approach essentially increases the discriminantability of transformed features in low dimensional space Thepattern found by PD-LFDA is expected to be more accurateand coincids with the character of HSI data and is conduciveto classify HSI data

In this way a more sophisticated construction of affinitymatrix which is derived from [29] is proposed as follows

119882(119894 119895)

=

119901 (119897119894)2 exp(minus

10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

)

sdot(1 + exp(minus10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

)) if 119897119894= 119897119895= 119888

0 if 119897119894= 119897119895

(20)

where 119901(119897119894) stands for the class prior probability of class 119909

119894

and 120588119894indicates the local variance Note that the denominator

item of (13) is 1119873119888 which will cancel out our prior effectif we use 119901(119897

119894) to replace 119901(119897

119894)2 (the construction of 119901(119897

119894)

will be given in (21)) Different part of this derivation playsthe same role as the original formulation for example forthe last item on one hand it plays the role of intraclassdiscriminating weight and on the other hand the productresult of119882 may reach zero if the Euclidean square distance sdot is very small for some data points For this case an extraitem (1 + exp(minus119909

119894minus 1199091198952120588119894120588119895)) is added to the construc-

tion of intraclass discriminating weight to prevent accuracytruncation By doing so our derivation can be viewed as an

integration of class prior probability the local weight andthe discriminating weight This construction is expected topreserve both the local neighborhood structure and the classinformation Besides this construction is expected to sharethe same advantages detailed in the original work

It is clear that (20) consists of two new factors comparedwith LFDA method (1) class prior probability 119901(119897

119894) and (2)

local variance 120588119894

Suppose class 119909119894to be class 119888 that is 119897

119894= 119888 so that the

probability of class 119909119894can be calculated by

119901 (119897119894) = 119901 (119888) =

119873119888

119873 (21)

where 119873119888 is the number of the samples in class 119888 whole 119873denotes the total number of samples and119873 = sum119862

119888=1119873119888

Please note that the item (1+exp(minus119909119894minus1199091198952120588119894120588119895)) in (20)

is used to prevent the extra rounding error produced from thefirst two items and to keep the total value of

119901 (119897119894)2 exp(minus

10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

)(1 + exp(minus10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

))

(22)

which does not reach the minimum Here 120588lowastdenotes the

local scaling around 119909lowast In this paper a local scaling 119909

lowastis

measured by the standard deviation of local square distanceAssume that 119909(119894)

1 119909(119894)

2 119909

(119894)

119870are the119870-nearest samples of 119909

119894

and then the square distance between 119909119894and 119909(119894)

119896is given by

119889(119894)

119896=10038171003817100381710038171003817119909119894minus 119909(119894)

119896

10038171003817100381710038171003817

2

119896 = 1 2 119870 (23)

The corresponding mean (119894) can be defined as

(119894)=1

119870

119870

sum

119896=1

119889(119894)

119896

=1

119870

119870

sum

119896=1

10038171003817100381710038171003817119909119894minus 119909(119894)

119896

10038171003817100381710038171003817

2

(24)

where lowast 2 represents a square Euclidean distance and 119870 isa predefined parameter whose recommended value is 119870 = 7The standard deviation can be calculated as

120588119894= radic

1

119870

119870

sum

119896=1

(119889(119894)

119896minus (119894))

2

(25)

Note that in the above equation the item 1119870 becomesa constant that can be shifted outside Thus an equivalentformula is given by

120588119894=1

119870radic

119870

sum

119896=1

(119889(119894)

119896minus (119894))

2

(26)

Similar procedure can be deduced to 119909119895 Hence we have

120588119898=1

119870

119870

sum

119894=1

radic100381710038171003817100381710038171003817119909(119898119896)

119894

100381710038171003817100381710038171003817

2

minus1

119870

119870

sum

119895=1

100381710038171003817100381710038171003817119909(119898119896)

119895

100381710038171003817100381710038171003817

2

(27)

6 Mathematical Problems in Engineering

x1

x2

x3

x4

x5

x6

x7

L1

x0

(a)

x1

x2

x3

x4

x5

x6

x7

L2

x0

(b)

Figure 1 Different distributions of 1199090and the corresponding 119870-nearest neighborhoods (119870 = 7) (a) Most neighbors are closed to point 119909

0

(b) Most neighbors are far from point 1199090 The distances between point 119909

0and its 119870th nearest neighbors are the same in both distributions

1198711= 1198712

Comparing (19) with (27) it is noticeable that (28) holds

120588119898le 120588119894 (28)

Compared with the former definitions our definition has atleast the following advantages

(i) By incorporating the prior probability of each classwith local technique 119901(119897

119894) the proposed scheme is

expect to be a benefit for the classified accuracy(ii) The representation of local patches equation (26) is

described by local standard deviation 120588119894rather than

absolute diversity in (19) which is more accurate inmeasuring the local variance of data samples

(iii) Compared with the global calculation the proposedcalculation is taken on local patches which is efficientin getting rid of over-fitting

(iv) The proposed local scaling technique meets the char-acter of HSI data which is more applicable for theprocessing of hyperspectral image in real applications

Based on the above affinity defined an extended affinitymatrix can also be defined in a similar way Our definitiononly provides a heuristic exploration for reference Theaffinity can be further sparse for example by introducing theidea of 120576-nearest neighborhoods [31]

Theoptimal solution of improved scheme can be achievedby maximize the following criterion

119879PD-LFDA equiv arg max119879isinR119901times119902

tr (119879119879119878119887119879)

tr (119879119879119878119908119879)

(29)

It is evident that (29) has the similar form of (3) Thisfinding enlightens us that the transformation119879 can be simplyachieved by solving the generalized eigenvalue decomposi-tion of 119878minus1

119908119878119887 Moreover Let 119866 isin R119902times119902 be a 119902-dimensional

invertible square matrix It is clear that 119879PD-LFDA119866 is alsoan optimal solution of (29) This property indicates that

the optimal solution is not uniquely determined becauseof arbitrary arithmetic transformation of 119879PD-LFDA119866 Let 120593119894be the eigenvector of 119878minus1

119908119878119887corresponding to eigenvalue

119894

that is 119878119887120593119894= 119894119878119908120593119894 To cope with this issue a rescaling

procedure is adopted [25] Each eigenvector 120593119894119902

119894=1is rescaled

to satisfy the following constraint

120593119894119878119908120593119895= 1 if 119894 = 1198950 if 119894 = 119895

(30)

Then each eigenvector is weighted by the square root of itsassociated eigenvalue The transformed matrix 119879PD-LFDA ofthe proposed scheme is finally given by

119879PD-LFDA = radic11205931 radic21205932 radic1199021120593119902 isin R119901times119902 (31)

with descending order 1ge 2ge sdot sdot sdot ge

119902

For a new testing points 119909 the projected point in the newfeature space can be captured by 119910 = 119879119879PFDA119909 thus it can befurther analyzed in the transformed space

According to the above analysis we can design an algo-rithm which is called PD-LFDA Algorithm to perform ourproposed method The detailed description of this algorithmcan be found in the appendix (Algorithm 2) A summary ofthe calculation steps of PD-LFDA Algorithm is presented inAlgorithm 1

The advantage of PD-LFDA is discussed as followsFirstly to investigate the rank of the between-class scatter

matrix 119878119887of LDA 119878

119887can be rewritten as

119878119887=

119862

sum

119897=1

119873119897(119898119897minus 119898) (119898

119897minus 119898)119879

= [1198731(119898119897minus 119898) 119873

2(119898119897minus 119898) 119873

119871(119898119897minus 119898)]

sdot [1198731(119898119897minus 119898) 119873

2(119898119897minus 119898) 119873

119871(119898119897minus 119898)]

119879

(32)

Mathematical Problems in Engineering 7

Input HSI training samples119883 isin R119901times119873 dimensionality to be embedded 119902 the parameter 119870 of 119870NNand a test sample 119909

119905isin R119901

Step 1 For each sample 119909119894from the same class calculate the119882

119894119895by (20)

where the local scaling factor 120588119894is calculated via (26) or (27)

Step 2 Equations (13) and (14) can be globally and uniformly transformed into an equivalent formula via119908= 119882 sdot 119882

1

119887= 119882 sdot (119882

2minus1198821)

(i)

where the operator 119860 sdot 119861 denotes the dot product between 119860 and 119861 and

119882(119894 119895) =

119901(119897119894)2 exp(minus

10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

)(1 + exp(minus10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

))

if 119897119894= 119897119895= 119888

if 119897119894= 119897119895

0

(iia)

1198821=

1

119873119888 if 119897

119894= 119897119895= 119888

0 others(iib)

1198822=1

119873(1119873times1

11times119873) (iic)

By the above formulas the product of elements in different matrices can be achieved via dot productbetween matrices The equations (iia) (iib) and (iic) can be gained by integrating the number ofeach class119873119888 the number of total training samples119873 and the local scaling 120588

119894 then matrices119882119882

11198822

can be calculatedStep 3 Construct within-scatter matrix

119908and between-scatter matrix

119887 according to (i)

Step 4 Define Laplacian matrices 119871lowast below119871lowast= 119863lowastminus lowast (iii)

where119863lowast is the row sum or column sum of119882lowast119863lowast119894119894= sum119895lowast(119894 119895) (or119863lowast

119894119894= sum119894lowast(119894 119895)) and the

notation lowast denotes one letter in 119908 119887Step 5 On the basis of (29) and (30) the transformation matrix can be achieved viaeigenvectors 119879 = radic

11205931 radic21205932 radic

1199021120593119902 119879 isin R119901times119902 that corresponding the 119902 leading

eigenvalues 119887120593119894= 119894119908120593119894in solving the general problem of

119887120593119894= 119894119908120593119894

Step 6 For a testing sample 119909119905isin R119901 the extracted feature is 119911

119905= 119879119879119909119905isin R119902

Output Transformation matrix 119879 and the extracted feature 119911119905

Algorithm 1 PD-LFDA Algorithm

Thereby

rank (119878119887) ⩽ rank ([119873

1(119898119897minus 119898) 119873

2(119898119897minus 119898)

119873119871(119898119897minus 119898)]) ⩽ 119862 minus 1

(33)

It is easy to infer that the rank of the between-class scattermatrix 119878

119887is119862minus1 atmost thus there are up to119862minus1meaningful

subfeatures that can be extracted Thanks to the help ofaffinity matrix 119882 when compared with the conventionalLDA the reduced subspace of proposed PD-LFDA can beany subdimensional space On the other hand the classicallocal fisherrsquos linear discriminant only weights the value ofsample pairs in the same classes while our method also takesin account the sample pairs in different classes Hereafter theproposedmethod will bemore flexible and the results will bemore adaptiveThe objective function of proposed method isquite similar to the conventional LDA hereby the optimalsolution is almost same as the conventional LDA whichindicates that it is also simple to implement and easy to revise

To further explore the relationship of LDA and PD-LFDAwe now rewrite the objective function of LDAandPD-LFDA respectively

119879LDA = arg max119879isinR119901times119902

trace 119879119879119878119887119879

subject to 119879119879119878119908119879 = 119868

(34)

119879PD-LFDA equiv arg max119879isinR119901times119902

trace 119879119879119878119887119879

subject to 119879119879119878119908119879 = 119868

(35)

This implies that LDA tries to maximize the between-class scatter and simultaneously constraint the within-classscatter to a certain level However such restriction is hardto constraint and no relaxation is imposed When the data isnot a single modal that is multimodal or unknown modalLDA often fails On the other hand benefiting from theflexible designing of affinity matrix119882 PD-LFDA gains morefreedom in (35) That is the separability of PD-LFDA will bemore distinct and the degree of freedom remains more than

8 Mathematical Problems in Engineering

Input HSI data samples119883 = 1199091 1199092 119909

119873 isin R119901times119873 the objective dimension to be embedded 119902

the nearest neighbor parameter 119870 (default 119870 equiv 7) and the test sample 119909119905isin R119901

Output Transformation matrix 119879 isin R119901times119902Steps are as follows

(1) Initialize matrices(2) 119878119908larr 0119901times119901

within-class scatter(3) 119878119887larr 0119901times119901

between-class scatter(4)(5) Compute within-class affinity matrix119882119908(6) for 119888 = 1 2 119862 do in a classwise manner(7) 119909

119888

119894119873119888

119894=1larr 119909

119895| 119897119895= 119888 the 119888th class data samples

(8) 119883 larr 119909119888

1 119909119888

2 119909

119888

119873119888 sample matrix

(9) 119882119894=(1119873119888times111times119873119888)

119873119888

(10)(11) Determine the local scaling(12) for 119894 = 1 2 119873119888 do(13) 119909

(119894)

119896larr the 119896th nearest neighbor of 119909119888

119894 119896 = 1 2 119870

(14) for 119896 = 1 2 119870 do(15) 119889

(119894)

119896= 119909119894minus 119909(119894)

1198962

(16) end for

(17) (119894)larr1

119870

119870

sum

119896=1

10038171003817100381710038171003817119909119894minus 119909(119894)

119896

10038171003817100381710038171003817

2

(18) 120588(119894)= radic

1

119870

119870

sum

119896=1

(119889(119894)

119896minus (119894))2

(19) end for(20)(21) Define local affinity matrix(22) for 119894 119895 = 1 2 119873119888 do(23) 119901(119897

119894) larr 119873

119888119873 prior probability

(24) 119860119894119895larr 119901(119897

119894) exp(minus

10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

)(1 + exp(minus10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

))

(25) end for(26) 119860

119888= 119860

(27) end for(28)119882

119908= diag119882

11198822 119882

119862 in block diagonal manner

(29) 119860119908= diag119860

1 1198602 119860

119862 also in block diagonal manner

(30) for 119894 119895 = 1 2 119873 do(31)

119908(119894 119895) larr 119860

119908(119894 119895)119882

119908(119894 119895)

(32) end for(33)(34) Compute between-class affinity matrix119882119887

(35)119882119887larr(1119873times1

11times119873)

119873minus diag119882

11198822 119882

119862

(36) Let 119865nz denotes the nonzero flag of elements in119882119908 119865nz(119894 119895) = 1 if119882119908(119894 119895) = 0 119865nz(119894 119895) = 0 if119882119908(119894 119895) = 0

(37) 119865nz larr (1198821015840

119887= 0)

(38) 119860119887larr 0119873times119873

(39) 119860

119887(119865nz) = 119860119908(119865nz)

(40) 119860119887(not119865nz) = 1

(41) for 119894 119895 = 1 2 119873 do(42)

119887(119894 119895) larr 119860

119887(119894 119895)119882

119887(119894 119895)

(43) end for(44) Now construct Laplacian matrix for within affinity matrix

119908and between affinity matrix

119887

(45) Let(46)119863119908

119894119894= sum

119895

119908(119894 119895)119863119887

119894119894= sum

119895

119887(119894 119895)

(47) Then

Algorithm 2 Continued

Mathematical Problems in Engineering 9

(48) 119871119908= 119908minus 119863119908 119871119887= 119887minus 119863119887

(49) Construct two matrixs below(50) 119878

119887= 119883119871

119887119883119879 119878119908= 119883119871

119908119883119879

(51) Let 1205931 1205932 120593

119902 be the general eigenvector of

(52) 119878119887120593119894= 119894119878119908120593119894 forall119894 isin 1 2 119902

(53) with the corresponding eigenvalue in descending order 1ge 2ge sdot sdot sdot ge

119902

(54)(55) Finally the transformation matrix can be represented as(56) 119879 = radic120582

11205931 radic12058211205932 radic120582

1120593119902 isin R119901times119902

(57)(58) For a new test sample 119909

119905 the embedding 119911

119905is given by

(59) 119911119905= 119879119879119909119905isin R119902

Algorithm 2 Proposed PD-LFDA method

the conventional LDA thus our method is expected to bemore robust and significantly preponderant

For large scale data sets we discuss a scheme that canaccelerate the computation procedure of the within-scattermatrix 119878

119908 In our algorithm owning to the fact that we have

put penalty on the affinity matrix for different class samplesin constructing the between-scatter matrix the acceleratedprocedure will remain for further discussion

The within-class scatter 119878119908can be reformulated as

119878119908=1

2

119873

sum

119894119895=1

119908(119894 119895) (119909

119894minus 119909119895) (119909119894minus 119909119895)119879

=1

2

119873

sum

119894119895=1

119908(119894 119895) (119909

119894119909119879

119894+ 119909119895119909119879

119895minus 119909119894119909119879

119895minus 119909119895119909119879

119894)

=

119873

sum

119894=1

(

119873

sum

119895=1

119908(119894 119895))119909

119894119909119879

119894minus

119873

sum

119894119895=1

119908(119894 119895) 119909

119894119909119879

119895

= 119883(119863119908minus 119908)119883119879

= 119883119908119883119879

(36)

Here

119863119908 (119894 119894) =

119873

sum

119895=1

119908(119894 119895)

119908= 119863119908minus 119908

(37)

119908can be block diagonal if all samples 119909

119894119873

119894=1are sorted

according to their labels This property implies that 119863119908and

119908can also be block diagonal matrix Hence if we compute

119878119908

through (36) then the procedure will be much moreefficient Similarly 119878

119887can also be formulated as

119878119887= 119883

119887119883119879

= 119883(119863119887minus 119887)119883119879

(38)

Nevertheless 119887is dense and can not be further sim-

plified However the simplified computational procedure of119908

saves for us part of time in a way In this paper weadopt the above procedure to accelerate 119878

119908and pursue 119878

119887

normally In addition to locality structure some papers showthat another property for example marginal information isalso important and should be preserved in the reduced spaceThe theory of extended LDA and LPP algorithm is developedrapidly recently Yan et al [33] summarized these algorithmsin a graph embedding framework and also proposed amarginal fisher analysis embedding (MFA) algorithm underthis framework

In MFA the criterion is characterized by intraclasscompactness and interclass marginal superability which isreplaced for thewithin-class scatter and between-class scatterseverally The intraclass relationship is reflected by an intrin-sic graph which is constructed by 119870-nearest neighborhoodsample data points in the same class while the interclasssuperability is mirrored by a penalty graph computed formarginal points from different classes Following this ideathe intraclass compactness is given as follows

119878119894= sum

119894119895 119894isin119873(119896)(119895)or 119895isin119873(119896)(119894)

10038171003817100381710038171003817119879119879119909119894minus 119879119879119909119895

10038171003817100381710038171003817

2

= 2119879119879119883 (119863 minus119882)119883

119879119879

(39)

where

119882(119894 119895) = 1 if 119894 isin 119873(119896) (119895) or 119895 isin 119873(119896) (119894) 0 otherwise

(40)

Here 119873(119896)(119895) represents the 119870-nearest neighborhood indexset of 119909

119895from the same class and119863 is the row sum (or column

sum) of 119882 119863(119894 119894) = sum119894119882119894119895 Interclass separability is indi-

cated by a penalty graph whose term is expressed as follows

119878119890= sum

119894119895 (119894119895)isin119875(119896)(119897119895)

or (119894119895)isin119875(119896)(119897119894)

10038171003817100381710038171003817119879119879119909119894minus 119879119879119909119895

10038171003817100381710038171003817

2

= 2119879119879119883(119863 minus )119883

119879119879

(41)

10 Mathematical Problems in Engineering

of which

(119894 119895) = 1 if (119894 119895) isin 119875(119896) (119897

119895) or (119894 119895) isin 119875(119896) (119897

119894)

0 otherwise(42)

Note that 119878119894and 119878

119890are corresponding to ldquowithin-scatter

matrixrdquo and ldquobetween-scatter matrixrdquo of the traditional LDAalternatively The optimal solution of MFA can be achievedby solving the following minimization problem That is

= arg119879

min 119879119879119883(119863 minus119882)119883

119879119879

119879119879119883(119863 minus )119883119879119879

(43)

We know that (43) is also a general eigenvalue decompositionproblem Let 119879PCA indicate the transformation matrix fromthe original space to PCA subspace with certain energyremaining and then the final projection of MFA is output as

119879MFA = 119879PCA (44)

As can be seenMFA constructs twoweightedmatrices119882and according to the intraclass compactness and interclassseparability In LFDA and PD-LFDA only one affinity isconstructed The difference lies in that the ldquoweightrdquo in LFDAand PD-LFDA is in the range of [0 1] according to the levelof difference Yet MFA distributes the same weight to its 119870-nearest neighborhoodsThe optimal solution ofMFA LFDAand PD-LFDA can be attributed to a general eigenvaluedecomposition problem Hence the idea of MFA LFDA andPD-LFDA is approximately similar in a certain interpretationRelationship with other methodologies can be analyzed in ananalogous way

4 Experimental Results

To illustrate the performance of PD-LFDA experimentson a real hyperspectral remote sensing image data setmdashAVIRIS Indian Pine 1992 are conducted in this sectionThe AVIRIS Indian Pines 1992 data set was gathered byNational Aeronautics and Space Administrator (NASA) withAirborne VisibleInfrared Imaging Spectrometer (AVIRIS)sensor over the Indian Pine test site in northwest Indians inJune 1992 This data set consists of 145 times 145 pixels and 224spectral reflectance bands ranging from 04 120583m to 245 120583mwith a spatial resolution of 20m The Indian Pines scene iscomposed of two-third agriculture and one-third forest orother natural perennial vegetation Some other landmarkssuch as dual lane highways rail line low density housing andsmaller roads are also in this image Since the scene was takenin June some main crops for example soybeans and cornare in their early growth stage with less than 5 coveragewhile the no-till min-till and clean-till indicate the amountof previous crop residue remaining The region map can bereferred to Figure 5(a) The 20 water absorption bands (ie[108ndash112 154ndash167] 224) were discarded

In this section performance of different dimensionreduction methods that is PCA LPP LFDA LDA JGLDAand RP [34] is compared with the proposed PD-LFDA

Table 1 Training set in AVIRIS Indian Pines 1992 database

ID Class Name1 Alfalfa (4618 3913)2 Corn-notill (1428136 952)3 Corn-mintill (83087 1048)4 Corn (23734 1434)5 Grass-pasture (48354 1118)6 Grass-trees (73071 973)7 Grass-pasture-mowed (2817 6071)8 Hay-windrowed (47850 1046)9 Oats (2015 7500)10 Soybean-notill (97286 884)11 Soybean-mintill (2455214 872)12 Soybean-clean (59354 911)13 Wheat (20528 1366)14 Woods (1265102 806)15 Buildings-Grass-Trees-Drives (38639 1010)16 Stone-Steel-Towers (9324 2581)Total 102491029 1004Numerical value in this table refers to number of samples number oftraining samples and pc respectively

Classification accuracy is reported via the concrete classifiersGenerally many dimension reduction research papers adopt119870-nearest neighborhood classifier (KNN) and support vectormachine (SVM) classifier to measure the performance of theextracting features after the dimension reduction where theoverall accuracy and kappa coefficient are detailed in thereports Hereby in this paper we also adopt KNN classifierand SVM classifier for performance measurement In KNNclassifier we select the value of 119870 as 1 5 and 9 so thatthree classifiers based on nearest neighborhoods are formedwhich are called 1NN 5NN and 9NN In SVM classifier weseek a hyperplane to separate classes in kernel-induced spacewhere the linear nonseparable classes in the original featurespace can be separated via kernel trick SVM as a robustand successful classifier has been widely used to evaluatethe performance of multifarious methods in many areas Forsimplicity and convenience we use LIBSVM package [35] forexperiments Accuracy of dimension reduced performancewill be reported by classified performance from SVM clas-sifier In the following schedule the feature subspace will becalculated at the first step from training samples by differentdimensional algorithms Table 1 gives a numerical statisticsof training samples corresponding to each class Then thenew sample will be projected into a low subspace by thetransformedmatrix Finally all the new samples are classifiedby SVM classifier

In this experiment a total of 1029 samples were selectedfor training and the remaining samples are used for testingNote that all the labeled samples in database are unbalancedand the available samples of each category differ dramaticallyThe following strategy is imposed for sample division A fixednumber of 15 samples are randomly selected to form thetraining sample yet the absent samples are randomly selected

Mathematical Problems in Engineering 11

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

(a) 1NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

(b) 5NN

7 9 11 13 15Reduced space

0

01

02

03

04

05

06

07

08

09

1

Ove

rall

accu

racy

(c) 9NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

(d) Liner SVM

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

PCALPPLFDA

LDAPD-LFDAJGLDA

RP

(e) Polynomial SVM

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

PCALPPLFDA

LDAPD-LFDAJGLDA

RP

(f) RBF-SVM

Figure 2 Overall accuracy by different dimension reduction methods and different classifiers applied to AVIRIS Indian Pines database

12 Mathematical Problems in Engineering

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

(a) 1NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

(b) 5NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

(c) 9NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

(d) Linear SVM

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

PCALPPLFDA

LDAPD-LFDAJGLDA

RP

(e) Polynomial SVM

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

PCALPPLFDA

LDAPD-LFDAJGLDA

RP

(f) RBF SVM

Figure 3 Kappa coefficient by different dimension reduction methods and different classifiers applied to AVIRIS Indian Pines database

Mathematical Problems in Engineering 13

(a) Pseudo 3-channel color image band = [12 79 140]

1

2

2

2

2

2 2

3

3

3

3

3

4

5

5

5

5

6

6 6 6

7

8

9

10

10

10 10

11

11

11

11

11 12

12 12

13 14

14

14 15

15

16

(b) Ground truth

(c) Distribution of training sample

Figure 4 Illustration of sample partition

from the remaining samples Under this strategy the trainingsamples and testing samples are listed in Table 1

Figure 2 shows the overall accuracy of different dimen-sion reduction methods applied to AVIRIS Indian 92AV3Cdata set The neighborhood of KNN classifier is selected as 15 and 9 respectively which produces three classifiers thatis 1NN 5NN and 9NN Three different kernel functionsare adopted for SVM classifier Then derived classifiers arealso used in this experiment that is linear SVM polynomialSVM and RBFSVM It can be deduced from Figures 2(a)sim2(c) that when the embedding space is greater than 5 pro-posed PD-LFDA performs the best while JGLDA performsthe worst The results produced by RP is slightly better thanJGLDA PCA LDA LPP and RP show the similar classifiedresults under KNN classifierThat is the proposed PD-LFDAoutperforms the others Meantime compared with LFDAthe proposed PD-LFDA leads to 2 more improvementson average Moreover it can be observed from (d) that theclassified accuracy increases steadily as the embedded spaceincreases However LDA demonstrates the highest overallaccuracywhen the reduced features vary to 9 while LFDAhas

the significant improvements when the number of reducedfeatures is greater than 9 This phenomenon of Figure 2(d)indicates the instability of linear SVM Nevertheless thesituation reversed for polynomial SVM and RBF SVM inFigures 2(e) and 2(f) wherein the proposed PD-LFDAwins a little improvement against LFDA and has significantimprovement compared with the others Inspired effects ofproposed PD-LFDA algorithm were achieved in all casesFurthermore Table 2 gives the detailed overall accuracyunder different feature dimensions using 3NN 7NN andRBFSVM classifiers which validates the feasibility of theproposed scheme in this paper

Figure 3 displays the results of kappa coefficient obtainedusing the different dimension reduction algorithms underKNN classifier and SVM classifiers The experimental cir-cumstance of Figure 3 is same as that of Figure 2 We canfind that from these results JGLDA performs the worst inmost cases except in Figure 3(e) The proposed PD-LFDAmethod outperforms the other methods and achieves thehighest kappa numerical value in most cases except usingthe linear SVM as the classifier In fact none of them work

14 Mathematical Problems in Engineering

686864717360

(a) PCA-7NN

687564707607

(b) LPP-7NN

792376538609

(c) LFDA-7NN

739070418130

(d) LDA-7NN

619556516209

(e) JGLDA-7NN

670962947164

(f) RP-7NN

799277288570

(g) PCA-RBFSVM

767473558288

(h) LPP-RBFSVM

837581518822

(i) LFDA-RBFSVM

728769288071

(j) LDA-RBFSVM

582453777121

(k) JGLDA-RBFSVM

763173278365

(l) RP-RBFSVM

Figure 5 Classified map generated by different dimension reduction methods where the overall accuracy kappa coefficient and averageclassification accuracy are listed at the top of each map respectively

steady in the case of linear SVM (Figure 3(d)) Note that thesituation is improved in polynomial SVM where the kappanumerical value of the proposed PD-LFDA is significantlybetter than the others All the achievements demonstrate therobustness of our contribution in PD-LFDA Simultaneouslyit is noticeable that LPP exhibits an average kappa levelThe kappa value gained by LPP is not seriously bad and isnot dramatically good The kappa results produced by RPare approximately the same as LPP A significant advantageof RP is the simple construction and computation wherethe accuracy is closed to LPP More details are summarizedin Table 3 It can be concluded that the kappa coefficientof proposed algorithm is higher than the other approacheswhich is more appropriate for the classification of HSI data

The visual results of all methods are presented in Figures5sim6 where the class labels are converted to pseudocolor

image The pseudocolor image of the hyperspectral imagefrom Indian 92AV3C database is shown in Figure 4(a) Theavailable labeled image which represents the truth groundis illustrated in Figure 4(b) where the labels are made byhuman The training samples are selected from the labeledimage represented as points in the image as shown inFigure 4(c) Each label number (ID) corresponds to eachclass name which is indexed in Table 1 In this experimentall the available labeled samples are used for testing whileapproximate 10 of samples are used for training Thesubspace is fixed to 13 (the number here is only used forreference it can be changed) For each experiment thedimension from original feature space is reduced to theobjective dimensionality thereafter the classified maps areinduced by 7NN classifier and RBF-SVM classifier Theoverall accuracy kappa coefficient and average accuracy are

Mathematical Problems in Engineering 15

837981698991

(a) PD-LFDA-7NN

848682798968

(b) D-LFDA-RBFSVM

Figure 6 Classified map of the proposed method

Table 2 Overall accuracy obtained using different dimensions offeature and different classifiers for Indian Pine Scene database ()

Item Dims7 9 11 13 15

3NNPCA 7007 6996 7026 7035 7087LPP 6575 6871 6835 6695 6546LFDA 7513 7895 7963 7992 8009LDA 6415 6606 6616 6596 6536PD-LFDA 7713 8094 816 8218 8263JGLDA 5405 5602 5704 5642 5751RP 6173 6444 6745 6632 6652

7NNPCA 6902 6902 691 6933 6972LPP 6782 7057 7027 6892 6681LFDA 7419 7777 7827 7863 7862LDA 6615 6909 6899 692 6882PD-LFDA 7736 809 8155 8172 8235JGLDA 5665 5811 5809 5812 5914RP 6194 6481 6696 6613 6603

RBF-SVMPCA 8051 7794 7718 7935 8176LPP 7101 7514 7614 7501 7314LFDA 7826 8187 82 8113 7915LDA 677 6834 6877 6642 6792PD-LFDA 7858 8267 8395 8423 8166JGLDA 5686 5606 5663 5689 5879RP 6957 7282 7441 7416 7543

also included at the top of each map respectively Figure 5displays the classifiedmaps of classicmethods in pseudocolorimages and the classified maps of proposed PD-LFDA arepresented in Figure 6 It can be observed from these mapsthat the best performance is achieved by PD-LFDA when7NN classifier is used in this case the overall accuracyis 8379 the kappa coefficient is 8169 and the averageaccuracy is 8991 Moreover the worst algorithm is JGLDAwhose overall accuracy is 6195 the kappa coefficient

Table 3 Kappa coefficient by different dimension reduction meth-ods and different classifiers applied to Indian Pine Scene database()

Item Dims7 9 11 13 15

3NNPCA 6592 658 6612 6624 6681LPP 6084 6422 6382 6219 605LFDA 7145 7586 7664 7697 7718LDA 5913 6126 6137 6118 6048PD-LFDA 7392 7824 7901 7966 8016JGLDA 4778 4992 5082 5006 5119RP 564 5963 6303 6172 6192

7NNPCA 6466 6466 6475 6501 6545LPP 6304 6629 6585 6429 6185LFDA 7047 7457 7515 7557 7557LDA 6132 646 6453 6478 6431PD-LFDA 7417 782 7893 7913 7985JGLDA 5029 519 5161 5159 5263RP 5638 5992 6237 6135 6124

RBF-SVMPCA 7767 747 7393 7636 7900LPP 6677 715 7264 7115 6921LFDA 7486 7915 7924 7822 7584LDA 629 6362 6427 6153 6323PD-LFDA 7551 8011 8159 8187 7899JGLDA 5109 5066 5133 5147 5369RP 653 6908 7085 706 7203

becomes 5651 and the average accuracy is only 6209Other methods such as PCA LPP and RP can produce thecomparable results and no one can outperform the otherHowever LDAoutperformsPCA LPP andRP yielding betterresult Similar conclusions can be achieved in the groupof RBF-SVM Generally proposed PD-LFDA significantlyoutperforms the rest in this experiment which indicates thecorrectness of improvements in proposed PD-LFDA

16 Mathematical Problems in Engineering

Table 4 Performance of dimension reduction on the whole labeled samples ()

Evaluated item MethodologiesPCA LDA LPP LFDA JGLDA RP PD-LFDA

7NNOverall accuracy 6868 739 6875 7923 6195 6709 8379Kappa coefficient 6471 7041 647 7653 5651 6294 8169Average accuracy 737 813 7607 8609 6209 7164 8991

RBF-SVMOverall accuracy 7992 7287 7674 8375 5824 7631 8486Kappa coefficient 7728 6928 7355 8151 5377 7327 8279Average accuracy 857 8071 8288 8822 7121 8365 8968

Finally details of assessment deduced by 7NN and RBF-SVM is summarized in Table 4 where the correspondingoverall accuracy kappa coefficient and average accuracy ofdifferent methods can be located collectively

5 Conclusions

In this paper we have analyzed local Fisher discriminantanalysis (LFDA) and found its weakness By replacing themaximum distance with local variance for the constructionof weight matrix introducing class prior probability into thecomputation of affinitymatrix an improved LFDA algorithmhas been proposed This novel approach is called PD-LFDAbecause the probability distribution (PD) is applied in LFDAalgorithm The proposed approach essentially can increasethe discriminant ability of transformed features in lowdimensional space The pattern found by the new approachis expected to be more accurate and coincides with thecharacter of HSI data and is conducive to classify HSI dataPD-LFDA has been evaluated on a real removing sensingAVIRIS Indian Pine 92AV3C data set We have compared theperformance of the proposed PD-LFDA with that of PCALPP LFDA LDA JGLDA and RP Both the numerical resultsand visual inspection of class maps have been obtained Inthe experiments KNN classifier and SVM classifier havebeen used We have argued that the proposed PD-LFDAexhibits the best performance and serves as a very effectivedimensionality reduction tool for high dimensional datasuch as hyperspectral image (HSI) data

Appendix

Procedure of Proposed Algorithm

The brief description of the algorithm to perform the pro-posed PD-LFDA method is already presented in Section 3The details of the algorithm are provided in Algorithm 2

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work was supported by the Research of University ofMacau under Grants no MYRG205(Y1-L4)-FST11-TYY noMYRG187(Y1-L3)-FST11-TYY and no SRG010-FST11-TYYand by the National Natural Science Foundation of Chinaunder Grant no 61273244 This research project was alsosupported by the Science andTechnologyDevelopment Fund(FDCT) of Macau under Contract no 100-2012-A3

References

[1] W Li S Prasad Z Ye J E Fowler and M Cui ldquoLocality-preserving discriminant analysis for hyperspectral image clas-sification using local spatial informationrdquo in Proceedings ofthe 32nd IEEE International Geoscience and Remote SensingSymposium (IGARSS rsquo12) pp 4134ndash4137MunichGermany July2012

[2] HND LeM S Kim andD-H Kim ldquoComparison of singularvalue decomposition and principal component analysis appliedto hyperspectral imaging of biofilmrdquo in Proceedings of the IEEEPhotonics Conference (IPC rsquo12) pp 6ndash7 2012

[3] C K Chui and J Wang ldquoRandomized anisotropic transformfor nonlinear dimensionality reductionrdquo GEMmdashInternationalJournal on Geomathematics vol 1 no 1 pp 23ndash50 2010

[4] T V Bandos L Bruzzone and G Camps-Valls ldquoClassificationof hyperspectral images with regularized linear discriminantanalysisrdquo IEEE Transactions on Geoscience and Remote Sensingvol 47 no 3 pp 862ndash873 2009

[5] D Guangjun Z Yongsheng and J Song ldquoDimensionalityreduction of hyperspectral data based on ISOMAP algorithmrdquoin Proceedings of the 8th International Conference on ElectronicMeasurement and Instruments (ICEMI rsquo07) pp 3935ndash3938Xirsquoan China August 2007

[6] X Luo and M-F Jiang ldquoThe application of manifold learn-ing in dimensionality analysis for hyperspectral imageryrdquo inProceedings of the International Conference on Remote SensingEnvironment and Transportation Engineering (RSETE rsquo11) pp4572ndash4575 June 2011

[7] J Khodr and R Younes ldquoDimensionality reduction on hyper-spectral images a comparative review based on artificial datasrdquoin Proceedings of the 4th International Congress on Image andSignal Processing (CISP rsquo11) vol 4 pp 1875ndash1883 October 2011

Mathematical Problems in Engineering 17

[8] J Wen Z Tian H She and W Yan ldquoFeature extraction ofhyperspectral images based on preserving neighborhood dis-criminant embeddingrdquo in Proceedings of the 2nd InternationalConference on Image Analysis and Signal Processing (IASP rsquo10)pp 257ndash262 Zhejiang China April 2010

[9] Y-R Yeh S-Y Huang and Y-J Lee ldquoNonlinear dimensionreduction with kernel sliced inverse regressionrdquo IEEE Trans-actions on Knowledge and Data Engineering vol 21 no 11 pp1590ndash1603 2009

[10] J He L Zhang Q Wang and Z Li ldquoUsing diffusion geometriccoordinates for hyperspectral imagery representationrdquo IEEEGeoscience and Remote Sensing Letters vol 6 no 4 pp 767ndash7712009

[11] J Peng P Zhang and N Riedel ldquoDiscriminant learninganalysisrdquo IEEE Transactions on Systems Man and CyberneticsPart B Cybernetics vol 38 no 6 pp 1614ndash1625 2008

[12] F S Tsai and K L Chan ldquoDimensionality reduction techniquesfor data explorationrdquo in Proceedings of the 6th InternationalConference on Information Communications and Signal Process-ing pp 1ndash5 December 2007

[13] M D Farrell Jr and R M Mersereau ldquoOn the impact of PCAdimension reduction for hyperspectral detection of difficulttargetsrdquo IEEE Geoscience and Remote Sensing Letters vol 2 no2 pp 192ndash195 2005

[14] S Prasad and L M Bruce ldquoLimitations of principal com-ponents analysis for hyperspectral target recognitionrdquo IEEEGeoscience and Remote Sensing Letters vol 5 no 4 pp 625ndash629 2008

[15] J Yu Q Tian T Rui and T S Huang ldquoIntegrating discriminantand descriptive information for dimension reduction and clas-sificationrdquo IEEE Transactions on Circuits and Systems for VideoTechnology vol 17 no 3 pp 372ndash377 2007

[16] J Kong S Wang J Wang L Ma B Fu and Y Lu ldquoAnovel approach for face recognition based on supervisedlocality preserving projection and maximummargin criterionrdquoin Proceedings of the International Conference on ComputerEngineering and Technology (ICCET rsquo09) vol 1 pp 419ndash423Singapore January 2009

[17] M Loog and R P W Duin ldquoLinear dimensionality reductionvia a heteroscedastic extension of LDA the Chernoff criterionrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 26 no 6 pp 732ndash739 2004

[18] A C Jensen A Berge and A S Solberg ldquoRegressionapproaches to small sample inverse covariance matrix estima-tion for hyperspectral image classificationrdquo IEEE TransactionsonGeoscience andRemote Sensing vol 46 no 10 pp 2814ndash28222008

[19] J Jin BWang and L Zhang ldquoA novel approach based on Fisherdiscriminant null space for decomposition of mixed pixels inhyperspectral imageryrdquo IEEE Geoscience and Remote SensingLetters vol 7 no 4 pp 699ndash703 2010

[20] J Wen Z Tian X Liu and W Lin ldquoNeighborhood preservingorthogonal pnmf feature extraction for hyperspectral imageclassificationrdquo IEEE Journal of Selected Topics in Applied EarthObservations and Remote Sensing vol 6 no 2 pp 759ndash7682013

[21] Y Ren G Zhang G Yu and X Li ldquoLocal and global structurepreserving based feature selectionrdquoNeurocomputing vol 89 pp147ndash157 2012

[22] Z Fan Y Xu and D Zhang ldquoLocal linear discriminant analysisframework using sample neighborsrdquo IEEE Transactions onNeural Networks vol 22 no 7 pp 1119ndash1132 2011

[23] Y Wang S Huang D Liu and B Wang ldquoResearch advanceon band selection-based dimension reduction of hyperspectralremote sensing imagesrdquo in Proceedings of the 2nd InternationalConference on Remote Sensing Environment and TransportationEngineering (RSETE rsquo12) pp 1ndash4 IEEE Nanjing China June2012

[24] B Waske S van der Linden J A Benediktsson A Rabe andP Hostert ldquoSensitivity of support vector machines to randomfeature selection in classification of hyperspectral datardquo IEEETransactions on Geoscience and Remote Sensing vol 48 no 7pp 2880ndash2889 2010

[25] M Sugiyama ldquoLocal fisher discriminant analysis for superviseddimensionality reductionrdquo in Proceedings of the 23rd Interna-tional Conference onMachine Learning (ICML rsquo06) pp 905ndash912ACM June 2006

[26] Y Cheng ldquoMean shift mode seeking and clusteringrdquo IEEETransactions on Pattern Analysis and Machine Intelligence vol17 no 8 pp 790ndash799 1995

[27] W Li S Prasad J E Fowler and L M Bruce ldquoLocality-preserving dimensionality reduction and classification forhyperspectral image analysisrdquo IEEE Transactions on Geoscienceand Remote Sensing vol 50 no 4 pp 1185ndash1198 2012

[28] L Zelnik-Manor and P Perona ldquoSelf-tuning spectral cluster-ingrdquo in Proceedings of the 18th Annual Conference on NeuralInformation Processing Systems pp 1601ndash1608 December 2004

[29] W K Wong and H T Zhao ldquoSupervised optimal localitypreserving projectionrdquo Pattern Recognition vol 45 no 1 pp186ndash197 2012

[30] X He and P Niyogi ldquoLocality preserving projectionsrdquo inAdvances in Neural Information Processing Systems 16 SThrunL Saul and B Scholkopf Eds MIT Press Cambridge MassUSA 2004

[31] H Wang S Chen Z Hu and W Zheng ldquoLocality-preservedmaximum information projectionrdquo IEEE Transactions on Neu-ral Networks vol 19 no 4 pp 571ndash585 2008

[32] M Sugiyama ldquoDimensionality reduction ofmultimodal labeleddata by local fisher discriminant analysisrdquo Journal of MachineLearning Research vol 8 pp 1027ndash1061 2007

[33] S Yan D Xu B Zhang H J Zhang Q Yang and S Lin ldquoGraphembedding and extensions a general framework for dimen-sionality reductionrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 29 no 1 pp 40ndash51 2007

[34] C K Chui and J Wang ldquoDimensionality reduction of hyper-spectral imagery data for feature classificationrdquo in Handbookof Geomathematics pp 1005ndash1047 Springer Heidelberg Ger-many 2010

[35] C-C Chang and C-J Lin ldquoLIBSVM a Library for supportvector machinesrdquo ACM Transactions on Intelligent Systems andTechnology vol 2 no 3 article 27 2011

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 5: Research Article Subspace Learning via Local Probability ...downloads.hindawi.com/journals/mpe/2015/145136.pdf · and applicable toolkits were engendered one a er another. Hyperspectral

Mathematical Problems in Engineering 5

the spectrum space but also in the spatial space For a localpoint the calculation of making use of the diversity of its119870thnearest neighborhoods is not fully correct

An evident example is illustrated in Figure 1 where twogroups of points have different distributions In group (a)most neighbor points are closed to point 119909

0 while in group

(b) most neighbor points are far from point 1199090 However the

measurement of two cases are the same according to (19)Thiscan be found in Figure 1 where the distances between point1199090and its 119870th nearest neighborhoods (119870 = 7) are same in

both distributions which can be shown in Figures 1(a) and1(b)119871

1= 1198712This example indicates that the simplification of

local distribution by the distance between the sample 119909119894and

the 119870th nearest neighbor sample is unreasonable Actuallythe result by using of this simplification may raise someerrors

Based on the discussion above a novel approach whichis called PD-LFDA is proposed to overcome the weakness ofLFDA To be specific PD-LFDA incorporates two key pointsnamely

(1) The class prior probability is applied to compute theaffinity matrix

(2) The distribution of local patch is represented by theldquolocal variancerdquo instead of the ldquofarthest distancerdquo toconstruct the weight matrix

The proposed approach essentially increases the discriminantability of transformed features in low dimensional space Thepattern found by PD-LFDA is expected to be more accurateand coincids with the character of HSI data and is conduciveto classify HSI data

In this way a more sophisticated construction of affinitymatrix which is derived from [29] is proposed as follows

119882(119894 119895)

=

119901 (119897119894)2 exp(minus

10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

)

sdot(1 + exp(minus10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

)) if 119897119894= 119897119895= 119888

0 if 119897119894= 119897119895

(20)

where 119901(119897119894) stands for the class prior probability of class 119909

119894

and 120588119894indicates the local variance Note that the denominator

item of (13) is 1119873119888 which will cancel out our prior effectif we use 119901(119897

119894) to replace 119901(119897

119894)2 (the construction of 119901(119897

119894)

will be given in (21)) Different part of this derivation playsthe same role as the original formulation for example forthe last item on one hand it plays the role of intraclassdiscriminating weight and on the other hand the productresult of119882 may reach zero if the Euclidean square distance sdot is very small for some data points For this case an extraitem (1 + exp(minus119909

119894minus 1199091198952120588119894120588119895)) is added to the construc-

tion of intraclass discriminating weight to prevent accuracytruncation By doing so our derivation can be viewed as an

integration of class prior probability the local weight andthe discriminating weight This construction is expected topreserve both the local neighborhood structure and the classinformation Besides this construction is expected to sharethe same advantages detailed in the original work

It is clear that (20) consists of two new factors comparedwith LFDA method (1) class prior probability 119901(119897

119894) and (2)

local variance 120588119894

Suppose class 119909119894to be class 119888 that is 119897

119894= 119888 so that the

probability of class 119909119894can be calculated by

119901 (119897119894) = 119901 (119888) =

119873119888

119873 (21)

where 119873119888 is the number of the samples in class 119888 whole 119873denotes the total number of samples and119873 = sum119862

119888=1119873119888

Please note that the item (1+exp(minus119909119894minus1199091198952120588119894120588119895)) in (20)

is used to prevent the extra rounding error produced from thefirst two items and to keep the total value of

119901 (119897119894)2 exp(minus

10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

)(1 + exp(minus10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

))

(22)

which does not reach the minimum Here 120588lowastdenotes the

local scaling around 119909lowast In this paper a local scaling 119909

lowastis

measured by the standard deviation of local square distanceAssume that 119909(119894)

1 119909(119894)

2 119909

(119894)

119870are the119870-nearest samples of 119909

119894

and then the square distance between 119909119894and 119909(119894)

119896is given by

119889(119894)

119896=10038171003817100381710038171003817119909119894minus 119909(119894)

119896

10038171003817100381710038171003817

2

119896 = 1 2 119870 (23)

The corresponding mean (119894) can be defined as

(119894)=1

119870

119870

sum

119896=1

119889(119894)

119896

=1

119870

119870

sum

119896=1

10038171003817100381710038171003817119909119894minus 119909(119894)

119896

10038171003817100381710038171003817

2

(24)

where lowast 2 represents a square Euclidean distance and 119870 isa predefined parameter whose recommended value is 119870 = 7The standard deviation can be calculated as

120588119894= radic

1

119870

119870

sum

119896=1

(119889(119894)

119896minus (119894))

2

(25)

Note that in the above equation the item 1119870 becomesa constant that can be shifted outside Thus an equivalentformula is given by

120588119894=1

119870radic

119870

sum

119896=1

(119889(119894)

119896minus (119894))

2

(26)

Similar procedure can be deduced to 119909119895 Hence we have

120588119898=1

119870

119870

sum

119894=1

radic100381710038171003817100381710038171003817119909(119898119896)

119894

100381710038171003817100381710038171003817

2

minus1

119870

119870

sum

119895=1

100381710038171003817100381710038171003817119909(119898119896)

119895

100381710038171003817100381710038171003817

2

(27)

6 Mathematical Problems in Engineering

x1

x2

x3

x4

x5

x6

x7

L1

x0

(a)

x1

x2

x3

x4

x5

x6

x7

L2

x0

(b)

Figure 1 Different distributions of 1199090and the corresponding 119870-nearest neighborhoods (119870 = 7) (a) Most neighbors are closed to point 119909

0

(b) Most neighbors are far from point 1199090 The distances between point 119909

0and its 119870th nearest neighbors are the same in both distributions

1198711= 1198712

Comparing (19) with (27) it is noticeable that (28) holds

120588119898le 120588119894 (28)

Compared with the former definitions our definition has atleast the following advantages

(i) By incorporating the prior probability of each classwith local technique 119901(119897

119894) the proposed scheme is

expect to be a benefit for the classified accuracy(ii) The representation of local patches equation (26) is

described by local standard deviation 120588119894rather than

absolute diversity in (19) which is more accurate inmeasuring the local variance of data samples

(iii) Compared with the global calculation the proposedcalculation is taken on local patches which is efficientin getting rid of over-fitting

(iv) The proposed local scaling technique meets the char-acter of HSI data which is more applicable for theprocessing of hyperspectral image in real applications

Based on the above affinity defined an extended affinitymatrix can also be defined in a similar way Our definitiononly provides a heuristic exploration for reference Theaffinity can be further sparse for example by introducing theidea of 120576-nearest neighborhoods [31]

Theoptimal solution of improved scheme can be achievedby maximize the following criterion

119879PD-LFDA equiv arg max119879isinR119901times119902

tr (119879119879119878119887119879)

tr (119879119879119878119908119879)

(29)

It is evident that (29) has the similar form of (3) Thisfinding enlightens us that the transformation119879 can be simplyachieved by solving the generalized eigenvalue decomposi-tion of 119878minus1

119908119878119887 Moreover Let 119866 isin R119902times119902 be a 119902-dimensional

invertible square matrix It is clear that 119879PD-LFDA119866 is alsoan optimal solution of (29) This property indicates that

the optimal solution is not uniquely determined becauseof arbitrary arithmetic transformation of 119879PD-LFDA119866 Let 120593119894be the eigenvector of 119878minus1

119908119878119887corresponding to eigenvalue

119894

that is 119878119887120593119894= 119894119878119908120593119894 To cope with this issue a rescaling

procedure is adopted [25] Each eigenvector 120593119894119902

119894=1is rescaled

to satisfy the following constraint

120593119894119878119908120593119895= 1 if 119894 = 1198950 if 119894 = 119895

(30)

Then each eigenvector is weighted by the square root of itsassociated eigenvalue The transformed matrix 119879PD-LFDA ofthe proposed scheme is finally given by

119879PD-LFDA = radic11205931 radic21205932 radic1199021120593119902 isin R119901times119902 (31)

with descending order 1ge 2ge sdot sdot sdot ge

119902

For a new testing points 119909 the projected point in the newfeature space can be captured by 119910 = 119879119879PFDA119909 thus it can befurther analyzed in the transformed space

According to the above analysis we can design an algo-rithm which is called PD-LFDA Algorithm to perform ourproposed method The detailed description of this algorithmcan be found in the appendix (Algorithm 2) A summary ofthe calculation steps of PD-LFDA Algorithm is presented inAlgorithm 1

The advantage of PD-LFDA is discussed as followsFirstly to investigate the rank of the between-class scatter

matrix 119878119887of LDA 119878

119887can be rewritten as

119878119887=

119862

sum

119897=1

119873119897(119898119897minus 119898) (119898

119897minus 119898)119879

= [1198731(119898119897minus 119898) 119873

2(119898119897minus 119898) 119873

119871(119898119897minus 119898)]

sdot [1198731(119898119897minus 119898) 119873

2(119898119897minus 119898) 119873

119871(119898119897minus 119898)]

119879

(32)

Mathematical Problems in Engineering 7

Input HSI training samples119883 isin R119901times119873 dimensionality to be embedded 119902 the parameter 119870 of 119870NNand a test sample 119909

119905isin R119901

Step 1 For each sample 119909119894from the same class calculate the119882

119894119895by (20)

where the local scaling factor 120588119894is calculated via (26) or (27)

Step 2 Equations (13) and (14) can be globally and uniformly transformed into an equivalent formula via119908= 119882 sdot 119882

1

119887= 119882 sdot (119882

2minus1198821)

(i)

where the operator 119860 sdot 119861 denotes the dot product between 119860 and 119861 and

119882(119894 119895) =

119901(119897119894)2 exp(minus

10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

)(1 + exp(minus10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

))

if 119897119894= 119897119895= 119888

if 119897119894= 119897119895

0

(iia)

1198821=

1

119873119888 if 119897

119894= 119897119895= 119888

0 others(iib)

1198822=1

119873(1119873times1

11times119873) (iic)

By the above formulas the product of elements in different matrices can be achieved via dot productbetween matrices The equations (iia) (iib) and (iic) can be gained by integrating the number ofeach class119873119888 the number of total training samples119873 and the local scaling 120588

119894 then matrices119882119882

11198822

can be calculatedStep 3 Construct within-scatter matrix

119908and between-scatter matrix

119887 according to (i)

Step 4 Define Laplacian matrices 119871lowast below119871lowast= 119863lowastminus lowast (iii)

where119863lowast is the row sum or column sum of119882lowast119863lowast119894119894= sum119895lowast(119894 119895) (or119863lowast

119894119894= sum119894lowast(119894 119895)) and the

notation lowast denotes one letter in 119908 119887Step 5 On the basis of (29) and (30) the transformation matrix can be achieved viaeigenvectors 119879 = radic

11205931 radic21205932 radic

1199021120593119902 119879 isin R119901times119902 that corresponding the 119902 leading

eigenvalues 119887120593119894= 119894119908120593119894in solving the general problem of

119887120593119894= 119894119908120593119894

Step 6 For a testing sample 119909119905isin R119901 the extracted feature is 119911

119905= 119879119879119909119905isin R119902

Output Transformation matrix 119879 and the extracted feature 119911119905

Algorithm 1 PD-LFDA Algorithm

Thereby

rank (119878119887) ⩽ rank ([119873

1(119898119897minus 119898) 119873

2(119898119897minus 119898)

119873119871(119898119897minus 119898)]) ⩽ 119862 minus 1

(33)

It is easy to infer that the rank of the between-class scattermatrix 119878

119887is119862minus1 atmost thus there are up to119862minus1meaningful

subfeatures that can be extracted Thanks to the help ofaffinity matrix 119882 when compared with the conventionalLDA the reduced subspace of proposed PD-LFDA can beany subdimensional space On the other hand the classicallocal fisherrsquos linear discriminant only weights the value ofsample pairs in the same classes while our method also takesin account the sample pairs in different classes Hereafter theproposedmethod will bemore flexible and the results will bemore adaptiveThe objective function of proposed method isquite similar to the conventional LDA hereby the optimalsolution is almost same as the conventional LDA whichindicates that it is also simple to implement and easy to revise

To further explore the relationship of LDA and PD-LFDAwe now rewrite the objective function of LDAandPD-LFDA respectively

119879LDA = arg max119879isinR119901times119902

trace 119879119879119878119887119879

subject to 119879119879119878119908119879 = 119868

(34)

119879PD-LFDA equiv arg max119879isinR119901times119902

trace 119879119879119878119887119879

subject to 119879119879119878119908119879 = 119868

(35)

This implies that LDA tries to maximize the between-class scatter and simultaneously constraint the within-classscatter to a certain level However such restriction is hardto constraint and no relaxation is imposed When the data isnot a single modal that is multimodal or unknown modalLDA often fails On the other hand benefiting from theflexible designing of affinity matrix119882 PD-LFDA gains morefreedom in (35) That is the separability of PD-LFDA will bemore distinct and the degree of freedom remains more than

8 Mathematical Problems in Engineering

Input HSI data samples119883 = 1199091 1199092 119909

119873 isin R119901times119873 the objective dimension to be embedded 119902

the nearest neighbor parameter 119870 (default 119870 equiv 7) and the test sample 119909119905isin R119901

Output Transformation matrix 119879 isin R119901times119902Steps are as follows

(1) Initialize matrices(2) 119878119908larr 0119901times119901

within-class scatter(3) 119878119887larr 0119901times119901

between-class scatter(4)(5) Compute within-class affinity matrix119882119908(6) for 119888 = 1 2 119862 do in a classwise manner(7) 119909

119888

119894119873119888

119894=1larr 119909

119895| 119897119895= 119888 the 119888th class data samples

(8) 119883 larr 119909119888

1 119909119888

2 119909

119888

119873119888 sample matrix

(9) 119882119894=(1119873119888times111times119873119888)

119873119888

(10)(11) Determine the local scaling(12) for 119894 = 1 2 119873119888 do(13) 119909

(119894)

119896larr the 119896th nearest neighbor of 119909119888

119894 119896 = 1 2 119870

(14) for 119896 = 1 2 119870 do(15) 119889

(119894)

119896= 119909119894minus 119909(119894)

1198962

(16) end for

(17) (119894)larr1

119870

119870

sum

119896=1

10038171003817100381710038171003817119909119894minus 119909(119894)

119896

10038171003817100381710038171003817

2

(18) 120588(119894)= radic

1

119870

119870

sum

119896=1

(119889(119894)

119896minus (119894))2

(19) end for(20)(21) Define local affinity matrix(22) for 119894 119895 = 1 2 119873119888 do(23) 119901(119897

119894) larr 119873

119888119873 prior probability

(24) 119860119894119895larr 119901(119897

119894) exp(minus

10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

)(1 + exp(minus10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

))

(25) end for(26) 119860

119888= 119860

(27) end for(28)119882

119908= diag119882

11198822 119882

119862 in block diagonal manner

(29) 119860119908= diag119860

1 1198602 119860

119862 also in block diagonal manner

(30) for 119894 119895 = 1 2 119873 do(31)

119908(119894 119895) larr 119860

119908(119894 119895)119882

119908(119894 119895)

(32) end for(33)(34) Compute between-class affinity matrix119882119887

(35)119882119887larr(1119873times1

11times119873)

119873minus diag119882

11198822 119882

119862

(36) Let 119865nz denotes the nonzero flag of elements in119882119908 119865nz(119894 119895) = 1 if119882119908(119894 119895) = 0 119865nz(119894 119895) = 0 if119882119908(119894 119895) = 0

(37) 119865nz larr (1198821015840

119887= 0)

(38) 119860119887larr 0119873times119873

(39) 119860

119887(119865nz) = 119860119908(119865nz)

(40) 119860119887(not119865nz) = 1

(41) for 119894 119895 = 1 2 119873 do(42)

119887(119894 119895) larr 119860

119887(119894 119895)119882

119887(119894 119895)

(43) end for(44) Now construct Laplacian matrix for within affinity matrix

119908and between affinity matrix

119887

(45) Let(46)119863119908

119894119894= sum

119895

119908(119894 119895)119863119887

119894119894= sum

119895

119887(119894 119895)

(47) Then

Algorithm 2 Continued

Mathematical Problems in Engineering 9

(48) 119871119908= 119908minus 119863119908 119871119887= 119887minus 119863119887

(49) Construct two matrixs below(50) 119878

119887= 119883119871

119887119883119879 119878119908= 119883119871

119908119883119879

(51) Let 1205931 1205932 120593

119902 be the general eigenvector of

(52) 119878119887120593119894= 119894119878119908120593119894 forall119894 isin 1 2 119902

(53) with the corresponding eigenvalue in descending order 1ge 2ge sdot sdot sdot ge

119902

(54)(55) Finally the transformation matrix can be represented as(56) 119879 = radic120582

11205931 radic12058211205932 radic120582

1120593119902 isin R119901times119902

(57)(58) For a new test sample 119909

119905 the embedding 119911

119905is given by

(59) 119911119905= 119879119879119909119905isin R119902

Algorithm 2 Proposed PD-LFDA method

the conventional LDA thus our method is expected to bemore robust and significantly preponderant

For large scale data sets we discuss a scheme that canaccelerate the computation procedure of the within-scattermatrix 119878

119908 In our algorithm owning to the fact that we have

put penalty on the affinity matrix for different class samplesin constructing the between-scatter matrix the acceleratedprocedure will remain for further discussion

The within-class scatter 119878119908can be reformulated as

119878119908=1

2

119873

sum

119894119895=1

119908(119894 119895) (119909

119894minus 119909119895) (119909119894minus 119909119895)119879

=1

2

119873

sum

119894119895=1

119908(119894 119895) (119909

119894119909119879

119894+ 119909119895119909119879

119895minus 119909119894119909119879

119895minus 119909119895119909119879

119894)

=

119873

sum

119894=1

(

119873

sum

119895=1

119908(119894 119895))119909

119894119909119879

119894minus

119873

sum

119894119895=1

119908(119894 119895) 119909

119894119909119879

119895

= 119883(119863119908minus 119908)119883119879

= 119883119908119883119879

(36)

Here

119863119908 (119894 119894) =

119873

sum

119895=1

119908(119894 119895)

119908= 119863119908minus 119908

(37)

119908can be block diagonal if all samples 119909

119894119873

119894=1are sorted

according to their labels This property implies that 119863119908and

119908can also be block diagonal matrix Hence if we compute

119878119908

through (36) then the procedure will be much moreefficient Similarly 119878

119887can also be formulated as

119878119887= 119883

119887119883119879

= 119883(119863119887minus 119887)119883119879

(38)

Nevertheless 119887is dense and can not be further sim-

plified However the simplified computational procedure of119908

saves for us part of time in a way In this paper weadopt the above procedure to accelerate 119878

119908and pursue 119878

119887

normally In addition to locality structure some papers showthat another property for example marginal information isalso important and should be preserved in the reduced spaceThe theory of extended LDA and LPP algorithm is developedrapidly recently Yan et al [33] summarized these algorithmsin a graph embedding framework and also proposed amarginal fisher analysis embedding (MFA) algorithm underthis framework

In MFA the criterion is characterized by intraclasscompactness and interclass marginal superability which isreplaced for thewithin-class scatter and between-class scatterseverally The intraclass relationship is reflected by an intrin-sic graph which is constructed by 119870-nearest neighborhoodsample data points in the same class while the interclasssuperability is mirrored by a penalty graph computed formarginal points from different classes Following this ideathe intraclass compactness is given as follows

119878119894= sum

119894119895 119894isin119873(119896)(119895)or 119895isin119873(119896)(119894)

10038171003817100381710038171003817119879119879119909119894minus 119879119879119909119895

10038171003817100381710038171003817

2

= 2119879119879119883 (119863 minus119882)119883

119879119879

(39)

where

119882(119894 119895) = 1 if 119894 isin 119873(119896) (119895) or 119895 isin 119873(119896) (119894) 0 otherwise

(40)

Here 119873(119896)(119895) represents the 119870-nearest neighborhood indexset of 119909

119895from the same class and119863 is the row sum (or column

sum) of 119882 119863(119894 119894) = sum119894119882119894119895 Interclass separability is indi-

cated by a penalty graph whose term is expressed as follows

119878119890= sum

119894119895 (119894119895)isin119875(119896)(119897119895)

or (119894119895)isin119875(119896)(119897119894)

10038171003817100381710038171003817119879119879119909119894minus 119879119879119909119895

10038171003817100381710038171003817

2

= 2119879119879119883(119863 minus )119883

119879119879

(41)

10 Mathematical Problems in Engineering

of which

(119894 119895) = 1 if (119894 119895) isin 119875(119896) (119897

119895) or (119894 119895) isin 119875(119896) (119897

119894)

0 otherwise(42)

Note that 119878119894and 119878

119890are corresponding to ldquowithin-scatter

matrixrdquo and ldquobetween-scatter matrixrdquo of the traditional LDAalternatively The optimal solution of MFA can be achievedby solving the following minimization problem That is

= arg119879

min 119879119879119883(119863 minus119882)119883

119879119879

119879119879119883(119863 minus )119883119879119879

(43)

We know that (43) is also a general eigenvalue decompositionproblem Let 119879PCA indicate the transformation matrix fromthe original space to PCA subspace with certain energyremaining and then the final projection of MFA is output as

119879MFA = 119879PCA (44)

As can be seenMFA constructs twoweightedmatrices119882and according to the intraclass compactness and interclassseparability In LFDA and PD-LFDA only one affinity isconstructed The difference lies in that the ldquoweightrdquo in LFDAand PD-LFDA is in the range of [0 1] according to the levelof difference Yet MFA distributes the same weight to its 119870-nearest neighborhoodsThe optimal solution ofMFA LFDAand PD-LFDA can be attributed to a general eigenvaluedecomposition problem Hence the idea of MFA LFDA andPD-LFDA is approximately similar in a certain interpretationRelationship with other methodologies can be analyzed in ananalogous way

4 Experimental Results

To illustrate the performance of PD-LFDA experimentson a real hyperspectral remote sensing image data setmdashAVIRIS Indian Pine 1992 are conducted in this sectionThe AVIRIS Indian Pines 1992 data set was gathered byNational Aeronautics and Space Administrator (NASA) withAirborne VisibleInfrared Imaging Spectrometer (AVIRIS)sensor over the Indian Pine test site in northwest Indians inJune 1992 This data set consists of 145 times 145 pixels and 224spectral reflectance bands ranging from 04 120583m to 245 120583mwith a spatial resolution of 20m The Indian Pines scene iscomposed of two-third agriculture and one-third forest orother natural perennial vegetation Some other landmarkssuch as dual lane highways rail line low density housing andsmaller roads are also in this image Since the scene was takenin June some main crops for example soybeans and cornare in their early growth stage with less than 5 coveragewhile the no-till min-till and clean-till indicate the amountof previous crop residue remaining The region map can bereferred to Figure 5(a) The 20 water absorption bands (ie[108ndash112 154ndash167] 224) were discarded

In this section performance of different dimensionreduction methods that is PCA LPP LFDA LDA JGLDAand RP [34] is compared with the proposed PD-LFDA

Table 1 Training set in AVIRIS Indian Pines 1992 database

ID Class Name1 Alfalfa (4618 3913)2 Corn-notill (1428136 952)3 Corn-mintill (83087 1048)4 Corn (23734 1434)5 Grass-pasture (48354 1118)6 Grass-trees (73071 973)7 Grass-pasture-mowed (2817 6071)8 Hay-windrowed (47850 1046)9 Oats (2015 7500)10 Soybean-notill (97286 884)11 Soybean-mintill (2455214 872)12 Soybean-clean (59354 911)13 Wheat (20528 1366)14 Woods (1265102 806)15 Buildings-Grass-Trees-Drives (38639 1010)16 Stone-Steel-Towers (9324 2581)Total 102491029 1004Numerical value in this table refers to number of samples number oftraining samples and pc respectively

Classification accuracy is reported via the concrete classifiersGenerally many dimension reduction research papers adopt119870-nearest neighborhood classifier (KNN) and support vectormachine (SVM) classifier to measure the performance of theextracting features after the dimension reduction where theoverall accuracy and kappa coefficient are detailed in thereports Hereby in this paper we also adopt KNN classifierand SVM classifier for performance measurement In KNNclassifier we select the value of 119870 as 1 5 and 9 so thatthree classifiers based on nearest neighborhoods are formedwhich are called 1NN 5NN and 9NN In SVM classifier weseek a hyperplane to separate classes in kernel-induced spacewhere the linear nonseparable classes in the original featurespace can be separated via kernel trick SVM as a robustand successful classifier has been widely used to evaluatethe performance of multifarious methods in many areas Forsimplicity and convenience we use LIBSVM package [35] forexperiments Accuracy of dimension reduced performancewill be reported by classified performance from SVM clas-sifier In the following schedule the feature subspace will becalculated at the first step from training samples by differentdimensional algorithms Table 1 gives a numerical statisticsof training samples corresponding to each class Then thenew sample will be projected into a low subspace by thetransformedmatrix Finally all the new samples are classifiedby SVM classifier

In this experiment a total of 1029 samples were selectedfor training and the remaining samples are used for testingNote that all the labeled samples in database are unbalancedand the available samples of each category differ dramaticallyThe following strategy is imposed for sample division A fixednumber of 15 samples are randomly selected to form thetraining sample yet the absent samples are randomly selected

Mathematical Problems in Engineering 11

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

(a) 1NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

(b) 5NN

7 9 11 13 15Reduced space

0

01

02

03

04

05

06

07

08

09

1

Ove

rall

accu

racy

(c) 9NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

(d) Liner SVM

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

PCALPPLFDA

LDAPD-LFDAJGLDA

RP

(e) Polynomial SVM

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

PCALPPLFDA

LDAPD-LFDAJGLDA

RP

(f) RBF-SVM

Figure 2 Overall accuracy by different dimension reduction methods and different classifiers applied to AVIRIS Indian Pines database

12 Mathematical Problems in Engineering

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

(a) 1NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

(b) 5NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

(c) 9NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

(d) Linear SVM

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

PCALPPLFDA

LDAPD-LFDAJGLDA

RP

(e) Polynomial SVM

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

PCALPPLFDA

LDAPD-LFDAJGLDA

RP

(f) RBF SVM

Figure 3 Kappa coefficient by different dimension reduction methods and different classifiers applied to AVIRIS Indian Pines database

Mathematical Problems in Engineering 13

(a) Pseudo 3-channel color image band = [12 79 140]

1

2

2

2

2

2 2

3

3

3

3

3

4

5

5

5

5

6

6 6 6

7

8

9

10

10

10 10

11

11

11

11

11 12

12 12

13 14

14

14 15

15

16

(b) Ground truth

(c) Distribution of training sample

Figure 4 Illustration of sample partition

from the remaining samples Under this strategy the trainingsamples and testing samples are listed in Table 1

Figure 2 shows the overall accuracy of different dimen-sion reduction methods applied to AVIRIS Indian 92AV3Cdata set The neighborhood of KNN classifier is selected as 15 and 9 respectively which produces three classifiers thatis 1NN 5NN and 9NN Three different kernel functionsare adopted for SVM classifier Then derived classifiers arealso used in this experiment that is linear SVM polynomialSVM and RBFSVM It can be deduced from Figures 2(a)sim2(c) that when the embedding space is greater than 5 pro-posed PD-LFDA performs the best while JGLDA performsthe worst The results produced by RP is slightly better thanJGLDA PCA LDA LPP and RP show the similar classifiedresults under KNN classifierThat is the proposed PD-LFDAoutperforms the others Meantime compared with LFDAthe proposed PD-LFDA leads to 2 more improvementson average Moreover it can be observed from (d) that theclassified accuracy increases steadily as the embedded spaceincreases However LDA demonstrates the highest overallaccuracywhen the reduced features vary to 9 while LFDAhas

the significant improvements when the number of reducedfeatures is greater than 9 This phenomenon of Figure 2(d)indicates the instability of linear SVM Nevertheless thesituation reversed for polynomial SVM and RBF SVM inFigures 2(e) and 2(f) wherein the proposed PD-LFDAwins a little improvement against LFDA and has significantimprovement compared with the others Inspired effects ofproposed PD-LFDA algorithm were achieved in all casesFurthermore Table 2 gives the detailed overall accuracyunder different feature dimensions using 3NN 7NN andRBFSVM classifiers which validates the feasibility of theproposed scheme in this paper

Figure 3 displays the results of kappa coefficient obtainedusing the different dimension reduction algorithms underKNN classifier and SVM classifiers The experimental cir-cumstance of Figure 3 is same as that of Figure 2 We canfind that from these results JGLDA performs the worst inmost cases except in Figure 3(e) The proposed PD-LFDAmethod outperforms the other methods and achieves thehighest kappa numerical value in most cases except usingthe linear SVM as the classifier In fact none of them work

14 Mathematical Problems in Engineering

686864717360

(a) PCA-7NN

687564707607

(b) LPP-7NN

792376538609

(c) LFDA-7NN

739070418130

(d) LDA-7NN

619556516209

(e) JGLDA-7NN

670962947164

(f) RP-7NN

799277288570

(g) PCA-RBFSVM

767473558288

(h) LPP-RBFSVM

837581518822

(i) LFDA-RBFSVM

728769288071

(j) LDA-RBFSVM

582453777121

(k) JGLDA-RBFSVM

763173278365

(l) RP-RBFSVM

Figure 5 Classified map generated by different dimension reduction methods where the overall accuracy kappa coefficient and averageclassification accuracy are listed at the top of each map respectively

steady in the case of linear SVM (Figure 3(d)) Note that thesituation is improved in polynomial SVM where the kappanumerical value of the proposed PD-LFDA is significantlybetter than the others All the achievements demonstrate therobustness of our contribution in PD-LFDA Simultaneouslyit is noticeable that LPP exhibits an average kappa levelThe kappa value gained by LPP is not seriously bad and isnot dramatically good The kappa results produced by RPare approximately the same as LPP A significant advantageof RP is the simple construction and computation wherethe accuracy is closed to LPP More details are summarizedin Table 3 It can be concluded that the kappa coefficientof proposed algorithm is higher than the other approacheswhich is more appropriate for the classification of HSI data

The visual results of all methods are presented in Figures5sim6 where the class labels are converted to pseudocolor

image The pseudocolor image of the hyperspectral imagefrom Indian 92AV3C database is shown in Figure 4(a) Theavailable labeled image which represents the truth groundis illustrated in Figure 4(b) where the labels are made byhuman The training samples are selected from the labeledimage represented as points in the image as shown inFigure 4(c) Each label number (ID) corresponds to eachclass name which is indexed in Table 1 In this experimentall the available labeled samples are used for testing whileapproximate 10 of samples are used for training Thesubspace is fixed to 13 (the number here is only used forreference it can be changed) For each experiment thedimension from original feature space is reduced to theobjective dimensionality thereafter the classified maps areinduced by 7NN classifier and RBF-SVM classifier Theoverall accuracy kappa coefficient and average accuracy are

Mathematical Problems in Engineering 15

837981698991

(a) PD-LFDA-7NN

848682798968

(b) D-LFDA-RBFSVM

Figure 6 Classified map of the proposed method

Table 2 Overall accuracy obtained using different dimensions offeature and different classifiers for Indian Pine Scene database ()

Item Dims7 9 11 13 15

3NNPCA 7007 6996 7026 7035 7087LPP 6575 6871 6835 6695 6546LFDA 7513 7895 7963 7992 8009LDA 6415 6606 6616 6596 6536PD-LFDA 7713 8094 816 8218 8263JGLDA 5405 5602 5704 5642 5751RP 6173 6444 6745 6632 6652

7NNPCA 6902 6902 691 6933 6972LPP 6782 7057 7027 6892 6681LFDA 7419 7777 7827 7863 7862LDA 6615 6909 6899 692 6882PD-LFDA 7736 809 8155 8172 8235JGLDA 5665 5811 5809 5812 5914RP 6194 6481 6696 6613 6603

RBF-SVMPCA 8051 7794 7718 7935 8176LPP 7101 7514 7614 7501 7314LFDA 7826 8187 82 8113 7915LDA 677 6834 6877 6642 6792PD-LFDA 7858 8267 8395 8423 8166JGLDA 5686 5606 5663 5689 5879RP 6957 7282 7441 7416 7543

also included at the top of each map respectively Figure 5displays the classifiedmaps of classicmethods in pseudocolorimages and the classified maps of proposed PD-LFDA arepresented in Figure 6 It can be observed from these mapsthat the best performance is achieved by PD-LFDA when7NN classifier is used in this case the overall accuracyis 8379 the kappa coefficient is 8169 and the averageaccuracy is 8991 Moreover the worst algorithm is JGLDAwhose overall accuracy is 6195 the kappa coefficient

Table 3 Kappa coefficient by different dimension reduction meth-ods and different classifiers applied to Indian Pine Scene database()

Item Dims7 9 11 13 15

3NNPCA 6592 658 6612 6624 6681LPP 6084 6422 6382 6219 605LFDA 7145 7586 7664 7697 7718LDA 5913 6126 6137 6118 6048PD-LFDA 7392 7824 7901 7966 8016JGLDA 4778 4992 5082 5006 5119RP 564 5963 6303 6172 6192

7NNPCA 6466 6466 6475 6501 6545LPP 6304 6629 6585 6429 6185LFDA 7047 7457 7515 7557 7557LDA 6132 646 6453 6478 6431PD-LFDA 7417 782 7893 7913 7985JGLDA 5029 519 5161 5159 5263RP 5638 5992 6237 6135 6124

RBF-SVMPCA 7767 747 7393 7636 7900LPP 6677 715 7264 7115 6921LFDA 7486 7915 7924 7822 7584LDA 629 6362 6427 6153 6323PD-LFDA 7551 8011 8159 8187 7899JGLDA 5109 5066 5133 5147 5369RP 653 6908 7085 706 7203

becomes 5651 and the average accuracy is only 6209Other methods such as PCA LPP and RP can produce thecomparable results and no one can outperform the otherHowever LDAoutperformsPCA LPP andRP yielding betterresult Similar conclusions can be achieved in the groupof RBF-SVM Generally proposed PD-LFDA significantlyoutperforms the rest in this experiment which indicates thecorrectness of improvements in proposed PD-LFDA

16 Mathematical Problems in Engineering

Table 4 Performance of dimension reduction on the whole labeled samples ()

Evaluated item MethodologiesPCA LDA LPP LFDA JGLDA RP PD-LFDA

7NNOverall accuracy 6868 739 6875 7923 6195 6709 8379Kappa coefficient 6471 7041 647 7653 5651 6294 8169Average accuracy 737 813 7607 8609 6209 7164 8991

RBF-SVMOverall accuracy 7992 7287 7674 8375 5824 7631 8486Kappa coefficient 7728 6928 7355 8151 5377 7327 8279Average accuracy 857 8071 8288 8822 7121 8365 8968

Finally details of assessment deduced by 7NN and RBF-SVM is summarized in Table 4 where the correspondingoverall accuracy kappa coefficient and average accuracy ofdifferent methods can be located collectively

5 Conclusions

In this paper we have analyzed local Fisher discriminantanalysis (LFDA) and found its weakness By replacing themaximum distance with local variance for the constructionof weight matrix introducing class prior probability into thecomputation of affinitymatrix an improved LFDA algorithmhas been proposed This novel approach is called PD-LFDAbecause the probability distribution (PD) is applied in LFDAalgorithm The proposed approach essentially can increasethe discriminant ability of transformed features in lowdimensional space The pattern found by the new approachis expected to be more accurate and coincides with thecharacter of HSI data and is conducive to classify HSI dataPD-LFDA has been evaluated on a real removing sensingAVIRIS Indian Pine 92AV3C data set We have compared theperformance of the proposed PD-LFDA with that of PCALPP LFDA LDA JGLDA and RP Both the numerical resultsand visual inspection of class maps have been obtained Inthe experiments KNN classifier and SVM classifier havebeen used We have argued that the proposed PD-LFDAexhibits the best performance and serves as a very effectivedimensionality reduction tool for high dimensional datasuch as hyperspectral image (HSI) data

Appendix

Procedure of Proposed Algorithm

The brief description of the algorithm to perform the pro-posed PD-LFDA method is already presented in Section 3The details of the algorithm are provided in Algorithm 2

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work was supported by the Research of University ofMacau under Grants no MYRG205(Y1-L4)-FST11-TYY noMYRG187(Y1-L3)-FST11-TYY and no SRG010-FST11-TYYand by the National Natural Science Foundation of Chinaunder Grant no 61273244 This research project was alsosupported by the Science andTechnologyDevelopment Fund(FDCT) of Macau under Contract no 100-2012-A3

References

[1] W Li S Prasad Z Ye J E Fowler and M Cui ldquoLocality-preserving discriminant analysis for hyperspectral image clas-sification using local spatial informationrdquo in Proceedings ofthe 32nd IEEE International Geoscience and Remote SensingSymposium (IGARSS rsquo12) pp 4134ndash4137MunichGermany July2012

[2] HND LeM S Kim andD-H Kim ldquoComparison of singularvalue decomposition and principal component analysis appliedto hyperspectral imaging of biofilmrdquo in Proceedings of the IEEEPhotonics Conference (IPC rsquo12) pp 6ndash7 2012

[3] C K Chui and J Wang ldquoRandomized anisotropic transformfor nonlinear dimensionality reductionrdquo GEMmdashInternationalJournal on Geomathematics vol 1 no 1 pp 23ndash50 2010

[4] T V Bandos L Bruzzone and G Camps-Valls ldquoClassificationof hyperspectral images with regularized linear discriminantanalysisrdquo IEEE Transactions on Geoscience and Remote Sensingvol 47 no 3 pp 862ndash873 2009

[5] D Guangjun Z Yongsheng and J Song ldquoDimensionalityreduction of hyperspectral data based on ISOMAP algorithmrdquoin Proceedings of the 8th International Conference on ElectronicMeasurement and Instruments (ICEMI rsquo07) pp 3935ndash3938Xirsquoan China August 2007

[6] X Luo and M-F Jiang ldquoThe application of manifold learn-ing in dimensionality analysis for hyperspectral imageryrdquo inProceedings of the International Conference on Remote SensingEnvironment and Transportation Engineering (RSETE rsquo11) pp4572ndash4575 June 2011

[7] J Khodr and R Younes ldquoDimensionality reduction on hyper-spectral images a comparative review based on artificial datasrdquoin Proceedings of the 4th International Congress on Image andSignal Processing (CISP rsquo11) vol 4 pp 1875ndash1883 October 2011

Mathematical Problems in Engineering 17

[8] J Wen Z Tian H She and W Yan ldquoFeature extraction ofhyperspectral images based on preserving neighborhood dis-criminant embeddingrdquo in Proceedings of the 2nd InternationalConference on Image Analysis and Signal Processing (IASP rsquo10)pp 257ndash262 Zhejiang China April 2010

[9] Y-R Yeh S-Y Huang and Y-J Lee ldquoNonlinear dimensionreduction with kernel sliced inverse regressionrdquo IEEE Trans-actions on Knowledge and Data Engineering vol 21 no 11 pp1590ndash1603 2009

[10] J He L Zhang Q Wang and Z Li ldquoUsing diffusion geometriccoordinates for hyperspectral imagery representationrdquo IEEEGeoscience and Remote Sensing Letters vol 6 no 4 pp 767ndash7712009

[11] J Peng P Zhang and N Riedel ldquoDiscriminant learninganalysisrdquo IEEE Transactions on Systems Man and CyberneticsPart B Cybernetics vol 38 no 6 pp 1614ndash1625 2008

[12] F S Tsai and K L Chan ldquoDimensionality reduction techniquesfor data explorationrdquo in Proceedings of the 6th InternationalConference on Information Communications and Signal Process-ing pp 1ndash5 December 2007

[13] M D Farrell Jr and R M Mersereau ldquoOn the impact of PCAdimension reduction for hyperspectral detection of difficulttargetsrdquo IEEE Geoscience and Remote Sensing Letters vol 2 no2 pp 192ndash195 2005

[14] S Prasad and L M Bruce ldquoLimitations of principal com-ponents analysis for hyperspectral target recognitionrdquo IEEEGeoscience and Remote Sensing Letters vol 5 no 4 pp 625ndash629 2008

[15] J Yu Q Tian T Rui and T S Huang ldquoIntegrating discriminantand descriptive information for dimension reduction and clas-sificationrdquo IEEE Transactions on Circuits and Systems for VideoTechnology vol 17 no 3 pp 372ndash377 2007

[16] J Kong S Wang J Wang L Ma B Fu and Y Lu ldquoAnovel approach for face recognition based on supervisedlocality preserving projection and maximummargin criterionrdquoin Proceedings of the International Conference on ComputerEngineering and Technology (ICCET rsquo09) vol 1 pp 419ndash423Singapore January 2009

[17] M Loog and R P W Duin ldquoLinear dimensionality reductionvia a heteroscedastic extension of LDA the Chernoff criterionrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 26 no 6 pp 732ndash739 2004

[18] A C Jensen A Berge and A S Solberg ldquoRegressionapproaches to small sample inverse covariance matrix estima-tion for hyperspectral image classificationrdquo IEEE TransactionsonGeoscience andRemote Sensing vol 46 no 10 pp 2814ndash28222008

[19] J Jin BWang and L Zhang ldquoA novel approach based on Fisherdiscriminant null space for decomposition of mixed pixels inhyperspectral imageryrdquo IEEE Geoscience and Remote SensingLetters vol 7 no 4 pp 699ndash703 2010

[20] J Wen Z Tian X Liu and W Lin ldquoNeighborhood preservingorthogonal pnmf feature extraction for hyperspectral imageclassificationrdquo IEEE Journal of Selected Topics in Applied EarthObservations and Remote Sensing vol 6 no 2 pp 759ndash7682013

[21] Y Ren G Zhang G Yu and X Li ldquoLocal and global structurepreserving based feature selectionrdquoNeurocomputing vol 89 pp147ndash157 2012

[22] Z Fan Y Xu and D Zhang ldquoLocal linear discriminant analysisframework using sample neighborsrdquo IEEE Transactions onNeural Networks vol 22 no 7 pp 1119ndash1132 2011

[23] Y Wang S Huang D Liu and B Wang ldquoResearch advanceon band selection-based dimension reduction of hyperspectralremote sensing imagesrdquo in Proceedings of the 2nd InternationalConference on Remote Sensing Environment and TransportationEngineering (RSETE rsquo12) pp 1ndash4 IEEE Nanjing China June2012

[24] B Waske S van der Linden J A Benediktsson A Rabe andP Hostert ldquoSensitivity of support vector machines to randomfeature selection in classification of hyperspectral datardquo IEEETransactions on Geoscience and Remote Sensing vol 48 no 7pp 2880ndash2889 2010

[25] M Sugiyama ldquoLocal fisher discriminant analysis for superviseddimensionality reductionrdquo in Proceedings of the 23rd Interna-tional Conference onMachine Learning (ICML rsquo06) pp 905ndash912ACM June 2006

[26] Y Cheng ldquoMean shift mode seeking and clusteringrdquo IEEETransactions on Pattern Analysis and Machine Intelligence vol17 no 8 pp 790ndash799 1995

[27] W Li S Prasad J E Fowler and L M Bruce ldquoLocality-preserving dimensionality reduction and classification forhyperspectral image analysisrdquo IEEE Transactions on Geoscienceand Remote Sensing vol 50 no 4 pp 1185ndash1198 2012

[28] L Zelnik-Manor and P Perona ldquoSelf-tuning spectral cluster-ingrdquo in Proceedings of the 18th Annual Conference on NeuralInformation Processing Systems pp 1601ndash1608 December 2004

[29] W K Wong and H T Zhao ldquoSupervised optimal localitypreserving projectionrdquo Pattern Recognition vol 45 no 1 pp186ndash197 2012

[30] X He and P Niyogi ldquoLocality preserving projectionsrdquo inAdvances in Neural Information Processing Systems 16 SThrunL Saul and B Scholkopf Eds MIT Press Cambridge MassUSA 2004

[31] H Wang S Chen Z Hu and W Zheng ldquoLocality-preservedmaximum information projectionrdquo IEEE Transactions on Neu-ral Networks vol 19 no 4 pp 571ndash585 2008

[32] M Sugiyama ldquoDimensionality reduction ofmultimodal labeleddata by local fisher discriminant analysisrdquo Journal of MachineLearning Research vol 8 pp 1027ndash1061 2007

[33] S Yan D Xu B Zhang H J Zhang Q Yang and S Lin ldquoGraphembedding and extensions a general framework for dimen-sionality reductionrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 29 no 1 pp 40ndash51 2007

[34] C K Chui and J Wang ldquoDimensionality reduction of hyper-spectral imagery data for feature classificationrdquo in Handbookof Geomathematics pp 1005ndash1047 Springer Heidelberg Ger-many 2010

[35] C-C Chang and C-J Lin ldquoLIBSVM a Library for supportvector machinesrdquo ACM Transactions on Intelligent Systems andTechnology vol 2 no 3 article 27 2011

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 6: Research Article Subspace Learning via Local Probability ...downloads.hindawi.com/journals/mpe/2015/145136.pdf · and applicable toolkits were engendered one a er another. Hyperspectral

6 Mathematical Problems in Engineering

x1

x2

x3

x4

x5

x6

x7

L1

x0

(a)

x1

x2

x3

x4

x5

x6

x7

L2

x0

(b)

Figure 1 Different distributions of 1199090and the corresponding 119870-nearest neighborhoods (119870 = 7) (a) Most neighbors are closed to point 119909

0

(b) Most neighbors are far from point 1199090 The distances between point 119909

0and its 119870th nearest neighbors are the same in both distributions

1198711= 1198712

Comparing (19) with (27) it is noticeable that (28) holds

120588119898le 120588119894 (28)

Compared with the former definitions our definition has atleast the following advantages

(i) By incorporating the prior probability of each classwith local technique 119901(119897

119894) the proposed scheme is

expect to be a benefit for the classified accuracy(ii) The representation of local patches equation (26) is

described by local standard deviation 120588119894rather than

absolute diversity in (19) which is more accurate inmeasuring the local variance of data samples

(iii) Compared with the global calculation the proposedcalculation is taken on local patches which is efficientin getting rid of over-fitting

(iv) The proposed local scaling technique meets the char-acter of HSI data which is more applicable for theprocessing of hyperspectral image in real applications

Based on the above affinity defined an extended affinitymatrix can also be defined in a similar way Our definitiononly provides a heuristic exploration for reference Theaffinity can be further sparse for example by introducing theidea of 120576-nearest neighborhoods [31]

Theoptimal solution of improved scheme can be achievedby maximize the following criterion

119879PD-LFDA equiv arg max119879isinR119901times119902

tr (119879119879119878119887119879)

tr (119879119879119878119908119879)

(29)

It is evident that (29) has the similar form of (3) Thisfinding enlightens us that the transformation119879 can be simplyachieved by solving the generalized eigenvalue decomposi-tion of 119878minus1

119908119878119887 Moreover Let 119866 isin R119902times119902 be a 119902-dimensional

invertible square matrix It is clear that 119879PD-LFDA119866 is alsoan optimal solution of (29) This property indicates that

the optimal solution is not uniquely determined becauseof arbitrary arithmetic transformation of 119879PD-LFDA119866 Let 120593119894be the eigenvector of 119878minus1

119908119878119887corresponding to eigenvalue

119894

that is 119878119887120593119894= 119894119878119908120593119894 To cope with this issue a rescaling

procedure is adopted [25] Each eigenvector 120593119894119902

119894=1is rescaled

to satisfy the following constraint

120593119894119878119908120593119895= 1 if 119894 = 1198950 if 119894 = 119895

(30)

Then each eigenvector is weighted by the square root of itsassociated eigenvalue The transformed matrix 119879PD-LFDA ofthe proposed scheme is finally given by

119879PD-LFDA = radic11205931 radic21205932 radic1199021120593119902 isin R119901times119902 (31)

with descending order 1ge 2ge sdot sdot sdot ge

119902

For a new testing points 119909 the projected point in the newfeature space can be captured by 119910 = 119879119879PFDA119909 thus it can befurther analyzed in the transformed space

According to the above analysis we can design an algo-rithm which is called PD-LFDA Algorithm to perform ourproposed method The detailed description of this algorithmcan be found in the appendix (Algorithm 2) A summary ofthe calculation steps of PD-LFDA Algorithm is presented inAlgorithm 1

The advantage of PD-LFDA is discussed as followsFirstly to investigate the rank of the between-class scatter

matrix 119878119887of LDA 119878

119887can be rewritten as

119878119887=

119862

sum

119897=1

119873119897(119898119897minus 119898) (119898

119897minus 119898)119879

= [1198731(119898119897minus 119898) 119873

2(119898119897minus 119898) 119873

119871(119898119897minus 119898)]

sdot [1198731(119898119897minus 119898) 119873

2(119898119897minus 119898) 119873

119871(119898119897minus 119898)]

119879

(32)

Mathematical Problems in Engineering 7

Input HSI training samples119883 isin R119901times119873 dimensionality to be embedded 119902 the parameter 119870 of 119870NNand a test sample 119909

119905isin R119901

Step 1 For each sample 119909119894from the same class calculate the119882

119894119895by (20)

where the local scaling factor 120588119894is calculated via (26) or (27)

Step 2 Equations (13) and (14) can be globally and uniformly transformed into an equivalent formula via119908= 119882 sdot 119882

1

119887= 119882 sdot (119882

2minus1198821)

(i)

where the operator 119860 sdot 119861 denotes the dot product between 119860 and 119861 and

119882(119894 119895) =

119901(119897119894)2 exp(minus

10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

)(1 + exp(minus10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

))

if 119897119894= 119897119895= 119888

if 119897119894= 119897119895

0

(iia)

1198821=

1

119873119888 if 119897

119894= 119897119895= 119888

0 others(iib)

1198822=1

119873(1119873times1

11times119873) (iic)

By the above formulas the product of elements in different matrices can be achieved via dot productbetween matrices The equations (iia) (iib) and (iic) can be gained by integrating the number ofeach class119873119888 the number of total training samples119873 and the local scaling 120588

119894 then matrices119882119882

11198822

can be calculatedStep 3 Construct within-scatter matrix

119908and between-scatter matrix

119887 according to (i)

Step 4 Define Laplacian matrices 119871lowast below119871lowast= 119863lowastminus lowast (iii)

where119863lowast is the row sum or column sum of119882lowast119863lowast119894119894= sum119895lowast(119894 119895) (or119863lowast

119894119894= sum119894lowast(119894 119895)) and the

notation lowast denotes one letter in 119908 119887Step 5 On the basis of (29) and (30) the transformation matrix can be achieved viaeigenvectors 119879 = radic

11205931 radic21205932 radic

1199021120593119902 119879 isin R119901times119902 that corresponding the 119902 leading

eigenvalues 119887120593119894= 119894119908120593119894in solving the general problem of

119887120593119894= 119894119908120593119894

Step 6 For a testing sample 119909119905isin R119901 the extracted feature is 119911

119905= 119879119879119909119905isin R119902

Output Transformation matrix 119879 and the extracted feature 119911119905

Algorithm 1 PD-LFDA Algorithm

Thereby

rank (119878119887) ⩽ rank ([119873

1(119898119897minus 119898) 119873

2(119898119897minus 119898)

119873119871(119898119897minus 119898)]) ⩽ 119862 minus 1

(33)

It is easy to infer that the rank of the between-class scattermatrix 119878

119887is119862minus1 atmost thus there are up to119862minus1meaningful

subfeatures that can be extracted Thanks to the help ofaffinity matrix 119882 when compared with the conventionalLDA the reduced subspace of proposed PD-LFDA can beany subdimensional space On the other hand the classicallocal fisherrsquos linear discriminant only weights the value ofsample pairs in the same classes while our method also takesin account the sample pairs in different classes Hereafter theproposedmethod will bemore flexible and the results will bemore adaptiveThe objective function of proposed method isquite similar to the conventional LDA hereby the optimalsolution is almost same as the conventional LDA whichindicates that it is also simple to implement and easy to revise

To further explore the relationship of LDA and PD-LFDAwe now rewrite the objective function of LDAandPD-LFDA respectively

119879LDA = arg max119879isinR119901times119902

trace 119879119879119878119887119879

subject to 119879119879119878119908119879 = 119868

(34)

119879PD-LFDA equiv arg max119879isinR119901times119902

trace 119879119879119878119887119879

subject to 119879119879119878119908119879 = 119868

(35)

This implies that LDA tries to maximize the between-class scatter and simultaneously constraint the within-classscatter to a certain level However such restriction is hardto constraint and no relaxation is imposed When the data isnot a single modal that is multimodal or unknown modalLDA often fails On the other hand benefiting from theflexible designing of affinity matrix119882 PD-LFDA gains morefreedom in (35) That is the separability of PD-LFDA will bemore distinct and the degree of freedom remains more than

8 Mathematical Problems in Engineering

Input HSI data samples119883 = 1199091 1199092 119909

119873 isin R119901times119873 the objective dimension to be embedded 119902

the nearest neighbor parameter 119870 (default 119870 equiv 7) and the test sample 119909119905isin R119901

Output Transformation matrix 119879 isin R119901times119902Steps are as follows

(1) Initialize matrices(2) 119878119908larr 0119901times119901

within-class scatter(3) 119878119887larr 0119901times119901

between-class scatter(4)(5) Compute within-class affinity matrix119882119908(6) for 119888 = 1 2 119862 do in a classwise manner(7) 119909

119888

119894119873119888

119894=1larr 119909

119895| 119897119895= 119888 the 119888th class data samples

(8) 119883 larr 119909119888

1 119909119888

2 119909

119888

119873119888 sample matrix

(9) 119882119894=(1119873119888times111times119873119888)

119873119888

(10)(11) Determine the local scaling(12) for 119894 = 1 2 119873119888 do(13) 119909

(119894)

119896larr the 119896th nearest neighbor of 119909119888

119894 119896 = 1 2 119870

(14) for 119896 = 1 2 119870 do(15) 119889

(119894)

119896= 119909119894minus 119909(119894)

1198962

(16) end for

(17) (119894)larr1

119870

119870

sum

119896=1

10038171003817100381710038171003817119909119894minus 119909(119894)

119896

10038171003817100381710038171003817

2

(18) 120588(119894)= radic

1

119870

119870

sum

119896=1

(119889(119894)

119896minus (119894))2

(19) end for(20)(21) Define local affinity matrix(22) for 119894 119895 = 1 2 119873119888 do(23) 119901(119897

119894) larr 119873

119888119873 prior probability

(24) 119860119894119895larr 119901(119897

119894) exp(minus

10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

)(1 + exp(minus10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

))

(25) end for(26) 119860

119888= 119860

(27) end for(28)119882

119908= diag119882

11198822 119882

119862 in block diagonal manner

(29) 119860119908= diag119860

1 1198602 119860

119862 also in block diagonal manner

(30) for 119894 119895 = 1 2 119873 do(31)

119908(119894 119895) larr 119860

119908(119894 119895)119882

119908(119894 119895)

(32) end for(33)(34) Compute between-class affinity matrix119882119887

(35)119882119887larr(1119873times1

11times119873)

119873minus diag119882

11198822 119882

119862

(36) Let 119865nz denotes the nonzero flag of elements in119882119908 119865nz(119894 119895) = 1 if119882119908(119894 119895) = 0 119865nz(119894 119895) = 0 if119882119908(119894 119895) = 0

(37) 119865nz larr (1198821015840

119887= 0)

(38) 119860119887larr 0119873times119873

(39) 119860

119887(119865nz) = 119860119908(119865nz)

(40) 119860119887(not119865nz) = 1

(41) for 119894 119895 = 1 2 119873 do(42)

119887(119894 119895) larr 119860

119887(119894 119895)119882

119887(119894 119895)

(43) end for(44) Now construct Laplacian matrix for within affinity matrix

119908and between affinity matrix

119887

(45) Let(46)119863119908

119894119894= sum

119895

119908(119894 119895)119863119887

119894119894= sum

119895

119887(119894 119895)

(47) Then

Algorithm 2 Continued

Mathematical Problems in Engineering 9

(48) 119871119908= 119908minus 119863119908 119871119887= 119887minus 119863119887

(49) Construct two matrixs below(50) 119878

119887= 119883119871

119887119883119879 119878119908= 119883119871

119908119883119879

(51) Let 1205931 1205932 120593

119902 be the general eigenvector of

(52) 119878119887120593119894= 119894119878119908120593119894 forall119894 isin 1 2 119902

(53) with the corresponding eigenvalue in descending order 1ge 2ge sdot sdot sdot ge

119902

(54)(55) Finally the transformation matrix can be represented as(56) 119879 = radic120582

11205931 radic12058211205932 radic120582

1120593119902 isin R119901times119902

(57)(58) For a new test sample 119909

119905 the embedding 119911

119905is given by

(59) 119911119905= 119879119879119909119905isin R119902

Algorithm 2 Proposed PD-LFDA method

the conventional LDA thus our method is expected to bemore robust and significantly preponderant

For large scale data sets we discuss a scheme that canaccelerate the computation procedure of the within-scattermatrix 119878

119908 In our algorithm owning to the fact that we have

put penalty on the affinity matrix for different class samplesin constructing the between-scatter matrix the acceleratedprocedure will remain for further discussion

The within-class scatter 119878119908can be reformulated as

119878119908=1

2

119873

sum

119894119895=1

119908(119894 119895) (119909

119894minus 119909119895) (119909119894minus 119909119895)119879

=1

2

119873

sum

119894119895=1

119908(119894 119895) (119909

119894119909119879

119894+ 119909119895119909119879

119895minus 119909119894119909119879

119895minus 119909119895119909119879

119894)

=

119873

sum

119894=1

(

119873

sum

119895=1

119908(119894 119895))119909

119894119909119879

119894minus

119873

sum

119894119895=1

119908(119894 119895) 119909

119894119909119879

119895

= 119883(119863119908minus 119908)119883119879

= 119883119908119883119879

(36)

Here

119863119908 (119894 119894) =

119873

sum

119895=1

119908(119894 119895)

119908= 119863119908minus 119908

(37)

119908can be block diagonal if all samples 119909

119894119873

119894=1are sorted

according to their labels This property implies that 119863119908and

119908can also be block diagonal matrix Hence if we compute

119878119908

through (36) then the procedure will be much moreefficient Similarly 119878

119887can also be formulated as

119878119887= 119883

119887119883119879

= 119883(119863119887minus 119887)119883119879

(38)

Nevertheless 119887is dense and can not be further sim-

plified However the simplified computational procedure of119908

saves for us part of time in a way In this paper weadopt the above procedure to accelerate 119878

119908and pursue 119878

119887

normally In addition to locality structure some papers showthat another property for example marginal information isalso important and should be preserved in the reduced spaceThe theory of extended LDA and LPP algorithm is developedrapidly recently Yan et al [33] summarized these algorithmsin a graph embedding framework and also proposed amarginal fisher analysis embedding (MFA) algorithm underthis framework

In MFA the criterion is characterized by intraclasscompactness and interclass marginal superability which isreplaced for thewithin-class scatter and between-class scatterseverally The intraclass relationship is reflected by an intrin-sic graph which is constructed by 119870-nearest neighborhoodsample data points in the same class while the interclasssuperability is mirrored by a penalty graph computed formarginal points from different classes Following this ideathe intraclass compactness is given as follows

119878119894= sum

119894119895 119894isin119873(119896)(119895)or 119895isin119873(119896)(119894)

10038171003817100381710038171003817119879119879119909119894minus 119879119879119909119895

10038171003817100381710038171003817

2

= 2119879119879119883 (119863 minus119882)119883

119879119879

(39)

where

119882(119894 119895) = 1 if 119894 isin 119873(119896) (119895) or 119895 isin 119873(119896) (119894) 0 otherwise

(40)

Here 119873(119896)(119895) represents the 119870-nearest neighborhood indexset of 119909

119895from the same class and119863 is the row sum (or column

sum) of 119882 119863(119894 119894) = sum119894119882119894119895 Interclass separability is indi-

cated by a penalty graph whose term is expressed as follows

119878119890= sum

119894119895 (119894119895)isin119875(119896)(119897119895)

or (119894119895)isin119875(119896)(119897119894)

10038171003817100381710038171003817119879119879119909119894minus 119879119879119909119895

10038171003817100381710038171003817

2

= 2119879119879119883(119863 minus )119883

119879119879

(41)

10 Mathematical Problems in Engineering

of which

(119894 119895) = 1 if (119894 119895) isin 119875(119896) (119897

119895) or (119894 119895) isin 119875(119896) (119897

119894)

0 otherwise(42)

Note that 119878119894and 119878

119890are corresponding to ldquowithin-scatter

matrixrdquo and ldquobetween-scatter matrixrdquo of the traditional LDAalternatively The optimal solution of MFA can be achievedby solving the following minimization problem That is

= arg119879

min 119879119879119883(119863 minus119882)119883

119879119879

119879119879119883(119863 minus )119883119879119879

(43)

We know that (43) is also a general eigenvalue decompositionproblem Let 119879PCA indicate the transformation matrix fromthe original space to PCA subspace with certain energyremaining and then the final projection of MFA is output as

119879MFA = 119879PCA (44)

As can be seenMFA constructs twoweightedmatrices119882and according to the intraclass compactness and interclassseparability In LFDA and PD-LFDA only one affinity isconstructed The difference lies in that the ldquoweightrdquo in LFDAand PD-LFDA is in the range of [0 1] according to the levelof difference Yet MFA distributes the same weight to its 119870-nearest neighborhoodsThe optimal solution ofMFA LFDAand PD-LFDA can be attributed to a general eigenvaluedecomposition problem Hence the idea of MFA LFDA andPD-LFDA is approximately similar in a certain interpretationRelationship with other methodologies can be analyzed in ananalogous way

4 Experimental Results

To illustrate the performance of PD-LFDA experimentson a real hyperspectral remote sensing image data setmdashAVIRIS Indian Pine 1992 are conducted in this sectionThe AVIRIS Indian Pines 1992 data set was gathered byNational Aeronautics and Space Administrator (NASA) withAirborne VisibleInfrared Imaging Spectrometer (AVIRIS)sensor over the Indian Pine test site in northwest Indians inJune 1992 This data set consists of 145 times 145 pixels and 224spectral reflectance bands ranging from 04 120583m to 245 120583mwith a spatial resolution of 20m The Indian Pines scene iscomposed of two-third agriculture and one-third forest orother natural perennial vegetation Some other landmarkssuch as dual lane highways rail line low density housing andsmaller roads are also in this image Since the scene was takenin June some main crops for example soybeans and cornare in their early growth stage with less than 5 coveragewhile the no-till min-till and clean-till indicate the amountof previous crop residue remaining The region map can bereferred to Figure 5(a) The 20 water absorption bands (ie[108ndash112 154ndash167] 224) were discarded

In this section performance of different dimensionreduction methods that is PCA LPP LFDA LDA JGLDAand RP [34] is compared with the proposed PD-LFDA

Table 1 Training set in AVIRIS Indian Pines 1992 database

ID Class Name1 Alfalfa (4618 3913)2 Corn-notill (1428136 952)3 Corn-mintill (83087 1048)4 Corn (23734 1434)5 Grass-pasture (48354 1118)6 Grass-trees (73071 973)7 Grass-pasture-mowed (2817 6071)8 Hay-windrowed (47850 1046)9 Oats (2015 7500)10 Soybean-notill (97286 884)11 Soybean-mintill (2455214 872)12 Soybean-clean (59354 911)13 Wheat (20528 1366)14 Woods (1265102 806)15 Buildings-Grass-Trees-Drives (38639 1010)16 Stone-Steel-Towers (9324 2581)Total 102491029 1004Numerical value in this table refers to number of samples number oftraining samples and pc respectively

Classification accuracy is reported via the concrete classifiersGenerally many dimension reduction research papers adopt119870-nearest neighborhood classifier (KNN) and support vectormachine (SVM) classifier to measure the performance of theextracting features after the dimension reduction where theoverall accuracy and kappa coefficient are detailed in thereports Hereby in this paper we also adopt KNN classifierand SVM classifier for performance measurement In KNNclassifier we select the value of 119870 as 1 5 and 9 so thatthree classifiers based on nearest neighborhoods are formedwhich are called 1NN 5NN and 9NN In SVM classifier weseek a hyperplane to separate classes in kernel-induced spacewhere the linear nonseparable classes in the original featurespace can be separated via kernel trick SVM as a robustand successful classifier has been widely used to evaluatethe performance of multifarious methods in many areas Forsimplicity and convenience we use LIBSVM package [35] forexperiments Accuracy of dimension reduced performancewill be reported by classified performance from SVM clas-sifier In the following schedule the feature subspace will becalculated at the first step from training samples by differentdimensional algorithms Table 1 gives a numerical statisticsof training samples corresponding to each class Then thenew sample will be projected into a low subspace by thetransformedmatrix Finally all the new samples are classifiedby SVM classifier

In this experiment a total of 1029 samples were selectedfor training and the remaining samples are used for testingNote that all the labeled samples in database are unbalancedand the available samples of each category differ dramaticallyThe following strategy is imposed for sample division A fixednumber of 15 samples are randomly selected to form thetraining sample yet the absent samples are randomly selected

Mathematical Problems in Engineering 11

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

(a) 1NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

(b) 5NN

7 9 11 13 15Reduced space

0

01

02

03

04

05

06

07

08

09

1

Ove

rall

accu

racy

(c) 9NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

(d) Liner SVM

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

PCALPPLFDA

LDAPD-LFDAJGLDA

RP

(e) Polynomial SVM

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

PCALPPLFDA

LDAPD-LFDAJGLDA

RP

(f) RBF-SVM

Figure 2 Overall accuracy by different dimension reduction methods and different classifiers applied to AVIRIS Indian Pines database

12 Mathematical Problems in Engineering

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

(a) 1NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

(b) 5NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

(c) 9NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

(d) Linear SVM

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

PCALPPLFDA

LDAPD-LFDAJGLDA

RP

(e) Polynomial SVM

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

PCALPPLFDA

LDAPD-LFDAJGLDA

RP

(f) RBF SVM

Figure 3 Kappa coefficient by different dimension reduction methods and different classifiers applied to AVIRIS Indian Pines database

Mathematical Problems in Engineering 13

(a) Pseudo 3-channel color image band = [12 79 140]

1

2

2

2

2

2 2

3

3

3

3

3

4

5

5

5

5

6

6 6 6

7

8

9

10

10

10 10

11

11

11

11

11 12

12 12

13 14

14

14 15

15

16

(b) Ground truth

(c) Distribution of training sample

Figure 4 Illustration of sample partition

from the remaining samples Under this strategy the trainingsamples and testing samples are listed in Table 1

Figure 2 shows the overall accuracy of different dimen-sion reduction methods applied to AVIRIS Indian 92AV3Cdata set The neighborhood of KNN classifier is selected as 15 and 9 respectively which produces three classifiers thatis 1NN 5NN and 9NN Three different kernel functionsare adopted for SVM classifier Then derived classifiers arealso used in this experiment that is linear SVM polynomialSVM and RBFSVM It can be deduced from Figures 2(a)sim2(c) that when the embedding space is greater than 5 pro-posed PD-LFDA performs the best while JGLDA performsthe worst The results produced by RP is slightly better thanJGLDA PCA LDA LPP and RP show the similar classifiedresults under KNN classifierThat is the proposed PD-LFDAoutperforms the others Meantime compared with LFDAthe proposed PD-LFDA leads to 2 more improvementson average Moreover it can be observed from (d) that theclassified accuracy increases steadily as the embedded spaceincreases However LDA demonstrates the highest overallaccuracywhen the reduced features vary to 9 while LFDAhas

the significant improvements when the number of reducedfeatures is greater than 9 This phenomenon of Figure 2(d)indicates the instability of linear SVM Nevertheless thesituation reversed for polynomial SVM and RBF SVM inFigures 2(e) and 2(f) wherein the proposed PD-LFDAwins a little improvement against LFDA and has significantimprovement compared with the others Inspired effects ofproposed PD-LFDA algorithm were achieved in all casesFurthermore Table 2 gives the detailed overall accuracyunder different feature dimensions using 3NN 7NN andRBFSVM classifiers which validates the feasibility of theproposed scheme in this paper

Figure 3 displays the results of kappa coefficient obtainedusing the different dimension reduction algorithms underKNN classifier and SVM classifiers The experimental cir-cumstance of Figure 3 is same as that of Figure 2 We canfind that from these results JGLDA performs the worst inmost cases except in Figure 3(e) The proposed PD-LFDAmethod outperforms the other methods and achieves thehighest kappa numerical value in most cases except usingthe linear SVM as the classifier In fact none of them work

14 Mathematical Problems in Engineering

686864717360

(a) PCA-7NN

687564707607

(b) LPP-7NN

792376538609

(c) LFDA-7NN

739070418130

(d) LDA-7NN

619556516209

(e) JGLDA-7NN

670962947164

(f) RP-7NN

799277288570

(g) PCA-RBFSVM

767473558288

(h) LPP-RBFSVM

837581518822

(i) LFDA-RBFSVM

728769288071

(j) LDA-RBFSVM

582453777121

(k) JGLDA-RBFSVM

763173278365

(l) RP-RBFSVM

Figure 5 Classified map generated by different dimension reduction methods where the overall accuracy kappa coefficient and averageclassification accuracy are listed at the top of each map respectively

steady in the case of linear SVM (Figure 3(d)) Note that thesituation is improved in polynomial SVM where the kappanumerical value of the proposed PD-LFDA is significantlybetter than the others All the achievements demonstrate therobustness of our contribution in PD-LFDA Simultaneouslyit is noticeable that LPP exhibits an average kappa levelThe kappa value gained by LPP is not seriously bad and isnot dramatically good The kappa results produced by RPare approximately the same as LPP A significant advantageof RP is the simple construction and computation wherethe accuracy is closed to LPP More details are summarizedin Table 3 It can be concluded that the kappa coefficientof proposed algorithm is higher than the other approacheswhich is more appropriate for the classification of HSI data

The visual results of all methods are presented in Figures5sim6 where the class labels are converted to pseudocolor

image The pseudocolor image of the hyperspectral imagefrom Indian 92AV3C database is shown in Figure 4(a) Theavailable labeled image which represents the truth groundis illustrated in Figure 4(b) where the labels are made byhuman The training samples are selected from the labeledimage represented as points in the image as shown inFigure 4(c) Each label number (ID) corresponds to eachclass name which is indexed in Table 1 In this experimentall the available labeled samples are used for testing whileapproximate 10 of samples are used for training Thesubspace is fixed to 13 (the number here is only used forreference it can be changed) For each experiment thedimension from original feature space is reduced to theobjective dimensionality thereafter the classified maps areinduced by 7NN classifier and RBF-SVM classifier Theoverall accuracy kappa coefficient and average accuracy are

Mathematical Problems in Engineering 15

837981698991

(a) PD-LFDA-7NN

848682798968

(b) D-LFDA-RBFSVM

Figure 6 Classified map of the proposed method

Table 2 Overall accuracy obtained using different dimensions offeature and different classifiers for Indian Pine Scene database ()

Item Dims7 9 11 13 15

3NNPCA 7007 6996 7026 7035 7087LPP 6575 6871 6835 6695 6546LFDA 7513 7895 7963 7992 8009LDA 6415 6606 6616 6596 6536PD-LFDA 7713 8094 816 8218 8263JGLDA 5405 5602 5704 5642 5751RP 6173 6444 6745 6632 6652

7NNPCA 6902 6902 691 6933 6972LPP 6782 7057 7027 6892 6681LFDA 7419 7777 7827 7863 7862LDA 6615 6909 6899 692 6882PD-LFDA 7736 809 8155 8172 8235JGLDA 5665 5811 5809 5812 5914RP 6194 6481 6696 6613 6603

RBF-SVMPCA 8051 7794 7718 7935 8176LPP 7101 7514 7614 7501 7314LFDA 7826 8187 82 8113 7915LDA 677 6834 6877 6642 6792PD-LFDA 7858 8267 8395 8423 8166JGLDA 5686 5606 5663 5689 5879RP 6957 7282 7441 7416 7543

also included at the top of each map respectively Figure 5displays the classifiedmaps of classicmethods in pseudocolorimages and the classified maps of proposed PD-LFDA arepresented in Figure 6 It can be observed from these mapsthat the best performance is achieved by PD-LFDA when7NN classifier is used in this case the overall accuracyis 8379 the kappa coefficient is 8169 and the averageaccuracy is 8991 Moreover the worst algorithm is JGLDAwhose overall accuracy is 6195 the kappa coefficient

Table 3 Kappa coefficient by different dimension reduction meth-ods and different classifiers applied to Indian Pine Scene database()

Item Dims7 9 11 13 15

3NNPCA 6592 658 6612 6624 6681LPP 6084 6422 6382 6219 605LFDA 7145 7586 7664 7697 7718LDA 5913 6126 6137 6118 6048PD-LFDA 7392 7824 7901 7966 8016JGLDA 4778 4992 5082 5006 5119RP 564 5963 6303 6172 6192

7NNPCA 6466 6466 6475 6501 6545LPP 6304 6629 6585 6429 6185LFDA 7047 7457 7515 7557 7557LDA 6132 646 6453 6478 6431PD-LFDA 7417 782 7893 7913 7985JGLDA 5029 519 5161 5159 5263RP 5638 5992 6237 6135 6124

RBF-SVMPCA 7767 747 7393 7636 7900LPP 6677 715 7264 7115 6921LFDA 7486 7915 7924 7822 7584LDA 629 6362 6427 6153 6323PD-LFDA 7551 8011 8159 8187 7899JGLDA 5109 5066 5133 5147 5369RP 653 6908 7085 706 7203

becomes 5651 and the average accuracy is only 6209Other methods such as PCA LPP and RP can produce thecomparable results and no one can outperform the otherHowever LDAoutperformsPCA LPP andRP yielding betterresult Similar conclusions can be achieved in the groupof RBF-SVM Generally proposed PD-LFDA significantlyoutperforms the rest in this experiment which indicates thecorrectness of improvements in proposed PD-LFDA

16 Mathematical Problems in Engineering

Table 4 Performance of dimension reduction on the whole labeled samples ()

Evaluated item MethodologiesPCA LDA LPP LFDA JGLDA RP PD-LFDA

7NNOverall accuracy 6868 739 6875 7923 6195 6709 8379Kappa coefficient 6471 7041 647 7653 5651 6294 8169Average accuracy 737 813 7607 8609 6209 7164 8991

RBF-SVMOverall accuracy 7992 7287 7674 8375 5824 7631 8486Kappa coefficient 7728 6928 7355 8151 5377 7327 8279Average accuracy 857 8071 8288 8822 7121 8365 8968

Finally details of assessment deduced by 7NN and RBF-SVM is summarized in Table 4 where the correspondingoverall accuracy kappa coefficient and average accuracy ofdifferent methods can be located collectively

5 Conclusions

In this paper we have analyzed local Fisher discriminantanalysis (LFDA) and found its weakness By replacing themaximum distance with local variance for the constructionof weight matrix introducing class prior probability into thecomputation of affinitymatrix an improved LFDA algorithmhas been proposed This novel approach is called PD-LFDAbecause the probability distribution (PD) is applied in LFDAalgorithm The proposed approach essentially can increasethe discriminant ability of transformed features in lowdimensional space The pattern found by the new approachis expected to be more accurate and coincides with thecharacter of HSI data and is conducive to classify HSI dataPD-LFDA has been evaluated on a real removing sensingAVIRIS Indian Pine 92AV3C data set We have compared theperformance of the proposed PD-LFDA with that of PCALPP LFDA LDA JGLDA and RP Both the numerical resultsand visual inspection of class maps have been obtained Inthe experiments KNN classifier and SVM classifier havebeen used We have argued that the proposed PD-LFDAexhibits the best performance and serves as a very effectivedimensionality reduction tool for high dimensional datasuch as hyperspectral image (HSI) data

Appendix

Procedure of Proposed Algorithm

The brief description of the algorithm to perform the pro-posed PD-LFDA method is already presented in Section 3The details of the algorithm are provided in Algorithm 2

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work was supported by the Research of University ofMacau under Grants no MYRG205(Y1-L4)-FST11-TYY noMYRG187(Y1-L3)-FST11-TYY and no SRG010-FST11-TYYand by the National Natural Science Foundation of Chinaunder Grant no 61273244 This research project was alsosupported by the Science andTechnologyDevelopment Fund(FDCT) of Macau under Contract no 100-2012-A3

References

[1] W Li S Prasad Z Ye J E Fowler and M Cui ldquoLocality-preserving discriminant analysis for hyperspectral image clas-sification using local spatial informationrdquo in Proceedings ofthe 32nd IEEE International Geoscience and Remote SensingSymposium (IGARSS rsquo12) pp 4134ndash4137MunichGermany July2012

[2] HND LeM S Kim andD-H Kim ldquoComparison of singularvalue decomposition and principal component analysis appliedto hyperspectral imaging of biofilmrdquo in Proceedings of the IEEEPhotonics Conference (IPC rsquo12) pp 6ndash7 2012

[3] C K Chui and J Wang ldquoRandomized anisotropic transformfor nonlinear dimensionality reductionrdquo GEMmdashInternationalJournal on Geomathematics vol 1 no 1 pp 23ndash50 2010

[4] T V Bandos L Bruzzone and G Camps-Valls ldquoClassificationof hyperspectral images with regularized linear discriminantanalysisrdquo IEEE Transactions on Geoscience and Remote Sensingvol 47 no 3 pp 862ndash873 2009

[5] D Guangjun Z Yongsheng and J Song ldquoDimensionalityreduction of hyperspectral data based on ISOMAP algorithmrdquoin Proceedings of the 8th International Conference on ElectronicMeasurement and Instruments (ICEMI rsquo07) pp 3935ndash3938Xirsquoan China August 2007

[6] X Luo and M-F Jiang ldquoThe application of manifold learn-ing in dimensionality analysis for hyperspectral imageryrdquo inProceedings of the International Conference on Remote SensingEnvironment and Transportation Engineering (RSETE rsquo11) pp4572ndash4575 June 2011

[7] J Khodr and R Younes ldquoDimensionality reduction on hyper-spectral images a comparative review based on artificial datasrdquoin Proceedings of the 4th International Congress on Image andSignal Processing (CISP rsquo11) vol 4 pp 1875ndash1883 October 2011

Mathematical Problems in Engineering 17

[8] J Wen Z Tian H She and W Yan ldquoFeature extraction ofhyperspectral images based on preserving neighborhood dis-criminant embeddingrdquo in Proceedings of the 2nd InternationalConference on Image Analysis and Signal Processing (IASP rsquo10)pp 257ndash262 Zhejiang China April 2010

[9] Y-R Yeh S-Y Huang and Y-J Lee ldquoNonlinear dimensionreduction with kernel sliced inverse regressionrdquo IEEE Trans-actions on Knowledge and Data Engineering vol 21 no 11 pp1590ndash1603 2009

[10] J He L Zhang Q Wang and Z Li ldquoUsing diffusion geometriccoordinates for hyperspectral imagery representationrdquo IEEEGeoscience and Remote Sensing Letters vol 6 no 4 pp 767ndash7712009

[11] J Peng P Zhang and N Riedel ldquoDiscriminant learninganalysisrdquo IEEE Transactions on Systems Man and CyberneticsPart B Cybernetics vol 38 no 6 pp 1614ndash1625 2008

[12] F S Tsai and K L Chan ldquoDimensionality reduction techniquesfor data explorationrdquo in Proceedings of the 6th InternationalConference on Information Communications and Signal Process-ing pp 1ndash5 December 2007

[13] M D Farrell Jr and R M Mersereau ldquoOn the impact of PCAdimension reduction for hyperspectral detection of difficulttargetsrdquo IEEE Geoscience and Remote Sensing Letters vol 2 no2 pp 192ndash195 2005

[14] S Prasad and L M Bruce ldquoLimitations of principal com-ponents analysis for hyperspectral target recognitionrdquo IEEEGeoscience and Remote Sensing Letters vol 5 no 4 pp 625ndash629 2008

[15] J Yu Q Tian T Rui and T S Huang ldquoIntegrating discriminantand descriptive information for dimension reduction and clas-sificationrdquo IEEE Transactions on Circuits and Systems for VideoTechnology vol 17 no 3 pp 372ndash377 2007

[16] J Kong S Wang J Wang L Ma B Fu and Y Lu ldquoAnovel approach for face recognition based on supervisedlocality preserving projection and maximummargin criterionrdquoin Proceedings of the International Conference on ComputerEngineering and Technology (ICCET rsquo09) vol 1 pp 419ndash423Singapore January 2009

[17] M Loog and R P W Duin ldquoLinear dimensionality reductionvia a heteroscedastic extension of LDA the Chernoff criterionrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 26 no 6 pp 732ndash739 2004

[18] A C Jensen A Berge and A S Solberg ldquoRegressionapproaches to small sample inverse covariance matrix estima-tion for hyperspectral image classificationrdquo IEEE TransactionsonGeoscience andRemote Sensing vol 46 no 10 pp 2814ndash28222008

[19] J Jin BWang and L Zhang ldquoA novel approach based on Fisherdiscriminant null space for decomposition of mixed pixels inhyperspectral imageryrdquo IEEE Geoscience and Remote SensingLetters vol 7 no 4 pp 699ndash703 2010

[20] J Wen Z Tian X Liu and W Lin ldquoNeighborhood preservingorthogonal pnmf feature extraction for hyperspectral imageclassificationrdquo IEEE Journal of Selected Topics in Applied EarthObservations and Remote Sensing vol 6 no 2 pp 759ndash7682013

[21] Y Ren G Zhang G Yu and X Li ldquoLocal and global structurepreserving based feature selectionrdquoNeurocomputing vol 89 pp147ndash157 2012

[22] Z Fan Y Xu and D Zhang ldquoLocal linear discriminant analysisframework using sample neighborsrdquo IEEE Transactions onNeural Networks vol 22 no 7 pp 1119ndash1132 2011

[23] Y Wang S Huang D Liu and B Wang ldquoResearch advanceon band selection-based dimension reduction of hyperspectralremote sensing imagesrdquo in Proceedings of the 2nd InternationalConference on Remote Sensing Environment and TransportationEngineering (RSETE rsquo12) pp 1ndash4 IEEE Nanjing China June2012

[24] B Waske S van der Linden J A Benediktsson A Rabe andP Hostert ldquoSensitivity of support vector machines to randomfeature selection in classification of hyperspectral datardquo IEEETransactions on Geoscience and Remote Sensing vol 48 no 7pp 2880ndash2889 2010

[25] M Sugiyama ldquoLocal fisher discriminant analysis for superviseddimensionality reductionrdquo in Proceedings of the 23rd Interna-tional Conference onMachine Learning (ICML rsquo06) pp 905ndash912ACM June 2006

[26] Y Cheng ldquoMean shift mode seeking and clusteringrdquo IEEETransactions on Pattern Analysis and Machine Intelligence vol17 no 8 pp 790ndash799 1995

[27] W Li S Prasad J E Fowler and L M Bruce ldquoLocality-preserving dimensionality reduction and classification forhyperspectral image analysisrdquo IEEE Transactions on Geoscienceand Remote Sensing vol 50 no 4 pp 1185ndash1198 2012

[28] L Zelnik-Manor and P Perona ldquoSelf-tuning spectral cluster-ingrdquo in Proceedings of the 18th Annual Conference on NeuralInformation Processing Systems pp 1601ndash1608 December 2004

[29] W K Wong and H T Zhao ldquoSupervised optimal localitypreserving projectionrdquo Pattern Recognition vol 45 no 1 pp186ndash197 2012

[30] X He and P Niyogi ldquoLocality preserving projectionsrdquo inAdvances in Neural Information Processing Systems 16 SThrunL Saul and B Scholkopf Eds MIT Press Cambridge MassUSA 2004

[31] H Wang S Chen Z Hu and W Zheng ldquoLocality-preservedmaximum information projectionrdquo IEEE Transactions on Neu-ral Networks vol 19 no 4 pp 571ndash585 2008

[32] M Sugiyama ldquoDimensionality reduction ofmultimodal labeleddata by local fisher discriminant analysisrdquo Journal of MachineLearning Research vol 8 pp 1027ndash1061 2007

[33] S Yan D Xu B Zhang H J Zhang Q Yang and S Lin ldquoGraphembedding and extensions a general framework for dimen-sionality reductionrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 29 no 1 pp 40ndash51 2007

[34] C K Chui and J Wang ldquoDimensionality reduction of hyper-spectral imagery data for feature classificationrdquo in Handbookof Geomathematics pp 1005ndash1047 Springer Heidelberg Ger-many 2010

[35] C-C Chang and C-J Lin ldquoLIBSVM a Library for supportvector machinesrdquo ACM Transactions on Intelligent Systems andTechnology vol 2 no 3 article 27 2011

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 7: Research Article Subspace Learning via Local Probability ...downloads.hindawi.com/journals/mpe/2015/145136.pdf · and applicable toolkits were engendered one a er another. Hyperspectral

Mathematical Problems in Engineering 7

Input HSI training samples119883 isin R119901times119873 dimensionality to be embedded 119902 the parameter 119870 of 119870NNand a test sample 119909

119905isin R119901

Step 1 For each sample 119909119894from the same class calculate the119882

119894119895by (20)

where the local scaling factor 120588119894is calculated via (26) or (27)

Step 2 Equations (13) and (14) can be globally and uniformly transformed into an equivalent formula via119908= 119882 sdot 119882

1

119887= 119882 sdot (119882

2minus1198821)

(i)

where the operator 119860 sdot 119861 denotes the dot product between 119860 and 119861 and

119882(119894 119895) =

119901(119897119894)2 exp(minus

10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

)(1 + exp(minus10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

))

if 119897119894= 119897119895= 119888

if 119897119894= 119897119895

0

(iia)

1198821=

1

119873119888 if 119897

119894= 119897119895= 119888

0 others(iib)

1198822=1

119873(1119873times1

11times119873) (iic)

By the above formulas the product of elements in different matrices can be achieved via dot productbetween matrices The equations (iia) (iib) and (iic) can be gained by integrating the number ofeach class119873119888 the number of total training samples119873 and the local scaling 120588

119894 then matrices119882119882

11198822

can be calculatedStep 3 Construct within-scatter matrix

119908and between-scatter matrix

119887 according to (i)

Step 4 Define Laplacian matrices 119871lowast below119871lowast= 119863lowastminus lowast (iii)

where119863lowast is the row sum or column sum of119882lowast119863lowast119894119894= sum119895lowast(119894 119895) (or119863lowast

119894119894= sum119894lowast(119894 119895)) and the

notation lowast denotes one letter in 119908 119887Step 5 On the basis of (29) and (30) the transformation matrix can be achieved viaeigenvectors 119879 = radic

11205931 radic21205932 radic

1199021120593119902 119879 isin R119901times119902 that corresponding the 119902 leading

eigenvalues 119887120593119894= 119894119908120593119894in solving the general problem of

119887120593119894= 119894119908120593119894

Step 6 For a testing sample 119909119905isin R119901 the extracted feature is 119911

119905= 119879119879119909119905isin R119902

Output Transformation matrix 119879 and the extracted feature 119911119905

Algorithm 1 PD-LFDA Algorithm

Thereby

rank (119878119887) ⩽ rank ([119873

1(119898119897minus 119898) 119873

2(119898119897minus 119898)

119873119871(119898119897minus 119898)]) ⩽ 119862 minus 1

(33)

It is easy to infer that the rank of the between-class scattermatrix 119878

119887is119862minus1 atmost thus there are up to119862minus1meaningful

subfeatures that can be extracted Thanks to the help ofaffinity matrix 119882 when compared with the conventionalLDA the reduced subspace of proposed PD-LFDA can beany subdimensional space On the other hand the classicallocal fisherrsquos linear discriminant only weights the value ofsample pairs in the same classes while our method also takesin account the sample pairs in different classes Hereafter theproposedmethod will bemore flexible and the results will bemore adaptiveThe objective function of proposed method isquite similar to the conventional LDA hereby the optimalsolution is almost same as the conventional LDA whichindicates that it is also simple to implement and easy to revise

To further explore the relationship of LDA and PD-LFDAwe now rewrite the objective function of LDAandPD-LFDA respectively

119879LDA = arg max119879isinR119901times119902

trace 119879119879119878119887119879

subject to 119879119879119878119908119879 = 119868

(34)

119879PD-LFDA equiv arg max119879isinR119901times119902

trace 119879119879119878119887119879

subject to 119879119879119878119908119879 = 119868

(35)

This implies that LDA tries to maximize the between-class scatter and simultaneously constraint the within-classscatter to a certain level However such restriction is hardto constraint and no relaxation is imposed When the data isnot a single modal that is multimodal or unknown modalLDA often fails On the other hand benefiting from theflexible designing of affinity matrix119882 PD-LFDA gains morefreedom in (35) That is the separability of PD-LFDA will bemore distinct and the degree of freedom remains more than

8 Mathematical Problems in Engineering

Input HSI data samples119883 = 1199091 1199092 119909

119873 isin R119901times119873 the objective dimension to be embedded 119902

the nearest neighbor parameter 119870 (default 119870 equiv 7) and the test sample 119909119905isin R119901

Output Transformation matrix 119879 isin R119901times119902Steps are as follows

(1) Initialize matrices(2) 119878119908larr 0119901times119901

within-class scatter(3) 119878119887larr 0119901times119901

between-class scatter(4)(5) Compute within-class affinity matrix119882119908(6) for 119888 = 1 2 119862 do in a classwise manner(7) 119909

119888

119894119873119888

119894=1larr 119909

119895| 119897119895= 119888 the 119888th class data samples

(8) 119883 larr 119909119888

1 119909119888

2 119909

119888

119873119888 sample matrix

(9) 119882119894=(1119873119888times111times119873119888)

119873119888

(10)(11) Determine the local scaling(12) for 119894 = 1 2 119873119888 do(13) 119909

(119894)

119896larr the 119896th nearest neighbor of 119909119888

119894 119896 = 1 2 119870

(14) for 119896 = 1 2 119870 do(15) 119889

(119894)

119896= 119909119894minus 119909(119894)

1198962

(16) end for

(17) (119894)larr1

119870

119870

sum

119896=1

10038171003817100381710038171003817119909119894minus 119909(119894)

119896

10038171003817100381710038171003817

2

(18) 120588(119894)= radic

1

119870

119870

sum

119896=1

(119889(119894)

119896minus (119894))2

(19) end for(20)(21) Define local affinity matrix(22) for 119894 119895 = 1 2 119873119888 do(23) 119901(119897

119894) larr 119873

119888119873 prior probability

(24) 119860119894119895larr 119901(119897

119894) exp(minus

10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

)(1 + exp(minus10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

))

(25) end for(26) 119860

119888= 119860

(27) end for(28)119882

119908= diag119882

11198822 119882

119862 in block diagonal manner

(29) 119860119908= diag119860

1 1198602 119860

119862 also in block diagonal manner

(30) for 119894 119895 = 1 2 119873 do(31)

119908(119894 119895) larr 119860

119908(119894 119895)119882

119908(119894 119895)

(32) end for(33)(34) Compute between-class affinity matrix119882119887

(35)119882119887larr(1119873times1

11times119873)

119873minus diag119882

11198822 119882

119862

(36) Let 119865nz denotes the nonzero flag of elements in119882119908 119865nz(119894 119895) = 1 if119882119908(119894 119895) = 0 119865nz(119894 119895) = 0 if119882119908(119894 119895) = 0

(37) 119865nz larr (1198821015840

119887= 0)

(38) 119860119887larr 0119873times119873

(39) 119860

119887(119865nz) = 119860119908(119865nz)

(40) 119860119887(not119865nz) = 1

(41) for 119894 119895 = 1 2 119873 do(42)

119887(119894 119895) larr 119860

119887(119894 119895)119882

119887(119894 119895)

(43) end for(44) Now construct Laplacian matrix for within affinity matrix

119908and between affinity matrix

119887

(45) Let(46)119863119908

119894119894= sum

119895

119908(119894 119895)119863119887

119894119894= sum

119895

119887(119894 119895)

(47) Then

Algorithm 2 Continued

Mathematical Problems in Engineering 9

(48) 119871119908= 119908minus 119863119908 119871119887= 119887minus 119863119887

(49) Construct two matrixs below(50) 119878

119887= 119883119871

119887119883119879 119878119908= 119883119871

119908119883119879

(51) Let 1205931 1205932 120593

119902 be the general eigenvector of

(52) 119878119887120593119894= 119894119878119908120593119894 forall119894 isin 1 2 119902

(53) with the corresponding eigenvalue in descending order 1ge 2ge sdot sdot sdot ge

119902

(54)(55) Finally the transformation matrix can be represented as(56) 119879 = radic120582

11205931 radic12058211205932 radic120582

1120593119902 isin R119901times119902

(57)(58) For a new test sample 119909

119905 the embedding 119911

119905is given by

(59) 119911119905= 119879119879119909119905isin R119902

Algorithm 2 Proposed PD-LFDA method

the conventional LDA thus our method is expected to bemore robust and significantly preponderant

For large scale data sets we discuss a scheme that canaccelerate the computation procedure of the within-scattermatrix 119878

119908 In our algorithm owning to the fact that we have

put penalty on the affinity matrix for different class samplesin constructing the between-scatter matrix the acceleratedprocedure will remain for further discussion

The within-class scatter 119878119908can be reformulated as

119878119908=1

2

119873

sum

119894119895=1

119908(119894 119895) (119909

119894minus 119909119895) (119909119894minus 119909119895)119879

=1

2

119873

sum

119894119895=1

119908(119894 119895) (119909

119894119909119879

119894+ 119909119895119909119879

119895minus 119909119894119909119879

119895minus 119909119895119909119879

119894)

=

119873

sum

119894=1

(

119873

sum

119895=1

119908(119894 119895))119909

119894119909119879

119894minus

119873

sum

119894119895=1

119908(119894 119895) 119909

119894119909119879

119895

= 119883(119863119908minus 119908)119883119879

= 119883119908119883119879

(36)

Here

119863119908 (119894 119894) =

119873

sum

119895=1

119908(119894 119895)

119908= 119863119908minus 119908

(37)

119908can be block diagonal if all samples 119909

119894119873

119894=1are sorted

according to their labels This property implies that 119863119908and

119908can also be block diagonal matrix Hence if we compute

119878119908

through (36) then the procedure will be much moreefficient Similarly 119878

119887can also be formulated as

119878119887= 119883

119887119883119879

= 119883(119863119887minus 119887)119883119879

(38)

Nevertheless 119887is dense and can not be further sim-

plified However the simplified computational procedure of119908

saves for us part of time in a way In this paper weadopt the above procedure to accelerate 119878

119908and pursue 119878

119887

normally In addition to locality structure some papers showthat another property for example marginal information isalso important and should be preserved in the reduced spaceThe theory of extended LDA and LPP algorithm is developedrapidly recently Yan et al [33] summarized these algorithmsin a graph embedding framework and also proposed amarginal fisher analysis embedding (MFA) algorithm underthis framework

In MFA the criterion is characterized by intraclasscompactness and interclass marginal superability which isreplaced for thewithin-class scatter and between-class scatterseverally The intraclass relationship is reflected by an intrin-sic graph which is constructed by 119870-nearest neighborhoodsample data points in the same class while the interclasssuperability is mirrored by a penalty graph computed formarginal points from different classes Following this ideathe intraclass compactness is given as follows

119878119894= sum

119894119895 119894isin119873(119896)(119895)or 119895isin119873(119896)(119894)

10038171003817100381710038171003817119879119879119909119894minus 119879119879119909119895

10038171003817100381710038171003817

2

= 2119879119879119883 (119863 minus119882)119883

119879119879

(39)

where

119882(119894 119895) = 1 if 119894 isin 119873(119896) (119895) or 119895 isin 119873(119896) (119894) 0 otherwise

(40)

Here 119873(119896)(119895) represents the 119870-nearest neighborhood indexset of 119909

119895from the same class and119863 is the row sum (or column

sum) of 119882 119863(119894 119894) = sum119894119882119894119895 Interclass separability is indi-

cated by a penalty graph whose term is expressed as follows

119878119890= sum

119894119895 (119894119895)isin119875(119896)(119897119895)

or (119894119895)isin119875(119896)(119897119894)

10038171003817100381710038171003817119879119879119909119894minus 119879119879119909119895

10038171003817100381710038171003817

2

= 2119879119879119883(119863 minus )119883

119879119879

(41)

10 Mathematical Problems in Engineering

of which

(119894 119895) = 1 if (119894 119895) isin 119875(119896) (119897

119895) or (119894 119895) isin 119875(119896) (119897

119894)

0 otherwise(42)

Note that 119878119894and 119878

119890are corresponding to ldquowithin-scatter

matrixrdquo and ldquobetween-scatter matrixrdquo of the traditional LDAalternatively The optimal solution of MFA can be achievedby solving the following minimization problem That is

= arg119879

min 119879119879119883(119863 minus119882)119883

119879119879

119879119879119883(119863 minus )119883119879119879

(43)

We know that (43) is also a general eigenvalue decompositionproblem Let 119879PCA indicate the transformation matrix fromthe original space to PCA subspace with certain energyremaining and then the final projection of MFA is output as

119879MFA = 119879PCA (44)

As can be seenMFA constructs twoweightedmatrices119882and according to the intraclass compactness and interclassseparability In LFDA and PD-LFDA only one affinity isconstructed The difference lies in that the ldquoweightrdquo in LFDAand PD-LFDA is in the range of [0 1] according to the levelof difference Yet MFA distributes the same weight to its 119870-nearest neighborhoodsThe optimal solution ofMFA LFDAand PD-LFDA can be attributed to a general eigenvaluedecomposition problem Hence the idea of MFA LFDA andPD-LFDA is approximately similar in a certain interpretationRelationship with other methodologies can be analyzed in ananalogous way

4 Experimental Results

To illustrate the performance of PD-LFDA experimentson a real hyperspectral remote sensing image data setmdashAVIRIS Indian Pine 1992 are conducted in this sectionThe AVIRIS Indian Pines 1992 data set was gathered byNational Aeronautics and Space Administrator (NASA) withAirborne VisibleInfrared Imaging Spectrometer (AVIRIS)sensor over the Indian Pine test site in northwest Indians inJune 1992 This data set consists of 145 times 145 pixels and 224spectral reflectance bands ranging from 04 120583m to 245 120583mwith a spatial resolution of 20m The Indian Pines scene iscomposed of two-third agriculture and one-third forest orother natural perennial vegetation Some other landmarkssuch as dual lane highways rail line low density housing andsmaller roads are also in this image Since the scene was takenin June some main crops for example soybeans and cornare in their early growth stage with less than 5 coveragewhile the no-till min-till and clean-till indicate the amountof previous crop residue remaining The region map can bereferred to Figure 5(a) The 20 water absorption bands (ie[108ndash112 154ndash167] 224) were discarded

In this section performance of different dimensionreduction methods that is PCA LPP LFDA LDA JGLDAand RP [34] is compared with the proposed PD-LFDA

Table 1 Training set in AVIRIS Indian Pines 1992 database

ID Class Name1 Alfalfa (4618 3913)2 Corn-notill (1428136 952)3 Corn-mintill (83087 1048)4 Corn (23734 1434)5 Grass-pasture (48354 1118)6 Grass-trees (73071 973)7 Grass-pasture-mowed (2817 6071)8 Hay-windrowed (47850 1046)9 Oats (2015 7500)10 Soybean-notill (97286 884)11 Soybean-mintill (2455214 872)12 Soybean-clean (59354 911)13 Wheat (20528 1366)14 Woods (1265102 806)15 Buildings-Grass-Trees-Drives (38639 1010)16 Stone-Steel-Towers (9324 2581)Total 102491029 1004Numerical value in this table refers to number of samples number oftraining samples and pc respectively

Classification accuracy is reported via the concrete classifiersGenerally many dimension reduction research papers adopt119870-nearest neighborhood classifier (KNN) and support vectormachine (SVM) classifier to measure the performance of theextracting features after the dimension reduction where theoverall accuracy and kappa coefficient are detailed in thereports Hereby in this paper we also adopt KNN classifierand SVM classifier for performance measurement In KNNclassifier we select the value of 119870 as 1 5 and 9 so thatthree classifiers based on nearest neighborhoods are formedwhich are called 1NN 5NN and 9NN In SVM classifier weseek a hyperplane to separate classes in kernel-induced spacewhere the linear nonseparable classes in the original featurespace can be separated via kernel trick SVM as a robustand successful classifier has been widely used to evaluatethe performance of multifarious methods in many areas Forsimplicity and convenience we use LIBSVM package [35] forexperiments Accuracy of dimension reduced performancewill be reported by classified performance from SVM clas-sifier In the following schedule the feature subspace will becalculated at the first step from training samples by differentdimensional algorithms Table 1 gives a numerical statisticsof training samples corresponding to each class Then thenew sample will be projected into a low subspace by thetransformedmatrix Finally all the new samples are classifiedby SVM classifier

In this experiment a total of 1029 samples were selectedfor training and the remaining samples are used for testingNote that all the labeled samples in database are unbalancedand the available samples of each category differ dramaticallyThe following strategy is imposed for sample division A fixednumber of 15 samples are randomly selected to form thetraining sample yet the absent samples are randomly selected

Mathematical Problems in Engineering 11

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

(a) 1NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

(b) 5NN

7 9 11 13 15Reduced space

0

01

02

03

04

05

06

07

08

09

1

Ove

rall

accu

racy

(c) 9NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

(d) Liner SVM

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

PCALPPLFDA

LDAPD-LFDAJGLDA

RP

(e) Polynomial SVM

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

PCALPPLFDA

LDAPD-LFDAJGLDA

RP

(f) RBF-SVM

Figure 2 Overall accuracy by different dimension reduction methods and different classifiers applied to AVIRIS Indian Pines database

12 Mathematical Problems in Engineering

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

(a) 1NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

(b) 5NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

(c) 9NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

(d) Linear SVM

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

PCALPPLFDA

LDAPD-LFDAJGLDA

RP

(e) Polynomial SVM

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

PCALPPLFDA

LDAPD-LFDAJGLDA

RP

(f) RBF SVM

Figure 3 Kappa coefficient by different dimension reduction methods and different classifiers applied to AVIRIS Indian Pines database

Mathematical Problems in Engineering 13

(a) Pseudo 3-channel color image band = [12 79 140]

1

2

2

2

2

2 2

3

3

3

3

3

4

5

5

5

5

6

6 6 6

7

8

9

10

10

10 10

11

11

11

11

11 12

12 12

13 14

14

14 15

15

16

(b) Ground truth

(c) Distribution of training sample

Figure 4 Illustration of sample partition

from the remaining samples Under this strategy the trainingsamples and testing samples are listed in Table 1

Figure 2 shows the overall accuracy of different dimen-sion reduction methods applied to AVIRIS Indian 92AV3Cdata set The neighborhood of KNN classifier is selected as 15 and 9 respectively which produces three classifiers thatis 1NN 5NN and 9NN Three different kernel functionsare adopted for SVM classifier Then derived classifiers arealso used in this experiment that is linear SVM polynomialSVM and RBFSVM It can be deduced from Figures 2(a)sim2(c) that when the embedding space is greater than 5 pro-posed PD-LFDA performs the best while JGLDA performsthe worst The results produced by RP is slightly better thanJGLDA PCA LDA LPP and RP show the similar classifiedresults under KNN classifierThat is the proposed PD-LFDAoutperforms the others Meantime compared with LFDAthe proposed PD-LFDA leads to 2 more improvementson average Moreover it can be observed from (d) that theclassified accuracy increases steadily as the embedded spaceincreases However LDA demonstrates the highest overallaccuracywhen the reduced features vary to 9 while LFDAhas

the significant improvements when the number of reducedfeatures is greater than 9 This phenomenon of Figure 2(d)indicates the instability of linear SVM Nevertheless thesituation reversed for polynomial SVM and RBF SVM inFigures 2(e) and 2(f) wherein the proposed PD-LFDAwins a little improvement against LFDA and has significantimprovement compared with the others Inspired effects ofproposed PD-LFDA algorithm were achieved in all casesFurthermore Table 2 gives the detailed overall accuracyunder different feature dimensions using 3NN 7NN andRBFSVM classifiers which validates the feasibility of theproposed scheme in this paper

Figure 3 displays the results of kappa coefficient obtainedusing the different dimension reduction algorithms underKNN classifier and SVM classifiers The experimental cir-cumstance of Figure 3 is same as that of Figure 2 We canfind that from these results JGLDA performs the worst inmost cases except in Figure 3(e) The proposed PD-LFDAmethod outperforms the other methods and achieves thehighest kappa numerical value in most cases except usingthe linear SVM as the classifier In fact none of them work

14 Mathematical Problems in Engineering

686864717360

(a) PCA-7NN

687564707607

(b) LPP-7NN

792376538609

(c) LFDA-7NN

739070418130

(d) LDA-7NN

619556516209

(e) JGLDA-7NN

670962947164

(f) RP-7NN

799277288570

(g) PCA-RBFSVM

767473558288

(h) LPP-RBFSVM

837581518822

(i) LFDA-RBFSVM

728769288071

(j) LDA-RBFSVM

582453777121

(k) JGLDA-RBFSVM

763173278365

(l) RP-RBFSVM

Figure 5 Classified map generated by different dimension reduction methods where the overall accuracy kappa coefficient and averageclassification accuracy are listed at the top of each map respectively

steady in the case of linear SVM (Figure 3(d)) Note that thesituation is improved in polynomial SVM where the kappanumerical value of the proposed PD-LFDA is significantlybetter than the others All the achievements demonstrate therobustness of our contribution in PD-LFDA Simultaneouslyit is noticeable that LPP exhibits an average kappa levelThe kappa value gained by LPP is not seriously bad and isnot dramatically good The kappa results produced by RPare approximately the same as LPP A significant advantageof RP is the simple construction and computation wherethe accuracy is closed to LPP More details are summarizedin Table 3 It can be concluded that the kappa coefficientof proposed algorithm is higher than the other approacheswhich is more appropriate for the classification of HSI data

The visual results of all methods are presented in Figures5sim6 where the class labels are converted to pseudocolor

image The pseudocolor image of the hyperspectral imagefrom Indian 92AV3C database is shown in Figure 4(a) Theavailable labeled image which represents the truth groundis illustrated in Figure 4(b) where the labels are made byhuman The training samples are selected from the labeledimage represented as points in the image as shown inFigure 4(c) Each label number (ID) corresponds to eachclass name which is indexed in Table 1 In this experimentall the available labeled samples are used for testing whileapproximate 10 of samples are used for training Thesubspace is fixed to 13 (the number here is only used forreference it can be changed) For each experiment thedimension from original feature space is reduced to theobjective dimensionality thereafter the classified maps areinduced by 7NN classifier and RBF-SVM classifier Theoverall accuracy kappa coefficient and average accuracy are

Mathematical Problems in Engineering 15

837981698991

(a) PD-LFDA-7NN

848682798968

(b) D-LFDA-RBFSVM

Figure 6 Classified map of the proposed method

Table 2 Overall accuracy obtained using different dimensions offeature and different classifiers for Indian Pine Scene database ()

Item Dims7 9 11 13 15

3NNPCA 7007 6996 7026 7035 7087LPP 6575 6871 6835 6695 6546LFDA 7513 7895 7963 7992 8009LDA 6415 6606 6616 6596 6536PD-LFDA 7713 8094 816 8218 8263JGLDA 5405 5602 5704 5642 5751RP 6173 6444 6745 6632 6652

7NNPCA 6902 6902 691 6933 6972LPP 6782 7057 7027 6892 6681LFDA 7419 7777 7827 7863 7862LDA 6615 6909 6899 692 6882PD-LFDA 7736 809 8155 8172 8235JGLDA 5665 5811 5809 5812 5914RP 6194 6481 6696 6613 6603

RBF-SVMPCA 8051 7794 7718 7935 8176LPP 7101 7514 7614 7501 7314LFDA 7826 8187 82 8113 7915LDA 677 6834 6877 6642 6792PD-LFDA 7858 8267 8395 8423 8166JGLDA 5686 5606 5663 5689 5879RP 6957 7282 7441 7416 7543

also included at the top of each map respectively Figure 5displays the classifiedmaps of classicmethods in pseudocolorimages and the classified maps of proposed PD-LFDA arepresented in Figure 6 It can be observed from these mapsthat the best performance is achieved by PD-LFDA when7NN classifier is used in this case the overall accuracyis 8379 the kappa coefficient is 8169 and the averageaccuracy is 8991 Moreover the worst algorithm is JGLDAwhose overall accuracy is 6195 the kappa coefficient

Table 3 Kappa coefficient by different dimension reduction meth-ods and different classifiers applied to Indian Pine Scene database()

Item Dims7 9 11 13 15

3NNPCA 6592 658 6612 6624 6681LPP 6084 6422 6382 6219 605LFDA 7145 7586 7664 7697 7718LDA 5913 6126 6137 6118 6048PD-LFDA 7392 7824 7901 7966 8016JGLDA 4778 4992 5082 5006 5119RP 564 5963 6303 6172 6192

7NNPCA 6466 6466 6475 6501 6545LPP 6304 6629 6585 6429 6185LFDA 7047 7457 7515 7557 7557LDA 6132 646 6453 6478 6431PD-LFDA 7417 782 7893 7913 7985JGLDA 5029 519 5161 5159 5263RP 5638 5992 6237 6135 6124

RBF-SVMPCA 7767 747 7393 7636 7900LPP 6677 715 7264 7115 6921LFDA 7486 7915 7924 7822 7584LDA 629 6362 6427 6153 6323PD-LFDA 7551 8011 8159 8187 7899JGLDA 5109 5066 5133 5147 5369RP 653 6908 7085 706 7203

becomes 5651 and the average accuracy is only 6209Other methods such as PCA LPP and RP can produce thecomparable results and no one can outperform the otherHowever LDAoutperformsPCA LPP andRP yielding betterresult Similar conclusions can be achieved in the groupof RBF-SVM Generally proposed PD-LFDA significantlyoutperforms the rest in this experiment which indicates thecorrectness of improvements in proposed PD-LFDA

16 Mathematical Problems in Engineering

Table 4 Performance of dimension reduction on the whole labeled samples ()

Evaluated item MethodologiesPCA LDA LPP LFDA JGLDA RP PD-LFDA

7NNOverall accuracy 6868 739 6875 7923 6195 6709 8379Kappa coefficient 6471 7041 647 7653 5651 6294 8169Average accuracy 737 813 7607 8609 6209 7164 8991

RBF-SVMOverall accuracy 7992 7287 7674 8375 5824 7631 8486Kappa coefficient 7728 6928 7355 8151 5377 7327 8279Average accuracy 857 8071 8288 8822 7121 8365 8968

Finally details of assessment deduced by 7NN and RBF-SVM is summarized in Table 4 where the correspondingoverall accuracy kappa coefficient and average accuracy ofdifferent methods can be located collectively

5 Conclusions

In this paper we have analyzed local Fisher discriminantanalysis (LFDA) and found its weakness By replacing themaximum distance with local variance for the constructionof weight matrix introducing class prior probability into thecomputation of affinitymatrix an improved LFDA algorithmhas been proposed This novel approach is called PD-LFDAbecause the probability distribution (PD) is applied in LFDAalgorithm The proposed approach essentially can increasethe discriminant ability of transformed features in lowdimensional space The pattern found by the new approachis expected to be more accurate and coincides with thecharacter of HSI data and is conducive to classify HSI dataPD-LFDA has been evaluated on a real removing sensingAVIRIS Indian Pine 92AV3C data set We have compared theperformance of the proposed PD-LFDA with that of PCALPP LFDA LDA JGLDA and RP Both the numerical resultsand visual inspection of class maps have been obtained Inthe experiments KNN classifier and SVM classifier havebeen used We have argued that the proposed PD-LFDAexhibits the best performance and serves as a very effectivedimensionality reduction tool for high dimensional datasuch as hyperspectral image (HSI) data

Appendix

Procedure of Proposed Algorithm

The brief description of the algorithm to perform the pro-posed PD-LFDA method is already presented in Section 3The details of the algorithm are provided in Algorithm 2

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work was supported by the Research of University ofMacau under Grants no MYRG205(Y1-L4)-FST11-TYY noMYRG187(Y1-L3)-FST11-TYY and no SRG010-FST11-TYYand by the National Natural Science Foundation of Chinaunder Grant no 61273244 This research project was alsosupported by the Science andTechnologyDevelopment Fund(FDCT) of Macau under Contract no 100-2012-A3

References

[1] W Li S Prasad Z Ye J E Fowler and M Cui ldquoLocality-preserving discriminant analysis for hyperspectral image clas-sification using local spatial informationrdquo in Proceedings ofthe 32nd IEEE International Geoscience and Remote SensingSymposium (IGARSS rsquo12) pp 4134ndash4137MunichGermany July2012

[2] HND LeM S Kim andD-H Kim ldquoComparison of singularvalue decomposition and principal component analysis appliedto hyperspectral imaging of biofilmrdquo in Proceedings of the IEEEPhotonics Conference (IPC rsquo12) pp 6ndash7 2012

[3] C K Chui and J Wang ldquoRandomized anisotropic transformfor nonlinear dimensionality reductionrdquo GEMmdashInternationalJournal on Geomathematics vol 1 no 1 pp 23ndash50 2010

[4] T V Bandos L Bruzzone and G Camps-Valls ldquoClassificationof hyperspectral images with regularized linear discriminantanalysisrdquo IEEE Transactions on Geoscience and Remote Sensingvol 47 no 3 pp 862ndash873 2009

[5] D Guangjun Z Yongsheng and J Song ldquoDimensionalityreduction of hyperspectral data based on ISOMAP algorithmrdquoin Proceedings of the 8th International Conference on ElectronicMeasurement and Instruments (ICEMI rsquo07) pp 3935ndash3938Xirsquoan China August 2007

[6] X Luo and M-F Jiang ldquoThe application of manifold learn-ing in dimensionality analysis for hyperspectral imageryrdquo inProceedings of the International Conference on Remote SensingEnvironment and Transportation Engineering (RSETE rsquo11) pp4572ndash4575 June 2011

[7] J Khodr and R Younes ldquoDimensionality reduction on hyper-spectral images a comparative review based on artificial datasrdquoin Proceedings of the 4th International Congress on Image andSignal Processing (CISP rsquo11) vol 4 pp 1875ndash1883 October 2011

Mathematical Problems in Engineering 17

[8] J Wen Z Tian H She and W Yan ldquoFeature extraction ofhyperspectral images based on preserving neighborhood dis-criminant embeddingrdquo in Proceedings of the 2nd InternationalConference on Image Analysis and Signal Processing (IASP rsquo10)pp 257ndash262 Zhejiang China April 2010

[9] Y-R Yeh S-Y Huang and Y-J Lee ldquoNonlinear dimensionreduction with kernel sliced inverse regressionrdquo IEEE Trans-actions on Knowledge and Data Engineering vol 21 no 11 pp1590ndash1603 2009

[10] J He L Zhang Q Wang and Z Li ldquoUsing diffusion geometriccoordinates for hyperspectral imagery representationrdquo IEEEGeoscience and Remote Sensing Letters vol 6 no 4 pp 767ndash7712009

[11] J Peng P Zhang and N Riedel ldquoDiscriminant learninganalysisrdquo IEEE Transactions on Systems Man and CyberneticsPart B Cybernetics vol 38 no 6 pp 1614ndash1625 2008

[12] F S Tsai and K L Chan ldquoDimensionality reduction techniquesfor data explorationrdquo in Proceedings of the 6th InternationalConference on Information Communications and Signal Process-ing pp 1ndash5 December 2007

[13] M D Farrell Jr and R M Mersereau ldquoOn the impact of PCAdimension reduction for hyperspectral detection of difficulttargetsrdquo IEEE Geoscience and Remote Sensing Letters vol 2 no2 pp 192ndash195 2005

[14] S Prasad and L M Bruce ldquoLimitations of principal com-ponents analysis for hyperspectral target recognitionrdquo IEEEGeoscience and Remote Sensing Letters vol 5 no 4 pp 625ndash629 2008

[15] J Yu Q Tian T Rui and T S Huang ldquoIntegrating discriminantand descriptive information for dimension reduction and clas-sificationrdquo IEEE Transactions on Circuits and Systems for VideoTechnology vol 17 no 3 pp 372ndash377 2007

[16] J Kong S Wang J Wang L Ma B Fu and Y Lu ldquoAnovel approach for face recognition based on supervisedlocality preserving projection and maximummargin criterionrdquoin Proceedings of the International Conference on ComputerEngineering and Technology (ICCET rsquo09) vol 1 pp 419ndash423Singapore January 2009

[17] M Loog and R P W Duin ldquoLinear dimensionality reductionvia a heteroscedastic extension of LDA the Chernoff criterionrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 26 no 6 pp 732ndash739 2004

[18] A C Jensen A Berge and A S Solberg ldquoRegressionapproaches to small sample inverse covariance matrix estima-tion for hyperspectral image classificationrdquo IEEE TransactionsonGeoscience andRemote Sensing vol 46 no 10 pp 2814ndash28222008

[19] J Jin BWang and L Zhang ldquoA novel approach based on Fisherdiscriminant null space for decomposition of mixed pixels inhyperspectral imageryrdquo IEEE Geoscience and Remote SensingLetters vol 7 no 4 pp 699ndash703 2010

[20] J Wen Z Tian X Liu and W Lin ldquoNeighborhood preservingorthogonal pnmf feature extraction for hyperspectral imageclassificationrdquo IEEE Journal of Selected Topics in Applied EarthObservations and Remote Sensing vol 6 no 2 pp 759ndash7682013

[21] Y Ren G Zhang G Yu and X Li ldquoLocal and global structurepreserving based feature selectionrdquoNeurocomputing vol 89 pp147ndash157 2012

[22] Z Fan Y Xu and D Zhang ldquoLocal linear discriminant analysisframework using sample neighborsrdquo IEEE Transactions onNeural Networks vol 22 no 7 pp 1119ndash1132 2011

[23] Y Wang S Huang D Liu and B Wang ldquoResearch advanceon band selection-based dimension reduction of hyperspectralremote sensing imagesrdquo in Proceedings of the 2nd InternationalConference on Remote Sensing Environment and TransportationEngineering (RSETE rsquo12) pp 1ndash4 IEEE Nanjing China June2012

[24] B Waske S van der Linden J A Benediktsson A Rabe andP Hostert ldquoSensitivity of support vector machines to randomfeature selection in classification of hyperspectral datardquo IEEETransactions on Geoscience and Remote Sensing vol 48 no 7pp 2880ndash2889 2010

[25] M Sugiyama ldquoLocal fisher discriminant analysis for superviseddimensionality reductionrdquo in Proceedings of the 23rd Interna-tional Conference onMachine Learning (ICML rsquo06) pp 905ndash912ACM June 2006

[26] Y Cheng ldquoMean shift mode seeking and clusteringrdquo IEEETransactions on Pattern Analysis and Machine Intelligence vol17 no 8 pp 790ndash799 1995

[27] W Li S Prasad J E Fowler and L M Bruce ldquoLocality-preserving dimensionality reduction and classification forhyperspectral image analysisrdquo IEEE Transactions on Geoscienceand Remote Sensing vol 50 no 4 pp 1185ndash1198 2012

[28] L Zelnik-Manor and P Perona ldquoSelf-tuning spectral cluster-ingrdquo in Proceedings of the 18th Annual Conference on NeuralInformation Processing Systems pp 1601ndash1608 December 2004

[29] W K Wong and H T Zhao ldquoSupervised optimal localitypreserving projectionrdquo Pattern Recognition vol 45 no 1 pp186ndash197 2012

[30] X He and P Niyogi ldquoLocality preserving projectionsrdquo inAdvances in Neural Information Processing Systems 16 SThrunL Saul and B Scholkopf Eds MIT Press Cambridge MassUSA 2004

[31] H Wang S Chen Z Hu and W Zheng ldquoLocality-preservedmaximum information projectionrdquo IEEE Transactions on Neu-ral Networks vol 19 no 4 pp 571ndash585 2008

[32] M Sugiyama ldquoDimensionality reduction ofmultimodal labeleddata by local fisher discriminant analysisrdquo Journal of MachineLearning Research vol 8 pp 1027ndash1061 2007

[33] S Yan D Xu B Zhang H J Zhang Q Yang and S Lin ldquoGraphembedding and extensions a general framework for dimen-sionality reductionrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 29 no 1 pp 40ndash51 2007

[34] C K Chui and J Wang ldquoDimensionality reduction of hyper-spectral imagery data for feature classificationrdquo in Handbookof Geomathematics pp 1005ndash1047 Springer Heidelberg Ger-many 2010

[35] C-C Chang and C-J Lin ldquoLIBSVM a Library for supportvector machinesrdquo ACM Transactions on Intelligent Systems andTechnology vol 2 no 3 article 27 2011

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 8: Research Article Subspace Learning via Local Probability ...downloads.hindawi.com/journals/mpe/2015/145136.pdf · and applicable toolkits were engendered one a er another. Hyperspectral

8 Mathematical Problems in Engineering

Input HSI data samples119883 = 1199091 1199092 119909

119873 isin R119901times119873 the objective dimension to be embedded 119902

the nearest neighbor parameter 119870 (default 119870 equiv 7) and the test sample 119909119905isin R119901

Output Transformation matrix 119879 isin R119901times119902Steps are as follows

(1) Initialize matrices(2) 119878119908larr 0119901times119901

within-class scatter(3) 119878119887larr 0119901times119901

between-class scatter(4)(5) Compute within-class affinity matrix119882119908(6) for 119888 = 1 2 119862 do in a classwise manner(7) 119909

119888

119894119873119888

119894=1larr 119909

119895| 119897119895= 119888 the 119888th class data samples

(8) 119883 larr 119909119888

1 119909119888

2 119909

119888

119873119888 sample matrix

(9) 119882119894=(1119873119888times111times119873119888)

119873119888

(10)(11) Determine the local scaling(12) for 119894 = 1 2 119873119888 do(13) 119909

(119894)

119896larr the 119896th nearest neighbor of 119909119888

119894 119896 = 1 2 119870

(14) for 119896 = 1 2 119870 do(15) 119889

(119894)

119896= 119909119894minus 119909(119894)

1198962

(16) end for

(17) (119894)larr1

119870

119870

sum

119896=1

10038171003817100381710038171003817119909119894minus 119909(119894)

119896

10038171003817100381710038171003817

2

(18) 120588(119894)= radic

1

119870

119870

sum

119896=1

(119889(119894)

119896minus (119894))2

(19) end for(20)(21) Define local affinity matrix(22) for 119894 119895 = 1 2 119873119888 do(23) 119901(119897

119894) larr 119873

119888119873 prior probability

(24) 119860119894119895larr 119901(119897

119894) exp(minus

10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

)(1 + exp(minus10038171003817100381710038171003817119909119894minus 119909119895

10038171003817100381710038171003817

2

120588119894120588119895

))

(25) end for(26) 119860

119888= 119860

(27) end for(28)119882

119908= diag119882

11198822 119882

119862 in block diagonal manner

(29) 119860119908= diag119860

1 1198602 119860

119862 also in block diagonal manner

(30) for 119894 119895 = 1 2 119873 do(31)

119908(119894 119895) larr 119860

119908(119894 119895)119882

119908(119894 119895)

(32) end for(33)(34) Compute between-class affinity matrix119882119887

(35)119882119887larr(1119873times1

11times119873)

119873minus diag119882

11198822 119882

119862

(36) Let 119865nz denotes the nonzero flag of elements in119882119908 119865nz(119894 119895) = 1 if119882119908(119894 119895) = 0 119865nz(119894 119895) = 0 if119882119908(119894 119895) = 0

(37) 119865nz larr (1198821015840

119887= 0)

(38) 119860119887larr 0119873times119873

(39) 119860

119887(119865nz) = 119860119908(119865nz)

(40) 119860119887(not119865nz) = 1

(41) for 119894 119895 = 1 2 119873 do(42)

119887(119894 119895) larr 119860

119887(119894 119895)119882

119887(119894 119895)

(43) end for(44) Now construct Laplacian matrix for within affinity matrix

119908and between affinity matrix

119887

(45) Let(46)119863119908

119894119894= sum

119895

119908(119894 119895)119863119887

119894119894= sum

119895

119887(119894 119895)

(47) Then

Algorithm 2 Continued

Mathematical Problems in Engineering 9

(48) 119871119908= 119908minus 119863119908 119871119887= 119887minus 119863119887

(49) Construct two matrixs below(50) 119878

119887= 119883119871

119887119883119879 119878119908= 119883119871

119908119883119879

(51) Let 1205931 1205932 120593

119902 be the general eigenvector of

(52) 119878119887120593119894= 119894119878119908120593119894 forall119894 isin 1 2 119902

(53) with the corresponding eigenvalue in descending order 1ge 2ge sdot sdot sdot ge

119902

(54)(55) Finally the transformation matrix can be represented as(56) 119879 = radic120582

11205931 radic12058211205932 radic120582

1120593119902 isin R119901times119902

(57)(58) For a new test sample 119909

119905 the embedding 119911

119905is given by

(59) 119911119905= 119879119879119909119905isin R119902

Algorithm 2 Proposed PD-LFDA method

the conventional LDA thus our method is expected to bemore robust and significantly preponderant

For large scale data sets we discuss a scheme that canaccelerate the computation procedure of the within-scattermatrix 119878

119908 In our algorithm owning to the fact that we have

put penalty on the affinity matrix for different class samplesin constructing the between-scatter matrix the acceleratedprocedure will remain for further discussion

The within-class scatter 119878119908can be reformulated as

119878119908=1

2

119873

sum

119894119895=1

119908(119894 119895) (119909

119894minus 119909119895) (119909119894minus 119909119895)119879

=1

2

119873

sum

119894119895=1

119908(119894 119895) (119909

119894119909119879

119894+ 119909119895119909119879

119895minus 119909119894119909119879

119895minus 119909119895119909119879

119894)

=

119873

sum

119894=1

(

119873

sum

119895=1

119908(119894 119895))119909

119894119909119879

119894minus

119873

sum

119894119895=1

119908(119894 119895) 119909

119894119909119879

119895

= 119883(119863119908minus 119908)119883119879

= 119883119908119883119879

(36)

Here

119863119908 (119894 119894) =

119873

sum

119895=1

119908(119894 119895)

119908= 119863119908minus 119908

(37)

119908can be block diagonal if all samples 119909

119894119873

119894=1are sorted

according to their labels This property implies that 119863119908and

119908can also be block diagonal matrix Hence if we compute

119878119908

through (36) then the procedure will be much moreefficient Similarly 119878

119887can also be formulated as

119878119887= 119883

119887119883119879

= 119883(119863119887minus 119887)119883119879

(38)

Nevertheless 119887is dense and can not be further sim-

plified However the simplified computational procedure of119908

saves for us part of time in a way In this paper weadopt the above procedure to accelerate 119878

119908and pursue 119878

119887

normally In addition to locality structure some papers showthat another property for example marginal information isalso important and should be preserved in the reduced spaceThe theory of extended LDA and LPP algorithm is developedrapidly recently Yan et al [33] summarized these algorithmsin a graph embedding framework and also proposed amarginal fisher analysis embedding (MFA) algorithm underthis framework

In MFA the criterion is characterized by intraclasscompactness and interclass marginal superability which isreplaced for thewithin-class scatter and between-class scatterseverally The intraclass relationship is reflected by an intrin-sic graph which is constructed by 119870-nearest neighborhoodsample data points in the same class while the interclasssuperability is mirrored by a penalty graph computed formarginal points from different classes Following this ideathe intraclass compactness is given as follows

119878119894= sum

119894119895 119894isin119873(119896)(119895)or 119895isin119873(119896)(119894)

10038171003817100381710038171003817119879119879119909119894minus 119879119879119909119895

10038171003817100381710038171003817

2

= 2119879119879119883 (119863 minus119882)119883

119879119879

(39)

where

119882(119894 119895) = 1 if 119894 isin 119873(119896) (119895) or 119895 isin 119873(119896) (119894) 0 otherwise

(40)

Here 119873(119896)(119895) represents the 119870-nearest neighborhood indexset of 119909

119895from the same class and119863 is the row sum (or column

sum) of 119882 119863(119894 119894) = sum119894119882119894119895 Interclass separability is indi-

cated by a penalty graph whose term is expressed as follows

119878119890= sum

119894119895 (119894119895)isin119875(119896)(119897119895)

or (119894119895)isin119875(119896)(119897119894)

10038171003817100381710038171003817119879119879119909119894minus 119879119879119909119895

10038171003817100381710038171003817

2

= 2119879119879119883(119863 minus )119883

119879119879

(41)

10 Mathematical Problems in Engineering

of which

(119894 119895) = 1 if (119894 119895) isin 119875(119896) (119897

119895) or (119894 119895) isin 119875(119896) (119897

119894)

0 otherwise(42)

Note that 119878119894and 119878

119890are corresponding to ldquowithin-scatter

matrixrdquo and ldquobetween-scatter matrixrdquo of the traditional LDAalternatively The optimal solution of MFA can be achievedby solving the following minimization problem That is

= arg119879

min 119879119879119883(119863 minus119882)119883

119879119879

119879119879119883(119863 minus )119883119879119879

(43)

We know that (43) is also a general eigenvalue decompositionproblem Let 119879PCA indicate the transformation matrix fromthe original space to PCA subspace with certain energyremaining and then the final projection of MFA is output as

119879MFA = 119879PCA (44)

As can be seenMFA constructs twoweightedmatrices119882and according to the intraclass compactness and interclassseparability In LFDA and PD-LFDA only one affinity isconstructed The difference lies in that the ldquoweightrdquo in LFDAand PD-LFDA is in the range of [0 1] according to the levelof difference Yet MFA distributes the same weight to its 119870-nearest neighborhoodsThe optimal solution ofMFA LFDAand PD-LFDA can be attributed to a general eigenvaluedecomposition problem Hence the idea of MFA LFDA andPD-LFDA is approximately similar in a certain interpretationRelationship with other methodologies can be analyzed in ananalogous way

4 Experimental Results

To illustrate the performance of PD-LFDA experimentson a real hyperspectral remote sensing image data setmdashAVIRIS Indian Pine 1992 are conducted in this sectionThe AVIRIS Indian Pines 1992 data set was gathered byNational Aeronautics and Space Administrator (NASA) withAirborne VisibleInfrared Imaging Spectrometer (AVIRIS)sensor over the Indian Pine test site in northwest Indians inJune 1992 This data set consists of 145 times 145 pixels and 224spectral reflectance bands ranging from 04 120583m to 245 120583mwith a spatial resolution of 20m The Indian Pines scene iscomposed of two-third agriculture and one-third forest orother natural perennial vegetation Some other landmarkssuch as dual lane highways rail line low density housing andsmaller roads are also in this image Since the scene was takenin June some main crops for example soybeans and cornare in their early growth stage with less than 5 coveragewhile the no-till min-till and clean-till indicate the amountof previous crop residue remaining The region map can bereferred to Figure 5(a) The 20 water absorption bands (ie[108ndash112 154ndash167] 224) were discarded

In this section performance of different dimensionreduction methods that is PCA LPP LFDA LDA JGLDAand RP [34] is compared with the proposed PD-LFDA

Table 1 Training set in AVIRIS Indian Pines 1992 database

ID Class Name1 Alfalfa (4618 3913)2 Corn-notill (1428136 952)3 Corn-mintill (83087 1048)4 Corn (23734 1434)5 Grass-pasture (48354 1118)6 Grass-trees (73071 973)7 Grass-pasture-mowed (2817 6071)8 Hay-windrowed (47850 1046)9 Oats (2015 7500)10 Soybean-notill (97286 884)11 Soybean-mintill (2455214 872)12 Soybean-clean (59354 911)13 Wheat (20528 1366)14 Woods (1265102 806)15 Buildings-Grass-Trees-Drives (38639 1010)16 Stone-Steel-Towers (9324 2581)Total 102491029 1004Numerical value in this table refers to number of samples number oftraining samples and pc respectively

Classification accuracy is reported via the concrete classifiersGenerally many dimension reduction research papers adopt119870-nearest neighborhood classifier (KNN) and support vectormachine (SVM) classifier to measure the performance of theextracting features after the dimension reduction where theoverall accuracy and kappa coefficient are detailed in thereports Hereby in this paper we also adopt KNN classifierand SVM classifier for performance measurement In KNNclassifier we select the value of 119870 as 1 5 and 9 so thatthree classifiers based on nearest neighborhoods are formedwhich are called 1NN 5NN and 9NN In SVM classifier weseek a hyperplane to separate classes in kernel-induced spacewhere the linear nonseparable classes in the original featurespace can be separated via kernel trick SVM as a robustand successful classifier has been widely used to evaluatethe performance of multifarious methods in many areas Forsimplicity and convenience we use LIBSVM package [35] forexperiments Accuracy of dimension reduced performancewill be reported by classified performance from SVM clas-sifier In the following schedule the feature subspace will becalculated at the first step from training samples by differentdimensional algorithms Table 1 gives a numerical statisticsof training samples corresponding to each class Then thenew sample will be projected into a low subspace by thetransformedmatrix Finally all the new samples are classifiedby SVM classifier

In this experiment a total of 1029 samples were selectedfor training and the remaining samples are used for testingNote that all the labeled samples in database are unbalancedand the available samples of each category differ dramaticallyThe following strategy is imposed for sample division A fixednumber of 15 samples are randomly selected to form thetraining sample yet the absent samples are randomly selected

Mathematical Problems in Engineering 11

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

(a) 1NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

(b) 5NN

7 9 11 13 15Reduced space

0

01

02

03

04

05

06

07

08

09

1

Ove

rall

accu

racy

(c) 9NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

(d) Liner SVM

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

PCALPPLFDA

LDAPD-LFDAJGLDA

RP

(e) Polynomial SVM

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

PCALPPLFDA

LDAPD-LFDAJGLDA

RP

(f) RBF-SVM

Figure 2 Overall accuracy by different dimension reduction methods and different classifiers applied to AVIRIS Indian Pines database

12 Mathematical Problems in Engineering

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

(a) 1NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

(b) 5NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

(c) 9NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

(d) Linear SVM

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

PCALPPLFDA

LDAPD-LFDAJGLDA

RP

(e) Polynomial SVM

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

PCALPPLFDA

LDAPD-LFDAJGLDA

RP

(f) RBF SVM

Figure 3 Kappa coefficient by different dimension reduction methods and different classifiers applied to AVIRIS Indian Pines database

Mathematical Problems in Engineering 13

(a) Pseudo 3-channel color image band = [12 79 140]

1

2

2

2

2

2 2

3

3

3

3

3

4

5

5

5

5

6

6 6 6

7

8

9

10

10

10 10

11

11

11

11

11 12

12 12

13 14

14

14 15

15

16

(b) Ground truth

(c) Distribution of training sample

Figure 4 Illustration of sample partition

from the remaining samples Under this strategy the trainingsamples and testing samples are listed in Table 1

Figure 2 shows the overall accuracy of different dimen-sion reduction methods applied to AVIRIS Indian 92AV3Cdata set The neighborhood of KNN classifier is selected as 15 and 9 respectively which produces three classifiers thatis 1NN 5NN and 9NN Three different kernel functionsare adopted for SVM classifier Then derived classifiers arealso used in this experiment that is linear SVM polynomialSVM and RBFSVM It can be deduced from Figures 2(a)sim2(c) that when the embedding space is greater than 5 pro-posed PD-LFDA performs the best while JGLDA performsthe worst The results produced by RP is slightly better thanJGLDA PCA LDA LPP and RP show the similar classifiedresults under KNN classifierThat is the proposed PD-LFDAoutperforms the others Meantime compared with LFDAthe proposed PD-LFDA leads to 2 more improvementson average Moreover it can be observed from (d) that theclassified accuracy increases steadily as the embedded spaceincreases However LDA demonstrates the highest overallaccuracywhen the reduced features vary to 9 while LFDAhas

the significant improvements when the number of reducedfeatures is greater than 9 This phenomenon of Figure 2(d)indicates the instability of linear SVM Nevertheless thesituation reversed for polynomial SVM and RBF SVM inFigures 2(e) and 2(f) wherein the proposed PD-LFDAwins a little improvement against LFDA and has significantimprovement compared with the others Inspired effects ofproposed PD-LFDA algorithm were achieved in all casesFurthermore Table 2 gives the detailed overall accuracyunder different feature dimensions using 3NN 7NN andRBFSVM classifiers which validates the feasibility of theproposed scheme in this paper

Figure 3 displays the results of kappa coefficient obtainedusing the different dimension reduction algorithms underKNN classifier and SVM classifiers The experimental cir-cumstance of Figure 3 is same as that of Figure 2 We canfind that from these results JGLDA performs the worst inmost cases except in Figure 3(e) The proposed PD-LFDAmethod outperforms the other methods and achieves thehighest kappa numerical value in most cases except usingthe linear SVM as the classifier In fact none of them work

14 Mathematical Problems in Engineering

686864717360

(a) PCA-7NN

687564707607

(b) LPP-7NN

792376538609

(c) LFDA-7NN

739070418130

(d) LDA-7NN

619556516209

(e) JGLDA-7NN

670962947164

(f) RP-7NN

799277288570

(g) PCA-RBFSVM

767473558288

(h) LPP-RBFSVM

837581518822

(i) LFDA-RBFSVM

728769288071

(j) LDA-RBFSVM

582453777121

(k) JGLDA-RBFSVM

763173278365

(l) RP-RBFSVM

Figure 5 Classified map generated by different dimension reduction methods where the overall accuracy kappa coefficient and averageclassification accuracy are listed at the top of each map respectively

steady in the case of linear SVM (Figure 3(d)) Note that thesituation is improved in polynomial SVM where the kappanumerical value of the proposed PD-LFDA is significantlybetter than the others All the achievements demonstrate therobustness of our contribution in PD-LFDA Simultaneouslyit is noticeable that LPP exhibits an average kappa levelThe kappa value gained by LPP is not seriously bad and isnot dramatically good The kappa results produced by RPare approximately the same as LPP A significant advantageof RP is the simple construction and computation wherethe accuracy is closed to LPP More details are summarizedin Table 3 It can be concluded that the kappa coefficientof proposed algorithm is higher than the other approacheswhich is more appropriate for the classification of HSI data

The visual results of all methods are presented in Figures5sim6 where the class labels are converted to pseudocolor

image The pseudocolor image of the hyperspectral imagefrom Indian 92AV3C database is shown in Figure 4(a) Theavailable labeled image which represents the truth groundis illustrated in Figure 4(b) where the labels are made byhuman The training samples are selected from the labeledimage represented as points in the image as shown inFigure 4(c) Each label number (ID) corresponds to eachclass name which is indexed in Table 1 In this experimentall the available labeled samples are used for testing whileapproximate 10 of samples are used for training Thesubspace is fixed to 13 (the number here is only used forreference it can be changed) For each experiment thedimension from original feature space is reduced to theobjective dimensionality thereafter the classified maps areinduced by 7NN classifier and RBF-SVM classifier Theoverall accuracy kappa coefficient and average accuracy are

Mathematical Problems in Engineering 15

837981698991

(a) PD-LFDA-7NN

848682798968

(b) D-LFDA-RBFSVM

Figure 6 Classified map of the proposed method

Table 2 Overall accuracy obtained using different dimensions offeature and different classifiers for Indian Pine Scene database ()

Item Dims7 9 11 13 15

3NNPCA 7007 6996 7026 7035 7087LPP 6575 6871 6835 6695 6546LFDA 7513 7895 7963 7992 8009LDA 6415 6606 6616 6596 6536PD-LFDA 7713 8094 816 8218 8263JGLDA 5405 5602 5704 5642 5751RP 6173 6444 6745 6632 6652

7NNPCA 6902 6902 691 6933 6972LPP 6782 7057 7027 6892 6681LFDA 7419 7777 7827 7863 7862LDA 6615 6909 6899 692 6882PD-LFDA 7736 809 8155 8172 8235JGLDA 5665 5811 5809 5812 5914RP 6194 6481 6696 6613 6603

RBF-SVMPCA 8051 7794 7718 7935 8176LPP 7101 7514 7614 7501 7314LFDA 7826 8187 82 8113 7915LDA 677 6834 6877 6642 6792PD-LFDA 7858 8267 8395 8423 8166JGLDA 5686 5606 5663 5689 5879RP 6957 7282 7441 7416 7543

also included at the top of each map respectively Figure 5displays the classifiedmaps of classicmethods in pseudocolorimages and the classified maps of proposed PD-LFDA arepresented in Figure 6 It can be observed from these mapsthat the best performance is achieved by PD-LFDA when7NN classifier is used in this case the overall accuracyis 8379 the kappa coefficient is 8169 and the averageaccuracy is 8991 Moreover the worst algorithm is JGLDAwhose overall accuracy is 6195 the kappa coefficient

Table 3 Kappa coefficient by different dimension reduction meth-ods and different classifiers applied to Indian Pine Scene database()

Item Dims7 9 11 13 15

3NNPCA 6592 658 6612 6624 6681LPP 6084 6422 6382 6219 605LFDA 7145 7586 7664 7697 7718LDA 5913 6126 6137 6118 6048PD-LFDA 7392 7824 7901 7966 8016JGLDA 4778 4992 5082 5006 5119RP 564 5963 6303 6172 6192

7NNPCA 6466 6466 6475 6501 6545LPP 6304 6629 6585 6429 6185LFDA 7047 7457 7515 7557 7557LDA 6132 646 6453 6478 6431PD-LFDA 7417 782 7893 7913 7985JGLDA 5029 519 5161 5159 5263RP 5638 5992 6237 6135 6124

RBF-SVMPCA 7767 747 7393 7636 7900LPP 6677 715 7264 7115 6921LFDA 7486 7915 7924 7822 7584LDA 629 6362 6427 6153 6323PD-LFDA 7551 8011 8159 8187 7899JGLDA 5109 5066 5133 5147 5369RP 653 6908 7085 706 7203

becomes 5651 and the average accuracy is only 6209Other methods such as PCA LPP and RP can produce thecomparable results and no one can outperform the otherHowever LDAoutperformsPCA LPP andRP yielding betterresult Similar conclusions can be achieved in the groupof RBF-SVM Generally proposed PD-LFDA significantlyoutperforms the rest in this experiment which indicates thecorrectness of improvements in proposed PD-LFDA

16 Mathematical Problems in Engineering

Table 4 Performance of dimension reduction on the whole labeled samples ()

Evaluated item MethodologiesPCA LDA LPP LFDA JGLDA RP PD-LFDA

7NNOverall accuracy 6868 739 6875 7923 6195 6709 8379Kappa coefficient 6471 7041 647 7653 5651 6294 8169Average accuracy 737 813 7607 8609 6209 7164 8991

RBF-SVMOverall accuracy 7992 7287 7674 8375 5824 7631 8486Kappa coefficient 7728 6928 7355 8151 5377 7327 8279Average accuracy 857 8071 8288 8822 7121 8365 8968

Finally details of assessment deduced by 7NN and RBF-SVM is summarized in Table 4 where the correspondingoverall accuracy kappa coefficient and average accuracy ofdifferent methods can be located collectively

5 Conclusions

In this paper we have analyzed local Fisher discriminantanalysis (LFDA) and found its weakness By replacing themaximum distance with local variance for the constructionof weight matrix introducing class prior probability into thecomputation of affinitymatrix an improved LFDA algorithmhas been proposed This novel approach is called PD-LFDAbecause the probability distribution (PD) is applied in LFDAalgorithm The proposed approach essentially can increasethe discriminant ability of transformed features in lowdimensional space The pattern found by the new approachis expected to be more accurate and coincides with thecharacter of HSI data and is conducive to classify HSI dataPD-LFDA has been evaluated on a real removing sensingAVIRIS Indian Pine 92AV3C data set We have compared theperformance of the proposed PD-LFDA with that of PCALPP LFDA LDA JGLDA and RP Both the numerical resultsand visual inspection of class maps have been obtained Inthe experiments KNN classifier and SVM classifier havebeen used We have argued that the proposed PD-LFDAexhibits the best performance and serves as a very effectivedimensionality reduction tool for high dimensional datasuch as hyperspectral image (HSI) data

Appendix

Procedure of Proposed Algorithm

The brief description of the algorithm to perform the pro-posed PD-LFDA method is already presented in Section 3The details of the algorithm are provided in Algorithm 2

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work was supported by the Research of University ofMacau under Grants no MYRG205(Y1-L4)-FST11-TYY noMYRG187(Y1-L3)-FST11-TYY and no SRG010-FST11-TYYand by the National Natural Science Foundation of Chinaunder Grant no 61273244 This research project was alsosupported by the Science andTechnologyDevelopment Fund(FDCT) of Macau under Contract no 100-2012-A3

References

[1] W Li S Prasad Z Ye J E Fowler and M Cui ldquoLocality-preserving discriminant analysis for hyperspectral image clas-sification using local spatial informationrdquo in Proceedings ofthe 32nd IEEE International Geoscience and Remote SensingSymposium (IGARSS rsquo12) pp 4134ndash4137MunichGermany July2012

[2] HND LeM S Kim andD-H Kim ldquoComparison of singularvalue decomposition and principal component analysis appliedto hyperspectral imaging of biofilmrdquo in Proceedings of the IEEEPhotonics Conference (IPC rsquo12) pp 6ndash7 2012

[3] C K Chui and J Wang ldquoRandomized anisotropic transformfor nonlinear dimensionality reductionrdquo GEMmdashInternationalJournal on Geomathematics vol 1 no 1 pp 23ndash50 2010

[4] T V Bandos L Bruzzone and G Camps-Valls ldquoClassificationof hyperspectral images with regularized linear discriminantanalysisrdquo IEEE Transactions on Geoscience and Remote Sensingvol 47 no 3 pp 862ndash873 2009

[5] D Guangjun Z Yongsheng and J Song ldquoDimensionalityreduction of hyperspectral data based on ISOMAP algorithmrdquoin Proceedings of the 8th International Conference on ElectronicMeasurement and Instruments (ICEMI rsquo07) pp 3935ndash3938Xirsquoan China August 2007

[6] X Luo and M-F Jiang ldquoThe application of manifold learn-ing in dimensionality analysis for hyperspectral imageryrdquo inProceedings of the International Conference on Remote SensingEnvironment and Transportation Engineering (RSETE rsquo11) pp4572ndash4575 June 2011

[7] J Khodr and R Younes ldquoDimensionality reduction on hyper-spectral images a comparative review based on artificial datasrdquoin Proceedings of the 4th International Congress on Image andSignal Processing (CISP rsquo11) vol 4 pp 1875ndash1883 October 2011

Mathematical Problems in Engineering 17

[8] J Wen Z Tian H She and W Yan ldquoFeature extraction ofhyperspectral images based on preserving neighborhood dis-criminant embeddingrdquo in Proceedings of the 2nd InternationalConference on Image Analysis and Signal Processing (IASP rsquo10)pp 257ndash262 Zhejiang China April 2010

[9] Y-R Yeh S-Y Huang and Y-J Lee ldquoNonlinear dimensionreduction with kernel sliced inverse regressionrdquo IEEE Trans-actions on Knowledge and Data Engineering vol 21 no 11 pp1590ndash1603 2009

[10] J He L Zhang Q Wang and Z Li ldquoUsing diffusion geometriccoordinates for hyperspectral imagery representationrdquo IEEEGeoscience and Remote Sensing Letters vol 6 no 4 pp 767ndash7712009

[11] J Peng P Zhang and N Riedel ldquoDiscriminant learninganalysisrdquo IEEE Transactions on Systems Man and CyberneticsPart B Cybernetics vol 38 no 6 pp 1614ndash1625 2008

[12] F S Tsai and K L Chan ldquoDimensionality reduction techniquesfor data explorationrdquo in Proceedings of the 6th InternationalConference on Information Communications and Signal Process-ing pp 1ndash5 December 2007

[13] M D Farrell Jr and R M Mersereau ldquoOn the impact of PCAdimension reduction for hyperspectral detection of difficulttargetsrdquo IEEE Geoscience and Remote Sensing Letters vol 2 no2 pp 192ndash195 2005

[14] S Prasad and L M Bruce ldquoLimitations of principal com-ponents analysis for hyperspectral target recognitionrdquo IEEEGeoscience and Remote Sensing Letters vol 5 no 4 pp 625ndash629 2008

[15] J Yu Q Tian T Rui and T S Huang ldquoIntegrating discriminantand descriptive information for dimension reduction and clas-sificationrdquo IEEE Transactions on Circuits and Systems for VideoTechnology vol 17 no 3 pp 372ndash377 2007

[16] J Kong S Wang J Wang L Ma B Fu and Y Lu ldquoAnovel approach for face recognition based on supervisedlocality preserving projection and maximummargin criterionrdquoin Proceedings of the International Conference on ComputerEngineering and Technology (ICCET rsquo09) vol 1 pp 419ndash423Singapore January 2009

[17] M Loog and R P W Duin ldquoLinear dimensionality reductionvia a heteroscedastic extension of LDA the Chernoff criterionrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 26 no 6 pp 732ndash739 2004

[18] A C Jensen A Berge and A S Solberg ldquoRegressionapproaches to small sample inverse covariance matrix estima-tion for hyperspectral image classificationrdquo IEEE TransactionsonGeoscience andRemote Sensing vol 46 no 10 pp 2814ndash28222008

[19] J Jin BWang and L Zhang ldquoA novel approach based on Fisherdiscriminant null space for decomposition of mixed pixels inhyperspectral imageryrdquo IEEE Geoscience and Remote SensingLetters vol 7 no 4 pp 699ndash703 2010

[20] J Wen Z Tian X Liu and W Lin ldquoNeighborhood preservingorthogonal pnmf feature extraction for hyperspectral imageclassificationrdquo IEEE Journal of Selected Topics in Applied EarthObservations and Remote Sensing vol 6 no 2 pp 759ndash7682013

[21] Y Ren G Zhang G Yu and X Li ldquoLocal and global structurepreserving based feature selectionrdquoNeurocomputing vol 89 pp147ndash157 2012

[22] Z Fan Y Xu and D Zhang ldquoLocal linear discriminant analysisframework using sample neighborsrdquo IEEE Transactions onNeural Networks vol 22 no 7 pp 1119ndash1132 2011

[23] Y Wang S Huang D Liu and B Wang ldquoResearch advanceon band selection-based dimension reduction of hyperspectralremote sensing imagesrdquo in Proceedings of the 2nd InternationalConference on Remote Sensing Environment and TransportationEngineering (RSETE rsquo12) pp 1ndash4 IEEE Nanjing China June2012

[24] B Waske S van der Linden J A Benediktsson A Rabe andP Hostert ldquoSensitivity of support vector machines to randomfeature selection in classification of hyperspectral datardquo IEEETransactions on Geoscience and Remote Sensing vol 48 no 7pp 2880ndash2889 2010

[25] M Sugiyama ldquoLocal fisher discriminant analysis for superviseddimensionality reductionrdquo in Proceedings of the 23rd Interna-tional Conference onMachine Learning (ICML rsquo06) pp 905ndash912ACM June 2006

[26] Y Cheng ldquoMean shift mode seeking and clusteringrdquo IEEETransactions on Pattern Analysis and Machine Intelligence vol17 no 8 pp 790ndash799 1995

[27] W Li S Prasad J E Fowler and L M Bruce ldquoLocality-preserving dimensionality reduction and classification forhyperspectral image analysisrdquo IEEE Transactions on Geoscienceand Remote Sensing vol 50 no 4 pp 1185ndash1198 2012

[28] L Zelnik-Manor and P Perona ldquoSelf-tuning spectral cluster-ingrdquo in Proceedings of the 18th Annual Conference on NeuralInformation Processing Systems pp 1601ndash1608 December 2004

[29] W K Wong and H T Zhao ldquoSupervised optimal localitypreserving projectionrdquo Pattern Recognition vol 45 no 1 pp186ndash197 2012

[30] X He and P Niyogi ldquoLocality preserving projectionsrdquo inAdvances in Neural Information Processing Systems 16 SThrunL Saul and B Scholkopf Eds MIT Press Cambridge MassUSA 2004

[31] H Wang S Chen Z Hu and W Zheng ldquoLocality-preservedmaximum information projectionrdquo IEEE Transactions on Neu-ral Networks vol 19 no 4 pp 571ndash585 2008

[32] M Sugiyama ldquoDimensionality reduction ofmultimodal labeleddata by local fisher discriminant analysisrdquo Journal of MachineLearning Research vol 8 pp 1027ndash1061 2007

[33] S Yan D Xu B Zhang H J Zhang Q Yang and S Lin ldquoGraphembedding and extensions a general framework for dimen-sionality reductionrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 29 no 1 pp 40ndash51 2007

[34] C K Chui and J Wang ldquoDimensionality reduction of hyper-spectral imagery data for feature classificationrdquo in Handbookof Geomathematics pp 1005ndash1047 Springer Heidelberg Ger-many 2010

[35] C-C Chang and C-J Lin ldquoLIBSVM a Library for supportvector machinesrdquo ACM Transactions on Intelligent Systems andTechnology vol 2 no 3 article 27 2011

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 9: Research Article Subspace Learning via Local Probability ...downloads.hindawi.com/journals/mpe/2015/145136.pdf · and applicable toolkits were engendered one a er another. Hyperspectral

Mathematical Problems in Engineering 9

(48) 119871119908= 119908minus 119863119908 119871119887= 119887minus 119863119887

(49) Construct two matrixs below(50) 119878

119887= 119883119871

119887119883119879 119878119908= 119883119871

119908119883119879

(51) Let 1205931 1205932 120593

119902 be the general eigenvector of

(52) 119878119887120593119894= 119894119878119908120593119894 forall119894 isin 1 2 119902

(53) with the corresponding eigenvalue in descending order 1ge 2ge sdot sdot sdot ge

119902

(54)(55) Finally the transformation matrix can be represented as(56) 119879 = radic120582

11205931 radic12058211205932 radic120582

1120593119902 isin R119901times119902

(57)(58) For a new test sample 119909

119905 the embedding 119911

119905is given by

(59) 119911119905= 119879119879119909119905isin R119902

Algorithm 2 Proposed PD-LFDA method

the conventional LDA thus our method is expected to bemore robust and significantly preponderant

For large scale data sets we discuss a scheme that canaccelerate the computation procedure of the within-scattermatrix 119878

119908 In our algorithm owning to the fact that we have

put penalty on the affinity matrix for different class samplesin constructing the between-scatter matrix the acceleratedprocedure will remain for further discussion

The within-class scatter 119878119908can be reformulated as

119878119908=1

2

119873

sum

119894119895=1

119908(119894 119895) (119909

119894minus 119909119895) (119909119894minus 119909119895)119879

=1

2

119873

sum

119894119895=1

119908(119894 119895) (119909

119894119909119879

119894+ 119909119895119909119879

119895minus 119909119894119909119879

119895minus 119909119895119909119879

119894)

=

119873

sum

119894=1

(

119873

sum

119895=1

119908(119894 119895))119909

119894119909119879

119894minus

119873

sum

119894119895=1

119908(119894 119895) 119909

119894119909119879

119895

= 119883(119863119908minus 119908)119883119879

= 119883119908119883119879

(36)

Here

119863119908 (119894 119894) =

119873

sum

119895=1

119908(119894 119895)

119908= 119863119908minus 119908

(37)

119908can be block diagonal if all samples 119909

119894119873

119894=1are sorted

according to their labels This property implies that 119863119908and

119908can also be block diagonal matrix Hence if we compute

119878119908

through (36) then the procedure will be much moreefficient Similarly 119878

119887can also be formulated as

119878119887= 119883

119887119883119879

= 119883(119863119887minus 119887)119883119879

(38)

Nevertheless 119887is dense and can not be further sim-

plified However the simplified computational procedure of119908

saves for us part of time in a way In this paper weadopt the above procedure to accelerate 119878

119908and pursue 119878

119887

normally In addition to locality structure some papers showthat another property for example marginal information isalso important and should be preserved in the reduced spaceThe theory of extended LDA and LPP algorithm is developedrapidly recently Yan et al [33] summarized these algorithmsin a graph embedding framework and also proposed amarginal fisher analysis embedding (MFA) algorithm underthis framework

In MFA the criterion is characterized by intraclasscompactness and interclass marginal superability which isreplaced for thewithin-class scatter and between-class scatterseverally The intraclass relationship is reflected by an intrin-sic graph which is constructed by 119870-nearest neighborhoodsample data points in the same class while the interclasssuperability is mirrored by a penalty graph computed formarginal points from different classes Following this ideathe intraclass compactness is given as follows

119878119894= sum

119894119895 119894isin119873(119896)(119895)or 119895isin119873(119896)(119894)

10038171003817100381710038171003817119879119879119909119894minus 119879119879119909119895

10038171003817100381710038171003817

2

= 2119879119879119883 (119863 minus119882)119883

119879119879

(39)

where

119882(119894 119895) = 1 if 119894 isin 119873(119896) (119895) or 119895 isin 119873(119896) (119894) 0 otherwise

(40)

Here 119873(119896)(119895) represents the 119870-nearest neighborhood indexset of 119909

119895from the same class and119863 is the row sum (or column

sum) of 119882 119863(119894 119894) = sum119894119882119894119895 Interclass separability is indi-

cated by a penalty graph whose term is expressed as follows

119878119890= sum

119894119895 (119894119895)isin119875(119896)(119897119895)

or (119894119895)isin119875(119896)(119897119894)

10038171003817100381710038171003817119879119879119909119894minus 119879119879119909119895

10038171003817100381710038171003817

2

= 2119879119879119883(119863 minus )119883

119879119879

(41)

10 Mathematical Problems in Engineering

of which

(119894 119895) = 1 if (119894 119895) isin 119875(119896) (119897

119895) or (119894 119895) isin 119875(119896) (119897

119894)

0 otherwise(42)

Note that 119878119894and 119878

119890are corresponding to ldquowithin-scatter

matrixrdquo and ldquobetween-scatter matrixrdquo of the traditional LDAalternatively The optimal solution of MFA can be achievedby solving the following minimization problem That is

= arg119879

min 119879119879119883(119863 minus119882)119883

119879119879

119879119879119883(119863 minus )119883119879119879

(43)

We know that (43) is also a general eigenvalue decompositionproblem Let 119879PCA indicate the transformation matrix fromthe original space to PCA subspace with certain energyremaining and then the final projection of MFA is output as

119879MFA = 119879PCA (44)

As can be seenMFA constructs twoweightedmatrices119882and according to the intraclass compactness and interclassseparability In LFDA and PD-LFDA only one affinity isconstructed The difference lies in that the ldquoweightrdquo in LFDAand PD-LFDA is in the range of [0 1] according to the levelof difference Yet MFA distributes the same weight to its 119870-nearest neighborhoodsThe optimal solution ofMFA LFDAand PD-LFDA can be attributed to a general eigenvaluedecomposition problem Hence the idea of MFA LFDA andPD-LFDA is approximately similar in a certain interpretationRelationship with other methodologies can be analyzed in ananalogous way

4 Experimental Results

To illustrate the performance of PD-LFDA experimentson a real hyperspectral remote sensing image data setmdashAVIRIS Indian Pine 1992 are conducted in this sectionThe AVIRIS Indian Pines 1992 data set was gathered byNational Aeronautics and Space Administrator (NASA) withAirborne VisibleInfrared Imaging Spectrometer (AVIRIS)sensor over the Indian Pine test site in northwest Indians inJune 1992 This data set consists of 145 times 145 pixels and 224spectral reflectance bands ranging from 04 120583m to 245 120583mwith a spatial resolution of 20m The Indian Pines scene iscomposed of two-third agriculture and one-third forest orother natural perennial vegetation Some other landmarkssuch as dual lane highways rail line low density housing andsmaller roads are also in this image Since the scene was takenin June some main crops for example soybeans and cornare in their early growth stage with less than 5 coveragewhile the no-till min-till and clean-till indicate the amountof previous crop residue remaining The region map can bereferred to Figure 5(a) The 20 water absorption bands (ie[108ndash112 154ndash167] 224) were discarded

In this section performance of different dimensionreduction methods that is PCA LPP LFDA LDA JGLDAand RP [34] is compared with the proposed PD-LFDA

Table 1 Training set in AVIRIS Indian Pines 1992 database

ID Class Name1 Alfalfa (4618 3913)2 Corn-notill (1428136 952)3 Corn-mintill (83087 1048)4 Corn (23734 1434)5 Grass-pasture (48354 1118)6 Grass-trees (73071 973)7 Grass-pasture-mowed (2817 6071)8 Hay-windrowed (47850 1046)9 Oats (2015 7500)10 Soybean-notill (97286 884)11 Soybean-mintill (2455214 872)12 Soybean-clean (59354 911)13 Wheat (20528 1366)14 Woods (1265102 806)15 Buildings-Grass-Trees-Drives (38639 1010)16 Stone-Steel-Towers (9324 2581)Total 102491029 1004Numerical value in this table refers to number of samples number oftraining samples and pc respectively

Classification accuracy is reported via the concrete classifiersGenerally many dimension reduction research papers adopt119870-nearest neighborhood classifier (KNN) and support vectormachine (SVM) classifier to measure the performance of theextracting features after the dimension reduction where theoverall accuracy and kappa coefficient are detailed in thereports Hereby in this paper we also adopt KNN classifierand SVM classifier for performance measurement In KNNclassifier we select the value of 119870 as 1 5 and 9 so thatthree classifiers based on nearest neighborhoods are formedwhich are called 1NN 5NN and 9NN In SVM classifier weseek a hyperplane to separate classes in kernel-induced spacewhere the linear nonseparable classes in the original featurespace can be separated via kernel trick SVM as a robustand successful classifier has been widely used to evaluatethe performance of multifarious methods in many areas Forsimplicity and convenience we use LIBSVM package [35] forexperiments Accuracy of dimension reduced performancewill be reported by classified performance from SVM clas-sifier In the following schedule the feature subspace will becalculated at the first step from training samples by differentdimensional algorithms Table 1 gives a numerical statisticsof training samples corresponding to each class Then thenew sample will be projected into a low subspace by thetransformedmatrix Finally all the new samples are classifiedby SVM classifier

In this experiment a total of 1029 samples were selectedfor training and the remaining samples are used for testingNote that all the labeled samples in database are unbalancedand the available samples of each category differ dramaticallyThe following strategy is imposed for sample division A fixednumber of 15 samples are randomly selected to form thetraining sample yet the absent samples are randomly selected

Mathematical Problems in Engineering 11

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

(a) 1NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

(b) 5NN

7 9 11 13 15Reduced space

0

01

02

03

04

05

06

07

08

09

1

Ove

rall

accu

racy

(c) 9NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

(d) Liner SVM

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

PCALPPLFDA

LDAPD-LFDAJGLDA

RP

(e) Polynomial SVM

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

PCALPPLFDA

LDAPD-LFDAJGLDA

RP

(f) RBF-SVM

Figure 2 Overall accuracy by different dimension reduction methods and different classifiers applied to AVIRIS Indian Pines database

12 Mathematical Problems in Engineering

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

(a) 1NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

(b) 5NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

(c) 9NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

(d) Linear SVM

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

PCALPPLFDA

LDAPD-LFDAJGLDA

RP

(e) Polynomial SVM

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

PCALPPLFDA

LDAPD-LFDAJGLDA

RP

(f) RBF SVM

Figure 3 Kappa coefficient by different dimension reduction methods and different classifiers applied to AVIRIS Indian Pines database

Mathematical Problems in Engineering 13

(a) Pseudo 3-channel color image band = [12 79 140]

1

2

2

2

2

2 2

3

3

3

3

3

4

5

5

5

5

6

6 6 6

7

8

9

10

10

10 10

11

11

11

11

11 12

12 12

13 14

14

14 15

15

16

(b) Ground truth

(c) Distribution of training sample

Figure 4 Illustration of sample partition

from the remaining samples Under this strategy the trainingsamples and testing samples are listed in Table 1

Figure 2 shows the overall accuracy of different dimen-sion reduction methods applied to AVIRIS Indian 92AV3Cdata set The neighborhood of KNN classifier is selected as 15 and 9 respectively which produces three classifiers thatis 1NN 5NN and 9NN Three different kernel functionsare adopted for SVM classifier Then derived classifiers arealso used in this experiment that is linear SVM polynomialSVM and RBFSVM It can be deduced from Figures 2(a)sim2(c) that when the embedding space is greater than 5 pro-posed PD-LFDA performs the best while JGLDA performsthe worst The results produced by RP is slightly better thanJGLDA PCA LDA LPP and RP show the similar classifiedresults under KNN classifierThat is the proposed PD-LFDAoutperforms the others Meantime compared with LFDAthe proposed PD-LFDA leads to 2 more improvementson average Moreover it can be observed from (d) that theclassified accuracy increases steadily as the embedded spaceincreases However LDA demonstrates the highest overallaccuracywhen the reduced features vary to 9 while LFDAhas

the significant improvements when the number of reducedfeatures is greater than 9 This phenomenon of Figure 2(d)indicates the instability of linear SVM Nevertheless thesituation reversed for polynomial SVM and RBF SVM inFigures 2(e) and 2(f) wherein the proposed PD-LFDAwins a little improvement against LFDA and has significantimprovement compared with the others Inspired effects ofproposed PD-LFDA algorithm were achieved in all casesFurthermore Table 2 gives the detailed overall accuracyunder different feature dimensions using 3NN 7NN andRBFSVM classifiers which validates the feasibility of theproposed scheme in this paper

Figure 3 displays the results of kappa coefficient obtainedusing the different dimension reduction algorithms underKNN classifier and SVM classifiers The experimental cir-cumstance of Figure 3 is same as that of Figure 2 We canfind that from these results JGLDA performs the worst inmost cases except in Figure 3(e) The proposed PD-LFDAmethod outperforms the other methods and achieves thehighest kappa numerical value in most cases except usingthe linear SVM as the classifier In fact none of them work

14 Mathematical Problems in Engineering

686864717360

(a) PCA-7NN

687564707607

(b) LPP-7NN

792376538609

(c) LFDA-7NN

739070418130

(d) LDA-7NN

619556516209

(e) JGLDA-7NN

670962947164

(f) RP-7NN

799277288570

(g) PCA-RBFSVM

767473558288

(h) LPP-RBFSVM

837581518822

(i) LFDA-RBFSVM

728769288071

(j) LDA-RBFSVM

582453777121

(k) JGLDA-RBFSVM

763173278365

(l) RP-RBFSVM

Figure 5 Classified map generated by different dimension reduction methods where the overall accuracy kappa coefficient and averageclassification accuracy are listed at the top of each map respectively

steady in the case of linear SVM (Figure 3(d)) Note that thesituation is improved in polynomial SVM where the kappanumerical value of the proposed PD-LFDA is significantlybetter than the others All the achievements demonstrate therobustness of our contribution in PD-LFDA Simultaneouslyit is noticeable that LPP exhibits an average kappa levelThe kappa value gained by LPP is not seriously bad and isnot dramatically good The kappa results produced by RPare approximately the same as LPP A significant advantageof RP is the simple construction and computation wherethe accuracy is closed to LPP More details are summarizedin Table 3 It can be concluded that the kappa coefficientof proposed algorithm is higher than the other approacheswhich is more appropriate for the classification of HSI data

The visual results of all methods are presented in Figures5sim6 where the class labels are converted to pseudocolor

image The pseudocolor image of the hyperspectral imagefrom Indian 92AV3C database is shown in Figure 4(a) Theavailable labeled image which represents the truth groundis illustrated in Figure 4(b) where the labels are made byhuman The training samples are selected from the labeledimage represented as points in the image as shown inFigure 4(c) Each label number (ID) corresponds to eachclass name which is indexed in Table 1 In this experimentall the available labeled samples are used for testing whileapproximate 10 of samples are used for training Thesubspace is fixed to 13 (the number here is only used forreference it can be changed) For each experiment thedimension from original feature space is reduced to theobjective dimensionality thereafter the classified maps areinduced by 7NN classifier and RBF-SVM classifier Theoverall accuracy kappa coefficient and average accuracy are

Mathematical Problems in Engineering 15

837981698991

(a) PD-LFDA-7NN

848682798968

(b) D-LFDA-RBFSVM

Figure 6 Classified map of the proposed method

Table 2 Overall accuracy obtained using different dimensions offeature and different classifiers for Indian Pine Scene database ()

Item Dims7 9 11 13 15

3NNPCA 7007 6996 7026 7035 7087LPP 6575 6871 6835 6695 6546LFDA 7513 7895 7963 7992 8009LDA 6415 6606 6616 6596 6536PD-LFDA 7713 8094 816 8218 8263JGLDA 5405 5602 5704 5642 5751RP 6173 6444 6745 6632 6652

7NNPCA 6902 6902 691 6933 6972LPP 6782 7057 7027 6892 6681LFDA 7419 7777 7827 7863 7862LDA 6615 6909 6899 692 6882PD-LFDA 7736 809 8155 8172 8235JGLDA 5665 5811 5809 5812 5914RP 6194 6481 6696 6613 6603

RBF-SVMPCA 8051 7794 7718 7935 8176LPP 7101 7514 7614 7501 7314LFDA 7826 8187 82 8113 7915LDA 677 6834 6877 6642 6792PD-LFDA 7858 8267 8395 8423 8166JGLDA 5686 5606 5663 5689 5879RP 6957 7282 7441 7416 7543

also included at the top of each map respectively Figure 5displays the classifiedmaps of classicmethods in pseudocolorimages and the classified maps of proposed PD-LFDA arepresented in Figure 6 It can be observed from these mapsthat the best performance is achieved by PD-LFDA when7NN classifier is used in this case the overall accuracyis 8379 the kappa coefficient is 8169 and the averageaccuracy is 8991 Moreover the worst algorithm is JGLDAwhose overall accuracy is 6195 the kappa coefficient

Table 3 Kappa coefficient by different dimension reduction meth-ods and different classifiers applied to Indian Pine Scene database()

Item Dims7 9 11 13 15

3NNPCA 6592 658 6612 6624 6681LPP 6084 6422 6382 6219 605LFDA 7145 7586 7664 7697 7718LDA 5913 6126 6137 6118 6048PD-LFDA 7392 7824 7901 7966 8016JGLDA 4778 4992 5082 5006 5119RP 564 5963 6303 6172 6192

7NNPCA 6466 6466 6475 6501 6545LPP 6304 6629 6585 6429 6185LFDA 7047 7457 7515 7557 7557LDA 6132 646 6453 6478 6431PD-LFDA 7417 782 7893 7913 7985JGLDA 5029 519 5161 5159 5263RP 5638 5992 6237 6135 6124

RBF-SVMPCA 7767 747 7393 7636 7900LPP 6677 715 7264 7115 6921LFDA 7486 7915 7924 7822 7584LDA 629 6362 6427 6153 6323PD-LFDA 7551 8011 8159 8187 7899JGLDA 5109 5066 5133 5147 5369RP 653 6908 7085 706 7203

becomes 5651 and the average accuracy is only 6209Other methods such as PCA LPP and RP can produce thecomparable results and no one can outperform the otherHowever LDAoutperformsPCA LPP andRP yielding betterresult Similar conclusions can be achieved in the groupof RBF-SVM Generally proposed PD-LFDA significantlyoutperforms the rest in this experiment which indicates thecorrectness of improvements in proposed PD-LFDA

16 Mathematical Problems in Engineering

Table 4 Performance of dimension reduction on the whole labeled samples ()

Evaluated item MethodologiesPCA LDA LPP LFDA JGLDA RP PD-LFDA

7NNOverall accuracy 6868 739 6875 7923 6195 6709 8379Kappa coefficient 6471 7041 647 7653 5651 6294 8169Average accuracy 737 813 7607 8609 6209 7164 8991

RBF-SVMOverall accuracy 7992 7287 7674 8375 5824 7631 8486Kappa coefficient 7728 6928 7355 8151 5377 7327 8279Average accuracy 857 8071 8288 8822 7121 8365 8968

Finally details of assessment deduced by 7NN and RBF-SVM is summarized in Table 4 where the correspondingoverall accuracy kappa coefficient and average accuracy ofdifferent methods can be located collectively

5 Conclusions

In this paper we have analyzed local Fisher discriminantanalysis (LFDA) and found its weakness By replacing themaximum distance with local variance for the constructionof weight matrix introducing class prior probability into thecomputation of affinitymatrix an improved LFDA algorithmhas been proposed This novel approach is called PD-LFDAbecause the probability distribution (PD) is applied in LFDAalgorithm The proposed approach essentially can increasethe discriminant ability of transformed features in lowdimensional space The pattern found by the new approachis expected to be more accurate and coincides with thecharacter of HSI data and is conducive to classify HSI dataPD-LFDA has been evaluated on a real removing sensingAVIRIS Indian Pine 92AV3C data set We have compared theperformance of the proposed PD-LFDA with that of PCALPP LFDA LDA JGLDA and RP Both the numerical resultsand visual inspection of class maps have been obtained Inthe experiments KNN classifier and SVM classifier havebeen used We have argued that the proposed PD-LFDAexhibits the best performance and serves as a very effectivedimensionality reduction tool for high dimensional datasuch as hyperspectral image (HSI) data

Appendix

Procedure of Proposed Algorithm

The brief description of the algorithm to perform the pro-posed PD-LFDA method is already presented in Section 3The details of the algorithm are provided in Algorithm 2

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work was supported by the Research of University ofMacau under Grants no MYRG205(Y1-L4)-FST11-TYY noMYRG187(Y1-L3)-FST11-TYY and no SRG010-FST11-TYYand by the National Natural Science Foundation of Chinaunder Grant no 61273244 This research project was alsosupported by the Science andTechnologyDevelopment Fund(FDCT) of Macau under Contract no 100-2012-A3

References

[1] W Li S Prasad Z Ye J E Fowler and M Cui ldquoLocality-preserving discriminant analysis for hyperspectral image clas-sification using local spatial informationrdquo in Proceedings ofthe 32nd IEEE International Geoscience and Remote SensingSymposium (IGARSS rsquo12) pp 4134ndash4137MunichGermany July2012

[2] HND LeM S Kim andD-H Kim ldquoComparison of singularvalue decomposition and principal component analysis appliedto hyperspectral imaging of biofilmrdquo in Proceedings of the IEEEPhotonics Conference (IPC rsquo12) pp 6ndash7 2012

[3] C K Chui and J Wang ldquoRandomized anisotropic transformfor nonlinear dimensionality reductionrdquo GEMmdashInternationalJournal on Geomathematics vol 1 no 1 pp 23ndash50 2010

[4] T V Bandos L Bruzzone and G Camps-Valls ldquoClassificationof hyperspectral images with regularized linear discriminantanalysisrdquo IEEE Transactions on Geoscience and Remote Sensingvol 47 no 3 pp 862ndash873 2009

[5] D Guangjun Z Yongsheng and J Song ldquoDimensionalityreduction of hyperspectral data based on ISOMAP algorithmrdquoin Proceedings of the 8th International Conference on ElectronicMeasurement and Instruments (ICEMI rsquo07) pp 3935ndash3938Xirsquoan China August 2007

[6] X Luo and M-F Jiang ldquoThe application of manifold learn-ing in dimensionality analysis for hyperspectral imageryrdquo inProceedings of the International Conference on Remote SensingEnvironment and Transportation Engineering (RSETE rsquo11) pp4572ndash4575 June 2011

[7] J Khodr and R Younes ldquoDimensionality reduction on hyper-spectral images a comparative review based on artificial datasrdquoin Proceedings of the 4th International Congress on Image andSignal Processing (CISP rsquo11) vol 4 pp 1875ndash1883 October 2011

Mathematical Problems in Engineering 17

[8] J Wen Z Tian H She and W Yan ldquoFeature extraction ofhyperspectral images based on preserving neighborhood dis-criminant embeddingrdquo in Proceedings of the 2nd InternationalConference on Image Analysis and Signal Processing (IASP rsquo10)pp 257ndash262 Zhejiang China April 2010

[9] Y-R Yeh S-Y Huang and Y-J Lee ldquoNonlinear dimensionreduction with kernel sliced inverse regressionrdquo IEEE Trans-actions on Knowledge and Data Engineering vol 21 no 11 pp1590ndash1603 2009

[10] J He L Zhang Q Wang and Z Li ldquoUsing diffusion geometriccoordinates for hyperspectral imagery representationrdquo IEEEGeoscience and Remote Sensing Letters vol 6 no 4 pp 767ndash7712009

[11] J Peng P Zhang and N Riedel ldquoDiscriminant learninganalysisrdquo IEEE Transactions on Systems Man and CyberneticsPart B Cybernetics vol 38 no 6 pp 1614ndash1625 2008

[12] F S Tsai and K L Chan ldquoDimensionality reduction techniquesfor data explorationrdquo in Proceedings of the 6th InternationalConference on Information Communications and Signal Process-ing pp 1ndash5 December 2007

[13] M D Farrell Jr and R M Mersereau ldquoOn the impact of PCAdimension reduction for hyperspectral detection of difficulttargetsrdquo IEEE Geoscience and Remote Sensing Letters vol 2 no2 pp 192ndash195 2005

[14] S Prasad and L M Bruce ldquoLimitations of principal com-ponents analysis for hyperspectral target recognitionrdquo IEEEGeoscience and Remote Sensing Letters vol 5 no 4 pp 625ndash629 2008

[15] J Yu Q Tian T Rui and T S Huang ldquoIntegrating discriminantand descriptive information for dimension reduction and clas-sificationrdquo IEEE Transactions on Circuits and Systems for VideoTechnology vol 17 no 3 pp 372ndash377 2007

[16] J Kong S Wang J Wang L Ma B Fu and Y Lu ldquoAnovel approach for face recognition based on supervisedlocality preserving projection and maximummargin criterionrdquoin Proceedings of the International Conference on ComputerEngineering and Technology (ICCET rsquo09) vol 1 pp 419ndash423Singapore January 2009

[17] M Loog and R P W Duin ldquoLinear dimensionality reductionvia a heteroscedastic extension of LDA the Chernoff criterionrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 26 no 6 pp 732ndash739 2004

[18] A C Jensen A Berge and A S Solberg ldquoRegressionapproaches to small sample inverse covariance matrix estima-tion for hyperspectral image classificationrdquo IEEE TransactionsonGeoscience andRemote Sensing vol 46 no 10 pp 2814ndash28222008

[19] J Jin BWang and L Zhang ldquoA novel approach based on Fisherdiscriminant null space for decomposition of mixed pixels inhyperspectral imageryrdquo IEEE Geoscience and Remote SensingLetters vol 7 no 4 pp 699ndash703 2010

[20] J Wen Z Tian X Liu and W Lin ldquoNeighborhood preservingorthogonal pnmf feature extraction for hyperspectral imageclassificationrdquo IEEE Journal of Selected Topics in Applied EarthObservations and Remote Sensing vol 6 no 2 pp 759ndash7682013

[21] Y Ren G Zhang G Yu and X Li ldquoLocal and global structurepreserving based feature selectionrdquoNeurocomputing vol 89 pp147ndash157 2012

[22] Z Fan Y Xu and D Zhang ldquoLocal linear discriminant analysisframework using sample neighborsrdquo IEEE Transactions onNeural Networks vol 22 no 7 pp 1119ndash1132 2011

[23] Y Wang S Huang D Liu and B Wang ldquoResearch advanceon band selection-based dimension reduction of hyperspectralremote sensing imagesrdquo in Proceedings of the 2nd InternationalConference on Remote Sensing Environment and TransportationEngineering (RSETE rsquo12) pp 1ndash4 IEEE Nanjing China June2012

[24] B Waske S van der Linden J A Benediktsson A Rabe andP Hostert ldquoSensitivity of support vector machines to randomfeature selection in classification of hyperspectral datardquo IEEETransactions on Geoscience and Remote Sensing vol 48 no 7pp 2880ndash2889 2010

[25] M Sugiyama ldquoLocal fisher discriminant analysis for superviseddimensionality reductionrdquo in Proceedings of the 23rd Interna-tional Conference onMachine Learning (ICML rsquo06) pp 905ndash912ACM June 2006

[26] Y Cheng ldquoMean shift mode seeking and clusteringrdquo IEEETransactions on Pattern Analysis and Machine Intelligence vol17 no 8 pp 790ndash799 1995

[27] W Li S Prasad J E Fowler and L M Bruce ldquoLocality-preserving dimensionality reduction and classification forhyperspectral image analysisrdquo IEEE Transactions on Geoscienceand Remote Sensing vol 50 no 4 pp 1185ndash1198 2012

[28] L Zelnik-Manor and P Perona ldquoSelf-tuning spectral cluster-ingrdquo in Proceedings of the 18th Annual Conference on NeuralInformation Processing Systems pp 1601ndash1608 December 2004

[29] W K Wong and H T Zhao ldquoSupervised optimal localitypreserving projectionrdquo Pattern Recognition vol 45 no 1 pp186ndash197 2012

[30] X He and P Niyogi ldquoLocality preserving projectionsrdquo inAdvances in Neural Information Processing Systems 16 SThrunL Saul and B Scholkopf Eds MIT Press Cambridge MassUSA 2004

[31] H Wang S Chen Z Hu and W Zheng ldquoLocality-preservedmaximum information projectionrdquo IEEE Transactions on Neu-ral Networks vol 19 no 4 pp 571ndash585 2008

[32] M Sugiyama ldquoDimensionality reduction ofmultimodal labeleddata by local fisher discriminant analysisrdquo Journal of MachineLearning Research vol 8 pp 1027ndash1061 2007

[33] S Yan D Xu B Zhang H J Zhang Q Yang and S Lin ldquoGraphembedding and extensions a general framework for dimen-sionality reductionrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 29 no 1 pp 40ndash51 2007

[34] C K Chui and J Wang ldquoDimensionality reduction of hyper-spectral imagery data for feature classificationrdquo in Handbookof Geomathematics pp 1005ndash1047 Springer Heidelberg Ger-many 2010

[35] C-C Chang and C-J Lin ldquoLIBSVM a Library for supportvector machinesrdquo ACM Transactions on Intelligent Systems andTechnology vol 2 no 3 article 27 2011

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 10: Research Article Subspace Learning via Local Probability ...downloads.hindawi.com/journals/mpe/2015/145136.pdf · and applicable toolkits were engendered one a er another. Hyperspectral

10 Mathematical Problems in Engineering

of which

(119894 119895) = 1 if (119894 119895) isin 119875(119896) (119897

119895) or (119894 119895) isin 119875(119896) (119897

119894)

0 otherwise(42)

Note that 119878119894and 119878

119890are corresponding to ldquowithin-scatter

matrixrdquo and ldquobetween-scatter matrixrdquo of the traditional LDAalternatively The optimal solution of MFA can be achievedby solving the following minimization problem That is

= arg119879

min 119879119879119883(119863 minus119882)119883

119879119879

119879119879119883(119863 minus )119883119879119879

(43)

We know that (43) is also a general eigenvalue decompositionproblem Let 119879PCA indicate the transformation matrix fromthe original space to PCA subspace with certain energyremaining and then the final projection of MFA is output as

119879MFA = 119879PCA (44)

As can be seenMFA constructs twoweightedmatrices119882and according to the intraclass compactness and interclassseparability In LFDA and PD-LFDA only one affinity isconstructed The difference lies in that the ldquoweightrdquo in LFDAand PD-LFDA is in the range of [0 1] according to the levelof difference Yet MFA distributes the same weight to its 119870-nearest neighborhoodsThe optimal solution ofMFA LFDAand PD-LFDA can be attributed to a general eigenvaluedecomposition problem Hence the idea of MFA LFDA andPD-LFDA is approximately similar in a certain interpretationRelationship with other methodologies can be analyzed in ananalogous way

4 Experimental Results

To illustrate the performance of PD-LFDA experimentson a real hyperspectral remote sensing image data setmdashAVIRIS Indian Pine 1992 are conducted in this sectionThe AVIRIS Indian Pines 1992 data set was gathered byNational Aeronautics and Space Administrator (NASA) withAirborne VisibleInfrared Imaging Spectrometer (AVIRIS)sensor over the Indian Pine test site in northwest Indians inJune 1992 This data set consists of 145 times 145 pixels and 224spectral reflectance bands ranging from 04 120583m to 245 120583mwith a spatial resolution of 20m The Indian Pines scene iscomposed of two-third agriculture and one-third forest orother natural perennial vegetation Some other landmarkssuch as dual lane highways rail line low density housing andsmaller roads are also in this image Since the scene was takenin June some main crops for example soybeans and cornare in their early growth stage with less than 5 coveragewhile the no-till min-till and clean-till indicate the amountof previous crop residue remaining The region map can bereferred to Figure 5(a) The 20 water absorption bands (ie[108ndash112 154ndash167] 224) were discarded

In this section performance of different dimensionreduction methods that is PCA LPP LFDA LDA JGLDAand RP [34] is compared with the proposed PD-LFDA

Table 1 Training set in AVIRIS Indian Pines 1992 database

ID Class Name1 Alfalfa (4618 3913)2 Corn-notill (1428136 952)3 Corn-mintill (83087 1048)4 Corn (23734 1434)5 Grass-pasture (48354 1118)6 Grass-trees (73071 973)7 Grass-pasture-mowed (2817 6071)8 Hay-windrowed (47850 1046)9 Oats (2015 7500)10 Soybean-notill (97286 884)11 Soybean-mintill (2455214 872)12 Soybean-clean (59354 911)13 Wheat (20528 1366)14 Woods (1265102 806)15 Buildings-Grass-Trees-Drives (38639 1010)16 Stone-Steel-Towers (9324 2581)Total 102491029 1004Numerical value in this table refers to number of samples number oftraining samples and pc respectively

Classification accuracy is reported via the concrete classifiersGenerally many dimension reduction research papers adopt119870-nearest neighborhood classifier (KNN) and support vectormachine (SVM) classifier to measure the performance of theextracting features after the dimension reduction where theoverall accuracy and kappa coefficient are detailed in thereports Hereby in this paper we also adopt KNN classifierand SVM classifier for performance measurement In KNNclassifier we select the value of 119870 as 1 5 and 9 so thatthree classifiers based on nearest neighborhoods are formedwhich are called 1NN 5NN and 9NN In SVM classifier weseek a hyperplane to separate classes in kernel-induced spacewhere the linear nonseparable classes in the original featurespace can be separated via kernel trick SVM as a robustand successful classifier has been widely used to evaluatethe performance of multifarious methods in many areas Forsimplicity and convenience we use LIBSVM package [35] forexperiments Accuracy of dimension reduced performancewill be reported by classified performance from SVM clas-sifier In the following schedule the feature subspace will becalculated at the first step from training samples by differentdimensional algorithms Table 1 gives a numerical statisticsof training samples corresponding to each class Then thenew sample will be projected into a low subspace by thetransformedmatrix Finally all the new samples are classifiedby SVM classifier

In this experiment a total of 1029 samples were selectedfor training and the remaining samples are used for testingNote that all the labeled samples in database are unbalancedand the available samples of each category differ dramaticallyThe following strategy is imposed for sample division A fixednumber of 15 samples are randomly selected to form thetraining sample yet the absent samples are randomly selected

Mathematical Problems in Engineering 11

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

(a) 1NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

(b) 5NN

7 9 11 13 15Reduced space

0

01

02

03

04

05

06

07

08

09

1

Ove

rall

accu

racy

(c) 9NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

(d) Liner SVM

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

PCALPPLFDA

LDAPD-LFDAJGLDA

RP

(e) Polynomial SVM

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

PCALPPLFDA

LDAPD-LFDAJGLDA

RP

(f) RBF-SVM

Figure 2 Overall accuracy by different dimension reduction methods and different classifiers applied to AVIRIS Indian Pines database

12 Mathematical Problems in Engineering

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

(a) 1NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

(b) 5NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

(c) 9NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

(d) Linear SVM

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

PCALPPLFDA

LDAPD-LFDAJGLDA

RP

(e) Polynomial SVM

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

PCALPPLFDA

LDAPD-LFDAJGLDA

RP

(f) RBF SVM

Figure 3 Kappa coefficient by different dimension reduction methods and different classifiers applied to AVIRIS Indian Pines database

Mathematical Problems in Engineering 13

(a) Pseudo 3-channel color image band = [12 79 140]

1

2

2

2

2

2 2

3

3

3

3

3

4

5

5

5

5

6

6 6 6

7

8

9

10

10

10 10

11

11

11

11

11 12

12 12

13 14

14

14 15

15

16

(b) Ground truth

(c) Distribution of training sample

Figure 4 Illustration of sample partition

from the remaining samples Under this strategy the trainingsamples and testing samples are listed in Table 1

Figure 2 shows the overall accuracy of different dimen-sion reduction methods applied to AVIRIS Indian 92AV3Cdata set The neighborhood of KNN classifier is selected as 15 and 9 respectively which produces three classifiers thatis 1NN 5NN and 9NN Three different kernel functionsare adopted for SVM classifier Then derived classifiers arealso used in this experiment that is linear SVM polynomialSVM and RBFSVM It can be deduced from Figures 2(a)sim2(c) that when the embedding space is greater than 5 pro-posed PD-LFDA performs the best while JGLDA performsthe worst The results produced by RP is slightly better thanJGLDA PCA LDA LPP and RP show the similar classifiedresults under KNN classifierThat is the proposed PD-LFDAoutperforms the others Meantime compared with LFDAthe proposed PD-LFDA leads to 2 more improvementson average Moreover it can be observed from (d) that theclassified accuracy increases steadily as the embedded spaceincreases However LDA demonstrates the highest overallaccuracywhen the reduced features vary to 9 while LFDAhas

the significant improvements when the number of reducedfeatures is greater than 9 This phenomenon of Figure 2(d)indicates the instability of linear SVM Nevertheless thesituation reversed for polynomial SVM and RBF SVM inFigures 2(e) and 2(f) wherein the proposed PD-LFDAwins a little improvement against LFDA and has significantimprovement compared with the others Inspired effects ofproposed PD-LFDA algorithm were achieved in all casesFurthermore Table 2 gives the detailed overall accuracyunder different feature dimensions using 3NN 7NN andRBFSVM classifiers which validates the feasibility of theproposed scheme in this paper

Figure 3 displays the results of kappa coefficient obtainedusing the different dimension reduction algorithms underKNN classifier and SVM classifiers The experimental cir-cumstance of Figure 3 is same as that of Figure 2 We canfind that from these results JGLDA performs the worst inmost cases except in Figure 3(e) The proposed PD-LFDAmethod outperforms the other methods and achieves thehighest kappa numerical value in most cases except usingthe linear SVM as the classifier In fact none of them work

14 Mathematical Problems in Engineering

686864717360

(a) PCA-7NN

687564707607

(b) LPP-7NN

792376538609

(c) LFDA-7NN

739070418130

(d) LDA-7NN

619556516209

(e) JGLDA-7NN

670962947164

(f) RP-7NN

799277288570

(g) PCA-RBFSVM

767473558288

(h) LPP-RBFSVM

837581518822

(i) LFDA-RBFSVM

728769288071

(j) LDA-RBFSVM

582453777121

(k) JGLDA-RBFSVM

763173278365

(l) RP-RBFSVM

Figure 5 Classified map generated by different dimension reduction methods where the overall accuracy kappa coefficient and averageclassification accuracy are listed at the top of each map respectively

steady in the case of linear SVM (Figure 3(d)) Note that thesituation is improved in polynomial SVM where the kappanumerical value of the proposed PD-LFDA is significantlybetter than the others All the achievements demonstrate therobustness of our contribution in PD-LFDA Simultaneouslyit is noticeable that LPP exhibits an average kappa levelThe kappa value gained by LPP is not seriously bad and isnot dramatically good The kappa results produced by RPare approximately the same as LPP A significant advantageof RP is the simple construction and computation wherethe accuracy is closed to LPP More details are summarizedin Table 3 It can be concluded that the kappa coefficientof proposed algorithm is higher than the other approacheswhich is more appropriate for the classification of HSI data

The visual results of all methods are presented in Figures5sim6 where the class labels are converted to pseudocolor

image The pseudocolor image of the hyperspectral imagefrom Indian 92AV3C database is shown in Figure 4(a) Theavailable labeled image which represents the truth groundis illustrated in Figure 4(b) where the labels are made byhuman The training samples are selected from the labeledimage represented as points in the image as shown inFigure 4(c) Each label number (ID) corresponds to eachclass name which is indexed in Table 1 In this experimentall the available labeled samples are used for testing whileapproximate 10 of samples are used for training Thesubspace is fixed to 13 (the number here is only used forreference it can be changed) For each experiment thedimension from original feature space is reduced to theobjective dimensionality thereafter the classified maps areinduced by 7NN classifier and RBF-SVM classifier Theoverall accuracy kappa coefficient and average accuracy are

Mathematical Problems in Engineering 15

837981698991

(a) PD-LFDA-7NN

848682798968

(b) D-LFDA-RBFSVM

Figure 6 Classified map of the proposed method

Table 2 Overall accuracy obtained using different dimensions offeature and different classifiers for Indian Pine Scene database ()

Item Dims7 9 11 13 15

3NNPCA 7007 6996 7026 7035 7087LPP 6575 6871 6835 6695 6546LFDA 7513 7895 7963 7992 8009LDA 6415 6606 6616 6596 6536PD-LFDA 7713 8094 816 8218 8263JGLDA 5405 5602 5704 5642 5751RP 6173 6444 6745 6632 6652

7NNPCA 6902 6902 691 6933 6972LPP 6782 7057 7027 6892 6681LFDA 7419 7777 7827 7863 7862LDA 6615 6909 6899 692 6882PD-LFDA 7736 809 8155 8172 8235JGLDA 5665 5811 5809 5812 5914RP 6194 6481 6696 6613 6603

RBF-SVMPCA 8051 7794 7718 7935 8176LPP 7101 7514 7614 7501 7314LFDA 7826 8187 82 8113 7915LDA 677 6834 6877 6642 6792PD-LFDA 7858 8267 8395 8423 8166JGLDA 5686 5606 5663 5689 5879RP 6957 7282 7441 7416 7543

also included at the top of each map respectively Figure 5displays the classifiedmaps of classicmethods in pseudocolorimages and the classified maps of proposed PD-LFDA arepresented in Figure 6 It can be observed from these mapsthat the best performance is achieved by PD-LFDA when7NN classifier is used in this case the overall accuracyis 8379 the kappa coefficient is 8169 and the averageaccuracy is 8991 Moreover the worst algorithm is JGLDAwhose overall accuracy is 6195 the kappa coefficient

Table 3 Kappa coefficient by different dimension reduction meth-ods and different classifiers applied to Indian Pine Scene database()

Item Dims7 9 11 13 15

3NNPCA 6592 658 6612 6624 6681LPP 6084 6422 6382 6219 605LFDA 7145 7586 7664 7697 7718LDA 5913 6126 6137 6118 6048PD-LFDA 7392 7824 7901 7966 8016JGLDA 4778 4992 5082 5006 5119RP 564 5963 6303 6172 6192

7NNPCA 6466 6466 6475 6501 6545LPP 6304 6629 6585 6429 6185LFDA 7047 7457 7515 7557 7557LDA 6132 646 6453 6478 6431PD-LFDA 7417 782 7893 7913 7985JGLDA 5029 519 5161 5159 5263RP 5638 5992 6237 6135 6124

RBF-SVMPCA 7767 747 7393 7636 7900LPP 6677 715 7264 7115 6921LFDA 7486 7915 7924 7822 7584LDA 629 6362 6427 6153 6323PD-LFDA 7551 8011 8159 8187 7899JGLDA 5109 5066 5133 5147 5369RP 653 6908 7085 706 7203

becomes 5651 and the average accuracy is only 6209Other methods such as PCA LPP and RP can produce thecomparable results and no one can outperform the otherHowever LDAoutperformsPCA LPP andRP yielding betterresult Similar conclusions can be achieved in the groupof RBF-SVM Generally proposed PD-LFDA significantlyoutperforms the rest in this experiment which indicates thecorrectness of improvements in proposed PD-LFDA

16 Mathematical Problems in Engineering

Table 4 Performance of dimension reduction on the whole labeled samples ()

Evaluated item MethodologiesPCA LDA LPP LFDA JGLDA RP PD-LFDA

7NNOverall accuracy 6868 739 6875 7923 6195 6709 8379Kappa coefficient 6471 7041 647 7653 5651 6294 8169Average accuracy 737 813 7607 8609 6209 7164 8991

RBF-SVMOverall accuracy 7992 7287 7674 8375 5824 7631 8486Kappa coefficient 7728 6928 7355 8151 5377 7327 8279Average accuracy 857 8071 8288 8822 7121 8365 8968

Finally details of assessment deduced by 7NN and RBF-SVM is summarized in Table 4 where the correspondingoverall accuracy kappa coefficient and average accuracy ofdifferent methods can be located collectively

5 Conclusions

In this paper we have analyzed local Fisher discriminantanalysis (LFDA) and found its weakness By replacing themaximum distance with local variance for the constructionof weight matrix introducing class prior probability into thecomputation of affinitymatrix an improved LFDA algorithmhas been proposed This novel approach is called PD-LFDAbecause the probability distribution (PD) is applied in LFDAalgorithm The proposed approach essentially can increasethe discriminant ability of transformed features in lowdimensional space The pattern found by the new approachis expected to be more accurate and coincides with thecharacter of HSI data and is conducive to classify HSI dataPD-LFDA has been evaluated on a real removing sensingAVIRIS Indian Pine 92AV3C data set We have compared theperformance of the proposed PD-LFDA with that of PCALPP LFDA LDA JGLDA and RP Both the numerical resultsand visual inspection of class maps have been obtained Inthe experiments KNN classifier and SVM classifier havebeen used We have argued that the proposed PD-LFDAexhibits the best performance and serves as a very effectivedimensionality reduction tool for high dimensional datasuch as hyperspectral image (HSI) data

Appendix

Procedure of Proposed Algorithm

The brief description of the algorithm to perform the pro-posed PD-LFDA method is already presented in Section 3The details of the algorithm are provided in Algorithm 2

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work was supported by the Research of University ofMacau under Grants no MYRG205(Y1-L4)-FST11-TYY noMYRG187(Y1-L3)-FST11-TYY and no SRG010-FST11-TYYand by the National Natural Science Foundation of Chinaunder Grant no 61273244 This research project was alsosupported by the Science andTechnologyDevelopment Fund(FDCT) of Macau under Contract no 100-2012-A3

References

[1] W Li S Prasad Z Ye J E Fowler and M Cui ldquoLocality-preserving discriminant analysis for hyperspectral image clas-sification using local spatial informationrdquo in Proceedings ofthe 32nd IEEE International Geoscience and Remote SensingSymposium (IGARSS rsquo12) pp 4134ndash4137MunichGermany July2012

[2] HND LeM S Kim andD-H Kim ldquoComparison of singularvalue decomposition and principal component analysis appliedto hyperspectral imaging of biofilmrdquo in Proceedings of the IEEEPhotonics Conference (IPC rsquo12) pp 6ndash7 2012

[3] C K Chui and J Wang ldquoRandomized anisotropic transformfor nonlinear dimensionality reductionrdquo GEMmdashInternationalJournal on Geomathematics vol 1 no 1 pp 23ndash50 2010

[4] T V Bandos L Bruzzone and G Camps-Valls ldquoClassificationof hyperspectral images with regularized linear discriminantanalysisrdquo IEEE Transactions on Geoscience and Remote Sensingvol 47 no 3 pp 862ndash873 2009

[5] D Guangjun Z Yongsheng and J Song ldquoDimensionalityreduction of hyperspectral data based on ISOMAP algorithmrdquoin Proceedings of the 8th International Conference on ElectronicMeasurement and Instruments (ICEMI rsquo07) pp 3935ndash3938Xirsquoan China August 2007

[6] X Luo and M-F Jiang ldquoThe application of manifold learn-ing in dimensionality analysis for hyperspectral imageryrdquo inProceedings of the International Conference on Remote SensingEnvironment and Transportation Engineering (RSETE rsquo11) pp4572ndash4575 June 2011

[7] J Khodr and R Younes ldquoDimensionality reduction on hyper-spectral images a comparative review based on artificial datasrdquoin Proceedings of the 4th International Congress on Image andSignal Processing (CISP rsquo11) vol 4 pp 1875ndash1883 October 2011

Mathematical Problems in Engineering 17

[8] J Wen Z Tian H She and W Yan ldquoFeature extraction ofhyperspectral images based on preserving neighborhood dis-criminant embeddingrdquo in Proceedings of the 2nd InternationalConference on Image Analysis and Signal Processing (IASP rsquo10)pp 257ndash262 Zhejiang China April 2010

[9] Y-R Yeh S-Y Huang and Y-J Lee ldquoNonlinear dimensionreduction with kernel sliced inverse regressionrdquo IEEE Trans-actions on Knowledge and Data Engineering vol 21 no 11 pp1590ndash1603 2009

[10] J He L Zhang Q Wang and Z Li ldquoUsing diffusion geometriccoordinates for hyperspectral imagery representationrdquo IEEEGeoscience and Remote Sensing Letters vol 6 no 4 pp 767ndash7712009

[11] J Peng P Zhang and N Riedel ldquoDiscriminant learninganalysisrdquo IEEE Transactions on Systems Man and CyberneticsPart B Cybernetics vol 38 no 6 pp 1614ndash1625 2008

[12] F S Tsai and K L Chan ldquoDimensionality reduction techniquesfor data explorationrdquo in Proceedings of the 6th InternationalConference on Information Communications and Signal Process-ing pp 1ndash5 December 2007

[13] M D Farrell Jr and R M Mersereau ldquoOn the impact of PCAdimension reduction for hyperspectral detection of difficulttargetsrdquo IEEE Geoscience and Remote Sensing Letters vol 2 no2 pp 192ndash195 2005

[14] S Prasad and L M Bruce ldquoLimitations of principal com-ponents analysis for hyperspectral target recognitionrdquo IEEEGeoscience and Remote Sensing Letters vol 5 no 4 pp 625ndash629 2008

[15] J Yu Q Tian T Rui and T S Huang ldquoIntegrating discriminantand descriptive information for dimension reduction and clas-sificationrdquo IEEE Transactions on Circuits and Systems for VideoTechnology vol 17 no 3 pp 372ndash377 2007

[16] J Kong S Wang J Wang L Ma B Fu and Y Lu ldquoAnovel approach for face recognition based on supervisedlocality preserving projection and maximummargin criterionrdquoin Proceedings of the International Conference on ComputerEngineering and Technology (ICCET rsquo09) vol 1 pp 419ndash423Singapore January 2009

[17] M Loog and R P W Duin ldquoLinear dimensionality reductionvia a heteroscedastic extension of LDA the Chernoff criterionrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 26 no 6 pp 732ndash739 2004

[18] A C Jensen A Berge and A S Solberg ldquoRegressionapproaches to small sample inverse covariance matrix estima-tion for hyperspectral image classificationrdquo IEEE TransactionsonGeoscience andRemote Sensing vol 46 no 10 pp 2814ndash28222008

[19] J Jin BWang and L Zhang ldquoA novel approach based on Fisherdiscriminant null space for decomposition of mixed pixels inhyperspectral imageryrdquo IEEE Geoscience and Remote SensingLetters vol 7 no 4 pp 699ndash703 2010

[20] J Wen Z Tian X Liu and W Lin ldquoNeighborhood preservingorthogonal pnmf feature extraction for hyperspectral imageclassificationrdquo IEEE Journal of Selected Topics in Applied EarthObservations and Remote Sensing vol 6 no 2 pp 759ndash7682013

[21] Y Ren G Zhang G Yu and X Li ldquoLocal and global structurepreserving based feature selectionrdquoNeurocomputing vol 89 pp147ndash157 2012

[22] Z Fan Y Xu and D Zhang ldquoLocal linear discriminant analysisframework using sample neighborsrdquo IEEE Transactions onNeural Networks vol 22 no 7 pp 1119ndash1132 2011

[23] Y Wang S Huang D Liu and B Wang ldquoResearch advanceon band selection-based dimension reduction of hyperspectralremote sensing imagesrdquo in Proceedings of the 2nd InternationalConference on Remote Sensing Environment and TransportationEngineering (RSETE rsquo12) pp 1ndash4 IEEE Nanjing China June2012

[24] B Waske S van der Linden J A Benediktsson A Rabe andP Hostert ldquoSensitivity of support vector machines to randomfeature selection in classification of hyperspectral datardquo IEEETransactions on Geoscience and Remote Sensing vol 48 no 7pp 2880ndash2889 2010

[25] M Sugiyama ldquoLocal fisher discriminant analysis for superviseddimensionality reductionrdquo in Proceedings of the 23rd Interna-tional Conference onMachine Learning (ICML rsquo06) pp 905ndash912ACM June 2006

[26] Y Cheng ldquoMean shift mode seeking and clusteringrdquo IEEETransactions on Pattern Analysis and Machine Intelligence vol17 no 8 pp 790ndash799 1995

[27] W Li S Prasad J E Fowler and L M Bruce ldquoLocality-preserving dimensionality reduction and classification forhyperspectral image analysisrdquo IEEE Transactions on Geoscienceand Remote Sensing vol 50 no 4 pp 1185ndash1198 2012

[28] L Zelnik-Manor and P Perona ldquoSelf-tuning spectral cluster-ingrdquo in Proceedings of the 18th Annual Conference on NeuralInformation Processing Systems pp 1601ndash1608 December 2004

[29] W K Wong and H T Zhao ldquoSupervised optimal localitypreserving projectionrdquo Pattern Recognition vol 45 no 1 pp186ndash197 2012

[30] X He and P Niyogi ldquoLocality preserving projectionsrdquo inAdvances in Neural Information Processing Systems 16 SThrunL Saul and B Scholkopf Eds MIT Press Cambridge MassUSA 2004

[31] H Wang S Chen Z Hu and W Zheng ldquoLocality-preservedmaximum information projectionrdquo IEEE Transactions on Neu-ral Networks vol 19 no 4 pp 571ndash585 2008

[32] M Sugiyama ldquoDimensionality reduction ofmultimodal labeleddata by local fisher discriminant analysisrdquo Journal of MachineLearning Research vol 8 pp 1027ndash1061 2007

[33] S Yan D Xu B Zhang H J Zhang Q Yang and S Lin ldquoGraphembedding and extensions a general framework for dimen-sionality reductionrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 29 no 1 pp 40ndash51 2007

[34] C K Chui and J Wang ldquoDimensionality reduction of hyper-spectral imagery data for feature classificationrdquo in Handbookof Geomathematics pp 1005ndash1047 Springer Heidelberg Ger-many 2010

[35] C-C Chang and C-J Lin ldquoLIBSVM a Library for supportvector machinesrdquo ACM Transactions on Intelligent Systems andTechnology vol 2 no 3 article 27 2011

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 11: Research Article Subspace Learning via Local Probability ...downloads.hindawi.com/journals/mpe/2015/145136.pdf · and applicable toolkits were engendered one a er another. Hyperspectral

Mathematical Problems in Engineering 11

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

(a) 1NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

(b) 5NN

7 9 11 13 15Reduced space

0

01

02

03

04

05

06

07

08

09

1

Ove

rall

accu

racy

(c) 9NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

(d) Liner SVM

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

PCALPPLFDA

LDAPD-LFDAJGLDA

RP

(e) Polynomial SVM

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Ove

rall

accu

racy

PCALPPLFDA

LDAPD-LFDAJGLDA

RP

(f) RBF-SVM

Figure 2 Overall accuracy by different dimension reduction methods and different classifiers applied to AVIRIS Indian Pines database

12 Mathematical Problems in Engineering

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

(a) 1NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

(b) 5NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

(c) 9NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

(d) Linear SVM

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

PCALPPLFDA

LDAPD-LFDAJGLDA

RP

(e) Polynomial SVM

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

PCALPPLFDA

LDAPD-LFDAJGLDA

RP

(f) RBF SVM

Figure 3 Kappa coefficient by different dimension reduction methods and different classifiers applied to AVIRIS Indian Pines database

Mathematical Problems in Engineering 13

(a) Pseudo 3-channel color image band = [12 79 140]

1

2

2

2

2

2 2

3

3

3

3

3

4

5

5

5

5

6

6 6 6

7

8

9

10

10

10 10

11

11

11

11

11 12

12 12

13 14

14

14 15

15

16

(b) Ground truth

(c) Distribution of training sample

Figure 4 Illustration of sample partition

from the remaining samples Under this strategy the trainingsamples and testing samples are listed in Table 1

Figure 2 shows the overall accuracy of different dimen-sion reduction methods applied to AVIRIS Indian 92AV3Cdata set The neighborhood of KNN classifier is selected as 15 and 9 respectively which produces three classifiers thatis 1NN 5NN and 9NN Three different kernel functionsare adopted for SVM classifier Then derived classifiers arealso used in this experiment that is linear SVM polynomialSVM and RBFSVM It can be deduced from Figures 2(a)sim2(c) that when the embedding space is greater than 5 pro-posed PD-LFDA performs the best while JGLDA performsthe worst The results produced by RP is slightly better thanJGLDA PCA LDA LPP and RP show the similar classifiedresults under KNN classifierThat is the proposed PD-LFDAoutperforms the others Meantime compared with LFDAthe proposed PD-LFDA leads to 2 more improvementson average Moreover it can be observed from (d) that theclassified accuracy increases steadily as the embedded spaceincreases However LDA demonstrates the highest overallaccuracywhen the reduced features vary to 9 while LFDAhas

the significant improvements when the number of reducedfeatures is greater than 9 This phenomenon of Figure 2(d)indicates the instability of linear SVM Nevertheless thesituation reversed for polynomial SVM and RBF SVM inFigures 2(e) and 2(f) wherein the proposed PD-LFDAwins a little improvement against LFDA and has significantimprovement compared with the others Inspired effects ofproposed PD-LFDA algorithm were achieved in all casesFurthermore Table 2 gives the detailed overall accuracyunder different feature dimensions using 3NN 7NN andRBFSVM classifiers which validates the feasibility of theproposed scheme in this paper

Figure 3 displays the results of kappa coefficient obtainedusing the different dimension reduction algorithms underKNN classifier and SVM classifiers The experimental cir-cumstance of Figure 3 is same as that of Figure 2 We canfind that from these results JGLDA performs the worst inmost cases except in Figure 3(e) The proposed PD-LFDAmethod outperforms the other methods and achieves thehighest kappa numerical value in most cases except usingthe linear SVM as the classifier In fact none of them work

14 Mathematical Problems in Engineering

686864717360

(a) PCA-7NN

687564707607

(b) LPP-7NN

792376538609

(c) LFDA-7NN

739070418130

(d) LDA-7NN

619556516209

(e) JGLDA-7NN

670962947164

(f) RP-7NN

799277288570

(g) PCA-RBFSVM

767473558288

(h) LPP-RBFSVM

837581518822

(i) LFDA-RBFSVM

728769288071

(j) LDA-RBFSVM

582453777121

(k) JGLDA-RBFSVM

763173278365

(l) RP-RBFSVM

Figure 5 Classified map generated by different dimension reduction methods where the overall accuracy kappa coefficient and averageclassification accuracy are listed at the top of each map respectively

steady in the case of linear SVM (Figure 3(d)) Note that thesituation is improved in polynomial SVM where the kappanumerical value of the proposed PD-LFDA is significantlybetter than the others All the achievements demonstrate therobustness of our contribution in PD-LFDA Simultaneouslyit is noticeable that LPP exhibits an average kappa levelThe kappa value gained by LPP is not seriously bad and isnot dramatically good The kappa results produced by RPare approximately the same as LPP A significant advantageof RP is the simple construction and computation wherethe accuracy is closed to LPP More details are summarizedin Table 3 It can be concluded that the kappa coefficientof proposed algorithm is higher than the other approacheswhich is more appropriate for the classification of HSI data

The visual results of all methods are presented in Figures5sim6 where the class labels are converted to pseudocolor

image The pseudocolor image of the hyperspectral imagefrom Indian 92AV3C database is shown in Figure 4(a) Theavailable labeled image which represents the truth groundis illustrated in Figure 4(b) where the labels are made byhuman The training samples are selected from the labeledimage represented as points in the image as shown inFigure 4(c) Each label number (ID) corresponds to eachclass name which is indexed in Table 1 In this experimentall the available labeled samples are used for testing whileapproximate 10 of samples are used for training Thesubspace is fixed to 13 (the number here is only used forreference it can be changed) For each experiment thedimension from original feature space is reduced to theobjective dimensionality thereafter the classified maps areinduced by 7NN classifier and RBF-SVM classifier Theoverall accuracy kappa coefficient and average accuracy are

Mathematical Problems in Engineering 15

837981698991

(a) PD-LFDA-7NN

848682798968

(b) D-LFDA-RBFSVM

Figure 6 Classified map of the proposed method

Table 2 Overall accuracy obtained using different dimensions offeature and different classifiers for Indian Pine Scene database ()

Item Dims7 9 11 13 15

3NNPCA 7007 6996 7026 7035 7087LPP 6575 6871 6835 6695 6546LFDA 7513 7895 7963 7992 8009LDA 6415 6606 6616 6596 6536PD-LFDA 7713 8094 816 8218 8263JGLDA 5405 5602 5704 5642 5751RP 6173 6444 6745 6632 6652

7NNPCA 6902 6902 691 6933 6972LPP 6782 7057 7027 6892 6681LFDA 7419 7777 7827 7863 7862LDA 6615 6909 6899 692 6882PD-LFDA 7736 809 8155 8172 8235JGLDA 5665 5811 5809 5812 5914RP 6194 6481 6696 6613 6603

RBF-SVMPCA 8051 7794 7718 7935 8176LPP 7101 7514 7614 7501 7314LFDA 7826 8187 82 8113 7915LDA 677 6834 6877 6642 6792PD-LFDA 7858 8267 8395 8423 8166JGLDA 5686 5606 5663 5689 5879RP 6957 7282 7441 7416 7543

also included at the top of each map respectively Figure 5displays the classifiedmaps of classicmethods in pseudocolorimages and the classified maps of proposed PD-LFDA arepresented in Figure 6 It can be observed from these mapsthat the best performance is achieved by PD-LFDA when7NN classifier is used in this case the overall accuracyis 8379 the kappa coefficient is 8169 and the averageaccuracy is 8991 Moreover the worst algorithm is JGLDAwhose overall accuracy is 6195 the kappa coefficient

Table 3 Kappa coefficient by different dimension reduction meth-ods and different classifiers applied to Indian Pine Scene database()

Item Dims7 9 11 13 15

3NNPCA 6592 658 6612 6624 6681LPP 6084 6422 6382 6219 605LFDA 7145 7586 7664 7697 7718LDA 5913 6126 6137 6118 6048PD-LFDA 7392 7824 7901 7966 8016JGLDA 4778 4992 5082 5006 5119RP 564 5963 6303 6172 6192

7NNPCA 6466 6466 6475 6501 6545LPP 6304 6629 6585 6429 6185LFDA 7047 7457 7515 7557 7557LDA 6132 646 6453 6478 6431PD-LFDA 7417 782 7893 7913 7985JGLDA 5029 519 5161 5159 5263RP 5638 5992 6237 6135 6124

RBF-SVMPCA 7767 747 7393 7636 7900LPP 6677 715 7264 7115 6921LFDA 7486 7915 7924 7822 7584LDA 629 6362 6427 6153 6323PD-LFDA 7551 8011 8159 8187 7899JGLDA 5109 5066 5133 5147 5369RP 653 6908 7085 706 7203

becomes 5651 and the average accuracy is only 6209Other methods such as PCA LPP and RP can produce thecomparable results and no one can outperform the otherHowever LDAoutperformsPCA LPP andRP yielding betterresult Similar conclusions can be achieved in the groupof RBF-SVM Generally proposed PD-LFDA significantlyoutperforms the rest in this experiment which indicates thecorrectness of improvements in proposed PD-LFDA

16 Mathematical Problems in Engineering

Table 4 Performance of dimension reduction on the whole labeled samples ()

Evaluated item MethodologiesPCA LDA LPP LFDA JGLDA RP PD-LFDA

7NNOverall accuracy 6868 739 6875 7923 6195 6709 8379Kappa coefficient 6471 7041 647 7653 5651 6294 8169Average accuracy 737 813 7607 8609 6209 7164 8991

RBF-SVMOverall accuracy 7992 7287 7674 8375 5824 7631 8486Kappa coefficient 7728 6928 7355 8151 5377 7327 8279Average accuracy 857 8071 8288 8822 7121 8365 8968

Finally details of assessment deduced by 7NN and RBF-SVM is summarized in Table 4 where the correspondingoverall accuracy kappa coefficient and average accuracy ofdifferent methods can be located collectively

5 Conclusions

In this paper we have analyzed local Fisher discriminantanalysis (LFDA) and found its weakness By replacing themaximum distance with local variance for the constructionof weight matrix introducing class prior probability into thecomputation of affinitymatrix an improved LFDA algorithmhas been proposed This novel approach is called PD-LFDAbecause the probability distribution (PD) is applied in LFDAalgorithm The proposed approach essentially can increasethe discriminant ability of transformed features in lowdimensional space The pattern found by the new approachis expected to be more accurate and coincides with thecharacter of HSI data and is conducive to classify HSI dataPD-LFDA has been evaluated on a real removing sensingAVIRIS Indian Pine 92AV3C data set We have compared theperformance of the proposed PD-LFDA with that of PCALPP LFDA LDA JGLDA and RP Both the numerical resultsand visual inspection of class maps have been obtained Inthe experiments KNN classifier and SVM classifier havebeen used We have argued that the proposed PD-LFDAexhibits the best performance and serves as a very effectivedimensionality reduction tool for high dimensional datasuch as hyperspectral image (HSI) data

Appendix

Procedure of Proposed Algorithm

The brief description of the algorithm to perform the pro-posed PD-LFDA method is already presented in Section 3The details of the algorithm are provided in Algorithm 2

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work was supported by the Research of University ofMacau under Grants no MYRG205(Y1-L4)-FST11-TYY noMYRG187(Y1-L3)-FST11-TYY and no SRG010-FST11-TYYand by the National Natural Science Foundation of Chinaunder Grant no 61273244 This research project was alsosupported by the Science andTechnologyDevelopment Fund(FDCT) of Macau under Contract no 100-2012-A3

References

[1] W Li S Prasad Z Ye J E Fowler and M Cui ldquoLocality-preserving discriminant analysis for hyperspectral image clas-sification using local spatial informationrdquo in Proceedings ofthe 32nd IEEE International Geoscience and Remote SensingSymposium (IGARSS rsquo12) pp 4134ndash4137MunichGermany July2012

[2] HND LeM S Kim andD-H Kim ldquoComparison of singularvalue decomposition and principal component analysis appliedto hyperspectral imaging of biofilmrdquo in Proceedings of the IEEEPhotonics Conference (IPC rsquo12) pp 6ndash7 2012

[3] C K Chui and J Wang ldquoRandomized anisotropic transformfor nonlinear dimensionality reductionrdquo GEMmdashInternationalJournal on Geomathematics vol 1 no 1 pp 23ndash50 2010

[4] T V Bandos L Bruzzone and G Camps-Valls ldquoClassificationof hyperspectral images with regularized linear discriminantanalysisrdquo IEEE Transactions on Geoscience and Remote Sensingvol 47 no 3 pp 862ndash873 2009

[5] D Guangjun Z Yongsheng and J Song ldquoDimensionalityreduction of hyperspectral data based on ISOMAP algorithmrdquoin Proceedings of the 8th International Conference on ElectronicMeasurement and Instruments (ICEMI rsquo07) pp 3935ndash3938Xirsquoan China August 2007

[6] X Luo and M-F Jiang ldquoThe application of manifold learn-ing in dimensionality analysis for hyperspectral imageryrdquo inProceedings of the International Conference on Remote SensingEnvironment and Transportation Engineering (RSETE rsquo11) pp4572ndash4575 June 2011

[7] J Khodr and R Younes ldquoDimensionality reduction on hyper-spectral images a comparative review based on artificial datasrdquoin Proceedings of the 4th International Congress on Image andSignal Processing (CISP rsquo11) vol 4 pp 1875ndash1883 October 2011

Mathematical Problems in Engineering 17

[8] J Wen Z Tian H She and W Yan ldquoFeature extraction ofhyperspectral images based on preserving neighborhood dis-criminant embeddingrdquo in Proceedings of the 2nd InternationalConference on Image Analysis and Signal Processing (IASP rsquo10)pp 257ndash262 Zhejiang China April 2010

[9] Y-R Yeh S-Y Huang and Y-J Lee ldquoNonlinear dimensionreduction with kernel sliced inverse regressionrdquo IEEE Trans-actions on Knowledge and Data Engineering vol 21 no 11 pp1590ndash1603 2009

[10] J He L Zhang Q Wang and Z Li ldquoUsing diffusion geometriccoordinates for hyperspectral imagery representationrdquo IEEEGeoscience and Remote Sensing Letters vol 6 no 4 pp 767ndash7712009

[11] J Peng P Zhang and N Riedel ldquoDiscriminant learninganalysisrdquo IEEE Transactions on Systems Man and CyberneticsPart B Cybernetics vol 38 no 6 pp 1614ndash1625 2008

[12] F S Tsai and K L Chan ldquoDimensionality reduction techniquesfor data explorationrdquo in Proceedings of the 6th InternationalConference on Information Communications and Signal Process-ing pp 1ndash5 December 2007

[13] M D Farrell Jr and R M Mersereau ldquoOn the impact of PCAdimension reduction for hyperspectral detection of difficulttargetsrdquo IEEE Geoscience and Remote Sensing Letters vol 2 no2 pp 192ndash195 2005

[14] S Prasad and L M Bruce ldquoLimitations of principal com-ponents analysis for hyperspectral target recognitionrdquo IEEEGeoscience and Remote Sensing Letters vol 5 no 4 pp 625ndash629 2008

[15] J Yu Q Tian T Rui and T S Huang ldquoIntegrating discriminantand descriptive information for dimension reduction and clas-sificationrdquo IEEE Transactions on Circuits and Systems for VideoTechnology vol 17 no 3 pp 372ndash377 2007

[16] J Kong S Wang J Wang L Ma B Fu and Y Lu ldquoAnovel approach for face recognition based on supervisedlocality preserving projection and maximummargin criterionrdquoin Proceedings of the International Conference on ComputerEngineering and Technology (ICCET rsquo09) vol 1 pp 419ndash423Singapore January 2009

[17] M Loog and R P W Duin ldquoLinear dimensionality reductionvia a heteroscedastic extension of LDA the Chernoff criterionrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 26 no 6 pp 732ndash739 2004

[18] A C Jensen A Berge and A S Solberg ldquoRegressionapproaches to small sample inverse covariance matrix estima-tion for hyperspectral image classificationrdquo IEEE TransactionsonGeoscience andRemote Sensing vol 46 no 10 pp 2814ndash28222008

[19] J Jin BWang and L Zhang ldquoA novel approach based on Fisherdiscriminant null space for decomposition of mixed pixels inhyperspectral imageryrdquo IEEE Geoscience and Remote SensingLetters vol 7 no 4 pp 699ndash703 2010

[20] J Wen Z Tian X Liu and W Lin ldquoNeighborhood preservingorthogonal pnmf feature extraction for hyperspectral imageclassificationrdquo IEEE Journal of Selected Topics in Applied EarthObservations and Remote Sensing vol 6 no 2 pp 759ndash7682013

[21] Y Ren G Zhang G Yu and X Li ldquoLocal and global structurepreserving based feature selectionrdquoNeurocomputing vol 89 pp147ndash157 2012

[22] Z Fan Y Xu and D Zhang ldquoLocal linear discriminant analysisframework using sample neighborsrdquo IEEE Transactions onNeural Networks vol 22 no 7 pp 1119ndash1132 2011

[23] Y Wang S Huang D Liu and B Wang ldquoResearch advanceon band selection-based dimension reduction of hyperspectralremote sensing imagesrdquo in Proceedings of the 2nd InternationalConference on Remote Sensing Environment and TransportationEngineering (RSETE rsquo12) pp 1ndash4 IEEE Nanjing China June2012

[24] B Waske S van der Linden J A Benediktsson A Rabe andP Hostert ldquoSensitivity of support vector machines to randomfeature selection in classification of hyperspectral datardquo IEEETransactions on Geoscience and Remote Sensing vol 48 no 7pp 2880ndash2889 2010

[25] M Sugiyama ldquoLocal fisher discriminant analysis for superviseddimensionality reductionrdquo in Proceedings of the 23rd Interna-tional Conference onMachine Learning (ICML rsquo06) pp 905ndash912ACM June 2006

[26] Y Cheng ldquoMean shift mode seeking and clusteringrdquo IEEETransactions on Pattern Analysis and Machine Intelligence vol17 no 8 pp 790ndash799 1995

[27] W Li S Prasad J E Fowler and L M Bruce ldquoLocality-preserving dimensionality reduction and classification forhyperspectral image analysisrdquo IEEE Transactions on Geoscienceand Remote Sensing vol 50 no 4 pp 1185ndash1198 2012

[28] L Zelnik-Manor and P Perona ldquoSelf-tuning spectral cluster-ingrdquo in Proceedings of the 18th Annual Conference on NeuralInformation Processing Systems pp 1601ndash1608 December 2004

[29] W K Wong and H T Zhao ldquoSupervised optimal localitypreserving projectionrdquo Pattern Recognition vol 45 no 1 pp186ndash197 2012

[30] X He and P Niyogi ldquoLocality preserving projectionsrdquo inAdvances in Neural Information Processing Systems 16 SThrunL Saul and B Scholkopf Eds MIT Press Cambridge MassUSA 2004

[31] H Wang S Chen Z Hu and W Zheng ldquoLocality-preservedmaximum information projectionrdquo IEEE Transactions on Neu-ral Networks vol 19 no 4 pp 571ndash585 2008

[32] M Sugiyama ldquoDimensionality reduction ofmultimodal labeleddata by local fisher discriminant analysisrdquo Journal of MachineLearning Research vol 8 pp 1027ndash1061 2007

[33] S Yan D Xu B Zhang H J Zhang Q Yang and S Lin ldquoGraphembedding and extensions a general framework for dimen-sionality reductionrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 29 no 1 pp 40ndash51 2007

[34] C K Chui and J Wang ldquoDimensionality reduction of hyper-spectral imagery data for feature classificationrdquo in Handbookof Geomathematics pp 1005ndash1047 Springer Heidelberg Ger-many 2010

[35] C-C Chang and C-J Lin ldquoLIBSVM a Library for supportvector machinesrdquo ACM Transactions on Intelligent Systems andTechnology vol 2 no 3 article 27 2011

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 12: Research Article Subspace Learning via Local Probability ...downloads.hindawi.com/journals/mpe/2015/145136.pdf · and applicable toolkits were engendered one a er another. Hyperspectral

12 Mathematical Problems in Engineering

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

(a) 1NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

(b) 5NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

(c) 9NN

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

(d) Linear SVM

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

PCALPPLFDA

LDAPD-LFDAJGLDA

RP

(e) Polynomial SVM

7 9 11 13 150

01

02

03

04

05

06

07

08

09

1

Reduced space

Kapp

a coe

ffici

ent

PCALPPLFDA

LDAPD-LFDAJGLDA

RP

(f) RBF SVM

Figure 3 Kappa coefficient by different dimension reduction methods and different classifiers applied to AVIRIS Indian Pines database

Mathematical Problems in Engineering 13

(a) Pseudo 3-channel color image band = [12 79 140]

1

2

2

2

2

2 2

3

3

3

3

3

4

5

5

5

5

6

6 6 6

7

8

9

10

10

10 10

11

11

11

11

11 12

12 12

13 14

14

14 15

15

16

(b) Ground truth

(c) Distribution of training sample

Figure 4 Illustration of sample partition

from the remaining samples Under this strategy the trainingsamples and testing samples are listed in Table 1

Figure 2 shows the overall accuracy of different dimen-sion reduction methods applied to AVIRIS Indian 92AV3Cdata set The neighborhood of KNN classifier is selected as 15 and 9 respectively which produces three classifiers thatis 1NN 5NN and 9NN Three different kernel functionsare adopted for SVM classifier Then derived classifiers arealso used in this experiment that is linear SVM polynomialSVM and RBFSVM It can be deduced from Figures 2(a)sim2(c) that when the embedding space is greater than 5 pro-posed PD-LFDA performs the best while JGLDA performsthe worst The results produced by RP is slightly better thanJGLDA PCA LDA LPP and RP show the similar classifiedresults under KNN classifierThat is the proposed PD-LFDAoutperforms the others Meantime compared with LFDAthe proposed PD-LFDA leads to 2 more improvementson average Moreover it can be observed from (d) that theclassified accuracy increases steadily as the embedded spaceincreases However LDA demonstrates the highest overallaccuracywhen the reduced features vary to 9 while LFDAhas

the significant improvements when the number of reducedfeatures is greater than 9 This phenomenon of Figure 2(d)indicates the instability of linear SVM Nevertheless thesituation reversed for polynomial SVM and RBF SVM inFigures 2(e) and 2(f) wherein the proposed PD-LFDAwins a little improvement against LFDA and has significantimprovement compared with the others Inspired effects ofproposed PD-LFDA algorithm were achieved in all casesFurthermore Table 2 gives the detailed overall accuracyunder different feature dimensions using 3NN 7NN andRBFSVM classifiers which validates the feasibility of theproposed scheme in this paper

Figure 3 displays the results of kappa coefficient obtainedusing the different dimension reduction algorithms underKNN classifier and SVM classifiers The experimental cir-cumstance of Figure 3 is same as that of Figure 2 We canfind that from these results JGLDA performs the worst inmost cases except in Figure 3(e) The proposed PD-LFDAmethod outperforms the other methods and achieves thehighest kappa numerical value in most cases except usingthe linear SVM as the classifier In fact none of them work

14 Mathematical Problems in Engineering

686864717360

(a) PCA-7NN

687564707607

(b) LPP-7NN

792376538609

(c) LFDA-7NN

739070418130

(d) LDA-7NN

619556516209

(e) JGLDA-7NN

670962947164

(f) RP-7NN

799277288570

(g) PCA-RBFSVM

767473558288

(h) LPP-RBFSVM

837581518822

(i) LFDA-RBFSVM

728769288071

(j) LDA-RBFSVM

582453777121

(k) JGLDA-RBFSVM

763173278365

(l) RP-RBFSVM

Figure 5 Classified map generated by different dimension reduction methods where the overall accuracy kappa coefficient and averageclassification accuracy are listed at the top of each map respectively

steady in the case of linear SVM (Figure 3(d)) Note that thesituation is improved in polynomial SVM where the kappanumerical value of the proposed PD-LFDA is significantlybetter than the others All the achievements demonstrate therobustness of our contribution in PD-LFDA Simultaneouslyit is noticeable that LPP exhibits an average kappa levelThe kappa value gained by LPP is not seriously bad and isnot dramatically good The kappa results produced by RPare approximately the same as LPP A significant advantageof RP is the simple construction and computation wherethe accuracy is closed to LPP More details are summarizedin Table 3 It can be concluded that the kappa coefficientof proposed algorithm is higher than the other approacheswhich is more appropriate for the classification of HSI data

The visual results of all methods are presented in Figures5sim6 where the class labels are converted to pseudocolor

image The pseudocolor image of the hyperspectral imagefrom Indian 92AV3C database is shown in Figure 4(a) Theavailable labeled image which represents the truth groundis illustrated in Figure 4(b) where the labels are made byhuman The training samples are selected from the labeledimage represented as points in the image as shown inFigure 4(c) Each label number (ID) corresponds to eachclass name which is indexed in Table 1 In this experimentall the available labeled samples are used for testing whileapproximate 10 of samples are used for training Thesubspace is fixed to 13 (the number here is only used forreference it can be changed) For each experiment thedimension from original feature space is reduced to theobjective dimensionality thereafter the classified maps areinduced by 7NN classifier and RBF-SVM classifier Theoverall accuracy kappa coefficient and average accuracy are

Mathematical Problems in Engineering 15

837981698991

(a) PD-LFDA-7NN

848682798968

(b) D-LFDA-RBFSVM

Figure 6 Classified map of the proposed method

Table 2 Overall accuracy obtained using different dimensions offeature and different classifiers for Indian Pine Scene database ()

Item Dims7 9 11 13 15

3NNPCA 7007 6996 7026 7035 7087LPP 6575 6871 6835 6695 6546LFDA 7513 7895 7963 7992 8009LDA 6415 6606 6616 6596 6536PD-LFDA 7713 8094 816 8218 8263JGLDA 5405 5602 5704 5642 5751RP 6173 6444 6745 6632 6652

7NNPCA 6902 6902 691 6933 6972LPP 6782 7057 7027 6892 6681LFDA 7419 7777 7827 7863 7862LDA 6615 6909 6899 692 6882PD-LFDA 7736 809 8155 8172 8235JGLDA 5665 5811 5809 5812 5914RP 6194 6481 6696 6613 6603

RBF-SVMPCA 8051 7794 7718 7935 8176LPP 7101 7514 7614 7501 7314LFDA 7826 8187 82 8113 7915LDA 677 6834 6877 6642 6792PD-LFDA 7858 8267 8395 8423 8166JGLDA 5686 5606 5663 5689 5879RP 6957 7282 7441 7416 7543

also included at the top of each map respectively Figure 5displays the classifiedmaps of classicmethods in pseudocolorimages and the classified maps of proposed PD-LFDA arepresented in Figure 6 It can be observed from these mapsthat the best performance is achieved by PD-LFDA when7NN classifier is used in this case the overall accuracyis 8379 the kappa coefficient is 8169 and the averageaccuracy is 8991 Moreover the worst algorithm is JGLDAwhose overall accuracy is 6195 the kappa coefficient

Table 3 Kappa coefficient by different dimension reduction meth-ods and different classifiers applied to Indian Pine Scene database()

Item Dims7 9 11 13 15

3NNPCA 6592 658 6612 6624 6681LPP 6084 6422 6382 6219 605LFDA 7145 7586 7664 7697 7718LDA 5913 6126 6137 6118 6048PD-LFDA 7392 7824 7901 7966 8016JGLDA 4778 4992 5082 5006 5119RP 564 5963 6303 6172 6192

7NNPCA 6466 6466 6475 6501 6545LPP 6304 6629 6585 6429 6185LFDA 7047 7457 7515 7557 7557LDA 6132 646 6453 6478 6431PD-LFDA 7417 782 7893 7913 7985JGLDA 5029 519 5161 5159 5263RP 5638 5992 6237 6135 6124

RBF-SVMPCA 7767 747 7393 7636 7900LPP 6677 715 7264 7115 6921LFDA 7486 7915 7924 7822 7584LDA 629 6362 6427 6153 6323PD-LFDA 7551 8011 8159 8187 7899JGLDA 5109 5066 5133 5147 5369RP 653 6908 7085 706 7203

becomes 5651 and the average accuracy is only 6209Other methods such as PCA LPP and RP can produce thecomparable results and no one can outperform the otherHowever LDAoutperformsPCA LPP andRP yielding betterresult Similar conclusions can be achieved in the groupof RBF-SVM Generally proposed PD-LFDA significantlyoutperforms the rest in this experiment which indicates thecorrectness of improvements in proposed PD-LFDA

16 Mathematical Problems in Engineering

Table 4 Performance of dimension reduction on the whole labeled samples ()

Evaluated item MethodologiesPCA LDA LPP LFDA JGLDA RP PD-LFDA

7NNOverall accuracy 6868 739 6875 7923 6195 6709 8379Kappa coefficient 6471 7041 647 7653 5651 6294 8169Average accuracy 737 813 7607 8609 6209 7164 8991

RBF-SVMOverall accuracy 7992 7287 7674 8375 5824 7631 8486Kappa coefficient 7728 6928 7355 8151 5377 7327 8279Average accuracy 857 8071 8288 8822 7121 8365 8968

Finally details of assessment deduced by 7NN and RBF-SVM is summarized in Table 4 where the correspondingoverall accuracy kappa coefficient and average accuracy ofdifferent methods can be located collectively

5 Conclusions

In this paper we have analyzed local Fisher discriminantanalysis (LFDA) and found its weakness By replacing themaximum distance with local variance for the constructionof weight matrix introducing class prior probability into thecomputation of affinitymatrix an improved LFDA algorithmhas been proposed This novel approach is called PD-LFDAbecause the probability distribution (PD) is applied in LFDAalgorithm The proposed approach essentially can increasethe discriminant ability of transformed features in lowdimensional space The pattern found by the new approachis expected to be more accurate and coincides with thecharacter of HSI data and is conducive to classify HSI dataPD-LFDA has been evaluated on a real removing sensingAVIRIS Indian Pine 92AV3C data set We have compared theperformance of the proposed PD-LFDA with that of PCALPP LFDA LDA JGLDA and RP Both the numerical resultsand visual inspection of class maps have been obtained Inthe experiments KNN classifier and SVM classifier havebeen used We have argued that the proposed PD-LFDAexhibits the best performance and serves as a very effectivedimensionality reduction tool for high dimensional datasuch as hyperspectral image (HSI) data

Appendix

Procedure of Proposed Algorithm

The brief description of the algorithm to perform the pro-posed PD-LFDA method is already presented in Section 3The details of the algorithm are provided in Algorithm 2

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work was supported by the Research of University ofMacau under Grants no MYRG205(Y1-L4)-FST11-TYY noMYRG187(Y1-L3)-FST11-TYY and no SRG010-FST11-TYYand by the National Natural Science Foundation of Chinaunder Grant no 61273244 This research project was alsosupported by the Science andTechnologyDevelopment Fund(FDCT) of Macau under Contract no 100-2012-A3

References

[1] W Li S Prasad Z Ye J E Fowler and M Cui ldquoLocality-preserving discriminant analysis for hyperspectral image clas-sification using local spatial informationrdquo in Proceedings ofthe 32nd IEEE International Geoscience and Remote SensingSymposium (IGARSS rsquo12) pp 4134ndash4137MunichGermany July2012

[2] HND LeM S Kim andD-H Kim ldquoComparison of singularvalue decomposition and principal component analysis appliedto hyperspectral imaging of biofilmrdquo in Proceedings of the IEEEPhotonics Conference (IPC rsquo12) pp 6ndash7 2012

[3] C K Chui and J Wang ldquoRandomized anisotropic transformfor nonlinear dimensionality reductionrdquo GEMmdashInternationalJournal on Geomathematics vol 1 no 1 pp 23ndash50 2010

[4] T V Bandos L Bruzzone and G Camps-Valls ldquoClassificationof hyperspectral images with regularized linear discriminantanalysisrdquo IEEE Transactions on Geoscience and Remote Sensingvol 47 no 3 pp 862ndash873 2009

[5] D Guangjun Z Yongsheng and J Song ldquoDimensionalityreduction of hyperspectral data based on ISOMAP algorithmrdquoin Proceedings of the 8th International Conference on ElectronicMeasurement and Instruments (ICEMI rsquo07) pp 3935ndash3938Xirsquoan China August 2007

[6] X Luo and M-F Jiang ldquoThe application of manifold learn-ing in dimensionality analysis for hyperspectral imageryrdquo inProceedings of the International Conference on Remote SensingEnvironment and Transportation Engineering (RSETE rsquo11) pp4572ndash4575 June 2011

[7] J Khodr and R Younes ldquoDimensionality reduction on hyper-spectral images a comparative review based on artificial datasrdquoin Proceedings of the 4th International Congress on Image andSignal Processing (CISP rsquo11) vol 4 pp 1875ndash1883 October 2011

Mathematical Problems in Engineering 17

[8] J Wen Z Tian H She and W Yan ldquoFeature extraction ofhyperspectral images based on preserving neighborhood dis-criminant embeddingrdquo in Proceedings of the 2nd InternationalConference on Image Analysis and Signal Processing (IASP rsquo10)pp 257ndash262 Zhejiang China April 2010

[9] Y-R Yeh S-Y Huang and Y-J Lee ldquoNonlinear dimensionreduction with kernel sliced inverse regressionrdquo IEEE Trans-actions on Knowledge and Data Engineering vol 21 no 11 pp1590ndash1603 2009

[10] J He L Zhang Q Wang and Z Li ldquoUsing diffusion geometriccoordinates for hyperspectral imagery representationrdquo IEEEGeoscience and Remote Sensing Letters vol 6 no 4 pp 767ndash7712009

[11] J Peng P Zhang and N Riedel ldquoDiscriminant learninganalysisrdquo IEEE Transactions on Systems Man and CyberneticsPart B Cybernetics vol 38 no 6 pp 1614ndash1625 2008

[12] F S Tsai and K L Chan ldquoDimensionality reduction techniquesfor data explorationrdquo in Proceedings of the 6th InternationalConference on Information Communications and Signal Process-ing pp 1ndash5 December 2007

[13] M D Farrell Jr and R M Mersereau ldquoOn the impact of PCAdimension reduction for hyperspectral detection of difficulttargetsrdquo IEEE Geoscience and Remote Sensing Letters vol 2 no2 pp 192ndash195 2005

[14] S Prasad and L M Bruce ldquoLimitations of principal com-ponents analysis for hyperspectral target recognitionrdquo IEEEGeoscience and Remote Sensing Letters vol 5 no 4 pp 625ndash629 2008

[15] J Yu Q Tian T Rui and T S Huang ldquoIntegrating discriminantand descriptive information for dimension reduction and clas-sificationrdquo IEEE Transactions on Circuits and Systems for VideoTechnology vol 17 no 3 pp 372ndash377 2007

[16] J Kong S Wang J Wang L Ma B Fu and Y Lu ldquoAnovel approach for face recognition based on supervisedlocality preserving projection and maximummargin criterionrdquoin Proceedings of the International Conference on ComputerEngineering and Technology (ICCET rsquo09) vol 1 pp 419ndash423Singapore January 2009

[17] M Loog and R P W Duin ldquoLinear dimensionality reductionvia a heteroscedastic extension of LDA the Chernoff criterionrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 26 no 6 pp 732ndash739 2004

[18] A C Jensen A Berge and A S Solberg ldquoRegressionapproaches to small sample inverse covariance matrix estima-tion for hyperspectral image classificationrdquo IEEE TransactionsonGeoscience andRemote Sensing vol 46 no 10 pp 2814ndash28222008

[19] J Jin BWang and L Zhang ldquoA novel approach based on Fisherdiscriminant null space for decomposition of mixed pixels inhyperspectral imageryrdquo IEEE Geoscience and Remote SensingLetters vol 7 no 4 pp 699ndash703 2010

[20] J Wen Z Tian X Liu and W Lin ldquoNeighborhood preservingorthogonal pnmf feature extraction for hyperspectral imageclassificationrdquo IEEE Journal of Selected Topics in Applied EarthObservations and Remote Sensing vol 6 no 2 pp 759ndash7682013

[21] Y Ren G Zhang G Yu and X Li ldquoLocal and global structurepreserving based feature selectionrdquoNeurocomputing vol 89 pp147ndash157 2012

[22] Z Fan Y Xu and D Zhang ldquoLocal linear discriminant analysisframework using sample neighborsrdquo IEEE Transactions onNeural Networks vol 22 no 7 pp 1119ndash1132 2011

[23] Y Wang S Huang D Liu and B Wang ldquoResearch advanceon band selection-based dimension reduction of hyperspectralremote sensing imagesrdquo in Proceedings of the 2nd InternationalConference on Remote Sensing Environment and TransportationEngineering (RSETE rsquo12) pp 1ndash4 IEEE Nanjing China June2012

[24] B Waske S van der Linden J A Benediktsson A Rabe andP Hostert ldquoSensitivity of support vector machines to randomfeature selection in classification of hyperspectral datardquo IEEETransactions on Geoscience and Remote Sensing vol 48 no 7pp 2880ndash2889 2010

[25] M Sugiyama ldquoLocal fisher discriminant analysis for superviseddimensionality reductionrdquo in Proceedings of the 23rd Interna-tional Conference onMachine Learning (ICML rsquo06) pp 905ndash912ACM June 2006

[26] Y Cheng ldquoMean shift mode seeking and clusteringrdquo IEEETransactions on Pattern Analysis and Machine Intelligence vol17 no 8 pp 790ndash799 1995

[27] W Li S Prasad J E Fowler and L M Bruce ldquoLocality-preserving dimensionality reduction and classification forhyperspectral image analysisrdquo IEEE Transactions on Geoscienceand Remote Sensing vol 50 no 4 pp 1185ndash1198 2012

[28] L Zelnik-Manor and P Perona ldquoSelf-tuning spectral cluster-ingrdquo in Proceedings of the 18th Annual Conference on NeuralInformation Processing Systems pp 1601ndash1608 December 2004

[29] W K Wong and H T Zhao ldquoSupervised optimal localitypreserving projectionrdquo Pattern Recognition vol 45 no 1 pp186ndash197 2012

[30] X He and P Niyogi ldquoLocality preserving projectionsrdquo inAdvances in Neural Information Processing Systems 16 SThrunL Saul and B Scholkopf Eds MIT Press Cambridge MassUSA 2004

[31] H Wang S Chen Z Hu and W Zheng ldquoLocality-preservedmaximum information projectionrdquo IEEE Transactions on Neu-ral Networks vol 19 no 4 pp 571ndash585 2008

[32] M Sugiyama ldquoDimensionality reduction ofmultimodal labeleddata by local fisher discriminant analysisrdquo Journal of MachineLearning Research vol 8 pp 1027ndash1061 2007

[33] S Yan D Xu B Zhang H J Zhang Q Yang and S Lin ldquoGraphembedding and extensions a general framework for dimen-sionality reductionrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 29 no 1 pp 40ndash51 2007

[34] C K Chui and J Wang ldquoDimensionality reduction of hyper-spectral imagery data for feature classificationrdquo in Handbookof Geomathematics pp 1005ndash1047 Springer Heidelberg Ger-many 2010

[35] C-C Chang and C-J Lin ldquoLIBSVM a Library for supportvector machinesrdquo ACM Transactions on Intelligent Systems andTechnology vol 2 no 3 article 27 2011

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 13: Research Article Subspace Learning via Local Probability ...downloads.hindawi.com/journals/mpe/2015/145136.pdf · and applicable toolkits were engendered one a er another. Hyperspectral

Mathematical Problems in Engineering 13

(a) Pseudo 3-channel color image band = [12 79 140]

1

2

2

2

2

2 2

3

3

3

3

3

4

5

5

5

5

6

6 6 6

7

8

9

10

10

10 10

11

11

11

11

11 12

12 12

13 14

14

14 15

15

16

(b) Ground truth

(c) Distribution of training sample

Figure 4 Illustration of sample partition

from the remaining samples Under this strategy the trainingsamples and testing samples are listed in Table 1

Figure 2 shows the overall accuracy of different dimen-sion reduction methods applied to AVIRIS Indian 92AV3Cdata set The neighborhood of KNN classifier is selected as 15 and 9 respectively which produces three classifiers thatis 1NN 5NN and 9NN Three different kernel functionsare adopted for SVM classifier Then derived classifiers arealso used in this experiment that is linear SVM polynomialSVM and RBFSVM It can be deduced from Figures 2(a)sim2(c) that when the embedding space is greater than 5 pro-posed PD-LFDA performs the best while JGLDA performsthe worst The results produced by RP is slightly better thanJGLDA PCA LDA LPP and RP show the similar classifiedresults under KNN classifierThat is the proposed PD-LFDAoutperforms the others Meantime compared with LFDAthe proposed PD-LFDA leads to 2 more improvementson average Moreover it can be observed from (d) that theclassified accuracy increases steadily as the embedded spaceincreases However LDA demonstrates the highest overallaccuracywhen the reduced features vary to 9 while LFDAhas

the significant improvements when the number of reducedfeatures is greater than 9 This phenomenon of Figure 2(d)indicates the instability of linear SVM Nevertheless thesituation reversed for polynomial SVM and RBF SVM inFigures 2(e) and 2(f) wherein the proposed PD-LFDAwins a little improvement against LFDA and has significantimprovement compared with the others Inspired effects ofproposed PD-LFDA algorithm were achieved in all casesFurthermore Table 2 gives the detailed overall accuracyunder different feature dimensions using 3NN 7NN andRBFSVM classifiers which validates the feasibility of theproposed scheme in this paper

Figure 3 displays the results of kappa coefficient obtainedusing the different dimension reduction algorithms underKNN classifier and SVM classifiers The experimental cir-cumstance of Figure 3 is same as that of Figure 2 We canfind that from these results JGLDA performs the worst inmost cases except in Figure 3(e) The proposed PD-LFDAmethod outperforms the other methods and achieves thehighest kappa numerical value in most cases except usingthe linear SVM as the classifier In fact none of them work

14 Mathematical Problems in Engineering

686864717360

(a) PCA-7NN

687564707607

(b) LPP-7NN

792376538609

(c) LFDA-7NN

739070418130

(d) LDA-7NN

619556516209

(e) JGLDA-7NN

670962947164

(f) RP-7NN

799277288570

(g) PCA-RBFSVM

767473558288

(h) LPP-RBFSVM

837581518822

(i) LFDA-RBFSVM

728769288071

(j) LDA-RBFSVM

582453777121

(k) JGLDA-RBFSVM

763173278365

(l) RP-RBFSVM

Figure 5 Classified map generated by different dimension reduction methods where the overall accuracy kappa coefficient and averageclassification accuracy are listed at the top of each map respectively

steady in the case of linear SVM (Figure 3(d)) Note that thesituation is improved in polynomial SVM where the kappanumerical value of the proposed PD-LFDA is significantlybetter than the others All the achievements demonstrate therobustness of our contribution in PD-LFDA Simultaneouslyit is noticeable that LPP exhibits an average kappa levelThe kappa value gained by LPP is not seriously bad and isnot dramatically good The kappa results produced by RPare approximately the same as LPP A significant advantageof RP is the simple construction and computation wherethe accuracy is closed to LPP More details are summarizedin Table 3 It can be concluded that the kappa coefficientof proposed algorithm is higher than the other approacheswhich is more appropriate for the classification of HSI data

The visual results of all methods are presented in Figures5sim6 where the class labels are converted to pseudocolor

image The pseudocolor image of the hyperspectral imagefrom Indian 92AV3C database is shown in Figure 4(a) Theavailable labeled image which represents the truth groundis illustrated in Figure 4(b) where the labels are made byhuman The training samples are selected from the labeledimage represented as points in the image as shown inFigure 4(c) Each label number (ID) corresponds to eachclass name which is indexed in Table 1 In this experimentall the available labeled samples are used for testing whileapproximate 10 of samples are used for training Thesubspace is fixed to 13 (the number here is only used forreference it can be changed) For each experiment thedimension from original feature space is reduced to theobjective dimensionality thereafter the classified maps areinduced by 7NN classifier and RBF-SVM classifier Theoverall accuracy kappa coefficient and average accuracy are

Mathematical Problems in Engineering 15

837981698991

(a) PD-LFDA-7NN

848682798968

(b) D-LFDA-RBFSVM

Figure 6 Classified map of the proposed method

Table 2 Overall accuracy obtained using different dimensions offeature and different classifiers for Indian Pine Scene database ()

Item Dims7 9 11 13 15

3NNPCA 7007 6996 7026 7035 7087LPP 6575 6871 6835 6695 6546LFDA 7513 7895 7963 7992 8009LDA 6415 6606 6616 6596 6536PD-LFDA 7713 8094 816 8218 8263JGLDA 5405 5602 5704 5642 5751RP 6173 6444 6745 6632 6652

7NNPCA 6902 6902 691 6933 6972LPP 6782 7057 7027 6892 6681LFDA 7419 7777 7827 7863 7862LDA 6615 6909 6899 692 6882PD-LFDA 7736 809 8155 8172 8235JGLDA 5665 5811 5809 5812 5914RP 6194 6481 6696 6613 6603

RBF-SVMPCA 8051 7794 7718 7935 8176LPP 7101 7514 7614 7501 7314LFDA 7826 8187 82 8113 7915LDA 677 6834 6877 6642 6792PD-LFDA 7858 8267 8395 8423 8166JGLDA 5686 5606 5663 5689 5879RP 6957 7282 7441 7416 7543

also included at the top of each map respectively Figure 5displays the classifiedmaps of classicmethods in pseudocolorimages and the classified maps of proposed PD-LFDA arepresented in Figure 6 It can be observed from these mapsthat the best performance is achieved by PD-LFDA when7NN classifier is used in this case the overall accuracyis 8379 the kappa coefficient is 8169 and the averageaccuracy is 8991 Moreover the worst algorithm is JGLDAwhose overall accuracy is 6195 the kappa coefficient

Table 3 Kappa coefficient by different dimension reduction meth-ods and different classifiers applied to Indian Pine Scene database()

Item Dims7 9 11 13 15

3NNPCA 6592 658 6612 6624 6681LPP 6084 6422 6382 6219 605LFDA 7145 7586 7664 7697 7718LDA 5913 6126 6137 6118 6048PD-LFDA 7392 7824 7901 7966 8016JGLDA 4778 4992 5082 5006 5119RP 564 5963 6303 6172 6192

7NNPCA 6466 6466 6475 6501 6545LPP 6304 6629 6585 6429 6185LFDA 7047 7457 7515 7557 7557LDA 6132 646 6453 6478 6431PD-LFDA 7417 782 7893 7913 7985JGLDA 5029 519 5161 5159 5263RP 5638 5992 6237 6135 6124

RBF-SVMPCA 7767 747 7393 7636 7900LPP 6677 715 7264 7115 6921LFDA 7486 7915 7924 7822 7584LDA 629 6362 6427 6153 6323PD-LFDA 7551 8011 8159 8187 7899JGLDA 5109 5066 5133 5147 5369RP 653 6908 7085 706 7203

becomes 5651 and the average accuracy is only 6209Other methods such as PCA LPP and RP can produce thecomparable results and no one can outperform the otherHowever LDAoutperformsPCA LPP andRP yielding betterresult Similar conclusions can be achieved in the groupof RBF-SVM Generally proposed PD-LFDA significantlyoutperforms the rest in this experiment which indicates thecorrectness of improvements in proposed PD-LFDA

16 Mathematical Problems in Engineering

Table 4 Performance of dimension reduction on the whole labeled samples ()

Evaluated item MethodologiesPCA LDA LPP LFDA JGLDA RP PD-LFDA

7NNOverall accuracy 6868 739 6875 7923 6195 6709 8379Kappa coefficient 6471 7041 647 7653 5651 6294 8169Average accuracy 737 813 7607 8609 6209 7164 8991

RBF-SVMOverall accuracy 7992 7287 7674 8375 5824 7631 8486Kappa coefficient 7728 6928 7355 8151 5377 7327 8279Average accuracy 857 8071 8288 8822 7121 8365 8968

Finally details of assessment deduced by 7NN and RBF-SVM is summarized in Table 4 where the correspondingoverall accuracy kappa coefficient and average accuracy ofdifferent methods can be located collectively

5 Conclusions

In this paper we have analyzed local Fisher discriminantanalysis (LFDA) and found its weakness By replacing themaximum distance with local variance for the constructionof weight matrix introducing class prior probability into thecomputation of affinitymatrix an improved LFDA algorithmhas been proposed This novel approach is called PD-LFDAbecause the probability distribution (PD) is applied in LFDAalgorithm The proposed approach essentially can increasethe discriminant ability of transformed features in lowdimensional space The pattern found by the new approachis expected to be more accurate and coincides with thecharacter of HSI data and is conducive to classify HSI dataPD-LFDA has been evaluated on a real removing sensingAVIRIS Indian Pine 92AV3C data set We have compared theperformance of the proposed PD-LFDA with that of PCALPP LFDA LDA JGLDA and RP Both the numerical resultsand visual inspection of class maps have been obtained Inthe experiments KNN classifier and SVM classifier havebeen used We have argued that the proposed PD-LFDAexhibits the best performance and serves as a very effectivedimensionality reduction tool for high dimensional datasuch as hyperspectral image (HSI) data

Appendix

Procedure of Proposed Algorithm

The brief description of the algorithm to perform the pro-posed PD-LFDA method is already presented in Section 3The details of the algorithm are provided in Algorithm 2

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work was supported by the Research of University ofMacau under Grants no MYRG205(Y1-L4)-FST11-TYY noMYRG187(Y1-L3)-FST11-TYY and no SRG010-FST11-TYYand by the National Natural Science Foundation of Chinaunder Grant no 61273244 This research project was alsosupported by the Science andTechnologyDevelopment Fund(FDCT) of Macau under Contract no 100-2012-A3

References

[1] W Li S Prasad Z Ye J E Fowler and M Cui ldquoLocality-preserving discriminant analysis for hyperspectral image clas-sification using local spatial informationrdquo in Proceedings ofthe 32nd IEEE International Geoscience and Remote SensingSymposium (IGARSS rsquo12) pp 4134ndash4137MunichGermany July2012

[2] HND LeM S Kim andD-H Kim ldquoComparison of singularvalue decomposition and principal component analysis appliedto hyperspectral imaging of biofilmrdquo in Proceedings of the IEEEPhotonics Conference (IPC rsquo12) pp 6ndash7 2012

[3] C K Chui and J Wang ldquoRandomized anisotropic transformfor nonlinear dimensionality reductionrdquo GEMmdashInternationalJournal on Geomathematics vol 1 no 1 pp 23ndash50 2010

[4] T V Bandos L Bruzzone and G Camps-Valls ldquoClassificationof hyperspectral images with regularized linear discriminantanalysisrdquo IEEE Transactions on Geoscience and Remote Sensingvol 47 no 3 pp 862ndash873 2009

[5] D Guangjun Z Yongsheng and J Song ldquoDimensionalityreduction of hyperspectral data based on ISOMAP algorithmrdquoin Proceedings of the 8th International Conference on ElectronicMeasurement and Instruments (ICEMI rsquo07) pp 3935ndash3938Xirsquoan China August 2007

[6] X Luo and M-F Jiang ldquoThe application of manifold learn-ing in dimensionality analysis for hyperspectral imageryrdquo inProceedings of the International Conference on Remote SensingEnvironment and Transportation Engineering (RSETE rsquo11) pp4572ndash4575 June 2011

[7] J Khodr and R Younes ldquoDimensionality reduction on hyper-spectral images a comparative review based on artificial datasrdquoin Proceedings of the 4th International Congress on Image andSignal Processing (CISP rsquo11) vol 4 pp 1875ndash1883 October 2011

Mathematical Problems in Engineering 17

[8] J Wen Z Tian H She and W Yan ldquoFeature extraction ofhyperspectral images based on preserving neighborhood dis-criminant embeddingrdquo in Proceedings of the 2nd InternationalConference on Image Analysis and Signal Processing (IASP rsquo10)pp 257ndash262 Zhejiang China April 2010

[9] Y-R Yeh S-Y Huang and Y-J Lee ldquoNonlinear dimensionreduction with kernel sliced inverse regressionrdquo IEEE Trans-actions on Knowledge and Data Engineering vol 21 no 11 pp1590ndash1603 2009

[10] J He L Zhang Q Wang and Z Li ldquoUsing diffusion geometriccoordinates for hyperspectral imagery representationrdquo IEEEGeoscience and Remote Sensing Letters vol 6 no 4 pp 767ndash7712009

[11] J Peng P Zhang and N Riedel ldquoDiscriminant learninganalysisrdquo IEEE Transactions on Systems Man and CyberneticsPart B Cybernetics vol 38 no 6 pp 1614ndash1625 2008

[12] F S Tsai and K L Chan ldquoDimensionality reduction techniquesfor data explorationrdquo in Proceedings of the 6th InternationalConference on Information Communications and Signal Process-ing pp 1ndash5 December 2007

[13] M D Farrell Jr and R M Mersereau ldquoOn the impact of PCAdimension reduction for hyperspectral detection of difficulttargetsrdquo IEEE Geoscience and Remote Sensing Letters vol 2 no2 pp 192ndash195 2005

[14] S Prasad and L M Bruce ldquoLimitations of principal com-ponents analysis for hyperspectral target recognitionrdquo IEEEGeoscience and Remote Sensing Letters vol 5 no 4 pp 625ndash629 2008

[15] J Yu Q Tian T Rui and T S Huang ldquoIntegrating discriminantand descriptive information for dimension reduction and clas-sificationrdquo IEEE Transactions on Circuits and Systems for VideoTechnology vol 17 no 3 pp 372ndash377 2007

[16] J Kong S Wang J Wang L Ma B Fu and Y Lu ldquoAnovel approach for face recognition based on supervisedlocality preserving projection and maximummargin criterionrdquoin Proceedings of the International Conference on ComputerEngineering and Technology (ICCET rsquo09) vol 1 pp 419ndash423Singapore January 2009

[17] M Loog and R P W Duin ldquoLinear dimensionality reductionvia a heteroscedastic extension of LDA the Chernoff criterionrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 26 no 6 pp 732ndash739 2004

[18] A C Jensen A Berge and A S Solberg ldquoRegressionapproaches to small sample inverse covariance matrix estima-tion for hyperspectral image classificationrdquo IEEE TransactionsonGeoscience andRemote Sensing vol 46 no 10 pp 2814ndash28222008

[19] J Jin BWang and L Zhang ldquoA novel approach based on Fisherdiscriminant null space for decomposition of mixed pixels inhyperspectral imageryrdquo IEEE Geoscience and Remote SensingLetters vol 7 no 4 pp 699ndash703 2010

[20] J Wen Z Tian X Liu and W Lin ldquoNeighborhood preservingorthogonal pnmf feature extraction for hyperspectral imageclassificationrdquo IEEE Journal of Selected Topics in Applied EarthObservations and Remote Sensing vol 6 no 2 pp 759ndash7682013

[21] Y Ren G Zhang G Yu and X Li ldquoLocal and global structurepreserving based feature selectionrdquoNeurocomputing vol 89 pp147ndash157 2012

[22] Z Fan Y Xu and D Zhang ldquoLocal linear discriminant analysisframework using sample neighborsrdquo IEEE Transactions onNeural Networks vol 22 no 7 pp 1119ndash1132 2011

[23] Y Wang S Huang D Liu and B Wang ldquoResearch advanceon band selection-based dimension reduction of hyperspectralremote sensing imagesrdquo in Proceedings of the 2nd InternationalConference on Remote Sensing Environment and TransportationEngineering (RSETE rsquo12) pp 1ndash4 IEEE Nanjing China June2012

[24] B Waske S van der Linden J A Benediktsson A Rabe andP Hostert ldquoSensitivity of support vector machines to randomfeature selection in classification of hyperspectral datardquo IEEETransactions on Geoscience and Remote Sensing vol 48 no 7pp 2880ndash2889 2010

[25] M Sugiyama ldquoLocal fisher discriminant analysis for superviseddimensionality reductionrdquo in Proceedings of the 23rd Interna-tional Conference onMachine Learning (ICML rsquo06) pp 905ndash912ACM June 2006

[26] Y Cheng ldquoMean shift mode seeking and clusteringrdquo IEEETransactions on Pattern Analysis and Machine Intelligence vol17 no 8 pp 790ndash799 1995

[27] W Li S Prasad J E Fowler and L M Bruce ldquoLocality-preserving dimensionality reduction and classification forhyperspectral image analysisrdquo IEEE Transactions on Geoscienceand Remote Sensing vol 50 no 4 pp 1185ndash1198 2012

[28] L Zelnik-Manor and P Perona ldquoSelf-tuning spectral cluster-ingrdquo in Proceedings of the 18th Annual Conference on NeuralInformation Processing Systems pp 1601ndash1608 December 2004

[29] W K Wong and H T Zhao ldquoSupervised optimal localitypreserving projectionrdquo Pattern Recognition vol 45 no 1 pp186ndash197 2012

[30] X He and P Niyogi ldquoLocality preserving projectionsrdquo inAdvances in Neural Information Processing Systems 16 SThrunL Saul and B Scholkopf Eds MIT Press Cambridge MassUSA 2004

[31] H Wang S Chen Z Hu and W Zheng ldquoLocality-preservedmaximum information projectionrdquo IEEE Transactions on Neu-ral Networks vol 19 no 4 pp 571ndash585 2008

[32] M Sugiyama ldquoDimensionality reduction ofmultimodal labeleddata by local fisher discriminant analysisrdquo Journal of MachineLearning Research vol 8 pp 1027ndash1061 2007

[33] S Yan D Xu B Zhang H J Zhang Q Yang and S Lin ldquoGraphembedding and extensions a general framework for dimen-sionality reductionrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 29 no 1 pp 40ndash51 2007

[34] C K Chui and J Wang ldquoDimensionality reduction of hyper-spectral imagery data for feature classificationrdquo in Handbookof Geomathematics pp 1005ndash1047 Springer Heidelberg Ger-many 2010

[35] C-C Chang and C-J Lin ldquoLIBSVM a Library for supportvector machinesrdquo ACM Transactions on Intelligent Systems andTechnology vol 2 no 3 article 27 2011

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 14: Research Article Subspace Learning via Local Probability ...downloads.hindawi.com/journals/mpe/2015/145136.pdf · and applicable toolkits were engendered one a er another. Hyperspectral

14 Mathematical Problems in Engineering

686864717360

(a) PCA-7NN

687564707607

(b) LPP-7NN

792376538609

(c) LFDA-7NN

739070418130

(d) LDA-7NN

619556516209

(e) JGLDA-7NN

670962947164

(f) RP-7NN

799277288570

(g) PCA-RBFSVM

767473558288

(h) LPP-RBFSVM

837581518822

(i) LFDA-RBFSVM

728769288071

(j) LDA-RBFSVM

582453777121

(k) JGLDA-RBFSVM

763173278365

(l) RP-RBFSVM

Figure 5 Classified map generated by different dimension reduction methods where the overall accuracy kappa coefficient and averageclassification accuracy are listed at the top of each map respectively

steady in the case of linear SVM (Figure 3(d)) Note that thesituation is improved in polynomial SVM where the kappanumerical value of the proposed PD-LFDA is significantlybetter than the others All the achievements demonstrate therobustness of our contribution in PD-LFDA Simultaneouslyit is noticeable that LPP exhibits an average kappa levelThe kappa value gained by LPP is not seriously bad and isnot dramatically good The kappa results produced by RPare approximately the same as LPP A significant advantageof RP is the simple construction and computation wherethe accuracy is closed to LPP More details are summarizedin Table 3 It can be concluded that the kappa coefficientof proposed algorithm is higher than the other approacheswhich is more appropriate for the classification of HSI data

The visual results of all methods are presented in Figures5sim6 where the class labels are converted to pseudocolor

image The pseudocolor image of the hyperspectral imagefrom Indian 92AV3C database is shown in Figure 4(a) Theavailable labeled image which represents the truth groundis illustrated in Figure 4(b) where the labels are made byhuman The training samples are selected from the labeledimage represented as points in the image as shown inFigure 4(c) Each label number (ID) corresponds to eachclass name which is indexed in Table 1 In this experimentall the available labeled samples are used for testing whileapproximate 10 of samples are used for training Thesubspace is fixed to 13 (the number here is only used forreference it can be changed) For each experiment thedimension from original feature space is reduced to theobjective dimensionality thereafter the classified maps areinduced by 7NN classifier and RBF-SVM classifier Theoverall accuracy kappa coefficient and average accuracy are

Mathematical Problems in Engineering 15

837981698991

(a) PD-LFDA-7NN

848682798968

(b) D-LFDA-RBFSVM

Figure 6 Classified map of the proposed method

Table 2 Overall accuracy obtained using different dimensions offeature and different classifiers for Indian Pine Scene database ()

Item Dims7 9 11 13 15

3NNPCA 7007 6996 7026 7035 7087LPP 6575 6871 6835 6695 6546LFDA 7513 7895 7963 7992 8009LDA 6415 6606 6616 6596 6536PD-LFDA 7713 8094 816 8218 8263JGLDA 5405 5602 5704 5642 5751RP 6173 6444 6745 6632 6652

7NNPCA 6902 6902 691 6933 6972LPP 6782 7057 7027 6892 6681LFDA 7419 7777 7827 7863 7862LDA 6615 6909 6899 692 6882PD-LFDA 7736 809 8155 8172 8235JGLDA 5665 5811 5809 5812 5914RP 6194 6481 6696 6613 6603

RBF-SVMPCA 8051 7794 7718 7935 8176LPP 7101 7514 7614 7501 7314LFDA 7826 8187 82 8113 7915LDA 677 6834 6877 6642 6792PD-LFDA 7858 8267 8395 8423 8166JGLDA 5686 5606 5663 5689 5879RP 6957 7282 7441 7416 7543

also included at the top of each map respectively Figure 5displays the classifiedmaps of classicmethods in pseudocolorimages and the classified maps of proposed PD-LFDA arepresented in Figure 6 It can be observed from these mapsthat the best performance is achieved by PD-LFDA when7NN classifier is used in this case the overall accuracyis 8379 the kappa coefficient is 8169 and the averageaccuracy is 8991 Moreover the worst algorithm is JGLDAwhose overall accuracy is 6195 the kappa coefficient

Table 3 Kappa coefficient by different dimension reduction meth-ods and different classifiers applied to Indian Pine Scene database()

Item Dims7 9 11 13 15

3NNPCA 6592 658 6612 6624 6681LPP 6084 6422 6382 6219 605LFDA 7145 7586 7664 7697 7718LDA 5913 6126 6137 6118 6048PD-LFDA 7392 7824 7901 7966 8016JGLDA 4778 4992 5082 5006 5119RP 564 5963 6303 6172 6192

7NNPCA 6466 6466 6475 6501 6545LPP 6304 6629 6585 6429 6185LFDA 7047 7457 7515 7557 7557LDA 6132 646 6453 6478 6431PD-LFDA 7417 782 7893 7913 7985JGLDA 5029 519 5161 5159 5263RP 5638 5992 6237 6135 6124

RBF-SVMPCA 7767 747 7393 7636 7900LPP 6677 715 7264 7115 6921LFDA 7486 7915 7924 7822 7584LDA 629 6362 6427 6153 6323PD-LFDA 7551 8011 8159 8187 7899JGLDA 5109 5066 5133 5147 5369RP 653 6908 7085 706 7203

becomes 5651 and the average accuracy is only 6209Other methods such as PCA LPP and RP can produce thecomparable results and no one can outperform the otherHowever LDAoutperformsPCA LPP andRP yielding betterresult Similar conclusions can be achieved in the groupof RBF-SVM Generally proposed PD-LFDA significantlyoutperforms the rest in this experiment which indicates thecorrectness of improvements in proposed PD-LFDA

16 Mathematical Problems in Engineering

Table 4 Performance of dimension reduction on the whole labeled samples ()

Evaluated item MethodologiesPCA LDA LPP LFDA JGLDA RP PD-LFDA

7NNOverall accuracy 6868 739 6875 7923 6195 6709 8379Kappa coefficient 6471 7041 647 7653 5651 6294 8169Average accuracy 737 813 7607 8609 6209 7164 8991

RBF-SVMOverall accuracy 7992 7287 7674 8375 5824 7631 8486Kappa coefficient 7728 6928 7355 8151 5377 7327 8279Average accuracy 857 8071 8288 8822 7121 8365 8968

Finally details of assessment deduced by 7NN and RBF-SVM is summarized in Table 4 where the correspondingoverall accuracy kappa coefficient and average accuracy ofdifferent methods can be located collectively

5 Conclusions

In this paper we have analyzed local Fisher discriminantanalysis (LFDA) and found its weakness By replacing themaximum distance with local variance for the constructionof weight matrix introducing class prior probability into thecomputation of affinitymatrix an improved LFDA algorithmhas been proposed This novel approach is called PD-LFDAbecause the probability distribution (PD) is applied in LFDAalgorithm The proposed approach essentially can increasethe discriminant ability of transformed features in lowdimensional space The pattern found by the new approachis expected to be more accurate and coincides with thecharacter of HSI data and is conducive to classify HSI dataPD-LFDA has been evaluated on a real removing sensingAVIRIS Indian Pine 92AV3C data set We have compared theperformance of the proposed PD-LFDA with that of PCALPP LFDA LDA JGLDA and RP Both the numerical resultsand visual inspection of class maps have been obtained Inthe experiments KNN classifier and SVM classifier havebeen used We have argued that the proposed PD-LFDAexhibits the best performance and serves as a very effectivedimensionality reduction tool for high dimensional datasuch as hyperspectral image (HSI) data

Appendix

Procedure of Proposed Algorithm

The brief description of the algorithm to perform the pro-posed PD-LFDA method is already presented in Section 3The details of the algorithm are provided in Algorithm 2

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work was supported by the Research of University ofMacau under Grants no MYRG205(Y1-L4)-FST11-TYY noMYRG187(Y1-L3)-FST11-TYY and no SRG010-FST11-TYYand by the National Natural Science Foundation of Chinaunder Grant no 61273244 This research project was alsosupported by the Science andTechnologyDevelopment Fund(FDCT) of Macau under Contract no 100-2012-A3

References

[1] W Li S Prasad Z Ye J E Fowler and M Cui ldquoLocality-preserving discriminant analysis for hyperspectral image clas-sification using local spatial informationrdquo in Proceedings ofthe 32nd IEEE International Geoscience and Remote SensingSymposium (IGARSS rsquo12) pp 4134ndash4137MunichGermany July2012

[2] HND LeM S Kim andD-H Kim ldquoComparison of singularvalue decomposition and principal component analysis appliedto hyperspectral imaging of biofilmrdquo in Proceedings of the IEEEPhotonics Conference (IPC rsquo12) pp 6ndash7 2012

[3] C K Chui and J Wang ldquoRandomized anisotropic transformfor nonlinear dimensionality reductionrdquo GEMmdashInternationalJournal on Geomathematics vol 1 no 1 pp 23ndash50 2010

[4] T V Bandos L Bruzzone and G Camps-Valls ldquoClassificationof hyperspectral images with regularized linear discriminantanalysisrdquo IEEE Transactions on Geoscience and Remote Sensingvol 47 no 3 pp 862ndash873 2009

[5] D Guangjun Z Yongsheng and J Song ldquoDimensionalityreduction of hyperspectral data based on ISOMAP algorithmrdquoin Proceedings of the 8th International Conference on ElectronicMeasurement and Instruments (ICEMI rsquo07) pp 3935ndash3938Xirsquoan China August 2007

[6] X Luo and M-F Jiang ldquoThe application of manifold learn-ing in dimensionality analysis for hyperspectral imageryrdquo inProceedings of the International Conference on Remote SensingEnvironment and Transportation Engineering (RSETE rsquo11) pp4572ndash4575 June 2011

[7] J Khodr and R Younes ldquoDimensionality reduction on hyper-spectral images a comparative review based on artificial datasrdquoin Proceedings of the 4th International Congress on Image andSignal Processing (CISP rsquo11) vol 4 pp 1875ndash1883 October 2011

Mathematical Problems in Engineering 17

[8] J Wen Z Tian H She and W Yan ldquoFeature extraction ofhyperspectral images based on preserving neighborhood dis-criminant embeddingrdquo in Proceedings of the 2nd InternationalConference on Image Analysis and Signal Processing (IASP rsquo10)pp 257ndash262 Zhejiang China April 2010

[9] Y-R Yeh S-Y Huang and Y-J Lee ldquoNonlinear dimensionreduction with kernel sliced inverse regressionrdquo IEEE Trans-actions on Knowledge and Data Engineering vol 21 no 11 pp1590ndash1603 2009

[10] J He L Zhang Q Wang and Z Li ldquoUsing diffusion geometriccoordinates for hyperspectral imagery representationrdquo IEEEGeoscience and Remote Sensing Letters vol 6 no 4 pp 767ndash7712009

[11] J Peng P Zhang and N Riedel ldquoDiscriminant learninganalysisrdquo IEEE Transactions on Systems Man and CyberneticsPart B Cybernetics vol 38 no 6 pp 1614ndash1625 2008

[12] F S Tsai and K L Chan ldquoDimensionality reduction techniquesfor data explorationrdquo in Proceedings of the 6th InternationalConference on Information Communications and Signal Process-ing pp 1ndash5 December 2007

[13] M D Farrell Jr and R M Mersereau ldquoOn the impact of PCAdimension reduction for hyperspectral detection of difficulttargetsrdquo IEEE Geoscience and Remote Sensing Letters vol 2 no2 pp 192ndash195 2005

[14] S Prasad and L M Bruce ldquoLimitations of principal com-ponents analysis for hyperspectral target recognitionrdquo IEEEGeoscience and Remote Sensing Letters vol 5 no 4 pp 625ndash629 2008

[15] J Yu Q Tian T Rui and T S Huang ldquoIntegrating discriminantand descriptive information for dimension reduction and clas-sificationrdquo IEEE Transactions on Circuits and Systems for VideoTechnology vol 17 no 3 pp 372ndash377 2007

[16] J Kong S Wang J Wang L Ma B Fu and Y Lu ldquoAnovel approach for face recognition based on supervisedlocality preserving projection and maximummargin criterionrdquoin Proceedings of the International Conference on ComputerEngineering and Technology (ICCET rsquo09) vol 1 pp 419ndash423Singapore January 2009

[17] M Loog and R P W Duin ldquoLinear dimensionality reductionvia a heteroscedastic extension of LDA the Chernoff criterionrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 26 no 6 pp 732ndash739 2004

[18] A C Jensen A Berge and A S Solberg ldquoRegressionapproaches to small sample inverse covariance matrix estima-tion for hyperspectral image classificationrdquo IEEE TransactionsonGeoscience andRemote Sensing vol 46 no 10 pp 2814ndash28222008

[19] J Jin BWang and L Zhang ldquoA novel approach based on Fisherdiscriminant null space for decomposition of mixed pixels inhyperspectral imageryrdquo IEEE Geoscience and Remote SensingLetters vol 7 no 4 pp 699ndash703 2010

[20] J Wen Z Tian X Liu and W Lin ldquoNeighborhood preservingorthogonal pnmf feature extraction for hyperspectral imageclassificationrdquo IEEE Journal of Selected Topics in Applied EarthObservations and Remote Sensing vol 6 no 2 pp 759ndash7682013

[21] Y Ren G Zhang G Yu and X Li ldquoLocal and global structurepreserving based feature selectionrdquoNeurocomputing vol 89 pp147ndash157 2012

[22] Z Fan Y Xu and D Zhang ldquoLocal linear discriminant analysisframework using sample neighborsrdquo IEEE Transactions onNeural Networks vol 22 no 7 pp 1119ndash1132 2011

[23] Y Wang S Huang D Liu and B Wang ldquoResearch advanceon band selection-based dimension reduction of hyperspectralremote sensing imagesrdquo in Proceedings of the 2nd InternationalConference on Remote Sensing Environment and TransportationEngineering (RSETE rsquo12) pp 1ndash4 IEEE Nanjing China June2012

[24] B Waske S van der Linden J A Benediktsson A Rabe andP Hostert ldquoSensitivity of support vector machines to randomfeature selection in classification of hyperspectral datardquo IEEETransactions on Geoscience and Remote Sensing vol 48 no 7pp 2880ndash2889 2010

[25] M Sugiyama ldquoLocal fisher discriminant analysis for superviseddimensionality reductionrdquo in Proceedings of the 23rd Interna-tional Conference onMachine Learning (ICML rsquo06) pp 905ndash912ACM June 2006

[26] Y Cheng ldquoMean shift mode seeking and clusteringrdquo IEEETransactions on Pattern Analysis and Machine Intelligence vol17 no 8 pp 790ndash799 1995

[27] W Li S Prasad J E Fowler and L M Bruce ldquoLocality-preserving dimensionality reduction and classification forhyperspectral image analysisrdquo IEEE Transactions on Geoscienceand Remote Sensing vol 50 no 4 pp 1185ndash1198 2012

[28] L Zelnik-Manor and P Perona ldquoSelf-tuning spectral cluster-ingrdquo in Proceedings of the 18th Annual Conference on NeuralInformation Processing Systems pp 1601ndash1608 December 2004

[29] W K Wong and H T Zhao ldquoSupervised optimal localitypreserving projectionrdquo Pattern Recognition vol 45 no 1 pp186ndash197 2012

[30] X He and P Niyogi ldquoLocality preserving projectionsrdquo inAdvances in Neural Information Processing Systems 16 SThrunL Saul and B Scholkopf Eds MIT Press Cambridge MassUSA 2004

[31] H Wang S Chen Z Hu and W Zheng ldquoLocality-preservedmaximum information projectionrdquo IEEE Transactions on Neu-ral Networks vol 19 no 4 pp 571ndash585 2008

[32] M Sugiyama ldquoDimensionality reduction ofmultimodal labeleddata by local fisher discriminant analysisrdquo Journal of MachineLearning Research vol 8 pp 1027ndash1061 2007

[33] S Yan D Xu B Zhang H J Zhang Q Yang and S Lin ldquoGraphembedding and extensions a general framework for dimen-sionality reductionrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 29 no 1 pp 40ndash51 2007

[34] C K Chui and J Wang ldquoDimensionality reduction of hyper-spectral imagery data for feature classificationrdquo in Handbookof Geomathematics pp 1005ndash1047 Springer Heidelberg Ger-many 2010

[35] C-C Chang and C-J Lin ldquoLIBSVM a Library for supportvector machinesrdquo ACM Transactions on Intelligent Systems andTechnology vol 2 no 3 article 27 2011

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 15: Research Article Subspace Learning via Local Probability ...downloads.hindawi.com/journals/mpe/2015/145136.pdf · and applicable toolkits were engendered one a er another. Hyperspectral

Mathematical Problems in Engineering 15

837981698991

(a) PD-LFDA-7NN

848682798968

(b) D-LFDA-RBFSVM

Figure 6 Classified map of the proposed method

Table 2 Overall accuracy obtained using different dimensions offeature and different classifiers for Indian Pine Scene database ()

Item Dims7 9 11 13 15

3NNPCA 7007 6996 7026 7035 7087LPP 6575 6871 6835 6695 6546LFDA 7513 7895 7963 7992 8009LDA 6415 6606 6616 6596 6536PD-LFDA 7713 8094 816 8218 8263JGLDA 5405 5602 5704 5642 5751RP 6173 6444 6745 6632 6652

7NNPCA 6902 6902 691 6933 6972LPP 6782 7057 7027 6892 6681LFDA 7419 7777 7827 7863 7862LDA 6615 6909 6899 692 6882PD-LFDA 7736 809 8155 8172 8235JGLDA 5665 5811 5809 5812 5914RP 6194 6481 6696 6613 6603

RBF-SVMPCA 8051 7794 7718 7935 8176LPP 7101 7514 7614 7501 7314LFDA 7826 8187 82 8113 7915LDA 677 6834 6877 6642 6792PD-LFDA 7858 8267 8395 8423 8166JGLDA 5686 5606 5663 5689 5879RP 6957 7282 7441 7416 7543

also included at the top of each map respectively Figure 5displays the classifiedmaps of classicmethods in pseudocolorimages and the classified maps of proposed PD-LFDA arepresented in Figure 6 It can be observed from these mapsthat the best performance is achieved by PD-LFDA when7NN classifier is used in this case the overall accuracyis 8379 the kappa coefficient is 8169 and the averageaccuracy is 8991 Moreover the worst algorithm is JGLDAwhose overall accuracy is 6195 the kappa coefficient

Table 3 Kappa coefficient by different dimension reduction meth-ods and different classifiers applied to Indian Pine Scene database()

Item Dims7 9 11 13 15

3NNPCA 6592 658 6612 6624 6681LPP 6084 6422 6382 6219 605LFDA 7145 7586 7664 7697 7718LDA 5913 6126 6137 6118 6048PD-LFDA 7392 7824 7901 7966 8016JGLDA 4778 4992 5082 5006 5119RP 564 5963 6303 6172 6192

7NNPCA 6466 6466 6475 6501 6545LPP 6304 6629 6585 6429 6185LFDA 7047 7457 7515 7557 7557LDA 6132 646 6453 6478 6431PD-LFDA 7417 782 7893 7913 7985JGLDA 5029 519 5161 5159 5263RP 5638 5992 6237 6135 6124

RBF-SVMPCA 7767 747 7393 7636 7900LPP 6677 715 7264 7115 6921LFDA 7486 7915 7924 7822 7584LDA 629 6362 6427 6153 6323PD-LFDA 7551 8011 8159 8187 7899JGLDA 5109 5066 5133 5147 5369RP 653 6908 7085 706 7203

becomes 5651 and the average accuracy is only 6209Other methods such as PCA LPP and RP can produce thecomparable results and no one can outperform the otherHowever LDAoutperformsPCA LPP andRP yielding betterresult Similar conclusions can be achieved in the groupof RBF-SVM Generally proposed PD-LFDA significantlyoutperforms the rest in this experiment which indicates thecorrectness of improvements in proposed PD-LFDA

16 Mathematical Problems in Engineering

Table 4 Performance of dimension reduction on the whole labeled samples ()

Evaluated item MethodologiesPCA LDA LPP LFDA JGLDA RP PD-LFDA

7NNOverall accuracy 6868 739 6875 7923 6195 6709 8379Kappa coefficient 6471 7041 647 7653 5651 6294 8169Average accuracy 737 813 7607 8609 6209 7164 8991

RBF-SVMOverall accuracy 7992 7287 7674 8375 5824 7631 8486Kappa coefficient 7728 6928 7355 8151 5377 7327 8279Average accuracy 857 8071 8288 8822 7121 8365 8968

Finally details of assessment deduced by 7NN and RBF-SVM is summarized in Table 4 where the correspondingoverall accuracy kappa coefficient and average accuracy ofdifferent methods can be located collectively

5 Conclusions

In this paper we have analyzed local Fisher discriminantanalysis (LFDA) and found its weakness By replacing themaximum distance with local variance for the constructionof weight matrix introducing class prior probability into thecomputation of affinitymatrix an improved LFDA algorithmhas been proposed This novel approach is called PD-LFDAbecause the probability distribution (PD) is applied in LFDAalgorithm The proposed approach essentially can increasethe discriminant ability of transformed features in lowdimensional space The pattern found by the new approachis expected to be more accurate and coincides with thecharacter of HSI data and is conducive to classify HSI dataPD-LFDA has been evaluated on a real removing sensingAVIRIS Indian Pine 92AV3C data set We have compared theperformance of the proposed PD-LFDA with that of PCALPP LFDA LDA JGLDA and RP Both the numerical resultsand visual inspection of class maps have been obtained Inthe experiments KNN classifier and SVM classifier havebeen used We have argued that the proposed PD-LFDAexhibits the best performance and serves as a very effectivedimensionality reduction tool for high dimensional datasuch as hyperspectral image (HSI) data

Appendix

Procedure of Proposed Algorithm

The brief description of the algorithm to perform the pro-posed PD-LFDA method is already presented in Section 3The details of the algorithm are provided in Algorithm 2

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work was supported by the Research of University ofMacau under Grants no MYRG205(Y1-L4)-FST11-TYY noMYRG187(Y1-L3)-FST11-TYY and no SRG010-FST11-TYYand by the National Natural Science Foundation of Chinaunder Grant no 61273244 This research project was alsosupported by the Science andTechnologyDevelopment Fund(FDCT) of Macau under Contract no 100-2012-A3

References

[1] W Li S Prasad Z Ye J E Fowler and M Cui ldquoLocality-preserving discriminant analysis for hyperspectral image clas-sification using local spatial informationrdquo in Proceedings ofthe 32nd IEEE International Geoscience and Remote SensingSymposium (IGARSS rsquo12) pp 4134ndash4137MunichGermany July2012

[2] HND LeM S Kim andD-H Kim ldquoComparison of singularvalue decomposition and principal component analysis appliedto hyperspectral imaging of biofilmrdquo in Proceedings of the IEEEPhotonics Conference (IPC rsquo12) pp 6ndash7 2012

[3] C K Chui and J Wang ldquoRandomized anisotropic transformfor nonlinear dimensionality reductionrdquo GEMmdashInternationalJournal on Geomathematics vol 1 no 1 pp 23ndash50 2010

[4] T V Bandos L Bruzzone and G Camps-Valls ldquoClassificationof hyperspectral images with regularized linear discriminantanalysisrdquo IEEE Transactions on Geoscience and Remote Sensingvol 47 no 3 pp 862ndash873 2009

[5] D Guangjun Z Yongsheng and J Song ldquoDimensionalityreduction of hyperspectral data based on ISOMAP algorithmrdquoin Proceedings of the 8th International Conference on ElectronicMeasurement and Instruments (ICEMI rsquo07) pp 3935ndash3938Xirsquoan China August 2007

[6] X Luo and M-F Jiang ldquoThe application of manifold learn-ing in dimensionality analysis for hyperspectral imageryrdquo inProceedings of the International Conference on Remote SensingEnvironment and Transportation Engineering (RSETE rsquo11) pp4572ndash4575 June 2011

[7] J Khodr and R Younes ldquoDimensionality reduction on hyper-spectral images a comparative review based on artificial datasrdquoin Proceedings of the 4th International Congress on Image andSignal Processing (CISP rsquo11) vol 4 pp 1875ndash1883 October 2011

Mathematical Problems in Engineering 17

[8] J Wen Z Tian H She and W Yan ldquoFeature extraction ofhyperspectral images based on preserving neighborhood dis-criminant embeddingrdquo in Proceedings of the 2nd InternationalConference on Image Analysis and Signal Processing (IASP rsquo10)pp 257ndash262 Zhejiang China April 2010

[9] Y-R Yeh S-Y Huang and Y-J Lee ldquoNonlinear dimensionreduction with kernel sliced inverse regressionrdquo IEEE Trans-actions on Knowledge and Data Engineering vol 21 no 11 pp1590ndash1603 2009

[10] J He L Zhang Q Wang and Z Li ldquoUsing diffusion geometriccoordinates for hyperspectral imagery representationrdquo IEEEGeoscience and Remote Sensing Letters vol 6 no 4 pp 767ndash7712009

[11] J Peng P Zhang and N Riedel ldquoDiscriminant learninganalysisrdquo IEEE Transactions on Systems Man and CyberneticsPart B Cybernetics vol 38 no 6 pp 1614ndash1625 2008

[12] F S Tsai and K L Chan ldquoDimensionality reduction techniquesfor data explorationrdquo in Proceedings of the 6th InternationalConference on Information Communications and Signal Process-ing pp 1ndash5 December 2007

[13] M D Farrell Jr and R M Mersereau ldquoOn the impact of PCAdimension reduction for hyperspectral detection of difficulttargetsrdquo IEEE Geoscience and Remote Sensing Letters vol 2 no2 pp 192ndash195 2005

[14] S Prasad and L M Bruce ldquoLimitations of principal com-ponents analysis for hyperspectral target recognitionrdquo IEEEGeoscience and Remote Sensing Letters vol 5 no 4 pp 625ndash629 2008

[15] J Yu Q Tian T Rui and T S Huang ldquoIntegrating discriminantand descriptive information for dimension reduction and clas-sificationrdquo IEEE Transactions on Circuits and Systems for VideoTechnology vol 17 no 3 pp 372ndash377 2007

[16] J Kong S Wang J Wang L Ma B Fu and Y Lu ldquoAnovel approach for face recognition based on supervisedlocality preserving projection and maximummargin criterionrdquoin Proceedings of the International Conference on ComputerEngineering and Technology (ICCET rsquo09) vol 1 pp 419ndash423Singapore January 2009

[17] M Loog and R P W Duin ldquoLinear dimensionality reductionvia a heteroscedastic extension of LDA the Chernoff criterionrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 26 no 6 pp 732ndash739 2004

[18] A C Jensen A Berge and A S Solberg ldquoRegressionapproaches to small sample inverse covariance matrix estima-tion for hyperspectral image classificationrdquo IEEE TransactionsonGeoscience andRemote Sensing vol 46 no 10 pp 2814ndash28222008

[19] J Jin BWang and L Zhang ldquoA novel approach based on Fisherdiscriminant null space for decomposition of mixed pixels inhyperspectral imageryrdquo IEEE Geoscience and Remote SensingLetters vol 7 no 4 pp 699ndash703 2010

[20] J Wen Z Tian X Liu and W Lin ldquoNeighborhood preservingorthogonal pnmf feature extraction for hyperspectral imageclassificationrdquo IEEE Journal of Selected Topics in Applied EarthObservations and Remote Sensing vol 6 no 2 pp 759ndash7682013

[21] Y Ren G Zhang G Yu and X Li ldquoLocal and global structurepreserving based feature selectionrdquoNeurocomputing vol 89 pp147ndash157 2012

[22] Z Fan Y Xu and D Zhang ldquoLocal linear discriminant analysisframework using sample neighborsrdquo IEEE Transactions onNeural Networks vol 22 no 7 pp 1119ndash1132 2011

[23] Y Wang S Huang D Liu and B Wang ldquoResearch advanceon band selection-based dimension reduction of hyperspectralremote sensing imagesrdquo in Proceedings of the 2nd InternationalConference on Remote Sensing Environment and TransportationEngineering (RSETE rsquo12) pp 1ndash4 IEEE Nanjing China June2012

[24] B Waske S van der Linden J A Benediktsson A Rabe andP Hostert ldquoSensitivity of support vector machines to randomfeature selection in classification of hyperspectral datardquo IEEETransactions on Geoscience and Remote Sensing vol 48 no 7pp 2880ndash2889 2010

[25] M Sugiyama ldquoLocal fisher discriminant analysis for superviseddimensionality reductionrdquo in Proceedings of the 23rd Interna-tional Conference onMachine Learning (ICML rsquo06) pp 905ndash912ACM June 2006

[26] Y Cheng ldquoMean shift mode seeking and clusteringrdquo IEEETransactions on Pattern Analysis and Machine Intelligence vol17 no 8 pp 790ndash799 1995

[27] W Li S Prasad J E Fowler and L M Bruce ldquoLocality-preserving dimensionality reduction and classification forhyperspectral image analysisrdquo IEEE Transactions on Geoscienceand Remote Sensing vol 50 no 4 pp 1185ndash1198 2012

[28] L Zelnik-Manor and P Perona ldquoSelf-tuning spectral cluster-ingrdquo in Proceedings of the 18th Annual Conference on NeuralInformation Processing Systems pp 1601ndash1608 December 2004

[29] W K Wong and H T Zhao ldquoSupervised optimal localitypreserving projectionrdquo Pattern Recognition vol 45 no 1 pp186ndash197 2012

[30] X He and P Niyogi ldquoLocality preserving projectionsrdquo inAdvances in Neural Information Processing Systems 16 SThrunL Saul and B Scholkopf Eds MIT Press Cambridge MassUSA 2004

[31] H Wang S Chen Z Hu and W Zheng ldquoLocality-preservedmaximum information projectionrdquo IEEE Transactions on Neu-ral Networks vol 19 no 4 pp 571ndash585 2008

[32] M Sugiyama ldquoDimensionality reduction ofmultimodal labeleddata by local fisher discriminant analysisrdquo Journal of MachineLearning Research vol 8 pp 1027ndash1061 2007

[33] S Yan D Xu B Zhang H J Zhang Q Yang and S Lin ldquoGraphembedding and extensions a general framework for dimen-sionality reductionrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 29 no 1 pp 40ndash51 2007

[34] C K Chui and J Wang ldquoDimensionality reduction of hyper-spectral imagery data for feature classificationrdquo in Handbookof Geomathematics pp 1005ndash1047 Springer Heidelberg Ger-many 2010

[35] C-C Chang and C-J Lin ldquoLIBSVM a Library for supportvector machinesrdquo ACM Transactions on Intelligent Systems andTechnology vol 2 no 3 article 27 2011

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 16: Research Article Subspace Learning via Local Probability ...downloads.hindawi.com/journals/mpe/2015/145136.pdf · and applicable toolkits were engendered one a er another. Hyperspectral

16 Mathematical Problems in Engineering

Table 4 Performance of dimension reduction on the whole labeled samples ()

Evaluated item MethodologiesPCA LDA LPP LFDA JGLDA RP PD-LFDA

7NNOverall accuracy 6868 739 6875 7923 6195 6709 8379Kappa coefficient 6471 7041 647 7653 5651 6294 8169Average accuracy 737 813 7607 8609 6209 7164 8991

RBF-SVMOverall accuracy 7992 7287 7674 8375 5824 7631 8486Kappa coefficient 7728 6928 7355 8151 5377 7327 8279Average accuracy 857 8071 8288 8822 7121 8365 8968

Finally details of assessment deduced by 7NN and RBF-SVM is summarized in Table 4 where the correspondingoverall accuracy kappa coefficient and average accuracy ofdifferent methods can be located collectively

5 Conclusions

In this paper we have analyzed local Fisher discriminantanalysis (LFDA) and found its weakness By replacing themaximum distance with local variance for the constructionof weight matrix introducing class prior probability into thecomputation of affinitymatrix an improved LFDA algorithmhas been proposed This novel approach is called PD-LFDAbecause the probability distribution (PD) is applied in LFDAalgorithm The proposed approach essentially can increasethe discriminant ability of transformed features in lowdimensional space The pattern found by the new approachis expected to be more accurate and coincides with thecharacter of HSI data and is conducive to classify HSI dataPD-LFDA has been evaluated on a real removing sensingAVIRIS Indian Pine 92AV3C data set We have compared theperformance of the proposed PD-LFDA with that of PCALPP LFDA LDA JGLDA and RP Both the numerical resultsand visual inspection of class maps have been obtained Inthe experiments KNN classifier and SVM classifier havebeen used We have argued that the proposed PD-LFDAexhibits the best performance and serves as a very effectivedimensionality reduction tool for high dimensional datasuch as hyperspectral image (HSI) data

Appendix

Procedure of Proposed Algorithm

The brief description of the algorithm to perform the pro-posed PD-LFDA method is already presented in Section 3The details of the algorithm are provided in Algorithm 2

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work was supported by the Research of University ofMacau under Grants no MYRG205(Y1-L4)-FST11-TYY noMYRG187(Y1-L3)-FST11-TYY and no SRG010-FST11-TYYand by the National Natural Science Foundation of Chinaunder Grant no 61273244 This research project was alsosupported by the Science andTechnologyDevelopment Fund(FDCT) of Macau under Contract no 100-2012-A3

References

[1] W Li S Prasad Z Ye J E Fowler and M Cui ldquoLocality-preserving discriminant analysis for hyperspectral image clas-sification using local spatial informationrdquo in Proceedings ofthe 32nd IEEE International Geoscience and Remote SensingSymposium (IGARSS rsquo12) pp 4134ndash4137MunichGermany July2012

[2] HND LeM S Kim andD-H Kim ldquoComparison of singularvalue decomposition and principal component analysis appliedto hyperspectral imaging of biofilmrdquo in Proceedings of the IEEEPhotonics Conference (IPC rsquo12) pp 6ndash7 2012

[3] C K Chui and J Wang ldquoRandomized anisotropic transformfor nonlinear dimensionality reductionrdquo GEMmdashInternationalJournal on Geomathematics vol 1 no 1 pp 23ndash50 2010

[4] T V Bandos L Bruzzone and G Camps-Valls ldquoClassificationof hyperspectral images with regularized linear discriminantanalysisrdquo IEEE Transactions on Geoscience and Remote Sensingvol 47 no 3 pp 862ndash873 2009

[5] D Guangjun Z Yongsheng and J Song ldquoDimensionalityreduction of hyperspectral data based on ISOMAP algorithmrdquoin Proceedings of the 8th International Conference on ElectronicMeasurement and Instruments (ICEMI rsquo07) pp 3935ndash3938Xirsquoan China August 2007

[6] X Luo and M-F Jiang ldquoThe application of manifold learn-ing in dimensionality analysis for hyperspectral imageryrdquo inProceedings of the International Conference on Remote SensingEnvironment and Transportation Engineering (RSETE rsquo11) pp4572ndash4575 June 2011

[7] J Khodr and R Younes ldquoDimensionality reduction on hyper-spectral images a comparative review based on artificial datasrdquoin Proceedings of the 4th International Congress on Image andSignal Processing (CISP rsquo11) vol 4 pp 1875ndash1883 October 2011

Mathematical Problems in Engineering 17

[8] J Wen Z Tian H She and W Yan ldquoFeature extraction ofhyperspectral images based on preserving neighborhood dis-criminant embeddingrdquo in Proceedings of the 2nd InternationalConference on Image Analysis and Signal Processing (IASP rsquo10)pp 257ndash262 Zhejiang China April 2010

[9] Y-R Yeh S-Y Huang and Y-J Lee ldquoNonlinear dimensionreduction with kernel sliced inverse regressionrdquo IEEE Trans-actions on Knowledge and Data Engineering vol 21 no 11 pp1590ndash1603 2009

[10] J He L Zhang Q Wang and Z Li ldquoUsing diffusion geometriccoordinates for hyperspectral imagery representationrdquo IEEEGeoscience and Remote Sensing Letters vol 6 no 4 pp 767ndash7712009

[11] J Peng P Zhang and N Riedel ldquoDiscriminant learninganalysisrdquo IEEE Transactions on Systems Man and CyberneticsPart B Cybernetics vol 38 no 6 pp 1614ndash1625 2008

[12] F S Tsai and K L Chan ldquoDimensionality reduction techniquesfor data explorationrdquo in Proceedings of the 6th InternationalConference on Information Communications and Signal Process-ing pp 1ndash5 December 2007

[13] M D Farrell Jr and R M Mersereau ldquoOn the impact of PCAdimension reduction for hyperspectral detection of difficulttargetsrdquo IEEE Geoscience and Remote Sensing Letters vol 2 no2 pp 192ndash195 2005

[14] S Prasad and L M Bruce ldquoLimitations of principal com-ponents analysis for hyperspectral target recognitionrdquo IEEEGeoscience and Remote Sensing Letters vol 5 no 4 pp 625ndash629 2008

[15] J Yu Q Tian T Rui and T S Huang ldquoIntegrating discriminantand descriptive information for dimension reduction and clas-sificationrdquo IEEE Transactions on Circuits and Systems for VideoTechnology vol 17 no 3 pp 372ndash377 2007

[16] J Kong S Wang J Wang L Ma B Fu and Y Lu ldquoAnovel approach for face recognition based on supervisedlocality preserving projection and maximummargin criterionrdquoin Proceedings of the International Conference on ComputerEngineering and Technology (ICCET rsquo09) vol 1 pp 419ndash423Singapore January 2009

[17] M Loog and R P W Duin ldquoLinear dimensionality reductionvia a heteroscedastic extension of LDA the Chernoff criterionrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 26 no 6 pp 732ndash739 2004

[18] A C Jensen A Berge and A S Solberg ldquoRegressionapproaches to small sample inverse covariance matrix estima-tion for hyperspectral image classificationrdquo IEEE TransactionsonGeoscience andRemote Sensing vol 46 no 10 pp 2814ndash28222008

[19] J Jin BWang and L Zhang ldquoA novel approach based on Fisherdiscriminant null space for decomposition of mixed pixels inhyperspectral imageryrdquo IEEE Geoscience and Remote SensingLetters vol 7 no 4 pp 699ndash703 2010

[20] J Wen Z Tian X Liu and W Lin ldquoNeighborhood preservingorthogonal pnmf feature extraction for hyperspectral imageclassificationrdquo IEEE Journal of Selected Topics in Applied EarthObservations and Remote Sensing vol 6 no 2 pp 759ndash7682013

[21] Y Ren G Zhang G Yu and X Li ldquoLocal and global structurepreserving based feature selectionrdquoNeurocomputing vol 89 pp147ndash157 2012

[22] Z Fan Y Xu and D Zhang ldquoLocal linear discriminant analysisframework using sample neighborsrdquo IEEE Transactions onNeural Networks vol 22 no 7 pp 1119ndash1132 2011

[23] Y Wang S Huang D Liu and B Wang ldquoResearch advanceon band selection-based dimension reduction of hyperspectralremote sensing imagesrdquo in Proceedings of the 2nd InternationalConference on Remote Sensing Environment and TransportationEngineering (RSETE rsquo12) pp 1ndash4 IEEE Nanjing China June2012

[24] B Waske S van der Linden J A Benediktsson A Rabe andP Hostert ldquoSensitivity of support vector machines to randomfeature selection in classification of hyperspectral datardquo IEEETransactions on Geoscience and Remote Sensing vol 48 no 7pp 2880ndash2889 2010

[25] M Sugiyama ldquoLocal fisher discriminant analysis for superviseddimensionality reductionrdquo in Proceedings of the 23rd Interna-tional Conference onMachine Learning (ICML rsquo06) pp 905ndash912ACM June 2006

[26] Y Cheng ldquoMean shift mode seeking and clusteringrdquo IEEETransactions on Pattern Analysis and Machine Intelligence vol17 no 8 pp 790ndash799 1995

[27] W Li S Prasad J E Fowler and L M Bruce ldquoLocality-preserving dimensionality reduction and classification forhyperspectral image analysisrdquo IEEE Transactions on Geoscienceand Remote Sensing vol 50 no 4 pp 1185ndash1198 2012

[28] L Zelnik-Manor and P Perona ldquoSelf-tuning spectral cluster-ingrdquo in Proceedings of the 18th Annual Conference on NeuralInformation Processing Systems pp 1601ndash1608 December 2004

[29] W K Wong and H T Zhao ldquoSupervised optimal localitypreserving projectionrdquo Pattern Recognition vol 45 no 1 pp186ndash197 2012

[30] X He and P Niyogi ldquoLocality preserving projectionsrdquo inAdvances in Neural Information Processing Systems 16 SThrunL Saul and B Scholkopf Eds MIT Press Cambridge MassUSA 2004

[31] H Wang S Chen Z Hu and W Zheng ldquoLocality-preservedmaximum information projectionrdquo IEEE Transactions on Neu-ral Networks vol 19 no 4 pp 571ndash585 2008

[32] M Sugiyama ldquoDimensionality reduction ofmultimodal labeleddata by local fisher discriminant analysisrdquo Journal of MachineLearning Research vol 8 pp 1027ndash1061 2007

[33] S Yan D Xu B Zhang H J Zhang Q Yang and S Lin ldquoGraphembedding and extensions a general framework for dimen-sionality reductionrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 29 no 1 pp 40ndash51 2007

[34] C K Chui and J Wang ldquoDimensionality reduction of hyper-spectral imagery data for feature classificationrdquo in Handbookof Geomathematics pp 1005ndash1047 Springer Heidelberg Ger-many 2010

[35] C-C Chang and C-J Lin ldquoLIBSVM a Library for supportvector machinesrdquo ACM Transactions on Intelligent Systems andTechnology vol 2 no 3 article 27 2011

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 17: Research Article Subspace Learning via Local Probability ...downloads.hindawi.com/journals/mpe/2015/145136.pdf · and applicable toolkits were engendered one a er another. Hyperspectral

Mathematical Problems in Engineering 17

[8] J Wen Z Tian H She and W Yan ldquoFeature extraction ofhyperspectral images based on preserving neighborhood dis-criminant embeddingrdquo in Proceedings of the 2nd InternationalConference on Image Analysis and Signal Processing (IASP rsquo10)pp 257ndash262 Zhejiang China April 2010

[9] Y-R Yeh S-Y Huang and Y-J Lee ldquoNonlinear dimensionreduction with kernel sliced inverse regressionrdquo IEEE Trans-actions on Knowledge and Data Engineering vol 21 no 11 pp1590ndash1603 2009

[10] J He L Zhang Q Wang and Z Li ldquoUsing diffusion geometriccoordinates for hyperspectral imagery representationrdquo IEEEGeoscience and Remote Sensing Letters vol 6 no 4 pp 767ndash7712009

[11] J Peng P Zhang and N Riedel ldquoDiscriminant learninganalysisrdquo IEEE Transactions on Systems Man and CyberneticsPart B Cybernetics vol 38 no 6 pp 1614ndash1625 2008

[12] F S Tsai and K L Chan ldquoDimensionality reduction techniquesfor data explorationrdquo in Proceedings of the 6th InternationalConference on Information Communications and Signal Process-ing pp 1ndash5 December 2007

[13] M D Farrell Jr and R M Mersereau ldquoOn the impact of PCAdimension reduction for hyperspectral detection of difficulttargetsrdquo IEEE Geoscience and Remote Sensing Letters vol 2 no2 pp 192ndash195 2005

[14] S Prasad and L M Bruce ldquoLimitations of principal com-ponents analysis for hyperspectral target recognitionrdquo IEEEGeoscience and Remote Sensing Letters vol 5 no 4 pp 625ndash629 2008

[15] J Yu Q Tian T Rui and T S Huang ldquoIntegrating discriminantand descriptive information for dimension reduction and clas-sificationrdquo IEEE Transactions on Circuits and Systems for VideoTechnology vol 17 no 3 pp 372ndash377 2007

[16] J Kong S Wang J Wang L Ma B Fu and Y Lu ldquoAnovel approach for face recognition based on supervisedlocality preserving projection and maximummargin criterionrdquoin Proceedings of the International Conference on ComputerEngineering and Technology (ICCET rsquo09) vol 1 pp 419ndash423Singapore January 2009

[17] M Loog and R P W Duin ldquoLinear dimensionality reductionvia a heteroscedastic extension of LDA the Chernoff criterionrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 26 no 6 pp 732ndash739 2004

[18] A C Jensen A Berge and A S Solberg ldquoRegressionapproaches to small sample inverse covariance matrix estima-tion for hyperspectral image classificationrdquo IEEE TransactionsonGeoscience andRemote Sensing vol 46 no 10 pp 2814ndash28222008

[19] J Jin BWang and L Zhang ldquoA novel approach based on Fisherdiscriminant null space for decomposition of mixed pixels inhyperspectral imageryrdquo IEEE Geoscience and Remote SensingLetters vol 7 no 4 pp 699ndash703 2010

[20] J Wen Z Tian X Liu and W Lin ldquoNeighborhood preservingorthogonal pnmf feature extraction for hyperspectral imageclassificationrdquo IEEE Journal of Selected Topics in Applied EarthObservations and Remote Sensing vol 6 no 2 pp 759ndash7682013

[21] Y Ren G Zhang G Yu and X Li ldquoLocal and global structurepreserving based feature selectionrdquoNeurocomputing vol 89 pp147ndash157 2012

[22] Z Fan Y Xu and D Zhang ldquoLocal linear discriminant analysisframework using sample neighborsrdquo IEEE Transactions onNeural Networks vol 22 no 7 pp 1119ndash1132 2011

[23] Y Wang S Huang D Liu and B Wang ldquoResearch advanceon band selection-based dimension reduction of hyperspectralremote sensing imagesrdquo in Proceedings of the 2nd InternationalConference on Remote Sensing Environment and TransportationEngineering (RSETE rsquo12) pp 1ndash4 IEEE Nanjing China June2012

[24] B Waske S van der Linden J A Benediktsson A Rabe andP Hostert ldquoSensitivity of support vector machines to randomfeature selection in classification of hyperspectral datardquo IEEETransactions on Geoscience and Remote Sensing vol 48 no 7pp 2880ndash2889 2010

[25] M Sugiyama ldquoLocal fisher discriminant analysis for superviseddimensionality reductionrdquo in Proceedings of the 23rd Interna-tional Conference onMachine Learning (ICML rsquo06) pp 905ndash912ACM June 2006

[26] Y Cheng ldquoMean shift mode seeking and clusteringrdquo IEEETransactions on Pattern Analysis and Machine Intelligence vol17 no 8 pp 790ndash799 1995

[27] W Li S Prasad J E Fowler and L M Bruce ldquoLocality-preserving dimensionality reduction and classification forhyperspectral image analysisrdquo IEEE Transactions on Geoscienceand Remote Sensing vol 50 no 4 pp 1185ndash1198 2012

[28] L Zelnik-Manor and P Perona ldquoSelf-tuning spectral cluster-ingrdquo in Proceedings of the 18th Annual Conference on NeuralInformation Processing Systems pp 1601ndash1608 December 2004

[29] W K Wong and H T Zhao ldquoSupervised optimal localitypreserving projectionrdquo Pattern Recognition vol 45 no 1 pp186ndash197 2012

[30] X He and P Niyogi ldquoLocality preserving projectionsrdquo inAdvances in Neural Information Processing Systems 16 SThrunL Saul and B Scholkopf Eds MIT Press Cambridge MassUSA 2004

[31] H Wang S Chen Z Hu and W Zheng ldquoLocality-preservedmaximum information projectionrdquo IEEE Transactions on Neu-ral Networks vol 19 no 4 pp 571ndash585 2008

[32] M Sugiyama ldquoDimensionality reduction ofmultimodal labeleddata by local fisher discriminant analysisrdquo Journal of MachineLearning Research vol 8 pp 1027ndash1061 2007

[33] S Yan D Xu B Zhang H J Zhang Q Yang and S Lin ldquoGraphembedding and extensions a general framework for dimen-sionality reductionrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 29 no 1 pp 40ndash51 2007

[34] C K Chui and J Wang ldquoDimensionality reduction of hyper-spectral imagery data for feature classificationrdquo in Handbookof Geomathematics pp 1005ndash1047 Springer Heidelberg Ger-many 2010

[35] C-C Chang and C-J Lin ldquoLIBSVM a Library for supportvector machinesrdquo ACM Transactions on Intelligent Systems andTechnology vol 2 no 3 article 27 2011

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 18: Research Article Subspace Learning via Local Probability ...downloads.hindawi.com/journals/mpe/2015/145136.pdf · and applicable toolkits were engendered one a er another. Hyperspectral

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of