
Classification of Hyperspectral Images Based on Multiclass Spatial–Spectral Generative Adversarial Networks

Jie Feng, Member, IEEE, Haipeng Yu, Lin Wang, Xianghai Cao, Member, IEEE, Xiangrong Zhang, Senior Member, IEEE, and Licheng Jiao, Fellow, IEEE

Abstract— Generative adversarial networks (GANs) are famous for generating samples by training a generator and a discriminator via an adversarial procedure. For hyperspectral image classification, the collection of samples is always difficult. However, directly applying GAN to hyperspectral image classification raises two problems. One is that the generated samples lack discriminative information and, meanwhile, the discriminator has no discriminative ability for multiclassification. The other is that spatial and spectral information needs to be considered simultaneously in hyperspectral image classification. To address these problems, a novel multiclass spatial–spectral GAN (MSGAN) method is proposed. In MSGAN, two generators are devised to generate samples containing spatial and spectral information, respectively, and the discriminator is devised to extract joint spatial–spectral features and output multiclass probabilities. Moreover, novel adversarial objectives for multiclass are defined. The discriminator is devised to predict training samples as belonging to their true classes and generated samples as belonging to all the classes with the same probability, while the generators are devised to make the discriminator mistake. By adversarial learning between the discriminator and generators, the classification performance of the discriminator is promoted with the assistance of discriminative generated samples. Experimental results on hyperspectral images demonstrate that the proposed method achieves encouraging classification performance compared with several state-of-the-art methods, especially with limited training samples.

Index Terms— Adversarial learning, convolutional neural network (CNN), generative adversarial networks (GANs), hyperspectral images, spatial–spectral information.

Manuscript received December 25, 2018; revised January 17, 2019; accepted January 23, 2019. This work was supported in part by the National Natural Science Foundation of China under Grant 61871306, Grant 61772400, and Grant 61773304, in part by the Project Funded by China Postdoctoral Science Foundation under Grant 2015M570816 and Grant 2016T90892, in part by the State Key Program of National Natural Science of China under Grant 61836009, in part by the Open Research Fund of the Key Laboratory of Spectral Imaging Technology, Chinese Academy of Sciences, under Grant LSIT201803D, in part by the Fundamental Research Funds for the Central Universities under Grant JBX181707, in part by the Postdoctoral Research Program in Shaanxi Province of China, and in part by the Joint Fund of the Equipment Research of the Ministry of Education. (Corresponding author: Jie Feng.)

The authors are with the Key Laboratory of Intelligent Perception and Image Understanding, Ministry of Education of China, Xidian University, Xi'an 710071, China (e-mail: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TGRS.2019.2899057

I. INTRODUCTION

THE development of hyperspectral sensors is a significant breakthrough in remote sensing. Hyperspectral sensors collect images in hundreds of narrow and contiguous spectral bands simultaneously, with wavelengths spanning the visible to infrared spectrum [1]. In hyperspectral images, the high spectral resolution improves the ability to describe and differentiate the land-cover classes of interest. Nowadays, hyperspectral images have been widely applied in various fields, such as agriculture [2], military [3], and astronomy [4].

Classification is a commonly used technique in different applications of hyperspectral images. There are two crucial problems involved in the classification task: suitable feature representation and effective classifier design [5]. Traditional classification methods usually address them individually. In the classification of hyperspectral images, the high-dimensional spectral bands usually lead to the "Hughes phenomenon" [6]. Therefore, feature representation in hyperspectral images focuses on transforming the original high-dimensional data into an appropriate low-dimensional space, for example, by principal component analysis (PCA) [7], locality-preserving projection [8], sparse graph learning [9], nonparametric weighted feature extraction [10], and discriminative local metric learning [11]. A series of spatial–spectral feature extraction methods was proposed, such as 3-D Gabor filters [12] and the 3-D morphological profile [13]. In [14], a novel controlled random sampling strategy was proposed to provide a more objective and accurate evaluation for spatial–spectral hyperspectral image classification methods. In [15], a novel local covariance matrix representation method was proposed to capture the spectral correlation and spatial contextual information for spatial–spectral feature extraction of HSIs. In [16], multiscale covariance maps involving the covariance between spectra in local spatial windows were devised to obtain spatial–spectral information. In [17], a novel extinction profile fusion method was proposed to use the information within and among extinction profiles for HSI classification. The representative classifiers of hyperspectral images include k-nearest neighbors [18], logistic regression (LR) [19], and the support vector machine (SVM) [20]. Among these methods, SVM shows outstanding performance by maximizing the margin between different classes.


Some improved SVM methods were proposed, such as the Laplacian SVM [21], customizing kernel SVM [22], and parallel multiclass SVM [23]. A novel SVM-based band weight strategy [24] was proposed to maximize the interclass distance and improve the classification performance.

Deep learning-based methods have shown great potential in the classification of hyperspectral images during the last decade. Compared with the abovementioned traditional classification methods [7]–[24], deep learning-based methods extract hierarchical features and train the classifier simultaneously. In [25], a deep stacked autoencoder (SAE) method was proposed for hyperspectral image classification, which combines PCA-based dimensionality reduction, hierarchical feature extraction, and LR classification. The variants of SAE include the sparse SAE [26], denoising SAE [27], and Laplacian SAE [28]. In [29], a deep belief network (DBN) was constructed for hyperspectral image classification by learning the restricted Boltzmann machine network layer by layer. These two kinds of deep learning networks can extract nonlinear features that are invariant to local scattering changes in hyperspectral images [25]. However, the full connections in these networks introduce many parameters. Compared with SAE and DBN, the convolutional neural network (CNN) [30] uses local connections to extract the spatial information and weight sharing to decrease the number of parameters. Due to this superiority, a series of CNN methods [31]–[33] has emerged for hyperspectral image classification. However, the performance of these CNN methods [31]–[33] depends greatly on the quantity of training samples, and the collection of training samples is generally difficult for hyperspectral images. In [34], a pixel-pair CNN (PPF-CNN) method was proposed to expand the training set by reorganizing and relabeling existing training samples. The small sample size problem can also be alleviated by utilizing the sample generation of generative models.

Recently, a representative generative model, the generative adversarial network (GAN) [35], has drawn increasing attention due to its capability of generating high-quality samples. GAN consists of two networks, called the generator and the discriminator. The discriminator aims to determine whether its input comes from generated or real samples, while the generator aims to make the discriminator mistake. The generator and discriminator are optimized alternately with the adversarial objective. A series of improved GAN methods has been developed [36]–[43]. Some methods focus on improving the network structure of GAN, such as the deep convolutional GAN (DCGAN) [36], Laplacian GAN [37], and generative recurrent adversarial networks (GRANs) [38]. DCGAN constructs the networks using a CNN without pooling layers or fully connected layers. Laplacian GAN uses a cascade of convolutional networks within a Laplacian pyramid structure. Different from DCGAN and Laplacian GAN, GRAN constructs the generator with a recurrent feedback loop. Other methods focus on addressing the problem of GAN mode collapse. The generator takes a random noise as its input, which easily leads to uncontrollable training; therefore, some extra information is added to the input of the generator, as in the conditional GAN [39] and the information maximizing GAN [40].

By adversarial learning, these GAN methods [35]–[43] can generate samples effectively. However, since the discriminator of these methods lacks multiclass discriminative ability, they cannot be directly used for classification. To achieve the classification of hyperspectral images, some GAN-based networks have been proposed [44]–[46]. In [44], a GAN-based semisupervised learning method, abbreviated as HSGAN, was proposed. In HSGAN, unlabeled training samples are first used to train a 1DGAN; then, the trained discriminator is fine-tuned for classification by using labeled training samples. However, HSGAN is based on a 1DGAN, which does not make full use of spatial information. In [45], a 3DGAN method was proposed for hyperspectral image classification. 3DGAN [45] is realized in a simplified way: only three principal components are retained as the inputs of 3DGAN, and the convolution does not actually slide among the spectra. In 3DGAN, although both sigmoid and soft-max classifiers are applied in the discriminator D, the objective functions of G and D are adversarial only in the branch of the binary sigmoid classifier and not in the branch of the soft-max classifier, so the multiclass discriminative ability of D cannot be effectively promoted by adversarial learning. Therefore, it remains a challenging subject to apply GAN to extract spatial–spectral features and achieve multiclass classification of hyperspectral images simultaneously.

In this paper, a novel multiclass spatial–spectral GAN (MSGAN) method is proposed for hyperspectral image classification. In MSGAN, two generators, G1 and G2, are designed to generate spectra by a 1-D transposed convolutional network (1D-TCN) and spatial patches by a 2D-TCN, respectively. The discriminator is designed to extract joint spatial–spectral features from the spectra and spatial patches by using a 1-D CNN and a 2-D CNN, and to output multiclass probabilities through a soft-max classifier. The objective of the discriminator is defined to predict an input from the training samples as belonging to one of the true classes and an input from the generators G1 and G2 as belonging to all the classes with the same probability. In the generators, the adversarial objective is designed to make the discriminator mistake. Adversarial learning in multiclass improves the classification performance of the discriminator with the assistance of discriminative generated samples. Finally, the well-trained discriminator is directly used as the classifier. Furthermore, the conditional label information is appended to the input of the generators, and the batch normalization strategy [47] is used to avoid mode collapse and improve the stability of MSGAN.

The main contributions of this paper can be summarized as follows.

1) MSGAN achieves a GAN-based end-to-end multiclassification. It combines spatial and spectral sample generation, joint spatial–spectral feature extraction, and classification into a unified optimization procedure.

2) MSGAN defines adversarial objectives for multiclass between the generators and the discriminator. Compared with 3DGAN [45], MSGAN uses adversarial learning in multiclass to further improve the multiclass discriminative ability of the discriminator.

3) For the characteristic of the 3-D data cube in hyperspectral images, MSGAN not only generates samples with 1-D spectra and 2-D spatial patches but also extracts joint spatial–spectral features. Compared with HSGAN [44], MSGAN implements an end-to-end GAN-based classification and utilizes spatial information to promote the classification performance.


Fig. 1. Framework of GAN.


4) MSGAN alleviates the small sample size problem of hyperspectral images by making full use of multiclass generated samples and adversarial learning.

The rest of this paper is organized as follows. Section II reviews GAN briefly. Section III describes the procedure of the proposed MSGAN method in detail. Afterward, the experimental validation and corresponding analysis on several hyperspectral data sets are presented in Section IV. Finally, some concluding remarks and suggestions for further work are provided in Section V.

II. REVIEW OF GENERATIVE ADVERSARIAL NETWORKS

Before introducing the proposed MSGAN method, we first briefly review the architecture of GAN.

Inspired by the Nash equilibrium of game theory [48], Goodfellow et al. [35] proposed GANs. As shown in Fig. 1, GAN includes two networks, one called the generator G and the other called the discriminator D. The objective of G is to learn the distribution p_data of the real data x and generate data that follows this distribution. G takes random Gaussian noise z as input and maps z into G(z). The inputs of the discriminator D include both the real data x and the generated data G(z). The output of D indicates the probability that the input of D derives from real rather than generated data. In brief, G wants to fool D and make it mistake, and D wants to distinguish between real and generated data as accurately as possible.

The optimization process of GAN is to find the Nash equilibrium between G and D. It is a minimax game problem. Its objective function is defined as follows [35]:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))] \tag{1}$$

where $p_z(z)$ indicates the distribution of the noise $z$, and $D(x)$ and $D(G(z))$ represent the outputs of $D$ when the inputs are the real samples $x$ and the generated samples $G(z)$, respectively.

The training of GAN adopts an alternating optimization strategy between the generator G and the discriminator D. In one iteration, D is optimized k times by maximizing $\mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$ with G fixed; then, G is optimized once by minimizing $\mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$ with D fixed. The adversarial training makes it possible for G and D to promote each other. After many iterations of training, the global optimum $p_{\mathrm{data}} = p_g$ is reached, where $p_g$ denotes the distribution learned by G. In this case, G has learned the distribution of the real data, and meanwhile, the ability of D to distinguish between real and generated data has improved.
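To make this alternating scheme concrete, the following is a minimal sketch of one optimization round for the objective in (1), assuming TensorFlow 2.x in eager mode. `G` and `D` stand for prebuilt Keras models (hypothetical here), and the generator step uses the commonly adopted non-saturating loss (maximizing log D(G(z))) instead of directly minimizing log(1 - D(G(z))).

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def gan_round(x_real, G, D, g_opt, d_opt, noise_dim=100, k=1):
    n = x_real.shape[0]
    # D-steps: maximize E[log D(x)] + E[log(1 - D(G(z)))] with G fixed,
    # written as minimizing the equivalent binary cross entropy.
    for _ in range(k):
        z = tf.random.normal((n, noise_dim))
        with tf.GradientTape() as tape:
            d_real = D(x_real, training=True)
            d_fake = D(G(z, training=False), training=True)
            d_loss = bce(tf.ones_like(d_real), d_real) + \
                     bce(tf.zeros_like(d_fake), d_fake)
        d_opt.apply_gradients(zip(tape.gradient(d_loss, D.trainable_variables),
                                  D.trainable_variables))
    # G-step: with D fixed, push D(G(z)) toward the "real" decision.
    z = tf.random.normal((n, noise_dim))
    with tf.GradientTape() as tape:
        d_fake = D(G(z, training=True), training=False)
        g_loss = bce(tf.ones_like(d_fake), d_fake)
    g_opt.apply_gradients(zip(tape.gradient(g_loss, G.trainable_variables),
                              G.trainable_variables))
```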

III. MULTICLASS SPATIAL–SPECTRAL GAN

The flowchart of the proposed MSGAN method is shown in Fig. 2. As shown in Fig. 2, MSGAN consists of three parts: 1D-TCN-based spectrum generation through the generator G1, 2D-TCN-based spatial patch generation through the generator G2, and joint spatial–spectral feature extraction and classification through the discriminator D.

In hyperspectral images, the spectra of pixels are denoted by $x^{\mathrm{spe}} = \{x^{\mathrm{spe}}_i\}_{i=1}^{N}$. The corresponding $w \times w$ spatial neighborhood patches of pixels are obtained from the hyperspectral image, after PCA is applied to extract several principal components, and are denoted by $x^{\mathrm{spa}} = \{x^{\mathrm{spa}}_i\}_{i=1}^{N}$. Then, $x^{\mathrm{spe}}$ and $x^{\mathrm{spa}}$ constitute the training sample set $x = \{(x^{\mathrm{spe}}_i, x^{\mathrm{spa}}_i)\}_{i=1}^{N}$, where $N$ is the number of training samples, and $y = \{y_i\}_{i=1}^{N}$ represents the corresponding one-hot coded labels of the training samples.

At the sample generation stage, random noises $z_1$ and $z_2$ are used as the inputs of G1 and G2, respectively. Moreover, the class labels $y$ are added to the inputs to stabilize the generators G1 and G2 and avoid mode collapse. After training, multiclass spectra $G_1(z_1, y) = \{G_1(z_{1i}, y_i)\}_{i=1}^{N}$ and spatial patches $G_2(z_2, y) = \{G_2(z_{2i}, y_i)\}_{i=1}^{N}$ are generated, where $G_1(z_1, y)$ are generated by the 1D-TCN and $G_2(z_2, y)$ by the 2D-TCN.
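As an illustration of this sample construction, the sketch below extracts per-pixel spectra and $w \times w$ patches from the first few principal components; NumPy and scikit-learn are assumed, and all names (`build_samples`, `cube`, `coords`) are illustrative rather than taken from the paper.

```python
import numpy as np
from sklearn.decomposition import PCA

def build_samples(cube, coords, w=27, n_pc=3):
    """cube: (H, W, B) hyperspectral image; coords: list of (row, col) pixels."""
    H, W, B = cube.shape
    # Project all pixels onto the first n_pc principal components.
    pcs = PCA(n_components=n_pc).fit_transform(cube.reshape(-1, B))
    pcs = pcs.reshape(H, W, n_pc)
    r = w // 2
    padded = np.pad(pcs, ((r, r), (r, r), (0, 0)), mode="reflect")
    x_spe = np.array([cube[i, j, :] for i, j in coords])                 # spectra
    x_spa = np.array([padded[i:i + w, j:j + w, :] for i, j in coords])   # patches
    return x_spe, x_spa
```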

At the feature extraction and classification stage, the discriminator D with a soft-max layer extracts joint spatial–spectral features and implements multiclassification. The discriminator is devised to predict the training samples as belonging to one of the multiple classes and the generated samples from G1 and G2 as belonging to all the classes with the same probability. After the adversarial training, the discriminator D is directly utilized to implement multiclassification.

A. 1D-TCN-Based Spectrum and 2D-TCN-Based Spatial Patch Generation Through G1 and G2 in MSGAN

Generators G1 and G2 are used to generate multiclass spectra and spatial patches, which contain spectral and spatial information, respectively. As shown in Fig. 3, both generators G1 and G2 are constructed by the TCN [49]. Specifically, a 1-D TCN is devised to generate the spectra $G_1(z_1, y)$, and a 2-D TCN is used to generate the spatial patches $G_2(z_2, y)$. The extra label information $y$ is appended to the inputs of G1 and G2 to avoid potential mode collapse.

The objective of generators G1 and G2 is to learn the distribution of the training samples. It is achieved by making the discriminator predict the label of each generated sample $(G_1(z_{1i}, y_i), G_2(z_{2i}, y_i))$ as $y_i$. After the training of G1 and G2, the distribution of the generated samples $\{(G_1(z_{1i}, y_i), G_2(z_{2i}, y_i))\}_{i=1}^{N}$ is closer to that of the training samples $\{(x^{\mathrm{spe}}_i, x^{\mathrm{spa}}_i)\}_{i=1}^{N}$.

In generators G1 and G2, four 1-D and 2-D transposed convolution layers, respectively, are stacked layer by layer.


Fig. 2. Flowchart of the proposed MSGAN method.

Fig. 3. Construction of generators G1 and G2 in MSGAN, where spectra are generated through G1 and spatial patches are generated through G2.

Rectified linear units (ReLUs) are used as the nonlinear activation function in all the transposed convolutional layers except the last one, where the tanh function is utilized. To improve the stability of the generators, the batch normalization strategy [47] is used.
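As a concrete illustration, here is a minimal sketch of the 1D-TCN spectrum generator G1 (the 2D-TCN generator G2 is analogous with `Conv2DTranspose`), assuming TensorFlow 2.x (>= 2.3 for `Conv1DTranspose`); the layer widths and kernel sizes are placeholders and may differ from the exact settings in Table II.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_g1(noise_dim=100, n_classes=16, n_bands=200):
    z = layers.Input(shape=(noise_dim,))
    y = layers.Input(shape=(n_classes,))        # one-hot conditional label
    h = layers.Concatenate()([z, y])            # append label information
    h = layers.Dense(25 * 64)(h)
    h = layers.Reshape((25, 64))(h)
    # Four stacked 1-D transposed convolutions; ReLU + BN on all but the last.
    for filters in (64, 32, 16):
        h = layers.Conv1DTranspose(filters, 5, strides=2, padding="same")(h)
        h = layers.BatchNormalization()(h)
        h = layers.ReLU()(h)
    out = layers.Conv1DTranspose(1, 5, strides=1, padding="same",
                                 activation="tanh")(h)   # tanh on the last layer
    return tf.keras.Model([z, y], out)   # output: (n_bands, 1) spectrum
```

With the stride-2 layers, the sequence length grows 25 -> 50 -> 100 -> 200, matching a 200-band spectrum.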

B. Joint Spatial–Spectral Feature Extraction and Classification Through D in MSGAN

The inputs of the discriminator D contain the training samples $\{(x^{\mathrm{spe}}_i, x^{\mathrm{spa}}_i)\}_{i=1}^{N}$ and the generated samples $\{(G_1(z_{1i}, y_i), G_2(z_{2i}, y_i))\}_{i=1}^{N}$. The discriminator is used to extract joint spatial–spectral features and classify hyperspectral images. Specifically, spectral features are extracted from the spectra $x^{\mathrm{spe}}$ and $G_1(z_1, y)$ by a 1-D CNN, and spatial features are extracted from the spatial patches $x^{\mathrm{spa}}$ and $G_2(z_2, y)$ by a 2-D CNN. The spectral and spatial features are first reshaped into column vectors, respectively; then, the two vectors are concatenated to obtain the joint spatial–spectral features.
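A minimal sketch of this two-branch discriminator, under the same TensorFlow 2.x assumption and with illustrative layer widths and activations (the paper's Table II gives the exact configuration):

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_discriminator(n_bands=200, w=27, n_pc=3, n_classes=16):
    spe = layers.Input(shape=(n_bands, 1))          # spectrum branch input
    spa = layers.Input(shape=(w, w, n_pc))          # spatial patch branch input
    h1 = spe
    for f in (16, 32):                              # 1-D CNN on the spectrum
        h1 = layers.Conv1D(f, 5, strides=2, padding="same")(h1)
        h1 = layers.BatchNormalization()(h1)
        h1 = layers.LeakyReLU()(h1)
    h2 = spa
    for f in (16, 32):                              # 2-D CNN on the patch
        h2 = layers.Conv2D(f, 3, strides=2, padding="same")(h2)
        h2 = layers.BatchNormalization()(h2)
        h2 = layers.LeakyReLU()(h2)
    # Reshape both feature maps to vectors and concatenate them.
    feats = layers.Concatenate()([layers.Flatten()(h1), layers.Flatten()(h2)])
    probs = layers.Dense(n_classes, activation="softmax")(feats)
    return tf.keras.Model([spe, spa], probs)
```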

When the joint spatial–spectral features are extracted, they are used to perform classification. However, the discriminator in classical GAN methods only distinguishes whether an input is a real sample or not. In order to solve the multiclassification problem in hyperspectral images, a soft-max classification layer is added to the discriminator. Thus, class probabilities can be obtained from the discriminator. Correspondingly, novel adversarial objective functions for multiclassification are defined as follows, where $l_D$ and $l_G$ are the objective functions of the discriminator and generators, respectively, $D(\cdot)$ represents the output of the discriminator, and $l(\cdot)$ denotes the cross entropy:

$$\begin{cases} \displaystyle l_D = \sum_{i=1}^{N} l\big(D(x^{\mathrm{spe}}_i, x^{\mathrm{spa}}_i),\, y_i\big) + \sum_{i=1}^{N} l\big(D(G_1(z_{1i}, y_i), G_2(z_{2i}, y_i)),\, y_{G_i}\big) \\[1.5ex] \displaystyle l_G = \sum_{i=1}^{N} l\big(D(G_1(z_{1i}, y_i), G_2(z_{2i}, y_i)),\, y_i\big) \end{cases} \tag{2}$$

From (2), the objective function $l_D$ of the discriminator includes two terms. The first term $\sum_{i=1}^{N} l(D(x^{\mathrm{spe}}_i, x^{\mathrm{spa}}_i), y_i)$ encourages the discriminator to give higher probabilities to the true classes $\{y_i\}_{i=1}^{N}$ for the training samples $\{(x^{\mathrm{spe}}_i, x^{\mathrm{spa}}_i)\}_{i=1}^{N}$, and the second term $\sum_{i=1}^{N} l(D(G_1(z_{1i}, y_i), G_2(z_{2i}, y_i)), y_{G_i})$ expects the generated samples $\{(G_1(z_{1i}, y_i), G_2(z_{2i}, y_i))\}_{i=1}^{N}$ to belong to none of the classes, where $y_{G_i} = [1/C, 1/C, \ldots, 1/C]$ and $C$ is the number of classes. In detail, the class probabilities of a generated sample should ideally be predicted as $[0, 0, \ldots, 0]$, which represents that generated samples do not belong to any class. However, the sum of all probabilities from the soft-max classification layer is always 1. Thus, the approximation $y_{G_i} = [1/C, 1/C, \ldots, 1/C]$ is used to replace $[0, 0, \ldots, 0]$. This approximation relies on the fact that $[1/C, 1/C, \ldots, 1/C]$ is the closest point to $[0, 0, \ldots, 0]$ under the condition that all the output probabilities add up to 1.
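The closest-point fact can be checked with a one-line constrained minimization (a worked version of the argument, added here for completeness):

$$\min_{p}\ \sum_{c=1}^{C} p_c^2 \quad \text{s.t.}\ \sum_{c=1}^{C} p_c = 1 \;\Longrightarrow\; 2p_c - \lambda = 0\ \ \forall c \;\Longrightarrow\; p_c = \frac{1}{C}$$

where $\lambda$ is the Lagrange multiplier of the sum-to-one constraint: all coordinates of the minimizer are equal, so the uniform vector $[1/C, \ldots, 1/C]$ is indeed the simplex point nearest to $[0, 0, \ldots, 0]$.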


Fig. 4. Construction of discriminator D in MSGAN.


The objective functions $l_D$ and $l_G$ are adversarial with each other on the multiclass generated samples $\{(G_1(z_{1i}, y_i), G_2(z_{2i}, y_i))\}_{i=1}^{N}$. The discriminator aims to predict the generated samples as belonging to all the classes with the same probability. On the contrary, the generators aim to make the discriminator predict the label of each generated sample as $y_i$, which makes the discriminator mistake. By adversarial learning, the distribution of the generated samples gets closer to that of the training samples. This urges the discriminator to extract more discriminative features to distinguish between training and generated samples, so the discriminative ability of the discriminator is improved gradually.

Discriminator D and generators G1 and G2 are alternately optimized to update their parameters. The discriminator is optimized with the generators fixed; then, the generators are optimized with the discriminator fixed. Due to the stable network structure of MSGAN, the discriminator is optimized once per iteration. In order to make the generators learn the distribution of the hyperspectral data effectively with limited samples, the generators are optimized k times. Both are optimized by the root-mean-square propagation (RMSProp) method [50].
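Combining (2) with this update schedule, below is a minimal sketch of one MSGAN training step, assuming TensorFlow 2.x in eager mode and the hypothetical builder sketches shown earlier; the RMSProp learning rates follow the values quoted in Section IV-B.

```python
import tensorflow as tf

cce = tf.keras.losses.CategoricalCrossentropy()
d_opt = tf.keras.optimizers.RMSprop(2e-3)   # discriminator learning rate
g_opt = tf.keras.optimizers.RMSprop(1e-2)   # generator learning rate

def msgan_step(x_spe, x_spa, y, G1, G2, D, noise_dim=100, k=10):
    n, C = y.shape                       # y: float32 one-hot labels of the batch
    y_G = tf.ones_like(y) / float(C)     # uniform target: "none of the classes"
    # Discriminator step (generators fixed): first term of l_D pulls training
    # samples toward their true classes, second term pulls generated ones to y_G.
    z1, z2 = tf.random.normal((n, noise_dim)), tf.random.normal((n, noise_dim))
    with tf.GradientTape() as tape:
        p_real = D([x_spe, x_spa], training=True)
        p_fake = D([G1([z1, y], training=False),
                    G2([z2, y], training=False)], training=True)
        l_D = cce(y, p_real) + cce(y_G, p_fake)
    d_opt.apply_gradients(zip(tape.gradient(l_D, D.trainable_variables),
                              D.trainable_variables))
    # Generator steps (discriminator fixed), k times per discriminator update.
    for _ in range(k):
        z1, z2 = tf.random.normal((n, noise_dim)), tf.random.normal((n, noise_dim))
        with tf.GradientTape() as tape:
            p_fake = D([G1([z1, y], training=True),
                        G2([z2, y], training=True)], training=False)
            l_G = cce(y, p_fake)         # make D predict generated samples as y
        g_vars = G1.trainable_variables + G2.trainable_variables
        g_opt.apply_gradients(zip(tape.gradient(l_G, g_vars), g_vars))
```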

Fig. 4 shows the construction of the discriminator D. As shown in Fig. 4, a 1-D CNN is employed to extract spectral features, and a 2-D CNN is employed to extract spatial features. The extracted spatial–spectral features are reshaped and concatenated and then fed into the final soft-max classification layer. As in the generators, batch normalization is used in the discriminator.

TABLE I

PROCEDURE OF THE PROPOSED MSGAN METHOD

Compared with other deep learning-based methods, such as SAE [25], DBN [29], and CNN [33], the proposed MSGAN method alleviates the small sample size problem in hyperspectral image classification with the assistance of generated samples. Compared with the traditional GAN method, the proposed MSGAN method extracts deep spatial–spectral features and implements multiclassification for hyperspectral images simultaneously. Compared with HSGAN [44] and 3DGAN [45], the MSGAN method generates spatial and spectral information more effectively through two generators and improves the multiclass discriminative ability of the discriminator by adversarial learning.

C. Procedure of MSGAN

The proposed MSGAN method combines spatial and spectral sample generation, joint spatial–spectral feature representation, and classifier training into a unified optimization procedure. The detailed procedure of the proposed MSGAN method is summarized in Table I, and a code-level sketch is given below.
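Since the body of Table I is not reproduced in this transcript, the sketch below outlines the same procedure at the code level, reusing the hypothetical `build_g1`, `build_discriminator`, and `msgan_step` sketches from earlier; `build_g2` and `batches` are likewise illustrative stand-ins.

```python
import tensorflow as tf

G1 = build_g1()                  # 1D-TCN spectrum generator (sketch above)
G2 = build_g2()                  # hypothetical 2D-TCN patch generator
D = build_discriminator()        # joint spatial-spectral discriminator

for epoch in range(1500):        # epoch count from Section IV-B
    for x_spe, x_spa, y in batches(train_set):   # hypothetical batch loader
        msgan_step(x_spe, x_spa, y, G1, G2, D, k=10)

# After adversarial training, the discriminator alone is the classifier.
predict = lambda spe, spa: tf.argmax(D([spe, spa], training=False), axis=-1)
```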

IV. EXPERIMENTAL RESULTS

In this section, we investigate the performance of the proposed MSGAN method on three challenging hyperspectral data sets in comparison with several state-of-the-art hyperspectral image classification methods.



A. Data Description

The Indian Pines, Pavia University, and Salinas hyperspectral data sets are adopted to demonstrate the effectiveness of the proposed MSGAN method.

1) The Indian Pines data set was gathered by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor over the Indian Pines test site in northwestern Indiana. It consists of 145 × 145 pixels and 224 spectral bands in the wavelength range from 0.4 to 2.5 µm. In the experiments, 200 bands are used after removing the water absorption bands (104–108, 150–163, and 220). The Indian Pines scene contains two-thirds agriculture and one-third natural vegetation. The ground truth contains sixteen classes. The false-color composite image (bands 50, 27, 17) and the ground truth are shown in Fig. 5(a) and (d).

2) The Pavia University data set was acquired by the Reflective Optics System Imaging Spectrometer sensor during a flight campaign over Pavia, northern Italy. This data set consists of 610 × 340 pixels, with a spatial resolution of 1.3 m per pixel. After 12 noisy spectral bands are removed, 103 spectral bands are retained in the experiments. The ground truth contains nine representative urban classes. The false-color composite image (bands 53, 31, 8) and the ground truth are shown in Fig. 5(b) and (e).

3) The Salinas data set was collected by the 224-band AVIRIS sensor over Salinas Valley, CA, USA. In the experiments, 204 bands are retained after eliminating 20 water absorption bands (108–112, 154–167, and 224). This data set consists of 512 × 217 pixels and has a spatial resolution of 3.7 m per pixel. The ground truth contains 16 classes. The three-band color composite image (bands 50, 170, 190) and the ground truth are shown in Fig. 5(c) and (f).

B. Experimental Setting

To validate the performance of the proposed MSGAN method, five representative deep learning-based methods for hyperspectral image classification, SAE [25], DBN [29], CNN [33], PPF-CNN [34], and the 3-D CNN method [32], are used as comparison methods. Furthermore, the classical SVM with a radial basis function kernel (RBF-SVM) [20] is also adopted for comparison. The classification performance of all the methods is measured by three widely used indexes: the overall accuracy (OA), average accuracy (AA), and kappa coefficient (Kappa) [51]. All the experimental results are obtained by running 30 times independently with random divisions into training and test sets. All the experiments are implemented using the Python language and the TensorFlow [52] library, an open-source software library for numerical computation using data flow graphs. An NVIDIA 1080Ti graphics card is used for GPU computation.
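For reference, the three indexes can be computed from a confusion matrix as in the short sketch below; scikit-learn is assumed, and the function name is illustrative.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, cohen_kappa_score

def oa_aa_kappa(y_true, y_pred):
    cm = confusion_matrix(y_true, y_pred)
    oa = np.trace(cm) / cm.sum()                   # overall accuracy
    aa = np.mean(np.diag(cm) / cm.sum(axis=1))     # mean per-class accuracy
    kappa = cohen_kappa_score(y_true, y_pred)      # chance-corrected agreement
    return oa, aa, kappa
```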

Fig. 5. False-color composite images of (a) Indian Pines, (b) Pavia University, and (c) Salinas, and the ground truth of (d) Indian Pines, (e) Pavia University, and (f) Salinas.

For RBF-SVM, the one-against-all strategy is used to deal with multiclassification. The penalty and gamma parameters in RBF-SVM are determined by fivefold cross validation. For SAE and DBN, the radius of the spatial neighborhood window is searched in the range of [3, 21] with an interval of 2. For CNN, 1 × 1 sized kernels are used in the convolutional layers, as suggested in [33]. For PPF-CNN, the size of the block window of neighboring pixels is set to the default value in [34]. For 3-D CNN, the spatial window size of the 3-D input is resized to 27 × 27 [32]. For MSGAN, the main network structure and parameters are shown in Table II. In Table II, G1 and G2 refer to the two generators, and D represents the discriminator of MSGAN. "BN?" denotes whether batch normalization is used, m represents the number of joint spatial–spectral features, and n_classes represents the number of classes. The learning rates of the discriminator and generators are set to 0.002 and 0.01, respectively. The number of epochs is set to 1500.
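As a concrete reading of the RBF-SVM setting just described, a hedged sketch with one-against-all training and fivefold cross validation over the penalty and gamma parameters; the placeholder data and grid values are illustrative only, not the paper's.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Placeholder data standing in for flattened hyperspectral training pixels.
x_train = np.random.rand(200, 103)
y_train = np.random.randint(0, 9, 200)

# One-against-all RBF-SVM; GridSearchCV performs fivefold cross validation.
grid = {"estimator__C": [1, 10, 100, 1000],
        "estimator__gamma": [1e-3, 1e-2, 1e-1, 1]}
svm = GridSearchCV(OneVsRestClassifier(SVC(kernel="rbf")), grid, cv=5)
svm.fit(x_train, y_train)
print(svm.best_params_)
```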


TABLE II

DETAILED STRUCTURE AND PARAMETERS OF MSGAN

TABLE III

16 CLASSES OF THE INDIAN PINES IMAGE AND THE NUMBERS OF TRAINING AND TEST SAMPLES FOR EACH CLASS

The dimensions of both z1 and z2 are set to 100. Moreover, the patch size of the input and the updating times of the generators in MSGAN are investigated in Section IV-C.

C. Classification Results of Hyperspectral Data Sets

1) Classification Results of the Indian Pines Data Set: The Indian Pines data set is randomly divided into 5% for training and 95% for testing. The numbers of training and test samples for each class are listed in Table III. Table IV records the average classification accuracies and the corresponding standard deviations of the seven algorithms over 30 independent runs. In Table IV, the first 16 rows correspond to the results for each class, and the last three rows are the OA, AA, and Kappa results over all the classes. The best classification results among the seven algorithms are highlighted in gray. As shown in Table IV, the deep learning-based methods, SAE, DBN, CNN, PPF-CNN, 3-D CNN, and MSGAN, are superior to RBF-SVM due to hierarchical nonlinear feature extraction.

Fig. 6. (a) Ground truth and (b)–(h) classification visual maps of the Indian Pines data set by (b) RBF-SVM, (c) SAE, (d) DBN, (e) PPF-CNN, (f) CNN, (g) 3-D CNN, and (h) MSGAN.

Compared with SAE, DBN, and CNN, PPF-CNN obtains better classification results by enlarging the available training samples. Compared with PPF-CNN, 3-D CNN further improves the classification performance with joint spatial–spectral feature extraction. Among the seven methods, MSGAN achieves the best classification results in the majority of classes with the assistance of high-quality generated samples. Additionally, compared with the other methods, MSGAN improves by at least 2.8% in the OA index, 2% in the AA index, and 3.1% in the Kappa index.

Fig. 6 shows the classification visual maps of the seven algorithms on the Indian Pines data set. As shown in Fig. 6(b)–(f), SVM, SAE, DBN, CNN, and PPF-CNN misclassify many samples in the middle of regions, especially in the corn-notill, corn-mintill, soybean-notill, and soybean-mintill classes. The misclassification leads to visually noisy scattered points to different degrees. Compared with these methods, 3-D CNN and MSGAN improve the region uniformity significantly. Compared with 3-D CNN, MSGAN performs better in terms of the region uniformity of the soybean-clean and grass-trees classes and the boundary localization of the stone-steel-towers class.

2) Classification Results of the Pavia University Data Set: The Pavia University data set is randomly divided into 3% for training and 97% for testing. The numbers of training and test samples for each class are listed in Table V.

The statistical classification results on the Pavia University data set are summarized in Table VI. As shown in Table VI, CNN, PPF-CNN, 3-D CNN, and MSGAN are superior to RBF-SVM, SAE, and DBN by extracting the spatial information with local connections and decreasing the network parameters with weight sharing. For the gravel class, the classification results of RBF-SVM, SAE, DBN, and CNN are unsatisfactory; compared with them, the proposed MSGAN method improves by 39.4%, 27.5%, 29.9%, and 21.7%, respectively. For all the classes, the classification accuracies of MSGAN exceed 93%. Among the seven algorithms, MSGAN obtains the best statistical results in terms of the OA, AA, and Kappa indexes.


TABLE IV

CLASSIFICATION RESULTS OF RBF-SVM, SAE, DBN, PPF-CNN, CNN, 3-D CNN, AND MSGAN ON THE INDIAN PINES DATA SET

TABLE V

NINE CLASSES OF THE PAVIA UNIVERSITY IMAGE AND THE NUMBERS OF TRAINING AND TEST SAMPLES FOR EACH CLASS

Fig. 7 shows the classification visual maps of the seven algorithms on the Pavia University data set. As shown in Fig. 7(b)–(f), many samples belonging to the bitumen class are misclassified as the asphalt class because of similar spectral signatures. The proposed MSGAN method provides a better distinction between these two classes. Compared with the other methods, MSGAN achieves better region uniformity in the bare soil class. Moreover, MSGAN obtains better boundary localization of the grass class.

3) Classification Results of the Salinas Data Set: The Salinas data set is randomly divided into 1% for training and 99% for testing. The numbers of training and test samples for each class are listed in Table VII. Table VIII lists the classification results of all seven algorithms. It can be seen that the seven algorithms exceed 90% classification accuracy in most classes. However, the grapes_untrained and vinyard_untrained classes are misclassified by RBF-SVM, SAE, DBN, CNN, and PPF-CNN. Compared with these methods, MSGAN obviously improves the classification results by using effective joint spatial–spectral feature extraction and classification with the assistance of discriminative generated samples. For the fallow class, MSGAN achieves a completely correct classification result.

Fig. 7. (a) Ground truth and classification visual maps of the Pavia University data set by (b) RBF-SVM, (c) SAE, (d) DBN, (e) CNN, (f) PPF-CNN, (g) 3-D CNN, and (h) MSGAN.

Compared with the other methods, MSGAN obtains higher classification accuracies in most classes and better statistical results in terms of all the OA, AA, and Kappa indexes.

Fig. 8 shows the classification visual maps of the seven algorithms on the Salinas data set. As shown in Fig. 8(b)–(f), many samples belonging to the grapes_untrained class are misclassified as the vinyard_untrained class by RBF-SVM, SAE, DBN, CNN, and PPF-CNN. Compared with them, 3-D CNN and MSGAN provide a better distinction between these two classes.


TABLE VI

CLASSIFICATION RESULTS OF RBF-SVM, SAE, DBN, PPF-CNN, CNN, 3-D CNN, AND MSGAN ON THE PAVIA UNIVERSITY DATA SET

Fig. 8. (a) Ground truth and classification visual maps of the Salinas data set by (b) RBF-SVM, (c) SAE, (d) DBN, (e) CNN, (f) PPF-CNN, (g) 3-D CNN, and (h) MSGAN.

Moreover, compared with 3-D CNN, MSGAN shows better uniformity in the fallow and vinyard_vertical_trellis classes.

D. Classification Results With a Fixed Number of Training Samples

We further conduct an experiment with the same fixed number of training samples per class for all the comparison methods.

TABLE VII

16 CLASSES OF THE SALINAS IMAGE AND THE NUMBERS OF TRAINING AND TEST SAMPLES FOR EACH CLASS

In the experiment, 150 samples from each class of the Indian Pines, Pavia University, and Salinas data sets are randomly selected as training samples, and the remaining samples are used for testing. Classes with fewer than 150 samples are ignored.
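A hedged sketch of this fixed-count split (NumPy assumed; names illustrative; under-populated classes are dropped from both sets, matching the description above):

```python
import numpy as np

def per_class_split(labels, n=150, rng=np.random.default_rng(0)):
    train, test = [], []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        if len(idx) < n:                  # classes with < n samples are ignored
            continue
        pick = rng.choice(idx, n, replace=False)
        train.extend(pick)
        test.extend(np.setdiff1d(idx, pick))
    return np.array(train), np.array(test)
```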

Table IX records the classification performance of the seven algorithms with the fixed number of training samples. Compared with SVM, the deep learning methods obtain better classification performance due to hierarchical feature extraction. Among these methods, MSGAN further improves the classification performance by using joint spatial–spectral features and adversarial learning for multiclass. MSGAN improves by at least 1.3% in the OA index, 1.5% in the AA index, and 1.6% in the Kappa index on the three hyperspectral data sets.

E. Investigation on Running Time

Tables X–XII list the training and test times of RBF-SVM, SAE, DBN, PPF-CNN, CNN, 3-D CNN, and MSGAN on the Indian Pines, Pavia University, and Salinas data sets, respectively. As shown in Tables X–XII, compared with RBF-SVM, the six deep learning-based methods, SAE, DBN, PPF-CNN, CNN, 3-D CNN, and MSGAN, cost more training time due to the construction of deep network models.


TABLE VIII

CLASSIFICATION RESULTS OF RBF-SVM, SAE, DBN, PPF-CNN, CNN, 3-D CNN, AND MSGAN ON THE SALINAS DATA SET

TABLE IX

CLASSIFICATION RESULTS OF RBF-SVM, SAE, DBN, PPF-CNN, CNN, 3-D CNN, AND MSGAN WITH 150 TRAINING SAMPLES PER CLASS

SAE and DBN require less training time than PPF-CNN, CNN, 3-D CNN, and MSGAN because their inputs take the 1-D vector form. Among all the comparison methods, PPF-CNN, 3-D CNN, and MSGAN are the most time-consuming to train, although MSGAN trains faster than 3-D CNN and PPF-CNN: 3-D CNN costs more time because of the increased number of network parameters caused by 3-D convolution, and PPF-CNN costs more time due to the expansion of the training samples, especially when the number of training samples is large.

TABLE X

RUNNING TIME OF RBF-SVM, SAE, DBN, PPF-CNN, CNN, 3-D CNN, AND MSGAN ON THE INDIAN PINES DATA SET

TABLE XI

RUNNING TIME OF RBF-SVM, SAE, DBN, PPF-CNN, CNN, 3-D CNN, AND MSGAN ON THE PAVIA UNIVERSITY DATA SET

TABLE XII

RUNNING TIME OF RBF-SVM, SAE, DBN, PPF-CNN, CNN, 3-D CNN, AND MSGAN ON THE SALINAS DATA SET

For the test time, SAE, DBN, CNN, and MSGAN have an obvious advantage over RBF-SVM, PPF-CNN, and 3-D CNN. PPF-CNN is slower because it uses a voting strategy over the surrounding pixels, and 3-D CNN costs more time because of its complex 3-D convolution operation.


Fig. 9. OA results of RBF-SVM, SAE, DBN, CNN, PPF-CNN, 3-D CNN, and MSGAN with different ratios of training samples on the (a) Indian Pines, (b) Pavia University, and (c) Salinas data sets.

MSGAN only costs 0.2, 0.8, and 1.3 s on the Indian Pines, Pavia University, and Salinas data sets, respectively.

F. Sensitivity to the Number of Training Samples

As shown in Fig. 9(a)–(c), the classification performance of the seven algorithms with different ratios of training samples is investigated. Specifically, 1%, 3%, 5%, 7%, and 9% of the samples from each class of the Indian Pines data set, 1%, 2%, 3%, 4%, and 5% of the Pavia University data set, and 1%, 1.5%, 2%, 2.5%, and 3% of the Salinas data set are randomly selected as training samples. Generally, deep learning-based methods are heavily parameterized, and a large number of training samples is required to guarantee their performance. When the ratio of training samples decreases, the classification performance of all seven algorithms declines. Compared with the RBF-SVM, SAE, DBN, CNN, PPF-CNN, and 3-D CNN algorithms, MSGAN consistently provides superior performance under different ratios of training samples. In addition, MSGAN declines more slowly than the other algorithms with less than 2% training samples on the three data sets. Thus, MSGAN is a better choice when the number of training samples is limited.

G. Comparison With Other GAN-Based Methods

Two representative GAN-based hyperspectral image classification methods, 3DGAN [45] and HSGAN [44], are used as comparison methods. The corresponding experimental results are shown in Table XIII.

Compared with HSGAN, MSGAN implements an end-to-end GAN-based classification and utilizes spatial information to promote the classification performance; it improves the OA index by 21.55%, 13.5%, and 10.8% on the three hyperspectral data sets. Compared with 3DGAN, MSGAN achieves better classification performance in terms of the OA, AA, and Kappa indexes. In particular, MSGAN improves the AA index by 6.3%, 11.2%, and 6.7% over 3DGAN on the Indian Pines, Pavia University, and Salinas data sets, respectively. Compared with the simplified 3-D-based approach in 3DGAN, spatial–spectral information is obtained more effectively through the two generators and the discriminator in MSGAN. In addition, in 3DGAN, although both sigmoid and soft-max classifiers are applied in the discriminator D, the objective functions of G and D are adversarial only in the branch of the binary sigmoid classifier and not in the branch of the soft-max classifier; therefore, the multiclass discriminative ability of D cannot be effectively and directly promoted by adversarial learning.

H. Effectiveness Analysis for the Structure of Generators

To show the superiority of the two generators, MSGAN with only the spectral generator (MSGAN_SPE) and MSGAN with only the spatial generator (MSGAN_SPA) are used as comparison methods. Table XIV shows the classification results of MSGAN_SPE, MSGAN_SPA, and MSGAN on the Indian Pines, Pavia University, and Salinas hyperspectral data sets. Compared with MSGAN_SPE and MSGAN_SPA, MSGAN achieves better classification results in terms of the AA, OA, and Kappa indexes on the three data sets. This shows that the two generators are more effective in MSGAN than a single spectral or spatial generator for hyperspectral image classification.

I. Visualization of Generated Samples by MSGAN

We provide samples generated by the proposed method in comparison with the original ones.


TABLE XIII

CLASSIFICATION RESULTS OF 3DGAN AND MSGAN

TABLE XIV

CLASSIFICATION RESULTS OF MSGAN_SPE, MSGAN_SPA, AND MSGAN

TABLE XV

CLASSIFICATION RESULTS OF FCN AND MSGAN

Figs. 10–12 show the ground truth spectra and the spectra generated by MSGAN on the Indian Pines, Pavia University, and Salinas data sets, respectively. Note that, in Figs. 10–12, the x-axis indicates the number of spectral bands and the y-axis indicates the normalized reflectance of the spectral bands. For each data set, spectra and their ground truths from three representative classes are shown. We can see that the distribution of the spectra generated by MSGAN is similar to the distribution of the real data; in other words, a well-trained G generates samples similar to the real samples. When the training samples are limited, the generated data are helpful for improving the performance of the discriminator.

J. Effectiveness Analysis of Adversarial Learning in MSGAN

To evaluate the effectiveness of adversarial learning in MSGAN, we compare MSGAN with a fully convolutional network (FCN) devised with the same construction as the discriminator of MSGAN. Table XV lists the classification results of FCN and MSGAN on the three hyperspectral data sets with different ratios of training samples. As shown in Table XV, MSGAN has an obvious advantage over FCN under different numbers of training samples, especially when the training samples are limited. The reason is that the generated samples promote the classification performance of MSGAN through adversarial learning between the generators and the discriminator.

K. Analysis of Free Parameters in MSGAN

There are two important parameters, w and k, in MSGAN. w represents the input size of the spatial patches and is chosen from {23, 25, 27, 29, 31}.


Fig. 10. Spectra of the Indian Pines data set. Ground truth spectra of (a) corn-notill, (b) soybean-mintill, and (c) wheat, and generated spectra of (d) corn-notill, (e) soybean-mintill, and (f) wheat by MSGAN.

Fig. 11. Spectra of the Pavia University data set. Ground truth spectra of (a) trees, (b) painted metal sheets, and (c) bitumen, and generated spectra of (d) trees, (e) painted metal sheets, and (f) bitumen by MSGAN.

Fig. 12. Spectra of the Salinas data set. Ground truth spectra of (a) Brocoli_green_weeds_1, (b) Lettuce_romaine_7wk, and (c) Vinyard_vertical_trellis, and generated spectra of (d) Brocoli_green_weeds_1, (e) Lettuce_romaine_7wk, and (f) Vinyard_vertical_trellis by MSGAN.

k indicates the number of updating times of the generators G1 and G2 relative to the discriminator D and is chosen from {1, 4, 7, 10, 13}; it aims to control the balance between the generators and the discriminator in the alternating optimization. Fig. 13(a)–(c) shows the OA results of MSGAN on the validation sets of the Indian Pines, Pavia University, and Salinas data sets under different parameters w and k. If w is too small, MSGAN may not extract enough spatial information from the hyperspectral images; on the contrary, given a too large w, the spatial windows may not represent the samples effectively. Finally, w is selected as 27 for the three hyperspectral data sets.

Fig. 13. Sensitivity analysis with respect to the spatial window size w and the updating times k of G1 and G2 for MSGAN on the (a) Indian Pines, (b) Pavia University, and (c) Salinas data sets.

When k equals 10 or 13, the OA results on the three data sets reach their peak values; in the experiments, k is selected as 10. A sketch of this grid study follows.
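A compact sketch of that grid study; `train_and_validate` is a hypothetical stand-in for one full MSGAN training run returning the validation OA for a given (w, k) setting.

```python
# Evaluate every (w, k) pair on the validation set and keep the best one.
results = {(w, k): train_and_validate(w=w, k=k)
           for w in (23, 25, 27, 29, 31)
           for k in (1, 4, 7, 10, 13)}
best_w, best_k = max(results, key=results.get)   # (27, 10) per the paper
```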

V. CONCLUSION

In this paper, a novel MSGAN method is proposed for hyperspectral image classification.


For the characteristic of the 3-D data cube in HSIs, two generators are designed in MSGAN to generate the spectra and spatial patches of HSIs, and a discriminator is designed to extract joint spatial–spectral features and output multiclass probabilities. To deal with multiclassification with limited training samples, MSGAN defines new adversarial objectives between the generators and the discriminator for multiclass. The adversarial objectives make the discriminator predict generated samples as belonging to none of the classes and make the generators produce realistic multiclass samples. By making full use of the multiclass generated samples and adversarial learning, MSGAN improves the multiclass discriminative ability and, at the same time, alleviates the small sample size problem of HSIs. The experimental results demonstrate the effectiveness of the proposed MSGAN method in terms of classification performance, running time, and sensitivity to the number of training samples.

For hyperspectral images, some classes often have far fewer samples than others. This imbalance may have a negative impact on classification. In the future, ensemble learning will be considered in MSGAN to deal with the imbalance problem and further improve the classification performance.

REFERENCES

[1] C. I. Chang, Hyperspectral Data Exploitation: Theory and Applications.Hoboken, NJ, USA: Wiley, 2007.

[2] C. M. Gevaert, J. Suomalainen, J. Tang, and L. Kooistra, “Generation of spectral–temporal response surfaces by combining multispectral satellite and hyperspectral UAV imagery for precision agriculture applications,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 8, no. 6, pp. 3140–3146, Jun. 2015.

[3] I. Makki, R. Younes, C. Francis, T. Bianchi, and M. Zucchetti, “A survey of landmine detection using hyperspectral imaging,” ISPRS J. Photogramm. Remote Sens., vol. 124, pp. 40–53, Feb. 2017.

[4] A. J. Brown, M. R. Walter, and T. J. Cudahy, “Hyperspectral imaging spectroscopy of a Mars analogue environment at the North Pole Dome, Pilbara Craton, Western Australia,” Austral. J. Earth Sci., vol. 52, no. 3, pp. 353–364, Mar. 2005.

[5] X. Jia, B.-C. Kuo, and M. M. Crawford, “Feature mining for hyperspectral image classification,” Proc. IEEE, vol. 101, no. 3, pp. 676–697, Mar. 2013.

[6] G. Hughes, “On the mean accuracy of statistical pattern recognizers,” IEEE Trans. Inf. Theory, vol. 14, no. 1, pp. 55–63, Jan. 1968.

[7] X. Kang, X. Xiang, S. Li, and J. A. Benediktsson, “PCA-based edge-preserving features for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 55, no. 12, pp. 7140–7151, Dec. 2017.

[8] Y. Zhai et al., “A modified locality-preserving projection approach for hyperspectral image classification,” IEEE Geosci. Remote Sens. Lett., vol. 13, no. 8, pp. 1059–1063, Aug. 2016.

[9] P. Chen, L. Jiao, F. Liu, S. Gou, J. Zhao, and Z. Zhao, “Dimensionality reduction of hyperspectral imagery using sparse graph learning,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 10, no. 3, pp. 1165–1181, Mar. 2017.

[10] B.-C. Kuo, C.-H. Li, and J.-M. Yang, “Kernel nonparametric weighted feature extraction for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 47, no. 4, pp. 1139–1155, Apr. 2009.

[11] Y. Dong, B. Du, L. Zhang, and L. Zhang, “Dimensionality reduction and classification of hyperspectral images using ensemble discriminative local metric learning,” IEEE Trans. Geosci. Remote Sens., vol. 55, no. 5, pp. 2509–2524, May 2017.

[12] L. Shen and S. Jia, “Three-dimensional Gabor wavelets for pixel-based hyperspectral imagery classification,” IEEE Trans. Geosci. Remote Sens., vol. 49, no. 12, pp. 5039–5046, Dec. 2011.

[13] J. Zhu, J. Hu, S. Jia, X. Jia, and Q. Li, “Multiple 3-D feature fusion framework for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 56, no. 4, pp. 1873–1886, Apr. 2018.

[14] J. Liang et al., “On the sampling strategy for evaluation of spectral-spatial methods in hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 55, no. 2, pp. 862–880, Feb. 2017.

[15] L. Fang, N. He, S. Li, A. J. Plaza, and J. Plaza, “A new spatial–spectral feature extraction method for hyperspectral images using local covariance matrix representation,” IEEE Trans. Geosci. Remote Sens., vol. 56, no. 6, pp. 3534–3546, Jun. 2018.

[16] N. He et al., “Feature extraction with multiscale covariance maps for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 2, pp. 755–769, Feb. 2019.

[17] L. Fang, N. He, S. Li, P. Ghamisi, and J. A. Benediktsson, “Extinction profiles fusion for hyperspectral images classification,” IEEE Trans. Geosci. Remote Sens., vol. 56, no. 3, pp. 1803–1815, Mar. 2018.

[18] C. Cariou and K. Chehdi, “Unsupervised nearest neighbors clustering with application to hyperspectral images,” IEEE J. Sel. Topics Signal Process., vol. 9, no. 6, pp. 1105–1116, Sep. 2015.

[19] M. Khodadadzadeh, J. Li, A. Plaza, and J. M. Bioucas-Dias, “A subspace-based multinomial logistic regression for hyperspectral image classification,” IEEE Geosci. Remote Sens. Lett., vol. 11, no. 12, pp. 2105–2109, Dec. 2014.

[20] F. Melgani and L. Bruzzone, “Classification of hyperspectral remote sensing images with support vector machines,” IEEE Trans. Geosci. Remote Sens., vol. 42, no. 8, pp. 1778–1790, Aug. 2004.

[21] Y. Gu and K. Feng, “Optimized Laplacian SVM with distance metric learning for hyperspectral image classification,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 6, no. 3, pp. 1109–1117, Jun. 2013.

[22] B. F. Guo, S. R. Gunn, and R. I. Damper, “Customizing kernel functions for SVM-based hyperspectral image classification,” IEEE Trans. Image Process., vol. 17, no. 4, pp. 622–629, Apr. 2008.

[23] K. Tan, J. Zhang, Q. Du, and X. Wang, “GPU parallel implementation of support vector machines for hyperspectral image classification,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 8, no. 10, pp. 4647–4656, Oct. 2015.

[24] C. Yan, X. Bai, P. Ren, L. Bai, W. Tang, and J. Zhou, “Band weighting via maximizing interclass distance for hyperspectral image classification,” IEEE Geosci. Remote Sens. Lett., vol. 13, no. 7, pp. 922–925, Jul. 2016.

[25] Y. Chen, Z. Lin, X. Zhao, G. Wang, and Y. Gu, “Deep learning-based classification of hyperspectral data,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 7, no. 6, pp. 2094–2107, Jun. 2014.

[26] A. Ng, “Sparse autoencoder,” CS294A Lecture Notes, pp. 1–19, 2011.

[27] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, “Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion,” J. Mach. Learn. Res., vol. 11, pp. 3371–3408, Dec. 2010.

[28] K. Jia, L. Sun, S. Gao, Z. Song, and B. E. Shi, “Laplacian auto-encoders: An explicit learning of nonlinear data manifold,” Neurocomputing, vol. 160, pp. 250–260, Jul. 2015.

[29] Y. Chen, X. Zhao, and X. Jia, “Spectral-spatial classification of hyperspectral data based on deep belief network,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 8, no. 6, pp. 2381–2392, Jun. 2015.

[30] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proc. NIPS, 2012, pp. 1097–1105.

[31] S. Yu, S. Jia, and C. Xu, “Convolutional neural networks for hyperspectral image classification,” Neurocomputing, vol. 219, pp. 88–98, Jan. 2017.

[32] Y. Chen, H. Jiang, C. Li, X. Jia, and P. Ghamisi, “Deep feature extraction and classification of hyperspectral images based on convolutional neural networks,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 10, pp. 6232–6251, Oct. 2016.

[33] P. Jia, M. Zhang, W. Yu, F. Shen, and Y. Shen, “Convolutional neural network based classification for hyperspectral data,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), Jul. 2016, pp. 5075–5078.

[34] W. Li, G. Wu, F. Zhang, and Q. Du, “Hyperspectral image classification using deep pixel-pair features,” IEEE Trans. Geosci. Remote Sens., vol. 55, no. 2, pp. 844–853, Feb. 2017.

[35] I. Goodfellow et al., “Generative adversarial nets,” in Proc. NIPS, 2014, pp. 2672–2680.

[36] A. Radford, L. Metz, and S. Chintala, “Unsupervised representation learning with deep convolutional generative adversarial networks,” in Proc. ICLR, Jan. 2016, pp. 1–16.

[37] E. Denton, S. Chintala, A. Szlam, and R. Fergus. (2015). “Deep generative image models using a Laplacian pyramid of adversarial networks.” [Online]. Available: https://arxiv.org/abs/1506.05751

[38] D. J. Im, C. D. Kim, H. Jiang, and R. Memisevic. (2016). “Generating images with recurrent adversarial networks.” [Online]. Available: https://arxiv.org/abs/1602.05110




[39] M. Mirza and S. Osindero. (2014). “Conditional generative adversarial nets.” [Online]. Available: https://arxiv.org/abs/1411.1784

[40] X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, and P. Abbeel, “InfoGAN: Interpretable representation learning by information maximizing generative adversarial networks,” in Proc. Adv. Neural Inf. Process. Syst., 2016, pp. 2172–2180.

[41] T. Salimans et al., “Improved techniques for training GANs,” in Proc. NIPS, 2016, pp. 2234–2242.

[42] M. Arjovsky, S. Chintala, and L. Bottou. (2017). “Wasserstein GAN.” [Online]. Available: https://arxiv.org/abs/1701.07875

[43] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville. (2017). “Improved training of Wasserstein GANs.” [Online]. Available: https://arxiv.org/abs/1704.00028

[44] Y. Zhan, D. Hu, Y. Wang, and X. Yu, “Semisupervised hyperspectral image classification based on generative adversarial networks,” IEEE Geosci. Remote Sens. Lett., vol. 15, no. 2, pp. 212–216, Feb. 2018.

[45] L. Zhu, Y. Chen, P. Ghamisi, and J. A. Benediktsson, “Generative adversarial networks for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 56, no. 9, pp. 5046–5063, Sep. 2018.

[46] Y. Zhan et al., “Semi-supervised classification of hyperspectral data based on generative adversarial networks and neighborhood majority voting,” in Proc. IEEE Int. Geosci. Remote Sens. Symp., Jul. 2018, pp. 5756–5759.

[47] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in Proc. ICML, 2015.

[48] L. J. Ratliff, S. A. Burden, and S. S. Sastry, “Characterization and computation of local Nash equilibria in continuous games,” in Proc. 51st Annu. Allerton Conf. Commun. Control Comput. (Allerton), Monticello, IL, USA, Oct. 2013, pp. 917–924.

[49] V. Dumoulin and F. Visin. (2016). “A guide to convolution arithmetic for deep learning.” [Online]. Available: https://arxiv.org/abs/1603.07285

[50] S. Ruder. (2016). “An overview of gradient descent optimization algorithms.” [Online]. Available: https://arxiv.org/abs/1609.04747

[51] G. M. Foody, “Status of land cover classification accuracy assessment,” Remote Sens. Environ., vol. 80, no. 1, pp. 185–201, Apr. 2002.

[52] M. Abadi et al. (2015). “TensorFlow: Large-scale machine learning on heterogeneous systems.” [Online]. Available: https://arxiv.org/abs/1603.04467

Jie Feng (M’15) received the B.S. degree from Chang’an University, Xi’an, China, in 2008, and the Ph.D. degree from Xidian University, Xi’an, in 2014.

She is currently an Associate Professor with the Laboratory of Intelligent Perception and Image Understanding, Xidian University, Xi’an. Her current interests include remote sensing image processing, deep learning, and machine learning.

Haipeng Yu received the B.S. degree from Xidian University, Xi’an, China, in 2016, where he is currently pursuing the M.S. degree with the Key Laboratory of Intelligent Perception and Image Understanding, Ministry of Education, School of Electronic Engineering.

His research interests include machine learning, remote sensing image processing, and pattern recognition.

Lin Wang received the B.S. degree from Shenzhen University, Shenzhen, China, in 2016. He is currently pursuing the M.S. degree with the Key Laboratory of Intelligent Perception and Image Understanding, Ministry of Education, School of Electronic Engineering, Xidian University, Xi’an, China.

His research interests include machine learning, remote sensing image processing, and pattern recognition.

Xianghai Cao (M’13) received the B.E. and Ph.D. degrees from the School of Electronic Engineering, Xidian University, Xi’an, China, in 1999 and 2008, respectively.

Since 2008, he has been with Xidian University, where he is currently an Associate Professor with the School of Artificial Intelligence. His research interests include remote sensing image processing, pattern recognition, and deep learning.

Xiangrong Zhang (SM’14) received the B.S. and M.S. degrees in computer science and technology and the Ph.D. degree in pattern recognition and intelligent systems from Xidian University, Xi’an, China, in 1999, 2003, and 2006, respectively.

She is currently a Professor with the Key Laboratory of Intelligent Perception and Image Understanding, Ministry of Education, School of Electronic Engineering, Xidian University. Her research interests include visual information analysis and understanding, pattern recognition, and machine learning.

Licheng Jiao (SM’89–F’18) received the B.S. degree from Shanghai Jiaotong University, Shanghai, China, in 1982, and the M.S. and Ph.D. degrees from Xi’an Jiaotong University, Xi’an, China, in 1984 and 1990, respectively.

He has authored or co-authored over 150 scientific papers. He has been in charge of about 40 important scientific research projects and has published more than 20 monographs and 100 papers in international journals and conferences. His research interests include image processing, natural computation, machine learning, and intelligent information processing.