
Compressed-Domain Ship Detection on Spaceborne Optical Image Using Deep Neural Network and Extreme Learning Machine

Jiexiong Tang, Student Member, IEEE, Chenwei Deng, Member, IEEE, Guang-Bin Huang, Senior Member, IEEE, and Baojun Zhao

Abstract—Ship detection on spaceborne images has attracted great interest in the applications of maritime security and traffic control. Optical images stand out from other remote sensing images in object detection due to their higher resolution and more visualized contents. However, most of the popular techniques for ship detection from optical spaceborne images have two shortcomings: 1) compared with infrared and synthetic aperture radar images, their results are affected by weather conditions, like clouds and ocean waves, and 2) the higher resolution results in larger data volume, which makes processing more difficult. Most of the previous works mainly focus on solving the first problem by improving segmentation or classification with complicated algorithms. These methods face difficulty in efficiently balancing performance and complexity. In this paper, we propose a ship detection approach to solving the aforementioned two issues using wavelet coefficients extracted from the JPEG2000 compressed domain combined with a deep neural network (DNN) and an extreme learning machine (ELM). The compressed domain is adopted for fast ship candidate extraction, the DNN is exploited for high-level feature representation and classification, and the ELM is used for efficient feature pooling and decision making. Extensive experiments demonstrate that, in comparison with the existing relevant state-of-the-art approaches, the proposed method requires less detection time and achieves higher detection accuracy.

Index Terms—Compressed domain, deep neural network (DNN), extreme learning machine (ELM), JPEG2000, optical spaceborne image, remote sensing, ship detection.

I. INTRODUCTION

SHIP detection in spaceborne remote sensing images is of vital importance for maritime security and other applications, e.g., traffic surveillance, protection against illegal fisheries, oil discharge control, and sea pollution monitoring [1]. Vessel monitoring from satellite images provides a wide visual field, covers a large sea area, and thus achieves continuous monitoring of vessels' locations and movements [2].

Manuscript received November 18, 2013; revised April 20, 2014; accepted June 29, 2014. This work was supported in part by the National Natural Science Foundation of China under Grant 61301090, by the Beijing Excellent Talent Fund under Grant 2013D009011000001, and by the Excellent Young Scholars Research Fund of Beijing Institute of Technology under Grant 2013YR0508.

J. Tang, C. Deng, and B. Zhao are with the School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, China (e-mail: [email protected]).

G.-B. Huang is with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TGRS.2014.2335751

It is also known that optical spaceborne images have higher resolution and more visualized contents than other remote sensing images, which makes them more suitable for ship detection or recognition in the aforementioned applications.

However, optical spaceborne images usually suffer from two main issues: 1) weather conditions like clouds, mists, and ocean waves result in more pseudotargets for ship detection, and 2) optical spaceborne images with higher resolution naturally carry a larger data quantity than other remote sensing images and are thus more difficult to handle in real-time applications.

Previous works have established some basic ship detection frameworks. Lure and Rau [3] and Weiss et al. [4] proposed hybrid detection systems based on ship tracks in Advanced Very High Resolution Radiometer (AVHRR) imagery. The works in [5]–[7] presented several approaches to ship detection and recognition from airborne infrared images with sky–sea backgrounds. Burgess [8] introduced a vessel detection algorithm for Satellite Pour l'Observation de la Terre (SPOT) Multispectral and Landsat Thematic Mapper images. Wu et al. [9] analyzed the characteristics of different satellite remote sensing images. These works cannot solve the first issue well even though they have low complexity.

On the other hand, some other methods achieve better ship detection performance at the price of high computational complexity. Corbane et al. used a neural network to classify small ship candidates from SPOT-5 High Resolution Geometric 5-m images [10] and presented a complete processing chain for ship detection [2], in which morphological filtering is combined with wavelet analysis and the Radon transform to better distinguish ships from surrounding turbulence. Bi et al. [11] presented a statistical salient-region-based algorithm, in which multiresource regions of interest are extracted to improve the ship–sea segmentation. Zhu et al. [1] adopted an ensemble learning algorithm, in which multiple high-dimensional local features (679 dimensions) are extracted from potential targets and classified using a support vector machine (SVM). Each of these techniques improves a specific procedure in either preprocessing or classification and achieves better performance than classical methods. However, the aforementioned second issue has not been fully addressed. In real-time applications with limited resources, e.g., satellite-based object detection/recognition, performance and computational complexity should be equally taken into account.

Unlike the previous works, which usually focus on one of the two issues, our approach aims to solve both. As for the first one, a deep neural network (DNN) is applied to obtain high-level features and overcome the limitations of existing ship detection methods. According to the recent works in [12]–[18], a deep architecture has multiple levels of feature representation, and the higher levels represent more abstract information. Conceptually, a DNN trains multiple hidden layers with unsupervised initialization, and after such initialization, the entire network is fine-tuned by a supervised backpropagation algorithm [12], [14]. Practically, a DNN can be implemented with autoencoders, which reconstruct the input by using the corresponding output of the network [12]. In addition, learning sparse representations from original image pixels leads to better classification performance than using raw pixels or nonsparse representations obtained by principal component analysis. Moreover, by exploiting the sparsity of wavelet features, feature detectors localized in both time and frequency domains can be obtained [19], [20].

As for the second issue, the compressed domain is adopted to improve detection efficiency and to utilize sparse features from both the space and frequency domains. Instead of reducing processing time by passively cutting an image into tiles or scaling it to a low-resolution version, extracting wavelet features from the JPEG 2000 codec can substantially increase detection efficiency. In addition, the discrete wavelet transform (DWT) has its unique advantages: multiresolution analysis and singularity detection. The human visual system detects objects based on two fundamental properties: 1) observation is performed simultaneously in space and frequency, and 2) images are perceived in a multiscale manner so that different elements can be focused on at different scales [21]. After the 2-D DWT, the original image is decomposed into a low-frequency subband (denoted as LL) and several horizontal/vertical/diagonal high-frequency subbands (denoted as LH, HL, and HH). Different subbands describe different sparse features of the input image. In our work, wavelet-domain sparsity is further exploited for DNN-based feature extraction.

Until recently, few works had been done on object detection in the JPEG 2000 compressed domain. Xiong and Huang [22] extracted wavelet-based texture features from the compressed domain. Delac et al. [23] analyzed image compression effects in face recognition systems and proposed a face recognition method in the JPEG 2000 compressed domain with independent component analysis and principal component analysis [24]. Ni [25] developed the concept of an information tree and proposed a novel tree-distance measurement for JPEG 2000 compressed-domain image retrieval. Teynor et al. [26] presented a nonuniform image retrieval method using color/texture features and modeling the wavelet coefficient distribution with Gaussian mixture densities. Zargari et al. [27] exploited packet header information for JPEG 2000 image retrieval, including the number of nonzero bit planes, the number of coding passes, and the code block length.

However, the aforementioned approaches face difficulty in handling ship detection under various conditions, and in particular the high-frequency subbands are not effectively utilized. In the proposed work, the low-frequency subband and the high-frequency subbands are exploited for feature extraction using two DNNs. The singularities of LL are extracted to train the first DNN; as LH, HL, and HH describe the image details in different orientations (horizontal, vertical, and diagonal), these subbands are combined before training the second DNN. Then, the outputs of the two DNNs are pooled for final decision making by an extreme learning machine (ELM) [28], [29]. ELM is a novel and efficient training algorithm for the single-hidden-layer neural network. Compared with traditional neural network or SVM methods, ELM can be trained hundreds of times faster since its input weights and hidden node biases are randomly generated and the output weights are analytically computed.

The contributions of the proposed work are threefold: 1) a new compressed-domain framework is developed for fast ship detection; 2) DNN is employed for hierarchical ship feature extraction in the wavelet domain, and compared with existing feature descriptors, the proposed learning-based features are more robust under varying conditions; and 3) a new training model, the ELM, is adopted for feature fusion and classification, and thus, faster and better ship detection is achieved.

Using these novel techniques, the proposed framework is more suitable for ship detection than the aforementioned approaches, with the following advantages.

1) Faster detection. The compressed domain achieves much faster detection than the pixel domain.

2) More reliable results. High-level feature representations are extracted by the hierarchical deep architecture to ensure more accurate classification.

3) Better utilization of information. Two DNNs are trained with multisubband coefficients to make full use of the wavelet information.

The remainder of this paper is organized as follows. Section II overviews the proposed framework; Section III describes the preprocessing method for coarse ship locating; Section IV explains the ship feature representation in detail, including the selection of the inputs of the autoencoders, fine tuning, and ELM-based decision making for the deep networks; Section V demonstrates the simulation results comparing the proposed method and other relevant state-of-the-art ship detection methods; and Section VI concludes this paper.

II. PROPOSED METHOD

The typical JPEG 2000 compression pipeline is shown in Fig. 1. To clearly illustrate the proposed approach, it is necessary to define the compressed domain in advance. According to the work in [23], the compressed domain is anywhere in the compression or decompression procedure after the transform or before the inverse transform. Therefore, object detection can be conducted in the compressed domain at points 1 to 6 in Fig. 1.

Unlike the other points, entropy coding (points 3 and 4 in Fig. 1) obviously changes the spatial distribution of the object features and destroys the structure information. Hence, points 1, 2 and 5, 6 are more suitable for ship detection. Furthermore, as points 5, 6 are symmetric to points 1, 2 in the codec implementation, only points 1, 2 are discussed hereinafter.

At the encoder side, the DWT is first performed (point 1 in Fig. 1). Then, the resulting coefficients are mapped to different bit planes by quantization (point 2 in Fig. 1). The bit-plane encoding does not obviously change the properties of the wavelet features [30], and thus, the detection accuracy is not severely affected. Based on this analysis, point 1 is viewed as the ideal place for ship detection.

Fig. 1. JPEG2000 image compression.

Fig. 2. Proposed ship detection framework.

The block diagram of the proposed framework is depicted in Fig. 2. It can be decomposed into two main steps: image preprocessing (for coarse ship locating), and feature representation and classification (for ship object detection).

In the preprocessing, CDF 9/7 wavelet coefficients are extracted from the JPEG 2000 codec. The wavelet coefficients in different subbands tend to reflect different properties of the original image [31]. Generally speaking, the low frequency contains most of the global information, while the high frequency represents local or detail information. In the proposed model, the low-frequency subband LL is exploited for the extraction of the regions of ship candidates.

On the other hand, the low-frequency coefficients and high-frequency coefficients (HFCs) are individually processed for feature extraction by two DNNs, which are discussed in Section IV. Moreover, to fully exploit the information of the original image in the wavelet domain, the resulting features from the low and high subbands are further fused by ELM for more accurate feature classification (i.e., higher ship detection accuracy). The detailed implementation is as follows.

1) Wavelet singularities of LL are detected to train a stacked denoising autoencoder (SDA1). Note that the SDA is one of the implementation strategies of DNN and will be introduced in Section IV-A.

2) The combination of the wavelet coefficients in the high-frequency subbands (i.e., LH, HL, and HH) is used to train SDA2.

3) The weight matrices of the trained SDAs are considered as feature extractors for the low- and high-frequency subbands, respectively. The obtained features are then combined to train an online sequential ELM (OS-ELM) [28], [29], [32], [33].

It should be mentioned that the third step can be regarded as decision pooling of SDA1 and SDA2, or as training a high-performance classifier of ship features. Since we aim to make our algorithm more robust to various environmental conditions, online training is adopted to further improve the network's performance. The experiments in [29], [34], and [35] showed that ELM is fast and more accurate in large-class training and that the generalization performance of ELM is very stable.

III. COARSE SHIP LOCATING

As shown in Fig. 2, fast ship locating (i.e., ship candidate extraction) is performed in the LL subband, which includes image enhancement, sea–land segmentation, and ship locating based on shape criteria.


Fig. 3. Coarse ship segmentation: (a) input image, (b) top-hat-transformed image, (c) binarized with the first threshold T, (d) corrected with the second threshold T′, (e) refined by morphological dilation and erosion, and (f) coarse ship location.

A. Image Enhancement

In order to remove uneven illumination, a morphological operator, i.e., the top-hat transform (THT), is used for ship extraction and background suppression. As ships are usually brighter than their surroundings, the white THT is employed in the proposed work [shown in Fig. 3(b)]. The mathematical definition of the white THT is

T_w(f) = f − f ∘ b    (1)

where f is the input LL coefficients of the original image, ∘ denotes the opening operation, and T_w is the enhanced image. In the simulations, b is set as a circular structuring element with a radius of 12.
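
For concreteness, the following is a minimal sketch of this enhancement step using scikit-image; the function name enhance_ll and the array ll_band are illustrative, and only the circular structuring element of radius 12 comes from the text.

```python
import numpy as np
from skimage.morphology import disk, white_tophat

def enhance_ll(ll_band: np.ndarray, radius: int = 12) -> np.ndarray:
    """White top-hat enhancement of the LL subband, cf. (1)."""
    b = disk(radius)                 # circular structuring element, radius 12
    return white_tophat(ll_band, b)  # T_w = f - (f opened by b)
```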

B. Sea–Land Segmentation

Different from the traditional intensity-histogram and maximum-variance segmentation, here, a statistical Gaussian model is adopted to adaptively estimate the probabilistic distribution of the sea regions [36], and the algorithm is as follows.

1) Binarize the input image by the Otsu algorithm [37], and then label the connected regions.

2) Find the geometrical center P of the largest connected region R.

3) Using point P as the starting point, traverse R to obtain another set of points P′ such that the A × A (A is empirically set as 60 in the experiments) neighboring regions of P′ are inside the region R. Label the points P′ as the all-sea region S.

4) Compute the mean μ and variance σ of S, and use them as the statistical parameters of the Gaussian model.

The resulting μ and σ are used to compute a threshold T for image binarization as

T = μ + λσ    (2)

where λ is the weight of the variation (σ) and is set as three according to the Gaussian distribution.

The binarized image obtained with T [shown in Fig. 3(c)] usually retains holes in large land or cloud regions. In this case, a new threshold T′ for the elimination of hole regions and incorrectly marked land is chosen as

T′ = λ′σ    (3)

where λ′ is a parameter that controls the similarity of land and sea, empirically set as four. After thresholding with T′ [shown in Fig. 3(d)], median filtering (with a 3 × 3 window), morphological dilation, and erosion (circular structuring element with a radius of three) are applied to fill the isolated holes. Then, the masks of land, clouds, and ship candidates are segmented [shown in Fig. 3(e)]. In the following, ship candidates are further extracted by using the unique shape properties of ships. Note that some pseudotargets may be included in the extracted regions; however, they can be removed in the process of feature fusion and classification in Section IV.
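
The sketch below strings these steps together with SciPy and scikit-image. It is a simplified reading of the procedure: the all-sea sampling around the centroid and the way the secondary threshold T′ is applied are approximations of the textual description, and the function name segment_sea is illustrative.

```python
import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion, median_filter
from skimage.filters import threshold_otsu
from skimage.measure import label, regionprops
from skimage.morphology import disk

def segment_sea(enhanced, half_win=30, lam=3.0, lam_prime=4.0):
    """Adaptive Gaussian sea model, a sketch of Section III-B."""
    # 1) Otsu binarization and connected-component labeling
    labels = label(enhanced > threshold_otsu(enhanced))
    largest = max(regionprops(labels), key=lambda r: r.area)

    # 2)-3) sample an all-sea window around the centroid of the largest region
    cy, cx = map(int, largest.centroid)
    sea = enhanced[cy - half_win:cy + half_win, cx - half_win:cx + half_win]

    # 4) Gaussian statistics and the two thresholds of (2) and (3)
    mu, sigma = sea.mean(), sea.std()
    mask = enhanced > (mu + lam * sigma)      # T  = mu + 3*sigma
    mask &= enhanced > (lam_prime * sigma)    # T' = 4*sigma, removes hole/land residue

    # 3x3 median filtering and morphological cleanup with a radius-3 disk
    mask = median_filter(mask.astype(np.uint8), size=3).astype(bool)
    mask = binary_erosion(binary_dilation(mask, disk(3)), disk(3))
    return mask
```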

C. Ship Locating Criteria

In the previous section, several connected regions were extracted from the resulting masks by labeling the eight-connected neighbors. Geometric properties of the connected regions are then used for locating ship candidates; they are listed as follows [37], [38].

1) Area: It equals the number of pixels in the corresponding connected region. Area is used to cut off lands, clouds, and other obviously large/small false targets.

2) Major–minor axis ratio: It is defined as

R_ls = L_axisL / L_axisS    (4)

where L_axisL and L_axisS are the lengths of the long and short axes of the bounding rectangle, respectively.

3) Compactness: Compactness measures the degree of circular similarity, and it is defined as

Compactness = Perimeter² / Area.    (5)

By using these shape criteria, we can obtain the coarse locations of ship candidates [shown in Fig. 3(f)]. In the experiments, the size of the testing images is 2000 × 2000 (in pixels) with a resolution of 5 m. The size of ship candidates is supposed to be smaller than 100 × 100 (and larger than 10 × 10). In this case, regions with an area larger than 10 000 (or smaller than 100) are removed. Moreover, as the long axis of a ship should be longer than the minor one, the major–minor axis ratio threshold is selected as 1.5. The compactness threshold is set as 40 to exclude regions which are obviously irregular.
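
A minimal sketch of this shape filtering with scikit-image region properties follows; the thresholds are the ones quoted above, while the function name locate_ship_candidates and the returned bounding boxes are illustrative.

```python
import numpy as np
from skimage.measure import label, regionprops

def locate_ship_candidates(mask, area_range=(100, 10_000),
                           min_axis_ratio=1.5, max_compactness=40.0):
    """Keep 8-connected regions whose shape matches the ship criteria."""
    candidates = []
    for region in regionprops(label(mask, connectivity=2)):  # 8-connected labeling
        area = region.area
        axis_ratio = region.major_axis_length / max(region.minor_axis_length, 1e-6)
        compactness = region.perimeter ** 2 / max(area, 1)   # Perimeter^2 / Area, see (5)
        if (area_range[0] <= area <= area_range[1]
                and axis_ratio >= min_axis_ratio
                and compactness <= max_compactness):
            candidates.append(region.bbox)  # coarse ship location (bounding box)
    return candidates
```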

It is also worth noting that, compared with using the original images, using the low-frequency subband (LL) for coarse ship locating decreases the detection accuracy by 0.32% (on statistical average), but the detection speed is improved by more than 60%.

IV. SHIP FEATURE REPRESENTATION AND CLASSIFICATION

The state-of-the-art ship detection approaches extract complicated features and combine them with learning-based classification. These feature operators or descriptors, e.g., the scale-invariant feature transform [39], speeded-up robust features [40], the histogram of oriented gradients [41], the local multiple pattern (LMP) in [1], and the shape/texture features in [11], are engineered to be invariant under certain rotations or scale variations and are chosen for specific vision tasks.

Features extracted by these methods generally have some fundamental limitations in practical applications. For example, they may perform poorly when the images are corrupted by blur, distortion, or illumination changes, which commonly occur in remote sensing images. In contrast, learning features from the image itself helps to tackle these issues. Recent works in [42] and [43] have shown that features extracted by unsupervised learning outperform manually designed ones in object detection or recognition.

However, ship detection is usually performed under complicated environmental conditions, and the processed images may contain various pseudotargets, e.g., islands, clouds, coastlines, etc. Bengio et al. [44]–[46] indicated that traditional machine learning algorithms, e.g., SVM, may have difficulties in efficiently handling such highly varying inputs. These learning schemes usually use a few layers of computational units to establish the training model; when dealing with highly variant conditions, the computation increases exponentially.

DNN, which decomposes its inputs through multiple nonlinear processing layers, achieves better performance with far fewer parameters in each layer. For example, representing d inputs requires O(2^d) parameters for an SVM with a Gaussian kernel, O(d²) parameters for a single-layer neural network, O(d) parameters for a multilayer network with O(log₂ d) layers, and O(1) parameters for a recurrent neural network. In other words, DNN obtains much better feature representation than traditional learning methods using the same number of parameters.

With the aid of its pyramid-like hierarchical architecture, sparse features can be extracted by DNN. Since a higher hidden layer is sparser than a lower one, high-level representations are obtained by this layer-by-layer extraction. The works in [12]–[18] have shown that DNN achieves excellent performance in vision and other applications.

Hence, our ship detection is based on a deep architecture. Since, in the wavelet domain, different subbands contain different information about the original image, the features of the low and high frequencies are learned using two DNNs, respectively. The resulting features are then fused by ELM [28] at the top level, and the reason for selecting ELM as the final pooling layer is discussed in Section IV-D.

A. Introduction of SDA

As shown in Fig. 2, SDAs [14] are used for training the ship feature extractors in the wavelet domain. In this section, we first briefly overview the concepts and theory of the SDA.

The SDA used in our work is one of the implementation strategies of DNN, and related research has gained great traction. It is based on the traditional autoencoder but uses corrupted inputs rather than the original ones. The work in [14] proved that, by using corrupted inputs, the SDA can achieve better learning accuracy than the original autoencoder. Practically, the denoising autoencoder is used as the building block of the SDA by feeding it the latent representation of the layer below. Each level produces a representation of the input pattern that is more abstract than that of the previous one [47].

The denoising autoencoder maps the input vector x ∈ [0, 1]^d to a higher level (latent) representation y ∈ [0, 1]^{d′} through a deterministic mapping y = f_θ(x) = s(Wx + b), parameterized by θ = {W, b}, where s(·) is the activation function, W is a d′ × d weight matrix, and b is a bias vector. The resulting latent representation y is then mapped back to a reconstructed vector z ∈ [0, 1]^d in the input space, z = g_{θ′}(y) = s(W′y + b′), with θ′ = {W′, b′}. The reverse mapping weight matrix W′ can optionally be constrained by W′ = W^T, and in this case, the autoencoder is said to have tied weights.

Each training sample x^(i) is mapped to a corresponding y^(i) and a reconstruction z^(i). The objective function of the autoencoder is

θ*, θ*′ = argmin_{θ,θ′} (1/n) Σ_{i=1}^{n} L(x^(i), z^(i)) = argmin_{θ,θ′} (1/n) Σ_{i=1}^{n} L(x^(i), g_{θ′}(f_θ(x^(i))))    (6)

where L is a loss function, e.g., the traditional squared error L(x, z) = ‖x − z‖².

An alternative loss, i.e., the average reconstruction cross-entropy, is

L_H(x, z) = H(B_x ‖ B_z) = −Σ_{k=1}^{d} [x_k log z_k + (1 − x_k) log(1 − z_k)]    (7)

where H denotes the cross-entropy and B represents the probabilities of the training samples [14].

Combining (6) with L = L_H, the final objective function that we optimize can be written as

θ*, θ*′ = argmin_{θ,θ′} E_{q^0(X)} [L_H(X, g_{θ′}(f_θ(X)))]    (8)

where q^0(X) denotes the distribution associated with the n training samples and E denotes the expectation operator. This optimization can be solved by stochastic gradient descent, a widely used method for minimizing an objective function [like the one in (8)] that can be written as a sum of differentiable functions. Generally, stochastic gradient descent can avoid local minima of the training error and thus achieves good classification performance.

Fig. 4. Visualization of feature representation in different levels.
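
To make this concrete, here is a compact NumPy sketch of a single denoising-autoencoder layer with tied weights, trained by stochastic gradient descent on the cross-entropy loss (7). The 20% input corruption anticipates the setting quoted in Section IV-C; the class name, layer sizes, learning rate, and initialization are illustrative rather than the authors' exact configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

class DenoisingAutoencoder:
    """One SDA building block with tied weights (W' = W^T) and loss (7)."""

    def __init__(self, n_visible, n_hidden, corruption=0.2, lr=0.1):
        self.W = rng.normal(0.0, 0.01, size=(n_hidden, n_visible))
        self.b = np.zeros(n_hidden)    # encoder bias
        self.c = np.zeros(n_visible)   # decoder bias
        self.corruption, self.lr = corruption, lr

    def sgd_step(self, x):
        """One stochastic gradient step on a batch x with entries in [0, 1]."""
        x_tilde = x * (rng.random(x.shape) > self.corruption)  # drop ~20% of inputs
        y = sigmoid(x_tilde @ self.W.T + self.b)                # encode
        z = sigmoid(y @ self.W + self.c)                        # decode, tied weights
        dz = z - x                                              # grad of (7) w.r.t. decoder pre-activation
        dy = (dz @ self.W.T) * y * (1.0 - y)                    # backprop through encoder
        self.W -= self.lr * (dy.T @ x_tilde + y.T @ dz) / len(x)
        self.b -= self.lr * dy.mean(axis=0)
        self.c -= self.lr * dz.mean(axis=0)
        return -np.mean(np.sum(x * np.log(z + 1e-9)
                               + (1 - x) * np.log(1 - z + 1e-9), axis=1))
```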

B. Inputs of SDAs

As mentioned in Section II and shown in Fig. 2, the low-frequency (LL) subband and the high-frequency (LH, HL, and HH) subbands are used to train two SDAs, respectively.

Singularities represent the sparse structures of LL, and therefore, they are extracted to train SDA1. As LH, HL, and HH already reflect the sparseness of the image, they are combined and used as the inputs of SDA2.

Before training, the input data need to be initialized by a zero-mean or z-score normalization [48]

Z = (M − mean(M)) / std(M)    (9)

where M denotes the input feature matrix and mean and std denote the 2-D mean and standard deviation, respectively. The z-score follows the standard normal distribution, i.e., each dimension of the data has zero mean and unit standard deviation. This helps to correct outlier data points and remove lighting effects, which may significantly improve the training performance.

1) SDA1—Low-Frequency Module: The DWT provides localized information about the variation of the image around a certain point, i.e., its local regularity. Therefore, irregularities are sharpened after the DWT. More exactly, discontinuities in the original image result in local maxima in the wavelet domain [21]. It has also been demonstrated that finding wavelet modulus local maxima is an effective way to detect singularities [31].

Singularities and irregular structures often contain the most important information, and thus, they are particularly meaningful for object recognition. Moreover, the characteristics of singularities suit ship detection, as a ship's intensity distribution is distinct from that of the background.

Based on the aforementioned analysis, we use the local maxima of LL as the inputs of SDA1.

2) SDA2—High-Frequency Module: The HFCs contain various object details, e.g., edge-like lines and distinct points, in different orientations. They reflect the sparse features of the original image. Hence, the coefficients of the high-frequency subbands are utilized for training SDA2.

Practically, the high-frequency subbands are combined as

I_H = α · LH + β · HL + γ · HH,  s.t.  α + β + γ = 1    (10)

where α, β, and γ denote the weight parameters of the different subbands. These parameters are set equally to avoid artificially biasing the weight of any subband. Note that symmetric even-order wavelet decomposition filters are used here, where the CDF 9/7 wavelet is applied.
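
As an illustration of how these inputs could be produced outside the codec, the sketch below computes a single-level 2-D DWT with PyWavelets and forms the equally weighted combination of (10); the 'bior4.4' filters are commonly identified with the CDF 9/7 wavelet, and the function name wavelet_inputs is illustrative.

```python
import numpy as np
import pywt

def wavelet_inputs(image, weights=(1/3, 1/3, 1/3)):
    """Single-level 2-D DWT and the high-frequency combination of (10)."""
    ll, (lh, hl, hh) = pywt.dwt2(image.astype(np.float64), 'bior4.4')
    alpha, beta, gamma = weights                  # alpha + beta + gamma = 1
    i_h = alpha * lh + beta * hl + gamma * hh     # input of SDA2
    return ll, i_h                                # LL feeds the SDA1 branch
```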

C. Pretraining and Fine Tuning

In this section, we introduce the details of SDA training for ship feature extraction in the low and high frequencies. Generally speaking, the SDA-based feature extractor involves two main steps: pretraining and fine tuning.

The unsupervised layer-by-layer pretraining helps to achieve good generalization and low variance of the testing error. Each layer is trained as a denoising autoencoder by minimizing the reconstruction error of its input (which is the output code of the previous layer).

Based on the recent works in [49]–[51], some additional parameters are set to further improve the performance of the SDA. Before training, the coefficients are scaled to [0, 1]^d, and the learning rate is set as 0.1. The number of training batches depends on the size of the data set, usually between 10 and 100. Different training batches should contain different classes of training samples to achieve better performance. Compared with the 5% noise typically used in SDA [14], the simulations in [51] indicated that it is better to drop out 20% of the inputs combined with 50% of the hidden units.

Once all of the layers are pretrained, the network needs a second stage of supervised training called fine tuning, which is used to minimize the prediction error. Practically, a logistic regression layer is added on top of the pretrained network.
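
A greedy layer-wise pretraining skeleton consistent with these settings is sketched below, reusing the DenoisingAutoencoder and sigmoid helpers from the sketch in Section IV-A; the layer sizes, epoch count, and batch size are illustrative assumptions, and the supervised fine-tuning stage is only indicated in the comments.

```python
import numpy as np

def pretrain_stack(data, layer_sizes=(400, 200, 100), epochs=15, batch=50):
    """Greedy layer-wise pretraining; sizes/epochs/batch are illustrative."""
    layers, x = [], np.clip(data, 0.0, 1.0)        # inputs scaled to [0, 1]
    for n_hidden in layer_sizes:
        dae = DenoisingAutoencoder(x.shape[1], n_hidden, corruption=0.2, lr=0.1)
        for _ in range(epochs):
            for start in range(0, len(x), batch):
                dae.sgd_step(x[start:start + batch])
        x = sigmoid(x @ dae.W.T + dae.b)           # codes feed the next layer
        layers.append(dae)
    # Supervised fine tuning would add a logistic-regression layer on top of
    # `layers` and backpropagate the prediction error through the whole stack.
    return layers, x                               # weights act as feature extractors
```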

Fig. 4 shows the feature representations at different levels. It can be seen that each layer's mapped feature is more meaningful than that of the previous one. The resulting feature in the low frequency contains more implicit information about the ship, while the high-frequency one contains more edges and structures. Their combination reflects the full representation of a ship, and it has a certain semantic meaning.

D. Feature Fusion With ELM

After the two SDAs have been trained, ELM is used for the fusion of the ship features obtained from the low and high frequencies, and the fused 100-dimensional feature vector is then utilized for classification and final decision making (i.e., ship detection).

ELM is a training algorithm for single-hidden-layer feedforward neural networks in which the input weights and hidden-layer biases are randomly set and need not be tuned. ELM not only learns extremely fast but also achieves good generalization performance [28], [29]. In addition, when the training data arrive one by one or chunk by chunk, OS-ELM [32], [33] is preferred, since retraining is not required whenever a new chunk of data is received.

In the following, we introduce the ELM and OS-ELM algorithms in detail.

1) ELM Training: Given a training set ℵ = {(x_i, t_i) | x_i ∈ R^n, t_i ∈ R^m, i = 1, ..., N}, where x_i is the feature vector extracted above and t_i represents the class label of each sample, let G(·) denote the activation function and L the number of hidden nodes. The ELM training algorithm can be summarized as follows [28].

1) Randomly assign the hidden node parameters, e.g., input weights w_i and biases b_i, i = 1, ..., L.

2) Calculate the hidden-layer output matrix H.

3) Calculate the output weight β:

β = H†T    (11)

where T = [t_1, ..., t_N]^T and H† is the Moore–Penrose generalized inverse of the matrix H [29].

The orthogonal projection method can be efficiently used in ELM: H† = (H^T H)^{-1} H^T if H^T H is nonsingular, or H† = H^T (H H^T)^{-1} if H H^T is nonsingular. According to ridge regression theory [29], it was suggested that a positive value 1/λ be added to the diagonal of H^T H or H H^T in the calculation of the output weights β. The resulting solution is more stable and achieves better generalization performance. That is, in order to improve the stability of ELM, we can have

β = H^T (I/λ + H H^T)^{-1} T    (12)

and the corresponding output function of ELM is

f(x) = h(x)β = h(x) H^T (I/λ + H H^T)^{-1} T    (13)

or we can have

β = (I/λ + H^T H)^{-1} H^T T    (14)

and the corresponding output function of ELM is

f(x) = h(x)β = h(x) (I/λ + H^T H)^{-1} H^T T.    (15)

2) OS-ELM: The sequential implementation of the least squares solution in OS-ELM [32], [33] is as follows.

Step 1) Initialization: Initialize the learning using a small chunk of initial training data N_0 = {(x_i, t_i)}_{i=1}^{N_0} from the given training set ℵ = {(x_i, t_i) | x_i ∈ R^n, t_i ∈ R^m, i = 1, ..., N}.

a) Randomly generate the hidden node parameters (a_i, b_i), i = 1, ..., L.

b) Calculate the initial hidden-layer output matrix H_0:

H_0 = [ G(a_1, b_1, x_1)      ⋯   G(a_L, b_L, x_1)
        ⋮                      ⋱   ⋮
        G(a_1, b_1, x_{N_0})   ⋯   G(a_L, b_L, x_{N_0}) ]

c) Estimate the initial output weight β^(0) = P_0 H_0^T T_0, where P_0 = (H_0^T H_0)^{-1} and T_0 = [t_1, ..., t_{N_0}]^T.

d) Set k = 0.

Step 2) Sequential Learning:

a) Present the (k + 1)th chunk of new observations N_{k+1} = {(x_i, t_i)}, with i running from (Σ_{j=0}^{k} N_j) + 1 to Σ_{j=0}^{k+1} N_j, where N_{k+1} denotes the number of observations in the (k + 1)th chunk.

b) Calculate the partial hidden-layer output matrix H_{k+1} for the (k + 1)th chunk of data N_{k+1}.

c) Calculate the output weight β^(k+1):

P_{k+1} = P_k − P_k H_{k+1}^T (I + H_{k+1} P_k H_{k+1}^T)^{-1} H_{k+1} P_k    (16)

β^(k+1) = β^(k) + P_{k+1} H_{k+1}^T (T_{k+1} − H_{k+1} β^(k))    (17)

d) Set k = k + 1; go to Step 2)-a).
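
The recursion in (16) and (17) can be sketched in a few lines of NumPy; the function names are illustrative, H_0 and H_k are hidden-layer outputs produced as in the batch ELM sketch above, and no ridge term is included, following the plain OS-ELM formulation.

```python
import numpy as np

def os_elm_init(H0, T0):
    """Step 1: P0 = (H0^T H0)^{-1}, beta^(0) = P0 H0^T T0."""
    P = np.linalg.inv(H0.T @ H0)
    return P, P @ H0.T @ T0

def os_elm_update(P, beta, Hk, Tk):
    """Step 2c: recursive update of (16) and (17) for one new chunk."""
    S = np.linalg.inv(np.eye(len(Hk)) + Hk @ P @ Hk.T)
    P = P - P @ Hk.T @ S @ Hk @ P                  # (16)
    beta = beta + P @ Hk.T @ (Tk - Hk @ beta)      # (17)
    return P, beta
```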

V. EXPERIMENTS AND ANALYSIS

Extensive experiments are conducted in this section. Since SDA-based feature extraction and ELM-based feature fusion and classification are adopted in this work, we term the proposed method SDA-ELM; it is compared with the relevant state-of-the-art methods in [1] and [11]. In [1], multiple features are fused by SVM (denoted as MF-SVM), while in [11], salient regions are detected before SVM-based classification (denoted as SA-SVM). In addition, another method (SDA-based features combined with SVM-based classification) is also tested (denoted as SDA-SVM).

In the following sections, to verify the effectiveness of each component of the proposed method (i.e., ship locating, feature extraction, feature fusion, and classification), the performance of ship candidate segmentation is first tested; then, the proposed SDA-based feature extraction is compared with other feature representation methods; the classification performance of ELM is further evaluated against SVM by using different combinations of extracted ship features; and finally, the overall ship detection accuracy is compared to demonstrate the advantages of the proposed scheme under practical testing conditions.

For fair comparison, 4000 5-m-resolution SPOT 5 panchromatic images with a size of 2000 × 2000 (in pixels) are prepared to build an image data set to compare the performances of MF-SVM [1], SA-SVM [11], SDA-SVM, and SDA-ELM, and 1600 training samples are extracted for feature learning, as shown in Table I. It should be emphasized that the images from which the 1600 training samples are extracted are not included in the testing images.

TABLE I: CLASSIFICATION SAMPLES

Fig. 5. Performance comparison of ship candidate segmentation. (a) Original image. (b) Manually labeled ground truth (the white pixels indicate ship candidates, while the black pixels represent land/sea regions). (c) Results by the Chan–Vese model in [1]. (d) Results by the proposed method.

The testing hardware and software conditions are as follows: Intel i7 2.4-GHz CPU, 8-GB DDR3 RAM, Windows 7, Matlab R2012b, and Microsoft Visual Studio 2010.

A. Comparison of Coarse Ship Locating

The coarse ship locating is performed in the low-frequency subband LL. Fig. 5 shows the comparison of the segmentation results for ship candidates, and one can see that the proposed method achieves more accurate segmentation than the Chan–Vese model in [1]. In addition, we also conducted objective comparisons to evaluate the performances of the different methods. Three commonly used criteria were computed: the false positive rate (FPR), the false negative rate (FNR), and the false error (FE). They are defined as follows:

FPR(SR, GT) = #(SR ∩ ¬GT) / #(¬GT) × 100%

FNR(SR, GT) = #(¬SR ∩ GT) / #(GT) × 100%

FE(SR, GT) = FPR + FNR

where SR denotes the segmentation result (ship candidates), GT denotes the manually labeled ground truth, #(·) is the number of pixels in the corresponding region, and ¬GT and ¬SR denote the complements of GT and SR (i.e., the regions which are not included in GT and SR), respectively.

TABLE II: AVERAGE PERFORMANCE OF DIFFERENT SEGMENTATION METHODS

Fig. 6. Visualized 2-D space distributions of the first two principal components of different features. (a) Singularities of LL. (b) HFCs. (c) Shape/texture features of SA-SVM [11]. (d) LMPs of MF-SVM [1]. (e) Proposed SDA-based features. (f) Two-dimensional outputs of ELM pooling.
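
These pixel-level rates can be computed directly from binary masks; the short sketch below assumes sr and gt are boolean NumPy arrays of the same shape.

```python
import numpy as np

def segmentation_errors(sr: np.ndarray, gt: np.ndarray):
    """FPR, FNR, and FE (in percent) for boolean candidate/ground-truth masks."""
    sr, gt = sr.astype(bool), gt.astype(bool)
    fpr = 100.0 * np.count_nonzero(sr & ~gt) / max(np.count_nonzero(~gt), 1)
    fnr = 100.0 * np.count_nonzero(~sr & gt) / max(np.count_nonzero(gt), 1)
    return fpr, fnr, fpr + fnr
```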

The average comparison results of the proposed method and the Chan–Vese model in [1] are given in Table II. It is shown that the proposed model has a lower FPR, FNR, and FE (i.e., better performance).

B. Comparison of Feature Representations

In this experiment, the representation performances of different features are compared, including the LL singularities (LLSs), the HFCs, the LMPs in [1], the shape/texture features in [11], and the proposed SDA-based features.

Principal component analysis [52] is used for visualizing the different features in 2-D space. Fig. 6 shows the first two principal components of each feature, where the red points represent ships and the green ones represent the other subclasses shown in Table I.
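
Such a visualization can be reproduced with a standard PCA projection, e.g., with scikit-learn; feature_matrix and labels in the usage comment are illustrative names.

```python
import numpy as np
from sklearn.decomposition import PCA

def project_2d(features: np.ndarray) -> np.ndarray:
    """Project feature vectors onto their first two principal components."""
    return PCA(n_components=2).fit_transform(features)

# Usage: xy = project_2d(feature_matrix); scatter xy[labels == 1] (ships, red)
# against xy[labels == 0] (other subclasses, green), as in Fig. 6.
```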

As can be seen, the distributions of LLS and HFC are completely blended together. The feature points of [11] spread only slightly farther apart in the Cartesian coordinates, and a large number of feature points still overlap. LMP outperforms the aforementioned features; nearly half of the red points are separated from the green ones. The proposed SDA-based feature performs the best; almost all points are well separated. Note that the outputs nearly converge after the feature vector is clustered by ELM [Fig. 6(f)].

Fig. 7. Classification comparison of different features and learning algorithms.

The reason for the poor performances of LLS and HFC is obvious; they are simply the raw DWT coefficients extracted from the JPEG 2000 codec. As for the features used in SA-SVM [11], the feature vector includes some simple and straightforward shape or texture information of the objects, such as size, contrast, and energy, and clouds and ships may share similarities in shape (e.g., object size, major–minor axis ratio, etc.) and texture (e.g., contrast, energy, etc.). Unlike the others, LMP obtains more precise information about the samples by enlarging the dimension of its feature vector. The proposed SDA-based features capture the unique characteristics of ships through multilayer feature learning.

C. Comparison of Feature Fusion and Classification

In this section, the classification performance of ELM is compared against that of SVM by using different combinations of extracted features. The classification accuracy of each method is computed as

T = (Number of correctly classified samples / Number of tested samples) × 100%.

The experiments are based on k-fold cross-validation [53], which provides a better Monte Carlo estimate than a simple random split of the data set. First, the data set is randomly split into k mutually exclusive subsets S_1, S_2, ..., S_k of approximately equal size. Then, the classifier is trained and tested k times; for each test t ∈ {1, 2, ..., k}, it is trained on S without S_t and tested on S_t. As our training data set has 1600 samples, we select k = 4. The classification accuracy of each feature is shown in Fig. 7.
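
A minimal sketch of this protocol with scikit-learn follows; clf stands for any classifier exposing fit/predict (e.g., an SVM or an ELM wrapper), and the KFold split matches the random partition described above.

```python
import numpy as np
from sklearn.model_selection import KFold

def cross_validated_accuracy(clf, X, y, k=4, seed=0):
    """Train on S without S_t, test on S_t, and average over the k folds."""
    accs = []
    for train_idx, test_idx in KFold(n_splits=k, shuffle=True,
                                     random_state=seed).split(X):
        clf.fit(X[train_idx], y[train_idx])
        accs.append(np.mean(clf.predict(X[test_idx]) == y[test_idx]))
    return 100.0 * float(np.mean(accs))
```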

Owing to its high-dimensional (679) feature vector, MF-SVM achieves better performance than SA-SVM by 4.5%. It is not surprising that SDA-SVM and SDA-ELM perform better than the other methods, since the features visualized in Fig. 6 are well separated. The results further demonstrate the effectiveness of the proposed SDA-based features. Moreover, ELM outperforms SVM in terms of learning accuracy, which is also consistent with the conclusions in [28] and [29].

To make full use of the information in the low- and high-frequency subbands, in the proposed framework, ELM is adopted to fuse the features extracted by the corresponding two SDAs. An additional experiment has been conducted to verify the effectiveness of the fused feature, and the results are shown in Fig. 8. It can be seen that the performance (in terms of classification error) of the proposed method using both the low-frequency coefficients and the HFCs (denoted as (SDA1+SDA2)-ELM) outperforms that of the methods using only the low-frequency coefficients (denoted as SDA1-LOW-ELM) or only the HFCs (denoted as SDA2-HIGH-ELM).

Fig. 8. Classification error of different features from low and high frequencies.

TABLE III: CLASSIFICATION TIME COST USING DIFFERENT METHODS

TABLE IV: DETECTION PERFORMANCES OF DIFFERENT METHODS

Apart from classification accuracy, the feature extraction, training, and testing times of SA-SVM [11], MF-SVM [1], SDA-SVM, and SDA-ELM are compared, and the results are listed in Table III. Owing to the relatively compact feature vector extracted in the wavelet domain, the extraction times of SDA-SVM and SDA-ELM are much shorter than those of SA-SVM and MF-SVM. The training time of SDA-ELM is shorter than those of SA-SVM, MF-SVM, and SDA-SVM by 50%, and the testing time is shorter by 30%. This advantage grows when a larger data set is used. That is, our approach performs better in extraction, training, and testing time, and these results benefit from the aforementioned advantages: faster feature extraction and the higher learning efficiency of ELM.


Fig. 9. Detection results of different methods under different experimental conditions. (Rows 1–4) Land, little clouds, cotton-shaped cloud with mist, and a large area of floccus. (Column 1) Input images. (Column 2) Coarse locations of ship candidates. (Column 3) Classification results of MF-SVM [1]. (Column 4) Classification results of SA-SVM [11]. (Column 5) Classification results of our method.

D. Comparison of Overall Detection Performances

Finally, the detection performances of MF-SVM [1], SA-SVM [11], SDA-SVM, and SDA-ELM are compared. For fair comparison, the coarse ship locating described in Section III is applied with the MF-SVM, SA-SVM, and SDA-SVM algorithms. The evaluation criteria are defined as [1]

Accuracy = (Number of correctly detected ships / Number of real ships) × 100%

Missing ratio = 100% − Accuracy

False ratio = (Number of falsely detected candidates / Number of detected ships) × 100%

Error ratio = Missing ratio + False ratio.

The results are shown in Table IV, and one can see that SDA-ELM achieves the best performance. In addition, Fig. 9 shows several detection results (SDA-ELM against the two existing methods) on optical spaceborne images with various false targets: islands, small clouds, mist, large floccus, etc. One can see that SA-SVM and MF-SVM cannot adapt well to large complicated areas. SA-SVM eliminates part of the pseudotargets but also some ships in the salient region. MF-SVM detects some of the ships while several false targets still remain. Compared with SA-SVM and MF-SVM, the proposed approach achieves the best results under various weather conditions.

VI. CONCLUSION

In this paper, we have proposed a compressed-domain ship detection framework using DNN and ELM for optical spaceborne images. Compared with previous works, the proposed approach achieves better classification through a deep-learning-based feature representation model and faster detection in the compressed domain. After ship candidates are extracted, the singularities in LL are detected to train SDA1. Then, the combination of the high-frequency components (i.e., LH, HL, and HH) is used to train SDA2. The two SDAs are viewed as feature extractors that obtain high-level features, and the resulting features are fused by ELM to further improve the classification results. ELM learns extremely fast and has better generalization than other traditional learning algorithms. Extensive experiments demonstrate that the proposed scheme outperforms the state-of-the-art methods in terms of detection time and accuracy.

As for the possible shortcomings of the proposed work, the parameters in coarse ship locating should be made more adaptive to the image contents. In addition, owing to the availability of image data sets, the simulations in the proposed work are conducted using panchromatic images; other remote sensing images could be further tested or verified in future work.

Moreover, in the experiments, images with a resolution of 5 m are used. In this case, the ships that we succeed in detecting may be larger than 50 m (10 × 10 pixels). However, the limitation on the size of the detected ships is not induced by the proposed framework; it is mainly due to the resolution of the original images. In other words, when 5-m-resolution images are used for ship detection, it is not reasonable to detect smaller ships (for example, 20 m), as very limited object details/features can be extracted from such a small region (4 × 4 pixels).

Finally, since the object features are extracted and fused by DNN and ELM, the extracted features are highly robust. Thus, the proposed framework is expected to work well for multispectral or synthetic aperture radar images. Our future work may focus on applying the proposed method to ship detection from multiple sensors.

REFERENCES

[1] C. Zhu, H. Zhou, R. Wang, and J. Guo, "A novel hierarchical method of ship detection from spaceborne optical image based on shape and texture features," IEEE Trans. Geosci. Remote Sens., vol. 48, no. 9, pp. 3446–3456, Sep. 2010.

[2] C. Corbane, L. Najman, E. Pecoul, L. Demagistri, and M. Petit, "A complete processing chain for ship detection using optical satellite imagery," Int. J. Remote Sens., vol. 31, no. 22, pp. 5837–5854, Jul. 2010.

[3] F. Y. Lure and Y.-C. Rau, "Detection of ship tracks in AVHRR cloud imagery with neural networks," in Proc. IEEE IGARSS, 1994, vol. 3, pp. 1401–1403.

[4] J. M. Weiss, R. Luo, and R. M. Welch, "Automatic detection of ship tracks in satellite imagery," in Proc. IEEE IGARSS, 1997, vol. 1, pp. 160–162.

[5] A. N. de Jong, "Ship infrared detection or vulnerability," in Proc. SPIE, 1993, pp. 216–224.

[6] J.-W. Lu, Y.-J. He, H.-Y. Li, and F.-L. Lu, "Detecting small target of ship at sea by infrared image," in Proc. IEEE Int. CASE, 2006, pp. 165–169.

[7] C. DeSilva, G. Lee, and R. Johnson, "All-aspect ship recognition in infrared images," in Proc. Electron. Technol. Directions Year 2000, 1995, pp. 194–198.

[8] D. W. Burgess, "Automatic ship detection in satellite multispectral imagery," Photogramm. Eng. Remote Sens., vol. 59, no. 2, pp. 229–237, 1993.

[9] G. Wu, J. de Leeuw, A. K. Skidmore, Y. Liu, and H. H. Prins, "Performance of Landsat TM in ship detection in turbid waters," Int. J. Appl. Earth Observ. Geoinf., vol. 11, no. 1, pp. 54–61, Feb. 2009.

[10] C. Corbane, F. Marre, and M. Petit, "Using SPOT-5 HRG data in panchromatic mode for operational detection of small ships in tropical area," Sensors, vol. 8, no. 5, pp. 2959–2973, 2008.

[11] F. Bi, F. Liu, and L. Gao, "A hierarchical salient-region based algorithm for ship detection in remote sensing images," in Advances in Neural Network Research and Applications. New York, NY, USA: Springer-Verlag, 2010, pp. 729–738.

[12] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Comput., vol. 18, no. 7, pp. 1527–1554, Jul. 2006.

[13] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504–507, Jul. 2006.

[14] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, "Extracting and composing robust features with denoising autoencoders," in Proc. ICML, 2008, pp. 1096–1103.

[15] D. Erhan, P.-A. Manzagol, Y. Bengio, S. Bengio, and P. Vincent, "The difficulty of training deep architectures and the effect of unsupervised pre-training," in Proc. ICAIS, 2009, pp. 153–160.

[16] Y. Bengio, "Learning deep architectures for AI," Found. Trends Mach. Learn., vol. 2, no. 1, pp. 1–127, 2009.

[17] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, "Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion," J. Mach. Learn. Res., vol. 11, pp. 3371–3408, 2010.

[18] A. Coates and A. Y. Ng, "Learning feature representations with k-means," in Neural Networks: Tricks of the Trade. New York, NY, USA: Springer-Verlag, 2012, pp. 561–580.

[19] Y. Bengio and A. Courville, "Deep learning of representations," in Handbook on Neural Information Processing. New York, NY, USA: Springer-Verlag, 2013, pp. 1–28.

[20] E. C. Smith and M. S. Lewicki, "Efficient auditory coding," Nature, vol. 439, no. 7079, pp. 978–982, 2006.

[21] M. Tello, C. López-Martínez, and J. J. Mallorqui, "A novel algorithm for ship detection in SAR imagery based on the wavelet transform," IEEE Geosci. Remote Sens. Lett., vol. 2, no. 2, pp. 201–205, Apr. 2005.

[22] Z. Xiong and T. S. Huang, "Wavelet-based texture features can be extracted efficiently from compressed-domain for JPEG 2000 coded images," in Proc. ICIP, 2002, vol. 1, p. I-481.

[23] K. Delac, M. Grgic, and S. Grgic, "Effects of JPEG and JPEG 2000 compression on face recognition," in Pattern Recognition and Image Analysis. New York, NY, USA: Springer-Verlag, 2005, pp. 136–145.

[24] K. Delac, M. Grgic, and S. Grgic, "Face recognition in JPEG and JPEG 2000 compressed domain," Image Vis. Comput., vol. 27, no. 8, pp. 1108–1120, Jul. 2009.

[25] L. Ni, "A novel image retrieval scheme in JPEG 2000 compressed domain based on tree distance," in Proc. ICICS, 2003, vol. 3, pp. 1591–1594.

[26] A. Teynor, W. Müller, and W. Kowarschick, "Compressed domain image retrieval using JPEG2000 and Gaussian mixture models," in Visual Information and Information Systems. New York, NY, USA: Springer-Verlag, 2006, pp. 132–142.

[27] F. Zargari, A. Mosleh, and M. Ghanbari, "A fast and efficient compressed domain JPEG 2000 image retrieval method," IEEE Trans. Consum. Electron., vol. 54, no. 4, pp. 1886–1893, Nov. 2008.

[28] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, "Extreme learning machine: Theory and applications," Neurocomputing, vol. 70, no. 1–3, pp. 489–501, Dec. 2006.

[29] G.-B. Huang, H. Zhou, X. Ding, and R. Zhang, "Extreme learning machine for regression and multiclass classification," IEEE Trans. Syst., Man, Cybern. B, vol. 42, no. 2, pp. 513–529, Apr. 2012.

[30] M. D. Adams, The JPEG 2000 Still Image Compression Standard, 2001.

[31] S. Mallat and W. L. Hwang, "Singularity detection and processing with wavelets," IEEE Trans. Inf. Theory, vol. 38, no. 2, pp. 617–643, Mar. 1992.

[32] G.-B. Huang, N.-Y. Liang, H.-J. Rong, P. Saratchandran, and N. Sundararajan, "On-line sequential extreme learning machine," in Proc. Comput. Intell., 2005, pp. 232–237.

[33] N.-Y. Liang, G.-B. Huang, P. Saratchandran, and N. Sundararajan, "A fast and accurate on-line sequential learning algorithm for feedforward networks," IEEE Trans. Neural Netw., vol. 17, no. 6, pp. 1411–1423, Nov. 2006.

[34] M. Sahin, "Comparison of modelling ANN and ELM to estimate solar radiation over Turkey using NOAA satellite data," Int. J. Remote Sens., vol. 34, no. 21, pp. 7508–7533, 2013.

[35] M. Pal, A. E. Maxwell, and T. A. Warner, "Kernel-based extreme learning machine for remote-sensing image classification," IEEE Geosci. Remote Sens. Lett., vol. 4, no. 9, pp. 853–862, 2013.

[36] X. You and W. Li, "A sea–land segmentation scheme based on statistical model of sea," in Proc. CISP, 2011, vol. 3, pp. 1155–1159.

[37] R. C. Gonzalez and R. E. Woods, Digital Image Processing, 3rd ed. Knoxville, TN, USA: Gatesmark Publishing, 2007, pp. 742–745.

[38] M. Sonka, V. Hlavac, and R. Boyle, Image Processing, Analysis, and Machine Vision. Stamford, CT, USA: Cengage Learning, 1999.

[39] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," Int. J. Comput. Vis., vol. 60, no. 2, pp. 91–110, Nov. 2004.

[40] H. Bay, T. Tuytelaars, and L. Van Gool, "SURF: Speeded up robust features," in Proc. ECCV, 2006, pp. 404–417.

[41] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proc. CVPR, 2005, vol. 1, pp. 886–893.

[42] M. Baccouche, F. Mamalet, C. Wolf, C. Garcia, and A. Baskurt, "Sequential deep learning for human action recognition," in Human Behavior Understanding. New York, NY, USA: Springer-Verlag, 2011, pp. 29–39.

[43] Y. Netzer et al., "Reading digits in natural images with unsupervised feature learning," in Proc. NIPS Workshop Deep Learn. Unsupervised Feature Learn., 2011, pp. 1–9.

[44] Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, "Greedy layer-wise training of deep networks," Adv. Neural Inf. Process. Syst., vol. 19, p. 153, 2007.

[45] Y. Bengio, O. Delalleau, and N. L. Roux, "The curse of highly variable functions for local kernel machines," in Proc. Adv. Neural Inf. Process. Syst., 2005, pp. 107–114.

[46] Y. Bengio and Y. LeCun, "Scaling learning algorithms towards AI," in Large-Scale Kernel Machines, vol. 31. Cambridge, MA, USA: MIT Press, 2007, pp. 1–41.

[47] Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, "Greedy layer-wise training of deep networks," in Proc. Adv. Neural Inf. Process. Syst., vol. 19, pp. 1–8, 2007.

[48] L. A. Shalabi, Z. Shaaban, and B. Kasasbeh, "Data mining: A preprocessing engine," J. Comput. Sci., vol. 2, no. 9, p. 735, 2006.

[49] G. Hinton, "A practical guide to training restricted Boltzmann machines," Momentum, vol. 9, no. 1, 2010.

[50] D. Erhan et al., "Why does unsupervised pre-training help deep learning?" J. Mach. Learn. Res., vol. 11, pp. 625–660, Feb. 2010.

[51] N. Srivastava, "Improving neural networks with dropout," Ph.D. dissertation, Univ. Toronto, Toronto, ON, Canada, 2013.


[52] I. Jolliffe, Principal Component Analysis. Hoboken, NJ, USA: Wiley, 2005.

[53] R. Kohavi, "A study of cross-validation and bootstrap for accuracy estimation and model selection," in Proc. IJCAI, 1995, vol. 14, no. 2, pp. 1137–1145.

Jiexiong Tang (S'13) received the B.Eng. degree from the Beijing Institute of Technology, Beijing, China, in 2013, where he is currently working toward the M.S. degree in the School of Electrical and Information Engineering.

His research interests include remote sensing image processing, feature learning and extraction, extreme learning machines, deep neural networks, and applications in computer vision.

Chenwei Deng (M'09) received the Ph.D. degree in signal and information processing from the Beijing Institute of Technology, Beijing, China, in 2009.

He is currently an Associate Professor with the School of Information and Electronics, Beijing Institute of Technology. He has authored or coauthored over 20 technical papers in refereed international journals and conference proceedings, and coedited one book. His current research interests include image/video coding, quality assessment, perceptual modeling, feature representation/fusion, and object detection/recognition/tracking.

Prof. Deng was awarded the titles of Beijing Excellent Talent and Excellent Young Scholar of Beijing Institute of Technology in 2013.

Guang-Bin Huang (M'98–SM'04) received the B.Sc. degree in applied mathematics and the M.Eng. degree in computer engineering from Northeastern University, Shenyang, China, in 1991 and 1994, respectively, and the Ph.D. degree in electrical engineering from Nanyang Technological University, Singapore, in 1999. During his undergraduate period, he also concurrently studied in the Applied Mathematics Department and the Wireless Communication Department, Northeastern University.

From June 1998 to May 2001, he was a Research Fellow with the Singapore Institute of Manufacturing Technology (formerly known as Gintic Institute of Manufacturing Technology), where he led/implemented several key industrial projects (e.g., as Chief Designer and Technical Leader of the Singapore Changi Airport Cargo Terminal Upgrading Project). Since May 2001, he has been an Assistant Professor and then an Associate Professor with the School of Electrical and Electronic Engineering, Nanyang Technological University. He serves as an Associate Editor of Neurocomputing, Neural Networks, and Cognitive Computation. His current research interests include machine learning, computational intelligence, and extreme learning machines.

Dr. Huang also serves as an Associate Editor of the IEEE TRANSACTIONS ON CYBERNETICS.

Baojun Zhao received the Ph.D. degree in electromagnetic measurement technology and equipment from the Harbin Institute of Technology, Harbin, China, in 1996.

From 1996 to 1998, he was a Postdoctoral Fellow with the Beijing Institute of Technology, Beijing, China, where he has been engaged in teaching and research in the Radar Research Laboratory since 1998. He has been the Project Leader of more than 30 national projects. He is currently a Professor, a Ph.D. Supervisor, and the National Professional Laboratory Director of Signal Acquisition and Processing. He has authored or coauthored over 100 publications. His main research interests include image/video coding, image recognition, infrared/laser signal processing, and parallel signal processing. He has received four ministerial-level scientific and technological progress awards in these fields.