Robustness Improvement of Hyperspectral Image Unmixing by Spatial Second-Order Regularization



IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 23, NO. 12, DECEMBER 2014 5209


Sebastian Bauer, Johannes Stefan, Matthias Michelsburg, Thomas Laengle, and Fernando Puente León, Senior Member, IEEE

Abstract— The acquisition of hundreds of images of a scene, each at a different wavelength, is known as hyperspectral imaging. This large amount of data allows the extraction of much more information from hyperspectral images than from conventional color images. The forward-looking imaging approach emerged from remote sensing, but is still not very widespread in industrial and other practical applications. Spectral unmixing, in particular, aims at determining the components present in a scene as well as the abundance with which each component contributes. This information is valuable, for instance, when discrimination tasks are to be performed. Involving not only spectral, but also spatial information was found to have the potential to improve the unmixing results. Several publications use spatial first-order regularization (closely related to the total variation approach) to incorporate this spatial information. As in classical image processing, this approach favors piecewise constant pixel transitions. This is why it was proposed in the literature to use second-order regularization instead of first order to approach piecewise-linear transitions. Therefore, we introduce Hessian-based regularization to hyperspectral unmixing and propose an algorithm to calculate the regularized result. We use simulated data and images measured in our laboratory to show that the first- and second-order approaches share many properties and produce similar results. The second-order approach, however, is more robust and thus more accurate in finding the minimum. Both methods smooth the images in the case of supervised unmixing (i.e., the component spectra are known beforehand) and enhance unsupervised unmixing (when the spectra are not known).

Index Terms— Hyperspectral image, unmixing, denoising, total variation, Hessian, regularization.

I. INTRODUCTION

HYPERSPECTRAL imaging started spreading from its origin, remote sensing, into other areas of science and is now applied in various fields such as biomedical imaging, chemometrics [1], mineral classification [2] and quality assessment of various goods such as food [3]. Because hyperspectral imaging requires little or no sample preparation, provides fast measurement results and acts non-destructively, it is appealing to very different fields of science as well as to laboratory and industrial applications. While common RGB images consist of only three spectral channels, hyperspectral images contain up to several hundred evenly spaced spectral channels. This means that for each considered wavelength, a complete image is acquired. From this increased amount of data, more information, and more precise information, can be gained. Using hyperspectral data to determine the components present in a scene and the extent to which these components contribute within specific image areas is called spectral unmixing. It is a goal that is sought after not only in remote sensing, but also in laboratory applications. Examples include the assessment of kidney calculi [1], the detection of counterfeit pharmaceutical tablets [4], and the determination of the component distribution in wheat kernels [5] and of the water content of bread [6]. In spite of these examples, the number of implemented laboratory and industry applications using unmixing is rather low. There is huge potential to be tapped in terms of speed, quality, efficiency and low cost, especially in the processing industry.

Manuscript received April 10, 2014; revised July 22, 2014 and October 1, 2014; accepted October 3, 2014. Date of publication October 8, 2014; date of current version October 28, 2014. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Jose M. Bioucas-Dias.

S. Bauer, J. Stefan, M. Michelsburg, and F. Puente León are with the Institute of Industrial Information Technology, Karlsruhe Institute of Technology, Karlsruhe 76187, Germany (e-mail: [email protected]; [email protected]; [email protected]; [email protected]).

T. Laengle is with the Fraunhofer Institute of Optronics, System Technologies and Image Exploitation, Karlsruhe 76131, Germany (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIP.2014.2362008

The number of unmixing techniques is quite large, and they are based on partly very different mathematical concepts. An overview of the concepts and techniques is given, for example, in [7]. In the past, most attention was paid to finding the correct image spectra, leaving out the spatial information contained in the image pixels. Such methods include, for example, minimum volume simplex analysis [8] and the N-FINDR algorithm [9]. Assuming the applicability of the linear mixing model (see Section II-A), the image spectra form a simplex in a vector space. The pure spectra are the vertices of this simplex. Methods exploiting the spectral information, e.g., minimum volume simplex analysis, aim at finding these vertices. They reach their limits, however, when the image is, e.g., divided into two regions formed by two independent sets of spectra. In such a case, two simplexes exist. Thus, a different idea for improving the unmixing results is to incorporate spatial information into the unmixing methods. This information can be included either in the endmember extraction process, where the spectra of the components present in the scene are detected, or in the unmixing process itself. Plaza et al. [10] state that the spatial information about neighboring pixels that is naturally contained in every hyperspectral image should be used for endmember extraction, although few methods do so thus far. They also compare endmember extraction methods that include spatial information with some that do not. Results show that with the spatial preprocessing algorithm (SPP) [11], even the output of algorithms that by default do not include spatial information, such as N-FINDR, can be improved. Among the endmember extraction methods using spatial information are, e.g., the spatial-spectral endmember extraction tool (SSEE) [12] and the automatic morphological endmember extraction (AMEE) method [13].

1057-7149 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Apart from the endmember extraction methods mentioned in [10], various unmixing methods including spatial information have been proposed recently. They use Stone's complexity pursuit [14], [15], Markov random fields [16] and spatial filtering in a given neighborhood of the respective pixel [17]. Kernel functions are applied to nonlinear unmixing in [18]. The inclusion of additional images of the scene acquired by an RGB camera into unmixing is studied in [19]. Canham et al. [20] determine the abundance for each pixel separately and cluster the pixels afterwards.

Many unmixing methods that incorporate spatial information include regularization, i.e., the consideration of additional information. This results in the minimization of an objective function of the form

$$J = J_{\text{data}} + \lambda R. \tag{1}$$

$J_{\text{data}}$ is the term ensuring data fidelity, which means that the measured data can be explained as well as possible by the chosen model and the calculated parameter values. The regularization term $R$ incorporates additional constraints. The regularization parameter $\lambda$ has to be adjusted appropriately to control the tradeoff between data fidelity and regularization. If the applicability of the linear mixing model $\mathbf{Y} = \mathbf{M}\mathbf{A} + \mathbf{N}$ (see Section II-A) with the recorded image $\mathbf{Y}$, the endmember matrix $\mathbf{M}$, the abundance matrix $\mathbf{A}$ and the noise matrix $\mathbf{N}$ is assumed, (1) becomes

$$J(\mathbf{M}, \mathbf{A}) = J_{\text{data}}(\mathbf{M}, \mathbf{A}) + \lambda R(\mathbf{M}, \mathbf{A}). \tag{2}$$

The estimates of $\mathbf{M}$ and $\mathbf{A}$ are then found by minimizing the objective function.

The data fidelity term is often chosen as

$$J_{\text{data}} = \|\mathbf{M}\mathbf{A} - \mathbf{Y}\|_{\text{Norm}}^2, \tag{3}$$

where $\|\cdot\|_{\text{Norm}}$ denotes a matrix norm that is compatible with the subsequent minimization algorithm. Frequently used examples include the Frobenius norm $\|\cdot\|_F$ [21] and the spectral norm $\|\cdot\|_2$ [22]; in [22], a slightly different objective function is used to denoise and deblur grayscale images by second-order regularization. In the case of the Frobenius norm, minimizing $J_{\text{data}}$ yields the least squares solution.
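To make the last statement concrete, the following short derivation sketches why the Frobenius-norm data term leads to least squares. It assumes that $\mathbf{M}$ is held fixed and has full column rank, and it ignores any constraints on $\mathbf{A}$; both are assumptions made only for this illustration, and the symbol $\hat{\mathbf{A}}$ for the unconstrained minimizer is introduced here.

```latex
\begin{align}
J_{\text{data}}(\mathbf{A}) &= \|\mathbf{M}\mathbf{A} - \mathbf{Y}\|_F^2, \\
\frac{\partial J_{\text{data}}}{\partial \mathbf{A}}
  &= 2\,\mathbf{M}^T(\mathbf{M}\mathbf{A} - \mathbf{Y}) = \mathbf{0}
  \;\Longrightarrow\;
  \hat{\mathbf{A}} = (\mathbf{M}^T\mathbf{M})^{-1}\mathbf{M}^T\mathbf{Y}.
\end{align}
```

Setting the gradient to zero gives the normal equations, whose solution is the ordinary least squares estimate, computed column by column for every pixel.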

Zymnis et al. [21] describe the implementation of an alternating projected subgradient algorithm for the calculation of a total variation-like regularization. For total variation (TV) in general, see the paper of Rudin, Osher and Fatemi [23]. Briefly, in the case of 1D signals, total variation aims at denoising the signal while preserving the original signal form. This is expressed by the objective function (see [24])

$$J_{\text{TV}} = \sum_{\forall i} (u_i - z_i)^2 + \lambda \sum_{\forall i} |u_{i+1} - u_i|. \tag{4}$$

Here, $u$ is the uncorrupted signal that is to be approximated, while $z$ stands for the measured signal. The index $i$ runs over all available signal values at the respective time steps. The total variation $\sum_{\forall i} |u_{i+1} - u_i|$ ensures that the differences between adjacent sampling values remain as small as possible, therefore favoring constant signal sequences. This is equivalent to minimizing the absolute gradient. One major characteristic of TV compared to, e.g., linear filters is that edges are well preserved. The approach of Zymnis et al., which transfers the idea of TV to hyperspectral image unmixing, involves the minimization of the objective function

$$J = \|\mathbf{M}\mathbf{A} - \mathbf{Y}\|_F^2 + \lambda \sum_{i=1}^{I} \sum_{j \in \mathcal{N}(i)} \|\mathbf{a}_i - \mathbf{a}_j\|_1. \tag{5}$$

$\mathcal{N}(i)$ describes the set of indices of the considered pixels in the neighborhood of pixel $i$. $\|\mathbf{x}\|_1 = \sum_i |x_i|$ is the vector $\ell_1$ norm. $I$ stands for the total number of pixels in the image, while $\mathbf{a}_i$ denotes the $i$-th column of the abundance matrix, i.e., the abundances corresponding to pixel $i$. The idea behind this approach is that the abundance variation from one pixel to another in the original image is not noise-like, but smooth. Within large areas, there is even no variation in the original image. The approach favors pixel transitions with no changes in the abundance map. The abundance map of an endmember is a matrix having the same spatial dimensions as the original image, where each element corresponds to the abundance of the respective endmember at the corresponding pixel position. The conventional TV method applied to grayscale images preserves edges very well, but the so-called staircase effect is likely to appear: smoothly changing transitions are approximated by a step-by-step change. This is unfavorable, especially when gradual changes are present in the scene. We will check whether the TV regularization (5) introduced to hyperspectral image unmixing also introduces the staircase effect into the abundance maps. To circumvent the possibility that the staircase effect may arise, it is possible to use second-order regularization terms instead. While TV tries to keep the first derivative as small as possible, second-order regularization makes the second derivative vanish, which in turn enables piecewise linear transitions and not only constant ones. Iordache et al. [25] combine the TV regularization approach with a sparsity constraint. In the conclusion, they state that the application of second-order regularization might be well-suited for linear data. To the best of our knowledge, second-order regularization of abundance maps has not been applied to hyperspectral image unmixing yet.
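The difference between the two penalties can be illustrated on a 1D toy signal. The sketch below is not taken from the paper: the function names and the two test signals are hypothetical, and sums of absolute first and second differences are used as simple 1D stand-ins for the first- and second-order regularizers.

```python
import numpy as np

def first_order_penalty(u):
    """Sum of absolute first differences (1D total variation)."""
    return np.abs(np.diff(u)).sum()

def second_order_penalty(u):
    """Sum of absolute second differences (1D stand-in for the Hessian term)."""
    return np.abs(np.diff(u, n=2)).sum()

# Two signals that both rise from 0 to 1 (illustrative choices).
ramp = np.linspace(0.0, 1.0, 11)               # gradual, linear transition
step = np.where(np.arange(11) < 5, 0.0, 1.0)   # abrupt edge

for name, u in [("ramp", ramp), ("step", step)]:
    print(name, first_order_penalty(u), second_order_penalty(u))
```

Both signals have the same first-order penalty, so TV has no reason to prefer the ramp over a step or a staircase. The second-order penalty is essentially zero for the ramp and nonzero only where the slope changes, which is the behavior that favors piecewise linear transitions.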

For this reason, the intention of this article is twofold: first, it aims at introducing second-order regularization to unmixing in general, including hyperspectral remote sensing; second, it deepens the usage of common techniques in both remote sensing and laboratory analysis. Possible industrial applications of hyperspectral unmixing are the classification of mixtures of materials, e.g., raw materials such as ore. To pave the way for such applications, an application example with real laboratory data including known ground truth is given. In this paper, we apply the method of Zymnis et al., extend it to second-order regularization and compare the results. Additionally, we show how the algorithms incorporating the pixel neighborhood perform in comparison with the common alternating least squares solution. The remainder is organized as follows: In Section II, unmixing and regularization are explained in detail and the developed algorithm is presented. Section III introduces the considered data, while the unmixing results are explained in Section IV. The algorithms and their results are discussed in Section V. We give our conclusion and provide final remarks in Section VI.

II. UNMIXING AND REGULARIZATION BASICS

A. Mixing Models

Unmixing methods are based on a mixing model [26]. In remote sensing, the linear mixing model (LMM)

$$\mathbf{y} = \mathbf{M}\mathbf{a} + \mathbf{n} \tag{6}$$

allows for a good representation of the signal: due to the low spatial resolution, each pixel captures radiation reflected by multiple separate areas that are made up of one endmember each. Therefore, the detected spectrum $\mathbf{y}$ is a linear combination of all endmembers contained in the endmember matrix $\mathbf{M}$, weighted by the corresponding weighting vector $\mathbf{a}$. Here, the measured spectrum $\mathbf{y}$ consists of the $L$ recorded spectral samples at the respective wavelengths. $\mathbf{M}$ is an $L \times R$ matrix containing $R$ endmembers $\mathbf{m}_r = [m_{1,r}, \dots, m_{L,r}]^T$, $r = 1, \dots, R$, in its columns. The vector $\mathbf{a} = [a_1, \dots, a_R]^T$ represents the weighting coefficients corresponding to each endmember spectrum.

For the whole image $\mathbf{Y}$ with dimensions $L \times I$, (6) becomes

$$\mathbf{Y} = \mathbf{M}\mathbf{A} + \mathbf{N}. \tag{7}$$

Here, $I$ is the number of pixels in the image. $\mathbf{A}$ is the abundance matrix with dimensions $R \times I$, while $\mathbf{N}$ (size $L \times I$) represents additive noise. When light interacts with more than one endmember before it is detected by the sensor, the linear mixing model (7) is not accurate enough and needs to be replaced by a nonlinear one representing the physical synthesis of the reflected light. For the sake of accuracy, laboratory and industry applications call for nonlinear mixing models (NMMs) [27] rather than the linear one, because the spatial resolution is much higher. However, there are still many cases in which the use of the LMM is at least approximately valid, see [1]. Another reason for choosing the LMM is that the unmixing calculation is far less time-consuming than with NMMs. In this case, the result is less accurate, but still sufficiently so, and it is calculated faster. This is an important issue, especially for real-time processing.
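The matrix form (7) can be simulated in a few lines. The following sketch only illustrates the dimensions involved: the sizes, the uniform random spectra, the Dirichlet-distributed abundances (which are nonnegative and sum to one per pixel) and the noise level are all assumptions chosen for the example, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
L, R, I = 50, 3, 100                      # bands, endmembers, pixels (illustrative)

M = rng.uniform(0.0, 1.0, size=(L, R))    # endmember spectra in the columns of M
A = rng.dirichlet(np.ones(R), size=I).T   # R x I abundances, one column per pixel
N = 0.01 * rng.standard_normal((L, I))    # additive noise (assumed Gaussian)

Y = M @ A + N                             # linear mixing model, eq. (7)
print(Y.shape)                            # (50, 100)
```

Each column of `Y` is the spectrum of one pixel, i.e., the per-pixel model (6) applied to the corresponding column of `A`.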

B. Constraints

Two constraints are considered, where indicated, to ensure physical plausibility:

• Both the endmember spectra and the abundances must be nonnegative (nonnegativity (NN) constraint):

$$m_{l,r} \ge 0 \quad \forall l, r \tag{8}$$

$$a_{r,i} \ge 0 \quad \forall r, i. \tag{9}$$

• The accumulated abundances of all endmembers in one pixel must be one:

$$\sum_{r=1}^{R} a_{r,i} = 1 \quad \forall i. \tag{10}$$

This is the sum-to-one (STO) constraint. In a laboratory environment, especially when all endmembers are known, its application is very reasonable, as each object consists entirely of the endmembers.

C. First-Order Regularization

In this section, the approach of Zymnis et al. is outlined briefly. For a more detailed explanation, see [21]. The objective function (5) is minimized with an alternating algorithm that first fixes $\mathbf{M}$ and minimizes for $\mathbf{A}$, and then fixes $\mathbf{A}$ and minimizes for $\mathbf{M}$. As the objective is biconvex, i.e., for given $\mathbf{A}$ it is convex in $\mathbf{M}$ and vice versa, this is a reasonable approach. For minimizing for $\mathbf{A}$, a projected subgradient method is used. Such methods are a good choice for non-differentiable objective functions like the one under consideration. It is worth mentioning that when the objective function is differentiable, its only subgradient is its gradient [28]. Throughout this subsection, only the columns $\mathbf{a}_i$ of $\mathbf{A}$ are considered, as each of them belongs to one pixel; $\mathbf{a}_i$ is the $i$-th column of $\mathbf{A}$. The same holds for $\mathbf{y}_i$ correspondingly. The position of the pixel in the picture is not important for explaining the general method. Subsection II-D, which explains the second-order regularization, pays more attention to the pixel positions.

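For fixed $\mathbf{M}$, the pixelwise minimization at pixel $i$ requires the solution of the minimization problem

$$\begin{aligned}
\text{minimize} \quad & \|\mathbf{y}_i - \mathbf{M}\mathbf{a}_i\|_2^2 + \lambda \sum_{j \in \mathcal{N}(i)} \|\mathbf{a}_i - \mathbf{a}_j\|_1 \\
\text{subject to} \quad & a_{r,i} \ge 0, \quad \sum_{\forall r} a_{r,i} = 1.
\end{aligned} \tag{11}$$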

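The independent variable is $\mathbf{a}_i$, while the $\mathbf{a}_j$ are kept constant during the optimization run. Then

$$\mathbf{g}_i = 2\mathbf{M}^T\mathbf{M}\mathbf{a}_i - 2\mathbf{M}^T\mathbf{y}_i + \lambda \mathbf{s}_i \tag{12}$$

is a subgradient of the minimization problem (11). The vector $\mathbf{s}_i \in \mathbb{R}^R$ is calculated by $\mathbf{s}_i = \sum_{j \in \mathcal{N}(i)} \operatorname{sgn}(\mathbf{a}_i - \mathbf{a}_j)$ with $\operatorname{sgn}$ being the sign function. The value of $\mathbf{a}_i$ is iteratively updated according to

$$\mathbf{a}_i^{(t+1)} := \left(\mathbf{a}_i^{(t)} - \gamma^{(t)} \mathbf{g}_i\right)_P. \tag{13}$$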


$\gamma^{(t)}$ is the step size, while $(\cdot)_P$ means projection onto the probability simplex to ensure the constraints given in (11). It is also possible to include the STO constraint into the minimization procedure itself, see [29]. In contrast to Zymnis et al., a different projection algorithm is used throughout this paper, namely the one given in [30]. Zymnis et al. state that all columns of $\mathbf{A}$ can be updated simultaneously (primal decomposition). With the matrix $\mathbf{S}^{(t)} \in \mathbb{R}^{R \times I}$ having the columns $\mathbf{s}_i$, $i = 1, \dots, I$, the update $\mathbf{A}^{(t+1)}$ is calculated by

$$\mathbf{A}^{(t+1)} := \left(\mathbf{A}^{(t)} - \gamma^{(t)}\left(2\mathbf{M}^T\mathbf{M}\mathbf{A}^{(t)} - 2\mathbf{M}^T\mathbf{Y} + \lambda \mathbf{S}^{(t)}\right)\right)_P. \tag{14}$$

This time, $(\cdot)_P$ stands for projecting each column of the enclosed matrix onto the probability simplex.
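One step of the abundance update (14) could look as follows in NumPy. This is a minimal sketch under several assumptions that are not stated in the text: the neighborhood $\mathcal{N}(i)$ is taken to be the 4-connected neighborhood, pixel $i$ is mapped to the image grid in row-major order, and the projection is a common sort-based simplex projection, which is not necessarily the algorithm of [30]. All function names are hypothetical.

```python
import numpy as np

def project_simplex(V):
    """Project each column of V onto the probability simplex (sort-based method)."""
    R, I = V.shape
    U = -np.sort(-V, axis=0)                   # each column sorted in descending order
    css = np.cumsum(U, axis=0) - 1.0
    k = np.arange(1, R + 1)[:, None]
    rho = (U - css / k > 0).sum(axis=0)        # number of positive entries per column
    theta = css[rho - 1, np.arange(I)] / rho   # per-column shift
    return np.maximum(V - theta, 0.0)

def sign_sum(A, Nx, Ny):
    """S with columns s_i = sum over neighbors j of sgn(a_i - a_j), 4-neighborhood assumed."""
    R = A.shape[0]
    maps = A.reshape(R, Nx, Ny)                # row-major pixel ordering (assumption)
    S = np.zeros_like(maps)
    S[:, 1:, :] += np.sign(maps[:, 1:, :] - maps[:, :-1, :])
    S[:, :-1, :] += np.sign(maps[:, :-1, :] - maps[:, 1:, :])
    S[:, :, 1:] += np.sign(maps[:, :, 1:] - maps[:, :, :-1])
    S[:, :, :-1] += np.sign(maps[:, :, :-1] - maps[:, :, 1:])
    return S.reshape(R, Nx * Ny)

def update_A(A, M, Y, lam, gamma, Nx, Ny):
    """One projected subgradient step for all pixels at once, following eq. (14)."""
    G = 2 * M.T @ M @ A - 2 * M.T @ Y + lam * sign_sum(A, Nx, Ny)
    return project_simplex(A - gamma * G)
```

After each call, every column of the returned matrix is nonnegative and sums to one, so the NN and STO constraints on the abundances hold by construction.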

After some iterations have been performed for fixed $\mathbf{M}$, $\mathbf{M}$ is updated while keeping $\mathbf{A}$ constant. This is done by the minimization task

$$\text{minimize} \quad \|\mathbf{Y} - \mathbf{M}\mathbf{A}\|_F^2 \tag{15}$$

$$\text{subject to} \quad \mathbf{M} \ge 0. \tag{16}$$

The minimization is calculated successively by the gradient steps

$$\mathbf{M}^{(t+1)} := \left(\mathbf{M}^{(t)} - \gamma^{(t)}\left(\mathbf{M}^{(t)}\mathbf{A} - \mathbf{Y}\right)\mathbf{A}^T\right)_+. \tag{17}$$

$(\cdot)_+$ sets negative values to zero. The step size is chosen as

$$\gamma^{(t)} = \frac{1}{\|\mathbf{A}\mathbf{A}^T\|_2} \tag{18}$$

in accordance with [21]. $\|\cdot\|_2$ stands for the spectral norm. One change has been made compared to the work of Zymnis et al.: parameters have been adjusted to account for the different data values. According to the nonsummable diminishing step size rule [28], the algorithm still converges when $\gamma^{(t)}$ is chosen as $\gamma^{(t)} = c/t$ [31]. This ensures that the changes applied to the initial values by the algorithm are large enough to arrive at a feasible solution.
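The endmember update (17) with the step size (18) translates almost directly into NumPy. The sketch below is only an illustration; the function name `update_M` is hypothetical, and the spectral norm is computed with `np.linalg.norm(..., 2)`.

```python
import numpy as np

def update_M(M, A, Y):
    """One projected gradient step for the endmember matrix, eqs. (17) and (18)."""
    gamma = 1.0 / np.linalg.norm(A @ A.T, 2)   # spectral norm of A A^T, eq. (18)
    step = M - gamma * (M @ A - Y) @ A.T       # gradient step on the data term
    return np.maximum(step, 0.0)               # (.)_+ : set negative entries to zero
```

Because $\|\mathbf{A}\mathbf{A}^T\|_2$ is the Lipschitz constant of the gradient $(\mathbf{M}\mathbf{A} - \mathbf{Y})\mathbf{A}^T$, a step of this size does not increase the residual $\|\mathbf{Y} - \mathbf{M}\mathbf{A}\|_F$ for fixed $\mathbf{A}$.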

D. Second-Order Regularization

1) Algorithm: The aim of second-order regularization is to approximate the measured data as well as possible, while at the same time the second derivative of the abundance maps is kept as low as possible to favor piecewise linear pixel transitions. For continuous functions $f(x, y)$ in general, the Hessian (which can be regarded as the second derivative) is defined as

$$H(f) = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x \partial y} \\[2ex] \dfrac{\partial^2 f}{\partial y \partial x} & \dfrac{\partial^2 f}{\partial y^2} \end{bmatrix}. \tag{19}$$

For this reason, we will also refer to second-order regularization as Hessian (HS) denoising for simplicity.

The discrete hyperspectral image has size $N_x \times N_y$, with $N_x$ and $N_y$ defining the extent along $x$ and $y$, where $N_x \cdot N_y = I$ holds. The Frobenius norm of the Hessian of an abundance map $A_r$, $r \in \{1, \dots, R\}$, of a single endmember $r$ at pixel position $(n_x, n_y)$ ($n_x = 1, \dots, N_x$, $n_y = 1, \dots, N_y$) is given by

$$\|H(A_r)(n_x, n_y)\|_F = \sqrt{A_{r,xx}^2(n_x, n_y) + A_{r,xy}^2(n_x, n_y) + A_{r,yx}^2(n_x, n_y) + A_{r,yy}^2(n_x, n_y)}. \tag{20}$$

In this notation, $A_{r,xx}$ denotes the second derivative along the $x$-direction, and $A_{r,xy}$, $A_{r,yx}$ and $A_{r,yy}$ the derivatives along the respective other directions. Note that for forming a single abundance map $A_r$, the respective row of the original abundance matrix $\mathbf{A}$ has to be taken and rearranged to the dimensions $N_x \times N_y$ of the original image.

For a hyperspectral image with discrete pixels, the second-order regularization objective function according to the general form (1) is defined as

$$J = \|\mathbf{M}\mathbf{A} - \mathbf{Y}\|_2^2 + \lambda R(\mathbf{A}). \tag{21}$$

$R(\mathbf{A})$ is the sum of the Frobenius norms of the Hessians of each endmember's abundance map, added up over all pixel positions:

$$R(\mathbf{A}) = \sum_{r=1}^{R} \sum_{n_x=1}^{N_x} \sum_{n_y=1}^{N_y} \|H(A_r)(n_x, n_y)\|_F. \tag{22}$$

Applying the principle of primal decomposition, the values of the other pixels $(n_x + 1, n_y)$ and so on, described by analogy with (11), are considered constant while minimizing for pixel $(n_x, n_y)$. As the objective function (21), in contrast to the first-order regularization (5), is differentiable, it is convenient to use a gradient projection method, the bounded version of the steepest descent method [32]. Just like the first-order regularization (5), the objective function (21) is biconvex: for constant $\mathbf{M}$, the data fidelity part of (21) is the same as in (5), which is convex. The regularizer (22) is also convex because it is a sum of matrix norms, which are convex in general [33].

Once again following the idea of primal decomposition, the optimization problem

$$\begin{aligned}
\text{minimize} \quad & J_{n_x,n_y} = \|\mathbf{y}_{n_x,n_y} - \mathbf{M}\mathbf{a}_{n_x,n_y}\|_2^2 + \lambda \sum_{r=1}^{R} \|H(A_r)(n_x, n_y)\|_F \\
\text{subject to} \quad & \mathbf{a}_{n_x,n_y} \ge 0, \quad \sum_{\forall r} a_{n_x,n_y,r} = 1
\end{aligned} \tag{23}$$

for each pixel $(n_x, n_y)$ is solved with the gradient projection method. Here, $\mathbf{y}_{n_x,n_y}$ is the pixel at image position $(n_x, n_y)$. This notation ignores the pixel's position in the image matrix $\mathbf{Y}$ to enable the use of a more intuitive notation. The same holds for the pixel's abundances $\mathbf{a}_{n_x,n_y}$. For the gradient projection method see, e.g., Bertsekas et al. [28] or [32]. Walton [34] provides a convergence proof for (23) when square summable but not summable step sizes are used. These fulfill the conditions

$$\sum_{t=1}^{\infty} \gamma^{(t)} = \infty, \qquad \sum_{t=1}^{\infty} \left(\gamma^{(t)}\right)^2 < \infty. \tag{24}$$


For this reason, as for the TV minimization, the step size is again chosen as

$$\gamma^{(t)} = \frac{c}{t}. \tag{25}$$
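That the choice $\gamma^{(t)} = c/t$ is square summable but not summable can be checked directly. The short calculation below assumes a constant $c > 0$ and uses the divergence of the harmonic series and the known value of $\sum_t 1/t^2$.

```latex
\begin{align}
\sum_{t=1}^{\infty} \gamma^{(t)} &= c \sum_{t=1}^{\infty} \frac{1}{t} = \infty, \\
\sum_{t=1}^{\infty} \left(\gamma^{(t)}\right)^2 &= c^2 \sum_{t=1}^{\infty} \frac{1}{t^2} = \frac{c^2 \pi^2}{6} < \infty.
\end{align}
```

Both conditions of a square summable but not summable step size rule are therefore met for any positive $c$.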

The convergence proof states that the objective function does not necessarily decrease monotonically; for this reason, the best value is the minimum objective function value over all iterations, not necessarily the final one. As the algorithm in general converges well, we refrain from searching for the best value of all iterations and accept the last one. The minimization procedure is repeated for all pixels; the gradient matrix $\nabla J$ is defined to hold the gradients of the pixelwise objective functions in its columns. Its columns are ordered in the same way as in $\mathbf{A}$. This enables the application of the equation

$$\mathbf{A}^{(t+1)} := \left(\mathbf{A}^{(t)} - \gamma^{(t)} \nabla J\right)_P. \tag{26}$$

The complete algorithm is given in the following:

given T, P, c, R, Y
initialize A = 1/R · ones(R × I)
if unsupervised unmixing = true then
    initialize M = rand(Y)
else
    initialize M = M^(0)
end if
for t = 1 to T do
    for p = 1 to P do
        convert A to the matrices A_r, r = 1, ..., R
        calculate ∇J from the A_r
        A := (A − (c/p) · ∇J)_P
    end for
    M := (M − (1/‖A A^T‖_2) (M A − Y) A^T)_+
end for

By M = rand(Y) we mean that the initial spectra matrix consists of R pixel spectra randomly selected from the image matrix Y. When the endmember spectra are known beforehand (supervised case), the initial matrix M is set to the known spectra M^(0). As these known spectra could be distorted in comparison with the spectra contained in the hyperspectral image that is to be unmixed, due to different acquisition conditions, noise etc., it is reasonable to allow them to be corrected during the unmixing process.
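The alternating scheme listed above can be sketched in NumPy as follows. This is an illustrative skeleton rather than the authors' implementation: the function names, the default parameter values, the explicit regularization weight `lam` and the callable `grad_reg` (which is expected to return the gradient of the regularizer with the same shape as `A`) are assumptions. The projection is a common sort-based simplex projection, not necessarily the algorithm of [30].

```python
import numpy as np

def project_simplex(V):
    """Project each column of V onto the probability simplex (sort-based method)."""
    R, I = V.shape
    U = -np.sort(-V, axis=0)
    css = np.cumsum(U, axis=0) - 1.0
    k = np.arange(1, R + 1)[:, None]
    rho = (U - css / k > 0).sum(axis=0)
    theta = css[rho - 1, np.arange(I)] / rho
    return np.maximum(V - theta, 0.0)

def unmix(Y, R, grad_reg, lam=0.0, T=20, P=10, c=1e-3, M0=None, seed=0):
    """Alternate P projected gradient steps on A with one projected step on M."""
    rng = np.random.default_rng(seed)
    L, I = Y.shape
    A = np.full((R, I), 1.0 / R)                  # A = 1/R * ones(R x I)
    if M0 is None:                                # unsupervised: M = rand(Y)
        M = Y[:, rng.choice(I, size=R, replace=False)].copy()
    else:                                         # supervised: start from known spectra
        M = np.array(M0, dtype=float)
    for t in range(T):
        for p in range(1, P + 1):
            grad_J = 2 * M.T @ (M @ A - Y) + lam * grad_reg(A)
            A = project_simplex(A - (c / p) * grad_J)
        gamma = 1.0 / np.linalg.norm(A @ A.T, 2)  # step size as in eq. (18)
        M = np.maximum(M - gamma * (M @ A - Y) @ A.T, 0.0)
    return M, A

# Usage with synthetic data and no regularization (grad_reg returns zeros).
rng = np.random.default_rng(0)
M_true = rng.uniform(size=(20, 3))
A_true = rng.dirichlet(np.ones(3), size=36).T
M_hat, A_hat = unmix(M_true @ A_true, R=3, grad_reg=np.zeros_like, M0=M_true)
print(A_hat.sum(axis=0)[:3])
```

Passing the regularizer gradient as a callable keeps the loop identical for first-order, second-order or no regularization; only `grad_reg` and `lam` change.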

2) Derivative Calculation: Let us point out that the calculation of the derivatives $A_{r,xx}(n_x, n_y)$, $A_{r,xy}(n_x, n_y)$, $A_{r,yx}(n_x, n_y)$ and $A_{r,yy}(n_x, n_y)$ should be done carefully. For the following considerations, we neglect the data fidelity term. Although the objective function (21) is rotationally invariant, the gradient of the pixelwise objective function is not. Figure 1 illustrates this issue: a) shows a rotationally symmetric picture (to be exact, it consists of a rotated $\sin(x)/x$ function shifted along the $z$-axis to contain only positive values), while b) shows the value of the TV regularizer (20), which is rotationally symmetric as well. The rotationally symmetric picture in Fig. 1 a) is to be smoothed by Hessian regularization, leaving out any data fidelity. Therefore, the values of all pixels on a specific circle should be changed equally to arrive at a smoothed, symmetric picture, regardless of the position of a pixel on the circle. c) displays the value of the pixelwise gradient $\partial J_{n_x,n_y} / \partial \mathbf{a}_{n_x,n_y}$ when the forward differences

$$A_{r,xx}(n_x, n_y) = A_r(n_x, n_y) - 2A_r(n_x + 1, n_y) + A_r(n_x + 2, n_y) \tag{27a}$$

$$A_{r,xy}(n_x, n_y) = A_r(n_x, n_y) - A_r(n_x + 1, n_y) - A_r(n_x, n_y + 1) + A_r(n_x + 1, n_y + 1) \tag{27b}$$

$$A_{r,yx}(n_x, n_y) = A_{r,xy}(n_x, n_y) \tag{27c}$$

$$A_{r,yy}(n_x, n_y) = A_r(n_x, n_y) - 2A_r(n_x, n_y + 1) + A_r(n_x, n_y + 2) \tag{27d}$$

are used. It should be kept in mind that the gradient is not calculated along any image direction, but gives a measure for the change of the objective function value when the pixel value $A_r(n_x, n_y)$ is varied. The usage of the differences (27) results in a wavelike shape of the gradient along the x-y-direction that is introduced into the picture during smoothing. The algorithm is not able to correct this wavelike shape in one of the next steps, as this emphasis of the x-y-direction takes place in every iteration. For this reason, the second derivatives are approximated by the central differences [35]

\begin{align}
A_{r,xx}(n_x, n_y) &= A_r(n_x-1, n_y) - 2A_r(n_x, n_y) + A_r(n_x+1, n_y) \tag{28a}\\
A_{r,xy}(n_x, n_y) &= A_r(n_x, n_y+1) - A_r(n_x, n_y) - A_r(n_x-1, n_y+1) + A_r(n_x-1, n_y) \tag{28b}\\
A_{r,yx}(n_x, n_y) &= A_r(n_x+1, n_y) - A_r(n_x, n_y) - A_r(n_x+1, n_y-1) + A_r(n_x, n_y-1) \tag{28c}\\
A_{r,yy}(n_x, n_y) &= A_r(n_x, n_y-1) - 2A_r(n_x, n_y) + A_r(n_x, n_y+1). \tag{28d}
\end{align}

The gradient of the regularizer (22) resulting from the usage of the central differences (28) is displayed in Fig. 1 d). Although it is not perfectly rotationally symmetric either, the algorithm is able to cope with this slight asymmetry and smooths the picture without introducing a wavelike shape. The gradient's elements $(\nabla R(n_x, n_y))_r$ at position $(n_x, n_y)$ are calculated by the formula

\begin{align}
\bigl(\nabla R(n_x, n_y)\bigr)_r = {} & \bigl\{ 10A_r(n_x, n_y) - 3A_r(n_x, n_y+1) - 3A_r(n_x, n_y-1) \notag\\
& - 3A_r(n_x-1, n_y) - 3A_r(n_x+1, n_y) \notag\\
& + A_r(n_x-1, n_y+1) + A_r(n_x+1, n_y-1) \bigr\} \notag\\
& \big/ \bigl\{ [A_r(n_x-1, n_y) - 2A_r(n_x, n_y) + A_r(n_x+1, n_y)]^2 \notag\\
& + [A_r(n_x, n_y+1) - A_r(n_x, n_y) - A_r(n_x-1, n_y+1) + A_r(n_x-1, n_y)]^2 \notag\\
& + [A_r(n_x+1, n_y) - A_r(n_x, n_y) - A_r(n_x+1, n_y-1) + A_r(n_x, n_y-1)]^2 \notag\\
& + [A_r(n_x, n_y-1) - 2A_r(n_x, n_y) + A_r(n_x, n_y+1)]^2 \bigr\}^{1/2}. \tag{29}
\end{align}

Fig. 1. Example of the directional dependence of the gradient $\nabla R(n_x, n_y)$ when the second-order differences are calculated differently. a): exemplary rotated sin(x)/x function that has to be smoothed by both TV and HS regularization. b): the TV regularizer (20) applied to the picture in a); the result is rotationally symmetric as well. c): the gradient $\nabla R(n_x, n_y)$ of the Hessian with the forward differences (27) applied to the picture a). Note that the gradient is not a spatial one, but a gradient with respect to the considered pixel value $a_{n_x,n_y}$, which is a scalar in the considered 1D case. d) shows the gradient $\nabla R(n_x, n_y)$ with the central differences (28) applied to a). It can be seen that although d) is not perfectly rotationally symmetric, it comes closer to a symmetric shape than c). c) introduces a wavelike shape into the abundance maps, while d) does not. Therefore, the central differences (28) are used for the Hessian regularization.

These elements equal $(\partial J_{n_x,n_y} / \partial a_{n_x,n_y})_r$ when the data fidelity is omitted. This holds because for the respective pixel $(n_x, n_y)$, all other elements of the sums in (22), i.e., the values of the Hessian at different pixel positions, are considered constant and drop out during the differentiation.

The regularizer (22) is twice differentiable except for the case when it is zero, i.e., when all second derivatives $A_{r,xx}$, $A_{r,xy}$, $A_{r,yx}$ and $A_{r,yy}$ are zero. In this case, the denominator in the gradient (29) is also zero. We then define the regularizer's gradient to be zero because when the regularizer is zero, it is at its absolute minimum and the abundances should not be changed. Such a case almost never occurs for numerical reasons; therefore, it does not affect the practical application of the regularizer. Periodic boundary conditions are assumed at the image borders, where an extension of two pixels is required to be able to calculate the central differences (28). It is worth mentioning that all of the used differences (28) contain the respective considered pixel. Therefore, when the differences are differentiated with respect to the corresponding pixel value $a_{n_x,n_y}$, each directional difference is considered. In the literature, there are also differences that are completely rotationally symmetric, but do not have the aforementioned property that all differences contain the respective considered pixel. When the objective function is differentiated with respect to the considered pixel, some directions then drop out because the corresponding terms do not depend on this pixel. This results in a minimization in which not all spatial directions are considered equally. Such an effect is not desired.
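The central differences (28), the pixelwise gradient (29), the periodic image continuation and the zero-denominator convention described above can be sketched for a single abundance map as follows. This is a minimal sketch, not the authors' implementation; the function names are hypothetical, and the mapping of $n_x$ to array axis 0 and $n_y$ to axis 1 is an assumption.

```python
import numpy as np

def shifted(A, dx, dy):
    """Return the map whose entry (n_x, n_y) holds A(n_x + dx, n_y + dy), periodically."""
    return np.roll(A, shift=(-dx, -dy), axis=(0, 1))

def central_second_differences(A):
    """Central differences (28a)-(28d) with periodic boundary conditions."""
    Axx = shifted(A, -1, 0) - 2 * A + shifted(A, 1, 0)
    Axy = shifted(A, 0, 1) - A - shifted(A, -1, 1) + shifted(A, -1, 0)
    Ayx = shifted(A, 1, 0) - A - shifted(A, 1, -1) + shifted(A, 0, -1)
    Ayy = shifted(A, 0, -1) - 2 * A + shifted(A, 0, 1)
    return Axx, Axy, Ayx, Ayy

def hessian_reg_gradient(A):
    """Pixelwise gradient (29); defined as zero where the denominator vanishes."""
    A = np.asarray(A, dtype=float)
    Axx, Axy, Ayx, Ayy = central_second_differences(A)
    num = (10 * A
           - 3 * shifted(A, 0, 1) - 3 * shifted(A, 0, -1)
           - 3 * shifted(A, -1, 0) - 3 * shifted(A, 1, 0)
           + shifted(A, -1, 1) + shifted(A, 1, -1))
    den = np.sqrt(Axx**2 + Axy**2 + Ayx**2 + Ayy**2)
    grad = np.zeros_like(A)
    np.divide(num, den, out=grad, where=den > 0)  # leave zeros where den == 0
    return grad
```

The numerator of (29) is the derivative of the squared differences with respect to the center pixel, i.e., it equals $-(2A_{r,xx} + A_{r,xy} + A_{r,yx} + 2A_{r,yy})$, which gives a quick consistency check for an implementation.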

E. Spatial Regularization’s Effect on the Spectral Domain

Although this paper focuses on the spatial regularization of unmixing, it is worth taking a closer look at the effects it introduces into the spectral domain. As indicated in the introduction, the pure endmembers form the vertices of a simplex. This fact still holds in the case of spatial regularization, of course. Therefore, the question is how the spatial information helps the calculated endmember spectra to come closer to the true endmember spectra.

The spatial preprocessing algorithm (SPP, [11]), an endmember extraction method, is based on the assumption that pure image pixels are more likely to be found in homogeneous image regions (a homogeneous region is a region of pixels with similar spectra). This is due to the fact that mixed pixels can be found at transitions between image areas, and such transitions are not homogeneous by nature, but rather gradual. Gradual transitions result in spectrally dissimilar neighboring pixels. Pure pixels, by contrast, are likely to be found in homogeneous regions, as mixtures usually do not stretch over large image areas. While the SPP passively tries to locate homogeneous regions to find feasible endmember candidates, spatial regularization actively promotes the creation of more homogeneous abundance regions during the calculation. As it is quite likely that a scene contains homogeneous regions, it is reasonable to promote such regions in the approximation. After performing an abundance map calculation sequence, the endmembers are calculated from the determined abundances in the least squares sense. As the objective functions of both first-order (5) and second-order regularization (21) are convex for fixed abundance matrices, the endmember calculation is in general improved towards the correct endmembers, as larger regions of the image have a more accurate abundance approximation. This effect of the regularization on both endmember and abundance approximations is confirmed by simulations without noise, in which the abundances are corrected greatly towards the true ones and the spectra estimation is improved.

III. EXPERIMENTAL DATA

A. Simulated Data

As the Hessian-based denoising should, by design, approximate piecewise linear data best, we created simulated data with piecewise linear transitions. For this purpose, a hyperspectral image of size 5 × 5 × 206 using 5 random abundance maps fulfilling the STO constraint was created. By linear spatial interpolation, abundance maps having 50 × 50 pixels were created (Fig. 4, first row). The used spectra were the pure ones described in Subsection III-B. Additive white Gaussian noise (AWGN) was added to the simulated image by noising each wavelength's image separately. This ensures that the noise level is wavelength dependent. We repeated this experiment with some selected AVIRIS spectra. The used spectra are explained in Section IV-A.
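The construction of the simulated image described above can be sketched as follows. This is a hedged illustration only: the uniform random draw of the coarse abundances, the per-band SNR definition (mean signal power of a band divided by the noise variance) and the names `upsample_linear` and `simulate_image` are assumptions, not details given in the text.

```python
import numpy as np

def upsample_linear(maps, size):
    """Bilinear interpolation of maps with shape (R, h, w) to (R, size, size)."""
    R, h, w = maps.shape
    xs = np.linspace(0, h - 1, size)
    ys = np.linspace(0, w - 1, size)
    tmp = np.empty((R, size, w))
    for r in range(R):
        for j in range(w):                        # interpolate along the first axis
            tmp[r, :, j] = np.interp(xs, np.arange(h), maps[r, :, j])
    out = np.empty((R, size, size))
    for r in range(R):
        for i in range(size):                     # interpolate along the second axis
            out[r, i, :] = np.interp(ys, np.arange(w), tmp[r, i, :])
    return out

def simulate_image(M, size=50, coarse=5, snr_db=30.0, seed=0):
    """Return a noisy image Y of shape (bands, size*size) and the abundance maps."""
    rng = np.random.default_rng(seed)
    n_bands, R = M.shape
    coarse_maps = rng.random((R, coarse, coarse))
    coarse_maps /= coarse_maps.sum(axis=0, keepdims=True)    # enforce sum-to-one
    A_maps = upsample_linear(coarse_maps, size)              # piecewise linear maps
    Y = M @ A_maps.reshape(R, -1)                            # linear mixing model
    band_power = np.mean(Y**2, axis=1, keepdims=True)
    noise_std = np.sqrt(band_power / 10 ** (snr_db / 10))    # one noise level per band
    Y_noisy = Y + rng.standard_normal(Y.shape) * noise_std
    return Y_noisy, A_maps
```

Because linear interpolation forms convex combinations of the coarse values, the interpolated maps remain nonnegative and still sum to one at every pixel.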


Fig. 2. Spectra of the 5 endmembers.

Fig. 3. Organizer box filled with mixtures of white powders.

B. Measured Data

For the practical evaluation of the proposed algorithm, real data have been acquired. The hyperspectral images were recorded with a Specim SP-SWIR-LVDS-100-N25E SWIR linescan camera. The recorded wavelength range is 1000 nm to 2500 nm, covered by 256 spectral channels. The line width is 320 pixels. A specially crafted organizer box made from nonreflecting plastic with 6 × 7 compartments is filled with 35 mixtures of white powders. The powders have been mixed by shaking and stirring in separate containers. The substances are lactose, cornstarch, sugar powder (97% saccharose), magnesium carbonate and calcium sulfate (gypsum), see Table I. These components have spectra with distinctive features, see Fig. 2. After cutting off the first 20 and last 30 bands due to their low signal-to-noise ratio, the spectra have been normalized so that the sum of all spectral values of one spectrum amounts to one. Ordered from top left to bottom right, the last five compartments contain the pure substances. The organizer box filled with powder is shown in Fig. 3. Prior to the application of the unmixing algorithms, the following steps were taken:

TABLE I

POWDER MIXTURES COMPONENTS

1) Data Preprocessing: The data have been white-balanced by applying the following formula to each of the image's pixel spectra:

\begin{equation}
y^{*} = \frac{y - y_{\text{black}}}{y_{\text{white}} - y_{\text{black}}}. \tag{30}
\end{equation}

Here, $y$ stands representatively for the spectrum of each image pixel. $y_{\text{black}}$ is the corresponding pixel of the previously acquired dark frame, while $y_{\text{white}}$ is the corresponding pixel of an image previously acquired by recording a white area with known reflectance.
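As a small illustration, the white balance (30) can be applied elementwise to whole arrays. The guard against a vanishing denominator and the name `white_balance` are assumptions added for this sketch; the text does not state how such pixels are treated at this stage.

```python
import numpy as np

def white_balance(y, y_black, y_white, eps=1e-12):
    """Apply (30) elementwise; y, y_black and y_white must be broadcastable."""
    denom = np.asarray(y_white, dtype=float) - np.asarray(y_black, dtype=float)
    # Assumption: avoid division by zero where white and dark frame coincide.
    denom = np.where(np.abs(denom) < eps, eps, denom)
    return (np.asarray(y, dtype=float) - y_black) / denom
```

With this convention, a pixel equal to the dark frame maps to 0 and a pixel equal to the white reference maps to 1.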

Defective pixels on the sensor are determined by calculating the deviation of a norm, postulating that the vast majority of pixels are intact. This norm is calculated using the white image. Image values that arise from pixels that are considered defective are interpolated spectrally.

2) Segmentation: For the easy application of the unmixing algorithm, the central 27 × 27 pixels of each compartment are considered. The size of the extracted block should be the same for all compartments. It was chosen as large as possible, meaning that as many powder pixels as possible are considered while all pixels belonging to the organizer box are left out. Mixed pixels containing both box material and powder were left out, too. The considered blocks were appended horizontally to form the measured hyperspectral image $Y$ that was fed into the algorithms.

3) Endmember Spectra: The endmember spectra can be calculated in two ways. First, as the last five compartments are filled with only one substance each, the endmember spectra can be calculated by averaging over all spectra of the respective compartments. We refer to such spectra as pure spectra. Second, as ground truth is available for the described configuration, it is also possible to calculate the ideal spectra with the least squares step

\begin{align}
M = \arg\min_{M} \; & \tfrac{1}{2} \left\| M A_{\text{ground truth}} - Y \right\|_2^2 \tag{31}\\
\text{subject to} \; & M \geq 0. \tag{32}
\end{align}

It is important to keep in mind that the ideal spectra are only an estimation of the endmember spectra, as local inhomogeneities within each compartment are not considered. The pure and ideal spectra hardly differ. This is an indication that the LMM can be used in this case.

IV. RESULTS

Both simulated and measured data have been unmixed with both the TV algorithm according to [21], which minimizes the objective function (5), and the second-order regularization algorithm presented in Subsection II-D. For the TV regularization, the pixel neighborhood $\mathcal{N}(i)$ is chosen to be the Moore neighborhood (8-connected). As edge or corner pixels do not have 8 neighbors, the values of $S$ are normalized by dividing by the number of adjacent pixels to make sure that all pixel values are treated evenly during minimization. For the second-order regularization algorithm, the image is continued periodically. The number of endmembers $R$ the algorithms are supposed to find is always set to the number of components present in the scene. The initial spectra are chosen randomly from the pixel spectra of the image. All abundance maps have been initialized equally to one divided by the number of endmembers.

Fig. 4. Unmixing results of a simulated image. First row: ground truth abundance maps, second row: unmixing results obtained with TV regularization, third row: unmixing with HS regularization, fourth row: unmixing result calculated by the ALS algorithm.

Because alternating least squares is one of the most frequently used unmixing methods, an STO- and NN-constrained alternating least squares algorithm (ALS; Matlab's lsqlin() method) is used to calculate the abundance maps against which both the first-order and second-order regularization algorithms are compared.

A. Simulated Data

AWGN with 30 dB SNR was added to a simulated image containing the powder spectra. Table IV lists the obtained parameter values when the initial endmembers are chosen to be noisy versions (30 dB SNR) of the ones that were used to construct the image. This approach simulates supervised unmixing (i.e., the endmembers are known) when the previously recorded spectra that are fed into the algorithms are noise-corrupted. Please note that in Table IV, the given SNR value refers to the noise that was added to the initial image, not to the endmembers. The unmixing results of a single run are presented in Fig. 4. Table II gives the cumulated root-mean-square error of the approximation of both spectra and abundances of each substance. One exemplary endmember calculated by each of the algorithms is given in Fig. 5. The deviation between the spectra is similar for the other endmembers. It is interesting to note that both regularized algorithms correct the initial, noisy endmembers towards the correct ones, while the ALS solution (number of iterations: 10) deviates even more from the true ones. Figure 6 shows one horizontal cut through the first abundance map at vertical pixel number 10. Note that no staircase effect occurs although TV regularization is applied. This can be explained with the used algorithm: as mentioned in Subsection II-C, the pixelwise subgradient is calculated by $s_i = \sum_{j \in \mathcal{N}(i)} \operatorname{sgn}(a_i - a_j)$. The subgradient is zero, so that no regularization is introduced, only when all pixels in the neighborhood have the same value. When double precision is used, this is very unlikely, even when the number of iterations is very high. Therefore, there will always be differences between the pixels, and no staircase effect occurs when the described TV regularization algorithm is applied. The fact that the abundances of all endmembers are normalized to one after each subgradient step makes the occurrence of the staircase effect even more unlikely.

TABLE II

SIMULATED DATA: ROOT-MEAN-SQUARE ERROR OF THE CONSIDERED WHITE POWDERS, ADDED OVER ALL SUBSTANCES

Fig. 5. One endmember calculated by TV, HS and ALS in the run shown in Fig. 4 compared with the true spectrum and the spectrum the algorithms were initialized with. Both TV and HS show good accordance with the original spectra and are therefore hard to see.
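The pixelwise TV subgradient $s_i = \sum_{j \in \mathcal{N}(i)} \operatorname{sgn}(a_i - a_j)$ with the Moore neighborhood and the normalization by the number of adjacent pixels can be sketched for one abundance map as follows. This is an illustrative sketch rather than the implementation of [21]; the name `tv_subgradient` is hypothetical, and the map is assumed to be at least 2 × 2 pixels so that every pixel has a neighbor.

```python
import numpy as np

def tv_subgradient(a):
    """Normalized subgradient of one abundance map a with shape (h, w)."""
    h, w = a.shape
    s = np.zeros((h, w))
    count = np.zeros((h, w))
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue
            # Pixels whose neighbor at offset (dx, dy) lies inside the image.
            x0, x1 = max(0, -dx), min(h, h - dx)
            y0, y1 = max(0, -dy), min(w, w - dy)
            center = a[x0:x1, y0:y1]
            neighbor = a[x0 + dx:x1 + dx, y0 + dy:y1 + dy]
            s[x0:x1, y0:y1] += np.sign(center - neighbor)
            count[x0:x1, y0:y1] += 1
    return s / count       # corner pixels: 3 neighbors, edge: 5, interior: 8
```

The subgradient vanishes at a pixel only if the sign contributions cancel, e.g., when all neighbors carry exactly the same value as the pixel itself.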

We created a second hyperspectral image using the abundance maps shown in Fig. 4, but this time we used spectra given in the AVIRIS database [36]. The used materials were almandine, antigorite, brucite, ice and water, whose spectra differ more strongly from each other than those of the white powders considered before. The used spectra contained 239 spectral values from 870 nm to 2976 nm. The resulting hyperspectral datacube was treated the same way as the first one, except that a higher noise level was added to the endmembers (SNR = 20 dB). The resulting error values are shown in Table III. Note that the absolute value of the errors cannot be compared with the first simulated image because the endmembers are of different magnitude. Both HS and TV regularization perform better than ALS in estimating the abundance maps; however, the endmember approximation is worse than the one obtained by ALS. The parameter values were the same as for the first image, see Table IV.

Fig. 6. Horizontal cut through the first abundance maps of each row shown in Fig. 4 at vertical pixel number 10.

TABLE III

SIMULATED DATA: ROOT-MEAN-SQUARE ERROR OF THE CONSIDERED AVIRIS MATERIALS, ADDED OVER ALL SUBSTANCES

TABLE IV

FEASIBLE PARAMETER VALUES FOR SIMULATED DATA. THE SNR VALUE REFERS TO THE IMAGE, I.E., EACH BAND IS CORRUPTED BY NOISE OF THE GIVEN SNR LEVEL

TABLE V

PARAMETER VALUES USED FOR MEASURED DATA

In general, the estimation of both abundances and endmembers improves in comparison with ALS when regularization is applied, especially with high spatial smoothing (high λ). There is no significant difference between HS and TV regularization.

The parameter values stated in Table IV are found to provide satisfactory results; however, the exact values depend on the application, e.g., on the ground truth abundance map structure and on how much the results can be smoothed without destroying the underlying structure. The most critical parameter is the step size parameter $c$ (end of Section II-C and (25)); the convergence does not depend notably on the iteration numbers $P$ and $T$. An image with an SNR of 10 dB is almost impossible to unmix meaningfully; however, it is often possible to recognize the structure at least partly. The experiment shown in Fig. 4 was also conducted with piecewise constant data; in this case, the parameter values are in the range of the ones given in Table IV. We omit the results as they do not provide more insight.

Fig. 7. Abundance map results of the measured data. First row: ground truth abundance maps, second row: unmixing results obtained with TV regularization, third row: unmixing with HS regularization, fourth row: unmixing result calculated by the ALS algorithm. The columns are arranged according to the endmember numbers in Fig. 2.

B. Measured Data

Both first-order and second-order regularization have been applied to the measured data described in Subsubsection III-B2. The chosen parameter values are given in Table V. The number of ALS iterations is 10.

The resulting abundance maps are shown in Fig. 7, while Fig. 8 displays the corresponding endmembers. The cumulated root-mean-square errors of the approximations of both spectra and abundances of each substance are given in Table VI. The abundance maps are improved by the application of regularization. In general, the spectra resulting from first-order and second-order regularization hardly differ. The endmembers calculated by ALS are good approximations of the true endmembers; however, the endmembers calculated by the regularization algorithms are much closer to the ground truth endmembers. Every outlying endmember estimation in the ALS solution is corrected greatly towards the ground truth endmember by the regularization algorithms. Note that in Fig. 8, the endmembers calculated by ALS (dash-dotted line) are the ones that have the largest distance to the ground truth endmembers, while the endmembers calculated by TV or HS regularization mostly overlap with the ground truth endmembers. Especially endmember 4, which is hard to unmix due to the lack of distinctive minima and maxima, is improved greatly by spatial regularization. The corresponding abundance maps are improved compared to the ones calculated by ALS; in particular, the compartment with the pure endmember is recognized. Nevertheless, the abundance approximation of endmember 4 is still rather poor. From different analyses, we know that the abundances of this endmember are consistently underestimated. The reason for this effect could be that the substance is slightly transparent. This would mean that in the considered mixtures, only the spectra of the other substances can be detected. Such detailed chemical analysis is omitted, however, because the focus of this paper is on mathematical unmixing methods.

Fig. 8. Resulting endmembers after the application of the unmixing algorithms. The spectra have been offset for clarity, while the spectra colors correspond to the ones in Fig. 2. The ALS spectra are the ones that have the largest distance to the ground truth endmembers. The spectra calculated by TV or HS regularization are hard to see because they are very close to or even overlap with the ground truth endmembers.

It is likely that the usage of spectrum preprocessing methods, such as the calculation of first or second derivative spectra, spectrum filtering, e.g., Savitzky-Golay [37], and other approaches, improves the unmixing result. Also, it would be reasonable to allow for additional endmembers to be found, both to account for noise and for a constant component, as was done by Zymnis et al. [21]. However, it goes beyond the scope of this paper to investigate the effects of the application of such additional methods.

V. DISCUSSION

In this section, we will discuss some properties of the presented HS regularization in comparison with the TV regularization. In the case of unsupervised unmixing, it is sometimes tricky for both algorithms to find the correct parameter set which is appropriate for the given data. The most important parameter for both TV and HS regularization is the step size parameter $c$ (see the end of Section II-C and equation (25)), which has quite a large impact on the convergence. The numbers of iterations $P$ and $T$ can each be chosen arbitrarily from a range between 10 and 100. Whenever initial information like roughly or even completely known spectra is available, it is advisable to include the information in the unmixing process. Possible stopping criteria for both algorithms are the maximum number of iterations (used throughout the paper), or stopping when the objective function value hardly changes after each iteration $t$. The latter stopping criterion, however, is not recommended because neither TV nor HS regularization is a descent method. This means that the objective function does not decrease with every step, but converges in the long run. Additionally, it is possible to observe the magnitude of the vector $\ell_1$ norms of the respective gradients and stop as soon as it is smaller than some pre-defined threshold.

TABLE VI

MEASURED DATA: ROOT-MEAN-SQUARE ERROR OF THE CONSIDERED SUBSTANCES, ADDED OVER ALL SUBSTANCES

Fig. 9. Objective function value according to (5) of a TV run (simulated image with AVIRIS spectra). Whole run (left), magnified section (right). The used parameter values were $P = 100$, $T = 50$, $k = 100$ and $\lambda = 5 \cdot 10^{-6}$. Note the increasing objective function at around 700 iterations.

With fixed $P$ and $T$, the calculation time of HS is about as long as that of the TV calculation. The exact number depends on the specific image and on the other computations running on the machine, as the calculations were performed on a multicore desktop PC. We performed the unmixing of the simulated image with the AVIRIS spectra 15 times. The HS calculation time ranged from 204.6 s to 221.2 s, while the TV calculation took from 202.77 s to 212.66 s. On average, the HS minimization needs 1.5 % more time.

Finally, we want to point out the most significant advantage of the HS regularization over the TV regularization. As mentioned before, neither approach is a descent method. For the considered scenarios, it was observed that, depending on the step size parameter $c$, namely for larger values, the convergence of the TV minimization is frequently not monotonic. During the first few sequences of abundance map minimization steps, each sequence interrupted by a single endmember calculation step, the objective function is constantly decreasing. After some sequences, however, the objective function starts to rise after the single endmember calculation step has been performed. For an exemplary run, see Fig. 9. Even with an increased number of abundance map calculations, the objective function continued increasing. This phenomenon was never observed during any HS run, where the objective function decreases continuously. The reason for the rise could be the structure of the TV objective function. Although the objective function increase (about 0.1 to 1 %) did not result in worse abundance or endmember estimation results during our calculations, it could do so when different images with different spectra are to be unmixed. Therefore, HS regularization is the more robust and, in certain situations, more accurate regularization.

VI. CONCLUSION

Although the spatial information contained in a hyperspectral image is its main characteristic compared with single spectra, this information inherently present in an image was ignored for a long time in the development of hyperspectral image unmixing algorithms. Only some years ago, algorithms including this information started emerging. One approach to including the spatial information is to incorporate it in the objective function by adding a regularizer. The first-order, TV-like regularization method [21] aims at keeping the first spatial derivative as small as possible, therefore promoting constant abundances by design. In practice, however, due to the sum-to-one constraint, it is also suitable for linear transitions. In this paper, we extended this work and presented a new method for spectral unmixing incorporating spatial information. Its mode of operation is to keep the value of the second spatial derivative as low as possible, thus favoring piecewise linear transitions between image regions. Both methods hardly differ in terms of complexity and results. Although its calculation needs slightly more computation time, the main advantage of the new second-order regularization method is its increased robustness towards parameter changes. In practice, it also has better convergence properties, resulting in a continuous decrease of the objective function, while the first-order objective function sometimes increases. Additionally, the second-order objective function has the advantage of being differentiable, thus enabling further improvement of the implementation. The performance can be enhanced by using faster minimization methods, e.g., methods with variable step size, which will be subject to future work. The investigated methods can be applied to all kinds of hyperspectral data, regardless of the acquisition, whether by remote sensing or laboratory devices. The applicability and usefulness of this particular method, and of remote sensing unmixing methods in general, in laboratory and industrial environments has been pointed out by using specially acquired laboratory mixing data. As other publications already stated, the linear mixing model can still be used in such environments and provides satisfactory results. Further work will incorporate the use of unmixing methods for the determination of material fractions in industrial sorting goods.

REFERENCES

[1] S. Piqueras, L. Duponchel, R. Tauler, and A. de Juan, “Resolution and segmentation of hyperspectral biomedical images by multivariate curve resolution-alternating least squares,” Anal. Chim. Acta, vol. 705, nos. 1–2, pp. 182–192, 2011.

[2] T. H. Kurz, J. Dewit, S. J. Buckley, J. B. Thurmond, D. W. Hunt, and R. Swennen, “Hyperspectral image analysis of different carbonate lithologies (limestone, karst and hydrothermal dolomites): The Pozalagua Quarry case study (Cantabria, north-west Spain),” Sedimentology, vol. 59, no. 2, pp. 623–645, Feb. 2012.

[3] D.-W. Sun, Hyperspectral Imaging for Food Quality Analysis and Control. Amsterdam, The Netherlands: Elsevier, 2010.

[4] M. B. Lopes, J.-C. Wolff, J. M. Bioucas-Dias, and M. A. T. Figueiredo, “Near-infrared hyperspectral unmixing based on a minimum volume criterion for fast and accurate chemometric characterization of counterfeit tablets,” Anal. Chem., vol. 82, no. 4, pp. 1462–1469, 2010.

[5] M. Arngren, M. N. Schmidt, and J. Larsen, “Unmixing of hyperspectral images using Bayesian non-negative matrix factorization with volume prior,” J. Signal Process. Syst., vol. 65, no. 3, pp. 479–496, 2011.

[6] Z. Liu and F. Møller, “Bread water content measurement based on hyperspectral imaging,” in Proc. Scandin. Workshop Imag. Food Quality, Kongens Lyngby, Denmark, May 2011, pp. 93–98.

[7] J. M. Bioucas-Dias et al., “Hyperspectral unmixing overview: Geometrical, statistical, and sparse regression-based approaches,” IEEE J. Sel. Topics Appl. Earth Observat. Remote Sens., vol. 5, no. 2, pp. 354–379, Apr. 2012.

[8] J. Li and J. M. Bioucas-Dias, “Minimum volume simplex analysis: A fast algorithm to unmix hyperspectral data,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), vol. 3. Jul. 2008, pp. III-250–III-253.

[9] M. E. Winter, “N-FINDR: An algorithm for fast autonomous spectral end-member determination in hyperspectral data,” Proc. SPIE Imag. Spectrometry V, vol. 3753, pp. 266–275, Oct. 1999. [Online]. Available: http://dx.doi.org/10.1117/12.366289

[10] A. Plaza, G. Martín, J. Plaza, M. Zortea, and S. Sánchez, “Recent developments in endmember extraction and spectral unmixing,” in Optical Remote Sensing (Augmented Vision and Reality), vol. 3, S. Prasad, L. M. Bruce, and J. Chanussot, Eds. Berlin, Germany: Springer-Verlag, 2011, pp. 235–267.

[11] M. Zortea and A. Plaza, “Spatial preprocessing for endmember extraction,” IEEE Trans. Geosci. Remote Sens., vol. 47, no. 8, pp. 2679–2693, Aug. 2009.

[12] D. M. Rogge, B. Rivard, J. Zhang, A. Sanchez, J. Harris, and J. Feng, “Integration of spatial–spectral information for the improved extraction of endmembers,” Remote Sens. Environ., vol. 110, no. 3, pp. 287–303, Oct. 2007.

[13] A. Plaza, P. Martinez, R. Perez, and J. Plaza, “Spatial/spectral endmember extraction by multidimensional morphological operations,” IEEE Trans. Geosci. Remote Sens., vol. 40, no. 9, pp. 2025–2041, Sep. 2002.

[14] J. V. Stone, “Blind source separation using temporal predictability,” Neural Comput., vol. 13, no. 7, pp. 1559–1574, 2001.

[15] S. Jia and Y. Qian, “Spectral and spatial complexity-based hyperspectral unmixing,” IEEE Trans. Geosci. Remote Sens., vol. 45, no. 12, pp. 3867–3879, Dec. 2007.

[16] O. Eches, N. Dobigeon, and J.-Y. Tourneret, “Enhancing hyperspectral image unmixing with spatial correlations,” IEEE Trans. Geosci. Remote Sens., vol. 49, no. 11, pp. 4239–4247, Nov. 2011.

[17] A. Zare, “Spatial-spectral unmixing using fuzzy local information,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), Jul. 2011, pp. 1139–1142.

[18] X. Wu, X. Li, and L. Zhao, “A kernel spatial complexity-based nonlinear unmixing method of hyperspectral imagery,” in Life System Modeling and Intelligent Computing. Berlin, Germany: Springer-Verlag, 2010, pp. 451–458.

[19] M. Michelsburg and F. Puente León, “Combined spatial and spectral unmixing of image signals for material recognition in automated inspection systems,” Proc. SPIE, Videometrics, Range Imag., Appl. XII; Autom. Vis. Inspection, vol. 8791, p. 87911E, May 2013.

[20] K. Canham, A. Schlamm, A. Ziemann, B. Basener, and D. Messinger, “Spatially adaptive hyperspectral unmixing,” IEEE Trans. Geosci. Remote Sens., vol. 49, no. 11, pp. 4248–4262, Nov. 2011.

[21] A. Zymnis, S.-J. Kim, J. Skaf, M. Parente, and S. Boyd, “Hyperspectral image unmixing via alternating projected subgradients,” in Proc. Conf. Rec. 41st Asilomar Conf. Signals, Syst. Comput. (ACSSC), Nov. 2007, pp. 1164–1168.


[22] S. Lefkimmiatis, A. Bourquard, and M. Unser, “Hessian-based norm regularization for image restoration with biomedical applications,” IEEE Trans. Image Process., vol. 21, no. 3, pp. 983–995, Mar. 2012.

[23] L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Phys. D, Nonlinear Phenomena, vol. 60, nos. 1–4, pp. 259–268, 1992.

[24] D. C. Dobson and C. R. Vogel, “Convergence of an iterative method for total variation denoising,” SIAM J. Numer. Anal., vol. 34, no. 5, pp. 1779–1791, 1997.

[25] M.-D. Iordache, J. M. Bioucas-Dias, and A. Plaza, “Total variation spatial regularization for sparse hyperspectral unmixing,” IEEE Trans. Geosci. Remote Sens., vol. 50, no. 11, pp. 4484–4502, Nov. 2012.

[26] N. Keshava and J. F. Mustard, “Spectral unmixing,” IEEE Signal Process. Mag., vol. 19, no. 1, pp. 44–57, Jan. 2002.

[27] J. M. P. Nascimento and J. M. Bioucas-Dias, “Nonlinear mixture model for hyperspectral unmixing,” Proc. SPIE, Image Signal Process. Remote Sens. XV, vol. 7477, p. 74770I, Sep. 2009.

[28] D. P. Bertsekas, Nonlinear Programming. Belmont, MA, USA: Athena Scientific, 1999.

[29] A. Huck, M. Guillaume, and J. Blanc-Talon, “Minimum dispersion constrained nonnegative matrix factorization to unmix hyperspectral data,” IEEE Trans. Geosci. Remote Sens., vol. 48, no. 6, pp. 2590–2602, Jun. 2010.

[30] Y. Chen and X. Ye. (2011). “Projection onto a simplex,” [Online]. Available: http://arxiv.org/abs/1101.6081

[31] B. T. Polyak, Introduction to Optimization. New York, NY, USA: Optimization Software, 1987.

[32] C. T. Kelley, Iterative Methods for Optimization, vol. 18. Philadelphia, PA, USA: SIAM, 1999.

[33] S. P. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, U.K.: Cambridge Univ. Press, 2004, ch. 3. [Online]. Available: http://web.stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf

[34] N. Walton. Projected Gradient Descent. [Online]. Available: http://staff.science.uva.nl/~walton/Notes/P_Gradient_Descent.pdf, accessed Mar. 27, 2014.

[35] M. Bergounioux and L. Piffet, “A second-order model for image denoising,” Set-Valued Variat. Anal., vol. 18, nos. 3–4, pp. 277–306, 2010.

[36] R. N. Clark et al. USGS Digital Spectral Library Splib06a. US Geological Survey, Reston, VA, USA, 2007. [Online]. Available: http://speclab.cr.usgs.gov/spectral.lib06

[37] F. Tsai and W. Philpot, “Derivative analysis of hyperspectral data,” Remote Sens. Environ., vol. 66, no. 1, pp. 41–51, 1998.

Sebastian Bauer received the B.Sc. and M.Sc. degrees in electrical engineering from the Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany, in 2010 and 2012, respectively. Since 2013, he has been a Research Associate with the Institute of Industrial Information Technology, KIT. His research interests are system modeling, signal and image processing, and communications.

Johannes Stefan received the B.Sc. and M.Sc. degrees in electrical engineering and information technology from the Karlsruhe Institute of Technology, Karlsruhe, Germany, in 2010 and 2013, respectively. His main research interests include signal processing, image analysis, and convex optimization techniques.

Matthias Michelsburg received the Dipl.-Ing. degree in electrical engineering from the Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany, in 2010. Since 2010, he has been with the Institute of Industrial Information Technology, KIT. His research interests are hyperspectral imaging, image processing, pattern recognition, and classification.

Thomas Laengle received the M.Sc. degree in computer science from the Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany, in 1993, with a focus on the localization of robot systems. He was a Researcher in Robotics, and received the Ph.D. degree in computer science in 1996. He was a Leader of a Research Group with the Institute for Process Control and Robotics, KIT, and received the Habilitation degree in distributed diagnosis systems in 2003. Since 2011, he has been an Extraordinary Professor at KIT. He is currently the Head of the Business Unit Vision Based Inspection Systems with the Fraunhofer Institute of Optronics, System Technologies and Image Exploitation, Karlsruhe. His research interests include different aspects of image processing and real-time algorithms for inspection systems. He also offers lectures in Computer Science at KIT and initiates many possibilities for students to work on applied research.

Fernando Puente León (S’93–M’99–SM’06) received the Dipl.-Ing. degree in electrical engineering and the Ph.D. degree in automated visual inspection from the Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany, in 1994 and 1999, respectively. He is currently a Professor with the Department of Electrical Engineering and Information Technology, KIT, where he heads the Institute of Industrial Information Technology. From 2001 to 2002, he was with DS2, Valencia, Spain. From 2002 to 2003, he was a Post-Doctoral Research Associate with the Institut für Mess- und Regelungstechnik, KIT, and a Professor with the Department of Electrical Engineering and Information Technology, Technische Universität München, Munich, Germany, from 2003 to 2008. His research interests include image processing, automated visual inspection, information fusion, measurement technology, pattern recognition, and communications.