
• INTRODUCTION HYPERSPECTRAL BAND SELECTION CLASSIFICATION OF DATA ON GRASSMANNIANS FUTURE DIRECTIONS

Data Analysis Methods and Applications: Hyperspectral Band Selection and Data

Classification on Embedded Grassmannians

Sofya Chepushtanova

Department of Mathematics Colorado State University

February 10, 2014

SOFYA CHEPUSHTANOVA COLORADO STATE UNIVERSITY 1 OF 48


Outline

1. Introduction
   - Motivation
   - Sparse SVMs

2. Hyperspectral Band Selection
   - Hyperspectral Imagery (HSI)
   - Algorithm
   - Computational Results
   - Future Work

3. Classification of Data on Grassmannians
   - Grassmannian Framework
   - Algorithm
   - Application to HSI
   - Future Work

4. Future Directions


Motivation

Application-driven research

Algorithms for Threat Detection (ATD) program (launched in 2009): developing novel mathematical and statistical methods to extract meaningful information from large data streams

Big data: massive, high-dimensional, complex

Growing demand for geometric data analysis, classification, and dimension reduction models

Dimension reduction: how?
- Feature extraction: transforms the data to a lower-dimensional space, using manifold learning techniques
- Feature selection: identifies a relevant subset of features while maintaining or improving the performance of a prediction model


Support Vector Machines

Training data x_i ∈ R^n with class labels d_i ∈ {−1, +1}, i = 1, …, m; D = diag(d_i) and X is the m × n data matrix. The separating hyperplane is P = {x : w^T x + b = 0}, where w ∈ R^n is normal to P. Points on w^T x + b = ±1 are support vectors. The optimal P has the largest margin 2/‖w‖_2.

SVM:

    min_{w,b,ξ}  ‖w‖_2^2 / 2 + C e^T ξ
    s.t.  D(Xw + be) + ξ ≥ e,  ξ ≥ 0.

Decision function: f(x) = sgn(w^T x + b)

[Figure: optimal separating hyperplane w^T x + b = 0 with margin planes w^T x + b = ±1, the normal vector w, support vectors, and misclassified points for classes +1 and −1.]
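As a sanity check, the soft-margin objective above can be minimized directly. A minimal sketch, assuming hypothetical toy data in R^2 and plain subgradient descent on the equivalent hinge-loss form (not the solver used in this talk):

```python
import numpy as np

# Hypothetical, linearly separable toy data in R^2.
X = np.array([[2.0, 2.0], [2.5, 1.5], [3.0, 2.5], [1.5, 3.0],
              [-2.0, -2.0], [-2.5, -1.5], [-3.0, -2.5], [-1.5, -3.0]])
d = np.array([1, 1, 1, 1, -1, -1, -1, -1])

# Soft-margin objective ||w||^2/2 + C * sum_i max(0, 1 - d_i(w^T x_i + b)),
# minimized by plain subgradient descent.
C, lr = 1.0, 0.01
w, b = np.zeros(2), 0.0
for _ in range(2000):
    viol = d * (X @ w + b) < 1                 # points violating the margin
    w -= lr * (w - C * (d[viol, None] * X[viol]).sum(axis=0))
    b -= lr * (-C * d[viol].sum())

print(np.sign(X @ w + b) == d)                 # all True on this toy set
```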


Nonlinear SVM: Kernel Trick

Φ : x ∈ R^N ↦ Φ(x) ∈ R^{N′}, N′ > N. Kernel function: K_ij = K(x_i, x_j) = Φ(x_i)^T Φ(x_j).

[Figure: the map Φ sends the input space to a higher-dimensional feature space.]

The decision function is f(x) = sgn(∑_{i=1}^m α_i d_i K(x_i, x) + b). Examples: the RBF kernel K(x_i, x) = exp(−γ‖x_i − x‖^2) and the polynomial kernel K(x_i, x) = (x_i^T x + 1)^n.
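The kernel matrix itself is easy to form. A small sketch, assuming hypothetical points and an arbitrary γ, that builds the RBF Gram matrix and checks its basic properties:

```python
import numpy as np

# RBF kernel K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2) on hypothetical
# points; gamma = 0.5 is an arbitrary choice.
def rbf_gram(X, gamma=0.5):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    return np.exp(-gamma * sq)

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = rbf_gram(X)

# A valid kernel matrix is symmetric positive semidefinite with unit diagonal.
print(np.allclose(K, K.T), np.allclose(np.diag(K), 1.0),
      np.all(np.linalg.eigvalsh(K) > -1e-9))
```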


Arbitrary-Norm Separating Hyperplane

Dual norm

For a norm ‖·‖ on R^n, the dual norm is ‖x‖′ := max_{‖y‖=1} x^T y.

Example: for p, q ∈ [1, ∞] with 1/p + 1/q = 1, the p-norm and q-norm are dual to each other.

Theorem (Mangasarian, 1998)

Let q ∈ R^n be any point not on the plane P := {x : w^T x + b = 0}, where 0 ≠ w ∈ R^n and b ∈ R. Then the distance from q to its closest point p(q) on P is

    ‖q − p(q)‖ = |w^T q + b| / ‖w‖′.
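The theorem is easy to check numerically in the Euclidean case, where the 2-norm is its own dual. A sketch with hypothetical w, b, and q:

```python
import numpy as np

# Distance from q to P = {x : w^T x + b = 0} in the Euclidean case:
# the nearest point is the orthogonal projection p(q), and
# ||q - p(q)||_2 should equal |w^T q + b| / ||w||_2 (the 2-norm is self-dual).
w = np.array([3.0, 4.0])
b = -5.0
q = np.array([2.0, 1.0])

p_q = q - ((w @ q + b) / (w @ w)) * w            # orthogonal projection onto P
dist = np.linalg.norm(q - p_q)
print(dist, abs(w @ q + b) / np.linalg.norm(w))  # both approx. 1.0
```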


Sparse SVMs

Corollary

‖q − p(q)‖_∞ = |w^T q + b| / ‖w‖_1

(where ‖x‖_1 = ∑_{i=1}^n |x_i| and ‖x‖_∞ = max_i |x_i|).

If the ℓ∞-norm is used to measure the distance between the planes, then the margin is given by 2/‖w‖_1, which yields the following sparse SVM (SSVM):

    min_{w,b,ξ}  ‖w‖_1 + C e^T ξ
    s.t.  D(Xw + be) + ξ ≥ e,  ξ ≥ 0.
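The corollary can likewise be checked numerically: brute-force the ∞-norm distance from a point to a plane in R^2 and compare it with |w^T q + b|/‖w‖_1. The w, b, and q below are hypothetical:

```python
import numpy as np

# Brute-force the inf-norm distance from q to the line w^T x + b = 0
# by sampling the line densely, then compare with |w^T q + b| / ||w||_1.
w = np.array([3.0, 1.0])
b = -2.0
q = np.array([2.0, 2.0])

x1 = np.linspace(-5.0, 5.0, 200001)           # sample points on the line
x2 = (-b - w[0] * x1) / w[1]
brute = np.maximum(np.abs(q[0] - x1), np.abs(q[1] - x2)).min()
formula = abs(w @ q + b) / np.abs(w).sum()
print(brute, formula)                          # both close to 1.5
```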


Sparse SVMs

SSVM ⇒ LP (write w = w^+ − w^− with w^+, w^− ≥ 0, so that ‖w‖_1 = e^T(w^+ + w^−)):

    min_{w^+,w^−,b,ξ}  e^T(w^+ + w^−) + C e^T ξ
    s.t.  D(X(w^+ − w^−) + be) + ξ ≥ e,  w^+, w^−, ξ ≥ 0.
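This LP can be handed to an off-the-shelf solver. A sketch using scipy.optimize.linprog on hypothetical toy data with one uninformative feature (this stands in for, and is not, the primal-dual interior-point solver used in the talk):

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy data: features 1-2 separate the classes, feature 3 is noise.
X = np.array([[2.0, 2.0, 0.1], [2.5, 1.5, -0.2], [3.0, 2.5, 0.0],
              [-2.0, -2.0, 0.2], [-2.5, -1.5, 0.1], [-3.0, -2.5, -0.1]])
d = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
m, n = X.shape
C = 10.0

# Variable vector z = [w+, w-, b, xi]; objective e^T(w+ + w-) + C e^T xi.
c = np.concatenate([np.ones(2 * n), [0.0], C * np.ones(m)])

# D(X(w+ - w-) + be) + xi >= e, rewritten as A_ub z <= b_ub.
DX = d[:, None] * X
A_ub = -np.hstack([DX, -DX, d[:, None], np.eye(m)])
b_ub = -np.ones(m)
bounds = [(0, None)] * (2 * n) + [(None, None)] + [(0, None)] * m

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
w = res.x[:n] - res.x[n:2 * n]
print(np.round(w, 3))   # the noise feature receives (near-)zero weight
```

The ℓ1 objective drives the weight of the uninformative third feature to zero, which is exactly the sparsity the band-selection method exploits.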

Sparsity of the ℓ1-norm:

[Figure: left, points from classes −1 and +1 in the (x1, x2)-plane with the 2-norm and 1-norm separating hyperplanes; right, the feasible set in the (w1, w2)-plane with the 1-norm and 2-norm loci and the solutions to the 1-norm and 2-norm SVMs.]


Hyperspectral Imagery (HSI)

Hyperspectral sensors generate imagery in the electromagnetic spectrum, capturing aspects that are imperceptible to the human eye.

The radiance of materials is measured within each pixel area at a very large number of contiguous spectral wavelength bands.

Spatial and spectral information is contained in data cubes.

Each pixel is a vector x ∈ R^n.

[Figure: an HSI data cube with axes X (columns of pixels), Y (rows of pixels), and Z (bands); and spectral radiance versus band index for 16 classes: Alfalfa, Corn-notill, Corn-min, Corn, Grass-Pasture, Grass-Trees, Grass-PastureMowed, Hay-windrowed, Oats, Soybeans-notill, Soybeans-min, Soybeans-clean, Wheat, Woods, Bldg-Grass-Trees-Drives, Stone-steel Towers.]


Hyperspectral Imagery (HSI)

Band selection: identify a subset of bands that contains the most discriminatory information, then use those bands for further analysis.

Methods:

1. Filters: all bands → filter → band subset → predictor

2. Wrappers: all bands → space of band subsets → predictor (wrapper) → band subset

3. Embedded algorithms: all bands → predictor → band subset


Band Selection via SSVMs (Collaborators: M. Kirby and C. Gittins)

A linear SSVM is the basic model for band selection. We solve it with a primal-dual interior-point method, which lets us monitor the primal and dual variables simultaneously.

A weight ratio criterion for embedded band selection makes it easy to distinguish the nonzero weights from the zero weights.

The bagging (Bootstrap AGGregatING) approach is employed to enhance the robustness of SSVMs.

We extend the binary band selection to the multiclass case.

The SSVM algorithm is an effective technique for embedded band selection⇒ high accuracies in numerical experiments.
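The bagging step can be sketched as follows, with a simple correlation score standing in for the SSVM weights (the data, score, and threshold are all hypothetical; only the bootstrap-and-aggregate pattern is the point):

```python
import numpy as np

# Bagging for band selection: on each bootstrap resample, "select" the bands
# whose scores are large, then aggregate the selection counts across rounds.
rng = np.random.default_rng(0)
m, n = 60, 8
X = rng.normal(size=(m, n))
d = np.sign(X[:, 2] + 0.1 * rng.normal(size=m))   # only band 2 is informative

counts = np.zeros(n)
for _ in range(25):
    idx = rng.integers(0, m, size=m)              # bootstrap resample
    score = np.abs(X[idx].T @ d[idx]) / m         # hypothetical stand-in for |w|
    counts[score > 0.5 * score.max()] += 1        # bands selected this round

print(counts.argmax())   # band 2 is selected most often
```

Bands that survive most bootstrap rounds are kept, which makes the selection robust to the particular training sample.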


Recall: Sparse Linear SVMs

Training data x_i ∈ R^n with class labels d_i ∈ {−1, +1}, i = 1, …, m; D = diag(d_i) and X is the m × n data matrix. The separating hyperplane is P = {x : w^T x + b = 0}, where w ∈ R^n is normal to P. Points on w^T x + b = ±1 are support vectors. The optimal P has the largest margin 2/‖w‖_1.

SSVM:

    min_{w,b,ξ}  ‖w‖_1 + C e^T ξ
    s.t.  D(Xw + be) + ξ ≥ e,  ξ ≥ 0.

Decision function: f(x) = sgn(w^T x + b)

[Figure: separating hyperplane w^T x + b = 0 with margin planes w^T x + b = ±1, support vectors, and misclassified points for classes +1 and −1.]


Sparsity in w

Comparison of weights for sparse SVM and standard SVM models using two classes of a hyperspectral data set.

[Figure: sparse SVM weights and standard SVM weights plotted against wavelength (µm); the sparse weights vanish at all but a few bands.]

Weight ratio criterion

The resulting weights of the model w_1, w_2, …, w_l are ordered so that

    |w_{i_1}| ≥ |w_{i_2}| ≥ … ≥ |w_{i_l}|.

The key feature of this sparse approach is that

    |w_{i_k}| / |w_{i_{k+1}}| = O(1),

except where the weights transition to zero:

    |w_{i_{k*}}| / |w_{i_{k*}+1}| = O(10^M).
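The criterion itself reduces to scanning consecutive ratios of the sorted absolute weights. A sketch on a hypothetical weight vector:

```python
import numpy as np

# Consecutive ratios of sorted |weights| stay O(1) until the weights
# collapse toward zero; the cutoff k* sits at the largest ratio.
w = np.array([0.9, 0.7, 0.55, 0.4, 3e-7, 1e-7, 5e-8])  # already sorted by |.|
ratios = np.abs(w[:-1]) / np.abs(w[1:])
k_star = ratios.argmax() + 1          # number of weights kept before the drop
print(k_star, ratios.max())
```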
