

Data Analysis Methods and Applications: Hyperspectral Band Selection and Data Classification on Embedded Grassmannians

    Sofya Chepushtanova

    Department of Mathematics Colorado State University

    February 10, 2014


    Outline

1. Introduction
   - Motivation
   - Sparse SVMs

2. Hyperspectral Band Selection
   - Hyperspectral Imagery (HSI)
   - Algorithm
   - Computational Results
   - Future Work

3. Classification of Data on Grassmannians
   - Grassmannian Framework
   - Algorithm
   - Application to HSI
   - Future Work

4. Future Directions


    Motivation

    Application-driven research

The Algorithms for Threat Detection (ATD) program (launched in 2009) develops novel mathematical and statistical methods to extract meaningful information from large data streams.

    Big data: massive, high-dimensional, complex

    Growing demand for geometric data analysis, classification, and dimension reduction models

Dimension reduction - how?
- Feature extraction: transforms the data to a lower-dimensional space, e.g., using manifold learning techniques.
- Feature selection: identifies a relevant subset of the features while maintaining or improving the performance of a prediction model.


    Support Vector Machines

Training data $x_i \in \mathbb{R}^n$ with class labels $d_i \in \{-1,+1\}$, $i = 1, \dots, m$; $D = \mathrm{diag}(d_i)$ and $X$ is the $m \times n$ data matrix. The separating hyperplane is $P = \{x : w^T x + b = 0\}$, where $w \in \mathbb{R}^n$ is normal to $P$. Points on $w^T x + b = \pm 1$ are support vectors. The optimal $P$ has the largest margin $2/\|w\|_2$.

SVM:
$$\min_{w,b,\xi} \ \frac{\|w\|_2^2}{2} + C e^T \xi \quad \text{s.t.} \quad D(Xw + be) + \xi \ge e, \ \xi \ge 0.$$

Decision function: $f(x) = \mathrm{sgn}(w^T x + b)$.

[Figure: two classes separated by the optimal separating hyperplane $w^T x + b = 0$, with margin planes $w^T x + b = \pm 1$, the normal $w$, support vectors, and misclassified points.]
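As a concrete illustration, here is a minimal sketch (my own toy data and $C$ value, not the talk's implementation) of fitting this soft-margin linear SVM with scikit-learn and reading off $w$, $b$, and the decision function:

```python
# Minimal soft-margin linear SVM sketch (toy data; not from the talk).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# m = 40 training points x_i in R^2 with labels d_i in {-1, +1}.
X = np.vstack([rng.normal(-1.0, 0.5, (20, 2)), rng.normal(1.0, 0.5, (20, 2))])
d = np.array([-1] * 20 + [1] * 20)

clf = SVC(kernel="linear", C=1.0).fit(X, d)  # C weighs the slack term C e^T xi
w = clf.coef_.ravel()                        # normal vector w of the hyperplane
b = clf.intercept_[0]                        # offset b

f = np.sign(X @ w + b)                       # decision function f(x) = sgn(w^T x + b)
print("margin 2/||w||_2 =", 2 / np.linalg.norm(w))
print("support vectors:", clf.support_.size, "| training accuracy:", (f == d).mean())
```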


    Nonlinear SVM: Kernel Trick

$\Phi : x \in \mathbb{R}^N \mapsto \Phi(x) \in \mathbb{R}^{N'}$, $N' > N$. Kernel function: $K_{ij} = K(x_i, x_j) = \Phi(x_i)^T \Phi(x_j)$.

[Figure: the map $\Phi$ takes the input space to a higher-dimensional feature space.]

The decision function is $f(x) = \mathrm{sgn}\left(\sum_{i=1}^{m} \alpha_i d_i K(x_i, x) + b\right)$. Examples: RBF kernel $K(x_i, x) = \exp(-\gamma \|x_i - x\|^2)$; polynomial kernel $K(x_i, x) = (x_i^T x + 1)^n$.
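A minimal sketch of the RBF-kernel case (again my own toy example, with arbitrary $\gamma$ and $C$); scikit-learn stores the products $\alpha_i d_i$ for the support vectors, matching the expansion above:

```python
# RBF-kernel SVM sketch on data that is not linearly separable in input space.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(0.0, 1.0, (200, 2))
d = np.where(np.linalg.norm(X, axis=1) > 1.0, 1, -1)  # class = inside/outside a circle

clf = SVC(kernel="rbf", gamma=1.0, C=1.0).fit(X, d)   # K(x_i, x) = exp(-gamma ||x_i - x||^2)

# clf.dual_coef_ holds alpha_i * d_i for the support vectors, so the fitted
# decision function is exactly sgn(sum_i alpha_i d_i K(x_i, x) + b).
print("support vectors:", clf.support_.size)
print("training accuracy:", clf.score(X, d))
```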


    Arbitrary-Norm Separating Hyperplane

Dual norm

For a norm $\|\cdot\|$ on $\mathbb{R}^n$, the dual norm is $\|x\|' := \max_{\|y\| = 1} x^T y$.

Example: for $p, q \in [1, \infty]$ with $1/p + 1/q = 1$, the $p$-norm and $q$-norm are dual.

Theorem (Mangasarian, 1998)

Let $q \in \mathbb{R}^n$ be any point not on the plane $P := \{x \mid w^T x + b = 0\}$, $0 \ne w \in \mathbb{R}^n$, $b \in \mathbb{R}$, and let $p(q)$ be the point of $P$ closest to $q$. Then the distance from $q$ to the plane is
$$\|q - p(q)\| = \frac{|w^T q + b|}{\|w\|'}.$$
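A numeric sanity check of the theorem for the dual pair $(\infty, 1)$; this projection LP is my own sketch, not part of the talk:

```python
# Check ||q - p(q)||_inf = |w^T q + b| / ||w||_1 by solving the projection LP:
# minimize t subject to |q_i - x_i| <= t and w^T x + b = 0, variables z = [x, t].
import numpy as np
from scipy.optimize import linprog

n = 3
w, b = np.array([2.0, -1.0, 0.5]), 0.7
q = np.array([1.0, 2.0, -1.0])

c = np.r_[np.zeros(n), 1.0]                        # objective: t
A_ub = np.block([[-np.eye(n), -np.ones((n, 1))],   # q_i - x_i <= t
                 [np.eye(n), -np.ones((n, 1))]])   # x_i - q_i <= t
b_ub = np.r_[-q, q]
res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              A_eq=np.r_[w, 0.0].reshape(1, -1), b_eq=[-b],   # w^T x = -b
              bounds=[(None, None)] * n + [(0, None)])

print("LP distance: ", res.fun)                              # ||q - p(q)||_inf
print("closed form: ", abs(w @ q + b) / np.linalg.norm(w, 1))
```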


    Sparse SVMs

Corollary

$$\|q - p(q)\|_\infty = \frac{|w^T q + b|}{\|w\|_1}$$

(where $\|x\|_1 = \sum_{i=1}^{n} |x_i|$ and $\|x\|_\infty = \max_i |x_i|$).

If the $\ell_\infty$-norm is used to measure the distance between the planes, then the margin is given by $2/\|w\|_1$, which yields the following sparse SVM (SSVM):

$$\min_{w,b,\xi} \ \|w\|_1 + C e^T \xi \quad \text{s.t.} \quad D(Xw + be) + \xi \ge e, \ \xi \ge 0.$$


    Sparse SVMs

SSVM $\Rightarrow$ LP. Writing $w = w^+ - w^-$ with $w^+, w^- \ge 0$, so that $\|w\|_1 = e^T(w^+ + w^-)$ at optimality:

$$\min_{w^+, w^-, b, \xi} \ e^T(w^+ + w^-) + C e^T \xi \quad \text{s.t.} \quad D(X(w^+ - w^-) + be) + \xi \ge e, \ w^+, w^-, \xi \ge 0.$$

Sparsity of the $\ell_1$-norm:

[Figure: left, a two-class data set in the $(x_1, x_2)$ plane with the 2-norm and 1-norm separating hyperplanes; right, in the $(w_1, w_2)$ plane, the feasible set, the 1-norm and 2-norm loci, and the solutions to the 1-norm and 2-norm SVMs.]
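For illustration, here is a minimal sketch of solving this LP with scipy's `linprog` (the talk solves it with a primal-dual interior point method; this toy solver and data are my own):

```python
# Solve the SSVM LP: min e^T(w+ + w-) + C e^T xi
# s.t. D(X(w+ - w-) + b e) + xi >= e,  w+, w-, xi >= 0,  b free.
import numpy as np
from scipy.optimize import linprog

def ssvm_fit(X, d, C=1.0):
    """Return (w, b) from the SSVM LP; variables z = [w+, w-, b, xi]."""
    m, n = X.shape
    D = np.diag(d.astype(float))
    c = np.r_[np.ones(2 * n), 0.0, C * np.ones(m)]
    # Flip the >= constraint to <= for linprog:
    # -D X w+ + D X w- - (D e) b - xi <= -e
    A_ub = np.hstack([-D @ X, D @ X, -(D @ np.ones((m, 1))), -np.eye(m)])
    res = linprog(c, A_ub=A_ub, b_ub=-np.ones(m),
                  bounds=[(0, None)] * (2 * n) + [(None, None)] + [(0, None)] * m)
    return res.x[:n] - res.x[n:2 * n], res.x[2 * n]   # w = w+ - w-, b

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-1, 0.5, (20, 5)), rng.normal(1, 0.5, (20, 5))])
d = np.array([-1] * 20 + [1] * 20)
w, b = ssvm_fit(X, d)
print("w =", np.round(w, 3))   # typically only a few nonzero entries
```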


    Hyperspectral Imagery (HSI)

Hyperspectral sensors generate imagery across the electromagnetic spectrum, capturing information that is imperceptible to the human eye.

The radiance of materials is measured within each pixel area at a very large number of contiguous spectral wavelength bands.

Spatial and spectral information is contained in data cubes.

Each pixel is a vector $x \in \mathbb{R}^n$.

[Figure: left, the HSI data cube with axes X (columns of pixels), Y (rows of pixels), and Z (bands); right, spectral radiance vs. band index for 16 ground-cover classes (Alfalfa, Corn-notill, Corn-min, Corn, Grass-Pasture, Grass-Trees, Grass-Pasture-Mowed, Hay-windrowed, Oats, Soybeans-notill, Soybeans-min, Soybeans-clean, Wheat, Woods, Bldg-Grass-Trees-Drives, Stone-Steel Towers).]
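A minimal sketch (shapes chosen only for illustration) of unfolding the data cube so that each pixel becomes a spectral vector $x \in \mathbb{R}^n$:

```python
# Unfold an HSI cube (rows x cols x bands) into a pixel matrix (rows*cols x bands).
import numpy as np

rows, cols, bands = 145, 145, 220         # e.g., a 145 x 145 scene with 220 bands
cube = np.random.rand(rows, cols, bands)  # stand-in for a real radiance cube

pixels = cube.reshape(-1, bands)          # one row per pixel, one column per band
x = pixels[0]                             # the spectrum of a single pixel, x in R^n
print(pixels.shape, x.shape)              # (21025, 220) (220,)
```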


    Hyperspectral Imagery (HSI)

Advantage: rich, detailed radiance information.
Disadvantage: huge amount of data (more is not always better).

Band selection: identify a subset of bands that contains the most discriminatory information, then use those bands for further analysis.

Methods
1. Filters: all bands → filter → band subset → predictor (a toy example follows below)
2. Wrappers: all bands → space of band subsets → predictor (wrapper) → band subset
3. Embedded algorithms: all bands → predictor → band subset
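A toy filter, for contrast with the embedded approach developed next (this scoring rule and data are illustrative assumptions, not from the talk):

```python
# Filter-style band selection: score each band independently by a Fisher-type
# ratio for two classes, then keep the top-k bands before any predictor runs.
import numpy as np

def fisher_ratio_filter(pixels, labels, k):
    """Rank bands by (mean gap)^2 / (sum of variances); return top-k band indices."""
    a, b = pixels[labels == 1], pixels[labels == -1]
    score = (a.mean(0) - b.mean(0)) ** 2 / (a.var(0) + b.var(0) + 1e-12)
    return np.argsort(score)[::-1][:k]

rng = np.random.default_rng(3)
pixels = rng.normal(0, 1, (100, 220))          # 100 pixels, 220 bands
labels = np.array([1] * 50 + [-1] * 50)
pixels[labels == 1, 10] += 2.0                 # plant a discriminative band
print(fisher_ratio_filter(pixels, labels, 5))  # band 10 should rank first
```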


    Band Selection via SSVMs (Collaborators: M. Kirby and C. Gittins)

A linear SSVM is the basic model for band selection. We solve it by a primal-dual interior point method, which allows one to monitor the variation of the primal and dual variables simultaneously.

A weight ratio criterion for embedded band selection makes it easy to distinguish the nonzero weights from the zero weights.

The bagging (Bootstrap AGGregatING) approach is employed to enhance the robustness of SSVMs (a sketch follows below).

We extend binary band selection to the multiclass case.

The SSVM algorithm is an effective technique for embedded band selection, yielding high accuracies in numerical experiments.
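A minimal sketch of the bagging step as I read it (the resampling scheme and bag count are assumptions): fit the SSVM on bootstrap resamples and average the magnitude profiles $|w|$:

```python
# Bagged SSVM band weights: average |w| over bootstrap resamples to stabilize
# which bands receive nonzero weight. `fit` is an SSVM solver such as the
# ssvm_fit sketch shown earlier.
import numpy as np

def bagged_band_weights(X, d, fit, n_bags=25, seed=0):
    rng = np.random.default_rng(seed)
    m, n = X.shape
    weights = np.zeros(n)
    for _ in range(n_bags):
        idx = rng.choice(m, size=m, replace=True)  # bootstrap resample of pixels
        if np.unique(d[idx]).size < 2:             # skip one-class resamples
            continue
        w, _ = fit(X[idx], d[idx])
        weights += np.abs(w)
    return weights / n_bags                        # robust band-weight profile
```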


    Recall: Sparse Linear SVMs

Training data $x_i \in \mathbb{R}^n$ with class labels $d_i \in \{-1,+1\}$, $i = 1, \dots, m$; $D = \mathrm{diag}(d_i)$ and $X$ is the $m \times n$ data matrix. The separating hyperplane is $P = \{x : w^T x + b = 0\}$, where $w \in \mathbb{R}^n$ is normal to $P$. Points on $w^T x + b = \pm 1$ are support vectors. The optimal $P$ has the largest margin $2/\|w\|_1$.

SSVM:
$$\min_{w,b,\xi} \ \|w\|_1 + C e^T \xi \quad \text{s.t.} \quad D(Xw + be) + \xi \ge e, \ \xi \ge 0.$$

Decision function: $f(x) = \mathrm{sgn}(w^T x + b)$.

[Figure: as before, the optimal separating hyperplane with margin planes, support vectors, and misclassified points.]


    Sparsity in w

    Comparison of weights for sparse SVM and standard SVM models using two classes of a hyperspectral data set.

[Figure: SVM weights (on the order of $10^{-3}$) vs. wavelength (µm) for the two models: the sparse SVM concentrates its weight on a few bands, while the standard SVM spreads nonzero weights across the spectrum.]

Weight ratio criterion

The resulting weights of the model $w_1, w_2, \dots, w_l$ are ordered so that
$$|w_{i_1}| \ge |w_{i_2}| \ge \cdots \ge |w_{i_l}|.$$
The key feature of this sparse approach is that
$$\frac{|w_{i_k}|}{|w_{i_{k+1}}|} = O(1),$$
save for where the weights transition to zero:
$$\frac{|w_{i_{k^*}}|}{|w_{i_{k^*+1}}|} = O(10^M).$$
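A minimal sketch of applying the criterion (the concrete jump threshold standing in for $O(10^M)$ is an assumption):

```python
# Weight ratio cutoff: sort |w| in decreasing order and cut where consecutive
# ratios jump from O(1) to orders of magnitude, i.e. at the index k*.
import numpy as np

def weight_ratio_cutoff(w, jump=1e3):
    """Return indices of bands kept: those before the first O(10^M) ratio jump."""
    order = np.argsort(np.abs(w))[::-1]          # |w_{i_1}| >= |w_{i_2}| >= ...
    mags = np.abs(w[order])
    ratios = mags[:-1] / (mags[1:] + 1e-300)     # consecutive ratios, O(1) until k*
    k_star = int(np.argmax(ratios >= jump)) + 1  # first big jump marks the cutoff
    return order[:k_star]

w = np.array([0.9, 1e-9, 0.0, -1.2, 2e-10, 0.4])  # toy weights: three "real" bands
print(weight_ratio_cutoff(w))                     # -> [3 0 5]
```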
