Hyperspectral Image Reduction

EDST - UL
Ecole Doctorale des Sciences et de Technologie, Université Libanaise

“Band Selection for Dimension Reduction in Hyper Spectral Image Using Integrated Information Gain and Principal Components Analysis Technique”

Submitted to Dr. Jihane KHODER, January 30th, 2016

Sarah Hussein, Master TIS | TIS04 Course

Summary of the Article: Kitti Koonsanit, Chuleerat Jaruskulchai, Apisit Eiumnoh - Vol. 2

OUTLINE

1.  INTRODUCTION
2.  METHODOLOGIES
3.  RESULTS
4.  CONCLUSIONS

1. INTRODUCTION

- Context
- Objectives

INTRODUCTION: Context

- Some applications require fast data processing → satellite applications
- Dimension reduction of hyperspectral remote sensing data is needed in such applications
- Principal Component Analysis (PCA) is the most popular dimension reduction technique for remotely sensed data
- However, data volumes are increasing and PCA is computationally demanding
- A fast and efficient algorithm for PCA is therefore needed

INTRODUCTION: Objectives

“An Implementation of Information Gain with PCA dimension reduction of hyper spectral data”

&

“Comparison of the effects of integrated IG and PCA method of band selection on the final clustering results for hyper spectral imaging applications”

2. METHODOLOGIES

- Materials
- Dimensionality Reduction
- Background on PCA
- Information Gain (IG)
- PCA-IG

METHODOLOGIES: Materials

- Hyperspectral data were obtained from the Small Multi-Mission Satellite (SMMS)
- Focus on data taken in June 2010 over Amnat Charoen province, Thailand
  - 200 x 200 pixels
  - 115 bands
  - Total size of 8.86 MB
- Unsupervised classification using the simple K-means method from the Weka software package

METHODOLOGIES: Dimensionality Reduction

- Hyperspectral images provide abundant information spread over many bands
- Their high dimensionality substantially increases the computational burden
- Reducing the redundancy of the spectral and spatial information without losing valuable detail is crucial
- Conventional processing methods therefore require dimension reduction
- Dimension reduction is a transformation from a high-order dimension to a low-order dimension that eliminates data redundancy
- Principal Component Analysis (PCA) is the most popular dimension reduction technique for remotely sensed data
- However, data volumes are increasing and PCA is computationally demanding → a fast and efficient algorithm for PCA is needed

METHODOLOGIES: Dimensionality Reduction (Cont.)

- The collected hyperspectral image data form a three-dimensional image cube
  - Two spatial dimensions (horizontal and vertical)
  - One spectral dimension (SMMS band 1 to band 115)
- Reducing the dimensionality is necessary to make the subsequent processing steps practical
- The PCA reduction technique is applied
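
As a rough illustration of the image cube described above, the sketch below flattens a 200 x 200 x 115 cube into a pixels-by-bands matrix, the layout that PCA and clustering routines usually expect. The array holds random placeholder values rather than SMMS data, and the names `cube` and `cube_to_matrix` are illustrative assumptions.

```python
import numpy as np

# Placeholder cube with the dimensions quoted in the slides:
# 200 rows x 200 columns x 115 spectral bands (random values, not SMMS data).
rng = np.random.default_rng(0)
cube = rng.random((200, 200, 115))

def cube_to_matrix(cube):
    """Flatten the two spatial axes so that each row is one pixel's spectrum."""
    rows, cols, bands = cube.shape
    return cube.reshape(rows * cols, bands)

pixels = cube_to_matrix(cube)
print(pixels.shape)  # (40000, 115)
```

Each of the 40000 rows is then one observation with 115 band attributes.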

METHODOLOGIES: Background on PCA

- PCA is a widely used dimension reduction technique in data analysis
- It is the optimal linear scheme, in the mean-squared-error sense, for reducing a set of high-dimensional vectors to a set of lower-dimensional vectors
- Two methods are applicable in PCA: the matrix method and the data method
- To compute PCA, four general steps are followed:
  1.  Find the mean vector in x-space
  2.  Assemble the covariance matrix in x-space
  3.  Compute the eigenvalues and corresponding eigenvectors
  4.  Form the components in y-space
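
The four steps can be sketched with NumPy as follows. This is a minimal illustration on synthetic data, not the implementation used in the article. The function name `pca` and the test matrix are assumptions made for the example.

```python
import numpy as np

def pca(X, n_components):
    """PCA by eigendecomposition of the covariance matrix.

    X has shape (n_samples, n_bands).
    Returns the projected data, the sorted eigenvalues and the eigenvectors.
    """
    # Step 1: mean vector in x-space
    mean = X.mean(axis=0)
    # Step 2: covariance matrix in x-space (bands x bands)
    cov = np.cov(X - mean, rowvar=False)
    # Step 3: eigenvalues and eigenvectors, sorted by decreasing eigenvalue
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Step 4: project the centered data to form the components in y-space
    Y = (X - mean) @ eigvecs[:, :n_components]
    return Y, eigvals, eigvecs

# Synthetic data: 500 samples, 6 "bands", two of them nearly redundant.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
X[:, 1] = 2.0 * X[:, 0] + 0.01 * rng.normal(size=500)

Y, eigvals, _ = pca(X, n_components=3)
print(Y.shape)  # (500, 3)
```

The projected components are uncorrelated, so their covariance matrix is diagonal with the leading eigenvalues on the diagonal.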

METHODOLOGIES: Background on PCA (Cont.)

- Only the first few components contain the needed information
- Intrinsic dimensionality is the number of components that carry most of the information
- Each image may have a different intrinsic dimensionality
- PCA maximizes the variance captured by the leading components and removes redundancy (correlation) between them to achieve lower dimensionality
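
One common heuristic for estimating the intrinsic dimensionality is to keep the smallest number of components whose eigenvalues account for a chosen share of the total variance. The sketch below assumes that heuristic. The slides do not state a threshold, and the eigenvalues are hypothetical.

```python
import numpy as np

def intrinsic_dimensionality(eigenvalues, threshold=0.9):
    """Smallest number of leading components whose eigenvalues sum to at
    least `threshold` of the total variance (the threshold is an assumed choice)."""
    vals = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    cumulative = np.cumsum(vals) / vals.sum()
    return int(np.searchsorted(cumulative, threshold) + 1)

# Hypothetical eigenvalue spectrum: two dominant components, then noise.
eigenvalues = [60.0, 35.0, 3.0, 1.0, 1.0]
print(intrinsic_dimensionality(eigenvalues, threshold=0.9))  # 2
```

A different image, with a different eigenvalue spectrum, would generally yield a different count for the same threshold.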

METHODOLOGIES: Information Gain

- Information Gain (IG) is a measure of dependence between a feature and the class label
- It is one of the most popular feature selection techniques, as it is easy to compute and simple to interpret
- The information gain of a feature (band) X with respect to the class labels Y is calculated as
  IG(X, Y) = H(X) - H(X|Y)
  - Entropy (H): the uncertainty associated with a random variable
  - H(X): entropy of band X
  - H(X|Y): entropy of band X after observing class Y

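METHODOLOGIES: Information Gain (Cont.)

- H(X) and H(X|Y) are calculated through the equations:
  $H(X) = -\sum_i P(x_i) \log_2 P(x_i)$
  $H(X \mid Y) = -\sum_j P(y_j) \sum_i P(x_i \mid y_j) \log_2 P(x_i \mid y_j)$
- The maximum value of information gain is 1 (for a normalized measure; in general IG(X, Y) is bounded by H(X))
- A feature with a high information gain is relevant
- Information gain is evaluated independently for each feature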
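
A minimal sketch of the computation, assuming the standard definition IG(X, Y) = H(X) - H(X|Y) with base-2 entropy and band values that have already been discretized. The toy arrays and function names are illustrative only.

```python
import numpy as np

def entropy(values):
    """Shannon entropy (base 2) of a discrete sequence."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(x, y):
    """IG(X, Y) = H(X) - H(X|Y) for discrete band values x and class labels y."""
    x, y = np.asarray(x), np.asarray(y)
    h_cond = 0.0
    for label in np.unique(y):
        mask = y == label
        # Weight the entropy of x within this class by the class probability.
        h_cond += mask.mean() * entropy(x[mask])
    return entropy(x) - h_cond

# Toy example with two-level band values (an assumption; real band
# intensities would need to be binned first).
y = np.array([0, 0, 1, 1])
informative = np.array([0, 0, 1, 1])    # fully determined by the class
uninformative = np.array([0, 1, 0, 1])  # independent of the class
print(information_gain(informative, y))    # 1.0
print(information_gain(uninformative, y))  # 0.0
```

Because each band is scored on its own, the function can be applied column by column to rank all bands.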

METHODOLOGIES: Information Gain (Cont.)

- Information gain does not eliminate redundant features
- Bands selected by IG alone are therefore suspected to be highly redundant: much data but not much information
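
The redundancy problem can be seen in a toy case: if one band is an exact copy of another, both receive the same IG score, so a top-k selection by IG keeps both. The sketch below assumes discretized bands and the standard definition IG(X, Y) = H(X) - H(X|Y). The data are hypothetical.

```python
import numpy as np

def entropy(values):
    """Shannon entropy (base 2) of a discrete sequence."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(x, y):
    """IG(X, Y) = H(X) - H(X|Y) for discrete x and class labels y."""
    h_cond = sum((y == c).mean() * entropy(x[y == c]) for c in np.unique(y))
    return entropy(x) - h_cond

# Hypothetical discretized bands (columns); band 1 is an exact copy of band 0.
y = np.array([0, 0, 0, 1, 1, 1])
bands = np.array([
    [0, 0, 0],
    [0, 0, 1],
    [0, 0, 0],
    [1, 1, 1],
    [1, 1, 0],
    [1, 1, 1],
])
scores = [information_gain(bands[:, j], y) for j in range(bands.shape[1])]
top2 = sorted(np.argsort(scores)[::-1][:2].tolist())
print(top2)  # [0, 1]: both copies are kept, although the second adds nothing
```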

METHODOLOGIES: PCA-IG

- IG integrated with PCA (PCA-IG) is proposed to transform the data into a reduced set of features → feature extraction
- The optimal bands are those that maximally preserve the features separating different object classes
- PCA alone does not guarantee that classification information among different classes is preserved
- The IG value preserves the features that separate different object classes

METHODOLOGIES: PCA-IG (Cont.)

- At the band selection stage, the PCA and IG methods are integrated as follows:
- Band X is selected if X is among the bands chosen by PCA AND X is among the bands chosen by IG
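
Read literally, the selection rule is a set intersection. The sketch below shows only that step. The band indices are hypothetical, and how each method produces its own candidate list is not specified here.

```python
def select_bands(pca_bands, ig_bands):
    """Keep band X only if it is a member of both the PCA and the IG selections."""
    return sorted(set(pca_bands) & set(ig_bands))

# Hypothetical band indices chosen by each method.
pca_bands = [3, 7, 12, 25, 40, 58]
ig_bands = [7, 12, 13, 40, 77]
print(select_bands(pca_bands, ig_bands))  # [7, 12, 40]
```

The result can never be larger than the smaller of the two input lists, so the candidate lists may need to be longer than the final number of bands wanted.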

METHODOLOGIES: PCA-IG (Cont.)

[Flowchart: Original Satellite Image → Spectral Attributes → PCA Method and IG Method (in parallel) → (PCA of Band) AND (IG of Band) → Band Selection]

3. RESULTS

- Experiment 1
- Experiment 2

RESULTS: Experiment 1

[Figure panels: original image, 115 bands; 10 bands selected by PCA; 10 bands selected by PCA-IG]

RESULTS: Experiment 1 (Cont.)

Clustering percentages per cluster for Experiment 1 (4 clusters):

               Original image   Reduced by PCA   Reduced by PCA-IG
               (115 bands)      (10 bands)       (10 bands)
Cluster 1      34%              35%              34%
Cluster 2      3%               2%               3%
Cluster 3      30%              31%              30%
Cluster 4      31%              32%              31%
All clusters   100%             100%             100%
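
Percentages of this kind can be derived from the cluster label assigned to each pixel. The sketch below assumes integer labels starting at 0. The label array is hypothetical and not taken from the experiment.

```python
import numpy as np

def cluster_percentages(labels, n_clusters):
    """Share of pixels assigned to each cluster, in percent."""
    counts = np.bincount(labels, minlength=n_clusters)
    return 100.0 * counts / counts.sum()

# Hypothetical cluster labels for 10 pixels.
labels = np.array([0, 0, 0, 0, 1, 2, 2, 2, 3, 3])
print(cluster_percentages(labels, n_clusters=4))  # [40. 10. 30. 20.]
```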

RESULTS: Experiment 2

- Band selection was applied to the Statlog (Landsat Satellite) data set from the UCI databases
- The database consists of:
  - Multi-spectral values of pixels in 3x3 neighborhoods in a satellite image
  - The classification associated with the central pixel in each neighborhood
- A frame of Landsat MSS imagery consists of 4 digital images of the same scene:
  - Two in the visible region (corresponding to the green and red regions)
  - Two in the near infra-red
- Each pixel is an 8-bit binary word
  - 0 corresponds to black
  - 255 corresponds to white

RESULTS: Experiment 2 (Cont.)

- The spatial resolution of a pixel is about 80 m x 80 m
- Each image contains 2340 x 3380 such pixels
- The data set contains 6435 instances
- Each instance consists of 36 band attributes (4 spectral bands x 9 pixels in the 3x3 neighborhood)
- The proposed process was implemented in a Java environment
- Tested on a 2.80 GHz Intel(R) Core 2 Duo processor with 1 GB of RAM
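
The 36 attributes of one instance can be viewed as a small 3 x 3 x 4 cube (9 neighborhood pixels, 4 spectral bands each). The sketch below assumes the attribute ordering given in the data set's description, with pixels in row-major order and the 4 band values of each pixel adjacent. The instance holds placeholder values 1 to 36 so that positions are easy to follow.

```python
import numpy as np

# Placeholder instance: values 1..36 stand in for real intensities.
instance = np.arange(1, 37)

# Assumed layout: 3x3 neighborhood in row-major order, 4 band values per pixel.
neighborhood = instance.reshape(3, 3, 4)

# The class label of an instance refers to the central pixel.
central_pixel = neighborhood[1, 1]
print(central_pixel)  # [17 18 19 20]
```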

RESULTS: Experiment 2 (Cont.)

Clustering percentages per cluster for Experiment 2 (7 clusters):

               Original image   Reduced by PCA   Reduced by PCA-IG
               (36 bands)       (7 bands)        (7 bands)
Cluster 1      8%               10%              9%
Cluster 2      20%              22%              21%
Cluster 3      14%              12%              12%
Cluster 4      9%               9%               9%
Cluster 5      20%              23%              23%
Cluster 6      16%              10%              14%
Cluster 7      12%              14%              12%
All clusters   100%             100%             100%
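
One simple way to compare the three columns is the largest per-cluster gap in percentage points. The sketch below uses the Experiment 2 figures as reported and assumes that clusters are matched across columns in the order shown. The metric itself is an illustrative choice, not one named in the slides.

```python
import numpy as np

# Percentages copied from the Experiment 2 table (clusters 1 to 7).
original = np.array([8, 20, 14, 9, 20, 16, 12])
pca = np.array([10, 22, 12, 9, 23, 10, 14])
pca_ig = np.array([9, 21, 12, 9, 23, 14, 12])

def max_abs_difference(a, b):
    """Largest per-cluster gap, in percentage points."""
    return int(np.abs(a - b).max())

print(max_abs_difference(original, pca))     # 6
print(max_abs_difference(original, pca_ig))  # 3
```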

4. CONCLUSIONS

- Brief Summary
- Perspectives

CONCLUSIONS: Brief Summary & Perspectives

- A band selection technique using principal component analysis (PCA) and information gain (IG) was proposed for hyperspectral image reduction
- The process was tested on satellite image data for unsupervised classification
- Comparing the clustering results for hyperspectral imaging applications shows no significant difference between the clusters obtained with PCA-IG, those obtained with PCA, and those of the original image
- The outcome of this research will be used in further steps toward analysis tools for hyperspectral image processing

THANK YOU
