Data Visualization and Feature Selection: New Algorithms for Nongaussian Data Howard Hua Yang and...

Post on 23-Dec-2015


Data Visualization and Feature Selection: New Algorithms for Nongaussian Data

Howard Hua Yang and John Moody

NIPS'99

Contents

Data visualization: good 2-D projections for high-dimensional data interpretation

Feature selection: eliminate redundancy

Joint mutual information

ICA

Introduction

Visualization of input data and feature selection are intimately related.

Input variable selection is the most important step in the model selection process.

Model-independent approaches select input variables before model specification.

Data visualization is very important for humans to understand the structural relations among variables in a system.

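Joint mutual information for input/feature selection

Mutual information

$$I(X_i; Y) = K\big(p(x_i, y) \,\|\, p(x_i)\,p(y)\big)$$

Kullback-Leibler divergence

$$K\big(p(x) \,\|\, q(x)\big) = \sum_{x} p(x) \log \frac{p(x)}{q(x)}$$

Joint mutual information

$$I(X_{i_1}, \ldots, X_{i_k}; Y) = K\big(p(x_{i_1}, \ldots, x_{i_k}, y) \,\|\, p(x_{i_1}, \ldots, x_{i_k})\,p(y)\big)$$

Adding an input never decreases the joint mutual information:

$$I(X_1, \ldots, X_{n-1}, X_n; Y) - I(X_1, \ldots, X_{n-1}; Y) = I(X_n; Y \mid X_1, \ldots, X_{n-1}) \ge 0$$

Conditional MI

$$I(X_i; Y \mid X_j, X_k) = \sum_{x_j, x_k} p(x_j, x_k)\, K\big(p(x_i, y \mid x_j, x_k) \,\|\, p(x_i \mid x_j, x_k)\,p(y \mid x_j, x_k)\big)$$

When

$$I(X_1; Y) = I(X_2; Y) = I(X_3; Y)$$

the single-input mutual information cannot rank the inputs, but the joint mutual information can:

$$I(X_1, X_2; Y) - I(X_1, X_3; Y) = I(X_2; Y \mid X_1) - I(X_3; Y \mid X_1)$$

Use the joint mutual information $I(X_i, X_j; Y)$ instead of the mutual information $I(X_i; Y)$ to select inputs for a neural network classifier and for data visualization.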
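As a rough illustration of the definitions above, the following Python sketch computes plug-in (frequency-count) estimates of the mutual information and the joint mutual information for discrete samples. The toy data, the variable names, and the function `mutual_information` are assumptions made for this example and do not come from the slides; the slides do not say how the probabilities are estimated.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples.

    Elements of xs may be tuples, in which case the result is the joint
    mutual information I(X_i, X_j, ...; Y).
    """
    n = len(xs)
    pxy = Counter(zip(xs, ys))  # joint counts
    px = Counter(xs)            # marginal counts of X
    py = Counter(ys)            # marginal counts of Y
    # sum over (x, y) of p(x, y) * log2( p(x, y) / (p(x) p(y)) )
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


# Hypothetical toy data: y encodes (x1, x2); x3 is an exact copy of x1.
x1 = [0, 0, 1, 1]
x2 = [0, 1, 0, 1]
x3 = list(x1)
y = [2 * a + b for a, b in zip(x1, x2)]

single = [mutual_information(x, y) for x in (x1, x2, x3)]
jmi_12 = mutual_information(list(zip(x1, x2)), y)
jmi_13 = mutual_information(list(zip(x1, x3)), y)

print(single)          # [1.0, 1.0, 1.0]
print(jmi_12, jmi_13)  # 2.0 1.0
```

All three inputs carry the same single-input mutual information, so they cannot be ranked individually, yet the pair (x1, x2) has twice the joint mutual information of the redundant pair (x1, x3).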

Data visualization methods

Supervised methods based on JMI (cf. CCA)

Unsupervised methods based on ICA (cf. PCA)

Efficient method for JMI

$$\arg\max_{(i, j)} I(X_i, X_j; Y)$$

$$I(X_i, X_j; Y) = I(X_i; Y) + I(X_j; Y \mid X_i)$$
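The pair search and the decomposition above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: it uses plug-in estimates on discrete data and an exhaustive search over pairs, and the helper names `mutual_information`, `conditional_mi`, and `best_pair`, as well as the toy data, are hypothetical. The slides do not spell out the steps of their efficient method, so this should not be read as that method.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits; xs may hold tuples (joint MI)."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def conditional_mi(xs, ys, zs):
    """Plug-in estimate of I(X; Y | Z): MI within each Z = z group, weighted by p(z)."""
    n = len(zs)
    groups = {}
    for x, y, z in zip(xs, ys, zs):
        gx, gy = groups.setdefault(z, ([], []))
        gx.append(x)
        gy.append(y)
    return sum(len(gx) / n * mutual_information(gx, gy) for gx, gy in groups.values())


def best_pair(columns, ys):
    """Exhaustive search for the pair (i, j) with the largest I(X_i, X_j; Y).

    Ties go to the first maximal pair in iteration order.
    """
    scores = {
        (i, j): mutual_information(list(zip(columns[i], columns[j])), ys)
        for i, j in combinations(range(len(columns)), 2)
    }
    return max(scores, key=scores.get), scores


# Hypothetical toy data: column 1 duplicates column 0; column 2 is complementary.
a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
columns = [a, list(a), b]
y = [2 * u + v for u, v in zip(a, b)]

best, scores = best_pair(columns, y)
print(best, scores[best])

# Check of the decomposition I(X_i, X_j; Y) = I(X_i; Y) + I(X_j; Y | X_i) for (i, j) = (0, 2).
lhs = scores[(0, 2)]
rhs = mutual_information(a, y) + conditional_mi(b, y, a)
print(lhs, rhs)
```

The decomposition holds exactly for plug-in estimates because the empirical distribution is itself a valid probability distribution. With real, finite samples the estimates are biased, so the exhaustive search here is only meant to show the selection criterion.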

Application to Signal Visualization and Classification

JMI and visualization of radar pulse patterns

Radar pattern: 15-dimensional vector, 3 classes

Compute JMIs, select inputs

Radar pulse classification

7 hidden units

Experiments

All inputs vs. 4 selected inputs

4 inputs with the largest JMI vs. 4 randomly selected inputs

Conclusions

Advantage of single JMI

Can distinguish inputs when all of them have the same mutual information $I(X_i; Y)$

Can eliminate the redundancy in the inputs when one input is a function of other inputs
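The redundancy claim can be checked with a short worked derivation. Assume one input is a deterministic function of another, $X_j = f(X_i)$; the symbol $f$ is introduced here only for illustration. Once $X_i$ is known, $X_j$ carries no remaining uncertainty, so its conditional entropy, and with it the conditional mutual information, vanishes:

$$0 \le I(X_j; Y \mid X_i) \le H(X_j \mid X_i) = 0$$

Substituting into the decomposition of the joint mutual information gives

$$I(X_i, X_j; Y) = I(X_i; Y) + I(X_j; Y \mid X_i) = I(X_i; Y)$$

so adding the redundant input $X_j$ does not increase the JMI, and a pair containing it is never preferred over a pair whose second input adds new information about $Y$.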