Feature Selection


Transcript of Feature Selection

Page 1: Feature Selection

Dr. Gheith Abandah


Page 2: Feature Selection

Feature selection is typically a search problem for finding an optimal or suboptimal subset of m features out of the original M features.

Benefits:
◦ It excludes irrelevant and redundant features,
◦ reduces system complexity and processing time,
◦ and often improves the recognition accuracy.

For a large number of features, an exhaustive search for the best subset out of the 2^M possible subsets is infeasible (for example, M = 100 features already gives 2^100, roughly 10^30, candidate subsets).


Page 3: Feature Selection

Feature subset selection is applied on a set of feature values $x_{ijk}$; $i = 1, 2, \ldots, N$; $j = 1, 2, \ldots, C$; and $k = 1, 2, \ldots, M$, where $x_{ijk}$ is the $i$th sample of the $j$th class of the $k$th feature. Therefore, the average of the $k$th feature for letter form $j$ is

$$\bar{x}_{jk} = \frac{1}{N} \sum_{i=1}^{N} x_{ijk}.$$

And the overall average of the $k$th feature is

$$\bar{x}_{k} = \frac{1}{C} \sum_{j=1}^{C} \bar{x}_{jk}.$$
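These two averages map directly onto array operations. Below is a minimal NumPy sketch (an illustration, not code from the slides) that assumes the feature values are stored in an array of shape (N, C, M): N samples per class, C classes, and M features.

```python
import numpy as np

def class_and_overall_means(x):
    """Minimal sketch: x has shape (N, C, M) with x[i, j, k] = x_ijk."""
    # Average of feature k for class j: x_bar_jk = (1/N) * sum_i x_ijk
    x_bar_jk = x.mean(axis=0)        # shape (C, M)
    # Overall average of feature k: x_bar_k = (1/C) * sum_j x_bar_jk
    x_bar_k = x_bar_jk.mean(axis=0)  # shape (M,)
    return x_bar_jk, x_bar_k
```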


Page 4: Feature Selection

Feature selection algorithms can generally be classified according to the criterion function used in searching for good features:
1. Wrapper algorithm: the performance of the classifier is used to evaluate the feature subsets.
2. Filter algorithm: some feature evaluation function is used rather than optimizing the classifier's performance.

Wrapper methods are usually slower than filter methods but offer better performance.


Page 5: Feature Selection

Select the best individual features: a feature evaluation function is used to rank individual features, then the highest-ranked m features are selected.

Although these methods can exclude irrelevant features, they often include redundant features.

“The m best features are not the best m features”
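As a small illustration (not part of the original slides), ranking features by any individual evaluation score and keeping the top m could be sketched as follows; the scores shown are hypothetical.

```python
import numpy as np

def select_top_m(scores, m):
    """Return the indices of the m highest-scoring features, best first."""
    return np.argsort(np.asarray(scores))[::-1][:m]

# Hypothetical scores for four features; for m = 2 this picks features 2 and 0,
# even if those two features happen to be redundant with each other.
print(select_top_m([0.8, 0.1, 0.9, 0.3], m=2))  # -> [2 0]
```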


Page 6: Feature Selection

Examples:
1. Scatter criterion
2. Symmetric uncertainty


Page 7: Feature Selection

Select the features that have the highest values of the scatter criterion $J_k$, which is a ratio of the mixture scatter to the within-class scatter. The within-class scatter of the $k$th feature is

$$S_{w,k} = \sum_{j=1}^{C} P_j S_{jk},$$

where $S_{jk}$ is the variance of class $j$, and $P_j$ is the prior probability of this class; they are found by:

$$S_{jk} = \frac{1}{N} \sum_{i=1}^{N} (x_{ijk} - \bar{x}_{jk})^2 \quad \text{and} \quad P_j = \frac{1}{C}.$$


Page 8: Feature Selection

The between-class scatter is the variance of the class centers with respect to the global center and is found by

$$S_{b,k} = \sum_{j=1}^{C} P_j (\bar{x}_{jk} - \bar{x}_k)^2.$$

And the mixture scatter is the sum of the within- and between-class scatters, and equals the variance of all values with respect to the global center:

$$S_{m,k} = S_{w,k} + S_{b,k} = \frac{1}{CN} \sum_{j=1}^{C} \sum_{i=1}^{N} (x_{ijk} - \bar{x}_k)^2.$$


Page 9: Feature Selection

The scatter criterion $J_k$ of the $k$th feature is

$$J_k = \frac{S_{m,k}}{S_{w,k}}.$$

A higher value of this ratio indicates that the feature has a high ability to separate the various classes into distinct clusters.
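The whole scatter-criterion computation from slides 7 to 9 can be sketched in NumPy as shown below, again assuming the (N, C, M) sample layout and equal class priors $P_j = 1/C$; this is an illustrative sketch, not code from the slides.

```python
import numpy as np

def scatter_criterion(x):
    """Sketch: return J_k for every feature k; x has shape (N, C, M)."""
    x_bar_jk = x.mean(axis=0)                        # class means, (C, M)
    x_bar_k = x_bar_jk.mean(axis=0)                  # global means, (M,)
    S_jk = ((x - x_bar_jk) ** 2).mean(axis=0)        # class variances, (C, M)
    S_w = S_jk.mean(axis=0)                          # within-class scatter (P_j = 1/C)
    S_b = ((x_bar_jk - x_bar_k) ** 2).mean(axis=0)   # between-class scatter
    S_m = S_w + S_b                                  # mixture scatter
    return S_m / S_w                                 # scatter criterion J_k
```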


Page 10: Feature Selection


First normalize the feature values for zero mean and unit variance by

$$\hat{x}_{ijk} = \frac{x_{ijk} - \bar{x}_k}{\sigma_k}, \qquad \sigma_k^2 = \frac{1}{CN} \sum_{j=1}^{C} \sum_{i=1}^{N} (x_{ijk} - \bar{x}_k)^2.$$

Then the normalized values of continuous features are discretized into $L$ finite levels to facilitate finding probabilities. The corresponding discrete values are $\tilde{x}_{ijk}$. The mutual information of the $k$th feature is

$$I(\mathbf{x}_k, \boldsymbol{\omega}) = \sum_{l=1}^{L} \sum_{j=1}^{C} P(\tilde{x}_{lk}, \omega_j) \log_2 \frac{P(\tilde{x}_{lk}, \omega_j)}{P(\tilde{x}_{lk})\, P(\omega_j)}.$$

Page 11: Feature Selection


The symmetric uncertainty (SU) is derived from the mutual information by normalizing it to the entropies of the feature values and target classes:

$$SU(\mathbf{x}_k, \boldsymbol{\omega}) = \frac{2\, I(\mathbf{x}_k, \boldsymbol{\omega})}{H(\mathbf{x}_k) + H(\boldsymbol{\omega})},$$

where the entropy of a variable $X$ is found by $H(X) = -\sum_i P(x_i) \log_2 P(x_i)$.
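A rough NumPy sketch of this SU estimate for a single feature follows, combining the discretization, mutual information, and entropy normalization from slides 10 and 11; the level count L, the function name, and all other details here are illustrative assumptions, not the slides' implementation.

```python
import numpy as np

def symmetric_uncertainty(x_k, y, L=10):
    """Sketch: SU between one (normalized) feature column x_k and class labels y."""
    # Discretize the continuous feature into L finite levels.
    edges = np.linspace(x_k.min(), x_k.max(), L + 1)[1:-1]
    x_d = np.digitize(x_k, edges)
    levels, classes = np.unique(x_d), np.unique(y)
    # Mutual information I(x_k, w) from empirical joint/marginal probabilities.
    I = 0.0
    for l in levels:
        p_x = np.mean(x_d == l)
        for c in classes:
            p_xy = np.mean((x_d == l) & (y == c))
            if p_xy > 0:
                I += p_xy * np.log2(p_xy / (p_x * np.mean(y == c)))
    # Entropies H(x_k) and H(w).
    H_x = -sum(p * np.log2(p) for p in (np.mean(x_d == l) for l in levels) if p > 0)
    H_y = -sum(p * np.log2(p) for p in (np.mean(y == c) for c in classes) if p > 0)
    return 2.0 * I / (H_x + H_y)
```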

Page 12: Feature Selection

Sequential (complexity < O(M²)):
◦ Forward selection, e.g., Fast correlation-based filter (FCBF) and Minimal-redundancy-maximal-relevance (mRMR)
◦ Backward selection
◦ Bidirectional

Random:
◦ Genetic algorithms, e.g., Multi-objective genetic algorithms (MOGA)


Page 13: Feature Selection


Selects a subset of relevant features and excludes redundant features.

Uses the symmetric uncertainty $SU(\mathbf{x}_k, \boldsymbol{\omega})$ to estimate the relevance of feature $k$ to the target classes.

Uses the symmetric uncertainty between two features $k$ and $o$, $SU(\mathbf{x}_k, \mathbf{x}_o)$, to approximate the redundancy between the two features.

Page 14: Feature Selection


Grows a subset of predominant features by adding the relevant features to the empty set in descending $SU(\mathbf{x}_k, \boldsymbol{\omega})$ order.

Whenever feature $k$ is added, FCBF excludes from consideration for addition to the subset all remaining redundant features $o$ that have $SU(\mathbf{x}_k, \mathbf{x}_o) \geq SU(\mathbf{x}_o, \boldsymbol{\omega})$.

In other words, it excludes all features whose correlation with the already selected features is larger than or equal to their correlation with the target classes.
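The FCBF loop described above could be sketched as below. It assumes a symmetric_uncertainty(a, b) helper such as the one sketched after slide 11 (extended to discretize both inputs when they are continuous) and an illustrative relevance threshold; both are assumptions for this sketch.

```python
import numpy as np

def fcbf(X, y, threshold=0.0):
    """Sketch of FCBF selection: X is (n_samples, M), y holds class labels.

    Assumes a symmetric_uncertainty(a, b) helper (see the earlier sketch,
    extended to discretize both inputs when they are continuous).
    """
    M = X.shape[1]
    su_class = np.array([symmetric_uncertainty(X[:, k], y) for k in range(M)])
    # Candidate features in descending relevance order, dropping irrelevant ones.
    candidates = [k for k in np.argsort(su_class)[::-1] if su_class[k] > threshold]
    selected = []
    while candidates:
        k = candidates.pop(0)      # next predominant feature
        selected.append(k)
        # Keep only the features o whose correlation with k is smaller than
        # their correlation with the target classes: SU(x_k, x_o) < SU(x_o, w).
        candidates = [o for o in candidates
                      if symmetric_uncertainty(X[:, k], X[:, o]) < su_class[o]]
    return selected
```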

Page 15: Feature Selection


For the complete set of features $X$, the subset $S$ of $m$ features that has the maximal relevance criterion is the subset that satisfies the maximal mean value of all mutual information values between the individual features $\mathbf{x}_i$ and class $\boldsymbol{\omega}$:

$$\max_{S \subset X} D(S, \boldsymbol{\omega}), \qquad D = \frac{1}{m} \sum_{\mathbf{x}_i \in S} I(\mathbf{x}_i, \boldsymbol{\omega}).$$

Page 16: Feature Selection


The subset $S$ of $m$ features that has the minimal redundancy criterion is the subset that satisfies the minimal mean value of all mutual information values between all pairs of features $\mathbf{x}_i$ and $\mathbf{x}_j$:

$$\min_{S \subset X} R(S), \qquad R = \frac{1}{m^2} \sum_{\mathbf{x}_i, \mathbf{x}_j \in S} I(\mathbf{x}_i, \mathbf{x}_j).$$
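As an illustrative sketch (not from the slides), the two mRMR criteria on slides 15 and 16 can be evaluated for a candidate subset S, assuming a mutual_information(a, b) helper estimated as on slide 10; the helper and the function names are assumptions.

```python
import numpy as np

def relevance(X, y, S):
    """Sketch of D(S, w): mean MI between each selected feature and the classes."""
    return np.mean([mutual_information(X[:, i], y) for i in S])

def redundancy(X, S):
    """Sketch of R(S): mean MI over all pairs of selected features (1/m^2 sum)."""
    return np.mean([mutual_information(X[:, i], X[:, j]) for i in S for j in S])
```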

Page 17: Feature Selection


In the mRMR algorithm, the subset $S$ of the $m$ best features is grown iteratively using a forward search algorithm. The following criterion is used to add the feature $\mathbf{x}_j$ to the previous subset $S_{m-1}$ of $m-1$ features:

$$\max_{\mathbf{x}_j \in X - S_{m-1}} \left[ I(\mathbf{x}_j, \boldsymbol{\omega}) - \frac{1}{m-1} \sum_{\mathbf{x}_i \in S_{m-1}} I(\mathbf{x}_j, \mathbf{x}_i) \right].$$
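A short sketch of this incremental mRMR selection, again assuming a mutual_information(a, b) helper like the estimate on slide 10: it starts from the single most relevant feature and adds one feature per iteration using the criterion above. The helper and the function name are assumptions for illustration.

```python
import numpy as np

def mrmr(X, y, m):
    """Sketch of mRMR forward selection of m features from X (n_samples, M)."""
    M = X.shape[1]
    relevance = np.array([mutual_information(X[:, k], y) for k in range(M)])
    selected = [int(np.argmax(relevance))]           # most relevant feature first
    while len(selected) < m:
        remaining = [k for k in range(M) if k not in selected]
        # Relevance minus mean redundancy with the already selected features.
        scores = [relevance[k]
                  - np.mean([mutual_information(X[:, k], X[:, i]) for i in selected])
                  for k in remaining]
        selected.append(remaining[int(np.argmax(scores))])
    return selected
```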

Page 18: Feature Selection

Use NSGA to search for an optimal set of solutions with two objectives:
1. Minimize the number of features used in classification.
2. Minimize the classification error.
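Below is a much simplified multi-objective search sketch in the same spirit (plain Pareto filtering with bit-flip mutation, not the full NSGA algorithm with non-dominated sorting ranks and crowding distance). classification_error is a hypothetical helper that trains and scores a classifier on the masked features; all names and parameters here are assumptions.

```python
import numpy as np

def pareto_front(objs):
    """Indices of non-dominated rows of objs (both objectives minimized)."""
    keep = []
    for i, p in enumerate(objs):
        dominated = any(all(q <= p) and any(q < p)
                        for j, q in enumerate(objs) if j != i)
        if not dominated:
            keep.append(i)
    return keep

def moga_feature_search(X, y, generations=50, pop_size=30, seed=0):
    """Sketch: evolve binary feature masks toward (few features, low error)."""
    rng = np.random.default_rng(seed)
    M = X.shape[1]
    pop = rng.integers(0, 2, size=(pop_size, M), dtype=bool)
    for _ in range(generations):
        # Objectives: number of selected features and classification error
        # (classification_error is a hypothetical evaluation helper).
        objs = np.array([[mask.sum(), classification_error(X[:, mask], y)]
                         if mask.any() else [M, 1.0] for mask in pop])
        parents = pop[pareto_front(objs)]
        # Offspring by bit-flip mutation of randomly chosen non-dominated parents.
        children = parents[rng.integers(0, len(parents), pop_size)].copy()
        children ^= rng.random(children.shape) < 1.0 / M
        pop = np.vstack([parents, children])[:pop_size]
    return pop  # final population; its Pareto front trades features vs. error
```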


Page 19: Feature Selection


Page 20: Feature Selection


Page 21: Feature Selection
