
A Combinatorial Fusion A Combinatorial Fusion Method for Feature MiningMethod for Feature Mining

Ye Tian, Gary Weiss, D. Frank Hsu, Qiang MaYe Tian, Gary Weiss, D. Frank Hsu, Qiang Ma

Fordham UniversityFordham University

Presented by Gary WeissPresented by Gary Weiss

Introduction

• Feature construction/engineering is often a critical step in the data mining process
  – Can be very time-consuming and may require a lot of manual effort

• Our approach is to use a combinatorial method to automatically construct new features
  – We refer to this as "feature fusion"
  – Geared toward helping to predict rare classes
  – For now it is restricted to numerical features, but can be extended to other features

How does this relate to MMIS?

• One MMIS category is local pattern analysis
  – How to efficiently identify quality knowledge from a single data source
  – Listed data preparation and selection as subtopics, and also mentioned fusion

• We acknowledge that this work probably is not what most people think of as MMIS

How can we view this work as MMIS?

• Think of each feature as a piece of information
  – Our fusion approach integrates these pieces

• Fusion itself is a proper topic for MMIS since it can also be used with multiple information sources
  – The fusion method we employ does not really care whether the information (i.e., the features) comes from a single source

• As the complexity of constructed features increases, each can be viewed as a classifier
  – Each fused feature is an information source
  – This view is bolstered by other work on data fusion that uses ensembles to combine the fused features

Description of the Method

1. A data set is a collection of records where each feature has a score
   – We assume numerical features

2. We then replace scores by ranks
   – The ordering of ranks is determined by whether larger or smaller scores better predict the class

3. Compute the performance of each feature

4. Compute the performance of feature combinations

5. Decide which combinations to evaluate/use

Step 1: A data set

     F1   F2   F3   F4   F5   Class
A     1    4    3    2    8     1
B     3    3    5    5    4     0
C     5    5    2    6    7     1
D     7    6   15    3    2     0
E    11   13   16    7   14     0
F    15   16    4   13   11     0
G     9    7   14    1   18     1
H    17   15    9    8    3     0

Step 2: Scores Replaced by Ranks

     F1   F2   F3   F4   F5
A     1    2    2    2    5
B     2    1    4    4    3
C     3    3    1    5    4
D     4    4    7    3    1
E     6    6    8    6    7
F     7    8    3    8    6
G     5    5    6    1    8
H     8    7    5    7    2
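To make Steps 1 and 2 concrete, here is a minimal Python sketch (not the authors' code) that hard-codes the toy data set above and replaces each feature's scores with ranks. It assumes smaller scores are better for every feature, which is what this example shows; in general the direction would be chosen per feature.

```python
# Sketch of Steps 1-2: toy data from the slides, scores replaced by ranks.
# Assumes smaller scores are better for every feature (true for this example).

records = {            # record -> ([F1, F2, F3, F4, F5], class)
    "A": ([1,  4,  3,  2,  8], 1),
    "B": ([3,  3,  5,  5,  4], 0),
    "C": ([5,  5,  2,  6,  7], 1),
    "D": ([7,  6, 15,  3,  2], 0),
    "E": ([11, 13, 16,  7, 14], 0),
    "F": ([15, 16,  4, 13, 11], 0),
    "G": ([9,  7, 14,  1, 18], 1),
    "H": ([17, 15,  9,  8,  3], 0),
}

def scores_to_ranks(scores):
    """Rank each score: 1 = best (smallest here); ties share the same rank."""
    order = sorted(scores)
    return [order.index(s) + 1 for s in scores]

names = list(records)
num_features = len(records["A"][0])

# ranks[f][r] = rank of record r on feature index f (F1 is index 0)
ranks = {}
for f in range(num_features):
    column = [records[r][0][f] for r in names]
    ranks[f] = dict(zip(names, scores_to_ranks(column)))

print(ranks[1])   # F2 ranks: A=2, B=1, C=3, D=4, E=6, F=8, G=5, H=7
```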

Step 3: Compute Feature Performance

• Performance measures how well a feature predicts the minority class

• We sort the rows by feature rank and measure performance on the top n%, where n% of the records belong to the minority class

• In this case we evaluate on the top 3 rows. Since 2 of the 3 are minority (class = 1), performance ≈ 0.67

     F2 Rank   Class
B       1        0
A       2        1
C       3        1
D       4        0
G       5        1
E       6        0
H       7        0
F       8        0

Step 3 (continued)

Feature   Performance
F1           0.67
F2           0.67
F3           0.67
F4           0.67
F5           0.00
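A short sketch of the Step 3 performance measure. The rank table is copied from the Step 2 slide so the snippet runs on its own: for each feature, sort the records by rank and take the fraction of minority-class records among the top n, where n is the size of the minority class (3 here).

```python
# Sketch of Step 3: fraction of minority-class records among the top-n ranked
# records, where n = number of minority-class records (3 in the toy data).

ranks = {   # feature -> {record: rank}, copied from the Step 2 table
    "F1": {"A": 1, "B": 2, "C": 3, "D": 4, "E": 6, "F": 7, "G": 5, "H": 8},
    "F2": {"A": 2, "B": 1, "C": 3, "D": 4, "E": 6, "F": 8, "G": 5, "H": 7},
    "F3": {"A": 2, "B": 4, "C": 1, "D": 7, "E": 8, "F": 3, "G": 6, "H": 5},
    "F4": {"A": 2, "B": 4, "C": 5, "D": 3, "E": 6, "F": 8, "G": 1, "H": 7},
    "F5": {"A": 5, "B": 3, "C": 4, "D": 1, "E": 7, "F": 6, "G": 8, "H": 2},
}
labels = {"A": 1, "B": 0, "C": 1, "D": 0, "E": 0, "F": 0, "G": 1, "H": 0}

def performance(rank_of, labels):
    """Minority-class fraction among the n best-ranked records."""
    n = sum(labels.values())                         # minority class size
    top = sorted(labels, key=lambda r: rank_of[r])[:n]
    return sum(labels[r] for r in top) / n

for name, rank_of in ranks.items():
    print(name, round(performance(rank_of, labels), 2))
# F1-F4 -> 0.67, F5 -> 0.0, matching the table above
```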

Step 4: Compute Performance of Feature Combinations

• Let F6 be the fusion of F1, F2, F3, F4, and F5

• Rank combination function is average of ranks

• Compute rank of F6 for each record

• Compute performance of F6 as in step 3

     F6 Rank
A      2.4
B      2.8
C      3.2
D      3.8
E      6.6
F      6.4
G      5.0
H      5.8
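Continuing the Step 3 sketch (it reuses `ranks`, `labels`, and `performance` defined there), Step 4 fuses by averaging ranks and scores the fused feature the same way:

```python
# Sketch of Step 4: fuse features by averaging their ranks, then score the
# fused feature with the same top-n performance measure used in Step 3.

def fuse(rank_tables):
    """Average the per-record ranks of the given features."""
    recs = rank_tables[0]
    return {r: sum(t[r] for t in rank_tables) / len(rank_tables) for r in recs}

f6 = fuse([ranks[f] for f in ("F1", "F2", "F3", "F4", "F5")])
print(f6["A"], f6["E"])                    # 2.4 and 6.6, as on the slide
print(round(performance(f6, labels), 2))   # 0.67 for this toy data
```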

Step 5: What Combinations to Use?

• Given n features there are 2^n − 1 possible combinations
  – C(n,1) + C(n,2) + … + C(n,n)
  – This "fully exhaustive" fusion strategy is practical for many values of n

• We try other strategies in case it is not feasible
  – The k-exhaustive strategy selects the k best features and tries all of their combinations
  – The k-fusion strategy uses all n features but fuses at most k features at once

Combinatorial Fusion Table

Number of    k-fusion for values of k shown below
features       1     2     3     4     5     6     7     8     9    10

    1          1
    2          2     3
    3          3     6     7
    4          4    10    14    15
    5          5    15    25    30    31
    6          6    21    41    56    62    63
    7          7    28    63    98   119   126   127
    8          8    36    92   162   218   246   254   255
    9          9    45   129   255   381   465   501   510   511
   10         10    55   175   385   637   847   967  1012  1022  1023
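Each table entry is a partial sum of binomial coefficients: k-fusion over n base features generates C(n,1) + … + C(n,k) candidate features, and k = n gives the fully exhaustive 2^n − 1. A few lines of Python reproduce the table:

```python
from math import comb

def num_fusions(n, k):
    """Number of candidate features generated by k-fusion over n base features."""
    return sum(comb(n, i) for i in range(1, k + 1))

for n in range(1, 11):
    print(n, [num_fusions(n, k) for k in range(1, n + 1)])
# e.g. num_fusions(10, 10) == 1023 == 2**10 - 1
```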

Combinatorial Fusion Algorithm

• The combinatorial strategy generates features
  – The performance metric determines which are best
    • Used to determine which k features to use for k-fusion
    • Also used to determine the order in which features are added

• We add a feature if it leads to a statistically significant improvement (p ≤ .10)
  – As measured on validation data
  – This limits the number of features
  – But requires a lot of computation (a rough sketch of the loop follows)
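A rough Python sketch of this forward-selection loop, not the authors' implementation: `validation_auc` and `significantly_better` are hypothetical helpers standing in for building a classifier on the validation data and for the p ≤ .10 test, and `candidates` is assumed to already be ordered by the Step 3 performance measure.

```python
# Hedged sketch of the selection loop (not the authors' code).
# candidates: fused features ordered best-first by the Step 3 measure.
# validation_auc(features): builds a classifier and returns validation AUC.
# significantly_better(new, old): placeholder for the p <= .10 test.

def select_features(base_features, candidates, validation_auc, significantly_better):
    selected = list(base_features)
    best_auc = validation_auc(selected)
    for cand in candidates:
        trial_auc = validation_auc(selected + [cand])
        if trial_auc > best_auc and significantly_better(trial_auc, best_auc):
            selected.append(cand)          # keep the fused feature
            best_auc = trial_auc
    return selected
```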

Example Run of the Algorithm

AUC (valid)   AUC (test)   p-value   Feature (+ means added)
   0.670         0.682       --      {F1, F2, F3, F4, F5}
   0.766         0.757      0.001    +F3F4
   0.731                             F1F2
   0.771         0.774      0.063    +F1F3

Description of Experiments

• We use Weka’s DT, 1-NN, and Naïve Bayes methods

• Analyze performance on 10 data sets
  – With and without fused features
  – Focus on AUC as the main metric
    • More appropriate than accuracy, especially with skewed data

• Use 3 combinatorial fusion strategies
  – 2-fusion, 3-fusion, and 6-exhaustive (a sketch of the evaluation protocol follows)
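For illustration only, here is a hedged sketch of the with/without comparison using scikit-learn stand-ins for the Weka learners named above (the actual experiments used Weka, not this code). `X_without` and `X_with` are hypothetical feature matrices before and after adding fused features; labels are assumed to be binary 0/1 with 1 the minority class.

```python
# Sketch of the evaluation protocol with scikit-learn stand-ins for
# Weka's DT, 1-NN, and Naive Bayes (an assumption, not the original setup).

from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

def compare_auc(X_without, X_with, y):
    """AUC of each learner without vs. with the fused features (binary 0/1 labels)."""
    learners = {"DT": DecisionTreeClassifier(),
                "1-NN": KNeighborsClassifier(n_neighbors=1),
                "Bayes": GaussianNB()}
    results = {}
    for name, model in learners.items():
        aucs = []
        for X in (X_without, X_with):
            X_tr, X_te, y_tr, y_te = train_test_split(
                X, y, test_size=0.3, stratify=y, random_state=0)
            model.fit(X_tr, y_tr)
            aucs.append(roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
        results[name] = tuple(aucs)   # (AUC without, AUC with)
    return results
```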

Results

Summary Results over all 10 Data Sets

                  Bayes            DT               1-NN
Strategy        AUC   W-L-D      AUC   W-L-D      AUC   W-L-D
2-fusion       0.009  5-1-4     0.105  7-0-3     0.016  5-4-1
3-fusion       0.009  6-4-0     0.103  8-1-1     0.023  5-4-1
6-exhaustive   0.003  3-6-1     0.115  9-0-1     0.027  6-1-3

Results over 4 Most Skewed Data Sets (< 10% Minority)

                  Bayes            DT               1-NN
Strategy        AUC   W-L-D      AUC   W-L-D      AUC   W-L-D
2-fusion       0.004  1-1-2     0.195  4-0-0     0.065  4-0-0
3-fusion       0.012  2-2-0     0.185  3-1-0     0.059  3-0-1
6-exhaustive   0.006  1-3-0     0.194  4-0-0     0.062  4-0-0

Discussion of Results

• None of the 3 fusion schemes is clearly best

• The methods seem to help, but the biggest improvement is clearly with the DT method
  – May be explained by traditional DT methods having limited expressive power
    • They can only consider 1 feature at a time
    • They can never perfectly learn simple concepts like F1 + F2 > 10, but can with feature fusion
  – Bigger improvement for highly skewed data sets
    • Identifying rare cases is difficult and may require looking at many features in parallel

Future Work

• More comprehensive experiments
  – More data sets, more skewed data sets, more combinatorial fusion strategies

• Use of heuristics to more intelligently choose fused features
  – The performance measure is currently used only to order candidates
  – Use of diversity measures
  – Avoid building a classifier to determine which fused features to add

• Handle non-numerical features

Conclusion

• Showed how a method from information fusion can be applied to feature construction

• Results encouraging but more study needed

• Extending the method should lead to further improvements

Questions?

Detailed Results: Accuracy

Dataset          Bayes                 Decision Trees          1-NN
             w/o    w    Diff      w/o    w    Diff      w/o    w    Diff
bio          98.8  98.8   0.0      99.4  99.4   0.0      99.2  99.2   0.0
letter-a     98.4  98.4   0.0      98.6  98.6   0.0      98.9  98.9   0.0
income       92.0  92.0   0.0      94.5  94.5   0.0      92.4  92.4   0.0
stock        80.4  80.4   0.0      90.3  90.3   0.0      86.3  86.3   0.0
hepatitis    84.0  84.0   0.0      86.2  80.7  -5.6      89.3  89.3   0.0
physics      68.9  75.3   6.5      75.1  75.2   0.1      62.6  75.0  12.4
german       73.2  73.2   0.0      69.5  73.0   3.5      68.1  71.6   3.5
crx          70.1  70.1   0.0      60.3  75.1  14.9      60.4  73.6  13.2
bands        67.0  67.0   0.0      61.4  61.4   0.0      65.3  65.3   0.0
boa1         55.0  57.0   2.0      51.0  56.9   6.0      52.6  57.5   5.0


Detailed Results: AUC

Dataset    Strat.       Bayes                 Decision Trees          1-NN
                    w/o    w     Diff      w/o    w     Diff      w/o    w     Diff
bio        2-F     .943   .923  -.020     .496   .752   .256     .499   .663   .164
           3-F     .943   .954   .010     .496   .742   .247     .499   .651   .152
           6-EX    .943   .926  -.017     .496   .759   .264     .499   .663   .164
letter-a   2-F     .963   .963   .000     .922   .943   .021     .937   .961   .024
           3-F     .963   .960  -.003     .922   .919  -.003     .937   .937   .000
           6-EX    .963   .962  -.001     .922   .937   .014     .937   .961   .024
income     2-F     .901   .901   .000     .494   .736   .241     .593   .612   .020
           3-F     .901   .897  -.004     .494   .734   .239     .593   .621   .028
           6-EX    .901   .900  -.001     .494   .739   .245     .593   .612   .020
stock      2-F     .725   .762   .037     .496   .755   .260     .524   .575   .051
           3-F     .725   .767   .043     .496   .751   .255     .524   .578   .054
           6-EX    .725   .769   .044     .496   .747   .252     .524   .564   .040
hepatitis  2-F     .864   .869   .005     .755   .755   .000     .819   .803  -.016
           3-F     .864   .868   .004     .755   .759   .004     .819   .826   .007
           6-EX    .864   .864   .000     .755   .760   .005     .819   .821   .002
physics    2-F     .498   .498   .000     .499   .499   .000     .504   .504   .000
           3-F     .498   .506   .008     .499   .499   .000     .504   .495  -.008
           6-EX    .498   .506   .008     .499   .499   .000     .504   .504   .000
german     2-F     .740   .751   .011     .492   .609   .118     .609   .607  -.001
           3-F     .740   .723  -.017     .492   .606   .115     .609   .593  -.015
           6-EX    .740   .736  -.004     .492   .654   .162     .609   .609   .000
crx        2-F     .762   .762   .000     .646   .646   .000     .639   .653   .014
           3-F     .762   .779   .017     .646   .670   .024     .639   .673   .034
           6-EX    .762   .755  -.007     .646   .722   .076     .639   .667   .028
bands      2-F     .750   .779   .029     .504   .611   .108     .655   .559  -.096
           3-F     .750   .747  -.003     .504   .603   .099     .655   .644  -.012
           6-EX    .750   .744  -.006     .504   .580   .076     .655   .655   .000
boa1       2-F     .571   .596   .024     .497   .538   .041     .515   .509  -.005
           3-F     .571   .602   .031     .497   .548   .050     .515   .509  -.006
           6-EX    .571   .589   .018     .497   .553   .056     .515   .509  -.005