The impact of local data
characteristics on learning
from imbalanced data
Jerzy Stefanowski
Institute of Computing Sciences,
Poznań University of Technology
Poland
JRS, Madrid, June 12, 2014
Outline of my talk
Local characteristics of imbalanced data
Data difficulty factors
Identification method
Informed pre-processing
Rule induction
Extensions of bagging - if time?
Acknowledgement of co-operation with:
Krystyna Napierała, Jerzy Błaszczyński, Szymon Wilk
My master students (PUT)
Tomasz Maciejewski (LN-SMOTE)
Łukasz Idkowiak (Extensions of bagging)
Imbalanced data
Class imbalance: the minority class includes a much smaller
number of examples than the other (majority) classes
A minority class is often of primary interest
Classifiers are biased:
focused on the more frequent classes, …
Better recognition of majority classes and difficulties in
classifying new objects from the minority class
An information retrieval system on highly imbalanced data
(minority class 1%) achieves 99.8% total accuracy, but fails to
recognize the important class
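The accuracy paradox above can be checked with a few lines; the counts below are hypothetical, for a classifier that always predicts the majority class on a 1%-minority dataset:

```python
# Accuracy paradox sketch (hypothetical counts): a trivial classifier that
# always predicts the majority class on a 1%-minority dataset.
n_min, n_maj = 10, 990
tp, fn = 0, n_min        # every minority example is missed
tn, fp = n_maj, 0        # every majority example is correct

accuracy = (tp + tn) / (n_min + n_maj)
sensitivity = tp / (tp + fn)   # recall of the minority class

print(accuracy)      # 0.99
print(sensitivity)   # 0.0
```

High total accuracy, zero recognition of the class of primary interest.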
[Figure: scatter plot of majority (+) and minority (–) class examples]
Evaluation issues
Performance for the minority class
Sensitivity and specificity,
ROC curve analysis + AUC
Other curves (PR), …
Aggregated measures:

              Predicted
                +    -
Original   +   TP   FN
           -   FP   TN

Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)
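These measures follow directly from the confusion matrix; a minimal sketch with hypothetical counts (the G-mean aggregate used later in the talk is included):

```python
import math

def sensitivity(tp, fn):
    """Recall of the minority (positive) class: TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Recognition of the majority (negative) class: TN / (TN + FP)."""
    return tn / (tn + fp)

def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity and specificity."""
    return math.sqrt(sensitivity(tp, fn) * specificity(tn, fp))

# Hypothetical confusion-matrix counts
tp, fn, fp, tn = 30, 10, 50, 910
print(sensitivity(tp, fn))          # 0.75
print(round(specificity(tn, fp), 3))  # 0.948
```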
More about occurrence of class imbalance
Medical Problems – rare but dangerous illness
Technical Diagnostics - Helicopter Gearbox Fault
Monitoring
Document Filtering
Direct Marketing
Detection of Oil Spills
Detection of Fraudulent Telephone Calls
Increase of interest in the ML and DM community
Known in applications
Special events:
AAAI’2000 Workshop (org.: R. Holte, N. Japkowicz, C. Ling, S. Matwin)
ICML’2000 Workshop, also on cost-sensitive learning (org.: T. Dietterich et al.)
ICML’2003 Workshop (org.: N. Chawla, N. Japkowicz, A. Kolcz)
ECAI 2004 Workshop (org.: C. Ferri, P. Flach, J. Orallo, N. Lachice)
ICML 2007 – Session on Learning from Imbalanced Data
PAKDD 2009 – Workshop on Imbalanced Data
ICMLA 2012 – Special Session (org.: Japkowicz, Stefanowski, Chawla)
Special issues: ACM SIG KDD Explorations Newsletter, editors: N.
Chawla, N. Japkowicz, A. Kolcz
Book: He H., Ma Y. (eds.): Imbalanced Learning: Foundations, Algorithms and Applications. IEEE - Wiley, 2013.
Specialized methods
General categorization of approaches
Data level (preprocessing)
Algorithm level
Different methods:
Re-sampling or re-weighting
Modify inductive bias, search, evaluation criteria
New classification strategies
Hybrid algorithms
Transformation to „cost-sensitive learning”
Ensemble approaches (boosting, bagging or cost versions …)
One-class learning …
Some reviews:
He H., Ma Y. (eds.): Imbalanced Learning: Foundations, Algorithms and Applications. IEEE - Wiley, 2013.
Weiss G.M.: Mining with rarity: a unifying framework. ACM SIGKDD Explorations Newsletter, 2004.
Chawla N.: Data mining for imbalanced datasets: an overview. In: The Data Mining and Knowledge Discovery Handbook, Springer 2005.
He H., Garcia E.: Learning from imbalanced data. IEEE Trans. on Knowledge and Data Engineering, 2009.
More Research on Class Imbalance?
Although many methods were proposed in the last decade:
Heuristic approaches,…
Different performance, ….
Complex characteristics of imbalanced data
„Nature” of imbalanced data
Minority vs. majority classes only?
Focus on data difficulty factors and
characteristics of local distributions!
N_min = 5%
N_maj = 95%
Related works on data difficulty factors
Studies on simulated, artificial data: N. Japkowicz and co-operators 2002-2004
V. Garcia, J. Sanchez, R. Mollineda 2007
Prati, Batista, Monard 2004, Stefanowski et al. 2010, 2013
V. López, F. Herrera et al. 2012, 2014
Imbalanced data distributions
Why is it difficult?
The nature of the problem with respect to data distributions
[Jo, Japkowicz 2004]
Fragmentation and small disjuncts
[Garcia, Sanchez, Mollineda 2007]
Overlapping and rare examples
Data difficulty factors
• Lack of data and global class imbalance
• Small disjuncts, difficult borderline, rare examples, minority outliers
Other more complex data
J. Stefanowski 2009
Factors (concept shape, minority sub-clusters, overlapping and rare examples vs. global imbalance ratio) + non-linear shapes
Standard classifiers
Decomposition more influential than the imbalance ratio
Overlapping and rare examples even more
More in:
JS: Overlapping, Rare Examples and Class Decomposition
in Learning Classifiers from Imbalanced Data, 2013
Paw and clover data sets
Data Difficulty Factors: Types of Examples
Distinguish four types of minority examples:
SAFE
BORDER
RARE
OUTLIER
Visualization of imbalanced data
[Figure panels: ECOLI (borderline), CLEVELAND (rare & outlier), THYROID (safe)]
t-SNE visualisation
t-Distributed Stochastic Neighbor Embedding (t-SNE)
Keeps the probability that i is a neighbor of j / local distances
[Figure panels: New Thyroid (safe, border), Ecoli]
Analyse the distribution in the local neighbourhood
K-NN (k=5, and higher)
Distance HVDM
Alternatives – kernel functions
Ratio of minority to majority neighbours among k = 5:
5:0, 4:1 → SAFE; 3:2, 2:3 → BORDER; 1:4 → RARE; 0:5 → OUTLIER
Identification of examples – local approach
Details: K. Napierała, J. Stefanowski: The influence of minority class distribution on
learning from imbalanced data. HAIS, 2012.
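The labeling scheme can be sketched in a few lines; the thresholds below encode the k = 5 mapping from the talk (the published method's handling of rare examples also inspects the neighbours' neighbours, so this is a simplification):

```python
def label_minority_example(n_minority_neighbours, k=5):
    """Map the number of same-class (minority) neighbours among the k
    nearest neighbours of a minority example to the four types:
    5:0, 4:1 -> safe; 3:2, 2:3 -> borderline; 1:4 -> rare; 0:5 -> outlier."""
    ratio = n_minority_neighbours / k
    if ratio > 0.7:
        return "safe"
    if ratio >= 0.3:
        return "borderline"
    if ratio > 0:
        return "rare"
    return "outlier"

print([label_minority_example(m) for m in range(5, -1, -1)])
# ['safe', 'safe', 'borderline', 'borderline', 'rare', 'outlier']
```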
Choosing distance for mixed attributes
HVDM – Heterogeneous Value Distance Metric
The component distance for nominal attributes: the value
difference metric by Stanfill and Waltz
Comparison to other heterogeneous distance measures
(HEOM, Gower, GEM and variants of Mahalonobis)
J. Laurikkala et al. / Wilson, Martinez
Tuning k (5,7,9,..) – experimental studies
HVDM(x, y) = sqrt( Σ_a d_a(x_a, y_a)² )
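A simplified sketch of HVDM for mixed attributes, assuming the per-attribute statistics (standard deviations for numeric attributes, class-conditional value probabilities for nominal ones) are precomputed; the two-attribute example and all names are illustrative:

```python
import math

def hvdm(x, y, numeric, sigma, vdm):
    """Simplified HVDM sketch for mixed attributes:
    - numeric attribute a: |x_a - y_a| / (4 * sigma_a)
    - nominal attribute a: value difference metric over class-conditional
      probabilities P(class | value), in the style of Stanfill and Waltz."""
    total = 0.0
    for a, (xa, ya) in enumerate(zip(x, y)):
        if a in numeric:
            d = abs(xa - ya) / (4.0 * sigma[a])
        else:
            px, py = vdm[a][xa], vdm[a][ya]
            d = math.sqrt(sum((px[c] - py[c]) ** 2 for c in px))
        total += d * d
    return math.sqrt(total)

# Hypothetical data: attribute 0 numeric, attribute 1 nominal
sigma = {0: 1.0}
vdm = {1: {"a": {"+": 1.0, "-": 0.0}, "b": {"+": 0.0, "-": 1.0}}}
print(hvdm((0.0, "a"), (4.0, "b"), numeric={0}, sigma=sigma, vdm=vdm))
# ≈ 1.732 (sqrt(3))
```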
Re-discovery of known distributions
Labeling Minority Examples UCI Data sets
Most data sets unsafe
Mixed types of minority examples
Very unsafe distribution of the minority examples:
cleveland 51% outliers, no safe ones
solar flare, balance scale
Majority class quite safe:
yeast 98.5% safe
ecoli 91.7% safe
Experiences with k (7,9,..) or kernels: similar categorizations of data
Experimental setup
Aim: compare standard popular classifiers and then pre-
processing methods (specialized for class imbalance) and
verify differences in performance with respect to the types of
the minority class examples in data
Classifiers: SVM, ANN, k-NN, trees (J4.8) and rules (PART)
Pre-processing: random over-sampling, NCR, SMOTE and
SPIDER
Measures: sensitivity, specificity, G-mean, F1
Design of experiments:
10-fold stratified cross validation,
22 UCI real world imbalanced datasets
Friedman Test
Rankings of classifiers in data categories (average ranks)

Safe / border:    Rare / outlier:
SVM   5.5         PART  5.9
PART  4.7         J48   5.5
RBF   4.5         1NN   4.2
J48   4.1         RBF   3.6
1NN   3.3         3NN   2.8
3NN   2.2         SVM   2.5

1NN, J48: similar orders
RBF: R/O slightly better

More: K. Napierała: Improving rule classifiers for imbalanced data. Ph.D. Thesis, PUT, 2013.
K. Napierała, J. Stefanowski: The influence of minority class distribution on learning from imbalanced data. HAIS, 2012
Friedman Tests
Pre-processing with respect to labels of testing examples (average ranks)

Border:        Rare:           Outlier:
NCR    4.3     SPIDER  3.8     SMOTE   4.3
SPIDER 3.8     NCR     3.8     SPIDER  3.5
SMOTE  3.4     SMOTE   3.2     NCR     3.4
RO     2.0     RO      2.8     RO      2.2
NONE   1.3     NONE    1.4     NONE    1.5
Modeling local data characteristics
Lessons
Most data sets contain unsafe types of examples
Outliers and rare cases are an important part of the minority class
distributions
The global imbalance ratio and the size of data are not as influential
as the types of examples
Local data characteristics differentiate the performance of standard
classifiers and pre-processing methods
The attempt to approximate their competence
Classifiers:
S: all classifiers comparable
B: SVM > trees/rules, RBF > kNN
R/O: trees/rules, 1NN > others
Preprocessing:
B: undersampling (NCR)
R: hybrid (SPIDER) > SMOTE
O: SMOTE
Modeling local data characteristics
Some open issues
Tune the size of the neighbourhood depending on the data set
Noise vs. unsafe examples
Outliers and rare examples in an empty sub-space
Looking for small disjuncts: which methods?
Different local „density”: how to find and exploit it
Lower vs. higher attribute dimensionality (hubs)
Multi-class imbalance
Evaluation issues
Now basic info on applying local data
characteristics in algorithms specialized
for class imbalances
1. Improving informed pre-processing methods
2. Local neighbourhood inside rule induction
3. Extensions of sampling in bagging
Pre-processing approaches
Transform the original data distribution
Simple random sampling:
„Over-sampling” – minority class
„Under-sampling” – majority class
Informed and focused transformations
Filtering majority examples:
• One-side-sampling (Kubat, Matwin) + Tomek Links
• NCR - Laurikkala’s edited nearest neighbor rule
Oversampling with synthetic examples:
• SMOTE, Chawla et al.
• Borderline SMOTE, Safe Level, …
Hybrid ones:
• SPIDER
• SMOTE and undersampling filters
Modifications of ensembles
[Diagram: imbalanced data → transformation → new dataset]
He, Garcia, Learning from imbalanced data. IEEE Trans. DKE , 21(9), 2009.
Selective Preprocessing of Imbalanced Data (SPIDER)
J.Stefanowski, Sz.Wilk, ECMLPKDD workshop 2007
Hybrid approach: limited filtering
and local copying of some minority
examples
Two phases:
Identifying types of examples with the
k-neighbourhood (HVDM)
For the majority class: selectively
removing difficult examples or
relabeling them
The minority class: re-sampling unsafe
examples
weak or strong amplification
• amplify by creating as many
copies as O-safe examples in the
k-neighborhood
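One reading of the amplification step can be sketched as follows; this is an assumption for illustration, not the exact published SPIDER rule (which distinguishes weak and strong amplification):

```python
def amplify(example, neighbour_labels, minority="MIN"):
    """Amplification sketch (assumed reading of the slide, not the exact
    SPIDER rule): an unsafe minority example is copied as many times as
    there are majority-class examples among its k nearest neighbours, so
    locally outnumbered minority examples are reinforced the most."""
    n_maj = sum(1 for lab in neighbour_labels if lab != minority)
    return [example] * n_maj

# Minority example with 3 majority examples among its 5 neighbours
copies = amplify("x", ["MIN", "MAJ", "MAJ", "MIN", "MAJ"])
print(len(copies))  # 3
```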
Experimental evaluation (rules, trees, k-NN)
Sensitivity
[Bar chart of sensitivity (0.0-1.0) over data sets: acl, breast-cancer, bupa, cleveland, ecoli, haberman, hepatitis, new-thyroid, pima; methods: BaseLine, SMOTE, NCR, Relabel, Weak Amp, Strong Amp]
All informed methods outperform the baseline approach
NCR – the highest improvement of sensitivity (haberman 0.386, bupa
0.353) but deteriorates specificity
Relabeling or strong amplification – the second best approach
(7 of 9 sets) for sensitivity + good balance with specificity (G-mean)
More: J.Stefanowski, Sz.Wilk. Selective pre-processing of imbalanced data for
improving classification performance. DAWAK 2008
Specificity
[Bar chart of specificity (0.0-1.0) over the same data sets; methods: Baseline, SMOTE, NCR, Relabel, Weak Amp, Strong Amp]
SMOTE - Synthetic Minority Oversampling Technique
N.Chawla, Hall, Kegelmeyer 2002
For each p from the minority class:
Find its k nearest neighbours, also from the minority class (HVDM distance)
Randomly select o of these neighbours (o – the amount of over-sampling desired)
Generate a synthetic example along the line between p and the randomly selected neighbour n
Can work with mixed attributes
o influential!
However …
x_new = p + δ · (n − p), with δ random in [0, 1]
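The interpolation step can be sketched as:

```python
import random

def smote_point(p, n, rng=random):
    """Generate one synthetic minority example on the segment between the
    seed p and its selected minority-class neighbour n:
    x_new = p + delta * (n - p), with delta drawn uniformly from [0, 1]."""
    delta = rng.random()
    return [pi + delta * (ni - pi) for pi, ni in zip(p, n)]

x = smote_point([0.0, 0.0], [1.0, 2.0])  # lies on the segment (0,0)-(1,2)
```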
SMOTE limitations
„Blind” over-generalization in the direction of majority-class neighbors
The number of synthetic examples o is a global parameter, not easy to tune
Insufficient for a minority class with a distant small sub-part
All examples treated as equally important
SMOTE Extensions
Combine with post-processing (filters)
Internal modifications
Borderline SMOTE
SafeLevel SMOTE
LN-SMOTE
…
LN-SMOTE / Maciejewski, Stefanowski, IEEE CIDM 2011
Analyse the neighbourhoods of the seed p and its selected neighbour n
Safe level sl(): the number of minority examples among the k neighbours
Compare sl(p) and sl(n); generate the new x closer to the safer local region
Another view on local safe levels (p, n)
Generation of x if n is in the majority class:
Direct x more towards the minority example p by modifying the random interval depending on sl(n)/k
LN-SMOTE 2 – combination with the edited nearest neighbour rule: first remove difficult unsafe examples from the majority class
T. Maciejewski, J. Stefanowski: Local Neighbourhood Extension of SMOTE for Mining Imbalanced Data.
Proc. of IEEE Symposium on Computational Intelligence and Data Mining, Paris, IEEE Press, 2011, 104—111
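The direction-adjusting idea can be sketched as below; this is a deliberate simplification (the interval manipulation in the published LN-SMOTE is more involved):

```python
import random

def ln_smote_point(p, n, sl_n, k, rng=random):
    """Simplified sketch of the LN-SMOTE idea: when the selected neighbour
    n is a majority example, shrink the random interval by sl(n)/k, where
    sl(n) counts minority examples around n, so the synthetic example
    stays closer to the (safer) minority seed p."""
    delta = rng.random() * (sl_n / k)
    return [pi + delta * (ni - pi) for pi, ni in zip(p, n)]

# With sl(n) = 0 (no minority examples around n) the synthetic example
# collapses onto the seed p itself.
print(ln_smote_point([1.0, 1.0], [5.0, 5.0], sl_n=0, k=5))  # [1.0, 1.0]
```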
C4.5 trees F-measure
[Bar chart comparing the methods]
LN-SMOTE – the highest improvements (balance 7.25, solar flare 5.24)
The best for 11 of 14 sets; LN-SMOTE ver. 2 > LN-SMOTE ver. 1
Similar results for sensitivity and G-mean; also for PART rules and k-NN
Rule based classifiers
Rules / algorithms for inducing rule sets and
constructing rule-based classifiers
Rule classifiers quite sensitive to the class imbalance
problem
Then about the newest BRACID rule algorithm
IF (age > 35) THEN buy A
IF (income < 1000) and (gender = male) THEN not buy A
IF (income > 3000) and (education = high) THEN buy A
Rule classifiers and class imbalance
Data Ecoli: 336 objects, 35 in the minority class (M); 7 numerical attributes
MODLEM (no pruning) 18 rules, with 7 for the minority class
r1. (a7<0.62) & (a5>=0.11) => (Dec=O); [230, 76.41%, 100%]
r2. (a1<0.75) & (a6>=0.78) & (a5<0.57) => (Dec=O); [27, 8.97%, 100%]
r3. (a1<0.46) => (Dec=O); [148, 49.17%, 100%]
r4. (a1<0.75) & (a5<0.63) & (a2 ∈ [0.49,0.6]) => (Dec=O); [65, 21.59%, 100%]
r5. (a1<0.75) & (a7<0.74) & (a2>=0.46) => (Dec=O); [135, 44.85%, 100%]
r6. (a2>=0.45) & (a6>=0.75) & (a1<0.69) => (Dec=O); [34, 11.3%, 100%]
...
r12. (a7>=0.62) & (a6<0.78) & (a2<0.49) & (a1 ∈ [0.57,0.68]) => (Dec=M); [6, 17.14%, 100%]
r13. (a7>=0.62) & (a6<0.76) & (a5<0.65) & (a1 ∈ [0.73,0.82]) => (Dec=M); [7, 20%, 100%]
r14. (a7>=0.74) & (a1>=0.47) & (a2>=0.45) & (a6<0.75) & (a5>=0.59) => (Dec=M); [3, 8.57%, 100%]
r15. (a5>=0.56) & (a1>=0.49) & (a2 ∈ [0.42,0.44]) => (Dec=M); [3, 8.57%, 100%]
r16. (a7>=0.74) & (a2 ∈ [0.53,0.54]) => (Dec=M); [2, 5.71%, 100%]
...
Classification strategies:
Multiple matching? Voting with supports
No matching? – partial matching or nearest rules
Rule induction - limitations for the minority class
Greedy sequential covering and top-down approach
→ data fragmentation + small disjuncts
Connected with the evaluation criteria in the search
(biased toward the majority classes)
The maximum-generality bias selects the most general set of
conditions that satisfy the positive examples
Biased classification strategies (decision lists and unordered lists)
More in our study
K. Napierała, J. Stefanowski: BRACID: a comprehensive approach to rule induction from
imbalanced data. Journal of Intelligent Information Systems, 2012.
Modifications of rule algorithms
However, they address single limitations. Review:
K.Napierała: Improving Rule Classifiers for Imbalanced Data. Ph.D. Thesis, PUT, 2013.
K. Napierała, J. Stefanowski: BRACID: a comprehensive approach to rule induction from
imbalanced data. Journal of Intelligent Information Systems, 2012.
More on related works
Changing search for candidates
Use another inductive bias
Modification of CN2 to prevent small disjuncts [R. C. Holte, L. E. Acker,
B. W. Porter 1989]
PN-Rule: optimization precision and recall [M. Joshi, R. Agarwal, V. Kumar
2001]
ABMODLEM – Expert’s explanations [K.Napierała, J.Stefanowski 2009]
Use less greedy search for rules
Exhaustive depth-bounded search for accurate conjunctions
BRUTE (P. Riddle, R. Segal and O. Etzioni 1994)
Modification of an Apriori-like algorithm to handle multiple levels of
support (Liu et al.), EXPLORE [J. Stefanowski, Sz. Wilk 2004]
RLSD – one class learning
[J. Zhang, E. Bloedorn, L. Rosen, and D. Venese 2004]
Changing rule pruning
Minority rules more general [A. An, N. Cercone and X. Huang 2001]
Other related works
Changing classification strategies
Increasing rule evaluation measures [J. Grzymala-Busse, L. Goodwin, W. Grzymala-Busse, X.Zheng 2000]
ABMODLEM [K.Napierała, J.Stefanowski 2009]
SHRINK: overlapping region for minority class [M. Kubat , R. Holte , S. Matwin 1997]
Rough Sets Inspirations
Upper approximations in MODLEM [Stefanowski, Wilk 2004]
Fuzzy Rough Set Approximations [Y. Liu, B. Feng and G. Bai 2008]
Genetic / Evolutionary looking for rules
Specific genetic search – more powerful global search [Freitas and Lavington;
Weiss et al. 2004; A. Orriols-Puig, E. Bernadó-Mansilla 2007; C. Milar, G. Batista, A.
Carvalho 2011]
Using flexible double representations of knowledge
BRACID: Bottom-up induction of Rules And Cases from Imbalanced Data
Co-authored with Krystyna Napierała (Ph.D.)
More comprehensive handling various algorithmic and data
difficulty factors
Assumptions:
Hybrid knowledge representation: rule and instances
Induction rules by bottom-up strategy
Abandoning greedy sequential covering
Some inspirations from RISE [P.Domingos 1996]
Considering info about types of difficult examples
Local neighbours with HVDM
Internal evaluation criterion (F-measure)
Local nearest rules classification strategy
More
K. Napierała, J. Stefanowski: BRACID: a comprehensive approach to rule induction from
imbalanced data. Journal of Intelligent Information Systems, 2012.
From instances to rules: bottom-up generalization
Single example „seed” for the most specific rule
x1 (a1=L),(a2=s),(a3=2),(a4=3) – Class A
r1: IF (a1=L) and (a2=s) and (a3=2) and (a4=3) THEN Class A
Bottom up generalization
New examples and the nearest rule
x2 (a1=L),(a2=t),(a3=2.7),(a4=3) – Class A
r1’: IF (a1=L) and (a3 ∈ [2;2.7]) and (a4=3) THEN Class A
Rule syntax
For a nominal attribute: (a_i = value_ij)
For a numerical attribute: (vi_lower ≤ a_i ≤ vi_upper)
BRACID: Bottom-up induction of Rules And Cases from Imbalanced Data
BRACID(Examples ES)
1  RS = ES
2  Ready_rules = empty_set
3  Labels = Calculate labels for minority class examples
4  Iteration = 0
5  Repeat
6    For each rule R in RS not belonging to Ready_rules
7      If R’s class is the minority class
8        Find Ek = k nearest examples to R not already covered by it, and of R’s class
9        If Labels[R’s seed] = safe
10         Improved = AddBestRule(Ek, R, RS)
11       Else
12         Improved = AddAllGoodRules(Ek, R, RS)
13       If Improved = false and not Iteration = 0
14         Extend(R)
15         Add R to Ready_rules
16     Else  # R’s class is the majority class
17       Find Ek = k nearest examples to R not already covered by it, and of R’s class
18       Improved = AddBestRule(Ek, R, RS, Label[R’s seed])
19       If Improved = false
20         If Iteration = 0  # Treat as noise
21           Remove R from RS and R’s seed from ES
22         Else
23           Add R to Ready_rules
24 Until no rule improves the evaluation
25 Return RS
• Bottom-up
• Non-sequential covering
• Evaluation of new rules
with the F-measure – efficient
updating of classification records
• Local neighbourhood
analysis of examples
• Remove noisy majority
examples
• Generalize the minority
rule with more
candidates
BRACID: Bottom-up induction of Rules And Cases from Imbalanced Data
Classification strategy: the 1st nearest rule
Conflicts with many nearest rules from both classes:
calculate the total support
BRACID – experimental evaluation
Aims:
Many different imbalanced data sets
Study BRACID components:
Comparative study against well-known rule classifiers:
CN2
MODLEM
C4.5 rules
RIPPER
PART
MODLEM-C (multiplier classification strategy)
RISE
Consider also hybrids with informed pre-processing:
PART + SMOTE
Analyse statistics of rules
Comparing BRACID vs. classifiers - G-mean
Dataset BRACID RISE kNN C4.5rules CN2 PART RIPPER Modlem Modlem-C
abalone 0,65 0,34 0,36 0,57 0,40 0,42 0,42 0,48 0,51
b-cancer 0,56 0,54 0,47 0,49 0,46 0,53 0,48 0,49 0,53
car 0,87 0,75 0,08 0,86 0,71 0,94 0,71 0,88 0,88
cleveland 0,57 0,23 0,08 0,26 0,00 0,38 0,26 0,15 0,23
cmc 0,64 0,51 0,52 0,59 0,26 0,54 0,25 0,47 0,54
credit-g 0,61 0,54 0,57 0,55 0,47 0,60 0,44 0,56 0,65
ecoli 0,83 0,64 0,70 0,72 0,28 0,55 0,59 0,57 0,63
haberman 0,58 0,38 0,33 0,43 0,35 0,47 0,36 0,40 0,53
hepatitis 0,75 0,60 0,62 0,51 0,05 0,55 0,50 0,50 0,64
new-thyroid 0,98 0,95 0,92 0,90 0,92 0,95 0,91 0,88 0,90
solar-flareF 0,64 0,14 0,00 0,27 0,00 0,32 0,02 0,13 0,32
transfusion 0,64 0,51 0,53 0,58 0,34 0,60 0,27 0,53 0,58
vehicle 0,94 0,90 0,91 0,91 0,51 0,92 0,92 0,92 0,94
yeast-ME2 0,71 0,44 0,34 0,51 0,00 0,42 0,45 0,34 0,37
BRACID is also better for F-measure and sensitivity, and better than PART + pre-processing
More K.Napierała, J. Stefanowski: BRACID Int. Journal of Intelligent Information Systems, 2012.
BRACID – components
Various elements inside BRACID – sensitivity
C – classification strategy; N – remove majority examples;
E – extend minority rules
[Bar chart of sensitivity for the variants: BRACID, BRACID-C, BRACID-N-C, BRACID-N-E-C]
BRACID rules
Comparing BRACID variants – support of minority rules
[Bar chart of minority rule supports: BRACID vs BRACID-N-E]
BRACID – summary
BRACID improves recognition of the minority class
Also G-mean, F-measure, and others
Better than other rule classifiers
Competitive to SMOTE/ENN used with rule classifiers
Usually more rules but with higher supports
Testing examples – good for border and rare ones (+ safe)
However, still think about
Other classification strategies
Possibly reducing the number of considered rules
Using Expert Explanations on Difficult Examples
Allow experts to give reasons why the most difficult
examples are assigned to a given class
Particularly for rare and outliers in the minority class
Argument based machine learning (Bratko et al.
2004)
Our adaptation into AB-MODLEM rule induction
Proposing a strategy to select examples to explain
Using active learning
Experiments: improved recognition of the minority
class without decreasing it for the majority classes
More: K. Napierała, J. Stefanowski: Argument Based Generalization of MODLEM Rule Induction
Algorithm. In Proc. RSCTC 2010, 138-147.
Generalizations of Ensembles
Data preprocessing + ensemble
Boosting-based:
• SMOTEBoost, DataBoost
Bagging-based:
• Exactly Balanced Bagging
• Roughly Balanced Bagging
• OverBagging
• UnderOverBagging
• SMOTEBagging
• Ensemble Variation
Hybrid: IIvotes
Others or Hybrid (EasyEnsemble)
Cost Sensitive Boosting
AdaCost (C1-C3)
RareBoost
Related: Galar et al., A Review on Ensembles for the Class Imbalance Problem. IEEE TDKE 2011.
Evaluations of New Ensembles for Imbalanced Data
Comparative studies
Galar, Herrera et al. [2011]
Simpler generalizations better
than more complex or cost
based ones
Khoshgoftaar et al. [2011]
EBBag, RBBag better than
SMOTEBoost and RUSBoost
Our study on bagging [2013]
RBBag, EBBag > OverBag >
SMOTEBag > Bagging
[Bar chart over data sets: car, cleveland, ecoli, hepatitis, solar flare, yeast; methods: Bag, EBBag, RBBag, Over, SMOTEB, BSMOTE]
J. Blaszczynski, J., Stefanowski, L. Idkowiak: Extending bagging for imbalanced data.
Proc. CORES 2013.
Neighbourhood Balanced Bagging
Current ensembles: standard sampling with equal probabilities,
integrated with post-processing methods
Think differently:
Modify the probability of drawing an example into the
bootstrap depending on its „nature”
Shift toward unsafe minority examples while decreasing
the probability for the majority class
Use local characteristics of examples
Another view on the bootstrap size (hybrid)
Some inspirations from our earlier Variable Consistency Bagging
[LEGO ECMLPKDD09]
More J.Błaszczyński, J. Stefanowski: Local Neighbourhood in Generalizing Bagging
for Imbalanced Data. COPEM – ECMLPKDD 2013.
Modifications of Sampling
Global level:
p_min = 1 (minority class)
p_maj = N_min / N_maj (decrease via the inverse of the global imbalance)
Local level:
„The more unsafe the example, the more amplified its probability”
The weight of a minority example grows with the number of majority
examples among its k nearest neighbours (minority local neighbourhood)
Final aggregation combines P_global and P_local
[Plot: amplification factor rising from safe (≈0.5) to unsafe (≈2.5) examples]
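The sampling modification amounts to a weighted bootstrap; a minimal sketch, assuming the per-example weights have already been computed from each example's k-neighbourhood (amplifying unsafe minority examples, shrinking majority ones):

```python
import random

def nb_bootstrap(examples, labels, weights, rng=random):
    """Neighbourhood-weighted bootstrap sketch: draw a bootstrap sample of
    the same size, with per-example probabilities proportional to the given
    weights (assumed precomputed from each example's k-neighbourhood)."""
    n = len(examples)
    idx = rng.choices(range(n), weights=weights, k=n)
    return [examples[i] for i in idx], [labels[i] for i in idx]

# Extreme illustrative weights: only the first example can be drawn
sample_x, sample_y = nb_bootstrap(["a", "b", "c"], [1, 0, 0], [1.0, 0.0, 0.0])
print(sample_x)  # ['a', 'a', 'a']
```

Each base classifier of the ensemble is then trained on one such bootstrap.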
Experiments with Neighbourhood Balanced Bagging
Under-bagging (NBuBag)
Better than its over-sampling variant NBoBag
Better than other extensions of bagging except RBBag
Sensitivity: NBuBag over RBBag (Wilcoxon)
G-mean and F-measure: NBuBag comparable to RBBag, but according
to Friedman better than the others
Analysis of bootstraps:
RBBag and NBBag strongly change the data characteristics toward
safer types of minority examples
Some other observations
Ensemble performance vs. labeling data
Safe examples – all ensembles work similarly
Unsafe … strong differences
RBBag and NBBag – superior for the difficult datasets
G-mean         SMOBag  OverBag  RBBag  NNBag
Balance scale    0       1.8    58.15  61.07
Ecoli           58.38   51.42   72.91  86.52
New thyroid     95.18   95.38   97.15  97.02
More J.Błaszczyński, J. Stefanowski: Local Neighbourhood in Generalizing Bagging
for Imbalanced Data. COPEM – ECMLPKDD 2013.
Solar flare   Safe %  Border %  Rare %  Outlier %
bagging        2.33    41.86    16.28   39.53
NBBag         84.83     7.67     3.76    3.74
RBBag         70.52    21.37     2.69    5.43
Some questions for discussing
Better understanding the imbalance problem
Multi-class imbalanced data
Tuning local methods (the size of the neighborhood)
depending on data characteristics
Better detecting sub-concepts
Studies with simulated or real data
Real model simulation
Evaluation issues
Incremental learning
Data shift and drifting concepts
Some references:
J. Stefanowski: Overlapping, Rare Examples and Class Decomposition in Learning Classifiers from Imbalanced Data. In: Emerging Paradigms in Machine Learning, 2013.
K. Napierała, J. Stefanowski: BRACID: a comprehensive approach to rule induction from imbalanced data. Journal of Intelligent Information Systems, vol. 39, no. 2, 2012, 335-373.
K.Napierała, J.Stefanowski: Identification of Different Types of Minority Class Examples in Imbalanced Data. Proc. HAIS 2012, Part II, LNAI vol. 7209, Springer Verlag 2012, 139–150.
K.Napierała, J.Stefanowski, Sz.Wilk: Learning from Imbalanced Data in Presence of Noisy and Borderline Examples. RSCTC 2010, LNAI vol. 6086, 2010, 158-167
T. Maciejewski, J. Stefanowski: Local Neighbourhood Extension of SMOTE for Mining Imbalanced Data. Proc. of IEEE Symposium on Computational Intelligence and Data Mining, Paris, IEEE Press, 2011, 104—111.
J.Stefanowski, Sz.Wilk: Improving Rule Based Classifiers Induced by MODLEM by Selective Pre-processing of Imbalanced Data. Proceedings of the RSKT Workshop ECML/PKDD, 2007, 54-65.
J.Stefanowski, Sz.Wilk: Selective pre-processing of imbalanced data for improving classification performance. Proc. of 10th Int. Conf. DaWaK 2008,LNCS vol. 5182, 2008, 283-292.
J. Błaszczyński, M. Deckert, J. Stefanowski, Sz. Wilk: IIvotes Ensemble for Imbalanced Data. Intelligent Data Analysis journal, vol. 16(5), 2012, 777–801.
K. Napierała, J. Stefanowski: Argument Based Generalization of MODLEM Rule Induction Algorithm. In Proc. RSCTC 2010, LNAI vol. 6086, Springer 2010, 138-147.
J.W. Grzymala-Busse, J. Stefanowski, S. Wilk: A Comparison of Two Approaches to Data Mining from Imbalanced Data. Proc. of the 8th Int. Conference KES 2004, LNCS vol. 3213, 2004, 757-763.
Thanks for your attention
Contact, remarks:
http://www.cs.put.poznan.pl/jstefanowski