The impact of local data
characteristics on learning
from imbalanced data
Jerzy Stefanowski
Institute of Computing Sciences,
Poznań University of Technology
Poland
JRS, Madrid, June 12, 2014
Outline of my talk
Local characteristics of imbalanced data
Data difficulty factors
Identification method
Informed pre-processing
Rule induction
Extensions of bagging - if time?
Acknowledgement of co-operation with:
Krystyna Napierała, Jerzy Błaszczyński, Szymon Wilk
My master students (PUT)
Tomasz Maciejewski (LN-SMOTE)
Łukasz Idkowiak (Extensions of bagging)
Imbalanced data
Class imbalance: the minority class includes a much smaller
number of examples than the other (majority) classes
A minority class is often of primary interest
Classifiers are biased:
focused on the more frequent classes, …
Better recognition of majority classes and difficulties in
classifying new objects from the minority class
An information retrieval system on highly imbalanced data
(minority class 1%) achieves 99.8% total accuracy, but fails to
recognize the important class
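The accuracy paradox above can be checked with a few lines; the counts below are hypothetical, for a classifier that always predicts the majority class on a 1%-minority dataset:

```python
# Accuracy paradox sketch (hypothetical counts): a trivial classifier that
# always predicts the majority class on a 1%-minority dataset.
n_min, n_maj = 10, 990
tp, fn = 0, n_min        # every minority example is missed
tn, fp = n_maj, 0        # every majority example is correct

accuracy = (tp + tn) / (n_min + n_maj)
sensitivity = tp / (tp + fn)   # recall of the minority class

print(accuracy)      # 0.99
print(sensitivity)   # 0.0
```

High total accuracy, zero recognition of the class of primary interest.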
[Figure: scatter plot of majority (+) and minority (–) class examples]
Evaluation issues
Performance for the minority class
Sensitivity and specificity,
ROC curve analysis + AUC
Other curves (PR), …
Aggregated measures:

              Predicted
                +    -
Original   +   TP   FN
           -   FP   TN

Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)
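These measures follow directly from the confusion matrix; a minimal sketch with hypothetical counts (the G-mean aggregate used later in the talk is included):

```python
import math

def sensitivity(tp, fn):
    """Recall of the minority (positive) class: TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Recognition of the majority (negative) class: TN / (TN + FP)."""
    return tn / (tn + fp)

def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity and specificity."""
    return math.sqrt(sensitivity(tp, fn) * specificity(tn, fp))

# Hypothetical confusion-matrix counts
tp, fn, fp, tn = 30, 10, 50, 910
print(sensitivity(tp, fn))          # 0.75
print(round(specificity(tn, fp), 3))  # 0.948
```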
More about occurrence of class imbalance
Medical Problems – rare but dangerous illness
Technical Diagnostics - Helicopter Gearbox Fault
Monitoring
Document Filtering
Direct Marketing
Detection of Oil Spills
Detection of Fraudulent Telephone Calls
Increase of interest in the ML and DM community
Known in applications
Special events:
AAAI’2000 Workshop (org.: R. Holte, N. Japkowicz, C. Ling, S. Matwin)
ICML’2000 Workshop, also on cost-sensitive learning (org.: T. Dietterich et al.)
ICML’2003 Workshop (org.: N. Chawla, N. Japkowicz, A. Kolcz)
ECAI 2004 Workshop (org.: C. Ferri, P. Flach, J. Orallo, N. Lachice)
ICML 2007 – Session on Learning from Imbalanced Data
PAKDD 2009 – Workshop on Imbalanced Data
ICMLA 2012 – Special Session (org.: Japkowicz, Stefanowski, Chawla)
Special issues: ACM SIG KDD Explorations Newsletter, editors: N.
Chawla, N. Japkowicz, A. Kolcz
Book: He H., Ma Y. (eds.): Imbalanced Learning: Foundations, Algorithms and Applications. IEEE - Wiley, 2013.
Specialized methods
General categorization of approaches
Data level (preprocessing)
Algorithm level
Different methods:
Re-sampling or re-weighting
Modify inductive bias, search, evaluation criteria
New classification strategies
Hybrid algorithms
Transformation to „cost-sensitive learning”
Ensemble approaches (boosting, bagging or cost versions …)
One-class learning …
Some reviews:
He H., Ma Y. (eds.): Imbalanced Learning: Foundations, Algorithms and Applications. IEEE - Wiley, 2013.
Weiss G.M.: Mining with rarity: a unifying framework. ACM SIGKDD Explorations Newsletter, 2004.
Chawla N.: Data mining for imbalanced datasets: an overview. In: The Data Mining and Knowledge Discovery Handbook, Springer 2005.
He H., Garcia E.: Learning from imbalanced data. IEEE Trans. on Knowledge and Data Engineering, 2009.
More Research on Class Imbalance?
Although many methods were proposed in the last decade:
Heuristic approaches,…
Different performance, ….
Complex characteristics of imbalanced data
„Nature” of imbalanced data
Minority vs. majority classes only?
Focus on data difficulty factors and
characteristics of local distributions!
N_min = 5%
N_maj = 95%
Related works on data difficulty factors
Studies on simulated, artificial data: N. Japkowicz and co-operators 2002-2004
V. Garcia, J. Sanchez, R. Mollineda 2007
Prati, Batista, Monard 2004, Stefanowski et al. 2010, 2013
V. López, F. Herrera et al. 2012, 2014
Imbalanced data distributions
Why is it difficult?
The nature of the problem with respect to data distributions
[Jo, Japkowicz 2004]
Fragmentation and small disjuncts
[Garcia, Sanchez, Mollineda 2007]
Overlapping and rare examples
Data difficulty factors
• Lack of data and global class imbalance
• Small disjuncts, difficult borderline, rare examples, minority outliers
Other more complex data
J. Stefanowski 2009
Factors (concept shape, minority sub-clusters, overlapping and rare examples vs. global imbalance ratio) + non-linear shapes
Standard classifiers
Decomposition more influential than the imbalance ratio
Overlapping and rare examples even more
More in:
JS: Overlapping, Rare Examples and Class Decomposition
in Learning Classifiers from Imbalanced Data, 2013
Paw and clover data sets
Data Difficulty Factors: Types of Examples
Distinguish four types of minority examples:
SAFE
BORDER
RARE
OUTLIER
Visualization of imbalanced data
[Figure panels: ECOLI (borderline), CLEVELAND (rare & outlier), THYROID (safe)]
t-SNE visualisation
t-Distributed Stochastic Neighbor Embedding (t-SNE)
Keeps the probability that i is a neighbor of j / local distances
[Figure panels: New Thyroid (safe, border), Ecoli]
Analyse the distribution in the local neighbourhood
K-NN (k=5, and higher)
Distance HVDM
Alternatives – kernel functions
Ratio of minority to majority neighbours among k = 5:
5:0, 4:1 → SAFE; 3:2, 2:3 → BORDER; 1:4 → RARE; 0:5 → OUTLIER
Identification of examples – local approach
Details: K. Napierała, J. Stefanowski: The influence of minority class distribution on
learning from imbalanced data. HAIS, 2012.
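The labeling scheme can be sketched in a few lines; the thresholds below encode the k = 5 mapping from the talk (the published method's handling of rare examples also inspects the neighbours' neighbours, so this is a simplification):

```python
def label_minority_example(n_minority_neighbours, k=5):
    """Map the number of same-class (minority) neighbours among the k
    nearest neighbours of a minority example to the four types:
    5:0, 4:1 -> safe; 3:2, 2:3 -> borderline; 1:4 -> rare; 0:5 -> outlier."""
    ratio = n_minority_neighbours / k
    if ratio > 0.7:
        return "safe"
    if ratio >= 0.3:
        return "borderline"
    if ratio > 0:
        return "rare"
    return "outlier"

print([label_minority_example(m) for m in range(5, -1, -1)])
# ['safe', 'safe', 'borderline', 'borderline', 'rare', 'outlier']
```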
Choosing distance for mixed attributes
HVDM – Heterogeneous Value Distance Metric
The component distance for nominal attributes: the value
difference metric by Stanfill and Waltz
Comparison to other heterogeneous distance measures
(HEOM, Gower, GEM and variants of Mahalonobis)
J. Laurikkala et al. / Wilson, Martinez
Tuning k (5,7,9,..) – experimental studies
HVDM(x, y) = sqrt( Σ_a d_a(x_a, y_a)² )
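A simplified sketch of HVDM for mixed attributes, assuming the per-attribute statistics (standard deviations for numeric attributes, class-conditional value probabilities for nominal ones) are precomputed; the two-attribute example and all names are illustrative:

```python
import math

def hvdm(x, y, numeric, sigma, vdm):
    """Simplified HVDM sketch for mixed attributes:
    - numeric attribute a: |x_a - y_a| / (4 * sigma_a)
    - nominal attribute a: value difference metric over class-conditional
      probabilities P(class | value), in the style of Stanfill and Waltz."""
    total = 0.0
    for a, (xa, ya) in enumerate(zip(x, y)):
        if a in numeric:
            d = abs(xa - ya) / (4.0 * sigma[a])
        else:
            px, py = vdm[a][xa], vdm[a][ya]
            d = math.sqrt(sum((px[c] - py[c]) ** 2 for c in px))
        total += d * d
    return math.sqrt(total)

# Hypothetical data: attribute 0 numeric, attribute 1 nominal
sigma = {0: 1.0}
vdm = {1: {"a": {"+": 1.0, "-": 0.0}, "b": {"+": 0.0, "-": 1.0}}}
print(hvdm((0.0, "a"), (4.0, "b"), numeric={0}, sigma=sigma, vdm=vdm))
# ≈ 1.732 (sqrt(3))
```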
Re-discovery of known distributions
Labeling Minority Examples UCI Data sets
Most data sets unsafe
Mixed types of minority examples
Very unsafe distribution of the minority examples:
cleveland 51% outliers, no safe ones
solar flare, balance scale
Majority class quite safe:
yeast 98.5% safe
ecoli 91.7% safe
Experiences with k (7,9,..) or kernels: similar categorizations of data
Experimental setup
Aim: compare standard popular classifiers and then pre-
processing methods (specialized for class imbalance) and
verify differences in performance with respect to the types of
the minority class examples in data
Classifiers: SVM, ANN, k-NN, trees (J4.8) and rules (PART)
Pre-processing: random over-sampling, NCR, SMOTE and
SPIDER
Measures: sensitivity, specificity, G-mean, F1
Design of experiments:
10-fold stratified cross validation,
22 UCI real world imbalanced datasets
Friedman Test
Rankings of classifiers in data categories (average ranks)

Safe / border:    Rare / outlier:
SVM   5.5         PART  5.9
PART  4.7         J48   5.5
RBF   4.5         1NN   4.2
J48   4.1         RBF   3.6
1NN   3.3         3NN   2.8
3NN   2.2         SVM   2.5

1NN, J48: similar orders
RBF: R/O slightly better

More: K. Napierała: Improving rule classifiers for imbalanced data. Ph.D. Thesis, PUT, 2013.
K. Napierała, J. Stefanowski: The influence of minority class distribution on learning from imbalanced data. HAIS, 2012
Friedman Tests
Pre-processing with respect to labels of testing examples (average ranks)

Border:        Rare:           Outlier:
NCR    4.3     SPIDER  3.8     SMOTE   4.3
SPIDER 3.8     NCR     3.8     SPIDER  3.5
SMOTE  3.4     SMOTE   3.2     NCR     3.4
RO     2.0     RO      2.8     RO      2.2
NONE   1.3     NONE    1.4     NONE    1.5
Modeling local data characteristics
Lessons
Most data sets contain unsafe types of examples
Outliers and rare cases are an important part of the minority class
distributions
The global imbalance ratio and the size of data are not as influential
as the types of examples
Local data characteristics differentiate the performance of standard
classifiers and pre-processing methods
The attempt to approximate their competence
Classifiers:
S: all classifiers comparable
B: SVM > trees/rules, RBF > kNN
R/O: trees/rules, 1NN > others
Preprocessing:
B: undersampling (NCR)
R: hybrid (SPIDER) > SMOTE
O: SMOTE
Modeling local data characteristics
Some open issues
Tune the size of the neighbourhood depending on the data set
Noise vs. unsafe examples
Outliers and rare examples in an empty sub-space
Looking for small disjuncts: which methods?
Different local „density”: how to find and exploit it
Lower vs. higher attribute dimensionality (hubs)
Multi-class imbalance
Evaluation issues
Now basic info on applying local data
characteristics in algorithms specialized
for class imbalances
1. Improving informed pre-processing methods
2. Local neighbourhood inside rule induction
3. Extensions of sampling in bagging
Pre-processing approaches
Transform the original data distribution
Simple random sampling:
„Over-sampling” – minority class
„Under-sampling” – majority class
Informed and focused transformations
Filtering majority examples:
• One-side-sampling (Kubat, Matwin) + Tomek Links
• NCR - Laurikkala’s edited nearest neighbor rule
Oversampling with synthetic examples:
• SMOTE, Chawla et al.
• Borderline SMOTE, Safe Level, …
Hybrid ones:
• SPIDER
• SMOTE and undersampling filters
Modifications of ensembles
[Diagram: imbalanced data → transformation → new dataset]
He, Garcia, Learning from imbalanced data. IEEE Trans. DKE , 21(9), 2009.
Selective Preprocessing of Imbalanced Data (SPIDER)
J.Stefanowski, Sz.Wilk, ECMLPKDD workshop 2007
Hybrid approach: limited filtering
and local copying of some minority
examples
Two phases:
Identifying types of examples with the
k-neighbourhood (HVDM)
For the majority class: selectively
removing difficult examples or
relabeling them
The minority class: re-sampling unsafe
examples
weak or strong amplification
• amplify by creating as many
copies as O-safe examples in the
k-neighborhood
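One reading of the amplification step can be sketched as follows; this is an assumption for illustration, not the exact published SPIDER rule (which distinguishes weak and strong amplification):

```python
def amplify(example, neighbour_labels, minority="MIN"):
    """Amplification sketch (assumed reading of the slide, not the exact
    SPIDER rule): an unsafe minority example is copied as many times as
    there are majority-class examples among its k nearest neighbours, so
    locally outnumbered minority examples are reinforced the most."""
    n_maj = sum(1 for lab in neighbour_labels if lab != minority)
    return [example] * n_maj

# Minority example with 3 majority examples among its 5 neighbours
copies = amplify("x", ["MIN", "MAJ", "MAJ", "MIN", "MAJ"])
print(len(copies))  # 3
```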
Experimental evaluation (rules, trees, k-NN)
Sensitivity
[Bar chart of sensitivity (0.0-1.0) over data sets: acl, breast-cancer, bupa, cleveland, ecoli, haberman, hepatitis, new-thyroid, pima; methods: BaseLine, SMOTE, NCR, Relabel, Weak Amp, Strong Amp]
All informed methods outperform the baseline approach
NCR – the highest improvement of sensitivity (haberman 0.386, bupa
0.353) but deteriorates specificity
Relabeling or strong amplification – the second best approach
(7 of 9 sets) for sensitivity + good balance with specificity (G-mean)
More: J.Stefanowski, Sz.Wilk. Selective pre-processing of imbalanced data for
improving classification performance. DAWAK 2008
Specificity
[Bar chart of specificity (0.0-1.0) over the same data sets; methods: Baseline, SMOTE, NCR, Relabel, Weak Amp, Strong Amp]
SMOTE - Synthetic Minority Oversampling Technique
N.Chawla, Hall, Kegelmeyer 2002
For each p from the minority class:
Find its k nearest neighbours, also from the minority class (HVDM distance)
Randomly select o of these neighbours (o – the amount of over-sampling desired)
Generate a synthetic example along the line between p and the randomly selected neighbour n
Can work with mixed attributes
o influential!
However …
x_new = p + δ · (n − p), with δ random in [0, 1]
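The interpolation step can be sketched as:

```python
import random

def smote_point(p, n, rng=random):
    """Generate one synthetic minority example on the segment between the
    seed p and its selected minority-class neighbour n:
    x_new = p + delta * (n - p), with delta drawn uniformly from [0, 1]."""
    delta = rng.random()
    return [pi + delta * (ni - pi) for pi, ni in zip(p, n)]

x = smote_point([0.0, 0.0], [1.0, 2.0])  # lies on the segment (0,0)-(1,2)
```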
SMOTE limitations
„Blind” over-generalization in the direction of majority-class neighbors
The number of synthetic examples o is a global parameter, not easy to tune
Insufficient for a minority class with a distant small sub-part
All examples treated as equally important
SMOTE Extensions
Combine with post-processing (filters)
Internal modifications
Borderline SMOTE
SafeLevel SMOTE
LN-SMOTE
…
LN-SMOTE / Maciejewski, Stefanowski, IEEE CIDM 2011
Analyse the neighbourhoods of the seed p and its selected neighbour n
Safe level sl(): the number of minority examples among the k neighbours
Compare sl(p) and sl(n); generate the new x closer to the safer local region
Another view on local safe levels (p, n)
Generation of x if n is in the majority class:
Direct x more towards the minority example p by modifying the random interval depending on sl(n)/k
LN-SMOTE 2 – combination with the edited nearest neighbour rule: first remove difficult unsafe examples from the majority class
T. Maciejewski, J. Stefanowski: Local Neighbourhood Extension of SMOTE for Mining Imbalanced Data.
Proc. of IEEE Symposium on Computational Intelligence and Data Mining, Paris, IEEE Press, 2011, 104—111
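The direction-adjusting idea can be sketched as below; this is a deliberate simplification (the interval manipulation in the published LN-SMOTE is more involved):

```python
import random

def ln_smote_point(p, n, sl_n, k, rng=random):
    """Simplified sketch of the LN-SMOTE idea: when the selected neighbour
    n is a majority example, shrink the random interval by sl(n)/k, where
    sl(n) counts minority examples around n, so the synthetic example
    stays closer to the (safer) minority seed p."""
    delta = rng.random() * (sl_n / k)
    return [pi + delta * (ni - pi) for pi, ni in zip(p, n)]

# With sl(n) = 0 (no minority examples around n) the synthetic example
# collapses onto the seed p itself.
print(ln_smote_point([1.0, 1.0], [5.0, 5.0], sl_n=0, k=5))  # [1.0, 1.0]
```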
C4.5 trees F-measure
[Bar chart comparing the methods]
LN-SMOTE – the highest improvements (balance 7.25, solar flare 5.24)
The best for 11 of 14 sets; LN-SMOTE ver. 2 > LN-SMOTE ver. 1
Similar results for sensitivity and G-mean; also for PART rules and k-NN
Rule based classifiers
Rules / algorithms for inducing rule sets and
constructing rule-based classifiers
Rule classifiers quite sensitive to the class imbalance
problem
Then about the newest BRACID rule algorithm
IF (age > 35) THEN buy A
IF (income < 1000) and (gender = male) THEN not buy A
IF (income > 3000) and (education = high) THEN buy A
Rule classifiers and class imbalance
Data Ecoli: 336 objects, 35 in the minority class (M); 7 numerical attributes
MODLEM (no pruning) 18 rules, with 7 for the minority class
r1. (a7<0.62) & (a5>=0.11) => (Dec=O); [230, 76.41%, 100%]
r2. (a1<0.75) & (a6>=0.78) & (a5<0.57) => (Dec=O); [27, 8.97%, 100%]
r3. (a1<0.46) => (Dec=O); [148, 49.17%, 100%]
r4. (a1<0.75) & (a5<0.63) & (a2 ∈ [0.49,0.6]) => (Dec=O); [65, 21.59%, 100%]
r5. (a1<0.75) & (a7<0.74) & (a2>=0.46) => (Dec=O); [135, 44.85%, 100%]
r6. (a2>=0.45) & (a6>=0.75) & (a1<0.69) => (Dec=O); [34, 11.3%, 100%]
...
r12. (a7>=0.62) & (a6<0.78) & (a2<0.49) & (a1 ∈ [0.57,0.68]) => (Dec=M); [6, 17.14%, 100%]
r13. (a7>=0.62) & (a6<0.76) & (a5<0.65) & (a1 ∈ [0.73,0.82]) => (Dec=M); [7, 20%, 100%]
r14. (a7>=0.74) & (a1>=0.47) & (a2>=0.45) & (a6<0.75) & (a5>=0.59) => (Dec=M); [3, 8.57%, 100%]
r15. (a5>=0.56) & (a1>=0.49) & (a2 ∈ [0.42,0.44]) => (Dec=M); [3, 8.57%, 100%]
r16. (a7>=0.74) & (a2 ∈ [0.53,0.54]) => (Dec=M); [2, 5.71%, 100%]
...
Classification strategies:
Multiple matching? Voting with supports
No matching? – partial matching or nearest rules
Rule induction - limitations for the minority class
Greedy sequential covering and top-down approach
→ data fragmentation + small disjuncts
Connected with the evaluation criteria in the search
(biased toward the majority classes)
The maximum-generality bias selects the most general set of
conditions that satisfy the positive examples
Biased classification strategies (decision lists and unordered lists)
More in our study
K. Napierała, J. Stefanowski: BRACID: a comprehensive approach to rule induction from
imbalanced data. Journal of Intelligent Information Systems, 2012.
Modifications of rule algorithms
However, they address single limitations. Review:
K.Napierała: Improving Rule Classifiers for Imbalanced Data. Ph.D. Thesis, PUT, 2013.
K. Napierała, J. Stefanowski: BRACID: a comprehensive approach to rule induction from
imbalanced data. Journal of Intelligent Information Systems, 2012.
More on related works
Changing search for candidates
Use another inductive bias
Modification of CN2 to prevent small disjuncts [R. C. Holte, L. E. Acker,
B. W. Porter 1989]
PN-Rule: optimization precision and recall [M. Joshi, R. Agarwal, V. Kumar
2001]
ABMODLEM – Expert’s explanations [K.Napierała, J.Stefanowski 2009]
Use less greedy search for rules
Exhaustive depth-bounded search for accurate conjunctions
BRUTE (P. Riddle, R. Segal and O. Etzioni 1994)
Modification of an Apriori-like algorithm to handle multiple levels of
support (Liu et al.), EXPLORE [J. Stefanowski, Sz. Wilk 2004]
RLSD – one class learning
[J. Zhang, E. Bloedorn, L. Rosen, and D. Venese 2004]
Changing rule pruning
Minority rules more general [A. An, N. Cercone and X. Huang 2001]
Other related works
Changing classification strategies
Increasing rule evaluation measures [J. Grzymala-Busse, L. Goodwin, W. Grzymala-Busse, X.Zheng 2000]
ABMODLEM [K.Napierała, J.Stefanowski 2009]
SHRINK: overlapping region for minority class [M. Kubat , R. Holte , S. Matwin 1997]
Rough Sets Inspirations
Upper approximations in MODLEM [Stefanowski, Wilk 2004]
Fuzzy Rough Set Approximations [Y. Liu, B. Feng and G. Bai 2008]
Genetic / Evolutionary looking for rules
Specific genetic search – more powerful global search [Freitas and Lavington;
Weiss et al. 2004; A. Orriols-Puig, E. Bernadó-Mansilla 2007; C. Milar, G. Batista, A.
Carvalho 2011]
Using flexible double representations of knowledge
BRACID: Bottom-up induction of Rules And Cases from Imbalanced Data
Co-authored with Krystyna Napierała (Ph.D.)
More comprehensive handling various algorithmic and data
difficulty factors
Assumptions:
Hybrid knowledge representation: rule and instances
Induction rules by bottom-up strategy
Abandoning greedy sequential covering
Some inspirations from RISE [P.Domingos 1996]
Considering info about types of difficult examples
Local neighbours with HVDM
Internal evaluation criterion (F-measure)
Local nearest rules classification strategy
More
K. Napierała, J. Stefanowski: BRACID: a comprehensive approach to rule induction from
imbalanced data. Journal of Intelligent Information Systems, 2012.
From instances to rules: bottom-up generalization
Single example „seed” for the most specific rule
x1 (a1=L),(a2=s),(a3=2),(a4=3) – Class A
r1: IF (a1=L) and (a2=s) and (a3=2) and (a4=3) THEN Class A
Bottom up generalization
New examples and the nearest rule
x2 (a1=L),(a2=t),(a3=2.7),(a4=3) – Class A
r1’: IF (a1=L) and (a3 ∈ [2;2.7]) and (a4=3) THEN Class A
Rule syntax
For a nominal attribute: (a_i = value_ij)
For a numerical attribute: (vi_lower ≤ a_i ≤ vi_upper)
BRACID: Bottom-up induction of Rules And Cases from Imbalanced Data
BRACID(Examples ES)
1  RS = ES
2  Ready_rules = empty_set
3  Labels = Calculate labels for minority class examples
4  Iteration = 0
5  Repeat
6    For each rule R in RS not belonging to Ready_rules
7      If R’s class is the minority class
8        Find Ek = k nearest examples to R not already covered by it, and of R’s class
9        If Labels[R’s seed] = safe
10         Improved = AddBestRule(Ek, R, RS)
11       Else
12         Improved = AddAllGoodRules(Ek, R, RS)
13       If Improved = false and not Iteration = 0
14         Extend(R)
15         Add R to Ready_rules
16     Else  # R’s class is the majority class
17       Find Ek = k nearest examples to R not already covered by it, and of R’s class
18       Improved = AddBestRule(Ek, R, RS, Label[R’s seed])
19       If Improved = false
20         If Iteration = 0  # Treat as noise
21           Remove R from RS and R’s seed from ES
22         Else
23           Add R to Ready_rules
24 Until no rule improves the evaluation
25 Return RS
• Bottom-up
• Non-sequential covering
• Evaluation of new rules
with the F-measure – efficient
updating of classification records
• Local neighbourhood
analysis of examples
• Remove noisy majority
examples
• Generalize the minority
rule with more
candidates
BRACID: Bottom-up induction of Rules And Cases from Imbalanced Data
Classification strategy: the 1st nearest rule
Conflicts with many nearest rules from both classes:
calculate the total support
BRACID – experimental evaluation
Aims:
Many different imbalanced data sets
Study BRACID components:
Comparative study against well-known rule classifiers:
CN2
MODLEM
C4.5 rules
RIPPER
PART
MODLEM-C (multiplier classification strategy)
RISE
Consider also hybrids with informed pre-processing:
PART + SMOTE
Analyse statistics of rules
Comparing BRACID vs. classifiers - G-mean
Dataset BRACID RISE kNN C4.5rules CN2 PART RIPPER Modlem Modlem-C
abalone 0,65 0,34 0,36 0,57 0,40 0,42 0,42 0,48 0,51
b-cancer 0,56 0,54 0,47 0,49 0,46 0,53 0,48 0,49 0,53
car 0,87 0,75 0,08 0,86 0,71 0,94 0,71 0,88 0,88
cleveland 0,57 0,23 0,08 0,26 0,00 0,38 0,26 0,15 0,23
cmc 0,64 0,51 0,52 0,59 0,26 0,54 0,25 0,47 0,54
credit-g 0,61 0,54 0,57 0,55 0,47 0,60 0,44 0,56 0,65
ecoli 0,83 0,64 0,70 0,72 0,28 0,55 0,59 0,57 0,63
haberman 0,58 0,38 0,33 0,43 0,35 0,47 0,36 0,40 0,53
hepatitis 0,75 0,60 0,62 0,51 0,05 0,55 0,50 0,50 0,64
new-thyroid 0,98 0,95 0,92 0,90 0,92 0,95 0,91 0,88 0,90
solar-flareF 0,64 0,14 0,00 0,27 0,00 0,32 0,02 0,13 0,32
transfusion 0,64 0,51 0,53 0,58 0,34 0,60 0,27 0,53 0,58
vehicle 0,94 0,90 0,91 0,91 0,51 0,92 0,92 0,92 0,94
yeast-ME2 0,71 0,44 0,34 0,51 0,00 0,42 0,45 0,34 0,37
BRACID is also better for F-measure and sensitivity, and better than PART + pre-processing
More K.Napierała, J. Stefanowski: BRACID Int. Journal of Intelligent Information Systems, 2012.
BRACID – components
Various elements inside BRACID – sensitivity
C – classification strategy; N – remove majority examples;
E – extend minority rules
[Bar chart of sensitivity for the variants: BRACID, BRACID-C, BRACID-N-C, BRACID-N-E-C]
BRACID rules
Comparing BRACID variants – support of minority rules
[Bar chart of minority rule supports: BRACID vs BRACID-N-E]
BRACID – summary
BRACID improves recognition of the minority class
Also G-mean, F-measure, and others
Better than other rule classifiers
Competitive to SMOTE/ENN used with rule classifiers
Usually more rules but with higher supports
Testing examples – good for border and rare ones (+ safe)
However, still think about
Other classification strategies
Possibly reducing the number of considered rules
Using Expert Explanations on Difficult Examples
Allow experts to give reasons why the most difficult
examples are assigned to a given class
Particularly for rare and outliers in the minority class
Argument based machine learning (Bratko et al.
2004)
Our adaptation into AB-MODLEM rule induction
Proposing a strategy to select examples to explain
Using active learning
Experiments: improved recognition of the minority
class without decreasing it for the majority classes
More: K. Napierała, J. Stefanowski: Argument Based Generalization of MODLEM Rule Induction
Algorithm. In Proc. RSCTC 2010, 138-147.
Generalizations of Ensembles
Data preprocessing + ensemble
Boosting-based:
• SMOTEBoost, DataBoost
Bagging-based:
• Exactly Balanced Bagging
• Roughly Balanced Bagging
• OverBagging
• UnderOverBagging
• SMOTEBagging
• Ensemble Variation
Hybrid: IIvotes
Others or Hybrid (EasyEnsemble)
Cost Sensitive Boosting
AdaCost (C1-C3)
RareBoost
Related: Galar et al., A Review on Ensembles for the Class Imbalance Problem. IEEE TDKE 2011.
Evaluations of New Ensembles for Imbalanced Data
Comparative studies
Galar, Herrera et al. [2011]
Simpler generalizations better
than more complex or cost
based ones
Khoshgoftaar et al. [2011]
EBBag, RBBag better than
SMOTEBoost and RUSBoost
Our study on bagging [2013]
RBBag, EBBag > OverBag >
SMOTEBag > Bagging
[Bar chart over data sets: car, cleveland, ecoli, hepatitis, solar flare, yeast; methods: Bag, EBBag, RBBag, Over, SMOTEB, BSMOTE]
J. Blaszczynski, J., Stefanowski, L. Idkowiak: Extending bagging for imbalanced data.
Proc. CORES 2013.
Neighbourhood Balanced Bagging
Current ensembles: standard sampling with equal probabilities,
integrated with post-processing methods
Think differently:
Modify the probability of drawing an example into the
bootstrap depending on its „nature”
Shift toward unsafe minority examples while decreasing
the probability for the majority class
Use local characteristics of examples
Another view on the bootstrap size (hybrid)
Some inspirations from our earlier Variable Consistency Bagging
[LEGO ECMLPKDD09]
More J.Błaszczyński, J. Stefanowski: Local Neighbourhood in Generalizing Bagging
for Imbalanced Data. COPEM – ECMLPKDD 2013.
Modifications of Sampling
Global level:
p_min = 1 (minority class)
p_maj = N_min / N_maj (decrease via the inverse of the global imbalance)
Local level:
„The more unsafe the example, the more amplified its probability”
The weight of a minority example grows with the number of majority
examples among its k nearest neighbours (minority local neighbourhood)
Final aggregation combines P_global and P_local
[Plot: amplification factor rising from safe (≈0.5) to unsafe (≈2.5) examples]
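The sampling modification amounts to a weighted bootstrap; a minimal sketch, assuming the per-example weights have already been computed from each example's k-neighbourhood (amplifying unsafe minority examples, shrinking majority ones):

```python
import random

def nb_bootstrap(examples, labels, weights, rng=random):
    """Neighbourhood-weighted bootstrap sketch: draw a bootstrap sample of
    the same size, with per-example probabilities proportional to the given
    weights (assumed precomputed from each example's k-neighbourhood)."""
    n = len(examples)
    idx = rng.choices(range(n), weights=weights, k=n)
    return [examples[i] for i in idx], [labels[i] for i in idx]

# Extreme illustrative weights: only the first example can be drawn
sample_x, sample_y = nb_bootstrap(["a", "b", "c"], [1, 0, 0], [1.0, 0.0, 0.0])
print(sample_x)  # ['a', 'a', 'a']
```

Each base classifier of the ensemble is then trained on one such bootstrap.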
Experiments with Neighbourhood Balanced Bagging
Under-bagging (NBuBag)
Better than its over-sampling variant NBoBag
Better than other extensions of bagging except RBBag
Sensitivity: NBuBag over RBBag (Wilcoxon)
G-mean and F-measure: NBuBag comparable to RBBag, but according
to Friedman better than the others
Analysis of bootstraps:
RBBag and NBBag strongly change the data characteristics toward
safer types of minority examples
Some other observations
Ensemble performance vs. labeling data
Safe examples – all ensembles work similarly
Unsafe … strong differences
RBBag and NBBag – superior for the difficult datasets
G-mean         SMOBag  OverBag  RBBag  NNBag
Balance scale    0       1.8    58.15  61.07
Ecoli           58.38   51.42   72.91  86.52
New thyroid     95.18   95.38   97.15  97.02
More J.Błaszczyński, J. Stefanowski: Local Neighbourhood in Generalizing Bagging
for Imbalanced Data. COPEM – ECMLPKDD 2013.
Solar flare   Safe %  Border %  Rare %  Outlier %
bagging        2.33    41.86    16.28   39.53
NBBag         84.83     7.67     3.76    3.74
RBBag         70.52    21.37     2.69    5.43
Some questions for discussing
Better understanding the imbalance problem
Multi-class imbalanced data
Tuning local methods (the size of the neighborhood)
depending on data characteristics
Better detecting sub-concepts
Studies with simulated or real data
Real model simulation
Evaluation issues
Incremental learning
Data shift and drifting concepts
Some references:
J. Stefanowski: Overlapping, Rare Examples and Class Decomposition in Learning Classifiers from Imbalanced Data. In: Emerging Paradigms in Machine Learning, 2013.
K. Napierała, J. Stefanowski: BRACID: a comprehensive approach to rule induction from imbalanced data. Journal of Intelligent Information Systems, vol. 39, no. 2, 2012, 335-373.
K.Napierała, J.Stefanowski: Identification of Different Types of Minority Class Examples in Imbalanced Data. Proc. HAIS 2012, Part II, LNAI vol. 7209, Springer Verlag 2012, 139–150.
K.Napierała, J.Stefanowski, Sz.Wilk: Learning from Imbalanced Data in Presence of Noisy and Borderline Examples. RSCTC 2010, LNAI vol. 6086, 2010, 158-167
T. Maciejewski, J. Stefanowski: Local Neighbourhood Extension of SMOTE for Mining Imbalanced Data. Proc. of IEEE Symposium on Computational Intelligence and Data Mining, Paris, IEEE Press, 2011, 104—111.
J.Stefanowski, Sz.Wilk: Improving Rule Based Classifiers Induced by MODLEM by Selective Pre-processing of Imbalanced Data. Proceedings of the RSKT Workshop ECML/PKDD, 2007, 54-65.
J.Stefanowski, Sz.Wilk: Selective pre-processing of imbalanced data for improving classification performance. Proc. of 10th Int. Conf. DaWaK 2008,LNCS vol. 5182, 2008, 283-292.
J. Błaszczyński, M. Deckert, J. Stefanowski, Sz. Wilk: IIvotes Ensemble for Imbalanced Data. Intelligent Data Analysis journal, vol. 16(5), 2012, 777–801.
K. Napierała, J. Stefanowski: Argument Based Generalization of MODLEM Rule Induction Algorithm. In Proc. RSCTC 2010, LNAI vol. 6086, Springer 2010, 138-147.
J.W. Grzymala-Busse, J. Stefanowski, S. Wilk: A Comparison of Two Approaches to Data Mining from Imbalanced Data. Proc. of the 8th Int. Conference KES 2004, LNCS vol. 3213, 2004, 757-763.
Thanks for your attention
Contact, remarks:
http://www.cs.put.poznan.pl/jstefanowski