Incremental Approach to Interpretable Classification Rule Learning
Bishwamittra Ghosh and Kuldeep S. Meel
School of Computing, National University of Singapore
CP 2019
Bishwamittra Ghosh Incremental Approach to Interpretable Classification Rule Learning CP 2019 1
Introduction
Practical applications of machine learning
- Hiring employees
- Giving a loan to a person
- Predicting recidivism: the likelihood of a person convicted of a crime to offend again
- . . .
Should we believe the prediction of machine learning models?
Interpretable classification model
Introduction
Example Dataset
Introduction
Representation of an interpretable model and a black box model
A sample is predicted as Iris Versicolor if
(sepal length > 6.3 OR sepal width > 3 OR petal width ≤ 1.5)
AND (sepal width ≤ 2.7 OR petal length > 4 OR petal width > 1.2)
AND (petal length ≤ 5)

(Figure: Interpretable Model vs. Black Box Model)
Introduction
Formula
- A CNF (Conjunctive Normal Form) formula is a conjunction of clauses, where each clause is a disjunction of literals:

  (a ∨ ¬b ∨ c) ∧ (d ∨ e)

- A DNF (Disjunctive Normal Form) formula is a disjunction of clauses, where each clause is a conjunction of literals:

  (a ∧ b ∧ ¬c) ∨ (d ∧ e)

- Decision rules in CNF and DNF are highly interpretable [Malioutov'18; Lakkaraju'19]
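The two definitions can be made concrete with a small evaluator, a minimal sketch for illustration only (the helper names and the literal representation are ours, not from the talk):

```python
# Illustrative helpers: a literal is (variable, is_negated); a formula is a
# list of clauses, each clause a list of literals.

def eval_cnf(clauses, assignment):
    """CNF is satisfied iff every clause has at least one true literal."""
    return all(
        any(assignment[var] != negated for var, negated in clause)
        for clause in clauses
    )

def eval_dnf(clauses, assignment):
    """DNF is satisfied iff some clause has all of its literals true."""
    return any(
        all(assignment[var] != negated for var, negated in clause)
        for clause in clauses
    )

# (a ∨ ¬b ∨ c) ∧ (d ∨ e)
cnf = [[("a", False), ("b", True), ("c", False)], [("d", False), ("e", False)]]
# (a ∧ b ∧ ¬c) ∨ (d ∧ e)
dnf = [[("a", False), ("b", False), ("c", True)], [("d", False), ("e", False)]]

assignment = {"a": True, "b": True, "c": False, "d": True, "e": False}
print(eval_cnf(cnf, assignment))  # True: both clauses have a true literal
print(eval_dnf(dnf, assignment))  # True: the first conjunct is fully true
```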
Preliminaries
Definition of interpretability in rule-based classifiers
- There exist different notions of interpretability of rules

  R = (a ∨ b ∨ ¬c ∨ d ∨ e) ∧ (f ∨ g ∨ h ∨ ¬i) ∧ (j ∨ k ∨ ¬l) ∧ (¬m ∨ n ∨ o ∨ p ∨ q) ∧ . . .

  R = (a ∨ b ∨ ¬c) ∧ (f ∨ g)

- Rules with fewer terms are considered interpretable in medical domains [Letham'15]
- We refer to rule size as a proxy for interpretability in rule-based classifiers
- For rules expressed as CNF/DNF, rule size = number of literals
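Under this definition, rule size is a one-line computation. The helper below is a hypothetical sketch (literals written as plain strings), not part of any library:

```python
def rule_size(clauses):
    """Rule size = total number of literals across all clauses (CNF or DNF)."""
    return sum(len(clause) for clause in clauses)

# R = (a ∨ b ∨ ¬c) ∧ (f ∨ g): 3 + 2 = 5 literals
short_rule = [["a", "b", "~c"], ["f", "g"]]
# The longer rule above has clauses of 5, 4, 3, and 5 literals
long_rule = [["a", "b", "~c", "d", "e"], ["f", "g", "h", "~i"],
             ["j", "k", "~l"], ["~m", "n", "o", "p", "q"]]
print(rule_size(short_rule))  # 5
print(rule_size(long_rule))   # 17
```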
Design of an interpretable rule-based classifier
Outline
1 Introduction
2 Preliminaries
3 Design of an interpretable rule-based classifier
4 Incremental learning
5 Experimental Evaluation
6 Conclusion
Design of an interpretable rule-based classifier
Design of an interpretable classifier [Malioutov’18]
- We design the objective function to
  - minimize prediction error
  - minimize rule size (i.e., maximize interpretability)
- Consider decision variables:
  - feature variables b_i^j = 1{j-th feature is selected in i-th clause}
  - noise variables η_q = 1{sample q is misclassified}

  min Σ_{i,j} b_i^j + λ Σ_q η_q

- Constraints:
  - a positively labeled sample satisfies the rule
  - a negatively labeled sample does not satisfy the rule
  - otherwise the sample is considered noise
Design of an interpretable rule-based classifier
MaxSAT
In MaxSAT:
- Hard clause: must always be satisfied, weight = ∞
- Soft clause: can be falsified, weight ∈ R+

MaxSAT finds an assignment that satisfies all hard clauses and maximizes the total weight of satisfied soft clauses
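These semantics can be illustrated with a toy brute-force solver. This is purely for intuition under our own clause representation; real MaxSAT solvers (as used in this line of work) are far more sophisticated:

```python
from itertools import product

def brute_force_maxsat(n_vars, hard, soft):
    """hard: list of clauses; soft: list of (clause, weight).
    A clause is a list of ints: +v means variable v, -v its negation.
    Returns (best_weight, assignment) over assignments satisfying all
    hard clauses, or None if the hard clauses are unsatisfiable."""
    best = None
    for bits in product([False, True], repeat=n_vars):
        def sat(clause):
            # A clause holds if at least one literal is true.
            return any(bits[abs(l) - 1] == (l > 0) for l in clause)
        if not all(sat(c) for c in hard):
            continue  # hard clauses have infinite weight: never violated
        weight = sum(w for c, w in soft if sat(c))
        if best is None or weight > best[0]:
            best = (weight, bits)
    return best

# Hard: (x1 ∨ x2); soft: ¬x1 (weight 1), ¬x2 (weight 1), x2 (weight 3)
best = brute_force_maxsat(2, hard=[[1, 2]],
                          soft=[([-1], 1), ([-2], 1), ([2], 3)])
print(best)  # (4, (False, True)): x1 = False, x2 = True satisfies ¬x1 and x2
```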
Design of an interpretable rule-based classifier
MaxSAT-based approach for interpretable rule-based classification
- The objective function is encoded as soft clauses
- The constraints are encoded as hard clauses

Analysis
- To generate a k-clause CNF rule for a dataset of n samples over m boolean features, the number of clauses of the MaxSAT instance is O(n · m · k)
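The O(n · m · k) growth is easy to make concrete using the dataset sizes from the experiments later in the talk. The choice k = 2 below is our own assumption for illustration, and constant factors are ignored:

```python
def maxsat_clause_count(n, m, k):
    """Asymptotic O(n·m·k) clause count, ignoring constant factors."""
    return n * m * k

# n, m taken from the experiments table; k = 2 is an assumed clause count.
for name, n, m in [("PIMA", 768, 134), ("Twitter", 49999, 1050)]:
    print(name, maxsat_clause_count(n, m, 2))
# For Twitter this is already on the order of 10^8 clauses, which is why
# solving one monolithic instance scales poorly.
```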
Incremental learning
An Incremental Rule-learning Approach [Ghosh’19]
- We attribute the poor scalability to the large formula size of the MaxSAT instance
- We propose mini-batch incremental learning
Incremental learning
Solution Technique
- We propose a mini-batch incremental learning framework with the following objective function on batch t:

  min Σ_{i,j} b_i^j · I(b_i^j) + λ Σ_q η_q

  where the indicator function I(·) is defined as follows:

  I(b_i^j) = { −1  if b_i^j = 1 in the (t − 1)-th batch (t ≠ 1)
             {  1  otherwise
Incremental learning
Continued. . .
In the (t − 1)-th batch, we learn the assignment:
- b1 = 0
- b2 = 1
- b3 = 0
- b4 = 1

In the t-th batch, we construct the soft unit clauses:
- ¬b1
- b2
- ¬b3
- b4
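The step above can be sketched in a few lines of Python. Variable names follow the slide; the string-based clause representation is our own assumption, not IMLI's actual API:

```python
def soft_unit_clauses(prev_assignment):
    """Map a learned assignment {var: 0/1} from batch t-1 to soft unit
    clauses for batch t: the variable itself if it was 1, its negation
    (written "~var") otherwise. Satisfying these clauses rewards the
    solver for keeping the previous batch's rule."""
    return [var if value == 1 else "~" + var
            for var, value in prev_assignment.items()]

prev = {"b1": 0, "b2": 1, "b3": 0, "b4": 1}
print(soft_unit_clauses(prev))  # ['~b1', 'b2', '~b3', 'b4']
```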
Experimental Evaluation
Experimental Results
Experimental Evaluation
Accuracy and training time of different classifiers
Dataset         Size n   Features m   LR               SVC               RIPPER           IMLI
PIMA            768      134          75.32 (0.3s)     75.32 (0.37s)     75.32 (2.58s)    73.38 (0.74s)
Credit-default  30000    334          80.81 (6.87s)    80.69 (847.93s)   80.97 (20.37s)   79.41 (32.58s)
Twitter         49999    1050         95.67 (3.99s)    Timeout           95.56 (98.21s)   94.69 (59.67s)

Table: Each cell in the classifier columns reports test accuracy (%) and training time (s).
IMLI achieves better training time at the cost of a small drop in accuracy
Experimental Evaluation
Size of rules of different rule-based classifiers
Dataset RIPPER IMLI
PIMA 8.25 3.5
Twitter 21.6 6
Credit 14.25 3
Table: Average size of the rules of different rule-based models.
IMLI generates shorter rules compared to other rule-based models
Conclusion
Conclusion
- Interpretable ML models ensure the reliability of prediction models in practice
- We propose an incremental learning approach for classification rules
- IMLI achieves up to three orders of magnitude improvement in training time while sacrificing a small amount of accuracy
- The generated rules appear to be more interpretable
Python library:

  $ pip install rulelearning
Thank You !!
Source code: https://github.com/meelgroup/MLIC