Learning Maximum Likelihood Bounded Semi-Naïve Bayesian Network Classifier

Kaizhu Huang, Irwin King, Michael R. Lyu
Multimedia Information Processing Laboratory
The Chinese University of Hong Kong, Shatin, NT, Hong Kong
{kzhuang, king, lyu}@cse.cuhk.edu.hk

SMC2002, October 8, 2002, Hammamet, Tunisia


SMC 2002, October 8, 2002 The Chinese University of Hong Kong Multimedia Information Processing Lab

Outline

  • Abstract
  • Background
  • Classifiers
  • Naïve Bayesian Classifiers
  • Semi-Naïve Bayesian Classifiers
  • Chow-Liu Tree
  • Bounded Semi-Naïve Bayesian Classifiers
  • Experimental Results
  • Discussion
  • Conclusion


Abstract

We propose a technique for constructing semi-naïve Bayesian classifiers.

  • The number of variables that can be combined into one node is bounded.
  • It has a lower computational cost than traditional semi-naïve Bayesian networks.
  • Experiments show the proposed technique is more accurate.


A Typical Classification Problem

Given a set of symptoms, one wants to find out whether these symptoms give rise to a particular disease.


Background

Classifiers

Given a pre-classified dataset D = {(x_1, c_1), ..., (x_N, c_N)}, where x_i ∈ R^m is a training sample in m-dimensional real space and c_i is its class label, a classifier is defined as a mapping function f: R^m → {C_1, ..., C_L} that satisfies f(x_i) = c_i on the training data.


Background

Probabilistic Classifiers

The classification mapping function is defined as

    c(x) = argmax_C P(C | x) = argmax_C P(x | C) P(C) / P(x),

where P(x) is a constant for a given x. The joint probability P(x | C) is not easily estimated from the dataset, so an assumption about the distribution has to be made, e.g., are the attributes dependent or independent?


Related Work

Naïve Bayesian Classifiers (NB)

Assumption: given the class label C, the attributes x_1, ..., x_m are independent:

    P(x_1, ..., x_m | C) = ∏_{i=1..m} P(x_i | C).

Classification mapping function:

    c(x) = argmax_C P(C) ∏_{i=1..m} P(x_i | C).
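The NB classification rule above can be sketched as follows. The toy symptom/disease data and the Laplace smoothing constant are illustrative assumptions, not taken from the slides:

```python
from collections import Counter, defaultdict
import math

class NaiveBayes:
    """Naive Bayes for discrete attributes: c(x) = argmax_C P(C) * prod_i P(x_i | C)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing (an assumption; the slides do not specify)

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.priors = {c: math.log(sum(1 for t in y if t == c) / len(y))
                       for c in self.classes}
        self.counts = defaultdict(Counter)   # (class, attribute index) -> value counts
        self.values = defaultdict(set)       # attribute index -> observed values
        for x, c in zip(X, y):
            for i, v in enumerate(x):
                self.counts[(c, i)][v] += 1
                self.values[i].add(v)
        return self

    def predict(self, x):
        def log_post(c):
            s = self.priors[c]
            for i, v in enumerate(x):
                cnt = self.counts[(c, i)]
                # smoothed estimate of P(x_i = v | c)
                s += math.log((cnt[v] + self.alpha) /
                              (sum(cnt.values()) + self.alpha * len(self.values[i])))
            return s
        return max(self.classes, key=log_post)

# usage: toy "symptoms -> disease" data, echoing the classification example above
X = [("fever", "cough"), ("fever", "no_cough"),
     ("no_fever", "cough"), ("no_fever", "no_cough")]
y = ["flu", "flu", "cold", "healthy"]
clf = NaiveBayes().fit(X, y)
print(clf.predict(("fever", "cough")))  # -> flu
```

Working in log space avoids underflow when the number of attributes m is large.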


Related Work

Naïve Bayesian Classifiers

NB's performance is comparable with some state-of-the-art classifiers, even though its independence assumption does not hold in most real cases.

Question: Can the performance be better when the conditional independence assumption of NB is relaxed?


Related Work

Semi-Naïve Bayesian Classifiers (SNB)

A looser assumption than NB: independence holds among the joined variables, given the class label C.
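The relaxation can be written out explicitly. The partition notation below is a reconstruction consistent with the slides' description (the B_k are disjoint groups of attributes), not copied from them:

```latex
% NB: full conditional independence of the attributes given C
P(x_1,\dots,x_n \mid C) = \prod_{i=1}^{n} P(x_i \mid C)

% SNB: independence only among joined variables B_1,\dots,B_m,
% where B_1,\dots,B_m partition the attribute set \{x_1,\dots,x_n\}
P(x_1,\dots,x_n \mid C) = \prod_{k=1}^{m} P(B_k \mid C)
```

NB is the special case in which every B_k contains exactly one attribute.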


Related Work

Chow-Liu Tree (CLT)

Another looser assumption than NB: a dependence tree exists among the variables, given the class variable C.

[Figure: a tree dependence structure]


Summary of Related Work

  • CLT: a conditional tree dependency assumption among variables. Chow & Liu (1968) developed a globally optimal, polynomial-time algorithm.
  • SNB: a conditional independence assumption among joined variables. Traditional SNBs are not as well developed as CLT.
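The Chow-Liu algorithm mentioned above builds a maximum-weight spanning tree over pairwise mutual information. A minimal sketch (the tiny dataset is illustrative, and class-conditioning is omitted for brevity):

```python
from collections import Counter
from itertools import combinations
import math

def mutual_information(data, i, j):
    """Empirical mutual information I(X_i; X_j) from rows of discrete data."""
    n = len(data)
    pij = Counter((row[i], row[j]) for row in data)
    pi = Counter(row[i] for row in data)
    pj = Counter(row[j] for row in data)
    mi = 0.0
    for (a, b), c in pij.items():
        mi += (c / n) * math.log((c / n) / ((pi[a] / n) * (pj[b] / n)))
    return mi

def chow_liu_tree(data):
    """Edges of a maximum-weight spanning tree over the attributes,
    weighted by pairwise mutual information (Kruskal's algorithm)."""
    m = len(data[0])
    edges = sorted(((mutual_information(data, i, j), i, j)
                    for i, j in combinations(range(m), 2)), reverse=True)
    parent = list(range(m))
    def find(u):                         # union-find with path halving
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u
    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                     # keep the edge only if it adds no cycle
            parent[ri] = rj
            tree.append((i, j))
    return tree

# usage: 4 binary attributes where x1 copies x0 and x3 copies x2,
# so the strongest edges should be (0,1) and (2,3)
data = [(0,0,1,1), (1,1,0,0), (0,0,0,0), (1,1,1,1), (0,0,1,1), (1,1,0,0)]
print(chow_liu_tree(data))
```

Computing the O(m^2) pairwise mutual informations plus the spanning tree gives the polynomial-time cost credited to Chow & Liu on the previous slide.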


Problems of Traditional SNBs

  • Kononenko91: local heuristic; not efficient (inefficient even when joining 3 variables); not accurate.
  • Pazzani96: local heuristic; not efficient (exponential time cost); not accurate.


Our Novel Bounded Semi-Naïve Bayesian Network

  • Accurate? We use a global combinatorial optimization method.
  • Efficient? We find the network based on linear programming, which can be solved in polynomial time.


Bounded Semi-Naïve Bayesian Network Model Definition

  • Joined variables completely cover the variable set without overlapping.
  • Conditional independence holds among the joined variables, given C.
  • The cardinality of each joined variable is bounded.


Constraining the Search Space

The search space is large. It is reduced by adding the following constraint: the cardinality of each joined variable is exactly equal to K.

Underlying principle: when K is small, one joined variable of cardinality K approximates the distribution more accurately than splitting it into several smaller joined variables. Example: P(a,b)P(c,d) is closer to P(a,b,c,d) than P(a,b)P(c)P(d).

Search space after reduction: the partitions of the variable set into K-cardinality subsets.
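The P(a,b)P(c,d) claim can be checked numerically: the gap between the two approximations, measured in KL divergence, is exactly the mutual information I(c;d) ≥ 0, so merging c and d never hurts. A small sketch (the random joint distribution is an illustrative assumption):

```python
import itertools
import math
import random

random.seed(0)

# a random joint distribution P(a,b,c,d) over four binary variables
states = list(itertools.product([0, 1], repeat=4))
weights = [random.random() for _ in states]
P = {s: w / sum(weights) for s, w in zip(states, weights)}

def marginal(idx):
    """Marginal distribution of P over the variable indices in idx."""
    out = {}
    for s, p in P.items():
        key = tuple(s[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out

def kl_to_product(partition):
    """KL(P || product of block marginals) for a partition of {0,1,2,3}."""
    margs = {blk: marginal(blk) for blk in partition}
    kl = 0.0
    for s, p in P.items():
        q = 1.0
        for blk in partition:
            q *= margs[blk][tuple(s[i] for i in blk)]
        kl += p * math.log(p / q)
    return kl

coarse = kl_to_product([(0, 1), (2, 3)])    # P(a,b) P(c,d)
fine = kl_to_product([(0, 1), (2,), (3,)])  # P(a,b) P(c) P(d)
print(coarse <= fine + 1e-12)               # True: the coarser partition is closer
```

The difference fine - coarse equals I(X_c; X_d) under P, which is always non-negative.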


Searching K-Bounded-SNB Model

How do we search for the appropriate model? Find the m = [n/K] K-cardinality subsets (joined variables) from the variable (feature) set which satisfy the SNB conditions and maximize the log likelihood. ([x] means rounding x to the nearest integer.)


Global Optimization Procedure

  • Constraints: no overlap among the joined variables; all the joined variables together form the variable set.
  • Relax the previous 0/1 constraints into 0 ≤ x ≤ 1: the integer programming (IP) problem is changed into a linear programming (LP) problem.
  • Rounding scheme: round the LP solution into an IP solution.
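One way to write the optimization down explicitly. The slides' exact objective coefficients are not recoverable from the transcript, so the entropy-based weights below are an assumption chosen to be consistent with maximizing the log likelihood of an SNB:

```latex
% one 0/1 indicator x_S per candidate K-cardinality subset S;
% taking w_S = -H(S \mid C) (negative conditional entropy) makes
% maximizing the objective equivalent to maximizing the SNB log likelihood
\max_{x} \; \sum_{S : |S| = K} w_S \, x_S
\quad \text{s.t.} \quad
\sum_{S \ni i} x_S = 1 \;\; \forall \text{ attribute } i,
\qquad x_S \in \{0, 1\}

% LP relaxation: replace the integrality constraint by
0 \le x_S \le 1
```

The equality constraints encode the two bullets above: each attribute is covered exactly once, so the chosen subsets neither overlap nor leave a variable out.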


Rounding Scheme


Experimental Setup

Datasets:
  • 6 benchmark datasets from the UCI machine learning repository
  • 1 synthetically generated dataset named "XOR"

Experimental environment:
  • Platform: Windows 2000
  • Development tool: Matlab 6.1


Experimental Results

[Table: Overall Prediction Rate (%)]

  • We set the bound parameter K to 2 and 3.
  • 2-BSNB means the BSNB model with the bound parameter set to 2.


Experimental Results

[Figure: Average Error Rate Chart — error rates (y-axis, 0 to 0.3) for NB, CLT, 2-BSNB, and 3-BSNB]


Results on Tic-Tac-Toe Dataset

[Figure: the 9 attributes of the Tic-Tac-Toe dataset, laid out as a 3×3 board numbered 1–9]


Observations

  • B-SNBs with large K are not good for sparse datasets. Post dataset: 90 samples; with K = 3, the accuracy decreases.
  • Which value of K is good depends on the properties of the dataset. For example, Tic-Tac-Toe and Vehicle have a 3-variable bias; with K = 3, the accuracy increases.


Discussion

When n cannot be divided by K exactly, i.e., (n mod K) = l with l ≠ 0, the assumption that all the joined variables have the same cardinality K is violated. Solution:
  • Find an l-cardinality joined variable with the minimum entropy.
  • Do the optimization on the other n − l variables, since ((n − l) mod K) = 0.

How to choose K? When the sample number of the dataset is small, a large K may not give good performance. A good K should be related to the nature of the dataset.

How to relax SNB further? SNB is still strongly constrained; upgrade it into a mixture of SNBs.
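The minimum-entropy step for the remainder variables can be sketched as follows. The toy data and the exhaustive search over subsets are illustrative assumptions; the slides do not specify how the l-subset is found:

```python
from collections import Counter
from itertools import combinations
import math

def joint_entropy(data, idx):
    """Empirical joint entropy (in nats) of the attributes in idx."""
    n = len(data)
    counts = Counter(tuple(row[i] for i in idx) for row in data)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def min_entropy_subset(data, l):
    """Pick the l-cardinality attribute subset with minimum joint entropy;
    the remaining n - l attributes can then be partitioned into K-subsets."""
    m = len(data[0])
    return min(combinations(range(m), l),
               key=lambda idx: joint_entropy(data, idx))

# usage: attribute 2 is constant, hence zero entropy, so it is chosen for l = 1
data = [(0, 1, 0), (1, 0, 0), (1, 1, 0), (0, 0, 0)]
print(min_entropy_subset(data, 1))  # -> (2,)
```

Choosing the lowest-entropy group for the leftover slot is consistent with the LP objective: low conditional entropy means the group costs the least log likelihood.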


Conclusion

  • A novel Bounded Semi-Naïve Bayesian classifier is proposed.
  • The direct combinatorial optimization method enables B-SNB to achieve global optimization.
  • The transformation from an IP problem into an LP problem reduces the computational complexity to polynomial.
  • It outperforms NB and CLT in our experiments.