PAC-Bayes Risk Bounds for
Sample-Compressed Gibbs Classifiers
ICML 2005
François Laviolette and Mario Marchand, Université Laval
PLAN
The “traditional” PAC-Bayes theorem (for the usual data-independent setting)
The “generalized” PAC-Bayes theorem (for the more general sample compression setting)
Implications and follow-ups
A result from folklore:
In particular, for Gibbs classifiers:
What if we choose P after observing the data?
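As a sketch of the folklore result (the exact constants on the original slide may differ): for any data-independent prior distribution P over a countable set H of classifiers, with probability at least 1 − δ over the random draw of an m-sample S,
\[
R(h) \;\le\; R_S(h) \;+\; \sqrt{\frac{\ln\frac{1}{P(h)} + \ln\frac{1}{\delta}}{2m}}
\quad \text{simultaneously for all } h \in \mathcal{H}.
\]
For a Gibbs classifier G_Q, which draws a fresh h ∼ Q for each prediction, R(G_Q) = E_{h∼Q} R(h) and R_S(G_Q) = E_{h∼Q} R_S(h), so the bound can be averaged over Q. The argument relies on P being fixed before the data are seen, which is exactly the issue raised by the last question.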
The “traditional” PAC-Bayes Theorem
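One common formulation (the Langford–Seeger form, presumably close to the Theorem 1 shown on the slide): for any data-independent prior P over H and any δ ∈ (0, 1], with probability at least 1 − δ over the draw of S ∼ D^m, simultaneously for all posteriors Q,
\[
\mathrm{kl}\!\left(R_S(G_Q)\,\middle\|\,R(G_Q)\right) \;\le\; \frac{\mathrm{KL}(Q\,\|\,P) + \ln\frac{m+1}{\delta}}{m},
\]
where kl(q‖p) = q ln(q/p) + (1 − q) ln((1 − q)/(1 − p)) is the binary KL divergence and KL(Q‖P) is the KL divergence between the posterior and the prior.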
The Gibbs and the majority vote
We have a bound for G_Q, but we normally use instead the Bayes classifier B_Q (the Q-weighted majority vote classifier).
Consequently R(B_Q) ≤ 2 R(G_Q) (this can be improved with the “de-randomization” technique of Langford and Shawe-Taylor, 2003).
So the PAC-Bayes theorem also gives a bound on the Majority vote classifier.
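The factor of 2 comes from a standard Markov-inequality argument (a sketch, writing W_Q(x, y) = E_{h∼Q} 1[h(x) ≠ y] for the Q-weighted disagreement): B_Q errs on (x, y) only when at least half of the Q-mass errs there, so
\[
R(B_Q) \;\le\; \Pr_{(x,y)\sim D}\!\left[ W_Q(x,y) \ge \tfrac12 \right]
\;\le\; 2 \mathop{\mathbf{E}}_{(x,y)\sim D} W_Q(x,y) \;=\; 2\,R(G_Q).
\]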
The sample compression setting
Theorem 1 is valid in the usual data-independent setting where H is defined without reference to the training data.
Example: H = the set of all linear classifiers h: R^n → {−1, +1}.
In the more general sample compression setting, each classifier is identified by 2 different sources of information:
The compression set: an (ordered) subset of the training set.
A message string of additional information needed to identify a classifier.
Theorem 1 is not valid in this more general setting
To be more precise: in the sample compression setting, there exists a “reconstruction” function R that gives a classifier
h = R(σ, S_i)
when given a compression set S_i and a message string σ.
Recall that S_i is an ordered subset of the training set S where the order is specified by i = (i_1, i_2, …, i_|i|).
Examples
Set Covering Machines (SCM) [Marchand and Shawe-Taylor, JMLR 2002]
Decision List Machines (DLM) [Marchand and Sokolova, JMLR 2005]
Support Vector Machines (SVM), nearest neighbour classifiers (NNC), …
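As a purely illustrative sketch (not taken from the slides) of how reconstruction works for one of these examples: for a hard-margin SVM, the compression set can be taken to be the support vectors and the message string can be empty, since retraining on the support vectors alone recovers the same separating hyperplane; schematically,
\[
h \;=\; R(\sigma, S_{\mathbf{i}}),
\qquad S_{\mathbf{i}} = \{\text{support vectors of } h\},
\qquad \sigma = \text{(empty string)}.
\]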
We will thus use priors defined over the set of all the parameters (i, σ) needed by the reconstruction function R, once a training set S is given.
Priors in the sample compression setting
The priors should be written as distributions P(i, σ) over the pairs (i, σ).
The priors must be data-independent.
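A typical data-independent choice (an illustration of the kind of prior meant here, not necessarily the one used on the slides) factorizes over the index vector and the message string:
\[
P(\mathbf{i}, \sigma) \;=\; P_I(\mathbf{i})\, P_M(\sigma \mid \mathbf{i}),
\]
where P_I is a fixed distribution over index vectors, chosen so that longer vectors receive smaller mass (for instance, uniform over the index vectors of each size d, weighted by ζ(d) ∝ 2^{−d}), and P_M(· | i) is a fixed distribution over message strings. Because such a prior assigns smaller mass to larger compression sets and longer messages, the KL(Q‖P) term in the bound automatically penalizes unparsimonious classifiers.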
The “generalized” PAC-Bayes Theorem
The divergence term KL(Q‖P) (suitably rescaled) incorporates Occam’s principle of parsimony.
The new PAC-Bayes theorem states that the risk bound for the sample-compressed Gibbs classifier G_Q is lower than the risk bound for any single classifier in its support.
The PAC-Bayes theorem for bounded compression set size
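As a hedged sketch of the general shape such a bound takes when every compression set has size at most d (the exact constants, and the precise definition of the empirical risk, which is computed on the examples outside the compression set, follow the paper and may differ from this sketch):
\[
\mathrm{kl}\!\left(R_S(G_Q)\,\middle\|\,R(G_Q)\right) \;\lesssim\;
\frac{\mathrm{KL}(Q\,\|\,P) + \ln\frac{m+1}{\delta}}{m - d}.
\]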
Conclusion
The new PAC-Bayes bound
is valid in the more general sample compression setting;
automatically incorporates Occam’s principle of parsimony.
A sample-compressed Gibbs classifier can have a smaller risk bound than any of its members.
The next steps
Finding derived bounds for particular sample-compressed classifiers such as majority votes of SCMs and DLMs, SVMs, and NNCs.
Developing new learning algorithms based on the theoretical information given by the bound.
A tight risk bound for majority-vote classifiers?