CS 59000 Statistical Machine learning Lecture 10 Yuan (Alan) Qi Purdue CS Sept. 25 2008.


CS 59000 Statistical Machine Learning, Lecture 10

Yuan (Alan) Qi, Purdue CS

Sept. 25 2008

Outline

• Review of Fisher’s linear discriminant, perceptron, and probabilistic generative models

• Probabilistic discriminative models: logistic regression, probit regression

Fisher Linear Discriminant

Within Class and Between Class Scatter Matrices

Generalized eigenvalue problem

Fisher’s Linear Discriminant
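For the two-class case, a minimal sketch of the direction that maximizes the Fisher criterion J(w) = (w^T S_B w) / (w^T S_W w), using the standard closed form w ∝ S_W^{-1}(m_2 - m_1). The function name `fisher_direction` and the synthetic data are assumptions made for this illustration; they do not come from the slides.

```python
import numpy as np

def fisher_direction(X1, X2):
    """Unit-norm Fisher direction, proportional to S_W^{-1} (m2 - m1)."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter: sum of the two per-class scatter matrices.
    S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    w = np.linalg.solve(S_W, m2 - m1)
    return w / np.linalg.norm(w)

# Hypothetical toy data: two Gaussian blobs in 2-D.
rng = np.random.default_rng(0)
X1 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2))
X2 = rng.normal(loc=[3.0, 1.0], scale=1.0, size=(100, 2))
w = fisher_direction(X1, X2)

# Projections onto w give a 1-D representation of each class.
proj1, proj2 = X1 @ w, X2 @ w
```

Solving the linear system avoids forming S_W^{-1} explicitly; only the direction of w matters, so it is normalized.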

Perceptron

Generalized Linear Model

Minimize

where M denotes the set of all misclassified patterns

Stochastic Gradient Descent
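A minimal sketch of the perceptron update, assuming the standard criterion E_P(w) = -Σ_{n∈M} w^T φ_n t_n with targets t_n in {-1, +1}. Stochastic gradient descent on this criterion gives w ← w + η φ_n t_n for each misclassified pattern. The function name, learning rate, and toy data are assumptions for illustration.

```python
import numpy as np

def perceptron_train(Phi, t, eta=1.0, max_epochs=100):
    """Phi: (N, D) feature matrix including a bias column; t: labels in {-1, +1}."""
    w = np.zeros(Phi.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for phi_n, t_n in zip(Phi, t):
            # A pattern is misclassified (or on the boundary) if t_n * w^T phi_n <= 0.
            if t_n * (w @ phi_n) <= 0:
                w += eta * t_n * phi_n
                mistakes += 1
        if mistakes == 0:  # no misclassified patterns left
            break
    return w

# Hypothetical linearly separable data; the first column is the bias feature.
Phi = np.array([[1.0, 2.0, 1.0],
                [1.0, 1.0, 3.0],
                [1.0, -1.0, -2.0],
                [1.0, -2.0, -1.0]])
t = np.array([1, 1, -1, -1])
w = perceptron_train(Phi, t)
```

On linearly separable data the loop terminates with every pattern correctly classified; on non-separable data it stops only at `max_epochs`.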

Probabilistic Generative Models

Gaussian Class-Conditional Densities

Conditional densities of data:

The posterior distribution for label/class:
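The slide equations did not survive extraction. Assuming two classes whose Gaussian densities share one covariance matrix Σ (the standard textbook setting, taken here as an assumption), the densities and the resulting posterior can be written as:

```latex
\begin{align}
p(\mathbf{x} \mid C_k) &= \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}) \\
p(C_1 \mid \mathbf{x}) &= \sigma(\mathbf{w}^{\mathrm{T}} \mathbf{x} + w_0),
  \qquad \sigma(a) = \frac{1}{1 + e^{-a}} \\
\mathbf{w} &= \boldsymbol{\Sigma}^{-1} (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2) \\
w_0 &= -\tfrac{1}{2} \boldsymbol{\mu}_1^{\mathrm{T}} \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}_1
      + \tfrac{1}{2} \boldsymbol{\mu}_2^{\mathrm{T}} \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}_2
      + \ln \frac{p(C_1)}{p(C_2)}
\end{align}
```

The quadratic terms in x cancel because Σ is shared, so the posterior is a logistic sigmoid of a linear function of x.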

Maximum Likelihood Estimation

Linked to Fisher’s linear discriminant
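A sketch of the maximum likelihood estimates for the two-class Gaussian model, assuming a shared covariance matrix and labels t_n in {0, 1}. The function names and synthetic data are hypothetical.

```python
import numpy as np

def fit_gaussian_generative(X, t):
    """ML estimates: prior pi = p(C1), class means, shared covariance S."""
    X1, X2 = X[t == 1], X[t == 0]
    pi = len(X1) / len(X)
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    # Shared covariance: pooled within-class scatter divided by N.
    S = ((X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)) / len(X)
    return pi, mu1, mu2, S

def posterior_c1(x, pi, mu1, mu2, S):
    """p(C1 | x) = sigmoid(w^T x + w0) for the shared-covariance model."""
    w = np.linalg.solve(S, mu1 - mu2)
    w0 = (-0.5 * mu1 @ np.linalg.solve(S, mu1)
          + 0.5 * mu2 @ np.linalg.solve(S, mu2)
          + np.log(pi / (1.0 - pi)))
    return 1.0 / (1.0 + np.exp(-(x @ w + w0)))

# Hypothetical data: 100 points per class.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([2.0, 1.0], 1.0, size=(100, 2)),
               rng.normal([-1.0, 0.0], 1.0, size=(100, 2))])
t = np.concatenate([np.ones(100, dtype=int), np.zeros(100, dtype=int)])

pi, mu1, mu2, S = fit_gaussian_generative(X, t)
p_at_mu1 = posterior_c1(mu1, pi, mu1, mu2, S)
p_at_mu2 = posterior_c1(mu2, pi, mu1, mu2, S)
```

Because S equals the within-class scatter matrix S_W divided by N, the weight vector w = S^{-1}(μ_1 - μ_2) points along the Fisher direction S_W^{-1}(m_1 - m_2), which is the link to Fisher's linear discriminant.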

Discrete features

Naïve Bayes classification:
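For binary features x_i in {0, 1}, the naive Bayes assumption treats the features as independent given the class. A sketch of the resulting model, with μ_{ki} = p(x_i = 1 | C_k) as assumed notation:

```latex
\begin{align}
p(\mathbf{x} \mid C_k) &= \prod_{i=1}^{D} \mu_{ki}^{x_i} (1 - \mu_{ki})^{1 - x_i} \\
a_k(\mathbf{x}) &= \ln \bigl( p(\mathbf{x} \mid C_k)\, p(C_k) \bigr)
  = \sum_{i=1}^{D} \bigl\{ x_i \ln \mu_{ki} + (1 - x_i) \ln (1 - \mu_{ki}) \bigr\} + \ln p(C_k) \\
p(C_k \mid \mathbf{x}) &= \frac{\exp(a_k(\mathbf{x}))}{\sum_j \exp(a_j(\mathbf{x}))}
\end{align}
```

The activations a_k are linear in x, so the posterior again takes the form of a softmax (or logistic sigmoid for two classes) of a linear function.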

Probabilistic Discriminative Models

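Instead of modeling the class-conditional densities $p(\mathbf{x} \mid C_k)$ and the priors $p(C_k)$,

model the posterior $p(C_k \mid \mathbf{x})$ directly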

Generative vs Conditional Models

Discussion

Logistic Regression

Let

Likelihood function

Maximum Likelihood Estimation

Note that
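The equations on these slides were lost in extraction. Assuming targets t_n in {0, 1} and y_n = σ(w^T φ_n), the standard form of the model, likelihood, error function, and gradient is:

```latex
\begin{align}
p(C_1 \mid \boldsymbol{\phi}) &= y(\boldsymbol{\phi}) = \sigma(\mathbf{w}^{\mathrm{T}} \boldsymbol{\phi}) \\
p(\mathbf{t} \mid \mathbf{w}) &= \prod_{n=1}^{N} y_n^{t_n} (1 - y_n)^{1 - t_n} \\
E(\mathbf{w}) &= -\ln p(\mathbf{t} \mid \mathbf{w})
  = -\sum_{n=1}^{N} \bigl\{ t_n \ln y_n + (1 - t_n) \ln (1 - y_n) \bigr\} \\
\frac{d\sigma}{da} &= \sigma (1 - \sigma) \\
\nabla E(\mathbf{w}) &= \sum_{n=1}^{N} (y_n - t_n)\, \boldsymbol{\phi}_n
\end{align}
```

The derivative identity for σ is what makes the gradient collapse to the simple "prediction minus target" form.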

Newton-Raphson Optimization for Linear Regression

Let H denote Hessian matrix

It converges in one iteration for linear regression.
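A small numerical check of the one-step claim, assuming the sum-of-squares error E(w) = ½‖Φw - t‖². Its Hessian Φ^T Φ does not depend on w, so a single Newton-Raphson step from any starting point lands on the least-squares solution. The data here is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
Phi = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
t = Phi @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

w_old = rng.normal(size=3)            # arbitrary starting point
grad = Phi.T @ (Phi @ w_old - t)      # gradient of E at w_old
H = Phi.T @ Phi                       # Hessian, constant in w

# One Newton-Raphson step: w_new = w_old - H^{-1} grad
w_new = w_old - np.linalg.solve(H, grad)

# Reference: ordinary least-squares solution.
w_ls = np.linalg.lstsq(Phi, t, rcond=None)[0]
```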

Newton-Raphson Optimization for Logistic Regression

Gradient and Hessian of the error function:

Newton-Raphson Optimization for Logistic Regression

Iterative reweighted least squares (IRLS): solving a series of weighted least-squares problems
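A minimal IRLS sketch for logistic regression, assuming the gradient Φ^T(y - t) and Hessian Φ^T R Φ with R = diag(y_n(1 - y_n)). Function names, the stopping rule, and the toy data are assumptions; there is no regularization or step-size control, so this sketch is only suitable for non-separable data.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def irls_logistic(Phi, t, n_iter=20, tol=1e-8):
    """Newton-Raphson for logistic regression; t contains labels in {0, 1}."""
    w = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        y = sigmoid(Phi @ w)
        r = y * (1.0 - y)                   # diagonal of the weighting matrix R
        grad = Phi.T @ (y - t)
        H = Phi.T @ (Phi * r[:, None])      # Phi^T R Phi
        step = np.linalg.solve(H, grad)
        w = w - step
        if np.linalg.norm(step) < tol:
            break
    return w

# Hypothetical 1-D data with overlapping classes (not linearly separable).
x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
t = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0])
Phi = np.column_stack([np.ones_like(x), x])
w = irls_logistic(Phi, t)
grad_at_w = Phi.T @ (sigmoid(Phi @ w) - t)
```

Unlike the linear regression case, R depends on w, so the weighted least-squares problem has to be re-solved at every iteration.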

Other discriminative models

Generative models <-> Logistic regression

How about other discriminative functions?

Probit Regression

Probit function:
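The probit function is the cumulative distribution function of the standard normal, Φ(a) = ∫_{-∞}^{a} N(θ | 0, 1) dθ. A small sketch using the standard-library error function; the function name `probit` is chosen for this example.

```python
import math

def probit(a):
    """Standard normal CDF, written with the usual erf convention."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))

def logistic(a):
    return 1.0 / (1.0 + math.exp(-a))

# Both activations map the real line to (0, 1), but the probit tails
# decay like exp(-a^2) while the logistic tails decay like exp(-a).
tail_probit = 1.0 - probit(4.0)
tail_logistic = 1.0 - logistic(4.0)
```

The faster tail decay is why probit regression tends to be more sensitive to outliers than logistic regression.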

Labeling Noise Model

Robust to outliers and labeling errors
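One common way to write such a model, assuming each label is flipped with probability ε and σ(x) denotes the activation function evaluated for input x (the notation is an assumption, since the slide equation was lost):

```latex
\begin{align}
p(t = 1 \mid \mathbf{x}) &= (1 - \epsilon)\, \sigma(\mathbf{x}) + \epsilon\, \bigl(1 - \sigma(\mathbf{x})\bigr) \\
  &= \epsilon + (1 - 2\epsilon)\, \sigma(\mathbf{x})
\end{align}
```

The predicted probability is confined to [ε, 1 - ε], so a single mislabeled point far from the boundary contributes a bounded penalty to the likelihood.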

Generalized Linear Models

Generalized Linear Models

Generalized linear model: $y = f(\mathbf{w}^{\mathrm{T}} \boldsymbol{\phi})$

Activation function: $f(\cdot)$

Link function: $f^{-1}(\cdot)$

Canonical Link Function

If we choose the canonical link function:

Gradient of the error function:
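A sketch of the result, assuming the target distribution belongs to the exponential family with scale parameter s, and that y_n = f(w^T φ_n) with f^{-1} the canonical link:

```latex
\begin{align}
p(t \mid \eta, s) &= \frac{1}{s}\, h\!\left(\frac{t}{s}\right) g(\eta) \exp\!\left\{ \frac{\eta t}{s} \right\} \\
\nabla E(\mathbf{w}) &= \frac{1}{s} \sum_{n=1}^{N} (y_n - t_n)\, \boldsymbol{\phi}_n
\end{align}
```

For a Gaussian target s is the noise variance, and for the logistic model s = 1, which recovers the familiar "prediction minus target" gradients of linear and logistic regression.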

Laplace Approximation for Posterior

Gaussian approximation around mode:
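A one-dimensional sketch of the procedure: find the mode z_0 of an unnormalized density f(z), then use the curvature A = -d²/dz² ln f(z) at z_0 as the precision of the approximating Gaussian N(z | z_0, A^{-1}). The example density f(z) = exp(-z²/2) σ(20z + 4) and the fixed Newton iteration count are assumptions for illustration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def log_f(z):
    return -0.5 * z**2 + np.log(sigmoid(20.0 * z + 4.0))

def d_log_f(z):
    # First derivative of ln f(z).
    return -z + 20.0 * (1.0 - sigmoid(20.0 * z + 4.0))

def d2_log_f(z):
    # Second derivative of ln f(z); always negative, so ln f is concave.
    s = sigmoid(20.0 * z + 4.0)
    return -1.0 - 400.0 * s * (1.0 - s)

# Newton iterations to locate the mode z0, where d/dz ln f(z) = 0.
z0 = 0.0
for _ in range(50):
    z0 = z0 - d_log_f(z0) / d2_log_f(z0)

A = -d2_log_f(z0)   # precision of the Gaussian approximation q(z) = N(z | z0, 1/A)

# Laplace estimate of the normalizer Z = integral of f(z) dz.
Z_laplace = np.exp(log_f(z0)) * np.sqrt(2.0 * np.pi / A)
```

The approximation matches the true density only locally around the mode, so it can be poor for skewed or multimodal distributions.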

Illustration of Laplace Approximation

Evidence Approximation
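Applying the Laplace approximation to the normalizer of p(D | θ) p(θ) gives the following standard estimate of the log evidence, where M is the number of parameters (notation assumed here, since the slide equation was lost):

```latex
\begin{align}
\ln p(D) &\simeq \ln p(D \mid \boldsymbol{\theta}_{\mathrm{MAP}})
  + \ln p(\boldsymbol{\theta}_{\mathrm{MAP}})
  + \frac{M}{2} \ln (2\pi)
  - \frac{1}{2} \ln |\mathbf{A}| \\
\mathbf{A} &= -\nabla \nabla \ln \bigl( p(D \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta}) \bigr)
  \Big|_{\boldsymbol{\theta} = \boldsymbol{\theta}_{\mathrm{MAP}}}
\end{align}
```

The first term measures fit to the data; the remaining three terms form the "Occam factor" that penalizes model complexity.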

Bayesian Information Criterion

Further approximation of the Laplace approximation:
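Assuming a broad Gaussian prior and a full-rank Hessian, the Laplace estimate of the log evidence simplifies to the usual BIC form, with N data points and M parameters:

```latex
\begin{equation}
\ln p(D) \simeq \ln p(D \mid \boldsymbol{\theta}_{\mathrm{MAP}}) - \frac{1}{2} M \ln N
\end{equation}
```

The penalty depends only on M and N, which makes it easy to compute but ignores how well each parameter is actually determined by the data.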

More accurate evidence approximation needed

Bayesian Logistic Regression