Classification: How to predict a discrete variable?
CSE 6242 / CX 4242
Based on Parishit Ram's slides (Pari earned his PhD from GT and is now at SkyTree), and on Alex Gray's slides.
| Song | Label |
|---|---|
| Some nights | ... |
| Skyfall | ... |
| Comfortably numb | ... |
| We are young | ... |
| ... | ... |
| Chopin's 5th | ??? |

How will I rate "Chopin's 5th Symphony"?
What tools do you need for classification?

1. Data: S = {(x_i, y_i)}, i = 1, ..., n
   - x_i represents each example with d attributes
   - y_i represents the label of each example
2. Classification model f_(a,b,c,...) with some parameters a, b, c, ...
   - a model/function that maps examples to labels
3. Loss function L(y, f(x))
   - how to penalize mistakes
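To make these three pieces concrete, here is a minimal Python sketch; the songs, the single numeric attribute, and the threshold-based model are invented for illustration and are not from the slides.

```python
# Sketch of the three ingredients: data S, a parameterized model f, and a loss L.

# 1. Data S = {(x_i, y_i)}: each x_i is a vector of d attributes, y_i a label.
S = [
    ([4.38], "like"),     # "Some nights", length in minutes (made-up attribute)
    ([4.00], "like"),     # "Skyfall"
    ([6.22], "dislike"),  # "Comfortably numb"
]

# 2. Classification model f with parameters (here a single threshold "a").
def f(a, x):
    """Map an example x to a label using parameter a."""
    return "like" if x[0] < a else "dislike"

# 3. Loss function L(y, f(x)): the 0-1 loss penalizes every mistake equally.
def L(y_true, y_pred):
    return 0 if y_true == y_pred else 1

# Total training loss for one particular parameter value.
total_loss = sum(L(y, f(5.0, x)) for x, y in S)
print(total_loss)  # 0 with this (hand-picked) threshold
```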
Features

| Song name | Label | Artist | Length | ... |
|---|---|---|---|---|
| Some nights | ... | Fun | 4:23 | ... |
| Skyfall | ... | Adele | 4:00 | ... |
| Comfortably numb | ... | Pink Floyd | 6:13 | ... |
| We are young | ... | Fun | 3:50 | ... |
| ... | ... | ... | ... | ... |
| Chopin's 5th | ?? | Chopin | 5:32 | ... |
Training a classifier (building the “model”)
Q: How do you learn appropriate values for the parameters a, b, c, ... such that

- (Part I) y_i = f_(a,b,c,...)(x_i), i = 1, ..., n
  - low/no error on the training set
- (Part II) y = f_(a,b,c,...)(x), for any new x
  - low/no error on future queries (songs)

Possible A: Minimize the total training loss, sum_i L(y_i, f_(a,b,c,...)(x_i)), with respect to a, b, c, ...
Classification loss function
Most common loss: the 0-1 loss, L(y, f(x)) = 1 if y != f(x) and 0 otherwise.

More general loss functions are defined by an m x m cost matrix C such that L(y, f(x)) = C_ab, where y = a and f(x) = b.

T0 (true class 0), T1 (true class 1); P0 (predicted class 0), P1 (predicted class 1)

|    | T0  | T1  |
|----|-----|-----|
| P0 | 0   | C10 |
| P1 | C01 | 0   |
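As a small illustration (the class labels 0/1 and the cost values below are arbitrary, not from the slides), the 0-1 loss and a cost-matrix loss can be written as:

```python
import numpy as np

def zero_one_loss(y_true, y_pred):
    """0-1 loss: 1 for a mistake, 0 for a correct prediction."""
    return int(y_true != y_pred)

# Cost matrix C[a, b] = cost when the true class is a and the prediction is b.
# The off-diagonal values are arbitrary: here a false negative (true 1,
# predicted 0) is made 5x more costly than a false positive.
C = np.array([[0.0, 1.0],
              [5.0, 0.0]])

def cost_matrix_loss(y_true, y_pred, C):
    return C[y_true, y_pred]

print(zero_one_loss(1, 0))        # 1
print(cost_matrix_loss(1, 0, C))  # 5.0
```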
k-Nearest-Neighbor Classifier
The classifier: f(x) = majority label of the k nearest neighbors (NN) of x

Model parameters:
- number of neighbors k
- distance function d(.,.)
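A minimal from-scratch sketch of this rule, assuming numeric feature vectors and Euclidean distance as d(.,.) (the toy points and labels are made up):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Return the majority label among the k nearest training points to x."""
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distance to every training point
    nearest = np.argsort(dists)[:k]               # indices of the k nearest neighbors
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy usage with made-up 2-D points and labels.
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]])
y_train = np.array(["A", "A", "B", "B"])
print(knn_predict(X_train, y_train, np.array([1.1, 1.0]), k=3))  # "A"
```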
k-Nearest-Neighbor Classifier

If k and d(.,.) are fixed:
- Things to learn: ?
- How to learn them: ?

If d(.,.) is fixed, but you can change k:
- Things to learn: ?
- How to learn them: ?
k-Nearest-Neighbor Classifier

If k and d(.,.) are fixed:
- Things to learn: Nothing
- How to learn them: N/A

If d(.,.) is fixed, but you can change k:
- Things to learn: Nothing
- How to learn them: N/A
- Selecting k: try different values of k on some hold-out set
Cross-validation

Find the best performing k:
1. Hold out a part of the data (this part is called the "test set" or "hold-out set").
2. Train your classifier on the rest of the data (called the training set).
3. Compute the test error on the test set (you can also compute the training error on the training set).
4. Do this multiple times, once for each k, and pick the k with the best performance, i.e., the lowest error on the hold-out set, averaged over all hold-out sets.
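A sketch of these steps using scikit-learn; the candidate values of k and the synthetic data are placeholders, not from the slides:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Placeholder data standing in for the songs-with-attributes table.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

best_k, best_score = None, -np.inf
for k in [1, 3, 5, 7, 9, 15]:                     # candidate values of k
    # Average hold-out accuracy over 5 folds for this k.
    score = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    if score > best_score:
        best_k, best_score = k, score

print(best_k, best_score)
```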
Cross-validation: Hold-out sets

Leave-one-out cross-validation (LOO-CV):
- hold-out sets of size 1

K-fold cross-validation:
- hold-out sets of size n / K
- K = 10 is most common (i.e., 10-fold CV)
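The two hold-out schemes, expressed in scikit-learn terms (illustrative only; the data and classifier are placeholders):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import LeaveOneOut, KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=100, n_features=5, random_state=0)
clf = KNeighborsClassifier(n_neighbors=5)

# LOO-CV: n hold-out sets of size 1 (n model fits -- expensive).
loo_err = 1 - cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()

# 10-fold CV: 10 hold-out sets of size n/10 (the common default).
kfold_err = 1 - cross_val_score(clf, X, y,
                                cv=KFold(n_splits=10, shuffle=True, random_state=0)).mean()

print(loo_err, kfold_err)
```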
Learning vs. Cross-validation
k-Nearest-Neighbor Classifier

If k is fixed, but you can change d(.,.):
- Things to learn: ?
- How to learn them: ?
- Cross-validation: ?

Possible distance functions:
- Euclidean distance: d(x, x') = sqrt(sum_j (x_j - x'_j)^2)
- Manhattan distance: d(x, x') = sum_j |x_j - x'_j|
- ...
k-Nearest-Neighbor Classifier

If k is fixed, but you can change d(.,.):
- Things to learn: the distance function d(.,.)
- How to learn them: optimization
- Cross-validation: any regularizer you have on your distance function
Summary on k-NN classifier

- Advantages
  - little learning (unless you are learning the distance function)
  - quite powerful in practice (and has theoretical guarantees as well)
- Caveats
  - computationally expensive at test time

Reading material:
- ESL book, Chapter 13.3: http://www-stat.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf
- Le Song's slides on the kNN classifier: http://www.cc.gatech.edu/~lsong/teaching/CSE6740/lecture2.pdf
Points about cross-validation

Requires extra computation, but gives you information about the expected test error.

LOO-CV:
- Advantages
  - unbiased estimate of the test error (especially for small n)
  - low variance
- Caveats
  - extremely time consuming
Points about cross-validation

K-fold CV:
- Advantages
  - more efficient than LOO-CV
- Caveats
  - K needs to be large for low variance
  - too small a K leads to under-use of the data, leading to higher bias
- Usually accepted value: K = 10

Reading material:
- ESL book, Chapter 7.10: http://www-stat.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf
- Le Song's slides on CV: http://www.cc.gatech.edu/~lsong/teaching/CSE6740/lecture13-cv.pdf
Decision trees (DT)
The classifier: f_T(x) is the majority class in the leaf of the tree T containing x

Model parameters: the tree structure and size
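For concreteness, a short scikit-learn sketch of such a classifier on synthetic stand-in data; max_depth and min_samples_leaf are just one way to constrain the tree structure and size:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=6, random_state=0)

# Limit depth and leaf size: these knobs control the tree structure and size,
# which are exactly the model parameters discussed above.
tree = DecisionTreeClassifier(max_depth=4, min_samples_leaf=5, random_state=0)
tree.fit(X, y)

# Prediction = majority class of the leaf that contains x.
print(tree.predict(X[:3]))
print(tree.get_n_leaves())
```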
Decision trees
- Things to learn: ?
- How to learn them: ?
- Cross-validation: ?
Decision trees
- Things to learn: the tree structure
- How to learn them: (greedily) minimize the overall classification loss
- Cross-validation: find the best-sized tree with K-fold cross-validation
Learning the tree structure
Pieces:
1. best split on the chosen attribute
2. best attribute to split on
3. when to stop splitting
4. cross-validation
Choosing the split

Split types for a selected attribute j:
1. Categorical attribute (e.g. 'genre'): x1j = Rock, x2j = Classical, x3j = Pop
2. Ordinal attribute (e.g. 'achievement'): x1j = Gold, x2j = Platinum, x3j = Silver
3. Continuous attribute (e.g. song length): x1j = 235, x2j = 543, x3j = 378

[Diagrams: the node {x1, x2, x3} split on genre (Rock / Classical / Pop), on achievement (Platinum / Gold / Silver), and on length ({x1, x3} vs. {x2})]
Choosing the split

At a node T, for a given attribute, select the split s as follows:

min_s [ loss(T_L) + loss(T_R) ], where loss(T) is the loss at node T

Node loss functions:
- Total loss: sum over the points (x_i, y_i) in T of L(y_i, f_T(x_i))
- Cross-entropy: -sum_c p_cT log p_cT, where p_cT is the proportion of class c in node T
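A small sketch of this split criterion for a continuous attribute, using the cross-entropy node loss; the candidate thresholds and the toy labels are made up, and the unweighted sum loss(T_L) + loss(T_R) follows the slide (many libraries additionally weight each child by its size):

```python
import numpy as np

def cross_entropy(labels):
    """Cross-entropy (impurity) of a node: -sum_c p_c * log(p_c)."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())

def best_threshold(xj, y, thresholds):
    """Pick the split s minimizing loss(T_L) + loss(T_R)."""
    best_s, best_loss = None, np.inf
    for s in thresholds:
        left, right = y[xj <= s], y[xj > s]
        loss = cross_entropy(left) + cross_entropy(right)
        if loss < best_loss:
            best_s, best_loss = s, loss
    return best_s, best_loss

# Toy data: song lengths (seconds) and like/dislike labels.
xj = np.array([235, 543, 378, 210, 590])
y  = np.array([1, 0, 1, 1, 0])
print(best_threshold(xj, y, thresholds=[300, 400, 500]))  # (400, 0.0)
```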
Choosing the attribute
Choice of attribute:
1. the attribute providing the maximum improvement in training loss
2. the attribute with the maximum information gain
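For reference, the information gain criterion (a standard definition; the slide does not write it out) compares the cross-entropy of the node with the size-weighted cross-entropy of its children:

```latex
\mathrm{IG}(T, j) \;=\; H(T) \;-\; \sum_{k} \frac{|T_k|}{|T|}\, H(T_k),
\qquad
H(T) \;=\; -\sum_{c} p_{cT} \log p_{cT}
```

Here the T_k are the children produced by splitting T on attribute j, and p_cT is the proportion of class c in node T, as on the previous slide.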
When to stop splitting?

1. Homogeneous node (all points in the node belong to the same class, OR all points in the node have the same attributes)
2. Node size is less than some threshold
3. Further splits provide no improvement in the training loss (loss(T) <= loss(T_L) + loss(T_R))
Controlling tree size
In most cases, you can drive the training error to zero (how? is that good?)

What is wrong with really deep trees?
- really high "variance"

What can be done to control this?
- Regularize the tree complexity
  - penalize complex models and prefer simpler models

Look at Le Song's slides on the decomposition of the error into the bias and variance of the estimator: http://www.cc.gatech.edu/~lsong/teaching/CSE6740/lecture13-cv.pdf
Regularization "Regularized training" minimizes
where M() denotes complexity of a function, and C is called the "regularization parameter"
Cross-validate for C selected from a discrete set {C1,...,Cm}
• Compute CV error for each value of Cj
• Select Cj with lowest CV error
Regularization in DT

Cost-complexity pruning: M(f_T) = number of leaves in T

Let S(T) denote the set of leaves L in the subtree rooted at node T. Then the regularized cost of that subtree is

sum over L in S(T) of loss(L) + C * |S(T)|

If keeping T as a single leaf is no more costly, i.e. loss(T) + C <= sum over L in S(T) of loss(L) + C * |S(T)|, replace the subtree with T as a leaf.
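scikit-learn implements this idea as minimal cost-complexity pruning; a sketch (on synthetic placeholder data) of obtaining the candidate values of the regularization parameter, there called ccp_alpha:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=6, random_state=0)

# Grow a full tree, then ask for the sequence of pruned subtrees; each alpha
# plays the role of the regularization parameter C above.
tree = DecisionTreeClassifier(random_state=0)
path = tree.cost_complexity_pruning_path(X, y)
print(path.ccp_alphas)   # candidate regularization strengths
print(path.impurities)   # corresponding total leaf impurities
```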
Cross-validation
Cross-validation steps:
1. For each value Cj in the set {C1, ..., CN}:
   - train on the non-holdout set and regularize with Cj
   - compute the error on the holdout set
2. Pick the Cj with the lowest average error on the holdout sets.
3. Prune the tree on the whole training set with the chosen Cj.
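These steps map directly onto a grid search over ccp_alpha, continuing the same hypothetical setup (the candidate values are arbitrary):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=6, random_state=0)

# Candidate values {C1, ..., CN} of the regularization parameter.
alphas = {"ccp_alpha": [0.0, 0.001, 0.005, 0.01, 0.05]}

# For each candidate: train on the non-holdout folds, score on the holdout fold,
# then keep the value with the best average holdout performance.
search = GridSearchCV(DecisionTreeClassifier(random_state=0), alphas, cv=5)
search.fit(X, y)   # refits on the whole training set with the best alpha
print(search.best_params_, search.best_score_)
```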
Summary on decision trees

- Advantages
  - easy to implement
  - interpretable
  - very fast test time
  - can work seamlessly with mixed attributes
  - ** works quite well in practice
- Caveats
  - can be too simplistic (but OK if it works)
  - training can be very expensive
  - cross-validation is hard (node-level CV)
Final words on decision trees

Reading material:
- ESL book, Chapter 9.2: http://www-stat.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf
- Le Song's slides: http://www.cc.gatech.edu/~lsong/teaching/CSE6740/lecture6.pdf
Bayes classifier
In a Bayes classifier, f(x) = arg max_y P(Y = y | X = x)

By Bayes' rule, P(Y|X) = P(Y) P(X|Y) / P(X)

Since P(X = x) does not depend on y, classification can be done as f(x) = arg max_y P(Y = y) P(X = x | Y = y)
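A tiny sketch of this decision rule, assuming the class priors and class-conditional densities have already been estimated somehow; the numbers, the two-class labels, and the 1-D Gaussian choice are purely illustrative:

```python
import math

# Hypothetical estimates: P(Y = y) and a 1-D Gaussian density for P(X = x | Y = y).
priors = {"like": 0.6, "dislike": 0.4}
params = {"like": (4.0, 0.5), "dislike": (6.0, 1.0)}   # (mean, std) per class

def gaussian_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def bayes_classify(x):
    """f(x) = arg max_y P(Y = y) * P(X = x | Y = y)."""
    return max(priors, key=lambda y: priors[y] * gaussian_pdf(x, *params[y]))

print(bayes_classify(4.2))   # "like"
print(bayes_classify(6.5))   # "dislike"
```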
Bayes classifier

f(x) = arg max_y P(Y = y) P(X = x | Y = y)

Say you have a tool to learn any probability P() given some observations.

- Things to learn: ?
- How to learn them: ?
- Cross-validation: ?
Bayes classifier

f(x) = arg max_y P(Y = y) P(X = x | Y = y)

Say you have a tool to learn any probability P() given some observations.

- Things to learn: P(Y = y) and P(X | Y = y) for every class y
- How to learn them: using the tool
- Cross-validation: none, usually
Estimating the probability

P(Y = y) are the "class weights" and can be approximated from the training set.

What about P(X | Y = y)?
- Assume a parametric form for P(X | Y = y)
  - maximum-likelihood estimation of its parameters
- Estimate P(X | Y = y) with no assumptions
  - kernel-density estimation

Generally a hard task if d is large!
Naive-Bayes classifier (NBC)
X is d-dimensional: X = (X1, ..., Xd)

How to learn P(X | Y = y) for all classes?

The "naive" assumption: P(X|Y) = P(X1|Y) * P(X2|Y) * ... * P(Xd|Y)

(Usual) further assumption: P(Xi|Y) is a known type of probability density/mass function
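Under these two assumptions, a from-scratch sketch looks like this; it uses a Gaussian for each P(Xi | Y), which is one common choice rather than the only one, and the toy data are made up:

```python
import numpy as np

class TinyGaussianNB:
    """Naive Bayes with a per-class, per-dimension Gaussian for P(Xi | Y)."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.priors_, self.means_, self.vars_ = {}, {}, {}
        for c in self.classes_:
            Xc = X[y == c]
            self.priors_[c] = len(Xc) / len(X)       # P(Y = c)
            self.means_[c] = Xc.mean(axis=0)         # per-dimension mean
            self.vars_[c] = Xc.var(axis=0) + 1e-9    # per-dimension variance
        return self

    def predict_one(self, x):
        # arg max_y log P(Y=y) + sum_i log P(x_i | Y=y), using the naive factorization.
        def log_post(c):
            m, v = self.means_[c], self.vars_[c]
            log_lik = -0.5 * np.sum(np.log(2 * np.pi * v) + (x - m) ** 2 / v)
            return np.log(self.priors_[c]) + log_lik
        return max(self.classes_, key=log_post)

# Toy usage with made-up 2-D data.
X = np.array([[1.0, 2.0], [1.2, 1.8], [4.0, 5.0], [4.2, 5.5]])
y = np.array([0, 0, 1, 1])
nb = TinyGaussianNB().fit(X, y)
print(nb.predict_one(np.array([1.1, 2.1])))  # 0
```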
Commonly chosen function
- Things to learn: ?
- How to learn them: ?
- Cross-validation: ?
Commonly chosen function
- Things to learn: the parameters of the chosen density/mass function
- How to learn them: maximizing the log-likelihood of the observed data
- Cross-validation: none (unless you add some regularization to the log-likelihood to get a penalized log-likelihood)
Further simplification of NBC
• Every class has the same variance
• Every dimension has the same variance
• Every class and dimension has the same variance
Final words on NBC

- Advantages
  - extremely simple -- efficient training
  - not many tuning parameters
  - ** works quite well for real datasets
  - parallelizable: each class's estimation can be done separately
- Caveats
  - invalid when the assumptions do not hold

Reading material:
- Le Song's slides: http://www.cc.gatech.edu/~lsong/teaching/CSE6740/lecture3.pdf
| Method | Coding | Training time | Cross-validation | Testing time | Accuracy |
|---|---|---|---|---|---|
| kNN classifier | ... | None | Can be slow | Slow | ?? |
| Naive Bayes classifier | ... | Fast | None | Fast | ?? |
| Decision trees | ... | Slow | Very slow | Very fast | ?? |