k-NN (k-Nearest Neighbor) Classifier
Aarti Singh
Machine Learning 10-315, Oct 14, 2019
k-NN classifier

(Figure: training documents plotted in three classes: Sports, Science, Arts.)
k-NN classifier

(Figure: a test document plotted among the Sports, Science, Arts training documents.)
k-NN classifier (k=5)

(Figure: the test document with its 5 nearest neighbors among the Sports, Science, Arts training documents.)

What should we predict? … Average? Majority? Why?
k-NN classifier

• Optimal Classifier: uses the true class-conditional distribution $P(x \mid y)$.
• k-NN Classifier: plugs in the estimate

$$\hat{P}_{kNN}(x \mid y) = \frac{k_y}{n_y}$$

where $k_y$ = # training pts of class $y$ amongst the $k$ NNs of $x$, and $n_y$ = # total training pts of class $y$.
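To make the majority-vote rule concrete, here is a minimal Python sketch of a k-NN classifier (ours, not from the slides; the name `knn_predict` is an assumption): it computes Euclidean distances, takes the k nearest training points, and predicts the class with the largest count $k_y$ among them.

```python
from collections import Counter

import numpy as np

def knn_predict(X_train, y_train, x, k=5):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    dists = np.linalg.norm(X_train - x, axis=1)   # distance from x to every training point
    nn_idx = np.argsort(dists)[:k]                # indices of the k nearest neighbors
    votes = Counter(y_train[i] for i in nn_idx)   # class counts k_y among the k NNs
    return votes.most_common(1)[0][0]             # predict the majority class
```

For example, a test point near the Sports cluster would collect mostly Sports neighbors and be labeled Sports.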
1-, 2-, 3-, and 5-Nearest Neighbor (k-NN) classifiers

(Figures: the same test document classified among the Sports, Science, Arts training documents as k grows from 1 to 2, 3, and 5.)
What is the best k?

(Figure: with k = 1, the decision boundary is the Voronoi diagram of the training points.)

As k increases, the boundary becomes smoother (less jagged).
What is the best k?

Approximation vs. Stability (aka Bias vs. Variance) Tradeoff

• Larger k => predicted label is more stable
• Smaller k => predicted label can approximate the best classifier well
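The tradeoff is easiest to see at the k = 1 extreme. This short Python sketch (ours, not from the slides) checks that 1-NN always achieves zero training error, since every training point is its own nearest neighbor — maximal approximation ability, paired with maximal sensitivity to individual (possibly noisy) training points.

```python
from collections import Counter

import numpy as np

def knn_predict(X, y, x, k):
    """Majority vote among the k nearest training points."""
    nn = np.argsort(np.linalg.norm(X - x, axis=1))[:k]
    return Counter(y[i] for i in nn).most_common(1)[0][0]

# With k = 1 every training point is its own nearest neighbor, so the
# training error of 1-NN is always zero: the low-bias, high-variance
# end of the tradeoff.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = np.array(['pos' if a + b > 0 else 'neg' for a, b in X])
train_err_1nn = np.mean([knn_predict(X, y, X[i], k=1) != y[i] for i in range(len(X))])
```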
Non-parametric methods
Aka Instance-based/Memory-based learners
Ø Decision Trees
Ø k-Nearest Neighbors
Parametric methods

• Assume some model (Gaussian, Bernoulli, Multinomial, logistic, network of logistic units, Linear, Quadratic) with a fixed number of parameters
  – Gaussian Bayes, Naïve Bayes, Logistic Regression, Perceptron
• Estimate parameters (µ, σ², θ, w, b) using MLE/MAP and plug in
• Pro – need few data points to learn parameters
• Con – strong distributional assumptions, not satisfied in practice
Non-Parametric methods

• Typically don't make any distributional assumptions
• As we have more data, we should be able to learn more complex models
• Let the number of parameters scale with the number of training data points
• Some nonparametric methods:
  – Decision Trees
  – k-NN (k-Nearest Neighbor) Classifier
Summary

• Parametric vs. Nonparametric approaches

Ø Nonparametric models place very mild assumptions on the data distribution and provide good models for complex data; parametric models rely on very strong (simplistic) distributional assumptions.
Ø Nonparametric models require storing and computing with the entire data set; parametric models, once fitted, are much more efficient in terms of storage and computation.
Judging Overfitting
Training Data vs. Test Data

• A good machine learning algorithm
  – Does not overfit training data
  – Generalizes well to test data

(Figures: height vs. weight scatter plots of the training data and the test data for a "Football Player? Yes/No" classification task.)
Training error

• Training error of a classifier f on training data $\{X_i, Y_i\}_{i=1}^{n}$:

$$\frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\{f(X_i) \neq Y_i\}$$

• What about test error? Can't compute it.
• How can we know the classifier is not overfitting? Hold-out or cross-validation.
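The empirical 0/1 training error translates directly into code; this Python sketch (ours, not from the slides) is a one-line rendering of the formula.

```python
import numpy as np

def training_error(f, X, Y):
    """Empirical 0/1 loss of classifier f: (1/n) * sum_i 1{f(X_i) != Y_i}."""
    preds = np.array([f(x) for x in X])          # f applied to every training point
    return float(np.mean(preds != np.asarray(Y)))  # fraction of mismatched labels
```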
Hold-out method
Can judge test error by using an independent sample of data.
Hold-out procedure: n data points available.

1) Randomly split into two sets (preserving label proportions): a training dataset D_T of size n − m and a validation/hold-out dataset D_V of size m (often m = n/2).
2) Train the classifier on D_T. Report its error on the validation dataset D_V.

Overfitting is indicated when the validation error is much larger than the training error.
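The split in step 1 can be sketched as follows in Python (ours, not from the slides). Note this is a plain random split; it does not do the stratified, label-proportion-preserving split recommended above.

```python
import numpy as np

def holdout_split(n, m, seed=0):
    """Randomly split indices 0..n-1 into a training set D_T of size n - m
    and a validation (hold-out) set D_V of size m (often m = n/2).
    Caveat: a plain random split, with no stratification by label."""
    perm = np.random.default_rng(seed).permutation(n)  # shuffle the n indices
    return perm[m:], perm[:m]                          # (D_T indices, D_V indices)
```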
Training vs. Validation Error
Training error is no longer a good indicator of validation or test error
(Figure: training error and validation error plotted against model complexity, for a fixed number of training data.)
Hold-out method
Drawbacks:
§ May not have enough data to afford setting one subset aside for getting a sense of generalization abilities
§ Validation error may be misleading (bad estimate of test error) if we get an “unfortunate” split
Limitations of hold-out can be overcome by a family of sub-sampling methods at the expense of more computation.
Cross-validation

K-fold cross-validation

Create a K-fold partition of the dataset. Do K runs: train using K − 1 partitions and calculate the validation error on the remaining partition, rotating the validation partition on each run. Report the average validation error.

(Figure: Run 1 … Run K, each holding out a different fold for validation and training on the rest.)
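The K-run loop can be sketched as follows in Python (ours, not from the slides; `fit` and `error` are caller-supplied placeholders for training a model and measuring its validation error).

```python
import numpy as np

def kfold_cv_error(X, Y, fit, error, K=10, seed=0):
    """K runs: train on K-1 folds, evaluate on the held-out fold
    (rotating which fold is held out), and report the average
    validation error."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(X)), K)
    errs = []
    for k in range(K):
        val = folds[k]                                              # held-out fold
        train = np.concatenate([folds[j] for j in range(K) if j != k])
        model = fit(X[train], Y[train])                             # train on K-1 folds
        errs.append(error(model, X[val], Y[val]))                   # validate on fold k
    return float(np.mean(errs))                                     # average over K runs
```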
Cross-validation

Leave-one-out (LOO) cross-validation

Special case of K-fold with K = n partitions. Equivalently, train on n − 1 samples and validate on only one sample per run, for n runs.

(Figure: Run 1 … Run n, each holding out a single point for validation.)
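For concreteness, a self-contained Python sketch of LOO applied to a k-NN classifier (ours, not from the slides): each of the n runs trains on the other n − 1 points and validates on the single held-out point.

```python
from collections import Counter

import numpy as np

def loo_cv_error(X, Y, k=3):
    """Leave-one-out CV for a k-NN classifier: n runs, each training on
    the n-1 remaining points and validating on the held-out point."""
    n = len(X)
    mistakes = 0
    for i in range(n):
        mask = np.arange(n) != i                      # leave point i out
        d = np.linalg.norm(X[mask] - X[i], axis=1)    # distances to the rest
        nn = np.argsort(d)[:k]                        # its k nearest neighbors
        pred = Counter(Y[mask][nn]).most_common(1)[0][0]
        mistakes += int(pred != Y[i])
    return mistakes / n                               # average over the n runs
```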
Cross-validation

Random subsampling

Randomly subsample a fixed fraction αn (0 < α < 1) of the dataset for validation. Compute the validation error using the remaining data as training data. Repeat K times and report the average validation error.

(Figure: Run 1 … Run K, each with an independently drawn random validation subset.)
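A Python sketch of the procedure (ours, not from the slides; `run` is a caller-supplied placeholder that trains on one split and returns its validation error).

```python
import numpy as np

def subsampling_cv_error(n, alpha, K, run, seed=0):
    """Repeat K times: hold out a random alpha-fraction (0 < alpha < 1)
    of the n points for validation, train on the rest, and report the
    average validation error over the K runs."""
    rng = np.random.default_rng(seed)
    m = int(alpha * n)                        # validation set size, alpha * n
    errs = []
    for _ in range(K):
        perm = rng.permutation(n)             # fresh random split each run
        errs.append(run(perm[m:], perm[:m]))  # run(train_idx, val_idx) -> error
    return float(np.mean(errs))
```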
Practical Issues in Cross-validation

How to decide the values for K and α?

§ Large K
  + Validation error can approximate the test error well
  − Observed validation error will be unstable (few validation points)
  − Computational time will be very large (many experiments)
§ Small K
  + The number of experiments, and therefore computation time, is reduced
  + Observed validation error will be stable (many validation points)
  − Validation error cannot approximate the test error well

Common choice: K = 10, α = 0.1
Model selection
Effect of Model Complexity
Can we select good models using hold-out or cross-validation?

(Figure: training error and validation error vs. model complexity, for a fixed number of training data.)
Examples of Model Spaces

Model spaces with increasing complexity:

• Nearest-Neighbor classifiers with increasing neighborhood sizes k = 1, 2, 3, …
  Small neighborhood => Higher complexity
• Decision Trees with increasing depth k or with k leaves
  Higher depth / more leaves => Higher complexity
• Neural Networks with increasing layers or nodes per layer
  More layers / nodes per layer => Higher complexity
• MAP estimates with stronger priors (larger hyper-parameters βH, βT for the Beta distribution, or smaller variance for a Gaussian prior)

How can we select the right complexity model?
Model selection using Hold-out/Cross-validation

• Train models of different complexities and evaluate their validation error using hold-out or cross-validation
• Pick the model with the smallest validation error (averaged over different runs for cross-validation)
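Applied to choosing the neighborhood size k of a k-NN classifier, the recipe might look like this Python sketch (ours, not from the slides; hold-out version for brevity, with `knn_predict` and `select_k` as assumed names).

```python
from collections import Counter

import numpy as np

def knn_predict(X, y, x, k):
    """Majority vote among the k nearest training points."""
    nn = np.argsort(np.linalg.norm(X - x, axis=1))[:k]
    return Counter(y[i] for i in nn).most_common(1)[0][0]

def select_k(X_train, y_train, X_val, y_val, candidates=(1, 3, 5)):
    """Hold-out model selection over k: evaluate each candidate on the
    validation set and pick the one with the smallest validation error."""
    errs = {k: float(np.mean([knn_predict(X_train, y_train, x, k) != yv
                              for x, yv in zip(X_val, y_val)]))
            for k in candidates}
    best_k = min(errs, key=errs.get)   # smallest validation error wins
    return best_k, errs
```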
```matlab
load ionosphere   % UCI dataset: 34 features, 351 samples, binary classification
rng(100)
% Default MinLeafSize = 1
tc = fitctree(X,Y);
cvmodel = crossval(tc);
view(cvmodel.Trained{1},'Mode','graph')
kfoldLoss(cvmodel)
```
Validation error = 0.1254
```matlab
load ionosphere   % UCI dataset: 34 features, 351 samples, binary classification
rng(100)
% Default MinLeafSize = 1; here set to 2
tc = fitctree(X,Y,'MinLeafSize',2);
cvmodel = crossval(tc);
view(cvmodel.Trained{1},'Mode','graph')
kfoldLoss(cvmodel)
```
Validation error = 0.1168
```matlab
load ionosphere   % UCI dataset: 34 features, 351 samples, binary classification
rng(100)
% Default MinLeafSize = 1; here set to 10
tc = fitctree(X,Y,'MinLeafSize',10);
cvmodel = crossval(tc);
view(cvmodel.Trained{1},'Mode','graph')
kfoldLoss(cvmodel)
```
Validation error = 0.1339
Results (fixed # training data):

| MinLeafSize | Validation error |
|-------------|------------------|
| 1           | 0.1254           |
| 2           | 0.1168           |
| 10          | 0.1339           |