Examples of Dimensionality Reduction
Examples of Dimensionality Reduction
CIS 660 Data Mining
Sunnie Chung
Problem: Curse of Dimensionality
High dimensionality is a problem for machine learning algorithms that classify data by the observed class variables (labels).
How to Deal with High Dimensionality
• Dimensionality Reduction WITHOUT Loss of Information
• Feature Selection
• Feature Extraction
• Identify highly positively correlated features to merge or remove
• Same information: the correlation of height and urefu (height in Swahili) is ~1
• Features X1, X2, X3, X4, X5 that are highly positively correlated with temperature carry largely redundant information
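The redundancy check above can be sketched in NumPy. This is a minimal illustration with made-up data: a feature that is a rescaled copy of another (like height vs. urefu) is detected through the correlation matrix and dropped.

```python
import numpy as np

# Hypothetical data: x2 is essentially a rescaled copy of x1
# (like height in inches vs. urefu in cm), so one of the pair is redundant.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 2.54 * x1 + rng.normal(scale=0.01, size=200)  # near-duplicate of x1
x3 = rng.normal(size=200)                          # independent feature
X = np.column_stack([x1, x2, x3])

corr = np.corrcoef(X, rowvar=False)    # pairwise Pearson correlations

# Drop one feature from each pair whose |correlation| exceeds a threshold.
threshold = 0.95
to_drop = set()
for i in range(corr.shape[0]):
    for j in range(i + 1, corr.shape[1]):
        if abs(corr[i, j]) > threshold:
            to_drop.add(j)             # keep the earlier feature, drop the later
X_reduced = np.delete(X, sorted(to_drop), axis=1)
print(X_reduced.shape)                 # (200, 2): x2 removed as redundant
```

The 0.95 threshold is an illustrative choice; in practice it depends on the data and the downstream model.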
• Apply transformation algorithms to extract the true (intrinsic) dimensions, or simply to reduce the number of dimensions
• Apply well-known data reduction methods such as PCA or SVD
PCA Procedure
1. Find the eigenvectors of the covariance matrix
2. The eigenvectors define the new space
Project the two red points onto the blue eigenvector e, which preserves the greatest variability (spread of the data), rather than onto the green eigenvector, where the projections of the original two red points collapse onto nearly the same location.
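The two steps above can be sketched directly in NumPy on toy data (the data and dimensions here are illustrative): compute the covariance matrix of the centered data, take its eigenvectors, and project onto the eigenvector with the largest eigenvalue, the direction of greatest variance.

```python
import numpy as np

# Toy 2-D data stretched along one direction, so most of the variance
# lies along a single axis (the "blue e" in the slide).
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 2)) @ np.array([[3.0, 1.0], [1.0, 1.0]])

# Step 1: center the data and find the eigenvectors of the covariance matrix.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order

# Step 2: the eigenvectors define the new space; keep the eigenvector with
# the largest eigenvalue (greatest variance) and project onto it.
order = np.argsort(eigvals)[::-1]
top = eigvecs[:, order[:1]]              # first principal component
X_proj = Xc @ top                        # 1-D projection
print(X_proj.shape)                      # (100, 1)

# Fraction of total variance preserved by the top component.
print(eigvals[order[0]] / eigvals.sum())
```

For this stretched toy data the top component captures well over half of the total variance, which is why projecting onto it (rather than the orthogonal direction) loses the least information.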
See the rest of the procedure here.
http://eecs.csuohio.edu/~sschung/CIS660/MahalanobisDistance.pdf
https://plot.ly/ipython-notebooks/principal-component-analysis/
https://www.youtube.com/watch?v=IbE0tbjy6JQ&list=PLBv09BD7ez_5_yapAg86Od6JeeypkS4YM
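Since the slides mention SVD alongside PCA, a short sketch (with made-up data) shows the connection: the right singular vectors of the centered data matrix are the covariance eigenvectors, and each eigenvalue equals the corresponding squared singular value divided by n - 1.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
Xc = X - X.mean(axis=0)              # center, as PCA requires

# SVD of the centered data matrix: rows of Vt are the principal directions.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2                                # number of dimensions to keep
X_reduced = Xc @ Vt[:k].T            # project onto the top-k components
print(X_reduced.shape)               # (50, 2)

# The eigenvalues of the covariance matrix match s**2 / (n - 1).
eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]  # descending
print(np.allclose(eigvals[:k], s[:k] ** 2 / (len(X) - 1)))    # True
```

The SVD route avoids forming the covariance matrix explicitly and is numerically preferable for wide or ill-conditioned data.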
When to Use PCA for Your Data Analytics