Examples of Dimensionality Reduction
Examples of Dimensionality Reduction
CIS 660 Data Mining
Sunnie Chung
Problem: Curse of Dimensionality
High dimensionality is a problem for machine learning algorithms that classify data by the observed class variables (labels).
How to Deal with High Dimensionality
• Dimensionality Reduction WITHOUT Loss of Information
• Feature Selection
• Feature Extraction
• Identify highly positively correlated features to merge or remove
• Same information: the correlation of height and urefu (height in Swahili) is ~1
• Features X1, X2, X3, X4, X5 that are highly positively correlated with temperature carry largely redundant information
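The redundancy check above can be sketched in NumPy. This is a minimal illustration with made-up data: a feature that is a rescaled copy of another (like height vs. urefu) is detected through the correlation matrix and dropped.

```python
import numpy as np

# Hypothetical data: x2 is essentially a rescaled copy of x1
# (like height in inches vs. urefu in cm), so one of the pair is redundant.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 2.54 * x1 + rng.normal(scale=0.01, size=200)  # near-duplicate of x1
x3 = rng.normal(size=200)                          # independent feature
X = np.column_stack([x1, x2, x3])

corr = np.corrcoef(X, rowvar=False)    # pairwise Pearson correlations

# Drop one feature from each pair whose |correlation| exceeds a threshold.
threshold = 0.95
to_drop = set()
for i in range(corr.shape[0]):
    for j in range(i + 1, corr.shape[1]):
        if abs(corr[i, j]) > threshold:
            to_drop.add(j)             # keep the earlier feature, drop the later
X_reduced = np.delete(X, sorted(to_drop), axis=1)
print(X_reduced.shape)                 # (200, 2): x2 removed as redundant
```

The 0.95 threshold is an illustrative choice; in practice it depends on the data and the downstream model.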
• Apply transformation algorithms to extract the true (intrinsic) dimensions, or simply to reduce the number of dimensions
• Apply well-known data reduction methods such as PCA or SVD
PCA Procedure
1. Find the eigenvectors of the covariance matrix
2. The eigenvectors define the new space
Project the two red points onto the blue eigenvector e, which preserves the greatest variability (spread of the data), rather than onto the green eigenvector, where the projections of the original two red points collapse onto nearly the same location.
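The two steps above can be sketched directly in NumPy on toy data (the data and dimensions here are illustrative): compute the covariance matrix of the centered data, take its eigenvectors, and project onto the eigenvector with the largest eigenvalue, the direction of greatest variance.

```python
import numpy as np

# Toy 2-D data stretched along one direction, so most of the variance
# lies along a single axis (the "blue e" in the slide).
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 2)) @ np.array([[3.0, 1.0], [1.0, 1.0]])

# Step 1: center the data and find the eigenvectors of the covariance matrix.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order

# Step 2: the eigenvectors define the new space; keep the eigenvector with
# the largest eigenvalue (greatest variance) and project onto it.
order = np.argsort(eigvals)[::-1]
top = eigvecs[:, order[:1]]              # first principal component
X_proj = Xc @ top                        # 1-D projection
print(X_proj.shape)                      # (100, 1)

# Fraction of total variance preserved by the top component.
print(eigvals[order[0]] / eigvals.sum())
```

For this stretched toy data the top component captures well over half of the total variance, which is why projecting onto it (rather than the orthogonal direction) loses the least information.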
See the rest of the procedure here.
http://eecs.csuohio.edu/~sschung/CIS660/MahalanobisDistance.pdf
https://plot.ly/ipython-notebooks/principal-component-analysis/
https://www.youtube.com/watch?v=IbE0tbjy6JQ&list=PLBv09BD7ez_5_yapAg86Od6JeeypkS4YM
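Since the slides mention SVD alongside PCA, a short sketch (with made-up data) shows the connection: the right singular vectors of the centered data matrix are the covariance eigenvectors, and each eigenvalue equals the corresponding squared singular value divided by n - 1.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
Xc = X - X.mean(axis=0)              # center, as PCA requires

# SVD of the centered data matrix: rows of Vt are the principal directions.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2                                # number of dimensions to keep
X_reduced = Xc @ Vt[:k].T            # project onto the top-k components
print(X_reduced.shape)               # (50, 2)

# The eigenvalues of the covariance matrix match s**2 / (n - 1).
eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]  # descending
print(np.allclose(eigvals[:k], s[:k] ** 2 / (len(X) - 1)))    # True
```

The SVD route avoids forming the covariance matrix explicitly and is numerically preferable for wide or ill-conditioned data.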
When to Use PCA for Your Data Analytics