Dimension reduction(jiten01)
Dimension Reduction Technique to Handle Multi-Dimensional Data
Prepared by: JITEN DHIMMAR
Roll no: 130870702501
College: Parul Institute of Technology
Major Issues in Data Mining
• Mining methodology and user interaction issues
• Performance issues
• Issues related to the diversity of database types
• Data reduction
• Modeling issues and difficulties
Data Reduction Techniques
• Data cube aggregation
• Dimensionality reduction
• Data compression
• Numerosity reduction
• Discretization and concept hierarchy generation
Figure-1.1: Different Images
Figure-1.2: Representation of Images in Dimension Cell
Figure-1.3: Representation of Images in Three Dimensional Cube
Representation of Fact Table to Multi-dimensional Cube

Table-1.1 Fact table view (sale):
prodId  storeId  amt
p1      s1       12
p2      s1       11
p1      s3       50
p2      s2        8

Table-1.2 Multi-dimensional cube (dimensions = 2; "-" marks an empty cell):
      s1   s2   s3
p1    12    -   50
p2    11    8    -
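The step from the fact table to the two-dimensional cube can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `to_cube` and the nested-dictionary layout are assumptions made here, not something the slides prescribe.

```python
from collections import defaultdict

# Fact table rows from Table-1.1: (prodId, storeId, amt)
sales = [
    ("p1", "s1", 12),
    ("p2", "s1", 11),
    ("p1", "s3", 50),
    ("p2", "s2", 8),
]

def to_cube(rows):
    """Pivot (prodId, storeId, amt) rows into cube[prodId][storeId] = amt."""
    cube = defaultdict(dict)
    for prod, store, amt in rows:
        # Add amounts in case the same (product, store) pair appears twice
        cube[prod][store] = cube[prod].get(store, 0) + amt
    return {prod: dict(cells) for prod, cells in cube.items()}

cube = to_cube(sales)
stores = sorted({store for _, store, _ in sales})
for prod in sorted(cube):
    # Cells with no recorded sale are printed as "-"
    print(prod, [cube[prod].get(s, "-") for s in stores])
```

Each dimension of the fact table (product, store) becomes one axis of the cube, and the measure (amt) becomes the cell value.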
3-D Cube

Table-2.1 Fact table view (sale):
prodId  storeId  date  amt
p1      s1       1     12
p2      s1       1     11
p1      s3       1     50
p2      s2       1      8
p1      s1       2     44
p1      s2       2      4

Table-2.2 Multi-dimensional cube (dimensions = 3; "-" marks an empty cell):
day 1
      s1   s2   s3
p1    12    -   50
p2    11    8    -
day 2
      s1   s2   s3
p1    44    4    -
p2     -    -    -
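Rollup and Drill-down Operation
Example: computing sums. Rollup moves from detailed cells to aggregated totals; drill-down moves in the opposite direction.

Table-2.2 (3-D cube):
day 1
      s1   s2   s3
p1    12    -   50
p2    11    8    -
day 2
      s1   s2   s3
p1    44    4    -
p2     -    -    -

Table-2.3 (summed over date):
      s1   s2   s3
p1    56    4   50
p2    11    8    -

Table-2.4 (sum per store):
      s1   s2   s3
sum   67   12   50

Table-2.5 (sum per product):
      sum
p1    110
p2     19

Grand total: 129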
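The sums in this example can be reproduced with a small sketch. The helper name `rollup` and the column-index constants are assumptions chosen for illustration; the data is the fact table with prodId, storeId, date and amt.

```python
from collections import defaultdict

# Fact table rows: (prodId, storeId, date, amt)
sales = [
    ("p1", "s1", 1, 12),
    ("p2", "s1", 1, 11),
    ("p1", "s3", 1, 50),
    ("p2", "s2", 1, 8),
    ("p1", "s1", 2, 44),
    ("p1", "s2", 2, 4),
]

PROD, STORE, DATE = 0, 1, 2  # column positions of the three dimensions

def rollup(rows, keep):
    """Sum amt over every dimension whose index is not listed in `keep`."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in keep)
        totals[key] += row[3]
    return dict(totals)

by_prod_store = rollup(sales, [PROD, STORE])  # roll up over date
by_store = rollup(sales, [STORE])             # roll up over date and product
by_prod = rollup(sales, [PROD])               # roll up over date and store
grand_total = rollup(sales, [])[()]           # roll up over everything

print(by_prod_store)
print(by_store)
print(by_prod)
print(grand_total)
```

Drill-down is the reverse direction: starting from `by_prod`, adding `STORE` back to `keep` recovers the more detailed `by_prod_store` view.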
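Extended Cube

Table-3.1 (each dimension extended with "*", meaning all values; "-" marks an empty cell):
day 1
      s1   s2   s3    *
p1    12    -   50   62
p2    11    8    -   19
*     23    8   50   81
day 2
      s1   s2   s3    *
p1    44    4    -   48
p2     -    -    -    -
*     44    4    -   48
* (all days)
      s1   s2   s3    *
p1    56    4   50  110
p2    11    8    -   19
*     67   12   50  129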
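One way to fill an extended cube is to add every fact row to each cell it contributes to, with "*" standing in for "all values" of a dimension. The sketch below assumes that representation; the function name `extended_cube` and the tuple keys are illustrative choices only.

```python
from collections import defaultdict
from itertools import product

# Fact table rows: (prodId, storeId, date, amt)
sales = [
    ("p1", "s1", 1, 12),
    ("p2", "s1", 1, 11),
    ("p1", "s3", 1, 50),
    ("p2", "s2", 1, 8),
    ("p1", "s1", 2, 44),
    ("p1", "s2", 2, 4),
]

def extended_cube(rows):
    """Return cube[(prod, store, date)] where any coordinate may be "*" (all)."""
    cube = defaultdict(int)
    for prod, store, date, amt in rows:
        # Each row contributes to 2**3 = 8 cells: every mix of value and "*"
        for key in product((prod, "*"), (store, "*"), (date, "*")):
            cube[key] += amt
    return dict(cube)

cube = extended_cube(sales)
print(cube[("p1", "*", 1)])    # p1, all stores, day 1
print(cube[("*", "s1", "*")])  # all products, store s1, all days
print(cube[("*", "*", "*")])   # grand total
```

Precomputing all "*" cells trades memory for query speed: any rollup becomes a single lookup.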
Dimensionality Reduction
Dimension reduction is a process that removes irrelevant or redundant attributes, which results in a smaller dataset.
This helps reduce the time and memory required by data mining techniques.
It also makes the data easier to visualize and helps eliminate inappropriate features and reduce noise.
Data Reduction Strategies
Why data reduction?
• A database or data warehouse may store terabytes of data.
• Complex data analysis may take a very long time to run on the complete data set.
Data reduction strategies:
• Dimensionality reduction, e.g., removing unimportant attributes
  • Wavelet transforms
  • Principal Components Analysis (PCA)
  • Attribute subset selection
Attribute Subset Selection
Attribute subset selection aims to find a minimum set of attributes such that the resulting probability distribution of the data classes is as close as possible to the original distribution obtained using all attributes.
• Redundant attributes
  • Duplicate much or all of the information contained in one or more other attributes
  • E.g., the purchase price of a product and the amount of sales tax paid
• Irrelevant attributes
  • Contain no information that is useful for the data mining task at hand
  • E.g., a student's ID is often irrelevant to the task of predicting the student's CGPA
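A simple way to spot a redundant numeric attribute is to check how strongly it correlates with another one. The sketch below uses Pearson correlation on made-up data for the purchase price and sales tax example; the values, the flat 10% tax rate, and the 0.95 threshold are all assumptions for illustration, and correlation is only one of several possible redundancy checks.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: sales tax is a flat 10% of the purchase price
purchase_price = [100, 200, 300, 500]
sales_tax = [10, 20, 30, 50]

r = pearson(purchase_price, sales_tax)
is_redundant = abs(r) > 0.95  # illustrative threshold, not a standard value
print(r, is_redundant)
```

Because the tax is fully determined by the price, keeping both attributes adds no information, so one of them can be dropped.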
Heuristic Search in Attribute Selection
Typical heuristic attribute selection methods:
• Best step-wise feature selection:
  • The best single attribute is picked first
  • Then the next best attribute conditioned on the first, ...
• Step-wise attribute elimination:
  • Repeatedly eliminate the worst attribute
• Best combined attribute selection and elimination
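The two greedy strategies can be sketched as follows. Everything here is a hypothetical illustration: the function names, the `score` callback (in practice this could be the accuracy of a model trained on the subset), and the toy weights are assumptions, not part of the original slides.

```python
def forward_selection(attributes, score, k):
    """Pick k attributes greedily: the best single one first, then the
    attribute that scores best together with those already chosen."""
    selected = []
    remaining = list(attributes)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda a: score(selected + [a]))
        selected.append(best)
        remaining.remove(best)
    return selected

def backward_elimination(attributes, score, k):
    """Start with all attributes and repeatedly drop the worst one, i.e. the
    attribute whose removal leaves the best-scoring subset."""
    selected = list(attributes)
    while len(selected) > k:
        worst = max(selected, key=lambda a: score([x for x in selected if x != a]))
        selected.remove(worst)
    return selected

# Toy score (hypothetical): each attribute has a usefulness weight, and
# holding both price and tax earns nothing extra because they are redundant.
WEIGHTS = {"price": 5, "tax": 5, "store": 3, "student_id": 0}

def toy_score(subset):
    total = sum(WEIGHTS[a] for a in subset)
    if "price" in subset and "tax" in subset:
        total -= 5  # redundancy penalty
    return total

attrs = ["price", "tax", "store", "student_id"]
forward = forward_selection(attrs, toy_score, 2)
backward = backward_elimination(attrs, toy_score, 2)
print(forward, backward)
```

Both searches evaluate far fewer subsets than an exhaustive search over all 2^n combinations, at the cost of possibly missing the best subset. A combined method would alternate the two steps.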
• Create new attributes (features) that can capture the important information in a data set more effectively than the original ones