Dimension reduction(jiten01)

Post on 25-May-2015


Dimension Reduction Technique to Handle Multi-Dimensional Data


Prepared by: JITEN DHIMMAR

Roll no: 130870702501

College: Parul Institute of Technology

Major Issues in Data Mining

• Mining methodology and user interaction issues
• Performance issues
• Issues related to the diversity of database types
• Data reduction
• Modeling issues and difficulties

Data Reduction Techniques

• Data cube aggregation
• Dimensionality reduction
• Data compression
• Numerosity reduction
• Discretization and concept hierarchy generation

Figure-1.1: Different Images

Figure-1.2: Representation of Images in Dimension Cell

Figure-1.3: Representation of Images in Three Dimensional Cube

Representation of Fact Table to Multi-dimensional Cube

Fact table view (Table-1.1), table "sale":

prodId  storeId  amt
p1      s1       12
p2      s1       11
p1      s3       50
p2      s2       8

Multi-dimensional cube (Table-1.2), dimensions = 2 ("-" marks an empty cell):

      s1   s2   s3
p1    12   -    50
p2    11   8    -
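The step from the fact table to the two-dimensional cube can be sketched in a few lines of Python. This is only an illustrative sketch using the rows of Table-1.1; the function name `build_cube` and the choice of a dictionary keyed by (prodId, storeId) are assumptions made for this example, not something the slides prescribe.

```python
from collections import defaultdict

# Fact table rows from Table-1.1: (prodId, storeId, amt)
sale = [
    ("p1", "s1", 12),
    ("p2", "s1", 11),
    ("p1", "s3", 50),
    ("p2", "s2", 8),
]

def build_cube(rows):
    """Map each (prodId, storeId) cell to the total amt recorded for it.

    Hypothetical helper: cells with no fact row are simply absent.
    """
    cube = defaultdict(int)
    for prod, store, amt in rows:
        cube[(prod, store)] += amt
    return dict(cube)

cube = build_cube(sale)
products = sorted({p for p, _ in cube})
stores = sorted({s for _, s in cube})

# Print the cube as a product-by-store grid; "-" marks an empty cell.
print("    " + " ".join(f"{s:>4}" for s in stores))
for p in products:
    cells = [cube.get((p, s), "-") for s in stores]
    print(f"{p:<4}" + " ".join(f"{c:>4}" for c in cells))
```

Each dimension of the cube corresponds to one key column of the fact table, and the measure (amt) fills the cells.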

3-D Cube

Fact table view (Table-2.1), table "sale":

prodId  storeId  date  amt
p1      s1       1     12
p2      s1       1     11
p1      s3       1     50
p2      s2       1     8
p1      s1       2     44
p1      s2       2     4

Multi-dimensional cube (Table-2.2), dimensions = 3 ("-" marks an empty cell):

day 1
      s1   s2   s3
p1    12   -    50
p2    11   8    -

day 2
      s1   s2   s3
p1    44   4    -
p2    -    -    -

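Rollup and Drill-down Operation

Example: computing sums

Table-2.2 (amt by product, store and day):

day 1
      s1   s2   s3
p1    12   -    50
p2    11   8    -

day 2
      s1   s2   s3
p1    44   4    -
p2    -    -    -

Table-2.3 (summed over days):

      s1   s2   s3
p1    56   4    50
p2    11   8    -

Table-2.4 (summed over days and products):

      s1   s2   s3
sum   67   12   50

Table-2.5 (summed over days and stores):

      sum
p1    110
p2    19

Grand total: 129

Rollup moves from the detailed cube towards the summed tables; drill-down moves in the opposite direction.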
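The sums in this example can be reproduced with a small rollup routine. The sketch below is an assumption-laden illustration: the helper `rollup` and the `DIMS` tuple are names invented here, and the data are the six rows of the three-dimensional fact table (Table-2.1).

```python
from collections import defaultdict

# Fact table rows from Table-2.1: (prodId, storeId, date, amt)
sale = [
    ("p1", "s1", 1, 12),
    ("p2", "s1", 1, 11),
    ("p1", "s3", 1, 50),
    ("p2", "s2", 1, 8),
    ("p1", "s1", 2, 44),
    ("p1", "s2", 2, 4),
]

DIMS = ("prodId", "storeId", "date")  # column order of the key part of a row

def rollup(rows, keep):
    """Sum amt over every dimension NOT named in `keep` (hypothetical helper)."""
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in idx)
        totals[key] += row[-1]  # last column is the measure, amt
    return dict(totals)

by_prod_store = rollup(sale, ("prodId", "storeId"))  # roll up over date
by_store = rollup(sale, ("storeId",))                # roll up over date and product
by_prod = rollup(sale, ("prodId",))                  # roll up over date and store
grand_total = rollup(sale, ())                       # roll up over everything

print(by_prod_store)
print(by_store)
print(by_prod)
print(grand_total)
```

Drill-down corresponds to calling `rollup` again with more dimensions in `keep`, which is only possible because the detailed fact rows are still available.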

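Extended Cube (Table-3.1)

"*" stands for the sum over all values of that dimension; "-" marks an empty cell.

day 1
      s1   s2   s3   *
p1    12   -    50   62
p2    11   8    -    19
*     23   8    50   81

day 2
      s1   s2   s3   *
p1    44   4    -    48
p2    -    -    -    -
*     44   4    -    48

* (all days)
      s1   s2   s3   *
p1    56   4    50   110
p2    11   8    -    19
*     67   12   50   129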
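One way to picture the extended cube is that every fact row contributes to eight cells: each of its three dimension values is either kept or replaced by "*". The following sketch assumes that reading; `extended_cube` is a hypothetical helper name, and the rows are those of the three-dimensional fact table.

```python
from collections import defaultdict
from itertools import product

# Fact table rows: (prodId, storeId, date, amt)
sale = [
    ("p1", "s1", 1, 12),
    ("p2", "s1", 1, 11),
    ("p1", "s3", 1, 50),
    ("p2", "s2", 1, 8),
    ("p1", "s1", 2, 44),
    ("p1", "s2", 2, 4),
]

def extended_cube(rows):
    """Return a dict of all non-empty cells, including the "*" margins."""
    cube = defaultdict(int)
    for prod, store, date, amt in rows:
        # Keep each dimension value or replace it with "*": 2**3 = 8 cells.
        for key in product((prod, "*"), (store, "*"), (date, "*")):
            cube[key] += amt
    return dict(cube)

cube = extended_cube(sale)
print(cube[("p1", "*", 1)])     # p1, all stores, day 1
print(cube[("*", "s1", "*")])   # all products, s1, all days
print(cube[("*", "*", "*")])    # grand total
```

Materializing every margin like this trades storage for query speed, since any rollup becomes a single lookup.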

Dimensionality Reduction

Dimensionality reduction is the process of removing irrelevant or redundant attributes (dimensions) from a data set, which results in a smaller data set.

This helps reduce the time and memory required by data mining techniques.

Visualization of the data becomes easier. It also helps eliminate irrelevant features and reduce noise.

Data Reduction Strategies

Why data reduction? A database or data warehouse may store terabytes of data. Complex data analysis may take a very long time to run on the complete data set.

Data reduction strategies:
• Dimensionality reduction, e.g., removing unimportant attributes
  - Wavelet transforms
  - Principal Components Analysis (PCA)
  - Attribute subset selection

Attribute Subset Selection

Attribute subset selection is the task of finding a minimum set of attributes such that the resulting probability distribution of the data classes is as close as possible to the original distribution obtained using all attributes.

Redundant attributes
• Duplicate much or all of the information contained in one or more other attributes
• E.g., the purchase price of a product and the amount of sales tax paid

Irrelevant attributes
• Contain no information that is useful for the data mining task at hand
• E.g., a student's ID is often irrelevant to the task of predicting the student's CGPA
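A simple way to spot a redundant pair such as purchase price and sales tax is to measure how strongly the two attributes are correlated. The sketch below is only an illustration: the price values, the 5% tax rate and the 0.95 threshold are made-up assumptions, and Pearson correlation is one possible redundancy check, not one named in the slides.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y)

# Hypothetical data: sales tax is a fixed 5% of the purchase price.
price = [100.0, 250.0, 80.0, 40.0, 500.0]
tax = [0.05 * p for p in price]

r = pearson(price, tax)
# Assumed rule of thumb for this sketch: |r| above 0.95 counts as redundant.
redundant = abs(r) > 0.95
print(round(r, 6), redundant)
```

When two attributes are this strongly correlated, one of them can usually be dropped without losing information for the mining task.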

Heuristic Search in Attribute Selection

Typical heuristic attribute selection methods:
• Best step-wise feature selection: the best single attribute is picked first, then the next best attribute conditioned on the first, ...
• Step-wise attribute elimination: repeatedly eliminate the worst attribute
• Best combined attribute selection and elimination

• Create new attributes (features) that can capture the important information in a data set more effectively than the original ones
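The two step-wise heuristics above can be sketched as greedy loops around a scoring function. Everything concrete here is an assumption for illustration: the attribute names, the integer weights in `WEIGHT`, and `toy_score` itself, which stands in for whatever quality measure (for example, classifier accuracy) a real task would use.

```python
# Hypothetical usefulness of each attribute for predicting CGPA.
WEIGHT = {"study_hours": 5, "study_minutes": 5, "attendance": 3, "student_id": 0}

def toy_score(subset):
    """Toy quality measure: sum of weights, with no credit for a redundant copy."""
    chosen = set(subset)
    total = sum(WEIGHT[a] for a in chosen)
    if {"study_hours", "study_minutes"} <= chosen:
        total -= WEIGHT["study_minutes"]  # same information stored twice
    return total

def forward_selection(attributes, score, k):
    """Step-wise selection: repeatedly add the attribute that helps most."""
    selected, remaining = [], list(attributes)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda a: score(selected + [a]))
        selected.append(best)
        remaining.remove(best)
    return selected

def backward_elimination(attributes, score, k):
    """Step-wise elimination: repeatedly drop the attribute that is missed least."""
    selected = list(attributes)
    while len(selected) > k:
        worst = max(selected, key=lambda a: score([x for x in selected if x != a]))
        selected.remove(worst)
    return selected

attrs = ["student_id", "study_hours", "study_minutes", "attendance"]
forward = forward_selection(attrs, toy_score, 2)
backward = backward_elimination(attrs, toy_score, 2)
print(forward)
print(backward)
```

Both loops evaluate only a small fraction of the possible subsets, which is why they are heuristics: they are fast but are not guaranteed to find the best subset.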
