Introduction to Data Mining

EVALUATION AND VISUALIZATION OF DIFFERENT DATA MINING TECHNIQUES

INTRODUCTION TO DATA MINING BY SUMAIRA S.

description

An introduction to the data mining process and the implementation of three algorithms.

Transcript of Introduction to Data Mining

Page 1: Introduction to Data Mining

EVALUATION AND VISUALIZATION OF DIFFERENT DATA MINING TECHNIQUES

Page 2: Introduction to Data Mining

Data Mining Process

Page 3: Introduction to Data Mining

The purpose of this project is to gain an understanding of the data mining process by:

Implementing one or more data mining algorithms

Visualizing them

Comparing their performance on datasets

Another aspect was to provide visual tutorials and detailed help about these algorithms.

Page 4: Introduction to Data Mining

WHAT IS DATA MINING?

Data mining techniques were originally developed to act as expert systems that solve problems.

Data mining can be used in any organization that needs to find patterns or relationships in its data.

Different types of Data Mining

Page 5: Introduction to Data Mining

BASIC FEATURES OF THE PROJECT

Handling different types of data

Preprocessing of data

Implementation of algorithms

Visualization of the data mining model

Comparison of different data mining algorithms

Help and visual tutorials

Page 6: Introduction to Data Mining

HANDLING DIFFERENT DATA FORMATS

The system supports the following types of data files:

Text Data File Handling:

CSV (Comma-Separated Values) File

Any User-Defined Format

Database Data File Handling:

MS Access Data File

MS SQL Data File

XML Data File Handling:

XML Data File
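As a rough illustration of the text and XML formats listed above, the sketch below reads the same two records from a CSV string, a pipe-delimited string (standing in for a user-defined format), and an XML string. The function names, the `|` delimiter, and the `<record>` element name are assumptions made for this example; the slides do not describe how the project itself parses files. Database sources are left out.

```python
import csv
import io
import xml.etree.ElementTree as ET


def read_delimited(text, delimiter=","):
    """Parse delimited text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]


def read_xml(text, record_tag="record"):
    """Parse XML text; each <record_tag> element becomes one dict (hypothetical layout)."""
    root = ET.fromstring(text)
    return [{child.tag: child.text for child in rec} for rec in root.iter(record_tag)]


csv_text = "x,y\n1.0,2.0\n3.5,4.5\n"
custom_text = "x|y\n1.0|2.0\n3.5|4.5\n"  # assumed user-defined format
xml_text = (
    "<data>"
    "<record><x>1.0</x><y>2.0</y></record>"
    "<record><x>3.5</x><y>4.5</y></record>"
    "</data>"
)

csv_rows = read_delimited(csv_text)
custom_rows = read_delimited(custom_text, delimiter="|")
xml_rows = read_xml(xml_text)
```

All three calls yield the same list of records, with values still stored as strings; converting them to numbers would be a separate step.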

Page 7: Introduction to Data Mining

PREPROCESSING OF DATA

Preprocessing of data includes:

Filling of missing values

Ignoring the row
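The two options above can be sketched as follows. Missing values are represented as `None`, and the fill strategy shown is the column mean; both are assumptions for this example, since the slides do not say which fill value the project uses.

```python
def fill_missing_with_mean(rows):
    """Replace None in each column with the mean of that column's known values."""
    n_cols = len(rows[0])
    means = []
    for c in range(n_cols):
        known = [r[c] for r in rows if r[c] is not None]
        # Fall back to 0.0 if a column has no known values at all (assumed default).
        means.append(sum(known) / len(known) if known else 0.0)
    return [[means[c] if v is None else v for c, v in enumerate(r)] for r in rows]


def ignore_incomplete_rows(rows):
    """Keep only the rows that have no missing values."""
    return [r for r in rows if all(v is not None for v in r)]


data = [[1.0, 2.0], [None, 4.0], [3.0, None]]
filled = fill_missing_with_mean(data)
kept = ignore_incomplete_rows(data)
```

Filling keeps all three rows, while ignoring incomplete rows leaves only the first one, so the choice affects how much data the clustering step sees.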

Page 8: Introduction to Data Mining

ALGORITHMS’ IMPLEMENTATION

Clustering

Partitional Clustering Algorithm:

K-Means Algorithm

Hierarchical Clustering Algorithms:

Single Linkage Algorithm

Weighted Average Algorithm

Complete Linkage Algorithm
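The three hierarchical algorithms differ mainly in how the distance from a newly merged cluster to every other cluster is computed. The sketch below shows one common formulation of that update rule. It assumes that "Weighted Average" refers to the weighted average linkage usually called WPGMA; the slides do not confirm this, and the function name is hypothetical.

```python
def merged_distance(d_ik, d_jk, method):
    """Distance from the merged cluster (i and j combined) to another cluster k.

    d_ik and d_jk are the distances from clusters i and j to cluster k
    before the merge.
    """
    if method == "single":
        return min(d_ik, d_jk)        # closest pair decides
    if method == "complete":
        return max(d_ik, d_jk)        # farthest pair decides
    if method == "weighted":
        return (d_ik + d_jk) / 2.0    # WPGMA: plain average of the two distances
    raise ValueError(f"unknown linkage method: {method}")


single = merged_distance(2.0, 6.0, "single")
complete = merged_distance(2.0, 6.0, "complete")
weighted = merged_distance(2.0, 6.0, "weighted")
```

With distances 2.0 and 6.0, single linkage gives 2.0, complete linkage gives 6.0, and the weighted average gives 4.0.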

Page 9: Introduction to Data Mining

VISUALIZATION OF DATA MINING MODEL

XY Scatter Chart Visualization

Dendrogram

Pie Chart

Curve Graph

Page 10: Introduction to Data Mining

COMPARISON OF DIFFERENT DATA MINING ALGORITHMS

Data File Comparison

Running time

Memory Usage

CPU Usage

Precision/Recall
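A minimal sketch of how these measures might be collected for one algorithm run, using only the Python standard library. Here "CPU usage" is approximated by process CPU time and "memory usage" by peak traced allocations; these interpretations, and the helper names, are assumptions rather than the project's actual measurement method.

```python
import time
import tracemalloc


def measure(func, *args):
    """Run func(*args) and return (result, wall_seconds, cpu_seconds, peak_bytes)."""
    tracemalloc.start()
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = func(*args)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, wall, cpu, peak


def precision_recall(predicted, relevant):
    """Precision and recall of a predicted set against a reference set."""
    predicted, relevant = set(predicted), set(relevant)
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Built-in sorted() stands in for a clustering algorithm here.
result, wall, cpu, peak = measure(sorted, [3, 1, 2])
p, r = precision_recall({"a", "b", "c"}, {"b", "c", "d", "e"})
```

In the last line, two of the three predicted items are relevant, so precision is 2/3; two of the four relevant items were found, so recall is 0.5.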

Page 11: Introduction to Data Mining

K-MEANS ALGORITHM

K-means was introduced by MacQueen in 1967.

Page 12: Introduction to Data Mining

THE K-MEANS CLUSTERING METHOD

[Figure: five scatter-plot panels, each with both axes running from 0 to 10, illustrating k-means with K=2]

K=2

Arbitrarily choose K objects as the initial cluster centers

Assign each object to the most similar center

Update the cluster means

Reassign

Update the cluster means

Reassign
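The steps in this figure can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: it assumes Euclidean distance as the similarity measure, takes the initial centers as an argument instead of choosing them at random, and uses made-up sample points.

```python
import math


def kmeans(points, centers, max_iter=100):
    """Basic k-means: alternate assignment and mean update until nothing changes."""
    centers = [tuple(c) for c in centers]
    assignment = None
    for _ in range(max_iter):
        # Assign each point to the index of its nearest (most similar) center.
        new_assignment = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point was reassigned, so the clusters are stable
        assignment = new_assignment
        # Update each center to the mean of the points assigned to it.
        for k in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                centers[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, assignment


points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
# K=2, with the first two points taken as the initial centers.
centers, labels = kmeans(points, centers=[points[0], points[1]])
```

Even though both initial centers start in the lower-left group, the update and reassign steps move the second center toward the upper-right group, and the run ends with the two groups separated.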

Page 13: Introduction to Data Mining

SINGLE LINKAGE HIERARCHICAL CLUSTERING

1. Say “Every point is its own cluster”

2. Find “most similar” pair of clusters

Page 14: Introduction to Data Mining

SINGLE LINKAGE HIERARCHICAL CLUSTERING

1. Say “Every point is its own cluster”

2. Find “most similar” pair of clusters

3. Merge them into a parent cluster

Page 15: Introduction to Data Mining

SINGLE LINKAGE HIERARCHICAL CLUSTERING

1. Say “Every point is its own cluster”

2. Find “most similar” pair of clusters

3. Merge them into a parent cluster

4. Repeat steps 2 and 3

Page 16: Introduction to Data Mining

SINGLE LINKAGE HIERARCHICAL CLUSTERING

1. Say “Every point is its own cluster”

2. Find “most similar” pair of clusters

3. Merge them into a parent cluster

4. Repeat steps 2 and 3
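The four steps can be sketched as a small Python function. It assumes Euclidean distance and defines the "most similar" pair as the two clusters whose closest members are nearest to each other, which is the single linkage criterion. The sample points and the function name are made up for illustration, and this naive version recomputes all distances on every pass.

```python
import math


def single_linkage(points):
    """Agglomerative clustering; returns the merges as (cluster_a, cluster_b, distance)."""
    # Step 1: every point is its own cluster (clusters hold point indices).
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Step 2: find the pair of clusters whose closest members are nearest.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(
                    math.dist(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        # Step 3: merge the pair into a parent cluster.
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
        # Step 4: the loop repeats until a single cluster remains.
    return merges


points = [(0, 0), (0, 1), (5, 0), (5, 3)]
merges = single_linkage(points)
```

The returned list of merges, read in order with its distances, is the information a dendrogram displays.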

Page 17: Introduction to Data Mining

THANK YOU

Presentation By:

Sumaira Sohail.
