Introduction to Data Mining
EVALUATION AND VISUALIZATION OF DIFFERENT DATA MINING TECHNIQUES
INTRODUCTION TO DATAMINING BY SUMAIRA S.
Data Mining Process
The purpose of this project is to gain an understanding of the process of data mining by:
  Implementing one or more data mining algorithms
  Visualizing them
  Comparing their performance on datasets
Another aim was to provide visual tutorials and detailed help about these algorithms.
WHAT IS DATA MINING?
Originally developed to act as expert systems to solve problems
Data Mining can be utilized in any organization that needs to find patterns or relationships in their data.
Different types of Data Mining
BASIC FEATURES OF THE PROJECT
  Handling different types of data
  Pre-processing of data
  Implementation of algorithms
  Visualization of the data mining model
  Comparison of different data mining algorithms
  Help and visual tutorials
HANDLING DIFFERENT DATA FORMATS
The system supports the following types of data files:
  Text data file handling
    CSV (Comma-Separated Values) file
    Any user-defined format
  Database data file handling
    MS Access data file
    MS SQL data file
  XML data file handling
    XML data file
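As a rough illustration of the text and XML cases listed above, the following sketch reads comma-separated text, text with a user-chosen delimiter, and a simple XML layout. The function names, the `|` delimiter, and the `<record>` element layout are assumptions made for this example; the slides do not describe the project's actual file readers.

```python
import csv
import io
import xml.etree.ElementTree as ET


def read_delimited(text, delimiter=","):
    """Parse delimited text into a list of rows (each row is a list of strings)."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [row for row in reader if row]  # skip blank lines


def read_xml_records(text, record_tag="record"):
    """Parse XML in which each <record> element holds one row of child fields.

    The <record> layout is a hypothetical format chosen for this sketch.
    """
    root = ET.fromstring(text)
    return [{field.tag: field.text for field in rec} for rec in root.iter(record_tag)]


# CSV file content
csv_rows = read_delimited("x,y\n1.0,2.0\n3.5,4.5\n")
# A user-defined format, here assumed to be pipe-separated
custom_rows = read_delimited("1.0|2.0\n3.5|4.5\n", delimiter="|")
# XML file content
xml_rows = read_xml_records("<data><record><x>1.0</x><y>2.0</y></record></data>")
```

Database sources such as MS Access or MS SQL would need a driver and a connection string, which are outside the scope of this sketch.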
PRE PROCESSING OF DATA
Pre-processing of data includes:
  Filling of missing values
  Ignoring the row
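The two pre-processing options above can be sketched as follows. The slides do not say how missing values are filled, so filling with the column mean is an assumption here, as are the function names and the use of `None` to mark a missing value.

```python
def fill_missing_with_mean(rows):
    """Replace None entries with the mean of the observed values in that column.

    Mean filling is an assumed strategy; the source only says "filling of missing values".
    """
    n_cols = len(rows[0])
    means = []
    for c in range(n_cols):
        observed = [r[c] for r in rows if r[c] is not None]
        means.append(sum(observed) / len(observed) if observed else None)
    return [[means[c] if v is None else v for c, v in enumerate(r)] for r in rows]


def ignore_incomplete_rows(rows):
    """Drop every row that contains at least one missing value."""
    return [r for r in rows if all(v is not None for v in r)]


data = [[1.0, 2.0], [None, 4.0], [3.0, None]]
filled = fill_missing_with_mean(data)   # missing entries replaced by column means
kept = ignore_incomplete_rows(data)     # only the fully observed row survives
```

Filling keeps every record at the cost of introducing estimated values, while ignoring rows keeps only observed values at the cost of shrinking the dataset.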
ALGORITHMS’ IMPLEMENTATION
Clustering
  Partitional clustering algorithm
    K-Means algorithm
  Hierarchical clustering algorithms
    Single linkage algorithm
    Weighted average algorithm
    Complete linkage algorithm
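The three hierarchical variants listed above differ only in how the distance from a merged cluster to every other cluster is computed. The sketch below shows that update rule for one merge. It assumes that "weighted average" refers to the usual weighted average linkage (WPGMA); the slides do not confirm this, and the function name is hypothetical.

```python
def merged_distance(d_ki, d_kj, method):
    """Distance from cluster k to the cluster formed by merging clusters i and j.

    d_ki and d_kj are the distances from k to i and from k to j before the merge.
    """
    if method == "single":      # nearest members decide
        return min(d_ki, d_kj)
    if method == "complete":    # farthest members decide
        return max(d_ki, d_kj)
    if method == "weighted":    # assumed WPGMA: plain average of the two distances
        return (d_ki + d_kj) / 2.0
    raise ValueError(f"unknown linkage method: {method}")


# Cluster k is at distance 2.0 from i and 6.0 from j.
for name in ("single", "complete", "weighted"):
    print(name, merged_distance(2.0, 6.0, name))
```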
VISUALIZATION OF DATA MINING MODEL
  XY scatter chart visualization
  Dendrogram
  Pie chart
  Curve graph
COMPARISON OF DIFFERENT DATA MINING ALGORITHMS
Data File Comparison
Running time
Memory Usage
CPU Usage
Precision/Recall
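One way such measurements could be collected is sketched below with the Python standard library. The slides do not say how the project measured these quantities, so the helper names, the use of `tracemalloc` for memory, and the pair-counting definition of precision and recall for clusterings are all assumptions.

```python
import time
import tracemalloc
from itertools import combinations


def measure(func, *args):
    """Run func once and return (result, wall_seconds, cpu_seconds, peak_bytes)."""
    tracemalloc.start()
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = func(*args)
    cpu_seconds = time.process_time() - cpu_start
    wall_seconds = time.perf_counter() - wall_start
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, wall_seconds, cpu_seconds, peak_bytes


def pairwise_precision_recall(predicted, reference):
    """Pair-counting precision and recall of predicted cluster labels.

    A pair of points is a true positive when both labelings put it in the same cluster.
    """
    tp = fp = fn = 0
    for i, j in combinations(range(len(predicted)), 2):
        same_pred = predicted[i] == predicted[j]
        same_ref = reference[i] == reference[j]
        if same_pred and same_ref:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_ref:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


result, wall, cpu, peak = measure(sorted, [3, 1, 2])
precision, recall = pairwise_precision_recall([0, 0, 1, 1], [0, 0, 0, 1])
```

Running each algorithm through the same `measure` wrapper on the same data file gives figures that can be compared side by side.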
K-MEANS ALGORITHM
K-means was introduced by MacQueen in 1967.
THE K-MEANS CLUSTERING METHOD
[Figure: sequence of scatter plots, both axes from 0 to 10, illustrating K-means with K=2]
1. Arbitrarily choose K objects as the initial cluster centers
2. Assign each object to the most similar center
3. Update the cluster means
4. Reassign the objects and update the cluster means again
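The K-means steps above can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: the function names, the squared Euclidean distance as the similarity measure, the fixed random seed, and the stopping rule (stop when no assignment changes) are assumptions.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, max_iterations=100, seed=0):
    """Return (centers, assignment) after running basic K-means."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # arbitrarily choose K objects as initial centers
    assignment = None
    for _ in range(max_iterations):
        # Assign each object to the most similar (nearest) center.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # nothing was reassigned, so the means would not change
        assignment = new_assignment
        # Update the cluster means.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old center if a cluster became empty
                centers[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centers, assignment


points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centers, assignment = k_means(points, k=2)
```

On this small example the two well-separated groups of three points end up in different clusters. In general the result depends on the initial centers, which is why the seed is exposed as a parameter.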
SINGLE LINKAGE HIERARCHICAL CLUSTERING
1. Say “Every point is its own cluster”
2. Find “most similar” pair of clusters
3. Merge them into a parent cluster
4. Repeat steps 2 and 3 until only one cluster remains
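The four steps above can be sketched as a naive single linkage procedure. It assumes Euclidean distance as the similarity measure and records each merge so that a dendrogram could be drawn from the history; the function name and the return format are choices made for this example, and `math.dist` requires Python 3.8 or later.

```python
import math


def single_linkage(points):
    """Return the merge history as (cluster_a, cluster_b, distance) tuples.

    Clusters are lists of point indices. The distance between two clusters is
    the smallest distance between any pair of their members.
    """
    clusters = [[i] for i in range(len(points))]  # step 1: every point is its own cluster
    merges = []
    while len(clusters) > 1:                      # step 4: repeat
        best = None
        # Step 2: find the most similar (closest) pair of clusters.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(
                    math.dist(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        # Step 3: merge the pair into a parent cluster.
        merges.append((clusters[a], clusters[b], d))
        parent = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)] + [parent]
    return merges


points = [(0, 0), (0, 1), (5, 0), (5, 3)]
merges = single_linkage(points)
```

This version recomputes all pairwise cluster distances on every pass, which is simple to follow but slow on large datasets.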
THANK YOU
Presentation By:
Sumaira Sohail.