Data Mining Knowledge on rough set theory SUSHIL KUMAR SAHU



What is Data Mining?

Extraction of knowledge from data

Exploration and analysis of large quantities of data to discover meaningful patterns.

Discover Knowledge


Why Data Mining?

Data mining is used in pattern matching and in restoring the original picture from a noisy one, as well as in medicine, business, etc.

What data mining does: it finds relationships and makes predictions.


Types of Data Mining

Relational data mining: the data mining technique for relational databases. Unlike traditional data mining algorithms, which look for patterns in a single table, relational data mining algorithms look for patterns among multiple tables (relational patterns).

Web mining: the application of data mining techniques to discover patterns from the Web.


Software Mining and Data Mining: instead of mining individual data sets, software mining focuses on metadata, such as database schemas. Knowledge discovery from software systems addresses structure and behavior as well as the data processed by the software system.


OLAP

OLAP deals with tools and techniques for data analysis that can give nearly instantaneous answers to queries.

OLAP uses multidimensional arrays that allow users to analyze the data.

The data mining server must be integrated with the data warehouse and the OLAP server.


Data Mining: Motivation

Huge amounts of data
Important need for turning data into useful information
Fast-growing amounts of data, collected and stored in large and numerous databases, have exceeded the human ability to comprehend them without powerful tools


Data Mining Techniques

Decision Trees
Neural Networks
Genetic Algorithms
Fuzzy Set Theory
Rough Set Theory


DATA MINING TECHNIQUES

Artificial neural networks: Non-linear predictive models that learn through training and resemble biological neural networks in structure.

Decision trees: Tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset.


Genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of evolution.


THE ROUGH SET THEORY

Rough set theory is one of the newer data mining theories. It can be used for:

(1) Reduction of data sets
(2) Finding hidden data patterns
(3) Generation of decision rules


What is a Rough Set?

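A rough set is a formal approximation of a crisp set in terms of a pair of sets which give the lower and the upper approximation of the original set: the lower approximation contains the objects that certainly belong to the set, and the upper approximation contains the objects that possibly belong to it.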

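The tuple composed of the lower and upper approximations is called a rough set. The accuracy of the approximation of X with respect to the attribute set P is αP(X) = card(lower approximation) / card(upper approximation); the accuracy is perfect if αP(X) = 1, in which case X is crisp (exactly definable) with respect to P.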
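To make the two approximations concrete, here is a minimal Python sketch (an illustration, not the author's MATLAB code). It stores the encoded data of Table 2 from the example in this deck as a dictionary keyed by object number, groups objects into the equivalence classes of the chosen attributes P, and computes the lower approximation, the upper approximation and the accuracy |lower| / |upper|. The function names and the table layout are assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Encoded data of Table 2 (object number -> attribute values).
TABLE2: Dict[int, Dict[str, int]] = {
    1: {"Weight": 1, "Door": 2, "Size": 1, "Cylinder": 4, "Mileage": 1},
    2: {"Weight": 1, "Door": 4, "Size": 2, "Cylinder": 6, "Mileage": 2},
    3: {"Weight": 2, "Door": 4, "Size": 1, "Cylinder": 4, "Mileage": 1},
    4: {"Weight": 3, "Door": 2, "Size": 1, "Cylinder": 6, "Mileage": 2},
    5: {"Weight": 3, "Door": 4, "Size": 1, "Cylinder": 4, "Mileage": 1},
    6: {"Weight": 1, "Door": 4, "Size": 1, "Cylinder": 4, "Mileage": 1},
    7: {"Weight": 3, "Door": 4, "Size": 2, "Cylinder": 6, "Mileage": 2},
    8: {"Weight": 1, "Door": 2, "Size": 2, "Cylinder": 6, "Mileage": 2},
}


def equivalence_classes(table, attrs: Sequence[str]) -> List[Set[int]]:
    """Group objects that have identical values on every attribute in attrs."""
    classes: Dict[Tuple[int, ...], Set[int]] = {}
    for obj, row in table.items():
        classes.setdefault(tuple(row[a] for a in attrs), set()).add(obj)
    return list(classes.values())


def approximations(table, attrs: Sequence[str], target: Set[int]) -> Tuple[Set[int], Set[int]]:
    """Return the (lower, upper) approximation of target with respect to attrs."""
    lower: Set[int] = set()
    upper: Set[int] = set()
    for block in equivalence_classes(table, attrs):
        if block <= target:  # class lies entirely inside X: certainly in X
            lower |= block
        if block & target:   # class overlaps X: possibly in X
            upper |= block
    return lower, upper


def accuracy(table, attrs: Sequence[str], target: Set[int]) -> float:
    """alpha_P(X) = |lower| / |upper|; 1.0 means X is exactly definable by attrs."""
    lower, upper = approximations(table, attrs, target)
    return len(lower) / len(upper) if upper else 1.0


X = {1, 3, 5, 6}  # objects with Mileage = 1
lower, upper = approximations(TABLE2, ["Weight", "Size"], X)
print(sorted(lower), sorted(upper))             # [1, 3, 6] [1, 3, 4, 5, 6]
print(accuracy(TABLE2, ["Weight", "Size"], X))  # 0.6
```

With all three of Weight, Door and Size, every object falls into its own equivalence class, so the accuracy for the same X becomes 1.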


Reduct and Core

A reduct is a minimal subset of attributes that by itself can fully characterize the knowledge in the database.

The set of attributes which is common to all reducts is called the core.
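A brute-force Python sketch makes the two definitions concrete. It enumerates attribute subsets by increasing size, keeps each subset whose dependency degree γ (the fraction of objects whose condition class has a single decision value, the measure used by QuickReduct in this deck) equals that of the full attribute set and that contains no smaller reduct, and intersects the reducts to get the core. The function names and the reuse of the encoded Table 2 are illustrative assumptions; exhaustive search is exponential in the number of attributes, which is why heuristics such as QuickReduct are used in practice.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Set

# Encoded data of Table 2 (object number -> attribute values).
TABLE2: Dict[int, Dict[str, int]] = {
    1: {"Weight": 1, "Door": 2, "Size": 1, "Cylinder": 4, "Mileage": 1},
    2: {"Weight": 1, "Door": 4, "Size": 2, "Cylinder": 6, "Mileage": 2},
    3: {"Weight": 2, "Door": 4, "Size": 1, "Cylinder": 4, "Mileage": 1},
    4: {"Weight": 3, "Door": 2, "Size": 1, "Cylinder": 6, "Mileage": 2},
    5: {"Weight": 3, "Door": 4, "Size": 1, "Cylinder": 4, "Mileage": 1},
    6: {"Weight": 1, "Door": 4, "Size": 1, "Cylinder": 4, "Mileage": 1},
    7: {"Weight": 3, "Door": 4, "Size": 2, "Cylinder": 6, "Mileage": 2},
    8: {"Weight": 1, "Door": 2, "Size": 2, "Cylinder": 6, "Mileage": 2},
}


def dependency(table, cond: Sequence[str], dec: str) -> float:
    """gamma_cond(dec): share of objects whose cond-class has one decision value."""
    classes: Dict[tuple, List[int]] = {}
    for obj, row in table.items():
        classes.setdefault(tuple(row[a] for a in cond), []).append(obj)
    positive = sum(len(objs) for objs in classes.values()
                   if len({table[o][dec] for o in objs}) == 1)
    return positive / len(table)


def all_reducts(table, cond: Sequence[str], dec: str) -> List[Set[str]]:
    """Every minimal subset of cond with the same dependency degree as cond."""
    full = dependency(table, cond, dec)
    reducts: List[Set[str]] = []
    for size in range(len(cond) + 1):
        for subset in combinations(cond, size):
            candidate = set(subset)
            if any(r <= candidate for r in reducts):
                continue  # contains a smaller reduct, so it is not minimal
            if dependency(table, subset, dec) == full:
                reducts.append(candidate)
    return reducts


def core(reducts: List[Set[str]]) -> Set[str]:
    """The core is the set of attributes shared by all reducts."""
    return set.intersection(*reducts) if reducts else set()


reds = all_reducts(TABLE2, ["Weight", "Door", "Size", "Cylinder"], "Mileage")
print(reds, core(reds))
```

On the four condition attributes of Table 2 this finds two reducts, {Cylinder} and {Weight, Door, Size}, so the core is empty; restricted to Weight, Door and Size, the only reduct is all three attributes, which is then also the core.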

Data mining process

Stage 1: raw data → Stage 2: K-means algorithm → Stage 3: QuickReduct → Stage 4: symbolic rules


Data preparation: the data are prepared from the data warehouse and stored using MATLAB.

K-means algorithm: the data attributes obtained from Stage 1 are partitioned into k clusters, where each cluster comprises data vectors with similar inherent characteristics.


The K-Means Algorithm Process:

The dataset is partitioned into K clusters, and the data points are randomly assigned to the clusters, resulting in clusters that have roughly the same number of data points.

For each data point, calculate the distance from the data point to the centroid (mean) of each cluster.

If the data point is closest to its own cluster leave it where it is. If the data point is not closest to its own cluster, move it into the closest cluster.

Repeat the above step until a complete pass through all the data points results in no data point moving from one cluster to another. At this point the clusters are stable and the clustering process ends.
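The steps above can be sketched in Python as follows. The slides do not say when the cluster means are recomputed; this sketch assumes they are updated after each full pass over the data, measures closeness with squared Euclidean distance to each cluster's mean, and moves a point only when another mean is strictly closer. The function names and the `seed` and `max_passes` parameters are illustrative assumptions, not the author's MATLAB implementation.

```python
import random
from typing import List, Sequence

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points: List[Point]) -> List[float]:
    """Component-wise mean of a non-empty list of points."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def kmeans(data: List[Point], k: int, seed: int = 0, max_passes: int = 100) -> List[int]:
    """Return a cluster label in range(k) for every point in data."""
    if not 1 <= k <= len(data):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # Step 1: random assignment that gives clusters of roughly equal size.
    labels = [i % k for i in range(len(data))]
    rng.shuffle(labels)
    means = [mean_point([p for p, c in zip(data, labels) if c == j]) for j in range(k)]

    for _ in range(max_passes):
        moved = False
        # Steps 2-3: compare each point with every cluster mean.
        for i, p in enumerate(data):
            dists = [sq_dist(p, m) for m in means]
            best = min(range(k), key=dists.__getitem__)
            if dists[best] < dists[labels[i]]:  # strictly closer elsewhere: move it
                labels[i] = best
                moved = True
        # Step 4: stop after a complete pass in which no point moved.
        if not moved:
            break
        # Recompute the means; a cluster that became empty keeps its old mean.
        for j in range(k):
            members = [p for p, c in zip(data, labels) if c == j]
            if members:
                means[j] = mean_point(members)
    return labels


# Encoded condition attributes (Weight, Door, Size, Cylinder) of the example table.
rows = [[1, 2, 1, 4], [1, 4, 2, 6], [2, 4, 1, 4], [3, 2, 1, 6],
        [3, 4, 1, 4], [1, 4, 1, 4], [3, 4, 2, 6], [1, 2, 2, 6]]
print(kmeans(rows, k=2))
```

Because the initial assignment is random, different seeds can end in different stable partitions; whether a given run reproduces the clusters reported in the example is not guaranteed.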


QuickReduct algorithm: the QuickReduct algorithm computes a reduct without exhaustively generating all possible subsets. It adds one attribute at a time greedily, so the result is usually small but is not guaranteed to be minimal.

Attribute reduction is achieved by comparing the equivalence relations generated by sets of attributes, so that the reduced set provides the same predictive capability for the decision feature as the original set.


QUICKREDUCT(C, D)

C: the set of all conditional features; D: the set of decision features.

(a) R ← {}
(b) do
(c)     T ← R
(d)     for each x ∈ (C − R)
(e)         if γ_{R ∪ {x}}(D) > γ_T(D)
(f)             T ← R ∪ {x}
(g)     R ← T
(h) until γ_R(D) == γ_C(D)
(i) return R

where γ_R(D) = card(POS_R(D)) / card(U), and the positive region POS_R(D) is the union of the R-lower approximations of all decision classes.
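A Python sketch of the QuickReduct pseudocode, under these assumptions: the decision table is a dictionary of rows (the encoded Table 2 from the example in this deck), γ is computed from the positive region as card(POS) / card(U), and a guard that the pseudocode lacks stops the loop when no single attribute raises γ (otherwise the do...until loop could run forever when γ_C(D) cannot be reached one attribute at a time). The names are illustrative.

```python
from typing import Dict, List, Sequence

# Encoded data of Table 2 (object number -> attribute values).
TABLE2: Dict[int, Dict[str, int]] = {
    1: {"Weight": 1, "Door": 2, "Size": 1, "Cylinder": 4, "Mileage": 1},
    2: {"Weight": 1, "Door": 4, "Size": 2, "Cylinder": 6, "Mileage": 2},
    3: {"Weight": 2, "Door": 4, "Size": 1, "Cylinder": 4, "Mileage": 1},
    4: {"Weight": 3, "Door": 2, "Size": 1, "Cylinder": 6, "Mileage": 2},
    5: {"Weight": 3, "Door": 4, "Size": 1, "Cylinder": 4, "Mileage": 1},
    6: {"Weight": 1, "Door": 4, "Size": 1, "Cylinder": 4, "Mileage": 1},
    7: {"Weight": 3, "Door": 4, "Size": 2, "Cylinder": 6, "Mileage": 2},
    8: {"Weight": 1, "Door": 2, "Size": 2, "Cylinder": 6, "Mileage": 2},
}


def dependency(table, cond: Sequence[str], dec: str) -> float:
    """gamma_cond(dec) = card(POS_cond(dec)) / card(U)."""
    classes: Dict[tuple, List[int]] = {}
    for obj, row in table.items():
        classes.setdefault(tuple(row[a] for a in cond), []).append(obj)
    positive = sum(len(objs) for objs in classes.values()
                   if len({table[o][dec] for o in objs}) == 1)
    return positive / len(table)


def quick_reduct(table, cond: Sequence[str], dec: str) -> List[str]:
    target = dependency(table, cond, dec)           # gamma_C(D)
    reduct: List[str] = []                          # (a) R <- {}
    current = dependency(table, reduct, dec)
    while current < target:                         # (h) until gamma_R(D) == gamma_C(D)
        best, best_gamma = reduct, current          # (c) T <- R
        for x in cond:
            if x in reduct:                         # (d) only x in C - R
                continue
            g = dependency(table, reduct + [x], dec)
            if g > best_gamma:                      # (e) strictly better than T
                best, best_gamma = reduct + [x], g  # (f) T <- R U {x}
        if best_gamma == current:
            break  # guard: no single attribute raises gamma any further
        reduct, current = best, best_gamma          # (g) R <- T
    return reduct


print(quick_reduct(TABLE2, ["Weight", "Door", "Size", "Cylinder"], "Mileage"))
print(quick_reduct(TABLE2, ["Weight", "Door", "Size"], "Mileage"))
```

Run on all four condition attributes of Table 2, this sketch returns ['Cylinder'], because Cylinder alone already separates the two clusters; restricted to Weight, Door and Size, it returns all three (added in the order Size, Weight, Door), which is the reduct quoted in the example.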


Rule extraction:

It uses the following heuristic approach:
– Merge identical rows that have the same condition and decision attribute values.
– Compute the core of every row.
– Merge duplicate rows and compose a table with the reduct values.
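The heuristic can be sketched in Python under one reading of "compute the core of every row": for each object, keep the smallest combination of its condition values (tried in a fixed attribute order) that no object with a different decision shares, replace the dropped values with *, and then merge duplicate rows. That interpretation, the function names and the use of Table 3 as input are assumptions. Because the sketch keeps only value combinations that are consistent across the whole table, its output need not match Table 4 of the example row for row.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# Table 3: reduced attributes plus the decision (object number -> values).
TABLE3: Dict[int, Dict[str, int]] = {
    1: {"Weight": 1, "Door": 2, "Size": 1, "Mileage": 1},
    2: {"Weight": 1, "Door": 4, "Size": 2, "Mileage": 2},
    3: {"Weight": 2, "Door": 4, "Size": 1, "Mileage": 1},
    4: {"Weight": 3, "Door": 2, "Size": 1, "Mileage": 2},
    5: {"Weight": 3, "Door": 4, "Size": 1, "Mileage": 1},
    6: {"Weight": 1, "Door": 4, "Size": 1, "Mileage": 1},
    7: {"Weight": 3, "Door": 4, "Size": 2, "Mileage": 2},
    8: {"Weight": 1, "Door": 2, "Size": 2, "Mileage": 2},
}

Rule = Tuple[Tuple[object, ...], int]


def value_reduct(table, attrs: Sequence[str], dec: str, obj: int) -> Dict[str, int]:
    """Smallest set of this object's values that no object with another decision shares."""
    row = table[obj]
    for size in range(len(attrs) + 1):
        for subset in combinations(attrs, size):
            matches = [r for r in table.values() if all(r[a] == row[a] for a in subset)]
            if all(r[dec] == row[dec] for r in matches):
                return {a: row[a] for a in subset}
    return {a: row[a] for a in attrs}  # conflicting duplicates: keep every value


def extract_rules(table, attrs: Sequence[str], dec: str) -> List[Rule]:
    rules: List[Rule] = []
    for obj in sorted(table):
        kept = value_reduct(table, attrs, dec, obj)
        rule = (tuple(kept.get(a, "*") for a in attrs), table[obj][dec])  # '*' = dropped
        if rule not in rules:  # merge duplicate rows
            rules.append(rule)
    return rules


for condition, decision in extract_rules(TABLE3, ["Weight", "Door", "Size"], "Mileage"):
    print(condition, "->", decision)
```

On Table 3 this yields five merged rules; for example, (3, 2, '*') -> 2 reads as "If WEIGHT = 3 and DOOR = 2 THEN MILEAGE = 2".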


EXAMPLE

Substitute LOW = 1, MEDIUM = 2, HIGH = 3, COM = 1 and SUB = 2 in the data set below (Table 1), then apply the K-means clustering algorithm with K = 2. The resulting clusters are {1, 3, 5, 6} and {2, 4, 7, 8}. The table is then reconstructed using each row's cluster as the decision value (Mileage), as presented in Table 2.

 

Object  Weight  Door  Size  Cylinder
1       Low     2     Com   4
2       Low     4     Sub   6
3       Medium  4     Com   4
4       High    2     Com   6
5       High    4     Com   4
6       Low     4     Com   4
7       High    4     Sub   6
8       Low     2     Sub   6
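The substitution step and the construction of the decision column can be checked with a short Python sketch. The cluster memberships are taken from the text; numbering the clusters 1 and 2 in the order they are listed, so that the cluster number becomes the Mileage value, is an assumption (it agrees with Table 2). The names are illustrative.

```python
from typing import Dict, List, Set, Tuple

# Rows of Table 1: (Weight, Door, Size, Cylinder).
RAW: Dict[int, Tuple[str, int, str, int]] = {
    1: ("Low", 2, "Com", 4), 2: ("Low", 4, "Sub", 6),
    3: ("Medium", 4, "Com", 4), 4: ("High", 2, "Com", 6),
    5: ("High", 4, "Com", 4), 6: ("Low", 4, "Com", 4),
    7: ("High", 4, "Sub", 6), 8: ("Low", 2, "Sub", 6),
}
WEIGHT_CODE = {"Low": 1, "Medium": 2, "High": 3}
SIZE_CODE = {"Com": 1, "Sub": 2}
# Cluster memberships reported in the example (K = 2).
CLUSTERS: List[Set[int]] = [{1, 3, 5, 6}, {2, 4, 7, 8}]


def encode(raw, clusters) -> Dict[int, Tuple[int, int, int, int, int]]:
    """Apply the numeric substitution and append the cluster number as Mileage."""
    table = {}
    for obj, (weight, door, size, cylinder) in raw.items():
        mileage = next(i + 1 for i, members in enumerate(clusters) if obj in members)
        table[obj] = (WEIGHT_CODE[weight], door, SIZE_CODE[size], cylinder, mileage)
    return table


for obj, row in encode(RAW, CLUSTERS).items():
    print(obj, *row)  # first line printed: 1 1 2 1 4 1
```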


Table-2 Data set after K-means clustering

Applying the QuickReduct algorithm to Table 2 yields the final reduct {WEIGHT, DOOR, SIZE}. Hence, Table 2 can be reduced to Table 3 using the attribute reduct {WEIGHT, DOOR, SIZE}.

Object  Weight  Door  Size  Cylinder  Mileage
1       1       2     1     4         1
2       1       4     2     6         2
3       2       4     1     4         1
4       3       2     1     6         2
5       3       4     1     4         1
6       1       4     1     4         1
7       3       4     2     6         2
8       1       2     2     6         2


Table-3 Attribute Reduction

Object  Weight  Door  Size  Mileage
1       1       2     1     1
2       1       4     2     2
3       2       4     1     1
4       3       2     1     2
5       3       4     1     1
6       1       4     1     1
7       3       4     2     2
8       1       2     2     2


Rule extraction

Merge identical objects of Table 3. Since no two objects in Table 3 are identical, compute the core of every object directly and present the result as in Table 4.

Object  Weight  Door  Size  Mileage
1       1       *     1     1
2       1       *     2     2
3       *       4     1     1
4       3       *     *     2
5       *       4     1     1
6       1       *     1     1
7       3       *     *     2
8       1       *     2     2


Merge duplicate objects with the same decision value and compose a table with the reduct values. The merged rows are {1, 6}, {2, 8}, {3, 5} and {4, 7}.

Merged table

Object  Weight  Door  Size  Mileage
1       1       *     1     1
2       1       *     2     2
3       *       4     1     1
4       3       *     *     2


The decision rules obtained from the above example

Decision rules are usually presented as implications and are often called “if...then...” rules. The rules can be expressed as follows:

If WEIGHT = 1 and SIZE = 1 THEN MILEAGE = 1
If WEIGHT = 1 and SIZE = 2 THEN MILEAGE = 2
If DOOR = 4 and SIZE = 1 THEN MILEAGE = 1
If WEIGHT = 3 THEN MILEAGE = 2
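Each extracted rule can be checked mechanically against the data it came from. The Python sketch below (names illustrative) reads the four rows of the merged table as rules and lists the objects of Table 3 that satisfy a rule's conditions but carry a different Mileage. Run on Table 3 as printed, the first three rules have no counterexamples, while the rule for WEIGHT = 3 is contradicted by object 5 (Weight 3, Mileage 1).

```python
from typing import Dict, List, Tuple

# Table 3: reduced attributes plus the decision (object number -> values).
TABLE3: Dict[int, Dict[str, int]] = {
    1: {"Weight": 1, "Door": 2, "Size": 1, "Mileage": 1},
    2: {"Weight": 1, "Door": 4, "Size": 2, "Mileage": 2},
    3: {"Weight": 2, "Door": 4, "Size": 1, "Mileage": 1},
    4: {"Weight": 3, "Door": 2, "Size": 1, "Mileage": 2},
    5: {"Weight": 3, "Door": 4, "Size": 1, "Mileage": 1},
    6: {"Weight": 1, "Door": 4, "Size": 1, "Mileage": 1},
    7: {"Weight": 3, "Door": 4, "Size": 2, "Mileage": 2},
    8: {"Weight": 1, "Door": 2, "Size": 2, "Mileage": 2},
}

# Rows of the merged table read as rules: (conditions, predicted Mileage).
RULES: List[Tuple[Dict[str, int], int]] = [
    ({"Weight": 1, "Size": 1}, 1),
    ({"Weight": 1, "Size": 2}, 2),
    ({"Door": 4, "Size": 1}, 1),
    ({"Weight": 3}, 2),
]


def counterexamples(table, conditions: Dict[str, int], decision: int,
                    dec: str = "Mileage") -> List[int]:
    """Objects that satisfy every condition of a rule but have a different decision."""
    return [obj for obj, row in sorted(table.items())
            if all(row[a] == v for a, v in conditions.items()) and row[dec] != decision]


for conditions, decision in RULES:
    print(conditions, "->", decision, "counterexamples:",
          counterexamples(TABLE3, conditions, decision))
```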


Classification of Data Mining Systems

Techniques used: DB-oriented techniques, statistics, machine learning, pattern recognition, neural networks, rough sets, etc.

Applications adapted: finance, marketing, medicine, stock market, telecommunications, etc.

Kinds of databases: relational, data warehouse, transactional, advanced database systems, flat files, WWW

Kinds of knowledge: classification, association, clustering, prediction, etc.

Data Mining as a Step of KDD

Databases and flat files → Cleaning and integration → Data warehouse → Selection and transformation → Data mining → Patterns → Evaluation and presentation → Knowledge


WHY MATLAB FOR DATA MINING?

As a programming language, MATLAB is very similar to other procedural languages such as Fortran or C.

MATLAB's graphing capability is among the best available, and all MATLAB graphs are completely configurable through software.


Data Mining: Problems and Challenges

Noisy data
Difficult training set
Dynamic databases
Large databases
Incomplete data


Performance Issues

Cost of the learning set
Time and memory constraints
Predictive ability


Conclusion

Data Mining is an analytic process designed to explore data in search of consistent patterns and/or systematic relationships between variables, and then to validate the findings by applying the detected patterns to new subsets of data.

The ultimate goal of data mining is prediction. In data mining, rough set theory has been applied to the time-sequence analysis of electrical signals and to medical diagnosis.

It is effective because of its low time complexity, low cost (including a low cost of learning) and good accuracy.


THANKS!!!

QUESTIONS??