Data Warehouse and Data Mining

Outline

Data Mining

Data Warehousing

Historical Perspective

1960s: Data collection, database creation, IMS and network DBMS

1970s: Relational data model, relational DBMS implementation

1980s: RDBMS, advanced data models (extended-relational, OO, deductive, etc.) and application-oriented DBMS (spatial, scientific, engineering, etc.)

1990s-2000s: Data mining and data warehousing, multimedia databases, and Web databases

Data Mining

Definition

Data mining automates the process of locating and extracting hidden patterns and knowledge from data.

In simple words: searching for new knowledge.

Why we need data mining

Data explosion problem

Automated data collection tools and mature database technology lead to tremendous amounts of data stored in databases, data warehouses and other information repositories

We are drowning in data, but starving for knowledge!

Solution: Data mining

Data warehousing and on-line analytical processing

Extraction of interesting knowledge (rules, regularities, patterns, constraints) from data in large databases

Data Mining Models

Predictive Model

Descriptive Model

Predictive Model

Prediction: determining how certain attributes will behave in the future

Regression: mapping a data item to a real-valued prediction variable

Classification: categorizing data based on combinations of attributes

Time Series Analysis: examining the values of attributes with respect to time
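
As an illustration of the regression item above, here is a minimal sketch that fits a straight line by ordinary least squares and uses it to predict a real-valued variable. The data (advertising spend vs. sales) and the function names `fit_line` and `predict` are hypothetical and chosen only for this example; the slides do not prescribe a particular method.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Map a data item x to a real-valued prediction."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data: advertising spend (x) vs. sales (y).
spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]

model = fit_line(spend, sales)
forecast = predict(model, 5.0)
```

With this toy data the fitted line is y = 2x + 1, so `forecast` is 11.0. Real data is noisy, and a model would normally be evaluated on records that were not used for fitting.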

Descriptive Model

Clustering: grouping the most similar data items together into clusters

Data Summarization: extracting representative information about the database

Association Rules: relationships defined between data items that occur together

Sequence Discovery: determining sequential patterns in data based on the time sequence of actions
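
To make the association rules item above concrete, the sketch below computes the two standard measures of a rule, support and confidence, over a small set of transactions. The market baskets and the function names are hypothetical; this only scores a given rule and is not a full rule-mining algorithm such as Apriori.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support(baskets, {"bread", "milk"})
rule_confidence = confidence(baskets, {"bread"}, {"milk"})
```

Here the rule bread -> milk has support 0.5 (2 of 4 baskets contain both) and confidence 2/3 (2 of the 3 baskets with bread also contain milk).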

Data mining process

Problem Definition

Creating Database

Exploring database

Preparation for creating a data mining model

Building Data Mining Model

Evaluation Phase

Deploying the Data Mining model

Fig. General Phases of Data Mining Process

Who needs data mining?

"Whoever has information fastest and uses it wins." ~ Don McKeough, former president of Coca-Cola

Businesses are looking for new ways to let end users find the data they need to:

Make decisions

Serve customers

Gain the competitive edge

Applications

Business analysis and management

Computer security

Customer relationships analysis and management

Telecommunication analysis and management

News and entertainment

Bioinformatics and healthcare analysis

Summary

Need for data mining

Data mining models

Process of data mining

Some applications

Data Warehousing

Data Warehouse

What is a Data Warehouse?

Database & Data Warehouse: how to distinguish?

Purpose

Database: Transactional

Data Warehouse: Intended for decision support applications

Functionality

Optimized for data retrieval, not routine transaction processing

Structure

Performance

Modern Organization’s Needs

Companies are spread worldwide and have:

Many data sources

Different operational systems

Different schemas

They need data for:

Complex analysis

Knowledge discovery

Decision making

Solution?

Solution: Data Warehouse

Definition

There is no single definition.

A data warehouse is a collection of information gathered from multiple sources, stored under a unified schema at a single site, and mainly intended for decision support applications.

"A subject-oriented, integrated, nonvolatile, time-variant collection of data in support of management’s decisions." ~ W.H. Inmon
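
The idea of gathering data from multiple sources under a unified schema can be sketched as follows. This is a toy illustration using Python's built-in `sqlite3` module with an in-memory database; the two sources (`store_a_orders`, `store_b_sales`), their field names, and the `sales_fact` table are hypothetical and stand in for real operational systems.

```python
import sqlite3

# Two hypothetical operational sources with different schemas and units.
store_a_orders = [  # tuples: (customer, amount_in_cents, day)
    ("C1", 1250, "2024-01-05"),
    ("C2", 800, "2024-01-06"),
]
store_b_sales = [  # dicts: amount already in currency units
    {"cust_id": "C1", "amount": 20.0, "date": "2024-01-05"},
    {"cust_id": "C3", "amount": 7.5, "date": "2024-01-07"},
]


def build_warehouse(a_rows, b_rows):
    """Load both sources into one table with a single, unified schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales_fact ("
        "source TEXT, customer_id TEXT, amount REAL, sale_date TEXT)"
    )
    # Convert each source's layout and units to the unified schema.
    unified = [("store_a", c, cents / 100.0, d) for c, cents, d in a_rows]
    unified += [("store_b", r["cust_id"], r["amount"], r["date"]) for r in b_rows]
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", unified)
    conn.commit()
    return conn


conn = build_warehouse(store_a_orders, store_b_sales)

# A decision-support style query: total sales per customer across all sources.
totals = conn.execute(
    "SELECT customer_id, SUM(amount) FROM sales_fact "
    "GROUP BY customer_id ORDER BY customer_id"
).fetchall()
```

Once both sources share one schema, a single aggregate query answers a question that would otherwise require querying each operational system separately.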

Warehouses are Very Large Databases

Fig. Percentage of respondents (0% to 35%) by warehouse size (5GB, 5-9GB, 10-19GB, 20-49GB, 50-99GB, 100-249GB, 250-499GB, 500GB-1TB), initial vs. projected 2Q96. Source: META Group, Inc.

Data Warehouse - Architecture

Building a Data Warehouse

When and how to gather data

Source-driven architecture

Destination-driven architecture

What schema to use

Data cleansing: the task of correcting and preprocessing data

How to propagate updates

What data to summarize

And many more...
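
Two of the issues listed above, data cleansing and deciding what data to summarize, can be illustrated with a small sketch. The records, the cleansing rules (trim whitespace, normalize case, drop rows with a missing amount, remove duplicates), and the function names are assumptions made for this example; real cleansing rules depend on the sources involved.

```python
def cleanse(records):
    """Normalize names and cities, drop rows without an amount, remove duplicates."""
    seen = set()
    cleaned = []
    for name, city, amount in records:
        if amount is None:
            continue  # Assumed rule: rows with a missing amount are discarded.
        row = (name.strip().title(), city.strip().upper(), float(amount))
        if row not in seen:
            seen.add(row)
            cleaned.append(row)
    return cleaned


def summarize_by_city(cleaned):
    """Pre-compute a summary (total amount per city) to store in the warehouse."""
    totals = {}
    for _, city, amount in cleaned:
        totals[city] = totals.get(city, 0.0) + amount
    return totals


# Hypothetical raw records: (name, city, amount) with inconsistent formatting.
raw = [
    ("  alice smith", "pune ", "100"),
    ("Alice Smith", "PUNE", 100),
    ("bob jones", "Delhi", None),
    ("Carol Diaz", "delhi", 40.5),
]

cleaned = cleanse(raw)
summary = summarize_by_city(cleaned)
```

The first two raw rows collapse into one record after normalization, and the row with no amount is dropped, so `summary` holds one total for PUNE and one for DELHI.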

Summary

What is Data Warehousing?

Data Warehouse

Data Warehouse - Architecture

Data Warehouse vs. Data Mining

Conclusion

Your data is full of undiscovered gems; start digging!
