Data Warehouse and Data Mining
Uploaded by ranak-ghosh (category: Education)
Historical Perspective
1960s: Data collection, database creation, IMS and network DBMS
1970s: Relational data model, relational DBMS implementation
1980s: RDBMS, advanced data models (extended-relational, OO, deductive, etc.) and application-oriented DBMS (spatial, scientific, engineering, etc.)
1990s-2000s: Data mining and data warehousing, multimedia databases, and Web databases
Definition
Data mining automates the process of locating and extracting hidden patterns and knowledge from data.
In simple words: searching for new knowledge.
Why we need data mining
Data explosion problem
Automated data collection tools and mature database technology lead to tremendous amounts of data stored in databases, data warehouses and other information repositories
We are drowning in data, but starving for knowledge!
Solution: Data mining
Data warehousing and on-line analytical processing
Extraction of interesting knowledge (rules, regularities, patterns, constraints) from data in large databases
Predictive Model
Prediction: determining how certain attributes will behave in the future
Regression: mapping a data item to a real-valued prediction variable
Classification: categorizing data based on combinations of attributes
Time series analysis: examining the values of attributes with respect to time
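As a minimal sketch of the regression idea above (mapping a data item to a real-valued prediction), the following fits a straight line by ordinary least squares. The function names `fit_line` and `predict` and the spend/sales figures are hypothetical, chosen only for illustration; a real project would more likely use a statistics library.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and co-variation of x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Map a data item x to a real-valued prediction."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical data: advertising spend vs. units sold
spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]

model = fit_line(spend, sales)
forecast = predict(model, 5.0)
```

Here the made-up data lies exactly on a line, so the fitted model recovers it and `forecast` extends it one step further.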
Descriptive Model
Clustering: grouping the most closely related data together into clusters
Data summarization: extracting representative information about the database
Association rules: associations defined between data items to form relationships
Sequence discovery: determining sequential patterns in data based on the time sequence of actions
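To make association rules a little more concrete, here is a small sketch that scores a candidate rule by its support and confidence, two measures commonly used for this purpose. The basket data and the helper names `support` and `confidence` are assumptions for illustration, not part of the slides.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    both = support(transactions, set(antecedent) | set(consequent))
    ante = support(transactions, antecedent)
    return both / ante if ante else 0.0


# Hypothetical market baskets
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support(baskets, {"bread", "milk"})
rule_confidence = confidence(baskets, {"bread"}, {"milk"})
```

A rule such as "bread implies milk" is usually kept only if both values clear thresholds chosen by the analyst.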
Data mining process
Problem Definition
Creating the Database
Exploring the Database
Preparation for creating a data mining model
Building Data Mining Model
Evaluation Phase
Deploying the Data Mining model
Fig. General Phases of Data Mining Process
Who needs data mining?
"Whoever has information fastest and uses it wins." (Don Keough, former president of Coca-Cola)
Businesses are looking for new ways to let end users find the data they need to:
Make decisions
Serve customers
Gain the competitive edge
Applications
Business analysis and management
Computer security
Customer relationships analysis and management
Telecommunication analysis and management
News and entertainment
Bioinformatics and healthcare analysis
Data Warehousing
Data Warehouse
What is a Data Warehouse?
Database & Data Warehouse: how to distinguish?
Purpose
Database: transactional
Data Warehouse: intended for decision support applications
Functionality
Optimized for data retrieval, not routine transaction processing
Structure
Performance
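The difference in purpose can be sketched with Python's built-in `sqlite3` module. The `sales` table and its rows are hypothetical, and a single in-memory database stands in for both systems here; in practice the two workloads usually run on separate systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)

# Transactional style: small writes, each recording one business event
with conn:
    conn.executemany(
        "INSERT INTO sales (region, amount) VALUES (?, ?)",
        [("East", 100.0), ("West", 250.0), ("East", 50.0)],
    )

# Decision-support style: a read-only query that scans and summarizes many rows
totals = dict(
    conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    ).fetchall()
)
```

The inserts touch one row at a time, while the summary query reads the whole table, which is the access pattern a warehouse is optimized for.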
Modern Organization's Needs
Companies are spread worldwide and have:
So many data sources
Different operational systems
Different schemas
They need data for:
Complex analysis
Knowledge discovery
Decision making
Solution?
Solution: Data Warehouse
Data Warehouse: Definition
No single definition.
A collection of information gathered from multiple sources, stored under a unified schema, at a single site, and mainly intended for decision support applications.
A subject-oriented, integrated, nonvolatile, time-variant collection of data in support of management's decisions. ~ W.H. Inmon
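The "multiple sources, unified schema, single site" part of the definition can be illustrated with a small sketch. The two source systems, their field names, and the helpers `from_crm`, `from_billing` and `load_warehouse` are all hypothetical.

```python
# Hypothetical records from two operational systems with different schemas
crm_rows = [{"cust_id": 1, "full_name": "Ann Lee", "country": "US"}]
billing_rows = [{"customerNo": "2", "name": "Raj Roy", "ctry": "IN"}]

# The unified warehouse schema that every source is mapped onto
UNIFIED_FIELDS = ("customer_id", "name", "country")


def from_crm(row):
    """Map a CRM record onto the unified schema."""
    return {
        "customer_id": int(row["cust_id"]),
        "name": row["full_name"],
        "country": row["country"],
    }


def from_billing(row):
    """Map a billing record onto the unified schema."""
    return {
        "customer_id": int(row["customerNo"]),
        "name": row["name"],
        "country": row["ctry"],
    }


def load_warehouse(sources):
    """Gather rows from every source into one list under the unified schema."""
    warehouse = []
    for rows, mapper in sources:
        warehouse.extend(mapper(r) for r in rows)
    return warehouse


warehouse = load_warehouse([(crm_rows, from_crm), (billing_rows, from_billing)])
```

Each source keeps its own schema; only the mapping functions know about the differences, so analysis code sees one consistent set of fields.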
Warehouses are Very Large Databases
[Figure: bar chart of respondents (0% to 35%) by warehouse size. Size bins: 5GB, 5-9GB, 10-19GB, 20-49GB, 50-99GB, 100-249GB, 250-499GB, 500GB-1TB. Series: Initial, Projected 2Q96. Source: META Group, Inc.]
Data Warehouse Building
When & how to gather data
Source-driven architecture
Destination-driven architecture
What schema to use
Data cleansing: the task of correcting and processing data
How to propagate updates
What data to summarize
And many more...
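As one possible illustration of data cleansing, the sketch below normalizes whitespace and letter case, drops records with no usable name, fills in a missing city, and removes duplicates. The record layout and the `cleanse` function are assumptions; real cleansing rules depend on the sources involved.

```python
def cleanse(rows):
    """Normalize name/city fields, skip unusable rows, and drop duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        # Treat missing values as empty strings, then trim and normalize case
        name = (row.get("name") or "").strip().title()
        city = (row.get("city") or "").strip().title()
        if not name:
            continue  # skip records with no usable name
        key = (name, city)
        if key in seen:
            continue  # skip duplicates that differ only in spacing or case
        seen.add(key)
        cleaned.append({"name": name, "city": city or "Unknown"})
    return cleaned


# Hypothetical raw input gathered from different sources
raw = [
    {"name": "  alice smith ", "city": "kolkata"},
    {"name": "Alice Smith", "city": "Kolkata "},
    {"name": "", "city": "Delhi"},
    {"name": "bob ray", "city": None},
]

clean_rows = cleanse(raw)
```

With this input, the two spellings of the same person collapse into one record and the nameless row is dropped.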
Summary
What is Data Warehousing?
Data Warehouse
Data Warehouse Architecture
Data Warehouse vs. Data Mining