Data Warehousing & Data Mining Slides
-
7/29/2019 Data Warehousing & Data Mining Slides
GROUP NO - 6
-
Introduction
Data Warehousing and Data Mining: WHAT and WHY?
-
What is a Data Warehouse?
A single, complete and consistent store of data obtained from a variety of different sources, made available to end users in a way they can understand and use in a business context.
-
Data Warehousing -- It is a process
A technique for assembling and managing data from various sources for the purpose of answering business questions, thus enabling decisions that were not previously possible.
A decision support database maintained separately from the organization's operational database.
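
The assembling step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two source systems (`billing_rows`, `crm_rows`), their field names, and the shared schema are all hypothetical and do not come from the slides.

```python
# Hypothetical source systems: each stores customers in its own format.
billing_rows = [
    {"cust_id": "C001", "name": "ASHA RAO", "revenue": "1200.50"},
    {"cust_id": "C002", "name": "RAVI KUMAR", "revenue": "830.00"},
]
crm_rows = [
    {"id": 1, "full_name": "Asha Rao", "region": "West"},
    {"id": 3, "full_name": "Meena Shah", "region": "North"},
]


def from_billing(row):
    """Map a billing record onto the shared (assumed) warehouse schema."""
    return {
        "customer_key": int(row["cust_id"].lstrip("C")),  # "C001" -> 1
        "name": row["name"].title(),                      # normalise casing
        "revenue": float(row["revenue"]),                 # text -> number
    }


def from_crm(row):
    """Map a CRM record onto the same schema."""
    return {
        "customer_key": row["id"],
        "name": row["full_name"],
        "region": row["region"],
    }


def build_warehouse(billing, crm):
    """Merge records from both sources into one store keyed by customer."""
    warehouse = {}
    records = [from_billing(r) for r in billing] + [from_crm(r) for r in crm]
    for record in records:
        # Start every customer with defaults so missing fields are explicit.
        merged = warehouse.setdefault(
            record["customer_key"], {"revenue": 0.0, "region": None}
        )
        merged.update(record)
    return warehouse


warehouse = build_warehouse(billing_rows, crm_rows)
```

After this runs, customer 1 carries both the revenue from billing and the region from CRM, which neither source could report on its own.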
-
A Producer wants to know
Which are our lowest/highest margin customers?
Who are my customers and what products are they buying?
Which customers are most likely to go to the competition?
What impact will new products/services have on revenue and margins?
What product promotions have the biggest impact on revenue?
What is the most effective distribution channel?
-
What are the users saying...
Data should be integrated across the enterprise
Summary data has a real value to the organization
Historical data holds the key to understanding data over time
-
Problems
I can't find the data I need
  Data is scattered over the network
  Many versions, subtle differences
I can't get the data I need
  Need an expert to get the data
I can't understand the data I found
  Available data poorly documented
I can't use the data I found
  Results are unexpected
  Data needs to be transformed from one form to another
-
What is Data Warehousing?
A process of transforming data into information and making it available to users in a timely enough manner to make a difference.
Data -> Information
-
Evolution
60s: Batch reports
  hard to find and analyze information
  inflexible and expensive, reprogram every new request
70s: Terminal-based DSS and EIS (executive information systems)
  still inflexible, not integrated with desktop tools
80s: Desktop data access and analysis tools
  query tools, spreadsheets, GUIs
  easier to use, but only access operational databases
90s: Data warehousing with integrated OLAP engines and tools
-
Very Large Data Bases
Terabytes -- 10^12 bytes: Walmart -- 24 Terabytes
Petabytes -- 10^15 bytes: Geographic Information Systems
Exabytes -- 10^18 bytes: National Medical Records
Zettabytes -- 10^21 bytes: Weather images
Yottabytes -- 10^24 bytes: Intelligence Agency Videos
-
Data Warehouse
A data warehouse is a
subject-oriented,
integrated,
time-varying,
non-volatile
collection of data that is used primarily in organizational decision making.
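
One way to picture the four properties above is a small append-only store. The sketch below is an assumption-laden illustration, not a real warehouse design: `SalesFact`, `SalesWarehouse`, and every field name are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)  # frozen: a loaded row is never modified (non-volatile)
class SalesFact:
    customer: str        # subject-oriented: organised around the customer
    source: str          # which operational system the row came from
    amount_inr: float    # integrated: every source converted to one unit
    as_of: date          # time-varying: each row records when it was true


class SalesWarehouse:
    """Append-only collection of facts; rows are added, never updated."""

    def __init__(self):
        self._rows = []

    def load(self, fact):
        # New snapshots are appended next to old ones instead of replacing them.
        self._rows.append(fact)

    def history(self, customer):
        # All snapshots for one customer, oldest first.
        rows = (f for f in self._rows if f.customer == customer)
        return sorted(rows, key=lambda f: f.as_of)


warehouse_store = SalesWarehouse()
warehouse_store.load(SalesFact("C001", "billing", 1200.0, date(2019, 6, 30)))
warehouse_store.load(SalesFact("C001", "billing", 1350.0, date(2019, 7, 29)))
history = warehouse_store.history("C001")
```

Both snapshots survive in `history`, whereas an operational database would typically keep only the latest value.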
-
Explorers, Farmers and Tourists
Farmers: Harvest information from known access paths.
Explorers: Seek out the unknown and previously unsuspected rewards hiding in the detailed data.
Tourists: Browse information harvested by farmers.
-
Data Warehouse for Decision Support
Putting information technology to work to help the knowledge worker make faster and better decisions
Which of my customers are most likely to go to the competition?
What product promotions have the biggest impact on revenue?
How did the share price of software companies correlate with profits over the last 10 years?
-
Decision Support
Used to manage and control business
Data is historical or point-in-time
Optimized for inquiry rather than update
Use of the system is loosely defined and
can be ad-hoc
Used by managers and end-users to
understand the business and make
judgements
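
An ad-hoc inquiry over historical data, as described above, might look like the following sketch. It uses Python's standard `sqlite3` module with an in-memory database; the `sales` table, its columns, and the figures are made up for illustration.

```python
import sqlite3

# In-memory database standing in for the warehouse (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (year INTEGER, channel TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (2017, "retail", 120.0),
        (2017, "online", 80.0),
        (2018, "retail", 110.0),
        (2018, "online", 150.0),
    ],
)


def revenue_by(column):
    """Total revenue grouped by one dimension chosen at query time."""
    # Only known column names are accepted, since the name is put into the SQL.
    if column not in {"year", "channel"}:
        raise ValueError(f"unknown dimension: {column}")
    query = (
        f"SELECT {column}, SUM(revenue) FROM sales "
        f"GROUP BY {column} ORDER BY {column}"
    )
    return conn.execute(query).fetchall()


by_year = revenue_by("year")
by_channel = revenue_by("channel")
```

The same historical rows answer two different questions (revenue per year, revenue per channel) without any update to the data, which is the read-mostly usage pattern the slide describes.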
-
Data Mining works with Warehouse Data
Data Warehousing provides the Enterprise with a memory.
Data Mining provides the Enterprise with intelligence.
-
Problems
Given a database of 100,000 names, which persons are the least likely to default on their credit cards?
Which types of transactions are likely to be fraudulent given the demographics and transactional history of a particular customer?
If I raise the price of my product by Rs. 2, what is the effect on my ROI?
If I offer only 2,500 airline miles as an incentive to purchase rather than 5,000, how many lost responses will result?
If I emphasize ease-of-use of the product as opposed to its technical capabilities, what will be the net effect on my revenues?
Which of my customers are likely to be the most loyal?
Data Mining helps extract such information
-
Areas of Application
Industry                  Application
Finance                   Credit card analysis
Insurance                 Claims, fraud analysis
Telecommunication         Call record analysis
Transport                 Logistics management
Consumer goods            Promotion analysis
Data service providers    Value added data
Utilities                 Power usage analysis
-
Data Mining in Use
The US Government uses Data Mining to track fraud
Basketball teams use it to track game strategy
Warranty Claims Routing
Holding on to Good Customers
Weeding out Bad Customers
-
What makes data mining possible?
Advances in the following areas are making data mining deployable:
data warehousing
better and more data (i.e., operational, behavioral, and demographic)
the emergence of easily deployed data mining tools and
the advent of new data mining techniques.
-- Gartner Group
-
Difference between Data Mining and Data Warehousing
Data Mining: Data mining is the process of finding patterns in a given data set.
Data Warehousing: Data warehousing can be said to be the process of centralizing or aggregating data from multiple sources into one common repository.
-
Difference between Data Mining and Data Warehousing
Data Mining: When men bought diapers on Thursdays and Saturdays, they also had a strong tendency to buy beer. The grocery store could have used this valuable information to increase its profits. This is data mining in action: extracting meaningful data from a huge data set.
Data Warehousing: Facebook basically gathers all of your data (your friends, your likes, who you stalk, etc.) and then stores that data in one central repository.
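
The diapers-and-beer pattern above is usually expressed as an association rule, scored by support and confidence. The sketch below computes both over a handful of made-up baskets; the data and the function names are illustrative assumptions, not figures from the grocery store example.

```python
# Each basket is the set of items bought in one (hypothetical) transaction.
baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets containing `antecedent`, the share that also contain `consequent`."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


rule_support = support({"diapers", "beer"}, baskets)
rule_confidence = confidence({"diapers"}, {"beer"}, baskets)
```

With these five baskets, diapers and beer appear together in two of them (support 0.4), and two of the three diaper baskets also contain beer (confidence of about 0.67). A mining tool would search for all rules above chosen support and confidence thresholds rather than checking one rule by hand.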
-
Difference between Data Mining and Data Warehousing
Data Mining    Data Warehousing