Week2
-
Upload
adimas-wisnu-wardana -
Category
Documents
-
view
216 -
download
0
description
Transcript of Week2
2
What is Data Warehouse?
• Defined in many different ways, but not rigorously.• A decision support database that is maintained separately from the
organization’s operational database• Support information processing by providing a solid platform of
consolidated, historical data for analysis.
• “A single, complete, and consistent source of data obtained from a variety of sources and made available to end users in a way that they can understand and use in business context” – Barry Devlin
• “A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision-making process.”—W. H. Inmon
• Data warehousing:• The process of constructing and using data warehouses
3
DW – Subject Oriented
Operational Systems Data Warehouse System
Sales system
Payroll system
Purchasing system
Customer data
Employee data
Vendor data
4
DW – Subject Oriented• Oriented to the major subject areas of the organization
defined in the data model.• Insurance company: customer, product, claim, account, etc
• Operational database organized differently• Based on type of insurance: auto, life, medical, etc
• Giving information about a particular subject rather than the details regarding the on-going operations of the company
• Focusing on the modeling and analysis of data for decision makers, not on daily operations or transaction processing.
5
DW – Integrated
Operational Systems Data Warehouse System
Marketing system
Order system
Billing system
Customer data
6
DW - Integrated• Constructed by integrating multiple, heterogeneous data
sources• relational databases, flat files, on-line transaction
records• Data cleaning and data integration techniques are
applied.• Ensure consistency in naming conventions, encoding
structures, attribute measures, etc. among different data sources• E.g., Hotel price: currency, tax, breakfast covered, etc.
• When data is moved to the warehouse, it is converted.
7
DW – Time Variant
Operational Systems Data Warehouse System
60-90 days 5-10 years
Order system Customer data
8
DW – Time Variant• The time horizon for the data warehouse is significantly
longer than that of operational systems.• Operational database: current value data.• Data warehouse data: provide information from a
historical perspective (e.g., past 5-10 years)• Every key structure in the data warehouse
• Contains an element of time, explicitly or implicitly
9
DW – Non Volatile
Operational Systems Data Warehouse System
Order system
Customer data
create
insert
deleteupdate
load access
10
DW – Non Volatile• A physically separate store of data transformed from the
operational environment.
• Operational update of data does not necessarily occur in the data warehouse environment.
• Often requires only two operations in data accessing: • initial loading of data and access of data.
11
Whyuse Data Warehouse?
• “We collect tons of data, but we can’t access it.”• “We need to slice and dice the data every which way.”• “Business people need to get at data easily.”• “Just show me what is important.”• “We spend entire meeting arguing about who has the right
number rather than making decisions.”• “We want people to use information to support more fact-
based decision making.”
12
The Data Related Problem• Data in organizations often has the following
characteristics:• Massive volume• Dispersed• Difficult to access• Badly integrated• Complex data structures• Not suitable for high level business queries
13
The Information Needs Behind the Data Warehouse
• Organization need information which is:• More holistic in its coverage of the business• Selected and enriched• Easily accessible• More easily understandable• High quality• Directly applicable to the decision situation
14
Data Sources• Production data
Data from transactional processes• Internal data
Spreadsheets, document, customer profiles, transactional databases
• Archived data• External data
Data from external system
15
Data Warehouse vs. Heterogeneous DBMS
• Traditional heterogeneous DB integration: • Build wrappers/mediators on top of heterogeneous databases • Query driven approach
• A query posed to a client site is translated into queries appropriate for individual heterogeneous sites; The results are integrated into a global answer set
• Involving complex information filtering• Competition for resources at local sources
• Data warehouse: update-driven, high performance• Information from heterogeneous sources is integrated in advance
and stored in warehouses for direct query and analysis
16
The integration problem
17
The Integrated Data Warehouse
DataWarehouse
18
Data Warehouse vs. Operational DBMS
• OLTP (on-line transaction processing)• Major task of traditional relational DBMS• Day-to-day operations: purchasing, inventory, banking,
manufacturing, payroll, registration, accounting, etc.• OLAP (on-line analytical processing)
• Major task of data warehouse system• Data analysis and decision making
• Distinct features (OLTP vs. OLAP):• User and system orientation: customer vs. market• Data contents: current, detailed vs. historical, consolidated• Database design: ER + application vs. star + subject• View: current, local vs. evolutionary, integrated• Access patterns: update vs. read-only
19
OLTP VS OLAP OLTP OLAP users clerk, IT professional knowledge worker function day to day operations decision support DB design application-oriented subject-oriented data current, up-to-date
detailed, flat relational isolated
historical, summarized, multidimensional integrated, consolidated
usage repetitive ad-hoc access read/write
index/hash on prim. key lots of scans
unit of work short, simple transaction complex query # records accessed tens millions #users thousands hundreds DB size 100MB-GB 100GB-TB metric transaction throughput query throughput, response
20
Why Separate Data Warehouse?
• High performance for both systems• DBMS— tuned for OLTP: access methods, indexing,
concurrency control, recovery• Warehouse—tuned for OLAP: complex OLAP queries,
multidimensional view, consolidation.
• Different functions and different data:• Missing data: Decision support requires historical data which
operational DBs do not typically maintain• Data consolidation: Decision Support requires consolidation
(aggregation, summarization) of data from heterogeneous sources
• Data quality: Different sources typically use inconsistent data representations, codes and formats which have to be reconciled
21
Requirement• The DW system must make information easily accessible.• The DW system must present information consistently.• The DW system must adapt to change• The DW system must present information in a timely way• The DW system must be a secure bastion that protect the
information assets• The DW system must serve as the authoritative and
trustworthy foundation for improved decision making.• The business community must accept the DW system to
deem it successful.
22
23
What to learn next?• Multi-dimensional data model
• Cube• Scheme
• Architecture DW
24
Individual AssignmentCreate a report that explain: (A4 page)
• Multidimensional data model (cube, fact table, dimension, etc)• Scheme (star, snowflake, etc)• Architecture data warehouse system
• Print the report, and bring it to next class.