8/12/2019 CH1 Data Warehouse Design
Copyright 2005, Oracle. All rights reserved.
Data Warehouse Design
Objectives
After completing this lesson, you should be able to do
the following:
Differentiate OLTP and data warehousing design techniques
Describe effective data warehouse design
Identify data warehousing schemas
Explain implementation models
List data warehousing objects
Characteristics of a Data Warehouse
A data warehouse is a database designed for querying, reporting, and analysis.
A data warehouse contains historical data derived from transaction data.
Data warehouses separate analysis workload from transaction workload.
A data warehouse is primarily an analytical tool.
Comparing OLTP and Data Warehouses
Property                      OLTP                 Data Warehouse
Joins                         Many                 Some
Data accessed by queries      Comparatively lower  Large amount
Duplicated data               Normalized DBMS      Denormalized DBMS
Derived data and aggregates   Rare                 Common
Data Warehouse Architectures
[Figure: Basic data warehouse. Data sources (operational systems, flat files) load into the warehouse (metadata, raw data, materialized views), which users access for analysis, reporting, and data mining.]
Data Warehouse Architectures
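[Figure: Data warehouse with staging area. Data sources (operational systems, flat files) pass through a staging area before loading into the warehouse (metadata, raw data, materialized views), which users access for analysis, reporting, and data mining.]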
Data Warehouse Architectures
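[Figure: Data warehouse with staging area and data marts. Data sources (operational systems, flat files) pass through a staging area into the warehouse (metadata, raw data, materialized views); Sales, Purchasing, and Inventory data marts sit between the warehouse and the users performing analysis, reporting, and data mining.]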
Data Warehouse Design
Key data warehouse design considerations:
Identify the specific data content.
Recognize the critical relationships within and between groups of data.
Define the system environment supporting your data warehouse.
Identify the required data transformations.
Calculate the frequency at which the data must be refreshed.
Logical Design
A logical design is conceptual and abstract.
Entity-relationship (ER) modeling is useful in identifying logical information requirements.
An entity represents a chunk of data.
The properties of entities are known as attributes.
The links between entities are known as relationships.
Dimensional modeling is a specialized type of ER modeling useful in data warehouse design.
Oracle Warehouse Builder
Oracle Database provides tools to implement the
ETL process.
Oracle Warehouse Builder is a tool to help in this
process.
Oracle Warehouse Builder generates the following types of code:
- SQL data definition language (DDL) scripts
- PL/SQL programs
- SQL*Loader control files
- XML Process Definition Language (XPDL)
- ABAP code (used to extract data from SAP systems)
Data Warehousing Schemas
Objects can be arranged in data warehousing
schema models in a variety of ways:
Star schema
Snowflake schema
Third normal form (3NF) schema
Hybrid schemas
The source data model and user
requirements should steer the data
warehouse schema.
Implementation of the logical model may require
changes to enable you to adapt it to your physical
system.
Schema Characteristics
Star schema
- Characterized by one or more large fact tables and a number of much smaller dimension tables
- Each dimension table joined to the fact table using a primary key to foreign key join
Snowflake schema
- Dimension data grouped into multiple tables instead of one large table
- Increased number of dimension tables, requiring more foreign key joins
Third normal form (3NF) schema
- A classical relational-database model that minimizes data redundancy through normalization
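As a minimal sketch of the difference between the star and snowflake styles, the Oracle SQL below models the same product dimension twice: once as a single star-schema dimension table and once snowflaked into two tables. All table and column names (products_star, products_snow, categories_snow, sales_fact_demo) are hypothetical and are not part of the course material.

```sql
-- Star schema: one denormalized dimension table per dimension.
CREATE TABLE products_star (
  prod_id        NUMBER        PRIMARY KEY,
  prod_name      VARCHAR2(50)  NOT NULL,
  category_name  VARCHAR2(50)  NOT NULL   -- category stored inline, repeated per product
);

-- Snowflake schema: the same dimension split into two normalized tables.
CREATE TABLE categories_snow (
  category_id    NUMBER        PRIMARY KEY,
  category_name  VARCHAR2(50)  NOT NULL
);

CREATE TABLE products_snow (
  prod_id        NUMBER        PRIMARY KEY,
  prod_name      VARCHAR2(50)  NOT NULL,
  category_id    NUMBER        NOT NULL REFERENCES categories_snow (category_id)
);

-- Hypothetical fact table, joined to the dimension by primary key to foreign key.
CREATE TABLE sales_fact_demo (
  prod_id        NUMBER        NOT NULL REFERENCES products_star (prod_id),
  sales_amount   NUMBER(10,2)  NOT NULL
);

-- Star query: a single join reaches the category.
SELECT p.category_name, SUM(f.sales_amount) AS total_sales
FROM   sales_fact_demo f
JOIN   products_star p ON p.prod_id = f.prod_id
GROUP  BY p.category_name;

-- Snowflake query: the same question needs one more foreign key join.
SELECT c.category_name, SUM(f.sales_amount) AS total_sales
FROM   sales_fact_demo f
JOIN   products_snow   p ON p.prod_id     = f.prod_id
JOIN   categories_snow c ON c.category_id = p.category_id
GROUP  BY c.category_name;
```

The snowflake form removes the repeated category name at the cost of the extra join, which is the trade-off the slide describes.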
Data Warehousing Objects
Fact tables
- Fact tables are the large tables that store business measurements.
Dimension tables
- A dimension is a structure composed of one or more hierarchies that categorizes data.
- A unique identifier is specified for each distinct record in a dimension table.
Relationships
- Relationships guarantee integrity of business information.
Fact Tables
A fact table must be defined for each star schema.
Fact tables are the large tables that store business
measurements.
A fact table contains either detail-level or aggregated facts.
A fact table usually contains facts with the same
level of aggregation.
The primary key of the fact table is
usually a composite key made up
of all its foreign keys.
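The points above can be sketched as Oracle DDL. This is an illustrative example only: the table names (customers_demo, products_demo, times_demo, sales_demo), the columns, and the choice of three dimensions are assumptions, not definitions taken from the course.

```sql
-- Minimal dimension tables so the foreign keys below have targets.
CREATE TABLE customers_demo (cust_id NUMBER PRIMARY KEY);
CREATE TABLE products_demo  (prod_id NUMBER PRIMARY KEY);
CREATE TABLE times_demo     (time_id DATE   PRIMARY KEY);

-- Detail-level fact table: one row per customer, product, and day,
-- so every fact is stored at the same level of aggregation.
CREATE TABLE sales_demo (
  cust_id       NUMBER        NOT NULL,
  prod_id       NUMBER        NOT NULL,
  time_id       DATE          NOT NULL,
  quantity_sold NUMBER(10)    NOT NULL,   -- business measurement
  amount_sold   NUMBER(10,2)  NOT NULL,   -- business measurement
  -- Composite primary key made up of all the foreign keys.
  CONSTRAINT sales_demo_pk      PRIMARY KEY (cust_id, prod_id, time_id),
  CONSTRAINT sales_demo_cust_fk FOREIGN KEY (cust_id) REFERENCES customers_demo (cust_id),
  CONSTRAINT sales_demo_prod_fk FOREIGN KEY (prod_id) REFERENCES products_demo (prod_id),
  CONSTRAINT sales_demo_time_fk FOREIGN KEY (time_id) REFERENCES times_demo (time_id)
);
```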
Dimensions and Hierarchies
A dimension is a structure
composed of one or more
hierarchies that categorizes data.
Dimensional attributes help to
describe the dimensional value.
Dimension data is collected at the
lowest level of detail and aggregated
into higher level totals.
Hierarchies are structures that use ordered levels to organize data.
In a hierarchy, each level is
connected to the levels above and
below it.
[Figure: CUSTOMERS dimension hierarchy, by level from highest to lowest: REGION, SUBREGION, COUNTRY, STATE, CITY, CUSTOMER]
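One way to declare such a hierarchy in Oracle is the CREATE DIMENSION statement. The sketch below assumes a hypothetical single table, customers_geo_demo, that holds every level of the hierarchy; the table, its columns, and the dimension name are illustrative and do not come from the course schema.

```sql
-- Hypothetical denormalized customer table holding every level of the hierarchy.
CREATE TABLE customers_geo_demo (
  cust_id             NUMBER        PRIMARY KEY,
  cust_last_name      VARCHAR2(40)  NOT NULL,
  cust_city           VARCHAR2(30)  NOT NULL,
  cust_state_province VARCHAR2(40)  NOT NULL,
  country             VARCHAR2(40)  NOT NULL,
  subregion           VARCHAR2(30)  NOT NULL,
  region              VARCHAR2(20)  NOT NULL
);

-- Declare the levels, their order from lowest to highest,
-- and one attribute that describes the lowest level.
CREATE DIMENSION customers_geo_dim
  LEVEL customer  IS (customers_geo_demo.cust_id)
  LEVEL city      IS (customers_geo_demo.cust_city)
  LEVEL state     IS (customers_geo_demo.cust_state_province)
  LEVEL country   IS (customers_geo_demo.country)
  LEVEL subregion IS (customers_geo_demo.subregion)
  LEVEL region    IS (customers_geo_demo.region)
  HIERARCHY geog_rollup (
    customer  CHILD OF
    city      CHILD OF
    state     CHILD OF
    country   CHILD OF
    subregion CHILD OF
    region
  )
  ATTRIBUTE customer DETERMINES (customers_geo_demo.cust_last_name);
```

The dimension object stores only the declared relationships, not a copy of the data.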
Dimensions and Hierarchies
[Figure: Star schema. The fact table SALES (cust_id, prod_id) is linked by relationships to the dimension tables CUSTOMERS (#cust_id, cust_last_name, cust_city, cust_state_province), PRODUCTS (#prod_id), TIMES, CHANNELS, and PROMOTIONS. Callouts mark a hierarchy, a unique identifier, and a relationship.]
Physical Design
[Figure: Mapping the logical design to the physical design.]
Logical: entities, relationships, attributes, unique identifiers
Physical (tablespaces): tables, columns, indexes, integrity constraints (primary key, foreign key, not null), materialized views, dimensions
Data Warehouse Physical Structures
Tables and partitioned tables
- Partitioned tables enable you to split large data volumes into smaller, more manageable pieces.
- Expect performance benefits from partition pruning and intelligent parallel processing.
- Compressed tables offer scale-up opportunities for read-only operations.
- Table compression saves disk space.
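A minimal sketch of a range-partitioned table with compression on its older partitions follows. The table name sales_part_demo, its columns, and the quarterly boundaries are assumptions chosen for illustration.

```sql
-- Range-partitioned table: one partition per quarter of 2005.
-- The two oldest (read-mostly) partitions are created compressed.
CREATE TABLE sales_part_demo (
  sales_date    DATE          NOT NULL,
  cust_id       NUMBER        NOT NULL,
  sales_amount  NUMBER(10,2)  NOT NULL
)
PARTITION BY RANGE (sales_date) (
  PARTITION sales_2005_q1 VALUES LESS THAN (TO_DATE('2005-04-01', 'YYYY-MM-DD')) COMPRESS,
  PARTITION sales_2005_q2 VALUES LESS THAN (TO_DATE('2005-07-01', 'YYYY-MM-DD')) COMPRESS,
  PARTITION sales_2005_q3 VALUES LESS THAN (TO_DATE('2005-10-01', 'YYYY-MM-DD')),
  PARTITION sales_2005_q4 VALUES LESS THAN (TO_DATE('2006-01-01', 'YYYY-MM-DD'))
);

-- Later, once the third quarter is no longer updated,
-- rebuild just that partition in compressed form.
ALTER TABLE sales_part_demo MOVE PARTITION sales_2005_q3 COMPRESS;
```

Because each partition is a separate segment, maintenance such as the MOVE above touches one piece of the table rather than the whole.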
Data Warehouse Physical Structures
Views:
- Are tailored presentations of data contained in one or more tables or views
- Do not require any space in the database
Materialized views:
- Are query results that have been stored in advance
- Like indexes, are used transparently and improve performance
Integrity constraints:
- Are used in data warehouses for query rewrite
Dimensions:
- Are containers of logical relationships and do not require any space in the database
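The contrast between a view and a materialized view can be sketched as follows. The base table sales_mv_demo and the view names are hypothetical, and the refresh options shown are just one reasonable choice, not a recommendation from the course.

```sql
-- Hypothetical base table.
CREATE TABLE sales_mv_demo (
  sales_date    DATE          NOT NULL,
  prod_id       NUMBER        NOT NULL,
  sales_amount  NUMBER(10,2)  NOT NULL
);

-- A view stores only its defining query, so it holds no rows of its own.
CREATE VIEW sales_2005_v AS
SELECT sales_date, prod_id, sales_amount
FROM   sales_mv_demo
WHERE  sales_date >= DATE '2005-01-01'
AND    sales_date <  DATE '2006-01-01';

-- A materialized view stores the query result in advance.
CREATE MATERIALIZED VIEW sales_by_product_mv
  BUILD IMMEDIATE
  REFRESH COMPLETE ON DEMAND
  ENABLE QUERY REWRITE
AS
SELECT prod_id, SUM(sales_amount) AS total_amount, COUNT(*) AS row_count
FROM   sales_mv_demo
GROUP  BY prod_id;

-- With query rewrite enabled, the optimizer may answer this query
-- from sales_by_product_mv without the query naming it.
ALTER SESSION SET query_rewrite_enabled = TRUE;

SELECT prod_id, SUM(sales_amount) AS total_amount
FROM   sales_mv_demo
GROUP  BY prod_id;
```

Whether the rewrite actually happens depends on the optimizer's cost estimate and on the materialized view being fresh enough for the session's integrity settings.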
Managing Large Volumes of Data
Work smarter in your data warehouse:
- Partitioning
- Bitmap indexes/Star transformation
- Data compression
- Query rewrite
Work harder in your data warehouse:
- Parallelism for all operations
  - DBA tasks, such as loading, index creation, table creation, data modification, backup and recovery
  - End-user operations, such as queries
- Unbounded scalability: Real Application Clusters
I/O Performance in Data Warehouses
I/O is typically the primary determinant of data
warehouse performance.
Data warehouse storage configurations should be
chosen by I/O bandwidth, not storage capacity.
Every component of the I/O
subsystem should provide
enough bandwidth:
Disks
I/O channels
I/O adapters
In data warehouses, maximizing
sequential I/O throughput is critical.
Performance of Sequential I/Os
In data warehouses, drive arrays generally see random large I/Os (1 MB) spread across the devices.
This is known as a multiuser sequential workload.
The host operating system, device drivers, or storage array may fracture large I/Os into smaller I/Os.
It is common in default Linux configurations to fracture large I/Os into smaller ones (up to 32 KB).
This level of I/O fracturing can have a disastrous effect on the total throughput.
The implementation of query rewrite has a positive effect on minimizing I/O requests.
Minimizing I/O Requests
Partition pruning
Only the relevant partitions are accessed.
The optimizer knows or finds the relevant partitions.
Static pruning uses values known in advance.
Dynamic pruning uses internal recursive SQL to find the relevant partitions.
Pruning can provide order-of-magnitude performance gains.
SELECT SUM(sales_amount)
FROM sales
WHERE sales_date BETWEEN DATE '2005-03-01' AND DATE '2005-05-31';
[Figure: SALES table partitioned by month, 2005-JAN through 2005-JUN; the query above needs only the 2005-MAR, 2005-APR, and 2005-MAY partitions.]
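To see pruning for a query like the one above, one option is to build a small monthly-partitioned table and inspect the execution plan. The table sales_prune_demo and its partition names are hypothetical stand-ins for the SALES table in the figure.

```sql
-- Hypothetical table with one partition per month, January to June 2005.
CREATE TABLE sales_prune_demo (
  sales_date    DATE          NOT NULL,
  sales_amount  NUMBER(10,2)  NOT NULL
)
PARTITION BY RANGE (sales_date) (
  PARTITION p2005_jan VALUES LESS THAN (TO_DATE('2005-02-01', 'YYYY-MM-DD')),
  PARTITION p2005_feb VALUES LESS THAN (TO_DATE('2005-03-01', 'YYYY-MM-DD')),
  PARTITION p2005_mar VALUES LESS THAN (TO_DATE('2005-04-01', 'YYYY-MM-DD')),
  PARTITION p2005_apr VALUES LESS THAN (TO_DATE('2005-05-01', 'YYYY-MM-DD')),
  PARTITION p2005_may VALUES LESS THAN (TO_DATE('2005-06-01', 'YYYY-MM-DD')),
  PARTITION p2005_jun VALUES LESS THAN (TO_DATE('2005-07-01', 'YYYY-MM-DD'))
);

-- Static pruning: the date range is known when the statement is parsed.
EXPLAIN PLAN FOR
SELECT SUM(sales_amount)
FROM   sales_prune_demo
WHERE  sales_date BETWEEN DATE '2005-03-01' AND DATE '2005-05-31';

-- Show the plan, including the partition start and stop columns.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

In the plan output, the Pstart and Pstop columns report which partitions are accessed; for this query the range should cover only the March through May partitions.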
Minimizing I/O Requests
Bitmap indexes
Bitmap indexes are usually 3 to 20 times smaller than B-tree indexes.
They are ideal for set-based operations.
Star transformation uses bitmap indexes to identify base table records of interest.
Full table access is replaced with bitmap index access.
Bitmap indexes minimize I/O.
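A rough sketch of the setup that star transformation relies on is shown below: bitmap indexes on the fact table's key columns, plus the session setting that lets the optimizer consider the transformation. The tables customers_bix_demo and sales_bix_demo, the index names, and the filter values are all illustrative assumptions.

```sql
-- Hypothetical dimension and fact tables.
CREATE TABLE customers_bix_demo (
  cust_id              NUMBER        PRIMARY KEY,
  cust_state_province  VARCHAR2(40)  NOT NULL
);

CREATE TABLE sales_bix_demo (
  cust_id       NUMBER        NOT NULL REFERENCES customers_bix_demo (cust_id),
  channel_id    NUMBER        NOT NULL,
  sales_amount  NUMBER(10,2)  NOT NULL
);

-- One bitmap index per key column of the fact table.
CREATE BITMAP INDEX sales_bix_demo_cust_bix    ON sales_bix_demo (cust_id);
CREATE BITMAP INDEX sales_bix_demo_channel_bix ON sales_bix_demo (channel_id);

-- Allow the optimizer to consider star transformation in this session.
ALTER SESSION SET star_transformation_enabled = TRUE;

-- Star query: filters on the dimension and on a fact key column can be
-- resolved through the bitmap indexes before fact rows are fetched.
SELECT c.cust_state_province, SUM(s.sales_amount) AS total_sales
FROM   sales_bix_demo s
JOIN   customers_bix_demo c ON c.cust_id = s.cust_id
WHERE  c.cust_state_province = 'CA'
AND    s.channel_id IN (2, 3)
GROUP  BY c.cust_state_province;
```

The optimizer still decides by cost whether to apply the transformation, so on a tiny demo table the plan may use a plain join instead.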
I/O Scalability
Parallel execution:
Reduces response time for data-intensive operations on large databases
Benefits systems with the following characteristics:
- Multiprocessors, clusters, or massively parallel systems
- Sufficient I/O bandwidth
- Sufficient memory to support memory-intensive processes such as sorts, hashing, and I/O buffers
[Figure: Parallel execution. A coordinator dispatches work to query servers: four scanners read the data on disk and feed four sorters (aggregators), Sort Q1 through Sort Q4.]
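As a small illustration of requesting parallel execution, the sketch below uses a statement-level hint and, alternatively, a table-level default. The table sales_px_demo is hypothetical, and the degree of 4 is chosen only to mirror the four scanners in the figure.

```sql
-- Hypothetical fact table.
CREATE TABLE sales_px_demo (
  cust_id       NUMBER        NOT NULL,
  sales_amount  NUMBER(10,2)  NOT NULL
);

-- Request a degree of parallelism of 4 for this statement only.
-- The scan, the aggregation, and the sort can each be spread across query servers.
SELECT /*+ PARALLEL(s, 4) */ cust_id, SUM(sales_amount) AS total_sales
FROM   sales_px_demo s
GROUP  BY cust_id
ORDER  BY total_sales DESC;

-- Alternatively, set a default degree on the table so that
-- queries against it can run in parallel without a hint.
ALTER TABLE sales_px_demo PARALLEL 4;
```

The requested degree is an upper bound: the database may run with fewer query servers if not enough are available.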
I/O Scalability
Automatic Storage Management (ASM)
Configuring storage for a DB depends on many variables:
- Which data to put on which disk
- Logical unit number (LUN) configurations
- DB types and workloads: data warehouse, OLTP, DSS
- Trade-offs between available options
ASM provides solutions to storage issues encountered in data warehouses.
I/O Scalability
Automatic Storage Management: Overview
Portable and high-performance cluster file system
Manages Oracle database files
Data spread across disks to balance load
Integrated mirroring across disks
Solves many storage management challenges
[Figure: Storage stack of application, database, file system, volume manager, and operating system; ASM takes the place of the file system and volume manager layers.]
I/O Scalability
ASM benefits
Stripes files rather than logical volumes
Online disk reconfiguration and dynamic rebalancing
Provides redundancy on a file basis
Automatic database file management
EM-based graphical management interface
Hot spots and manual I/O tuning eliminated
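A few of these benefits can be illustrated with ASM's SQL commands. This is only a sketch: the disk group name dw_data, the failure group names, the tablespace name, and every device path are placeholders, and the statements need an ASM instance with real devices to run.

```sql
-- Run in the ASM instance. Device paths below are placeholders.
-- NORMAL REDUNDANCY mirrors file extents across the two failure groups.
CREATE DISKGROUP dw_data NORMAL REDUNDANCY
  FAILGROUP controller1 DISK '/devices/diska1', '/devices/diska2'
  FAILGROUP controller2 DISK '/devices/diskb1', '/devices/diskb2';

-- Online reconfiguration: add a disk while the disk group stays in use,
-- and rebalance existing data onto it.
ALTER DISKGROUP dw_data
  ADD FAILGROUP controller1 DISK '/devices/diska3'
  REBALANCE POWER 4;

-- Run in the database instance. Naming only the disk group lets ASM
-- choose the file name and placement.
CREATE TABLESPACE dw_ts DATAFILE '+DW_DATA' SIZE 1G;
```

The REBALANCE POWER value trades rebalance speed against I/O impact on running workloads; 4 here is an arbitrary illustrative setting.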
I/O Scalability
Real Application Clusters
Real Application Clusters (RAC) provides linear scalability and availability for data warehouses.
RAC provides redundancy so that if a node goes down, the other nodes will continue to execute.
RAC nodes can share all work equally or perform dedicated tasks such as ETL or query processing.
Typical Data Warehouse Cluster
16-port switch
16-port switch
1 Gigabit Ethernet interconnects
Sixteen storage arrays,
each with 1020 disks
Four nodes, each
with four 2 GHz
CPUs
Parallel Execution with RAC
Execution slaves have node affinity with the execution coordinator, but will expand to other nodes if needed.
[Figure: Four nodes (Node 1 through Node 4) attached to shared disks; the execution coordinator and the parallel execution servers run on the nodes.]
Summary
In this lesson, you should have learned how to:
Differentiate OLTP and data warehousing design techniques
Describe effective data warehouse design
Identify data warehousing schemas
Explain implementation models
List data warehousing objects